Higher Logics

Posts

Showing posts from 2010

Sasa v0.9.3 Released!

I recently realized that it's been over a year since I last put out a stable Sasa release . Sasa is in production use in a number of applications, but the stable releases on Sourceforge have lagged somewhat, and a number of fixes and enhancements have been added since v0.9.2. So I decided to simply exclude the experimental and broken abstractions and push out a new release so others could benefit from everything Sasa v0.9.3 has to offer. The changelog contains a full list of changes, too numerous to count. I'll list here a few of the highlights. IL Rewriter C# unfortunately forbids certain types from being used as generic type constraints, even though these constraints are available to CIL. For instance, the following is legal CIL but illegal in C#: public void Foo<T>(T value) where T : Delegate { ... } Sasa now provides a solution for this using its ilrewrite tool. The above simply uses the Sasa.TypeConstraint<T> in the type constraint and the rewriter will

Abstracting Computation: Arrows in C#

I just committed an implementation of Arrows to my open source Sasa library . It's in its own dll and namespace, Sasa.Arrow, so it doesn't pollute the other production quality code. The implementation is pretty straightforward, and it also supports C#'s monadic query pattern, also known as LINQ. It basically boils down to implementing combinators on delegates, like Func<T, R>. It's not possible to implement the query pattern as extension methods for Func<T, R> because type inference fails for even the simplest of cases. So instead I wrapped Func<T, R> in a struct Arrow<T, R>, and implemented the query pattern as instance methods instead of extension methods. This removes a number of explicit type parameters that the inference engine struggles with, and type inference now succeeds. Of course, type inference still fails when calling Arrow.Return() on a static method, but this is a common and annoying failure of C#'s type inference [1]. What i

Factoring Out Common Patterns in Libraries

A well designed core library is essential for building concise, maintainable programs in any programming language. There are many common, recurrent patterns when writing code, and ideally, these recurring uses should be factored into their own abstractions that are distributed as widely as possible throughout the core library. Consider for instance, an "encapsulated value" pattern: /// <summary> /// A read-only reference to a value. /// </summary> /// <typeparam name="T">The type of the encapsulated value.</typeparam> public interface IValue<T> { /// <summary> /// A read-only reference to a value. /// </summary> T Value { get; } } This shows up everywhere, like Nullable<T> , Lazy<T> , and Task<T> (property called "Result"), IEnumerator<T> (property called "Current"), and many many more. However, the common interface of a value encapsulated in an object has not been factored o

High-level constructs for low-level C: exception handling, RAII, sum types and pattern matching

There are numerous high-level abstractions available in other languages that simply make programming easier and less error prone. For instance, automatic memory management, pattern matching, exceptions, higher order functions, and so on. Each of these features enable the developer to reason about program behaviour at a higher level, and factor out common behaviour into separate but composable units. For fun, I've create a few small macro headers that enable some of these patterns in pure C. If anyone sees any portability issues, please let me know! libex: Exception Handling and RAII RAII in C is definitely possible via a well-known pattern used everywhere in the Linux kernel . It's a great way to organize code, but the program logic and finalization and error logic are not syntactically apparent. You have to interpret the surrounding context to identify the error conditions, and when and how finalization is triggered. To address this, I encapsulated this RAII pattern in a macro

CLR: Verification for Runtime Code Generation

The CLR's lightweight code generation via DynamicMethod is pretty useful, but it's sometimes difficult to debug the generated code and ensure that it verifies. In order to verify generated code, you must save the dynamic assembly to disk and run the peverify.exe tool on it, but DynamicMethod does not have any means to do so. In order to save the assembly, there's a more laborious process of creating dynamic assemblies, modules and types, and then finally adding a method to said type. This is further complicated by the fact that a MethodBuilder and DynamicMethod don't share any common interfaces or base types for generating IL, despite both of them supporting a GetILGenerator() method . This difficulty in switching between saved codegen and pure runtime codegen led me to add a CodeGen class to Sasa, which can generate code for either case based on a bool parameter. Since no common interface is available for code generation, it also accepts a delegate to which it dispa

Peirce's Criterion: command-line tool

I've written a simple command-line tool filter out statistical outliers using the rigourous Peirce's Criterion . The algorithm has been available in Sasa for awhile, and will be in the forthcoming v0.9.3 release. I've also packaged the command-line tool binary for running Peirce's Criterion over multi-column CSV files (LGPL source available here ).

The Cost of Type.GetType()

Most framework-style software spends an appreciable amount of time dynamically loading code. Some of this code is executed quite frequently. I've recently been working on a web framework where URLs map to type names and methods, so I've been digging into these sort of patterns a great deal lately. The canonical means to map a type name to a System.Type instance is via System.Type.GetType(string). In a framework which performs a significant number of these lookups, it's not clear what sort of performance characteristics one can expect from this static framework function. Here's the source for a simple test pitting Type.GetType() against a cache backed by a Dictionary<string, Type>. All tests were run on a Core 2 Duo 2.2 GHz, .NET CLR 3.5, and all numbers indicate the elapsed CPU ticks. Type.GetType() Dictionary<string, Type> 6236070640 51351056 6236193856 51440360 6237466224 51463192 6238210488 51583336 6240645816 51599480 6242089400 51687448 6244450392 5171

LINQ Transpose Extension Method

I had recent need for a transpose operation which could swap the columns and rows of a nested IEnumerable sequence, it's simple enough to express in LINQ but after a quick search, all the solutions posted online are rather ugly. Here's a concise and elegant version expressed using LINQ query syntax: /// <summary> /// Swaps the rows and columns of a nested sequence. /// </summary> /// <typeparam name="T">The type of elements in the sequence.</typeparam> /// <param name="source">The source sequence.</param> /// <returns>A sequence whose rows and columns are swapped.</returns> public static IEnumerable<IEnumerable<T>> Transpose<T>( this IEnumerable<IEnumerable<T>> source) { return from row in source from col in row.Select( (x, i) => new KeyValuePair<int, T>(i, x)) group col.Value by col.Key into c select c as IEnume

Asymmetries in CIL

Most abstractions of interest have a natural dual, for instance, IEnumerable and IObservable, induction and co-induction, algebra and co-algebra, objects and algebraic data types, message-passing and pattern matching, etc. Programs are more concise and simpler when using the proper abstraction, be that some abstraction X or its dual. For instance, reactive programs written using the pull-based processing semantics of IEnumerable are far more unwieldy than those using the natural push-based semantics of IObsevable. As a further example, symbolic problems are more naturally expressed via pattern matching than via message-passing. This implies that any system is most useful when we ensure that every abstraction provided is accompanied by its dual. This also applies to virtual machine instruction sets like CIL, as I recently discovered while refining my safe reflection abstraction for the CLR. A CIL instruction stream embeds some necessary metadata required for the CLR's correct opera

CIL Verification and Safety

I've lamented here and elsewhere some unfortunate inconveniences and asymmetries in the CLR -- for example, we have nullable structs but lack non-nullable reference types, an issue I address in my Sasa class library . I've recently completed some Sasa abstractions for safe reflection, and an IL rewriter based on Mono.Cecil which allows C# source code to specify type constraints that are supported by the CLR but unnecessarily restricted in C#. In the process, I came across another unjustified decision regarding verification: the jmp instruction . The jmp instruction strikes me as potentially incredibly useful for alternative dispatch techniques, and yet I recently discovered that it's classified as unverifiable. This seems very odd, since the instruction is fully statically typed, and I can't think of a way its use could corrupt the VM. In short, the instruction performs a control transfer to a named method with a signature matching exactly the current method's si