
Immutable Sasa.Collections.Tree vs. System.Collections.Dictionary vs. C5 HashDictionary

I've previously posted about Sasa's hash-array mapped trie, but I never posted any benchmarks. I recently came across this post on Stack Overflow, which provides a decent basic benchmark comparing .NET's default Dictionary<TKey, TValue>, the C5 collection library's hash dictionary, F#'s immutable map, and .NET's new immutable collections.

I slightly modified the file to remove the benchmarks against the F# map and the new immutable collections, since I'm still using VS 2010, and I added a simple warmup phase to ensure the methods have all been JIT compiled and the GC has run, to avoid introducing noise:

static void Warmup()
{
    var x = Tree.Make<string, object>();
    var y = new C5.HashDictionary<string, object>();
    var z = new Dictionary<string, object>();
    z.Add("foo", "bar");
    for (var i = 0; i < 100; ++i)
    {
        x = x.Add("foo" + i, "bar");
        y.Add("foo" + i, "bar");
        z.Add("foo" + i, "bar");
        var tmp1 = x["foo" + i];
        var tmp2 = y["foo" + i];
        var tmp3 = z["foo" + i];
    }
    x = default(Tree<string, object>);
    y = null;
    z = null;
    GC.Collect();
}
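
For context, the benchmark from the Stack Overflow post populates each dictionary and then times repeated lookups. I don't reproduce it verbatim here, but a minimal sketch of the lookup timing for the SCGD case looks something like the following, where the method name, key format, and output formatting are illustrative rather than taken from the original:

static long TimeLookups(Dictionary<string, object> dict, int count)
{
    // Time `count` lookups; the real benchmark does the equivalent for
    // the C5 HashDictionary and Sasa's Tree, and reports both elapsed
    // milliseconds and raw ticks, as in the output below.
    var sw = System.Diagnostics.Stopwatch.StartNew();
    object tmp = null;
    for (var i = 0; i < count; ++i)
    {
        tmp = dict["foo" + i];
    }
    sw.Stop();
    GC.KeepAlive(tmp); // keep the last lookup live so the loop isn't optimized away
    Console.WriteLine("SCGD - {0,10} MS - {1,10} Ticks", sw.ElapsedMilliseconds, sw.ElapsedTicks);
    return sw.ElapsedTicks;
}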

The results are still somewhat representative. This is a sample of an average run, where "SCGD" is .NET's System.Collections.Generic.Dictionary, "C5" is the C5 HashDictionary, and "Imm" is Sasa's immutable HAMT:

# - 100
SCGD -          0 MS -         25 Ticks
C5   -          0 MS -        887 Ticks
Imm  -          0 MS -        387 Ticks

# - 1000
SCGD -          0 MS -        257 Ticks
C5   -          0 MS -        294 Ticks
Imm  -          0 MS -        368 Ticks

# - 10000
SCGD -          1 MS -       4084 Ticks
C5   -          1 MS -       5182 Ticks
Imm  -          1 MS -       5436 Ticks

# - 100000
SCGD -         28 MS -      85742 Ticks
C5   -         32 MS -      99280 Ticks
Imm  -         32 MS -      97720 Ticks

Observations:

  1. C5's standard deviation was somewhat wider than that of both Sasa's HAMT and SCGD, so its performance seems slightly less predictable
  2. Sasa's immutable HAMT appears to perform within 5% of the mutable C5 collection at all collection sizes
  3. Sasa's immutable HAMT appears to perform within 15% of the mutable SCGD for large collections, where the hash table operates with a higher load factor
  4. Small collections, where the hash table has a low load factor, clearly favor the mutable SCGD by up to an order of magnitude; C5 doesn't share that advantage for some reason (possibly because it maintains a higher load factor)
  5. C5's terrible performance on very small collections of 100 items was consistent on every test run, again possibly because it maintains a high load factor before resizing
  6. Sasa's HAMT takes just as much time to load 1000 items as it takes to load 100 items; this was consistent across every test run, and it's not clear why

Finally, while not exactly apples-to-apples, Sasa's HAMT is easily 3-4× faster than F#'s map given the numbers cited in the above Stack Overflow post. F#'s map still has an advantage for very small collections, though. Sasa's HAMT also appears to be at least 2× faster than the new immutable collections.

Also keep in mind that this benchmark only tests lookup performance. F#'s map would have an advantage over Sasa's HAMT in load performance because the HAMT does not yet include a "bulk-load" operation, which the F# map does appear to support.
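
To make that load-performance caveat concrete, here's a minimal sketch of what loading Sasa's HAMT currently looks like, using the same Tree.Make and Add calls as the Warmup code above; the point is that every insertion is a separate persistent update rather than part of a single bulk construction:

// Loading the immutable HAMT one entry at a time: each Add returns a
// new tree that shares structure with the previous one, so every
// insertion allocates a fresh path from the root.
var tree = Tree.Make<string, object>();
for (var i = 0; i < 1000; ++i)
{
    tree = tree.Add("foo" + i, "bar");
}
// A bulk-load operation could build the final structure in one pass
// over the input sequence and skip the intermediate trees; until the
// HAMT grows one, repeated Add is the only way to load it.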
