Switch statement vs. dictionary of Func values: which is faster and why?
Suppose there is the following code:
private static int DoSwitch(string arg)
{
    switch (arg)
    {
        case "a": return 0;
        case "b": return 1;
        case "c": return 2;
        case "d": return 3;
    }
    return -1;
}
private static Dictionary<string, Func<int>> dict = new Dictionary<string, Func<int>>
{
    { "a", () => 0 },
    { "b", () => 1 },
    { "c", () => 2 },
    { "d", () => 3 },
};

private static int DoDictionary(string arg)
{
    return dict[arg]();
}
By iterating over both methods and comparing timings, I find that the dictionary is slightly faster, even when "a", "b", "c", "d" are expanded to include more keys. Why is this so?
Does this have to do with cyclomatic complexity? Is it because the jitter compiles the return statements in the dictionary to native code only once? Is it because the dictionary lookup is O(1), which may not be the case for a switch statement? (These are just guesses.)
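For concreteness, the comparison loop is along these lines (a minimal Stopwatch-based sketch; CompareTimings, the iteration count, and the key cycling are arbitrary choices, not a rigorous benchmark):
// Rough timing harness (illustrative only).
private static void CompareTimings()
{
    string[] keys = { "a", "b", "c", "d" };
    const int iterations = 10000000;

    // Call each method once so the JIT has compiled both before timing.
    DoSwitch("a");
    DoDictionary("a");

    var sw = System.Diagnostics.Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
        DoSwitch(keys[i % keys.Length]);
    sw.Stop();
    Console.WriteLine("switch:     " + sw.ElapsedMilliseconds + " ms");

    sw.Restart();
    for (int i = 0; i < iterations; i++)
        DoDictionary(keys[i % keys.Length]);
    sw.Stop();
    Console.WriteLine("dictionary: " + sw.ElapsedMilliseconds + " ms");
}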
The short answer is that the switch statement executes linearly, while the dictionary lookup executes in effectively constant (at worst logarithmic) time.
At the IL level, a small switch statement is usually implemented as a series of if/else-if comparisons testing the switched variable for equality against each case value. Such a statement therefore executes in time linearly proportional to the number of valid options; the cases are compared in the order they appear, and the worst-case scenario is that all the comparisons are tried and either the last one matches or none do. So, with 32 options, the worst case is that the value matches none of them, and the code will have made 32 comparisons to determine this.
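In C# terms, a small switch like the one in the question behaves roughly like the following hand-written chain (an illustrative equivalent of that lowering, not the literal compiler output):
// Roughly what the IL for a small string switch amounts to:
// sequential equality checks, tried in source order.
private static int DoSwitchEquivalent(string arg)
{
    if (arg == "a") return 0;
    if (arg == "b") return 1;
    if (arg == "c") return 2;
    if (arg == "d") return 3;
    return -1; // worst case: all four comparisons were made
}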
A Dictionary, on the other hand, uses an index-optimized collection to store values. In .NET, Dictionary<TKey, TValue> is implemented as a hash table, which has effectively constant access time (the trade-off being extra memory overhead). Other options commonly used for "mapping" collections include balanced tree structures such as red-black trees, which provide logarithmic access (and linear space efficiency). Any of these will allow the code to find the key corresponding to the proper "case" in the collection (or determine that it does not exist) much faster than a switch statement can do the same.
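To make "effectively constant access time" concrete, a hash-based lookup boils down to something like this toy sketch (a fixed-size bucket table with made-up names; not the actual Dictionary<TKey, TValue> internals):
// Toy illustration of why hash lookup is (amortized) constant time:
// hash the key, jump straight to one bucket, and scan only the few
// entries that happened to hash into that bucket.
private static readonly List<KeyValuePair<string, int>>[] buckets = BuildBuckets();

private static List<KeyValuePair<string, int>>[] BuildBuckets()
{
    var table = new List<KeyValuePair<string, int>>[8];
    for (int i = 0; i < table.Length; i++)
        table[i] = new List<KeyValuePair<string, int>>();

    table[Bucket("a", table.Length)].Add(new KeyValuePair<string, int>("a", 0));
    table[Bucket("b", table.Length)].Add(new KeyValuePair<string, int>("b", 1));
    table[Bucket("c", table.Length)].Add(new KeyValuePair<string, int>("c", 2));
    table[Bucket("d", table.Length)].Add(new KeyValuePair<string, int>("d", 3));
    return table;
}

private static int Bucket(string key, int bucketCount)
{
    return (key.GetHashCode() & 0x7FFFFFFF) % bucketCount; // non-negative bucket index
}

private static int ToyLookup(string key)
{
    foreach (var pair in buckets[Bucket(key, buckets.Length)]) // usually 0 or 1 entries to scan
        if (pair.Key == key)
            return pair.Value;
    return -1; // not found
}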
EDIT: Other answers and commenters have touched on this, so in the interest of completeness I will as well. The Microsoft compiler does not always compile a switch to an if/else-if chain as I implied originally. It typically does so with small numbers of cases and/or with "sparse" cases (non-incremental values, like 1, 200, 4000). With larger sets of adjacent cases, the compiler will convert the switch into a jump table using the CIL switch instruction. With large sets of sparse cases, the compiler can implement a binary search to narrow the field and then "fall through" to a small number of sparse-case comparisons, or implement a jump table for adjacent cases.
However, the compiler will typically choose the implementation that is the best compromise of performance and space efficiency, so it will only use a jump table for a large number of densely-packed cases. This is because a jump table requires a space in memory on the order of the range of cases it must cover, which for sparse cases is terribly inefficient memory-wise. By using a Dictionary in source code, you basically force the compiler's hand; it will do it your way, instead of compromising on performance to gain memory efficiency.
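For example, a dense, adjacent set of integer cases like the one below is the kind of shape that typically gets lowered to a single jump-table (CIL switch) instruction; the exact thresholds are compiler implementation details:
// Dense, adjacent case values: a natural candidate for a jump table.
// After a range check, the value indexes directly into the table, so
// the cost is the same whether there are 7 cases or 700.
private static string DayName(int day)
{
    switch (day)
    {
        case 0: return "Sunday";
        case 1: return "Monday";
        case 2: return "Tuesday";
        case 3: return "Wednesday";
        case 4: return "Thursday";
        case 5: return "Friday";
        case 6: return "Saturday";
        default: return "unknown";
    }
}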
So, I would expect most cases in which either a switch statement or a Dictionary could be used in source to perform better when using a Dictionary. Large numbers of cases in switch statements are to be avoided anyway, as they are less maintainable.
This is a good example of why micro-benchmarks can be misleading. The C# compiler generates different IL depending on the size of the switch/case. So switching on a string like this
switch (text)
{
    case "a": Console.WriteLine("A"); break;
    case "b": Console.WriteLine("B"); break;
    case "c": Console.WriteLine("C"); break;
    case "d": Console.WriteLine("D"); break;
    default: Console.WriteLine("def"); break;
}
produces IL that essentially does the following for each case:
L_0009: ldloc.1
L_000a: ldstr "a"
L_000f: call bool [mscorlib]System.String::op_Equality(string, string)
L_0014: brtrue.s L_003f
and later
L_003f: ldstr "A"
L_0044: call void [mscorlib]System.Console::WriteLine(string)
L_0049: ret
I.e. it's a series of comparisons. So run time is linear.
However, adding additional cases, e.g. to include all the letters from a-z, changes the IL generated to something like this for each:
L_0020: ldstr "a"
L_0025: ldc.i4.0
L_0026: call instance void [mscorlib]System.Collections.Generic.Dictionary`2<string, int32>::Add(!0, !1)
and
L_0176: ldloc.1
L_0177: ldloca.s CS$0$0001
L_0179: call instance bool [mscorlib]System.Collections.Generic.Dictionary`2<string, int32>::TryGetValue(!0, !1&)
L_017e: brfalse L_0314
and finally
L_01f6: ldstr "A"
L_01fb: call void [mscorlib]System.Console::WriteLine(string)
L_0200: ret
I.e. it now uses a dictionary instead of a series of string compares, and thus gets the performance of a dictionary.
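In C# terms, the IL above corresponds roughly to the compiler having rewritten the switch like this (a hand-written approximation with made-up names such as caseMap, not the literal generated code):
// Approximation of what the compiler emits for a large string switch:
// a lazily built map from case label to an ordinal, then a dense
// integer switch (jump table) on that ordinal.
private static Dictionary<string, int> caseMap;

private static void BigSwitchEquivalent(string text)
{
    if (caseMap == null)
    {
        caseMap = new Dictionary<string, int>();
        caseMap.Add("a", 0);
        caseMap.Add("b", 1);
        caseMap.Add("c", 2);
        // ...one entry per case label, through "z"
    }

    int ordinal;
    if (caseMap.TryGetValue(text, out ordinal))
    {
        switch (ordinal) // dense int switch: compiles to a jump table
        {
            case 0: Console.WriteLine("A"); break;
            case 1: Console.WriteLine("B"); break;
            case 2: Console.WriteLine("C"); break;
            // ...
            default: Console.WriteLine("def"); break;
        }
    }
    else
    {
        Console.WriteLine("def");
    }
}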
In other words, the IL generated for the two is different, and this is just at the IL level. The JIT compiler may optimize further.
TL;DR: The moral of the story is to look at real data and profile, instead of trying to optimize based on micro-benchmarks.
As with many questions involving compiler codegen decisions, the answer is "it depends".
Building your own hash table will probably run faster than the compiler-generated code in many cases, because the compiler has other cost metrics it is trying to balance that you are not: primarily, memory consumption.
A hash table will use more memory than a handful of if-then-else IL instructions. If the compiler spit out a hash table for every switch statement in a program, memory use would explode.
As the number of case blocks in the switch statement grows, you will probably see the compiler produce different code. With more cases, there is greater justification for the compiler to abandon small and simple if-then-else patterns in favor of faster but fatter alternatives.
I don't know offhand whether the C# or JIT compilers perform this particular optimization, but a common compiler trick for switch statements, when the case selectors are many and mostly sequential, is to compute a jump vector. This requires more memory (in the form of compiler-generated jump tables embedded in the code stream) but executes in constant time: subtract "a" from arg, use the result as an index into the jump table, and jump to the appropriate case block. Boom, yer done, regardless of whether there are 20 or 2000 cases.
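A hand-rolled version of that trick for the original example might look like this (assuming the keys really are single, consecutive characters; DoJumpTable and funcs are made-up names for illustration):
// Hand-rolled "jump vector": subtract the lowest key, use the result
// as an array index. Constant time, but the array must span the whole
// range of keys, so it only pays off when the keys are dense.
private static readonly Func<int>[] funcs =
{
    () => 0, // 'a'
    () => 1, // 'b'
    () => 2, // 'c'
    () => 3, // 'd'
};

private static int DoJumpTable(char arg)
{
    int index = arg - 'a';                  // offset from the first key
    if (index < 0 || index >= funcs.Length) // range check stands in for "default"
        return -1;
    return funcs[index]();
}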
A compiler is more likely to shift into jump table mode when the switch selector type is char or int or enum and the values of the case selectors are mostly sequential ("dense"), since these types can be easily subtracted to create an offset or index. String selectors are a little harder.
String selectors are "interned" by the C# compiler, meaning the compiler adds the string selector values to an internal pool of unique strings. The address or token of an interned string can be used as its identity, allowing int-like optimizations when comparing interned strings for identity / byte-wise equality. With enough case selectors, the C# compiler will produce IL code that looks up the interned equivalent of the arg string (a hash table lookup), then compares (or jump-tables) the interned token with the precomputed case selector tokens.
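As a small illustration of interning itself (not of the compiler's actual switch codegen), identical string literals share a single pooled instance, so their identity can be compared as cheaply as an integer:
// String literals are added to the intern pool, so two occurrences of
// the same literal refer to the same object and compare by reference.
string a = "case-label";
string b = "case-label";
Console.WriteLine(object.ReferenceEquals(a, b)); // True: same pooled instance

// A string built at run time is a distinct object until it is interned.
string c = new string(new[] { 'c', 'a', 's', 'e', '-', 'l', 'a', 'b', 'e', 'l' });
Console.WriteLine(object.ReferenceEquals(a, c));                // False
Console.WriteLine(object.ReferenceEquals(a, string.Intern(c))); // True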
If you can coax the compiler into producing jump-table code in the char/int/enum selector case, this can execute faster than using your own hash table.
For the string selector case, the IL code still has to do a hash lookup so any performance difference from using your own hash table is probably a wash.
In general, though, you shouldn't dwell on these compiler nuances too much when writing application code. Switch statements are generally much easier to read and understand than a hash table of function pointers. Switch statements that are big enough to push the compiler into jump table mode are often too big to be humanly readable.
If you find a switch statement is in a performance hotspot of your code, and you have measured with a profiler that it has tangible performance impact, then changing your code to use your own dictionary is a reasonable tradeoff for a performance gain.
Writing your code to use a hash table from the start, with no performance measurements to justify this choice, is over-engineering that will lead to unfathomable code with an unnecessarily higher maintenance cost. Keep It Simple.