Tuples( or arrays ) as Dictionary keys in C#
I am trying to make a Dictionary lookup table in C#. I need to resolve a 3-tuple of values to one string. I tried using arrays as keys, but that did not work, and I don't know what else to do. At this point I am considering making a Dictionary of Dictionaries of Dictionaries, but that would probably not be very pretty to look at, though it is how I would do it in javascript.
If you are on .NET 4.0 use a Tuple:
lookup = new Dictionary<Tuple<TypeA, TypeB, TypeC>, string>();
If not you can define a Tuple
and use that as the key. The Tuple needs to override GetHashCode
, Equals
and IEquatable
:
struct Tuple<T, U, W> : IEquatable<Tuple<T,U,W>>
{
readonly T first;
readonly U second;
readonly W third;
public Tuple(T first, U second, W third)
{
this.first = first;
this.second = second;
this.third = third;
}
public T First { get { return first; } }
public U Second { get { return second; } }
public W Third { get { return third; } }
public override int GetHashCode()
{
return first.GetHashCode() ^ second.GetHashCode() ^ third.GetHashCode();
}
public override bool Equals(object obj)
{
if (obj == null || GetType() != obj.GetType())
{
return false;
}
return Equals((Tuple<T, U, W>)obj);
}
public bool Equals(Tuple<T, U, W> other)
{
return other.first.Equals(first) && other.second.Equals(second) && other.third.Equals(third);
}
}
If you're on C# 7, you should consider using value tuples as your composite key. Value tuples typically offer better performance than the traditional reference tuples (Tuple<T1, …>
) since value tuples are value types (structs), not reference types, so they avoid the memory allocation and garbage collection costs. Also, they offer conciser and more intuitive syntax, allowing for their fields to be named if you so wish. They also implement the IEquatable<T>
interface needed for the dictionary.
var dict = new Dictionary<(int PersonId, int LocationId, int SubjectId), string>();
dict.Add((3, 6, 9), "ABC");
dict.Add((PersonId: 4, LocationId: 9, SubjectId: 10), "XYZ");
var personIds = dict.Keys.Select(k => k.PersonId).Distinct().ToList();
Between tuple and nested dictionaries based approaches, it's almost always better to go for tuple based.
From maintainability point of view,
-
its much easier to implement a functionality that looks like:
var myDict = new Dictionary<Tuple<TypeA, TypeB, TypeC>, string>();
than
var myDict = new Dictionary<TypeA, Dictionary<TypeB, Dictionary<TypeC, string>>>();
from the callee side. In the second case each addition, lookup, removal etc require action on more than one dictionary.
Furthermore, if your composite key require one more (or less) field in future, you will need to change code a significant lot in the second case (nested dictionary) since you have to add further nested dictionaries and subsequent checks.
From performance perspective, the best conclusion you can reach is by measuring it yourself. But there are a few theoretical limitations which you can consider beforehand:
In the nested dictionary case, having an additional dictionary for every keys (outer and inner) will have some memory overhead (more than what creating a tuple would have).
In the nested dictionary case, every basic action like addition, updation, lookup, removal etc need to be carried out in two dictionaries. Now there is a case where nested dictionary approach can be faster, i.e., when the data being looked up is absent, since the intermediate dictionaries can bypass the full hash code computation & comparison, but then again it should be timed to be sure. In presence of data, it should be slower since lookups should be performed twice (or thrice depending on nesting).
Regarding tuple approach, .NET tuples are not the most performant when they're meant to be used as keys in sets since its
Equals
andGetHashCode
implementation causes boxing for value types.
I would go with tuple based dictionary, but if I want more performance, I would use my own tuple with better implementation.
On a side note, few cosmetics can make the dictionary cool:
-
Indexer style calls can be a lot cleaner and intuitive. For eg,
string foo = dict[a, b, c]; //lookup dict[a, b, c] = ""; //update/insertion
So expose necessary indexers in your dictionary class which internally handles the insertions and lookups.
-
Also, implement a suitable
IEnumerable
interface and provide anAdd(TypeA, TypeB, TypeC, string)
method which would give you collection initializer syntax, like:new MultiKeyDictionary<TypeA, TypeB, TypeC, string> { { a, b, c, null }, ... };
The good, clean, fast, easy and readable ways is:
- generate equality members (Equals() and GetHashCode()) method for the current type. Tools like ReSharper not only creates the methods, but also generates the necessary code for an equality check and/or for calculating hash code. The generated code will be more optimal than Tuple realization.
- just make a simple key class derived from a tuple.
add something similar like this:
public sealed class myKey : Tuple<TypeA, TypeB, TypeC>
{
public myKey(TypeA dataA, TypeB dataB, TypeC dataC) : base (dataA, dataB, dataC) { }
public TypeA DataA => Item1;
public TypeB DataB => Item2;
public TypeC DataC => Item3;
}
So you can use it with dictionary:
var myDictinaryData = new Dictionary<myKey, string>()
{
{new myKey(1, 2, 3), "data123"},
{new myKey(4, 5, 6), "data456"},
{new myKey(7, 8, 9), "data789"}
};
- You also can use it in contracts
- as a key for join or groupings in linq
- going this way you never ever mistype order of Item1, Item2, Item3 ...
- you no need to remember or look into to code to understand where to go to get something
- no need to override IStructuralEquatable, IStructuralComparable, IComparable, ITuple they all alredy here
If for some reason you really want to avoid creating your own Tuple class, or using on built into .NET 4.0, there is one other approach possible; you can combine the three key values together into a single value.
For example, if the three values are integer types together not taking more than 64 bits, you could combine them into a ulong
.
Worst-case you can always use a string, as long as you make sure the three components in it are delimited with some character or sequence that does not occur inside the components of the key, for example, with three numbers you could try:
string.Format("{0}#{1}#{2}", key1, key2, key3)
There is obviously some composition overhead in this approach, but depending on what you are using it for this may be trivial enough not to care about it.