Why can't I preallocate a hashset<T>

Solution 1:

Answer below was written in 2011. It's now in .NET 4.7.2 and .NET Core 2.0; it will be in .NET Standard 2.1.


There's no technical reason why this shouldn't be possible - Microsoft just hasn't chosen to expose a constructor with an initial capacity.

If you can call a constructor which takes an IEnumerable<T> and use an implementation of ICollection<T>, I believe that will use the size of the collection as the initial minimum capacity. This is an implementation detail, mind you. The capacity only has to be large enough to store all the distinct elements...

EDIT: I believe that if the capacity turns out to be way larger than it needs to be, the constructor will trim the excess when it's finished finding out how many distinct elements there really are.

Anyway, if you have the collection you're going to add to the HashSet<T> and it implements ICollection<T>, then passing it to the constructor instead of adding the elements one by one is going to be a win, basically :)

EDIT: One workaround would be to use a Dictionary<TKey, TValue> instead of a HashSet<T>, and just not use the values. That won't work in all cases though, as it won't give you the same interface as HashSet<T>.

Solution 2:

The answer by Jon Skeet is almost a complete one. To solve this problem with HashSet<int> I had to do the following:

public class ClassUsingHashSet
{
    private static readonly List<int> PreallocationList
        = Enumerable.Range(0, 10000).ToList();

    public ClassUsingHashSet()
    {
        this.hashSet = new HashSet<int>(PreallocationList);
        this.hashSet.Clear();
    }

    public void Add(int item)
    {
        this.hashSet.Add(item);
    }

    private HashSet<int> hashSet;
}

This trick works because after Clear the HashSet is not trimmed, as described in the documentation:

The capacity remains unchanged until a call to TrimExcess is made.

Solution 3:

I'm using this code to set initial capacity for HashSet. You can use it as extension or directly

public static class HashSetExtensions
{
    private const BindingFlags Flags = BindingFlags.Instance | BindingFlags.NonPublic;
    public static HashSet<T> SetCapacity<T>(this HashSet<T> hs, int capacity)
    {
        var initialize = hs.GetType().GetMethod("Initialize", Flags);
        initialize.Invoke(hs, new object[] { capacity });
        return hs;
    }

    public static HashSet<T> GetHashSet<T>(int capacity)
    {
        return new HashSet<T>().SetCapacity(capacity);
    }
}

upd. 04 jule

This code may be also enhanced by using reflection caching. Here we go:

public static class HashSetExtensions
{
    private static class HashSetDelegateHolder<T>
    {
        private const BindingFlags Flags = BindingFlags.Instance | BindingFlags.NonPublic;
        public static MethodInfo InitializeMethod { get; } = typeof(HashSet<T>).GetMethod("Initialize", Flags);
    }

    public static void SetCapacity<T>(this HashSet<T> hs, int capacity)
    {
        HashSetDelegateHolder<T>.InitializeMethod.Invoke(hs, new object[] { capacity });
    }

    public static HashSet<T> GetHashSet<T>(int capacity)
    {
        var hashSet = new HashSet<T>();
        hashSet.SetCapacity(capacity);
        return hashSet;
    }
}