Define: What is a HashSet?
A
HashSet
holds a set of objects, but in a way that it allows you to easily and quickly determine whether an object is already in the set or not. It does so by internally managing an array and storing the object using an index which is calculated from the hashcode of the object. Take a look hereHashSet
is an unordered collection containing unique elements. It has the standard collection operations Add, Remove, Contains, but since it uses a hash-based implementation, these operations are O(1). (As opposed to List for example, which is O(n) for Contains and Remove.)HashSet
also provides standard set operations such as union, intersection, and symmetric difference. Take a look here
There are different implementations of Sets. Some make insertion and lookup operations super fast by hashing elements. However, that means that the order in which the elements were added is lost. Other implementations preserve the added order at the cost of slower running times.
The HashSet
class in C# goes for the first approach, thus not preserving the order of elements. It is much faster than a regular List
. Some basic benchmarks showed that HashSet is decently faster when dealing with primary types (int, double, bool, etc.). It is a lot faster when working with class objects. So that point is that HashSet is fast.
The only catch of HashSet
is that there is no access by indices. To access elements you can either use an enumerator or use the built-in function to convert the HashSet
into a List
and iterate through that. Take a look here
A HashSet
has an internal structure (hash), where items can be searched and identified quickly. The downside is that iterating through a HashSet
(or getting an item by index) is rather slow.
So why would someone want be able to know if an entry already exists in a set?
One situation where a HashSet
is useful is in getting distinct values from a list where duplicates may exist. Once an item is added to the HashSet
it is quick to determine if the item exists (Contains
operator).
Other advantages of the HashSet
are the Set operations: IntersectWith
, IsSubsetOf
, IsSupersetOf
, Overlaps
, SymmetricExceptWith
, UnionWith
.
If you are familiar with the object constraint language then you will identify these set operations. You will also see that it is one step closer to an implementation of executable UML.