Hash Set and Array List performances

My experiment shows that HashSet is faster than an ArrayList starting at collections of 3 elements inclusively.

A complete results table

| Boost  |  Collection Size  |
|  2x    |       3 elements  |
|  3x    |      10 elements  |
|  6x    |      50 elements  |
|  12x   |     200 elements  |  <= proportion 532-12 vs 10.000-200 elements
|  532x  |  10.000 elements  |  <= shows linear lookup growth for the ArrayList

They're completely different classes, so the question is: what kind of behaviour do you want?

HashSet ensures there are no duplicates, gives you an O(1) contains() method but doesn't preserve order.
ArrayList doesn't ensure there are no duplicates, contains() is O(n) but you can control the order of the entries.

I believe using the hash set has a better performance than an array list. Am I correct in stating that?

With many (whatever it means) entries, yes. With small data sizes, raw linear search could be faster than hashing, though. Where exactly the break-even is, you have to just measure. My gut feeling is that with fewer than 10 elements, linear look-up is probably faster; with more than 100 elements hashing is probably faster, but that's just my feeling...

Lookup from a HashSet is constant time, O(1), provided that the hashCode implementation of the elements is sane. Linear look-up from a list is linear time, O(n).

Hash Set and Array List performances

Related

Recent Posts