What's the difference between HashSet and Set?
A Set represents a generic "set of values". A TreeSet is a set whose elements are kept sorted (and thus ordered), whereas a HashSet is a set whose elements are neither sorted nor ordered.
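To make the ordering difference concrete, here is a small sketch (the class name OrderingDemo and its helper methods are made up for illustration). It inserts the same words into a TreeSet and a HashSet and prints each set's iteration order:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class OrderingDemo {
    // Copies the set's iteration order into a list so it can be printed or compared.
    static List<String> iterationOrder(Set<String> set) {
        return new ArrayList<>(set);
    }

    // Inserts the same words, deliberately not in alphabetical order.
    static Set<String> fill(Set<String> set) {
        for (String w : new String[] {"pear", "apple", "fig", "banana"}) {
            set.add(w);
        }
        return set;
    }

    public static void main(String[] args) {
        // TreeSet always iterates in sorted (natural) order.
        System.out.println(iterationOrder(fill(new TreeSet<>()))); // [apple, banana, fig, pear]
        // HashSet iterates in an order derived from hash codes; do not rely on it.
        System.out.println(iterationOrder(fill(new HashSet<>())));
    }
}
```

The HashSet line's order depends on the strings' hash codes and the table size, so it is best treated as unspecified.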
A HashSet is typically a lot faster than a TreeSet for add, remove and contains.
A TreeSet is typically implemented as a red-black tree (see http://en.wikipedia.org/wiki/Red-black_tree; I've not validated the actual implementation of Sun/Oracle's TreeSet), whereas a HashSet uses Object.hashCode() to compute an index into an array. Lookup time for a red-black tree is O(log(n)), whereas lookup time for a HashSet ranges from constant time in the usual case to linear time, O(n), in the worst case (every item has the same hashCode). Since Java 8, the HashMap that backs a HashSet converts heavily colliding buckets into balanced trees, which softens that worst case when the elements implement Comparable.
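A rough sketch of the hash-to-index step and of the worst case. bucketIndex imitates the bit spreading and masking that OpenJDK's HashMap uses in Java 8 and later, but it is a simplification for illustration, not the real code path, and it assumes the capacity is a power of two. BadKey is a hypothetical class whose hashCode() is constant, so every instance collides:

```java
import java.util.HashSet;
import java.util.Set;

public class BucketDemo {
    // Simplified bucket selection in the style of OpenJDK's HashMap:
    // mix the high 16 bits into the low bits, then mask with (capacity - 1).
    // Assumes capacity is a power of two.
    static int bucketIndex(Object key, int capacity) {
        int h = key.hashCode();
        int spread = h ^ (h >>> 16);
        return spread & (capacity - 1);
    }

    // Deliberately bad key: every instance has the same hash code,
    // so every instance lands in the same bucket.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    // Colliding keys are still stored and found correctly, because equals()
    // tells them apart; only the lookup cost degrades.
    static boolean badKeysStillCorrect(int n) {
        Set<BadKey> set = new HashSet<>();
        for (int i = 0; i < n; i++) set.add(new BadKey(i));
        return set.size() == n
                && set.contains(new BadKey(n - 1))
                && !set.contains(new BadKey(n));
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex("apple", 16));
        System.out.println(bucketIndex(new BadKey(1), 16)); // 10, same for every BadKey
        System.out.println(badKeysStillCorrect(1000));      // true
    }
}
```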
HashSet is an implementation of Set.
Set is a collection that contains no duplicate elements. Set is an interface.
HashSet implements the Set interface and is backed by a hash table (actually a HashMap instance).
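To illustrate the "backed by a HashMap" part, here is a stripped-down, hypothetical MiniHashSet that stores its elements as map keys, all mapped to one shared dummy value. OpenJDK's HashSet follows the same idea, but this sketch omits most of the Set contract (it does not implement Set and has no equals or hashCode):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical minimal set that keeps its elements as the keys of a HashMap.
public class MiniHashSet<E> implements Iterable<E> {
    // Every key maps to this same placeholder; only the keys matter.
    private static final Object PRESENT = new Object();
    private final Map<E, Object> map = new HashMap<>();

    // put() returns the previous value, so null means the element was new.
    public boolean add(E e) { return map.put(e, PRESENT) == null; }

    public boolean contains(Object o) { return map.containsKey(o); }

    // remove() returns the old value, which is PRESENT only if the element existed.
    public boolean remove(Object o) { return map.remove(o) == PRESENT; }

    public int size() { return map.size(); }

    @Override
    public Iterator<E> iterator() { return map.keySet().iterator(); }

    public static void main(String[] args) {
        MiniHashSet<String> set = new MiniHashSet<>();
        System.out.println(set.add("x"));      // true
        System.out.println(set.add("x"));      // false: duplicates are rejected
        System.out.println(set.contains("x")); // true
        System.out.println(set.size());        // 1
    }
}
```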
HashSet is one of the specific implementations of the Set interface.
A Set can be any of the following, since the interface is implemented by these classes (among others):
ConcurrentSkipListSet: A scalable concurrent NavigableSet implementation based on a ConcurrentSkipListMap. The elements of the set are kept sorted according to their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
CopyOnWriteArraySet: A Set that uses an internal CopyOnWriteArrayList for all of its operations.
EnumSet: A specialized Set implementation for use with enum types. All of the elements in an enum set must come from a single enum type that is specified, explicitly or implicitly, when the set is created.
TreeSet: A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
LinkedHashSet: Hash table and linked list implementation of the Set interface, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries.
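A quick sketch exercising several of the classes above through the Set interface (the enum Day and the helper order are invented for the example). Each set receives the values 3, 1, 2, 1, so the printed lists show both duplicate removal and each class's iteration order:

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CopyOnWriteArraySet;

public class SetImplementations {
    enum Day { MON, TUE, WED, THU, FRI }

    // Adds values out of order and with a duplicate, then returns the iteration order.
    static List<Integer> order(Set<Integer> set) {
        for (int v : new int[] {3, 1, 2, 1}) set.add(v);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(order(new ConcurrentSkipListSet<>())); // [1, 2, 3]  sorted
        System.out.println(order(new TreeSet<>()));               // [1, 2, 3]  sorted
        System.out.println(order(new LinkedHashSet<>()));         // [3, 1, 2]  insertion order
        System.out.println(order(new CopyOnWriteArraySet<>()));   // [3, 1, 2]  insertion order
        // EnumSet is created through factory methods and iterates in declaration order.
        Set<Day> days = EnumSet.of(Day.FRI, Day.MON, Day.WED);
        System.out.println(days); // [MON, WED, FRI]
    }
}
```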
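By contrast, a variable of type HashSet can only hold a HashSet or a subclass of it; among the general-purpose collections that means LinkedHashSet, since LinkedHashSet extends HashSet.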
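The subtyping rule can be checked by the compiler. In this sketch (SubtypeDemo and isHashSet are hypothetical names), assigning a LinkedHashSet to a HashSet variable compiles, while the commented-out TreeSet assignment would be rejected:

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class SubtypeDemo {
    // True only for HashSet and its subclasses (such as LinkedHashSet).
    static boolean isHashSet(Set<?> set) {
        return set instanceof HashSet;
    }

    public static void main(String[] args) {
        // Compiles: LinkedHashSet extends HashSet.
        HashSet<String> h = new LinkedHashSet<>();
        // Compiles: any Set implementation can be held in a Set variable.
        Set<String> s = new TreeSet<>();
        // Would NOT compile: a TreeSet is a Set but not a HashSet.
        // HashSet<String> bad = new TreeSet<>();
        System.out.println(isHashSet(h)); // true
        System.out.println(isHashSet(s)); // false
    }
}
```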
The question has been answered, but I haven't seen anyone explain why the code mentions both types in the same statement.
Typically, you want to code against interfaces, which in this case means Set. Why? Because if you always reference your object through the interface (everywhere except the new HashSet() call itself), it is trivial to change the implementation later if you find a better one: the concrete class is mentioned only once in your code base, where you wrote new HashSet().
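A minimal sketch of that idea, assuming a made-up hasAll helper. Both the method parameter and the local variable are typed as Set, so new HashSet<>() is the only place the concrete class appears:

```java
import java.util.HashSet;
import java.util.Set;

public class CodeToInterface {
    // Accepts any Set implementation; nothing here depends on HashSet specifically.
    static boolean hasAll(Set<String> set, String... items) {
        for (String item : items) {
            if (!set.contains(item)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // The only mention of the concrete class. Switching to new TreeSet<>()
        // (plus its import) would need no other changes below.
        Set<String> tags = new HashSet<>();
        tags.add("java");
        tags.add("collections");
        tags.add("java"); // duplicate, ignored
        System.out.println(tags.size());                         // 2
        System.out.println(hasAll(tags, "java", "collections")); // true
    }
}
```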
Set is the general interface to a set-like collection, while HashSet is a specific implementation of the Set interface (which uses hash codes, hence the name).