HashSet allows duplicate item insertion - C#

This kind of seems like a noob question, but I could not find an answer for this question specifically.

I have this class:

public class Quotes{ 
    public string symbol; 
    public string extension
}

And am using this:

HashSet<Quotes> values = new HashSet<Quotes>();

However I am able to add the same Quotes object multiple times. For example, my Quotes object may have 'symbol' equal to 'A' and 'extension' equal to '=n', and this Quotes object appears multiple times in the HashSet (viewing Hashset through debug mode). I had thought that when calling

values.Add(new Quotes(symb, ext));

with the same symb and ext, 'false' would be returned and the element would not be added. I have a feeling it has something to do with comparing Quotes objects when the HashSet is adding a new object. Any help would be greatly appreciated!


Solution 1:

I'm guessing that you are creating a new Quotes with the same values. In this case they are not equal. If they should be considered equal, override the Equals and GetHashCode methods.

public class Quotes{ 
    public string symbol; 
    public string extension

    public override bool Equals(object obj)
    {
        Quotes q = obj as Quotes;
        return q != null && q.symbol == this.symbol && q.extension == this.Extension;
    }

    public override int GetHashCode()
    {
        return this.symbol.GetHashCode() ^ this.extension.GetHashCode();
    }
}

Solution 2:

I had thought that when calling values.Add(new Quotes(symb, ext)); with the same symb and ext, 'false' would be returned and the element would not be added.

This is not the case.

HashSet will use GetHashCode and Equals to determine equality of your objects. Right now, since you're not overriding these methods in Quotes, the default System.Object's reference equality will be used. Each time you add a new Quote, it's a unique object instance, so the HashSet sees it as a unique object.

If you override Object.Equals and Object.GetHashCode, it will work as you expect.

Solution 3:

HashSets first compare entries based on their hash which is calculated by GetHashCode.
The default implementation returns a hashcode based on the object itself (differs between each instance).

Only if the hashes are the same (very improbable for hashes based on instances), the Equals method is called and used to definitely compare two objects.

You have to options:

  • Change Quotes to a struct
  • Override GetHashCode and Equals in Quotes

Example:

 public override int GetHashCode()
 {
    return (this.symbol == null ? 0 : this.symbol.GetHashCode())
       ^ (this.extension == null ? 0 : this.extension.GetHashCode());
 }
 public override bool Equals(object obj)
 {
    if (Object.ReferenceEquals(this, obj))
      return true;

    Quotes other = obj as Quotes;
    if (Object.ReferenceEquals(other, null))
      return false;

    return String.Equals(obj.symbol, this.symbol)
        && String.Equals(obj.extension, this.extension);
 }

Solution 4:

Just wanted to fix something in Kendall's answer (can't comment for some strange reason).

return this.symbol.GetHashCode() ^ this.extension.GetHashCode();

Note that the xor function is an exceptionally collision-prone way of combining two hashes, especially when they are both of the same type (since every object where symbol == extension will hash into 0). Even when they are not of the same type or are unlikely to be equal to each other, this is bad practice, and getting used to it might cause problems in different appliances.

Instead, multiply one hash with a small prime number, and add the second, e.g:

return 3 * this.symbol.GetHashCode() + this.extension.GetHashCode();

Solution 5:

I know this is kinda late, but I ran into the same issue and found an unacceptable performance hit while implementing the selected answer especially when you have a lot of records.

I found it much faster to turn this into a two step process using Hashset and Tuple and finally transforming via a Select.

public class Quotes{ 
    public string symbol; 
    public string extension
}

var values = new HashSet<Tuple<string,string>>();

values.Add(new Tuple<string,string>("A","=n"));
values.Add(new Tuple<string,string>("A","=n"));

// values.Count() == 1

values.Select (v => new Quotes{ symbol = v.Item1, extension = v.Item2 });