Finding a single number in a list [duplicate]

The fastest (O(n)) and most memory efficient (O(1)) way is with the XOR operation.

In C:

int arr[] = {3, 2, 5, 2, 1, 5, 3};

int num = 0, i;

for (i=0; i < 7; i++)
    num ^= arr[i];

printf("%i\n", num);

This prints "1", which is the only one that occurs once.

This works because the first time you hit a number it marks the num variable with itself, and the second time it unmarks num with itself (more or less). The only one that remains unmarked is your non-duplicate.


By the way, you can expand on this idea to very quickly find two unique numbers among a list of duplicates.

Let's call the unique numbers a and b. First take the XOR of everything, as Kyle suggested. What we get is a^b. We know a^b != 0, since a != b. Choose any 1 bit of a^b, and use that as a mask -- in more detail: choose x as a power of 2 so that x & (a^b) is nonzero.

Now split the list into two sublists -- one sublist contains all numbers y with y&x == 0, and the rest go in the other sublist. By the way we chose x, we know that a and b are in different buckets. We also know that each pair of duplicates is still in the same bucket. So we can now apply ye olde "XOR-em-all" trick to each bucket independently, and discover what a and b are completely.

Bam.


O(N) time, O(N) memory

HT= Hash Table

HT.clear() go over the list in order for each item you see

if(HT.Contains(item)) -> HT.Remove(item)
else
ht.add(item)

at the end, the item in the HT is the item you are looking for.

Note (credit @Jared Updike): This system will find all Odd instances of items.


comment: I don't see how can people vote up solutions that give you NLogN performance. in which universe is that "better" ? I am even more shocked you marked the accepted answer s NLogN solution...

I do agree however that if memory is required to be constant, then NLogN would be (so far) the best solution.


Kyle's solution would obviously not catch situations were the data set does not follow the rules. If all numbers were in pairs the algorithm would give a result of zero, the exact same value as if zero would be the only value with single occurance.

If there were multiple single occurance values or triples, the result would be errouness as well.

Testing the data set might well end up with a more costly algorithm, either in memory or time.

Csmba's solution does show some errouness data (no or more then one single occurence value), but not other (quadrouples). Regarding his solution, depending on the implementation of HT, either memory and/or time is more then O(n).

If we cannot be sure about the correctness of the input set, sorting and counting or using a hashtable counting occurances with the integer itself being the hash key would both be feasible.