How to find a duplicate element in an array of shuffled consecutive integers?

I recently came across a question somewhere:

Suppose you have an array of 1001 integers. The integers are in random order, but you know each of the integers is between 1 and 1000 (inclusive). In addition, each number appears only once in the array, except for one number, which occurs twice. Assume that you can access each element of the array only once. Describe an algorithm to find the repeated number. If you used auxiliary storage in your algorithm, can you find an algorithm that does not require it?

What I am interested in to know is the second part, i.e., without using auxiliary storage. Do you have any idea?


Solution 1:

Just add them all up, and subtract the total you would expect if only 1001 numbers were used from that.

Eg:

Input: 1,2,3,2,4 => 12
Expected: 1,2,3,4 => 10

Input - Expected => 2

Solution 2:

Update 2: Some people think that using XOR to find the duplicate number is a hack or trick. To which my official response is: "I am not looking for a duplicate number, I am looking for a duplicate pattern in an array of bit sets. And XOR is definitely suited better than ADD to manipulate bit sets". :-)

Update: Just for fun before I go to bed, here's "one-line" alternative solution that requires zero additional storage (not even a loop counter), touches each array element only once, is non-destructive and does not scale at all :-)

printf("Answer : %d\n",
           array[0] ^
           array[1] ^
           array[2] ^
           // continue typing...
           array[999] ^
           array[1000] ^
           1 ^
           2 ^
           // continue typing...
           999^
           1000
      );

Note that the compiler will actually calculate the second half of that expression at compile time, so the "algorithm" will execute in exactly 1002 operations.

And if the array element values are know at compile time as well, the compiler will optimize the whole statement to a constant. :-)

Original solution: Which does not meet the strict requirements of the questions, even though it works to find the correct answer. It uses one additional integer to keep the loop counter, and it accesses each array element three times - twice to read it and write it at the current iteration and once to read it for the next iteration.

Well, you need at least one additional variable (or a CPU register) to store the index of the current element as you go through the array.

Aside from that one though, here's a destructive algorithm that can safely scale for any N up to MAX_INT.

for (int i = 1; i < 1001; i++)
{
   array[i] = array[i] ^ array[i-1] ^ i;
}

printf("Answer : %d\n", array[1000]);

I will leave the exercise of figuring out why this works to you, with a simple hint :-):

a ^ a = 0
0 ^ a = a