Quickest way to find missing number in an array of numbers
You can do this in O(n). Iterate through the array and compute the sum of all its numbers. The sum of the natural numbers from 1 to N is N(N+1)/2. In your case N=100.
Subtract the sum of the array from N(N+1)/2, where N=100.
The result is the missing number. The empty slot can be detected during the same iteration in which the sum is computed.
// will be the sum of the numbers in the array.
int sum = 0;
int idx = -1;
for (int i = 0; i < arr.length; i++)
{
    if (arr[i] == 0)
    {
        idx = i;
    }
    else
    {
        sum += arr[i];
    }
}
// the total sum of numbers between 1 and arr.length.
int total = (arr.length + 1) * arr.length / 2;
System.out.println("missing number is: " + (total - sum) + " at index " + idx);
We can use the XOR operation instead, which is safer than summation: if the input is large, a sum may overflow the integer type and give a wrong answer.
Before going to the solution, recall that A xor A = 0. So if we XOR two identical numbers, the value is 0.
Now, XORing [1..n] with the elements present in the array cancels every number that appears in both. What remains at the end is the missing number.
// Assuming that the array has 100 slots holding 99 distinct integers from 1..100
// and that the empty slot value is zero
int XOR = 0;
for (int i = 0; i < 100; i++) {
    if (ARRAY[i] != 0) // remove this condition, keeping the body, if there is no zero slot
        XOR ^= ARRAY[i];
    XOR ^= (i + 1);
}
return XOR;
// return XOR ^ (ARRAY.length + 1); if your array has no empty zero slot (loop to ARRAY.length then).
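To see the cancellation on a small input, here is a minimal sketch. The class name `XorMissingDemo` and the method `findMissing` are made up for illustration, and it assumes the array has N slots holding N-1 distinct values from 1..N, with the empty slot set to 0.

```java
// Hypothetical helper for illustration; not part of the original answer.
class XorMissingDemo {
    // Returns the value from 1..a.length that does not occur in a.
    // Assumes every other value occurs exactly once and the empty slot holds 0.
    static int findMissing(int[] a) {
        int x = 0;
        for (int i = 0; i < a.length; i++) {
            x ^= (i + 1); // fold in the expected value for this position
            x ^= a[i];    // fold in the stored value (XOR with 0 changes nothing)
        }
        return x; // every value present in both sets has cancelled out
    }

    public static void main(String[] args) {
        int[] a = {3, 0, 1, 5, 2}; // 1..5 with 4 missing
        System.out.println(findMissing(a)); // prints 4
    }
}
```

Here XOR(1..5) is 1 and the XOR of the stored values is 5, so the result is 1 ^ 5 = 4.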
Let the given array be A with length N. Let's assume that the single empty slot in the given array is filled with 0.
We can solve this problem with many methods, including the algorithm used in counting sort. But in terms of efficient time and space usage, we have two algorithms. One mainly uses summation, subtraction and multiplication. The other uses XOR. Mathematically, both methods work fine. But programmatically, we need to assess each algorithm against measures like
- Limitations (for example, the input values A[1...N] are large and/or the number of input values N is large)
- Number of condition checks involved
- Number and type of mathematical operations involved
etc. This is because of limitations in time, hardware (resource limits) and software (operating system limits, programming language limits, etc.). Let's list and assess the pros and cons of each one of them.
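For comparison, the counting sort style approach mentioned above could look roughly like this. It is only a sketch: the names `SeenArrayDemo` and `findMissing` are hypothetical, and it assumes the values are 1..N with the empty slot stored as 0. It needs O(N) extra space, which is the cost the two algorithms discussed here avoid.

```java
// Hypothetical helper for illustration; not part of the original answer.
class SeenArrayDemo {
    // Marks each value that occurs, then returns the first unmarked value in 1..a.length.
    static int findMissing(int[] a) {
        boolean[] seen = new boolean[a.length + 1]; // index 0 is unused
        for (int v : a) {
            if (v != 0) {       // skip the empty slot
                seen[v] = true;
            }
        }
        for (int k = 1; k <= a.length; k++) {
            if (!seen[k]) {
                return k;
            }
        }
        return -1; // nothing missing: the input did not match the assumptions
    }
}
```

Calling `SeenArrayDemo.findMissing(new int[]{3, 0, 1, 5, 2})` returns 4. There is no arithmetic on the values here, so overflow is not a concern, but the extra array grows with N.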
Algorithm 1 :
In algorithm 1, we have 3 implementations.
- Calculate the total sum of all the numbers (this includes the unknown missing number) using the formula 1+2+3+...+N = (N(N+1))/2. Here, N=100. Calculate the total sum of all the given numbers. Subtracting the second result from the first gives the missing number.
  Missing Number = (N(N+1))/2 - (A[1]+A[2]+...+A[100])
- Calculate the total sum of all the numbers (this includes the unknown missing number) using the formula 1+2+3+...+N = (N(N+1))/2. Here, N=100. From that result, subtract each given number in turn. What remains is the missing number.
  Missing Number = (N(N+1))/2 - A[1] - A[2] - ... - A[100]
  (Note: The second implementation's formula is derived from the first, and mathematically both are the same. But from a programming point of view they differ, because the first formula is more prone to bit overflow than the second (if the given numbers are large enough). Setting aside any speed difference between addition and subtraction, the second implementation reduces the chance of bit overflow caused by adding large values. (It's not completely eliminated, because there is still a very small chance since (N+1) is in the formula.) Both are equally prone to bit overflow by multiplication. The limitation is that both implementations give the correct result only if N(N+1) <= MAXIMUM_NUMBER_VALUE. For the first implementation, the additional limitation is that it gives the correct result only if Sum of all given numbers <= MAXIMUM_NUMBER_VALUE.)
- Calculate the total sum of all the numbers (this includes the unknown missing number) and subtract each given number in the same loop. This eliminates the risk of bit overflow by multiplication, but it is still prone to bit overflow by addition and subtraction.
//ALGORITHM
missingNumber = 0;
foreach(index from 1 to N)
{
    missingNumber = missingNumber + index;
    //Since the empty slot is filled with 0,
    //this extra condition, which is executed N times, is not required.
    //But for the sake of understanding the algorithm, let's put it.
    if (inputArray[index] != 0)
        missingNumber = missingNumber - inputArray[index];
}
In a programming language (like C, C++, Java, etc.), if the number of bits representing an integer data type is limited, then all the above implementations are prone to bit overflow because of summation, subtraction and multiplication, giving a wrong result for large input values (A[1...N]) and/or a large number of input values (N).
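As a rough illustration of the multiplication limit, the sketch below evaluates N(N+1)/2 in 32-bit and in 64-bit arithmetic. The class and method names are hypothetical. `Math.multiplyExact` is the standard Java method that throws `ArithmeticException` on overflow instead of silently wrapping.

```java
// Hypothetical helper for illustration; not part of the original answer.
class SumOverflowDemo {
    // N(N+1)/2 in 32-bit int arithmetic.
    // Throws ArithmeticException if the multiplication overflows.
    static int triangularInt(int n) {
        return Math.multiplyExact(n, n + 1) / 2;
    }

    // The same formula in 64-bit long arithmetic.
    static long triangularLong(long n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(triangularInt(100));       // prints 5050
        System.out.println(triangularLong(100_000L)); // prints 5000050000
        try {
            triangularInt(100_000); // 100000 * 100001 exceeds Integer.MAX_VALUE
        } catch (ArithmeticException e) {
            System.out.println("int overflow for N = 100000");
        }
    }
}
```

For N=100 the `int` version is fine. A wider type only moves the limit further out, it does not remove it.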
Algorithm 2 :
We can use a property of XOR to solve this problem without worrying about bit overflow. The XOR of two equal numbers is 0 (A XOR A = 0). If we calculate the XOR of all the numbers from 1 to N (this includes the unknown missing number) and then XOR that result with all the given numbers, the common numbers cancel out (since A XOR A = 0) and in the end we get the missing number. If bit overflow is not a concern, both the summation and the XOR based algorithms give the solution. But the XOR based algorithm is safer and at least as fast as the one which uses summation, subtraction and multiplication, and it avoids the additional worries those operations cause.
In all the implementations of algorithm 1, we can use XOR instead of addition and subtraction.
Let XOR(1...N) denote the XOR of all numbers from 1 to N.
Implementation 1 => Missing Number = XOR(1...N) XOR (A[1] XOR A[2] XOR...XOR A[100])
Implementation 2 => Missing Number = XOR(1...N) XOR A[1] XOR A[2] XOR...XOR A[100]
Implementation 3 =>
//ALGORITHM
missingNumber = 0;
foreach(index from 1 to N)
{
    missingNumber = missingNumber XOR index;
    //Since the empty slot is filled with 0,
    //this extra condition, which is executed N times, is not required.
    //But for the sake of understanding the algorithm, let's put it.
    if (inputArray[index] != 0)
        missingNumber = missingNumber XOR inputArray[index];
}
All three implementations of algorithm 2 work fine (from a programming point of view too). One optimization is that, similar to
1+2+....+N = (N(N+1))/2
we have
1 XOR 2 XOR .... XOR N = {N if REMAINDER(N/4)=0, 1 if REMAINDER(N/4)=1, N+1 if REMAINDER(N/4)=2, 0 if REMAINDER(N/4)=3}
We can prove this by mathematical induction. So, instead of calculating the value of XOR(1...N) by XORing all the numbers from 1 to N, we can use this formula to reduce the number of XOR operations.
Also, the formula above can be implemented in two ways. Implementation wise, calculating
// Thanks to https://a3nm.net/blog/xor.html for this implementation
xor = ((n>>1)&1) ^ (((n&1)>0)?1:n);
is faster than calculating
xor = (n % 4 == 0) ? n : (n % 4 == 1) ? 1 : (n % 4 == 2) ? n + 1 : 0;
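A quick way to gain confidence in both expressions is to compare them against a plain loop. The following is only a sketch with made-up names (`XorPrefixDemo`, `bruteForce`, `byRemainder`, `byBits`). It checks agreement for small n and says nothing about relative speed.

```java
// Hypothetical helper for illustration; not part of the original answer.
class XorPrefixDemo {
    // XOR of 1..n computed the slow way, one value at a time.
    static long bruteForce(long n) {
        long x = 0;
        for (long k = 1; k <= n; k++) {
            x ^= k;
        }
        return x;
    }

    // Closed form using the remainder of n divided by 4.
    static long byRemainder(long n) {
        return (n % 4 == 0) ? n : (n % 4 == 1) ? 1 : (n % 4 == 2) ? n + 1 : 0;
    }

    // Closed form using bit operations only.
    static long byBits(long n) {
        return ((n >> 1) & 1) ^ (((n & 1) > 0) ? 1 : n);
    }

    public static void main(String[] args) {
        for (long n = 0; n <= 1000; n++) {
            long expected = bruteForce(n);
            if (expected != byRemainder(n) || expected != byBits(n)) {
                throw new IllegalStateException("mismatch at n = " + n);
            }
        }
        System.out.println("all three agree for n = 0..1000");
    }
}
```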
So, the optimized Java code is,
int n = 100; // int, because Java array sizes and indexes must be int
long a[] = new long[n]; // fill with the given numbers; the empty slot stays 0
//XOR of all numbers from 1 to n
// n%4 == 0 ---> n
// n%4 == 1 ---> 1
// n%4 == 2 ---> n + 1
// n%4 == 3 ---> 0
//Slower way of implementing the formula
// long xor = (n % 4 == 0) ? n : (n % 4 == 1) ? 1 : (n % 4 == 2) ? n + 1 : 0;
//Faster way of implementing the formula
long xor = ((n >> 1) & 1) ^ (((n & 1) > 0) ? 1 : n);
for (int i = 0; i < n; i++)
{
    xor = xor ^ a[i];
}
//Missing number
System.out.println(xor);