How to find the kth smallest element in the union of two sorted arrays?

This is a homework question, binary search has already been introduced:

Given two arrays, respectively N and M elements in ascending order, not necessarily unique:
What is a time efficient algorithm to find the kth smallest element in the union of both arrays?

They say it takes O(logN + logM) where N and M are the arrays lengths.

Let's name the arrays a and b. Obviously we can ignore all a[i] and b[i] where i > k.
First let's compare a[k/2] and b[k/2]. Let b[k/2] > a[k/2]. Therefore we can discard also all b[i], where i > k/2.

Now we have all a[i], where i < k and all b[i], where i < k/2 to find the answer.

What is the next step?


Solution 1:

I hope I am not answering your homework, as it has been over a year since this question was asked. Here is a tail recursive solution that will take log(len(a)+len(b)) time.

Assumption: The inputs are correct, i.e., k is in the range [0, len(a)+len(b)].

Base cases:

  • If length of one of the arrays is 0, the answer is kth element of the second array.

Reduction steps:

  • If mid index of a + mid index of b is less than k:
    • If mid element of a is greater than mid element of b, we can ignore the first half of b, adjust k.
    • Otherwise, ignore the first half of a, adjust k.
  • If k is less than sum of mid indices of a and b:
    • If mid element of a is greater than mid element of b, we can safely ignore second half of a.
    • Otherwise, we can ignore second half of b.

Code:

def kthlargest(arr1, arr2, k):
    if len(arr1) == 0:
        return arr2[k]
    elif len(arr2) == 0:
        return arr1[k]

    mida1 = len(arr1) // 2  # integer division
    mida2 = len(arr2) // 2
    if mida1 + mida2 < k:
        if arr1[mida1] > arr2[mida2]:
            return kthlargest(arr1, arr2[mida2+1:], k - mida2 - 1)
        else:
            return kthlargest(arr1[mida1+1:], arr2, k - mida1 - 1)
    else:
        if arr1[mida1] > arr2[mida2]:
            return kthlargest(arr1[:mida1], arr2, k)
        else:
            return kthlargest(arr1, arr2[:mida2], k)

Please note that my solution is creating new copies of smaller arrays in every call, this can be easily eliminated by only passing start and end indices on the original arrays.

Solution 2:

You've got it, just keep going! And be careful with the indexes...

To simplify a bit I'll assume that N and M are > k, so the complexity here is O(log k), which is O(log N + log M).

Pseudo-code:

i = k/2
j = k - i
step = k/4
while step > 0
    if a[i-1] > b[j-1]
        i -= step
        j += step
    else
        i += step
        j -= step
    step /= 2

if a[i-1] > b[j-1]
    return a[i-1]
else
    return b[j-1]

For the demonstration you can use the loop invariant i + j = k, but I won't do all your homework :)