How do I get the intersection between two arrays as a new array?

I faced this problem many times during various situations. It is generic to all programming languages although I am comfortable with C or Java.

Let us consider two arrays (or collections):

char[] A = {'a', 'b', 'c', 'd'};
char[] B = {'c', 'd', 'e', 'f'};

How do I get the common elements between the two arrays as a new array? In this case, the intersection of array A and B is char[] c = {'c', 'd'}.

I want to avoid the repeated iteration of one array inside the other array which will increase the execution time by (length of A times length of B) which is too much in the case of huge arrays.

Is there any way we could do a single pass in each array to get the common elements?

Solution 1:

foreach element e in array A
    insert e into hash table H

foreach element e in array B
    if H contains e 
        print e

This algorithm is O(N) in time and O(N) in space.

To avoid the extra space, you can use the sorting based approach.

Solution 2:

The lower bound on efficiency is O(n) - you need to at least read all the elements. Then there are several apporaches:

Dumb simplest approach

Search for every element from array one in array two. Time complexity O(n^2).

Sorting approach

You need to sort only array one, then search for elements from array two using binary search. Time complexity: sorting O(nlogn), searching O(n * logn) = O(nlogn), total O(nlogn).

Hash approach

Create a hash table from array one elements. Search for elements form second table in the hash table. The time complexity depends on the hash function. You can achieve O(1) for searches in the optimal case (all elements will have different hash value), but O(n) in the worst case (all elements will have the same hash value). Total time complexity: O(n^x), where x is a factor of hash function efficiency (between 1 and 2).

Some hash functions are guaranteed to build a table with no collisions. But the building no longer takes strictly O(1) time for every element. It will be O(1) in most cases, but if the table is full or a collision is encountered, then the table needs to be rehashed - taking O(n) time. This happens not so often, much less frequently than clean adds. So the AMORTISED time complexity is O(1). We don't care about some of the adds taking O(n) time, as long as the majority of adds takes O(1) time.

But even so, in an extreme case, the table must be rehashed every single insertion, so the strict time complexity would be O(n^2)

Solution 3:

There are a few methods in some languages that I'm aware of that do exactly what you want, have you considered looking at some of these implementations?

PHP - array_intersect()

$array1 = array("a" => "green", "red", "blue");
$array2 = array("b" => "green", "yellow", "red");
$result = array_intersect($array1, $array2);
print_r($result);

>> green
   red

Java - List.retainAll

Collection listOne = new ArrayList(Arrays.asList("milan","dingo", "elpha", "hafil", "meat", "iga", "neeta.peeta"));
Collection listTwo = new ArrayList(Arrays.asList("hafil", "iga", "binga", "mike", "dingo"));

listOne.retainAll( listTwo );
System.out.println( listOne );

>> dingo, hafil, iga