Finding pairs in an array

I saw this question and wanted to play with it, especially as @Neil and I are on the same wavelength.

It’s a day later now, and since there are O(n log n) and O(n²) solutions posted, here’s an actual O(n) version (where n is actually the length of the hash table, not the input, which we don’t count).
Even if we did count it, though, it would be O(n+k) → O(n).
(The same thing happens with the other solutions too, which is why we don’t count the input.)

#define __STDC_WANT_LIB_EXT1__ 1
#include <math.h>
#include <stdio.h>


// the histogram
enum { MAX_SAMPLES = 1000 };  // this should be overkill for the expected load factor
int values[ MAX_SAMPLES ];    // the sample value being counted
int counts[ MAX_SAMPLES ];    // non-negative, zero == value not sampled


void add( int value )
{
  int hash = abs( value ) % MAX_SAMPLES;  // simplest hash EVER

  for (int offset = 0;  offset < MAX_SAMPLES;  offset++)  // give up if table is saturated
  {
    int index = (hash + offset) % MAX_SAMPLES;

    if (counts[index] == 0)      // if found an empty spot:
      values[index] = value;     //   add the value

    if (values[index] == value)  // if found the value:
    {                            //
      counts[index] += 1;        //   increment value's count
      break;                     //   and done searching
    }
  }
}


int read_int()
{
  int value;
  scanf_s( "%d", &value );
  return value;
}


int main()
{
  // input list --> histogram
  for (int n = read_int();  n;  n--)
    add( read_int() );

  // Σ ⌊count / 2⌋ --> number of pairs
  int num_pairs = 0;
  for (int n = 0;  n < MAX_SAMPLES;  n++)
    num_pairs += counts[n] / 2;

  printf( "%d", num_pairs );
}

This solution has been condensed to the simplest terms. In particular:

  • there is no error checking: no allocation error, no input failure, no hash table saturation
  • uses a global, fixed size hash map for the histogram, meaning:
    • it cannot grow if needed
    • but all counts are conveniently automatically initialized to zero

This solution does have its drawbacks though:

  • it requires extra O(k) space
  • bad input can bog it down
    For example, if all inputs (mod table size) were the same value then every input would be a hash collision. More sophisticated management of the hash map would alleviate both problems, at the cost of added complexity (grow the hash map if overloaded or too many collisions detected).
    It would also help a lot if the table size were a prime number.

Input method

I don’t know if the input must be { n, a1, ..., an }, which is very common in academia, at least, but it annoys me.

I very much prefer input that doesn’t require the user to know how many items he or she intends to provide: just type them and press Enter. This is much more easily done than people realize:

  1. input from command-line arguments (very, very convenient for testing/playing):
int main( int argc, char ** argv )
{
  for (int n = 1;  n < argc;  n++)
    add( atoi( argv[n] ) );
  1. input from a single line of text, which we then parse:
// using strtok

fgets( s, sizeof(s), stdin );
for (char * token = strtok( s, " " );  token;  token = strtok( NULL, " " ))
  add( atoi( token ) );

 

// using sscanf

fgets( s, sizeof(s), stdin );
for (int value, k, n = 0;  sscanf_s( s + n, "%d%n", &value, &k ) == 1;  n += k)
  add( value );

Both of these input solutions are easily combined into the same program:

int main( int argc, char ** argv )
{
  if (argc > 1)
    // get input from command-line arguments //
  else
    // get input from a single line of text //

Anyway, this solution is totally over-the-top for an introductory assignment, even with the pared-down global histogram object reducing lines of code.

Ideally, SO answers to homework questions should help the OP work his or her way through his or her own solution (instead of just being an “ooh, ooh, I can do that” dumping ground of other people’s solutions, making it no more than a free code-writing service for students). Hopefully this solution gives just another tool in the box to reason about future problems.