Storing 1 million phone numbers [closed]

What is the most efficient way, memory-wise, to store 1 million phone numbers?

Apparently this is an interview question at Google, please give your ideas.


Solution 1:

If memory is our biggest consideration, then we don't need to store the number at all, just the delta between i and i+1.

Now if numbers range from 200 0000 - 999 9999, then there are 7,999,999 possible phone numbers. Since we have 1-million numbers, and if we assume that they are uniformly distributed, we have an Expected distance of E = n_i+1 - n_i ~ 8 (3 bits) between sequential numbers n_i and n_i+1. So for a 32-bit int we could potentially store up to 10 sequential offsets (~400kb optimal total memory footprint), however its likely that we'll have some cases where we need an offset greater than 8 (Maybe we have 400, or 1500??). In which case, we can simply reserve the first 2 bits of the int as a header which tells us what frame size we use to read the bits it stores. For example, maybe we use: 00 = 3x10, 01 = 5x6, 10 = 7x4, 11 = 1*30.

Solution 2:

Write them in ASCII, space-separated.

Zip the resulting string, using your favourite compression algorithm. If order isn't important, sorting them first might help the compression, gives you more repetition closer together.

Oh, did you want efficient random access? Then you should've said.