Sorting 1 million 8-decimal-digit numbers with 1 MB of RAM

I have a computer with 1 MB of RAM and no other local storage. I must use it to accept 1 million 8-digit decimal numbers over a TCP connection, sort them, and then send the sorted list out over another TCP connection.

The list of numbers may contain duplicates, which I must not discard. The code will be placed in ROM, so I need not subtract the size of my code from the 1 MB. I already have code to drive the Ethernet port and handle TCP/IP connections, and it requires 2 KB for its state data, including a 1 KB buffer via which the code will read and write data. Is there a solution to this problem?

Sources Of Question And Answer:

slashdot.org

cleaton.net


There is one rather sneaky trick not mentioned here so far. We assume that you have no extra way to store data, but that is not strictly true.

One way around your problem is to do the following horrible thing, which should not be attempted by anyone under any circumstances: Use the network traffic to store data. And no, I don't mean NAS.

You can sort the numbers with only a few bytes of RAM in the following way:

  • First take 2 variables: COUNTER and VALUE.
  • First set all registers to 0;
  • Every time you receive an integer I, increment COUNTER and set VALUE to max(VALUE, I);
  • Then send an ICMP echo request packet with data set to I to the router. Erase I and repeat.
  • Every time you receive the returned ICMP packet, you simply extract the integer and send it back out again in another echo request. This produces a huge number of ICMP requests scuttling backward and forward containing the integers.

Once COUNTER reaches 1000000, you have all of the values stored in the incessant stream of ICMP requests, and VALUE now contains the maximum integer. Pick some threshold T >> 1000000. Set COUNTER to zero. Every time you receive an ICMP packet, increment COUNTER and send the contained integer I back out in another echo request, unless I=VALUE, in which case transmit it to the destination for the sorted integers. Once COUNTER=T, decrement VALUE by 1, reset COUNTER to zero and repeat. Once VALUE reaches zero you should have transmitted all integers in order from largest to smallest to the destination, and have only used about 47 bits of RAM for the two persistent variables (and whatever small amount you need for the temporary values).

I know this is horrible, and I know there can be all sorts of practical issues, but I thought it might give some of you a laugh or at least horrify you.


Here's some working C++ code which solves the problem.

Proof that the memory constraints are satisfied:

Editor: There is no proof of the maximum memory requirements offered by the author either in this post or in his blogs. Since the number of bits necessary to encode a value depends on the values previously encoded, such a proof is likely non-trivial. The author notes that the largest encoded size he could stumble upon empirically was 1011732, and chose the buffer size 1013000 arbitrarily.

typedef unsigned int u32;

namespace WorkArea
{
    static const u32 circularSize = 253250;
    u32 circular[circularSize] = { 0 };         // consumes 1013000 bytes

    static const u32 stageSize = 8000;
    u32 stage[stageSize];                       // consumes 32000 bytes

    ...

Together, these two arrays take 1045000 bytes of storage. That leaves 1048576 - 1045000 - 2×1024 = 1528 bytes for remaining variables and stack space.

It runs in about 23 seconds on my Xeon W3520. You can verify that the program works using the following Python script, assuming a program name of sort1mb.exe.

from subprocess import *
import random

sequence = [random.randint(0, 99999999) for i in xrange(1000000)]

sorter = Popen('sort1mb.exe', stdin=PIPE, stdout=PIPE)
for value in sequence:
    sorter.stdin.write('%08d\n' % value)
sorter.stdin.close()

result = [int(line) for line in sorter.stdout]
print('OK!' if result == sorted(sequence) else 'Error!')

A detailed explanation of the algorithm can be found in the following series of posts:

  • 1MB Sorting Explained
  • Arithmetic Coding and the 1MB Sorting Problem
  • Arithmetic Encoding Using Fixed-Point Math