Computation with a memory-wiped computer

Here is another result from Scott Aaronson's blog:

If every second or so your computer’s memory were wiped completely clean, except for the input data; the clock; a static, unchanging program; and a counter that could only be set to 1, 2, 3, 4, or 5, it would still be possible (given enough time) to carry out an arbitrarily long computation — just as if the memory weren’t being wiped clean each second. This is almost certainly not true if the counter could only be set to 1, 2, 3, or 4. The reason 5 is special here is pretty much the same reason it’s special in Galois’ proof of the unsolvability of the quintic equation.

Does anyone have an idea of how to show this?


Solution 1:

1) Why does a small number of states suffice?

Regardless of whether the constant is 5 or 500, it's still very surprising. Thankfully, it's fairly straightforward to prove this if you allow the counter to be $\{1, \ldots, 8\}$ instead of $\{1, \ldots, 5\}$. [This proof is due to Ben-Or and Cleve.] Start by representing the computation as a circuit, and ignore the whole wiping-clean business for now.

Define a register machine as follows: it has 3 registers $(R_1,R_2,R_3)$, each of which holds a single bit. At each step, the program performs an instruction of the form $R_i \gets R_i + x_k R_j$ or $R_i \gets R_i + R_j$, where $x_1, \ldots, x_n$ are the input bits and all arithmetic is mod 2.

Initially, set $(R_1,R_2,R_3) = (1,0,0)$. To compute a circuit $f$, the machine should end in the state $(R_1, R_2, R_3 + f R_1)$, so that $R_3$ holds the output. We'll simulate the circuit using a register machine.
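As a concrete reading of the instruction set, here is a minimal sketch in Python, assuming mod-2 arithmetic; the function name and 0-indexed registers are my own illustration, not part of the proof:

```python
# A minimal sketch of one machine step; all arithmetic is mod 2.

def apply_instruction(regs, i, j, x=1):
    """R_i <- R_i + x * R_j (mod 2); x is an input bit, or 1 to omit it."""
    regs[i] = (regs[i] + x * regs[j]) % 2

regs = [1, 0, 0]                     # (R1, R2, R3) = (1, 0, 0)
apply_instruction(regs, 2, 0, x=1)   # R3 <- R3 + x*R1 with input bit x = 1
print(regs)                          # [1, 0, 1]
```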

We now proceed by induction on the depth of the circuit. If the circuit has depth 0, then it is just an input bit $x_i$, and we copy it over with a single instruction: $R_3 \gets R_3 + x_i R_1$.

For the induction, we have 3 cases, according to whether the final gate is NOT, AND, or OR.

Suppose that the circuit is $\neg f$. By induction, we can compute $f$, yielding the state $(R_1,R_2,R_3 + f R_1)$. We can therefore perform the instruction $R_3 \gets R_3 + R_1$ to get the desired output, since $f R_1 + R_1 = (\neg f) R_1 \pmod 2$.

If the circuit is $f_1 \wedge f_2$, then life is a tad more complicated. By induction, we have subprograms that add $f_1$ (or $f_2$) times any one register to any other while restoring everything else; we execute the following 4 subprograms in sequence:

\begin{align*} R_2 &\gets R_2 + f_1 R_1 \\ R_3 &\gets R_3 + f_2 R_2 \\ R_2 &\gets R_2 + f_1 R_1 \\ R_3 &\gets R_3 + f_2 R_2 \end{align*}

Assuming I haven't made any typos, we are left with the state $(R_1,R_2,R_3+f_1f_2R_1)$, as desired: the two additions into $R_2$ cancel mod 2, as do the $f_2 R_2$ terms added to $R_3$, leaving only the cross term $f_1 f_2 R_1$. For $f_1 \vee f_2$, write $f_1 \vee f_2 = \neg(\neg f_1 \wedge \neg f_2)$ and apply the two previous cases.

QED.
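Since the whole argument hinges on those mod-2 cancellations, here is a quick brute-force check of the NOT and AND identities over all 32 bit assignments (a sketch; the variable names are mine):

```python
# Brute-force check of the NOT and AND identities above, mod 2.
from itertools import product

for r1, r2, r3, f1, f2 in product((0, 1), repeat=5):
    # NOT: from (r1, r2, r3 + f1*r1), the step R3 <- R3 + R1
    # leaves R3 holding r3 + (NOT f1)*r1.
    assert (r3 + f1 * r1 + r1) % 2 == (r3 + (1 - f1) * r1) % 2

    # AND: the four subprograms from the induction step.
    r2a = (r2 + f1 * r1) % 2    # R2 <- R2 + f1*R1
    r3a = (r3 + f2 * r2a) % 2   # R3 <- R3 + f2*R2
    r2b = (r2a + f1 * r1) % 2   # R2 <- R2 + f1*R1 (undoes the first)
    r3b = (r3a + f2 * r2b) % 2  # R3 <- R3 + f2*R2
    assert (r2b, r3b) == (r2, (r3 + f1 * f2 * r1) % 2)

print("all 32 cases check out")
```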

Take a moment to process what just happened. It's a slick proof that you have to read 2 or 3 times before it begins to sink in. What we've shown is that we can simulate a circuit by applying a fixed program that stores only 3 bits of information at any time.
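To make the induction concrete, here is a minimal sketch of the whole construction as a compiler from circuits to straight-line programs. The tuple encoding of circuits and the names `compile_circuit`/`run` are my own illustration, not part of the original proof:

```python
# Circuits are nested tuples: ('var', i), ('not', c), ('and', c1, c2),
# ('or', c1, c2).  Registers are 0-indexed; an instruction (i, j, k) means
# R_i += x_k * R_j, with k=None for R_i += R_j; all arithmetic is mod 2.
from itertools import product

def compile_circuit(c, dst, src):
    """Instructions adding f(x) * R_src to R_dst, restoring everything else."""
    kind = c[0]
    if kind == 'var':                    # depth 0: a single instruction
        return [(dst, src, c[1])]
    if kind == 'not':                    # add f*R_src, then add R_src
        return compile_circuit(c[1], dst, src) + [(dst, src, None)]
    if kind == 'or':                     # De Morgan: reduce to NOT and AND
        return compile_circuit(('not', ('and', ('not', c[1]), ('not', c[2]))), dst, src)
    tmp = 3 - dst - src                  # AND: the third register is scratch
    return (compile_circuit(c[1], tmp, src) + compile_circuit(c[2], dst, tmp)
          + compile_circuit(c[1], tmp, src) + compile_circuit(c[2], dst, tmp))

def run(prog, x):
    regs = [1, 0, 0]                     # initial state (R1, R2, R3) = (1, 0, 0)
    for i, j, k in prog:
        regs[i] = (regs[i] + regs[j] * (1 if k is None else x[k])) % 2
    return regs[2]                       # R3 ends up holding f(x)

# Check f(x) = (x0 AND x1) OR (NOT x2) against direct evaluation.
circuit = ('or', ('and', ('var', 0), ('var', 1)), ('not', ('var', 2)))
for x in product((0, 1), repeat=3):
    assert run(compile_circuit(circuit, 2, 0), x) == (x[0] & x[1]) | (1 - x[2])
```

The final assertion checks the compiled program against direct evaluation on all eight inputs; note that the machine never holds more than 3 bits of state.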

To convert this into Aaronson's version, we encode the three one-bit registers into the counter, which takes $2^3 = 8$ states (that's why we needed $\{1,\ldots,8\}$ rather than $\{1,\ldots,5\}$). The static program uses the input and the clock to determine how far we've made it through the computation, and then applies the appropriate change to the counter.
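As a sketch of this conversion, reusing `compile_circuit` and `circuit` from above, and assuming the encoding $c = 1 + R_1 + 2R_2 + 4R_3$ (my choice of encoding, not Aaronson's):

```python
def step(c, t, x, prog):
    """One 'second': everything except x, the clock t, and c has been wiped."""
    if t >= len(prog):
        return c                                    # computation finished
    regs = [(c - 1) >> b & 1 for b in range(3)]     # decode counter -> bits
    i, j, k = prog[t]                               # the clock picks the instruction
    regs[i] = (regs[i] + regs[j] * (1 if k is None else x[k])) % 2
    return 1 + regs[0] + 2 * regs[1] + 4 * regs[2]  # re-encode

prog = compile_circuit(circuit, 2, 0)               # the static, unchanging program
c = 2                                               # encodes (R1, R2, R3) = (1, 0, 0)
for t in range(len(prog)):
    c = step(c, t, (1, 1, 0), prog)
print((c - 1) >> 2 & 1)                             # prints f(1,1,0) = 1
```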

2) But what's the deal with 5?

To get from 8 states down to 5, you use a similar argument, but you must be much more careful about exactly how much information needs to be propagated between stages and how it can be encoded. This is essentially Barrington's theorem (1986). The group theory it needs is the non-solvability of the alternating group $A_5$, the same fact behind Galois' proof of the unsolvability of the quintic: in particular, $A_5$ contains 5-cycles whose commutator is again a 5-cycle, which is what lets an AND gate be simulated by a commutator of its two subprograms.
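I won't reproduce Barrington's proof here, but the key group-theoretic fact is easy to verify by brute force. The following sketch (the representation and names are mine) searches for a pair of 5-cycles whose commutator is also a 5-cycle:

```python
# Permutations of range(5) as tuples; search A_5 for 5-cycles sigma, tau
# whose commutator sigma tau sigma^-1 tau^-1 is again a 5-cycle.
from itertools import permutations

def compose(p, q):
    """(p o q)(i) = p[q[i]]: apply q first, then p."""
    return tuple(p[q[i]] for i in range(5))

def inverse(p):
    r = [0] * 5
    for i, v in enumerate(p):
        r[v] = i
    return tuple(r)

def is_5_cycle(p):
    """True iff the orbit of 0 under p has length 5 (a single 5-cycle)."""
    i, seen = 0, set()
    while i not in seen:
        seen.add(i)
        i = p[i]
    return len(seen) == 5

five_cycles = [p for p in permutations(range(5)) if is_5_cycle(p)]
sigma, tau = next(
    (s, t) for s in five_cycles for t in five_cycles
    if is_5_cycle(compose(compose(s, t), compose(inverse(s), inverse(t))))
)
print("sigma =", sigma, "tau =", tau)
```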


Edit to answer Casebash's questions:

1) Correct. Any computation can be expressed as a circuit composed solely of "NOT", binary-"AND", and binary-"OR" gates.

2) The notation $f R_1$ means multiplication mod 2, i.e., boolean AND.

3) The program for computing $f$ should take input $(R_1,R_2,R_3)$ to $(R_1,R_2,R_3 + f R_1)$. We insist that the first two registers are unchanged since we use those as temporary storage in the induction. For example, when computing $f_1 \wedge f_2$, we compute the first branch and store the result in $R_2$ while computing the second branch.

4) The single bit of output is the final value of $R_3$. Since we started with $(1,0,0)$, we end with $(1,0,f)$.

Solution 2:

As Scott himself states in the comments section of the post in question (comment #9):

(4) Width-5 branching programs can compute NC1 (Barrington 1986); corollary pointed out by Ogihara 1994 that width-5 bottleneck Turing machines can compute PSPACE

Unfortunately, I don't have any idea how this is proved.