(I don't think that this is a good fit on cstheory, since I figure that this question already has a known answer. However, if you think that this would be a better fit there, please feel free to migrate it)

It can easily be shown that REGULAR (the class of regular languages) is contained in DTIME(O(n)), because for any regular language L we can decide membership in that language in O(n) time by converting a DFA that decides L into a TM that decides L in a single pass over the input. However, is it also true that DTIME(O(n)) is contained in REGULAR; that is, that any language that can be decided in O(n) time by a single-tape TM is necessarily regular?

I have no idea how to approach this problem, and every language I can think of in DTIME(O(n)) seems to be regular. Intuitively, this would make sense, since if you're only allowed to make finitely many passes over the input, you could only remember finitely many things about it, which it seems like you could simulate with a sufficiently large DFA. However, the fact that you can write to the input tape seems like it might complicate things.

And no, this isn't a homework problem. I'm actually teaching a course that covers computability and complexity theory and thought it would be interesting to present the answer to this question in lecture. :-)

Thanks!


Solution 1:

For the benefit of the readers, here is an outline of the proof that an $o(n\log n)$-time TM accepts a regular language.

Consider the computation of the TM on some input $x$. Write $x$ as follows: $$ |x_0|x_1|\cdots|x_n|. $$ Each $|$ signifies a boundary between two cells. The crossing number at a boundary on $x$ is simply the number of times the machine crosses the boundary. The crossing sequence is the sequence of states that the TM enters immediately after crossing the boundary (first from left to right, then from right to left, and so on).

Claim (Kobayashi '85): If a TM runs in time $o(n\log n)$, then the crossing number at any boundary on any input is $O(1)$.

Proof idea: Suppose that on some input, the crossing number at some boundary $X$ is large (at least $A$). Pick a smallest such input. Since the machine runs in time $o(n\log n)$, there must be many boundaries (at least $B(n)$) at which the crossing number is small (at most $c(n)$). Since $B(n)$ is so big compared to $c(n)$, by the pigeonhole principle there must be four boundaries with the same crossing sequence, out of which there are two on the same side of $X$. It follows that if we "excise" the substring bounded by these boundaries, we get a smaller string with a boundary with crossing number at least $A$. Q.E.D.

A computation is of course needed to show that there is an appropriate choice of $A,B(n),c(n)$ whenever the time is $o(n\log n)$. That's left for you to verify.

Claim (Hennie '65): If the crossing number of a TM is $O(1)$ for any input and any boundary, then the language it accepts is regular.

Proof idea: For strings $x,y$, define $x \equiv y$ if for any $z$, whenever the machine accepts $xz$ then it accepts $yz$, and vice versa. We show that $\equiv$ has finitely many equivalence classes, so the claim follows from Myhill-Nerode.

For a string $x$, let $S(x)$ denote the set of crossing sequences at the boundary $x|z$ for any $z$ such that the TM accepts $xz$. Suppose that $S(x) = S(y)$. We will show that $x \equiv y$. Indeed, take any $z$ such that the TM accepts $xz$. Since $S(x) \subseteq S(y)$, there is $w$ such that the TM accepts $yw$, and the crossing sequences at $x|z$ and $y|w$ are the same. This implies (after some thought) that indeed $yz$ is accepted by the TM. Q.E.D.

Solution 2:

Single-tape $o(n \log n)$ Turing machines can recognize only regular languages. See http://arxiv.org/abs/cs/0310046 for reference. I recall seeing the proof for $O(n)$ only (can't remember the source now), and it was definitely not simple.