A computer's memory is finite, so how can there be languages more powerful than regular?

A computer has a finite memory. There are no computers with infinite memory. Therefore the only languages that a computer can process are those whose member strings are finite. As I recall, the computational power required for any finite language is small -- a deterministic finite state automata is sufficient to process finite languages, I think.

So there is no such thing as context-free languages or recursively enumerable languages, right? Programming languages are not context-free, they are just simple regular languages, right? The idea that a programming language is a context-free language ... well, that's an illusion, right?

Why do we even bother to talk about context-free languages, recursive languages, recursively enumerable languages since inside the (finite) computer everything is just a regular language?

Obviously I am playing devils-advocate here, but I would really like to understand the fallacy of my argument.

Summary of Responses

Thank you very much to all who responded. I have studied your responses carefully. There are so many pearls of wisdom in your responses that I decided to summarize them. If my summary is incorrect, unclear, or missing an important concept, please let me know.

> A computer has a finite memory. There are no computers 
> with infinite memory. Therefore the only languages that 
> a computer can process are those whose member strings 
> are finite.

Responses

A computer's memory is finite, so in its memory it can recognize only finite languages. However, if you would pass an external input to the computer, then it could recognize some infinite languages.

Example: a*b is an infinite language and any string in that language can be recognized, even if the string exceeds the size of the computer's memory. Here's how: pass the string into the computer character by character. If the first character that is passed in is not an "a" then go to an error state. Otherwise, stay in the "a" state until a non-"a" character is input. If the character is not a "b" then go to the error state. Otherwise, go to an accept state. For any characters beyond "b" go to the error state.

Not all infinite languages can be recognized using this technique of feeding the input strings in as external input. In fact, the technique can be used only with regular languages. And even of the regular languages, only some of them can be recognized.

Example: For illustration purposes, suppose the computer has a ridiculously small memory size, say, 2 bytes. Suppose 1 byte is required for each state of a finite state automaton. Then the computer will not be able to recognize this regular language: a*b*c* because at least three byes are needed – one byte for the state that consumes all the a's, a second byte for the state that consumes all the b's, and a third byte for the state that consumes all the c's.

Any finite state automaton that requires more states than there are bytes in the computer's memory cannot be recognized.

Finally, with this technique of feeding arbitrarily long strings into the computer as external input, we would need to find a way around certain technical limitations such as a continuous power supply.

> A computer has a finite memory. Therefore the only languages 
> that a computer can process are those whose member strings 
> are finite … So there is no such thing as context-free languages 
> or recursively enumerable languages, right?

Responses

Consider an infinite language. It consists of strings of arbitrary length. Many strings will be longer than the size of the computer's memory. It is incorrect to say that because a computer can hold in its memory only those strings that are of a certain finite length, the language has only finitely many strings. It is true that any finite subset of a language is regular. But that does not mean that the original language must have been regular.

> A computer has a finite memory. Therefore, in general, a computer 
> can only process a finite subset of an infinite language. A finite 
> subset of a language is regular. So why don't we simply treat all 
> languages used in computers as regular? 

Responses

The reason to treat, say, a programming language as context-free is that the context-free grammar tells how to parse the language. If you considered just the subset of programs with length no more than 232, that would be regular but the regular expression would likely consist of millions of individual cases and wouldn't be helpful for parsing the programs.  


The reason to treat a programming language as context-free is that the context-free grammar tells how to parse the language. If you considered just the subset of C consisting of programs of length no more than $2^{32}$ that would be regular, but the regular expression would likely consist of millions of individual cases, and wouldn't be helpful for parsing the programs. You might not even be able to fit the compiler in memory...

For a simpler example, consider a context free grammar for arithmetical expressions with natural numbers, addition and multiplication.

  • E -> {sequence of digits 0-9}
  • E -> ( E + E )
  • E -> ( E * E )

If you only look at expressions with 500 or fewer symbols, that fragment of the language is regular, but it is much more difficult to describe as a regular language than as a context-free one. Plus, the parse tree for the context-free grammar gives a direct way to evaluate the expressions, while the parse tree for that smaller fragment as a regular language is not likely to help evaluate the expression.


Actually, in the sense of the program, you can make a computer as power as the universal Turing Machine. (In fact, the computer you are on probably is.) More precisely, you can write down explicitly a universal Turing Machine in many of the actual computer languages out there. So in a very real-world sense, there are computer program much more power than any finite automata (the model of computation that recognizes regular langauges).

It is true that a computer has finite memory. This just means that a computer will imitate any Turing machine until it runs out of memory, electric, etc. However any computation that halts takes finite amount of time. So theoretically, one would just need to build a more powerful computer. If a computation does not halt by an idealize Turing machine, neither will it by any real computer no matter how powerful. This does change the fact that your real computer is running the same program as a universal Turing machine.

Morever, you should make a distinction between languages and the model of computation. A language is a string of symbols. A recursive langauge is one decided by a Turing Machine. A recursively enumerable language is one accepted by a Turing machine. For instance, the set $H$ of code $\langle e, f \rangle$ such that the $e^\text{th}$ Turing Machine halts on input $f$ is the recursively enumerable set corresponding to the Halting problem. This is a just a set of numbers! An idea! Why should the fact that real computers have finite memory have any bearing on whether a recursive language or regular language (some infinite set; an abstract idea) should "exists" or not.

To ask whether languages (an abstract notion) exists because of the nature of real computers is not entire well-defined question. Does a recursive language not exists because no existing computer can run long enough to halt? But it will halt eventually. Similarly, one can make huge finite automata that will not halt in anyone's lifetime. It will stop eventually too. Does this mean no regular language exist either because no current device can finish the calculation.

Recursive language and regular languages are just ideas computed by other abstract notions like Turing machines or finite automata. The nature of physical computers don't make ideas real.