Is the C99 preprocessor Turing complete?
After discovering the Boost preprocessor's capabilities, I found myself wondering: is the C99 preprocessor Turing complete?
If not, what does it lack?
Solution 1:
Well, a macro can't directly expand itself recursively, but there are ways to work around this.
The easiest way of doing recursion in the preprocessor is to use a deferred expression. A deferred expression is an expression that requires more scans to fully expand:
#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(...) __VA_ARGS__ DEFER(EMPTY)()
#define EXPAND(...) __VA_ARGS__
#define A() 123
A() // Expands to 123
DEFER(A)() // Expands to A () because it requires one more scan to fully expand
EXPAND(DEFER(A)()) // Expands to 123, because the EXPAND macro forces another scan
Why is this important? When a macro is scanned and expanded, it creates a disabling context. This disabling context causes any token that refers to the currently expanding macro to be painted blue, and once a token is painted blue, it will no longer expand. This is why macros don't expand recursively. However, a disabling context only exists during one scan, so by deferring an expansion we can prevent our macros from being painted blue. We just need to apply more scans to the expression, which we can do using this EVAL macro:
#define EVAL(...) EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__
Now, if we want to implement a REPEAT macro using recursion, we first need some increment and decrement operators to handle state:
#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__
#define INC(x) PRIMITIVE_CAT(INC_, x)
#define INC_0 1
#define INC_1 2
#define INC_2 3
#define INC_3 4
#define INC_4 5
#define INC_5 6
#define INC_6 7
#define INC_7 8
#define INC_8 9
#define INC_9 9
#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2
#define DEC_4 3
#define DEC_5 4
#define DEC_6 5
#define DEC_7 6
#define DEC_8 7
#define DEC_9 8
Next we need a few more macros to do logic:
#define CHECK_N(x, n, ...) n
#define CHECK(...) CHECK_N(__VA_ARGS__, 0,)
#define NOT(x) CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0 ~, 1,
#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0 1
#define COMPL_1 0
#define BOOL(x) COMPL(NOT(x))
#define IIF(c) PRIMITIVE_CAT(IIF_, c)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t
#define IF(c) IIF(BOOL(c))
#define EAT(...)
#define EXPAND(...) __VA_ARGS__
#define WHEN(c) IF(c)(EXPAND, EAT)
Now, with all these macros, we can write a recursive REPEAT macro. We use a REPEAT_INDIRECT macro to refer back to itself recursively. This prevents the macro from being painted blue, since it will expand on a different scan (and therefore in a different disabling context). We use OBSTRUCT here, which defers the expansion twice. This is necessary because the conditional WHEN already applies one scan.
#define REPEAT(count, macro, ...) \
WHEN(count) \
( \
OBSTRUCT(REPEAT_INDIRECT) () \
( \
DEC(count), macro, __VA_ARGS__ \
) \
OBSTRUCT(macro) \
( \
DEC(count), __VA_ARGS__ \
) \
)
#define REPEAT_INDIRECT() REPEAT
//An example of using this macro
#define M(i, _) i
EVAL(REPEAT(8, M, ~)) // 0 1 2 3 4 5 6 7
Now, this example only works for counts from 0 to 9 because of the limits of the counter, just as a repeat counter in a computer would be limited by finite memory. Multiple repeat counters could be combined to work around this limitation, just like in a computer. Furthermore, we could define a FOREVER macro:
#define FOREVER() \
? \
DEFER(FOREVER_INDIRECT) () ()
#define FOREVER_INDIRECT() FOREVER
// Outputs question marks forever
EVAL(FOREVER())
This will try to output ? forever, but it will eventually stop because no more scans are applied. Now the question is: if we gave it an infinite number of scans, would this algorithm complete? This is known as the halting problem, and Turing completeness is necessary to prove the undecidability of the halting problem. So, as you can see, the preprocessor can act as a Turing-complete language, but instead of being limited by the finite memory of a computer, it is limited by the finite number of scans applied.
Solution 2:
Here is an example of abusing the preprocessor to implement a Turing machine. Note that an external build script is needed to feed the preprocessor's output back into its input, so the preprocessor in and of itself isn't Turing complete. Still, it's an interesting project.
From the description of the afore-linked project:
the preprocessor is not Turing complete, at least not if the program is preprocessed only once. This is true even if the program is allowed to include itself. (The reason being that for a given program, the preprocessor has only a finite number of states, plus a stack consisting of the places which the file has been included from. This is only a push-down automaton.)
The answer by Paul Fultz II is quite impressive and certainly closer than I thought the preprocessor could ever get, but it's not a true Turing machine. The C preprocessor has certain limits that prevent it from executing an arbitrary program like a Turing machine could, even if you had infinite memory and time. Section 5.2.4.1 of the C spec gives the following minimum limits for a C compiler:
- 63 nesting levels of parenthesized expressions within a full expression
- 63 significant initial characters in an internal identifier or a macro name
- 4095 macro identifiers simultaneously defined in one preprocessing translation unit
- 4095 characters in a logical source line
The counter mechanism in Paul Fultz II's answer requires a macro definition per value, so the macro definition limit caps how many times you can loop (EVAL(REPEAT(4100, M, ~)) would yield undefined behavior). This essentially puts a cap on the complexity of the program that you can execute. The nesting and complexity of the multi-level expansions may hit one of the other limits as well.
This is fundamentally different from the "infinite memory" limitation. In this case, the spec specifically says that a standards-conforming C compiler is only required to support these limits, even if it has infinite time, memory, etc. Any input file exceeding these limits can be processed in an unpredictable or undefined manner (or outright rejected). Some implementations may have higher limits, or no limits at all, but that's considered "implementation-specific" and not part of the standard. It may be possible to use Paul Fultz II's method to implement something like a Turing machine on some specific compiler implementation that has no finite limits, but in the general sense of "can this be done on any arbitrary, standards-conforming C99 preprocessor", the answer is no. Since the limit here is built into the language itself and is not simply a side effect of our inability to construct an infinite computer, I say that breaks Turing completeness.
Solution 3:
To be Turing complete, one needs to define recursion that may never finish; such operators are called mu-recursive operators.
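For reference, the standard textbook definitions (these are not taken from the answer) contrast unbounded minimization, the μ-operator, with bounded minimization. Only the unbounded form can search forever, which is the property this argument depends on:

```latex
% Unbounded minimization (the mu-operator): undefined if no such y exists
\mu y\,[\,f(\vec{x}, y) = 0\,] = \min\{\, y \in \mathbb{N} : f(\vec{x}, y) = 0
  \text{ and } f(\vec{x}, z) \text{ is defined for all } z < y \,\}

% Bounded minimization: always defined, because the search stops at b
\mu y < b\,[\,f(\vec{x}, y) = 0\,] =
\begin{cases}
  \min\{\, y < b : f(\vec{x}, y) = 0 \,\} & \text{if such a } y \text{ exists} \\
  b & \text{otherwise}
\end{cases}
```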
To define such an operator, one needs an infinite space of defined identifiers (in case each identifier is evaluated a finite number of times), since one cannot know a priori an upper bound on the time in which the result is found. With a finite number of operators in the code, one needs to be able to check an unlimited number of possibilities.
So this class of functions cannot be computed by the C preprocessor, because the C preprocessor has a limited number of defined macros and each one is expanded only once.
The C preprocessor uses Dave Prosser's algorithm (written by Dave Prosser for the WG14 team in 1984). In this algorithm, a macro is painted blue at the moment of its first expansion; a recursive (or mutually recursive) call does not expand it, as it has already been painted blue when the first expansion starts. So with a finite number of preprocessing lines, it is impossible to make infinitely many calls of functions (macros), which is what characterizes mu-recursive operators.
The C preprocessor can compute only sigma-recursive operators.
For details, see Marvin L. Minsky, Computation: Finite and Infinite Machines (Prentice-Hall, Englewood Cliffs, N.J., 1967).
Solution 4:
It's Turing complete within limits (as are all computers, since they don't have infinite RAM). Check out the kinds of things you can do with Boost Preprocessor.
Edit in response to question edits:
The main limitation on Boost is the maximum macro expansion depth, which is compiler-specific. Also, the macros that implement recursion (FOR..., ENUM..., etc.) aren't truly recursive; they just appear that way thanks to a bunch of near-identical macros. In the big picture, this limitation is no different from having a maximum stack size in an actually recursive language.
The only two things that are really necessary for limited Turing-completeness (Turing-compatibility?) are iteration/recursion (equivalent constructs) and conditional branching.