How are GCC and g++ bootstrapped?
The oldest version of GCC was compiled using another C compiler, since there were others when it was written. The very first C compiler ever (ca. 1973, IIRC) was implemented either in PDP-11 assembly or in the B programming language, which preceded it; in any case, the B compiler was written in assembly. Similarly, the first ever C++ compiler (CPre/Cfront, 1979-1983) was probably first implemented in C, then rewritten in C++.
When you compile GCC or any other self-hosting compiler, the full order of building is:
- Build the new version of GCC with an existing C compiler.
- Rebuild the new version of GCC with the one you just built.
- (Optional) Repeat step 2 for verification purposes.
This process is called bootstrapping. It tests the compiler's capability of compiling itself and makes sure that the resulting compiler is built with all the optimizations that it itself implements.
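To make the verification step concrete: once the second and third builds both exist, their object files are expected to match, since both were produced from the same source by compilers that should behave identically. Here is a minimal Python sketch of such a check. The function name `differing_objects`, the directory layout and the `.o` suffix are assumptions made for illustration; GCC's build system has its own comparison logic, which this does not reproduce.

```python
import filecmp
from pathlib import Path


def differing_objects(stage2_dir, stage3_dir, suffix=".o"):
    """Return relative paths of object files that differ between two stage trees.

    Hypothetical helper: a file counts as differing if it is missing from the
    second tree or if its bytes are not identical.
    """
    stage2 = Path(stage2_dir)
    stage3 = Path(stage3_dir)
    mismatches = []
    for obj in sorted(stage2.rglob(f"*{suffix}")):
        rel = obj.relative_to(stage2)
        other = stage3 / rel
        # shallow=False forces a byte-by-byte comparison instead of a stat() check.
        if not other.is_file() or not filecmp.cmp(obj, other, shallow=False):
            mismatches.append(rel.as_posix())
    return mismatches
```

An empty list means the compiler built by the old compiler and the compiler built by itself generated the same code, which is the point of repeating step 2.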
EDIT: Drew Dormann, in the comments, points to Bjarne Stroustrup's account of the earliest implementation of C++. It was implemented in C++ but translated by what Stroustrup calls a "preprocessor" from C++ to C; not a full compiler by his definition, but still C++ was bootstrapped in C.
If you want to replicate the bootstrap process of GCC in a modern environment (x86 Linux), you can use the tools developed by the bootstrappable project:
- We can start with the hex0 assembler (on x86 it is a 357-byte binary), which does roughly what the following two commands do:

      sed 's/[;#].*$//g' hex0_x86.hex0 | xxd -r -p > hex0
      chmod +x hex0

  That is, it translates the ASCII (hexadecimal) representation of a binary program into binary code, and it is itself written in hex0. In other words, the hex0 source code is in one-to-one correspondence with its binary code.
- hex0 can be used to build a slightly more powerful hex1 assembler that supports a few more features (one-character labels and offset calculation). hex1 is written in hex0 assembly.
- hex1 can be used to build hex2 (an even more advanced assembler that supports multi-character labels).
- hex2 can then be used to build a macro assembler, M0 (where programs are written using macros instead of hex opcodes).
- You can then use this macro assembler to build cc_x86, which is a "C compiler" written in assembly. cc_x86 only supports a small subset of C, but that's an impressive start.
- You can use cc_x86 to build M2-Planet (Macro Platform Neutral Transpiler), which is a C compiler written in C. M2-Planet is self-hosting and can build itself.
- You can then use M2-Planet to build GNU Mes, which is a small Scheme interpreter.
- mes can be used to run mescc, which is a C compiler written in Scheme that lives in the same repository as mes.
- mescc can be used to rebuild mes and also to build the mes C library.
- Then mescc can be used to build a slightly patched Tiny C Compiler (TCC).
- Then you can use it to build a newer version, TCC 0.9.27.
- GCC 4.0.4 and the musl C library can be built with TCC 0.9.27.
- Then you can build a newer GCC using an older GCC, e.g. GCC 4.0.4 -> GCC 4.7.4 -> modern GCC.
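To illustrate how little the first link in this chain does, here is a rough Python equivalent of the sed and xxd pipeline from the hex0 step. It is only a sketch: the function name `hex0_to_bytes` is made up, and treating every non-hex character as a separator is an assumption about the input format rather than a description of the real hex0 binary.

```python
import re


def hex0_to_bytes(source: str) -> bytes:
    """Translate hex0-style text (hex digits plus comments) into raw bytes."""
    digits = []
    for line in source.splitlines():
        # Drop everything from the first ';' or '#' to the end of the line,
        # like the sed expression s/[;#].*$//g does.
        code = re.sub(r"[;#].*$", "", line)
        # Keep only hex digits; whitespace and other separators are skipped
        # (an assumption of this sketch).
        digits.extend(ch for ch in code if ch in "0123456789abcdefABCDEF")
    if len(digits) % 2:
        raise ValueError("odd number of hex digits")
    return bytes.fromhex("".join(digits))
```

For example, `hex0_to_bytes("7F 45 4C 46 ; ELF magic\n01 02")` returns `b'\x7fELF\x01\x02'`. Writing that result to a file and marking it executable corresponds to the `> hex0` and `chmod +x hex0` parts of the shell commands.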
TL;DR:
hex0 -> hex1 -> hex2 -> M0 -> M2-Planet -> Mes -> Mescc -> TCC -> GCC.
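The chain can also be written down as data, which makes it easy to ask "what do I need before I can build X?". The sketch below encodes the steps described above in a plain dictionary. The names `BUILT_BY` and `bootstrap_path` are hypothetical, and the table simplifies some steps (for example, it ignores that musl is built alongside GCC 4.0.4).

```python
# Each tool maps to the tool used to build (or run) it, per the steps above.
BUILT_BY = {
    "hex1": "hex0",
    "hex2": "hex1",
    "M0": "hex2",
    "cc_x86": "M0",
    "M2-Planet": "cc_x86",
    "mes": "M2-Planet",
    "mescc": "mes",
    "patched TCC": "mescc",
    "TCC 0.9.27": "patched TCC",
    "GCC 4.0.4": "TCC 0.9.27",
    "GCC 4.7.4": "GCC 4.0.4",
    "modern GCC": "GCC 4.7.4",
}


def bootstrap_path(target, built_by=BUILT_BY):
    """Return the build order from the root of the chain up to `target`."""
    path = [target]
    seen = {target}
    while path[-1] in built_by:
        parent = built_by[path[-1]]
        if parent in seen:
            # A cycle would mean no tool in the loop can be built first.
            raise ValueError(f"circular dependency at {parent!r}")
        seen.add(parent)
        path.append(parent)
    return list(reversed(path))
```

Calling `bootstrap_path("modern GCC")` walks all the way back to hex0, the only entry that is not built by anything else in the table.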