I understand how a language can bootstrap itself, but I haven't been able to find much reference on why you should consider bootstrapping.

The intuitive answer is that the language you're writing offers utilities that are not found in the "base" language of the compiler, and the language's features are relatively well-suited for a compiler.

For instance, it would make sense to bootstrap a C++ compiler -- it could potentially be much easier to maintain the compiler when OOP is properly used, as opposed to using plain C.

On the other hand, MATLAB certainly makes matrix math a lot easier than plain C, but I can't see any apparent benefits from writing a MATLAB compiler/interpreter in MATLAB -- it seems like it would become less maintainable. A similar view could be applied to the R programming language. Or a pretty extreme example would be bootstrapping Whitespace, which is written in Haskell -- definitely a massive superset of Whitespace.

Is the only reason for bootstrapping to take advantage of the new language's features? I know there's also the "because we can" reason, but that's not what I'm looking for :)


Solution 1:

There's a principle called "eating your own dogfood". By using a tool, you demonstrate the usefulness of the tool.

It is often asked, "if the compiler for language X isn't written in language X, why should I risk using it?"

This of course only applies to languages suitable for the domain of compiler writing.

Solution 2:

There are two main advantages to bootstrapped language implementations: first, as you suggest, to take advantages of the high-level features of said language in the implementation. However, a less-obvious but no less important advantage is that it lets you customize and extend the language without dropping into a lower layer written in C (or Java, or whatever sits below the new language runtime).

Metaprogramming may not be useful for most day-to-day tasks, but there are times where it can save you a lot of duplicated or boilerplate code. Being able to hook into the compiler and core runtime for a language at a high level can make advanced metaprogramming tasks much easier.

Solution 3:

Ken Thompson's Reflections on Trusting Trust explains one of the best reasons for bootstrapping. Essentially, your compiler learn new things for every version of the compiler in the bootstrapping chain that you will never have to teach it again.

In the case he mentions, The first compiler (C1) you write has to be explicitly told how to handle backslash escape. However, the second compiler (C2) is compiled using C1, so backslash escape handling is natively handled.

The cornerstone of his talk is the possibility that you could teach a compiler to add a backdoor to programs, and that future compilers compiled with the compromised compiler would also be given this ability and that it would never appear in the source!

Essentially, your program can learn new features at every compilation cycle that do not have to be reimplemented or recompiled in later compilation cycles because you compiler knows all about them already.

Take a minute to realise the ramifications.

[edit]: This is pretty terrible way to build a compiler, but the cool factor is through the roof. I wonder if it could be manageable with the right framework?