How to create a C compiler for a custom CPU?

What would be the easiest way to create a C compiler for a custom CPU, assuming of course I already have an assembler for it?

Since a C compiler generates assembly, is there some way to just define standard bits and pieces of assembly code for the various C idioms, rebuild the compiler, and thereby obtain a cross compiler for the target hardware?

Preferably the compiler itself would be written in C and would build as a native executable for either Linux or Windows.

Please note: I am not asking how to write the compiler itself. I took that course in college, and I know about general compiler-compilers, etc. In this situation, I'd just like to configure some existing framework if at all possible. I don't want to modify the language; I just want to be able to target an arbitrary architecture. If the answer turns out to be "it doesn't work that way", that information will be useful to me and to anyone else who might make similar assumptions.


Solution 1:

Quick overview/tutorial on writing an LLVM backend.

This document describes techniques for writing backends for LLVM which convert the LLVM representation to machine assembly code or other languages.

[ . . . ]

To create a static compiler (one that emits text assembly), you need to implement the following:

  • Describe the register set.
  • Describe the instruction set.
  • Describe the target machine.
  • Implement the assembly printer for the architecture.
  • Implement an instruction selector for the architecture.
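
To make the list above more concrete, here is a rough sketch in plain C of what "describing" a register set and instruction set amounts to, plus a tiny assembly printer. This is not LLVM's API: a real LLVM backend expresses these descriptions in TableGen (.td) files and C++ classes. The four-register CPU, its mnemonics, and every name below (reg_names, insn_table, print_insn) are invented for illustration only.

```c
#include <stddef.h>
#include <stdio.h>

/* Register set of a hypothetical 4-register CPU (names are made up). */
enum reg { R0, R1, R2, R3, REG_COUNT };
static const char *const reg_names[REG_COUNT] = { "r0", "r1", "r2", "r3" };

/* Instruction set: one table entry per opcode. */
enum opcode { OP_LDI, OP_ADD, OP_SUB, OP_COUNT };

struct insn_desc {
    const char *mnemonic;
    int reg_operands;   /* how many register operands are printed */
    int has_imm;        /* nonzero if an immediate follows the registers */
};

static const struct insn_desc insn_table[OP_COUNT] = {
    [OP_LDI] = { "ldi", 1, 1 },   /* ldi rd, imm    */
    [OP_ADD] = { "add", 3, 0 },   /* add rd, ra, rb */
    [OP_SUB] = { "sub", 3, 0 },   /* sub rd, ra, rb */
};

/* One concrete instruction, as an instruction selector might produce it. */
struct insn {
    enum opcode op;
    enum reg regs[3];
    int imm;
};

/* Append formatted text at buf[*used]; returns 0 on success, -1 if it
 * would not fit (including the terminating NUL). */
static int append(char *buf, size_t size, size_t *used,
                  const char *a, const char *b)
{
    int n = snprintf(buf + *used, size - *used, "%s%s", a, b);
    if (n < 0 || (size_t)n >= size - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Assembly printer: writes one instruction as text into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int print_insn(char *buf, size_t size, const struct insn *in)
{
    const struct insn_desc *d = &insn_table[in->op];
    size_t used = 0;
    int operands = 0;
    char num[16];

    if (size == 0 || append(buf, size, &used, d->mnemonic, "") < 0)
        return -1;
    for (int i = 0; i < d->reg_operands; i++, operands++)
        if (append(buf, size, &used, operands ? ", " : " ",
                   reg_names[in->regs[i]]) < 0)
            return -1;
    if (d->has_imm) {
        snprintf(num, sizeof num, "%d", in->imm);
        if (append(buf, size, &used, operands ? ", " : " ", num) < 0)
            return -1;
    }
    return (int)used;
}
```

Calling print_insn on an OP_LDI instruction with register R0 and immediate 5 fills the buffer with the text "ldi r0, 5". The point of the table-driven layout is that adding an instruction means adding a table row, not new printing code, which is roughly the idea behind describing a target declaratively.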

Solution 2:

There's the concept of a cross-compiler, i.e., one that runs on one architecture but targets a different one. You can see how GCC does it (for example) and add a new architecture to the set, if that's the compiler you want to extend.

Edit: I just spotted a question from a few years ago on a GCC mailing list about how to add a new target, and someone pointed to this

Solution 3:

The short answer is that it doesn't work that way.

The longer answer is that it does take some effort to write a compiler for a new CPU type. You don't need to create a compiler from scratch, however. Most compilers are structured in several passes; here's a typical architecture (a lot of variations are possible):

  1. Syntactic analysis (lexer and parser, plus preprocessing in the case of C), leading to an abstract syntax tree.
  2. Type checking, leading to an annotated abstract syntax tree.
  3. Intermediate code generation, leading to architecture-independent intermediate code. Some optimizations are performed at this stage.
  4. Machine code generation, leading to assembly or directly to machine code. More optimizations are performed at this stage.

In this description, only step 4 is machine-dependent. So you can take a compiler where step 4 is clearly separated and plug in your own step 4. Doing this requires a deep understanding of the CPU and some understanding of the compiler internals, but you don't need to worry about what happens before.
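
As a rough illustration of "plugging in your own step 4", the sketch below treats a backend as nothing more than a function that lowers architecture-independent intermediate code to assembly text. It is a toy under stated assumptions, not how GCC or any real compiler is organized internally: the three-operation IR, the two-operand target CPU with its ldi/mov/add/sub mnemonics, and all names (ir_insn, backend, toy2_lower, generate) are hypothetical. It also assumes virtual registers map one-to-one onto physical registers, which skips register allocation entirely.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Architecture-independent intermediate code, as steps 1-3 might produce. */
enum ir_op { IR_CONST, IR_ADD, IR_SUB };

struct ir_insn {
    enum ir_op op;
    int dst;    /* destination virtual register */
    int a, b;   /* source virtual registers (IR_ADD, IR_SUB) */
    int imm;    /* constant value (IR_CONST) */
};

/* Fixed-size text buffer that remembers whether it overflowed. */
struct asm_out {
    char *buf;
    size_t size, used;
    int overflow;
};

static void emit(struct asm_out *o, const char *fmt, ...)
{
    va_list ap;
    if (o->overflow)
        return;
    va_start(ap, fmt);
    int n = vsnprintf(o->buf + o->used, o->size - o->used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= o->size - o->used)
        o->overflow = 1;
    else
        o->used += (size_t)n;
}

/* Step 4 as a pluggable unit: one lowering function per target. */
struct backend {
    const char *name;
    void (*lower)(const struct ir_insn *in, struct asm_out *o);
};

/* Hypothetical two-operand CPU: "add rd, rs" computes rd = rd + rs.
 * Assumes dst is a fresh temporary, distinct from both sources. */
static void toy2_lower(const struct ir_insn *in, struct asm_out *o)
{
    switch (in->op) {
    case IR_CONST:
        emit(o, "ldi r%d, %d\n", in->dst, in->imm);
        break;
    case IR_ADD:
    case IR_SUB:
        /* The target overwrites its first operand, so copy a into dst
         * first: one IR operation becomes two machine instructions. */
        emit(o, "mov r%d, r%d\n", in->dst, in->a);
        emit(o, "%s r%d, r%d\n", in->op == IR_ADD ? "add" : "sub",
             in->dst, in->b);
        break;
    }
}

const struct backend toy2_backend = { "toy2", toy2_lower };

/* Machine-independent driver: runs the chosen backend over the IR.
 * Returns 0 on success, -1 if the output buffer is too small. */
int generate(const struct backend *be, const struct ir_insn *code, size_t n,
             char *buf, size_t size)
{
    struct asm_out o = { buf, size, 0, 0 };
    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++)
        be->lower(&code[i], &o);
    return o.overflow ? -1 : 0;
}
```

Feeding generate an IR sequence for "t0 = 2; t1 = 3; t2 = t0 + t1" with toy2_backend yields an ldi for each constant followed by a mov/add pair. Supporting another CPU in this toy means writing another lower function and another struct backend; the driver and everything before it stay unchanged. Real backends are far larger because they also handle register allocation, calling conventions, and instruction scheduling, which is why the deep understanding of the CPU mentioned above is needed.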

Almost all CPUs that are not very small, very rare or very old have a backend (step 4) for GCC. The main documentation for writing a GCC backend is the GCC internals manual, in particular the chapters on machine descriptions and target descriptions. GCC is free software, so there is no licensing cost in using it.