When someone writes a new programming language, what do they write it IN?
Its not a stupid question. Its an excellent question.
As already answered the short answer is, "Another language."
Well that leads to some interesting questions? What if its the very first language written for your particular piece of hardware? A very real problem for people who work on embedded devices. As already answered "a language on another computer". In fact some embedded devices will never get a compiler, their programs will always be compiled on a different computer.
But you can push it back even further. What about the first programs ever written?
Well the first compilers for "high level languages" would have been written in whats called "assembly language". Assembly language is a language where each instruction in the language corresponds to a single instruction to the CPU. Its very low level language and extremely verbose and very labor intensive to write in.
But even writing assembly language requires a program called an assembler to convert the assembly language into "machine language". We go back further. The very first assemblers were written in "machine code". A program consisting entirely of binary numbers that are a direct one-to-one correspondence with the raw language of the computer itself.
But it still doesn't end. Even a file with just raw numbers in it still needs translation. You still need to get those raw numbers in a file into the computer.
Well believe it or not the early computers had a row of switches on the front of them. You flipped the switches till they represented a binary number, then you flicked another switch and that loaded that single number into the computers memory. Then you kept going flicking switched until you had loaded a minimal computer program that could read programs from disk files or punch cards. You flicked another switch and it started the program running. When I went to university in the 80's I saw computers that had that capacity but never was given the job of loading in a program with the switches.
And even earlier than that computer programs had to be hard wired with plug boards!
The most common answer is C
. Most languages are implemented in C or in a hybrid of C with callbacks and a "lexer" like Flex and parser generator like YACC. These are languages which are used for one purpose - to describe syntax of another language. Sometimes, when it comes to compiled languages, they are first implemented in C. Then the first version of the language is used to create a new version, and so on. (Like Haskell.)
A lot of languages are bootstrapped- that is written in themselves. As to why you would want to do this, it is often a good idea to eat your own dogfood.
The wikipedia article I refer to discusses the chicken and egg issue. I think you will find it quite interesting.
Pretty much any language, though using one suited to working with graphs and other complex data structures will make many things easier. Production compilers are often written in C or C++ for performance reasons, but languages such as OCaml, SML, Prolog, and Lisp are arguably better for prototyping the language.
There are also several "little languages" used in language design. Lex and yacc are used for specifying syntax and grammars, for example, and they compile to C. (There are ports for other languages, such as ocamllex / ocamlyacc, and many other similar tools.)
As a special case, new Lisp dialects are often built on existing Lisp implementations, since they can piggyback on most of the same infrastructure. Writing a Scheme interpreter can be done in Scheme in under a page of code, at which point one can easily add new features.
Fundamentally, compilers are just programs that read in something and translate it to something else - converting LaTeX source to DVI, converting C code to assembly and then to machine language, converting a grammar specification to C code for a parser, etc. Its designer specifies the structure of the source format (parsing), what those structures mean, how to simplify the data (optimizing), and the kind of output to generate. Interpreters read the source and execute it directly. (Interpreters are typically simpler to write, but much slower.)
"Writing a new programming language" technically doesn't involve any code. It's just coming up with a specification for what your language looks like and how it works. Once you have an idea of what your language is like, you can write translators and interpreters to actually make your language "work".
A translator inputs a program in one language and outputs an equivalent program in another language. An interpreter inputs a program in some language and runs it.
For example, a C compiler typically translates C source code (the input language) to an assembly language program (the output language). The assembler then takes the assembly language program and produces machine language. Once you have your output, you don't need the translators to run your program. Since you now have a machine language program, the CPU acts as the interpreter.
Many languages are implemented differently. For example, javac
is a translator that converts Java source code to JVM bytecode. The JVM is an interpreter [1] that runs Java bytecode. After you run javac
and get bytecode, you don't need javac
anymore. However, whenever you want to run your program, you'll need the JVM.
The fact that translators don't need to be kept around to run a program is what makes it possible to "bootstrap" your language without having it end up running "on top of" layers and layers of other languages.
[1] Most JVMs do translation behind the scenes, but they're not really translators in that the interface to the JVM is not "input language -> output language".