Understanding STG
The design of GHC is based on something called STG, which stands for "spineless, tagless G-machine".
Now G-machine is apparently short for "graph reduction machine", which defines how laziness is implemented. Unevaluated thunks are stored as an expression tree, and executing the program involves reducing these down to normal form. (A tree is an acyclic graph, but Haskell's pervasive recursion means that Haskell expressions form general graphs, hence graph-reduction and not tree-reduction.)
What is less clear are the terms "spineless" and "tagless".
I think that "spineless" refers to the fact that function applications do not have a "spine" of function application nodes. Instead, you have an object that names the function called and points to all of its arguments. Is that correct?
I thought that "tagless" referred to constructor nodes not being "tagged" with a constructor ID, and instead case-expressions are resolved using a jump instruction. But now I'm not sure that's correct. Instead, it seems to refer to the fact that nodes aren't tagged with their evaluation state. Can anyone clarify which (if any) of these interpretations is correct?
GHC wiki contains an introductory article about STG written by Max Bolingbroke:
The STG machine is an essential part of GHC, the world's leading Haskell compiler. It defines how the Haskell evaluation model should be efficiently implemented on standard hardware. Despite this key role, it is generally poorly understood amongst GHC users. This document aims to provide an overview of the STG machine in its modern, eval/apply-based, pointer-tagged incarnation by a series of simple examples showing how Haskell source code is compiled.
You are right about the "Spineless", that is it, if I'm correct. It is basically described on the 1988 article by Burn, Peyton-Jones and Robson, "The Spineless G-Machine". I've read it, but it is not so fresh in my mind. Basically, on the G-Machine, all stack entries point to an application node except the one on the top, which points to the head of the expression. Those application nodes make access to the arguments indirect, and in some G-Machine descriptions, before applying a function the stack is rearranged, so that the last n nodes on the stack are made to point to the argument instead of the application node. If I am not mistaken, the "Spineless" part is about avoiding having these application nodes (which are called the spine of the graph) on the stack altogether, thus avoiding that re-arrangement before each reduction.
As to the "Tagless" part, you are more correct now that you used to be, but... Using tags on nodes is a very, very old thing. Can you think on how a dynamically-typed language such as LISP was implemented? Every cell must have its value and a tag which says the type. If you want something you must examine the tag and act accordingly. In the case of Haskell, the evaluation state is more important than type, Haskell is statically typed.
In the STG machine, tags are not used. Tags were replaced, maybe through inspiration of OO lanaguages, by a set of function pointers. When you want the value of a node which has not been computed, the function will compute it. When it is already computed, the function returns it. This allows for a lot of creativity in what this function can do without making client code any more complex.
This "Tagless" part yes, is described in the "implementation of functional languages on stock hardware" article by SPJ.
There is also objection to this "tagless" thing. Basically, this involves function pointers, which is an indirect jump in computer architecture terms. And indirect jumps are an obstacle to branch prediction and hence to pipelining in general. Because either the architecture considers there is a data dependency on the jump argument, halting the pipeline, or the architecture assumes it does not know the destination and halts the pipeline.
You want to read SPJ's book about functional PL implementation:
- http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/index.htm