In which step of compilation are comments removed?

  • Preprocessing is a phase of its own with its own scanning and parsing, which precede lexical analysis.
  • I'm a compiler writer and I've never heard of 'line reconstruction'. Compilers don't process lines: they process token streams. Your citation specifically says this is a special case for a few odd languages.
  • You've left out flow analysis, optimization, register allocation, and code generation, and a few more.
  • Comments are ignored, not removed, during lexical analysis, which is sometimes conceptually described as 'screening' and 'scanning', in which case you can say comments are screened out, like white space.

I'm going to answer with a C compiler in mind. Most compilers behave in a similar way, but the examples below are specific to C.

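Comments are removed after the line-reconstruction phase and are typically discarded during lexical analysis. (In the C standard's terms, line splicing is translation phase 2, and each comment is replaced by a single space in phase 3.) A quick way to verify this is to consider the following code: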

printf("Hello "); // *************\
printf("World");  // I like boxes!\
printf("!\n");    // ^^^^^^^^^^^^^\

When the C compiler finds a backslash immediately followed by a newline, the line-reconstruction phase deletes both characters and joins the next line onto the current one.

You can figure out what the above code results in: the comment on the first line swallows the second and third printf calls, so only "Hello " is printed.
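If you want to try this yourself, here is a minimal sketch of the same experiment wrapped in a function. The function name `print_greeting` and the `calls` counter are my own additions for illustration; they are not part of the original snippet. The last line deliberately has no trailing backslash so that the `return` statement survives. Note that the backslash must be the very last character on the line, and that GCC warns about a "multi-line comment" here when `-Wcomment` (part of `-Wall`) is enabled.

```c
#include <stdio.h>

/* Hypothetical helper: counts how many of the three statements really run.
   Lines 1 and 2 end in a backslash, so line splicing glues lines 2 and 3
   onto the // comment that starts on line 1. */
int print_greeting(void)
{
    int calls = 0;
    calls++; printf("Hello "); // *************\
    calls++; printf("World");  // I like boxes!\
    calls++; printf("!\n");    // ^^^^^^^^^^^^^
    return calls;
}
```

Calling `print_greeting()` should print only `Hello ` and return 1, because the second and third statements end up inside the comment before the lexer ever sees them.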

This design keeps the rule consistent with what we expect: a backslash followed by a newline is always spliced, whether or not it appears inside a comment.

The lexical analysis phase, however, is where tokenizing happens. This stage can conveniently ignore comments while tokenizing the code for further processing, so by the time the next phase runs, the comments are already gone!
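To make the ordering concrete, here is a rough sketch of the two steps as separate passes over a string. This is not how any particular compiler is implemented; the function names `splice_lines` and `strip_comments` are hypothetical, and real front ends usually fold this work into the scanner instead of making separate passes. It also ignores trigraphs and other corner cases.

```c
#include <string.h>

/* Step 1 (sketch): delete every backslash that is immediately followed
   by a newline. dst must have room for strlen(src) + 1 bytes. */
void splice_lines(const char *src, char *dst)
{
    while (*src) {
        if (src[0] == '\\' && src[1] == '\n')
            src += 2;                 /* drop both, joining the two lines */
        else
            *dst++ = *src++;
    }
    *dst = '\0';
}

/* Step 2 (sketch): replace each comment with a single space.
   String and character literals are copied verbatim, so a "//" inside
   a literal is not mistaken for a comment.
   dst must have room for strlen(src) + 1 bytes. */
void strip_comments(const char *src, char *dst)
{
    while (*src) {
        if (src[0] == '/' && src[1] == '/') {
            const char *eol = strchr(src, '\n');
            src = eol ? eol : src + strlen(src);  /* keep the newline */
            *dst++ = ' ';
        } else if (src[0] == '/' && src[1] == '*') {
            src += 2;
            while (*src && !(src[0] == '*' && src[1] == '/'))
                src++;
            if (*src)
                src += 2;             /* skip the closing delimiter */
            *dst++ = ' ';
        } else if (*src == '"' || *src == '\'') {
            char quote = *src;
            *dst++ = *src++;          /* opening quote */
            while (*src && *src != quote) {
                if (*src == '\\' && src[1])
                    *dst++ = *src++;  /* copy the escape backslash */
                *dst++ = *src++;
            }
            if (*src)
                *dst++ = *src++;      /* closing quote */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Running `splice_lines` first and `strip_comments` second on `a(); // x\` followed by `b(); // y` and `c();` leaves only `a();` and `c();`, because the splice has already pulled `b();` into the first comment. Running `strip_comments` alone would keep `b();`, which is why the order of the two steps matters.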

Hope this clarifies! :)

P.S.: Sources!

That line-reconstruction takes place before comments are even analyzed