Which, if any, C++ compilers do tail-recursion optimization?

Solution 1:

All current mainstream compilers perform tail call optimisation fairly well (and have done for more than a decade), even for mutually recursive calls such as:

int bar(int, int);

int foo(int n, int acc) {
    return (n == 0) ? acc : bar(n - 1, acc + 2);
}

int bar(int n, int acc) {
    return (n == 0) ? acc : foo(n - 1, acc + 1);
}

Letting the compiler do the optimisation is straightforward: Just switch on optimisation for speed:

  • For MSVC, use /O2 or /Ox.
  • For GCC, Clang and ICC, use -O3

An easy way to check if the compiler did the optimisation is to perform a call that would otherwise result in a stack overflow — or looking at the assembly output.

As an interesting historical note, tail call optimisation for C was added to the GCC in the course of a diploma thesis by Mark Probst. The thesis describes some interesting caveats in the implementation. It's worth reading.

Solution 2:

gcc 4.3.2 completely inlines this function (crappy/trivial atoi() implementation) into main(). Optimization level is -O1. I notice if I play around with it (even changing it from static to extern, the tail recursion goes away pretty fast, so I wouldn't depend on it for program correctness.

#include <stdio.h>
static int atoi(const char *str, int n)
{
    if (str == 0 || *str == 0)
        return n;
    return atoi(str+1, n*10 + *str-'0');
}
int main(int argc, char **argv)
{
    for (int i = 1; i != argc; ++i)
        printf("%s -> %d\n", argv[i], atoi(argv[i], 0));
    return 0;
}

Solution 3:

As well as the obvious (compilers don't do this sort of optimization unless you ask for it), there is a complexity about tail-call optimization in C++: destructors.

Given something like:

   int fn(int j, int i)
   {
      if (i <= 0) return j;
      Funky cls(j,i);
      return fn(j, i-1);
   }

The compiler can't (in general) tail-call optimize this because it needs to call the destructor of cls after the recursive call returns.

Sometimes the compiler can see that the destructor has no externally visible side effects (so it can be done early), but often it can't.

A particularly common form of this is where Funky is actually a std::vector or similar.

Solution 4:

Most compilers don't do any kind of optimisation in a debug build.

If using VC, try a release build with PDB info turned on - this will let you trace through the optimised app and you should hopefully see what you want then. Note, however, that debugging and tracing an optimised build will jump you around all over the place, and often you cannot inspect variables directly as they only ever end up in registers or get optimised away entirely. It's an "interesting" experience...

Solution 5:

As Greg mentions, compilers won't do it in debug mode. It's ok for debug builds to be slower than a prod build, but they shouldn't crash more often: and if you depend on a tail call optimization, they may do exactly that. Because of this it is often best to rewrite the tail call as an normal loop. :-(