Generate tail call opcode

c# recursion tail-recursion f# cil

Out of curiosity I was trying to generate a tail call opcode using C#. Fibinacci is an easy one, so my c# example looks like this:

    private static void Main(string[] args)
    {
        Console.WriteLine(Fib(int.MaxValue, 0));
    }

    public static int Fib(int i, int acc)
    {
        if (i == 0)
        {
            return acc;
        }

        return Fib(i - 1, acc + i);
    }

If I build it in release and run it without debugging I do not get a stack overflow. Debugging or running it without optimizations and I do get a stack overflow, implying that tail call is working when in release with optimizations on (which is what I expected).

The MSIL for this looks like this:

.method public hidebysig static int32 Fib(int32 i, int32 acc) cil managed
{
    // Method Start RVA 0x205e
    // Code Size 17 (0x11)
    .maxstack 8
    L_0000: ldarg.0 
    L_0001: brtrue.s L_0005
    L_0003: ldarg.1 
    L_0004: ret 
    L_0005: ldarg.0 
    L_0006: ldc.i4.1 
    L_0007: sub 
    L_0008: ldarg.1 
    L_0009: ldarg.0 
    L_000a: add 
    L_000b: call int32 [ConsoleApplication2]ConsoleApplication2.Program::Fib(int32,int32)
    L_0010: ret 
}

I would've expected to see a tail opcode, per the msdn, but it's not there. This got me wondering if the JIT compiler was responsible for putting it in there? I tried to ngen the assembly (using ngen install <exe>, navigating to the windows assemblies list to get it) and load it back up in ILSpy but it looks the same to me:

.method public hidebysig static int32 Fib(int32 i, int32 acc) cil managed
{
    // Method Start RVA 0x3bfe
    // Code Size 17 (0x11)
    .maxstack 8
    L_0000: ldarg.0 
    L_0001: brtrue.s L_0005
    L_0003: ldarg.1 
    L_0004: ret 
    L_0005: ldarg.0 
    L_0006: ldc.i4.1 
    L_0007: sub 
    L_0008: ldarg.1 
    L_0009: ldarg.0 
    L_000a: add 
    L_000b: call int32 [ConsoleApplication2]ConsoleApplication2.Program::Fib(int32,int32)
    L_0010: ret 
}

I still don't see it.

I know F# handles tail call well, so I wanted to compare what F# did with what C# did. My F# example looks like this:

let rec fibb i acc =  
    if i = 0 then
        acc
    else 
        fibb (i-1) (acc + i)


Console.WriteLine (fibb 3 0)

And the generated IL for the fib method looks like this:

.method public static int32 fibb(int32 i, int32 acc) cil managed
{
    // Method Start RVA 0x2068
    // Code Size 18 (0x12)
    .custom instance void [FSharp.Core]Microsoft.FSharp.Core.CompilationArgumentCountsAttribute::.ctor(int32[]) = { int32[](Mono.Cecil.CustomAttributeArgument[]) }
    .maxstack 5
    L_0000: nop 
    L_0001: ldarg.0 
    L_0002: brtrue.s L_0006
    L_0004: ldarg.1 
    L_0005: ret 
    L_0006: ldarg.0 
    L_0007: ldc.i4.1 
    L_0008: sub 
    L_0009: ldarg.1 
    L_000a: ldarg.0 
    L_000b: add 
    L_000c: starg.s acc
    L_000e: starg.s i
    L_0010: br.s L_0000
}

Which, according to ILSpy, is equivalent to this:

[Microsoft.FSharp.Core.CompilationArgumentCounts(Mono.Cecil.CustomAttributeArgument[])]
public static int32 fibb(int32 i, int32 acc)
{
    label1:
    if !(((i != 0))) 
    {
        return acc;
    }
    (i - 1);
    i = acc = (acc + i);;
    goto label1;
}

So F# generated tail call using goto statements? This isn't what I was expecting.

I'm not trying to rely on tail call anywhere, but I am just curious where exactly does that opcode get set? How is C# doing this?

C# compiler does not give you any guarantees about tail-call optimizations because C# programs usually use loops and so they do not rely on the tail-call optimizations. So, in C#, this is simply a JIT optimization that may or may not happen (and you cannot rely on it).

F# compiler is designed to handle functional code that uses recursion and so it does give you certain guarantees about tail-calls. This is done in two ways:

if you write a recursive function that calls itself (like your fib) the compiler turns it into a function that uses loop in the body (this is a simple optimization and the produced code is faster than using a tail-call)
if you use a recursive call in a more complex position (when using continuation passing style where function is passed as an argument), then the compiler generates a tail-call instruction that tells the JIT that it must use a tail call.

As an example of the second case, compile the following simple F# function (F# does not do this in Debug mode to simplify debugging, so you may need Release mode or add --tailcalls+):

let foo a cont = cont (a + 1)

The function simply calls the function cont with the first argument incremented by one. In continuation passing style, you have a long sequence of such calls, so the optimization is crucial (you simply cannot use this style without some handling of tail calls). The generates IL code looks like this:

IL_0000: ldarg.1
IL_0001: ldarg.0
IL_0002: ldc.i4.1
IL_0003: add
IL_0004: tail.                          // Here is the 'tail' opcode!
IL_0006: callvirt instance !1 
  class [FSharp.Core] Microsoft.FSharp.Core.FSharpFunc`2<int32, !!a>::Invoke(!0)
IL_000b: ret

The situation with tail call optimization in .Net is quite complicated. As far as I know, it's like this:

The C# compiler will never emit the tail. opcode and it will also never do the tail call optimization by itself.
The F# compiler sometimes emits the tail. opcode and sometimes does the tail call optimization by itself by emitting IL that's not recursive.
The CLR will honor the tail. opcode if it's present and the 64-bit CLR will sometimes make the tail call optimization even when the opcode is not present.

So, in your case, you didn't see the tail. opcode in the IL generated by the C# compiler, because it doesn't do that. But the method was tail-call optimized, because the CLR sometimes does that even without the opcode.

And in the F# case, you observed that the f# compiler did the optimization by itself.

Like all optimizations performed in .NET (Roslyn languages), tail call optimization is a job performed by the jitter, not the compiler. The philosophy is that putting the job on the jitter is useful since any language will benefit from it and the normally difficult job of writing and debugging a code optimizer has to be done only once per architecture.

You have to look at the generated machine code to see it being done, Debug + Windows + Disassembly. With the further requirement that you do so by looking at the Release build code that's generated with the Tools + Options, Debugging, General, Suppress JIT optimization unticked.

The x64 code looks like this:

        public static int Fib(int i, int acc) {
            if (i == 0) {
00000000  test        ecx,ecx 
00000002  jne         0000000000000008 
                return acc;
00000004  mov         eax,edx 
00000006  jmp         0000000000000011 
            }

            return Fib(i - 1, acc + i);
00000008  lea         eax,[rcx-1] 
0000000b  add         edx,ecx 
0000000d  mov         ecx,eax 
0000000f  jmp         0000000000000000              // <== here!!!
00000011  rep ret

Note the marked instruction, a jump instead of a call. That's tail call optimization at work. A quirk in .NET is that the 32-bit x86 jitter does not perform this optimization. Simply a to-do item that they'll probably never get around to. Which did require the F# compiler writers to not ignore the problem and emit Opcodes.Tailcall. You'll find other optimizations performed by the jitter documented in this answer.

Generate tail call opcode

Related

Recent Posts