Performance of "direct" virtual call vs. interface call in C#

I think the article Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects will answer your questions. In particular, see the section *Interface Vtable Map and Interface Map*, and the following section on *Virtual Dispatch*.

It's probably possible for the JIT compiler to figure things out and optimize the code in your simple case, but not in the general case. For example, if you write:

    IFoo f2 = GetAFoo();

and GetAFoo is defined as returning an IFoo, then the JIT compiler wouldn't be able to optimize the call, because the concrete type behind f2 is not known at the call site.
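To make the two cases concrete, here is a minimal sketch; the Foo class and the empty method bodies are just placeholders:

    interface IFoo { void Bar(); }

    class Foo : IFoo
    {
        public virtual void Bar() { }
    }

    class Program
    {
        // The static return type is IFoo, so callers only see the interface.
        static IFoo GetAFoo() { return new Foo(); }

        static void Main()
        {
            Foo f = new Foo();
            f.Bar();  // static type is Foo: a plain virtual call through the vtable

            IFoo f2 = GetAFoo();
            f2.Bar(); // static type is IFoo: requires interface dispatch
        }
    }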


Here is what the disassembly looks like (Hans is correct); the annotations are my reading of it:

            f.Bar(); // This is faster.
00000062  mov         rax,qword ptr [rsp+20h]  ; load the object reference
00000067  mov         rax,qword ptr [rax]      ; load its method table pointer
0000006a  mov         rcx,qword ptr [rsp+20h]  ; pass 'this' in rcx
0000006f  call        qword ptr [rax+60h]      ; call straight through a vtable slot
            f2.Bar();
00000072  mov         r11,7FF000400A0h         ; load the address of the interface dispatch cell
0000007c  mov         qword ptr [rsp+38h],r11  ; spill it to the stack
00000081  mov         rax,qword ptr [rsp+28h]  ; load the object reference
00000086  cmp         byte ptr [rax],0         ; null check
00000089  mov         rcx,qword ptr [rsp+28h]  ; pass 'this' in rcx
0000008e  mov         r11,qword ptr [rsp+38h]  ; reload the cell address into r11
00000093  mov         rax,qword ptr [rsp+38h]  ; ...and into rax
00000098  call        qword ptr [rax]          ; call through the stub stored in the cell
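As far as I can tell, the first sequence is a plain virtual dispatch: load the method table and call through one of its slots. The interface call goes through an extra level of indirection, a dispatch cell maintained by the CLR's virtual stub dispatch mechanism, plus an explicit null check, which is where the extra instructions come from.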

I tried your test on my machine and, in one particular configuration, the result is actually the other way around.

I am running Windows 7 x64, and I created a Visual Studio 2010 Console Application project into which I copied your code. If I compile the project in Debug mode with the platform target set to x86, the output is the following:

Direct call: 48.38
Through interface: 42.43

Every run of the application produces slightly different results, but the interface calls are always faster. I assume that, since the application is compiled as x86, it is run by the OS through WoW64.

For complete reference, below are the results for the remaining build configuration and platform target combinations.

Release mode and x86 target
Direct call: 23.02
Through interface: 32.73

Debug mode and x64 target
Direct call: 49.49
Through interface: 56.97

Release mode and x64 target
Direct call: 19.60
Through interface: 26.45

All of the above tests were made with .NET 4.0 as the target framework. When switching to 3.5 and repeating them, the calls through the interface were always slower than the direct calls.
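The question's code isn't reproduced here, but the shape of the micro-benchmark is roughly the following; the iteration count and the output format are my assumptions:

    using System;
    using System.Diagnostics;

    interface IFoo { void Bar(); }

    class Foo : IFoo
    {
        public virtual void Bar() { }
    }

    class Program
    {
        const int Iterations = 100000000; // assumed iteration count

        static void Main()
        {
            Foo f = new Foo();
            IFoo f2 = f; // same object, seen through the interface

            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < Iterations; i++)
                f.Bar();   // "direct" virtual call
            sw.Stop();
            Console.WriteLine("Direct call: {0:F2}", sw.Elapsed.TotalMilliseconds);

            sw = Stopwatch.StartNew();
            for (int i = 0; i < Iterations; i++)
                f2.Bar();  // call through the interface
            sw.Stop();
            Console.WriteLine("Through interface: {0:F2}", sw.Elapsed.TotalMilliseconds);
        }
    }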

So, the above tests rather complicate things, since it seems that the behavior you spotted does not always occur.

In the end, at the risk of upsetting you, I would like to add a few thoughts. Many people commented that the performance differences are quite small and that in real-world programming you should not care about them, and I agree with this point of view. There are two main reasons for it.

The first and most advertised one is that .NET was built at a higher level in order to let developers focus on the higher levels of their applications. A database or external service call is thousands, or sometimes millions, of times slower than a virtual method call. A good high-level architecture and a focus on the big performance consumers will always bring better results in modern applications than avoiding double pointer dereferences.

The second and more obscure one is that, by building the framework at a higher level, the .NET team introduced a series of abstraction layers which the just-in-time compiler can use for optimizations on different platforms. The more access they gave to the underlying layers, the more developers could optimize for a specific platform, but the less the runtime compiler could do for the others. That is the theory, at least, and that is why things are not as well documented as in C++ regarding this particular matter.