Performance of compiled-to-delegate Expression

Solution 1:

This is pretty strange for such a huge overheard. There are a few things to take into account. First the VS compiled code has different properties applied to it that might influence the jitter to optimize differently.

Are you including the first execution for the compiled delegate in these results? You shouldn't, you should ignore the first execution of either code path. You should also turn the normal code into a delegate as delegate invocation is slightly slower than invoking an instance method, which is slower than invoking a static method.

As for other changes there is something to account for the fact that the compiled delegate has a closure object which isn't being used here but means that this is a targeted delegate which might perform a bit slower. You'll notice the compiled delegate has a target object and all the arguments are shifted down by one.

Also methods generated by lcg are considered static which tend to be slower when compiled to delegates than instance methods because of register switching business. (Duffy said that the "this" pointer has a reserved register in CLR and when you have a delegate for a static it has to be shifted to a different register invoking a slight overhead). Finally, code generated at runtime seems to run slightly slower than code generated by VS. Code generated at runtime seems to have extra sandboxing and is launched from a different assembly (try using something like ldftn opcode or calli opcode if you don't believe me, those reflection.emited delegates will compile but won't let you actually execute them) which invokes a minimal overhead.

Also you are running in release mode right? There was a similar topic where we looked over this problem here: Why is Func<> created from Expression<Func<>> slower than Func<> declared directly?

Edit: Also see my answer here: DynamicMethod is much slower than compiled IL function

The main takeaway is that you should add the following code to the assembly where you plan to create and invoke run-time generated code.

[assembly: AllowPartiallyTrustedCallers]
[assembly: SecurityTransparent]
[assembly: SecurityRules(SecurityRuleSet.Level2,SkipVerificationInFullTrust=true)]

And to always use a built-in delegate type or one from an assembly with those flags.

The reason being that anonymous dynamic code is hosted in an assembly that is always marked as partial trust. By allowing partially trusted callers you can skip part of the handshake. The transparency means that your code is not going to raise the security level (i.e. slow behavior), And finally the real trick is to invoke a delegate type hosted in an assembly that is marked as skip verification. Func<int,int>#Invoke is fully trusted, so no verification is needed. This will give you performance of code generated from the VS compiler. By not using these attributes you are looking at an overhead in .NET 4. You might think that SecurityRuleSet.Level1 would be a good way to avoid this overhead, but switching security models is also expensive.

In short, add those attributes, and then your micro-loop performance test, will run about the same.

Solution 2:

It sounds like you're running into invocation overhead. Regardless of the source, though, if your method runs faster when loaded from a compiled assembly, simply compile it into an assembly and load it! See my answer at Why is Func<> created from Expression<Func<>> slower than Func<> declared directly? for more details on how.

Solution 3:

Check these links to see what happens when you compile your LambdaExpression (and yes, it is done using Reflection)

  1. http://msdn.microsoft.com/en-us/magazine/cc163759.aspx#S3
  2. http://blogs.msdn.com/b/ericgu/archive/2004/03/19/92911.aspx

Solution 4:

You are may compile Expression Tree manually via Reflection.Emit. It will generally provide faster compilation time (in my case below ~30 times faster), and will allow you to tune emitted result performance. And it not so hard to do, especially if your Expressions are limited known subset.

The idea is to use ExpressionVisitor to traverse the expression and emit the IL for corresponding expression type. It's also "quite" simple to write your own Visitor to handle the known subset of expressions, and fallback to normal Expression.Compile for not yet supported expression types.

In my case I am generating the delegate:

Func<object[], object> createA = state =>
    new A(
        new B(), 
        (string)state[11], 
        new ID[2] { new D1(), new D2() }) { 
        Prop = new P(new B()), Bop = new B() 
    };

The test creates the corresponding expression tree and compares its Expression.Compile vs visiting and emitting the IL and then creating delegate from DynamicMethod.

The results:

Compile Expression 3000 times: 814
Invoke Compiled Expression 5000000 times: 724
Emit from Expression 3000 times: 36
Run Emitted Expression 5000000 times: 722

36 vs 814 when compiling manually.

Here the full code.

Solution 5:

I think that's the impact of having Reflection at this point. The second method is using reflection to get and set the values. As far as I can see that at this point, it's not the delegate, but the reflection that costs its time.

About the third solution: Also Lambda Expressions need to be evaluated at runtime, which also costs its time. And that's not few...

So you'll never get the second and third solution as fast as the manual copying.

Have a look at my code samples here. Think that is propably the fasted solution you can take, if you don't want manual coding: http://jachman.wordpress.com/2006/08/22/2000-faster-using-dynamic-method-calls/