How does deferred LINQ query execution actually work?
Recently I was asked the following question:
What numbers will the following code print?
using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        int[] numbers = { 1, 3, 5, 7, 9 };
        int threshold = 6;
        var query = from value in numbers where value >= threshold select value;
        threshold = 3;
        var result = query.ToList();
        result.ForEach(Console.WriteLine);
        Console.ReadLine();
    }
}
Answer: 3, 5, 7, 9
This was quite surprising to me. I thought that the value of threshold would be put onto the stack when the query was constructed and that, at execution time, that number would be read back and used in the condition. That is not what happened.
Another case (numbers is set to null just before execution):
static void Main(string[] args)
{
    int[] numbers = { 1, 3, 5, 7, 9 };
    int threshold = 6;
    var query = from value in numbers where value >= threshold select value;
    threshold = 3;
    numbers = null;
    var result = query.ToList();
    ...
}
This seems to have no effect on the query. It prints exactly the same answer as the previous example.
Could anyone help me understand what is really going on behind the scenes? Why does changing threshold affect the query execution while changing numbers doesn't?
Solution 1:
Your query can be written like this in method syntax:
var query = numbers.Where(value => value >= threshold);
Or:
Func<int, bool> predicate = delegate(int value) {
    return value >= threshold;
};
IEnumerable<int> query = numbers.Where(predicate);
These pieces of code (including your own query in query syntax) are all equivalent.
When you unroll the query like that, you see that predicate is an anonymous method and threshold is a variable captured by that method (the two together form a closure). The variable itself is captured, not a copy of its value, so the method sees whatever value threshold holds at the time it executes. The compiler generates an actual (non-anonymous) method to take care of that. The method is not executed where it is declared, but once for each item when query is enumerated (the execution is deferred). Since the enumeration happens after the value of threshold is changed, the new value is used.
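The capture of the variable, rather than of its value, can be seen without LINQ at all. The following is a minimal sketch; the class name CaptureDemo and the method Run are made up for illustration.

```csharp
using System;

public static class CaptureDemo
{
    // Asks the same delegate about the value 5 before and after
    // the captured local variable is changed.
    public static (bool Before, bool After) Run()
    {
        int threshold = 6;
        Func<int, bool> predicate = value => value >= threshold;

        bool before = predicate(5);   // threshold is 6 at this point

        threshold = 3;                // no new delegate is created
        bool after = predicate(5);    // the same delegate now sees 3

        return (before, after);       // (false, true)
    }
}
```

The delegate is created once, yet its answer changes, because it reads threshold each time it is invoked.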
When you set numbers to null, you only clear that one reference; the array object still exists. The IEnumerable returned by Where (and referenced in query) still holds its own reference to the array, so it does not matter that the initial reference is null now.
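A small sketch of this, assuming the same array as in the question (SourceReferenceDemo and Run are illustrative names): changing an element of the array is visible to the query, while setting the variable to null is not.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SourceReferenceDemo
{
    public static List<int> Run()
    {
        int[] numbers = { 1, 3, 5, 7, 9 };
        IEnumerable<int> query = numbers.Where(value => value >= 3);

        numbers[0] = 11;  // changes the array object the query still refers to
        numbers = null;   // only clears the local variable

        return query.ToList(); // 11, 3, 5, 7, 9
    }
}
```

The first element passes the filter only because the array was modified before the enumeration, which shows that the query reads the original array object at enumeration time.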
That explains the behavior: numbers and threshold play different roles in the deferred execution. numbers is a reference to the array that is enumerated, and that reference is copied into the query when Where is called, while threshold is a local variable that is captured by the anonymous method and read again each time the method runs.
Extension, part 1: Modification of the closure during the enumeration
You can take your example one step further when you replace the line...
var result = query.ToList();
...with:
List<int> result = new List<int>();
foreach (int value in query) {
    threshold = 8;
    result.Add(value);
}
What you are doing here is changing the value of threshold during the iteration of your array. When you hit the body of the loop the first time (when value is 3), you change the threshold to 8, which means the values 5 and 7 will be skipped and the next value added to the list is 9. The reason is that threshold is evaluated again for each item, and the value that is valid at that moment is used. Since the threshold has changed to 8, the numbers 5 and 7 no longer evaluate as greater than or equal to it.
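Put together as a self-contained sketch (MidEnumerationDemo and Run are illustrative names, and threshold starts at 3 as it does in the question just before the query runs):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MidEnumerationDemo
{
    public static List<int> Run()
    {
        int[] numbers = { 1, 3, 5, 7, 9 };
        int threshold = 3;
        IEnumerable<int> query = numbers.Where(value => value >= threshold);

        var result = new List<int>();
        foreach (int value in query)
        {
            threshold = 8;      // applies to every item tested after this point
            result.Add(value);
        }
        return result;          // 3, 9
    }
}
```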
Extension, part 2: Entity Framework is different
To make things more complicated, when you use a LINQ provider that translates your original query into a different query and then executes that, things are slightly different. The most common examples are Entity Framework (EF) and LINQ to SQL (now largely superseded by EF). These providers create an SQL query from the original query before the enumeration. Here the compiler generates an expression tree rather than an anonymous method, and the provider reads the value of threshold only once, when it builds the SQL query. Changes to threshold during the enumeration therefore have no effect on the result: they happen after the query has been submitted to the database.
The lesson from this is that you always have to be aware of which flavor of LINQ you are using, and that some understanding of its inner workings is an advantage.
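The difference can be imitated without a database. In the sketch below, ReadThreshold is a hypothetical stand-in for the step where a provider turns the expression tree into a query parameter; it is not how EF is actually implemented, and it assumes the predicate has the exact shape value >= threshold.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // Hypothetical translation step: evaluates the right-hand side of the
    // comparison, which is a field access on the compiler-generated closure object.
    public static int ReadThreshold(Expression<Func<int, bool>> predicate)
    {
        var comparison = (BinaryExpression)predicate.Body;          // value >= threshold
        var read = Expression.Lambda<Func<int>>(comparison.Right);  // closure.threshold
        return read.Compile()();
    }

    public static (int Snapshot, int Later) Run()
    {
        int threshold = 3;
        Expression<Func<int, bool>> predicate = value => value >= threshold;

        int snapshot = ReadThreshold(predicate); // taken once, like a SQL parameter
        threshold = 8;                           // too late to change the snapshot

        return (snapshot, ReadThreshold(predicate)); // (3, 8)
    }
}
```

The tree still points at the variable, which is why a second read returns 8. A provider that has already sent its query with the snapshot value simply never performs that second read.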
Solution 2:
The easiest way is to look at what the compiler generates. You can use this site: https://sharplab.io
using System.Linq;
public class MyClass
{
    public void MyMethod()
    {
        int[] numbers = { 1, 3, 5, 7, 9 };
        int threshold = 6;
        var query = from value in numbers where value >= threshold select value;
        threshold = 3;
        numbers = null;
        var result = query.ToList();
    }
}
And here is the output:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Permissions;
[assembly: AssemblyVersion("0.0.0.0")]
[assembly: Debuggable(DebuggableAttribute.DebuggingModes.Default | DebuggableAttribute.DebuggingModes.DisableOptimizations | DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints | DebuggableAttribute.DebuggingModes.EnableEditAndContinue)]
[assembly: CompilationRelaxations(8)]
[assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
[assembly: SecurityPermission(SecurityAction.RequestMinimum, SkipVerification = true)]
[module: UnverifiableCode]
public class MyClass
{
    [CompilerGenerated]
    private sealed class <>c__DisplayClass0_0
    {
        public int threshold;
        internal bool <MyMethod>b__0(int value)
        {
            return value >= this.threshold;
        }
    }
    public void MyMethod()
    {
        MyClass.<>c__DisplayClass0_0 <>c__DisplayClass0_ = new MyClass.<>c__DisplayClass0_0();
        int[] expr_0D = new int[5];
        RuntimeHelpers.InitializeArray(expr_0D, fieldof(<PrivateImplementationDetails>.D603F5B3D40E40D770E3887027E5A6617058C433).FieldHandle);
        int[] source = expr_0D;
        <>c__DisplayClass0_.threshold = 6;
        IEnumerable<int> source2 = source.Where(new Func<int, bool>(<>c__DisplayClass0_.<MyMethod>b__0));
        <>c__DisplayClass0_.threshold = 3;
        List<int> list = source2.ToList<int>();
    }
}
[CompilerGenerated]
internal sealed class <PrivateImplementationDetails>
{
    [StructLayout(LayoutKind.Explicit, Pack = 1, Size = 20)]
    private struct __StaticArrayInitTypeSize=20
    {
    }
    internal static readonly <PrivateImplementationDetails>.__StaticArrayInitTypeSize=20 D603F5B3D40E40D770E3887027E5A6617058C433 = bytearray(1, 0, 0, 0, 3, 0, 0, 0, 5, 0, 0, 0, 7, 0, 0, 0, 9, 0, 0, 0);
}
As you can see, when you change the threshold variable, you are really changing a field in an auto-generated class. Because the query can be executed at any time, the delegate cannot refer to a variable that lives on the stack: when you exit the method, threshold would be removed from the stack. So the compiler turns the local variable into a field of the same type in an auto-generated class.
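The generated names above are not legal C#, so here is a hand-written counterpart as a rough sketch. ThresholdHolder and IsAtLeastThreshold are made-up names standing in for <>c__DisplayClass0_0 and <MyMethod>b__0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HandWrittenClosureDemo
{
    // Readable stand-in for the compiler-generated class.
    private sealed class ThresholdHolder
    {
        public int threshold;

        public bool IsAtLeastThreshold(int value)
        {
            return value >= this.threshold;
        }
    }

    public static List<int> Run()
    {
        var holder = new ThresholdHolder();   // a heap object, so it outlives the method call
        int[] numbers = { 1, 3, 5, 7, 9 };

        holder.threshold = 6;
        IEnumerable<int> query = numbers.Where(new Func<int, bool>(holder.IsAtLeastThreshold));
        holder.threshold = 3;                 // same object the delegate is bound to

        return query.ToList();                // 3, 5, 7, 9
    }
}
```

Written this way, there is nothing surprising left: the delegate is bound to holder, and the field is changed before the delegate is ever invoked.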
Now the second question: why setting numbers to null has no effect (this is not visible in the code above).
When you call source.Where, it calls this extension method:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
    if (source == null) throw Error.ArgumentNull("source");
    if (predicate == null) throw Error.ArgumentNull("predicate");
    if (source is Iterator<TSource>) return ((Iterator<TSource>)source).Where(predicate);
    if (source is TSource[]) return new WhereArrayIterator<TSource>((TSource[])source, predicate);
    if (source is List<TSource>) return new WhereListIterator<TSource>((List<TSource>)source, predicate);
    return new WhereEnumerableIterator<TSource>(source, predicate);
}
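As you can see, it passes the source reference on to an iterator object (for an array that is WhereArrayIterator<TSource>; the general case is shown here):
WhereEnumerableIterator<TSource>(source, predicate);
And here is the source code for WhereEnumerableIterator: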
class WhereEnumerableIterator<TSource> : Iterator<TSource>
{
    IEnumerable<TSource> source;
    Func<TSource, bool> predicate;
    IEnumerator<TSource> enumerator;
    public WhereEnumerableIterator(IEnumerable<TSource> source, Func<TSource, bool> predicate) {
        this.source = source;
        this.predicate = predicate;
    }
    public override Iterator<TSource> Clone() {
        return new WhereEnumerableIterator<TSource>(source, predicate);
    }
    public override void Dispose() {
        if (enumerator is IDisposable) ((IDisposable)enumerator).Dispose();
        enumerator = null;
        base.Dispose();
    }
    public override bool MoveNext() {
        switch (state) {
            case 1:
                enumerator = source.GetEnumerator();
                state = 2;
                goto case 2;
            case 2:
                while (enumerator.MoveNext()) {
                    TSource item = enumerator.Current;
                    if (predicate(item)) {
                        current = item;
                        return true;
                    }
                }
                Dispose();
                break;
        }
        return false;
    }
    public override IEnumerable<TResult> Select<TResult>(Func<TSource, TResult> selector) {
        return new WhereSelectEnumerableIterator<TSource, TResult>(source, predicate, selector);
    }
    public override IEnumerable<TSource> Where(Func<TSource, bool> predicate) {
        return new WhereEnumerableIterator<TSource>(source, CombinePredicates(this.predicate, predicate));
    }
}
So it simply keeps a reference to our source object in a private field.
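The same source also shows that the stored reference is not used until the first MoveNext call, where source.GetEnumerator() is invoked. This can be observed from the outside with a small sketch; CountingSource is a hypothetical wrapper written only for this illustration.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class LazySourceDemo
{
    // Counts how often the source is asked for an enumerator.
    private sealed class CountingSource : IEnumerable<int>
    {
        private readonly int[] items;
        public int EnumeratorRequests { get; private set; }

        public CountingSource(int[] items) { this.items = items; }

        public IEnumerator<int> GetEnumerator()
        {
            EnumeratorRequests++;
            return ((IEnumerable<int>)items).GetEnumerator();
        }

        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    }

    public static (int AfterWhere, int AfterToList) Run()
    {
        var source = new CountingSource(new[] { 1, 3, 5, 7, 9 });
        IEnumerable<int> query = source.Where(value => value >= 3);

        int afterWhere = source.EnumeratorRequests;   // Where only stored the reference
        query.ToList();                               // enumeration asks for the enumerator
        int afterToList = source.EnumeratorRequests;

        return (afterWhere, afterToList);             // (0, 1)
    }
}
```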