From Eric Lippert's blog: "don't close over the loop variable" [duplicate]

Possible Duplicates:
Why is it bad to use a iteration variable in a lambda expression
C# - The foreach identifier and closures

From Eric Lippert's 28 June 2010 entry:

static IEnumerable<IEnumerable<T>>
  CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
{
  // base case:
  IEnumerable<IEnumerable<T>> result = new[] { Enumerable.Empty<T>() };

  foreach(var sequence in sequences)
  {
    var s = sequence; // don't close over the loop variable

    // recursive case: use SelectMany to build the new product out of the old one
    result = 
      from seq in result
      from item in s
      select seq.Concat(new[] {item});
  }

  return result;
}

The var s = sequence; looks like a no-op. Why isn't it one? What goes wrong when sequence is used directly?

And, more subjectively: to what extent is this considered a wart in C#'s behaviour?


Solution 1:

A couple of relevant articles from Eric himself, along with some interesting discussion in the comments:

  • Closing over the loop variable considered harmful
  • Closing over the loop variable, part two

Solution 2:

It's a subtle scope issue that has to do with how closures and deferred execution work.

If you don't use a local variable, but instead sequence directly, the result IEnumarable is bound to the VARIABLE sequence rather than the VALUE of sequence and at the time the query is executed, the VARIABLE sequence contains the LAST VALUE of sequences.

If you declare another local variable as in Eric's example, the scope is limited to each loop iteration. Therefore it will be evaluated as intended, even if execution is deferred.

Solution 3:

The LINQ query being used here causes the value of s to be available outside the scope of where it was originally defined (that is, the CartesianProduct method). This is what is referred to as a closure. Due to delayed execution, by the time the LINQ query is actually evaluated (assuming it is eventually evaluated), the enclosing method will have been completed and the s variable will be 'out of scope', at least according to the traditional scope rules. Despite this, it is still 'safe' in this context to refer s.

Closures are very convenient and well-behaved in a traditional functional programming language where things are naturally immutable. And the fact that C# is foremost an imperative programming language, where variables are mutable by default, is the basis for the issue that results in this strange work-around.

By creating the intermediate variable within the scope of the loop, you are effectively instructing the compiler to allocate a separate, non-shared variable for each iteration of the LINQ query. Otherwise, each iteration will share the same instance of the variable, which will also be (obviously) the same value...probably not what you want.