How to tell if an IEnumerable<T> is subject to deferred execution?
I always assumed that if I was using Select(x => ...) in the context of LINQ to Objects, then the new collection would be immediately created and remain static. I'm not quite sure WHY I assumed this, and it's a very bad assumption, but I did. I often use .ToList() elsewhere, but often not in this case.
This code demonstrates that even a simple 'Select' is subject to deferred execution:
```csharp
var random = new Random();
var animals = new[] { "cat", "dog", "mouse" };
var randomNumberOfAnimals = animals.Select(x => Math.Floor(random.NextDouble() * 100) + " " + x + "s");

foreach (var i in randomNumberOfAnimals)
{
    testContextInstance.WriteLine("There are " + i);
}

foreach (var i in randomNumberOfAnimals)
{
    testContextInstance.WriteLine("And now, there are " + i);
}
```
This outputs the following (the random function is called every time the collection is iterated through):
There are 75 cats
There are 28 dogs
There are 62 mouses
And now, there are 78 cats
And now, there are 69 dogs
And now, there are 43 mouses
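The same effect can be made visible without randomness. This is a minimal sketch, not the original poster's code: a hypothetical counter (`calls`) is incremented inside the selector, so you can see how many times the selector runs when the query is enumerated twice, and what `ToList()` changes. The class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSelectDemo
{
    // Enumerates the same query twice and returns how many times the selector ran.
    public static int CountSelectorCalls(bool materialize)
    {
        var animals = new[] { "cat", "dog", "mouse" };
        int calls = 0; // hypothetical counter, captured by the lambda below

        IEnumerable<string> query = animals.Select(x =>
        {
            calls++;
            return x + "s";
        });

        if (materialize)
        {
            // ToList() runs the selector once per element, right now.
            query = query.ToList();
        }

        foreach (var s in query) { } // first pass
        foreach (var s in query) { } // second pass

        return calls;
    }
}
```

With `materialize: false` the selector runs six times (three elements, two passes); with `materialize: true` it runs three times, because both passes read the stored list.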
I have many places where I have an IEnumerable<T> as a member of a class. Often the results of a LINQ query are assigned to such an IEnumerable<T>. Usually this does not cause issues for me, but I have recently found a few places in my code where it poses more than just a performance issue.
In trying to find places where I had made this mistake, I thought I could check whether a particular IEnumerable<T> was of type IQueryable. This, I thought, would tell me whether the collection was 'deferred' or not. It turns out that the object returned by the Select operator above is of type System.Linq.Enumerable+WhereSelectArrayIterator`2[System.String,System.String] and not IQueryable.
I used Reflector to see what this class inherits from, and it turns out not to inherit from anything that indicates it is 'LINQ' at all, so there is no way to test based upon the collection type.
I'm quite happy putting .ToArray() everywhere now, but I'd like to have a mechanism to make sure this problem doesn't happen in the future. Visual Studio seems to know how to do it, because it gives a message about 'expanding the results view will evaluate the collection.'
The best I have come up with is:
```csharp
bool deferred = !object.ReferenceEquals(randomNumberOfAnimals.First(),
                                        randomNumberOfAnimals.First());
```
Edit: This only works if a new object is created by the 'Select', and it is not a generic solution. I'm not recommending it in any case, though! It was a somewhat tongue-in-cheek solution.
Solution 1:
Deferred execution of LINQ has trapped a lot of people, you're not alone.
The approach I've taken to avoiding this problem is as follows:
- Parameters to methods: use IEnumerable<T> unless there's a need for a more specific interface.
- Local variables: these are usually declared at the point where I create the LINQ query, so I'll know whether lazy evaluation is possible.
- Class members: never use IEnumerable<T>; always use List<T>, and always make them private.
- Properties: use IEnumerable<T>, and convert for storage in the setter.
```csharp
public IEnumerable<Person> People
{
    get { return people; }
    set { people = value.ToList(); } // snapshot once, at assignment (needs System.Linq)
}

private List<Person> people;
```
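To check that the setter really snapshots the sequence, here is a small sketch built around the property pattern above. `Team`, `Person` and `SnapshotDemo` are hypothetical types made up for this example; the counter records how often the assigned query builds a `Person`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; } = "";
}

public class Team
{
    private List<Person> people = new List<Person>();

    // Accepts any sequence, but stores a materialized copy of it.
    public IEnumerable<Person> People
    {
        get { return people; }
        set { people = value.ToList(); }
    }
}

public static class SnapshotDemo
{
    // Assigns a lazy query to Team.People, then reads the property twice.
    public static (int created, bool sameObjects) Run()
    {
        int created = 0;
        var names = new[] { "Ann", "Bob", "Cy" };

        var team = new Team();
        team.People = names.Select(n =>
        {
            created++; // counts how often the query actually builds a Person
            return new Person { Name = n };
        });

        // Both reads hit the stored List<Person>; the query is not re-run,
        // so the same instances come back and 'created' stays at 3.
        bool same = ReferenceEquals(team.People.First(), team.People.First());
        return (created, same);
    }
}
```

Compare this with the ReferenceEquals check in the question: once the setter has materialized the query, `First()` returns the same instance each time.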
While there are theoretical cases where this approach wouldn't work, I've not run into one yet, and I've been enthusiastically using the LINQ extension methods since late Beta.
BTW: I'm curious why you use ToArray() instead of ToList(). To me, lists have a much nicer API, and there's (almost) no performance cost.
Update: A couple of commenters have rightly pointed out that arrays have a theoretical performance advantage, so I've amended my statement above to "... there's (almost) no performance cost."
Update 2: I wrote some code to do some micro-benchmarking of the difference in performance between Arrays and Lists. On my laptop, and in my specific benchmark, the difference is around 5ns (that's nanoseconds) per access. I guess there are cases where saving 5ns per loop would be worthwhile ... but I've never come across one. I had to hike my test up to 100 million iterations before the runtime became long enough to accurately measure.
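For anyone who wants to try a comparison like this themselves, here is a rough Stopwatch-based sketch. It is not the answerer's benchmark code; the sizes are arbitrary assumptions, and timings vary with machine, runtime and JIT, so a dedicated harness such as BenchmarkDotNet gives more trustworthy numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class ArrayVsListBenchmark
{
    // Sums every element 'passes' times via indexer access and reports elapsed time.
    public static (long arraySum, long listSum, TimeSpan arrayTime, TimeSpan listTime) Run(int size, int passes)
    {
        int[] array = Enumerable.Range(0, size).ToArray();
        List<int> list = array.ToList();

        long arraySum = 0;
        var sw = Stopwatch.StartNew();
        for (int p = 0; p < passes; p++)
            for (int i = 0; i < array.Length; i++)
                arraySum += array[i];
        TimeSpan arrayTime = sw.Elapsed;

        long listSum = 0;
        sw.Restart();
        for (int p = 0; p < passes; p++)
            for (int i = 0; i < list.Count; i++)
                listSum += list[i];
        TimeSpan listTime = sw.Elapsed;

        // The sums are returned so the loops are not trivially dead code.
        return (arraySum, listSum, arrayTime, listTime);
    }
}
```

Dividing each elapsed time by `size * passes` gives a per-access figure comparable to the one quoted above.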
Solution 2:
In general, I'd say you should try to avoid worrying about whether it's deferred.
There are advantages to the streaming execution nature of IEnumerable<T>. It is true that there are times when it's disadvantageous, but I'd recommend just handling those (rare) times specifically: call either ToList() or ToArray() to convert it to a list or array as appropriate.
The rest of the time, it's better to just let it be deferred. Needing to frequently check this seems like a bigger design problem...
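As an illustration of handling the rare case specifically, here is a sketch of a hypothetical method that needs two passes over its input (one to find the maximum, one to scale). It materializes once at the top and leaves every other caller free to pass lazy sequences. `ScalingExample.ScaleToMax` is an invented name, and the method assumes non-negative inputs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScalingExample
{
    // Needs two passes (Max, then division), so it evaluates the source once up front.
    // Assumes non-negative inputs; an all-zero input maps to all zeros.
    public static List<double> ScaleToMax(IEnumerable<double> values)
    {
        List<double> snapshot = values.ToList(); // the only enumeration of 'values'
        if (snapshot.Count == 0)
            return snapshot;

        double max = snapshot.Max();
        return snapshot.Select(v => max == 0 ? 0 : v / max).ToList();
    }
}
```

The conversion lives inside the one method that needs it, so the decision is made where the multiple passes actually happen.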
Solution 3:
My five cents: quite often you have to deal with an enumerable whose contents you know nothing about.
Your options are:
- turn it into a list before using it, but if it turns out to be endless, you are in trouble
- use it as is, and you are likely to face all kinds of funny deferred-execution effects, so you are in trouble again
Here is an example:
```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class BadExample
{
    public class Item
    {
        public String Value { get; set; }
    }

    public IEnumerable<Item> SomebodysElseMethodWeHaveNoControlOver()
    {
        var values = "at the end everything must be in upper".Split(' ');
        return values.Select(x => new Item { Value = x });
    }

    [TestMethod]
    public void Test()
    {
        var items = this.SomebodysElseMethodWeHaveNoControlOver();
        foreach (var item in items)
        {
            item.Value = item.Value.ToUpper();
        }
        // Enumerating 'items' again re-runs the Select, which creates fresh lower-case Items.
        var mustBeInUpper = String.Join(" ", items.Select(x => x.Value).ToArray());
        Trace.WriteLine(mustBeInUpper); // output is in lower: at the end everything must be in upper
        Assert.AreEqual("AT THE END EVERYTHING MUST BE IN UPPER", mustBeInUpper); // <== fails here
    }
}
```
So there is only one way to get away with it: iterate it exactly once, on an as-you-go basis.
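Here is a sketch of what "iterate it exactly once" could look like for the failing test above, written as plain static methods rather than an MSTest class. The first method does all the per-item work in a single pass; the second snapshots with `ToList()`, which is only safe when you know the sequence is finite. The class name `SinglePassFix` is made up, and `ToUpperInvariant` is used instead of `ToUpper` to avoid culture-specific casing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SinglePassFix
{
    public class Item
    {
        public string Value { get; set; } = "";
    }

    // Same shape as the method in the test above: a deferred Select.
    public static IEnumerable<Item> SomebodysElseMethodWeHaveNoControlOver()
    {
        var values = "at the end everything must be in upper".Split(' ');
        return values.Select(x => new Item { Value = x });
    }

    // Iterates the source exactly once and consumes each mutated item immediately.
    public static string UpperInOnePass()
    {
        var parts = new List<string>();
        foreach (var item in SomebodysElseMethodWeHaveNoControlOver())
        {
            item.Value = item.Value.ToUpperInvariant();
            parts.Add(item.Value); // use the item before the deferred query could recreate it
        }
        return string.Join(" ", parts);
    }

    // Alternative for known-finite sources: snapshot once, so mutations stick.
    public static string UpperAfterToList()
    {
        var items = SomebodysElseMethodWeHaveNoControlOver().ToList();
        foreach (var item in items)
            item.Value = item.Value.ToUpperInvariant();
        return string.Join(" ", items.Select(x => x.Value));
    }
}
```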
It was clearly a bad design choice to use the same IEnumerable interface for immediate and deferred execution scenarios. There must be a clear distinction between these two, so that it's clear from the name or by checking a property whether or not the enumerable is deferred.
A hint: in your code, consider using IReadOnlyCollection<T> instead of the plain IEnumerable<T>, because in addition you get the Count property. This way you know for sure it's not endless, and you can turn it into a list without a problem.
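A small sketch of the hint, assuming .NET 4.5 or later (where IReadOnlyCollection<T> exists). `ReadOnlyCollectionHint.Describe` is a hypothetical method. Because `Select(...)` is statically typed as IEnumerable<T>, passing a bare query to this parameter does not compile, so callers have to materialize it first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReadOnlyCollectionHint
{
    // The parameter type forces callers to hand over something with a Count,
    // e.g. an array or the result of .ToList(); a bare Select(...) query is rejected at compile time.
    public static string Describe(IReadOnlyCollection<string> names)
    {
        // Count is available without enumerating the collection.
        return names.Count + " item(s): " + string.Join(", ", names);
    }
}
```

Usage: `Describe(animals.Select(x => x.ToUpper()).ToList())` compiles, while `Describe(animals.Select(x => x.ToUpper()))` does not.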
Solution 4:
The message 'expanding the results view will evaluate the collection' is a standard message presented for all IEnumerable objects. I'm not sure that there is any foolproof means of checking whether an IEnumerable is deferred, mainly because even a yield iterator is deferred. The only means of absolutely ensuring that it isn't deferred is to accept an ICollection or IList<T>.
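If you still want a runtime check, the best available is a heuristic along the lines of this answer. The sketch below (hypothetical names) treats anything implementing ICollection<T> or IReadOnlyCollection<T> as "probably materialized". A custom type can implement those interfaces lazily, and LINQ's internal iterator types are implementation details that can change between runtime versions, so the result is a hint, not a guarantee.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredHeuristic
{
    // Heuristic only: true when the source advertises a Count, as built-in
    // collections (arrays, List<T>, HashSet<T>, ...) do.
    public static bool LooksMaterialized<T>(IEnumerable<T> source) =>
        source is ICollection<T> || source is IReadOnlyCollection<T>;

    // A yield-based iterator: always deferred, and never a collection.
    public static IEnumerable<int> Numbers()
    {
        for (int i = 0; i < 3; i++)
            yield return i;
    }
}
```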
Solution 5:
It's absolutely possible to manually implement a lazy IEnumerator<T>, so there's no "perfectly general" way of doing it. What I keep in mind is this: if I'm changing things in a list while enumerating something related to it, I always call ToArray() before the foreach.
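Here is a minimal sketch of that rule, with a hypothetical method name. The `Where` query reads `numbers` lazily, so removing items from the same list inside the loop makes the underlying List<T> enumerator throw InvalidOperationException; calling `ToArray()` first evaluates the query before the list changes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotBeforeMutation
{
    // Removes the even numbers from 'numbers'.
    public static List<int> RemoveEvens(List<int> numbers, bool snapshot)
    {
        IEnumerable<int> evens = numbers.Where(n => n % 2 == 0);
        if (snapshot)
            evens = evens.ToArray(); // evaluate now, before the list is modified

        foreach (var n in evens)
            numbers.Remove(n); // without the snapshot, the next MoveNext throws

        return numbers;
    }
}
```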