How to tell if an IEnumerable<T> is subject to deferred execution?

I always assumed that if I used Select(x => ...) in the context of LINQ to Objects, the new collection would be created immediately and remain static. I'm not sure why I assumed this, and it's a very bad assumption, but I did. I often use .ToList() elsewhere, but often not in this case.

This code demonstrates that even a simple Select is subject to deferred execution:

var random = new Random();
var animals = new[] { "cat", "dog", "mouse" };
var randomNumberOfAnimals = animals.Select(x => Math.Floor(random.NextDouble() * 100) + " " + x + "s");

foreach (var i in randomNumberOfAnimals)
{
    testContextInstance.WriteLine("There are " + i);
}

foreach (var i in randomNumberOfAnimals)
{
    testContextInstance.WriteLine("And now, there are " + i);
}

This outputs the following (the random function is called every time the collection is iterated through):

There are 75 cats
There are 28 dogs
There are 62 mouses
And now, there are 78 cats
And now, there are 69 dogs
And now, there are 43 mouses
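
For contrast, here is a minimal sketch of what materializing the query changes. It is not part of the original code: DeferredDemo, BuildQuery and SelectorCalls are hypothetical names, and a call counter replaces the random numbers so the behaviour is repeatable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many times the selector lambda has run (illustration only).
    public static int SelectorCalls;

    public static IEnumerable<string> BuildQuery()
    {
        var animals = new[] { "cat", "dog", "mouse" };
        // Nothing runs here: Select only records what to do later.
        return animals.Select(x => { SelectorCalls++; return x + "s"; });
    }

    public static void Run()
    {
        SelectorCalls = 0;
        var query = BuildQuery();
        Console.WriteLine(SelectorCalls);  // 0, nothing has been enumerated yet

        var snapshot = query.ToList();     // runs the selector once per element
        Console.WriteLine(SelectorCalls);  // 3

        foreach (var s in snapshot) { }    // iterating the list does not re-run it
        foreach (var s in snapshot) { }
        Console.WriteLine(SelectorCalls);  // still 3

        foreach (var s in query) { }       // iterating the query runs it again
        Console.WriteLine(SelectorCalls);  // 6
    }
}
```

The list holds the results from the moment ToList() ran, while the query variable keeps re-evaluating the lambda on every pass.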

I have many places where I have an IEnumerable<T> as a member of a class. Often the results of a LINQ query are assigned to such an IEnumerable<T>. Normally for me, this does not cause issues, but I have recently found a few places in my code where it poses more than just a performance issue.

In trying to find the places where I had made this mistake, I thought I could check whether a particular IEnumerable<T> was of type IQueryable. This, I thought, would tell me whether the collection was 'deferred' or not. It turns out that the enumerator created by the Select operator above is of type System.Linq.Enumerable+WhereSelectArrayIterator`2[System.String,System.String] and not IQueryable.

I used Reflector to see what this type inherits from, and it turns out not to inherit from anything that marks it as 'LINQ' at all - so there is no way to test based on the collection type.

I'm quite happy putting .ToArray() everywhere now, but I'd like to have a mechanism to make sure this problem doesn't happen in future. Visual Studio seems to know how to do it, because it gives a message about 'expanding the results view will evaluate the collection.'

The best I have come up with is:

bool deferred = !object.ReferenceEquals(randomNumberOfAnimals.First(),
                                        randomNumberOfAnimals.First());

Edit: This only works if a new object is created by the Select, so it is not a general solution. I'm not recommending it in any case, though! It was a somewhat tongue-in-cheek solution.

Solution 1:

Deferred execution in LINQ has trapped a lot of people; you're not alone.

The approach I've taken to avoiding this problem is as follows:

Parameters to methods - use IEnumerable<T> unless there's a need for a more specific interface.

Local variables - usually declared at the point where I create the LINQ query, so I'll know whether lazy evaluation is possible.

Class members - never use IEnumerable<T>, always use List<T>. And always make them private.

Properties - use IEnumerable<T>, and convert for storage in the setter.

public IEnumerable<Person> People 
{
    get { return people; }
    set { people = value.ToList(); }
}
private List<Person> people;

While there are theoretical cases where this approach wouldn't work, I've not run into one yet, and I've been enthusiastically using the LINQ extension methods since late Beta.
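
As a usage sketch of the property pattern above (Team, TeamDemo and the sample names are hypothetical and only there to make the example self-contained), assigning a deferred query to the property stores a snapshot, so later reads no longer re-run the query:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
}

// Hypothetical owner class, named only for this example.
public class Team
{
    private List<Person> people = new List<Person>();

    public IEnumerable<Person> People
    {
        get { return people; }
        set { people = value.ToList(); }   // the query is evaluated here, once
    }
}

public static class TeamDemo
{
    public static bool SameInstancesOnEachRead()
    {
        var names = new[] { "Ann", "Bob" };
        var team = new Team();

        // On its own, this query would create new Person objects on every enumeration...
        team.People = names.Select(n => new Person { Name = n });

        // ...but the setter stored a list, so repeated reads see the same objects.
        return object.ReferenceEquals(team.People.First(), team.People.First());
    }
}
```

The cost is one copy per assignment, which is usually a fair price for a member whose contents cannot change behind the class's back.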

BTW: I'm curious why you use ToArray(); instead of ToList(); - to me, lists have a much nicer API, and there's (almost) no performance cost.

Update: A couple of commenters have rightly pointed out that arrays have a theoretical performance advantage, so I've amended my statement above to "... there's (almost) no performance cost."

Update 2: I wrote some code to do some micro-benchmarking of the difference in performance between Arrays and Lists. On my laptop, and in my specific benchmark, the difference is around 5ns (that's nanoseconds) per access. I guess there are cases where saving 5ns per loop would be worthwhile ... but I've never come across one. I had to hike my test up to 100 million iterations before the runtime became long enough to accurately measure.
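
The benchmark code itself is not shown here. A rough sketch of how such a comparison might be set up, assuming a plain Stopwatch loop over indexed reads (AccessBenchmark and its method names are made up for this example, and the timings will vary by machine and runtime), could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class AccessBenchmark
{
    // Sums every element by index, repeated 'passes' times.
    public static long SumArray(int[] data, int passes)
    {
        long sum = 0;
        for (var p = 0; p < passes; p++)
            for (var i = 0; i < data.Length; i++)
                sum += data[i];
        return sum;
    }

    // Same loop, but going through List<T>'s indexer.
    public static long SumList(List<int> data, int passes)
    {
        long sum = 0;
        for (var p = 0; p < passes; p++)
            for (var i = 0; i < data.Count; i++)
                sum += data[i];
        return sum;
    }

    public static void Run()
    {
        var array = Enumerable.Range(0, 1000).ToArray();
        var list = array.ToList();
        const int passes = 100000;   // 1,000 elements x 100,000 passes = 100 million reads

        var watch = Stopwatch.StartNew();
        var arraySum = SumArray(array, passes);
        var arrayMs = watch.ElapsedMilliseconds;

        watch.Restart();
        var listSum = SumList(list, passes);
        var listMs = watch.ElapsedMilliseconds;

        // The sums are printed so the loops cannot be optimised away entirely.
        Console.WriteLine("array: " + arrayMs + " ms (sum " + arraySum + ")");
        Console.WriteLine("list:  " + listMs + " ms (sum " + listSum + ")");
    }
}
```

A dedicated benchmarking harness would give more trustworthy numbers than a hand-rolled loop like this one.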

Solution 2:

In general, I'd say you should try to avoid worrying about whether it's deferred.

There are advantages to the streaming nature of IEnumerable<T>. It is true that there are times when it's a disadvantage, but I'd recommend handling those (rare) cases specifically: call ToList() or ToArray() to convert it to a list or array as appropriate.

The rest of the time, it's better to just let it be deferred. Needing to frequently check this seems like a bigger design problem...
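
One illustration of the streaming advantage, as a small sketch with hypothetical names (StreamingDemo, Naturals): because nothing runs until the result is enumerated, a query over an endless sequence can still finish, as long as only part of it is consumed.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // An endless sequence: calling ToList() on it directly would never finish.
    public static IEnumerable<int> Naturals()
    {
        for (var n = 1; ; n++)
            yield return n;
    }

    public static List<int> FirstThreeSquares()
    {
        // Deferred: Take(3) stops pulling after three elements,
        // so only 1, 2 and 3 are ever produced by Naturals().
        return Naturals().Select(n => n * n).Take(3).ToList();
    }
}
```

Here the ToList() at the end is the explicit, deliberate point where the result is fixed.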

Solution 3:

My five cents. Quite often you have to deal with an enumerable when you have no idea what's inside it.

Your options are:

turn it into a list before using it, but if it happens to be endless you are in trouble
use it as is, and you are likely to run into all kinds of deferred-execution surprises, so you are in trouble again

Here is an example:

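using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]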
public class BadExample
{
    public class Item
    {
        public String Value { get; set; }
    }
    public IEnumerable<Item> SomebodysElseMethodWeHaveNoControlOver()
    {
        var values = "at the end everything must be in upper".Split(' ');
        // Deferred: every enumeration of the result creates brand-new Item objects.
        return values.Select(x => new Item { Value = x });
    }
    [TestMethod]
    public void Test()
    {
        var items = this.SomebodysElseMethodWeHaveNoControlOver();
        foreach (var item in items)
        {
            item.Value = item.Value.ToUpper();
        }
        var mustBeInUpper = String.Join(" ", items.Select(x => x.Value).ToArray());
        Trace.WriteLine(mustBeInUpper); // output is in lower: at the end everything must be in upper
        Assert.AreEqual("AT THE END EVERYTHING MUST BE IN UPPER", mustBeInUpper); // <== fails here
    }
}

So there is only one way out: iterate it exactly once, on an as-you-go basis.
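
A minimal sketch of that single-pass approach, applied to the same data (SinglePassExample and Source are hypothetical stand-ins for the test class and the third-party method above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SinglePassExample
{
    public class Item
    {
        public string Value { get; set; }
    }

    // Stand-in for the method we have no control over; it returns a deferred query.
    public static IEnumerable<Item> Source()
    {
        var values = "at the end everything must be in upper".Split(' ');
        return values.Select(x => new Item { Value = x });
    }

    public static string UpperInOnePass()
    {
        // Transform while enumerating instead of mutating items in one pass
        // and reading them back in a second pass.
        return string.Join(" ", Source().Select(x => x.Value.ToUpper()));
    }
}
```

The sequence is enumerated once, inside string.Join, so there is no second pass that could see freshly created lower-case items.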

It was clearly a bad design choice to use the same IEnumerable interface for immediate and deferred execution scenarios. There must be a clear distinction between these two, so that it's clear from the name or by checking a property whether or not the enumerable is deferred.

A hint: in your own code, consider using IReadOnlyCollection<T> instead of plain IEnumerable<T>, because it also gives you the Count property. That way you know for sure it's not endless, and you can turn it into a list without any trouble.

Solution 4:

The message about 'expanding the results view will evaluate the collection' is a standard message presented for all IEnumerable objects. I'm not sure that there is any foolproof means of checking whether an IEnumerable is deferred, mainly because even a method that uses yield is deferred. The only means of absolutely ensuring that it isn't deferred is to accept an ICollection<T> or IList<T>.
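
When the signature cannot be changed, one possible compromise is to check for a collection interface at runtime and copy only when it is missing. The sketch below is an assumption-laden illustration, not a guarantee: EnumerableGuards and Materialize are made-up names, and a type could in principle implement IReadOnlyCollection<T> lazily, so the check is a heuristic.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerableGuards
{
    // Hypothetical helper: returns the source unchanged if it already is a
    // collection, otherwise enumerates it once into a new list.
    public static IReadOnlyCollection<T> Materialize<T>(IEnumerable<T> source)
    {
        if (source is IReadOnlyCollection<T> existing)
            return existing;           // e.g. List<T> or T[]: no copy made

        return source.ToList();        // e.g. an iterator method: evaluated here
    }

    // A deferred sequence produced by yield, used to exercise the helper.
    public static IEnumerable<int> Lazy()
    {
        yield return 1;
        yield return 2;
    }
}
```

Calling Materialize on a List<T> hands back the same instance, while calling it on Lazy() returns a new list with the two values. It should not be used on a sequence that might be endless.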

Solution 5:

It's absolutely possible to manually implement a lazy IEnumerator<T>, so there's no "perfectly general" way of doing it. What I keep in mind is this: if I'm changing things in a list while enumerating something related to it, always call ToArray() before the foreach.
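
To make that rule of thumb concrete, here is a small sketch (SnapshotBeforeMutation and its method names are invented for the example). A Where query over a List<T> reads from the list as it goes, so removing items inside the loop invalidates the enumeration unless the matches are copied first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotBeforeMutation
{
    public static List<int> RemoveEvens(List<int> numbers)
    {
        var evens = numbers.Where(n => n % 2 == 0);   // deferred view over numbers

        // ToArray() copies the matches first, so Remove cannot disturb the loop.
        foreach (var n in evens.ToArray())
            numbers.Remove(n);

        return numbers;
    }

    public static bool RemovingWithoutSnapshotThrows()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };
        try
        {
            // No snapshot: the query is still reading the list while we change it.
            foreach (var n in numbers.Where(x => x % 2 == 0))
                numbers.Remove(n);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;   // the list's enumerator detected the modification
        }
    }
}
```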