Why use the yield keyword, when I could just use an ordinary IEnumerable?

Given this code:

IEnumerable<object> FilteredList()
{
    foreach( object item in FullList )
    {
        if( IsItemInPartialList( item ) )
            yield return item;
    }
}

Why should I not just code it this way?:

IEnumerable<object> FilteredList()
{
    var list = new List<object>(); 
    foreach( object item in FullList )
    {
        if( IsItemInPartialList( item ) )
            list.Add(item);
    }
    return list;
}

I sort of understand what the yield keyword does. It tells the compiler to build a certain kind of thing (an iterator). But why use it? Apart from it being slightly less code, what's it do for me?


Solution 1:

Using yield makes the collection lazy.

Let's say you just need the first five items. Your way, I have to loop through the entire list to get the first five items. With yield, I only loop through the first five items.

Solution 2:

The benefit of iterator blocks is that they work lazily. So you can write a filtering method like this:

public static IEnumerable<T> Where<T>(this IEnumerable<T> source,
                                   Func<T, bool> predicate)
{
    foreach (var item in source)
    {
        if (predicate(item))
        {
            yield return item;
        }
    }
}

That will allow you to filter a stream as long as you like, never buffering more than a single item at a time. If you only need the first value from the returned sequence, for example, why would you want to copy everything into a new list?

As another example, you can easily create an infinite stream using iterator blocks. For example, here's a sequence of random numbers:

public static IEnumerable<int> RandomSequence(int minInclusive, int maxExclusive)
{
    Random rng = new Random();
    while (true)
    {
        yield return rng.Next(minInclusive, maxExclusive);
    }
}

How would you store an infinite sequence in a list?

My Edulinq blog series gives a sample implementation of LINQ to Objects which makes heavy use of iterator blocks. LINQ is fundamentally lazy where it can be - and putting things in a list simply doesn't work that way.

Solution 3:

With the "list" code, you have to process the full list before you can pass it on to the next step. The "yield" version passes the processed item immediately to the next step. If that "next step" contains a ".Take(10)" then the "yield" version will only process the first 10 items and forget about the rest. The "list" code would have processed everything.

This means that you see the most difference when you need to do a lot of processing and/or have long lists of items to process.