When NOT to use yield (return) [duplicate]
This question already has an answer here:
Is there ever a reason to not use 'yield return' when returning an IEnumerable?
There are several useful questions here on SO about the benefits of yield return
. For example,
What is the yield keyword
I'm looking for thoughts on when NOT to use yield return
. For example, if I expect to need to return all items in a collection, it doesn't seem like yield
would be useful, right?
What are the cases where use of yield
will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?
What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?
It's a good idea to think carefully about your use of "yield return" when dealing with recursively defined structures. For example, I often see this:
public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
if (root == null) yield break;
yield return root.Value;
foreach(T item in PreorderTraversal(root.Left))
yield return item;
foreach(T item in PreorderTraversal(root.Right))
yield return item;
}
Perfectly sensible-looking code, but it has performance problems. Suppose the tree is h deep. Then there will at most points be O(h) nested iterators built. Calling "MoveNext" on the outer iterator will then make O(h) nested calls to MoveNext. Since it does this O(n) times for a tree with n items, that makes the algorithm O(hn). And since the height of a binary tree is lg n <= h <= n, that means that the algorithm is at best O(n lg n) and at worst O(n^2) in time, and best case O(lg n) and worse case O(n) in stack space. It is O(h) in heap space because each enumerator is allocated on the heap. (On implementations of C# I'm aware of; a conforming implementation might have other stack or heap space characteristics.)
But iterating a tree can be O(n) in time and O(1) in stack space. You can write this instead like:
public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
var stack = new Stack<Tree<T>>();
stack.Push(root);
while (stack.Count != 0)
{
var current = stack.Pop();
if (current == null) continue;
yield return current.Value;
stack.Push(current.Left);
stack.Push(current.Right);
}
}
which still uses yield return, but is much smarter about it. Now we are O(n) in time and O(h) in heap space, and O(1) in stack space.
Further reading: see Wes Dyer's article on the subject:
http://blogs.msdn.com/b/wesdyer/archive/2007/03/23/all-about-iterators.aspx
What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?
I can think of a couple of cases, IE:
-
Avoid using yield return when you return an existing iterator. Example:
// Don't do this, it creates overhead for no reason // (a new state machine needs to be generated) public IEnumerable<string> GetKeys() { foreach(string key in _someDictionary.Keys) yield return key; } // DO this public IEnumerable<string> GetKeys() { return _someDictionary.Keys; }
-
Avoid using yield return when you don't want to defer execution code for the method. Example:
// Don't do this, the exception won't get thrown until the iterator is // iterated, which can be very far away from this method invocation public IEnumerable<string> Foo(Bar baz) { if (baz == null) throw new ArgumentNullException(); yield ... } // DO this public IEnumerable<string> Foo(Bar baz) { if (baz == null) throw new ArgumentNullException(); return new BazIterator(baz); }
The key thing to realize is what yield
is useful for, then you can decide which cases do not benefit from it.
In other words, when you do not need a sequence to be lazily evaluated you can skip the use of yield
. When would that be? It would be when you do not mind immediately having your entire collection in memory. Otherwise, if you have a huge sequence that would negatively impact memory, you would want to use yield
to work on it step by step (i.e., lazily). A profiler might come in handy when comparing both approaches.
Notice how most LINQ statements return an IEnumerable<T>
. This allows us to continually string different LINQ operations together without negatively impacting performance at each step (aka deferred execution). The alternative picture would be putting a ToList()
call in between each LINQ statement. This would cause each preceding LINQ statement to be immediately executed before performing the next (chained) LINQ statement, thereby forgoing any benefit of lazy evaluation and utilizing the IEnumerable<T>
till needed.
There are a lot of excellent answers here. I would add this one: Don't use yield return for small or empty collections where you already know the values:
IEnumerable<UserRight> GetSuperUserRights() {
if(SuperUsersAllowed) {
yield return UserRight.Add;
yield return UserRight.Edit;
yield return UserRight.Remove;
}
}
In these cases the creation of the Enumerator object is more expensive, and more verbose, than just generating a data structure.
IEnumerable<UserRight> GetSuperUserRights() {
return SuperUsersAllowed
? new[] {UserRight.Add, UserRight.Edit, UserRight.Remove}
: Enumerable.Empty<UserRight>();
}
Update
Here's the results of my benchmark:
These results show how long it took (in milliseconds) to perform the operation 1,000,000 times. Smaller numbers are better.
In revisiting this, the performance difference isn't significant enough to worry about, so you should go with whatever is the easiest to read and maintain.
Update 2
I'm pretty sure the above results were achieved with compiler optimization disabled. Running in Release mode with a modern compiler, it appears performance is practically indistinguishable between the two. Go with whatever is most readable to you.