Is there a performance impact when calling ToList()?

When using ToList(), is there a performance impact that needs to be considered?

I was writing a query to retrieve files from a directory:

string[] imageArray = Directory.GetFiles(directory);

However, since I like to work with List<> instead, I decided to put in...

List<string> imageList = Directory.GetFiles(directory).ToList();

So, is there some sort of performance impact that should be considered when deciding to do a conversion like this - or only to be considered when dealing with a large number of files? Is this a negligible conversion?


Solution 1:

IEnumerable.ToList()

Yes, IEnumerable<T>.ToList() does have a performance impact: it is an O(n) operation, though it will likely only require attention in performance-critical code.

The ToList() operation uses the List(IEnumerable<T> collection) constructor. This constructor must make a copy of the array (more generally, of the IEnumerable<T>); otherwise future modifications to the list would also change the source T[] (and vice versa), which generally wouldn't be desirable.

I would like to reiterate that this will only make a difference with a huge list; copying chunks of memory is quite a fast operation to perform.
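To make the copying behaviour concrete, here is a minimal sketch showing that the list returned by ToList() is independent of the array it was built from. The class name and the file names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToListCopyDemo
{
    // Returns true when the list produced by ToList() is unaffected by
    // a later write to the source array.
    public static bool ListIsIndependentOfSource()
    {
        string[] source = { "a.png", "b.png", "c.png" }; // hypothetical file names
        List<string> copy = source.ToList();             // O(n) copy happens here

        source[0] = "changed.png";                       // mutate the original array only

        return copy[0] == "a.png" && copy.Count == source.Length;
    }

    public static void Main()
    {
        Console.WriteLine(ListIsIndependentOfSource()); // True
    }
}
```

Note that only the references are copied; for reference types, the list and the array still point at the same element objects.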

Handy tip, As vs To

You'll notice that in LINQ there are several methods that start with As (such as AsEnumerable()) and To (such as ToList()). The methods that start with To require a conversion like the one above (i.e. they may impact performance), while the methods that start with As do not and just require a cast or some other simple operation.
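As a small illustration of the difference, the sketch below checks that AsEnumerable() hands back the very same object, whereas ToList() allocates a new one. The class and method names are hypothetical; other As methods may wrap the source rather than return it unchanged, so only AsEnumerable() is shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsVersusToDemo
{
    // AsEnumerable() only changes the compile-time type; no copy is made.
    public static bool AsEnumerableReturnsSameObject()
    {
        int[] numbers = { 1, 2, 3 };
        IEnumerable<int> view = numbers.AsEnumerable();
        return object.ReferenceEquals(view, numbers);
    }

    // ToList() builds a new List<int> and copies the elements into it.
    public static bool ToListReturnsNewObject()
    {
        int[] numbers = { 1, 2, 3 };
        List<int> list = numbers.ToList();
        return !object.ReferenceEquals(list, numbers);
    }

    public static void Main()
    {
        Console.WriteLine(AsEnumerableReturnsSameObject()); // True
        Console.WriteLine(ToListReturnsNewObject());        // True
    }
}
```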

Additional details on List<T>

Here is a little more detail on how List<T> works in case you're interested :)

A List<T> is backed by a construct called a dynamic array, which is resized on demand; each resize copies the contents of the old array to a new, larger array. So it starts off small and increases in size if required.

This is the difference between the Capacity and Count properties on List<T>. Capacity refers to the size of the array behind the scenes; Count is the number of items in the List<T>, which is always <= Capacity. So when adding an item would push Count past Capacity, the capacity of the backing array is doubled and the existing items are copied into the new array.
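The growth can be observed directly. The following is a rough sketch (class and method names are made up) that records every distinct Capacity value seen while adding items. The exact numbers are an implementation detail of the runtime, so the code prints them rather than assuming them.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Adds itemCount items to a fresh list and returns each distinct
    // Capacity value observed along the way, in order.
    public static List<int> ObservedCapacities(int itemCount)
    {
        var list = new List<int>();
        var capacities = new List<int> { list.Capacity };

        for (int i = 0; i < itemCount; i++)
        {
            list.Add(i);
            if (list.Capacity != capacities[capacities.Count - 1])
            {
                capacities.Add(list.Capacity); // the backing array was reallocated
            }
        }

        return capacities;
    }

    public static void Main()
    {
        // Prints the sequence of capacities; each step is a reallocation plus copy.
        Console.WriteLine(string.Join(", ", ObservedCapacities(20)));
    }
}
```

If the final size is known in advance, passing it to the List<T>(int capacity) constructor avoids the intermediate reallocations.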

Solution 2:

Is there a performance impact when calling ToList()?

Yes, of course. Theoretically even i++ has a performance impact; it slows the program by maybe a few ticks.

What does .ToList do?

When you invoke .ToList, the code calls Enumerable.ToList(), an extension method that returns new List<TSource>(source). In the worst case, the corresponding constructor iterates over the source and adds the items one by one to a new container. So it has little effect on performance, and it is very unlikely to be the performance bottleneck of your application.
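Roughly, the copy described above can be pictured as follows. This is a simplified sketch with a hypothetical helper name (CopyToList), not the framework's actual source: it assumes a fast path when the source reports its size up front and an item-by-item path otherwise.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToListSketch
{
    // Hypothetical stand-in for what new List<T>(source) has to do.
    public static List<T> CopyToList<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        if (source is ICollection<T> collection)
        {
            // Size is known: allocate the backing array once, then fill it.
            var sized = new List<T>(collection.Count);
            foreach (T item in collection) sized.Add(item);
            return sized;
        }

        // Size is unknown (e.g. a lazy query): add one by one,
        // letting the list grow as needed. This is the worst case.
        var result = new List<T>();
        foreach (T item in source) result.Add(item);
        return result;
    }

    public static void Main()
    {
        List<int> fromArray = CopyToList(new[] { 1, 2, 3 });                 // sized path
        List<int> fromQuery = CopyToList(Enumerable.Range(0, 10)
                                                   .Where(n => n % 2 == 0)); // lazy path
        Console.WriteLine(fromArray.Count); // 3
        Console.WriteLine(fromQuery.Count); // 5
    }
}
```

Either way the work is linear in the number of items, which is why it rarely matters next to the I/O that produced those items.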

What's wrong with the code in the question

Directory.GetFiles goes through the folder and returns all the file names at once, in memory. The potential risk is that the string[] consumes a lot of memory, slowing everything down.

What should be done then

It depends. If you (and your business logic) can guarantee that the number of files in the folder is always small, the code is acceptable. It is still advisable to use the lazy version, Directory.EnumerateFiles, available since .NET 4. It behaves much more like a query: it is not executed immediately, and you can compose further query operators on it, like:

Directory.EnumerateFiles(myPath).Any(s => s.Contains("myfile"))

which will stop searching the path as soon as a file whose name contains "myfile" is found. This can perform considerably better than .GetFiles, which has to enumerate every file before it returns.
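For completeness, here is a self-contained sketch of that lazy check, run against a throwaway temp directory. The class name, method name, and file names are invented for the example, and it matches on the file name only (via Path.GetFileName) rather than on the full path, which is a slight variation on the one-liner above.

```csharp
using System;
using System.IO;
using System.Linq;

public static class EnumerateFilesDemo
{
    // Returns true if any file name in the directory contains the fragment.
    // Enumeration stops at the first match because Any() short-circuits.
    public static bool AnyFileNameContains(string directory, string fragment)
    {
        return Directory.EnumerateFiles(directory)
                        .Any(path => Path.GetFileName(path).Contains(fragment));
    }

    public static void Main()
    {
        // Create a scratch directory so the example does not depend on real files.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        try
        {
            File.WriteAllText(Path.Combine(dir, "myfile.txt"), "");
            File.WriteAllText(Path.Combine(dir, "other.txt"), "");

            Console.WriteLine(AnyFileNameContains(dir, "myfile"));  // True
            Console.WriteLine(AnyFileNameContains(dir, "missing")); // False
        }
        finally
        {
            Directory.Delete(dir, recursive: true);
        }
    }
}
```

If you do need every path in a List<string>, calling ToList() on the EnumerateFiles result at least avoids building the intermediate string[] first.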