Is it better to call ToList() or ToArray() in LINQ queries?

I often run into the case where I want to eval a query right where I declare it. This is usually because I need to iterate over it multiple times and it is expensive to compute. For example:

string raw = "...";
var lines = (from l in raw.Split('\n')
             let ll = l.Trim()
             where !string.IsNullOrEmpty(ll)
             select ll).ToList();

This works fine. But if I am not going to modify the result, then I might as well call ToArray() instead of ToList().

I wonder however whether ToArray() is implemented by first calling ToList() and is therefore less memory efficient than just calling ToList().

Am I crazy? Should I just call ToArray() - safe and secure in the knowledge that the memory won't be allocated twice?


Solution 1:

Unless you simply need an array to meet other constraints you should use ToList. In the majority of scenarios ToArray will allocate more memory than ToList.

Both use arrays for storage, but ToList has a more flexible constraint. It needs the array to be at least as large as the number of elements in the collection. If the array is larger, that is not a problem. However ToArray needs the array to be sized exactly to the number of elements.

To meet this constraint ToArray often does one more allocation than ToList. Once it has an array that is big enough it allocates an array which is exactly the correct size and copies the elements back into that array. The only time it can avoid this is when the grow algorithm for the array just happens to coincide with the number of elements needing to be stored (definitely in the minority).

EDIT

A couple of people have asked me about the consequence of having the extra unused memory in the List<T> value.

This is a valid concern. If the created collection is long lived, is never modified after being created and has a high chance of landing in the Gen2 heap then you may be better off taking the extra allocation of ToArray up front.

In general though I find this to be the rarer case. It's much more common to see a lot of ToArray calls which are immediately passed to other short lived uses of memory in which case ToList is demonstrably better.

The key here is to profile, profile and then profile some more.

Solution 2:

The performance difference will be insignificant, since List<T> is implemented as a dynamically sized array. Calling either ToArray() (which uses an internal Buffer<T> class to grow the array) or ToList() (which calls the List<T>(IEnumerable<T>) constructor) will end up being a matter of putting them into an array and growing the array until it fits them all.

If you desire concrete confirmation of this fact, check out the implementation of the methods in question in Reflector -- you'll see they boil down to almost identical code.

Solution 3:

It's 2020 outside and everyone is using .NET Core 3.1 so I decided to run some benchmarks with Benchmark.NET.

TL;DR: ToArray() is better performance-wise and does a better job conveying intent if you're not planning to mutate the collection.


    [MemoryDiagnoser]
    public class Benchmarks
    {
        [Params(0, 1, 6, 10, 39, 100, 666, 1000, 1337, 10000)]
        public int Count { get; set; }

        public IEnumerable<int> Items => Enumerable.Range(0, Count);

        [Benchmark(Description = "ToArray()", Baseline = true)]
        public int[] ToArray() => Items.ToArray();

        [Benchmark(Description = "ToList()")]
        public List<int> ToList() => Items.ToList();

        public static void Main() => BenchmarkRunner.Run<Benchmarks>();
    }

The results are:


    BenchmarkDotNet=v0.12.0, OS=Windows 10.0.14393.3443 (1607/AnniversaryUpdate/Redstone1)
    Intel Core i5-4460 CPU 3.20GHz (Haswell), 1 CPU, 4 logical and 4 physical cores
    Frequency=3124994 Hz, Resolution=320.0006 ns, Timer=TSC
    .NET Core SDK=3.1.100
      [Host]     : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT
      DefaultJob : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT


    |    Method | Count |          Mean |       Error |      StdDev |        Median | Ratio | RatioSD |   Gen 0 | Gen 1 | Gen 2 | Allocated |
    |---------- |------ |--------------:|------------:|------------:|--------------:|------:|--------:|--------:|------:|------:|----------:|
    | ToArray() |     0 |      7.357 ns |   0.2096 ns |   0.1960 ns |      7.323 ns |  1.00 |    0.00 |       - |     - |     - |         - |
    |  ToList() |     0 |     13.174 ns |   0.2094 ns |   0.1958 ns |     13.084 ns |  1.79 |    0.05 |  0.0102 |     - |     - |      32 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |     1 |     23.917 ns |   0.4999 ns |   0.4676 ns |     23.954 ns |  1.00 |    0.00 |  0.0229 |     - |     - |      72 B |
    |  ToList() |     1 |     33.867 ns |   0.7350 ns |   0.6876 ns |     34.013 ns |  1.42 |    0.04 |  0.0331 |     - |     - |     104 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |     6 |     28.242 ns |   0.5071 ns |   0.4234 ns |     28.196 ns |  1.00 |    0.00 |  0.0280 |     - |     - |      88 B |
    |  ToList() |     6 |     43.516 ns |   0.9448 ns |   1.1949 ns |     42.896 ns |  1.56 |    0.06 |  0.0382 |     - |     - |     120 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |    10 |     31.636 ns |   0.5408 ns |   0.4516 ns |     31.657 ns |  1.00 |    0.00 |  0.0331 |     - |     - |     104 B |
    |  ToList() |    10 |     53.870 ns |   1.2988 ns |   2.2403 ns |     53.415 ns |  1.77 |    0.07 |  0.0433 |     - |     - |     136 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |    39 |     58.896 ns |   0.9441 ns |   0.8369 ns |     58.548 ns |  1.00 |    0.00 |  0.0713 |     - |     - |     224 B |
    |  ToList() |    39 |    138.054 ns |   2.8185 ns |   3.2458 ns |    138.937 ns |  2.35 |    0.08 |  0.0815 |     - |     - |     256 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |   100 |    119.167 ns |   1.6195 ns |   1.4357 ns |    119.120 ns |  1.00 |    0.00 |  0.1478 |     - |     - |     464 B |
    |  ToList() |   100 |    274.053 ns |   5.1073 ns |   4.7774 ns |    272.242 ns |  2.30 |    0.06 |  0.1578 |     - |     - |     496 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |   666 |    569.920 ns |  11.4496 ns |  11.2450 ns |    571.647 ns |  1.00 |    0.00 |  0.8688 |     - |     - |    2728 B |
    |  ToList() |   666 |  1,621.752 ns |  17.1176 ns |  16.0118 ns |  1,623.566 ns |  2.85 |    0.05 |  0.8793 |     - |     - |    2760 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |  1000 |    796.705 ns |  16.7091 ns |  19.8910 ns |    796.610 ns |  1.00 |    0.00 |  1.2951 |     - |     - |    4064 B |
    |  ToList() |  1000 |  2,453.110 ns |  48.1121 ns |  65.8563 ns |  2,460.190 ns |  3.09 |    0.10 |  1.3046 |     - |     - |    4096 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() |  1337 |  1,057.983 ns |  20.9810 ns |  41.4145 ns |  1,041.028 ns |  1.00 |    0.00 |  1.7223 |     - |     - |    5416 B |
    |  ToList() |  1337 |  3,217.550 ns |  62.3777 ns |  61.2633 ns |  3,203.928 ns |  2.98 |    0.13 |  1.7357 |     - |     - |    5448 B |
    |           |       |               |             |             |               |       |         |         |       |       |           |
    | ToArray() | 10000 |  7,309.844 ns | 160.0343 ns | 141.8662 ns |  7,279.387 ns |  1.00 |    0.00 | 12.6572 |     - |     - |   40064 B |
    |  ToList() | 10000 | 23,858.032 ns | 389.6592 ns | 364.4874 ns | 23,759.001 ns |  3.26 |    0.08 | 12.6343 |     - |     - |   40096 B |

    // * Hints *
    Outliers
      Benchmarks.ToArray(): Default -> 2 outliers were removed (35.20 ns, 35.29 ns)
      Benchmarks.ToArray(): Default -> 2 outliers were removed (38.51 ns, 38.88 ns)
      Benchmarks.ToList(): Default  -> 1 outlier  was  removed (64.69 ns)
      Benchmarks.ToArray(): Default -> 1 outlier  was  removed (67.02 ns)
      Benchmarks.ToArray(): Default -> 1 outlier  was  removed (130.08 ns)
      Benchmarks.ToArray(): Default -> 1 outlier  was  detected (541.82 ns)
      Benchmarks.ToArray(): Default -> 1 outlier  was  removed (7.82 us)

    // * Legends *
      Count     : Value of the 'Count' parameter
      Mean      : Arithmetic mean of all measurements
      Error     : Half of 99.9% confidence interval
      StdDev    : Standard deviation of all measurements
      Median    : Value separating the higher half of all measurements (50th percentile)
      Ratio     : Mean of the ratio distribution ([Current]/[Baseline])
      RatioSD   : Standard deviation of the ratio distribution ([Current]/[Baseline])
      Gen 0     : GC Generation 0 collects per 1000 operations
      Gen 1     : GC Generation 1 collects per 1000 operations
      Gen 2     : GC Generation 2 collects per 1000 operations
      Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
      1 ns      : 1 Nanosecond (0.000000001 sec)

Solution 4:

(seven years later...)

A couple of other (good) answers have concentrated on microscopic performance differences that will occur.

This post is just a supplement to mention the semantic difference that exists between the IEnumerator<T> produced by an array (T[]) as compared to that returned by a List<T>.

Best illustrated with by example:

IList<int> source = Enumerable.Range(1, 10).ToArray();  // try changing to .ToList()

foreach (var x in source)
{
  if (x == 5)
    source[8] *= 100;
  Console.WriteLine(x);
}

The above code will run with no exception and produces the output:

1
2
3
4
5
6
7
8
900
10

This shows that the IEnumarator<int> returned by an int[] does not keep track on whether the array has been modified since the creation of the enumerator.

Note that I declared the local variable source as an IList<int>. In that way I make sure the C# compiler does not optimze the foreach statement into something which is equivalent to a for (var idx = 0; idx < source.Length; idx++) { /* ... */ } loop. This is something the C# compiler might do if I use var source = ...; instead. In my current version of the .NET framework the actual enumerator used here is a non-public reference-type System.SZArrayHelper+SZGenericArrayEnumerator`1[System.Int32] but of course this is an implementation detail.

Now, if I change .ToArray() into .ToList(), I get only:

1
2
3
4
5

followed by a System.InvalidOperationException blow-up saying:

Collection was modified; enumeration operation may not execute.

The underlying enumerator in this case is the public mutable value-type System.Collections.Generic.List`1+Enumerator[System.Int32] (boxed inside an IEnumerator<int> box in this case because I use IList<int>).

In conclusion, the enumerator produced by a List<T> keeps track on whether the list changes during enumeration, while the enumerator produced by T[] does not. So consider this difference when choosing between .ToList() and .ToArray().

People often add one extra .ToArray() or .ToList() to circumvent a collection that keeps track on whether it was modified during the life-time of an enumerator.

(If anybody wants to know how the List<> keeps track on whether collection was modified, there is a private field _version in this class which is changed everytime the List<> is updated. It would in fact be possible to change this behavior of List<> by simply removing the line that increments _version in the set accessor of the indexer public T this[int index], just like it was done inside Dictionary<,> recently, as described in another answer.)