Is it worthwhile to initialize the collection size of a List<T> if it's size reasonably known?
Yes, it gets to be important when your List<T>
gets large. The exact numbers depend on the element type and the machine architecture, let's pick a List of reference types on a 32-bit machine. Each element will then take 4 bytes inside an internal array. The list will start out with a Capacity of 0 and an empty array. The first Add()
call grows the Capacity to 4, reallocating the internal array to 16 bytes. Four Add()
calls later, the array is full and needs to be reallocated again. It doubles the size, Capacity grows to 8, array size to 32 bytes. The previous array is garbage.
This repeats as necessary, several copies of the internal array will become garbage.
Something special happens when the array has grown to 65,536 bytes (16,384 elements). The next Add() doubles the size again to 131,072 bytes. That's a memory allocation that exceeds the threshold for "large objects" (85,000 bytes). The allocation is now no longer made on the generation 0 heap, it is taken from the Large Object Heap.
Objects on the LOH are treated specially. They are only garbage collected during a generation 2 collection. And the heap doesn't get compacted, it takes too much time to move such large chunks.
This repeats as necessary, several LOH objects will become garbage. They can take up memory for quite a while, generation 2 collections do not happen very often. Another problem is that these large blocks tend to fragment the virtual memory address space.
This doesn't repeat endlessly, sooner or later the List class needs to re-allocate the array and it has grown so large that there isn't a hole left in the virtual memory address space to fit the array. Your program will bomb with an OutOfMemoryException. Usually well before all available virtual memory has been consumed.
Long story short, by setting the Capacity early, before you start filling the List, you can reserve that large internal array up front. You won't get all those awkward released blocks in the Large Object Heap and avoid fragmentation. In effect, you'll be able to store many more objects in the list and your program runs leaner since there's so little garbage. Do this only if you have a good idea how large the list will be, using a large Capacity that you'll never fill is wasteful.
It is, as per documentation
If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the List(T).
Well, it will stop you the values in the list (which will be references if the element type is a reference type) from having to be copied occasionally as the list grows.
If it's going to be a particularly large list and you've got a pretty good idea of the size, it won't hurt. However, if estimating the size involves extra calculations or any significant amount of code, I wouldn't worry about it unless you find it becomes a problem - it could distract from the main focus of the code, and the resizing is unlikely to cause performance issues unless it's a really big list or you're doing it a lot.