Under what circumstances are linked lists useful?
Linked lists are very useful when you need to do a lot of insertions and removals, but not too much searching, on a list of arbitrary (unknown at compile-time) length.
Splitting and joining (bidirectionally-linked) lists is very efficient.
You can also combine linked lists - e.g. tree structures can be implemented as "vertical" linked lists (parent/child relationships) connecting together horizontal linked lists (siblings).
Using an array-based list for these purposes has severe limitations:
- Adding a new item means the array must be reallocated (or you must allocate more space than you need, to allow for future growth and reduce the number of reallocations)
- Removing items leaves wasted space or requires a reallocation
- Inserting items anywhere except the end involves (possibly reallocating and) shifting a lot of the data up one position
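To make the contrast concrete, here's a minimal sketch (a hypothetical `Node<T>` type, not any BCL class) of why insertion and removal are O(1) on a doubly-linked list: only the neighbouring pointers are touched, no matter how long the list is.

```csharp
// Minimal doubly-linked node (hypothetical type, not a BCL class).
public class Node<T>
{
    public T Value;
    public Node<T> Prev;
    public Node<T> Next;
    public Node(T value) { Value = value; }
}

public static class ListOps
{
    // Insert newNode right after node: O(1), no shifting, no reallocation.
    public static void InsertAfter<T>(Node<T> node, Node<T> newNode)
    {
        newNode.Prev = node;
        newNode.Next = node.Next;
        if (node.Next != null) node.Next.Prev = newNode;
        node.Next = newNode;
    }

    // Unlink node from its list: O(1) given a reference to the node.
    public static void Remove<T>(Node<T> node)
    {
        if (node.Prev != null) node.Prev.Next = node.Next;
        if (node.Next != null) node.Next.Prev = node.Prev;
        node.Prev = node.Next = null;
    }
}
```

Splitting a list is the same idea in reverse: cut one pair of pointers and you have two lists.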
They can be useful for concurrent data structures. (There is now a non-concurrent real-world usage sample below - that would not be there if @Neil hadn't mentioned FORTRAN. ;-)
For example, `ConcurrentDictionary<TKey, TValue>` in .NET 4.0 RC uses linked lists to chain items that hash to the same bucket.
The underlying data structure of `ConcurrentStack<T>` is also a linked list.
`ConcurrentStack<T>` is one of the data structures that serve as the foundation of the new Thread Pool (with the local "queues" implemented as stacks, essentially). (The other main supporting structure is `ConcurrentQueue<T>`.)
The new Thread Pool in turn provides the basis for the work scheduling of the new Task Parallel Library.
So they can certainly be useful - a linked list is currently serving as one of the main supporting structures of at least one great new technology.
(A singly-linked list makes a compelling lock-free - but not wait-free - choice in these cases, because the main operations can be carried out with a single CAS (+ retries). In a modern GC'd environment - such as Java and .NET - the ABA problem can easily be avoided: just wrap the items you add in freshly created nodes and do not reuse those nodes - let the GC do its work. The page on the ABA problem also provides the implementation of a lock-free stack that actually works in .NET (and Java) with a (GC'ed) Node holding the items.)
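To illustrate, here is a minimal Treiber-stack sketch (hypothetical code, not the actual `ConcurrentStack<T>` source): each operation succeeds with a single `Interlocked.CompareExchange` plus retries, and since every pushed item gets a freshly allocated node that is never reused, the GC takes care of ABA.

```csharp
using System.Threading;

// A minimal Treiber (lock-free) stack sketch - hypothetical code,
// not the BCL's ConcurrentStack<T> implementation.
public class LockFreeStack<T>
{
    private sealed class Node
    {
        public readonly T Value;
        public Node Next;
        public Node(T value) { Value = value; }
    }

    private Node _head;

    public void Push(T value)
    {
        // Fresh node per push, never reused: the GC sidesteps ABA.
        var node = new Node(value);
        while (true)
        {
            Node head = _head;
            node.Next = head;
            // Publish the node only if no other thread moved the head meanwhile.
            if (Interlocked.CompareExchange(ref _head, node, head) == head)
                return;
        }
    }

    public bool TryPop(out T value)
    {
        while (true)
        {
            Node head = _head;
            if (head == null) { value = default(T); return false; }
            // Swing the head past the popped node; retry on contention.
            if (Interlocked.CompareExchange(ref _head, head.Next, head) == head)
            {
                value = head.Value;
                return true;
            }
        }
    }
}
```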
Edit: @Neil: actually, what you mentioned about FORTRAN reminded me that the same kind of linked lists can be found in probably the most used and abused data structure in .NET: the plain .NET generic `Dictionary<TKey, TValue>`.
Not one, but many linked lists are stored in an array.
- It avoids doing many small (de)allocations on inserts/deletes.
- Initial loading of the hash table is pretty fast, because the array is filled sequentially (plays very nicely with the CPU cache).
- Not to mention that a chaining hash table is expensive in terms of memory - and this "trick" cuts the "pointer sizes" in half on x64, since the chain links are 32-bit array indices instead of 64-bit references.
Essentially, all the chain nodes are stored in a single array (one chain for each bucket used), and a free list of reusable nodes is "interwoven" between them (if there were deletes). The array is allocated at the start/on rehash, and the nodes of the chains are kept in it. There is also a free pointer - an index into the array - that follows deletes. ;-) So - believe it or not - the FORTRAN technique still lives on (...and nowhere else than in one of the most commonly used .NET data structures ;-).
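For illustration, here is a stripped-down sketch of that layout (hypothetical code - the real `Dictionary<TKey, TValue>` adds hash-code caching, resizing, and more): the chains are linked by `int` indices into shared arrays rather than by object references, and deleted slots are pushed onto the interwoven free list.

```csharp
// "Linked lists in an array" sketch, FORTRAN-style: hypothetical,
// simplified stand-in for how Dictionary<TKey, TValue> chains buckets.
public class ArrayChainedSet
{
    private int[] buckets;     // index of first entry in each chain, -1 = empty
    private int[] next;        // index of the next entry in the chain, -1 = end
    private int[] values;      // payload
    private int count;         // next unused slot in the arrays
    private int freeList = -1; // head of the interwoven free list of deleted slots

    public ArrayChainedSet(int capacity)
    {
        buckets = new int[capacity];
        next = new int[capacity];
        values = new int[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = -1;
    }

    public bool Add(int value)
    {
        int bucket = (value & 0x7FFFFFFF) % buckets.Length;
        for (int i = buckets[bucket]; i >= 0; i = next[i])
            if (values[i] == value) return false; // already present

        int slot;
        if (freeList >= 0) { slot = freeList; freeList = next[slot]; } // reuse a hole
        else slot = count++; // fill the array sequentially (no resizing in this sketch)

        values[slot] = value;
        next[slot] = buckets[bucket]; // prepend to the bucket's chain
        buckets[bucket] = slot;
        return true;
    }

    public bool Remove(int value)
    {
        int bucket = (value & 0x7FFFFFFF) % buckets.Length;
        for (int i = buckets[bucket], prev = -1; i >= 0; prev = i, i = next[i])
        {
            if (values[i] != value) continue;
            if (prev < 0) buckets[bucket] = next[i]; else next[prev] = next[i];
            next[i] = freeList; // push the slot onto the free list
            freeList = i;
            return true;
        }
        return false;
    }
}
```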
Linked lists are very flexible: With the modification of one pointer, you can make a massive change, where the same operation would be very inefficient in an array list.
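A tiny sketch of that flexibility (hypothetical `Node` type): given the tail of one singly-linked list and the head of another, concatenating them is a single pointer write, however long both lists are - an array-backed list would need an O(n) copy.

```csharp
// Hypothetical singly-linked node used only for this illustration.
public class Node { public int Value; public Node Next; }

public static class Splice
{
    // Concatenate two lists: the "massive change" is this one assignment.
    public static void Concat(Node tailOfFirst, Node headOfSecond)
    {
        tailOfFirst.Next = headOfSecond;
    }
}
```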
Arrays are the data structure to which linked lists are usually compared.
Normally, linked lists are useful when you have to make a lot of modifications to the list itself, while arrays perform better than lists on direct element access.
Here's a list of operations that can be performed on lists and arrays, compared with the relative operation cost (n = list/array length):
- Adding an element:
- on lists you just need to allocate memory for the new element and redirect pointers. O(1)
- on arrays you may have to reallocate and copy the whole array. O(n)
- Removing an element:
- on lists you just redirect pointers. O(1)
- on arrays you spend O(n) time shifting elements if the element to remove is not the first or the last one; otherwise you can simply move the pointer to the start of the array or decrease the array length
- Getting an element in a known position:
- on lists you have to walk the list from the first element to the element in the specific position. Worst case: O(n)
- on arrays you can access the element immediately. O(1)
This is a very low-level comparison of these two popular and basic data structures, and you can see that lists perform better in situations where you have to make a lot of modifications to the list itself (removing or adding elements). On the other hand, arrays perform better than lists when you have to directly access the elements of the array.
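A quick way to see this with the standard .NET collections (assuming `List<T>` as the array-backed list and `LinkedList<T>` as the doubly-linked one): inserting at the front is O(1) for the linked list but O(n) for the array-backed list, which has to shift every existing element.

```csharp
using System.Collections.Generic;

class FrontInsertDemo
{
    static void Main()
    {
        var linked = new LinkedList<int>();
        var array  = new List<int>();

        for (int i = 0; i < 10000; i++)
        {
            linked.AddFirst(i); // O(1): allocate a node, fix two pointers
            array.Insert(0, i); // O(n): shift the whole backing array right
        }
    }
}
```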
From the point of view of memory allocation, lists are better because there's no need to have all the elements next to each other. On the other hand, there's the (small) overhead of storing the pointers to the next (or even to the previous) element.
Knowing these differences is important for developers choosing between lists and arrays in their implementations.
Note that this is a comparison of plain lists and arrays. There are good solutions to the problems reported here (e.g. skip lists, dynamic arrays, etc.). In this answer I've taken into account the basic data structures every programmer should know about.
They're useful when you need high-speed push, pop and rotate, and don't mind O(n) indexing.
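For instance (a hypothetical circular-list sketch, not a standard library type), "rotate" on a circular singly-linked list is just one pointer move, with no element copying at all:

```csharp
// Hypothetical circular singly-linked list used only for this illustration.
public class CircularList<T>
{
    private class Node { public T Value; public Node Next; }
    private Node head; // following Next from head eventually wraps back to head

    public void Push(T value)
    {
        var node = new Node { Value = value };
        if (head == null) { node.Next = node; head = node; } // single-node cycle
        else { node.Next = head.Next; head.Next = node; }    // insert after head
    }

    public void Rotate()
    {
        if (head != null) head = head.Next; // the whole rotation is one pointer move
    }
}
```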