Is it faster to sort a list after inserting items, or to add them to a sorted list?

If you add enough items that you're effectively building the list from scratch, you should be able to get better performance by sorting the list afterwards.

If items are mostly in order, you can tweak both incremental update and regular sorting to take advantage of that, but frankly, it usually isn't worth the trouble. (You also need to make sure that no unexpected ordering can make your algorithm take much longer; q.v. naive quicksort, which degrades to O(N^2) on already-sorted input when it always picks the first element as the pivot.)

Both incremental update and regular list sort are O(N log N), but you can get a better constant factor by sorting everything afterward. (I'm assuming here that you've got some auxiliary data structure so your incremental update can access list items faster than O(N).) Generally speaking, sorting all at once has a lot more design freedom than maintaining the ordering incrementally: an incremental update has to maintain a complete order at all times, but an all-at-once bulk sort does not.

If nothing else, remember that there are lots of highly-optimized bulk sorts available.
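As a rough illustration of the two strategies, here is a minimal C++ sketch. The function names build_incrementally and build_then_sort are made up for this example. Note that it uses a plain std::vector with no auxiliary structure, so each incremental insert costs O(N) for shifting elements and the incremental version is O(N^2) in the worst case, which is worse than the O(N log N) assumed above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Incremental approach: keep the vector sorted after every insertion.
// The binary search is O(log N), but vector::insert has to shift elements,
// so each insertion is O(N) in the worst case.
std::vector<int> build_incrementally(const std::vector<int>& items)
{
    std::vector<int> sorted;
    for (int item : items) {
        auto pos = std::upper_bound(sorted.begin(), sorted.end(), item);
        sorted.insert(pos, item);
        // Invariant of the incremental approach: a complete order at all times.
        // (Debug-only check; it is compiled out when NDEBUG is defined.)
        assert(std::is_sorted(sorted.begin(), sorted.end()));
    }
    return sorted;
}

// Bulk approach: copy everything, then sort once. O(N log N) in total.
std::vector<int> build_then_sort(const std::vector<int>& items)
{
    std::vector<int> sorted(items);
    std::sort(sorted.begin(), sorted.end());
    return sorted;
}
```

Both functions return the same sorted vector; which one is faster for a given workload is something to measure rather than assume.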

Usually it's far better to use a heap. In short, it splits the cost of maintaining order between the pusher and the picker. Both push and pop are O(log n) per operation, instead of the O(n log n) it costs to re-sort, as in most other solutions.
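For example, in C++ a heap is available as std::priority_queue. The sketch below is only an illustration: the MinHeap alias and the drain_in_order helper are names invented here, and it assumes you want the smallest item first (hence std::greater).

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// A min-heap of ints: top() is always the smallest item currently stored.
using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;

// The "picker" side: pop every item in ascending order.
// Each pop is O(log n); each push on the "pusher" side is O(log n) as well.
std::vector<int> drain_in_order(MinHeap& heap)
{
    std::vector<int> out;
    while (!heap.empty()) {
        out.push_back(heap.top());
        heap.pop();
    }
    assert(heap.empty());
    return out;
}
```

Pushing with heap.push(x) can be interleaved freely with popping. The heap never holds a fully sorted sequence; it only guarantees that the next item out is the smallest.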

If you're adding in bunches, you can use a merge sort. Sort the list of items to be added, then copy from both lists, comparing items to determine which one gets copied next. You could even merge in place if you resize your destination array and work from the end backwards.

The efficiency of this solution is O(n+m) + O(m log m) where n is the size of the original list, and m is the number of items being inserted.

Edit: Since this answer isn't getting any love, I thought I'd flesh it out with some C++ sample code. I assume that the sorted list is kept in a linked list rather than an array. This changes the algorithm to look more like an insertion than a merge, but the principle is the same.

#include <algorithm>
#include <list>
#include <vector>

// Note that itemstoadd is modified (sorted) as a side effect of this function
template<typename T>
void AddToSortedList(std::list<T> & sortedlist, std::vector<T> & itemstoadd)
{
    std::sort(itemstoadd.begin(), itemstoadd.end());
    // typename is required because these iterator types depend on T
    typename std::list<T>::iterator listposition = sortedlist.begin();
    typename std::vector<T>::iterator nextnewitem = itemstoadd.begin();
    // Stop once every new item has been placed; the rest of the list is
    // already in order, and nextnewitem must not be dereferenced at end()
    while (nextnewitem != itemstoadd.end())
    {
        if ((listposition == sortedlist.end()) || (*nextnewitem < *listposition))
            sortedlist.insert(listposition, *nextnewitem++);
        else
            ++listposition;
    }
}
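For the array case mentioned earlier (resize the destination and work from the end backwards), a minimal sketch might look like the following. The name merge_into_sorted is made up for this example, and it assumes int elements and a std::vector as the array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Merge `items` into the already-sorted `sorted`, in place.
// `items` is taken by value so the caller's copy is not reordered.
void merge_into_sorted(std::vector<int>& sorted, std::vector<int> items)
{
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    std::sort(items.begin(), items.end());   // O(m log m)

    std::size_t i = sorted.size();    // one past the last unmerged original item
    std::size_t j = items.size();     // one past the last unmerged new item
    sorted.resize(i + j);             // make room at the end
    std::size_t out = sorted.size();  // one past the next slot to fill

    // Fill from the back. Since out == i + j throughout, the slot being
    // written never holds an original item that is still waiting to be merged.
    while (j > 0) {
        if (i > 0 && items[j - 1] < sorted[i - 1])
            sorted[--out] = sorted[--i];
        else
            sorted[--out] = items[--j];
    }
    // Any remaining original items (indices 0..i-1) are already in place.
}
```

The merge loop is O(n+m), matching the estimate above. The standard library's std::merge and std::inplace_merge cover similar ground if you would rather not hand-roll the loop.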

In principle, it's faster to create a tree than to sort a list. Each tree insert is O(log(n)), leading to O(n log(n)) overall. Sorting is also O(n log(n)), so any difference comes down to constant factors.

That's why Java has TreeMap and TreeSet, in addition to the ArrayList and LinkedList implementations of List. (A TreeList exists too, but in Apache Commons Collections rather than in the JDK.)

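A TreeSet keeps things in object comparison order. The ordering is defined by the Comparable interface (or by a Comparator passed to the constructor).
A LinkedList keeps things in insertion order.
An ArrayList also keeps insertion order. It can reserve more memory than it currently needs (spare capacity in its backing array), but it is faster for some operations, such as indexed access.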
A TreeMap, similarly, removes the need to sort by a key. The map is built in key order during the inserts and maintained in sorted order at all times.

However, for some reason, the Java implementation of TreeSet is quite a bit slower than using an ArrayList and a sort.

[It's hard to speculate as to why it would be dramatically slower, but it is. It should be slightly faster by one pass through the data. This kind of thing is often the cost of memory management trumping the algorithmic analysis.]
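The same comparison can be sketched in C++, where std::multiset (typically a balanced tree) plays roughly the role of Java's TreeSet. This is only an analogy, not the Java code being discussed, and the function names sort_via_tree and sort_via_vector are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Tree approach: every insert is O(log n), and the container is in sorted
// order at all times. Each element lives in its own heap-allocated node.
std::vector<int> sort_via_tree(const std::vector<int>& items)
{
    std::multiset<int> tree;
    for (int item : items)
        tree.insert(item);
    assert(tree.size() == items.size());  // multiset keeps duplicates
    return std::vector<int>(tree.begin(), tree.end());
}

// Array approach: one contiguous buffer, sorted once in O(n log n).
std::vector<int> sort_via_vector(std::vector<int> items)
{
    std::sort(items.begin(), items.end());
    return items;
}
```

Both produce the same order. Timing them on your own data is the only reliable way to see whether per-node allocation outweighs the asymptotic tie, as the Java observation above suggests it can.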
