Quicksort: Choosing the pivot
Solution 1:
Choosing a random pivot minimizes the chance that you will encounter worst-case O(n2) performance (always choosing first or last would cause worst-case performance for nearly-sorted or nearly-reverse-sorted data). Choosing the middle element would also be acceptable in the majority of cases.
Also, if you are implementing this yourself, there are versions of the algorithm that work in-place (i.e. without creating two new lists and then concatenating them).
Solution 2:
It depends on your requirements. Choosing a pivot at random makes it harder to create a data set that generates O(N^2) performance. 'Median-of-three' (first, last, middle) is also a way of avoiding problems. Beware of relative performance of comparisons, though; if your comparisons are costly, then Mo3 does more comparisons than choosing (a single pivot value) at random. Database records can be costly to compare.
Update: Pulling comments into answer.
mdkess asserted:
'Median of 3' is NOT first last middle. Choose three random indexes, and take the middle value of this. The whole point is to make sure that your choice of pivots is not deterministic - if it is, worst case data can be quite easily generated.
To which I responded:
Analysis Of Hoare's Find Algorithm With Median-Of-Three Partition (1997) by P Kirschenhofer, H Prodinger, C Martínez supports your contention (that 'median-of-three' is three random items).
There's an article described at portal.acm.org that is about 'The Worst Case Permutation for Median-of-Three Quicksort' by Hannu Erkiö, published in The Computer Journal, Vol 27, No 3, 1984. [Update 2012-02-26: Got the text for the article. Section 2 'The Algorithm' begins: 'By using the median of the first, middle and last elements of A[L:R], efficient partitions into parts of fairly equal sizes can be achieved in most practical situations.' Thus, it is discussing the first-middle-last Mo3 approach.]
Another short article that is interesting is by M. D. McIlroy, "A Killer Adversary for Quicksort", published in Software-Practice and Experience, Vol. 29(0), 1–4 (0 1999). It explains how to make almost any Quicksort behave quadratically.
AT&T Bell Labs Tech Journal, Oct 1984 "Theory and Practice in the Construction of a Working Sort Routine" states "Hoare suggested partitioning around the median of several randomly selected lines. Sedgewick [...] recommended choosing the median of the first [...] last [...] and middle". This indicates that both techniques for 'median-of-three' are known in the literature. (Update 2014-11-23: The article appears to be available at IEEE Xplore or from Wiley — if you have membership or are prepared to pay a fee.)
'Engineering a Sort Function' by J L Bentley and M D McIlroy, published in Software Practice and Experience, Vol 23(11), November 1993, goes into an extensive discussion of the issues, and they chose an adaptive partitioning algorithm based in part on the size of the data set. There is a lot of discussion of trade-offs for various approaches.
A Google search for 'median-of-three' works pretty well for further tracking.
Thanks for the information; I had only encountered the deterministic 'median-of-three' before.
Solution 3:
Heh, I just taught this class.
There are several options.
Simple: Pick the first or last element of the range. (bad on partially sorted input)
Better: Pick the item in the middle of the range. (better on partially sorted input)
However, picking any arbitrary element runs the risk of poorly partitioning the array of size n into two arrays of size 1 and n-1. If you do that often enough, your quicksort runs the risk of becoming O(n^2).
One improvement I've seen is pick median(first, last, mid); In the worst case, it can still go to O(n^2), but probabilistically, this is a rare case.
For most data, picking the first or last is sufficient. But, if you find that you're running into worst case scenarios often (partially sorted input), the first option would be to pick the central value( Which is a statistically good pivot for partially sorted data).
If you're still running into problems, then go the median route.