Formulating list sorting as a pure math problem

Sorting a list is a classic problem in computer science, and many interesting algorithms such as merge sort and heap sort have been discovered.

I'd like to have a precise formulation of the list sorting problem as a pure math problem, so that a hypothetical pure mathematician who has never heard of computers (such as Gauss, let's say) would be able to read the problem, understand it, and then invent algorithms such as merge sort and heap sort.

A statement such as "devise an algorithm for sorting a list of numbers into ascending order" is inadequate, because it doesn't state which basic operations the algorithm is allowed to use, nor does it say precisely what an "algorithm" is. Yes, it should be something that can be implemented in assembly code, but we are posing the problem for a mathematician who has never heard of a computer.

Once the problem has been formulated precisely, it should be possible to state and prove theorems such as "The worst-case running time of the quicksort algorithm is $O(n^2)$" with a level of precision and rigor that would meet the standards of, say, baby Rudin.

Question: How would you precisely state the problem of devising a list sorting algorithm?

Bonus question: Do you know of a reference that presents list sorting algorithms as a topic in pure math, with a level of precision that would satisfy an author such as Rudin?


Edit: This might make the question more specific. The beginning of chapter 8 of Introduction to Algorithms by Cormen et al. [p. 191 in the third edition] states:

> These algorithms share an interesting property: the sorted order they determine is based only on comparisons between the input elements. We call such sorting algorithms comparison sorts. All the sorting algorithms introduced thus far are comparison sorts.
>
> In section 8.1, we shall prove that any comparison sort must make $\Omega(n \lg n)$ comparisons in the worst case to sort $n$ elements.

More specific question: How would you formulate that statement precisely as a math theorem -- the fact that any comparison sort must make $\Omega(n \lg n)$ comparisons in the worst case. In order to make this rigorous, we must either define precisely what a "comparison sort" is, or else avoid using the term entirely.


There are several mathematical studies of sorting algorithms, starting from Knuth's bible [3] on the topic. The Handbook [2] also offers a rigorous approach to algorithms and gives a number of references.

I will not try to summarize the 780 pages of Knuth's volume in a few lines, but here are some points to take into account for a mathematical approach to sorting.

  1. You are given a totally ordered $n$-element set $S = \{a_1 < a_2 < \ldots < a_n\}$. Initially the array to be sorted is $(a_{\sigma(1)}, a_{\sigma(2)}, \ldots, a_{\sigma(n)})$, where $\sigma$ is a permutation of $\{1, \ldots, n\}$.
  2. Usually, the (time) complexity measures the number of comparisons needed to sort the array; no other parameter is taken into account. Thus comparisons of the form "is $a < b$?" are the only basic operations considered, which makes the problem totally independent of the implementation. (A decision-tree formalization along these lines is sketched just after this list.)
  3. There are several complexity measures: worst-case time complexity, average-case time complexity, space complexity (the amount of memory used by the algorithm), not to mention the more recent amortized complexity. In your question, you seem to be interested in worst-case time complexity.
  4. [EDIT] The minimal number of comparisons needed to sort $n$ elements is OEIS sequence A036604. See [4] for a recent contribution on this topic.
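
To make item 2 concrete, here is one standard formalization, the decision-tree model (a common textbook phrasing, though not the only possible one). A comparison sort for $n$ elements is a finite binary tree in which every internal node is labelled by a pair $(i, j)$ with $i \ne j$, its two outgoing edges are labelled "yes" and "no", and every leaf is labelled by a permutation $\pi$ of $\{1, \ldots, n\}$. On input $(a_{\sigma(1)}, \ldots, a_{\sigma(n)})$, start at the root and, at a node labelled $(i, j)$, follow the edge given by the truth value of "$a_{\sigma(i)} < a_{\sigma(j)}$", that is, of $\sigma(i) < \sigma(j)$. The tree sorts if, for every $\sigma$, the leaf reached is labelled $\pi = \sigma^{-1}$, the unique permutation that rearranges the input into $(a_1, \ldots, a_n)$. The worst-case number of comparisons is then exactly the height of the tree, and the claim quoted from Cormen et al. becomes a theorem a mathematician can attack directly: every binary tree that sorts all $n!$ input permutations has height $\Omega(n \lg n)$.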

In a comment, you claim that

> if the list is an array of numbers in computer memory, moving the item in position $j$ into position $i$ requires moving a lot of stuff around in memory (because many items in the array must be shifted by one position).

This is not true. For instance, you can play around with pointers in C (or references in Java). Furthermore, a number of sorting algorithms are based on swapping two positions $i$ and $j$, which does not require any shifting; the sketch below illustrates this. To familiarise yourself with algorithms, [1] is one of the standard references, but there are many others.
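
Here is the point in code (a minimal sketch in Python for concreteness; the function names are mine, chosen for illustration). Exchanging the items at two positions touches exactly two cells, no matter how long the array is, and a sort can be built from comparisons and swaps alone:

```python
def swap(a, i, j):
    # Two reads and two writes: O(1), independent of len(a). Nothing shifts.
    a[i], a[j] = a[j], a[i]

def selection_sort(a):
    # Built entirely from comparisons and swaps: no element is ever
    # "inserted" into the middle of the array, so no block is ever shifted.
    n = len(a)
    for i in range(n - 1):
        m = min(range(i, n), key=lambda k: a[k])  # index of least remaining item
        swap(a, i, m)
```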

[1] Cormen, Thomas H., Leiserson, Charles E., Rivest, Ronald L. and Stein, Clifford. Introduction to algorithms. Third edition. MIT Press, Cambridge, MA, 2009. xx+1292 pp. ISBN: 978-0-262-03384-8

[2] Gonnet, Gaston H. and Baeza-Yates, Ricardo. Handbook of algorithms and data structures. Second edition. International Computer Science Series. Addison-Wesley, Bonn, 1989. xiv+424 pp.

[3] Knuth, Donald E. The art of computer programming. Vol. 3. Sorting and searching. Second edition. Addison-Wesley, Reading, MA, 1998. xiv+780 pp.

[4] Peczarski, Marcin. Towards optimal sorting of 16 elements. Acta Universitatis Sapientiae 4 (2011), no. 2, 215–224.


Check out discussions of comparison-based algorithms (a class of algorithms that includes most sorting algorithms). They are a good starting point for formalisation, even though the definitions of these algorithms in the literature remain too vague to satisfy a pure mathematician.

A common "definition" is that comparison-based algorithms are algorithms in which each basic operation (swap, assignment, etc.) must be based on a prior comparison (or sequence of comparisons). This definition has a gap (and it is not the only one). What really matters is that the basic operations are decided on the basis of the transitive closure of the outcomes of prior comparisons. For instance, in quicksort two elements $a$ and $b$ may each be compared to a pivot element $p$, and may then be swapped on the basis that $a > p$ and $p > b$, even though $a$ and $b$ have never been compared directly.
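
The quicksort situation in code (a sketch in Python, using Lomuto-style partitioning; the names are mine): the only comparisons made are against the pivot, yet afterwards the relative order of elements on opposite sides of it is known by transitivity.

```python
def partition(a, lo, hi):
    p = a[hi]                      # pivot
    i = lo
    for j in range(lo, hi):
        if a[j] < p:               # the only comparison ever made: a[j] vs the pivot
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]      # move the pivot into its final position
    return i

# After the call, every element left of the returned index is < p and every
# element to its right is >= p: for x < p <= y the order of x and y follows
# from the transitive closure, although x and y were never compared directly.
```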

A second issue to look into is the relationship between comparison-based algorithms and partial order production (this will help with your question on lower bounds). This is rarely spelled out in textbooks. Partial order production is (among other aspects) an approach to establishing more general lower bounds for comparison-based sorting. There are, however, excellent proofs of the lower bound tailored specifically to sorting algorithms that make no use of the more general partial-order-production results. One proof uses Jensen's inequality. Send me a message and I'll share it with you if it is of interest. The definition given above for a comparison-based sorting algorithm is certainly robust enough to make the statement of the theorem unambiguous and for the proof to hold up.
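
For the record, the most common proof (a counting argument, not the Jensen-based one mentioned above) is short once a decision-tree formalization such as the one sketched in the answer above is adopted: distinct input permutations must reach leaves carrying distinct labels, so a tree that sorts has at least $n!$ leaves; a binary tree of height $h$ has at most $2^h$ leaves; hence

$$h \ge \log_2(n!) \ge \log_2\left((n/2)^{n/2}\right) = \frac{n}{2}\log_2\frac{n}{2} = \Omega(n \log n).$$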

A third issue (for mathematicians) is the specification of the operations involved. The tightest presentation is Knuth's volume on sorting and searching (The Art of Computer Programming, Vol. III), an excellent resource already mentioned upstream. Even at that level of precision, a pure mathematician will want to abstract away from the detail, which is a challenge in its own right. See below on the tension between code and maths.

A fourth issue is the distinction between implicit data structures and explicit ones. For heap sort, the explicit data structure is typically a list (array). The implicit data structure, however, is a heap: it is in the programmer's mind when writing the code, even though no heap ever appears explicitly in the code (only arrays are used).
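
To see what "implicit" means here, a sketch in Python (the standard sift-down; the code is mine): nothing but an ordinary array appears, yet the index arithmetic $2k+1$, $2k+2$ is exactly the parent-to-children map of a complete binary tree, and that tree is the heap the programmer reasons about.

```python
def sift_down(a, k, n):
    # Restore the max-heap property in a[0:n] at node k. The "heap" is only
    # the convention that the children of a[k] live at indices 2k+1 and 2k+2.
    while True:
        left, right = 2 * k + 1, 2 * k + 2
        largest = k
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == k:
            return                           # heap property holds at node k
        a[k], a[largest] = a[largest], a[k]  # push the violation one level down
        k = largest
```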

Finally, there is the status of algorithm analysis in general, which remains a mixture of pure and applied maths. Many techniques are tailored to the algorithm under consideration; there is no unified approach. Knuth's TAOCP Vol. III has the most detailed information on a formal approach (albeit with methods still tailored to individual algorithms). Flajolet's work on the analysis of algorithms can be viewed as a step towards unification, though the operations used at times veer away from the operations commonly used in computer science so that generating functions can be applied; as always, this creates a tension between computer science and maths, since code gets shoe-horned to fit the maths. I wrote a book on the topic focused on a unifying theory, staying closer to standard computation (A modular calculus for the average cost of data structuring). It comes with the disclaimer (from this pure mathematician and computer scientist) that it will still not fully satisfy a pure mathematician.

A lot of work remains to be done to bring the theory to the right level of abstraction to meet a pure mathematician's expectations (your original question, I believe). The game is to balance clarity and precision, and to achieve unification while respecting the actual goals of the discipline (CS in this case).