What is the sorting algorithm for Java [closed]

How does OpenJDK internally sort the datatypes and why ? It would be great if the specific algorithms can be mentioned


Solution 1:

Beginning with version 7, Oracle's Java implementation is using Timsort for object arrays bigger than 10 elements, and Insertion sort for arrays with less than that number of elements. The same considerations apply for both Arrays.sort() and Collections.sort(). In older versions of Java, Merge sort was used instead of Timsort.

Other implementations of the language (other than Oracle's) might use a different sorting algorithm, as this is not mandated by the specification. Quoting Collections' documentation:

The documentation for the polymorphic algorithms contained in this class generally includes a brief description of the implementation. Such descriptions should be regarded as implementation notes, rather than parts of the specification. Implementors should feel free to substitute other algorithms, so long as the specification itself is adhered to. (For example, the algorithm used by sort does not have to be a mergesort, but it does have to be stable.)

For sorting numeric primitives, JDK 7 uses "dual pivot quicksort".

Solution 2:

Collections.sort() uses a modified mergesort. Arrays.sort() uses a variation of quicksort for the primitives and mergesort for Object sorting.

For Java 7, read the comment from @SebastianPaaskeTørholm below

Solution 3:

OK attempting to come up with the canonical list. Basically the contract is that Collections.sort has to be a "stable" sort (i.e. equal elements won't be re-arranged), where Arrays.sort (for native type arrays) can re-arrange them, since they're identical, so it has more freedom to use different (i.e. faster) algorithms. Rationale for wanting stable contract is given here. Also it is presumed that comparing objects (vs. natives) is "much more expensive" (it typically is) so a side-goal for Collections.sort is to minimize number of comparisons, and be stable.

For all versions, Collections.sort initially makes a copy of the list (to an array), modifies that, then copies the sorted elements back into the initial list, to avoid O(n^2) complexity for sorting linked lists. Guess they thought the extra copy wouldn't be too expensive since it's just copying references, not actual values (?).

In JDK 6:

Arrays of native types: tuned quicksort

 * The sorting algorithm is a tuned quicksort, adapted from Jon
 * L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function",
 * Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November
 * 1993).  This algorithm offers n*log(n) performance on many data sets
 * that cause other quicksorts to degrade to quadratic performance.

It was deemed that quadratic "worst case" O(n^2) behavior was not a problem for this modified quicksort.

Quicksort itself was chosen for performance.

List of Objects: modified mergesort

 * The sorting algorithm is a modified mergesort (in which the merge is
 * omitted if the highest element in the low sublist is less than the
 * lowest element in the high sublist).  This algorithm offers guaranteed
 * n log(n) performance. 

"It is a reasonably fast stable sort that guarantees O(n log n) performance and requires O(n) extra space."

It also defaults to an insertion sort for small arrays.

JDK 7:

Arrays of native types: dual-pivot quicksort

 * ...The sorting algorithm is a Dual-Pivot Quicksort
 * by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm
 * offers O(n log(n)) performance on many data sets that cause other
 * quicksorts to degrade to quadratic performance, and is typically
 * faster than traditional (one-pivot) Quicksort implementations.

"The new algorithm reduces the average number of swaps by 20%."

There are also certain thresholds where if size is "below x" it will just do a counting sort, insertion sort, or quicksort instead of the "dual pivot quicksort." (depending on what type of primitive is being sorted) https://stackoverflow.com/a/41129231/32453

List of Objects: Timsort a kind of hybrid merge/insertion sort.

"It is a stable, adaptive, iterative mergesort that requires far fewer than n log(n) comparisons when running on partially sorted arrays, while offering performance comparable to a traditional mergesort when run on random arrays. Like all proper mergesorts timsort is stable and runs in O(n log n) time (worst case). In the worst case, timsort requires temporary storage space for n/2 object references; in the best case, it requires only a small constant amount of space. Contrast this with the current implementation, which always requires extra space for n object references, and beats n log n only on nearly sorted lists."

"On highly ordered data, this code can run up to 25 times as fast as the current implementation."

"1) Guaranteed O(n*log(n)) or less comparisons with a low constant. 2) Exactly n-1 comparisons for presorted (or revsorted) data. 3) Stable sort."

You can revert back to using LegacyMergeSort with an env. setting.

JDK 8:

Arrays of native types: dual-pivot quicksort, with some small modifications over jdk 7 (what?).

List of Objects: Timsort (same)

Parallel sort: ???

JDK 9:

Arrays of native types: dual-pivot quicksort, with at least some small modifications so if data is "mostly ordered" it will just do a modified merge sort on it.

List of Objects: Timsort (same)

Parallel sort: ???

JDK 10:

Arrays of native types: dual-pivot quicksort, some modifications have been proposed.

List of Objects: Timsort (same)

Parallel sort: ???

This is a community wiki feel free to update it and/or elaborate.