Is there a way to measure how sorted a list is?
Is there is a way to measure how sorted a list is?
I mean, it's not about knowing if a list is sorted or not (boolean), but something like a ratio of "sortness", something like the coefficient of correlation in statistics.
For example,
If the items of a list are in ascending order, then its rate would be 1.0
If list is sorted descending, its rate would be -1.0
If list is almost sorted ascending, its rate would be 0.9 or some value close to 1.
If the list is not sorted at all (random), its rate would be close to 0
I'm writting a small library in Scala for practice. I think a sorting rate would be useful, but I don't find any information about something like that. Maybe I don't know adequate terms for the concept.
You can simply count the number of inversions in the list.
Inversion
An inversion in a sequence of elements of type T
is a pair of sequence elements that appear out of order according to some ordering <
on the set of T
's.
From Wikipedia:
Formally, let
A(1), A(2), ..., A(n)
be a sequence ofn
numbers.
Ifi < j
andA(i) > A(j)
, then the pair(i,j)
is called an inversion ofA
.The inversion number of a sequence is one common measure of its sortedness.
Formally, the inversion number is defined to be the number of inversions, that is,
To make these definitions clearer, consider the example sequence 9, 5, 7, 6
. This sequence has the inversions (0,1), (0,2), (0,3), (2,3)
and the inversion number 4
.
If you want a value between 0
and 1
, you can divide the inversion number by N choose 2
.
To actually create an algorithm to compute this score for how sorted a list is, you have two approaches:
Approach 1 (Deterministic)
Modify your favorite sorting algorithm to keep track of how many inversions it is correcting as it runs. Though this is nontrivial and has varying implementations depending on the sorting algorithm you choose, you will end up with an algorithm that is no more expensive (in terms of complexity) than the sorting algorithm you started with.
If you take this route, be aware that it's not as simple as counting "swaps." Mergesort, for example, is worst case O(N log N)
, yet if it is run on a list sorted in descending order, it will correct all N choose 2
inversions. That's O(N^2)
inversions corrected in O(N log N)
operations. So some operations must inevitably be correcting more than one inversion at a time. You have to be careful with your implementation. Note: you can do this with O(N log N)
complexity, it's just tricky.
Related: calculating the number of “inversions” in a permutation
Approach 2 (Stochastic)
- Randomly sample pairs
(i,j)
, wherei != j
- For each pair, determine whether
list[min(i,j)] < list[max(i,j)]
(0 or 1) - Compute the average of these comparisons and then normalize by
N choose 2
I would personally go with the stochastic approach unless you have a requirement of exactness - if only because it's so easy to implement.
If what you really want is a value (z'
) between -1
(sorted descending) to 1
(sorted ascending), you can simply map the value above (z
), which is between 0
(sorted ascending) and 1
(sorted descending), to this range using this formula:
z' = -2 * z + 1
The traditional measure of how sorted a list (or other sequential structure) is, is the number of inversions.
The number of inversions is the number of pairs (a,b) st index of a < b AND b <<
a. For these purposes <<
represents whatever ordering relation you choose for your particular sort.
A fully sorted list has no inversions, and a completely reversed list has the maximum number of inversions.
You can use actual correlation.
Suppose that to each item in the sorted list, you assign an integer rank starting from zero. Note that a graph of the elements position index versus rank will look like dots in a straight line (correlation of 1.0 between the position and rank).
You can compute a correlation on this data. For a reverse sort you will get -1 and so on.
There has been great answers, and I would like to add a mathematical aspect for completeness:
You can measure how sorted a list is by measuring how much it is correlated to a sorted list. To do that, you may use the rank correlation (the most known being Spearman's), which is exactly the same as the usual correlation, but it uses the rank of elements in a list instead of the analog values of its items.
Many extensions exist, like a correlation coefficient (+1 for exact sort, -1 for exact inversion)
This allows you to have statistical properties for this measure, like the permutational central limit theorem, which allows you to know the distribution of this measure for random lists.
Apart from inversion count, for numeric lists, mean square distance from the sorted state is imaginable:
#! ruby
d = -> a { a.zip( a.sort ).map { |u, v| ( u - v ) ** 2 }.reduce( :+ ) ** 0.5 }
a = 8, 7, 3, 4, 10, 9, 6, 2, 5, 1
d.( a ) #=> 15.556
d.( a.sort ) #=> 0.0
d.( a.sort.reverse ) # => 18.166 is the worrst case