How the usort() sorting algorithm works?
I have an usort() example and I added some echo statements to see how the code works:
<?php
function list_cmp($a, $b) {
global $order;
echo "\$a=$a, \$b=$b </br>";
foreach ($order as $key => $value) {
echo "\$value=$value </br>";
if ($a == $value) {
echo "\$a=\$value, returing 0. </br>";
return 0;
}
if ($b == $value) {
echo "\$b=\$value, returing 1. </br>";
return 1;
}
}
}
$order[0] = 1;
$order[1] = 3;
$order[2] = 4;
$order[3] = 2;
$array[0] = 2;
$array[1] = 1;
$array[2] = 3;
$array[3] = 4;
$array[4] = 2;
array[5] = 1;
$array[6] = 2;
usort($array, "list_cmp");
?>
The output of the code is this:
$a=2, $b=1
$value=1
$b=$value, returing 1.
$a=2, $b=3
$value=1
$value=3
$b=$value, returing 1.
$a=1, $b=3
$value=1
$a=$value, returing 0.
$a=2, $b=4
$value=1
$value=3
$value=4
$b=$value, returing 1.
$a=3, $b=4
$value=1
$value=3
$a=$value, returing 0.
$a=2, $b=2
$value=1
$value=3
$value=4
$value=2
$a=$value, returing 0.
$a=2, $b=1
$value=1
$b=$value, returing 1.
$a=2, $b=1
$value=1
$b=$value, returing 1.
$a=4, $b=1
$value=1
$b=$value, returing 1.
$a=3, $b=1
$value=1
$b=$value, returing 1.
$a=1, $b=1
$value=1
$a=$value, returing 0.
$a=2, $b=2
$value=1
$value=3
$value=4
$value=2
$a=$value, returing 0.
What is the mechanism of creating the 12 $a-$b pairs - 2-1, 2-3, 1-3, 2-4, 3-4, 2-2, 2-1, 2-1 (the same again?), 4-1, 3-1, 1-1, 2-2. The above code returns 1,1,0,1,0,0,1,1,1,1,0,0. And also what is the mechanism of sorting the array, based on the values that get returned? I am trying to understand how the usort() mechanism works.
Thanks.
Solution 1:
- How does the comparator work?
I'm not sure wether this was part of the question, but to be clear about how the comparator works:
You have an order specified by ordered list $order
and a special comparator callback list_cmp
which (should) return wether argument
-
$a
is smaller than$b
(return -1
or a value< 0
) -
$a
greater than$b
(return 1
or a value> 0
) -
$a
equal$b
(return 0
)
list_cmp
does this by looking up its order table and rather checking if
-
$a
has a smaller (or equal) order in which case the loop exits early withreturn 0
or if -
$b
has a smaller order in which case the loop exits early withreturn 1
.
Be aware that this is wrong according to the PHP Documentation which states it wants positive/negative/0 as return values. This is only correct if you know that the internals only check for comparator($a,$b) > 0
, meaning it only checks if $b
is smaller than and not equal to $a
, making this a comparison order of $a <= order of $b
. It can easily break if the code starts to check for $a
being smaller than and not equal to $b
.
- How does the
quicksortsorting work?
For starters I'm assuming you are using PHP 7 or higher. In that case you hit a special case with 6-15 Element sized arrays. PHP 7+ does not seem to use quicksort for short lists, instead it uses an Insertion Sort Variant (which is most probably faster due to Hardware related stuff like caching/code locality). You can check the sorting source code f.e. on the Github PHP Mirror (as an example: PHP 7.0 Zend/Zend_sort.c Line 177-198).
The code works in 3 steps:
-
comparision: compare neighbor elements
array[j]
andarray[j+1]
, ifarray[j] <= array[j+1]
move on, else goto 2. -
find insertion point: now if
array[j] > array[j+1]
, scan backwards to find the point wherearray[x] < array[j+1] <= array[x+1]
forx < j
(obviously only untilx
hits the start) -
insert: shift elements
x+1 ... j
one up such that it becomesx+2 ... j+1
and insert former element at positionx+1
If you apply that code to the pairings (2-1, 2-3, 1-3, 2-4, 3-4, 2-2, 2-1, 2-1, 4-1, 3-1, 1-1, 2-2) it gets obvious what the code does.
-- [2,1],3,4,2,1,2 -> 1./2./3. compare [2,1], find and insert 1 before 2
-- 1,[2,3],4,2,1,2 -> 1./2. compare [2,3], find insert point for 3 (since order of 3 < order of 2)
-- [1,3],2,4,2,1,2 -> 3. compare [1,3], found insert point for 3 before 2
-- 1,3,[2,4],2,1,2 -> 1./2. compare [2,4], find insert point for 4 (since order of 4 < order of 2)
-- 1,[3,4],2,2,1,2 -> 3. compare [3,4], found insert point for 4 before 2
-- 1,3,4,[2,2],1,2 -> 1. compare [2,2], skip
-- 1,3,4,2,[2,1],2 -> 1./2. compare [2,1], find insert point for 1
-- 1,3,4,[2,1],2,2 -> 2. compare [2,1], find insert point for 1
-- 1,3,[4,1],2,2,2 -> 2. compare [4,1], find insert point for 1
-- 1,[3,1],4,2,2,2 -> 2. compare [3,1], find insert point for 1
-- [1,1],3,4,2,2,2 -> 3. compare [1,1], fond insert point for 1 before 3
-- 1,1,3,4,2,[2,2] -> 1. compare [2,2], skip
-- sorted: 1,1,3,4,2,2,2
PS: Here you already see that it is quite complex to derive the work of even a simple sorting algorithm (22 lines of code) by its comparison patterns. The PHP 7 quicksort implementation is around 10 times as big in lines of code and has some freaky optimizations (besides normal madness due to pivot selection and recursions).
Most of the times it is better to ignore in depth implementation details and only reduce it to the stuff that is needed. Typical questions for a sorting algorithm would be if it is stable/unstable and performing in O(log n)
with O(n)
Memory consumption. There are easier ways to learn the core algorithms behind those optimized implementations like the Quicksort Dance or any other visualization or a good old (e)book or webpage with examples.
-- Edited
Added a (bad, unoptimized, unsafe) php implementation for insertion sort for another visualization of how it works:
<?php
function my_usort($A, $comparator) {
// Start .. End Positions
$current_pos = 0;
$last_pos = count($A)-1;
// Outer Loop: each step checks that A[0] up to A[current_pos] is sorted.
// When the algorithm finishes we know that A[0] ... A[last_pos] is sorted
while($current_pos < $last_pos) {
echo "Sorted Subarray from \$A is " . json_encode(array_slice($A, 0, $current_pos+1)) . "<br>\n";
echo "\$A looks like this now: " . json_encode($A) .
", comparing [" . $A[$current_pos] . "," . $A[$current_pos +1] . "] (verify step)<br>\n";
// "Verification Step"
// At this point A[0] ... A[current_pos] is sorted.
// Check A[current_pos] <= A[current_pos +1]
if($comparator($A[$current_pos], $A[$current_pos +1]) > 0) {
// nope, A[current_pos] > A[current_pos +1] (list_cmp/comparator returns value > 0)
// "Insertion Step" start, find the correct position for A[current_pos+1] in the already
// sorted list A[0] ... A[current_pos]
$insert_point = $current_pos;
// Swap the missmatching Neighbor pair
echo "swapping: \$A[" . $insert_point . "] and \$A[" . ($insert_point+1) . "]<br>\n";
$tmp = $A[$insert_point +1];
$A[$insert_point +1] = $A[$insert_point];
$A[$insert_point] = $tmp;
$sorted_up_to_current_pos = false;
// Inner Loop: find correct insertion point
while($insert_point > 0 && !$sorted_up_to_current_pos) {
echo "\$A looks like this now: " . json_encode($A) .
", comparing [" . $A[$insert_point-1] . "," . $A[$insert_point] . "] (insertion step)<br>\n";
// "Insertion Step", Swap the missmatching Neighbor pairs until A[0] ... A[current_pos] is sorted again
if($comparator($A[$insert_point-1], $A[$insert_point]) > 0) {
// Swap the missmatching Neighbor pair
echo "swapping: \$A[" . ($insert_point-1) . "] and \$A[" . $insert_point . "]<br>\n";
$tmp = $A[$insert_point];
$A[$insert_point] = $A[$insert_point-1];
$A[$insert_point-1] = $tmp;
// goto new pair
$insert_point = $insert_point -1;
} else {
// found correct spot, done
$sorted_up_to_current_pos = true;
}
}
$A[$insert_point] = $tmp;
echo "\$A looks like this now: " . json_encode($A) . ", insertion done<br>\n";
}
$current_pos = $current_pos + 1;
}
echo "Sorted Array \$A is " . json_encode(array_slice($A, 0, $current_pos+1)) . "<br>\n";
}
function list_cmp($a, $b) {
global $order;
//echo "\$a=$a, \$b=$b </br>\n";
foreach ($order as $key => $value) {
//echo "\$value=$value </br>\n";
if ($a == $value) {
echo "\$a=\$value, returing 0. </br>\n";
return 0;
}
if ($b == $value) {
echo "\$b=\$value, returing 1. </br>\n";
return 1;
}
}
}
$order[0] = 1;
$order[1] = 3;
$order[2] = 4;
$order[3] = 2;
$array[0] = 2;
$array[1] = 1;
$array[2] = 3;
$array[3] = 4;
$array[4] = 2;
$array[5] = 1;
$array[6] = 2;
my_usort($array, "list_cmp");
Output is complete now with currently sorted array, positions:
Sorted Subarray from $A is [2]
$A looks like this now: [2,1,3,4,2,1,2], comparing [2,1] (verify step)
$b=$value, returing 1.
swapping: $A[0] and $A[1]
$A looks like this now: [1,2,3,4,2,1,2], insertion done
Sorted Subarray from $A is [1,2]
$A looks like this now: [1,2,3,4,2,1,2], comparing [2,3] (verify step)
$b=$value, returing 1.
swapping: $A[1] and $A[2]
$A looks like this now: [1,3,2,4,2,1,2], comparing [1,3] (insertion step)
$a=$value, returing 0.
$A looks like this now: [1,3,2,4,2,1,2], insertion done
Sorted Subarray from $A is [1,3,2]
$A looks like this now: [1,3,2,4,2,1,2], comparing [2,4] (verify step)
$b=$value, returing 1.
swapping: $A[2] and $A[3]
$A looks like this now: [1,3,4,2,2,1,2], comparing [3,4] (insertion step)
$a=$value, returing 0.
$A looks like this now: [1,3,4,2,2,1,2], insertion done
Sorted Subarray from $A is [1,3,4,2]
$A looks like this now: [1,3,4,2,2,1,2], comparing [2,2] (verify step)
$a=$value, returing 0.
Sorted Subarray from $A is [1,3,4,2,2]
$A looks like this now: [1,3,4,2,2,1,2], comparing [2,1] (verify step)
$b=$value, returing 1.
swapping: $A[4] and $A[5]
$A looks like this now: [1,3,4,2,1,2,2], comparing [2,1] (insertion step)
$b=$value, returing 1.
swapping: $A[3] and $A[4]
$A looks like this now: [1,3,4,1,2,2,2], comparing [4,1] (insertion step)
$b=$value, returing 1.
swapping: $A[2] and $A[3]
$A looks like this now: [1,3,1,4,2,2,2], comparing [3,1] (insertion step)
$b=$value, returing 1.
swapping: $A[1] and $A[2]
$A looks like this now: [1,1,3,4,2,2,2], comparing [1,1] (insertion step)
$a=$value, returing 0.
$A looks like this now: [1,1,3,4,2,2,2], insertion done
Sorted Subarray from $A is [1,1,3,4,2,2]
$A looks like this now: [1,1,3,4,2,2,2], comparing [2,2] (verify step)
$a=$value, returing 0.
Sorted Array $A is [1,1,3,4,2,2,2]