How can I sort an array of UTF-8 strings in PHP?
I need help sorting UTF-8 strings. For example, here are five cities in Belgium:
$array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');
sort($array); // Expected: Aubel, Borgloon, Éghezée, Lennik, Thuin
// Actual: Aubel, Borgloon, Lennik, Thuin, Éghezée
Éghezée should come third. Is there a way to make the sort UTF-8 aware, or to define my own character order?
Solution 1:
The intl extension has been bundled with PHP since 5.3 (it may need to be enabled in your build), and it works only with UTF-8 strings.
You can use a Collator in this case:
$array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');
$collator = new Collator('en_US');
$collator->sort($array);
print_r($array);
Output:
Array
(
[0] => Aubel
[1] => Borgloon
[2] => Éghezée
[3] => Lennik
[4] => Thuin
)
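A sketch of two other Collator features, assuming the intl extension is loaded: Collator::asort() sorts while keeping key => value pairs, and setStrength(Collator::PRIMARY) makes compare() ignore accents and case. The numeric keys are hypothetical IDs, and 'nl_BE' is an assumed choice of locale; 'en_US' gives the same order for these names.

```php
<?php
// Requires the intl extension. The keys are hypothetical IDs, used only to
// show that Collator::asort() keeps key => value associations.
$cities = array(10 => 'Borgloon', 20 => 'Thuin', 30 => 'Lennik', 40 => 'Éghezée', 50 => 'Aubel');

$collator = new Collator('nl_BE');   // assumed locale; 'en_US' also works here
if (!$collator->asort($cities)) {
    throw new RuntimeException($collator->getErrorMessage());
}
print_r($cities);   // each city keeps its original key

// With PRIMARY strength, accents and case are ignored when comparing.
$collator->setStrength(Collator::PRIMARY);
var_dump($collator->compare('Éghezée', 'eghezee'));   // int(0)
```

Use asort() instead of sort() when the array keys carry meaning; sort() reindexes the array.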
Solution 2:
You can also use strcoll(), which compares strings according to the current LC_COLLATE locale:
setlocale(LC_COLLATE, 'nl_BE.utf8');
$array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');
usort($array, 'strcoll');
print_r($array);
Result:
Array
(
[0] => Aubel
[1] => Borgloon
[2] => Éghezée
[3] => Lennik
[4] => Thuin
)
This requires the nl_BE.utf8 locale to be installed on your system. You can check with:
fy@Heisenberg:~$ locale -a | grep nl_BE.utf8
nl_BE.utf8
On Debian, you can add locales with dpkg-reconfigure locales.
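Because setlocale() returns false without raising an error when a locale is not installed, it can help to try several spellings of the locale name and check the result. The helper below, sortWithLocale(), is a hypothetical name; it tries each candidate in turn, sorts with strcoll(), and restores the previous LC_COLLATE setting. Keep in mind that setlocale() changes process-wide state, which the PHP manual warns about on multithreaded servers.

```php
<?php
// Hypothetical helper: sort $items with strcoll() under the first installed
// locale from $locales. Returns null if none of the candidates is available.
function sortWithLocale(array $items, array $locales): ?array
{
    if (count($locales) === 0) {
        return null;
    }
    $previous = setlocale(LC_COLLATE, '0');   // '0' only reads the current value
    if (setlocale(LC_COLLATE, ...array_values($locales)) === false) {
        return null;   // no candidate locale is installed
    }
    usort($items, 'strcoll');
    if ($previous !== false) {
        setlocale(LC_COLLATE, $previous);     // restore the caller's setting
    }
    return $items;
}

$array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');
$sorted = sortWithLocale($array, array('nl_BE.UTF-8', 'nl_BE.utf8'));
if ($sorted === null) {
    echo "No nl_BE UTF-8 locale is installed\n";
} else {
    print_r($sorted);
}
```

The locale names differ between systems (for example nl_BE.UTF-8 versus nl_BE.utf8), which is why the sketch accepts a list.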
Solution 3:
This script solves the problem with a custom character order. I hope it helps. Note the mb_strtolower() calls: they make the comparison case-insensitive. I didn't use strtolower() because it is not multibyte-aware and does not handle characters such as É correctly.
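<?php
function customSort($a, $b) {
    // Custom alphabet: 'é' is placed right after 'e'.
    static $charOrder = array('a', 'b', 'c', 'd', 'e', 'é',
                              'f', 'g', 'h', 'i', 'j',
                              'k', 'l', 'm', 'n', 'o',
                              'p', 'q', 'r', 's', 't',
                              'u', 'v', 'w', 'x', 'y', 'z');
    // mb_strtolower() is multibyte-aware, so 'É' becomes 'é'.
    $a = mb_strtolower($a, 'UTF-8');
    $b = mb_strtolower($b, 'UTF-8');
    $lenA = mb_strlen($a, 'UTF-8');
    $lenB = mb_strlen($b, 'UTF-8');
    for ($i = 0; $i < $lenA && $i < $lenB; $i++) {
        $chA = mb_substr($a, $i, 1, 'UTF-8');
        $chB = mb_substr($b, $i, 1, 'UTF-8');
        if ($chA === $chB) continue;
        // Characters missing from $charOrder get -1, so they sort first.
        $valA = array_search($chA, $charOrder, true);
        $valB = array_search($chB, $charOrder, true);
        $valA = ($valA === false) ? -1 : $valA;
        $valB = ($valB === false) ? -1 : $valB;
        // Two different unknown characters: fall back to byte order.
        if ($valA === $valB) return (strcmp($chA, $chB) < 0) ? -1 : 1;
        return ($valA > $valB) ? 1 : -1;
    }
    // One string is a prefix of the other: the shorter one sorts first.
    if ($lenA == $lenB) return 0;
    return ($lenA > $lenB) ? 1 : -1;
}
$array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');
usort($array, 'customSort');
print_r($array);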
EDIT: Sorry, the previous version had several mistakes. This one is tested.
EDIT 2: Everything now uses multibyte functions.
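A different technique, accent folding, avoids listing the whole alphabet: map accented letters to their base letter with strtr(), compare the folded strings with strcasecmp(), and use the original strings only to break ties. This is a sketch; the function names foldAccents() and foldedCompare() are hypothetical, and the mapping table covers only a few Latin vowels, so extend it for your data. It assumes the source file and the strings are UTF-8.

```php
<?php
// Hypothetical accent-folding table; deliberately small, extend as needed.
function foldAccents(string $s): string
{
    static $map = array(
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'ä' => 'a',
        'Á' => 'A', 'À' => 'A', 'Â' => 'A', 'Ä' => 'A',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'É' => 'E', 'È' => 'E', 'Ê' => 'E', 'Ë' => 'E',
    );
    // strtr() with an array replaces whole multibyte sequences safely.
    return strtr($s, $map);
}

// Compare folded strings case-insensitively; break ties on the originals
// so that 'Éghezée' and 'eghezee' still get a stable, distinct order.
function foldedCompare(string $a, string $b): int
{
    $cmp = strcasecmp(foldAccents($a), foldAccents($b));
    return $cmp !== 0 ? $cmp : strcmp($a, $b);
}

$array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');
usort($array, 'foldedCompare');
print_r($array);
```

Folding treats é as a plain e, which matches the expected order here; if é must sort as its own letter after e, a custom alphabet like the one above is the better fit.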
Solution 4:
If you prefer to write the comparison yourself, here is one option:
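function compare($a, $b)
{
    // Custom alphabet; I used Polish letters.
    $alphabet = 'aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż';
    $a = mb_strtolower($a, 'UTF-8');
    $b = mb_strtolower($b, 'UTF-8');
    $lenA = mb_strlen($a, 'UTF-8');
    $lenB = mb_strlen($b, 'UTF-8');
    for ($i = 0; $i < $lenA && $i < $lenB; $i++) {
        $chA = mb_substr($a, $i, 1, 'UTF-8');
        $chB = mb_substr($b, $i, 1, 'UTF-8');
        if ($chA === $chB) {
            continue;
        }
        // Letters missing from $alphabet get -1, so they sort first.
        $posA = mb_strpos($alphabet, $chA, 0, 'UTF-8');
        $posB = mb_strpos($alphabet, $chB, 0, 'UTF-8');
        $posA = ($posA === false) ? -1 : $posA;
        $posB = ($posB === false) ? -1 : $posB;
        if ($posA === $posB) {
            // Two different unknown characters: fall back to byte order.
            return (strcmp($chA, $chB) < 0) ? -1 : 1;
        }
        return ($posA > $posB) ? 1 : -1;
    }
    // One string is a prefix of the other: the shorter one sorts first.
    return ($lenA === $lenB) ? 0 : (($lenA > $lenB) ? 1 : -1);
}
usort($needed_array, 'compare');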
I'm not sure this is the best solution, but it works for me. =)
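The same idea can be generalized so the alphabet is passed in instead of hard-coded. This sketch is an assumption, not part of the answer above: makeAlphabetComparator() is a hypothetical helper that builds a rank table once with array_flip(), splits strings into UTF-8 characters with preg_split('//u', ...), and returns a closure for usort(). It needs PHP 7 or later (for ?? and <=>) and valid UTF-8 input. The alphabet used here is the Latin one with é added after e, which is enough for the Belgian example.

```php
<?php
// Hypothetical helper: returns a usort() comparator for a custom alphabet.
function makeAlphabetComparator(string $alphabet): callable
{
    // Map each letter to its position once, so lookups are O(1).
    $rank = array_flip(preg_split('//u', $alphabet, -1, PREG_SPLIT_NO_EMPTY));

    return function (string $a, string $b) use ($rank): int {
        // Split into UTF-8 characters (assumes valid UTF-8 input).
        $charsA = preg_split('//u', mb_strtolower($a, 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY);
        $charsB = preg_split('//u', mb_strtolower($b, 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY);
        $n = min(count($charsA), count($charsB));
        for ($i = 0; $i < $n; $i++) {
            if ($charsA[$i] === $charsB[$i]) {
                continue;
            }
            // Characters missing from the alphabet rank -1 (sorted first).
            $ra = $rank[$charsA[$i]] ?? -1;
            $rb = $rank[$charsB[$i]] ?? -1;
            if ($ra !== $rb) {
                return $ra <=> $rb;
            }
            // Two different unknown characters: fall back to byte order.
            return strcmp($charsA[$i], $charsB[$i]) <=> 0;
        }
        // Common prefix: the shorter string sorts first.
        return count($charsA) <=> count($charsB);
    };
}

$cmp = makeAlphabetComparator('abcdeéfghijklmnopqrstuvwxyz');
$array = array('Borgloon','Thuin','Lennik','Éghezée','Aubel');
usort($array, $cmp);
print_r($array);
```

Passing a Polish alphabet string instead gives the behavior of compare() above without editing the function body.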