How to remove diacritics from text?

Solution 1:

This should be useful which handles almost all the cases.

function Unaccent($string)
{
    return preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i', '$1', htmlentities($string, ENT_COMPAT, 'UTF-8'));
}

Solution 2:

Use iconv to convert strings from a given encoding to ASCII, then replace non-alphanumeric characters using preg_replace:

$input = 'räksmörgås och köttbullar'; // UTF8 encoded
$input = iconv('UTF-8', 'ASCII//TRANSLIT', $input);
$input = preg_replace('/[^a-zA-Z0-9]/', '_', $input);
echo $input;

Result:

raksmorgas_och_kottbullar

Solution 3:

// normalize data (remove accent marks) using PHP's *intl* extension
$data = normalizer_normalize($data);

// replace everything NOT in the sets you specified with an underscore
$data = preg_replace("#[^A-Za-z1-9]#","_", $data);

Solution 4:

and all swedish should be converted like this:

'å' to 'a' and 'ä' to 'a' and 'ö' to 'o' (just remove the dots above).

Use normalizer_normalize() to get rid of diacritical marks.

The rest should become underscores as I said.

Use preg_replace() with a pattern of [\W] (i.o.w: any character which doesn't match letters, digits or underscore) to replace them by underscores.

Final result should look like:

$data = preg_replace('[\W]', '_', normalizer_normalize($data));