How to solve JSON_ERROR_UTF8 error in php json_decode?
There is a good function to sanitize your arrays.
I suggest you use a json_encode wrapper like this:
function safe_json_encode($value, $options = 0, $depth = 512, $utfErrorFlag = false) {
    $encoded = json_encode($value, $options, $depth);
    switch (json_last_error()) {
        case JSON_ERROR_NONE:
            return $encoded;
        case JSON_ERROR_DEPTH:
            return 'Maximum stack depth exceeded'; // or trigger_error() or throw new Exception()
        case JSON_ERROR_STATE_MISMATCH:
            return 'Underflow or the modes mismatch'; // or trigger_error() or throw new Exception()
        case JSON_ERROR_CTRL_CHAR:
            return 'Unexpected control character found';
        case JSON_ERROR_SYNTAX:
            return 'Syntax error, malformed JSON'; // or trigger_error() or throw new Exception()
        case JSON_ERROR_UTF8:
            // Already retried once with cleaned data: give up instead of recursing forever.
            if ($utfErrorFlag) {
                return 'UTF8 encoding error'; // or trigger_error() or throw new Exception()
            }
            // Convert every string in $value to UTF-8, then retry exactly once.
            $clean = utf8ize($value);
            return safe_json_encode($clean, $options, $depth, true);
        default:
            return 'Unknown error'; // or trigger_error() or throw new Exception()
    }
}
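// Recursively converts every string in a (nested) array to UTF-8.
function utf8ize($mixed) {
    if (is_array($mixed)) {
        foreach ($mixed as $key => $value) {
            $mixed[$key] = utf8ize($value);
        }
    } elseif (is_string($mixed)) {
        // utf8_encode() assumes the input is ISO-8859-1; it is deprecated as of PHP 8.2.
        return utf8_encode($mixed);
    }
    return $mixed;
}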
In my application, utf8_encode() works better than iconv().
You need a single line of code:
$input = iconv('UTF-8', 'UTF-8//IGNORE', utf8_encode($input));
$json = json_decode($input);
Credit: Sang Le, my teammate, gave me this code. Yeah!
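The one-liner above rewrites the input before decoding it. As a possible alternative on PHP 7.2 or later, json_decode itself accepts the flags JSON_INVALID_UTF8_SUBSTITUTE and JSON_INVALID_UTF8_IGNORE. The sketch below is only an illustration: the sample document and the variable names are made up, and neither flag recovers the character that was originally intended.

```php
<?php
// Hypothetical input: a JSON document containing a lone 0xE9 byte
// (ISO-8859-1 "é"), which is not valid UTF-8.
$input = "{\"name\":\"caf\xE9\"}";

// Plain json_decode() rejects the whole document.
$plain = json_decode($input, true);
$plainError = json_last_error(); // expected to equal JSON_ERROR_UTF8

// Replace invalid byte sequences with U+FFFD (the replacement character).
$substituted = json_decode($input, true, 512, JSON_INVALID_UTF8_SUBSTITUTE);

// Or silently drop the invalid bytes.
$ignored = json_decode($input, true, 512, JSON_INVALID_UTF8_IGNORE);

var_dump($plain, $plainError === JSON_ERROR_UTF8, $substituted, $ignored);
```

Both flags make the error go away by discarding information, so they suit cases where losing the odd character is acceptable.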
The iconv function is pretty worthless unless you can guarantee the input is valid. Use mb_convert_encoding instead.
mb_convert_encoding($value, "UTF-8", "auto");
You can get more explicit than "auto", and even specify a comma-separated list of expected input encodings.
Most importantly, invalid characters will be handled without causing the entire string to be discarded (unlike iconv).
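A minimal sketch of that behaviour, assuming the mbstring extension is loaded and its substitute character is left at the default; the sample string is made up for illustration:

```php
<?php
// "abc", then a lone 0xE9 byte (invalid in UTF-8), then "def".
$value = "abc\xE9def";

// Declaring the source as UTF-8 makes mbstring validate the bytes:
// the invalid byte is replaced with the configured substitute character
// and the rest of the string is kept.
$clean = mb_convert_encoding($value, 'UTF-8', 'UTF-8');

var_dump(mb_check_encoding($value, 'UTF-8')); // the raw input is not valid UTF-8
var_dump(mb_check_encoding($clean, 'UTF-8')); // the converted string is
var_dump(json_encode($clean) !== false);      // so json_encode() accepts it
```

The invalid byte itself is still lost. Converting from the real source encoding, if you know it, is what preserves the character.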
There is no magic bullet which will "solve" encoding problems; you have to understand what encoding you have, and then convert it.
Computers ultimately transmit and store binary data; to make that binary data useful, we devise codes that say "this string of binary represents an 'a', that one represents a 'b', and this other one represents the man-in-business-suit-levitating emoji 🕴️". UTF-8 (simplifying a little bit) is just one of those encodings. Others have names like ASCII, ISO-8859-1, Windows Code Page 1252, and Shift-JIS.
If all you know is that a string is "not UTF-8" you cannot make it into UTF-8 because you don't know if the first character is supposed to be an "a", or a "🕴️".
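To make that concrete, here is a small sketch (assuming the mbstring extension is available) in which one and the same byte becomes two different characters depending on which encoding you claim it is in:

```php
<?php
// A single byte with no encoding information attached.
$byte = "\x80";

// Read as Windows-1252, 0x80 is the euro sign (U+20AC).
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

// Read as ISO-8859-1, the same byte is the control character U+0080.
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

echo bin2hex($asCp1252), "\n"; // e282ac
echo bin2hex($asLatin1), "\n"; // c280
```

Both conversions succeed and both results are valid UTF-8, so nothing in the output tells you which guess was right.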
If you do know what encoding your string is in, you can use any of three functions in PHP; depending on your installation of PHP, some or all might be unavailable, but they are what you want.
- iconv
- mb_convert_encoding
- UConverter::transcode
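As a rough sketch, the three calls look like this when the source encoding is known to be ISO-8859-1. The sample string is made up, and each call is guarded because the iconv, mbstring, and intl extensions may or may not be installed. Note that the argument order differs between the functions.

```php
<?php
// "café" encoded as ISO-8859-1: the "é" is the single byte 0xE9.
$latin1 = "caf\xE9";

$results = [];

if (function_exists('iconv')) {
    // iconv(from, to, string)
    $results['iconv'] = iconv('ISO-8859-1', 'UTF-8', $latin1);
}
if (function_exists('mb_convert_encoding')) {
    // mb_convert_encoding(string, to, from)
    $results['mb_convert_encoding'] = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}
if (class_exists('UConverter')) {
    // UConverter::transcode(string, to, from)
    $results['UConverter::transcode'] = UConverter::transcode($latin1, 'UTF-8', 'ISO-8859-1');
}

// Print the resulting bytes in hex so the output does not depend on the terminal's encoding.
foreach ($results as $name => $utf8) {
    printf("%-22s %s\n", $name, bin2hex($utf8));
}
```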
Note that mb_convert_encoding lets you leave out the argument that states the current encoding. This does not automatically work out the correct encoding; it just uses a global setting which you control.
There are two other functions provided in PHP which are badly named: utf8_encode and utf8_decode. These are just extremely limited versions of the three functions above: they can only convert from ISO-8859-1 to UTF-8 and back. If your string is not in that encoding (and you don't want it to be) these functions will not help you. They might make your errors go away, but that's not the same as fixing your data.
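A short illustration of that last point, assuming mbstring is available. It uses mb_convert_encoding with an explicit ISO-8859-1 source, which performs the same conversion as utf8_encode, on a string that is already UTF-8:

```php
<?php
// "café", already valid UTF-8 (the "é" is the two bytes C3 A9).
$utf8 = "caf\xC3\xA9";

// Treating those bytes as ISO-8859-1 encodes each of them again.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

var_dump(json_encode($double) !== false); // true: no JSON error is raised
var_dump($double === $utf8);              // false: the data has changed
echo bin2hex($double), "\n";              // 636166c383c2a9, which displays as "cafÃ©"

// The damage is reversible only if you know exactly which conversion was applied.
$restored = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $utf8);            // true
```

json_encode is satisfied either way, which is why this kind of corruption tends to go unnoticed until someone reads the output.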