JavaScript: convert a Unicode string to its JavaScript escape form?
I have a variable that contains a string consisting of Japanese characters, for instance:
"みどりいろ"
How would I go about converting this to its Javascript escape form?
The result I am after for this example specifically is:
"\u307f\u3069\u308a\u3044\u308d"
I'd prefer a jQuery approach if there is one.
"み".charCodeAt(0).toString(16);
This gives you the character's UTF-16 code unit in hexadecimal. You can run it through a loop:
String.prototype.toUnicode = function(){
    var result = "";
    for(var i = 0; i < this.length; i++){
        // charCodeAt returns one UTF-16 code unit (0x0000 to 0xffff);
        // characters outside the BMP come out as two escapes (a surrogate pair)
        result += "\\u" + ("000" + this.charCodeAt(i).toString(16)).slice(-4);
    }
    return result;
};
"みどりいろ".toUnicode(); //"\u307f\u3069\u308a\u3044\u308d"
"Mi Do Ri I Ro".toUnicode(); //"\u004d\u0069\u0020\u0044\u006f\u0020\u0052\u0069\u0020\u0049\u0020\u0052\u006f"
"Green".toUnicode(); //"\u0047\u0072\u0065\u0065\u006e"
Demo: http://jsfiddle.net/DerekL/X7MCy/
More on: .charCodeAt
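To illustrate what happens with characters outside the Basic Multilingual Plane, here is a minimal sketch of the same loop written as a standalone function instead of a prototype extension. The name toUnicodeEscape is hypothetical and not part of the answer above, and padStart assumes an ES2017-capable engine.

```javascript
// Hypothetical helper: same technique as the loop above, as a plain function.
function toUnicodeEscape(str) {
    let result = "";
    for (let i = 0; i < str.length; i++) {
        // charCodeAt(i) returns a single UTF-16 code unit (0x0000 to 0xffff)
        result += "\\u" + str.charCodeAt(i).toString(16).padStart(4, "0");
    }
    return result;
}

console.log(toUnicodeEscape("み"));          // \u307f
// U+1F600 is stored as two code units, so it yields a surrogate pair
console.log(toUnicodeEscape("\u{1F600}"));   // \ud83d\ude00
```

A surrogate pair written as two \uXXXX escapes is still a valid JavaScript string literal, so the output decodes back to the original character.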
The above answer is reasonable. A slight space and performance optimization, which leaves characters in the range U+0000 to U+007E untouched and escapes only the rest:
function escapeUnicode(str) {
    return str.replace(/[^\0-~]/g, function(ch) {
        return "\\u" + ("000" + ch.charCodeAt(0).toString(16)).slice(-4);
    });
}
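One way to check that the escaped form is faithful is to decode it again. The sketch below repeats escapeUnicode so it runs on its own, then wraps the result in quotes and feeds it to JSON.parse, which understands \uXXXX escapes. It assumes the input contains no quotes, backslashes, or control characters, since those would need extra handling before parsing.

```javascript
function escapeUnicode(str) {
    return str.replace(/[^\0-~]/g, function (ch) {
        return "\\u" + ("000" + ch.charCodeAt(0).toString(16)).slice(-4);
    });
}

const original = "Green みどりいろ";
const escaped = escapeUnicode(original);
console.log(escaped); // Green \u307f\u3069\u308a\u3044\u308d

// Assumption: no quotes, backslashes, or control characters in the input
const roundTripped = JSON.parse('"' + escaped + '"');
console.log(roundTripped === original); // true
```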
Just escape("みどりいろ") should meet the needs of most cases (note that escape is deprecated), but if you need the "\u" form instead of "%xx" / "%uxxxx", you might want to use regular expressions:
escape("みどりいろ").replace(/%/g, '\\').toLowerCase()
escape("みどりいろ").replace(/%u([A-F0-9]{4})|%([A-F0-9]{2})/g, function(_, u, x) { return "\\u" + (u || '00' + x).toLowerCase() });
(toLowerCase is optional; it only makes the output look exactly like in the first post.)
In most cases it doesn't escape characters that don't need escaping, which may be a plus for you. If not, see Derek's answer, or use my version:
'\\u' + "みどりいろ".split('').map(function(t) { return ('000' + t.charCodeAt(0).toString(16)).substr(-4) }).join('\\u');
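To make the difference between the two replace variants concrete, the sketch below prints what escape returns for a few inputs. Code units below 0x100 come out as %XX and the rest as %uXXXX, which is why the regular expression needs two alternatives. The wrapper name escapeToUnicode is hypothetical, and escape itself is a deprecated legacy function that engines keep only for compatibility.

```javascript
console.log(escape("みどりいろ")); // %u307F%u3069%u308A%u3044%u308D
console.log(escape("é"));          // %E9
console.log(escape("a b"));        // a%20b

// Hypothetical wrapper around the two-alternative regular expression above
function escapeToUnicode(str) {
    return escape(str).replace(/%u([A-F0-9]{4})|%([A-F0-9]{2})/g, function (_, u, x) {
        // u is set for %uXXXX matches, x for %XX matches
        return "\\u" + (u || "00" + x).toLowerCase();
    });
}

console.log(escapeToUnicode("é b")); // \u00e9\u0020b
```

The simpler replace(/%/g, '\\') variant only produces valid escapes when every escaped character is in the %uXXXX form, as is the case for the Japanese example.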
My version of the code, based on the previous answers. I use it in JSON.stringify() to escape every character outside a small ASCII whitelist.
const toUTF8 = string =>
    string.split('').map(
        ch => !ch.match(/^[^a-z0-9\s\t\r\n_|\\+()!@#$%^&*=?/~`:;'"\[\]\-]+$/i)
            ? ch
            // Pad the hex code to exactly four digits
            : '\\u' + ('000' + ch.charCodeAt(0).toString(16)).slice(-4)
    ).join('');
Usage:
JSON.stringify({key: 'Категория дли импорта'}, (key, value) => {
    if (typeof value === "string") {
        return toUTF8(value);
    }
    return value;
});
Returns JSON (the backslashes are doubled because JSON.stringify escapes each backslash that toUTF8 inserts):
{"key":"\\u041a\\u0430\\u0442\\u0435\\u0433\\u043e\\u0440\\u0438\\u044f \\u0434\\u043b\\u0438 \\u0438\\u043c\\u043f\\u043e\\u0440\\u0442\\u0430"}
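The doubled backslashes in the JSON above appear because JSON.stringify escapes every backslash that the replacer returns, so a parser reads the value back as literal "\u..." text rather than the original characters. If the goal is ASCII-only JSON that still decodes to the original string, one possible alternative is to serialize first and escape afterwards. This is a minimal sketch, and the name stringifyAscii is hypothetical.

```javascript
// Hypothetical helper: serialize first, then escape non-ASCII code units
function stringifyAscii(value) {
    return JSON.stringify(value).replace(/[\u007f-\uffff]/g, function (ch) {
        return "\\u" + ("000" + ch.charCodeAt(0).toString(16)).slice(-4);
    });
}

const json = stringifyAscii({ key: "Категория" });
console.log(json); // {"key":"\u041a\u0430\u0442\u0435\u0433\u043e\u0440\u0438\u044f"}

// The escapes are real JSON escapes, so parsing restores the original text
console.log(JSON.parse(json).key === "Категория"); // true
```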