Convert an International String to \u Codes in Java
Solution 1:
There is a JDK tool, native2ascii, that you run from the command line as follows (it ships with JDK 8 and earlier and was removed in JDK 9):
native2ascii -encoding utf8 src.txt output.txt
Example:
src.txt
بسم الله الرحمن الرحيم
output.txt
\u0628\u0633\u0645 \u0627\u0644\u0644\u0647 \u0627\u0644\u0631\u062d\u0645\u0646 \u0627\u0644\u0631\u062d\u064a\u0645
If you want to use it from your Java application, you can wrap the command line like this:
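import java.io.File;
String pathSrc = "./tmp/src.txt";
String pathOut = "./tmp/output.txt";
// Pass the arguments separately and wait for native2ascii to finish before
// reading the output file (may throw IOException and InterruptedException)
Process p = new ProcessBuilder("native2ascii", "-encoding", "utf8",
        new File(pathSrc).getAbsolutePath(), new File(pathOut).getAbsolutePath())
        .inheritIO().start();
p.waitFor();
System.out.println("THE END");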
Once the process has finished, read the contents of the new file.
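If native2ascii is not available on your JDK, or you would rather not start an external process, the same file conversion can be approximated in pure Java. The sketch below is an assumption-based stand-in, not a reimplementation of every native2ascii option: it assumes Java 11+ (for Files.readString, Files.writeString and Path.of), the class and method names are hypothetical, and it emits lowercase hex digits to match the output.txt example above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Native2AsciiSketch {

    // Copy ASCII chars unchanged; write every char >= 128 as a backslash,
    // the letter u and four lowercase hex digits (one escape per UTF-16 char).
    static String toAsciiEscapes(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: read a UTF-8 file, escape it, write the result as ASCII.
    static void convertFile(Path src, Path out) throws IOException {
        String text = Files.readString(src, StandardCharsets.UTF_8);
        Files.writeString(out, toAsciiEscapes(text), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Demo on temporary files so the sketch runs anywhere.
        Path dir = Files.createTempDirectory("n2a");
        Path src = dir.resolve("src.txt");
        Path out = dir.resolve("output.txt");
        Files.writeString(src, "S\u00e3o Paulo", StandardCharsets.UTF_8);
        convertFile(src, out);
        System.out.println(Files.readString(out, StandardCharsets.US_ASCII));
    }
}
```

Run as-is, it prints `S\u00e3o Paulo`; for real use you would call convertFile with your own paths, such as ./tmp/src.txt and ./tmp/output.txt.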
Solution 2:
You could use escapeJava from org.apache.commons.lang.StringEscapeUtils (internally it delegates to escapeJavaStyleString, which is not public).
Solution 3:
I also had this problem. I had some Portuguese text with special characters, but these characters were already in Unicode escape format (e.g. \u00e3).
So I wanted to convert S\u00e3o to São.
I did it using StringEscapeUtils from Apache Commons Lang, as @sorin-sbarnea said.
Use the method unescapeJava, like this:
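import org.apache.commons.lang.StringEscapeUtils;
// Doubled backslash: a single one would be decoded by the compiler itself, leaving
// nothing for unescapeJava to do. Here the escape is plain text, as if read from a file.
String text = "S\\u00e3o";
text = StringEscapeUtils.unescapeJava(text);
System.out.println("text " + text);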
(There is also the method escapeJava, which does the opposite: it turns non-ASCII characters back into \uXXXX escapes, and also escapes characters such as quotes and backslashes.)
If anyone knows a pure-Java solution, please tell us.
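As a possible pure-Java answer, a minimal sketch is to scan the string for a backslash followed by u and exactly four hex digits and replace each match with the char it encodes. This is a simplified decoder under stated assumptions (class and method names are hypothetical): unlike unescapeJava it leaves other escapes such as \n untouched, and it does not treat an escaped backslash in front of the u specially.

```java
public class UnescapeSketch {

    // True if every char in s[from, to) is a hex digit.
    static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Replace each "backslash, u, four hex digits" sequence with the char it
    // encodes; all other characters are copied through unchanged.
    static String unescapeUnicode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 6 <= s.length() && s.charAt(i + 1) == 'u'
                    && isHex(s, i + 2, i + 6)) {
                sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "S\\u00e3o"; // the escape sequence as plain text
        System.out.println("text " + unescapeUnicode(text));
    }
}
```

Applied to the escaped text S\u00e3o it returns São. Surrogate pairs written as two consecutive escapes decode into a valid pair, since each escape yields one UTF-16 char.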
Solution 4:
Here's an improved version of ArtB's answer:
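static String escapeNonAscii(String input) {
    StringBuilder b = new StringBuilder();
    // Escape every char >= 128 as a backslash-u escape with four uppercase
    // hex digits; ASCII chars are copied unchanged
    for (char c : input.toCharArray()) {
        if (c >= 128)
            b.append("\\u").append(String.format("%04X", (int) c));
        else
            b.append(c);
    }
    return b.toString();
}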
This version escapes all non-ASCII chars and, because %04X zero-pads the hex value to four digits, also works correctly for low code points such as Ä (\u00C4).
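To see what the loop above does with characters outside the Basic Multilingual Plane, here is a self-contained sketch (the class name is hypothetical). Because the loop iterates over UTF-16 chars, a code point such as U+1F600 comes out as two escapes, one per surrogate, which is also how such a character has to be written in Java source.

```java
public class EscapeSketch {

    // Same loop as above: chars >= 128 become a backslash-u escape with four
    // uppercase hex digits; ASCII chars are copied unchanged.
    static String escapeNonAscii(String input) {
        StringBuilder b = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c >= 128)
                b.append("\\u").append(String.format("%04X", (int) c));
            else
                b.append(c);
        }
        return b.toString();
    }

    public static void main(String[] args) {
        // U+00C4 (A with diaeresis): zero-padding keeps four hex digits.
        System.out.println(escapeNonAscii("\u00C4"));
        // U+1F600 is stored as a surrogate pair, so two escapes are emitted.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(escapeNonAscii(emoji));
    }
}
```

It prints `\u00C4` and then `\uD83D\uDE00`.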