Replace Unicode escape sequences in a string [duplicate]
We have one text file which has the following text
"\u5b89\u5fbd\u5b5f\u5143"
When we read the file content in C# .NET it shows like:
"\\u5b89\\u5fbd\\u5b5f\\u5143"
Our decoder method is
public string Decoder(string value)
{
Encoding enc = new UTF8Encoding();
byte[] bytes = enc.GetBytes(value);
return enc.GetString(bytes);
}
When I pass a hard coded value,
string Output=Decoder("\u5b89\u5fbd\u5b5f\u5143");
it works well, but when we use a variable value it is not working.
When we use the string this is what we get from the text file:
value=(text file content)
string Output=Decoder(value);
It returns the wrong output.
How can I fix this?
Use the below code. This unescapes any escaped characters from the input string
Regex.Unescape(value);
You could use a regular expression to parse the file:
private static Regex _regex = new Regex(@"\\u(?<Value>[a-zA-Z0-9]{4})", RegexOptions.Compiled);
public string Decoder(string value)
{
return _regex.Replace(
value,
m => ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString()
);
}
And then:
string data = Decoder(File.ReadAllText("test.txt"));
So your file contains the verbatim string
\u5b89\u5fbd\u5b5f\u5143
in ASCII and not the string represented by those four Unicode codepoints in some given encoding?
As it happens, I just wrote some code in C# that can parse strings in this format for a JSON parser project -- here's a variant that only handles \uXXXX escapes:
private static string ReadSlashedString(TextReader reader) {
var sb = new StringBuilder(32);
bool q = false;
while (true) {
int chrR = reader.Read();
if (chrR == -1) break;
var chr = (char) chrR;
if (!q) {
if (chr == '\\') {
q = true;
continue;
}
sb.Append(chr);
}
else {
switch (chr) {
case 'u':
case 'U':
var hexb = new char[4];
reader.Read(hexb, 0, 4);
chr = (char) Convert.ToInt32(new string(hexb), 16);
sb.Append(chr);
break;
default:
throw new Exception("Invalid backslash escape (\\ + charcode " + (int) chr + ")");
}
q = false;
}
}
return sb.ToString();
}
And you could use it like:
var str = ReadSlashedString(new StringReader("\\u5b89\\u5fbd\\u5b5f\\u5143"));
(or using a StreamReader
to read from a file).
Darin Dimitrov's regexp-utilizing answer is probably faster, but I happened to have this code at hand. :)