How to output a Unicode string to RTF (using C#)
Solution 1:
Provided that all the characters you're catering for exist in the Basic Multilingual Plane (it's unlikely that you'll need anything more), a simple UTF-16 encoding should suffice, because each such character is then a single 16-bit code unit.
Wikipedia:
All possible code points from U+0000 through U+10FFFF, except for the surrogate code points U+D800–U+DFFF (which are not characters), are uniquely mapped by UTF-16 regardless of the code point's current or future character assignment or use.
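To check the BMP assumption for a given input, one option is to look for surrogate code units, since a C# string is a sequence of UTF-16 code units and only non-BMP characters use surrogates. This is a minimal sketch; the class and method names (BmpCheck, IsBmpOnly) are hypothetical and not part of the original answer.

```csharp
using System;

static class BmpCheck
{
    // Hypothetical helper: returns true when the string contains no
    // surrogate code units, i.e. every character lies in the BMP.
    public static bool IsBmpOnly(string s)
    {
        foreach (char c in s)
        {
            if (char.IsSurrogate(c))
                return false;
        }
        return true;
    }
}
```

For example, BmpCheck.IsBmpOnly("Hëllo") is true, while a string built with char.ConvertFromUtf32(0x1F600) has length 2 (a surrogate pair) and is reported as not BMP-only.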
The following sample program illustrates doing something along the lines of what you want:
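using System;
using System.IO;
using System.Text;

static void Main(string[] args)
{
    // "ë" (U+00EB), decoded from its UTF-16 little-endian bytes
    char[] ca = Encoding.Unicode.GetChars(new byte[] { 0xeb, 0x00 });
    // The using block flushes and closes the writer even if an exception is thrown
    using (var sw = new StreamWriter(@"c:/helloworld.rtf"))
    {
        sw.WriteLine(@"{\rtf1
{\fonttbl {\f0 Times New Roman;}}
\f0\fs60 H" + GetRtfUnicodeEscapedString(new String(ca)) + @"llo, World!
}");
    }
}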
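static string GetRtfUnicodeEscapedString(string s)
{
    var sb = new StringBuilder();
    foreach (var c in s)
    {
        if (c <= 0x7f)
            // 7-bit ASCII is copied through unchanged (\, { and } are not escaped here)
            sb.Append(c);
        else
            // Everything else becomes \uN? with N in decimal and ? as the fallback character
            sb.Append("\\u" + Convert.ToUInt32(c) + "?");
    }
    return sb.ToString();
}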
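The important bit is the Convert.ToUInt32(c) call, which returns the numeric value of the UTF-16 code unit; for a BMP character this is the same as its code point. The RTF escape for Unicode (\uN) requires that value in decimal, followed by a fallback character (here ?) for readers that do not understand \u. Note that the RTF specification treats N as a signed 16-bit number, so values above 32767 are meant to be written as negative numbers. The System.Text.Encoding.Unicode encoding corresponds to UTF-16 (little-endian), as per the MSDN documentation.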
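The RTF specification defines the \uN parameter as a signed 16-bit value, so a stricter writer would emit code units above 32767 as negative numbers. The following is a sketch of that variant, not the original answer's code; the names RtfSigned and Escape are hypothetical. Because it iterates over UTF-16 code units, a non-BMP character comes out as two \u escapes, one per surrogate.

```csharp
using System.Globalization;
using System.Text;

static class RtfSigned
{
    // Hypothetical variant of GetRtfUnicodeEscapedString that writes
    // \uN with N as a signed 16-bit decimal value.
    public static string Escape(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (c <= 0x7f)
            {
                sb.Append(c);
            }
            else
            {
                // unchecked cast wraps 0x8000..0xFFFF to -32768..-1
                short signed = unchecked((short)c);
                sb.Append("\\u")
                  .Append(signed.ToString(CultureInfo.InvariantCulture))
                  .Append('?');
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, "ë" (U+00EB) still becomes \u235?, while U+FFFD becomes \u-3? because 65533 - 65536 = -3. Like the code above it, this variant does not escape \, { or }.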
Solution 2:
This is the code from the accepted answer with escaping added for the RTF special characters (\, { and }), as described in the linked reference:
static string GetRtfUnicodeEscapedString(string s)
{
    var sb = new StringBuilder();
    foreach (var c in s)
    {
        if (c == '\\' || c == '{' || c == '}')
            sb.Append(@"\" + c);
        else if (c <= 0x7f)
            sb.Append(c);
        else
            sb.Append("\\u" + Convert.ToUInt32(c) + "?");
    }
    return sb.ToString();
}
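To see the effect of the extra escaping, here is a small self-contained sketch that wraps the same logic in a class and builds a complete RTF document string in memory. The class name RtfDocument and the Build method are assumptions made for illustration; the header mirrors the one used in Solution 1.

```csharp
using System.Text;

static class RtfDocument
{
    // Same escaping rules as the function above: \, { and } get a
    // leading backslash, other ASCII passes through, the rest is \uN?.
    public static string Escape(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (c == '\\' || c == '{' || c == '}')
                sb.Append('\\').Append(c);
            else if (c <= 0x7f)
                sb.Append(c);
            else
                sb.Append("\\u").Append((uint)c).Append('?');
        }
        return sb.ToString();
    }

    // Hypothetical helper: wraps escaped text in a minimal RTF document.
    public static string Build(string text)
    {
        return @"{\rtf1{\fonttbl{\f0 Times New Roman;}}\f0\fs60 "
             + Escape(text) + "}";
    }
}
```

Without the first branch, text such as a{b} would open and close an RTF group instead of showing literal braces. With it, RtfDocument.Escape(@"C:\tëmp") yields C:\\t\u235?mp.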