How to output unicode string to RTF (using C#)

Solution 1:

Provided that all the characters you need to support are in the Basic Multilingual Plane (it's unlikely that you'll need anything more), a simple UTF-16 encoding should suffice.


All possible code points from U+0000 through U+10FFFF, except for the surrogate code points U+D800–U+DFFF (which are not characters), are uniquely mapped by UTF-16 regardless of the code point's current or future character assignment or use.
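As a rough illustration of what this means for C# strings (which store UTF-16 code units), here is a minimal sketch. The class and method names are made up for this example. A character in the BMP takes one char, while a character outside it takes two chars (a surrogate pair).

```csharp
using System;

static class Utf16Demo   // hypothetical name, for illustration only
{
    // Number of UTF-16 code units (C# chars) the string occupies.
    public static int CodeUnits(string s)
    {
        return s.Length;
    }

    // True if any char is half of a surrogate pair, which means the
    // string contains at least one character outside the BMP.
    public static bool HasNonBmp(string s)
    {
        foreach (char c in s)
        {
            if (char.IsSurrogate(c))
                return true;
        }
        return false;
    }
}
```

For example, Utf16Demo.CodeUnits("\u00EB") is 1 (ë is in the BMP), while Utf16Demo.CodeUnits("\U0001D11E") is 2 and HasNonBmp returns true for it.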

The following sample program illustrates doing something along the lines of what you want:

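// Needs: using System; using System.IO; using System.Text;
static void Main(string[] args)
{
    // ë is U+00EB; its UTF-16 (little-endian) bytes are EB 00
    char[] ca = Encoding.Unicode.GetChars(new byte[] { 0xeb, 0x00 });
    using (var sw = new StreamWriter(@"c:/helloworld.rtf"))
        sw.WriteLine(@"{\rtf1\ansi
{\fonttbl {\f0 Times New Roman;}}
\f0\fs60 H" + GetRtfUnicodeEscapedString(new String(ca)) + @"llo, World!
}");
}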

static string GetRtfUnicodeEscapedString(string s)
{
    var sb = new StringBuilder();
    foreach (var c in s)
    {
        if (c <= 0x7f)
            sb.Append(c);   // 7-bit ASCII passes through unchanged
        else
            sb.Append("\\u" + Convert.ToUInt32(c) + "?");   // \uN? with N in decimal
    }
    return sb.ToString();
}

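The important bit is Convert.ToUInt32(c), which returns the numeric value of the char, i.e. its UTF-16 code unit. For characters in the Basic Multilingual Plane this is the same as the code point. The RTF escape for Unicode (\uN) requires the value in decimal, followed by a fallback character (here "?") for readers that do not understand \u. The System.Text.Encoding.Unicode encoding corresponds to UTF-16 (little-endian) as per the MSDN documentation.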
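One caveat: the RTF specification treats the \u argument as a signed 16-bit number, so values above 32767 are meant to be written as negative numbers. Some readers accept the unsigned form anyway, but a stricter variant could look like the sketch below. The class name is hypothetical, and this version deliberately leaves out escaping of \, { and }.

```csharp
using System.Globalization;
using System.Text;

static class RtfSignedEscape   // hypothetical name, for illustration only
{
    public static string Escape(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (c <= 0x7f)
            {
                sb.Append(c);   // 7-bit ASCII passes through unchanged
            }
            else
            {
                // Reinterpret the 16-bit code unit as a signed value,
                // so 0x8000..0xFFFF come out negative.
                short signed = unchecked((short)c);
                sb.Append("\\u")
                  .Append(signed.ToString(CultureInfo.InvariantCulture))
                  .Append('?');
            }
        }
        return sb.ToString();
    }
}
```

With this, "\u00EB" still becomes \u235?, while "\uF020" becomes \u-4064? (61472 minus 65536). A character outside the BMP is emitted as two escapes, one per surrogate.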

Solution 2:

Fixed code from the accepted answer: added escaping of the special characters \, { and }, as described in this link

static string GetRtfUnicodeEscapedString(string s)
{
    var sb = new StringBuilder();
    foreach (var c in s)
    {
        if (c == '\\' || c == '{' || c == '}')
            sb.Append(@"\" + c);   // RTF control characters need a backslash
        else if (c <= 0x7f)
            sb.Append(c);          // other 7-bit ASCII passes through unchanged
        else
            sb.Append("\\u" + Convert.ToUInt32(c) + "?");
    }
    return sb.ToString();
}
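
To show why the extra escaping matters, here is a small self-contained sketch that wraps escaped text in a minimal RTF document. The names RtfDocSketch, Escape and BuildDocument are invented for this example. Without the escaping, a literal } in the text would close the document's outer group early.

```csharp
using System;
using System.Text;

static class RtfDocSketch   // hypothetical name, for illustration only
{
    // Escapes \ { }, passes other ASCII through, and writes everything
    // else as \uN? with N in decimal.
    public static string Escape(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (c == '\\' || c == '{' || c == '}')
                sb.Append('\\').Append(c);
            else if (c <= 0x7f)
                sb.Append(c);
            else
                sb.Append("\\u").Append((int)c).Append('?');
        }
        return sb.ToString();
    }

    // Escapes the text and wraps it in a minimal RTF document.
    public static string BuildDocument(string text)
    {
        return @"{\rtf1\ansi{\fonttbl{\f0 Times New Roman;}}\f0\fs60 "
             + Escape(text) + "}";
    }
}
```

For the input a{b}\c, Escape gives a\{b\}\\c. Since the result contains only 7-bit ASCII, it can be written with something like File.WriteAllText(path, rtf, Encoding.ASCII).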