Convert UTF-16 to UTF-8 under Windows and Linux, in C

Solution 1:

Change encoding to UTF-8 with PowerShell:

Get-Content PATH\temp.txt -Encoding Unicode | Set-Content -Encoding UTF8 PATH2\temp.txt

Solution 2:

If you don't want to use ICU,

  1. Windows: WideCharToMultiByte
  2. Linux: iconv (Glibc)

Solution 3:

The open source ICU library is very commonly used.

Solution 4:

#include <iconv.h>

wchar_t *src = ...; // or char16_t* on non-Windows platforms
int srclen = ...;
char *dst = ...;
int dstlen = ...;
iconv_t conv = iconv_open("UTF-8", "UTF-16");
iconv(conv, (char*)&src, &srclen, &dst, &dstlen);
iconv_close(conv);

Solution 5:

I have run into this problem too, I solve it by using boost locale library

try
{           
    std::string utf8 = boost::locale::conv::utf_to_utf<char, short>(
                        (short*)wcontent.c_str(), 
                        (short*)(wcontent.c_str() + wcontent.length()));
    content = boost::locale::conv::from_utf(utf8, "ISO-8859-1");
}
catch (boost::locale::conv::conversion_error e)
{
    std::cout << "Fail to convert from UTF-8 to " << toEncoding << "!" << std::endl;
    break;
}

The boost::locale::conv::utf_to_utf function try to convert from a buffer that encoded by UTF-16LE to UTF-8, The boost::locale::conv::from_utf function try to convert from a buffer that encoded by UTF-8 to ANSI, make sure the encoding is right(Here I use encoding for Latin-1, ISO-8859-1).

Another reminder is, in Linux std::wstring is 4 bytes long, but in Windows std::wstring is 2 bytes long, so you would better not use std::wstring to contain UTF-16LE buffer.