Is it possible to set "locale" of a Windows application to UTF-8?
We know there is an application called AppLocale, which can change the code page of non-Unicode applications, to solve text display problems.
But there is a program whose correct display code page is UTF-8: its text should be decoded as UTF-8, but Windows decodes it using the native code page instead, which makes the text unreadable. It seems odd that AppLocale offers almost every country and region but not UTF-8. I think this is a bug, probably because the programmers work in English and never tested non-English text. I don't think the vendor will fix it, so I want to fix it myself.
Is it possible to make non-Unicode output use UTF-8 with software like AppLocale? Is the default non-Unicode output the native code page? How can I set the native code page to UTF-8?
Solution 1:
Previously it was not possible because:
Microsoft claimed a UTF-8 locale might break some functions (a possible example is _mbsrev) as they were written to assume multibyte encodings used no more than 2 bytes per character, thus until now code pages with more bytes such as GB 18030 (cp54936) and UTF-8 could not be set as the locale.
https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8
However, since Windows 10 Insider build 17035 there is a "Beta: Use Unicode UTF-8 for worldwide language support" checkbox that sets the locale code page to UTF-8.
See also
- Changing ansi and OEM code page in Windows
- Windows 10 Insider Preview Build 17035 Supports UTF-8 as ANSI
That said, the support is still buggy at this point:
- Freeze issue in Windows 10 1803 when use UTF-8 as default code page
- when unicode beta support in windows 10 is turned on, add-ons fail to install
- UTF-8 support for single byte character sets is beta in Windows and likely breaks a lot of applications not expecting this
- Build fail with internal error in MSVC
Update:
Microsoft has also added the ability for programs to use the UTF-8 locale without even setting the UTF-8 beta flag above. You can use the /execution-charset:utf-8 or /utf-8 options when compiling with MSVC, or set the ActiveCodePage property in appxmanifest.
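To see which of these settings actually took effect for a given process, one option is to query the active ANSI code page at run time. The following is only a minimal sketch: GetACP and CP_UTF8 (65001) are part of the Win32 API, while the helper names process_acp_is_utf8 and report_acp are made up for illustration, and the non-Windows branch exists only so the file compiles elsewhere.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: returns 1 if the process ANSI code page is UTF-8
   (65001), 0 if it is some other code page, and -1 on non-Windows
   platforms, where GetACP is not available. */
int process_acp_is_utf8(void)
{
#ifdef _WIN32
    return GetACP() == CP_UTF8 ? 1 : 0;
#else
    return -1;
#endif
}

/* Hypothetical helper: prints a one-line summary of the result. */
void report_acp(void)
{
    int r = process_acp_is_utf8();
    if (r == 1)
        puts("ANSI code page is UTF-8");
    else if (r == 0)
        puts("ANSI code page is not UTF-8");
    else
        puts("not running on Windows");
}
```

Calling report_acp() early in main, once with and once without the beta checkbox or the manifest property, should show whether the process really runs with UTF-8 as its ANSI code page.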
You can also use the UTF-8 locale in older Windows versions by linking with the appropriate C runtime:
Starting in Windows 10 build 17134 (April 2018 Update), the Universal C Runtime supports using a UTF-8 code page. This means that char strings passed to C runtime functions will expect strings in the UTF-8 encoding. To enable UTF-8 mode, use "UTF-8" as the code page when using setlocale. For example, setlocale(LC_ALL, ".utf8") will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page....
To use this feature on an OS prior to Windows 10, such as Windows 7, you must use app-local deployment or link statically using version 17134 of the Windows SDK or later. For Windows 10 operating systems prior to 17134, only static linking is supported.
UTF-8 Support
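As a rough illustration of the setlocale call quoted above, here is a small sketch. The helper name enable_utf8_locale is hypothetical, and the locale names tried on non-Windows systems ("C.UTF-8", "en_US.UTF-8") are assumptions added only so the same file builds elsewhere; the quoted documentation covers just the ".utf8" form for the Universal C Runtime.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: tries to switch the C runtime locale to UTF-8.
   Returns the locale string reported by setlocale on success, or NULL
   if the runtime does not accept a UTF-8 locale. */
const char *enable_utf8_locale(void)
{
#ifdef _WIN32
    /* Needs a Universal C Runtime from build 17134 or later,
       per the documentation quoted above. */
    return setlocale(LC_ALL, ".utf8");
#else
    /* Assumption: common UTF-8 locale names on other systems. */
    const char *name = setlocale(LC_ALL, "C.UTF-8");
    return name != NULL ? name : setlocale(LC_ALL, "en_US.UTF-8");
#endif
}
```

A program would call enable_utf8_locale() once at startup and treat a NULL result as "UTF-8 mode unavailable", for instance when an older C runtime is in use.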