Setting UTF-8 as the default character encoding in Windows 7

Is there a way to set Windows 7 to use UTF-8 globally as the default?
It's really annoying to set every single text editor to use it.


The short answer is no, it is not possible.

To elaborate: I'm afraid you won't find a global encoding option in Windows 7 that both 1) sets a global default and 2) is obeyed by all the applications you listed.

Also, what problem are you actually trying to solve here?

It is up to each application to decide whether it uses Unicode internally to represent data. While the use of Unicode is encouraged, you can never be sure that all your applications actually support it internally.

What you can do, however, is change the default character encoding in each of the applications you listed:

  • For Eclipse, the default encoding for new files can be set from Window > Preferences > General > Content Types (see post on Eclipse Community Forums)
  • For Notepad++, navigate to Settings > Preferences > New Document/Default Directory and set Encoding to UTF-8
  • As for Thunderbird, I'm fairly sure it already uses UTF-8 as the default encoding (see these notes about character encoding)
  • In the case of OpenOffice (and LibreOffice), you don't even need to care about encoding: documents saved by OpenOffice are XML-based, the encoding is declared inside the XML files themselves, and UTF-8 is already the default there as well
  • From a UTF-8 point of view, PowerShell is tricky: its default encoding is UTF-16LE.
    • For writing output files from PowerShell as UTF-8, see this answer
    • For changing the default encoding, see this answer
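If a file has already been written as UTF-16LE (for example, by `>` redirection in Windows PowerShell), it can also be re-encoded after the fact. Below is a minimal Python sketch, not a PowerShell feature, assuming the input is UTF-16LE with or without the FF FE byte order mark. The function names `utf16le_to_utf8` and `convert_file` are hypothetical helpers for illustration only.

```python
import codecs
from pathlib import Path


def utf16le_to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16LE bytes as UTF-8 without a BOM."""
    # UTF-16LE files often start with the byte order mark FF FE;
    # drop it so it does not turn into a stray U+FEFF in the UTF-8 output.
    if raw.startswith(codecs.BOM_UTF16_LE):
        raw = raw[len(codecs.BOM_UTF16_LE):]
    return raw.decode("utf-16-le").encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read a UTF-16LE file and write a UTF-8 copy.

    Works on raw bytes end to end, so CRLF line endings are kept as-is.
    """
    Path(dst).write_bytes(utf16le_to_utf8(Path(src).read_bytes()))
```

Usage would be something like `convert_file("out-utf16.txt", "out-utf8.txt")`, where both file names are placeholders. Working on bytes rather than text mode avoids Python's newline translation, so only the encoding changes.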

It's not possible mainly because Windows does not allow UTF-8 as the system ANSI code page, even though it does define a code page for UTF-8: code page 65001. There seem to be several reasons for this:

  • When Unicode was new, Microsoft decided UCS-2 would be the best way to support it. At that time, Unicode was 16-bit.
  • Windows has one ANSI code page for each supported language, unlike Unix and Linux, where the language and encoding can be set independently.
  • Code page 65001 doesn't work everywhere. Specifically, it breaks some of the multibyte support in Windows, which expects a multibyte character to take one or two bytes, whereas UTF-8 needs between one and four bytes. The WriteFile() API, for instance, returns an incorrect result under code page 65001, and this bubbles up through all library code that relies on it, such as write().
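To make the byte-length mismatch concrete, here is a small Python sketch (it uses Python's own codecs, not the Windows API) that reports how many bytes UTF-8 needs for one character from each length class, next to how many 16-bit units UTF-16 needs. The last character, U+1F600, lies outside the original 16-bit range, so UTF-16 needs a surrogate pair for it, which is something UCS-2 cannot represent at all. The helper name `encoded_sizes` is hypothetical.

```python
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 code-unit count) for one code point."""
    utf8_bytes = len(ch.encode("utf-8"))
    # "utf-16-le" emits no BOM, so every 2 bytes is one 16-bit code unit.
    utf16_units = len(ch.encode("utf-16-le")) // 2
    return utf8_bytes, utf16_units


# One sample per UTF-8 length class: 1, 2, 3 and 4 bytes.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    utf8_bytes, utf16_units = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: {utf8_bytes} UTF-8 byte(s), {utf16_units} UTF-16 unit(s)")
```

This prints 1, 2, 3 and 4 UTF-8 bytes for U+0041, U+00E9, U+20AC and U+1F600 respectively, with the last one also needing 2 UTF-16 units. Code that assumes at most two bytes per character, as described above, cannot handle the three- and four-byte cases.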

The late Michael Kaplan, who worked on internationalization at Microsoft, had a blog, "Sorting it all Out", with several posts on related topics. I emailed him directly about some of these concerns back in the day.