python 3.0, how to make print() output unicode?
I'm working in WinXP 5.1.2600, writing a Python application involving Chinese pinyin, which has involved me in endless Unicode problems. Switching to Python 3.0 has solved many of them. But the print() function for console output is not Unicode-aware for some odd reason. Here's a teeny program.
print('sys.stdout encoding is "' + sys.stdout.encoding + '"')
str1 = 'lüelā'
print(str1)
Output is (changing angle brackets to square brackets for readability):
sys.stdout encoding is "cp1252" Traceback (most recent call last): File "TestPrintEncoding.py", line 22, in [module] print(str1) File "C:\Python30\lib\io.py", line 1491, in write b = encoder.encode(s) File "C:\Python30\lib\encodings\cp1252.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode character '\u0101' in position 4: character maps to [undefined]
Note that ü = \xfc = 252 gives no problem since it's upper ASCII. But ā = \u0101 is beyond 8-bits.
Anyone have an idea how to change the encoding of sys.stdout to 'utf-8'? Bear in mind that Python 3.0 no longer uses the codecs
module, if I understand the documentation right.
Apologies, I gave you the program without the preamble. Before the 3 lines given, it starts like this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
Unfortunately, the coding specified by the "coding:" line is the coding of the source code, not of the console output. But thank you for your thoughts!
The Windows command prompt (cmd.exe) cannot display the Unicode characters you are using, even though Python is handling it in a correct manner internally. You need to use IDLE, Cygwin, or another program that can display Unicode correctly.
See this thread for a full explanation: http://www.nabble.com/unable-to-print-Unicode-characters-in-Python-3-td21670662.html
You may want to try changing the environment variable "PYTHONIOENCODING" to "utf_8." I have written a page on my ordeal with this problem.
Check out the question and answer here, I think they have some valuable clues. Specifically, note the setdefaultencoding
in the sys
module, but also the fact that you probably shouldn't use it.
Here's a dirty hack:
# works
import os
os.system("chcp 65001 &")
print("юникод")
However everything breaks it:
-
simple muting first line already breaks it:
# doesn't work import os os.system("chcp 65001 >nul &") print("юникод")
-
checking for OS type breaks it:
# doesn't work import os if os.name == "nt": os.system("chcp 65001 &") print("юникод")
-
it doesn't even works under if block:
# doesn't work import os if os.name == "nt": os.system("chcp 65001 &") print("юникод")
But one can print with cmd's echo:
# works
import os
os.system("chcp 65001 & echo {0}".format("юникод"))
and here's a simple way to make this cross-platform:
# works
import os
def simple_cross_platrofm_print(obj):
if os.name == "nt":
os.system("chcp 65001 >nul & echo {0}".format(obj))
else:
print(obj)
simple_cross_platrofm_print("юникод")
but the window's echo
trailing empty line can't be suppressed.
The problem of displaying Unicode charaters in Python in Windows is known. There is no official solution yet. The right thing to do is to use winapi function WriteConsoleW. It is nontrivial to build a working solution as there are other related issues. However, I have developed a package which tries to fix Python regarding this issue. See https://github.com/Drekin/win-unicode-console. You can also read there a deeper explanation of the problem. The package is also on pypi (https://pypi.python.org/pypi/win_unicode_console) and can be installed using pip.