Python 3 print() function with Farsi/Arabic characters [duplicate]

I simplified my code to make the problem easier to follow. Here it is:

case 1:

# -*- coding: utf-8 -*-

text = "چرا کار نمیکنی؟" # also using u"...." results the same
print(text)

output:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>

case 2:

text = "چرا کار نمیکنی؟".encode("utf-8") 
print(text)

There is no output.

case 3:

import sys

text = "چرا کار نمیکنی؟".encode("utf-8")
sys.stdout.buffer.write(text)

output:

چرا کار نمیکنی؟

I know that case 3 works, but I want to use other functions such as print() or write(str()).

I have also read the Python 3 documentation on Unicode here.

I have also read dozens of Q&As on Stack Overflow.

Here is a long article explaining the problem and the answer for Python 2.x.

The simple question is:

How do I print non-ASCII characters such as Farsi or Arabic using Python's print() function?

Update 1: Since many people suggested that the problem is with the terminal, I tested this case:

case 4:

text = "چرا کار نمیکنی؟".encode("utf-8")  # using u"...." gives the same result
print(text)

terminal :

python persian_encoding.py > test.txt

test.txt :

b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'

Very important update:

After playing around with this issue for a while, I finally found another workaround that makes cmd.exe do the job (without third-party software such as ConEmu):

A little explanation first:

Our main problem is not with Python. It is a problem with the Command Prompt character set in Windows (for a complete explanation, see Arman's answer). If you change the Command Prompt character set to UTF-8 instead of its default legacy code page, the Command Prompt can handle UTF-8 characters such as Farsi or Arabic. This does not guarantee that the characters are rendered correctly (they may be printed as little squares), but it is a good solution if you want to do file I/O in Python with UTF-8 characters.

Steps:

Before starting Python from the command line, type:

chcp 65001

Now run your Python code as usual:

python testcode.py

Result in case 1:

?????? ??? ??????

It runs without errors.
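
To see what actually changed after chcp 65001, it can help to ask Python which encodings it picked up. This is only a minimal sketch: describe_encodings is a hypothetical helper name (not from the post), and the values it reports will differ between machines and terminals.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python currently uses for console and file I/O."""
    return {
        # Encoding print() uses when it writes text to standard output.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Locale encoding, typically the default for open() without encoding=.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used for file names; independent of the console code page.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print("{0}: {1}".format(name, value))
```

If the "stdout" entry is a legacy code page rather than UTF-8, print() will likely fail on Farsi text the way case 1 does.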

For more information about how to set 65001 as the default character set, check this out.

Your code is correct as it works on my computer with both Python 2 and 3 (I'm on OS X):

~$ python -c 'print "تست"'
تست
~$ python3 -c 'print("تست")'
تست

The problem is that your terminal cannot output Unicode characters. You can verify this by redirecting the output to a file, for example python3 my_file.py > test.txt, and opening the file in an editor.
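
The error can be reproduced without any terminal by encoding the string by hand. The sketch below assumes the console uses the legacy code page cp437 (the real code page depends on the Windows locale), and try_encode is a hypothetical helper name:

```python
text = "چرا کار نمیکنی؟"


def try_encode(text, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as error:
        return error


# cp437 is an assumed legacy console code page; it has no Farsi letters.
legacy = try_encode(text, "cp437")
# UTF-8 can represent every Unicode character, so this always succeeds.
utf8 = try_encode(text, "utf-8")

print(type(legacy).__name__)
print(utf8.decode("utf-8") == text)
```

This is what print() runs into when sys.stdout is bound to such a code page: the text itself is fine, but the target encoding has no slot for those characters.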

If you are on Windows, you could use a terminal such as Console2 or ConEmu, which render Unicode better than the Windows prompt.

You may encounter errors with these terminals too because of wrong Windows code pages/encodings. There is a small Python package that sets them correctly:

1- Install it: pip install win-unicode-console

2- Put this at the top of your Python file:

try:
    # Fix UTF8 output issues on Windows console.
    # Does nothing if package is not installed
    from win_unicode_console import enable
    enable()
except ImportError:
    pass

If you get errors when redirecting to a file, you may be able to fix them by setting the I/O encoding:

On Windows command line:

SET PYTHONIOENCODING=utf-8

On Linux/OS X terminal:

export PYTHONIOENCODING=utf-8
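
If setting an environment variable is not convenient, roughly the same effect can be had from inside the script by wrapping the binary stream in a UTF-8 text wrapper and passing it to print(). This is a sketch, not a drop-in fix: utf8_text_stream is a hypothetical helper name, and on Python 3.7+ calling sys.stdout.reconfigure(encoding="utf-8") is a shorter alternative.

```python
import io
import sys


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


if __name__ == "__main__":
    out = utf8_text_stream(sys.stdout.buffer)
    print("چرا کار نمیکنی؟", file=out)
    out.flush()
    # Detach so the wrapper does not close sys.stdout.buffer when it goes away.
    out.detach()
```

This only controls the bytes that Python emits. Whether they show up as readable Farsi still depends on the terminal's code page and font.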

Some points

There is no need to use the u"aaa" syntax in Python 3. String literals are Unicode by default.
The default source file encoding is UTF-8 in Python 3, so a coding declaration comment (e.g. # -*- coding: utf-8 -*-) is not needed.
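
A small sketch of the first point, which also shows why printing an encoded string gives a b'...' line instead of readable text (the variable names are only for illustration):

```python
plain = "تست"
prefixed = u"تست"  # the u prefix is accepted in Python 3.3+ but changes nothing

# Both literals are the same str type and compare equal.
print(plain == prefixed, type(prefixed).__name__)

# .encode() turns the text into bytes; print() then shows the bytes repr.
encoded = plain.encode("utf-8")
print(type(encoded).__name__, len(plain), len(encoded))
print(encoded)
```

So calling .encode("utf-8") before print() does not help: print() expects str, and for a bytes object it just displays the escaped byte values.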

The output basically depends on the platform and terminal you run your code in. Let's examine the snippet below on different Windows terminals, running with either 2.x or 3.x:

# -*- coding: utf-8 -*-
import sys

def case1(text):
    print(text)

def case2(text):
    print(text.encode("utf-8"))

def case3(text):
    sys.stdout.buffer.write(text.encode("utf-8"))

if __name__ == "__main__":
    text = "چرا کار نمیکنی؟"

    for case in [case1, case2, case3]:
        try:
            print("Running {0}".format(case.__name__))
            case(text)
        except Exception as e:
            print(e)

        print('-'*80)

Results

Python 2.x

Sublime Text 3 3122

    Running case1
    'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
    --------------------------------------------------------------------------------
    Running case2
    b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
    --------------------------------------------------------------------------------
    Running case3
    چرا کار نمیکنی؟--------------------------------------------------------------------------------

ConEmu v151205

    Running case1
    ┌åÏ▒Ïº ┌®ÏºÏ▒ ┘å┘à█î┌®┘å█îÏƒ
    --------------------------------------------------------------------------------
    Running case2
    'ascii' codec can't decode byte 0xda in position 0: ordinal not in range(128)
    --------------------------------------------------------------------------------
    Running case3
    'file' object has no attribute 'buffer'
    --------------------------------------------------------------------------------

Windows Command Prompt

    Running case1
    ┌åÏ▒Ïº ┌®ÏºÏ▒ ┘å┘à█î┌®┘å█îÏƒ
    --------------------------------------------------------------------------------

    Running case2
    'ascii' codec can't decode byte 0xda in position 0: ordinal not in range(128)
    --------------------------------------------------------------------------------

    Running case3
    'file' object has no attribute 'buffer'
    --------------------------------------------------------------------------------

Python 3.x

Sublime Text 3 3122

    Running case1
    'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
    --------------------------------------------------------------------------------
    Running case2
    b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
    --------------------------------------------------------------------------------
    Running case3
    چرا کار نمیکنی؟--------------------------------------------------------------------------------

ConEmu v151205

    Running case1
    'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
    --------------------------------------------------------------------------------
    Running case2
    b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
    --------------------------------------------------------------------------------
    Running case3
    ┌åÏ▒Ïº ┌®ÏºÏ▒ ┘å┘à█î┌®┘å█îÏƒ--------------------------------------------------------------------------------

Windows Command Prompt

    Running case1
    'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
    --------------------------------------------------------------------------------

    Running case2
    b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
    --------------------------------------------------------------------------------

    Running case3
    ┌åÏ▒Ïº ┌®ÏºÏ▒ ┘å┘à█î┌®┘å█îÏƒ----------------------------------------------------
    ----------------------------

As you can see, only the Sublime Text 3 console (case 3) worked correctly. The other terminals did not support Persian. The main point here is that the result depends on which terminal and platform you are using.
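
Since the result depends on the terminal, a script that has to run everywhere can at least avoid crashing by escaping whatever the target encoding cannot represent. This is a minimal sketch, assuming escaped output is acceptable; printable is a hypothetical helper name and cp437 is just an example of a legacy code page.

```python
import sys


def printable(text, encoding, errors="backslashreplace"):
    """Return text with characters the encoding lacks replaced by escapes."""
    return text.encode(encoding, errors=errors).decode(encoding)


if __name__ == "__main__":
    text = "چرا کار نمیکنی؟"
    # Fall back to UTF-8 when the stream does not report an encoding.
    target = getattr(sys.stdout, "encoding", None) or "utf-8"
    print(printable(text, target))
```

On a UTF-8 terminal the text passes through unchanged. On a legacy code page it comes out as \uXXXX escapes instead of raising UnicodeEncodeError.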

Solution (ConEmu specific)

Modern terminals like ConEmu allow you to work with UTF-8 encoding, as explained here, so let's try:

chcp 65001 & cmd

And then run the script again against 2.x and 3.x:

Python2.x

Running case1
��را کار نمیکنی؟[Errno 0] Error
--------------------------------------------------------------------------------
Running case2
'ascii' codec can't decode byte 0xda in position 0: ordinal not in range(128)
--------------------------------------------------------------------------------
Running case3
'file' object has no attribute 'buffer'
--------------------------------------------------------------------------------

Python3.x

Running case1
چرا کار نمیکنی؟
--------------------------------------------------------------------------------
Running case2
b'\xda\x86\xd8\xb1\xd8\xa7 \xda\xa9\xd8\xa7\xd8\xb1 \xd9\x86\xd9\x85\xdb\x8c\xda\xa9\xd9\x86\xdb\x8c\xd8\x9f'
--------------------------------------------------------------------------------
Running case3
چرا کار نمیکنی؟--------------------------------------------------------------------------------

As you can see, the output was now successful with Python 3 case 1 (print). So, the moral of the story: learn more about your tools and how to configure them properly for your use cases ;-)
