How to display Chinese characters inside a pandas dataframe?

I can read a CSV file in which one column contains Chinese characters (the other columns contain English text and numbers). However, the Chinese characters don't display correctly.

I loaded the csv file with pd.read_csv().

Neither display(data06_16) nor data06_16.head() displays the Chinese characters correctly.

I tried adding the following lines to my .bash_profile:

export LC_ALL=zh_CN.UTF-8
export LANG=zh_CN.UTF-8

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

but it doesn't help.

I have also tried passing the encoding argument to pd.read_csv():

pd.read_csv('data.csv', encoding='utf_8')
pd.read_csv('data.csv', encoding='utf_16')
pd.read_csv('data.csv', encoding='utf_32')

None of these work either.

How can I display the Chinese characters properly?


Solution 1:

I just remembered that the source dataset was created using encoding='GBK', so I tried again using

data06_16 = pd.read_csv("../data/stocks1542monthly.csv", encoding="GBK")

Now, I can see all the Chinese characters.

Thanks guys!
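For anyone who doesn't know the source encoding up front, here is a minimal sketch of one way to narrow it down: try a short list of candidate encodings on the raw bytes and keep the first one that decodes without error. The function name first_working_encoding and the candidate list are my own assumptions, not part of pandas. A clean decode is only a hint, since bytes can decode without error under the wrong encoding, so still check the output by eye.

```python
def first_working_encoding(raw, candidates=("utf-8", "gbk", "big5")):
    """Return the first candidate that decodes `raw` without error, else None.

    `raw` is the file content as bytes, e.g. open(path, "rb").read().
    The candidate list is an assumption; order it by what you expect to see.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes, try the next
        return enc
    return None


# Small self-contained demo: bytes written as GBK are not valid UTF-8,
# so the UTF-8 attempt is rejected and the GBK attempt is accepted.
sample = "股票代码,名称\n000001,平安银行\n".encode("gbk")
print(first_working_encoding(sample))
```

The returned name can then be passed straight to pandas, as in pd.read_csv(path, encoding=enc).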

Solution 2:

I see three possible approaches here:

1) You can try this:

import codecs
x = codecs.open("testdata.csv", "r", "utf-8")

2) Another possibility, in theory, is this:

import pandas as pd
# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
df = pd.read_csv('testdata.csv', encoding='utf-8')

3) Maybe you should convert your CSV file to UTF-8 before importing it with Python (for example, in Notepad++)? That can work for a one-time import, but of course not for an automated process.
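If the conversion in option 3 needs to be automated, it can also be done in Python with the standard library alone. This is only a sketch: the helper name convert_to_utf8 and the file names are hypothetical, and it assumes you already know the source encoding (GBK is used as the default here purely as an example).

```python
def convert_to_utf8(src, dst, source_encoding="gbk"):
    """Re-encode the text file `src` as UTF-8 and write it to `dst`.

    newline="" disables newline translation so the original line endings
    are written back unchanged.
    """
    with open(src, "r", encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Hypothetical usage (file names are placeholders):
# convert_to_utf8("testdata.csv", "testdata_utf8.csv", source_encoding="gbk")
```

After the conversion, pd.read_csv('testdata_utf8.csv', encoding='utf-8') should read the file without a special encoding.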