Parse JSON string in CSV file [duplicate]
I have a CSV file containing some JSON strings, and I want to parse them and store the result in a DataFrame. The file looks like:
file1,"{\"A1\": {\"a\": \"123\"}, \"B1\": {\"b1\": \"456\", \"b2\": \"789\", \"b3\": \"000\"}}",
file2,"{\"A2\": {\"a\": \"321\"}, \"B2\": {\"b1\": \"654\", \"b2\": \"987\"}}"
After getting the keys of each dictionary, the DataFrame I want is:
1      2                    3
file1  {"A1":{"a":"123"}}   {"B1":{"b1":"456","b2":"789","b3":"000"}}
file2  {"A2":{"a":"321"}}   {"B2":{"b1":"654","b2":"987"}}
The values in columns 2 and 3 should be dictionaries. I have tried:
pd.read_csv(file, quotechar='"', header=None)
but it still separates my JSON in the wrong way...
Any suggestions?
Many thanks!
The data you have uses \" to escape a double quote within each cell. This behaviour can be specified by setting both doublequote=True and escapechar='\\' as parameters, as follows:
import pandas as pd

# header=None keeps the first line as data rather than column names
df = pd.read_csv('input.json', header=None, doublequote=True, escapechar='\\')
print(df)
Giving you something like:
       0                                                  1     2
0  file1  {"A1": {"a": "123"}, "B1": {"b1": "456", "b2":...
1  file2  {"A2": {"a": "321"}, "B2": {"b1": "654", "b2":...  None
If header=None is left out, the first line is used as the column names instead:
   file1 {"A1": {"a": "123"}, "B1": {"b1": "456", "b2": "789", "b3": "000"}}  \
0  file2  {"A2": {"a": "321"}, "B2": {"b1": "654", "b2":...
   Unnamed: 2
0         NaN
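At this point the second column still holds JSON text, not dictionaries. To get from there to the layout asked for in the question, a minimal sketch could look like the following. It is only an illustration under a few assumptions: the sample rows are read from an in-memory string instead of a file, every JSON object is assumed to have exactly two top-level keys, and `split_top_level` is a hypothetical helper name, not part of pandas. Only escapechar is passed here, leaving doublequote at its default of True.

```python
import io
import json

import pandas as pd

# The two sample rows from the question, kept verbatim via raw strings
# (assumption: in practice these would come from the CSV file on disk).
raw = "\n".join([
    r'file1,"{\"A1\": {\"a\": \"123\"}, \"B1\": {\"b1\": \"456\", \"b2\": \"789\", \"b3\": \"000\"}}",',
    r'file2,"{\"A2\": {\"a\": \"321\"}, \"B2\": {\"b1\": \"654\", \"b2\": \"987\"}}"',
])

# escapechar tells the parser that \" is a literal quote inside a cell.
df = pd.read_csv(io.StringIO(raw), header=None, escapechar="\\")


def split_top_level(text):
    """Hypothetical helper: one single-key dict per top-level JSON key."""
    obj = json.loads(text)
    return [{key: value} for key, value in obj.items()]


parts = [split_top_level(text) for text in df[1]]

# Assumption: each object has exactly two top-level keys.
result = pd.DataFrame({
    1: df[0].tolist(),
    2: [p[0] for p in parts],
    3: [p[1] for p in parts],
})
print(result)
```

The cells in columns 2 and 3 are then real Python dictionaries, so something like result.loc[0, 2]["A1"]["a"] can be indexed directly. If the number of top-level keys varies from row to row, the two list comprehensions would need adjusting.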