Converting nested json into a pandas data frame in Python

I have a nested data frame in JSON. I have no problem with taking a data frame that isn't nested and converting into pandas data frame.

What I am having issues is when there are multiple levels of the data frame and I need to write independent records for each of the json entries.

{
  'type': 'text1',
  'key': ['key1'],
  
},  
{
  'type': 'text2',
  'key': ['key1', 'key2'], 
}, 
 'type': 'text3',
 'key': 'key', 
}

I used the following code to write this into a data frame.

 df = pd.DataFrame.from_dict(json)

enter image description here

Unfortunately for each of the entries, I have to include a record. So if key has 2 elements in the array, 2 entries will need to be created. And an additional column (key index) will be created. So what I am trying to get is something similar to below.

enter image description here

Any help would be greatly appreciated on this as I have been stuck on this for a while!

Use explode:

json = [{'type': 'text1', 'key': ['key1']},
        {'type': 'text2', 'key': ['key1', 'key2']},
        {'type': 'text3', 'key': 'key'}]

df = pd.DataFrame(json).explode('key') \
       .assign(key_index=lambda x: x.groupby(level=0).cumcount())
print(df)

# Output
    type   key  key_index
0  text1  key1          0
1  text2  key1          0
1  text2  key2          1
2  text3   key          0

Update

How would you start the key index at 1 instead of 0 in the case of multiple entries. For text 1, the generated index is 0. For text 2, the generated index starts at 1 and so there is 1 and 2.


df = pd.DataFrame(json).explode('key') \
       .assign(key_index=lambda x: x.groupby(level=0)['key'].transform(cumcount))
print(df)

# Output
    type   key  key_index
0  text1  key1          0
1  text2  key1          1
1  text2  key2          2
2  text3   key          0

Converting nested json into a pandas data frame in Python

Related

Recent Posts