Creating dataframe from a dictionary where entries have different lengths
Say I have a dictionary with 10 key-value pairs. Each entry holds a numpy array. However, the length of the array is not the same for all of them.
How can I create a dataframe where each column holds a different entry?
When I try:
pd.DataFrame(my_dict)
I get:
ValueError: arrays must all be the same length
Any way to overcome this? I am happy to have Pandas use NaN
to pad those columns for the shorter entries.
In Python 3.x:
import pandas as pd
import numpy as np
d = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )
pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in d.items() ]))
Out[7]:
A B
0 1 1
1 2 2
2 NaN 3
3 NaN 4
In Python 2.x:
replace d.items()
with d.iteritems()
.
Here's a simple way to do that:
In[20]: my_dict = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )
In[21]: df = pd.DataFrame.from_dict(my_dict, orient='index')
In[22]: df
Out[22]:
0 1 2 3
A 1 2 NaN NaN
B 1 2 3 4
In[23]: df.transpose()
Out[23]:
A B
0 1 1
1 2 2
2 NaN 3
3 NaN 4