pandas - expand array to columns
I have a column in my pandas dataframe in which each value is an array of numbers:
index | col
0 | [106.43477116337492, 6.762679391732501, 0.0, 9...
1 | [106.43477116337492, 6.58742122158056, 0.0, 9....
2 | [106.22211427793361, 7.303693743071101, 0.0, 9...
3 | [106.43477116337492, 7.955196940809838, 0.0, 9...
4 | [106.43477116337492, 6.400733170766536, 0.0, 9...
One value:
array([106.43477116, 6.76267939, 0. , 9.26076567,
10.78086689, 106.63684122, 5.98865461, 0. ,
8.16789259, 9.94066589, 2.03606668, 0. ,
0. ])
I need to expand the values in the array to separate columns so I will have:
col1 | col2 | col3 ...
106.434... | 6.7626.... | 0.0 ...
106.434... | 6.5874.... | 0.0 ...
How can I do this? I have already spent quite some time researching it, but the only thing I found is explode(), which is not what I want.
Maybe this will help
import numpy as np
import pandas as pd
a = np.array([106.43477116, 6.76267939, 0. , 9.26076567, 10.78086689, 106.63684122, 5.98865461, 0. , 8.16789259, 9.94066589, 2.03606668, 0. , 0. ])
col = pd.Series([a,a,a])
arr = np.array(col.values.tolist())
df = pd.DataFrame(columns=['c'+str(i) for i in range(a.size)])
df[df.columns] = arr
print(df)
Output:
c0 c1 c2 c3 c4 c5 c6 c7 \
0 106.434771 6.762679 0.0 9.260766 10.780867 106.636841 5.988655 0.0
1 106.434771 6.762679 0.0 9.260766 10.780867 106.636841 5.988655 0.0
2 106.434771 6.762679 0.0 9.260766 10.780867 106.636841 5.988655 0.0
c8 c9 c10 c11 c12
0 8.167893 9.940666 2.036067 0.0 0.0
1 8.167893 9.940666 2.036067 0.0 0.0
2 8.167893 9.940666 2.036067 0.0 0.0
I'm effectively just turning your column into an np.ndarray and assigning it to df[df.columns]. The .values.tolist() part is essential to get a regular 2-D array rather than a 1-D array of objects. It may not be the best way of doing it, though.
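To see why the .values.tolist() step matters, here is a small sketch. The row is shortened to four values purely for readability, and the names raw and arr are just illustrative:

```python
import numpy as np
import pandas as pd

# Shortened sample row (assumption: four values instead of thirteen)
a = np.array([106.43477116, 6.76267939, 0.0, 9.26076567])
col = pd.Series([a, a, a])

# The Series stores each array as a Python object, so .values is 1-D
raw = col.values
print(raw.dtype, raw.shape)

# Going through a list of arrays lets NumPy stack the rows into a 2-D float array
arr = np.array(col.values.tolist())
print(arr.dtype, arr.shape)
```

The first print should report an object array with one entry per row, while the second should report a float array with one column per element, which is the shape needed for assigning to several columns at once.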
You can 'spread' the column of arrays into separate columns using to_list, then rebuild a dataframe from the result, adding a prefix if needed, and optionally drop the original column.
Assuming the dataframe column holding the arrays is named 'array':
dfs = ( df.join( pd.DataFrame(df['array'].to_list())
.add_prefix('array_') )
.drop('array', axis = 1) )
>>> print(dfs)
array_0 array_1 array_2 ... array_10 array_11 array_12
0 106.434771 6.762679 0.0 ... 2.036067 0.0 0.0
1 106.434771 6.762679 0.0 ... 2.036067 0.0 0.0
[2 rows x 13 columns]
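One thing to watch for: join aligns on index labels, and a dataframe built from to_list() gets a fresh 0..n-1 index. If your dataframe's index is not the default one (for example after filtering), passing index=df.index should keep the rows lined up. A minimal sketch, assuming a hypothetical extra 'id' column and made-up index labels 10 and 20:

```python
import numpy as np
import pandas as pd

# Shortened sample row (assumption: three values instead of thirteen)
row = np.array([106.43477116, 6.76267939, 0.0])

# Hypothetical frame with a non-default index and an extra 'id' column
df = pd.DataFrame({'id': [1, 2], 'array': [row, row]}, index=[10, 20])

# Reuse the original index so that join matches rows by label
expanded = pd.DataFrame(df['array'].to_list(), index=df.index).add_prefix('array_')
dfs = df.join(expanded).drop('array', axis=1)
print(dfs)
```

Without index=df.index, the expanded frame here would be labelled 0 and 1, nothing would match the labels 10 and 20, and the new columns would come back as NaN.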
If you have a single column, do not want prefixes, and do not want to keep the original column, it is a bit simpler:
dfs = pd.DataFrame(df.iloc[:,0].to_list())
>>> print(dfs)
0 1 2 3 ... 9 10 11 12
0 106.434771 6.762679 0.0 9.260766 ... 9.940666 2.036067 0.0 0.0
1 106.434771 6.762679 0.0 9.260766 ... 9.940666 2.036067 0.0 0.0
[2 rows x 13 columns]