Long-form pandas DataFrame from wide-form NumPy arrays
Suppose I have the following two numpy.ndarrays:
- pixels_array.shape = (1000, 28, 28)
- labels_array.shape = (1000, 0)
pixels_array is a 1000-item array of 28 x 28 pixel values, and labels_array is simply a 1000-item list of labels for those pixel values. I am attempting to merge those arrays into a long-form DataFrame that looks like this (array examples omitted to save space):
ID | Label | Pixels |
---|---|---|
1 | 9 | 28x28 array |
2 | B | 28x28 array |
3 | Q | 28x28 array |
4 | 8 | 28x28 array |
5 | Z | 28x28 array |
What is the best way to do this? I have been messing with this for about an hour and just cannot get melt to work the way I expect. Sometimes I get a row for each item in each array, other times I get a total of 2 rows. Any help would be appreciated.
Solution 1:
You should be able to do that with the following:
import pandas as pd

# list(pixels_array) splits along the first axis: one 28x28 array per row
df = pd.DataFrame({'Pixels': list(pixels_array),
                   'Label': labels_array.flatten()})
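For anyone who wants to try this end to end, here is a minimal sketch of the same idea on synthetic data. The random pixel values, the label alphabet, and the (1000, 1) label shape are all assumptions made for illustration (an array whose shape really is (1000, 0) would contain no elements), and the ID column is added only to mirror the table in the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the arrays in the question (assumed shapes and values).
rng = np.random.default_rng(0)
pixels_array = rng.integers(0, 256, size=(1000, 28, 28))
labels_array = rng.choice(list("0123456789ABQZ"), size=(1000, 1))

df = pd.DataFrame({
    "ID": np.arange(1, len(pixels_array) + 1),  # 1-based ID, as in the question's table
    "Label": labels_array.ravel(),              # flatten (1000, 1) to (1000,)
    "Pixels": list(pixels_array),               # one 28x28 array per row
})

print(df.shape)                    # (1000, 3)
print(df["Pixels"].iloc[0].shape)  # (28, 28)
```

Each cell of the Pixels column holds a whole 28x28 array, so no melt is needed: the DataFrame already has exactly one row per image.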
Solution 2:
What you are asking for is rarely recommended, but one way is to coerce the array to a list first, for example:
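import numpy as np
import pandas as pd

arr1 = np.random.randint(1, 10, size=(1000, 28, 28))  # stand-in for the pixel data
arr2 = np.random.randn(1000)                          # stand-in for the labels
df = pd.Series(arr2, name="Label").to_frame()
df['pixels'] = arr1.tolist()  # each cell becomes a nested 28x28 list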
Then, if you want, you can convert each cell back to an array:
df.pixels = df.pixels.apply(np.array)
Output:
Label pixels
0 -0.187183 [[7, 9, 6, 5, 5, 7, 6, 9, 1, 7, 7, 7, 2, 8, 8,...
1 0.360777 [[1, 4, 6, 7, 7, 4, 9, 1, 1, 8, 8, 6, 9, 3, 6,...
2 0.206012 [[7, 4, 8, 3, 4, 3, 8, 9, 1, 9, 6, 8, 7, 5, 3,...
3 0.726619 [[1, 8, 8, 4, 5, 1, 2, 2, 3, 4, 8, 3, 6, 4, 1,...
4 0.801372 [[3, 5, 7, 3, 5, 7, 4, 1, 5, 1, 6, 3, 8, 5, 9,...
(...)
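To make the "rarely recommended" caveat a bit more concrete, here is a rough sketch (same random stand-in data as above, seeded for repeatability) showing that the pixels column ends up with object dtype, how the original 3D array could be recovered with np.stack, and one possible flat layout with a column per pixel. The px0...px783 column names are made up for illustration, and the flat layout is only one alternative, not necessarily the right one for every use case.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
arr1 = rng.integers(1, 10, size=(1000, 28, 28))  # stand-in pixel data
arr2 = rng.standard_normal(1000)                 # stand-in labels

df = pd.Series(arr2, name="Label").to_frame()
df["pixels"] = arr1.tolist()                 # nested lists, one per row
df["pixels"] = df["pixels"].apply(np.array)  # back to one ndarray per cell

print(df["pixels"].dtype)  # object

# Rebuild the original (1000, 28, 28) array from the per-row arrays.
restored = np.stack(list(df["pixels"]))
print(restored.shape)      # (1000, 28, 28)

# Alternative layout: one numeric column per pixel (hypothetical names px0..px783).
flat = pd.DataFrame(
    arr1.reshape(len(arr1), -1),
    columns=[f"px{i}" for i in range(28 * 28)],
)
flat.insert(0, "Label", arr2)
print(flat.shape)          # (1000, 785)
```

With object cells, pandas treats each array as an opaque Python object, so column-wise numeric operations on the pixels are not available; the flat layout keeps every pixel in an ordinary numeric column at the cost of a much wider frame.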