Shape error while concatenating columns after Principal Component Analysis on CSV data
I am applying PCA to my CSV data. After normalization, PCA seems to work. I want to plot the projection onto 4 components, but I am stuck with this error:
type x y ... fx fy fz
0 0 -0.639547 -1.013450 ... -8.600000e-231 -1.390000e-230 0.0
0 1 -0.497006 -2.311890 ... 0.000000e+00 0.000000e+00 0.0
1 0 0.154376 -0.873189 ... 1.150000e-228 -1.480000e-226 0.0
1 1 -0.342055 -2.179370 ... 0.000000e+00 0.000000e+00 0.0
2 0 0.312719 -0.872756 ... -2.370000e-221 2.420000e-221 0.0
[5 rows x 10 columns]
(1047064, 10)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-28-0b631a51ce61> in <module>()
33
34
---> 35 finalDf = pd.concat([principalDf, df[['type']]], axis = 1)
4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in _verify_integrity(self)
327 for block in self.blocks:
328 if block.shape[1:] != mgr_shape[1:]:
--> 329 raise construction_error(tot_items, block.shape[1:], self.axes)
330 if len(self.items) != tot_items:
331 raise AssertionError(
ValueError: Shape of passed values is (2617660, 5), indices imply (1570596, 5)
This is my code:
import sys
import pandas as pd
import pylab as pl
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
df1=pd.read_csv('./data/1.csv')
df2=pd.read_csv('./data/2.csv')
df = pd.concat([df1, df2], axis=0).sort_index()
print(df.head())
print(df.shape)
features = ['x', 'y', 'z', 'vx', 'vy', 'vz', 'fx', 'fy', 'fz']
# Separating out the features
x = df.loc[:, features].values
# Separating out the target
y = df.loc[:,['type']].values
# Standardizing the features
x = StandardScaler().fit_transform(x)
pca = PCA(n_components=4)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
, columns = ['pcc1','pcc2','pcc3', 'pcc4'])
finalDf = pd.concat([principalDf, df[['type']]], axis = 1)
I guess the error occurs when I concatenate my components with df['type'].
How can I get rid of this error?
Thank you.
The index in df is not the same as in principalDf. We have (using a short version of your data)
df.index
Int64Index([0, 0, 1, 1, 2, 2, 3, 3, 4, 4], dtype='int64')
and
principalDf.index
RangeIndex(start=0, stop=10, step=1)
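pd.concat with axis=1 aligns rows on index labels, so the duplicated labels in df cannot be matched one-to-one with the unique labels in principalDf, and the resulting shapes disagree. You can fix this by resetting the index early on: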
...
df = pd.concat([df1, df2], axis=0).sort_index().reset_index(drop=True) # note reset_index added; drop=True discards the old duplicated index
...
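To check the fix end to end, here is a minimal sketch on made-up data. The two small frames and the zero-filled array are hypothetical stand-ins for your CSV files and for the output of pca.fit_transform(x). Only the index handling is the point here, so scikit-learn is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for 1.csv and 2.csv: each frame has its own 0..2 index.
df1 = pd.DataFrame({"type": [0, 0, 0], "x": [0.1, 0.2, 0.3]})
df2 = pd.DataFrame({"type": [1, 1, 1], "x": [1.1, 1.2, 1.3]})

# Same step as in the question, but the duplicated index is replaced by 0..5.
df = pd.concat([df1, df2], axis=0).sort_index().reset_index(drop=True)

# Stand-in for pca.fit_transform(x): any array with one row per row of df.
components = np.zeros((len(df), 2))
principalDf = pd.DataFrame(components, columns=["pcc1", "pcc2"])

# Both frames now share the same unique index, so rows pair up one to one.
finalDf = pd.concat([principalDf, df[["type"]]], axis=1)
print(finalDf.shape)
```

If the interleaved row order produced by sort_index() is not needed, passing ignore_index=True to the first pd.concat call should give a fresh 0..n-1 index in one step.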