What is the difference between 'transform' and 'fit_transform' in sklearn?
In the scikit-learn estimator API:
fit(): learns the model parameters from the training data.
transform(): applies the parameters learned by fit() to a dataset to produce the transformed data.
fit_transform(): runs fit() and then transform() on the same dataset in a single call.
Check out Chapter 4 of this book and this answer on Stack Exchange for more detail.
These methods are used to center and scale the features of a dataset so that they end up on a comparable scale.
For this, we use the Z-score method (standardization), which rescales each feature to zero mean and unit standard deviation.
The parameters are learned from the training set.
1. fit(): calculates the parameters μ and σ and stores them on the scaler object.
2. transform(): applies the transformation to a given dataset using these stored parameters.
3. fit_transform(): combines fit() and transform() in one call on the same dataset.
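The three steps above can be sketched in plain Python. This is only an illustration of the idea, not scikit-learn's implementation: `TinyScaler`, its `mu` and `sigma` attributes, and the sample values are all made up for the example, and it handles a single feature.

```python
from statistics import fmean, pstdev


class TinyScaler:
    """Hypothetical single-feature scaler, for illustration only."""

    def fit(self, values):
        # Learn the parameters from the data and store them on the object.
        self.mu = fmean(values)
        self.sigma = pstdev(values)  # population standard deviation
        return self

    def transform(self, values):
        # Apply the stored parameters; nothing is re-learned here.
        return [(v - self.mu) / self.sigma for v in values]

    def fit_transform(self, values):
        # Shortcut for fit() followed by transform() on the same data.
        return self.fit(values).transform(values)


train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [3.0, 6.0]

scaler = TinyScaler()
train_scaled = scaler.fit_transform(train)  # learns mu = 3.0, sigma = sqrt(2)
test_scaled = scaler.transform(test)        # reuses the training mu and sigma
```

Note that `transform(test)` never looks at the mean or spread of `test` itself; it only uses what `fit` stored.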
Code snippet for feature scaling/standardisation (after train_test_split):
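from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# Learn μ and σ from the training set and scale it
X_train = sc.fit_transform(X_train)
# Scale the test set with the training μ and σ (no refitting)
X_test = sc.transform(X_test)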
We apply the same transformation to the test set, using the μ and σ values that were computed from the training set.
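A quick way to check this is to compare the output of transform() with a manual calculation from the fitted attributes. The sketch below assumes a tiny made-up dataset with one feature; `mean_` and `scale_` are the attributes where StandardScaler keeps the fitted μ and σ.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: one feature, already split into train and test.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_test = np.array([[3.0], [6.0]])

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learns mean_ and scale_ from X_train
X_test_scaled = sc.transform(X_test)        # reuses them; X_test is never fitted

# The test set is scaled with the training statistics, not its own.
manual = (X_test - sc.mean_) / sc.scale_
print(sc.mean_, sc.scale_)
print(np.allclose(X_test_scaled, manual))
```

Calling fit_transform() on the test set instead would recompute μ and σ from the test data, so the two sets would no longer be on the same scale.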