what is the difference between 'transform' and 'fit_transform' in sklearn

In scikit-learn estimator api,

fit() : used for generating learning model parameters from training data

transform() : parameters generated from fit() method,applied upon model to generate transformed data set.

fit_transform() : combination of fit() and transform() api on same data set

enter image description here

Checkout Chapter-4 from this book & answer from stackexchange for more clarity


These methods are used to center/feature scale of a given data. It basically helps to normalize the data within a particular range

For this, we use Z-score method.

Z-Score

We do this on the training set of data.

1.Fit(): Method calculates the parameters μ and σ and saves them as internal objects.

2.Transform(): Method using these calculated parameters apply the transformation to a particular dataset.

3.Fit_transform(): joins the fit() and transform() method for transformation of dataset.

Code snippet for Feature Scaling/Standardisation(after train_test_split).

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit_transform(X_train)
sc.transform(X_test)

We apply the same(training set same two parameters μ and σ (values)) parameter transformation on our testing set.