Saving StandardScaler() model for use on new datasets
How do I save a fitted StandardScaler in scikit-learn? I need to make a model operational and don't want to load the training data again and again just so StandardScaler can re-learn its parameters before I apply it to new data I want to predict on.
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# standardize after splitting so the scaler is fitted on the training data only
X_train, X_test, y_train, y_test = train_test_split(data, target)
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
Solution 1:
You can use joblib's dump function to save the fitted scaler. Here's a complete example for reference.
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
data, target = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(data, target)
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
If you want to save the fitted sc StandardScaler, use the following:
from joblib import dump, load
dump(sc, 'std_scaler.bin', compress=True)
This will create the file std_scaler.bin and save the fitted scaler in it.
To read the scaler back later, use load:
sc = load('std_scaler.bin')
Note: sklearn.externals.joblib is deprecated and has been removed from recent scikit-learn releases; install the standalone joblib package and import from it directly, as shown above.
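Once the scaler is saved, a production process can score new data without ever reloading the training set. A minimal sketch of that step; new_data is a hypothetical incoming sample, not part of the original answer, and must have the same feature layout as the training data (four iris features here):
import numpy as np
from joblib import load

# Load the already-fitted scaler; no training data is required.
sc = load('std_scaler.bin')

# new_data stands in for whatever rows you need to score.
new_data = np.array([[5.1, 3.5, 1.4, 0.2]])
new_data_std = sc.transform(new_data)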
Solution 2:
Or, if you prefer pickle:
import pickle

# Use context managers so the file handles are closed properly.
with open('file/path/scaler.pkl', 'wb') as f:
    pickle.dump(sc, f)

with open('file/path/scaler.pkl', 'rb') as f:
    sc = pickle.load(f)
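For deployment it can also help to pickle the scaler together with the downstream estimator, so production code loads a single artifact. A rough sketch, assuming a LogisticRegression trained on the standardized features (the classifier and the pipeline.pkl name are placeholders, not part of the original answer):
import pickle
from sklearn.linear_model import LogisticRegression

# Train a placeholder model on the standardized training data.
clf = LogisticRegression(max_iter=200).fit(X_train_std, y_train)

# Save the scaler and the model side by side in one file.
with open('pipeline.pkl', 'wb') as f:
    pickle.dump({'scaler': sc, 'model': clf}, f)

# Later: load both and apply them in order to new data.
with open('pipeline.pkl', 'rb') as f:
    artifacts = pickle.load(f)
preds = artifacts['model'].predict(artifacts['scaler'].transform(X_test))
Either format works for a small StandardScaler; joblib is generally the more efficient choice for objects that carry large NumPy arrays.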