What does calling fit() multiple times on the same model do?
After I instantiate a scikit-learn model (e.g. LinearRegression), if I call its fit() method multiple times (with different X and y data), what happens? Does it fit the model on the data as if I had just re-instantiated the model (i.e. from scratch), or does it take into account the data already fitted in the previous call to fit()?
Trying this with LinearRegression (and also looking at its source code), it seems to me that every time I call fit(), it fits from scratch, ignoring the result of any previous call to the same method. I wonder if this is true in general, and whether I can rely on this behavior for all models/pipelines in scikit-learn.
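If you execute model.fit(X_train, y_train) a second time, it overwrites everything that was fitted previously: coefficients, weights, intercept (bias), etc. The exception is an estimator created with warm_start=True, which reuses the previous solution as its starting point.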
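A quick way to check this is to refit one instance on new data and compare it with a freshly created model fitted on the same data. The sketch below uses LinearRegression and two made-up toy datasets (the variable names are only illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two unrelated toy datasets, invented for this illustration.
X1 = np.array([[0.0], [1.0], [2.0], [3.0]])
y1 = 2.0 * X1.ravel() + 1.0      # slope 2, intercept 1
X2 = np.array([[0.0], [1.0], [2.0], [3.0]])
y2 = -5.0 * X2.ravel() + 10.0    # slope -5, intercept 10

model = LinearRegression()
model.fit(X1, y1)
coef_after_first = model.coef_.copy()

# Second call to fit() on the same instance, with different data.
model.fit(X2, y2)

# A brand-new estimator fitted only on the second dataset.
fresh = LinearRegression().fit(X2, y2)

# If fit() starts from scratch, both models hold the same parameters.
refit_matches_fresh = bool(
    np.allclose(model.coef_, fresh.coef_)
    and np.isclose(model.intercept_, fresh.intercept_)
)
print(coef_after_first, model.coef_, refit_matches_fresh)
```

The refitted model carries no trace of the first dataset: its coefficients match those of the fresh model.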
If you want to fit the model on just a portion of your data set and then improve it by fitting on new data, you can use estimators that support "incremental learning" (those that implement the partial_fit() method).
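As a minimal sketch of incremental learning, assuming GaussianNB as the estimator (one of several that implement partial_fit()) and randomly generated data, feeding two batches through partial_fit() accumulates information from both:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic data, invented for this illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)

# Split into two batches that arrive one after the other.
X_a, y_a = X[:100], y[:100]
X_b, y_b = X[100:], y[100:]

incremental = GaussianNB()
# The first partial_fit() call must list every class that can ever appear.
incremental.partial_fit(X_a, y_a, classes=np.array([0, 1]))
# The second call updates the existing statistics instead of resetting them.
incremental.partial_fit(X_b, y_b)

# Reference model fitted on all the data at once.
full = GaussianNB().fit(X, y)

print(incremental.class_count_.sum())                  # samples seen so far
print(np.allclose(incremental.theta_, full.theta_))    # per-class feature means
```

For this estimator, the per-class means after two partial_fit() calls agree with those from a single fit() on the combined data. Other incremental estimators (for example those trained by stochastic gradient descent) keep their previous state too, but are not guaranteed to reach exactly the same parameters as a one-shot fit.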
In machine learning, "fitting" and "training" a model mean the same thing. Depending on the classifier you instantiate, for example clf = GaussianNB() or clf = SVC(), your model uses the corresponding machine learning technique.
As soon as you call clf.fit(features_train, label_train), the model is trained on the features and labels you passed.
You can then use clf.predict(features_test) to make predictions.
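A minimal sketch of this fit-then-predict flow, assuming SVC with a linear kernel and a tiny hand-made dataset (the data values are invented for illustration):

```python
from sklearn.svm import SVC

# Tiny, clearly separable toy data: two points per class.
features_train = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
label_train = [0, 0, 1, 1]
features_test = [[0.1, 0.0], [1.0, 0.9]]

clf = SVC(kernel="linear")
clf.fit(features_train, label_train)          # training happens here
predictions = clf.predict(features_test)      # one label per test row
print(predictions)
```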
If you then call clf.fit(features_train2, label_train2), training starts over using the new data and the previous results are discarded. The following are reset inside the model:
- Weights
- Fitted Coefficients
- Bias
- Any other attributes learned during training
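The reset is easy to observe on a classifier, because even the set of known classes is replaced. A small sketch, assuming GaussianNB and made-up labels:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two toy datasets with completely different labels (illustrative only).
X_first = np.array([[0.0], [0.1], [5.0], [5.1]])
y_first = np.array(["cat", "cat", "dog", "dog"])
X_second = np.array([[10.0], [10.1], [20.0], [20.1]])
y_second = np.array(["bird", "bird", "fish", "fish"])

clf = GaussianNB()
clf.fit(X_first, y_first)
classes_first = list(clf.classes_)

# Refit the same instance: learned attributes are recomputed from scratch.
clf.fit(X_second, y_second)
classes_second = list(clf.classes_)

print(classes_first, classes_second, clf.class_count_.sum())
```

After the second fit() the classifier only knows the classes from the second dataset, and its sample counts cover only those four rows.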
If you want to keep what the model has already learned and continue training on additional data, use the partial_fit() method instead (on estimators that provide it).