Is there a difference between the R functions fitted() and predict()?
Yes, there is. If there is a link function relating the linear predictor to the expected value of the response (such as log for Poisson regression or logit for logistic regression), predict() by default returns values on the scale of the linear predictor, before the inverse of the link function is applied, whereas fitted() returns them after it is applied, on the same scale as the response variable.
For example:
x = rnorm(10)
y = rpois(10, exp(x))
m = glm(y ~ x, family="poisson")
print(fitted(m))
#         1         2         3         4         5         6         7         8
# 0.3668989 0.6083009 0.4677463 0.8685777 0.8047078 0.6116263 0.5688551 0.4909217
#         9        10
# 0.5583372 0.6540281
print(predict(m))
#          1          2          3          4          5          6          7
# -1.0026690 -0.4970857 -0.7598292 -0.1408982 -0.2172761 -0.4916338 -0.5641295
#          8          9         10
# -0.7114706 -0.5827923 -0.4246050
print(all.equal(log(fitted(m)), predict(m)))
# [1] TRUE
This does mean that for models created by linear regression (lm), where the link is the identity, there is no difference between fitted() and predict() when predict() is called without new data.
In practical terms, this means that if you want to compare the fit to the original data, you should use fitted().
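The linear-regression case can be checked directly. The following is a minimal sketch on simulated data; the names x, y, fit, and same are illustrative only and do not come from the example above.

```r
# Simulated data for an ordinary linear regression (identity link).
set.seed(1)
x <- rnorm(10)
y <- 2 * x + rnorm(10)
fit <- lm(y ~ x)

# With no link function to invert, both calls return values on the
# response scale, so they should agree up to numerical tolerance.
same <- all.equal(fitted(fit), predict(fit))
print(same)
```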
The fitted() function returns the y-hat values associated with the data used to fit the model. The predict() function returns predictions for a new set of predictor variables. If you don't specify a new set of predictor variables, it uses the original data by default, giving the same results as fitted() for some models (such as lm), but if you want to predict for a new set of values then you need predict(). The predict() function often also has options for which type of prediction to return: the linear predictor, the prediction transformed to the response scale, the most likely category, the contribution of each term in the model, and so on. Which options are available depends on the class of the model.
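As a rough illustration of these options for a Poisson glm, the sketch below predicts at new predictor values using the type argument of predict() (which, for glm objects, accepts "link", "response", and "terms"). The data are simulated, and new_x, eta, mu, and trm are hypothetical names chosen for this example.

```r
# Refit a small Poisson regression on simulated data.
set.seed(42)
x <- rnorm(10)
y <- rpois(10, exp(x))
m <- glm(y ~ x, family = "poisson")

# Hypothetical new predictor values; the column name must match the model term.
new_x <- data.frame(x = c(-1, 0, 1))

eta <- predict(m, newdata = new_x, type = "link")      # linear predictor (the default)
mu  <- predict(m, newdata = new_x, type = "response")  # expected counts, response scale
trm <- predict(m, newdata = new_x, type = "terms")     # contribution of each model term

# The inverse of the log link is exp(), so exp(eta) should match mu.
print(all.equal(exp(eta), mu))

# On the original data, the response-scale prediction should match fitted().
print(all.equal(predict(m, type = "response"), fitted(m)))
```

For other model classes the accepted values of type differ, so it is worth checking the help page for the relevant predict method (for example ?predict.glm).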