How do I deal with NAs in residuals in a regression in R?
Solution 1:
I just found this googling around a bit deeper. The resid function on an lm with na.action=na.exclude is the way to go.
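A minimal sketch of the difference, using made-up data (the names dat, fit_omit and fit_exclude are only for illustration). It assumes nothing beyond base R's stats package:

```r
# Hypothetical data: row 3 has a missing predictor value.
dat <- data.frame(x = c(1, 2, NA, 4, 5),
                  y = c(1.1, 2.3, 2.9, 4.2, 4.8))

# Same model fitted twice; only the NA handling differs.
fit_omit    <- lm(y ~ x, data = dat, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = dat, na.action = na.exclude)

length(resid(fit_omit))     # 4: the incomplete row is dropped
length(resid(fit_exclude))  # 5: one entry per row of dat

# The padded vector lines up with the rows, so it can be stored directly.
dat$residual <- resid(fit_exclude)
is.na(dat$residual[3])      # TRUE
```

With na.omit the same assignment would fail, because the residual vector is shorter than the data frame.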
Solution 2:
Yet another idea is to take advantage of the row names associated with the data frame provided as input to lm. In that case, the residuals should retain the names from the source data. Accessing the residuals from your example would give a value of -5.3061303 for test$residuals["4"] and NA for test$residuals["3"].
However, this does not exactly answer your question. One approach to doing exactly what you asked for in terms of getting the NA values back into the residuals is illustrated below:
> D<-data.frame(x=c(NA,2,3,4,5,6),y=c(2.1,3.2,4.9,5,6,7),residual=NA)
> Z<-lm(y~x,data=D)
> D[names(Z$residuals),"residual"]<-Z$residuals
> D
   x   y residual
1 NA 2.1       NA
2  2 3.2    -0.28
3  3 4.9     0.55
4  4 5.0    -0.22
5  5 6.0    -0.09
6  6 7.0     0.04
If you are doing predictions based on the regression results, you may want to specify na.action=na.exclude in lm. See the help page for na.omit (?na.omit) for a discussion. Note that simply specifying na.exclude does not actually put the NA values back into the residuals component stored in the fitted object (Z$residuals); they only reappear when you go through an accessor function.
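To make that distinction concrete, here is a small sketch that reuses the data from the example above. It only illustrates the point and relies solely on base R functions (lm, resid, fitted, predict):

```r
D <- data.frame(x = c(NA, 2, 3, 4, 5, 6),
                y = c(2.1, 3.2, 4.9, 5, 6, 7))
Z <- lm(y ~ x, data = D, na.action = na.exclude)

# The stored component still holds only the rows used in the fit.
length(Z$residuals)   # 5
names(Z$residuals)    # "2" "3" "4" "5" "6"

# The accessor functions pad the result with NA for the excluded row.
length(resid(Z))      # 6
length(fitted(Z))     # 6
length(predict(Z))    # 6 (predictions for the original data, no newdata)
is.na(resid(Z)[["1"]])  # TRUE
```

So with na.exclude, anything that has to line up with the original rows should come from resid, fitted or predict rather than from the list components of the fitted object.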
As noted in a prior answer, resid (synonym for residuals) provides a generic access function in which the residuals will contain the desired NA values if na.exclude was specified in lm. Using resid is probably more general and a cleaner approach. In that case, the code for the above example would be changed to:
> D<-data.frame(x=c(NA,2,3,4,5,6),y=c(2.1,3.2,4.9,5,6,7),residual=NA)
> Z<-lm(y~x,data=D,na.action=na.exclude)
> D$residual<-residuals(Z)