How do I deal with NAs in residuals in a regression in R?

Solution 1:

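I found this after googling a bit deeper: calling resid() on an lm fitted with na.action=na.exclude is the way to go. The residuals then come back padded with NA for the rows that were dropped, so they line up with the original data.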
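A minimal sketch of that approach, assuming a small made-up data frame (the names d, fit and res are only for illustration and are not from the question):

```r
# Hypothetical data: the predictor is missing in row 1.
d <- data.frame(x = c(NA, 2, 3, 4, 5, 6),
                y = c(2.1, 3.2, 4.9, 5, 6, 7))

# na.exclude drops the incomplete row for fitting but remembers where it was.
fit <- lm(y ~ x, data = d, na.action = na.exclude)

# resid() pads the result with NA for the dropped row, so it has one
# entry per row of d and can be stored as a column directly.
d$res <- resid(fit)
print(d)
```

Because the padded vector has the same length as the data frame, no matching on row names is needed.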

Solution 2:

Another idea is to take advantage of the row names of the data frame passed to lm. The residuals retain the row names of the source rows that were actually used in the fit. Accessing the residuals from your example would give a value of -5.3061303 for test$residuals["4"] and NA for test$residuals["3"] (indexing a named vector by a name that is not present returns NA).
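Since the data from the question is not reproduced here, the following is only a sketch of the name-based lookup on made-up data; d and fit are hypothetical names, and the default na.action is assumed to be na.omit (the standard setting of options("na.action")):

```r
# Hypothetical data: row 3 has a missing predictor and is dropped by lm.
d <- data.frame(x = c(1, 2, NA, 4, 5),
                y = c(1.2, 1.9, 3.1, 4.2, 4.8))

fit <- lm(y ~ x, data = d)   # default na.action, assumed to be na.omit

names(fit$residuals)         # row names of the rows that were used
fit$residuals["4"]           # residual for row 4 of d
fit$residuals["3"]           # NA, because no element is named "3"
```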

However, this does not exactly answer your question. One way to do what you asked, that is, to get the NA values back alongside the residuals, is to assign the residuals into the data frame by row name:

> D<-data.frame(x=c(NA,2,3,4,5,6),y=c(2.1,3.2,4.9,5,6,7),residual=NA)
> Z<-lm(y~x,data=D)
> D[names(Z$residuals),"residual"]<-Z$residuals
> D
   x   y residual
1 NA 2.1       NA
2  2 3.2    -0.28
3  3 4.9     0.55
4  4 5.0    -0.22
5  5 6.0    -0.09
6  6 7.0     0.04

If you are doing predictions based on the regression results, you may want to specify na.action=na.exclude in lm. See the help page for na.omit (?na.omit) for a discussion. Note that simply specifying na.exclude does not put the NA values back into the stored residuals vector (Z$residuals) itself; the padding is applied only by accessor functions such as resid.
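The difference between the stored component and the accessor functions can be checked directly. This sketch reuses the data frame D and model name Z from the example in this answer; the fitted column added at the end is an illustrative name only:

```r
D <- data.frame(x = c(NA, 2, 3, 4, 5, 6),
                y = c(2.1, 3.2, 4.9, 5, 6, 7))
Z <- lm(y ~ x, data = D, na.action = na.exclude)

length(Z$residuals)   # 5: the stored component omits the dropped row
length(resid(Z))      # 6: the accessor pads with NA
length(fitted(Z))     # 6: fitted values are padded the same way
length(predict(Z))    # 6: so are predictions on the original data

# The padded fitted values line up with D row by row.
cbind(D, fitted = fitted(Z))
```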

As noted in a prior answer, resid (a synonym for residuals) is a generic accessor function whose result contains the desired NA values if na.exclude was specified in lm. Using resid is probably the more general and cleaner approach. In that case, the code that fills the residual column of D would be changed to:

> D<-data.frame(x=c(NA,2,3,4,5,6),y=c(2.1,3.2,4.9,5,6,7),residual=NA)
> Z<-lm(y~x,data=D,na.action=na.exclude)
> D$residual<-residuals(Z)