Last Observation Carried Forward in a data frame?
I wish to implement "Last Observation Carried Forward" (LOCF) for a data set I am working on, which has missing values at the end.
Here is simple code to do it (my question follows):
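LOCF <- function(x)
{
# Last Observation Carried Forward (for a left-to-right series)
# Only NAs after the last observation are filled; interior NAs are left as-is.
# Assumes x has at least one non-NA value.
last_obs <- max(which(!is.na(x))) # the location of the Last Observation to Carry Forward
x[last_obs:length(x)] <- x[last_obs] # copy it into every later position
return(x)
}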
# example:
LOCF(c(1,2,3,4,NA,NA))  # 1 2 3 4 4 4
LOCF(c(1,NA,3,4,NA,NA)) # 1 NA 3 4 4 4 (the interior NA is not filled)
This works well for simple vectors. But if I try to use it on a data frame:
a <- data.frame(rep("a",4), 1:4,1:4, c(1,NA,NA,NA))
a
t(apply(a, 1, LOCF)) # will make a mess
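It turns my data frame into a character matrix, because apply() first converts the data frame with as.matrix(), and the character column forces every value to a string.
Can you think of a way to do LOCF on a data.frame without turning it into a matrix? (I could use loops and such to clean up the mess, but would prefer a more elegant solution.)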
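A minimal base-R sketch, assuming the series runs down each column (which is how the answers below interpret it): instead of apply(), which goes through as.matrix(), apply the question's LOCF() to each column with lapply() and assign the result into a[]. Assigning into a[] rather than a keeps the data.frame structure and lets every column keep its own type. Like the original function, this assumes each column has at least one non-NA value.

```r
# The question's function, with the local variable renamed for clarity
LOCF <- function(x) {
  last_obs <- max(which(!is.na(x)))    # position of the last non-NA value
  x[last_obs:length(x)] <- x[last_obs] # copy it into every later position
  x
}

a <- data.frame(rep("a", 4), 1:4, 1:4, c(1, NA, NA, NA))

# apply() would call as.matrix(a) first; with a character column present,
# the whole matrix becomes character
is.character(as.matrix(a))  # TRUE

# Column-wise alternative: lapply() returns a list of columns, and assigning
# into a[] keeps the data.frame and each column's original type
a[] <- lapply(a, LOCF)
str(a)
```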
Solution 1:
This already exists in the zoo package:
library(zoo)
na.locf(data.frame(rep("a",4), 1:4,1:4, c(1,NA,NA,NA)))
Solution 2:
If you do not want to load a whole package like zoo just for na.locf(), here is a short solution that also handles leading NAs in the input vector (they stay NA):
na.locf <- function(x) {
v <- !is.na(x)             # TRUE where a value is observed
# cumsum(v) counts the observations seen so far; +1 skips the leading NA slot
c(NA, x[v])[cumsum(v)+1]
}
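To use this on a data frame without going through a matrix, apply it to each column with lapply() and assign into df[]. One caveat with the version above: c() dispatches on its first argument, here a plain NA, so a factor column comes back as integer codes and a Date column as plain numbers. The sketch below uses a hypothetical variant, na.locf2(), that computes the position of the last observation and subsets x itself, so the column's class survives. The example data frame is made up for illustration.

```r
# Hypothetical variant of the function above: instead of building c(NA, x[v]),
# compute positions and index x directly, so factors and Dates keep their class
na.locf2 <- function(x) {
  v <- !is.na(x)
  # position of the most recent non-NA value for each element;
  # NA_integer_ for leading NAs, which makes x[idx] return NA there
  idx <- c(NA_integer_, which(v))[cumsum(v) + 1]
  x[idx]
}

df <- data.frame(
  g = factor(c("x", NA, "y", NA)),
  d = as.Date(c(NA, "2020-01-01", NA, "2020-03-01")),
  n = c(NA, 2, NA, NA)
)

# Fill every column while keeping the data.frame and each column's class
df[] <- lapply(df, na.locf2)
df
```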
Solution 3:
Adding the newer tidyr::fill() function, which carries the last observation in a column forward to fill in NAs:
a <- data.frame(col1 = rep("a",4), col2 = 1:4,
col3 = 1:4, col4 = c(1,NA,NA,NA))
a
# col1 col2 col3 col4
# 1 a 1 1 1
# 2 a 2 2 NA
# 3 a 3 3 NA
# 4 a 4 4 NA
library(tidyr) # provides fill() and re-exports the %>% pipe
a %>% fill(col4)
# col1 col2 col3 col4
# 1 a 1 1 1
# 2 a 2 2 1
# 3 a 3 3 1
# 4 a 4 4 1
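For comparison, here is a base-R sketch of what fill(col4) does with its default .direction = "down", restricted to chosen columns. fill_down() is a hypothetical helper, not part of tidyr or base R; it uses an index-based variant of the cumsum() trick from Solution 2 so each filled column keeps its class.

```r
# Hypothetical base-R helper mirroring tidyr::fill(.direction = "down")
# for a chosen set of columns; not part of any package
fill_down <- function(df, cols) {
  locf <- function(x) {
    v <- !is.na(x)
    # index of the most recent observation (NA for leading NAs)
    x[c(NA_integer_, which(v))[cumsum(v) + 1]]
  }
  # only the selected columns are touched; the rest are returned unchanged
  df[cols] <- lapply(df[cols], locf)
  df
}

a <- data.frame(col1 = rep("a", 4), col2 = 1:4,
                col3 = 1:4, col4 = c(1, NA, NA, NA))
fill_down(a, "col4")
```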