Use a value from the previous row in an R data.table calculation

I want to create a new column in a data.table calculated from the current value of one column and the previous of another. Is it possible to access previous rows?

E.g.:

> DT <- data.table(A=1:5, B=1:5*10, C=1:5*100)
> DT
   A  B   C
1: 1 10 100
2: 2 20 200
3: 3 30 300
4: 4 40 400
5: 5 50 500
> DT[, D := C + BPreviousRow] # What is the correct code here?

The correct answer should be

> DT
   A  B   C   D
1: 1 10 100  NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540

Solution 1:

With shift() implemented in v1.9.6, this is quite straightforward.

DT[ , D := C + shift(B, 1L, type="lag")]
# or equivalently, in this case,
DT[ , D := C + shift(B)]

From NEWS:

  1. New function shift() implements fast lead/lag of vector, list, data.frames or data.tables. It takes a type argument which can be either "lag" (default) or "lead". It enables very convenient usage along with := or set(). For example: DT[, (cols) := shift(.SD, 1L), by=id]. Please have a look at ?shift for more info.

See history for previous answers.

Solution 2:

Using dplyr you could do:

mutate(DT, D = lag(B) + C)

Which gives:

#   A  B   C   D
#1: 1 10 100  NA
#2: 2 20 200 210
#3: 3 30 300 320
#4: 4 40 400 430
#5: 5 50 500 540

Solution 3:

Several folks have answered the specific question. See the code below for a general purpose function that I use in situations like this that may be helpful. Rather than just getting the prior row, you can go as many rows in the "past" or "future" as you'd like.

rowShift <- function(x, shiftLen = 1L) {
  r <- (1L + shiftLen):(length(x) + shiftLen)
  r[r<1] <- NA
  return(x[r])
}

# Create column D by adding column C and the value from the previous row of column B:
DT[, D := C + rowShift(B,-1)]

# Get the Old Faithul eruption length from two events ago, and three events in the future:
as.data.table(faithful)[1:5,list(eruptLengthCurrent=eruptions,
                                 eruptLengthTwoPrior=rowShift(eruptions,-2), 
                                 eruptLengthThreeFuture=rowShift(eruptions,3))]
##   eruptLengthCurrent eruptLengthTwoPrior eruptLengthThreeFuture
##1:              3.600                  NA                  2.283
##2:              1.800                  NA                  4.533
##3:              3.333               3.600                     NA
##4:              2.283               1.800                     NA
##5:              4.533               3.333                     NA

Solution 4:

Based on @Steve Lianoglou 's comment above, why not just:

DT[, D:= C + c(NA, B[.I - 1]) ]
#    A  B   C   D
# 1: 1 10 100  NA
# 2: 2 20 200 210
# 3: 3 30 300 320
# 4: 4 40 400 430
# 5: 5 50 500 540

And avoid using seq_len or head or any other function.