Use a value from the previous row in an R data.table calculation
I want to create a new column in a data.table calculated from the current value of one column and the previous of another. Is it possible to access previous rows?
E.g.:
> DT <- data.table(A=1:5, B=1:5*10, C=1:5*100)
> DT
A B C
1: 1 10 100
2: 2 20 200
3: 3 30 300
4: 4 40 400
5: 5 50 500
> DT[, D := C + BPreviousRow] # What is the correct code here?
The correct answer should be
> DT
A B C D
1: 1 10 100 NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540
Solution 1:
With shift()
implemented in v1.9.6, this is quite straightforward.
DT[ , D := C + shift(B, 1L, type="lag")]
# or equivalently, in this case,
DT[ , D := C + shift(B)]
From NEWS:
- New function
shift()
implements fastlead/lag
of vector, list, data.frames or data.tables. It takes atype
argument which can be either "lag" (default) or "lead". It enables very convenient usage along with:=
orset()
. For example:DT[, (cols) := shift(.SD, 1L), by=id]
. Please have a look at?shift
for more info.
See history for previous answers.
Solution 2:
Using dplyr
you could do:
mutate(DT, D = lag(B) + C)
Which gives:
# A B C D
#1: 1 10 100 NA
#2: 2 20 200 210
#3: 3 30 300 320
#4: 4 40 400 430
#5: 5 50 500 540
Solution 3:
Several folks have answered the specific question. See the code below for a general purpose function that I use in situations like this that may be helpful. Rather than just getting the prior row, you can go as many rows in the "past" or "future" as you'd like.
rowShift <- function(x, shiftLen = 1L) {
r <- (1L + shiftLen):(length(x) + shiftLen)
r[r<1] <- NA
return(x[r])
}
# Create column D by adding column C and the value from the previous row of column B:
DT[, D := C + rowShift(B,-1)]
# Get the Old Faithul eruption length from two events ago, and three events in the future:
as.data.table(faithful)[1:5,list(eruptLengthCurrent=eruptions,
eruptLengthTwoPrior=rowShift(eruptions,-2),
eruptLengthThreeFuture=rowShift(eruptions,3))]
## eruptLengthCurrent eruptLengthTwoPrior eruptLengthThreeFuture
##1: 3.600 NA 2.283
##2: 1.800 NA 4.533
##3: 3.333 3.600 NA
##4: 2.283 1.800 NA
##5: 4.533 3.333 NA
Solution 4:
Based on @Steve Lianoglou 's comment above, why not just:
DT[, D:= C + c(NA, B[.I - 1]) ]
# A B C D
# 1: 1 10 100 NA
# 2: 2 20 200 210
# 3: 3 30 300 320
# 4: 4 40 400 430
# 5: 5 50 500 540
And avoid using seq_len
or head
or any other function.