Is there a _fast_ way to run a rolling regression inside data.table?
Not as far as I know; data.table doesn't have any special features for rolling windows. Other packages already implement rolling functionality on vectors, so they can be used in the j of data.table. If they are not efficient enough, and no package has a faster version (?), then it's a case of writing faster versions yourself and (of course) contributing them, either to an existing package or by creating your own.
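As a minimal sketch of calling another package's rolling function in j, the example below runs a per-group rolling regression with zoo::rollapplyr and lm.fit. The toy data, the column names (grp, x, y), the window width of 10 and the helper roll_coef are made up for illustration, so adapt them to your own table.

```r
library(data.table)
library(zoo)

set.seed(1)
# Hypothetical toy data: two groups, y is roughly 2 + 3 * x
DT <- data.table(
  grp = rep(c("a", "b"), each = 30),
  x   = rep(1:30, times = 2)
)
DT[, y := 2 + 3 * x + rnorm(.N, sd = 0.1)]

# Fit y ~ 1 + x on one window; m is a matrix with y in column 1 and x in column 2
roll_coef <- function(m) coef(lm.fit(x = cbind(1, m[, 2]), y = m[, 1]))

# The rolling function is called inside j, once per group
res <- DT[, {
  co <- rollapplyr(as.matrix(.SD), width = 10, FUN = roll_coef,
                   by.column = FALSE)
  list(intercept = co[, 1], slope = co[, 2])
}, by = grp, .SDcols = c("y", "x")]

head(res)
```

Each group contributes one row per complete window (30 - 10 + 1 = 21 here). This is convenient rather than fast: every window is refitted from scratch.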
Related questions (follow the links within them too):
Using data.table to speed up rollapply
R data.table sliding window
Rolling regression over multiple columns in R
You can get roughly 19 times faster than the lm.fit-based version (14585 / 766, comparing the mean timings in the benchmark at the end) with the roll_regres function from the rollRegres package:
require(zoo)
require(data.table)
require(microbenchmark)
set.seed(1)
tt <- seq(as.Date("2011-01-01"), as.Date("2012-01-01"), by="day")
px <- rnorm(366, 95, 1)
DT <- data.table(period=tt, pvec=px)
dtt <- DT[,tnum:=as.numeric(period)][, list(pvec, tnum)]
# this is a badly conditioned problem, as tnum and its square are highly correlated
cor(dtt$tnum, dtt$tnum^2)
#R [1] 0.9999951
# so we center it to avoid numerical issues in the comparisons
dtt$tnum <- dtt$tnum - mean(dtt$tnum)
cor(dtt$tnum, dtt$tnum^2)
#R [1] -2.355659e-21
# design matrix for lm.fit, built from the centered dtt so all methods fit the same data
dtx <- as.matrix(dtt[, list(pvec, int = 1, tnum, tnum2 = tnum^2)])
rollreg <- function(dd)
  coef(lm(pvec ~ tnum + I(tnum^2), data = as.data.frame(dd)))
rollreg.fit <- function(dd) coef(lm.fit(y = dd[, 1], x = dd[, -1]))
# rollapplyr already aligns the window to the right
rr <- function(dd) rollapplyr(
  dd, width = 20, FUN = rollreg, by.column = FALSE)
rr.fit <- function(dd) rollapplyr(
  dd, width = 20, FUN = rollreg.fit, by.column = FALSE)
#####
# use rollRegres
library(rollRegres)
rollreg_out <- rr(dtt)
rollRegres_out <- roll_regres(pvec ~ tnum + I(tnum^2), dtt, width = 20L)
# show that they give the same coefficients
all.equal(rollRegres_out$coefs[-(1:19), ], rollreg_out,
check.attributes = FALSE)
#R [1] "Mean relative difference: 4.985435e-08"
#####
# benchmark
microbenchmark(
  rr = rr(dtt),
  rr.fit = rr.fit(dtx),
  roll_regres = roll_regres(pvec ~ tnum + I(tnum^2), dtt, width = 20L),
  times = 5)
#R Unit: microseconds
#R         expr        min         lq        mean     median         uq       max neval
#R           rr 279404.357 279456.901 282071.3414 279989.840 282201.396 289304.21     5
#R       rr.fit  13744.598  14017.981  14585.2106  14147.166  14887.117  16129.19     5
#R  roll_regres    621.037    660.939    766.7364    721.383    843.853    986.47     5
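The centering step in the code above matters because the raw day numbers make the design matrix close to collinear. A small sketch of that effect, using the same date range and base R's kappa() as one possible way to measure conditioning (the variable names are only illustrative):

```r
# Same date range as above, as days since 1970-01-01
tnum   <- as.numeric(seq(as.Date("2011-01-01"), as.Date("2012-01-01"), by = "day"))
tnum_c <- tnum - mean(tnum)

# Design matrices for pvec ~ tnum + I(tnum^2), before and after centering
X_raw <- cbind(1, tnum, tnum^2)
X_cen <- cbind(1, tnum_c, tnum_c^2)

# Correlation between the linear and the squared term
cor_raw <- cor(tnum, tnum^2)
cor_cen <- cor(tnum_c, tnum_c^2)

# Condition numbers (ratio of largest to smallest singular value)
kappa_raw <- kappa(X_raw, exact = TRUE)
kappa_cen <- kappa(X_cen, exact = TRUE)

c(cor_raw = cor_raw, cor_cen = cor_cen)
c(kappa_raw = kappa_raw, kappa_cen = kappa_cen)
```

The centered matrix has a much smaller condition number, which is why the coefficient comparison with all.equal is done on the centered data.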