What is the right way to multiply data frame by vector?

Solution 1:

This works too:

data.frame(mapply(`*`,df,v))

In that solution, you are taking advantage of the fact that data.frame is a type of list, so you can iterate over both the elements of df and v at the same time with mapply.

Unfortunately, you are limited in what you can output from mapply: as simple list, or a matrix. If your data are huge, this would likely be more efficient:

data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE))

Because it would convert it to a list, which is more efficient to convert to a data.frame.

Solution 2:

If you're looking for speed and memory efficiency - data.table to the rescue:

library(data.table)
dt = data.table(df)

for (i in seq_along(dt))
  dt[, (i) := dt[[i]] * v[i]]


eddi = function(dt) { for (i in seq_along(dt)) dt[, (i) := dt[[i]] * v[i]] }
arun = function(df) { df * matrix(v, ncol=ncol(df), nrow=nrow(df), byrow=TRUE) }
nograpes = function(df) { data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE)) }

N = 1e6
dt = data.table(A = rnorm(N), B = rnorm(N))
v = c(0,2)

microbenchmark(eddi(copy(dt)), arun(copy(dt)), nograpes(copy(dt)), times = 10)
#Unit: milliseconds
#               expr       min        lq      mean    median        uq       max neval
#     eddi(copy(dt))  23.01106  24.31192  26.47132  24.50675  28.87794  34.28403    10
#     arun(copy(dt)) 337.79885 363.72081 450.93933 433.21176 516.56839 644.70103    10
# nograpes(copy(dt))  19.44873  24.30791  36.53445  26.00760  38.09078  95.41124    10

As Arun points out in the comments, one can also use the set function from the data.table package to do this in-place modification on data.frame's as well:

for (i in seq_along(df))
  set(df, j = i, value = df[[i]] * v[i])

This of course also works for data.table's and could be significantly faster if the number of columns is large.