Summarizing multiple columns with data.table
You can use a simple lapply
statement with .SD
dt[, lapply(.SD, sum, na.rm=TRUE), by=category ]
category index a b z c d
1: c 19 51.13289 48.49994 42.50884 9.535588 11.53253
2: b 9 17.34860 20.35022 10.32514 11.764105 10.53127
3: a 27 25.91616 31.12624 0.00000 29.197343 31.71285
If you only want to summarize over certain columns, you can add the .SDcols
argument
# note that .SDcols also allows reordering of the columns
dt[, lapply(.SD, sum, na.rm=TRUE), by=category, .SDcols=c("a", "c", "z") ]
category a c z
1: c 51.13289 9.535588 42.50884
2: b 17.34860 11.764105 10.32514
3: a 25.91616 29.197343 0.00000
This of course, is not limited to sum
and you can use any function with lapply
, including anonymous functions. (ie, it's a regular lapply
statement).
Lastly, there is no need to use i=T
and j= <..>
. Personally, I think that makes the code less readable, but it is just a style preference.
Documentation
See ?.SD
, ?data.table
and its .SDcols
argument, and the vignette Using .SD for Data Analysis.
Also have a look at data.table
FAQ 2.1.