How to extract the first n rows per group?
Yes, just use .SD and index it as needed:
DT[, .SD[1:2], by=date]
         date age   name
1: 2000-01-01   3 Andrew
2: 2000-01-01   4    Ben
3: 2000-01-02   6   Adam
4: 2000-01-02   7    Bob
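One caveat with .SD[1:2]: if a group has fewer rows than requested, the missing positions come back as rows of NA. A minimal sketch of a variant that avoids this with head() follows. The sample data is hypothetical (the question's full table is not shown here), and n is just an illustrative variable name.

```r
library(data.table)

# Hypothetical sample data; only the first two rows of the first two dates
# correspond to the output shown above.
DT <- data.table(
  date = as.Date(c("2000-01-01", "2000-01-01", "2000-01-01",
                   "2000-01-02", "2000-01-02", "2000-01-03")),
  age  = c(3, 4, 5, 6, 7, 9),
  name = c("Andrew", "Ben", "Carl", "Adam", "Bob", "Zoe")
)

n <- 2L

# head() stops at the end of the group, so short groups are not padded.
first_n <- DT[, head(.SD, n), by = date]

# For comparison: plain indexing pads the one-row group with an NA row.
padded <- DT[, .SD[1:n], by = date]
```

Here first_n keeps the single row for 2000-01-03 as-is, while padded gains an extra all-NA row for that date.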
Edited as per @eddi's suggestion, which is spot on.
Use this instead, for speed:
DT[DT[, .I[1:2], by = date]$V1]
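To unpack the line above: .I holds the row numbers of the current group within the whole table, so the inner call collects the wanted row numbers per group (in a column that data.table names V1 by default) and the outer call subsets DT once. A rough sketch, using hypothetical sample data and head() so that groups shorter than n do not produce NA indices:

```r
library(data.table)

# Hypothetical sample data, not the table from the question.
DT <- data.table(
  date = as.Date(c("2000-01-01", "2000-01-01", "2000-01-01",
                   "2000-01-02", "2000-01-02", "2000-01-03")),
  age  = c(3, 4, 5, 6, 7, 9),
  name = c("Andrew", "Ben", "Carl", "Adam", "Bob", "Zoe")
)

n <- 2L

# Step 1: row numbers (relative to DT) of the first n rows in each group.
idx <- DT[, head(.I, n), by = date]$V1

# Step 2: a single subset of the original table by those row numbers.
first_n <- DT[idx]
```

Splitting it into two steps also makes it easy to inspect idx when the result looks wrong.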
# using a slightly larger data set
> microbenchmark(SDstyle=DT[, .SD[1:2], by=date], IStyle=DT[DT[, .I[1:2], by = date]$V1], times=200L)
Unit: milliseconds
    expr       min        lq    median        uq      max neval
 SDstyle 13.567070 16.224797 22.170302 24.239881 88.26719   200
  IStyle  1.675185  2.018773  2.168818  2.269292 11.31072   200
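The larger data set used for these timings is not shown, so the following is only a sketch of how a similar comparison could be set up. The table size, column contents, and times = 20L are all assumptions, and the microbenchmark call is skipped if that package is not installed.

```r
library(data.table)

# Hypothetical data: 1,000 dates with 100 rows each (not the data set timed above).
set.seed(1)
DT <- data.table(
  date = rep(as.Date("2000-01-01") + 0:999, each = 100),
  age  = sample(1:90, 1e5, replace = TRUE),
  name = sample(LETTERS, 1e5, replace = TRUE)
)

sd_style <- DT[, .SD[1:2], by = date]
i_style  <- DT[DT[, .I[1:2], by = date]$V1]

# Both approaches should return the same rows when every group has at least two rows.
same_result <- isTRUE(all.equal(sd_style, i_style))

# Timings depend on machine and package versions, so none are quoted here.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    SDstyle = DT[, .SD[1:2], by = date],
    IStyle  = DT[DT[, .I[1:2], by = date]$V1],
    times = 20L
  ))
}
```

Checking same_result before timing guards against benchmarking two expressions that do not actually compute the same thing.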
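This is probably not the fastest method, but it offers some flexibility if you don't use keyed variables. Row.ID is the rank of age within each date, so the rows returned are those with the lowest ages rather than the first rows in stored order. Changing the selected Row.ID values
adjusts how many rows are returned per group.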
dt[, .(age, name,
       # ties.method = "first" keeps ranks integer, so tied ages still match 1:2
       Row.ID = rank(age, ties.method = "first")),
   by = date][Row.ID %in% 1:2, .(date, age, name)]
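If the goal is the first rows by position rather than by age, the same two-step pattern works with a positional counter in place of rank(). A small sketch, again on hypothetical sample data, with n as an illustrative variable:

```r
library(data.table)

# Hypothetical sample data, not the table from the question.
dt <- data.table(
  date = as.Date(c("2000-01-01", "2000-01-01", "2000-01-01",
                   "2000-01-02", "2000-01-02", "2000-01-03")),
  age  = c(3, 4, 5, 6, 7, 9),
  name = c("Andrew", "Ben", "Carl", "Adam", "Bob", "Zoe")
)

n <- 2L

# seq_len(.N) numbers the rows of each group in their stored order.
first_n <- dt[, .(age, name, Row.ID = seq_len(.N)), by = date][
  Row.ID <= n, .(date, age, name)]
```

Because the filter is Row.ID <= n, groups with fewer than n rows are returned in full without padding.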