How to select last N observation from each group in dplyr dataframe?
Given a dataframe:
df <- structure(list(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4), b = c(34,
343, 54, 11, 55, 62, 59, -9, 0, -0.5)), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))
I want to take last N observations / rows from each group:
df %>%
dplyr::group_by(a) %>%
dplyr::last(2)
Gives me wrong results.
I want it to be:
a b
1 343
1 54
2 55
2 62
3 59
3 -9
4 0
4 -0.5
Please advise what is wrong here?
The error I get is:
Error in order(order_by)[[n]] : subscript out of bounds
Solution 1:
As it is a specific question based on dplyr
1) after the group_by
, use slice
on the row_number()
library(tidyverse)
df %>%
group_by(a) %>%
slice(tail(row_number(), 2))
# A tibble: 8 x 2
# Groups: a [4]
# a b
# <dbl> <dbl>
#1 1 343
#2 1 54
#3 2 55
#4 2 62
#5 3 59
#6 3 -9
#7 4 0
#8 4 -0.5
2) Or use filter
from dplyr
df %>%
group_by(a) %>%
filter(row_number() >= (n() - 1))
3) or with do
and tail
df %>%
group_by(a) %>%
do(tail(., 2))
4) In addition to the tidyverse
, methods, we can also use compact data.table
library(data.table)
setDT(df)[df[, .I[tail(seq_len(.N), 2)], a]$V1]
5) Or by
from base R
by(df, df$a, FUN = tail, 2)
6) or with aggregate
from base R
df[aggregate(c ~ a, transform(df, c = seq_len(nrow(df))), FUN = tail, 2)$c,]
7) or with split
from base R
do.call(rbind, lapply(split(df, df$a), tail, 2))
Solution 2:
A base R option using tapply
is to subset the last two rows for every group.
df[unlist(tapply(1:nrow(df), df$a, tail, 2)), ]
# a b
# <dbl> <dbl>
#1 1 343
#2 1 54
#3 2 55
#4 2 62
#5 3 59
#6 3 -9
#7 4 0
#8 4 -0.5
Or another option using ave
df[as.logical(with(df, ave(1:nrow(df), a, FUN = function(x) x %in% tail(x, 2)))), ]
Solution 3:
Also a tidyverse
possibility:
df %>%
group_by(a) %>%
top_n(2, row_number())
a b
<dbl> <dbl>
1 1. 343.
2 1. 54.0
3 2. 55.0
4 2. 62.0
5 3. 59.0
6 3. -9.00
7 4. 0.
8 4. -0.500
It is taking the top two rows given the row numbers per groups.
Solution 4:
Try tail()
.In R head
function allows you to preview the first n rows, while tail
allows you to preview last n rows.