filtering data.frame based on row_number()
UPDATE: dplyr has been updated since this question was asked and now performs as the OP wanted
I'm trying to get the second to the seventh rows of a data.frame using dplyr.
I'm doing this:
library(dplyr)
df <- data.frame(id = 1:10, var = runif(10))
df <- df %>% filter(row_number() <= 7, row_number() >= 2)
But this throws an error.
Error in rank(x, ties.method = "first") :
argument "x" is missing, with no default
I know I could easily do:
df <- df %>% mutate(rn = row_number()) %>% filter(rn <= 7, rn >= 2)
But I would like to understand why my first try is not working.
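The rn workaround works because it stores each row's position as an ordinary column before filtering. Here is a minimal base-R sketch of the same idea, assuming no dplyr at all. The set.seed() call is an addition, used only to make the random column reproducible, and rn is simply the column name from the line above.

```r
# Same data as in the question; the seed is only for reproducibility
set.seed(1)
df <- data.frame(id = 1:10, var = runif(10))

# Store each row's position explicitly, as mutate(rn = row_number()) would
df$rn <- seq_len(nrow(df))

# Keep rows 2 through 7 by filtering on the stored position
res <- df[df$rn >= 2 & df$rn <= 7, ]
res
```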
Actually, dplyr's slice() function is made for this kind of subsetting:
df %>% slice(2:7)
(I'm a little late to the party but thought I'd add this for future readers)
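A sketch comparing the two approaches, assuming a reasonably recent dplyr installation. Per the update at the top, row_number() with no argument now returns row positions, so the question's filter() call and slice() should select the same rows.

```r
library(dplyr)

set.seed(1)
df <- data.frame(id = 1:10, var = runif(10))

# Positional subsetting with slice()
by_slice <- df %>% slice(2:7)

# In current dplyr, row_number() with no argument gives row positions,
# so the original filter() approach also works
by_filter <- df %>% filter(row_number() >= 2, row_number() <= 7)
```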
The row_number() function does not simply return the row number of each element, so it can't be used the way you want:
• ‘row_number’: equivalent to ‘rank(ties.method = "first")’
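To make that documentation line concrete, here is base R's rank() with ties.method = "first" on a small made-up vector. It returns the rank of each value, with ties broken by order of appearance, and not the position of each element.

```r
x <- c(3, 1, 2, 1)

# Ranks by value; the two 1s are tied and ranked in order of appearance
r <- rank(x, ties.method = "first")
r
#> [1] 4 1 3 2

# Plain positions, which is what "row number" suggests
seq_along(x)
#> [1] 1 2 3 4
```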
You're not actually saying what you want the row_number of. In your case:
df %>% filter(row_number(id) <= 7, row_number(id) >= 2)
works because id is sorted, and so row_number(id) is 1:10. I don't know what row_number() evaluates to in this context, but when it is called a second time dplyr has run out of things to feed it, and you get the equivalent of:
> row_number()
Error in rank(x, ties.method = "first") :
argument "x" is missing, with no default
That's your error right there.
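The row_number(id) trick only matches positions because id is already 1:10. The following sketch uses a hypothetical unsorted column and base rank() as a stand-in for row_number(). With an unsorted column, the filter picks rows by value order rather than by position.

```r
df2 <- data.frame(id = c(5, 2, 9, 1, 7))

# Rank-based "row numbers" of an unsorted column are not positions
rn_by_value <- rank(df2$id, ties.method = "first")
rn_by_value
#> [1] 3 2 5 1 4

# Ranks 2..4 keep the 2nd to 4th smallest ids, wherever they sit
by_value <- df2[rn_by_value >= 2 & rn_by_value <= 4, , drop = FALSE]

# Positions 2..4 keep the rows physically in that range
by_position <- df2[2:4, , drop = FALSE]
```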
Anyway, that's not the way to select rows.
You simply need to subscript with df[2:7,], or, if you insist on pipes everywhere:
> df %>% "["(.,2:7,)
id var
2 2 0.52352994
3 3 0.02994982
4 4 0.90074801
5 5 0.68935493
6 6 0.57012344
7 7 0.01489950
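The odd-looking "["(.,2:7,) works because [ is an ordinary function in R. That means df[2:7,] can be written as a function call, which is the form a pipe needs. Here is a quick base-R check with no pipe involved. The seed is only there for reproducibility.

```r
set.seed(1)
df <- data.frame(id = 1:10, var = runif(10))

# Operator form and function-call form of the same subset;
# the empty third argument stands for "all columns"
a <- df[2:7, ]
b <- `[`(df, 2:7, )

identical(a, b)
#> [1] TRUE
```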
Here is another way to do row-number based filtering in a pipeline.
df <- data.frame(id = 1:10, var = runif(10))
df %>% .[2:7,]
id var
2 2 0.28817
3 3 0.56672
4 4 0.96610
5 5 0.74772
6 6 0.75091
7 7 0.05165