Extract rows for the first occurrence of a variable in a data frame
I have a data frame with two variables, Date and Taxa, and want to get the date on which each taxon first occurs. The data frame has 172 rows covering 9 different dates and 40 different taxa, so my answer should have only 40 rows.
Taxa is a factor and Date is a date.
For example, my data frame (called 'species') is set up like this:
Date Taxa
2013-07-12 A
2011-08-31 B
2012-09-06 C
2012-05-17 A
2013-07-12 C
2012-09-07 B
and I would be looking for an answer like this:
Date Taxa
2012-05-17 A
2011-08-31 B
2012-09-06 C
I tried using:
t.first <- species[unique(species$Taxa),]
and it gave me the correct number of rows but there were Taxa repeated. If I just use unique(species$Taxa) it appears to give me the right answer, but then I don't know the date when it first occurred.
Thanks for any help.
Solution 1:
t.first <- species[match(unique(species$Taxa), species$Taxa),]
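should give you what you're looking for. match returns the index of the first match for each value, which gives you the rows you need. Note that "first" here means first in row order, so sort species by Date beforehand if the rows are not already in date order.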
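As a minimal sketch on the sample data from the question, the match approach can be combined with a sort on Date so that the first row per taxon is also its earliest record. The names species_sorted and first_idx are introduced here only for illustration.

```r
# Sample data from the question (rows are deliberately not in date order)
species <- data.frame(
  Date = as.Date(c("2013-07-12", "2011-08-31", "2012-09-06",
                   "2012-05-17", "2013-07-12", "2012-09-07")),
  Taxa = factor(c("A", "B", "C", "A", "C", "B"))
)

# Sort by Date so that each taxon's earliest record comes first
species_sorted <- species[order(species$Date), ]

# match() gives, for each unique taxon, the index of its first row
first_idx <- match(unique(species_sorted$Taxa), species_sorted$Taxa)
t.first <- species_sorted[first_idx, ]
```

Without the sort, the same call would return the first row per taxon as it appears in the data frame, which for taxon A in this sample is 2013-07-12 rather than 2012-05-17.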
Solution 2:
In the following command, duplicated creates a logical index that is TRUE for every data$Taxa value that has already appeared in an earlier row. A subset of the data frame without those rows, keeping only the first row for each taxon, is created with:
data[!duplicated(data$Taxa), ]
The result, assuming data is already ordered so that each taxon's earliest date comes first:
Date Taxa
1 2012-05-17 A
2 2011-08-31 B
3 2012-09-06 C
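To see what duplicated returns and why row order matters, here is a small sketch on the question's sample data. The object names dup, by_date, and first_rows are illustrative and not part of the original answer.

```r
# Sample data from the question (rows are deliberately not in date order)
species <- data.frame(
  Date = as.Date(c("2013-07-12", "2011-08-31", "2012-09-06",
                   "2012-05-17", "2013-07-12", "2012-09-07")),
  Taxa = factor(c("A", "B", "C", "A", "C", "B"))
)

# duplicated() flags every value that has already appeared earlier in the vector
dup <- duplicated(species$Taxa)
# dup is FALSE FALSE FALSE TRUE TRUE TRUE

# Order by Date first so that the row kept per taxon is its earliest record
by_date <- species[order(species$Date), ]
first_rows <- by_date[!duplicated(by_date$Taxa), ]
```

Because duplicated only looks at position, the ordering step is what turns "first row" into "earliest date".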
Solution 3:
Here is a dplyr option that does not depend on the data being sorted in date order and accounts for ties:
library(dplyr)
df %>%
  mutate(Date = as.Date(Date)) %>%
  group_by(Taxa) %>%
  filter(Date == min(Date)) %>%
  slice(1) %>% # takes the first occurrence if there is a tie
  ungroup()
# A tibble: 3 x 2
Date Taxa
<date> <chr>
1 2012-05-17 A
2 2011-08-31 B
3 2012-09-06 C
# sample data:
df <- read.table(text = 'Date Taxa
2013-07-12 A
2011-08-31 B
2012-09-06 C
2012-05-17 A
2013-07-12 C
2012-09-07 B', header = TRUE, stringsAsFactors = FALSE)
And you could get the same by sorting by date as well:
df %>%
  mutate(Date = as.Date(Date)) %>%
  group_by(Taxa) %>%
  arrange(Date) %>%
  slice(1) %>%
  ungroup()
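A more compact variant of the same idea, assuming dplyr 1.0.0 or later is installed, is slice_min, which picks the row with the smallest Date in each group. This is a sketch rather than part of the original answer, and first_seen is an illustrative name.

```r
library(dplyr)

# Sample data from the question, with Date read in as character
df <- data.frame(
  Date = c("2013-07-12", "2011-08-31", "2012-09-06",
           "2012-05-17", "2013-07-12", "2012-09-07"),
  Taxa = c("A", "B", "C", "A", "C", "B"),
  stringsAsFactors = FALSE
)

first_seen <- df %>%
  mutate(Date = as.Date(Date)) %>%
  group_by(Taxa) %>%
  # with_ties = FALSE keeps a single row per taxon even if dates tie
  slice_min(Date, n = 1, with_ties = FALSE) %>%
  ungroup()
```

Setting with_ties = FALSE plays the same role as the slice(1) step after filter in the first pipeline.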