R create ID within a group
Solution 1:
There are several ways.
In base R, use ave:
with(df, ave(rep(1, nrow(df)), IDFAM, FUN = seq_along))
# [1] 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 2 3 4 5 1
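To see why ave works here, a small self-contained sketch may help. The toy data frame below is made up for illustration (it is not the data from the question). seq_along is applied once per IDFAM group, and ave writes each result back into the original row positions, so the groups do not need to be sorted or contiguous.

```r
# Hypothetical toy data, not the data from the question.
toy <- data.frame(IDFAM = c("a", "b", "a", "b", "b", "c"),
                  stringsAsFactors = FALSE)

# ave() splits the first argument by IDFAM, applies FUN to each piece,
# and puts the results back in the original row order.
toy$ID <- ave(seq_along(toy$IDFAM), toy$IDFAM, FUN = seq_along)

toy$ID
# [1] 1 1 2 2 3 1
```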
With the "data.table" package, use sequence(.N):
library(data.table)
DT <- as.data.table(df)
DT[, ID := sequence(.N), by = IDFAM]
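For a version that runs on its own, here is a hedged sketch on made-up data. It uses seq_len(.N), which gives the same counter as sequence(.N) within a single group, and it also shows rowid(), which I believe is available in data.table 1.9.8 and later and computes the counter without a by clause. The column names ID and ID2 are just placeholders.

```r
library(data.table)

# Hypothetical toy data, not the data from the question.
DT <- data.table(IDFAM = c("a", "b", "a", "b", "b", "c"))

# .N is the number of rows in the current group,
# so seq_len(.N) counts 1..n within each IDFAM group.
DT[, ID := seq_len(.N), by = IDFAM]

# rowid() builds the same within-group counter without 'by'.
DT[, ID2 := rowid(IDFAM)]

DT$ID
# [1] 1 1 2 2 3 1
```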
With the "dplyr" package, try:
df %>% group_by(IDFAM) %>% mutate(count = sequence(n()))
or (as recommended by Hadley in the comments):
df %>% group_by(IDFAM) %>% mutate(count = row_number(IDFAM))
Update
Since this seems to be something that is asked for relatively frequently, this feature has been added as a function (getanID) in my "splitstackshape" package. It is based on the "data.table" approach above.
library(splitstackshape)
getanID(df, id.vars = "IDFAM")
#                 IDFAM AGED .id
#  1:  2010 7599 2996 1   45   1
#  2:  2010 7599 3071 1   47   1
#  3:  2010 7599 3071 1   24   2
#  4:  2010 7599 3660 1   46   1
#  5:  2010 7599 4736 1   46   1
#  6:  2010 7599 6235 1   44   1
#  7:  2010 7599 6299 1   43   1
#  8:  2010 7599 9903 1   43   1
#  9: 2010 7599 11013 1   43   1
# 10: 2010 7599 11778 1   16   1
# 11: 2010 7599 11778 1   43   2
# 12: 2010 7599 12248 1   46   1
# 13: 2010 7599 13127 1   44   1
# 14: 2010 7599 14261 1   47   1
# 15: 2010 7599 16280 1   43   1
# 16: 2010 7599 16280 1   16   2
# 17: 2010 7599 16280 1   20   3
# 18: 2010 7599 16280 1   18   4
# 19: 2010 7599 16280 1   18   5
# 20: 2010 7599 17382 1   43   1
Solution 2:
With dplyr 0.5 you can use the group_indices function. Although it does not support mutate, the following approach is straightforward:
df$id <- df %>% group_indices(IDFAM)
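Note that a group index and a within-group counter are different things: the first labels each IDFAM group, the second numbers the rows inside a group. The sketch below shows both side by side on made-up data. It assumes dplyr 1.0.0 or later, where cur_group_id() can be used inside mutate; the column names group_id and within_id are placeholders.

```r
library(dplyr)

# Hypothetical toy data, not the data from the question.
toy <- data.frame(IDFAM = c("a", "b", "b", "c", "c", "c"))

res <- toy %>%
  group_by(IDFAM) %>%
  mutate(group_id  = cur_group_id(),   # one value per IDFAM group
         within_id = row_number()) %>% # 1..n inside each group
  ungroup()

res$group_id
# [1] 1 2 2 3 3 3
res$within_id
# [1] 1 1 2 1 2 3
```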