R create ID within a group

Solution 1:

There are several ways.

In base R, use ave:

with(df, ave(rep(1, nrow(df)), IDFAM, FUN = seq_along))
#  [1] 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 2 3 4 5 1
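For anyone without the original data at hand, here is a minimal self-contained sketch of the same idea. The data frame below is made up for illustration (only the column names IDFAM and AGED come from the question), and it passes seq_along(IDFAM) as the first argument so the result stays an integer vector:

```r
# Hypothetical sample data; only the column names match the question
df <- data.frame(
  IDFAM = c("a", "b", "b", "c", "c", "c"),
  AGED  = c(45, 47, 24, 46, 20, 18),
  stringsAsFactors = FALSE
)

# ave() splits the first argument by IDFAM, applies seq_along() to each
# piece, and puts the results back in the original row order
df$ID <- with(df, ave(seq_along(IDFAM), IDFAM, FUN = seq_along))

df$ID
# [1] 1 1 2 1 2 3
```

Because ave() restores the original row order, this also works when the rows of one group are not contiguous.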

With the "data.table" package, use sequence(.N):

library(data.table)
DT <- as.data.table(df)
DT[, ID := sequence(.N), by = IDFAM]
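As a rough sketch on made-up data (the values below are not from the question), the same counter can also be written with seq_len(.N), or with rowid(), which I believe is available in data.table 1.9.8 and later. Note that := adds the column by reference, so DT itself is modified and no reassignment is needed:

```r
library(data.table)

# Hypothetical sample data; only the column name IDFAM matches the question
DT <- data.table(IDFAM = c("a", "b", "b", "c", "c", "c"))

# .N is the number of rows in the current group, so seq_len(.N) counts 1..N
DT[, ID := seq_len(.N), by = IDFAM]

# rowid() produces the same within-group counter without a 'by' clause
DT[, ID2 := rowid(IDFAM)]

DT$ID
# [1] 1 1 2 1 2 3
```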

With the "dplyr" package, try:

library(dplyr)
df %>% group_by(IDFAM) %>% mutate(count = sequence(n()))

or (as recommended by Hadley in the comments):

df %>% group_by(IDFAM) %>% mutate(count = row_number(IDFAM))
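A small runnable sketch of the dplyr approach, on made-up data (only the column names come from the question). It assumes a reasonably recent dplyr, where row_number() can be called with no argument inside a grouped mutate, and it adds ungroup() at the end so later operations are not silently grouped:

```r
library(dplyr)

# Hypothetical sample data; only the column names match the question
df <- data.frame(
  IDFAM = c("a", "b", "b", "c", "c", "c"),
  AGED  = c(45, 47, 24, 46, 20, 18),
  stringsAsFactors = FALSE
)

out <- df %>%
  group_by(IDFAM) %>%
  mutate(count = row_number()) %>%  # position of each row within its group
  ungroup()

out$count
# [1] 1 1 2 1 2 3
```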

Update

Since this seems to be something that is asked for relatively frequently, this feature has been added as a function (getanID) in my "splitstackshape" package. It is based on the "data.table" approach above.

library(splitstackshape)
getanID(df, id.vars = "IDFAM")
#                 IDFAM AGED .id
#  1:  2010 7599 2996 1   45   1
#  2:  2010 7599 3071 1   47   1
#  3:  2010 7599 3071 1   24   2
#  4:  2010 7599 3660 1   46   1
#  5:  2010 7599 4736 1   46   1
#  6:  2010 7599 6235 1   44   1
#  7:  2010 7599 6299 1   43   1
#  8:  2010 7599 9903 1   43   1
#  9: 2010 7599 11013 1   43   1
# 10: 2010 7599 11778 1   16   1
# 11: 2010 7599 11778 1   43   2
# 12: 2010 7599 12248 1   46   1
# 13: 2010 7599 13127 1   44   1
# 14: 2010 7599 14261 1   47   1
# 15: 2010 7599 16280 1   43   1
# 16: 2010 7599 16280 1   16   2
# 17: 2010 7599 16280 1   20   3
# 18: 2010 7599 16280 1   18   4
# 19: 2010 7599 16280 1   18   5
# 20: 2010 7599 17382 1   43   1

Solution 2:

With dplyr 0.5 you can use the group_indices function. Although it does not work inside mutate, the following approach is straightforward:

df$id <- df %>% group_indices(IDFAM)
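Note that this gives one ID per group (every row of a family gets the same number), not a counter within each group as in Solution 1. A hedged sketch on made-up data, assuming dplyr 1.0.0 or later, where cur_group_id() can be used inside mutate; the base R line is shown only as a cross-check:

```r
library(dplyr)

# Hypothetical sample data; only the column name IDFAM matches the question
df <- data.frame(
  IDFAM = c("a", "b", "b", "c", "c", "c"),
  stringsAsFactors = FALSE
)

out <- df %>%
  group_by(IDFAM) %>%
  mutate(id = cur_group_id()) %>%  # same value for every row of a group
  ungroup()

# Base R cross-check: integer codes of the sorted unique IDFAM values
base_id <- as.integer(factor(df$IDFAM))

out$id
# [1] 1 2 2 3 3 3
```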