Create a sequential number (counter) for rows within each group of a dataframe [duplicate]
How can we generate unique id numbers within each group of a dataframe? Here's some data grouped by "personid":
personid date measurement
1 x 23
1 x 32
2 y 21
3 x 23
3 z 23
3 y 23
I wish to add an id column with a unique value for each row within each subset defined by "personid", always starting with 1
. This is my desired output:
personid date measurement id
1 x 23 1
1 x 32 2
2 y 21 1
3 x 23 1
3 z 23 2
3 y 23 3
I appreciate any help.
Solution 1:
Some dplyr
alternatives, using convenience functions row_number
and n
.
library(dplyr)
df %>% group_by(personid) %>% mutate(id = row_number())
df %>% group_by(personid) %>% mutate(id = 1:n())
df %>% group_by(personid) %>% mutate(id = seq_len(n()))
df %>% group_by(personid) %>% mutate(id = seq_along(personid))
You may also use getanID
from package splitstackshape
. Note that the input dataset is returned as a data.table
.
getanID(data = df, id.vars = "personid")
# personid date measurement .id
# 1: 1 x 23 1
# 2: 1 x 32 2
# 3: 2 y 21 1
# 4: 3 x 23 1
# 5: 3 z 23 2
# 6: 3 y 23 3
Solution 2:
The misleadingly named ave()
function, with argument FUN=seq_along
, will accomplish this nicely -- even if your personid
column is not strictly ordered.
df <- read.table(text = "personid date measurement
1 x 23
1 x 32
2 y 21
3 x 23
3 z 23
3 y 23", header=TRUE)
## First with your data.frame
ave(df$personid, df$personid, FUN=seq_along)
# [1] 1 2 1 1 2 3
## Then with another, in which personid is *not* in order
df2 <- df[c(2:6, 1),]
ave(df2$personid, df2$personid, FUN=seq_along)
# [1] 1 1 1 2 3 2
Solution 3:
Using data.table
, and assuming you wish to order by date
within the personid
subset
library(data.table)
DT <- data.table(Data)
DT[,id := order(date), by = personid]
## personid date measurement id
## 1: 1 x 23 1
## 2: 1 x 32 2
## 3: 2 y 21 1
## 4: 3 x 23 1
## 5: 3 z 23 3
## 6: 3 y 23 2
If you wish do not wish to order by date
DT[, id := 1:.N, by = personid]
## personid date measurement id
## 1: 1 x 23 1
## 2: 1 x 32 2
## 3: 2 y 21 1
## 4: 3 x 23 1
## 5: 3 z 23 2
## 6: 3 y 23 3
Any of the following would also work
DT[, id := seq_along(measurement), by = personid]
DT[, id := seq_along(date), by = personid]
The equivalent commands using plyr
library(plyr)
# ordering by date
ddply(Data, .(personid), mutate, id = order(date))
# in original order
ddply(Data, .(personid), mutate, id = seq_along(date))
ddply(Data, .(personid), mutate, id = seq_along(measurement))