Create a sequential number (counter) for rows within each group of a dataframe [duplicate]

How can we generate unique id numbers within each group of a dataframe? Here's some data grouped by "personid":

personid date measurement
1         x     23
1         x     32
2         y     21
3         x     23
3         z     23
3         y     23

I wish to add an id column with a unique value for each row within each subset defined by "personid", always starting with 1. This is my desired output:

personid date measurement id
1         x     23         1
1         x     32         2
2         y     21         1
3         x     23         1
3         z     23         2
3         y     23         3

I appreciate any help.

Solution 1:

Some dplyr alternatives, using convenience functions row_number and n.

library(dplyr)
df %>% group_by(personid) %>% mutate(id = row_number())
df %>% group_by(personid) %>% mutate(id = 1:n())
df %>% group_by(personid) %>% mutate(id = seq_len(n()))
df %>% group_by(personid) %>% mutate(id = seq_along(personid))

You may also use getanID from package splitstackshape. Note that the input dataset is returned as a data.table.

getanID(data = df, id.vars = "personid")
#    personid date measurement .id
# 1:        1    x          23   1
# 2:        1    x          32   2
# 3:        2    y          21   1
# 4:        3    x          23   1
# 5:        3    z          23   2
# 6:        3    y          23   3

Solution 2:

The misleadingly named ave() function, with argument FUN=seq_along, will accomplish this nicely -- even if your personid column is not strictly ordered.

df <- read.table(text = "personid date measurement
1         x     23
1         x     32
2         y     21
3         x     23
3         z     23
3         y     23", header=TRUE)

## First with your data.frame
ave(df$personid, df$personid, FUN=seq_along)
# [1] 1 2 1 1 2 3

## Then with another, in which personid is *not* in order
df2 <- df[c(2:6, 1),]
ave(df2$personid, df2$personid, FUN=seq_along)
# [1] 1 1 1 2 3 2

Solution 3:

Using data.table, and assuming you wish to order by date within the personid subset

library(data.table)
DT <- data.table(Data)

DT[,id := order(date), by  = personid]

##    personid date measurement id
## 1:        1    x          23  1
## 2:        1    x          32  2
## 3:        2    y          21  1
## 4:        3    x          23  1
## 5:        3    z          23  3
## 6:        3    y          23  2

If you wish do not wish to order by date

DT[, id := 1:.N, by = personid]

##    personid date measurement id
## 1:        1    x          23  1
## 2:        1    x          32  2
## 3:        2    y          21  1
## 4:        3    x          23  1
## 5:        3    z          23  2
## 6:        3    y          23  3

Any of the following would also work

DT[, id := seq_along(measurement), by =  personid]
DT[, id := seq_along(date), by =  personid]

The equivalent commands using plyr

library(plyr)
# ordering by date
ddply(Data, .(personid), mutate, id = order(date))
# in original order
ddply(Data, .(personid), mutate, id = seq_along(date))
ddply(Data, .(personid), mutate, id = seq_along(measurement))

Why equal to operator does not work if it is not surrounded by space?

Is it possible to have 2 different but equal size real number sets that have the same mean and standard deviation?

If there are $74$ heads and $196$ legs, how many horses and humans are there? [closed]

What does the space of non-diagonalizable matrices look like?

Possible distinct positive real $x,y,z \neq 1$ with $x^{(y^z)} = y^{(z^x)} = z^{(x^y)}$ in cyclic permutation?

Generating function solution to previous question $a_{n}=a_{\lfloor n/2\rfloor}+a_{\lfloor n/3 \rfloor}+a_{\lfloor n/6\rfloor}$

Which polynomials are characteristic polynomials of a symmetric matrix?

Closed form for $\left(1+\left(\frac{1}{2}+\left(\frac{1}{3}+\left(\frac{1}{4}+\cdots\right)^2\right)^2\right)^2\right)^2$?

What are ways to compute polynomials that converge from above and below to a continuous and bounded function on $[0,1]$?

Is the complement of countably many disjoint closed disks path connected?

The Stupid Computer Problem : can every polynomial be written with only one $x$?

Show that $\sum{\frac{x^n}{(1+x^n)^n}}$ converges uniformly