Create counter with multiple variables [duplicate]

My data looks like this:

CustomerID TripDate
1           1/3/2013
1           1/4/2013
1           1/9/2013
2           2/1/2013
2           2/4/2013
3           1/2/2013

I need to create a counter variable, so that the result looks like this:

CustomerID TripDate   TripCounter
1           1/3/2013   1
1           1/4/2013   2 
1           1/9/2013   3
2           2/1/2013   1
2           2/4/2013   2 
3           1/2/2013   1

TripCounter should restart at 1 for each customer.

Use ave. Assuming your data.frame is called "mydf":

mydf$counter <- with(mydf, ave(CustomerID, CustomerID, FUN = seq_along))
mydf
#   CustomerID TripDate counter
# 1          1 1/3/2013       1
# 2          1 1/4/2013       2
# 3          1 1/9/2013       3
# 4          2 2/1/2013       1
# 5          2 2/4/2013       2
# 6          3 1/2/2013       1
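One caveat worth sketching: ave() returns a vector of the same type as its first argument, so if CustomerID happens to be stored as character or factor (an assumption here, the question does not say), the counter comes back as character or triggers factor-level warnings. Passing an integer vector as the first argument avoids that. The data below is rebuilt from the question, with CustomerID deliberately made character for illustration:

```r
# Rebuild the question's data; CustomerID is stored as character here
# (an assumption, to illustrate the caveat).
mydf <- data.frame(
  CustomerID = c("1", "1", "1", "2", "2", "3"),
  TripDate   = c("1/3/2013", "1/4/2013", "1/9/2013",
                 "2/1/2013", "2/4/2013", "1/2/2013"),
  stringsAsFactors = FALSE
)

# Use an integer vector as the first argument so the result stays integer,
# whatever the type of the grouping column.
mydf$TripCounter <- with(mydf,
                         ave(seq_along(CustomerID), CustomerID, FUN = seq_along))
mydf$TripCounter
# [1] 1 2 3 1 2 1
```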

For what it's worth, I also implemented a version of this approach in a function included in my "splitstackshape" package. The function is called getanID:

mydf <- data.frame(IDA = c("a", "a", "a", "b", "b", "b", "b"),
                   IDB = c(1, 2, 1, 1, 2, 2, 2), values = 1:7)
mydf
# install.packages("splitstackshape")
library(splitstackshape)
# getanID(mydf, id.vars = c("IDA", "IDB"))
getanID(mydf, id.vars = 1:2)
#   IDA IDB values .id
# 1   a   1      1   1
# 2   a   2      2   1
# 3   a   1      3   2
# 4   b   1      4   1
# 5   b   2      5   1
# 6   b   2      6   2
# 7   b   2      7   3

As you can see from the example above, I've written the function in such a way that you can specify one or more columns that should be treated as ID columns. It checks to see if any of the id.vars are duplicated, and if they are, then it generates a new ID variable for you.
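If you would rather not load a package, the same idea (a counter within each combination of ID columns) can be sketched in base R by giving ave() more than one grouping vector. This is not how getanID is implemented, just an equivalent illustration using the same example data:

```r
# Same example data as above
mydf <- data.frame(IDA = c("a", "a", "a", "b", "b", "b", "b"),
                   IDB = c(1, 2, 1, 1, 2, 2, 2), values = 1:7)

# ave() accepts several grouping vectors; the counter restarts for each
# distinct IDA/IDB combination. Row order is left unchanged.
mydf$.id <- ave(seq_len(nrow(mydf)), mydf$IDA, mydf$IDB, FUN = seq_along)
mydf$.id
# [1] 1 1 2 1 1 2 3
```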

You can also use plyr for this (using @AnadaMahto's example data):

> library(plyr)
> ddply(mydf, .(IDA), transform, .id = seq_along(IDA))
  IDA IDB values .id
1   a   1      1   1
2   a   2      2   2
3   a   1      3   3
4   b   1      4   1
5   b   2      5   2
6   b   2      6   3
7   b   2      7   4

or even:

> ddply(mydf, .(IDA, IDB), transform, .id = seq_along(IDA))
  IDA IDB values .id
1   a   1      1   1
2   a   1      3   2
3   a   2      2   1
4   b   1      4   1
5   b   2      5   1
6   b   2      6   2
7   b   2      7   3

Note that plyr does not have a reputation for speed; if performance matters, take a look at data.table.

Here's a data.table approach:

library(data.table)
DT <- data.table(mydf)
DT[, .id := sequence(.N), by = "IDA,IDB"]
DT
#    IDA IDB values .id
# 1:   a   1      1   1
# 2:   a   2      2   1
# 3:   a   1      3   2
# 4:   b   1      4   1
# 5:   b   2      5   1
# 6:   b   2      6   2
# 7:   b   2      7   3

You can also use dplyr. Assuming your data.frame is called mydata:

library(dplyr)
mydata %>% group_by(CustomerID) %>% mutate(TripCounter = row_number())
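Note that row_number() counts rows in whatever order they appear within each group. If the trips are not already sorted by date, a sketch like the following may help. It assumes TripDate is stored as month/day/year text, as in the question; the shuffled sample data and the name result are made up for illustration:

```r
library(dplyr)

# The question's rows in shuffled order (hypothetical ordering)
mydata <- data.frame(
  CustomerID = c(1, 2, 1, 3, 1, 2),
  TripDate   = c("1/9/2013", "2/4/2013", "1/3/2013",
                 "1/2/2013", "1/4/2013", "2/1/2013")
)

result <- mydata %>%
  # Parse the text dates so they sort chronologically, not alphabetically
  mutate(TripDate = as.Date(TripDate, format = "%m/%d/%Y")) %>%
  arrange(CustomerID, TripDate) %>%
  group_by(CustomerID) %>%
  mutate(TripCounter = row_number()) %>%
  ungroup()

result$TripCounter
# [1] 1 2 3 1 2 1
```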
