Repeat rows in a data frame AND add an increment field [duplicate]

I found lots of answers on how to duplicate records, but I also want to add an increment field to each of the duplicated records. I found a similar question, but it has no startValue field: Repeat the rows in a data frame based on values in a specific column.

My data frame starts with

df <-
  data startValue freq
    a        3.4    3
    b        2.1    2
    c        6.3    1

I want this output

df.expanded <-
    data startValue value
       a        3.4     3
       a        3.4     4
       a        3.4     5
       b        2.1     2
       b        2.1     3
       c        6.3     6

I did find a way to do this, but I would like something simpler that will work well on large data sets. Here is what I did that worked.

df <- data.frame(data = c("a", "b", "c"),
                 startValue = c(3.4, 2.1, 6.3),
                 freq = c(3,2,1))
df

# find the largest integer that I will need as an index.
n <- floor(max(df$startValue + df$freq))-1

# repeat each df record n times. Only the record with the
# largest startValue + freq needs to be repeated this many
# times, but I am repeating everything this many times.
df.expanded <- df[rep(row.names(df), each = n), ]

# Use recycling to fill a new column. Now I have created
# a Cartesian product. If n is 46, records with a
# freq of 46 are repeated just the right number of times.
# but records with a freq of 2 are repeated many more times
# than is needed.
df.expanded$value <- 1:n

# finally, I filter out all the extra repeats that I didn't need.
df.expanded <-
df.expanded[df.expanded$value >= floor(df.expanded$startValue)
            & df.expanded$value < floor(df.expanded$startValue+df.expanded$freq),]
df.expanded[-3]

Is there a way that will work better with large data sets? Most records need fewer than 5 repeats, but a few need 50 repeats. I don't like the idea of repeating everything 50 times when only 1 out of 10000 records needs that many repeats. Thanks.

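You can use uncount() from tidyr. Its .id argument adds a counter (1, 2, ...) within each original row, which you can add to floor(startValue) to get the incrementing value. Each row is repeated only freq times, so nothing extra is generated and filtered out afterwards.

library(dplyr)
library(tidyr)

df %>%
  # repeat each row freq times; n counts the copies of each row from 1
  uncount(weights = freq, .id = "n", .remove = FALSE) %>%
  # start at the integer part of startValue and count up
  mutate(value = floor(startValue) + n - 1)

  data startValue freq n value
1    a        3.4    3 1     3
2    a        3.4    3 2     4
3    a        3.4    3 3     5
4    b        2.1    2 1     2
5    b        2.1    2 2     3
6    c        6.3    1 1     6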
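
If you would rather avoid the extra packages, the same idea can be sketched in base R with rep() and sequence(). This is only a sketch: it assumes freq holds non-negative whole numbers and that value should start at floor(startValue), as in the desired output in the question. The name idx is just an illustrative helper.

```r
df <- data.frame(data = c("a", "b", "c"),
                 startValue = c(3.4, 2.1, 6.3),
                 freq = c(3, 2, 1))

# Repeat each row index freq times: 1 1 1 2 2 3
idx <- rep(seq_len(nrow(df)), times = df$freq)
df.expanded <- df[idx, c("data", "startValue")]

# sequence(c(3, 2, 1)) returns 1 2 3 1 2 1, a counter that restarts per row
df.expanded$value <- floor(df.expanded$startValue) + sequence(df$freq) - 1

# Drop the duplicated row names (1, 1.1, 1.2, ...) left by the indexing
rownames(df.expanded) <- NULL
df.expanded
```

Like uncount(), this builds only sum(freq) rows, so a handful of records with large freq values does not inflate the rest.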
