Create counter within consecutive runs of certain values

Solution 1:

William Dunlap's posts on R-help are the place to look for all things related to run lengths. His f7 from this post is

f7 <- function(x){ tmp<-cumsum(x);tmp-cummax((!x)*tmp)}

and in the current situation f7(!x). In terms of performance there is

> x <- sample(0:1, 1000000, TRUE)
> system.time(res7 <- f7(!x))
   user  system elapsed 
  0.076   0.000   0.077 
> system.time(res0 <- cumul_zeros(x))
   user  system elapsed 
  0.345   0.003   0.349 
> identical(res7, res0)
[1] TRUE

Solution 2:

Here's a way, building on Joshua's rle approach: (EDITED to use seq_len and lapply as per Marek's suggestion)

> (!x) * unlist(lapply(rle(x)$lengths, seq_len))
 [1] 0 1 0 1 2 3 0 0 1 2

UPDATE. Just for kicks, here's another way to do it, around 5 times faster:

cumul_zeros <- function(x)  {
  x <- !x
  rl <- rle(x)
  len <- rl$lengths
  v <- rl$values
  cumLen <- cumsum(len)
  z <- x
  # replace the 0 at the end of each zero-block in z by the 
  # negative of the length of the preceding 1-block....
  iDrops <- c(0, diff(v)) < 0
  z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
  # ... to ensure that the cumsum below does the right thing.
  # We zap the cumsum with x so only the cumsums for the 1-blocks survive:
  x*cumsum(z)
}

Try an example:

> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))
 [1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0

Now compare times on a million-length vector:

> x <- sample(0:1, 1000000,T)
> system.time( z <- cumul_zeros(x))
   user  system elapsed 
   0.15    0.00    0.14 
> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))
   user  system elapsed 
   0.75    0.00    0.75 

Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!

Solution 3:

rle will "count how many consecutive hours the value has been zero since the last time it was not zero", but not in the format of your "desired output".

Note the lengths for the elements where the corresponding values are zero:

rle(x)
# Run Length Encoding
#   lengths: int [1:6] 1 1 1 3 2 2
#   values : num [1:6] 1 0 1 0 1 0