Count how many values are above limit in dataframe

Solution 1:

Try this:

below <- rowSums(data[,-c(1,2)] < data[,'ll'])  # below
above <- rowSums(data[,-c(1,2)] > data[,'ul'])  # above
DF <- cbind(data, below, above) # altogether
head(DF)
  ll ul unit1 unit2 unit3 unit4 unit5 below above
1 10 82    75    10    24    80    60     0     0
2 13 87    60    20     4    94    31     1     1
3  8 85    58     9    98    94    65     0     2
4 11 95    68    45     5    38    76     1     0
5 11 87    15    79     5    43    67     1     0
6 11 89    33     6    18     1    22     2     0

Solution 2:

I would use within() for this, following the same logic as @Jilber's answer:

# First, use set.seed to make your example reproducible
set.seed(1)
data <- data.frame(
  ll=round(runif(5, 5, 15)),
  ul=round(runif(5, 80, 95)),
  unit1=sample(1:100, 5, TRUE),
  unit2=sample(1:100, 5, TRUE),
  unit3=sample(1:100, 5, TRUE),
  unit4=sample(1:100, 5, TRUE),
  unit5=sample(1:100, 5, TRUE)
)
data
#   ll ul unit1 unit2 unit3 unit4 unit5
# 1  8 93    21    50    94    39    49
# 2  9 94    18    72    22     2    60
# 3 11 90    69   100    66    39    50
# 4 14 89    39    39    13    87    19
# 5  7 81    77    78    27    35    83

The within function lets you add new columns in a convenient way.

within(data, {
  below = rowSums(data[-c(1:2)] < ll)
  above = rowSums(data[-c(1:2)] > ul)
})
#   ll ul unit1 unit2 unit3 unit4 unit5 above below
# 1  8 93    21    50    94    39    49     1     0
# 2  9 94    18    72    22     2    60     0     1
# 3 11 90    69   100    66    39    50     1     0
# 4 14 89    39    39    13    87    19     0     1
# 5  7 81    77    78    27    35    83     1     0

Alternatively, you can also use transform() to achieve the same output:

transform(data, 
          below = rowSums(data[-c(1:2)] < ll), 
          above = rowSums(data[-c(1:2)] > ul))

Benchmarking the Jilber's solution and these two on a 2,000,000 row dataset, here are the results:

       test replications elapsed relative user.self sys.self
3    jilber            3  33.586    1.000    31.490    1.916
1    within            3  34.493    1.027    32.542    1.584
2 transform            3  33.813    1.007    31.870    1.828

I think these two functions hold up pretty well for the convenience they offer!

Count how many values are above limit in dataframe

Solution 1:

Solution 2:

Related

Recent Posts