How to count TRUE values in a logical vector

In R, what is the most efficient/idiomatic way to count the number of TRUE values in a logical vector? I can think of two ways:

z <- sample(c(TRUE, FALSE), 1000, rep = TRUE)
sum(z)
# [1] 498

table(z)["TRUE"]
# TRUE 
#  498 

Which do you prefer? Is there anything even better?


Solution 1:

The safest way is to use sum with na.rm = TRUE:

sum(z, na.rm = TRUE) # best way to count TRUE values

which gives 1.

There are some problems with other solutions when logical vector contains NA values.

See for example:

z <- c(TRUE, FALSE, NA)

sum(z) # gives you NA
table(z)["TRUE"] # gives you 1
length(z[z == TRUE]) # f3lix answer, gives you 2 (because NA indexing returns values)

Additionally table solution is less efficient (look at the code of table function).

Also, you should be careful with the "table" solution, in case there are no TRUE values in the logical vector. See for example:

z <- c(FALSE, FALSE)
table(z)["TRUE"] # gives you `NA`

or

z <- c(NA, FALSE)
table(z)["TRUE"] # gives you `NA`

Solution 2:

Another option which hasn't been mentioned is to use which:

length(which(z))

Just to actually provide some context on the "which is faster question", it's always easiest just to test yourself. I made the vector much larger for comparison:

z <- sample(c(TRUE,FALSE),1000000,rep=TRUE)
system.time(sum(z))
   user  system elapsed 
   0.03    0.00    0.03
system.time(length(z[z==TRUE]))
   user  system elapsed 
   0.75    0.07    0.83 
system.time(length(which(z)))
   user  system elapsed 
   1.34    0.28    1.64 
system.time(table(z)["TRUE"])
   user  system elapsed 
  10.62    0.52   11.19 

So clearly using sum is the best approach in this case. You may also want to check for NA values as Marek suggested.

Just to add a note regarding NA values and the which function:

> which(c(T, F, NA, NULL, T, F))
[1] 1 4
> which(!c(T, F, NA, NULL, T, F))
[1] 2 5

Note that which only checks for logical TRUE, so it essentially ignores non-logical values.

Solution 3:

Another way is

> length(z[z==TRUE])
[1] 498

While sum(z) is nice and short, for me length(z[z==TRUE]) is more self explaining. Though, I think with a simple task like this it does not really make a difference...

If it is a large vector, you probably should go with the fastest solution, which is sum(z). length(z[z==TRUE]) is about 10x slower and table(z)[TRUE] is about 200x slower than sum(z).

Summing up, sum(z) is the fastest to type and to execute.

Solution 4:

Another option is to use summary function. It gives a summary of the Ts, Fs and NAs.

> summary(hival)
   Mode   FALSE    TRUE    NA's 
logical    4367      53    2076 
>