Row-wise variance of a matrix in R

I'd like to compute the variance for each row in a matrix. For the following matrix A

     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    5    6   10
[3,]   50    7   11
[4,]    4    8   12

I would like to get

[1]  16.0000   7.0000 564.3333  16.0000

I know I can achieve this with apply(A,1,var), but is there a faster or better way? From octave, I can do this with var(A,0,2), but I don't get how the Y argument of the var() function in R is to be used.

Edit: The actual dataset of a typical chunk has around 100 rows and 500 columns. The total amount of data is around 50GB though.

You could potentially vectorize var over rows (or columns) using rowSums and rowMeans

RowVar <- function(x, ...) {
  rowSums((x - rowMeans(x, ...))^2, ...)/(dim(x)[2] - 1)
}

RowVar(A)
#[1]  16.0000   7.0000 564.3333  16.0000

Using @Richards data, yields in

microbenchmark(apply(m, 1, var), RowVar(m))

## Unit: milliseconds
## expr        min         lq     median         uq        max neval
## apply(m, 1, var) 343.369091 400.924652 424.991017 478.097573 746.483601   100
##        RowVar(m)   1.766668   1.916543   2.010471   2.412872   4.834471   100

You can also create a more general function that will receive a syntax similar to apply but will remain vectorized (the column wise variance will be slower as the matrix needs to be transposed first)

MatVar <- function(x, dim = 1, ...) {
  if(dim == 1){
     rowSums((x - rowMeans(x, ...))^2, ...)/(dim(x)[2] - 1)
  } else if (dim == 2) {
     rowSums((t(x) - colMeans(x, ...))^2, ...)/(dim(x)[1] - 1)
  } else stop("Please enter valid dimension")
}


MatVar(A, 1)
## [1]  16.0000   7.0000 564.3333  16.0000

MatVar(A, 2)
        V1         V2         V3 
## 547.333333   1.666667   1.666667

This is one of the main reasons why apply() is useful. It is meant to operate on the margins of an array or matrix.

set.seed(100)
m <- matrix(sample(1e5L), 1e4L)
library(microbenchmark)
microbenchmark(apply(m, 1, var))
# Unit: milliseconds
#              expr      min       lq   median       uq      max neval
#  apply(m, 1, var) 270.3746 283.9009 292.2933 298.1297 343.9531   100

Is 300 milliseconds too long to make 10,000 calculations?

Row-wise variance of a matrix in R

Related

Recent Posts