Row-wise variance of a matrix in R
I'd like to compute the variance for each row in a matrix. For the following matrix A
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 5 6 10
[3,] 50 7 11
[4,] 4 8 12
I would like to get
[1] 16.0000 7.0000 564.3333 16.0000
I know I can achieve this with apply(A,1,var)
, but is there a faster or better way? From octave, I can do this with var(A,0,2)
, but I don't get how the Y
argument of the var()
function in R is to be used.
Edit: The actual dataset of a typical chunk has around 100 rows and 500 columns. The total amount of data is around 50GB though.
You could potentially vectorize var
over rows (or columns) using rowSums
and rowMeans
RowVar <- function(x, ...) {
rowSums((x - rowMeans(x, ...))^2, ...)/(dim(x)[2] - 1)
}
RowVar(A)
#[1] 16.0000 7.0000 564.3333 16.0000
Using @Richards data, yields in
microbenchmark(apply(m, 1, var), RowVar(m))
## Unit: milliseconds
## expr min lq median uq max neval
## apply(m, 1, var) 343.369091 400.924652 424.991017 478.097573 746.483601 100
## RowVar(m) 1.766668 1.916543 2.010471 2.412872 4.834471 100
You can also create a more general function that will receive a syntax similar to apply
but will remain vectorized (the column wise variance will be slower as the matrix needs to be transposed first)
MatVar <- function(x, dim = 1, ...) {
if(dim == 1){
rowSums((x - rowMeans(x, ...))^2, ...)/(dim(x)[2] - 1)
} else if (dim == 2) {
rowSums((t(x) - colMeans(x, ...))^2, ...)/(dim(x)[1] - 1)
} else stop("Please enter valid dimension")
}
MatVar(A, 1)
## [1] 16.0000 7.0000 564.3333 16.0000
MatVar(A, 2)
V1 V2 V3
## 547.333333 1.666667 1.666667
This is one of the main reasons why apply()
is useful. It is meant to operate on the margins of an array or matrix.
set.seed(100)
m <- matrix(sample(1e5L), 1e4L)
library(microbenchmark)
microbenchmark(apply(m, 1, var))
# Unit: milliseconds
# expr min lq median uq max neval
# apply(m, 1, var) 270.3746 283.9009 292.2933 298.1297 343.9531 100
Is 300 milliseconds too long to make 10,000 calculations?