How to get mean, median, and other statistics over entire matrix, array or dataframe?
I know this is a basic question but for some strange reason I am unable to find an answer.
How should I apply basic statistical functions like mean, median, etc. over an entire array, matrix or dataframe to get a single value, and not a vector over rows or columns?
Since this comes up a fair bit, I'm going to treat this a little more comprehensively, to include the 'etc.' piece in addition to mean and median.
For a matrix, or array, as the others have stated, mean and median will return a single value. However, var will compute the covariances between the columns of a two dimensional matrix. Interestingly, for a multi-dimensional array, var goes back to returning a single value. sd on a 2-d matrix will work, but is deprecated, returning the standard deviation of the columns. Even better, mad returns a single value on a 2-d matrix and a multi-dimensional array. If you want a single value returned, the safest route is to coerce using as.vector() first. Having fun yet?
For a data.frame, mean is deprecated, but will again act on the columns separately. median requires that you coerce to a vector first, or unlist. As before, var will return the covariances, and sd is again deprecated but will return the standard deviation of the columns. mad requires that you coerce to a vector or unlist. In general for a data.frame if you want something to act on all values, you generally will just unlist it first.
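To make the "flatten first" advice concrete, here is a minimal sketch. The helper name stats_all and the toy objects m, a and df are made up for illustration, and it assumes every column of the data frame is numeric:

```r
# Toy inputs (hypothetical names): a 2-d matrix, a 3-d array, a numeric data frame
set.seed(1)
m  <- matrix(runif(20), nrow = 5)         # 5 x 4 matrix
a  <- array(runif(24), dim = c(2, 3, 4))  # 3-d array
df <- as.data.frame(m)                    # all-numeric data frame

# Flatten first, then summarise: each statistic comes back as one number
stats_all <- function(x) {
  v <- if (is.data.frame(x)) unlist(x, use.names = FALSE) else as.vector(x)
  c(mean = mean(v), median = median(v), var = var(v), sd = sd(v), mad = mad(v))
}

stats_all(m)
stats_all(a)
stats_all(df)

# Without flattening, var() on a 2-d matrix gives a covariance matrix instead
dim(var(m))  # 4 4
```

Because the flattening happens in one place, the same call behaves identically for all three container types.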
Edit: Late breaking news(): In R 3.0.0 mean.data.frame is defunctified:
o mean() for data frames and sd() for data frames and matrices are
defunct.
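If the old per-column behaviour is what you actually wanted, explicit column-wise calls are one possible replacement. A small sketch on made-up data (m and df are illustrative names, not from the question):

```r
set.seed(1)
m  <- matrix(runif(20), nrow = 5)
df <- as.data.frame(m)

col_means <- colMeans(df)     # per-column means, as the old mean(df) gave
col_sds   <- sapply(df, sd)   # per-column standard deviations, as the old sd(df) gave
mat_sds   <- apply(m, 2, sd)  # per-column standard deviations of a matrix
```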
By default, mean and median etc. work over an entire array or matrix.
E.g.:
# array:
m <- array(runif(100),dim=c(10,10))
mean(m) # returns *one* value.
# matrix:
mean(as.matrix(m)) # same as before
For data frames, you can coerce them to a matrix first. (Data frames are handled column by column by default because a dataframe can have columns with strings in it, which you can't take the mean of.)
# data frame
mdf <- as.data.frame(m)
# mean(mdf) returned column means in old R versions; in R >= 3.0.0 it returns NA with a warning
mean( as.matrix(mdf) ) # one value.
Just be careful that your dataframe has all numeric columns before coercing to matrix. Or exclude the non-numeric ones.
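As a rough sketch of excluding the non-numeric columns, assuming a small made-up data frame (mixed, is_num and the column names are hypothetical):

```r
# Hypothetical mixed data frame: two numeric columns and one character column
mixed <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), label = c("a", "b", "c"))

# as.matrix(mixed) would be a character matrix, and mean() on that
# returns NA with a warning, so keep only the numeric columns first.
is_num <- vapply(mixed, is.numeric, logical(1))

overall_mean   <- mean(as.matrix(mixed[is_num]))    # 3.5
overall_median <- median(as.matrix(mixed[is_num]))  # 3.5
```

vapply is used rather than sapply so that the result is guaranteed to be a logical vector even for an empty data frame.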
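You can use the dplyr package (install it with install.packages('dplyr')). Note that this gives one mean per column, not a single value over the whole dataframe:
library(dplyr)
dataframe.mean <- dataframe %>%
  summarise_all(mean) # swap in median for medians; newer dplyr prefers summarise(across(everything(), mean))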