How to find the highest value of a column in a data frame in R?
Similar to colMeans, colSums, etc., you could write a column maximum function, colMax, and a column sort function, colSort.
colMax <- function(data) sapply(data, max, na.rm = TRUE)
colSort <- function(data, ...) sapply(data, sort, ...)
I use ... in the second function in hopes of sparking your intrigue.
Get your data:
dat <- read.table(header = TRUE, text = "Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
7 23 299 8.6 65 5 7
8 19 99 13.8 59 5 8
9 8 19 20.1 61 5 9")
Use the colMax function on the sample data:
colMax(dat)
# Ozone Solar.R Wind Temp Month Day
# 41.0 313.0 20.1 74.0 5.0 9.0
To do the sorting on a single column,
sort(dat$Solar.R, decreasing = TRUE)
# [1] 313 299 190 149 118 99 19
and over all columns use our colSort function,
colSort(dat, decreasing = TRUE) ## compare with '...' above
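The role of ... in colSort may be easier to see on a small example. The sketch below uses a hypothetical toy data frame (not the ozone data) and assumes base R only. It shows that any extra argument given to colSort is forwarded to sort, and that the shape of the result depends on whether the sorted columns end up the same length.

```r
# Same definition as above: '...' forwards extra arguments to sort()
colSort <- function(data, ...) sapply(data, sort, ...)

# Hypothetical toy data frame with one missing value
toy <- data.frame(a = c(3, 1, 2), b = c(10, NA, 5))

# sort() drops NA by default, so the columns come back with different
# lengths and sapply() returns a list instead of a matrix
res <- colSort(toy, decreasing = TRUE)
res$a
res$b

# Forwarding na.last = TRUE keeps the NA, so all columns have equal
# length and sapply() can simplify the result to a matrix
m <- colSort(toy, decreasing = TRUE, na.last = TRUE)
m
```

With the sample data above, Ozone and Solar.R contain NA values, so colSort(dat, decreasing = TRUE) should likewise come back as a list unless na.last is supplied.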
To get the max of any column you want something like:
max(ozone$Ozone, na.rm = TRUE)
To get the max of all columns, you want:
apply(ozone, 2, function(x) max(x, na.rm = TRUE))
And to sort:
ozone[order(ozone$Solar.R),]
Or to sort the other direction:
ozone[rev(order(ozone$Solar.R)),]
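One detail worth checking when sorting in the other direction is where rows with NA end up. The sketch below uses a small hypothetical stand-in for ozone and compares rev(order(...)) with order(..., decreasing = TRUE), which is an alternative not mentioned in the answer above.

```r
# Hypothetical stand-in for the ozone data frame
ozone <- data.frame(Ozone   = c(41, 36, 12, NA),
                    Solar.R = c(190, NA, 149, 313))

# rev(order(...)) reverses the whole ordering, so the NA row,
# which order() places last, ends up first
by_rev <- ozone[rev(order(ozone$Solar.R)), ]

# order(..., decreasing = TRUE) sorts descending and still keeps
# the NA row at the end
by_dec <- ozone[order(ozone$Solar.R, decreasing = TRUE), ]

by_rev$Solar.R
by_dec$Solar.R
```

Which behaviour is preferable depends on whether incomplete rows should be visible at the top or kept out of the way at the bottom.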
Here's a dplyr solution:
library(dplyr)
# find max for each column
summarise_each(ozone, funs(max(., na.rm=TRUE)))
# sort by Solar.R, descending
arrange(ozone, desc(Solar.R))
UPDATE: summarise_each() has been deprecated in favour of a more featureful family of functions: mutate_all(), mutate_at(), mutate_if(), summarise_all(), summarise_at(), summarise_if().
Here is how you could do it:
# find max for each column
ozone %>%
summarise_if(is.numeric, funs(max(., na.rm=TRUE)))%>%
arrange(Ozone)
or
ozone %>%
summarise_at(vars(1:6), funs(max(., na.rm=TRUE)))%>%
arrange(Ozone)
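In more recent dplyr releases (1.0.0 and later), funs() and the scoped _if/_at/_all verbs have themselves been superseded by across(). A minimal sketch of the same column-wise maximum, assuming dplyr >= 1.0.0 is installed and using a small hypothetical stand-in for ozone:

```r
library(dplyr)

# Hypothetical stand-in for the ozone data frame
ozone <- data.frame(Ozone   = c(41, 36, NA),
                    Solar.R = c(190, 118, NA),
                    Wind    = c(7.4, 8.0, 14.3))

# across() applies the function to every selected column;
# where(is.numeric) selects only the numeric ones
col_max <- ozone %>%
  summarise(across(where(is.numeric), ~ max(.x, na.rm = TRUE)))

col_max
```

The result is a one-row data frame, so a trailing arrange() has nothing left to reorder and can be dropped.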
In response to finding the max value for each column, you could try using the apply() function:
> apply(ozone, MARGIN = 2, function(x) max(x, na.rm=TRUE))
Ozone Solar.R Wind Temp Month Day
41.0 313.0 20.1 74.0 5.0 9.0
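One caveat with apply() is that it converts the data frame to a matrix first, so a single non-numeric column turns every value into a character string. The sketch below uses a hypothetical data frame with an added Site column (not part of the original data) to show the effect, along with one possible way to restrict the calculation to numeric columns.

```r
# Hypothetical data frame mixing numeric and character columns
mixed <- data.frame(Ozone = c(41, 8, NA), Site = c("a", "b", "c"))

# The matrix conversion inside apply() makes everything character,
# so max() compares strings rather than numbers
as_char <- apply(mixed, 2, function(x) max(x, na.rm = TRUE))
is.character(as_char)

# Keeping only numeric columns and using sapply() avoids the coercion
num_cols <- sapply(mixed, is.numeric)
col_max <- sapply(mixed[num_cols], max, na.rm = TRUE)
col_max
```

For an all-numeric data frame such as ozone, the apply() call above gives the expected numeric result.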