R how to summarize multiple columns of data by different function from look up table
I have a data.table
of multiple columns that need to be summarized by a particular function depending on the column name.
A | B | C | D
1 | 1 | 1 | x
2 | 2 | 2 | y
3 | 3 | 3 | z
Should become
A | B | C | D
1 | 3 | 6 | z
As per the lookup table:
A | B | C | D
"min" | "max" | "sum" | "max"
You can't really summarize characters like letters (such as Z), so I assume you mean grouped summary data like this:
# Create data frame:
A <- c(1,2,3)
B <- c(1,2,3)
C <- c(1,2,3)
D <- c("x", "y", "z")
letters <- data.frame(A,B,C,D)
# Load library for summarizing values:
library(dplyr)
# Summarize and group by specific vector:
letters %>%
group_by(D) %>%
summarize(Min_A = min(A),
Max_B = max(B),
Sum_C = sum(C))
Which gives you this:
D Min_A Max_B Sum_C
<chr> <dbl> <dbl> <dbl>
1 x 1 1 1
2 y 2 2 2
3 z 3 3 3
Otherwise if you just mean all the descriptives (min, max, etc.):
# Ungrouped:
letters %>%
summarize(Min_A = min(A),
Max_B = max(B),
Sum_C = sum(C))
Which gives you:
Min_A Max_B Sum_C
1 1 3 6
Alternatively you can name it like this:
# Named Ungrouped:
zdata <- letters %>%
summarize(Min = min(A),
Max = max(B),
Sum = sum(C))
rownames(zdata) <- "Max"
zdata
Which gives you this:
Min Max Sum
Max 1 3 6
Not entirely sure why you want the max label for the rows, but this would achieve both your aims I think. There are many functions like this within dplyr. You can get a background on these functions in a book called R for Data Science!
We may use match.fun
provided the function names in lookup table are valid R functions.
data.frame(Map(function(x, y) match.fun(y)(x), df1, lookup))
# A B C D
#1 1 3 6 z
data
df1 <- structure(list(A = 1:3, B = 1:3, C = 1:3, D = c("x", "y", "z"
)), row.names = c(NA, -3L), class = c("data.table", "data.frame"))
lookup <- structure(list(A = "min", B = "max", C = "sum", D = "max"),
row.names = c(NA, -1L), class = "data.frame")