R how to summarize multiple columns of data by different function from look up table

I have a data.table of multiple columns that need to be summarized by a particular function depending on the column name.

A | B | C | D
1 | 1 | 1 | x
2 | 2 | 2 | y
3 | 3 | 3 | z

Should become

A | B | C | D
1 | 3 | 6 | z

As per the lookup table:

A     | B     | C     | D
"min" | "max" | "sum" | "max"

You can't really summarize characters like letters (such as Z), so I assume you mean grouped summary data like this:

# Create data frame:
A <- c(1,2,3)
B <- c(1,2,3)
C <- c(1,2,3)
D <- c("x", "y", "z")
letters <- data.frame(A,B,C,D)

# Load library for summarizing values:
library(dplyr)

# Summarize and group by specific vector:
letters %>% 
  group_by(D) %>% 
  summarize(Min_A = min(A),
            Max_B = max(B),
            Sum_C = sum(C))

Which gives you this:

  D     Min_A Max_B Sum_C
  <chr> <dbl> <dbl> <dbl>
1 x         1     1     1
2 y         2     2     2
3 z         3     3     3

Otherwise if you just mean all the descriptives (min, max, etc.):

# Ungrouped:
letters %>%  
  summarize(Min_A = min(A),
            Max_B = max(B),
            Sum_C = sum(C))

Which gives you:

  Min_A Max_B Sum_C
1     1     3     6

Alternatively you can name it like this:

# Named Ungrouped:
zdata <- letters %>%  
  summarize(Min = min(A),
            Max = max(B),
            Sum = sum(C))
rownames(zdata) <- "Max"
zdata

Which gives you this:

    Min Max Sum
Max   1   3   6

Not entirely sure why you want the max label for the rows, but this would achieve both your aims I think. There are many functions like this within dplyr. You can get a background on these functions in a book called R for Data Science!


We may use match.fun provided the function names in lookup table are valid R functions.

data.frame(Map(function(x, y) match.fun(y)(x), df1, lookup))
#  A B C D
#1 1 3 6 z

data

df1 <- structure(list(A = 1:3, B = 1:3, C = 1:3, D = c("x", "y", "z"
)), row.names = c(NA, -3L), class = c("data.table", "data.frame"))

lookup <- structure(list(A = "min", B = "max", C = "sum", D = "max"), 
           row.names = c(NA, -1L), class = "data.frame")