Row-wise iteration like apply with purrr

How do I achieve row-wise iteration using purrr::map?

Here's how I'd do it with a standard row-wise apply.

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

lst_result <- apply(df, 1, function(x) {
    var1 <- x[['a']] + x[['b']]
    var2 <- x[['c']] / 2
    return(data.frame(var1 = var1, var2 = var2))
})

However, this is not very elegant, and I would rather do it with purrr. It may (or may not) be faster, too.


Solution 1:

You can use pmap for row-wise iteration. The columns are used as the arguments of whatever function you are using. In your example you would have a three-argument function.

For example, here is pmap using an anonymous function for the work you are doing. Because a data frame is a named list, the columns are matched to the function's arguments by name, so the argument names must match the column names.

pmap(df, function(a, b, c) {
    data.frame(var1 = a + b,
               var2 = c / 2)
})
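As a small sketch of how pmap pairs columns with arguments: a data frame is a named list, so the matching happens by name. The reordered signature and the extra column d below are only for illustration and are not part of the question.

```r
library(purrr)

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

# Arguments are matched by column name, so their order in the
# function signature does not matter.
res <- pmap(df, function(c, a, b) {
  data.frame(var1 = a + b, var2 = c / 2)
})
res[[1]]

# With an extra column (d is hypothetical), a plain three-argument
# function would fail with an "unused argument" error; adding `...`
# lets the function ignore columns it does not need.
df_extra <- cbind(df, d = 31:40)
res_extra <- pmap(df_extra, function(a, b, c, ...) {
  data.frame(var1 = a + b, var2 = c / 2)
})
```

The `...` in the second call is the usual way to apply a function to a subset of the columns without dropping the others first.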

You can use the purrr tilde "short-hand" for an anonymous function by referring to the columns in order with numbers preceded by two dots.

pmap(df, ~data.frame(var1 = ..1 + ..2,
                     var2 = ..3 / 2))

If you want to get these particular results as a data.frame instead of a list, you can use pmap_dfr.
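A minimal sketch of the data-frame variant, reusing the question's df. As far as I know, pmap_dfr needs dplyr installed, and in purrr 1.0.0 and later the *_dfr variants are marked as superseded in favour of list_rbind, so both forms are shown; treat the version detail as something to check against your installed purrr.

```r
library(purrr)

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

# pmap_dfr row-binds the per-row results into one data frame.
res_dfr <- pmap_dfr(df, function(a, b, c) {
  data.frame(var1 = a + b, var2 = c / 2)
})

# Equivalent construction with list_rbind (assumes purrr >= 1.0.0).
res_rbind <- list_rbind(pmap(df, function(a, b, c) {
  data.frame(var1 = a + b, var2 = c / 2)
}))
```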

Solution 2:

Note that you're using only vectorized operations in your example, so you could simply do:

df %>% dplyr::transmute(var1 = a + b, var2 = c / 2)

(or in base R: transform(df, var1 = a + b, var2 = c / 2)[4:5])

If you use non-vectorized functions such as median, you can use pmap as in @aosmith's answer, or use dplyr::rowwise.

rowwise is slower and the package maintainers advise using the map family instead, but it's arguably easier on the eye than pmap in some cases. I personally still use it when speed isn't an issue:

library(dplyr)
library(purrr)
# pmap returns a list column here; pmap_dbl would return a numeric column
df %>% transmute(var3 = pmap(., ~median(c(..1, ..2, ..3))))
df %>% rowwise() %>% transmute(var3 = median(c(a, b, c)))

(To get back to a strict unnamed list output, with res being the resulting data frame: res %>% split(seq(nrow(.))) %>% unname)
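The conversion back to a list can be sketched as follows. Here res is assumed to be the data frame produced by the vectorized transmute call; the answer does not define it, so that assignment is my own.

```r
library(dplyr)

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

# `res` is an assumed name for the data-frame result.
res <- df %>% transmute(var1 = a + b, var2 = c / 2)

# Split into one single-row data frame per row, then drop the
# names ("1", "2", ...) that split() attaches to the list.
lst <- res %>% split(seq(nrow(.))) %>% unname()
```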

Solution 3:

You can always write a wrapper around a function you like.

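rmap <- function(.x, .f, ...) {
    # Same check (and message) as apply(): .x must have dimensions
    if (is.null(dim(.x))) stop("dim(X) must have a positive length")
    # t() coerces to a matrix, so each original row becomes a column
    .x <- as.data.frame(t(.x), stringsAsFactors = FALSE)
    purrr::map(.x = .x, .f = .f, ...)
}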

Apply the new function rmap (row-wise map):

# df1 is defined under "Additional Info" below
rmap(df1, ~{
    var1 <- .x[[1]] + .x[[2]]
    var2 <- .x[[3]] / 2
    return(data.frame(var1 = var1, var2 = var2))
})

Additional Info: (eval from top to bottom)

df1 <- data.frame(a=1:3,b=1:3,c=1:3)
m   <- matrix(1:9,ncol=3)

apply(df1,1,sum)
rmap(df1,sum)

apply(m,1,sum)
rmap(m,sum)

apply(1:10,1,sum)  # intentionally throws an error
rmap(1:10,sum)     # intentionally throws an error
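One caveat with any wrapper built on t(), sketched under the assumption of a data frame with mixed column types (df_mixed and its columns are hypothetical): t() goes through a matrix, so all values are coerced to a common type, whereas pmap keeps each column's own type.

```r
library(purrr)

# Hypothetical data frame with one character and one integer column.
df_mixed <- data.frame(id = c("x", "y"), n = 1:2)

# t() converts the data frame to a matrix, so the integers become strings.
m <- t(df_mixed)
typeof(m)

# pmap passes each column with its original type, so arithmetic still works.
doubled <- pmap(df_mixed, function(id, n) n * 2)
```

For all-numeric input such as df1 or m above, the coercion is harmless.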

Solution 4:

You can use pmap in combination with ..., which for me is the best solution because I don't need to specify the parameters.

library(purrr)
library(tibble)

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

lst_result <- df %>%
    pmap(function(...) {
        # Collect the current row's values into a one-row tibble
        x <- tibble(...)
        tibble(var1 = x$a + x$b, var2 = x$c / 2)
    })