What are the differences between R's new native pipe `|>` and the magrittr pipe `%>%`?

Another difference is that magrittr's pipe lets you use . as a placeholder for the piped-in value:

library(magrittr)  # provides %>%
c("dogs", "cats", "rats") %>% grepl("at", .)
#[1] FALSE  TRUE  TRUE

But this is not possible with R's native pipe.

c("dogs", "cats", "rats") |> grepl("at", .)

Error in grepl(c("dogs", "cats", "rats"), "at", .) : object '.' not found

Here are different ways to work around this with the native pipe -

1. Write a separate function -

find_at <- function(x) grepl("at", x)
c("dogs", "cats", "rats") |> find_at()
#[1] FALSE  TRUE  TRUE

2 a. Use an anonymous function -

c("dogs", "cats", "rats") |> {function(x) grepl("at", x)}()

2 b. Use the new anonymous function syntax

c("dogs", "cats", "rats") |> {\(x) grepl("at", x)}()

3. Specify the first parameter by name. The native pipe always inserts the left-hand side as the first positional argument of the call. Since R matches named arguments first, naming the function's first parameter makes the piped value "overflow" into the second parameter (and so on if you specify more than one parameter by name).

c("dogs", "cats", "rats") |> grepl(pattern = "at")
#> [1] FALSE  TRUE  TRUE

Examples 1 and 2 taken from - https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/

Example 3 taken from https://mobile.twitter.com/rlangtip/status/1409904500157161477
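
As a quick sanity check, the sketch below runs the workarounds side by side and confirms that they return the same logical vector. The names `animals`, `expected` and `r1` to `r4` are illustrative only. The last part assumes R 4.2.0 or later, where the native pipe gained its own `_` placeholder that has to be passed as a named argument; it is parsed from a string so the script still loads on R 4.1.

```r
animals  <- c("dogs", "cats", "rats")
expected <- grepl("at", animals)          # plain nested call as the reference

# 1. separate helper function
find_at <- function(x) grepl("at", x)
r1 <- animals |> find_at()

# 2. anonymous function, called immediately (note the trailing parentheses)
r2 <- animals |> (\(x) grepl("at", x))()

# 3. name the first parameter so the piped value lands in the second one (x)
r3 <- animals |> grepl(pattern = "at")

stopifnot(identical(r1, expected),
          identical(r2, expected),
          identical(r3, expected))

# Assumption: R >= 4.2.0. The `_` placeholder must be a named argument.
if (getRversion() >= "4.2.0") {
  r4 <- eval(parse(text = 'animals |> grepl("at", x = _)'))
  stopifnot(identical(r4, expected))
}
```

On R 4.1 the final block is simply skipped, so the first three workarounds remain the portable options there.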


The base R pipe |> added in R 4.1.0 "just" does function composition, i.e. a piped expression really is just the same as the nested function call:

> 1:5 |> sum()             # simple use of |>
[1] 15
> deparse(substitute( 1:5 |> sum() ))
[1] "sum(1:5)"
> 

That has some consequences:

  • it makes it a little faster
  • it makes it a little simpler and more robust
  • it makes it a little more restrictive: sum() here needs the parens for a proper call
  • it limits uses of the 'implicit' data argument
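
The restriction on the right-hand side can be seen without any packages. The following sketch parses two strings and shows that the native pipe rejects a bare function name at parse time, while the version with parentheses is rewritten into an ordinary call. The helper name `parses_ok` is made up for this illustration.

```r
# Hypothetical helper: TRUE if `code` parses, FALSE if the parser rejects it.
parses_ok <- function(code) {
  tryCatch({ parse(text = code); TRUE }, error = function(e) FALSE)
}

parses_ok("1:5 |> sum()")   # RHS is a call: accepted
parses_ok("1:5 |> sum")     # RHS is a bare name: rejected by the parser

# The accepted form is already a plain call once parsed.
expr <- parse(text = "1:5 |> sum()")[[1]]
identical(expr, quote(sum(1:5)))
eval(expr)
```

Because the rewrite happens in the parser, the failure for the bare name shows up before any code is evaluated.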

This leads to the possible use of =>, which is currently "available but not active": you need to set the environment variable _R_USE_PIPEBIND_ to enable it, and it may change for R 4.2.0.

(This was first offered as an answer to a duplicate of this question, and I copied it over here as suggested.)

Edit: As the follow-up question on 'what is =>' comes up, here is a quick follow-up. Note that this operator is subject to change.

> Sys.setenv("_R_USE_PIPEBIND_"=TRUE)
> mtcars |> subset(cyl == 4) |> d => lm(mpg ~ disp, data = d)

Call:
lm(formula = mpg ~ disp, data = subset(mtcars, cyl == 4))

Coefficients:
(Intercept)         disp  
     40.872       -0.135  

> deparse(substitute(mtcars |> subset(cyl==4) |> d => lm(mpg ~ disp, data = d)))
[1] "lm(mpg ~ disp, data = subset(mtcars, cyl == 4))"
> 

The deparse(substitute(...)) is particularly nice here: it shows that the whole pipeline, => binding included, collapses into a single nested call.
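
Since => is experimental and gated behind an environment variable, an anonymous function is one way to pass the piped value to a non-first argument without it. The sketch below uses only base R (4.1.0 or later) and the built-in mtcars data. It is an alternative to the => form above, not a description of how => is implemented, and the names `fit` and `fit_nested` are illustrative.

```r
# Pipe the subset into lm()'s `data` argument via an anonymous function.
fit <- mtcars |>
  subset(cyl == 4) |>
  (\(d) lm(mpg ~ disp, data = d))()

# Same model written as a nested call, for comparison.
fit_nested <- lm(mpg ~ disp, data = subset(mtcars, cyl == 4))

all.equal(coef(fit), coef(fit_nested))
```

The coefficients agree; only the recorded call differs, because the anonymous function sees its argument as d rather than as the subset() expression.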


The native pipe is implemented as a syntax transformation and so 2 |> sqrt() has no discernible overhead compared to sqrt(2), whereas 2 %>% sqrt() comes with a small penalty.

library(microbenchmark)
library(magrittr)  # provides %>%

microbenchmark(sqrt(1), 
               2 |> sqrt(), 
               3 %>% sqrt())
# Unit: nanoseconds
#          expr  min     lq    mean median   uq   max neval
#       sqrt(1)  117  126.5  141.66  132.0  139   246   100
#       sqrt(2)  118  129.0  156.16  134.0  145  1792   100
#  3 %>% sqrt() 2695 2762.5 2945.26 2811.5 2855 13736   100

You see how the expression 2 |> sqrt() passed to microbenchmark is parsed as sqrt(2). This can also be seen in

quote(2 |> sqrt())
# sqrt(2)
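
The same check extends to longer chains. A small sketch, using placeholder names `x`, `f`, `g` and `y` that are never evaluated because the expressions are only quoted:

```r
# A two-step pipeline is parsed into the corresponding nested call.
piped  <- quote(x |> f() |> g(y))
nested <- quote(g(f(x), y))
identical(piped, nested)
# [1] TRUE

# No trace of the pipe is left in the parsed expression.
"|>" %in% all.names(piped)
# [1] FALSE
```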