What are the differences between R's new native pipe `|>` and the magrittr pipe `%>%`?
Another difference between the two concerns placeholders for the piped-in value: `.` can be used as a placeholder in magrittr's pipe
c("dogs", "cats", "rats") %>% grepl("at", .)
#[1] FALSE TRUE TRUE
But this is not possible with R's native pipe.
c("dogs", "cats", "rats") |> grepl("at", .)
Error in grepl(c("dogs", "cats", "rats"), "at", .) : object '.' not found
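Because the native pipe is a parse-time rewrite, quoting the expression shows where this error comes from: the left-hand side is simply inserted as the first argument, and `.` is left as an ordinary, undefined symbol. This is a minimal sketch, assuming R >= 4.1.0 and a fresh session where no object named `.` exists; `expr` and `result` are illustrative names.

```r
# quote() captures the call after the parser has rewritten the pipe,
# without evaluating it
expr <- quote(c("dogs", "cats", "rats") |> grepl("at", .))
print(expr)  # grepl(c("dogs", "cats", "rats"), "at", .)

# The LHS is now the first argument; `.` is just an ordinary symbol
identical(expr[[4]], as.name("."))  # TRUE

# Evaluating it fails because nothing named `.` exists in this session
result <- tryCatch(eval(expr), error = function(e) conditionMessage(e))
result  # "object '.' not found"
```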
Here are different ways to work around this:
1. Write a separate function:
find_at = function(x) grepl("at", x)
c("dogs", "cats", "rats") |> find_at()
#[1] FALSE TRUE TRUE
2a. Use an anonymous function:
c("dogs", "cats", "rats") |> {function(x) grepl("at", x)}()
2b. Use the new anonymous function shorthand `\(x)`, added in R 4.1.0:
c("dogs", "cats", "rats") |> {\(x) grepl("at", x)}()
3. Specify the first parameter by name. This works because the native pipe inserts the left-hand side as the first argument of the call, and R matches named arguments before positional ones, so if you provide a name for the first parameter, the piped value "overflows" into the second (and so on if you specify more than one parameter by name).
c("dogs", "cats", "rats") |> grepl(pattern="at")
#> [1] FALSE TRUE TRUE
Examples 1 and 2 are taken from https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/
Example 3 is taken from https://mobile.twitter.com/rlangtip/status/1409904500157161477
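The "overflow" in workaround 3 can be checked directly: the pipe still inserts the left-hand side as the first positional argument, and ordinary argument matching does the rest. A minimal sketch, assuming R >= 4.1.0; the two-argument function `f` is hypothetical and made up only to show the matching, and `animals` is an illustrative name.

```r
# Hypothetical two-argument function, used only to show argument matching
f <- function(a, b) paste0("a=", a, ", b=", b)

# The pipe rewrites this at parse time to f("x", a = "A")
rewritten <- quote("x" |> f(a = "A"))
print(rewritten)  # f("x", a = "A")

# `a` is matched by name first, so the piped "x" falls through to `b`
overflow <- "x" |> f(a = "A")          # "a=A, b=x"
# If only `b` is named, "x" binds to `a` as usual
default_binding <- "x" |> f(b = "B")   # "a=x, b=B"

# Workarounds 1 and 3 from the list give the same result
animals <- c("dogs", "cats", "rats")
find_at <- function(x) grepl("at", x)
via_function <- animals |> find_at()
via_name <- animals |> grepl(pattern = "at")
identical(via_function, via_name)  # TRUE
```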
The base R pipe |> added in R 4.1.0 "just" does functional composition, i.e. we can see that its use really is just the same as the function call:
> 1:5 |> sum() # simple use of |>
[1] 15
> deparse(substitute( 1:5 |> sum() ))
[1] "sum(1:5)"
>
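The same holds for a chain of pipes: each step is nested into the next call when the code is parsed. A small sketch, assuming R >= 4.1.0; `chained` is an illustrative name.

```r
# Each |> step is rewritten into a nested call when the code is parsed
chained <- quote(1:5 |> rev() |> head(2))
print(chained)  # head(rev(1:5), 2)

# The piped form and the hand-written nested call are the same expression
identical(chained, quote(head(rev(1:5), 2)))  # TRUE
eval(chained)  # 5 4
```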
That has some consequences:
- it makes it a little faster
- it makes it a little simpler and more robust
- it makes it a little more restrictive: sum() here needs the parens for a proper call, which limits uses of the 'implicit' data argument
This leads to the possible use of =>, which is currently "available but not active" (to enable it you need to set the environment variable _R_USE_PIPEBIND_, and it may change for R 4.2.0).
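The restriction mentioned above, that sum() needs its parentheses, is enforced by the parser itself, so the sketch below parses the code from strings to keep the script itself valid. It assumes R >= 4.1.0; `bare` and `with_parens` are illustrative names.

```r
# A bare function name on the RHS of |> is rejected at parse time
bare <- tryCatch(parse(text = "1:5 |> sum"), error = function(e) e)
inherits(bare, "error")  # TRUE

# With the parentheses it is a proper call, rewritten to sum(1:5)
with_parens <- eval(parse(text = "1:5 |> sum()")[[1]])
with_parens  # 15

# magrittr's %>% also accepts the bare name, e.g. 1:5 %>% sum
```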
(This was first offered as answer to a question duplicating this over here and I just copied it over as suggested.)
Edit: As the follow-up question 'what is =>?' comes up, here is a quick follow-up. Note that this operator is subject to change.
> Sys.setenv("_R_USE_PIPEBIND_"=TRUE)
> mtcars |> subset(cyl == 4) |> d => lm(mpg ~ disp, data = d)
Call:
lm(formula = mpg ~ disp, data = subset(mtcars, cyl == 4))
Coefficients:
(Intercept) disp
40.872 -0.135
> deparse(substitute(mtcars |> subset(cyl==4) |> d => lm(mpg ~ disp, data = d)))
[1] "lm(mpg ~ disp, data = subset(mtcars, cyl == 4))"
>
The deparse(substitute(...)) output is particularly nice here.
The native pipe is implemented as a syntax transformation, and so 2 |> sqrt() has no discernible overhead compared to sqrt(2), whereas 2 %>% sqrt() comes with a small penalty.
library(microbenchmark)  # provides microbenchmark()
library(magrittr)        # provides %>%
microbenchmark(sqrt(1),
               2 |> sqrt(),
               3 %>% sqrt())
# Unit: nanoseconds
# expr min lq mean median uq max neval
# sqrt(1) 117 126.5 141.66 132.0 139 246 100
# sqrt(2) 118 129.0 156.16 134.0 145 1792 100
# 3 %>% sqrt() 2695 2762.5 2945.26 2811.5 2855 13736 100
You can see that the expression 2 |> sqrt() passed to microbenchmark is parsed as sqrt(2). This can also be seen in:
quote(2 |> sqrt())
# sqrt(2)
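For contrast, the magrittr version is not rewritten at all: it stays a call to the ordinary function `%>%`, which has to do its work at run time, and that is where the small penalty comes from. Since quote() does not evaluate anything, this sketch does not even need magrittr to be loaded; it assumes R >= 4.1.0, and `native` and `magrittr_style` are illustrative names.

```r
# quote() does not evaluate, so magrittr need not be attached here
native <- quote(2 |> sqrt())
magrittr_style <- quote(2 %>% sqrt())

deparse(native)          # "sqrt(2)"
deparse(magrittr_style)  # "2 %>% sqrt()"

# The magrittr expression is a call whose function is `%>%` itself
as.character(magrittr_style[[1]])  # "%>%"
```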