cartesian product with dplyr R
I'm trying to find the dplyr function for cartesian product. I've two simple data.frame with no common variable:
x <- data.frame(x=c("a","b","c"))
y <- data.frame(y=c(1,2,3))
I would like to reproduce the result of
merge(x,y)
x y
1 a 1
2 b 1
3 c 1
4 a 2
5 b 2
6 c 2
7 a 3
8 b 3
9 c 3
I've already looked for this (for example here or here) without finding anything useful.
Thank you very much
Use crossing from the tidyr
package:
x <- data.frame(x=c("a","b","c"))
y <- data.frame(y=c(1,2,3))
crossing(x, y)
Result:
x y
1 a 1
2 a 2
3 a 3
4 b 1
5 b 2
6 b 3
7 c 1
8 c 2
9 c 3
Apologies to all: the below example does not appear to work with data.frames or data.tables.
When x and y are database tbl
s (tbl_dbi
/ tbl_sql
) you can now also do:
full_join(x, y, by = character())
Added to dplyr at the end of 2017, and also gets translated to a CROSS JOIN
in the DB world. Saves the nastiness of having to introduce the fake variables.
If we need a tidyverse
output, we can use expand
from tidyr
library(tidyverse)
y %>%
expand(y, x= x$x) %>%
select(x,y)
# A tibble: 9 × 2
# x y
# <fctr> <dbl>
#1 a 1
#2 b 1
#3 c 1
#4 a 2
#5 b 2
#6 c 2
#7 a 3
#8 b 3
#9 c 3
When faced with this problem, I tend to do something like this:
x <- data.frame(x=c("a","b","c"))
y <- data.frame(y=c(1,2,3))
x %>% mutate(temp=1) %>%
inner_join(y %>% mutate(temp=1),by="temp") %>%
dplyr::select(-temp)
If x and y are multi-column data frames, but I want to do every combination of a row of x with a row of y, then this is neater than any expand.grid() option that I can come up with