Problem with mutate: seeking to fill each line of a column with samples that differ from one another
I want to create a dataframe in which one column contains a list made of a sample of 4 elements taken from a vector. I need all rows of this variable to contain a different sample.
This is a reproducible example.
library(dplyr)
vec <- LETTERS # The vector we will take a sample from
td <- tibble(id = 1:3)
td |> mutate(smpl = list(sample(vec, 4, replace = FALSE))) |> View()
What I would expect is a different list for each row of the td dataframe, such as:
id smpl
1 c("Q", "E", "A", "J")
2 c("Z", "A", "F", "T")
3 c("M", "V", "C", "L")
Instead, the same sample repeats line after line:
id smpl
1 c("Q", "E", "A", "J")
2 c("Q", "E", "A", "J")
3 c("Q", "E", "A", "J")
Any suggestions? I am especially interested in a solution using dplyr, but there is more than one right way of doing things in R.
Thank you!
Are you looking for something like this?
I assumed you wanted a single character string per row, because you were printing character vectors within quotes.
library(dplyr)
vec <- LETTERS # The vector we will take a sample from
td <- tibble(id = 1:3)
td %>% group_by(id) %>% mutate(smpl = base::paste(sample(vec,4),collapse = ""))
The result:
> td %>% group_by(id) %>% mutate(smpl = base::paste(sample(vec,4),collapse = ""))
# A tibble: 3 × 2
# Groups:   id [3]
     id smpl
  <int> <chr>
1     1 SETU
2     2 VCXW
3     3 JXIL
You need to replicate() for each row:
td |>
  mutate(smpl = replicate(n(), sample(vec, 4, replace = FALSE), simplify = FALSE)) |>
  as.data.frame() # merely to show it, otherwise tibbles would hide the contents
# id smpl
# 1 1 Q, E, A, J
# 2 2 D, R, Q, O
# 3 3 X, G, D, E
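The reason the original call repeats one sample is that sample() is evaluated only once for the whole column, and the resulting length-1 list is then recycled across all rows. Here is a minimal base-R sketch of that difference, assuming only the vec from the question (the names n, once and per_row are made up for illustration):

```r
vec <- LETTERS
n <- 3

# One call to sample(); rep() reuses that single draw n times,
# which mirrors how mutate() recycles a length-1 list.
once <- rep(list(sample(vec, 4)), n)

# replicate() re-evaluates the expression n times,
# so each element is an independent draw.
per_row <- replicate(n, sample(vec, 4), simplify = FALSE)

identical(once[[1]], once[[2]])  # TRUE: the same draw is reused
lengths(per_row)                 # each element holds 4 letters
```

With simplify = FALSE, replicate() returns a plain list, which is what a list column needs.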
Both answers work very well, thank you. I didn't know about replicate(), which is a wrapper around sapply(), and it definitely works. However, group_by(id) is more in keeping with the logic of dplyr, so I marked that solution as the answer to this question.
Here is my final code (keeping sample items grouped in a list within one dataframe column, which is what I wanted):
td |>
  group_by(id) |>
  mutate(smpl = list(sample(vec, 4, replace = FALSE))) |>
  as.data.frame()
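One caveat: group_by(id) gives one sample per row only because every id here is unique. If ids repeated, rows sharing an id would also share a sample. A possible alternative that does not depend on id being unique is dplyr's rowwise(), sketched below with ungroup() to drop the row-wise grouping afterwards (the name res is just for illustration):

```r
library(dplyr)

vec <- LETTERS # The vector we will take a sample from
td <- tibble(id = 1:3)

res <- td |>
  rowwise() |>                                             # treat each row as its own group
  mutate(smpl = list(sample(vec, 4, replace = FALSE))) |>  # one fresh draw per row
  ungroup()                                                # back to a plain, ungrouped tibble

res$smpl # a list of 3 character vectors, each of length 4
```

Whether this reads better than group_by(id) is a matter of taste; for a table with unique ids, both should produce one independent sample per row.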
Thank you, Au p and r2evans!
All the best.