How to separate comma separated values in R in a new row?
You could try cSplit
from my "splitstackshape" package:
library(splitstackshape)
cSplit(as.data.table(mydf)[, col2 := gsub("[][\"]", "", col2)],
"col2", ",", "long")
# col1 col2
# 1: a 1
# 2: a 2
# 3: a 3
# 4: b 1
# 5: b 2
# 6: c 4
Of course, I'm highly partial to cSplit
, but you can also use "dplyr" and unnest
from "tidyr":
library(dplyr)
library(tidyr)
mydf %>%
mutate(col2 = strsplit(gsub("[][\"]", "", col2), ",")) %>%
unnest(col2)
Or just with "data.table":
library(data.table)
as.data.table(mydf)[, list(
col2 = unlist(strsplit(gsub("[][\"]", "", col2), ","))),
by = col1]
The separate_rows()
function in tidyr
is the boss for observations with multiple delimited values. As you have a mix of integer and character strings (but just want integers in the final result, set convert = TRUE
and use the drop_na()
(also in tidyr
) to filter out the new rows for where the square parenthesis would otherwise go.
# create data
library(tidyverse)
d <- data_frame(
col1 = c("a", "b", "c"),
col2 = c("1,2,3", "[\"1\",\"2\"]", 4)
)
d
# # A tibble: 3 x 2
# col1 col2
# <chr> <chr>
# 1 a 1,2,3
# 2 b "[\"1\",\"2\"]"
# 3 c 4
# tidy data
d %>%
separate_rows(col2, convert = TRUE) %>%
drop_na()
# # A tibble: 6 x 2
# col1 col2
# <chr> <int>
# 1 a 1
# 2 a 2
# 3 a 3
# 4 b 1
# 5 b 2
# 6 c 4