How to separate IDs into different rows using R
We may use separate_rows with a regex lookaround, i.e. split at the ; followed by a space, but only where that "; " comes right after a closing bracket (]) and right before an upper case letter. A "; " inside brackets, or one followed by a digit, is left alone.
library(tidyr)
separate_rows(df1, NEW.ID, sep = "(?<=\\]); (?=[A-Z])")
-output
# A tibble: 5 × 1
  NEW.ID
  <chr>
1 P02538 [551-559]
2 P04259 [551-559]
3 A0A0B4J2F2 1xPhospho [T473]
4 Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]
5 A0A1B0GX95 2xPhospho [S24; S26]
data
df1 <- structure(list(NEW.ID = c("P02538 [551-559]; P04259 [551-559]",
"A0A0B4J2F2 1xPhospho [T473]", "Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]",
"A0A1B0GX95 2xPhospho [S24; S26]")), class = "data.frame",
row.names = c(NA, -4L))
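
To see what the lookarounds match, the same pattern can be tried on plain strings with base R's strsplit. This is only a rough sketch and not the approach used in the answer: it assumes perl = TRUE (base R needs it for lookarounds), and the vector x is a hypothetical sample copied from the data.

```r
# Hypothetical sample strings, copied from the data in this answer
x <- c("P02538 [551-559]; P04259 [551-559]",
       "Q8IVF2 1xPhospho [S1253]; 1xPhospho [S1748]",
       "A0A1B0GX95 2xPhospho [S24; S26]")

# Split at "; " only when it is preceded by "]" and followed by an upper case letter.
# perl = TRUE enables the lookbehind (?<=...) and lookahead (?=...) in base R.
parts <- strsplit(x, "(?<=\\]); (?=[A-Z])", perl = TRUE)

# Number of pieces per input string
lengths(parts)
```

Only the first string is split, into two pieces. In the second, the "; " is followed by a digit, and in the third it sits inside the brackets, so both stay whole.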