Detect new values when comparing existing variables in a dataframe and add them in a new variable in R
col1 | col2 |
---|---|
First,Second,Other | row,First |
Second,Other,Other2 | row,Second |
I would like to create a new column with the values that are in col1 and not in col2:
col1 | col2 | col3 |
---|---|---|
First,Second,Other | row,First | Second,Other |
Second,Other,Other2 | row,Second | Other,Other2 |
And what if the separator is "||" instead of ","?
Loop over the rows, split both strings, take the set difference, and finally paste the result back together:
d$col3 <- apply(d, 1, function(i) {
  paste(setdiff(unlist(strsplit(i[1], ",")),
                unlist(strsplit(i[2], ","))),
        collapse = ",")
})
d
# col1 col2 col3
# 1 First,Second,Other row,First Second,Other
# 2 Second,Other,Other2 row,Second Other,Other2
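For a self-contained run, the example data can be rebuilt roughly as below. The construction of d (including stringsAsFactors = FALSE) is an assumption, since the question does not show how the data frame was created. The sketch also uses mapply as an alternative to apply, which avoids converting the data frame to a character matrix first; it is one possible variant, not the only way to do it.

```r
# Assumed reconstruction of the example data from the question
d <- data.frame(
  col1 = c("First,Second,Other", "Second,Other,Other2"),
  col2 = c("row,First", "row,Second"),
  stringsAsFactors = FALSE
)

# strsplit is vectorised: it returns one character vector per row
s1 <- strsplit(d$col1, ",", fixed = TRUE)
s2 <- strsplit(d$col2, ",", fixed = TRUE)

# Row-wise set difference, pasted back into a single string per row
d$col3 <- mapply(
  function(a, b) paste(setdiff(a, b), collapse = ","),
  s1, s2,
  USE.NAMES = FALSE
)

d$col3
```

Note that setdiff also drops duplicates, so a value repeated within col1 would appear only once in col3.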
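To split on "||" instead, change the strsplit calls in the code above to use split = "||" with fixed = TRUE. Without fixed = TRUE the pattern is treated as a regular expression, where | means alternation, so the string would not be split as intended: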
#example for ||
strsplit("First||Second||Other", split = "||", fixed = TRUE)
# [[1]]
# [1] "First" "Second" "Other"
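Putting this together, a minimal sketch of the full "||" version might look like the following. The data frame d2 and the variable sep are hypothetical names introduced here for illustration; the data simply mirrors the question's example with the separator swapped.

```r
# Hypothetical data using "||" as the separator
d2 <- data.frame(
  col1 = c("First||Second||Other", "Second||Other||Other2"),
  col2 = c("row||First", "row||Second"),
  stringsAsFactors = FALSE
)

sep <- "||"

# Same approach as above; fixed = TRUE makes strsplit treat "||" literally
d2$col3 <- apply(d2, 1, function(i) {
  paste(setdiff(unlist(strsplit(i[1], sep, fixed = TRUE)),
                unlist(strsplit(i[2], sep, fixed = TRUE))),
        collapse = sep)
})

d2$col3
```

Keeping the separator in one variable means the same value is used for both splitting and pasting, so col3 comes back in the same format as the input columns.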