Negation of gsub | Replace everything except strings in a certain vector
I have a vector of strings:
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")
I want to keep only three possible values in this vector: N, A, and NA.
Therefore, I want to replace any element that is NOT N or A with NA.
How can I achieve this?
I have tried the following:
gsub(ve, pattern = '[^NA]+', replacement = 'NA')
gsub(ve, pattern = '[^N|^A]+', replacement = 'NA')
But these don't work well, because gsub() replaces every matching substring inside each element rather than the element as a whole. So in some cases I end up with strings like NNANNANAA instead of simply NA.
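A minimal base-R sketch of the failure, using three elements taken from ve: each run of characters other than N or A is replaced in place, so mixed elements are spliced rather than replaced, while elements with no N or A at all collapse to a single "NA".

```r
# Three elements from ve: one mixed, one made only of N/A, one with neither
x <- c("NFNFNAA", "ANN", "parnot")

# gsub() rewrites each matching substring, not the element as a whole
res <- gsub("[^NA]+", "NA", x)
print(res)
# "NNANNANAA" "ANN" "NA"
```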
Solution 1:
Use a negative lookahead assertion.
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN", "parnot", "important", "notall")
sub("^(?![NA]$).*", "NA", ve, perl = TRUE)
# [1] "N" "A" "A" "A" "N" "NA" "NA" "NA" "NA" "N" "A" "NA" "NA" "NA" "NA"
^(?![NA]$) asserts that, right after the start ^, the string is NOT a single letter from [NA] (either N or A) followed by the end of the line $.
.* then matches all characters.
So the regex above matches any string except the strings N and A, and sub() replaces each such string with NA.
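To see which elements the pattern actually matches, the same regex can be passed to grepl() (a sketch assuming base R; grepl() accepts perl = TRUE just like sub()). Everything it flags is overwritten by sub(); only elements that are exactly N or A survive.

```r
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN",
        "parnot", "important", "notall")

# TRUE wherever the whole pattern matches, i.e. wherever sub() would
# overwrite the element with "NA"
to_replace <- grepl("^(?![NA]$).*", ve, perl = TRUE)

# The only survivors are the elements that are exactly "N" or "A"
kept <- ve[!to_replace]
print(kept)
# "N" "A" "A" "A" "N" "N" "A"
```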
Solution 2:
If we are looking for exact (fixed) matches, use %in% with the negation ! and assign 'NA' to the elements that don't match:
ve[!ve %in% c("A", "N", "NA")] <- 'NA'
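A quick check, assuming the same ve, that the %in% version gives the same vector as the lookahead regex. Because %in% compares whole elements exactly, there is no partial-match problem to guard against. Note that ! binds more loosely than %in%, so !ve %in% x is read as !(ve %in% x).

```r
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN",
        "parnot", "important", "notall")

# Regex approach applied to a copy, for comparison
via_regex <- sub("^(?![NA]$).*", "NA", ve, perl = TRUE)

# Exact matching with %in%: no regex involved
ve[!ve %in% c("A", "N", "NA")] <- "NA"

print(identical(ve, via_regex))
# TRUE
```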
Note that in R, the missing value is the unquoted NA, not the quoted string "NA". I assume "NA" is meant as a separate category here; if so, I would advise renaming it to avoid confusion when the data is parsed later.
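A small base-R illustration of why the naming matters: the string "NA" is ordinary data, while NA_character_ is a missing value. read.csv() uses na.strings = "NA" by default, so a category spelled NA in an unquoted text file is read back as missing (the inline CSV text here is made up for the example).

```r
# The string "NA" is data; NA_character_ is a missing value
x <- c("N", "NA", NA_character_)
print(is.na(x))
# FALSE FALSE TRUE

# With the default na.strings = "NA", the text NA is parsed as missing
df <- read.csv(text = "code\nN\nNA\nA")
print(is.na(df$code))
# FALSE TRUE FALSE
```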
Solution 3:
Here is an alternative regex solution, which sets the other elements to a real missing value instead of the string "NA":
ve[!grepl("^[NA]$", ve)] <- NA_character_
You should probably still go with Akrun's solution.
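One detail worth checking in this approach: inside square brackets | is not alternation, so [N|A] is a character class that also accepts a literal |, while ^[NA]$ accepts exactly one N or one A. The sketch below (base R only) shows the difference and confirms that the eight non-matching elements become true missing values.

```r
ve <- c("N","A","A","A","N","ANN","NA","NFNFNAA","23","N","A","NN",
        "parnot", "important", "notall")

# "|" inside [...] is a literal character, not "or"
print(grepl("^[N|A]$", "|"))
# TRUE
print(grepl("^[NA]$", "|"))
# FALSE

# Keep elements that are exactly "N" or "A"; everything else becomes missing
ve[!grepl("^[NA]$", ve)] <- NA_character_
print(sum(is.na(ve)))
# 8
```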