Replacing values of a string variable with stringr

Solution 1:

There are two things you need to change in your code to get the desired output. The first is the one @Emax mentioned: escape the parentheses with double backslashes (\\( and \\)), because str_replace_all treats its patterns as regular expressions. The second is the order of the replacements: str_replace_all applies a named vector of replacements one after another, so an earlier replacement can change the string before a later pattern sees it. That is why, in your original code, "androstenedione \\(A\\)" does not get replaced by "Androstenedione": the replacement "androstenedione" = "Androstenedione" runs first and capitalizes the string, so the case-sensitive pattern "androstenedione \\(A\\)" no longer matches. A simple fix is to list the most specific patterns (e.g., "androstenedione \\(A\\)") before the more general ones (e.g., "androstenedione").

library(stringr)
df$new_var1 <- str_replace_all(df$Var1,
                               c(#16OHE1
                                 "16a-OH E1" = "16-OHE1", 
                                 "16a-OHE1" = "16-OHE1", 
                                 "16OHE" = "16-OHE1",
                                 #17Beta estradiol
                                 "17-b-estradiol" = "17-b-estradiol",
                                 "17b estradiol"= "17-b-estradiol",
                                 #Androstenedione (most specific pattern first)
                                 "androstenedione  \\(A\\)" = "Androstenedione",
                                 "androstenedione" = "Androstenedione",
                                 "Androstenedione" = "Androstenedione",
                                 #2-OHE-1
                                 "2-OHE-1" = "2-OHE-1",
                                 "2-hydroxy \\(OH\\) E1" = "2-OHE-1")
)
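
To make the two points concrete, here is a minimal sketch on a made-up two-element vector. The vector x and its values are hypothetical, not taken from the question's data, and single spaces are used for simplicity. It only relies on str_detect and str_replace_all from stringr.

```r
library(stringr)

# Hypothetical toy input, not the asker's data
x <- c("androstenedione (A)", "androstenedione")

# Escaping: an unescaped "(A)" is a regex group that matches a bare "A"
unescaped_hit <- str_detect(x[1], "androstenedione (A)")
escaped_hit   <- str_detect(x[1], "androstenedione \\(A\\)")

# General pattern first: it capitalizes the string, so the
# case-sensitive specific pattern no longer matches afterwards
general_first <- str_replace_all(
  x,
  c("androstenedione" = "Androstenedione",
    "androstenedione \\(A\\)" = "Androstenedione")
)

# Specific pattern first: both variants collapse to the same label
specific_first <- str_replace_all(
  x,
  c("androstenedione \\(A\\)" = "Androstenedione",
    "androstenedione" = "Androstenedione")
)

print(c(unescaped = unescaped_hit, escaped = escaped_hit))
print(general_first)
print(specific_first)
```

Here general_first still contains "Androstenedione (A)" for the first element, while specific_first gives "Androstenedione" for both.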

Solution 2:

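Here's an approach with agrep (fuzzy matching) that does not require escaping any parentheses, because agrep treats the pattern as a literal string by default (fixed = TRUE). If other examples need it, you can tune how many insertions, deletions and substitutions are allowed through agrep's max.distance argument.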

# Replacements: names are the variants to look for, values are the standardized labels
repl <- c(`16a-OH E1` = "16-OHE1", `16a-OHE1` = "16-OHE1", `16OHE` = "16-OHE1", 
`17-b-estradiol` = "17-b-estradiol", `17b estradiol` = "17-b-estradiol", 
androstenedione = "Androstenedione", Androstenedione = "Androstenedione", 
`Androstenedione  (A)` = "Androstenedione", `2-OHE-1` = "2-OHE-1", 
`2-hydroxy (OH) E1` = "2-OHE-1")

# For each value, use the first entry in repl whose name fuzzy-matches it;
# keep the original value when nothing matches
df$new_var1 <- sapply(seq_along(df$Var1), function(x) {
  re <- repl[agrep(df$Var1[x], names(repl))][1]
  ifelse(is.na(re), df$Var1[x], re)
})

df$new_var1
 [1] "16-pathway"                               
 [2] "16-OHE1"                                  
 [3] "16-OHE1"                                  
 [4] "16-OHE1"                                  
 [5] "17-b-estradiol"                           
 [6] "17-OH-progesterone"                       
 [7] "17-OH-progesterone/ androstenedione ratio"
 [8] "17b-HSD (rs2830A)"                        
 [9] "17b-HSD (rs592389 G)"                     
[10] "17b-HSD (rs615492 G)"                     
[11] "17b-HSD (rs615942 G)"                     
[12] "17-b-estradiol"                           
[13] "17OH-progesterone"                        
[14] "2-OHE-1"                                  
[15] "2-OHE-1"                                  
[16] "2-OHE-1"                                  
[17] "2-pathway"                                
[18] "2:16 OHE ratio"                           
[19] "2:16 pathway ratio"                       
[20] "16-OHE1"                                  
[21] "2:16OHE"                                  
[22] "16-OHE1"                                  
[23] "Adiponectin"                              
[24] "Androstenedione"                          
[25] "Androstenedione"                          
[26] "Androstenedione"
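
To get a feel for how agrep decides what counts as a match, the sketch below runs it on a made-up two-element vector (variants is hypothetical, not the question's data). It uses only base R's agrep with its max.distance argument and its default fixed = TRUE.

```r
# Hypothetical lookup names, not the asker's data
variants <- c("17b estradiol", "Androstenedione  (A)")

# No edits allowed: "17b-estradiol" differs by one character, so nothing matches
strict <- agrep("17b-estradiol", variants, max.distance = 0)

# Default tolerance (max.distance = 0.1, a fraction of the pattern length):
# the single substitution "-" vs " " is accepted
fuzzy <- agrep("17b-estradiol", variants)

# fixed = TRUE is the default, so parentheses are matched literally
literal <- agrep("(A)", variants, max.distance = 0)

print(strict)
print(fuzzy)
print(literal)
```

Because the default tolerance grows with the pattern length, fuzzy matching can also pick up values you did not intend to recode, so it is worth checking the result against the original column.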

Solution 3:

In str_replace_all you need to escape ( and ) by putting a double backslash (\\) in front of each one. Try the code below:

df$new_var1 <- str_replace_all(df$Var1,
                               c(#16OHE1
                                 "16a-OH E1" = "16-OHE1", 
                                 "16a-OHE1" = "16-OHE1", 
                                 "16OHE" = "16-OHE1",
                                 "17-b-estradiol" = "17-b-estradiol",
                                 "17b estradiol"= "17-b-estradiol",
                                 # replacements run in order, so the most specific pattern goes first
                                 "androstenedione  \\(A\\)" = "Androstenedione",
                                 "androstenedione" = "Androstenedione",
                                 "Androstenedione" = "Androstenedione",
                                 "2-OHE-1" = "2-OHE-1",
                                 "2-hydroxy \\(OH\\) E1" = "2-OHE-1"))