Conditional merge/replacement in R
Use match(), assuming the key values in df1$x1 are unique:
df1 <- data.frame(x1=1:4,x2=letters[1:4],stringsAsFactors=FALSE)
df2 <- data.frame(x1=2:3,x2=c("zz","qq"),stringsAsFactors=FALSE)
df1$x2[match(df2$x1,df1$x1)] <- df2$x2
> df1
  x1 x2
1  1  a
2  2 zz
3  3 qq
4  4  d
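One caveat with the match() approach: if df2 contains a key that does not occur in df1, match() returns NA for it and the subscripted assignment fails. A minimal sketch of a guard, assuming you simply want to skip such rows (the extra key 9 and the names idx and hit are made up for illustration):

```r
df1 <- data.frame(x1 = 1:4, x2 = letters[1:4], stringsAsFactors = FALSE)
# Hypothetical df2 with one key (9) that is absent from df1
df2 <- data.frame(x1 = c(2L, 3L, 9L), x2 = c("zz", "qq", "unused"),
                  stringsAsFactors = FALSE)

idx <- match(df2$x1, df1$x1)      # row of df1 for each row of df2, NA if absent
hit <- !is.na(idx)                # rows of df2 that have a partner in df1
df1$x2[idx[hit]] <- df2$x2[hit]   # replace only where a match exists
```

Rows of df2 without a partner in df1 are silently ignored here; whether that is the right behaviour depends on your data.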
If the values aren't unique, loop over the rows of df2 instead:
# For each row of df2, overwrite x2 in every row of df1 with the same key
for (id in seq_len(nrow(df2))) {
  df1$x2[df1$x1 %in% df2$x1[id]] <- df2$x2[id]
}
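The same replacement can probably be done without the loop by reversing the arguments to match(), so that each row of df1 looks up its partner in df2. This is only a sketch, and it assumes the keys in df2$x1 are unique (match() returns the first hit, whereas the loop lets the last duplicate win). The sample data with a repeated key and the names idx and hit are made up for illustration:

```r
# Hypothetical df1 in which the key 2 occurs twice
df1 <- data.frame(x1 = c(1L, 2L, 2L, 3L, 4L),
                  x2 = c("a", "b", "b2", "c", "d"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x1 = 2:3, x2 = c("zz", "qq"), stringsAsFactors = FALSE)

idx <- match(df1$x1, df2$x1)      # row of df2 for each row of df1, NA if none
hit <- !is.na(idx)                # rows of df1 that need replacing
df1$x2[hit] <- df2$x2[idx[hit]]   # duplicated keys in df1 are all updated
```

Because the lookup runs once over df1 rather than once per row of df2, it should behave better than the loop on larger inputs.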
The first part of Joris' answer is good, but when df1 contains non-unique values, the row-wise for-loop will not scale well on large data.frames.
You could use a data.table "update join" to modify df1 in place, which will be quite fast:
library(data.table)
setDT(df1); setDT(df2)
df1[df2, on = .(x1), x2 := i.x2]
Or, assuming you don't care about maintaining row order, you could use SQL-inspired dplyr:
library(dplyr)
# assumes df1 and df2 are plain data.frames here
union_all(
  inner_join(df1["x1"], df2, by = "x1"),  # x1 from df1 with matches in df2, x2 from df2
  anti_join(df1, df2["x1"], by = "x1")    # rows of df1 with no match in df2
) # %>% arrange(x1) # optional, won't maintain an arbitrary row order
Either of these will scale much better than the row-wise for-loop.
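If row order does matter and a recent dplyr (1.0.0 or later) is available, rows_update() may be worth a look as well. A minimal sketch using the same sample data as the question; as far as I know it expects the keys in df2 to be unique and, by default, to all exist in df1. The name res is just for illustration:

```r
library(dplyr)

df1 <- data.frame(x1 = 1:4, x2 = letters[1:4], stringsAsFactors = FALSE)
df2 <- data.frame(x1 = 2:3, x2 = c("zz", "qq"), stringsAsFactors = FALSE)

# Overwrite x2 in df1 wherever x1 matches a row of df2;
# returns a new data frame, df1 itself is left unchanged
res <- rows_update(df1, df2, by = "x1")
```

Unlike the union_all() approach, the result keeps the rows in the order they had in df1.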