How to overwrite entries in a data frame by entries from a smaller dataframe?
I am trying to join two dataframes. The smaller is a subset of the larger, with updated values. I wish to keep all rows and columns in the larger dataframe, but overwrite values with the values in the smaller where the row ID and column correspond.
I can't see that any of the normal dplyr
or base
join operations (join, right, outer, inner) can easily achieve this. I am therefore looking for a join function/operation that can achieve what I want.
df1 <- structure(list(
ID = as.factor(c(1,2,5,6)),
Sepal.Width = c(4.5, 7, 3.2, 3.1),
Petal.Length = c(1.8, 2.4, 3.3, 6.5),
Petal.Width = c(1.2, 7.2, 3.2, 3.2)), row.names = c(NA,
4L), class = "data.frame")
df2 <- cbind(data.frame(ID = as.factor(1:10)), iris[1:10, 1:5])
A data.frame: 4 × 4
ID Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl>
1 1 4.5 1.8 1.2
2 2 7.0 2.4 7.2
3 5 3.2 3.3 3.2
4 6 3.1 6.5 3.2
A data.frame: 10 × 6
ID Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<fct> <dbl> <dbl> <dbl> <dbl> <fct>
1 1 5.1 3.5 1.4 0.2 setosa
2 2 4.9 3.0 1.4 0.2 setosa
3 3 4.7 3.2 1.3 0.2 setosa
4 4 4.6 3.1 1.5 0.2 setosa
5 5 5.0 3.6 1.4 0.2 setosa
6 6 5.4 3.9 1.7 0.4 setosa
7 7 4.6 3.4 1.4 0.3 setosa
8 8 5.0 3.4 1.5 0.2 setosa
9 9 4.4 2.9 1.4 0.2 setosa
10 10 4.9 3.1 1.5 0.1 setosa
I want to merge these into one:
A data.frame: 10 × 6
ID Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<fct> <dbl> <dbl> <dbl> <dbl> <fct>
1 1 5.1 4.5 1.8 1.2 setosa #<-- Updated rows
2 2 4.9 7.0 2.4 7.2 setosa #<-- Updated rows
3 3 4.7 3.2 1.3 0.2 setosa
4 4 4.6 3.1 1.5 0.2 setosa
5 5 5.0 3.2 3.3 3.2 setosa #<-- Updated rows
6 6 5.4 3.1 6.5 3.2 setosa #<-- Updated rows
7 7 4.6 3.4 1.4 0.3 setosa
8 8 5.0 3.4 1.5 0.2 setosa
9 9 4.4 2.9 1.4 0.2 setosa
10 10 4.9 3.1 1.5 0.1 setosa
# Î Î Î
# Updated columns
Solution 1:
Have you tried the (relatively) new function rows_update
from dplyr
which does this.
library(dplyr)
df2 %>% rows_update(df1, by = 'ID')
# ID Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 1 5.1 4.5 1.8 1.2 setosa
#2 2 4.9 7.0 2.4 7.2 setosa
#3 3 4.7 3.2 1.3 0.2 setosa
#4 4 4.6 3.1 1.5 0.2 setosa
#5 5 5.0 3.2 3.3 3.2 setosa
#6 6 5.4 3.1 6.5 3.2 setosa
#7 7 4.6 3.4 1.4 0.3 setosa
#8 8 5.0 3.4 1.5 0.2 setosa
#9 9 4.4 2.9 1.4 0.2 setosa
#10 10 4.9 3.1 1.5 0.1 setosa
Solution 2:
we can also use {powerjoin}
library(powerjoin)
power_left_join(df2, df1, by = "ID", conflict = coalesce_yx)
#> ID Sepal.Length Species Sepal.Width Petal.Length Petal.Width
#> 1 1 5.1 setosa 4.5 1.8 1.2
#> 2 2 4.9 setosa 7.0 2.4 7.2
#> 3 3 4.7 setosa 3.2 1.3 0.2
#> 4 4 4.6 setosa 3.1 1.5 0.2
#> 5 5 5.0 setosa 3.2 3.3 3.2
#> 6 6 5.4 setosa 3.1 6.5 3.2
#> 7 7 4.6 setosa 3.4 1.4 0.3
#> 8 8 5.0 setosa 3.4 1.5 0.2
#> 9 9 4.4 setosa 2.9 1.4 0.2
#> 10 10 4.9 setosa 3.1 1.5 0.1
It moves the conflicted columns to the end though