How to overwrite entries in a data frame by entries from a smaller dataframe?

r dplyr

I am trying to join two dataframes. The smaller is a subset of the larger, with updated values. I wish to keep all rows and columns in the larger dataframe, but overwrite values with the values in the smaller where the row ID and column correspond.

I can't see that any of the normal dplyr or base join operations (join, right, outer, inner) can easily achieve this. I am therefore looking for a join function/operation that can achieve what I want.

df1 <- structure(list(
    ID = as.factor(c(1,2,5,6)),
    Sepal.Width = c(4.5, 7, 3.2, 3.1), 
    Petal.Length = c(1.8, 2.4, 3.3, 6.5), 
    Petal.Width = c(1.2, 7.2, 3.2, 3.2)), row.names = c(NA, 
4L), class = "data.frame")
df2 <- cbind(data.frame(ID = as.factor(1:10)), iris[1:10, 1:5])

A data.frame: 4 × 4
ID  Sepal.Width Petal.Length    Petal.Width
<fct>   <dbl>   <dbl>   <dbl>
1   1   4.5 1.8 1.2
2   2   7.0 2.4 7.2
3   5   3.2 3.3 3.2
4   6   3.1 6.5 3.2

A data.frame: 10 × 6
ID  Sepal.Length    Sepal.Width Petal.Length    Petal.Width Species
<fct>   <dbl>   <dbl>   <dbl>   <dbl>   <fct>
1   1   5.1 3.5 1.4 0.2 setosa  
2   2   4.9 3.0 1.4 0.2 setosa  
3   3   4.7 3.2 1.3 0.2 setosa
4   4   4.6 3.1 1.5 0.2 setosa  
5   5   5.0 3.6 1.4 0.2 setosa  
6   6   5.4 3.9 1.7 0.4 setosa  
7   7   4.6 3.4 1.4 0.3 setosa
8   8   5.0 3.4 1.5 0.2 setosa
9   9   4.4 2.9 1.4 0.2 setosa
10  10  4.9 3.1 1.5 0.1 setosa

I want to merge these into one:

A data.frame: 10 × 6
ID  Sepal.Length    Sepal.Width Petal.Length    Petal.Width Species
<fct>   <dbl>   <dbl>   <dbl>   <dbl>   <fct>
1   1   5.1 4.5 1.8 1.2 setosa  #<-- Updated rows
2   2   4.9 7.0 2.4 7.2 setosa  #<-- Updated rows
3   3   4.7 3.2 1.3 0.2 setosa
4   4   4.6 3.1 1.5 0.2 setosa
5   5   5.0 3.2 3.3 3.2 setosa  #<-- Updated rows
6   6   5.4 3.1 6.5 3.2 setosa  #<-- Updated rows
7   7   4.6 3.4 1.4 0.3 setosa
8   8   5.0 3.4 1.5 0.2 setosa
9   9   4.4 2.9 1.4 0.2 setosa
10  10  4.9 3.1 1.5 0.1 setosa
           # Î   Î   Î
           # Updated columns

Solution 1:

Have you tried the (relatively) new function rows_update from dplyr which does this.

library(dplyr)

df2 %>% rows_update(df1, by = 'ID')

#   ID Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1   1          5.1         4.5          1.8         1.2  setosa
#2   2          4.9         7.0          2.4         7.2  setosa
#3   3          4.7         3.2          1.3         0.2  setosa
#4   4          4.6         3.1          1.5         0.2  setosa
#5   5          5.0         3.2          3.3         3.2  setosa
#6   6          5.4         3.1          6.5         3.2  setosa
#7   7          4.6         3.4          1.4         0.3  setosa
#8   8          5.0         3.4          1.5         0.2  setosa
#9   9          4.4         2.9          1.4         0.2  setosa
#10 10          4.9         3.1          1.5         0.1  setosa

Solution 2:

we can also use {powerjoin}

library(powerjoin)
power_left_join(df2, df1, by = "ID", conflict = coalesce_yx)
#>    ID Sepal.Length Species Sepal.Width Petal.Length Petal.Width
#> 1   1          5.1  setosa         4.5          1.8         1.2
#> 2   2          4.9  setosa         7.0          2.4         7.2
#> 3   3          4.7  setosa         3.2          1.3         0.2
#> 4   4          4.6  setosa         3.1          1.5         0.2
#> 5   5          5.0  setosa         3.2          3.3         3.2
#> 6   6          5.4  setosa         3.1          6.5         3.2
#> 7   7          4.6  setosa         3.4          1.4         0.3
#> 8   8          5.0  setosa         3.4          1.5         0.2
#> 9   9          4.4  setosa         2.9          1.4         0.2
#> 10 10          4.9  setosa         3.1          1.5         0.1

It moves the conflicted columns to the end though

How to overwrite entries in a data frame by entries from a smaller dataframe?

Solution 1:

Solution 2:

Related

Recent Posts