Merge R data frame or data table and overwrite values of multiple columns

I'd probably put the data in long form and drop dupes:

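library(data.table)

# key columns of dt_1 (id and date in the output below)
k = key(dt_1)
DTList = list(dt_1, dt_2)

# stack both tables in long form: one row per key/variable pair
DTLong = rbindlist(lapply(DTList, function(x) melt(x, id.vars = k)))
# sort on all columns with NAs last, so a non-missing value leads each group
setorder(DTLong, na.last = TRUE)
# keep the first row per key/variable pair
unique(DTLong, by = c(k, "variable"))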

    id       date variable value
1: abc 2018-01-01        a     3
2: abc 2018-01-01        b     5
3: abc 2018-01-01        c     4
4: abc 2018-01-01        d     6
5: abc 2018-01-01        e    NA
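
If you want the result back in wide form, a dcast on the deduplicated long table should do it. This is only a sketch: the sample dt_1 and dt_2 below are made up to be consistent with the output above, not taken from the question.

```r
library(data.table)

# Hypothetical sample data, chosen to match the output shown above
dt_1 <- data.table(id = "abc", date = "2018-01-01",
                   a = 3, b = NA_real_, c = 4, d = 6, e = NA_real_)
dt_2 <- data.table(id = "abc", date = "2018-01-01",
                   b = 5, e = NA_real_)
setkey(dt_1, id, date)

k <- key(dt_1)

# Long form, NAs sorted last, first row kept per key/variable pair
DTLong <- rbindlist(lapply(list(dt_1, dt_2), function(x) melt(x, id.vars = k)))
setorder(DTLong, na.last = TRUE)
DTLong <- unique(DTLong, by = c(k, "variable"))

# Back to one row per key, one column per variable
DTWide <- dcast(DTLong, id + date ~ variable, value.var = "value")
print(DTWide)
```

Note that the sort is on all columns, so when both tables hold different non-missing values for the same cell, the smaller one is kept rather than the one from a particular table.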

You can do this by using dplyr::coalesce, which will return the first non-missing value from vectors.
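
To make the behaviour concrete, here is a small sketch on two plain vectors (the values are made up for illustration):

```r
library(dplyr)

x <- c(3, NA, 4, NA)
y <- c(9, 5, NA, NA)

# Position by position, take x unless it is NA, then fall back to y
filled <- coalesce(x, y)
print(filled)
```

The value from x wins wherever it is present, y only fills the gaps, and a position stays NA when both are missing.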

(EDIT: you can also use dplyr::coalesce directly on the data frames, so there is no need to create the function below. I've left it here for completeness, as a record of the original answer.)

Credit where it's due: this code is mostly from this blog post. It builds a function that takes two data frames and does what you need, taking values from the x data frame when they are present and falling back to y otherwise.

coalesce_join <- function(x, 
                          y, 
                          by, 
                          suffix = c(".x", ".y"), 
                          join = dplyr::full_join, ...) {
    joined <- join(x, y, by = by, suffix = suffix, ...)
    # names of desired output
    cols <- union(names(x), names(y))

    to_coalesce <- names(joined)[!names(joined) %in% cols]
    suffix_used <- suffix[ifelse(endsWith(to_coalesce, suffix[1]), 1, 2)]
    # remove suffixes and deduplicate
    to_coalesce <- unique(substr(
        to_coalesce, 
        1, 
        nchar(to_coalesce) - nchar(suffix_used)
    ))

    coalesced <- purrr::map_dfc(to_coalesce, ~dplyr::coalesce(
        joined[[paste0(.x, suffix[1])]], 
        joined[[paste0(.x, suffix[2])]]
    ))
    names(coalesced) <- to_coalesce

    dplyr::bind_cols(joined, coalesced)[cols]
}
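
For a fixed pair of overlapping columns, what the function automates can be written out by hand roughly as follows. This is a sketch under assumptions: the sample x and y are made up, and the overlapping columns b and e are hard-coded rather than detected from the suffixes.

```r
library(dplyr)

# Hypothetical sample data: b and e appear in both tables
x <- tibble(id = "abc", date = "2018-01-01",
            a = 3, b = NA_real_, c = 4, d = 6, e = NA_real_)
y <- tibble(id = "abc", date = "2018-01-01",
            b = 5, e = NA_real_)

# The join leaves the shared non-key columns as b.x / b.y and e.x / e.y
joined <- full_join(x, y, by = c("id", "date"), suffix = c(".x", ".y"))

# Prefer the value from x, fall back to y, then drop the suffixed columns
result <- joined %>%
  mutate(b = coalesce(b.x, b.y),
         e = coalesce(e.x, e.y)) %>%
  select(id, date, a, b, c, d, e)

print(result)
```

The function above generalises this by working out the suffixed column names itself, which matters once there are more than a couple of shared columns.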

We can use {powerjoin}, do a left join and deal with the conflicts using coalesce_xy() (which is pretty much dplyr::coalesce()).

library(powerjoin)
power_left_join(dt_1, dt_2, by = "id", conflict = coalesce_xy)
#    id       date a b c d  e
# 1 abc 2018-01-01 3 5 4 6 NA
