data.table join then add columns to existing data.frame without re-copy

Solution 1:

This is easy to do:

X[Y, z := i.z]

This works because the only difference between Y[X] and X[Y] here is for rows of X whose key values are not in Y. For those rows you would presumably want z to be NA, which is exactly what the assignment above does.
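As a minimal sketch of the update join above, assuming two small keyed tables (the data, the column names g and x, and the address() check are illustrative, not from the question):

```r
library(data.table)

# Hypothetical data: X is the table to extend, Y supplies the new column z.
X <- data.table(g = c("a", "b", "c"), x = 1:3, key = "g")
Y <- data.table(g = c("a", "b"), z = c(10, 20), key = "g")

addr_before <- address(X)   # memory address of X before the join

# Update join: for each row of Y, find the matching rows of X and assign z there.
X[Y, z := i.z]

print(X)                    # z is 10, 20, NA; "c" has no match in Y
address(X) == addr_before   # TRUE when X was modified in place
```

Comparing address(X) before and after is one way to check that the column was added by reference rather than by copying X.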

The same approach works just as well for several columns:

X[Y, `:=`(z1 = i.z1, z2 = i.z2, ...)]
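A small runnable version of the multi-column form might look like this, assuming Y carries two columns named z1 and z2 (the data is made up for illustration):

```r
library(data.table)

# Hypothetical data; z1 and z2 are illustrative column names.
X <- data.table(g = c("a", "b", "c"), x = 1:3, key = "g")
Y <- data.table(g = c("b", "c"), z1 = c(1.5, 2.5), z2 = c("p", "q"), key = "g")

# The functional form of := assigns several columns in a single update join.
X[Y, `:=`(z1 = i.z1, z2 = i.z2)]

print(X)   # row "a" has NA in both z1 and z2
```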

Since you require the operation Y[X], you can add the argument nomatch=0 (as @mnel points out) so that you do not get NAs for the rows where X doesn't contain the key values from Y. That is:

X[Y, z := i.z, nomatch=0]

From the NEWS for data.table:

    **********************************************
    **                                          **
    **   CHANGES IN DATA.TABLE VERSION 1.7.10   **
    **                                          **
    **********************************************

NEW FEATURES

o   The prefix i. can now be used in j to refer to join inherited
    columns of i that are otherwise masked by columns in x with
    the same name.

Solution 2:

As an addition to the answer above, you can also do (v1.9.6+):

require(data.table) # v1.9.6+
X[Y, (colNames) := mget(paste0("i.", colNames))]

where colNames is a character vector naming the columns you want from Y. When you are adding many columns, this lets you choose them programmatically (for example, by defining colNames as a subset of names(Y)) instead of typing out each assignment.

Also, you can combine it with the new on= argument (from v1.9.6+) as:

# ad-hoc joins using 'on=' instead of setting keys
require(data.table) # v1.9.6+
X[Y, (colNames) := mget(paste0("i.", colNames)), on = "g"]
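Putting the pieces together, here is a sketch with unkeyed tables, assuming "g" is the join column and that colNames is built from names(Y) with setdiff() (the data and the column named other are hypothetical):

```r
library(data.table)   # on= needs v1.9.6 or later

# Hypothetical unkeyed data; "g" is the join column as in the call above.
X <- data.table(g = c("c", "a", "b"), x = 1:3)
Y <- data.table(g = c("a", "b"), z1 = c(10, 20), z2 = c("p", "q"),
                other = c(TRUE, FALSE))

# Take every column of Y except the join column and one we do not want.
colNames <- setdiff(names(Y), c("g", "other"))

# Ad-hoc update join: no keys are set and X keeps its original row order.
X[Y, (colNames) := mget(paste0("i.", colNames)), on = "g"]

print(X)   # "c" has no match in Y, so it gets NA in z1 and z2
```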

Credit to akrun for the (colNames) := mget(colNames) strategy, from "Update rows of data frame in R".