data.table join then add columns to existing data.frame without re-copy
Solution 1:
This is easy to do:
X[Y, z := i.z]
It works because the only difference between Y[X] and X[Y] here is when some elements are not in Y, in which case presumably you'd want z to be NA, which is exactly what the above assignment does.
It would also work just as well for many variables:
X[Y, `:=`(z1 = i.z1, z2 = i.z2, ...)]
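To make the update join concrete, here is a minimal sketch on toy data. The tables, their column values, and the key column g are made up for illustration and are not from the question.

```r
library(data.table)

# Hypothetical toy data: X is the table to update, Y supplies the new columns.
X <- data.table(g = c("a", "b", "c"), x = 1:3, key = "g")
Y <- data.table(g = c("a", "c"), z1 = c(10, 30), z2 = c("p", "q"), key = "g")

# Update join: look up each row of Y in X and assign the new columns
# to X by reference, so X is modified in place rather than rebuilt.
X[Y, `:=`(z1 = i.z1, z2 = i.z2)]

# Rows of X with no match in Y (here g == "b") get NA in z1 and z2.
print(X)
```

Because the assignment happens by reference, there is no need to write X <- X[...]; X itself gains the two columns.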
Since you require the operation Y[X], you can add the argument nomatch=0 (as @mnel points out) so that you do not get NAs for rows where X doesn't contain the key values from Y. That is:
X[Y, z := i.z, nomatch=0]
From the NEWS for data.table:
CHANGES IN DATA.TABLE VERSION 1.7.10
NEW FEATURES
o The prefix i. can now be used in j to refer to join inherited columns of i that are otherwise masked by columns in x with the same name.
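A small sketch of what that masking means in practice, assuming both tables carry a column named z (the data here are invented for illustration): inside j, plain z refers to the column of X, while i.z refers to the column of Y.

```r
library(data.table)

# Hypothetical tables that share the column name z.
X <- data.table(g = c("a", "b"), z = c(1, 2), key = "g")
Y <- data.table(g = c("a", "b"), z = c(100, 200), key = "g")

# In j, z is X's column and i.z is Y's column of the same name.
res <- X[Y, .(x_z = z, y_z = i.z)]
print(res)

# The same prefix lets an update join overwrite X's z with Y's values.
X[Y, z := i.z]
print(X)
```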
Solution 2:
As an addition to the answer above, you can also do (v1.9.6+):
require(data.table) # v1.9.6+
X[Y, (colNames) := mget(paste0("i.", colNames))]
where colNames is a character vector listing the columns you want from Y. This lets you efficiently select the columns to add (define colNames from a subset of names(Y)) when you are adding many columns.
Also, you can combine it with the new on= argument (from v1.9.6+) as:
# ad-hoc joins using 'on=' instead of setting keys
require(data.table) # v1.9.6+
X[Y, (colNames) := mget(paste0("i.", colNames)), on = "g"]
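For a runnable illustration of this pattern, here is a sketch with invented data. The tables, the join column g, and the choice to leave out z3 are assumptions made for the example.

```r
library(data.table)  # assumes v1.9.6 or later, for on=

# Hypothetical unkeyed tables; Y is deliberately in a different row order.
X <- data.table(g = c("a", "b", "c"), x = 1:3)
Y <- data.table(g = c("c", "a"),
                z1 = c(30, 10),
                z2 = c("q", "p"),
                z3 = c(TRUE, FALSE))

# Pick the columns to bring over as a subset of names(Y):
# everything except the join column and z3.
colNames <- setdiff(names(Y), c("g", "z3"))

# Ad-hoc update join on g; mget() fetches i.z1 and i.z2 as a list,
# which is assigned to the new columns of X by reference.
X[Y, (colNames) := mget(paste0("i.", colNames)), on = "g"]
print(X)
```

X keeps its original row order, gains z1 and z2 only, and the unmatched row (g == "b") holds NA in both.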
Credit to akrun for the (colNames) := mget(colNames) strategy here: Update rows of data frame in R.