:= (pass by reference) operator in the data.table package modifies another data table object simultaneously
While testing my code, I found the following: if I assign a data.table DT to DT1 and change DT afterwards, DT1 changes with it. So DT and DT1 seem to be internally linked. Is this intended behavior? Although I'm not a programming expert, this looks wrong to me, and when testing it with simple R variables or a data.frame, I couldn't reproduce the behavior. What's happening here?
DF <- data.frame(ID=letters[1:5],
                 value=1:5)
DF1 <- DF
all.equal(DF1, DF)
# [1] TRUE
DF[1, "value"] <- DF[1, "value"]*2
all.equal(DF1, DF)
# [1] "Component 2: Mean relative difference: 1"
library(data.table)
# data.table 1.7.1 For help type: help("data.table")
DT <- data.table(ID=letters[1:5],
                 value=1:5)
DT1 <- DT
all.equal(DT1, DT)
# [1] TRUE
DT[, value:=value*2]
#      ID value
# [1,]  a     2
# [2,]  b     4
# [3,]  c     6
# [4,]  d     8
# [5,]  e    10
all.equal(DT1, DT)
# [1] TRUE
This piece of the data.table documentation should help. See ?data.table::copy:
No value is returned. The data.table is modified by reference. If you require a copy, take a copy first (using DT2=copy(DT)). copy() may also sometimes be useful before := is used to subassign to a column by reference.
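To make that concrete, here is a minimal sketch of the difference between plain assignment and copy(). It reuses the data from the question; DT2 is just the name used in the documentation snippet, and 2L is used instead of 2 so the column stays integer. Plain assignment with <- only makes DT1 a second name for the same object, and since := modifies that object in place, the change is visible through both names. copy() creates an independent data.table.

```r
library(data.table)

DT <- data.table(ID = letters[1:5], value = 1:5)

DT1 <- DT        # plain assignment: DT1 and DT refer to the same object
DT2 <- copy(DT)  # explicit copy: DT2 is an independent data.table

# := updates the column by reference, so no copy of DT is made here
DT[, value := value * 2L]

DT1$value  # follows DT, because DT1 is the same object
DT2$value  # still holds the original values 1:5
```

So if DT1 is meant to keep the old values, take the copy before using :=.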