Join R data.tables where key values are not exactly equal--combine rows with closest times
Another option may be roll='nearest'
(new in v1.8.8 on CRAN).
> setkey(DT1,x,time)
> DT1
x time v
1: a 10 1
2: a 30 2
3: a 60 3
4: b 10 4
5: b 30 5
6: b 60 6
7: c 10 7
8: c 30 8
9: c 60 9
> DT2
x time
1: a 17
2: b 54
3: c 3
> DT1[DT2,roll="nearest"]
x time v
1: a 17 1
2: b 54 6
3: c 3 7
Note that 17 appears to be closer to 10 than 30, hence the result in the first row.
If you need to roll to the next observation (next observation carried backwards) :
> DT1[DT2,roll=-Inf]
x time v
1: a 17 2
2: b 54 6
3: c 3 7
You can use findInterval
to accomplish this:
setkey(DT2, time)
DT1[, id := findInterval(DT1$time, DT2$time)]
DT2[, id := 1:3]
setkey(DT1, "x", "id")
setkey(DT2, "x", "id")
print(DT1[DT2][, id := NULL])
# x time v time.1
# 1: a 30 2 17
# 2: b 60 6 54
# 3: c 10 7 3
The idea: First sort the data.table by time because the second argument of findInterval
requires increasing order of values. Now, use findInterval
to find in which interval of 3, 17, 54
does the values in DT1$time
fall and store it in id
. In this particular case, it happens to range from 1 to 3. So, set these values as id
column for DT2
. Once you find the intervals and get id
, then it's straightforward. Instead of setting x
and time
, set x
and id
as keys and do your merge.
Note: Suppose your DT1$time
had a value of 0, then, the interval for that would have been 0. So, you'd get 4 unique values (0:3). In that case, it may be better to have DT2 with a time = 0 value as well. I just wanted to note this point here. I'll leave it to you.