Cartesian Product using data.table package

Solution 1:

If you first construct full names from the first and last in the cust-dataframe, you can then use CJ (cross-join). You cannot use all three vectors since there would be 99 items and teh first names would get inappropriately mixed with last names.

> nrow(CJ(dates$date, cust$first.name, cust$last.name ) )
[1] 99

This returns the desired data.table object:

> CJ(dates$date,paste(cust$first.name, cust$last.name) )
            V1           V2
 1: 2012-08-28 George Smith
 2: 2012-08-28  Henry Smith
 3: 2012-08-28     John Doe
 4: 2012-08-29 George Smith
 5: 2012-08-29  Henry Smith
 6: 2012-08-29     John Doe
 7: 2012-08-30 George Smith
 8: 2012-08-30  Henry Smith
 9: 2012-08-30     John Doe
10: 2012-08-31     John Doe
11: 2012-08-31 George Smith
12: 2012-08-31  Henry Smith
13: 2012-09-01     John Doe
14: 2012-09-01 George Smith
15: 2012-09-01  Henry Smith
16: 2012-09-02 George Smith
17: 2012-09-02  Henry Smith
18: 2012-09-02     John Doe
19: 2012-09-03  Henry Smith
20: 2012-09-03     John Doe
21: 2012-09-03 George Smith
22: 2012-09-04  Henry Smith
23: 2012-09-04     John Doe
24: 2012-09-04 George Smith
25: 2012-09-05 George Smith
26: 2012-09-05  Henry Smith
27: 2012-09-05     John Doe
28: 2012-09-06 George Smith
29: 2012-09-06  Henry Smith
30: 2012-09-06     John Doe
31: 2012-09-07 George Smith
32: 2012-09-07  Henry Smith
33: 2012-09-07     John Doe
            V1           V2

Solution 2:

merge.data.table(x, y) is a convenience function that wraps a call to x[y], so the merge needs to be based on columns that are in both data.tables. (That's what that error message is trying to tell you).

One work-around is to add a dummy column to both data.tables, whose only purpose is to make the merge possible:

## Add a column "k", and append it to each data.table's vector of keyed columns.
setkeyv(cust.dt[,k:=1], c(key(cust.dt), "k"))
setkeyv(dates.dt[,k:=1], c(key(dates.dt), "k"))

## Merge and then remove the dummy column
res <- merge(dates.dt, cust.dt, by="k")
head(res[,k:=NULL])
#          date first.name last.name
# 1: 2012-08-28     George     Smith
# 2: 2012-08-28      Henry     Smith
# 3: 2012-08-28       John       Doe
# 4: 2012-08-29     George     Smith
# 5: 2012-08-29      Henry     Smith
# 6: 2012-08-29       John       Doe

## Maybe also clean up cust.dt and dates.dt    
# cust.dt[,k:=NULL]
# dates.dt[,k=NULL]