Using data.table package inside my own package

I am trying to use the data.table package inside my own package. A minimal working example follows.

I create a function, test.fun, that builds a small data.table object and then computes the count, sum, and mean of the "Val" column, grouped by the "A" column. The code is:

test.fun<-function ()
{
    library(data.table)
    testdata<-data.table(A=rep(seq(1,5), 5), Val=rnorm(25))
    setkey(testdata, A)
    res<-testdata[,{list(Ct=length(Val),Total=sum(Val),Avg=mean(Val))},"A"]
    return(res)
}

When I create this function in a regular R session, and then run the function, it works as expected.

> res<-test.fun()
data.table 1.8.0  For help type: help("data.table")
> res
     A Ct      Total        Avg
[1,] 1  5 -0.5326444 -0.1065289
[2,] 2  5 -4.0832062 -0.8166412
[3,] 3  5  0.9458251  0.1891650
[4,] 4  5  2.0474791  0.4094958
[5,] 5  5  2.3609443  0.4721889

When I put this function into a package, install the package, load the package, and then run the function, I get an error message.

> library(testpackage)
> res<-test.fun()
data.table 1.8.0  For help type: help("data.table")
Error in `[.data.frame`(x, i, j) : object 'Val' not found

Can anybody explain why this is happening and what I can do to fix it? Any help is very much appreciated.


Solution 1:

Andrie's guess is right, +1. There is a FAQ on it (see vignette("datatable-faq")), as well as a newer vignette on importing data.table. The FAQ entry reads:

FAQ 6.9: I have created a package that depends on data.table. How do I ensure my package is data.table-aware so that inheritance from data.frame works?

Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
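For option ii), the two files might look roughly like the fragments below. This is a minimal sketch: the package name testpackage is taken from the question, only the lines relevant to data.table are shown, and every other field your DESCRIPTION needs is left out.

DESCRIPTION:

```
Package: testpackage
Imports:
    data.table
```

NAMESPACE:

```
import(data.table)
```

With both in place, the library(data.table) call inside test.fun should no longer be needed.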

Further background ... at the top of [.data.table (and other data.table functions), you'll see a switch depending on the result of a call to cedta(). This stands for Calling Environment Data Table Aware. Typing data.table:::cedta reveals how it's done. It relies on the calling package having a namespace, and on that namespace Import'ing or Depend'ing on data.table. This is how a data.table can be passed to non-data.table-aware packages (such as functions in base) and those packages can use absolutely standard [.data.frame syntax on the data.table, blissfully unaware that the data.frame is() a data.table, too.
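To make the idea concrete, here is a rough sketch of that kind of check using base R only. It is not the actual body of cedta(), which handles more cases (Depends, among others); the helper name env_imports_data_table is hypothetical and only the Imports side is covered.

```r
# Hypothetical helper, NOT part of data.table: a simplified illustration of
# a "calling environment" check. Only the Imports case is handled here.
env_imports_data_table <- function(env) {
    # Walk up to the top-level environment: a package namespace, or the
    # global environment for code typed at the prompt.
    ns <- topenv(env)
    # Code that is not inside a package namespace (e.g. an interactive
    # session) is treated as aware.
    if (!isNamespace(ns)) return(TRUE)
    # Inside a namespace: aware only if that namespace imports data.table.
    "data.table" %in% names(getNamespaceImports(ns))
}

env_imports_data_table(globalenv())           # TRUE: not a package namespace
env_imports_data_table(asNamespace("stats"))  # FALSE: stats does not import data.table
```

That matches the behaviour in the question: the function works at the prompt, but once it lives in a package whose namespace does not import data.table, the call falls through to [.data.frame.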

This is also why data.table inheritance didn't use to be compatible with namespaceless packages, and why, upon user request, we had to ask authors of such packages to add a namespace to their package to be compatible. Happily, now that R adds a default namespace for packages missing one (from v2.14.0), that problem has gone away:

CHANGES IN R VERSION 2.14.0
* All packages must have a namespace, and one is created on installation if not supplied in the sources.

Solution 2:

Here is the complete recipe:

  1. Add data.table to Imports in your DESCRIPTION file.

  2. Add the roxygen2 tag #' @import data.table to the relevant .R file (i.e., the .R file that houses the function throwing Error in `[.data.frame`(x, i, j) : object 'Val' not found).

  3. Type library(devtools) and set your working directory to point at the main directory of your R package.

  4. Type document(). This regenerates your NAMESPACE file so that it includes an import(data.table) line.

  5. Type build()

  6. Type install()

For a nice primer on what build() and install() do, see: http://kbroman.org/pkg_primer/.
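As an illustration of step 2, the .R file from the question might end up looking something like this. It is a sketch only: the file name and the roxygen title are made up, and the @export tag is an assumption about how you want the function exposed.

```r
# R/test.fun.R  (hypothetical file name inside the package sources)

#' Summarise Val by group A
#'
#' @import data.table
#' @export
test.fun <- function() {
    # No library(data.table) call here: once document() has written
    # import(data.table) into NAMESPACE, data.table's functions are
    # visible to code in this package.
    testdata <- data.table(A = rep(seq(1, 5), 5), Val = rnorm(25))
    setkey(testdata, A)
    # Count, sum, and mean of Val within each value of A.
    testdata[, list(Ct = length(Val), Total = sum(Val), Avg = mean(Val)), by = "A"]
}
```

Running document() on a file like this is what should put the import(data.table) line into NAMESPACE.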

Then, the next time you start an R session, you can jump right in:

  1. Type library("myRpackage"), replacing myRpackage with your package's name (R package names cannot contain underscores).

  2. Type the name of your function that's housed in the .R file mentioned above.

  3. Enjoy! You should no longer receive the dreaded Error in `[.data.frame`(x, i, j) : object 'Val' not found.