Using data.table package inside my own package
I am trying to use the data.table package inside my own package. MWE is as follows:
I create a function, test.fun, that simply creates a small data.table object, and then sums the "Val" column grouping by the "A" column. The code is
test.fun <- function()
{
    library(data.table)
    testdata <- data.table(A=rep(seq(1,5), 5), Val=rnorm(25))
    setkey(testdata, A)
    res <- testdata[,{list(Ct=length(Val),Total=sum(Val),Avg=mean(Val))},"A"]
    return(res)
}
When I create this function in a regular R session, and then run the function, it works as expected.
> res<-test.fun()
data.table 1.8.0 For help type: help("data.table")
> res
A Ct Total Avg
[1,] 1 5 -0.5326444 -0.1065289
[2,] 2 5 -4.0832062 -0.8166412
[3,] 3 5 0.9458251 0.1891650
[4,] 4 5 2.0474791 0.4094958
[5,] 5 5 2.3609443 0.4721889
When I put this function into a package, install the package, load the package, and then run the function, I get an error message.
> library(testpackage)
> res<-test.fun()
data.table 1.8.0 For help type: help("data.table")
Error in `[.data.frame`(x, i, j) : object 'Val' not found
Can anybody explain why this is happening and what I can do to fix it? Any help is very much appreciated.
Solution 1:
Andrie's guess is right, +1. There is a FAQ on it (see vignette("datatable-faq")), as well as a new vignette on importing data.table:
FAQ 6.9: I have created a package that depends on data.table. How do I ensure my package is data.table-aware so that inheritance from data.frame works?
Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
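As a minimal sketch of option ii), assuming the package is called testpackage as in the question, the two files could contain lines like the ones below. Only the Imports: entry and the import() directive matter here; the Version and Title values are placeholders, and export(test.fun) is an assumption about how the function is made visible to users.

DESCRIPTION:

```
Package: testpackage
Version: 0.1.0
Title: Example Package Using data.table
Imports:
    data.table
```

NAMESPACE:

```
import(data.table)
export(test.fun)
```

With the import in place, the library(data.table) call inside test.fun should no longer be needed.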
Further background ... at the top of [.data.table (and other data.table functions), you'll see a switch depending on the result of a call to cedta(). This stands for Calling Environment Data Table Aware. Typing data.table:::cedta reveals how it's done. It relies on the calling package having a namespace, and, that namespace Import'ing or Depend'ing on data.table. This is how data.table can be passed to non-data.table-aware packages (such as functions in base) and those packages can use absolutely standard [.data.frame syntax on the data.table, blissfully unaware that the data.frame is() a data.table, too.
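The fallback to [.data.frame can be illustrated without data.table at all. The base R sketch below (the names df and msg are illustrative, not from the original post) indexes a plain data.frame with a bare column name, which is roughly what happens to the j expression when the calling package is not data.table-aware:

```r
# A plain data.frame with the same columns as the question's example.
df <- data.frame(A = rep(1:5, 5), Val = rnorm(25))

# `[.data.frame` evaluates j in the calling environment, not inside df,
# so the bare symbol Val cannot be resolved there and an error is raised.
msg <- tryCatch(
  df[, Val],
  error = function(e) conditionMessage(e)
)

print(msg)
```

In a fresh session with no global object named Val, the captured message should be the same "object not found" complaint as in the question.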
This is also why data.table inheritance didn't use to be compatible with namespaceless packages, and why upon user request we had to ask authors of such packages to add a namespace to their package to be compatible. Happily, now that R adds a default namespace for packages missing one (from v2.14.0), that problem has gone away:
CHANGES IN R VERSION 2.14.0
* All packages must have a namespace, and one is created on installation if not supplied in the sources.
Solution 2:
Here is the complete recipe:
- Add data.table to Imports in your DESCRIPTION file.
- Add @import data.table to your respective .R file (i.e., the .R file that houses your function that's throwing the error Error in [.data.frame(x, i, j) : object 'Val' not found).
- Type library(devtools) and set your working directory to point at the main directory of your R package.
- Type document(). This will ensure that your NAMESPACE file includes an import(data.table) line.
- Type build()
- Type install()
For a nice primer on what build() and install() do, see: http://kbroman.org/pkg_primer/.
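The @import step of the recipe can be sketched as follows, assuming the function from the question lives in a hypothetical file R/test.fun.R inside the package. The roxygen2 tag @import data.table is what document() translates into an import(data.table) line in NAMESPACE; the @export tag and the description text are additions for illustration only.

```r
# Hypothetical file: R/test.fun.R
# To try this outside a package, attach data.table first: library(data.table)

#' Summarise Val by group A
#'
#' @import data.table
#' @export
test.fun <- function() {
  testdata <- data.table(A = rep(seq(1, 5), 5), Val = rnorm(25))
  setkey(testdata, A)
  # Count, sum and mean of Val within each value of A.
  testdata[, list(Ct = length(Val), Total = sum(Val), Avg = mean(Val)), by = "A"]
}
```

Note that the function no longer calls library(data.table) itself; inside a package the import declared in NAMESPACE is expected to take care of that.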
Then, once you close your R session and start a new one, you can jump right in with:
- Type library("my_R_package")
- Type the name of your function that's housed in the .R file mentioned above.
- Enjoy! You should no longer receive the dreaded Error in [.data.frame(x, i, j) : object 'Val' not found