Where can I learn how to write C code to speed up slow R functions? [closed]

Solution 1:

Well there is the good old Use the source, Luke! --- R itself has plenty of (very efficient) C code one can study, and CRAN has hundreds of packages, some from authors you trust. That provides real, tested examples to study and adapt.

But as Josh suspected, I lean more towards C++ and hence Rcpp. It also has plenty of examples.

Edit: There were two books I found helpful:

  • The first one is Venables and Ripley's "S Programming" even though it is getting long in the tooth (and there have been rumours of a 2nd edition for years). At the time there was simply nothing else.
  • The second in Chambers' "Software for Data Analysis" which is much more recent and has a much nicer R-centric feel -- and two chapters on extending R. Both C and C++ get mentioned. Plus, John shreds me for what I did with digest so that alone is worth the price of admission.

That said, John is growing fond of Rcpp (and contributing) as he finds the match between R objects and C++ objects (via Rcpp) to be very natural -- and ReferenceClasses help there.

Edit 2: With Hadley's refocussed question, I very strongly urge you to consider C++. There is so much boilerplate nonsense you have to do with C---very tedious and very avoidable. Have a look at the Rcpp-introduction vignette. Another simple example is this blog post where I show that instead of worrying about 10% differences (in one of the Radford Neal examples) we can get eightyfold increases with C++ (on what is of course a contrived example).

Edit 3: There is complexity in that you may run into C++ errors that are, to put it mildly, hard to grok. But to just use Rcpp rather than to extend it, you should hardly ever need it. And while this cost is undeniable, it is far eclipsed by the benefit of simpler code, less boilerplate, no PROTECT/UNPROTECT, no memory management etc pp. Doug Bates just yesterday stated that he finds C++ and Rcpp to be much more like writing R than writing C++. YMMV and all that.

Solution 2:

Hadley,

You can definitely write C++ code that is similar to C code.

I understand what you say about C++ being more complicated than C. This is if you want to master everything : objects, templates, STL, template meta programming, etc ... most people don't need these things and can just rely on others to it. The implementation of Rcpp is very complicated, but just because you don't know how your fridge works, it does not mean you cannot open the door and grab fresh milk ...

From your many contributions to R, what strikes me is that you find R somewhat tedious (data manipulation, graphics, string manipulatio, etc ...). Well get prepared for many more surprises with the internal C API of R. This is very tedious.

From time to time, I read the R-exts or R-ints manuals. This helps. But most of the time, when I really want to find out about something, I go into the R source, and also in the source of packages written by e.g. Simon (there is usually lots to learn there).

Rcpp is designed to make these tedious aspects of the API go away.

You can judge for yourself what you find more complicated, obfuscated, etc ... based on a few examples. This function creates a character vector using the C API:

SEXP foobar(){
  SEXP ab;
  PROTECT(ab = allocVector(STRSXP, 2));
  SET_STRING_ELT( ab, 0, mkChar("foo") );
  SET_STRING_ELT( ab, 1, mkChar("bar") );
  UNPROTECT(1);
}

Using Rcpp, you can write the same function as:

SEXP foobar(){
   return Rcpp::CharacterVector::create( "foo", "bar" ) ;
}

or:

SEXP foobar(){
   Rcpp::CharacterVector res(2) ;
   res[0] = "foo" ;
   res[1] = "bar" ;
   return res ;
}

As Dirk said, there are other examples on the several vignettes. We also usually point people towards our unit tests because each of them test a very specific part of the code and are somewhat self explanatory.

I'm obviously biased here, but I would recommend getting familiar about Rcpp instead of learning the C API of R, and then come to the mailing list if something is unclear or does not seem doable with Rcpp.

Anyway, end of the sales pitch.

I guess it all depends what sort of code you want to write eventually.

Romain