Rcpp function check if missing value

r rcpp

R has both NaN and NA (which is really a special kind of NaN) for representing missing values. This is important to know because many functions check whether a value is NaN-y (NA or NaN), and they do not all agree on which of the two they catch.

Some truth tables for functions from the R/C API (note the frustrating lack of consistency):

+---------------------+
| Function | NaN | NA |
+---------------------+
| ISNAN    |  t  | t  |
| R_IsNaN  |  t  | f  |
| ISNA     |  f  | t  |
| R_IsNA   |  f  | t  |
+---------------------+

and Rcpp:

+-------------------------+
| Function     | NaN | NA |
+-------------------------+
| Rcpp::is_na  |  t  | t  |
| Rcpp::is_nan |  t  | f  |
+-------------------------+

and from the R interpreter (note: Rcpp tries to match this, rather than the R/C API):

+---------------------+
| Function | NaN | NA |
+---------------------+
| is.na    |  t  | t  |
| is.nan   |  t  | f  |
+---------------------+

Unfortunately it's a confusing landscape, but this should empower you a bit.
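Why do the R/C API functions split this way? R stores NA_real_ as an IEEE NaN whose low 32-bit word is 1954, so R_IsNA looks for that payload, R_IsNaN accepts any other NaN, and ISNAN accepts both. The following is a minimal standalone sketch in plain C++ (no R headers, assuming IEEE 754 doubles); `make_na_real`, `low_word` and the `*_like` functions are hypothetical names that only mirror R's logic, not R's actual API:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: builds a double with the bit pattern R uses for
// NA_real_ (high word 0x7ff00000, low word 1954). Assumes IEEE 754 doubles.
double make_na_real() {
    const std::uint64_t bits = 0x7FF00000000007A2ULL;  // 0x7A2 == 1954
    double x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Low 32 bits of the double's representation (where R stores 1954).
std::uint32_t low_word(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
}

// Mirrors ISNAN: true for any NaN, NA included.
bool isnan_like(double x) { return std::isnan(x); }

// Mirrors R_IsNA / ISNA: a NaN carrying the 1954 payload.
bool is_na_like(double x) { return std::isnan(x) && low_word(x) == 1954u; }

// Mirrors R_IsNaN: a NaN without the 1954 payload.
bool is_nan_like(double x) { return std::isnan(x) && low_word(x) != 1954u; }
```

For example, `is_na_like(make_na_real())` is true while `is_na_like(NAN)` is false, and `isnan_like` is true for both. R's help page for NA notes that computations involving NaN may return NaN or NA depending on the platform, so don't rely on the payload surviving arithmetic.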

Both Rcpp and RcppArmadillo have predicates to test for NA (an R extension), NaN and Inf.

Here is a short RcppArmadillo example:

#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::mat foo(int n, double threshold=NA_REAL) {
  arma::mat M = arma::zeros<arma::mat>(n,n);         // n x n matrix of zeros
  // add threshold only if it is finite, i.e. not NA, NaN or +/-Inf
  if (arma::is_finite(threshold)) M = M + threshold;
  return M;
}

/*** R
foo(2)
foo(2, 3.1415)
***/

We initialize a matrix of zeros and then test the argument. If it is finite (i.e. not NA, NaN or Inf), we add that value. If you wanted to, you could also test for each of these possibilities individually.

This produces the desired result: without a second argument the default value of NA applies, and we get a matrix of zeros.

R> Rcpp::sourceCpp("/tmp/giorgio.cpp")

R> foo(2)
     [,1] [,2]
[1,]    0    0
[2,]    0    0

R> foo(2, 3.1415)
       [,1]   [,2]
[1,] 3.1415 3.1415
[2,] 3.1415 3.1415
R>
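To test for the possibilities individually without Armadillo, a plain C++ sketch could look like the following. It assumes IEEE 754 doubles and recognises NA by R's 1954 low-word payload instead of calling R_IsNA; the `Kind` enum and `classify` are hypothetical names, not part of Rcpp or RcppArmadillo:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical categories for an R double value.
enum class Kind { Finite, PosInf, NegInf, NA, NaN };

// Assumes IEEE 754 doubles and R's convention that NA_real_ is a NaN
// whose low 32-bit word is 1954; every other NaN is treated as plain NaN.
Kind classify(double x) {
    if (std::isfinite(x)) return Kind::Finite;
    if (std::isinf(x)) return x > 0 ? Kind::PosInf : Kind::NegInf;
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return (bits & 0xFFFFFFFFu) == 1954u ? Kind::NA : Kind::NaN;
}
```

The order of the checks matters: finite values are handled first, then infinities, so only NaNs reach the payload inspection.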

I've been testing this and can shed some light on the possibilities.

For a single SEXP target, the Rcpp option I've used is:

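#include <Rcpp.h>

// is_missing is a hypothetical wrapper name; it checks the first element
// of a length-1 vector.
// [[Rcpp::export]]
bool is_missing(SEXP target) {
    switch (TYPEOF(target)) {
    case INTSXP:
        return Rcpp::traits::is_na<INTSXP>(Rcpp::as<int>(target));
    case REALSXP:
        // true for both NA_real_ and NaN
        return Rcpp::traits::is_na<REALSXP>(Rcpp::as<double>(target));
    case LGLSXP:
        // logical NA is stored as an int, same value as NA_integer_
        return Rcpp::traits::is_na<LGLSXP>(Rcpp::as<int>(target));
    case CPLXSXP:
        return Rcpp::traits::is_na<CPLXSXP>(Rcpp::as<Rcomplex>(target));
    case STRSXP: {
        Rcpp::StringVector vec(target);
        return Rcpp::traits::is_na<STRSXP>(vec[0]);
    }
    default:
        // other types (lists, NULL, ...) are not treated as missing here
        return false;
    }
}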

If you want to check without using Rcpp there are some caveats:

Integer and logical NA (both stored as int) are equal to the minimum value of int, -2147483648 (INT_MIN); R exposes them as NA_INTEGER and NA_LOGICAL.
For double, you could directly use what Rcpp uses, namely R_isnancpp, or equivalently the ISNAN macro. Both return true for NA as well as NaN.
For complex numbers, apply the double check to both the real and the imaginary part; the value counts as missing if either part is NA or NaN.
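The three caveats above can be sketched in plain C++ without Rcpp or R's headers. `Complex` is a hypothetical stand-in with the same two fields (`r`, `i`) as R's `Rcomplex`, and the function names are made up for illustration; in real extension code you would compare against `NA_INTEGER` and use `ISNAN` instead:

```cpp
#include <climits>
#include <cmath>

// Hypothetical stand-in for R's Rcomplex: real and imaginary parts.
struct Complex {
    double r;
    double i;
};

// Integer and logical NA are both stored as INT_MIN.
bool int_is_na(int x) { return x == INT_MIN; }

// For doubles, any NaN counts (NA_real_ is itself a NaN), like ISNAN.
bool double_is_na(double x) { return std::isnan(x); }

// A complex value is treated as missing if either part is NA or NaN.
bool complex_is_na(Complex z) { return double_is_na(z.r) || double_is_na(z.i); }
```

Note that `-2147483647` is an ordinary integer; only `INT_MIN` itself is reserved for NA, which is why R's integer range is one value shorter than C's.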

Character NA is tricky: NA_STRING is a singleton, so what matters is its address, not its contents. I personally have been testing ways to work with R character data without copying it into std::string, i.e. using the char* directly. What I've found that works is to declare this in a .cpp file:

static const char *na_string_ptr = CHAR(Rf_asChar(NA_STRING));

and, based on another answer, do something like this for an Rcpp::StringVector or Rcpp::StringMatrix x:

Rcpp::CharacterVector one_string = Rcpp::as<Rcpp::CharacterVector>(x[i]);
char *ptr = (char *)(one_string[0]);
return ptr == na_string_ptr;

This last one still uses Rcpp, but I can use it once for initial setup and then just use the char pointers. I'm sure there's a way to do something similar with R's API, but that's something I haven't tried yet.
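Why the address rather than the text? `CHAR(NA_STRING)` is the two characters "NA", yet in R `is.na("NA")` is `FALSE`: a genuine string "NA" has the same contents but is a different object. The toy model below is plain C++ with no R headers (`na_storage`, `toy_na_ptr`, `is_na_string` and `looks_like_na` are hypothetical) and only illustrates that an identity check is what separates the two. With R's C API, the analogous test on element i of a character vector is `STRING_ELT(x, i) == NA_STRING`.

```cpp
#include <cstring>

// Toy stand-in for R's NA_STRING: one unique object whose text is "NA".
static const char na_storage[] = "NA";
static const char* const toy_na_ptr = na_storage;

// Missingness is decided by identity (address), never by contents.
bool is_na_string(const char* s) { return s == toy_na_ptr; }

// Contents comparison, shown only to contrast with the identity check:
// it cannot tell the NA singleton from a real string "NA".
bool looks_like_na(const char* s) { return std::strcmp(s, "NA") == 0; }
```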
