Rcpp function check if missing value
R has both NaN and NA (which is really a special kind of NaN) for representing missing values. This is important to know because there are many functions that check if a value is NaN-y (NA or NaN):
Some truth tables for functions from the R/C API (note the frustrating lack of consistency)
+---------------------+
| Function | NaN | NA |
+---------------------+
| ISNAN | t | t |
| R_IsNaN | t | f |
| ISNA | f | t |
| R_IsNA | f | t |
+---------------------+
and Rcpp:
+-------------------------+
| Function | NaN | NA |
+-------------------------+
| Rcpp::is_na | t | t |
| Rcpp::is_nan | t | f |
+-------------------------+
and from the R interpreter (note: Rcpp tries to match this, rather than the R/C API):
+---------------------+
| Function | NaN | NA |
+---------------------+
| is.na | t | t |
| is.nan | t | f |
+---------------------+
Unfortunately it's a confusing landscape, but this should empower you a bit.
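To make the "NA is a special kind of NaN" point concrete, here is a minimal sketch in plain C++ that needs no R headers. It assumes what R's own sources do: NA_real_ is a NaN whose low 32 bits hold the value 1954. The helper names (make_na, low_word, any_nan, only_na, only_nan) are hypothetical and exist only to mirror the rows of the R/C API table above.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: R's NA_real_ is a NaN whose low 32 bits hold 1954.
// All function names below are hypothetical, not part of R or Rcpp.
double make_na() {
    std::uint64_t bits = 0x7FF0000000000000ULL | 1954ULL;
    double x;
    std::memcpy(&x, &bits, sizeof x);  // reinterpret the bit pattern as a double
    return x;
}

// Extract the low 32 bits of the IEEE 754 representation.
std::uint32_t low_word(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
}

// Behaves like ISNAN: true for NaN and for NA.
bool any_nan(double x) { return std::isnan(x); }

// Behaves like ISNA / R_IsNA: true only for the NA payload.
bool only_na(double x) { return std::isnan(x) && low_word(x) == 1954; }

// Behaves like R_IsNaN: true for NaN values that are not NA.
bool only_nan(double x) { return std::isnan(x) && low_word(x) != 1954; }
```

Inside an R package you would call the real ISNAN, R_IsNA and R_IsNaN rather than re-implementing them; the sketch only shows why one value can answer "yes" to both an NA check and a NaN check.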
Both Rcpp and RcppArmadillo have predicates to test for NA (an R extension), NaN and Inf.
Here is a short RcppArmadillo example:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat foo(int n, double threshold=NA_REAL) {
    arma::mat M = arma::zeros<arma::mat>(n,n);
    if (arma::is_finite(threshold)) M = M + threshold;
    return M;
}
/*** R
foo(2)
foo(2, 3.1415)
***/
We initialize a matrix of zeros, and test the argument. If it is finite (i.e. not NA, Inf or NaN), we add that value to every element. If you wanted to, you could test for each possibility individually too.
This produces the desired result: without a second argument the default value of NA applies, and we get a matrix of zeros.
R> Rcpp::sourceCpp("/tmp/giorgio.cpp")
R> foo(2)
     [,1] [,2]
[1,]    0    0
[2,]    0    0
R> foo(2, 3.1415)
       [,1]   [,2]
[1,] 3.1415 3.1415
[2,] 3.1415 3.1415
R>
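Testing the possibilities individually, as mentioned above, could look roughly like the following sketch. It uses only the C++ standard library, so it cannot tell NA apart from NaN (NA is itself a NaN); in Rcpp code you could refine that branch with R_IsNA or R_IsNaN. The function name classify is hypothetical.

```cpp
#include <cmath>
#include <string>

// classify is a hypothetical helper, not part of Rcpp or Armadillo.
std::string classify(double x) {
    // NA is a NaN at the bit level, so it is caught by this branch too.
    if (std::isnan(x)) return "NaN or NA";
    // Only +Inf and -Inf remain as non-finite values.
    if (std::isinf(x)) return x > 0 ? "Inf" : "-Inf";
    return "finite";
}
```

A value is finite exactly when neither of the first two branches fires, which is the same condition the single finiteness test in foo relies on.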
I've been testing this and can shed some light on the possibilities.
For a single SEXP target, the Rcpp option I've used is:
switch(TYPEOF(target)) {
    case INTSXP:
        return Rcpp::traits::is_na<INTSXP>(Rcpp::as<int>(target));
    case REALSXP:
        return Rcpp::traits::is_na<REALSXP>(Rcpp::as<double>(target));
    case LGLSXP:
        return Rcpp::traits::is_na<LGLSXP>(Rcpp::as<int>(target));
    case CPLXSXP:
        return Rcpp::traits::is_na<CPLXSXP>(Rcpp::as<Rcomplex>(target));
    case STRSXP: {
        Rcpp::StringVector vec(target);
        return Rcpp::traits::is_na<STRSXP>(vec[0]);
    }
    default:
        // Types not listed above are not handled by this check.
        Rcpp::stop("unsupported SEXP type");
}
If you want to check without using Rcpp, there are some caveats:
- As mentioned here, integer and logical NA (both stored as int) are equal to the minimum value of int (-2147483648).
- For double, you could directly use what Rcpp uses, namely R_isnancpp. Equivalently, the ISNAN macro could be used.
- For complex numbers, you could check both real and imaginary parts with the double method from above.
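The three caveats above can be sketched in plain C++ as follows. This is an illustration under assumptions, not R's implementation: it assumes a 32-bit int, uses std::isnan where R code would use ISNAN or R_isnancpp, and uses std::complex<double> as a stand-in for R's Rcomplex. The names int_is_na, dbl_is_na and cplx_is_na are hypothetical.

```cpp
#include <climits>
#include <cmath>
#include <complex>

// Hypothetical helper names; none of these are R or Rcpp functions.

// Integer and logical NA are both stored as the minimum int value
// (-2147483648 when int is 32 bits wide).
bool int_is_na(int x) { return x == INT_MIN; }

// True for both NA and NaN, like the ISNAN macro.
bool dbl_is_na(double x) { return std::isnan(x); }

// A complex value counts as missing if either part is missing.
// std::complex<double> stands in for R's Rcomplex here.
bool cplx_is_na(const std::complex<double>& z) {
    return dbl_is_na(z.real()) || dbl_is_na(z.imag());
}
```

Note that the integer check compares values, while the double check cannot use == at all, because a NaN never compares equal to anything, including itself.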
Character NA is tricky, since it's a singleton, so the address is what matters.
I personally have been testing ways to do operations with R characters without storing std::string to avoid copies, i.e. using the char* directly.
What I've found that works is to declare this in a .cpp file:
static const char *na_string_ptr = CHAR(Rf_asChar(NA_STRING));
and, based on this answer, do something like this for a Rcpp::StringVector or Rcpp::StringMatrix x:
Rcpp::CharacterVector one_string = Rcpp::as<Rcpp::CharacterVector>(x[i]);
char *ptr = (char *)(one_string[0]);
return ptr == na_string_ptr;
This last one still uses Rcpp, but I can use it once for initial setup and then just use the char pointers.
I'm sure there's a way to do something similar with R's API, but that's something I haven't tried yet.