Test for numeric elements in a character string
More complicated values elsewhere in your data might break this, but my first thought is:
> !is.na(as.numeric(x))
[1] TRUE TRUE TRUE TRUE FALSE FALSE
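As a minimal sketch, the same check can be wrapped in a helper that also silences the coercion warning as.numeric() emits for non-numeric strings. The helper name is_numeric_string and the input vector are illustrative assumptions, since the original x is not shown here:

```r
# Hypothetical helper: TRUE where as.numeric() can convert the string
is_numeric_string <- function(x) {
  # suppressWarnings() hides the "NAs introduced by coercion" warning
  !is.na(suppressWarnings(as.numeric(x)))
}

# Illustrative input, not the original poster's data
x <- c("1", "2.5", "1e4", "-3", "abc", "1.2.3")
res <- is_numeric_string(x)
res
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
```

Note that an NA input also comes back as FALSE with this test, which may or may not be what you want.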
As noted below by Josh O'Brien, this won't pick up things like 7L, which the R interpreter would parse as the integer 7. If you need to include those as "plausibly numeric", one route is to pick them out with a regex first:
x <- c("1.2","1e4","1.2.3","5L")
> x
[1] "1.2" "1e4" "1.2.3" "5L"
> grepl("^[[:digit:]]+L$",x)
[1] FALSE FALSE FALSE TRUE
...and then strip the "L" from just those elements using gsub and indexing.
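The gsub and indexing step might look like the sketch below. It assumes the only non-standard form worth rescuing is a run of digits followed by L, and it reuses the as.numeric() test from the first suggestion. The names is_int, cleaned, and numeric_like are illustrative:

```r
x <- c("1.2", "1e4", "1.2.3", "5L")

# Flag strings that look like R integer literals (digits followed by L)
is_int <- grepl("^[[:digit:]]+L$", x)

# Strip the trailing "L" from just those elements
cleaned <- x
cleaned[is_int] <- gsub("L$", "", cleaned[is_int])

# Apply the as.numeric() test to the cleaned strings
numeric_like <- !is.na(suppressWarnings(as.numeric(cleaned)))
numeric_like
# [1]  TRUE  TRUE FALSE  TRUE
```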
I recently encountered a similar problem while writing a function to format values passed as character strings from another function. The formatted values would ultimately end up in a table, and I wanted logic to identify NA, character strings, and character representations of numbers so that I could apply sprintf() to them before generating the table.
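To make the use case concrete, here is a rough sketch of such a formatter. It is not the function described above: the name format_values and the "%.2f" format are assumptions, and for brevity it detects numbers with the simple as.numeric() test rather than a regular expression:

```r
# Hypothetical formatter: apply sprintf() only to numeric-looking strings
format_values <- function(x, fmt = "%.2f") {
  num <- suppressWarnings(as.numeric(x))  # NA where conversion fails
  is_num <- !is.na(num)
  out <- x
  out[is_num] <- sprintf(fmt, num[is_num])
  out  # NA and non-numeric strings pass through unchanged
}

formatted <- format_values(c("1.5", "abc", NA, "1e2"))
formatted
# [1] "1.50"   "abc"    NA       "100.00"
```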
Although it is more complicated to read, I like the robustness of the grepl() approach. I think this handles all of the examples brought up in the comments.
x <- c("0",37,"42","-5","-2.3","1.36e4","4L","La","ti","da",NA)
y <- grepl("^([-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+)$",x)
This evaluates to (formatted to help with visualization):
x
[1] "0" "37" "42" "-5" "-2.3" "1.36e4" "4L" "La" "ti" "da" NA
y
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
The regular expression is TRUE for:
- positive or negative numbers with at most one decimal point OR
- positive or negative integers (e.g., 4L) OR
- positive or negative numbers in scientific notation
Additional terms could be added to handle decimals without a leading digit (e.g., ".5") if the dataset contains numbers in poor form. A number with a decimal point but no digits after it (e.g., "5.") is already accepted by the first alternative.
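One way to add a term for decimals without a leading digit is sketched below. This is an assumption about how the pattern could be extended, not the original author's expression: it is anchored with ^ and $ so that strings such as "1.2.3" are rejected as a whole, and it also allows an explicit sign on the number and on the exponent. The name num_pattern is illustrative:

```r
# Sketch of an extended pattern (illustrative, not the original regex):
#   alternative 1: optional sign, then digits with optional fraction
#                  OR a leading-dot fraction, then optional exponent
#   alternative 2: optional sign, digits, trailing L (integer literal)
num_pattern <- paste0(
  "^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][-+]?[0-9]+)?$",
  "|^[-+]?[0-9]+L$"
)

x <- c("0", "-5", "-2.3", "1.36e4", "4L", ".5", "5.", "1.2.3", "La", NA)
y <- grepl(num_pattern, x)
y
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
```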
Avoid re-inventing the wheel with check.numeric() from package varhandle.
The function accepts the following arguments:
- v: The character vector or factor vector. (Mandatory)
- na.rm: logical. Should the function ignore NA? Default value is FALSE since NA can be converted to numeric. (Optional)
- only.integer: logical. Only check for integers and do not accept floating point. Default value is FALSE. (Optional)
- exceptions: A character vector containing the strings that should be considered as valid to be converted to numeric. (Optional)
- ignore.whitespace: logical. Ignore leading and trailing whitespace characters before assessing if the vector can be converted to numeric. Default value is TRUE. (Optional)
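A usage sketch, assuming the varhandle package is installed (the call is skipped otherwise). The argument names come from the list above; the input vector and the choice of "4L" as an exception are illustrative, and no particular output is claimed here:

```r
# Illustrative input, not from the original answer
v <- c("0", "-2.3", " 42 ", "4L", "abc", NA)

if (requireNamespace("varhandle", quietly = TRUE)) {
  res <- varhandle::check.numeric(
    v,
    na.rm = FALSE,              # default, per the argument list
    only.integer = FALSE,       # accept floating point values too
    exceptions = c("4L"),       # treat this string as convertible
    ignore.whitespace = TRUE    # " 42 " is assessed as "42"
  )
  print(res)  # logical vector of check results
}
```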
Another possibility:
x <- c("0.33", ".1", "3", "123", "2.3.3", "1.2r", "1.2", "1e4", "1.2.3", "5L", ".22", -3)
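locs <- sapply(x, function(n) {
  # Try to parse and evaluate the string as R code
  out <- try(eval(parse(text = n)), silent = TRUE)
  # TRUE if no error occurred (any valid R expression passes, not only numbers)
  !inherits(out, 'try-error')
}, USE.NAMES = FALSE)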
x[locs]
## [1] "0.33" ".1" "3" "123" "1.2" "1e4" "5L" ".22" "-3"
x[!locs]
## [1] "2.3.3" "1.2r" "1.2.3"
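Because eval() runs whatever the string contains, a string such as "pi" or "1+1" also passes the test above. A more conservative variant, sketched here under the assumption that only single numeric literals with an optional leading sign should count, parses the string without evaluating it and inspects the parsed expression. This swaps in a different technique from the one above, and the name is_numeric_literal is hypothetical:

```r
# Hypothetical helper: parse only, never evaluate
is_numeric_literal <- function(s) {
  exprs <- tryCatch(parse(text = s), error = function(e) NULL)
  # Reject parse failures and anything that is not exactly one expression
  if (is.null(exprs) || length(exprs) != 1) return(FALSE)
  e <- exprs[[1]]
  # "-3" parses as a call to unary minus; unwrap one leading sign
  if (is.call(e) && length(e) == 2 &&
      (identical(e[[1]], as.name("-")) || identical(e[[1]], as.name("+")))) {
    e <- e[[2]]
  }
  # Numeric constants (double or integer, e.g. 5L) are numeric; symbols
  # and other calls are not
  is.numeric(e)
}

x <- c("0.33", ".1", "3", "2.3.3", "1.2r", "5L", "-3", "pi", "1+1")
locs <- vapply(x, is_numeric_literal, logical(1), USE.NAMES = FALSE)
x[locs]
## [1] "0.33" ".1"   "3"    "5L"   "-3"
x[!locs]
## [1] "2.3.3" "1.2r"  "pi"    "1+1"
```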