Converting unit abbreviations to numbers
Solution 1:
- So you want to translate SI prefix abbreviations ('K', 'M', ...) into exponents, and thus numerical powers of ten. Given that all the prefixes are single letters and the exponents are uniformly spaced multiples of 3, here's working code that handles 'Kilo' through 'Yotta', and can be extended to further prefixes by appending letters to the string:
> 10 ** (3*as.integer(regexpr('T', 'KMGTPEZY')))
[1] 1e+12
Then just multiply that power-of-ten by the decimal value you have.
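- Also, you probably want to detect and handle the 'no-match' case for unknown prefix letters; otherwise regexpr returns -1 and you'd get a nonsensical exponent of -1*3:
> unit_to_power <- function(u) {
    # Position of u in the prefix string; -1 if u is not found
    idx <- as.integer(regexpr(u, 'KMGTPEZY', fixed = TRUE))
    # Unknown prefixes fall back to a multiplier of 1
    if (idx > 0) 10^(3 * idx) else 1
  }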
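To make the "multiply" step concrete, here is a minimal sketch that splits a string such as '1.5G' into its numeric part and its prefix letter, then applies the same position lookup. The name parse_si and the assumption that the input is non-empty with at most one trailing prefix letter are mine, not part of the answer above.

```r
# Hypothetical helper: convert strings like "1.5G" to plain numbers.
# Assumes a non-empty string with at most one trailing prefix letter
# from K, M, G, T, P, E, Z, Y.
parse_si <- function(s) {
  prefixes <- "KMGTPEZY"
  last <- substr(s, nchar(s), nchar(s))   # candidate prefix letter
  idx <- as.integer(regexpr(last, prefixes, fixed = TRUE))
  if (idx > 0) {
    # Known prefix: strip it and scale by 10^(3 * position)
    as.numeric(substr(s, 1, nchar(s) - 1)) * 10^(3 * idx)
  } else {
    # No recognised prefix: treat the whole string as a number
    as.numeric(s)
  }
}

parse_si("12M")
parse_si("1.5G")
parse_si("42")
```

Checking idx before exponentiating is what keeps an unknown last character (here the digit '2' in "42") from turning into a spurious factor of 10^-3.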
Now, if you want to match both 'k' and 'K' to Kilo case-insensitively (computer people often write 'K', although the strict SI symbol is lower-case 'k'), you'll need to special-case it, e.g. with an if-else ladder or expression. SI prefixes are case-sensitive in general: 'M' means 'Mega' but 'm' strictly means 'milli', even if disk-drive users say otherwise, and upper-case is conventionally used for positive exponents (with 'k', 'h' and 'da' as the lower-case exceptions). So for just a few prefixes, @DanielV's case-specific code is better.
If you want negative SI prefixes too, use
as.integer(regexpr(u, 'yzafpnum@KMGTPEZY', fixed = TRUE)) - 9
where '@' is just a throwaway character that keeps the spacing uniform (it sits at exponent 0 and shouldn't actually get matched), and 'u' stands in for micro. Note that a no-match now gives -1 - 9 = -10, so test the regexpr result before subtracting. Again, if you need to handle prefixes that aren't powers of 10**3, like 'deci' and 'centi', that will require special-casing, or the general dict-based approach WeNYoBen uses.
base::regexpr is not vectorized over its pattern argument, and its performance is bad on big inputs, so if you want to vectorize and get higher performance, use stringr::str_locate.
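As an illustration of a vectorized lookup, the sketch below deliberately swaps in base-R match() against a vector of single characters rather than the stringr::str_locate route mentioned above, so it has no package dependency. The name si_multiplier is hypothetical, and the prefix string assumes the full 'y' to 'Y' range with '@' as the exponent-0 placeholder.

```r
# Hypothetical vectorized lookup: SI prefix letter -> multiplier.
# Uses match() (a swapped-in base-R technique, not stringr::str_locate).
si_multiplier <- function(u) {
  letters_ <- strsplit("yzafpnum@KMGTPEZY", "")[[1]]
  zero_pos <- match("@", letters_)   # position that corresponds to 10^0
  idx <- match(u, letters_)          # NA for unknown letters
  idx[u %in% "@"] <- NA              # the placeholder itself must not match
  # Unknown letters get a multiplier of 1
  ifelse(is.na(idx), 1, 10^(3 * (idx - zero_pos)))
}

si_multiplier(c("K", "m", "Y", "?"))
```

Because match() compares whole strings, regex metacharacters in the input need no escaping, and multi-character inputs simply fall through to the multiplier of 1.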
Solution 2:
Give this a shot:
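Text_Num <- function(x) {
  # x is a single string such as "12M" or "1.2k"
  if (grepl("M", x, ignore.case = TRUE)) {
    # Strip the suffix and scale by a million
    as.numeric(gsub("M", "", x, ignore.case = TRUE)) * 1e6
  } else if (grepl("k", x, ignore.case = TRUE)) {
    # Strip the suffix and scale by a thousand
    as.numeric(gsub("k", "", x, ignore.case = TRUE)) * 1e3
  } else {
    # No suffix: plain number
    as.numeric(x)
  }
}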
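Text_Num as written expects a single string, because `if` needs a length-one condition. One way to apply it to a whole vector, sketched here as an assumption about how you might use it, is to wrap the call in vapply. The function is repeated so the example is self-contained, and the input vector is made up for illustration.

```r
Text_Num <- function(x) {
  if (grepl("M", x, ignore.case = TRUE)) {
    as.numeric(gsub("M", "", x, ignore.case = TRUE)) * 1e6
  } else if (grepl("k", x, ignore.case = TRUE)) {
    as.numeric(gsub("k", "", x, ignore.case = TRUE)) * 1e3
  } else {
    as.numeric(x)
  }
}

# Apply element-wise; vapply checks that each result is a single number
values <- c("12M", "1.2k", "7")   # illustrative input
result <- vapply(values, Text_Num, numeric(1), USE.NAMES = FALSE)
result
```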
Solution 3:
In your case you can use gsubfn:
a=c('12M','1.2k')
dict<-list("k" = "e3", "M" = "e6")
as.numeric(gsubfn::gsubfn(paste(names(dict),collapse="|"),dict,a))
[1] 1.2e+07 1.2e+03