as.numeric with comma decimal separators?
I have a large vector of strings of the form:
Input = c("1,223", "12,232", "23,0")
etc. That is, the decimal separator is a comma instead of a period. I want to convert this vector into a numeric vector. Unfortunately, as.numeric(Input) just outputs NA.
My first instinct would be to go to strsplit, but it seems to me that this will likely be very slow. Does anyone have any idea of a faster option?
There's an existing question that suggests read.csv2, but the strings in question are not directly read in that way.
Solution 1:
as.numeric(sub(",", ".", Input, fixed = TRUE))
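should work. Note that sub replaces only the first comma in each string, which is enough when each string contains a single decimal separator.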
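As a minimal sketch, here is that one-liner applied to the vector from the question. The name Output is only an illustration and is not part of the original answer.

```r
Input <- c("1,223", "12,232", "23,0")

# fixed = TRUE treats "," as a literal string rather than a regular
# expression, which skips the regex engine entirely.
Output <- as.numeric(sub(",", ".", Input, fixed = TRUE))

# Output now holds the numeric values 1.223, 12.232 and 23.
print(Output)
```

Because sub and as.numeric are both vectorised, no explicit loop or strsplit call is needed.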
Solution 2:
The readr package has a function to parse numbers from strings. You can set many options via the locale argument.
For comma as decimal separator you can write:
readr::parse_number(Input, locale = readr::locale(decimal_mark = ","))
Solution 3:
scan(text=Input, dec=",")
## [1] 1.223 12.232 23.000
But it depends on how long your vector is. I used rep(Input, 1e6) to make a long vector and my machine just hangs. With rep(Input, 1e4) it is fine, though. @adibender's solution is much faster; on the 1e4 version the median time is roughly 70 times lower:
Unit: milliseconds
         expr        min         lq     median         uq        max neval
  adibender()   6.777888   6.998243   7.119136   7.198374   8.149826   100
 sebastianc() 504.987879 507.464611 508.757161 510.732661 517.422254   100
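The definitions of adibender() and sebastianc() are not shown in the thread. The sketch below assumes they simply wrap the sub-based and scan-based approaches respectively, and it uses base R's system.time instead of the benchmarking package that produced the table above. Absolute timings will differ by machine and R version.

```r
Input <- c("1,223", "12,232", "23,0")
Long <- rep(Input, 1e4)

# Hypothetical reconstructions of the two benchmarked functions;
# the originals are not given in the thread.
adibender <- function() as.numeric(sub(",", ".", Long, fixed = TRUE))
sebastianc <- function() scan(text = Long, dec = ",", quiet = TRUE)

# Both approaches should produce the same numeric vector.
same_result <- isTRUE(all.equal(adibender(), sebastianc()))
print(same_result)

# Rough timings: ten runs of the sub version, one run of the scan version.
print(system.time(for (i in 1:10) adibender()))
print(system.time(sebastianc()))
```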
Solution 4:
Also, if you are reading in the raw data, read.table and all the associated functions have a dec argument. E.g.:
read.table("file.txt", dec=",")
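For a self-contained illustration, the text argument of read.table can stand in for a file. The semicolon-separated data and the column names id and value below are made up for this sketch.

```r
# Hypothetical input: semicolon-separated fields, comma as decimal mark.
raw <- "id;value\na;1,223\nb;12,232\nc;23,0"

# text = replaces the file name; sep and dec describe the format.
df <- read.table(text = raw, header = TRUE, sep = ";", dec = ",")

# The value column is parsed as numeric, not character.
str(df)
```

The field separator has to differ from the decimal mark, which is why read.csv2 defaults to sep = ";" together with dec = ",".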
When all else fails, gsub and sub are your friends.
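One case where gsub is needed in addition to sub is input that also uses "." as a thousands separator. The vector Mixed below is a made-up example, not data from the question, and the sketch assumes every "." in the input is a thousands separator.

```r
# Hypothetical strings: "." groups thousands, "," marks the decimals.
Mixed <- c("1.234,56", "12.345.678,9", "23,0")

# gsub removes every "."; sub then swaps the single decimal comma.
NoThousands <- gsub(".", "", Mixed, fixed = TRUE)
Values <- as.numeric(sub(",", ".", NoThousands, fixed = TRUE))

print(Values)
```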