as.numeric with comma decimal separators?

I have a large vector of strings of the form:

Input = c("1,223", "12,232", "23,0")

etc. That is, the decimal separator is a comma instead of a period. I want to convert this vector into a numeric vector. Unfortunately, as.numeric(Input) just outputs NA.

My first instinct would be to go to strsplit, but it seems to me that this will likely be very slow. Does anyone have any idea of a faster option?

There's an existing question that suggests read.csv2, but the strings in question are not directly read in that way.


Solution 1:

as.numeric(sub(",", ".", Input, fixed = TRUE))

should work.
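
A minimal sketch of that one-liner applied to the vector from the question. The helper name comma_to_numeric is made up for illustration. Note that sub replaces only the first comma in each string, which is enough when the comma is the only separator present.

```r
# Input vector from the question
Input <- c("1,223", "12,232", "23,0")

# Hypothetical helper name; it only wraps the one-liner above
comma_to_numeric <- function(x) {
  # fixed = TRUE treats "," as a literal string, not a regular expression
  as.numeric(sub(",", ".", x, fixed = TRUE))
}

result <- comma_to_numeric(Input)
print(result)
## [1]  1.223 12.232 23.000
```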

Solution 2:

The readr package has a function to parse numbers from strings. You can set many options via the locale argument.

For comma as decimal separator you can write:

readr::parse_number(Input, locale = readr::locale(decimal_mark = ","))

Solution 3:

scan(text=Input, dec=",")
## [1]  1.223 12.232 23.000

But how well this works depends on how long your vector is. I used rep(Input, 1e6) to make a long vector and my machine just hangs. 1e4 is fine, though. @adibender's solution is much faster. Timings for the 1e4 case:

Unit: milliseconds
         expr        min         lq     median         uq        max neval
  adibender()   6.777888   6.998243   7.119136   7.198374   8.149826   100
 sebastianc() 504.987879 507.464611 508.757161 510.732661 517.422254   100
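
A rough way to reproduce a comparison like this with base R only. It is a sketch under assumptions: the bodies of adibender() and sebastianc() are not shown above, so via_sub and via_scan below are guesses at what they wrap, and system.time is used in place of the benchmarking package that produced the table. Absolute timings will differ by machine and R version.

```r
Input <- c("1,223", "12,232", "23,0")
long_input <- rep(Input, 1e4)  # 30,000 strings

# Assumed to correspond to adibender() in the table above
via_sub <- function(x) as.numeric(sub(",", ".", x, fixed = TRUE))

# Assumed to correspond to sebastianc() in the table above;
# quiet = TRUE suppresses the "Read N items" message
via_scan <- function(x) scan(text = x, dec = ",", quiet = TRUE)

time_sub  <- system.time(res_sub  <- via_sub(long_input))
time_scan <- system.time(res_scan <- via_scan(long_input))

# Elapsed wall-clock seconds for a single run of each approach
print(c(sub = time_sub[["elapsed"]], scan = time_scan[["elapsed"]]))

# Both approaches should produce the same numbers
stopifnot(isTRUE(all.equal(res_sub, res_scan)))
```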

Solution 4:

Also, if you are reading in the raw data, read.table and all its associated functions have a dec argument, e.g.:

read.table("file.txt", dec=",")
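
To try this without a file on disk, the text argument of read.table can stand in for one. The column names and the semicolon field separator below are made up for illustration. read.csv2 uses sep = ";" and dec = "," as its defaults, so it would read the same layout.

```r
# Inline stand-in for a file; column names and values are made up
raw <- "id;value\n1;1,223\n2;12,232\n3;23,0"

# dec = "," tells the reader to treat the comma as the decimal mark
df <- read.table(text = raw, header = TRUE, sep = ";", dec = ",")

str(df)  # the value column should come back as numeric
```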

When all else fails, gsub and sub are your friends.
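
One case where gsub is needed in addition to sub, sketched under an assumption that goes beyond the original question: the strings also use a period as a thousands separator. The sample values are made up.

```r
# Made-up values: "." groups thousands, "," marks the decimal
x <- c("1.234,5", "12.345.678,9", "23,0")

# gsub removes every ".", whereas sub would remove only the first one
no_groups <- gsub(".", "", x, fixed = TRUE)

# After that, a single comma remains, so sub is sufficient
y <- as.numeric(sub(",", ".", no_groups, fixed = TRUE))
print(y)
```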