Importing csv file into R - numeric values read as characters
Solution 1:
Whatever algebra you are doing in Excel to create the new column could probably be done more effectively in R.
Please try the following: Read the raw file (before any excel manipulation) into R using read.csv(... stringsAsFactors=FALSE)
. [If that does not work, please take a look at ?read.table
(which read.csv
wraps), however there may be some other underlying issue].
For example:
delim = "," # or is it "\t" ?
dec = "." # or is it "," ?
myDataFrame <- read.csv("path/to/file.csv", header=TRUE, sep=delim, dec=dec, stringsAsFactors=FALSE)
Then, let's say your numeric columns is column 4
myDataFrame[, 4] <- as.numeric(myDataFrame[, 4]) # you can also refer to the column by "itsName"
Lastly, if you need any help with accomplishing in R the same tasks that you've done in Excel, there are plenty of folks here who would be happy to help you out
Solution 2:
In read.table
(and its relatives) it is the na.strings
argument which specifies which strings are to be interpreted as missing values NA
. The default value is na.strings = "NA"
If missing values in an otherwise numeric variable column are coded as something else than "NA
", e.g. ".
" or "N/A
", these rows will be interpreted as character
, and then the whole column is converted to character
.
Thus, if your missing values are some else than "NA
", you need to specify them in na.strings
.
Solution 3:
If you're dealing with large datasets (i.e. datasets with a high number of columns), the solution noted above can be manually cumbersome, and requires you to know which columns are numeric a priori.
Try this instead.
char_data <- read.csv(input_filename, stringsAsFactors = F)
num_data <- data.frame(data.matrix(char_data))
numeric_columns <- sapply(num_data,function(x){mean(as.numeric(is.na(x)))<0.5})
final_data <- data.frame(num_data[,numeric_columns], char_data[,!numeric_columns])
The code does the following:
- Imports your data as character columns.
- Creates an instance of your data as numeric columns.
- Identifies which columns from your data are numeric (assuming columns with less than 50% NAs upon converting your data to numeric are indeed numeric).
- Merging the numeric and character columns into a final dataset.
This essentially automates the import of your .csv file by preserving the data types of the original columns (as character and numeric).