Override column types when importing data using readr::read_csv() when there are many columns

Solution 1:

Here follows a more generic answer to this question if someone happens to stumble upon this in the future. It is less advisable to use "skip" to jump columns as this will fail to work if the imported data source structure is changed.

It could be easier in your example to simply set a default column type, and then define any columns that differ from the default.

E.g., if all columns typically are "d", but the date column should be "D", load the data as follows:

  read_csv(df, col_types = cols(.default = "d", date = "D"))

or if, e.g., column date should be "D" and column "xxx" be "i", do so as follows:

  read_csv(df, col_types = cols(.default = "d", date = "D", xxx = "i"))

The use of "default" above is powerful if you have multiple columns and only specific exceptions (such as "date" and "xxx").

Solution 2:

Yes. For example to force numeric data to be treated as characters:

examplecsv = "a,b,c\n1,2,a\n3,4,d"
read_csv(examplecsv)
# A tibble: 2 x 3
#      a     b     c
#  <int> <int> <chr>
#1     1     2     a
#2     3     4     d
read_csv(examplecsv, col_types = cols(b = col_character()))
# A tibble: 2 x 3
#      a     b     c
#  <int> <chr> <chr>
#1     1     2     a
#2     3     4     d

Choices are:

col_character() 
col_date()
col_time() 
col_datetime() 
col_double() 
col_factor() # to enforce, will never be guessed
col_integer() 
col_logical() 
col_number() 
col_skip() # to force skip column

More: http://readr.tidyverse.org/articles/readr.html

Override column types when importing data using readr::read_csv() when there are many columns

Solution 1:

Solution 2:

Related

Recent Posts