When importing CSV files into R, how do I generate a column with the name of the CSV file?

You have already done all the hard work. With a fairly small modification, this should be straightforward.

The logic is:

1. Create a small helper function that reads an individual CSV file and adds a column with the file name.
2. Call this helper function in ldply() from the plyr package.

The following should work:

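library(plyr)

# Read one CSV file and add a column recording which file it came from
read_csv_filename <- function(filename){
    ret <- read.csv(filename)
    ret$Source <- filename  # store the file name in a new "Source" column
    ret
}

# filenames: your character vector of CSV paths, e.g. list.files(pattern = "\\.csv$")
import.list <- ldply(filenames, read_csv_filename)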

Note that I have proposed another small improvement to your code: because read.csv() returns a data.frame, you can use ldply(), which combines the results into a single data frame, rather than llply(), which returns a list.
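If you would rather avoid the plyr dependency, the same helper-function approach can be written in base R with lapply() and do.call(rbind, ...). This is a minimal sketch, not the code from the answer above: the two example files, their temporary paths, and the use of basename() to store only the file name (rather than the full path) are assumptions made so the example runs on its own.

```r
# Hypothetical example files written to a temporary directory
tmp <- tempdir()
filenames <- file.path(tmp, c("a.csv", "b.csv"))
write.csv(data.frame(x = 1:2), filenames[1], row.names = FALSE)
write.csv(data.frame(x = 3:5), filenames[2], row.names = FALSE)

# Read one CSV file and record its file name (without the directory) in a Source column
read_csv_filename <- function(filename) {
  ret <- read.csv(filename)
  ret$Source <- basename(filename)
  ret
}

# lapply() returns a list of data frames; do.call(rbind, ...) stacks them into one
combined <- do.call(rbind, lapply(filenames, read_csv_filename))
combined
```

Using basename() keeps the column readable when filenames holds full paths; drop it if you want the full path, as in the plyr version.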

Try this:

do.call("rbind", sapply(filenames, read.csv, simplify = FALSE))

The row names will indicate the source file and the row number within that file.
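If you want the source as a proper column rather than encoded in the row names, one option is to rebuild it from the names of the list that sapply() returns, repeating each name once per row. This is a sketch: the two example files and their temporary paths are hypothetical, and basename() is used to drop the directory part of each path.

```r
# Hypothetical example files written to a temporary directory
tmp <- tempdir()
filenames <- file.path(tmp, c("a.csv", "b.csv"))
write.csv(data.frame(x = 1:2), filenames[1], row.names = FALSE)
write.csv(data.frame(x = 3:5), filenames[2], row.names = FALSE)

# sapply() with simplify = FALSE returns a list of data frames named by file path
dfs <- sapply(filenames, read.csv, simplify = FALSE)
combined <- do.call("rbind", dfs)

# Repeat each file name once per row of its data frame, in the same order rbind used
combined$source <- basename(rep(names(dfs), vapply(dfs, nrow, integer(1))))
rownames(combined) <- NULL  # the path-based row names are no longer needed
combined
```

Repeating the names by row count avoids parsing the row names, whose format differs for single-row files (rbind() uses the bare name there rather than name.1).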

Here is a solution using the import_list() function from rio, which is designed exactly for this purpose.

# setup some example files to import
rio::export(mtcars, "mtcars1.csv")
rio::export(mtcars, "mtcars2.csv")
rio::export(mtcars, "mtcars3.csv")

The default behavior of import_list() is to get a list of data frames:

str(rio::import_list(dir(pattern = "mtcars")), 1)
## List of 3
##  $ :'data.frame':       32 obs. of  11 variables:
##  $ :'data.frame':       32 obs. of  11 variables:
##  $ :'data.frame':       32 obs. of  11 variables:

But you can use the rbind argument to construct a single data frame instead (note the _file column at the end):

str(rio::import_list(dir(pattern = "mtcars"), rbind = TRUE))
## 'data.frame':   96 obs. of  12 variables:
##  $ mpg  : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl  : int  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp : num  160 160 108 258 360 ...
##  $ hp   : int  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat : num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt   : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec : num  16.5 17 18.6 19.4 17 ...
##  $ vs   : int  0 0 1 1 0 1 0 1 1 1 ...
##  $ am   : int  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear : int  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb : int  4 4 1 1 2 1 4 2 2 4 ...
##  $ _file: chr  "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" ...

You can also use the rbind_label argument to specify the name of the column that identifies each file:

str(rio::import_list(dir(pattern = "mtcars"), rbind = TRUE, rbind_label = "source"))
## 'data.frame':   96 obs. of  12 variables:
##  $ mpg   : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl   : int  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp  : num  160 160 108 258 360 ...
##  $ hp    : int  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat  : num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt    : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec  : num  16.5 17 18.6 19.4 17 ...
##  $ vs    : int  0 0 1 1 0 1 0 1 1 1 ...
##  $ am    : int  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear  : int  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb  : int  4 4 1 1 2 1 4 2 2 4 ...
##  $ source: chr  "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" ...

For full disclosure: I am the maintainer of rio.
