When importing CSV files into R, how do I generate a column with the name of each CSV?
You have already done all the hard work. With a fairly small modification this should be straightforward.
The logic is:
- Create a small helper function that reads an individual csv and adds a column with the file name.
- Call this helper function in ldply(), which binds the results into a single data frame.
The following should work:
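library(plyr)  # provides ldply()

# Read one CSV file and record which file each row came from.
read_csv_filename <- function(filename) {
  ret <- read.csv(filename)
  ret$Source <- filename
  ret
}

# filenames is the character vector of CSV paths from the question.
import.list <- ldply(filenames, read_csv_filename)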
Note that I have proposed another small improvement to your code: read.csv() returns a data.frame, so you can use ldply() rather than llply(). ldply() combines the results into one data frame instead of returning a list.
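If you would rather not depend on plyr, the same idea can be sketched in base R with lapply() and do.call(rbind, ...). This is only an illustration under assumptions: the temporary directory, the two tiny CSV files, and the use of basename() to store a short file name are all made up for the demo and are not part of the original question.

```r
# Hypothetical example data: two small CSV files in a temporary directory.
tmp_dir <- tempfile("csv_demo_")
dir.create(tmp_dir)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(tmp_dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5, y = c("c", "d", "e")),
          file.path(tmp_dir, "second.csv"), row.names = FALSE)

# Full paths to the CSV files (list.files() returns them sorted).
filenames <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Same helper as above, except that it stores only the file name
# (an assumption for readability; use `filename` to keep the full path).
read_csv_filename <- function(filename) {
  ret <- read.csv(filename)
  ret$Source <- basename(filename)
  ret
}

# lapply() returns a list of data frames; do.call(rbind, ...) stacks them.
combined <- do.call(rbind, lapply(filenames, read_csv_filename))
print(combined)
```

Every row of combined carries the name of the file it was read from in the Source column.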
Try this:
do.call("rbind", sapply(filenames, read.csv, simplify = FALSE))
The row names will indicate the source and line number.
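If you want the source as a proper column rather than encoded in the row names, one possible approach is to keep the named list and repeat each name once per row. The sketch below assumes hypothetical demo files in a temporary directory, and the Source column name is just an illustrative choice.

```r
# Hypothetical example data: two small CSV files in a temporary directory.
tmp_dir <- tempfile("csv_demo_")
dir.create(tmp_dir)
write.csv(data.frame(x = 1:2), file.path(tmp_dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(tmp_dir, "second.csv"), row.names = FALSE)

# Name each path by its short file name so the list (and row names) use it.
filenames <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
names(filenames) <- basename(filenames)

# A named list of data frames, one element per file.
pieces <- sapply(filenames, read.csv, simplify = FALSE)
combined <- do.call("rbind", pieces)

# Row names are built from the list names; inspect them if you like.
print(rownames(combined))

# Add an explicit column: each list name repeated once per row of its piece.
combined$Source <- rep(names(pieces), vapply(pieces, nrow, integer(1)))
print(combined)
```

Repeating the list names avoids parsing the row names, which could be fragile if a file name itself ends in a dot followed by digits.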
Here is a solution using the import_list() function from rio, which is designed exactly for this purpose.
# set up some example files to import
rio::export(mtcars, "mtcars1.csv")
rio::export(mtcars, "mtcars2.csv")
rio::export(mtcars, "mtcars3.csv")
The default behavior of import_list() is to get a list of data frames:
str(rio::import_list(dir(pattern = "mtcars")), 1)
## List of 3
## $ :'data.frame': 32 obs. of 11 variables:
## $ :'data.frame': 32 obs. of 11 variables:
## $ :'data.frame': 32 obs. of 11 variables:
But you can use the rbind argument to instead construct a single data frame (note the _file column at the end):
str(rio::import_list(dir(pattern = "mtcars"), rbind = TRUE))
## 'data.frame': 96 obs. of 12 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : int 6 6 4 6 8 6 8 4 4 6 ...
## $ disp : num 160 160 108 258 360 ...
## $ hp : int 110 110 93 110 175 105 245 62 95 123 ...
## $ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec : num 16.5 17 18.6 19.4 17 ...
## $ vs : int 0 0 1 1 0 1 0 1 1 1 ...
## $ am : int 1 1 1 0 0 0 0 0 0 0 ...
## $ gear : int 4 4 4 3 3 3 3 4 4 4 ...
## $ carb : int 4 4 1 1 2 1 4 2 2 4 ...
## $ _file: chr "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" ...
You can also use the rbind_label argument to specify the name of the column that identifies each file:
str(rio::import_list(dir(pattern = "mtcars"), rbind = TRUE, rbind_label = "source"))
## 'data.frame': 96 obs. of 12 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : int 6 6 4 6 8 6 8 4 4 6 ...
## $ disp : num 160 160 108 258 360 ...
## $ hp : int 110 110 93 110 175 105 245 62 95 123 ...
## $ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec : num 16.5 17 18.6 19.4 17 ...
## $ vs : int 0 0 1 1 0 1 0 1 1 1 ...
## $ am : int 1 1 1 0 0 0 0 0 0 0 ...
## $ gear : int 4 4 4 3 3 3 3 4 4 4 ...
## $ carb : int 4 4 1 1 2 1 4 2 2 4 ...
## $ source: chr "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" "mtcars1.csv" ...
For full disclosure: I am the maintainer of rio.