data.table::fwrite behaving strangely when saving file names
I have a very large data frame and I am running multiple operations on subsets of its columns inside lapply, which includes exporting each subset using data.table::fwrite.
I have multiple columns whose names contain the same letters but differ in case (for example, Co2 and CO2). When I obtain the column names and run the process in lapply using dplyr::select, the correct columns are selected (i.e. the case is respected), but when I export the data, some of the exported file names have the incorrect case, while others have the correct case.
For example, take the data frame below:
test_df <- data.frame('CO2' = c(1,2,3,4,5),
'Co2' = c(6,7,8,9,10))
CO2 Co2
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
Below is the code I use to get the column names, followed by a simple subset of the dataset using lapply (note that the output is as expected).
library(dplyr)
test_colnames <- test_df %>% colnames()
lapply(test_colnames, function(x){
var_name <- all_of(x)
export <- test_df %>%
select(!!var_name) })
[[1]]
CO2
1 1
2 2
3 3
4 4
5 5
[[2]]
Co2
1 6
2 7
3 8
4 9
5 10
When I try to export the subsets using the code below I only get one output file (instead of two), which is named CO2, but has data from Co2. Somehow Co2 has been capitalised for the output name (but not the column names) and overwritten the CO2 file.
lapply(test_colnames, function(x){
var_name <- all_of(x)
export <- test_df %>%
select(!!var_name)
export %>% data.table::fwrite(paste0(data_directory, "test_out_", all_of(x), ".csv")) })
Only one file was output which was named test_out_CO2.csv, but below is the contents of the file.
Co2
1 6
2 7
3 8
4 9
5 10
Any ideas for how to fix this?
Edit: A simplified example. The code below outputs only one file, "testName.txt":
library(data.table)
fwrite(mtcars[1:2], "testName.txt")
fwrite(mtcars[3:4], "testname.txt")
Solution 1:
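This is not a data.table-specific issue: Windows file names are case-insensitive, so test_out_CO2.csv and test_out_Co2.csv refer to the same file and the second write overwrites the first.
I would suggest making the column names unique regardless of case: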
colnames(test_df) <- make.unique(tolower(colnames(test_df)), sep = "_")
colnames(test_df)
# [1] "co2" "co2_1"
# ... rest of your code
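If you would rather keep the original column names (CO2 and Co2) in the data, one possible variation is to apply the same make.unique(tolower(...)) idea only to the file names. The following is a minimal sketch, not a definitive fix: out_dir, file_stems and out_files are hypothetical names introduced here, and tempdir() stands in for the data_directory from the question.

```r
library(data.table)

test_df <- data.frame(CO2 = c(1, 2, 3, 4, 5),
                      Co2 = c(6, 7, 8, 9, 10))

# Hypothetical output location; replace with your own data_directory.
out_dir <- tempdir()

cols <- colnames(test_df)

# De-duplicate on the lower-cased names, so names that differ only by case
# get a numeric suffix. The column names in test_df are left unchanged.
file_stems <- make.unique(tolower(cols), sep = "_")
out_files <- file.path(out_dir, paste0("test_out_", file_stems, ".csv"))

for (i in seq_along(cols)) {
  # drop = FALSE keeps a one-column data.frame, so the header is written.
  fwrite(test_df[, cols[i], drop = FALSE], out_files[i])
}

basename(out_files)
# [1] "test_out_co2.csv"   "test_out_co2_1.csv"
```

Each file still carries its original column header (CO2 or Co2), and the two file names no longer collide on a case-insensitive file system.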