Read Excel file from a URL using the readxl package
Consider a file on the internet (like this one (note the s in https) https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls
How can the sheet 2 of the file be read into R?
The following code is approximation of what is desired (but fails)
url1<-'https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls'
p1f <- tempfile()
download.file(url1, p1f, mode="wb")
p1<-read_excel(path = p1f, sheet = 2)
Solution 1:
This works for me on Windows:
library(readxl)
library(httr)
packageVersion("readxl")
# [1] ‘0.1.1’
GET(url1, write_disk(tf <- tempfile(fileext = ".xls")))
df <- read_excel(tf, 2L)
str(df)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 20131 obs. of 8 variables:
# $ Code : chr "C115388" "C115800" "C115801" "C115802" ...
# $ Codelist Code : chr NA "C115388" "C115388" "C115388" ...
# $ Codelist Extensible (Yes/No): chr "No" NA NA NA ...
# $ Codelist Name : chr "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" ...
# $ CDISC Submission Value : chr "SIXMW1TC" "SIXMW101" "SIXMW102" "SIXMW103" ...
# $ CDISC Synonym(s) : chr "6 Minute Walk Functional Test Test Code" "SIXMW1-Distance at 1 Minute" "SIXMW1-Distance at 2 Minutes" "SIXMW1-Distance at 3 Minutes" ...
# $ CDISC Definition : chr "6 Minute Walk Test test code." "6 Minute Walk Test - Distance at 1 minute." "6 Minute Walk Test - Distance at 2 minutes." "6 Minute Walk Test - Distance at 3 minutes." ...
# $ NCI Preferred Term : chr "CDISC Functional Test 6MWT Test Code Terminology" "6MWT - Distance at 1 Minute" "6MWT - Distance at 2 Minutes" "6MWT - Distance at 3 Minutes" ...
Solution 2:
From this issue on Github (#278):
some functionality for supporting more general inputs will be pulled out of readr, at which point readxl can exploit that.
So we should be able to pass urls directly to read_excel()
in the (hopefully near) future.
Solution 3:
use rio
R package. link. Here a reprex:
library(tidyverse)
library(rio)
url <- 'https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls'
rio::import(file = url,which = 2) %>%
glimpse()
#>
#> Rows: 30,995
#> Columns: 8
#> $ Code <chr> "C141663", "C141706", "C141707"...
#> $ `Codelist Code` <chr> NA, "C141663", "C141663", "C141...
#> $ `Codelist Extensible (Yes/No)` <chr> "No", NA, NA, NA, "No", NA, NA,...
#> $ `Codelist Name` <chr> "4 Stair Ascend Functional Test...
#> $ `CDISC Submission Value` <chr> "A4STR1TC", "A4STR101", "A4STR1...
#> $ `CDISC Synonym(s)` <chr> "4 Stair Ascend Functional Test...
#> $ `CDISC Definition` <chr> "4 Stair Ascend test code.", "4...
#> $ `NCI Preferred Term` <chr> "CDISC Functional Test 4 Stair ...
Solution 4:
A simpler solution is using the openxlsx package. Here is an example, which can be adapted to your needs:
library(openxlsx)
df = read.xlsx("https://archive.ics.uci.edu/ml/machine-learning-databases/00242/ENB2012_data.xlsx",sheet=1)