Download a .csv file from GitHub using an httr GET request

I am trying to set up an automatic pull in R, using the GET function from the httr package, for a CSV file hosted on GitHub.

Here is the table I am trying to download.

https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv

I can make the connection to the file using the following GET request:

library(httr)

x <- httr::GET("https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")

However, I am unsure how to then convert the response into a data frame that matches the table on GitHub.

Any assistance would be much appreciated.


Solution 1:

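I am new to R, but here is my solution.

You need to use the raw version of the CSV file from GitHub (raw.githubusercontent.com). The github.com/.../blob/... URL returns the HTML page that displays the file, not the CSV data itself.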
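If you only have the browser ("blob") URL, the raw URL can be derived from it by swapping the host and dropping the /blob/ segment. The helper below is a minimal sketch of that rewrite; the name to_raw_url is my own (hypothetical), and it assumes the usual https://github.com/OWNER/REPO/blob/BRANCH/path layout.

```r
# Hypothetical helper: turn a github.com "blob" URL into the matching
# raw.githubusercontent.com URL. Assumes the standard layout
# https://github.com/OWNER/REPO/blob/BRANCH/path/to/file
to_raw_url <- function(blob_url) {
  sub(
    "^https://github\\.com/([^/]+)/([^/]+)/blob/",
    "https://raw.githubusercontent.com/\\1/\\2/",
    blob_url
  )
}

blob_url <- "https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"
raw_url <- to_raw_url(blob_url)
```

A URL that does not match the pattern is returned unchanged, so it is worth checking the result before passing it to GET.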

library(httr)

x <- httr::GET("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")

# Save to file
bin <- content(x, "raw")
writeBin(bin, "data.csv")

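# Read as csv (the file uses "." as the decimal mark, so keep the default dec)
dat = read.csv("data.csv", header = TRUE)

# read.csv prefixes the date column names (e.g. 1/22/20) with "X"; strip only that leading "X"
colnames(dat) = sub("^X", "", colnames(dat))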

# Group by country name (to sum over regions)
# Skip the first four columns, which contain metadata (province, country, lat, long)
countries = aggregate(dat[, 5:ncol(dat)], by=list(Country.Region=dat$Country.Region), FUN=sum)

# Here is the table of the most recent total confirmed cases
countries_total = countries[, c(1, ncol(countries))]

The output graph

How I got this to work:

  • How to sum a variable by group
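
As an alternative to saving data.csv to disk first, the response body can probably be parsed in memory. The following is a rough sketch, not a tested replacement for the code above: parse_csv_text and fetch_csv are hypothetical helper names of mine, and it assumes the httr package is installed. Setting check.names = FALSE keeps the original headers (e.g. Country/Region, 1/22/20), so no "X" prefix has to be removed afterwards.

```r
# Hypothetical helper: parse CSV text into a data frame, keeping the
# column headers exactly as they appear in the file.
parse_csv_text <- function(csv_text) {
  read.csv(text = csv_text, header = TRUE,
           check.names = FALSE, stringsAsFactors = FALSE)
}

# Hypothetical helper: download a raw CSV URL and return a data frame.
# Requires the httr package; nothing is written to disk.
fetch_csv <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)  # stop with an error on e.g. a 404
  parse_csv_text(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Tiny made-up sample (not real data) to show the parsing step offline
sample_text <- "Province/State,Country/Region,1/22/20\nA,X,1\nB,X,2\n"
sample_dat <- parse_csv_text(sample_text)
```

With the real file you would call fetch_csv() on the raw.githubusercontent.com URL. Note that the country column is then named Country/Region rather than Country.Region, so it has to be accessed as dat[["Country/Region"]].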