Loop over pages and scrape Excel file paths using rvest

For the entries listed at this link, I need to open each entry and then scrape the URL of the Excel file, which appears in the bottom-left part of the page:

How can I do that with web-scraping packages in R such as rvest? Many thanks in advance.


library(rvest)

# Start by reading the HTML page with read_html():
common_list <- read_html("http://www.csrc.gov.cn/csrc/c100121/common_list.shtml")
common_list %>%
  # extract the <a> (link) nodes
  rvest::html_nodes("a") %>%
  # extract the link text
  rvest::html_text() -> webtxt
# inspect
webtxt

My first question: how should I set html_nodes to get the URL of each entry page?

> driver
[1] "No sessionInfo. Client browser is mostly likely not opened."

PROCESS 'file105483d2b3a.bat', running, pid 37512.
> remDr
[1] "localhost"

[1] 4567

[1] "chrome"

[1] ""

[1] "ANY"

[1] TRUE

[1] TRUE


When I run remDr$navigate(url):

Error in checkError(res) : 
  Undefined error in httr call. httr output: length(url) == 1 is not TRUE

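Use rvest to get the links:

library(rvest)

# `url` is the listing page
link <- url %>%
  read_html() %>%
  html_nodes("ul")  # assumption: the selector step is missing from the original post; "ul" is only a placeholder

link <- link[[2]] %>% 
  html_nodes("a") %>% 
  html_attr('href') %>% paste0('http://www.csrc.gov.cn', .)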

 [1] "http://www.csrc.gov.cn/csrc/c101921/c1758587/content.shtml"                         
 [2] "http://www.csrc.gov.cn/csrc/c101921/c1714636/content.shtml"                         
 [3] "http://www.csrc.gov.cn/csrc/c101921/c1664367/content.shtml"                         
 [4] "http://www.csrc.gov.cn/csrc/c101921/c1657437/content.shtml"                         
 [5] "http://www.csrc.gov.cn/csrc/c101921/c1657426/content.shtml"     
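An alternative that does not depend on the position of a node in the page is to take every href and keep only those that look like entry pages. The following is a minimal sketch under assumptions: the inline HTML is a hypothetical stand-in for the listing page, and the "content.shtml" pattern is inferred from the links printed above, not from the real page structure.

```r
library(rvest)

# Hypothetical stand-in for the listing page; on the real site use read_html(url).
page <- read_html('
<html><body>
  <ul>
    <li><a href="/csrc/c101921/c1758587/content.shtml">Entry 1</a></li>
    <li><a href="/csrc/c101921/c1714636/content.shtml">Entry 2</a></li>
    <li><a href="/csrc/c100121/common_list.shtml">Next page</a></li>
  </ul>
</body></html>')

# html_attr("href") returns the link target, html_text() only the visible label
hrefs <- page %>%
  html_nodes("a") %>%
  html_attr("href")

# keep only the entry pages (assumed to end in content.shtml)
entry_hrefs <- hrefs[grepl("content\\.shtml$", hrefs)]

# resolve the relative paths against the site root
links <- xml2::url_absolute(entry_hrefs, "http://www.csrc.gov.cn/")
links
```

Using url_absolute() instead of paste0() also copes with hrefs that are already absolute.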

We can use RSelenium to loop over the links and download the Excel files. It took me over a minute to load a single page completely, so I will demonstrate here using a single link.

library(RSelenium)

url <- "http://www.csrc.gov.cn/csrc/c101921/c1758587/content.shtml"
# launch the browser
driver <- rsDriver(browser = "chrome")
remDr <- driver[["client"]]

# open the page; navigate() expects a single URL string
remDr$navigate(url)

# click on the excel file path
remDr$findElement('xpath', '//*[@id="files"]/a')$clickElement()
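To extend this to all links, one option is to pass the URLs to navigate() one at a time; the error message in the question suggests that navigate() received something other than a single URL. The sketch below rests on some assumptions: `click_attachments` and `wait_secs` are hypothetical names, `remDr` is a client like the one created above, and a fixed pause stands in for a proper wait on page load.

```r
# Sketch: visit each entry page in turn and click its attachment link.
# `click_attachments` and `wait_secs` are hypothetical names, not part of RSelenium.
click_attachments <- function(remDr, links, wait_secs = 5) {
  clicked <- logical(length(links))
  for (i in seq_along(links)) {
    # pass exactly one URL per call
    remDr$navigate(links[[i]])
    # crude pause while the page loads; adjust to the site's speed
    Sys.sleep(wait_secs)
    clicked[i] <- tryCatch({
      remDr$findElement("xpath", '//*[@id="files"]/a')$clickElement()
      TRUE
    }, error = function(e) FALSE)  # no attachment found, or page not loaded yet
  }
  # logical vector: TRUE where the attachment link was clicked
  clicked
}
```

Calling click_attachments(remDr, link) returns one TRUE/FALSE per page, so pages that failed can be retried with a longer wait.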