Scraping a JavaScript website in R
So, RSelenium is not the only answer (anymore). If you can install the PhantomJS binary (grab the phantomjs binaries from here: http://phantomjs.org/), then you can use it to render the HTML and scrape it with rvest
(similar to the RSelenium approach, but it doesn't require Java):
library(rvest)
# render HTML from the site with phantomjs
url <- "http://www.scoreboard.com/game/rosol-l-goffin-d-2014/8drhX07d/#game-summary"
writeLines(sprintf("var page = require('webpage').create();
page.open('%s', function () {
  console.log(page.content); // page source
  phantom.exit();
});", url), con = "scrape.js")
system("phantomjs scrape.js > scrape.html", intern = TRUE)
# extract the content you need
pg <- read_html("scrape.html")
pg %>% html_nodes("#utime") %>% html_text()
## [1] "10:20 AM, October 28, 2014"
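If you need to do this for more than one page, you can wrap the render-then-parse steps in a small helper. This is just a sketch that assumes the phantomjs binary is on your PATH; the function name render_with_phantomjs is made up for illustration:
library(rvest)

render_with_phantomjs <- function(url, js_file = "scrape.js", html_file = "scrape.html") {
  # write a throwaway phantomjs script that prints the rendered page source
  writeLines(sprintf("var page = require('webpage').create();
page.open('%s', function () {
  console.log(page.content); // page source
  phantom.exit();
});", url), con = js_file)
  # run phantomjs and redirect the rendered HTML into a file
  system(sprintf("phantomjs %s > %s", js_file, html_file), intern = TRUE)
  read_html(html_file)
}

# e.g.
# pg <- render_with_phantomjs(url)
# pg %>% html_nodes("#utime") %>% html_text()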
You could also run the browser in Docker (in place of a local Selenium/PhantomJS setup) and pull the rendered page source straight from the live session.
You will need Docker installed. Then run:
library(RSelenium)
library(rvest)
url <- "http://www.scoreboard.com/game/rosol-l-goffin-d-2014/8drhX07d/#game-summary"
# start a standalone Chrome Selenium server in a Docker container
system('docker run -d -p 4445:4444 selenium/standalone-chrome')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "chrome")
remDr$open()
remDr$navigate(url)
# extract the content you need from the rendered page source
pg <- read_html(remDr$getPageSource()[[1]])
pg %>% html_nodes("#utime") %>% html_text()
# [1] "10:20 AM, October 28, 2014"
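When you're done, close the browser session and stop the container. A minimal sketch (the container id below is a placeholder; get the real one from docker ps):
remDr$close()
# system("docker ps")                    # find the selenium container's id
# system("docker stop <container-id>")   # then stop it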