How to get Google search results

Solution 1:

If you look at the html variable, you can see that the search result links are all nested in <h3 class="r"> tags.

Try changing your getGoogleLinks function to:

library(RCurl); library(XML)

getGoogleLinks <- function(google.url) {
   # The User-Agent string is cut off in the original post; "R" is the part that survives.
   doc <- getURL(google.url, httpheader = c("User-Agent" = "R"))
   html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...) {})
   nodes <- getNodeSet(html, "//h3[@class='r']//a")
   return(sapply(nodes, function(x) xmlAttrs(x)[["href"]]))
}
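
To see what the XPath expression selects without hitting the network, here is a minimal sketch that runs the same extraction on a small inline HTML string. The markup and the example.com/org/net URLs are made up for illustration; a real results page is much larger, and its markup may no longer use <h3 class="r">.

```r
library(XML)

# Hypothetical stand-in for a downloaded results page (not real Google markup).
sample_html <- '<html><body>
  <h3 class="r"><a href="https://example.com/first">First</a></h3>
  <h3 class="r"><a href="https://example.org/second">Second</a></h3>
  <h3 class="other"><a href="https://example.net/skipped">Not a result</a></h3>
</body></html>'

# asText = TRUE tells the parser the input is HTML content, not a file name.
html <- htmlTreeParse(sample_html, useInternalNodes = TRUE, asText = TRUE)

# Select every <a> nested inside an <h3> whose class is exactly "r".
nodes <- getNodeSet(html, "//h3[@class='r']//a")

# Pull the href attribute out of each matching node.
links <- sapply(nodes, function(x) xmlAttrs(x)[["href"]])
print(links)
```

Only the two links under <h3 class="r"> are kept; the third sits under a different class and is skipped.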

Solution 2:

I created this function to read in a list of company names and then get the top website result for each. It should get you started, and you can adjust it as needed.


# Load packages and data.
library(rvest)
d <- read.csv("P:\\needWebsites.csv")
c <- as.character(d$Company.Name)

# Function for getting website.
getWebsite <- function(name) {
    # The search URL prefix is blank in the original post; put the search URL, up to its query parameter, in the "".
    url <- URLencode(paste0("", name))

    page <- read_html(url)

    results <- page %>%
      html_nodes("cite") %>% # Get all nodes of type cite. You can change this to grab other node types.
      html_text()            # Assumed final step: the original pipeline breaks off after the pipe.

    result <- results[1]

    return(as.character(result)) # Return results instead if you want to see them all.
}

# Apply the function to a list of company names.
websites <- data.frame(Website = sapply(c, getWebsite))
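
The node-selection step can be tried offline first. This is a rough sketch on a made-up HTML string (the <cite> contents are placeholders, not real search output), assuming the displayed URL of each result sits in a <cite> element as the function above expects.

```r
library(rvest)

# Hypothetical page fragment with two <cite> elements.
sample_html <- '<html><body>
  <div><cite>https://www.example.com</cite></div>
  <div><cite>https://www.example.org/about</cite></div>
</body></html>'

# read_html() accepts literal HTML as well as a URL or file path.
page <- read_html(sample_html)

# Collect the text of every <cite> node, in document order.
results <- page %>%
  html_nodes("cite") %>%
  html_text()

# The first element plays the role of the "top result".
first_result <- results[1]
print(first_result)
```

Swapping "cite" for another CSS selector is the place to adjust if the page structure differs.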

Solution 3:

The other solutions here don't work for me, so here's my take on @Bryce-Chamberlain's issue, which works for me as of August 2019. It also answers another, now closed, question: company name to URL in R.

# install.packages("rvest")

get_first_google_link <- function(name, root = TRUE) {
  # the search URL prefix is blank in the original post
  url <- URLencode(paste0("", name))
  page <- xml2::read_html(url)
  # extract all links
  nodes <- rvest::html_nodes(page, "a")
  links <- rvest::html_attr(nodes,"href")
  # extract first link of the search results
  link <- links[startsWith(links, "/url?q=")][1]
  # clean it
  link <- sub("^/url\\?q\\=(.*?)\\&sa.*$","\\1", link)
  # get root if relevant
  if (root) link <- sub("^(https?://.*?/).*$", "\\1", link)
  link
}

companies <- data.frame(company = c("apple acres llc","abbvie inc","apple inc"))
companies <- transform(companies, url = sapply(company,get_first_google_link))
#>           company                            url
#> 1 apple acres llc
#> 2      abbvie inc
#> 3       apple inc

Created on 2019-08-10 by the reprex package (v0.2.1)
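
The link-cleaning steps in get_first_google_link can be checked on their own, without a network call. Below is a minimal sketch on a made-up vector of href values (the example.com/org URLs and the sa/ved parameters are placeholders). One deliberate difference from the function above: which() is used so that NA hrefs, which html_attr() returns for anchors without an href, are dropped instead of possibly being picked as the first match.

```r
# Hypothetical href values as they might come back from html_attr(nodes, "href").
links <- c(
  "/search?q=apple+inc&tbm=isch",
  NA,
  "/url?q=https://www.example.com/products/page.html&sa=U&ved=abc",
  "/url?q=https://www.example.org/&sa=U&ved=def"
)

# Keep only hrefs that look like search results; which() discards NA entries.
hits <- links[which(startsWith(links, "/url?q="))]

# Take the first hit and strip the "/url?q=" prefix and the "&sa..." suffix.
link <- sub("^/url\\?q\\=(.*?)\\&sa.*$", "\\1", hits[1])

# Reduce the link to its root (scheme + host + trailing slash).
root <- sub("^(https?://.*?/).*$", "\\1", link)

print(link)
print(root)
```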