Capture AJAX response with selenium and python

I click on a link in Firefox, the webpage sends a request using javascript, then the server sends some sort of response which includes a website address. So this new website then opens in a new Window. The html code behind the link is (I've omitted initial and final <span> tag):

> class="taLnk hvrIE6"
> onclick="ta.trackEventOnPage('AttractionContactInfo', 'Website',
> 2316062, 1); ta.util.cookie.setPIDCookie(15190);
> ta.call('ta.util.link.targetBlank', event, this,
> {'aHref':'LqMWJQiMnYQQoqnQQxGEcQQoqnQQWJQzZYUWJQpEcYGII26XombQQoqnQQQQoqnqgoqnQQQQoqnQQQQoqnQQQQoqnqgoqnQQQQoqnQQuuuQQoqnQQQQoqnxioqnQQQQoqnQQJMsVCIpEVMSsVEtHJcSQQoqnQQQQoqnxioqnQQQQoqnQQniaQQoqnQQQQoqnqgoqnQQQQoqnQQWJQzhYmkXHJUokUHnmKTnJXB',
> 'isAsdf':true})">Website

I want to capture the server response and extract the 'new website' using Python and Selenium. I've been using BeautifulSoup for scraping and am pretty new to Selenium.

So far, I am able to find this element and click on it using selenium, which opens the 'new website' in a new window. I don't know how to capture the response from server.

I once intercepted some ajax calls injecting javascript to the page using selenium. The bad side of the history is that selenium could sometimes be, let's say "fragile". So for no reason I got selenium exceptions while doing this injection.

Anyway, my idea was intercept the XHR calls, and set its response to a new dom element created by me that I could manipulate from selenium. In the condition for the interception you can even use the url that made the request in order to just intercept the one that you actually want (self._url)

btw, I got the idea from intercept all ajax calls?

Maybe this helps.

browser.execute_script("""
(function(XHR) {
  "use strict";

  var element = document.createElement('div');
  element.id = "interceptedResponse";
  element.appendChild(document.createTextNode(""));
  document.body.appendChild(element);

  var open = XHR.prototype.open;
  var send = XHR.prototype.send;

  XHR.prototype.open = function(method, url, async, user, pass) {
    this._url = url; // want to track the url requested
    open.call(this, method, url, async, user, pass);
  };

  XHR.prototype.send = function(data) {
    var self = this;
    var oldOnReadyStateChange;
    var url = this._url;

    function onReadyStateChange() {
      if(self.status === 200 && self.readyState == 4 /* complete */) {
        document.getElementById("interceptedResponse").innerHTML +=
          '{"data":' + self.responseText + '}*****';
      }
      if(oldOnReadyStateChange) {
        oldOnReadyStateChange();
      }
    }

    if(this.addEventListener) {
      this.addEventListener("readystatechange", onReadyStateChange,
        false);
    } else {
      oldOnReadyStateChange = this.onreadystatechange;
      this.onreadystatechange = onReadyStateChange;
    }
    send.call(this, data);
  }
})(XMLHttpRequest);
""")

I've come up to this page when trying to catch XHR content based on AJAX requests. And I eventually found this package

from seleniumwire import webdriver  # Import from seleniumwire
# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# Go to the Google home page
driver.get('https://www.google.com')

# Access requests via the `requests` attribute
for request in driver.requests:
    if request.response:
        print(
            request.url,
            request.response.status_code,
            request.response.headers['Content-Type']
        )

this package allow to get the content response from any request, such as json :

https://www.google.com/ 200 text/html; charset=UTF-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_120x44dp.png 200 image/png
https://consent.google.com/status?continue=https://www.google.com&pc=s&timestamp=1531511954&gl=GB 204 text/html; charset=utf-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png 200 image/png
https://ssl.gstatic.com/gb/images/i2_2ec824b0.png 200 image/png
https://www.google.com/gen_204?s=webaft&t=aft&atyp=csi&ei=kgRJW7DBONKTlwTK77wQ&rt=wsrt.366,aft.58,prt.58 204 text/html; charset=UTF-8
..

I was unable to capture AJAX response with selenium but here is what works, although without selenium:

1- Find out the XML request by monitoring the network analyzing tools in your browser

2= Once you've identified the request, regenerate it using Python's requests or urllib2 modules. I personally recommend requests because of its additional features, most important to me was requests.Session.

You can find plenty of help and relevant posts regarding these two steps.

Hope it will help someone someday.

Capture AJAX response with selenium and python

Related

Recent Posts