Wait page to load before getting data with requests.get in python 3

It doesn't look like a problem of waiting, it looks like the element is being created by JavaScript, requests can't handle dynamically generated elements by JavaScript. A suggestion is to use selenium together with PhantomJS to get the page source, then you can use BeautifulSoup for your parsing, the code shown below will do exactly that:

from bs4 import BeautifulSoup
from selenium import webdriver

url = "http://legendas.tv/busca/walking%20dead%20s03e02"
browser = webdriver.PhantomJS()
browser.get(url)
html = browser.page_source
soup = BeautifulSoup(html, 'lxml')
a = soup.find('section', 'wrapper')

Also, there's no need to use .findAll if you are only looking for one element only.


In Python 3, Using the module urllib in practice works better when loading dynamic webpages than the requests module.

i.e

import urllib.request
try:
    with urllib.request.urlopen(url) as response:

        html = response.read().decode('utf-8')#use whatever encoding as per the webpage
except urllib.request.HTTPError as e:
    if e.code==404:
        print(f"{url} is not found")
    elif e.code==503:
        print(f'{url} base webservices are not available')
        ## can add authentication here 
    else:
        print('http error',e)

I had the same problem, and none of the submitted answers really worked for me. But after long research, I found a solution:

from requests_html import HTMLSession
s = HTMLSession()
response = s.get(url)
response.html.render()

print(response)
# prints out the content of the fully loaded page
# response can be parsed with for example bs4

The requests_html package (docs) is an official package, distributed by the Python Software Foundation. It has some additional JavaScript capabilities, like for example the ability to wait until the JS of a page has finished loading.

I hope I was able to help someone!


I found a way to that !!!

r = requests.get('https://github.com', timeout=(3.05, 27))

In this, timeout has two values, first one is to set session timeout and the second one is what you need. The second one decides after how much seconds the response is sent. You can calculate the time it takes to populate and then print the data out.