Why doesn't requests.get() return? What is the default timeout that requests.get() uses?
In my script, requests.get
never returns:
import requests
print ("requesting..")
# This call never returns!
r = requests.get(
"http://www.some-site.com",
proxies = {'http': '222.255.169.74:8080'},
)
print(r.ok)
What could be the possible reason(s)? Any remedy? What is the default timeout that get
uses?
What is the default timeout that get uses?
The default timeout is None
, which means it'll wait (hang) until the connection is closed.
Just specify a timeout value, like this:
r = requests.get(
'http://www.justdial.com',
proxies={'http': '222.255.169.74:8080'},
timeout=5
)
From requests documentation:
You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter:
>>> requests.get('http://github.com', timeout=0.001) Traceback (most recent call last): File "<stdin>", line 1, in <module> requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
Note:
timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).
It happens a lot to me that requests.get() takes a very long time to return even if the timeout
is 1 second. There are a few way to overcome this problem:
1. Use the TimeoutSauce
internal class
From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896
import requests from requests.adapters import TimeoutSauce class MyTimeout(TimeoutSauce): def __init__(self, *args, **kwargs): if kwargs['connect'] is None: kwargs['connect'] = 5 if kwargs['read'] is None: kwargs['read'] = 5 super(MyTimeout, self).__init__(*args, **kwargs) requests.adapters.TimeoutSauce = MyTimeout
This code should cause us to set the read timeout as equal to the connect timeout, which is the timeout value you pass on your Session.get() call. (Note that I haven't actually tested this code, so it may need some quick debugging, I just wrote it straight into the GitHub window.)
2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout
From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
NOTE: The change has since been merged to the main Requests project.
3. Using evenlet
or signal
as already mentioned in the similar question:
Timeout for python requests.get entire response
I wanted a default timeout easily added to a bunch of code (assuming that timeout solves your problem)
This is the solution I picked up from a ticket submitted to the repository for Requests.
credit: https://github.com/kennethreitz/requests/issues/2011#issuecomment-477784399
The solution is the last couple of lines here, but I show more code for better context. I like to use a session for retry behaviour.
import requests
import functools
from requests.adapters import HTTPAdapter,Retry
def requests_retry_session(
retries=10,
backoff_factor=2,
status_forcelist=(500, 502, 503, 504),
session=None,
) -> requests.Session:
session = session or requests.Session()
retry = Retry(
total=retries,
read=retries,
connect=retries,
backoff_factor=backoff_factor,
status_forcelist=status_forcelist,
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
# set default timeout
for method in ('get', 'options', 'head', 'post', 'put', 'patch', 'delete'):
setattr(session, method, functools.partial(getattr(session, method), timeout=30))
return session
then you can do something like this:
requests_session = requests_retry_session()
r = requests_session.get(url=url,...