Ideal method for sending multiple HTTP requests in Python? [duplicate]
Possible Duplicate:
Multiple (asynchronous) connections with urllib2 or other http library?
I am working on a Linux web server that runs Python code to grab real-time data over HTTP from a third-party API. The data is put into a MySQL database. I need to make a lot of queries to a lot of URLs, and I need to do it fast (faster = better). Currently I'm using urllib3 as my HTTP library. What is the best way to go about this? Should I spawn multiple threads (if so, how many?) and have each one query a different URL? I would love to hear your thoughts about this - thanks!
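For reference, a rough sketch of the thread-pool approach I have in mind (the URLs and worker count are placeholders, not my real setup):

# Minimal sketch: a thread pool fetching several URLs with urllib3.
# The URL list and max_workers value are placeholders, not the real setup.
import concurrent.futures
import urllib3

urls = ["http://httpbin.org/get?i=%d" % i for i in range(20)]  # placeholders

http = urllib3.PoolManager()

def fetch(url):
    # GET one URL and return the raw body; PoolManager reuses connections.
    return http.request("GET", url).data

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    for body in pool.map(fetch, urls):
        pass  # this is where each result would be written to MySQL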
If "a lot" is really a lot, then you probably want to use asynchronous I/O rather than threads.
requests + gevent = grequests
GRequests allows you to use Requests with Gevent to make asynchronous HTTP Requests easily.
import grequests
urls = [
'http://www.heroku.com',
'http://tablib.org',
'http://httpbin.org',
'http://python-requests.org',
'http://kennethreitz.com'
]
rs = (grequests.get(u) for u in urls)  # build the unsent requests lazily
grequests.map(rs)  # send them all concurrently and return the responses
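grequests.map returns the responses in the same order as the requests, with None for any request that failed (unless you pass an exception_handler). A minimal sketch of keeping and inspecting them:

rs = (grequests.get(u) for u in urls)  # rebuild the generator, since map consumed it above
responses = grequests.map(rs)
for url, resp in zip(urls, responses):
    # resp is a regular requests Response, or None if that request failed
    status = resp.status_code if resp is not None else "failed"
    print(url, status)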
You should use multithreading as well as pipelining your requests, for example search -> details -> save (see the sketch below).
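A minimal sketch of such a pipeline, where search(), details() and save_to_db() are hypothetical stand-ins for the real API calls and the MySQL insert:

# Rough search -> details -> save pipeline on a thread pool.
# search(), details() and save_to_db() are hypothetical placeholders.
import concurrent.futures

def search(term):
    # placeholder: would hit the search endpoint and return item ids
    return ["%s-%d" % (term, i) for i in range(3)]

def details(item_id):
    # placeholder: would fetch one item's details over HTTP
    return {"id": item_id}

def save_to_db(record):
    # placeholder: would INSERT the record into MySQL
    print("saved", record["id"])

terms = ["foo", "bar"]

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    # stage 1: run all searches concurrently and flatten the resulting ids
    ids = [i for id_list in pool.map(search, terms) for i in id_list]
    # stage 2: fetch details concurrently and save each record as it arrives
    for record in pool.map(details, ids):
        save_to_db(record)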
The number of threads you can use doesn't depend only on your own hardware. How many requests can the service handle? How many concurrent requests does it allow? Even your bandwidth can be a bottleneck.
If you're doing some kind of scraping, the service could block you after a certain number of requests, so you may need to use proxies or multiple IP bindings.
In my experience, in most cases I can run 50-300 concurrent requests on my laptop from Python scripts.
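If you go with the grequests approach from the other answer, one easy way to stay under such limits is the size argument to grequests.map, which caps how many requests are in flight at once (the limit of 50 here is just an illustrative guess):

import grequests

urls = ["http://httpbin.org/get"] * 200  # placeholder list of targets
rs = (grequests.get(u) for u in urls)
responses = grequests.map(rs, size=50)  # at most 50 concurrent requests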