How to add a delay between requests in Scrapy?
I don't want to send requests simultaneously and get blocked. I would like to send one request per second.
There is a setting for that:
DOWNLOAD_DELAY (default: 0)
The amount of time (in secs) that the downloader should wait before downloading consecutive pages from the same website. This can be used to throttle the crawling speed to avoid hitting servers too hard.
DOWNLOAD_DELAY = 0.25 # 250 ms of delay
Read the docs: https://doc.scrapy.org/en/latest/index.html
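For one request per second you would set DOWNLOAD_DELAY = 1. Note that Scrapy also has a RANDOMIZE_DOWNLOAD_DELAY setting (enabled by default) that multiplies the configured delay by a random factor between 0.5 and 1.5. The simulation below is my own illustration of the resulting wait range, not Scrapy code:

```python
import random

# DOWNLOAD_DELAY = 1 targets one request per second; with Scrapy's
# default RANDOMIZE_DOWNLOAD_DELAY = True the actual wait is a random
# value between 0.5x and 1.5x the configured delay.
DOWNLOAD_DELAY = 1.0

def effective_delay(base=DOWNLOAD_DELAY):
    # Illustrative stand-in for the randomization Scrapy applies.
    return random.uniform(0.5 * base, 1.5 * base)

delays = [effective_delay() for _ in range(1000)]
print(sum(delays) / len(delays))  # averages around 1.0 second
```

The randomization spreads requests out so they look less mechanical, while the average stays at the configured delay.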
You can also set the 'download_delay' attribute on a spider if you don't want a global download delay. See http://doc.scrapy.org/en/latest/faq.html#what-does-the-response-status-code-999-means
from scrapy import Spider

class S(Spider):
    rate = 1

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Per-spider delay overrides the global DOWNLOAD_DELAY setting
        self.download_delay = 1 / float(self.rate)
Here, rate sets the maximum number of pages that can be downloaded per second.
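The rate-to-delay conversion in that snippet is just a reciprocal; a minimal standalone illustration (the helper name is mine, not from Scrapy):

```python
def delay_for_rate(rate):
    """Seconds to wait between requests for a target rate in pages/sec."""
    return 1 / float(rate)

print(delay_for_rate(1))  # 1.0 second between requests
print(delay_for_rate(4))  # 0.25 seconds, i.e. up to 4 pages per second
```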
Besides DOWNLOAD_DELAY, you can also use Scrapy's AutoThrottle feature: https://doc.scrapy.org/en/latest/topics/autothrottle.html
It adjusts the delay between requests based on your settings. If you set both the start and max delay to 1 second, it will wait about 1 second between requests.
Its original purpose is to vary the delay, making it harder to detect your bot.
You just need to set it in settings.py as follows:
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1
AUTOTHROTTLE_MAX_DELAY = 3
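A fuller settings.py sketch combining the fixed delay with AutoThrottle; AUTOTHROTTLE_TARGET_CONCURRENCY and AUTOTHROTTLE_DEBUG are real AutoThrottle settings, but the values shown here are example choices, not recommendations:

```python
# settings.py -- example values, tune for your target site

# Fixed floor: never send requests faster than one per second.
DOWNLOAD_DELAY = 1

# AutoThrottle adjusts the actual delay dynamically on top of that.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1            # initial delay in seconds
AUTOTHROTTLE_MAX_DELAY = 3              # ceiling when the server is slow
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0   # avg parallel requests per server
AUTOTHROTTLE_DEBUG = True               # log throttle decisions while tuning
```

With AUTOTHROTTLE_DEBUG enabled, Scrapy logs the delay applied to each response, which helps verify the throttling behaves as expected before a long crawl.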