How to use PyCharm to debug Scrapy projects
I am working with Scrapy 0.20 and Python 2.7. I found that PyCharm has a good Python debugger, and I want to use it to test my Scrapy spiders. Does anyone know how to do that?
What I have tried
I tried running the spider as a script, so I wrote one. I also tried adding my Scrapy project to PyCharm as a content root, like this: File -> Settings -> Project Structure -> Add content root.
But I don't know what else I have to do.
Solution 1:
The scrapy command is a Python script, which means you can start it from inside PyCharm.
When you examine the scrapy binary (locate it with which scrapy), you will notice that it is actually a Python script:
#!/usr/bin/python
from scrapy.cmdline import execute
execute()
This means that a command like
scrapy crawl IcecatCrawler
can also be executed like this:
python /Library/Python/2.7/site-packages/scrapy/cmdline.py crawl IcecatCrawler
Try to find the scrapy.cmdline package.
In my case the location was here: /Library/Python/2.7/site-packages/scrapy/cmdline.py
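If the path differs on your machine, you can ask Python itself where scrapy.cmdline lives instead of hunting through site-packages. This is a small stdlib-only sketch for Python 3 (the question targets Python 2.7, where importing scrapy.cmdline and printing its __file__ attribute gives the same answer):

```python
# Print the filesystem location of the scrapy.cmdline module, which is the
# path to use in the "Script" field of the PyCharm run configuration.
import importlib.util

try:
    spec = importlib.util.find_spec("scrapy.cmdline")
except ModuleNotFoundError:  # the parent package "scrapy" is missing entirely
    spec = None

if spec is not None:
    print(spec.origin)  # e.g. /Library/Python/2.7/site-packages/scrapy/cmdline.py
else:
    print("Scrapy is not installed in this interpreter")
```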
Create a run/debug configuration inside PyCharm with that script as the script to run. Fill the script parameters with the scrapy subcommand and the spider name, in this case: crawl IcecatCrawler
Put your breakpoints anywhere in your crawling code and it should work™.
Solution 2:
You just need to do the following.
Create a Python file in the crawler folder of your project. I used main.py.
- Project
  - Crawler
    - Crawler
      - Spiders
      - ...
    - main.py
    - scrapy.cfg
Inside your main.py, put the code below.
from scrapy import cmdline
cmdline.execute("scrapy crawl spider".split())
Then create a "Run Configuration" to run your main.py.
With this, if you put a breakpoint in your code, execution will stop there.
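A small variation on the main.py above (a sketch, not part of the original answer): read the spider name from the command line, so a single debug configuration can drive different spiders via the "Parameters" field of the run configuration. "spider" here is the same placeholder spider name used in the answer above.

```python
# main.py variant: take the spider name as an optional CLI argument.
import sys


def build_command(argv):
    # argv[1], if present, overrides the default spider name
    spider = argv[1] if len(argv) > 1 else "spider"
    return ["scrapy", "crawl", spider]


if __name__ == '__main__':
    command = build_command(sys.argv)
    try:
        from scrapy import cmdline
        cmdline.execute(command)  # same call as in the answer above
    except ImportError:
        print("Scrapy is not installed; would have run: %s" % " ".join(command))
```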
Solution 3:
As of 2018.1 this became a lot easier. You can now select "Module name" in your project's Run/Debug Configuration. Set this to scrapy.cmdline and the Working directory to the root dir of the Scrapy project (the one with settings.py in it).
Now you can add breakpoints to debug your code.
Solution 4:
I am running Scrapy in a virtualenv with Python 3.5.0, and setting the "script" parameter to /path_to_project_env/env/bin/scrapy solved the issue for me.
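To double-check which scrapy executable your active environment resolves to (and therefore which path belongs in the "script" field), you can run which scrapy in a shell, or the following stdlib sketch (shutil.which is available on Python 3.3+):

```python
# Show the scrapy executable found on PATH for the active (virtual) environment.
# Prints None if scrapy is not on PATH.
import shutil

print(shutil.which("scrapy"))  # e.g. /path_to_project_env/env/bin/scrapy
```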
Solution 5:
IntelliJ IDEA also works.
Create main.py:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from scrapy import cmdline

def main(name):
    # Run the given scrapy command line, e.g. "scrapy crawl stack"
    if name:
        cmdline.execute(name.split())

if __name__ == '__main__':
    print('[*] beginning main thread')
    name = "scrapy crawl stack"
    # name = "scrapy crawl spa"
    main(name)
    print('[*] main thread exited')