How to make urllib2 requests through Tor in Python?
I'm trying to crawl websites using a crawler written in Python. I want to integrate Tor with Python, meaning I want to crawl the site anonymously through Tor.
I tried the following, but it doesn't seem to work: when I check my IP from Python, it is still the same as it was before I used Tor.
import urllib2
proxy_handler = urllib2.ProxyHandler({"tcp":"http://127.0.0.1:9050"})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)
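Port 9050 is Tor's SOCKS port, and Tor rejects any non-SOCKS traffic, so urllib2's ProxyHandler (which only does HTTP proxying) cannot use it directly. In your code the request never even reached Tor: ProxyHandler keys are URL schemes such as "http", and a "tcp" key matches no request, so urllib2 connected directly, which is why your IP did not change. Instead, you can connect through a middleman, Privoxy, on port 8118.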
Example:
import urllib2
# Send http:// requests to Privoxy (configured to forward them to Tor)
proxy_support = urllib2.ProxyHandler({"http" : "127.0.0.1:8118"})
opener = urllib2.build_opener(proxy_support)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
print(opener.open('http://www.google.com').read())
Also note the dictionary passed to ProxyHandler: the key is the URL scheme ("http"), and the value is a bare ip:port with no http:// prefix.
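On Python 3, urllib2 became urllib.request. A minimal sketch of the same Privoxy setup there, assuming Privoxy listens on its default 127.0.0.1:8118 and is already configured to forward to Tor; `build_privoxy_opener` is a hypothetical helper name, not a library function. It also proxies https:// URLs, which urllib tunnels through Privoxy with CONNECT.

```python
import urllib.request

# Assumption: Privoxy on its default address, already forwarding to Tor.
PRIVOXY = "127.0.0.1:8118"


def build_privoxy_opener(proxy=PRIVOXY):
    """Return an opener that sends http and https requests via Privoxy."""
    # Keys are URL schemes; a key such as "tcp" would never match a request,
    # so such a request would bypass the proxy entirely.
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-agent", "Mozilla/5.0")]
    return opener
```

With Privoxy and Tor running, `build_privoxy_opener().open("http://checkip.amazonaws.com/").read()` should return the Tor exit's address rather than yours.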
Alternatively, skip Privoxy and talk to Tor's SOCKS port directly with PySocks. Install it:
pip install PySocks
Then:
import socket
import socks
import urllib2
ipcheck_url = 'http://checkip.amazonaws.com/'
# Actual IP.
print(urllib2.urlopen(ipcheck_url).read())
# Tor IP.
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9050)
socket.socket = socks.socksocket
print(urllib2.urlopen(ipcheck_url).read())
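Assigning `socket.socket = socks.socksocket` changes the socket class for the whole process, so it affects every later connection made through the socket module, not just the crawler's. A minimal sketch that limits the patch to a `with` block and restores the original afterwards; `patched_socket` is a hypothetical helper, not part of PySocks. It works because urllib2 (and Python 3's urllib) create sockets via `socket.create_connection`, which looks up `socket.socket` at call time.

```python
import contextlib
import socket


@contextlib.contextmanager
def patched_socket(replacement):
    """Temporarily replace socket.socket, restoring it on exit."""
    original = socket.socket
    socket.socket = replacement
    try:
        yield
    finally:
        # Restore even if the body raised, so later code uses direct sockets.
        socket.socket = original


# Intended usage (requires PySocks and a running Tor; not executed here):
#   socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
#   with patched_socket(socks.socksocket):
#       print(urllib2.urlopen(ipcheck_url).read())
```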
Using just urllib2.ProxyHandler, as in https://stackoverflow.com/a/2015649/895245, fails with:
Tor is not an HTTP Proxy
Mentioned at: How can I use a SOCKS 4/5 proxy with urllib2?
Tested on Ubuntu 15.10, Tor 0.2.6.10, Python 2.7.10.
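Tor returns "Tor is not an HTTP Proxy" because an HTTP request arrives where it expects a SOCKS handshake. To make the difference concrete, here is a sketch of the bytes a SOCKS5 client sends per RFC 1928 (no authentication, CONNECT by hostname). The function names are hypothetical and this is only illustrative; PySocks implements the real protocol.

```python
import struct


def socks5_greeting():
    # VER=5, NMETHODS=1, METHODS=[0x00 "no authentication required"]
    return b"\x05\x01\x00"


def socks5_connect_request(host, port):
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), then a
    # length-prefixed hostname and a big-endian 16-bit port. Sending the
    # name rather than an IP lets Tor resolve it on its side.
    name = host.encode("idna")
    if len(name) > 255:
        raise ValueError("hostname too long for SOCKS5")
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack("!H", port)


def socks5_connect_succeeded(reply):
    # The CONNECT reply starts with VER=5 and REP=0 ("succeeded").
    return len(reply) >= 2 and reply[0] == 5 and reply[1] == 0
```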