requests.get returns 403 while the same URL works in a browser
I'm trying to use the search form at rlsnet.ru. Here is the form's definition, extracted from the page source:
<form id="site_search_form" action="/search_result.htm" method="get">
<input id="simplesearch_text_input" class="search__field" type="text" name="word" value="" autocomplete="off">
<input type="hidden" name="path" value="/" id="path">
<input type="hidden" name="enter_clicked" value="1">
<input id="letters_id" type="hidden" name="letters" value="">
<input type="submit" class="g-btn search__btn" value="Найти" id="simplesearch_button">
<div class="sf_suggestion">
<ul style="display: none; z-index:1000; opacity:0.85;">
</ul>
</div>
<div id="contentsf">
</div>
</form>
Here is the code I used to send the search request:
import requests
from urllib.parse import urlencode

root = "http://www.rlsnet.ru/search_result.htm?"
# Encode the search word as cp1251, as the browser does for this site
response = requests.get(root + urlencode({"word": "Церебролизин".encode('cp1251')}))
Each time I do this, the response status is 403. When I enter the same request URL (i.e. http://www.rlsnet.ru/search_result.htm?word=%D6%E5%F0%E5%E1%F0%EE%EB%E8%E7%E8%ED) into Safari/Chrome/Opera, it works fine and returns the expected page. What am I doing wrong? Googling the issue only turned up this SO question: why url works in browser but not using requests get method, which was of little use.
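Before looking elsewhere, it may help to rule out the URL itself. A quick check, assuming Python 3 and only the standard library, that the query string built in code is identical to the one pasted into the browser (the variable names are chosen just for this sketch):

```python
from urllib.parse import urlencode

# URL that works when pasted into the browser (copied from the question)
browser_url = ("http://www.rlsnet.ru/search_result.htm?"
               "word=%D6%E5%F0%E5%E1%F0%EE%EB%E8%E7%E8%ED")

root = "http://www.rlsnet.ru/search_result.htm?"
# Same construction as the question: pre-encode the word to cp1251 bytes
built_from_bytes = root + urlencode({"word": "Церебролизин".encode("cp1251")})
# Equivalent: let urlencode do the cp1251 encoding via its encoding= argument
built_from_str = root + urlencode({"word": "Церебролизин"}, encoding="cp1251")

print(built_from_bytes == browser_url)  # True
print(built_from_str == browser_url)    # True
```

If both comparisons print True, the script and the browser request the exact same URL, so the difference has to be in what is sent along with it (the request headers), not in the URL.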
Well, that's because the default User-Agent of requests is python-requests/2.13.0 (the suffix is the installed requests version), and in your case the website doesn't like traffic from "non-browsers", so it tries to block such traffic.
>>> import requests
>>> session = requests.Session()
>>> session.headers
{'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.13.0'}
All you need to do is make the request look like it comes from a browser, so just pass an extra headers parameter:
import requests

# This is Chrome; you can set whatever browser you like
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
response = requests.get('http://www.rlsnet.ru/search_result.htm?word=%D6%E5%F0%E5%E1%F0%EE%EB%E8%E7%E8%ED', headers=headers)
print(response.status_code)
print(response.url)

Output:
200
http://www.rlsnet.ru/search_result.htm?word=%D6%E5%F0%E5%E1%F0%EE%EB%E8%E7%E8%ED
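If the script makes many requests to the site, a variation that may be more convenient is sketched below, assuming Python 3 and a reasonably recent requests release: set the User-Agent once on a requests.Session and let requests build the query string from params instead of concatenating it by hand. Preparing the request first lets you inspect the final URL and headers without sending anything. BROWSER_UA is just a name chosen for this sketch.

```python
import requests

# Any realistic browser User-Agent string should do; this is the Chrome one used above
BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36")

session = requests.Session()
# Session-level headers are merged into every request prepared by this session
session.headers.update({"User-Agent": BROWSER_UA})

# Let requests build the query string; bytes values are percent-encoded as-is,
# so the cp1251 encoding of the search word is preserved
req = requests.Request(
    "GET",
    "http://www.rlsnet.ru/search_result.htm",
    params={"word": "Церебролизин".encode("cp1251")},
)
prepped = session.prepare_request(req)

# Inspect what would go over the wire; nothing has been sent yet
print(prepped.url)
print(prepped.headers["User-Agent"])

# To actually send it (requires network access):
# response = session.send(prepped, timeout=10)
# print(response.status_code)
```

The printed URL should match the one that works in the browser. Whether the site keeps accepting such requests depends on its own filtering, which may look at more than the User-Agent header.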