Why does my program to scrape the NSE website get blocked on servers but work locally?
Solution 1:
I ran into the same problem. I don't know a proper Pythonic solution using the python-requests module; there is a good chance NSE simply blocks it.
So here is a workaround that works for me. It looks hacky, but I use it without digging any deeper:
import subprocess
import os

# Work in the script's own directory so maxpain.txt is written next to it
os.chdir(os.path.dirname(os.path.abspath(__file__)))
# subprocess.run waits for curl to finish; Popen returns immediately,
# so the file could be read before curl has finished writing it
subprocess.run('curl "https://www.nseindia.com/api/quote-derivative?symbol=BANKNIFTY" -H "authority: beta.nseindia.com" -H "cache-control: max-age=0" -H "dnt: 1" -H "upgrade-insecure-requests: 1" -H "user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36" -H "sec-fetch-user: ?1" -H "accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9" -H "sec-fetch-site: none" -H "sec-fetch-mode: navigate" -H "accept-encoding: gzip, deflate, br" -H "accept-language: en-US,en;q=0.9,hi;q=0.8" --compressed -o maxpain.txt', shell=True, check=True)
with open("maxpain.txt", "r") as f:
    var = f.read()
print(var)
It simply runs the curl command, writes its output to a file, and reads the file back. That's it.
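If you stay with the curl route, a slightly safer variant is to pass curl an argument list instead of a shell string, so no quoting is needed, and to read the body straight from stdout instead of a temporary file. This is a minimal sketch: `build_curl_command` and `fetch_with_curl` are hypothetical helper names, and the header values would be the same ones used in the command above. The `--fail` flag makes curl exit with an error on HTTP 4xx/5xx responses, so `check=True` should turn a blocked request into an exception rather than a file containing an error page.

```python
import subprocess


def build_curl_command(url: str, headers: dict) -> list:
    """Build a curl argument list (hypothetical helper); no shell quoting needed."""
    cmd = ["curl", url]
    for name, value in headers.items():
        cmd += ["-H", f"{name}: {value}"]
    # --compressed: request and decode compressed responses
    # --fail: exit non-zero on HTTP errors (e.g. a 403 from a blocked request)
    # --silent: suppress the progress meter
    cmd += ["--compressed", "--fail", "--silent"]
    return cmd


def fetch_with_curl(url: str, headers: dict, timeout: float = 30) -> str:
    """Run curl and return the response body from stdout (hypothetical helper)."""
    result = subprocess.run(
        build_curl_command(url, headers),
        capture_output=True,  # collect stdout/stderr instead of writing a file
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError if curl exits non-zero
        timeout=timeout,      # raise TimeoutExpired if curl hangs
    )
    return result.stdout
```

For example, calling `fetch_with_curl("https://www.nseindia.com/api/quote-derivative?symbol=BANKNIFTY", headers)` with the headers from the command above should return the JSON text as a string, which `json.loads` can then parse.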
Solution 2:
There are two things to note.
- The request headers need to include 'Host' and 'User-Agent':
__request_headers = {
'Host':'www.nseindia.com',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0',
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language':'en-US,en;q=0.5',
'Accept-Encoding':'gzip, deflate, br',
'DNT':'1',
'Connection':'keep-alive',
'Upgrade-Insecure-Requests':'1',
'Pragma':'no-cache',
'Cache-Control':'no-cache',
}
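One detail worth checking with this dict: it advertises `br` (Brotli) in `Accept-Encoding`, but requests (through urllib3) only decodes Brotli responses when the `brotli` or `brotlicffi` package is installed. If the server answers with Brotli and neither package is available, the body stays compressed and `resp.json()` is likely to fail. A minimal sketch, with hypothetical helper names, that drops `br` when no Brotli package is importable (installing `brotli` is the other option):

```python
import importlib.util
from typing import Optional


def brotli_available() -> bool:
    """True if a Brotli package that urllib3 can use is importable."""
    return any(importlib.util.find_spec(name) is not None for name in ("brotli", "brotlicffi"))


def adjust_accept_encoding(headers: dict, has_brotli: Optional[bool] = None) -> dict:
    """Return a copy of headers without 'br' in Accept-Encoding if Brotli can't be decoded."""
    if has_brotli is None:
        has_brotli = brotli_available()
    adjusted = dict(headers)  # leave the caller's dict untouched
    if not has_brotli and "Accept-Encoding" in adjusted:
        encodings = [e.strip() for e in adjusted["Accept-Encoding"].split(",")]
        adjusted["Accept-Encoding"] = ", ".join(e for e in encodings if e and e != "br")
    return adjusted
```

It would be used as `requests.get(url, headers=adjust_accept_encoding(__request_headers))`.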
- The following cookies are set dynamically, so they need to be fetched and passed along dynamically:
'nsit',
'nseappid',
'ak_bmsc'
NSE sets these depending on the functionality being used. In this example (top gainers / losers), I tried to fetch the top gainers and losers list, and the request is blocked without these cookies.
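import logging
import requests

logger = logging.getLogger(__name__)
__top_list = []

try:
    nse_url = 'https://www.nseindia.com/market-data/top-gainers-loosers'
    url = 'https://www.nseindia.com/api/live-analysis-variations?index=gainers'
    # Load the HTML page first so NSE sets the nsit / nseappid / ak_bmsc cookies
    resp = requests.get(url=nse_url, headers=__request_headers, timeout=10)
    if resp.ok:
        # Pass those cookies along with the API request
        req_cookies = dict(nsit=resp.cookies['nsit'], nseappid=resp.cookies['nseappid'], ak_bmsc=resp.cookies['ak_bmsc'])
        tresp = requests.get(url=url, headers=__request_headers, cookies=req_cookies, timeout=10)
        result = tresp.json()
        res_data = result["NIFTY"]["data"] if "NIFTY" in result and "data" in result["NIFTY"] else []
        if res_data:
            __top_list = res_data
except (requests.RequestException, KeyError, ValueError) as err:
    # RequestException: network/HTTP errors; KeyError: an expected cookie was not set;
    # ValueError: the API did not return JSON (e.g. a block page)
    logger.error('Unable to fetch data: %s', err)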
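An alternative to copying individual cookies by name is a `requests.Session`, which stores every cookie a response sets and sends it back on later requests to the same site. This swaps in a different technique from the code above, so treat it as a sketch under that assumption: it has not been checked against which cookies NSE actually requires, and the helper names `make_session`, `extract_nifty_data` and `fetch_top_gainers` are hypothetical.

```python
import requests

PAGE_URL = "https://www.nseindia.com/market-data/top-gainers-loosers"
API_URL = "https://www.nseindia.com/api/live-analysis-variations?index=gainers"


def make_session(headers: dict) -> requests.Session:
    """Create a session that sends the given headers on every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


def extract_nifty_data(result: dict) -> list:
    """Same lookup as result["NIFTY"]["data"], returning [] if anything is missing."""
    return result.get("NIFTY", {}).get("data") or []


def fetch_top_gainers(session: requests.Session) -> list:
    # Visiting the HTML page first lets NSE set its cookies; the session
    # keeps them in session.cookies and sends them with the API request
    session.get(PAGE_URL, timeout=10).raise_for_status()
    resp = session.get(API_URL, timeout=10)
    resp.raise_for_status()
    return extract_nifty_data(resp.json())
```

To use it, build the session once with `make_session(__request_headers)` and call `fetch_top_gainers(session)` inside a `try` that catches `requests.RequestException` and `ValueError`. Reusing one session also keeps the underlying connection alive between the page load and the API call.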
Another thing to note is that NSE blocks these requests from most cloud VMs, such as AWS and GCP. I was able to fetch the data from my personal Windows machine, but not from AWS or GCP.