Using request in python to download a xls file
In this page you will find a link to download an xls file (below attachment or adjuntos): https://www.banrep.gov.co/es/emisiones-vigentes-el-dcv
The link to download the xls file is: https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls
I was using this code to automatically download that file:
import requests
import os
path = os.path.abspath(os.getcwd()) #donde se descargará el archivo
path = path.replace("\\", '/')+'/'
url = 'https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls'
myfile = requests.get(url, verify=False)
open(path+'EMISIONES.xls', 'wb').write(myfile.content)
This code was working well, but suddently the downloaded file started being corrupted.
If I run the code, it raises this warning:
InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.banrep.gov.co'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
The error is related to how your request is being built. The status_code returned by the request is 403 [Forbiden]. You can see it typing
myfile.status_code
I guess the security issue is related to cookies and headers in your get request, because of that I suggest you take a view on how the webpage is building its headers in your request before the URL you're using is sent.
TIP: start you web browser in development mode and using Network tab, try to identify the headers.
To solve the issue of cookies take a view on how to retrieve naturally cookies pointing out to a previous webpage in www.banrep.gov.co, using requests.sessions
session_ = requests.Session()
Before coding you could try to test your requests using Postman, or other REST API test software.