"SSL: certificate_verify_failed" error when scraping https://www.thenewboston.com/
Solution 1:
The problem is not in your code but in the website you are trying to access. The SSLLabs analysis of the site reports:
This server's certificate chain is incomplete. Grade capped to B.
This means the server configuration is wrong: it does not send the full certificate chain, so not only Python but several other clients will have problems with this site. Some desktop browsers work around this configuration problem by fetching the missing intermediate certificates from the internet or by filling in cached certificates, but other browsers and applications will fail just like Python does.
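For reference, with requests this misconfiguration typically surfaces as an SSLError ("certificate verify failed"); a minimal sketch to reproduce it (the site's configuration may have changed since this answer was written):

import requests

# With an incomplete chain, verification fails because the intermediate
# certificate linking the server's certificate to a trusted root is
# never sent by the server.
try:
    requests.get("https://www.thenewboston.com/")
except requests.exceptions.SSLError as exc:
    print(exc)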
To work around the broken server configuration you can explicitly extract the missing certificates and add them to your trust store, or pass a certificate bundle that includes them via the verify argument. From the documentation:
You can pass verify the path to a CA_BUNDLE file or directory with certificates of trusted CAs:
>>> requests.get('https://github.com', verify='/path/to/certfile')
This list of trusted CAs can also be specified through the REQUESTS_CA_BUNDLE environment variable.
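As a sketch of that workaround, you could append the missing intermediate certificate (saved, for example, from the SSLLabs report or the CA's website to a hypothetical file intermediate.pem) to a copy of the certifi bundle that requests uses by default, then point verify or REQUESTS_CA_BUNDLE at the result:

import os
import certifi
import requests

# Build a custom bundle: certifi's trusted roots plus the intermediate
# certificate the server fails to send. "intermediate.pem" is a
# hypothetical file you must obtain yourself.
with open(certifi.where(), "rb") as f:
    bundle = f.read()
with open("intermediate.pem", "rb") as f:
    bundle += b"\n" + f.read()
with open("custom-bundle.pem", "wb") as f:
    f.write(bundle)

# Either pass the bundle explicitly ...
requests.get("https://www.thenewboston.com/", verify="custom-bundle.pem")
# ... or set it for every request in this process:
os.environ["REQUESTS_CA_BUNDLE"] = os.path.abspath("custom-bundle.pem")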
Solution 2:
You can tell requests not to verify the SSL certificate:
>>> url = "https://www.thenewboston.com/forum/category.php?id=15&orderby=recent&page=1"
>>> response = requests.get(url, verify=False)
>>> response.status_code
200
See more in the requests documentation.
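Note that verify=False disables certificate validation entirely, so requests (via urllib3) warns about it on every call. If you accept that risk, a sketch for silencing the warning:

import requests
import urllib3

# Disabling verification leaves the connection open to man-in-the-middle
# attacks; urllib3 flags this with an InsecureRequestWarning, which can
# be suppressed explicitly.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

url = "https://www.thenewboston.com/forum/category.php?id=15&orderby=recent&page=1"
response = requests.get(url, verify=False)
print(response.status_code)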
Solution 3:
You are probably missing the stock certificates on your system. For example, if you are running Ubuntu, check that the ca-certificates package is installed.
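To see which certificate stores your Python setup actually consults, a quick check (requests ships its own bundle via certifi, while the standard-library ssl module uses the system paths):

import ssl
import certifi

# Paths where the standard-library ssl module expects the system CA store:
print(ssl.get_default_verify_paths())
# The bundled CA file that requests uses by default:
print(certifi.where())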