Selenium app redirects to Cloudflare page when hosted on Heroku
I have made a Discord bot that uses Selenium to access a website and get information. When I run my code locally I don't have any problems, but when I deploy to Heroku, the first URL I get redirects me to the page "Attention Required! | Cloudflare".
I have tried:
- Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
And many others, all with the same settings, which I use:
options = Options()
options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
options.add_experimental_option("excludeSwitches", ["enable-logging", "enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--headless")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
self.driver = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), options=options)
self.driver.execute_cdp_cmd('Network.setUserAgentOverride', {
"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
But this does not work, and the code only runs correctly locally.
PS: locally I'm on Windows
Source of the page I'm redirected to: https://gist.github.com/rafalou38/9ae95bd66e86d2171fc8a45cebd9720c
If the Selenium-driven, ChromeDriver-initiated google-chrome Browsing Context is being redirected to the "Attention Required! | Cloudflare" page, this implies that Cloudflare is blocking your program from accessing the AUT (Application under Test).
Analysis
There can be several reasons behind Cloudflare blocking the access as follows:
- Cloudflare has identified your program as a bot and access is denied. You can find a detailed discussion in Can a website detect when you are using selenium with chromedriver?.
The access can be denied due to the following factors:
- Cloudflare is trying to counter a possible Dictionary attack.
- Your system's IP address is blacklisted by Cloudflare, for example because the system was used for mining Bitcoin or Monero.
In these cases eventually you are redirected to a captcha page.
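One way to confirm from the bot's side that this redirect has happened is to inspect the page title after loading the URL. The following is only a minimal sketch, assuming the block page title contains the text "Attention Required!" as quoted in the question; the helper name looks_like_cloudflare_block is hypothetical and not part of Selenium:

```python
def looks_like_cloudflare_block(title: str) -> bool:
    """Return True if the title matches the Cloudflare block page from the question."""
    normalized = title.strip().lower()
    # Assumption: the block page title reads "Attention Required! | Cloudflare".
    return "attention required!" in normalized and "cloudflare" in normalized


# Usage with an already created Selenium driver (not executed here):
#     driver.get(url)
#     if looks_like_cloudflare_block(driver.title):
#         print("Blocked by Cloudflare, title was:", driver.title)
```

Logging this on Heroku makes it easier to tell a Cloudflare block apart from an ordinary page load failure.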
Solution
In these cases, a potential solution would be to use undetected-chromedriver to initialize the Chrome Browsing Context.
undetected-chromedriver is an optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.
Code Block:
import undetected_chromedriver as uc
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
driver = uc.Chrome(options=options)
driver.get('https://bet365.com')
Alternate Solution
An alternate solution would be to whitelist your IP address through the Project Honey Pot website. The end-to-end process is detailed in the video titled Attention Required one more step captcha CloudFlare Error.
I know it is not an actual solution, but sometimes Cloudflare blocks you based on the location of your IP address. My code was working perfectly on my local machine, but not on Heroku.
Turns out that the code was right using the solution provided by DebanjanB. The issue is that Heroku's server is running in a different country than mine. I confirmed this by asking a friend that lives in another country to try to get into the website with a phone. Cloudflare blocked my friend asking for a captcha.
I still haven't solved this. I'm not an expert and the workaround seems complicated. I guess a proxy could solve it?
I'll update if I get around it.
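For anyone who wants to try the proxy idea, here is a rough, untested sketch of how a proxy could be passed to Chrome. The --proxy-server switch is a standard Chromium command-line flag; the helper name proxy_arguments and the address 203.0.113.10:3128 are made-up placeholders, and I have not verified that this gets past Cloudflare:

```python
from typing import List


def proxy_arguments(proxy_url: str) -> List[str]:
    """Build the Chrome command-line arguments that route traffic through a proxy."""
    if not proxy_url:
        # No proxy configured: add no extra arguments.
        return []
    return [f"--proxy-server={proxy_url}"]


# Hypothetical proxy located in the same country as your local machine.
args = proxy_arguments("http://203.0.113.10:3128")

# Usage with Selenium options (not executed here):
#     for arg in args:
#         options.add_argument(arg)
```

The proxy would need to be in a region that Cloudflare does not challenge for that site, which is an assumption based on the behaviour described above.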
I used "undetected_chromedriver" and the following setup worked for me:
Used the buildpacks:
- https://github.com/heroku/heroku-buildpack-google-chrome
- https://github.com/heroku/heroku-buildpack-chromedriver
Added the config vars:
- CHROMEDRIVER_PATH=/app/.chromedriver/bin/chromedriver
- GOOGLE_CHROME_BIN=/app/.apt/usr/bin/google-chrome
Code snippet:
import undetected_chromedriver as uc
from selenium import webdriver
import os
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
driver = uc.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), options=options)
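If the driver fails to start on Heroku, it can help to check that both config vars are actually set before creating it. This is just a small sketch; the helper name chrome_paths is hypothetical, and only the two variable names come from the setup above:

```python
import os
from typing import Mapping, Tuple

REQUIRED_VARS = ("CHROMEDRIVER_PATH", "GOOGLE_CHROME_BIN")


def chrome_paths(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (chromedriver path, chrome binary path) or raise if a config var is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing config vars: " + ", ".join(missing))
    return env["CHROMEDRIVER_PATH"], env["GOOGLE_CHROME_BIN"]


# Usage on Heroku (not executed here):
#     driver_path, chrome_bin = chrome_paths()
#     options.binary_location = chrome_bin
```

Setting options.binary_location from GOOGLE_CHROME_BIN mirrors what the question's code does; whether it is needed may depend on the buildpack version.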