Can a website detect when you are using Selenium with chromedriver?
Basically, the way the Selenium detection works, is that they test for predefined JavaScript variables which appear when running with Selenium. The bot detection scripts usually look anything containing word "selenium" / "webdriver" in any of the variables (on window object), and also document variables called $cdc_
and $wdc_
. Of course, all of this depends on which browser you are on. All the different browsers expose different things.
For me, I used Chrome, so, all that I had to do was to ensure that $cdc_
didn't exist anymore as a document variable, and voilà (download chromedriver source code, modify chromedriver and re-compile $cdc_
under different name.)
This is the function I modified in chromedriver:
File call_function.js:
function getPageCache(opt_doc) {
var doc = opt_doc || document;
//var key = '$cdc_asdjflasutopfhvcZLmcfl_';
var key = 'randomblabla_';
if (!(key in doc))
doc[key] = new Cache();
return doc[key];
}
(Note the comment. All I did I turned $cdc_
to randomblabla_
.)
Here is pseudocode which demonstrates some of the techniques that bot networks might use:
runBotDetection = function () {
var documentDetectionKeys = [
"__webdriver_evaluate",
"__selenium_evaluate",
"__webdriver_script_function",
"__webdriver_script_func",
"__webdriver_script_fn",
"__fxdriver_evaluate",
"__driver_unwrapped",
"__webdriver_unwrapped",
"__driver_evaluate",
"__selenium_unwrapped",
"__fxdriver_unwrapped",
];
var windowDetectionKeys = [
"_phantom",
"__nightmare",
"_selenium",
"callPhantom",
"callSelenium",
"_Selenium_IDE_Recorder",
];
for (const windowDetectionKey in windowDetectionKeys) {
const windowDetectionKeyValue = windowDetectionKeys[windowDetectionKey];
if (window[windowDetectionKeyValue]) {
return true;
}
};
for (const documentDetectionKey in documentDetectionKeys) {
const documentDetectionKeyValue = documentDetectionKeys[documentDetectionKey];
if (window['document'][documentDetectionKeyValue]) {
return true;
}
};
for (const documentKey in window['document']) {
if (documentKey.match(/\$[a-z]dc_/) && window['document'][documentKey]['cache_']) {
return true;
}
}
if (window['external'] && window['external'].toString() && (window['external'].toString()['indexOf']('Sequentum') != -1)) return true;
if (window['document']['documentElement']['getAttribute']('selenium')) return true;
if (window['document']['documentElement']['getAttribute']('webdriver')) return true;
if (window['document']['documentElement']['getAttribute']('driver')) return true;
return false;
};
According to user szx, it is also possible to simply open chromedriver.exe in a hex editor, and just do the replacement manually, without actually doing any compiling.
Replacing cdc_
string
You can use vim
or perl
to replace the cdc_
string in chromedriver
. See answer by @Erti-Chris Eelmaa to learn more about that string and how it's a detection point.
Using vim
or perl
prevents you from having to recompile source code or use a hex-editor.
Make sure to make a copy of the original chromedriver
before attempting to edit it.
Our goal is to alter the cdc_
string, which looks something like $cdc_lasutopfhvcZLmcfl
.
The methods below were tested on chromedriver version 2.41.578706
.
Using Vim
vim /path/to/chromedriver
After running the line above, you'll probably see a bunch of gibberish. Do the following:
- Replace all instances of
cdc_
withdog_
by typing:%s/cdc_/dog_/g
.-
dog_
is just an example. You can choose anything as long as it has the same amount of characters as the search string (e.g.,cdc_
), otherwise thechromedriver
will fail.
-
- To save the changes and quit, type
:wq!
and pressreturn
.- If you need to quit without saving changes, type
:q!
and pressreturn
.
- If you need to quit without saving changes, type
Using Perl
The line below replaces all cdc_
occurrences with dog_
. Credit to Vic Seedoubleyew:
perl -pi -e 's/cdc_/dog_/g' /path/to/chromedriver
Make sure that the replacement string (e.g., dog_
) has the same number of characters as the search string (e.g., cdc_
), otherwise the chromedriver
will fail.
Wrapping Up
To verify that all occurrences of cdc_
were replaced:
grep "cdc_" /path/to/chromedriver
If no output was returned, the replacement was successful.
Go to the altered chromedriver
and double click on it. A terminal window should open up. If you don't see killed
in the output, you've successfully altered the driver.
Make sure that the name of the altered chromedriver
binary is chromedriver
, and that the original binary is either moved from its original location or renamed.
My Experience With This Method
I was previously being detected on a website while trying to log in, but after replacing cdc_
with an equal sized string, I was able to log in. Like others have said though, if you've already been detected, you might get blocked for a plethora of other reasons even after using this method. So you may have to try accessing the site that was detecting you using a VPN, different network, etc.
As we've already figured out in the question and the posted answers, there is an anti Web-scraping and a Bot detection service called "Distil Networks" in play here. And, according to the company CEO's interview:
Even though they can create new bots, we figured out a way to identify Selenium the a tool they’re using, so we’re blocking Selenium no matter how many times they iterate on that bot. We’re doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious.
It'll take time and additional challenges to understand how exactly they are detecting Selenium, but what can we say for sure at the moment:
- it's not related to the actions you take with selenium - once you navigate to the site, you get immediately detected and banned. I've tried to add artificial random delays between actions, take a pause after the page is loaded - nothing helped
- it's not about browser fingerprint either - tried it in multiple browsers with clean profiles and not, incognito modes - nothing helped
- since, according to the hint in the interview, this was "reverse engineering", I suspect this is done with some JS code being executed in the browser revealing that this is a browser automated via selenium webdriver
Decided to post it as an answer, since clearly:
Can a website detect when you are using selenium with chromedriver?
Yes.
Also, what I haven't experimented with is older selenium and older browser versions - in theory, there could be something implemented/added to selenium at a certain point that Distil Networks bot detector currently relies on. Then, if this is the case, we might detect (yeah, let's detect the detector) at what point/version a relevant change was made, look into changelog and changesets and, may be, this could give us more information on where to look and what is it they use to detect a webdriver-powered browser. It's just a theory that needs to be tested.
A lot have been analyzed and discussed about a website being detected being driven by Selenium controlled ChromeDriver. Here are my two cents:
According to the article Browser detection using the user agent serving different webpages or services to different browsers is usually not among the best of ideas. The web is meant to be accessible to everyone, regardless of which browser or device an user is using. There are best practices outlined to develop a website to progressively enhance itself based on the feature availability rather than by targeting specific browsers.
However, browsers and standards are not perfect, and there are still some edge cases where some websites still detects the browser and if the browser is driven by Selenium controled WebDriver. Browsers can be detected through different ways and some commonly used mechanisms are as follows:
- Implementing captcha / recaptcha to detect the automatic bots.
You can find a relevant detailed discussion in How does recaptcha 3 know I'm using selenium/chromedriver?
- Detecting the term HeadlessChrome within headless Chrome UserAgent
You can find a relevant detailed discussion in Access Denied page with headless Chrome on Linux while headed Chrome works on windows using Selenium through Python
- Using Bot Management service from Distil Networks
You can find a relevant detailed discussion in Unable to use Selenium to automate Chase site login
- Using Bot Manager service from Akamai
You can find a relevant detailed discussion in Dynamic dropdown doesn't populate with auto suggestions on https://www.nseindia.com/ when values are passed using Selenium and Python
- Using Bot Protection service from Datadome
You can find a relevant detailed discussion in Website using DataDome gets captcha blocked while scraping using Selenium and Python
However, using the user-agent to detect the browser looks simple but doing it well is in fact a bit tougher.
Note: At this point it's worth to mention that: it's very rarely a good idea to use user agent sniffing. There are always better and more broadly compatible way to address a certain issue.
Considerations for browser detection
The idea behind detecting the browser can be either of the following:
- Trying to work around a specific bug in some specific variant or specific version of a webbrowser.
- Trying to check for the existence of a specific feature that some browsers don't yet support.
- Trying to provide different HTML depending on which browser is being used.
Alternative of browser detection through UserAgents
Some of the alternatives of browser detection are as follows:
- Implementing a test to detect how the browser implements the API of a feature and determine how to use it from that. An example was Chrome unflagged experimental lookbehind support in regular expressions.
- Adapting the design technique of Progressive enhancement which would involve developing a website in layers, using a bottom-up approach, starting with a simpler layer and improving the capabilities of the site in successive layers, each using more features.
- Adapting the top-down approach of Graceful degradation in which we build the best possible site using all the features we want and then tweak it to make it work on older browsers.
Solution
To prevent the Selenium driven WebDriver from getting detected, a niche approach would include either/all of the below mentioned approaches:
-
Rotating the UserAgent in every execution of your Test Suite using
fake_useragent
module as follows:from selenium import webdriver from selenium.webdriver.chrome.options import Options from fake_useragent import UserAgent options = Options() ua = UserAgent() userAgent = ua.random print(userAgent) options.add_argument(f'user-agent={userAgent}') driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe') driver.get("https://www.google.co.in") driver.quit()
You can find a relevant detailed discussion in Way to change Google Chrome user agent in Selenium?
-
Rotating the UserAgent in each of your Tests using
Network.setUserAgentOverride
throughexecute_cdp_cmd()
as follows:from selenium import webdriver driver = webdriver.Chrome(executable_path=r'C:\WebDrivers\chromedriver.exe') print(driver.execute_script("return navigator.userAgent;")) # Setting user agent as Chrome/83.0.4103.97 driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'}) print(driver.execute_script("return navigator.userAgent;"))
You can find a relevant detailed discussion in How to change the User Agent using Selenium and Python
-
Changing the property value of
navigator
for webdriver toundefined
as follows:driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", { "source": """ Object.defineProperty(navigator, 'webdriver', { get: () => undefined }) """ })
You can find a relevant detailed discussion in Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
- Changing the values of
navigator.plugins
,navigator.languages
, WebGL, hairline feature, missing image, etc.
You can find a relevant detailed discussion in Is there a version of selenium webdriver that is not detectable?
- Changing the conventional Viewport
You can find a relevant detailed discussion in How to bypass Google captcha with Selenium and python?
Dealing with reCAPTCHA
While dealing with 2captcha and recaptcha-v3 rather clicking on checkbox associated to the text I'm not a robot, it may be easier to get authenticated extracting and using the data-sitekey
.
You can find a relevant detailed discussion in How to identify the 32 bit data-sitekey of ReCaptcha V2 to obtain a valid response programmatically using Selenium and Python Requests?
tl; dr
You can find a cutting edge solution to evade webdriver detection in:
- selenium-stealth - a proven way to evade webdriver detection
Example of how it's implemented on wellsfargo.com:
try {
if (window.document.documentElement.getAttribute("webdriver")) return !+[]
} catch (IDLMrxxel) {}
try {
if ("_Selenium_IDE_Recorder" in window) return !+""
} catch (KknKsUayS) {}
try {
if ("__webdriver_script_fn" in document) return !+""