What is the difference between @include and @match in userscripts?
The GreaseSpot page on metadata blocks says that the two are very similar but @match
"sets more strict rules on what the *
character means." GreaseSpot then proceeds to teach using @include
, but Chrome examples like this generally seem to use @match
and indicate that @include
is only supported for compatibility purposes; @match
is preferred.
Apparently, @include google.*
can run on google.evil.com while @match google.*
cannot.
That one example is not sufficient to really see how the wildcards behave differently between these two, and better explanations are sought in answers here.
New GreaseMonkey scripts (Firefox) use @include
by default while new TamperMonkey scripts (for e.g. Chrome) use @match
by default.
What exactly are the differences between these two?
For example, how does each one handle wildcards?
Are there differences in cross-browser compatibility?
What reasons would someone have for choosing to use one over the other?
Solution 1:
You cannot use regular expressions with @match
, while you can with @include
.
However, @include
will give your users scarier security warnings about the script applying to all sites.
This is even though an @include
expression permits you to be more restrictive about the sites a script applies to (e.g. specifying that part of a URL be numeric using the regex fragment [0-9]+
, or using ^https?://
to apply a script to just those two schemes, instead of the more general non-regex globbing operator *
used for each of those cases in @match
, which causes the script to apply more broadly).
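As a rough illustration of that trade-off, the two hypothetical headers below both target numbered item pages on www.example.com. The site, the /item/ path, and the script names are assumptions made up for this sketch; they are two separate scripts shown side by side.

```javascript
// ==UserScript==
// @name     Hypothetical: regex @include
// Only URLs whose last path segment is all digits, on http or https.
// @include  /^https?://www\.example\.com/item/[0-9]+$/
// ==/UserScript==

// ==UserScript==
// @name     Hypothetical: glob @match
// The * after /item/ also accepts non-numeric values such as /item/abc.
// @match    *://www.example.com/item/*
// ==/UserScript==
```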
Solution 2:
TL;DR: Rigidity
The most important difference is that @match
is more rigid and restrictive than @include
, and aims to be the more secure alternative. For this reason, @include
may also generate scarier warnings to the end user. @match is also a little more complicated to use overall, depending on how you look at it.
The practical usage of the two can actually vary drastically; the full breakdown of usage for each follows below.
@include
(and @exclude
)
@include
is probably the directive which most people are more familiar with (along with its opposing twin, @exclude
, which has exactly the same syntax features). This is the more powerful directive of the two, largely because it can handle RegEx patterns (this also means it generates scarier warnings). Its usage is also the most straightforward of the two.
Modes
You can specify patterns in two ways, or "modes":
Glob Mode
Asterisks *
can be used as a wildcard glob, that is, to signify any amount of characters, including zero.
For example:
- @include http://www.example.com/foo/*
  - Matches http://www.example.com/foo/ and http://www.example.com/foo/bar
  - Does not match http://www.example.com/baz
There's also a special pattern available to specifically match any top-level domain suffix: .tld
.
A pattern like @include https://www.example.tld/*
will match the given domain with any top-level domain suffix, such as .com
, .org
, or .co.uk
.
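To make the glob behaviour concrete, here is a minimal sketch that translates a glob-mode pattern into a JavaScript RegExp. The helper name globToRegExp is hypothetical, and this only approximates how a userscript manager might treat *; it is not any manager's actual implementation, and it leaves out the .tld special case.

```javascript
// Hypothetical helper: approximates glob-mode @include matching.
// Illustration only, not the code any userscript manager actually runs.
function globToRegExp(glob) {
  // Escape every regex metacharacter except "*" ...
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  // ... then let "*" stand for any run of characters, including none.
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

const fooGlob = globToRegExp("http://www.example.com/foo/*");
console.log(fooGlob.test("http://www.example.com/foo/"));    // true
console.log(fooGlob.test("http://www.example.com/foo/bar")); // true
console.log(fooGlob.test("http://www.example.com/baz"));     // false

// A wildcard in the TLD position is not limited to a TLD:
const hostGlob = globToRegExp("http://www.google.*/*");
console.log(hostGlob.test("http://www.google.evil.com/"));   // true
```

The last call is the google.evil.com case from the question: in glob mode the * simply swallows "evil.com".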
Regular Expression Mode
@include
directives that start with a /
character will be interpreted as a regular expression, with all standard JavaScript RegEx features available:
// ==UserScript==
// @include /^https?://www\.example\.com/.*$/
// @include /^http://www\.example\.(?:org|net)//
// ==/UserScript==
A few notes:
- Due to JavaScript's RegEx interpretation, forward slashes / are not required to be escaped inside expressions.
- Other special characters still need to be escaped.
- @include patterns are always treated as case-insensitive.
- Expressions not ending with the EOL token $ will implicitly allow trailing characters on matches.
  - In other words, the expression is treated as if it ended with .*
  - @include /^https?://www\.google\.com/search/ will match https://www.google.com/search?q=stackoverflow
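The notes above can be sketched in plain JavaScript. The helper includeToRegExp is hypothetical and assumes that a manager strips the delimiting slashes and hands the rest to the RegExp constructor with the "i" flag; real managers may differ in the details.

```javascript
// Hypothetical helper: turns a regex-mode @include value into a RegExp.
function includeToRegExp(value) {
  if (value.length < 2 || !value.startsWith("/") || !value.endsWith("/")) {
    throw new Error("not a regex-mode pattern: " + value);
  }
  // Drop the delimiting slashes. The RegExp constructor does not need
  // inner "/" characters escaped; "i" makes the match case-insensitive.
  return new RegExp(value.slice(1, -1), "i");
}

// String.raw keeps the backslashes exactly as they appear in the header.
const open = includeToRegExp(String.raw`/^https?://www\.google\.com/search/`);
console.log(open.test("https://www.google.com/search?q=stackoverflow")); // true
console.log(open.test("HTTPS://WWW.GOOGLE.COM/SEARCH"));                 // true

// Anchoring with $ removes the implicit trailing match.
const anchored = includeToRegExp(String.raw`/^https?://www\.google\.com/search$/`);
console.log(anchored.test("https://www.google.com/search?q=stackoverflow")); // false
```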
Warnings
Keep in mind that the powerful nature of @include
means that a browser cannot guarantee the target of a given script as well as it can with a script using @match
; this means that scripts using @include
may trigger more dire and severe warnings for the user.
A common reason given to not use @include
involves URL fragments (portion of a URL following the hash #
character), and how a malicious actor could abuse them to execute a script on an undesirable page (Eg. @include http://*.example.com/
could match www.evil.com#www.example.com/
), as @match
ignores fragments by design.
While this attack is still theoretically possible, it's worth bearing in mind that some userscript managers (including Tampermonkey) purposely ignore fragments for matching purposes altogether, even in @include
directives.
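The fragment concern can be reproduced with a short sketch. The RegExp below is a hand translation of the glob http://*.example.com/, and the full URL (with an http:// scheme and a slash before the #) is an assumption filled in for illustration; stripFragment is a hypothetical helper showing what "ignoring fragments" amounts to.

```javascript
// Hand-translated equivalent of the glob "http://*.example.com/".
const fragmentPattern = /^http:\/\/.*\.example\.com\/$/;
const trickyUrl = "http://www.evil.com/#www.example.com/";

// Matching the raw URL lets the fragment satisfy the pattern.
console.log(fragmentPattern.test(trickyUrl)); // true

// Hypothetical helper: drop the fragment before matching.
function stripFragment(rawUrl) {
  const parsed = new URL(rawUrl);
  parsed.hash = "";
  return parsed.href;
}

console.log(stripFragment(trickyUrl));                       // http://www.evil.com/
console.log(fragmentPattern.test(stripFragment(trickyUrl))); // false
```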
@match
The @match
directive is a creation of Google for Chrome, designed as a safer, more sandboxed version of the @include
directive, with much more rigidity built-in.
Instead of allowing globs or RegEx, @match
interprets a pattern as 3 parts: the scheme, the host, and the path. Google's documentation describes the basic syntax this way:
<url-pattern> := <scheme>://<host><path>
<scheme> := '*' | 'http' | 'https' | 'file' | 'ftp' | 'urn'
<host> := '*' | '*.' <any char except '/' and '*'>+
<path> := '/' <any chars>
Each part of the pattern carries its own caveats, and also interprets wildcards *
differently.
Scheme
The scheme portion of the URL pattern must exactly match one of the supported schemes (which seems to depend on the browser), or the wildcard *
.
- In Chrome, that's: http, https, file, ftp, or urn.
- In Firefox, that appears to be http, https, file, ftp, ws, wss, data, or (chrome-)extension.
In this part of the pattern, wildcard *
matches exclusively http
or https
(MDN mentions that it may also match WebSocket schemes ws
and wss
in some browsers).
Host
The host portion of the URL pattern can come in three styles:
- Fully explicit: www.stackoverflow.com
- Subdomain wildcard: *.stackoverflow.com
- Fully wildcard: *
The top-level domain suffix cannot be a wildcard (eg. www.stackoverflow.*
); this is disallowed for security reasons. In order to match multiple TLD suffixes, a script will need to include a specific @match
directive for each.
Path
The path portion of the URL pattern is the most permissive of the 3, as the only rule is that it must start with a forward slash /
. The rest can be any combination of characters and wildcards.
In this section, wildcards *
act as a standard glob operator, simply matching 0 or more characters.
The value that gets matched against the path portion of the pattern is officially the URL path plus the URL query string (eg. In google.com/search?q=test
, the query string is q=test
), including the ?
between. This is a potential pitfall for patterns that aim to match the exact end of a path, since they may be foiled by an added query string.
Also note that the path does not include URL fragments (the part of the URL at the end that follows a hash #
, eg. www.example.com#main
); @match
directives ignore URL fragments by design to prevent abuse of unintentional matches.
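Putting the three parts together, the rules above can be approximated by a small matcher. This is a simplified sketch under stated assumptions: matchPattern and escapeRegExp are hypothetical names, only the *, http, and https schemes are handled, and ports and other special cases are ignored. It is not Chrome's or any userscript manager's implementation.

```javascript
// Escape regex metacharacters so path text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical matcher for <scheme>://<host><path> patterns.
function matchPattern(pattern, rawUrl) {
  const parts = /^(\*|https?):\/\/(\*|(?:\*\.)?[^/*]+)(\/.*)$/.exec(pattern);
  if (!parts) throw new Error("invalid match pattern: " + pattern);
  const [, scheme, host, path] = parts;
  const url = new URL(rawUrl);
  const urlScheme = url.protocol.slice(0, -1); // "https:" -> "https"

  // Scheme: "*" stands for http or https only.
  if (scheme === "*") {
    if (urlScheme !== "http" && urlScheme !== "https") return false;
  } else if (scheme !== urlScheme) {
    return false;
  }

  // Host: "*" is any host; "*.example.com" is example.com or a subdomain.
  if (host.startsWith("*.")) {
    const base = host.slice(2);
    if (url.hostname !== base && !url.hostname.endsWith("." + base)) return false;
  } else if (host !== "*" && url.hostname !== host) {
    return false;
  }

  // Path: glob over pathname + query string; url.hash is never consulted.
  const pathRegExp = new RegExp("^" + path.split("*").map(escapeRegExp).join(".*") + "$");
  return pathRegExp.test(url.pathname + url.search);
}

console.log(matchPattern("*://*.example.com/*", "https://www.example.com/foo?x=1#frag")); // true
console.log(matchPattern("https://www.example.com/search", "https://www.example.com/search#main")); // true
console.log(matchPattern("https://www.example.com/search", "https://www.example.com/search?q=test")); // false
```

The last call shows the query-string pitfall: without a trailing * in the path, the added ?q=test prevents the match. A pattern such as *://www.google.*/* is rejected outright by this sketch, because the host part may not contain a wildcard in the TLD position.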
A Word of Caution
It's fairly obvious, but it bears repeating that scripts should be careful to @include
exactly and exclusively the URLs that the script is intended to be run on. Runaway scripts can range from undetectable, minor annoyances to major problems; always double check that scripts are running only where they're supposed to be, and use @exclude
to add guardrails if necessary or convenient.
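As a small sketch of such a guardrail, the hypothetical header below runs on one site but carves out a section it should never touch. The script name and the /admin/ path are assumptions for illustration.

```javascript
// ==UserScript==
// @name     Hypothetical guarded script
// @match    https://www.example.com/*
// Keep the script away from pages it was never meant to run on.
// @exclude  https://www.example.com/admin/*
// ==/UserScript==
```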