What is the difference between @include and @match in userscripts?

The GreaseSpot page on metadata blocks says that the two are very similar but @match "sets more strict rules on what the * character means." GreaseSpot then proceeds to teach using @include, but Chrome examples like this generally seem to use @match and indicate that @include is only supported for compatibility purposes; @match is preferred.

Apparently, @include google.* can run on google.evil.com while @match google.* cannot.
That one example is not sufficient to really see how the wildcards behave differently between these two, and better explanations are sought in answers here.

New GreaseMonkey scripts (Firefox) use @include by default while new TamperMonkey scripts (for e.g. Chrome) use @match by default.

What exactly are the differences between these two?

For example, how does each one handle wildcards?
Are there differences in cross-browser compatibility?
What reasons would someone have for choosing to use one over the other?


Solution 1:

You cannot use regular expressions with @match, while you can with @include.

However, @include will give your users scarier security warnings about the script applying to all sites.

This is despite the fact that an @include expression lets you be more restrictive about the sites a script applies to. For example, you can require part of a URL to be numeric with the regex fragment [0-9]+, or use ^https?:// to limit a script to just those two schemes. With @match, each of those cases falls back to the more general non-regex globbing operator *, which causes the script to apply more broadly.
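As a rough illustration of that difference, here is a minimal sketch that compares a regex-style @include against a stand-in for the equivalent @match glob. The example.com URLs and the /item/ path are hypothetical, and the second regex only approximates how a pattern like *://www.example.com/item/* behaves; it is not how any userscript manager actually implements matching.

```javascript
// Hypothetical URLs, used only for illustration.
const urls = [
  "https://www.example.com/item/12345",
  "https://www.example.com/item/about",
  "ftp://www.example.com/item/12345",
];

// Regex-style @include: http or https only, numeric item id only.
const includeRegex = /^https?:\/\/www\.example\.com\/item\/[0-9]+$/i;

// Rough stand-in for "@match *://www.example.com/item/*":
// the trailing * accepts any characters, numeric or not.
const matchLike = /^https?:\/\/www\.example\.com\/item\/.*$/;

function report(url) {
  return { url, include: includeRegex.test(url), match: matchLike.test(url) };
}

const results = urls.map(report);
for (const r of results) console.log(r);
```

The second URL is accepted by the glob-like pattern but rejected by the regex, which is the extra precision described above.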

Solution 2:

TL;DR: Rigidity

The most important difference is that @match is more rigid and restrictive than @include, and aims to be the more secure alternative. For this reason, @include may also generate scarier warnings to the end user; @match is also a little more complicated to use overall, depending on how you look at it.

The practical usage of the two can actually vary drastically; the full breakdown of usage for each follows below.


@include (and @exclude)

@include is probably the directive most people are familiar with (along with its opposing twin, @exclude, which has exactly the same syntax features). It is the more powerful of the two, largely because it can handle RegEx patterns (this also means it generates scarier warnings). Its usage is also the more straightforward of the two.

Modes

You can specify patterns in two ways, or "modes":

Glob Mode

Asterisks * can be used as a wildcard glob, that is, to signify any amount of characters, including zero.

For example:

  • @include http://www.example.com/foo/*:
    • Matches http://www.example.com/foo/ and http://www.example.com/foo/bar
    • Does not match http://www.example.com/baz

There's also a special pattern available to specifically match any top-level domain suffix: .tld.

A pattern like @include https://www.example.tld/* will match the given domain with any top-level domain suffix, such as .com, .org, or .co.uk.
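To make the glob behaviour concrete, here is a minimal sketch of how a glob pattern could be turned into a regular expression. The helper names globToRegExp and globMatches are made up for this example, the case-insensitive flag is an assumption, and the special .tld suffix is deliberately not handled; real userscript managers may differ in the details.

```javascript
// Convert an @include-style glob into a RegExp: escape regex
// metacharacters, then turn each * into ".*" (zero or more characters).
// Hypothetical helper; the .tld suffix is NOT handled here.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

function globMatches(glob, url) {
  return globToRegExp(glob).test(url);
}

const pattern = "http://www.example.com/foo/*";
console.log(globMatches(pattern, "http://www.example.com/foo/"));    // true
console.log(globMatches(pattern, "http://www.example.com/foo/bar")); // true
console.log(globMatches(pattern, "http://www.example.com/baz"));     // false
```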

Regular Expression Mode

@include directives that start with a / character will be interpreted as a regular expression, with all standard JavaScript RegEx features available:

// ==UserScript==
// @include     /^https?://www\.example\.com/.*$/
// @include     /^http://www\.example\.(?:org|net)//
// ==/UserScript==

A few notes:

  • Forward slashes / inside the expression do not need to be escaped, unlike in a JavaScript regex literal.
  • Other special characters still need to be escaped.
  • @include patterns are always treated as case-insensitive.
  • Expressions not ending with the EOL token $ will implicitly allow trailing characters on matches.
    • In other words, the expression is treated as if it ended with .*.
    • @include /^https?://www\.google\.com/search/ will match https://www.google.com/search?q=stackoverflow.
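The notes above can be sketched roughly as follows. This assumes a manager that treats a value wrapped in /.../ as regex source and builds a case-insensitive RegExp from it; parseInclude is a hypothetical name, not an API of any particular userscript manager.

```javascript
// Parse an @include value. Values wrapped in /.../ become a
// case-insensitive RegExp; anything else is reported as a glob.
// Hypothetical helper for illustration only.
function parseInclude(value) {
  if (value.length > 1 && value.startsWith("/") && value.endsWith("/")) {
    // The RegExp constructor takes the source as a plain string, so
    // unescaped forward slashes inside it are not a problem.
    return { mode: "regex", regex: new RegExp(value.slice(1, -1), "i") };
  }
  return { mode: "glob", glob: value };
}

const search = parseInclude("/^https?://www\\.google\\.com/search/");

// No trailing $, so extra characters after "search" are allowed.
console.log(search.regex.test("https://www.google.com/search?q=stackoverflow")); // true
// The "i" flag makes the comparison case-insensitive.
console.log(search.regex.test("HTTPS://WWW.GOOGLE.COM/search")); // true
```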

Warnings

Keep in mind that the powerful nature of @include means that a browser cannot guarantee the target of a given script as well as it can with a script using @match; this means that scripts using @include may trigger more dire and severe warnings for the user.

A common reason given to not use @include involves URL fragments (the portion of a URL following the hash # character), and how a malicious actor could abuse them to execute a script on an undesirable page (e.g. @include http://*.example.com/ could match http://www.evil.com/#www.example.com/). @match, by contrast, ignores fragments by design.

While this attack is still theoretically possible, it's worth bearing in mind that some userscript managers (including Tampermonkey) purposely ignore fragments for matching purposes altogether, even in @include directives.
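Here is a small sketch of that fragment trick and of the "ignore the fragment" mitigation. The glob conversion is a simplified, hypothetical helper (the same idea as a plain * wildcard), and the evil.com URL is only an illustration.

```javascript
// Simplified glob-to-RegExp conversion: escape metacharacters,
// then let each * stand for zero or more characters.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

// Drop the fragment (everything from the first #) before matching.
function stripFragment(url) {
  const i = url.indexOf("#");
  return i === -1 ? url : url.slice(0, i);
}

const pattern = "http://*.example.com/";
const hostile = "http://www.evil.com/#www.example.com/";

// Matching the full URL lets * swallow "www.evil.com/#www".
const naive = globToRegExp(pattern).test(hostile);
// Matching without the fragment closes that hole.
const stripped = globToRegExp(pattern).test(stripFragment(hostile));

console.log(naive, stripped); // true false
```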


@match

The @match directive is a creation of Google for Chrome, designed as a safer, more sandboxed version of the @include directive, with much more rigidity built-in.

Instead of treating the whole pattern as a glob or RegEx, @match interprets a pattern as 3 parts: the scheme, the host, and the path. Google's documentation describes the basic syntax this way:

<url-pattern> := <scheme>://<host><path>
<scheme> := '*' | 'http' | 'https' | 'file' | 'ftp' | 'urn'
<host> := '*' | '*.' <any char except '/' and '*'>+
<path> := '/' <any chars>

Each part of the pattern carries its own caveats, and also interprets wildcards * differently.
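As a sketch of that three-part split, the following hypothetical parseMatchPattern function breaks a pattern into scheme, host, and path. It only checks the overall shape and does not validate the individual parts, so treat it as an illustration of the grammar rather than a faithful implementation.

```javascript
// Split a @match pattern into scheme, host and path, following the
// <scheme>://<host><path> shape of the grammar. Returns null if the
// pattern does not have that shape. Hypothetical helper.
function parseMatchPattern(pattern) {
  const m = /^([^:]+):\/\/([^/]*)(\/.*)$/.exec(pattern);
  if (m === null) return null;
  const [, scheme, host, path] = m;
  return { scheme, host, path };
}

console.log(parseMatchPattern("*://*.example.com/foo*"));
// { scheme: '*', host: '*.example.com', path: '/foo*' }

// A pattern without a path (not even "/") does not fit the grammar.
console.log(parseMatchPattern("https://www.example.com")); // null
```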

Scheme

The scheme portion of the URL pattern must exactly match one of the supported schemes (which seems to depend on the browser), or the wildcard *.

  • In Chrome, that's: http, https, file, ftp, or urn.
  • In Firefox, that appears to be http, https, file, ftp, ws, wss, data, or (chrome-)extension.

In this part of the pattern, wildcard * matches exclusively http or https (MDN mentions that it may also match WebSocket schemes ws and wss in some browsers).
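A minimal sketch of the scheme rule, using the Chrome list given above; schemeMatches is a hypothetical helper, and the WebSocket behaviour mentioned for some browsers is intentionally left out.

```javascript
// Schemes listed above for Chrome; other browsers accept a different set.
const SUPPORTED_SCHEMES = new Set(["http", "https", "file", "ftp", "urn"]);

// Hypothetical helper: does the scheme part of a @match pattern
// accept the scheme of a given URL?
function schemeMatches(patternScheme, urlScheme) {
  if (patternScheme === "*") {
    // The wildcard covers http and https only, not every scheme.
    return urlScheme === "http" || urlScheme === "https";
  }
  return SUPPORTED_SCHEMES.has(patternScheme) && patternScheme === urlScheme;
}

console.log(schemeMatches("*", "https")); // true
console.log(schemeMatches("*", "ftp"));   // false
console.log(schemeMatches("ftp", "ftp")); // true
```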

Host

The host portion of the URL pattern can come in three styles:

  • Fully explicit: www.stackoverflow.com
  • Subdomain wildcard: *.stackoverflow.com
  • Fully wildcard: *

The top-level domain suffix cannot be a wildcard (e.g. www.stackoverflow.* is not allowed); this is disallowed for security reasons. To match multiple TLD suffixes, a script needs a separate @match directive for each one.
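The three host styles could be checked roughly like this. hostMatches is a hypothetical helper, hosts are assumed to be lowercase already, and the rule that *.example.com also covers example.com itself is an assumption based on how WebExtension match patterns are documented.

```javascript
// Hypothetical helper: does the host part of a @match pattern
// accept a given (lowercase) hostname?
function hostMatches(patternHost, urlHost) {
  // Fully wildcard host.
  if (patternHost === "*") return true;
  // Subdomain wildcard: only valid as a leading "*.".
  if (patternHost.startsWith("*.")) {
    const base = patternHost.slice(2);
    // Assumption: "*.example.com" covers example.com and its subdomains.
    return urlHost === base || urlHost.endsWith("." + base);
  }
  // Fully explicit host: compared literally, so a * anywhere else
  // (such as a wildcard TLD) never matches a real hostname.
  return patternHost === urlHost;
}

console.log(hostMatches("*.stackoverflow.com", "meta.stackoverflow.com"));  // true
console.log(hostMatches("*.stackoverflow.com", "stackoverflow.com.evil.com")); // false
console.log(hostMatches("www.stackoverflow.*", "www.stackoverflow.com"));   // false
```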

Path

The path portion of the URL pattern is the most permissive of the 3, as the only rule is that it must start with a forward slash /. The rest can be any combination of characters and wildcards.

In this section, wildcards * act as a standard glob operator, simply matching 0 or more characters.

The value that gets matched against the path portion of the pattern is officially the URL path plus the URL query string, including the ? between them (e.g. in google.com/search?q=test, the query string is q=test). This is a potential pitfall for patterns that aim to match the end of a path exactly, since they may be foiled by an added query string.

Also note that the path does not include URL fragments (the part of the URL at the end that follows a hash #, e.g. www.example.com#main); @match directives ignore URL fragments by design to prevent abuse of unintentional matches.
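To illustrate the query-string pitfall and the ignored fragment, here is a minimal sketch that matches a path pattern against the path plus query of a URL. pathGlobToRegExp and pathMatches are hypothetical helpers built on the standard URL class; they are not taken from any userscript manager.

```javascript
// Convert the path part of a @match pattern into a RegExp:
// * means zero or more characters, everything else is literal.
function pathGlobToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Hypothetical helper: test a path pattern against a full URL.
function pathMatches(patternPath, urlString) {
  const url = new URL(urlString);
  // Path plus query string; url.hash (the fragment) is left out.
  const target = url.pathname + url.search;
  return pathGlobToRegExp(patternPath).test(target);
}

console.log(pathMatches("/search", "https://google.com/search"));         // true
console.log(pathMatches("/search", "https://google.com/search?q=test"));  // false
console.log(pathMatches("/search*", "https://google.com/search?q=test")); // true
console.log(pathMatches("/", "https://www.example.com#main"));            // true
```

The second call shows the pitfall: a pattern that ends exactly at /search stops matching as soon as a query string is appended, while a trailing * tolerates it.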


A Word of Caution

It's fairly obvious, but it bears repeating that scripts should be careful to @include exactly and exclusively the URLs that the script is intended to be run on. Runaway scripts can range from undetectable, minor annoyances to major problems; always double check that scripts are running only where they're supposed to be, and use @exclude to add guardrails if necessary or convenient.