How do I match any character across multiple lines in a regular expression?
Try this:
((.|\n)*)<FooBar>
It basically says "any character or a newline" repeated zero or more times.
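To see the alternation at work, here is a minimal sketch in Python's re module. Python is only chosen for illustration, and the variable names are hypothetical. No flag is needed because \n is listed explicitly next to the dot:

```python
import re

text = "abcde\nfghij<FooBar>"

# Each repetition of (.|\n) consumes either a non-newline character (via .)
# or a newline (via \n), so the outer group can span lines without DOTALL.
match = re.search(r"((.|\n)*)<FooBar>", text)
prefix = match.group(1) if match else None
print(repr(prefix))  # 'abcde\nfghij'
```

Per-character alternation tends to be slower than a single dot with a DOTALL-style flag or a [\s\S] class, so the flag-based options are usually preferable where they exist.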
It depends on the language, but there should be a modifier that you can add to the regex pattern. In PHP it is:
/(.*)<FooBar>/s
The s at the end causes the dot to match all characters including newlines.
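Python's re module has the same switch, spelled re.S or re.DOTALL. This sketch uses Python purely for illustration (the PHP /s modifier behaves the same way) and compares the pattern with and without the flag:

```python
import re

text = "abcde\nfghij<FooBar>"

# Without the flag, . stops at \n, so re.search can only find a match that
# starts after the last line break before <FooBar>.
without_flag = re.search(r"(.*)<FooBar>", text).group(1)

# With re.S (alias re.DOTALL), . also matches \n, so the capture starts at
# the beginning of the string and spans both lines.
with_flag = re.search(r"(.*)<FooBar>", text, flags=re.S).group(1)

print(repr(without_flag))  # 'fghij'
print(repr(with_flag))     # 'abcde\nfghij'
```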
The question is, can the . pattern match any character? The answer varies from engine to engine. The main difference is whether the pattern is used by a POSIX or non-POSIX regex library.
A special note about Lua patterns: they are not considered regular expressions, but . matches any character there, the same as in POSIX-based engines.
Another note on MATLAB and Octave: the . matches any character by default (demo): str = "abcde\n fghij<Foobar>"; expression = '(.*)<Foobar>'; [tokens,matches] = regexp(str,expression,'tokens','match'); (tokens contains an abcde\n fghij item).
Also, in all of Boost's regex grammars the dot matches line breaks by default. Boost's ECMAScript grammar allows you to turn this off with regex_constants::no_mod_s (source).
As for Oracle (it is POSIX-based), use the n option (demo): select regexp_substr('abcde' || chr(10) ||' fghij<Foobar>', '(.*)<Foobar>', 1, 1, 'n', 1) as results from dual
POSIX-based engines:
A mere . already matches line breaks, so there is no need to use any modifiers; see Bash (demo).
Tcl (demo), PostgreSQL (demo), and R (TRE, the base R default engine, i.e. without perl=TRUE; for base R with perl=TRUE or for stringr/stringi patterns, use the (?s) inline modifier) (demo) also treat . the same way.
However, most POSIX-based tools process input line by line, so . does not match line breaks simply because they are never in scope. Here are some examples of how to override this:
- sed: There are multiple workarounds. The most precise, but not very safe, is sed 'H;1h;$!d;x; s/\(.*\)<Foobar>/\1/' (H;1h;$!d;x; slurps the file into memory). If whole lines must be included, sed '/start_pattern/,/end_pattern/d' file (deletes everything from the start line through the end line, with the matched lines included) or sed '/start_pattern/,/end_pattern/{{//!d;};}' file (with the matching lines excluded) can be considered.
- perl: perl -0pe 's/(.*)<FooBar>/$1/gs' <<< "$str" (-0 slurps the whole file into memory as long as it contains no NUL bytes, -p prints the result after applying the script given by -e). Note that using -000pe will slurp the file and activate "paragraph mode", where Perl uses consecutive newlines (\n\n) as the record separator.
- GNU grep: grep -Poz '(?si)abc\K.*?(?=<Foobar>)' file. Here, -P enables PCRE syntax, -o prints only the matched text, -z enables file slurping, (?s) enables the DOTALL mode for the . pattern, (?i) enables case-insensitive mode, \K omits the text matched so far, *? is a lazy quantifier, and (?=<Foobar>) matches the location right before <Foobar>.
- pcregrep: pcregrep -Mi "(?si)abc\K.*?(?=<Foobar>)" file (-M enables file slurping here). Note that pcregrep is a good solution for macOS grep users.
See demos.
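The "not in scope" problem can be reproduced outside sed and grep. Below is a rough Python analogue, an illustration only and not how those tools are implemented: io.StringIO stands in for a file, and the sample data and names are hypothetical. Feeding the regex one line at a time means a line break is never inside the subject string, whereas reading the whole input at once (the counterpart of H;1h;$!d;x, perl -0 or grep -z) lets a DOTALL pattern span lines. The pattern mirrors the grep -P one, except that Python's re has no \K, so a fixed-width lookbehind is swapped in:

```python
import io
import re

data = "xabc1\n2\n3<Foobar>y\n"

# (?si) turns on DOTALL and case-insensitivity, .*? is lazy, and the
# lookahead stops right before <Foobar>. (?<=abc) replaces grep's abc\K.
pattern = re.compile(r"(?si)(?<=abc).*?(?=<Foobar>)")

# Line-by-line processing: no single line contains both "abc" and
# "<Foobar>", so nothing is found.
line_hits = [m.group(0) for line in io.StringIO(data) for m in pattern.finditer(line)]

# "Slurping" the whole input into one string puts the newlines in scope.
slurped_hits = [m.group(0) for m in pattern.finditer(data)]

print(line_hits)     # []
print(slurped_hits)  # ['1\n2\n3']
```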
Non-POSIX-based engines:
- PHP: Use the s (PCRE_DOTALL) modifier: preg_match('~(.*)<Foobar>~s', $s, $m) (demo)
- C#: Use the RegexOptions.Singleline flag (demo):
  - var result = Regex.Match(s, @"(.*)<Foobar>", RegexOptions.Singleline).Groups[1].Value;
  - var result = Regex.Match(s, @"(?s)(.*)<Foobar>").Groups[1].Value;
- PowerShell: Use the (?s) inline option: $s = "abcde`nfghij<FooBar>"; $s -match "(?s)(.*)<Foobar>"; $matches[1]
- Perl: Use the s modifier (or the (?s) inline version at the start) (demo): /(.*)<FooBar>/s
- Python: Use the re.DOTALL (or re.S) flag or the (?s) inline modifier (demo): m = re.search(r"(.*)<FooBar>", s, flags=re.S) (then if m: print(m.group(1)))
- Java: Use the Pattern.DOTALL modifier (or the inline (?s) flag) (demo): Pattern.compile("(.*)<FooBar>", Pattern.DOTALL)
- Groovy: Use the (?s) in-pattern modifier (demo): regex = /(?s)(.*)<FooBar>/
- Scala: Use the (?s) modifier (demo): "(?s)(.*)<Foobar>".r.findAllIn("abcde\n fghij<Foobar>").matchData foreach { m => println(m.group(1)) }
- JavaScript: Use [^] or the workarounds [\d\D] / [\w\W] / [\s\S] (demo): s.match(/([\s\S]*)<FooBar>/)[1]. In ECMAScript 2018 and later, the s (dotAll) flag is also available: s.match(/(.*)<FooBar>/s)[1]
- C++ (std::regex): Use [\s\S] or the other JavaScript workarounds (demo): regex rex(R"(([\s\S]*)<FooBar>)");
- VBA, VBScript: Use the same approach as in JavaScript, ([\s\S]*)<Foobar>. (NOTE: The MultiLine property of the RegExp object is sometimes erroneously thought to be the option that lets . match across line breaks; in fact, it only changes the behavior of ^ and $ so that they match at the start/end of lines rather than of the whole string, the same as in JavaScript regex.)
- Ruby: Use the /m (MULTILINE) modifier (demo): s[/(.*)<Foobar>/m, 1]
- R (base R with perl=TRUE, PCRE): Use (?s): regmatches(x, regexec("(?s)(.*)<FooBar>", x, perl=TRUE))[[1]][2] (demo)
- R (stringr/stringi, powered by the ICU regex engine): Also use (?s): stringr::str_match(x, "(?s)(.*)<FooBar>")[,2] (demo)
- Go: Use the inline modifier (?s) at the start (demo): re := regexp.MustCompile(`(?s)(.*)<FooBar>`)
- Swift: Use dotMatchesLineSeparators or (easier) pass the (?s) inline modifier to the pattern: let rx = "(?s)(.*)<Foobar>"
- Objective-C: The same as Swift. (?s) is the easiest, but here is how the option can be used: NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionDotMatchesLineSeparators error:&regexError];
- RE2, Google Apps Script: Use the (?s) modifier (demo): "(?s)(.*)<Foobar>" (in Google Sheets, =REGEXEXTRACT(A2,"(?s)(.*)<Foobar>"))
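Most engines in the list above accept the inline (?s) modifier, which makes it the most portable spelling. As a small Python sketch (Python chosen only for illustration; the dictionary and names are hypothetical), the inline modifier, re.S and re.DOTALL produce the same capture:

```python
import re

text = "abcde\n fghij<Foobar>"

# Three equivalent ways to enable DOTALL in Python's re module.
variants = {
    "inline (?s)": re.search(r"(?s)(.*)<Foobar>", text),
    "re.S": re.search(r"(.*)<Foobar>", text, re.S),
    "re.DOTALL": re.search(r"(.*)<Foobar>", text, re.DOTALL),
}

for name, m in variants.items():
    print(name, repr(m.group(1)))  # each prints 'abcde\n fghij'
```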
NOTES ON (?s):
In most non-POSIX engines, the (?s) inline modifier (or embedded flag option) can be used to make . match line breaks.
If placed at the start of the pattern, (?s) changes the behavior of every . in the pattern. If (?s) is placed somewhere after the beginning, only the dots to the right of it are affected, unless the pattern is passed to Python's re. In Python re, (?s) applies to the whole pattern regardless of its location; placing it anywhere other than the start has been deprecated since Python 3.6 and raises an error since Python 3.11. The (?s) effect is turned off with (?-s) (Python re only supports the scoped (?-s:...) form). A modified group can be used to affect only a specific part of a regex pattern (e.g., Delim1(?s:.*?)\nDelim2.* makes the first .*? match across newlines, while the second .* only matches the rest of the line).
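Here is the Delim1/Delim2 example as a runnable Python sketch. Python's re supports scoped flag groups such as (?s:...) since Python 3.6; the sample text is made up for illustration:

```python
import re

text = "Delim1 first\nsecond\nDelim2 rest of line\nnext line"

# (?s:...) turns DOTALL on only inside the group, so .*? may cross newlines,
# while the trailing (.*) keeps the default behavior and stops at \n.
pattern = re.compile(r"Delim1(?s:.*?)\nDelim2(.*)")

m = pattern.search(text)
print(repr(m.group(0)))  # 'Delim1 first\nsecond\nDelim2 rest of line'
print(repr(m.group(1)))  # ' rest of line'
```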
POSIX note:
In non-POSIX regex engines, the [\s\S] / [\d\D] / [\w\W] constructs can be used to match any character.
In POSIX, [\s\S] does not match any character (as it does in JavaScript or any other non-POSIX engine), because regex escape sequences are not supported inside bracket expressions. [\s\S] is parsed as a bracket expression that matches a single character: \, s, or S.
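The non-POSIX side of this note can be checked in Python's re (chosen for illustration; it does not demonstrate the POSIX bracket-expression behavior). Because \s, \d and \w are honored inside a character class there, each complementary pair covers every character, newlines included, with no DOTALL flag:

```python
import re

text = "abcde\nfghij<FooBar>"

# Each class is "X or not X", i.e. any character at all.
any_char_classes = [r"[\s\S]", r"[\d\D]", r"[\w\W]"]

captures = [re.search(f"({cls}*)<FooBar>", text).group(1) for cls in any_char_classes]
print(captures)  # ['abcde\nfghij', 'abcde\nfghij', 'abcde\nfghij']
```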
If you're using Eclipse search, you can enable the "DOTALL" option to make '.' match any character including line delimiters: just add "(?s)" at the beginning of your search string. Example:
(?s).*<FooBar>