PHP Regex: How to match \r and \n without using [\r\n]?
I have tested \v (vertical whitespace) for matching \r\n and their combinations, but I found that \v does not match \r and \n. Below is the code I am using:
$string = "
Test
";
if (preg_match("#\v+#", $string)) {
    echo "Matched";
} else {
    echo "Not Matched";
}
To be clearer, my question is: is there any other alternative for matching \r\n?
Solution 1:
PCRE and newlines
PCRE has a superfluity of newline related escape sequences and alternatives.
A nifty escape sequence that you can use here is \R. By default \R will match Unicode newline sequences, but it can be configured using different alternatives.
Without any modifier, \R matches the Unicode newline sequences that fit in a single byte (the ASCII ones plus NEL, \x85):
preg_match('~\R~', $string);
This is equivalent to the following group:
(?>\r\n|\n|\r|\f|\x0b|\x85)
To match any Unicode newline sequence, including newline characters outside the ASCII range such as the line separator (U+2028) and paragraph separator (U+2029), turn on the u (unicode) flag:
preg_match('~\R~u', $string);
The u (unicode) modifier turns on additional functionality of PCRE, and pattern and subject strings are treated as UTF-8.
This is equivalent to the following group:
(?>\r\n|\n|\r|\f|\x0b|\x85|\x{2028}|\x{2029})
It is possible to restrict \R to match CR, LF, or CRLF only:
preg_match('~(*BSR_ANYCRLF)\R~', $string);
This is equivalent to the following group:
(?>\r\n|\n|\r)
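As a rough illustration of the three \R variants above, the following sketch splits a made-up sample string on each of them. The sample data and variable names are assumptions for demonstration only. It also assumes a PCRE build whose default for \R is "any Unicode newline" (the usual case for PHP) and PHP 7.0+ for the "\u{2028}" string escape.

```php
<?php
// Illustrative subject mixing CRLF, LF, CR and U+2028 (LINE SEPARATOR).
$subject = "a\r\nb\nc\rd\u{2028}e";

// Plain \R works on bytes, so the three-byte U+2028 is not seen as a break.
$plain = preg_split('~\R~', $subject);

// With the u modifier, U+2028 and U+2029 also count as line breaks.
$unicode = preg_split('~\R~u', $subject);

// (*BSR_ANYCRLF) limits \R to CR, LF and CRLF, so the vertical tab (\x0B)
// in this second sample no longer splits the string.
$crlfOnly = preg_split('~(*BSR_ANYCRLF)\R~', "a\r\nb\x0Bc");

echo count($plain), ' ', count($unicode), ' ', count($crlfOnly), "\n";
```

Note that \r\n is consumed as one unit in all three cases, so no empty element appears between the CR and the LF.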
Additional
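Five different conventions for indicating line breaks in strings are supported. Each is selected by placing the corresponding option at the very start of the pattern. They control what ^, $ and . treat as a newline; they do not change what \R matches: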
(*CR) carriage return
(*LF) linefeed
(*CRLF) carriage return, followed by linefeed
(*ANYCRLF) any of the three above
(*ANY) all Unicode newline sequences
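A small sketch of how these options change the behaviour of ^ and $ in multi-line mode. The sample strings are invented for illustration, and the options are written out explicitly so that the result does not depend on the default newline setting of the local PCRE build.

```php
<?php
// CR-only line endings (illustrative data).
$crText = "one\rtwo\rthree";

// With LF as the newline convention, a bare CR does not end a line,
// so ^\w+$ finds no complete line in this string.
$withLf = preg_match_all('~(*LF)^\w+$~m', $crText);

// With CR as the convention, each of the three words is its own line.
$withCr = preg_match_all('~(*CR)^\w+$~m', $crText);

// (*ANYCRLF) accepts CR, LF and CRLF in the same subject.
$mixed   = "one\r\ntwo\nthree\rfour";
$withAny = preg_match_all('~(*ANYCRLF)^\w+$~m', $mixed);

echo "$withLf $withCr $withAny\n";
```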
Note: \R has no special meaning inside a character class. Older PCRE versions treat it there like any other unrecognized escape sequence, as the literal character "R"; PCRE2 (bundled with PHP 7.3 and later) rejects it with a compile error instead.
Solution 2:
This doesn't answer the question about alternatives, because \v works perfectly well.
\v matches any character considered vertical whitespace; this includes the platform's carriage return and line feed characters (newline) plus several other characters.
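The problem is the double-quoted PHP string: inside double quotes, PHP itself interprets \v as a vertical tab character, so PCRE never sees the \v escape. You only need to change "#\v+#" to either
- "#\\v+#" (escape the backslash), or
- '#\v+#' (use single quotes).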
In both cases, you will get a match for any combination of \r and \n.
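To make the quoting difference concrete, here is a minimal sketch comparing the three spellings of the pattern. The sample string and variable names are assumptions chosen for illustration.

```php
<?php
// Illustrative subject containing CRLF line breaks but no vertical tab.
$string = "\r\nTest\r\n";

// In a double-quoted string PHP converts \v to the vertical tab byte (0x0B).
$isVerticalTab = ("\v" === "\x0B");

// So this pattern is "one or more literal vertical tabs": no match here.
$doubleQuoted = preg_match("#\v+#", $string);

// Here PCRE receives the two characters \ and v and applies its own
// meaning of \v (vertical whitespace), which includes \r and \n.
$escaped      = preg_match("#\\v+#", $string);
$singleQuoted = preg_match('#\v+#', $string);

echo "$doubleQuoted $escaped $singleQuoted\n"; // prints "0 1 1"
```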
Update:
Just to make the scope of \v clear in comparison to \R, from perlrebackslash:
\R matches a generic newline; that is, anything considered a linebreak sequence by Unicode. This includes all characters matched by \v (vertical whitespace), ...
Solution 3:
If there is some strange requirement that prevents you from using a literal [\r\n] in your pattern, you can always use hexadecimal escape sequences instead:
preg_match('#[\xD\xA]+#', $string)
This pattern is equivalent to [\r\n]+.
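A quick sketch to check that equivalence on a few invented sample strings. The two-digit form \x0D\x0A is included as well, since it is arguably easier to read than the one-digit form.

```php
<?php
// Illustrative inputs: CRLF, LF, CR, and a string with no line break.
$samples = ["a\r\nb", "a\nb", "a\rb", "no break here"];

$results = [];
foreach ($samples as $sample) {
    $hex     = preg_match('#[\xD\xA]+#', $sample);   // one-digit hex escapes
    $padded  = preg_match('#[\x0D\x0A]+#', $sample); // two-digit hex escapes
    $literal = preg_match('#[\r\n]+#', $sample);     // the usual spelling
    $results[] = [$hex, $padded, $literal];
}

// Each row holds three identical values: 1 for the first three samples,
// 0 for the last one.
print_r($results);
```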
Solution 4:
To match every line of a given string, simply use the ^ and $ anchors and advise your regex engine to operate in multi-line mode. Then ^ and $ will match the start and end of each line, instead of the start and end of the whole string.
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
In PHP, that is the m modifier after the pattern. /^(.*?)$/m will simply match each line inside the given string. Note that PCRE usually treats only \n as the line terminator by default, so with \r\n line endings each captured line keeps a trailing \r unless the pattern starts with an option such as (*ANYCRLF).
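A minimal sketch of this approach, using made-up sample text. It assumes the common case where the PCRE default newline is LF; for the CRLF sample the trailing \r is stripped afterwards with rtrim(), which is just one possible way to handle it.

```php
<?php
// LF-terminated text: the pattern captures each line directly.
$lfText = "first\nsecond\nthird";
preg_match_all('/^(.*?)$/m', $lfText, $lfMatches);
$lfLines = $lfMatches[1];

// CRLF-terminated text: with LF as the newline convention the dot also
// consumes the \r, so it is removed from each captured line afterwards.
$crlfText = "first\r\nsecond\r\nthird";
preg_match_all('/^(.*?)$/m', $crlfText, $crlfMatches);
$crlfLines = array_map(
    function ($line) { return rtrim($line, "\r"); },
    $crlfMatches[1]
);

print_r($lfLines);
print_r($crlfLines);
```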
Btw: for line splitting, you could also use explode() and the PHP_EOL constant, provided the string uses the platform's own line ending:
$lines = explode(PHP_EOL, $string);
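Because PHP_EOL only reflects the line ending of the platform PHP runs on, text that comes from elsewhere may not split as expected. One possible alternative, sketched here with an invented sample string, is to combine preg_split() with the \R escape discussed in Solution 1.

```php
<?php
// Illustrative text mixing CRLF, LF and CR line endings.
$string = "one\r\ntwo\nthree\rfour";

// \R treats \r\n as a single break, so no empty elements are produced.
$lines = preg_split('~\R~', $string);

// For comparison: the result of this call depends on the platform's PHP_EOL.
$byEol = explode(PHP_EOL, $string);

print_r($lines);
```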