Split string by delimiter, but not if it is escaped
How can I split a string by a delimiter, but not if it is escaped? For example, I have a string:
1|2\|2|3\\|4\\\|4
The delimiter is | and an escaped delimiter is \|. Furthermore, I want to ignore escaped backslashes, so in \\| the | would still be a delimiter.
So with the above string the result should be:
[0] => 1
[1] => 2\|2
[2] => 3\\
[3] => 4\\\|4
Use dark magic:
$array = preg_split('~\\\\.(*SKIP)(*FAIL)|\|~s', $string);
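\\\\. (the regex \\. once PHP has unescaped the string literal) matches a backslash followed by any character, (*SKIP)(*FAIL) discards that match and resumes scanning right after it, and \| matches your delimiter. The s modifier lets . match a newline as well.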
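To see the pattern in action, here is a small sketch that applies it to the example string from the question. The nowdoc is used only so the backslashes can be written exactly as they appear in the question; the variable names are arbitrary.

```php
<?php
// A nowdoc does no escape processing, so the string is exactly 1|2\|2|3\\|4\\\|4
$string = <<<'TXT'
1|2\|2|3\\|4\\\|4
TXT;

// First alternative: consume "backslash + any character" and throw the match away.
// Second alternative: a bare | that survived the first alternative is a real delimiter.
$parts = preg_split('~\\\\.(*SKIP)(*FAIL)|\|~s', $string);

print_r($parts);
// [0] => 1
// [1] => 2\|2
// [2] => 3\\
// [3] => 4\\\|4
```

Because this is a split, empty fields (for example in a||b) are kept as empty strings.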
Instead of split(...), it's IMO more intuitive to use some sort of "scan" function that operates like a lexical tokenizer. In PHP that would be the preg_match_all function. You simply say you want to match:
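1. something other than a \ or |
2. or a \ followed by a \ or | (the pattern below uses \\\\. here, which accepts a backslash followed by any character except a newline)
3. repeat #1 or #2 at least once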
The following demo:
$input = "1|2\\|2|3\\\\|4\\\\\\|4";
echo $input . "\n\n";
preg_match_all('/(?:\\\\.|[^\\\\|])+/', $input, $parts);
print_r($parts[0]);
will print:
1|2\|2|3\\|4\\\|4
Array
(
    [0] => 1
    [1] => 2\|2
    [2] => 3\\
    [3] => 4\\\|4
)
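The same tokenizer idea can also be written without a regex, as a single pass over the string. This is only a sketch: split_unescaped is a made-up helper name, and it assumes a single-byte delimiter and treats a backslash as escaping whatever byte follows it.

```php
<?php
// Hypothetical helper (not from the answers above): split $input on $delimiter,
// treating a backslash as an escape for the character that follows it.
function split_unescaped(string $input, string $delimiter = '|'): array
{
    $fields = [];
    $current = '';
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($char === '\\' && $i + 1 < $length) {
            // Copy the backslash and the escaped character verbatim.
            $current .= $char . $input[++$i];
        } elseif ($char === $delimiter) {
            // Unescaped delimiter: close the current field.
            $fields[] = $current;
            $current = '';
        } else {
            $current .= $char;
        }
    }

    $fields[] = $current; // the last field has no trailing delimiter
    return $fields;
}

$input = "1|2\\|2|3\\\\|4\\\\\\|4";
print_r(split_unescaped($input));
```

For the example input this yields the same four fields as above. Unlike the preg_match_all version, it keeps empty fields, so a||b gives three elements instead of two.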
Recently I devised a solution:
$array = preg_split('~ ((?<!\\\\)|(?<=[^\\\\](\\\\\\\\)+)) \| ~x', $string);
But the "dark magic" (*SKIP)(*FAIL) solution is still three times faster.