Regex: To pull out a sub-string between two tags in a string
I have a file in the following format:
Data Data Data [Start] Data I want [End] Data
I'd like to grab the Data I want from between the [Start] and [End] tags using a Regex. Can anyone show me how this might be done?
Solution 1:
\[start\](.*?)\[end\]
Which'll put the text in the middle within a capture group.
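As a minimal sketch of how the pattern above might be applied, here it is in Python's `re` module, which accepts the same syntax for this expression. The function name `extract_between_tags` is made up for illustration, and the `re.IGNORECASE` flag is an assumption added because the sample line spells the tags `[Start]` and `[End]`.

```python
import re

# Pattern from the answer above: escaped brackets around a lazy capture group.
# IGNORECASE is an assumption so that [Start]/[End] in the sample also match.
TAG_PATTERN = re.compile(r"\[start\](.*?)\[end\]", re.IGNORECASE)


def extract_between_tags(text):
    """Return the text between the first [start]...[end] pair, or None."""
    match = TAG_PATTERN.search(text)
    if match is None:
        return None
    # Group 1 holds only the captured middle; strip the padding spaces.
    return match.group(1).strip()


print(extract_between_tags("Data Data Data [Start] Data I want [End] Data"))
# prints: Data I want
```

Returning None when nothing matches keeps the caller from reading a stale or missing capture.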
Solution 2:
\[start\]\s*(((?!\[start\]|\[end\]).)+)\s*\[end\]
This should hopefully drop the [start] and [end] markers as well.
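To see what the negative lookahead in this pattern buys you, the sketch below compares it with the simpler lazy pattern on a made-up input that contains a stray second [start]. It is written in Python's `re` module, which supports the same lookahead syntax. The sample string and the helper name `innermost_span` are assumptions for illustration only.

```python
import re

# Simple lazy pattern versus the lookahead pattern from this answer.
SIMPLE = re.compile(r"\[start\](.*?)\[end\]", re.IGNORECASE)
TEMPERED = re.compile(
    r"\[start\]\s*(((?!\[start\]|\[end\]).)+)\s*\[end\]", re.IGNORECASE
)


def innermost_span(text):
    """Return a capture that never runs across another [start] or [end]."""
    match = TEMPERED.search(text)
    # The repeated group also consumes trailing spaces, so strip them here.
    return match.group(1).strip() if match else None


sample = "[start] outer [start] inner [end] tail"  # hypothetical input
print(SIMPLE.search(sample).group(1).strip())  # prints: outer [start] inner
print(innermost_span(sample))                  # prints: inner
```

The lookahead is checked before every character, so the capture cannot swallow a second marker. This is still not a substitute for a parser when tags are genuinely nested.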
Solution 3:
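# Sample input; here the markers are the plain words "start" and "end".
my $text = "Data Data Data start Data i want end Data";
# In list context the match returns the capture; .*? stops at the first " end ".
my ($content) = $text =~ m/ start (.*?) end /;
print "$content\n";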
I had a similar problem for a while & I can tell you this method works...
Solution 4:
A more complete discussion of the pitfalls of using a regex to find matching tags can be found at: http://faq.perl.org/perlfaq4.html#How_do_I_find_matchi. In particular, be aware that nested tags really need a full-fledged parser in order to be interpreted correctly.
Note that case sensitivity will need to be turned off in order to answer the question as stated, since the tags are written [Start] and [End]. In Perl, that's the /i modifier:
$ echo "Data Data Data [Start] Data i want [End] Data" \
| perl -ne '/\[start\](.*?)\[end\]/i; print "$1\n"'
Data i want
The other trick is to use the *? quantifier, which makes the captured match non-greedy so that it stops at the first [end] rather than the last. For instance, if you have an unmatched [end] tag:
Data Data [Start] Data i want [End] Data [end]
you probably don't want to capture:
Data i want [End] Data
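The difference can be checked directly. The sketch below runs the greedy and the non-greedy form of the pattern against the line above, using Python's `re` module, where `*` and `*?` behave as they do in Perl here. The helper name `capture` is made up for illustration.

```python
import re

line = "Data Data [Start] Data i want [End] Data [end]"


def capture(pattern, text):
    """Return group 1 of the first case-insensitive match, or None."""
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group(1).strip() if match else None


greedy = capture(r"\[start\](.*)\[end\]", line)   # .* runs to the last [end]
lazy = capture(r"\[start\](.*?)\[end\]", line)    # .*? stops at the first one

print(greedy)  # prints: Data i want [End] Data
print(lazy)    # prints: Data i want
```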