Regex: To pull out a sub-string between two tags in a string

I have a file in the following format:

Data Data
Data
[Start]
Data I want
[End]
Data

I'd like to grab the Data I want from between the [Start] and [End] tags using a Regex. Can anyone show me how this might be done?


Solution 1:

\[start\](.*?)\[end\]

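Which'll put the text in the middle into a capture group. Since the file uses [Start] and [End], match case-insensitively, and if the tags sit on separate lines, make sure . is allowed to match newlines.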
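As a rough sketch of how this pattern behaves on the sample from the question, here is one way to try it with Python's re module. Python is only a choice for illustration, not something the answer specified. The two flags are assumptions added here: re.IGNORECASE covers the [Start]/[End] capitalisation, and re.DOTALL lets . cross the line breaks around the wanted text.

```python
import re

# Sample text laid out as in the question.
text = "Data Data\nData\n[Start]\nData I want\n[End]\nData\n"

# IGNORECASE: the pattern says [start], the file says [Start].
# DOTALL: lets "." match the newlines around the wanted text.
pattern = re.compile(r"\[start\](.*?)\[end\]", re.IGNORECASE | re.DOTALL)

match = pattern.search(text)
# The raw capture still includes the surrounding newlines, so strip them.
captured = match.group(1).strip() if match else None
print(captured)
```

The capture itself is "\nData I want\n"; the strip() call is what removes the line breaks next to the tags.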

Solution 2:

\[start\]\s*(((?!\[start\]|\[end\]).)+)\s*\[end\]

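This should hopefully keep any stray [start] or [end] marker out of the captured text as well: the negative lookahead refuses to let the capture run across either one.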
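To see what the lookahead buys you, the sketch below compares this pattern with the simpler one from Solution 1 on input that has a stray opening tag. It uses Python's re module purely for illustration; the sample string and the flags (re.IGNORECASE, re.DOTALL) are assumptions, not part of the original answer.

```python
import re

flags = re.IGNORECASE | re.DOTALL

# Pattern from Solution 1: lazy capture between the tags.
plain = re.compile(r"\[start\](.*?)\[end\]", flags)

# Pattern from Solution 2: each captured character must not begin a marker.
tempered = re.compile(
    r"\[start\]\s*(((?!\[start\]|\[end\]).)+)\s*\[end\]", flags
)

# Hypothetical input with an extra, unclosed [Start].
text = "[Start] stray [Start] Data I want [End] Data"

plain_capture = plain.search(text).group(1).strip()
tempered_capture = tempered.search(text).group(1).strip()

print(repr(plain_capture))
print(repr(tempered_capture))
```

The plain pattern starts at the first [Start] and so drags the second one into the capture, while the tempered pattern can only match from the innermost [Start]. Note that the greedy + still swallows whitespace just before [end], hence the strip().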

Solution 3:

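use strict; use warnings;
my $text = "Data Data Data start Data i want end Data";
# In list context the match returns its captures; .*? stops at the first " end ".
my ($content) = $text =~ m/ start (.*?) end /;
print "$content\n";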

I had a similar problem for a while & I can tell you this method works...

Solution 4:

A more complete discussion of the pitfalls of using a regex to find matching tags can be found at: http://faq.perl.org/perlfaq4.html#How_do_I_find_matchi. In particular, be aware that nested tags really need a full-fledged parser in order to be interpreted correctly.
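To illustrate why nesting is the hard part, here is a minimal sketch of a depth-counting scanner in Python. It is not the approach from the FAQ linked above; outermost_blocks is a hypothetical helper written for this example, and it only tracks depth for one tag pair rather than doing real parsing.

```python
import re

def outermost_blocks(text):
    """Return the contents of each top-level [start]...[end] block."""
    blocks = []
    depth = 0
    begin = None  # index just past the opening tag of the current top-level block
    for tag in re.finditer(r"\[(start|end)\]", text, re.IGNORECASE):
        if tag.group(1).lower() == "start":
            if depth == 0:
                begin = tag.end()
            depth += 1
        elif depth > 0:  # an [end] with no open [start] is ignored
            depth -= 1
            if depth == 0:
                blocks.append(text[begin:tag.start()])
    return blocks

# Hypothetical input: one nested pair plus a trailing unmatched [end].
nested = "x [start] a [start] b [end] c [end] y [end]"
result = outermost_blocks(nested)
print(result)
```

A single regex with .*? would stop at the first [end] here and return only part of the outer block; counting depth is the smallest step towards what a real parser does.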

Note that case sensitivity will need to be turned off in order to answer the question as stated, because the file uses [Start] and [End] rather than [start] and [end]. In Perl, that's the i modifier:

$ echo "Data Data Data [Start] Data i want [End] Data" \
  | perl -ne 'print "$1\n" if /\[start\](.*?)\[end\]/i'
 Data i want 

The other trick is to use the *? quantifier, which makes the capture non-greedy so that it stops at the first [end] it can. For instance, if you have an extra, unmatched [end] tag later in the input:

Data Data [Start] Data i want [End] Data [end]

you probably don't want to capture:

 Data i want [End] Data
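The difference is easy to check side by side. The sketch below runs a greedy and a non-greedy version of the pattern over the line above, using Python's re module for illustration (the answer itself uses Perl; re.IGNORECASE plays the role of the i modifier).

```python
import re

line = "Data Data [Start] Data i want [End] Data [end]"

# Greedy: .* runs to the LAST [end] on the line.
greedy = re.search(r"\[start\](.*)\[end\]", line, re.IGNORECASE).group(1)

# Non-greedy: .*? stops at the FIRST [end] it can.
lazy = re.search(r"\[start\](.*?)\[end\]", line, re.IGNORECASE).group(1)

print(repr(greedy))
print(repr(lazy))
```

The greedy capture includes the first [End] tag and the data after it, while the non-greedy one contains only the text up to the first closing tag (with its surrounding spaces).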