Find everything between two XML tags with RegEx
In RegEx, I want to find the tag and everything between two XML tags, like the following:
<primaryAddress>
<addressLine>280 Flinders Mall</addressLine>
<geoCodeGranularity>PROPERTY</geoCodeGranularity>
<latitude>-19.261365</latitude>
<longitude>146.815585</longitude>
<postcode>4810</postcode>
<state>QLD</state>
<suburb>Townsville</suburb>
<type>PHYSICAL</type>
</primaryAddress>
I want to find the primaryAddress tag and everything inside it, and erase that.
The content between the primaryAddress tags varies, but I want to remove the entire tag and its sub-tags whenever I encounter primaryAddress.
Does anyone have any idea how to do that?
Solution 1:
It is not a good idea to use regex for HTML/XML parsing...
However, if you want to do it anyway, search for the regex pattern
<primaryAddress>[\s\S]*?<\/primaryAddress>
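and replace it with an empty string. Here [\s\S]*? matches any character, including newlines, as few times as possible, so the match stops at the first closing tag.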
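As a minimal sketch of the search-and-replace described above, here is how it might look in Python with the standard `re` module. The `<customer>` wrapper, the `<name>` element, and the helper name `remove_primary_address` are assumptions made up for illustration; only the pattern comes from the answer.

```python
import re

# Sample input modeled on the question; <customer> and <name> are hypothetical.
xml = """<customer>
<name>Example</name>
<primaryAddress>
<addressLine>280 Flinders Mall</addressLine>
<postcode>4810</postcode>
</primaryAddress>
</customer>"""

# [\s\S]*? matches any character (including newlines) lazily,
# so the match ends at the first closing tag.
PATTERN = re.compile(r"<primaryAddress>[\s\S]*?</primaryAddress>")

def remove_primary_address(text: str) -> str:
    """Return text with every <primaryAddress>...</primaryAddress> block removed."""
    return PATTERN.sub("", text)

cleaned = remove_primary_address(xml)
print(cleaned)
```

Note that this only handles an opening tag written exactly as `<primaryAddress>`; attributes or nested `primaryAddress` elements would need a different approach.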
Solution 2:
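You should be able to match it with: /<primaryAddress>(.+?)<\/primaryAddress>/
The content between the tags will be in the first capture group. Note that in most regex engines . does not match newlines by default, so for multi-line content like the example you also need the s (dotall) flag.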
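To illustrate the capture-group approach, here is a small Python sketch. It assumes the standard `re` module and a shortened version of the question's sample; `re.DOTALL` is Python's way of letting `.` match newlines, which this multi-line input requires.

```python
import re

# Shortened sample based on the question's XML.
xml = """<primaryAddress>
<state>QLD</state>
<suburb>Townsville</suburb>
</primaryAddress>"""

pattern = r"<primaryAddress>(.+?)</primaryAddress>"

# With DOTALL, '.' also matches newlines, so the multi-line body is captured.
match = re.search(pattern, xml, re.DOTALL)
inner = match.group(1) if match else None

# Without DOTALL, the same pattern finds nothing here, because the body
# starts with a newline that '.' cannot match.
no_flag = re.search(pattern, xml)

print(repr(inner))
print(no_flag)
```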
Solution 3:
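It is not good to use this method, but if you really want to split it with regex:
<primaryAddress[^>]*>((.|\n)*?)<\/primaryAddress>
The accepted answer matches the tags as well, whereas this captures just the content between the tags in the first group. Using [^>]* rather than .* in the opening tag allows attributes without running past the end of that tag.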
Solution 4:
This can capture the outermost pair of tags, even with attributes inside or without end tags (it uses recursion via (?R), so it needs an engine that supports it, such as PCRE):
(<!--((?!-->).)*-->|<\w*((?!\/<).)*\/>|<(?<tag>\w+)[^>]*>(?>[^<]|(?R))*<\/\k<tag>\s*>)
Edit: as mentioned in the comment above, regex is never enough to parse XML; trying to modify the regex to fit more situations only makes it longer but still useless.
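Since several answers advise against regex for XML, here is a rough sketch of the parser-based alternative, assuming Python's standard `xml.etree.ElementTree` and a well-formed document. The `<customer>` wrapper, the `<name>` element, and the helper name `remove_elements` are hypothetical and only there to make the example self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical document wrapping a shortened version of the question's sample.
xml = """<customer>
<name>Example</name>
<primaryAddress>
<state>QLD</state>
</primaryAddress>
</customer>"""

def remove_elements(root: ET.Element, tag: str) -> None:
    """Remove every descendant of root whose tag equals `tag`, with its sub-elements."""
    # Work on snapshots (list(...)) so removing children is safe during iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)

root = ET.fromstring(xml)
remove_elements(root, "primaryAddress")
result = ET.tostring(root, encoding="unicode")
print(result)
```

Unlike the regex versions, this does not care about attributes on the tag, whitespace inside it, or nesting, but it does require the input to parse as XML in the first place.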