Sed replace between 2 strings with special character

I have an XML file, and to use it with xmllink I need to remove a link from it.

XML file containing:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PackingList xmlns="Link to somewhere#">
<morecode></morecode>

Using sed 'sed s/PackingList.*\>/PackingList/g' xmlfile gives me the following result (on the 2nd line):

<PackingList#">

while it should be

<PackingList>

What am I doing wrong?


Solution 1:

Three things are wrong:

  • The first quote in the sed command should come before the s/ command, not before sed itself - I presume this is a typing error.
  • The > character has no special meaning in regular expressions and must not be escaped. The sequence \> does have a special meaning (in GNU sed, at least): it matches the end of a word. Because .* is "greedy", \> ends up matching at the end of the last word on the line (somewhere), which is why the #" is retained.
  • If you match the source >, this will be included in the string to be replaced, so it must also appear in the replacement string.

So your edit command should be:

sed 's/PackingList.*>/PackingList>/g' xmlfile
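
A quick way to see the difference is to run both forms on the line from the question. This is only a sketch: it assumes GNU sed (where \> is the end-of-word anchor), and the variable names line, broken and fixed are made up for illustration.

```shell
#!/bin/sh
# The second line of the XML file from the question (variable name is illustrative).
line='<PackingList xmlns="Link to somewhere#">'

# Original attempt: in GNU sed, \> anchors at the end of a word, so the greedy
# .* stops after "somewhere" and the trailing #"> is left in place.
broken=$(printf '%s\n' "$line" | sed 's/PackingList.*\>/PackingList/g')

# Corrected form: an unescaped > is a literal character; since it is part of
# the match, it is written back in the replacement.
fixed=$(printf '%s\n' "$line" | sed 's/PackingList.*>/PackingList>/g')

printf '%s\n' "$broken"
printf '%s\n' "$fixed"
```

With GNU sed the first line printed should be the <PackingList#"> seen in the question, and the second should be <PackingList>.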

This is similar to jherran's solution, but takes account of your original attempt at matching. It might be neater to match up to the trailing double-quote:

sed 's/PackingList.*"/PackingList/g' xmlfile

If you don't want to rely on greediness, and want to make it more readable, match both quotes explicitly:

sed 's/PackingList.*".*"/PackingList/g' xmlfile

Note that any of the above may also delete subsequent XML tags on the same line. To avoid this, use:

sed 's/PackingList[^>]*"[^>]*"/PackingList/g' xmlfile
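
To illustrate the point about subsequent tags, here is a small sketch using a hypothetical input in which <morecode> sits on the same line as <PackingList ...>. That is not the layout of the file in the question, and the variable names are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical one-line input: a second tag follows on the same line.
line='<PackingList xmlns="Link to somewhere#"><morecode></morecode>'

# Greedy .* runs to the last > on the line, so the following tags are
# swallowed by the match and lost.
greedy=$(printf '%s\n' "$line" | sed 's/PackingList.*>/PackingList>/g')

# [^>]* cannot cross a >, so the match stays inside the opening tag and
# only the attribute (up to its closing quote) is removed.
bounded=$(printf '%s\n' "$line" | sed 's/PackingList[^>]*"[^>]*"/PackingList/g')

printf '%s\n' "$greedy"    # <PackingList>
printf '%s\n' "$bounded"   # <PackingList><morecode></morecode>
```

The bracket expression [^>] is plain POSIX syntax, so this form should not depend on GNU extensions.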

Solution 2:

Try it this way:

sed 's/PackingList.*/PackingList>/g' xmlfile