Why is sed not working?
I have some HTML that I am trying to extract links from. Right now the file looks like this.
website.com/path/to/file/234432517.gif" width="620">
website.com/path/to/file/143743e53.gif" width="620">
website.com/path/to/file/123473232.gif" width="620">
website.com/path/to/file/634132317.gif" width="620">
website.com/path/to/file/432432173.gif" width="620">
I am trying to use sed to remove the " width="620"> part from all the lines. Here is my sed code:
sudo sed -i "s/\"\swidth\=\"\d+\"\>//g" output
Why is this not working? Everything I find on Google leads to code that looks like this, but it does not work for some reason.
Because you are using PCRE (Perl Compatible Regular Expressions) syntax, which sed doesn't understand: it uses Basic Regular Expressions (BRE) by default. It doesn't know \d at all, and \s is only a GNU extension. You are also escaping things that don't need to be escaped (the \= is unnecessary, and in GNU sed \> is an end-of-word anchor rather than a literal >) while not escaping things that do need to be escaped (+ just means the symbol + in BRE; you need \+, itself a GNU extension, for "one or more").
This should do what you need:
sed 's/" width="[0-9]\+">//g' file
Or, using Extended Regular Expressions:
sed -E 's/"\s*width="[0-9]+">//g' file
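To see the difference between the three spellings of "one or more", here is a minimal sketch that runs each pattern against one sample line from the question. It assumes GNU sed; the variable names are only illustrative.

```shell
# Sample line copied from the question.
line='website.com/path/to/file/234432517.gif" width="620">'

# BRE with a bare +: the + is a literal plus sign, so nothing matches
# and the line comes back unchanged.
bre_literal=$(printf '%s\n' "$line" | sed 's/" width="[0-9]+">//')

# BRE with \+ (GNU extension) and ERE with + both strip the attribute.
bre_fixed=$(printf '%s\n' "$line" | sed 's/" width="[0-9]\+">//')
ere_fixed=$(printf '%s\n' "$line" | sed -E 's/" width="[0-9]+">//')

printf '%s\n' "$bre_literal" "$bre_fixed" "$ere_fixed"
```

The first line printed still carries the width attribute; the other two end at .gif.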
Finally, as a general rule, never use sed -i without first running the command without -i to check that it works. If you do, at least use -i.bak (any suffix given to -i has this effect) to create a backup.
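As a rough sketch of that workflow, the following previews the substitution first and only then edits in place with a backup. It works in a scratch directory created with mktemp, so the file name output here is a throwaway copy, not the real file from the question.

```shell
# Scratch directory so nothing real is touched.
tmpdir=$(mktemp -d)
printf '%s\n' 'website.com/path/to/file/234432517.gif" width="620">' > "$tmpdir/output"

# 1. Dry run: without -i the result goes to stdout and the file is untouched.
sed -E 's/" width="[0-9]+">//' "$tmpdir/output"

# 2. Edit in place, keeping the original as output.bak.
sed -E -i.bak 's/" width="[0-9]+">//' "$tmpdir/output"

# Read both files back to compare the edited copy with the backup.
edited=$(cat "$tmpdir/output")
backup=$(cat "$tmpdir/output.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

rm -rf "$tmpdir"
```

If the dry run prints something unexpected, step 2 is simply never run and the file stays as it was.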
Here is my sed solution:
sed -E 's/(.*)" width="[0-9]+">/\1/' filename
As an alternative to sed, I suggest using grep to extract the data from the file. This would work for you:
grep -o "website.*\.gif" filename
And, as terdon suggested, here is a lookahead solution using grep:
grep -Po '.*(?="\swidth="\d*">)' filename
cut is also a good option in your situation:
cut -f1 -d'"' filename
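For a quick comparison, this sketch feeds one sample line from the question through the sed, grep -o and cut commands above and stores each result. The variable names are illustrative only. The grep -P variant is left out because it needs a grep built with PCRE support.

```shell
# Sample line copied from the question.
line='website.com/path/to/file/234432517.gif" width="620">'

# sed: keep everything captured before the attribute.
via_sed=$(printf '%s\n' "$line" | sed -E 's/(.*)" width="[0-9]+">/\1/')

# grep -o: print only the part of the line that matches.
via_grep=$(printf '%s\n' "$line" | grep -o 'website.*\.gif')

# cut: take the first field, using the double quote as the delimiter.
via_cut=$(printf '%s\n' "$line" | cut -f1 -d'"')

printf '%s\n' "$via_sed" "$via_grep" "$via_cut"
```

On this input all three print the same path ending in .gif. They differ in what they assume: cut relies on the first double quote marking the end of the link, while grep -o relies on the link starting with website and ending in .gif.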