How can I easily convert HTML special entities from a standard input stream in Linux?

CentOS

Is there an easy way to convert HTML special entities from a data stream? I'm passing data to a bash script and sometimes that data includes special entities. For example:

&quot;test&quot; &amp; test $test ! test @ # $ % ^ &amp; *

I'm not sure why some characters show up fine and others don't, but unfortunately I don't have control over the data coming in.

I'm thinking I might be able to use sed here, but that seems like it would be cumbersome and possibly prone to false positives. Is there a Linux command I could pipe to that specializes in decoding this type of data?
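To make the false-positive worry concrete, here is a minimal Python sketch. The helper `naive_decode` is hypothetical and only mimics what a chain of sed substitutions would do; it is compared against the standard library's `html.unescape`, which decodes each entity exactly once.

```python
import html

def naive_decode(text):
    # Chained replacements, applied one after another like a list of sed expressions.
    for entity, char in (("&amp;", "&"), ("&lt;", "<"), ("&gt;", ">"), ("&quot;", '"')):
        text = text.replace(entity, char)
    return text

# "&amp;lt;" is an escaped "&lt;", so it should decode to the text "&lt;", not "<".
sample = "literal &amp;lt; stays escaped"
print(naive_decode(sample))   # literal < stays escaped    (decoded twice)
print(html.unescape(sample))  # literal &lt; stays escaped (decoded once)
```

Reordering the substitutions so that `&amp;` is handled last avoids this particular case, but a hand-written list still misses numeric and less common named entities.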


PHP is well suited to this. This example requires PHP 5:

php -R 'echo html_entity_decode($argn, ENT_QUOTES, "UTF-8"), "\n";' < file.html

Perl is (as always) your friend. I think this will do it:

perl -n -mHTML::Entities -e 'print HTML::Entities::decode_entities($_);'

E.g.:

echo '&quot;test&quot; &amp; test $test ! test @ # $ % ^ &amp; *' | perl -n -mHTML::Entities -e 'print HTML::Entities::decode_entities($_);'

With output:

[someguy@somehost ~]$ echo '&quot;test&quot; &amp; test $test ! test @ # $ % ^ &amp; *' | perl -n -mHTML::Entities -e 'print HTML::Entities::decode_entities($_);'
"test" & test $test ! test @ # $ % ^ & *

recode seems to be available in the default package repositories of the main GNU/Linux distributions. For example, to decode HTML entities into UTF-8:

…|recode html..utf8

With Python 3:

python3 -c 'import html,sys; print(html.unescape(sys.stdin.read()), end="")' < file.html
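The one-liner above reads all of standard input before printing anything. For a long-running stream, a line-by-line filter may be a better fit. The following is only a sketch: the function name `decode_lines` and the file name `decode_entities.py` are made up for illustration, and it assumes no entity is split across a line break.

```python
#!/usr/bin/env python3
import html
import sys

def decode_lines(lines):
    """Yield each input line with its HTML entities decoded."""
    for line in lines:
        # html.unescape handles named (&amp;) and numeric (&#64;, &#x23;) entities.
        yield html.unescape(line)

if __name__ == "__main__":
    # Process stdin lazily, one line at a time, and write the result to stdout.
    sys.stdout.writelines(decode_lines(sys.stdin))
```

Saved as decode_entities.py, it could be used in a pipeline as `some_command | python3 decode_entities.py`.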