Bash script to convert from HTML entities to characters

Solution 1:

Try recode:

$ echo '&lt;' | recode html..ascii
<

Install on Debian, Ubuntu, and other apt-based systems:

$ sudo apt-get install recode

Install on macOS with Homebrew:

$ brew install recode

Solution 2:

With perl:

cat foo.html | perl -MHTML::Entities -pe 'decode_entities($_);'
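
HTML::Entities is not part of core Perl (it ships with the HTML-Parser distribution on CPAN), so the one-liner fails if the module is missing. A minimal sketch that checks for the module first; the function name perl_decode is hypothetical and chosen only for illustration:

```shell
# perl_decode: hypothetical wrapper name, not part of Perl or HTML::Entities.
# Decodes HTML entities from the named files, or from stdin if none are given.
perl_decode() {
    # Loading the module with an empty program is a cheap availability check.
    if ! perl -MHTML::Entities -e 1 2>/dev/null; then
        echo "perl_decode: Perl module HTML::Entities is not available" >&2
        return 1
    fi
    perl -MHTML::Entities -pe 'decode_entities($_);' "$@"
}
```

Usage would be either perl_decode foo.html or some_command | perl_decode.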

With php from the command line:

cat foo.html | php -r 'while(($line=fgets(STDIN)) !== FALSE) echo html_entity_decode($line, ENT_QUOTES|ENT_HTML401);'

Solution 3:

An alternative is to pipe the text through a text-mode web browser such as w3m:

echo '&#33;' | w3m -dump -T text/html

This worked great for me in Cygwin, where downloading and installing distributions is difficult.


Solution 4:

Using xmlstarlet:

echo 'hello &lt; world' | xmlstarlet unesc

Solution 5:

A Python 3.4+ version (html.unescape was added in 3.4):

cat foo.html | python3 -c 'import html, sys; [print(html.unescape(l), end="") for l in sys.stdin]'
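
Since the question asks for a Bash script, the Python one-liner can be wrapped in a shell function. This is only a sketch: the name html_decode is hypothetical, and it assumes python3 (3.4 or later) is on the PATH. It uses the standard fileinput module so the function reads the files given as arguments, or stdin when there are none:

```shell
#!/usr/bin/env bash
# html_decode: hypothetical helper name, chosen for illustration.
# Decodes HTML entities from the named files, or from stdin if none are given.
html_decode() {
    # With "python3 -c", the extra arguments land in sys.argv[1:], which is
    # where fileinput.input() looks for file names before falling back to stdin.
    python3 -c '
import fileinput, html, sys
for line in fileinput.input():
    sys.stdout.write(html.unescape(line))
' "$@"
}
```

Usage would be either html_decode foo.html or echo 'hello &lt; world' | html_decode.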