How can I do a batch conversion of HTML entities to Hanzi?
I have a huge .txt file which contains lots of HTML entities representing Unicode characters, like this:
&#21696;&#29282;&#23665;
In Pinyin, this would read "Ai Lao Shan" or "Ai1 Lao2 Shan1", to be more precise.
I need a tool, command-line utility, Pages/Numbers macro, or anything else that converts every string of the form &#....;
in said file into proper Hanzi, which in this case would be:
哀牢山
Any suggestions for a tool or script or program that runs on macOS?
Solution 1:
You can install recode via the Terminal with Homebrew:
brew install recode
and then use it to convert HTML entities to UTF-8, like this:
echo '&#21696;&#29282;&#23665;' | recode html..utf8
This produces
哀牢山
(inspired by @creving's answer on Stack Overflow)
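If installing recode is not an option, the same conversion can be sketched with Python's standard library. This is only a minimal sketch, assuming Python 3 is available and the file is UTF-8 encoded; the function names and the file names in the usage line are placeholders, not anything from the question. Note that html.unescape also decodes named entities such as &amp;, not just numeric ones.

```python
import html
import sys


def decode_entities(text: str) -> str:
    """Replace HTML character references such as &#21696; with the characters they encode."""
    return html.unescape(text)


def convert_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Decode the entities in src and write the result to dst (both paths are placeholders)."""
    # Process line by line so a huge file never has to fit in memory.
    with open(src, encoding=encoding) as fin, open(dst, "w", encoding=encoding) as fout:
        for line in fin:
            fout.write(decode_entities(line))


if __name__ == "__main__":
    # Hypothetical usage: python3 decode_entities.py input.txt output.txt
    if len(sys.argv) == 3:
        convert_file(sys.argv[1], sys.argv[2])
```

Saved under a name such as decode_entities.py (hypothetical), it would be run as: python3 decode_entities.py input.txt output.txt. The result goes to a separate file, so the original stays untouched.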