Tool to convert accented characters to HTML entities?
Is there a tool (command-line is fine) that can convert accented characters to HTML entities in Ubuntu? Preferably recursively and without also converting html/php tags.
e.g.
from: é
to: &eacute;
or: &#233;
Solution 1:
Recode can convert to HTML entities:
$ echo "é" | recode ..html
&eacute;
There are a few slightly different HTML transformations available in recode; see info recode HTML.
If you want to recode one or more files, give as many file names as you like. Note that recode converts each file in place, overwriting the original:
$ recode ..html one_file another_file
For recursive action, use the find command, e.g.
$ find your_directory -type f -name "*.html"
The above find command only lists the files. Please make sure that it finds only the right files: no binaries and nothing in unwanted directories. It is also a good idea to make a backup first, or to work on a copy of your files rather than the real ones. Once the find command is right, append -exec your_command {} +, where your_command is the recode ..html from above and {} stands for the file(s) that find hands to recode:
$ find your_directory -type f -name "*.html" -exec recode ..html {} +
But there is one big caveat: recode ..html assumes that your input files are in the same character set (encoding) that you are using on the command line. If all of your files use UTF-8, it will work fine, because Ubuntu uses UTF-8 by default. But if some of your files use the older ISO-8859-1 or another charset, you have to find those files and convert them separately, naming their source charset explicitly, e.g. recode ISO-8859-1..html old_file.
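To see which files this caveat applies to before converting anything, you can check each file for valid UTF-8. A minimal sketch using iconv (shipped with glibc on Ubuntu), which exits with an error on byte sequences that are not valid UTF-8; list_non_utf8 is a hypothetical helper name:

```shell
# list_non_utf8 (hypothetical helper name): print every *.html file under the
# directory given as $1 that is NOT valid UTF-8 (iconv fails on such bytes).
list_non_utf8() {
  find "$1" -type f -name '*.html' -exec sh -c '
    for f in "$@"; do
      iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
    done' sh {} +
}

# Example: list_non_utf8 your_directory
```

Pure ASCII files pass this check, which is harmless since they contain nothing to convert. The check only tells you a file is not UTF-8; which charset it actually uses still has to be determined by you.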