Create Files with Different Character Encodings from Z-Shell

I'm trying to understand character encoding better. To experiment, I want to take a string of characters and encode it in different ways. Does the zsh prompt have a way to create files that use specific character encodings? For example, I'm trying to create files that use the following character encodings:

  • ASCII
  • Unicode
  • UTF-32
  • UTF-8

I'd like to see the same string of characters encoded in each of these encodings to compare and contrast them. Thank you.


As a shell, zsh mostly doesn't deal with such things directly – but you can do it by running other programs from zsh. (zsh is richer in built-in functions than most other shells, but character encoding conversion doesn't seem to be one of them.)

To convert character encodings within the shell, the iconv tool is usually used – the -f option specifies the encoding to convert from, -t specifies the target encoding, and the data is read from stdin. For example:

echo "Here are some arrows 🠈, 🠊, 🠉, 🠋" > text_in_utf8.txt

iconv -f utf-8 -t utf-16 < text_in_utf8.txt > text_in_utf16.txt

(Usually the system locale is set to UTF-8, so anything you type directly into the shell – e.g. the echo in the above example – will also end up being UTF-8. But make sure to check it using locale charmap.)
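For comparison, the same kind of conversion can be sketched in Python, in the spirit of iconv -f/-t. The helper name convert_bytes is made up for this illustration; only bytes.decode() and str.encode() are actual Python API. Note that Python's "utf-16" codec writes a byte-order mark first, which is why the last line prints the first two bytes.

```python
def convert_bytes(data: bytes, from_enc: str, to_enc: str) -> bytes:
    """Decode `data` from one encoding and re-encode it in another.

    Hypothetical helper: roughly what `iconv -f from_enc -t to_enc` does.
    """
    return data.decode(from_enc).encode(to_enc)


# Same text as in the echo example, as UTF-8 bytes (echo appends a newline).
utf8_data = "Here are some arrows 🠈, 🠊, 🠉, 🠋\n".encode("utf-8")
utf16_data = convert_bytes(utf8_data, "utf-8", "utf-16")

print(len(utf8_data), "bytes as UTF-8")
print(len(utf16_data), "bytes as UTF-16")
print(utf16_data[:2].hex())  # the byte-order mark added by the "utf-16" codec
```

Using "utf-16-le" or "utf-16-be" as the target instead fixes the byte order and omits the byte-order mark.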

Other character encodings such as iso8859-1 or ibm437 are also available; see iconv -l for a list. (Note that "Unicode" is the abstract character set, not an encoding in itself. When you see "Unicode" offered as an encoding in Windows, that actually means UTF-16, or UCS-2 in very old versions.)
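To get one file per encoding from the question's list, a minimal Python sketch could look like the following. The function name write_encoded_files and the file naming scheme are assumptions made for this example, and a temporary directory is used so nothing is left behind. Since "Unicode" is not an encoding, UTF-16 stands in for it here. ASCII cannot represent the arrows, so that case is reported instead of written.

```python
import tempfile
from pathlib import Path

text = "Here are some arrows 🠈, 🠊, 🠉, 🠋"


def write_encoded_files(text, directory,
                        encodings=("ascii", "utf-8", "utf-16", "utf-32")):
    """Write `text` once per encoding; return {encoding: size in bytes or None}."""
    sizes = {}
    for enc in encodings:
        try:
            data = text.encode(enc)
        except UnicodeEncodeError:
            # The text has characters this encoding cannot represent.
            sizes[enc] = None
            continue
        # Hypothetical naming scheme, e.g. text_in_utf16.txt
        path = Path(directory) / f"text_in_{enc.replace('-', '')}.txt"
        path.write_bytes(data)
        sizes[enc] = len(data)
    return sizes


with tempfile.TemporaryDirectory() as tmp:
    sizes = write_encoded_files(text, tmp)

for enc, size in sizes.items():
    print(enc, "not representable" if size is None else f"{size} bytes")
```

Comparing the sizes shows the trade-off: UTF-8 spends one byte on each ASCII character and four on each arrow, while UTF-32 spends four bytes on every character.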

There are also other ways to experiment with character encodings. For example, in Python you can .encode() a Unicode string into bytes using a specific encoding, or .decode() bytes back to a Unicode string:

$ python3
>>> text = "🠈 🠊 🠉 🠋"
>>> utf8bytes = text.encode("utf-8")
>>> utf16bytes = text.encode("utf-16")
>>> ["%02x" % b for b in utf8bytes]
['f0', '9f', 'a0', '88', '20', 'f0', '9f', 'a0', '8a', '20',
 'f0', '9f', 'a0', '89', '20', 'f0', '9f', 'a0', '8b']

(In this case, "Unicode string" really does mean a sequence of abstract Unicode characters, unlike the Windows usage noted earlier, where "Unicode" means UTF-16.)
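The .decode() direction can be explored the same way. The sketch below assumes Python 3 and shows three cases: decoding with the matching encoding, decoding with a wrong but permissive one (Latin-1, which accepts any byte), and decoding with a strict one that rejects the bytes.

```python
text = "🠈 🠊 🠉 🠋"
utf8bytes = text.encode("utf-8")

# Decoding with the encoding that produced the bytes restores the string.
roundtrip = utf8bytes.decode("utf-8")
print(roundtrip == text)

# Latin-1 maps every byte to exactly one character, so decoding never fails,
# but each 4-byte arrow turns into four unrelated characters (mojibake).
wrong = utf8bytes.decode("latin-1")
print(len(text), "characters,", len(utf8bytes), "bytes,", len(wrong), "characters after wrong decode")

# ASCII only covers byte values 0-127, so it rejects these bytes outright.
try:
    utf8bytes.decode("ascii")
except UnicodeDecodeError as err:
    print("ascii cannot decode these bytes:", err.reason)
```

The bytes never change between these cases; only the interpretation applied to them does.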