iconv generating UTF-16 with BOM

Solution 1:

No, if you specify the byte ordering, iconv does not insert a BOM.

This is from The Unicode Consortium:

Q: How I should deal with BOMs?

A: Here are some guidelines to follow:

  1. A particular protocol (e.g. Microsoft conventions for .txt files) may require use of the BOM on certain Unicode data streams, such as files. When you need to conform to such a protocol, use a BOM.
  2. Some protocols allow optional BOMs in the case of untagged text. In those cases,
    • Where a text data stream is known to be plain text, but of unknown encoding, BOM can be used as a signature. If there is no BOM, the encoding could be anything.
    • Where a text data stream is known to be plain Unicode text (but not which endian), then BOM can be used as a signature. If there is no BOM, the text should be interpreted as big-endian.
  3. Some byte oriented protocols expect ASCII characters at the beginning of a file. If UTF-8 is used with these protocols, use of the BOM as encoding form signature should be avoided.
  4. Where the precise type of the data stream is known (e.g. Unicode big-endian or Unicode little-endian), the BOM should not be used. In particular, whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used.

(my emphasis, on the final sentence of guideline 4)

I expect iconv is attempting to be faithful to the last of these guidelines.
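One way to check this behaviour yourself is to dump the bytes iconv produces for a single character. The following is a minimal sketch, assuming a POSIX shell with `od` and `tr` available; `hex_of` is a hypothetical helper name, not part of iconv. What plain `UTF-16` produces depends on the iconv implementation, so no output is claimed for that case.

```shell
# hex_of ENCODING (hypothetical helper): print the bytes of "A" in ENCODING as hex.
hex_of() {
    printf 'A' | iconv -f ASCII -t "$1" | od -An -tx1 | tr -d ' \n'
}

hex_of UTF-16LE; echo   # prints 4100 (no BOM)
hex_of UTF-16BE; echo   # prints 0041 (no BOM)
hex_of UTF-16;   echo   # implementation-dependent: may start with a BOM (fffe or feff)
```

If the first two lines show only two bytes, iconv has not inserted a BOM for the explicitly ordered encodings.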


Update.

A digression

In my opinion:

  1. An option to specify a BOM would certainly be a useful additional feature for iconv.

  2. A UTF-16LE file without a BOM is usable in Windows, although it sometimes takes extra effort. For example, Notepad's File Open dialogue lets you select "Unicode", which is Microsoft's name for UTF-16LE, and (unsurprisingly) this seems to work on files without a BOM.

  3. I can open a UTF-16LE test file (without BOM) or a UTF-8 test file (without BOM) in Windows Notepad (XP) in the usual way, e.g. by double-clicking the file's name in Explorer. That seems usable to me. I am aware that Windows will sometimes guess the encoding incorrectly, in which case you have to tell Notepad the encoding when opening the file. This inconvenience means that including a BOM is preferable for text files intended for use on Windows.

  4. If a specific application will not work with anything other than a UTF-16LE file with BOM, then I would agree that a UTF-16LE file without BOM is not usable for that specific application.

  5. I suspect that if you can make everything work with UTF-8 (without BOM), that is the best solution in the long term.

However, the answer to the question "can I use the iconv command to generate UTF-16 output with a BOM and with specified endianness" is currently "No".

Solution 2:

If you want a BOM in the output, you can write it to the file yourself and then append the converted text:

For UTF-8, BOM (EF BB BF):

file='main.cpp'
# Write the BOM first, then append the converted text.
printf '\xEF\xBB\xBF' > "$file.utf8"
iconv -f ASCII -t UTF-8 "$file" >> "$file.utf8"
mv -v "$file.utf8" "converted-$file"

For UTF-16BE, BOM (FE FF):

file='main.cpp'
# Write the BOM first, then append the converted text.
printf '\xFE\xFF' > "$file.utf16be"
iconv -f ASCII -t UTF-16BE "$file" >> "$file.utf16be"
mv -v "$file.utf16be" "converted-$file"

For UTF-16LE, BOM (FF FE):

file='main.cpp'
# Write the BOM first, then append the converted text.
printf '\xFF\xFE' > "$file.utf16le"
iconv -f ASCII -t UTF-16LE "$file" >> "$file.utf16le"
mv -v "$file.utf16le" "converted-$file"
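The three snippets differ only in the BOM bytes and the target encoding, so they can be folded into one function. This is a rough sketch rather than a definitive implementation: `add_bom_convert` and its argument order are made up for illustration, and the input is assumed to be ASCII, as in the examples above. It uses octal escapes because `\x` escapes in `printf` are not guaranteed by POSIX and some shells (dash, for instance) print them literally.

```shell
# add_bom_convert SRC DST ENCODING (hypothetical helper):
# write the BOM for ENCODING to DST, followed by SRC converted from ASCII.
add_bom_convert() {
    src=$1
    dst=$2
    enc=$3
    case $enc in
        UTF-8)    bom='\357\273\277' ;;  # EF BB BF
        UTF-16BE) bom='\376\377' ;;      # FE FF
        UTF-16LE) bom='\377\376' ;;      # FF FE
        *) echo "unsupported encoding: $enc" >&2; return 1 ;;
    esac
    # The BOM strings above are fixed, so using one as a printf format is safe.
    { printf "$bom" && iconv -f ASCII -t "$enc" "$src"; } > "$dst"
}

# Example call (assumes main.cpp exists):
#   add_bom_convert main.cpp converted-main.cpp UTF-16LE
#   od -An -tx1 converted-main.cpp | head -n 1   # should begin with ff fe
```

Writing straight to the destination avoids the intermediate file and the `mv` step; the trade-off is that a failed conversion leaves a partial output file behind.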

Note:

As you have probably noticed, the BOM is different in each case. You can find more information here: