iconv generating UTF-16 with BOM
Solution 1:
No. If you specify the byte order explicitly (for example UTF-16LE or UTF-16BE), iconv does not insert a BOM.
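One way to check this is to dump the bytes iconv writes. This is a small sketch that assumes `od` and `tr` are available (both are standard on Unix-like systems); `show_bytes` is just an illustrative name. With an explicit byte order, the output starts directly with the encoded character, with no FF FE or FE FF in front.

```shell
# Illustrative helper: encode the ASCII letter "A" into the given encoding
# and print the resulting bytes as one hex string (od -An drops the offsets).
show_bytes() {
    printf 'A' | iconv -f ASCII -t "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

show_bytes UTF-16LE   # prints 4100: just "A", no BOM
show_bytes UTF-16BE   # prints 0041: just "A", no BOM
```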
This is from the Unicode Consortium:
Q: How I should deal with BOMs?
A: Here are some guidelines to follow:
- A particular protocol (e.g. Microsoft conventions for .txt files) may require use of the BOM on certain Unicode data streams, such as files. When you need to conform to such a protocol, use a BOM.
- Some protocols allow optional BOMs in the case of untagged text. In those cases,
  - Where a text data stream is known to be plain text, but of unknown encoding, BOM can be used as a signature. If there is no BOM, the encoding could be anything.
  - Where a text data stream is known to be plain Unicode text (but not which endian), then BOM can be used as a signature. If there is no BOM, the text should be interpreted as big-endian.
- Some byte oriented protocols expect ASCII characters at the beginning of a file. If UTF-8 is used with these protocols, use of the BOM as encoding form signature should be avoided.
- Where the precise type of the data stream is known (e.g. Unicode big-endian or Unicode little-endian), the BOM should not be used. In particular, whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used.
(my emphasis)
I expect iconv is attempting to be faithful to the last of these guidelines: UTF-16LE and UTF-16BE are declared byte orders, so it writes no BOM for them.
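The "BOM as a signature" guidelines quoted above amount to looking at a file's first few bytes. A minimal sketch, assuming a POSIX shell with `dd` and `od`; the function name `detect_bom` is hypothetical, and it only checks the UTF-8 and UTF-16 signatures (a UTF-32LE BOM, FF FE 00 00, would be reported as UTF-16LE).

```shell
# Hypothetical helper: print the encoding implied by a file's BOM, or "none".
# Only UTF-8 and UTF-16 signatures are recognised; UTF-32 is not checked.
detect_bom() {
    # First three bytes of the file as a lowercase hex string, e.g. "efbbbf"
    head_hex=$(dd if="$1" bs=1 count=3 2>/dev/null | od -An -tx1 | tr -d ' \n')
    case $head_hex in
        efbbbf) echo "UTF-8" ;;
        feff*)  echo "UTF-16BE" ;;
        fffe*)  echo "UTF-16LE" ;;
        *)      echo "none" ;;
    esac
}
```

Per the FAQ, a result of "none" on text already known to be UTF-16 would then mean big-endian.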
Update.
A digression
In my opinion:
An option to specify a BOM would certainly be a useful additional feature for iconv.
A UTF-16LE file without a BOM is usable in Windows, albeit sometimes with extra effort. For example, Notepad's File Open dialogue lets you select "Unicode", which is Microsoft's name for UTF-16LE, and (unsurprisingly) this seems to work on files without a BOM.
I can open a UTF-16LE test file (without BOM) or a UTF-8 test file (without BOM) in Windows Notepad (XP) in the usual way, e.g. by double-clicking the file's name in Explorer. That seems usable to me. I am aware that Windows sometimes guesses the encoding incorrectly, in which case you have to tell Notepad the encoding when opening the file. This inconvenience makes including a BOM preferable for text files intended for use on Windows.
If a specific application will not work with anything other than a UTF-16LE file with BOM, then I would agree that a UTF-16LE file without BOM is not usable for that specific application.
I suspect that if you can make everything work with UTF-8 (without BOM), that is the best solution in the long term.
However, the answer to the question "can I use the iconv command to generate UTF-16 output with a BOM and with specified endianness" is currently "No".
Solution 2:
If you want a BOM in the output, you can write it manually and then append the iconv output:
For UTF-8, BOM (EF BB BF):
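file='main.cpp'
# BOM EF BB BF as octal escapes (POSIX printf need not support \x)
printf '\357\273\277' > "$file.utf8"
# Source assumed to be ASCII; change -f to match your file
iconv -f ASCII -t UTF-8 "$file" >> "$file.utf8"
mv -v "$file.utf8" "converted-$file"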
For UTF-16BE, BOM (FE FF):
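file='main.cpp'
# BOM FE FF as octal escapes (POSIX printf need not support \x)
printf '\376\377' > "$file.utf16be"
# Source assumed to be ASCII; change -f to match your file
iconv -f ASCII -t UTF-16BE "$file" >> "$file.utf16be"
mv -v "$file.utf16be" "converted-$file"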
For UTF-16LE, BOM (FF FE):
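file='main.cpp'
# BOM FF FE as octal escapes (POSIX printf need not support \x)
printf '\377\376' > "$file.utf16le"
# Source assumed to be ASCII; change -f to match your file
iconv -f ASCII -t UTF-16LE "$file" >> "$file.utf16le"
mv -v "$file.utf16le" "converted-$file"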
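The three recipes above differ only in the BOM bytes and the target encoding, so they can be folded into a single function. This is a sketch rather than a drop-in tool: `add_bom` is a hypothetical name, the source encoding is passed in explicitly (the recipes above assume ASCII), and the BOM is written with octal escapes because POSIX `printf` is not required to understand `\x`.

```shell
# Hypothetical helper: convert SRC to UTF-8, UTF-16BE or UTF-16LE and
# prepend the matching BOM.
# Usage: add_bom FROM_ENCODING TO_ENCODING SRC DST
add_bom() {
    from=$1 to=$2 src=$3 dst=$4
    case $to in
        UTF-8)    bom='\357\273\277' ;;  # EF BB BF
        UTF-16BE) bom='\376\377' ;;      # FE FF
        UTF-16LE) bom='\377\376' ;;      # FF FE
        *) echo "add_bom: unsupported target $to" >&2; return 1 ;;
    esac
    # Write the BOM first, then append the converted text; the function's
    # exit status is iconv's, so a failed conversion is reported.
    printf "$bom" > "$dst" &&
        iconv -f "$from" -t "$to" "$src" >> "$dst"
}
```

For example, after `add_bom ASCII UTF-16LE main.cpp converted-main.cpp`, running `od -An -tx1 converted-main.cpp | head -n 1` should show the file starting with `ff fe`.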
Note:
You will probably notice that the BOM is different in each case. You can find more information here: