How to write a std::string to a UTF-8 text file

I just want to write some few simple lines to a text file in C++, but I want them to be encoded in UTF-8. What is the easiest and simple way to do so?


Solution 1:

The only way UTF-8 affects std::string is that size(), length(), and all the indices are measured in bytes, not characters.

And, as sbi points out, incrementing the iterator provided by std::string will step forward by byte, not by character, so it can actually point into the middle of a multibyte UTF-8 codepoint. There's no UTF-8-aware iterator provided in the standard library, but there are a few available on the 'Net.

If you remember that, you can put UTF-8 into std::string, write it to a file, etc. all in the usual way (by which I mean the way you'd use a std::string without UTF-8 inside).

You may want to start your file with a byte order mark so that other programs will know it is UTF-8.

Solution 2:

There is nice tiny library to work with utf8 from c++: utfcpp

Solution 3:

libiconv is a great library for all our encoding and decoding needs.

If you are using Windows you can use WideCharToMultiByte and specify that you want UTF8.

Solution 4:

What is the easiest and simple way to do so?

The most intuitive and thus easiest handling of utf8 in C++ is for sure using a drop-in replacement for std::string. As the internet still lacks of one, I went to implement the functionality on my own:

tinyutf8 (EDIT: now Github).

This library provides a very lightweight drop-in preplacement for std::string (or std::u32string if you will, because you iterate over codepoints rather that chars). Ity is implemented succesfully in the middle between fast access and small memory consumption, while being very robust. This robustness to 'invalid' UTF8-sequences makes it (nearly completely) compatible with ANSI (0-255).

Hope this helps!