C++ streams confusion: istreambuf_iterator vs istream_iterator?
What is the difference between istreambuf_iterator
and istream_iterator
?
And in general what is the difference between streams and streambufs?
I really can't find any clear explanation for this so decided to ask here.
IOstreams use streambufs to as their source / target of input / output. Effectively, the streambuf-family does all the work regarding IO and the IOstream-family is only used for formatting and to-string / from-string transformation.
Now, istream_iterator
takes a template argument that says what the unformatted string-sequence from the streambuf should be formatted as, like istream_iterator<int>
will interpret (whitespace-delimited) all incoming text as int
s.
On the other hand, istreambuf_iterator
only cares about the raw characters and iterates directly over the associated streambuf of the istream
that it gets passed.
Generally, if you're only interested in the raw characters, use an istreambuf_iterator
. If you're interested in the formatted input, use an istream_iterator
.
All of what I said also applies to ostream_iterator
and ostreambuf_iterator
.
Here's a really badly kept secret: an iostream per se, has almost nothing to do with actually reading or writing from/to a file on your computer.
An iostream basically acts as a "matchmaker" between a streambuf and a locale:
The iostream stores some state about how conversions should be done (e.g., the current width and precision for a conversion). It uses those to direct the locale how and where to do a conversion (e.g., convert this number to a string in that buffer with width 8 and precision 5).
Although you didn't ask directly about it, the locale in its turn is really just a container--but (for rather an oddity) a typesafe heterogeneous container. The things it contains are facet
s. A facet object defines a single facet of an overall locale. The standard defines a number of facets for everything from reading and writing numbers (num_get
, num_put
) to classifying characters (the ctype facet).
By default, a stream will use the "C" locale. This is pretty basic--numbers are just converted as a stream of digits, the only things it recognizes as letters are the 26 lower case and 26 upper case English letters, and so on. You can, however, imbue
a stream with a different locale of your choice. You can choose locales to use by names specified in strings. One that's particularly interesting is one that's selected by an empty string. Using an empty string basically tells the runtime library to select the locale it "thinks" is the most suitable, usually based on how the user has configured the operating system. This allows code to deal with data in a localized format without being written explicitly for any particular locale.
So, the basic difference between an istream_iterator
and an istreambuf_iterator
is that the data coming out of an istreambuf_iterator hasn't gone through (most of the) transformations done by the locale, but data coming out of an istream_iterator
has been transformed by the locale.
For what it's worth, that "most of the" in the previous paragraph is referring to the fact that when you read data from an istreambuf (via an iterator or otherwise) one little bit of locale-based transformation is done: along with various "formatting" kinds of things, the locale contains a codecvt facet, which is what's used to convert from some external representation to some internal representation (e.g., UTF-8 to UTF-32).
It may make more sense to ignore the fact that they're both stored in a locale, and think only of the individual facets involved:
So that's the real difference between a istream_iterator
and an istreambuf_iterator
. A little bit of transformation is (at least potentially) done to the data from either one, but substantially less is done to the data coming from an istreambuf_iterator
.