Who architected / designed C++'s IOStreams, and would it still be considered well-designed by today's standards? [closed]
First off, it may seem that I'm asking for subjective opinions, but that's not what I'm after. I'd love to hear some well-grounded arguments on this topic.
In the hope of getting some insight into how a modern streams / serialization framework ought to be designed, I recently got myself a copy of the book Standard C++ IOStreams and Locales by Angelika Langer and Klaus Kreft. I figured that if IOStreams wasn't well-designed, it wouldn't have made it into the C++ standard library in the first place.
After having read various parts of this book, I am starting to have doubts if IOStreams can compare to e.g. the STL from an overall architectural point-of-view. Read e.g. this interview with Alexander Stepanov (the STL's "inventor") to learn about some design decisions that went into the STL.
What surprises me in particular:
It seems to be unknown who was responsible for IOStreams' overall design (I'd love to read some background information about this — does anyone know good resources?);
Once you delve beneath the immediate surface of IOStreams, e.g. if you want to extend IOStreams with your own classes, you get to an interface with fairly cryptic and confusing member function names, e.g.
getloc
/imbue
,uflow
/underflow
,snextc
/sbumpc
/sgetc
/sgetn
,pbase
/pptr
/epptr
(and there's probably even worse examples). This makes it so much harder to understand the overall design and how the single parts co-operate. Even the book I mentioned above doesn't help that much (IMHO).
Thus my question:
If you had to judge by today's software engineering standards (if there actually is any general agreement on these), would C++'s IOStreams still be considered well-designed? (I wouldn't want to improve my software design skills from something that's generally considered outdated.)
Regarding who designed them, the original library was (not surprisingly) created by Bjarne Stroustrup, and then reimplemented by Dave Presotto. This was then redesigned and reimplemented yet again by Jerry Schwarz for Cfront 2.0, using the idea of manipulators from Andrew Koenig. The standard version of the library is based on this implementation.
Source "The Design & Evolution of C++", section 8.3.1.
Several ill-conceived ideas found their way into the standard: auto_ptr
, vector<bool>
, valarray
and export
, just to name a few. So I wouldn't take the presence of IOStreams necessarily as a sign of quality design.
IOStreams have a checkered history. They are actually a reworking of an earlier streams library, but were authored at a time when many of today's C++ idioms didn't exist, so the designers didn't have the benefit of hindsight. One issue that only became apparent over time was that it is almost impossible to implement IOStreams as efficiently as C's stdio, due to the copious use of virtual functions and forwarding to internal buffer objects at even the finest granularity, and also thanks to some inscrutable strangeness in the way locales are defined and implemented. My memory of this is quite fuzzy, I'll admit; I remember it being the subject of intense debate some years ago, on comp.lang.c++.moderated.
If you had to judge by today's software engineering standards (if there actually is any general agreement on these), would C++'s IOStreams still be considered well-designed? (I wouldn't want to improve my software design skills from something that's generally considered outdated.)
I would say NO, for several reasons:
Poor error handling
Error conditions should be reported with exceptions, not with operator void*
.
The "zombie object" anti-pattern is what causes bugs like these.
Poor separation between formatting and I/O
This makes stream objects unnecessary complex, as they have to contain extra state information for formatting, whether you need it or not.
It also increases the odds of writing bugs like:
using namespace std; // I'm lazy.
cout << hex << setw(8) << setfill('0') << x << endl;
// Oops! Forgot to set the stream back to decimal mode.
If instead, you wrote something like:
cout << pad(to_hex(x), 8, '0') << endl;
There would be no formatting-related state bits, and no problem.
Note that in "modern" languages like Java, C#, and Python, all objects have a toString
/ToString
/__str__
function that is called by the I/O routines. AFAIK, only C++ does it the other way around by using stringstream
as the standard way of converting to a string.
Poor support for i18n
Iostream-based output splits string literals into pieces.
cout << "My name is " << name << " and I am " << occupation << " from " << hometown << endl;
Format strings put whole sentences into string literals.
printf("My name is %s and I am %s from %s.\n", name, occupation, hometown);
The latter approach is easier to adapt to internationalization libraries like GNU gettext, because the use of whole sentences provides more context for the translators. If your string formatting routine supports re-ordering (like the POSIX $
printf parameters), then it also better handles differences in word order between languages.
I'm posting this as a separate answer because it is pure opinion.
Performing input & output (particularly input) is a very, very hard problem, so not surprisingly the iostreams library is full of bodges and things that with perfect hindsight could have been done better. But it seems to me that all I/O libraries, in whatever language are like this. I've never used a programming language where the I/O system was a thing of beauty that made me stand in awe of its designer. The iostreams library does have advantages, particularly over the C I/O library (extensibility, type-safety etc.), but I don't think anyone is holding it up as an example of great OO or generic design.
My opinion of C++ iostreams has improved substantially over time, particularly after I started to actually extend them by implementing my own stream classes. I began to appreciate the extensibility and overall design, despite the ridiculously poor member function names like xsputn
or whatever. Regardless, I think I/O streams are a massive improvement over C stdio.h, which has no type safety and is riddled with major security flaws.
I think the main problem with IO streams is that they conflate two related but somewhat orthogonal concepts: textual formatting and serialization. On the one hand, IO streams are designed to produce a human-readable, formatted textual representation of an object, and on the other hand, to serialize an object into a portable format. Sometimes these two goals are one and the same, but other times this results in some seriously annoying incongruities. For example:
std::stringstream ss;
std::string output_string = "Hello world";
ss << output_string;
...
std::string input_string;
ss >> input_string;
std::cout << input_string;
Here, what we get as input is not what we originally outputted to the stream. This is because the <<
operator outputs the entire string, whereas the >>
operator will only read from the stream until it encounters a whitespace character, since there's no length information stored in the stream. So even though we output a string object containing "hello world", we're only going to input a string object containing "hello". So while the stream has served its purpose as a formatting facility, it has failed to properly serialize and then unserialize the object.
You might say that IO streams weren't designed to be serialization facilities, but if that's the case, what are input streams really for? Besides, in practice I/O streams are often used to serialize objects, because there are no other standard serialization facilities. Consider boost::date_time
or boost::numeric::ublas::matrix
, where if you output a matrix object with the <<
operator, you'll get the same exact matrix when you input it using the >>
operator. But in order to accomplish this, the Boost designers had to store column count and row count information as textual data in the output, which compromises the actual human-readable display. Again, an awkward combination of textual formatting facilities and serialization.
Note how most other languages separate these two facilities. In Java, for example, formatting is accomplished through the toString()
method, while serialization is accomplished through the Serializable
interface.
In my opinion, the best solution would have been to introduce byte based streams, alongside the standard character based streams. These streams would operate on binary data, with no concern for human-readable formatting/display. They could be used solely as serialization/deserialization facilities, to translate C++ objects into portable byte sequences.