Why is the code in most STL implementations so convoluted?
The STL is a critical piece of the C++ world, most implementations derive from the initial efforts by Stepanov and Musser.
My question is given the criticality of the code, and it being one of the primary sources for people to view examples of well written C++ for both awe and learning purposes: Why are the various implementations of the STL so disgusting to look at - convoluted and generally good examples of how not to write C++ code from an aesthetics point of view.
The code examples below would not pass code-review at the places I've worked at for reasons varying from variable naming, layout, macros and uses of operators that require more than a simple glance to figure out what is actually occurring.
template<class _BidIt> inline
bool _Next_permutation(_BidIt _First, _BidIt _Last)
{ // permute and test for pure ascending, using operator<
_BidIt _Next = _Last;
if (_First == _Last || _First == --_Next)
return (false);
for (; ; )
{ // find rightmost element smaller than successor
_BidIt _Next1 = _Next;
if (_DEBUG_LT(*--_Next, *_Next1))
{ // swap with rightmost element that's smaller, flip suffix
_BidIt _Mid = _Last;
for (; !_DEBUG_LT(*_Next, *--_Mid); )
;
_STD iter_swap(_Next, _Mid);
_STD reverse(_Next1, _Last);
return (true);
}
if (_Next == _First)
{ // pure descending, flip all
_STD reverse(_First, _Last);
return (false);
}
}
}
_Ty operator()()
{ // return next value
static _Ty _Zero = 0; // to quiet diagnostics
_Ty _Divisor = (_Ty)_Mx;
_Prev = _Mx ? ((_Ity)_Ax * _Prev + (_Ty)_Cx) % _Divisor
: ((_Ity)_Ax * _Prev + (_Ty)_Cx);
if (_Prev < _Zero)
_Prev += (_Ty)_Mx;
return (_Prev);
}
Please note I'm not critiquing the interface, as it is very well designed and applicable. What I'm concerned about is the readability of the implementation details.
A similar questions have been previously posed:
Is there a readable implementation of the STL
Why STL implementation is so unreadable? How C++ could have been improved here?
Note: The code presented above is taken from MSVC 2010 algorithm and queue headers.
About the variables names, library implementors must use "crazy" naming conventions, such as names starting with an underscore followed by an uppercase letter, because such names are reserved for them. They cannot use "normal" names, because those may have been redefined by a user macro.
Section 17.6.3.3.2 "Global names" §1 states:
Certain sets of names and function signatures are always reserved to the implementation:
Each name that contains a double underscore or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.
Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
(Note that these rules forbid header guards like __MY_FILE_H
which I have seen quite often.)
Neil Butterworth, now listed as "anon", provided a useful link in his answer to the SO question "Is there a readable implementation of the STL?". Quoting his answer there:
There is a book The C++ Standard Template Library, co-authored by the original STL designers Stepanov & Lee (together with P.J. Plauger and David Musser), which describes a possible implementation, complete with code - see http://www.amazon.co.uk/C-Standard-Template-Library/dp/0134376331.
See also the other answers in that thread.
Anyway, most of the STL code (by STL I here mean the STL-like subset of the C++ standard library) is template code, and as such must be header-only, and since it's used in almost every program it pays to have that code as short as possible.
Thus, the natural trade-off point between conciseness and readability is much farther over on the conciseness end of the scale than with "normal" code.
In addition, the standard library is where the system-independent view of application code is connected to the underlying system, utilizing all kinds of compiler-specific things that you as an application developer should best stay away from.
Variable names for the reason that this is standard library code, and it should use reserved names for implementation details in headers. The following should not break the standard libraries:
#define mid
#include <algorithm>
So standard library headers can't use mid
as a variable name, hence _Mid
. The STL was different - it wasn't part of the language specification, it was defined as "here are some headers, use them as you will"
Your code or mine, on the other hand, would be invalid if it used _Mid
as a variable name since that's a reserved name - the implementation is allowed to do:
#define _Mid
if it feels like it.
Layout - meh. They probably have a style guide, they probably follow it, more or less. The fact that it doesn't match my style guide (and hence would fail my code review) is nothing to them.
Operators that are difficult to work out - difficult to whom? Code should be written for the people who maintain it, and GNU/Dinkumware/whoever probably don't want to let people loose on the standard libraries who can't puzzle out *--_Next
at a glance. If you use that sort of expression, you get used to it, and if you don't you'll continue finding it hard.
I will give, you, though, that operator()
overload is gibberish. [Edit: I get it, it's a linear congruential generator, done very generically, and if the modulus is "0" that means just use the natural wraparound of the arithmetic type.]
Implementations vary. libc++ for example, is much easier on the eyes. There's still a bit of underscore noise though. As others have noted, the leading underscores are unfortunately required. Here's the same function in libc++:
template <class _Compare, class _BidirectionalIterator>
bool
__next_permutation(_BidirectionalIterator __first, _BidirectionalIterator __last, _Compare __comp)
{
_BidirectionalIterator __i = __last;
if (__first == __last || __first == --__i)
return false;
while (true)
{
_BidirectionalIterator __ip1 = __i;
if (__comp(*--__i, *__ip1))
{
_BidirectionalIterator __j = __last;
while (!__comp(*__i, *--__j))
;
swap(*__i, *__j);
_STD::reverse(__ip1, __last);
return true;
}
if (__i == __first)
{
_STD::reverse(__first, __last);
return false;
}
}
}