Is list::size() really O(n)?
Recently, I noticed some people mentioning that std::list::size()
has a linear complexity.
According to some sources, this is in fact implementation dependent as the standard doesn't say what the complexity has to be.
The comment in this blog entry says:
Actually, it depends on which STL you are using. Microsoft Visual Studio V6 implements size() as {return (_Size); } whereas gcc (at least in versions 3.3.2 and 4.1.0) do it as { return std::distance(begin(), end()); } The first has constant speed, the second has o(N) speed
- So my guess is that for the VC++ crowd
size()
has constant complexity as Dinkumware probably won't have changed that fact since VC6. Am I right there?
- What does it look like currently in
gcc
? If it is really O(n), why did the developers choose to do so?
In C++11 it is required that for any standard container the .size()
operation must be complete in "constant" complexity (O(1)). (Table 96 — Container requirements). Previously in C++03 .size()
should have constant complexity, but is not required (see Is std::string size() a O(1) operation?).
The change in standard is introduced by n2923: Specifying the complexity of size() (Revision 1).
However, the implementation of .size()
in libstdc++ still uses an O(N) algorithm in gcc up to 4.8:
/** Returns the number of elements in the %list. */
size_type
size() const _GLIBCXX_NOEXCEPT
{ return std::distance(begin(), end()); }
See also Why is std::list bigger on c++11? for detail why it is kept this way.
Update: std::list::size()
is properly O(1) when using gcc 5.0 in C++11 mode (or above).
By the way, the .size()
in libc++ is correctly O(1):
_LIBCPP_INLINE_VISIBILITY
size_type size() const _NOEXCEPT {return base::__sz();}
...
__compressed_pair<size_type, __node_allocator> __size_alloc_;
_LIBCPP_INLINE_VISIBILITY
const size_type& __sz() const _NOEXCEPT
{return __size_alloc_.first();}
Pre-C++11 answer
You are correct that the standard does not state what the complexity of list::size()
must be - however, it does recommend that it "should have constant complexity" (Note A in Table 65).
Here's an interesting article by Howard Hinnant that explains why some people think list::size()
should have O(N) complexity (basically because they believe that O(1) list::size()
makes list::splice()
have O(N) complexity) and why an O(1) list::size()
is be a good idea (in the author's opinion):
- http://howardhinnant.github.io/On_list_size.html
I think the main points in the paper are:
- there are few situations where maintaining an internal count so
list::size()
can be O(1) causes the splice operation to become linear - there are probably many more situations where someone might be unaware of the negative effects that might happen because they call an O(N)
size()
(such as his one example wherelist::size()
is called while holding a lock). - that instead of permitting
size()
be O(N), in the interest of 'least surprise', the standard should require any container that implementssize()
to implement it in an O(1) fashion. If a container cannot do this, it should not implementsize()
at all. In this case, the user of the container will be made aware thatsize()
is unavailable, and if they still want or need to get the number of elements in the container they can still usecontainer::distance( begin(), end())
to get that value - but they will be completely aware that it's an O(N) operation.
I think I tend to agree with most of his reasoning. However, I do not like his proposed addition to the splice()
overloads. Having to pass in an n
that must be equal to distance( first, last)
to get correct behavior seems like a recipe for hard to diagnose bugs.
I'm not sure what should or could be done moving forward, as any change would have a significant impact on existing code. But as it stands, I think that existing code is already impacted - behavior might be rather significantly different from one implementation to another for something that should have been well-defined. Maybe onebyone's comment about having the size 'cached' and marked known/unknown might work well - you get amortized O(1) behavior - the only time you get O(N) behavior is when the list is modified by some splice() operations. The nice thing about this is that it can be done by implementors today without a change to the standard (unless I'm missing something).
As far as I know, C++0x is not changing anything in this area.
I've had to look into gcc 3.4's list::size
before, so I can say this:
- It uses
std::distance(head, tail)
. -
std::distance
has two implementations: for types that satisfy RandomAccessIterator, it uses "tail-head", and for types that merely satisfy InputIterator, it uses an O(n) algorithm relying on "iterator++", counting until it hits the given tail. -
std::list
does not satisfy RandomAccessIterator, so size is O(n).
As to the "why", I can only say that std::list
is appropriate for problems that require sequential access. Storing the size as a class variable would introduce overhead on every insert, delete, etc., and that waste is a big no-no per the intent of the STL. If you really need a constant-time size()
, use std::deque
.