Does resizing a vector invalidate iterators?

I found that this C++ code:

vector<int> a;
a.push_back(1);
a.push_back(2);
vector<int>::iterator it = a.begin();
a.push_back(4);
cout << *it;

print some big random number; but if you add a.push_back(3) between 3rd and 4th lines, it will print 1. Can you explain it to me?


Solution 1:

Edited with more careful wording

yes, resizing a vector might invalidate all iterators pointing into the vector.

The vector is implemented by internally allocating an array where the data is stored. When the vector grows, that array might run out of space, and when it does, the vector allocates a new, bigger, array, copies the data over to that, and then deletes the old array.

So your old iterators, which point into the old memory, are no longer valid. If the vector is resized downwards (for example by pop_back()), however, the same array is used. The array is never downsized automatically.

One way to avoid this reallocation (and pointer invalidation) is to call vector::reserve() first, to set aside enough space that this copying isn't necessary. In your case, if you called a.reserve(3) before the first push_back() operation, then the internal array would be big enough that the push_back's can be performed without having to reallocate the array, and so your iterators will stay valid.

Solution 2:

Vector iterators are only invalidated when the vector performs a reallocation.

The call to push_back(4) is causing the vector to allocate a new block of memory - this is what causes your iterator to become invalid. When you also use push_back(3), no reallocation is performed for push_back(4) so the iterator remains valid.

Solution 3:

Yes, any action that might change the size of the vector can invalidate iterators.

Edit: That includes operations (e.g. erase(), resize()) that reduce the size of the container. erase() doesn't invalidate all iterators, but it does invalidate any iterators referring to any point after the erased element(s). resize() is defined in terms of insert() and erase(), so it has the same potential.

Solution 4:

The rules for iterator invalidation are specific to a container.

Now invalidation may have 2 meanings with a vector:

  1. Invalidation = point outside of the range defined by [begin,end]
  2. Invalidation = point to a different object from the original one

As you can see, the second is much more strict:

std::vector<int> myVector;
myVector.push_back(0);
myVector.push_back(1);

std::vector<int>::iterator it = myVector.begin(); // it points to 0
myVector.erase(it); // it points to 1
myVector.erase(it); // it == myVector.end()

In this case, it is 'valid' in that it is always in the inclusive range [begin,end] and so can be safely used for any operation on myVector. On the other hand the expression (*it) keeps changing: first it returns 0, then 1, then it has undefined behavior...

In general, people will rather speak about the 2nd requirement, and invalidating an iterator simply means that (*it) may not produce the same result as before.


Now that this is said, there are several ways to invalidate an iterator on a Vector (in fact, it's the less stable structure of the STL).

During additions of elements:

  • This may trigger a reallocation (1) if myVector.size() == myVector.capacity(), since checking this is error-prone, we usually consider that any addition will invalidate the iterators
  • If you want to be 'picky' and knows that the reallocation is not triggered, then you still have to worry about insert. Inserting an element invalidates the iterators pointing to this current position and all subsequent ones since the elements are shifted one step towards the end of the vector.

During removal of elements:

  • There is no reallocation, even if the buffer is now much bigger than needed. It is possible to force this though, using the shrink to fit idiom (2).
  • All iterators pointing past the element removed are invalidated. Especially, the previous 'end' iterator is now beyond the [begin,end] range and cannot be used safely within STL algorithms for example.

(1) The internal structure of a std::vector is an array of T, this is due for compatibility with the C-programs (using &myVector.front() as the address of the array) and because it guarantees contiguity and a minimum space overhead (ie, the amount of space taken by the vector own data vs the amount of space occupied by an object)

At any moment, you can know how many objects a vector can hold using .capacity() method.

When you want to insert an object and the vector does not have the necessary capacity, a call to the .reserve(size_t) method is triggered. This method, if the number of items required is superior to the current capacity, triggers a reallocation.

The vector then allocates a new array of elements (its size is generally 2*n+1 where n is the current capacity), copies the elements of the current array to the new array, discards the current array.

Because it discards the current array, your iterators are invalidated as vector iterators are generally simple pointers (for efficiency).

Note that if the iterators were implemented as: a reference to the vector + a count, and dereferencing was actually *(&m_vector.front() + n) reallocation would not invalidate them... but they would be less efficient.

(2) Shrink to fit: Warning, this triggers a COPY of the elements and invalidates iterators.

// myVector has 10 elements, but myVector.capacity() == 1000
myVector.swap(std::vector<int>(myVector));

It first creates a temporary vector, which will allocate only as much memory as is needed (with a minimum depending on the library), and copy the elements of myVector. Then the swap operation exchange the buffers from myVector and this copy, and thus myVector now hols a buffer with the minimum amount of memory necessary. At the end of the operation the temporary is destroyed and the memory it holds released.