What is the memory layout of vector of arrays?

Solution 1:

Arrays do not have any indirection, but just store their data "directly". That is, a std::array<int, 5> literally contains five ints in a row, flat. And, like vectors, they do not put padding between their elements, so they're "internally contiguous".

However, the std::array object itself may be larger than the set of its elements! It is permitted to have trailing "stuff" like padding. So, although likely, it is not necessarily true that your data will all be contiguous in the first case.

An int
+----+
|    |
+----+

A vector of 2 x int
+----+----+----+-----+        +----+----+
| housekeeping | ptr |        | 1  |  2 |
+----+----+----+-----+        +----+----+
                   |          ^
                   \-----------

An std::array<int, 5>
+----+----+----+----+----+----------->
| 1  |  2 |  3 |  4 |  5 | possible cruft/padding....
+----+----+----+----+----+----------->

A vector of 2 x std::array<int, 5>
+----+----+----+-----+        +----+----+----+----+----+----------------------------+----+----+----+----+----+----------->
| housekeeping | ptr |        | 1  |  2 |  3 |  4 |  5 | possible cruft/padding.... | 1  |  2 |  3 |  4 |  5 | possible cruft/padding....
+----+----+----+-----+        +----+----+----+----+----+----------------------------+----+----+----+----+----+----------->
                   |          ^
                   \-----------

And, even if it were, due to aliasing rules, whether you'd be able to use a single int* to navigate all 10 numbers would potentially be a different matter!

All in all, a vector of ten ints would be clearer, completely packed, and possibly safer to use.

In the case of a vector of vectors, a vector is really just a pointer plus some housekeeping, hence the indirection (as you say).

Solution 2:

The big difference between std::vector and std::array is that std::vector contains a pointer to the memory it wraps, while std::array contains the actual array in itself.

That means a vector of vectors is like a jagged array.

For a vector of arrays, the std::array objects will be placed contiguously but separate from the vector object. Note that the std::array object themselves may be larger than the array they contain, and if so then the data will not be contiguous.

The last bit also means that an array (plain C-style or std::array) of std::array may also not keep the data contiguously. The std::array objects in the array will be contiguous, but not the data.

The only way to guarantee contiguous data for a "multi-dimensional" array is nested plain C-style arrays.

Solution 3:

The C++ standard does not guarantee that std::array doesn't contain any payload at the end of the array, so alas you cannot assume that the first element of a subsequent array is just after the last element of a prior array.

Even if that were the case, the behaviour on attempting to reach any element in a array by pointer arithmetic on a pointer to an element in a different array is undefined. This is because pointer arithmetic is only valid within arrays.

The above also applies to a std::array<std::array>.

Solution 4:

static_assert(sizeof(std::array<int,5>)==5*sizeof(int));

the above mitigates against having any padding on the end of a std::array. No major compiler will cause the above to fail to this date, and I'd bet won't in the future.

If and only if the above fails, then std::vector<std::array<int,5>> v(2) will have a "gap" between the std::arrays.

This doesn't help as much as you'd like; a pointer generated as follows:

int* ptr = &v[0][0];

only has a domain of validity up to ptr+5, and dereferencing ptr+5 is undefined behavior.

This is due to aliasing rules; you aren't allowed to "walk" past the end of one object into another, even if you know it is there, unless you first round-trip to certain types (like char*) where less restricted pointer arithmetic is permitted.

That rule, in turn, exists to allow compilers to reason about what data is being accessed through which pointer, without having to prove that arbitrary pointer arithmetic will let you reach outside objects.

So:

struct bob {
  int x,y,z;
};

bob b {1,2,3};
int* py = &b.y;

no matter what you do with py as an int*, you cannot legally modify x or z with it.

*py = 77;
py[-1]=3;
std::cout << b.x;

the complier can optimize the std::cout line to simply print 1, because the py[-1]=3 may attempt to modify b.x, but doing so through that means is undefined behavior.

The same kind of restrictions prevent you from going from the first array in your std::vector to the second (ie, beyond ptr+4).

Creating ptr+5 is legal, but only as a one-past-the-end pointer. Comparing ptr+5 == &v[1][0] is also not specified in result, even though their binary values are absolutely going to be identical in every compiler on every major hardware system.

If you want to go futher down the rabbit hole, it isn't even possible to implement std::vector<int> within C++ itself due to these restrictions on pointer aliasing. Last I checked (which was before c++17, but I didn't see a resolution in C++17) the standard committee was working on solving this, but I don't know the state of any such effort. (This is less of a problem than you might think, because nothing requires that std::vector<int> be implemented in standard-compliant C++; it must simply have standard-defined behavior. It can use compiler-specific extensions internally.)

What is the memory layout of vector of arrays?

Solution 1:

Solution 2:

Solution 3:

Solution 4:

Related

Recent Posts