What is the memory layout of vector of arrays?
Solution 1:
Arrays do not have any indirection, but just store their data "directly". That is, a std::array<int, 5>
literally contains five int
s in a row, flat. And, like vectors, they do not put padding between their elements, so they're "internally contiguous".
However, the std::array
object itself may be larger than the set of its elements! It is permitted to have trailing "stuff" like padding. So, although likely, it is not necessarily true that your data will all be contiguous in the first case.
An int
+----+
| |
+----+
A vector of 2 x int
+----+----+----+-----+ +----+----+
| housekeeping | ptr | | 1 | 2 |
+----+----+----+-----+ +----+----+
| ^
\-----------
An std::array<int, 5>
+----+----+----+----+----+----------->
| 1 | 2 | 3 | 4 | 5 | possible cruft/padding....
+----+----+----+----+----+----------->
A vector of 2 x std::array<int, 5>
+----+----+----+-----+ +----+----+----+----+----+----------------------------+----+----+----+----+----+----------->
| housekeeping | ptr | | 1 | 2 | 3 | 4 | 5 | possible cruft/padding.... | 1 | 2 | 3 | 4 | 5 | possible cruft/padding....
+----+----+----+-----+ +----+----+----+----+----+----------------------------+----+----+----+----+----+----------->
| ^
\-----------
And, even if it were, due to aliasing rules, whether you'd be able to use a single int*
to navigate all 10 numbers would potentially be a different matter!
All in all, a vector of ten int
s would be clearer, completely packed, and possibly safer to use.
In the case of a vector of vectors, a vector is really just a pointer plus some housekeeping, hence the indirection (as you say).
Solution 2:
The big difference between std::vector
and std::array
is that std::vector
contains a pointer to the memory it wraps, while std::array
contains the actual array in itself.
That means a vector of vectors is like a jagged array.
For a vector of arrays, the std::array
objects will be placed contiguously but separate from the vector object. Note that the std::array
object themselves may be larger than the array they contain, and if so then the data will not be contiguous.
The last bit also means that an array (plain C-style or std::array
) of std::array
may also not keep the data contiguously. The std::array
objects in the array will be contiguous, but not the data.
The only way to guarantee contiguous data for a "multi-dimensional" array is nested plain C-style arrays.
Solution 3:
The C++ standard does not guarantee that std::array
doesn't contain any payload at the end of the array, so alas you cannot assume that the first element of a subsequent array is just after the last element of a prior array.
Even if that were the case, the behaviour on attempting to reach any element in a array by pointer arithmetic on a pointer to an element in a different array is undefined. This is because pointer arithmetic is only valid within arrays.
The above also applies to a std::array<std::array>
.
Solution 4:
static_assert(sizeof(std::array<int,5>)==5*sizeof(int));
the above mitigates against having any padding on the end of a std::array
. No major compiler will cause the above to fail to this date, and I'd bet won't in the future.
If and only if the above fails, then std::vector<std::array<int,5>> v(2)
will have a "gap" between the std::array
s.
This doesn't help as much as you'd like; a pointer generated as follows:
int* ptr = &v[0][0];
only has a domain of validity up to ptr+5
, and dereferencing ptr+5
is undefined behavior.
This is due to aliasing rules; you aren't allowed to "walk" past the end of one object into another, even if you know it is there, unless you first round-trip to certain types (like char*
) where less restricted pointer arithmetic is permitted.
That rule, in turn, exists to allow compilers to reason about what data is being accessed through which pointer, without having to prove that arbitrary pointer arithmetic will let you reach outside objects.
So:
struct bob {
int x,y,z;
};
bob b {1,2,3};
int* py = &b.y;
no matter what you do with py
as an int*
, you cannot legally modify x
or z
with it.
*py = 77;
py[-1]=3;
std::cout << b.x;
the complier can optimize the std::cout
line to simply print 1
, because the py[-1]=3
may attempt to modify b.x
, but doing so through that means is undefined behavior.
The same kind of restrictions prevent you from going from the first array in your std::vector
to the second (ie, beyond ptr+4
).
Creating ptr+5
is legal, but only as a one-past-the-end pointer. Comparing ptr+5 == &v[1][0]
is also not specified in result, even though their binary values are absolutely going to be identical in every compiler on every major hardware system.
If you want to go futher down the rabbit hole, it isn't even possible to implement std::vector<int>
within C++ itself due to these restrictions on pointer aliasing. Last I checked (which was before c++17, but I didn't see a resolution in C++17) the standard committee was working on solving this, but I don't know the state of any such effort. (This is less of a problem than you might think, because nothing requires that std::vector<int>
be implemented in standard-compliant C++; it must simply have standard-defined behavior. It can use compiler-specific extensions internally.)