C++ std::vector vs array in the real world
I'm new to C++. I'm reading "Beginning C++ Through Game Programming" by Michael Dawson. However, I'm not new to programming in general. I just finished a chapter that dealt with vectors, so I've got a question about their use in the real world (I'm a computer science student, so I don't have much real-world experience yet).
The author has a Q/A at the end of each chapter, and one of them was:
Q: When should I use a vector instead of an array?
A: Almost always. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
What do you guys think? I remember learning about vectors in a Java book, but we didn't cover them at all in my Intro to Comp. Sci. class, nor my Data Structures class at college. I've also never seen them used in any programming assignments (Java and C). This makes me feel like they're not used very much, although I know that school code and real-world code can be extremely different.
I don't need to be told about the differences between the two data structures; I'm very aware of them. All I want to know is if the author is giving good advice in his Q/A, or if he's simply trying to save beginner programmers from destroying themselves with complexities of managing fixed-size data structures. Also, regardless of what you think of the author's advice, what do you see in the real-world more often?
Solution 1:
A: Almost always [use a vector instead of an array]. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
That's an over-simplification. It's fairly common to use arrays, and can be attractive when:
-
the elements are specified at compile time, e.g.
const char project[] = "Super Server";
,const Colours colours[] = { Green, Yellow }
;- with C++11 it will be equally concise to initialise
std::vector
s with values
- with C++11 it will be equally concise to initialise
the number of elements is inherently fixed, e.g.
const char* const bool_to_str[] = { "false", "true" };
,Piece chess_board[8][8];
-
first-use performance is critical: with arrays of constants the compiler can often write a memory snapshot of the fully pre-initialised objects into the executable image, which is then page-faulted directly into place ready for use, so it's typically much faster that run-time heap allocation (
new[]
) followed by serialised construction of objectscompiler-generated tables of
const
data can always be safely read by multiple threads, whereas data constructed at run-time must complete construction before other code triggered by constructors for non-function-localstatic
variables attempts to use that data: you end up needing some manner of Singleton (possibly threadsafe which will be even slower)In C++03,
vector
s created with an initial size would construct one prototypical element object then copy construct each data member. That meant that even for types where construction was deliberately left as a no-operation, there was still a cost to copy the data elements - replicating their whatever-garbage-was-left-in-memory values. Clearly an array of uninitialised elements is faster.
One of the powerful features of C++ is that often you can write a
class
(orstruct
) that exactly models the memory layout required by a specific protocol, then aim a class-pointer at the memory you need to work with to conveniently interpret or assign values. For better or worse, many such protocols often embed small fixed sized arrays.There's a decades-old hack for putting an array of 1 element (or even 0 if your compiler allows it as an extension) at the end of a struct/class, aiming a pointer to the struct type at some larger data area, and accessing array elements off the end of the struct based on prior knowledge of the memory availability and content (if reading before writing) - see What's the need of array with zero elements?
classes/structures containing arrays can still be POD types
arrays facilitate access in shared memory from multiple processes (by default
vector
's internal pointers to the actual dynamically allocated data won't be in shared memory or meaningful across processes, and it was famously difficult to force C++03vector
s to use shared memory like this even when specifying a custom allocator template parameter).embedding arrays can localise memory access requirement, improving cache hits and therefore performance
That said, if it's not an active pain to use a vector
(in code concision, readability or performance) then you're better off doing so: they've size()
, checked random access via at()
, iterators, resizing (which often becomes necessary as an application "matures") etc.. It's also often easier to change from vector
to some other Standard container should there be a need, and safer/easier to apply Standard algorithms (x.end()
is better than x + sizeof x / sizeof x[0]
any day).
UPDATE: C++11 introduced a std::array<>
, which avoids some of the costs of vector
s - internally using a fixed-sized array to avoid an extra heap allocation/deallocation - while offering some of the benefits and API features: http://en.cppreference.com/w/cpp/container/array.
Solution 2:
One of the best reasons to use a vector
as opposed to an array is the RAII idiom. Basically, in order for c++ code to be exception-safe, any dynamically allocated memory or other resources should be encapsulated within objects. These objects should have destructors that free these resources.
When an exception goes unhandled, the ONLY things that are gaurenteed to be called are the destructors of objects on the stack. If you dynamically allocate memory outside of an object, and an uncaught exception is thrown somewhere before it is deleted, you have a memory leak.
It's also a nice way to avoid having to remember to use delete
.
You should also check out std::algorithm
, which provides a lot of common algorithms for vector
and other STL containers.
I have on a few occasions written code with vector
that, in retrospect, probably would have been better with a native array. But in all of these cases, either a Boost::multi_array
or a Blitz::Array
would have been better than either of them.
Solution 3:
A std::vector is just a resizable array. It's not much more than that. It's not something you would learn in a Data Structures class, because it isn't an intelligent data structure.
In the real world, I see a lot of arrays. But I also see a lot of legacy codebases that use "C with Classes"-style C++ programming. That doesn't mean that you should program that way.