C++11 initializer list fails - but only on lists of length 2
I tracked down an obscure logging bug to the fact that initializer lists of length 2 appear to be a special case! How is this possible?
The code was compiled with Apple LLVM version 5.1 (clang-503.0.40), using CXXFLAGS=-std=c++11 -stdlib=libc++.
#include <stdio.h>
#include <string>
#include <vector>
using namespace std;
typedef vector<string> Strings;
void print(string const& s) {
printf("%s", s.c_str()); // pass the text as an argument, never as the format string
printf("\n");
}
void print(Strings const& ss, string const& name) {
print("Test " + name);
print("Number of strings: " + to_string(ss.size()));
for (auto& s: ss) {
auto t = "length = " + to_string(s.size()) + ": " + s;
print(t);
}
print("\n");
}
void test() {
Strings a{{"hello"}}; print(a, "a");
Strings b{{"hello", "there"}}; print(b, "b");
Strings c{{"hello", "there", "kids"}}; print(c, "c");
Strings A{"hello"}; print(A, "A");
Strings B{"hello", "there"}; print(B, "B");
Strings C{"hello", "there", "kids"}; print(C, "C");
}
int main() {
test();
}
Output:
Test a
Number of strings: 1
length = 5: hello
Test b
Number of strings: 1
length = 8: hello
Test c
Number of strings: 3
length = 5: hello
length = 5: there
length = 4: kids
Test A
Number of strings: 1
length = 5: hello
Test B
Number of strings: 2
length = 5: hello
length = 5: there
Test C
Number of strings: 3
length = 5: hello
length = 5: there
length = 4: kids
I should also add that the length of the bogus string in test b seems to be indeterminate - it's always greater than the first initializer string but has varied from one more than the length of the first string to the total of the lengths of the two strings in the initializer.
Solution 1:
Introduction
Imagine the following declaration, and usage:
struct A {
A (std::initializer_list<std::string>);
};
A {{"a" }}; // (A), initialization of 1 string
A {{"a", "b" }}; // (B), initialization of 1 string << !!
A {{"a", "b", "c"}}; // (C), initialization of 3 strings
In (A) and (C), each c-style string is causing the initialization of one (1) std::string, but, as you have stated in your question, (B) differs.
The compiler sees that it's possible to construct a std::string from a begin and an end iterator, and when parsing statement (B) it prefers that constructor over using "a" and "b" as individual initializers for two elements.
A { std::string { "a", "b" } }; // the compiler's interpretation of (B)
Note: The type of "a" and "b" is char const[2], a type which can implicitly decay into a char const*, a pointer type which is suitable to act as an iterator denoting either begin or end when creating a std::string. But we must be careful: this causes undefined behavior, since there is no (guaranteed) relation between the two pointers when that constructor is invoked.
Explanation
When you invoke a constructor taking an std::initializer_list using double braces {{ a, b, ... }}, there are two possible interpretations:
1) The outer braces refer to the constructor itself, and the inner braces denote the elements that take part in the std::initializer_list, or:
2) The outer braces refer to the std::initializer_list, whereas the inner braces denote the initialization of an element inside it.
Interpretation 2) is preferred whenever it is possible, and since std::string has a constructor taking two iterators, that constructor is the one called when you write std::vector<std::string> {{ "hello", "there" }}.
Further example:
std::vector<std::string> {{"this", "is"}, {"stackoverflow"}}.size (); // yields 2
Solution
Don't use double braces for such initialization.
Solution 2:
First of all, this is undefined behaviour unless I'm missing something obvious. Now let me explain. The vector is being constructed from an initializer list of strings. However, this list only contains one string. This string is formed by the inner {"Hello", "there"}. How? With the iterator constructor. Essentially, for (auto it = "Hello"; it != "there"; ++it) is forming a string containing Hello\0.
For a simple example, see here. While UB is reason enough, it would seem the second literal is being placed right after the first in memory. As a bonus, do "Hello", "Hello" and you'll probably get a string of length 0, since compilers commonly merge identical literals and the two pointers then compare equal. If you don't understand anything in here, I recommend reading Filip's excellent answer.