Most optimized way of concatenation in strings
Solution 1:
Here is a small test suite:
#include <iostream>
#include <string>
#include <chrono>
#include <sstream>
int main ()
{
typedef std::chrono::high_resolution_clock clock;
typedef std::chrono::duration<float, std::milli> mil;
std::string l_czTempStr;
std::string s1="Test data1";
auto t0 = clock::now();
#if VER==1
for (int i = 0; i < 100000; ++i)
{
l_czTempStr = s1 + "Test data2" + "Test data3";
}
#elif VER==2
for (int i = 0; i < 100000; ++i)
{
l_czTempStr = "Test data1";
l_czTempStr += "Test data2";
l_czTempStr += "Test data3";
}
#elif VER==3
for (int i = 0; i < 100000; ++i)
{
l_czTempStr = "Test data1";
l_czTempStr.append("Test data2");
l_czTempStr.append("Test data3");
}
#elif VER==4
for (int i = 0; i < 100000; ++i)
{
std::ostringstream oss;
oss << "Test data1";
oss << "Test data2";
oss << "Test data3";
l_czTempStr = oss.str();
}
#endif
auto t1 = clock::now();
std::cout << l_czTempStr << '\n';
std::cout << mil(t1-t0).count() << "ms\n";
}
On coliru:
Compile with the following:
clang++ -std=c++11 -O3 -DVER=1 -Wall -pedantic -pthread main.cpp
21.6463ms
-DVER=2
6.61773ms
-DVER=3
6.7855ms
-DVER=4
102.015ms
It looks like 2)
, +=
is the winner.
(Also compiling with and without -pthread
seems to affect the timings)
Solution 2:
In addition to other answers...
I made extensive benchmarks about this problem some time ago, and came to the conclusion that the most efficient solution (GCC 4.7 & 4.8 on Linux x86 / x64 / ARM) in all use cases is first to reserve()
the result string with enough space to hold all the concatenated strings, and then only append()
them (or use operator +=()
, that makes no difference).
Unfortunately it seems I deleted that benchmark so you only have my word (but you can easily adapt Mats Petersson's benchmark to verify this by yourself, if my word isn't enough).
In a nutshell:
const string space = " ";
string result;
result.reserve(5 + space.size() + 5);
result += "hello";
result += space;
result += "world";
Depending on the exact use case (number, types and sizes of the concatenated strings), sometimes this method is by far the most efficient, and other times it is on par with other methods, but it is never worse.
Problem is, this is really painful to compute the total required size in advance, especially when mixing string literals and std::string
(the example above is clear enough on that matter, I believe). The maintainability of such code is absolutely horrible as soon as you modify one of the literals or add another string to be concatenated.
One approach would be to use sizeof
to compute the size of the literals, but IMHO it creates as much mess than it solves, the maintainability is still terrible:
#define STR_HELLO "hello"
#define STR_WORLD "world"
const string space = " ";
string result;
result.reserve(sizeof(STR_HELLO)-1 + space.size() + sizeof(STR_WORLD)-1);
result += STR_HELLO;
result += space;
result += STR_WORLD;
A usable solution (C++11, variadic templates)
I finally settled for a set of variadic templates that efficiently take care of calculating the string sizes (eg. the size of string literals is determined at compile time), reserve()
as needed, and then concatenate everything.
Here it is, hope this is useful:
namespace detail {
template<typename>
struct string_size_impl;
template<size_t N>
struct string_size_impl<const char[N]> {
static constexpr size_t size(const char (&) [N]) { return N - 1; }
};
template<size_t N>
struct string_size_impl<char[N]> {
static size_t size(char (&s) [N]) { return N ? strlen(s) : 0; }
};
template<>
struct string_size_impl<const char*> {
static size_t size(const char* s) { return s ? strlen(s) : 0; }
};
template<>
struct string_size_impl<char*> {
static size_t size(char* s) { return s ? strlen(s) : 0; }
};
template<>
struct string_size_impl<std::string> {
static size_t size(const std::string& s) { return s.size(); }
};
template<typename String> size_t string_size(String&& s) {
using noref_t = typename std::remove_reference<String>::type;
using string_t = typename std::conditional<std::is_array<noref_t>::value,
noref_t,
typename std::remove_cv<noref_t>::type
>::type;
return string_size_impl<string_t>::size(s);
}
template<typename...>
struct concatenate_impl;
template<typename String>
struct concatenate_impl<String> {
static size_t size(String&& s) { return string_size(s); }
static void concatenate(std::string& result, String&& s) { result += s; }
};
template<typename String, typename... Rest>
struct concatenate_impl<String, Rest...> {
static size_t size(String&& s, Rest&&... rest) {
return string_size(s)
+ concatenate_impl<Rest...>::size(std::forward<Rest>(rest)...);
}
static void concatenate(std::string& result, String&& s, Rest&&... rest) {
result += s;
concatenate_impl<Rest...>::concatenate(result, std::forward<Rest>(rest)...);
}
};
} // namespace detail
template<typename... Strings>
std::string concatenate(Strings&&... strings) {
std::string result;
result.reserve(detail::concatenate_impl<Strings...>::size(std::forward<Strings>(strings)...));
detail::concatenate_impl<Strings...>::concatenate(result, std::forward<Strings>(strings)...);
return result;
}
The only interesting part, as far as the public interface is concerned, is the very last template<typename... Strings> std::string concatenate(Strings&&... strings)
template. Usage is straightforward:
int main() {
const string space = " ";
std::string result = concatenate("hello", space, "world");
std::cout << result << std::endl;
}
With optimizations turned on, any decent compiler should be able to expand the concatenate
call to the same code as my first example where I manually wrote everything. As far as GCC 4.7 & 4.8 are concerned, the generated code is pretty much identical as well as the performance.
Solution 3:
The WORST possible scenario is using plain old strcat
(or sprintf
), since strcat
takes a C string, and that has to be "counted" to find the end. For long strings, that's a real performance sufferer. C++ style strings are much better, and the performance problems are likely to be with the memory allocation, rather than counting lengths. But then again, the string grows geometrically (doubles each time it needs to grow), so it's not that terrible.
I'd very much suspect that all of the above methods end up with the same, or at least very similar, performance. If anything, I'd expect that stringstream
is slower, because of the overhead in supporting formatting - but I also suspect it's marginal.
As this sort of thing is "fun", I will get back with a benchmark...
Edit:
Note that these result apply to MY machine, running x86-64 Linux, compiled with g++ 4.6.3. Other OS's, compilers and C++ runtime library implementations may vary. If performance is important to your application, then benchmark on the system(s) that are critical for you, using the compiler(s) that you use.
Here's the code I wrote to test this. It may not be the perfect representation of a real scenario, but I think it's a representative scenario:
#include <iostream>
#include <iomanip>
#include <string>
#include <sstream>
#include <cstring>
using namespace std;
static __inline__ unsigned long long rdtsc(void)
{
unsigned hi, lo;
__asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}
string build_string_1(const string &a, const string &b, const string &c)
{
string out = a + b + c;
return out;
}
string build_string_1a(const string &a, const string &b, const string &c)
{
string out;
out.resize(a.length()*3);
out = a + b + c;
return out;
}
string build_string_2(const string &a, const string &b, const string &c)
{
string out = a;
out += b;
out += c;
return out;
}
string build_string_3(const string &a, const string &b, const string &c)
{
string out;
out = a;
out.append(b);
out.append(c);
return out;
}
string build_string_4(const string &a, const string &b, const string &c)
{
stringstream ss;
ss << a << b << c;
return ss.str();
}
char *build_string_5(const char *a, const char *b, const char *c)
{
char* out = new char[strlen(a) * 3+1];
strcpy(out, a);
strcat(out, b);
strcat(out, c);
return out;
}
template<typename T>
size_t len(T s)
{
return s.length();
}
template<>
size_t len(char *s)
{
return strlen(s);
}
template<>
size_t len(const char *s)
{
return strlen(s);
}
void result(const char *name, unsigned long long t, const string& out)
{
cout << left << setw(22) << name << " time:" << right << setw(10) << t;
cout << " (per character: "
<< fixed << right << setw(8) << setprecision(2) << (double)t / len(out) << ")" << endl;
}
template<typename T>
void benchmark(const char name[], T (Func)(const T& a, const T& b, const T& c), const char *strings[])
{
unsigned long long t;
const T s1 = strings[0];
const T s2 = strings[1];
const T s3 = strings[2];
t = rdtsc();
T out = Func(s1, s2, s3);
t = rdtsc() - t;
if (len(out) != len(s1) + len(s2) + len(s3))
{
cout << "Error: out is different length from inputs" << endl;
cout << "Got `" << out << "` from `" << s1 << "` + `" << s2 << "` + `" << s3 << "`";
}
result(name, t, out);
}
void benchmark(const char name[], char* (Func)(const char* a, const char* b, const char* c),
const char *strings[])
{
unsigned long long t;
const char* s1 = strings[0];
const char* s2 = strings[1];
const char* s3 = strings[2];
t = rdtsc();
char *out = Func(s1, s2, s3);
t = rdtsc() - t;
if (len(out) != len(s1) + len(s2) + len(s3))
{
cout << "Error: out is different length from inputs" << endl;
cout << "Got `" << out << "` from `" << s1 << "` + `" << s2 << "` + `" << s3 << "`";
}
result(name, t, out);
delete [] out;
}
#define BM(func, size) benchmark(#func " " #size, func, strings ## _ ## size)
#define BM_LOT(size) BM(build_string_1, size); \
BM(build_string_1a, size); \
BM(build_string_2, size); \
BM(build_string_3, size); \
BM(build_string_4, size); \
BM(build_string_5, size);
int main()
{
const char *strings_small[] = { "Abc", "Def", "Ghi" };
const char *strings_medium[] = { "abcdefghijklmnopqrstuvwxyz",
"defghijklmnopqrstuvwxyzabc",
"ghijklmnopqrstuvwxyzabcdef" };
const char *strings_large[] =
{ "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz",
"defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc"
"defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc"
"defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc"
"defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc"
"defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc"
"defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc"
"defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc"
"defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc"
"defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc"
"defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc",
"ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
"ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
"ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
"ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
"ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
"ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
"ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
"ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
"ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
"ghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdef"
};
for(int i = 0; i < 5; i++)
{
BM_LOT(small);
BM_LOT(medium);
BM_LOT(large);
cout << "---------------------------------------------" << endl;
}
}
Here are some representative results:
build_string_1 small time: 4075 (per character: 452.78)
build_string_1a small time: 5384 (per character: 598.22)
build_string_2 small time: 2669 (per character: 296.56)
build_string_3 small time: 2427 (per character: 269.67)
build_string_4 small time: 19380 (per character: 2153.33)
build_string_5 small time: 6299 (per character: 699.89)
build_string_1 medium time: 3983 (per character: 51.06)
build_string_1a medium time: 6970 (per character: 89.36)
build_string_2 medium time: 4072 (per character: 52.21)
build_string_3 medium time: 4000 (per character: 51.28)
build_string_4 medium time: 19614 (per character: 251.46)
build_string_5 medium time: 6304 (per character: 80.82)
build_string_1 large time: 8491 (per character: 3.63)
build_string_1a large time: 9563 (per character: 4.09)
build_string_2 large time: 6154 (per character: 2.63)
build_string_3 large time: 5992 (per character: 2.56)
build_string_4 large time: 32450 (per character: 13.87)
build_string_5 large time: 15768 (per character: 6.74)
Same code, run as 32-bit:
build_string_1 small time: 4289 (per character: 476.56)
build_string_1a small time: 5967 (per character: 663.00)
build_string_2 small time: 3329 (per character: 369.89)
build_string_3 small time: 3047 (per character: 338.56)
build_string_4 small time: 22018 (per character: 2446.44)
build_string_5 small time: 3026 (per character: 336.22)
build_string_1 medium time: 4089 (per character: 52.42)
build_string_1a medium time: 8075 (per character: 103.53)
build_string_2 medium time: 4569 (per character: 58.58)
build_string_3 medium time: 4326 (per character: 55.46)
build_string_4 medium time: 22751 (per character: 291.68)
build_string_5 medium time: 2252 (per character: 28.87)
build_string_1 large time: 8695 (per character: 3.72)
build_string_1a large time: 12818 (per character: 5.48)
build_string_2 large time: 8202 (per character: 3.51)
build_string_3 large time: 8351 (per character: 3.57)
build_string_4 large time: 38250 (per character: 16.35)
build_string_5 large time: 8143 (per character: 3.48)
From this, we can conclude:
The best option is appending a bit at a time (
out.append()
orout +=
), with the "chained" approach reasonably close.Pre-allocating the string is not helpful.
Using
stringstream
is pretty poor idea (between 2-4x slower).The
char *
usesnew char[]
. Using a local variable in the calling function makes it the fastest - but slightly unfairly to compare that.There is a fair bit of overhead in combining short string - just copying data should be at most one cycle per byte [unless the data doesn't fit in the cache].
edit2
Added, as per comments:
string build_string_1b(const string &a, const string &b, const string &c)
{
return a + b + c;
}
and
string build_string_2a(const string &a, const string &b, const string &c)
{
string out;
out.reserve(a.length() * 3);
out += a;
out += b;
out += c;
return out;
}
Which gives these results:
build_string_1 small time: 3845 (per character: 427.22)
build_string_1b small time: 3165 (per character: 351.67)
build_string_2 small time: 3176 (per character: 352.89)
build_string_2a small time: 1904 (per character: 211.56)
build_string_1 large time: 9056 (per character: 3.87)
build_string_1b large time: 6414 (per character: 2.74)
build_string_2 large time: 6417 (per character: 2.74)
build_string_2a large time: 4179 (per character: 1.79)
(A 32-bit run, but the 64-bit shows very similar results on these).