C++ Tuple vs Struct
Is there is any difference between using a std::tuple
and a data-only struct
?
typedef std::tuple<int, double, bool> foo_t;
struct bar_t {
int id;
double value;
bool dirty;
}
From what I have found online, I found that there are two major differences: the struct
is more readable, while the tuple
has many generic functions that can be used.
Should there be any significant performance difference?
Also, is the data layout compatible with each other (interchangeably casted)?
Solution 1:
We have a similar discussion about tuple and struct and I write some simple benchmarks with the help from one of my colleague to identify the differences in term of performance between tuple and struct. We first start with a default struct and a tuple.
struct StructData {
int X;
int Y;
double Cost;
std::string Label;
bool operator==(const StructData &rhs) {
return std::tie(X,Y,Cost, Label) == std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
}
bool operator<(const StructData &rhs) {
return X < rhs.X || (X == rhs.X && (Y < rhs.Y || (Y == rhs.Y && (Cost < rhs.Cost || (Cost == rhs.Cost && Label < rhs.Label)))));
}
};
using TupleData = std::tuple<int, int, double, std::string>;
We then use Celero to compare the performance of our simple struct and tuple. Below is the benchmark code and performance results collected using gcc-4.9.2 and clang-4.0.0:
std::vector<StructData> test_struct_data(const size_t N) {
std::vector<StructData> data(N);
std::transform(data.begin(), data.end(), data.begin(), [N](auto item) {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(0, N);
item.X = dis(gen);
item.Y = dis(gen);
item.Cost = item.X * item.Y;
item.Label = std::to_string(item.Cost);
return item;
});
return data;
}
std::vector<TupleData> test_tuple_data(const std::vector<StructData> &input) {
std::vector<TupleData> data(input.size());
std::transform(input.cbegin(), input.cend(), data.begin(),
[](auto item) { return std::tie(item.X, item.Y, item.Cost, item.Label); });
return data;
}
constexpr int NumberOfSamples = 10;
constexpr int NumberOfIterations = 5;
constexpr size_t N = 1000000;
auto const sdata = test_struct_data(N);
auto const tdata = test_tuple_data(sdata);
CELERO_MAIN
BASELINE(Sort, struct, NumberOfSamples, NumberOfIterations) {
std::vector<StructData> data(sdata.begin(), sdata.end());
std::sort(data.begin(), data.end());
// print(data);
}
BENCHMARK(Sort, tuple, NumberOfSamples, NumberOfIterations) {
std::vector<TupleData> data(tdata.begin(), tdata.end());
std::sort(data.begin(), data.end());
// print(data);
}
Performance results collected with clang-4.0.0
Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort | struct | Null | 10 | 5 | 1.00000 | 196663.40000 | 5.08 |
Sort | tuple | Null | 10 | 5 | 0.92471 | 181857.20000 | 5.50 |
Complete.
And performance results collected using gcc-4.9.2
Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort | struct | Null | 10 | 5 | 1.00000 | 219096.00000 | 4.56 |
Sort | tuple | Null | 10 | 5 | 0.91463 | 200391.80000 | 4.99 |
Complete.
From the above results we can clearly see that
Tuple is faster than a default struct
Binary produce by clang has higher performance that that of gcc. clang-vs-gcc is not the purpose of this discussion so I won't dive into the detail.
We all know that writing a == or < or > operator for every single struct definition will be a painful and buggy task. Let replace our custom comparator using std::tie and rerun our benchmark.
bool operator<(const StructData &rhs) {
return std::tie(X,Y,Cost, Label) < std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
}
Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort | struct | Null | 10 | 5 | 1.00000 | 200508.20000 | 4.99 |
Sort | tuple | Null | 10 | 5 | 0.90033 | 180523.80000 | 5.54 |
Complete.
Now we can see that using std::tie makes our code more elegant and it is harder to make mistake, however, we will loose about 1% performance. I will stay with the std::tie solution for now since I also receive a warning about comparing floating point numbers with the customized comparator.
Until now we have not has any solution to make our struct code run faster yet. Let take a look at the swap function and rewrite it to see if we can gain any performance:
struct StructData {
int X;
int Y;
double Cost;
std::string Label;
bool operator==(const StructData &rhs) {
return std::tie(X,Y,Cost, Label) == std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
}
void swap(StructData & other)
{
std::swap(X, other.X);
std::swap(Y, other.Y);
std::swap(Cost, other.Cost);
std::swap(Label, other.Label);
}
bool operator<(const StructData &rhs) {
return std::tie(X,Y,Cost, Label) < std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
}
};
Performance results collected using clang-4.0.0
Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort | struct | Null | 10 | 5 | 1.00000 | 176308.80000 | 5.67 |
Sort | tuple | Null | 10 | 5 | 1.02699 | 181067.60000 | 5.52 |
Complete.
And the performance results collected using gcc-4.9.2
Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
Group | Experiment | Prob. Space | Samples | Iterations | Baseline | us/Iteration | Iterations/sec |
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort | struct | Null | 10 | 5 | 1.00000 | 198844.80000 | 5.03 |
Sort | tuple | Null | 10 | 5 | 1.00601 | 200039.80000 | 5.00 |
Complete.
Now our struct is slightly faster than that of a tuple now (around 3% with clang and less than 1% with gcc), however, we do need to write our customized swap function for all of our structs.
Solution 2:
If you're using several different tuples in your code you can get away with condensing the number of functors you are using. I say this because I've often used the following forms of functors:
template<int N>
struct tuple_less{
template<typename Tuple>
bool operator()(const Tuple& aLeft, const Tuple& aRight) const{
typedef typename boost::tuples::element<N, Tuple>::type value_type;
BOOST_CONCEPT_REQUIRES((boost::LessThanComparable<value_type>));
return boost::tuples::get<N>(aLeft) < boost::tuples::get<N>(aRight);
}
};
This might seem like overkill but for each place within the struct I'd have to make a whole new functor object using a struct but for a tuple, I just change N
. Better than that, I can do this for every single tuple as opposed to creating a whole new functor for each struct and for each member variable. If I have N structs with M member variables that NxM functors I would need to create (worse case scenario) that can be condensed down to one little bit of code.
Naturally, if you're going to go with the Tuple way, you're also going to need to create Enums for working with them:
typedef boost::tuples::tuple<double,double,double> JackPot;
enum JackPotIndex{
MAX_POT,
CURRENT_POT,
MIN_POT
};
and boom, you're code is completely readable:
double guessWhatThisIs = boost::tuples::get<CURRENT_POT>(someJackPotTuple);
because it describes itself when you want to get the items contained within it.
Solution 3:
Tuple has built in default(for == and != it compares every element, for <.<=... compares first, if same compares second...) comparators: http://en.cppreference.com/w/cpp/utility/tuple/operator_cmp
edit: as noted in the comment C++20 spaceship operator gives you a way to specify this functionality with one(ugly, but still just one) line of code.
Solution 4:
Well, here's a benchmark that doesn't construct a bunch of tuples inside the struct operator==(). Turns out there's a pretty significant performance impact from using tuple, as one would expect given that there's no performance impact at all from using PODs. (The address resolver finds the value in the instruction pipeline before the logic unit ever even sees it.)
Common results from running this on my machine with VS2015CE using the default 'Release' settings:
Structs took 0.0814905 seconds.
Tuples took 0.282463 seconds.
Please monkey with it until you're satisfied.
#include <iostream>
#include <string>
#include <tuple>
#include <vector>
#include <random>
#include <chrono>
#include <algorithm>
class Timer {
public:
Timer() { reset(); }
void reset() { start = now(); }
double getElapsedSeconds() {
std::chrono::duration<double> seconds = now() - start;
return seconds.count();
}
private:
static std::chrono::time_point<std::chrono::high_resolution_clock> now() {
return std::chrono::high_resolution_clock::now();
}
std::chrono::time_point<std::chrono::high_resolution_clock> start;
};
struct ST {
int X;
int Y;
double Cost;
std::string Label;
bool operator==(const ST &rhs) {
return
(X == rhs.X) &&
(Y == rhs.Y) &&
(Cost == rhs.Cost) &&
(Label == rhs.Label);
}
bool operator<(const ST &rhs) {
if(X > rhs.X) { return false; }
if(Y > rhs.Y) { return false; }
if(Cost > rhs.Cost) { return false; }
if(Label >= rhs.Label) { return false; }
return true;
}
};
using TP = std::tuple<int, int, double, std::string>;
std::pair<std::vector<ST>, std::vector<TP>> generate() {
std::mt19937 mt(std::random_device{}());
std::uniform_int_distribution<int> dist;
constexpr size_t SZ = 1000000;
std::pair<std::vector<ST>, std::vector<TP>> p;
auto& s = p.first;
auto& d = p.second;
s.reserve(SZ);
d.reserve(SZ);
for(size_t i = 0; i < SZ; i++) {
s.emplace_back();
auto& sb = s.back();
sb.X = dist(mt);
sb.Y = dist(mt);
sb.Cost = sb.X * sb.Y;
sb.Label = std::to_string(sb.Cost);
d.emplace_back(std::tie(sb.X, sb.Y, sb.Cost, sb.Label));
}
return p;
}
int main() {
Timer timer;
auto p = generate();
auto& structs = p.first;
auto& tuples = p.second;
timer.reset();
std::sort(structs.begin(), structs.end());
double stSecs = timer.getElapsedSeconds();
timer.reset();
std::sort(tuples.begin(), tuples.end());
double tpSecs = timer.getElapsedSeconds();
std::cout << "Structs took " << stSecs << " seconds.\nTuples took " << tpSecs << " seconds.\n";
std::cin.get();
}