`std::variant` vs. inheritance vs. other ways (performance)

std::visit seems to lack some optimizations yet on some implementations. That being said there's a central point thats not very well seen in this lab-like setup - which is that variant based design is stack based vs. the virtual inheritance pattern which will naturally gravitate towards being heap based. In a real world scenario this means the memory layout could very well be fragmented (perhaps over time - once objects leave the cache, etc.) - unless it can somehow be avoided. The opposite is the variant based design that can be layout in contigoues memory. I believe this is an extremely important point to consider when performance is concerned that cannot be underestimated.

To illustrate this, consider the following:

std::vector<Base*> runtime_poly_;//risk of fragmentation

vs.

std::vector<my_var_type> cp_time_poly_;//no fragmentation (but padding 'risk')

This fragmentation is somewhat difficult to built into a benchmark test like this one. If this is (also) within the context of bjarne's statement is not clear to me when he said it could potentially be faster (which I do believe holds true).

Another very important thing to remember for the std::variant based design is that the size of each element uses up the size of the largest possible element. Therefore if objects do not have roughly the same size this has to be considered carefully since it may have a bad impact on the cache as a result.

Considering these points together it's hard to say which is best to use in the general case - however it should be clear enough if the set is a closed 'smallish' one of roughtly the same size - then the variant style shows great potential for being faster (as bjarne notes).

We now only considered performance and there are indeed other reasons for choosing one or the other pattern: In the end, you just have to get out the comfort of the 'lab' and design and benchmark your real world use cases.


You can match them all with a visit implementation if you can guarantee that the variant will never be empty by exception. Here is a single visitation visitor that matches the virtual above and inlines very well with jmp tables. https://gcc.godbolt.org/z/kkjACx

struct overload : Fs... {
  using Fs::operator()...;
};

template <typename... Fs>
overload(Fs...) -> overload<Fs...>;

template <size_t N, typename R, typename Variant, typename Visitor>
[[nodiscard]] constexpr R visit_nt(Variant &&var, Visitor &&vis) {
  if constexpr (N == 0) {
    if (N == var.index()) {
      // If this check isnt there the compiler will generate
      // exception code, this stops that
      return std::forward<Visitor>(vis)(
          std::get<N>(std::forward<Variant>(var)));
    }
  } else {
    if (var.index() == N) {
      return std::forward<Visitor>(vis)(
          std::get<N>(std::forward<Variant>(var)));
    }
    return visit_nt<N - 1, R>(std::forward<Variant>(var),
                              std::forward<Visitor>(vis));
  }
  while (true) {
  }  // unreachable but compilers complain
}

template <class... Args, typename Visitor, typename... Visitors>
[[nodiscard]] constexpr decltype(auto) visit_nt(
    std::variant<Args...> const &var, Visitor &&vis, Visitors &&... visitors) {
  auto ol =
      overload{std::forward<Visitor>(vis), std::forward<Visitors>(visitors)...};
  using result_t = decltype(std::invoke(std::move(ol), std::get<0>(var)));

  static_assert(sizeof...(Args) > 0);
  return visit_nt<sizeof...(Args) - 1, result_t>(var, std::move(ol));
}

template <class... Args, typename Visitor, typename... Visitors>
[[nodiscard]] constexpr decltype(auto) visit_nt(std::variant<Args...> &var,
                                                Visitor &&vis,
                                                Visitors &&... visitors) {
  auto ol =
      overload(std::forward<Visitor>(vis), std::forward<Visitors>(visitors)...);
  using result_t = decltype(std::invoke(std::move(ol), std::get<0>(var)));

  static_assert(sizeof...(Args) > 0);
  return visit_nt<sizeof...(Args) - 1, result_t>(var, std::move(ol));
}

template <class... Args, typename Visitor, typename... Visitors>
[[nodiscard]] constexpr decltype(auto) visit_nt(std::variant<Args...> &&var,
                                                Visitor &&vis,
                                                Visitors &&... visitors) {
  auto ol =
      overload{std::forward<Visitor>(vis), std::forward<Visitors>(visitors)...};
  using result_t =
      decltype(std::invoke(std::move(ol), std::move(std::get<0>(var))));

  static_assert(sizeof...(Args) > 0);
  return visit_nt<sizeof...(Args) - 1, result_t>(std::move(var), std::move(ol));
}

template <typename Value, typename... Visitors>
inline constexpr bool is_visitable_v = (std::is_invocable_v<Visitors, Value> or
                                        ...);

You call it with the variant first, followed by the visitors. Here is the Update 6 quickbench with it added Quickbench benchmark showing performance of  visit_nt. A link to the bench is here http://quick-bench.com/98aSbU0wWUsym0ej-jLy1POmCBw

So with that, I think the decision of whether or not to visit comes down to what is more expressive and clear in intent. The performance can be achieved either way.


Based on Update 6 http://quick-bench.com/LhdP-9y6CqwGxB-WtDlbG27o_5Y

I guess we can't compare time but relative to each other results seems different enough to show choices in the library implementation.

  • Visual 2019 v16.8.3

  • cl 19.28.29335 x64

  • compile in /std:c++17

     Run on (8 X 3411 MHz CPU s)
        CPU Caches:
        L1 Data 32 KiB (x4)
        L1 Instruction 32 KiB (x4)
        L2 Unified 256 KiB (x4)
        L3 Unified 8192 KiB (x1)
    
     -------------------------------------------------------------------
     Benchmark                         Time             CPU   Iterations
     -------------------------------------------------------------------
     TradeSpaceForPerformance       5.41 ns         5.47 ns    100000000
     Virtual                        11.2 ns         10.9 ns     56000000
     FunctionPointerList            13.2 ns         13.1 ns     56000000
     Index                          4.37 ns         4.37 ns    139377778
     GetIf                          4.79 ns         4.87 ns    144516129
     HoldsAlternative               5.08 ns         5.16 ns    100000000
     ConstexprVisitor               4.16 ns         4.14 ns    165925926
     StructVisitor                  4.26 ns         4.24 ns    165925926
     Overload                       4.21 ns         4.24 ns    165925926