Examples of when a bitwise swap() is a bad idea?

This is not specifically about swap but an example showing that low level optimizations are maybe not worth the trouble. The compiler often figures it out anyway.

Of course, this is my favorite example where the compiler is exceptionally lucky, but anyway we shouldn't assume that compilers are stupid and that we can easily improve on the generated code with some simple tricks.

My test code is - construct a std::string and copy it.

std::string whatever = "abcdefgh";
std::string whatever2 = whatever;

The first constructor looks like this

  basic_string(const value_type* _String,
               const allocator_type& _Allocator = allocator_type() ) : _Parent(_Allocator)
  {
     const size_type _StringSize = traits_type::length(_String);

     if (_MySmallStringCapacity < _StringSize)
     {
        _AllocateAndCopy(_String, _StringSize);
     }
     else
     {
        traits_type::copy(_MySmallString._Buffer, _String, _StringSize);

        _SetSmallStringCapacity();
        _SetSize(_StringSize);
     }
  }

The generated code is

   std::string whatever = "abcdefgh";
000000013FCC30C3  mov         rdx,qword ptr [string "abcdefgh" (13FD07498h)]  
000000013FCC30CA  mov         qword ptr [whatever],rdx  
000000013FCC30D2  mov         byte ptr [rsp+347h],0  
000000013FCC30DA  mov         qword ptr [rsp+348h],8  
000000013FCC30E6  mov         byte ptr [rsp+338h],0  

Here traits_type::copycontains a call to memcpy, which is optimized into a single register copy of the whole string (carefully selected to fit). The compiler also transforms a call to strlen into a compile time 8.

Then we copy it into a new string. The copy constructor looks like this

  basic_string(const basic_string& _String)
     : _Parent(std::allocator_traits<allocator_type>::select_on_container_copy_construction(_String._MyAllocator))
  {
     if (_MySmallStringCapacity < _String.size())
     {
        _AllocateAndCopy(_String);
     }
     else
     {
        traits_type::copy(_MySmallString._Buffer, _String.data(), _String.size());

        _SetSmallStringCapacity();
        _SetSize(_String.size());
     }
  }

and results in just 4 machine instructions:

   std::string whatever2 = whatever;
000000013FCC30EE  mov         qword ptr [whatever2],rdx  
000000013FCC30F6  mov         byte ptr [rsp+6CFh],0  
000000013FCC30FE  mov         qword ptr [rsp+6D0h],8  
000000013FCC310A  mov         byte ptr [rsp+6C0h],0  

Note that the optimizer remembers that the char's are still in register rdx and that the string length must be the same, 8.

It is after seeing things like this that I like to trust my compiler, and avoid trying to improve code with bit fiddling. It doesn't help, unless profiling finds an unexpected bottleneck.

(featuring MSVC 10 and my std::string implementation)


I'm going to argue that this is almost always a bad idea except in the specific case where profiling has been done and a more obvious and clear implementation of swap has performance problems. Even in that case I would only go with this sort of approach for straight up no-inheritance structures, never for any sort of class. You never know when inheritance will be added potentially breaking the whole thing (possibly in truly insidious ways too).

If you want a fast swap implementation perhaps a better choice (where appropriate) is to pimpl the class and then just swap out the implementation (again, this assumes that there are no back-pointers to the owner, but that's easily contained to the class & impl rather than external factors).

EDIT: Possible problems with this approach:

  • Pointers back to self (directly or indirectly)
  • If the class contains any object for which a straight byte-copy is meaningless (effectively recursing this definition) or for which copying is normally disabled
  • If the class needs any sort of locking to copy
  • It's easy to accidentally pass in two different types here (all it takes is one intermediate function to implicitly make a derived class look like the parent) and then you swap vptrs (OUCH!)

Why are "self-pointers" contrived?

class RingBuffer
{
    // ...
private:
    char buffer[1024];
    char* curr;
};

This type holds a buffer and a current position into the buffer.

Or maybe you've heard of iostreams:

class streambuf
{
  char buffer[64];
  char* put_ptr;
  char* get_ptr;
  // ...
};

As someone else mentioned, the small string optimisation:

// untested, probably buggy!
class String {
  union {
    char buf[8];
    char* ptr;
  } data;
  unsigned len;
  unsigned capacity;
  char* str;
public:
  String(const char* s, unsigned n)
  {
    if (n > sizeof(data.buf)-1) {
      str = new char[n+1];
      len = capacity = n;
    }
    else
    {
      str = data.buf;
      len = n;
      capacity = sizeof(data.buf) - 1;
    } 
    memcpy(str, s, n);
    str[n] = '\0';
  }
  ~String()
  {
    if (str != data.buf)
      delete[] str;
  }
  const char* c_str() const { return str; }
  // ...
};

This has a self-pointer too. If you construct two small strings then swap them, the destructors will both decide the string is "non-local" and try to delete the memory:

{
  String s1("foo", 3);
  String s2("bar", 3);
  bad_swap(s1, s2);
}  // BOOM! destructors delete stack memory

Valgrind says:

==30214== Memcheck, a memory error detector
==30214== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==30214== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==30214== Command: ./a.out
==30214== 
==30214== Invalid free() / delete / delete[]
==30214==    at 0x4A05E9C: operator delete[](void*) (vg_replace_malloc.c:409)
==30214==    by 0x40083F: String::~String() (in /dev/shm/a.out)
==30214==    by 0x400737: main (in /dev/shm/a.out)
==30214==  Address 0x7fefffd00 is on thread 1's stack
==30214== 
==30214== Invalid free() / delete / delete[]
==30214==    at 0x4A05E9C: operator delete[](void*) (vg_replace_malloc.c:409)
==30214==    by 0x40083F: String::~String() (in /dev/shm/a.out)
==30214==    by 0x400743: main (in /dev/shm/a.out)
==30214==  Address 0x7fefffce0 is on thread 1's stack

So that shows it affects types like std::streambuf and std::string, hardly contrived or esoteric examples.

Basically, bad_swap is never a good idea, if the types are trivially-copyable then the default std::swap will be optimal (of your compiler doesn't optimise it to memcpy then get a better compiler) and if they're not trivially-copyable it's a great way to meet Mr. Undefined Behaviour and his friend Mr. Serious Bug.


Besides the examples mentioned in other answers (particulary objects containing pointers to parts of themselves and objects needing locking), there could also be a case of pointers to the object being managed by an external datastructure, which needs to be updated accordingly (please note that the example is somewhat contrived in order to not be excessive (and maybe buggy by virtue of not having been tested)):

class foo
{
private:
   static std::map<foo*, int> foo_data;
public:
   foo() { foo_data.emplace(this, 0); }
   foo(const foo& f) { foo_data.emplace(this, foo_data[&f]); }
   foo& operator=(const foo& f) { foo_data[this] = foo_data[&f]; return *this}
   ~foo() { foo_data.erase(this); }
   ...
};

obviously something like this would break badly if objects are swapped by memcpy. Of course real world examples for this are typically somewhat more complex, but the point should be clear.

Besides the examples I think that copying (or swapping) non trivially copyable objects like this is undefined behaviour by the standard (might check that later). In that case there would be no guarantee at all for that code to work with more complex objects.