How do objects work in x86 at the assembly level?

I'm trying to understand how objects work at the assembly level. How exactly are objects stored in memory, and how do member-functions access them?

(editor's note: the original version was way too broad, and had some confusion over how assembly and structs work in the first place.)

Classes are stored exactly the same way as structs, except when they have virtual members. In that case, there's an implicit vtable pointer as the first member (see below).

A struct is stored as a contiguous block of memory (if the compiler doesn't optimize it away or keep the member values in registers). Within a struct object, addresses of its elements increase in order in which the members were defined. (source: http://en.cppreference.com/w/c/language/struct). I linked the C definition, because in C++ struct means class (with public: as the default instead of private:).

Think of a struct or class as a block of bytes that might be too big to fit in a register, but which is copied around as a "value". Assembly language doesn't have a type system; bytes in memory are just bytes and it doesn't take any special instructions to store a double from a floating point register and reload it into an integer register. Or to do an unaligned load and get the last 3 bytes of 1 int and the first byte of the next. A struct is just part of building C's type system on top of blocks of memory, since blocks of memory are useful.

These blocks of bytes can have static (global or static), dynamic (malloc or new), or automatic storage (local variable: temporary on the stack or in registers, in normal C/C++ implementations on normal CPUs). The layout within a block is the same regardless (unless the compiler optimizes away the actual memory for a struct local variable; see the example below of inlining a function that returns a struct.)

A struct or class is the same as any other object. In C and C++ terminology, even an int is an object: http://en.cppreference.com/w/c/language/object. i.e. A contiguous block of bytes that you can memcpy around (except for non-POD types in C++).

The ABI rules for the system you're compiling for specify when and where padding is inserted to make sure each member has sufficient alignment even if you do something like struct { char a; int b; }; (for example, the x86-64 System V ABI, used on Linux and other non-Windows systems specifies that int is a 32-bit type that gets 4-byte alignment in memory. The ABI is what nails down some stuff that the C and C++ standards leave "implementation dependent", so that all compilers for that ABI can make code that can call each other's functions.)

Note that you can use offsetof(struct_name, member) to find out about struct layout (in C11 and C++11). See also alignof in C++11, or _Alignof in C11.

It's up to the programmer to order struct members well to avoid wasting space on padding, since C rules don't let the compiler sort your struct for you. (e.g. if you have some char members, put them in groups of at least 4, rather than alternating with wider members. Sorting from large to small is an easy rule, remembering that pointers may be 64 or 32-bit on common platforms.)

More details of ABIs and so on can be found at https://stackoverflow.com/tags/x86/info. Agner Fog's excellent site includes an ABI guide, along with optimization guides.

Classes (with member functions)

class foo {
  int m_a;
  int m_b;
  void inc_a(void){ m_a++; }
  int inc_b(void);
};

int foo::inc_b(void) { return m_b++; }

compiles to (using http://gcc.godbolt.org/):

foo::inc_b():                  # args: this in RDI
    mov eax, DWORD PTR [rdi+4]      # eax = this->m_b
    lea edx, [rax+1]                # edx = eax+1
    mov DWORD PTR [rdi+4], edx      # this->m_b = edx
    ret

As you can see, the this pointer is passed as an implicit first argument (in rdi, in the SysV AMD64 ABI). m_b is stored at 4 bytes from the start of the struct/class. Note the clever use of lea to implement the post-increment operator, leaving the old value in eax.

No code for inc_a is emitted, since it's defined inside the class declaration. It's treated the same as an inline non-member function. If it was really big and the compiler decided not to inline it, it could emit a stand-alone version of it.

Where C++ objects really differ from C structs is when virtual member functions are involved. Each copy of the object has to carry around an extra pointer (to the vtable for its actual type).

class foo {
  public:
  int m_a;
  int m_b;
  void inc_a(void){ m_a++; }
  void inc_b(void);
  virtual void inc_v(void);
};

void foo::inc_b(void) { m_b++; }

class bar: public foo {
 public:
  virtual void inc_v(void);  // overrides foo::inc_v even for users that access it through a pointer to class foo
};

void foo::inc_v(void) { m_b++; }
void bar::inc_v(void) { m_a++; }

compiles to

  ; This time I made the functions return void, so the asm is simpler
  ; The in-memory layout of the class is now:
  ;   vtable ptr (8B)
  ;   m_a (4B)
  ;   m_b (4B)
foo::inc_v():
    add DWORD PTR [rdi+12], 1   # this_2(D)->m_b,
    ret
bar::inc_v():
    add DWORD PTR [rdi+8], 1    # this_2(D)->D.2657.m_a,
    ret

    # if you uncheck the hide-directives box, you'll see
    .globl  foo::inc_b()
    .set    foo::inc_b(),foo::inc_v()
    # since inc_b has the same definition as foo's inc_v, so gcc saves space by making one an alias for the other.

    # you can also see the directives that define the data that goes in the vtables

Fun fact: add m32, imm8 is faster than inc m32 on most Intel CPUs (micro-fusion of the load+ALU uops); one of the rare cases where the old Pentium4 advice to avoid inc still applies. gcc always avoids inc, though, even when it would save code size with no downsides :/ INC instruction vs ADD 1: Does it matter?

Virtual function dispatch:

void caller(foo *p){
    p->inc_v();
}

    mov     rax, QWORD PTR [rdi]      # p_2(D)->_vptr.foo, p_2(D)->_vptr.foo
    jmp     [QWORD PTR [rax]]         # *_3

(This is an optimized tailcall: jmp replacing call/ret).

The mov loads the vtable address from the object into a register. The jmp is a memory-indirect jump, i.e. loading a new RIP value from memory. The jump-target address is vtable[0], i.e. the first function pointer in the vtable. If there was another virtual function, the mov wouldn't change but the jmp would use jmp [rax + 8].

The order of entries in the vtable presumably matches the order of declaration in the class, so reordering the class declaration in one translation unit would result in virtual functions going to the wrong target. Just like reordering the data members would change the class's ABI.

If the compiler had more information, it could devirtualize the call. e.g. if it could prove that the foo * was always pointing to a bar object, it could inline bar::inc_v().

GCC will even speculatively devirtualize when it can figure out what the type probably is at compile time. In the above code, the compiler can't see any classes that inherit from bar, so it's a good bet that bar* is pointing to a bar object, rather than some derived class.

void caller_bar(bar *p){
    p->inc_v();
}

# gcc5.5 -O3
caller_bar(bar*):
    mov     rax, QWORD PTR [rdi]      # load vtable pointer
    mov     rax, QWORD PTR [rax]      # load target function address
    cmp     rax, OFFSET FLAT:bar::inc_v()  # check it
    jne     .L6       #,
    add     DWORD PTR [rdi+8], 1      # inlined version of bar::inc_v()
    ret
.L6:
    jmp     rax               # otherwise tailcall the derived class's function

Remember, a foo * can actually point to a derived bar object, but a bar * is not allowed to point to a pure foo object.

It is just a bet though; part of the point of virtual functions is that types can be extended without recompiling all the code that operates on the base type. This is why it has to compare the function pointer and fall back to the indirect call (jmp tailcall in this case) if it was wrong. Compiler heuristics decide when to attempt it.

Notice that it's checking the actual function pointer, rather than comparing the vtable pointer. It can still use the inlined bar::inc_v() as long as the derived type didn't override that virtual function. Overriding other virtual functions wouldn't affect this one, but would require a different vtable.

Allowing extension without recompilation is handy for libraries, but also means looser coupling between parts of a big program (i.e. you don't have to include all the headers in every file).

But this imposes some efficiency costs for some uses: C++ virtual dispatch only works through pointers to objects, so you can't have a polymorphic array without hacks, or expensive indirection through an array of pointers (which defeats a lot of hardware and software optimizations: Fastest implementation of simple, virtual, observer-sort of, pattern in c++?).

If you want some kind of polymorphism / dispatch but only for a closed set of types (i.e. all known at compile time), you can do it manually with a union + enum + switch, or with std::variant<D1,D2> to make a union and std::visit to dispatch, or various other ways. See also Contiguous storage of polymorphic types and Fastest implementation of simple, virtual, observer-sort of, pattern in c++?.

Objects aren't always stored in memory at all.

Using a struct doesn't force the compiler to actually put stuff in memory, any more than a small array or a pointer to a local variable does. For example, an inline function that returns a struct by value can still fully optimize.

The as-if rule applies: even if a struct logically has some memory storage, the compiler can make asm that keeps all the needed members in registers (and do transformations that mean that values in registers don't correspond to any value of a variable or temporary in the C++ abstract machine "running" the source code).

struct pair {
  int m_a;
  int m_b;
};

pair addsub(int a, int b) {
  return {a+b, a-b};
}

int foo(int a, int b) {
  pair ab = addsub(a,b);
  return ab.m_a * ab.m_b;
}

That compiles (with g++ 5.4) to:

# The non-inline definition which actually returns a struct
addsub(int, int):
    lea     edx, [rdi+rsi]  # add result
    mov     eax, edi
    sub     eax, esi        # sub result
                            # then pack both struct members into a 64-bit register, as required by the x86-64 SysV ABI
    sal     rax, 32
    or      rax, rdx
    ret

# But when inlining, it optimizes away
foo(int, int):
    lea     eax, [rdi+rsi]    # a+b
    sub     edi, esi          # a-b
    imul    eax, edi          # (a+b) * (a-b)
    ret

Notice how even returning a struct by value doesn't necessarily put it in memory. The x86-64 SysV ABI passes and returns small structs packed together into registers. Different ABIs make different choices for this.

(Sorry, I can't post this as "comment" to Peter Cordes' answer because of the code examples, so I have to post this as "answer".)

Old C++ compilers generated C code instead of assembly code. The following class:

class foo {
  int m_a;
  void inc_a(void);
  ...
};

... would result in the following C code:

struct _t_foo_functions {
  void (*inc_a)(struct _class_foo *_this);
  ...
};
struct _class_foo {
  struct _t_foo_functions *functions;
  int m_a;
  ...
};

A "class" becomes a "struct", an "object" becomes a data item of the struct type. All functions have an additional element in C (compared to C++): The "this" pointer. The first element of the "struct" is a pointer to a list of all functions of the class.

So the following C++ code:

m_x=1; // implicit this->m_x
thisMethod(); // implicit this->thisMethod()
myObject.m_a=5;
myObject.inc_a();
myObjectp->some_other_method(1,2,3);

... will look the following way in C:

_this->m_x=1;
_this->functions->thisMethod(_this);
myObject.m_a=5;
myObject.functions->inc_a(&myObject);
myObjectp->functions->some_other_method(myObjectp,1,2,3);

Using those old compilers the C code was translated into assembler or machine code. You only need to know how structures are handled in assembler code and how calls to function pointers are handled...

Although modern compilers do no longer convert C++ code to C code the resulting assembler code still looks the same way as if you would do the C++-to-C step first.

"new" and "delete" will result in a function calls to memory functions (you may call "malloc" or "free" instead), the call of the constructor or destructor and the initialization of the structure elements.

How do objects work in x86 at the assembly level?

Classes (with member functions)

Virtual function dispatch:

Objects aren't always stored in memory at all.

Related

Recent Posts