How do objects work in x86 at the assembly level?
I'm trying to understand how objects work at the assembly level. How exactly are objects stored in memory, and how do member-functions access them?
(editor's note: the original version was way too broad, and had some confusion over how assembly and structs work in the first place.)
Classes are stored exactly the same way as structs, except when they have virtual members. In that case, there's an implicit vtable pointer as the first member (see below).
A struct is stored as a contiguous block of memory (if the compiler doesn't optimize it away or keep the member values in registers). Within a struct object, the addresses of its elements increase in the order in which the members were defined (source: http://en.cppreference.com/w/c/language/struct). I linked the C definition because in C++, struct means class (with public: as the default instead of private:).
Think of a struct or class as a block of bytes that might be too big to fit in a register, but which is copied around as a "value". Assembly language doesn't have a type system; bytes in memory are just bytes, and it doesn't take any special instructions to store a double from a floating-point register and reload it into an integer register, or to do an unaligned load and get the last 3 bytes of one int and the first byte of the next. A struct is just part of building C's type system on top of blocks of memory, since blocks of memory are useful.
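To make the "bytes are just bytes" point concrete, here is a minimal C++ sketch that copies the bytes of a double into a 64-bit integer. The names bits_of and check_bits are made up for illustration, and the expected bit pattern assumes IEEE-754 binary64 doubles, which is what x86 uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the 8 bytes of a double as an integer.
// std::memcpy is the well-defined way to do this in C++; optimizing compilers
// typically turn it into a single register-to-register move, not a call.
std::uint64_t bits_of(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Small self-check; call it from main() to run the example.
void check_bits() {
    // 1.0 is sign 0, biased exponent 0x3FF, mantissa 0 in IEEE-754 binary64.
    assert(bits_of(1.0) == 0x3FF0000000000000ULL);
}
```

Nothing about the memory changes here; only the type the C++ code uses to look at it does.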
These blocks of bytes can have static (global or static), dynamic (malloc or new), or automatic storage (local variable: temporary on the stack or in registers, in normal C/C++ implementations on normal CPUs). The layout within a block is the same regardless (unless the compiler optimizes away the actual memory for a struct local variable; see the example below of inlining a function that returns a struct).
A struct or class is the same as any other object. In C and C++ terminology, even an int is an object (http://en.cppreference.com/w/c/language/object), i.e. a contiguous block of bytes that you can memcpy around (except for C++ types that aren't trivially copyable).
The ABI rules for the system you're compiling for specify when and where padding is inserted to make sure each member has sufficient alignment, even if you do something like struct { char a; int b; };. (For example, the x86-64 System V ABI, used on Linux and other non-Windows systems, specifies that int is a 32-bit type that gets 4-byte alignment in memory. The ABI is what nails down some things that the C and C++ standards leave implementation-defined, so that all compilers for that ABI can make code that can call each other's functions.)
Note that you can use offsetof(struct_name, member) (from <stddef.h> in C or <cstddef> in C++) to find out about struct layout. See also alignof in C++11, or _Alignof in C11.
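As a rough illustration of those layout queries, the sketch below checks the struct { char a; int b; } example at compile time. The struct name CharThenInt is made up here, and the specific numbers are what the x86-64 System V ABI gives (4-byte int with 4-byte alignment); the C++ standard itself doesn't guarantee them.

```cpp
#include <cstddef>   // offsetof

// Hypothetical name for the struct { char a; int b; } example.
struct CharThenInt {
    char a;
    int  b;
};

// Values below assume an ABI like x86-64 System V, where int is 4 bytes
// with 4-byte alignment. Other ABIs may differ.
static_assert(offsetof(CharThenInt, a) == 0, "a sits at the start of the struct");
static_assert(offsetof(CharThenInt, b) == 4, "3 bytes of padding are inserted before b");
static_assert(alignof(CharThenInt) == alignof(int), "the struct inherits its strictest member alignment");
static_assert(sizeof(CharThenInt) == 8, "1 byte + 3 padding + 4 bytes");
```

If any of these assumptions is wrong for your target, the file simply fails to compile, which makes this a cheap way to probe an ABI.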
It's up to the programmer to order struct members well to avoid wasting space on padding, since C rules don't let the compiler sort your struct for you. (E.g. if you have some char members, put them in groups of at least 4, rather than alternating with wider members. Sorting from large to small is an easy rule, remembering that pointers may be 64 or 32-bit on common platforms.)
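The effect of member order can be sketched with two structs holding the same members. The names Interleaved and Grouped are invented for this example, and the sizes assume a 4-byte int with 4-byte alignment, as on x86-64 System V.

```cpp
// Same three members, different order. Sizes assume int is 4 bytes with
// 4-byte alignment (true for x86-64 System V; not guaranteed by the standard).

struct Interleaved {   // hypothetical: char members alternate with the int
    char c1;           // offset 0, then 3 bytes of padding
    int  i;            // offset 4
    char c2;           // offset 8, then 3 bytes of tail padding
};

struct Grouped {       // hypothetical: widest member first, chars together
    int  i;            // offset 0
    char c1;           // offset 4
    char c2;           // offset 5, then 2 bytes of tail padding
};

static_assert(sizeof(Interleaved) == 12, "1 + 3 pad + 4 + 1 + 3 pad");
static_assert(sizeof(Grouped) == 8, "4 + 1 + 1 + 2 pad");
```

The tail padding exists so that every element of an array of these structs keeps the int aligned.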
More details of ABIs and so on can be found at https://stackoverflow.com/tags/x86/info. Agner Fog's excellent site includes an ABI guide, along with optimization guides.
Classes (with member functions)
class foo {
int m_a;
int m_b;
void inc_a(void){ m_a++; }
int inc_b(void);
};
int foo::inc_b(void) { return m_b++; }
compiles to (using http://gcc.godbolt.org/):
foo::inc_b(): # args: this in RDI
mov eax, DWORD PTR [rdi+4] # eax = this->m_b
lea edx, [rax+1] # edx = eax+1
mov DWORD PTR [rdi+4], edx # this->m_b = edx
ret
As you can see, the this pointer is passed as an implicit first argument (in rdi, in the SysV AMD64 ABI). m_b is stored at 4 bytes from the start of the struct/class. Note the clever use of lea to implement the post-increment operator, leaving the old value in eax.
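One way to picture the implicit this argument is to write the same function twice: once as a member and once as a free function that takes the object pointer explicitly. This is only a sketch; inc_b_explicit and check_this are hypothetical names, and the members are made public and zero-initialized to keep the example short.

```cpp
#include <cassert>

struct foo {
    int m_a = 0;
    int m_b = 0;
    int inc_b() { return m_b++; }   // member function: `this` is implicit
};

// Hypothetical free-function equivalent: the object pointer is an ordinary
// first argument, which is how the member function looks at the asm level.
int inc_b_explicit(foo* self) { return self->m_b++; }

// Small self-check; call it from main() to run the example.
void check_this() {
    foo x;
    assert(x.inc_b() == 0);           // returns the old value, m_b becomes 1
    assert(inc_b_explicit(&x) == 1);  // same operation through an explicit pointer
    assert(x.m_b == 2);
}
```

On x86-64 System V, both versions would be expected to compile to the same instructions, since in both cases the pointer arrives in rdi.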
No code for inc_a is emitted, since it's defined inside the class definition and nothing in this file calls it. It's treated the same as an inline non-member function. If it were really big and the compiler decided not to inline it, it could emit a stand-alone version of it.
Where C++ objects really differ from C structs is when virtual member functions are involved. Each copy of the object has to carry around an extra pointer (to the vtable for its actual type).
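The cost of that extra pointer can be observed with sizeof. The class names below are made up for this sketch, and the exact size of 16 assumes a 64-bit target such as x86-64 with 8-byte pointers; the standard doesn't specify how virtual dispatch is implemented.

```cpp
// Two classes with identical data members; only one has a virtual function.

struct plain_foo {               // hypothetical name
    int m_a;
    int m_b;
    void inc_a() { m_a++; }      // non-virtual: no per-object cost
};

struct virtual_foo {             // hypothetical name
    int m_a;
    int m_b;
    virtual void inc_v() { m_b++; }   // virtual: each object carries a vtable pointer
};

static_assert(sizeof(plain_foo) == 2 * sizeof(int), "just the two ints");
static_assert(sizeof(virtual_foo) >= sizeof(plain_foo) + sizeof(void*),
              "room for the hidden vtable pointer");
// Assumes a 64-bit target such as x86-64: 8-byte vptr + 4 + 4.
static_assert(sizeof(virtual_foo) == 16, "vptr + m_a + m_b");
```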
class foo {
public:
int m_a;
int m_b;
void inc_a(void){ m_a++; }
void inc_b(void);
virtual void inc_v(void);
};
void foo::inc_b(void) { m_b++; }
class bar: public foo {
public:
virtual void inc_v(void); // overrides foo::inc_v even for users that access it through a pointer to class foo
};
void foo::inc_v(void) { m_b++; }
void bar::inc_v(void) { m_a++; }
compiles to
; This time I made the functions return void, so the asm is simpler
; The in-memory layout of the class is now:
; vtable ptr (8B)
; m_a (4B)
; m_b (4B)
foo::inc_v():
add DWORD PTR [rdi+12], 1 # this_2(D)->m_b,
ret
bar::inc_v():
add DWORD PTR [rdi+8], 1 # this_2(D)->D.2657.m_a,
ret
# if you uncheck the hide-directives box, you'll see
.globl foo::inc_b()
.set foo::inc_b(),foo::inc_v()
# since inc_b has the same definition as foo's inc_v, so gcc saves space by making one an alias for the other.
# you can also see the directives that define the data that goes in the vtables
Fun fact: add m32, imm8 is faster than inc m32 on most Intel CPUs (micro-fusion of the load+ALU uops); it's one of the rare cases where the old Pentium 4 advice to avoid inc still applies. gcc always avoids inc, though, even when it would save code size with no downsides. See also: INC instruction vs ADD 1: Does it matter?
Virtual function dispatch:
void caller(foo *p){
p->inc_v();
}
mov rax, QWORD PTR [rdi] # p_2(D)->_vptr.foo, p_2(D)->_vptr.foo
jmp [QWORD PTR [rax]] # *_3
(This is an optimized tailcall: jmp replacing call/ret.)
The mov loads the vtable address from the object into a register. The jmp is a memory-indirect jump, i.e. loading a new RIP value from memory. The jump-target address is vtable[0], i.e. the first function pointer in the vtable. If there was another virtual function, the mov wouldn't change but the jmp would use jmp [rax + 8].
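Seen from the C++ side, the same call site reaches different functions depending on the object's dynamic type. The sketch below reuses the foo/bar classes with the function bodies moved inline and the members zero-initialized for brevity; check_dispatch is a hypothetical helper added for illustration.

```cpp
#include <cassert>

class foo {
public:
    int m_a = 0;
    int m_b = 0;
    virtual void inc_v() { m_b++; }
};

class bar : public foo {
public:
    void inc_v() override { m_a++; }
};

// One call site: the target is looked up through the object's vtable pointer.
void caller(foo* p) { p->inc_v(); }

// Small self-check; call it from main() to run the example.
void check_dispatch() {
    foo f;
    bar b;
    caller(&f);   // dispatches to foo::inc_v
    caller(&b);   // same code, dispatches to bar::inc_v
    assert(f.m_b == 1 && f.m_a == 0);
    assert(b.m_a == 1 && b.m_b == 0);
}
```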
The order of entries in the vtable presumably matches the order of declaration in the class, so reordering the class declaration in one translation unit would result in virtual functions going to the wrong target. Just like reordering the data members would change the class's ABI.
If the compiler had more information, it could devirtualize the call. E.g. if it could prove that the foo * was always pointing to a bar object, it could inline bar::inc_v().
GCC will even speculatively devirtualize when it can figure out what the type probably is at compile time. In the above code, the compiler can't see any classes that inherit from bar, so it's a good bet that a bar* is pointing to a bar object rather than some derived class.
void caller_bar(bar *p){
p->inc_v();
}
# gcc5.5 -O3
caller_bar(bar*):
mov rax, QWORD PTR [rdi] # load vtable pointer
mov rax, QWORD PTR [rax] # load target function address
cmp rax, OFFSET FLAT:bar::inc_v() # check it
jne .L6 #,
add DWORD PTR [rdi+8], 1 # inlined version of bar::inc_v()
ret
.L6:
jmp rax # otherwise tailcall the derived class's function
Remember, a foo * can actually point to a derived bar object, but a bar * is not allowed to point to a pure foo object.
It is just a bet though; part of the point of virtual functions is that types can be extended without recompiling all the code that operates on the base type. This is why it has to compare the function pointer and fall back to the indirect call (jmp tailcall in this case) if it was wrong. Compiler heuristics decide when to attempt it.
Notice that it's checking the actual function pointer, rather than comparing the vtable pointer. It can still use the inlined bar::inc_v() as long as the derived type didn't override that virtual function. Overriding other virtual functions wouldn't affect this one, but would require a different vtable.
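Standard C++ has no way to read a vtable slot directly, so the sketch below models the mechanism by hand with a struct of function pointers. Every name in it (Obj, VTable, bar_inc_v, baz_inc_v, and so on) is invented for illustration; baz stands for some hypothetical class derived from bar that overrides inc_v. It is meant to show the shape of the guard in the gcc output above, not how a compiler is actually implemented.

```cpp
#include <cassert>

struct Obj;                            // stand-in for an object with a vtable
using Method = void (*)(Obj*);         // every method takes an explicit `this`

struct VTable { Method inc_v; };       // one slot per virtual function

struct Obj {
    const VTable* vptr;                // hand-written version of the hidden pointer
    int m_a;
    int m_b;
};

void bar_inc_v(Obj* self) { self->m_a++; }   // models bar::inc_v
void baz_inc_v(Obj* self) { self->m_b++; }   // models an override in a derived class

const VTable bar_vtable = { &bar_inc_v };
const VTable baz_vtable = { &baz_inc_v };

// Models caller_bar(bar*): load the slot, compare it against the expected
// target, run the inlined body on a match, otherwise call indirectly.
void caller_bar(Obj* p) {
    Method target = p->vptr->inc_v;    // two loads: vptr, then slot 0
    if (target == &bar_inc_v) {
        p->m_a++;                      // guessed right: inlined bar_inc_v body
    } else {
        target(p);                     // guessed wrong: ordinary indirect call
    }
}

// Small self-check; call it from main() to run the example.
void check_speculation() {
    Obj b   = { &bar_vtable, 0, 0 };
    Obj baz = { &baz_vtable, 0, 0 };
    caller_bar(&b);
    caller_bar(&baz);
    assert(b.m_a == 1 && b.m_b == 0);
    assert(baz.m_a == 0 && baz.m_b == 1);
}
```

Because the comparison is on the slot contents, a second table whose inc_v slot still holds &bar_inc_v would also take the fast path.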
Allowing extension without recompilation is handy for libraries, but also means looser coupling between parts of a big program (i.e. you don't have to include all the headers in every file).
But this imposes some efficiency costs for some uses: C++ virtual dispatch only works through pointers (or references) to objects, so you can't have a polymorphic array without hacks or expensive indirection through an array of pointers, which defeats a lot of hardware and software optimizations (see: Fastest implementation of simple, virtual, observer-sort of, pattern in c++?).
If you want some kind of polymorphism / dispatch but only for a closed set of types (i.e. all known at compile time), you can do it manually with a union + enum + switch, or with std::variant<D1,D2> to make a union and std::visit to dispatch, or various other ways. See also: Contiguous storage of polymorphic types, and Fastest implementation of simple, virtual, observer-sort of, pattern in c++?.
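A minimal sketch of the std::variant route, assuming C++17. D1 and D2 are given a single int member here just so there is something to operate on, and Any, Bump, total and check_variant are hypothetical names. The elements live contiguously in the vector, with no per-element pointer to chase.

```cpp
#include <cassert>
#include <variant>
#include <vector>

struct D1 { int value; };
struct D2 { int value; };

using Any = std::variant<D1, D2>;   // closed set of types: a tagged union

// Visitor with one overload per alternative; std::visit picks the right one
// based on the variant's stored tag.
struct Bump {
    int operator()(const D1& d) const { return d.value + 1; }
    int operator()(const D2& d) const { return d.value * 2; }
};

int total(const std::vector<Any>& items) {
    int sum = 0;
    for (const Any& item : items) {
        sum += std::visit(Bump{}, item);
    }
    return sum;
}

// Small self-check; call it from main() to run the example.
void check_variant() {
    std::vector<Any> items;
    items.push_back(D1{1});   // contributes 1 + 1
    items.push_back(D2{5});   // contributes 5 * 2
    assert(total(items) == 12);
}
```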
Objects aren't always stored in memory at all.
Using a struct doesn't force the compiler to actually put stuff in memory, any more than a small array or a pointer to a local variable does. For example, an inline function that returns a struct by value can still fully optimize.
The as-if rule applies: even if a struct logically has some memory storage, the compiler can make asm that keeps all the needed members in registers (and do transformations that mean that values in registers don't correspond to any value of a variable or temporary in the C++ abstract machine "running" the source code).
struct pair {
int m_a;
int m_b;
};
pair addsub(int a, int b) {
return {a+b, a-b};
}
int foo(int a, int b) {
pair ab = addsub(a,b);
return ab.m_a * ab.m_b;
}
That compiles (with g++ 5.4) to:
# The non-inline definition which actually returns a struct
addsub(int, int):
lea edx, [rdi+rsi] # add result
mov eax, edi
sub eax, esi # sub result
# then pack both struct members into a 64-bit register, as required by the x86-64 SysV ABI
sal rax, 32
or rax, rdx
ret
# But when inlining, it optimizes away
foo(int, int):
lea eax, [rdi+rsi] # a+b
sub edi, esi # a-b
imul eax, edi # (a+b) * (a-b)
ret
Notice how even returning a struct by value doesn't necessarily put it in memory. The x86-64 SysV ABI passes and returns small structs packed together into registers. Different ABIs make different choices for this.
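The packing done by the sal/or pair in addsub can be mimicked in C++ by viewing the 8 bytes of the struct as one 64-bit integer. This is only a sketch: as_register and check_packing are hypothetical names, and the expected values assume a little-endian target with 4-byte int, as on x86-64.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct pair {
    int m_a;
    int m_b;
};

constexpr pair addsub(int a, int b) {
    return {a + b, a - b};
}

// Hypothetical helper: the 8 bytes of a pair viewed as one 64-bit value,
// the way x86-64 SysV hands a struct like this back in RAX.
std::uint64_t as_register(pair p) {
    static_assert(sizeof(pair) == sizeof(std::uint64_t),
                  "assumes two 4-byte ints with no padding");
    std::uint64_t r;
    std::memcpy(&r, &p, sizeof r);
    return r;
}

// Small self-check; call it from main() to run the example.
void check_packing() {
    // addsub(5, 3) is {8, 2}: m_a in the low 32 bits, m_b in the high 32 bits
    // (little-endian), matching (a-b) << 32 | (a+b) in the asm.
    assert(as_register(addsub(5, 3)) == 0x0000000200000008ULL);
}
```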
(Sorry, I can't post this as "comment" to Peter Cordes' answer because of the code examples, so I have to post this as "answer".)
Old C++ compilers generated C code instead of assembly code. The following class:
class foo {
int m_a;
void inc_a(void);
...
};
... would result in the following C code:
struct _t_foo_functions {
void (*inc_a)(struct _class_foo *_this);
...
};
struct _class_foo {
struct _t_foo_functions *functions;
int m_a;
...
};
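A "class" becomes a "struct", and an "object" becomes a data item of the struct type. All functions have an additional argument in C (compared to C++): the "this" pointer. The first element of the "struct" is a pointer to a list of all functions of the class. (With modern compilers, only virtual member functions go through such a table; a non-virtual member function is an ordinary function that takes the "this" pointer and is called directly.)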
So the following C++ code:
m_x=1; // implicit this->m_x
thisMethod(); // implicit this->thisMethod()
myObject.m_a=5;
myObject.inc_a();
myObjectp->some_other_method(1,2,3);
... will look the following way in C:
_this->m_x=1;
_this->functions->thisMethod(_this);
myObject.m_a=5;
myObject.functions->inc_a(&myObject);
myObjectp->functions->some_other_method(myObjectp,1,2,3);
With those old compilers, the C code was then translated into assembler or machine code. You only need to know how structures are handled in assembler code and how calls to function pointers are handled...
Although modern compilers no longer convert C++ code to C code, the resulting assembler code still looks as if the C++-to-C step had been done first.
"new" and "delete" result in calls to memory functions (you may call "malloc" or "free" instead), plus the call of the constructor or destructor and the initialization of the structure elements.