stdcall and cdecl
Solution 1:
Raymond Chen gives a nice overview of what __stdcall and __cdecl do.
(1) The caller "knows" to clean up the stack after calling a function because the compiler knows the calling convention of that function and generates the necessary code.
void __stdcall StdcallFunc() {}
void __cdecl CdeclFunc()
{
// The compiler knows that StdcallFunc() uses the __stdcall
// convention at this point, so it generates the proper binary
// for stack cleanup.
StdcallFunc();
}
It is possible to mismatch the calling convention, like this:
LRESULT MyWndProc(HWND hwnd, UINT msg,
WPARAM wParam, LPARAM lParam);
// ...
// Compiler usually complains but there's this cast here...
windowClass.lpfnWndProc = reinterpret_cast<WNDPROC>(&MyWndProc);
So many code samples get this wrong it's not even funny. It's supposed to be like this:
// CALLBACK is #define'd as __stdcall
LRESULT CALLBACK MyWndProc(HWND hwnd, UINT msg,
WPARAM wParam, LPARAM lParam);
// ...
windowClass.lpfnWndProc = &MyWndProc;
However, assuming the programmer doesn't ignore compiler errors, the compiler will generate the code needed to clean up the stack properly since it'll know the calling conventions of the functions involved.
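To see why the cast is the real culprit, here is a minimal sketch that does not need the Windows headers. DEMO_STDCALL, DEMO_CDECL, Handler, GoodHandler, WrongHandler, Dispatch and Demo are all names made up for this example, and the macros expand to nothing outside Visual C++, where the keywords are not available. With 32-bit Visual C++ the convention is part of the function pointer type, so the commented-out assignment should be rejected unless a cast hides the mismatch.

```cpp
#include <cassert>

// Assumption: only Visual C++ understands these keywords; elsewhere they vanish.
#if defined(_MSC_VER)
#define DEMO_STDCALL __stdcall
#define DEMO_CDECL __cdecl
#else
#define DEMO_STDCALL
#define DEMO_CDECL
#endif

// Hypothetical callback type, playing the role of WNDPROC.
using Handler = long (DEMO_STDCALL *)(unsigned msg);

long DEMO_STDCALL GoodHandler(unsigned msg) { return static_cast<long>(msg) + 1; }
long DEMO_CDECL WrongHandler(unsigned msg) { return static_cast<long>(msg) + 1; }

// Plays the role of the library that later invokes the callback.
long Dispatch(Handler handler, unsigned msg) {
    assert(handler != nullptr);
    return handler(msg);
}

long Demo() {
    Handler ok = &GoodHandler;  // conventions match, no cast needed
    // Handler bad = &WrongHandler;                            // 32-bit Visual C++: error
    // Handler bad = reinterpret_cast<Handler>(&WrongHandler); // compiles, breaks at run time
    return Dispatch(ok, 41);
}
```

On targets where both conventions are the same (or the macros are empty), the commented-out lines would compile and work, which is one reason this kind of bug tends to show up only in 32-bit Windows builds.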
(2) Both ways should work. In fact, this happens quite frequently, at least in code that interacts with the Windows API, because __cdecl is the default for C and C++ programs compiled with Visual C++ and the WinAPI functions use the __stdcall convention.
(3) There should be no real performance difference between the two.
Solution 2:
In CDECL, arguments are pushed onto the stack in reverse order, the caller clears the stack, and the result is returned via a processor register (later I will call it "register A"). In STDCALL there is one difference: the caller doesn't clear the stack, the callee does.
You are asking which one is faster. Neither. You should use the native calling convention as long as you can. Change the convention only if there is no way out, for example when using an external library that requires a certain convention.
Besides, there are other conventions that a compiler may use as the default one, e.g. Visual C++ offers FASTCALL (it can be made the default with the /Gr option), which is theoretically faster because it makes more extensive use of processor registers.
Usually you must give the proper calling convention to callback functions passed to an external library, e.g. a callback passed to qsort from the C library must be CDECL (if the compiler uses another convention by default, we must mark the callback as CDECL), and the various WinAPI callbacks must be STDCALL (practically the whole WinAPI is STDCALL).
Another common case is storing pointers to external functions, e.g. to create a pointer to a WinAPI function, its type definition must be marked with STDCALL.
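Both cases can be sketched roughly like this. The macros and the names CompareInts, SortInts, ExternalAddFn, ExternalAdd and CallThroughPointer are invented for illustration; ExternalAdd is only a stand-in for a function that would normally live in some external library, so that the sketch builds on its own. Outside Visual C++ the macros expand to nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Assumption: only Visual C++ understands these keywords; elsewhere they vanish.
#if defined(_MSC_VER)
#define DEMO_STDCALL __stdcall
#define DEMO_CDECL __cdecl
#else
#define DEMO_STDCALL
#define DEMO_CDECL
#endif

// qsort callback: marked CDECL explicitly, so it stays correct even if the
// project default convention is switched to something else.
int DEMO_CDECL CompareInts(const void* lhs, const void* rhs) {
    const int a = *static_cast<const int*>(lhs);
    const int b = *static_cast<const int*>(rhs);
    return (a > b) - (a < b);
}

void SortInts(int* values, std::size_t count) {
    assert(values != nullptr);
    std::qsort(values, count, sizeof(int), &CompareInts);
}

// Pointer type for a hypothetical external STDCALL function.
using ExternalAddFn = int (DEMO_STDCALL *)(int, int);

// Stand-in for the external function (normally it would come from a library).
int DEMO_STDCALL ExternalAdd(int a, int b) { return a + b; }

// The convention travels with the pointer type, so the call is generated correctly.
int CallThroughPointer(ExternalAddFn fn, int a, int b) {
    assert(fn != nullptr);
    return fn(a, b);
}
```

With a real WinAPI function the pointer type would use the WINAPI or CALLBACK macro from the Windows headers instead of the made-up DEMO_STDCALL.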
And below is an example showing how the compiler does it:
/* 1. calling function in C++ */
i = Function(x, y, z);
/* 2. function body in C++ */
int Function(int a, int b, int c) { return a + b + c; }
CDECL:
/* 1. calling CDECL 'Function' in pseudo-assembler (similar to what the compiler outputs) */
push on the stack a copy of 'z', then a copy of 'y', then a copy of 'x'
call (jump to the function body; the return address is pushed on the stack, and when the function finishes it jumps back here)
move contents of register A to 'i' variable
pop all from the stack that we have pushed (copy of x, y and z)
/* 2. CDECL 'Function' body in pseudo-assembler */
/* At this point the copies of 'a', 'b' and 'c' are already on the stack */
copy 'a' (from stack) to register A
copy 'b' (from stack) to register B
add A and B, store result in A
copy 'c' (from stack) to register B
add A and B, store result in A
jump back to caller code (a, b and c still on the stack, the result is in register A)
STDCALL:
/* 1. calling STDCALL in pseudo-assembler (similar to what the compiler outputs) */
push on the stack a copy of 'z', then a copy of 'y', then a copy of 'x'
call
move contents of register A to 'i' variable
/* 2. STDCALL 'Function' body in pseudo-assembler */
pop 'a' from stack to register A
pop 'b' from stack to register B
add A and B, store result in A
pop 'c' from stack to register B
add A and B, store result in A
jump back to caller code (a, b and c are no longer on the stack, result in register A)
Solution 3:
I noticed a posting that says it does not matter if you call a __stdcall function from a __cdecl one or vice versa. It does.
The reason: with __cdecl, the arguments passed to the called function are removed from the stack by the calling function; with __stdcall, they are removed from the stack by the called function. If you call a __cdecl function as if it were __stdcall, the stack is not cleaned up at all, so when the caller later uses a stack-based reference for its arguments or return address, it will use stale data at the current stack pointer. If you call a __stdcall function as if it were __cdecl, the __stdcall function cleans up the arguments on the stack, and then the caller does it again, possibly removing the calling function's return information.
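The two failure modes can be put into numbers with a toy model. This is only a sketch: Convention and StackDriftAfterCall are hypothetical names, the return address is ignored, and the "stack pointer" is just an integer offset relative to its value before the call.

```cpp
#include <cassert>

enum class Convention { Cdecl, Stdcall };

// Returns how far the stack pointer has drifted after one call, in bytes.
// 0 means balanced; anything else means the stack is now corrupted.
int StackDriftAfterCall(Convention callerBelieves, Convention calleeUses, int argBytes) {
    assert(argBytes >= 0);
    int drift = 0;
    drift -= argBytes;  // caller pushes the arguments (stack grows downward)
    if (calleeUses == Convention::Stdcall) {
        drift += argBytes;  // callee removes them while returning
    }
    if (callerBelieves == Convention::Cdecl) {
        drift += argBytes;  // caller removes them after the call
    }
    return drift;
}
```

With 12 bytes of arguments, a caller that believes in __stdcall calling a __cdecl function ends up at -12 (the arguments are never removed), and a caller that believes in __cdecl calling a __stdcall function ends up at +12 (they are removed twice). Matching conventions give 0.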
The Microsoft convention for C tries to circumvent this by mangling the names. A __cdecl function is prefixed with an underscore. A __stdcall function is prefixed with an underscore and suffixed with an at sign “@” and the number of bytes to be removed. E.g. __cdecl f(int x) is linked as _f, and __stdcall f(int x) is linked as _f@4 (where sizeof(int) is 4 bytes).
If you manage to get past the linker, enjoy the debugging mess.
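The decoration rule described above is simple enough to write down as a function. This is a sketch of the rule as stated here for 32-bit C names only (C++ names are mangled differently); Convention and DecorateCName are hypothetical helpers, not part of any toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Convention { Cdecl, Stdcall };

// Builds the linker-level name of a C function from its source-level name.
// argBytes is the total size of the arguments, i.e. what the callee removes.
std::string DecorateCName(const std::string& name, Convention convention,
                          std::size_t argBytes) {
    assert(!name.empty());
    std::string decorated = "_" + name;  // both conventions get the underscore
    if (convention == Convention::Stdcall) {
        decorated += "@" + std::to_string(argBytes);  // __stdcall adds @bytes
    }
    return decorated;
}
```

Because the two decorated names differ, a declaration with the wrong convention usually ends in an unresolved external at link time instead of a corrupted stack at run time.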
Solution 4:
I want to improve on @adf88's answer. I feel that the pseudocode for STDCALL does not reflect what happens in reality. 'a', 'b', and 'c' aren't popped from the stack in the function body. Instead they are popped by the ret instruction (ret 12 would be used in this case), which in one swoop jumps back to the caller and pops 'a', 'b', and 'c' from the stack.
Here is my version corrected according to my understanding:
STDCALL:
/* 1. calling STDCALL in pseudo-assembler (similar to what the compiler outputs) */
push on the stack a copy of 'z', then copy of 'y', then copy of 'x'
call
move contents of register A to 'i' variable
/* 2. STDCALL 'Function' body in pseudo-assembler */
copy 'a' (from stack) to register A
copy 'b' (from stack) to register B
add A and B, store result in A
copy 'c' (from stack) to register B
add A and B, store result in A
jump back to caller code and at the same time pop 'a', 'b' and 'c' off the stack (a, b and c are removed from the stack in this step, result in register A)
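For anyone who prefers something they can step through, here is a small C++ model of both pseudo-assembler listings. It is only a simulation under simplifying assumptions: the stack is a std::vector<int>, the return address is left out, and all names (Stack, SumArgs, CdeclFunction, StdcallFunction, CallResult, CallCdecl, CallStdcall) are made up for the example. The "add esp, 12" mentioned in a comment is what x86 compilers typically emit for the caller-side cleanup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Stack = std::vector<int>;

// Reads a, b and c in place; 'x' was pushed last, so 'a' is on top.
static int SumArgs(const Stack& stack) {
    assert(stack.size() >= 3);
    const std::size_t top = stack.size();
    return stack[top - 1] + stack[top - 2] + stack[top - 3];
}

// CDECL callee: plain "ret", the arguments stay on the stack.
int CdeclFunction(Stack& stack) { return SumArgs(stack); }

// STDCALL callee: "ret 12" returns and drops the three arguments in one step.
int StdcallFunction(Stack& stack) {
    const int result = SumArgs(stack);
    stack.resize(stack.size() - 3);
    return result;
}

struct CallResult {
    int value;               // what ends up in 'i'
    std::size_t depthAfter;  // items left on the model stack after the call
};

CallResult CallCdecl(int x, int y, int z) {
    Stack stack;
    stack.push_back(z);  // arguments are pushed in reverse order
    stack.push_back(y);
    stack.push_back(x);
    const int i = CdeclFunction(stack);
    stack.resize(stack.size() - 3);  // caller cleans up, e.g. "add esp, 12"
    return {i, stack.size()};
}

CallResult CallStdcall(int x, int y, int z) {
    Stack stack;
    stack.push_back(z);
    stack.push_back(y);
    stack.push_back(x);
    const int i = StdcallFunction(stack);  // nothing left for the caller to clean
    return {i, stack.size()};
}
```

Both calls return the same sum and leave the model stack empty; the only difference is which side shrinks it.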