Stackoverflow doing boxing in C#
I have these two chunks of code in C#:
First
class Program
{
static Stack<int> S = new Stack<int>();
static int Foo(int n) {
if (n == 0)
return 0;
S.Push(0);
S.Push(1);
...
S.Push(999);
return Foo( n-1 );
}
}
Second
class Program
{
static Stack S = new Stack();
static int Foo(int n) {
if (n == 0)
return 0;
S.Push(0);
S.Push(1);
...
S.Push(999);
return Foo( n-1 );
}
}
They both do the same:
Create a Stack (generic in
<int>
for the first example and a stack of object for the second one).Declare a method that calls itself recursively n times (n >= 0) and in every step push 1000 integers inside the created stack.
When I run the first example with Foo(30000)
no exception occurs, however the second example crashes with Foo(1000)
, just n = 1000.
When I saw the CIL generated for both cases the only difference was the boxing part for every push:
First
IL_0030: ldsfld class [System]System.Collections.Generic.Stack`1<int32> Test.Program::S
IL_0035: ldc.i4 0x3e7
IL_003a: callvirt instance void class [System]System.Collections.Generic.Stack`1<int32>::Push(!0)
IL_003f: nop
Second
IL_003a: ldsfld class [mscorlib]System.Collections.Stack Test.Program::S
IL_003f: ldc.i4 0x3e7
IL_0044: box [mscorlib]System.Int32
IL_0049: callvirt instance void [mscorlib]System.Collections.Stack::Push(object)
IL_004e: nop
My question is: Why, if there is not significant overload of CIL's stack for the second example, does it crash "faster" than the first one?
Why, if there is not significant overload of CIL's stack for second example, does it crash "faster" than the first one?
Note that the number of CIL instructions does not accurately represent the amount of work or memory that will be used. A single instruction can be very low impact, or very high impact, so counting CIL instructions is not an accurate way to measure "work".
Also realize that the CIL is not what gets executed. The JIT compiles the CIL to actual machine instructions, with an optimization phase, so the CIL may be very different than the actual executed instructions.
In the second case, since you're using a non-generic collection, every Push
call requires the integer to be boxed, as you determined in CIL.
Boxing an integer effectively creates an object which "wraps" the Int32
for you. Instead of just loading a 32-bit integer onto the stack, it now has to load a 32-bit integer onto the stack, then box it, which effectively also loads an object reference onto the stack.
If you inspect this in the Disassembly window, you can see the difference between the generic and non-generic version is dramatic, and much more significant than the generated CIL would suggest.
The generic version effectively compiles to as a series of calls like so:
0000022c nop
S.Push(25);
0000022d mov ecx,dword ptr ds:[03834978h]
00000233 mov edx,19h
00000238 cmp dword ptr [ecx],ecx
0000023a call 71618DD0
0000023f nop
S.Push(26);
00000240 mov ecx,dword ptr ds:[03834978h]
00000246 mov edx,1Ah
0000024b cmp dword ptr [ecx],ecx
0000024d call 71618DD0
00000252 nop
S.Push(27);
The non-generic, on the other hand, has to create the boxed objects, and instead compiles to:
00000645 nop
S.Push(25);
00000646 mov ecx,7326560Ch
0000064b call FAAC20B0
00000650 mov dword ptr [ebp-48h],eax
00000653 mov eax,dword ptr ds:[03AF4978h]
00000658 mov dword ptr [ebp+FFFFFEE8h],eax
0000065e mov eax,dword ptr [ebp-48h]
00000661 mov dword ptr [eax+4],19h
00000668 mov eax,dword ptr [ebp-48h]
0000066b mov dword ptr [ebp+FFFFFEE4h],eax
00000671 mov ecx,dword ptr [ebp+FFFFFEE8h]
00000677 mov edx,dword ptr [ebp+FFFFFEE4h]
0000067d mov eax,dword ptr [ecx]
0000067f mov eax,dword ptr [eax+2Ch]
00000682 call dword ptr [eax+18h]
00000685 nop
S.Push(26);
00000686 mov ecx,7326560Ch
0000068b call FAAC20B0
00000690 mov dword ptr [ebp-48h],eax
00000693 mov eax,dword ptr ds:[03AF4978h]
00000698 mov dword ptr [ebp+FFFFFEE0h],eax
0000069e mov eax,dword ptr [ebp-48h]
000006a1 mov dword ptr [eax+4],1Ah
000006a8 mov eax,dword ptr [ebp-48h]
000006ab mov dword ptr [ebp+FFFFFEDCh],eax
000006b1 mov ecx,dword ptr [ebp+FFFFFEE0h]
000006b7 mov edx,dword ptr [ebp+FFFFFEDCh]
000006bd mov eax,dword ptr [ecx]
000006bf mov eax,dword ptr [eax+2Ch]
000006c2 call dword ptr [eax+18h]
000006c5 nop
Here you can see the significance of boxing.
In your case, boxing the integer causes the boxed object references to get loaded onto the stack. On my system, this is causing a stackoverflow on any calls larger than Foo(127)
(in 32 bit), which suggests that the integers and boxed object references (4 bytes each) are all being kept on the stack, as 127*1000*8==1016000, which is dangerously close to the default 1 MB thread stack size for .NET applications.
When using the generic version, since there is no boxed object, the integers aren't having to all be stored on the stack, and the same register is being reused. This allows you to recurse significantly more (>40000 on my system) before you use up the stack.
Note that this will be CLR version and platform dependent, as there is a different JIT on x86/x64, as well.