Unexpected output from Bubblesort program with MSVC vs TCC

Solution 1:

Like interjay's answer mentioned, once you get into undefined behavior all bets are off. However, when you said that merely renaming the variable changed the behavior of the program, I got curious about what was going on (undefined or not).

First, I didn't believe that renaming the variable would change the compiler's output (all other things being equal), but sure enough - when I tried it, I was surprised to see exactly what you described.

So I had the compiler dump the assembly for the code it was generating for each version of the source file, and ran a comparison. Here's what I found in the compilers description of how it was laying out the local variables:

    ***** test.sizeOfInput.cod
    _c$ = -56                    ; size = 4
    _b$ = -52                    ; size = 4
    _inner$ = -48                ; size = 4
    _a$ = -44                    ; size = 40
>>> _sizeOfInput$ = -4           ; size = 4
    _main   PROC
    ***** test.sizeOfInputt.cod
    _c$ = -56                    ; size = 4
>>> _sizeOfInputt$ = -52         ; size = 4
    _b$ = -48                    ; size = 4
    _inner$ = -44                ; size = 4
    _a$ = -40                    ; size = 40
    _main   PROC
    *****

What you'll notice is that when the variable is named sizeOfInput, he compiler places it at a higher address than array a (just past the end of the array), and when the variable is named sizeOfInputt it places it at a lower address than array a instead of just past the end of the array. That means that in the build that has the variable named sizeOfInput, the undefined behavior that occurs when you modify a[10] is changing the value of sizeOfInput. In the build that uses the name sizeOfInputt, since that variable isn't at the end of the array, the write to a[10] trashes something else.

As to why the compiler would lay out the variables differently when the name of one changes in an apparently insignificant way - I have no idea.

But this is a good example of why you shouldn't count on the layout of local variables (or pretty much any variables, though you can count on the layout order of struct elements), and why when it comes to undefined behavior, "it works on my machine" doesn't cut it as proof that something works.

Solution 2:

Your code reads past the end of the array. The maximum value of outer is 10, and the maximum value of inner is 9, so a[inner+1] will read a[10]. This will give you undefined behavior, which explains why different compilers give different results.

As for the variable name making a difference: It probably doesn't. It's possible that if you run the same code twice (using the same variable name), you will get different results. Basically, when invoking undefined behavior, you can't be sure of anything that your program will do, so it's best not to try and find meaning in things like variable names.

There's also a chance that the variable name does make a difference. That depends on the implementation of the compiler: For example, using a different variable name might make the compiler organize memory differently somehow, which could cause the program to bahave differently. I think most compilers would output the same code if you change variable names though, so it was probably just a matter of luck.

Solution 3:

Michael Burr's reply exposes an interesting behavior of the compiler. But I still doubt the local variable name could affect its layout in the stack for the function's active record.

I am using VC++ 2013 with command line above (cl.exe /EHsc progam.cpp) and cannot repro it - changing variable name doesn't change the program behavior, instead, it shows random crash stably (some runs returns the results well and some runs crash).

The reason for the above random crash is __security_cookie is stored directly above (larger address) the array a, and as a is defined as singed integer array, the result will depend on the sign bit (mis-interpreted) of the value of __security_cookie. If it is a positive integer larger than 100, it is still the largest value in the array a, so sort will not switch it to other positions, then the check (__security_check_cookie) at the end of the function will be fine. If it is less than 100 or negative integer, it will be switched to lower element in the array after sort so __security_check_cookie reports failure. This means the program behavior depends on the random generated value for __security_cookie.

I changed the original test program to below to ease testing. I also extended the output to include the off-by-one element (arrayLen + 1), and we could predict the behavior based on the original value in the element after the defined array a.

#include<stdio.h>
#define arrayLen sizeOfInputt

void main()
{
    int a [10] ={23, 100, 20, 30, 25, 45, 40, 55, 43, 42};
    int arrayLen = sizeof(a)/sizeof(int);
    int b, outer, inner, c;
    printf("Size is : %d \n", arrayLen);

    printf("Values before bubble sort are  : \n");
    for ( b = 0; b < arrayLen + 1; b++)
        printf("%d\n", a[b]);

    printf("End of values before bubble sort... \n");

    for ( outer = arrayLen; outer > 0; outer-- )
    {
        for (  inner = 0 ; inner < outer ; inner++)
        {
        printf ( "Comparing positions: %d and %d\n",inner,inner+1);
        if ( a[inner] > a[inner + 1] )
        {
                int tmp = a[inner];
                a[inner] = a [inner+1];
            a[inner+1] = tmp;
            }
        }
        printf ( "Bubble sort total array size after inner loop is %d :\n",arrayLen);
        printf ( "Bubble sort arrayLen after inner loop is %d :\n",arrayLen);
    }
    printf ( "Bubble sort total array size at the end is %d :\n",arrayLen);
    for ( c = 0 ; c < arrayLen; c++)
        printf("Element: %d\n", a[c]);
}