Double cast to unsigned int on Win32 is truncating to 2,147,483,648

Solution 1:

A compiler bug...

From the assembly provided by @anastaciu, the direct cast code calls __ftol2_sse, which seems to convert the number to a signed long. The routine is named ftol2_sse because this is an SSE-enabled machine, but the float is in an x87 floating-point register.

; Line 17
    call    _getDouble
    call    __ftol2_sse
    push    eax
    push    OFFSET ??_C@_0BH@GDLBDFEH@Direct?5cast?5value?3?5?$CFu?6@
    call    _printf
    add esp, 8

The indirect cast, on the other hand, does:

; Line 18
    call    _getDouble
    fstp    QWORD PTR _d$[ebp]
; Line 19
    movsd   xmm0, QWORD PTR _d$[ebp]
    call    __dtoui3
    push    eax
    push    OFFSET ??_C@_0BJ@HCKMOBHF@Indirect?5cast?5value?3?5?$CFu?6@
    call    _printf
    add esp, 8

which pops the double off the x87 stack and stores it to the local variable, then loads it into an SSE register and calls __dtoui3, a double-to-unsigned-int conversion routine.

The behaviour of the direct cast does not conform to C89, nor to any later revision; even C89 explicitly says that:

The remaindering operation done when a value of integral type is converted to unsigned type need not be done when a value of floating type is converted to unsigned type. Thus the range of portable values is [0, Utype_MAX + 1).


I believe the problem might be a continuation of this from 2005: there used to be a conversion function called __ftol2 which probably would have worked for this code, i.e. it would have converted the value to the signed number -2147483647, which produces the correct result when interpreted as an unsigned number.

Unfortunately, __ftol2_sse is not a drop-in replacement for __ftol2: instead of just taking the least-significant value bits as-is, it signals the out-of-range error by returning LONG_MIN / 0x80000000, which, interpreted as unsigned long here, is not at all what was expected. The behaviour of __ftol2_sse would be valid for signed long, as converting a double value > LONG_MAX to signed long has undefined behaviour.

Solution 2:

Following @AnttiHaapala's answer, I tested the code with optimization /Ox and found that this removes the bug, because __ftol2_sse is no longer used:

//; 17   :     printf("Direct cast value: %u\n", (unsigned int)getDouble());

    push    -2147483647             //; 80000001H
    push    OFFSET $SG10116
    call    _printf

//; 18   :     double d = getDouble();
//; 19   :     printf("Indirect cast value: %u\n", (unsigned int)d);

    push    -2147483647             //; 80000001H
    push    OFFSET $SG10117
    call    _printf
    add esp, 28                 //; 0000001cH

The optimizer inlined getDouble() and evaluated the constant expression at compile time, removing the need for a conversion at runtime and making the bug go away.

Out of curiosity, I ran some more tests, changing the code to force the float-to-int conversion to happen at runtime. In this case the result is still correct: with optimization enabled, the compiler uses __dtoui3 for both conversions:

//; 19   :     printf("Direct cast value: %u\n", (unsigned int)getDouble(d));

    movsd   xmm0, QWORD PTR _d$[esp+24]
    add esp, 12                 //; 0000000cH
    call    __dtoui3
    push    eax
    push    OFFSET $SG9261
    call    _printf

//; 20   :     double db = getDouble(d);
//; 21   :     printf("Indirect cast value: %u\n", (unsigned int)db);

    movsd   xmm0, QWORD PTR _d$[esp+20]
    add esp, 8
    call    __dtoui3
    push    eax
    push    OFFSET $SG9262
    call    _printf

However, preventing inlining with __declspec(noinline) double getDouble(){...} brings the bug back:

//; 17   :     printf("Direct cast value: %u\n", (unsigned int)getDouble(d));

    movsd   xmm0, QWORD PTR _d$[esp+76]
    add esp, 4
    movsd   QWORD PTR [esp], xmm0
    call    _getDouble
    call    __ftol2_sse
    push    eax
    push    OFFSET $SG9261
    call    _printf

//; 18   :     double db = getDouble(d);

    movsd   xmm0, QWORD PTR _d$[esp+80]
    add esp, 8
    movsd   QWORD PTR [esp], xmm0
    call    _getDouble

//; 19   :     printf("Indirect cast value: %u\n", (unsigned int)db);

    call    __ftol2_sse
    push    eax
    push    OFFSET $SG9262
    call    _printf

__ftol2_sse is called in both conversions, making the output 2147483648 in both cases; @zwol's suspicions were correct.


Compilation details:

  • Using command line:
cl /permissive- /GS /analyze- /W3 /Gm- /Ox /sdl /D "WIN32" program.c        
  • In Visual Studio:

    • Disabling RTC in Project -> Properties -> Code Generation and setting Basic Runtime Checks to default.

    • Enabling optimization in Project -> Properties -> Optimization and setting Optimization to /Ox.

    • With debugger in x86 mode.

Solution 3:

Nobody has looked at the asm for MS's __ftol2_sse.

From the result, we can infer that it probably converted from x87 to signed int / long (both 32-bit types on Windows), instead of safely to uint32_t.

x86 FP -> integer instructions that overflow the integer result don't just wrap / truncate: when the converted value is not representable in the destination, they produce what Intel calls the "integer indefinite": high bit set, other bits clear, i.e. 0x80000000 for a 32-bit destination.

(Or if the FP invalid exception isn't masked, it fires and no value is stored. But in the default FP environment, all FP exceptions are masked. That's why for FP calculations you can get a NaN instead of a fault.)

That includes both x87 instructions like fistp (using the current rounding mode) and SSE2 instructions like cvttsd2si eax, xmm0 (using truncation toward 0; that's what the extra t means).

So it's a bug to compile double->unsigned conversion into a call to __ftol2_sse.


Side-note / tangent:

On x86-64, FP -> uint32_t can be compiled to cvttsd2si rax, xmm0, converting to a 64-bit signed destination, producing the uint32_t you want in the low half (EAX) of the integer destination.

It's UB in C and C++ if the truncated value is outside the 0..2^32-1 range, so it's OK that huge positive or negative values will leave the low half of RAX (EAX) zero from the integer-indefinite bit pattern. (Unlike integer->integer conversions, modulo reduction of the value is not guaranteed; see "Is the behaviour of casting a negative double to unsigned int defined in the C standard? Different behaviour on ARM vs. x86".)

To be clear, nothing in the question is undefined or even implementation-defined behaviour. I'm just pointing out that if you have FP->int64_t, you can use it to efficiently implement FP->uint32_t. That includes x87 fistp, which can write a 64-bit integer destination even in 32-bit and 16-bit mode, unlike SSE2 instructions, which can only directly handle 64-bit integers in 64-bit mode.