Coercing floating-point to be deterministic in .NET?
Just what is this "hint" to the runtime?
As you conjecture, the compiler tracks whether a conversion to double or float was actually present in the source code, and if it was, it always inserts the appropriate conv opcode.
Does the C# spec stipulate that an explicit cast to float causes the insertion of a conv.r4 in the IL?
No, but I assure you that there are unit tests in the compiler test cases that ensure that it does. Though the specification does not demand it, you can rely on this behaviour.
The specification's only comment is that any floating point operation may be done in a higher precision than required at the whim of the runtime, and that this can make your results unexpectedly more accurate. See section 4.1.6.
Does the CLR spec stipulate that a conv.r4 instruction causes a value to be narrowed down to its native size?
Yes, in Partition I, section 12.1.3, which I note you could have looked up yourself rather than asking the internet to do it for you. These specifications are free on the web.
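As a minimal sketch of what the two answers above imply in practice: an explicit cast written in the source, even one that looks redundant, asks the compiler for a conv.r4 and so asks the runtime to narrow the value. The class and method names here are hypothetical, chosen only for illustration.

```csharp
using System;

static class CastDemo
{
    // Hypothetical helper. The product of two floats may be computed in a
    // higher precision; the explicit (float) cast requests a conv.r4 so the
    // result is narrowed to 32 bits before it is returned.
    public static float MultiplyNarrowed(float a, float b)
    {
        return (float)(a * b);
    }

    // The same idea for double: the cast requests a conv.r8.
    public static double MultiplyNarrowed(double a, double b)
    {
        return (double)(a * b);
    }
}
```

The cast changes nothing at the language level, since the expression already has that type; its only purpose is to pin down where the narrowing happens.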
A question you didn't ask but probably should have:
Is there any operation other than casting that truncates floats out of high precision mode?
Yes. Assigning to a static field, instance field or element of a double[] or float[] array truncates.
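A short sketch of the stores listed above, assuming nothing beyond what the answer states. The names are hypothetical. Note that a local variable is not on that list, so routing a value through a local should not be relied on for truncation.

```csharp
using System;

static class StoreDemo
{
    private static float product;                        // static field
    private static readonly float[] slot = new float[1]; // float[] element

    // Storing to a static field narrows the value to 32 bits.
    public static float ViaStaticField(float a, float b)
    {
        product = a * b;
        return product;
    }

    // Storing to an element of a float[] narrows it as well.
    public static float ViaArrayElement(float a, float b)
    {
        slot[0] = a * b;
        return slot[0];
    }
}
```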
Is consistent truncation enough to guarantee reproducibility across machines?
No. I encourage you to read section 12.1.3, which has much of interest to say on the subject of denormals and NaNs.
And finally, another question you did not ask but probably should have:
How can I guarantee reproducible arithmetic?
Use integers.
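One common way to do that is fixed-point arithmetic: store each value as an integer count of some small unit. The sketch below is an illustration only, not something the answer prescribes; the FixedPoint class, the scale of 10,000 (four decimal places) and the truncating multiply are all assumptions.

```csharp
using System;

static class FixedPoint
{
    // Assumed scale: one unit represents 0.0001.
    public const long Scale = 10000;

    // Builds a scaled value from a whole part and a four-digit fraction,
    // for example FromParts(1, 5000) represents 1.5.
    public static long FromParts(long whole, long fraction)
    {
        return whole * Scale + fraction;
    }

    public static long Add(long a, long b)
    {
        return checked(a + b);
    }

    // The raw product carries the scale twice, so divide once to restore it.
    // Integer division truncates toward zero; checked makes overflow throw
    // instead of silently wrapping.
    public static long Multiply(long a, long b)
    {
        return checked(a * b) / Scale;
    }
}
```

For example, FixedPoint.Multiply(FixedPoint.FromParts(1, 5000), FixedPoint.FromParts(2, 5000)) yields 37500, which represents 3.75. Integer operations are defined exactly, so the result does not depend on the jitter or the hardware.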
The 8087 Floating Point Unit chip design was Intel's billion dollar mistake. The idea looks good on paper: give it an 8-register stack that stores values in extended precision, 80 bits, so that you can write calculations whose intermediate values are less likely to lose significant digits.
The beast is, however, impossible to optimize for. Storing a value from the FPU stack back to memory is expensive, so keeping values inside the FPU is a strong optimization goal. Inevitably, having only 8 registers is going to require a write-back if the calculation is deep enough. It is also implemented as a stack, not as freely addressable registers, so that requires gymnastics as well that may produce a write-back. And a write-back will truncate the value from 80 bits back to 64 bits, losing precision.
So the consequences are that non-optimized code does not produce the same result as optimized code, and small changes to the calculation can have big effects on the result when an intermediate value ends up needing to be written back. The /fp:strict option is a hack around that: it forces the code generator to emit a write-back to keep the values consistent, but with the inevitable and considerable loss of perf.
This is a classic case of being stuck between a rock and a hard place. For the x86 jitter they just didn't try to address the problem.
Intel didn't make the same mistake when they designed the SSE instruction set. The XMM registers are freely addressable and don't store extra bits. If you want consistent results then compiling with the AnyCPU target and running on a 64-bit operating system is the quick solution. The x64 jitter uses SSE instead of FPU instructions for floating point math, albeit that this adds a third way that a calculation can produce a different result. If the calculation is wrong because it loses too many significant digits then it will be consistently wrong, which is a bit of a bromide, really, but typically only as far as a programmer looks.
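Since the outcome depends on which jitter actually runs the code, it can be worth confirming the bitness of the process at startup. A minimal sketch, with a hypothetical class name; Environment.Is64BitProcess is available from .NET Framework 4 onward. An AnyCPU executable can still run as a 32-bit process on a 64-bit operating system if the project's "Prefer 32-bit" setting is enabled.

```csharp
using System;

static class JitInfo
{
    // Reports whether the current process is 64-bit, which in turn decides
    // whether the x64 or the x86 jitter compiles the floating point code.
    public static string Describe()
    {
        return Environment.Is64BitProcess ? "64-bit process" : "32-bit process";
    }
}
```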