Associativity math: (a + b) + c != a + (b + c)
Recently I was going through an old blog post by Eric Lippert in which, while writing about associativity, he mentions that in C#, (a + b) + c is not equivalent to a + (b + c) for certain values of a, b, and c.
I am not able to figure out for which types and ranges of values this holds true, and why.
On the range of the double type:
double dbl1 = (double.MinValue + double.MaxValue) + double.MaxValue;
double dbl2 = double.MinValue + (double.MaxValue + double.MaxValue);
The first one is double.MaxValue, the second one is double.PositiveInfinity: double.MinValue and double.MaxValue cancel exactly to 0 in the first grouping, but in the second, double.MaxValue + double.MaxValue overflows to infinity, and adding double.MinValue to infinity leaves it infinite.
On the precision of the double type:
double dbl1 = (double.MinValue + double.MaxValue) + double.Epsilon;
double dbl2 = double.MinValue + (double.MaxValue + double.Epsilon);
Now dbl1 == double.Epsilon, while dbl2 == 0: in the first grouping the two huge values cancel exactly to 0 before double.Epsilon is added, while in the second, double.Epsilon is absorbed into double.MaxValue (it is far smaller than one ulp of double.MaxValue), so the cancellation leaves exactly 0.
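The absorption is easy to verify directly. A minimal sketch (a hypothetical C# 9+ top-level program, not part of the original answer):

using System;

// double.Epsilon (~4.9E-324) is far below one ulp of double.MaxValue,
// so adding it cannot change the stored value.
Console.WriteLine(double.MaxValue + double.Epsilon == double.MaxValue); // True
// double.MinValue is the exact negative of double.MaxValue, so they cancel to 0.
Console.WriteLine(double.MinValue + double.MaxValue); // 0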
And on literally reading the question :-) In checked mode:
checked
{
    int i1 = (int.MinValue + int.MaxValue) + int.MaxValue;
}
i1 is int.MaxValue - 1 (int.MinValue + int.MaxValue is exactly -1, and -1 + int.MaxValue doesn't overflow).
checked
{
    int temp = int.MaxValue;
    int i2 = int.MinValue + (temp + temp);
}
(Note the use of the temp variable; otherwise the compiler would reject the overflowing constant expression int.MaxValue + int.MaxValue outright. Technically even that would be a different result :-) compiles correctly vs. doesn't compile.)
This throws an OverflowException... so the results are different :-) (int.MaxValue - 1 vs an exception).
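Interestingly, in unchecked mode the two groupings agree again, because two's-complement addition simply wraps modulo 2^32, and wrapping addition is associative. A quick sketch (a hypothetical C# 9+ top-level program, not part of the original answer):

using System;

unchecked
{
    int temp = int.MaxValue;
    int u1 = (int.MinValue + int.MaxValue) + int.MaxValue; // 2147483646
    int u2 = int.MinValue + (temp + temp);                 // wraps twice, also 2147483646
    Console.WriteLine(u1 == u2); // True
}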
One example:
a = 1e-30
b = 1e+30
c = -1e+30
Here (a + b) + c gives 0, because a is absorbed into the much larger b before c cancels it, while a + (b + c) gives 1e-30, because b and c cancel exactly first.
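A minimal check of this in C# (a hypothetical C# 9+ top-level program, not part of the original answer):

using System;

double a = 1e-30, b = 1e+30, c = -1e+30;
// a is ~60 orders of magnitude below b, far outside double's ~16 significant digits,
// so it vanishes when added to b first.
Console.WriteLine((a + b) + c); // 0
// b and c cancel exactly, leaving a intact.
Console.WriteLine(a + (b + c)); // 1E-30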
Extending the other answers, which show how you get different results with extremes of small and large numbers, here's an example where floating point with realistic, ordinary numbers gives a different answer.
In this case, instead of using numbers at the extreme limits of precision, I simply do a lot of additions. The difference is between doing (((...(((a+b)+c)+d)+e)... or ...(((a+b)+(c+d))+((e+f)+(g+h)))+...
I'm using Python here, but you will probably get the same results if you write this in C#. First, create a list of a million values, all of which are 0.1. Add them up from the left and you see the rounding errors become significant:
>>> numbers = [0.1]*1000000
>>> sum(numbers)
100000.00000133288
Now add them again, but this time add them in pairs (there are much more efficient ways to do this that use less intermediate storage, but I kept the implementation simple here):
>>> def pair_sum(numbers):
...     if len(numbers) == 1:
...         return numbers[0]
...     if len(numbers) % 2:
...         numbers.append(0)
...     return pair_sum([a + b for a, b in zip(numbers[::2], numbers[1::2])])
>>> pair_sum(numbers)
100000.0
This time the rounding errors are minimised: summing left to right commits a rounding error on nearly every addition once the running total dwarfs 0.1, so the error grows roughly linearly with the number of terms, whereas pairwise summation keeps the operands of each addition at similar magnitudes and only stacks about log2(n) levels of rounding.
Edit: for completeness, here's a more efficient but less easy to follow implementation of a pairwise sum. It gives the same answer as the pair_sum() above:
def pair_sum(seq):
    tmp = []  # stack of partial sums, one per pending power-of-two block
    for i, v in enumerate(seq):
        if i & 1:
            # Odd index: fold this value into the partial sum on top of the stack.
            tmp[-1] = tmp[-1] + v
            i = i + 1
            n = i & -i  # lowest set bit = size of the block just completed
            while n > 2:
                # Merge completed blocks of equal size, pairwise.
                t = tmp.pop(-1)
                tmp[-1] = tmp[-1] + t
                n >>= 1
        else:
            tmp.append(v)
    # Fold any leftover partial sums.
    while len(tmp) > 1:
        t = tmp.pop(-1)
        tmp[-1] = tmp[-1] + t
    return tmp[0]
And here's the simple pair_sum written in C#:
using System;
using System.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static double pair_sum(double[] numbers)
        {
            if (numbers.Length == 1)
            {
                return numbers[0];
            }
            // Add adjacent pairs, then recurse on the half-length array.
            var new_numbers = new double[(numbers.Length + 1) / 2];
            for (var i = 0; i < numbers.Length - 1; i += 2)
            {
                new_numbers[i / 2] = numbers[i] + numbers[i + 1];
            }
            if (numbers.Length % 2 != 0)
            {
                // Odd element count: carry the last value through unchanged.
                new_numbers[new_numbers.Length - 1] = numbers[numbers.Length - 1];
            }
            return pair_sum(new_numbers);
        }

        static void Main(string[] args)
        {
            var numbers = new double[1000000];
            for (var i = 0; i < numbers.Length; i++) numbers[i] = 0.1;
            Console.WriteLine(numbers.Sum());     // left-to-right sum
            Console.WriteLine(pair_sum(numbers)); // pairwise sum
        }
    }
}
with output:
100000.000001333
100000
This stems from the fact that the ordinary integral value types (int, long, etc.) are stored in a fixed number of bytes, so overflow is possible when a sum exceeds what that storage can represent.
In C#, you can use BigInteger to avoid this kind of issue: BigInteger values are arbitrary-precision, so they never overflow.
BigInteger is only available from .NET 4.0 onwards (VS 2010+).
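A minimal sketch (a hypothetical program, not part of the original answer) showing that once overflow is impossible, the grouping no longer matters:

using System;
using System.Numerics; // BigInteger; reference System.Numerics.dll on .NET 4.0

class Program
{
    static void Main()
    {
        BigInteger a = long.MaxValue, b = long.MaxValue, c = long.MinValue;
        // Arbitrary-precision addition never overflows, so it is fully associative.
        Console.WriteLine((a + b) + c == a + (b + c)); // True
    }
}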