How does rewriting $x^2 -y^2$ as $(x+y)(x-y)$ avoid catastrophic cancellation?

Solution 1:

$x^2-y^2$ will be only as precise as $x^2$ is. If $x$ and $y$ are close to each other, the computational error might be larger than the result.

In the second case, less loss of significance occurs.

Consider e.g. $x = 9000.2$ and $y = 9000.1$, where the floating-point precision is 5 significant digits. $x^2-y^2$ is actually equal to 1800.03.

Let's try to compute it in $x^2-y^2$ way: $x^2 = 81003600.04 = 81003000$; $y^2 = 81001800.01 = 81001000$, so $x^2-y^2 = 2000$, which is quite far away from the precise answer.

So let's try to compute it in $(x+y)(x-y)$ way: $x+y = 18000.3 = 18000$; $x-y = 0.1$, so $(x+y)(x-y) = 1800$, which is significantly closer to the precise answer (and is actually as close as possible with 5 significant digits).

Solution 2:

The answer by penartur is correct. His argument works for x and y of order 10^4, which is much less than the 10^16 order at which an overflow error would occur in a double IEEE format. The cancellation really is catastrophic, as the relative error is quite large (in some of my computations it becomes large enough to crash my models).

That being said, if you are dealing with numbers on the order of 10^8, then overflow considerations would start coming into play when trying to compute the squared quantities.