Expressing a Non-Negative Integer as a Sum of Two Squares
I'm writing a program in C that returns the number of ways a non-negative integer can be expressed as the sum of the squares of two non-negative integers.
R(n) is the number of pairs (x, y) such that x² + y² = n, where x, y, and n are all non-negative integers.
How can I compute R(n) mathematically?
What is a very fast algorithm I can use to compute R(n)?
Let $n$ have the prime power factorization $$n=2^e p_1^{a_1}\cdots p_k^{a_k}q_1^{b_1}\cdots q_l^{b_l},$$ where the $p_i$ are distinct primes of the form $4t+1$, and the $q_i$ are distinct primes of the form $4t+3$.
If one or more of the $b_i$ is odd, then $n$ has no representations as a sum of two squares. If all the $b_i$ are even, then the number $N(n)$ of representations is given by $$N(n)=4(a_1+1)(a_2+1)\cdots (a_k+1).$$ However, this counts all representations, including the ones that use negative integers. It also considers $0^2+5^2$ as different from $5^2+0^2$.
If we don't want negatives, and order doesn't matter, things are more complicated. If $8$ divides $N(n)$, then the number of essentially distinct representations is equal to $$\frac{N(n)}{8}.$$
If $N(n)$ is not divisible by $8$, it is of the form $8q+4$. Then the number of essentially distinct representations is $$\frac{N(n)+4}{8}.$$
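For reference, here is a minimal C sketch of this computation, assuming 32-bit unsigned inputs; the function names `count_signed_reps` and `count_distinct_reps` are mine, and the factorization is plain trial division (fine for small inputs, though not necessarily the fastest approach).

```c
#include <stdint.h>

/* Sketch: compute N(n) (all representations, counting signs and order)
 * by trial-division factorization, then reduce it to the number of
 * essentially distinct representations as described above.
 * Function names are illustrative, not from the original post. */
uint32_t count_signed_reps(uint32_t n)
{
    if (n == 0) return 1;                /* only 0^2 + 0^2 */
    uint32_t N = 4;

    /* factors of 2 do not affect the count, so strip them */
    while (n % 2 == 0) n /= 2;

    for (uint32_t p = 3; (uint64_t)p * p <= n; p += 2) {
        if (n % p) continue;
        uint32_t a = 0;
        while (n % p == 0) { n /= p; a++; }
        if (p % 4 == 1) N *= a + 1;      /* prime 4t+1: multiply by a+1 */
        else if (a % 2 == 1) return 0;   /* prime 4t+3 to an odd power  */
    }
    if (n > 1) {                         /* leftover prime factor       */
        if (n % 4 == 1) N *= 2;          /* exponent 1, so a+1 = 2      */
        else return 0;                   /* 4t+3 prime to an odd power  */
    }
    return N;
}

/* Number of pairs (x, y) with 0 <= x <= y and x^2 + y^2 = n. */
uint32_t count_distinct_reps(uint32_t n)
{
    if (n == 0) return 1;
    uint32_t N = count_signed_reps(n);
    if (N == 0) return 0;
    return (N % 8 == 0) ? N / 8 : (N + 4) / 8;
}
```

As a check, `count_distinct_reps(25)` gives $2$, matching the two representations $0^2+5^2$ and $3^2+4^2$.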
Remark: Let $D_1$ be the number of positive divisors of $n$ of the form $4t+1$, and $D_3$ the number of positive divisors of $n$ of the form $4t+3$. It is a nice result of Jacobi that $$N(n)=4(D_1-D_3).$$ Unfortunately, I do not see how this can be used to give a more efficient algorithm for computing $N(n)$.
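It does make a convenient brute-force test oracle, though: one can count the divisors in the two residue classes directly and compare against the factorization-based value. A sketch, again assuming 32-bit inputs (the name `jacobi_reference` is mine):

```c
#include <stdint.h>

/* Brute-force check of Jacobi's identity N(n) = 4*(D1 - D3):
 * count divisors congruent to 1 and 3 mod 4 by trial division.
 * O(n) per call, so useful only as a reference for testing. */
uint32_t jacobi_reference(uint32_t n)
{
    if (n == 0) return 1;          /* convention: 0 = 0^2 + 0^2 */
    int32_t d1 = 0, d3 = 0;
    for (uint32_t d = 1; d <= n; d++) {
        if (n % d) continue;
        if (d % 4 == 1) d1++;
        else if (d % 4 == 3) d3++;
    }
    return 4u * (uint32_t)(d1 - d3);
}
```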
This was getting a bit long for a comment, but I wanted to add an alternate perspective that is more computationally oriented.
If you are dealing with just 16-bit integers, a lookup table will be quite fast, occupying at most 64 KB of storage (which fits into contemporary caches) and constructible in linear time by simply iterating over all pairs $0 \le x \le y$ with $x^2 + y^2 < 2^{16}$ (so $x, y \le 255$).
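A minimal sketch of that table, assuming the counting convention $0 \le x \le y$ (the names are mine; one byte per entry is more than enough here):

```c
#include <stdint.h>
#include <string.h>

/* Lookup table for 16-bit inputs: r2_table[n] = number of pairs
 * (x, y) with 0 <= x <= y and x^2 + y^2 = n.  One byte per entry,
 * so the whole table is 64 KB. */
static uint8_t r2_table[1u << 16];

static void build_r2_table(void)
{
    memset(r2_table, 0, sizeof r2_table);          /* allow rebuilding */
    for (uint32_t x = 0; x * x <= 0xFFFF; x++)
        for (uint32_t y = x; x * x + y * y <= 0xFFFF; y++)
            r2_table[x * x + y * y]++;
}
```

After `build_r2_table()` runs once, each query is a single array read.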
1 ms is a fairly tight budget for a computation. How much slower are your runs? Keep in mind that memory allocation/deallocation can have an impact at this timescale.
For 100 numbers in 1 ms you can probably budget fewer than 10,000 operations per input. If you are searching for primes all the way up to $2^{16}$, that is already too slow: you only need primes up to $\sqrt{n} \le 256$, which is only about 50 primes (see the sketch below).
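For example, one could sieve those primes once at startup and reuse them for trial division. A sketch, assuming 16-bit inputs so that primes up to 255 suffice (names are mine):

```c
#include <stdint.h>

/* Precompute the primes up to 255 once; any composite n < 2^16 has a
 * prime factor <= sqrt(n) < 256.  There are 54 such primes. */
static uint32_t small_primes[64];
static uint32_t num_small_primes;

static void init_small_primes(void)
{
    uint8_t composite[256] = {0};
    num_small_primes = 0;
    for (uint32_t i = 2; i < 256; i++) {
        if (composite[i]) continue;
        small_primes[num_small_primes++] = i;
        for (uint32_t j = i * i; j < 256; j += i)
            composite[j] = 1;
    }
}
```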
Finally, one can speed up the average-case time by avoiding factoring in cases where it is obvious that $R(n)=0$. Half the time, $n$ will be equal to a power of $2$ times a number that is $3 \pmod 4$, which forces $R(n)=0$. Adding this very inexpensive test (which could be implemented with bit operations) could give you twice the overall speed if your inputs are random. One can likewise detect the presence of an odd $b_i$ (defined as in André Nicolas's answer) while factoring and cut the loop short.
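A sketch of that pretest, assuming GCC/Clang (for `__builtin_ctz`) and 32-bit inputs; the function name is mine:

```c
#include <stdint.h>

/* Cheap pretest: strip the power of 2 with bit operations and reject
 * immediately when the remaining odd part is 3 mod 4, which already
 * forces R(n) = 0 for about half of all random inputs. */
static int obviously_zero(uint32_t n)
{
    if (n == 0) return 0;          /* 0 = 0^2 + 0^2, not a rejection      */
    n >>= __builtin_ctz(n);        /* drop all factors of 2 (GCC/Clang)   */
    return (n & 3u) == 3u;         /* odd part congruent to 3 mod 4       */
}
```

Only inputs that pass this test need to be factored at all.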