Probability that $x^2-y^2$ is divisible by $k$

Let two numbers $x$ and $y$ be selected from the set of first $n$ natural numbers with replacement(i.e. the two numbers can be same).The question is to find out the probability that $x^2-y^2$ is divisible by $k\in \mathbb{N}$


For $k=2$

Any number can be expressed as $2p,2p+1$.Now $x^2-y^2=(x-y)(x+y)$ If both numbers are of form $2p+1$ then (x-y) would be divisible by $2$ .if two numbers are of different forms then it will not be divisible by $2$.So the probability in this case is $a^2+(1-a)^2$ where $a$ is probability that number chosen is divisible by $2$ which is $\frac{\lfloor \frac{n}{2} \rfloor}{n}$.However this gets complicated with $k=3$ onwards because numbers in different forms may be divisible.In other words if there a generalisation or way to solve for some large $k$.Thanks.


A generalization expressed by a set

A good way to generalize this is to use modular arithmetic, or essentially look at the remainder of $\frac{x}{k}$ and $\frac{y}{k}$. As you pointed out in your example for $k=2$, the numbers can only be expressed as $$2*p,2*p+1$$

This can be further generalized into a set of unique expressions that define every number or every $x$ and $y$ for a value $k>1$ and $p≥0$$$S={kp,kp+1...kp+(k-2),kp+(k-1)}$$

Using modular arithmetic, we know that $S\equiv \{0,1,2...k-2,k-1\}\pmod k$

Since $x$ and $y$ are any two terms from the set $S$ for any $p≥0$, we know that $x$ and $y$ are really just equal to either any value from $\{0,1,2...k-2,k-1\}$ since we only need to look at the remainder when divided by $k$ to determine divisibility.

Probability using mod

Now we know $x^2-y^2 = (x-y)(x+y)$ meaning either $(x-y)$ or $(x+y)$ have to be divisible by $k$ in order to fulfill the requirement that $x^2-y^2$ is divisible by $k$.

It should be noted that the probability of choosing a random number being expressed by any expression in set $S$ is uniform and is equal to $\frac{1}{k}$. For example, the probability of choosing a number is expressible by $3p+1$ is $\frac{1}{3}$. This can be proved by showing how every number expressed by $kp$ can always be paired with $kp+1,kp+2,...kp+(k-2),kp+(k-1)$, thus showing how there are an equal number of numbers of each type.

Onto the question....

$(x-y)$ is divisible by $k$ if $(x-y)\equiv 0\pmod k$. This simplifies to $(x$ mod $k)-(y \text{ mod}\ k) = 0$. Thus we now need to find the probability that $x$ and $y$ are both expressible by the same expression in the set S.

Now this comes down to a simple problem of asking what is the probability of choosing two of the same elements from the set $S$ without replacement, which is $\frac{k}{k} *\frac{1}{k} =$ $\frac{k}{k^2}$

$(x+y)$ is divisible by $k$ if $(x+y)\equiv 0\pmod k$. This simplifies to $(x$ mod $k)+(y \text{ mod}\ k) = 0$. Thus we now need to find the probability that $x$ and $y$ are chosen such that the $x+y =$ a multiple of $k$.

Knowing that $x$ and $y$ must come from the set $S$, we can see there is a specific pairing of elements from $S$ that causes the addition of the pair of elements to be equal to $2*kp+k$, a multiple of $k$. The only times that the pair of elements, which will be the expression that expresses $x$ and $y$, add up to $2*kp$ is when one is expressed by $kp+m$ and the other being $kp+(k-m)$ for any whole number $m$. This is because $(kp+m)+(kp+(k-m)) = 2*kp$. For example $3p+1$ can be paired up with $3p+2$ since their sum is $6p+3$, a multiple of 3.

Now we are now looking for the probability the two chosen elements from $S$ are pairs of each other (If both chosen elements are $kp$, it will be a multiple of k still) The total number of pairs that can be chosen is the total number of elements in set $S$ squared, which is $k^2$. The total number of pairs that fulfill the divisibility by $k$ is equal to $\lfloor\frac{k}{2}\rfloor + 1$. The ceiling function is applied to correct errors that occur when $k$ is odd as you obviously can't have a fractional number of pairs. The extra $+1$ is for including the case when both elements are the same.

Thus the probability that $(x+y)$ is divisible by $k$ is $\frac{\lfloor\frac{k}{2}\rfloor + 1}{k^2}$.

However now we encounter a problem of overlap, which are the cases when the two elements chosen from $S$ make $(x-y)$ and $(x+y)$ divisible by k. To get rid of the over count we need to subtract the number of times this happens for a given set $S$. We know we need to subtract once for the case when both elements are $kp$. We also need to subtract another one depending if $k$ is even or not. This is because when $k$ is even, we have the case where the element $kp+\frac{k}{2}$ can pair with itself to fulfill everything. Thus the final answer is $$\frac{k}{k^2}+\frac{\lfloor\frac{k}{2}\rfloor + 1}{k^2} - \frac{(k+1 \text{ mod }2)+1}{k^2}$$

The $\frac{(k+1 \text{ mod }2)+1}{k^2}$ tells us to subtract one more and subtract another if $k$ is even, hence we shifted the modular part by 1 since $( \text{ even number mod }2 = 0)$


EDIT 12.04.17

After some days of studying this interesting problem, I provide now a complete analytic solution of the problem by retrieving a related problem in OEIS.

I give exact values of the first few probabilities, and give an exact general formula for the probababily if the divisor $n$ is prime.

A Monte-Carlo-Simulation is also presented.

Notice that my findings came up in opposite temporal order.

Analytic solution to the problem

Looking up the sequence of the first few terms of

$$s(n) = p(n) n^2 \tag{1}$$

which are

$${1, 2, 5, 8, 9, 10, 13, 24, 21, 18, 21, 40, 25, 26, 45}$$

in the online-encyclopedia of integer sequences https://oeis.org/ gives us the entry A062803 "Number of solutions to $x^2 == y^2 mod(n)$", created initially by Ahmed Fares (ahmedfares(AT)my-deja.com), Jul 19 2001. A formula was devised by Vladeta Jovovic on Sep 22, 2003: "Multiplicative with a(2^e)=e*2^e and a(p^e)=((p-1)*e+p)*p^(e-1) for an odd prime p."

Let us explore this a little bit closer. Our problem is transformed into the question of the number $s(n)$ of solutions to the congruence

$$x^2-y^2 == 0, \; mod(n)\tag{1}$$

Our probabilities are then given by

$$p(n) = s(n)/n^2\tag{2}$$

Suppose the number $n$ has the representation

$$n = \prod_{i=1}^{k} p_i^{a_i}\tag{3}$$

where $p_i$ is the $i$-th prime number appearing in $n$ in ascending order, $a_i$ ist multiplicity (exponent) and $k$ is the number of different prime factors in $n$. Notice that in number theory $k$ is traditionally called $\omega(n)$, the number of distinct prime factors. It is implemented in Mathematica as PrimeNu[n].

Then the statement of being multiplicative means that we can apply the formula to each of the prime power factors separately, and multiply the result together.

This gives for $n$ even

$$s(n_{even}) = (a_1 2^{a_1}) \prod_{i=2}^{k} ((p_i-1)a_i+p_i) p_i^{a_i-1}\tag{4.1}$$

and for $n$ odd

$$s(n_{odd}) = \prod_{i=1}^{k} ((p_i-1)a_i+p_i) p_i^{a_i-1}\tag{4.2}$$

It is easy to see that for an odd prime $n$, $s(n) = 2n-1$, as claimed earlier.

The formula simplifies if $n$ is square-free. Then all non-vanishing $a_i$ are equal to $1$, and we find

$$s(n_{even}) = 2 \prod_{i=2}^{k} (2 p_i-1)\tag{5.1}$$ $$s(n_{odd}) = \prod_{i=1}^{k} (2 p_i-1)\tag{5.2}$$

For those who are interested in encoding this formula, here is an example in Mathematica

s[n_] := Module[{fi, pi, ai, pout},   
  fi = FactorInteger[n];  
  pi = #[[1]] & /@ fi;  
  ai = #[[2]] & /@ fi;  
  pout = If[OddQ[n],   
    Product[((pi[[i]] - 1) ai[[i]] + pi[[i]]) pi[[i]]^(ai[[i]] - 1), {i, 1, Length[pi]}],    
    ai[[1]] 2^ai[[1]] Product[((pi[[i]] - 1) ai[[i]] + pi[[i]])
       pi[[i]]^(ai[[i]] - 1), {i, 2, Length[pi]}]]]  

The prime factor decomposition of $n$ is done by the function FactorInteger[]. From this we extract the $p_i$ and $a_i$, and then apply the formula of Jovovic.

Exact values of the probabilities

We make the (natural) assumption that all possible remainders of a randomly chosen number $x$ modulo $n$ have equal probability.

Then we can calculate the exact values of the probabilities with the following piece of code (written here in Mathematica)

h[n_] := 1/n^2 Count[Flatten[Table[Mod[x^2 - y^2, n], {x, 0, n - 1}, {y, 0, n - 1}]], 0]

Explanation

For a given divisor $n$ the expression $z = x^2-y^2$ needs to be considered only for $x$ and $y$ bewteen $0$ and $n-1$. The Table[] lists all elements $x^2-y^2 mod(n)$, and Flatten[] puts them in one array. Now Count[.,0] counts the zeroes in this array. Dividing this by $n^2$ gives the probability.

The result for $n = 1..30$ in the format $\{n,p(n)\}$ are

$$h(n)_{tab} = \left( \begin{array}{ccccc} \{1,1\} & \left\{2,\frac{1}{2}\right\} & \left\{3,\frac{5}{9}\right\} & \left\{4,\frac{1}{2}\right\} & \left\{5,\frac{9}{25}\right\} \\ \left\{6,\frac{5}{18}\right\} & \left\{7,\frac{13}{49}\right\} & \left\{8,\frac{3}{8}\right\} & \left\{9,\frac{7}{27}\right\} & \left\{10,\frac{9}{50}\right\} \\ \left\{11,\frac{21}{121}\right\} & \left\{12,\frac{5}{18}\right\} & \left\{13,\frac{25}{169}\right\} & \left\{14,\frac{13}{98}\right\} & \left\{15,\frac{1}{5}\right\} \\ \left\{16,\frac{1}{4}\right\} & \left\{17,\frac{33}{289}\right\} & \left\{18,\frac{7}{54}\right\} & \left\{19,\frac{37}{361}\right\} & \left\{20,\frac{9}{50}\right\} \\ \left\{21,\frac{65}{441}\right\} & \left\{22,\frac{21}{242}\right\} & \left\{23,\frac{45}{529}\right\} & \left\{24,\frac{5}{24}\right\} & \left\{25,\frac{13}{125}\right\} \\ \left\{26,\frac{25}{338}\right\} & \left\{27,\frac{1}{9}\right\} & \left\{28,\frac{13}{98}\right\} & \left\{29,\frac{57}{841}\right\} & \left\{30,\frac{1}{10}\right\} \\ \end{array} \right)$$

The simulation results (see below) are in reasonable agreement with these results.

If $n$ is an odd prime number the probability is given by

$$p(n)=\frac{2 n-1}{n^2}$$

If $n$ is composite I have not found the exact formula. The problem here seems to be related to quadratic residues.

Monte-Carlo-Simulation

EDIT 11.04.17

We distinguish two possible basic sets of integers from which to select for the divisibility test:

  1. Set with repetition

We create a set $m$ consisting of all numbers $z = x^2-y^2$ of integers where $1<= x <= n_{max}, 1<= y <= n_{max}$.

  1. Set without repetition

The set $m_0$ is obtained by removing from $m$ all duplicates.

Then, for a given divisor $n$ we estimate the probability of divisibility by the ratio of the number of elements of the set for which $\frac{z}{n}$ is integer relative to all elements of the set.

The resulting probabilities for $m_{max} = 10^3$ and divisors in the range $n = 1..30$ are

Case 1 (with repetition)

List of results in the format $(n, p(n))$

$$((1, 1.0), (2, 0.5000005), (3, 0.55533378), (4, 0.5000005), (5, \ 0.35968128), (6, 0.27766739), (7, 0.26530612), (8, 0.37475112), (9, \ 0.25933407), (10, 0.17984114), (11, 0.17355372), (12, 0.27766739), \ (13, 0.14792899), (14, 0.13265356), (15, 0.19973533), (16, \ 0.25000075), (17, 0.11417454), (18, 0.12966754), (19, 0.10246197), \ (20, 0.17984114), (21, 0.14736213), (22, 0.08677736), (23, \ 0.08502487), (24, 0.20808462), (25, 0.10419251), (26, 0.073965), (27, \ 0.11103881), (28, 0.13265356), (29, 0.06774345), (30, 0.09986916))$$

The graph

enter image description here

Case 2 (no repetition)

List of results in the format $(n, p(n))$

$$((1, 0.5125414), (2, 0.19847984), (3, 0.22136804), (4, 0.19847984), \ (5, 0.14290505), (6, 0.08393305), (7, 0.10653981), (8, 0.11713062), \ (9, 0.08960969), (10, 0.05436621), (11, 0.07129035), (12, \ 0.08393305), (13, 0.06139016), (14, 0.04070156), (15, 0.05862769), \ (16, 0.06700892), (17, 0.04836223), (18, 0.03377941), (19, \ 0.04369956), (20, 0.05436621), (21, 0.04403089), (22, 0.02740017), \ (23, 0.03676344), (24, 0.04804486), (25, 0.03682731), (26, \ 0.0236357), (27, 0.03531034), (28, 0.04070156), (29, 0.02972352), \ (30, 0.02204489))$$

The graph

enter image description here

Notice the remarkable peridocity with a period of $4$ in the "artificial" case 2. I'm sure there is a simple explanation for this, and that some readers can give it.