Algorithm to find solution to $ax^2 + by^2 = 1$ in a finite field
Let $\mathbb{F}$ be a finite field, and let $a,b \in \mathbb{F}$ be given, subject to $a\ne 0, b \ne 0$. Consider the equation
$$ax^2 + by^2 = 1.$$
It is guaranteed that there exists a solution to this equation. My question:
Is there an efficient algorithm to compute a solution to this equation, given $a,b$?
The proof I've seen that there exists a solution doesn't immediately yield an efficient algorithm (it'd require enumerating all of the elements of $\mathbb{F}$, which is inefficient). Is there a more efficient algorithm?
By efficient, I'm thinking of something with polynomial running time, i.e., where the number of basic field operations is polynomial in $\lg |\mathbb{F}|$.
This reminds me a little bit of Pell's equation, in a finite field, but I haven't seen how to translate those methods over to a finite field. I don't know if that is a distraction or if it's somehow relevant.
There is a natural (randomized) algorithm:
Choose $x,y$ uniformly at random from $\mathbb{F}$.
Let $s = ax^2 + by^2$. If $s \ne 0$ and $s$ is a square in $\mathbb{F}$, let $r= \sqrt{s}$ (the square-root can be computed efficiently in a finite-field) and set $x'=x/r$, $y'=y/r$. Now $x',y'$ are a solution to the original equation, as can be readily verified, so we are done.
If not, go back to step 1.
Is there any guarantee that this algorithm's expected running time will be polynomial? Heuristically, we would expect the answer to be yes: about half of the elements of $\mathbb{F}$ are squares, so (assuming that $ax^2 + by^2$ is distributed uniformly at random when $x,y$ are) the number of iterations of this algorithm will be geometrically distributed with parameter $1/2$, or 2 iterations in expectation. Of course, this is just a heuristic.
Can we justify the running time more rigorously? I argue below that the answer is yes.
Let's dispose of a trivial situation first: if we are in characteristic two, then every value has a square root, so the above algorithm will succeed with near-certainty. (Letting $s=ax^2 + by^2$, we have $\Pr[s=0]=1/|\mathbb{F}|$, and assuming $s\ne 0$, $s$ is surely a square, so step 2 succeeds with probability $1-1/|\mathbb{F}| \ge 1/2$.)
Therefore, in what remains, we can assume that we are in odd characteristic.
To help us reason about this situation, let's divide both sides by $a$, to get the related equation
$$x^2 + c y^2 = d,$$
where I've defined $c=b/a$, $d=1/a$. Any solution to this equation will solve the original equation.
I will argue that $x^2 + c y^2$ is approximately uniform on $\mathbb{F}$, when $(x,y)$ is uniform on $\mathbb{F}^2$. By appropriate scaling, this will show that our heuristic was indeed rigorous.
I'll define a multiplicative group $\mathbb{G} = (\{(x,y) : x^2 + c y^2 \ne 0\}, \otimes)$, where the group operation $\otimes$ is defined as follows:
$$(x_1,y_1) \otimes (x_2,y_2) = (x_1 x_2 - c y_1 y_2, x_1 y_2 + x_2 y_1).$$
It is straightforward but tedious to verify that this indeed forms a group operation. Also, there is a group homomorphism $N : \mathbb{G} \to \mathbb{F}^*$ given by
$$N(x,y) = x^2 + c y^2.$$
Again, it is straightforward but tedious to verify that this is a homomorphism of multiplicative groups, e.g., $N((x,y) \otimes (x',y')) = N(x,y) \times N(x',y')$.
Now let $s,t$ be two non-zero elements of $\mathbb{F}$. Define
$$S = \{(x,y) \in \mathbb{G} : N(x,y) = s\}, T = \{(x,y) \in \mathbb{G} : N(x,y) = t\}.$$
Since $N$ is surjective (as shown here), these are two cosets of the kernel of $N(\cdot)$, and any two such cosets must have the same size (Lagrange's theorem), so it follows that $|S|=|T|$. But $s,t$ were arbitrary non-zero elements of $\mathbb{F}$, so it follows that $N(x,y)$ is distributed uniformly on $\mathbb{F}^*$ when $x,y$ is distributed uniformly on $\mathbb{G}$.
Now $\mathbb{G}$ holds the set of $(x,y)$ such that $x^2+cy^2\ne 0$, so we must consider the set $A=\{(x,y) : x^2+cy^2=0\} = \mathbb{F}^2 \setminus \mathbb{G}$. It turns out that this set is not very large: if $-c$ is a square in $\mathbb{F}$, then $|A| = 2|\mathbb{F}|-1$, otherwise $|A|=1$. It follows that, when $x,y$ are chosen uniformly at random from $\mathbb{F}$, the probability that $ax^2 + by^2=0$ is small ($\le 2/|\mathbb{F}|$), so the probability that we succeed in step 2 of the algorithm at the top of this answer is at least $(1-2/|\mathbb{F}|)/2$. Thus, the expected number of iterations is constant and small. This rigorously justifies the heuristic analysis above, and shows that the algorithm runs in polynomial time, as required.
Where on earth did this mysterious group $\mathbb{G}$ come from? The intuition is that it comes from looking at the a complexification of $\mathbb{F}$. In particular, let $\mathbb{K} = \mathbb{F}(\sqrt{-c})$, i.e., adjoin a square root of $-c$. Define a norm on $\mathbb{K}$, so that $N(z) = z \times \overline{z}$ where $\overline{z}$ is the complex conjugate of $z$: or more precisely, define $N(\cdot)$ by
$$N(x+y\sqrt{-c}) = (x+y\sqrt{-c}) (x-y\sqrt{-c}) = x^2 + c y^2.$$
This norm is multiplicative: $N(z_1 z_2) = N(z_1) N(z_2)$ for $z_1,z_2 \in \mathbb{K}$, and $N(z) \in \mathbb{F}$ for all $z \in \mathbb{K}$. The group operation of $\mathbb{G}$ is just multiplication in $\mathbb{K}$, and the group homomorphism $\mathbb{N}$ is just this norm. This intuition only works out if $-1$ is a non-square in $\mathbb{F}$ (so that $-c$ will be a non-square), but I think the argument I gave above actually holds regardless of whether $-1$ is a square or not.
I hope I haven't made any mistakes. My thanks to Steven Stadnicki for his careful proof-reading of an earlier attempt at answering this question and to Pablo Rotondo for helpful comments that improved this answer.