Why does implicit differentiation work on non-functions?
Solution 1:
When you write down an equation in $x$ and $y$ of the form $$F(x,y)=0,$$ then you can define a curve $\mathcal{C}$ in the plane by saying that $(x_0,y_0)\in\mathcal{C}$ if and only if $F(x_0,y_0)=0$... a point is on the curve if and only if it satisfies the equation of the curve.
Now for different $F(x,y)$ you get different curves. Some might be empty, such as $x^2+y^2+1=0$, while others are the entire plane, such as $\sin^2x+\cos^2x-1=0$.
Others are nicer, such as $x^2+y^2-1=0$, which gives you a circle, or an equation that gives a figure-of-eight. Now locally (i.e. near a specific point) these curves might not be that nice, in that they don't always look like the graphs of functions, let alone differentiable functions.
For example, the point in the middle of the figure-of-eight does not look like the graph of a function. Similarly on the left- and right-hand sides of the circle there are vertical tangents --- this is not the behaviour of a differentiable function.
However away from such danger points --- such as anywhere else on the circle --- if you zoom in close enough to the point, locally, the circle looks like the graph of a function. So what we do is almost take each point $(x,y)$ on a case-by-case basis and say, O.K., near this point we could in theory define a function $y=y(x)$.
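The danger points described above can be flagged mechanically, at least as a rough sketch: by the implicit function theorem, a point where $\partial F/\partial y\neq 0$ is one near which the curve is the graph of some $y=y(x)$. The Python below is only an illustration. The helper `classify` and its labels are made up here, and since the figure-of-eight equation is not specified, the lemniscate $x^4-x^2+y^2=0$ is assumed as a stand-in.

```python
# Hypothetical helper: classify a point of the curve F(x, y) = 0 using the
# partial derivatives F_x and F_y (supplied by hand for each example).

def classify(F_x, F_y, point, tol=1e-12):
    x, y = point
    fx, fy = F_x(x, y), F_y(x, y)
    if abs(fy) > tol:
        # Implicit function theorem applies: locally y = y(x).
        return "local graph y = y(x)"
    if abs(fx) > tol:
        # Only x = x(y) is guaranteed, with dx/dy = 0: a vertical tangent.
        return "vertical tangent"
    # Both partials vanish: the theorem gives no information here.
    return "singular point"

# Circle: F(x, y) = x^2 + y^2 - 1
circle_Fx = lambda x, y: 2.0 * x
circle_Fy = lambda x, y: 2.0 * y

# Assumed figure-of-eight: F(x, y) = x^4 - x^2 + y^2
eight_Fx = lambda x, y: 4.0 * x**3 - 2.0 * x
eight_Fy = lambda x, y: 2.0 * y

generic_point = classify(circle_Fx, circle_Fy, (0.6, 0.8))
right_edge = classify(circle_Fx, circle_Fy, (1.0, 0.0))
crossing = classify(eight_Fx, eight_Fy, (0.0, 0.0))

print(generic_point, right_edge, crossing, sep="\n")
```

Note that $\partial F/\partial y\neq 0$ is a sufficient condition, not a necessary one, so the last two labels only say where the theorem stops helping.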
Take the example of your circle. Suppose we are away from the vertical tangents and are thinking about a point $P=(x,y)$. The implicit function theorem tells you that close to $P$ there is a function $y=y(x)$ whose graph is locally the same as the circle (in fact for the circle you have $y(x)=\sqrt{1-x^2}$ near points of the upper half and $y(x)=-\sqrt{1-x^2}$ near points of the lower half).
So we assume that, near $P$, we actually have (I actually get my students to write out $y=y(x)$ to help them see Chain Rules and not have $y'=1$) $$x^2+[y(x)]^2-1=0.$$
Now these are two equal functions (LHS and RHS), so they have the same derivative: $$2x+2y(x)\cdot\frac{dy}{dx}=0\Rightarrow x+y\frac{dy}{dx}=0...$$
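As a sanity check on this computation, one can compare the slope obtained from $x+y\frac{dy}{dx}=0$, namely $\frac{dy}{dx}=-x/y$ when $y\neq0$, with a finite-difference slope of the explicit local branch. This is a minimal numerical sketch, not a proof; the function names and the sample point $(0.6, 0.8)$ are chosen here purely for illustration.

```python
import math

def upper_branch(x):
    """Explicit local solution y(x) = +sqrt(1 - x^2) on the upper semicircle."""
    return math.sqrt(1.0 - x * x)

def lower_branch(x):
    """Explicit local solution y(x) = -sqrt(1 - x^2) on the lower semicircle."""
    return -math.sqrt(1.0 - x * x)

def implicit_slope(x, y):
    """Slope from x + y*dy/dx = 0, i.e. dy/dx = -x/y (requires y != 0)."""
    if y == 0:
        raise ValueError("vertical tangent: dy/dx is undefined where y = 0")
    return -x / y

def central_difference(f, x, h=1e-6):
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x0 = 0.6

# Upper half of the circle: the point (0.6, 0.8)
slope_upper_implicit = implicit_slope(x0, upper_branch(x0))
slope_upper_numeric = central_difference(upper_branch, x0)

# Lower half of the circle: the point (0.6, -0.8)
slope_lower_implicit = implicit_slope(x0, lower_branch(x0))
slope_lower_numeric = central_difference(lower_branch, x0)

print(slope_upper_implicit, slope_upper_numeric)
print(slope_lower_implicit, slope_lower_numeric)
```

The same formula $-x/y$ handles both branches at once, which is the practical payoff: you never have to decide which sign of the square root you are on.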
Solution 2:
I think they are being a bit sloppy. They could phrase this more carefully by saying something like:
Suppose that $y$ is a differentiable function that satisfies \begin{equation*} x^2 + y(x)^2 = 1 \end{equation*} for all $x$ in $(-1,1)$. By differentiating both sides of this equation, we find that $2x + 2y(x)y'(x) = 0$, and since $y(x)^2 = 1 - x^2 > 0$ on this interval we may divide by $y(x)$ to get \begin{equation*} y'(x) = -\frac{x}{y(x)} \end{equation*} for all $x$ in $(-1,1)$.
Solution 3:
An equation of the form $$F(x,y)=0\tag{1}$$ generically defines a curve $\gamma$ in the $(x,y)$-plane, which may have self-intersections or singularities. When ${\bf p}=(x_0,y_0)$ is a "normal" point on $\gamma$ (one where $F_x$ and $F_y$ do not both vanish) then you can draw a small window $$W:=\ ]x_0-h,x_0+h[\ \times \ ]y_0-h,y_0+h[\ $$ with center ${\bf p}$. Within this window the curve $\gamma$ can be viewed as the graph of a "local" function $y=\phi(x)$ or $x=\psi(y)$. One then says that within $W$ the function $\phi$, resp. $\psi$, is implicitly defined by the given equation $(1)$. When $F_y(x_0,y_0)\ne0$ one has the following formula for the derivative of $\phi$ at $x_0$: $$\phi'(x_0)=-{F_x(x_0,y_0)\over F_y(x_0,y_0)}\ .$$ In this way one can compute the derivative of $\phi$ at individual points even if it is not possible to solve $(1)$ explicitly for $y$ in terms of the variable $x$.
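To see the formula at work on an equation that has no convenient closed-form solution for $y$, here is a small numerical sketch. The example $F(x,y)=y^5+xy-3$ with the point $(2,1)$ is an assumption made for illustration, and the local function $\phi$ is approximated by Newton's method in $y$ rather than solved for exactly.

```python
def F(x, y):
    """Example equation (chosen for illustration): y^5 + x*y - 3 = 0."""
    return y**5 + x * y - 3.0

def F_x(x, y):
    """Partial derivative of F with respect to x."""
    return y

def F_y(x, y):
    """Partial derivative of F with respect to y."""
    return 5.0 * y**4 + x

def phi(x, y_guess=1.0, steps=50):
    """Approximate the local solution y = phi(x) of F(x, y) = 0 near y_guess
    by Newton iteration in the y variable."""
    y = y_guess
    for _ in range(steps):
        y -= F(x, y) / F_y(x, y)
    return y

x0, y0 = 2.0, 1.0            # a point on the curve: 1 + 2 - 3 = 0

# Slope from the formula phi'(x0) = -F_x / F_y, valid because F_y(2, 1) = 7 != 0
slope_formula = -F_x(x0, y0) / F_y(x0, y0)

# Slope from differencing the numerically computed local function
h = 1e-5
slope_numeric = (phi(x0 + h) - phi(x0 - h)) / (2.0 * h)

print(slope_formula, slope_numeric)
```

The two slopes agree to within the finite-difference error, even though $\phi$ itself is only ever evaluated numerically.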