Distance from a point to a hyperplane
I have a hyperplane in $\mathbb{R}^n$, given by $w'x + b = 0$, and a point $x_0$. The shortest distance from this point to the hyperplane is $d = \frac{|w \cdot x_0+ b|}{\|w\|}$. I have no trouble proving this in 2 and 3 dimensions using algebraic manipulations, but I fail to do so in $n$ dimensions. Can someone show a nice explanation for it?
There are many ways to solve this problem. In principle one can use Lagrange multipliers and solve a large system of equations, but my attempt to do so hit a roadblock. However, since you are working in $\mathbb{R}^n$, we have the privilege of orthogonal projection via the dot product. To this end we construct a vector from a point on the plane to $x_0$ and project it onto a vector perpendicular to the plane. The length of that projection is the distance from the plane to the point.
First, you have an affine hyperplane defined by $w \cdot x + b=0$ (with $w \neq 0$) and a point $x_0$. Suppose that $X \in \mathbb{R}^n$ is a point satisfying $w \cdot X+b=0$, i.e. it is a point on the plane. You should construct the vector $x_0 - X$, which points from $X$ to $x_0$, so that you can project it onto a vector perpendicular to the plane. Such a vector is unique up to scaling, and some quick reasoning should tell you that $w$ is one: if $X$ and $Y$ both lie on the plane, then $w \cdot (X - Y) = -b + b = 0$, so $w$ is orthogonal to every direction within the plane. So we want to compute $\| \text{proj}_{w} (x_0-X)\|$. Some handy formulas give us $$ d=\| \text{proj}_{w} (x_0-X)\| = \left\| \frac{(x_0-X)\cdot w}{w \cdot w} w \right\| = |x_0 \cdot w - X \cdot w|\frac{\|w\|}{\|w\|^2} = \frac{|x_0 \cdot w - X \cdot w|}{\|w\|}$$ We chose $X$ such that $w\cdot X=-b$, so we get $$ d=\| \text{proj}_{w} (x_0-X)\| = \frac{|x_0 \cdot w +b|}{\|w\|} $$ as desired.
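As a quick numerical sanity check of the projection argument, here is a minimal Python sketch. The helper names `dot` and `distance_by_projection`, as well as the particular plane and points, are my own choices for illustration and are not part of the derivation above. It shows that the projection length does not depend on which point $X$ of the plane is used.

```python
import math

def dot(u, v):
    """Euclidean dot product of two equal-length sequences."""
    return sum(a * c for a, c in zip(u, v))

def distance_by_projection(w, b, x0, X):
    """Length of the projection of (x0 - X) onto w, for X on the plane w.x + b = 0."""
    if abs(dot(w, X) + b) > 1e-9:
        raise ValueError("X must lie on the hyperplane")
    diff = [p - q for p, q in zip(x0, X)]
    return abs(dot(diff, w)) / math.sqrt(dot(w, w))

# Illustrative data: the plane x + 2y + 2z - 3 = 0 in R^3 and the point (1, 1, 1).
w, b, x0 = [1.0, 2.0, 2.0], -3.0, [1.0, 1.0, 1.0]
X1 = [3.0, 0.0, 0.0]  # on the plane: 3 + 0 + 0 - 3 = 0
X2 = [1.0, 1.0, 0.0]  # on the plane: 1 + 2 + 0 - 3 = 0

d1 = distance_by_projection(w, b, x0, X1)
d2 = distance_by_projection(w, b, x0, X2)
closed_form = abs(dot(w, x0) + b) / math.sqrt(dot(w, w))
print(d1, d2, closed_form)  # all three agree (2/3 for this data)
```

Both choices of $X$ give the same value as the closed form, which is exactly the step where $w \cdot X$ is replaced by $-b$.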
This almost seems like cheating and purely heuristic based on Euclidean geometry. Indeed, I would be more satisfied with a solution via Lagrange multipliers since it would not have required the fact that $\mathbb{R}^n$ has an inner product and just needed the topology of $\mathbb{R}^n$ instead. But we have the inner product, so maybe geometry will suffice for us this time.
To make this argument more concrete you should do each step in $\mathbb{R}^2$ for a line $y=mx+b$ and a point $(x_0,y_0)$.
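A sketch of how that exercise might go (the specific line and point at the end are chosen purely for illustration): rewrite the line as $mx - y + b = 0$, so that the normal vector is $w = (m, -1)$ and the offset is $b$. The point $X = (0, b)$ lies on the line, and projecting $(x_0, y_0) - X$ onto $w$ gives $$ d = \frac{|(x_0 - 0)\,m + (y_0 - b)(-1)|}{\sqrt{m^2 + 1}} = \frac{|m x_0 - y_0 + b|}{\sqrt{m^2+1}}. $$ For instance, for the line $y = 2x + 1$ and the point $(3, 2)$ this gives $d = |6 - 2 + 1|/\sqrt{5} = \sqrt{5}$.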
Here is a Lagrange multiplier based solution.
The goal is to minimize $ (x_0 - x)'(x_0 - x) $ subject to $ w'x + b = 0 $.
The Lagrangian, with multiplier $L$, is $ (x_0 - x)'(x_0 - x) + L(w'x + b) $.
Setting its gradient with respect to $x$ to zero gives $ -2(x_0 - x) + Lw = 0 $, i.e. $ 2(x_0 - x) - Lw = 0 $.
Taking the dot product of this equation with $ w $, we get $ 2w'(x_0 - x) - Lw'w = 0 \implies L = \frac{2w'(x_0 - x)}{w'w} $
Taking the dot product of the same equation with $ (x_0 - x) $, we get $ 2(x_0 - x)'(x_0 - x) - L(x_0 - x)'w = 0 $. Substituting the value of $L$ found above gives $$ (x_0 - x)'(x_0 - x) = \frac{\left(w'(x_0 - x)\right)^2}{w'w}. $$ Since $x$ lies on the hyperplane, $w'x = -b$, so $w'(x_0 - x) = w'x_0 + b$ and therefore $$ (x_0 - x)'(x_0 - x) = \frac{\left(w'x_0 + b\right)^2}{w'w}. $$
Taking the square root, and using $\sqrt{w'w} = \|w\|$, gives the answer we wanted: $d = \frac{|w'x_0 + b|}{\|w\|}$.
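The stationarity condition $2(x_0 - x) - Lw = 0$ can also be solved for $x$ explicitly, giving $x = x_0 - \tfrac{L}{2} w$ with $L = 2(w'x_0 + b)/(w'w)$. The following Python sketch checks this numerically; the function name `lagrange_solution`, the variable `lam` (standing in for the multiplier $L$), and the example data are assumptions of mine for illustration only.

```python
import math

def dot(u, v):
    """Euclidean dot product of two equal-length sequences."""
    return sum(a * c for a, c in zip(u, v))

def lagrange_solution(w, b, x0):
    """Return (lam, x_star) solving 2(x0 - x) - lam*w = 0 together with w.x + b = 0."""
    ww = dot(w, w)
    if ww == 0:
        raise ValueError("w must be nonzero")
    lam = 2.0 * (dot(w, x0) + b) / ww
    x_star = [p - 0.5 * lam * wi for p, wi in zip(x0, w)]
    return lam, x_star

# Illustrative data: the line 3x + 4y - 5 = 0 in R^2 and the point (3, 4).
w, b, x0 = [3.0, 4.0], -5.0, [3.0, 4.0]
lam, x_star = lagrange_solution(w, b, x0)

# Residuals of the two conditions; both should vanish up to rounding.
constraint_residual = dot(w, x_star) + b
stationarity = [2.0 * (p - q) - lam * wi for p, q, wi in zip(x0, x_star, w)]

dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(x0, x_star)))
closed_form = abs(dot(w, x0) + b) / math.sqrt(dot(w, w))
print(x_star, dist, closed_form)
```

For this data the constrained minimizer is approximately $(0.6, 0.8)$, and both `dist` and `closed_form` come out as $4$ up to floating-point rounding.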
The problem has a simple solution via elementary geometry. Indeed, consider the line through $x_0$ parallel to the vector $w$, namely $L := \{x_0 + tw \mid t \in \mathbb R\} \subseteq \mathbb R^n$. This line cuts your hyperplane when $w^\top(x_0+tw) + b = 0$, i.e. $t = -(w^\top x_0 + b)/\|w\|^2$. The distance between this point of intersection and the starting point $x_0$ is $$ d := \|x_0 + tw - x_0\| = \|tw\|=|t|\|w\| = \frac{|w^\top x_0+b|}{\|w\|^2}\cdot\|w\| = \frac{|w^\top x_0 + b|}{\|w\|}, $$ as claimed.
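To see this construction in action, here is a small Python sketch. The function name `foot_on_hyperplane`, the four-dimensional example data, and the in-plane direction `v` are hypothetical choices of mine. It computes $t$ and the intersection point, then compares against a few other points of the hyperplane to illustrate that none of them is closer to $x_0$.

```python
import math

def dot(u, v):
    """Euclidean dot product of two equal-length sequences."""
    return sum(a * c for a, c in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def foot_on_hyperplane(w, b, x0):
    """Intersect the line x0 + t*w with the hyperplane w.x + b = 0; return (t, point)."""
    t = -(dot(w, x0) + b) / dot(w, w)
    return t, [p + t * wi for p, wi in zip(x0, w)]

# Illustrative data in R^4; here ||w|| = 5 and w.x0 + b = 5.
w, b, x0 = [1.0, -2.0, 2.0, 4.0], 5.0, [2.0, 0.0, 1.0, -1.0]
t, foot = foot_on_hyperplane(w, b, x0)
d = abs(t) * norm(w)

# v is orthogonal to w (1*2 + (-2)*1 = 0), so foot + s*v stays on the hyperplane.
v = [2.0, 1.0, 0.0, 0.0]
other_distances = []
for s in (-1.0, 0.5, 2.0):
    y = [f + s * vi for f, vi in zip(foot, v)]
    other_distances.append(norm([p - q for p, q in zip(x0, y)]))

print(t, foot, d, other_distances)
```

Every entry of `other_distances` is at least `d`, as expected from the Pythagorean theorem: moving within the hyperplane adds a component orthogonal to $w$.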