Deriving the normal distance from the origin to the decision surface

While studying discriminant functions for linear classification, I encountered the following:

... if $\textbf{x}$ is a point on the decision surface, then $y(\textbf{x}) = 0$, and so the normal distance from the origin to the decision surface is given by:

$$ \frac{\textbf{w}^T \textbf{x}}{\lVert \textbf{w} \rVert} = -\frac{w_0}{\lVert \textbf{w} \rVert} \tag 1 $$

where $\textbf{w}$ is a weight vector and $w_0$ is a bias. In an attempt to derive the above formula I tried the following:

\begin{align*} & \textbf{w}^T \textbf{x} + w_0 = 0 \tag 2\\ & \textbf{w}^T \textbf{x} = -w_0 \tag 3 \end{align*}

After this I am basically stuck. I think the author gets from equation $(3)$ to equation $(1)$ by normalising. But isn't calculating the normal (perpendicular) distance quite separate from normalising a vector? Secondly, how does equation $(1)$ translate into the normal distance being $-\frac{w_0}{\lVert \textbf{w} \rVert}$? That is, how is the quantity $\frac{\textbf{w}^T \textbf{x}}{\lVert \textbf{w} \rVert}$ the normal distance?
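To make the question concrete, here is a small numerical sketch of the two sides of equation $(1)$ (the 2-D weight vector, bias, and boundary point are just made-up values). They agree numerically, but I don't see why that common value is the perpendicular distance:

```python
import numpy as np

# Toy 2-D example (made-up values): w and w0 define the decision surface
# w^T x + w0 = 0.
w = np.array([3.0, 4.0])   # weight vector, ||w|| = 5
w0 = -10.0                 # bias

# Pick a point x on the decision surface, e.g. with x1 = 2:
# 3*2 + 4*x2 - 10 = 0  =>  x2 = 1
x = np.array([2.0, 1.0])
assert np.isclose(w @ x + w0, 0.0)

# The two sides of equation (1):
lhs = (w @ x) / np.linalg.norm(w)   # w^T x / ||w||
rhs = -w0 / np.linalg.norm(w)       # -w0 / ||w||
print(lhs, rhs)                     # both equal 2.0
```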


I encountered the same confusion - it's one of the few places where Bishop is unclear. I derived the distance from the origin to the hyperplane in a different way. Since we know that $w$ is orthogonal to the hyperplane, we know that the point $x'$ on the hyperplane that is closest to the origin can be represented as $x'=\alpha w$ for some scalar $\alpha$. Then, since $x'$ is on the hyperplane, we know that $w^T x' + w_0=0 \Rightarrow \alpha w^Tw+w_0=0 \Rightarrow \alpha=\frac{-w_0}{||w||^2}$. Then the distance from $x'$ to the origin is just $||x'||=||\alpha w||=\alpha\,||w||=\frac{-w_0}{||w||^2}||w||=\frac{-w_0}{||w||}$. This assumes that $w_0$ is negative, but if you want signed distances, you can modify things to fit your convention.
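A quick numerical sanity check of this argument (the weight vector and bias below are arbitrary, with $w_0$ negative so the signs work out as described):

```python
import numpy as np

w = np.array([1.0, 2.0, 2.0])   # arbitrary weight vector, ||w|| = 3
w0 = -6.0                       # bias chosen negative, as assumed above

# Closest point on the hyperplane to the origin: x' = alpha * w
alpha = -w0 / (w @ w)
x_prime = alpha * w

# x' lies on the hyperplane ...
assert np.isclose(w @ x_prime + w0, 0.0)

# ... and its distance to the origin matches -w0 / ||w||
print(np.linalg.norm(x_prime))      # 2.0
print(-w0 / np.linalg.norm(w))      # 2.0
```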


There is a simple proof which I think is what C. Bishop was hinting at. We have established that the weight vector $\vec{w}$ is orthogonal to the decision boundary. Now take the vector from the origin to a point $\textbf{x}$ on the boundary (let's call that vector $\vec{x}$). The projection of $\vec{x}$ onto $\vec{w}$ has magnitude equal to the orthogonal distance from the origin to the decision boundary. This projection, which we write $\mathrm{proj}_{\vec{w}} \vec{x}$, is given by $\frac{\vec{w} \cdot \vec{x}}{\|\vec{w}\|^2} \vec{w}$, so $$ \|\mathrm{proj}_{\vec{w}} \vec{x}\| =\frac{\vec{w} \cdot \vec{x}}{\|\vec{w}\|} $$ (see https://en.wikibooks.org/wiki/Linear_Algebra/Orthogonal_Projection_Onto_a_Line). Since $\vec{x}$ lies on the boundary, we have $\vec{w} \cdot \vec{x} + w_0 = 0$, so in the end the orthogonal distance is $$r = \frac{\vec{w} \cdot \vec{x}}{\|\vec{w}\|} = -\frac{w_0}{\|\vec{w}\|}$$

[Figure: projection of the vector from the origin to a point on the decision boundary onto the weight vector]
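If it helps, here is the same projection argument checked numerically (arbitrary $\vec{w}$, $w_0$, and boundary point):

```python
import numpy as np

w = np.array([2.0, -1.0])   # arbitrary weight vector
w0 = -4.0                   # arbitrary bias

# Any point x on the decision boundary w.x + w0 = 0, e.g. x1 = 3 => x2 = 2
x = np.array([3.0, 2.0])
assert np.isclose(w @ x + w0, 0.0)

# Projection of x onto w, and its length
proj = (w @ x) / (w @ w) * w
r = np.linalg.norm(proj)

print(r)                              # orthogonal distance to the boundary
print((w @ x) / np.linalg.norm(w))    # same value
print(-w0 / np.linalg.norm(w))        # same value
```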