Why is the Operator Norm so hard to calculate?

First, as others have mentioned, the operator norm has many nice properties that make it convenient to use in proofs (most basically the fact that, by definition, it satisfies $\| Ax \| \le \| A \| \|x \|$). You might, for example, end up with factors of the operator norm in various bounds; even if you can't calculate the operator norm exactly, if you can upper or lower bound it as appropriate, you can still extract information from these bounds. To see the operator norm in action you can try learning some functional analysis; it becomes especially useful in the infinite-dimensional setting.

Second, here's how you calculate the operator norm (edit: when $p=2$). Let me assume that $A$ is real for simplicity, although it doesn't matter much. You want to maximize $\langle Ax, Ax \rangle = \|Ax\|^2$ as $x$ ranges over all unit vectors. This is equivalent to maximizing

$$\langle A^T A x, x \rangle.$$

Now, unlike $A$, the matrix $A^T A$ is symmetric, and so by the spectral theorem it has an orthonormal basis of eigenvectors. These are the right singular vectors $r_i$ of $A$, and the corresponding eigenvalues are the squares $\sigma_i^2$ of the singular values of $A$ (up to the appearance of some zeroes, which don't matter for this calculation). If we write $x$ in this basis as

$$x = \sum x_i r_i$$

we get that

$$\langle Ax, Ax \rangle = \sum \sigma_i^2 x_i^2$$

where $\langle x, x \rangle = \sum x_i^2 = 1$. This is a much easier optimization problem! It follows that $\langle Ax, Ax \rangle$ is maximized when $x$ is equal to a right singular vector corresponding to the largest singular value $\sigma_1$, and that its maximum value is $\sigma_1^2$. Hence $\sigma_1$ is the operator norm of $A$. Note that if $A$ is normal, this coincides with the largest absolute value of an eigenvalue of $A$, i.e. with its spectral radius.
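
If you want to see this argument numerically, here is a minimal NumPy sketch (the matrix is just a random example, not anything from the question): the largest eigenvalue of $A^T A$ agrees, up to sampling error, with the maximum of $\langle Ax, Ax \rangle$ over randomly sampled unit vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))               # arbitrary example matrix

# Largest eigenvalue of the symmetric matrix A^T A (= sigma_1^2).
sigma1_sq = np.linalg.eigvalsh(A.T @ A)[-1]   # eigvalsh returns eigenvalues in ascending order

# Crude check: maximize <Ax, Ax> over many random unit vectors x.
xs = rng.standard_normal((4, 100_000))
xs /= np.linalg.norm(xs, axis=0)
empirical = np.max(np.sum((A @ xs) ** 2, axis=0))

print(np.sqrt(sigma1_sq))   # the operator norm sigma_1
print(np.sqrt(empirical))   # slightly below sigma_1, since random sampling rarely hits the maximizer
```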

The largest singular value can be calculated in various ways. See the Wikipedia article on singular value decomposition for details.
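
For instance, with NumPy (just one of many options) the largest singular value comes straight out of the SVD, and `np.linalg.norm(A, 2)` returns the same number:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

sigma = np.linalg.svd(A, compute_uv=False)  # singular values in descending order
print(sigma[0])                             # largest singular value sigma_1
print(np.linalg.norm(A, 2))                 # the spectral (operator) norm, same value
```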


The point is: in pure mathematics you mostly don't care about actually calculating things ;)

Of course this is (partly) a joke, but the answer is really that this norm has a lot of nice properties (see functional analysis) and allows you to prove theorems which will make other calculations a lot easier. The trouble is that if you want to define things so that they actually work in theorems (satisfy some nice universal property and so on), they are often quite difficult, if not impossible, to compute in practice. But computing is not all you want, if the theorems make everything a lot clearer.


An operator norm is better than just a norm: it is an algebra norm (I am not sure whether that is the right term in English, but it is in French). The point is that the norm is submultiplicative: $$\forall A,B,\quad\|AB\|\leqslant\|A\|\times\|B\|.$$
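
A quick numerical illustration (a sketch with random matrices; the spectral norm is used here, but any induced norm behaves the same way):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# Submultiplicativity: ||AB|| <= ||A|| * ||B|| for the operator norm.
lhs = np.linalg.norm(A @ B, 2)
rhs = np.linalg.norm(A, 2) * np.linalg.norm(B, 2)
print(lhs <= rhs)   # True
```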


The operator norm of a square matrix $A$ is the square root of the largest eigenvalue of $A^T A$ (these eigenvalues are all nonnegative, since $A^T A$ is positive semidefinite). To see this, first note that $$\|A^T A\| = \sup\limits_{\|x\|=1} \| A^T A x \| = \sup\limits_{\|x\|=1} \sup\limits_{\|y\|=1} \langle A^T A x , y \rangle = \sup\limits_{\|x\|=1} \sup\limits_{\|y\|=1} \langle Ax , Ay \rangle = \sup\limits_{\|x\|=1} \|Ax\|^2,$$ where the last equality follows from Cauchy–Schwarz ($\langle Ax, Ay \rangle \le \|Ax\|\,\|Ay\|$) together with the choice $y = x$. Thus $\|A^T A\| = \|A\|^2$. Now, using the fact that a symmetric matrix has an orthonormal basis of eigenvectors, it's not hard to show that for a symmetric matrix $S$, $\|S\|$ is the largest absolute value of an eigenvalue of $S$. Since $A^T A$ is symmetric, this completes the proof.
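
Here is a small NumPy check of both identities, $\|A^T A\| = \|A\|^2$ and $\|A\| = \sqrt{\lambda_{\max}(A^T A)}$ (a random example matrix, as a sanity check rather than a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))

norm_A = np.linalg.norm(A, 2)            # operator norm of A
norm_AtA = np.linalg.norm(A.T @ A, 2)    # operator norm of the symmetric matrix A^T A
lam_max = np.linalg.eigvalsh(A.T @ A)[-1]

print(np.isclose(norm_AtA, norm_A ** 2))     # True: ||A^T A|| = ||A||^2
print(np.isclose(norm_A, np.sqrt(lam_max)))  # True: ||A|| = sqrt(largest eigenvalue of A^T A)
```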


It depends on the norm you take to begin with. Some matrix norms are hard to compute, others are not. In your example, for $p=2$, the norm of the matrix $A\in \mathbb{K}^{n\times n}$ is the square root of the maximal eigenvalue of $A^* A$. This computation is not too hard, even in large dimensions, since it is a Hermitian (resp. symmetric) eigenvalue problem.

Similarly, for $p=1$ and $p=\infty$ the matrix norm has a simple expression: the maximum absolute column sum and the maximum absolute row sum, respectively.
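
Concretely, in NumPy (a small sketch with an arbitrary matrix) the closed-form column/row sums agree with the built-in induced norms:

```python
import numpy as np

A = np.array([[ 1.0, -2.0],
              [ 3.0,  4.0]])

max_col_sum = np.abs(A).sum(axis=0).max()   # induced 1-norm: maximum absolute column sum
max_row_sum = np.abs(A).sum(axis=1).max()   # induced infinity-norm: maximum absolute row sum

print(max_col_sum, np.linalg.norm(A, 1))        # 6.0  6.0
print(max_row_sum, np.linalg.norm(A, np.inf))   # 7.0  7.0
```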

The technical reason why operator norms are great has been pointed out in previous answers. Submultiplicativity is very handy for many types of estimates. For instance, you get that $\|A\| \geq r(A)$, where $r(A)$ is the spectral radius of $A$. And on top of that there is the famous Gelfand formula $$ r(A) = \lim_{k\to \infty} \|A^k\|^{1/k} = \inf_{k\geq 1} \|A^k\|^{1/k},$$ which even holds for bounded linear operators on Banach spaces.
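
A small NumPy sketch of both facts (with a random example matrix): every term $\|A^k\|^{1/k}$ is at least the spectral radius $r(A)$, and the sequence converges to it.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

r = np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius r(A)

# Gelfand's formula: ||A^k||^(1/k) -> r(A), and each term is >= r(A).
for k in (1, 2, 4, 8, 16, 32):
    print(k, np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k))
print("r(A) =", r)
```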