Why is the Frobenius norm of a matrix greater than or equal to its 2-norm?
Write $x=\sum_{j=1}^nc_je_j$ in the standard basis $e_1,\ldots,e_n$, for coefficients $c_1,\ldots,c_n$. Suppose that $\|x\|_2=1$, i.e. $\sum_j |c_j|^2=1$. Then \begin{align} \|Ax\|_2^2&=\left\|\sum_j c_j\,Ae_j\right\|_2^2\leq\left(\sum_j|c_j|\,\|Ae_j\|_2\right)^{2}\\ &\leq\left(\sum_j|c_j|^2\right)\sum_j\|Ae_j\|_2^2=\sum_j\|Ae_j\|_2^2=\|A\|_F^2, \end{align} where the triangle inequality is used in the first $\leq$ and Cauchy-Schwarz in the second; the last equality holds because $Ae_j$ is the $j$-th column of $A$, so the squared column norms sum to $\|A\|_F^2$.
As $x$ was an arbitrary unit vector, we get $\|A\|_2\leq\|A\|_F$.
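Not part of the proof, but as a quick numerical sanity check of the inequality, here is a small NumPy sketch (the dimensions and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

spectral = np.linalg.norm(A, 2)       # 2-norm: largest singular value of A
frobenius = np.linalg.norm(A, 'fro')  # Frobenius norm: sqrt of sum of |a_ij|^2

print(spectral, frobenius)
assert spectral <= frobenius + 1e-12  # ||A||_2 <= ||A||_F
```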
In fact, the proof from $\left\| \mathbf{A}\right\|_2 =\max_{\left\| \mathbf{x}\right\|_2=1} \left\| \mathbf{Ax} \right\|_2$ to $\left\| \mathbf{A}\right\|_2 = \sqrt{\lambda_{\max}(\mathbf{A}^H \mathbf{A})}$ is straightforward. We first prove that when $\mathbf{P}$ is Hermitian, $$ \lambda_{\max} = \max_{\| \mathbf{x} \|_2=1} \mathbf{x}^H \mathbf{Px}. $$ That's because when $\mathbf{P}$ is Hermitian, there exists a unitary matrix $\mathbf{U}$ that diagonalizes $\mathbf{P}$ as $\mathbf{U}^H \mathbf{PU}=\mathbf{D}$ (so $\mathbf{P}=\mathbf{UDU}^H$), where $\mathbf{D}$ is a diagonal matrix with the eigenvalues of $\mathbf{P}$ on its diagonal, and the columns of $\mathbf{U}$ are the corresponding eigenvectors. Letting $\mathbf{y}=\mathbf{U}^H \mathbf{x}$ and substituting $\mathbf{x} = \mathbf{Uy}$ into the optimization problem (note $\|\mathbf{y}\|_2=\|\mathbf{x}\|_2$ since $\mathbf{U}$ is unitary), we obtain
$$ \max_{\| \mathbf{x} \|_2=1} \mathbf{x}^H \mathbf{Px} = \max_{\| \mathbf{y} \|_2=1} \mathbf{y}^H \mathbf{Dy} = \max_{\| \mathbf{y} \|_2=1} \sum_{i=1}^n \lambda_i |y_i|^2 \le \lambda_{\max} \max_{\| \mathbf{y} \|_2=1} \sum_{i=1}^n |y_i|^2 = \lambda_{\max} $$
Moreover, by choosing $\mathbf{x}$ as the eigenvector corresponding to $\lambda_{\max}$, the bound is attained, so $\max_{\| \mathbf{x} \|_2=1} \mathbf{x}^H \mathbf{Px} = \lambda_{\max}$. Applying this with $\mathbf{P}=\mathbf{A}^H\mathbf{A}$ (which is Hermitian) and noting that $\|\mathbf{Ax}\|_2^2 = \mathbf{x}^H \mathbf{A}^H \mathbf{A} \mathbf{x}$ proves $\left\| \mathbf{A}\right\|_2 = \sqrt{\lambda_{\max}(\mathbf{A}^H \mathbf{A})}$.
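For what it's worth, this identity is easy to check numerically; a minimal sketch (random matrix and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# eigvalsh handles the Hermitian matrix A^H A and returns real
# eigenvalues in ascending order, so the last one is lambda_max
lam_max = np.linalg.eigvalsh(A.conj().T @ A)[-1]

print(np.linalg.norm(A, 2), np.sqrt(lam_max))  # agree up to rounding error
```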
And then, because the $n\times n$ matrix $\mathbf{A}^H \mathbf{A}$ is positive semidefinite, all of its eigenvalues are nonnegative. Assuming $\operatorname{rank} \mathbf{A}^H \mathbf{A}=r$, we can arrange the eigenvalues in decreasing order:
$$ \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r > \lambda_{r+1} = \cdots = \lambda_n = 0. $$
For all $\mathbf{X}\in \mathbb{C}^{n\times n}$, $$ \operatorname{trace}~\mathbf{X} = \sum\limits_{i=1}^{n} \lambda_i, $$ where $\lambda_i$, $i=1,2,\ldots,n$, are the eigenvalues of $\mathbf{X}$; moreover, it is easy to verify that $$ \left\| \mathbf{A}\right\|_F = \sqrt{\operatorname{trace}~ \mathbf{A}^H \mathbf{A}}, $$ so $\left\| \mathbf{A}\right\|_F^2 = \sum_{i=1}^{n}\lambda_i$ with the $\lambda_i$ now the eigenvalues of $\mathbf{A}^H\mathbf{A}$.
Thus, from $$ \sqrt{\lambda_1} \leq \sqrt{\sum_{i=1}^{n} \lambda_i} \leq \sqrt{r \cdot \lambda_1} $$ (the second inequality holds because only the $r$ largest eigenvalues are nonzero and each is at most $\lambda_1$) we have $$ \left\| \mathbf{A}\right\|_2 \leq \left\| \mathbf{A}\right\|_F \leq \sqrt{r} \left\| \mathbf{A}\right\|_2 $$
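Both bounds, including the $\sqrt{r}$ factor, can be checked numerically; a small sketch with a deliberately rank-deficient matrix (construction and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
# A 6x5 matrix of rank 2, so sqrt(r) = sqrt(2) in the upper bound
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))

r = np.linalg.matrix_rank(A)  # rank of A equals rank of A^H A
two = np.linalg.norm(A, 2)
fro = np.linalg.norm(A, 'fro')

print(two, fro, np.sqrt(r) * two)
assert two <= fro + 1e-12 and fro <= np.sqrt(r) * two + 1e-12
```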
The Frobenius norm is submultiplicative, so $\|Ax\|_F \leq \|A\|_F \|x\|_F$, which gives:
$$\forall x \neq 0, \quad \frac{ \|Ax\|_2 } {\|x\|_2} = \frac{ \|Ax\|_F } {\|x\|_F} \leq \|A\|_F, $$
where the first equality uses that the Frobenius norm of a vector is just its 2-norm.
So you have an upper bound for the quotient, and since the supremum (here a maximum) is by definition the least upper bound, you immediately get:
$$ \underset{x \neq 0}{\max}{\frac{\|Ax\|_2}{\|x\|_2}} =\|A\|_2 \leq \|A\|_F $$
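A brief Monte Carlo illustration of this bound on the quotient (sample count and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
fro = np.linalg.norm(A, 'fro')

# The quotient ||Ax||_2 / ||x||_2 should never exceed ||A||_F,
# and its maximum over all x is exactly ||A||_2
quotients = []
for _ in range(1000):
    x = rng.standard_normal(3)
    quotients.append(np.linalg.norm(A @ x) / np.linalg.norm(x))

assert max(quotients) <= fro + 1e-12
print(max(quotients), np.linalg.norm(A, 2), fro)
```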
Use the following two facts/properties:
- The Frobenius norm and the 2-norm coincide for vectors: $\|u\|_2 = \|u\|_{F}$ (viewing $u$ as an $n\times 1$ matrix).
- The Frobenius norm is submultiplicative: $\|AB\|_{F} \leq \|A\|_{F}\|B\|_{F}$ for any compatible matrices $A$, $B$ (in particular, when $B$ is a vector).
The proof then goes like this: $$ \begin{align*} \|A\|_{2} &= \sup_{\|u\|_{2}=1}\|Au\|_{2} \quad \text{(Definition of 2-norm)}\\&= \sup_{\|u\|_{2}=1}\|Au\|_{F} \quad \text{(Property 1)} \\&\leq \sup_{\|u\|_{2}=1}\|A\|_{F}\|u\|_{F} \quad \text{(Property 2)}\\&= \sup_{\|u\|_{2}=1}\|A\|_{F}\|u\|_{2} \quad \text{(Property 1)}\\&= \|A\|_{F}\sup_{\|u\|_{2}=1}\|u\|_{2} \\&= \|A\|_{F}. \end{align*} $$
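Both properties are easy to verify numerically as well; a minimal sketch (shapes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))
u = rng.standard_normal(3)

# Property 1: viewed as an n-by-1 matrix, u's Frobenius norm is its 2-norm
assert np.isclose(np.linalg.norm(u.reshape(-1, 1), 'fro'), np.linalg.norm(u))

# Property 2 (submultiplicativity), with B = u a vector
assert np.linalg.norm(A @ u) <= np.linalg.norm(A, 'fro') * np.linalg.norm(u) + 1e-12
```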