Understanding weighted inner product and weighted norms

I am reading a book in which, on page 27, the following definitions of weighted inner products and weighted norms are given.

Let $M$ and $N$ be Hermitian positive definite matrices of orders $m$ and $n$, respectively. The weighted inner products in $\mathbb{C}^{m}$ and $\mathbb{C}^{n}$ are

$$(x, y)_{M} = y^{*}Mx, \quad x, y \in \mathbb{C}^{m} \qquad \text{and} \qquad (x, y)_{N} = y^{*}Nx, \quad x, y \in \mathbb{C}^{n}. \tag{1}$$

The definitions of weighted vector norms are

$$\|x\|_{M} = (x, x)_{M}^{1/2} = (x^{*}Mx)^{1/2} = \|M^{1/2} x\|_{2}, \quad x\in\mathbb{C}^{m}, \tag{2}$$

$$\|x\|_{N} = (x, x)_{N}^{1/2} = (x^{*}Nx)^{1/2} = \|N^{1/2} x\|_{2}, \quad x\in\mathbb{C}^{n}. \tag{3}$$

The definitions of weighted matrix norms are

$$\|A\|_{MN} = \max_{\|x\|_{N} = 1}\|Ax\|_{M}, \quad x \in\mathbb{C}^{n},\; A\in \mathbb{C}^{m\times n},$$

$$\|B\|_{NM} = \max_{\|x\|_{M} = 1}\|Bx\|_{N}, \quad x \in\mathbb{C}^{m},\; B\in \mathbb{C}^{n\times m}.$$

Such a norm is sometimes called an operator norm subordinate to a vector norm. It is easy to verify that

$$\|A\|_{MN} = \|M^{1/2} A N^{-1/2}\|_{2}, \tag{4}$$

$$\|B\|_{NM} = \|N^{1/2} B M^{-1/2}\|_{2}. \tag{5}$$

Could anybody explain the significance of weighted norms? Why do we need weighted norms? In $(2)$, how do we get $\|M^{1/2} x\|_{2}$? How can we find the square root of a matrix $M$? And how do we get equations $(4)$ and $(5)$?

I would be very thankful for any help and suggestions.


Solution 1:

Weighted norms have a variety of uses. Suppose you're measuring the size of vectors that are coming out of some random or physical process, and they look like this: $$ \begin{bmatrix} +5.4\times 10^{-10} \\ -1.3\times 10^{+6} \\ \end{bmatrix} \begin{bmatrix} +1.8\times 10^{-9} \\ -4.3\times 10^{+5} \\ \end{bmatrix} \begin{bmatrix} -2.3\times 10^{-9} \\ +3.4\times 10^{+5} \\ \end{bmatrix} \begin{bmatrix} +8.6\times 10^{-10} \\ +3.6\times 10^{+6} \\ \end{bmatrix} \begin{bmatrix} -3.2\times 10^{-10} \\ +2.7\times 10^{+6} \\ \end{bmatrix} $$ Would it make sense to use the standard Euclidean norm $\|\cdot\|_2$ to measure the size of these vectors? I say no. The values of $x_1$ hover around $10^{-9}$, $x_2$ around $10^6$. Since $x_1$ is so much smaller than $x_2$, $\|x\|_2\approx |x_2|$. You're losing information about $x_1$ with this measurement.

What you might choose to do in this circumstance is select a diagonally weighted norm $\|x\|_D\triangleq\sqrt{x^*Dx}$, with the values of $D_{ii}>0$ chosen to "normalize" each entry. For instance, I might choose $D_{11}=10^{18}$ and $D_{22}=10^{-12}$. The values of $D^{1/2} x$ are $$ \begin{bmatrix} +0.54 \\ -1.3 \end{bmatrix} \begin{bmatrix} +1.8 \\ -0.43 \end{bmatrix} \begin{bmatrix} -2.3 \\ +0.34 \end{bmatrix} \begin{bmatrix} +0.86 \\ +3.6 \end{bmatrix} \begin{bmatrix} -0.32 \\ +2.7 \end{bmatrix} $$ Now small relative changes in $x_1$ will have approximately the same impact on the norm $\|x\|_D=\sqrt{x^*Dx}=\|D^{1/2}x\|_2$ as small relative changes in $x_2$. This is probably a more informative norm for this set of vectors than a standard Euclidean norm.
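Here is a short numerical sketch of this idea in Python with NumPy, using the sample vectors and the weights $D_{11}=10^{18}$, $D_{22}=10^{-12}$ chosen above. It shows the Euclidean norm collapsing to $|x_2|$ while the weighted norm reflects both entries:

```python
import numpy as np

# The five sample vectors from above: x1 ~ 1e-9, x2 ~ 1e6.
xs = [np.array([+5.4e-10, -1.3e+6]),
      np.array([+1.8e-9,  -4.3e+5]),
      np.array([-2.3e-9,  +3.4e+5]),
      np.array([+8.6e-10, +3.6e+6]),
      np.array([-3.2e-10, +2.7e+6])]

# Diagonal weights chosen to normalize each entry, so that
# D^{1/2} = diag(1e9, 1e-6) rescales both components to order 1.
D = np.diag([1e18, 1e-12])
D_half = np.sqrt(D)  # elementwise sqrt is the matrix square root of a diagonal matrix

for x in xs:
    norm_2 = np.linalg.norm(x)      # Euclidean norm: dominated by |x2|
    norm_D = np.sqrt(x @ D @ x)     # weighted norm ||x||_D = sqrt(x^T D x)
    assert np.isclose(norm_D, np.linalg.norm(D_half @ x))  # ||x||_D = ||D^{1/2} x||_2
    print(f"||x||_2 = {norm_2:.3e},   ||x||_D = {norm_D:.3f}")
```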

Diagonally weighted norms are probably the easiest to justify intuitively, but in fact more general weighted norms have their uses. For instance, they come up often in proofs about Newton's method.

For information about matrix square roots, Wikipedia really is not a bad place to start, or any reasonably good linear algebra text. Square roots exist for any Hermitian positive semidefinite matrix---that is, any Hermitian matrix with nonnegative real eigenvalues.

Two types of square roots are typically considered for a real symmetric/complex Hermitian PSD matrix $M$. The lower triangular Cholesky factor $L$ satisfying $M=LL^*$ is simpler to compute in practice. But the symmetric/Hermitian square root $Q=M^{1/2}$ satisfying $M=Q^2$ is often preferred in proofs, because then you don't have to keep track of transposes, and because sometimes it is helpful for $Q$ and $M$ to share eigenvectors.
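As a concrete illustration, here is a minimal sketch in Python with NumPy/SciPy computing both kinds of square root; the test matrix $M$ is just a random Hermitian positive definite matrix built for the demo:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)

# A random Hermitian positive definite test matrix: M = B B* + I.
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
M = B @ B.conj().T + np.eye(4)

# Cholesky factor: lower triangular L with M = L L*.
L = np.linalg.cholesky(M)
print(np.allclose(L @ L.conj().T, M))          # True

# Hermitian square root via eigendecomposition:
# M = V diag(w) V*  =>  M^{1/2} = V diag(sqrt(w)) V*.
w, V = np.linalg.eigh(M)
Q = V @ np.diag(np.sqrt(w)) @ V.conj().T
print(np.allclose(Q @ Q, M))                   # True: M = Q^2
print(np.allclose(Q, sqrtm(M)))                # agrees with scipy's sqrtm
```

Note that $Q$ is built from the eigenvectors $V$ of $M$, so $Q$ and $M$ share eigenvectors, which is exactly the property mentioned above.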

With the symmetric square root defined, the derivation of (2) is straightforward: $$\|M^{1/2}x\|_2 = \left(x^*(M^{1/2})^*M^{1/2}x\right)^{1/2} = \left(x^*M^{1/2}M^{1/2}x\right)^{1/2} = \left(x^*Mx\right)^{1/2} = \|x\|_M.$$ Here is a derivation of (4). First, we write the 2-norm as a maximization and absorb the factor $M^{1/2}$ into a weighted norm: $$\|M^{1/2}AN^{-1/2}\|_2 = \max_{\|x\|_2=1} \|M^{1/2}(AN^{-1/2}x)\|_2 = \max_{\|x\|_2=1} \|AN^{-1/2}x\|_M.$$ Now we substitute $y=N^{-1/2} x$, or $x=N^{1/2} y$: $$\max_{\|x\|_2=1} \|AN^{-1/2}x\|_M = \max_{\|N^{1/2} y\|_2=1} \|Ay\|_M = \max_{\|y\|_N=1}\|Ay\|_M = \|A\|_{MN}.$$ The derivation of (5) is identical, with the roles of $M$ and $N$ exchanged.
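These identities are easy to sanity-check numerically. The following sketch (Python with NumPy; the test matrices are random, and `hpd` and `sqrt_h` are small helpers written for this demo) verifies (2) directly, and verifies (4) by exhibiting a maximizer via the SVD, mirroring the change of variables $y = N^{-1/2}x$ used in the proof:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5

def hpd(k):
    """Random Hermitian positive definite matrix of order k."""
    B = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return B @ B.conj().T + np.eye(k)

def sqrt_h(M):
    """Hermitian square root via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(w)) @ V.conj().T

M, N = hpd(m), hpd(n)
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
M_half, N_half = sqrt_h(M), sqrt_h(N)

# (2): ||x||_M = (x* M x)^{1/2} = ||M^{1/2} x||_2.
x = rng.standard_normal(m) + 1j * rng.standard_normal(m)
print(np.isclose(np.sqrt((x.conj() @ M @ x).real), np.linalg.norm(M_half @ x)))

# (4): ||A||_{MN} = ||M^{1/2} A N^{-1/2}||_2, and the max is attained at
# y = N^{-1/2} x, where x is the top right singular vector of C = M^{1/2} A N^{-1/2}.
C = M_half @ A @ np.linalg.inv(N_half)
U, s, Vh = np.linalg.svd(C)
y = np.linalg.inv(N_half) @ Vh[0].conj()   # the maximizer from the proof
print(np.isclose(np.linalg.norm(N_half @ y), 1.0))                     # ||y||_N = 1
print(np.isclose(np.sqrt(((A @ y).conj() @ M @ (A @ y)).real), s[0]))  # ||A y||_M = ||C||_2
```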

Solution 2:

The above answers are perfectly nice. I just want to point out another example: energy norms.

I don't know how familiar you are with differential equations and/or calculus of variations, but I'll give it a try anyway.

Consider the following integral:

$$ E(v) = \frac{1}{2}\int_\Omega |\nabla v|^2 \, dx $$ where $\Omega$ is a nice bounded domain in $\mathbb{R}^n$ (say, one with a smooth boundary, no corners or spikes). In many applications this represents the internal energy of a system in a configuration given by the function $v$. For instance, if $v$ is the displacement from a reference configuration, $E(v)$ represents the elastic energy of the system (assuming linear elasticity).

The above integral can be rewritten as

$$ E(v) = a(v,v) $$ with

$$ a(u,v) = \frac{1}{2}\int_\Omega \nabla u\cdot\nabla v dx $$

Now, suppose we have a finite dimensional representation of the function $v$ (if you know finite elements you know where I'm heading). This means

$$ v(x) = \sum_{i=1}^n v_i \varphi_i(x), $$ where all the $\varphi_i(x)$ are fixed and known a priori.

If you plug this expression inside the definition of $E$ you get (being careful not to mess up the indices)

$$ E(v) = \frac{1}{2}\int_\Omega \nabla\left(\sum_{j=1}^n v_j \varphi_j(x)\right)\cdot\nabla\left(\sum_{i=1}^n v_i \varphi_i(x)\right)dx\\ = \cdots = \sum_{i=1}^n\sum_{j=1}^n v_iv_j\frac{1}{2}\int_\Omega \nabla \varphi_j \cdot \nabla \varphi_i dx $$

Now let $\underline{v}$ be the vector of the coefficients $v_i$ and $A$ the matrix whose entries are

$$ a_{ij} = a(\varphi_j,\varphi_i) = \frac{1}{2}\int_\Omega \nabla \varphi_j\cdot\nabla\varphi_i \, dx $$ Under certain assumptions (for instance, $v=0$ on $\partial\Omega$, the boundary of $\Omega$), it can be shown that $A$ is indeed a positive definite matrix.

Now, if the system is in a configuration described by $v$ and $v$ is expressed as above, then the energy of the system is given by

$$ E(v) = a(v,v) = \underline{v}^tA\underline{v} $$ which is precisely a weighted norm of $\underline{v}$ (squared). Here the matrix is not exactly a weight, but rather it encodes the physics of the phenomenon. It is possible to show that, if $v$ is expanded as before, and you pick your basis functions $\varphi_i$ in such a way that

$$ \int_\Omega \varphi_i\varphi_j \, dx = \begin{cases} 0\mbox{ if }i\neq j\\ 1\mbox{ if }i=j, \end{cases} $$ then the squared Euclidean norm of $\underline{v}$ corresponds to the value of the integral

$$ I(v) = \int_\Omega v^2 \, dx $$ which is the square of another important norm of $v$ (the $L^2$ norm), but it measures a different energy of the system.
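To make this tangible, here is a minimal sketch in Python with NumPy under assumed data not taken from the answer above: a 1D domain $\Omega=(0,1)$, piecewise-linear hat basis functions $\varphi_i$ with $v=0$ at the endpoints, and the test function $v(x)=\sin(\pi x)$. Hat functions are not $L^2$-orthonormal, so $I(v)$ appears here as a mass-matrix-weighted form rather than a plain Euclidean norm; both integrals are reproduced by weighted quadratic forms in the coefficient vector:

```python
import numpy as np

# Hypothetical 1D setup: Omega = (0, 1), n interior nodes, v = 0 at the endpoints,
# piecewise-linear "hat" basis functions phi_i on a uniform grid of spacing h.
n = 100
h = 1.0 / (n + 1)
nodes = np.linspace(h, 1.0 - h, n)

# a_ij = (1/2) * int phi_j' . phi_i' dx, exact for hat functions:
# (1/(2h)) * tridiag(-1, 2, -1).
A = (np.diag(np.full(n, 2.0)) + np.diag(np.full(n - 1, -1.0), 1)
     + np.diag(np.full(n - 1, -1.0), -1)) / (2.0 * h)

# Mass matrix m_ij = int phi_i phi_j dx: (h/6) * tridiag(1, 4, 1).
Mass = (np.diag(np.full(n, 4.0)) + np.diag(np.full(n - 1, 1.0), 1)
        + np.diag(np.full(n - 1, 1.0), -1)) * h / 6.0

# For piecewise-linear elements the coefficients v_i are the nodal values.
v = np.sin(np.pi * nodes)

print(v @ A @ v, np.pi**2 / 4)   # E(v) = v^t A v    ~ (1/2) int |v'|^2 = pi^2/4
print(v @ Mass @ v, 0.5)         # I(v) = v^t Mass v ~ int v^2 = 1/2
```

With an $L^2$-orthonormal basis, as in the scenario above, the mass matrix would be the identity and $\underline{v}^t\underline{v}$ itself would equal $I(v)$.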