How to solve the matrix minimization for BFGS update in Quasi-Newton optimization
I'll outline the major steps. See the explanations and answers to the comments below.
- Introduce some notations $$ \hat H=W^{1/2}HW^{1/2},\quad \hat H_k=W^{1/2}H_kW^{1/2},\quad \hat s_k=W^{1/2}s_k,\quad \hat y_k=W^{-1/2}y_k.\tag{1} $$ Then the problem becomes $$ \min\|\hat H_k-\hat H\|_F\quad\text{subject to }\hat H\hat y_k=\hat y_k\ (=\hat s_k). $$
- To use the fact that $\hat y_k$ is the eigenvector of $\hat H$, let us introduce the new orthonormal basis $$ U=[u\ |\ u_\bot] $$ where $u$ is the normalized $\hat y_k$, i.e. $$u=\frac{\hat y_k}{\|\hat y_k\|}\tag{2},$$ and $u_\bot$ is any ON-complement to $u$. Since the Frobenius norm is unitary invariant (as it is the sum of squares of the singular values) we have $$ \|\hat H_k-\hat H\|_F=\|U^T\hat H_kU-U^T\hat HU\|_F= \left\|\begin{bmatrix}\color{blue}* & \color{blue}*\\\color{blue}* & \color{red}*\end{bmatrix}-\begin{bmatrix}\color{blue}1 & \color{blue}0\\\color{blue}0 & \color{red}*\end{bmatrix}\right\|_F. $$ The blue part cannot be affected by optimization, and to minimize the Frobenius norm, it is clear that we should make the red part become zero, that is, the optimal solution satisfies $$ \color{red}{u_\bot^T\hat Hu_\bot}=\color{red}{u_\bot^T\hat H_ku_\bot}. $$ It gives the optimal solution to be $$ \hat H=U\begin{bmatrix}\color{blue}1 & \color{blue}0\\\color{blue}0 & \color{red}{u_\bot^T\hat H_ku_\bot}\end{bmatrix}U^T. $$
- To write it more explicitly $$ \hat H=\begin{bmatrix}u & u_\bot\end{bmatrix}\begin{bmatrix}1 & 0\\0 & u_\bot^T\hat H_ku_\bot\end{bmatrix}\begin{bmatrix}u^T \\ u_\bot^T\end{bmatrix}=uu^T+u_\bot u_\bot^T\hat H_ku_\bot u_\bot^T= uu^T+(I-uu^T)\hat H_k(I-uu^T) $$ where we used the following representation of the projection operator to the complement of $u$ $$ u_\bot u_\bot^T=I-uu^T. $$
- Changing variables back to the original ones (1), (2) is straightforward.
Explanations:
Step 1. The convenience for $\hat H$ and $\hat H_k$ comes directly from the problem $$ \min\|\underbrace{W^{1/2}H_kW^{1/2}}_{\hat H_k}-\underbrace{W^{1/2}HW^{1/2}}_{\hat H}\|_F. $$ Then we have to rewrite the data too \begin{align} Hy_k=s_k\quad&\Leftrightarrow\quad \underbrace{\color{blue}{W^{1/2}}H\color{red}{W^{1/2}}}_{\hat H}\underbrace{\color{red}{W^{-1/2}}y_k}_{\hat y_k}=\underbrace{\color{blue}{W^{1/2}}s_k}_{\hat s_k},\\ Ws_k=y_k\quad&\Leftrightarrow\quad \underbrace{\color{red}{W^{-1/2}}Ws_k}_{\hat s_k}=\underbrace{\color{red}{W^{-1/2}}y_k}_{\hat y_k}. \end{align} Thus, $\hat H\hat y_k=\hat y_k$. It is also equal to $\hat s_k$.
Step 2. Since $\hat Hu=u$ we know that $u^T\hat Hu=u^Tu=1$ and $u_\bot^THu=u_\bot^Tu=0$, so we can represent the optimizing variable as $$ U^T\hat HU=\begin{bmatrix}u^T\\ u_\bot^T\end{bmatrix}\hat H\begin{bmatrix}u & u_\bot\end{bmatrix}= \begin{bmatrix}u^T\hat Hu & u^T\hat Hu_\bot\\u_\bot^T\hat Hu & u_\bot^T\hat Hu_\bot\end{bmatrix}= \begin{bmatrix}1 & 0\\0 & u_\bot^T\hat Hu_\bot\end{bmatrix}. $$ It gives the following \begin{align} U^T\hat H_kU-U^T\hat HU&=\begin{bmatrix}u^T\\ u_\bot^T\end{bmatrix}\hat H_k\begin{bmatrix}u & u_\bot\end{bmatrix}-\begin{bmatrix}u^T\\ u_\bot^T\end{bmatrix}\hat H\begin{bmatrix}u & u_\bot\end{bmatrix}=\\ &=\begin{bmatrix}\color{blue}{u^T\hat H_ku} & \color{blue}{u^T\hat H_ku_\bot}\\\color{blue}{u_\bot^T\hat H_ku} & \color{red}{u_\bot^T\hat H_ku_\bot}\end{bmatrix}-\begin{bmatrix}\color{blue}{1} & \color{blue}{0}\\\color{blue}{0} & \color{red}{u_\bot^T\hat Hu_\bot}\end{bmatrix}. \end{align} This particular structure of the optimizing variable was the whole idea to switch to the new basis. Because $\hat H$ has no freedom to vary in the blue part, we cannot change the corresponding blue part of $\hat H_k$, so it is fixed for all possible $\hat H$. The red part though can be changed as we wish, and the smallest Frobenius norm \begin{align} \|U^T(\hat H_k-\hat H)U\|_F^2&= \left\|\begin{bmatrix}\color{blue}{u^T\hat H_ku-1} & \color{blue}{u^T\hat H_ku_\bot}\\\color{blue}{u_\bot^T\hat H_ku} & \color{red}{u_\bot^T\hat H_ku_\bot-u_\bot^T\hat Hu_\bot}\end{bmatrix}\right\|_F^2=\\ &=\color{blue}{(u^T\hat H_ku-1)^2+\|u^T\hat H_ku_\bot\|_F^2+\|u_\bot^T\hat H_ku\|_F^2}+\color{red}{\|u_\bot^T\hat H_ku_\bot-u_\bot^T\hat Hu_\bot\|_F^2} \end{align} is obtained when the red part is zero.
Step 3. The matrix $U$ is orthogonal, hence, $$ I=UU^T=\begin{bmatrix}u & u_\bot\end{bmatrix}\begin{bmatrix}u^T \\ u_\bot^T\end{bmatrix}=uu^T+u_\bot u_\bot^T\quad\Leftrightarrow\quad u_\bot u_\bot^T=I-uu^T. $$