Quasi-newton methods: Understanding DFP updating formula

In Nocedal/Wright Numerical Optimization book at pages 138-139 the approximate Hessian $B_k$ update for the quasi-Newton method: (DFP method) $$B_{k+1} = \left(I-\frac{y_ks_k^T}{y_k^Ts_k}\right)B_k\left(I-\frac{s_ky_k^T}{y_k^Ts_k}\right)+ \frac{y_ky_k^T}{y_k^Ts_k}\tag{1}$$ is explained as the solution to the problem: $$\min_B\|B-B_k\|_{F,W} \\ \text{subject to}~B=B^T,~Bs_k=y_k \tag{2}$$ for which $\|A\|_{F,W}$ is the weighted Frobenius norm : $$\|A\|_{F,W} = \|W^{1/2}AW^{1/2}\|_F$$ for $W$ being any symmetric matrix satisfying the relation $Wy_k=s_k$

How can I prove that $B_{k+1}$ given by equation (1) is the solution to the problem (2)?


This answer goes basically along the lines of my answer about BFGS update.

  1. Introduce the short notations $$ \min\|\underbrace{W^{1/2}B_kW^{1/2}}_{\hat B_k}-\underbrace{W^{1/2}BW^{1/2}}_{\hat B}\|_F $$ \begin{align} Wy_k=s_k\quad&\Leftrightarrow\quad \underbrace{\color{red}{W^{-1/2}}Wy_k}_{\hat y_k}=\underbrace{\color{red}{W^{-1/2}}s_k}_{\hat s_k}\quad\Leftrightarrow\quad \hat y_k=\hat s_k,\tag{1}\\ Bs_k=y_k\quad&\Leftrightarrow\quad \underbrace{\color{blue}{W^{1/2}}B\color{red}{W^{1/2}}}_{\hat B}\underbrace{\color{red}{W^{-1/2}}s_k}_{\hat s_k}=\underbrace{\color{blue}{W^{1/2}}y_k}_{\hat y_k}\quad\Leftrightarrow\quad \hat B\hat s_k=\hat y_k.\tag{2} \end{align} Then the problem becomes $$ \min\|\hat B_k-\hat B\|_F\quad\text{subject to }\hat B\hat s_k=\hat s_k. $$
  2. The optimization variable $\hat B$ has the given eigenvector with the given eigenvalue, hence, it is convenient to introduce the new orthonormal basis $$ U=[u\ |\ u_\bot] $$ where $u$ is the normalized eigenvector $\hat s_k$, i.e. $$ u=\frac{\hat s_k}{\|\hat s_k\|}\tag{3}, $$ and $u_\bot$ is any orthonormal complement to $u$. Then $u^T\hat Bu=u^Tu=1$ and $u_\bot^T\hat Bu=u_\bot^Tu=0$, and the operator matrices in the new basis take the form \begin{align} U^T\hat B_kU-U^T\hat BU&=\begin{bmatrix}u^T\\ u_\bot^T\end{bmatrix}\hat B_k\begin{bmatrix}u & u_\bot\end{bmatrix}-\begin{bmatrix}u^T\\ u_\bot^T\end{bmatrix}\hat B\begin{bmatrix}u & u_\bot\end{bmatrix}=\\ &=\begin{bmatrix}\color{blue}{u^T\hat B_ku} & \color{blue}{u^T\hat B_ku_\bot}\\\color{blue}{u_\bot^T\hat B_ku} & \color{red}{u_\bot^T\hat B_ku_\bot}\end{bmatrix}-\begin{bmatrix}\color{blue}{1} & \color{blue}{0}\\\color{blue}{0} & \color{red}{u_\bot^T\hat Bu_\bot}\end{bmatrix}. \end{align} Since the Frobenius norm is unitary invariant (as it depends on the singular values only) we have \begin{align} \|\hat B_k-\hat B\|_F^2&=\|U^T(\hat B_k-\hat B)U\|_F^2= \left\|\begin{bmatrix}\color{blue}{u^T\hat B_ku-1} & \color{blue}{u^T\hat B_ku_\bot}\\\color{blue}{u_\bot^T\hat B_ku} & \color{red}{u_\bot^T\hat B_ku_\bot-u_\bot^T\hat Bu_\bot}\end{bmatrix}\right\|_F^2=\\ &=\color{blue}{(u^T\hat B_ku-1)^2+\|u^T\hat B_ku_\bot\|_F^2+\|u_\bot^T\hat B_ku\|_F^2}+\color{red}{\|u_\bot^T\hat B_ku_\bot-u_\bot^T\hat Bu_\bot\|_F^2} \end{align} The blue part cannot be affected by optimization, and to minimize the Frobenius norm, it is clear that we should make the red part zero, that is, the optimal solution satisfies $$ \color{red}{u_\bot^T\hat Bu_\bot}=\color{red}{u_\bot^T\hat B_ku_\bot}. $$
  3. It gives the optimal solution to be \begin{align} \hat B&=U\begin{bmatrix}\color{blue}1 & \color{blue}0\\\color{blue}0 & \color{red}{u_\bot^T\hat B_ku_\bot}\end{bmatrix}U^T=\begin{bmatrix}u & u_\bot\end{bmatrix}\begin{bmatrix}1 & 0\\0 & u_\bot^T\hat B_ku_\bot\end{bmatrix}\begin{bmatrix}u^T \\ u_\bot^T\end{bmatrix}=uu^T+u_\bot u_\bot^T\hat B_ku_\bot u_\bot^T=\\ &=uu^T+(I-uu^T)\hat B_k(I-uu^T) \end{align} where we used the following representation for the projection operator to the complement of $u$ $$ I=UU^T=\begin{bmatrix}u & u_\bot\end{bmatrix}\begin{bmatrix}u^T \\ u_\bot^T\end{bmatrix}=uu^T+u_\bot u_\bot^T\quad\Leftrightarrow\quad u_\bot u_\bot^T=I-uu^T. $$
  4. Changing variables back to the original ones is straightforward via (1), (2), (3) $$ B=W^{-1/2}\hat BW^{-1/2}=W^{-1/2}uu^TW^{-1/2}+(I-W^{-1/2}uu^TW^{1/2})B_k(I-W^{1/2}uu^TW^{-1/2}) $$ where \begin{align} \|\hat s_k\|^2&=\hat s_k^T\hat s_k=\hat y_k^T\hat s_k=y_k^Ts_k,\\ uu^T&=\frac{\hat s_k^T\hat s_k}{\|\hat s_k\|^2}=\frac{\hat y_k\hat y_k^T}{y_k^Ts_k}=W^{1/2}\frac{y_ky_k^T}{y_k^Ts_k}W^{1/2},\\ W^{-1/2}uu^TW^{-1/2}&=\frac{y_ky_k^T}{y_k^Ts_k},\\ W^{1/2}uu^TW^{-1/2}&=\frac{Wy_ky_k^T}{y_k^Ts_k}=\frac{s_ky_k^T}{y_k^Ts_k}. \end{align}