Cross-posted on Operations Research SE.


I would like to discuss how to tackle an optimization problem that learns a Vandermonde matrix. In particular, the problem has the following form:

\begin{align} V &= \operatorname{argmin}_{V}\|A - \operatorname{diag}(Vb )C\|_F^2 \end{align} where $A,C$ are $N \times N$ matrices, $V$ is an $N \times K$ Vandermonde matrix with $N > K$, and $b$ is a $K \times 1$ vector of coefficients.

This is a generalization of FIR filter design, where instead of finding the filter coefficients $b$ we search for the basis vectors $\{v_1, \ldots, v_K\}$ of the Vandermonde matrix. Each of these vectors satisfies $v_{i+1}= v_i \odot v_1$, where $\odot$ denotes the elementwise (Hadamard) product, i.e., each row of $V$ is a geometric progression.
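For concreteness, here is a minimal NumPy sketch of the objective as a function of the generating vector $v_1$ (toy sizes, random data, and function names are illustrative assumptions only):

```python
import numpy as np

def vandermonde(v1, K):
    """Columns v1, v1**2, ..., v1**K (elementwise powers, as described above)."""
    return np.column_stack([v1**(i + 1) for i in range(K)])

def objective(v1, A, C, b):
    """phi(v1) = || A - diag(V b) C ||_F^2  with V built from v1."""
    V = vandermonde(v1, len(b))
    return np.linalg.norm(A - np.diag(V @ b) @ C, 'fro')**2

# toy sizes and random data, purely for illustration
N, K = 6, 3
rng = np.random.default_rng(0)
A, C = rng.standard_normal((N, N)), rng.standard_normal((N, N))
b, v1 = rng.standard_normal(K), rng.standard_normal(N)
print(objective(v1, A, C, b))
```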

How can I learn the matrix $V$? Is there a way to rewrite this problem in a form that is easier to optimize?

The problem is not convex, since it is the composition of a norm with a non-affine mapping of the $N$ variables contained in $v_1$. Exploiting the relationship between the Frobenius norm and the trace, I can rewrite the problem as minimizing over $V$ the function

$$\operatorname{vec}(Z)^\top \operatorname{vec}(Z) -2 \operatorname{vec}(A)^{\top}\operatorname{vec}(Z) $$

where $\operatorname{vec}(Z)=\operatorname{vec}\!\big(\operatorname{diag}(Vb)\,C\big) = (C^\top \bullet I)\,Vb$, with $\bullet$ denoting the Khatri-Rao (column-wise Kronecker) product (a quick numerical check of this identity is sketched below). Might this help in solving for $V$, or better, for $v_1$?
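A quick NumPy check of this vectorized form, with the Khatri-Rao matrix $C^\top \bullet I$ built explicitly column by column (toy sizes and random data are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 5, 3
C = rng.standard_normal((N, N))
b = rng.standard_normal(K)
v1 = rng.standard_normal(N)
V = np.column_stack([v1**(i + 1) for i in range(K)])   # columns v1, v1**2, ..., v1**K

# Khatri-Rao (column-wise Kronecker) product C^T • I : column j is C^T[:, j] ⊗ e_j
I = np.eye(N)
CkrI = np.column_stack([np.kron(C.T[:, j], I[:, j]) for j in range(N)])

Z = np.diag(V @ b) @ C
print(np.allclose(Z.flatten(order='F'), CkrI @ (V @ b)))   # vec(Z) == (C^T • I) V b
```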


Solution 1:

$ \def\bbR#1{{\mathbb R}^{#1}} \def\h#1{{\odot #1}} \def\a{\alpha}\def\b{\beta}\def\e{\varepsilon} \def\o{{\tt1}}\def\p{\partial} \def\B{\Big}\def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\B(#1\B)} \def\bR#1{\big(#1\big)} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\diag#1{\operatorname{diag}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\SDiag#1{\operatorname{SubDiag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\m#1{\left[\begin{array}{r}#1\end{array}\right]} $Construct a Vandermonde matrix $V$ from the independent vector $x$ as follows
$$\eqalign{
V &= \m{x^\h{0}&x^\h{\o}&x^\h{2}&x^\h{3}&\ldots&x^\h{(k-1)}} \;\in\bbR{n\times k} \\
  &= \m{\o&x&x^\h{2}&x^\h{3}&\ldots&x^\h{(k-1)}} \\
\\
x^\h{3}  &= x\odot x\odot x \quad\quad\bR{{\rm Hadamard\;powers/products}} \\
dx^\h{3} &= \c{dx}\odot x\odot x + x\odot \c{dx}\odot x + x\odot x\odot \c{dx} \\
         &= 3x^\h{2}\odot {dx} \\
}$$
The differential of the matrix (wrt $x$) can be calculated as
$$\eqalign{
dV &= \m{0&\o&2x^{\odot\o}&3x^{\odot2}&\ldots} \odot \m{dx&dx&dx&\ldots} \\
   &= \bR{VD_k^T}\odot\bR{dx\,\o^T} \\
   &= \Diag{dx}\cdot \bR{VD_k^T} \\
D_k &= \SDiag{1,2,3,\ldots} = \m{
    0 & 0 & 0 & \ldots & 0 \\
    \c{\o} & 0 & 0 & \ldots & 0 \\
    0 & \c{2} & 0 & \ldots & 0 \\
    0 & 0 & \c{3} & \ldots & 0 \\
    \vdots & \vdots & \vdots &\ddots \\
  } \;\in\bbR{k\times k} \\
}$$
The $D_k$ matrix is the Differentiation Matrix for the polynomial space.
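As a sanity check on this construction, the sketch below builds $V$ and $D_k$ for assumed toy sizes with NumPy and verifies $dV \approx \operatorname{Diag}(dx)\,(V D_k^T)$ against a small finite perturbation:

```python
import numpy as np

n, k = 5, 4
rng = np.random.default_rng(2)
x = rng.standard_normal(n)
dx = 1e-6 * rng.standard_normal(n)

V  = np.vander(x, k, increasing=True)     # columns x**0, x**1, ..., x**(k-1)
Dk = np.diag(np.arange(1, k), -1)         # SubDiag(1, 2, ..., k-1)

dV_formula = np.diag(dx) @ (V @ Dk.T)                       # Diag(dx) (V Dk^T)
dV_numeric = np.vander(x + dx, k, increasing=True) - V      # finite perturbation
print(np.allclose(dV_formula, dV_numeric))                  # True
```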

For typing convenience, define the matrix variables $$\eqalign{ M &= {\Diag{Vb}C-A} \\ H &= \diag{MC^T}\,b^T \\ }$$ Note that $H$ is a function of $M$ which is a function of $V$ which is a function of $x$.
Therefore everything $\big($except for $A,b,C,D_k\big)$ is a polynomial function of $x$.

Now write the cost function and calculate its differential and gradient wrt $x$.
$$\eqalign{
\phi &= \frac 12 M:M \\
d\phi &= M:dM \\
 &= M:\BR{\Diag{dV\,b}\,C} \\
 &= MC^T:\Diag{dV\,b} \\
 &= \diag{MC^T}:\LR{dV\,b} \\
 &= \diag{MC^T}\,b^T:dV \\
 &= H:dV \\
 &= H:\LR{\Diag{dx}\cdot \bR{VD_k^T}} \\
 &= \LR{HD_k V^T}:\Diag{dx} \\
 &= \diag{HD_k V^T}:dx \\
\grad{\phi}{x} &= \diag{HD_k V^T} \\
}$$
where a colon denotes the Frobenius product, which is a concise notation for the trace
$$\eqalign{
A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\
A:A &= \big\|A\big\|^2_F \\
}$$
The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different ways, e.g.
$$\eqalign{
A:B &= B:A \\
A:B &= A^T:B^T \\
C:AB &= CB^T:A = A^TC:B \\
}$$
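To make the result concrete, here is a short SciPy sketch that implements $\phi$ and the gradient $\operatorname{diag}(HD_kV^T)$, checks the formula against central finite differences, and then hands both to a generic local optimizer (L-BFGS-B). All sizes, data, and names are illustrative assumptions; a local solver only finds one stationary point per starting guess, consistent with the remarks below.

```python
import numpy as np
from scipy.optimize import minimize

def phi_and_grad(x, A, C, b, k):
    """Cost (1/2)||Diag(Vb)C - A||_F^2 and its gradient diag(H Dk V^T) wrt x."""
    V  = np.vander(x, k, increasing=True)         # columns x**0 ... x**(k-1)
    Dk = np.diag(np.arange(1, k), -1)             # SubDiag(1, 2, ..., k-1)
    M  = np.diag(V @ b) @ C - A                   # M = Diag(Vb) C - A
    H  = np.outer(np.diag(M @ C.T), b)            # H = diag(M C^T) b^T
    phi  = 0.5 * np.sum(M * M)
    grad = np.diag(H @ Dk @ V.T)
    return phi, grad

N, K = 8, 4
rng = np.random.default_rng(3)
A, C = rng.standard_normal((N, N)), rng.standard_normal((N, N))
b = rng.standard_normal(K)
x0 = rng.standard_normal(N)

# sanity check of the gradient formula against central finite differences
num = np.array([(phi_and_grad(x0 + 1e-6*e, A, C, b, K)[0]
               - phi_and_grad(x0 - 1e-6*e, A, C, b, K)[0]) / 2e-6
                for e in np.eye(N)])
ana = phi_and_grad(x0, A, C, b, K)[1]
print(np.max(np.abs(num - ana)) / np.max(np.abs(ana)))   # small relative error

# the gradient can also drive a generic local optimizer (many restarts advised)
res = minimize(phi_and_grad, x0, jac=True, args=(A, C, b, K), method='L-BFGS-B')
print(res.fun)
```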


Setting the gradient to zero yields a complicated system of (nonlinear) polynomial equations $$\eqalign{ f(x) = 0 \;\in\bbR{n} }$$ whose roots must be found. Once the optimal $x$ has been located, the corresponding $V$ matrix can be constructed, as shown at the top of the post.

There will undoubtedly be many, many roots of $f(x)$, all of which are stationary points of the cost function. Each must be checked to determine whether it is the desired global minimum.

Perhaps you can find the roots with a Computer Algebra System, e.g. via a Gröbner basis computation.
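For what it's worth, here is a tiny SymPy sketch of that route: it forms the stationarity conditions as polynomials in $x$ and passes them to a Gröbner basis / exact solver. The sizes and the integer data are made up purely for illustration, and this only scales to very small $n$ and $k$.

```python
import sympy as sp

n, k = 2, 3
x = sp.symbols(f'x0:{n}', real=True)
A = sp.Matrix([[1, 2], [0, 1]])        # made-up small integer data
C = sp.Matrix([[1, 1], [1, -1]])
b = sp.Matrix([1, 2, 1])

V = sp.Matrix(n, k, lambda i, j: x[i]**j)          # Vandermonde in the unknowns x
M = sp.diag(*(V * b)) * C - A                      # M = Diag(Vb) C - A
phi = sp.Rational(1, 2) * sum(e**2 for e in M)     # (1/2) ||M||_F^2
eqs = [sp.expand(sp.diff(phi, xi)) for xi in x]    # gradient = 0, polynomial system

print(sp.groebner(eqs, *x, order='lex'))
print(sp.solve(eqs, list(x), dict=True))           # candidate stationary points
```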