Ratio of largest eigenvalue to sum of eigenvalues -- where to read about it?

Let $E_j$ be the $j$th largest-magnitude eigenvalue of a real symmetric $N \times N$ matrix $M$. I've found that the ratio

$$\frac{|E_1|}{\sum_{j=1}^N{|E_j|}}$$

is a measure of the "rank-one-ness" of $M$. Qualitatively, the more similar the columns of $M$ are to each other, the higher the ratio. In my graduate research, this measure appears naturally for a specific class of matrices.
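To make this concrete, here is a quick numerical sketch of how the ratio behaves (NumPy, with a made-up nearly rank-one matrix purely for illustration):

```python
import numpy as np

# Made-up example: a rank-one outer product plus a small symmetric perturbation.
rng = np.random.default_rng(0)
v = rng.standard_normal(50)
noise = rng.standard_normal((50, 50))
M = np.outer(v, v) + 0.05 * (noise + noise.T)   # real symmetric, nearly rank one

eigvals = np.linalg.eigvalsh(M)                 # real eigenvalues of a symmetric matrix
mags = np.sort(np.abs(eigvals))[::-1]           # |E_1| >= |E_2| >= ... >= |E_N|
ratio = mags[0] / mags.sum()
print(ratio)                                    # close to 1 for a nearly rank-one M
```

With the perturbation removed the ratio is exactly 1; as the noise term grows, the ratio drops away from 1.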

I'm certain that there's been prior research on the properties and usefulness of this measure for deciding how well-aligned and similar the columns of a matrix are. For example, I've seen it used as a measure of "compressibility". Still, my searches haven't turned up much.

Where can I find out more?


Because $M$ is a correlation matrix, we know the diagonal elements $m_{ii} = 1 \ \forall i$. Computing the eigenvalues and eigenvectors of $M$ is equivalent to performing principal components analysis on the rescaled data (each column variable having unit variance).
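If you want to see that equivalence concretely, here is a minimal sketch in Python/NumPy (the data matrix is synthetic, made up just for illustration):

```python
import numpy as np

# Synthetic data, just for illustration: an n-by-N data matrix X.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))

R = np.corrcoef(X, rowvar=False)                    # the correlation matrix M
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # rescale each column to unit variance
S = np.cov(Z, rowvar=False)                         # covariance of the rescaled data

print(np.allclose(R, S))                            # True: they are the same matrix
eigenvalues, eigenvectors = np.linalg.eigh(R)       # so eig(R) is PCA of the rescaled data
```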

The quantity $$\delta_j=\frac{|\lambda_j|}{\sum_{i=1}^{N} |\lambda_i|}$$ represents the proportion of variation the $j$th eigenvector explains in your data set. Statisticians often order the eigenvalues of the correlation (or covariance) matrix by decreasing magnitude and plot them against their index (a scree plot), or plot the cumulative variation explained, starting with the largest eigenvalue and adding the next largest until all are exhausted. A quick Google query will provide many examples of both.

The utility of plotting cumulative sums of $\delta_j$ is that one can visualize the marginal explanatory power gained from including an additional principal component in a set of linear factors modeling $M$.
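Here is a small sketch of the numbers behind such a plot, again with made-up data (no actual plotting, just the $\delta_j$ and their cumulative sums):

```python
import numpy as np

# Made-up correlation matrix from synthetic data, purely for illustration.
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 6)) @ rng.standard_normal((6, 6))
R = np.corrcoef(X, rowvar=False)

lam = np.linalg.eigvalsh(R)[::-1]            # eigenvalues of R, largest first
delta = np.abs(lam) / np.abs(lam).sum()      # proportion of variation per component
cumulative = np.cumsum(delta)                # variation explained by the first k components
for k, (d, c) in enumerate(zip(delta, cumulative), start=1):
    print(f"PC {k}: delta = {d:.3f}, cumulative = {c:.3f}")
```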

We know that if $\delta_1 \approx 1$, then all columns of $M$ are approximately the same (the matrix is nearly rank one), and if $\delta_j \approx 0$, then the $j$th eigenvector explains almost nothing beyond what the others already capture (in a linear sense).

Researching how statisticians choose principal components may prove useful for your purposes, as there is a lot written about this.

Similarly, operations researchers and applied mathematicians often study the column subset selection problem, which may also bear relevant fruit.


You can read about it in any chapter on principal components analysis (PCA), specifically PCA performed on a correlation matrix $\mathbf{R}$. As pointed out above, the sum of eigenvalues $\sum_j \lambda_j$ equals $p$, the number of dimensions of your data matrix. The ratio of the largest eigenvalue to the sum of eigenvalues is the proportion of variance explained by the first principal component. However, it's not just the largest that is of interest: often the ratio of each eigenvalue to the sum (i.e., $p$) is used to give the proportion of variance explained by each successive component.
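A quick numerical check of that claim, using synthetic data (any correlation matrix will do, since its trace is $p$):

```python
import numpy as np

# Synthetic data; the check only relies on R having ones on its diagonal.
rng = np.random.default_rng(3)
X = rng.standard_normal((500, 4))
R = np.corrcoef(X, rowvar=False)
lam = np.linalg.eigvalsh(R)[::-1]

p = R.shape[0]
print(np.isclose(lam.sum(), p))    # True: the eigenvalues sum to the trace, which is p
print(lam / p)                     # proportion of variance explained by each component
```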