Measure of "how much diagonal" a matrix is

Given that your entries are frequencies, and you want to give credit for being "close" to the diagonal, a natural approach is to compute the correlation coefficient between the row and column indices. That is, suppose your matrix is built as follows: repeatedly generate a pair of numbers $x$ and $y$, and increment the count of the matrix entry at position $(x,y)$. If you think of $x$ and $y$ as samples of random variables $X$ and $Y$ respectively, then the sample correlation coefficient $r$ of $X$ and $Y$ lies between $-1$ and $1$: it is $1$ if $X$ and $Y$ are perfectly correlated, and $-1$ if they are perfectly anticorrelated. The point is that $X$ and $Y$ are perfectly correlated (in this case, equal) precisely when the matrix is diagonal, and strong correlation means the matrix entries tend to be near the diagonal.

This is robust: the correlation coefficient is unchanged if you scale the matrix (and the formula turns out to make sense even if your entries are nonnegative real numbers).

If you adapt the standard formulas for the sample correlation coefficient to this situation, they take the following form. Let $A$ be a $d\times d$ matrix; let $j$ be the $d$-long vector of all ones, and let $w=(1,2,\ldots,d)$ and $w_2=(1^2,2^2,\ldots,d^2)$ be the vectors of indices and squared indices. Then:

$$\begin{align} n &= j A j^T \textrm{ (the sum of the entries of $A$) }\\ \Sigma x &= w A j^T\\ \Sigma y &= j A w^T\\ \Sigma x^2 &= w_2 A j^T\\ \Sigma y^2 &= j A w_2^T\\ \Sigma xy &= w A w^T\\ r &= \frac{n\, \Sigma xy -\Sigma x\, \Sigma y}{\sqrt{n\, \Sigma x^2 - (\Sigma x)^2}\sqrt{n\, \Sigma y^2 - (\Sigma y)^2}} \end{align}$$
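As a quick sketch, these formulas translate directly into NumPy (the function name `diagonality` is my own choice, not standard):

```python
import numpy as np

def diagonality(A):
    """Correlation-based measure of how diagonal a nonnegative matrix is."""
    A = np.asarray(A, dtype=float)
    d = A.shape[0]
    j = np.ones(d)                   # vector of all ones
    w = np.arange(1, d + 1)          # index vector (1, 2, ..., d)
    w2 = w ** 2                      # squared indices (1^2, 2^2, ..., d^2)
    n = j @ A @ j                    # sum of all entries
    sx = w @ A @ j                   # sum of x (row indices, frequency-weighted)
    sy = j @ A @ w                   # sum of y (column indices)
    sx2 = w2 @ A @ j                 # sum of x^2
    sy2 = j @ A @ w2                 # sum of y^2
    sxy = w @ A @ w                  # sum of x*y
    return (n * sxy - sx * sy) / (
        np.sqrt(n * sx2 - sx ** 2) * np.sqrt(n * sy2 - sy ** 2))
```

Running this on the diagonally dominant example below reproduces $r \approx 0.674149$.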

Some examples:

Diagonal matrix: $\left( \begin{array}{cccc} 1. & 0. & 0. & 0. \\ 0. & 5. & 0. & 0. \\ 0. & 0. & 30.5 & 0. \\ 0. & 0. & 0. & 3.14159 \\ \end{array} \right): \quad r=1.000000$

Diagonally dominant matrix: $\left( \begin{array}{ccc} 6 & 1 & 0 \\ 1 & 5 & 2 \\ 1 & 3 & 6 \\ \end{array} \right): \quad r=0.674149$

Uniformly distributed on $[0,1]$: $\left( \begin{array}{cccc} 0.2624 & 0.558351 & 0.249054 & 0.484223 \\ 0.724561 & 0.797153 & 0.689489 & 0.273023 \\ 0.462727 & 0.119412 & 0.911981 & 0.636588 \\ 0.089544 & 0.160899 & 0.910123 & 0.549202 \\ \end{array} \right): \quad r=0.233509$

Tridiagonal: $\left( \begin{array}{ccccc} 2 & 1 & 0 & 0 & 0 \\ 1 & 3 & 2 & 0 & 0 \\ 0 & 2 & 3 & 4 & 0 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 1 & 1 \\ \end{array} \right): \quad r=0.812383$


Here's an easy one. Let $M$ be your measured matrix, and let $A$ be the matrix that agrees with $M$ along the diagonal but is zero elsewhere. Then pick your favorite matrix norm (the operator norm probably works well here) and use $\|M-A\|$ as your measurement.
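A minimal sketch of this measure, using NumPy's spectral (operator 2-) norm; the function name is mine:

```python
import numpy as np

def off_diagonal_norm(M):
    """Distance from M to its diagonal part, in the operator (spectral) norm.

    Returns 0 when M is exactly diagonal; larger values mean more
    mass sits off the diagonal.
    """
    M = np.asarray(M, dtype=float)
    A = np.diag(np.diag(M))          # keep the diagonal, zero elsewhere
    return np.linalg.norm(M - A, 2)  # operator 2-norm of the off-diagonal part
```

Unlike the correlation measure, this is not scale-invariant, so you may want to normalize by $\|M\|$ if you compare matrices of different magnitudes.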

If you want a more fine-tuned understanding of 'clustering', instead of zeroing all the entries off the diagonal, weight them by which band they are on. So the super- and sub-diagonal entries might take half the corresponding value in $M$.
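One way to sketch this band-weighted variant: keep band $|i-j|=k$ of $M$ with weight $\tfrac{1}{2^k}$, so the main diagonal survives fully and the first sub/super-diagonals at half, matching the suggestion above. The geometric decay is my own choice of weighting scheme, one of many possible:

```python
import numpy as np

def band_weighted_distance(M, decay=0.5):
    """||M - A|| where A keeps band |i-j| = k of M with weight decay**k.

    The main diagonal (k = 0) is kept in full; with decay = 0.5 the first
    sub/super-diagonals are kept at half value, the second at a quarter, etc.
    A diagonal matrix scores exactly 0.
    """
    M = np.asarray(M, dtype=float)
    d = M.shape[0]
    i, j = np.indices((d, d))        # row and column index grids
    A = M * decay ** np.abs(i - j)   # down-weight each band by its distance
    return np.linalg.norm(M - A, 2)  # operator 2-norm of what was removed
```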