Jaccard index, matrix notation

I have a matrix with rows representing events and columns representing users. The elements of the matrix are binary values indicating if a user has attended the event or not.

\begin{bmatrix}1&1&0&1&1\\1&1&0&0&1\\ 1&0&0&1&1\end{bmatrix}

I need matrix notation to compute the Jaccard distances between users.

\begin{align} J(U_1,U_2)=\frac{|U_1\cap U_2|}{|U_1\cup U_2|} \end{align}

To compute the numerator I can use the matrix operation \begin{align} A^T\times A \end{align}

Now my question is how to get the denominator of Jaccard index using the matrix notation.


Use the all-ones matrix $N,$ with the same dimensions as $A,$ to define the symmetric matrix $$\eqalign{ L &= A^TN + N^TA \\ }$$ Then the Jaccard matrix can be written as $$\eqalign{ J &= (A^TA)\oslash(L-A^TA) \\ }$$ where $\oslash$ denotes elementwise (aka Hadamard) division.

The elements of this $J$-matrix are the Jaccard distances, i.e. $$J_{ik} = J(U_i,U_k)\\$$


NB: $\;$The third column of your $A$-matrix is problematic and results in $J_{33}=\frac{0}{0}$
One way to address this is to use the elementwise pseudoinverse $$ \big(M^\oplus\big)_{ij} = \big(M_{ij}\big)^+ = \begin{cases} M_{ij}^{-1}\;&{\rm if\;}|M_{ij}|\ge\varepsilon\\0\;&{\rm otherwise}\end{cases} $$ Then the expression for the $J$-matrix becomes $$J = (A^TA)\odot(L-A^TA)^\oplus$$ where $\odot$ denotes the elementwise (aka Hadamard) product.

There is a simpler way to compute the number of elements of all unions which is based on the De Morgan duality relationship :

$$S_i \cup S_j=(S_i^c \cap S_j^c)^c \tag{1}$$

(where $^c$ means "complementary set").

Let us keep the same name $A$ for the initial matrix.

We have first to constitute the matrix $B$ of complementary sets. This is simply done by replacing each entry $A_{ij}$ by $1-A_{ij}$. The corresponding matrix operation is $B=U-A$ where $U$ is a matrix of ones with the convenient size.

The second operation in (1) (taking the intersection of complementary sets) is done evidently by matrix operation $C=B^TB$.

Let us introduce notation $|.|$ for the number of elements of a set.

The generic entry $C_{ij}$ of $C$ is

$$C_{ij}=|S_i^c \cap S_j^c|$$

Thus

$$e-C_{ij}=|(S_i^c \cup S_j^c)\color{red}{^c}|\tag{2}$$

(where $e$ is the number of "events").

The matrix operation corresponding to (2) is $Q=eU-C$ where $U$ is again a matrix of ones with the convenient dimensions, reaching thus our objective.


Here is a Matlab program which does the work :

u=5;% number of users; ( = number of sets)
e=3;% number of events ( = number of elements)
A=[1 1 0 1 1
   1 1 0 0 1
   1 0 1 1 1];% data matrix (dimensions e x u)
P=A'*A; % P_{ij}=|S_i inter S_j|  
B=ones(e,u)-A;
C=B'*B;
Q=e*ones(u,u)-C ; % Q_{ij}=|S_i union S_j|
Jac=P./Q, % matrix of Jaccard indices