Spanning forests of bipartite graphs and distinct row/column sums of binary matrices

Let $F_{m,n}$ be the set of spanning forests on the complete bipartite graph $K_{m,n}$. Let $$S_{m,n} = \{(r(M), c(M)), M \in B_{m,n} \}$$ where $B_{m,n}$ is the set of $m \times n$ binary matrices and $r(M), c(M)$ are the vectors of row sums and column sums of $M$, respecitvely. That is, $S_{m,n}$ is the set of the distinct row-sum, column-sum pairs of binary matrices. (The term spanning forest here refers to a forest that spans all of the vertices of the given graph; it doesn't have to be a maximal acyclic subgraph.)

Q: Is it true that $|F_{m,n}| = |S_{m,n}|$? It is true for $m,n \leq 4$. For $m=n$ this is OEIS A297077.

There is an obvious mapping from $F_{m,n} \rightarrow B_{m,n}$ given by taking the reduced adjacency matrix, so if $U = \{u_1, \ldots, u_m\}$, $V = \{v_1, \ldots, v_m\}$ are the color categories we set $M_{ij} = 1$ if $v_i \sim u_j$ in a forest $F$, else $M_{ij} = 0$. However, this does not help because multiple forests may have the same row and column sums - and not every row-sum, column-sum pair is represented by a forest under this mapping.

The numbers $|F_{m,n}|$ are given here:

$$\begin{array}{|c|c|}\hline m\backslash n & 1 & 2 & 3 & 4 & 5 \\ \hline 1 & 2 & \\\hline 2 & 4 & 15 \\\hline 3 & 8 & 54 & 328 \\\hline 4 & 16 & 189 & 1856 & 16145 \\\hline 5 & 32 & 648 & 9984 & 129000 & 1475856 \\\hline\end{array}$$

For more see this answer (sum each row.)

I can prove the conjecture, although I can't give a bijection. I've left up a previous answer which does the case $m=3$, since it serves as an example of many of the ideas here.

Here is the key observation:

Theorem: There is a unique array $f(m,n)$, for $m$ and $n$ nonnegative integers, satisfying:

$f(m,0) = f(0,n)=1$
If we fix $m \geq 1$, then $f(m,n)$ is of the form $(m+1)^n p_m(n)$ where $p_m(x)$ is a polynomial of degree $\leq m-1$.
If we fix $n \geq 1$, then $f(m,n)$ is of the form $(n+1)^m p_n(m)$ where $p_n(x)$ is a polynomial of degree $\leq n-1$.

Uniqueness: Let's suppose that we have already found unique values of $f(m,n)$ whenever $\min(m,n) < k$. In particular, we know the values of $f(k,x)$ and $f(x,k)$ for $0 \leq x \leq k-1$. This is enough values to uniquely determine a degree $k-1$ polynomial, so the values we already know determine the values of $f(k,x)$ and $f(x,k)$ for all $x$.

Existence: The proof of uniqueness gives an algorithm to compute $f$. The only potential issue is that $f(k,k)$ is determined twice, as a polynomial in its first variable and in its second variable. However, the algorithm is symmetric in $m$ and $n$, so the two interpolating polynomials coincide. $\square$

For example, we have $f(1,0) = 1 = 2^0 \cdot 1$ so we must have $p_1(n)=1$ and $f(1,n) = 2^n$. Similarly, $f(2,0)=1=2^0 \cdot 1$ and $f(2,1) = 4=3^1 \cdot (4/3)$, so we must have $p_2(n) = 1+n/3$ and $f(2,n) = 3^n(1+n/3) = 3^{n-1} (n+3)$. Using this algorithm, I computed $f(m,n)$ for $0 \leq m,n \leq 10$. Here is the data, and some Mathematica code, if you'd like to play with it.

f[m_, n_] := f[m, n] =
  If[m == 0 || n == 0, 1,
   If[m <= n, 
    (m + 1)^n * InterpolatingPolynomial[Table[{k, f[m, k]/(m + 1)^k}, {k, 0, m - 1}], x] /. x -> n,
    (n + 1)^m * InterpolatingPolynomial[Table[{k, f[k, n]/(n + 1)^k}, {k, 0, n - 1}], x] /. x -> m]]

Values of $f(m,n)$ for $0 \leq m,n \leq 10$.

1,1,1,1,1,1,1,1,1,1,1
1,2,4,8,16,32,64,128,256,512,1024
1,4,15,54,189,648,2187,7290,24057,78732,255879
1,8,54,328,1856,9984,51712,260096,1277952,6160384,29229056
1,16,189,1856,16145,129000,968125,6925000,47690625,318500000,2073828125
1,32,648,9984,129000,1475856,15450912,151201728,1403288064,12479546880,107152782336
1,64,2187,51712,968125,15450912,219682183,2862173104,34828543449,401200569280,4418300077219
1,128,7290,260096,6925000,151201728,2862173104,48658878080,760774053888,11122973573120,153936323805184
1,256,24057,1277952,47690625,1403288064,34828543449,760774053888,15047189968833,274908280855680,4707029151392121
1,512,78732,6160384,318500000,12479546880,401200569280,11122973573120,274908280855680,6199170628499200,129708290461760000
1,1024,255879,29229056,2073828125,107152782336,4418300077219,153936323805184,4707029151392121,129708290461760000,3283463201858585471

So, we can prove the conjecture by showing that both $F_{m,n}$ and $S_{m,n}$ obey conditions 1, 2 and 3. For condition 1, we can just define $S_{m,0}$, $S_{0,n}$, $F_{m,0}$, and $F_{0,n}$ to be singletons (and I claim this is actually the most natural definition); then we just need to check that our verification of polynomiality goes all the way down to the zero case.

Since conditions 2 and 3 are symmetric, and the definitions of $F_{m,n}$ and $S_{m,n}$ are symmetric, we just check condition $2$.

Verification of condition 2 for forests: Let $A_{m,b}$ be the set of isomorphism classes of bicolored forests whose white vertices are labeled $\{1,2,\ldots,m \}$ and which have $b$ black vertices, each of degree $\geq 2$. We claim that $$|F_{m,n}| = \sum_b |A_{m,b}| n(n-1)(n-2) \cdots (n-b+1) (m+1)^{n-b} . \qquad (1) $$

Proof: Take a forest in $F_{m,n}$, with the $m$ vertices colored white and the $n$ vertices colored black. Delete all black vertices of degree $0$ and $1$ and forget the numbering of the remaining black vertices. This gives a forest in $A_{m,b}$. To undo this process, we first must take the $b$ black vertices and decide which of the $n$ vertices of $K_{m,n}$ they will be; there are $n(n-1)(n-2) \cdots (n-b+1)$ ways to do this. (We used that trees in $A_{m,b}$ have no automorphisms.) Then we must take the $n-b$ remaining black vertices and either join them to one of the $m$ white vertices or leave them unconnected; there $(m+1)^{n-b}$ ways to do this.

Formula (1) is clearly $(m+1)^n$ times a polynomial, it remains to check that the polynomial has degree at most $m-1$. We must check that $A_{m,b}$ is empty if $b \geq m$. Indeed, since every black vertex has degree at least $2$, a forest in $A_{m,b}$ has at least $2b$ edges. But, since it is a forest, it also has at most $m+b-1$ vertices. So $2b \leq m+b-1$ and $b \leq m-1$. $\square$

Verification of condition 2 for margins of $(0,1)$ matrices: Consider a vector $C = (C_1, \ldots, C_n) \in \{ 0,1,\ldots, m \}^n$ of columns sums, and consider how many row sums $(R_1, \ldots, R_m)$ are compatible with it. Let $c_j = \# \{ k : C_k = j \}$, so $c_0+c_1+\cdots + c_m=n$.

By the Gale-Ryser theorem, $(R_1, \ldots, R_m)$ is compatible with $(c_0, \cdots, c_m)$ if and only if $\sum R_i = \sum j c_j$ and, for all index sets $1 \leq i_1 < i_2 < \cdots < i_k \leq m$, we have $$R_{i_1} + R_{i_2} + \cdots + R_{i_k} \leq \sum_j \min(j,k) c_j . $$ This is the defining list of inequalities of the permutahedron, so this condition can alternately be stated as saying that $(R_1, \ldots, R_m)$ is in the convex hull of the $m!$ permutations of $(c_1+c_2+\cdots+c_m, c_2+\cdots + c_m, \cdots, c_{m-1}+c_m, c_m)$.

By Theorem 11.3 of Postnikov's Permutohedra, associahedra, and beyond, the number of such $(R_1, \ldots, R_m)$ is a polynomial $g_m(c_0,c_1, \ldots, c_m)$ in the $c_j$, of degree $m-1$. So $$|S_{m,n}| = \sum_{c_0+\cdots + c_m=n} \frac{n!}{c_0! c_1! \cdots c_m!} \ g_m(c_0,c_1, \ldots, c_m). \qquad (2)$$ Let $\partial_j$ be $\tfrac{\partial}{\partial x_j}$. There is some polynomial $h_m$ in the $\partial_j$, of degree $m-1$, such that $$\left. h_m(\partial_0, \ldots, \partial_m) \left( x_0^{c_0} \cdots x_m^{c_m} \right) \right|_{(1,1,\ldots,1)} = g_m(c_1, \ldots, c_m). \qquad (3)$$ Combining (2), (3) and the binomial expansion of $(x_0+x_1+\cdots+x_m)^n$, we obtain $$|S_{m,n}| = \left. h_m(\partial_0, \ldots, \partial_m) \left( x_0+x_1+\cdots+x_m \right)^n \right|_{(1,1,\ldots,1)} . \qquad (4) $$ Let $h_m(t,t,\ldots,t) = \sum_{k=0}^{m-1} h_{m,k} t^k$. Then $(4)$ is $\sum_{k=0}^{m-1} h_{m,k} n(n-1)(n-2) \cdots (n-k+1) (m+1)^{n-k}$, which is a polynomial of the desired form. $\square$

This conjectured equivalence (if true) is still amazing to me. However, here is a baby step: The conjecture is indeed true for $m=2$. Perhaps someone else will find this useful as a base for generalization.

Claim: $|F_{2,n}| = |S_{2,n}|$ for all $n$

First, lets consider $F_{2,n}$. In your definition, a spanning forest is any subset $T$ of edges of $K_{2,n}$ which is acyclic. Let $u,v$ be the two nodes on the $m=2$ side, and let $S(u)$ be the neighbors of $u$, i.e. the subset of nodes (on the $n$-node side) which $u$ is connected to in $T$. Similarly for $S(v)$.

Next, note that $T$ contains a cycle iff $S = S(u) \cap S(v)$ contains $2$ nodes (or more). So, $F_{2,n}$ can be counted by counting all cases where $S$ contains $0$ or $1$ node.

$|S|=0$ case: Each of $n$ nodes can be connected to $u$, or to $v$, or to neither. Total no. $= 3^n$.
$|S|=1$ case: There are $n$ ways to pick the unique $w \in S$. After that, each of the remaining $n-1$ nodes can be connected to $u$, or to $v$, or to neither. Total no. $= n\, 3^{n-1}$
Summing up, $|F_{2,n}| = 3^n + n \,3^{n-1} = 3^{n-1} (3+n),$ which matches your table.

Second, lets consider $S_{2,n}$. There are $n$ column sums, each of which can be $0, 1, 2$. Let $x,y,z$ be the number of columns whose sum $=0, 1, 2$ respectively. The columns whose sum $=0$ or $2$ dictate their elements. In the $y$ columns whose sum $=1$, the number of $1$s in the top row can be any integer $\in [0,y]$, and so the top row sum can be any integer $\in [z,z+y]$, and in short there are $(y+1)$ possibilities. Obviously, once the top row sum is known that determines the bottom row sum $= (y+2z) \,-$ top row sum.

So $S_{2,n}$ can be counted by (i) summing over all possible $y$, and for each $y$: (ii) picking the $y$ columns whose sum $=1$, and (iii) for the remaining $n-y=x+z$ columns assign them $0$ or $2$ arbitrarily. I.e.:

$$|S_{2,n}| = \sum_{y=0}^n (y+1) {n \choose y} 2^{n-y}$$

This can be evaluated many ways but my favorite is to recognize a slightly transformed sum as an explicit formula for the expected value of a Binomial random variable:

$$ \begin{align} |S_{2,n}| &= \sum_{y=0}^n (y+1) {n \choose y} 2^{n-y} \\ &= 3^n \times \sum_{y=0}^n (1+y) {n \choose y} (\frac13)^y (\frac23)^{n-y} \\ &= 3^n \times \mathbb{E}[1 + Bin(n, \frac13)] \\ &= 3^n (1 + {n \over 3}) = 3^{n-1} (3 + n) = |F_{2,n}| & QED \end{align} $$

As I said, this is a baby step only. The key step in $F_{2,n}$ is that $T$ has a cycle iff $|S| \ge 2$, and the key step in $S_{2,n}$ is that the $y$ ones can be all in the top row or all in the bottom row or anywhere in between, for $(1+y)$ possibilities. These key steps make the counting easy. However, as far as I can see, neither key step has any easy way to generalize to $m=3$, let alone arbitrary $m$. (E.g. for $m=3$, a cycle in $T$ can involve $2$ or $3$ nodes on the $m=3$ side, and I don't know any good way to count that.)

I have a brute force proof that, for $m=3$, both counts are $4^{n - 2} (3 n^2 + 13 n + 16)$. It would be possible to extend this method for any fixed $m$. I'll write it up for $m=3$ in the hope it gives insight. As you'll see, many steps look the same on both sides, but I can't figure out how to give a general proof.

Counting margins of $3 \times n$ binary matrices Consider a vector $C \in \{ 0,1,2,3 \}^n$ of column sums. Let $C$ have $c_j$ entries equal to $j$ (so $c_0+c_1+c_2+c_3=n$). Let's consider how many row sums $(R_1, R_2, R_3)$ are compatible with $C$. By the Gale-Ryser theorem, these are the vectors with $R_1+R_2+R_3 = c_1+2c_2+3c_3$, $\min(R_1+R_2, R_1+R_3, R_2+R_3) \geq c_2+2c_3$ and $\min(R_1, R_2, R_3)\geq c_3$. Geometrically, this is the lattice points in a hexagon whose vertices are the permutations of $(c_1+c_2+c_3, c_2+c_3, c_3)$. I computed that the number of lattice points in this hexagon is $$\binom{c_1}{2} + 2 c_1 c_2 + \binom{c_2}{2} + 2 c_1 + 2 c_2 + 1.$$

The generating function for $\sum_C x_0^{c_0} x_1^{c_1} x_2^{c_2} x_3^{c_3}$ is $(x_0+x_1+x_2+x_3)^n$. Let $\partial_j$ be differentiation with respect to $x_j$. Let $D$ be the differential operator $$\tfrac{1}{2} \partial_1^2 + 2 \partial_1 \partial_2+\tfrac{1}{2} \partial_2^2 + 2 \partial_1 + 2 \partial_2 + 1.$$ Then $D$ applied to the monomial $x_0^{c_0} x_1^{c_1} x_2^{c_2} x_3^{c_3}$, and evaluated at $(1,1,1,1)$, gives $\binom{c_1}{2} + 2 c_1 c_2 + \binom{c_2}{2} + 2 c_1 + 2 c_2 + 1$. So the number of pairs $(C,R)$ is $D (x_0+x_1+x_2+x_3)^n$ evaluated at $(1,1,1,1)$.

This gives $3 n(n-1) 4^{n-2} + 4 n 4^{n-1} + 4^n = 4^{n - 2} (3 n^2 + 13 n + 16)$.

Counting forests Given a forest on $3 \times n$, define a sequence $(v_1, v_2, \ldots, v_n)$ in $\{ 0,1,2,3 \}^n$ by saying that $v_j$ is $0$ if vertex $j$ has degree $0$, and otherwise $v_j$ is the least neighbor of $j$. Let $c_j$ be the number of $v_j$'s equal to $j$.

The generating function for the $(c_0, c_1, c_2, c_3)$ sequences is $(x_0+x_1+x_2+x_3)^n$ (again!). We count how many forests give rise to each $(c_0, c_1, c_2, c_3)$, by listing all possiblilities.

One vertex joined to $(1,2)$ and another joined to $(1,3)$ This can happen in $c_1 (c_1-1)$ ways.

One vertex joined to $(1,2)$ and another joined to $(2,3)$ This can happen in $c_1 c_2$ ways.

One vertex joined to $(1,3)$ and another joined to $(2,3)$ This can happen in $c_1 c_2$ ways.

One vertex joined to $(1,2,3)$ This can happen in $c_1$ ways.

One vertex joined to $(1,2)$ This can happen in $c_1$ ways.

One vertex joined to $(1,3)$ This can happen in $c_1$ ways.

One vertex joined to $(2,3)$ This can happen in $c_2$ ways.

None of the $n$-vertices joined to more than one neighbor This can happen in $1$ way.

So, this time, we want to sum up $$c_1 (c_1-1) + 2 c_1 c_2 + 3 c_1 + c_2 +1.$$ The corresponding differential operator is $$E=\partial_1^2 + 2 \partial_1 \partial_2 + 3 \partial_1 + \partial_2+1.$$

The differential operators are different, but once again I get $3 n(n-1) 4^{n-2} + 4 n 4^{n-1} + 4^n$ at the end of the day.

A thought for the future Why do the differential operators $\tfrac{1}{2} \partial_1^2 + 2 \partial_1 \partial_2+\tfrac{1}{2} \partial_2^2 + 2 \partial_1 + 2 \partial_2 + 1$ and $\partial_1^2 + 2 \partial_1 \partial_2 + 3 \partial_1 + \partial_2+1$ give the same thing when applied to $(x_0+x_1+x_2+x_3)^n$? Because $(x_0+x_1+x_2+x_3)^n$ is solely a function of $z:=x_0+x_1+x_2+x_3$, so all the $\partial_j$'s evaluate to $\tfrac{d}{dz}$. Writing $\partial$ for $\tfrac{d}{dz}$, both operators are $3 \partial^2 + 4 \partial + 1$. Any hope to generalize this?

Spanning forests of bipartite graphs and distinct row/column sums of binary matrices

Related

Recent Posts