How many "good" graphs of size $n$ are there?

Let's a call a directed simple graph $G$ on $n$ labelled vertices good if every vertex has outdegree 1 and, when considered as if it were undirected, it is connected. How many good graphs of size $n$ are there?

Here's my work so far. Let's call this number $T(n)$. Clearly, $T(2) = 1$: there's only the loop on two vertices.

T(2)

We also have that $T(3) = 8$. We can count them using the following argument: let's call a possible shape a directed simple graph on $n$ unlabelled vertices which is good. For $n = 3$ we have the following shapes:

T(3)

There are $3!$ labelled good graphs of the first shape: fix the outside vertex in 3 possible ways, then fix the loop in two possible ways. There are also $\frac{3!}{3}$ labelled good graphs of the second shape: it's simply the number of cycles on 3 elements. So in total we have: $$T(3) = 3! + \frac{3!}{3} = 8\text{.}$$

We also know that $T(4) = 78$. Let's list all possible shapes:

T(4)

From top left to bottom right, it's easy to check that we have $4!$ labelled good graphs of the first shape, $2\cdot {4 \choose 2}$ of the second, $2\cdot {4 \choose 2}$ of the third, $4!$ of the fourth and $\frac{4!}{4}$ of the last. In total: $$T(4) = 4! + \left(2\cdot {4 \choose 2} + 2\cdot {4 \choose 2}\right) + 4! + \frac{4!}{4} = 3\cdot 4! + \frac{4!}{4}\text{.}$$

I think that $T(5) = 884$, but I won't draw all possible shapes or count their labelings for brevity.

I computed $T(5)$ again, and now I get 944. This also invalidates the following conjecture.

CONJECTURE [DISPROVEN]: I'm conjecturing that there's a simple-ish formula for $T(n)$. It's something like $$T(n) = (2^{n-2} - 1) \cdot n! + \frac{n!}{n} + S(n)$$ where $S(n)$ is some function I currently don't understand such that $S(2) = S(3) = S(4) = 0$, while $S(5) = 5\cdot 4$.


I would like to contribute some ideas even though I don't have as much time as I'd like at the moment. If I understand this problem correctly then the class of graphs under consideration call it $\mathcal{Q}$ is in a set-of relationship with the class of endofunctions call it $\mathcal{E}$ with the latter being sets of the former.

The following MSE link A has closely related material, as does this MSE link B where aspects of "Random Mapping Statistics" by Flajolet and Odlyzko are discussed.

This gives the species equation $$\def\textsc#1{\dosc#1\csod} \def\dosc#1#2\csod{{\rm #1{\small #2}}} \mathcal{E} = \textsc{SET}(\mathcal{Q})$$ which is in terms of generating functions $$E(z) = \exp Q(z) \quad\text{or}\quad Q(z) = \log E(z).$$

Recall the popular labeled rooted tree function which represents the species $$\mathcal{T} = \mathcal{Z} \times \textsc{SET}(\mathcal{T})$$ and has the functional equation $$T(z) = z \exp T(z).$$

We also have that $$T(z) = \sum_{n\ge 1} n^{n-1} \frac{z^n}{n!}$$ (Cayley's formula) and since there are $n^n$ endofunctions we obtain $$E(z) = 1 + z\frac{d}{dz} T(z) = 1 + z T'(z).$$

But from the functional equation we get $$T'(z) = \exp T(z) + z \exp T(z) T'(z) = \frac{T(z)}{z} + T(z) T'(z)$$ so that $$T'(z) = \frac{1}{z} \frac{T(z)}{1-T(z)}.$$

This finally yields $$E(z) = 1 + \frac{T(z)}{1-T(z)}$$ and hence $$Q(z) = \log\left(1+\frac{T(z)}{1-T(z)}\right).$$

This gives the sequence $$1, 3, 17, 142, 1569, 21576, 355081, 6805296, \\ 148869153, 3660215680,\ldots$$ which points us to OEIS A001865 where we learn that indeed we have the right exponential generating function. Note that the formula for $E(z)$ also proves that $\mathcal{E} = \textsc{SEQ}(\mathcal{T}).$

Now to extract coefficients from this for a closed formula we expand the logarithm to get for the count $q_n$ the formula

$$n! [z^n] \sum_{k\ge 1} \frac{(-1)^{k+1}}{k} \left(\frac{T(z)}{1-T(z)}\right)^k.$$

Observe that we can restrict this to $k\le n$ because the tree function term starts at $z,$ getting $$n! [z^n] \sum_{k=1}^n \frac{(-1)^{k+1}}{k} \left(\frac{T(z)}{1-T(z)}\right)^k.$$

We still need the terms of the fraction in the tree function, which can be done by Lagrange inversion. We have $$[z^n] \left(\frac{T(z)}{1-T(z)}\right)^k = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n+1}} \left(\frac{T(z)}{1-T(z)}\right)^k \; dz.$$

Put $T(z) = w$ so that $z=w/\exp(w)=w\exp(-w)$ and $dz=(\exp(−w)−w\exp(−w))\; dw$ to get $$\frac{1}{2\pi i} \int_{|w|=\epsilon} \frac{\exp(w(n+1))}{w^{n+1}} \left(\frac{w}{1-w}\right)^k (\exp(−w)−w\exp(−w))\; dw \\ = \frac{1}{2\pi i} \int_{|w|=\epsilon} \frac{\exp(wn)}{w^{n+1-k}} \frac{1}{(1-w)^{k-1}} \; dw .$$

Extracting the residue at zero yields for $k\ge 2$ $$\sum_{q=0}^{n-k} \frac{n^{n-k-q}}{(n-k-q)!} {q+k-2\choose k-2}$$ and for $k=1,$ $$\frac{n^{n-1}}{(n-1)!}.$$

Collecting these into one formula finally yields $$n^n + n! \sum_{k=2}^n \frac{(-1)^{k+1}}{k} \sum_{q=0}^{n-k} \frac{n^{n-k-q}}{(n-k-q)!} {q+k-2\choose k-2}.$$

This computation is closely related to the material at this MSE link.

Addendum. I just noticed in one of the other posts that fixed points are not admitted in those endofunctions. The above material admits fixed points as in the discussion and the diagram at the Wikipedia entry.

Addendum Thu Dec 18 19:57:09 CET 2014. The case when there are no fixed points goes as follows. There are now $(n-1)^n$ endofunctions with no fixed points, which gives $$E(z) = 1 + \sum_{n\ge 1} (n-1)^n \frac{z^n}{n!}.$$

Now observe that when we apply Lagrange inversion to $$\frac{\exp(-T(z))}{1-T(z)}$$ we obtain

$$[z^n] \frac{\exp(-T(z))}{1-T(z)} = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n+1}} \frac{\exp(-T(z))}{1-T(z)}\; dz$$

which using the same substitution as before becomes $$\frac{1}{2\pi i} \int_{|w|=\epsilon} \frac{\exp(w(n+1))}{w^{n+1}} \frac{\exp(-w)}{1-w} (\exp(−w)−w\exp(−w))\; dw \\ = \frac{1}{2\pi i} \int_{|w|=\epsilon} \frac{\exp(w(n-1))}{w^{n+1}}\; dw = \frac{(n-1)^n}{n!}.$$ But $\exp(-T(z)) = \frac{z}{T(z)}$ and we finally get for $E(z)$ the closed form $$\frac{z}{T(z)(1-T(z))}.$$

This gives for $Q(z)$ that $$Q(z) = \log\left(\frac{z}{T(z)(1-T(z))}\right)$$ (note that $E(z)$ has a constant term in its expansion at zero which is one) which produces the sequence $$0, 1, 8, 78, 944, 13800, 237432, 4708144, 105822432, 2660215680, \\ 73983185000, 2255828154624,\ldots $$ which points us to OEIS A000435, confirming the result from the accepted answer. Note that the OEIS says that this sequence was the first in the database, so we are content to be referencing it in this computation.

To get a closed form re-write $Q(z)$ as follows: $$Q(z) = \log\left(1 + \frac{z}{T(z)(1-T(z))} -1\right)$$ to get the formula $$n! [z^n] \sum_{k=1}^n \frac{(-1)^{k+1}}{k} \left(\frac{z}{T(z)(1-T(z))} -1\right)^k$$ This is $$n! [z^n] \sum_{k=1}^n \frac{(-1)^{k+1}}{k} \sum_{q=0}^k {k\choose q} (-1)^{k-q} \left(\frac{z}{T(z)(1-T(z))}\right)^q.$$ We can drop the term for $q=0$ when $n\ge 1,$ getting $$n! [z^n] \sum_{k=1}^n \frac{(-1)^{k+1}}{k} \sum_{q=1}^k {k\choose q} (-1)^{k-q} \left(\frac{z}{T(z)(1-T(z))}\right)^q.$$

Use Lagrange inversion to extract coefficients from the tree function term. $$[z^n] \left(\frac{z}{T(z)(1-T(z))}\right)^q = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n+1}} \left(\frac{z}{T(z)(1-T(z))}\right)^q \; dz$$

which using the same substitution as before becomes $$\frac{1}{2\pi i} \int_{|w|=\epsilon} \frac{\exp(w(n+1-q))}{w^{n+1-q}} \left(\frac{1}{w(1-w)}\right)^q (\exp(−w)−w\exp(−w))\; dw \\ = \frac{1}{2\pi i} \int_{|w|=\epsilon} \frac{\exp(w(n-q))}{w^{n+1}} \frac{1}{(1-w)^{q-1}} \; dw$$

Extracting coefficients we get $$\sum_{p=0}^n \frac{(n-q)^{n-p}}{(n-p)!} {p+q-2\choose q-2}$$ when $q\ge 2$ and for $q=1$ $$\frac{(n-1)^n}{n!}.$$

Substituting this into the sum formula yields $$n! \times \sum_{k=1}^n \frac{(-1)^{k+1}}{k} \times k \times (-1)^{k-1} \frac{(n-1)^n}{n!} \\ + n! \times \sum_{k=1}^n \frac{(-1)^{k+1}}{k} \sum_{q=2}^k {k\choose q} (-1)^{k-q} \sum_{p=0}^n \frac{(n-q)^{n-p}}{(n-p)!} {p+q-2\choose q-2}.$$ This simplifies to $$n\times (n-1)^n + n! \times \sum_{k=2}^n \frac{(-1)^{k+1}}{k} \sum_{q=2}^k {k\choose q} (-1)^{k-q} \sum_{p=0}^n \frac{(n-q)^{n-p}}{(n-p)!} {p+q-2\choose q-2}.$$

This formula can be used to compute the count of these graphs for large $n$ where the tree function formula would no longer be practicable.

The sequence for $n=30$ to $n=34$ reads

38086159100543376291945674612050231296000000,
3117962569860399657478478640723143576082043800,
263711778692997479722657378560127779200642842624,
23019602620026625886784119896351926037410391377792,
2071846675499818842878197235287956993753027358752768

Additional observations. The OEIS entry OEIS A000435 says that this sequence is the normalized total height of all labeled rooted trees on $n$ nodes (sum of height of all nodes in all trees scaled by $n$). The question arises on how to prove this.

The height parameter is represented by the bivariate generating function $T(z, u)$ where $T(z, 1) = T(z)$ (the ordinary tree function) and we have the functional equation

$$T(z, u) = z + z\frac{T(uz,u)^1}{1!} + z\frac{T(uz,u)^2}{2!} + z\frac{T(uz,u)^3}{3!} + z\frac{T(uz,u)^4}{4!} + \cdots$$ or $$T(z, u) = z \exp T(uz, u).$$

The exponential generating function for the sum of the height of all nodes of all rooted labeled trees by the number of nodes is given by $$G(z) = \left.\frac{\partial}{\partial u} T(z, u)\right|_{u=1}.$$

We thus differentiate the functional equation, getting $$G(z) = \left. z \exp T(uz, u) \left(z \frac{\partial}{\partial z} T(z, u) + \frac{\partial}{\partial u} T(z, u) \right)\right|_{u=1}$$ which becomes $$G(z) = T(z) (z T'(z) + G(z))$$ so that $$G(z) (1-T(z)) = z T(z) T'(z)$$ or $$G(z) = \frac{z T(z)}{1-T(z)} T'(z).$$

We may substitute the expression for the derivative of the tree function that we obtained earlier into this to get $$G(z) = \frac{z T(z)}{1-T(z)} \frac{1}{z} \frac{T(z)}{1-T(z)} = \left(\frac{T(z)}{1-T(z)}\right)^2.$$

We have computed the exponential generating function of the total height of all labelled trees on $n$ nodes, which gives the sequence $$0, 2, 24, 312, 4720, 82800, 1662024, 37665152, 952401888, \\ 26602156800, 813815035000 $$ which is OEIS A001864.

The normalized height is this sequence divided by $n.$ To verify that this is indeed the same as the count of endofunctions with no fixed point we must show that $$z \frac{d}{dz} Q(z) = G(z)$$ i.e. that the endofunctions times $n$ give the total height or alternatively, the total height divided by $n$ give the endofunctions.

But the left is $$ z \frac{T(z)(1-T(z))}{z} \\ \times \left(\frac{1}{T(z)(1-T(z))} - z \frac{1-2T(z)} {T(z)^2 (1-T(z))^2} T'(z) \right)$$ which gives $$1- z \frac{1 - 2T(z)}{T(z)(1-T(z))} T'(z) = 1- \frac{1 - 2T(z)}{T(z)(1-T(z))} \frac{T(z)}{1-T(z)} = 1- \frac{1 - 2T(z)}{(1-T(z))^2} \\ = \frac{1-2T(z)+T(z)^2 -(1- 2T(z))}{(1-T(z))^2} = \left(\frac{T(z)}{1-T(z)}\right)^2,$$ thus concluding the proof.


My solution does not agree with your answer for $T(5)$, but let's give it a try anyway . . .

To construct such a graph on $n$ vertices, consider the vertices with indegree $0$. If there are none of these then the graph is a (directed) cycle, and there are $(n-1)!$ possibilities. If there are $k$ specified vertices with indegree $0$, then we obtain our graph by choosing a graph of the required type on $n-k$ vertices, then deciding which of these $n-k$ vertices are to be the targets of the edges from the $k$ vertices of indegree $0$. The number of possibilities is $T(n-k)(n-k)^k$. Now the maximum value of $k$ is $n-2$ (can't be $n-1$ because then the remaining vertex would have to be adjacent to itself, which you do not allow). Therefore, by inclusion/exclusion, we have $$T(n)=(n-1)!+\sum_{k=1}^{n-2}(-1)^{k-1}\!\!\binom nk T(n-k)(n-k)^k\ .$$ The initial value is $T(2)=1$, and for $n=3,4,5$ this gives $8,78,944$.

My results seem to match OEIS A000435, though I can't see any connection with your problem.

Comments please!


The graphs you are describing are known as simple (directed) pseudotrees; see http://en.wikipedia.org/wiki/Pseudoforest. There doesn't appear to be a 'nice' closed form for these trees. Wikipedia/OEIS gives the number of undirected connected graphs with $n$ vertices as

$$f(n) = \sum_{k=1}^n \frac{(-1)^{k-1}}{k} \sum_{n_1+\cdots+n_k=n} \frac{n!}{n_1! \cdots n_k!} \binom{\binom{n_1}{2}+\cdots +\binom{n_k}{2}}{n}.$$

(values at http://oeis.org/A057500).

Now, this is not quite what you want, since you want to count the number of such directed graphs. Luckily, these two quantities are easily related.

Note that every pseudotree can be decomposed into a directed cycle and several directed trees attached to points on this cycle. The edges in the directed trees must always point towards the cycle, so their direction is fixed given the undirected the graph. On the other hand, any cycle with more than two vertices can be directed in exactly two ways. Therefore, if $g(n)$ is the number of simple labelled directed pseudotrees (the quantity you want) and if $h(n)$ is the number of undirected pseudotrees with a cycle of size 2, then $g(n) = 2f(n) - h(n)$.

We can get an expression (although unfortunately not a closed form one) for $h(n)$ by observing that the number of ways to construct an undirected pseudotree with a cycle of size 2 is to choose 2 vertices to belong to the cycle, split the remanining vertices into 2 sets (depending on which vertex they belong to), and count the number of ways to build a labelled rooted forest on each of those sets. By Cayley's formula, there are $(n+1)^{n-1}$ rooted forests on $n$ labelled nodes, so this gives

$$h(n) = \binom{n}{2}\sum_{k=0}^{n-2}\binom{n-2}{k}(k+1)^{k-1}(n-k-1)^{n-k-3}$$

We can in fact generalize this logic to get a sum similar to that for $f(n)$ directly for $g(n)$; the only difference is that if our cycle is of length $k$, we must partition the remaining vertices into $k$ sets instead of 2. This gives

$$f(n) = \sum_{k=2}^{n}\binom{n}{k}(k-1)!\sum_{n_1+\cdots+n_{k}=n-k}\dfrac{(n-k)!}{n_1!\cdots n_k!}(n_1+1)^{n_1-1}\cdots(n_k+1)^{n_k-1}$$