How many ways to merge N companies into one big company: Bell or Catalan?

There's a famous interview question variously credited to Microsoft, Google and Yahoo:

Suppose you have given N companies, and we want to eventually merge them into one big company. How many ways are there to merge them?

Assuming you can merge as many companies as you like in a single step, I thought this boils down to "find the number of partitions of a set with N elements", in which case the answer is the Bell number $B_{n}$. This can be computed with this handy recursion cribbed shamelessly from Wikipedia:

$B_{n+1}=\sum_{k=0}^{n}{{n \choose k}B_k}$

$1, 1, 2, 5, 15, 52, 203...$

And you have to substract one since you're already starting from one of the possible sets: $B_{2}=2$, but there's only one way to combine A and B into AB.

However, there are a lot of sources on the net which claim that the correct solution is the Catalan number:

$C_n = \frac{1}{n+1}{2n\choose n} = \frac{(2n)!}{(n+1)!\,n!} = \prod\limits_{k=2}^{n}\frac{n+k}{k} \qquad\mbox{ for }n\ge 0$

$1, 1, 2, 5, 14, 42, 132...$

Which is correct, and why? Or are they both correct depending on the assumptions you make about the somewhat vague problem statement?


My first interpretation is that a way of merging the $n$ companies corresponds to a sequence $\langle P_k:k=0,\dots m\rangle$ such that:

  1. $P_0=\{1,\dots,n\}$;
  2. $P_{k+1}$ is a partition of $P_k$ for $k=0,\dots,m-1$;
  3. $|P_{k+1}|<|P_k|$ for $k=0,\dots,m-1$; and
  4. $|P_m|=1$.

These sequences evidently correspond to rooted trees with $n$ labelled leaves. The immediate correspondence between the partition sequences and the trees allows the latter to have internal vertices with only one child, so that each $P_k$ corresponds to a level of the tree, and requires that the leaves all be on the same level, but we can clearly collapse the non-branching branches and instead count rooted trees with $n$ labelled leaves in which every internal vertex has at least two children.

Let $M(n)$ be the number of such partition sequences or trees. Clearly $M(1)=M(2)=1$, and $M(3)=\binom32+\binom33=4$.

Edit2: In fact the correct numbers are $1,1,4,26,236,2752$, OEIS A000311, the number of phylogenetic trees with $n$ vertices. $M(n)$ is the sum over all partitions $\sum_{i=1}^mn_ip_i=n$, where the $p_i$ are distinct positive integers, of the terms $$\frac{n!}{\prod_{i=1}^mp_i!^{n_i}n_i!}\cdot\prod_{i=1}^mM(p_i)^{n_i}=\frac{n!}{n_1!n_2!\dots n_m!}\prod_{i=1}^m\left(\frac{M(p_i)}{p_i!}\right)^{n_i}\;;$$ verifying this is a matter of fairly straightforward counting. OEIS gives the much nicer recurrence $$M(n+1)=(n+2)M(n)+2\sum_{k=2}^{n-1}\binom{n}kM(k)M(n-k+1)\;,$$ among several others.

Restricting to binary mergers yields the sequence $1,1,3,15,105$, which appears to be A001147, $(2n-3)!!$. The OEIS entry says that $a_n$ is the number of distinct products of $n=1$ variables with commutative, non-associative multiplication, and those would appear to correspond to the binary merger trees.


So turns out the question probably originates from Steven Skiena's The Algorithms Design Manual, whose answer wiki gives the solution as:

$\prod_{i=2}^{n} \frac{i(i-1)}{2} = \frac{n! (n-1)!}{2^{n-1}}$

Unlike Christian's answer, this assumes "pairwise" (only two companies at a time) non-simultaneous mergers.


Assume that any number of companies may be merged at a time. We must decide how much to care about timing. For example, do $$ \begin{aligned} &a,b,c,d\rightarrow\{a,b\},\{c,d\}\rightarrow\{a,b,c,d\}\\ &a,b,c,d\rightarrow\{a,b\},c,d\rightarrow\{a,b\},\{c,d\}\rightarrow\{a,b,c,d\}\\ &a,b,c,d\rightarrow a,b,\{c,d\}\rightarrow\{a,b\},\{c,d\}\rightarrow\{a,b,c,d\}\\ \end{aligned} $$ count as one outcome, two outcomes, or three outcomes? If we care only about which groups get merged, but not about the timing, counting these as one outcome makes sense. But if timing is important, counting these as three outcomes makes sense. One might also disallow the first of the three, assuming that two mergers never take place exactly simultaneously, in which case only the last two would count.

If we assume that timing is unimportant, so that the scenarios above count as one outcome, then for four companies we have $$ \begin{aligned} &a,b,c,d\rightarrow\{a,b,c,d\}\\ &a,b,c,d\rightarrow\{a,b,c\},d\rightarrow\{a,b,c,d\}\quad\text{(plus three others)}\\ &a,b,c,d\rightarrow\{a,b\},c,d\rightarrow\{a,b,c,d\}\quad\text{(plus five others)}\\ &a,b,c,d\rightarrow\{a,b\},c,d\rightarrow\{a,b,c\},d\rightarrow\{a,b,c,d\}\quad\text{(plus 11 others)}\\ &a,b,c,d\rightarrow\{a,b\},\{c,d\}\rightarrow\{a,b,c,d\}\quad\text{(plus two others),} \end{aligned} $$ which is a total of $26$ outcomes. This is the result in the OEIS link relating to phylogenetic trees in Brian Scott's answer. If we consider timing to be important, then the three outcomes in the last line turn into nine outcomes, and the result is $32$ outcomes total. This is the correct answer for the partition model described, but not actually implemented, in Brian Scott's answer. (For a full discussion of the partition model, see Lengyel's constant; for the sequence, see A005121.) If we forbid simultaneous mergers, then we reduce the total by $3$, leaving $29$ total outcomes, as in Christian Blatter's answer.

Added: There was a question in the comments to Brian Scott's answer about the derivation of the recurrence $$ \begin{aligned} M(n+1)=&(n+2)M(n)+2\sum_{k=2}^{n-1}\binom{n}{k}M(k)M(n-k+1)\\ =&nM(n)+2\sum_{k=2}^n\binom{n}{k}M(k)M(n-k+1) \end{aligned} $$ for the number of outcomes in the phylogenetic tree model. The idea is to partition the set of merger histories according to the size, which we will denote by $k+1$, of the first conglomerate to contain firm $n+1$. Note that $k$ ranges from $1$ to $n$.

If $k\ge2$ then there are two scenarios, distinguished by the manner in which firm $n+1$ gets merged with the other $k$ firms. In Scenario I, the other $k$ firms first merge into a single conglomerate, and then firm $n+1$ merges with it. In Scenario II, the other $k$ firms first merge into two or more conglomerates, and then firm $n+1$ and these conglomerates all merge in a single step. Either scenario can take place in $M(k)$ ways, for a total of $2M(k)$ ways.

Once firm $n+1$ has joined a conglomerate, that conglomerate must then be merged, along with the $n-k$ remaining firms, into a single conglomerate. This can take place in $M(n-k+1)$ ways. As there are $\binom{n}{k}$ ways to choose the $k$ firms with which firm $n+1$ forms its first conglomerate, there are a total of $2\binom{n}{k}M(k)M(n-k+1)$ outcomes for a given $k\ge2$. The only difference when $k=1$ is that Scenario II cannot occur. Hence the coefficient $2$ becomes $1$.

The term $(n+2)M(n)$ is a combination of the $k=1$ and $k=n$ terms of the sum.

2nd addition: For completeness, I record the recurrence for the partition model here as well. Let $Z(n)$ be the number of ways in which $n$ firms may be merged. The initial set of $n$ firms may be partitioned into $k$ non-empty parts in $S(n,k)$ ways, where $S(n,k)$ is the Stirling number of the second kind. The set of $k$ parts represent the set of firms present after the first round of mergers. Since at least one merger must occur in each round, we have $1\le k\le n-1$. The new set of firms may be merged in $Z(k)$ ways. From this we get $$ Z(n)=\sum_{k=1}^{n-1}S(n,k)Z(k). $$


According to jpatokal's comment there are two notions:

Given a set $S$ of $n\geq1$ distinguishable objects a merger consists in selecting an arbitrary $k$-subset $A\subset S$, $2\leq k\leq n$, and forming the new set $S':=(S\setminus A)\cup\{A\}$ containing $n-k+1<n$ elements.

Given a finite set $S_0$ a merger history is a sequence of mergers $S_0\to S_1\to S_2\to \ldots\to S_r$ whereby $|S_r|=1$. (We do not consider the case that at the same instant two disjoint subsets $A_1$, $A_2\subset S$ are merged into two different amalgams $\{A_1\}, \ \{A_2\}$.)

The problem is to determine the number of possible merger histories for a starting set $S$ containing $n\geq1$ elements. Denote this number by $H(n)$. Then $H(1)=H(2)=1$, and the $H(n)$ satisfy the following recursion:

$$H(n)=\sum_{k=2}^n {n\choose k} H(n-k+1)\qquad (n\geq3)\ .$$

The first few values are $$1,1,4,29,336,5687,132294,4047969,157601068, \ldots\quad\ .$$ OEIS does not list such a sequence.