What is the probability that every pair of students studies together at some point?

Solution 1:

As promised in my response to Byron, here is my very partial progress to date on this problem. Though long-winded, it doesn't amount to much. I'm still looking for an answer to the general case, or to other special cases (like $n=2$ or $c=2$) or even just better estimates. I'm also interested to know about any applicable combinatorial tools even without a full answer.

First, a caution: The graphs I'm about to use are not related to the graphs in the general formulation of the original question. Ignore those graphs and just think of students and classrooms!

Let $G$ be a graph with a vertex for each student, $cn$ vertices in all. Say that a given year's assignment of students to classrooms respects $G$ if every edge in $G$ joins two students that are assigned to different classrooms. That is, $G$ encodes constraints on assignments; each edge in $G$ represents two students that must be kept apart. Let $R(G)$ be the probability that a random assignment of students to classrooms respects $G$. Note that $R(G)$ does not depend on $y$ since the definition of "respects" refers to a single year's assignment.

Now let $e(G)$ be the number of edges in $G$. Then the probability that after $y$ years every pair of students has at some point shared a classroom is exactly

$$\sum (-1)^{e(G)} (R(G))^y$$

where the sum is taken over all graphs $G$.

This result follows from a straightforward application of Inclusion/Exclusion. It's very pretty (at least I think so) but useless if you want a real answer. The problem isn't finding $R(G)$; this is tedious but can be automated. The real difficulty is that you have to handle $2^{cn\choose 2}$ graphs, which is far too many. The graphs come in isomorphic bunches (e.g. there are $cn\choose 2$ one-edged graphs that all have the probability of being respected) and you don't have to handle most graphs with lots of edges since $R(G) = 0$ for any graph $G$ that is not $c$-partite. But this doesn't help enough.

We get some use out of this wretched formula by generalizing the problem. Suppose we ask, given the same conditions, for the probability that each student in a given group of $p$ students at some point sees each of the other students in that group. The formula above gives the exact answer if we sum over all graphs with $p$ vertices, each labelled with a student in the group. For example, with $p = 2$ the formula gives the probability that two given students at some point see each other:

$$1 - \left(\frac{n(c-1)}{cn-1}\right)^y$$

(Of course we can get this result by a simpler route.) For $p=3$ the formula gives

$$1 - \left(\frac{N-n+1}{N(N-1)}\right)^y(3(N-1)^y - 3(N-n)^y + (N-2n+1)^y)$$

where $N = cn-1$. I've fought through the formula for $p$ up to 6, but it becomes impossible quickly.

Now some approximations. Define $U_m$ to be the probability that a given student $S$ is at some point in a classroom with each student in a given set of $m$ students not containing $S$. Another straightforward application of inclusion/exclusion gives

$$U_m = \sum_{j=0}^m(-1)^j{m\choose j}{N-j\choose n-1}^y \bigg/ {N\choose n-1}^y$$

where $N = cn-1$ as before. $U_N$ is then the probability that a single given student will see all the others. This is a very weak upper bound on the probability that all the students see all the others, though in trivial cases one student seeing all the others does imply that they all do! $U_m$ is rather easy to calculate; with the original parameters, for example, $U_{74}$ turns out to be a bit less than 1/7792. Note, BTW, that $U_1$ correctly gives the same result as the $p=2$ answer above.

If "student $A$ sees all the others" were independent of "student $B$ sees all the others", then we'd have the answer we seek: $(U_N)^{cn}$. But of course these events are unlikely to be independent. If many students have seen all the others, then surely the probability increases that all have seen the others. So $(U_N)^{cn}$ may be a lower bound on the true answer, if a weak one. Edit: Actually, I don't think it can be a lower bound, since I'm pretty sure one could find a case where one student can see all the others, but it's impossible for all students to do so.

Here's another way of estimating the answer: A group of $p$ students all see each other if and only if both (a) a particular one of these students sees all the other $p-1$ students, and (b) the other $p-1$ students all see each other. The probability of (a) is exactly $U_{p-1}$. Let $C_{p-1}$ denote the probability of (b). If events (a) and (b) were independent, then the exact answer to the original question would be $U_{p-1}C_{p-1}$. But by identical reasoning, $C_{p-1} = U_{p-2}C_{p-2}$. Continuing in this way, the exact answer to the original question would be

$$U_NU_{N-1}U_{N-2}\ldots U_2U_1$$

As above, the events in question are not likely to be independent. But most of the time they're close, right? With the original parameters, the fact that a group of five students all see each other has very little impact on whether a sixth student sees each of the five, though it should increase that probability slightly. This would make the expression above a lower bound on the actual answer, though I don't know how good it is.

Solution 2:

The only help I can give is to suggest a lower bound. You have cn students and the study together graph could then have cn(cn-1)/2 edges. Each session you pick $\frac{cn(n-1)}{2}$ edges to color in and you ask whether after y sessions all the edges are colored. If you ignore the class grouping, you can do the same problem randomly choosing $\frac{ycn(n-1)}{2}$ edges independently with replacement to color. I think your case will color the whole graph with higher probability, as no edge can claim more than $y$ of the colorings, but it should be close. Now this is a nice Poisson distribution. Each edge is colored with probability $1-\exp(-\lambda)$, where $\lambda$ is the average number of colorings each edge receives, here $\frac{ycn(n-1)}{cn(cn-1)}$, or just about $\frac{y}{c}$. The chance that all edges are colored is then $(1-\exp(-\lambda))^{\frac{cn(cn-1)}{2}}$, which as you say is very small.