Choosing office hours to maximise number of students who can attend at least one time slot
I have the following (real world!) problem which is most easily described using an example.
I ask my six students when the best time to hold office hours would be. I give them four options, and say that I'll hold two hours in total. The poll's results are as follows (1 means yes, 0 means no):
$$\begin{array}{l|c|c|c|c} \text{Name} & 9 \text{ am} & 10 \text{ am} & 11 \text{ am} & 12 \text{ pm} \\ \hline \text{Alice} & 1 & 1 & 0 & 0 \\ \text{Bob} & 1 & 1 & 0 & 1 \\ \text{Charlotte} & 1 & 1 & 0 & 1 \\ \text{Daniel} & 0 & 1 & 1 & 1 \\ \text{Eve} & 0 & 0 & 1 & 1 \\ \text{Frank} & 0 & 0 & 1 & 0 \\ \hline \text{Total} & 3 & 4 & 3 & 4 \end{array}$$
Naively, one might pick columns 2 and 4, which have the greatest totals. However, the solution which allows the most distinct people to attend is to pick columns 2 and 3.
In practice, however, I have ~30 possible timeslots and over 100 students, and want to pick, say, five different timeslots for office hours. How do I pick the timeslots which maximise the number of distinct students who can attend?
Solution 1:
In general, this problem is the maximum coverage problem, which is NP-hard, so you're not going to be able to find the optimal solution by any method substantially faster than brute force. (As I mentioned, for a problem of your size, brute force is still feasible.)
However, a modification of your strategy (to choose the timeslots with the highest totals) performs reasonably well.
It doesn't actually make sense to choose all the timeslots with the highest totals immediately: if they all have the same students in them, then this isn't any better than just choosing one of the timeslots. Instead, we can:
- Choose one timeslot with the highest total number of students.
- Remove all students that can make it to that timeslot from consideration. Recalculate the totals for the remaining students.
- Repeat steps 1-2 until you have chosen $k=5$ timeslots.
According to the wikipedia article, this algorithm achieves an approximation ratio of $1 - \frac1e$; that is, if you can reach $N$ students with the optimal choice of timeslots, this algorithm will reach at least $(1 - \frac1e)N$ students.
There are other possible approximate solutions out there; for example, there is the big step heuristic, which considers all choices of $p$ timeslots in each step, as opposed to all individual timeslots. (When $p=1$, it is the algorithm above; when $p=k$, it is just brute force.) No fast algorithm has guaranteed performance better than $(1 - \frac1e)N$, but they may perform better in some cases.
Solution 2:
With those numbers, brute force is still feasible. You have 30 choose 5 = 142506 different combinations. I don't think there's any algorithm other than brute force that is guaranteed to get the best solution, but one heuristic would be to arrange the students in sets from least availability to most. Then rank time slots by how many of the least available students they match, then break ties by how many of second lest available, etc. Take the best time slot according to that criteria. Remove all students that can make that time slot, and apply the same algorithm to the remaining time slots and students, and continue until you've reached your maximum number of time slots. In your example, Frank is the least available student, so 11 am, as the only time slot he can make, is the first choice. This leaves Alice, Bob, and Charlotte without a time slot, and of them, Alice is the least available. 9 am and 10 am both accommodate Alice, so you can take either one (although if you take into account students that were removed in previous steps, then 10 am wins).
You can combine the two: use the heuristic until you get to small enough numbers that you're okay with using brute force.
Another heuristic is to look at clustering (is there one set of students that is mostly available in the morning, and another in the evening), and pick time slots in different clusters.
Solution 3:
Practically speaking, simply vary the office hours each week.