What is the difference between a two-sample t-test and a paired t-test?

To be clear, you could use the two-sample $t$-test instead of the paired test, but it will not have as much statistical power to reject the null hypothesis if the null is false.

The experiment is designed so that on each visit, one person uses the drive-thru and the other uses the counter. The data collected are therefore naturally paired: every visit yields one drive-thru waiting time and one counter waiting time.

Why is this important for statistical inference? Well, consider an alternative experimental design as follows: one person always orders at the drive-thru, and the other always orders at the counter. They order the same menu items each time, but they don't always order at the same time because the timing of the visits is not coordinated. Do you see how this could be problematic for making an inference about whether the mean waiting times are equal for the two types of service? Waiting time plausibly depends on how busy the restaurant is at the moment of the visit, which in turn depends on the time of day and day of the week, so uncoordinated visits let those factors leak into the comparison. In addition, if the experiment always assigns one person to the drive-thru and the other to the counter, then you won't be able to tell whether any difference in waiting time is attributable to the method of ordering or to the person who is ordering: the two things are confounded. Maybe the person going up to the counter is more attractive and therefore gets preferential treatment from the restaurant staff.

By pairing the visits and randomizing who uses the drive-thru versus the counter, you control for these extraneous factors that might influence service time.

When the analysis is conducted, it is important to take into account the structure of the data collected. Because the two service times in a pair come from the same visit, they will tend to be positively correlated: when the restaurant is busy, both drive-thru and counter service are likely to take longer than when it is not busy. Therefore, by conducting the analysis on the difference in wait times calculated for each visit, you remove the effect of restaurant occupancy on wait times. The estimated variance of the mean difference will be smaller than the variance used by the two-sample (unpaired) test, and this leads to greater precision of inference and more power.
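To make the variance argument concrete, here is a short derivation. The notation is my own assumption, not something from the question: let $X_i$ and $Y_i$ be the drive-thru and counter wait times on visit $i$, with standard deviations $\sigma_X$ and $\sigma_Y$ and within-visit correlation $\rho$, and let $D_i = X_i - Y_i$ over $n$ visits.

$$
\operatorname{Var}(D_i) = \sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X\sigma_Y,
\qquad
\operatorname{Var}(\bar{D}) = \frac{\sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X\sigma_Y}{n}.
$$

An unpaired analysis treats the two samples as independent and so works with $(\sigma_X^2 + \sigma_Y^2)/n$, which is larger whenever $\rho > 0$. The numerator of the test statistic, $\bar{X} - \bar{Y} = \bar{D}$, is the same either way, so the smaller standard error is where the extra power comes from.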


The point of randomizing who ordered at the counter vs. at the drive-thru is so that there's no bias based on who did the ordering for each. If one of them always ordered through the drive-thru, and the other always at the counter, then you couldn't make an inference based solely on whether the order was drive-thru or counter - the person making the order would also implicitly be part of the inference.

By randomizing that aspect, the person doing the ordering is no longer systematically tied to the ordering method. Any effect of the person becomes part of the residual and is uncorrelated with the comparison of interest (the difference in time to complete the order), so it adds noise rather than bias.

As for the difference between a two-sample and paired t-test, there are a lot of resources out there that can give you more detail (this article from NIH might help).

Ask yourself whether you can think about differences in specific pairs that are in the two groups you're comparing. In this case, you can by pairing at the visit level. For each visit to the fast food restaurant, there is one observation at the counter, and one from the drive-thru. By using a matched pair design, you can eliminate the bias that comes from characteristics of the visit itself (e.g., from what time or day of the week they visited).

For each visit, you can compute the time to order through the drive-thru minus the time to order at the counter. That gives you a sample of 555 paired differences, and you can test whether their mean is significantly different from 0.
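As a rough illustration of testing the per-visit differences, here is a minimal simulation sketch using only the Python standard library. Everything except the 555 visits is an assumption made up for the example: the 5-minute baseline, the 0.5-minute true gap, the size of the shared "busyness" effect, and the helper names `paired_t`, `welch_t`, and `simulate`. It computes the $t$ statistic both ways on the same simulated data so you can compare them.

```python
import math
import random
import statistics


def paired_t(x, y):
    """Paired t statistic: mean per-visit difference divided by its standard error."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def welch_t(x, y):
    """Unpaired (Welch) two-sample t statistic, which ignores the pairing."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def simulate(n_visits=555, true_gap=0.5, seed=0):
    """Simulate wait times (minutes); all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    drive, counter = [], []
    for _ in range(n_visits):
        busyness = rng.gauss(0, 3)  # shared by both orders placed on the same visit
        drive.append(5 + true_gap + busyness + rng.gauss(0, 1))
        counter.append(5 + busyness + rng.gauss(0, 1))
    return drive, counter


drive, counter = simulate()
t_paired = paired_t(drive, counter)
t_unpaired = welch_t(drive, counter)
print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
```

Both statistics share the same numerator (the difference in sample means), so any gap between them comes entirely from the standard error. With a shared busyness effect this large, the paired statistic should come out well ahead. In practice you would hand the data to a statistics package to get a p-value rather than computing the statistic by hand.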

A two-sample t-test, by contrast, is meant for situations where the observations in the two groups you are comparing cannot be paired up in this way.