Are card drops equally likely?

The short answer is: No, they are not.

The long answer follows.

First off, why pay attention to the Steam Market numbers? It's the closest approximation to the real thing we have, and while it is skewed in a way, we must make do with it.

We can consider three kinds of users:

  • Opportunists. They want the badge, so they'll sell dupes and use the revenue to buy the cards they need.
  • Scrooges. They don't want the badge, but they like the idea of making money from this situation, so they'll sell all the cards they get for real money. (I am in this group.)
  • Traders. They will never sell a card on the Steam Market or buy one from it. They don't use the market at all, either because they don't care, or because they don't need it.

Since no one card is more special than the others, there is no rational reason to cling to some cards and discard others. Only the full set of cards is useful. Any set of nine cards is equally pointless to hold onto.

We don't need to care too much about estimating the proportions between these user groups, the number of cards each gets and the number of cards each sells, because we can only assume that card distribution does not change based on what kind of user you are. The only difference is that, if cards drop uniformly, the dupes users sell and buy on the market will also be uniformly distributed; otherwise, they won't. Since the hypothesis is "cards drop uniformly", we don't need, or want, to try and adjust for a skew that shouldn't be there in the first place.

Now, there's a fairly easy way to tell if a distribution of cards like this is uniform or not. It's called Pearson's chi-squared test. It's based on this formula:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

You take the difference between the expected number of cards E and the observed number of cards O for each kind, square it, divide it by E and sum the results. If the result is "small enough", the test is passed. How small is small enough?
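As a minimal sketch of that recipe, here is the sum written out in plain Python. The function name and the listing counts below are made up purely for illustration; they are not the numbers from the Steam Market. Under the "cards drop evenly" hypothesis, the expected count E for every card is simply the total divided by the number of card kinds.

```python
def chi_squared_statistic(observed):
    """Pearson's chi-squared statistic against a uniform expectation.

    `observed` is a list of counts, one per kind of card.
    """
    total = sum(observed)
    # Under the uniform hypothesis every kind is expected equally often.
    expected = total / len(observed)
    # Square each difference, divide by the expected count, sum it all up.
    return sum((o - expected) ** 2 / expected for o in observed)


# Hypothetical listing counts for a 10-card set (illustrative only).
made_up_counts = [410, 395, 402, 388, 420, 399, 405, 391, 397, 393]
print(chi_squared_statistic(made_up_counts))
```

A perfectly even split gives a statistic of exactly 0; the more lopsided the counts, the larger the result.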

My reference material on Statistics says that it needs to be smaller than the (1-α)th percentile of the χ² distribution with 10 − 1 = 9 degrees of freedom. Realistically, we can find the value of α where the two quantities are the same (the so-called p-value); if α is close to 0, the expected distribution is "wrong"; if α is close to 1, observation matches expectation.
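To make "find the value of α" concrete, here is a rough pure-Python sketch of the tail probability of the χ² distribution. The helper name `chi2_tail` is my own; it uses the standard recurrence for the regularized upper incomplete gamma function, starting from the closed forms for 1 and 2 degrees of freedom. In practice a statistics library would be the safer choice (SciPy's `scipy.stats.chi2.sf` computes the same quantity); this is only meant to show there is no magic involved.

```python
import math


def chi2_tail(x, dof):
    """Probability that a chi-squared variable with `dof` degrees of
    freedom exceeds `x` (i.e. the p-value for a statistic of `x`)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        tail, k = math.exp(-half), 2              # closed form for 2 dof
    else:
        tail, k = math.erfc(math.sqrt(half)), 1   # closed form for 1 dof
    # Step up two degrees of freedom at a time:
    # Q(k + 2) = Q(k) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
    while k < dof:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail


# 16.919 is the textbook 5% critical value for 9 degrees of freedom.
print(chi2_tail(16.919, 9))
```

For extremely large statistics the result underflows to 0.0 in floating point, which is still "close to 0" for the purposes of the test.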

Well, let's run the numbers. While you could do this on any game, I chose to run the numbers on the Steam Trading Cards set for one main reason: there are a LOT of cards out there, which lets us be very precise. If I took the TF2 cards, for example, I'd only have a quarter of the data to work with (100k instead of 400k). Additionally, it lets me dismiss counterarguments that can be made about holding onto specific cards; more about that later.

For those who can't see images, the value of α is about 10^-66 (that's 0.000000000000000000000000000000000000000000000000000000000000000001). Ignoring the "outliers" in the group, Football Manager and Skyrim, doesn't get us appreciably closer to 1.

(Scroll down the image, I'm not done here.)

[Image: chi-squared test results for the Steam Trading Cards set]

What does that 10^-66 number mean anyway? Wikipedia says (annotations in italic are mine):

One often "rejects the null hypothesis" (= cards drop evenly) when the p-value is less than the predetermined significance level which is often 0.05 or 0.01, indicating that the observed result would be highly unlikely under the null hypothesis (i.e., the observation is highly unlikely to be the result of random chance). Many common statistical tests, such as chi-squared tests (what I used) or Student's t-test, produce test statistics which can be interpreted using p-values.

In other words, if cards really did drop evenly, there would be a 0.00000000000000000000000000000000000000000000000000000000000000001% chance of seeing numbers at least this lopsided through really bad luck alone.

"But badp," I hear you cry. "Some of those games are more popular than others. It's obvious that the cards of the more popular games are more sought after than the rest! Who the hell likes soccer anyway?" It turns out roughly as many people like dragons as like soccer; just look at the publicly available numbers for how many people are playing what games. Skyrim and Football Manager are close to the top, and yet there are 26% more Football Manager cards than Skyrim cards. Actually running the numbers, again, gives us little doubt on the matter: (see the numbers without outliers)

[Image: correlation between game popularity and cards on the market, r = .229]

What does .229 mean? Let's see what this page has to say:

While correlation coefficients are normally reported as r = (a value between -1 and +1), squaring them makes then easier to understand. The square of the coefficient (or r square) is equal to the percent of the variation in one variable that is related to the variation in the other. After squaring r, ignore the decimal point. An r of .5 means 25% of the variation is related (.5 squared =.25). An r value of .7 means 49% of the variance is related (.7 squared = .49).

Here r² = 0.052. In other words, only about 5% of the variation in cards available on the market is related to the variation in popularity. That's very weak correlation, and that shouldn't be surprising: each card, on its own, is pretty much worthless. Only the full set is useful. You know what's a better way to show your allegiance to one game over another? Crafting a game badge for it, then showing it off on your profile. Or being level 10 and featuring it for free on your profile.
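For anyone who wants to redo the squaring step, here is a small sketch of how a correlation coefficient and its square can be computed by hand. The function names are mine and the popularity/card figures in the example are invented; only the .229 comes from the numbers above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of the deviations from each mean.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def shared_variation(r):
    """Fraction of the variation in one variable related to the other."""
    return r ** 2


# Invented popularity vs. cards-on-market figures (illustrative only).
players = [120, 95, 60, 40, 30]
cards = [41, 39, 42, 40, 38]
print(pearson_r(players, cards))

# The r reported above: .229 squared is roughly 0.052, i.e. about 5%.
print(round(shared_variation(0.229), 3))
```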

Now, there'd be no such equivalent for this argument if I ran the same numbers on TF2. You could argue that somebody would hold onto Engineer cards because they love playing Engineer. It still sounds dumb, but the above counterpoint doesn't apply. OTOH, the numbers are still pretty clear there as well:

[Image: chi-squared test results for the TF2 cards]

Okay, how about a game with a ton of different cards? McPixel comes to the rescue. Prefer a game with fewer ones? Torchlight only has 6 and the results are kind of glaring.

It would be easy to speculate on why this is the case. It could be a bug the likes of which we've heard about with Microsoft. It could be intentional in order to create a more interesting economy where not everything is worth precisely as much as everything else, and let enterprising users try and invest money in the system (remember, Steam gets a cut of all transactions; more transactions, more cuts). It doesn't really matter though. I hope it's now clear that it's a fact: card drops are not equally likely across the board.