Why does the set of possible probability distributions have the cardinality of the continuum?
Wikipedia's article on parametric statistical models (https://en.wikipedia.org/wiki/Parametric_model) mentions that you could parameterize all probability distributions with a one-dimensional real parameter, since the set of all probability measures and $\mathbb{R}$ have the same cardinality.
This fact is mentioned in the cited text (Bickel et al., Efficient and Adaptive Estimation for Semiparametric Models), but it is neither proved nor elaborated on.
This is pretty neat to me. (If I'd been forced to guess, I would have guessed the set of possible probability distributions to be bigger, since pdfs are functions $\mathbb{R}\rightarrow\mathbb{R}$, and we're counting probability distributions that don't have a density, too. It's got to be countable additivity constraining the number of possible distributions, but how?)
Where could I go to find a proof of this, or is it straightforward enough to outline in an answer here? Does its proof depend on AC or the continuum hypothesis? We need some kind of condition on the cardinality of the sample space that neither Wikipedia nor Bickel mentions, right? (If the sample space is too big, then the number of degenerate probability distributions is too big.)
A probability measure on $\mathbb{R}$, whether or not it has a density, is determined by its CDF $x \mapsto\mathbb{P}(X \leq x)$. A CDF is right-continuous, and the set of right-continuous functions $\mathbb{R}\rightarrow\mathbb{R}$ has the cardinality of $\mathbb{R}$. To see this, you can for instance argue that such a function is determined by its values at the rational points, so the set has at most the cardinality of a countable product of copies of $\mathbb{R}$, which has the cardinality of $\mathbb{R}$ as well.
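Spelled out as a sketch, the counting step looks as follows. The symbols $\mathcal{P}(\mathbb{R})$ (the set of probability measures on $\mathbb{R}$ with its Borel sets), $F_\mu$ (the CDF of $\mu$) and $\delta_a$ (the point mass at $a$) are notation introduced here, not taken from the question. Right-continuity gives, for every real $x$,

$$F_\mu(x) = \lim_{\substack{q \downarrow x \\ q \in \mathbb{Q}}} F_\mu(q),$$

so $\mu \mapsto (F_\mu(q))_{q \in \mathbb{Q}}$ is an injection of $\mathcal{P}(\mathbb{R})$ into $\mathbb{R}^{\mathbb{Q}}$, and

$$|\mathcal{P}(\mathbb{R})| \le |\mathbb{R}^{\mathbb{Q}}| = (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \cdot \aleph_0} = 2^{\aleph_0} = |\mathbb{R}|.$$

In the other direction, the point masses $\delta_a$ for $a \in \mathbb{R}$ are pairwise distinct, so $|\mathbb{R}| \le |\mathcal{P}(\mathbb{R})|$, and the Cantor-Schröder-Bernstein theorem then gives equality.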
To expand on the AC/CH question, Raoul's argument does not depend on either of these, since you can give an explicit injection from real-valued sequences $x_1,x_2,\ldots$ to $\mathbb R$ (and there is an explicit bijection between $\mathbb Q$ and $\mathbb N$, hence between $\mathbb R^{\mathbb Q}$ and $\mathbb R^{\mathbb N}$). To do this, first map each value into $(0,1)$ by an explicit injection such as $x \mapsto 1/(1+e^{-x})$, so that only the digits after the decimal point matter, and write each value as a decimal (converting $0.1999...$ to $0.2$, etc.). Then form a new infinite decimal as follows: digits in odd places are the digits of $x_1$, in order; those in places $\equiv 2$ mod $4$ are the digits of $x_2$; those in places $\equiv 4$ mod $8$ are the digits of $x_3$; and so on. Since infinitely many digits of $x_1$ are not $9$, the same is true of the decimal we obtain by this process, so it is a valid expansion, and you can recover the digits of each $x_i$ from the final decimal, so this is an injection.
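To make the digit bookkeeping concrete, here is a minimal Python sketch of the interleaving on finite digit prefixes. The function names (`place_to_source`, `interleave`, `recover`) and the choice to pad missing digits with `0` are assumptions made for illustration; the actual argument works with infinite expansions, which no finite program can represent. The key observation is that decimal place $p = 2^{i-1}(2j-1)$ holds digit $j$ of $x_i$.

```python
def place_to_source(p):
    """Map a 1-indexed decimal place p to (i, j), meaning digit j of x_i.

    Writing p = 2**(i-1) * (2*j - 1) is unique, so every place
    belongs to exactly one sequence element.
    """
    i = 1
    while p % 2 == 0:
        p //= 2
        i += 1
    return i, (p + 1) // 2


def interleave(digits, n_places):
    """Build the first n_places digits of the combined decimal.

    digits[i-1] is a string holding a finite prefix of the digits of x_i.
    Digits that are not supplied are padded with '0' (an assumption of
    this sketch, equivalent to treating the inputs as finite decimals).
    """
    out = []
    for p in range(1, n_places + 1):
        i, j = place_to_source(p)
        if i <= len(digits) and j <= len(digits[i - 1]):
            out.append(digits[i - 1][j - 1])
        else:
            out.append("0")
    return "".join(out)


def recover(combined, i, n_digits):
    """Read back up to n_digits digits of x_i from the combined digits."""
    out = []
    for j in range(1, n_digits + 1):
        p = 2 ** (i - 1) * (2 * j - 1)  # place holding digit j of x_i
        if p > len(combined):
            break
        out.append(combined[p - 1])
    return "".join(out)


# Digit prefixes of three hypothetical values in (0, 1).
xs = ["1415", "7182", "4142"]
combined = interleave(xs, 16)
print("0." + combined)
print([recover(combined, i, 4) for i in (1, 2, 3)])
```

Because the digits of $x_i$ sit at places spaced $2^i$ apart, a prefix of 16 places contains four digits each of $x_1$ and $x_2$ but only two digits of $x_3$; longer prefixes recover more.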