Is There a Continuous Analogue of the Hypergeometric Distribution?

Solution 1:

Short version: yes.

Long version: it's complicated. There is an interesting link between probability distributions and orthogonal polynomials. For instance, consider the (probabilists') Hermite polynomials $H_0, H_1, \ldots$. These are orthogonal with respect to the weight function $e^{-x^2/2}$ on $(-\infty,\infty)$; in other words, $\int_{-\infty}^{\infty}H_i(x)H_j(x)e^{-x^2/2}\,dx = \sqrt{2\pi}\,i!\,\delta_{ij}$.
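As a sanity check, the orthogonality relation above can be verified numerically. NumPy's `HermiteE` class implements exactly these probabilists' Hermite polynomials (the ones orthogonal under $e^{-x^2/2}$):

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import HermiteE
from scipy.integrate import quad

def he(n):
    # probabilists' Hermite polynomial He_n as a callable
    return HermiteE.basis(n)

# Check <H_i, H_j> = sqrt(2*pi) * i! * delta_ij for small i, j
for i in range(4):
    for j in range(4):
        val, _ = quad(lambda x: he(i)(x) * he(j)(x) * np.exp(-x**2 / 2),
                      -np.inf, np.inf)
        expected = np.sqrt(2 * np.pi) * math.factorial(i) if i == j else 0.0
        assert abs(val - expected) < 1e-7
```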

The weight function and the factor of $\sqrt{2\pi}$ should ring a bell: they are exactly what appears in the density of the standard normal distribution, $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$.

Norbert Wiener recognized that the two are intimately linked and developed what is now called Gauss-Hermite polynomial chaos. In short, we can write a random variable $k$ of (almost) any distribution as an infinite series in a standard normal (Gaussian) random variable $\zeta$, using the Hermite polynomials as basis functions: $$ k = \sum_{i=0}^{\infty}k_i H_i(\zeta).$$
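To make this concrete, here is a minimal sketch (my example, not from the answer) for $k = e^{\zeta}$, a lognormal variable. By orthogonality the coefficients are $k_i = \mathbb{E}[k\,H_i(\zeta)]/i!$, which we can compute with Gauss-Hermite quadrature; for this particular $k$ the closed form $k_i = e^{1/2}/i!$ is known from the Hermite generating function:

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, HermiteE

# Gauss-Hermite rule for the weight e^{-x^2/2}; the weights sum to sqrt(2*pi),
# so E[f(zeta)] ~ sum(w * f(x)) / sqrt(2*pi) for zeta ~ N(0,1).
nodes, weights = hermegauss(40)
norm = np.sqrt(2 * np.pi)

# k_i = E[e^zeta * He_i(zeta)] / i!
coeffs = []
for i in range(6):
    He_i = HermiteE.basis(i)
    num = np.sum(weights * np.exp(nodes) * He_i(nodes)) / norm
    coeffs.append(num / math.factorial(i))

# Closed form for this example: k_i = e^{1/2} / i!
for i, k in enumerate(coeffs):
    assert abs(k - np.exp(0.5) / math.factorial(i)) < 1e-8
```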

It turns out you can generalize this to other continuous distributions, and even to discrete ones.

There is something called the Askey scheme, which is essentially a family tree relating the hypergeometric orthogonal polynomials. Many of the weight functions for these polynomials are (up to normalization) probability density or mass functions, meaning we can perform a Wiener-Askey polynomial chaos expansion of a random variable about another random variable of almost any distribution of our choosing. See also: http://www.dam.brown.edu/scicomp/media/report_files/BrownSC-2003-07.pdf

The hypergeometric family linked to the discrete hypergeometric distribution is the Hahn family of polynomials. The continuous analogue is uncreatively called the "Continuous Hahn" family. This leads to the following answer:

The weight function of the Continuous Hahn polynomials gives you a continuous analogue of the discrete hypergeometric distribution; in fact, it is most likely identical to it up to a scaling parameter. That weight is a product of gamma functions with complex arguments, $\Gamma(a+ix)\Gamma(b+ix)\Gamma(c-ix)\Gamma(d-ix)$, so it is not especially intuitive.
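A small numerical sketch of that weight (assuming the standard parameterization with $c=\bar a$, $d=\bar b$, so for real $a,b>0$ the weight is $|\Gamma(a+ix)\Gamma(b+ix)|^2$): its total mass is given in closed form by Barnes' first lemma, so dividing by that mass yields a probability density on the whole real line:

```python
import numpy as np
from scipy.special import gamma
from scipy.integrate import quad

# Continuous Hahn weight for real parameters a, b > 0 (assumed convention)
a, b = 1.5, 2.0

def w(x):
    return abs(gamma(a + 1j * x) * gamma(b + 1j * x)) ** 2

# Total mass numerically...
mass, _ = quad(w, -np.inf, np.inf)

# ...versus Barnes' first lemma:
# integral = 2*pi * Gamma(2a) * Gamma(a+b)^2 * Gamma(2b) / Gamma(2a+2b)
barnes = 2 * np.pi * gamma(2 * a) * gamma(a + b) ** 2 * gamma(2 * b) / gamma(2 * a + 2 * b)
assert abs(mass - barnes) < 1e-6
```

Normalizing `w` by `barnes` gives the candidate continuous-analogue density discussed above.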

The polynomial chaos expansion is quite interesting and powerful. It is analogous to a Taylor expansion of a random variable. Importantly, it allows you to write a random variable with a possibly unknown or complicated distribution in terms of any distribution of your choosing; as with any series expansion, the challenge is then to compute the deterministic coefficients. This becomes quite potent when you have, say, a dynamical system that depends on a random parameter: instead of using complicated stochastic models or time-consuming Monte Carlo analysis, you can frame the problem as computing a handful of deterministic coefficients, and then generate statistical moments and/or Monte Carlo samples from them cheaply.

Interestingly, the first coefficient, $k_0$, is always the mean of the distribution.
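This, and the corresponding variance formula, follows directly from orthogonality: $\mathbb{E}[k]=k_0$ and $\operatorname{Var}[k]=\sum_{i\ge 1}k_i^2\, i!$ (since $\mathbb{E}[H_i(\zeta)^2]=i!$). A quick check using the known Hermite coefficients $k_i = e^{1/2}/i!$ of the lognormal variable $k=e^{\zeta}$ (my example, not from the answer):

```python
import math
import numpy as np

# Hermite chaos coefficients of k = exp(zeta), zeta ~ N(0,1): k_i = e^{1/2} / i!
coeffs = [np.exp(0.5) / math.factorial(i) for i in range(30)]

# Moments straight from the coefficients
mean = coeffs[0]
var = sum(k**2 * math.factorial(i) for i, k in enumerate(coeffs) if i >= 1)

# The lognormal exp(zeta) has mean e^{1/2} and variance e(e - 1)
assert abs(mean - np.exp(0.5)) < 1e-12
assert abs(var - np.e * (np.e - 1)) < 1e-8
```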

Solution 2:

There are also the Pearson distributions. In Pearson's original article, the change in the probabilities of the hypergeometric distribution (at $k+1$ with respect to $k$), divided by the current probability, is a rational function of the index $k$.

Pearson noted that if $k+1$ and $k$ are brought close together, then in the limit this ratio becomes the derivative of the pdf divided by the pdf, and it equals a rational function (a first-degree polynomial divided by a second-degree polynomial). He solved the resulting differential equation and classified the solutions according to the parameters, producing a chart of distribution types. I would say that his distributions are simply the scaled variable times a normalized Appell $F_1$ function, and that a generalization of this would be a normalized Lauricella $F^{(n)}_D$ function times the scaled variable.
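Pearson's limiting equation can be checked on a concrete family member. A minimal sketch, using the standard fact that the beta distribution belongs to the Pearson system: for $\mathrm{Beta}(\alpha,\beta)$ on $(0,1)$, $\frac{d}{dx}\log p(x) = \frac{(\alpha-1)-(\alpha+\beta-2)x}{x(1-x)}$, a first-degree polynomial over a second-degree one, exactly as Pearson's equation requires.

```python
import numpy as np
from scipy.stats import beta

# Beta(alpha, beta) as a Pearson-family example
al, be = 2.5, 4.0
x = np.linspace(0.05, 0.95, 19)

# Numerical d/dx log p(x) via central differences on the exact log-pdf
h = 1e-6
dlogp = (beta.logpdf(x + h, al, be) - beta.logpdf(x - h, al, be)) / (2 * h)

# Pearson's rational function: linear numerator over quadratic denominator
rational = ((al - 1) - (al + be - 2) * x) / (x * (1 - x))

assert np.allclose(dlogp, rational, atol=1e-4)
```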

I very much like this simple idea of Pearson's. The reason he developed this system of distributions is that real-life probabilities do not have an infinite range and are not symmetric like the normal. The normal would not fit the biological data of the day, and in this world nothing is without limits and asymmetry, like the height and arm length of people and crabs. The original article (a series, in fact) is the more interesting read. We owe this generalization to anthropology.

Solution 3:

Pearson published many articles. Try the last one: http://rsta.royalsocietypublishing.org/content/216/538-548/429 But there are good reasons to check the previous ones too.

The Appell and Lauricella generalizations could not have been known to Pearson by those names, but they are mentioned! In the absence of experimental data, however, they are left off (he says they are left in abeyance).

You can work it out once you notice that, for real roots of the quadratic in his equation, the integral for the cumulative distribution function can be put in the form of an integral representation of these functions. Put the two side by side to compare.