Let $\mathbb T=\{1,\dotsc,10\}$ represent the ten pins in a standard game of bowling.

Given two sets of pins $T\subseteq S\subseteq \mathbb T$, let's write $p_{S\to T}$ for the conditional probability that, given that the pins currently up are $S$, the pins up after a single bowl by a fixed player are $T$. For example, $p_{\mathbb T\to\varnothing}$ is the probability that the player bowls a strike, $p_{\mathbb T\to\mathbb T}$ is the probability of a gutter ball, and $p_{\{7,10\}\to\varnothing}$ is the probability of picking up a spare after the most infamous split.

Let us say a pinfall model is a tuple of all these probabilities $p_{S\to T}$. Such a model has a lot of parameters: one can count that the number of different $p_{S\to T}$s is $$\sum_{S\subseteq \mathbb T} \sum_{T \subseteq S} 1 = \sum_{S\subseteq \mathbb T} 2^{\lvert S\rvert} = \sum_{i=0}^{10} \binom{10}{i} 2^i = (2+1)^{10} = 59\,049.$$ (There are other more direct ways of counting these parameters. Also, because these probabilities come from $2^{10}$ separate probability distributions $p_{S\to\diamond}$, the number of degrees of freedom is actually $3^{10}-2^{10} = 58\,025.$)
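As a sanity check on the count, here is a small brute-force sketch that enumerates all pairs $T\subseteq S\subseteq\mathbb T$, with sets of pins encoded as bitmasks. The function name `count_parameters` and the bitmask encoding are my own choices for illustration, not part of the problem.

```python
def count_parameters(n_pins: int = 10) -> int:
    """Count pairs (S, T) with T a subset of S, sets encoded as bitmasks."""
    full = (1 << n_pins) - 1
    count = 0
    for S in range(full + 1):
        T = S
        while True:              # standard submask walk: S, ..., 0
            count += 1
            if T == 0:
                break
            T = (T - 1) & S
    return count


n_params = count_parameters(10)
print(n_params)                  # 59049 == 3**10
print(n_params - 2**10)          # 58025 degrees of freedom
```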

Using a pinfall model, one can simulate a full (single-player) game of bowling, in the usual way one would expect for a Markov model. There is some amount of detail elided here, because the rules of bowling are tricky (especially the final frame) and a single game might involve anywhere from 11 to 21 throws. Note that the fundamental assumption of this setup is that every single throw is independent, and that the player never tires nor changes their strategy.
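To make the elided scoring rules concrete, here is a minimal sketch of traditional scoring for a complete game, given the number of pins knocked down on each throw. The function name `bowling_score` and the input format are assumptions for illustration; the function does not validate that the throw list is a legal game.

```python
def bowling_score(throws):
    """Traditional score of a complete game.

    `throws` lists the pins knocked down on each throw, in order,
    including any bonus throws in the final frame.
    """
    score, i = 0, 0
    for _frame in range(10):
        if throws[i] == 10:                        # strike: 10 + next two throws
            score += 10 + throws[i + 1] + throws[i + 2]
            i += 1
        elif throws[i] + throws[i + 1] == 10:      # spare: 10 + next throw
            score += 10 + throws[i + 2]
            i += 2
        else:                                      # open frame
            score += throws[i] + throws[i + 1]
            i += 2
    return score


print(bowling_score([10] * 12))          # 300, perfect game in 12 throws
print(bowling_score([0] * 20))           # 0, twenty gutter balls
print(bowling_score([10] * 9 + [0, 0]))  # 240, a shortest game (11 throws)
print(bowling_score([9, 1] * 10 + [9]))  # 190, a longest game (21 throws)
```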

If we focus only on the final score of the game (using traditional scoring), every pinfall model produces a distribution $q$ on the 301 possible scores $0, \dotsc, 300$. For example, the probability of a perfect game $q_{300}$ is $p_{\mathbb T\to\varnothing}^{12}$, while the probability of a scoreless game $q_{0}$ is $p_{\mathbb T\to\mathbb T}^{20}$. Working out the details, one can see that each $q_s$ is a finite sum, over the sequences of throws with total score $s$, of products of the $p_{S\to T}$, so the map $f\colon \mathbb R^{59049} \to \mathbb R^{301}$ from a pinfall model $(p_{S\to T})$ to a score distribution $(q_s)$ is a polynomial map!

One might wonder how much about the pinfall model we can recover from the distribution of scores. Some things are easy: we can definitely get $p_{\mathbb T\to\varnothing}$ and $p_{\mathbb T\to \mathbb T}$ from the reasoning above involving $q_{300}$ and $q_0$. (One might say that you can "hear" how often a player gets a strike or a gutter ball simply from hearing enough of their final game scores.) However, other things are impossible: the dimension of the codomain of $f$ is only a few hundred, so we have no hope of getting most of the myriads of parameters. In particular, we're not going to be able to get $p_{\{7,10\}\to\varnothing}$.
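Spelled out, the two easy recoveries just invert the monomials from before: $$p_{\mathbb T\to\varnothing} = q_{300}^{1/12}, \qquad p_{\mathbb T\to\mathbb T} = q_0^{1/20}.$$ As a purely illustrative number (not data from any real player), a strike probability of $1/2$ would give $q_{300} = 2^{-12} = 1/4096 \approx 0.000244$, and conversely observing that value of $q_{300}$ would pin down the strike probability as $1/2$.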

How many independent dimensions in total can we recover from the score distribution? Phrased mathematically, what is the dimension of the image under $f$ of the pinfall models†, considered as a semialgebraic set or a submanifold?

In other words, how many independent degrees of freedom are there in a distribution of bowling scores (in this model)? Note that a degree of freedom here might correspond directly to a parameter from the original model, but more likely is some kind of derived quantity, like "the probability of a spare" $\sum_{\varnothing\subsetneq S\subseteq \mathbb T} (p_{\mathbb T\to S} \cdot p_{S\to\varnothing})$ or "the probability of a 9 on the first bowl" $\sum_{S\subseteq \mathbb T, \lvert \mathbb T\smallsetminus S\rvert = 9} (p_{\mathbb T\to S})$.

† Edit to add: As discussed parenthetically above, the space of valid pinfall models is the subset of the full $59\,049$-dimensional space on which each group of parameters $p_{S\to\diamond}$ is a valid probability distribution. I don't care about the image under $f$ of "bad models", which don't correspond to distributions as they should, in part because $f$ does not make sense there.


Here's some partial progress on the problem. The problem already observes that the number of recoverable dimensions is at most a few hundred, because the codomain of $f$ is $\mathbb R^{301}$; since the $q_s$ sum to $1$, the image in fact lies in a $300$-dimensional simplex, so $300$ is an upper bound. However, one can prove a much tighter bound.

Specifically, if $0\le s \le 10$, let $p_{10\to s}$ represent the probability that after a single bowl with all ten pins up, $s$ pins remain up. In terms of the original problem, we have $$ p_{10\to s} = \sum_{S\subseteq \mathbb T, \lvert S\rvert = s} p_{\mathbb T\to S}. $$ This is a probability distribution with ten degrees of freedom.

Furthermore, if $0\le t\le s \le 10$, let $p_{s\to t}$ represent the conditional probability that given the first bowl resulted in $s$ pins up, the second bowl leaves $t$ pins up. In terms of the original problem, we have $$ p_{s\to t} = \left(\sum_{T\subseteq S\subseteq \mathbb T, \lvert S\rvert = s, \lvert T\rvert =t} p_{\mathbb T\to S} p_{S\to T}\right)\bigg/p_{10\to s}.$$ (If $p_{10\to s} = 0$, it won't matter what $p_{s\to t}$ is.) Note that we should actually restrict $0< s < 10$; indeed, the original problem does not allow for a different strategy on the second bowl with all ten pins up. Accordingly, $p_{10\to s}$ means "the same thing" on the first and second bowl. Meanwhile, the situation $s=0$ will never call for a second bowl.

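Similarly to before, for each fixed $s$ with $0<s<10$, $p_{s\to\diamond}$ is a probability distribution on the $s+1$ values $t=0,\dotsc,s$, so it has $s$ degrees of freedom. Together with the ten degrees of freedom of $p_{10\to\diamond}$, all of these distributions contain $10+9+\dotsb+1 =55$ degrees of freedom.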
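The reduction from a full pinfall model to these pin-count distributions can be sketched in code. Everything below is an illustrative assumption: sets of pins are bitmasks, `model[S][T]` stands for $p_{S\to T}$, `random_pinfall_model` just manufactures hypothetical test data, and `first[s]`, `second[s][t]` stand for $p_{10\to s}$ and $p_{s\to t}$.

```python
import random

N_PINS = 10
FULL = (1 << N_PINS) - 1              # bitmask with all ten pins up


def submasks(S):
    """Yield every subset T of the bitmask S."""
    T = S
    while True:
        yield T
        if T == 0:
            return
        T = (T - 1) & S


def random_pinfall_model(rng):
    """Hypothetical test data: model[S][T] = p_{S -> T}, random but normalized."""
    model = {}
    for S in range(FULL + 1):
        subs = list(submasks(S))
        weights = [rng.random() + 1e-9 for _ in subs]   # strictly positive
        total = sum(weights)
        model[S] = {T: w / total for T, w in zip(subs, weights)}
    return model


def reduce_model(model):
    """Collapse a pinfall model to pin counts: returns (first, second)."""
    def pins(mask):
        return bin(mask).count("1")

    first = [0.0] * (N_PINS + 1)                        # first[s] = p_{10 -> s}
    joint = {s: [0.0] * (s + 1) for s in range(1, N_PINS)}
    for S, p_S in model[FULL].items():
        s = pins(S)
        first[s] += p_S
        if 0 < s < N_PINS:
            for T, p_T in model[S].items():
                joint[s][pins(T)] += p_S * p_T
    # Divide by p_{10 -> s}; if that is zero the row never matters, so use uniform.
    second = {}
    for s, row in joint.items():
        if first[s] > 0:
            second[s] = [x / first[s] for x in row]
        else:
            second[s] = [1.0 / (s + 1)] * (s + 1)
    return first, second


first, second = reduce_model(random_pinfall_model(random.Random(0)))
dof = (len(first) - 1) + sum(len(row) - 1 for row in second.values())
print(dof)   # 10 + (9 + 8 + ... + 1) = 55
```

Each row is a distribution on $s+1$ outcomes, which is where the count of $s$ free parameters per row comes from.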

Moreover, these probability distributions are sufficient to compute $f$. In other words, the map $f$ factors through this space. The way to see this is that in terms of scoring, it suffices to "simulate" the game via the number of pins up. The Markov chain from the original problem (which was not explicitly described) can be simulated without actually keeping track of which specific pins stay up. So for example, we would start by simulating a single bowl by drawing from $p_{10\to s}$. If the result is a strike ($s=0$), we record that accordingly and move on to the next frame. If the result is anything else, we simulate a second bowl by drawing from $p_{s\to t}$. By construction, we have taken into account the marginal distribution of the first bowl when considering the effect of the second bowl. And so forth. The final frame is a little tricky, but works fine because we are able to simulate a single bowl, not only a complete frame "$p_{10\to s\to t}$".
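Rather than sampling, one can also compute the score distribution exactly from the pin-count distributions. The following is a sketch under my own conventions, not a definitive implementation: `first[s]` stands for $p_{10\to s}$, `second[s][t]` for $p_{s\to t}$ with $0<s<10$, and the state `(m1, m2, score)` records how many pending strike/spare bonuses apply to the next throw and to the throw after it.

```python
from collections import defaultdict


def frame_outcomes(first, second, final):
    """List (throws, probability) for one frame; throws are pins knocked down."""
    def leave(rack):
        # Distribution of pins left standing after one bowl at `rack` pins.
        return first if rack == 10 else second[rack]

    out = []
    for s, p1 in enumerate(first):
        if s == 0 and not final:
            out.append(((10,), p1))            # a strike ends a regular frame
            continue
        rack2 = 10 if s == 0 else s            # final-frame strike: fresh rack
        for t, p2 in enumerate(leave(rack2)):
            if not final or (s != 0 and t != 0):
                out.append(((10 - s, rack2 - t), p1 * p2))
                continue
            rack3 = 10 if t == 0 else t        # strike or spare earns a third bowl
            for u, p3 in enumerate(leave(rack3)):
                out.append(((10 - s, rack2 - t, rack3 - u), p1 * p2 * p3))
    return out


def apply_frame(state, throws, final):
    """Add one frame's throws to the running score, tracking owed bonuses."""
    m1, m2, score = state
    for pins in throws:
        score += pins * (1 + m1)               # base value plus pending bonuses
        m1, m2 = m2, 0
    if not final:                              # final-frame throws earn no new bonus
        if throws[0] == 10:                    # strike: next two throws count extra
            m1, m2 = m1 + 1, m2 + 1
        elif sum(throws) == 10:                # spare: next throw counts extra
            m1 += 1
    return (m1, m2, score)


def score_distribution(first, second):
    """Return q, where q[score] is the probability of that final score."""
    states = {(0, 0, 0): 1.0}
    for frame in range(1, 11):
        final = frame == 10
        outcomes = frame_outcomes(first, second, final)
        nxt = defaultdict(float)
        for state, pr in states.items():
            for throws, p in outcomes:
                if p:
                    nxt[apply_frame(state, throws, final)] += pr * p
        states = nxt
    q = [0.0] * 301
    for (_, _, score), pr in states.items():
        q[score] += pr
    return q


# Illustrative model only: every pin count equally likely on every bowl.
uniform_first = [1 / 11] * 11
uniform_second = {s: [1 / (s + 1)] * (s + 1) for s in range(1, 10)}
q = score_distribution(uniform_first, uniform_second)
print(abs(sum(q) - 1) < 1e-9)   # True
```

In this sketch `q[300]` and `q[0]` come out as the 12th power of `first[0]` and the 20th power of `first[10]`, matching the formulas for $q_{300}$ and $q_0$ in the problem.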

Unfortunately, this does not prove a lower bound, because we have not considered whether all of these dimensions can actually be recovered from the image of $f$. Accordingly, 55 is a tighter upper bound for the original problem, without a matching lower bound.