Background

You and I are going to play a game. To start off with I play a measurable function $f_1$ and you respond with a real number $y_1$ (possibly infinite). We repeat this some fixed number $N$ of times, to obtain a collection $\{(f_i,y_i)\}_{i=1}^N$. Define

$$\Gamma=\left\{\mu : \mu \text{ is a probability measure and } \int f_i(x)\,d\mu(x)=y_i \text{ for all } i\right\}$$

(We'll assume that $\Gamma$ is not empty, meaning you cannot give inconsistent answers. For example, if I play $f_1(x)=x^2$, then you cannot play $y_1=-1$.)

Next, I play a measure $P$ which must satisfy $P\in \Gamma$. You then play a measure $Q\in \Gamma$. The game is scored such that your reward is equal to $H(Q|P)$ (the cross entropy) while my reward is equal to $-H(Q|P)$. In other words, I am rewarded to the extent that I can guess your distribution, while you are rewarded to the extent that you can foil my guess.
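To make the rules concrete, here is a minimal sketch of one round on the finite set $\{1,\dots,6\}$ (the helper names and the particular distributions are my own illustration, not part of the question):

```python
import numpy as np

def cross_entropy(q, p):
    """H(Q|P) = -sum_x Q(x) log P(x): Player 2's reward."""
    return -np.sum(q * np.log(p))

def in_gamma(mu, fs, ys, tol=1e-9):
    """Is mu a probability vector with sum_x f_i(x) mu(x) = y_i for all i?"""
    if abs(mu.sum() - 1) > tol or (mu < 0).any():
        return False
    return all(abs(f @ mu - y) <= tol for f, y in zip(fs, ys))

# One round on {1,...,6}: Player 1 plays f_1(x) = x, Player 2 answers y_1 = 4.5.
x = np.arange(1, 7, dtype=float)
fs, ys = [x], [4.5]

q = np.array([0, 0, 0, 0.5, 0.5, 0])                 # Player 2's measure: mean 4.5
p = np.array([0.05, 0.05, 0.05, 0.30, 0.30, 0.25])   # Player 1's guess: also mean 4.5
print(in_gamma(q, fs, ys), in_gamma(p, fs, ys))      # both lie in Gamma
print(cross_entropy(q, p))                           # -log 0.3, about 1.204
```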

What should my strategy be for playing this game?

Motivation

This game can be considered as a general model of a scientist designing experiments (the functions $f_i$) and using the results of those experiments to develop a theory (the measure $P$). The other player is a stand-in for "Nature", which we assume acts against us at every turn.

Further, the game is a generalization of the setting considered by Grunwald and Dawid in this paper. They consider what amounts to a special case of this game in which the set $\Gamma$ is assumed to be specified ahead of time (so each participant only specifies their distribution). Very interestingly, they show that in this case the optimal strategy is to play the distribution $\arg\max_{P\in\Gamma} H(P)$, where $H$ denotes Shannon entropy. This amounts to a passive model of scientific inference, in which the scientist has access only to observational data in the form of constraints $\int f_i(x)\,d\mu(x)=y_i$, but has no control over what the actual functions $f_i$ are. Thus I am interested in what happens when their setting is extended to an "active" one, in which the scientist can control which statistics $f_i$ to measure in the first place.


I will consider the case where all distributions are supported on a fixed finite set $\{1,2,\dots, M\}$. Hopefully someone will see how to extend to the continuous case.

All strategies for Player 1 are equivalent. Moreover, the eventual payoff does not depend on the number of questions asked.

In retrospect, this is maybe not surprising in light of the No Free Lunch theorem. To see it directly, first note that the optimal move for Player 1, conditional on any history $\{(f_i,y_i)\}$, is to play $P_{f,y}:=\arg\max_{P\in \Gamma_{\{(f_i,y_i)\}}} H(P)$. (This is also true when $N=0$.) This is a consequence of the result of Grunwald/Dawid mentioned in the question. By a standard argument involving Lagrange multipliers, the distribution takes the form

$$P_{f,y}(x)=e^{\sum_i \lambda_i f_i(x)-A(\lambda)}$$

where the $\lambda_i$ are chosen to satisfy the constraints $\sum_x f_i(x) P_{f,y}(x)=y_i$ and $A(\lambda)$ is chosen so that the total probability mass sums to 1.
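As a sanity check, this maximum-entropy distribution can be computed numerically by minimizing the convex dual $A(\lambda)-\sum_i \lambda_i y_i$, whose gradient in $\lambda$ is $E_P[f]-y$. A sketch in Python (the function name is mine, and plain gradient descent stands in for a proper solver):

```python
import numpy as np

def maxent(fs, ys, steps=50000, eta=0.05):
    """Find P(x) = exp(sum_i lam_i f_i(x) - A(lam)) matching the moments ys,
    by gradient descent on the convex dual A(lam) - lam . y."""
    F = np.asarray(fs, dtype=float)        # shape (num constraints, M)
    y = np.asarray(ys, dtype=float)
    lam = np.zeros(len(F))
    for _ in range(steps):
        w = F.T @ lam                      # sum_i lam_i f_i(x)
        p = np.exp(w - w.max())            # subtract max for numerical stability
        p /= p.sum()
        lam -= eta * (F @ p - y)           # dual gradient is E_P[f] - y
    return p

# Example: support {1,...,6}, single constraint fixing the mean to 4.5.
x = np.arange(1, 7, dtype=float)
p = maxent([x], [4.5])
print(p.round(4))                          # weights grow geometrically in x
print((x @ p).round(4))                    # ~ 4.5
```

The exponential-family form is visible in the output: since $P(x)\propto e^{\lambda x}$ with $\lambda>0$ here, the weights increase geometrically across the support.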

Now let $Q\in \Gamma_{(f,y)}$. By the definition of cross entropy we have

\begin{eqnarray*} H(Q|P_{f,y})&=&-\sum_x Q(x)\log P_{f,y}(x)\\ &=&-\sum_x Q(x) \left(\sum_i \lambda_i f_i(x)-A(\lambda)\right)\\ &=&-\sum_i \lambda_i \sum_x f_i(x)Q(x)+A(\lambda)\\ & = & -\sum_i \lambda_i y_i+A(\lambda) \end{eqnarray*}

(In the last equality, we used the assumption that $Q\in \Gamma_{(f,y)}$.) So in fact the cross entropy does not depend on $Q$, and is always equal to $H(P_{f,y})$. Said another way, the payoff to Player 2, conditional on the moves $\{(f_i,y_i)\}$, is equal to $H(P_{f,y})$.
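This invariance is easy to check numerically. A sketch, reusing the mean-4.5 constraint on $\{1,\dots,6\}$ (bisection on $\lambda$ is just a quick way to hit a single constraint; the two test members of $\Gamma$ are my own choices):

```python
import numpy as np

x = np.arange(1, 7, dtype=float)

def p_of(lam):
    """Exponential-family form P(x) proportional to exp(lam * x)."""
    w = np.exp(lam * x)
    return w / w.sum()

# Solve E_lam[x] = 4.5 by bisection (the mean is increasing in lam).
lo, hi = 0.0, 2.0
for _ in range(80):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if x @ p_of(mid) < 4.5 else (lo, mid)
p = p_of((lo + hi) / 2)

# Two quite different members of Gamma (both have mean 4.5) ...
q1 = np.array([0, 0, 0, 0.5, 0.5, 0])
q2 = np.full(6, 1 / 12); q2[4] += 0.25; q2[5] += 0.25

# ... yet their cross entropies against p agree with each other and with H(p).
print(-np.sum(q1 * np.log(p)))
print(-np.sum(q2 * np.log(p)))
print(-np.sum(p * np.log(p)))
```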

The best case scenario for Player 2 would be if $P_{f,y}$ is the uniform distribution $Unif_M$ (since this has the highest entropy among all distributions on $\{1,2,\dots, M\}$). Player 2 can achieve this by playing $y_i=\sum_x f_i(x)/M$ in response to any move $f_i$ made by Player 1. It is easy to see that if the $y_i$ are defined in this way, then $Unif_M\in \Gamma_{(f,y)}$ no matter what the functions $f_i$ are, which implies that $Unif_M=P_{f,y}$.

So no matter which functions Player 1 plays, or how many, Player 2 can always force the maximum-entropy distribution $P_{f,y}$ to be $Unif_M$, for a resulting score of $\log M$.
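A quick numerical check of this forcing strategy (the random functions and dimensions are my own illustration — any $f_i$ would do):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 8, 5
fs = rng.normal(size=(N, M))       # Player 1's functions f_i, chosen arbitrarily
ys = fs.mean(axis=1)               # Player 2's answers: y_i = sum_x f_i(x) / M

unif = np.full(M, 1 / M)
print(np.allclose(fs @ unif, ys))  # True: Unif_M satisfies every constraint
print(-np.sum(unif * np.log(unif)) - np.log(M))   # ~ 0: its entropy is log M
```

Since $Unif_M$ lies in $\Gamma_{(f,y)}$ and already attains the maximum possible entropy $\log M$, it must be the maximum-entropy member of $\Gamma_{(f,y)}$, so no solver is needed here.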

Unbounded Case

Here we assume that the distributions are supported on a non-compact set, either the integers or the real line. In this case, there exist distributions that have infinite entropy (Can the entropy of a random variable with countably many outcomes be infinite?). Now in general $H(Q|P)\ge H(Q)$, since KL divergence is always non-negative. This means that, as long as there is some distribution in $\Gamma$ that has infinite entropy, Player 2 can play that distribution, and thus obtain infinite reward. But making sure that $\Gamma$ contains such a distribution is easy, since Player 2 can just choose some distribution $Q$ with infinite entropy ahead of time, and then respond to any function $f_i$ with $y_i=E_{x\sim Q}f_i(x)$. So the game as stated is degenerate in this case.
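For concreteness, $p_n \propto 1/(n\log^2 n)$ on $n\ge 2$ is one such distribution: it is normalizable (by the integral test), but $-p_n\log p_n$ decays like $1/(n\log n)$, whose sum diverges. A numerical sketch of the divergence, truncating the support and renormalizing:

```python
import numpy as np

def truncated_entropy(N):
    """Entropy of p_n ~ 1/(n log^2 n) restricted to 2 <= n < N, renormalized."""
    n = np.arange(2, N, dtype=float)
    w = 1.0 / (n * np.log(n) ** 2)
    p = w / w.sum()
    return -np.sum(p * np.log(p))

for N in (10**2, 10**4, 10**6):
    print(N, truncated_entropy(N))   # grows with the cutoff (slowly, ~ log log N)
```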