Formula for picking time closest to (but after) target

Let's say you have an arbitrary length of time. You are playing a game in which you want to push a button during this time span after a light comes on. If you do so, you win; if not, you lose. You can't see the light; you can only guess a random time to push the button. If you are playing all by yourself, the obvious way to guarantee winning is to push the button at the last possible instant, so that you will be after the light coming on no matter what.

Now let's say you are playing against another player. You both have the same goal: push your button after the light comes on but before the other player pushes their button. I'm sure the best solution in this case would be to push the button in the exact middle of the time range if the other player is pushing at a random time, but it could be that there is no optimal solution if both players are using a coherent strategy. This is complicated by the fact that one can lose either being first or third in the sequence of three events, and in some cases by being second, if the light is third. Also, both players can lose; you don't win by being closer if you're too early.

If there is an optimal solution for the second case, can that be generalized in any way? For a length of time t and a random number of players p is there a moment x that would give the best chance of being the winner? This is somewhat like the way bidding works on The Price is Right, except there is no real way to judge the "value", or correct time, to bid.

EDIT: Let's also take this one level higher. Consider playing multiple rounds of the game. You are allowed one button push per game, so the total number of rounds and pushes permitted are the same, but you can distribute your pushes however you like, anywhere from one per round to all in the same round. Is there an optimum mix at that level? Remember again that you wouldn't know when or if the other player was playing in any particular round, you would only know if you had won or lost that round, and it is possible to both win and lose whether or not the other player plays.

The big assumptions I'll make in my answer are (1) that all distributions are absolutely continuous (with respect to Lebesgue) everywhere, so we can employ probability densities, (2) The support of both players' strategies are the same and of the form $[a,b]$. Maybe someone can extend this answer by relaxing these assumptions.

Let $F$, $G_1$ and $G_2$ be the cdfs of the light and players 1 and 2 arrival times. Let $f$, $g_1$, and $g_2$ be the corresponding densitites.

Player 1's expectation of winning is $$ P = \int_0^{\infty} I(t) \ d G_1(t) $$ where $$ I(t) = F(t)(1-G_2(t)) + \int_0^t G_2(t') dF(t'). $$ The first term in the integrand corresponds to $\mathbb{P}(light \le 1 < 2)$. The second is $\mathbb{P}(2 \le light \le 1)$.

Player 1 chooses $G_1$ to maximize his payoff. That means he chooses $G_1$ with its support where the integrand $I(t)$ of $P$ above is maximized. If $G_1$'s support is $[a,b]$, that means that \begin{eqnarray} (1)\ \ I(t) \ \mbox{ is a constant on $[a,b]$} \\ (2)\ \ I([a,b])\ge I(t) \mbox{ for any $t\ge 0$}. \end{eqnarray} Differentiate, and we end up with the following boundary value problem: \begin{eqnarray} 0 &=& f(t) (1-G_2(t)) - F(t) G_2'(t) + G_2(t) f(t) \\ G_2(a) &=& 0 \\ G_2(b) &=& 1. \end{eqnarray} This simplifies to $$ f(t) = F(t) G_2'(t). $$ The BVP is easily solved: $$ (3)\ \ \ G_2(t) = \log\left(\frac{F(t)}{F(a)}\right) \ \ \mbox{ for $t\in[a,b]$}, $$ but we require $G_2(b) = 1$, so we must have $F(b) = e F(a)$.

Let $T$ be the end of the support of $F$, i.e., (informally) the last time the light might come on. We might have $T=+\infty$. Note that \begin{eqnarray} I(b) &=& \int_0^b G_2(t') dF(t') \\ I(T) &=& \int_0^T G_2(t') dF(t') \end{eqnarray} If $b<T$, then the second integral strictly exceeds the first, violating $(2)$ above. Thus, $b\ge T$. But by $(3)$, $G_2(t)$ is flat for $t>T$, so in fact, $b=T$.

In summary: choose $a$ such that $F(a) = e^{-1}$ and choose $G_2$ according to $(3)$. $G_1$ is the same.

Note in passing that I did not assume the players used the same strategy, just that the supports are the same.

Now I've done the easy part. How to deal with the possibility of different supports, supports that aren't intervals, non-absolutely continuous distributions, etc., I'll leave to someone else!

We assume that the set of common knowledge of the game (each player knows it, and each player knows that the other player knows etc), is (skipping formalities):

$1)$ The rules of the game. These rules include that if the button is pushed after $T$ the player loses (only implicitly assumed in the OP's question).
$2)$ That both players are "rational" meaning in our case that they prefer winning to losing, and that they won't adopt strictly dominated strategies.
$3)$ That the "length of time" is finite, $[0,T]$.
$4)$ That the distribution of the timing of the light flash $\lambda$, is $G(\lambda)$, be it Uniform or other.
$5)$ That both players follow the principle of insufficient reason, whenever the need arises. This means that when no relevant information is available, chance is modelled as a uniform random variable. (Note: we need to assume that, because, although the principle of insufficient reason is a very intuitive argument, nevertheless philosophical and epistemological battles still rage over this principle, and so, "rationality" alone is not sufficient to argue that the PIR will be followed).

Denote $t_1$ the time-choice of player $1$ and $t_2$ the time-choice of player 2. Both $t_1$ and $t_2$ range in $[0,T]$. They cannot range below $0$ because it is impossible, and they don't range above $T$ because this is a strictly dominated strategy.

If we are player $1$, then $t_1$ is a decision variable, while, $\lambda$ and $t_2$ are random variables. The probability of us winnig is

$$P(\text {player 1 wins}) = P(\lambda \le t_1, t_2 > t_1) $$

Now, from the point of view of player 1, $\lambda$ and $t_2$ are independent random variables: if player $1$ "knows" that $\lambda = \bar \lambda$, this won't affect how he views the distribution of $t_2$. So

$$P(\text {player 1 wins}) = P(\lambda \le t_1) \cdot P(t_2 > t_1) = G(t_1)\cdot [1-F_2(t_1)]$$

where $F_2()$ is the distribution function of $t_2$. Player $1$ wants to maximize this probability over the choice of $t_1$:

$$\max_{t_1} P(\text {Player 1 wins})= G(t_1)\cdot [1-F_2(t_1)] $$

First order condition is

$$\frac {\partial}{\partial t_1} P(\text {Player 1 wins}) =0 \Rightarrow g(t_1^*)\cdot [1-F_2(t_1^*)] - G(t_1^*)f_2(t_1^*) =0 \qquad [1]$$

where lower case letters denote the corresponding density functions (which we assume they exist).

The second-order condition (because we have to make sure that this is a maximum), is

$$\frac {\partial^2}{\partial t_1^2} P(\text {Player 1 wins})|_{t^*_1} <0 \Rightarrow \\ g'(t^*_1)\cdot [1-F_2(t^*_1)] - 2g(t^*_1)f_2(t^*_1) - G(t^*_1)f'_2(t^*_1) <0 \qquad [2]$$

Now, since we have no other information on the timing of the light-flash, except its range, then by our assumptions regarding the common knowledge set, $\lambda \sim U(0,T)$. Then

$$[1] \rightarrow \frac 1T [1-F_2(t_1^*)] - \frac {t_1^*}{T}f_2(t_1^*) =0 \Rightarrow t_1^* = \frac {1-F_2(t_1^*)}{f_2(t_1^*)} \qquad [1a]$$

while $$[2] \rightarrow - \frac 2Tf_2(t^*_1) - \frac {t_1^*}{T}f'_2(t^*_1) =-\frac 1{T}\Big(2f_2(t^*_1)+t^*_1f'_2(t^*_1)\Big) \qquad [2a]$$

To cover a conjecture of the OP, that the button will be pushed in the exact middle of the time-length, this will happen if player $1$ models $t_2$ as being a uniform random variable, $t_2 \sim U(0,T)$. Then

$$[1a] \rightarrow t_1^* = \frac {1-(t_1^*/T)}{1/T} = T-t_1^* \Rightarrow t_1^* =T/2 \qquad [1b]$$ and $$[2a] \rightarrow -\frac 1{T}\frac 2T <0 \qquad [2b]$$ so it will indeed be a maximum (likewise for player 2).

Player $1$ will model $t_2$ as a Uniform, if he has no other information on it except its range. Well, does he know something more? By the set of common knowledge, he knows that player $2$ will also try to maximize from her part, and that she will model the timing of the light-flash as a uniform. So player $1$ knows that player $2$ will end up looking at the conditions

$$t_2^* = \frac {1-F_1(t_2^*)}{f_1(t_2^*)},\;\; [3] \qquad -\frac 1{T}\Big(2f_1(t^*_2)+t^*_2f'_1(t^*_2)\Big) <0 \qquad [4]$$

Does this knowledge permit player $1$ to infer something about the distribution of $t_2$? No, because $[3]$ and $[4]$ contain abstract information about how $t_2$ will be determined as a function of what, according to player $2$, is the distribution of $t_1$. They do not help player 2 in any way in relation to the distribution of $t_2$.

So we conclude, that given the assumed set of common knowledge, both will model each other distributions as Uniforms. Hmmm... does this tell us that indeed the solution of the game will be $(t_1^*,t_2^*) =? (T/2,\,T/2)$?
It appears that since both can essentially predict the choice of the other, they will then have an incentive to push the button earlier. It does not take much thinking to realize that this line of thinking would lead us to conclude that they would both hit the button at time $0$, thus a.s. "guaranteeing" that they will both lose, which they also know, because they both treat the light-flash as a continuous rv, and so the probability of the light flash occurring exactly at time zero, is zero. But this is a strictly dominated strategy and the players won't select it.

Does it pay to randomize over the interval $[0,T/2]$? Well, no, because probability of wining won't be at a maximum. So we conclude that indeed the solution to this game is

$$(t_1^*,t_2^*) = (T/2,\,T/2)$$

even though the players know a priori what each will play.It is not difficult to calculate that in this case the expected payoffs will be $$ (v_1,v_2) = (1/4,\; 1/4)$$ This pure strategy profile will be a rationalizible equilibrium if it is not strictly dominated by a mixed strategy.

Formula for picking time closest to (but after) target

Related

Recent Posts