Optimal stopping of a Poisson process with a risky reward
I'm confident that there is a well-known solution to this problem, but I am having trouble finding a reference for it. I am also quite rusty on these kinds of problems, so I am having trouble solving it. Any help would be very much appreciated!
Consider the following bandit-like problem:
Time $t$ takes values in $[0,\infty)$. Suppose a "reward" $Y\in\{L,H\}$ is randomly drawn according to $\mathbb{P}\{Y=H\} = p$. Given $Y=y$, arrivals occur according to a Poisson process with rate $\lambda_y>0$ ($y\in\{L,H\}$).
A decision-maker (henceforth "DM") does not observe $Y$ initially. As long as no arrival has occurred, the DM can choose to stop the process and earn a payoff of $u_M\in\mathbb{R}$. At the moment of an arrival, the process automatically stops and the DM earns a payoff of $u_y\in\mathbb{R}$ ($y\in\{L,H\}$). Assume that $u_L<u_M<u_H$. What is the optimal time to stop the process?
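For concreteness, one run of this process under a fixed stopping time can be sketched as a short simulation (the function name, argument order, and parameter values below are my own, not part of the problem statement):

```python
import random

def simulate(x, p, lam_H, lam_L, u_M, u_H, u_L, rng):
    """One run of the policy 'stop at time x unless an arrival comes first'.

    Y is drawn once; given Y = y, the first arrival time of a rate-lam_y
    Poisson process is Exponential(lam_y)."""
    if rng.random() < p:
        lam, u_arrival = lam_H, u_H   # Y = H
    else:
        lam, u_arrival = lam_L, u_L   # Y = L
    t_arrival = rng.expovariate(lam)  # first arrival time
    return u_arrival if t_arrival <= x else u_M
```

Averaging many such runs estimates the expected payoff of the deterministic-time policy; with $x=0$ every run pays $u_M$, and as $x\to\infty$ the average tends to $pu_H+(1-p)u_L$.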
Thanks again and apologies for the loose description. Please let me know if anything needs to be clarified!
Solution 1:
This expands my comment from yesterday. Consider the policy of fixing a (deterministic) time $x \in [0,\infty)$ and stopping at time $x$ if no arrival occurs before then. Let $g(x)$ be the expected reward under this policy. Conditioning on $Y$, the probability of an arrival before time $x$ is $1-e^{-\lambda_y x}$, so $$g(x) = p\left[(1-e^{-\lambda_H x})u_H + e^{-\lambda_H x}u_M\right] + (1-p)\left[(1-e^{-\lambda_L x})u_L + e^{-\lambda_L x}u_M\right].$$ Clearly $$u_L \leq g(x) \leq u_H \quad \forall x \in [0, \infty).$$ Now maximize $g(x)$ over $x \in [0, \infty]$ by considering the critical points. In particular, compare $g(0)$, $g(\infty)$, and $g(c)$ for any $c \in (0, \infty)$ such that $g'(c)=0$. Clearly \begin{align} g(0) &=u_M\\ g(\infty) &= pu_H + (1-p)u_L, \end{align} so it remains only to find $c$ and $g(c)$.
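Setting $g'(c)=0$ gives a closed form for $c$ when $\lambda_H \neq \lambda_L$. A small sketch (the function names are mine; the expression for $g$ is the conditioning-on-$Y$ formula above, and $c$ solves $p\lambda_H e^{-\lambda_H c}(u_H-u_M) = (1-p)\lambda_L e^{-\lambda_L c}(u_M-u_L)$):

```python
import math

def g(x, p, lam_H, lam_L, u_M, u_H, u_L):
    """Expected reward of the policy: stop at time x if no arrival by then."""
    high = (1 - math.exp(-lam_H * x)) * u_H + math.exp(-lam_H * x) * u_M
    low = (1 - math.exp(-lam_L * x)) * u_L + math.exp(-lam_L * x) * u_M
    return p * high + (1 - p) * low

def critical_point(p, lam_H, lam_L, u_M, u_H, u_L):
    """Solve g'(c) = 0 in closed form (assumes u_L < u_M < u_H).

    Returns None when lam_H == lam_L or when the solution is not in (0, inf),
    in which case the optimum is at a boundary: compare g(0) and g(inf)."""
    if lam_H == lam_L:
        return None
    r = ((1 - p) * lam_L * (u_M - u_L)) / (p * lam_H * (u_H - u_M))
    c = math.log(r) / (lam_L - lam_H)
    return c if c > 0 else None
```

The optimal deterministic time is then whichever of $0$, $c$ (if it exists), or $\infty$ gives the largest of $u_M$, $g(c)$, $pu_H+(1-p)u_L$.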
Are deterministic policies optimal? Yes. Note that the DM learns nothing before an arrival, and an arrival ends the process, so any stopping policy amounts to choosing a (possibly random) time $X$ independently of the system. Suppose we decide to stop at time $X$. Our reward is $E[g(X)]$, which we know is finite (it lies in $[u_L, u_H]$).
Claim: There must be a deterministic $x \in [0, \infty)$ such that $g(x)\geq E[g(X)]$.
Proof: Suppose not; we derive a contradiction. Define $m=E[g(X)]$. Then $g(x)<m$ for all $x \in [0, \infty)$. Since $X \in [0, \infty)$ surely, we surely have $g(X) < m$. It follows that $m-g(X)$ is a strictly positive random variable, so $E[m-g(X)]>0$. But $E[m-g(X)] = m - E[g(X)] = m - m = 0$, a contradiction. $\Box$
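The claim can also be sanity-checked numerically: for a randomized independent stopping time $X$, the average of $g(X)$ never beats the best deterministic time. A sketch with illustrative parameter values of my own (using the closed-form $g$ from above and $X \sim \mathrm{Exp}(1)$ as an arbitrary randomization):

```python
import math
import random

def g(x, p=0.5, lam_H=2.0, lam_L=1.0, u_M=0.0, u_H=1.0, u_L=-0.5):
    # Expected reward of "stop at time x if no arrival by then" (illustrative parameters).
    high = (1 - math.exp(-lam_H * x)) * u_H + math.exp(-lam_H * x) * u_M
    low = (1 - math.exp(-lam_L * x)) * u_L + math.exp(-lam_L * x) * u_M
    return p * high + (1 - p) * low

rng = random.Random(42)
# E[g(X)] for a randomized time X ~ Exp(1), chosen independently of the system.
randomized = sum(g(rng.expovariate(1.0)) for _ in range(100_000)) / 100_000
# Best deterministic time via a grid search over x in [0, 10].
best_det = max(g(k / 1000) for k in range(10_001))
assert randomized <= best_det  # the randomized policy cannot do better
```

This is only a numerical illustration of the claim, not a substitute for the proof.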