Transformation of the state space that preserves the Markov property

I am working on a problem from Mathematical Statistics by Jun Shao:

Let $\{X_n \}$ be a Markov chain. Show that if $g$ is a one-to-one Borel function, then $\{g(X_n )\}$ is also a Markov chain. Give an example to show that $\{g(X_n )\}$ may not be a Markov chain in general.

I am having a hard time solving it, even though I have been staring at it and thinking about it for a whole day.

  1. For the first part, which is to show that any one-to-one Borel $g$ preserves the Markov property of a Markov chain, I guess that the change-of-variables formula for densities under $g$, learned in an elementary probability course, might help, but I am not sure how to use it. Or maybe the tools needed to solve the problem are not that simple?

  2. For the second part, I really have no idea how to construct a chain $\{ X_n\}$ and a function $g$ so that $\{g(X_n )\}$ is not a Markov chain, for example with $g(x)=x^2$.

Here are some further thoughts and questions:

  1. If $g$ is not one-to-one, does $\{g(X_n )\}$ necessarily fail to be a Markov chain, for every Markov chain $\{ X_n\}$?
  2. What about continuous time, i.e. $\{X_t \}$ with $t \in \mathbb{R}$? Does a one-to-one $g$ also preserve the Markov property of continuous-time stochastic processes?

Thanks a lot!


The transformation you are interested in is often called lumping states of the process, and the resulting process is called a lumped chain.

For a given denumerable Markov chain $X$ on a state space $E$ with transition probabilities $p$ and a given function $g$, whether $g(X)$ is still a Markov chain may depend on the initial distribution of $X$. However, a necessary and sufficient condition for $g(X)$ to be Markov for every initial distribution of $X$ (a condition known at least since C. J. Burke and M. Rosenblatt, A Markovian Function of a Markov Chain (1958), and widely used in applications) is that, for all $x$ and $x'$ in $E$ such that $g(x)=g(x')$ and every $z$ in $g(E)$, $$ \sum_{y:g(y)=z}p(x,y)=\sum_{y:g(y)=z}p(x',y). $$ In words: from a state $x$, the probability of jumping into a lumped state $z$ may depend on $x$, but only through the lumped state $g(x)$.
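If you want to experiment, here is a minimal numerical check of this condition for a finite state space (a sketch only; the function name `is_lumpable`, the use of numpy, and the value $u=0.3$ are my own choices, not part of the problem). Applied to the three-state chain described below, it returns `False`:

```python
import numpy as np

def is_lumpable(P, states, g):
    """Burke-Rosenblatt condition: for all x, x' with g(x) == g(x') and every
    lumped state z, the mass sent into the block {y : g(y) == z} must agree."""
    idx = {s: i for i, s in enumerate(states)}
    lumped = set(g(s) for s in states)
    block = lambda s, z: sum(P[idx[s], idx[y]] for y in states if g(y) == z)
    return all(
        np.isclose(block(x, z), block(xp, z))
        for x in states for xp in states if g(x) == g(xp)
        for z in lumped
    )

# the three-state example below: p(0,1) = p(1,2) = 1, p(2,0) = u, p(2,1) = 1 - u
u = 0.3
P = np.array([[0, 1, 0], [0, 0, 1], [u, 1 - u, 0]])
print(is_lumpable(P, [0, 1, 2], lambda s: 'a' if s == 0 else 'b'))  # False
```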

A simple but inspiring example is a Markov chain $X$ on three states $0$, $1$ and $2$ with transitions from $0$ to $1$, from $1$ to $2$, from $2$ to $1$ and from $2$ to $0$ and no other transition, thus, for a given $u$ in $(0,1)$, $$ p(0,1)=p(1,2)=1,\qquad p(2,0)=u,\qquad p(2,1)=1-u. $$ The chain $X$ is irreducible, aperiodic, and it converges in distribution to the unique stationary distribution $\pi$ given by $$ \pi(0)=\frac{u}{2+u},\qquad\pi(1)=\pi(2)=\frac1{2+u}. $$ Now, lump together states $1$ and $2$, for example by a function $g$ to a two-state space $\{a,b\}$ such that $g(0)=a$ and $g(1)=g(2)=b$. Then:

  • When at $0$, the chain $X$ enters $\{1,2\}$ deterministically by $1$.
  • As long as the chain $X$ stays in $\{1,2\}$, it alternates deterministically between $1$ and $2$.
  • When in $\{1,2\}$, the chain $X$ can only jump to $0$ from $2$, not from $1$.

Thus, for every $k\geqslant1$ (note that, given such a history, necessarily $X_{n-1}=1$ when $k$ is odd and $X_{n-1}=2$ when $k$ is even), $$ P(g(X_{n})=a\mid g(X_{n-1})=\cdots=g(X_{n-k})=b,g(X_{n-k-1})=a) = \left\{\begin{array}{cccl} 0&\text{if}&k&\text{is odd,}\\ u&\text{if}&k&\text{is even.}\end{array}\right. $$ This shows that $g(X)$ is not Markov, and even that $g(X)$ is not a Markov chain of any order.
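If it helps to see these numbers appear, here is a small simulation sketch (the value $u=0.3$, the helper names, the run length and the seed are arbitrary choices made for illustration):

```python
import random

u = 0.3                                      # any value in (0, 1)

def step(x):
    # transitions of the chain above: 0 -> 1, 1 -> 2, 2 -> 0 w.p. u, 2 -> 1 w.p. 1-u
    if x == 0: return 1
    if x == 1: return 2
    return 0 if random.random() < u else 1

g = {0: 'a', 1: 'b', 2: 'b'}

random.seed(0)
x, path = 0, []
for _ in range(200_000):
    path.append(g[x])
    x = step(x)

# estimate P(g(X_n) = a | g(X_{n-1}) = ... = g(X_{n-k}) = b, g(X_{n-k-1}) = a)
for k in (1, 2, 3, 4):
    pattern = ['a'] + ['b'] * k
    hits = total = 0
    for i in range(len(path) - k - 1):
        if path[i:i + k + 1] == pattern:
            total += 1
            hits += path[i + k + 1] == 'a'
    print(k, round(hits / total, 3))
```

The estimates come out close to $0$ for $k=1,3$ and close to $u$ for $k=2,4$, matching the display above.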


The key is that the Markov property is fundamentally about information: if you want to know something about the process's behavior after some time $t$, and you know its location at time $t$, then any information about the process's behavior before time $t$ is irrelevant. All the relevant information from the history is packed up in the value of $X_t$. You can express this in terms of conditional probabilities or conditional independence, but the idea is the same, in either discrete or continuous time.

Now, if $g$ is a one-to-one function, then $g(X)$ contains exactly the same information as $X$. This is because $g$ has an inverse, and for a one-to-one Borel function the inverse is itself Borel measurable on the range of $g$ (the Lusin-Suslin theorem). Thus you should be able to replace $X$ by $g(X)$ without changing the truth of any statement that is only about information; in particular, the Markov property should be preserved.

A more formal statement would be: for any $\sigma$-field $\mathcal{G}$, $X$ is $\mathcal{G}$-measurable if and only if $g(X)$ is $\mathcal{G}$-measurable; equivalently, $\sigma(X)=\sigma(g(X))$. Now, if you express the Markov property formally in terms of $\sigma$-fields, it should become clear how to prove the first part, in either discrete or continuous time.
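For instance, in discrete time the verification can be written out as follows (a sketch, using that $\sigma(g(X_0),\dots,g(X_n))=\sigma(X_0,\dots,X_n)$ and $\sigma(g(X_n))=\sigma(X_n)$, which hold by the remark above): for every Borel set $B$, $$ \begin{aligned} P\bigl(g(X_{n+1})\in B\mid g(X_0),\dots,g(X_n)\bigr) &= P\bigl(X_{n+1}\in g^{-1}(B)\mid X_0,\dots,X_n\bigr)\\ &= P\bigl(X_{n+1}\in g^{-1}(B)\mid X_n\bigr)\\ &= P\bigl(g(X_{n+1})\in B\mid g(X_n)\bigr), \end{aligned} $$ where the middle equality is the Markov property of $X$, and the outer two only use the identity of events $\{g(X_{n+1})\in B\}=\{X_{n+1}\in g^{-1}(B)\}$ and re-express the conditioning $\sigma$-fields.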

For the second part, note that if $g$ is not one-to-one, then $g(X)$ may contain less information than $X$ (but never more). If we think about discrete time: in deciding what state to occupy at time $n+1$, the only information we can use is what state we are in at time $n$. But a non-injective $g$ can merge distinct states into a single state, so that we can no longer tell them apart. If those merged states influenced the distribution of $X_{n+1}$ differently, then the lumped process can cease to be Markov.

This does not have to happen, but you can certainly find particular examples of Markov chains $\{X_n\}$ and non-one-to-one functions $g$ such that $\{g(X_n)\}$ is not Markov. Just try!
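If you want to check a candidate, one way is to compute the lumped conditional probabilities directly. For instance (this chain is my own choice, just one of many, using the $g(x)=x^2$ from the question): take states $-1,0,1$ with $p(0,1)=p(0,-1)=\tfrac12$, $p(1,0)=1$ and $p(-1,1)=1$; then, in stationarity, $P(g(X_{n+1})=0\mid g(X_n)=1,g(X_{n-1})=0)=\tfrac12$ while $P(g(X_{n+1})=0\mid g(X_n)=1,g(X_{n-1})=1)=1$. A short sketch of the computation:

```python
import numpy as np

# states -1, 0, 1 (in this order), lumped by g(x) = x**2
states = [-1, 0, 1]
P = np.array([[0.0, 0.0, 1.0],    # from -1: go to 1
              [0.5, 0.0, 0.5],    # from  0: go to -1 or 1
              [0.0, 1.0, 0.0]])   # from  1: go to 0
g = lambda s: s * s
pi = np.array([0.2, 0.4, 0.4])    # stationary distribution (solves pi @ P = pi)

# exact P(g(X_{n+1}) = 0 | g(X_n) = 1, g(X_{n-1}) = prev) under the stationary law
for prev in (0, 1):
    num = sum(pi[i] * P[i, j] * P[j, k]
              for i, x in enumerate(states) if g(x) == prev
              for j, y in enumerate(states) if g(y) == 1
              for k, z in enumerate(states) if g(z) == 0)
    den = sum(pi[i] * P[i, j]
              for i, x in enumerate(states) if g(x) == prev
              for j, y in enumerate(states) if g(y) == 1)
    print(prev, num / den)   # 0.5 for prev = 0, 1.0 for prev = 1
```

Since the two conditional probabilities differ, $\{g(X_n)\}$ is not Markov for this chain.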