Reference request: convergence for periodic Markov chains
Consider an irreducible but possibly periodic Markov chain on a finite state space with transition matrix $P$. We know there exists a unique stationary distribution $\pi$. If the Markov chain were aperiodic, we would have $P^n_{ij} \to \pi(j)$ as $n \to \infty$. This fails if the chain is periodic, but we have convergence of the Cesàro averages: $$\frac{1}{n}\sum_{k=1}^n P^k_{ij} \to \pi(j) \text{ as } n \to \infty.$$ Can anyone point me to a reference that states this fact? Every reference I've seen only considers convergence for aperiodic chains, or "fixes" periodicity by considering a lazy version of the chain. Alternatively, is there a simple way to obtain this result using the result for aperiodic chains?
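For concreteness, here is a quick numerical illustration of the claim, using a deterministic 3-cycle as the periodic chain (my own toy example, not from a reference):

```python
import numpy as np

# A 3-state deterministic cycle: irreducible, with period 3.
P = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
pi = np.full(3, 1/3)  # stationary distribution (uniform by symmetry)

# P^n itself cycles through three permutation matrices and never converges:
assert not np.allclose(np.linalg.matrix_power(P, 4), np.linalg.matrix_power(P, 5))

# ...but the Cesaro averages converge to the rank-one matrix 1 pi^T:
n = 300
cesaro = sum(np.linalg.matrix_power(P, k) for k in range(1, n + 1)) / n
assert np.allclose(cesaro, np.outer(np.ones(3), pi))
```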
The result follows immediately from applying the elementary renewal theorem to delayed renewal processes.
Here's a more elementary algebraic proof, using telescoping.
(problem 16, page 468 of Grinstead and Snell's free book https://math.dartmouth.edu/~prob/prob/prob.pdf )
for a stochastic $m \times m$ matrix $P$ we have
$\mathbf \pi^T P = \mathbf \pi^T$
and $ P\mathbf 1 = \mathbf 1$,
define $W:= \mathbf 1 \mathbf \pi^T$, and note $\operatorname{trace}\big(W\big) = 1$
consider the following telescoping identity (for the $W$ terms, note $PW = P\mathbf 1 \mathbf \pi^T = \mathbf 1 \mathbf \pi^T = W$):
$\Big(I+P+P^2+\cdots+ P^{n-1}\Big)\Big(I-P+W\Big) = I -P^n +nW$
thus
$\frac{1}{n}\Big(I+P+P^2+\cdots+ P^{n-1}\Big)$
$= \frac{1}{n}\big(I -P^n +nW\big)\Big(I-P+W\Big)^{-1} $
$= \frac{1}{n}\Big\{\Big(I-P+W\Big)^{-1}\Big\} -\frac{1}{n}\Big\{P^n\Big(I-P+W\Big)^{-1}\Big\} +\frac{1}{n}\Big\{nW\Big(I-P+W\Big)^{-1}\Big\} $
$= \frac{1}{n}\Big(I-P+W\Big)^{-1} -\frac{1}{n}P^n\Big(I-P+W\Big)^{-1} +W$
now pass to the limit
$\lim_{n\to\infty}\Big\{ \frac{1}{n}\Big(I-P+W\Big)^{-1} -\frac{1}{n}P^n\Big(I-P+W\Big)^{-1} +W\Big\}$
$= \Big\{\lim_{n\to\infty}\frac{1}{n}\Big(I-P+W\Big)^{-1}\Big\} -\Big\{\lim_{n\to\infty} \frac{1}{n}P^n\Big(I-P+W\Big)^{-1}\Big\} +W$
$=0+0+W$
so
$W=\lim_{n\to\infty} \frac{1}{n}\Big(I+P+P^2+\cdots+ P^{n-1}\Big)$
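Both the telescoping identity and the limit can be checked numerically; a small sketch, using the deterministic 3-cycle as an assumed periodic example:

```python
import numpy as np

# Check (I + P + ... + P^{n-1})(I - P + W) = I - P^n + nW
# for a periodic example: the deterministic 3-cycle.
P = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
pi = np.full(3, 1/3)
W = np.outer(np.ones(3), pi)

n = 7
S = sum(np.linalg.matrix_power(P, k) for k in range(n))  # I + P + ... + P^{n-1}
lhs = S @ (np.eye(3) - P + W)
rhs = np.eye(3) - np.linalg.matrix_power(P, n) + n * W
assert np.allclose(lhs, rhs)

# ...and S/n -> W as n grows:
n = 3000
S = sum(np.linalg.matrix_power(P, k) for k in range(n))
assert np.allclose(S / n, W)
```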
That's the argument in its entirety. I've left three book-keeping details for the end.
re: the third term simplification $W\Big(I-P+W\Big)^{-1}=W$
suppose $\Big(I-P+W\Big)^{-1}$ exists (this is justified below), and consider the product
$W\Big(I-P+W\Big) = W-WP +W^2 = W-W + W = W$
now multiply both sides on the right by $\Big(I-P+W\Big)^{-1}$ to get $W\Big(I-P+W\Big)^{-1}=W$
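A quick numerical sanity check of this simplification (again with the 3-cycle as an assumed example):

```python
import numpy as np

P = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
pi = np.full(3, 1/3)
W = np.outer(np.ones(3), pi)
A = np.eye(3) - P + W   # invertible here: det(A) = |1 - omega|^2 = 3

assert np.allclose(W @ A, W)                  # W(I - P + W) = W
assert np.allclose(W @ np.linalg.inv(A), W)   # hence W(I - P + W)^{-1} = W
```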
re: the second limit
observe that
$\Big\Vert \frac{1}{n}P^n\Big(I-P+W\Big)^{-1} - \mathbf 0\Big\Vert_F$
$ = \frac{1}{n}\Big\Vert P^n\Big(I-P+W\Big)^{-1}\Big\Vert_F$
$ \leq \frac{1}{n}\Big\Vert P^n\Big\Vert_F\cdot \Big\Vert \Big(I-P+W\Big)^{-1}\Big\Vert_F $
$ \leq \frac{1}{n} \mathbf 1^T P^n \mathbf 1 \cdot \Big\Vert \Big(I-P+W\Big)^{-1}\Big\Vert_F $
$ = \frac{1}{n} \mathbf 1^T \mathbf 1 \cdot \Big\Vert \Big(I-P+W\Big)^{-1}\Big\Vert_F $
$ = \frac{m}{n} \cdot \Big\Vert \Big(I-P+W\Big)^{-1}\Big\Vert_F$
$ \lt \epsilon $
for large enough $n$.
(The first inequality is submultiplicativity of the Frobenius norm; the second uses $\big\Vert A \big\Vert_F \leq \sum_{i,j} |a_{ij}| = \mathbf 1^T A \mathbf 1$ for entrywise-nonnegative $A$, which follows from the triangle inequality.)
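The chain of bounds can also be observed numerically (same assumed 3-cycle example):

```python
import numpy as np

P = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
W = np.outer(np.ones(3), np.full(3, 1/3))
Ainv = np.linalg.inv(np.eye(3) - P + W)
m, n = 3, 10_000

# (1/n)||P^n (I - P + W)^{-1}||_F  <=  (m/n) ||(I - P + W)^{-1}||_F  ->  0
val = np.linalg.norm(np.linalg.matrix_power(P, n) @ Ainv, 'fro') / n
bound = (m / n) * np.linalg.norm(Ainv, 'fro')
assert val <= bound + 1e-12   # the chain of inequalities holds
assert bound < 1e-2           # and the bound vanishes as n grows
```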
re: the invertibility of $\Big(I-P+W\Big)$
we prove $\det\Big(I-P+W\Big)=\prod_{j=2}^m (1-\lambda_j)$, where $1 = \lambda_1, \lambda_2, \dots, \lambda_m$ are the eigenvalues of $P$, and hence the matrix is invertible.
the nicest proof involves (partial) symmetrization:
using Perron-Frobenius theory, we know that $\lambda_1 = 1$ is a simple eigenvalue since $P$ is irreducible.
$\mathbf v_1 := \mathbf \pi^\frac{1}{2}\cdot \frac{1}{\big \Vert \mathbf \pi^\frac{1}{2}\big \Vert_2}$
(where the square root is interpreted component-wise; note $\big\Vert \mathbf \pi^\frac{1}{2}\big\Vert_2 = 1$ since $\sum_j \pi_j = 1$)
and the diagonal matrix $D:=\operatorname{diag}\big(\mathbf v_1\big)$, which is invertible since every $\pi_j > 0$ for an irreducible chain.
Consider the similar matrix
$D\Big(I-P+W\Big)D^{-1} = I- (DPD^{-1}) +DWD^{-1} = I - B + \mathbf v_1\mathbf v_1^T$, where $B := DPD^{-1}$
$B$ has $\mathbf v_1$ as its left and right eigenvectors (check!).
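That check can be done numerically. Since the 3-cycle has uniform $\pi$ (which makes $D$ a multiple of $I$), here is a period-2 chain with non-uniform $\pi$ instead (an assumed illustration):

```python
import numpy as np

# Periodic (period-2, bipartite) chain with non-uniform stationary distribution.
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
pi = np.array([0.5, 0.25, 0.25])
assert np.allclose(pi @ P, pi)          # pi is stationary

v1 = np.sqrt(pi)                        # component-wise; ||sqrt(pi)||_2 = 1
D = np.diag(v1)
B = D @ P @ np.linalg.inv(D)

assert np.allclose(B @ v1, v1)          # v1 is a right eigenvector of B
assert np.allclose(v1 @ B, v1)          # ...and a left eigenvector
```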
Working over $\mathbb C$ and applying Schur Triangularization to $B$:
$V := \bigg[\begin{array}{c|c|c|c}\mathbf v_1 & \mathbf v_2 &\cdots & \mathbf v_{m}\end{array}\bigg]$ (unitary, with first column $\mathbf v_1$)
$B = VRV^{-1} = VRV^{*} =V\begin{bmatrix} 1 & \mathbf x_{m-1}^*\\ \mathbf 0 & \mathbf R_{m-1} \end{bmatrix}V^* =V\begin{bmatrix} 1 & \mathbf 0^T\\ \mathbf 0 & \mathbf R_{m-1} \end{bmatrix}V^*$
note $\mathbf x_{m-1} = \mathbf 0$: since $\mathbf v_1$ is real and a left eigenvector of $B$, $\mathbf v_1^* = \mathbf v_1^* B = 1\cdot \mathbf v_1^* + \sum_{j\geq 2} \bar x_j\cdot \mathbf v_j^*$,
and the columns of $\mathbf V$ (equivalently, the rows of $\mathbf V^*$) are linearly independent, so every $x_j = 0$
By simplicity of the Perron root, $\mathbf R_{m-1}$ does not have $1$ as an eigenvalue, so
$I -B + \mathbf v_1 \mathbf v_1^T = V\big(I-R + \mathbf e_1\mathbf e_1^T\big)V^{*} =V\begin{bmatrix} 1 & \mathbf 0^T \\ \mathbf 0 & I_{m-1} -\mathbf R_{m-1} \end{bmatrix}V^*$
hence $\det\Big(I-P+W\Big) = 1\cdot \prod_{j=2}^m (1-\lambda_j) \neq 0$ (similarity preserves the determinant).
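A numerical check of the determinant formula, using the same assumed period-2 example:

```python
import numpy as np

P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])      # irreducible, period 2
pi = np.array([0.5, 0.25, 0.25])
W = np.outer(np.ones(3), pi)

lam = np.linalg.eigvals(P)            # eigenvalues are 1, -1, 0 here
idx = np.argmin(np.abs(lam - 1))      # locate the simple Perron eigenvalue
prod_rest = np.prod(1 - np.delete(lam, idx))
det_val = np.linalg.det(np.eye(3) - P + W)

assert np.allclose(det_val, prod_rest)   # det(I - P + W) = prod_{j>=2}(1 - lambda_j)
assert abs(det_val) > 1e-9               # so I - P + W is invertible
```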