The "muscle" behind the fact that ergodic measures are mutually singular
This is really motivated by the soft question at the end, but let me begin with something more circumscribed:
Let $(X,\mathcal{B})$ be a measurable space and let $T:X\circlearrowleft$ be a self-map measurable with respect to $\mathcal{B}$. Let $\mu$ and $\nu$ be $T$-invariant finite measures such that $\nu \ll \mu$. Let $f\in L^1(X,\mu)$ be the Radon-Nikodym derivative $d\nu/d\mu$. I have two questions:
(1) Is $f$ a $T$-invariant element of $L^1(X,\mu)$, in the sense that $\int_Efd\mu = \int_E f\circ T d\mu$ for all $E\in\mathcal{B}$?
(2) If the answer is yes, is it possible to prove this without recourse to the Birkhoff Ergodic Theorem or an equivalent? If the answer is no, what is an example?
Motivation, thoughts, and the soft question: I am asking because, on the one hand, it seems to me on general grounds that the fact that $f$ is uniquely determined ($\mu$-a.e.) by the $T$-invariant measures $\nu$ and $\mu$ ought to force it to be $T$-invariant. (Where could the ability to change with $T$ come from, if not from $\mu$ or $\nu$?) On the other hand, when I apply the definitions directly, so far I have only been able to demonstrate the equality
$$ \int_E fd\mu = \int_E f\circ Td\mu$$
for sets $E$ in the pullback $\sigma$-algebra $T^{-1}\mathcal{B}$. For example, we have $$\int_{T^{-1}E}fd\mu = \nu(T^{-1}E) = \nu(E)=\int_E fd\mu = \int_EfdT_*\mu = \int_{T^{-1}E} f\circ Td\mu$$ verifying the equality for sets of the form $T^{-1}E$. Maybe I'm just not being clever enough, but every time I've played with it so far, this is how it comes out. Thus if $T$ is invertible, I have the desired equality, but if not, then I am not sure.
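For whatever it's worth, here is a quick numerical sanity check (my own toy example, not part of the question itself) of both the chain of equalities above and the invariance asked about in (1), on a small non-invertible system: $X=\{a,b,c\}$ with $T(a)=a$, $T(b)=b$, $T(c)=a$, and two invariant measures with $\nu\ll\mu$. A minimal sketch in Python:

```python
from itertools import chain, combinations

# Toy non-invertible system: X = {a, b, c}, T(a) = a, T(b) = b, T(c) = a.
X = ["a", "b", "c"]
T = {"a": "a", "b": "b", "c": "a"}

# Two T-invariant probability measures with nu << mu (both give the point c mass 0).
mu = {"a": 0.5, "b": 0.5, "c": 0.0}
nu = {"a": 0.25, "b": 0.75, "c": 0.0}

# Radon-Nikodym derivative f = d(nu)/d(mu); its value at the mu-null point c is irrelevant.
f = {x: (nu[x] / mu[x] if mu[x] > 0 else 0.0) for x in X}
f_T = {x: f[T[x]] for x in X}  # the composition f o T

def measure(m, E):             # m(E)
    return sum(m[x] for x in E)

def integral(g, m, E):         # integral of g over E against the measure m
    return sum(g[x] * m[x] for x in E)

def preimage(E):               # T^{-1}E
    return {x for x in X if T[x] in E}

for E in chain.from_iterable(combinations(X, k) for k in range(len(X) + 1)):
    E = set(E)
    assert abs(measure(mu, preimage(E)) - measure(mu, E)) < 1e-12      # mu is T-invariant
    assert abs(measure(nu, preimage(E)) - measure(nu, E)) < 1e-12      # nu is T-invariant
    assert abs(integral(f, mu, preimage(E)) - measure(nu, E)) < 1e-12  # the chain of equalities above
    assert abs(integral(f, mu, E) - integral(f_T, mu, E)) < 1e-12      # question (1) holds here
print("all checks passed")
```

Of course this only confirms the expected answer on one finite example; it says nothing about where the general proof has to come from.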
Meanwhile, if we make the additional assumption that $\mu,\nu$ are probability measures and that $\mu$ is ergodic, then using the Birkhoff Ergodic Theorem I can prove that $\nu = \mu$, which of course implies that $f=1$ $\mu$-almost everywhere. With some more work, this implies that distinct ergodic measures are mutually singular.
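(For concreteness, a standard way this BET argument goes: for each $E\in\mathcal{B}$, Birkhoff applied to $\mathbf{1}_E$ with the ergodic measure $\mu$ gives
$$ \frac1n\sum_{k=0}^{n-1}\mathbf{1}_E\circ T^k \;\longrightarrow\; \mu(E) \qquad \mu\text{-a.e., hence also } \nu\text{-a.e. since } \nu\ll\mu; $$
the averages are bounded by $1$, so integrating against $\nu$ and using dominated convergence together with the $T$-invariance of $\nu$ (each term integrates to $\nu(E)$) yields $\nu(E)=\mu(E)$.)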
My soft question, which is really what the title is about, is: does this result in some essential way "come from" the Birkhoff Ergodic Theorem? If the answer to (1) is yes, and it is possible to prove it without the BET, then this could in turn be used to prove that if $\mu,\nu$ are probability measures with $\mu$ ergodic and $\nu\ll\mu$, then $\nu=\mu$, and this in turn would imply that distinct ergodic measures are mutually singular without needing the BET. But my experience playing around so far makes it seem as though, without the BET, the definitions themselves somehow don't have "enough power." Is there anything to this? If so, what do I really mean? What aspect of the situation that the BET illuminates is needed for this result?
Thanks in advance for your thoughts.
Solution 1:
This problem troubled me for a long time. The BET is used practically everywhere in ergodic theory, and there are many situations where it is not immediately clear how necessary it really is. Below, I'll show that we don't need the BET here, just some ideas from martingale theory.
Motivation
The trouble with noninvertible measure-preserving systems is that $T^{-1} \mathcal B$ is in general a proper sub-$\sigma$-algebra of $\mathcal B$. A good way to think about this is that the action of $T$ coarsens phase space: more precisely, for any measurable observable $\phi : X \to \mathbb R$, the composition $\phi \circ T$ is $T^{-1} \mathcal B$-measurable.
For $f = d \nu / d \mu$, what you've shown already is that $$ \int_{T^{-1} E} f \, d \mu = \int_{T^{-1} E} f \circ T \, d\mu $$ for all $E \in \mathcal B$. Let's reformulate this in terms of conditional expectation: $$ \mu (f \,|\, T^{-1} \mathcal B ) = f \circ T \, , $$ where $\mu( \cdot \,|\, \cdot)$ denotes conditional expectation (with respect to $\mu$). Indeed, $f \circ T$ is $T^{-1}\mathcal B$-measurable, and every element of $T^{-1}\mathcal B$ has the form $T^{-1}E$ with $E \in \mathcal B$, so the displayed identity is exactly the defining property of $\mu(f \,|\, T^{-1}\mathcal B)$. You can think of this as saying that $f = d \nu / d \mu$ is "invariant on average". Naturally this does not mean $f$ is invariant yet, since $T^{-1} \mathcal B$ may be quite coarse compared to $\mathcal B$.
Using the tower property of conditional expectation, one can show $$ \mu(f | T^{-n} \mathcal B) = f \circ T^n $$ for all $n \geq 0$. What's really astonishing is that the LHS converges almost surely, as I'll show below.
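(A sketch of the induction, filling in details not written out here: by the tower property, since $T^{-(n+1)}\mathcal B \subset T^{-n}\mathcal B$,
$$ \mu\big(f \,\big|\, T^{-(n+1)}\mathcal B\big) = \mu\Big(\mu\big(f \,\big|\, T^{-n}\mathcal B\big) \,\Big|\, T^{-(n+1)}\mathcal B\Big) = \mu\big(f\circ T^n \,\big|\, T^{-(n+1)}\mathcal B\big) \, , $$
and the last conditional expectation equals $f\circ T^{n+1}$ because $f\circ T^{n+1}$ is $T^{-(n+1)}\mathcal B$-measurable and, using $T^n_*\mu=\mu$ together with the base case,
$$ \int_{T^{-(n+1)}E} f\circ T^n\,d\mu = \int_{T^{-1}E} f\,d\mu = \int_{T^{-1}E} f\circ T\,d\mu = \int_{T^{-(n+1)}E} f\circ T^{n+1}\,d\mu \qquad\text{for all } E\in\mathcal B \, . ) $$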
Reverse martingales
Definition: Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space and let $\mathcal F_n, n \geq 1$, be a sequence of sub-$\sigma$-algebras with $\mathcal F_n \supset \mathcal F_{n+1}$ for all $n$. A sequence of $L^1$ random variables $(X_n)$, with each $X_n$ measurable with respect to $\mathcal F_n$, is called a reverse martingale if $\mathbb E(X_n | \mathcal F_{n+1}) = X_{n+1}$ for all $n$.
Note that $X_n = \mathbb E(X_1 | \mathcal F_n)$ (by the tower property), so $X_n$ is $L^1$ for all $n$ as soon as $X_1$ is. This is what makes reverse martingales so nice: they are all Lévy martingales automatically, hence automatically uniformly integrable (unlike those pesky forward martingales, whose convergence theorems require extra care, e.g. uniform integrability, to get $L^1$ convergence).
Theorem: Let $\{X_n\}$ be a reverse martingale with respect to $(\mathcal F_n)$. Then, $X_n$ converges almost surely and in $L^1$ to $X_\infty = \mathbb E(X_1 | \mathcal F_{\infty})$, where $\mathcal F_\infty = \cap_{i = 1}^\infty \mathcal F_i$.
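To see the theorem in action on a classical instance (my own illustration, not part of the argument): for i.i.d. $\xi_1,\xi_2,\dots$ and the decreasing filtration $\mathcal F_n = \sigma(S_n, \xi_{n+1}, \xi_{n+2}, \dots)$ with $S_n = \xi_1+\dots+\xi_n$, the standard exchangeability computation gives $X_n := \mathbb E(\xi_1 | \mathcal F_n) = S_n/n$, a reverse martingale; the theorem then says $S_n/n$ converges a.s. and in $L^1$, recovering the strong law of large numbers. A minimal simulation sketch:

```python
import random

# Simulate the reverse martingale X_n = E(xi_1 | F_n) = S_n / n for xi_i ~ Uniform[0, 1];
# by the reverse martingale convergence theorem it converges a.s. (here to E(xi_1) = 0.5).
random.seed(0)
s = 0.0
for n in range(1, 100_001):
    s += random.random()          # xi_n ~ Uniform[0, 1]
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {n:6d}   X_n = S_n/n = {s / n:.4f}")
```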
Since $Y_n = \mu(f | T^{-n} \mathcal B)$ is a (reverse) Lévy martingale with respect to the decreasing filtration $\mathcal F_n = T^{-n}\mathcal B$, it converges $\mu$-almost surely and in $L^1$ to $\mu(f | \mathcal B_\infty)$, where $\mathcal B_\infty = \cap_{i = 0}^\infty T^{-i} \mathcal B$.
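To make the identity $\mu(f \,|\, T^{-n}\mathcal B) = f\circ T^n$ concrete, here is a small finite check (my own toy construction, not part of the argument): the conditional expectation, computed by averaging $f$ over the atoms of $T^{-n}\mathcal B$, coincides with $f\circ T^n$ on every atom of positive $\mu$-measure. On a finite space the filtration stabilizes after finitely many steps, so this illustrates the identity rather than the convergence.

```python
# Toy system: a 2-cycle {0,1}, a fixed point 2, and a tail 4 -> 3 -> 0 carrying no mass.
X = [0, 1, 2, 3, 4]
T = {0: 1, 1: 0, 2: 2, 3: 0, 4: 3}

mu = {0: 0.3, 1: 0.3, 2: 0.4, 3: 0.0, 4: 0.0}   # T-invariant
nu = {0: 0.2, 1: 0.2, 2: 0.6, 3: 0.0, 4: 0.0}   # T-invariant, nu << mu
f = {x: (nu[x] / mu[x] if mu[x] > 0 else 0.0) for x in X}

def iterate(x, n):
    for _ in range(n):
        x = T[x]
    return x

for n in range(1, 6):
    # Atoms of T^{-n}B are the nonempty preimages T^{-n}{y}; x lies in the atom labelled T^n(x).
    atoms = {}
    for x in X:
        atoms.setdefault(iterate(x, n), []).append(x)
    for y, atom in atoms.items():
        mass = sum(mu[x] for x in atom)
        if mass == 0:
            continue  # conditional expectation is only defined up to mu-null sets
        cond_exp = sum(f[x] * mu[x] for x in atom) / mass     # mu(f | T^{-n}B) on this atom
        for x in atom:
            assert abs(cond_exp - f[iterate(x, n)]) < 1e-12   # equals (f o T^n)(x)
print("mu(f | T^{-n} B) = f o T^n on all positive-mass atoms, n = 1..5")
```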
Proving $f$ is $T$-invariant
Write $f_\infty = \mu(f | \mathcal B_\infty)$, noting that $f \circ T^n$ converges to $f_\infty$ in measure ($\mu$).
To prove $f$ is $T$-invariant, let $I \subset \mathbb R$ be an interval. We will prove that for "most" intervals, $$D_I := \mu\big( \{ f \in I \} \Delta \{ f \circ T \in I \} \big) = 0,$$ where $\Delta$ denotes symmetric difference.
Fix $\epsilon > 0$ and let $n$ be large enough that $\mu(|f\circ T^m - f_\infty| > \epsilon) < \epsilon$ for $m = n$ and $m = n+1$. Since the two symmetric differences below differ only by pulling back under $T^n$ and $\mu$ is $T$-invariant,
$$ D_I = \mu \big( \{ f \circ T^n \in I\} \Delta \{ f \circ T^{n+1} \in I \} \big) \leq 2\epsilon + \mu\big(\{f \circ T^n \in I\} \cap \{ f \circ T^{n+1} \in I_{2\epsilon} \setminus I \}\big) + \mu\big(\{f \circ T^{n+1} \in I\} \cap \{ f \circ T^{n} \in I_{2\epsilon} \setminus I \}\big) \, , $$ where $I_{2\epsilon}$ is the "fattening" of $I$ by $2\epsilon$ ($I_{2\epsilon} = [a - 2\epsilon, b + 2\epsilon]$ when $I = [a,b]$): indeed, off a set of measure at most $2\epsilon$, both $f\circ T^n$ and $f\circ T^{n+1}$ lie within $\epsilon$ of $f_\infty$, hence within $2\epsilon$ of each other. Pulling back under $T^n$ once more, we have shown that
$$ D_I \leq 2\epsilon + \mu\big(\{f \in I\} \cap \{ f \circ T \in I_{2\epsilon} \setminus I \}\big) + \mu\big(\{f \circ T \in I\} \cap \{ f \in I_{2\epsilon} \setminus I \}\big) $$ for every $\epsilon > 0$. Take $I = [a,b]$ with $a,b$ chosen from the (at worst co-countable) set of points $c$ for which $\mu \{ f = c\} = \mu\{ f \circ T = c \} = 0$. Letting $\epsilon \to 0$, the sets $I_{2\epsilon}\setminus I$ shrink to at most the endpoints $\{a,b\}$, which carry no mass, so both extra terms vanish and $D_I = 0$. Since this holds for all intervals $[a,b]$ with endpoints in a co-countable (hence dense) set, taking countable unions of such intervals shows that $\{f \geq c\}$ and $\{f\circ T \geq c\}$ agree up to $\mu$-null sets for a dense set of $c$, and therefore $f = f\circ T$ $\mu$-almost everywhere; that is, $f$ is $T$-invariant.
Solution 2:
A Blumenthal has already given a neat answer, but there is a more elementary argument for the invariance of $f$ that requires neither the ergodic theorem nor the backward martingale convergence theorem. This argument appears in Peter Walters' book (Theorem 6.10).
First, note that for every measurable $E$ (the subtractions below are legitimate because $\mu$ is finite), \begin{align*} \mu(T^{-1}E\setminus E) &= \mu(T^{-1}E)-\mu(T^{-1}E\cap E) \\ &= \mu(E) - \mu(T^{-1}E\cap E) \\ &= \mu(E\setminus T^{-1}E) \;. \end{align*} This holds for every invariant finite measure; in particular, \begin{align*} \nu(T^{-1}E\setminus E) &= \nu(E\setminus T^{-1}E) \end{align*} for every measurable $E$.
Now, for $r>0$, let $E_r:=\{x: f(x)<r\}$. Then, \begin{align*} \int_{T^{-1}E_r\setminus E_r}f\,\mathrm{d}\mu &= \int_{E_r\setminus T^{-1}E_r}f\,\mathrm{d}\mu \;. \end{align*} Observe that $f\geq r$ on $T^{-1}E_r\setminus E_r$ and $f<r$ on $E_r\setminus T^{-1}E_r$. Therefore, $\mu(T^{-1}E_r\setminus E_r)=\mu(E_r\setminus T^{-1}E_r)=0$.
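In more detail (spelling out two steps that are implicit above): the displayed equality holds because both sides are $\nu$-measures, $\int_A f\,\mathrm{d}\mu = \nu(A)$, and these agree by the previous identity applied to the invariant measure $\nu$; and the conclusion follows from
$$ r\,\mu(T^{-1}E_r\setminus E_r) \;\leq\; \int_{T^{-1}E_r\setminus E_r} f\,\mathrm{d}\mu \;=\; \int_{E_r\setminus T^{-1}E_r} f\,\mathrm{d}\mu \;\leq\; r\,\mu(E_r\setminus T^{-1}E_r) \;, $$
where the two outer quantities are equal by the first identity applied to $\mu$; if this common value were positive, the last inequality would be strict (since $f<r$ pointwise on $E_r\setminus T^{-1}E_r$), a contradiction.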
In words, this says that for every $r>0$, the set of points $x$ such that either $f(Tx)<r\leq f(x)$ or $f(x)<r\leq f(Tx)$ has $\mu$-measure $0$, and this means $f(x)$ and $f(Tx)$ must agree almost everywhere.
(More precisely, since $f\geq 0$, every $x$ with $f(Tx)<f(x)$ satisfies $f(Tx)<r\leq f(x)$ for some $r\in\mathbb{Q}^+$, so \begin{align*} \mu\left(\{x: f(Tx)<f(x)\}\right) &\leq \sum_{r\in\mathbb{Q}^+} \mu(T^{-1}E_r\setminus E_r) = 0 \;, \end{align*} and similarly $\mu\left(\{x: f(x)<f(Tx)\}\right)=0$. Hence, $f\circ T=f$ $\mu$-almost everywhere.)
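As a hedged numerical companion to this argument (reusing the toy non-invertible system from the question's sanity check, which is my own example), one can verify on a grid of thresholds $r$ that the sets $T^{-1}E_r\setminus E_r$ and $E_r\setminus T^{-1}E_r$ have equal $\mu$-measure and that this common measure vanishes, and then that $f\circ T = f$ at every point of positive $\mu$-mass:

```python
# Toy non-invertible system: X = {a, b, c}, T(a) = a, T(b) = b, T(c) = a.
X = ["a", "b", "c"]
T = {"a": "a", "b": "b", "c": "a"}
mu = {"a": 0.5, "b": 0.5, "c": 0.0}   # T-invariant
nu = {"a": 0.25, "b": 0.75, "c": 0.0} # T-invariant, nu << mu
f = {x: (nu[x] / mu[x] if mu[x] > 0 else 0.0) for x in X}

def m(meas, E):
    return sum(meas[x] for x in E)

for r in [k / 100 for k in range(1, 301)]:          # a grid of thresholds r > 0
    E_r = {x for x in X if f[x] < r}                 # E_r = {f < r}
    pre = {x for x in X if T[x] in E_r}              # T^{-1} E_r
    lhs, rhs = m(mu, pre - E_r), m(mu, E_r - pre)
    assert abs(lhs - rhs) < 1e-12                    # mu(T^{-1}E_r \ E_r) = mu(E_r \ T^{-1}E_r)
    assert lhs < 1e-12 and rhs < 1e-12               # ... and both vanish, as argued above

# Hence f o T = f mu-almost everywhere (here: at every point of positive mu-mass).
assert all(mu[x] == 0 or abs(f[T[x]] - f[x]) < 1e-12 for x in X)
print("all checks passed")
```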