Difference in probability distributions from two different kernels

The notation can be a bit cumbersome because of the nested integrals, but this solution relies only on very basic properties of integration and is direct (no induction).

Consider the difference $a(x_0,F)=\mathsf P'_{x_0}(F)-\mathsf P_{x_0}(F)$. By uniqueness of measure it follows from the definition of $\mathsf P_{x_0}$ that: $$\begin{align} a(x_0,F) =&\int_E\dots\int_E 1_F(x_1,\dots x_n) P'(x_{n-1},dx_n)\dots P'(x_0,dx_1)\\ &- \int_E\dots\int_E 1_F(x_1,\dots x_n) P(x_{n-1},dx_n)\dots P(x_0,dx_1) \end{align}$$
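
When $E$ is finite, the nested integrals are plain sums over paths, which may help in parsing the notation: writing $P(x,y)$ for $P(x,\{y\})$, $$\mathsf P_{x_0}(F)=\sum_{x_1,\dots x_n\in E} 1_F(x_1,\dots x_n)\,P(x_0,x_1)P(x_1,x_2)\cdots P(x_{n-1},x_n),$$ and similarly for $\mathsf P'_{x_0}$ with $P'$ in place of $P$.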

By introducing intermediate telescoping terms we can split this into a sum of $n$ terms $a(x_0,F)=\sum_{j=1}^n a_j(x_0,F)$ where $$\begin{align} a_j(x_0,F) =& \int_E\dots\int_E 1_F(x_1,\dots x_n) P'(x_{n-1},dx_n) \dots P'(x_{j-1},dx_j) P(x_{j-2},dx_{j-1})\dots P(x_0,dx_1)\\ &- \int_E\dots\int_E 1_F(x_1,\dots x_n) P'(x_{n-1},dx_n) \dots P'(x_j,dx_{j+1}) P(x_{j-1},dx_j)\dots P(x_0,dx_1) \end{align}$$
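
For instance, for $n=2$ the telescoping amounts to adding and subtracting the mixed term with $P$ in the first step and $P'$ in the second: $$\begin{align} a(x_0,F) =& \underbrace{\int_E\int_E 1_F(x_1,x_2) P'(x_1,dx_2)P'(x_0,dx_1) - \int_E\int_E 1_F(x_1,x_2) P'(x_1,dx_2)P(x_0,dx_1)}_{a_1(x_0,F)}\\ &+ \underbrace{\int_E\int_E 1_F(x_1,x_2) P'(x_1,dx_2)P(x_0,dx_1) - \int_E\int_E 1_F(x_1,x_2) P(x_1,dx_2)P(x_0,dx_1)}_{a_2(x_0,F)} \end{align}$$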

The innermost part (consisting of $n-j$ nested integrals) is common to both terms; to make things more readable we factor it out as $$g_j(x_1,\dots x_j) = \displaystyle\int_E\dots\int_E 1_F(x_1,\dots x_n) P'(x_{n-1},dx_n)\dots P'(x_j,dx_{j+1})$$ Then by linearity and $|\int f|\le\int |f|$: $$\begin{align} |a_j(x_0,F)|\le& \int_E\dots\int_E\left|\int_E g_j(x_1,\dots x_j) \left(P'(x_{j-1},dx_j)-P(x_{j-1},dx_j)\right)\right| P(x_{j-2},dx_{j-1})\dots P(x_0,dx_1)\\ \le& \int_E\dots\int_E\|P'-P\| P(x_{j-2},dx_{j-1})\dots P(x_0,dx_1)\\ =& \|P'-P\|\\ |a(x_0,F)|\le& n\cdot\|P'-P\| \end{align}$$ (the inner integral is bounded by $\|P'-P\|$ by applying the definition of $\|P'-P\|$ to the function $g_j(x_1,\dots x_{j-1},\cdot)$, which takes values in $[0,1]$)
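
As a sanity check (not part of the proof), the bound is easy to test numerically on a finite state space. The snippet below is a minimal sketch with names of my choosing, and it assumes the norm $\|P'-P\| = \sup_x\sup_A |P'(x,A)-P(x,A)|$, i.e. half the maximal row-wise $\ell_1$ difference for finite $E$; the excerpt does not fix the norm, so adjust if your convention differs.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
S, n = 3, 3                      # |E| = S states, paths of length n

def random_kernel(s):
    """A random Markov kernel on {0,...,s-1}: each row sums to 1."""
    M = rng.random((s, s))
    return M / M.sum(axis=1, keepdims=True)

P, Pp = random_kernel(S), random_kernel(S)      # P and P' (made-up data)

# assumed norm: ||P' - P|| = sup_x sup_A |P'(x,A) - P(x,A)|,
# which on a finite space is half the maximal row-wise l1 difference
norm = 0.5 * np.abs(Pp - P).sum(axis=1).max()

def path_prob(K, x0, path):
    """Probability of the path (x_1,...,x_n) under kernel K started at x0."""
    p, prev = 1.0, x0
    for x in path:
        p, prev = p * K[prev, x], x
    return p

x0 = 0
diffs = [path_prob(Pp, x0, w) - path_prob(P, x0, w)
         for w in itertools.product(range(S), repeat=n)]
# sup over F of |a(x0,F)| is attained by collecting the paths with a
# positive difference (both measures have total mass 1)
sup_a = sum(d for d in diffs if d > 0)

assert sup_a <= n * norm
print(f"sup_F |a(x0,F)| = {sup_a:.4f} <= n*||P'-P|| = {n * norm:.4f}")
```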


I believe I have a solution to the problem, so I am posting it here. I would be happy if you could comment on whether it is correct, or perhaps provide a shorter and neater one.

  1. First of all, I change the notation a bit and write $\mathsf P_x^n$ instead of the $\mathsf P_x$ of the OP, to make the dependence on $n$ explicit; here $\mathsf P_x^n$ denotes the probability measure on the path space $(E^{n+1},\mathcal E^{n+1})$ of the chain $(x_0,x_1,\dots,x_n)$ started at $x_0 = x$ (the measure of the OP is recovered by dropping the deterministic coordinate $x_0$, i.e. by restricting to sets of the form $E\times F$). Then for all measurable rectangles $A_0\times B$ with $A_0\in \mathcal E$ and $B = A_1\times A_2\times\dots\times A_n\in \mathcal E^{n}$ it holds that $$ \mathsf P_x^n(A_0\times B) = 1_{A_0}(x)\int\limits_{A_1}\dots \int\limits_{A_n}P(x_{n-1},dx_n)\dots P(x,dx_1) = 1_{A_0}(x)\int\limits_E \mathsf P_{y}^{n-1}(B)P(x,dy). $$ By the uniqueness of the probability measure $\mathsf P_x^n$ the same identity holds for any $B\in \mathcal E^{n}$ (a worked $n=2$ instance is spelled out after this list): $$ \mathsf P_x^n(A_0\times B) = 1_{A_0}(x)\int\limits_E \mathsf P_{y}^{n-1}(B)P(x,dy). \tag{1} $$

  2. For any set $C\in \mathcal E^{n+1} = \mathcal E\otimes\mathcal E^{n}$ we can show that $$ \mathsf P_x^n(C) = \int\limits_E \mathsf P_y^{n-1}(C_x)P(x,dy) \tag{2} $$ where $C_x = \{y\in E^{n}:(x,y)\in C\}\in\mathcal E^{n}$. To prove it, we first verify $(2)$ for measurable rectangles $C = A\times B$ using $(1)$: in that case $C_x = B$ if $x\in A$ and $C_x = \emptyset$ otherwise. Following the advice of @tb, this result further extends to all $C\in \mathcal E^{n+1}$ by the $\pi$-$\lambda$ theorem (a sketch of this step is given after the list).

  3. The inequality $\left|\tilde{\mathsf P}_x^n(C) - \mathsf P_x^n(C)\right|\leq n\|\tilde P-P\|$ can then be proved by induction. It clearly holds for $n=1$: $$ \left|\tilde{\mathsf P}^1_x(C) - \mathsf P^1_x(C)\right| = \left|\tilde P(x,C_x) - P(x,C_x)\right|\leq 1\cdot\|\tilde P - P\|. $$ If the same inequality holds for $n-1$, then adding and subtracting $\int_E \mathsf P_y^{n-1}(C_x)\tilde P(x,dy)$ and using $(2)$ we have $$ \begin{align} \left|\tilde{\mathsf P}_x^n(C) - \mathsf P_x^n(C)\right| &= \left|\int\limits_E \tilde{\mathsf P}_y^{n-1}(C_x)\tilde P(x,dy)-\int\limits_E \mathsf P_y^{n-1}(C_x)P(x,dy)\right| \\ &\leq \left|\int\limits_E \left(\tilde{\mathsf P}_y^{n-1}(C_x) - \mathsf P_y^{n-1}(C_x)\right)\tilde P(x,dy)\right|+\left|\langle \tilde P(x,\cdot) - P(x,\cdot),\mathsf P_{(\cdot)}^{n-1}(C_x)\rangle\right| \\ &\leq (n-1)\|\tilde P - P\|+\|\tilde P - P\| = n\|\tilde P - P\| \end{align} $$ where $\langle \mu,f\rangle = \int\limits_E f(y)\,\mu(dy)$ for a finite signed measure $\mu$ and a bounded measurable $f$; the first term is bounded by moving the absolute value inside the integral and applying the induction hypothesis, the second by the definition of $\|\tilde P - P\|$, since $0\leq\mathsf P_{(\cdot)}^{n-1}(C_x)\leq 1$. Since the right-hand side of this bound does not depend on $x$ or $C$, we have proved the desired inequality.
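
As promised above, here is identity $(1)$ written out for $n=2$: since $\mathsf P^1_y(A_1\times A_2) = 1_{A_1}(y)P(y,A_2)$, $$ \mathsf P_x^2(A_0\times A_1\times A_2) = 1_{A_0}(x)\int\limits_{A_1} P(x_1,A_2)\,P(x,dx_1) = 1_{A_0}(x)\int\limits_E \mathsf P^1_y(A_1\times A_2)\,P(x,dy). $$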
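
A sketch of the $\pi$-$\lambda$ step in item 2: let $\mathcal D$ be the collection of all $C\in\mathcal E^{n+1}$ satisfying $(2)$. The measurable rectangles form a $\pi$-system generating $\mathcal E^{n+1}$, and they lie in $\mathcal D$ by the argument above. Moreover $\mathcal D$ is a $\lambda$-system: $E^{n+1}\in\mathcal D$ since both sides of $(2)$ equal $1$; if $C\in\mathcal D$ then $C^c\in\mathcal D$, because $(C^c)_x = (C_x)^c$ and $$ \int\limits_E \mathsf P_y^{n-1}\big((C_x)^c\big)P(x,dy) = \int\limits_E \left(1-\mathsf P_y^{n-1}(C_x)\right)P(x,dy) = 1 - \mathsf P_x^n(C) = \mathsf P_x^n(C^c); $$ and for an increasing sequence $C_k\uparrow C$ with $C_k\in\mathcal D$ we have $(C_k)_x\uparrow C_x$, so $(2)$ passes to the limit by continuity of measure and monotone convergence. Hence $\mathcal D = \mathcal E^{n+1}$ by the $\pi$-$\lambda$ theorem.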
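
Finally, again only as a numerical sanity check, with the same assumed norm convention as in the snippet after the first answer and names of my choosing, one can implement the recursion $(2)$ on a finite state space and test the bound on random sets $C$:

```python
import numpy as np

rng = np.random.default_rng(1)
S, n = 3, 3                                   # |E| = S states, n steps

def random_kernel(s):
    """A random Markov kernel on {0,...,s-1}: each row sums to 1."""
    M = rng.random((s, s))
    return M / M.sum(axis=1, keepdims=True)

P, Pt = random_kernel(S), random_kernel(S)    # P and P-tilde (made-up data)

def measure(K, x, C):
    """P^m_x(C) computed via the recursion (2); C is a boolean array over
    E^(m+1), its first axis indexing the deterministic coordinate x_0."""
    if C.ndim == 1:               # m = 0: the path is just x_0 = x
        return float(C[x])
    # take the section C_x along x_0, then integrate the start of the
    # remaining chain against K(x, dy)
    return sum(K[x, y] * measure(K, y, C[x]) for y in range(S))

norm = 0.5 * np.abs(Pt - P).sum(axis=1).max() # assumed ||P~ - P||

for _ in range(200):                          # random sets C in E^(n+1)
    C = rng.random((S,) * (n + 1)) < 0.5
    for x in range(S):
        assert abs(measure(Pt, x, C) - measure(P, x, C)) <= n * norm
print("bound n*||P~-P|| verified on 200 random sets")
```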