Solution 1:

This is actually an important and much-used concept in the study of Markov chains. The number $\delta$ is meaningful because one assumes $\mu$ is a probability measure (and not merely a measure). Then $\delta$ is used to evaluate the rate at which the chain loses memory of its initial state.

Solution 2:

Interpretation 1

If $A$ is small, then each time the chain enters $A$ it has a positive probability of regenerating, that is, of forgetting its past.

If a transition kernel is of the form $$ p^{(1)}(x,B) = \nu(B) \quad{(*)} $$ for all $x$ and $B$, then the states of the chain are independent: after the first step, every state is a fresh draw from $\nu$. This is because the transition doesn't depend on $x$, the point it is coming from.

With a small set, we only look at $x \in A$. If $A$ is small with $k=1$, then $p^{(1)}(x,\cdot) \ge \delta \nu(\cdot)$ for all $x \in A$, and, provided $\delta < 1$, we can write the transition kernel as follows:
\begin{align*} p^{(1)}(x,\cdot) &= \delta \nu(\cdot) + p^{(1)}(x,\cdot) -\delta \nu(\cdot)\\ &= \delta \nu(\cdot) + (1-\delta) \frac{p^{(1)}(x,\cdot) -\delta \nu(\cdot)}{1-\delta} \\ &= \delta \nu(\cdot) + (1-\delta)K(x, \cdot). \end{align*}
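As a concrete check of this decomposition, here is a minimal Python sketch under assumptions of my own choosing (they are not part of the answer): the Gaussian random walk $p(x, dy) = \varphi(y - x)\,dy$ with $\varphi$ the standard normal density, the set $A = [-1,1]$, $\nu$ uniform on $[-1,1]$, and $\delta = 2\varphi(2) \approx 0.108$ (for $x, y \in A$ the density $\varphi(y-x)$ is at least $\varphi(2)$, and $\nu$ has density $1/2$ there). It evaluates the residual density $K(x,\cdot)$ on a grid and reports its smallest value and its total mass; `residual_density` and `check_residual` are hypothetical helper names.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Assumed example (not from the original answer): Gaussian random walk
# p(x, dy) = phi(y - x) dy, small set A = [-1, 1], nu = Uniform[-1, 1].
A_LO, A_HI = -1.0, 1.0
# For x, y in A, phi(y - x) >= phi(2); nu has density 1/2 on A, so delta = 2 * phi(2).
DELTA = 2.0 * phi(2.0)

def nu_density(y):
    """Density of the assumed minorizing measure nu = Uniform[-1, 1]."""
    return 0.5 if A_LO <= y <= A_HI else 0.0

def residual_density(x, y):
    """Density of K(x, .) = (p(x, .) - delta * nu) / (1 - delta)."""
    return (phi(y - x) - DELTA * nu_density(y)) / (1.0 - DELTA)

def check_residual(x, lo=-13.0, hi=13.0, n=50_000):
    """Return (smallest grid value, trapezoidal total mass) of K(x, .) for x in A."""
    h = (hi - lo) / n
    vals = [residual_density(x, lo + i * h) for i in range(n + 1)]
    mass = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return min(vals), mass

print(f"delta = {DELTA:.4f}")
for x in (-1.0, 0.0, 0.7, 1.0):
    lowest, mass = check_residual(x)
    print(f"x = {x:+.1f}: min K density = {lowest:.2e}, total mass = {mass:.5f}")
```

The minimum should come out nonnegative (up to rounding) and the mass close to $1$, which is exactly what it means for $K(x,\cdot)$ to be a transition kernel.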

This is a discrete mixture: with probability $\delta$ you transition according to $\nu$ and forget the past, and with probability $1-\delta$ you transition according to $K(x,\cdot)$, which does take into account where you're coming from. The smallness property is exactly what makes $K$ nonnegative, and since $\nu$ is a probability measure, $K(x,\cdot)$ has total mass $\frac{1-\delta}{1-\delta} = 1$, so it is a genuine transition kernel. Beyond that, you can think of it as just algebra.
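The mixture reading also tells you how to simulate one step from $x \in A$ by first flipping a $\delta$-coin; this is the idea behind what is usually called Nummelin splitting. Below is a minimal sketch, assuming (my choice, not the answer's) the Gaussian random walk $p(x,dy) = \varphi(y-x)\,dy$, $A = [-1,1]$, $\nu$ uniform on $[-1,1]$ and $\delta = 2\varphi(2)$. The $K$ branch uses rejection sampling with $p(x,\cdot)$ as the proposal; `split_step` is a hypothetical name.

```python
import math
import random

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Assumed example: Gaussian random walk p(x, dy) = phi(y - x) dy,
# A = [-1, 1], nu = Uniform[-1, 1], delta = 2 * phi(2).
A_LO, A_HI = -1.0, 1.0
DELTA = 2.0 * phi(2.0)

def nu_density(y):
    return 0.5 if A_LO <= y <= A_HI else 0.0

def split_step(x, rng):
    """Draw one transition from x in A via the mixture delta * nu + (1 - delta) * K.

    Returns (y, regenerated); regenerated is True when the nu branch was used,
    i.e. when y was drawn without looking at x at all.
    """
    if not (A_LO <= x <= A_HI):
        raise ValueError("the split only applies to x in the small set A")
    if rng.random() < DELTA:
        return rng.uniform(A_LO, A_HI), True  # forget the past: y ~ nu
    # Residual branch: rejection sampling from K(x, .) with proposal p(x, .) = N(x, 1).
    # Accepting with probability 1 - delta * nu(y) / p(x, y) leaves density
    # proportional to p(x, y) - delta * nu(y); smallness keeps this in [0, 1].
    while True:
        y = rng.gauss(x, 1.0)
        if rng.random() < 1.0 - DELTA * nu_density(y) / phi(y - x):
            return y, False

rng = random.Random(0)
x0 = 0.5
draws = [split_step(x0, rng) for _ in range(100_000)]
ys = [y for y, _ in draws]
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
regen_rate = sum(1 for _, r in draws if r) / len(draws)
print(f"mean = {mean:.3f} (p(x0, .) has mean {x0}), var = {var:.3f} (should be near 1)")
print(f"regeneration rate = {regen_rate:.3f} vs delta = {DELTA:.3f}")
```

If the split is right, the draws are distributed as $N(x_0, 1)$ overall, while roughly a fraction $\delta$ of them carry no information about $x_0$.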

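As @Did mentions, "$\delta$ is used to evaluate the rate of the loss of memory of the initial state by the chain." You can see that if $\delta = 1$, then (for $x \in A$) we recover equation $(*)$: both $p^{(1)}(x,\cdot)$ and $\nu$ are probability measures, so $p^{(1)}(x,\cdot) \ge \nu$ forces them to be equal. That's the maximum forgetting rate; the chain never even thinks about where it's coming from.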
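To make "rate of loss of memory" quantitative: when the whole state space is small with $k=1$ (Doeblin's condition), a standard coupling argument gives $\max_{x,x'} \|P^n(x,\cdot) - P^n(x',\cdot)\|_{TV} \le (1-\delta)^n$. The sketch below checks this numerically for a hypothetical 3-state kernel `P` made up for illustration, taking $\delta\nu(z) = \min_x P(x,z)$, which is the largest one-step minorization of the whole space.

```python
def mat_mul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def tv(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Hypothetical 3-state kernel, made up for illustration.
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.2, 0.5],
]
# Largest one-step minorization of the whole space: delta * nu(z) = min_x P(x, z).
col_min = [min(row[j] for row in P) for j in range(len(P[0]))]
delta = sum(col_min)                # about 0.6 for this P
nu = [m / delta for m in col_min]   # a probability vector with P(x, .) >= delta * nu

def memory_gap(P, n):
    """max over start states x, x' of TV(P^n(x, .), P^n(x', .))."""
    Pn = P
    for _ in range(n - 1):
        Pn = mat_mul(Pn, P)
    return max(tv(Pn[i], Pn[j]) for i in range(len(P)) for j in range(len(P)))

for n in range(1, 9):
    print(f"n = {n}: gap = {memory_gap(P, n):.2e}, bound (1 - delta)^n = {(1 - delta) ** n:.2e}")
```

The larger $\delta$ is, the faster the rows of $P^n$ become indistinguishable; with $\delta = 1$ every row is already $\nu$ after one step.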

Other things: If $k > 1$, then it takes longer to forget, and if we’re talking about “petiteness” then it’s the same idea but with a random number of steps.

Interpretation 2

Smallness has nothing to do with the Lebesgue measure of a set in the state space. There are many examples of chains where the entire state space is small.

However, the word "small" is still intuitive. Instead of using Lebesgue measure to describe the size of a set, you can use the chain's transition kernel.

Here's an example from page 107 of Meyn and Tweedie's book. Suppose we have a random walk model

$$ \Phi_n = \Phi_{n-1} + W_n, $$

and let's try to visualize its transition densities $f(\Phi_n \mid \Phi_{n-1})$ for a few values of $\Phi_{n-1}$. For simplicity, suppose the $W_n$ are iid standard normal, so $f(\cdot \mid \Phi_{n-1})$ is the $N(\Phi_{n-1}, 1)$ density. Here's a picture to help.

[Figure: some transition densities $f(\Phi_n \mid \Phi_{n-1})$]

Notice that when $-1 \le \Phi_{n-1} \le 1$, all of the transition densities I plotted are bounded below on the interval $[-1,1]$ by the horizontal line. That horizontal line is proportional to the uniform density on $[-1,1]$. The interval $[-1,1]$ is "small" because this lower bound is strictly positive. If you spread the interval out further, the lower bound shrinks toward $0$, and if you stretched it to the whole real line you would lose the smallness condition altogether.
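The height of that horizontal line can be computed. Assuming, as in the picture, standard normal increments and a line sitting at the smallest density value on the interval, the minorization of $[-c, c]$ by $\nu$ uniform on $[-c, c]$ has $\delta(c) = 2c\,\varphi(2c)$, because $\varphi(y - x)$ over $x, y \in [-c, c]$ is smallest at $|y - x| = 2c$. This Python sketch (helper names are hypothetical) evaluates that formula and confirms the minimum by brute force on a grid.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def uniform_minorization(c):
    """Best delta with p(x, .) >= delta * Uniform[-c, c] for all x in [-c, c].

    The horizontal line sits at height phi(2c); spread over width 2c this
    gives delta = 2c * phi(2c).
    """
    return 2.0 * c * phi(2.0 * c)

def brute_force_min(c, n=201):
    """Grid check of min phi(y - x) over x, y in [-c, c]."""
    grid = [-c + 2.0 * c * i / (n - 1) for i in range(n)]
    return min(phi(y - x) for x in grid for y in grid)

for c in (0.5, 1.0, 2.0, 3.0):
    print(f"c = {c}: line height = {phi(2 * c):.3e}, "
          f"grid min = {brute_force_min(c):.3e}, delta = {uniform_minorization(c):.3e}")
```

For $c = 1$ this gives $\delta = 2\varphi(2) \approx 0.108$. The value stays positive for every finite $c$, so a wider interval is still small, but it decreases rapidly once $c > 1/2$.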

If you enlarge the small set by pushing its endpoints farther away from zero, it still remains small; only the lower bound gets smaller. The word "small" can be misleading in this sense, which is why I prefer to think of "small" as relative to something else. One small set can be "larger" than another small set, so in this case compare the small set to its complement: for any fixed region, the inside of the boundary is small, and the outside is large (vast).

Solution 3:

Yes, you can find a lot on related topics in Jeffrey Rosenthal's papers, especially "General State Space Markov Chains and MCMC Algorithms" (with Gareth Roberts).

Another useful reference is Nummelin's paper "MC's for MCMC'ists". It is more intuitive than Rosenthal's, and I found it really useful when I wrote a paper on this topic this summer.