Understanding the relationship between the $L^1$ distance and the total variation distance of probability measures, and the resulting variance bound

I am trying to find a bound on the variance of an arbitrary distribution $f_Y$ given a bound on the Kullback-Leibler divergence from a zero-mean Gaussian to $f_Y$, as I've explained in this related question. From page 10 of this article, it seems to me that:

$$\frac{1}{2}\left(\int_{-\infty}^{\infty}|p_Z(x)-p_Y(x)|dx\right)^2 \leq D(p_Z\|p_Y)$$

I have two questions:

1) How does this come about? The LHS is somehow related to the total variation distance, which is $\sup\left\{\left|\int_A f_X(x)dx-\int_A f_Y(x)dx\right|:A \subset \mathbb{R}\right\}$ according to the Wikipedia article, but I don't see the connection. Can someone elucidate?

2) Section 6 on page 10 of the same article seems to talk about variation bounds, but I can't understand it. Can someone "translate" it into language that someone who has taken a graduate-level course in probability can understand? (I haven't taken measure theory, unfortunately.)


1) Check out Lemma 11.6.1 in Elements of Information Theory by Cover and Thomas.
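
Here is a standard sketch of the connection you ask about (it is not specific to the linked article). Let $A^* = \{x : p_Z(x) \ge p_Y(x)\}$. Since both densities integrate to $1$, the positive and negative parts of $p_Z - p_Y$ carry equal mass, so

$$\sup_{A}\left|\int_A p_Z(x)\,dx-\int_A p_Y(x)\,dx\right| = \int_{A^*}\big(p_Z(x)-p_Y(x)\big)\,dx = \frac{1}{2}\int_{-\infty}^{\infty}|p_Z(x)-p_Y(x)|\,dx.$$

That is, the total variation distance is exactly half the $L^1$ distance, and the displayed inequality is Pinsker's inequality written in terms of the $L^1$ norm (in nats; Cover and Thomas state it with a $\frac{1}{2\ln 2}$ factor because they measure divergence in bits).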

2) The integral on the LHS is (up to a factor of 2) the total variation distance between the probability measures $p_Z$ and $p_Y$ (see here). I think "variation bounds" quite literally means bounds on the total variation distance between the probability measures, as given in the Lemma on p. 11.
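
If a numerical illustration helps, here is a minimal sanity check of the inequality (Python with NumPy/SciPy; the two Gaussians and the integration range are arbitrary choices of mine, not anything taken from the article):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Two example densities: a standard Gaussian p_Z and a shifted, wider Gaussian p_Y.
# (Picked arbitrarily for illustration; any pair of densities would do.)
m_Z, s_Z = 0.0, 1.0
m_Y, s_Y = 0.5, 1.2
p_Z, p_Y = norm(m_Z, s_Z), norm(m_Y, s_Y)

# L1 distance: integral of |p_Z(x) - p_Y(x)| dx, computed numerically
# over a wide finite range that captures essentially all of the mass.
l1, _ = quad(lambda x: abs(p_Z.pdf(x) - p_Y.pdf(x)), -15, 15)

# KL divergence D(p_Z || p_Y) between two Gaussians, in nats (closed form).
kl = np.log(s_Y / s_Z) + (s_Z**2 + (m_Z - m_Y)**2) / (2 * s_Y**2) - 0.5

print(f"(1/2) * (L1 distance)^2 = {0.5 * l1**2:.4f}")  # LHS of the inequality
print(f"D(p_Z || p_Y)           = {kl:.4f}")           # RHS, should be >= LHS
```

In this example the inequality holds with a comfortable gap, as Pinsker's inequality guarantees for any pair of densities.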