Definition of measure-preserving: why inverse image?

Solution 1:

A simple answer is that for a measurable map $T$, the image $T(A)$ of a measurable set $A$ need not be measurable, and thus $\mu(T(A))$ need not be defined. On the other hand, if $T$ is measurable, then $T^{-1}(A)$ is always measurable, so $\mu(T^{-1}(A))$ is always defined. One could, however, work with images for a measurable analogue of the so-called open maps (those for which the image of an open set is open).

A less simple (and somewhat more important) answer is that with the alternative you mention, one would leave out of the game many noninvertible transformations: all expanding maps of the circle or of any torus, all toral endomorphisms that are not automorphisms, etc., and this already for the Lebesgue measure alone. Now imagine one-sided shifts and the associated Markov or Bernoulli measures. Symbolic dynamics and its measurable counterpart would again be out of the game in the noninvertible case.
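
To make this concrete, here is a worked example with one particular expanding map of the circle, the doubling map $T(x)=2x \bmod 1$ on $[0,1)$ with the Lebesgue measure $\mu$ (the choice of this map and of the intervals is only for illustration). For an interval $A=[a,b)\subset[0,1)$ one has

$$T^{-1}A=\left[\tfrac{a}{2},\tfrac{b}{2}\right)\cup\left[\tfrac{a+1}{2},\tfrac{b+1}{2}\right),\qquad \mu(T^{-1}A)=\tfrac{b-a}{2}+\tfrac{b-a}{2}=\mu(A),$$

so $\mu$ is invariant in the inverse-image sense. With forward images the equality fails: for $A=[0,\tfrac12)$,

$$T(A)=[0,1),\qquad \mu(T(A))=1\neq\tfrac12=\mu(A).$$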

Solution 2:

The reason is simply in the interpretation, which is sadly too often lost in mindless formalism.

Let me tell you the interpretation when $\mu$ is a probability measure. If $x$ is chosen at random according to $\mu$, what will be the probability distribution of $Tx$? The event that $Tx$ is in a measurable set $A$ is the same as the event that $x$ is in $T^{-1}A$. Hence $Tx$ is distributed according to a measure $T^*\mu$ where $(T^*\mu)(A):=\mu(T^{-1}A)$ for each $A$. Saying that $\mu$ is invariant thus simply means that $T^*\mu=\mu$, that is, if $x$ is chosen according to $\mu$, then the distribution of $Tx$ is also $\mu$.
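
If it helps, the probabilistic reading can be checked numerically. The following is only a minimal sketch under assumptions that are not in the answer above: it takes $T$ to be the doubling map $x\mapsto 2x \bmod 1$ on $[0,1)$, $\mu$ to be the Lebesgue measure, and $A=[a,b)$ an interval; the function names are made up for the illustration. It samples $x$ according to $\mu$ and compares how often $Tx$ lands in $A$ with how often $x$ lands in $T^{-1}A$.

```python
import random


def doubling_map(x: float) -> float:
    """Illustrative choice of T: the doubling map T(x) = 2x mod 1 on [0, 1)."""
    return (2.0 * x) % 1.0


def in_interval(x: float, a: float, b: float) -> bool:
    """True if x lies in the half-open interval [a, b)."""
    return a <= x < b


def empirical_check(a: float, b: float, n: int = 200_000, seed: int = 0):
    """Sample x uniformly on [0, 1) and return two empirical frequencies:
    how often Tx is in A = [a, b), and how often x is in T^{-1}A.
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]

    # Event {Tx in A}.
    freq_image_in_a = sum(in_interval(doubling_map(x), a, b) for x in xs) / n

    # Event {x in T^{-1}A}, where for the doubling map
    # T^{-1}[a, b) = [a/2, b/2) union [(a+1)/2, (b+1)/2).
    freq_x_in_preimage = sum(
        in_interval(x, a / 2, b / 2) or in_interval(x, (a + 1) / 2, (b + 1) / 2)
        for x in xs
    ) / n

    return freq_image_in_a, freq_x_in_preimage


if __name__ == "__main__":
    f_img, f_pre = empirical_check(0.25, 0.75)
    print("frequency of Tx in A:       ", f_img)
    print("frequency of x in T^{-1}A:  ", f_pre)
    print("mu(A) for comparison:       ", 0.75 - 0.25)
```

The first two frequencies describe the same event, so they agree sample by sample; both should be close to $\mu(A)=b-a$, which is what invariance of $\mu$ under this $T$ predicts.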

For the cases where $\mu$ is not a probability measure, you should ask for the interpretation from people who study such measures. One famous example is in Hamiltonian dynamics, where Liouville's theorem states that the volume measure (i.e., the Lebesgue measure) on the phase space is invariant. (There, the time is continuous, but you can safely ignore that here.) The interpretation is the following: mark a large number of points in the phase space and track the trajectory of the system starting from each of these marked points. Liouville's theorem now states that if the marks are initially chosen roughly uniformly in the phase space, then the distribution of the marks at any moment in time remains roughly uniform (hence, the marks are neither compressed together nor stretched away in any region of the phase space). I leave it to you to verify that this translates to saying that $\mu(T^{-t}A)=\mu(A)$ for each measurable set $A$ and any time $t$, where $\mu$ is the volume measure.