Distinguishing probability measure, function and distribution
The difference between the terms "probability measure" and "probability distribution" is largely one of connotation rather than a difference between the things the terms refer to; it is more about how the terms are used.
A probability distribution or a probability measure is a function assigning probabilities to measurable subsets of some set.
When the term "probability distribution" is used, the set is often $\mathbb R$ or $\mathbb R^n$ or $\{0,1,2,3,\ldots\}$ or some other very familiar set, and the actual values of members of that set are of interest. For example, one may speak of the temperature on December 15th in Chicago over the aeons, or the income of a randomly chosen member of the population, or the particular partition of the set of animals captured and tagged, where two animals are in the same part in the partition if they are of the same species.
When the term "probability measure" is used, often nobody cares just what the set $\Omega$ is, to whose subsets probabilities are assigned, and nobody cares about the nature of the members or which member is randomly chosen on any particular occasion. But one may care about the values of some function $X$ whose domain is $\Omega$, and about the resulting probability distribution of $X$.
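To make this concrete, here is a small sketch (the coin-flip example and all names are my own illustration, not from the answer above): a probability measure on an abstract sample space $\Omega$, a function $X$ with domain $\Omega$, and the distribution of $X$ obtained by pushing the measure forward through $X$.

```python
from fractions import Fraction

# Omega: the four equally likely outcomes of two fair coin flips.
# The nature of these members is irrelevant; only X's values matter.
omega = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in omega}   # a probability measure on Omega

def X(w):
    """Number of heads: a function whose domain is Omega."""
    return w.count("H")

# The distribution of X on {0, 1, 2}: push the measure forward through X.
dist_X = {}
for w in omega:
    dist_X[X(w)] = dist_X.get(X(w), 0) + P[w]

# dist_X assigns 1/4, 1/2, 1/4 to the values 0, 1, 2 respectively.
```

Note that `dist_X` is itself a probability distribution on a familiar set, $\{0,1,2\}$, even though $\Omega$ was an arbitrary set of labels.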
"Probability mass function", on the other hand, is precisely defined. A probability mass function $f$ on some specified set $S$ assigns a probability to each subset of $S$ containing just one point, and we always have $\sum_{s\in S} f(s)=1$. The resulting probability distribution on $S$ is a discrete distribution. Discrete distributions are precisely those that can be defined in this way by a probability mass function.
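A quick sketch of the definition (the fair-die example is my own, chosen for familiarity): a probability mass function on $S=\{1,\dots,6\}$, the check that its values sum to $1$, and the discrete distribution it induces on subsets of $S$.

```python
from fractions import Fraction

S = range(1, 7)
f = {s: Fraction(1, 6) for s in S}   # the pmf: one probability per point of S

assert sum(f.values()) == 1          # sum over s in S of f(s) equals 1

def prob(A):
    """The probability the induced discrete distribution assigns to A ⊆ S."""
    return sum(f[s] for s in A)

# prob({2, 4, 6}) is 1/2: the probability of an even outcome.
```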
"Probability density function" is also precisely defined. A probability density function $f$ on a set $S$ is a function that specifies the probabilities assigned to measurable subsets $A$ of $S$ as follows: $$ \Pr(A) = \int_A f\,d\mu $$ where $\mu$ is a "measure", a function assigning non-negative numbers to measurable subsets of $S$ in a way that is "countably additive" (i.e. $\mu\left(A_1\cup A_2\cup A_3\cup\cdots\right) = \mu(A_1)+\mu(A_2)+\mu(A_3)+\cdots$ whenever the sets $A_i$ are pairwise disjoint). The measure $\mu$ need not be a probability measure; for example, one could have $\mu(S)=\infty\ne 1$. For example, the function $$ f(x) = \begin{cases} e^{-x} & \text{if }x>0, \\ 0 & \text{if }x\le 0, \end{cases} $$ is a probability density on $\mathbb R$, where the underlying measure is the one for which the measure of every interval $(a,b)$ is its length $b-a$.
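The formula $\Pr(A)=\int_A f\,d\mu$ can be checked numerically for this density, with $\mu$ taken as Lebesgue measure (interval length). Here is a minimal sketch using a midpoint-rule approximation of the integral; the function names are my own.

```python
import math

def f(x):
    """The exponential density: e^{-x} for x > 0, and 0 otherwise."""
    return math.exp(-x) if x > 0 else 0.0

def pr_interval(a, b, n=100_000):
    """Midpoint-rule approximation of Pr((a,b)) = integral of f over (a, b)."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Closed form for comparison: Pr((a,b)) = e^{-a} - e^{-b} for 0 <= a < b.
approx = pr_interval(1.0, 3.0)
exact = math.exp(-1) - math.exp(-3)
# approx and exact agree to well within 1e-6
```

Integrating over a long interval such as $(0, 50)$ gives a value very close to $1$, consistent with $f$ being a probability density even though $\mu(\mathbb R)=\infty$.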