Why measure theory for probability?
Solution 1:
The standard answer is that measure theory is a more natural framework to work in. After all, in probability theory you are concerned with assigning probabilities to events (sets)... so you are dealing with functions whose inputs are sets and whose outputs are real numbers. This leads to sigma-algebras and measure theory if you want to do rigorous analysis.
But for the more practically-minded, here are two examples where I find measure theory to be more natural than elementary probability theory:
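1) Suppose X~Uniform(0,1) and Y=cos(X). What does the joint density of (X,Y) look like? What is the probability that (X,Y) lies in some set A? The pair (X,Y) always lies on the curve y=cos(x), which has zero area, so there is no joint density in the usual sense. This can be handled with delta functions, but personally I find measure theory to be more natural.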
2) Suppose you want to talk about choosing a random continuous function (an element of C(0,1), say). To define how you make this random choice you would like to give a p.d.f., but what would that look like? (The technical issue here is that this space of continuous functions is infinite dimensional, so there is no analogue of Lebesgue measure on it against which a density could be taken.) This problem arises naturally in stochastic processes, including financial mathematics; a stock price, for example, can be thought of as a random function. In the measure-theoretic framework you talk in terms of probability measures instead of p.d.f.'s, so infinite dimensions do not pose an obstacle.
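To make example 1 concrete: in measure-theoretic terms the law of (X,Y) is the image of the uniform measure on (0,1) under the map x -> (x, cos(x)), so P((X,Y) in A) is just the integral over (0,1) of the indicator of A evaluated at (x, cos(x)). No joint density is needed. The following is a minimal sketch, not a definitive implementation; the set A = {(x,y) : y > 0.8}, the function names, the sample size and the seed are all hypothetical choices made for illustration.

```python
import math
import random


def prob_in_set(indicator, n=200_000, seed=0):
    """Monte Carlo estimate of P((X, Y) in A) for X ~ Uniform(0,1), Y = cos(X).

    `indicator(x, y)` returns True when the point (x, y) belongs to A.
    Only X is sampled; Y is a deterministic function of X.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.random()
        if indicator(x, math.cos(x)):
            hits += 1
    return hits / n


def in_A(x, y):
    # Hypothetical set A = {(x, y) : y > 0.8}, chosen only for illustration.
    return y > 0.8


estimate = prob_in_set(in_A)

# cos is decreasing on (0, 1), so cos(x) > 0.8 exactly when x < arccos(0.8).
# The probability is therefore the length of the interval (0, arccos(0.8)).
exact = math.acos(0.8)

print(estimate, exact)
```

The estimate and the exact value should agree up to Monte Carlo error, even though (X,Y) has no p.d.f. on the plane.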
Solution 2:
Simple answer: Tossing a coin.
Longer answer: You know that you treat discrete events like the above with probability mass functions or similar, but continuous things with probability density functions. Now imagine you had $X$ which is uniformly distributed on $[0,1]$ half the time and equal to $5$ the rest of the time. This is a perfectly reasonable thing that could easily come up, but it fits into neither framework: $X$ has an atom at $5$, so it has no density, and it is not discrete, so it has no mass function.
Measure theory provides a consistent language and mathematical framework unifying these ideas, and indeed much more general objects in stochastic theory. It removes any need to distinguish between fundamentally similar objects and crystallizes the relevant points, allowing a much deeper understanding of the theory.
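The mixed example can be written as a single probability measure, $P(X \in A) = \tfrac12 \lambda(A \cap [0,1]) + \tfrac12 \delta_5(A)$, where $\lambda$ is Lebesgue measure and $\delta_5$ is the point mass at $5$. Here is a rough sketch of that idea, assuming nothing beyond the description above; the function names, sample size and seed are hypothetical.

```python
import random


def mixed_cdf(t):
    """P(X <= t) where X is Uniform[0,1] with prob. 1/2 and equals 5 with prob. 1/2."""
    uniform_part = min(max(t, 0.0), 1.0)  # CDF of Uniform[0,1] at t
    atom_part = 1.0 if t >= 5 else 0.0    # CDF of the point mass at 5
    return 0.5 * uniform_part + 0.5 * atom_part


def sample(rng):
    """Draw one value of X: flip a fair coin, then pick the component."""
    return rng.random() if rng.random() < 0.5 else 5.0


rng = random.Random(0)
draws = [sample(rng) for _ in range(100_000)]
empirical_mean = sum(draws) / len(draws)

# Integrating x against the measure: half the uniform mean plus half the atom.
exact_mean = 0.5 * 0.5 + 0.5 * 5.0

print(mixed_cdf(0.5), mixed_cdf(5), empirical_mean, exact_mean)
```

The CDF has a continuous ramp on $[0,1]$ and a jump of size $1/2$ at $5$, and the expectation is computed by one integral against the measure rather than by a separate sum and integral.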