Is the y axis on a PDF actually meaningless?
This idea popped into my head when I was reading this post on the normal distribution and the y-axis.
My question is this (taking advantage of a nearby computer): a PDF takes one value as input and returns another, and this returned value is a probability. So, if we were using R, we'd do something like dnorm(0) and get 0.3989423. Fair enough.
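For readers without R at hand, here is a rough Python sketch of the same computation. It assumes the standard library's `statistics.NormalDist` as a stand-in for R's dnorm and pnorm, and the interval from -0.5 to 0.5 is an arbitrary choice for illustration. It prints the density at 0 next to an actual probability so the two can be compared.

```python
from statistics import NormalDist

# Standard normal, the default distribution behind R's dnorm(0).
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Counterpart of dnorm(0): the value of the density at x = 0.
density_at_zero = std_normal.pdf(0.0)

# A probability needs an interval, e.g. P(-0.5 <= X <= 0.5) from the CDF.
# The interval is an arbitrary illustrative choice.
prob_near_zero = std_normal.cdf(0.5) - std_normal.cdf(-0.5)

print(f"density at 0:        {density_at_zero:.7f}")
print(f"P(-0.5 <= X <= 0.5): {prob_near_zero:.7f}")
```

The two numbers happen to be close here only because the interval has width 1 and the density is nearly flat around 0.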
However, the above post mentioned (all credit due to @Arkamis):
"By the fundamental theorem of calculus, the PDF is then the derivative of the CDF; that is, the PDF is the derivative of a function that returns a probability. So what is that intuitively? Honestly... it's not really anything. The "units" of the vertical axis in the PDF plot don't lead to anything intuitive; they are meaningful, but only in a derived, mathematical sense."
So, is the y-axis of a PDF returning a probability, or instead is it a mostly unintuitive construct?
Solution 1:
As the author of that snippet, perhaps I should expand this comment.
Suppose you have a function $F(x)$ and its derivative $f(x) = F'(x)$. What does $y_k = f(x_k)$ tell you? It tells you the slope of $F(x)$ at the point $x_k$, and nothing else about $F(x)$. With some assumptions on $F$ such as continuity, we can extend this meaning to interpret some local behavior of $F$, and we can extract some additional approximate details about $F(x)$ from the quantity $y_k$.
Likewise, the probability density function of a continuous distribution, evaluated at a point in its support, gives you nothing but the density of the distribution at that point. With some additional knowledge of the underlying distribution function, we can expand this point value to extract some additional approximations and/or qualitative data about the distribution.
The PDF encodes the shape of the distribution, which is absolutely meaningful when you can compute $f(x)$ over some subset of the support of the distribution. But a single arbitrary value of the PDF usually gives you nothing important. It's only when we leverage the properties of a PDF in some way, typically through the Fundamental Theorem of Calculus, that we really get interesting data. Of course, plotting the PDF over the domain can be highly useful indeed!
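As a minimal sketch of that last point, the following Python snippet integrates the standard normal PDF numerically and compares the area with the difference of CDF values, which is what the Fundamental Theorem of Calculus predicts. The interval, the step count, and the helper name `integrate_pdf` are assumptions chosen for illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)

def integrate_pdf(a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of the PDF over [a, b]."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# Arbitrary illustrative interval.
a, b = -1.0, 2.0
area = integrate_pdf(a, b)
exact = dist.cdf(b) - dist.cdf(a)

print(f"area under the PDF on [{a}, {b}]: {area:.6f}")
print(f"CDF(b) - CDF(a):                 {exact:.6f}")
```

A single value of the PDF contributes one thin rectangle to this sum; the probability only appears once the rectangles are added up over an interval.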
Solution 2:
If you think about it carefully, there is a meaning: the units of the y-axis are not probabilities but probability per unit of x.
Example: if I have a uniform distribution from 0 to 10, its PDF value is 1/10 over the whole support. This means that the probability of any interval of length 1 within the support (say from 0 to 1) is 1/10.
If my uniform distribution is from 0 to 0.5, then the PDF has a value of 2, because the support is only 0.5 units long and the total area must still be 1.
This is the same analogy as time, speed, and distance. Here distance is analogous to probability. Time is on the x-axis, and speed (analogous to the values of the PDF) is on the y-axis, given as distance per unit time. Asking for the probability of an interval is like asking for the total distance traveled: you integrate speed over time.
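The two uniform examples can be checked with a short Python sketch. The helper `uniform_pdf` is a hypothetical function written for this illustration, not part of any library.

```python
def uniform_pdf(x, low, high):
    """Density of a uniform distribution on [low, high]; zero elsewhere."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

# Uniform on [0, 10]: height 1/10, so an interval of length 1 has probability 1/10.
height_wide = uniform_pdf(0.5, 0.0, 10.0)
prob_unit_interval = height_wide * (1.0 - 0.0)

# Uniform on [0, 0.5]: height 2, yet the total area is still 2 * 0.5 = 1.
height_narrow = uniform_pdf(0.25, 0.0, 0.5)
total_prob_narrow = height_narrow * (0.5 - 0.0)

print(height_wide, prob_unit_interval)
print(height_narrow, total_prob_narrow)
```

In both cases probability is height times width, just as distance is speed times time at constant speed.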
Solution 3:
The units are those of $1/\sigma$, where $\sigma$ is the standard deviation. This is seen in the fact that besides $1/\sigma$ the other factors in the density are the unitless $1/\sqrt{2\pi}$ and the unitless value of the exponential function.
If men's heights and temperatures at noon on the fourth of July are normally distributed, the units would be different in those two cases.
The values of probability density functions are not probabilities. If they were, then none of them could be more than $1$, but we commonly see values more than $1$. E.g. the normal density with standard deviation $1/100$.
If a normally distributed random variable $X$ is in miles, then the values of the density are in "per mile", i.e. $1/\text{mile}$. You add a certain amount of probability per mile added.
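A small Python sketch can illustrate both claims. The distance distribution (mean 10 miles, standard deviation 2 miles) is a made-up example, and `statistics.NormalDist` is used here as a convenient normal density; neither comes from the discussion above.

```python
from statistics import NormalDist

# Standard deviation 1/100: the peak density is 100 / sqrt(2*pi), far above 1.
narrow = NormalDist(mu=0.0, sigma=0.01)
peak = narrow.pdf(0.0)

# Hypothetical distance: mean 10 miles, sd 2 miles (illustrative numbers only).
KM_PER_MILE = 1.609344
miles = NormalDist(mu=10.0, sigma=2.0)
km = NormalDist(mu=10.0 * KM_PER_MILE, sigma=2.0 * KM_PER_MILE)

# The same point on the axis, expressed in "per mile" and in "per km".
density_per_mile = miles.pdf(10.0)
density_per_km = km.pdf(10.0 * KM_PER_MILE)

# The same event (between 8 and 12 miles) in both unit systems.
prob_miles = miles.cdf(12.0) - miles.cdf(8.0)
prob_km = km.cdf(12.0 * KM_PER_MILE) - km.cdf(8.0 * KM_PER_MILE)

print(f"peak density with sd 1/100: {peak:.4f}")
print(f"density per mile: {density_per_mile:.6f}, per km: {density_per_km:.6f}")
print(f"probability in miles: {prob_miles:.6f}, in km: {prob_km:.6f}")
```

Changing the unit of $X$ rescales the density by the conversion factor, while the probability of the event stays the same, which is what one would expect of a quantity measured "per mile".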
Solution 4:
The y units are the inverse of the units of the x variable. For example, if you have a density function showing the distribution of people by weight, you have weight on the x axis. When you take the area under the curve for a section of the population, let's say 140-160 lbs, you get the probability that a person's weight is between 140 and 160 lbs. Since area is height x width, we have probability = height x pounds, and since probability is unitless, the height must be in 1/lb. You could also say the y axis is probability per pound. This makes sense once you recognize that we have to integrate over a range of pounds, and that there is zero probability at any single weight. You could also use a y variable of %/lb, as some prefer to think in terms of percentages rather than a unitless probability.
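Written out as a unit calculation, with $f$ denoting the weight density and $W$ the weight in pounds (symbols introduced here only for illustration), the argument reads roughly as follows:

$$P(140 \le W \le 160) = \int_{140}^{160} f(w)\,dw, \qquad [\text{probability}] = [f] \cdot [\text{lb}] \;\Rightarrow\; [f] = \frac{1}{\text{lb}}.$$

Multiplying $f$ by 100 gives the same curve in %/lb.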
Solution 5:
The y-axis represents probability density, whereas probability is represented by area. The y-axis can be thought of as probability/dx rather than probability. If you took the y-axis to be probability, then as you refined your range to smaller and smaller intervals, the probabilities would shrink toward zero and the curve would vanish. Using probability density on the y-axis avoids this paradox.
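The shrinking-interval argument can be sketched numerically in Python. The standard normal, the point $x = 1$, and the list of widths are assumptions picked for illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)
x = 1.0  # arbitrary point at which to inspect the density

rows = []
for width in (1.0, 0.1, 0.01, 0.001):
    # Probability of landing in an interval of the given width centered at x.
    prob = dist.cdf(x + width / 2) - dist.cdf(x - width / 2)
    rows.append((width, prob, prob / width))
    print(f"width={width:<6} probability={prob:.6f} probability/width={prob / width:.6f}")

print(f"pdf({x}) = {dist.pdf(x):.6f}")
```

The probability column heads toward zero as the interval narrows, while the probability/width column settles near the value of the PDF at that point.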