What is the difference between "probability density function" and "probability distribution function"?

Solution 1:

Distribution Function

  1. The term "probability distribution function" (or "probability function") is ambiguous. Depending on the author, it may refer to:
    • Probability density function (PDF)
    • Cumulative distribution function (CDF)
    • or probability mass function (PMF) (statement from Wikipedia)
  2. The unambiguous terms are:
    • Discrete case: Probability Mass Function (PMF)
    • Continuous case: Probability Density Function (PDF)
    • Both cases: Cumulative distribution function (CDF)
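  3. The probability at a particular value $x$, $P(X = x)$, can be read directly from:
    • the PMF in the discrete case
    • In the continuous case there is no such reading: $P(X = x) = 0$ for every $x$, and the PDF value at $x$ is a density, not a probability. Only the integral of the PDF over an interval is a probability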
  4. The probability of values up to $x$, $P(X \le x)$, or of values within a range from $a$ to $b$, $P(a < X \le b)$, can be obtained directly from:
    • the CDF, in both the discrete and continuous cases (for a range, take the CDF value at $b$ minus the CDF value at $a$)
  5. "Distribution function" on its own usually refers to the CDF, also called the cumulative frequency function
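
As a small illustration of points 3 and 4, here is a sketch for a fair six-sided die. The die, the dictionary `pmf` and the helper `cdf` are assumptions chosen for this example, not part of the answer above.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, so every face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    """P(X <= x): sum the PMF over all faces not exceeding x."""
    return sum(p for face, p in pmf.items() if face <= x)

p_equal_3 = pmf[3]            # P(X = 3), read directly from the PMF
p_at_most_3 = cdf(3)          # P(X <= 3), read directly from the CDF
p_between = cdf(5) - cdf(2)   # P(2 < X <= 5), a difference of two CDF values

print(p_equal_3, p_at_most_3, p_between)  # prints: 1/6 1/2 1/2
```

Exact fractions are used so that the sums are not blurred by floating-point rounding.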

In terms of Acquisition and Plot Generation Method

  1. Collected data appear discrete when:
    • The quantity being measured is naturally discrete, such as the numbers from dice rolls or a count of people.
    • The measurement is digitized machine data, which has no intermediate values between quantization levels as a result of the digitization process.
    • In the latter case, the higher the resolution, the closer the measurement is to an analog (continuous) signal or variable.
  2. To generate a PMF from discrete data:
    • Plot a histogram of the data over all values of $x$; the $y$-axis is the frequency (count) at each $x$.
    • Scale the $y$-axis by dividing by the total number of data points collected (the sample size) $\longrightarrow$ the result is the (empirical) PMF.
  3. To generate a PDF from discrete / continuous data:
    • Choose a continuous equation that models the collected data, for example the normal distribution.
    • Estimate the parameters the equation requires from the collected data. For the normal distribution, these are the mean and the standard deviation.
    • Using those parameters, plot the equation over continuous $x$ values $\longrightarrow$ the result is the PDF.
  4. To generate a CDF:
    • In the discrete case, the CDF at each $x$ is the sum of the PMF values at that $x$ and at all smaller values. Repeating this for every $x$ gives a monotonically non-decreasing step plot that reaches $1$ at the largest $x$ $\longrightarrow$ this is the discrete CDF.
    • In the continuous case, integrate the PDF from $-\infty$ up to $x$; the result is a continuous CDF.
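
The steps above can be sketched with Python's standard library. The two samples (`rolls` and `heights`) are made-up data for illustration, and fitting a normal model is just the example choice mentioned in step 3, not a claim that these data are normal.

```python
from collections import Counter
from itertools import accumulate
from statistics import NormalDist

# Hypothetical discrete sample (made-up die rolls).
rolls = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]

# Step 2: empirical PMF = count at each x divided by the sample size.
counts = Counter(rolls)
values = sorted(counts)
pmf = [counts[v] / len(rolls) for v in values]

# Step 4 (discrete): CDF = running sum of the PMF over increasing x.
cdf = list(accumulate(pmf))

# Hypothetical continuous sample (made-up heights in cm).
heights = [162.0, 165.5, 168.0, 170.0, 171.5, 174.0, 178.0]

# Step 3: estimate the mean and standard deviation, giving a fitted normal model.
model = NormalDist.from_samples(heights)

# Evaluate the fitted PDF and its CDF (step 4, continuous) on a coarse grid.
for x in range(150, 191, 10):
    print(x, round(model.pdf(x), 4), round(model.cdf(x), 4))
```

Plotting `values` against `pmf` and `cdf`, and the grid against `model.pdf` and `model.cdf`, gives the four curves described above.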

Why PMF, PDF and CDF?

  1. PMF is preferred when
    • The probability at each individual $x$ value is of interest. This makes sense for discrete data, for example when we are interested in the probability of getting a certain number from a dice roll.
  2. PDF is preferred when
    • We wish to model the collected data with a continuous function, using a few parameters such as the mean to infer the population distribution.
  3. CDF is preferred when
    • The cumulative probability over a range is the point of interest.
    • Especially for continuous data, the CDF makes more sense than the PDF: e.g., the probability that a student's height is less than $170$ cm (CDF) is much more informative than the density at exactly $170$ cm (PDF), since the probability of a height of exactly $170$ cm is zero.
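
To make the height example concrete, here is a minimal sketch. The normal model with mean $168$ cm and standard deviation $7$ cm is an assumption invented for illustration only.

```python
from statistics import NormalDist

# Assumed model (illustrative numbers only): heights ~ Normal(168 cm, 7 cm).
heights = NormalDist(mu=168, sigma=7)

# CDF: probability that a height is below 170 cm.
p_below_170 = heights.cdf(170)

# PDF: a density (per cm) at 170 cm, not a probability.
density_at_170 = heights.pdf(170)

# The density becomes a probability only over an interval, here 169.5 to 170.5 cm.
p_near_170 = heights.cdf(170.5) - heights.cdf(169.5)

print(p_below_170, density_at_170, p_near_170)
```

Over a 1 cm interval, the interval probability is close to the density times the interval width, which is the sense in which the PDF describes how probability is spread.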

Solution 2:

The relation between the probability density function $f$ (a probability mass function in the discrete case) and the cumulative distribution function $F$ is...

  • if $f$ is discrete: $$ F(k) = \sum_{i \le k} f(i) $$

  • if $f$ is continuous: $$ F(x) = \int_{y \le x} f(y)\,dy $$
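
As a quick check of both formulas, here are two worked cases. The fair die and the exponential density with rate $\lambda > 0$ are examples chosen here, not part of the answer above.

  • fair die, $f(i) = \tfrac{1}{6}$ for $i = 1, \dots, 6$: $$ F(3) = \sum_{i \le 3} f(i) = \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{1}{2} $$

  • exponential density, $f(y) = \lambda e^{-\lambda y}$ for $y \ge 0$ and $0$ otherwise: $$ F(x) = \int_{0}^{x} \lambda e^{-\lambda y}\,dy = 1 - e^{-\lambda x}, \quad x \ge 0 $$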