Histogram and Normal distribution

I have been studying histograms and the normal distribution. As far as I know, they are two different tools used in probability and statistics. More specifically, both help to visualize data and are an effective way to summarize a large amount of it.

The main difference is in their math and in the way they visualize data. To calculate the probability of an event from a histogram, we use ordinary arithmetic. But to calculate a probability from a normal distribution, we need calculus and geometry. I am adding screenshots so that everyone can understand what I mean.

Could anyone help me understand their use cases? In which cases is it better to use a histogram, and in which a normal distribution? Is there any condition I should check before deciding which one to use?

Histogram of a small sample.

Suppose you have a population of high school women. You sample 100 women at random from the population, measure their heights (to the nearest inch), and make a histogram of these 100 heights.

Using R statistical software, I can emulate this process to get fictitious data for an example. The vector x contains the heights in inches of 100 women.

 set.seed(2021)                  # for reproducibility
 x = round(rnorm(100, 64, 3.5))  # draw sample, round; see Note at end

From the following summary I can see that the tallest woman was 71" tall and the shortest was 56" tall. Also, I can see that the average height is $\bar X = 63.36"$.

summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  56.00   61.00   63.50   63.36   66.00   71.00

The histogram below has labels atop its bars, indicating how many women are represented in each bar. Adding the counts of the bars to the right of 66", I can say that $8+10+1 = 19$ of the $100$ women are taller than 66". [In this style of histogram the intervals contain the top boundary, but not the bottom boundary.] From this I might guess that roughly $0.19 = 19\%$ of the women in the _population_ are taller than 66". But this is only a rough estimate based on a sample of 100. Perhaps it is more appropriate to give a 95% confidence interval for the probability as $(0.113, 0.267)$ or $0.19 \pm 0.077.$

hist(x, col="skyblue2", label=T)
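The interval convention can be seen on a tiny made-up example. This is only a sketch: the vector `h` and the breaks `brk` are hypothetical and are not the sample above. By default `hist` uses right-closed bins (`right = TRUE`), so a height of exactly 66 is counted in the bar that ends at 66, not in the bar that starts there.

```r
# Hypothetical toy data; some values sit exactly on a bin boundary
h   <- c(66, 66, 67, 68, 70)
brk <- c(64, 66, 68, 70)

# Default (right = TRUE): bins are (64,66], (66,68], (68,70]
right_closed <- hist(h, breaks = brk, plot = FALSE)$counts

# right = FALSE: bins are [64,66), [66,68), [68,70]
left_closed <- hist(h, breaks = brk, right = FALSE, plot = FALSE)$counts

right_closed   # 2 2 1
left_closed    # 0 3 2
```

With the default convention, "taller than 66" is exactly the sum of the bars to the right of the 66 boundary, which is why adding bar counts works in the paragraph above.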

p.est = 0.19
CI = p.est + qnorm(c(.025,.975))*sqrt(p.est*(1-p.est)/100) 
CI
[1] 0.1131104 0.2668896


sum(x > 66)
[1] 19
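The interval above is the normal-approximation (Wald) interval. As a sketch, it can be recomputed from the count alone and set beside two other intervals available in base R. The names `k`, `n`, `wald`, `wilson` and `exact` are mine, and which interval is preferable is a judgment call that this example does not settle.

```r
k <- 19    # women taller than 66" in the sample, from sum(x > 66)
n <- 100   # sample size
p.est <- k / n

# Wald interval, same formula as above
wald <- p.est + qnorm(c(.025, .975)) * sqrt(p.est * (1 - p.est) / n)

# Two alternatives from the stats package
wilson <- prop.test(k, n, correct = FALSE)$conf.int   # Wilson score interval
exact  <- binom.test(k, n)$conf.int                   # Clopper-Pearson interval

round(wald, 3)   # 0.113 0.267
wilson
exact
```

All three intervals are wide, which reflects how little a sample of 100 pins down the population proportion.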

Exact distribution of population.

By contrast, if I am told that the population distribution of such female student heights is $\mathsf{Norm}(\mu = 64, \sigma=3.5),$ then I have more knowledge about the population than I can deduce from a sample of $100$ women.

Then I can find a z-score and use printed normal CDF tables to find the exact proportion of high school women in the population who are taller than 66". For the best result, I should use $66.5$ because heights are recorded to the nearest inch, so women taller than that will be rounded to 67" or more. (This adjustment is called the 'continuity correction'.)

Then $Z = \frac{66.5 - 64}{3.5} = 0.714.$ And from the printed table you get approximately the proportion $0.238.$ [Usually, using printed tables involves some rounding, with a small loss of accuracy.] You can use the normal CDF function pnorm in R, to get the slightly more accurate value $0.2376.$

z = (66.5-64)/3.5;  z
[1] 0.7142857
1 - pnorm(0.714)
[1] 0.2376136
1 - pnorm(66.5, 64, 3.5)
[1] 0.2375253
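To see why $66.5$ is the natural cutoff, note that a recorded height of $k$ inches corresponds to a true height between $k - 0.5$ and $k + 0.5.$ The sketch below adds up the probabilities of recorded heights 67, 68, and so on, and compares the total with the corrected and uncorrected normal probabilities. The upper limit of 100" is an arbitrary assumption of mine; the tail beyond it is negligible.

```r
mu <- 64; sigma <- 3.5

# P(recorded height = k) = P(k - 0.5 <= X < k + 0.5)
k <- 67:100   # 100" is an arbitrary cutoff for the sum
p.rounded <- sum(pnorm(k + 0.5, mu, sigma) - pnorm(k - 0.5, mu, sigma))

p.corrected   <- 1 - pnorm(66.5, mu, sigma)   # with continuity correction
p.uncorrected <- 1 - pnorm(66,   mu, sigma)   # without it

c(p.rounded, p.corrected, p.uncorrected)
```

The first two values agree, while the uncorrected value is noticeably larger because it also counts women between 66" and 66.5", who would be recorded as 66".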

Of course, the answer $0.238$ from the exact population distribution is much better than the approximate answer $0.19\pm 0.077$ estimated from a sample of only 100 women. But you try to do your best with the information you have.
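How rough is an estimate from 100 women? A small simulation can give a feel for it. This is a sketch under the same fictitious assumptions as above (normal population, heights rounded to the nearest inch); the seed and the 10,000 repetitions are arbitrary choices of mine.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

# Repeat the experiment: sample 100 women, record the fraction taller than 66"
est <- replicate(10000, mean(round(rnorm(100, 64, 3.5)) > 66))

mean(est)   # close to the population value of about 0.238
sd(est)     # typical sampling error of a single estimate
```

The spread of `est` shows that a single sample of 100 can easily give an estimate several percentage points away from the population value, which is what the confidence interval is meant to convey.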

The probability $0.238$ is the area under the density curve to the right of the vertical line.

hdr = "Density of NORM(64, 3.5)"
curve(dnorm(x, 64, 3.5), 50, 75, lwd=2, ylab="Density", main=hdr)
 abline(h = 0, col="green2");  abline(v = 66.5, lwd=2)
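The "area under the curve" can also be checked numerically. As a sketch, base R's `integrate` applied to the same density should agree with the `pnorm` result above, up to numerical error.

```r
# Area under the Norm(64, 3.5) density to the right of 66.5
area <- integrate(dnorm, lower = 66.5, upper = Inf, mean = 64, sd = 3.5)

area$value                 # about 0.2375
1 - pnorm(66.5, 64, 3.5)   # same probability from the CDF
```

This is the calculus the question refers to; `pnorm` simply packages that integral so that it never has to be done by hand.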


Note: The information in the line of R code

x = round(rnorm(100, 64, 3.5))

would never be known in a practical situation. This was used only to make a fictitious sample of 100. [I don't happen to have a huge population of high school women in my office to use for taking the sample.]
