Why does the normalized z-score introduce a square root? (And some more confusion)

If $X$ is a normal random variable, you can record an observation of it, $x$, and compare it to the mean. The usual way to do this is to standardize the variable, i.e.,

$$z = \frac{x - \mu}{\sigma}$$
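For instance (with made-up values for $\mu$, $\sigma$, and the observation $x$), the standardization is just a couple of arithmetic operations:

```python
# Standardizing a single observation. The population parameters and
# the observation below are made-up values for illustration.
mu, sigma = 100.0, 15.0  # hypothetical population mean and standard deviation
x = 127.0                # a single observation of X

z = (x - mu) / sigma     # x lies 1.8 standard deviations above the mean
print(z)                 # 1.8
```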

Now let's say that $X_1, X_2, \ldots, X_n$ are random variables from the same distribution as $X$ above. If we record an observation of each and calculate their mean, that mean is also a random variable. However, we can't expect this new random variable, the sample mean $\overline{X}$, to have the same distribution as our original one. It will have the same mean, but it won't have the same variance.

Think of it this way: make $n$ larger and larger, recording more and more observations. After a while, the mean of all those observations should be very close to the population mean. To make it more concrete: flip a coin a few times, letting $X = 1$ for heads and $X = 0$ for tails. Will your mean be exactly $0.5$? Probably not. Flip it a few more times, and you're likely a bit closer to $0.5$. By the time you've flipped the coin, say, a few thousand times, you'll probably be very close to $0.5$.
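A quick simulation makes the same point (a sketch using NumPy, with an arbitrary seed; the flip counts are just illustrative):

```python
import numpy as np

# As the number of coin flips grows, the sample mean
# settles toward the true mean of 0.5.
rng = np.random.default_rng(0)

for n in (10, 100, 1_000, 10_000):
    flips = rng.integers(0, 2, size=n)  # X = 1 for heads, X = 0 for tails
    print(f"n = {n:>6}: mean = {flips.mean():.4f}")
```

With more flips, the printed means wander less and less far from $0.5$.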

In other words, when we compute a sample mean, making many observations restricts how far we can stray from the true mean in the long run. This is captured by the fact that

$$\text{Var}(\overline{X}) = \frac{\sigma^2}{n}$$
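Where does that $n$ in the denominator come from? Assuming the $X_i$ are independent (an assumption on top of "same distribution"), variances add and constants factor out squared:

$$\text{Var}(\overline{X}) = \text{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\text{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$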

Note that as $n \rightarrow \infty$, $\text{Var}(\overline{X}) \rightarrow 0$. (Strictly speaking, this shrinking variance is the gist of the Law of Large Numbers, not the Central Limit Theorem; the Central Limit Theorem is about the shape of the distribution, as described next.)

The Central Limit Theorem tells us that, regardless of the distribution of the original random variable, as we take larger and larger samples, the sample mean $\overline{X}$ is approximately normally distributed, and so we can use all the convenient properties of the normal distribution (like the standardized form). So when we write

$$z = \frac{\overline{x}-\mu}{\sigma / \sqrt{n}}$$

it's really the same thing as the earlier $z$: the difference between an observation and its expected value, divided by the standard deviation of whatever distribution the observation came from. The new standard deviation (the standard error) is derived from the old one, but that's because the new distribution is derived from the old one.
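To see this in action, here is a sketch (using NumPy; the exponential population, sample size, and seed are arbitrary choices) that draws many samples from a decidedly non-normal distribution, standardizes each sample mean by $\sigma/\sqrt{n}$, and checks that the results behave like a standard normal:

```python
import numpy as np

# Population: Exponential(1), which is skewed, with mu = 1 and sigma = 1.
rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 1.0, 50

# 100,000 samples of size n; one sample mean per row.
samples = rng.exponential(scale=1.0, size=(100_000, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))

print(z.mean(), z.std())           # close to 0 and 1
print(np.mean(np.abs(z) < 1.96))   # close to 0.95, as for a standard normal
```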

Remember, $\sigma^2$ stands for the population variance. So whether we're looking at a sample of size $n$ or just one observation, it's always the population variance. $\sigma^2/n$ is the variance of the sample mean in terms of the population variance, and its square root, $\sigma/\sqrt{n}$, is the standard deviation of the sample mean: that's where the square root comes from.
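As a closing sanity check, the variance of many simulated sample means should land near $\sigma^2/n$ (again a sketch; the normal population and the parameters are arbitrary):

```python
import numpy as np

# Empirical check that Var(X-bar) = sigma^2 / n.
rng = np.random.default_rng(1)
mu, sigma, n = 10.0, 3.0, 25

# 200,000 sample means, each from a sample of size n.
means = rng.normal(mu, sigma, size=(200_000, n)).mean(axis=1)

print(means.var())   # empirical variance of the sample means
print(sigma**2 / n)  # theoretical value: 9 / 25 = 0.36
```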