Query on the standard deviation formula

It is widely known that the sample variance formula is:

$S^{2}=\frac{\sum_{i=1}^{n}\left( X_{i} - \overline{X} \right)^{2}}{n-1}$

and that the sample standard deviation formula is:

$S=\sqrt{\frac{\sum_{i=1}^{n}\left( X_{i} - \overline{X} \right)^{2}}{n-1}}$

But if the purpose of squaring the difference $\left( X_{i} - \overline{X} \right)^{2}$ is to eliminate the effect of the sign, would it not be more logical for the standard deviation formula to undo the square only over the numerator, rather than over the whole expression, like this:

$S=\frac{\sqrt{\sum_{i=1}^{n}\left( X_{i} - \overline{X} \right)^{2}}}{n-1}$

Can someone explain why it is not defined this way? Thanks.


[This addresses the "purpose of the squaring" part of the question.]

The purpose of squaring the error $X_i-\bar X$ is not merely to eliminate the sign: any other non-negative function of the error would accomplish that just as well.
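For example, the mean absolute deviation,

$$\frac{1}{n}\sum_{i=1}^{n}\left| X_{i} - \overline{X} \right|,$$

also removes the sign of every error and is a perfectly sensible measure of spread.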

Gauss (1821) chose to square the error, and he admitted that this decision

"is made arbitrarily without a strong necessity"

Laplace had proposed the absolute value instead, but Gauss argued against it: under Laplace's criterion, a doubled error counts exactly as much as the same error committed twice.

But his main reason was that the absolute value is not differentiable, a point made precise in the note below the quotation. He wrote:

"This treatment [that of Laplace] opposes in a higher degree any analytic treatment whereas the results from our principle [squaring the error] distinguish in simplicity and in generality as well."

See https://archive.org/details/abhandlungenmet00gausrich/page/n17/mode/2up, p. 5f.
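In modern terms, the analytic objection is that $x \mapsto |x|$ is not differentiable at $x=0$, whereas $x \mapsto x^{2}$ is smooth everywhere:

$$\frac{d}{dx}\,x^{2}=2x \quad \text{for all } x, \qquad \frac{d}{dx}\,|x|=\operatorname{sgn}(x) \quad \text{only for } x\neq 0.$$

This is what makes least squares so convenient: minimizing a sum of squares reduces to setting derivatives to zero.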


The most important reason is consistency. An estimator $\hat{\theta}_n$ of $\theta$ is consistent if for every $\epsilon>0$ $$ P(|\hat{\theta}_n-\theta|>\epsilon)\to 0,\quad n\to\infty, $$ and we write $\hat{\theta}_n\overset{P}{\to}\theta$. It can be proved that $S\overset{P}{\to}\sqrt{\operatorname{Var}(X)}$, where $S$ is constructed from $n$ i.i.d. copies $X_1,\dots,X_n$ of $X$. This says that whenever our sample size is large, we have a very good estimate of the true standard deviation.
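As a quick sanity check, here is a minimal simulation sketch (the normal distribution, the seed, and the sample sizes are arbitrary choices for illustration) showing the usual $S$ settling toward the true standard deviation as $n$ grows:

```python
import numpy as np

# Minimal sketch: the usual S should approach sqrt(Var(X)) as n grows.
# X ~ Normal(0, 2), so the true standard deviation is 2 (an arbitrary choice).
rng = np.random.default_rng(0)
true_sd = 2.0

for n in [10, 100, 10_000, 1_000_000]:
    x = rng.normal(0.0, true_sd, size=n)
    s = np.sqrt(np.sum((x - x.mean()) ** 2) / (n - 1))  # the usual S
    print(f"n = {n:>9}: S = {s:.4f}   (true sd = {true_sd})")
```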

If you instead define $\tilde{S}=\sqrt{\sum_{i=1}^n(X_i-\bar{X}_n)^2}\,/\,(n-1)$, as proposed in the question, then $\tilde{S}\overset{P}{\to}0$, which makes it useless as an estimator of the standard deviation. This last claim can be proved with Slutsky's theorem, as the short derivation below shows.
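To see why, pull a factor of $\sqrt{n-1}$ out of the denominator:

$$\tilde{S}=\frac{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X}_n)^{2}}}{n-1}=\frac{1}{\sqrt{n-1}}\sqrt{\frac{\sum_{i=1}^{n}(X_i-\bar{X}_n)^{2}}{n-1}}=\frac{S}{\sqrt{n-1}}.$$

Since $S\overset{P}{\to}\sqrt{\operatorname{Var}(X)}$ and $1/\sqrt{n-1}\to 0$, Slutsky's theorem gives $\tilde{S}\overset{P}{\to}\sqrt{\operatorname{Var}(X)}\cdot 0=0$.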