What is the correct standard deviation when splitting a sample?

I roll a four-faced die 1000 times, but I have 100 dice, so I separate this into 10 rolls of 100 dice each and tally the results. I want to calculate the standard deviation of the 0 count. As an example, here's a result:

{0: 251, 1: 254, 2: 271, 3: 224}, $\mu = \frac{251}{1000} = 0.251$

{0: 30, 1: 24, 2: 26, 3: 20}
{0: 25, 1: 25, 2: 26, 3: 24}
{0: 22, 1: 22, 2: 27, 3: 29}
{0: 23, 1: 26, 2: 30, 3: 21}
{0: 24, 1: 20, 2: 30, 3: 26}
{0: 26, 1: 31, 2: 26, 3: 17}
{0: 22, 1: 23, 2: 32, 3: 23}
{0: 23, 1: 32, 2: 23, 3: 22}
{0: 27, 1: 28, 2: 22, 3: 23}
{0: 29, 1: 23, 2: 29, 3: 19}

[Histogram of the distribution]

The first way I do it is by using the normal approximation: $$\sigma_1 = \sqrt{\frac{0.251\cdot(1-0.251)}{1000}} = 0.0137.$$

The second way is to calculate the deviation of the 10 rolls, which gives: $$\sigma_2 = \sqrt{\frac{(0.3-0.251)^2+(0.25-0.251)^2+\cdots+(0.29-0.251)^2}{10}}=0.027$$

I tried changing and increasing both the total sample size and the size of each tally, but the two results never approach each other. I think both are consequences of the central limit theorem, and the discrepancy is due to the sampling technique? Which one is more correct, or are they both wrong? What's the right way to find $\sigma$ of the 0 count (or the 1 count, etc.)? Thank you!

Here's the Python code I used to generate the problem:

import numpy as np
import collections

small = 100   # rolls per group
big = 1000    # total rolls

die = np.random.randint(0, 4, big)
diedict = collections.Counter(die)
print(dict(sorted(diedict.items())))  # the total tally
std1 = np.sqrt(diedict[0]/big * (1 - diedict[0]/big) / big)

sumsquare = 0
for i in range(0, big, small):
    group = collections.Counter(die[i:i+small])
    print(dict(sorted(group.items())))  # the separate rolls of 100
    sumsquare += (group[0]/small - diedict[0]/big)**2

std2 = np.sqrt(sumsquare / (big/small))
print(std1, std2)

plot_histogram(diedict)  # user-defined plotting helper (not shown)

The number of occurrences of a particular face has a binomial distribution with parameters $n$ and $p=\frac14$,

so the number of occurrences has mean $np=\frac n4$, variance $np(1-p)=\frac{3}{16}n$, and standard deviation $\sqrt{\frac{3}{16}n}$,

and the proportion of occurrences has mean $\frac 14$, variance $\frac{3}{16 n}$, and standard deviation $\sqrt{\frac{3}{16 n}}$.
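
A quick numerical check of these formulas, using only numpy as in your script (the variable names below are purely illustrative):

import numpy as np

p = 0.25
for n in (100, 1000):
    sd_count = np.sqrt(n * p * (1 - p))   # standard deviation of the count, sqrt(3n/16)
    sd_prop = np.sqrt(p * (1 - p) / n)    # standard deviation of the proportion, sqrt(3/(16n))
    print(n, sd_count, sd_prop)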

When $n=1000$ this last standard deviation is $\sqrt{0.0001875} \approx 0.0137$, close to what you found. If you want the standard deviation of the proportion of a particular face from $1000$ attempts, this is the better approach.

When $n=100$ this last standard deviation is $\sqrt{0.001875} \approx 0.0433$, and if you repeated your simulations you should get values around this. Your particular example was low, though not exceptionally low (and you did not make any adjustment for using the sample mean to calculate the sample standard deviation).
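
Here is a small simulation sketch (numpy only, arbitrary seed) of what "repeating your simulations" looks like: many independent batches of $100$ rolls, whose 0 proportions scatter with a standard deviation close to $0.0433$.

import numpy as np

rng = np.random.default_rng(0)                     # arbitrary seed, purely illustrative
batches = rng.integers(0, 4, size=(10_000, 100))   # 10,000 batches of 100 rolls each
props = (batches == 0).mean(axis=1)                # proportion of 0s in each batch
print(props.std())                                 # typically close to 0.0433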


There are multiple ways to interpret what's going on here.

We could assume the dice are all fair four-sided dice and that what you have done is an exercise in sampling from a population consisting of all possible rolls of a fair four-sided die. In that case you have $10$ samples of $100$ rolls per sample, which you can combine into a single sample of $1000$ rolls.

Of course what you have done in Python is merely a simulation of the rolls of fair four-sided dice, but let's accept it as a reasonable proxy for the ideal mathematical process. (For what it's worth, even if you used real dice you would only be approximating the rolls of fair four-sided dice, because we cannot be sure that all the dice are precisely fair given their construction and the way you roll them.)

On the other hand, we could say that what you have done is to use your simulated dice to generate a population of $1000$ individuals, each of which has a numeric value. Exactly $251$ individuals in the population have the numeric value $0,$ which means that if you selected an individual from this population at random and asked if its value is $0,$ the answer ($1$ for true, $0$ for false) is a Bernoulli variable with mean exactly $0.251.$

What exactly then is "the 0 count"?

If the 0 count means the number of zeros in the observation of one roll, where the observation is chosen at random from your $1000$ total observations, then the 0 count has mean $\mu = 0.251$, just as you stated.

The standard deviation of the 0 count for an observation chosen at random from this population is $\sqrt{0.251(1-0.251)} \approx 0.43359.$


For the following, let's take the interpretation that your data are merely a sample of $1000$ observations from the population of all possible rolls of fair four-sided dice. Then $0.251$ is only the mean number of 0s per die observed in your sample; that is, it is the sample mean. It is an estimate of the population mean, but not necessarily exactly equal to the population mean.

In this interpretation, you have $251$ observations where the 0 count is $1$ and $749$ where it is $0.$ The sample standard deviation is $s = \sqrt{0.251(1-0.251)} \approx 0.43359$ (the same as when we regarded the $1000$ rolls as the entire population), but the usual estimate for the standard deviation of the population is slightly larger, $$ \hat\sigma = \sqrt{\frac{251(1 - 0.251)^2 + 749(0 - 0.251)^2}{999}} \approx 0.43381. $$

We might also be interested in the standard error of the mean. That's a measurement of how much your sample mean ($0.251$ for this sample) was likely to have varied from the population mean (which is $0.25$). (It's actually the standard deviation of the population of all possible random samples of the same size from the underlying population.) We can estimate the standard error of the mean from the sample standard deviation: $$ \mathop{SEM} = \frac{s}{\sqrt{N}} \approx \frac{0.43359}{\sqrt{1000}} \approx 0.013711. $$ That agrees with what you found in your "normal approximation."
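
Here is a short numpy sketch of these three quantities, rebuilding the $0$/$1$ indicators from your tally (the array construction is just one convenient way to do it):

import numpy as np

zeros = np.array([1] * 251 + [0] * 749)     # 1 where the roll was a 0, else 0
print(zeros.std())                          # sample sd: about 0.43359
print(zeros.std(ddof=1))                    # estimated population sd: about 0.43381
print(zeros.std() / np.sqrt(zeros.size))    # SEM: about 0.0137, your "normal approximation"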


Your second way also appears to be related to the standard error of the mean. Continuing with the interpretation that your data are merely a sample of $1000$ observations from the population of all possible rolls of fair four-sided dice, you have ten samples of $100$ rolls each, each of which has a mean that may vary from the population mean (which is $0.25$ in this interpretation). In this case the standard error of the mean is obtained for each sample by dividing the sample standard deviation by $\sqrt{100},$ resulting in standard errors that range from about $0.0414$ to $0.0458.$ The sample that happens to exactly match a population of fair four-sided dice, where the 0 occurs $25$ times, has standard error $0.0433.$
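
For concreteness, a small sketch computing those per-sample standard errors directly from the ten $0$ counts in your example:

import numpy as np

zero_counts = np.array([30, 25, 22, 23, 24, 26, 22, 23, 27, 29])  # 0s in each sample of 100
p_hat = zero_counts / 100
se = np.sqrt(p_hat * (1 - p_hat) / 100)
print(se.min(), se.max())        # roughly 0.0414 and 0.0458
print(se[zero_counts == 25])     # the sample with exactly 25 zeros: about 0.0433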

As it happens, you have more than the expected number of sample means within a range of $\pm$ two standard errors, whether you count from the (theoretical) population mean or from the mean of the full sample of $1000$ rolls. Maybe this is due to a defect in the random number generator, but it could just be luck. Either way, you have a smaller amount of deviation than normal, so when you take the sample standard deviation of your $10$ observed proportions (one from each sample of $100$ rolls), you get a result less than the standard error of any of the individual samples of $100.$

So if you consider your "second way" as a way of estimating the standard error of a sample of $100$ by taking ten samples of $100$ and taking the sample standard deviation of those ten observations, you arrive at an underestimate of the standard error of the mean for $100$ rolls.

To be clear: the result you get from your "second method" is (somewhat) surprisingly small. The fact that it is larger than the standard error of a sample of $1000$ is a good thing, because the standard error of the mean of $100$ rolls should be larger than the standard error of $1000$ rolls. The only discrepancy is that there should be an even larger difference between the two results.
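
To illustrate, here is a rough numpy-only sketch (arbitrary seed) that repeats the whole $1000$-roll experiment many times; the "second way" typically lands around $0.04$, noticeably larger than the unusually small $0.027$ observed here:

import numpy as np

rng = np.random.default_rng(1)                          # arbitrary seed
estimates = []
for _ in range(2000):                                   # repeat the full experiment 2000 times
    rolls = rng.integers(0, 4, size=1000)
    props = (rolls.reshape(10, 100) == 0).mean(axis=1)  # ten per-sample proportions of 0s
    estimates.append(props.std())                       # the "second way" for this experiment
print(np.mean(estimates))                               # typically around 0.04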


If we do not assume the dice are fair, things get a little more complicated. If the dice are not all fair, are they all unfair in the exact same way, or can they be unfair in different ways? In the first case we can take $0.251$ as the best estimate of the mean 0 count for each die; in the second case $0.251$ is only the estimated mean of the means, where each die might have a different mean 0 count. The second case violates the usual assumptions behind a lot of the formulas we have used here.