Why is a simulation of a probability experiment off by a factor of 10?
In reality, you will find it very difficult to put the balls in the cells without distinguishing between the balls, especially if you want equal probabilities so as to use counting methods for simulation. Suppose you wanted to consider the probability all the balls went into the first cell: with distinguishable balls this probability is $\frac1{8^{12}}$ and is easily simulated though a rare occurrence; with indistinguishable balls it is $\frac1{19 \choose 7}$ over a million times more likely but difficult to simulate
If the balls are distinguishable, the probability all eight boxes are full is $$\frac{8! \, S_2(12,8)}{8^{12}}$$ where $S_2(n,k)$ is a Stirling number of the second kind and $S_2(12,8)=159027$. That gives a probability that each cell has at least one ball of about $0.0933$. Is this similar to your simulation?
If you really want to simulate the indistinguishable balls case, despite it not being realistic physically outside Bose–Einstein condensate at temperatures close to absolute zero, you could use a stars and bars analogy. Choose $7$ distinct positions for the cells walls from possible positions $\{0,1,2,3,\ldots,18\}$ for the balls and cell walls; a success is when none of the cell walls are at positions $0$ or $18$ and no pair of them are consecutive
Consider the set $D$ of ways to distribute $12$ balls labelled [abcdefghijkl] among $8$ cells numbered [01234567]. This set has $8^{12}\approx7\times10^{10}$ elements.
Now consider the set $I$ of distinguishable ways to populate those same $8$ cells [01234567] with $12$ indistinct balls. This set has ${19\choose7}\approx 5\times10^4$ elements.
The assignment asks you to compute a probability of an event over the uniform distribution on $I$, if not in so many words. In principle, you could approximate this probability by sampling from the uniform distribution on $I$. But your strategy is to sample from the uniform distribution on $D$, and then map each sample to $I$! That's not the same.
Instead of taking the average of all the results, you need to take a weighted average, such that the weight compensates for the number of elements in $D$ that map to the same element of $I$. Hint, it's something like this:
weight = 1
for cell_population in cells:
weight *= math.factorial(cell_population)
At least, that gets the right answer. Rigorously justifying that formula as a consequence of the mapping between $D$ and $I$ is left as an exercise to the reader.
The original problem is posed, so far as I can tell, to show the difference between combinations and permutations. In nature, there is no such thing as indistinguishable balls. Semi-infinite tests (e.g. Las Vegas) have shown this to be true.
Now, if the problem really wants you to use "indistinguishable" balls for the purposes of solving the problem, then yes, you need to use combinations and not permutations when calculating all the ways the indistinguishable balls are placed into the containers. And of course, you need to use permutations for the numbered balls, as they are distinguishable from each other and from the collection of indistinguishable balls.
Now, I believe that Chris Culter's calculations reflect this difference. Whether your Python code does this correctly we can't say until we see the code.