How can I intuit the role of the central limit theorem in breaking the curse of dimensionality for Monte Carlo integration?

As you correctly point out, the error of integration on a uniform tensor grid scales like a power of $N^{-1/d}$. The problem with uniform sampling is indeed similar to a non-representative statistical survey: many of the points in a uniform tensor grid don't add much new information. For example, assume that the function you want to integrate looks like $F(x,y)\approx f(x)+g(y)$ in a local square. If you sample this square with a $k\times k$ tensor grid, you have used $k^2$ evaluations of $F$, but only gained information about $f$ and $g$ at $k$ distinct arguments each. A Taylor expansion of $F$ in a local square, $F(x,y)=ax+by+c+O(x^2+y^2)$, shows that it is indeed not unusual for $F$ to look like that locally. (Note that no mixed $xy$ term occurs in this expansion.)
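To make the redundancy concrete, here is a minimal sketch (assuming NumPy; the particular $f$, $g$, and $k$ are arbitrary illustrative choices): for a locally separable $F$, averaging over the full $k\times k$ tensor grid gives exactly the same result as combining $2k$ one-dimensional averages.

```python
# A minimal sketch (NumPy assumed; f, g, and k are arbitrary illustrative choices):
# for a locally separable F(x, y) = f(x) + g(y), the average of F over a k x k
# tensor grid equals mean_i f(x_i) + mean_j g(y_j), so the k^2 evaluations carry
# no more information about the integral than 2k one-dimensional evaluations.
import numpy as np

def f(x):
    return np.sin(3 * x)      # hypothetical f

def g(y):
    return np.exp(-y)         # hypothetical g

k = 8
x = np.linspace(0.0, 1.0, k)  # 1-D grid coordinates in x
y = np.linspace(0.0, 1.0, k)  # 1-D grid coordinates in y

X, Y = np.meshgrid(x, y, indexing="ij")
grid_average = np.mean(f(X) + g(Y))            # uses all k^2 evaluations of F
cheap_average = np.mean(f(x)) + np.mean(g(y))  # uses only 2k 1-D evaluations

print(grid_average, cheap_average)  # identical up to floating-point rounding
```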

To analyze the connection between uniform tensor grids and the Taylor expansion further, let's look at multivariate polynomials of a fixed degree $k$ in $d$ variables. The function values on a $d$-dimensional uniform tensor grid with $k+1$ points in each direction uniquely determine a polynomial in which the degree of each individual variable (in each term) is at most $k$. Since the total degree of a term is the sum of the degrees of its individual variables, this polynomial can have total degree up to $dk$ and has $(k+1)^d$ individual terms (= degrees of freedom). On the other hand, a general multivariate polynomial of total degree $k$ in $d$ variables has only $\frac{(d+k)!}{d!\,k!}$ individual terms (= degrees of freedom). For fixed $k$, we have $\frac{(d+k)!}{d!\,k!}=\frac{d^k}{k!} + O(d^{k-1})$, which is polynomial in $d$ (i.e. not exponential).
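To see the two term counts side by side, here is a short sketch in plain Python (the fixed $k=3$ is an arbitrary choice):

```python
# A short sketch of the counting argument (plain Python; k = 3 chosen arbitrarily):
# a tensor grid determines (k+1)^d coefficients, while a general polynomial of
# total degree k in d variables has only C(d+k, k) = (d+k)!/(d! k!) of them.
from math import comb

k = 3
for d in (1, 2, 5, 10, 20):
    tensor_dof = (k + 1) ** d          # exponential in d
    total_degree_dof = comb(d + k, k)  # ~ d^k / k!, polynomial in d
    print(f"d = {d:2d}   tensor grid: {tensor_dof:15d}   total degree <= {k}: {total_degree_dof:8d}")
```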

What does all this have to do with randomness and Monte Carlo integration? It might help explain why Monte Carlo integration doesn't break down in high dimensions, while naive uniform tensor-grid sampling does. Unfortunately, there is no straightforward way to turn an effective (naive) randomized algorithm into an effective (naive) deterministic one. However, we know from experience that problems which can be solved efficiently by randomized algorithms can normally also be solved efficiently by (more complicated) deterministic algorithms.

The deterministic high-dimensional integration methods I know of work by exploiting recursion (divide and conquer) in a clever way. However, this is not the place to explain hierarchical bases, hierarchical subspace splitting, or the Smolyak method. A more black-box approach would be to use space-filling curves as a building block instead, since the recursion is already built into the construction of the curve. I don't know how well this would work for integration, but it has been asked here before. At least I know that the research groups investigating applications of sparse grids often also investigate applications of space-filling curves.
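To connect this back to the central limit theorem in the question, here is a minimal sketch (assuming NumPy; the integrand, seed, and trial count are arbitrary illustrative choices) showing that the Monte Carlo error decays like $N^{-1/2}$ in every dimension, with the dimension entering only through the variance of the integrand, not through the exponent:

```python
# A minimal sketch of the CLT behavior (NumPy assumed; the integrand
# F(x) = sum_i x_i^2 with exact integral d/3 over [0,1]^d, the seed, and the
# number of trials are arbitrary illustrative choices): in every dimension,
# 100x more samples shrink the RMS error by roughly 10x, i.e. error ~ N^{-1/2}.
import numpy as np

rng = np.random.default_rng(0)

def mc_rms_error(d, n, trials=50):
    """Root-mean-square error of plain Monte Carlo with n samples in d dimensions."""
    exact = d / 3.0
    sq_errs = []
    for _ in range(trials):
        x = rng.random((n, d))                    # n uniform samples in [0,1]^d
        estimate = np.mean(np.sum(x**2, axis=1))  # sample mean of F
        sq_errs.append((estimate - exact) ** 2)
    return np.sqrt(np.mean(sq_errs))

for d in (1, 10, 100):
    for n in (10**2, 10**4):
        print(f"d = {d:3d}   N = {n:6d}   RMS error ~ {mc_rms_error(d, n):.4f}")
```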