An intuitive explanation of how the mathematical definition of ergodicity implies the layman's interpretation 'all microstates are equally likely'.
Solution 1:
It's useful to consider finite-state Markov chains with states $\{ 1, \ldots, N \}$. Such a Markov chain is defined by its transition matrix $P = (P_{ij})_{i,j=1}^N$. We require that $0 \leq P_{ij} \leq 1$ for each $i, j = 1, \ldots, N$ and that $\sum_{j=1}^N P_{ij} = 1$ for each $i$. Thus, we can think of $P_{ij}$ as the probability of jumping from state $i$ to state $j$ in one step. We initialize the Markov chain in a state $X_0$ and let $X_n$ be the state at time $n$ (so $X_n$ is a random variable in $\{ 1, \ldots, N \}$).
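A minimal sketch of this definition in Python (the helper names `check_transition_matrix` and `simulate` are hypothetical, and states are labelled $0, \ldots, N-1$ rather than $1, \ldots, N$): it validates the two conditions on $P$ and then draws $X_{n+1}$ from row $X_n$ of $P$.

```python
import random

def check_transition_matrix(P, tol=1e-12):
    """Raise ValueError unless P is square, 0 <= P[i][j] <= 1, and every row sums to 1."""
    n = len(P)
    for row in P:
        if len(row) != n:
            raise ValueError("P must be square")
        if any(p < 0 or p > 1 for p in row):
            raise ValueError("entries must lie in [0, 1]")
        if abs(sum(row) - 1.0) > tol:
            raise ValueError("each row must sum to 1")

def simulate(P, x0, steps, rng=None):
    """Return a sample path X_0, X_1, ..., X_steps (states labelled 0..N-1)."""
    check_transition_matrix(P)
    rng = rng or random.Random()
    states = range(len(P))
    path = [x0]
    for _ in range(steps):
        # Row P[i] is the distribution of the next state given the current state i.
        path.append(rng.choices(states, weights=P[path[-1]])[0])
    return path

# Example: a walk on three states in a line, with no direct jump between the ends.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
print(simulate(P, x0=0, steps=10, rng=random.Random(1)))
```

Passing an explicit `random.Random` instance makes a run reproducible without touching the global generator.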
A natural requirement is that the Markov chain be irreducible, which means that from any state we can reach any other state, in some finite number of steps, with positive probability.
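Irreducibility depends only on which entries of $P$ are positive, so it can be checked as graph reachability. A sketch using breadth-first search (function names are our own, states 0-indexed):

```python
from collections import deque

def reachable_from(P, start):
    """States reachable from `start` (in zero or more steps) along positive-probability jumps."""
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def is_irreducible(P):
    """True if every state can reach every other state with positive probability."""
    n = len(P)
    return all(len(reachable_from(P, i)) == n for i in range(n))

line = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
absorbing = [[1.0, 0.0], [0.5, 0.5]]   # state 0 can never be left
print(is_irreducible(line), is_irreducible(absorbing))  # True False
```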
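A finite-state Markov chain is said to be ergodic if it is irreducible and has an additional property called aperiodicity: for each state $i$, the greatest common divisor of the times $n \geq 1$ with $(P^n)_{ii} > 0$ is $1$, so the chain does not cycle through its states with a fixed period. The ergodic theorem for Markov chains says (roughly) that for an ergodic Markov chain the distribution of $X_n$ approaches the unique "stationary distribution" $\pi$, i.e. the probability vector with $\pi P = \pi$, as time $n \to \infty$, whatever the initial state $X_0$.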
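The role of aperiodicity can be seen in a small sketch that evolves the distribution $\mu_n$ of $X_n$ via $\mu_{n+1} = \mu_n P$ (plain lists; the example matrices and helper names are our own). A two-state chain that always flips is irreducible but has period 2, so its distribution oscillates forever even though $(1/2, 1/2)$ is stationary; a chain with self-loops converges, here to $(2/3, 1/3)$.

```python
def step(mu, P):
    """One step of the distribution of X_n: (mu P)_j = sum_i mu_i P_ij."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

def distribution_at(P, mu0, n_steps):
    """Distribution of X_n for n = n_steps, starting from the distribution mu0 of X_0."""
    mu = list(mu0)
    for _ in range(n_steps):
        mu = step(mu, P)
    return mu

flip = [[0.0, 1.0], [1.0, 0.0]]   # irreducible but periodic (period 2)
lazy = [[0.9, 0.1], [0.2, 0.8]]   # irreducible and aperiodic (self-loops)

start = [1.0, 0.0]                # X_0 = first state with probability 1
print(distribution_at(flip, start, 100), distribution_at(flip, start, 101))
# [1.0, 0.0] [0.0, 1.0]: oscillates and never settles at (1/2, 1/2)
print(distribution_at(lazy, start, 100))
# approximately [2/3, 1/3]
```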
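Now in the case of physical systems, an additional assumption is usually a form of microscopic reversibility: the transition probabilities are symmetric, $P_{ij} = P_{ji}$, so jumping from $i$ to $j$ is exactly as likely as jumping from $j$ to $i$. Under this assumption the uniform distribution, which assigns equal probability $1/N$ to each of the possible states, is stationary, since $\sum_{i} \frac{1}{N} P_{ij} = \frac{1}{N} \sum_{i} P_{ji} = \frac{1}{N}$; for an irreducible chain it is therefore the unique stationary distribution. (Reversibility in the general sense of detailed balance, $\pi_i P_{ij} = \pi_j P_{ji}$ for some distribution $\pi$, is not enough on its own: a reversible chain can have a non-uniform stationary distribution.)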
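An exact check with rational arithmetic (`fractions.Fraction`; the two example matrices and the helper names are our own). For the symmetric matrix the uniform distribution is stationary. The birth-death chain satisfies detailed balance with $\pi = (1/4, 1/2, 1/4)$, so it is reversible, yet the uniform distribution is not stationary for it.

```python
from fractions import Fraction as F

def is_stationary(pi, P):
    """Check pi P == pi exactly."""
    n = len(P)
    return all(sum(pi[i] * P[i][j] for i in range(n)) == pi[j] for j in range(n))

def satisfies_detailed_balance(pi, P):
    """Check pi_i P_ij == pi_j P_ji for all i, j (this implies stationarity)."""
    n = len(P)
    return all(pi[i] * P[i][j] == pi[j] * P[j][i] for i in range(n) for j in range(n))

half, quarter = F(1, 2), F(1, 4)

symmetric = [[half, half, 0], [half, 0, half], [0, half, half]]
uniform = [F(1, 3)] * 3

birth_death = [[half, half, 0], [quarter, half, quarter], [0, half, half]]
pi_bd = [quarter, half, quarter]

print(is_stationary(uniform, symmetric))                     # True
print(is_stationary(pi_bd, birth_death),
      satisfies_detailed_balance(pi_bd, birth_death))        # True True
print(is_stationary(uniform, birth_death))                   # False
```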
Putting all this together, we see that a finite-state ergodic Markov chain with a symmetric transition matrix converges to the uniform distribution (i.e. it reaches an equilibrium, as time goes to infinity, in which all states are equally likely).
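A numerical check of this conclusion for one concrete chain, a sketch assuming the symmetric, irreducible, aperiodic 3-state matrix below (our example): starting from a point mass, the total variation distance $\frac{1}{2}\sum_j |\mu_n(j) - 1/N|$ between the distribution $\mu_n$ of $X_n$ and the uniform distribution shrinks to zero. For this particular matrix one can verify by hand that the distance is exactly $\frac{2}{3} \cdot 2^{-n}$.

```python
def step(mu, P):
    """One step of the distribution of X_n: (mu P)_j = sum_i mu_i P_ij."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

def tv_to_uniform(mu):
    """Total variation distance between mu and the uniform distribution."""
    n = len(mu)
    return 0.5 * sum(abs(m - 1.0 / n) for m in mu)

P = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]          # symmetric: P[i][j] == P[j][i]
mu = [1.0, 0.0, 0.0]           # start deterministically in the first state
distances = []
for n in range(31):
    distances.append(tv_to_uniform(mu))
    mu = step(mu, P)

print(round(distances[0], 3), round(distances[1], 3))  # 0.667 0.333
print(distances[30] < 1e-8)                            # True
```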
The notion of ergodic dynamical system you asked about is a vast generalization of this idea.
Solution 2:
The equality of the time average and the space average essentially means that a typical trajectory travels through the space so randomly that it behaves as if it visits every region and, moreover, spends in each region a fraction of time proportional to the size (measure) of that region.
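A toy illustration of "time average equals space average", under assumptions of our own choosing: the rotation $x \mapsto x + \alpha \pmod 1$ of $[0, 1)$ with irrational $\alpha$ preserves length (Lebesgue measure) and is ergodic, so the fraction of time an orbit spends in an interval $[a, b)$ should approach $b - a$. The function name `time_average_rotation` is hypothetical, and the rotation is a discrete-time stand-in for a flow, not a physical system.

```python
import math

def time_average_rotation(alpha, x0, a, b, n_steps):
    """Fraction of times k = 0..n_steps-1 at which x_k = x0 + k*alpha (mod 1) lies in [a, b)."""
    hits = 0
    x = x0
    for _ in range(n_steps):
        if a <= x < b:
            hits += 1
        x = (x + alpha) % 1.0
    return hits / n_steps

alpha = math.sqrt(2) - 1  # irrational rotation number
for x0 in (0.0, 0.1, 0.77):
    # Each time average should be close to the space average b - a = 0.3.
    print(x0, time_average_rotation(alpha, x0, 0.2, 0.5, 100_000))
```

With a rational $\alpha$ this breaks down: every orbit is periodic, visits only finitely many points, and the time average then depends on $x_0$.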
In order to be a bit more precise, we need to go back to the foundations of ergodic theory.
Ergodic theory is essentially the study of maps and flows preserving a measure. This includes the study of the stochastic properties of the dynamics, such as ergodicity. Its origins go back to statistical mechanics, with the attempt to apply probability theory to conservative mechanical systems (recall that any Hamiltonian flow preserves the Liouville measure, hence the natural connection to ergodic theory).
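The parenthetical remark rests on a short standard computation, sketched here assuming a twice continuously differentiable Hamiltonian $H(q, p)$ with $d$ degrees of freedom. The Hamiltonian vector field $F$ has zero divergence, which (Liouville's theorem) means its flow preserves phase-space volume; and $H$ is constant along trajectories, so each energy level is invariant:

$$
\dot q_k = \frac{\partial H}{\partial p_k}, \qquad \dot p_k = -\frac{\partial H}{\partial q_k}, \qquad k = 1, \ldots, d,
$$

$$
\nabla \cdot F = \sum_{k=1}^{d} \left( \frac{\partial}{\partial q_k} \frac{\partial H}{\partial p_k} - \frac{\partial}{\partial p_k} \frac{\partial H}{\partial q_k} \right) = 0,
\qquad
\frac{dH}{dt} = \sum_{k=1}^{d} \left( \frac{\partial H}{\partial q_k} \frac{\partial H}{\partial p_k} - \frac{\partial H}{\partial p_k} \frac{\partial H}{\partial q_k} \right) = 0.
$$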
Boltzmann's ergodic hypothesis amounts to assuming that typical points in a given energy level have a time average equal to the space average over that energy level (energy levels of conservative systems are invariant, so trajectories cannot escape from them). From the mathematical point of view this requires the notion of ergodicity, which simply means that any invariant set has either zero or full measure (with respect to, say, the Liouville measure).
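One way to formalize this, in notation of our own: let $\varphi_t$ be the Hamiltonian flow, $\Sigma_E = \{H = E\}$ an energy level (assumed compact), and $\mu$ the invariant probability measure that the Liouville measure induces on $\Sigma_E$ (the microcanonical measure). Time and space averages of an integrable observable $f$ are

$$
\overline{f}(x) = \lim_{t \to \infty} \frac{1}{t} \int_0^t f(\varphi_s(x)) \, ds,
\qquad
\langle f \rangle = \int_{\Sigma_E} f \, d\mu .
$$

Birkhoff's ergodic theorem says that $\overline{f}(x)$ exists for $\mu$-almost every $x$, and that $\overline{f} = \langle f \rangle$ almost everywhere for every $f \in L^1(\mu)$ if and only if $\mu$ is ergodic:

$$
\varphi_t(A) = A \ \text{for all } t \quad \Longrightarrow \quad \mu(A) \in \{0, 1\}.
$$

Taking $f = \mathbf{1}_A$, the indicator of a region $A \subseteq \Sigma_E$, gives $\overline{\mathbf{1}_A}(x) = \mu(A)$: the fraction of time spent in $A$ equals its measure, the continuous analogue of "all microstates are equally likely".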
In another (in fact more basic) direction, the existence of a finite invariant measure gives rise to qualitative Poincaré recurrence: almost every point returns arbitrarily close to where it started, infinitely often. Strictly speaking, this is unrelated to ergodicity. Similarly, the statement that almost "every phase space trajectory comes arbitrarily close to every phase space point with the same values of all conserved variables as the initial point of the trajectory" only says which points a trajectory approaches, not what fraction of time it spends near them, so it does not capture ergodicity either.
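The gap between recurrence and ergodicity already shows up for the rational rotation $x \mapsto x + 1/4 \pmod 1$ (a toy example of our choosing, with hypothetical helper names). It preserves length and every point returns exactly to itself after 4 steps, yet the set $\bigcup_{k=0}^{3} [k/4, k/4 + 1/8)$ is invariant with measure $1/2$, so the map is not ergodic and time averages depend on the starting point. Exact `Fraction` arithmetic avoids floating-point drift in the return-time check.

```python
from fractions import Fraction

def first_return_time(alpha, x0, max_steps=1000):
    """Smallest k >= 1 with x0 + k*alpha == x0 (mod 1), or None if not found."""
    x = x0
    for k in range(1, max_steps + 1):
        x = (x + alpha) % 1
        if x == x0:
            return k
    return None

def time_average(alpha, x0, a, b, n_steps):
    """Exact fraction of times k = 0..n_steps-1 at which the orbit lies in [a, b)."""
    hits = 0
    x = x0
    for _ in range(n_steps):
        if a <= x < b:
            hits += 1
        x = (x + alpha) % 1
    return Fraction(hits, n_steps)

alpha = Fraction(1, 4)
print(first_return_time(alpha, Fraction(1, 10)))                     # 4
# Time spent in A = [0, 1/8), whose length is 1/8, from two starting points:
print(time_average(alpha, Fraction(1, 10), 0, Fraction(1, 8), 400))  # 1/4
print(time_average(alpha, Fraction(1, 5), 0, Fraction(1, 8), 400))   # 0
```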