What does an empirical distribution represent?
Solution 1:
A physical model of the distribution of a random variable $X$ is this: write all possible values of $X$ on slips of paper (repeating values as needed to give them higher probabilities), put them in a box, mix them thoroughly, and draw one out. Probabilities are just proportions in the box: the probability of any set $A$ of possible values of $X$ is found by counting the tickets with values in $A$ and dividing by the total number in the box.
The empirical distribution (the "EDF") describes the random variable you get when you take your entire sample--that is, all the slips of paper you have drawn to model a set of observations--and put them into an empty box.
A fancy way to compute probabilities for the EDF is the following. Given a set $A$ whose probability you wish to know, examine every ticket in the box (that is, each observation) and write a "1" if the value on that ticket is in $A$ and otherwise write a "0". (The symbol for this procedure is "$I(X_i \in A)$.") Adding up these values counts the number of tickets with value in $A$. Divide by the total number of tickets, $n$, to compute the proportion. That's all the formula in the question is doing.
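To make this concrete, here is a minimal R sketch of that count-and-divide procedure (the sample x and the set $A = [2, 4]$ are invented for illustration):

x <- c(2.3, 5.1, 0.7, 3.9, 4.4)   # a hypothetical sample of n = 5 tickets
in_A <- (x >= 2 & x <= 4)         # the indicators I(X_i in A) for A = [2, 4]
sum(in_A) / length(x)             # proportion of tickets in A: here 2/5 = 0.4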
The EDF has practical applications in simulations, permutation tests, and--especially--resampling methods. The intuition is that if your observations are representative of the original population (that is, of the set of tickets in the original box), then studying the EDF teaches you how to make inferences about the contents of a box from a sample of it.
As an example, suppose you receive 40 "no" and 60 "yes" answers from 100 responses to a question given to a random sample of people. Because the sample is random, the 40:60 split is subject to chance variation, so you would like to estimate how closely it might match the actual proportion of opinions within the entire population. To find out, put your 100 slips of paper into a box, which therefore contains 40% no and 60% yes values (that's the empirical distribution). Sample this box 100 times with replacement, so that its contents remain the same from one draw to the next. This emulates your original sampling procedure but, due to chance variation, is likely to produce a different result. Repeat this sampling--use a computer--thousands of times to see how much the results vary. This yields a resampling estimate of the variability of your actual sample.
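A minimal R sketch of this scheme (the 0/1 coding of the "no"/"yes" answers and the 10,000 repetitions are my choices, not part of the answer above):

box <- c(rep(0, 40), rep(1, 60))   # the box: 40 "no" (0) and 60 "yes" (1) slips
# draw 100 slips with replacement, thousands of times, recording the "yes" share
props <- replicate(10000, mean(sample(box, 100, replace = TRUE)))
sd(props)      # resampling estimate of the variability of the observed 60% split
hist(props)    # shows how much the results vary around 0.60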
Solution 2:
The empirical distribution is the distribution you'd get if you sampled from your sample instead of the whole population.
Solution 3:
All that's going on is that you are defining a (random) probability measure that assigns to each set $A$ the probability $\frac 1 n \sum_{i = 1} ^ n I(X_i \in A)$, which is the proportion of observations that fall in $A$. For example, imagine I have a random sample $(1, 4, 5, 2, 3)$ from some population, and let $A$ be the set of even integers. Then the empirical probability distribution assigns probability $\frac 2 5$ to $A$ because two-fifths of the observations are even.
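In R, for instance, this computation is a one-liner (a sketch using the sample above):

x <- c(1, 4, 5, 2, 3)   # the random sample
mean(x %% 2 == 0)       # proportion of even observations: returns 0.4 = 2/5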
It's an interesting animal. One checks easily that, for fixed $A$, this produces an unbiased estimator of $P(X_1 \in A)$ and that, as $n \to \infty$, it converges in probability to $P(X_1 \in A)$. With more effort this can be strengthened considerably, and the convergence properties are quite nice: see, for example, the Glivenko-Cantelli theorem, which concerns the closely related empirical distribution function, and the DKW inequality, which gives a rate of convergence of the empirical distribution function to the true distribution function.
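A quick way to watch this convergence in action (a sketch, using a standard normal population of my choosing and R's built-in ecdf):

set.seed(1)
grid <- seq(-4, 4, length.out = 1000)
for (n in c(10, 100, 10000)) {
  Fhat <- ecdf(rnorm(n))   # empirical distribution function of a sample of size n
  cat(n, max(abs(Fhat(grid) - pnorm(grid))), "\n")   # approximate sup-distance to
}                                                    # the true CDF; shrinks with n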
Solution 4:
In the programming language R, the command rnorm(10) simulates a random sample of size 10 from a standard normal distribution; the "bell-shaped curve" is the probability distribution from which one samples. So I do this:
sort(rnorm(10))
[1] -1.41555384 -0.59325095 -0.41850747 -0.39489145 -0.29435177 0.04814372
[7] 0.16891370 0.48928250 0.96755695 1.88467730
(The "sort" command sorts them from smallest to largest.) The empirical distribution is then the distribution of a random variable that is equal to $-1.41555384$ with probability $1/10$, and to $-0.59325095$ with probability $1/10$, and so on.
Empirical distributions are involved in the Kolmogorov–Smirnov test and the Lilliefors test (among other things).
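For example, base R's ks.test compares the empirical distribution of a sample with a hypothesized distribution (the sample here is invented):

x <- rnorm(25)
ks.test(x, "pnorm")   # KS test of x's empirical distribution against N(0, 1)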