What is degree of freedom in statistics?

Solution 1:

Intuitively, degrees of freedom denote how many independent things there are. Each constraint we introduce takes away a degree of freedom.

First I'll try to answer your question about Chi-square.

The chi-square distribution with $n$ degrees of freedom is the distribution of the sum of squares of $n$ independent standard normal random variables $N(0,1)$; hence we've got $n$ things that vary independently.
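To make the chi-square statement concrete, here is a small simulation sketch. It only uses the Python standard library; the helper name `chi_square_draw`, the seed, and the choice $n = 5$ are my own assumptions for illustration. It draws sums of squares of $n$ independent $N(0,1)$ values and compares the sample mean and variance with the known values $n$ and $2n$ for a chi-square distribution.

```python
import random
import statistics


def chi_square_draw(n, rng):
    """One draw: the sum of squares of n independent N(0, 1) values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
n = 5                   # degrees of freedom (illustrative choice)
draws = [chi_square_draw(n, rng) for _ in range(20000)]

sample_mean = statistics.mean(draws)     # should be close to n
sample_var = statistics.variance(draws)  # should be close to 2 * n

print("sample mean:", sample_mean, "theory:", n)
print("sample variance:", sample_var, "theory:", 2 * n)
```

Changing `n` should shift both numbers accordingly, which is one way to see that the parameter really counts the independent normal terms.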

I'll start with a mechanical example, since the notion of degrees of freedom is similar in every field.

Consider an airplane flying. It has three degrees of freedom in the usual universe of space, and can be located only if three coordinates are known. These might be latitude, longitude, and altitude; or altitude, horizontal distance from some origin, and an angle; or direct distance from some origin and two direction angles. If we consider a given instant of time as a section through the space-time universe, the airplane moves in a four-dimensional path and can be located by four coordinates: the three previously named and a time coordinate. Hence it now has $4$ d.f.

Note that we assumed that the plane is not rotating.

Now consider statistical degrees of freedom. The meaning is similar.

The degrees of freedom of a statistic is the number of values in its calculation that are free to vary independently. As we add restrictions to the observations, we reduce the degrees of freedom. Imposing a relationship upon the observations is equivalent to estimating a parameter from them. The number of degrees of freedom equals the number of independent observations, which is the number of original observations minus the number of parameters estimated from them.

Consider the calculation of the mean $\bar X = \frac{\sum_{i=1}^n X_i}{n}$. We are interested in the errors, which are estimated by the residuals $X_i - \bar X$. The sum of the residuals is $0$, so knowing any $n-1$ residuals gives the remaining one. So only $n-1$ of them can vary independently. Hence they have $n-1$ d.f.
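The residual argument can be checked numerically. The sketch below uses made-up data (the values in `data` are purely illustrative): it computes the residuals about the sample mean, confirms that they sum to zero, and recovers the last residual from the other $n-1$.

```python
import math

data = [2.0, 4.0, 4.0, 5.0, 10.0]  # illustrative values, not from any real data set
n = len(data)
mean = sum(data) / n

# Residuals about the sample mean.
residuals = [x - mean for x in data]

# The constraint: residuals always sum to zero (up to rounding).
total = sum(residuals)

# Because of that constraint, the last residual is determined by the first n - 1.
recovered_last = -sum(residuals[:-1])

print("residuals:", residuals)
print("sum of residuals:", total)
print("last residual:", residuals[-1], "recovered:", recovered_last)
print("agree:", math.isclose(recovered_last, residuals[-1]))
```

One estimated parameter (the mean) imposes one linear constraint, which is why only $n-1$ of the $n$ residuals are free.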

However, d.f. is mainly used in regression analysis and ANOVA. You may note that all the distributions with so-called d.f. correspond to particular cases in linear statistics. Hence d.f. is at best artificial: they are not constraints on the random variable, but are actually the degrees of freedom of some quantities in the applications from which these distributions originated.

Also, for people who are interested, <http://courses.ncssm.edu/math/Stat_Inst/Worddocs/DFWalker.doc> seems to be quite a good read.

Solution 2:

Two people are sitting at a bar: you and your friend. There are two sorts of juice before you, one sweet, one sour. After you have chosen your drink, say the sweet one, your friend has no choice left, so the number of degrees of freedom is "1": only one of you can choose.

Generalize it to a group of friends to understand higher degrees of freedom...