Derivation of the Boltzmann factor in statistical mechanics

I have seen similar derivations of the Boltzmann factor many times before (see, for example, http://micro.stanford.edu/~caiwei/me334/Chap8_Canonical_Ensemble_v04.pdf), and I think they are incomplete.

The argument is as follows:

Consider the system consisting of our small object in contact with a big reservoir. Let the total energy be $U$; then when our object has energy $E$, the reservoir has energy $U-E$. Let the number of accessible states of the reservoir, as a function of its energy $x$, be $\Omega(x)$. Then the probability of finding the object with energy $E$ is

$$p(E) \propto \Omega(U-E)$$

Consider the Taylor expansion of $\ln\Omega(x)$: $$\ln\Omega(U-E)\approx \ln \Omega(U)-\frac{\partial \ln\Omega(x)}{\partial x}\bigg|_{x=U}E$$ Define $$\frac{1}{k T}=\frac{\partial \ln\Omega(x)}{\partial x}\bigg|_{x=U}$$ Then exponentiating both sides, we have $$\Omega(U-E)\approx \Omega(U)\exp(-E/kT)$$ So $$p(E)\propto \exp(-E/kT)$$

This must be incomplete, because the same steps can be applied to any positive function to "prove" that it is exponential.

For example, we can show that for any function $f(x)$, $$f(x) \approx A\exp[B(x-x_0)]$$ around some $x_0$ by the above "proof": $$\ln f(x)\approx \ln f(x_0)+\frac{d \ln f(x)}{dx}\bigg|_{x_0}(x-x_0)$$ $$f(x) \approx f(x_0)\exp\left[\frac{d \ln f(x)}{dx}\bigg|_{x_0}(x-x_0)\right]$$

Besides, the same trick works with any invertible function in place of $\exp$ to "linearize" $f$ around $x_0$. For example, using $\sin$:

$$\sin(f(x))\approx \sin(f(x_0))+\cos(f(x_0))f'(x_0)(x-x_0)$$ $$f(x) \approx \sin^{-1}\left[\sin(f(x_0))+\cos(f(x_0))f'(x_0)(x-x_0)\right]$$

The flaw is obviously due to dropping the higher-order terms in the Taylor series. So there must be something special about $\ln$ in particular that allows us to drop the higher-order terms in the series expansion of $\ln\Omega$ and then exponentiate the result. I suppose it's because the higher-order terms all vanish in the thermodynamic limit?
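That thermodynamic-limit intuition can be checked numerically. Below is a minimal sketch assuming, purely for illustration, a power-law bath density of states $\Omega(x) = x^N$ (roughly what an ideal-gas-like bath with $N$ degrees of freedom gives; this model is not part of the derivation above). The gap between the exact log-ratio $\ln[\Omega(U-E)/\Omega(U)]$ and its first-order Boltzmann approximation $-E/kT$ shrinks as the bath grows.

```python
import math

# Minimal numerical sketch.  Assume (purely for illustration) a power-law
# bath density of states Omega(x) = x**N; this model is an assumption,
# not something fixed by the question.
def log_omega(x, N):
    return N * math.log(x)

E = 1.0  # energy of the small object, held fixed
u = 1.0  # bath energy per degree of freedom, held fixed

for N in (10, 100, 10_000, 1_000_000):
    U = N * u                                      # total energy grows with the bath
    exact = log_omega(U - E, N) - log_omega(U, N)  # ln[Omega(U-E)/Omega(U)]
    beta = N / U                                   # d(ln Omega)/dx at x = U, i.e. 1/kT
    approx = -beta * E                             # first-order (Boltzmann) term
    print(N, exact, approx, exact - approx)        # the gap shrinks as N grows
```

For this particular model the discrepancy is $N\ln(1-1/N)+1 \approx -1/(2N)$, so the exponential form becomes exact as $N\to\infty$ at fixed energy per particle.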

Can anyone tell me what I am missing? Your help is much appreciated. Kind regards!


Solution 1:

(As I understand it,) this is not a purely mathematical problem but a mixture of physics and math, and the key lies in the physical picture. The heat bath is a trillion times larger than the system. The energy fluctuation of the system relative to the bath is like a small ripple on the Pacific Ocean (let's say there are no waves or currents).

So first order is all there is. The reason for taking $\ln$ of $\Omega$ comes from the timeless $S = k_B \ln \Omega$ that connects statistical mechanics with thermodynamics. You are right that any positive function can be expressed as an exponential locally, but if you look far beyond that local neighborhood, you are doing math, not physics.

Solution 2:

I was seeking an answer to the same question and found this explanation in the Thermal Physics lecture slides at the University of Central Arkansas:

All higher-order partial derivatives are zero by the assumption of a large heat bath: $$\frac{\partial^2 \sigma}{\partial U^2} = \frac{\partial}{\partial U} \frac{\partial \sigma}{\partial U} = \frac{\partial}{\partial U} \left(\frac{1}{\tau}\right) = 0$$

Here, following Kittel and Kroemer, I am writing the entropy as $\sigma(U) = \log \left[\Omega(U)\right]$ and the fundamental temperature as $1/\tau = \partial \sigma(U, N, V, \ldots)/\partial U$.

This is more satisfying to me than another, more common, explanation I have seen (e.g., here): that the state energy, $\varepsilon$, is much smaller than the total system energy, $U_0$, so higher-order terms can be neglected. I don't see how that argument can be made, since in the Taylor expansion of the entropy $$\sigma(U_0-\varepsilon) = \sigma(U_0) + \frac{{\rm d} \sigma}{{\rm d} U} (-\varepsilon) + \frac{1}{2!} \frac{{\rm d}^2 \sigma}{{\rm d} U^2} \, (-\varepsilon)^2 + \cdots$$ there is no way to write the series in powers of the small parameter $\varepsilon/U_0$.

Simply claiming that the reservoir is so large that its temperature is a constant seems cleaner to me.
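A concrete check of this claim, under the illustrative assumption of a monatomic ideal-gas bath of $N$ particles (a model not specified in the slides): its entropy is $$\sigma(U) = \frac{3N}{2}\log U + \text{const},$$ so $$\frac{1}{\tau} = \frac{\partial \sigma}{\partial U} = \frac{3N}{2U}, \qquad \frac{\partial^2 \sigma}{\partial U^2} = -\frac{3N}{2U^2}.$$ The second-order term in the expansion is then $$\frac{1}{2!}\frac{\partial^2 \sigma}{\partial U^2}\,\varepsilon^2 = -\frac{3N}{4U_0^2}\,\varepsilon^2 = -\frac{3}{4N}\left(\frac{\varepsilon}{u}\right)^2,$$ writing $U_0 = Nu$ with $u$ the bath energy per particle. So the relevant small parameter is not $\varepsilon/U_0$ but $1/N$ at fixed $\varepsilon$ and $u$: the correction dies off as the bath grows, which is precisely the statement that the bath temperature is constant.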

Solution 3:

I think the question is why the statistical definition of temperature is meaningful.

$$ \frac{1}{k T}=\frac{\partial \ln\Omega(x)}{\partial x}\bigg|_{x=U} $$

Temperature is just a number we associate with our probability distribution $\Omega$ as the energy parameter varies from $U$ to $U-E$.

The canonical system is built from a large number (the Avogadro constant is about $10^{23}$) of dynamical systems with energy shuffling back and forth between them. We have no way to keep track of all of them.

What is the most likely way to partition the energy? We will assume a multinomial distribution of the energy, and this partition changes with time.

$$ \mathbb{P}[E = E_1(t) + \dots + E_N(t)] = \binom{M}{M_1(t), \dots, M_N(t)} p_1^{M_1(t)}\dots p_N^{M_N(t)}$$

Using the ergodic hypothesis we assume some equilibrium is achieved, but did we actually prove that? Does the weather reach equilibrium? Anyway, assuming ergodicity, we can drop the time dependence.

$$ \mathbb{P}[E = E_1 + \dots + E_N] = \binom{M}{M_1, \dots, M_N} p_1^{M_1}\dots p_N^{M_N} \approx \exp\left[-M \sum_i q_i \log \frac{q_i}{p_i}\right], \qquad q_i = \frac{M_i}{M}$$ where the approximation comes from Stirling's formula.

Intuitively speaking, the most likely partition should distribute the energy according to the weights $p_i$. In fact, the multinomial probability is maximized when $M_i \propto p_i$, justifying the approximation in the second equation.
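This concentration is easy to see numerically. Below is a sketch with an arbitrary illustrative 3-outcome distribution $p$ (not tied to any physical model): the empirical partition $M_i/M$ piles up on $p_i$ as the number of trials $M$ grows.

```python
import random

# Illustrative 3-outcome distribution; the values and sample sizes are
# arbitrary choices for the demonstration.
p = [0.5, 0.3, 0.2]
random.seed(0)

def max_deviation(M):
    """Draw M multinomial trials and return max_i |M_i/M - p_i|."""
    counts = [0, 0, 0]
    for _ in range(M):
        r = random.random()
        if r < p[0]:
            counts[0] += 1
        elif r < p[0] + p[1]:
            counts[1] += 1
        else:
            counts[2] += 1
    return max(abs(c / M - q) for c, q in zip(counts, p))

for M in (100, 10_000, 1_000_000):
    print(M, max_deviation(M))  # deviation shrinks roughly like 1/sqrt(M)
```

The $1/\sqrt{M}$ shrinkage of the fluctuations is the same large-numbers effect that makes the most likely partition overwhelmingly dominant.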


Did you buy my assumption that $\Omega$ is multinomial?

The exchangeability and ergodicity hypotheses make this argument rather shaky, as is my confidence in my own handling of this topic. However, most textbooks assume it is true without validating it.

Solution 4:

You are correct that for $\Omega(x)>0$ one can always write $\Omega(x)=\exp(\ln(\Omega(x)))=\exp(\ln(\Omega(x_0+(x-x_0))))$ and then Taylor expand the logarithm. The first-order Taylor expansion of the logarithm is $\ln(\Omega(x_0))+\frac{\Omega'(x_0)}{\Omega(x_0)}(x-x_0)$, so we are left with $\Omega(x_0)\exp \left ( \frac{\Omega'(x_0)}{\Omega(x_0)}(x-x_0) \right )$ when we perform this procedure.

The question is: how large is the error in this approximation? The Lagrange remainder in the Taylor expansion of the logarithm is $\frac{1}{2}\left(\ln\Omega\right)''(y)\,(x-x_0)^2$ where $y$ is between $x_0$ and $x$, so the relative error is $\exp \left ( \frac{1}{2}\left(\ln\Omega\right)''(y)\,(x-x_0)^2 \right )$. You are then left to argue that this exponent vanishes for a large bath: because entropy is extensive, $\ln\Omega(x) \approx N s(x/N)$ for a bath of $N$ constituents, so $\left(\ln\Omega\right)''$ scales like $1/N$ (at fixed energy density), while $(x-x_0)^2 = E^2$ stays fixed. The exponent therefore scales as $E^2/N$ and the relative error tends to $1$ in the thermodynamic limit, though the details depend somewhat on the finer structure of the model for the microstates.
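A quick numerical check of how the remainder scales with bath size, again under the illustrative assumption of a power-law bath $\Omega(x) = x^N$ (this model is not fixed by the answer above):

```python
import math

# Sketch of the remainder scaling.  Assume, for illustration only, a
# power-law bath Omega(x) = x**N, so ln Omega(x) = N*ln(x) and
# (ln Omega)''(x) = -N/x**2.
E = 1.0  # system energy, held fixed
u = 1.0  # bath energy per degree of freedom, held fixed

for N in (10, 1_000, 100_000):
    U = N * u
    exact = N * math.log(U - E)            # ln Omega(U - E), exact
    first = N * math.log(U) - (N / U) * E  # first-order Taylor expansion
    # |(ln Omega)''(y)| = N / y**2 is largest at y = U - E on [U - E, U]
    bound = 0.5 * (N / (U - E) ** 2) * E ** 2
    print(N, abs(exact - first), bound)    # both shrink roughly like 1/N
```

The true error always sits inside the Lagrange bound, and both decay like $1/N$, consistent with the extensivity argument above.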