Is there a proof of Benford's Law?
As a rough, somewhat intuitive explanation of why Benford's Law makes sense, consider it with respect to amounts of money. The amount of time (or effort, or work) needed to get from \$1000 to \$2000 (a 100% increase) is a lot greater than the amount needed to get from \$8000 to \$9000 (a 12.5% increase), since money usually grows in proportion to the amount one already has. Thought of the other way around, it should take a roughly fixed amount of time to, say, double one's money, so going from \$1000 to \$2000 takes about as long as going from \$2000 to \$4000, or from \$4000 to \$8000. The leading digit therefore spends more time at lower values than at higher ones, and because the growth is exponential, the time spent on each leading digit is roughly logarithmic.
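To see this numerically, here is a rough sketch (my own illustration, with an arbitrary growth rate of $0.1\%$ per step): grow a balance multiplicatively, tally the leading digit at each step, and compare the frequencies with $\log\left(1+\frac1d\right)$.

```python
from collections import Counter
from math import log10

def leading_digit(x: float) -> int:
    # 10 raised to the fractional part of log10(x) is the mantissa in [1, 10)
    return int(10 ** (log10(x) % 1))

balance, counts, steps = 1000.0, Counter(), 100_000
for _ in range(steps):
    counts[leading_digit(balance)] += 1
    balance *= 1.001  # grow the balance by 0.1% each step

for d in range(1, 10):
    # empirical frequency vs. the Benford probability log10(1 + 1/d)
    print(d, counts[d] / steps, log10(1 + 1 / d))
```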
Benford's Law
To examine the distribution of the mantissas of a dataset, we can examine the fractional parts of the common logarithms of the data. That's because the fractional part of the common logarithm is the common logarithm of the mantissa.
For example, consider numbers with mantissa $2.5$: $$ \begin{align} \log(.025) &= .39794000867203760957 - 2\\ \log(.25) &= .39794000867203760957 - 1\\ \log(2.5) &= .39794000867203760957 + 0\\ \log(25) &= .39794000867203760957 + 1\\ \log(250) &= .39794000867203760957 + 2\\ \log(2500) &= .39794000867203760957 + 3 \end{align} $$ Thus, if the mantissa of a number is $2.5$, the fractional part of its common logarithm is $.39794000867203760957$.
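This identity is easy to check numerically; in the sketch below, note that Python's `%` operator conveniently maps the negative logarithms into $[0,1)$ as well:

```python
from math import log10

for x in [0.025, 0.25, 2.5, 25, 250, 2500]:
    # the fractional part of the common logarithm; each line prints
    # approximately 0.3979400086720376
    print(x, log10(x) % 1)
```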
If the data spans several decades (powers of $10$, not years; see Decade (log scale)), then when we combine the data from all of the decades, the distribution tends to even out. Thus, the fractional parts of the common logarithms of the data should be close to evenly distributed.
For example, suppose the common logarithm of the data is distributed across $4$ decades as shown below:
[Figure: the distribution of the common logarithm of the data, spread across $4$ decades]
Summing the distributions of the fractional parts of the logarithms, that is, moving the decades on top of each other and adding the curves, we get the black curve at the top of the image below, which is close to evenly distributed:
[Figure: the four decades overlaid; their sum, the black curve at the top, is nearly flat]
Thus, we arrive at the principal assumption of Benford's Law: the logarithm of the mantissa of data which spans several decades is typically distributed evenly.
Note the hash marks on the bottoms of the graphs above. These marks separate where the different leading digits of the mantissa live. On the line segment below, we expand these hash marks and align the leading digit of the mantissa with the fractional part of the common logarithm. The leading digit of the mantissa is $1$ if the fractional part of the common logarithm is between $0$ and $.30103$; the leading digit is $2$ if the fractional part is between $.30103$ and $.47712$; and so on.
[Figure: the unit interval of fractional parts, divided at $\log(2),\log(3),\dots,\log(9)$ and labeled with the corresponding leading digits]
If the principal assumption of Benford's Law holds, the fractional part of the common logarithm is evenly distributed. In view of the previous diagram, it is obvious that the probability of $1$ being the leading digit is greater than that of $2$ being the leading digit; the probability of $2$ is greater than that of $3$; and so on. This is made precise below.
Data that has a mantissa starting with the digit $1$ has a common logarithm whose fractional part ranges from $\log(1)$ to $\log(2)$. If the fractional part of the common logarithm of the data is evenly distributed, then the portion of the data that starts with $1$ would be $$ \frac{\log(2)-\log(1)}{\log(10)-\log(1)}=.30102999566398119521 $$ Similarly, data that has a mantissa starting with the digit $2$ has a common logarithm whose fractional part ranges from $\log(2)$ to $\log(3)$. Thus, the portion of the data starting with $2$ would be $$ \frac{\log(3)-\log(2)}{\log(10)-\log(1)}=.17609125905568124208 $$ In the same manner, data that has a mantissa starting with the digit $d$ has a common logarithm whose fractional part ranges from $\log(d)$ to $\log(d+1)$. Thus, the portion of the data starting with $d$ would be $$ \frac{\log(d+1)-\log(d)}{\log(10)-\log(1)}\tag{1} $$ Using $(1)$, we can compute the probability that such data will start with the digit $d$: $$ \begin{array}{} d&P(d)\\ \hline\\ 1&.30102999566398119521\\ 2&.17609125905568124208\\ 3&.12493873660829995313\\ 4&.096910013008056414359\\ 5&.079181246047624827723\\ 6&.066946789630613198203\\ 7&.057991946977686754929\\ 8&.051152522447381288949\\ 9&.045757490560675125410 \end{array} $$ This distribution of leading digits is called Benford's Law.
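Since $\log(10)-\log(1)=1$, formula $(1)$ simplifies to $\log\left(1+\frac1d\right)$, and the table can be reproduced in a couple of lines (a minimal sketch):

```python
from math import log10

# P(d) = (log(d+1) - log(d)) / (log(10) - log(1)) = log10(1 + 1/d)
for d in range(1, 10):
    print(d, log10(1 + 1 / d))
```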
Further Digits
The probability that the first two digits are $10$ is $$ \frac{\log(11)-\log(10)}{\log(100)-\log(10)}=.041392685158225040750 $$ The probability that the first two digits are $20$ is $$ \frac{\log(21)-\log(20)}{\log(100)-\log(10)}=.021189299069938072794 $$ Adding the probabilities for all first digits, we can compute the probability that the second digit is $0$ to be $.11967926859688076667$.
In this manner, we can compute the probability that the second digit is $d$: $$ \begin{array}{} d&P(d)\\ \hline\\ 0&.11967926859688076667\\ 1&.11389010340755643889\\ 2&.10882149900550836859\\ 3&.10432956023095946693\\ 4&.10030820226757934031\\ 5&.096677235802322528359\\ 6&.093374735783036121570\\ 7&.090351989269603369600\\ 8&.087570053578861399175\\ 9&.084997352057692199898 \end{array} $$ For reference, here are the probabilities that the third digit is $d$: $$ \begin{array}{} d&P(d)\\ \hline\\ 0&.10178436464421710175\\ 1&.10137597744780144287\\ 2&.10097219813704129959\\ 3&.10057293211092617495\\ 4&.10017808762794737592\\ 5&.099787575692177452606\\ 6&.099401309944962084127\\ 7&.099019206561896092170\\ 8&.098641184154777437875\\ 9&.098267163678253538152 \end{array} $$ Here are the probabilities that the fourth digit is $d$: $$ \begin{array}{} d&P(d)\\ \hline\\ 0&.10017614693993552632\\ 1&.10013688811757926504\\ 2&.10009767259461432585\\ 3&.10005850028348653742\\ 4&.10001937109690488020\\ 5&.099980284947840433784\\ 6&.099941241749525329518\\ 7&.099902241415451708313\\ 8&.099863283859370683672\\ 9&.099824368995291309873 \end{array} $$
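All of these tables follow the same pattern and can be reproduced with a short function (a sketch; `digit_prob` is just an illustrative name) that sums the two-, three-, or four-digit probabilities over every possible prefix:

```python
from math import log10

def digit_prob(n: int, d: int) -> float:
    # Probability that the n-th significant digit is d (d >= 1 when n == 1)
    if n == 1:
        return log10(1 + 1 / d)
    # sum over every possible (n-1)-digit string of leading digits k
    prefixes = range(10 ** (n - 2), 10 ** (n - 1))
    return sum(log10(1 + 1 / (10 * k + d)) for k in prefixes)

print(digit_prob(2, 0))  # 0.11967926859688...
print(digit_prob(3, 0))  # 0.10178436464421...
print(digit_prob(4, 0))  # 0.10017614693993...
```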
For the case of Benford's Law, scale invariance is of course a necessary condition: the law must remain true whether we measure things in meters or in feet or in furlongs, that is, under multiplication of the data by a constant, and the only leading-digit distribution that survives such rescaling is the logarithmic one. But being necessary does not mean that it is the whole answer, of course.
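A small experiment illustrating the invariance (a sketch of my own; the factor $0.3048$, meters per foot, is just one example of a unit conversion): draw values whose logarithms are uniform over an integer number of decades, so that the fractional parts of the logs are uniform and the leading digits are Benford-distributed, then rescale and check that the leading-digit frequencies are unchanged.

```python
import random
from math import log10

random.seed(1)
# log-uniform sample over six decades: fractional parts of log10 are uniform
data = [10 ** random.uniform(0, 6) for _ in range(100_000)]

def leading_digit_freqs(xs):
    counts = [0] * 10
    for x in xs:
        counts[int(10 ** (log10(x) % 1))] += 1
    return [round(c / len(xs), 3) for c in counts[1:]]

print(leading_digit_freqs(data))                        # close to Benford
print(leading_digit_freqs([0.3048 * x for x in data]))  # feet -> meters: unchanged
```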
Scale invariance is not relevant for Zipf's Law, however, since we have an absolute rank.
In fact, Benford's law and Zipf's law are both special cases of Planck's 1901 distribution. Planck calculated the maximum-entropy distribution of particles in boxes (as part of his famous solution of blackbody radiation). If we take the limit of the Planck distribution with many more particles than boxes, we obtain Zipf's law. Benford's law is a bit more complicated, as it requires a constraint on the number of particles in the boxes ($9$ at most). I recommend the book "Entropy God's Dice Game", which derives all of these distributions in its appendices; the book itself is popular-level and almost equation-free. By the way, Benford's law is also called the first-digit law; nevertheless, it holds for any digit in a numerical file that is at the Shannon limit (a compressed file).
Benford's law is easy to prove, but a little harder to explain. The easiest way to understand why it works is to think of each number in a random data set as not one number but two: the number itself and the highest possible number it could have been. Every random number therefore comes with a potential range of possible values. For the data set to be a good one, the potential range should itself be random for each random number; if the potential range is the same for every number, Benford's law will not correctly predict the leading digits.
For example, if you set a computer to randomly pick $15$ numbers, each from the fixed range $1$ to $300$, the odds that the leading digit is a $1$ or a $2$ are exactly the same ($111$ of the $300$ possible numbers begin with each), while the chance of a leading $3$ through $9$ is significantly less. Such random numbers would therefore not line up well with Benford's law.
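A quick simulation of this fixed-range setup (a sketch; the sample size is bumped up to $100{,}000$ so the frequencies are stable) confirms the tie between $1$ and $2$:

```python
import random
from collections import Counter

random.seed(0)
draws = [random.randint(1, 300) for _ in range(100_000)]
freqs = Counter(str(n)[0] for n in draws)
for d in "123456789":
    print(d, freqs[d] / len(draws))
# 1 and 2 each lead about 37% of the time (111 of 300 numbers apiece);
# 3 through 9 trail far behind
```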
In order for Benford's law to work, each random number $n$ must come with its own random potential: $n$ is drawn from a range $1$ through $d$, where the upper limit $d$ itself varies randomly from draw to draw. For example, the first random number is picked from $1$ to $25$, the next from $1$ to $1679$, the next from $1$ to $500$, the next from $1$ to $12478$, then $1$ to $20$, then $1$ to $36$, and so on. This sort of random set is much more likely to lead with a $1$, followed by a $2$, consistent with Benford's law, as the sketch below shows.
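Simulating this two-stage scheme is straightforward; here is a minimal sketch in which the upper limit is drawn log-uniformly between $10$ and $10^5$ (that particular choice is an assumption of mine; the argument above only requires that the limit vary randomly over several decades):

```python
import random
from collections import Counter
from math import log10

random.seed(0)
samples = []
for _ in range(100_000):
    # the ceiling itself is random, varying over several decades
    upper = int(10 ** random.uniform(1, 5))
    samples.append(random.randint(1, upper))

freqs = Counter(str(n)[0] for n in samples)
for d in "123456789":
    # empirical frequency vs. the Benford probability log10(1 + 1/d)
    print(d, round(freqs[d] / len(samples), 3), round(log10(1 + 1 / int(d)), 3))
```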