Number of occurrences of the digit 1 in the numbers from 0 to n

So I dug up my notes on this problem (for those interested this is project euler problem # 156) There is an analytical form for the solution to this problem (but alas, even the analytical form is too slow to solve the big problem, which requires finding ALL numbers that satisfy this condition for ALL digits 1-9). I will state my result first (to avoid hiding it in the text) and provide explanation for it after.

SOLUTION

Define a list representation of a number $n$ to be (following a notation I borrowed from continued fractions)

$$n = r_k*10^k + r_{k-1}*10^{k-1} + ... + r_0*10^0 \equiv [r_k,r_{k-1},...,r_0]$$ The function $f(d,n)$ which gives the number of occurances of a digit $d$ up to the number $n$ is given by $$ f(d,n) = \sum_{j=0}^k\left(\sum_{i=0}^{r_j} (10^j\delta_{i-1,d})+ r_jE(j) +\delta_{r_j,d}(n[j:]+1)\right)$$

where the notation $n[j:]$ is used to signify the number formed by the last $j$ digits (e.g. for $n=1234=[1,2,3,4]$, $n[1] = 4$, $n[3] = 234$) and $E(j) = j*10^{j-1}$

Typing this into mathematica (and checking ($f(1,n) = n$ gives the next result of $\mathbf{199981}$).

NOTE: This can be generalized even further to any base $B$ by replacing occurrences of $10^k$ with $B^k$

MOTIVATION

First observe that in the number 0-9 any digit appears only 1 time.

In the numbers 0-99 any digit appears 10 + 10 times (10 times as the ones digit and 10 times in the range x0-x9)

In the numbers 0-999 any digit appears 20*10+100 = 300 times (20 times in each range of 100+ plus 100 in the range x00-x99)

It is not difficult to prove that this pattern continues, for any number given as $10^k-1$ there are $k*10^{k-1}$ appearances of any digit in that range.

This motivates my definition of the function

$$E(j) = j*10^{j-1}$$

However, the number given is unlikely to be as nice as $10^k-1$ we have to learn a clever way to add up the parts of that range. For a number (given in the above notation) like $[4,0,0,0]$. We see that the range $0-999$ occurs 4 times ($0-999$,$1000-1999$,$2000-2999$,$3000-3999$. So we are left with $4*E(3) = 1200$ occurrences of the digit $1$ (this motivates the $r_jE(j)$ term). However, in the range 1000-1999 the digit one occurs an extra 1000 (or $10^j$) times, this extra term only happens if $r_j$ is greater then the digit we are summing over (e.g. if we are adding occurrences of the digit $9$ the answer would be 1200 and not 2200), the best way I could think of to express this conditional statement analytically was with the sum $\sum_{i=1}^{r_j}(10^j\delta_{i-1,d})$

Calculating a number like $[4,1,2,4]$ is not much more difficult, we simply iterate over the sequence $[4,0,0,0]$, $[3,0,0]$, $[2,0]$,$[4]$ following the formula in the previous paragraph. However, the method above will miscount because it ignores the extra occurrence of the digit $1$ in the range $4100-4124$. So at every step of the iteration we have to make sure there aren't any tails that we are forgetting to count, which motivates the $\delta_{r_j,d}(n[j:]+1)$ term.


Ok this is simple to do with Mathematica.

For[i = 0; j = 0, i <= 200000,
 i++, j += (Plus @@ Cases[IntegerDigits[i], 1]); If[j == i, Print[i]]]

The result is

0,1,199981,199982,199983,...,199990,200000

Therefore the number you search has to be 199981.


I think I have the beginning of an idea to solve this. We can start by noticing that $f(10^{n+1}-1)= 1+10^n+10\times f(10^n-1)$ because when you list all the numbers from 1 to $10^{n+1}$ you get: 1... $10^n-1$, $10^n$,..., 1999...99, 20...00,...$10^{n+1}-1$. from $10^n$ to 1999...99 you can see $10^n+f(n)+1$ the digit '1', and from 20...00 to $10^{n+1}-1$ you get $9\times f(10^n-1)$ the digity '1'. So let's call $u_n=10^n-1$ and $v_n=f(u_n)$.

I can see that $u_9 >v_9$ and $u_{10} < v_{10}$, so the anwswer should be in beetween (or at least smaller than $10^{10}-1$.