How long will this take to reach.. kimye?

I found the 1bc29b36f623ba8 Twitter account on 4chan last night. It posts an MD5 hash every 10 minutes, working through strings in sequential order, starting at ! and, I assume, with no end in sight.

The account's tweets follow a fixed structure:

[sequence hashed] #[sequence in plaintext] #md5 #allday #💯

which, to my eyes, reads as quite upbeat about performing MD5 hashing all day.
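For illustration, here is a minimal Python sketch of how one tweet in that structure could be produced. The function name `format_tweet` is made up for this example, and it assumes the hash is the hex MD5 digest of the plaintext's UTF-8 bytes, which I haven't verified against the account.

```python
import hashlib

def format_tweet(sequence: str) -> str:
    """Build one tweet in the '[hash] #[plaintext] #md5 #allday #💯' layout."""
    # Assumption: the account hashes the UTF-8 bytes and posts the hex digest.
    digest = hashlib.md5(sequence.encode("utf-8")).hexdigest()
    return f"{digest} #{sequence} #md5 #allday #💯"

print(format_tweet("a"))
```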

The most interesting thing, I think, is that this crypto-focused account follows a rather odd selection of users: Kim Kardashian and Kanye West.

Could someone do the math and figure out when it will reach kimye? (kimye is a common abbreviation for Kim Kardashian and Kanye West as a couple.) I'm barely interested in the couple, but from what I can tell kimye will be the first string to reference the account's two favourite people.

The characters used appear to be limited to the following:

!#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~

Solution 1:

There are $91$ characters available, in the order

!#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~
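As a quick sanity check on the count and the ordering, the sketch below compares the list against printable ASCII and looks up the zero-based position of each letter of kimye. The name `ALPHABET` is just a label chosen for this example.

```python
ALPHABET = "!#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~"

# Printable, non-space ASCII runs from code point 33 ('!') to 126 ('~').
printable = [chr(code) for code in range(33, 127)]
missing = [ch for ch in printable if ch not in ALPHABET]

# Zero-based position = number of characters that come before the letter.
positions = [ALPHABET.index(ch) for ch in "kimye"]

print(len(ALPHABET))                       # 91
print(missing)                             # ['"', "'", '\\']
print(list(ALPHABET) == sorted(ALPHABET))  # True: the list is in ASCII order
print(positions)                           # [71, 69, 73, 85, 65]
```

So the list is printable ASCII minus the two quote characters and the backslash, and each position is the number of characters that precede the corresponding letter.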

The number of tweets to get to $\texttt{kimye}$ is
$$\begin{align*}
\hphantom{+}91^{1} & \quad \text{(number of 1-character strings)} \\\\
{}+ 91^2 & \quad \text{(number of 2-character strings)}\\\\
{}+ 91^3 & \quad \text{(number of 3-character strings)}\\\\
{}+ 91^4 & \quad \text{(number of 4-character strings)}\\\\
{}+ 71\cdot91^4 & \quad \genfrac{(}{)}{0pt}{0}{\text{number of 5-character strings}}{\text{starting with a character before }\texttt{k}}\\\\
{}+ 69\cdot91^3 & \quad \genfrac{(}{)}{0pt}{0}{\text{number of 5-character strings}}{\text{starting with }\texttt{k}\text{ but whose 2nd character is before }\texttt{i}}\\\\
{}+ 73\cdot91^2 & \quad \genfrac{(}{)}{0pt}{0}{\text{number of 5-character strings}}{\text{starting with }\texttt{ki}\text{ but whose 3rd character is before }\texttt{m}}\\\\
{}+ 85\cdot91^1 & \quad \genfrac{(}{)}{0pt}{0}{\text{number of 5-character strings}}{\text{starting with }\texttt{kim}\text{ but whose 4th character is before }\texttt{y}}\\\\
{}+ 65\cdot91^0 & \quad \genfrac{(}{)}{0pt}{0}{\text{number of 5-character strings}}{\text{starting with }\texttt{kimy}\text{ but whose 5th character is before }\texttt{e}}
\end{align*}$$
which makes for a total of
$$4{,}990{,}767{,}847\;\text{ tweets}$$
before $\texttt{kimye}$. At a rate of $6\text{ tweets/hour}$, from the inception of the Twitter feed, it will take about
$$831{,}794{,}641 \text{ hours}\approx 94{,}891\text{ years}$$
to get to the string $\texttt{kimye}$.
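The arithmetic can be reproduced with a few lines of Python. This is only a sketch: the variable names are made up here, and the conversion to years assumes a mean Gregorian year of 365.2425 days, which the calculation above does not state explicitly.

```python
K = 91  # size of the alphabet

# All strings of length 1 to 4 come before any 5-character string.
shorter = sum(K**length for length in range(1, 5))

# For each position of "kimye": how many characters precede that letter.
preceding = [71, 69, 73, 85, 65]
same_length = sum(d * K**(4 - i) for i, d in enumerate(preceding))

before = shorter + same_length
hours = before / 6                # 6 tweets per hour
years = hours / (24 * 365.2425)   # assumption: mean Gregorian year

print(before)        # 4990767847
print(round(hours))  # 831794641
print(round(years))  # 94891
```

A different year length shifts the last figure slightly, but it stays at roughly 95,000 years.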

Solution 2:

The $91$ characters in the order given, i.e.,

!#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~

will serve as the digits of a bijective base-91 numeration system. Counting in this system is the same as listing the digit-strings in shortlex order:

decimal     bijective base-91
----------  -----------------
1           !
2           #
3           $
...
91          ~
92          !!
93          !#
...
4990767847  kimyd
4990767848  kimye   <---
4990767849  kimyf
...
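The first rows of such a listing can be generated by brute force, which makes the shortlex order concrete: all words of length 1 first, then all words of length 2, and so on, each group in alphabet order. The generator name `shortlex` is just a label for this sketch; it relies on `itertools.product` emitting tuples in the order of its input.

```python
from itertools import count, islice, product

ALPHABET = "!#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~"

def shortlex(alphabet):
    """Yield every non-empty word over alphabet, shortest first."""
    for length in count(1):
        # product() keeps the alphabet's order, so each length is lexicographic.
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

first = list(islice(shortlex(ALPHABET), 93))
print(first[0], first[1], first[2])     # words 1, 2, 3:    ! # $
print(first[90], first[91], first[92])  # words 91, 92, 93: ~ !! !#
```

Enumerating all the way to kimye this way is impractical (about five billion words), which is why a direct conversion is useful.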

Here's an implementation of the conversion algorithm given in the Wikipedia article on bijective numeration:

def word_to_number(w, alphabet):
    """
    return the integer n such that w is the nth word
    in the shortlex ordering of words on the given alphabet string 
    """
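    k = len(alphabet)
    n = q = 0
    # Walk the word from its last character; q is the place-value exponent.
    for c in w[::-1]:
        # Bijective digits run from 1 to k (there is no zero digit).
        p = alphabet.index(c) + 1
        n = n + p*(k**q)
        q = q + 1
    return n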

Then

alph = "!#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~"
print(word_to_number('kimye', alph))

gives the result

4990767848

i.e., 'kimye' is the 4,990,767,848-th word in the listing by shortlex order.


For reference, and as a cross-check, here's the inverse function:

def number_to_word(n, alphabet):
    """
    return the nth word (n >= 0) 
    in the shortlex ordering of words on the given alphabet string
    """
    k = len(alphabet)
    word = ''
    q = n
    while q > 0:
        # Subtract 1 so digit values 1..k map onto indices 0..k-1.
        word = alphabet[(q-1)%k] + word
        q = (q-1)//k
    return word

Then number_to_word(4990767848, alph) --> kimye.
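To make the cross-check a bit more systematic, here is a self-contained round-trip test. It restates both conversions in compact form (Horner's rule for one direction, `divmod` for the other), so it is a sketch equivalent to the functions above rather than a copy of them; `ALPHABET` is a name chosen for this example.

```python
ALPHABET = "!#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~"

def word_to_number(w, alphabet):
    """Rank of w in shortlex order (the empty word has rank 0)."""
    k = len(alphabet)
    n = 0
    for c in w:
        # Horner's rule with digit values 1..k.
        n = n * k + alphabet.index(c) + 1
    return n

def number_to_word(n, alphabet):
    """Word with rank n in shortlex order."""
    k = len(alphabet)
    word = ""
    while n > 0:
        # divmod on n - 1 gives the next quotient and a 0-based digit index.
        n, r = divmod(n - 1, k)
        word = alphabet[r] + word
    return word

# Converting a rank to a word and back should give the same rank.
for n in range(20000):
    assert word_to_number(number_to_word(n, ALPHABET), ALPHABET) == n

print(word_to_number("kimye", ALPHABET))     # 4990767848
print(number_to_word(4990767847, ALPHABET))  # kimyd
```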