Is the Kolmogorov complexity of any string equally low?
I'm just now learning about this topic, so this might be a very naive question.
So, if I understand it correctly, the string:
ababababababababababababababababababababababababab
has very low Kolmogorov complexity, because you can describe it losslessly with a program as short as "repeat ab 25 times".
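For concreteness, here is a tiny Python sketch of what I mean (the particular language obviously doesn't matter): the whole 50-character string is the output of a program much shorter than the string itself.

```python
# A complete program whose output is the 50-character string above.
# Its source text is much shorter than writing the string out verbatim.
print("ab" * 25)
```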
While the string:
fgiihghbhjaejegabgfdeggiaejiigchcdchjbhigaifhiedid
looks as random as it gets. Without any recognizable pattern, there is no obvious algorithmic compression that would reduce our estimate of its Kolmogorov complexity.
But, surprisingly, there is a program that generates this string losslessly: "take 50 digits of Pi starting from the 1751st decimal and convert them to letters with A = 0, B = 1, ..."
Okay, maybe in this case it was not a good compression (the description seems to be longer than the string itself), but we can see how this would work for huge random strings. There is presumably always a way to find a given string hidden somewhere in the decimal expansion of Pi, and thus the Kolmogorov complexity of any string would be no greater than the length of the program that generates the decimals of Pi.
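To make the idea concrete, here is a self-contained Python sketch of such a "Pi decompressor". The function names are mine, the digit routine is just one simple way to compute decimals of Pi, and I have not verified that the offset 1751 actually reproduces the string above, so treat the numbers as placeholders for the idea.

```python
# Sketch of the "look it up in Pi" decompressor described above.
# pi_digits computes decimals of Pi via Machin's formula using only
# integer arithmetic, so nothing outside the standard library is needed.

def pi_digits(n):
    """Return the first n decimal digits of Pi after the decimal point."""
    guard = 10                       # extra digits to absorb truncation error
    scale = 10 ** (n + guard)

    def arctan_inv(x):
        # arctan(1/x) scaled by `scale`, summed as an alternating series.
        term = scale // x
        total = term
        x2 = x * x
        k = 3
        subtract = True
        while term:
            term //= x2
            total += -(term // k) if subtract else term // k
            subtract = not subtract
            k += 2
        return total

    # Machin's formula: pi = 4 * (4 * arctan(1/5) - arctan(1/239))
    pi_scaled = 4 * (4 * arctan_inv(5) - arctan_inv(239))
    return str(pi_scaled)[1:n + 1]   # drop the leading "3"


def decode_from_pi(offset, length):
    """Read `length` decimals of Pi starting at 1-indexed `offset`
    and map them to letters with 0 -> a, 1 -> b, ..., 9 -> j."""
    digits = pi_digits(offset + length)
    chunk = digits[offset - 1:offset - 1 + length]
    return "".join(chr(ord("a") + int(d)) for d in chunk)


print(decode_from_pi(1751, 50))
```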
This would mean that the Kolmogorov complexity of Dante's Divine Comedy, of a picture of the planet Mars, or of any neural network is basically the same. All that information is presumably hidden somewhere inside Pi, so we only need to write a program that references the decimal position where it should start reading, and boom, we have a compressed program that generates all of it.
Is this reasoning correct? If so, why is Kolmogorov complexity a useful metric? Everything would have essentially the same complexity, or at least the same low upper bound on it, for any piece of information, just by using this approach.
By the way, I used this approach not because I consider Pi to be some esoteric number with special properties. I could have made the same argument with the square root of 5, or with a 1D cellular automaton like Rule 30, for example. The important point is that essentially random-looking strings of any length can be generated deterministically by simple processes like these (reading off decimals, iterating the cellular automaton).
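For example, here is a minimal sketch of the Rule 30 version of this idea (my own illustrative code): start from a single 1 and read off the centre column, the part of the automaton usually treated as pseudorandom.

```python
# Rule 30 as another deterministic source of random-looking bits.
# The update is: new cell = left XOR (centre OR right), i.e. rule number 30.

def rule30_center_column(steps):
    """Return `steps` bits of the centre column of Rule 30,
    started from a single 1 on an otherwise blank row."""
    width = 2 * steps + 3            # wide enough that the edges never matter
    row = [0] * width
    row[width // 2] = 1
    bits = []
    for _ in range(steps):
        bits.append(row[width // 2])
        row = [row[i - 1] ^ (row[i] | row[(i + 1) % width])
               for i in range(width)]
    return bits

print("".join(map(str, rule30_center_column(50))))
```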
I'm also aware that Pi might not be a normal number: even though its digits never repeat and their frequency distribution looks uniform, it might not be true that every possible finite sequence is contained in Pi (maybe it avoids the sequence 38726 for some reason). But I'm assuming Pi is normal, since that is the general belief among mathematicians, even though it remains unproven. In any case, my argument does not depend on this particular issue; I could have used a proven normal number as the example instead.
Solution 1:
In fact, let's go even more general. Let $s$ be an infinite sequence of bits such that for all finite sequences $k$ of bits, $k$ occurs as a contiguous subsequence of $s$.
So we could specify any bit-string $k$ by giving a pair of indices $(i, j)$ where $i \leq j$. We would be specifying the string $s_i, \ldots, s_{j - 1}$.
However, the problem is that it might take more bits to write the pair of indices $(i, j)$ than it takes to just write out the string $k$ itself. In fact, it might take a lot more bits to specify it.
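To get a feel for how much more, here is a quick empirical sketch (illustrative code with made-up names, not part of the argument itself): in a long random bit sequence, a random pattern of $n$ bits tends to first occur around index $2^n$, so merely writing down the index $i$ already costs roughly $\log_2 i \approx n$ bits, about as much as writing out the pattern itself.

```python
# Empirical sketch: where does a random n-bit pattern first occur in a long
# random bit string?  On average around index 2**n, so the index alone
# already takes about n bits to write down.
import random

random.seed(0)

def random_bits(length):
    """A string of `length` independent uniform bits."""
    return format(random.getrandbits(length), f"0{length}b")

def average_first_index(n, trials=50):
    """Average index of the first occurrence of a random n-bit pattern
    in a long random bit string."""
    total = 0
    for _ in range(trials):
        pattern = random_bits(n)
        haystack = random_bits(4096)
        while pattern not in haystack:      # extend until the pattern appears
            haystack += random_bits(4096)
        total += haystack.find(pattern)
    return total / trials

for n in (8, 12, 16):
    print(f"n={n:2d}: average first index ~ {average_first_index(n):8.0f}"
          f"   (2**n = {2**n})")
```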
No matter what encoding scheme you specify, there will always be, for every $n$, some string of $\leq n$ bits which requires $\geq n$ bits to encode. This follows inexorably from the pigeonhole principle.
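Spelled out, the count is: there are at most
$$\sum_{\ell=0}^{n-1} 2^{\ell} \;=\; 2^{n}-1$$
bit strings of length less than $n$ available to serve as encodings, but there are $2^{n+1}-1$ bit strings of length at most $n$ to be encoded, so at least one of them cannot receive an encoding shorter than $n$ bits.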
Solution 2:
Mark's answer above is fantastic. For a fun practical demonstration of this principle, consider Borges' universal library, as realized in code.
The library "contains" all conceivable novels, but their addresses (realized here by the URLs) are almost always longer than the texts themselves.