Find a short expression for a given large integer

Give a single large number $N$, and a small set of operations and functions (in my case usually, +, -, *, ^, and $V(p,q,n),$ the nth term of the Lucas sequence generated by $p$ and $q$), is there a clever way to find a very short expression for $N$ using these operations/functions plus small integers? If so, I'd like to know this clever way; if not, I'd like a hint how we know there is not.

Note: In my application short means less than 72 characters, but my question is more general, so short might be on the order of $\,\log \log N$ characters. Obviously we can not write every with 72 characters--but what about a given $N$? Can we know?

Simple Example: given the prime 92709463146717245465044514107163, we might choose the short expression $3^{67}-2^{70}$.

Motivation: I run a web site which lists the largest known primes. Today, to make the list of the top 5000 a prime must have 223,656 digits. Most of these primes have very specific forms, most often $a\cdot b^n\pm1$, but sometimes something with more character like the prime $$2^{810655} + 2 V(1, 2, 810653) + 1$$ (244032 digits), or the prime $$(121227 \cdot 2^{384976} + 3)^2 + 24398127601 \cdot 2^{384962}$$ (231,789 digits). Occasionally folks are purposefully (or accidentally) obtuse, and try to submit the digits instead of a simple form. Usually I can find a simple form by playing with the numbed a bit in Maple. Another time a mathematician submitted what they claimed (at the time) was "the largest known prime with no special form," a claim I did not know how to check, yet still I wondered: can I shorten this expression enough to fit on a list that only allows a 100 or so characters?

A couple dozen new large primes are submitted each day so I have tried to automate most things. So I wonder, can the task mentioned above be automated (or at the least be sped up with mathematical insight)? Fortunately folks are rarely this obtuse and often just threatening to delete their prime is often enough for them to give me a nicer form, but sometimes they do not know of one (e.g., an ECPP record prime which makes the list by being of the 20 largest known (proven this way), even though these records are only about 10k digits).

But more than the practical question, I am just interested in the question: can we know if there is a very short form (using specified operations/functions/small integers) exists? (What might be the shortest we can hope for?)

A Programming question? Perhaps, but surely a good program would be guided by some mathematical principles. There must be something better than my usual brute force method using a greedy algorithm (e.g., subtracting large terms of the form $a\cdot b^n$ and then iterating).

Perhaps I just need help phrasing the question as surely someone has asked this, I just could not find it. I apologize in advance if I missed an obvious reference or link.


Solution 1:

What you're asking is a variant on the question of 'what is the simplest (shortest) description of this number?', a problem that's known to be uncomputable in general; the magic words for learning more about this are 'Kolmogorov Complexity' (and in particular, 'resource-bounded Kolmogorov complexity'). In many ways it's a sort of decryption problem (and for primes found with ECPP certificates it might even literally be a decryption problem), so it shouldn't be any surprise that it's hard in general.

For your particular application I think you've already hit the best short-cuts with your brute force algorithm; one particular variant on that that I would consider using would be to express the number in various bases (all $b\lt256$ or so, maybe) and look for long runs of individual base-$b$ digits, as this can help pry away terms of the form $a\cdot b^n$ and $a\cdot(b^n-1)$ even for relatively large values of $a$ (e.g., if someone discovers that $2^{513273}\cdot L(1,3,159431)+1$ is prime). There should also be lattice-based algorithms to discover whether a given large number is of the form $L(a,b,n)$ for small $a$ and $b$, since terms of those sequences will be very (abnormally) close to simple multiples of powers of $\phi$; similar techniques might work for finding large members of other linear recurrence relations (i.e., $r_n = ar_{n-1}+br_{n-2}$), and the best starting point for finding out more about these techniques would be to have a look at the LLL (Lenstra-Lenstra-Lovász) algorithm.