How can a long string be converted to a number with fewer characters?
I'm going to write a program (function) that converts a long string to a number. To do this, I first convert each character (letter) to a number, like a=0.01, b=0.02, c=0.03, and so on. Then, for example, I have:
abc // I don't want to return 0.010.020.03 (because that is not a valid number mathematically, and the returned number is longer than the string!). I want a number that belongs only to this combination of characters (abc) and that is shorter than the string. For example, for this input return 54 (just as an example).
For example, I could add these numbers together, but that approach has problems with different orderings of the same characters. For instance, abc becomes (0.01+0.02+0.03) => 0.06, but bca, cba, bac, ... (all permutations) have the same value (0.06).
Any suggestions or help on how to do this?
Solution 1:
I believe this question should be asked on Computer Science Stack Exchange.
If the conversion is one-way, meaning you don't need to recover the string from the number (for example because you store both the string and the number in a database), then you can use a hash code, which is available in almost every programming language. Keep in mind that two different strings can share the same hash code.
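As a rough illustration of the one-way approach, here is a minimal Python sketch. The function name `string_to_hash_number` and the `digits` parameter are made up for this example, and SHA-256 is just one possible hash. Because the result is truncated to a fixed number of digits, different strings can collide, so the number is not guaranteed to be unique.

```python
import hashlib


def string_to_hash_number(text: str, digits: int = 12) -> int:
    """Map a string to an integer below 10**digits (one-way, collisions possible)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Read the 32-byte digest as one big integer, then keep only `digits` digits.
    return int.from_bytes(digest, "big") % (10 ** digits)


if __name__ == "__main__":
    for word in ("abc", "bca", "a much longer string than the others"):
        print(word, "->", string_to_hash_number(word))
```

The output length is fixed no matter how long the input is, which is exactly why the original string cannot be recovered from it.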
You can also convert the string to its hexadecimal representation and interpret that as a number. In PHP, for example, bin2hex() produces the hexadecimal string and hexdec() converts hexadecimal to decimal; pack/unpack can do the same (this is a two-way solution).
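A two-way conversion along these lines might look like the following Python sketch (Python rather than PHP because its integers have arbitrary precision). The function names and the `0x01` sentinel byte are choices made for this example. Note that the resulting number has more digits than the string has characters.

```python
def string_to_number(text: str) -> int:
    """Interpret the UTF-8 bytes of the string as one big-endian integer."""
    data = text.encode("utf-8")
    # Prefix a 0x01 byte so that leading zero bytes survive the round trip.
    return int.from_bytes(b"\x01" + data, "big")


def number_to_string(number: int) -> str:
    """Invert string_to_number."""
    length = (number.bit_length() + 7) // 8
    data = number.to_bytes(length, "big")
    return data[1:].decode("utf-8")  # drop the sentinel byte


if __name__ == "__main__":
    n = string_to_number("abc")
    print(n, "->", number_to_string(n))
```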
But in theory, for a sequence of positive integers $[a_1, a_2, \ldots, a_n]$, the following formula always generates a unique number for that sequence (note that it is a sequence, not a set, so order matters):
$$\prod_{i=1}^{n}{p_i}^{a_i}$$
where $p_i$ is the $i$-th prime number. For example, for $[2,4,7]$ it is $2^2 \times 3^4 \times 5^7$, and by repeatedly dividing the result by each prime number you can recover the exponents.
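The prime-power formula and its inverse can be sketched in Python as follows. The helper names `primes`, `encode` and `decode` are illustrative, and the sketch assumes every entry of the sequence is a positive integer (a trailing zero exponent would be lost when decoding).

```python
from typing import Iterator, List


def primes() -> Iterator[int]:
    """Yield 2, 3, 5, ... by trial division (fine for short sequences)."""
    found: List[int] = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def encode(sequence: List[int]) -> int:
    """Return the product of p_i ** a_i over the sequence."""
    if any(a < 1 for a in sequence):
        raise ValueError("this sketch assumes positive integers")
    result = 1
    for p, a in zip(primes(), sequence):
        result *= p ** a
    return result


def decode(number: int) -> List[int]:
    """Recover the exponents by dividing out each prime in turn."""
    if number < 1:
        raise ValueError("expected a positive integer")
    exponents: List[int] = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        exponents.append(exponent)
    return exponents


if __name__ == "__main__":
    n = encode([2, 4, 7])
    print(n, decode(n))
```

With a=1, b=2, c=3, the strings abc and bca map to different numbers, but the numbers grow very quickly, so this does not make the result shorter than the input.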
Update
I guess Huffman coding, or a prefix code in general, is also useful: convert the string to a binary string, then convert the binary string to a decimal number.
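One way this could look in Python is sketched below. The function names and the leading sentinel bit are choices made for this example. The code table is built from the input itself, so it has to be kept alongside the number in order to decode it.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_codes(text: str) -> Dict[str, str]:
    """Build a prefix code in which frequent characters get shorter bit strings."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct character still needs a one-bit code.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {char: code_so_far}).
    heap = [(weight, index, {char: ""})
            for index, (char, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in left.items()}
        merged.update({c: "1" + code for c, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text: str, codes: Dict[str, str]) -> int:
    bits = "".join(codes[c] for c in text)
    # The leading "1" is a sentinel so that leading zero bits are not lost.
    return int("1" + bits, 2)


def decode(number: int, codes: Dict[str, str]) -> str:
    bits = bin(number)[3:]  # drop the "0b" prefix and the sentinel bit
    lookup = {code: char for char, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # prefix-free, so the first match is correct
            out.append(lookup[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    text = "abracadabra"
    codes = huffman_codes(text)
    number = encode(text, codes)
    print(number, "->", decode(number, codes))
```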
Solution 2:
If the alphabet consists of $m$ characters, then there are $m^n$ possible strings of length $n$. If you want to map each such string to a different nonnegative integer, then you need $m^n$ distinct integers. Only $10^n$ nonnegative integers have at most $n$ decimal digits, so if $m > 10$, some of those integers will have length greater than $n$.
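To make the counting concrete, here is a small Python sketch that assumes the alphabet is the 26 lowercase letters (so $m = 26$) and reads a string as a base-26 numeral. The names `ALPHABET` and `string_to_index` are illustrative. For strings of a fixed length $n$ this uses exactly the integers $0, \ldots, 26^n - 1$, and the largest of them already needs more than $n$ digits.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed alphabet, m = 26


def string_to_index(text: str) -> int:
    """Read the string as a base-m numeral (a=0, b=1, ..., z=25)."""
    index = 0
    for char in text:
        index = index * len(ALPHABET) + ALPHABET.index(char)
    return index


if __name__ == "__main__":
    for word in ("abc", "bca", "zzz"):
        print(word, "->", string_to_index(word))
```

Different orderings of the same letters get different numbers here, but for $n = 3$ there are $26^3 = 17576$ strings and only $10^3 = 1000$ integers with at most three digits.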
Solution 3:
It is impossible to devise a method for doing this that works for ALL strings.
Proof. If $\phi$ is such a method, then your requirement means that $\phi(s)$ is a string shorter than $s$ for all inputs $s$. So, applying it repeatedly, we see that after sufficiently many repetitions $\phi(\phi(\cdots\phi(s)\cdots))$ is a string of length 1 bit. Congratulations! You have compressed an unabridged version of the collected works of Shakespeare to a single bit.
The methods for compressing data used in practice work because the inputs TYPICALLY given to them can be compressed by a function that EXPANDS the (more common) atypical inputs. The average length of a random string cannot decrease, as per the argument in Robert Israel's answer.
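This behaviour is easy to observe with a real compressor. The following Python sketch uses the standard `zlib` module as one example of a practical method; the input sizes and the fixed seed are arbitrary choices for the illustration.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
random_data = bytes(rng.getrandbits(8) for _ in range(1000))  # "atypical" input
repetitive_data = b"abc" * 1000                               # "typical" input

compressed_random = zlib.compress(random_data)
compressed_repetitive = zlib.compress(repetitive_data)

print(len(random_data), "->", len(compressed_random))
print(len(repetitive_data), "->", len(compressed_repetitive))
```

The repetitive input shrinks considerably, while the random input comes out slightly longer than it went in.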