Error correction code handling deletions and insertions
I have data which is expressed in form of fixed-length sequence of decimal digits, typically 10 digits in length.
I'm looking for error correction code that would allow me to append one or more characters to the end of my digit sequence and prevent data from being corrupted following ways:
- Individual char replacement (
...12345 => ...92345
) - Swapping of two neighbor characters (
...12345 => ...12435
) - Deletion of character (
...12345 => ....1245
) - Insertion of character (
...12345 => ..123X45
)
I can extend an alphabet of digits to include hexadecimal digits too in sake of reliability, so redundant code can use A-F
letters too. When represented number is smaller than it's required, I can pad it either way with some pad symbol, which could be either plain zero or in range A-F
if necessary.
I would greatly appreciate if this code will not have an overhead larger than two lengths of message itself.
I've already looked at Reed-Solomon code implementations, but as I lack finite field math knowledge yet, I was able only to play with existing implementation over GF(28) — I packed the number as 32-bit representation, applied various kinds of distortions and could only achieve stability with error code itself being twice as long as message itself (e.g. 64 bits), so now I'm searching for something shorter.
Does the code like above exist? If no, could you please point out to restrictions that have to be relaxed in order for it to exist?
The capacity of channels with deletions and insertions remains poorly understood. Still some lower and upper bounds are known, you can easily google for that.
As for codes that work under such circumstances you also have some literature to browse, I'd suggest that you take a look at the seminal work by MacKay and Davey: "Reliable communication over channels with insertions, deletions, and substitutions" and from there you can explore the different works citing it.