How to tell if a code is lossless

Solution 1:

The property you are looking for is called unique decodability, usually abbreviated to UD. There is a fairly simple algorithm which decides whether a code is UD. The idea is essentially to perform a breadth-first search for two strings which code the same way; in your example ac and cb.

Let $S = \def\w#1{\mathtt{#1}}\{\w{010}, \w{001}, \w{01}\}$, as in your example. These words are called code words. We want to find two sequences of code words whose concatenations are the same.

There is a simple theorem that says that this is impossible unless one code word is a prefix of another. (Otherwise, any concatenation of code words begins with exactly one of the code words, and this must be the first word from which it was composed.) So start by finding two words, one of which is a prefix of another; if there is none, the code is UD. In your example we find 01 and 010. Then we write $$\w{01}x = \w{010}y$$ and try to solve for $x$ and $y$. It is clear that $x$ must begin with 0, so we have either $x=\w{01}x'$ or $x = \w{001}x'$:
$$\begin{array}{lrl} \w{01}\,&\w{01}\,x' & = \w{010}\,y\\ \w{01}\,&\w{001}\,x' & = \w{010}\,y \end{array}$$

We may in general have to examine both of these possibilities. Examining the first of these, we see $\w{0101}x' = \w{010}y$, so $y$ must begin with 1, which is impossible, since no code word begins with 1, so we discard this equation. Examining the second equation instead, we see $\w{01}\ \w{001} x' = \w{010} y$, so $y$ must begin with 01. It must therefore be either 010 or 01. We recognize immediately that $y=\w{01}$ and $x'=\w{}$ solves the equation, and stop, having discovered that 01 001 = 010 01.

The formal statement of the algorithm has a queue of unexamined equations, each of the type $pv_1 = psv_2$ as above, where $p$ is a common prefix. This equation is represented in the algorithm with the single string $s$. For example, the equation we had above with $\w{01}\,\w{001}\,x'=\w{010}\,y$ has $p=\w{010}$ and $s=\w{01}$ and is represented in the algorithm as the string $\w{01}$.

The complete description is:


  1. Initialization: For each pair of code words $c_1$ and $c_2$ with $c_1s = c_2$, put $s$ in the queue.
  2. Main loop: Repeat the following until termination:
    • If the queue is empty, halt. The code is uniquely-decodable.
    • Otherwise:
      1. Take an item $s$ from the queue.
      2. For each code word $c$:
        • If $c = s$, halt. The code is not uniquely-decodable.
        • If $cs' = s$, and $s'$ has not been seen before, queue $s'$.
        • If $c = ss'$, and $s'$ has not been seen before, queue $s'$.

There is no requirement to use a queue; the items could be taken from it in FIFO order, or any other order, and the algorithm will still function. I am not aware of any performance guarantees that result from using a queue.

It's not hard to annotate each queue item with the two code word sequences that reached it, so that when the algorithm terminates with a non-UD code, it has constructed an example of two sequences of code words with identical concatenations. Here is some Perl code I wrote that does this.

Solution 2:

Thats kind of hard for a general code but you can use the Sardinas-Patterson algorithm.

The algorithm generates all possible "dangling suffixes" and checks to see if any of them is a codeword. A dangling suffix is the bits that are "left over" when you compare two similar sequences of your code. If you want your code to be decodable, there can be no dangling suffixes that are codewords.

In your example, if we compare "ac" and "c" we get

ac = 01001
c  = 01

As you can see, ac is three bits longer than c, those bits (001) are the dangling suffix. And since 001 is also a codeword, the code is not uniquely decodable.

In a real life scenario though, we would probably use prefix codes instead of general codes since they are just as effective, faster and guaranteed to be uniquely decodable.

Solution 3:

I think, simple answer will be to check for Prefix, Suffix codes. If the code words follow these properties they are uniquely decodable hence loss less. Huffman coding is best example of prefix coding. plz tell me if you want to know more and correct me if I am wrong.