Is it possible to remove ligatures from copied text?

In my tests, the Evince reader seems to split ligatures back into individual characters when text is copied.

By the way, for pdfLaTeX documents you can put this in the preamble so that the PDF still displays ligatures, but copying yields the individual characters:

\input{glyphtounicode.tex}
\pdfgentounicode=1 %

In Python, Unicode compatibility normalization (NFKD) splits a ligature into its component letters:

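import unicodedata
# U+FB00 is the "ff" ligature; NFKD decomposes it into "f" + "f".
decomposed = unicodedata.normalize('NFKD', u'\uFB00')
# Encoding to ASCII with 'ignore' drops whatever non-ASCII characters remain.
print(decomposed.encode('ascii', 'ignore'))  # b'ff' in Python 3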
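Note that `encode('ascii', 'ignore')` does more than split ligatures: it also strips accents (é becomes e), silently drops characters with no ASCII decomposition such as ß, and in Python 3 returns `bytes` rather than `str`. If only the ligatures should change, here is a minimal sketch, assuming the Latin ligatures U+FB00 to U+FB06 are the ones that matter (`expand_ligatures` is a hypothetical helper name, not a library function):

```python
import unicodedata

# Assumption: the ligatures of interest are the Latin ones in the Alphabetic
# Presentation Forms block, U+FB00 to U+FB06 (ff, fi, fl, ffi, ffl, two "st" forms).
# Each is mapped to its NFKC compatibility equivalent, e.g. U+FB03 -> "ffi".
LIGATURES = {chr(cp): unicodedata.normalize('NFKC', chr(cp))
             for cp in range(0xFB00, 0xFB07)}
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Spell out the ligatures only; every other character (é, ß, ...) is kept."""
    return text.translate(_TABLE)


print(expand_ligatures('e\uFB03cient caf\u00e9'))  # prints: efficient café
```

Using `str.translate` with a precomputed table keeps the replacement to a single pass over the text, and the result stays a `str`.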

You could combine this with pyPdf to extract the text from the PDF files first.
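A possible combination is sketched below. It assumes the maintained `pypdf` package (the successor of pyPdf) is installed, and it swaps in NFKC instead of the NFKD-plus-ASCII step, so accented letters are kept as composed characters. The trade-off is that NFKC also folds other compatibility characters (superscript digits, full-width letters), which may or may not be wanted. `ligature_free`, `pdf_text_without_ligatures` and `paper.pdf` are hypothetical names.

```python
import unicodedata


def ligature_free(text: str) -> str:
    # NFKC replaces compatibility characters such as U+FB01 (fi ligature) with
    # their plain equivalents and recomposes accents, so "café" survives intact.
    # Caveat: other compatibility characters (e.g. superscript digits) are folded too.
    return unicodedata.normalize('NFKC', text)


def pdf_text_without_ligatures(path: str) -> str:
    # Assumes `pypdf` is installed; imported here so ligature_free works without it.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # Extract each page's text and normalize it before joining the pages.
    return "\n".join(ligature_free(page.extract_text() or "") for page in reader.pages)
```

Usage would then be something like `print(pdf_text_without_ligatures("paper.pdf"))`.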


One possibility is to paste the text into your favorite text editor and simply replace the ligature characters using find-and-replace.

Another way would be to write a script that uses sed, but I fear that would work on *nix systems only.
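Since Python also runs on Windows, the same sed-style replacement can be scripted portably. A minimal sketch, assuming UTF-8 text files and an explicit table of the Latin ligatures U+FB00 to U+FB06 (`replace_ligatures` and the script itself are hypothetical, not an existing tool):

```python
import sys
from pathlib import Path

# Explicit ligature -> letters table, analogous to a list of sed s/// commands.
# Covers the Latin ligatures U+FB00 to U+FB06; extend it as needed.
LIGATURES = {
    '\uFB00': 'ff',
    '\uFB01': 'fi',
    '\uFB02': 'fl',
    '\uFB03': 'ffi',
    '\uFB04': 'ffl',
    '\uFB05': 'st',  # long s + t
    '\uFB06': 'st',
}


def replace_ligatures(text: str) -> str:
    """Return text with every ligature in LIGATURES spelled out."""
    for ligature, letters in LIGATURES.items():
        text = text.replace(ligature, letters)
    return text


def main(paths):
    # Rewrite each file in place, like `sed -i`; assumes UTF-8 encoded files.
    for name in paths:
        path = Path(name)
        text = path.read_text(encoding='utf-8')
        path.write_text(replace_ligatures(text), encoding='utf-8')


if __name__ == '__main__':
    main(sys.argv[1:])
```

Saved as, say, `unligature.py`, running `python unligature.py notes.txt` would rewrite `notes.txt` in place on any platform.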