Is it possible to remove ligatures from copied text?
When I tested this, the Evince reader seemed to decode ligatures into separate characters on copy.
By the way, for pdfLaTeX documents you can put the following in the preamble so that the PDF still displays ligatures but copying yields the individual characters:
\input{glyphtounicode.tex}
\pdfgentounicode=1
In Python, Unicode NFKD normalization splits a ligature into its component letters:
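import unicodedata
# '\uFB00' is the "ff" ligature (LATIN SMALL LIGATURE FF).
# NFKD splits it into "ff"; encoding to ASCII with 'ignore' drops any remaining non-ASCII characters.
print(unicodedata.normalize('NFKD', '\uFB00').encode('ascii', 'ignore').decode('ascii'))  # ff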
You could combine this with pyPdf to extract the text from PDF files.
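Note that encoding to ASCII with 'ignore' also discards diacritics and any other non-ASCII text. A minimal sketch, assuming you only want to split the Latin ligatures in the Unicode Alphabetic Presentation Forms block (U+FB00 to U+FB06) and keep everything else; it assumes the pypdf package (the maintained successor of pyPdf) is installed, and both function names are hypothetical:

```python
import re
import unicodedata

# Latin ligatures U+FB00..U+FB06: ff, fi, fl, ffi, ffl, long s + t, st.
LIGATURES = re.compile('[\uFB00-\uFB06]')


def remove_ligatures(text):
    """Replace each Latin ligature with its NFKD decomposition (e.g. U+FB01 becomes 'fi').

    Characters outside U+FB00..U+FB06 (accented letters, other scripts) are left unchanged.
    """
    return LIGATURES.sub(lambda m: unicodedata.normalize('NFKD', m.group()), text)


def pdf_text_without_ligatures(path):
    """Hypothetical helper: extract the text of every page with pypdf, then remove ligatures."""
    from pypdf import PdfReader  # assumption: pypdf is installed (pip install pypdf)

    reader = PdfReader(path)
    return '\n'.join(remove_ligatures(page.extract_text() or '') for page in reader.pages)


if __name__ == '__main__':
    print(remove_ligatures('e\uFB03cient caf\u00e9'))  # efficient café
```

Applying the regex only to the ligature range keeps words like "café" intact, which the ASCII round-trip would turn into "cafe".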
One possibility would be to use your favorite text editor and simply search and replace them.
Another way would be to write a script that uses sed, but that would work on *nix systems only, I fear.
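If sed is not available, the same search-and-replace can be done in Python on any platform. A sketch, assuming UTF-8 text files; the mapping table and the function names `expand_ligatures` and `replace_ligatures_in_file` are illustrative, not part of any library. With GNU sed, a roughly equivalent call for two of the ligatures would be `sed -i 's/ﬁ/fi/g; s/ﬂ/fl/g' file.txt`.

```python
import sys

# Explicit ligature -> letters table (Unicode Alphabetic Presentation Forms, U+FB00..U+FB06).
LIGATURE_TABLE = str.maketrans({
    '\uFB00': 'ff',
    '\uFB01': 'fi',
    '\uFB02': 'fl',
    '\uFB03': 'ffi',
    '\uFB04': 'ffl',
    '\uFB05': 'st',  # long s + t
    '\uFB06': 'st',
})


def expand_ligatures(text):
    """Return text with every ligature in LIGATURE_TABLE replaced by its letters."""
    return text.translate(LIGATURE_TABLE)


def replace_ligatures_in_file(path, encoding='utf-8'):
    """Rewrite the file in place with ligatures expanded, similar to sed -i."""
    # newline='' keeps the file's original line endings unchanged.
    with open(path, encoding=encoding, newline='') as f:
        text = f.read()
    with open(path, 'w', encoding=encoding, newline='') as f:
        f.write(expand_ligatures(text))


if __name__ == '__main__':
    # Usage: python fix_ligatures.py file1.txt file2.txt ...
    for name in sys.argv[1:]:
        replace_ligatures_in_file(name)
```

An explicit table makes it obvious which characters are touched, at the cost of having to list each ligature yourself.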