Remove text from pdf
I have a pdf file with some text on each page which I would like to remove.
The text is matched by a regex and I think it comes in one block of the pdf.
I have used pdfedit to select and delete the text in the GUI, but I am looking for a way to do this from the terminal.
Solution 1:
You can try pdftk, but in my experience it works only some of the time, which I believe is due to a problem with fonts.
It works like this: first you need to uncompress the pdf file,
pdftk myfile.pdf output unc.pdf uncompress
then you modify it with
sed 's/oldstring/newstring/g' < unc.pdf > mod_unc.pdf
lastly you recompress it with
pdftk mod_unc.pdf output myfile_modified.pdf compress
I have had only moderate success with this approach: sometimes it works and sometimes it doesn't, seemingly at random.
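The three steps above can be wrapped in a small shell function. This is only a sketch: the names `replace_bytes` and `pdf_remove_text` are made up for illustration, and the `grep` check is an addition of mine that reports when the pattern is not stored as plain text in the uncompressed file (in which case sed has nothing to match). It assumes the pattern is a basic regex that does not contain a `/`.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of pdftk.

# Replace every match of basic regex $1 with $2 on stdin.
# LC_ALL=C makes sed work byte by byte, so binary PDF data is not
# rejected as invalid multibyte text. The regex must not contain "/".
replace_bytes() {
    LC_ALL=C sed "s/$1/$2/g"
}

# pdf_remove_text INPUT.pdf OUTPUT.pdf REGEX
# Returns 0 on success, 1 if a tool failed, 2 if REGEX was not found.
pdf_remove_text() {
    tmpdir=$(mktemp -d) || return 1
    status=0

    # Step 1: uncompress the page streams so the text becomes greppable.
    pdftk "$1" output "$tmpdir/unc.pdf" uncompress || status=1

    # Added check: is the pattern present as plain text at all?
    if [ "$status" -eq 0 ] && ! LC_ALL=C grep -q -- "$3" "$tmpdir/unc.pdf"; then
        echo "pattern not found as plain text; sed cannot help here" >&2
        status=2
    fi

    # Steps 2 and 3: delete the matches, then recompress.
    if [ "$status" -eq 0 ]; then
        replace_bytes "$3" "" < "$tmpdir/unc.pdf" > "$tmpdir/mod.pdf" &&
            pdftk "$tmpdir/mod.pdf" output "$2" compress || status=1
    fi

    rm -rf "$tmpdir"
    return "$status"
}
```

After sourcing the file, a call would look like `pdf_remove_text myfile.pdf myfile_modified.pdf 'oldstring'`. An exit status of 2 suggests the text is encoded in some other way in that particular PDF.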
Solution 2:
On Windows (possibly in a virtual machine), you could install PDF-XChange Editor: https://www.tracker-software.com/product/downloads/enduser/pdf-xchange-editor
The free version can remove text (but not add text) without adding the software's watermark; the program itself tells you so.
I had to remove several pieces of text, so sed was too time-consuming and tedious, and sed did not work with umlauts.
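One possible reason for the umlaut problem (an assumption on my part, not something I verified on those files) is an encoding mismatch: a pattern typed in a UTF-8 terminal stores "ä" as two bytes, while a PDF using a single-byte encoding such as WinAnsi stores it as the single byte 0xE4. If that is the case, building the pattern from raw bytes may help. The word and the helper name `strip_bytes` below are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes the PDF stores the umlaut as one Latin-1/WinAnsi byte.

# \344 is the octal escape for byte 0xE4, which is "ä" in Latin-1/WinAnsi.
# The word ("Märkte") is a made-up example.
pattern=$(printf 'M\344rkte')

# Delete every match of basic regex $1 from stdin, matching byte by byte.
# strip_bytes is a hypothetical helper name; the regex must not contain "/".
strip_bytes() {
    LC_ALL=C sed "s/$1//g"
}
```

It would be used on the uncompressed file from Solution 1, for example `strip_bytes "$pattern" < unc.pdf > mod_unc.pdf`. If the PDF writes the character some other way (for instance as a hex string or through a custom font encoding), this will not match either.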
Source: https://de.wikipedia.org/wiki/Benutzer:JoKalliauer/PDF