I have a pdf file with some text on each page which I would like to remove.

The text can be matched by a regex, and I think it sits in a single block of the PDF.

I have used pdfedit to select and delete the text with the GUI but I was looking for a way to do this from the terminal.


Solution 1:

You can try pdftk, but in my experience it works only some of the time. I believe this is due to how fonts and text encoding are handled inside the PDF.

It works like this: first you need to uncompress the pdf file,

  pdftk myfile.pdf output unc.pdf uncompress

then you modify it with

  sed 's/oldstring/newstring/g' < unc.pdf > mod_unc.pdf

lastly you recompress it with

  pdftk mod_unc.pdf output myfile_modified.pdf compress
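
The three steps above can be wrapped in a single shell function. This is only a minimal sketch: the name remove_pdf_text is made up for illustration, it assumes pdftk, sed and mktemp are installed, and it replaces each match with an empty string (the question asks for removal). It only helps when the text is stored literally in the uncompressed file.

```shell
# Sketch: the three steps above wrapped in one shell function.
# remove_pdf_text is a hypothetical helper name, not part of pdftk.
# Usage: remove_pdf_text input.pdf 'regex' output.pdf
remove_pdf_text() {
    src=$1; pattern=$2; dst=$3
    tmp=$(mktemp -d) || return 1

    # 1. Uncompress so that the page streams become readable text.
    # 2. Delete every match of the pattern. LC_ALL=C makes sed work on raw
    #    bytes; the pattern must not contain "/" because it is the delimiter.
    # 3. Recompress the edited file.
    pdftk "$src" output "$tmp/unc.pdf" uncompress &&
    LC_ALL=C sed "s/$pattern//g" < "$tmp/unc.pdf" > "$tmp/mod_unc.pdf" &&
    pdftk "$tmp/mod_unc.pdf" output "$dst" compress
    status=$?

    # Remove the intermediate files whether or not the steps succeeded.
    rm -rf "$tmp"
    return "$status"
}
```

It would be called like this:

  remove_pdf_text myfile.pdf 'oldstring' myfile_modified.pdf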

I have had only moderate success with this approach: sometimes it works and sometimes it doesn't, with no obvious pattern.
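
One plausible reason for the failures is that the text is not stored as a literal string in the file, for example when it is written as hex codes or split into several fragments. A quick way to check this before running sed is to search the uncompressed file for the string. The sketch below assumes a grep that supports -a (GNU and BSD grep do); pdf_has_literal is a hypothetical helper name.

```shell
# Sketch: check whether a string is stored literally in an uncompressed PDF.
# pdf_has_literal is a hypothetical helper name, not part of pdftk or grep.
# Usage: pdf_has_literal unc.pdf 'oldstring'
pdf_has_literal() {
    # -a: treat the binary file as text
    # -q: print nothing, only set the exit status
    # -F: match the string literally rather than as a regex
    LC_ALL=C grep -a -q -F -e "$2" "$1"
}
```

If the check fails, sed has nothing to match and the output will be unchanged:

  pdf_has_literal unc.pdf 'oldstring' && echo "found" || echo "not found"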

Solution 2:

On Windows (possibly in a virtual machine) you could install PDF-XChange Editor: https://www.tracker-software.com/product/downloads/enduser/pdf-xchange-editor

In the free version you can remove text (but not add text) without the software adding its watermark; the software itself tells you so.

I had to remove several pieces of text, so sed was too time-consuming and exhausting, and sed did not work with umlauts.

Source: https://de.wikipedia.org/wiki/Benutzer:JoKalliauer/PDF