Remove text from pdf

I have a PDF file with some text on each page that I would like to remove.

The text can be matched by a regex, and I think it sits in a single block of the PDF.

I have used pdfedit to select and delete the text in the GUI, but I am looking for a way to do this from the terminal.

Solution 1:

You can try pdftk, but it only works some of the time, due to (I believe) a problem with fonts.

It works like this: first you need to uncompress the pdf file,

  pdftk myfile.pdf output unc.pdf uncompress

then you modify it with

  sed 's/oldstring/newstring/g' < unc.pdf > mod_unc.pdf

lastly you recompress it with

  pdftk mod_unc.pdf output myfile_modified.pdf compress

I have had only moderate success with this approach: sometimes it works and sometimes it doesn't, with no obvious pattern.
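The three steps can be wrapped in one small script, sketched below. The function names (pdf_has_text, strip_pattern, remove_pdf_text) are made up for illustration, and it assumes a grep that supports -a (GNU and BSD grep do) and a pattern that contains no "/". One common reason the sed step silently does nothing is that the text is not stored as a literal string in the page stream (it may be split into kerned fragments or written as glyph codes of a subset font), so it can help to check for the string in the uncompressed file first.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of pdftk.

# Succeeds if the literal string $2 occurs in file $1.
# -a: treat binary data as text, -F: fixed string, -q: no output.
pdf_has_text() {
    grep -a -F -q -- "$2" "$1"
}

# Print file $1 with every match of the basic regex $2 deleted.
# LC_ALL=C makes sed operate on raw bytes instead of locale characters.
# Assumption: $2 contains no "/" (the sed delimiter used here).
strip_pattern() {
    LC_ALL=C sed "s/$2//g" < "$1"
}

# Uncompress $1, delete matches of $2, recompress into $3.
remove_pdf_text() {
    rpt_tmp=$(mktemp -d) || return 1
    pdftk "$1" output "$rpt_tmp/unc.pdf" uncompress &&
    strip_pattern "$rpt_tmp/unc.pdf" "$2" > "$rpt_tmp/mod_unc.pdf" &&
    pdftk "$rpt_tmp/mod_unc.pdf" output "$3" compress
    rpt_status=$?
    rm -rf "$rpt_tmp"
    return "$rpt_status"
}
```

After sourcing the script, a call would look like this:

  remove_pdf_text myfile.pdf 'oldstring' myfile_modified.pdf

If pdf_has_text reports no match on the uncompressed file, sed has nothing to replace, and a GUI editor is probably the easier route.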

Solution 2:

On Windows (possibly in a virtual machine) you could install PDF-XChange Editor: https://www.tracker-software.com/product/downloads/enduser/pdf-xchange-editor

The free version can remove text (but not add text) without adding a watermark; the software itself tells you so.

I had to remove several pieces of text, so sed was too time-consuming and exhausting, and it did not work with umlauts.

Source: https://de.wikipedia.org/wiki/Benutzer:JoKalliauer/PDF
