Command line tool to search and replace text on a PDF

What’s shown as text in a PDF isn’t necessarily plain text in the source; refer to Kurt Pfeifle’s great answer for details. This answer covers only the simplest cases, and the approach described here does not work for every PDF!

If you’re lucky and the text is stored as plain text, you can try to remove it with sed or any text editor. Let’s say it says “watermark”:

sed 's/watermark//g' in.pdf >out.pdf

If your PDF file is compressed, you need to uncompress it first for this to work, e.g. with pdftk (see “How can I install pdftk in Ubuntu 18.04 and later?”), and then run sed on the uncompressed file instead of in.pdf:

pdftk in.pdf output uncompressed.pdf uncompress
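Before running sed, it can help to check whether the string is present as literal text at all. The following is a minimal sketch, assuming a grep that supports -a (GNU grep does; it makes grep treat binary data as text). count_literal is a hypothetical helper name, not part of any tool:

```shell
#!/bin/sh
# count_literal FILE STRING
# Print the number of lines in FILE that contain STRING literally.
#   -a  treat binary data as text
#   -F  match STRING as a fixed string, not a regular expression
#   -c  print only the count of matching lines
count_literal() {
    grep -a -F -c -- "$2" "$1"
}

# Usage (FILE.pdf stands for your uncompressed PDF):
#   count_literal FILE.pdf watermark
```

If the count is 0, the text is not stored as a plain string and sed will have nothing to replace.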

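Editing the raw bytes can invalidate the byte offsets stored in the PDF’s cross-reference table, so if sed’s output is not readable with your preferred PDF reader, try repairing it with pdftk: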

pdftk out.pdf output out_pdftk.pdf

Further reading: How to Edit PDFs?

Source: How to remove watermark from pdf using pdftk • Super User


Accepted answer will work only in rare cases

Sorry, but as general advice the answer given by @dessert is as wrong as it could be. It will not work for the general case of text replacement in PDFs (watermarks or not), and you would have to be very lucky to encounter one of the rare PDFs where it does work. (Moreover, watermarks inserted by LibreOffice are frequently converted into vector or pixel graphics, even if they look like text when printed or viewed on screen. I will not discuss that case any further; below I deal only with real text content in a PDF.)

Reasons

The reasons are these:

  1. What appears to be ASCII text when a PDF viewer renders the page is very likely not ASCII text inside the PDF source code. Instead, it may be hex-encoded.

  2. Additionally, the characters of an ASCII string might be written consecutively in the source, but they may just as easily be placed individually, with coordinate information for each one sprinkled in between the characters.

  3. Also, the hex encoding of the ASCII (and non-ASCII) character table (the "mapping") will not be predictable, and it may change from font to font.

Hence in all these cases your sed command will not succeed -- not even after uncompressing the PDF.

Example

Here is an example of how the "string" Watermark can appear inside a PDF created with LibreOffice:

56.8 726.989 Td /F2 16 Tf[<01>29<0203>-2<0405>6<06>-1<020507>]TJ

I'll dissect for you what that means:

  • 56.8 726.989 Td: Td is an operator to move the text position on the page; 56.8 726.989 are the x and y values describing that position (strictly speaking, an offset from the start of the current text line).

  • /F2 16 Tf: Tf is an operator to set a certain font as well as its size as the currently active one; in this case it is the font tagged elsewhere with the name /F2 and its size should be 16 pt.

  • [<01>29<0203>-2<0405>6<06>-1<020507>]TJ: TJ is an operator to show text while at the same time allowing for individual glyph positioning. According to the 'charmap' table specific to that PDF and the font used, the hex snippets enclosed in angle brackets have the following meanings:

    • <01>: this is the 'W'.

    • <0203>: this is the 'at'.

    • <0405>: this is the 'er'.

    • <06>: this is the 'm'.

    • <020507>: this is the 'ark'.

    The numbers in between these hex snippets (29, -2, 6 and -1) are correction values which adjust the spacing between the individual characters; they are expressed in thousandths of a unit of text space.
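To make the dissection concrete, here is a small sketch that decodes this one TJ array back into readable text. It assumes the mapping listed above (01=W, 02=a, 03=t, 04=e, 05=r, 06=m, 07=k), two hex digits per glyph, and a grep that supports -o (GNU and BSD grep do). decode_tj is a hypothetical helper written for this example only; for any other PDF or font the table would have to be worked out again.

```shell
#!/bin/sh
# decode_tj TJ_ARRAY
# Decode a TJ array using the charmap of THIS example only.
# Other PDFs and fonts use different, unpredictable mappings.
decode_tj() {
    # Extract the <...> hex strings and join their digits; the spacing
    # numbers between them are dropped.
    hex=$(printf '%s\n' "$1" | grep -o '<[0-9A-Fa-f]*>' | tr -d '<>\n')
    out=''
    while [ -n "$hex" ]; do
        rest=${hex#??}                 # everything after the first two digits
        [ "$rest" = "$hex" ] && break  # fewer than two digits left: stop
        code=${hex%"$rest"}            # the first two digits (one glyph code)
        hex=$rest
        case $code in
            01) out="${out}W" ;;
            02) out="${out}a" ;;
            03) out="${out}t" ;;
            04) out="${out}e" ;;
            05) out="${out}r" ;;
            06) out="${out}m" ;;
            07) out="${out}k" ;;
            *)  out="${out}?" ;;       # code not in this example's charmap
        esac
    done
    printf '%s\n' "$out"
}

decode_tj '[<01>29<0203>-2<0405>6<06>-1<020507>]TJ'
# prints: Watermark
```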

Now you show me how you'd replace that "string" with something else using sed... Remember, when you deal with an arbitrary PDF you know neither the encoding nor the placement correction numbers in advance. You can only find out by opening its source code in an editor and analysing its content.
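A quick way to see the problem is to feed the example line to the sed command in question. This sketch only handles the single line quoted above as a shell string; no real PDF is involved:

```shell
#!/bin/sh
# The content-stream line from the example, stored verbatim.
stream='56.8 726.989 Td /F2 16 Tf[<01>29<0203>-2<0405>6<06>-1<020507>]TJ'

# The word "Watermark" never occurs in the line, so sed has nothing
# to replace and its output equals its input.
result=$(printf '%s\n' "$stream" | sed 's/Watermark//g')

if [ "$result" = "$stream" ]; then
    echo "unchanged"
fi
# prints: unchanged
```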

Executive Summary

No, there is no command line way to reliably remove unwanted strings from a PDF!

You can only do this if...

(a) ...you are a PDF expert who is skilled at reading PDF source code;

(b) ...you are prepared to analyse the PDF file in question individually;

(c) ...you use a text editor to modify its contents after uncompressing the PDF source code.
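To illustrate (b) and (c): once inspection has revealed the exact encoded sequence, the edit itself is simple, but it is tied to that one file. The following is a sketch only, assuming the uncompressed content stream contains exactly the line from the example above. The sample file is a stand-in created for illustration, not a real PDF:

```shell
#!/bin/sh
# Hypothetical sample: a file holding only the content-stream line
# from the example.
sample=$(mktemp)
printf '%s\n' '56.8 726.989 Td /F2 16 Tf[<01>29<0203>-2<0405>6<06>-1<020507>]TJ' > "$sample"

# Match the exact encoded array found by inspection. The square
# brackets are escaped so that sed treats them as literal characters.
edited=$(sed 's/\[<01>29<0203>-2<0405>6<06>-1<020507>\]TJ//' "$sample")

printf '%s\n' "$edited"
# prints: 56.8 726.989 Td /F2 16 Tf

rm -f "$sample"
```

The pattern is useless for any other PDF, and even for another font in the same PDF. On a real file, such an edit also changes byte offsets, so the result may need repairing afterwards.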

WARNING: The answer currently marked as 'accepted' might have worked for the specific PDF of the OP. However, it will not work in the general case. Don't take the "recipe" it advertises for granted!