How to remove a watermark from a PDF file?
I thought this would be a simple task, but it turned out to be anything but.
The watermark is the very same (overlapping, but transparent) image on every single page. I created the PDF file myself (so no copyright worries here) using PDFCreator 0.9.8.
I have already tried my friend's Adobe Acrobat Pro, but it didn't work. It tries to remove it, but it can't. I tried to remove header/footer, etc., but the watermark just won't disappear.
How can I remove the watermark?
For image-based watermarks, there are several tools that promise their automatic removal. For example:
- We PDF Watermark Remover
- PDF Watermark Remover
- SoftOrbits PDF Logo Remover
All of these are free to try, but require a license to actually produce the desired output.
However, the watermark of this specific PDF file (which the OP sent me via email) isn't a single image that is repeated on all pages. As it turns out, PDFCreator hardcoded it (almost pixel by pixel) into every single one of them. This makes the watermark much more difficult to remove (and results in a rather bloated PDF file).
Since the watermark is actually composed of many tiny images, you can remove them with a PDF editor (e.g., Foxit Advanced PDF Editor), simply by selecting them and pressing Delete. Unfortunately, you have to repeat this for every page.
A less time-consuming solution would be to remove the watermark programmatically. We need:
- Pdftk: a tool to (un)compress and fix PDF streams.
- Notepad++: a text editor capable of replacing Perl Compatible Regular Expressions.
Steps
- Download Pdftk and extract pdftk.exe and libiconv2.dll to %windir%\System32, a directory in the path or any other location of your choice.
- Download and install Notepad++.
- PDF streams are usually compressed using the DEFLATE algorithm. This saves space, but it makes the PDF's source illegible.
The command
pdftk original.pdf output uncompressed.pdf uncompress
uncompresses all streams, so they can be modified by a text editor.
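To illustrate why the uncompress step is needed, here is a minimal Python sketch using the standard zlib module. The sample content is made up for the illustration (it is not taken from the actual file), and pdftk does considerably more than this when it rewrites a whole PDF.

```python
import zlib

# Hypothetical page content: one 1x1 inline-image block repeated many times.
block = b"q 9 0 0 9 2997 4118.67 cm\nBI /CS/RGB /W 1 /H 1 /BPC 8 ID abc EI Q\n"
content = block * 100

# A PDF writer would typically store this as a DEFLATE-compressed stream.
compressed = zlib.compress(content)

# The compressed bytes are much shorter, but no longer readable as operators.
print(len(content), "->", len(compressed), "bytes")
print(compressed[:40])

# Inflating the stream restores the editable text.
restored = zlib.decompress(compressed)
print(restored[:25].decode("ascii"))
```

A text editor can only search and replace the operators once the streams are in the restored form.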
- Open uncompressed.pdf with Notepad++ to reveal the structure of the watermark.
In this specific case, every page begins with the block
q 9 0 0 9 2997 4118.67 cm BI /CS/RGB /W 1 /H 1 /BPC 8 ID Ÿ®¼ EI Q
and nearly 4,000 blocks just like this one. This particular block sets only one (/W 1 /H 1) of the watermark's pixels.
Scrolling down until the pattern changes reveals that the watermark's stream is 95,906 bytes long (counting newlines). The exact same stream is repeated on every page of the PDF file.
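Instead of scrolling, the length of the run could be measured with a short script. The following is only a sketch: the function name measure_pixel_run and the regular expression are my own, and the pattern assumes that every watermark block is a 1x1 RGB inline image with 8 bits per component (3 data bytes), with tokens separated by spaces or newlines.

```python
import re

# One inline-image block that paints a single pixel (/W 1 /H 1, 3 RGB bytes).
# Assumption: whitespace between tokens may be spaces or newlines.
PIXEL_BLOCK = re.compile(
    rb"q\s+(?:-?[\d.]+\s+){6}cm\s+BI\s+/CS\s*/RGB\s+/W\s+1\s+/H\s+1\s+/BPC\s+8\s+"
    rb"ID\s.{3}\s+EI\s+Q\s*",
    re.DOTALL,
)

def measure_pixel_run(data: bytes, start: int = 0) -> tuple[int, int]:
    """Return (block_count, byte_length) of the consecutive pixel blocks at start."""
    pos = start
    count = 0
    while True:
        match = PIXEL_BLOCK.match(data, pos)
        if match is None:
            break  # the pattern changed: end of the watermark run
        pos = match.end()
        count += 1
    # byte_length includes the whitespace that follows the last block
    return count, pos - start
```

Called with the uncompressed file's bytes and the offset where a page's content begins, it reports how many pixel blocks follow and how many bytes they span. Whether it matches a given file depends on how the PDF writer laid out the blocks.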
- Press Ctrl + H and set the following:
Find: q 9 0 0 9 2997 4118\.67 cm.{95881}
Replace: (blank)
Match case: checked
Wrap around: checked
Regular expression: selected
. matches newline: checked
The regular expression
q 9 0 0 9 2997 4118\.67 cm.{95881}
matches the first line of the above block (q 9 0 0 9 2997 4118.67 cm) and all following 95,881 characters, i.e., the watermark's stream.
Clicking Replace All removes it from all pages of the PDF file.
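For those who prefer a script to Notepad++, the same replacement can be sketched in Python. The function name strip_watermark is hypothetical, and the sketch assumes that the file is processed as raw bytes and that the 95,881 figure counts bytes. It uses plain searching and slicing rather than a regular expression, but removes the same spans: each occurrence of the first line plus a fixed number of bytes after it.

```python
def strip_watermark(data: bytes, first_line: bytes, tail_length: int) -> bytes:
    """Remove every occurrence of first_line plus the tail_length bytes after it."""
    span = len(first_line) + tail_length
    out = bytearray()
    pos = 0
    while True:
        hit = data.find(first_line, pos)
        # Stop when there is no further occurrence, or not enough bytes left
        # for a complete watermark (the regex would not match there either).
        if hit == -1 or hit + span > len(data):
            out += data[pos:]
            return bytes(out)
        out += data[pos:hit]   # keep everything before the watermark
        pos = hit + span       # skip the watermark itself
```

For this particular file, the call would be strip_watermark(data, b"q 9 0 0 9 2997 4118.67 cm", 95881), with data read from uncompressed.pdf in binary mode and the result written back in binary mode.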
- The watermark has now been removed, but the PDF file has errors (the streams' lengths are incorrect) and it's uncompressed.
The command
pdftk uncompressed.pdf output nowatermark.pdf compress
takes care of both.
uncompressed.pdf is no longer needed. You can delete it.
The result is the same PDF without the watermark (and about half the size).
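If the procedure has to be repeated for several files, the two pdftk calls can be wrapped in a small script. This is a sketch only: the helper names pdftk_command and run_pdftk are made up, and it assumes that pdftk is on the path. The command lines themselves are the ones shown in the steps above.

```python
import subprocess

def pdftk_command(src: str, dst: str, operation: str) -> list[str]:
    """Build a pdftk command line; operation is 'uncompress' or 'compress'."""
    if operation not in ("uncompress", "compress"):
        raise ValueError(f"unsupported operation: {operation}")
    return ["pdftk", src, "output", dst, operation]

def run_pdftk(src: str, dst: str, operation: str) -> None:
    """Run pdftk and raise CalledProcessError if it exits with an error."""
    subprocess.run(pdftk_command(src, dst, operation), check=True)
```

A full run would call run_pdftk("original.pdf", "uncompressed.pdf", "uncompress"), edit uncompressed.pdf, and then call run_pdftk("uncompressed.pdf", "nowatermark.pdf", "compress").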
It sounds like the watermark is actually part of the images within the .PDF, and not a separate image rendered over it by whatever you are using to display the .PDF. You may not be able to remove the watermark without extracting the images from the .PDF, running them through an image editor, and then reconstructing the .PDF manually.
For text watermarks, editing a PostScript version can be much easier: After
$ pdftops document.pdf
edit document.ps, then convert back to PDF via
$ ps2pdf document.ps
Found another way to do it:
- Use the pdf2htmlEX tool (or any other PDF to HTML converter) to convert the PDF to an HTML file.
- Edit the HTML with a text editor, and remove the watermark. Save it.
- Print the HTML to a new PDF document
- Profit
The quirk of this stamp is that you can delete it within Adobe Acrobat Pro, but it regenerates as soon as you move the mouse, because the stream object keeps it persistent.
If you try to edit the PDF source, which is tricky, there's a chance that the file will be corrupted.
If the stamp is a stream, we can interrupt it by disconnecting the computer from the Net, which I did.
Then, in Adobe Acrobat Pro, I selected one of my annotations, right-clicked to get the popup, and selected "Show Comments List".
Select the nefarious watermark/stamp from the List, right-click to get the popup and select "Delete". Do this on every page where the affixation occurs.
Save the File under another name. My application crashed, but not before saving the file!
Open the new & much smaller file; note that all the watermarks/stamps are gonzo.
In my case, the file size of my 3-page document shrank from 300 KB down to an impressive 60 KB. All the original data and annotations remained intact - sans the watermarks.
~Good hunting :o)