Generate or update a PDF to include an encrypted, hidden watermark?

Solution 1:

I did something similar a few years ago. It did not meet all your "hard" criteria. It worked like this:

  • I put a barely detectable, 2x2 point "clickable" area at a random spot on one of the borders of a random PDF page. It's not very likely to be discovered by accident (among the load of other, very obviously clickable hotspots that were in the PDF anyway...).

  • Should you click on the link, it would take you to a webpage http://my.own.site/project/87245e386722ad77b4212dbec4f0e912, with some made-up "errata" bullet points. (Did I mention that 87245e386722ad77b4212dbec4f0e912 was the MD5 hash of the person's name + contact data which I kept stored in a DB table? :-)
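The token in that URL can be derived in a few lines. This is a minimal sketch, not the exact code I used back then: the function names, the field order and the newline separator are assumptions made for illustration.

```python
import hashlib


def recipient_token(name: str, contact: str) -> str:
    """Return a 32-character hex MD5 digest identifying one recipient."""
    # Assumption: name and contact data are joined with a newline.
    # Any fixed, documented format works as long as it never changes.
    record = f"{name}\n{contact}"
    return hashlib.md5(record.encode("utf-8")).hexdigest()


def tracking_url(token: str, base: str = "http://my.own.site/project/") -> str:
    """Build the hidden link target that gets embedded in the PDF."""
    return base + token
```

The token would be stored in the DB table next to the person's record, so that a link found in a leaked copy can be looked up again.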

Obviously, this does not protect against printing + scanning + OCR-ing, or against a PDF "refrying" cycle. It also relies on some degree of "security by obscurity".

Here is how you use Ghostscript to add such a clickable hotspot to the lower left corner of page 1 of random-in.pdf:

gs \
 -o random-out.pdf \
 -sDEVICE=pdfwrite \
 -dPDFSETTINGS=/prepress \
 -c "[ /Rect [1 1 3 3]" \
 -c "  /Color [1 1 1]" \
 -c "  /Page 1" \
 -c "  /Action <</Subtype /URI" \
 -c "  /URI (http://my.own.site/87245e386722ad77b4212dbec4f0e912)>>" \
 -c "  /Subtype /Link" \
 -c "  /ANN pdfmark" \
 -f random-in.pdf

To make the clickable area bigger and visible, change the parameters of the command above like this:

 [....]
 -c "[/Rect [1 1 50 50]" \
 -c "  /Color [1 0 0]" \
 [....]
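If every customer gets an individual copy, the Ghostscript call can be assembled by a script. The following is a rough sketch under a few assumptions: `hotspot_command` and `add_hotspot` are hypothetical helper names, the whole pdfmark is passed as a single `-c` argument instead of several, and the URL must not contain parentheses or backslashes (they would need escaping inside a PostScript string).

```python
import subprocess


def hotspot_command(src, dst, url, page=1, rect=(1, 1, 3, 3), color=(1, 1, 1)):
    """Build the gs argument list that adds one link annotation to a PDF."""
    # Parentheses and backslashes are special inside a PostScript string,
    # so this sketch simply refuses them instead of escaping them.
    if any(ch in url for ch in "()\\"):
        raise ValueError("URL contains characters that need escaping")
    pdfmark = (
        f"[ /Rect [{' '.join(str(v) for v in rect)}] "
        f"/Color [{' '.join(str(v) for v in color)}] "
        f"/Page {page} "
        f"/Action << /Subtype /URI /URI ({url}) >> "
        "/Subtype /Link "
        "/ANN pdfmark"
    )
    return [
        "gs",
        "-o", dst,
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/prepress",
        "-c", pdfmark,
        "-f", src,
    ]


def add_hotspot(src, dst, url, **kwargs):
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    subprocess.run(hotspot_command(src, dst, url, **kwargs), check=True)
```

Calling `add_hotspot("random-in.pdf", "random-out.pdf", "http://my.own.site/87245e386722ad77b4212dbec4f0e912")` should then be equivalent to the hand-written command.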

Even simpler would be to generate an MD5 hash of the PDF and keep it in your database. It will be unique for each PDF you create, because of the document's UUID and the CreationDate and ModDate inside its metadata. Of course, this too only allows you to track the original PDFs in their digital form...
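Computing such a file hash takes only the standard library. A minimal sketch (the name `pdf_md5` and the chunk size are my own choices for illustration):

```python
import hashlib


def pdf_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to save memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Keep in mind that any change to the file, even re-saving it in a viewer, produces a different digest, so this only matches byte-identical copies.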

Solution 2:

This is a very hard one, and I am not sure that this will answer all your questions.

I am not aware of an all-in-one solution that can do this, or randomise it.

However, if I were tasked with this, I would think that the easiest way is to keep the document in an intermediate format such as formatted HTML, or similar.

Using a print CSS file or similar, you can make the layout identical to the book, use a script of some sort to randomise a picture, the content or anything else, and have a server-side PDF component assemble the document again.

So then, for example, when someone purchases the document, your buy script can randomly choose a number which identifies a protection mechanism (e.g. first picture, second picture, text somewhere etc.), and then generate a unique download link.

When that download link is called, it checks the number, performs the operation, compiles the result to PDF and then sends it to the client.
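The bookkeeping side of that flow could look roughly like this. It is only a sketch: the mechanism names, the `https://example.com/download/` base URL and the in-memory dictionary (standing in for a real database) are all assumptions, and the actual HTML-to-PDF step is left out.

```python
import secrets

# Hypothetical protection mechanisms, indexed by the randomly chosen number.
MECHANISMS = ("first_picture", "second_picture", "text_somewhere")

# In-memory stand-in for a database table: token -> (buyer_id, mechanism number)
_links = {}


def create_download_link(buyer_id, base="https://example.com/download/"):
    """Called by the buy script: pick a mechanism and issue a unique link."""
    mechanism = secrets.randbelow(len(MECHANISMS))
    token = secrets.token_urlsafe(16)  # unguessable, URL-safe identifier
    _links[token] = (buyer_id, mechanism)
    return base + token


def resolve_download(token):
    """Called when the link is requested: look up what to do for this copy."""
    buyer_id, mechanism = _links[token]  # KeyError means an unknown link
    # A real handler would now alter the HTML accordingly and render the PDF.
    return buyer_id, MECHANISMS[mechanism]
```

Because the token-to-buyer mapping is stored, a leaked copy can later be compared against the recorded mechanism to find out whose download it was.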

Again, I know this will not be easy or straightforward, but you are not asking for something easy, and this is the best way I can think of.