How to remove OCR from a PDF?
I have been searching Google for some time but cannot find an answer to my question.
I have unwanted layers of OCR in a document that I recently scanned with Adobe Acrobat. It has not been OCRed properly, and I want to redact some information, but the OCR layer causes information I want to keep to be erased as well. I converted the files to TIFFs, but noticed a (very) significant quality loss. I have heard that printing to another PDF either keeps the text or reduces the image quality.
In Acrobat Pro DC, the appropriate command is "Remove Hidden Information," which is available through both the "Protect" and "Redact" tools.
Running the command only searches for hidden information; it does not change the document by itself. You must then tell Acrobat which information to remove. In this case, select "Hidden Text" in the Results pane, then click the Remove button and save the changed document.
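For anyone curious what "Hidden Text" means at the file level: as far as I know, the OCR layer on a scanned page is usually written as ordinary text drawn in text rendering mode 3 (invisible), sitting on top of the page image. Below is a rough Python sketch of the idea, not what Acrobat does internally. It assumes you already have a page's decompressed content stream as bytes, and that each hidden text object sets `3 Tr` inside its own BT...ET block. The function name `strip_invisible_text` is made up for illustration.

```python
import re

# One text object: everything from a BT operator to the next ET operator.
# Simplification: a literal "(ET)" inside a string would confuse this pattern.
TEXT_OBJECT = re.compile(rb"\bBT\b.*?\bET\b\s*", re.DOTALL)

# The operator that selects text rendering mode 3 (invisible text).
INVISIBLE_MODE = re.compile(rb"(?<![\d.])3\s+Tr\b")


def strip_invisible_text(content: bytes) -> bytes:
    """Drop every BT...ET text object that switches to rendering mode 3.

    Assumption: hidden text objects set `3 Tr` inside their own block.
    A rendering mode inherited from an earlier block is not tracked here.
    """
    def drop_if_hidden(match):
        block = match.group(0)
        return b"" if INVISIBLE_MODE.search(block) else block

    return TEXT_OBJECT.sub(drop_if_hidden, content)


# Hypothetical content stream: a scanned image, one hidden OCR text object,
# and one visible text object.
sample = (
    b"q 612 0 0 792 0 0 cm /Im0 Do Q\n"
    b"BT 3 Tr /F1 10 Tf 72 700 Td (hidden words) Tj ET\n"
    b"BT 0 Tr /F1 12 Tf 72 650 Td (visible caption) Tj ET\n"
)
cleaned = strip_invisible_text(sample)
```

The image and the visible text are left untouched, so there is no quality loss from re-rendering the scan. On a real file you would still need a PDF library to read and write the content streams, so for most people the Acrobat command above is the simpler route.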
(one year ago...)
If, as you say, the documents were scanned rather than printed to PDF (from Word, for example), you can easily remove the OCR text in Acrobat:
Select Document, then Examine Document, and you can then remove the hidden text (the OCR layer).
In Acrobat Pro, use "Remove Hidden Information" (under "Protection"). Select all, execute, and the OCR is gone.