Checking a PDF for exploits
Solution 1:
The following is a post by a moderator (Lupin) of the Remote-Exploit Forums, which I found very helpful.
Here's the method that I use in analysing malicious PDFs:
I use the tools pdfid and pdf-parser from here. In the past I have also used pdftk, but I'm finding that less useful recently.
The process:
- Use pdfid to analyse the PDF document. pdfid can tell you if a PDF has Javascript included as well as autorun functionality and how many pages it has. A one page document with Javascript and autorun functionality is suspicious.
- If Javascript is present, extract it from the document to determine its purpose. Sometimes the Javascript is included in plain text, in which case you can just use the strings utility to extract it. Otherwise, you can use pdf-parser to extract certain types of encoded Javascript.
- Malicious Javascript often contains obfuscation to disguise its true purpose. To remove this obfuscation I modify the script a little to allow easier debugging (e.g. assign the code from eval statements to a variable instead) and use the Rhino Javascript debugger to show me how the code is transformed as it runs.
- Many Javascript-based PDF exploits involve buffer overflows, and the shellcode is often in Unicode format. I have a Perl script that I wrote to convert this type of shellcode to a C program (really just C-style shellcode with some wrapper code), which can then be compiled and analysed further using standard binary analysis techniques.
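The Perl script mentioned in the last step is not shown in the post. As a rough sketch of the same idea in Python, the code below assumes the "Unicode format" shellcode appears as a string of %uXXXX escapes (the form Javascript's unescape() accepts) and that the target is little-endian x86. The function names are hypothetical, and plain %XX escapes are ignored for brevity.

```python
import re

# Matches one Javascript-style escape such as %u9090 (four hex digits).
UNICODE_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})")


def unescape_to_bytes(escaped: str) -> bytes:
    """Convert a %uXXXX string to the bytes it would occupy in memory.

    Each %uXXXX is one 16-bit code unit. Assuming a little-endian
    target, the low byte is stored first, so %u4142 becomes 0x42 0x41.
    """
    out = bytearray()
    for match in UNICODE_ESCAPE.finditer(escaped):
        value = int(match.group(1), 16)
        out += value.to_bytes(2, "little")
    return bytes(out)


def to_c_array(data: bytes, name: str = "shellcode") -> str:
    """Render bytes as a C unsigned char array, 12 bytes per line."""
    lines = []
    for i in range(0, len(data), 12):
        chunk = data[i:i + 12]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    return f"unsigned char {name}[] = {{\n{body}\n}};"


if __name__ == "__main__":
    sample = "%u9090%u9090%uc3cc"  # harmless illustrative input
    print(to_c_array(unescape_to_bytes(sample)))
```

The resulting array can be pasted into a small C wrapper and compiled, after which the binary can be examined in a disassembler or debugger inside an isolated environment.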
I will note that PDF exploits are possible without Javascript, but in practice most of the ones out in the wild seem to use it. Certainly the ones I have seen have it.
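To illustrate the triage heuristic from the first step (one page, Javascript, and autorun together are suspicious), here is a minimal Python sketch that counts a few PDF name tokens in the raw file bytes. It is only an approximation of the kind of report pdfid gives, not a reimplementation of it: the keyword list and function names are assumptions, and the sketch will miss names that are hex-escaped or hidden inside compressed object streams.

```python
import re
import sys

# PDF name tokens worth counting during triage. This list is an
# assumption for illustration, not pdfid's actual keyword list.
KEYWORDS = [b"/Page", b"/JS", b"/JavaScript", b"/OpenAction", b"/AA"]


def count_keywords(data: bytes) -> dict:
    """Count each keyword as a whole name token in raw PDF bytes."""
    counts = {}
    for kw in KEYWORDS:
        # The lookahead stops /Page from also matching /Pages.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts


def looks_suspicious(counts: dict) -> bool:
    """Heuristic from the post: one page plus Javascript plus autorun."""
    has_js = counts["/JS"] > 0 or counts["/JavaScript"] > 0
    has_autorun = counts["/OpenAction"] > 0 or counts["/AA"] > 0
    return counts["/Page"] <= 1 and has_js and has_autorun


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        result = count_keywords(handle.read())
    for keyword, count in result.items():
        print(f"{keyword:12} {count}")
    print("suspicious:", looks_suspicious(result))
```

A file flagged this way still needs the manual extraction and deobfuscation steps described in the post; the counts only help decide which documents to look at first.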
Solution 2:
You can install ClamAV and scan the PDF with it, for example using its clamscan command-line scanner.
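If you want to script this, the following is a minimal Python sketch that runs clamscan on a single file and maps its exit status to a verdict. It assumes clamscan is on the PATH; the function names and verdict strings are hypothetical.

```python
import shutil
import subprocess

# Exit statuses documented for clamscan: 0 = no virus found,
# 1 = virus(es) found, 2 = an error occurred.
VERDICTS = {0: "clean", 1: "infected", 2: "error"}


def interpret_exit_code(code: int) -> str:
    """Translate a clamscan exit status into a short verdict."""
    return VERDICTS.get(code, "unknown")


def scan_pdf(path: str) -> str:
    """Run clamscan on one file and return a verdict string."""
    if shutil.which("clamscan") is None:
        return "clamscan not installed"
    result = subprocess.run(
        ["clamscan", "--no-summary", path],
        capture_output=True,
        text=True,
    )
    return interpret_exit_code(result.returncode)
```

For example, scan_pdf("suspect.pdf") returns "infected" when a ClamAV signature matches the file (the file name here is only a placeholder).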