Structure of a PDF file? [closed]

Solution 1:

Here is a link to Adobe's reference material

http://www.adobe.com/devnet/pdf/pdf_reference.html

You should know though that PDF is only about presentation, not structure. Parsing will not come easy.

Solution 2:

I found the GNU Introduction to PDF to be helpful in understanding the structure. It includes an easily readable example PDF file that they describe in complete detail.

Other helpful links:

  • PDF Succinctly book is longer and has helpful pictures.
  • Introduction to the Insides of PDF is a presentation that isn't as in-depth but gives a quick overview and has lots of pictures.

Solution 3:

When I first started working with PDF, I found the PDF reference very hard to navigate. It might help you to know that the overview of the file structure is found in syntax, and what Adobe call the document structure is the object structure and not the file structure. That is also found in Syntax. The description of operators is hidden away in Appendix A - very useful for understanding what is happening in content streams. If you ever have the pain of working with colour spaces you will find that hidden in Graphics! Hopefully these pointers will help you find things more quickly than I did.

If you are using windows, pdftron CosEdit allows you to browse the object structure to understand it. There is a free demo available that allows you to examine the file but not save it.

Solution 4:

Here's the raw reference of PDF 1.7, and here's an article describing the structure of a PDF file. If you use Vim, the pdftk plugin is a good way to explore the document in an ever-so-slightly less raw form, and the pdftk utility itself (and its GPL source) is a great way to tease documents apart.

Solution 5:

I'm trying to do pretty much the same thing. The PDF reference is a very difficult document to read. This tutorial is a better start I think.