PDF specifications for coders: Adobe or ISO?
I want to write an application that can read and decode a PDF document. Where am I supposed to get the specs for this file format? The PDF format is standardized by ISO, but it's not clear to me which source is the most reliable for this kind of information.
What is a good source to start with for this file format?
Solution 1:
You can actually use both sources you mentioned; the confusion is historical.
Adobe invented PDF and it invented the Acrobat product family to be used together with it. The different PDF versions were released together with major Acrobat versions (PDF 1.3 for example was released together with Acrobat 4).
Because of the adoption of the PDF format and because a number of ISO standards were written that actually depended on the proprietary PDF file format (not an easy thing for an ISO standard), Adobe decided to hand over the PDF format to ISO.
From that point on and until today there is an ISO committee responsible for editing the PDF specification and coming up with new versions. The ISO standard for PDF is ISO 32000.
Also, keep in mind that, depending on where you want to use PDF, a number of other ISO standards might be very useful or indispensable. Amongst the most commonly used are PDF/X (for exchange of PDF files in the publishing community) and PDF/A (for the creation of PDF files that need to be archived in long-term storage). These specifications reference a specific version of the PDF standard and add additional requirements and restrictions.
As far as the specification is concerned, you can get all documents directly from ISO. However, for PDF itself you can also get it from Adobe, and the content of that document will be the same. Refer to the Adobe DevNet site on Acrobat:
http://www.adobe.com/devnet/acrobat.html
Just download the Acrobat SDK and that will give you the documentation as part of it.
Let me add a word of caution on "targeting the PDF specification" in code. I really, really, really advise you to more clearly specify exactly what your needs are for PDF (editing, generating, quality control (preflight)) and then look for or ask about an existing library that meets those needs or can be extended to meet your needs.
Writing something that supports "PDF" in general will be a daunting task. The PDF specification is large, intricate and full of... well... niceties. There be dragons!
Update:
A direct link to Adobe's PDF 1.7 specification document (first edition, free to download) is here:
- Document management — Portable document format — Part 1: PDF 1.7
The content of this document later became officially adopted as the ISO standard for general PDF, ISO 32000-1.
Note, however, that there are a few differences from the PDF file available from ISO:
- The page layout changed, compared to Adobe's version.
- ISO documents are not available for free (at the time of writing, this one costs CHF 238 to download).
If you are starting to develop PDF software, the free PDF from the Adobe link above is sufficient.
Update: 2021
It's worth noting that ISO has meanwhile released a new version of the PDF specification, called ISO 32000-2. Information about it is available on the ISO site. This new version was published in 2017 and received an update in December 2020.
While the new document does not dramatically alter PDF, and most of the general information about PDF in, for example, the free Adobe version of the specification is still correct, there are definitely changes:
- Many things, especially deeply technical things such as everything on transparency, received an update, mostly to clarify existing language (and add information that was up to now more or less implicit). These updates may have an effect on how to implement or use those parts of the standard.
- New features have been included in the standard.
If you're writing PDF files, especially simpler ones, the Adobe specification should still be OK to get you going. If you want to support everything in the PDF standard, you'll need to pay for the latest ISO version (but that is a tall order anyway).
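To give a feel for what "writing a simple PDF file" involves, here is a minimal sketch in Python that assembles a blank one-page document by hand. The function name build_minimal_pdf and the page size are my own choices for illustration, and the layout (header, numbered objects, cross-reference table, trailer) follows my reading of the free PDF 1.7 document, so treat it as a starting point rather than a reference implementation. Anything with text, fonts or images needs considerably more.

```python
def build_minimal_pdf() -> bytes:
    """Assemble a blank one-page PDF by hand (illustrative sketch only)."""
    # Indirect objects 1..3: document catalog, page tree, one empty page.
    # The 612 x 792 point MediaBox (US Letter) is an arbitrary choice.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ]

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        # Remember the byte offset of each object for the cross-reference table.
        offsets.append(len(out))
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus the
    # mandatory free entry for object 0.
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # Trailer: points at the catalog and at the cross-reference table.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Writing the returned bytes to a file opened in binary mode, for example open("blank.pdf", "wb"), should give you something a PDF viewer can open. The byte offsets are the fiddly part: every change to an object shifts everything after it, which is one of the reasons a library is usually the better choice.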
Solution 2:
PDF is not a lightweight format. It is basically PostScript with compression on top. You definitely want to use an existing library rather than write your own; that would be a huge task.
Or get an existing PDF writer application, and start it from within your program.
I haven't looked at it very much, but libgnupdf looks OK.
According to Wikipedia, PDF combines three technologies:
- A subset of the PostScript page description programming language, for generating the layout and graphics.
- A font-embedding/replacement system to allow fonts to travel with the documents.
- A structured storage system to bundle these elements and any associated content into a single file, with data compression where appropriate.
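To make the "structured storage" part a bit more concrete: a reader typically starts at both ends of the file, with the version header at the top and the startxref pointer near the bottom that says where the cross-reference table lives. Below is a minimal sketch in Python of just that first step. The function name sniff_pdf and the 1024-byte tail window are my own assumptions, and real files can be messier (junk before the header, a version overridden elsewhere in the document, cross-reference streams instead of tables), so this only illustrates the idea.

```python
import re


def sniff_pdf(data: bytes) -> tuple[str, int]:
    """Return (version, startxref offset) for PDF data held in memory.

    Illustrative sketch only: assumes the header sits at byte 0 and the
    final startxref marker lies within the last 1024 bytes.
    """
    header = re.match(rb"%PDF-(\d\.\d)", data)
    if header is None:
        raise ValueError("no %PDF-x.y header at the start of the data")

    # The file ends with: startxref <byte offset> %%EOF
    tail = data[-1024:]
    found = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not found:
        raise ValueError("no startxref marker near the end of the data")

    # Incrementally updated files may contain several markers; take the last.
    return header.group(1).decode("ascii"), int(found[-1])
```

Everything after this step (parsing the cross-reference table, resolving objects, decompressing streams, interpreting page content) is where the real work begins, which is why an existing library is the sensible route.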