How to convert a PDF to a PDF/A?

My university requires submitted PDF files to be in PDF/A format.

I tried to find a converter, but they are all very expensive and/or complicated.

How do I convert my existing PDF file into a PDF/A?


Solution 1:

To save existing Word documents as PDF/A, all you need is a recent version of Microsoft Word. To create PDF/A files from other applications that can print, you can use a free PDF/A creator like the one at www.freepdfcreator.org

If you need to validate that a PDF/A file is compliant, you can use our free service at www.validatepdfa.com

Converting existing PDF files to PDF/A in a lossless way is a bit trickier and not always possible. Acrobat 9 and 10 can do this. Our business, Solid Documents, also sells a product that does this (and other common archiving functions) for $99: Solid PDF Tools.

Solution 2:

PDF/A is an international ISO standard for archiving PDFs. The standard requires strict compliance with its set of rules (like: "embed all fonts", "don't use transparency", "don't use JavaScript", "no encryption",...).

There are a lot of PDFs out there which claim to be PDF/A, but fail a real smoke test. That claim is just a tag in the file's metadata. That tag can, for example, make Acrobat Reader display a special hint when rendering the file.
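To see what a file merely claims, you can look for the PDF/A identification entries in its XMP metadata. The following is a minimal sketch, assuming the metadata is stored uncompressed and uses the `pdfaid:part` and `pdfaid:conformance` property names from the PDF/A identification schema. The function name `pdfa_claim` is made up for illustration. It only reports the claim and says nothing about real compliance.

```shell
# Print the PDF/A part and conformance level that a file claims in its XMP metadata.
# Assumption: the XMP packet is stored uncompressed, so plain grep can see it.
# This reads the claim only; it does not validate anything.
pdfa_claim() {
    # -a: treat the binary PDF as text; -o: print only the matching parts.
    # Matches the element form (<pdfaid:part>1</pdfaid:part>) as well as
    # the attribute form (pdfaid:part="1").
    grep -a -o -E 'pdfaid:(part|conformance)(>|=")[0-9A-Za-z]+' "$1" |
        sed -E 's/(>|=")/ /' |
        sort -u
}
```

Call it as `pdfa_claim thesis.pdf`. An empty result means the file does not even claim to be PDF/A.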

A check for real compliance requires some rather expensive commercial "preflight" software. Currently I'm not aware of any free utility to do that job. See also here for some test results: Isartor testsuite.

You can use Ghostscript to (try to) convert PDF to PDF/A. How to do this is documented here (Update: for newer versions here).

But note: this document was updated only very recently [*]. Previous versions of Ghostscript's Ps2pdf.htm misled users into running a command that created PDFs claiming to be PDF/A but which failed real smoke tests.

How to convert PDF to PDF/A with Ghostscript:

Here is a command line:

gswin32c ^
   -dPDFA ^
   -dNOOUTERSAVE ^
   -dUseCIEColor ^
   -sProcessColorModel=DeviceCMYK ^
   -sDEVICE=pdfwrite ^
   -o output_pdfa.pdf ^
   -dPDFACompatibilityPolicy=1 ^
    PDFA_def.ps ^
    input.pdf

[*] Note: The problem lies with the parameter PDFA_def.ps. This is a file you need to edit to suit your needs. Ghostscript ships with a sample of it in its /lib subdirectory. This sample will not work as-is; you have to edit it first. How to edit it is explained in the sample's comments.
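For Unix-like shells, the same call can be wrapped so that it refuses to run without an edited copy of the definition file. This is only a sketch: the function name `pdfa_convert` and the `GS` variable are made up here, the executable is assumed to be called `gs`, and the flags simply mirror the command line above, so newer Ghostscript versions may want different ones.

```shell
# Run the conversion shown above with a locally edited definition file.
# GS names the Ghostscript executable (assumed "gs"; gswin32c on Windows).
# Arguments: input PDF, output PDF, optional path to the edited PDFA_def.ps.
pdfa_convert() {
    src=$1
    dst=$2
    def=${3:-PDFA_def.ps}
    # Stop early if the edited copy of the definition file is not there.
    if [ ! -f "$def" ]; then
        echo "missing $def: copy the sample from Ghostscript's lib directory and edit it" >&2
        return 1
    fi
    "${GS:-gs}" -dPDFA -dNOOUTERSAVE -dUseCIEColor \
        -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite \
        -dPDFACompatibilityPolicy=1 -o "$dst" "$def" "$src"
}
```

Usage would be something like `pdfa_convert input.pdf output_pdfa.pdf ./PDFA_def.ps`. Keeping the edited definition file next to your documents, instead of changing the copy in Ghostscript's own directory, means an upgrade will not overwrite your changes.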

Solution 3:

I used the following command to convert PDF to PDF/A:

gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -sOutputFile=MyOutPutPDF-A.pdf PDFOriginal.ps

If you have a PDF file, first convert it to PS so that it works with the above command. I tried several times to convert the PDF file directly to PDF/A, but it didn't work.

Here is a way to convert your PDF to a PS file:

pdftops PDFOriginal.pdf PDFOriginal.ps

Solution 4:

Microsoft Office 2007's 'Save as PDF' tool saves in PDF/A format.

A PDF/A document is just a PDF document that uses a specific subset of PDF designed to ensure it is 'self-contained'. That is, it is not permitted to rely on information from external sources (e.g. font programs and hyperlinks). From Wikipedia:

Other key elements to PDF/A compatibility include:

* Audio and video content are forbidden.
* JavaScript and executable file launches are forbidden.
* All fonts must be embedded and also must be legally embeddable for
  unlimited, universal rendering. This also applies to the so-called
  PostScript standard fonts such as Times or Helvetica.
* Colorspaces specified in a device-independent manner.
* Encryption is disallowed.
* Use of standards-based metadata is mandated.

Edit:

Since there aren't really any tools to test if a PDF is PDF/A, it's a safe bet that just like you, your university also has no way to test that the document you send them is PDF/A.

It's likely that the only reason they specifically request it is so they can be sure that all the content will be "there" when they open it. They just expressed this requirement rather cryptically (and badly) as a requirement for PDF/A. So a simple way to test whether the PDF meets their true requirement of self-containment is to transfer the PDF to another (preferably offline) computer, view it there, and make sure that everything appears as it should.
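One part of that check can be scripted: whether every font is embedded. The sketch below assumes the `pdffonts` tool from Poppler/Xpdf is installed and that its table has the `emb` column fifth from the end of each row. That layout is an assumption and may differ between versions. `unembedded_fonts` is a made-up helper name.

```shell
# Read `pdffonts` output on stdin and print the names of fonts that are not embedded.
# Assumed column layout: name, type, encoding, emb, sub, uni, object ID (two numbers).
unembedded_fonts() {
    # Skip the two header lines; "emb" is then the fifth field from the end.
    awk 'NR > 2 && $(NF-4) == "no" { print $1 }'
}
```

Run it as `pdffonts thesis.pdf | unembedded_fonts`. If nothing is printed, every listed font is embedded. This covers fonts only, not the other PDF/A rules.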

Solution 5:

On Mac OS X, without using pdftops (as @soham.m17 proposed), which I wasn't able to install, you can do:

pdf2ps oldPdf.pdf psVersionOfOldps.ps

to convert your PDF to PS format, and then:

gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -sOutputFile=MyOutPutPDF-A.pdf psVersionOfOldps.ps

to convert it to PDF/A.