Where is the PEM file format specified?

I need to parse .PEM files.
I know that the standard for "Privacy-enhanced Electronic Mail" is defined in RFCs 1421-1424. But they don't seem to mention some of the text I find inside OpenSSL .pem files (e.g. "Key Attributes", "BEGIN CERTIFICATE", etc.). Is this an OpenSSL-specific format?


For quite a long time, there was no formal specification of the PEM format with regard to cryptographic exchange of information. PEM is the textual encoding, but what is actually being encoded depends on the context. In April 2015, the IETF approved RFC 7468, which finally documents how various implementations exchange data using PEM textual encoding. The following list, taken directly from the RFC, describes the scenarios it covers:

  1. Certificates, Certificate Revocation Lists (CRLs), and Subject Public Key Info structures in the Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile [RFC5280].
  2. PKCS #10: Certification Request Syntax [RFC2986].
  3. PKCS #7: Cryptographic Message Syntax [RFC2315].
  4. Cryptographic Message Syntax [RFC5652].
  5. PKCS #8: Private-Key Information Syntax [RFC5208], renamed to One Asymmetric Key in Asymmetric Key Package [RFC5958], and Encrypted Private-Key Information Syntax in the same documents.
  6. Attribute Certificates in An Internet Attribute Certificate Profile for Authorization [RFC5755].

According to this RFC, for the above scenarios you can expect the following labels in the BEGIN and END lines. Figure 4 of the RFC has more detail, including the corresponding ASN.1 types.

  • CERTIFICATE [RFC5280]
  • X509 CRL [RFC5280]
  • CERTIFICATE REQUEST [RFC2986]
  • PKCS7 [RFC2315]
  • CMS [RFC5652]
  • PRIVATE KEY [RFC5208] [RFC5958]
  • ENCRYPTED PRIVATE KEY [RFC5958]
  • ATTRIBUTE CERTIFICATE [RFC5755]
  • PUBLIC KEY [RFC5280]
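If you're writing a parser, the label list above can double as a dispatch table. A minimal Python sketch, assuming you only want to accept the RFC 7468 labels; the dictionary and function names are my own, and the ASN.1 type names are transcribed from Figure 4 of the RFC, so check them against the RFC before relying on them:

```python
from typing import Optional

# Label -> ASN.1 type, transcribed from RFC 7468 Figure 4.
RFC7468_LABELS = {
    "CERTIFICATE": "Certificate",
    "X509 CRL": "CertificateList",
    "CERTIFICATE REQUEST": "CertificationRequest",
    "PKCS7": "ContentInfo",
    "CMS": "ContentInfo",
    "PRIVATE KEY": "PrivateKeyInfo ::= OneAsymmetricKey",
    "ENCRYPTED PRIVATE KEY": "EncryptedPrivateKeyInfo",
    "ATTRIBUTE CERTIFICATE": "AttributeCertificate",
    "PUBLIC KEY": "SubjectPublicKeyInfo",
}


def expected_asn1_type(label: str) -> Optional[str]:
    """Return the ASN.1 type RFC 7468 assigns to a label, or None for
    labels the RFC does not standardize (e.g. OpenSSL-specific ones)."""
    return RFC7468_LABELS.get(label)


print(expected_asn1_type("X509 CRL"))         # CertificateList
print(expected_asn1_type("RSA PRIVATE KEY"))  # None
```

Returning None instead of raising lets the caller decide whether to reject non-standard labels or fall back to implementation-specific handling.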

That's not the full story, though. The RFC was written by looking at existing implementations and documenting what they did. The RFC wasn't written first, nor was it written based on some existing authoritative documentation. So if you end up in a situation where you want to interoperate with some implementation, you may have to look into the implementation's source code to figure out what it supports.

For example, OpenSSL defines these BEGIN and END markers in crypto/pem/pem.h (include/openssl/pem.h in OpenSSL 1.1.0 and later). Here is an excerpt from that header with all the BEGIN and END labels it supported at the time of writing.

# define PEM_STRING_X509_OLD     "X509 CERTIFICATE"
# define PEM_STRING_X509         "CERTIFICATE"
# define PEM_STRING_X509_TRUSTED "TRUSTED CERTIFICATE"
# define PEM_STRING_X509_REQ_OLD "NEW CERTIFICATE REQUEST"
# define PEM_STRING_X509_REQ     "CERTIFICATE REQUEST"
# define PEM_STRING_X509_CRL     "X509 CRL"
# define PEM_STRING_EVP_PKEY     "ANY PRIVATE KEY"
# define PEM_STRING_PUBLIC       "PUBLIC KEY"
# define PEM_STRING_RSA          "RSA PRIVATE KEY"
# define PEM_STRING_RSA_PUBLIC   "RSA PUBLIC KEY"
# define PEM_STRING_DSA          "DSA PRIVATE KEY"
# define PEM_STRING_DSA_PUBLIC   "DSA PUBLIC KEY"
# define PEM_STRING_PKCS7        "PKCS7"
# define PEM_STRING_PKCS7_SIGNED "PKCS #7 SIGNED DATA"
# define PEM_STRING_PKCS8        "ENCRYPTED PRIVATE KEY"
# define PEM_STRING_PKCS8INF     "PRIVATE KEY"
# define PEM_STRING_DHPARAMS     "DH PARAMETERS"
# define PEM_STRING_DHXPARAMS    "X9.42 DH PARAMETERS"
# define PEM_STRING_SSL_SESSION  "SSL SESSION PARAMETERS"
# define PEM_STRING_DSAPARAMS    "DSA PARAMETERS"
# define PEM_STRING_ECDSA_PUBLIC "ECDSA PUBLIC KEY"
# define PEM_STRING_ECPARAMETERS "EC PARAMETERS"
# define PEM_STRING_ECPRIVATEKEY "EC PRIVATE KEY"
# define PEM_STRING_PARAMETERS   "PARAMETERS"
# define PEM_STRING_CMS          "CMS"

These labels are a start, but you still have to look into how the implementation encodes the data between the labels. There's not one correct answer for everything.
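OpenSSL's private-key labels are a good example: "PRIVATE KEY" holds PKCS #8, while "RSA PRIVATE KEY" and "EC PRIVATE KEY" are OpenSSL's older "traditional" formats (PKCS #1 RSAPrivateKey and SEC 1 ECPrivateKey). When password-protected, the traditional formats also carry RFC 1421-style headers such as "Proc-Type: 4,ENCRYPTED" between the BEGIN line and the Base64 body. A rough Python sketch that classifies a key block by its label; the mapping and the function name are illustrative, not an OpenSSL API:

```python
from typing import Tuple

# What the private-key labels usually contain in OpenSSL output.
KEY_FORMATS = {
    "PRIVATE KEY": "PKCS #8 PrivateKeyInfo",
    "ENCRYPTED PRIVATE KEY": "PKCS #8 EncryptedPrivateKeyInfo",
    "RSA PRIVATE KEY": "PKCS #1 RSAPrivateKey (OpenSSL traditional)",
    "EC PRIVATE KEY": "SEC 1 ECPrivateKey (OpenSSL traditional)",
}


def describe_key_block(pem_block: str) -> Tuple[str, str, bool]:
    """Return (label, likely content, legacy_encrypted) for one PEM block."""
    lines = [ln.strip() for ln in pem_block.strip().splitlines()]
    begin = lines[0]
    if not (begin.startswith("-----BEGIN ") and begin.endswith("-----")):
        raise ValueError("input does not start with a BEGIN line")
    label = begin[len("-----BEGIN "):-len("-----")]
    # Traditional encrypted keys put "Proc-Type: 4,ENCRYPTED" and
    # "DEK-Info: ..." headers between the BEGIN line and the Base64 body.
    legacy_encrypted = any(
        ln.startswith("Proc-Type:") and "ENCRYPTED" in ln for ln in lines[1:]
    )
    content = KEY_FORMATS.get(label, "unknown: check the implementation")
    return label, content, legacy_encrypted
```

Labels missing from the table (for example "DSA PRIVATE KEY") are reported as unknown rather than guessed, which is the point above: the label alone doesn't tell you everything.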


Updated answer for 2015: as other users have already answered twice (before moderator @royhowie deleted those answers), there is now RFC 7468, which defines the PEM headers. The following quote is only a small part of it; you should read the actual spec, which will likely stay on the internet far longer than Stack Overflow will.

However, @royhowie deletes every answer that points to the RFC as "link only" unless it has some text, so here is some text:

7. Textual Encoding of PKCS #10 Certification Request Syntax

PKCS #10 Certification Requests are encoded using the "CERTIFICATE REQUEST" label. The encoded data MUST be a BER (DER strongly preferred; see Appendix B) encoded ASN.1 CertificationRequest structure as described in [RFC2986].

-----BEGIN CERTIFICATE REQUEST-----
MIIBWDCCAQcCAQAwTjELMAkGA1UEBhMCU0UxJzAlBgNVBAoTHlNpbW9uIEpvc2Vm
c3NvbiBEYXRha29uc3VsdCBBQjEWMBQGA1UEAxMNam9zZWZzc29uLm9yZzBOMBAG
ByqGSM49AgEGBSuBBAAhAzoABLLPSkuXY0l66MbxVJ3Mot5FCFuqQfn6dTs+9/CM
EOlSwVej77tj56kj9R/j9Q+LfysX8FO9I5p3oGIwYAYJKoZIhvcNAQkOMVMwUTAY
BgNVHREEETAPgg1qb3NlZnNzb24ub3JnMAwGA1UdEwEB/wQCMAAwDwYDVR0PAQH/
BAUDAwegADAWBgNVHSUBAf8EDDAKBggrBgEFBQcDATAKBggqhkjOPQQDAgM/ADA8
AhxBvfhxPFfbBbsE1NoFmCUczOFApEuQVUw3ZP69AhwWXk3dgSUsKnuwL5g/ftAY
dEQc8B8jAcnuOrfU
-----END CERTIFICATE REQUEST-----

Figure 9: PKCS #10 Example
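A minimal Python sketch of a parser for blocks like the example above, using only the standard library. It assumes the strict RFC 7468 layout (no RFC 1421-style headers inside the block), ignores any text outside the BEGIN/END lines, and requires the END label to match the BEGIN label; `parse_pem` is a name I made up:

```python
import base64
import re
from typing import List, Tuple

# One BEGIN/END block; the backreference \1 forces END to repeat the label.
PEM_BLOCK = re.compile(
    r"-----BEGIN ([^\r\n]*?)-----\r?\n(.*?)-----END \1-----",
    re.DOTALL,
)


def parse_pem(text: str) -> List[Tuple[str, bytes]]:
    """Return (label, decoded bytes) for each PEM block found in text."""
    blocks = []
    for match in PEM_BLOCK.finditer(text):
        label, body = match.group(1), match.group(2)
        # Remove line breaks and other whitespace, then decode strictly:
        # anything that is not Base64 (e.g. "Proc-Type:" headers) raises.
        b64 = "".join(body.split())
        blocks.append((label, base64.b64decode(b64, validate=True)))
    return blocks
```

For Figure 9 this returns a single ("CERTIFICATE REQUEST", ...) tuple with 348 bytes of DER that start with 0x30, the tag for SEQUENCE.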

The label "NEW CERTIFICATE REQUEST" is also in wide use. Generators conforming to this document MUST generate "CERTIFICATE REQUEST" labels. Parsers MAY treat "NEW CERTIFICATE REQUEST" as equivalent to "CERTIFICATE REQUEST".
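Following the MAY in that paragraph, a parser can normalize legacy labels before dispatching on them. A minimal Python sketch; the alias table is my own: only the "NEW CERTIFICATE REQUEST" entry comes from the RFC text quoted here, while "X509 CERTIFICATE" is an assumption based on OpenSSL's PEM_STRING_X509_OLD define:

```python
# Legacy label -> RFC 7468 label.
# "X509 CERTIFICATE" is an assumption taken from OpenSSL's PEM_STRING_X509_OLD.
LEGACY_ALIASES = {
    "NEW CERTIFICATE REQUEST": "CERTIFICATE REQUEST",
    "X509 CERTIFICATE": "CERTIFICATE",
}


def normalize_label(label: str) -> str:
    """Map a legacy label to its RFC 7468 equivalent; leave others unchanged."""
    return LEGACY_ALIASES.get(label, label)
```

Normalizing only on input keeps the generator side conforming: output should always use "CERTIFICATE REQUEST".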


To get you started: as far as I know, any human-readable part (plain words outside the BEGIN/END block) is there so human operators can see at a glance which certificate it is, its expiry date, etc., for a quick manual check. A parser can ignore it.

You'll want to parse what's between the BEGIN and END lines.

Inside, you'll find Base64-encoded data that you need to decode into bytes. These bytes are a DER-encoded certificate, key, etc. I'm not sure what good libraries you could use for parsing the DER data.
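DER is a sequence of tag-length-value (TLV) elements, so the first step of any DER parser is reading a tag and a length. A small Python sketch, assuming single-byte tags (tag numbers up to 30, which covers the universal types that show up in certificates and keys) and rejecting the indefinite-length form, which DER does not allow; `read_tlv_header` is a hypothetical helper:

```python
from typing import Tuple


def read_tlv_header(der: bytes, offset: int = 0) -> Tuple[int, int, int]:
    """Read the DER header at `offset`.

    Returns (tag byte, content length, header size in bytes).
    """
    tag = der[offset]
    first = der[offset + 1]
    if first < 0x80:
        # Short form: this byte is the length itself (0-127).
        return tag, first, 2
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length is not valid DER")
    length_bytes = der[offset + 2 : offset + 2 + count]
    if len(length_bytes) != count:
        raise ValueError("truncated length field")
    # Long form: the next `count` bytes hold the length, big-endian.
    return tag, int.from_bytes(length_bytes, "big"), 2 + count
```

For the Figure 9 request, the decoded bytes begin 30 82 01 58, so this returns (0x30, 344, 4): a SEQUENCE whose 344 content bytes follow a 4-byte header.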

To understand what data is inside each block, you can paste what's between the BEGIN and END lines into this site, which does ASN.1 decoding in JavaScript:

http://lapo.it/asn1js/

That said, I wouldn't paste any production private keys into any website, even though this one seems to do all the decoding in client-side JavaScript.
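If you'd rather not paste anything into a website, a similar outline can be produced locally (OpenSSL's `openssl asn1parse -in file.pem` does this too). A rough Python sketch that walks DER bytes and returns one indented line per element; the tag-name table is a partial, hand-picked list and the function name is my own. It only recurses into constructed elements (bit 0x20 of the tag), so it does not look inside OCTET STRING or BIT STRING values even when they wrap further DER:

```python
from typing import List

# Partial table of universal tag bytes common in certificates and keys.
TAG_NAMES = {
    0x01: "BOOLEAN", 0x02: "INTEGER", 0x03: "BIT STRING",
    0x04: "OCTET STRING", 0x05: "NULL", 0x06: "OBJECT IDENTIFIER",
    0x0C: "UTF8String", 0x13: "PrintableString", 0x16: "IA5String",
    0x17: "UTCTime", 0x18: "GeneralizedTime", 0x30: "SEQUENCE", 0x31: "SET",
}


def asn1_outline(der: bytes) -> List[str]:
    """Return an indented outline of the DER elements in `der`."""
    lines: List[str] = []

    def walk(pos: int, end: int, depth: int) -> None:
        while pos < end:
            tag, first = der[pos], der[pos + 1]
            if first < 0x80:  # short-form length
                length, header = first, 2
            else:  # long-form length: next (first & 0x7F) bytes
                count = first & 0x7F
                if count == 0:
                    raise ValueError("indefinite length is not valid DER")
                length = int.from_bytes(der[pos + 2 : pos + 2 + count], "big")
                header = 2 + count
            if pos + header + length > end:
                raise ValueError(f"element at offset {pos} overruns its parent")
            name = TAG_NAMES.get(tag, f"[tag 0x{tag:02X}]")
            lines.append(f"{'  ' * depth}{name} ({length} bytes)")
            if tag & 0x20:  # constructed: contents are more DER elements
                walk(pos + header, pos + header + length, depth + 1)
            pos += header + length

    walk(0, len(der), 0)
    return lines
```

Usage: `for line in asn1_outline(der): print(line)`. For the Figure 9 request the first lines are `SEQUENCE (344 bytes)` and then the nested `SEQUENCE (263 bytes)` holding the CertificationRequestInfo.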

Base64: http://en.wikipedia.org/wiki/Base64

DER: http://en.wikipedia.org/wiki/Distinguished_Encoding_Rules

ASN.1: http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One