How to change internal page numbers in the metadata of a PDF?

Solution 1:

What you want is called page labels, and they can be added directly in the PDF's source code. Open the file in a text editor. Renaming the extension from .pdf to .txt can help if your editor refuses to open it. Large files may load slowly, so be patient. The page-label information is stored in a dictionary object called the document catalog, which looks something like this:

3 0 obj
<< /Type /Catalog
   /Pages 1 0 R
>>
endobj

The catalog may contain more entries, but this is the basic structure. There is only one catalog, so in a large file you can search for the object that contains /Catalog. Now make your changes by inserting a /PageLabels entry:

3 0 obj
<< /Type /Catalog
   /Pages 1 0 R
   /PageLabels << /Nums [ 0 << /P (cover) >>
                          % labels 1st page with the string "cover"
                          1 << /S /r >>
                          % numbers pages 2-6 in small roman numerals
                          6 << /S /D >>
                          % numbers pages 7-x in decimal arabic numerals
                        ]
               >>
>>
endobj

There are three entries starting with numbers, and these numbers are page indices. Page 1 has index 0, page 2 has index 1, and so forth. Each entry starts a range that runs until the next entry begins. So the entry 1 <<...>> applies to the pages with indices 1 to 5, and the entry 6 <<...>> applies to index 6 through the last page. An entry for index 0 must always be present.
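The range rule can be checked with a few lines of Python. This is a minimal illustration, not part of any PDF tool. The hypothetical helper label_ranges takes the start indices written in /Nums (assumed sorted) and a page count (10 here, chosen only for the example). It returns the inclusive page-index range that each entry governs.

```python
def label_ranges(starts, page_count):
    """Return (first, last) page-index pairs, one per /Nums entry.

    `starts` are the indices written before each << ... >> in /Nums,
    assumed sorted ascending; the first one must be 0.
    """
    if not starts or starts[0] != 0:
        raise ValueError("/Nums must define a label for page index 0")
    ranges = []
    for k, first in enumerate(starts):
        # A range ends just before the next entry starts, or at the last page.
        last = starts[k + 1] - 1 if k + 1 < len(starts) else page_count - 1
        ranges.append((first, last))
    return ranges


# The catalog above (0 cover, 1 roman, 6 decimal) applied to a 10-page file.
print(label_ranges([0, 1, 6], 10))  # [(0, 0), (1, 5), (6, 9)]
```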

You can find more information about page labels and PDF syntax in the "Page Labels" section of the PDF specification (ISO 32000), or in wiki pages that summarize the PDF standard.

Solution 2:

NOTE 1: The accepted answer is still mostly correct, but it has some gaps. Many PDF files cannot be edited directly as text, because much of their structure is stored in compressed streams. Even when they can be edited, hand editing can damage the PDF and make it unreadable. The file's cross-reference table records the byte offset of every object, and inserting text shifts those offsets. One solution that works on both Unix and Microsoft Windows is qpdf. It can translate PDF files into "QDF", a text-editable form that is still a valid PDF file. The qpdf package also comes with fix-qdf, which recalculates the offsets after a QDF file has been edited, correcting that damage.
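To see why offsets matter, here is a simplified Python illustration. It uses a made-up two-object fragment. It is not a real PDF parser and not how fix-qdf works internally. The sketch finds the "N G obj" headers and shows that adding text inside an earlier object moves every later object, so an existing xref table would point to the wrong bytes.

```python
import re

# An object header such as "3 0 obj" at the start of a line.
OBJ_HEADER = re.compile(rb"^(\d+) (\d+) obj", re.MULTILINE)


def object_offsets(data: bytes) -> dict:
    """Map each object number to the byte offset where its header starts."""
    return {int(m.group(1)): m.start() for m in OBJ_HEADER.finditer(data)}


# A made-up fragment with two objects (not a complete, valid PDF).
original = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Catalog /Pages 1 0 R >>\nendobj\n"
)
# Simulate a hand edit that adds text inside object 1.
inserted = b" /Extra (added)"
edited = original.replace(b"/Count 0", b"/Count 0" + inserted)

before, after = object_offsets(original), object_offsets(edited)
# Object 1 is unaffected, but object 3 now starts len(inserted) bytes later.
print(before[1] == after[1], after[3] - before[3] == len(inserted))  # True True
```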

NOTE 2: Uncomfortable with text editors? Try a GUI editor such as jpdftweak first. Sometimes the GUI PDF editors work, in which case, yay, you're done. However, when they fail, as has often been the case for me, you can try this more robust alternative. Either way, please do not downvote my answer for being less than elegant.


HOW TO Edit PDF Page Numbers Using Qpdf

Summary:

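  1. qpdf -qdf foo.pdf foo.qdf
  2. edit foo.qdf, adding these entries to the /Nums array of /PageLabels:

     0 << >>           % No label on pages 1-6
     6 << /S /D >>     % Number pages 1, 2, 3... from the 7th page on

  3. fix-qdf foo.qdf > bar.qdf
  4. open bar.qdf in a PDF viewer and check the labels
  5. qpdf bar.qdf bar.pdf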
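If you do this often, the command-line steps can be scripted. The Python sketch below assumes qpdf and fix-qdf are on your PATH. The function names qdf_commands and run_step are hypothetical. Step 2 (the text edit) and step 4 (checking in a viewer) remain manual.

```python
import subprocess


def qdf_commands(src, qdf, fixed, out):
    """Return the argument lists for steps 1, 3 and 5 of the summary."""
    return [
        ["qpdf", "-qdf", src, qdf],  # step 1: PDF -> text-editable QDF
        ["fix-qdf", qdf],            # step 3: result goes to stdout
        ["qpdf", fixed, out],        # step 5: QDF -> ordinary PDF
    ]


def run_step(argv, stdout_path=None):
    """Run one command, raising CalledProcessError if it fails.

    If stdout_path is given, standard output is written to that file,
    which is how fix-qdf's result ends up in bar.qdf.
    """
    if stdout_path is None:
        subprocess.run(argv, check=True)
    else:
        with open(stdout_path, "wb") as fh:
            subprocess.run(argv, stdout=fh, check=True)
```

Typical use:

1. Build the commands with qdf_commands("foo.pdf", "foo.qdf", "bar.qdf", "bar.pdf").
2. Run the first one with run_step.
3. Edit foo.qdf by hand.
4. Call run_step on the second command with stdout_path="bar.qdf".
5. Check bar.qdf in a viewer, then run the third command.

qpdf exits with status 3 when it succeeds but prints warnings. check=True treats that as a failure, so you may want to relax it.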

Detailed steps

Step 1.

Convert the document to the easily editable QDF format. Run qpdf from the command line like so:

qpdf -qdf foo.pdf foo.qdf

Note: If you do not have qpdf installed already, Microsoft Windows executables can be downloaded from https://github.com/qpdf/qpdf/releases. On Unix systems such as Ubuntu and Debian GNU/Linux, you can install it by typing sudo apt install qpdf.

Step 2.

Edit the QDF document using a text editor such as Notepad++, Emacs, or gedit. Search for /Catalog and note the << ... >> dictionary it is inside. That dictionary holds the current /PageLabels entry, if there is one.

We'll add one entry to /PageLabels for each section that should be numbered differently. The format is start-index << style >>, where the first page of the document has index 0. Whitespace does not matter, but entries must be listed in increasing order of start index. Unless otherwise specified, each new section starts numbering its pages from 1.
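If you generate the entry rather than typing it, a small helper keeps the entries sorted. This is a hypothetical Python sketch, not part of qpdf. page_labels_entry takes a mapping from 0-based start index to the text that goes between << and >>. It writes the entries in ascending order, as the /Nums array requires.

```python
def page_labels_entry(sections):
    """Build a /PageLabels entry from {start_index: dictionary_body} pairs.

    Each body is the text between << and >>, e.g. "/S /r" or "" for
    no label. Index 0 must be present.
    """
    if 0 not in sections:
        raise ValueError("a label for page index 0 is required")
    lines = ["/PageLabels << /Nums ["]
    for start in sorted(sections):  # /Nums keys must be in ascending order
        body = sections[start].strip()
        lines.append(f"  {start} << {body} >>" if body else f"  {start} << >>")
    lines.append("] >>")
    return "\n".join(lines)


print(page_labels_entry({6: "/S /D", 0: "/S /r"}))
```

The printed text can be pasted into the catalog dictionary in place of (or as) the /PageLabels entry.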

Examples

Here is a full example of what PageLabels may look like, with comments added:

/Type /Catalog
/PageLabels <<
  /Nums [
    0           % From the first page of the document,
      <<
        /S /r   % ...use the lowercase roman numeral style.
      >>
    6           % From seventh page onward,
      <<
        /S /D   % ...use ordinary digits (arabic numerals)
      >>
  ]
>>

If the file has no PageLabels, add them after /Type /Catalog. For example, one might change,

1 0 obj
<<
  …
  /Type /Catalog
>>
endobj

into,

1 0 obj
<<
  … 
  /Type /Catalog
  /PageLabels
      << /Nums [
    0 << >>                 % No label for cover
    1 << /S /r >>           % i, ii for index
    3 << /S /D /St 15 >>    % 15, 16, 17, ... for article
    31 << /S /D /P (A-) >>  % A-1, A-2, A-3... for appendix
       ]
  >>
>>
endobj
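When the catalog has no /PageLabels yet, this edit can also be scripted. The Python sketch below is a shortcut based on several assumptions, not a qpdf feature. The hypothetical add_page_labels inserts an entry right after the literal /Type /Catalog. It refuses to run if the text already mentions /PageLabels or contains more than one catalog. You still need to run fix-qdf afterwards, because the inserted text shifts byte offsets.

```python
import re


def add_page_labels(qdf_text: str, entry: str) -> str:
    """Insert `entry` on the line after '/Type /Catalog' in QDF text.

    Assumes exactly one '/Type /Catalog' and no existing /PageLabels.
    """
    if "/PageLabels" in qdf_text:
        raise ValueError("file already has /PageLabels; edit it by hand")
    pattern = re.compile(r"/Type\s*/Catalog\b")
    found = len(pattern.findall(qdf_text))
    if found != 1:
        raise ValueError(f"expected one catalog, found {found}")
    # A function replacement avoids backslash escapes in `entry`.
    return pattern.sub(lambda m: m.group(0) + "\n  " + entry, qdf_text, count=1)


# A made-up catalog fragment, not a complete QDF file.
catalog = "1 0 obj\n<<\n  /Type /Catalog\n>>\nendobj\n"
print(add_page_labels(catalog, "/PageLabels << /Nums [ 0 << >> ] >>"))
```

A QDF file can contain binary stream data. To make every byte round-trip unchanged, read and write it with encoding="latin-1".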

OPTIONAL: STARTING FROM A DIFFERENT NUMBER WITH /St

Each section restarts numbering at 1 unless you tell it otherwise using /St. Notice how, in the example above, the fourth page (index 3) is labeled 15.
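Put as a formula, take a page with 0-based index $i$ in a section that starts at index $s$ and has /St value $n$ (1 when /St is omitted). Its label number is given below, where $s_{\text{next}}$ is the start of the next section, or the page count for the last section. These symbols are introduced only for illustration. The second line checks the article section of the example above (start index 3, /St 15) at index 5, the sixth page:

```latex
\begin{aligned}
\text{number}(i) &= n + (i - s), \qquad s \le i < s_{\text{next}} \\
\text{number}(5) &= 15 + (5 - 3) = 17
\end{aligned}
```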

OPTIONAL: USING A DIFFERENT STYLE WITH /S

The /S key takes a name that selects the numbering style:

  • /D digits (1, 2, 3...)
  • /R uppercase Roman (I, II, III...)
  • /r lowercase Roman (i, ii, iii...)
  • /A uppercase letters (A, B, C, ..., Z, AA, BB, CC, ..., ZZ, AAA, ...)
  • /a lowercase letters (a, b, c, ..., z, aa, bb, cc, ..., zz, aaa, ...)

If you omit the /S key, the pages in that section get no number. For example:

0 << >>         % No label for cover
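The following Python sketch shows how each /S style turns a number into text. It follows the PDF specification's description of the styles. Letter styles repeat a single letter: A to Z, then AA to ZZ, and so on. The function name format_number is hypothetical, and your viewer does this rendering for you.

```python
def format_number(n: int, style: str) -> str:
    """Render the numeric part of a page label for a /S style (n >= 1)."""
    if n < 1:
        raise ValueError("page label numbers start at 1")
    if style == "D":
        return str(n)
    if style in ("R", "r"):
        numerals = [
            (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
            (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
            (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
        ]
        parts = []
        for value, symbol in numerals:
            count, n = divmod(n, value)
            parts.append(symbol * count)
        text = "".join(parts)
        return text if style == "R" else text.lower()
    if style in ("A", "a"):
        # One letter, repeated: 1-26 -> A..Z, 27-52 -> AA..ZZ, 53 -> AAA.
        letter = chr(ord("A") + (n - 1) % 26)
        text = letter * ((n - 1) // 26 + 1)
        return text if style == "A" else text.lower()
    raise ValueError(f"unknown style /{style}")


print([format_number(n, "A") for n in (1, 26, 27, 28, 53)])
# ['A', 'Z', 'AA', 'BB', 'AAA']
```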

OPTIONAL: ADDING A PREFIX TO EACH PAGE WITH /P

You can show any string of text before the page number by putting it in parentheses after /P:

  31
  <<
    /S /D
    /P (A-)     % label appendix pages A-1, A-2, A-3
  >>

Specifying a prefix without a style (/S) gives every page in that range the prefix alone, with no number. This is useful, for example, if you'd like a cover page to simply have the label "Cover".

     0 << /P (Cover) >>        % No number, just "Cover"
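To preview labels before editing the file, the following Python sketch combines /P, /St, and the "no /S" case. It only handles /S /D, to stay short. The function name preview_labels and its input layout (a list of start index and entry pairs) are assumptions for illustration, not a qpdf or PDF API.

```python
def preview_labels(nums, page_count):
    """List the label each page would get (only /S /D or no /S supported).

    `nums` is a list of (start_index, entry) pairs; entry is a dict with
    optional keys "S", "P" (prefix) and "St" (first number, default 1).
    """
    labels = []
    ordered = sorted(nums, key=lambda pair: pair[0])
    for k, (start, entry) in enumerate(ordered):
        end = ordered[k + 1][0] if k + 1 < len(ordered) else page_count
        prefix = entry.get("P", "")
        for i in range(start, end):
            if "S" not in entry:
                labels.append(prefix)  # no /S: every page shows only the prefix
            elif entry["S"] == "D":
                labels.append(prefix + str(entry.get("St", 1) + (i - start)))
            else:
                raise ValueError("only /S /D is handled in this sketch")
    return labels


print(preview_labels([(0, {"P": "Cover"}), (1, {"S": "D", "P": "A-"})], 4))
# ['Cover', 'A-1', 'A-2', 'A-3']
```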

Step 3.

Run fix-qdf to repair the byte offsets that your edits disturbed, writing the result to bar.qdf:

fix-qdf foo.qdf > bar.qdf

Step 4.

Open bar.qdf in your PDF viewing program and check that it is numbered correctly.

Step 5.

Convert the QDF file back into a normal PDF, like so:

qpdf bar.qdf bar.pdf

Ta da. You're done. You now have a document with correctly labeled page numbers in bar.pdf.