How to insert an PDPage within another PDPage with pdfbox

I use different tools like processing to create vector plots. These plots are written as single or multi-page pdfs. I would like to include these plots in a single report-like pdf using pdfbox.

My current workflow includes these pdfs as images with the following pseudo code

PDDocument inFile = PDDocument.load(file);
PDPage firstPage = (PDPage) inFile.getDocumentCatalog().getAllPages().get(0);
BufferedImage image = firstPage.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
PDXObjectImage ximage = new PDPixelMap(document, image);

PDPageContentStream contentStream = new PDPageContentStream(document, page);
contentStream.drawXObject(ximage, 0, 0, ximage.getWidth(), ximage.getHeight());
contentStream.close();

While this works it looses the benefits of the vector file formats, espectially file/size vs. printing qualitity.

Is it possible to use pdfbox to include other pdf pages as embedded objects within a page (Not added as a separate page)? Could I e.g. use a PDStream? I would prefer a solution like pdflatex is able to embed pdf figures into a new pdf document.

What other Java libraries can you recommend for that task?


Is it possible to use pdfbox to include other pdf pages as embedded objects within a page

It should be possible. The PDF format allows the use of so called form xobjects to serve as such embedded objects. I don't see an explicit implementation for that, though, but the procedure is similar enough to what PageExtractor or PDFMergerUtility do.

A proof of concept derived from PageExtractor using the current SNAPSHOT of the PDFBox 2.0.0 development version:

PDDocument source = PDDocument.loadNonSeq(SOURCE, null);
List<PDPage> pages = source.getDocumentCatalog().getAllPages();

PDDocument target = new PDDocument();
PDPage page = new PDPage();
PDRectangle cropBox = page.findCropBox();
page.setResources(new PDResources());
target.addPage(page);

PDFormXObject xobject = importAsXObject(target, pages.get(0));
page.getResources().addXObject(xobject, "X");

PDPageContentStream content = new PDPageContentStream(target, page);
AffineTransform transform = new AffineTransform(0, 0.5, -0.5, 0, cropBox.getWidth(), 0);
content.drawXObject(xobject, transform);
transform = new AffineTransform(0.5, 0.5, -0.5, 0.5, 0.5 * cropBox.getWidth(), 0.2 * cropBox.getHeight());
content.drawXObject(xobject, transform);
content.close();

target.save(TARGET);
target.close();
source.close();

This code imports the first page of a source document to a target document as XObject and puts it twice onto a page there with different scaling and rotation transformations, e.g. for this source

Source PDF, page 1

it creates this

Target PDF

The helper method importAsXObject actually doing the import is defined like this:

PDFormXObject importAsXObject(PDDocument target, PDPage page) throws IOException
{
    final PDStream src = page.getContents();
    if (src != null)
    {
        final PDFormXObject xobject = new PDFormXObject(target);

        OutputStream os = xobject.getPDStream().createOutputStream();
        InputStream is = src.createInputStream();
        try
        {
            IOUtils.copy(is, os);
        }
        finally
        {
            IOUtils.closeQuietly(is);
            IOUtils.closeQuietly(os);
        }

        xobject.setResources(page.findResources());
        xobject.setBBox(page.findCropBox());

        return xobject;
    }
    return null;
}

As mentioned above this is only a proof of concept, corner cases have not yet been taken into account.


To update this question:

There is already a helper class in org.apache.pdfbox.multipdf.LayerUtility to do the import.

Example to show superimposing a PDF page onto another PDF: SuperimposePage.

This class is part of the Apache PDFBox Examples and sample transformations as shown by @mkl were added to it.


As mkl appropriately suggested, PDFClown is among the Java libraries which provide explicit support for page embedding (so-called Form XObjects (see PDF Reference 1.7, § 4.9)).

In order to let you get a taste of the way PDFClown works, the following code represents the equivalent of mkl's PDFBox solution (NOTE: as mkl later stated, his code sample was by no means optimised, so this comparison may not correspond to the actual status of PDFBox -- comments are welcome to clarify this):

Document source = new File(SOURCE).getDocument();
Pages sourcePages = source.getPages();

Document target = new File().getDocument();
Page targetPage = new Page(target);
target.getPages().add(targetPage);

XObject xobject = sourcePages.get(0).toXObject(target);

PrimitiveComposer composer = new PrimitiveComposer(targetPage);
Dimension2D targetSize = targetPage.getSize();
Dimension2D sourceSize = xobject.getSize();
composer.showXObject(xobject, new Point2D.Double(targetSize.getWidth() * .5, targetSize.getHeight() * .35), new Dimension(sourceSize.getWidth() * .6, sourceSize.getHeight() * .6), XAlignmentEnum.Center, YAlignmentEnum.Middle, 45);
composer.showXObject(xobject, new Point2D.Double(targetSize.getWidth() * .35, targetSize.getHeight()), new Dimension(sourceSize.getWidth() * .4, sourceSize.getHeight() * .4), XAlignmentEnum.Left, YAlignmentEnum.Top, 90);
composer.flush();

target.getFile().save(TARGET, SerializationModeEnum.Standard);
source.getFile().close();

Comparing this code to PDFBox's equivalent you can notice some relevant differences which show PDFClown's neater style (it would be nice if some PDFBox expert could validate my assertions):

  • Page-to-FormXObject conversion: PDFClown natively supports a dedicated method (Page.toXObject()), so there's no need for additional heavy-lifting such as the helper method importAsXObject();
  • Resource management: PDFClown automatically (and transparently) allocates page resources, so there's no need for explicit calls such as page.getResources().addXObject(xobject, "X");
  • XObject drawing: PDFClown supports both high-level (explicit scale, translation and rotation anchors) and low-level (affine transformations) methods to place your FormXObject into the page, so there's no need to necessarily deal with affine transformations.

The whole point is that PDFClown features a rich architecture made up of multiple abstraction layers: according to your requirements, you can choose the most appropriate coding style (either to delve into PDF's low-level basic structures or to leverage its convenient and elegant high-level model). PDFClown lets you tweak every single byte and solve complex tasks with a ridiculously simple method call, at your will.

DISCLOSURE: I'm the lead developer of PDFClown.