How to merge multiple pdf files (generated in run time)?

Solution 1:

If you want to merge source documents using iText(Sharp), there are two basic situations:

  1. You really want to merge the documents, acquiring the pages in their original format, transfering as much of their content and their interactive annotations as possible. In this case you should use a solution based on a member of the Pdf*Copy* family of classes.

  2. You actually want to integrate pages from the source documents into a new document but want the new document to govern the general format and don't care for the interactive features (annotations...) in the original documents (or even want to get rid of them). In this case you should use a solution based on the PdfWriter class.

You can find details in chapter 6 (especially section 6.4) of iText in Action — 2nd Edition. The Java sample code can be accessed here and the C#'ified versions here.

A simple sample using PdfCopy is Concatenate.java / Concatenate.cs. The central piece of code is:

byte[] mergedPdf = null;
using (MemoryStream ms = new MemoryStream())
{
    using (Document document = new Document())
    {
        using (PdfCopy copy = new PdfCopy(document, ms))
        {
            document.Open();

            for (int i = 0; i < pdf.Count; ++i)
            {
                PdfReader reader = new PdfReader(pdf[i]);
                // loop over the pages in that document
                int n = reader.NumberOfPages;
                for (int page = 0; page < n; )
                {
                    copy.AddPage(copy.GetImportedPage(reader, ++page));
                }
            }
        }
    }
    mergedPdf = ms.ToArray();
}

Here pdf can either be defined as a List<byte[]> immediately containing the source documents (appropriate for your use case of merging intermediate in-memory documents) or as a List<String> containing the names of source document files (appropriate if you merge documents from disk).

An overview at the end of the referenced chapter summarizes the usage of the classes mentioned:

  • PdfCopy: Copies pages from one or more existing PDF documents. Major downsides: PdfCopy doesn’t detect redundant content, and it fails when concatenating forms.

  • PdfCopyFields: Puts the fields of the different forms into one form. Can be used to avoid the problems encountered with form fields when concatenating forms using PdfCopy. Memory use can be an issue.

  • PdfSmartCopy: Copies pages from one or more existing PDF documents. PdfSmartCopy is able to detect redundant content, but it needs more memory and CPU than PdfCopy.

  • PdfWriter: Generates PDF documents from scratch. Can import pages from other PDF documents. The major downside is that all interactive features of the imported page (annotations, bookmarks, fields, and so forth) are lost in the process.

Solution 2:

I used iTextsharp with c# to combine pdf files. This is the code I used.

string[] lstFiles=new string[3];
    lstFiles[0]=@"C:/pdf/1.pdf";
    lstFiles[1]=@"C:/pdf/2.pdf";
    lstFiles[2]=@"C:/pdf/3.pdf";

    PdfReader reader = null;
    Document sourceDocument = null;
    PdfCopy pdfCopyProvider = null;
    PdfImportedPage importedPage;
    string outputPdfPath=@"C:/pdf/new.pdf";


    sourceDocument = new Document();
    pdfCopyProvider = new PdfCopy(sourceDocument, new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));

    //Open the output file
    sourceDocument.Open();

    try
    {
        //Loop through the files list
        for (int f = 0; f < lstFiles.Length-1; f++)
        {
            int pages =get_pageCcount(lstFiles[f]);

            reader = new PdfReader(lstFiles[f]);
            //Add pages of current file
            for (int i = 1; i <= pages; i++)
            {
                importedPage = pdfCopyProvider.GetImportedPage(reader, i);
                pdfCopyProvider.AddPage(importedPage);
            }

            reader.Close();
         }
        //At the end save the output file
        sourceDocument.Close();
    }
    catch (Exception ex)
    {
        throw ex;
    }


private int get_pageCcount(string file)
{
    using (StreamReader sr = new StreamReader(File.OpenRead(file)))
    {
        Regex regex = new Regex(@"/Type\s*/Page[^s]");
        MatchCollection matches = regex.Matches(sr.ReadToEnd());

        return matches.Count;
    }
}

Solution 3:

Here is some code I pulled out of an old project I had. It was a web application but I was using iTextSharp to merge pdf files then print them.

public static class PdfMerger
    {
        /// <summary>
        /// Merge pdf files.
        /// </summary>
        /// <param name="sourceFiles">PDF files being merged.</param>
        /// <returns></returns>
        public static byte[] MergeFiles(List<Stream> sourceFiles)
        {
            Document document = new Document();
            MemoryStream output = new MemoryStream();

            try
            {
                // Initialize pdf writer
                PdfWriter writer = PdfWriter.GetInstance(document, output);
                writer.PageEvent = new PdfPageEvents();

                // Open document to write
                document.Open();
                PdfContentByte content = writer.DirectContent;

                // Iterate through all pdf documents
                for (int fileCounter = 0; fileCounter < sourceFiles.Count; fileCounter++)
                {
                    // Create pdf reader
                    PdfReader reader = new PdfReader(sourceFiles[fileCounter]);
                    int numberOfPages = reader.NumberOfPages;

                    // Iterate through all pages
                    for (int currentPageIndex = 1; currentPageIndex <=
                                        numberOfPages; currentPageIndex++)
                    {
                        // Determine page size for the current page
                        document.SetPageSize(
                            reader.GetPageSizeWithRotation(currentPageIndex));

                        // Create page
                        document.NewPage();
                        PdfImportedPage importedPage =
                            writer.GetImportedPage(reader, currentPageIndex);


                        // Determine page orientation
                        int pageOrientation = reader.GetPageRotation(currentPageIndex);
                        if ((pageOrientation == 90) || (pageOrientation == 270))
                        {
                            content.AddTemplate(importedPage, 0, -1f, 1f, 0, 0,
                                reader.GetPageSizeWithRotation(currentPageIndex).Height);
                        }
                        else
                        {
                            content.AddTemplate(importedPage, 1f, 0, 0, 1f, 0, 0);
                        }
                    }
                }
            }
            catch (Exception exception)
            {
                throw new Exception("There has an unexpected exception" +
                        " occured during the pdf merging process.", exception);
            }
            finally
            {
                document.Close();
            }
            return output.GetBuffer();
        }
    }



    /// <summary>
    /// Implements custom page events.
    /// </summary>
    internal class PdfPageEvents : IPdfPageEvent
    {
        #region members
        private BaseFont _baseFont = null;
        private PdfContentByte _content;
        #endregion

        #region IPdfPageEvent Members
        public void OnOpenDocument(PdfWriter writer, Document document)
        {
            _baseFont = BaseFont.CreateFont(BaseFont.HELVETICA,
                                BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
            _content = writer.DirectContent;
        }

        public void OnStartPage(PdfWriter writer, Document document)
        { }

        public void OnEndPage(PdfWriter writer, Document document)
        { }

        public void OnCloseDocument(PdfWriter writer, Document document)
        { }

        public void OnParagraph(PdfWriter writer,
                    Document document, float paragraphPosition)
        { }

        public void OnParagraphEnd(PdfWriter writer,
                    Document document, float paragraphPosition)
        { }

        public void OnChapter(PdfWriter writer, Document document,
                                float paragraphPosition, Paragraph title)
        { }

        public void OnChapterEnd(PdfWriter writer,
                    Document document, float paragraphPosition)
        { }

        public void OnSection(PdfWriter writer, Document document,
                    float paragraphPosition, int depth, Paragraph title)
        { }

        public void OnSectionEnd(PdfWriter writer,
                    Document document, float paragraphPosition)
        { }

        public void OnGenericTag(PdfWriter writer, Document document,
                                    Rectangle rect, string text)
        { }
        #endregion

        private float GetCenterTextPosition(string text, PdfWriter writer)
        {
            return writer.PageSize.Width / 2 - _baseFont.GetWidthPoint(text, 8) / 2;
        }
    }

I didn't write this, but made some modifications. I can't remember where I found it. After I merged the PDFs I would call this method to insert javascript to open the print dialog when the PDF is opened. If you change bSilent to true then it should print silently to their default printer.

public Stream addPrintJStoPDF(Stream thePDF)
{
    MemoryStream outPutStream = null;
    PRStream finalStream = null;
    PdfDictionary page = null;
    string content = null;

    //Open the stream with iTextSharp
    var reader = new PdfReader(thePDF);

    outPutStream = new MemoryStream(finalStream.GetBytes());
    var stamper = new PdfStamper(reader, (MemoryStream)outPutStream);
    var jsText = "var res = app.setTimeOut('this.print({bUI: true, bSilent: false, bShrinkToFit: false});', 200);";
    //Add the javascript to the PDF
    stamper.JavaScript = jsText;

    stamper.FormFlattening = true;
    stamper.Writer.CloseStream = false;
    stamper.Close();

    //Set the stream to the beginning
    outPutStream.Position = 0;

    return outPutStream;
}

Not sure how well the above code is written since I pulled it from somewhere else and I haven't worked in depth at all with iTextSharp but I do know that it did work at merging PDFs that I was generating at runtime.

Solution 4:

Tested with iTextSharp-LGPL 4.1.6:

    public static byte[] ConcatenatePdfs(IEnumerable<byte[]> documents)
    {
        using (var ms = new MemoryStream())
        {
            var outputDocument = new Document();
            var writer = new PdfCopy(outputDocument, ms);
            outputDocument.Open();

            foreach (var doc in documents)
            {
                var reader = new PdfReader(doc);
                for (var i = 1; i <= reader.NumberOfPages; i++)
                {
                    writer.AddPage(writer.GetImportedPage(reader, i));
                }
                writer.FreeReader(reader);
                reader.Close();
            }

            writer.Close();
            outputDocument.Close();
            var allPagesContent = ms.GetBuffer();
            ms.Flush();

            return allPagesContent;
        }
    }