How to merge multiple PDF files (generated at run time)?
Solution 1:
If you want to merge source documents using iText(Sharp), there are two basic situations:
You really want to merge the documents, acquiring the pages in their original format and transferring as much of their content and their interactive annotations as possible. In this case you should use a solution based on a member of the Pdf*Copy* family of classes.
You actually want to integrate pages from the source documents into a new document, but want the new document to govern the general format and don't care about the interactive features (annotations...) in the original documents (or even want to get rid of them). In this case you should use a solution based on the PdfWriter class.
You can find details in chapter 6 (especially section 6.4) of iText in Action — 2nd Edition. The Java sample code can be accessed here and the C#'ified versions here.
A simple sample using PdfCopy is Concatenate.java / Concatenate.cs. The central piece of code is:
byte[] mergedPdf = null;
using (MemoryStream ms = new MemoryStream())
{
    using (Document document = new Document())
    {
        using (PdfCopy copy = new PdfCopy(document, ms))
        {
            document.Open();
            for (int i = 0; i < pdf.Count; ++i)
            {
                PdfReader reader = new PdfReader(pdf[i]);
                // loop over the pages in that document
                int n = reader.NumberOfPages;
                for (int page = 0; page < n; )
                {
                    copy.AddPage(copy.GetImportedPage(reader, ++page));
                }
            }
        }
    }
    mergedPdf = ms.ToArray();
}
Here pdf can either be defined as a List<byte[]> directly containing the source documents (appropriate for your use case of merging intermediate in-memory documents) or as a List<String> containing the names of source document files (appropriate if you merge documents from disk).
An overview at the end of the referenced chapter summarizes the usage of the classes mentioned:
PdfCopy: Copies pages from one or more existing PDF documents. Major downsides: PdfCopy doesn’t detect redundant content, and it fails when concatenating forms.
PdfCopyFields: Puts the fields of the different forms into one form. Can be used to avoid the problems encountered with form fields when concatenating forms using PdfCopy. Memory use can be an issue.
PdfSmartCopy: Copies pages from one or more existing PDF documents. PdfSmartCopy is able to detect redundant content, but it needs more memory and CPU than PdfCopy.
PdfWriter: Generates PDF documents from scratch. Can import pages from other PDF documents. The major downside is that all interactive features of the imported page (annotations, bookmarks, fields, and so forth) are lost in the process.
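To illustrate the overview, here is a minimal sketch of the same page-copy loop using PdfSmartCopy instead of PdfCopy. It assumes iTextSharp 5.x (where Document and the copy classes can be used in using blocks); the class name SmartMergeSketch and the method name Merge are made up for this example and are not part of iTextSharp.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp
public static class SmartMergeSketch
{
    // Merges in-memory PDF documents; PdfSmartCopy tries to reuse
    // identical content (for example shared fonts or images)
    public static byte[] Merge(IEnumerable<byte[]> sources)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (Document document = new Document())
            using (PdfSmartCopy copy = new PdfSmartCopy(document, ms))
            {
                document.Open();
                foreach (byte[] source in sources)
                {
                    PdfReader reader = new PdfReader(source);
                    // Page numbers are 1-based
                    for (int page = 1; page <= reader.NumberOfPages; page++)
                    {
                        copy.AddPage(copy.GetImportedPage(reader, page));
                    }
                    // Release the reader once its pages have been copied
                    copy.FreeReader(reader);
                    reader.Close();
                }
            }
            // Read the bytes only after the copy has been closed
            return ms.ToArray();
        }
    }
}
```

Whether the extra memory and CPU are worth it depends on how much content the source documents share, so it is probably best to measure with your own files.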
Solution 2:
I used iTextSharp with C# to combine PDF files. This is the code I used.
// Requires: using System.IO; using iTextSharp.text; using iTextSharp.text.pdf;
string[] lstFiles = new string[3];
lstFiles[0] = @"C:/pdf/1.pdf";
lstFiles[1] = @"C:/pdf/2.pdf";
lstFiles[2] = @"C:/pdf/3.pdf";
string outputPdfPath = @"C:/pdf/new.pdf";

// The using block releases the output file even if merging fails
using (FileStream output = new FileStream(outputPdfPath, FileMode.Create))
{
    Document sourceDocument = new Document();
    PdfCopy pdfCopyProvider = new PdfCopy(sourceDocument, output);
    // Open the output document
    sourceDocument.Open();
    // Loop through the whole files list (f < Length - 1 would skip the last file)
    for (int f = 0; f < lstFiles.Length; f++)
    {
        PdfReader reader = new PdfReader(lstFiles[f]);
        // Ask the reader for the page count instead of scanning the raw file with a regex
        int pages = reader.NumberOfPages;
        // Add the pages of the current file (page numbers are 1-based)
        for (int i = 1; i <= pages; i++)
        {
            PdfImportedPage importedPage = pdfCopyProvider.GetImportedPage(reader, i);
            pdfCopyProvider.AddPage(importedPage);
        }
        reader.Close();
    }
    // At the end, closing the document saves the output file
    sourceDocument.Close();
}
Solution 3:
Here is some code I pulled out of an old project I had. It was a web application but I was using iTextSharp to merge pdf files then print them.
public static class PdfMerger
{
    /// <summary>
    /// Merge pdf files.
    /// </summary>
    /// <param name="sourceFiles">PDF files being merged.</param>
    /// <returns>The merged PDF as a byte array.</returns>
    public static byte[] MergeFiles(List<Stream> sourceFiles)
    {
        Document document = new Document();
        MemoryStream output = new MemoryStream();
        try
        {
            // Initialize pdf writer
            PdfWriter writer = PdfWriter.GetInstance(document, output);
            writer.PageEvent = new PdfPageEvents();
            // Open document to write
            document.Open();
            PdfContentByte content = writer.DirectContent;
            // Iterate through all pdf documents
            for (int fileCounter = 0; fileCounter < sourceFiles.Count; fileCounter++)
            {
                // Create pdf reader
                PdfReader reader = new PdfReader(sourceFiles[fileCounter]);
                int numberOfPages = reader.NumberOfPages;
                // Iterate through all pages
                for (int currentPageIndex = 1; currentPageIndex <= numberOfPages; currentPageIndex++)
                {
                    // Determine page size for the current page
                    Rectangle pageSize = reader.GetPageSizeWithRotation(currentPageIndex);
                    document.SetPageSize(pageSize);
                    // Create page
                    document.NewPage();
                    PdfImportedPage importedPage =
                        writer.GetImportedPage(reader, currentPageIndex);
                    // Determine page orientation; the imported page is not rotated
                    // automatically, so each rotation needs its own matrix
                    int pageOrientation = reader.GetPageRotation(currentPageIndex);
                    if (pageOrientation == 90)
                    {
                        content.AddTemplate(importedPage, 0, -1f, 1f, 0, 0, pageSize.Height);
                    }
                    else if (pageOrientation == 180)
                    {
                        content.AddTemplate(importedPage, -1f, 0, 0, -1f,
                            pageSize.Width, pageSize.Height);
                    }
                    else if (pageOrientation == 270)
                    {
                        content.AddTemplate(importedPage, 0, 1f, -1f, 0, pageSize.Width, 0);
                    }
                    else
                    {
                        content.AddTemplate(importedPage, 1f, 0, 0, 1f, 0, 0);
                    }
                }
            }
        }
        catch (Exception exception)
        {
            throw new Exception("An unexpected exception occurred" +
                " during the PDF merging process.", exception);
        }
        finally
        {
            document.Close();
        }
        // ToArray returns only the bytes written; GetBuffer would also
        // return the unused tail of the internal buffer
        return output.ToArray();
    }
}
/// <summary>
/// Implements custom page events.
/// </summary>
internal class PdfPageEvents : IPdfPageEvent
{
    #region members
    private BaseFont _baseFont = null;
    private PdfContentByte _content;
    #endregion

    #region IPdfPageEvent Members
    public void OnOpenDocument(PdfWriter writer, Document document)
    {
        _baseFont = BaseFont.CreateFont(BaseFont.HELVETICA,
            BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
        _content = writer.DirectContent;
    }
    public void OnStartPage(PdfWriter writer, Document document)
    { }
    public void OnEndPage(PdfWriter writer, Document document)
    { }
    public void OnCloseDocument(PdfWriter writer, Document document)
    { }
    public void OnParagraph(PdfWriter writer,
        Document document, float paragraphPosition)
    { }
    public void OnParagraphEnd(PdfWriter writer,
        Document document, float paragraphPosition)
    { }
    public void OnChapter(PdfWriter writer, Document document,
        float paragraphPosition, Paragraph title)
    { }
    public void OnChapterEnd(PdfWriter writer,
        Document document, float paragraphPosition)
    { }
    public void OnSection(PdfWriter writer, Document document,
        float paragraphPosition, int depth, Paragraph title)
    { }
    public void OnSectionEnd(PdfWriter writer,
        Document document, float paragraphPosition)
    { }
    public void OnGenericTag(PdfWriter writer, Document document,
        Rectangle rect, string text)
    { }
    #endregion

    private float GetCenterTextPosition(string text, PdfWriter writer)
    {
        return writer.PageSize.Width / 2 - _baseFont.GetWidthPoint(text, 8) / 2;
    }
}
I didn't write this, but I made some modifications. I can't remember where I found it. After merging the PDFs I would call this method to insert JavaScript that opens the print dialog when the PDF is opened. If you change bSilent to true, it should print silently to the user's default printer.
public Stream addPrintJStoPDF(Stream thePDF)
{
    // Open the stream with iTextSharp
    var reader = new PdfReader(thePDF);
    // Write the stamped copy into a new, growable in-memory stream
    // (the earlier version dereferenced a null PRStream here)
    var outPutStream = new MemoryStream();
    var stamper = new PdfStamper(reader, outPutStream);
    var jsText = "var res = app.setTimeOut('this.print({bUI: true, bSilent: false, bShrinkToFit: false});', 200);";
    // Add the javascript to the PDF
    stamper.JavaScript = jsText;
    stamper.FormFlattening = true;
    // Keep the output stream open after the stamper is closed
    stamper.Writer.CloseStream = false;
    stamper.Close();
    reader.Close();
    // Set the stream to the beginning
    outPutStream.Position = 0;
    return outPutStream;
}
I'm not sure how well the above code is written, since I pulled it from somewhere else and haven't worked in depth with iTextSharp, but I do know that it worked for merging PDFs that I was generating at runtime.
Solution 4:
Tested with iTextSharp-LGPL 4.1.6:
public static byte[] ConcatenatePdfs(IEnumerable<byte[]> documents)
{
    using (var ms = new MemoryStream())
    {
        var outputDocument = new Document();
        var writer = new PdfCopy(outputDocument, ms);
        outputDocument.Open();
        foreach (var doc in documents)
        {
            var reader = new PdfReader(doc);
            for (var i = 1; i <= reader.NumberOfPages; i++)
            {
                writer.AddPage(writer.GetImportedPage(reader, i));
            }
            writer.FreeReader(reader);
            reader.Close();
        }
        writer.Close();
        outputDocument.Close();
        // ToArray returns exactly the bytes written; GetBuffer would also
        // return the unused tail of the internal buffer
        return ms.ToArray();
    }
}
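A side note on getting the merged bytes out of a MemoryStream: GetBuffer() hands back the stream's internal buffer, which is usually larger than the data written, while ToArray() returns a copy trimmed to the written length. Here is a minimal sketch of the difference that needs no iTextSharp at all (the class name BufferVersusArray is only for illustration):

```csharp
using System;
using System.IO;

class BufferVersusArray
{
    static void Main()
    {
        using (var ms = new MemoryStream())
        {
            // Write three bytes into an initially empty stream
            ms.Write(new byte[] { 1, 2, 3 }, 0, 3);

            byte[] exact = ms.ToArray();   // copy of the written bytes only
            byte[] raw = ms.GetBuffer();   // internal buffer, may have unused capacity

            Console.WriteLine(exact.Length);               // prints 3
            Console.WriteLine(raw.Length >= exact.Length); // prints True
        }
    }
}
```

If the padded buffer is returned as the PDF, the file ends with unused bytes after the real content, which stricter PDF consumers may not accept, so ToArray() looks like the safer choice for the merged result.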