How to convert PDF files to images

I need to convert PDF files to images. If the PDF file is multi-page，I just need one image that contains all of the PDF pages.

Is there an open source solution which is not charged like the Acrobat product?

The thread "converting PDF file to a JPEG image" is suitable for your request.

One solution is to use a third-party library. ImageMagick is a very popular and is freely available too. You can get a .NET wrapper for it here. The original ImageMagick download page is here.

Convert PDF pages to image files using the Solid Framework Convert PDF pages to image files using the Solid Framework (dead link, the deleted document is available on Internet Archive).
Convert PDF to JPG Universal Document Converter
6 Ways to Convert a PDF to a JPG Image

And you also can take a look at the thread "How to open a page from a pdf file in pictureBox in C#".

If you use this process to convert a PDF to tiff, you can use this class to retrieve the bitmap from TIFF.

public class TiffImage
{
    private string myPath;
    private Guid myGuid;
    private FrameDimension myDimension;
    public ArrayList myImages = new ArrayList();
    private int myPageCount;
    private Bitmap myBMP;

    public TiffImage(string path)
    {
        MemoryStream ms;
        Image myImage;

        myPath = path;
        FileStream fs = new FileStream(myPath, FileMode.Open);
        myImage = Image.FromStream(fs);
        myGuid = myImage.FrameDimensionsList[0];
        myDimension = new FrameDimension(myGuid);
        myPageCount = myImage.GetFrameCount(myDimension);
        for (int i = 0; i < myPageCount; i++)
        {
            ms = new MemoryStream();
            myImage.SelectActiveFrame(myDimension, i);
            myImage.Save(ms, ImageFormat.Bmp);
            myBMP = new Bitmap(ms);
            myImages.Add(myBMP);
            ms.Close();
        }
        fs.Close();
    }
}

Use it like so:

private void button1_Click(object sender, EventArgs e)
{
    TiffImage myTiff = new TiffImage("D:\\Some.tif");
    //imageBox is a PictureBox control, and the [] operators pass back
    //the Bitmap stored at that position in the myImages ArrayList in the TiffImage
    this.pictureBox1.Image = (Bitmap)myTiff.myImages[0];
    this.pictureBox2.Image = (Bitmap)myTiff.myImages[1];
    this.pictureBox3.Image = (Bitmap)myTiff.myImages[2];
}

You can use Ghostscript to convert PDF to images.

To use Ghostscript from .NET you can take a look at Ghostscript.NET library (managed wrapper around the Ghostscript library).

To produce image from the PDF by using Ghostscript.NET, take a look at RasterizerSample.

To combine multiple images into the single image, check out this sample: http://www.niteshluharuka.com/2012/08/combine-several-images-to-form-a-single-image-using-c/#

As for 2018 there is still not a simple answer to the question of how to convert a PDF document to an image in C#; many libraries use Ghostscript licensed under AGPL and in most cases an expensive commercial license is required for production use.

A good alternative might be using the popular 'pdftoppm' utility which has a GPL license; it can be used from C# as command line tool executed with System.Diagnostics.Process. Popular tools are well known in the Linux world, but a windows build is also available.

If you don't want to integrate pdftoppm by yourself, you can use my PdfRenderer popular wrapper (supports both classic .NET Framework and .NET Core) - it is not free, but pricing is very affordable.

I used PDFiumSharp and ImageSharp in a .NET Standard 2.1 class library.

/// <summary>
/// Saves a thumbnail (jpg) to the same folder as the PDF file, using dimensions 300x423,
/// which corresponds to the aspect ratio of 'A' paper sizes like A4 (ratio h/w=sqrt(2))
/// </summary>
/// <param name="pdfPath">Source path of the pdf file.</param>
/// <param name="thumbnailPath">Target path of the thumbnail file.</param>
/// <param name="width"></param>
/// <param name="height"></param>
public static void SaveThumbnail(string pdfPath, string thumbnailPath = "", int width = 300, int height = 423)
{
    using var pdfDocument = new PdfDocument(pdfPath);
    var firstPage = pdfDocument.Pages[0];

    using var pageBitmap = new PDFiumBitmap(width, height, true);

    firstPage.Render(pageBitmap);

    var imageJpgPath = string.IsNullOrWhiteSpace(thumbnailPath)
        ? Path.ChangeExtension(pdfPath, "jpg")
        : thumbnailPath;
    var image = Image.Load(pageBitmap.AsBmpStream());

    // Set the background to white, otherwise it's black. https://github.com/SixLabors/ImageSharp/issues/355#issuecomment-333133991
    image.Mutate(x => x.BackgroundColor(Rgba32.White));

    image.Save(imageJpgPath, new JpegEncoder());
}

The PDF engine used in Google Chrome, called PDFium, is open source under the "BSD 3-clause" license. I believe this allows redistribution when used in a commercial product.

There is a .NET wrapper for it called PdfiumViewer (NuGet) which works well to the extent I have tried it. It is under the Apache license which also allows redistribution.

(Note that this is NOT the same 'wrapper' as https://pdfium.patagames.com/ which requires a commercial license*)

(There is one other PDFium .NET wrapper, PDFiumSharp, but I have not evaluated it.)

In my opinion, so far, this may be the best choice of open-source (free as in beer) PDF libraries to do the job which do not put restrictions on the closed-source / commercial nature of the software utilizing them. I don't think anything else in the answers here satisfy that criteria, to the best of my knowledge.

How to convert PDF files to images

Related

Recent Posts