How to convert PDF to image?
I need to convert PDF pages to images. My file has a background image with some text on top, and when I save it as an image only the background image gets saved.
Is there any software that can convert the complete page to an image?
Solution 1:
You can use pdftoppm from the poppler-utils package to convert a PDF to PNG:
pdftoppm input.pdf outputname -png
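This writes one file per page, named like outputname-01.png, where 01 is the page number. In current poppler versions the number is zero-padded to the width of the document's last page number, so a PDF with fewer than 10 pages yields outputname-1.png.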
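To predict the file names before running the conversion, the naming rule can be sketched as a small shell function. This is only an illustration: pdftoppm_name is a hypothetical helper, not part of poppler, and the padding rule (width of the total page count) is an assumption based on current poppler behaviour.

```shell
# Hypothetical helper: print the PNG name pdftoppm is expected to use.
# Usage: pdftoppm_name PREFIX PAGE TOTAL_PAGES
# Assumes the page number is zero-padded to the number of digits in TOTAL_PAGES.
# PAGE must be given without leading zeros (printf would read 08 as octal).
pdftoppm_name() (
  prefix=$1 page=$2 total=$3
  width=${#total}
  printf "%s-%0${width}d.png\n" "$prefix" "$page"
)
```

For example, pdftoppm_name outputname 1 30 prints the name expected for the first page of a 30-page document.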
Converting a single page or a range of pages of the PDF
pdftoppm input.pdf outputname -png -f {page} -singlefile
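Change {page} to the page number. Pages are numbered from 1, so -f 1 is the first page. The -singlefile option writes only that page and leaves the page number out of the file name (outputname.png).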
To work on a range of pages, also pass -l (last page) and drop -singlefile: -f 1 -l 30 converts pages 1 through 30.
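The single-page and range cases can be wrapped in one function. This is a minimal sketch, not a definitive script: pdf_pages_to_png and the DRY_RUN variable are hypothetical names introduced here, and the options are placed before the file names as in the pdftoppm synopsis.

```shell
# Convert pages FIRST..LAST of a PDF to PNG with pdftoppm.
# Usage: pdf_pages_to_png PDF PREFIX FIRST [LAST]
# Set DRY_RUN=1 to print the command instead of running it.
pdf_pages_to_png() (
  pdf=$1 prefix=$2 first=$3 last=${4:-$3}
  if [ "$first" -lt 1 ] || [ "$last" -lt "$first" ]; then
    echo "invalid page range: $first-$last" >&2
    exit 2
  fi
  set -- pdftoppm -png -f "$first" -l "$last"
  # -singlefile drops the page number from the name, so add it for one page only.
  if [ "$first" -eq "$last" ]; then
    set -- "$@" -singlefile
  fi
  set -- "$@" "$pdf" "$prefix"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
)
```

Running it with DRY_RUN=1 first is a cheap way to check the generated command before any files are written.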
Specifying the converted image's resolution
The default resolution for this command is 150 DPI. Increasing it will result in both a larger file size and more detail.
To increase the resolution of the converted image, add the options -rx {resolution} and -ry {resolution} (or -r {resolution} to set both at once). For example:
pdftoppm input.pdf outputname -png -rx 300 -ry 300
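To get a feel for what a given DPI means in pixels, the arithmetic can be sketched as below. A PDF point is 1/72 inch, so pixels = points × DPI / 72. render_size is a hypothetical helper, and because it uses integer division the result is approximate (the renderer may round differently).

```shell
# Hypothetical helper: estimate the rendered image size in pixels.
# Usage: render_size WIDTH_PT HEIGHT_PT DPI
# One PDF point is 1/72 inch, so pixels = points * dpi / 72 (truncated here).
render_size() (
  width_pt=$1 height_pt=$2 dpi=$3
  echo "$(( width_pt * dpi / 72 ))x$(( height_pt * dpi / 72 ))"
)
```

A US Letter page is 612 by 792 points, so render_size 612 792 150 and render_size 612 792 300 show how doubling the DPI quadruples the pixel count.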
Solution 2:
Install imagemagick, then open a terminal in the directory where the PDF is located.
For the full document:
convert -density 150 input.pdf -quality 90 output.png
For a single page:
convert -density 150 input.pdf[666] -quality 90 output.png
Where:
PNG, JPG or (virtually) any other image format can be chosen.
-density xxx sets the DPI to xxx (150 and 300 are common).
-quality xxx sets the compression level for the PNG, JPG and MIFF file formats. For JPG, 100 means the least compression and the best quality; PNG output is lossless at any setting.
[666] converts only the 667th page (numbering is zero-based, so [0] is the 1st page).
All other options (such as trimming, grayscale, etc.) can be viewed on the ImageMagick website.
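Because the page index is zero-based, it is easy to be off by one. A small sketch that builds the input argument from a 1-based page number follows; im_page_spec is a hypothetical helper name, not an ImageMagick feature.

```shell
# Hypothetical helper: build the ImageMagick input spec for a 1-based page.
# Usage: im_page_spec PDF PAGE
# ImageMagick counts pages from 0, so page N becomes index N-1.
im_page_spec() (
  pdf=$1 page=$2
  echo "${pdf}[$(( page - 1 ))]"
)

# Usage (needs ImageMagick installed):
#   convert -density 150 "$(im_page_spec input.pdf 1)" -quality 90 page1.png
```

Quoting the argument, as in the usage line, also keeps shells such as zsh from treating the square brackets as a glob pattern.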
Solution 3:
IIRC, GIMP can open PDFs and import their pages as images. So if you want to edit the images right away, GIMP is your friend.
Solution 4:
The currently accepted answer does the job but results in an output which is larger in size and suffers from quality loss.
The method in the answer given here results in an output which is comparable in size to the input and doesn't suffer from quality loss.
TL;DR: use pdfimages:
pdfimages -j input.pdf output
Quoting the linked answer:
It's not clear what you mean by "quality loss". That could mean a lot of different things. Could you post some samples to illustrate? Perhaps cut the same section out of the poor quality and good quality versions (as a PNG to avoid further quality loss).
Perhaps you need to use -density to do the conversion at a higher dpi:
convert -density 300 file.pdf page_%04d.jpg
(You can prepend -units PixelsPerInch or -units PixelsPerCentimeter if necessary. My copy defaults to ppi.)
Update: As you pointed out, gscan2pdf (the way you're using it) is just a wrapper for pdfimages (from poppler). pdfimages does not do the same thing that convert does when given a PDF as input.
convert takes the PDF, renders it at some resolution, and uses the resulting bitmap as the source image.
pdfimages looks through the PDF for embedded bitmap images and exports each one to a file. It simply ignores any text or vector drawing commands in the PDF.
As a result, if what you have is a PDF that's just a wrapper around a series of bitmaps, pdfimages will do a much better job of extracting them, because it gets you the raw data at its original size. You probably also want to use the -j option to pdfimages, because a PDF can contain raw JPEG data. By default, pdfimages converts everything to PNM format, and converting JPEG > PPM > JPEG is a lossy process.
So, try
pdfimages -j file.pdf page
You may or may not need to follow that with a convert to .jpg step (depending on what bitmap format the PDF was using).
I tried this command on a PDF that I had made myself from a sequence of JPEG images. The extracted JPEGs were byte-for-byte identical to the source images. You can't get higher quality than that.