How to convert a .pdf file into a folder of images?

Solution 1:

OK well, I did some more research and although tohuwawohu's method does work, I found it easier to use a program called pdftoppm to achieve what I wanted done. Since I am pretty much a layperson when it comes to using command line apps, I will do my best to explain how I got this to work for me.

  1. Navigate to the folder containing the .pdf you wish to edit and open a terminal there. I did this by using the sample command:

    cd ~/Documents/PDF
    
  2. Let's say the file I want to edit is called Sample.pdf. What I want to do is use pdftoppm to create an image file of each page of the .pdf. Several output formats are available (see the pdftoppm man page), but I prefer to use .png. The basic command looks like this:

    pdftoppm -FORMAT FILENAME.pdf PREFIX
    

    or in the example above:

    pdftoppm -png Sample.pdf Sample
    

    This command creates an image file of each page in the same folder as the original .pdf file, with names like Sample-01.png, Sample-02.png and so on (the number of leading zeros depends on how many pages the PDF has). I have tried it with the -png and -jpeg options successfully; -jpg is apparently not supported.

  3. Then I just use Archive Manager by selecting all the newly-created image files, right-clicking, and choosing "Compress" from the context menu. I then choose the archive format I prefer (in this case .cbz or Comic Book Zip) and create the new archive.

  4. Now I have a shiny new .cbz file called Sample.cbz which I can then view with my Comix reader!
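
For anyone who would rather script steps 2 and 3 than use Archive Manager, here is a minimal sketch. It assumes the zip command-line tool is installed alongside pdftoppm; the function name pdf_to_cbz is just something I made up for illustration, not part of either tool.

```shell
# pdf_to_cbz: hypothetical helper name, not part of poppler or zip.
# Usage: pdf_to_cbz Sample.pdf   (run it in the folder containing the PDF)
pdf_to_cbz() {
    pdf=$1
    prefix=${pdf%.pdf}                 # Sample.pdf -> Sample

    # Render every page as PREFIX-<page number>.png in the current folder.
    pdftoppm -png "$pdf" "$prefix" || return 1

    # A .cbz file is an ordinary zip archive with a different extension.
    # The shell expands the glob in sorted order, so zero-padded page
    # numbers keep the pages in sequence.
    zip "$prefix.cbz" "$prefix"-*.png
}
```

Calling pdf_to_cbz Sample.pdf should leave Sample.cbz next to the PDF. The .png files are left in place, so delete them afterwards if you don't need them.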

Hopefully what I have posted above makes enough sense that someone else can learn from it. If I need to change it in any way please let me know.

Solution 2:

I'm not very familiar with *.cbr / *.cbz, but it seems you'll have to combine two steps:

  1. Convert PDF to Images
  2. Compress them into a ZIP / RAR archive.

Regarding step 1, you could use ImageMagick's convert command. You can feed convert a multi-page PDF, and it will write each page as a separate image file. I've tested it with a text scanned at 400 dpi, and the following command produced nice single JPEGs:

$ convert -verbose -colorspace RGB -interlace none -density 400 -quality 100 yourPdfFile.pdf %03d.jpeg

(credits regarding the -quality option: this forum entry)

As a result, you get 000.jpeg, 001.jpeg and so on. Just zip them into a .cbz file, and you're done.

You could even combine both steps by "concatenating" them:

$ convert -verbose -colorspace RGB -interlace none -density 400 -quality 100 yourPdfFile.pdf %03d.jpg && zip -vm comic.cbz *.jpg

(make sure that there aren't any other JPEGs in your current working directory: the -m option makes zip move every file matching *.jpg into the .cbz archive and delete the originals)
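
One way to sidestep the stray-JPEG problem is to let convert write its pages into an empty scratch directory and zip only that directory's contents. The following is a rough sketch under that assumption; pdf_to_cbz_isolated is a hypothetical name, and the convert options are trimmed to the two that matter for quality.

```shell
# pdf_to_cbz_isolated: hypothetical helper name.
# Usage: pdf_to_cbz_isolated yourPdfFile.pdf comic.cbz
pdf_to_cbz_isolated() {
    pdf=$1
    out=$2
    workdir=$(mktemp -d) || return 1

    # Pages go into the empty scratch directory, so no unrelated JPEGs
    # can match the glob below.
    convert -density 400 -quality 100 "$pdf" "$workdir/%03d.jpg" || return 1

    # -j stores only the file names (no directory path);
    # -m deletes each file once it has been archived.
    zip -jm "$out" "$workdir"/*.jpg || return 1

    rmdir "$workdir"
}
```

Since zip -m empties the scratch directory, the final rmdir only succeeds if nothing unexpected was left behind, which doubles as a small sanity check.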