How can I remove all images from a PDF?

Meanwhile the latest Ghostscript releases have a much nicer and easier to use method of removing all images from a PDF. The parameter to add to the command line is -dFILTERIMAGE

 gs -o noimages.pdf -sDEVICE=pdfwrite -dFILTERIMAGE input.pdf

Even better, you can also remove all text or all vector drawing elements from a PDF by specifying -dFILTERTEXT or -dFILTERVECTOR.

Of course, you can also combine any combination of these -dFILTER* parameters you want in order to achieve a required result. (Combining all three will of course result in "empty" pages.)

Here is the screenshot from an example PDF page which contains all 3 types of content mentioned above:

_{Screenshot of original PDF page containing "image", "vector" and "text" elements.}

Running the following 6 commands will create all 6 possible variations of remaining contents:

 gs -o noIMG.pdf   -sDEVICE=pdfwrite -dFILTERIMAGE                input.pdf
 gs -o noTXT.pdf   -sDEVICE=pdfwrite -dFILTERTEXT                 input.pdf
 gs -o noVCT.pdf   -sDEVICE=pdfwrite -dFILTERVECTOR               input.pdf

 gs -o onlyTXT.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERIMAGE input.pdf 
 gs -o onlyIMG.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERTEXT  input.pdf
 gs -o onlyVCT.pdf -sDEVICE=pdfwrite -dFILTERIMAGE  -dFILTERTEXT  input.pdf

The following image illustrates the results:

_{Top row, from left: all "text" removed; all "images" removed; all "vectors" removed. Bottom row, from left: only "text" kept; only "images" kept; only "vectors" kept.}

I'm putting up the answer myself, but the actual code is by courtesy of Chris Liddell, Ghostscript developer.

I used his original PostScript code and stripped off its other functions. Only the function which removes raster images remains. Other graphical page objects -- text sections, patterns and vector objects -- should remain untouched.

Copy the following code and save it as remove-images.ps:

%!PS

% Run as:
%
%      gs ..... -dFILTERIMAGE -dDELAYBIND -dWRITESYSTEMDICT \
%                 ..... remove-images.ps <your-input-file>
%
% derived from Chris Liddell's original 'filter-obs.ps' script
% Adapted by @pdfkungfoo (on Twitter)

currentglobal true setglobal

32 dict begin

/debugprint     { systemdict /DUMPDEBUG .knownget { {print flush} if} 
                {pop} ifelse } bind def

/pushnulldevice {
  systemdict exch .knownget not
  {
    //false
  } if

  {
    gsave
    matrix currentmatrix
    nulldevice
    setmatrix
  } if
} bind def

/popnulldevice {
  systemdict exch .knownget not
  {
    //false
  } if
  {
    % this is hacky - some operators clear the current point
    % i.e.
    { currentpoint } stopped
    { grestore }
    { grestore moveto} ifelse
  } if
} bind def

/sgd {systemdict exch get def} bind def

systemdict begin

/_image /image sgd
/_imagemask /imagemask sgd
/_colorimage /colorimage sgd

/image {
   (\nIMAGE\n) //debugprint exec /FILTERIMAGE //pushnulldevice exec
  _image
  /FILTERIMAGE //popnulldevice exec
} bind def

/imagemask
{
  (\nIMAGEMASK\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _imagemask
  /FILTERIMAGE //popnulldevice exec
} bind def

/colorimage
{
  (\nCOLORIMAGE\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _colorimage
  /FILTERIMAGE //popnulldevice exec
} bind def

end
end

.bindnow

setglobal

Now run this command:

gs -o no-more-images-in-sample.pdf \
   -sDEVICE=pdfwrite               \
   -dFILTERIMAGE                   \
   -dDELAYBIND                     \
   -dWRITESYSTEMDICT               \
    remove-images.ps               \
    sample.pdf

I tested the code with the official PDF specification, and it worked. The following two screenshots show page 750 of input and output PDFs:

If you wonder why something that looks like an image is still on the output page: it is not really a raster image, but a 'pattern' in the original file, and therefor it is not removed.

How can I remove all images from a PDF?

Related

Recent Posts