How to convert a PDF file into a text file?

Is there an easy way to extract plain text from a PDF file?

On *nix systems I used to have a command ps2ascii that would do the job, but it seems that this command is not installed by default on my Mac.

What would be the easiest way to extract text from a PDF file or, alternatively, how can I get ps2ascii on my system?


Adobe Reader has a "Save as Text…" option under the File menu. That is probably the easiest way.


ps2ascii is part of Ghostscript, which can be installed on Mac OS X (and it might already be installed on your machine by default).
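
Once Ghostscript is installed, ps2ascii can also be driven from a script. The following is only a minimal sketch, assuming ps2ascii is on your PATH and accepts an input file followed by an output file; the helper names ps2ascii_command and pdf_to_text are made up for this example.

```python
import shutil
import subprocess


def ps2ascii_command(pdf_path, txt_path):
    """Build the argument list for: ps2ascii input.pdf output.txt"""
    return ["ps2ascii", pdf_path, txt_path]


def pdf_to_text(pdf_path, txt_path):
    """Run ps2ascii if it is available; return True if it exited cleanly."""
    if shutil.which("ps2ascii") is None:
        # Ghostscript is not installed, or its tools are not on PATH.
        return False
    result = subprocess.run(ps2ascii_command(pdf_path, txt_path))
    return result.returncode == 0
```

Calling pdf_to_text("myPDF.pdf", "myPDF.txt") returns False when ps2ascii cannot be found, so it doubles as a quick check of whether Ghostscript is present on the system.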


If you don't mind using a GUI, you can select and copy the text from a PDF opened with Preview.app.


The following Python script writes the text of a PDF document to a .txt file. (Note: the text is not guaranteed to come out in a 'logical', human-readable order, because of the way data is stored in the PDF format.)

The script creates a text file for each PDF file passed to it as a command-line argument (e.g. pdf2txt.py myPDF.pdf). You can also use it in Automator's "Run Shell Script" action by setting the shell type to python and Pass input to "As arguments".

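#!/usr/bin/python
# coding: utf-8

import os
import sys

from Quartz import PDFDocument
from CoreFoundation import (NSURL, NSString)

NSUTF8StringEncoding = 4  # Cocoa constant for UTF-8 string encoding


def pdf2txt():
    # Convert every PDF file given on the command line.
    for filename in sys.argv[1:]:
        # Python 2 passes arguments as bytes; Python 3 already passes text.
        inputfile = filename.decode('utf-8') if isinstance(filename, bytes) else filename
        # Write "<name> text.txt" next to the input file.
        shortName = os.path.splitext(inputfile)[0]
        outputfile = shortName + " text.txt"
        pdfURL = NSURL.fileURLWithPath_(inputfile)
        pdfDoc = PDFDocument.alloc().initWithURL_(pdfURL)
        # pdfDoc is None if the file could not be opened as a PDF.
        if pdfDoc:
            # Fall back to an empty string if the PDF contains no text.
            pdfString = NSString.stringWithString_(pdfDoc.string() or "")
            pdfString.writeToFile_atomically_encoding_error_(
                outputfile, True, NSUTF8StringEncoding, None)


if __name__ == "__main__":
    pdf2txt()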
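
To see where the output ends up, here is a small sketch of the naming rule the script uses, pulled out into a helper. The function name output_name and the example paths are made up for illustration; only the splitext-plus-suffix logic comes from the script above.

```python
import os


def output_name(pdf_path):
    """Mirror the script's rule: drop the extension, append ' text.txt'."""
    return os.path.splitext(pdf_path)[0] + " text.txt"


for path in ["myPDF.pdf", "/tmp/report.v2.pdf"]:
    print(path, "->", output_name(path))
```

Only the final extension is stripped, and the directory part is kept, so each text file is written next to its PDF.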