How to convert all pdf files to text (within a folder) with one command?
The following will convert all PDF files in the current directory:
for file in *.pdf; do pdftotext "$file" "$file.txt"; done
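Note that the loop above names its output file.pdf.txt, because ".txt" is appended to the full PDF name. If you would rather get file.txt, something along these lines should work. This is only a sketch for a POSIX-style shell; convert_pdfs_here is a made-up helper name, not part of pdftotext.

```shell
# Hypothetical helper: convert every PDF in a folder (default: the current one).
convert_pdfs_here() {
    (
        # The subshell keeps the cd from affecting the caller.
        cd "${1:-.}" || exit 1
        for file in *.pdf; do
            # With no PDFs the pattern stays literal; skip it.
            [ -e "$file" ] || continue
            # ${file%.pdf} strips the trailing ".pdf" before ".txt" is added.
            pdftotext "$file" "${file%.pdf}.txt"
        done
    )
}
```

Call it as convert_pdfs_here to work on the current directory, or convert_pdfs_here /some/folder to work on another one. It does not look into subfolders.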
ls *.pdf | xargs -n1 pdftotext
xargs is often a quick solution for running the same command multiple times with just a small change each time. The -n1 option makes sure that only one pdf file is passed to pdftotext at a time.
Edit: If you're worried about spaces in filenames and such, you can use this alternative:
find . -name '*.pdf' -print0 | xargs -0 -n1 pdftotext
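To see why the two pipelines differ, here is a small experiment that uses echo in place of pdftotext, so nothing gets converted. It is only an illustration; the folder and the file name "annual report.pdf" are made up for the demo.

```shell
# Throwaway folder with one PDF whose name contains a space.
demo_dir=$(mktemp -d)
touch "$demo_dir/annual report.pdf"

# ls output is split on whitespace, so xargs runs echo once per word.
split_count=$(cd "$demo_dir" && ls *.pdf | xargs -n1 echo | wc -l)

# NUL-delimited names arrive intact, so xargs runs echo once per file.
safe_count=$(cd "$demo_dir" && find . -name '*.pdf' -print0 | xargs -0 -n1 echo | wc -l)

echo "ls | xargs: $split_count, find -print0 | xargs -0: $safe_count"
rm -r "$demo_dir"
```

The first count comes out as 2 (the name is split into "annual" and "report.pdf"), the second as 1. With pdftotext in place of echo, the first form would be handed two file names that do not exist.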
Write a bash script:
for f in *.pdf; do
    pdftotext "$f"
done
or type it in a one-line command as follows:
for f in *.pdf; do pdftotext "$f"; done
I hope this helps. I do not have a large group of .pdfs to test this on, but I use this strategy to convert my .flac files to .ogg files.
First, thanks to Sam, to Ryan Thompson, and to all the other answerers: my answer is nothing but a variation on theirs, showing how their solutions can be added to Thunar's custom actions.
Like any terminal command, a command that converts all PDF files in a folder to text can be put in the list of custom actions in the Thunar file manager.
The command I prefer to use there is the one from Ryan Thompson:
find . -name '*.pdf' -print0 | xargs -0 -n1 pdftotext
It has a nasty catch, though; see below.
It is a command to be used with care: it converts every PDF in the folder where it is run, including all of its subfolders. If it is run by mistake in the home folder, it will have unwanted effects: all your PDFs will be converted to text!
(I tested it like this: I created a folder called "test" on the desktop, containing a PDF file and a chain of folders within folders (/Desktop/test/a/b/c/e/f/g/h/i), each containing the same PDF. Running the command in /Desktop/test converted all the PDFs, down to the one in the "i" folder.)
(I would welcome comments on how to adjust this command so as to avoid that risk.)
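One possible adjustment, offered as a sketch rather than a tested Thunar recipe: find accepts a -maxdepth option, and -maxdepth 1 keeps it from descending into subfolders, so only the PDFs directly inside the folder are converted. The function name convert_pdfs_top_level is made up here just to wrap the command.

```shell
# Hypothetical helper: convert only the PDFs directly inside a folder
# (default: the current one), leaving subfolders alone.
convert_pdfs_top_level() {
    # -maxdepth 1 limits find to the starting folder itself.
    # -print0 and -0 keep names with spaces in one piece.
    find "${1:-.}" -maxdepth 1 -name '*.pdf' -print0 | xargs -0 -n1 pdftotext
}
```

The same pipeline with "." as the folder should also work as a one-line command. Run in the home folder by mistake, it would still convert the PDFs sitting directly in it, but nothing deeper.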
Replacing it with Sam's command (for file in *.pdf; do pdftotext "$file" "$file.txt"; done) avoids the problem, because the loop only touches the PDFs in the current folder.
But in certain cases one might want exactly what Ryan's solution does!