Batch merging PDF files based on file name
I'm trying to find a program or script that can merge PDF files based on their file names. The files are all in one folder (output from PDF24's print-to-PDF) and are named as in the following examples:
File name layout: YYYY-MM-DD HH-MM-SS file name.pdf
Examples:
2021-05-31 11-12-13 Microsoft Outlook - Memo Style.pdf
2021-05-31 11-12-15 Some another filename - string.pdf
2021-05-31 11-12-18 Some another filename - string.pdf
2021-05-31 11-12-25 Some another filename - string.pdf
2021-05-31 11-12-45 Some another filename - string.pdf
2021-05-31 11-13-21 Microsoft Outlook - Memo Style.pdf
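Every name starts with a fixed-width, zero-padded timestamp, so sorting the names as plain strings already puts them in chronological order. A minimal Python sketch that parses the stamp and checks this; the helper name `timestamp_of` is hypothetical:

```python
from datetime import datetime

# PDF24 names start with a fixed-width "YYYY-MM-DD HH-MM-SS" stamp (19 characters).
STAMP_FORMAT = "%Y-%m-%d %H-%M-%S"


def timestamp_of(name: str) -> datetime:
    """Parse the leading timestamp of a PDF24 output file name (hypothetical helper)."""
    return datetime.strptime(name[:19], STAMP_FORMAT)


names = [
    "2021-05-31 11-13-21 Microsoft Outlook - Memo Style.pdf",
    "2021-05-31 11-12-15 Some another filename - string.pdf",
    "2021-05-31 11-12-13 Microsoft Outlook - Memo Style.pdf",
]

# All fields are zero-padded, so string order equals chronological order.
assert sorted(names) == sorted(names, key=timestamp_of)
print(timestamp_of(names[0]))  # 2021-05-31 11:13:21
```

This is why a plain `files.sort()` is enough to get "oldest first" for these names.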
What I want is for the program to look at the file names and take every file from one 'Microsoft Outlook' file (inclusive) up to the next 'Microsoft Outlook' file (exclusive), then merge them.
I'm printing an Outlook mail and (some of) its attachments, and I want to merge them so that each resulting PDF has the mail on page 1 and its attachments on the following pages.
Requirements:
- The attachments must be in the correct order, i.e. ordered by date, oldest first
- The output must be split by mail: each merged PDF contains one mail and its attachments
- I prefer a script that I run manually, since it only has to run about once a week or once every few days
- Output should be automatically saved to a folder I choose (a subfolder of the source folder)
- I don't really care about output file names. They can be 001, 002, ... for example
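The grouping rule described above (each "Microsoft Outlook" file starts a new group, and every following file up to the next one belongs to it) can be sketched in plain Python without touching any PDFs. This is a minimal sketch; the function name `group_by_mail` and the choice to drop attachments that appear before the first mail are assumptions, not part of the question:

```python
from typing import List

MARKER = "Microsoft Outlook"  # substring that marks a printed mail


def group_by_mail(names: List[str]) -> List[List[str]]:
    """Split PDF names into [mail, attachment, attachment, ...] groups.

    Hypothetical helper: names are sorted first (zero-padded timestamps make
    string order match date order), and non-PDF files are ignored.
    """
    groups: List[List[str]] = []
    for name in sorted(names):
        if not name.lower().endswith(".pdf"):
            continue  # e.g. the script itself
        if MARKER in name:
            groups.append([name])  # a new mail starts a new group
        elif groups:
            groups[-1].append(name)  # attachment of the most recent mail
        # Assumption: attachments printed before any mail are silently dropped.
    return groups


example = [
    "2021-05-31 11-13-21 Microsoft Outlook - Memo Style.pdf",
    "2021-05-31 11-12-13 Microsoft Outlook - Memo Style.pdf",
    "2021-05-31 11-12-18 Some another filename - string.pdf",
    "2021-05-31 11-12-15 Some another filename - string.pdf",
    "merge.bat",
]
for group in group_by_mail(example):
    print(len(group), group[0])
```

With these example names it yields two groups: the first mail followed by its two attachments in date order, and the second mail on its own. Each group can then be handed to PDFtk as one merge.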
I've looked at PDFtk after finding some questions on here, but it either can't do it, or I don't understand the documentation well enough (a very real possibility).
If anyone can help, it would be greatly appreciated.
PS: I can already merge every PDF into one giant file, but I'd like them split per mail so that I can print and staple them automatically. Call me... energy efficient. Merging first and splitting afterward could also be an option, I guess.
Solution 1:
For future reference, the code ComputerUser121212 posted works perfectly.
I turned it into a batch file, using the following link for help: https://stackoverflow.com/questions/4571244/creating-a-bat-file-for-python-script
I'm not proficient with batch files, so my code may be suboptimal, but it works.
@echo on
rem = """
python -x "%~f0" %*
echo some more batch commands
goto :eof
"""
# Anything here is interpreted by Python
import os

files = os.listdir(".")
files.sort()  # zero-padded timestamps, so name order is date order
command_prefix = "pdftk "
command_args = ""
command_end = "cat output output1.pdf"
counter = 0
for file in files:
    # Only merge PDFs; this also skips the .bat file itself
    if file.lower().endswith(".pdf"):
        if "Microsoft Outlook" in file:
            # A new mail starts: merge the previous group first (if there is one)
            if command_args:
                os.system(command_prefix + command_args + command_end)
            counter = counter + 1
            command_args = '"' + file + '"' + " "
            command_end = "cat output output" + str(counter) + ".pdf"
        else:
            # Attachment: append it to the current mail's group
            command_args = command_args + '"' + file + '"' + " "
# Merge the last group
if command_args:
    os.system(command_prefix + command_args + command_end)
Paste the code above into Notepad, save it as a .bat file inside the PDF folder, and you're done.
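The scripts here write output1.pdf, output2.pdf, ... next to the source files and build each command string by hand. A minimal sketch of an alternative, assuming the groups have already been computed and that `pdftk` is on your PATH: pass each command to `subprocess.run` as an argument list (so names with spaces need no manual quoting) and write 001.pdf, 002.pdf, ... into a subfolder of the source folder. The folder name `merged` and both function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def pdftk_commands(groups: List[List[str]], out_dir: Path) -> List[List[str]]:
    """Build one 'pdftk in1.pdf in2.pdf ... cat output NNN.pdf' argument list per group."""
    commands = []
    for number, group in enumerate(groups, start=1):
        target = out_dir / f"{number:03d}.pdf"  # 001.pdf, 002.pdf, ...
        commands.append(["pdftk", *group, "cat", "output", str(target)])
    return commands


def run_commands(commands: List[List[str]], out_dir: Path) -> None:
    """Create the output folder, then run pdftk once per group (stops on the first error)."""
    out_dir.mkdir(exist_ok=True)
    for command in commands:
        subprocess.run(command, check=True)


# Example groups: each inner list is one mail followed by its attachments.
groups = [
    ["2021-05-31 11-12-13 Microsoft Outlook - Memo Style.pdf",
     "2021-05-31 11-12-15 Some another filename - string.pdf"],
    ["2021-05-31 11-13-21 Microsoft Outlook - Memo Style.pdf"],
]
for command in pdftk_commands(groups, Path("merged")):
    print(command)
# run_commands(pdftk_commands(groups, Path("merged")), Path("merged"))  # actually merges
```

Building the commands separately from running them makes it easy to print and check them before anything is written.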
Solution 2:
The following Python code should do the trick. It runs on Python 3.9 and uses PDFtk, which must be installed and on your PATH. It assumes the first file in your directory (in sorted order) is one whose name contains "Microsoft Outlook".
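import os

files = os.listdir(".")
files.sort()  # zero-padded timestamps, so name order is date order
command_prefix = "pdftk "
command_args = ""
command_end = "cat output output1.pdf"
counter = 0
for file in files:
    # Only merge PDFs; this also skips the .py script itself
    if file.lower().endswith(".pdf"):
        if "Microsoft Outlook" in file:
            # A new mail starts: merge the previous group first (if there is one)
            if command_args:
                os.system(command_prefix + command_args + command_end)
            counter = counter + 1
            command_args = '"' + file + '"' + " "
            command_end = "cat output output" + str(counter) + ".pdf"
        else:
            # Attachment: append it to the current mail's group
            command_args = command_args + '"' + file + '"' + " "
# Merge the last group
if command_args:
    os.system(command_prefix + command_args + command_end)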
Save it as a .py file in the same directory as your PDFs and run it from there.
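The script has to run from the PDF folder because it lists "." (the current directory). If you would rather keep the script elsewhere, here is a small sketch, assuming the folder is passed as the first command-line argument; `pdf_paths` and `main` are hypothetical names:

```python
import sys
from pathlib import Path
from typing import List


def pdf_paths(folder: Path) -> List[Path]:
    """Return the PDFs directly inside `folder`, sorted by file name (hypothetical helper)."""
    return sorted(
        (p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"),
        key=lambda p: p.name,
    )


def main(argv: List[str]) -> None:
    # Assumption: the PDF folder is the first argument; default to the current directory.
    folder = Path(argv[1]) if len(argv) > 1 else Path.cwd()
    for path in pdf_paths(folder):
        print(path)  # full paths can be handed to pdftk from any working directory


if __name__ == "__main__":
    main(sys.argv)
```

Because the returned paths are absolute when the folder is, the merge step no longer depends on where the script was started from.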