How to find all modified PDF files
I have a folder of 6000+ PDF files (chapters, articles, etc.). I'm trying to weed out/sort those that I've just downloaded but never annotated. Is there a way to do this? Those PDFs that I've never annotated usually have the same "created" and "modified" dates, so I was thinking those criteria could be used (i.e., look for files whose modified date is later than/not the same as the created date), but I have no idea how to do that.
In other words, I need to be able to find any PDF on my computer that has been modified.
Thank you for any help!
Per info in the OP and comments, this will do as you asked.
In Automator:
- Create a new Workflow.
- Add a Find Finder Items action.
- With settings, e.g., Search (Documents)
- (All) of the following are true
- (Kind) (is) (PDF)
- Add a Run AppleScript action.
- Replace its default code with the example AppleScript code shown further below.
Note: If Skim is not in the /Applications folder, then modify the value of the skimpdfPathFilename variable accordingly. You should not need to modify anything else unless you want to set the value of the offsetInSeconds variable, e.g. set offsetInSeconds to 60, to a different value. This variable is used to help find the files that really have been modified since they were created. When a file is first created, the gap between its creation date and modification date can range from 0 seconds to some larger value, and the gap is not consistent because it depends on how the file was created. Make adjustments as you see fit for your use case.
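The offset comparison can also be sketched outside of AppleScript as a small shell function. This is only a minimal sketch: the function name is_modified and its arguments (creation time, modification time, and offset, all in seconds) are hypothetical and chosen for illustration, and the timestamps in the example are made up. It mirrors the test mDate > (cDate + offsetInSeconds) used in the AppleScript code.

```shell
# Hypothetical helper (not part of the workflow): succeeds (exit status 0)
# when the modification time is more than "offset" seconds later than the
# creation time. Times are epoch seconds; offset defaults to 60.
is_modified() {
    created=$1
    modified=$2
    offset=${3:-60}
    [ "$modified" -gt $((created + offset)) ]
}

# Example with made-up timestamps: only 30 seconds apart, so with a
# 60-second offset the file is treated as unmodified.
if is_modified 1700000000 1700000030 60; then
    echo "modified"
else
    echo "unmodified"
fi
```

A larger offset makes the test stricter, so files whose two dates differ only slightly because of how they were downloaded are not reported as modified.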
What the Workflow and example AppleScript code do:
- Finds all PDF files in the target folder, including all subfolders.
  - This is done with the Find Finder Items action and its output is passed to the Run AppleScript action.
- Creates a list of all PDF files that have been modified after the creation date, per the value of the offsetInSeconds variable.
  - This is done in the first repeat loop. Files meeting the criteria are stored in modifiedFilesList to be used in the next repeat loop.
- Creates a list of all files that have annotations made in Skim.
  - This is done using xattr to get the extended attributes of the target files. If a file has the target extended attributes, a flag is set to true, and if not, it is set to false. The files flagged as true go into annotatedSkimFilesList to be used in the next repeat loop.
- Embeds in place the annotations made to the files in Skim.
  - Using the skimpdf utility within Skim on the files in annotatedSkimFilesList, annotations are embedded in place. Thus there is no need to export to a second file, then delete the original and replace it.
NOTE: While I have tested this and it works without issue for me, nonetheless do not run this until you are sure you have a proper backup! You should also test the workflow on a small sampling of copied files placed outside of the actual search folder the workflow will be run on after testing is done.
Example AppleScript code:
on run {input, parameters}

    -- Path to the skimpdf utility inside Skim (adjust if Skim is elsewhere).
    set skimpdfPathFilename to "'/Applications/Skim.app/Contents/SharedSupport/skimpdf'"
    -- Minimum gap, in seconds, between creation and modification dates.
    set offsetInSeconds to 60
    set modifiedFilesList to {}
    set annotatedSkimFilesList to {}

    -- Collect files modified more than offsetInSeconds after creation.
    repeat with i from 1 to count input
        set fileInfo to info for item i of input
        set cDate to creation date in fileInfo
        set mDate to modification date in fileInfo
        if mDate > (cDate + offsetInSeconds) then
            set end of modifiedFilesList to POSIX path of item i of input
        end if
    end repeat

    -- Keep only the files that carry Skim's *_notes extended attributes.
    repeat with i from 1 to count modifiedFilesList
        set withNotes to (do shell script "xattr " & quoted form of item i in modifiedFilesList ¬
            & " | [ $(grep -c \".*_notes$\") -ge 1 ] && printf 'true' || printf 'false'") as boolean
        if withNotes then
            set end of annotatedSkimFilesList to item i in modifiedFilesList
        end if
    end repeat

    -- Embed the Skim annotations in place in each annotated file.
    repeat with i from 1 to count annotatedSkimFilesList
        do shell script skimpdfPathFilename & space & "embed" & space & ¬
            quoted form of item i in annotatedSkimFilesList
    end repeat

end run
Understanding the do shell script command in the second repeat loop:
When a PDF is annotated in Skim and saved, extended attributes are set on the file, e.g.:
$ xattr Filename.pdf
com.apple.FinderInfo
net_sourceforge_skim-app_notes
net_sourceforge_skim-app_rtf_notes
net_sourceforge_skim-app_text_notes
$
The output is piped (|) to:
[ $(grep -c \".*_notes$\") -ge 1 ] && printf 'true' || printf 'false'
This uses grep -c to count the lines of xattr output that match the pattern. If grep finds one or more occurrences of the pattern, the value of the withNotes variable is set to true; otherwise it is set to false.
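To see the test in isolation, here is a minimal sketch that wraps the same grep check in a shell function and feeds it simulated xattr output. The function name has_notes is hypothetical, and the attribute names are taken from the sample output above; nothing here touches a real file.

```shell
# Hypothetical helper: reads attribute names on stdin (one per line, as
# printed by xattr) and prints true if at least one name ends in _notes,
# false otherwise.
has_notes() {
    if [ "$(grep -c '_notes$')" -ge 1 ]; then
        printf 'true'
    else
        printf 'false'
    fi
}

# Simulated xattr output for a file annotated in Skim.
printf '%s\n' com.apple.FinderInfo net_sourceforge_skim-app_notes | has_notes
echo

# Simulated xattr output for a file without Skim notes.
printf '%s\n' com.apple.FinderInfo | has_notes
echo
```

On a Mac, the same function could be fed real data with xattr Filename.pdf | has_notes.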
Note that Skim does have a built-in command line utility, e.g. /Applications/Skim.app/Contents/SharedSupport/skimnotes, that can be used to test if a PDF has annotations made in Skim. However, because of its output, this utility is better used in a shell script run in Terminal than in a do shell script command, which is why I used xattr and grep instead.
Note: The example AppleScript code above is just that, and does not include any error handling as may be appropriate/needed/wanted. The onus is upon the user to add any appropriate error handling for any example code presented and/or code written by oneself.
Introduction
Based on your question and your follow-up comment below it, I think the solution can be as simple as what I am proposing. In addition to @user3439894's well-written answer, I believe you have a couple of great choices to accomplish your task.
The Setup
- Open the location in Finder and navigate to the view options at the top.
- Now, navigate to the arrangement options bar at the top and click on it. Be sure to check both Date Modified and Date Created, along with any other options you wish to sort by.
- Next, sort your list by Date Modified. In my case I created the files one after another in consecutive naming order; don't let this fool you. I changed file17.pdf and saved it, and it jumped to the top of the list when viewed in descending order.
Implementation
As all files are now grouped by their Date Modified, you can drag them in chunks or individually into your droplet (assuming it does in fact function fully as you say it does).
This would cover the second half of your follow-up comment, while @user3439894 has given you essentially what you are looking for in the first half.
I would be interested in following up with you on how things went. Whichever option you choose is up to you; both are alternatives to manually screening the files one by one.