Finding and Renaming Mislabeled Duplicates

In this situation, renaming files is not the way to go!

As you stated in the OP, "There's almost 500,000 files, so doing this manually is not practical.", and as I mentioned in my comment on the OP, "I'd recommend checking out Gemini 2 which you can download a copy for free and run it to see what it finds. If you like the results you can consider purchasing it." But I really didn't clarify why I mentioned an app that detects and deletes duplicates, so here is what I had in mind:

1. Make a new, current Time Machine backup of, e.g., the Macintosh HD.
2. Use, e.g., Gemini 2 to delete all duplicates from the recovered files.

What, delete all the duplicates!? ... Yes. With a current Time Machine backup of, e.g., the Macintosh HD, why would you need to keep any duplicates? You don't, because properly named copies of those files, and modified versions where applicable, already exist on both the Macintosh HD and the Time Machine backup. So does it really make sense to spend time renaming duplicates among the recovered files? I say no!

Now the files that remain among the recovered files are unique, and you can decide from there how you want to proceed. You'll have far fewer files to decide whether to keep, but renaming them meaningfully will not be easy: they are unique, so there is no properly named copy to take a name from.
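If you would rather see what "unique" means here before trusting an app with deletions, the idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how Gemini 2 works: it treats two files as duplicates when their SHA-256 digests match, and the names `file_digest`, `unique_recovered_files`, `recovered_dir` and `reference_dir` are hypothetical. It only lists files and deletes nothing.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def unique_recovered_files(recovered_dir, reference_dir):
    """List recovered files whose content appears nowhere under reference_dir."""
    # Digest every reference file once (hypothetical stand-in for the main drive).
    known = {
        file_digest(p) for p in Path(reference_dir).rglob("*") if p.is_file()
    }
    # Keep only recovered files whose digest is not in that set.
    return sorted(
        p
        for p in Path(recovered_dir).rglob("*")
        if p.is_file() and file_digest(p) not in known
    )
```

Calling `unique_recovered_files` with the recovered-files folder and the folder you want to compare against returns the paths you would still need to look through by hand.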

The files that do remain from the recovered files can be broken down into two groups, OS/Apps and User Data, the latter of which is more important than the former. It should be relatively easy to create the two groups and get rid of the OS/Apps group, as most of those files will be older than the current OS and of no real value. Of the User Data files, some may be files you've deleted from the Macintosh HD over time, and the others may be modified copies of those deleted files or earlier versions of current files from before the mishap. These are the ones you'll want to retain to search through as needed.

The most important things in this situation are having a current Time Machine backup of, e.g., the Macintosh HD, and isolating the unique User Data files within the recovered files. That is really the only point of sifting through the recovered files, not wasting time renaming files you already have on both the Macintosh HD and the Time Machine backup.


Note that I am a licensed user of the Gemini 2 app, but have no other affiliation with the developer MacPaw.


Here is a solution that I have tested several times and it seems to work. I'm sure someone with more experience can offer a solution which is much quicker than mine, but so far this is the best I can come up with.

I strongly suggest making a copy of your "recovered" files with generic file naming. Then I would run the following code on your copied files.

This AppleScript code works for me using the latest version of macOS Mojave.

Just paste this AppleScript code into Script Editor.app and you can run the code from there.

set genericFolder to (choose folder with prompt "Choose The Parent Folder Containing The Files To Be Renamed")
set sourceFolder to (choose folder with prompt "Choose The Parent Folder Containing The Files With Proper Names")

tell application "Finder"
    set genericFiles to files of entire contents of genericFolder as alias list
    set sourceFiles to files of entire contents of sourceFolder as alias list
end tell

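-- Compare every generic file against every source file
repeat with genericFile in genericFiles
    repeat with sourceFile in sourceFiles
        try
            -- diff exits with a non-zero status when the files differ, which
            -- do shell script raises as an error; the try block ignores it
            set isIdentical to (last word of (do shell script "diff -s " & quoted form of POSIX path of genericFile & " " & quoted form of POSIX path of sourceFile) is "identical") as boolean
            if isIdentical is true then
                tell application "System Events" to set name of genericFile to name of sourceFile
                -- A match was found, so skip the remaining source files
                exit repeat
            end if
        end try
    end repeat
end repeat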
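For anyone who needs something faster, here is a rough sketch of a different technique, in Python rather than AppleScript. Instead of running `diff` on every pair of files, it hashes each file once and looks matches up in a dictionary, so the work grows with the number of files rather than the number of pairs. It is not a port of the script above: the names `file_digest`, `rename_by_content` and `dry_run` are my own, and it assumes that matching SHA-256 digests mean identical content. I have not run it against anything like 500,000 files.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def rename_by_content(generic_dir, source_dir, dry_run=True):
    """Give each generic file the name of the source file with identical content.

    Returns a list of (old_path, new_path) pairs. With dry_run=True nothing
    on disk is changed; the pairs only show what would be renamed.
    """
    # Map content digest -> proper file name; the first source file found wins.
    names_by_digest = {}
    for src in sorted(Path(source_dir).rglob("*")):
        if src.is_file():
            names_by_digest.setdefault(file_digest(src), src.name)

    renames = []
    # sorted() builds the full list first, so renaming does not disturb the walk.
    for gen in sorted(Path(generic_dir).rglob("*")):
        if not gen.is_file():
            continue
        new_name = names_by_digest.get(file_digest(gen))
        if new_name is None or new_name == gen.name:
            continue  # no identical source file, or already correctly named
        target = gen.with_name(new_name)
        if target.exists():
            continue  # never overwrite an existing file
        if not dry_run:
            gen.rename(target)
        renames.append((gen, target))
    return renames
```

Run it first with the default `dry_run=True` on a copy of the recovered files and inspect the returned pairs, then pass `dry_run=False` once the list looks right.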

The first image shows the folder and file structure before running the code. The source files are on the left and the files to be renamed are on the right. Note that the folder and file structures differ between the two.

[Image: folder and file structure before running the code]

The next image shows the result after running the code.

[Image: folder and file structure after running the code]

This solution was tested with only 38 files with generic names. On my system it took 1 minute and 33 seconds to process and rename them. Extrapolating linearly from that rate (about 2.4 seconds per file), 500,000 files would take on the order of two weeks, and likely longer, because every generic file is compared against every source file. You may want to break the process down into a few folders at a time, rather than the full mother lode all at once.
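To put a rough number on that, here is the back-of-the-envelope arithmetic as a small Python snippet. It assumes the time grows linearly with the number of generic files, which is optimistic: the script compares each generic file with every source file, and I did not record how many source files were in the 38-file test. The function name `linear_estimate_days` is just for illustration.

```python
def linear_estimate_days(sample_files, sample_seconds, total_files):
    """Scale a measured run time linearly to a larger file count, in days."""
    seconds_per_file = sample_seconds / sample_files
    return seconds_per_file * total_files / 86_400  # 86,400 seconds per day


# 38 files took 1 minute 33 seconds (93 seconds) in the test above.
days = linear_estimate_days(38, 93, 500_000)
print(f"{days:.1f} days")  # prints "14.2 days"
```

Treat the result as a lower bound rather than a prediction, since the number of comparisons grows with both folders.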

However, if you are only looking to rename files with the name extension ".doc", you can replace this code from above...

tell application "Finder"
    set genericFiles to files of entire contents of genericFolder as alias list
    set sourceFiles to files of entire contents of sourceFolder as alias list
end tell

...with this code instead, which will make the process much quicker.

tell application "Finder"
    set genericFiles to ((files of entire contents of genericFolder) whose name extension is "doc") as alias list
    set sourceFiles to ((files of entire contents of sourceFolder) whose name extension is "doc") as alias list
end tell

You can also have the code process only files with name extensions that you define, as in the following example.

set nameExtensions to {"doc", "pdf"}

tell application "Finder"
    set genericFiles to ((files of entire contents of genericFolder) whose name extension is in nameExtensions) as alias list
    set sourceFiles to ((files of entire contents of sourceFolder) whose name extension is in nameExtensions) as alias list
end tell