How to scan for duplicate files / images in one direction only?
I am looking for a way to search for duplicate files / images on my Mac, but only in one direction: folders A and B should be compared, but I am only interested in files in folder A that have duplicates in folder B. Whether there are duplicates within B does not matter.
In detail:
On my Mac I have a large collection of photos and images. Let's say they are all located in /Users/JustMe/PhotoCollection/ or its subfolders.
Within these folders there are some known duplicates which were created intentionally. For example, I have a folder Summer Vacation with a subfolder that holds all the vacation photos to be used for a photo book. I am not interested in finding these duplicates.
When I import photos from my camera, it can happen that some of them have been imported before. Thus I would like to compare the import folder to the existing photos folder and find only duplicates
- within the import folder (both files are located in the import folder)
- or one version within the import folder and one within the existing photos folder
I am not interested in duplicates within the existing photos.
I tried some well-known duplicate scanners like Gemini, Cisdem, etc. They all let me select multiple folders to compare, but none of them seems to have a one-way option.
Can this be solved with macOS features, or is there a tool that offers this option?
EDIT:
As requested, here is an example:
/Users/JustMe/PhotoCollection/
    SubfolderA/
        Image1.jpg
    SubfolderB/
        ImageA.jpg
        ImageB.jpg
        Selection/
            ImageA.jpg
    SubfolderC/
        ...
/Users/JustMe/PhotoImport/
    NewImage1.jpg
    NewImage2.jpg
    NewImage2_Copy.jpg
    ImageA.jpg
Potential duplicates:

/Users/JustMe/PhotoCollection/SubfolderB/ImageA.jpg
/Users/JustMe/PhotoCollection/SubfolderB/Selection/ImageA.jpg

/Users/JustMe/PhotoImport/NewImage2.jpg
/Users/JustMe/PhotoImport/NewImage2_Copy.jpg

/Users/JustMe/PhotoImport/ImageA.jpg
/Users/JustMe/PhotoCollection/SubfolderB/ImageA.jpg
The first duplicate is within the existing collection and is a known duplicate that should neither be deleted nor reported.
The second duplicate is within the import folder, and one of the two copies should be deleted.
The third duplicate has one version in the import folder and one in the existing collection. The import version should be deleted.
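The one-way rule behind these three cases can be written down as a small decision function. This is only a minimal sketch: `in_folder` and `file_to_delete` are hypothetical names, and it assumes the pair has already been identified as duplicates by some other means (a scanner's report, a checksum, etc.). When both copies are in the import folder it arbitrarily drops the second path.

```python
import os


def in_folder(path, folder):
    """True if path lies inside folder (compares normalised path prefixes)."""
    # Append a separator so that /PhotoImportOld does not match /PhotoImport.
    folder = os.path.join(os.path.normpath(folder), "")
    return os.path.normpath(path).startswith(folder)


def file_to_delete(path_a, path_b, import_root):
    """Apply the one-way rule to a known duplicate pair.

    Returns the path to remove, or None if both copies are in the existing
    collection (an intentional duplicate that should be left alone).
    """
    a_new = in_folder(path_a, import_root)
    b_new = in_folder(path_b, import_root)
    if a_new and b_new:
        return path_b  # both imported: keep the first, drop the second
    if a_new:
        return path_a  # imported copy of an existing photo
    if b_new:
        return path_b
    return None  # duplicate inside the collection: not reported
```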
All the duplicate scanners I have tested check not only the file name but also the capture date, other raw data, etc. to make sure the files are identical. Additionally, they can scan for similar pictures that are not 100% identical but almost the same.
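For exact duplicates, the strictest check goes beyond names and metadata and compares the file contents directly. A minimal sketch using Python's standard `filecmp` module (the function name `identical` is hypothetical); note that this only confirms byte-identical files and cannot detect the "similar but not identical" pictures mentioned above.

```python
import filecmp


def identical(path_a, path_b):
    """Return True only if both regular files contain exactly the same bytes.

    shallow=False makes filecmp read the contents (after a quick size check)
    instead of trusting matching os.stat() signatures such as size and mtime.
    """
    return filecmp.cmp(path_a, path_b, shallow=False)
```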
Solution 1:
It looks like you're into photography and haven't been following best practices for managing your files. You have two options now.
1) Write a script (bash or Python) that computes the MD5 checksum of each file and deletes a file only when it finds an identical copy, removing the copy with the later date. Keep in mind that one careless step can delete files you didn't want to delete.
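A one-way version of option 1 could look roughly like the following sketch. It compares MD5 checksums rather than names, indexes the existing collection first, and then reports every file in the import folder whose content already exists in the collection or earlier in the import folder; duplicates inside the collection are never reported. Unlike the description above, it decides by location rather than by date, and it only prints candidates instead of deleting anything. The script name `oneway_dupes.py` and all function names are hypothetical, and skipping dot-files such as `.DS_Store` is an assumption.

```python
#!/usr/bin/env python3
# Usage: oneway_dupes.py COLLECTION_DIR IMPORT_DIR   (prints, never deletes)
import hashlib
import os
import sys


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def walk_files(top):
    """Yield regular, non-hidden files below top in sorted order."""
    for root, dirs, files in os.walk(top):
        dirs.sort()  # in-place sort makes os.walk visit subfolders in order
        for name in sorted(files):
            path = os.path.join(root, name)
            if not name.startswith(".") and os.path.isfile(path):
                yield path


def one_way_duplicates(collection, import_dir):
    """Return (import_file, original) pairs for redundant imported files."""
    known = {}  # digest -> first path seen with that content
    for path in walk_files(collection):
        known.setdefault(file_md5(path), path)  # collection dupes are ignored
    pairs = []
    for path in walk_files(import_dir):
        digest = file_md5(path)
        if digest in known:
            pairs.append((path, known[digest]))
        else:
            known[digest] = path  # first imported copy becomes the reference
    return pairs


if __name__ == "__main__":
    if len(sys.argv) == 3:
        for dup, original in one_way_duplicates(sys.argv[1], sys.argv[2]):
            print(f"{dup}\n    duplicates {original}")
    else:
        print("usage: oneway_dupes.py COLLECTION_DIR IMPORT_DIR", file=sys.stderr)
```

Run it as `python3 oneway_dupes.py /Users/JustMe/PhotoCollection /Users/JustMe/PhotoImport`. Hashing the whole collection on every run can be slow for a large library; grouping files by size first and hashing only files whose size matches an imported file would reduce that. It also only finds byte-identical files, not merely similar pictures.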
2) (Suggested) Use one of the several free and paid utilities for cleaning up duplicates, ideally one that can compare dates, such as Gemini, which you already mentioned. I tested it a few months ago and found it very useful.
There are several proven practices for managing a photo library. This is a good time to adopt one.
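As you requested, here is a script that finds duplicates in one folder. Note that it matches files by name only (case-insensitive, ignoring the extension), not by content, so it will also flag different photos that happen to share a name.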
#!/usr/bin/env python3
# Syntax: duplicates.py DIRECTORY
# Prints pairs of files whose names match (case-insensitive, extension ignored).
import os
import sys

top = sys.argv[1]
d = {}  # lower-cased base name -> first path seen with that name
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        fn = os.path.join(root, name)
        basename, extension = os.path.splitext(name)
        basename = basename.lower()  # ignore case
        if basename in d:
            print(d[basename])
            print(fn)
        else:
            d[basename] = fn
Save this file as duplicates.py, make it executable with chmod +x duplicates.py, and then run it on the folder:
./duplicates.py Images
This requires some familiarity with the macOS Terminal.