Removing duplicate files, keeping only the newest file

I'm trying to clean up a photo dump folder in which several files are duplicated under different filenames or buried in subfolders.

I've looked at tools like rmlint, duff, and fdupes, but I can't find a way to have them keep only the file with the most recent timestamp. I suspect I'll have to post-process their output, but I don't know where to start.

Can anyone guide me on how to get a list of the duplicate files and delete everything but the newest one?


Note that I use the zsh shell.

Try something like the following (untested; adapted from https://github.com/lipidity/btrfs-fun/blob/master/dedup):

# checksum everything in $DIR
cksums=$(mktemp)
find "$DIR" -xdev -type f -print0 | xargs -0 md5sum > "$cksums"

# loop over each md5 hash that occurs more than once
for hash in $(sort "$cksums" | uniq -w 32 -d | cut -c 1-32); do
  # list of files with this hash; anchor the match so a hash
  # appearing inside a filename cannot produce false positives
  files=$(grep "^$hash" "$cksums" | cut -c 35-)
  # split on newlines into a zsh array
  f=(${(f)files})
  unset files
  # $f now holds the files with the same checksum;
  # walk the list, always keeping the newest and deleting the rest
  newest=$f[1]
  for file in $f[2,-1]; do
    # guard against hash collisions: confirm the files really are identical
    cmp -s -- "$newest" "$file" || continue
    # -nt is true if $file has a newer modification time than $newest
    if [[ $file -nt $newest ]]; then
      rm -- "$newest"
      newest=$file
    else
      rm -- "$file"
    fi
  done
done

rm -f "$cksums"

That should get you most of the way. Note that it assumes filenames contain no newlines (md5sum's line-based output can't represent those cleanly). Let me know if anything needs further explanation.
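If you'd rather stay with fdupes, it prints each group of duplicates as a block of paths separated by blank lines, so the same "keep the newest" step can be applied per group. Here's a minimal, hedged sketch of just that step as a helper function (my own naming, not part of fdupes), using ls -t to sort a group by modification time, newest first; it also assumes no newlines in filenames:

```shell
#!/bin/sh
# Keep only the newest file in a group of identical files.
# The arguments are one duplicate group, e.g. as parsed from fdupes output.
keep_newest() {
  # ls -t lists newest first; tail -n +2 selects everything older
  ls -t -- "$@" | tail -n +2 | while IFS= read -r old; do
    rm -- "$old"
  done
}

# demo: two identical files, the second backdated to be older
dir=$(mktemp -d)
printf 'same' > "$dir/a.jpg"
printf 'same' > "$dir/b.jpg"
touch -t 202001010000 "$dir/b.jpg"   # make b.jpg older
keep_newest "$dir/a.jpg" "$dir/b.jpg"
ls "$dir"   # only a.jpg remains
```

You'd feed it groups by reading fdupes output line by line and calling keep_newest each time a blank line ends a group.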