Search for duplicate file names within folder hierarchy?

I have a folder called img, this folder has many levels of sub-folders, all of which containing images. I am going to import them into an image server.

Normally images (or any files) can have the same name as long as they are in a different directory path or have a different extension. However, the image server I am importing them into requires all the image names to be unique (even if the extensions are different).

For example the images background.png and background.gif would not be allowed because even though they have different extensions they still have the same file name. Even if they are in separate sub-folders, they still need to be unique.

So I am wondering if I can do a recursive search in the img folder to find a list of files that have the same name (excluding extension).

Is there a command that can do this?


Solution 1:

find . -mindepth 1 -printf '%h %f\n' | sort -t ' ' -k 2,2 | uniq -f 1 --all-repeated=separate | tr ' ' '/'

As the comment states, this will find folders as well. Here is the command to restrict it to files:

find . -mindepth 1 -type f -printf '%p %f\n' | sort -t ' ' -k 2,2 | uniq -f 1 --all-repeated=separate | cut -d' ' -f1

Solution 2:

FSlint Install fslint is a versatile duplicate finder that includes a function for finding duplicate names:

FSlint

The FSlint package for Ubuntu emphasizes the graphical interface, but as is explained in the FSlint FAQ a command-line interface is available via the programs in /usr/share/fslint/fslint/. Use the --help option for documentation, e.g.:

$ /usr/share/fslint/fslint/fslint --help
File system lint.
A collection of utilities to find lint on a filesystem.
To get more info on each utility run 'util --help'.

findup -- find DUPlicate files
findnl -- find Name Lint (problems with filenames)
findu8 -- find filenames with invalid utf8 encoding
findbl -- find Bad Links (various problems with symlinks)
findsn -- find Same Name (problems with clashing names)
finded -- find Empty Directories
findid -- find files with dead user IDs
findns -- find Non Stripped executables
findrs -- find Redundant Whitespace in files
findtf -- find Temporary Files
findul -- find possibly Unused Libraries
zipdir -- Reclaim wasted space in ext2 directory entries
$ /usr/share/fslint/fslint/findsn --help
find (files) with duplicate or conflicting names.
Usage: findsn [-A -c -C] [[-r] [-f] paths(s) ...]

If no arguments are supplied the $PATH is searched for any redundant
or conflicting files.

-A reports all aliases (soft and hard links) to files.
If no path(s) specified then the $PATH is searched.

If only path(s) specified then they are checked for duplicate named
files. You can qualify this with -C to ignore case in this search.
Qualifying with -c is more restictive as only files (or directories)
in the same directory whose names differ only in case are reported.
I.E. -c will flag files & directories that will conflict if transfered
to a case insensitive file system. Note if -c or -C specified and
no path(s) specifed the current directory is assumed.

Example usage:

$ /usr/share/fslint/fslint/findsn /usr/share/icons/ > icons-with-duplicate-names.txt
$ head icons-with-duplicate-names.txt 
-rw-r--r-- 1 root root    683 2011-04-15 10:31 Humanity-Dark/AUTHORS
-rw-r--r-- 1 root root    683 2011-04-15 10:31 Humanity/AUTHORS
-rw-r--r-- 1 root root  17992 2011-04-15 10:31 Humanity-Dark/COPYING
-rw-r--r-- 1 root root  17992 2011-04-15 10:31 Humanity/COPYING
-rw-r--r-- 1 root root   4776 2011-03-29 08:57 Faenza/apps/16/DC++.xpm
-rw-r--r-- 1 root root   3816 2011-03-29 08:57 Faenza/apps/22/DC++.xpm
-rw-r--r-- 1 root root   4008 2011-03-29 08:57 Faenza/apps/24/DC++.xpm
-rw-r--r-- 1 root root   4456 2011-03-29 08:57 Faenza/apps/32/DC++.xpm
-rw-r--r-- 1 root root   7336 2011-03-29 08:57 Faenza/apps/48/DC++.xpm
-rw-r--r-- 1 root root    918 2011-03-29 09:03 Faenza/apps/16/Thunar.png

Solution 3:

Save this to a file named duplicates.py

#!/usr/bin/env python

# Syntax: duplicates.py DIRECTORY

import os, sys

top = sys.argv[1]
d = {}

for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        fn = os.path.join(root, name)
        basename, extension = os.path.splitext(name)

        basename = basename.lower() # ignore case

        if basename in d:
            print(d[basename])
            print(fn)
        else:
            d[basename] = fn

Then make the file executable:

chmod +x duplicates.py

Run in e.g. like this:

./duplicates.py ~/images

It should output pairs of files that have the same basename(1). Written in python, you should be able to modify it.