How to merge duplicate folders with "name (1)", "name (1) (1)" etc. structure

This is the approach I would try in Linux. I have no experience with Google Filestream, Google Drive or Synology CloudSync, so I cannot tell whether the solution can be applied at all. Still, I hope this will at least give you some ideas.


Assumptions

  • you can mount the share in your directory tree, so mv, cp and other sane tools can work with directories as if they were local;
  • files (or directories) with paths that become identical after you remove all (N) strings are in fact instances of the same file (directory);
  • instances of the same file should leave just one file;
  • instances of the same directory should merge their content in a single directory;
  • you can use all the tools I use here.

Procedure

Please read the entire answer before attempting to do anything.

I think some steps could be written as a script, but since the solution is highly experimental, it's better to do it by hand, step by step, paying attention to what happens.

  1. In a shell cd to the mountpoint and invoke find . | vidir -; use a text editor of your choice, e.g. kate, like this:

    find . | EDITOR=kate vidir -
    

    This will open the editor with a list of all objects, each one with its own number in front. When you alter the content and save the (temporary) file and close the editor, all the changes are applied. In general this is what you can do:

    • change paths to move (rename) files or directories;
    • delete lines to remove files or directories;
    • swap two or more numbers to swap files (you won't need it).

    Don't save the file unless you're sure the new content describes the directory tree you want to get.

  2. Copy the content from the editor to another file. The point is to work with it and paste the result back (and save it) only when you're sure you got it right. Next steps refer to the new file unless explicitly stated otherwise.

  3. Use sed or any other tool to get rid of all " (N)" strings (note the leading space). At this point you should get "clean" paths; many of them will occur more than once (with different numbers given by vidir).

  4. Use sort -s -k 2 to sort according to these paths. Thanks to -s (stable sort) the former Analysis should still precede the former Analysis (1).

  5. Use uniq -f 1 to drop duplicated paths. Now any path should occur just once.

  6. Double check the sanity of the directory structure encoded in the result.

  7. Paste the result into the original editor, save the file and exit the editor. vidir will remove objects associated with missing numbers and move objects associated with numbers that are left.
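
As a rough sketch, steps 3 to 5 could be chained into one pipeline. I assume the listing has vidir's number, tab, path layout. The function name dedupe_listing and the sample paths are made up for illustration, and LC_ALL=C is my own addition to keep the comparisons bytewise.

```shell
#!/usr/bin/env bash
# Sketch of steps 3-5. Reads a vidir-style listing ("number<TAB>path")
# on stdin and writes the contracted listing to stdout.
dedupe_listing() {
    # Step 3: strip every " (N)" marker, including its leading space.
    # Step 4: stable sort on the path, so lines with the same cleaned
    #         path keep the order they had in the listing.
    # Step 5: skip the number field and keep the first line per path.
    sed -E 's/ \([0-9]+\)//g' |
        LC_ALL=C sort -s -k 2 |
        LC_ALL=C uniq -f 1
}

# Demo on a made-up listing; nothing on disk is touched.
printf '%s\t%s\n' \
    1 './Analysis' \
    2 './Analysis (1)' \
    3 './Analysis (1) (1)' \
    4 './Analysis/report.txt' \
    5 './Analysis (1)/report (1).txt' | dedupe_listing
```

With a real listing you would run something like dedupe_listing < list.txt > result.txt (both file names hypothetical) and still inspect the result by hand before pasting it back into the editor.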


Testing

I would first use this solution to replicate the directory structure:

cp -a --attributes-only /mountpoint/ /guinea_pig_dir/

and test the procedure on the resulting empty files. This should reveal problems (if any) and hopefully allow you to improve the method.
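
Before touching the real share at all, the same idea can be rehearsed on a throwaway tree. This is only a sketch: every name in it (sandbox, src, the Analysis folders) is made up, and it assumes GNU cp, since --attributes-only is a GNU option.

```shell
#!/usr/bin/env bash
# Build a tiny tree that mimics the "name (1)" duplication.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/src/Analysis" \
         "$sandbox/src/Analysis (1)" \
         "$sandbox/src/Analysis (1) (1)"
printf 'data\n' > "$sandbox/src/Analysis/report.txt"
printf 'data\n' > "$sandbox/src/Analysis (1)/report (1).txt"

# Same cp options as above: the structure and metadata are copied,
# the file contents are not, so the replica holds empty files.
cp -a --attributes-only "$sandbox/src/" "$sandbox/guinea_pig_dir"

# Show what the replica looks like.
find "$sandbox/guinea_pig_dir" | LC_ALL=C sort
```

The replica in "$sandbox/guinea_pig_dir" can then be fed to find and vidir exactly as in the procedure, and removed afterwards with rm -rf "$sandbox".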


Possible problems

  1. vidir refuses to work with some non-standard characters.

  2. In general the order of objects is important. There are a few pitfalls that generate objects like foo~ or foo~1, foo~2 when there's a collision with foo. You will "contract" your directory tree in a way that should generate no collisions, but I haven't investigated all possible scenarios. I really think you should experiment with /guinea_pig_dir/ and see what you get. In case of trouble, maybe a clever sort between find and vidir will help.
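
After a trial run it may be worth looking for such leftovers. The sketch below lists names that end in ~ or contain ~ followed by a digit. The function name list_collision_leftovers is made up, and the patterns are only a guess at what a leftover looks like: editor backup files that existed before the run will match too.

```shell
#!/usr/bin/env bash
# Print entries under the given directory (default: current directory)
# whose names look like foo~, foo~1, foo~2, ...
list_collision_leftovers() {
    find "${1:-.}" \( -name '*~' -o -name '*~[0-9]*' \) -print
}

# Demo on a throwaway directory with two made-up leftovers.
demo=$(mktemp -d)
touch "$demo/foo" "$demo/foo~" "$demo/foo~1" "$demo/bar"
list_collision_leftovers "$demo"
```

An empty result after the real run on /guinea_pig_dir/ would be a good sign. Any hit deserves a manual look before repeating the procedure on the mountpoint.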


Below is a bash script that performs this task. It works in, for example, MSYS2 Bash with rsync installed. It is taken from this related question:

Script for deduplicating files and folders with a particular suffix

#!/usr/bin/bash
IFS=$'\n';
set -f
#Go deepest first to deal with copies within copied folders.
for copy in $(find . -regextype posix-egrep -regex "^.*\ \([0-9]+\)\s*(\.[^/.]*)?$" | awk '{print length($0)"\t"$0}' | sort -rnk1 | cut -f2-); do
    orig=$(rev <<< "$copy" | sed -E 's/\)[0-9]+\(\ //' | rev)
    if [ "$orig" != "$copy" ]; then
        if [ -f "$orig" ]; then
            if [ -f "$copy" ]; then
                echo "File pair: $orig $copy"
                if diff -q "$orig" "$copy" &>/dev/null; then
                    echo "Removing file: $copy"
                    rm -f "$copy";
                fi
            fi           
        fi
        if [ -d "$orig" ]; then
            if [ -d "$copy" ]; then
                echo "Folder pair: $orig $copy"
                if rmdir "$copy" &>/dev/null; then
                    #If the "copy" was an empty directory then we've removed it and so we're done.
                    echo "Removed empty folder: $copy"
                else
                    #Non-destructively ensure that both folders have the same files at least.                    
                    rsync -aHAv --ignore-existing "$orig/" "$copy" &>/dev/null
                    rsync -aHAv --ignore-existing "$copy/" "$orig" &>/dev/null
                    if diff -qr "$orig" "$copy" &>/dev/null; then
                        echo "Removing folder: $copy"
                        rm -rf "$copy";
                    fi            
                fi
            fi
        fi
    fi
done
unset IFS;
set +f