How to merge duplicate folders with "name (1)", "name (1) (1)" etc. structure
This is the approach I would try in Linux. I have no experience with Google Filestream, Google Drive or Synology CloudSync, so I cannot tell whether the solution can be applied at all. Still, I hope this will at least give you some ideas.
Assumptions
- you can mount the share in your directory tree, so `mv`, `cp` and other sane tools can work with directories as if they were local;
- files (or directories) with paths that become identical after you remove all ` (N)` strings are in fact instances of the same file (directory);
- instances of the same file should leave just one file;
- instances of the same directory should merge their content in a single directory;
- you can use all the tools I use here.
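Before changing anything, it may help to see which paths the second assumption would treat as instances of the same object. The following is a minimal read-only sketch, not part of the procedure itself; the function name `preview_groups` is made up for illustration, and ` (N)` is assumed to mean a space followed by a number in parentheses.

```shell
# preview_groups DIR (hypothetical helper): print every "clean" path that
# more than one object under DIR collapses to once all " (N)" strings are
# removed. Read-only: nothing is renamed or deleted.
preview_groups() {
    (
        cd "$1" || exit 1
        find . | sed -E 's/ \([0-9]+\)//g' | LC_ALL=C sort | uniq -d
    )
}

# Usage: preview_groups /mountpoint
```

Each printed path is one that at least two existing objects would map to. Names that legitimately contain something like ` (2019)` would be caught as well, so the output is worth reading before you rely on the assumption.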
Procedure
Please read the entire answer before attempting to do anything.
I think some steps could be written as a script, but since the solution is highly experimental, it's better to do it by hand, step by step, paying attention to what happens.
- In a shell, `cd` to the mountpoint and invoke `find . | vidir -`; use a text editor of your choice, e.g. `kate`, like this: `find . | EDITOR=kate vidir -`
This will open the editor with a list of all objects, each one with its own number in front. When you alter the content and save the (temporary) file and close the editor, all the changes are applied. In general this is what you can do:
- change paths to move (rename) files or directories;
- delete lines to remove files or directories;
- swap two or more numbers to swap files (you won't need it).
Don't save the file unless you're sure the new content describes the directory tree you want to get.
- Copy the content from the editor to another file. The point is to work with it and paste the result back (and save it) only when you're sure you got it right. The next steps refer to the new file unless explicitly stated otherwise.
- Use `sed` or any other tool to get rid of all ` (N)` strings (note the leading space). At this point you should get "clean" paths; many of them will occur more than once (with different numbers given by `vidir`).
- Use `sort -s -k 2` to sort according to these paths. Thanks to `-s` the former `Analysis` should still precede the former `Analysis (1)`.
- Use `uniq -f 1` to drop duplicated paths. Now any path should occur just once.
- Double check the sanity of the directory structure encoded in the result.
- Paste the result into the original editor, save the file and exit the editor. `vidir` will remove objects associated with missing numbers and move objects associated with numbers that are left.
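The text-processing steps above could be combined roughly as follows. This is a sketch under assumptions: `listing.txt` is a hypothetical name for the file you copied the editor content into, `result.txt` is what you would paste back, and the sample listing is invented to show the effect. `LC_ALL=C` is my addition to make the sort order predictable.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Invented sample of what the copied vidir listing might look like:
# a number, a tab, then the path.
printf '%s\t%s\n' \
    1 './Analysis' \
    2 './Analysis/a.txt' \
    3 './Analysis (1)' \
    4 './Analysis (1)/b.txt' \
    5 './Analysis (1)/a.txt' > listing.txt

# 1. drop every " (N)";
# 2. stable-sort by path, so among equal paths the earlier line stays first;
# 3. keep only the first line per path (field 1, the number, is skipped).
sed -E 's/ \([0-9]+\)//g' listing.txt \
    | LC_ALL=C sort -s -k 2 \
    | uniq -f 1 > result.txt

cat result.txt
```

For this sample, `result.txt` keeps the lines numbered 1, 2 and 4: `./Analysis`, `./Analysis/a.txt` and `./Analysis/b.txt`. Line 4 used to be `./Analysis (1)/b.txt`, so pasting the result back would ask `vidir` to move that file into `./Analysis` and remove the objects numbered 3 and 5.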
Testing
I would first use this solution to replicate the directory structure:
cp -a --attributes-only /mountpoint/ /guinea_pig_dir/
and test the procedure on the resulting empty files. This should reveal problems (if any) and hopefully allow you to improve the method.
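A small self-contained trial of that command might look like this. It assumes GNU `cp` (the `--attributes-only` option is not available everywhere) and uses throwaway directories from `mktemp` in place of `/mountpoint/` and `/guinea_pig_dir/`; the sample file names are invented.

```shell
src=$(mktemp -d)   # stands in for /mountpoint/
dst=$(mktemp -d)   # stands in for /guinea_pig_dir/

mkdir -p "$src/Analysis" "$src/Analysis (1)"
echo 'real data' > "$src/Analysis/a.txt"
echo 'real data' > "$src/Analysis (1)/a.txt"

# Replicate names and attributes only; file contents are not copied.
cp -a --attributes-only "$src/." "$dst/"

# List any replica that is not empty; ideally this prints nothing.
find "$dst" -type f -size +0c
```

If the last command prints nothing, the replica holds the same names as the source but no data, which is what you want for a cheap rehearsal.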
Possible problems
- `vidir` refuses to work with some non-standard characters.
- In general the order of objects is important. There are a few pitfalls which generate objects like `foo~` or `foo~1`, `foo~2` when there's a collision with `foo`. You will "contract" your directory tree in a way that should generate no collisions; still, I haven't investigated all possible scenarios. I really think you should experiment with `/guinea_pig_dir/` and see what you get. In case of trouble, maybe a clever `sort` between `find` and `vidir` will help.
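After a trial run in the test directory, one way to spot such collision leftovers is to search for names ending in `~` or in `~` plus a number. This is only a sketch: the helper name `find_leftovers` is hypothetical, and the patterns are a guess based on the `foo~`, `foo~1` examples above, so they may also match unrelated editor backup files.

```shell
# find_leftovers DIR (hypothetical helper): list objects whose names end in
# "~", "~N" or "~NN", which is what a collision is said to leave behind.
find_leftovers() {
    find "$1" \( -name '*~' -o -name '*~[0-9]' -o -name '*~[0-9][0-9]' \) -print \
        | LC_ALL=C sort
}

# Usage: find_leftovers /guinea_pig_dir
```

An empty result after the rehearsal suggests that the contraction did not hit any of these collisions.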
Below is a bash script that performs this task. It works on e.g. MSYS2 Bash with rsync added. It is taken from this related question here:
Script for deduplicating files and folders with a particular suffix
#!/usr/bin/bash
# Split the find output on newlines only and disable globbing,
# so paths containing spaces or wildcard characters stay intact.
IFS=$'\n'
set -f
# Go deepest first (longest paths first) to deal with copies within copied folders.
for copy in $(find . -regextype posix-egrep -regex "^.*\ \([0-9]+\)\s*(\.[^/.]*)?$" | awk '{print length($0)"\t"$0}' | sort -rnk1 | cut -f2-); do
    # Strip the last " (N)" from the path to get the presumed original.
    orig=$(rev <<< "$copy" | sed -E 's/\)[0-9]+\(\ //' | rev)
    if [ "$orig" != "$copy" ]; then
        if [ -f "$orig" ]; then
            if [ -f "$copy" ]; then
                echo "File pair: $orig $copy"
                # Only delete the copy when its content is identical to the original.
                if diff -q "$orig" "$copy" &>/dev/null; then
                    echo "Removing file: $copy"
                    rm -f "$copy"
                fi
            fi
        fi
        if [ -d "$orig" ]; then
            if [ -d "$copy" ]; then
                echo "Folder pair: $orig $copy"
                if rmdir "$copy" &>/dev/null; then
                    # If the "copy" was an empty directory then we've removed it and so we're done.
                    echo "Removed empty folder: $copy"
                else
                    # Non-destructively ensure that both folders have the same files at least.
                    rsync -aHAv --ignore-existing "$orig/" "$copy" &>/dev/null
                    rsync -aHAv --ignore-existing "$copy/" "$orig" &>/dev/null
                    # Remove the copy only if both trees are now identical.
                    if diff -qr "$orig" "$copy" &>/dev/null; then
                        echo "Removing folder: $copy"
                        rm -rf "$copy"
                    fi
                fi
            fi
        fi
    fi
done
unset IFS
set +f
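
The least obvious line in the script is the one that derives `orig` from `copy`. The sketch below isolates that step so it can be tried on sample names without touching any files; the function name `strip_last_copy_suffix` is made up for this illustration. Reversing the string lets `sed`, which replaces only the first match, remove the last ` (N)` in the path, so a copy nested inside another copied folder loses only its own suffix.

```shell
# strip_last_copy_suffix PATH (hypothetical helper): print PATH with its
# last " (N)" removed, using the same rev | sed | rev trick as the script.
strip_last_copy_suffix() {
    printf '%s\n' "$1" | rev | sed -E 's/\)[0-9]+\( //' | rev
}

strip_last_copy_suffix './a (1)/b (2).txt'   # prints: ./a (1)/b.txt
strip_last_copy_suffix './report.txt'        # prints: ./report.txt (unchanged)
```

This is also why the script processes the longest paths first: `./a (1)/b (2).txt` is compared with `./a (1)/b.txt` before `./a (1)` itself is merged into `./a`.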