Find and delete duplicate files across different disks and directories
Assuming you can use Windows as the OS for the whole process and you don't like Free Duplicate File Finder (never tried it, but I found it mentioned here), you could use PowerShell to achieve what you want with relatively little effort. Note: I'm not a real pro at PowerShell, so I'm pretty sure the code below could be refined.
Just open PowerShell ISE (or, if you don't have it, use Notepad), copy and paste the following code into it, and save the resulting file somewhere as *.ps1.
You also have to change the values of $oldpath and $newpath to your directories - just put your paths between the quotes.
# Search-and-Destroy script
# Get all files of both code directories:
$oldpath = "Disk1:\code"
$newpath = "DiskNew:\code"
$files_old = @(Get-ChildItem -Path $oldpath -Recurse -File)
$files_new = @(Get-ChildItem -Path $newpath -Recurse -File)

for($i = 0; $i -lt $files_old.Count; $i++){
    for($j = 0; $j -lt $files_new.Count; $j++){
        # if the last edit time and the file size are the same...
        if($files_old[$i].Length -eq $files_new[$j].Length -and $files_old[$i].LastWriteTime -eq $files_new[$j].LastWriteTime){
            # get the file hashes for those files (SHA1 should be enough)
            $files_old_hash = (Get-FileHash -Path $files_old[$i].FullName -Algorithm SHA1).Hash
            $files_new_hash = (Get-FileHash -Path $files_new[$j].FullName -Algorithm SHA1).Hash
            # if the hashes are the same as well...
            if($files_old_hash -eq $files_new_hash){
                # remove the old file (-Confirm can be dropped so you don't have to approve every single file)
                # if you want to check the files before deleting them, you could also just rename them
                # (adding the suffix ".DUPLICATE") instead:
                # Rename-Item -Path $files_old[$i].FullName -NewName "$($files_old[$i].Name).DUPLICATE"
                Remove-Item -Path $files_old[$i].FullName -Confirm
                Write-Host "DELETING`t$($files_old[$i].FullName)" -ForegroundColor Red
                # duplicate handled - continue with the next old file
                break
            }
        }
        # otherwise: compare this old file with the next new file
    }
}
Then start the script (via right-click, for example) - if that fails, make sure your ExecutionPolicy allows running scripts (https://superuser.com/a/106363/703240).
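If the policy blocks the script, something along these lines usually does the trick - the RemoteSigned policy and the script filename are only examples, adjust them to your setup:

```powershell
# One-time setup: allow locally created scripts for the current user
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

# Then run the saved script from a PowerShell prompt (filename is just an example)
.\Search-and-Destroy.ps1
```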
I use an almost identical script to check for files that were already copied (but possibly with changed names). This code assumes that only the names of the files differ, not their content. The last edit time usually stays the same even after copying a file to a new path - unlike the creation time. If the content is different, my solution fails badly - you could use other unique attributes of the files (but which?) or state that e.g. only files that are smaller or older (considering the edit time, again) than the new files should be deleted.
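If you want to check that timestamp behaviour on one of your own copied files first (the path below is only a placeholder), you can print both timestamps side by side:

```powershell
# LastWriteTime normally survives a copy, while CreationTime is set when the copy is made
Get-Item "DiskNew:\code\example.txt" | Select-Object Name, CreationTime, LastWriteTime
```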
What the script does:
1. Getting all files in the specified folders (and their subfolders)
2. Getting the first old file (specified by $i)...
3. ...comparing its last-edit time and its file size with those of the first new file (specified by $j)...
4. ...if they are equal, calculating a file hash to be sure that it is definitely the same file (arguably, this could be a bit too much effort for your goal)
5. If the hashes are equal, the old file gets deleted (and the script writes which file to the terminal), then it starts again at 2. with the next old file...
6. If the hashes are not equal (or the last-edit times or the file sizes differ), it starts again at 3. with the next new file.
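As a side note: if the size/last-edit pre-check doesn't hold for your files (e.g. because timestamps were changed somewhere along the way), a simpler but slower variant is to hash everything under $newpath once and then flag every old file whose hash shows up in that set. This is only a rough sketch under that assumption (it reuses the $oldpath/$newpath variables from above) - it hashes every single file, so it can take a while on large trees:

```powershell
# Collect the hashes of all files under $newpath
$new_hashes = @{}
Get-ChildItem -Path $newpath -Recurse -File | ForEach-Object {
    $new_hashes[(Get-FileHash -Path $_.FullName -Algorithm SHA1).Hash] = $true
}

# Any old file whose hash also exists under $newpath is a duplicate
Get-ChildItem -Path $oldpath -Recurse -File | ForEach-Object {
    if($new_hashes.ContainsKey((Get-FileHash -Path $_.FullName -Algorithm SHA1).Hash)){
        Write-Host "DUPLICATE`t$($_.FullName)" -ForegroundColor Red
        # Remove-Item -Path $_.FullName -Confirm   # uncomment once you trust the output
    }
}
```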