Copying files: does Windows write to disk if files are identical?

Solution 1:

Robocopy.

Windows cannot differentiate between identical and modified files if you copy using Windows Explorer.

Windows can differentiate between identical and modified files if you copy using Robocopy, which is a file-copy utility included in Windows (Vista, 7, 8.1 and 10).

There's no need to use third-party tools.

You can save this script as a batch file and re-run it whenever you want to perform a backup:

robocopy /e c:\PDFs p:\PDFs

  • Whenever a PDF file is annotated and the changes are saved, both its Last Modified and Size attributes will change in the source folder, triggering an overwrite of the corresponding file in the destination folder the next time Robocopy is invoked.
  • Robocopy compares the Size and Last Modified attributes of all files that exist in both the source and destination. If either attribute is different, then the destination file will be overwritten. Note: these values only need to be different, not necessarily newer or larger; even if a source file has an older Last Modified or smaller Size attribute, the source will still overwrite the destination.
  • The /e (or /s) switch isn't necessary if all of your PDFs are in the root of the source folder. If you want to include Subfolders, use /s; if you want to include subfolders and Empty subfolders, use /e.
  • I would assign the backup drive a letter further along in the alphabet, so there's no risk of another drive being inadvertently assigned the drive letter used in the script, causing the backup to fail at some point in the future. I used P here for PDF.

That simple script is all you need.
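
If you would like a little more visibility (a log file and a couple of retries for locked files), a slightly fuller variant, written here as a small PowerShell script instead of a batch file, could look like the sketch below. The paths, drive letter and log location are just examples; the switches are standard Robocopy options.

# Copy new and changed PDFs; Robocopy skips files whose size and
# Last Modified time are unchanged. /fft is only needed if the backup
# drive is formatted FAT/exFAT (2-second timestamp granularity).
# Make sure the log folder exists, or point /log: somewhere else.
robocopy C:\PDFs P:\PDFs /e /r:2 /w:5 /fft /np /tee /log:C:\Logs\pdf-backup.log

# Robocopy exit codes below 8 mean success (0 = nothing to copy, 1 = files copied, ...).
if ($LASTEXITCODE -ge 8) { Write-Warning "Robocopy reported errors - check the log." }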

Solution 2:

Windows does not do this. It will, however, prompt you to overwrite files with the same name, and you can then choose manually whether to do so.

For an easier solution, use FreeFileSync to compare the folders and overwrite only changed/updated files (use the Mirror option and select "File time and size" in Comparison Settings).

Solution 3:

Yes and no! Windows Explorer only checks for metadata (file size, dates etc.).

But you could use a script, e.g. PowerShell (see here), which comes with (most) versions of Windows, or third-party tools that let you compare/copy files using file checksums, e.g. MD5 or SHA-1 hashing (see here and/or use a search engine).
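
As a rough idea of what such a PowerShell script could look like, here is a minimal sketch that copies a file only when it is missing in the destination or its MD5 hash differs. The folder paths are just examples, Get-FileHash needs PowerShell 4.0 or later, and a real script would want proper error handling.

# Minimal sketch: walk the source tree and copy a file only when it is
# missing in the destination or its MD5 hash differs. This reads every
# file in full on both sides, so it is thorough but not fast.
$src = 'C:\PDFs'   # example source folder
$dst = 'P:\PDFs'   # example destination folder

Get-ChildItem $src -Recurse -File | ForEach-Object {
    $target = $_.FullName.Replace($src, $dst)
    $copy = $true
    if (Test-Path $target) {
        $copy = (Get-FileHash $_.FullName -Algorithm MD5).Hash -ne (Get-FileHash $target -Algorithm MD5).Hash
    }
    if ($copy) {
        New-Item -ItemType Directory -Path (Split-Path $target) -Force | Out-Null
        Copy-Item $_.FullName -Destination $target -Force
    }
}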

I myself like to use the software Checksum Compare (see here); it lets you compare files and directories including file checksums, and it works from a USB pen drive.

If you don't need to compare the files' content and just want to copy "newer" files, you can use any advanced copy method like xcopy, robocopy, etc.
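
For instance, Robocopy's /xo switch (eXclude Older) skips source files that are older than the copy already in the destination; a sketch, with example paths:

# Copy new and changed files, but never overwrite a destination file
# with an older source file (/xo = eXclude Older).
robocopy C:\Data D:\Backup\Data /e /xo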

Note: the different hashing methods have upsides and downsides (mainly reliability vs. speed). For me, MD5 is more than enough for this type of file comparison, but that's a personal preference. See here for further info on that topic.

Solution 4:

In short: No

Windows doesn't do that in a straightforward way.

Well, it does, but like everything in Windows it's ambiguous at best. You will be prompted for name conflicts, and depending on your Windows version, you get a more or less understandable dialog with several options to choose from, with an additional note ("Blah blah, different size, newer"). You can then, one by one, choose whether or not to keep the modified file, and you have the option of applying this to all "identical" matches.
Now of course it's Windows, so you have no guarantee that "newer" actually means newer, and you do not know what is "identical" (is it just the name collision, is it the size change, is it the modification date, or is it everything?).

Alternatives

There is a huge variety of file-sync programs, both free and commercial, which are somewhat better insofar as they check whether a file has been modified before overwriting it; rsync is the traditional mother of all such tools, but it is also a tad less user-friendly than some people may wish.
However, I do not recommend any of these, because they do not fundamentally make things better.

Personally, if you are not afraid of a little command line (you could always make a batch file!), I'd recommend Matt Mahoney's excellent zpaq. This is basically ZIP, except it compresses much better and it does deduplication on the fly.
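
To give a rough idea of the workflow (a sketch: it assumes zpaq.exe is on your PATH, and the archive and folder paths are just examples):

# First run creates the archive; later runs append only new/changed data.
zpaq add E:\backup.zpaq C:\PDFs

# Show what is stored in the archive (files and versions).
zpaq list E:\backup.zpaq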

How is that better?

Well, checksum-comparing tools are all well and good. Especially when you go over the network, nothing can beat rsync running on both ends; it's just awesome. But while a typical sync tool will do the job just fine (and better than Explorer), backing up to an external drive is not the scenario where it shines.

Writing to an external drive, whether or not you compare checksums, comes with a couple of things you need to cope with:

  • Access time on the drive (abysmal)
  • Latency over USB or what you use (getting better but still kinda abysmal)
  • Bandwidth (actually pretty good nowadays)
  • Drive writes (and amplifications)
  • Drive reads

In order to compare checksums, you first have to read in the files. Full stop. Which means that for a couple of thousand files, you pay the latency of traversing the directory structure, opening files over a high-latency link, and reading files, several thousand times over. Plus, the data is transferred in small units over a high-latency wire. Well, that sucks big time; it is a very expensive process.

Then you must write the files that have changed, again with several high-latency operations such as opening files and overwriting data, and again one by one. This sucks twice: not only is it inherently unsafe (you lose the file being overwritten if your cat stumbles over the USB cable), but with modern shingled (SMR) hard drives, such as many external drives, it can also be excruciatingly slow, down to a single megabyte per second if you are unlucky. That, and the latency of thousands of small transfers adding up.
A well-written file copy tool may be able to deal with the safety issue by copying a temporary file, and atomically renaming it afterwards (but this adds even more overhead!).
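
A minimal illustration of that pattern in PowerShell; the file names are made up, and the rename is only atomic when the temporary file lives on the same volume as the target:

# Write to a temporary name first; the real file is only replaced once
# the full copy has succeeded, so an interrupted copy cannot truncate it.
Copy-Item 'C:\PDFs\report.pdf' -Destination 'P:\PDFs\report.pdf.tmp'
Move-Item 'P:\PDFs\report.pdf.tmp' -Destination 'P:\PDFs\report.pdf' -Force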

Now, an archive format like zpaq will create an archive that already contains the checksums of the files; these can be read quickly and sequentially from one location. It then locally (locally meaning "on your side of the cable", where you presumably have a reasonably fast disk connected via SATA or M.2 or something) compares checksums, compresses the differences, and appends the compressed data sequentially to the existing archive (append-only). Yes, this means the archive will grow a little over time because you carry a whole history around. Ah well, get over it; the cost is very moderate thanks to diffing and compression.
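
To illustrate the history aspect: every run of zpaq add appends a new version, and earlier versions remain addressable. A sketch, with an example version number and example paths:

# List the archive as it looked after the third backup run...
zpaq list E:\backup.zpaq -until 3
# ...and restore the C:\PDFs tree from that version into a separate folder.
zpaq extract E:\backup.zpaq C:\PDFs -until 3 -to C:\Restore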

This method is faster and safer at the same time. If you pull the cable mid-operation, your current backup run is interrupted (obviously!), but you do not lose your previous data. All transfers that go over the "slow" link are strictly sequential, large transfers, which maximizes throughput.