How do I compare two folders recursively and generate a list of files and folders that are different?
tl;dr and an example
I'm looking for a way to compare two folders recursively and output the relative paths of all files (and folders) that are different (by size or by timestamp, à la rsync).
For example, say I have
C:\source\foo\a.txt
C:\source\foo\bar\b.txt
C:\source\foo\bar\c.txt
and
C:\target\foo\a.txt
C:\target\foo\bar\b.txt
C:\target\foo\bar\d.txt
C:\target\foo\baz\
and suppose b.txt has been changed under C:\source, and is thus newer.
Then given a magical script or command, say, magic C:\source C:\target, I'd like the output to be
foo\bar\b.txt
Or, a full path on either the source or the target folder would be acceptable too:
C:\source\foo\bar\b.txt
As the example shows, I don't care about files and folders that have been deleted or created! Which should make this task much simpler than otherwise.
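To pin down what "different" means here, this is a minimal Python sketch of the hypothetical magic command. The function name changed_files and the two-second timestamp tolerance are my own assumptions, not part of any existing tool. It only looks at files that exist on both sides, and it only compares size and modification time, never contents.

```python
import os


def changed_files(source, target, mtime_tolerance=2.0):
    """Yield relative paths of files that exist in both trees but differ.

    A file counts as different when its size differs, or when its
    modification time differs by more than mtime_tolerance seconds
    (the 2-second default is an assumption, chosen to absorb coarse
    filesystem timestamp resolution).
    """
    for dirpath, _dirnames, filenames in os.walk(source):
        for name in filenames:
            src_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(src_path, source)
            dst_path = os.path.join(target, rel_path)
            if not os.path.isfile(dst_path):
                # Exists only in the source: ignored, as stated above.
                continue
            src_stat = os.stat(src_path)
            dst_stat = os.stat(dst_path)
            size_differs = src_stat.st_size != dst_stat.st_size
            time_differs = abs(src_stat.st_mtime - dst_stat.st_mtime) > mtime_tolerance
            if size_differs or time_differs:
                yield rel_path
```

Because only the source tree is walked, files and folders that exist only in the target are never visited, which matches the "don't care about created or deleted" requirement. Looping over changed_files(r"C:\source", r"C:\target") and printing each item would give the kind of list shown above.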
What I know already...
I'm a UNIX dev myself, and wouldn't be asking if this were a UNIX system we're dealing with, but alas. Also, this is for a custom nightly backup solution, where reliability and data integrity are priorities. Given that a few weeks ago I couldn't even figure out a for-loop in a batch script, I'm pretty sure I lack the experience to do this right, or even to determine the best way to do it.
Reading http://www.howtoforge.com/backing-up-with-rsync-and-managing-previous-versions-history, I learned that rsync can do something like what I'm after, using options like
--dry-run # don't actually rsync (touch) any files
--itemize-changes # list changes rsync _would_ have made
--out-format="%i|%n|" # define an output format for the list of changes
However, I'd hate to rely on Cygwin (cwRsync) to use rsync, as I'm already prone to running quick-and-dirty experiments on my Cygwin installation, often breaking the environment and needing to reinstall Cygwin every few weeks. That kind of opposes the "reliable" part of a nightly backup.
I haven't found any "canonical" tool like rsync in Windows, at least not any that support options like the above. Also, I'm not looking for software in general unless it's a simple and compact tool for specifically this purpose—I prefer a transparent, programmatic solution. For something as important as backing up files, relying on software or code I can't see or understand is scary!
Recap
I can't wrap my head around batch scripting syntax. Next I'll try PowerShell. But what would you do, given this task?—Is there some obvious route that I'm missing?
Solution 1:
@Glytzhkof recommended Robocopy in his answer, and it suited my needs perfectly.
tl;dr
C:\>robocopy.exe source target /l /e /zb /xx /xl /fp /ns /nc /ndl /np /njh /njs
C:\source\foo\bar\b.txt
Details & Explanation of Options
Robocopy (Wikipedia) seems widely adopted for Windows system administration; is well documented (TechNet); is discussed as more than an obscurity on Stack Overflow, Server Fault, and of course, here at Super User; provides a specific function rather than trying to be a multi-purpose tool (such tools tend toward bloat and bugs); and furthermore has been providing this specific function since 1997. For me, all these factors contribute to "transparency," despite it being closed-source, and set my mind at ease.
Robocopy comes as part of a set of tools currently known as Windows Server 2003 Resource Kit Tools (it has also shipped with Windows itself since Vista, so on newer systems no download is needed). After downloading and installing, I recreated the scenario in my question and gave it a go:
C:\>robocopy.exe source target /l /e /zb
-------------------------------------------------------------------------------
ROBOCOPY :: Robust File Copy for Windows
-------------------------------------------------------------------------------
Started : Thu May 01 09:08:20 2014
Source : C:\source\
Dest : C:\target\
Files : *.*
Options : *.* /L /S /E /COPY:DAT /ZB /R:1000000 /W:30
------------------------------------------------------------------------------
0 C:\source\
1 C:\source\foo\
*EXTRA Dir -1 C:\target\foo\baz\
2 C:\source\foo\bar\
*EXTRA File 1 d.txt
Newer 5 b.txt
New File 1 c.txt
------------------------------------------------------------------------------
Total Copied Skipped Mismatch FAILED Extras
Dirs : 3 0 3 0 0 1
Files : 3 2 1 0 0 1
Bytes : 7 6 1 0 0 1
Times : 0:00:00 0:00:00 0:00:00 0:00:00
Ended : Thu May 01 09:08:20 2014
Looks good! Let me explain the options:
- /l lists actions without actually carrying them out.
- /e includes subdirectories, but unlike /s, includes empty directories too.
- /zb copies in "restart" mode, and on access denied, "backup" mode; it seems like the safest approach; read more here.
I didn't need any of the copy-related options since I'm not actually performing any actions.
Anyway, next, it was only a matter of adding more switches to get the output I desired:
C:\>robocopy.exe source target /l /e /zb /xx /xl /fp /ns /nc /ndl /np /njh /njs
C:\source\foo\bar\b.txt
Again, let's go through the options.
First, I only cared about modified files and folders, so:
- /xx excludes "extra" files and directories, i.e. those which exist only in the target.
- /xl excludes "lonely" files and directories, i.e. those which exist only in the source.
Second, I desired relative paths (or at least full paths, not just names):
- /fp enables full paths (unsurprisingly, there was no option for relative paths).
Third, I wanted to remove as much logging fluff as possible, and I was pleasantly surprised to find that all of it was removable:
- /ns suppresses file sizes.
- /nc suppresses classes, e.g. Newer.
- /ndl suppresses directory names.
- /np suppresses copy progress output.
- /njh suppresses the job header.
- /njs suppresses the job summary.
And there you have it!
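If you want to drive this from a script rather than the console, here is a rough Python sketch that assembles the same switches and runs Robocopy in list-only mode. The names LIST_ONLY_SWITCHES, robocopy_list_command and list_changed are hypothetical helpers of my own. I'm also assuming robocopy.exe is on the PATH and that the console's default text encoding is good enough for your file names.

```python
import subprocess

# Switches from the command above, in the same order.
LIST_ONLY_SWITCHES = [
    "/l",    # list actions without carrying them out
    "/e",    # include subdirectories, even empty ones
    "/zb",   # restart mode, falling back to backup mode on access denied
    "/xx",   # exclude "extra" entries (only in the target)
    "/xl",   # exclude "lonely" entries (only in the source)
    "/fp",   # print full paths
    "/ns", "/nc", "/ndl", "/np", "/njh", "/njs",  # strip the logging fluff
]


def robocopy_list_command(source, target, extra_switches=()):
    """Build the argument list for a list-only Robocopy run."""
    return ["robocopy.exe", source, target, *LIST_ONLY_SWITCHES, *extra_switches]


def list_changed(source, target):
    """Run Robocopy in list-only mode and return its non-empty output lines."""
    result = subprocess.run(
        robocopy_list_command(source, target),
        capture_output=True,
        text=True,
    )
    # Robocopy's exit code is a bit field, so a non-zero code is not
    # necessarily an error; values of 8 or more indicate a failure.
    if result.returncode >= 8:
        raise RuntimeError(f"robocopy failed with exit code {result.returncode}")
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

The exit code check is the main reason not to pass check=True to subprocess.run: Robocopy uses non-zero codes below 8 for ordinary outcomes, so treating every non-zero code as a failure would raise false alarms.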
For my purposes (creating versioned backups of changed files), I realized I'd actually like to have the timestamp of each modified file, too. Simply add /ts:
C:\>robocopy.exe source target /l /e /zb /xx /xl /fp /ns /nc /ndl /np /njh /njs /ts
2014/05/01 15:20:42 C:\source\foo\bar\b.txt
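Since Robocopy has no relative-path option, the source prefix has to be stripped afterwards. Here is a small Python sketch of one way to do that, assuming each output line looks like the one above (date, time, then the full path, separated by whitespace). The function name parse_ts_line is made up for illustration.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def parse_ts_line(line, source_root):
    """Split one /ts output line into (timestamp, path relative to source_root)."""
    # Split on the first two runs of whitespace only, so that paths
    # containing spaces stay in one piece.
    date_part, time_part, full_path = line.split(None, 2)
    timestamp = datetime.strptime(f"{date_part} {time_part}", "%Y/%m/%d %H:%M:%S")
    # PureWindowsPath handles backslashes and drive letters on any platform.
    relative = PureWindowsPath(full_path.strip()).relative_to(source_root)
    return timestamp, str(relative)
```

For the line above with source_root set to C:\source, this gives the timestamp as a naive datetime (no time zone attached) together with foo\bar\b.txt, which is the relative path asked for in the question.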
Solution 2:
I once made a custom batch-driven backup system that had a third-party tool copying new and changed files to a backup drive nightly. For the life of me I can't recall what the name of that tool was at this point. I might be able to find it, but not right now.
The best cheap, commercial comparison tool out there is Beyond Compare from http://www.scootersoftware.com/; it is hands down a brilliant tool. Its usefulness is immediate, and anyone who works with files professionally would benefit from it every day. Try it. There is a command line version included.
Other than that, Robocopy.exe should be able to accomplish what you want with some patience and testing.
Another tip: to avoid a backup disaster, I ran the backup script under a low-privilege account, so that it could not delete anything if someone messed with the script, and so that the account had no rights at all if someone tried to log on with it. I think I set the account as non-interactive, or unable to log on interactively, or something like that. I highly recommend this for batch jobs on Windows. Just thought I'd mention it since you are coming from the world of Unix.