How does Time Machine estimate required space for backup?

Much of what happens in Time Machine is undisclosed, but we know roughly how the algorithm works, so an educated guess is possible:

Time Machine stores full versions of changed files on every backup, not incremental differences. At the outset of a backup it searches your filesystem for files that have been modified since the last backup date, which cuts down what it has to look at. Then it compares each file on this list against the version that was last backed up. If the file hasn't actually changed, it just stores a pointer to the old version, with some new metadata for the inode changes.

You've probably got timestamps changing on lots of files, but not the actual contents. So at the outset of a backup, Time Machine adds up the size of all the files with changed timestamps and gets 15GB, but once it does the file diffs, far less data turns out to have actually changed. In the interest of expedience, Time Machine is likely making the cleanup-space decision up front, based on the worst-case estimate.
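To make the gap between the two numbers concrete, here is a minimal Python sketch of the guess described above. It is not Time Machine's actual implementation: the function names, the SHA-256 comparison, and the `previous_digests` dictionary (a stand-in for the previous backup) are all assumptions made for illustration.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def estimate_backup(source_root, previous_digests, last_backup_time):
    """Return (worst_case_bytes, actual_bytes) for a hypothetical backup.

    previous_digests: dict mapping a path relative to source_root to the
        digest of the version stored in the last backup (hypothetical
        stand-in for the previous snapshot).
    last_backup_time: time of the last backup, in seconds since the epoch.
    """
    worst_case = 0
    actual = 0
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            # Step 1: only files touched since the last backup are candidates.
            if st.st_mtime <= last_backup_time:
                continue
            # Up-front estimate: assume every candidate must be copied in full.
            worst_case += st.st_size
            # Step 2: compare contents with the previously backed-up version.
            rel = os.path.relpath(path, source_root)
            if previous_digests.get(rel) != file_digest(path):
                actual += st.st_size
    return worst_case, actual
```

A file that was only touched (new timestamp, same contents) adds to `worst_case` but not to `actual`, which is the kind of discrepancy described above.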

It can help to exclude some often-changing data from Time Machine backups, such as the browser cache and the IMAP offline mail archive (for example, the local cache of your Gmail account). You can use the excellent and free TimeTracker tool from CharlesSoft to view the contents of your backups on a Time Machine volume. This may give you an idea of what has changed from backup to backup, so you can build a good exclusions list. For my recommendations on excluding things from Time Machine backups, see this Ask Different answer.


No, Time Machine does not use the Modified Date timestamps.

Ordinarily, it uses the File System Event Store, a log of changes to the file system. (That's also how it can usually figure out the differences so quickly, since it doesn't have to look at every file and folder on your Mac.)

But in some cases it does have to do that -- after a failed backup, for example. In that case it still doesn't use the dates; it compares everything on your system to the most recent backup, so it will be in the "Preparing" or "Calculating Changes" phase for a while, and there will be a message in your system.log about a "deep traversal" (prior to Lion) or a "deep scan."
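If you'd rather not read the log by eye, a small Python sketch like the following can pick out those messages. The exact wording of the log lines isn't given here, so the pattern simply looks for the two phrases mentioned above, case-insensitively; the function name is made up for illustration, and the log path in the usage note is an assumption.

```python
import re

# Matches the two phrases mentioned above; the full message format is
# not assumed, only that one of these phrases appears somewhere in the line.
DEEP_SCAN_PATTERN = re.compile(r"deep (traversal|scan)", re.IGNORECASE)


def find_deep_scan_lines(lines):
    """Return the log lines that mention a deep traversal or deep scan."""
    return [line.rstrip("\n") for line in lines if DEEP_SCAN_PATTERN.search(line)]
```

You could call it with an open file, for example `find_deep_scan_lines(open("/var/log/system.log", errors="replace"))`, assuming that is where your system.log lives.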

See #D4 in Time Machine - Troubleshooting for some of the common culprits, and ways to see what's going on.