What algorithm does git use to detect changes on your working tree?

Git’s index stores, for each tracked file, the timestamps the file had when Git last wrote it into the working tree (and updates these whenever files are cached from the working tree or from a commit). You can see the metadata with git ls-files --debug. In addition to the timestamps, the index records the size, inode, and other information from lstat to reduce the chance of mistaking a changed file for an unchanged one.

When you run git status, Git simply calls lstat on every tracked file in the working tree and compares the result with the cached metadata in order to quickly determine which files are unchanged. This is described in the documentation under racy-git and update-index.
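To make the comparison concrete, here is a minimal sketch in C. It is not Git's code: cached_stat, snapshot, and looks_unchanged are names I made up for illustration, and the real index entry holds more fields than this.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical cached record; Git's real index entry stores more than this. */
struct cached_stat {
    time_t mtime;
    time_t ctime;
    off_t  size;
    ino_t  ino;
    dev_t  dev;
    mode_t mode;
};

/* Record the current lstat fields of `path`. Returns 0 on success, -1 on error. */
int snapshot(const char *path, struct cached_stat *out)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return -1;
    out->mtime = st.st_mtime;
    out->ctime = st.st_ctime;
    out->size  = st.st_size;
    out->ino   = st.st_ino;
    out->dev   = st.st_dev;
    out->mode  = st.st_mode;
    return 0;
}

/* True only if every cached field still matches what lstat reports now.
 * A missing or unreadable file counts as changed. */
bool looks_unchanged(const char *path, const struct cached_stat *cached)
{
    struct cached_stat now;
    if (snapshot(path, &now) != 0)
        return false;
    return now.mtime == cached->mtime && now.ctime == cached->ctime &&
           now.size  == cached->size  && now.ino   == cached->ino   &&
           now.dev   == cached->dev   && now.mode  == cached->mode;
}
```

The point of the design is that no file content is read at all: one lstat call per file is enough to rule out most files as unchanged.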


On a Unix file system, this per-file information is tracked by the file system itself and can be accessed using the lstat call. The stat structure contains multiple timestamps, size information, and more:

struct stat {
    dev_t     st_dev;     /* ID of device containing file */
    ino_t     st_ino;     /* inode number */
    mode_t    st_mode;    /* protection */
    nlink_t   st_nlink;   /* number of hard links */
    uid_t     st_uid;     /* user ID of owner */
    gid_t     st_gid;     /* group ID of owner */
    dev_t     st_rdev;    /* device ID (if special file) */
    off_t     st_size;    /* total size, in bytes */
    blksize_t st_blksize; /* blocksize for file system I/O */
    blkcnt_t  st_blocks;  /* number of 512B blocks allocated */
    time_t    st_atime;   /* time of last access */
    time_t    st_mtime;   /* time of last modification */
    time_t    st_ctime;   /* time of last status change */
};
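If you want to look at these fields for a file of your own, a short sketch like the following should do. print_stat is a hypothetical helper name, and the casts are only there because the exact integer types behind ino_t, off_t, and time_t vary between systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Print a few of the fields listed above for `path`.
 * Returns 0 on success, -1 if lstat fails. */
int print_stat(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0) {
        perror(path);
        return -1;
    }
    printf("inode: %llu\n", (unsigned long long)st.st_ino);
    printf("mode:  %lo\n", (unsigned long)st.st_mode);
    printf("size:  %lld bytes\n", (long long)st.st_size);
    printf("mtime: %lld\n", (long long)st.st_mtime);
    printf("ctime: %lld\n", (long long)st.st_ctime);
    return 0;
}
```

Calling print_stat("foo") before and after editing foo shows which fields move.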

It seems that initially Git simply relied on this stat structure to decide if a file had been changed (see reference):

When checking if they differ, Git first runs lstat(2) on the files and compares the result with this information

However, a race condition was reported (racy-git): the stat comparison misses a change if a file is modified in the following manner:

: modify 'foo'
$ git update-index 'foo'
: modify 'foo' again, in-place, without changing its size
                      (And quickly enough to not change its timestamps)

This left the file modified, but in a way that lstat could not detect.

To fix this issue, in situations where the lstat state is ambiguous, Git now compares the contents of the file to determine whether it has changed.


NOTE:

If anyone is confused, like I was, about the st_mtime description, which states that it is updated by writes "of more than zero bytes": this refers to the number of bytes written, not the net change in file size.

For example, take a text file containing the single character A: if A is changed to B, the net change in total byte size is 0, but st_mtime will still be updated (I had to try it myself to verify; use ls -l to see the timestamp).
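The same experiment can be written as a small C sketch. rewrite_same_size is a made-up name, and backdating the file by an hour with utime is my own addition so that the timestamp change is visible even on file systems with one-second resolution.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/* Write the single byte `c` at the start of `path`, opened with `mode`. */
static int put_byte(const char *path, const char *mode, char c)
{
    FILE *f = fopen(path, mode);
    if (!f)
        return -1;
    int ok = (fputc(c, f) != EOF);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Create `path` containing "A", backdate its mtime by one hour, then
 * overwrite the A with B in place. `before` and `after` receive the
 * lstat results around the overwrite. Returns 0 on success, -1 on error. */
int rewrite_same_size(const char *path, struct stat *before, struct stat *after)
{
    if (put_byte(path, "wb", 'A') != 0)
        return -1;
    time_t hour_ago = time(NULL) - 3600;
    struct utimbuf old = { .actime = hour_ago, .modtime = hour_ago };
    if (utime(path, &old) != 0 || lstat(path, before) != 0)
        return -1;
    if (put_byte(path, "r+b", 'B') != 0) /* "r+b" does not truncate */
        return -1;
    return lstat(path, after);
}
```

After a successful call, before->st_size and after->st_size are both 1, while after->st_mtime is later than before->st_mtime.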