git hard links - does it know a file is a hard link?

Solution 1:

Multiply linked tracked files will not cause Git’s object store to grow much since each link will be represented by the exact same blob object. Your working tree, however, might end up growing due to broken links.
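To illustrate why the object store does not grow, here is a minimal sketch of how a blob ID is derived from content alone. It assumes a repository using the default SHA-1 object format; the helper names (`blob_id`, `blob_id_of_file`, `demo`) and the file names are made up for the example.

```python
import hashlib
import os
import tempfile


def blob_id(data):
    """Return the SHA-1 ID Git assigns to a blob with this content."""
    # A blob is hashed as: "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def blob_id_of_file(path):
    """Hash the content found at a pathname; the pathname itself is not hashed."""
    with open(path, "rb") as f:
        return blob_id(f.read())


def demo():
    """Create one file with two hard-linked names and hash both names."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "a.txt")    # hypothetical file names
        second = os.path.join(d, "b.txt")
        with open(first, "wb") as f:
            f.write(b"hello\n")
        os.link(first, second)              # second name for the same file
        return blob_id_of_file(first), blob_id_of_file(second)


if __name__ == "__main__":
    id_a, id_b = demo()
    print(id_a == id_b)
```

Since only the content goes into the hash, any number of links (or plain copies) with the same content map to a single blob.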

Git does not track whether tracked, working tree files are hard links to the same file.

Git will leave multiply linked, tracked working-tree files alone as long as you do not ask it to do anything that modifies the content at those pathnames or deletes their directory entries. But if you, for example, check out an old commit or branch and then switch back to your usual, most recent branch or commit, Git will end up “breaking” the hard links: it replaces the affected pathnames with new (but identical) files instead of recreating your multiply linked situation.
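The effect described above can be reproduced without Git. The sketch below only simulates a tool that replaces a file by removing the directory entry and writing a new file, which is the behavior this answer attributes to a checkout; `simulate_rewrite`, `same_file`, and the file names are hypothetical.

```python
import os
import tempfile


def same_file(p, q):
    """True if both pathnames are directory entries for the same file."""
    sp, sq = os.stat(p), os.stat(q)
    return (sp.st_dev, sp.st_ino) == (sq.st_dev, sq.st_ino)


def simulate_rewrite(path, content):
    """Stand-in for a checkout: drop the old entry, then create a new file."""
    os.unlink(path)
    with open(path, "wb") as f:
        f.write(content)


def demo():
    with tempfile.TemporaryDirectory() as d:
        tracked = os.path.join(d, "tracked.txt")     # the "working tree" name
        other = os.path.join(d, "other-name.txt")    # the second link
        with open(tracked, "wb") as f:
            f.write(b"v2\n")
        os.link(tracked, other)
        linked_before = same_file(tracked, other)

        # Switch to an old version and back again: the content at the
        # tracked name ends up identical, but it is now a different file.
        simulate_rewrite(tracked, b"v1\n")
        simulate_rewrite(tracked, b"v2\n")
        linked_after = same_file(tracked, other)

        with open(tracked, "rb") as f1, open(other, "rb") as f2:
            identical_content = f1.read() == f2.read()
        return linked_before, linked_after, identical_content


if __name__ == "__main__":
    print(demo())
```

The two names end up with identical content but no longer share storage, which is how the working tree can grow.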

To recover your multiply linked status, you could write a program that scans for identical files and relinks them to any one of the copies. Such a “relink” operation may be more complicated if not all of the links are in the working tree itself or, at least, in some easily identifiable “external” location. For example, it will probably be difficult to recover the links if you are linking “random” files from all over your home directory into a “backup” repository and using Git to modify the working tree.
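A rough sketch of such a relink program follows, assuming every link lives under one directory tree on a single filesystem. The function name `relink_identical` and the `.relink-tmp` suffix are invented for this example, and it hashes every file for simplicity rather than speed. Try it on a copy of your data first.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def file_digest(path):
    """SHA-256 of the file content, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def relink_identical(root):
    """Hard-link identical regular files under root; return how many were relinked."""
    groups = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")          # leave Git's own files alone
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue                     # skip symlinks and special files
            st = os.stat(path)
            # Links share one mode and cannot cross devices, so both are in the key.
            key = (st.st_dev, st.st_mode, st.st_size, file_digest(path))
            groups[key].append(path)

    relinked = 0
    for paths in groups.values():
        keep = paths[0]                      # any one of the identical files
        keep_ino = os.stat(keep).st_ino
        for dup in paths[1:]:
            if os.stat(dup).st_ino == keep_ino:
                continue                     # already a link to the kept file
            tmp = dup + ".relink-tmp"        # hypothetical temporary name
            os.link(keep, tmp)               # make the new link first ...
            os.replace(tmp, dup)             # ... then swap it into place
            relinked += 1
    return relinked


def demo():
    with tempfile.TemporaryDirectory() as d:
        for name, data in (("a", b"same\n"), ("b", b"same\n"), ("c", b"other\n")):
            with open(os.path.join(d, name), "wb") as f:
                f.write(data)
        count = relink_identical(d)
        a, b, c = (os.path.join(d, n) for n in "abc")
        return count, os.path.samefile(a, b), os.path.samefile(a, c)


if __name__ == "__main__":
    print(demo())
```

Creating the link under a temporary name and then renaming it over the duplicate means the pathname never disappears, even if the program is interrupted midway. Links that live outside `root` are not restored by this approach, which is the complication mentioned above.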

The idea has come up on the Git mailing list:

  • wanting Git to break links so that repositories cloned with cp -a are independent
  • proposed core.keepHardLinks (never integrated into any released Git?)
  • a similar question to this one
  • probably others…