Ways to improve git status performance

I have a repo of 10 GB on a Linux machine which is on NFS. The first time git status takes 36 minutes and subsequent git status takes 8 minutes. Seems Git depends on the OS for caching files. Only the first git commands like commit, status that involves pack/repack the whole repo takes a very long time for a huge repo. I am not sure if you have used git status on such a large repo, but has anyone come across this issue?

I have tried git gc, git clean, git repack but the time taken is still/almost the same.

Will sub-modules or any other concepts like breaking the repo into smaller ones help? If so which is the best for splitting a larger repo. Is there any other way to improve time taken for git commands on a large repo?


Solution 1:

To be more precise, git depends on the efficiency of the lstat(2) system call, so tweaking your client’s “attribute cache timeout” might do the trick.

The manual for git-update-index — essentially a manual mode for git-status — describes what you can do to alleviate this, by using the --assume-unchanged flag to suppress its normal behavior and manually update the paths that you have changed. You might even program your editor to unset this flag every time you save a file.

The alternative, as you suggest, is to reduce the size of your checkout (the size of the packfiles doesn’t really come into play here). The options are a sparse checkout, submodules, or Google’s repo tool.

(There’s a mailing list thread about using Git with NFS, but it doesn’t answer many questions.)

Solution 2:

I'm also seeing this problem on a large project shared over NFS.

It took me some time to discover the flag -uno that can be given to both git commit and git status.

What this flag does is to disable looking for untracked files. This reduces the number of nfs operations significantly. The reason is that in order for git to discover untracked files it has to look in all subdirectories so if you have many subdirectories this will hurt you. By disabling git from looking for untracked files you eliminate all these NFS operations.

Combine this with the core.preloadindex flag and you can get resonable perfomance even on NFS.

Solution 3:

Try git gc. Also, git clean may help.

UPDATE - Not sure where the down vote came from, but the git manual specifically states:

Runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance) and removing unreachable objects which may have been created from prior invocations of git add.

Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.

I always notice a difference after running git gc when git status is slow!

UPDATE II - Not sure how I missed this, but the OP already tried git gc and git clean. I swear that wasn't originally there, but I don't see any changes in the edits. Sorry for that!