How to remove unreferenced blobs from my Git repository
I have a GitHub repository that had two branches - master and release.
The release branch contained binary distribution files that were contributing to a very large repository size (more than 250 MB), so I decided to clean things up.
First I deleted the remote release branch, via git push origin :release.
Then I deleted the local release branch. First I tried git branch -d release, but Git said "error: The branch 'release' is not an ancestor of your current HEAD." That is true, so I then ran git branch -D release to force the deletion.
But my repository size, both locally and on GitHub, was still huge. So I ran through the usual list of Git commands, such as git gc --prune=today --aggressive, without any luck.
By following Charles Bailey's instructions at SO 1029969 I was able to get a list of SHA-1 hashes for the biggest blobs. I then used the script from SO 460331 to find the blobs...and the five biggest don't exist, though smaller blobs are found, so I know the script is working.
I think these blobs are the binaries from the release branch, and they somehow got left behind after that branch was deleted. What's the right way to get rid of them?
I present to you this useful command, "git-gc-all", guaranteed to remove all your Git garbage, at least until they come up with extra configuration variables:
git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now gc
You might also need to run something like these first:
git remote rm origin
rm -rf .git/refs/original/ .git/refs/remotes/ .git/*_HEAD .git/logs/
git for-each-ref --format="%(refname)" refs/original/ | xargs -n1 --no-run-if-empty git update-ref -d
You might also need to remove some tags[1]:
git tag | xargs git tag -d
I put all this in a script: git-gc-all-ferocious.
[1]. Credit: Zitrax' comment
You can (as detailed in this answer) permanently remove everything that is referenced only in the reflog.
NOTE: This will remove many objects you might want to keep: Stashes; Old history not in any current branches; etc. Read the documentation to be sure this is what you want.
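Before running anything destructive, it can help to preview what would be lost. The sketch below assumes that git fsck with --unreachable and --no-reflogs gives a reasonable approximation of what the prune will delete; prune_preview is a hypothetical helper name, not a Git command, and the list is a rough guide rather than an exact prediction.

```shell
# prune_preview (hypothetical helper): list objects that no branch, tag, or
# index entry reaches once reflog entries are ignored.
prune_preview() {
    # --unreachable : report objects that no ref can reach
    # --no-reflogs  : do not count reflog entries as references
    # --cache       : treat objects recorded in the index as reachable
    git fsck --unreachable --no-reflogs --cache 2>/dev/null |
        awk '$1 == "unreachable" { print $2, $3 }'   # prints "<type> <sha>"
}
```

Run prune_preview from inside the repository. If the output includes commits or blobs you still want, create a branch or tag pointing at them first.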
To expire the reflog, and then prune all objects not in branches:
git reflog expire --expire-unreachable=now --all
git gc --prune=now
git reflog expire --expire-unreachable=now --all removes all references to unreachable commits from the reflog.
git gc --prune=now removes the commits themselves.
Attention: Using only git gc --prune=now will not work, since those commits are still referenced in the reflog. Therefore, clearing the reflog is mandatory. Also note that if you use rerere, it holds additional references that these commands do not clear. See git help rerere for more details. In addition, any commits referenced by local or remote branches or tags will not be removed, because Git considers them valuable data.
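To see the difference for yourself without risking a real repository, here is a minimal sketch that reproduces the situation in a throwaway directory. The file name big.bin, the demo identity, and the variable names are all made up for illustration; it assumes a default Git configuration with reflogs enabled.

```shell
#!/bin/sh
# Throwaway demo: everything happens in a fresh temporary repository.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"   # made-up identity, needed to commit
git config user.name "Demo"
git commit -q --allow-empty -m "base"
main=$(git symbolic-ref --short HEAD)       # whatever the default branch is called

# Commit a file on a side branch, then delete that branch.
git checkout -q -b release
printf 'pretend this is a large binary\n' > big.bin
git add big.bin
git commit -q -m "add binary"
blob=$(git rev-parse HEAD:big.bin)
git checkout -q "$main"
git branch -q -D release

# Step 1: gc alone. HEAD's reflog still points at the deleted branch's commit.
git gc -q --prune=now
if git cat-file -e "$blob" 2>/dev/null; then after_gc=present; else after_gc=gone; fi

# Step 2: expire the reflog first, then gc again.
git reflog expire --expire-unreachable=now --all
git gc -q --prune=now
if git cat-file -e "$blob" 2>/dev/null; then after_expire=present; else after_expire=gone; fi

echo "after gc only:            $after_gc"
echo "after reflog expire + gc: $after_expire"
```

With a default setup I would expect the blob to be reported as present after the first gc and gone after the second, which matches the explanation above.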