Why is my git repository so big?

145M = .git/objects/pack/

I wrote a script to add up the sizes of differences of each commit and the commit before it going backwards from the tip of each branch. I get 129MB, which is without compression and without accounting for same files across branches and common history among branches.

Git takes all those things into account so I would expect much much smaller repository. So why is .git so big?

I've done:

git fsck --full
git gc --prune=today --aggressive
git repack

To answer about how many files/commits, I have 19 branches about 40 files in each. 287 commits, found using:

git log --oneline --all|wc -l

It should not be taking 10's of megabytes to store information about this.


Some scripts I use:

git-fatfiles

git rev-list --all --objects | \
    sed -n $(git rev-list --objects --all | \
    cut -f1 -d' ' | \
    git cat-file --batch-check | \
    grep blob | \
    sort -n -k 3 | \
    tail -n40 | \
    while read hash type size; do 
         echo -n "-e s/$hash/$size/p ";
    done) | \
    sort -n -k1
...
89076 images/screenshots/properties.png
103472 images/screenshots/signals.png
9434202 video/parasite-intro.avi

If you want more lines, see also Perl version in a neighbouring answer: https://stackoverflow.com/a/45366030/266720

git-eradicate (for video/parasite.avi):

git filter-branch -f  --index-filter \
    'git rm --force --cached --ignore-unmatch video/parasite-intro.avi' \
     -- --all
rm -Rf .git/refs/original && \
    git reflog expire --expire=now --all && \
    git gc --aggressive && \
    git prune

Note: the second script is designed to remove info from Git completely (including all info from reflogs). Use with caution.


I recently pulled the wrong remote repository into the local one (git remote add ... and git remote update). After deleting the unwanted remote ref, branches and tags I still had 1.4GB (!) of wasted space in my repository. I was only able to get rid of this by cloning it with git clone file:///path/to/repository. Note that the file:// makes a world of difference when cloning a local repository - only the referenced objects are copied across, not the whole directory structure.

Edit: Here's Ian's one liner for recreating all branches in the new repo:

d1=#original repo
d2=#new repo (must already exist)
cd $d1
for b in $(git branch | cut -c 3-)
do
    git checkout $b
    x=$(git rev-parse HEAD)
    cd $d2
    git checkout -b $b $x
    cd $d1
done

git gc already does a git repack so there is no sense in manually repacking unless you are going to be passing some special options to it.

The first step is to see whether the majority of space is (as would normally be the case) your object database.

git count-objects -v

This should give a report of how many unpacked objects there are in your repository, how much space they take up, how many pack files you have and how much space they take up.

Ideally, after a repack, you would have no unpacked objects and one pack file but it's perfectly normal to have some objects which aren't directly reference by current branches still present and unpacked.

If you have a single large pack and you want to know what is taking up the space then you can list the objects which make up the pack along with how they are stored.

git verify-pack -v .git/objects/pack/pack-*.idx

Note that verify-pack takes an index file and not the pack file itself. This give a report of every object in the pack, its true size and its packed size as well as information about whether it's been 'deltified' and if so the origin of delta chain.

To see if there are any unusally large objects in your repository you can sort the output numerically on the third of fourth columns (e.g. | sort -k3n).

From this output you will be able to see the contents of any object using the git show command, although it is not possible to see exactly where in the commit history of the repository the object is referenced. If you need to do this, try something from this question.


Just FYI, the biggest reason why you may end up with unwanted objects being kept around is that git maintains a reflog.

The reflog is there to save your butt when you accidentally delete your master branch or somehow otherwise catastrophically damage your repository.

The easiest way to fix this is to truncate your reflogs before compressing (just make sure that you never want to go back to any of the commits in the reflog).

git gc --prune=now --aggressive
git repack

This is different from git gc --prune=today in that it expires the entire reflog immediately.