How to show top 10 largest items from git history
Solution 1:
To save space, git packs the objects stored in the repository into a .pack
file. The pack file contains the actual git objects, and the accompanying .idx
file contains the index used to quickly locate objects within the pack file.
$ git verify-pack -v .git/objects/pack/pack-7b03cc896f31b2441f3a791ef760bd28495697e6.idx
The above command reads the given .idx
file and verifies it against the corresponding pack file. With -v
the output is verbose and lists every object in the pack, one per line.
The third column of the output is the size of the object. Piping the output through sort -k 3 -n
sorts it numerically by the 3rd column (the size), and tail -10
keeps the last 10 lines, which are the largest objects.
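The sort and tail steps described above can be wrapped in a small helper. This is only a sketch: the function name top_packed_objects is made up for illustration, and it assumes the usual verify-pack -v line layout (hash, type, size, ...). Unlike a plain sort, it also drops the summary lines that verify-pack prints after the object list.

```shell
# Hypothetical helper: reads `git verify-pack -v` output on stdin and
# prints "size type hash" for the N largest objects (N defaults to 10).
top_packed_objects() {
    n="${1:-10}"
    # Keep only real object lines; skip "non delta: ..." and "...pack: ok".
    awk '$2 ~ /^(blob|tree|commit|tag)$/ { print $3, $2, $1 }' |
        sort -k1,1 -n |
        tail -n "$n"
}

# Usage, from the top level of a repository that has at least one pack:
#   git verify-pack -v .git/objects/pack/pack-*.idx | top_packed_objects 10
```

Using the pack-*.idx glob avoids typing the pack hash by hand and covers repositories with more than one pack file.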
To get the name of a file from its hash (this only matches files that exist in HEAD):
$ git ls-tree -r HEAD | grep HASH
To list the names of all ten largest objects at once:
$ git verify-pack -v .git/objects/pack/pack-1daab5282d01ab18db98e21a985eb2d288f7faa0.idx | sort -k 3 -n | tail | cut -f1 -d' ' | while read i; do git ls-tree -r HEAD | grep "$i"; done
100644 blob 6209b3840fa470a534e670cff93bce698ba60819 .bashrc
100644 blob 1131e7127cb2cf6c1f854f728a1794262cdf85f6 .vimrc
100644 blob a249a5ae9b33553f4484da42a019ed14e5f44e21 .vim/colors/clrs.vim
100644 blob f329f223953827e59954f67ad4d76568b6dd894e .config/openbox/rc.xml
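The ls-tree lookup above only sees files that are present in HEAD, so a large object that was deleted in a later commit gets no name. One possible workaround, sketched here, is to match the hashes against the output of git rev-list --objects --all, which prints every reachable object together with its path. The function name lookup_paths and the file name hashes.txt are made up for illustration.

```shell
# Hypothetical helper: $1 is a file with one hash per line; stdin carries
# "hash path" lines. Prints the lines whose hash is listed in $1.
lookup_paths() {
    # First pass (the hash file) fills the lookup table; second pass (stdin)
    # prints matching lines that actually carry a path (NF > 1).
    awk 'NR == FNR { want[$1] = 1; next } ($1 in want) && NF > 1' "$1" -
}

# Usage, from the top level of a repository:
#   git verify-pack -v .git/objects/pack/pack-*.idx |
#       sort -k 3 -n | tail | cut -f1 -d' ' > hashes.txt
#   git rev-list --objects --all | lookup_paths hashes.txt
```

Reading the rev-list output once is also cheaper than running git ls-tree once per hash inside a loop.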
Read more:
$ git verify-pack --help
Unpacking Git packfiles
Git Internals - Packfiles
Git - finding a filename from a SHA1
Solution 2:
Here is another neat solution to this problem using git's ls-tree sub-command. Note that it only looks at the files in HEAD, not at the whole history:
$ git ls-tree -rl HEAD | sort -k4 -n | tail | awk '{print $4, $5}' |
numfmt --to=iec-i
4.0Ki .bashrc
4.0Ki .config/conky/conky.conf
4.5Ki .config/rofi/config.rasi
5.4Ki .vim/notes
7.2Ki .config/tint2/tint2rc
7.5Ki .bash_functions
7.5Ki .vimrc
19Ki .vim/colors/clrs.vim
38Ki .config/openbox/rc.xml
63Ki .config/ipfilter.dat
- -r lists the files recursively.
- -l shows the object size of blob (file) entries.
- sort -k4 -n sorts numerically on the 4th column (the size).
- tail keeps the last 10 items.
- awk prints only the 4th and 5th columns of the output (size and path).
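One caveat with the awk step is that $5 holds only the first word of the path, so file names containing spaces are cut short. Since git ls-tree -l separates the path from the other fields with a tab, a variant that splits on the tab keeps the full name. This is a sketch under that assumption, and the function name largest_tree_entries is made up for illustration.

```shell
# Hypothetical helper: reads `git ls-tree -rl` output on stdin and prints
# "size<TAB>path" for the N largest blobs (N defaults to 10).
largest_tree_entries() {
    n="${1:-10}"
    # $1 is "mode type hash size", $2 is the full path (may contain spaces).
    awk -F '\t' '{ split($1, meta, " "); if (meta[2] == "blob") print meta[4] "\t" $2 }' |
        sort -k1,1 -n |
        tail -n "$n"
}

# Usage:
#   git ls-tree -rl HEAD | largest_tree_entries 10
```

Submodule entries, whose size is shown as "-", are skipped because only blob entries are printed. If GNU numfmt is available, the result can still be piped through numfmt --to=iec-i as in the command above.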