Finding most changed files in Git

Solution 1:

You could do something like the following:

git log --pretty=format: --name-only | sort | uniq -c | sort -rg | head -10

The log just outputs the names of the files that have been changed in each commit, while the rest of it just sorts and outputs the top 10 most frequently appearing filenames.

Solution 2:

you can use the git effort (from the git-extras package) command which shows statistics about how many commits per files (by commits and active days).

EDIT: git effort is just a bash script you can find here and adapt to your needs if you need something more special.

Solution 3:

I noticed that both Mark’s and sehe’s answers do not --follow the files, that is to say they stop once they reach a file rename. This script will be much slower, but will work for that purpose.

git ls-files |
while read aa
do
  printf . >&2
  set $(git log --follow --oneline "$aa" | wc)
  printf '%s\t%s\n' $1 "$aa"
done > bb
echo
sort -nr bb
rm bb

Solution 4:

Old question, but I think still a very useful question. Here is a working example in straight powershell. This will get the top 10 most changed files in your repo with respect to the branch you are on.

git log --pretty=format: --name-only | Where-Object { ![string]::IsNullOrEmpty($_) } | Sort-Object | Group-Object  | Sort-Object -Property Count -Descending | Select-Object -Property Count, Name -First 10

Solution 5:

This is a windows version

git log --pretty=format: --name-only  > allfiles.csv

then open in excel

A1: FileName
A2: isVisibleFilename  >> =IFERROR(IF(C2>0,TRUE,FALSE),FALSE)
A3: DotLocation >> =FIND("@",SUBSTITUTE(A2,".","@",(LEN(A2)-LEN(SUBSTITUTE(A2,".","")))/LEN(".")))
A4: HasExt       >> =C2>1
A5: TYPE        >> =IF(D2=TRUE,MID(A2,C2+1,18),"")

create pivot table

values: Type
  Filter: isFilename = true
  Rows : Type
  Sub : FileName

click [Count Of TYPE] -> Sort -> Sort Largest To Smallest