Count number of lines in a git repository
How would I count the total number of lines present in all the files in a git repository?
git ls-files gives me a list of files tracked by git.
I'm looking for a command to cat all those files. Something like:
git ls-files | [cat all these files] | wc -l
Solution 1:
xargs will let you cat all the files together before passing them to wc, like you asked:
git ls-files | xargs cat | wc -l
But skipping the intermediate cat gives you more information (a per-file count as well as the total) and is probably better:
git ls-files | xargs wc -l
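One caveat with both pipelines: xargs splits its input on whitespace, so tracked files with spaces in their names get mangled. A minimal sketch that sidesteps this by passing NUL-delimited names (the function name count_tracked_lines is made up for illustration):

```shell
# Print the total number of lines across all tracked files.
# -z makes git emit NUL-terminated paths; -0 makes xargs read them,
# so paths containing spaces or newlines survive intact.
count_tracked_lines() {
    git ls-files -z | xargs -0 cat | wc -l
}
```

Run count_tracked_lines from inside the repository. Swapping `cat | wc -l` for a plain `wc -l` gives the per-file breakdown instead.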
Solution 2:
git diff --stat 4b825dc642cb6eb9a060e54bf8d69288fbee4904
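This shows the differences from the empty tree (4b825dc642cb6eb9a060e54bf8d69288fbee4904 is its hash) to your current working tree, which happens to count all lines in your current working tree.
To get just the summary numbers, and to compute the empty tree's hash instead of hardcoding it, do this: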
git diff --shortstat `git hash-object -t tree /dev/null`
It will give you a string like 1770 files changed, 166776 insertions(+).
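If you only want the bare number (say, for a script), the insertions figure can be pulled out of that string. A rough sketch, assuming the English "N insertions(+)" wording shown above; the function name is hypothetical:

```shell
# Print only the insertion count from the --shortstat summary,
# i.e. the number of lines in tracked, non-binary files.
count_lines_vs_empty_tree() {
    # Compute the empty tree's hash rather than hardcoding it.
    empty_tree=$(git hash-object -t tree /dev/null)
    git diff --shortstat "$empty_tree" |
        sed -n 's/.* \([0-9][0-9]*\) insertion.*/\1/p'
}
```

It prints nothing if there are no insertions to report, so a calling script should treat empty output as zero.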
Solution 3:
If you want this count because you want to get an idea of the project’s scope, you may prefer the output of CLOC (“Count Lines of Code”), which gives you a breakdown of significant and insignificant lines of code by language.
cloc $(git ls-files)
(This line is equivalent to git ls-files | xargs cloc. It uses sh’s $() command substitution feature.)
Sample output:
      20 text files.
      20 unique files.
       6 files ignored.

http://cloc.sourceforge.net v 1.62  T=0.22 s (62.5 files/s, 2771.2 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Javascript                       2             13            111            309
JSON                             3              0              0             58
HTML                             2              7             12             50
Handlebars                       2              0              0             37
CoffeeScript                     4              1              4             12
SASS                             1              1              1              5
-------------------------------------------------------------------------------
SUM:                            14             22            128            471
-------------------------------------------------------------------------------
You will have to install CLOC first. You can probably install cloc with your package manager, for example brew install cloc with Homebrew.
cloc $(git ls-files) is often an improvement over cloc . alone. For example, the above sample output with git ls-files reports 471 lines of code. For the same project, cloc . reports a whopping 456,279 lines (and takes six minutes to run), because it searches the dependencies in the Git-ignored node_modules folder.
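Newer cloc releases can ask git for the file list themselves, which avoids the command substitution and any trouble with spaces in file names. A sketch, assuming your installed cloc is recent enough to support the --vcs option (the 1.62 shown in the sample output is likely too old); cloc_tracked is just an illustrative name:

```shell
# Count lines of code in git-tracked files only.
# --vcs=git tells cloc to obtain its file list from `git ls-files`,
# so ignored directories such as node_modules are skipped.
cloc_tracked() {
    cloc --vcs=git
}
```

Run it from the repository root. If the option is not recognised, fall back to cloc $(git ls-files).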
Solution 4:
I've encountered batching problems with git ls-files | xargs wc -l when dealing with large numbers of files, where the line counts will get chunked out into multiple total lines.
Taking a tip from question Why does the wc utility generate multiple lines with "total"?, I've found the following command to bypass the issue:
wc -l $(git ls-files)
Or, if you only want to examine some files, e.g. C# code:
wc -l $(git ls-files | grep '\.cs$')
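Another way around the multiple total lines is to keep xargs and add the per-file counts up yourself. A minimal sketch, assuming no tracked file in the repository root is literally named "total" (count_lines_for is a hypothetical helper name):

```shell
# Sum line counts for tracked files matching the given pathspecs.
# With no arguments, every tracked file is counted.
count_lines_for() {
    git ls-files -z -- "$@" |
        xargs -0 wc -l |
        # Skip the per-batch "total" rows that wc prints, add the rest.
        awk '$2 != "total" { sum += $1 } END { print sum + 0 }'
}
```

For example, count_lines_for '*.cs' counts only C# files. The pathspec must match the whole path, so it does not pick up .css files, and because xargs still batches the arguments this should also hold up on repositories too large for a single command line.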
Solution 5:
The best solution, to me anyway, is buried in the comments of @ephemient's answer. I am just pulling it up here so that it doesn't go unnoticed. The credit for this should go to @FRoZeN (and @ephemient).
git diff --shortstat `git hash-object -t tree /dev/null`
returns the total number of files and lines in the working directory of a repo, without any additional noise. As a bonus, only the source code is counted: binary files are excluded from the tally.
The command above works on Linux and OS X. The cross-platform version of it is:
git diff --shortstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904
That works on Windows, too.
For the record, the options for excluding blank lines (-w/--ignore-all-space, -b/--ignore-space-change, --ignore-blank-lines, --ignore-space-at-eol) don't have any effect when used with --shortstat. Blank lines are counted.
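If blank lines do need to be left out, one option is to bypass git diff and filter the file contents directly. A rough sketch, assuming the tracked files are text and treating whitespace-only lines as blank (count_nonblank_lines is a made-up name):

```shell
# Count lines in tracked files that contain at least one
# non-whitespace character.
count_nonblank_lines() {
    git ls-files -z |
        xargs -0 cat |
        # -v inverts the match (drop blank lines), -c prints the count.
        grep -cv '^[[:space:]]*$'
}
```

Unlike the --shortstat approach, this does not skip binary files, so it is best restricted to source files on repositories that track binaries.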