What's the use of the staging area in Git?
What is the point of git add . or git add <filename> to add it to the staging area? Why not just git commit -m "blabla"?
I don't understand the value of the staging area.
Solution 1:
There are many uses of staging in Git. Some are listed below:
- Staging helps you split up one large change into multiple commits. Let's say you worked on a large-ish change, involving a lot of files and quite a few different subtasks. You didn't actually commit any of these: you were "in the zone", as they say, and you didn't want to think about splitting up the commits the right way just then. (And you're smart enough not to make the whole thing one honking big commit!) Now that the change is all tested and working, you need to commit all this properly, in several clean commits, each focused on one aspect of the code changes. With the index, just stage each set of changes and commit, repeating until no more changes are pending. This works really well with git gui if you're into that, or you can use git add -p or, with newer versions of Git, git add -e.
- Staging helps in reviewing changes. Staging lets you "check off" individual changes as you review a complex commit, so you can concentrate on the stuff that has not yet passed your review. Let me explain. Before you commit, you'll probably review the whole change by using git diff. If you stage each change as you review it, you'll find that you can concentrate better on the changes that are not yet staged. git gui is great here. Its two left panes show unstaged and staged changes respectively, and you can move files between those two panes (stage/unstage) just by clicking on the icon to the left of the filename. Even better, you can stage partial changes to a file. In the right pane of git gui, right-click on a change that you approve of and choose "stage hunk". Just that change (not the entire file) is now staged; in fact, if there are other, unstaged changes in that same file, you'll find that the file now appears in both the top and bottom left panes!
- Staging helps when a merge has conflicts. When a merge happens, changes that merge cleanly are updated both in the staging area and in your work tree. Only changes that did not merge cleanly (i.e., caused a conflict) will show up when you do a git diff, or in the top left pane of git gui. Again, this lets you concentrate on the stuff that needs your attention: the merge conflicts.
- Staging helps you keep extra local files hanging around. Usually, files that should not be committed go into .gitignore or the local variant, .git/info/exclude. However, sometimes you want a local change to a file that cannot be excluded (which is not good practice but can happen sometimes). For example, perhaps you upgraded your build environment and it now requires an extra flag or option for compatibility, but if you commit the change to the Makefile, the other developers will have a problem. Of course you have to discuss this with your team and work out a more permanent solution, but right now, you need that change in your working tree to do any work at all! Another situation could be that you want a new local file that is temporary, and you don't want to bother with the ignore mechanism. This may be some test data, a log file or trace file, or a temporary shell script to automate some test... whatever. In Git, all you have to do is never stage that file or that change. That's it.
- Staging helps you sneak in small changes. Let's say you're in the middle of a large-ish change and you are told about a very important bug that needs to be fixed ASAP. The usual recommendation is to do this on a separate branch, but let's say this fix is really just a line or two, and can be tested just as easily without affecting your current work. With Git, you can quickly make and commit only that change, without committing all the other stuff you're still working on. Again, if you use git gui, whatever's in the bottom left pane gets committed, so just make sure only that change gets there and commit, then push!
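As a rough sketch of the first and fourth points above, the script below splits one work session into two focused commits while a local-only Makefile tweak is simply never staged. The file names, contents, commit messages, and the throwaway repository created with mktemp are all assumptions made up for illustration. It stages whole files; git add -p would do the same thing per hunk, but it is interactive.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Baseline commit.
echo "parser v1" > parser.txt
echo "docs v1" > docs.txt
echo "CFLAGS = -O2" > Makefile
git add parser.txt docs.txt Makefile
git commit -q -m "Initial commit"

# One long work session touches everything at once.
echo "parser v2" > parser.txt
echo "docs v2" > docs.txt
echo "CFLAGS = -O2 -g" > Makefile   # local-only tweak, not meant to be shared

# Stage and commit each subtask separately; the Makefile is never staged.
git add parser.txt
git commit -q -m "Update parser"
git add docs.txt
git commit -q -m "Update docs"

commit_count=$(git rev-list --count HEAD)
last_commit_files=$(git diff-tree --no-commit-id --name-only -r HEAD)
still_pending=$(git status --porcelain)

echo "commits so far:       $commit_count"
echo "files in last commit: $last_commit_files"
echo "still uncommitted:    $still_pending"
```

The Makefile change stays in the work-tree and keeps showing up as modified, but it does not enter any commit until it is explicitly staged.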
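The merge-conflict point can be sketched in a similar way. In this made-up scenario (the branch names main and feature and both file names are assumptions), one file conflicts and the other merges cleanly; afterwards plain git diff should list only the conflicted file, while the cleanly merged one is already in the staging area.

```shell
# Throwaway repository; branch and file names are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # fix the initial branch name
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line\n' > conflict.txt
printf 'line\n' > clean.txt
git add conflict.txt clean.txt
git commit -q -m "Base"

# The feature branch changes both files.
git checkout -q -b feature
printf 'feature line\n' > conflict.txt
printf 'line\nfeature addition\n' > clean.txt
git commit -q -a -m "Feature changes"

# main changes only the file that will conflict.
git checkout -q main
printf 'main line\n' > conflict.txt
git commit -q -a -m "Main change"

# The merge stops with a conflict, so its non-zero exit status is expected.
git merge feature > /dev/null 2>&1 || true

unresolved=$(git diff --name-only | sort -u)   # still needs attention
staged=$(git diff --cached --name-only)        # differs from HEAD in the index

echo "needs attention:"; echo "$unresolved"
echo "differs from HEAD in the index:"; echo "$staged"
```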
Solution 2:
It's worth comparing how Git handles this (Git makes you know about and use the staging area) to how Mercurial handles this. In Mercurial, you work exactly as you suggest: you just run hg commit and Mercurial figures out what you changed and commits it. You do have to hg add a new file, but if you are just changing existing files, there is nothing special to do: you change them, and commit, and you are done.
Mercurial's behavior seems (and in my observation, has been) much more new-user-friendly. Git actually lets you get most of the same effect by using git commit -a. That is, you just add -a to whatever other options you will use, and Git will do pretty much the same thing as Mercurial. But this is kind of a crutch, because eventually, you will find something that Git has done that is quite inexplicable unless you know about the staging area.
Hidd3N's answer shows a number of ways you can use Git's staging area. But if you step back a bit, and compare Mercurial and Git, you can, I think, see a lot more of what is really going on.
Remember that the job of any version control system (VCS) is to let you retrieve every committed version ever. (And, since both Git and Mercurial work on a snapshot-of-the-whole-system basis, they are easy to compare here. There are some much older VCSes that operate on one file at a time: you must specifically check in / commit each individual file. Git and Mercurial make a snapshot of everything-all-at-once.) These committed snapshots should last forever, and never change at all. That is, they are read-only.
Files that are read-only are no good for working on, though. So any VCS must have, somehow / somewhere, two separate things:
- the place where you work on files: this is your work-tree; and
- the place that snapshots are stored: this is your version database, or repository, or some other word—Git calls these things objects while Mercurial has a more complicated set of structures, so let's just call them objects here.
Git's object storage area has a bunch of read-only objects: in fact, one for every file, and every commit, and so on. You can add new objects any time, but you cannot change any existing objects.
As Mercurial demonstrates, there is no requirement for a separate staging area: the VCS can use the work-tree as the proposed commit. When you run hg commit, Mercurial packages up the work-tree and makes a commit from it. When you make changes in the work-tree, you change the proposed next commit. The hg status command shows you what you're proposing to commit, which is: whatever is different between the current commit and the work-tree.
Git, however, chooses to interpose this intermediate area, halfway between the read-only commits and the read/write work-tree. This intermediate area, the staging area or index or cache, contains the proposed next commit.
You start out by checking out some commit. At this point, you have three copies of every file:
- One is in the current commit (which Git can always find by the name HEAD). This one is read-only; you can't change it. It's in a special, compressed (sometimes very compressed), Git-only form.
- One is in the index / staging-area. This one matches the HEAD one now, but it can be changed. It's the one proposed to go into the next commit. This, too, is in the special Git-only form.
- The last one is in your work-tree, in ordinary form where you can work on it.
What git add does is to copy files from your work-tree into the staging area, overwriting the one that used to match the HEAD commit.
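To make the three copies concrete, here is a small sketch that puts a different version of one file in each place and then reads each copy back. The file name notes.txt, its contents, and the temporary repository are invented for the example; git show HEAD:path reads the committed copy and git show :path reads the copy in the index.

```shell
# Throwaway repository; file name and contents are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"       # version 1 is now in HEAD

echo "version 2" > notes.txt
git add notes.txt                  # copy version 2 into the index

echo "version 3" > notes.txt       # version 3 exists only in the work-tree

in_head=$(git show HEAD:notes.txt)   # committed copy
in_index=$(git show :notes.txt)      # staged copy
in_worktree=$(cat notes.txt)         # ordinary file on disk

echo "HEAD:      $in_head"
echo "index:     $in_index"
echo "work-tree: $in_worktree"
```

Running git commit at this point would record version 2, the staged copy, not version 3.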
When you run git status, it must make two separate comparisons. One compares the HEAD commit to the index / staging-area, to see what's going to be different in the next commit. This is what git status lists as "Changes to be committed". The second comparison finds what's different between the index / staging-area and the work-tree. This is what git status lists as "Changes not staged for commit".
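The two comparisons can also be run by hand: git diff --cached compares HEAD to the index, and plain git diff compares the index to the work-tree. The sketch below uses two made-up files, one modified and staged, the other modified and left unstaged, so that each comparison reports exactly one of them.

```shell
# Throwaway repository; file names are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > staged.txt
echo "v1" > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "Initial commit"

echo "v2" > staged.txt
git add staged.txt            # this change goes into the index
echo "v2" > unstaged.txt      # this change stays in the work-tree only

to_be_committed=$(git diff --cached --name-only)   # HEAD vs index
not_staged=$(git diff --name-only)                 # index vs work-tree

echo "to be committed:       $to_be_committed"
echo "not staged for commit: $not_staged"
git status --short            # both comparisons in one compact listing
```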
When you run git commit -a, Git simply does the copy-to-staging-area based on the second comparison. More precisely, it runs the equivalent of git add -u. (It secretly does this with a temporary staging-area, in case the commit fails for some reason, so that your regular staging-area / index is undisturbed for the duration of the commit. Some of this depends on additional git commit arguments as well. Normally this tends to be invisible, at least until you start writing complex commit hooks.)
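One visible consequence of the git add -u equivalence is that git commit -a picks up modifications to files Git already tracks but leaves brand-new files alone. A minimal sketch, with hypothetical file names and a throwaway repository:

```shell
# Throwaway repository; file names are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

echo "v2" > tracked.txt       # modification to a tracked file
echo "new" > untracked.txt    # brand-new file, never added

# -a stages updates to tracked files only, much like git add -u.
git commit -q -a -m "Commit with -a"

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
remaining=$(git status --porcelain)

echo "in the new commit: $committed"
echo "left behind:       $remaining"
```

The new file still needs an explicit git add, much as Mercurial requires hg add for new files.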