What is a dangling commit and a blob in a Git repository and where do they come from?
I'm looking for the basic information on dangling commits and blobs.
My repository seems fine. But I ran git fsck for the first time to see what it did, and I have a long list of 'dangling blobs' and a single 'dangling commit'.
What are these things? Where did they come from? Do they indicate anything unusual (good or bad) about the state of my repository?
In the course of working with your Git repository, you may back out of operations or make other moves that leave intermediate blobs behind. Git also keeps some objects around on its own to help you avoid losing information.
Eventually (conditionally, according to the git gc man page) Git will perform garbage collection and clean these things up. You can also force it by running the garbage collection process yourself with git gc.
For more information about this, see Maintenance and Data Recovery on the git-scm site.
By default, a manual run of git gc keeps unreachable objects that are less than two weeks old as a safety net. Running it occasionally is in fact encouraged, to help keep your Git repository performant. As with anything, though, you should understand what it does before destroying things that may be important to you.
Dangling blob = A change that made it to the staging area/index but never got committed. One thing that is amazing about Git is that once something has been added to the staging area, you can get it back (at least until it is garbage collected), because these blobs behave like commits in that they have a hash too!
Dangling commit = A commit that isn't directly linked to by any child commit, branch, tag or other reference. You can get these back too!
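To make the dangling blob case concrete, here is a minimal sketch that stages a file twice before committing, so the first staged version ends up unreferenced. The throwaway repository, the file name notes.txt and the example identity are assumptions made up for illustration.

```shell
set -e
# Work in a throwaway repository so nothing real is touched
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example User"

echo "first draft" > notes.txt
git add notes.txt      # writes a blob for "first draft" into the object store
echo "second draft" > notes.txt
git add notes.txt      # the index now points at a new blob; the first one is orphaned
git commit -q -m "Add notes"

# git fsck reports the orphaned blob as "dangling blob <hash>"
dangling=$(git fsck 2>/dev/null | awk '/dangling blob/ {print $3}')

# The hash is all you need to read the lost content back
git cat-file -p "$dangling"
```

The last command should print the content of the version that was staged but never committed. Redirecting it to a file (git cat-file -p <hash> > recovered.txt) is one way to get such a change back.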
How to remove all dangling commits from your Git repository (from http://www.tekkie.ro/news/howto-remove-all-dangling-commits-from-your-git-repository/):
git reflog expire --expire=now --all
git gc --prune=now
Make sure you really want to remove them, as you might decide you need them after all.
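As a rough illustration of what those two commands do, the sketch below counts the dangling objects reported by git fsck before and after running them. It uses a throwaway repository with a made-up file f.txt, so try it there rather than on a repository you care about.

```shell
set -e
# Throwaway repository with one dangling blob (names are illustrative)
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example User"

echo "v1" > f.txt
git add f.txt          # this staged version will never be committed
echo "v2" > f.txt
git add f.txt
git commit -q -m "Add f"

# grep -c exits non-zero when the count is 0, hence the "|| true"
before=$(git fsck 2>/dev/null | grep -c '^dangling' || true)

git reflog expire --expire=now --all   # drop all reflog entries
git gc -q --prune=now                  # prune unreachable objects regardless of age

after=$(git fsck 2>/dev/null | grep -c '^dangling' || true)
echo "dangling before: $before, after: $after"
```

Once this has run, the pruned objects are gone from the local object store, which is why it is worth checking the git fsck output first.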
A dangling commit is a commit that is not associated with any reference, i.e., there is no way to reach it.
For example, consider the diagram below. Suppose we delete the branch featureX without merging its changes, then commit D will become a dangling commit because there is no reference associated with it. Had it been merged into master, then HEAD and master references would have pointed to commit D and it would not be dangling anymore, even if we deleted featureX. Read the note after the diagram to understand this better.
Git automatically garbage collects (i.e., disposes of) dangling commits. We can use git reflog to recover a branch (of dangling commits) that was deleted without being merged. We can recover deleted commits only if they are still present in the local object store. Once they have been garbage collected, we can't recover them.
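The recovery can be sketched as follows, in a throwaway repository that mimics the featureX example. The commit messages "A" and "D" and the temporary directory are assumptions for illustration. One detail worth knowing: git fsck treats commits that are still listed in a reflog as reachable, so the sketch passes --no-reflogs to make the deleted commit show up as dangling.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example User"

git commit -q --allow-empty -m "A"
main_branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

git checkout -q -b featureX
git commit -q --allow-empty -m "D"
git checkout -q "$main_branch"
git branch -D featureX >/dev/null              # delete the branch without merging

# Ignoring reflogs, commit D is now reported as a dangling commit
git fsck --no-reflogs 2>/dev/null | grep 'dangling commit'

# HEAD's reflog still records the commit, so look up its hash there
lost=$(git reflog --format='%H %gs' | awk '/commit: D/ {print $1; exit}')

# Re-create the branch at the recovered commit
git branch featureX "$lost"
git log --oneline -1 featureX
```

In a real repository you would normally read the hash off the git reflog output by eye instead of matching on the commit message.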
NOTE that a branch name, i.e., a branch label, is actually a reference to the latest commit on a branch, i.e., the tip of the branch. In the diagram above, featureX, master and HEAD are just references to specific commits. The featureX and master labels refer to the latest commits on their respective branches. HEAD generally refers to the tip of the currently checked out branch (master in this case). If you check out an older commit on your current branch, then HEAD will be in a detached state, i.e., it will point to the older commit instead of the latest one. Also note that HEAD is called a symbolic reference because it actually points to the current branch label, and a branch label always points to the tip of its branch. So, under normal circumstances, HEAD indirectly points to the latest commit.
As an aside, note that Git represents its commit graph/history as a directed acyclic graph. Each commit has a reference to its parent. Hence, the arrows in a commit diagram point from child commit to parent commit. We need a reference to the latest child commit in order to reach the older commits on a branch.
PS - The above diagram and understanding were obtained from this free course. Even though the course is quite old, the knowledge is still relevant.