Where does a Git branch start and what is its length?
In Git, you could say that every branch starts at the root commit, and that would be quite literally true. But I guess that's not very helpful for you. What you could do instead is to define "the start of a branch" in relation to other branches. One way you can do this is to use
git show-branch branch1 branch2 ... branchN
and that will show you the common commit between all specified branches at the bottom of the output (if there is, in fact, a common commit).
Here's an example from the Linux Kernel Git documentation for show-branch:
$ git show-branch master fixes mhf
* [master] Add 'git show-branch'.
! [fixes] Introduce "reset type" flag to "git reset"
! [mhf] Allow "+remote:local" refspec to cause --force when fetching.
---
+ [mhf] Allow "+remote:local" refspec to cause --force when fetching.
+ [mhf~1] Use git-octopus when pulling more than one heads.
+ [fixes] Introduce "reset type" flag to "git reset"
+ [mhf~2] "git fetch --force".
+ [mhf~3] Use .git/remote/origin, not .git/branches/origin.
+ [mhf~4] Make "git pull" and "git fetch" default to origin
+ [mhf~5] Infamous 'octopus merge'
+ [mhf~6] Retire git-parse-remote.
+ [mhf~7] Multi-head fetch.
+ [mhf~8] Start adding the $GIT_DIR/remotes/ support.
*++ [master] Add 'git show-branch'.
In that example, master is being compared with the fixes and mhf branches. Think of this output as a table, with each branch represented by its own column, and each commit getting its own row. Branches that contain a commit will have a + (or a * for the current branch, or a - for a merge commit) show up in their column in the row for that commit.
At the very bottom of the output, you'll see that all 3 branches share a common ancestor commit, and that it is in fact the head commit of master:
*++ [master] Add 'git show-branch'.
This means that both fixes and mhf were branched off of that commit in master.
Alternative solutions
Of course that's only 1 possible way to determine a common base commit in Git. Other ways include git merge-base to find common ancestors, and git log --all --decorate --graph --oneline or gitk --all to visualize the branches and see where they diverge (though if there are a lot of commits that becomes difficult very quickly).
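As a rough sketch of the git merge-base approach, the following builds a throwaway repository and asks for the common ancestor of two branches. The repository layout, the branch name topic, and the demo identity are made up for illustration; only the git commands themselves matter.

```shell
# Throwaway repository; every name below is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # make the first branch "master" regardless of defaults
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
fork=$(git rev-parse HEAD)                   # the commit both branches will share

git checkout -q -b topic
git commit -q --allow-empty -m "topic work"
git checkout -q master
git commit -q --allow-empty -m "C"

# The common ancestor of the two branch tips.
base=$(git merge-base master topic)
git log -1 --format='%h %s' "$base"
```

Here the reported commit is B, the tip of master at the moment topic was created.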
Other questions from original poster
As for these questions you had:
Is commit D a member of both branches or can we clearly decide whether it belongs to branch-A or branch-B?
D is a member of both branches; it's an ancestor commit for both of them.
Supervisors sometimes like to know, when a branch has been started (it usually marks the start of a task)...
In Git, you can rewrite the history of the entire commit tree(s) and their branches, so when a branch "starts" is not as set in stone as in something like TFS or SVN. You can rebase branches onto any point in time in a Git tree, even putting them before the root commit! Therefore, you can use it to "start" a task at any point in time in the tree that you want.
This is a common use case for git rebase: syncing branches up with the latest changes from an upstream branch, pushing them "forward" in time along the commit graph, as if you had "just started" working on the branch, even though you've actually been working on it for a while. You could even push branches back in time along the commit graph, if you wanted to (though you might have to resolve a lot of conflicts, depending on the branch contents...or maybe you won't). You could even insert or delete a branch from right in the middle of your development history (though doing so would probably change the commit SHAs of a lot of commits). Rewriting history is one of the primary features of Git that makes it so powerful and flexible.
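To make the "moving start" concrete, here is a minimal sketch in a scratch repository: the common ancestor of master and a branch is recorded, the branch is rebased, and the common ancestor is looked up again. The file names, the branch name topic, and the demo identity are assumptions made for the example.

```shell
# Throwaway repository; every name below is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo a > a.txt && git add a.txt && git commit -q -m "A"
git checkout -q -b topic
echo t > t.txt && git add t.txt && git commit -q -m "topic work"
git checkout -q master
echo b > b.txt && git add b.txt && git commit -q -m "B"

before=$(git merge-base master topic)        # where topic "starts" right now: commit A

# Replay topic on top of the current master tip.
git rebase -q master topic

after=$(git merge-base master topic)         # where topic "starts" after the rebase: commit B
echo "before: $(git log -1 --format=%s "$before")"
echo "after:  $(git log -1 --format=%s "$after")"
```

The branch contains the same change in both cases, but its apparent starting point has moved from A to B.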
This is why commits come with both an authored date (when the commit was originally authored), and a committed date (when the commit was last committed to the commit tree). You can think of them as analogous to create time-date and last-modified time-date.
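A small sketch of the two dates in action, assuming a scratch repository and made-up timestamps: the commit is created with fixed dates, then amended with a later committer date. Both dates are set through environment variables only so that the output is reproducible; in normal use Git fills them in from the clock.

```shell
# Throwaway repository; names and dates below are illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo hello > file.txt
git add file.txt
GIT_AUTHOR_DATE='2021-01-01 12:00:00 +0000' \
GIT_COMMITTER_DATE='2021-01-01 12:00:00 +0000' \
    git commit -q -m "Initial work"

# Rewrite the commit later: the authored date is carried over from the
# original commit, while the committed date is taken anew (fixed here).
GIT_COMMITTER_DATE='2021-06-01 12:00:00 +0000' \
    git commit -q --amend --no-edit

authored=$(git log -1 --format=%ad --date=short)
committed=$(git log -1 --format=%cd --date=short)
echo "authored:  $authored"
echo "committed: $committed"
```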
Supervisors sometimes like to know...to which branch some changes belong to (to get the purpose of some change - was it required for the work).
Again, because Git allows you to rewrite history, you can (re)base a set of changes on pretty much any branch/commit in the commit graph that you want. git rebase literally allows you to move your entire branch around freely (though you might need to resolve conflicts as you go, depending on where you move the branch to and what it contains).
That being said, one of the tools you can use in Git to determine which branches or tags contain a set of changes is the --contains option:
# Which branches contain commit X?
git branch --all --contains X
# Which tags contain commit X?
git tag --contains X
The bounty notice on this question asks,
I'd be interested in knowing whether or not thinking about Git branches as having a defined "beginning" commit other than the root commit even makes sense?
It kind of does except:
- the root commit is "the first commit accessible from the branch HEAD" (and don't forget there can be multiple root commits with orphan branches, used for instance in GitHub for gh-pages)
- I prefer considering the start of a branch being the commit of another branch from which said branch has been created (tobib's answer without the ~1), or (simpler) the common ancestor (also in "Finding a branch point with Git?", even though the OP mentioned being not interested in common ancestors):
git merge-base A master
That means:
- the first definition gives you a fixed commit (which might never change except in case of massive filter-branch)
- the second definition gives you a relative commit (relative to another branch) which can change at any time (the other branch can be deleted)
The second one makes more sense for git, which is all about merge and rebase between branches.
Supervisors sometimes like to know, when a branch has been started (it usually marks the start of a task) and to which branch some changes belong to (to get the purpose of some change - was it required for the work)
Branches are simply the wrong marker for that: due to the transient nature of branches (which can be renamed/moved/rebased/deleted/...), you cannot mimic a "change set" or an "activity" with a branch, to represent a "task".
That is an X Y problem: the OP is asking about an attempted solution (where does a branch start) rather than the actual problem (what could be considered a task in Git).
To do that (representing a task), you could use:
- tags: they are immutable (once associated to a commit, that commit is no longer supposed to move/be rebased), and any commits between two well-named tags can represent an activity.
- some git notes attached to a commit to record the "work item" for which said commit was created (contrary to tags, notes can be rewritten if the commit is amended or rebased).
- hooks (to associate a commit with some "external" object like a "work item", based on the commit message). That is what the bridge Git-RTC -- IBM Rational Team Concert -- does with a pre-receive hook.
The point is: the start of a branch does not always reflect the start of a task, but merely the continuation of a history which can change, and whose sequence should represent a logical set of changes.
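The first two options in the list above can be sketched as follows, in a scratch repository. The tag names task-1234-start and task-1234-end, the work-item number, and the note text are all hypothetical; they only illustrate one possible convention.

```shell
# Throwaway repository; every name below is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "Baseline"
git tag task-1234-start                      # hypothetical marker: task begins after this commit
git commit -q --allow-empty -m "Implement parser"
git commit -q --allow-empty -m "Add parser tests"
git tag task-1234-end                        # hypothetical marker: last commit of the task

# Attach a note recording the (hypothetical) work item to the last commit.
git notes add -m "work-item: 1234" HEAD
note=$(git notes show HEAD)

# The commits between the two tags are the "activity".
count=$(git rev-list --count task-1234-start..task-1234-end)
git log --format=%s task-1234-start..task-1234-end
echo "$count commits, note on tip: $note"
```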
Perhaps you are asking the wrong question. IMO, it doesn't make sense to ask where a branch starts since a given branch includes all changes made to every file ever (i.e. since the initial commit).
On the other hand, asking where two branches diverged is definitely a valid question. In fact, this seems to be exactly what you want to know. In other words, you don't really want to know information about a single branch. Instead you want to know some information about comparing two branches.
A little bit of research turned up the gitrevisions man page which describes the details of referring to specific commits and ranges of commits. In particular,
To exclude commits reachable from a commit, a prefix ^ notation is used. E.g. ^r1 r2 means commits reachable from r2 but exclude the ones reachable from r1.
This set operation appears so often that there is a shorthand for it. When you have two commits r1 and r2 (named according to the syntax explained in SPECIFYING REVISIONS above), you can ask for commits that are reachable from r2 excluding those that are reachable from r1 by ^r1 r2 and it can be written as r1..r2.
So, using the example from your question, you can get the commits where branch-A diverges from master with
git log master..branch-A
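The same range also gives one reasonable reading of a branch's "length": the number of commits reachable from the branch but not from master. A minimal sketch in a scratch repository follows; the commit names loosely echo the question's diagram and are otherwise arbitrary.

```shell
# Throwaway repository; every name below is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "A"
git checkout -q -b branch-A
git commit -q --allow-empty -m "F"
git commit -q --allow-empty -m "G"
git checkout -q master
git commit -q --allow-empty -m "J"

# Commits on branch-A that master does not have.
git log --format=%s master..branch-A

# Their count: the "length" of branch-A relative to master.
length=$(git rev-list --count master..branch-A)
echo "branch-A is $length commits ahead of master"
```

Note that this number is relative to master; measured against a different branch it can come out differently.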
There are two separate concerns here. Starting from your example,
A - B - C - - - - J     [master]
     \
      \       F - G     [branch-A]
       \     /
        D - E
             \
              H - I     [branch-B]
[...] Supervisors sometimes like to know, when a branch has been started (it usually marks the start of a task) and to which branch some changes belong (to get the purpose of some change - was it required for the work)
Two factual observations before we get to the meat:
First observation: what your supervisor wants to know is the mapping between commits and some external workorder-ish record: what commits address bug-43289 or featureB? Why are we changing strcat usage in longmsg.c? Who's going to pay for the twenty hours between your previous push and this one? The branch names themselves don't matter here; what matters is the commits' relationships with external administrative records.
Second observation: whether branch-A or branch-B gets published first (via say merge or rebase or push), the work in commits D and E has to go with it right then and not be duplicated by any subsequent operation. It makes no difference at all what was current when those commits were made. Branch names don't matter here, either. What matters is commits' relations with each other via the ancestry graph.
So my answer is, so far as any history is concerned, branch names don't matter at all. They're convenience tags showing which commit is current for some purpose specific to that repo, nothing more. If you want some useful moniker in the default merge-commit message subject line, run git branch some-useful-name at the tip before merging, and merge that. They're the same commits either way.
Tying whatever branch name the developer had checked out at the time of commit with some external record -- or anything at all -- is deep into "all fine so long as everything works" territory. Don't Do It. Even with the restricted usage common in most VCSs, your D-E-{F-G,H-I} will occur sooner rather than later, and then your branch naming conventions will have to be adapted to handle that, and then something more complicated will show up...
Why bother? Put the report number(s) prompting the work in a tagline at the bottom of your commit messages and be done with it. git log --grep (and git in general) is blazingly fast for good reason.
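A short sketch of what that lookup can look like, in a scratch repository. The tagline name Acme-workorder-id follows the hook example in this answer; the workorder numbers WO-101 and WO-202 and the commit subjects are invented for the demonstration.

```shell
# Throwaway repository; every name below is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Each extra -m becomes its own paragraph, so the tagline ends up at the bottom.
git commit -q --allow-empty -m "Fix strcat usage in longmsg.c" -m "Acme-workorder-id: WO-101"
git commit -q --allow-empty -m "Add featureB" -m "Acme-workorder-id: WO-202"

# Which commits were made for workorder WO-101? (-F treats the pattern as a fixed string.)
matches=$(git log --format=%s -F --grep="Acme-workorder-id: WO-101")
echo "$matches"
```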
Even a fairly flexible prep hook to insert taglines like this is trivial:
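branch=$(git symbolic-ref -q --short HEAD)   # branch name if any (empty on a detached HEAD)
workorder=$(git config "branch.${branch:+$branch.}x-workorder")   # per-branch key, or branch.x-workorder when there is no branch
tagline="Acme-workorder-id: ${workorder:-***no workorder supplied***}"
# drop any existing tagline, then append the new one as the last line (GNU sed syntax)
sed -i "/^ *Acme-workorder-id:/d; \$a$tagline" "$1"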
and here's the basic pre-receive hook loop for when you need to inspect every commit:
rc=0                                     # overall status; non-zero rejects the push
while read old new ref; do               # for each pushed ref
    while read commit junk; do           # check for bad commits
        [ -n "$commit" ] || continue     # skip the empty line left when there are no new commits
        # test here, e.g.
        git show -s "$commit" | grep -q '^ *Acme-workorder-id: ' \
            || { rc=1; echo "commit $commit has no workorder associated"; }
        # end of this test
    done <<EOD
$(git rev-list "$old..$new")
EOD
done
exit $rc
The kernel project uses taglines like this for copyright signoff and code-review recording. It really couldn't get much simpler or more robust.
Note that I did some hand-mangling after copy and paste to de-specialize real scripts. Keyboard-to-editbox warning.
I think this is probably a good opportunity for education. git doesn't really record the starting point of any branch. Unless the reflog for that branch still contains the creation record, there's no way to definitively determine where it started, and if the branch has merges in it anywhere, it may in fact have more than one root commit, as well as many different possible points where it might have been created and started to diverge from its original source.
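For the case where the creation record is still there, a minimal sketch of reading it back, assuming a scratch repository and a branch named topic (both made up). The oldest reflog entry of a branch is its creation record; reflogs are local to one repository and their entries expire over time, so this only works on the machine where the branch was created and only for a while.

```shell
# Throwaway repository; every name below is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "A"
git branch topic                             # creates the branch and its first reflog entry
git checkout -q topic
git commit -q --allow-empty -m "topic work"

# Walk the branch's reflog (newest first) and keep the oldest entry's message.
created=$(git log -g --format=%gs topic | tail -n 1)
echo "$created"
```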
It might be a good idea to ask a counter question in such cases - why do you need to know where it branched from, or does it matter in any useful way where it branched from? There might or might not be good reasons that this is important - many of the reasons are probably tied up in the specific workflow your team has adopted and is trying to enforce, and may indicate areas where your workflow might be improved in some way. Perhaps one improvement would be figuring out the "right" questions to ask - for example, rather than "where did branch-B branch from", maybe "what branches do or don't contain the fixes/new features introduced by branch-B"...
I'm not sure that a completely satisfactory answer to this question really exists...