Flexible vs static branching (Git vs ClearCase/AccuRev)
My question is about the way in which Git handles branches: whenever you branch from a commit, this branch won’t ever receive changes from the parent branch unless you force it with a merge.
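For example (branch names are purely illustrative):

git checkout -b issue001 master   # fork issue001 from master
# ... new commits land on master ...
git checkout issue001
git merge master                  # only now does issue001 receive master's changes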
But in other systems such as ClearCase or AccuRev, you can specify how branches get filled through a sort of inheritance mechanism: I mean, with ClearCase, using a config spec, you can say "get all the files modified on branch /main/issue001, and then continue with the ones on /main, or with this specific baseline".
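In config spec terms, that rule stack might look roughly like this (a sketch only; the element pattern and branch name are illustrative):

element * .../issue001/LATEST   # first, take whatever was modified on the issue001 branch
element * /main/LATEST          # otherwise, fall back to the latest on /main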
In AccuRev you have a similar mechanism which lets streams (as branches are called there) receive changes from upper streams without merging or creating a new commit on the branch.
Don’t you miss this while using Git? Can you enumerate scenarios where this inheritance is a must?
Thanks
Update: Please read VonC's answer below, which helps focus my question. Once we agree that "linear storage" and DAG-based SCMs have different capabilities, my question is: what are the real-life scenarios (especially for companies, more than OSS) where linear can do things not possible for DAG? Are they worth it?
Solution 1:
To understand why Git does not offer the kind of "inheritance mechanism" you are referring to (one not involving a commit), you must first understand one of the core concepts of those SCMs (Git vs. ClearCase, for instance).
ClearCase uses linear version storage: each version of an element (file or directory) is linked in a direct linear relationship with the previous version of the same element.
Git uses a DAG, a Directed Acyclic Graph: each "version" of a file is actually part of a global set of changes in a tree that is itself part of a commit. The previous version of that file must be found in a previous commit, accessible through a single directed acyclic graph path.
In a linear system, a config spec can specify several rules for achieving the "inheritance" you see (for a given file, first select a certain version, and if not present, then select another version, and if not present, then select a third, and so on).
A branch is a fork in a linear history at a given version for a given select rule (all the other select rules before that one still apply, hence the "inheritance" effect).
In a DAG, a commit represents all the "inheritance" you will ever get; there is no "cumulative" selection of versions. There is only one path in this graph to select all the files you will see at this exact point (commit).
A branch is just a new path in this graph.
To apply some other versions in Git, you must either:
- merge into your branch some other commit (like the "git pull" mentioned in stsquad's answer), or
- rebase your branch (as Greg mentions)
But since Git is a DAG-based SCM, it will always result in a new commit.
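In command terms, that means something like the following (branch names are purely illustrative):

git checkout myBranch
git merge otherBranch     # records a new merge commit on myBranch
# or:
git rebase otherBranch    # replays myBranch's commits on top of otherBranch, as new commits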
What you are "losing" with Git is some kind of "composition" (when you are selecting different versions with different successive select rules), but that would not be practical in a DVCS (as in "Distributed"): when you are making a branch with Git, you need to do so with a starting point and content clearly defined and easily replicated to other repositories.
In a purely central VCS, you can define your workspace (in ClearCase, your "view", either snapshot or dynamic) with whatever rules you want.
unknown-google adds in the comment (and in his question above):
So, once we see the two models can achieve different things (linear vs DAG), my question is: which are the real life scenarios (especially for companies more than OSS) where linear can do things not possible for DAG? Are they worth it?
When it comes to "real-life scenarios" in terms of selection rules, what you can do in a linear model is have several selection rules for the same set of files.
Consider this "config spec" (i.e. "configuration specification" for selection rules with ClearCase):
element /aPath/... aLabel3 -mkbranch myNewBranch
element /aPath/... aLabel2 -mkbranch myNewBranch
It selects all the files labelled 'aLabel2' (and branches from there), except for those labelled 'aLabel3' (which branch from there instead, because that rule precedes the one mentioning 'aLabel2').
Is it worth it?
No.
Actually, the UCM flavor of ClearCase (the "Unified Configuration Management" methodology included with the ClearCase product, representing the "best practices" deduced from base ClearCase usage) does not allow it, for the sake of simplicity. A set of files is called a "component", and if you want to branch from a given label (known as a "baseline"), that translates into the following config spec:
element /aPath/... .../myNewBranch/LATEST
element /aPath/... aLabel3 -mkbranch myNewBranch
element /aPath/... /main/0 -mkbranch myNewBranch
You have to pick one starting point (here, 'aLabel3') and go from there. If you also want the files from 'aLabel2', you will make a merge from all the 'aLabel2' files to the ones in 'myNewBranch'.
That is a "simplification" you do not have to make with a DAG, where each node of the graph represents a uniquely defined "starting point" for a branch, whatever the set of files involved.
Merge and rebase are enough to combine that starting point with other versions of a given set of files, in order to achieve the desired "composition", while keeping that particular history in isolation in a branch.
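In Git, that composition might look like this sketch (the tag names 'aLabel3' and 'aLabel2' are carried over from the example above, with labels mapped to tags):

git checkout -b myNewBranch aLabel3   # one uniquely defined starting point (a tag)
git merge aLabel2                     # bring in the other label's content, recorded as a commit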
The general goal is to reason in "coherent Version Control operations applied to a coherent component". A "coherent" set of files is one in a well-defined coherent state:
- if labelled, all its files are labelled
- if branched, all its files will branch from the same unique starting point
That is easily done in a DAG system; it can be more difficult in a linear system (especially with "Base ClearCase" where the "config spec" can be tricky), but it is enforced with the UCM methodology of that same linear-based tool.
Instead of achieving that "composition" through a "private selection rule trick" (with ClearCase, some select rule order), you achieve it only with VCS operations (rebase or merge), which leave a clear trace for everyone to follow (as opposed to a config spec private to a developer, or shared amongst some but not all developers). Again, it enforces a sense of coherency, as opposed to a "dynamic flexibility" that you may have a hard time reproducing later on.
That allows you to leave the realm of VCS (Version Control System) and enter the realm of SCM (Software Configuration Management), which is mainly concerned with "reproducibility". And that (SCM features) can be achieved with a linear-based or a DAG-based VCS.
Solution 2:
It sounds like what you're looking for might be git rebase. Rebasing a branch conceptually detaches it from its original branch point and reattaches it at some other point. (In reality, the rebase is implemented by applying each patch of the branch in sequence to the new branch point, creating a new set of patches.) In your example, you can rebase a branch onto the current tip of an upper branch, which will essentially "inherit" all the changes made to the other branch.
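A minimal sketch (branch names are illustrative):

git checkout my_branch
git rebase upper_branch   # detach my_branch's commits and replay them on the tip of upper_branch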
Solution 3:
I'm not totally clear on what you're asking for, but it sounds like Git's tracking semantics are what you want. When you branch from an origin you can do something like:
git checkout -t -b my_branch origin/master
Then future "git pull"s will auto-merge origin/master into your working branch. You can then use "git cherry -v origin/master" to see what the difference is. You can use "git rebase" before you publish your changes to clean up the history, but you shouldn't rebase once your history is public (i.e. other people are following that branch).
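Put together, the flow looks roughly like this sketch:

git checkout -t -b my_branch origin/master   # create a branch that tracks origin/master
git pull                                     # later: merges new origin/master commits into my_branch
git cherry -v origin/master                  # list your commits not yet in origin/master
git rebase origin/master                     # optional: tidy history, only while it is still private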
Solution 4:
As to the inheritance scheme used by AccuRev: Git users will probably "get" the whole thing when they look at git-flow (see also: http://github.com/nvie/gitflow and http://jeffkreeftmeijer.com/2010/why-arent-you-using-git-flow/).
This Git branching model more or less does (manually, or with the help of the git-flow tool) what AccuRev does out of the box, automatically and with great GUI support.
So it appears Git can do what AccuRev does. Since I never actually used git/git-flow day-to-day, I can't really say how it works out, but it does look promising. (Minus proper GUI support :-)
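For a rough idea, with the git-flow tool installed, the AccuRev-like stream workflow reduces to a few commands (the feature name is illustrative):

git flow init                      # sets up the master/develop branches and branch prefixes
git flow feature start issue001    # branches feature/issue001 off develop
git flow feature finish issue001   # merges it back into develop and deletes the branch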
Solution 5:
I'll try to answer your question. (I have to say here that I have not used Git, only read about it, so if something I mention below is wrong, please correct me.)
"Can you enumerate scenarios where this inheritance is a must?"
I won't say it is a must, because you can solve a problem with the tool you have, and that might be a valid solution for your environment. I guess it is more a matter of the processes than of the tool itself. Making sure your process is coherent, and also allows you to go back in time to reproduce any intermediate step/state, is the goal; the plus is a tool that lets you run your process and SCMP as painlessly as possible.
The one scenario where I can see this 'inheritance' behavior and the power of the config spec being handy is when you want your set of changes 'isolated' and mapped to a task (devtask, CR, SR, or whatever defines the purpose/scope of your change set).
Using this composition allows you to keep your development branch clean, still use different combinations (through composition) of the rest of the code, and still have only what is relevant to the task isolated in a branch during the whole life cycle of the task, right up until the integration phase.
Being purist, having to commit/merge/rebase just to have a "defined starting point" would, I guess, 'pollute' your branch, and you end up with your changes + other people's changes in your branch/change set.
When/where is this isolation useful? The points below might only make sense in the context of companies pursuing CMM and some ISO certifications, and might be of no interest to other kinds of companies or to OSS.
Being really picky, you might want to accurately count the lines of code (added/modified/deleted) of the change set corresponding to a single developer, to be used later as one input for code and effort estimations.
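(For what it's worth, in Git a rough equivalent for a branch holding one developer's task could be the one-liner below; 'master' and 'my_task' are illustrative names.)

git diff --shortstat master...my_task   # files changed, insertions and deletions since branching off master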
It can be easier to review the code at different stages, having just your code in a single branch (not mixed with other changes).
On big projects with several teams and 500+ developers actively working concurrently on the same code base (where graphical individual element version trees look like a messy tangled web, with several loadlines, one for each big customer or one for each technology), large config specs using composition several degrees deep let this number of people work seamlessly to adapt the same product/system (base code) to different purposes. Using such config specs dynamically gave each team or sub-team a different view of what they needed and where they needed to branch from (cascading in several cases), without the need to create intermediate integration branches or to constantly merge and rebase all the bits you need to start with. Code for the same task/purpose branched off different labels, but it made sense. (You can argue here about the 'known baseline' principle of SCM, but simple labels contemplated in a written SCM Plan did the job.) It must be possible to solve this with Git (in a non-dynamic way, I guess), but I find it really hard to picture without this 'inheritance' behavior. I guess the point mentioned by VonC, "if branched, all its files will branch from the same unique starting point", was broken here, but besides being well documented in the SCMP, I remember there were strong business reasons to do it that way.
Yes, building these config specs I mentioned above was not free; in the beginning there were 4-5 well-paid people behind the SCM, but they were later replaced by automated scripts that asked you what you wanted in terms of labels/branches/features and wrote the config spec for you.
Reproducibility here was achieved by simply saving the config spec along with the task in the devtask system, so each task mapped upstream to requirements and downstream to a config spec and a set of changes (code files, design documents, test documents, etc.).
So one conclusion up to this point might be: only if your project is big/complicated enough (and you can afford SC Managers over the life of the project :) ) will you even start thinking about whether you need the 'inheritance' behavior or a really versatile tool; otherwise you will go directly to a tool that is free and already takes care of the coherence of your SCM... but there could be other factors in the SCM tool that make you stick to one or the other... read on...
Some side notes, which might be off topic, but which in some cases like mine I guess need to be considered.
I have to add here that we use the good ol' base ClearCase, not UCM. I totally agree with VonC that a good methodology allows you to "guide" the flexibility towards a more coherent configuration. The good thing is that CC is pretty flexible, and you can find (not without some effort) a good way to keep things coherent, while in other SCMs you might get that for free. For example, here (and in other places I've worked with CC), for C/C++ projects we cannot afford the price of not having the winkin feature (reusing the Derived Objects), which cuts compile time several times over. It can be argued that a better design, more decoupled code, and optimized Makefiles can reduce the need to compile the whole thing, but there are cases where you need to compile the whole beast many times a day, and sharing the DOs saves heaps of time/money. Where I am now we try to use as many free tools as we can, and I think we will get rid of CC if we can find a cheaper or free tool that implements the winkin feature.
I'll finish with something Paul mentioned: different tools are better than others for different purposes, but I will add that you can get around some limitations of a tool by having a coherent process, without sacrificing reproducibility, the key point of SCM. In the end, I guess the answer to "is it worth it?" depends on your "problem", the SDLC you are running, your SCM processes, and whether there is any extra feature (like winkin) that might be useful in your environment.
my 2 cents