Git: track branch in submodule but commit in other submodule (possibly nested)

I am looking for a situation in which I have a git structure with (possibly nested submodules). For each of these submodules, I want to specify separately, whether they should track a branch (see e.g., Git submodules: Specify a branch/tag)

For instance, my project may look like this:

main.tex
|- submod1 @ master
|    |-subsubmod1 @qsdf123
|- submod2 @ master
|    |-subsubmod2 @shasha12
|- submod3 @ qsdf321

Now, I want a way to update my submodules.

git submodule update --recursive

will update all submodules to their last recorded sha (i.e., it will work for subsubmod1, subsubmod2 and submod3, but do wrong stuff for the rest. On the other hand

git submodule update --recursive --remote

will update all submodules to the associated branch (by default, master), i.e., it will work for submod1 and submod2, but do wrong stuff for the rest.

Is there a way to get this done nicely?

In response to the first answer, I'll clarify what I mean by "do wrong stuff".

Here is a small example

bartb@EB-Latitude-E5450 ~/Desktop/test $ git init
Initialized empty Git repository in /home/bartb/Desktop/test/.git/
bartb@EB-Latitude-E5450 ~/Desktop/test $ git submodule add ../remote/ submod1
Cloning into 'submod1'...
done.
bartb@EB-Latitude-E5450 ~/Desktop/test $ git submodule add ../remote/ submod2
Cloning into 'submod2'...
done.
bartb@EB-Latitude-E5450 ~/Desktop/test $ cd submod1
bartb@EB-Latitude-E5450 ~/Desktop/test/submod1 $ git log
commit 42d476962fc4e25c64ff2a807d2bf9b3e2ea31f8
Author: Bart Bogaerts <[email protected]>
Date:   Tue Jun 21 08:56:05 2016 +0300

    init commit

commit db1ba3bc4b02df4677f1197dc137ff36ddfeeb5f
Author: Bart Bogaerts <[email protected]>
Date:   Tue Jun 21 08:55:52 2016 +0300

    init commit
bartb@EB-Latitude-E5450 ~/Desktop/test/submod1 $ git checkout db1ba3bc4b02df4677f1197dc137ff36ddfeeb5f
Note: checking out 'db1ba3bc4b02df4677f1197dc137ff36ddfeeb5f'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at db1ba3b... init commit
bartb@EB-Latitude-E5450 ~/Desktop/test/submod1 $ cd ..
bartb@EB-Latitude-E5450 ~/Desktop/test $ git config -f .gitmodules submodule.submod2.branch master
bartb@EB-Latitude-E5450 ~/Desktop/test $ git commit -a -m "modules"
[master (root-commit) ea9e55f] modules
 3 files changed, 9 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 submod1
 create mode 160000 submod2
bartb@EB-Latitude-E5450 ~/Desktop/test $ git status
On branch master
nothing to commit, working directory clean
bartb@EB-Latitude-E5450 ~/Desktop/test $  git submodule update --recursive --remote
Submodule path 'submod1': checked out '42d476962fc4e25c64ff2a807d2bf9b3e2ea31f8'
bartb@EB-Latitude-E5450 ~/Desktop/test $ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   submod1 (new commits)

As you can see, after the latest git submodule update --remote submod1 is checked out at the master, even though I never configured the master branch for it. That's what I mean by "do wrong stuff"

The same thing happens for subsubmodules: they are all checked out at master instead of at their specific commit.

This "issue" is actually the expected of git submodule update --remote. From the git documentation:

This option is only valid for the update command. Instead of using the superproject’s recorded SHA-1 to update the submodule, use the status of the submodule’s remote-tracking branch. The remote used is branch’s remote (branch.<name>.remote), defaulting to origin. The remote branch used defaults to master, but the branch name may be overridden by setting the submodule.<name>.branch option in either .gitmodules or .git/config (with .git/config taking precedence).

https://git-scm.com/docs/git-submodule

Especially the part:

The remote branch used defaults to master

This is what I want to avoid.

Edit: an additional request is: I do not want to make any modifications to submods or subsubmods (these are joint projects).


Solution 1:

Update 2020:

The OP BartBog reports in the comments:

The current (2016) answer does not work that well (anymore?) with subsubmodules since $top/.gitmodules does not contain the branch info of the subsub (and subsubsub modules)

New solution:

export top=$(pwd) 
git submodule foreach 'b=$(git config -f ${top}/.gitmodules submodule.${path}.branch); \
  case "${b}" in \
    "") git checkout ${sha1}; git su ;; 
    *) git checkout ${b}; git pull origin ${b}; git su;; 
  esac')

where git-su is the name of my script


Original 2016 answer:

will update all submodules to the associated branch (by default, master), i.e., it will work for submod1 and submod2, but do wrong stuff for the rest.

Actually, yes, it will do "wrong stuff for the rest".

I will illustrate that bug with an example below (a repo named parent with a submodule 'sub', itself with a submodule 'subsub'), using git version 2.9.0.windows.1.

And I will propose a simple workaround allowing sub to follow master, while making sure subsub is not checked out at its own master.


Setup

Let's make a repo subsub with two commits:

vonc@VONCAVN7 D:\git\tests\subm
> git init subsub1
Initialized empty Git repository in D:/git/tests/subm/subsub1/.git/
> cd subsub1
> git commit --allow-empty -m "subsub c1"
[master (root-commit) f3087a9] subsub c1
> git commit --allow-empty -m "subsub c2"
[master 03d08cc] subsub c2

Lets make that repo subsub a submodule of another repo 'sub':

vonc@VONCAVN7 D:\git\tests\subm
> git init sub
Initialized empty Git repository in D:/git/tests/subm/sub/.git/

> cd sub
> git submodule add -- ../subsub
Cloning into 'D:/git/tests/subm/sub/subsub'...
done.

By default, that submodule 'subsub' is checked out at its own master HEAD (gl is an alias for git log with pretty format):

vonc@VONCAVN7 D:\git\tests\subm\sub\subsub
> gl
* 03d08cc  - (HEAD -> master, origin/master, origin/HEAD) subsub c2 (4 minutes ago) VonC
* f3087a9  - subsub c1 (4 minutes ago) VonC

Let's make sure sub has subsub checked out at c1 (which is not master HEAD C2):

vonc@VONCAVN7 D:\git\tests\subm\sub\subsub
> git checkout @~
Note: checking out '@~'.

You are in 'detached HEAD' state.     
HEAD is now at f3087a9... subsub c1
> git br -avv
* (HEAD detached at f3087a9) f3087a9 subsub c1
  master                03d08cc [origin/master] subsub c2
  remotes/origin/HEAD   -> origin/master
  remotes/origin/master 03d08cc subsub c2

Let's add and commit that submodule 'subsub' (checked out at master~1 c1) in its parent repo 'sub':

vonc@VONCAVN7 D:\git\tests\subm\sub\subsub
> cd ..
vonc@VONCAVN7 D:\git\tests\subm\sub
> git add .
> git commit -m "subsub at HEAD-1"
[master (root-commit) 1b8144b] subsub at HEAD-1
 2 files changed, 4 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 subsub

Let's add a couple of commits in that repo 'sub':

vonc@VONCAVN7 D:\git\tests\subm\sub
> git commit --allow-empty -m "sub c1"
[master b7d1c40] sub c1
> git commit --allow-empty -m "sub c2"
[master c77f4b2] sub c2

vonc@VONCAVN7 D:\git\tests\subm\sub
> gl
* c77f4b2  - (HEAD -> master) sub c2 (2 seconds ago) VonC
* b7d1c40  - sub c1 (3 seconds ago) VonC
* 1b8144b  - subsub at HEAD-1 (77 seconds ago) VonC

The latest commit of sub does reference its submodule 'subsub' at the right commit (the subsub c1 one, not the c2 one)

vonc@VONCAVN7 D:\git\tests\subm\sub
> git ls-tree @
100644 blob 25a0feba7e1c1795be3b8e7869aaa5dba29d33e8    .gitmodules
160000 commit f3087a9bc9b743625e0799f57c017c82c50e35d6  subsub
              ^^^
              That is subsub master~1 commit c1

Finally, let's make a main parent repo 'parent' and add 'sub' as a submodule:

vonc@VONCAVN7 D:\git\tests\subm
> git init parent
Initialized empty Git repository in D:/git/tests/subm/parent/.git/
> cd parent

vonc@VONCAVN7 D:\git\tests\subm\parent
> git submodule add -- ../sub
Cloning into 'D:/git/tests/subm/parent/sub'...
done.

Let's make sure that sub is not checked out at its master HEAD (like we did before for subsub)

vonc@VONCAVN7 D:\git\tests\subm\parent
> cd sub

vonc@VONCAVN7 D:\git\tests\subm\parent\sub
> gl
* c77f4b2  - (HEAD -> master, origin/master, origin/HEAD) sub c2 (2 minutes ago) VonC
* b7d1c40  - sub c1 (2 minutes ago) VonC
* 1b8144b  - subsub at HEAD-1 (3 minutes ago) VonC

vonc@VONCAVN7 D:\git\tests\subm\parent\sub
> git checkout @~1
Note: checking out '@~1'.

You are in 'detached HEAD' state.
HEAD is now at b7d1c40... sub c1

Now, we add sub (checked out at its c1 commit, not at its c2 master HEAD) to the parent repo:

vonc@VONCAVN7 D:\git\tests\subm\parent
> git add .
> git st
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

        new file:   .gitmodules
        new file:   sub


vonc@VONCAVN7 D:\git\tests\subm\parent
> git commit -m "sub at c1"
[master (root-commit) 27374ec] sub at c1
 2 files changed, 4 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 sub

Let's make sub follows master as a submodule in the repo parent:

vonc@VONCAVN7 D:\git\tests\subm\parent
> git config -f .gitmodules submodule.sub.branch master
> git diff
diff --git a/.gitmodules b/.gitmodules
index 8688a8c..97974c1 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,3 +1,4 @@
 [submodule "sub"]
        path = sub
        url = ../sub
+       branch = master

vonc@VONCAVN7 D:\git\tests\subm\parent
> git add .
> git commit -m "sub follows master"
[master 2310a02] sub follows master
 1 file changed, 1 insertion(+)

vonc@VONCAVN7 D:\git\tests\subm\parent
> gl
* 2310a02  - (HEAD -> master) sub follows master (1 second ago) VonC
* 27374ec  - sub at c1 (2 minutes ago) VonC

BUG:

If I clone the repo parent, and then ask to any of its submodule to checkout following their remote branch, sub and subsub will checkout their master branch (while only sub should checkout master, subsub should remain at c1)

First the clone:

vonc@VONCAVN7 D:\git\tests\subm
> git clone --recursive parent p1
Cloning into 'p1'...
done.
Submodule 'sub' (D:/git/tests/subm/sub) registered for path 'sub'
Cloning into 'D:/git/tests/subm/p1/sub'...
done.
Submodule path 'sub': checked out 'b7d1c403edaddf6a4c00bbbaa8e2dfa6ffbd112f'
Submodule 'subsub' (D:/git/tests/subm/subsub) registered for path 'sub/subsub'
Cloning into 'D:/git/tests/subm/p1/sub/subsub'...
done.
Submodule path 'sub/subsub': checked out 'f3087a9bc9b743625e0799f57c017c82c50e35d6'

So far, so good: sub and subsub are checked out at c1, not c2 (that is: not their master HEAD c2)

But:

vonc@VONCAVN7 D:\git\tests\subm\p1
> git submodule update --recursive --remote
Submodule path 'sub': checked out 'c77f4b2590794e030ec68a8cea23ae566701d2de'
Submodule path 'sub/subsub': checked out '03d08cc81e3b9c0734b8f53fad03ea7b1f0373df'

Now, from the clone p1, both submodule and subsubmodule are at their master HEAD c2.

And that, even so sub (checked out at its master as expected) still has subsub at c2:

vonc@VONCAVN7 D:\git\tests\subm\p1\sub
> git ls-tree @
100644 blob 25a0feba7e1c1795be3b8e7869aaa5dba29d33e8    .gitmodules
160000 commit f3087a9bc9b743625e0799f57c017c82c50e35d6  subsub

Workaround:

Without modifying anything within sub and subsub, here is how to make sure subsub remains at its expected c1 commit instead of following master (like sub is supposed to)

Call git submodule update --recursive from the submodule which has itself nested submodules (so no --remote here)

vonc@VONCAVN7 D:\git\tests\subm\p1\sub
> git submodule update --recursive
Submodule path 'subsub': checked out 'f3087a9bc9b743625e0799f57c017c82c50e35d6'

We now have:

  • sub remaining at master (because of parent .gitmodules branch directive, and its initial git submodule update --recursive --remote)
  • subsub is set back to its recorded sha1 (c1, not master c2)

Conclusion

  1. It does look like a bad design: --recursive applies the --remote to all nested submodules, defaulting to master when no submodule.<path>.<branch> is found.
  2. You can script your way out of this in order to:
  • update --remote what you want
  • resetting any submodule which has no branch specified in the top parent repo .gitmodules file to their proper SHA1.

Simply create anywhere in your %PATH% the git-subupd script (a bash script, which will work even in a regular Windows CMD session, because it will be interpreted by the git bash)

git-subupd:

#!/bin/bash
git submodule update --recursive --remote
export top=$(pwd)
git submodule foreach --recursive 'b=$(git config -f ${top}/.gitmodules submodule.${path}.branch); case "${b}" in "") git checkout ${sha1};; esac'

The "combinations of git commands" is reduced to one git call:

cd /path/to/parent/repo
git subupd

That is it.
(Any script called git-xxx can be called by git with git xxx)

vonc@VONCAVN7 D:\git\tests\subm\p1
> git subupd
Submodule path 'sub/subsub': checked out '03d08cc81e3b9c0734b8f53fad03ea7b1f0373df'
Entering 'sub'
Entering 'sub/subsub'
Previous HEAD position was 03d08cc... subsub c2
HEAD is now at f3087a9... subsub c1

sub remains set to master (commit c2, unchanged), while subsub is reset to c1 (instead of its master c2).

The OP BartBog declares in the comments using a slight variation of that script with:

export top=$(pwd)
git submodule foreach --recursive \
  'b=$(git config -f ${top}/.gitmodules submodule.${path}.branch); \
   case "${b}" in \
     "") git checkout ${sha1};; \
      *) git checkout ${b}; git pull origin ${b};; \
   esac' 

to avoid the call to submodule update --remote AND to make sure that my submodules are not in detached head state (conform your answer).