Git: Possible to use same submodule working copy by multiple projects?

I'm new to Git. Let's say I have two Git repositories that have the same library added as a submodule:

/home/projects/project1/library_XYZ
/home/projects/project2/library_XYZ

Let's also say I'm working on the projects and the library simultaneously. When I change the library, say in /home/projects/project1/library_XYZ, I would have to push these changes and then pull them in /home/projects/project2/library_XYZ to make them available to project2, right? I think this is inconvenient for two reasons:

  • I will have to build library_XYZ two times.
  • I have an unwanted redundancy that contradicts the actual project organization.

Is there any way to make Git clone the submodule library_XYZ to a single shared local directory, i.e. so that the files are organized like this

/home/projects/project1
/home/projects/project2
/home/projects/library_XYZ

while library_XYZ is still a submodule of both projects?

I think this might be related to this, which is unanswered, although my setup is somewhat different.


Solution 1:

Setting up shared dependencies as submodules is easy. The git submodule command doesn't do it automatically, but a submodule is nothing more than a nested repository -- and git doesn't require any actual repository or its worktree to be in any particular place.

Set up a libraryXYZ repo for use as a shared submodule

# a submodule is just a repository. We're going to share this one.
git clone u://r/libraryXYZ 

# and keep its worktree right here
# (a relative core.worktree is resolved against the .git directory):
( cd libraryXYZ && git config core.worktree .. )

Then, from anywhere, clone the project using the submodule and set it up to use the shared one:

git clone u://r/project1
cd project1
git submodule init
echo gitdir: path/to/shared/libraryXYZ/.git > libraryXYZ/.git

Now project1's libraryXYZ submodule will use the shared libraryXYZ repo and worktree.

Set up your build system to use that same worktree and you're done. You can of course get git to tell you where those are in any given repo:

# for example to see where all a project's submodules' parts are kept
git submodule foreach git rev-parse --git-dir --show-toplevel

# or for just one:
git --git-dir=$project1/libraryXYZ/.git rev-parse --git-dir --show-toplevel
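The shared-repository setup can be tried out on throwaway repositories before touching real projects. The following is a minimal sketch, assuming git is installed; the scratch directory, the demo identity and the file name lib.txt are made up for illustration, and a plain directory stands in for project1's submodule path:

```shell
#!/bin/sh
set -eu

# Scratch area; nothing here touches real projects.
tmp=$(mktemp -d)
cd "$tmp"

# The shared library repository, with one commit (hypothetical content).
git init -q libraryXYZ
(
    cd libraryXYZ
    git config user.name "Demo User"
    git config user.email "demo@example.invalid"
    echo "hello" > lib.txt
    git add lib.txt
    git commit -q -m "initial commit"
    # Relative paths in core.worktree are resolved against the .git directory.
    git config core.worktree ..
)

# Stand-in for project1's (empty) submodule directory.
mkdir -p project1/libraryXYZ
echo "gitdir: $tmp/libraryXYZ/.git" > project1/libraryXYZ/.git

# Ask git, from inside the project, where the submodule's parts live.
gitdir=$(git -C project1/libraryXYZ rev-parse --git-dir)
toplevel=$(git -C project1/libraryXYZ rev-parse --show-toplevel)
echo "git dir:  $gitdir"
echo "worktree: $toplevel"
```

Both paths should point into the shared libraryXYZ directory rather than into project1.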

(late edit: @twalberg's remark is worth keeping in mind, this might make it a bit too easy to do a git submodule update from one project without realizing you've also altered the build environment for every other project that shares dependencies.)

Solution 2:

I had more or less the same problem as you: a large generic utility library as a submodule, and many projects depending on it. I did not want to create a separate checkout for every instance of the utility library.

The solution suggested by jthill works fine, but it only solves the first half of the problem, namely how to keep git happy.

What was missing is how to keep your build system happy, which expects actual files to work with and does not care about gitlink references.

But if you combine his idea with a symlink, you get what you want!

In order to implement this, let's start with the projects from your example

/home/projects/project1
/home/projects/project2
/home/projects/library_XYZ

assuming both project1 and project2 already have library_XYZ added as a submodule, and that currently all three directories contain a full checkout of library_XYZ.

In order to replace the full checkouts of the library submodules by a shared symlink to the library's checkout, do this:

sharedproject="/home/projects/library_XYZ"
superproject="/home/projects/project1"
submodule="library_XYZ"
cd "$superproject"
(cd -- "$submodule" && git status) # Verify that no uncommitted changes exist!
(cd -- "$submodule" && git push -- "$sharedproject") # Save any local-only commits
git submodule deinit -- "$submodule" # Get rid of submodule's check-out
rm -rf .git/modules/"$submodule" # as well as of its local repository
mkdir -p .submods
git mv -- "$submodule" .submods/
echo "gitdir: $sharedproject/.git" > ".submods/$submodule/.git"
ln -s -- "$sharedproject" "$submodule"
echo "/$submodule" >> .gitignore

and then repeat the same steps for /home/projects/project2 as $superproject.

Here is an explanation of what has been done:

The submodule checkout is removed with "git submodule deinit", leaving library_XYZ behind as an empty directory. Be sure to commit any changes before you do this, because it will remove the checkout!

For that reason, the script first saves any commits local to the check-out which have not yet been pushed to the shared project, using "git push" to /home/projects/library_XYZ, and only then runs the deinit.

If the push does not work because you did not set up a remote or refspec for it, you can do this instead:

(saved_from=$(basename -- "$superproject"); \
 cd -- "$submodule" \
 && git push -- "$sharedproject" \
             "refs/heads/*:refs/remotes/$saved_from/*")

This will save backups of all branches of the submodule's local repository as remote-tracking branches in /home/projects/library_XYZ. The basename of the $superproject directory will be used as the name of the remote, i.e. project1 or project2 in our example.

Of course, no remote of that name actually exists in /home/projects/library_XYZ, but the saved branches will show up as if it did when "git branch -r" is executed there.

As a safeguard, the refspec in the above command does not start with a "+", so the "git push" cannot accidentally overwrite any branch which already happens to exist in /home/projects/library_XYZ.
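The backup push and its safeguard can be tried on two throwaway repositories. This is only a sketch, assuming git is installed; the scratch paths, the branch name topic and the demo identity are invented for the example, and a plain repository stands in for the submodule's local one:

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
sharedproject="$tmp/library_XYZ"   # stand-in for /home/projects/library_XYZ
superproject="$tmp/project1"       # only its basename is used here
localrepo="$tmp/submodule_repo"    # stand-in for the submodule's own repository

git init -q "$sharedproject"
git init -q "$localrepo"
cd "$localrepo"
git config user.name "Demo User"
git config user.email "demo@example.invalid"
echo "local work" > notes.txt
git add notes.txt
git commit -q -m "local-only commit"
git branch -M topic                # give the branch a known name
original=$(git rev-parse HEAD)

saved_from=$(basename -- "$superproject")
refspec="refs/heads/*:refs/remotes/$saved_from/*"

# Back up all local branches as pseudo-remote branches in the shared repo.
git push -q -- "$sharedproject" "$refspec"
git -C "$sharedproject" branch -r

# Rewrite the local commit and push again: without a leading "+" the
# non-fast-forward update is refused and the earlier backup survives.
git commit -q --amend -m "rewritten commit"
if git push -q -- "$sharedproject" "$refspec" 2>/dev/null
then overwrite=accepted
else overwrite=rejected
fi
backup=$(git -C "$sharedproject" rev-parse "refs/remotes/$saved_from/topic")
echo "second push: $overwrite"
```

After the second push is refused, the backup ref in the shared repository still names the original commit.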

Next, .git/modules/library_XYZ is removed in order to save space. We can do this because we no longer need to use "git submodule init" or "git submodule update": the submodule will share both the check-out and the .git directory of /home/projects/library_XYZ, so no local copy of either is needed.

Then we let git rename the empty submodule directory to ".submods/library_XYZ", a (hidden) directory the files in the projects will never use directly.

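Next we apply jthill's partial solution to the problem and create a gitlink file (a .git file containing a "gitdir:" line) in .submods/library_XYZ, which makes git see /home/projects/library_XYZ as the working tree and git repo of the submodule. Note that for git to treat /home/projects/library_XYZ, and not .submods/library_XYZ, as the working tree, core.worktree presumably has to be set in the shared repository as shown in jthill's answer.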

And now comes the new thing: We create a symlink with the relative name "library_XYZ" which points to /home/projects/library_XYZ. This symlink will not be put under version control, so we add it to the .gitignore file.

All the build files in project1 and project2 will use the library_XYZ symlink as if it were a normal subdirectory, but actually find the files from the working tree in /home/projects/library_XYZ there.

No-one except git actually uses .submods/library_XYZ!

However, as the symlink ./library_XYZ is not versioned, it won't be created when checking out project1 or project2. We therefore need to take care it will be created automatically when missing.

This should be done by the build infrastructure of project1/project2 with a command equivalent to the following shell commands:

test -e library_XYZ || ln -s .submods/library_XYZ library_XYZ

For instance, if project1 is built using a Makefile and contains the following target rule for updating the subproject

library_XYZ/libsharedutils.a:
        cd library_XYZ && $(MAKE) libsharedutils.a

then we insert the line from above as the first line of the rule's action. The guard is written as "test -e ... ||" rather than "test ! -e ... &&" so that it also succeeds when library_XYZ already exists; make aborts the rule if a command fails:

library_XYZ/libsharedutils.a:
        test -e library_XYZ || ln -s .submods/library_XYZ library_XYZ
        cd library_XYZ && $(MAKE) libsharedutils.a
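In a make recipe the exit status of such a guard matters: a guard of the form "test ! -e X && ln ..." exits non-zero once X exists, which make treats as a failure, whereas "test -e X || ln ..." succeeds in both cases. A small sketch comparing the two in a scratch directory (the directory layout is the only thing taken from the example above):

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
cd "$tmp"
mkdir -p .submods/library_XYZ

# "||" form: creates the link on the first run, does nothing on the
# second, and exits 0 both times.
test -e library_XYZ || ln -s .submods/library_XYZ library_XYZ
test -e library_XYZ || ln -s .submods/library_XYZ library_XYZ
or_status=$?

# "&&" form once the link exists: "test ! -e" fails, so the command
# list exits non-zero, which make would report as an error.
and_status=0
test ! -e library_XYZ && ln -s .submods/library_XYZ library_XYZ || and_status=$?

echo "or form: $or_status, and form: $and_status"
```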

If your project is using some other build system you can usually do the same thing by creating a custom rule for creating the library_XYZ subdirectory.

If your project contains only scripts or documents and does not use any kind of build system at all, you can add a script which the user can run for creating the "missing directories" (actually: symlinks) as follows:

(n=create_missing_dirs.sh && cat > "$n" << 'EOF' && chmod +x -- "$n")
#! /bin/sh
for dir in .submods/*
do
        sym=${dir#*/}
        if test -d "$dir" && test ! -e "$sym"
        then
                echo "Creating $sym"
                ln -snf -- "$dir" "$sym"
        fi
done
EOF

This will create symlinks to all submodule check-outs in .submods, but only if they don't exist yet or if they are broken.
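The three cases the loop has to handle (link missing, link broken, something already present) can be checked in a scratch directory. A sketch with made-up submodule names libA, libB and libC:

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
cd "$tmp"

# Three hypothetical submodule check-outs under .submods.
mkdir -p .submods/libA .submods/libB .submods/libC
ln -s "$tmp/missing" libB      # a broken symlink
mkdir libC                     # a real directory that must be left alone

# Same loop as in create_missing_dirs.sh.
for dir in .submods/*
do
        sym=${dir#*/}
        if test -d "$dir" && test ! -e "$sym"
        then
                echo "Creating $sym"
                ln -snf -- "$dir" "$sym"
        fi
done
```

Here libA is created, the broken libB is replaced, and libC is skipped because "test ! -e" is false for an existing directory.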

So much for the transformation from a conventional submodule layout to the new layout which allows sharing.

Once you already have that layout committed, check out the superproject somewhere, go to its top-level directory, and do the following in order to enable sharing:

sharedproject="/home/projects/library_XYZ"
submodule="library_XYZ"
ln -sn -- "$sharedproject" "$submodule"
echo "gitdir: $sharedproject/.git" > ".submods/$submodule/.git"

I hope you get the point: The library_XYZ subdirectory used by project1 and project2 is an unversioned symlink rather than corresponding to the submodule path as defined in ".gitmodules".

The symlink will be created automatically by the build infrastructure itself and will then point to .submods/library_XYZ, but only, and this is important, if the symlink does not already exist.

This allows one to create the symlink manually instead of letting the build system create it, so it can also be made to point to a single shared check-out rather than to .submods/library_XYZ.

That way you can use a shared check-out on your machine if you want.

But if another person does nothing special and just checks out project1 and does a normal "git submodule update --init library_XYZ", things will work the same without a shared check-out.

No changes to the checked-out build files necessary in either case!

In other words, a check-out of project1 and project2 will work out of the box as usual, no special instructions need to be followed by other people using your repo.

But by manually creating the gitlink file and the library_XYZ symlink before the build system has a chance to create the symlink, you can locally "override" the symlink and enforce a shared checkout of the library.
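The symlink half of this override can be illustrated without git. The sketch below uses a scratch directory, a made-up README file and plain directories in place of real check-outs; it only shows that a manually created link survives the guard the build system runs later:

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
sharedproject="$tmp/library_XYZ"   # stand-in for /home/projects/library_XYZ
mkdir -p "$sharedproject" "$tmp/project1/.submods/library_XYZ"
echo "shared copy" > "$sharedproject/README"

cd "$tmp/project1"
submodule="library_XYZ"

# Manual override, done before the first build.
ln -sn -- "$sharedproject" "$submodule"

# What the build system does later: create the link only if it is missing.
test -e "$submodule" || ln -s ".submods/$submodule" "$submodule"

target=$(readlink "$submodule")
content=$(cat "$submodule/README")
echo "$submodule -> $target"
```

Because the link already exists when the guard runs, it keeps pointing at the shared check-out and the build sees the shared files.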

And there is another advantage: as it turns out, you don't need to bother with "git submodule init" or "git submodule update" at all if you use the above solution. It just works without them!

This is because "git submodule init" is only necessary as a preparation for "git submodule update". But you won't need the latter, because the library is already checked out somewhere else and already has its own .git directory there. So there is nothing for "git submodule update" to do.

As a side effect of no longer using "git submodule update", no .git/modules subdirectory is required either. Nor is there any need to set up alternates (--reference option) for the submodules.

Also, you don't need any remotes for pushing/pulling /home/projects/library_XYZ in /home/projects/project1 and /home/projects/project2 any more. So you can remove the remote for accessing library_XYZ from project1 and project2.

A win-win situation!

The only obvious disadvantage of this solution is that it requires symlinks to work.

That means, it won't be possible to check out project1 on, say, a VFAT filesystem.

But then, who does that?

And even when doing so, projects like http://sourceforge.net/projects/posixovl may still be able to work around any symlink limitation of a filesystem.

Finally, some advice for Windows users here:

Symbolic links have been available since Windows Vista via the mklink command, but creating them requires special privileges.

With the "junction" command from Sysinternals, however, directory junctions, which behave much like symlinks to directories, could be created even back in Windows XP times.

In addition, you have the option of using Cygwin, which can (AFAIK) emulate symlinks even without support from the OS.