Duplicate submodules with Git
I have a project in Git that has several submodules. I need those submodules downloaded and their files available in order to use the main project, and for the submodules to work, their own submodules need to be available too, and so on. So to set this up I initialise the submodules recursively using git submodule update --init --recursive.
However, I've noticed that many of my submodules have shared dependencies, looking something like this in pseudocode (alpha -> beta represents that alpha has the submodule beta):
my project -> submodule a -> submodule m
           -> submodule b -> submodule m
                          -> submodule n -> submodule x
           -> submodule c -> submodule x
My question is: is there any way of avoiding this duplication using only git, while still having (at least one copy of) the files for each submodule?
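Before picking an approach, it may help to list which submodules actually end up checked out more than once. The following is a minimal sketch, not a built-in git feature: a bash function (list_duplicate_submodules is a made-up name) that walks the whole tree with git submodule foreach --recursive and groups the checkouts by their origin URL. It assumes each submodule's remote is called origin, which is the usual default after git submodule update --init.

```shell
# Sketch (bash): list submodules checked out more than once anywhere in a
# recursively initialised tree, grouped by the URL of their "origin" remote.
# Usage: list_duplicate_submodules [path-to-superproject]
list_duplicate_submodules() {
  local root=${1:-.}
  # foreach runs the quoted command inside every submodule (recursively);
  # $toplevel is the absolute path of the immediate superproject and
  # $sm_path the submodule path relative to it.
  git -C "$root" submodule --quiet foreach --recursive \
    'printf "%s\t%s\n" "$(git config --get remote.origin.url)" "$toplevel/$sm_path"' |
    sort |
    awk -F '\t' '
      { count[$1]++; paths[$1] = paths[$1] "\n  " $2 }
      # print each URL seen more than once, followed by its checkout paths
      END { for (u in count) if (count[u] > 1) printf "%s%s\n", u, paths[u] }'
}
```

For the layout above, running list_duplicate_submodules in "my project" should print the URLs of m and x, each followed by the two paths where it is checked out.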
I can imagine a solution with symlinks, but it would be preferable if git handled this for me, and I'm not sure whether putting in the symlinks myself would cause problems when updating the submodules.
Ideally I'd love to simplify it down to:
my project -> submodule a -> symlink(submodule m)
           -> submodule b -> symlink(submodule m)
                          -> symlink(submodule n)
           -> submodule c -> symlink(submodule x)
           -> submodule m
           -> submodule n -> symlink(submodule x)
           -> submodule x
Thanks in advance for any suggestions!
Solution 1:
This isn't built into git, but you can definitely do it with symlinks, as you suggest. You might want to have a look at git new-workdir (from git's contrib directory), which does essentially this. It isn't aware of submodules, but it doesn't need to be: a submodule doesn't know it's a submodule; it's the parent repo that keeps track of that. I haven't tried this, but I'm fairly certain you could use it something like this:
# remove the target first (new-workdir refuses to overwrite an existing directory)
rm -rf submodule_b/submodule_m
#               (original repo)         (symlinked repo)
git new-workdir submodule_a/submodule_m submodule_b/submodule_m
It works by symlinking essentially all of the .git directory; the notable thing that isn't symlinked is HEAD, so the two directories can have different things checked out but share the same refs and objects.
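If you want to confirm that the sharing actually happened, a small check can compare the physical object directory behind two working trees. This is a sketch of my own, not part of git-new-workdir; the function names are hypothetical, and it assumes bash plus a git recent enough to support rev-parse --git-path.

```shell
# Sketch (bash): print the physical (symlink-resolved) object directory
# of the repository that the working tree "$1" uses.
object_dir() (
  cd "$1" || exit 1
  # --git-path prints a path relative to the directory git runs in
  objects=$(git rev-parse --git-path objects) || exit 1
  cd "$objects" && pwd -P
)

# Succeeds only if both working trees resolve to the same object database,
# as they should after git new-workdir has symlinked .git/objects.
same_object_store() {
  local a b
  a=$(object_dir "$1") || return 1
  b=$(object_dir "$2") || return 1
  [ "$a" = "$b" ]
}
```

With the paths from the example above: same_object_store submodule_a/submodule_m submodule_b/submodule_m && echo "objects are shared". A plain second clone, by contrast, has its own object directory and fails the check.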
From here you should be good. When you run a git submodule command in the superproject, it just goes into the submodules and runs the appropriate commands there, which will all work as expected.
The one thing you usually need to be aware of with symlinked repos like this is that they share the same set of branches, so if they both have the same branch checked out, and you commit to it in one, the other will become out of sync. With submodules this generally won't be a problem, though, since they're essentially always in detached HEAD state unless you intervene.
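To spot the risky case (two copies sharing refs with the same branch checked out), you could print what each submodule currently has checked out. A minimal sketch, again as a bash function with a made-up name; DETACHED is just the label it prints when HEAD is not on a branch.

```shell
# Sketch (bash): for every submodule (recursively), print its absolute path
# and the branch it has checked out, or DETACHED for a detached HEAD.
# Usage: submodule_heads [path-to-superproject]
submodule_heads() {
  git -C "${1:-.}" submodule --quiet foreach --recursive \
    'printf "%s\t%s\n" "$toplevel/$sm_path" "$(git symbolic-ref -q --short HEAD || echo DETACHED)"'
}
```

After a plain git submodule update every line should end in DETACHED; any line showing a branch name is a copy where a commit could leave its symlinked twin out of sync.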
Solution 2:
git-new-workdir might not be a good solution, as discussed here:
http://comments.gmane.org/gmane.comp.version-control.git/196019
It didn't work for me under git 1.7.10.
I have solved it for my use case using hard links. I'm running OS X, whose filesystem allows creating hard links to directories: https://github.com/darwin/hlink
Now I can hard-link submodule directories and git treats them transparently. Hard linking also has the nice property that all submodules are fully mirrored, including HEAD, which is the behavior I prefer in my case.
OK, the idea is to have one "master" submodule repo and hard-link all the "slave" copies back to it. This makes them indistinguishable from each other and fully synced.
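Once the links are in place, a quick way to confirm that each "slave" path really is the "master" directory is the -ef test, which compares device and inode numbers. This is a hypothetical helper, not part of the rake task linked below; note that -ef follows symlinks, so it also reports a symlinked copy as identical.

```shell
# Sketch (bash): check that every "slave" path refers to the very same
# directory (same device and inode) as the "master" path.
# Usage: check_mirrors MASTER SLAVE...   (returns 1 if any slave is separate)
check_mirrors() {
  local master=$1 slave status=0
  shift
  for slave in "$@"; do
    if [ "$master" -ef "$slave" ]; then
      echo "ok       $slave"
    else
      echo "separate $slave"
      status=1
    fi
  done
  return "$status"
}
```

For example, check_mirrors submodule_a/submodule_m submodule_b/submodule_m, using whichever directories you chose as master and slaves.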
CAVEATS:
1) This works fine as long as the relative paths in .git work. In other words, you can only hard-link submodules sitting at the same directory level in the tree. This was my case. I assume you could easily fix it by having your hard-linking task rewrite the .git files. Note: this should be no issue before git 1.7.10, because previously a submodule's .git was a self-contained directory, not just a plain-text .git file pointing somewhere else.
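To see why caveat 1 bites: git resolves a relative gitdir: line in a .git file from the directory that contains that file. A hard-linked submodule directory carries the very same .git file, so at a different depth it points somewhere else, usually at nothing. A sketch that shows where a .git file leads (gitfile_target is a made-up name; it only reads the file and never calls git):

```shell
# Sketch (bash): print the directory a submodule's ".git" file points to.
# A relative "gitdir:" path is resolved from the submodule directory itself,
# which is why the same file only works at the same depth in the tree.
gitfile_target() {
  local dir=$1 line target
  if [ ! -f "$dir/.git" ]; then
    echo "$dir/.git is not a plain file (old-style .git directory?)" >&2
    return 1
  fi
  IFS= read -r line < "$dir/.git"
  case $line in
    "gitdir: "*) target=${line#gitdir: } ;;
    *) echo "$dir/.git has no gitdir: line" >&2; return 1 ;;
  esac
  case $target in
    /*) ;;                      # absolute path: used as-is
    *) target=$dir/$target ;;   # relative path: resolved from the submodule dir
  esac
  if [ -d "$target" ]; then
    (cd "$target" && pwd -P)    # print the physical, symlink-free path
  else
    echo "$dir: gitdir $target does not exist" >&2
    return 1
  fi
}
```

Running it on each hard-linked copy shows which ones still resolve; rewriting the gitdir: line with an absolute path is one way a hard-linking task could sidestep the problem.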
2) Hard links may introduce some incompatibilities. For example, Time Machine gets confused because it uses hard links internally for versioning. Make sure you exclude your project directory from Time Machine.
Here is an example of my rake task doing the job: https://github.com/binaryage/site/blob/3ef664693cafc972d05c57a64c41e89b1c947bfc/rakefile#L94-115