Add subdirectory of remote repo with git-subtree

Solution 1:

I've been experimenting with this, and found some partial solutions, though none are quite perfect.

For these examples, I'll consider merging the four files from contrib/completion/ of https://github.com/git/git.git into third_party/git_completion/ of the local repository.

1. git diff | git apply

This is probably the best way I've found. I only tested one-way merging; I haven't tried sending changes back to the upstream repository.

# Do this the first time:
$ git remote add -f -t master --no-tags gitgit https://github.com/git/git.git
# The next line is optional. Without it, the upstream commits get
# squashed; with it they will be included in your local history.
$ git merge -s ours --no-commit gitgit/master
# The trailing slash is important here!
$ git read-tree --prefix=third_party/git-completion/ -u gitgit/master:contrib/completion
$ git commit

# In future, you can merge in additional changes as follows:
# The next line is optional. Without it, the upstream commits get
# squashed; with it they will be included in your local history.
$ git merge -s ours --no-commit gitgit/master
# Replace the SHA1 below with the commit hash that you most recently
# merged in using this technique (i.e. the most recent commit on
# gitgit/master at the time).
$ git diff --color=never 53e53c7c81ce2c7c4cd45f95bc095b274cb28b76:contrib/completion gitgit/master:contrib/completion | git apply -3 --directory=third_party/git-completion
# Now fix any conflicts if you'd modified third_party/git-completion.
$ git commit

Since it's awkward having to remember the most recent commit SHA1 that you merged from the upstream repository, I've written this Bash function which does all the hard work for you (grabbing it from git log):

git-merge-subpath() {
    local SQUASH
    if [[ $1 == "--squash" ]]; then
        SQUASH=1
        shift
    fi
    if (( $# != 3 )); then
        local PARAMS="[--squash] SOURCE_COMMIT SOURCE_PREFIX DEST_PREFIX"
        echo "USAGE: ${FUNCNAME[0]} $PARAMS"
        return 1
    fi

    # Friendly parameter names; strip any trailing slashes from prefixes.
    local SOURCE_COMMIT="$1" SOURCE_PREFIX="${2%/}" DEST_PREFIX="${3%/}"

    local SOURCE_SHA1
    SOURCE_SHA1=$(git rev-parse --verify "$SOURCE_COMMIT^{commit}") || return 1

    local OLD_SHA1
    local GIT_ROOT=$(git rev-parse --show-toplevel)
    if [[ -n "$(ls -A "$GIT_ROOT/$DEST_PREFIX" 2> /dev/null)" ]]; then
        # OLD_SHA1 will remain empty if there is no match.
        local RE="^${FUNCNAME[0]}: [0-9a-f]{40} $SOURCE_PREFIX $DEST_PREFIX\$"
        OLD_SHA1=$(git log -1 --format=%b -E --grep="$RE" \
                   | grep --color=never -E "$RE" | tail -1 | awk '{print $2}')
    fi

    local OLD_TREEISH
    if [[ -n $OLD_SHA1 ]]; then
        OLD_TREEISH="$OLD_SHA1:$SOURCE_PREFIX"
    else
        # This is the first time git-merge-subpath is run, so diff against the
        # empty commit instead of the last commit created by git-merge-subpath.
        OLD_TREEISH=$(git hash-object -t tree /dev/null)
    fi &&

    if [[ -z $SQUASH ]]; then
        git merge -s ours --no-commit "$SOURCE_COMMIT"
    fi &&

    git diff --color=never "$OLD_TREEISH" "$SOURCE_COMMIT:$SOURCE_PREFIX" \
        | git apply -3 --directory="$DEST_PREFIX" || git mergetool

    if (( $? == 1 )); then
        echo "Uh-oh! Try cleaning up with |git reset --merge|."
    else
        git commit -em "Merge $SOURCE_COMMIT:$SOURCE_PREFIX/ to $DEST_PREFIX/

# Feel free to edit the title and body above, but make sure to keep the
# ${FUNCNAME[0]}: line below intact, so ${FUNCNAME[0]} can find it
# again when grepping git log.
${FUNCNAME[0]}: $SOURCE_SHA1 $SOURCE_PREFIX $DEST_PREFIX"
    fi
}

Use it like this:

# Do this the first time:
$ git remote add -f -t master --no-tags gitgit https://github.com/git/git.git
$ git-merge-subpath gitgit/master contrib/completion third_party/git-completion

# In future, you can merge in additional changes as follows:
$ git fetch gitgit
$ git-merge-subpath gitgit/master contrib/completion third_party/git-completion
# Now fix any conflicts if you'd modified third_party/git-completion.

2. git read-tree

If you're never going to make local changes to the merged in files, i.e. you're happy to always overwrite the local subdirectory with the latest version from upstream, then a similar but simpler approach is to use git read-tree:

# Do this the first time:
$ git remote add -f -t master --no-tags gitgit https://github.com/git/git.git
# The next line is optional. Without it, the upstream commits get
# squashed; with it they will be included in your local history.
$ git merge -s ours --no-commit gitgit/master
$ git read-tree --prefix=third_party/git-completion/ -u gitgit/master:contrib/completion
$ git commit

# In future, you can *overwrite* with the latest changes as follows:
# As above, the next line is optional (affects squashing).
$ git merge -s ours --no-commit gitgit/master
$ git rm -rf third_party/git-completion
$ git read-tree --prefix=third_party/git-completion/ -u gitgit/master:contrib/completion
$ git commit

I found a blog post that claimed to be able to merge (without overwriting) using a similar technique, but it didn't work when I tried it.

3. git subtree

I did actually find a solution that uses git subtree, thanks to http://jrsmith3.github.io/merging-a-subdirectory-from-another-repo-via-git-subtree.html, but it's incredibly slow (each git subtree split command below takes me 9 minutes for a 28 MB repo with 39000 commits on a dual Xeon X5675, whereas the other solutions I found take less than a second).

If you can live with the slowness, it should be workable:

# Do this the first time:
$ git remote add -f -t master --no-tags gitgit https://github.com/git/git.git
$ git checkout gitgit/master
$ git subtree split -P contrib/completion -b temporary-split-branch
$ git checkout master
$ git subtree add --squash -P third_party/git-completion temporary-split-branch
$ git branch -D temporary-split-branch

# In future, you can merge in additional changes as follows:
$ git checkout gitgit/master
$ git subtree split -P contrib/completion -b temporary-split-branch
$ git checkout master
$ git subtree merge --squash -P third_party/git-completion temporary-split-branch
# Now fix any conflicts if you'd modified third_party/git-completion.
$ git branch -D temporary-split-branch

Note that I pass in --squash to avoid polluting the local repository with lots of commits, but you can remove --squash if you'd prefer to preserve the commit history.

It's possible that subsequent splits can be made faster using --rejoin (see https://stackoverflow.com/a/16139361/691281) - I didn't test that.

4. Whole repo git subtree

The OP clearly stated that they want to merge a subdirectory of an upstream repository into a subdirectory of the local repository. If however instead you want to merge an entire upstream repository into a subdirectory of your local repository, then there's a simpler, cleaner, and better supported alternative:

# Do this the first time:
$ git subtree add --squash --prefix=third_party/git https://github.com/git/git.git master

# In future, you can merge in additional changes as follows:
$ git subtree pull --squash --prefix=third_party/git https://github.com/git/git.git master

Or if you prefer to avoid repeating the repository URL, then you can add it as a remote:

# Do this the first time:
$ git remote add -f -t master --no-tags gitgit https://github.com/git/git.git
$ git subtree add --squash --prefix=third_party/git gitgit/master

# In future, you can merge in additional changes as follows:
$ git subtree pull --squash --prefix=third_party/git gitgit/master

# And you can push changes back upstream as follows:
$ git subtree push --prefix=third_party/git gitgit/master
# Or possibly (not sure what the difference is):
$ git subtree push --squash --prefix=third_party/git gitgit/master

See also:

  • http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/
  • https://hpc.uni.lu/blog/2014/understanding-git-subtree/
  • http://getlevelten.com/blog/tom-mccracken/smarter-drupal-projects-projects-management-git-subtree

5. Whole repo git submodule

A related technique is git submodules, but they come with annoying caveats (for example people who clone your repository won't clone the submodules unless they call git clone --recursive), so I didn't investigate whether they can support subpaths.

Edit: git-subtrac (from the author of the earlier git-subtree) seems to solve some of the problems with git submodules. So this might be a good option for merging an entire upstream repository into a subdirectory, but it still doesn't appear to support including only a subdirectory of the upstream repository.

Solution 2:

I was able to do something like this by adding :dirname to the read-tree command. (note that I'm actually just trying to learn git and git-subtrees myself this week, and trying to setup an environment similar to how I had my projects in subversion using svn:externals -- my point being that there might be a better or easier way than the commands I'm showing here...)

So for example, using your example structure above:

git remote add library_remote _URL_TO_LIBRARY_REPO_
git fetch library_remote
git checkout -b library_branch library_remote/master
git checkout master
git read-tree --prefix=dir1 -u library_branch:libdir