git shallow clone to specific tag

I want to clone the Linux kernel repo, but only from version 3.0 onwards, since the kernel repo is so huge it makes my versioning tools run faster if I can do a shallow clone. The core of my question is: how can I tell git what the "n" value is for the --depth parameter? I was hoping this would work:

git clone http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git --depth v3.0

thanks.


How about cloning the tag to a depth of 1?

  • git clone --branch mytag0.1 --depth 1 https://example.com/my/repo.git

Notes:

  • --depth 1 implies --single-branch, so no info from other branches is brought to the cloned repository
  • if you want to clone a local repository, use file:// instead of only the repository path
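To see both notes in action without touching a real remote, here is a minimal sketch. It assumes a throwaway repository built in a temporary directory; the names src, clone and the tag v0.1 are made up for the demo and are not from the question.

```shell
#!/bin/sh
# Sketch: shallow-clone one tag from a throwaway local repository.
# All names (src, clone, v0.1, the demo identity) are hypothetical.
shallow_tag_demo() (
    set -e
    work=$(mktemp -d)
    trap 'rm -rf "$work"' EXIT
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

    # Build a source repo with three commits and tag the second one.
    git init -q "$work/src"
    for n in 1 2 3; do
        git -C "$work/src" commit -q --allow-empty -m "commit $n"
    done
    git -C "$work/src" tag v0.1 HEAD~1

    # file:// makes git use the normal transport, so --depth is honoured
    # even though the source repository is local.
    git clone -q --branch v0.1 --depth 1 "file://$work/src" "$work/clone" 2>/dev/null

    # Print how many commits the clone actually contains.
    git -C "$work/clone" rev-list --count HEAD
)
shallow_tag_demo
```

The clone ends up on a detached HEAD at the tag, with a single commit of history.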

Read on for a workaround, but unfortunately git clone does not work the way you are asking. The --depth parameter takes a number, not a tag or a commit: it truncates the history to that many commits back from the tip of each fetched branch. In your situation, even if you picked a depth large enough to reach v3.0, the cut would not land exactly on the tag. In a heavily merged history like the kernel's, every side branch is followed to the same depth, so you can still receive commits that are older than v3.0, and you may miss some newer ones that sit on longer paths.

Now here is how to do what you want: the key is that you need only the commits between v3.0 and the most recent reference you want. Here are the steps I took to do just that:

  • git clone http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git --depth 10075 smaller_kernel_repo
  • cd smaller_kernel_repo
  • Determine the sha of v3.0: git log --oneline v3.0^..v3.0
  • Create a graft point starting with this sha (it is 02f8c6aee8df3cdc935e9bdd4f2d020306035dbe)
  • echo "02f8c6aee8df3cdc935e9bdd4f2d020306035dbe" > .git/info/grafts
  • To get around some issues with some kernel log entries do: export GIT_AUTHOR_NAME="tmp" and export GIT_COMMITTER_NAME="tmp"

  • The git filter-branch man page has a nice warning that it rewrites history by following graft points... so let's abuse that: now run git filter-branch and sit back and wait... (and wait and wait)

Now you need to clean up everything:

git reflog expire --expire=now --all
git repack -ad  # Remove dangling objects from packfiles
git prune       # Remove dangling loose objects

This process is time-consuming but not very complex. Hopefully it will save you all the time you were hoping for in the long run. At this point you will essentially have a repo with an amended history containing only v3.0 onwards from the linux-stable.git repo. Just as if you had used --depth on clone, the same restrictions apply: you can only modify and send patches based on the history you already have. There are ways around that, but they deserve their own Q&A.

I am in the process of testing out the last few steps myself, but the git filter-branch operation is still going. I'll update this post with any issues, but I'll go ahead and post it so you can start on this process if you find it acceptable.

UPDATE

Workaround for the issue (fatal: empty ident <> not allowed). This issue stems from a problem in the commit history of the linux repo.

Change the git filter-branch command to:

git filter-branch --commit-filter '
    # Some commits have an empty author email; fill in placeholder idents.
    # Double quotes are used inside because the whole filter is single-quoted.
    if [ -z "$GIT_AUTHOR_EMAIL" ]; then
        GIT_AUTHOR_EMAIL="tmp@tmp"
        GIT_AUTHOR_NAME="tmp"
        GIT_COMMITTER_NAME="Me"
        GIT_COMMITTER_EMAIL="[email protected]"
    fi
    git commit-tree "$@"
'

For someone who already has a clone, this command gets the number of commits between the tip of the current branch and the tag 5.6:

$ git rev-list HEAD ^5.6 --count
407

I found this project implementing rev-list using the GitHub API: https://github.com/cjlarose/github-rev-list

The very lengthy man page on rev-list indicates there is a lot going on behind the scenes. With branches and merges coming and going, there are many different paths along which commits could be counted. For this use case, though, that can probably be ignored(?)
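To connect that count to --depth, here is a minimal sketch under a strong assumption: the history is purely linear. In that case the count plus one (for the tagged commit itself) is the depth that reaches back exactly to the tag. With merges, as in the kernel, the same number is only a rough guide. The repository, the tag v0.1 and the identity below are made up for the demo.

```shell
#!/bin/sh
# Sketch: derive a --depth value from a commit count, linear history only.
# All names (src, clone, v0.1, the demo identity) are hypothetical.
depth_from_tag_demo() (
    set -e
    work=$(mktemp -d)
    trap 'rm -rf "$work"' EXIT
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

    # Five linear commits, with the tag on the second one.
    git init -q "$work/src"
    for n in 1 2 3 4 5; do
        git -C "$work/src" commit -q --allow-empty -m "commit $n"
        if [ "$n" = 2 ]; then
            git -C "$work/src" tag v0.1
        fi
    done

    # Commits reachable from HEAD but not from the tag.
    n_since=$(git -C "$work/src" rev-list --count HEAD ^v0.1)

    # One extra level of depth so the tagged commit itself is included.
    git clone -q --depth $((n_since + 1)) "file://$work/src" "$work/clone" 2>/dev/null
    n_cloned=$(git -C "$work/clone" rev-list --count HEAD)

    # Prints: <commits since tag> <commits in the shallow clone>
    echo "$n_since $n_cloned"
)
depth_from_tag_demo
```

For this toy history the function prints "3 4": three commits after the tag, and four commits in the clone once the tagged commit is included.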


Unfortunately the --depth parameter of git clone accepts only a number: the number of commits to which the cloned repository's history should be truncated.
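One thing not mentioned in this thread: newer Git releases (2.11 and later, as far as I know) also have a --shallow-exclude option for clone and fetch, which cuts the history at a named ref instead of a number. The sketch below tries it against a throwaway local repository. The names src, clone and v0.1 are made up, and whether a given server (including kernel.org) supports this is an assumption you would need to check.

```shell
#!/bin/sh
# Sketch: cut a shallow clone at a tag with --shallow-exclude.
# All names (src, clone, v0.1, the demo identity) are hypothetical.
shallow_exclude_demo() (
    set -e
    work=$(mktemp -d)
    trap 'rm -rf "$work"' EXIT
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

    # Four linear commits, with the tag on the second one.
    git init -q "$work/src"
    for n in 1 2 3 4; do
        git -C "$work/src" commit -q --allow-empty -m "commit $n"
        if [ "$n" = 2 ]; then
            git -C "$work/src" tag v0.1
        fi
    done

    # Exclude the tagged commit and everything reachable from it.
    git clone -q --shallow-exclude=v0.1 "file://$work/src" "$work/clone" 2>/dev/null

    # Print how many commits the clone contains.
    git -C "$work/clone" rev-list --count HEAD
)
shallow_exclude_demo
```

Note that the tagged commit itself is excluded, so the clone holds only the commits made after the tag (two in this toy history).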

A possible solution is to clone the entire repository and then truncate its history to keep only commits after v3.0. Here is a good how-to: http://bogdan.org.ua/2011/03/28/how-to-truncate-git-history-sample-script-included.html

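git checkout --orphan temp v3.0       # new parentless branch holding the v3.0 tree
git commit -m "Truncated history"     # this commit becomes the new root
git rebase --onto temp v3.0 master    # replay the commits after v3.0 onto the new root
git branch -D temp                    # drop the helper branch
git gc                                # repack and clean up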