How to extract one file with commit history from a Git repo with index-filter & co?
Solution 1:
A faster and easier-to-understand filter that accomplishes the same thing:
git filter-branch --index-filter '
git read-tree --empty
git reset $GIT_COMMIT -- $your $files $here
' \
-- --all -- $your $files $here
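To get a feel for what this filter does before pointing it at a real repository, a throwaway repository is enough. The following is only a sketch: the file names keep.txt and drop.txt are made up for the demo, and FILTER_BRANCH_SQUELCH_WARNING merely skips the deprecation pause that newer Git versions print.

```shell
# Scratch-repo demo; keep.txt and drop.txt are hypothetical file names.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer Git
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# Three commits: both files, then only drop.txt, then only keep.txt.
echo one > keep.txt; echo junk > drop.txt
git add . && git commit -q -m "add both files"
echo more >> drop.txt
git add . && git commit -q -m "change drop.txt only"
echo two >> keep.txt
git add . && git commit -q -m "change keep.txt"

# Same filter as above with the file list spelled out (-q only quiets git reset).
git filter-branch --index-filter '
    git read-tree --empty
    git reset -q "$GIT_COMMIT" -- keep.txt
' -- --all -- keep.txt

# Every path that still appears in the rewritten history of HEAD.
remaining=$(git log --pretty=format: --name-only HEAD | sort -u | sed '/^$/d')
printf '%s\n' "$remaining"
```

The check deliberately uses HEAD rather than --all, because filter-branch keeps the old commits reachable under refs/original/ and --all would list their paths too.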
Solution 2:
It seems this is not particularly easy, which is why I'm answering my own question despite the many similar questions about git [index-filter|subdirectory-filter|tree-filter]: I needed all of them to achieve this!
First, a quick note: even a spell like the one in a comment on "Splitting a set of files within a git repo into their own repository, preserving relevant history"
SPELL='git ls-tree -r --name-only --full-tree "$GIT_COMMIT" | grep -v "trie.lisp" | tr "\n" "\0" | xargs -0 git rm --cached -r --ignore-unmatch'
git filter-branch --prune-empty --index-filter "$SPELL" -- --all
will not help with files named like imaging/DrinkkejaI<0300>$'\302\210'.txt_74x2032.gif. The aI<0300>$'\302\210' part once was a single letter: ä.
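One likely reason the spell misses such names: without -z, git ls-tree prints paths containing unusual bytes in a C-quoted form, so the string handed to git rm no longer matches the real path and --ignore-unmatch hides the failure. The sketch below shows the difference in a scratch repository and defines a NUL-delimited variant of the spell. The file name is made up, SPELL_Z is a hypothetical variable name, and grep -z assumes GNU grep.

```shell
# Scratch repo with one non-ASCII file name (hypothetical example).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
printf 'x\n' > 'Drinkkejä.txt'
git add . && git commit -q -m "non-ASCII file name"

# Default output: the path comes back quoted and escaped.
quoted=$(git ls-tree -r --name-only HEAD)
# With -z: raw bytes, NUL-terminated (tr only makes them printable here).
raw=$(git ls-tree -r -z --name-only HEAD | tr '\0' '\n')
printf 'default: %s\nwith -z: %s\n' "$quoted" "$raw"

# Variant of the spell that stays NUL-delimited from start to finish.
# It would be passed to filter-branch exactly like SPELL above.
SPELL_Z='git ls-tree -r -z --name-only --full-tree "$GIT_COMMIT" | grep -z -v "trie\.lisp" | xargs -0 git rm --cached -r --ignore-unmatch'
```

I have not verified this variant against the repository from the question, so treat it as a starting point rather than a confirmed fix.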
So in order to extract a single file, in addition to filter-branch I also needed to do:
git filter-branch -f --subdirectory-filter lisp/source/model HEAD
Alternatively, you can use --tree-filter (the test is needed because the file was in another directory earlier; see: How can I move a directory in a Git repo for all commits?):
MV_FILTER='test -f source/model/trie.lisp && mv ./source/model/trie.lisp . || echo "Nothing to do."'
git filter-branch --tree-filter "$MV_FILTER" HEAD --all
To see all the names a file has had, use:
git log --pretty=oneline --follow --name-only git-path/to/file | grep -v ' ' | sort -u
As described at http://whileimautomaton.net/2010/04/03012432
Afterwards, also run these cleanup steps:
$ git reset --hard
$ git gc --aggressive
$ git prune
$ git remote rm origin # Otherwise changes will be pushed to where the repo was cloned from
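These commands may not reclaim much space on their own, because filter-branch keeps the pre-rewrite commits reachable through backup refs under refs/original/ and through the reflog. A minimal sketch of a fuller cleanup, wrapped in a function whose name (cleanup_rewritten_repo) is made up here:

```shell
# Run inside the rewritten clone. Irreversible: the old history is gone afterwards.
cleanup_rewritten_repo() {
    # Delete the backup refs that filter-branch left under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done
    # Expire reflog entries that still point at the old commits.
    git reflog expire --expire=now --all
    # Repack and drop the now-unreachable objects immediately.
    git gc --quiet --prune=now --aggressive
}
```

Call it as cleanup_rewritten_repo once you are sure the rewritten history is what you want, and only in a clone you can afford to lose.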
Solution 3:
Note that things get much easier if you combine this with the additional step of moving the desired file(s) into a new directory.
This might be a quite common use case (e.g. moving the desired single file to the root dir).
I did it (using git 1.9) like this (first moving the file(s), then deleting the old tree):
git filter-branch -f --tree-filter 'mkdir -p new_path && git mv -k -f old_path/to/file new_path/'
git filter-branch -f --prune-empty --index-filter 'git rm -r --cached --ignore-unmatch old_path'
You can even easily use wildcards for the desired files (without messing around with grep -v ).
I'd think that this ('mv' and 'rm') could also be done in one filter-branch, but it didn't work for me.
I didn't try it with weird characters, but I hope this helps anyway. Making things easier always seems to be a good idea to me.
Hint:
This is a time-consuming operation on large repos. So if you want to perform several actions (like extracting a bunch of files and then rearranging them in 'new_path/subdirs'), it's a good idea to do the 'rm' part as early as possible to get a smaller and faster tree.
Solution 4:
I've found an elegant solution using git log and git am here: https://www.pixelite.co.nz/article/extracting-file-folder-from-git-repository-with-full-git-history/
In case it goes away, here's how you do it:
- In the original repo:
git log --pretty=email --patch-with-stat --reverse --full-index --binary -- path/to/file_or_folder > /tmp/patch
- If the file was in a subdirectory, or if you want to rename it:
sed -i -e 's/deep\/path\/that\/you\/want\/shorter/short\/path/g' /tmp/patch
- In a new, empty repo:
git am < /tmp/patch
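The path rewrite is easier to read with a different sed delimiter, and it is worth trying on a couple of header lines before touching the real patch. A small sketch, reusing the placeholder paths from above; the sample lines stand in for /tmp/patch and file.c is a made-up file name:

```shell
# Placeholder paths, same as in the sed command above.
old='deep/path/that/you/want/shorter'
new='short/path'

# Two header lines as they might appear in the patch (hypothetical file.c).
sample='diff --git a/deep/path/that/you/want/shorter/file.c b/deep/path/that/you/want/shorter/file.c
--- a/deep/path/that/you/want/shorter/file.c'

# With "|" as the delimiter, the slashes in the paths need no escaping.
rewritten=$(printf '%s\n' "$sample" | sed -e "s|$old|$new|g")
printf '%s\n' "$rewritten"
```

Keep in mind that sed rewrites every matching line, so a commit message or file content that happens to contain the old path is changed as well.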
Solution 5:
The following will rewrite the history and keep only commits that touch the list of files you give. You probably want to do that in a clone of your repository to avoid losing the original history.
FILES='path/to/file1 other-path/to/file2 file3'
git filter-branch --prune-empty --index-filter "
git read-tree --empty
git reset \$GIT_COMMIT -- $FILES
" \
-- --all -- $FILES
Then you can merge that new branch into your target repository via the normal merge or rebase commands, according to your use case.
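For the merge route, one possible shape is sketched below. The function name import_history and the remote name extracted are made up for illustration, and --allow-unrelated-histories assumes Git 2.9 or newer, where merging histories without a common ancestor has to be requested explicitly.

```shell
# import_history SRC_REPO BRANCH
# Run inside the target repository; SRC_REPO is the rewritten clone.
import_history() {
    src=$1
    branch=$2
    # Temporary remote pointing at the rewritten clone.
    git remote add extracted "$src"
    git fetch -q extracted
    # The two histories share no commits, so the merge must allow that.
    git merge -q --allow-unrelated-histories \
        -m "Import extracted history" "extracted/$branch"
    git remote remove extracted
}
```

For example, import_history ../rewritten-clone master would bring in the master branch of a clone sitting next to the target repository (both names are placeholders).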