Git sparse checkout with exclusion
Sadly, none of the above worked for me, so I spent a very long time trying different combinations of patterns in the sparse-checkout file.
In my case I wanted to skip folders with IntelliJ IDEA configs.
Here is what I did:
Run git clone https://github.com/myaccount/myrepo.git --no-checkout
Run git config core.sparsecheckout true
Create .git\info\sparse-checkout with the following content:
!.idea/*
!.idea_modules/*
/*
Run 'git checkout --' to get all files.
The critical thing to make it work was to add /* after the folder's name.
I have Git 1.9.
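The file-creation step above can be scripted. The following is a minimal sketch, not part of the original answer: the helper name write_exclusion_patterns is hypothetical, and it only writes the pattern file in the same order as shown above (exclusions first, then /*). It does not run any Git command itself.

```shell
#!/bin/sh
# write_exclusion_patterns GIT_DIR DIR...
# Hypothetical helper: writes GIT_DIR/info/sparse-checkout so that each DIR
# is excluded and everything else is kept, in the order used in the answer.
write_exclusion_patterns() {
    git_dir=$1
    shift
    mkdir -p "$git_dir/info"
    {
        for dir in "$@"; do
            printf '!%s/*\n' "$dir"   # one exclusion line per folder
        done
        printf '/*\n'                 # then include everything at the top level
    } > "$git_dir/info/sparse-checkout"
}
```

Called as write_exclusion_patterns .git .idea .idea_modules from the top of the clone, after core.sparsecheckout has been enabled, it produces the three-line file shown above; the checkout step then still has to be run by hand.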
I would have expected something like the below to work:
/*
!presentations/heavy_presentation
But it doesn't, and I tried many other combinations. I think exclusion is not implemented properly and there are still bugs around it.
Something like:
presentations/*
!presentations/heavy_presentation
does work though and you will get the presentations folder without the heavy_presentation folder.
So the workaround would be to include everything else explicitly.
With Git 2.25 (Q1 2020), management of sparsely checked-out working trees has gained a dedicated "sparse-checkout" command.
First, here is an extended example, starting with a fast clone using a --filter option:
git clone --filter=blob:none --no-checkout https://github.com/git/git
cd git
git sparse-checkout init --cone
# that sets git config core.sparseCheckoutCone true
git read-tree -mu HEAD
Using the cone option (detailed/documented below) means your .git\info\sparse-checkout will include patterns starting with:
/*
!/*/
Meaning: only top-level files, no subfolders.
If you do not want the top-level files, you need to avoid the cone mode:
# Disable cone mode in .git/config.worktree
git config core.sparseCheckoutCone false
# remove .git\info\sparse-checkout
git sparse-checkout disable
# Add the expected pattern, to include just a subfolder without top files:
git sparse-checkout set /mySubFolder/
# populate working-tree with only the right files:
git read-tree -mu HEAD
In detail:
(See more at "Bring your monorepo down to size with sparse-checkout" from Derrick Stolee)
So not only does excluding a subfolder work, but it will work faster with the "cone" mode of a sparse checkout (with Git 2.25).
See commit 761e3d2 (20 Dec 2019) by Ed Maste (emaste).
See commit 190a65f (13 Dec 2019), and commit cff4e91, commit 416adc8, commit f75a69f, commit fb10ca5, commit 99dfa6f, commit e091228, commit e9de487, commit 4dcd4de, commit eb42fec, commit af09ce2, commit 96cc8ab, commit 879321e, commit 72918c1, commit 7bffca9, commit f6039a9, commit d89f09c, commit bab3c35, commit 94c0956 (21 Nov 2019) by Derrick Stolee (derrickstolee).
See commit e6152e3 (21 Nov 2019) by Jeff Hostetler (Jeff-Hostetler).
(Merged by Junio C Hamano -- gitster -- in commit bd72a08, 25 Dec 2019)
sparse-checkout: add 'cone' mode
Signed-off-by: Derrick Stolee
The sparse-checkout feature can have quadratic performance as the number of patterns and number of entries in the index grow.
If there are 1,000 patterns and 1,000,000 entries, this time can be very significant.
Create a new Boolean config option, core.sparseCheckoutCone, to indicate that we expect the sparse-checkout file to contain a more limited set of patterns.
This is a separate config setting from core.sparseCheckout to avoid breaking older clients by introducing a tri-state option.
The config man page includes:
`core.sparseCheckoutCone`:
Enables the "cone mode" of the sparse checkout feature.
When the sparse-checkout file contains a limited set of patterns, then this mode provides significant performance advantages.
The git sparse-checkout man page details:
CONE PATTERN SET
The full pattern set allows for arbitrary pattern matches and complicated inclusion/exclusion rules.
These can result in O(N*M) pattern matches when updating the index, where N is the number of patterns and M is the number of paths in the index. To combat this performance issue, a more restricted pattern set is allowed when core.sparseCheckoutCone is enabled.
The accepted patterns in the cone pattern set are:
- Recursive: All paths inside a directory are included.
- Parent: All files immediately inside a directory are included.
In addition to the above two patterns, we also expect that all files in the root directory are included. If a recursive pattern is added, then all leading directories are added as parent patterns.
By default, when running git sparse-checkout init, the root directory is added as a parent pattern. At this point, the sparse-checkout file contains the following patterns:
/*
!/*/
This says "include everything in root, but nothing two levels below root."
If we then add the folder A/B/C as a recursive pattern, the folders A and A/B are added as parent patterns.
The resulting sparse-checkout file is now
/*
!/*/
/A/
!/A/*/
/A/B/
!/A/B/*/
/A/B/C/
Here, order matters, so the negative patterns are overridden by the positive patterns that appear lower in the file.
If core.sparseCheckoutCone=true, then Git will parse the sparse-checkout file expecting patterns of these types.
Git will warn if the patterns do not match.
If the patterns do match the expected format, then Git will use faster hash-based algorithms to compute inclusion in the sparse-checkout.
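To make the expansion rule from the man page excerpt concrete, here is a minimal shell sketch that prints the cone-mode patterns for a single recursive directory. The function name cone_patterns is hypothetical and this is not how Git implements it; it only mirrors the documented rule that every leading directory becomes a parent pattern pair and the final directory becomes a recursive pattern.

```shell
#!/bin/sh
# cone_patterns DIR
# Hypothetical helper: print the cone-mode patterns for one recursive
# directory DIR (given relative to the repository root, e.g. A/B/C).
cone_patterns() {
    printf '/*\n!/*/\n'                        # root is always a parent pattern
    prefix=''
    rest=$1
    while [ "${rest#*/}" != "$rest" ]; do      # true while $rest still contains a slash
        prefix="$prefix/${rest%%/*}"           # next leading directory
        rest=${rest#*/}
        printf '%s/\n!%s/*/\n' "$prefix" "$prefix"   # parent pattern pair
    done
    printf '%s/%s/\n' "$prefix" "$rest"        # recursive pattern for DIR itself
}

cone_patterns A/B/C
```

Running it for A/B/C prints the same seven lines as the resulting sparse-checkout file quoted above. In practice 'git sparse-checkout set A/B/C' writes this file for you.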
So:
sparse-checkout: init and set in cone mode
Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee
To make the cone pattern set easy to use, update the behavior of 'git sparse-checkout (init|set)'.
Add '--cone' flag to 'git sparse-checkout init' to set the config option 'core.sparseCheckoutCone=true'.
When running 'git sparse-checkout set' in cone mode, a user only needs to supply a list of recursive folder matches. Git will automatically add the necessary parent matches for the leading directories.
Note, the --cone option is only documented in Git 2.26 (Q1 2020).
(Merged by Junio C Hamano -- gitster -- in commit ea46d90, 05 Feb 2020)
doc: sparse-checkout: mention --cone option
Signed-off-by: Matheus Tavares
Acked-by: Derrick Stolee
In af09ce2 ("sparse-checkout: init and set in cone mode", 2019-11-21, Git v2.25.0-rc0 -- merge), the '--cone' option was added to 'git sparse-checkout init'.
Document it in git sparse-checkout.
That includes:
When --cone is provided, the core.sparseCheckoutCone setting is also set, allowing for better performance with a limited set of patterns.
("set of patterns" presented above, in the "CONE PATTERN SET" section of this answer)
How much faster would this new "cone" mode be?
sparse-checkout: use hashmaps for cone patterns
Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee
The parent and recursive patterns allowed by the "cone mode" option in sparse-checkout are restrictive enough that we can avoid using the regex parsing. Everything is based on prefix matches, so we can use hashsets to store the prefixes from the sparse-checkout file. When checking a path, we can strip path entries from the path and check the hashset for an exact match.
As a test, I created a cone-mode sparse-checkout file for the Linux repository that actually includes every file. This was constructed by taking every folder in the Linux repo and creating the pattern pairs here:
/$folder/
!/$folder/*/
This resulted in a sparse-checkout file with 8,296 patterns.
Running 'git read-tree -mu HEAD' on this file had the following performance:
core.sparseCheckout=false: 0.21 s (0.00 s)
core.sparseCheckout=true : 3.75 s (3.50 s)
core.sparseCheckoutCone=true : 0.23 s (0.01 s)
The times in parentheses above correspond to the time spent in the first clear_ce_flags() call, according to the trace2 performance traces.
While this example is contrived, it demonstrates how these patterns can slow the sparse-checkout feature.
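The prefix-matching idea from this commit message can be sketched in a few lines of shell. This is a simplified illustration under stated assumptions, not Git's code: the names RECURSIVE_DIRS, PARENT_DIRS, set_contains and in_cone are hypothetical, the example cone is the A/B/C case from the man page, and a whole-line grep lookup stands in for a real hash set probe.

```shell
#!/bin/sh
# Example cone, as if created for the directory A/B/C (assumed for illustration).
RECURSIVE_DIRS='/A/B/C'
PARENT_DIRS='/A
/A/B'

# set_contains SET ITEM: exact whole-line lookup (stand-in for a hash set).
set_contains() {
    printf '%s\n' "$1" | grep -qxF -- "$2"
}

# in_cone PATH: succeeds if the file PATH (relative to the root) is in the cone.
in_cone() {
    dir="/$1"
    dir="${dir%/*}"                 # directory holding the file; empty for root files
    [ -z "$dir" ] && return 0       # files in the root directory are always included
    # A file immediately inside a parent directory is included.
    set_contains "$PARENT_DIRS" "$dir" && return 0
    # Strip one trailing path entry at a time, looking for a recursive directory.
    while [ -n "$dir" ]; do
        set_contains "$RECURSIVE_DIRS" "$dir" && return 0
        dir="${dir%/*}"
    done
    return 1
}
```

For example, in_cone A/B/C/d/e.txt and in_cone A/x.txt succeed, while in_cone A/other/f.txt fails. No pattern is ever evaluated as a glob, which is the point of the optimization: each check is a handful of exact lookups whose number depends on the path depth, not on the number of patterns.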
And:
sparse-checkout: respect core.ignoreCase in cone mode
Signed-off-by: Derrick Stolee
When a user uses the sparse-checkout feature in cone mode, they add patterns using "git sparse-checkout set <dir1> <dir2> ..." or by using "--stdin" to provide the directories line-by-line over stdin.
This behaviour naturally looks a lot like the way a user would type "git add <dir1> <dir2> ...".
If core.ignoreCase is enabled, then "git add" will match the input using a case-insensitive match.
Do the same for the sparse-checkout feature.
Perform case-insensitive checks while updating the skip-worktree bits during unpack_trees(). This is done by changing the hash algorithm and hashmap comparison methods to optionally use case-insensitive methods.
When this is enabled, there is a small performance cost in the hashing algorithm.
To tease out the worst possible case, the following was run on a repo with a deep directory structure:
git ls-tree -d -r --name-only HEAD | git sparse-checkout set --stdin
The 'set' command was timed with core.ignoreCase disabled or enabled.
For the repo with a deep history, the numbers were
core.ignoreCase=false: 62s
core.ignoreCase=true: 74s (+19.3%)
For reproducibility, the equivalent test on the Linux kernel repository had these numbers:
core.ignoreCase=false: 3.1s
core.ignoreCase=true: 3.6s (+16%)
Now, this is not an entirely fair comparison, as most users will define their sparse cone using more shallow directories, and the performance improvement from eb42feca97 ("unpack-trees: hash less in cone mode" 2019-11-21, Git 2.25-rc0) can remove most of the hash cost. For a more realistic test, drop the "-r" from the ls-tree command to store only the first-level directories.
In that case, the Linux kernel repository takes 0.2-0.25s in each case, and the deep repository takes one second, plus or minus 0.05s, in each case.
Thus, we can demonstrate a cost to this change, but it is unlikely to matter to any reasonable sparse-checkout cone.
With Git 2.25 (Q1 2020), "git sparse-checkout list" subcommand learned to give its output in a more concise form when the "cone" mode is in effect.
See commit 4fd683b, commit de11951 (30 Dec 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit c20d4fd, 06 Jan 2020)
sparse-checkout: list directories in cone mode
Signed-off-by: Derrick Stolee
When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set' command takes a list of directories as input, then creates an ordered list of sparse-checkout patterns such that those directories are recursively included and all sibling entries along the parent directories are also included.
Listing the patterns is less user-friendly than the directories themselves.
In cone mode, and as long as the patterns match the expected cone-mode pattern types, change the output of 'git sparse-checkout list' to only show the directories that created the patterns.
With this change, the following piped commands would not change the working directory:
git sparse-checkout list | git sparse-checkout set --stdin
The only time this would not work is if core.sparseCheckoutCone is true, but the sparse-checkout file contains patterns that do not match the expected pattern types for cone mode.
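The mapping from cone patterns back to directories can be illustrated with a short awk filter. This is a sketch under assumptions, not Git's implementation: the function name cone_dirs is hypothetical, and it assumes well-formed cone patterns on stdin, where a directory followed by a matching '!/dir/*/' line is only a parent and any other '/dir/' line is a recursive directory.

```shell
#!/bin/sh
# cone_dirs: read cone-mode patterns on stdin and print only the recursive
# directories, i.e. the directories that were originally passed to "set".
cone_dirs() {
    awk '
        # "!/A/*/" marks A as a parent-only directory.
        /^!/ { sub(/^!\//, ""); sub(/\/\*\/$/, ""); parent[$0] = 1; next }
        # "/A/" names a directory; strip the leading and trailing slash.
        /^\/.+\/$/ { seen[++n] = substr($0, 2, length($0) - 2) }
        END {
            for (i = 1; i <= n; i++)
                if (!(seen[i] in parent)) print seen[i]
        }
    '
}
```

Fed the seven-line pattern file for A/B/C from the man page excerpt, it prints only A/B/C, which is the kind of concise output the commit describes for 'git sparse-checkout list'.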
The code recently added in this release to move to the entry beyond the ones in the same directory in the index in the sparse-cone mode did not correctly count the number of entries to skip over; this has been corrected with Git 2.25.1 (Feb. 2020).
See commit 7210ca4 (27 Jan 2020) by Junio C Hamano (gitster).
See commit 4c6c797 (10 Jan 2020) by Derrick Stolee via GitGitGadget.
(Merged by Junio C Hamano -- gitster -- in commit 043426c, 30 Jan 2020)
unpack-trees: correctly compute result count
Reported-by: Johannes Schindelin
Signed-off-by: Derrick Stolee
The clear_ce_flags_dir() method processes the cache entries within a common directory. The returned int is the number of cache entries processed by that directory.
When using the sparse-checkout feature in cone mode, we can skip the pattern matching for entries in the directories that are entirely included or entirely excluded.
eb42feca ("unpack-trees: hash less in cone mode", 2019-11-21, Git v2.25.0-rc0 -- merge listed in batch #0) introduced this performance feature. The old mechanism relied on the counts returned by calling clear_ce_flags_1(), but the new mechanism calculated the number of rows by subtracting "cache_end" from "cache" to find the size of the range.
However, the equation is wrong because it divides by sizeof(struct cache_entry *). This is not how pointer arithmetic works!
A coverity build of Git for Windows in preparation for the 2.25.0 release found this issue with the warning:
Pointer differences, such as `cache_end` - cache, are automatically scaled down by the size (8 bytes) of the pointed-to type (struct `cache_entry` *). Most likely, the division by sizeof(struct `cache_entry` *) is extraneous and should be eliminated.
This warning is correct.
This leaves us with the question "how did this even work?"
The problem that occurs with this incorrect pointer arithmetic is a performance-only bug, and a very slight one at that.
Since the entry count returned by clear_ce_flags_dir() is reduced by a factor of 8, the loop in clear_ce_flags_1() will re-process entries from those directories.
By inserting global counters into unpack-trees.c and tracing them with trace2_data_intmax() (in a private change, for testing), I was able to count how many times the loop inside clear_ce_flags_1() processed an entry and how many times clear_ce_flags_dir() was called.
Each of these is reduced by at least a factor of 8 with the current change.
A factor larger than 8 happens when multiple levels of directories are repeated.
Specifically, in the Linux kernel repo, the command
git sparse-checkout set LICENSES
restricts the working directory to only the files at root and in the LICENSES directory.
Here are the measured counts:
clear_ce_flags_1 loop blocks:
Before: 11,520
After: 1,621
clear_ce_flags_dir calls:
Before: 7,048
After: 606
While these are dramatic counts, the time spent in clear_ce_flags_1() is under one millisecond in each case, so the improvement is not measurable as an end-to-end time.
With Git 2.26 (Q1 2020), some rough edges in the sparse-checkout feature, especially around the cone mode, have been cleaned up.
See commit f998a3f, commit d2e65f4, commit e53ffe2, commit e55682e, commit bd64de4, commit d585f0e, commit 4f52c2c, commit 9abc60f (31 Jan 2020), and commit 9e6d3e6, commit 41de0c6, commit 47dbf10, commit 3c75406, commit d622c34, commit 522e641 (24 Jan 2020) by Derrick Stolee (derrickstolee).
See commit 7aa9ef2 (24 Jan 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 433b8aa, 14 Feb 2020)
sparse-checkout: fix cone mode behavior mismatch
Reported-by: Finn Bryant
Signed-off-by: Derrick Stolee
The intention of the special "cone mode" in the sparse-checkout feature is to always match the same patterns that are matched by the same sparse-checkout file as when cone mode is disabled.
When a file path is given to "git sparse-checkout set" in cone mode, then the cone mode improperly matches the file as a recursive path.
When setting the skip-worktree bits, files were not expecting the MATCHED_RECURSIVE response, and hence these were left out of the matched cone.
Fix this bug by checking for MATCHED_RECURSIVE in addition to MATCHED and add a test that prevents regression.
The documentation now includes:
When core.sparseCheckoutCone is enabled, the input list is considered a list of directories instead of sparse-checkout patterns.
The command writes patterns to the sparse-checkout file to include all files contained in those directories (recursively) as well as files that are siblings of ancestor directories.
The input format matches the output of git ls-tree --name-only. This includes interpreting pathnames that begin with a double quote (") as C-style quoted strings.
With Git 2.26 (Q1 2020), "git sparse-checkout" learned a new "add" subcommand.
See commit 6c11c6a (20 Feb 2020), and commit ef07659, commit 2631dc8, commit 4bf0c06, commit 6fb705a (11 Feb 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4d7dfc, 05 Mar 2020)
sparse-checkout: create 'add' subcommand
Signed-off-by: Derrick Stolee
When using the sparse-checkout feature, a user may want to incrementally grow their sparse-checkout pattern set.
Allow adding patterns using a new 'add' subcommand.
This is not much different from the 'set' subcommand, because we still want to allow the '--stdin' option and interpret inputs as directories when in cone mode and patterns otherwise.
When in cone mode, we are growing the cone.
This may actually reduce the set of patterns when adding directory A when A/B is already a directory in the cone. Test the different cases: siblings, parents, ancestors.
When not in cone mode, we can only assume the patterns should be appended to the sparse-checkout file.
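Why adding A can shrink the pattern set when A/B is already in the cone is easier to see with a small sketch. The helper below is hypothetical (the name prune_subsumed is not a Git command) and only models the list of recursive directories: once A is recursive, A/B is already covered and no longer needs its own entry or parent patterns.

```shell
#!/bin/sh
# prune_subsumed: read recursive directories on stdin (one per line) and
# print only those that are not inside another directory from the list.
prune_subsumed() {
    LC_ALL=C sort -u | awk '
        { dirs[NR] = $0 }
        END {
            for (i = 1; i <= NR; i++) {
                keep = 1
                for (j = 1; j <= NR; j++)
                    # dirs[i] lies inside dirs[j] if "dirs[j]/" is a prefix of "dirs[i]/".
                    if (i != j && index(dirs[i] "/", dirs[j] "/") == 1)
                        keep = 0
                if (keep) print dirs[i]
            }
        }
    '
}

# The cone contained A/B and D/E; the user then adds A.
printf '%s\n' A/B D/E A | prune_subsumed
```

With this input the sketch prints A and D/E: the entry for A/B disappears because A now covers it. A sibling such as AB would be kept, since the comparison is done on whole path components.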
And:
sparse-checkout: work with Windows paths
Signed-off-by: Derrick Stolee
When using Windows, a user may run 'git sparse-checkout set A\B\C' to add the Unix-style path 'A/B/C' to their sparse-checkout patterns.
Normalizing the input path converts the backslashes to slashes before we add the string 'A/B/C' to the recursive hashset.
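The normalization described here amounts to a character substitution. As a minimal sketch (the function name normalize_dir is hypothetical, and Git performs this step internally in C, not in shell):

```shell
#!/bin/sh
# normalize_dir PATH: convert Windows-style backslashes to forward slashes.
normalize_dir() {
    printf '%s\n' "$1" | tr '\\' '/'
}

normalize_dir 'A\B\C'
```

The call prints A/B/C, the form that is stored in the sparse-checkout patterns.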
The sparse-checkout patterns have been forbidden from excluding all paths, leaving an empty working tree, for a long time.
With Git 2.27 (Q2 2020), this limitation has been lifted.
See commit ace224a (04 May 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit e9acbd6, 08 May 2020)
sparse-checkout: stop blocking empty workdirs
Reported-by: Lars Schneider
Signed-off-by: Derrick Stolee
Remove the error condition when updating the sparse-checkout leaves an empty working directory.
This behavior was added in 9e1afb167 ("sparse checkout: inhibit empty worktree", 2009-08-20, Git v1.7.0-rc0 -- merge).
The comment was added in a7bc906f2 ("Add explanation why we do not allow to sparse checkout to empty working tree", 2011-09-22, Git v1.7.8-rc0 -- merge) in response to a "dubious" comment in 84563a624 ("unpack-trees.c (https://github.com/git/git/blob/ace224ac5fb120e9cae894e31713ab60e91f141f/unpack-trees.c): cosmetic fix", 2010-12-22, Git v1.7.5-rc0 -- merge).
With the recent "cone mode" and "git sparse-checkout init [--cone]" command, it is common to set a reasonable sparse-checkout pattern set of
/*
!/*/
which matches only files at root. If the repository has no such files, then their "git sparse-checkout init" command will fail.
Now that we expect this to be a common pattern, we should not have the commands fail on an empty working directory.
If it is a confusing result, then the user can recover with "git sparse-checkout disable" or "git sparse-checkout set". This is especially simple when using cone mode.