Fetching multiple Git remotes in parallel
The answer is actually maybe. In particular:
git remote | xargs --max-procs=4 -n 1 git fetch
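You can try the pipeline without touching real remotes. The script below is a minimal sketch with these assumptions: a POSIX shell, and an xargs that supports -P (the short form of GNU's --max-procs, which BSD xargs also accepts). Three hypothetical, unrelated local repositories named a, b and c, in a throwaway directory, stand in for real remotes.

```shell
#!/bin/sh
# Minimal sketch: three hypothetical, unrelated local repositories (a, b, c)
# stand in for real remotes, all inside a throwaway temporary directory.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
for name in a b c; do
  git init -q "$WORK/$name"
  # Pin the branch name so the demo does not depend on init.defaultBranch.
  git -C "$WORK/$name" symbolic-ref HEAD refs/heads/main
  git -C "$WORK/$name" commit -q --allow-empty -m "first commit in $name"
done

# The repository that does the fetching, with one remote per repository.
git init -q "$WORK/local"
cd "$WORK/local"
for name in a b c; do
  git remote add "$name" "$WORK/$name"
done

# The pipeline from the answer: one "git fetch NAME" per remote,
# with at most four running at the same time.
git remote | xargs -P 4 -n 1 git fetch -q

# Every remote-tracking branch should now exist (set -e aborts otherwise).
for name in a b c; do
  git rev-parse --verify -q "refs/remotes/$name/main" >/dev/null
done
echo "fetched from: $(git remote | tr '\n' ' ')"
```

The three repositories share no history, so this is the easy case. Each fetch updates only its own refs/remotes/NAME/ names, and the ref locks never collide.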
As you've seen, this actually works when tested, up to a point. I once wrote a fancier version of the same kind of thing, all in Python, with elaborate display control of the fetching process. It turns out that there's a bug in git fetch --progress, though: its progress output does not work right with pipes, so you must use ptys.
"without clashing with the git file locking ... it seems to work when all repositories are unrelated to each other."
That's the rub: each fetch assumes it can get its locks. The fetches need to lock each remote-tracking name, and usually that works just fine since the names are separate. Remote A does not interfere with remote B, because refs/remotes/A/master and refs/remotes/B/master use different locks. But the final repacking may fail unless you do what you did: disable auto-gc and then run GC yourself (you should also re-enable it afterward).
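Here is a minimal sketch of that workaround. It uses the same kind of setup: hypothetical local repositories a, b and c as remotes, POSIX sh, and an xargs with -P. One design choice differs. The sketch does not change gc.auto in the repository's configuration and restore it later. Instead it passes -c gc.auto=0 to each parallel git fetch, so the setting applies only to those commands and there is nothing to re-enable afterward. On Git 2.23 and later, git fetch --no-auto-gc is a per-fetch alternative.

```shell
#!/bin/sh
# Sketch: hypothetical local repositories a, b and c stand in for real remotes.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
for name in a b c; do
  git init -q "$WORK/$name"
  git -C "$WORK/$name" symbolic-ref HEAD refs/heads/main
  git -C "$WORK/$name" commit -q --allow-empty -m "first commit in $name"
done
git init -q "$WORK/local"
cd "$WORK/local"
for name in a b c; do
  git remote add "$name" "$WORK/$name"
done

# Parallel fetch with automatic gc switched off for these commands only.
# "git -c" passes gc.auto=0 on to the housekeeping that git fetch would
# start at the end (git gc --auto, or git maintenance run --auto in newer Git).
git remote | xargs -P 4 -n 1 git -c gc.auto=0 fetch -q

# xargs exits only after every fetch has finished, so a single gc can run
# now without competing with them. --auto does work only when Git's usual
# thresholds are exceeded; plain "git gc" would always repack.
git gc --auto

# Nothing to re-enable: .git/config was never modified.
if git config --local --get gc.auto >/dev/null; then
  echo "unexpected: gc.auto is set in .git/config" >&2
  exit 1
fi
```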
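You may also end up fetching more data than necessary (as I noted in the other answer). When several remotes share history, each parallel fetch negotiates on its own, so objects that are new to your repository can be downloaded more than once. There is not much you can do about this without external information, e.g., maybe there's one remote you should always fetch first.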
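The script below sketches that last idea: it fetches one remote first and then the rest in parallel. All of its specifics are hypothetical assumptions. Remotes b and c are clones of a with one extra commit each, and a is the remote you happen to know holds the shared history. The first fetch creates refs/remotes/a/main, so the later fetches can tell b and c that this history is already present, and they need to send only their own new commits.

```shell
#!/bin/sh
# Sketch: hypothetical remote "a" holds the shared history; "b" and "c" are
# clones of it with one extra commit each.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
git init -q "$WORK/a"
git -C "$WORK/a" symbolic-ref HEAD refs/heads/main
git -C "$WORK/a" commit -q --allow-empty -m "shared history"
for name in b c; do
  git clone -q "$WORK/a" "$WORK/$name"          # b and c share a's history
  git -C "$WORK/$name" commit -q --allow-empty -m "only in $name"
done

git init -q "$WORK/local"
cd "$WORK/local"
for name in a b c; do
  git remote add "$name" "$WORK/$name"
done

first=a   # hypothetical: the remote known to hold most of the shared history
git fetch -q "$first"

# Everything except "$first", still in parallel. These fetches already have
# refs/remotes/$first/* to offer during negotiation.
git remote | grep -vxF "$first" | xargs -P 4 -n 1 git fetch -q
```

This only pays off when you really know which remote to fetch first. With unrelated remotes, the extra sequential step just adds latency.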
"but the final repacking may fail unless you do what you did, disable auto-gc and then run GC yourself"
Actually, with Git 2.23 (Q3 2019), that might not be necessary anymore.
"git fetch" that grabs from a group of remotes learned to run the auto-gc only once at the very end.
See commit c3d6b70 (19 Jun 2019) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 892d3fb, 09 Jul 2019)
fetch: only run 'gc' once when fetching multiple remotes
In multiple remotes mode, git-fetch is launched for n-1 remotes and the last remote is handled by the current process. Each of these processes will in turn run 'gc' at the end.
This is not really a problem because even if multiple 'gc --auto' is run at the same time we still handle it correctly. It does show multiple "auto packing in the background" messages though. And we may waste some resources when gc actually runs because we still do some stuff before checking the lock and moving it to background.
So let's try to avoid that. We should only need one 'gc' run after all objects and references are added anyway. Add a new option --no-auto-gc that will be used by those n-1 processes. 'gc --auto' will always run on the main fetch process (*).
(*) even if we fetch remotes in parallel at some point in future, this should still be fine because we should "join" all those processes before this step.
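With a recent enough Git, the built-in multi-remote mode gives you the single gc for free. Here is a minimal sketch, again with hypothetical local repositories a, b and c as remotes. git fetch --multiple fetches the named remotes, and git fetch --all fetches every configured remote. The -j/--jobs option, added in Git 2.24, runs up to that many of those per-remote fetches in parallel, and the fetch.parallel configuration variable sets the same limit persistently. On Git 2.23 itself, leave out --jobs: the remotes are then fetched one after another, still with a single automatic gc at the end.

```shell
#!/bin/sh
# Sketch: hypothetical local repositories a, b and c stand in for real remotes.
# Needs Git 2.24 or later for --jobs.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
for name in a b c; do
  git init -q "$WORK/$name"
  git -C "$WORK/$name" symbolic-ref HEAD refs/heads/main
  git -C "$WORK/$name" commit -q --allow-empty -m "first commit in $name"
done
git init -q "$WORK/local"
cd "$WORK/local"
for name in a b c; do
  git remote add "$name" "$WORK/$name"
done

# One command fetches the whole group. As the commit message above explains,
# the per-remote child fetches skip the automatic gc, and it runs once in this
# main process after they have all finished.
git fetch -q --multiple --jobs=4 a b c

# The same for every configured remote would be:
#   git fetch -q --all --jobs=4

for name in a b c; do
  git rev-parse --verify -q "refs/remotes/$name/main" >/dev/null
done
```

Compared with the xargs pipeline, this approach needs no manual gc step and no configuration juggling. The cost is that it requires Git 2.24 or later for the parallelism.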