Git commands that could break/rewrite the history
Can you provide a list of all (or the most common) operations or commands that can compromise the history in Git?
What should be absolutely avoided?
- Amending a commit after it has been pushed (git commit / git push / git commit --amend)
- Rebasing commits that have already been pushed
I would like this question (if it has not already been asked somewhere else) to become some kind of reference for the common operations to avoid in Git.
Moreover, I use git reset a lot, but am not completely aware of the possible damage I could do to the repository (or to the other contributors' copies). Can git reset be dangerous?
Note that, starting with Git 2.24 (Q4 2019), the list above might not need to include git filter-branch anymore.
git filter-branch is being deprecated (and the Git documentation no longer recommends BFG either).
See commit 483e861, commit 9df53c5, commit 7b6ad97 (04 Sep 2019) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 91243b0, 30 Sep 2019)
Recommend git-filter-repo instead of git-filter-branch
filter-branch suffers from a deluge of disguised dangers that disfigure history rewrites (i.e. deviate from the deliberate changes). Many of these problems are unobtrusive and can easily go undiscovered until the new repository is in use.
This can result in problems ranging from an even messier history than what led folks to filter-branch in the first place, to data loss or corruption. These issues cannot be backward compatibly fixed, so add a warning to both filter-branch and its manpage recommending that another tool (such as filter-repo) be used instead.
Also, update other manpages that referenced filter-branch.
Several of these needed updates even if we could continue recommending filter-branch, either due to implying that something was unique to filter-branch when it applied more generally to all history rewriting tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because something about filter-branch was used as an example despite other more commonly known examples now existing.
Reword these sections to fix these issues and to avoid recommending filter-branch.
Finally, remove the section explaining BFG Repo Cleaner as an alternative to filter-branch.
I feel somewhat bad about this, especially since I feel like I learned so much from BFG that I put to good use in filter-repo (which is much more than I can say for filter-branch), but keeping that section presented a few problems:
- In order to recommend that people quit using filter-branch, we need to provide them a recommendation for something else to use that can handle all the same types of rewrites. To my knowledge, filter-repo is the only such tool. So it needs to be mentioned.
- I don't want to give conflicting recommendations to users
- If we recommend two tools, we shouldn't expect users to learn both and pick which one to use; we should explain which problems one can solve that the other can't or when one is much faster than the other.
- BFG and filter-repo have similar performance
- All filtering types that BFG can do, filter-repo can also do. In fact, filter-repo comes with a reimplementation of BFG named bfg-ish which provides the same user-interface as BFG but with several bugfixes and new features that are hard to implement in BFG due to its technical underpinnings.
While I could still mention both tools, it seems like I would need to provide some kind of comparison and I would ultimately just say that filter-repo can do everything BFG can, so ultimately it seems that it is just better to remove that section altogether.
the operations or commands that can compromise the history in git?
At least, newren/git-filter-repo can recover from any history compromised by its usage.
Amongst its stated goals:
More intelligent safety
Writing copies of the original refs to a special namespace within the repo does not provide a user-friendly recovery mechanism. Many would struggle to recover using that.
Almost everyone I've ever seen do a repository filtering operation has done so with a fresh clone, because wiping out the clone in case of error is a vastly easier recovery mechanism.
Strongly encourage that workflow by detecting and bailing if we're not in a fresh clone, unless the user overrides with --force.
git filter-repo, as mentioned in its documentation, roughly works by running:
git fast-export <options> | filter | git fast-import <options>
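The pipeline above can be tried in its most trivial form: with an identity "filter" (plain cat standing in for the real rewriting step), exporting a branch and importing it into an empty repository should reproduce exactly the same commit IDs. This is only a sketch under stated assumptions (throwaway repositories in a temporary directory, a branch named main, a dummy identity set through environment variables); git filter-repo itself does far more than this.

```shell
set -e
# Ignore the user's own Git configuration and provide a dummy identity.
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
src="$work/src"
dst="$work/dst"

# Build a tiny source repository with two commits on "main".
git init -q "$src"
git -C "$src" symbolic-ref HEAD refs/heads/main
echo one > "$src/file.txt"
git -C "$src" add file.txt
git -C "$src" commit -q -m "first"
echo two >> "$src/file.txt"
git -C "$src" commit -q -am "second"

# export -> identity "filter" (cat) -> import into an empty bare repository
git init -q --bare "$dst"
git -C "$src" fast-export main | cat | git -C "$dst" fast-import --quiet

src_tip=$(git -C "$src" rev-parse refs/heads/main)
dst_tip=$(git -C "$dst" rev-parse refs/heads/main)
echo "source: $src_tip"
echo "copy:   $dst_tip"
```

Replacing cat with anything that actually edits the stream (a path, an author, a message) yields new commit IDs from the first modified commit onward, which is why such a rewrite must not be pushed over history other people already have.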
And git fast-export / git fast-import get some improvements with Git 2.24 (Q4 2019).
See commit 941790d, commit 8d7d33c, commit a1638cf, commit 208d692, commit b8f50e5, commit f73b2ab, commit 3164e6b (03 Oct 2019), and commit af2abd8 (25 Sep 2019) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit 16d9d71, 15 Oct 2019)
For example:
fast-import: allow tags to be identified by mark labels
Signed-off-by: Elijah Newren
Mark identifiers are used in fast-export and fast-import to provide a label to refer to earlier content. Blobs are given labels because they need to be referenced in the commits where they first appear with a given filename, and commits are given labels because they can be the parents of other commits.
Tags were never given labels, probably because they were viewed as unnecessary, but that presents two problems:
- It leaves us without a way of referring to previous tags if we want to create a tag of a tag (or higher nestings).
- It leaves us with no way of recording that a tag has already been imported when using --export-marks and --import-marks.
Fix these problems by allowing an optional mark label for tags.
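To make the mark-label change concrete, here is a hand-written fast-import stream that gives the first tag a mark (:3) and then creates a tag of that tag by referring to it with from :3. It is a sketch that assumes Git 2.24 or later (older versions do not accept a mark line inside a tag command); the repository, ref names, identity and timestamp are made up for the example.

```shell
set -e
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null

repo=$(mktemp -d)
git init -q "$repo"

# "data N" must give the exact byte count of the payload that follows.
# The first tag gets "mark :3" (Git 2.24+); the second tag points at it.
git -C "$repo" fast-import --quiet <<'EOF'
blob
mark :1
data 6
hello

commit refs/heads/main
mark :2
committer Example <example@example.com> 1700000000 +0000
data 5
init

M 100644 :1 hello.txt

tag v1
mark :3
from :2
tagger Example <example@example.com> 1700000000 +0000
data 7
tag v1

tag v1-nested
from :3
tagger Example <example@example.com> 1700000000 +0000
data 11
tag of tag

EOF

# The nested tag's "type" header shows that it points at another tag object.
nested_target_type=$(git -C "$repo" cat-file -p refs/tags/v1-nested | sed -n 's/^type //p')
# Peeling through both tags lands on the commit.
peeled=$(git -C "$repo" rev-parse 'refs/tags/v1-nested^{commit}')
echo "v1-nested points at a: $nested_target_type"
```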
Off the top of my head:
- git commit --amend will rewrite the previous commit
- git rebase can rewrite multiple commits (rebase is also called when using git pull with the --rebase flag or the branch.$name.rebase config option)
- git filter-branch can rewrite multiple commits
- git push -f can change the commit a branch points to (same goes for the git push origin +branch syntax)
- git reset can change the commit a branch points to
- git branch -f can change the commit a branch points to (by recreating a branch with the same name)
- git checkout -B can change the commit a branch points to (by recreating a branch with the same name)
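The first item, and the question's own amend-after-push scenario, can be reproduced with two throwaway repositories: a bare one playing the shared remote and a local one that pushes to it. This sketch assumes a branch named main and a dummy identity; it checks that the amended commit does not descend from the pushed one, and that a plain git push is then refused as a non-fast-forward.

```shell
set -e
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
git init -q --bare "$work/shared.git"
git init -q "$work/me"
cd "$work/me"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/shared.git"

echo v1 > file.txt
git add file.txt
git commit -q -m "first version"
git push -q origin main                # the commit is now shared
pushed=$(git rev-parse HEAD)

echo fix >> file.txt
git commit -q -a --amend -m "first version (amended)"
amended=$(git rev-parse HEAD)

# The amended commit replaces the pushed one instead of building on it...
if git merge-base --is-ancestor "$pushed" "$amended"; then rewritten=no; else rewritten=yes; fi
# ...so a plain (non-forced) push is rejected as a non-fast-forward.
if git push -q origin main 2>/dev/null; then push_ok=yes; else push_ok=no; fi
echo "rewritten: $rewritten, plain push accepted: $push_ok"
```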
knittl has already compiled a good list of the commands that rewrite history, but I wanted to build upon his answer.
Can you provide a list of [...] the operations or commands that can compromise the history in git? What should be absolutely avoided?
First of all, there is nothing wrong with rewriting/deleting history per se; after all, you probably routinely create feature branches, keep them strictly local, then delete them (after merging them or realising they lead you nowhere) without thinking twice about it.
However, you can and certainly will run into problems when you locally rewrite/delete history that other people already have access to and then push it to a shared remote.
Operations that should count as rewriting/deleting the history of a local repo
Of course, there are dumb ways of corrupting or deleting history (e.g. tampering with the contents of .git/objects/), but those are outside the scope of my answer.
You can rewrite the history of a local repo in various ways. The section of the Pro Git book entitled Rewriting History mentions a few:
- git commit --amend
- git rebase
- git filter-branch
- Roberto Tyley's BFG Repo Cleaner (a third-party tool)
Arguably, there are more. Any operation that has the potential to alter or otherwise move a non-symbolic reference (branch or tag) and make it point to a commit that is not a descendant of the branch's current tip should count as rewriting local history. This includes:
- git commit --amend: replaces the last commit;
- All forms of rebase (incl. git pull --rebase);
- git reset (see an example below);
- git checkout -B and git branch -f: reset an existing branch to a different commit;
- git tag --force: recreates a tag with the same name but potentially pointing to another commit.
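The criterion above (the old tip is no longer an ancestor of the new tip) can be checked with git merge-base --is-ancestor. In the sketch below, rewrote_history is a hypothetical helper written for this illustration, not a Git command; it treats an ordinary commit as harmless and flags git reset and git commit --amend as rewrites. Branch name, file and identity are assumptions.

```shell
set -e
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical helper: succeeds if moving a ref from $1 to $2 rewrote history,
# i.e. the old tip is not an ancestor of (nor equal to) the new tip.
rewrote_history() {
    ! git merge-base --is-ancestor "$1" "$2"
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

before=$(git rev-parse main)
echo 4 > file.txt
git commit -q -am "commit 4"           # ordinary commit: a fast-forward
if rewrote_history "$before" main; then after_commit=rewrite; else after_commit=ok; fi

before=$(git rev-parse main)
git reset -q --hard main~2             # moves main two commits back
if rewrote_history "$before" main; then after_reset=rewrite; else after_reset=ok; fi

before=$(git rev-parse main)
git commit -q --amend -m "amended"     # replaces the tip commit
if rewrote_history "$before" main; then after_amend=rewrite; else after_amend=ok; fi

echo "commit: $after_commit, reset: $after_reset, amend: $after_amend"
```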
Any deletion of a non-symbolic reference (branch or tag) may also be considered history deleting:
- git branch -d or git branch -D
- git tag -d
Arguably, deleting a branch that has been fully merged into another should be considered only a mild form of history deleting, if at all.
Tags are different, though. Deleting a lightweight tag is not such a big deal, but deleting an annotated tag, which is a bona fide Git object, should count as deleting local history.
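Both points can be observed in a scratch repository: git branch -d refuses to delete a branch that is not merged into the current branch (git branch -D deletes it anyway), and git cat-file -t shows that a lightweight tag refers directly to a commit while an annotated tag is a separate tag object. The branch and tag names here are arbitrary, and the identity is a dummy one.

```shell
set -e
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo base > file.txt
git add file.txt
git commit -q -m base

# An unmerged branch: "git branch -d" refuses, "git branch -D" does not.
git checkout -q -b experiment
echo idea >> file.txt
git commit -q -am "unmerged idea"
git checkout -q main
if git branch -d experiment 2>/dev/null; then d_deleted=yes; else d_deleted=no; fi
git branch -D experiment >/dev/null
if git rev-parse -q --verify refs/heads/experiment >/dev/null; then still_there=yes; else still_there=no; fi

# A lightweight tag is only a ref; an annotated tag is an object of its own.
git tag light
git tag -a -m "release notes" annotated
light_type=$(git cat-file -t refs/tags/light)
annotated_type=$(git cat-file -t refs/tags/annotated)
echo "light: $light_type, annotated: $annotated_type"
```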
Operations that rewrite/delete the history of a remote repo
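As far as I know, only a forced push (git push -f, equivalent to git push --force, or the git push origin +branch syntax) has the potential to rewrite history in the remote repository; a push that deletes a remote branch (git push origin --delete branch) can also delete history there.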
That said, it is possible to:
- disable the ability to force-update remote branches to non-fast-forward references, by setting receive.denyNonFastForwards on the server;
- disable the ability to delete a branch living on a remote repository, by setting receive.denyDeletes on the server.
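A sketch of both server-side settings, using a local bare repository as a stand-in for the server (an assumption made to keep the example self-contained; the branch names main and topic are arbitrary). With both settings enabled, the forced push and the branch deletion are rejected, and the server's refs stay where they were.

```shell
set -e
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
git init -q --bare "$work/server.git"
# Server-side protection, as described above.
git -C "$work/server.git" config receive.denyNonFastForwards true
git -C "$work/server.git" config receive.denyDeletes true

git init -q "$work/client"
cd "$work/client"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/server.git"
echo 1 > file.txt
git add file.txt
git commit -q -m one
echo 2 > file.txt
git commit -q -am two
git push -q origin main
git push -q origin main:refs/heads/topic

git reset -q --hard HEAD~1             # rewrite local main
if git push -q -f origin main 2>/dev/null; then force_ok=yes; else force_ok=no; fi
if git push -q origin --delete topic 2>/dev/null; then delete_ok=yes; else delete_ok=no; fi
echo "force push accepted: $force_ok, delete accepted: $delete_ok"
```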
Moreover I use git reset a lot, but am not completely aware of the possible damage I could do to the repository (or to the other contributors copies). Can git reset be dangerous?
git-reset, as mentioned by knittl, usually changes where a branch reference points. This command can be dangerous, insofar as it can make reachable commits become unreachable. Consider the following situation: you're on the master branch, which points at commit D; D's parent is C, and C's parent is B (... - B - C - D). Now, let's say you run, for instance,
git reset --soft master~2
A soft reset is considered to be the most benign form of reset, because it "only" changes where the current branch points to, but doesn't affect the staging area or your working tree. That said, merely changing where a branch points in that fashion has ramifications: after that soft reset, master points two commits further back, and commits C and D, which were reachable from master before the reset, have now become unreachable; in other words, they're not ancestors of any reference (branch, tag, or HEAD). You could say that they're in "repository limbo": they still exist in your Git repo's object database, but they will no longer be listed in the output of git log.
If you actually found those commits valuable before the reset, you should make them reachable again by making some reference (e.g. another branch) point to commit D again. Otherwise, commits C and D will eventually die a true death: once their reflog entries have expired, Git's automatic garbage collection deletes unreachable objects.
You can, in theory, fish commit D out of the reflog, but there is always a risk that you will forget about those unreachable commits or won't be able to identify which entry of the reflog corresponds to commit D.
In conclusion, yes, git-reset can be dangerous, and it's a good idea to make sure the current tip of the branch you're about to reset will remain reachable after the reset. If needed, create another branch there before the reset, just in case, as a backup; and if you're sure you want to forget those commits, you can always delete that branch later.
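The scenario above, including the suggested backup branch, can be replayed in a scratch repository. This sketch assumes commits named A to D on a branch called master and a dummy identity; it checks that D is no longer reachable from master after the soft reset, that the backup branch still points at it, and that master@{1} in the reflog records the pre-reset tip.

```shell
set -e
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
for c in A B C D; do
    echo "$c" > file.txt
    git add file.txt
    git commit -q -m "$c"
done
d=$(git rev-parse master)

git branch backup                      # safety net: keeps D reachable
git reset -q --soft master~2           # master now points at B

b=$(git rev-parse master)
if git merge-base --is-ancestor "$d" master; then d_reachable_from_master=yes; else d_reachable_from_master=no; fi
backup_tip=$(git rev-parse backup)

# Even without the backup branch, D is still recorded in master's reflog:
# master@{1} is where master pointed just before the reset.
reflog_tip=$(git rev-parse 'master@{1}')
git log -g --oneline -3 master
```

To undo the reset, git reset --soft backup (or git reset --soft master@{1}, as long as master has not moved again since) puts master back on D.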
From experience, one of the most dangerous commands is
git push -f --mirror
This mirrors your local repo onto the remote, in the process removing all branches other than the ones you have in your local repo.
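The effect described above can be reproduced with a bare repository standing in for the shared remote. In this sketch (the branch names main and feature and the dummy identity are assumptions), a feature branch exists on the remote but not locally, and a single git push --mirror deletes it there.

```shell
set -e
export GIT_CONFIG_NOSYSTEM=1 GIT_CONFIG_GLOBAL=/dev/null
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
git init -q --bare "$work/shared.git"
git init -q "$work/me"
cd "$work/me"
git symbolic-ref HEAD refs/heads/main
echo hi > file.txt
git add file.txt
git commit -q -m "shared work"

# Publish main, plus a "feature" branch that exists only on the remote
# (think of it as a colleague's branch you never fetched).
git push -q "$work/shared.git" main main:refs/heads/feature
feature_before=$(git -C "$work/shared.git" rev-parse -q --verify refs/heads/feature || echo missing)

# Mirror-push: the remote ends up with exactly the local refs.
git push -q --mirror "$work/shared.git"
feature_after=$(git -C "$work/shared.git" rev-parse -q --verify refs/heads/feature || echo missing)
echo "feature before: $feature_before, after: $feature_after"
```

Note that --mirror already force-updates the refs it pushes, so the -f adds little; the real damage comes from deleting every remote ref that does not exist locally.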