How do you detect an evil merge in git?
I've created a simple git repo to illustrate my question, available on GitHub here: https://github.com/smileyborg/EvilMerge
Here's an illustration of the repo history:
master           A---B---D---E-----G-----I
                  \     /     \         /
another_branch     ----C       \       /
                                \     /
another_branch2                  F---H
(In the actual repo on GitHub, D is 4a48c9, and I is 48349d.)
D is a "simple" evil merge, where the merge commit "correctly" resolves a merge conflict, but also makes an unrelated "evil" change that did not exist in either parent. It is possible to discover the "evil" part of this merge by using git show -c on this commit, as the output includes ++ and -- (as opposed to single + and -) to indicate the changes that did not exist in either parent (see this answer for context).
I is a different kind of evil merge, where the merge commit "correctly" resolves a merge conflict (caused by changes from F to file.txt that conflict with changes from G), but also "evilly" discards the changes made to a completely different file, file2.txt (effectively undoing the changes from H).
How can you know that I is an evil merge? In other words, what command(s) can you use to discover that I not only manually resolves a conflict, but also fails to merge changes that it should have?
Edit/Update: What is an evil merge?
As pointed out by René Link below, it is hard (perhaps impossible) to define a generic set of criteria to identify an "evil merge". However, much like Supreme Court Justice Stewart said about pornography, evil merges are something you know when you see them.
So perhaps a better question to ask is this: what git command(s) can you use on a merge commit to get a diff output of all novel changes introduced solely in the merge commit itself? This diff should include:
- all merge conflict resolutions (at least, if the resolution involved anything more complex than choosing one parent's changes over the other's)
- all additions or removals that did not exist in either parent (as seen in D)
- all changes that did exist in one of the parents but that the merge commit discards (as seen in I)
The goal here is to be able to have a human look at this output and know whether the merge was successful or (accidentally or maliciously) "evil" without having to re-review all the previously-reviewed changes (e.g. F and H) that are being integrated in the merge.
The easiest thing to do would be to diff the results of your conflict resolution with a merge that auto-resolves conflicts without human intervention. Any automatic resolutions will be ignored, since they will be resolved in exactly the same way.
I see two ways of visualizing the possible "evil" resolutions. If you are making this into a script, add &> /dev/null to the end of every line whose output you do not care to see.
1) Use two separate diffs, one that favors the first parent, and a second that favors the second parent.
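MERGE_COMMIT=<Merge Commit>
# Redo the merge from the first parent, resolving conflicts in favor of "ours"
git checkout "$MERGE_COMMIT~"
git merge --no-ff --no-edit -s recursive -Xours "$MERGE_COMMIT^2"
echo "Favor ours"
git diff "HEAD..$MERGE_COMMIT"
# Redo it once more, this time resolving conflicts in favor of "theirs"
git checkout "$MERGE_COMMIT~"
git merge --no-ff --no-edit -s recursive -Xtheirs "$MERGE_COMMIT^2"
echo "Favor theirs"
git diff "HEAD..$MERGE_COMMIT"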
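The two-diff approach can be tried end to end on a scratch repository. The following is only a sketch under stated assumptions: remerge_diff is a hypothetical helper name (not a git command), the fixture repo is invented to mimic commit I from the question, and the merge is redone in a temporary git worktree so the current checkout is left alone.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: $1 = merge commit, $2 = "ours" or "theirs".
# Redoes the merge mechanically and lists files that differ from the real merge.
remerge_diff() {
  rd_tmp=$(mktemp -d)
  git worktree add --detach "$rd_tmp" "$1^1" >/dev/null 2>&1
  git -C "$rd_tmp" merge --no-ff --no-edit -s recursive "-X$2" "$1^2" >/dev/null 2>&1
  git -C "$rd_tmp" diff --name-only HEAD "$1"
  git worktree remove --force "$rd_tmp"
}

# --- Invented fixture: a merge that resolves a conflict and drops file2.txt ---
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > file.txt
echo base > file2.txt
git add .
git commit -q -m "base"

git checkout -q -b topic
echo topic > file.txt
git commit -q -am "F: edit file.txt"
echo topic > file2.txt
git commit -q -am "H: edit file2.txt"

git checkout -q main
echo main > file.txt
git commit -q -am "G: conflicting edit to file.txt"

# The merge conflicts on file.txt, so a non-zero exit is expected here.
git merge --no-ff --no-commit topic >/dev/null 2>&1 || true
echo resolved > file.txt
git checkout HEAD -- file2.txt   # the "evil" part: discard H's change
git add file.txt
git commit -q -m "I: evil merge"

evil=$(git rev-parse HEAD)
ours_diff=$(remerge_diff "$evil" ours)
theirs_diff=$(remerge_diff "$evil" theirs)
printf 'Favor ours:\n%s\nFavor theirs:\n%s\n' "$ours_diff" "$theirs_diff"
```

In this fixture both listings should mention file2.txt, which is the discarded change a reviewer would want to notice; file.txt appears as well because the manual resolution matches neither side.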
2) Diff against the results of the conflicted merge with the conflicts still in.
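MERGE_COMMIT=<Merge Commit>
git checkout "$MERGE_COMMIT~"
# Redo the merge but stop before committing, leaving any conflict markers in place
git -c merge.conflictstyle=diff3 merge --no-ff --no-commit "$MERGE_COMMIT^2"
# Stage everything, conflict markers included (also copes with paths containing spaces)
git add -A
git commit --no-edit
git diff "HEAD..$MERGE_COMMIT"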
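As a rough illustration of the second approach, the sketch below commits the conflicted merge with its markers and diffs that against the real merge. The helper name conflicted_remerge_diff and the fixture repository are assumptions made up for this example; the temporary worktree is a convenience, not part of the original recipe.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: $1 = merge commit. Prints the diff from the
# "merge with conflict markers left in" to the real merge commit.
conflicted_remerge_diff() {
  cr_tmp=$(mktemp -d)
  git worktree add --detach "$cr_tmp" "$1^1" >/dev/null 2>&1
  # A conflicted merge exits non-zero; that is expected, so ignore it.
  git -C "$cr_tmp" -c merge.conflictstyle=diff3 merge --no-ff --no-commit "$1^2" >/dev/null 2>&1 || true
  git -C "$cr_tmp" add -A
  git -C "$cr_tmp" commit -q --no-edit
  git -C "$cr_tmp" diff HEAD "$1"
  git worktree remove --force "$cr_tmp"
}

# --- Invented fixture: conflict on file.txt, dropped change in file2.txt ---
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > file.txt
echo base > file2.txt
git add .
git commit -q -m "base"

git checkout -q -b topic
echo topic > file.txt
echo topic > file2.txt
git commit -q -am "topic: edit both files"

git checkout -q main
echo main > file.txt
git commit -q -am "main: conflicting edit to file.txt"

git merge --no-ff --no-commit topic >/dev/null 2>&1 || true
echo resolved > file.txt
git checkout HEAD -- file2.txt   # silently discard topic's change
git add file.txt
git commit -q -m "evil merge"

report=$(conflicted_remerge_diff "$(git rev-parse HEAD)")
printf '%s\n' "$report"
```

The removed marker lines show how the conflict was resolved by hand, and the file2.txt hunk shows the change that was thrown away. On Git 2.36 or later, git show --remerge-diff on the merge commit performs a similar remerge-and-diff internally, which may make a helper like this unnecessary.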
Before we can detect evil merges we must define what evil merges are.
Every merge that has conflicts must be resolved manually. In order to resolve conflicts we can
- take one of the changes and omit the other.
- possibly take both changes (in this case the order in the result might be important)
- take none of them and create a new change that is the consolidation of both.
- take none of them and omit both.
So what is an evil merge now?
According to this blog it is
a merge is considered evil if it does not faithfully integrate all changes from all parents.
So what is a "faithful integration"? I think no one can give a general answer, because it depends on the semantics of the code or text or whatever is merged.
Others say
An evil merge is a merge that introduces changes that do not appear in any parent.
With this definition all conflicts that are resolved by
- take one of the changes and omit the other.
- take none of them and create a new change that is the consolidation of both.
- take none of them and omit both.
are evil merges.
So we finally come to the questions.
Is it legal to
- only take one of the changes and omit the other?
- take both changes?
- take none of them and create a new change that is the consolidation of both?
- take none of them and omit both?
And things can become more complex if we think about octopus merges.
My conclusion
The only evil merge we can detect is a merge that was done without conflicts. In this case we can redo the merge and compare it with the merge that was already done. If there are differences, then someone introduced more than they should have, and we can be sure that this merge is an evil merge.
At least I think we must detect evil merges manually, because it depends on the semantics of the changes and I'm not able to formulate a formal definition of what an evil merge is.
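The conflict-free case can be checked mechanically. Here is a minimal sketch, assuming a hypothetical helper named classify_merge and an invented fixture repository: it redoes the merge, and if the redo succeeds without conflicts it compares the resulting tree with the tree of the recorded merge.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: $1 = merge commit.
# Prints "clean", "evil", or "conflicted" (the last one needs human review).
classify_merge() {
  cm_tmp=$(mktemp -d)
  git worktree add --detach "$cm_tmp" "$1^1" >/dev/null 2>&1
  if git -C "$cm_tmp" merge --no-ff --no-edit "$1^2" >/dev/null 2>&1; then
    if [ "$(git -C "$cm_tmp" rev-parse 'HEAD^{tree}')" = "$(git rev-parse "$1^{tree}")" ]; then
      cm_verdict=clean
    else
      cm_verdict=evil
    fi
  else
    cm_verdict=conflicted
  fi
  git worktree remove --force "$cm_tmp"
  echo "$cm_verdict"
}

# --- Invented fixture: two branches that touch different files ---
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > a.txt
echo base > b.txt
git add .
git commit -q -m "base"

git checkout -q -b topic
echo topic > b.txt
git commit -q -am "topic: edit b.txt"

git checkout -q main
echo main > a.txt
git commit -q -am "main: edit a.txt"

# An honest merge, recorded exactly as git produced it.
git merge --no-ff --no-edit topic >/dev/null 2>&1
honest=$(git rev-parse HEAD)

# The same merge again, but with an extra change slipped in before committing.
git reset -q --hard HEAD~1
git merge --no-ff --no-commit topic >/dev/null 2>&1
echo sneaky >> a.txt
git commit -q -am "merge with an extra change"
evil=$(git rev-parse HEAD)

honest_verdict=$(classify_merge "$honest")
evil_verdict=$(classify_merge "$evil")
echo "honest merge: $honest_verdict"
echo "tampered merge: $evil_verdict"
```

A "conflicted" verdict says nothing about whether the merge is evil; it only means the comparison cannot be automated and a person has to look at the resolution.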
Disclaimer: As pointed out by @smileyborg, this solution will not detect a case where the evil merge completely reverted a change that was introduced by one of the parents. This defect occurs because, according to the Git docs for the -c option:
Furthermore, it lists only files which were modified from all parents.
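That blind spot is easy to reproduce. The sketch below uses an invented fixture (file names and commit messages are assumptions for illustration): the merge resolves a conflict in file.txt and quietly keeps the first parent's file2.txt, and the combined diff then lists only file.txt.

```shell
#!/bin/sh
set -eu

# --- Invented fixture: conflict on file.txt, dropped change in file2.txt ---
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > file.txt
echo base > file2.txt
git add .
git commit -q -m "base"

git checkout -q -b topic
echo topic > file.txt
echo topic > file2.txt
git commit -q -am "topic: edit both files"

git checkout -q main
echo main > file.txt
git commit -q -am "main: conflicting edit to file.txt"

git merge --no-ff --no-commit topic >/dev/null 2>&1 || true
echo resolved > file.txt
git checkout HEAD -- file2.txt   # discard topic's change to file2.txt
git add file.txt
git commit -q -m "evil merge"
evil=$(git rev-parse HEAD)

# Files reported by the combined diff: only those differing from ALL parents.
combined=$(git show -c --format= --name-only "$evil")

# Files the merged-in branch changed since the merge base.
topic_side=$(git diff --name-only "$evil^1...$evil^2")

printf 'combined diff lists:\n%s\n' "$combined"
printf 'topic branch changed:\n%s\n' "$topic_side"
```

Because file2.txt in the merge is identical to the first parent's copy, it never appears in the combined output, even though the topic branch changed it.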
I recently discovered a much simpler solution to this question than any of the current answers.
Basically, the default behavior of git show for merge commits should solve your problem. In cases where the modifications from both sides of the merge do not touch, and no "evil" changes were made, there will be no diff output. I had previously thought that git show never shows diffs for merge commits. However, if a merge commit involves a messy conflict or an evil merge, then a diff will be displayed in combined format.
To view the combined format when viewing a number of commit patches with log -p, simply add the parameter --cc.
In the example given from GitHub in the question the following is displayed (with my comments interspersed):
$ git show 4a48c9
(D in the example)
commit 4a48c9d0bbb4da5fb30e1d24ae4e0a4934eabb8d
Merge: 0fbc6bb 086c3e8
Author: Tyler Fox <[email protected]>
Date: Sun Dec 28 18:46:08 2014 -0800
    Merge branch 'another_branch'

    Conflicts:
        file.txt
diff --cc file.txt
index 8be441d,f700ccd..fe5c38a
--- a/file.txt
+++ b/file.txt
@@@ -1,9 -1,7 +1,9 @@@
  This is a file in a git repo used to demonstrate an 'evil merge'.
The following lines are not evil. Changes made by the first parent are indicated by a +/- in the left-most column; changes made by the second parent are indicated by +/- in the second column.
- int a = 0;
- int b = 1;
+ int a = 1;
+ int b = 0;
 +int c = 2;
- a = b;
+ b = a;
  a++;
Here is the evil part: ++ was changed to -- from both parents. Note the leading -- and ++ indicating that these changes occur from both parents, meaning that someone introduced new changes in this commit that were not already reflected in one of the parents. Do not confuse the leading, diff-generated ++/-- with the trailing ++/-- which is part of the file contents.
--b++;
++b-- ;
End of evilness.
 +c++;
To quickly view all merge commits that may have issues:
git log --oneline --min-parents=2 --cc -p --unified=0
All uninteresting merges will be displayed on a single line, while the messy ones - evil or otherwise - will display the combined diff.
Explanation:
- -p - Display patch
- --oneline - Display each commit header on a single line
- --min-parents=2 - Only show merges.
- --cc - Show combined diff, but only for places where changes from both parents overlap
- --unified=0 - Display 0 lines of context; modify the number to be more aggressive in finding evil merges.
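For scripting, the same idea can be expressed as a loop that prints only the merges whose combined diff is non-empty. This is a sketch rather than a drop-in tool: the fixture repository and its commit messages are invented for illustration, and the loop simply tests whether git show --cc produces any output for each merge.

```shell
#!/bin/sh
set -eu

# --- Invented fixture: one clean merge and one hand-resolved merge ---
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > file.txt
echo base > other.txt
git add .
git commit -q -m "base"

git checkout -q -b topic1
echo topic1 > other.txt
git commit -q -am "topic1: edit other.txt"
git checkout -q main
echo main1 > file.txt
git commit -q -am "main: edit file.txt"
git merge --no-ff -m "clean merge" topic1 >/dev/null 2>&1

git checkout -q -b topic2
echo topic2 > file.txt
git commit -q -am "topic2: edit file.txt"
git checkout -q main
echo main2 > file.txt
git commit -q -am "main: conflicting edit"
git merge --no-ff --no-commit topic2 >/dev/null 2>&1 || true
echo resolved > file.txt
git add file.txt
git commit -q -m "messy merge"

# --- The actual check: keep merges whose combined diff is not empty ---
suspicious=""
for c in $(git rev-list --min-parents=2 HEAD); do
  if [ -n "$(git show --cc --format= --unified=0 "$c")" ]; then
    suspicious="$suspicious$(git log -1 --format='%h %s' "$c")
"
  fi
done
printf '%s' "$suspicious"
```

In this fixture only the hand-resolved merge should be listed. As with the one-liner, a listed merge is not necessarily evil; it is merely one whose result differs from every parent somewhere and so deserves a look.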
Alternatively, add the following to eliminate all uninteresting commits:
-z --color=always | perl -pe 's/^([^\0]*\0\0)*([^\0]*\0\0)(.*)$/\n$2\n$3/'
- -z - Display NUL instead of newline at the end of commit logs
- --color=always - Don't turn off color when piping to perl
- perl -pe 's/^([^\0]*\0\0)*([^\0]*\0\0) - Massage the output to hide log entries with empty diffs
I've expanded on the answer from Joseph K. Strauss to create two complete shell scripts that can be easily used to get a meaningful diff output for a given merge commit.
The scripts are available in this GitHub Gist: https://gist.github.com/smileyborg/913fe3221edfad996f06
The first script, detect_evil_merge.sh, uses the strategy of automatically redoing the merge without resolving any conflicts, and then diff'ing that to the actual merge.
The second script, detect_evil_merge2.sh, uses the strategy of automatically redoing the merge twice, once resolving conflicts with the first parent's version and once with the second parent's version, and then diff'ing each of those to the actual merge.
Either script will do the job; which one you use comes down to which output makes it easier for you to understand how the conflicts were resolved.