Git: Is there a way to figure out where a commit was cherry-pick'ed from?

Solution 1:

By default, the information about the original, “cherry” commit is not recorded as part of the new commit.

Record the Source Commit in the Commit Message

If you can force the use of particular workflows/options, git cherry-pick has the -x option:

When recording the commit, append to the original commit message a note that indicates which commit this change was cherry-picked from.

This is obviously useless if you cannot rely on the cherry pickers using the option. Also, since the recorded information is just plain text (not an actual reference as far as Git is concerned), even if you use -x you still have to take steps to make sure that the original commit is kept alive (e.g. it is part of the DAG of a tag or a non-rewinding branch).
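To make the -x behaviour concrete, here is a minimal sketch in a throwaway repository. The file name, branch names, and commit messages are made up for illustration. It picks a commit with -x and then reads the recorded hash back out of the new commit's message.

```shell
set -e
cd "$(mktemp -d)"                       # scratch repository, safe to delete
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b dev
echo fix >> file.txt
git commit -q -am "fix the widget"
source_commit="$(git rev-parse HEAD)"

git checkout -q master
echo other > other.txt
git add other.txt
git commit -q -m "unrelated work on master"
git cherry-pick -x dev >/dev/null       # -x appends the source hash to the message

# Pull the hash out of the "(cherry picked from commit ...)" trailer line.
recorded="$(git log -1 --format=%B |
    sed -n 's/^(cherry picked from commit \([0-9a-f]*\))$/\1/p')"
echo "recorded source: $recorded"

# The note is plain text, so check that the commit it names still exists.
git cat-file -e "$recorded^{commit}" && echo "source commit still exists"
```

The final check matters for the reason given above: nothing stops the source commit from being pruned later, at which point the recorded hash points at nothing.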

git cherry and git patch-id

If you can restrict your search to two particular branches of the history DAG, then git cherry can find both “unpicked” and “picked” cherries.

Note: This command (and the related git patch-id) can only identify conflict-free cherries that were individually picked without extra changes. If there was a conflict while picking the cherry (e.g. you had to slightly modify it to get it to apply), or you used -n/--no-commit to stage extra changes (e.g. multiple cherries in a single commit), or the content of the commit was rewritten after the picking, then you will have to rely on commit message comparison (or the -x information if it was recorded).
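As a rough illustration of what "conflict-free" buys you, the sketch below (scratch repository, invented file and branch names) shows that a clean pick and its source get the same patch ID even though their commit hashes differ. If the pick had needed manual changes, the two IDs would no longer match.

```shell
set -e
cd "$(mktemp -d)"                       # scratch repository, safe to delete
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b dev
echo change >> file.txt
git commit -q -am "C: change on dev"

git checkout -q master
echo other > other.txt
git add other.txt
git commit -q -m "B: unrelated work"
git cherry-pick dev >/dev/null          # creates D, a clean pick of C

# git patch-id prints "<patch-id> <commit-id>"; keep only the patch ID.
id_c="$(git show dev    | git patch-id | cut -d' ' -f1)"
id_d="$(git show master | git patch-id | cut -d' ' -f1)"
echo "C: $id_c"
echo "D: $id_d"

if [ "$id_c" = "$id_d" ]; then
    echo "same change, different commits"
fi
```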

git cherry is not really designed to identify the origin of picked cherries, but we can abuse it a bit to identify single cherry pairs.

Given the following history DAG (as in the original poster’s example):

1---2---3---B---D  master
         \
          A---C    dev
# D is a cherry-picked version of C

you will see something like this:

% git cherry master dev
+ A
- C
% git cherry dev master
+ B
- D

(A, B, C, and D are full SHA-1 hashes in the real output)

Since we see one cherry (the - lines) in each list, they must form a cherry pair: D was cherry-picked from C (or vice versa; you cannot tell from the DAG alone, though the commit dates might help).

If you are dealing with more than one potential cherry, you will have to “roll your own” program to do the mapping. The code should be easy in any language with associative arrays, hashes, dictionaries, or equivalent. As a shell function that does the grouping in awk, it might look like this:

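match_cherries() {
    # Resolve both arguments to commit IDs; stop early on a bad name.
    a="$(git rev-parse --verify "$1")" &&
    b="$(git rev-parse --verify "$2")" &&
    # Emit "<patch-id> <commit-id>" for each commit on only one side.
    git rev-list "$a...$b" | xargs git show | git patch-id |
    awk '
        # Collect the commit IDs that share each patch ID.
        { p[$1] = p[$1] " " $2 }
    END { 
            for (i in p) {
                l=length(p[i])
                # Longer than one 40-character SHA-1 means a cherry pair.
                if (l>41) print substr(p[i],2,l-1)
            }
        }'
}
match_cherries master dev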

With an extended example that has two picked cherries:

1---2---3---B---D---E  master
         \
          A---C        dev
# D is a cherry-picked version of C
# E is a cherry-picked version of A

The output might look like this:

match_cherries master dev
D C
E A

(A, C, D, and E are full SHA-1 hashes in the real output)

This tells us that C and D represent the same change and that E and A represent the same change. As before, there is no way to tell which of each pair was “the first” unless you also consider (e.g.) the commit dates of each commit.
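If all you need is to see which commits have a patch-equivalent twin on the other side, git log can do the marking itself with --cherry-mark. The sketch below rebuilds a history shaped like the first example in a scratch repository (file names and messages are invented). It prints the marks but does not pair the commits up, so the function above is still needed for that.

```shell
set -e
cd "$(mktemp -d)"                       # scratch repository, safe to delete
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b dev
echo a > a.txt
git add a.txt
git commit -q -m "A: dev-only work"
echo change >> file.txt
git commit -q -am "C: change to be picked"

git checkout -q master
echo b > b.txt
git add b.txt
git commit -q -m "B: master-only work"
git cherry-pick dev >/dev/null          # creates D from C

# %m prints the mark: "=" for patch-equivalent commits,
# "<" for master-only commits, ">" for dev-only commits.
marked="$(git log --left-right --cherry-mark --format='%m %h %s' master...dev)"
echo "$marked"
```

Here B should be marked <, A should be marked >, and both C and D should be marked =.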

Commit Message Comparison

If your cherries were not picked with -x, or they are “dirty” (they had conflicts, or other changes were added to them, e.g. with --no-commit plus staging extra changes, or with git commit --amend or another “history rewriting” mechanism), then you may have to fall back on the less reliable technique of comparing commit messages.

This technique works best if you can find some bit of the commit message that is likely to be unique to the commit and is unlikely to have changed in the commit that resulted from the cherry pick. The bit that would work best would depend on the style of commit messages used in your project.

Once you have picked out an “identifying part” of the message, you can use git log to find commits (also demonstrated in Jefromi’s answer).

git log --grep='unique part of the commit message' dev...master

The argument to --grep is actually a regular expression, so you might need to escape any regexp metacharacters ([]*?.\).
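A small sketch of the metacharacter problem, using two made-up commit subjects in a scratch repository. It compares the unescaped pattern with two ways around it: git log's -F/--fixed-strings option, which treats the pattern literally, and escaping by hand.

```shell
set -e
cd "$(mktemp -d)"                       # scratch repository, safe to delete
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "Fix parser[0] crash (issue *42*)"
git commit -q --allow-empty -m "Fix parser0 crash"

# As a regex, "[0]" is a character class, so this matches "parser0" only.
regex_hits="$(git log --format=%s --grep='parser[0]')"

# --fixed-strings takes the pattern literally, brackets included.
literal_hits="$(git log --format=%s --fixed-strings --grep='parser[0]')"

# Escaping the metacharacters by hand gives the same result.
escaped_hits="$(git log --format=%s --grep='parser\[0\]')"

echo "regex:   $regex_hits"
echo "literal: $literal_hits"
echo "escaped: $escaped_hits"
```

When the identifying part is copied straight from a commit subject, --fixed-strings is usually less error-prone than escaping.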

If you are not sure which branches might hold the original commit and the new commit, you can use --all as Jefromi showed.

Solution 2:

If I follow your diagram, you want to know if you can determine that D (not B) is the result of cherry-picking A.

In theory, as illustrated in "How to list git branches that contain a given commit?", you can search for a commit, if D is actually the same commit (SHA1) as A:

git branch --contains <commit>

But as Jefromi comments, D cannot have the same SHA1 in this case.
That leaves the search for a common commit message: see Jefromi's answer.


As Ken Bloom mentions in the comments on the question, for such local cherry-picking, a daggy-fix technique (as in Monotone or Mercurial) is more appropriate, because it will leave a clear trace of the merge.

Daggy fixes mean using rather than losing the true origin and relationship between bugs and fixes in the ancestry graph.

Since [Git] offers the ability to make a commit on top of any revision, thereby spawning a tiny anonymous branch, a viable alternative to cherry-picking is as follows:

  • use bisect to identify the revision where a bug arose;
  • check out that revision;
  • fix the bug;
  • and commit the fix as a child of the revision that introduced the bug.

This new change can easily be merged into any branch that had the original bug, without any sketchy cherry-picking antics required.
It uses a revision-control tool's normal merge and conflict-resolution machinery, so it is far more reliable than cherry-picking (the implementation of which is almost always a series of grotesque hacks).

[Image: Hg daggy fix] https://storage.googleapis.com/google-code-attachments/rainforce/issue-4/comment-5/Hg-dag-6-daggy-fix.png

(here a Mercurial diagram, but easily applied to Git)
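In Git terms, the steps above might look like the following sketch. It runs in a scratch repository with invented file and branch names, and it skips the bisect step by assuming the offending revision (bug_rev) is already known.

```shell
set -e
cd "$(mktemp -d)"                       # scratch repository, safe to delete
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo ok > app.txt
git add app.txt
git commit -q -m "initial"
echo broken > bug.txt
git add bug.txt
git commit -q -m "introduce bug"
bug_rev="$(git rev-parse HEAD)"         # in practice, found with git bisect

# Two branches that both inherit the bug and then diverge.
git branch dev
echo m > master.txt
git add master.txt
git commit -q -m "master work"
git checkout -q dev
echo d > dev.txt
git add dev.txt
git commit -q -m "dev work"

# Commit the fix as a child of the revision that introduced the bug.
git checkout -q -b fix-bug "$bug_rev"
echo fixed > bug.txt
git commit -q -am "fix bug"
fix_rev="$(git rev-parse HEAD)"

# Merge the fix into every branch that has the bug.
for branch in master dev; do
    git checkout -q "$branch"
    git merge -q --no-edit fix-bug
done

# One and the same commit is now part of both histories.
git branch --contains "$fix_rev"
```

Because both branches contain the identical fix commit, git branch --contains answers the "where did this come from" question directly, which is exactly what cherry-picking loses.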

Doing daggy fixes all the time isn't for everyone.
It's not always so easy to develop a fix directly against the revision where the bug was introduced.

  • Perhaps the bug wasn't discovered until some other more recent code used it in ways that exposed the bug; it would be hard to debug and find the fix without this other code around.
  • Or perhaps the importance or scope of the fix simply hadn't been realised at the time.

See also this article for more on daggy-fix:

This technique of going back in history to fix a bug, then merging the fix into modern branches, was given the name "daggy fixes" by the authors of Monotone, an influential distributed revision-control system.
The fixes are called daggy because they take advantage of a project's history being structured as a directed acyclic graph, or dag.
While this approach could be used with Subversion, its branches are heavyweight compared with the distributed tools, making the daggy-fix method less practical. This underlines the idea that a tool's strengths will inform the techniques that its users bring to bear.

Solution 3:

No information about the original commit is embedded in the newly created commit, so there's no direct way to tell. What you suggest (searching for the commit message) is probably the best way - it's certainly a lot easier than searching for a commit with the same diff:

git log --grep="<commit subject>" --all

Unless, of course, the commit is no longer reachable from any branch; in that case you'd probably want to look at the output of git fsck.
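For that last case, here is one possible sketch (scratch repository, invented names and subject line). It deletes the branch holding the original commit, lists unreachable commits with git fsck, and then filters them by commit message. It only works while the object has not yet been pruned.

```shell
set -e
cd "$(mktemp -d)"                       # scratch repository, safe to delete
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b dev
echo change >> file.txt
git commit -q -am "unique subject 1234"
lost="$(git rev-parse HEAD)"

git checkout -q master
echo other > other.txt
git add other.txt
git commit -q -m "unrelated work"
git cherry-pick dev >/dev/null
git branch -D dev >/dev/null            # the original is now on no branch

# --no-reflogs: do not count reflog entries as keeping a commit reachable.
candidates="$(git fsck --unreachable --no-reflogs 2>/dev/null |
    awk '$1 == "unreachable" && $2 == "commit" { print $3 }')"

# Search only those commits; --no-walk keeps git log off their ancestors.
found=""
if [ -n "$candidates" ]; then
    found="$(git log --no-walk --format=%H --grep='unique subject 1234' $candidates)"
fi
echo "original commit: $found"
```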