GitHub Merge branch 'master'

I've been trying out Git and GitHub after many years of SVN use. I seem to have the basics down, but one thing is confusing me.

  • UserA makes a change to FileA and pushes to the remote server (GitHub)

  • UserB makes a change to FileB. He first pulls from remote server, and then pushes his change to FileB to the remote server

  • GitHub commit history shows the push from UserA and the push from UserB

  • However, there is an additional entry in the commit history from UserB called "Merge branch 'master' of https://github.com/xxx/yyy". Viewing the diff in GitHub shows this to be an exact replica of the changes that UserA made to FileA

Why is this duplicate shown - both the push from UserA to FileA and the Merge branch master entries are identical...the second seems superfluous to me.


Solution 1:

Each version ("commit") stored in git forms part of a graph, and it's frequently helpful to think about what you're doing in git in terms of that graph.

When UserA begins, let's say that there had only been two commits created, which we'll call P and Q:

P--Q (master)

He then modifies FileA, stages that change and creates a commit that represents the new state of the source code - let's say that commit is called R. This has a single parent, which is the commit Q:

P--Q--R (master)

After successfully pushing, the commit graph for the GitHub repository looks the same.

UserB started with the same history:

P--Q (master)

... but created a different commit, say called S, which has his modified version of FileB:

P--Q--S (master)

UserB tries to push that to GitHub, but the push is refused: unless you "force" the push, you can only update a remote branch if the version you're pushing includes all of the history in that remote branch. So, UserB pulls from GitHub. A pull really consists of two steps, fetching and merging. The fetch updates origin/master, which is like a cache of the state of the branch master on the remote origin. (This is an example of a "remote-tracking branch".)

P--Q--S (master)
    \
      R (origin/master)
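
To make the "push is refused" rule concrete, here is a minimal sketch in Python that models the graph above as a dictionary from each commit to its parents. It is a toy model, not how git is implemented; the names parents, ancestors and push_allowed are made up for illustration.

```python
# Toy model of the commit graph: each commit maps to the list of its parents.
# P, Q, R and S are the commits from the answer; nothing here calls git itself.
parents = {"P": [], "Q": ["P"], "R": ["Q"], "S": ["Q"]}


def ancestors(commit):
    """Return the commit itself plus everything reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def push_allowed(remote_tip, local_tip):
    """A non-forced push is accepted only if the pushed history contains the remote tip."""
    return remote_tip in ancestors(local_tip)


print(push_allowed("Q", "R"))  # UserA pushes R while GitHub is at Q -> True
print(push_allowed("R", "S"))  # UserB pushes S while GitHub is at R -> False
```

The second call fails because R is not anywhere in the history of S, which is exactly the diverged situation drawn above.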

The history in this graph has diverged, so the merge tries to unify those two histories by creating a merge commit (say M) which has both S and R as parents, and hopefully represents the changes from both branches:

P--Q--S--M (master)
    \   /
     \ /
      R (origin/master)
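
As a rough sketch of why the merge commit makes the push possible, the same kind of toy model can record M with two parents. The dictionary and function names are hypothetical; the only git behaviour assumed is that the first parent of a merge is the commit that was checked out (S here) and the second is the one merged in (R).

```python
# Toy model of the commit graph before the merge (commit -> list of parents).
parents = {"P": [], "Q": ["P"], "R": ["Q"], "S": ["Q"]}


def ancestors(commit):
    """Return the commit itself plus everything reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


# The merge step of the pull: M gets both tips as parents.
# First parent = the branch UserB was on (S), second = origin/master (R).
parents["M"] = ["S", "R"]

print(sorted(ancestors("M")))  # ['M', 'P', 'Q', 'R', 'S']
print("R" in ancestors("M"))   # True: M contains the remote history, so the push is accepted
```

Because R is now part of the history of M, pushing M no longer needs to be forced.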

When GitHub shows you a diff that represents the changes introduced by a commit, it's simple in the case of a commit with one parent - it can just do a diff against that parent. However, in the case of a commit such as M, with more than one parent, it has to choose a parent to show the diff against. That explains why the diff shown for the merge commit M might appear to be the same as that shown for one of S or R: compared with S, for example, M differs only by the change that R made to FileA. Commits in git are defined by the exact state of the source tree, not the changes that got the tree into that state.
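
The point about snapshots can be illustrated with a small Python sketch. The file contents ("a1", "b2" and so on) and the function names are invented for the example, and it assumes the diff for a merge is taken against its first parent, which is consistent with what the question describes.

```python
# Each commit stores a full snapshot of the tree, not a set of changes.
# The file contents below are placeholders invented for this example.
trees = {
    "Q": {"FileA": "a1", "FileB": "b1"},
    "R": {"FileA": "a2", "FileB": "b1"},  # UserA changed FileA
    "S": {"FileA": "a1", "FileB": "b2"},  # UserB changed FileB
    "M": {"FileA": "a2", "FileB": "b2"},  # the merge contains both changes
}
parents = {"R": ["Q"], "S": ["Q"], "M": ["S", "R"]}


def diff(old, new):
    """Return {file: (old content, new content)} for files that differ."""
    names = sorted(set(trees[old]) | set(trees[new]))
    return {
        name: (trees[old].get(name), trees[new].get(name))
        for name in names
        if trees[old].get(name) != trees[new].get(name)
    }


def shown_diff(commit):
    """Diff a commit against its first parent (an assumption about the viewer)."""
    return diff(parents[commit][0], commit)


print(shown_diff("R"))  # {'FileA': ('a1', 'a2')}
print(shown_diff("M"))  # {'FileA': ('a1', 'a2')}  -> looks identical to R's diff
print(diff("R", "M"))   # {'FileB': ('b1', 'b2')}  -> against the other parent it looks like S
```

The two commits R and M are different snapshots with different parents; only the diffs derived from them happen to match.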