Examples of different results produced by the standard (Myers), minimal, patience and histogram diff algorithms

Solution 1:

I think multiple algorithms are supported because no single algorithm is clearly the best choice in all cases.

They differ in the readability of the patch output and in the processing time needed to generate the patch.

In summary, this is my understanding of the differences:

  • Myers: The original algorithm as implemented in xdiff (http://www.xmailserver.org/xdiff-lib.html and http://www.xmailserver.org/diff2.pdf), optimizing the 'edit distance' for changed lines.
  • Minimal: Myers, plus extra effort to make the patch as small as possible.
  • Patience: Favors readability of the patch at the expense of patch size and processing time. See the question "What is `git diff --patience` for?" and http://bramcohen.livejournal.com/73318.html or http://alfedenzo.livejournal.com/170301.html for a description.
  • Histogram: Mainly created for speed. Faster than Myers and Patience; originally developed in JGit (http://eclipse.org/jgit/).
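
To see the differences on your own files, you can run the same comparison once per algorithm. Below is a minimal Python sketch that does this through git's `--diff-algorithm` option; the helper names `diff_command` and `diff_all` are made up for illustration, and it assumes `git` is on your PATH.

```python
import subprocess

# The four named algorithms accepted by `git diff --diff-algorithm=...`
# (git also accepts "default", which currently means myers).
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def diff_command(algorithm, old_path, new_path):
    """Build the `git diff` command line for one algorithm (hypothetical helper)."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm}")
    # --no-index compares two files on disk, so no repository is needed.
    return [
        "git", "diff", "--no-index",
        f"--diff-algorithm={algorithm}",
        old_path, new_path,
    ]


def diff_all(old_path, new_path):
    """Return {algorithm: patch text} for the same pair of files."""
    patches = {}
    for algorithm in ALGORITHMS:
        result = subprocess.run(
            diff_command(algorithm, old_path, new_path),
            capture_output=True, text=True,
        )
        # With --no-index, exit status 1 only means "the files differ".
        if result.returncode not in (0, 1):
            raise RuntimeError(result.stderr)
        patches[algorithm] = result.stdout
    return patches
```

Calling `diff_all("old.c", "new.c")` and printing the four patches side by side is usually the quickest way to find a case where the algorithms disagree. On many small inputs they produce identical output.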

Here is a comparison of speed for Myers, patience, and histogram: http://marc.info/?l=git&m=133103975225142&w=2

Here is a comparison of diff output of Histogram vs Myers: http://marc.info/?l=git&m=138023003519837&w=2

Solution 2:

Although it compares only two algorithms, Myers and Histogram, the following may help. A study by Nugroho et al. measured the level of disagreement between the two diff algorithms. The study made three comparisons: metrics, the SZZ algorithm, and patches. The metrics and SZZ comparisons show large differences between Myers and Histogram in the number of code changes each one identifies. Neither diff is incorrect in describing the changes. However, in the manual comparison of patches, the Histogram algorithm produced diff output that better reflects the intent of the human change.
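
To get a rough feel for that kind of disagreement yourself, you can count the added and deleted lines each algorithm reports for the same change. The sketch below is only an illustration, not the method used in the study; `count_changes` is a hypothetical helper that expects unified-diff text such as the output of `git diff`.

```python
def count_changes(patch_text):
    """Count added and deleted lines in unified-diff text (hypothetical helper)."""
    added = deleted = 0
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("@@"):
            # A hunk header: the lines that follow are context/added/deleted.
            in_hunk = True
        elif line.startswith("diff "):
            # Start of the next file: its "---"/"+++" headers are not changes.
            in_hunk = False
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            deleted += 1
    return added, deleted


example_patch = """\
diff --git a/f.txt b/f.txt
--- a/f.txt
+++ b/f.txt
@@ -1,3 +1,4 @@
 a
-b
+c
+d
 e
"""

print(count_changes(example_patch))  # prints (2, 1)
```

If two algorithms give different counts for the same pair of files, they have identified different sets of changed lines, even though both patches are valid.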