Semantic Diff Utilities [closed]

I'm trying to find some good examples of semantic diff/merge utilities. The traditional paradigm of comparing source code files works by comparing lines and characters, but are there any utilities out there (for any language) that actually consider the structure of the code when comparing files?

For example, existing diff programs will report "difference found at character 2 of line 125. File x contains v-o-i-d, where file y contains b-o-o-l". A specialized tool should be able to report "Return type of method doSomething() changed from void to bool".
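To make the desired report concrete, here is a minimal sketch in Python using the standard `ast` module (Python 3.9+). It is only an illustration of the idea, not how any particular tool works: the helper names `return_types` and `describe_changes` are made up for this example, Python's `None` stands in for `void`, and functions are matched by bare name, so two methods with the same name in different classes would collide.

```python
import ast


def return_types(source: str) -> dict[str, str]:
    """Map each function name to the text of its return annotation ('' if none)."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            result[node.name] = ast.unparse(node.returns) if node.returns else ""
    return result


def describe_changes(old_src: str, new_src: str) -> list[str]:
    """Report differences in terms of functions rather than lines and characters."""
    old, new = return_types(old_src), return_types(new_src)
    messages = []
    # Functions present in both versions: compare their return annotations.
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            before = old[name] or "unannotated"
            after = new[name] or "unannotated"
            messages.append(
                f"Return type of {name}() changed from {before} to {after}"
            )
    # Functions present in only one version.
    for name in sorted(old.keys() - new.keys()):
        messages.append(f"Function {name}() removed")
    for name in sorted(new.keys() - old.keys()):
        messages.append(f"Function {name}() added")
    return messages


OLD = "def do_something(x) -> None:\n    pass\n"
NEW = "def do_something(x) -> bool:\n    return True\n"

for line in describe_changes(OLD, NEW):
    print(line)
# prints: Return type of do_something() changed from None to bool
```

A line-based diff of the same two inputs would flag both lines as changed without saying what the change means.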

I would argue that this type of semantic information is actually what the user is looking for when comparing code, and should be the goal of next-generation programming tools. Are there any examples of this in available tools?


Solution 1:

We've developed a tool that deals with precisely this scenario. Check http://www.semanticmerge.com

It merges (and diffs) based on code structure rather than using text-based algorithms, which lets you deal with cases involving heavy refactoring. It is also able to render both the differences and the merge conflicts.

Instead of getting confused by blocks of text being moved, it parses the code first, so it can display conflicts on a per-method basis (per element, in fact). A heavily refactored case like the one described above won't even leave any conflicts to resolve manually.

It is a language-aware merge tool and it has been great to be finally able to answer this SO question :-)

Solution 2:

Eclipse has had this feature for a long time. It's called "Structure Compare", and it's very nice. Here is a sample screenshot for Java, followed by another for an XML file:

(Note the minus and plus icons on methods in the upper pane.)

Eclipse's Java Structure Comparer

Eclipse's XML Structure Comparer

Solution 3:

To do "semantic comparisons" well, you need to compare the syntax trees of the languages, and take into account the meaning of symbols. A really good semantic diff would understand the language semantics, and realize when one block of code is functionally equivalent to another. Going this far requires a theorem prover, and while that would be extremely cute, it isn't presently practical for a real tool.

A workable approximation of this is simply comparing syntax trees, and reporting changes in terms of structures inserted, deleted, moved, or changed. Getting somewhat closer to a "semantic comparison", one could report when an identifier is changed consistently across a block of code.
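As a rough illustration of the "identifier changed consistently" case, the following Python sketch walks two syntax trees in parallel with the standard `ast` module and checks whether they differ only by a one-to-one renaming of variables and parameters. The function name `consistent_renames` is invented for this example, and the sketch deliberately ignores function, class, and attribute names; it is not a description of how any real product does it.

```python
import ast


def consistent_renames(old_src: str, new_src: str):
    """Return {old_name: new_name} if the sources differ only by a consistent
    renaming of variables/parameters, otherwise None."""
    forward, backward = {}, {}

    def bind(old: str, new: str) -> bool:
        # The renaming must be one-to-one in both directions.
        return (forward.setdefault(old, new) == new
                and backward.setdefault(new, old) == old)

    def same(a, b) -> bool:
        if type(a) is not type(b):
            return False
        if isinstance(a, ast.Name):
            return bind(a.id, b.id) and same(a.ctx, b.ctx)
        if isinstance(a, ast.arg):
            return bind(a.arg, b.arg) and same(a.annotation, b.annotation)
        if isinstance(a, ast.AST):
            # Compare every child field; positions (lineno etc.) are not fields.
            return all(same(getattr(a, f, None), getattr(b, f, None))
                       for f in a._fields)
        if isinstance(a, list):
            return len(a) == len(b) and all(same(x, y) for x, y in zip(a, b))
        return a == b  # plain values: strings, numbers, None

    if not same(ast.parse(old_src), ast.parse(new_src)):
        return None
    return {old: new for old, new in forward.items() if old != new}


OLD = "total = 0\nfor item in items:\n    total = total + item\n"
NEW = "acc = 0\nfor item in items:\n    acc = acc + item\n"
MIXED = "acc = 0\nfor item in items:\n    acc = total + item\n"

print(consistent_renames(OLD, NEW))    # {'total': 'acc'}
print(consistent_renames(OLD, MIXED))  # None
```

A text diff would report three changed lines for `OLD` versus `NEW`; the tree comparison collapses them into a single fact, "total was renamed to acc". Reporting inserted, deleted, or moved subtrees needs a proper tree-matching algorithm, which this sketch does not attempt.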

See our SmartDifferencer (http://www.semanticdesigns.com/Products/SmartDifferencer/index.html) for a syntax-tree-based comparison engine that works with many languages and implements the above approximation.

EDIT Jan 2010: Versions available for C++, C#, Java, PHP, and COBOL. The website shows specific examples for most of these.

EDIT May 2010: Python and JavaScript added.

EDIT Oct 2010: EGL added.

EDIT Nov 2010: VB6, VBScript, VB.net added.