Using 'diff' (or anything else) to get character-level diff between text files

I'd like to use 'diff' to get both a line-level and a character-level difference. For example, consider:

File 1

abcde
abc
abcccd

File 2

abcde
ab
abccc

Using diff -u I get:

@@ -1,3 +1,3 @@
 abcde
-abc
-abcccd
\ No newline at end of file
+ab
+abccc
\ No newline at end of file

However, it only shows me that there were changes in these lines. What I'd like to see is something like:

@@ -1,3 +1,3 @@
 abcde
-ab<ins>c</ins>
-abccc<ins>d</ins>
\ No newline at end of file
+ab
+abccc
\ No newline at end of file

You get my drift.

Now, I know I can use other engines to mark/check the difference on a specific line. But I'd rather use one tool that does all of it.


Git has a word diff, and defining all characters as words effectively gives you a character diff. However, newline changes are ignored.

Example

Create a repository like this:

mkdir chardifftest
cd chardifftest
git init
echo -e 'foobarbaz\ncatdog\nfox' > file
git add -A; git commit -m 1
echo -e 'fuobArbas\ncat\ndogfox' > file
git add -A; git commit -m 2

Now, do git diff --word-diff=color --word-diff-regex=. master^ master and you'll get:

(screenshot of the colored, character-level git diff output)

Note how both additions and deletions are recognized at the character level, while both additions and deletions of newlines are ignored.

You may also want to try one of these:

git diff --word-diff=plain --word-diff-regex=. master^ master
git diff --word-diff=porcelain --word-diff-regex=. master^ master
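
For intuition about what a character-level diff looks like in the [-removed-]{+added+} notation that --word-diff=plain prints, here is a minimal Python sketch. It is not how git computes its diff: it uses difflib.SequenceMatcher from the standard library, and the helper name char_diff is made up for illustration, so the exact grouping of changes may differ from what git shows.

```python
import difflib


def char_diff(old: str, new: str) -> str:
    """Mark character-level changes as [-removed-] and {+added+}.

    Hypothetical helper for illustration; git uses its own diff
    algorithm, so its output can group changes differently.
    """
    # autojunk=False keeps the heuristic for long inputs from hiding matches.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(old[i1:i2])
            continue
        if tag in ("delete", "replace"):
            pieces.append("[-" + old[i1:i2] + "-]")
        if tag in ("insert", "replace"):
            pieces.append("{+" + new[j1:j2] + "+}")
    return "".join(pieces)


if __name__ == "__main__":
    # Second changed line from the question.
    print(char_diff("abcccd", "abccc"))
```

Unlike the git command above, this works on a single pair of strings, so you would have to pair up the changed lines yourself.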

You can use:

diff -u f1 f2 |colordiff |diff-highlight

colordiff is an Ubuntu package. You can install it using sudo apt-get install colordiff.

diff-highlight is from git (since version 2.9). It is located in /usr/share/doc/git/contrib/diff-highlight/diff-highlight. You can put it somewhere in your $PATH.


Python's difflib is ace if you want to do this programmatically. For interactive use, I use vim's diff mode (easy enough to use: just invoke vim with vimdiff a b). I also occasionally use Beyond Compare, which does pretty much everything you could hope for from a diff tool.

I haven't seen any command line tool which does this usefully, but as Will notes, the difflib example code might help.


You can use the cmp command in Solaris:

cmp

It compares two files and, if they differ, reports the byte and line number of the first difference.
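
If cmp is not available, the same "first differing byte and line" idea can be sketched in a few lines of Python. This is only an approximation of what cmp reports, not its implementation: the function name first_difference is hypothetical, and the handling of the case where one input is a prefix of the other is an assumption of this sketch (cmp itself prints an end-of-file message there).

```python
from typing import Optional, Tuple


def first_difference(a: bytes, b: bytes) -> Optional[Tuple[int, int]]:
    """Return (byte_number, line_number) of the first difference, both 1-based.

    Returns None when the inputs are identical. If one input is a prefix
    of the other, the position just past the shorter input is returned
    (an assumption of this sketch, not cmp's behaviour).
    """
    line = 1
    for offset, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return offset, line
        if x == 0x0A:  # newline byte: the following bytes are on the next line
            line += 1
    if len(a) != len(b):
        return min(len(a), len(b)) + 1, line
    return None


if __name__ == "__main__":
    # The two files from the question, inlined as bytes.
    print(first_difference(b"abcde\nabc\nabcccd", b"abcde\nab\nabccc"))
```

For real files you would pass open(path, "rb").read() for each side. Note that this only locates the first difference; it does not show the remaining ones.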


Python has a convenient library named difflib which might help answer your question.

Below are two one-liners using difflib, one for each major Python version.

python3 -c 'import difflib, sys; \
  print("".join( \
    difflib.ndiff( \
      open(sys.argv[1]).readlines(),open(sys.argv[2]).readlines())))'
python2 -c 'import difflib, sys; \
  print "".join( \
    difflib.ndiff( \
      open(sys.argv[1]).readlines(), open(sys.argv[2]).readlines()))'

These might come in handy as a shell alias, which is easy to carry around in your .${SHELL_NAME}rc.

$ alias char_diff="python2 -c 'import difflib, sys; print \"\".join(difflib.ndiff(open(sys.argv[1]).readlines(), open(sys.argv[2]).readlines()))'"
$ char_diff old_file new_file

And a more readable version to put in a standalone file:

#!/usr/bin/env python2
from __future__ import with_statement

import difflib
import sys

with open(sys.argv[1]) as old_f, open(sys.argv[2]) as new_f:
    old_lines, new_lines = old_f.readlines(), new_f.readlines()
diff = difflib.ndiff(old_lines, new_lines)
print ''.join(diff)
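
To make the ndiff output easier to read: each line is prefixed with "- " (only in the old file), "+ " (only in the new file) or two spaces (unchanged), and when two changed lines are similar enough ndiff adds "? " guide lines pointing at the characters that differ. Here is a minimal Python 3 sketch that runs it on the sample from the question, with the file contents inlined instead of read from disk (the helper name ndiff_text is made up for this example).

```python
import difflib


def ndiff_text(old: str, new: str) -> str:
    """Return the ndiff of two multi-line strings as a single string."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    return "".join(difflib.ndiff(old_lines, new_lines))


if __name__ == "__main__":
    # Sample data from the question, with a trailing newline added.
    file1 = "abcde\nabc\nabcccd\n"
    file2 = "abcde\nab\nabccc\n"
    print(ndiff_text(file1, file2), end="")
```

The "? " lines are hints for a human reader rather than a machine-readable format, so this is closer to the output asked for than plain diff -u, but still not a true per-character diff.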