Difference in whitespace between two files on Linux

Solution 1:

For vim users, there is a handy utility to show exact differences between files:

vimdiff file1 file2

This will open each file in its own window, side by side, with the differences highlighted in color.

Some useful commands when in vimdiff

While in vimdiff, some useful commands are:

  • ]c: jump to next change

  • [c: jump to previous change

  • ctrl-W ctrl-W: switch to other window

  • zo: open folds

  • zc: close folds

Example

Here is an example of vimdiff in an xterm comparing two versions of a cups configuration file:

[Screenshot: vimdiff showing the two files side by side]

You can see that long sections of identical lines have been collapsed. They can be opened again with zo.

The color scheme will vary depending on your option settings. In the above example, when a line appears in one file but not the other, that line is given a dark blue background. In the other file, the missing lines are indicated by dashed lines. When a line appears in both files but has some differences, the unchanged parts of the lines have a pink background and the changed parts have a red background.

Solution 2:

On FreeBSD or most Linux systems, you can pipe the output of diff through cat -v -e -t to show whitespace differences.

diff file1 file2 | cat -vet

Tabs will be shown as ^I, a $ will be shown at the end of each line so that you can see trailing whitespace, and nonprinting characters will be displayed as ^X or M-X.

If you have GNU coreutils (available on most non-busybox Linux distributions), this can be simplified to

diff file1 file2 | cat -A

On busybox systems, use catv -vet.
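As a rough illustration of what that output looks like, here is a small sketch with two throwaway files. The names a.txt and b.txt and the temporary directory are just placeholders I made up; the files differ only in a tab versus a space and in one trailing space.

```shell
# Create two scratch files that look the same when printed normally.
dir=$(mktemp -d)
printf 'key = value\nname\tfoo\n' > "$dir/a.txt"   # tab separator, no trailing space
printf 'key = value \nname foo\n' > "$dir/b.txt"  # trailing space, space separator

# diff exits with status 1 when the files differ; "|| true" keeps that
# from aborting the script if it runs under "set -e" or "pipefail".
marked=$( { diff "$dir/a.txt" "$dir/b.txt" || true; } | cat -vet )
printf '%s\n' "$marked"

rm -r "$dir"
```

In the printed result, the tab shows up as ^I in the line from a.txt, and the trailing space in b.txt is visible because the $ end-of-line marker comes after it.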

Solution 3:

Was one of the files edited on a Windows machine?

Standard line termination on Windows is CRLF, whereas on Linux it's simply LF. (Classic Mac OS used CR, but Macs have used LF since OS X.)

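Run wc -l on each file to count its lines, then compare the file sizes (for example with wc -c or ls -l). If the sizes differ by the number of lines, one file most likely has CRLF line endings. The match may be slightly off if the last line is not terminated in one of the files.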
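That check can be sketched roughly like this. The files unix.txt and dos.txt are made-up samples created just for the demonstration; substitute your own two files.

```shell
# Hypothetical demo files: same text, one with CRLF line endings.
dir=$(mktemp -d)
printf 'one\ntwo\nthree\n'       > "$dir/unix.txt"
printf 'one\r\ntwo\r\nthree\r\n' > "$dir/dos.txt"

# $(( ... )) strips the leading spaces that some wc implementations print.
lines=$(( $(wc -l < "$dir/unix.txt") ))
size_unix=$(( $(wc -c < "$dir/unix.txt") ))
size_dos=$(( $(wc -c < "$dir/dos.txt") ))
extra=$(( size_dos - size_unix ))

# If every line gained exactly one byte, CRLF endings are the likely cause.
if [ "$extra" -eq "$lines" ]; then
    verdict="size difference ($extra) equals line count: probably CRLF vs LF"
else
    verdict="size difference ($extra) does not match line count ($lines)"
fi
echo "$verdict"

rm -r "$dir"
```

This is only a heuristic: a matching count strongly suggests CRLF, but it doesn't prove it.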

Solution 4:

od may help. Despite its name (octal dump), it can also show a file's contents in hexadecimal, which lets you see exactly which bytes, including null bytes and unexpected whitespace, are in a file. Common causes of invisible differences are LF vs CRLF line endings, tabs vs spaces, and ASCII vs UTF-16 (where each ordinarily visible character is typically accompanied by a null byte). od -x filename ought to reveal any of those patterns. Note that -x prints two-byte words, so on little-endian machines the bytes of each pair appear swapped; od -c or od -t x1 shows them one at a time, in file order.

If you want a more elaborate way to view the file, any "hex editor" may do nicely. The nice thing about od is that, like the cut command, it is available on most Unix systems by default, so often no separate installation is necessary.
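To get a feel for the dump, here is a minimal sketch that feeds od a short made-up sample (the letter a, a tab, the letter b, then CRLF) instead of a real file. It uses the -c and -t x1 output formats with -An to suppress the offset column; the exact column spacing varies between od implementations.

```shell
# Character view: printable bytes as-is, others as escapes such as \t, \r, \n.
chars=$(printf 'a\tb\r\n' | od -An -c)
echo "$chars"

# Hex view, one byte at a time: 09 is a tab, 0d 0a is CRLF.
hex=$(printf 'a\tb\r\n' | od -An -t x1)
echo "$hex"
```

For a real file, replace the printf pipeline with the file name, for example od -c filename.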

If you need files to be more similar, tr can make some changes, and sed can make more. I would probably start with ls -l to see which file is larger, then view bytes to see what needs to be changed, and then change one of the files so that they seem more similar.
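For the common CRLF case, one possible approach with tr looks like this. The file names are placeholders, and the result is written to a new file so the original stays untouched.

```shell
# Hypothetical demo files: same text, one with CRLF line endings.
dir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$dir/dos.txt"
printf 'one\ntwo\n'     > "$dir/unix.txt"

# Delete every carriage return, writing to a new file.
tr -d '\r' < "$dir/dos.txt" > "$dir/dos-fixed.txt"

# cmp -s prints nothing and exits 0 only if the files are byte-for-byte identical.
if cmp -s "$dir/dos-fixed.txt" "$dir/unix.txt"; then
    result="identical after stripping CR"
else
    result="still different"
fi
echo "$result"

rm -r "$dir"
```

Note that tr -d '\r' removes every carriage return, not just those at line ends, which is usually what you want for text files.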

Solution 5:

To see exactly where the spaces and tabs are, you could replace them with visible markers using sed (the \t escape shown here is understood by GNU sed), for example:

$ cat file
  line 1
  line 2
    line 6
        line 7
$ sed 's/ /-/g; s/\t/<tab>/g' file
--line-1
--line-2
<tab>line-6
<tab><tab>line-7

Run the same command on the second file and compare the two results.
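Putting it together, the comparison might look something like the sketch below. The files file1 and file2 are made-up samples, and instead of \t it passes a literal tab to sed, on the assumption that this is more portable to sed implementations that don't understand the escape.

```shell
# Hypothetical demo files: the second line is indented differently in each.
dir=$(mktemp -d)
printf '  line 1\n\tline 6\n'   > "$dir/file1"   # second line starts with a tab
printf '  line 1\n    line 6\n' > "$dir/file2"   # second line starts with four spaces

tab=$(printf '\t')   # a literal tab character
for f in file1 file2; do
    # Mark spaces as "-" and tabs as "<tab>", keeping the originals untouched.
    sed "s/ /-/g; s/$tab/<tab>/g" "$dir/$f" > "$dir/$f.marked"
done

# diff exits 1 when the inputs differ; "|| true" keeps a strict script running.
changes=$(diff "$dir/file1.marked" "$dir/file2.marked" || true)
printf '%s\n' "$changes"

rm -r "$dir"
```

The diff output now shows the tab-indented line as <tab>line-6 and the space-indented one as ----line-6, so the difference is obvious at a glance.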