Difference in whitespace between two files on Linux
Solution 1:
For vim users, there is a handy utility to show exact differences between files:
vimdiff file1 file2
This will put each file in its own window, side by side, with the differences highlighted in color.
Some useful commands when in vimdiff
While in vimdiff, some useful commands are:
]c: jump to next change
[c: jump to previous change
ctrl-W ctrl-W: switch to other window
zo: open folds
zc: close folds
Example
Here is an example of vimdiff in an xterm comparing two versions of a cups configuration file:
[Screenshot: vimdiff in an xterm, the two versions of the file side by side]
You can see that long sections of identical lines have been collapsed. They can be opened again with zo.
The color scheme will vary depending on your option settings. In the above example, when a line appears in one file but not the other, that line is given a dark blue background. In the other file, the missing lines are indicated by dashed lines. When a line appears in both files but has some differences, the unchanged parts of the lines have a pink background and the changed parts have a red background.
Solution 2:
On FreeBSD or most Linux systems, you can pipe the output of diff through cat -v -e -t to show whitespace differences.
diff file1 file2 | cat -vet
Tabs will be shown as ^I, a $ will be shown at the end of each line so that you can see trailing whitespace, and nonprinting characters will be displayed as ^X or M-X.
If you have GNU coreutils (available on most non-busybox Linux distributions), this can be simplified to:
diff file1 file2 | cat -A
On busybox systems, use catv -vet.
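To get a feel for what that output looks like, here is a minimal sketch using two throwaway files. The file names and contents are made up for the example, and it sticks to cat -vet so that it should behave the same with GNU and BSD cat.

```shell
# Hypothetical sample files that differ only in whitespace.
tmp=$(mktemp -d)
printf 'key = value\n'   > "$tmp/file1"
printf 'key\t= value \n' > "$tmp/file2"

# diff exits non-zero when the files differ, which is expected here,
# so "|| true" keeps that from being treated as a failure.
out=$( { diff "$tmp/file1" "$tmp/file2" || true; } | cat -vet )
printf '%s\n' "$out"
rm -rf "$tmp"
```

The line for the second file should come out as "> key^I= value $": the ^I is the tab, and the space just before the $ is the trailing whitespace.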
Solution 3:
Was one of the files edited on a Windows machine?
Standard line termination on Windows is CRLF, whereas on Linux it is simply LF (classic Mac OS used CR, but Mac OS X and later use LF).
Try wc -l on each file to count its lines, then check whether the difference in file size equals the number of lines, since CRLF adds one byte per line (the last line may not be terminated in one file).
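That check can be sketched roughly as follows. The two sample files are hypothetical stand-ins for your own, and the grep at the end is an extra way of counting lines that contain a carriage return, not something the check strictly needs.

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n'       > "$tmp/unix.txt"
printf 'one\r\ntwo\r\nthree\r\n' > "$tmp/dos.txt"

# $(( ... )) also strips the padding some wc implementations add.
lines=$(( $(wc -l < "$tmp/dos.txt") ))
extra=$(( $(wc -c < "$tmp/dos.txt") - $(wc -c < "$tmp/unix.txt") ))
echo "lines: $lines, extra bytes: $extra"

# Count the lines that contain a carriage return.
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr" "$tmp/dos.txt")
echo "lines containing CR: $crlf_lines"
rm -rf "$tmp"
```

If the extra byte count matches the line count, CRLF line endings are a likely explanation for the difference.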
Solution 4:
od may help. The Octal Dump command can also show contents in hexadecimal. This can help you see which bytes, including null bytes or unexpected whitespace, are in a file. Common causes include LF vs CRLF, tabs vs spaces, or ASCII vs Unicode (a UTF-16 file, for instance, has a null byte next to each normally visible ASCII character). od -x filename ought to reveal any of those patterns. If you want a more elaborate way to view the file, any "hex editor" will do nicely. The nice thing about od is that, like the cut command, it is built into many Unix systems, so often no separate installation is necessary.
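One thing to keep in mind is that od -x prints two-byte words, so on little-endian machines the bytes within each pair can appear swapped. A byte-at-a-time view is sometimes easier to read. Here is a small sketch of that, using a made-up sample file:

```shell
# Hypothetical sample: a tab, a trailing space, and a CRLF line ending.
sample=$(mktemp)
printf 'a\tb \r\n' > "$sample"

# -c prints each byte as a character or a backslash escape (\t, \r, \n).
od -c "$sample"

# -An drops the offset column; -t x1 prints one hex byte at a time.
hex=$(od -An -t x1 "$sample" | tr -d ' \n')
echo "$hex"
rm -f "$sample"
```

In the hex output, 09 is a tab, 20 a space, 0d a carriage return, and 0a a line feed.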
If you need the files to be more similar, tr can make some changes, and sed can make more. I would probably start with ls -l to see which file is larger, then view the bytes to see what needs to be changed, and then change one of the files so that they become more similar.
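As one example of such a change, assuming the difference turned out to be CRLF line endings, tr can delete the carriage returns. The file names below are invented for the sketch.

```shell
tmp=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$tmp/dos.txt"
printf 'one\ntwo\n'     > "$tmp/unix.txt"

# Delete every carriage return; tr reads stdin, so write to a new file.
tr -d '\r' < "$tmp/dos.txt" > "$tmp/converted.txt"

# cmp -s is silent and exits 0 only when the files are byte-identical.
if cmp -s "$tmp/converted.txt" "$tmp/unix.txt"; then
    result=identical
else
    result=different
fi
echo "$result"
rm -rf "$tmp"
```

Writing to a separate file rather than redirecting back onto the input matters here, because the shell would truncate the input before tr gets to read it.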
Solution 5:
To find out where the actual spaces and tabs are, you could replace them with visible markers using sed, for example:
$ cat file
  line 1
  line 2
	line 6
		line 7
$ sed 's/ /-/g; s/\t/<tab>/g' file
--line-1
--line-2
<tab>line-6
<tab><tab>line-7
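Do the same for the second file, and then compare the two outputs. Note that \t in the pattern is understood by GNU sed; other sed implementations may need a literal tab character there.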
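Put together, the whole comparison might look something like this. It is only a sketch: the sample files and the mark helper are made up for illustration, and the tab is generated with printf so that it does not depend on sed understanding \t.

```shell
tmp=$(mktemp -d)
printf '  line 1\n\tline 6\n'   > "$tmp/file1"
printf '  line 1\n    line 6\n' > "$tmp/file2"

# A literal tab from printf avoids relying on \t support in sed.
tab=$(printf '\t')
# Hypothetical helper: make spaces and tabs visible in one file.
mark() { sed "s/ /-/g; s/$tab/<tab>/g" "$1"; }

mark "$tmp/file1" > "$tmp/file1.marked"
mark "$tmp/file2" > "$tmp/file2.marked"

# diff exits non-zero when the files differ, which is expected here.
marked_diff=$( { diff "$tmp/file1.marked" "$tmp/file2.marked" || true; } )
printf '%s\n' "$marked_diff"
rm -rf "$tmp"
```

The diff should then show "<tab>line-6" on one side and "----line-6" on the other, which makes the tab-versus-spaces difference obvious.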