How to diff large files on Linux

Solution 1:

cmp works byte by byte, so it probably won't run out of memory (I just tested it on two 7 GB files) -- but you may be looking for more detail than "files X and Y differ at byte x, line y". If your files' matching content is offset (e.g., file Y contains an identical block of text, but not at the same location), you can pass byte offsets to cmp; you could probably turn that into a resynchronizing compare with a small script.
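As a sketch of the offset idea: cmp accepts optional trailing skip1/skip2 operands, which are byte offsets into each file (the file names and the 4-byte prefix below are invented for the demo):

```shell
# Hypothetical demo: FILE2 is FILE1 with 4 junk bytes prepended.
printf 'hello large file' > FILE1
printf 'XXXXhello large file' > FILE2

# Plain cmp reports a difference immediately.
cmp FILE1 FILE2 || echo "files differ as-is"

# Skipping 0 bytes into FILE1 and 4 bytes into FILE2 realigns them.
cmp FILE1 FILE2 0 4 && echo "identical once offset"
```

A resynchronizing script would search for the offsets at which the comparison goes clean, then repeat from the next mismatch.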

Aside: In case anyone else lands here looking for a way to confirm that two directory structures (containing very large files) are identical: diff --recursive --brief (diff -rq for short) will work and not run out of memory.
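For example (directory names are invented for the demo):

```shell
# Two trees that differ by one extra file.
mkdir -p treeA/sub treeB/sub
printf 'same data' > treeA/sub/f.bin
printf 'same data' > treeB/sub/f.bin
printf 'extra'     > treeB/leftover

# -r recurses; -q reports only *which* files differ, skipping the
# line-by-line diff output. diff exits 1 when it finds differences,
# so guard it in scripts:
diff -rq treeA treeB || true
```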

Solution 2:

I found this link

diff -H (--speed-large-files in GNU diff) might help, or you can try installing the textproc/2bsd-diff port, which apparently doesn't try to load the files into RAM and so copes with large files more easily.
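For reference, -H is a heuristic that assumes large files with many scattered small changes; the output is ordinary diff output (the file names and contents below are made up for the demo):

```shell
# Build a pair of files differing in one line.
seq 1 100000 > big-old.txt
sed 's/^50000$/changed/' big-old.txt > big-new.txt

# Same usage as plain diff; exits 1 when differences are found.
diff -H big-old.txt big-new.txt || true
```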

I'm not sure if you tried those two options or if they might work for you. Good luck.

Solution 3:

If the files are identical in length and differ only in a few byte values, you can use a script like the following (w is the number of bytes per hexdump line; adjust it to your display width):

# bash required: the loop reads two process substitutions via fds 7 and 8.
w=12
while read -ru7 x && read -ru8 y; do
  # Print only the hexdump lines (offset + hex bytes + ASCII) that differ.
  [ ".$x" = ".$y" ] || echo "$x | $y"
done 7< <(od -vw"$w" -tx1z FILE1) 8< <(od -vw"$w" -tx1z FILE2) > DIFF-FILE1-FILE2 &

# The compare runs in the background; view its output as it grows:
less DIFF-FILE1-FILE2

It's not very fast, but it does the job. (od's -v matters here: without it, od collapses repeated lines into a "*", which would throw the two streams out of alignment.)
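Under the same assumption (same-length files differing in a few bytes), plain cmp -l gives a byte-level listing without the helper script, and it also streams rather than loading the files (file names and contents are made up for the demo):

```shell
# Two same-length files differing at bytes 3 and 6.
printf 'abcdef' > FILE1
printf 'abXdeY' > FILE2

# cmp -l prints one line per differing byte:
#   <1-based byte offset> <octal value in FILE1> <octal value in FILE2>
# It exits 1 when differences are found, hence the guard.
cmp -l FILE1 FILE2 || true
```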