Fastest way to tell if two files have the same contents in Unix/Linux?

I believe cmp will stop at the first byte difference:

cmp --silent $old $new || echo "files are different"

I like @Alex Howansky have used 'cmp --silent' for this. But I need both positive and negative response so I use:

cmp --silent file1 file2 && echo '### SUCCESS: Files Are Identical! ###' || echo '### WARNING: Files Are Different! ###'

I can then run this in the terminal or with a ssh to check files against a constant file.

To quickly and safely compare any two files:

if cmp --silent -- "$FILE1" "$FILE2"; then
  echo "files contents are identical"
else
  echo "files differ"
fi

It's readable, efficient, and works for any file names including "` $()

Because I suck and don't have enough reputation points I can't add this tidbit in as a comment.

But, if you are going to use the cmp command (and don't need/want to be verbose) you can just grab the exit status. Per the cmp man page:

If a FILE is '-' or missing, read standard input. Exit status is 0 if inputs are the same, 1 if different, 2 if trouble.

So, you could do something like:

STATUS="$(cmp --silent $FILE1 $FILE2; echo $?)"  # "$?" gives exit status for each comparison

if [[ $STATUS -ne 0 ]]; then  # if status isn't equal to 0, then execute code
    DO A COMMAND ON $FILE1
else
    DO SOMETHING ELSE
fi

EDIT: Thanks for the comments everyone! I updated the test syntax here. However, I would suggest you use Vasili's answer if you are looking for something similar to this answer in readability, style, and syntax.

For files that are not different, any method will require having read both files entirely, even if the read was in the past.

There is no alternative. So creating hashes or checksums at some point in time requires reading the whole file. Big files take time.

File metadata retrieval is much faster than reading a large file.

So, is there any file metadata you can use to establish that the files are different? File size ? or even results of the file command which does just read a small portion of the file?

File size example code fragment:

  ls -l $1 $2 | 
  awk 'NR==1{a=$5} NR==2{b=$5} 
       END{val=(a==b)?0 :1; exit( val) }'

[ $? -eq 0 ] && echo 'same' || echo 'different'

If the files are the same size then you are stuck with full file reads.

Fastest way to tell if two files have the same contents in Unix/Linux?

Related

Recent Posts