Can I use file_get_contents() to compare two files?

I want to synchronize two directories. And I use

file_get_contents($source) === file_get_contents($dest)

to compare two files. Is there any problem to do this?


I would rather do something like this:

function files_are_equal($a, $b)
{
  // Check if filesize is different
  if(filesize($a) !== filesize($b))
      return false;

  // Check if content is different
  $ah = fopen($a, 'rb');
  $bh = fopen($b, 'rb');

  $result = true;
  while(!feof($ah))
  {
    if(fread($ah, 8192) != fread($bh, 8192))
    {
      $result = false;
      break;
    }
  }

  fclose($ah);
  fclose($bh);

  return $result;
}

This checks if the filesize is the same, and if it is it goes through the file step by step.

  • Checking the modified time check can be a quick way in some cases, but it doesn't really tell you anything other than that the files have been modified at different times. They still might have the same content.
  • Using sha1 or md5 might be a good idea, but this requires going through the whole file to create that hash. If this hash is something that could be stored and used later, then it's a different story probably, but yeah...

Use sha1_file() instead. It's faster and works fine if you just need to see whether the files differ. If the files are large, comparing the whole strings to each other can be very heavy. As sha1_file() returns an 40 character representation of the file, comparing files will be very fast.

You can also consider other methods like comparing filemtime or filesize, but this will give you guaranteed results even if there's just one bit that's changed.


  • Memory: e.g. you have a 32 MB memory limit, and the files are 20 MB each. Unrecoverable fatal error while trying to allocate memory. This can be solved by checking the files by smaller parts.
  • Speed: string comparisons are not the fastest thing in the world, calculating a sha1 hash should be faster (if you want to be 110% sure, you can compare the files byte-by-byte when hash matches, but you'll rule out all the cases where content and hash change (99%+ cases))
  • Efficiency: do some preliminary checks - e.g. there's no point comparing two files if their size differs.

Ths will work, but is inherently more inefficient than calculating checksum for both files and comparing these. Good candidates for checksum algorithms are SHA1 and MD5.

http://php.net/sha1_file

http://php.net/md5_file

if (sha1_file($source) == sha1_file($dest)) {
    /* ... */
}