Create SHA-256 checksums of all files and directories?

Solution 1:

You can use find to find all files in the directory tree, and let it run sha256sum. The following command line will create checksums for the files in the current directory and its subdirectories.

find . -type f -exec sha256sum {} \;

I don't use the -b (binary) and -t (text) options, but you can pass -b for all files if you wish. The only difference I notice is an asterisk in front of each file name in the output.
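
For anyone post-processing these lines in a script, here is a minimal Python sketch of how one line of output could be taken apart. It assumes the GNU coreutils layout (digest, a space, a mode marker that is either a space or an asterisk, then the file name). The function name parse_checksum_line is my own, and the sketch does not handle the backslash-escaped form that sha256sum uses for unusual file names.

```python
def parse_checksum_line(line):
    """Split one line of sha256sum output into (digest, is_binary, path).

    Assumes the GNU coreutils layout: '<digest> <marker><path>', where the
    marker is ' ' (text mode, the default) or '*' (binary mode, -b).
    """
    line = line.rstrip("\n")
    digest, sep, rest = line.partition(" ")
    if not sep or not rest:
        raise ValueError(f"not a checksum line: {line!r}")
    marker, path = rest[0], rest[1:]
    if marker not in (" ", "*"):
        raise ValueError(f"unexpected mode marker in: {line!r}")
    return digest, marker == "*", path
```

Because only the first space and the marker are consumed, file names containing spaces come through intact.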

Solution 2:

TL;DR

cd /path/to/working/directory
sha256sum <(find . -type f -exec sha256sum {} \; | sort)

Intro

This is a more complete version of the answer above. It fixes the problem of find returning files in different orders on different systems.

Piping output to file, compare with diff

Firstly, you probably want to redirect the output to a file so that you can compare it with diff. For this you would use

find . -type f -exec sha256sum {} \; > file1.lst

Then on your other system

find . -type f -exec sha256sum {} \; > file2.lst
rsync file2.lst user@host:/home/user/file2.lst
ssh user@host
diff file1.lst file2.lst # might not match due to order

Fixing order of files found with find by piping to sort

Here I am assuming you are doing something similar to what I required this for - copying files from one system to another over a network and verifying the integrity of those files.

What I found was that the order in which find finds files can vary between two systems, even when the OS is "Debian" in both cases.

Therefore, one needs to sort the output in the text files.

sort file1.lst > file1sorted.lst
sort file2.lst > file2sorted.lst
diff file1.lst file2.lst # bad
diff file1sorted.lst file2sorted.lst # ok

You can do the find and sort all in one line, while redirecting the output to a file.

find . -type f -exec sha256sum {} \; | sort > file1.lst

Other sha/md5 sums

You might want a stronger hash. To use the 512-bit version, simply run:

find . -type f -exec sha512sum {} \; | sort > file1.lst

Alternatively, 256 bits might be overkill for what you are doing. MD5 is enough to detect accidental corruption (though not deliberate tampering), so do

find . -type f -exec md5sum {} \; | sort > file1.lst
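
To make the size difference between these algorithms concrete, here is a small Python sketch using the standard hashlib module. The helper name digest_bits is my own invention for illustration.

```python
import hashlib

def digest_bits(algorithm):
    """Return the digest length in bits for a hashlib algorithm name."""
    return hashlib.new(algorithm).digest_size * 8

# Each digest is printed as hex, so the checksum column is bits / 4 characters wide.
for name in ("sha256", "sha512"):
    print(name, digest_bits(name), "bits,", digest_bits(name) // 4, "hex characters")
```

MD5 produces a 128-bit digest, which is why its checksum column is half as wide as the one from sha256sum.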

A complete 1 line command to compare 2 directories with 1 shasum output

Now, if you have many files and do not want to save the output to a file, you can compute a single checksum over the sorted list instead. To do this, use

sha256sum <(find . -type f -exec sha256sum {} \; | sort)

The pipe to sort is required to ensure the output is sorted before computing the final sha256sum. Without this, if find finds files in a different order, despite the shasums for each file being correct, the overall shasum will depend on the order.
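
The order dependence is easy to demonstrate with a few lines of Python. This is only an illustration with made-up checksum lines, and the helper name digest_of_lines is hypothetical.

```python
import hashlib

def digest_of_lines(lines):
    """SHA-256 over the lines in the given order, one newline after each."""
    h = hashlib.sha256()
    for line in lines:
        h.update(line.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()

# Made-up listing lines standing in for sha256sum output.
listing = ["aaa  ./x", "bbb  ./y"]
shuffled = list(reversed(listing))

# Same lines, different order: the overall digests differ.
print(digest_of_lines(listing) == digest_of_lines(shuffled))
# After sorting, both orders give the same overall digest.
print(digest_of_lines(sorted(listing)) == digest_of_lines(sorted(shuffled)))
```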

Problem relating to diff output and paths used

You may have some path which looks like

/A/B/C/*

where * are the subdirectories and files you want to checksum. If A, B and C are directories that each contain only one subfolder, it is easy to accidentally run the checksum command from a different level of the tree on each system, which results in the following

sort1.txt
sha256sum1    ./A/B/C/file1

sort2.txt
sha256sum2    ./B/C/file1

Even if sha256sum1 = sha256sum2, diff will report the lines as different, because the paths differ in their base directory.

Here is a short Python 3 script that compares only the checksums, line by line, which solves this problem.

#!/usr/bin/env python3
"""Compare two sorted checksum lists by checksum only, ignoring the paths."""

file1_name = "sort1.txt"
file2_name = "sort2.txt"

# Read both lists; the with-block closes the files afterwards.
with open(file1_name) as file1, open(file2_name) as file2:
    file1_lines = file1.readlines()
    file2_lines = file2.readlines()

if len(file1_lines) == len(file2_lines):
    print("line numbers ok")
    for line1, line2 in zip(file1_lines, file2_lines):
        # The checksum is everything before the first space.
        shasum1 = line1.partition(" ")[0]
        shasum2 = line2.partition(" ")[0]
        if shasum1 != shasum2:
            print("shasum error:", line1.rstrip("\n"))
else:
    print("Error: file", file1_name, "number of lines !=", file2_name, "number of lines")
print("done")
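
A possible variation, not what the script above does: instead of comparing by line position, count how often each checksum occurs in each list and report the ones left over on either side. This still works when the two lists have different lengths. The function name compare_digests is hypothetical.

```python
from collections import Counter

def compare_digests(lines1, lines2):
    """Return (only_in_first, only_in_second) as sorted lists of checksums.

    Paths are ignored; only the first whitespace-separated field is used.
    """
    count1 = Counter(line.split()[0] for line in lines1 if line.strip())
    count2 = Counter(line.split()[0] for line in lines2 if line.strip())
    only_in_first = sorted((count1 - count2).elements())
    only_in_second = sorted((count2 - count1).elements())
    return only_in_first, only_in_second
```

With the two lists read via readlines(), a result of two empty lists means every checksum on one side has a match on the other.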

I initially wanted to write a shell script to do this, but I got bored trying to figure out how to do it, so went back to python.

This makes me think that writing the entire thing in Python would actually have been easier, except for the find command.
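
For what it is worth, here is a rough sketch of what the whole thing might look like in Python, using os.walk in place of find. It is an untuned illustration rather than a drop-in replacement: the function names are my own, paths are written POSIX-style relative to the starting directory, and symbolic links are skipped to roughly mirror find -type f.

```python
import hashlib
import os

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never held in memory whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def checksum_tree(root):
    """Return sorted 'digest  ./relative/path' lines for regular files under root."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            rel = os.path.relpath(full, root)
            lines.append(f"{sha256_of_file(full)}  ./{rel}")
    return sorted(lines)
```

Calling print("\n".join(checksum_tree("."))) produces a sorted list that can be redirected to a file and compared with diff as before.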

Solution 3:

Late answer, but for the sake of documentation...

The other answers suggest calling sha256sum via find's -exec option. With the \; terminator, sha256sum is started once for each file, which adds significant process-startup overhead.

A more efficient solution is to convert the find results to command-line arguments by piping them through xargs and calling sha256sum that way. xargs runs sha256sum once, or in several large batches if there are too many arguments for a single command line.

find /path/to/your/dir -type f | xargs sha256sum -b

If you have file names that contain whitespace, use the -print0 flag in find and the -0 flag in xargs so that names are terminated with \0 instead:

find /path/to/your/dir -type f -print0 | xargs -0 sha256sum -b
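
To see why the separator matters, here is a simplified Python model of the two ways of splitting the stream of file names. It is only an approximation (real xargs also interprets quotes and backslashes when -0 is not given), and the function names are made up for this illustration.

```python
def split_on_whitespace(stream):
    """Rough model of plain xargs: any run of blanks or newlines separates arguments."""
    return stream.split()

def split_on_nul(stream):
    """Model of xargs -0: only the NUL byte separates arguments."""
    return [name for name in stream.split(b"\0") if name]

names = [b"./my file.txt", b"./plain.txt"]
newline_stream = b"\n".join(names) + b"\n"  # what find -print emits
nul_stream = b"\0".join(names) + b"\0"      # what find -print0 emits

print(split_on_whitespace(newline_stream))  # the name with a space is torn in two
print(split_on_nul(nul_stream))             # both names survive intact
```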