Generate md5 checksum for all files in a directory

I would like to create an MD5 checksum list for all files in a directory.

I want to run cat filename | md5sum > output.txt, but for every file in my directory in one step.

Any assistance would be great.


You can pass md5sum multiple filenames or bash expansions:

$ md5sum * > checklist.chk  # generates a list of checksums for any file that matches *
$ md5sum -c checklist.chk   # runs through the list to check them
cron: OK
database.sqlite3: OK
fabfile.py: OK
fabfile.pyc: OK
manage.py: OK
nginx.conf: OK
uwsgi.ini: OK
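A minimal, self-contained sketch of the same round trip in a throwaway directory (created with mktemp, so nothing real is touched). It builds the list, verifies it, then edits one file to show that md5sum -c reports FAILED and returns a non-zero exit status. GNU coreutils is assumed, and a.txt and b.txt are made-up demo files.

```shell
#!/usr/bin/env bash
# Demo: create a checksum list, verify it, then detect a modified file.
# Assumes GNU coreutils; a.txt and b.txt are hypothetical demo files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'alpha\n' > a.txt
printf 'beta\n'  > b.txt

# The glob is expanded before the redirection creates checklist.chk,
# so on this first run the list has no entry for itself.
md5sum * > checklist.chk

# Both files match: prints "a.txt: OK" and "b.txt: OK".
if md5sum -c checklist.chk; then first_status=0; else first_status=$?; fi

printf 'changed\n' > b.txt   # simulate an edit or corruption

# b.txt is now reported as FAILED and md5sum exits non-zero.
if md5sum -c checklist.chk; then second_status=0; else second_status=$?; fi

echo "exit status before the edit: $first_status, after the edit: $second_status"
```

The non-zero exit status is what makes md5sum -c usable in scripts and cron jobs, not just for reading the OK/FAILED lines by eye.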

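If you want to get fancy, you can use find to drill down, filter the files, and work recursively. Excluding the checklist itself stops find from hashing the half-written output file, which the shell creates before find starts:

find . -type f ! -name checklist.chk -exec md5sum {} + > checklist.chk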
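A slightly more defensive variant of the recursive find command, sketched under the assumption of GNU find, sort and xargs (for -print0, -z and -0). It sorts the paths so the list comes out in the same order on every run, passes names NUL-separated so spaces survive, and writes the list outside the tree being hashed so it can never checksum itself. The directory layout is invented for the demo.

```shell
#!/usr/bin/env bash
# Recursive checksum list with a stable order.
# Assumes GNU find/sort/xargs; the tree below is a hypothetical example.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/data/sub dir"
printf 'one\n'   > "$tmp/data/file one.txt"
printf 'two\n'   > "$tmp/data/sub dir/two.txt"
printf 'three\n' > "$tmp/data/sub dir/three.txt"

cd "$tmp/data" || exit 1

# -print0 | sort -z | xargs -0 keeps names with spaces in one piece;
# -r skips running md5sum when find matches nothing.
# The list is written one level up, outside the tree being hashed.
find . -type f -print0 | sort -z | xargs -0 -r md5sum > ../checklist.chk

# Verify from the same directory the list was generated in.
if md5sum -c ../checklist.chk; then verify_status=0; else verify_status=$?; fi
```

Because the paths are relative (./sub dir/two.txt and so on), the check has to run from the directory where the list was generated.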

A great checksum creation/verification program is rhash.

  • It can create SFV-compatible files, and check them too.

  • It supports md4, md5, sha1, sha512, crc32 and many others.

  • It can do recursive creation (-r option) like md5deep or sha1deep.

  • Last but not least, you can format the output of the checksum file. For example:

    rhash --md5 -p '%h,%p\n' -r /home/ > checklist.csv
    

    outputs a CSV file including the full path of files recursively starting with the /home directory.

I also find the -e option, which renames files by inserting their CRC32 sum into the name, extremely useful.

Note that you can also replace md5sum with rhash in PhoenixNL72's examples.
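If rhash isn't installed, a rough equivalent of the hash,path CSV can be sketched with find, md5sum and sed. This is a substitute technique, not rhash itself, and it assumes GNU tools and paths without commas, newlines or backslashes (md5sum escapes such names, and this sketch does not undo that). A temporary tree stands in for /home.

```shell
#!/usr/bin/env bash
# Approximate "hash,path" CSV output using md5sum instead of rhash.
# Assumptions: GNU find/md5sum/sed; paths contain no commas, newlines or backslashes.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/home/user"
printf 'hello\n' > "$tmp/home/user/notes.txt"

# md5sum prints "<32 hex digits><two spaces><path>"; turn the two spaces into a comma.
# find gets an absolute starting point, so every path in the CSV is absolute.
find "$tmp/home" -type f -exec md5sum {} + \
  | sed 's/^\([0-9a-f]\{32\}\)  /\1,/' > "$tmp/checklist.csv"

cat "$tmp/checklist.csv"
```

rhash remains the better choice when you need other hash types or a real CSV quoting scheme; this only covers the simple MD5 case.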


Here are two more extensive examples:

  1. Create an md5 file in each directory which doesn't already have one, with absolute paths:

    find "$PWD" -type d | sort | while IFS= read -r dir; do if [ -f "${dir}/@md5Sum.md5" ]; then echo "Skipped ${dir}: @md5Sum.md5 already present"; else echo "Processing ${dir}"; md5sum "${dir}"/* > "${dir}/@md5Sum.md5"; fi; chmod a=r "${dir}/@md5Sum.md5"; done
    
  2. Create an md5 file in each folder which doesn't already have one: no paths, only filenames:

    find "$PWD" -type d | sort | while IFS= read -r dir; do cd "${dir}" || continue; if [ -f @md5Sum.md5 ]; then echo "Skipped ${dir}: @md5Sum.md5 already present"; else echo "Processing ${dir}"; md5sum -- * > @md5Sum.md5; fi; chmod a=r @md5Sum.md5; done
    

The only difference between 1 and 2 is how files are listed in the resulting md5 file: 1 records absolute paths, 2 records bare filenames.
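To see what that difference means in practice, here is a small demo in a temporary directory (GNU md5sum assumed; demo.txt is a made-up file). A list with absolute paths verifies from any working directory, while a list with bare filenames only verifies from inside the directory it describes.

```shell
#!/usr/bin/env bash
# Absolute-path list vs filename-only list; demo.txt is a hypothetical file.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'demo\n' > "$tmp/demo.txt"

md5sum "$tmp"/demo.txt > "$tmp/absolute.md5"          # like example 1
( cd "$tmp" && md5sum demo.txt > relative.md5 )        # like example 2

cd /   # verify from somewhere else entirely

# Absolute paths resolve from anywhere.
if md5sum -c "$tmp/absolute.md5"; then abs_elsewhere=0; else abs_elsewhere=$?; fi
# "demo.txt" is looked up in /, so this check cannot find the file.
if md5sum -c "$tmp/relative.md5" 2>/dev/null; then rel_elsewhere=0; else rel_elsewhere=$?; fi
# From inside the directory, the filename-only list works.
if ( cd "$tmp" && md5sum -c relative.md5 ); then rel_inside=0; else rel_inside=$?; fi
```

The flip side is that a filename-only list keeps working if the whole directory is moved or copied elsewhere, while an absolute-path list does not.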

The commands do the following:

  1. Build a list of the current directory and all of its subdirectories (the whole tree).
  2. Sort the directory list.
  3. Check whether @md5Sum.md5 exists in each directory. Print Skipped if it does, Processing if it doesn't.
  4. If @md5Sum.md5 doesn't exist, md5sum generates one with the checksums of all the files in that directory.
  5. Set @md5Sum.md5 to read-only.
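The five steps above can be sketched as a small script. This is a rewrite for illustration, not PhoenixNL72's exact command: it assumes bash and GNU find (for -maxdepth and -print0), uses a hypothetical helper name make_md5_for_dir, records absolute paths as in example 1, and hashes only regular files so subdirectories don't produce "Is a directory" errors.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-5: one read-only @md5Sum.md5 per directory, absolute paths.
# Assumes bash + GNU find; make_md5_for_dir is a hypothetical helper name.

make_md5_for_dir() {
  local dir=$1 sumfile="$1/@md5Sum.md5"
  if [ -f "$sumfile" ]; then                          # step 3
    echo "Skipped ${dir}: @md5Sum.md5 already present"
    return 0
  fi
  echo "Processing ${dir}"
  # Step 4: only regular files directly inside $dir. The list file is excluded
  # because the redirection may create it before find reaches it.
  find "$dir" -maxdepth 1 -type f ! -name '@md5Sum.md5' -print0 \
    | sort -z | xargs -0 -r md5sum > "$sumfile"
  chmod a=r "$sumfile"                                # step 5
}

# Demo tree (hypothetical) inside a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/photos/2020"
printf 'a\n' > "$tmp/photos/a.jpg"
printf 'b\n' > "$tmp/photos/2020/b.jpg"

# Steps 1 and 2: list every directory in the tree and sort the list.
find "$tmp/photos" -type d -print0 | sort -z |
  while IFS= read -r -d '' dir; do
    make_md5_for_dir "$dir"
  done
```

Running it a second time prints only Skipped lines, because every directory already has its list.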

The output of this entire script can be redirected to a file (.....;done > test.log) or piped to another program (like grep). The output only tells you which directories were skipped and which were processed.

After a successful run, you will end up with an @md5Sum.md5 file in the current directory and in each of its subdirectories.

I named the file @md5Sum.md5 so it gets listed at the top of the directory in a Samba share.

All @md5Sum.md5 files can be verified with the following command:

find "$PWD" -name @md5Sum.md5 | sort | while IFS= read -r file; do cd "${file%/*}" || continue; md5sum -c @md5Sum.md5; done > checklog.txt

Afterwards, run grep -v ': OK$' checklog.txt to list every file that failed the check. (A plain grep -v OK would also hide failing files whose names happen to contain OK.)
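A variant of the verification loop that logs only the failures, each prefixed with its directory, so the log stays readable even for filename-only lists. It is a sketch assuming bash plus GNU find and md5sum (md5sum's --quiet option suppresses the OK lines); report_failures, the demo tree, and the log name failures.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Verify every @md5Sum.md5 in a tree and log only the failing entries.
# Assumes bash + GNU find/md5sum; report_failures is a hypothetical helper.

report_failures() {
  local sumfile=$1 dir out line
  dir=${sumfile%/*}
  # --quiet drops the "OK" lines. A non-zero exit is expected when
  # something fails, so it is deliberately ignored here.
  out=$(cd "$dir" && md5sum -c --quiet @md5Sum.md5 2>/dev/null) || true
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    printf '%s/%s\n' "$dir" "$line"   # e.g. /path/to/dir/file: FAILED
  done <<< "$out"
}

# Demo tree (hypothetical): two directories, one with a tampered file.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/tree/x" "$tmp/tree/y"
printf '1\n' > "$tmp/tree/x/f1"
printf '2\n' > "$tmp/tree/y/f2"
( cd "$tmp/tree/x" && md5sum f1 > @md5Sum.md5 )
( cd "$tmp/tree/y" && md5sum f2 > @md5Sum.md5 )
printf 'tampered\n' > "$tmp/tree/y/f2"     # make exactly one check fail

find "$tmp/tree" -name @md5Sum.md5 -print0 | sort -z |
  while IFS= read -r -d '' file; do
    report_failures "$file"
  done > "$tmp/failures.txt"

cat "$tmp/failures.txt"
```

An empty failures.txt then means every list verified cleanly.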

To regenerate @md5Sum.md5 in a specific directory (after changing or adding files, for instance), delete or rename the existing @md5Sum.md5 and run the generate command again.
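Because the generate commands leave @md5Sum.md5 read-only, a helper for refreshing a single directory might look like the sketch below. regen_md5 is a hypothetical name; it writes bare filenames like example 2, hashes only regular files, and assumes bash and GNU md5sum.

```shell
#!/usr/bin/env bash
# Rebuild @md5Sum.md5 for one directory after files were changed or added.
# regen_md5 is a hypothetical helper; bare filenames as in example 2.
regen_md5() {
  (
    cd "$1" || exit 1
    # The old list is read-only. rm -f removes it without prompting, since
    # deleting a file depends on the directory's permissions, not the file's.
    rm -f @md5Sum.md5
    # Collect regular files only, so subdirectories cause no errors.
    files=()
    for f in *; do
      if [ -f "$f" ]; then files+=("$f"); fi
    done
    if [ "${#files[@]}" -gt 0 ]; then
      md5sum -- "${files[@]}" > @md5Sum.md5
    else
      : > @md5Sum.md5          # no files: write an empty list
    fi
    chmod a=r @md5Sum.md5
  )
}

# Demo (hypothetical files) in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/sub"
printf 'old\n' > "$tmp/one.txt"
regen_md5 "$tmp"
printf 'new\n' > "$tmp/two.txt"   # a file added after the first run
regen_md5 "$tmp"
```

Running the body in a subshell keeps the cd from changing the caller's working directory.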


I hit this issue, and while the solutions above are elegant, I wanted a quick and dirty hack for my situation: one directory containing subdirectories only one level deep.

So, enter the directory in a shell and run:

md5sum * */* 2>/dev/null > md5sum.md5

This hashes all the files in the top-level directory plus the contents of each subdirectory, and discards the "Is a directory" errors md5sum prints for the subdirectories themselves (note that 2>/dev/null also hides any other errors, such as unreadable files). Advantages: it's easy to remember and does exactly what it's supposed to do. I always get confused by find syntax and can never remember it off the top of my head; this one-liner needs no loop, and it handled spaces in directory names fine. It's not a robust or powerful solution (it doesn't go deeper than one level of subdirectories, and * skips hidden dotfiles), but it's a quick and easy fix for the problem.
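For comparison, roughly the same one-level scan can be expressed with find, assuming GNU find for -maxdepth and -print0. Unlike * */*, it also picks up hidden dotfiles and never passes directories to md5sum, so no error suppression is needed; raising -maxdepth extends it to deeper trees. Paths come out as ./name rather than name. The demo tree is made up.

```shell
#!/usr/bin/env bash
# One level of subdirectories, like `md5sum * */*`, but via find.
# Assumes GNU find/sort/xargs; the tree below is hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/top/sub/deeper"
printf 'a\n' > "$tmp/top/a.txt"
printf 'h\n' > "$tmp/top/.hidden"               # skipped by *, found by find
printf 'b\n' > "$tmp/top/sub/b.txt"
printf 'c\n' > "$tmp/top/sub/deeper/c.txt"      # depth 3: deliberately not included

cd "$tmp/top" || exit 1
# -maxdepth 2 = top-level files plus one level of subdirectories.
# The output file is excluded because the shell creates it before find runs.
find . -maxdepth 2 -type f ! -name md5sum.md5 -print0 | sort -z \
  | xargs -0 -r md5sum > md5sum.md5
```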


Here's mine:

time find dirname/ -type f -print0 | xargs -0 md5sum | tee dirname.md5

Adding -type f stops it from throwing errors for directories, and -print0 with xargs -0 keeps filenames containing spaces intact. It's quick and good enough for me.
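Because find dirname/ records entries such as dirname/nested/y.txt, the resulting list has to be verified from the directory that contains dirname. A small demo in a temporary directory, assuming GNU find and xargs and a made-up tree; -type f keeps directories away from md5sum, and -print0 with xargs -0 makes sure the file named "with space.txt" arrives as a single argument.

```shell
#!/usr/bin/env bash
# Build dirname.md5, then verify it from the parent of dirname/.
# Assumes GNU find/xargs; the tree below is hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir -p dirname/nested
printf 'x\n' > "dirname/with space.txt"
printf 'y\n' > dirname/nested/y.txt

# -type f skips directories; -print0 / xargs -0 keep "with space.txt" in one piece.
find dirname/ -type f -print0 | xargs -0 -r md5sum | tee dirname.md5

# Entries start with "dirname/", so check from the directory holding dirname/.
if md5sum -c dirname.md5; then check_status=0; else check_status=$?; fi
```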