With a time-sorted list, how do I insert a checksum for each file?

Time stamp example:
20211018_14:54:54.0596445490_Mon

The command below works on Ubuntu 20.04.3. It displays

  • directories,
  • files and
  • hidden files,

each with permissions, Time_Day, type (f or d) and path/fileName:
find . -printf "%M %TY%Tm%Td_%TT_%Ta %Y%p\n" |sort -k2 ;

The output is sorted by time (column 2), most recent at the bottom. Four example lines:

-rw-r--r-- 20211001_13:02:16.0000000000_Fri f./Bash/awkCommnads.txt   
-rw-r--r-- 20211013_06:22:12.0000000000_Wed f./.HiddenFile_1.txt   
drwxr-xr-x 20211018_14:51:42.1712136500_Mon d.   
drwxr-xr-x 20211018_14:54:54.0596445490_Mon d./Bash   

Said differently: how do I get the md5sum checksum (32 hex characters) of each file
on the left side of the time-sorted list above?
Example:

123456789T123456789w123456789Y12 -rw-r--r-- 20211001_13:02:16.0000000000_Fri f./Bash/awkCommnads.txt   

Once md5sum works, I want the same with sha512sum.

Tip for testing:

setterm -linewrap off ; find . -printf "%M %TY%Tm%Td_%TT_%Ta %Y%p\n" |sort -k2 ; tput smam ;  

General form for one line per record, with no line wrap:

setterm -linewrap off ; Commands... ; tput smam ;

tput smam turns line wrap back on.

Again: with a time-sorted list, how do I insert a checksum for each file?

--



Solution 1:

You need to run md5sum for each file (or rather for each regular file). find runs arbitrary commands with -exec. The problem is that find . -type f -exec md5sum {} \; also prints (i.e. md5sum prints) the pathname and a trailing newline, and sometimes a leading backslash. You need to get rid of them before you proceed to -printf.
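To see what has to be stripped, here is a minimal sketch assuming GNU md5sum. The temporary directory and both file names are made up for the demonstration. GNU md5sum starts the line with a backslash when the name it prints contains a backslash or a newline.

```shell
# Sketch: what md5sum prints for an ordinary name and for a name with a backslash.
# The directory and the file names are hypothetical, created only for this demo.
tmpdir=$(mktemp -d)
printf 'hello\n' >"$tmpdir/plain.txt"
printf 'hello\n' >"$tmpdir/back\\slash.txt"

# Each line is: checksum, two spaces, pathname (plus a newline on the terminal).
plain_line=$(md5sum "$tmpdir/plain.txt")
# With GNU md5sum this line starts with "\" and the backslash in the name is doubled.
odd_line=$(md5sum "$tmpdir/back\\slash.txt")

printf '%s\n' "$plain_line" "$odd_line"
rm -rf "$tmpdir"
```

Both files have the same content, so the 32 hex digits are identical; only the decoration around them differs.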

A straightforward way is with cut and tr. To execute md5sum … | cut … | tr … inside find, you need a shell there. Executing many processes per file is costly. You cannot use -exec … {} + because you need md5sum and -printf to take turns. We will save a few processes if we manage to make the shell do the job of cut and tr.
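For comparison, the cut and tr variant could look like the sketch below. The function name md5_field and the temporary file are assumptions for illustration. Feeding the file on stdin keeps the pathname (and any leading backslash) out of the output, so cut only has to drop the trailing "  -".

```shell
# Hypothetical helper: print only the 32 hex digits of a file's MD5, no newline.
# Three processes per file (md5sum, cut, tr), which is the cost discussed above.
md5_field() {
    md5sum <"$1" | cut -d ' ' -f 1 | tr -d '\n'
}

tmpfile=$(mktemp)               # throwaway file for the demo
printf 'hello\n' >"$tmpfile"
result=$(md5_field "$tmpfile")
printf '%s\n' "$result"
rm -f "$tmpfile"
```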

For a single file (e.g. /etc/fstab) you can print its md5sum in the desired format, still without cut or tr, like this:

sh -c '
   exec 2>/dev/null
   sum="$(md5sum <"$1")" || sum="????????????????????????????????"
   printf "%s " "${sum%% *}"
' sh /etc/fstab

With performance in mind we hope printf is a builtin in your sh. The above command is designed to show question marks if md5sum fails. Useful links:

  • What is the second sh in sh -c 'some shell code' sh?
  • Parameter expansion and quotes within quotes.
  • About ${sum%% *}.
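As a quick illustration of the last link, the sketch below applies the expansion to a string shaped like the output of md5sum reading stdin. The variable names hash_only and short_trim are made up here.

```shell
# md5sum reading stdin prints: 32 hex digits, two spaces, a dash.
sum='b1946ac92492d2347c6235b8b2611184  -'

# %% removes the longest suffix matching " *", i.e. everything from the first space on.
hash_only=${sum%% *}

# A single % removes the shortest such suffix, so one space is left behind.
short_trim=${sum% *}

printf '[%s]\n' "$hash_only" "$short_trim"
```

This is why the command uses %% and not %: the result must not carry a stray space before printf adds its own.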

Now let's build the command into your find. Like this:

find . -exec sh -c '
   exec 2>/dev/null
   sum="$(md5sum <"$1")" || sum="????????????????????????????????"
   printf "%s " "${sum%% *}"
' find-sh {} \; -printf '%M %TY%Tm%Td_%TT_%Ta %Y%p\n' | sort -k3

Notes:

  • sort -k2 became sort -k3.

  • The command will try to run md5sum for files of any type. For a directory, md5sum will fail and you will get question marks, which is acceptable. For a fifo, however, md5sum may wait endlessly for data, which you don't want. Either restrict find to regular files (simply add -type f as the first test) or change the command so that our -exec happens for regular files only and other files get a placeholder:

    find . \( -type f -exec sh -c '
        exec 2>/dev/null
        sum="$(md5sum <"$1")" || sum="????????????????????????????????"
        printf "%s " "${sum%% *}"
    ' find-sh {} \; -o -printf '-------------------------------- ' \) \
    -printf '%M %TY%Tm%Td_%TT_%Ta %Y%p\n' | sort -k3
    

    The sequences of ? or - characters are 32 characters long, the length of any md5sum, so the columns align nicely.

  • Newlines in pathnames will confuse your sort. If you can, use null-terminated strings. E.g. with GNU sort in my Kubuntu:

    find … -printf '…\0' | sort -z … | tr '\0' '\n'
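The question also mentions sha512sum. The sketch below puts the pieces together for that case; it assumes GNU find, GNU sort and sha512sum, and it is not tested beyond a toy tree. The demo directory, the pad variable and passing the placeholder as a second argument to the inner shell are illustrative choices, not part of the commands above. A SHA-512 digest is 128 hex characters, so the placeholder is 128 dashes.

```shell
# Sketch: time-sorted listing with a SHA-512 column, NUL-delimited until the last step.
demo=$(mktemp -d)                       # throwaway tree, made up for the demo
printf 'hello\n' >"$demo/a.txt"
mkdir "$demo/sub"
pad=$(printf '%128s' '' | tr ' ' '-')   # 128 dashes, the width of a SHA-512 hex digest

listing=$(
    cd "$demo" &&
    find . \( -type f -exec sh -c '
        exec 2>/dev/null
        sum="$(sha512sum <"$1")" || sum="$2"
        printf "%s " "${sum%% *}"
    ' find-sh {} "$pad" \; -o -printf "$pad " \) \
    -printf '%M %TY%Tm%Td_%TT_%Ta %Y%p\0' | sort -z -k3 | tr '\0' '\n'
)
printf '%s\n' "$listing"
rm -rf "$demo"
```

The final tr is only for display. If the output is going to be processed further, it may be safer to keep the records NUL-terminated.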