Tar extracting from multi-volume tape whilst computing shasums

As part of our backup system, we replicate ZFS datasets from a TrueNAS system to a couple of backup servers, one of which runs TrueNAS Scale and has an LTO-5 tape drive connected. We occasionally write the contents of one of the read-only snapshots to tape. As some of these datasets are large, tar is used with the --multi-volume flag.

Prior to backup, sha256sums are generated for every file in the snapshot directory. A copy of the resulting checksum file is kept on the server and also written to tape.
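
For reference, the checksum file is produced along these lines, run from the root of the snapshot (the output path here is illustrative):

  find . -type f -print0 | xargs -0 sha256sum > /root/datasetname.sha256sum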

After this, the entire contents of the snapshot are written to tape using:

  tar --acls --xattrs --sparse --label="SomeLabel" --multi-volume -cvpf /dev/nst0 *

This has served us well; however, I now wish to verify the data after it has been written to tape. I want to avoid extracting the entire dataset to a scratch location (which would allow a simple "sha256sum -c"), as the TrueNAS Scale server does not have enough spare space for some of the datasets. Instead I tried:

  tar --multi-volume -xf /dev/nst0 --to-command=tar-shasums.sh | tee verify-datasetname.sha256sum

Where tar-shasums.sh is along these lines:

#!/bin/bash

# tar runs this once per extracted file, piping the file's contents to stdin
# and setting $TAR_FILENAME to its name within the archive.
sum=$(sha256sum)
echo -n "$sum" | sed 's/ .*$//'
echo "  $TAR_FILENAME"

I've run into an issue, however, when the archive spans two tapes. When tar is in the middle of reading back a file that straddles the volume boundary, it asks for the next volume to be inserted and Enter to be pressed, but this errors because the device is still in use.

It looks like the "--to-command" child is still active for that file: it has yet to receive all the data it needs to produce the checksum, yet it cannot finish until the tape is changed, and the tape cannot be changed until it has finished...

Currently I kill the checksum process, which allows tar to continue with the next tape, but it means the one file spanning the two volumes cannot be verified unless it is manually extracted and checked separately. Not ideal.
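
For the record, that stopgap is just killing the checksum child by name, something like:

  pkill -x sha256sum   # process name assumed; frees tar to continue, leaving that one file unverified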

I'm expecting a no, but is there any way around this? Any way to generate the checksums that does not involve extracting the entire tar to disk first? Or any way to break the hold on /dev/nst0 so that tar can continue reading from the newly inserted tape without having to kill sha256sum?


Solution 1:

I had a look at the tar source last night. It appears that "--to-command" creates a pipe, uses fork to run the script, and pipes the file data to it.

So the issue is that fork causes the child process to inherit all of the parent's file descriptors, including the /dev/nst0 device that tar has open. Tar then closes /dev/nst0 ready for the media change, but the forked process, which is still waiting for more piped data, keeps the device open. Hence the deadlock.
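
The inheritance is easy to demonstrate without a tape drive; in this sketch /etc/hostname stands in for /dev/nst0:

exec 3</etc/hostname            # parent opens a descriptor
bash -c 'ls -l /proc/$$/fd'     # the child lists its descriptors; inherited FD 3 shows up
exec 3<&-                       # close it again in the parent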

I've partially worked around this by changing the script to always close the /dev/nst0 descriptor before doing anything else:

DEVICE=/dev/nst0

# Find which of our inherited descriptors points at the tape device.
fd=$(lsof -p $$ | grep "${DEVICE}" | awk '{print $4}')
# lsof's FD column carries an access-mode suffix, e.g. "3r" -> "3".
fd=${fd::-1}
# Close it so this process no longer holds the device open.
eval "exec ${fd}<&-"

That leaves just one process, "sh", that appears to hang on to the file descriptor. "fuser -u /dev/nst0" shows it, and as a temporary workaround it's possible to attach gdb and close the descriptor from there, after which the media change succeeds and the remaining checksums generate correctly.

gdb -p PID    # attach to the lingering sh process
p close(FD)   # close the tape descriptor from inside it
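
The same can be done non-interactively with gdb's batch mode, where PID and FD are whatever fuser/lsof reported:

gdb -p "$PID" -batch -ex "call (int)close($FD)"   # the cast sidesteps gdb's unknown-return-type complaint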

I'm not sure whether it's possible to fork without passing all of the file descriptors to the child, but that looks like it would be the proper fix.

I'll update this answer if I figure that out.