How do I get Btrfs to verify checksums for a single file?

Btrfs offers these commands to verify data integrity/checksums:

btrfs scrub start <path>|<device>
btrfs check --check-data-csum <device>

However, AFAIK both of these always verify the whole filesystem; the path argument only identifies which filesystem (or device) to check, not a file or directory within it.

Now, I have a 3 TB Btrfs filesystem, and scrubbing it takes hours. Sometimes I only need to make sure that a particular file or directory has not been affected by bit rot, for example before using an *.iso installation image or restoring a backup. How can I use Btrfs for this without falling back to keeping manual hash files for each file?

I am aware that Btrfs does not store checksums for whole files; it stores checksums for blocks of data. So what I am looking for is a command or tool that identifies all the blocks used by certain files or directories and verifies only those blocks.

I have read that Btrfs verifies checksums on read, i.e. that reading a bit-rotted file would fail in some way. Is this the case?


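The answer: simply read the whole file. If any of its data blocks no longer matches the checksum Btrfs stored for it, the read fails with an Input/output error. So yes, Btrfs does verify checksums on read! (This only applies to files that have data checksums; files with checksumming disabled, e.g. via nodatacow, are not verified.)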
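Since reading a file verifies exactly the blocks it uses, checking a file or a whole directory boils down to reading every file under it. Below is a minimal sketch of that idea; `verify_read` is a hypothetical helper name, not a btrfs subcommand, and it assumes bash (for process substitution) and a `find` that supports `-print0`. One caveat: data already in the page cache is served from RAM without touching the disk, so for a meaningful check you may want to run `sync; echo 3 | sudo tee /proc/sys/vm/drop_caches` first.

```shell
#!/bin/bash
# Hypothetical helper (not part of btrfs-progs): read every regular file
# under the given paths so that Btrfs verifies the data checksums of the
# blocks those files occupy. Prints each file that cannot be read and
# returns 1 if any read failed, 0 otherwise.
verify_read() {
    local failed=0 file
    while IFS= read -r -d '' file; do
        # Discarding the data is enough: a checksum mismatch on Btrfs
        # makes the read itself fail with EIO ("Input/output error").
        if ! cat -- "$file" > /dev/null; then
            printf 'READ FAILED: %s\n' "$file" >&2
            failed=1
        fi
    done < <(find "$@" -type f -print0)
    return "$failed"
}
```

For example, `verify_read /mnt/data/images/installer.iso /mnt/data/backups && echo "all readable"` (paths are made up for illustration). When a read does fail, the kernel log (`dmesg`) typically also contains a Btrfs "csum failed" message identifying the inode and offset.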

To confirm this, I put together the following test:

  1. Allocate a 1 GiB file to back a test block device, attach it as a loop device and create a Btrfs filesystem on it;
  2. Create a dummy 800 MiB file containing a known unique sequence of bytes (token1) in the middle;
  3. Write the file to the Btrfs filesystem and record its sha256 for later reference;
  4. Unmount and patch the block device file so that exactly one byte changes. For this, sed replaces token1 with token2, which differ in a single character;
  5. Mount again and try to compute the sha256 of the 800 MiB file on Btrfs. The read fails with an Input/output error;
  6. Unmount, patch back, mount again and see that the 800 MiB file is readable again and its sha256 matches the one from step 3;
  7. Profit!

Here is the script:

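#!/bin/bash
f="btrfstestblockdevicefile"
ft="btrfstestfile"
loop=""    # set by f_mount to the first free loop device
mount_dir="btrfstestdir"
size="1g"
token1="36bbf48aa6645646fbaa7f25b64224fb3399ad40bc706c79bb8276096e3c9e8f"
token2="36bbf48aa6645646fbaa7f25b64224fb4399ad40bc706c79bb8276096e3c9e8f"

# Attach the image file to a free loop device (instead of assuming
# /dev/loop0 is unused), format it as Btrfs if any argument is given,
# then mount it. Every step must succeed for the next one to run.
f_mount() {
    echo "Mounting..." && \
    loop=$(sudo losetup --find --show "$f") && \
    if [[ -n $1 ]]; then sudo mkfs.btrfs -q "$loop"; fi && \
    mkdir "$mount_dir" && \
    sudo mount "$loop" "$mount_dir"
}

# Unmount, remove the mount point and detach the loop device.
f_umount() {
    echo "Unmounting..." && \
    sudo umount "$loop" && \
    sudo rmdir "$mount_dir" && \
    sudo losetup -d "$loop"
}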

echo "Allocating file for test block device..." && \
fallocate -l $size $f && \
f_mount 1 && \
echo "Generating test file..." && \
dd if=/dev/urandom of="${ft}1" bs=1M count=400 status=none && \
echo $token1 > "${ft}2" && \
dd if=/dev/urandom of="${ft}3" bs=1M count=400 status=none && \
sudo sh -c "cat ${ft}1 ${ft}2 ${ft}3 > ${mount_dir}/${ft}" && \
rm "${ft}1" "${ft}2" "${ft}3" && \
echo "Calculating original hash of the file..." && \
sha256sum "${mount_dir}/${ft}" && \
f_umount && \
echo "Patching the file in the block device file..." && \
sed -i "s/${token1}/${token2}/g" $f && sync && \
f_mount && \
echo "Trying to read the file..." && \
sha256sum "${mount_dir}/${ft}"
echo "OK, unmount, patch back and try again..." && \
f_umount && \
sed -i "s/${token2}/${token1}/g" $f && sync && \
f_mount && \
sha256sum "${mount_dir}/${ft}" && \
echo "Yay, Btrfs rules! Cleaning up..." && \
f_umount && \
rm $f && \
echo "All clear!"

As expected, if mkfs.btrfs is replaced with a filesystem that does not checksum data (e.g. mkfs.ext4), the corrupted file reads without any error. Of course, its sha256 then differs from that of the uncorrupted file.
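Step 4 rewrites the entire 1 GiB image with sed. A lighter alternative is to locate the token's byte offset with grep and overwrite only the single differing byte in place with dd. The sketch below assumes GNU grep (`-b` with `-o` prints byte offsets) and GNU dd (`status=none`); `find_offset` and `flip_byte` are hypothetical helper names. Unlike `sed .../g`, it patches only the first occurrence, which matters if the image holds more than one copy of the data (e.g. with a DUP data profile).

```shell
#!/bin/bash
# Hypothetical helper: print the 0-based byte offset of the first
# occurrence of a fixed string in a (possibly binary) file.
# Usage: find_offset FILE STRING
find_offset() {
    LC_ALL=C grep -obaF -- "$2" "$1" | head -n 1 | cut -d: -f1
}

# Hypothetical helper: overwrite exactly one byte of FILE at OFFSET with
# CHAR, without truncating or otherwise rewriting the file.
# Usage: flip_byte FILE OFFSET CHAR
flip_byte() {
    printf '%s' "$3" | dd of="$1" bs=1 seek="$2" count=1 conv=notrunc status=none
}
```

With the script's variables, `off=$(find_offset "$f" "$token1")` followed by `flip_byte "$f" $((off + 32)) 4` turns token1 into token2 (they differ only at character index 32, '3' vs '4'), and `flip_byte "$f" $((off + 32)) 3` restores it.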