Linux BTRFS - convert to single with failed drive

A small amount of backstory:

I have a small media filesystem, on which I store various movies and TV shows that are used for my HTPC setup. This was originally set up, using btrfs, on a 1TB WD external drive.

Later, I decided to purchase another drive to give this filesystem RAID1 mirroring capability. This drive is a Seagate Barracuda (2TB, BARRACUDA 7200.14 FAMILY). Unfortunately, this was not a good choice of drive: it started developing a large number of read errors shortly after installation, although BTRFS was able to correct them.

Recently, the amount of read errors from this drive has spiked, with its condition steadily worsening. BTRFS is now starting to crash:

kernel: RSP: 0018:ffff88005f0e7cc0  EFLAGS: 00010282
kernel: RIP: 0010:[<ffffffffa0081736>]  [<ffffffffa0081736>] btrfs_check_repairable+0xf6/0x100 [btrfs]
kernel: task: ffff88001b5c4740 ti: ffff88005f0e4000 task.ti: ffff88005f0e4000
kernel: Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
kernel: CPU: 1 PID: 3136 Comm: kworker/u8:3 Tainted: G           O    4.5.3-1-ARCH #1
kernel: invalid opcode: 0000 [#1] PREEMPT SMP 
kernel: kernel BUG at fs/btrfs/extent_io.c:2309!
kernel: ------------[ cut here ]------------
kernel: BTRFS info (device sdc1): csum failed ino 73072 extent 1531717287936 csum 3335082470 wanted 3200325796 mirror 0
kernel: ata3: EH complete
kernel: BTRFS error (device sdc1): bdev /dev/sda3 errs: wr 0, rd 18, flush 0, corrupt 0, gen 0
kernel: blk_update_request: I/O error, dev sda, sector 2991635296

I'd like to remove the faulty drive from the RAID1 array, going back to no redundancy on a single drive. Unfortunately, there seems to be a lack of documentation on how to do this.

I am aware that one can run the following:

sudo btrfs balance start -dconvert=single /media

to convert the data profile to single mode, but I'm unsure WHERE the data will end up. Since one of the drives is failing, I want to be sure that BTRFS doesn't dutifully erase all the data on the good drive and place the single copy on the bad one. Instead, I'd like to act as if the other drive never existed (that is, convert back to my old setup).
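
For reference, I've been checking where the existing chunks live with the commands below; as far as I can tell they only report the current allocation per device, not where a balance will put new chunks (assuming the filesystem is mounted at /media as above):

sudo btrfs device usage /media
sudo btrfs filesystem usage /media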

This doesn't work:

$ sudo btrfs device delete /dev/sda3 /media
ERROR: error removing device '/dev/sda3': unable to go below two devices on raid1
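
I assume it refuses because the data and metadata profiles are still raid1, which requires at least two devices. The current profiles can be checked with

sudo btrfs filesystem df /media

which lists each block group type together with its profile (e.g. "Data, RAID1").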

What am I to do? Help would be greatly appreciated.

TL;DR: started with 1 drive in BTRFS single, added another drive, made it RAID1, other drive is now erroring, how do I return to just one drive (SPECIFICALLY the known good one) with single?


Solution 1:

Alright, I figured it out with the help of this Trello link. In case anyone else wants to do this, here's the procedure.

Procedure

Starting from a RAID1 array of two disks, /dev/sda (faulty) and /dev/sdc (known good):

  1. Disable auto-mounting of this array in /etc/fstab, then reboot. Basically, we want btrfs to forget this array exists, as there's a bug where it will still try to use one of the drives even after it's unplugged.
  2. Now that your array is unmounted, execute:

    echo 1 | sudo tee /sys/block/sda/device/delete

    replacing sda with the faulty device name. This causes the disk to spin down (you should verify this in dmesg) and become inaccessible to the kernel.

    Alternatively: just take the drive out of the computer before booting! I didn't opt for this method, as the above works fine for me.

  3. Mount your array with the -o degraded option.
  4. Begin a rebalancing operation with sudo btrfs balance start -f -mconvert=single -dconvert=single /mountpoint. This will reorganise the extents on the known-good drive, converting them to single (non-RAID). It can take a long time (almost a day in my case), depending on the speed of your drive and the size of your array (mine had ~700 GiB and rebalanced at a rate of roughly one 1GiB chunk per minute). Luckily, this operation can be paused, and the array stays online while it runs; see the commands sketched after this list.
  5. Once this is done, you can issue sudo btrfs device remove missing /mountpoint to remove the 'missing' faulty device.
  6. Begin a second rebalance with sudo btrfs balance start -mconvert=dup /mountpoint to restore metadata redundancy. This takes a few minutes on my system.
  7. You're done! Your array is now single mode, with all redundancy removed.
  8. Take your faulty drive outside, and beat it with a hammer.
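
A few commands that may help while the step-4 rebalance is running, and for checking the end result (a sketch, assuming the filesystem is mounted at /mountpoint as in the steps above):

    sudo btrfs balance status /mountpoint
    sudo btrfs balance pause /mountpoint
    sudo btrfs balance resume /mountpoint
    sudo btrfs filesystem usage /mountpoint

balance status reports how many chunks have been considered and relocated so far, and pause/resume do what they say. Once steps 5 and 6 are done, filesystem usage should list only the good drive, with Data in the single profile and Metadata in DUP.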

Troubleshooting

  • Help, btrfs tried to write to my faulty disk, errored out, and forced it readonly!
    • Did you follow step 1, and reboot before continuing? It's likely that btrfs still thinks the drive you spun down is present. Rebooting will cause btrfs to forget any errors, and will let you continue.
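  • To keep an eye on the error counters btrfs maintains for each device (and zero them once the bad drive is gone), the following print and then reset the persistent per-device counters; note these are separate from the in-memory state that the reboot clears:

    sudo btrfs device stats /mountpoint
    sudo btrfs device stats -z /mountpoint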

Solution 2:

Thanks for your post. I had this idea that I could test out RAID, pop the drive out of my hotswap bay, use another drive, and then pop the RAID drive back in. In retrospect, this was a bad idea, and now I need my hotswap bay.

Here's what I found. As root:

# sudo btrfs fi show
Label: 'disk'  uuid: 12817aeb-d303-4815-8bba-a3440e36c62c
Total devices 2 FS bytes used 803.10GiB
    devid    1 size 931.51GiB used 805.03GiB path /dev/sda1
    devid    2 size 931.51GiB used 805.03GiB path /dev/sdb1

Note the devid listed for each drive. The man page for btrfs balance led me to the devid option; it took a couple of tries to figure out how the filters work (I initially tried devid=/dev/sdb1). So your first attempt is going to look something like this:

# btrfs balance start -dconvert=single,devid=2 -mconvert=single,devid=2 /mnt

This gave me an error:

ERROR: error during balancing '/media/.media': Invalid argument
There may be more info in syslog - try dmesg | tail    

Here's the error from dmesg:

BTRFS error (device sdb1): balance will reduce metadata integrity, use force if you want this

So this is the final command that worked:

# btrfs balance start -f -dconvert=single,devid=2 -mconvert=single,devid=2 /mnt
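
To confirm where things ended up, btrfs fi df lists the profile for each block group type and btrfs device usage shows the per-device allocation; if anything still reports RAID1, another balance pass without the devid filter should presumably pick it up:

# btrfs fi df /mnt
# btrfs device usage /mnt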

Hopefully this helps someone else out.