Can a failed Btrfs drive in RAID-1 be replaced live?
I am trying to decide on a filesystem and would like to know if it is possible to replace a failed drive in btrfs RAID without downtime.
Suppose I create a new btrfs filesystem using the command
mkfs.btrfs -d raid1 /dev/sdb /dev/sdc
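(For reference, the fuller sequence I have in mind is something like the following; the device names are just examples, and I believe -m raid1 mirrors the metadata as well, though the multi-device default may already do that.)
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
mount /dev/sdb /mnt
btrfs filesystem show /mnt   # confirm both devices are part of the filesystem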
Now suppose one day /dev/sdc fails. There are two possibilities: it can fail gradually, showing S.M.A.R.T. errors. In that case I can add a new device with
btrfs device add /dev/sde /mnt
btrfs filesystem balance /mnt
and then remove the old one with
btrfs device delete /dev/sdc /mnt
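(As I understand it, newer btrfs-progs spell these commands slightly differently, with the old forms kept as aliases; a rough equivalent:)
btrfs device add /dev/sde /mnt
btrfs balance start /mnt            # newer spelling of 'btrfs filesystem balance'
btrfs device remove /dev/sdc /mnt   # 'remove' is the current name for 'delete'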
But if it suddenly fails, becoming unreadable... A web search says that in this situation I must first unmount the filesystem, mount it in degraded mode, add a new device, and then remove the missing device:
umount /mnt
mount -o degraded /dev/sdb /mnt
btrfs device add /dev/sdf /mnt
btrfs device delete missing /mnt
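(If I understand the tooling correctly, you can confirm which device is actually gone with something like:)
btrfs filesystem show /mnt   # the failed drive should be reported as missing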
Unmounting is obviously a disruptive operation, so there would be downtime: any application using the filesystem would get I/O errors. But these kinds of "tutorials" on btrfs look outdated, considering that btrfs is under heavy development.
The question is: considering the current state of btrfs, is it possible to do this online, i.e. without unmounting?
If not, is there a software-only solution that can fulfill this need?
In Linux 3.8,
btrfs replace mountpoint old_disk new_disk
was added. If you're running a recent kernel, it will provide the functionality you are looking for.
Small correction, the current syntax is:
btrfs replace start OLDDEV NEWDEV MOUNTPOINT
which then runs in the background.
You can check the status with
btrfs replace status MOUNTPOINT
which will show you a continuously updated status of the replace operation.
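For a drive that has failed completely and cannot be read any more, my understanding is that you can pass the device ID of the missing disk (as shown by btrfs filesystem show) instead of the old device path. A rough sketch, assuming the missing disk has devid 2 and /dev/sdf is the replacement:
btrfs filesystem show /mnt               # the missing drive and its devid are listed here
btrfs replace start -r 2 /dev/sdf /mnt   # -r: only read from the source if no healthy mirror exists
btrfs replace status /mnt                # poll the progress of the running replace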