Synology - BTRFS volume crashed & recovered. Why did it crash?

Solution 1:

I visited the WebGUI. There were no errors at all in the system other than the volume being offline.

The SMART status of each disk was checked and seemed OK
"Storage Manager" showed "Healthy"
"Storage Pool" showed "Healthy"

Only "volume" showed crashed.

I SSH'ed in and checked. mdadm --D /dev/md2 which is where my array was. It was showing State: Clean, Degraded

I checked dmesg, and found this:

[5638907.327288] ------------[ cut here ]------------
[5638907.332247] WARNING: CPU: 3 PID: 10234 at fs/btrfs/extent-tree.c:4207 btrfs_write_dirty_block_groups+0x365/0x390 [btrfs]()
[5638907.343601] BTRFS: Transaction aborted (error -2)
[5638907.343603] Modules linked in: nfsd exportfs rpcsec_gss_krb5 cifs udf isofs loop tcm_loop(O) iscsi_target_mod(O) target_core_ep(O) target_core_multi_file(O) target_core_file(O) target_core_iblock(O) target_core_mod(O) syno_extent_pool(PO) rodsp_ep(O) hid_generic usbhid hid usblp usb_storage denverton_synobios(PO) overlay exfat(O) btrfs synoacl_vfs(PO) hfsplus md4 hmac bnx2x(O) mdio mlx5_core(O) mlx4_en(O) mlx4_core(O) mlx_compat(O) qede(O) qed(O) atlantic(O) tn40xx(O) i40e(O) ixgbe(O) be2net(O) igb(O) i2c_algo_bit e1000e(O) vxlan ip6_udp_tunnel udp_tunnel fuse vfat fat crc32c_intel aesni_intel glue_helper lrw gf128mul ablk_helper arc4 cryptd ecryptfs sha256_generic ecb aes_x86_64 authenc des_generic ansi_cprng cts md5 cbc cpufreq_powersave cpufreq_performance acpi_cpufreq processor cpufreq_stats
[5638907.425092]  dm_snapshot dm_bufio crc_itu_t crc_ccitt quota_v2 quota_tree psnap p8022 llc sit tunnel4 ip_tunnel ipv6 zram sg etxhci_hcd xhci_pci xhci_hcd uhci_hcd ehci_pci ehci_hcd usbcore usb_common [last unloaded: denverton_synobios]
[5638907.448308] CPU: 3 PID: 10234 Comm: btrfs-transacti Tainted: P           O    4.4.59+ #25556
[5638907.457047] Hardware name: Synology Inc. RS2818RP+/Type2 - Board Product Name1, BIOS M.212 2019/11/01
[5638907.466571]  0000000000000000 ffff880068a0fc50 ffffffff812bf70d ffff880068a0fc98
[5638907.474851]  ffffffffa0939b8d ffff880068a0fc88 ffffffff8104b7cd ffff8801704f9e00
[5638907.483132]  ffff88003b616338 0000000000000001 00000000fffffffe ffff8801704f9f50
[5638907.491422] Call Trace:
[5638907.494179]  [<ffffffff812bf70d>] dump_stack+0x4d/0x70
[5638907.499623]  [<ffffffff8104b7cd>] warn_slowpath_common+0x7d/0xc0
[5638907.505932]  [<ffffffff8104b859>] warn_slowpath_fmt+0x49/0x50
[5638907.511996]  [<ffffffffa0892d45>] btrfs_write_dirty_block_groups+0x365/0x390 [btrfs]
[5638907.520056]  [<ffffffffa0932df8>] commit_cowonly_roots+0x230/0x2d1 [btrfs]
[5638907.527250]  [<ffffffffa08a90e8>] btrfs_commit_transaction+0x528/0xcb0 [btrfs]
[5638907.534793]  [<ffffffffa08a9905>] ? start_transaction+0x95/0x3d0 [btrfs]
[5638907.541810]  [<ffffffffa08a387c>] transaction_kthread+0x1ec/0x220 [btrfs]
[5638907.548915]  [<ffffffffa08a3690>] ? btrfs_cleanup_transaction+0x510/0x510 [btrfs]
[5638907.556701]  [<ffffffff810672a6>] kthread+0xc6/0xe0
[5638907.561883]  [<ffffffff810671e0>] ? kthread_create_on_node+0x180/0x180
[5638907.568717]  [<ffffffff81567abf>] ret_from_fork+0x3f/0x80
[5638907.574423]  [<ffffffff810671e0>] ? kthread_create_on_node+0x180/0x180
[5638907.581346] ---[ end trace 27185b26c2db1370 ]---
[5638907.586280] BTRFS: error (device md2) in btrfs_write_dirty_block_groups:4207: errno=-2 No such entry
[5638907.595721] BTRFS info (device md2): forced readonly
[5638907.600997] BTRFS warning (device md2): Skipping commit of aborted transaction.
[5638907.608618] BTRFS: error (device md2) in cleanup_transaction:2019: errno=-2 No such entry
[5638907.617108] BTRFS info (device md2): delayed_refs has NO entry

So the data was there, and the array was in read only. My research took me to this SuSe KB: https://www.suse.com/support/kb/doc/?id=000018769

I references the same error as I got BTRFS: Transaction aborted (error -2).

The article states

Good thing is, it's a WARNING, not a fatal error. WARNINGs like this one, e.g. regarding quota, typically are runtime only things that are fixed by BTRFS after the WARNING is issued. Not a bad problem.

Which was somewhat reassuring.

I ran a

syno_poweroff_task -d

In order to shutdown all the Synology services that may be accessing the volume. This stops the WebUI etc., but keeps SSH on.

I then did a

umount /volume1

In order to stop I/O to the volume (although it was already in RO mode per the output of dmesg above. I then did a btrfsck on the md2 device. Output below

# btrfsck /dev/md2
Syno caseless feature on.
Checking filesystem on /dev/md2
UUID: 7a29febb-e9b5-4f77-afd7-4e1e10971340
checking extents
checking free space tree
checking fs roots
checking csums
checking root refs
found 30769182859264 bytes used err is 0
total csum bytes: 44052
total tree bytes: 50331648
total fs tree bytes: 24477696
total extent tree bytes: 23691264
btree space waste bytes: 1190925
file data blocks allocated: 30769149861888
referenced 30769131089920

And in dmesg:

[5644451.646580] BTRFS info (device md127): using free space tree
[5644451.652561] BTRFS info (device md127): has skinny extents
[5644459.827213] BTRFS info (device md127): checking UUID tree

Once that completed, I simply rebooted the system and everything came up as per normal.

I'm currently running a volume consistency check" on it which I believe is a mdadm "resync" that's been running for almost 24 hours now with no issues.

I guess the goal of making this post is to figure out if anyone with a bit more knowhow on btrfs has experienced this, and if they have any ideas what caused this?

Synology - BTRFS volume crashed & recovered. Why did it crash?

Solution 1:

Related

Recent Posts