How to recover an mdadm array on Synology NAS with drive in "E" state?

Just an addition to the solution, which I found after experiencing the same issue. I followed dSebastien's blog post on how to re-create the array.

I found that that method of recreating the array worked better than the method above. However, after re-creating the array, the volume was still not showing in the web interface, and none of my LUNs were showing either. It basically showed a new array with nothing configured.

I contacted Synology support, and they remoted in to fix the issue. Unfortunately, they did so while I was away from the console, but I did manage to capture the session and looked through what they did. While I was trying to recover some of my data, the drive crashed again and I was back in the same situation. I recreated the array as in dSebastien's blog post and then went through the Synology session to repeat their fix. After running the commands below, my array and LUNs appeared in the web interface, and I was able to work with them.

I have practically zero experience with Linux, but these were the commands I ran in my situation. I hope this helps someone else, but please use it at your own risk. It would be best to contact Synology support and get them to fix this for you, as your situation might be different from mine.

DiskStation> synocheckiscsitrg
synocheckiscsitrg: Pass 

DiskStation> synocheckshare
synocheckshare: Pass SYNOICheckShare()
synocheckshare: Pass SYNOICheckShareExt()
synocheckshare: Pass SYNOICheckServiceLink()
synocheckshare: Pass SYNOICheckAutoDecrypt()
synocheckshare: Pass SYNOIServiceShareEnableDefaultDS()

DiskStation> spacetool --synoblock-enum
****** Syno-Block of /dev/sda ******
//I've removed the output. This should display info about each disk in your array

DiskStation> vgchange -ay
  # logical volume(s) in volume group "vg1" now active

DiskStation> dd if=/dev/vg1/syno_vg_reserved_area of=/root/reserved_area.img
24576+0 records in
24576+0 records out

DiskStation> synospace --map_file -d
Success to dump space info into '/etc/space,/tmp/space'

DiskStation> synocheckshare
synocheckshare: Pass SYNOICheckShare()
synocheckshare: Pass SYNOICheckShareExt()
synocheckshare: Pass SYNOICheckServiceLink()
synocheckshare: Pass SYNOICheckAutoDecrypt()
synocheckshare: Pass SYNOIServiceShareEnableDefaultDS()

DiskStation> synocheckiscsitrg
synocheckiscsitrg: Not Pass, # conflict 

DiskStation> synocheckiscsitrg
synocheckiscsitrg: Pass

Another addition: I've hit a very similar issue with my one-disk / RAID level 0 device.

Synology support was very helpful and restored my device. Here's what happened; I hope it helps others.

My disk had read errors on one particular block. The messages in the system log (dmesg) were:

[4421039.097278] ata1.00: read unc at 105370360
[4421039.101579] lba 105370360 start 9437184 end 5860528064
[4421039.106917] sda3 auto_remap 0
[4421039.110097] ata1.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6
[4421039.116744] ata1.00: edma_err_cause=00000084 pp_flags=00000003, dev error, EDMA self-disable
[4421039.125410] ata1.00: failed command: READ FPDMA QUEUED
[4421039.130767] ata1.00: cmd 60/00:08:b8:d2:47/02:00:06:00:00/40 tag 1 ncq 262144 in
[4421039.130772]          res 41/40:00:f8:d2:47/00:00:06:00:00/40 Emask 0x409 (media error) <F>
[4421039.146855] ata1.00: status: { DRDY ERR }
[4421039.151064] ata1.00: error: { UNC }
[4421039.154758] ata1: hard resetting link
[4421039.667234] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
[4421039.887286] ata1.00: configured for UDMA/133
[4421039.891777] ata1: UNC RTF LBA Restored
[4421039.895745] ata1: EH complete
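
To see whether it is always the same block that fails, the failing LBAs can be pulled out of the kernel log. This is only a minimal sketch: the function name failed_lbas is made up for illustration, and it assumes the "read unc at <LBA>" message format shown in the log above, which may differ on other kernels or DSM versions.

```shell
# Read kernel log text on stdin and print each distinct LBA that appears
# in a "read unc at <LBA>" message (format assumed from the log above).
failed_lbas() {
    sed -n 's/.*read unc at \([0-9][0-9]*\).*/\1/p' | sort -n | uniq
}

# Usage on the NAS (not executed here):
#   dmesg | failed_lbas
```

If this prints a single number, the errors all point at one block, as in my case.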

A few seconds later I received the dreaded "Volume 1 has crashed" mail from my device.

-- Disclaimer: Be sure to replace the device names with yours, and do not simply copy and paste these commands, as this might make things worse! --

After stopping smb I was able to re-mount the partition read only and run e2fsck with a bad-blocks check (-c):

umount /dev/md2
e2fsck -C 0 -v -f -c /dev/md2

(One could also use e2fsck -C 0 -p -v -f -c /dev/md2 to run as unattended as possible. This didn't work in my case, because the errors had to be fixed manually, so I had to restart e2fsck. Conclusion: -p doesn't make much sense in case of a disk error.)

Although e2fsck was able to fix the errors, and smartctl showed no further increase in Raw_Read_Error_Rate, the device still wouldn't mount the volume in read-write mode. DSM still showed "volume crashed".

So I opened a ticket with support. It took quite a while to get things going at first, but in the end they fixed it by rebuilding the RAID array with:
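
To compare Raw_Read_Error_Rate before and after a check, something along these lines can pull the raw value out of the smartctl attribute table. It is a rough sketch: the helper name raw_read_error_rate is hypothetical, and it assumes the usual ten-column layout of smartctl -A output, with the raw value in the tenth column.

```shell
# Read `smartctl -A` output on stdin and print the raw value (10th column)
# of the Raw_Read_Error_Rate attribute. Column layout is an assumption.
raw_read_error_rate() {
    awk '$2 == "Raw_Read_Error_Rate" { print $10 }'
}

# Usage on the NAS (not executed here; replace /dev/sda with your disk):
#   smartctl -A /dev/sda | raw_read_error_rate
```

Running it again after some time shows whether the counter is still going up.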

synospace --stop-all-spaces
syno_poweroff_task -d 
mdadm -Sf /dev/md2
mdadm -AfR /dev/md2 /dev/sda3

Be sure to check your device names (/dev/mdX and /dev/sdaX) before doing anything. cat /proc/mdstat will show the relevant information.
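
As a quick way to read the device names off /proc/mdstat, the sketch below lists each md array together with its member partitions. The function name md_members is made up here, and it assumes the standard mdstat line format ("md2 : active raid1 sda3[0]"); double-check the result against the raw file before running any mdadm command.

```shell
# Read /proc/mdstat text on stdin and print one line per array:
#   /dev/mdX: /dev/<member> /dev/<member> ...
# Members are the fields that look like "sda3[0]"; any suffix after the
# partition name (index, state flags) is stripped.
md_members() {
    awk '/^md[0-9]+ :/ {
        printf "/dev/%s:", $1
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\[[0-9]+\]/) {
                m = $i
                sub(/\[.*/, "", m)
                printf " /dev/%s", m
            }
        }
        print ""
    }'
}

# Usage on the NAS (not executed here):
#   md_members < /proc/mdstat
```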
