ddrescue from a drive that intermittently cuts out: ways to automate power cycling via software?

I have a failed Windows/NTFS drive. I was able to connect it to another PC via a USB-to-IDE cable and recover most of the drive with a Knoppix 7.2 LiveUSB and ddrescue/ddrutility. I've recovered the most critical data. Now I'm using it as a learning exercise, to see whether I could have made the process easier and just how much I might be able to get back without messing with hardware.

Full narrative below, but I was wondering if anyone has successfully made a script to handle the situation of a drive (USB or other) that cuts out during rescue, either intermittently or for a known reason.

Using ddru_ntfsbitmap I was able to pull the boot sector and the NTFS bitmap to narrow recovery to just the used file space (16GB on a 60GB drive). The command used was:

ddru_ntfsbitmap -i 32256 -m MFTDomainFile.txt /dev/sdc filelocations.txt

(The -i value is 63*512 = 32256. Because the drive won't mount, I couldn't get the start sector of 63 from fdisk, so I had to guess until ddru_ntfsbitmap told me it could find the boot sector. Apparently the start sector is usually 63 or 2048.)
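
The offset arithmetic above is easy to get wrong by hand, so here is a minimal sketch that computes the -i value for the two common start sectors. It assumes 512-byte logical sectors, as in the calculation above; sector_to_offset is just an illustrative helper name.

```shell
# Convert a partition start sector to the byte offset expected by -i.
# Assumes 512-byte logical sectors, as in the 63*512 calculation above.
sector_to_offset() {
    echo $(( $1 * 512 ))
}

# The two common start sectors mentioned above.
for start in 63 2048; do
    echo "start sector $start -> offset $(sector_to_offset "$start")"
done
```

Each printed offset can then be tried in turn as the -i argument until the boot sector is found.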

This drive cuts out frequently, and there is one section of the drive (the first 950MB) where it will cut out after a single sector read error. Continuing requires pulling the USB-to-IDE cable and re-plugging it to have the drive show back up in /dev. On this PC (or maybe it has to do with Knoppix), if the drive cuts out, ddrescue continues and marks every additional read attempt as an error, making it difficult to keep track of 'real' sector read problems. (On another old PC with Ubuntu it would somehow detect this and terminate ddrescue, a nice feature, but I don't know what was responsible for the different behavior.) With a few starts and restarts I was able to use ddrescue to read/copy large sections at a time and get the majority of the disk (>95% of the better-behaved section). Using the -i and -s options I could work around and limit the impact of the 'mark everything as bad' problem.

Generally, the command I used was something like:

ddrescue -S -m filelocations.txt /dev/sdc HDImage.img HDRecoverlog.txt -r2 -d -i5GB -s1GB

If it cut out, it would only mark that 1GB section as bad, and I could retry by adding -AM so it would block-copy again, or just run with -r5. Copy speed didn't seem to matter much in the slow sections, and the drive would cut out more frequently when doing block copies.
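
Stepping the -i/-s window by hand gets tedious, so the idea can be sketched as a loop. This is only a sketch under assumptions: DDRESCUE, DEV, device_present and rescue_windows are illustrative names, and it assumes the device node disappears from /dev when the drive cuts out, as described above. The loop stops at the first window where the drive is missing, so a dropout spoils at most the window being read.

```shell
# Run ddrescue over consecutive 1 GB windows so that a drive dropout
# only affects the window being read. Set DDRESCUE=echo for a dry run.
DDRESCUE=${DDRESCUE:-ddrescue}
DEV=${DEV:-/dev/sdc}

# True while the block device node exists.
device_present() { [ -b "$DEV" ]; }

rescue_windows() {
    # $1 = first window start in GB, $2 = end (exclusive) in GB
    gb=$1
    while [ "$gb" -lt "$2" ]; do
        # Stop once the device is gone; further reads would only be
        # logged as errors.
        if ! device_present; then
            echo "device missing before ${gb}GB window" >&2
            return 1
        fi
        "$DDRESCUE" -S -m filelocations.txt "$DEV" HDImage.img \
            HDRecoverlog.txt -r2 -d -i"${gb}GB" -s1GB
        gb=$((gb + 1))
    done
}
```

For example, rescue_windows 5 8 would cover the 5GB, 6GB and 7GB windows. After a dropout, re-plug the drive and call it again from the window that failed.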

After getting all large untried sections, allowing ddrescue to run overnight on the more stable part of the drive picked up most sector errors in that section. ddru_ntfsfindbad (to find which files had errors) reported 20 sector errors in $MFT, so I re-ran overnight using:

ddrescue -S -m MFTDomainFile.txt /dev/sdc HDImage.img HDRecoverlog.txt -r-1 -d

This picked up all $MFT errors, so I ran ddru_ntfsfindbad to find the files that still had errors in the rest of the drive:

ddru_ntfsfindbad -i 32256 -DD HDImage.img HDRecoverlog.txt

(The -DD option produces a debug logfile with sector locations for each inode, which can be combined with the normal ntfsfindbad.log to locate every file with an error.)

Just letting ddrescue run on the stable part over a full weekend, it picked up all but two sector errors in that section (from 1112 errors on Friday). Rerunning ddru_ntfsfindbad produced a much smaller file error list.
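
Progress like this can be tracked from the ddrescue log file itself. The sketch below assumes the usual log layout: comment lines starting with #, one status line, then data lines of "position size status" in hex, with '-' marking bad-sector blocks. bad_summary is an illustrative helper name. Note that it counts blocks, and a run of adjacent bad sectors is stored as one block.

```shell
# Sum the blocks marked '-' (bad sector) in a ddrescue log file.
bad_summary() {
    count=0
    bytes=0
    while read -r pos size status _; do
        # Skip comment lines.
        case $pos in '#'*) continue ;; esac
        if [ "$status" = "-" ]; then
            count=$((count + 1))
            bytes=$((bytes + $size))   # size is hex, e.g. 0x00000200
        fi
    done < "$1"
    echo "$count bad areas, $bytes bytes"
}
```

Running bad_summary HDRecoverlog.txt before and after an overnight pass gives a quick before/after comparison.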

For the difficult front part of the drive, there was still ~150MB marked as 'bad'. Much of it had simply been skipped or never tried, but was marked as bad when the drive cut out. Using the ntfsfindbad log I could manually target a file with the following (apparently well-known) painful process:

  1. Plug in the USB connector.
  2. Issue the ddrescue command after spin-up.
  3. Hit CTRL-C if I hear it catch on a bad sector; otherwise it loses its location as mentioned above.
    • With CTRL-C it stops at the next sector and will start there next time.
    • The -X option would remove the need for this, but it stays on the problem sector instead of advancing, and would never move past a really bad sector.
  4. Unplug the drive to cut power.
  5. Go to step 1.
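
The five steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested recovery tool: DEV, POLL, MAX_CYCLES, power_cycle, run_watched and rescue_loop are illustrative names; it assumes the device node vanishes from /dev when the drive cuts out; and it assumes ddrescue saves its log and exits on SIGTERM the same way it does on CTRL-C. The power_cycle hook is a manual prompt here and is the piece to replace with whatever can actually cut power on your hardware.

```shell
DEV=${DEV:-/dev/sdc}          # a /dev/disk/by-id/ path survives renumbering
POLL=${POLL:-1}               # seconds between device checks
MAX_CYCLES=${MAX_CYCLES:-50}  # give up after this many power cycles
DROPPED=0                     # set to 1 by run_watched on a dropout

device_present() { [ -b "$DEV" ]; }

# Placeholder for step 4: replace with a real power switch if you have one.
power_cycle() {
    printf 'Unplug and replug the drive, then press Enter: ' >&2
    read -r _
}

# Steps 1-2: wait up to $1 seconds for the device node to appear.
wait_for_device() {
    waited=0
    until device_present; do
        [ "$waited" -ge "$1" ] && return 1
        sleep "$POLL"
        waited=$((waited + POLL))
    done
}

# Step 3: run the given command and stop it if the device vanishes.
run_watched() {
    DROPPED=0
    "$@" &
    pid=$!
    while kill -0 "$pid" 2>/dev/null; do
        if ! device_present; then
            DROPPED=1
            kill -TERM "$pid" 2>/dev/null   # scripted stand-in for CTRL-C
            break
        fi
        sleep "$POLL"
    done
    rc=0
    wait "$pid" || rc=$?
    return "$rc"
}

# Steps 1-5 as a loop: finish when a run ends with the drive still up.
rescue_loop() {
    cycles=0
    while [ "$cycles" -lt "$MAX_CYCLES" ]; do
        wait_for_device 60 || return 1
        status=0
        run_watched "$@" || status=$?
        [ "$DROPPED" -eq 1 ] || return "$status"
        power_cycle
        cycles=$((cycles + 1))
    done
    return 1
}
```

A call might look like rescue_loop ddrescue -S -m filelocations.txt "$DEV" HDImage.img HDRecoverlog.txt -r2 -d -i5GB -s1GB. With one-second polling a few reads can still be logged as errors before ddrescue is stopped, so keeping the -i/-s window small still helps.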

I was able to completely recover most files of interest this way. When manually targeting or splitting large untried sections, it might pull a whole section without error, but sometimes I would have to retry a particular error 10-20 times. Using this process, all but ~150kB of errors cleaned up 'quickly'. Since then, manual retrying has brought the total down to ~55 single-sector errors and 31kB. Based on the behavior of the rest of the drive, I wouldn't be surprised if I could scrape all but a few truly bad sectors, and possibly get the whole image working after replacing a handful of files (of course Windows/system32 is sitting in that section).

I realize this amounts to a rather fortunate (albeit rare) case where data is this recoverable. But the last disk I rescued had a very similar "drive cuts out on a single read error" problem. I've read a number of posts about how to try to reset or power cycle the disk and/or USB port. From what I've seen the ability to cut USB power is highly dependent on both hardware and Linux kernel. I also realize there are hardware & service solutions to help with this that would be worthwhile if the cost/benefit picture was different.
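
One of the software-only resets in that category is the USB "authorized" attribute in sysfs: writing 0 makes the kernel drop the device and writing 1 makes it enumerate the device again. The sketch below assumes a kernel that exposes that attribute and a port name such as 1-2, which is hypothetical here; usb_reauthorize and USB_SYSFS are illustrative names. It needs root, and it does not cut power to the port, so it may well not revive a drive that has locked up.

```shell
# USB_SYSFS is overridable only so the function can be tried on a scratch dir.
USB_SYSFS=${USB_SYSFS:-/sys/bus/usb/devices}

# Logically disconnect and reconnect the device on the given port,
# e.g. usb_reauthorize 1-2 (hypothetical port name; needs root).
# Optional second argument: seconds to stay disconnected (default 2).
usb_reauthorize() {
    attr="$USB_SYSFS/$1/authorized"
    [ -w "$attr" ] || { echo "cannot write $attr" >&2; return 1; }
    echo 0 > "$attr"      # kernel drops the device
    sleep "${2:-2}"
    echo 1 > "$attr"      # kernel enumerates the device again
}
```

The port name for a disk can usually be read from the output of readlink -f /sys/block/sdc, which contains path components such as 1-2.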

So, is there a better way to handle the 5-step process above? Something like that might also have made the initial data scrape in the 'good' section faster (less manual). Was there a better way (still DIY) to tackle the whole process?


Solution 1:

Setting the TLER (SCT Error Recovery Control) to 1 second fixed the power cycle issue for me. The values are given in tenths of a second, so 10 means 1 second:

smartctl -l scterc,10,10 /dev/sdX
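
Because scterc takes tenths of a second, a small helper can avoid unit mistakes when trying other timeouts. This is a sketch; scterc_cmd is an illustrative name, and it only prints the command rather than running it. On many drives the setting does not survive a power cycle, so it may need to be reapplied each time the drive comes back.

```shell
# Build the smartctl invocation for a timeout given in whole seconds.
# scterc takes tenths of a second, so 1 s becomes 10.
scterc_cmd() {
    tenths=$(( $1 * 10 ))
    echo "smartctl -l scterc,$tenths,$tenths $2"
}

scterc_cmd 1 /dev/sdX    # prints: smartctl -l scterc,10,10 /dev/sdX
```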

Solution 2:

I just had the same issue with a bad SSD in a USB enclosure. I had to unplug and re-plug the USB cable 200-300 times, because the disk's /dev/sd… device disappeared every time ddrescue encountered a bad sector, or merely tried to read something in the vicinity of a bad sector during ddrescue's first-pass "big block" mode.

I did not find a proper solution for myself this time, but I had the idea that people who encounter this more than once in a lifetime would benefit from a USB relay board that can be controlled from the command line, for example using the usbrelay software. The relay would be connected to a specially crafted USB extension cable, in which it would interrupt and reconnect the power wire. That's the same as unplugging and re-plugging the device, and the system should then re-detect it. This allows everything to be automated in software, as the script can now initiate re-plugging the device.
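
A minimal sketch of what the script side could look like, assuming the usbrelay command-line form ID_N=state. The relay ID HURTM_1 is a made-up placeholder (running usbrelay with no arguments lists the real IDs), and the sketch assumes the power wire runs through the relay's normally-closed contact, so switching the relay on cuts power. USBRELAY, RELAY, OFF_SECONDS and relay_power_cycle are illustrative names.

```shell
USBRELAY=${USBRELAY:-usbrelay}
RELAY=${RELAY:-HURTM_1}          # hypothetical ID, replace with your own
OFF_SECONDS=${OFF_SECONDS:-5}    # how long to leave the drive unpowered

# Cut power to the drive, wait, then restore it.
relay_power_cycle() {
    "$USBRELAY" "${RELAY}=1" || return 1   # relay on: power wire open (NC wiring assumed)
    sleep "$OFF_SECONDS"
    "$USBRELAY" "${RELAY}=0"               # relay off: power restored
}
```

A rescue script would call relay_power_cycle whenever the /dev/sd… device disappears, then wait for the device to come back before restarting ddrescue.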