fsck: Options for most thorough disk check

I've got a disk with a known problem (I know because dd gags when I try to clone it). But when I boot with a live CD and run fsck on the unmounted partition, I get this:

ubuntu@ubuntu:~$ sudo fsck /dev/sdf1
fsck 1.41.4 (27-Jan-2009)
e2fsck 1.41.4 (27-Jan-2009)
/dev/sdf1: clean, 227091/9625600 files, 12789815/38497756 blocks

a millisecond later. It's hard to believe it's checked out the entire hard drive in a ms.

I'm also not certain whether I should be fsck'ing sdf1 or the entire physical disk sdf. When I try the entire drive:

ubuntu@ubuntu:~$ sudo fsck /dev/sdf
fsck 1.41.4 (27-Jan-2009)
e2fsck 1.41.4 (27-Jan-2009)
fsck.ext2: Device or resource busy while trying to open /dev/sdf
Filesystem mounted or opened exclusively by another program?

Which I don't understand because none of the partitions appear to be mounted (I just booted from a live CD and ran the command).

So my basic question is: How can I get fsck (or a different tool that might work better) to spend more than a millisecond analyzing my problem disk?

First off, you're right to run fsck on the partition: fsck works on filesystems, not on entire disks. You can list all partitions on the disk with fdisk -l /dev/sdf.

Your filesystem type is probably ext3 (the default in most Linux distros), which means it will usually pass an fsck almost instantly as long as its journal is clean. fsck -f forces a full check regardless.
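To see the difference without touching the problem disk, the two invocations can be compared on a throwaway image. This is only a sketch: it assumes mke2fs and e2fsck (from e2fsprogs) are on the PATH, and scratch.img is a hypothetical stand-in for /dev/sdf1. It skips itself if the tools are missing.

```shell
#!/bin/sh
# Sketch: plain check vs. forced check on a scratch ext2 image.
# On the real partition the forced check would be: sudo fsck -f /dev/sdf1
set -eu
result=skipped
if command -v mke2fs >/dev/null 2>&1 && command -v e2fsck >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    img="$workdir/scratch.img"      # hypothetical stand-in for /dev/sdf1
    dd if=/dev/zero of="$img" bs=1024 count=8192 2>/dev/null
    mke2fs -q -F "$img"             # -F: accept a regular file as the target

    echo "--- without -f ---"
    e2fsck -n "$img"                # filesystem is marked clean, so this returns at once
    echo "--- with -f ---"
    e2fsck -f -n "$img"             # runs every pass; -n opens read-only and answers "no"
    result=checked
    rm -rf "$workdir"
fi
echo "result: $result"
```

The -n flag keeps the check read-only, which is a reasonable first step on a disk you suspect is failing.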

However, if you have read errors on the disk, no amount of fsck will help dd, since dd copies raw blocks and doesn't care about the content of the disk.

To make dd keep going past read errors, use dd conv=noerror,sync. noerror tells dd to continue after a failed read, and sync pads each short or failed input block with null bytes, so the data after a bad spot stays at the same offset in the copy.
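The padding behaviour is easy to check on a scratch file before pointing dd at the real disk. A minimal sketch, where source.img and clone.img are hypothetical stand-ins for the partition and the backup file, and bs=512 is just an example block size:

```shell
#!/bin/sh
# Sketch: show how conv=noerror,sync pads blocks, using a scratch file.
# On the real disk the command would look like:
#   dd if=/dev/sdf1 of=backup.img bs=512 conv=noerror,sync
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/source.img"   # hypothetical stand-in for /dev/sdf1
dst="$workdir/clone.img"    # hypothetical stand-in for the backup file

# 1000 bytes: deliberately not a multiple of the 512-byte block size.
head -c 1000 /dev/urandom > "$src"

# noerror: keep going after a read error; sync: pad every input block to bs with NULs.
dd if="$src" of="$dst" bs=512 conv=noerror,sync 2>/dev/null

src_size=$(( $(wc -c < "$src") ))
dst_size=$(( $(wc -c < "$dst") ))
echo "source: $src_size bytes, clone: $dst_size bytes"
```

The clone comes out at 1024 bytes because the short final block is padded to a full 512. A block that fails to read on a damaged disk is padded the same way. A small bs loses less data per bad spot but makes the copy slower.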

After you have finished the backup, you should run fsck -f on the clone to get it up and running again.

Another tip: If you back up the partition to a file, you can loopback mount it with mount -o loop filename.ext3 /mountpoint. Also, say you are cloning a 200G partition to a 500G drive: once the copy is done, you can run resize2fs /dev/sdx1 (where sdx is your new drive, partitioned with a single 500G partition), and the filesystem will be grown to fill the 500G partition.
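The resize step can be rehearsed on an image file first, since resize2fs also accepts a regular file. A rough sketch, assuming e2fsprogs (mke2fs, e2fsck, resize2fs, dumpe2fs) is installed; scratch.img is a hypothetical stand-in for the cloned partition and the 8 MB and 16 MB sizes are arbitrary:

```shell
#!/bin/sh
# Sketch: grow an ext2 filesystem after its container has been enlarged.
# On the real clone the last two steps would be:
#   e2fsck -f /dev/sdx1 && resize2fs /dev/sdx1
set -eu
result=skipped
blocks_after=0
if command -v mke2fs >/dev/null 2>&1 && command -v e2fsck >/dev/null 2>&1 \
   && command -v resize2fs >/dev/null 2>&1 && command -v dumpe2fs >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    img="$workdir/scratch.img"      # hypothetical stand-in for the cloned partition
    dd if=/dev/zero of="$img" bs=1024 count=8192 2>/dev/null
    mke2fs -q -F -b 1024 "$img"     # 8192 blocks of 1 KiB

    # Enlarge the container, like copying onto a bigger partition.
    dd if=/dev/zero bs=1024 count=8192 2>/dev/null >> "$img"

    e2fsck -f -p "$img"             # forced check before resizing
    resize2fs "$img"                # no size given: grow to fill the container
    blocks_after=$(dumpe2fs -h "$img" 2>/dev/null | awk '/^Block count:/ {print $3}')
    result=resized
    rm -rf "$workdir"
fi
echo "result: $result, blocks after resize: $blocks_after"
```

With no size argument, resize2fs grows the filesystem to the size of the underlying device or file, which is what you want after cloning onto a larger partition.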

Lastly, if the disk is in such bad shape that it's giving you read errors, I would advise you to avoid turning it off and on until you're finished recovering data. In some failure modes, the disk will at some point simply stop spinning up or stop being recognized by the OS, and at that point getting data out of the drive becomes quite expensive.

This may not be relevant in your case, but thought I'd mention it anyway:

For a lower-level disk check, you could use the badblocks utility. It goes through a device and reports any bad blocks (it cannot repair anything, of course). It's useful, at least, for verifying whether a disk is physically damaged.

Also, e2fsck can use badblocks to keep a filesystem from using bad blocks. From the e2fsck manual:

  -c     This option causes e2fsck to use badblocks(8) program to do a  read-
         only scan of the device in order to find any bad blocks.  If any bad
         blocks are found, they are added to the bad block inode  to  prevent
         them from being allocated to a file or directory.  If this option is
         specified twice, then the bad block scan will be done using  a  non-
         destructive read-write test.

You want the -f option to fsck (Force checking even if the file system seems clean.)

You should run fsck in single user mode. One easy way to do this without booting a live CD is to reboot with shutdown's -F option, which forces an fsck during the next boot.

shutdown -rF now
