Is this hard disk dead?

I'm not sure this is the right site for this question, but let me try.

Lately I have been having problems with a hard disk. Sometimes it makes a strange sound, and I get this in the logs:

$ dmesg | grep ata4

[29409.945516] ata4.00: exception Emask 0x10 SAct 0xf SErr 0x90202 action 0xe frozen

[29409.945529] ata4.00: irq_stat 0x00400000, PHY RDY changed

[29409.945538] ata4: SError: { RecovComm Persist PHYRdyChg 10B8B }

[29409.945546] ata4.00: failed command: READ FPDMA QUEUED

[29409.945562] ata4.00: cmd 60/30:00:56:22:5f/00:00:00:00:00/40 tag 0 ncq 24576 in

[29409.945573] ata4.00: status: { DRDY }

[29409.945580] ata4.00: failed command: READ FPDMA QUEUED

[29409.945594] ata4.00: cmd 60/18:08:8e:22:5f/00:00:00:00:00/40 tag 1 ncq 12288 in

[29409.945605] ata4.00: status: { DRDY }

[29409.945611] ata4.00: failed command: READ FPDMA QUEUED

[29409.945625] ata4.00: cmd 60/08:10:46:02:66/00:00:00:00:00/40 tag 2 ncq 4096 in

[29409.945635] ata4.00: status: { DRDY }

[29409.945641] ata4.00: failed command: READ FPDMA QUEUED

[29409.945656] ata4.00: cmd 60/80:18:ee:04:66/00:00:00:00:00/40 tag 3 ncq 65536 in

[29409.945666] ata4.00: status: { DRDY }

[29409.945679] ata4: hard resetting link

[29413.976083] ata4: softreset failed (device not ready)

[29413.976097] ata4: applying SB600 PMP SRST workaround and retrying

[29414.148070] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

[29414.184986] ata4.00: SB600 AHCI: limiting to 255 sectors per cmd

[29414.243280] ata4.00: SB600 AHCI: limiting to 255 sectors per cmd

[29414.243292] ata4.00: configured for UDMA/133

[29414.243324] ata4: EH complete

[680674.804563] ata4: exception Emask 0x50 SAct 0x0 SErr 0x90a02 action 0xe frozen

[680674.804575] ata4: irq_stat 0x00400000, PHY RDY changed

[680674.804584] ata4: SError: { RecovComm Persist HostInt PHYRdyChg 10B8B }

[680674.804603] ata4: hard resetting link

[680678.840561] ata4: softreset failed (device not ready)

Is this ata4 SATA hard drive dead? Must I replace it ASAP? Do I need to provide more information?


Solution 1:

Replace your drive immediately, especially if that 'strange sound' is a clicking noise.

Solution 2:

The clicking is a concern, of course, but I found this question while investigating a similar error on our server. In our case the problem was NOT the drive itself, but a faulty NCQ implementation on a WD (Western Digital) drive!

You can read about it at these links to see whether it matches your issue:

  • https://superuser.com/questions/284952/troublesome-hard-drive-in-lvm-is-it-broken
  • http://www.axelog.de/2010/05/9-sata-phyrdychg-exception/
  • https://bugzilla.kernel.org/show_bug.cgi?id=8627

Our error (specifically this part) looked similar enough to yours that it led me to this question :)

ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
ata6.00: irq_stat 0x00400000, PHY RDY changed
ata6: SError: { PHYRdyChg LinkSeq TrStaTrns }
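As an aside, the SErr value in these lines is a bitmask, and the names in braces are just its set bits spelled out. Here is a rough sketch of a decoder, in case it helps when comparing logs. The function name decode_serr is made up, and the bit-to-name table is only what I could infer from the three SErr/SError pairs quoted in this thread, so any other bit is printed by number rather than guessed.

```shell
# Decode an SErr bitmask into the flag names shown in the SError line.
# Assumption: the bit positions below are inferred from the logs quoted in
# this thread only; bits not seen there are printed as "bitN".
# Plain sh has no local variables, so the names used here are global.
decode_serr() {
    value=$(( $1 ))          # accepts hex such as 0x90202
    out=""
    bit=0
    while [ "$bit" -lt 32 ]; do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            case "$bit" in
                1)  name=RecovComm ;;
                9)  name=Persist ;;
                11) name=HostInt ;;
                16) name=PHYRdyChg ;;
                19) name=10B8B ;;
                23) name=LinkSeq ;;
                24) name=TrStaTrns ;;
                *)  name="bit$bit" ;;
            esac
            out="${out:+$out }$name"
        fi
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "$out"
}

decode_serr 0x90202     # first exception in the question
decode_serr 0x90a02     # second exception in the question
decode_serr 0x1810000   # the ata6 line above
```

Both the question's logs and ours have PHYRdyChg set, which is the part that made the two errors look alike to me.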

For us, the short-term fix/test was this:

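# in bash, a brace list in a redirect target fails with "ambiguous redirect", so loop instead
for d in sda sdb sdc sdd; do echo 1 > /sys/block/$d/device/queue_depth; done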
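If you want to see what the queue depth was before changing it, something like the following might help. It is only a sketch: the function name set_queue_depth and the SYSFS_BLOCK variable are my own inventions (the variable just lets you point it at a scratch directory for a dry run), and the disk names are the ones from our server, so substitute yours. A depth of 1 means only one command is outstanding at a time, which is what takes NCQ out of the picture. The setting does not survive a reboot.

```shell
# Print the current queue depth of each listed disk, then set a new one.
# SYSFS_BLOCK is a hypothetical override for dry runs; default is /sys/block.
set_queue_depth() {
    depth=$1
    shift
    root=${SYSFS_BLOCK:-/sys/block}
    for disk in "$@"; do
        file="$root/$disk/device/queue_depth"
        # Skip disks that do not exist or that we are not allowed to write to
        if [ ! -w "$file" ]; then
            echo "$disk: $file is not writable, skipping" >&2
            continue
        fi
        old=$(cat "$file")
        echo "$depth" > "$file"
        echo "$disk: queue_depth $old -> $(cat "$file")"
    done
}

# Example (as root): set_queue_depth 1 sda sdb sdc sdd
```

If the errors stop with the depth at 1 and come back when you restore the old value, that points at NCQ rather than at a dying drive.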

The long-term fix would be to add the drive to the kernel's NCQ blacklist, as described in the references. I have NO experience with that, but the second link says the patch would look like this:

--- a/drivers/ata/libata-core.c 2010-05-20 20:39:08.000000000 +0200
+++ b/drivers/ata/libata-core.c 2010-05-20 20:43:54.000000000 +0200
@@ -3924,6 +3924,7 @@
        { "Maxtor 7V300F0",     "VA111630",     ATA_HORKAGE_NONCQ },
        { "ST380817AS",         "3.42",         ATA_HORKAGE_NONCQ },
        { "ST3160023AS",        "3.42",         ATA_HORKAGE_NONCQ },
+       { "WDC WD2502ABYS-5*",  NULL,           ATA_HORKAGE_NONCQ },