How to interpret these errors from syslog

My Ubuntu has been acting weird lately. Yesterday it wouldn't boot normally, so I had to boot into recovery mode. It said I had to run fsck manually, which I did from a live CD. After that I was able to boot to the desktop, but everything is very sluggish: apps turn gray for seconds at a time, and sometimes other apps won't start at all. At other times it says that the filesystem is in read-only mode.

This is part of what I've been getting:

Oct 26 21:23:56  kernel: [ 1900.960506] sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Oct 26 21:23:56  kernel: [ 1900.960533] end_request: I/O error, dev sda, sector 63206544
Oct 26 21:23:56  kernel: [ 1900.960541] Buffer I/O error on device sda1, logical block 7900562
Oct 26 21:24:00  kernel: [ 1904.146683]          res 51/40:00:90:74:c4/00:00:00:00:00/03 Emask 0x9 (media error)
Oct 26 21:24:00  kernel: [ 1904.146692] ata1.00: error: { UNC }
Oct 26 21:24:03  kernel: [ 1907.351844]          res 51/40:00:90:74:c4/00:00:00:00:00/03 Emask 0x9 (media error)
Oct 26 21:24:03  kernel: [ 1907.351853] ata1.00: error: { UNC }
Oct 26 21:24:06  kernel: [ 1910.482152]          res 51/40:00:90:74:c4/00:00:00:00:00/03 Emask 0x9 (media error)
Oct 26 21:24:06  kernel: [ 1910.482161] ata1.00: error: { UNC }
Oct 26 21:24:09  kernel: [ 1913.604742]          res 51/40:00:90:74:c4/00:00:00:00:00/03 Emask 0x9 (media error)
Oct 26 21:24:09  kernel: [ 1913.604751] ata1.00: error: { UNC }
Oct 26 21:24:12  kernel: [ 1916.792646]          res 51/40:00:90:74:c4/00:00:00:00:00/03 Emask 0x9 (media error)
Oct 26 21:24:12  kernel: [ 1916.792656] ata1.00: error: { UNC }
Oct 26 21:24:15  kernel: [ 1919.922855]          res 51/40:00:90:74:c4/00:00:00:00:00/03 Emask 0x9 (media error)
Oct 26 21:24:15  kernel: [ 1919.922864] ata1.00: error: { UNC }
Oct 26 21:24:16  kernel: [ 1920.056506] sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Oct 26 21:24:16  kernel: [ 1920.056533] end_request: I/O error, dev sda, sector 63206544
Oct 26 21:24:16  kernel: [ 1920.056540] Buffer I/O error on device sda1, logical block 7900562
Oct 26 21:24:55  kernel: [ 1959.134566]          res 51/40:00:e0:28:44/00:00:00:00:00/04 Emask 0x9 (media error)
Oct 26 21:24:55  kernel: [ 1959.134575] ata1.00: error: { UNC }
Oct 26 21:25:05  kernel: [ 1969.674292]          res 51/40:00:b2:4c:44/00:00:00:00:00/04 Emask 0x9 (media error)
Oct 26 21:25:05  kernel: [ 1969.674301] ata1.00: error: { UNC }
Oct 26 21:25:08  kernel: [ 1972.887782]          res 51/40:00:b2:4c:44/00:00:00:00:00/04 Emask 0x9 (media error)
Oct 26 21:25:08  kernel: [ 1972.887791] ata1.00: error: { UNC }
Oct 26 21:25:12  kernel: [ 1976.059674]          res 51/40:00:b2:4c:44/00:00:00:00:00/04 Emask 0x9 (media error)
Oct 26 21:25:12  kernel: [ 1976.059683] ata1.00: error: { UNC }
Oct 26 21:25:15  kernel: [ 1979.206592]          res 51/40:00:b2:4c:44/00:00:00:00:00/04 Emask 0x9 (media error)
Oct 26 21:25:15  kernel: [ 1979.206601] ata1.00: error: { UNC }

Solution 1:

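Your first hard drive (sda) is in the process of aggressively failing. Each "{ UNC }" line is the drive reporting an uncorrectable read error (a "media error"). The same sectors fail again on every retry a few seconds apart, and the kernel finally gives up with "Unrecovered read error" and a Buffer I/O error on sda1.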
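To see how the "res" lines tie to the sector numbers: each one is libata's dump of the ATA registers the drive returned after the failed read, and the failing address can be decoded from them. Here is a minimal Python sketch of that, assuming libata's field order (status/error:nsect:lbal:lbam:lbah/hob bytes/device) and 28-bit LBA addressing, which is what these lines use since every "hob" byte is 00. The partition start (sector 2048) and 4096-byte block size used to reach sda1's logical block are not given in the question. They are assumptions, picked because they make the kernel's two numbers agree, so check yours with sudo fdisk -l /dev/sda and sudo tune2fs -l /dev/sda1. decode_res and fs_block are hypothetical helper names.

```python
import re

# libata prints the ATA registers a drive returned after a failed command, e.g.
#   res 51/40:00:90:74:c4/00:00:00:00:00/03 Emask 0x9 (media error)
# Field order assumed here (taken from libata's error report):
#   status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
RES_RE = re.compile(r"res ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")

STATUS_ERR = 0x01  # status bit 0: command ended with an error (0x51 = DRDY | DSC | ERR)
ERROR_UNC = 0x40   # error register bit 6: uncorrectable data error, printed as "{ UNC }"


def decode_res(line):
    """Return (status, error, lba) from a libata 'res' line, or None.

    Only 28-bit LBA is handled: LBA bits 24-27 sit in the low nibble of the
    device register. Non-zero "hob" bytes mean 48-bit addressing, which this
    sketch does not decode.
    """
    m = RES_RE.search(line)
    if m is None:
        return None
    (status, error, _nsect, lbal, lbam, lbah,
     _hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah,
     device) = (int(field, 16) for field in re.split(r"[/:]", m.group(1)))
    if hob_lbal or hob_lbam or hob_lbah:
        return None
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    return status, error, lba


def fs_block(sector, part_start, block_size, sector_size=512):
    """Convert an absolute disk sector into a block number within one partition."""
    return (sector - part_start) * sector_size // block_size


if __name__ == "__main__":
    status, error, lba = decode_res(
        "res 51/40:00:90:74:c4/00:00:00:00:00/03 Emask 0x9 (media error)")
    print(f"ERR={bool(status & STATUS_ERR)} UNC={bool(error & ERROR_UNC)} sector={lba}")
    # Assumed layout: sda1 starts at sector 2048 and uses 4096-byte blocks.
    print("sda1 logical block", fs_block(lba, part_start=2048, block_size=4096))
```

For the first "res" line in the log this gives sector 63206544, the same sector the end_request line reports, and (under the assumed layout) logical block 7900562, matching the Buffer I/O error line. The later lines decode to sectors 71575776 and 71584946, so more than one area of the disk is affected.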

Power it off immediately, get a new boot drive, and install a new OS on it. Once that is working, attach the failing drive, mount it read-only, and you might be able to extract data from it.
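Once the old drive is mounted read-only (for example with sudo mount -o ro /dev/sdb1 /mnt/failing; the device name and mount point are hypothetical, so check yours with sudo fdisk -l), the goal is to copy out everything that still reads while keeping track of what doesn't. A minimal Python sketch of such a forgiving copy, assuming a healthy target at the hypothetical path /mnt/rescue. rescue_copy is a hypothetical helper, not a standard tool.

```python
import os
import shutil
import sys


def rescue_copy(src_root, dst_root):
    """Copy what can still be read from src_root into dst_root.

    Returns the paths that failed (e.g. with EIO from a bad sector) instead of
    stopping at the first error, so the readable data gets out first.
    """
    failed = []

    def note_walk_error(err):
        # os.walk calls this when a directory cannot be listed.
        failed.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=note_walk_error):
        target_dir = os.path.join(dst_root, os.path.relpath(dirpath, src_root))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                # follow_symlinks=False copies file symlinks as links, not their targets
                shutil.copy2(src, os.path.join(target_dir, name), follow_symlinks=False)
            except OSError:
                failed.append(src)
    return failed


if __name__ == "__main__":
    # Hypothetical mount points: the failing drive (read-only) and a healthy target.
    for path in rescue_copy("/mnt/failing", "/mnt/rescue"):
        print("could not copy:", path, file=sys.stderr)
```

Reading a dying disk can make it worse, so it may be worth pointing src_root at your most important directory first and copying the rest afterwards.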

Of course, like most people, you have complete and current backups, right? ;)

Added in response to a comment:

"Infant failures" are so common in hardware engineering that there is a term for them. In general, a device will fail very early or run for a goodly while. If you' are having repeated failures you are either:

  1. buying cheap disks. I've personally had more trouble with Brand Foo drives than I can count (but my personal experience can't support a general statement about a manufacturer, which is why I didn't write "Maxtor").
  2. having some really bad luck. The same could happen with lightbulbs: some guy is going to have the next two lightbulbs he buys fail within a week. Them's statistics for you, and you might just be "that guy" with the drives.
  3. dealing with a bad drive controller that is burning up the drive electronics. For example, an out-of-spec resistor on a drive control line can fry every drive you attach to it.

I think those three possibilities are far, far more likely than your having uncovered a fatal flaw in ext4; ext4 has just been beaten on too hard by too many people for that. Then again, demonic possession could be at play; consult the clergy of your choice, and good luck.
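The "fail very early or run for a goodly while" pattern is often modeled in reliability engineering with a Weibull lifetime whose shape parameter is below 1, which gives a failure rate that falls with age. A small Python illustration of that, plus the "that guy" arithmetic from point 2. The shape, scale, and 2% early-failure probability are made-up numbers chosen for illustration, not real drive statistics, and both function names are hypothetical.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate at age t for a Weibull(shape, scale) lifetime.

    h(t) = (shape / scale) * (t / scale) ** (shape - 1)
    shape < 1: rate falls with age (early, "infant mortality" failures)
    shape == 1: constant rate; shape > 1: rate rises with age (wear-out)
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def chance_both_fail(p):
    """Probability that two independent units both fail, each with probability p."""
    return p * p


if __name__ == "__main__":
    # Illustrative numbers only, not measured drive statistics.
    for hours in (10, 100, 1000, 10000):
        print(hours, weibull_hazard(hours, shape=0.5, scale=10000))
    p = 0.02  # made-up chance that a new drive dies early
    print("two early failures in a row:", chance_both_fail(p))  # about 1 buyer in 2,500
```

With a 2% chance per drive, two early failures in a row is 0.02 squared = 0.0004, roughly one buyer in 2,500, so across millions of buyers plenty of people end up as "that guy". The independence assumption is exactly what point 3 breaks: a bad controller makes the failures correlated.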

Solution 2:

You can check the health of your hard disk with the Disk Utility tool. Open System > Administration > Disk Utility, select your hard disk in the list on the left, and click the "SMART Data" button on the right. Look at the assessment for each attribute as well as the overall assessment at the top. If it's not green, your disk is definitely failing.