SATA HDD errors
In my experience the errors you're seeing are actual hardware errors surfacing in software. The 'lost page write due to I/O error' message is one I've seen with bad hard drives, and they behave much as you describe when you attempt to fsck them. This is almost certainly a true hardware fault.
You should check the output of smartctl to see what it says the problem could be.
smartctl --attributes /dev/sdb
It'll give you output similar to this:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   212   186   021    Pre-fail  Always       -       4358
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       97
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       25420
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       86
194 Temperature_Celsius     0x0022   104   001   000    Old_age   Always       -       46
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0
The output can be arcane, but the attribute I'd pay closest attention to is Reallocated_Sector_Ct, since its raw value tells you how many bad sectors the drive already knows about and has remapped. The command 'smartctl -a' will give a lot more data. On the bad HD I had a while back, the 'SMART Error Log' at the bottom of that output had a few entries.
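If you want to check those counters without reading the whole table, here is a minimal sketch. The function name check_sector_attrs is made up for illustration, and it assumes the ten-column layout shown above, with attribute IDs 5, 197 and 198 carrying their conventional meanings (vendors can differ). It reads the attribute table on stdin, prints the raw value of each of the three attributes, and returns non-zero if any of them is above 0.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl --attributes` output on stdin.
# Assumes the 10-column layout shown above, where column 10 is RAW_VALUE.
check_sector_attrs() {
    awk '
        BEGIN { bad = 0 }
        # 5 = reallocated, 197 = pending, 198 = offline uncorrectable
        $1 ~ /^(5|197|198)$/ {
            printf "%s=%s\n", $2, $10
            if ($10 + 0 > 0) bad = 1
        }
        # Exit status 1 if any of the three raw values was non-zero
        END { exit bad }
    '
}

# Usage (needs root and smartmontools; /dev/sdb as in the command above):
#   smartctl --attributes /dev/sdb | check_sector_attrs || echo "bad sectors reported"
```

With the sample output above, all three raw values are 0, so the function would return success. A clean result here does not prove the disk is healthy; the SMART Error Log is still worth reading.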
You had an uncorrectable read error.
Error: UNC at LBA = 0x03800922 = 58722594
The data that was on that block is now lost.
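To work out where that block sits on the disk, you can convert the LBA from the error log into a byte offset. This is only a sketch: lba_to_offset is a made-up helper name, and the default of 512 bytes per logical sector is an assumption you should check against your own drive.

```shell
#!/bin/sh
# Hypothetical helper: convert an LBA (hex or decimal) to a byte offset.
# The second argument is the logical sector size; 512 is an assumed default.
lba_to_offset() {
    sector_size=${2:-512}
    # Shell arithmetic accepts 0x-prefixed hex constants
    lba_dec=$(( $1 ))
    echo "lba=$lba_dec offset_bytes=$(( lba_dec * sector_size ))"
}

lba_to_offset 0x03800922
# prints: lba=58722594 offset_bytes=30065968128
```

The decimal value matches the one in the error log line. If the disk is /dev/sdb (an assumption here), reading just that sector with GNU dd, for example `dd if=/dev/sdb of=/dev/null bs=512 skip=58722594 count=1 iflag=direct`, should reproduce the I/O error while the sector is still unreadable.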
You should:
- be using a mirror in the first place. Enterprise disks are designed to sit behind a mirror, so they would rather return a read error quickly than spend a long time retrying to get the data.
- recover the lost data from backups
You have NO EXCUSE not to be using RAID (especially if you host websites for clients!). The OS is not that large, so you don't need a dedicated disk for it on a 2-disk system.