Causes of sudden massive filesystem damage? ("root inode is not a directory") [closed]
I have a laptop running Maverick (very happily until yesterday), with a Patriot Torx SSD: LUKS encryption of the whole partition, one LVM physical volume on top of that, then home and root as ext4 logical volumes on top of that.
When I tried to boot it yesterday, it complained that it couldn't mount the root filesystem. Running fsck, basically every inode seems to be wrong. Both home and root filesystems show similar problems. Checking a backup superblock doesn't help.
e2fsck 1.41.12 (17-May-2010)
lithe_root was not cleanly unmounted, check forced.
Resize inode not valid. Recreate? no
Pass 1: Checking inodes, blocks, and sizes
Root inode is not a directory. Clear? no
Root inode has dtime set (probably due to old mke2fs). Fix? no
Inode 2 is in use, but has dtime set. Fix? no
Inode 2 has a extra size (4730) which is invalid
Fix? no
Inode 2 has compression flag set on filesystem without compression support. Clear? no
Inode 2 has INDEX_FL flag set but is not a directory.
Clear HTree index? no
HTREE directory inode 2 has an invalid root node.
Clear HTree index? no
Inode 2, i_size is 9581392125871137995, should be 0. Fix? no
Inode 2, i_blocks is 40456527802719, should be 0. Fix? no
Reserved inode 3 (<The ACL index inode>) has invalid mode. Clear? no
Inode 3 has compression flag set on filesystem without compression support. Clear? no
Inode 3 has INDEX_FL flag set but is not a directory.
Clear HTree index? no
....
Running strings across the filesystems, I can see what look like filenames and user data in there. I do have sufficiently good backups (touch wood) that it's not worth grovelling around to pull back individual files, though I might save an image of the unencrypted disk before I rebuild, just in case.
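For anyone in a similar position, imaging the decrypted volume before doing anything else looks roughly like this; the mapper and output paths are assumptions, not the actual names on this system:
# Copy the decrypted logical volume to an image on external storage;
# ddrescue keeps a map of unreadable areas, which matters on a possibly-failing disk
ddrescue /dev/mapper/vg-lithe_root /mnt/external/lithe_root.img /mnt/external/lithe_root.map
# Then look for readable text in the image rather than the live device
strings /mnt/external/lithe_root.img | less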
smartctl doesn't show any errors, and neither does the kernel log. Running a write-mode badblocks across the swap LV doesn't find problems either. So the disk may be failing, but not in an obvious way.
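The health checks mentioned above amount to something like the following; the device names are assumptions, and the badblocks -w run is destructive, which is why it is only pointed at the swap LV:
# Full SMART report for the SSD (device name assumed)
smartctl -a /dev/sda
# Scan the kernel log for ATA or I/O errors
dmesg | grep -i -E 'ata|i/o error'
# Destructive read-write surface test, safe here only because swap holds no data
swapoff /dev/mapper/vg-lithe_swap
badblocks -w -s -v /dev/mapper/vg-lithe_swap
mkswap /dev/mapper/vg-lithe_swap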
At this point I'm basically, as they say, fscked? Back to reinstalling, perhaps running badblocks over the disk, then restoring from backup? There doesn't even seem to be enough data to file a meaningful bug...
I don't recall this machine crashing the last time I used it.
At this point I suspect a bug or memory corruption caused it to write garbage across the disks when it was last running, or some kind of subtle failure mode for the SSD.
What do you think would have caused this? Is there anything else you'd try?
Solution 1:
It seems that your first superblock is corrupt. There are many copies of the superblock, since it is the most critical piece of the filesystem. You can try e2fsck with the -b option to check whether a different copy of the superblock has the correct information. See e2fsck(8) for more information about the -b option and how to determine the locations of the additional superblocks.
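A minimal sketch of that procedure, assuming the filesystem lives on /dev/mapper/vg-lithe_root (a made-up name) and was created with a 4 KiB block size:
# Show where mke2fs would place the backup superblocks, without writing anything
mke2fs -n /dev/mapper/vg-lithe_root
# Or read the backup locations from the existing filesystem metadata
dumpe2fs /dev/mapper/vg-lithe_root | grep -i superblock
# Dry-run check against a backup superblock (-n answers "no" to every question)
e2fsck -n -b 32768 /dev/mapper/vg-lithe_root
# If that output looks sane, repeat without -n to actually repair
e2fsck -b 32768 /dev/mapper/vg-lithe_root
The 32768 location assumes a 4 KiB block size; with 1 KiB blocks the first backup superblock is typically at 8193.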
IIRC, there is only one copy of the root directory, so if it was damaged, it will have to be recreated, empty. The directories originally under the root directory will appear in /lost+found and you will have to relocate them from there.
Inode tables are spread throughout the partition, so it is unlikely that you would lose all of them. Files whose inodes are recoverable, but which cannot be reconnected to their original directories, will also end up in /lost+found.
Solution 2:
I've seen this before. It's something to do with Ubuntu 10.10; I'd look around on the bug tracker, as it's been reported a few times. To be sure, take a snapshot of the disk, wipe it, then drop it in a secondary system to see if the bug repeats itself (to rule out the disk - an unlikely culprit).
Solution 3:
Update: Eventually, I became convinced the problem was some kind of complicated SSD failure, or I suppose possibly an interaction between the kernel and the SSD. I replaced it with a magnetic disk, and I haven't had trouble again.