Accidentally resized an ext4 filesystem past the 16 TB limit, how to roll back?

We tried to grow our 15 TB partition with the following lvresize command:

lvresize --resizefs --size +1T /dev/vg0/mm

No errors were displayed during the operation, but the result was kind of a catastrophe. No files were visible any more and the whole drive seemed to be empty.

syslog contained the following errors:

inode #2: block 31777: comm ls: bad entry in directory: inode out of bounds - offset=0(0), inode=2, rec_len=12, name_len=1

We managed to unmount the partition, and the plan was to run fsck to fix it:

$ fsck.ext4 -n /dev/mapper/vg0-mm
e2fsck 1.42.9 (4-Feb-2014)
Superblock has an invalid journal (inode 8).
Clear? no

fsck.ext4: Illegal inode number while checking ext3 journal for /dev/mapper/vg0-mm

/dev/mapper/vg0-mm: ********** WARNING: Filesystem still has errors **********

$ fsck.ext4 -v /dev/mapper/vg0-mm
e2fsck 1.42.9 (4-Feb-2014)
Superblock has an invalid journal (inode 8).
Clear<y>? yes
*** ext3 journal has been deleted - filesystem is now ext2 only ***

Corruption found in superblock.  (inodes_count = 0).

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
     or
         e2fsck -b 32768 <device>

/dev/mapper/vg0-mm: ***** FILE SYSTEM WAS MODIFIED *****

OK, next we tried to list the available alternate superblocks:

$ mke2fs -n  /dev/mapper/vg0-mm
mke2fs 1.42.9 (4-Feb-2014)
mke2fs: Size of device (0x100000000 blocks) /dev/mapper/vg0-mm too big to be expressed in 32 bits using a blocksize of 4096.

So it looks like we accidentally went over the 16 TB limit with lvresize, and our ext4 filesystem didn't have the 64bit feature flag enabled.
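
The arithmetic behind that limit can be checked with a short sketch. Python is used here purely for illustration; the block size and device size come from the dumpe2fs and mke2fs output in this post, and the helper name fits_in_32_bits is made up for the example.

```python
BLOCK_SIZE = 4096              # "Block size" reported by dumpe2fs
MAX_BLOCKS_32BIT = 2**32 - 1   # largest value a 32-bit block count can hold
TIB = 2**40

def fits_in_32_bits(device_bytes: int, block_size: int = BLOCK_SIZE) -> bool:
    """Return True if the device's block count fits in a 32-bit field."""
    return device_bytes // block_size <= MAX_BLOCKS_32BIT

device_blocks = 0x100000000    # device size reported by "mke2fs -n"
device_bytes = device_blocks * BLOCK_SIZE

print(device_bytes // TIB)             # 16
print(fits_in_32_bits(device_bytes))   # False
print(fits_in_32_bits(15 * TIB))       # True
print(MAX_BLOCKS_32BIT)                # 4294967295
```

The last value matches the "Block count" that dumpe2fs reports for the damaged filesystem, i.e. the field is pinned at the 32-bit maximum.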

After that we tried to shrink it back below the 16 TB limit, but resize2fs doesn't work either, because it (obviously) considers the partition dirty and refuses to do anything with it.

Any recommendations on which direction to take next? Run the resize with force, or try to enable the 64bit feature flag? Something else?

dumpe2fs displays this (among other info):

Inode count: 0
Block count: 4294967295
Block size: 4096

Relevant version info:

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.5 LTS"
e2fsprogs version: 1.42.9-3ubuntu1.3 amd64
Kernel version: 3.13.0-147-generic

UPDATE: After reading through some source code we found the root cause of the problem: https://github.com/torvalds/linux/commit/4f2f76f751433908364ccff82f437a57d0e6e9b7

The question remains: how do we recover from a situation where the inode count has been reset to 0 by the overflow bug?
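
To illustrate how a count can wrap to exactly 0, here is a rough sketch of 32-bit wraparound. It is not taken from the kernel source. The value of 32768 inodes per group is an assumption (the post never states it), and inode_count_32bit is a hypothetical helper.

```python
BLOCKS_PER_GROUP = 8 * 4096   # one 4096-byte block bitmap covers 32768 blocks
INODES_PER_GROUP = 32768      # ASSUMPTION: not stated anywhere in this post

def inode_count_32bit(groups: int, inodes_per_group: int = INODES_PER_GROUP) -> int:
    """Total inode count as it would land in a 32-bit superblock field."""
    return (groups * inodes_per_group) & 0xFFFFFFFF

groups_at_16tib = 0x100000000 // BLOCKS_PER_GROUP
print(groups_at_16tib)                          # 131072
print(inode_count_32bit(groups_at_16tib))       # 0 (wrapped around)
print(inode_count_32bit(groups_at_16tib - 1))   # 4294934528 (still fits)
```

Under that assumption, a filesystem with one block group fewer keeps an inode count that still fits in 32 bits.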


Solution 1:

It seems that this weird behavior of lvresize with the --resizefs option on an ext filesystem, when the partition is exactly 16 TB down to the last byte, was already seen back in 2009 (https://www.redhat.com/archives/ext3-users/2009-January/msg00003.html):

"We should probably make mkfs just silently lop off one block if it encounters a boundary condition like this ..."

The end credits?

We were able to fix the situation by hacking the superblock by hand. First we took an LVM snapshot of the initial state. Then we dumped the first 64k bytes of the partition to a file for closer examination. The Inode Count and Block Count values at the very beginning of the superblock were corrupted by the bug linked in the question. (https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Layout)
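
For anyone who wants to inspect such a dump, here is a minimal sketch of decoding those two fields. It assumes the standard ext4 layout: the primary superblock starts 1024 bytes into the device, with the inode count and the low 32 bits of the block count as little-endian 32-bit values at its start, and the magic number 0xEF53 at offset 0x38. The function name read_counts and the synthetic test data are made up for illustration; this is not the tool we actually used.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1024 bytes into the device
EXT4_MAGIC = 0xEF53        # s_magic, at offset 0x38 within the superblock

def read_counts(dump: bytes) -> dict:
    """Decode inode count, block count (low 32 bits) and magic from a raw dump."""
    sb = dump[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    inodes, blocks_lo = struct.unpack_from("<II", sb, 0x00)
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    return {"inodes_count": inodes,
            "blocks_count_lo": blocks_lo,
            "magic_ok": magic == EXT4_MAGIC}

# Synthetic 64 KiB dump that mimics the corrupted values dumpe2fs reported.
fake = bytearray(64 * 1024)
struct.pack_into("<II", fake, SUPERBLOCK_OFFSET, 0, 4294967295)
struct.pack_into("<H", fake, SUPERBLOCK_OFFSET + 0x38, EXT4_MAGIC)

result = read_counts(bytes(fake))
print(result)
```

To use it on a real dump, read the dumped file in binary mode and pass its contents to read_counts instead of the synthetic buffer.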

After studying the e2fsprogs source code, we decided to try writing Inode Count and Block Count values one block group (32k) below the maximum into the superblock, and it worked. The bytes were

00 80 ff ff 00 80 ff ff

and we just dd'ed them to the partition.
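
As a sanity check, the eight bytes can be decoded and applied to a copy of the dump before touching the real device. The sketch below works only on an in-memory buffer. The 1024-byte superblock offset is an assumption based on the ext4 layout page linked above, and patch_image is a hypothetical helper, not the dd invocation we ran.

```python
import struct

PATCH = bytes.fromhex("0080ffff0080ffff")   # the eight bytes quoted above
SUPERBLOCK_OFFSET = 1024                    # ASSUMPTION: start of the primary superblock

# Two little-endian 32-bit values: inode count, then block count.
inodes_count, blocks_count = struct.unpack("<II", PATCH)
print(inodes_count, blocks_count)           # 4294934528 4294934528
print(blocks_count == 2**32 - 32768)        # True: one 32k block group below 2**32
print(blocks_count * 4096 / 2**40)          # 15.9998779296875 (TiB)

def patch_image(image: bytearray) -> None:
    """Overwrite the two count fields in an in-memory copy of the dump."""
    image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + len(PATCH)] = PATCH

copy = bytearray(64 * 1024)   # stand-in for the 64k dump, not a real device
patch_image(copy)
print(copy[1024:1032].hex(" "))             # 00 80 ff ff 00 80 ff ff
```

Both fields decode to the same value, which puts the filesystem just under 16 TiB, where a 32-bit block count is valid again.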

With a self-compiled build of the newest e2fsprogs, fsck was then able to fix the rest of the errors. All seems OK now.