Estimating the time needed for a resize2fs shrink

I have a large ext4 filesystem which I'm currently shrinking (109TiB -> 83TiB in my case), and it's taking an extremely long time (day 5 as of asking). I can see via iotop that the process is still doing I/O, so it doesn't appear to have errored out and stalled (i.e. spinning at 100% CPU). However, from a cursory glance around the internet it seems that resize2fs hasn't been optimized for shrinking nearly as much as for growing volumes (circa 2011).

That said, I don't want to interrupt it if I can help it, but I feel a little naked running a filesystem change for this long. What would be a good estimate of the time an ext4 shrink takes, given that we know the space requirements before and after (as well as the number of blocks and the block size)?

Software involved:

  • e2fs...: 1.43.1
  • OS: debian 4.19.16-1~bpo9+1

My specific filesystem:

  • Type: ext4
  • Size: ~109TiB (29297465344 blocks)
  • Shrink to: 83TiB (22280142848 blocks)
  • Block size: 4KiB (4096 bytes)
  • bytes-per-inode: 2^15 (32768 bytes)

Current outputs:

resize2fs -p ...:

[root@devlynx]## ~:: resize2fs -p /dev/storage/storage 83T
resize2fs 1.43.4 (31-Jan-2017)
Resizing the filesystem on /dev/storage/storage to 22280142848 (4k) blocks.
Begin pass 2 (max = 802451420)
Relocating blocks             XX--------------------------------------

iotop:

   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  7282 be/4 root       39.21 M/s   39.21 M/s  0.00 % 94.07 % resize2fs -p /dev/storage/storage 83T

cat /proc/7282/io:

rchar: 12992021859371
wchar: 12988874121611
syscr: 13244258
syscw: 12482026
read_bytes: 13003899662336
write_bytes: 12988874125312
cancelled_write_bytes: 0

I'm still looking up information about the different passes resize2fs needs to do, as well as how I could calculate how long those passes take given the information I have about my filesystem (I have more if needed). In short, how can I come up with a final estimate of how long this will take?

Edit: Is this actually a finished Pass 2?

[root@devlynx]## ~:: resize2fs -p /dev/storage/storage 83T
resize2fs 1.43.4 (31-Jan-2017)
Resizing the filesystem on /dev/storage/storage to 22280142848 (4k) blocks.
Begin pass 2 (max = 802451420)
Relocating blocks             XX--------------------------------------
Begin pass 3 (max = 894088)
Scanning inode table          XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 4 (max = 92164)
Updating inode references     XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
The filesystem on /dev/storage/storage is now 22280142848 (4k) blocks long.

Solution 1:

Rough estimates can help illustrate the scale of a thing, even if they are simplistic and not at all accurate or precise. Assume all 1.2E+14 bytes need to be read, and that 4E+7 bytes per second can be sustained. That is 3E+6 seconds, or about 34 days. The resize2fs progress bar sitting at roughly 5% after about 5 days seems like the right power of 10.
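
A minimal sketch of that back-of-envelope arithmetic, with the block count taken from the resize2fs output above and a sustained ~40 MB/s assumed from the iotop figure:

# Order-of-magnitude estimate: assume every byte of the filesystem must be
# read once at a sustained rate of roughly 40 MB/s (both are assumptions).
blocks = 29297465344         # current size in 4 KiB blocks, from resize2fs output
block_size = 4096            # bytes per block
rate = 40e6                  # assumed sustained throughput, bytes per second

total_bytes = blocks * block_size           # ~1.2e14 bytes
days = total_bytes / rate / 86400
print(f"~{days:.1f} days")                  # -> ~34.7 days, the same power of 10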

Weeks to go, at least.


When does this volume need to be returned to service? There is a different urgency for something that needs to go back up now versus an archive with no immediate use that you can spend a month on.

Are you prepared for data loss if this gets interrupted? There is no graceful way to stop it, so there is a chance of corruption. Successful shrinks have happened, but they are not commonly done, and stopping a shrink in the middle of reshuffling blocks is even less common. Whatever happens to this file system, check its consistency with fsck afterwards. Have a recovery plan ready, with backups of the important data.

Does this volume still need to be reduced, even if this attempt ends up failing? The safe way is to create a new, smaller file system and copy the data over. The obvious disadvantage is that this requires additional storage. Perhaps take the opportunity to do a storage migration or other work that requires an array rebuild or similar.

Solution 2:

As there is only one other answer, I figure I'll offer up my limited experience.

My experience running ~5 resize2fs shrinks has been that dividing the amount to be shrunk (109TiB - 83TiB = 26TiB in OP's case) by the write speed reported by tools like iotop gives a time estimate slightly larger than the process actually takes; my resizes took 70-90% of that time.

OP reported in a comment on John Mahowald's answer that the final process took "about a week". This matches my experience, as OP reported about 40 MiB/s from iotop. 40 * 60 * 60 * 24 * 7 = 24,192,000 MiB, or about 23TiB, which is about 88.7% of the 26TiB reduction size.
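
As a rough sketch of that heuristic in Python (the block counts come from the question; the ~40 MB/s rate and the 70-90% factor are only my own observed figures, not anything resize2fs guarantees):

# Estimate: (space being removed) / (observed write rate), then apply the
# 70-90% range my own shrinks have actually taken.
old_blocks = 29297465344     # from the question's resize2fs output
new_blocks = 22280142848
block_size = 4096
write_rate = 40e6            # ~40 MB/s, roughly the iotop figure

shrink_bytes = (old_blocks - new_blocks) * block_size    # ~26 TiB being removed
upper_days = shrink_bytes / write_rate / 86400
print(f"upper bound: ~{upper_days:.1f} days")                          # ~8.3 days
print(f"likely: ~{0.7 * upper_days:.1f}-{0.9 * upper_days:.1f} days")  # ~5.8-7.5 days

That range lands right around the "about a week" OP reported.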

I was always resizing down to the minimum size needed to contain the stored data (-M), and I'd speculate that this kind of resize takes longer than downsizing a volume which is mostly empty, since in the mostly-empty case fewer allocated blocks would need to be relocated.

OP's experience with the pass 2 "progress bar" matches mine: I have not been able to glean any meaningful indication of progress from it, and pass 2 always finishes with a mostly-empty progress bar. Furthermore, the number of Xs on the "progress bar" increases and decreases in my experience, sometimes going up and down multiple times. I have seen it increase to 8 Xs, then decrease to 0 and finish with 0. I have also seen it complete with other numbers of Xs, like two or six. I have no idea how to interpret what this "progress bar" is telling me and intend to post my own question about it.

OP's slow data rate also matches my experience. My resizes ran at around 40MiB/s despite the disks in question being capable of 80-120MiB/s sequential writes. My blocks are 4KiB in size. If the process is relocating a single block at a time, this is probably similar to random 4KiB reads and writes, a very seek-heavy workload. My resize2fs process also seemed to consume one whole CPU core (100% CPU usage).
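
For a sense of why the rate is so low, here is a quick calculation of how many block-sized transfers per second the iotop figure corresponds to (assuming "M/s" means roughly 10^6 bytes per second and that each relocation is one 4KiB read plus one 4KiB write):

# Convert the observed throughput into block relocations per second.
observed = 39.21e6           # bytes/s read and written, from the question's iotop
block_size = 4096
print(f"~{observed / block_size:.0f} block relocations per second")   # -> ~9573

On the order of ten thousand tiny scattered read/write pairs per second, rather than large sequential writes, fits the seek-heavy pattern described above.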