mkdir: "no space left on device" on specific folders after Apache Tomcat reached max-file ulimit
The question:
I have a Tomcat instance running a Java application that occasionally accumulates socket handles and reaches the ulimit we configured (both soft and hard) for max open files, which is 100K. When this happens, the Java process appears to still be alive, but we can no longer access it.
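For reference, this is roughly how I check the limit the running JVM actually sees and how many descriptors it currently holds (the pgrep pattern is just what happens to match on my box):
TOMCAT_PID=$(pgrep -f 'org.apache.catalina' | head -1)   # adjust the pattern to your own startup command
grep 'Max open files' /proc/$TOMCAT_PID/limits           # the limit as the running process sees it
ls /proc/$TOMCAT_PID/fd | wc -l                          # descriptors it currently holds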
However, my question is about a bizarre phenomenon that accompanies this situation: I cannot mkdir inside the tomcat folder.
[root@server /opt/apache-tomcat-7.0.52]# mkdir some_folder
mkdir: cannot create directory `some_folder': No space left on device
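(Side note: stracing the failing command shows the ENOSPC coming straight back from the mkdir syscall, so it isn't the shell or some quota layer adding its own spin:)
strace -e trace=mkdir mkdir /opt/apache-tomcat-7.0.52/some_folder
# expect a line roughly like:
#   mkdir("/opt/apache-tomcat-7.0.52/some_folder", 0777) = -1 ENOSPC (No space left on device)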
In fact, I get the same error under multiple different folders that reside under /opt, but not under /opt directly, and not - for example - under /opt/apache-tomcat-7.0.52/logs.
I can't explain it for the life of me, and can only resolve it with init 6. Any suggestions on how to fix the issue and be able to mkdir again without a restart?
Some pointers and clues that I've gathered:
The setup is CentOS 6.5 running on AWS, with the Tomcat disk in question mounted from an EBS volume.
Running df -h shows that the disk is evidently not full:
[root@server ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 9.9G 3.6G 5.9G 38% /
none 121G 0 121G 0% /dev/shm
/dev/xvdc 1008G 197G 760G 19% /mnt/eternal
Contents of /etc/fstab (which, for some reason, use double mounting - not sure why):
/dev/xvdc /mnt/eternal ext4 defaults 0 0
/mnt/eternal /opt ext4 defaults,bind 0 0
And the relevant lines from mount:
/dev/xvdc on /mnt/eternal type ext4 (rw)
/mnt/eternal on /opt type none (rw,bind)
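Since the bind mount looked suspicious at first, I also verified that both paths really are the same filesystem, so I don't believe the double mounting itself is the culprit:
stat -c '%d %n' /opt /mnt/eternal                        # same device number for both mount points
grep -E ' (/opt|/mnt/eternal) ' /proc/self/mountinfo     # both entries share the major:minor of /dev/xvdc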
Running df -i doesn't hint at anything bad (and is similar to a healthy system):
[root@server ~]# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/xvda1 655360 78245 577115 12% /
none 31549847 1 31549846 1% /dev/shm
/dev/xvdc 67108864 12551 67096313 1% /mnt/eternal
Running sysctl fs.file-nr gives this result, which is evidently high but seems far from the limit:
[root@server ~]# sysctl fs.file-nr
fs.file-nr = 101632 0 25087252
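(The three fields are allocated handles, allocated-but-unused handles, and the system-wide maximum, so globally we're nowhere near the ceiling; the 100K handles sit almost entirely inside the one JVM, which a rough per-process count can confirm:)
# crude per-process fd count, highest first (run as root so all /proc entries are readable)
for p in /proc/[0-9]*; do
    echo "$(ls "$p/fd" 2>/dev/null | wc -l) ${p##*/}"
done | sort -rn | head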
Running find /proc | wc -l returns 62497876 (62M), which could be hitting some OS limit; on a similar healthy system it's more like 1800000 (1.8M).
The hugely occupied subfolder appears to be /proc/<my-java-pid>/task (~62M items, compared to ~1.7M on the healthy system). This is likely just a reflection of my 100K fds (x2, for both fd and fdinfo) over 300 individual "task" folders.
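A quick sanity check of that arithmetic, using the example PID from the dmesg dump below (obviously specific to my machine):
PID=105940                      # example PID from this incident
ls /proc/$PID/task | wc -l      # ~300 task (thread) directories
ls /proc/$PID/fd | wc -l        # ~100K open descriptors
# each task directory re-exposes the process's fd and fdinfo entries,
# so 300 tasks x 100K fds x 2 lands in the ~62M ballpark that find reported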
This appears at the end of my dmesg dump (my Java PID in this example is 105940) - not sure how it might relate:
INFO: task java:105940 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java D 0000000000000008 0 105940 1 0x00000080
ffff88161ab55c88 0000000000000082 ffff88161ab55c18 ffffffff8109be4f
ffffffff81ed28f0 ffff881e66360ae0 ffffffff8100bb8e ffff88161ab55c88
ffff881e66361098 ffff88161ab55fd8 000000000000fb88 ffff881e66361098
Call Trace:
[<ffffffff8109be4f>] ? hrtimer_try_to_cancel+0x3f/0xd0
[<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff810521c9>] ? mutex_spin_on_owner+0x99/0xc0
[<ffffffff8151636e>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8151620b>] mutex_lock+0x2b/0x50
[<ffffffff8111c461>] generic_file_aio_write+0x71/0x100
[<ffffffffa0121fb1>] ext4_file_write+0x61/0x1e0 [ext4]
[<ffffffff81180d7a>] do_sync_write+0xfa/0x140
[<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff812292ab>] ? selinux_file_permission+0xfb/0x150
[<ffffffff8121bd26>] ? security_file_permission+0x16/0x20
[<ffffffff81181078>] vfs_write+0xb8/0x1a0
[<ffffffff81181971>] sys_write+0x51/0x90
[<ffffffff81517e2e>] ? do_device_not_available+0xe/0x10
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
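(If anyone wants to dig further on a live system, the kernel can be asked to dump the stacks of all blocked tasks on demand; this is a generic sysrq trick, nothing specific to this bug:)
echo 1 > /proc/sys/kernel/sysrq    # make sure sysrq is enabled
echo w > /proc/sysrq-trigger       # dump stacks of all uninterruptible (blocked) tasks
dmesg | tail -n 200                # the traces land in the kernel log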
I would be happy to share any other findings you might suggest gathering.
Secretly I hope that understanding this weird behavior would shed light on the pathology causing this whole mess. But, that's just my private hope :)
Solution 1:
In most cases (though obviously not in your case) the reason will be that you are running out of inodes.
To check this, run df -i:
Filesystem Inodes IUsed IFree IUse% Mounted on
[...]
25600 25600 0 100% /foo
Here you can see that inode usage is at 100%.
The bad news is that, according to https://superuser.com/questions/585641/changing-max-inode-count-number-in-ext3-filesystem-in-cent-os, you need to re-create your file system, passing the -i (bytes-per-inode) option to mkfs, in order to increase the number of inodes.
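If you do end up in that situation, a crude way to see which directory tree is eating the inodes, and what recreating the filesystem with a denser inode ratio might look like (the device name below is only a placeholder, and mkfs destroys all data on it):
# count entries per top-level directory to locate the inode hog (slow on large trees)
for d in /foo/*; do
    printf '%10d %s\n' "$(find "$d" 2>/dev/null | wc -l)" "$d"
done | sort -rn | head
# recreate with one inode per 8 KiB instead of the ext4 default of 16 KiB
mkfs.ext4 -i 8192 /dev/sdXN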
Solution 2:
I found the answer to my question of "how to fix this scenario". I don't know all the details of how this came to be, but I know enough to put out an answer.
Short answer: unmounting the disk, running fsck -f on it, and mounting it back again solves the issue and prevents it from recurring. As an alternative, creating a new disk (remember, we're on AWS), copying all the data to the new disk (rsync -a was my command of choice), and using it to replace the original disk also solves and prevents it.
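For completeness, the whole dance amounted to roughly this (the service name is just an example for whatever keeps files open under /opt, and remember we have the bind mount on top):
service tomcat stop          # placeholder: stop whatever holds files open under /opt
umount /opt                  # the bind mount first
umount /mnt/eternal
fsck -f /dev/xvdc            # e2fsck -f /dev/xvdc is equivalent for ext4
mount /mnt/eternal           # remount according to /etc/fstab
mount /opt
# the alternative route: sync everything onto a freshly created and mounted EBS volume
rsync -a /mnt/eternal/ /mnt/new_volume/      # /mnt/new_volume is a placeholder mount point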
Longer answer: the disk's file system (ext4) appears to have reached some unstable state when the disk's snapshot was originally created. When the original 200GB snapshot was later extended (using resize2fs) to 1TB, it appears that in some sense it kept internally remembering the original 200GB size, creating all sorts of weird phenomena which ended up with the OS unable to close handles, thus making Tomcat reach its file limit, thus having all hell break loose.
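(For context, the resize itself is nothing exotic; after growing the EBS volume it would have been something along the lines of:)
resize2fs /dev/xvdc    # with no size argument, grows the ext4 filesystem to fill the device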
Longest answer, with a bit more of the detective-work detail: the breakthrough happened when we had this pathology occur in parallel on two separate setups. Checking all the parameters on those setups and comparing them, we realized that df -h on the drive was showing this result:
/dev/xvdc 1008G 197G 760G 19% /mnt/eternal
Now, this hadn't caught our attention before, because the disk still has plenty of space left. But it was the exact same disk usage (197G) on both setups, and there was no reason for that to happen. From here things quickly unfolded. As mentioned before, our AWS instances are created from an image that has a 200GB disk snapshot, which is extended on individual instances using resize2fs - usually to the maximum size of 1TB. We were finally able to recreate a "bad state" by launching a new instance, resizing to 1TB, and creating a big file of 300GB. When this was done, the system didn't freeze, but it did show the same strange behavior:
/dev/xvdc 1008G 197G 760G 19% /mnt/eternal
And that was despite there clearly being more than 197GB of data on the disk. So, we tried the two methods mentioned above (fsck and recreating the disk) on two separate clean setups, and on each of them the weird behavior no longer appeared.
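(And for what it's worth, "creating a big file of 300GB" needs nothing fancier than something like:)
dd if=/dev/zero of=/mnt/eternal/bigfile bs=1M count=300000   # ~300GB of real data written through the filesystem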
Our best guess is that at some point, when the AMI was created, something went wrong in the snapshot process - most likely because we had taken a "snapshot without restart" (though we don't usually do that, and I have no evidence to back this up, so I hope our DevOps engineer doesn't get mad at me for blaming her without cause!). All in all, an interesting experience.