SSH unresponsive after stopping Tomcat, high ksoftirqd CPU usage
I have a weird issue. Every time I stop Tomcat, SSH becomes very slow, almost totally unresponsive. It takes a minute or even more for SSH to accept any command. When I finally get Tomcat started again everything goes back to normal.
This is happening on a very busy server. The problem is that I need to stop Tomcat often because of application upgrades. An upgrade would normally take a few seconds, but here it takes almost 10 minutes, and because of that we are experiencing unwanted downtime.
One thing I see is that when I stop Tomcat, top shows a lot of ksoftirqd/X processes at 100% CPU. Could this be the issue?
Kernel version is: 2.6.18-308.11.1.el5
Red Hat version is: Red Hat Enterprise Linux Server release 5.9 (Tikanga)
Any idea why this is happening?
I know this isn't "best practice", but I would suggest restarting Tomcat remotely over SSH and sending the output to /dev/null:
ssh your_server '/etc/init.d/tomcat restart > /dev/null 2>&1'
(You can replace the command above with whatever you normally use to restart Tomcat.)
This is a workaround, not a solution. Could you try this while connected in another SSH session and check whether the problem still occurs and affects all sessions?
A slow SSH connection (SSH lag) is a symptom of high load. High load is often caused by blocking I/O, which in turn is often caused by swapping.
To check your load, run uptime or top. You will probably see load numbers over 10 when SSH is not responding. They will probably hover under 2 during normal use.
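If you want to capture the load while the shell is too sluggish to watch top, a small script can read it for you. This is only a sketch: the function names are made up for illustration, and the threshold of 10 is just the rough figure mentioned above, not a tuned value.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any tool).

# Print the 1-minute load average. The first field of /proc/loadavg
# is the 1-minute figure; a different file can be passed for testing.
one_minute_load() {
    awk '{ print $1 }' "${1:-/proc/loadavg}"
}

# Succeed (exit 0) when load $1 is above limit $2, fail otherwise.
# awk does the comparison because sh cannot compare decimals.
load_is_high() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 > limit + 0) }'
}
```

For example, running something like load_is_high "$(one_minute_load)" 10 && date >> /tmp/high-load.log from cron once a minute would leave a record of when the load spiked, so you can compare it with your Tomcat stop times. The log path is an assumption.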
Run free or top to see your memory usage. You will probably see a lot of swap in use.
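The swap figure that free reports can also be pulled straight from /proc/meminfo, which is handy if you want to log it before and after stopping Tomcat. A minimal sketch, assuming the usual SwapTotal and SwapFree lines are present; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print swap in use, in kB.
# Reads SwapTotal and SwapFree from /proc/meminfo (or from a file
# given as the first argument) and prints the difference.
swap_used_kb() {
    awk '
        /^SwapTotal:/ { total = $2 }
        /^SwapFree:/  { free  = $2 }
        END           { print total - free }
    ' "${1:-/proc/meminfo}"
}
```

If the number is large, running vmstat 1 while you stop Tomcat and watching the si and so columns should show whether the box is actively swapping at that moment, rather than just holding old pages in swap.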
Once you have identified the underlying symptoms, you can search for terms such as "tomcat swapping on shutdown" or "tomcat high load". The cause is probably that Tomcat is trying to write things held in cache or swap to disk. Is your Tomcat JVM max heap size larger than the amount of memory you have?
Maybe something connecting to your webapp retries constantly when it goes down, creating a DoS scenario.
This could all be specific to your webapp, so use general terms when searching.
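To test the retry idea, you could count TCP connections per state on the Tomcat port just before you stop it and again while it is down. The sketch below summarises netstat -ant output. The function name is hypothetical, and port 8080 in the usage note is an assumption; use whatever port your connector listens on.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -ant` output on stdin and print
# the number of TCP connections in each state whose local address
# ends in the given port, e.g. "ESTABLISHED 42".
count_states() {
    awk -v port="$1" '
        # Field 4 is the local address, field 6 the connection state.
        $1 ~ /^tcp/ && $4 ~ (":" port "$") { count[$6]++ }
        END { for (s in count) print s, count[s] }
    ' | sort
}
```

Usage would be netstat -ant | count_states 8080. A very large number of connections from the same few clients, or counts that keep churning while Tomcat is stopped, would point towards clients hammering the port, which might also explain the ksoftirqd load since that is where network packet processing happens.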