MySQL connection spike out of nowhere

We're running a website spread across three severs. Two of them are load-balanced web servers, and the last is a dedicated mysql server. The mysql server is running RHEL5 64-bit using a 2.6.18-92.1.6.el5 #1 SMP kernel and MySQL 5.0.45. It's a pretty beefy server too, with a Xeon L5420 and 8 gigs of RAM. Our web server's php pages are configured to use mysqli.

Normally we don't really utilize all the resources we're given, we do 20-25 queries a second in the daytime. However, every so often, we'll smack headfirst into the max database connections limit and with that our site croaks. What's more, it's seems to happen at night, where the site's traffic should be at a minimum.

We started out at 100 max connections, we upped it to 300 and yet it still occurs. If it makes a difference, we notice that sometimes there are tons of sleeping MySQL processes, yet nothing that connects to the database utilizes persistent connections. It does not happen every night, we had some issues where it would croak every night and then it was fine for about a week and a half until today.

We don't have any monster queries that would tie up the database for minutes at a time. We've tried glancing over the SLOW_QUERY log. We have a few queries that show up in there, but generally they don't last for more than 1 or 2 seconds and those are quite infrequent.

Does that sound like anything in particular? How would we proceed from here in terms of diagnosing the problem?


Solution 1:

My guess is that you have some long running queries in your application. When they are executed they cause the connection to stay checked out of the pool for a long time (relative to the usual usage pattern), this causes your pool to become exhausted, grow, and continue to grow up to its maximum, at which point any remaining workers block waiting on connections to be released.

The first thing will be to track down when this happens, that is, is it a cyclical event, or random. If its the former you're in luck, as you can be ready time it happens. If you can't determine a pattern then you'll have to be vigilant.

You may be able to figure this from looking at your website monitoring logs, or sar from your database to see if there are any correlating spikes.

If you can catch your database when its under load, you should execute the following commands on the mysql server

show innodb status;
show processlist;

The former will print out diagnostic information about the innodb engine (you are using innodb right?), the latter will print out the first few hundred chars of the query that was executing. Look for queries that have been running for a long time, queries generating temporary tables on disk, and queries that are blocked on a resource.

After that, the hard work begins. Use EXPLAIN to estimate the cost of the query, and the resources it uses. Avoid queries that require sorting on disk via a tmp table. Look for long running reporting jobs, or other scheduled maintenance tasks that periodically lock or saturate your database. It could be something as simple as the backup task, or a job that rolls up old purchase order data.

I recommend having these three settings in your /etc/my.cnf

log_slow_queries
log-queries-not-using-indexes
set-variable = long_query_time=1

For a web application doing 20-30 requests per second, you can't afford to have anything show up in these logs.

btw, IMHO its pointless to increase your connection pool's size beyond your original size as this will only delay the onset of pool exhaustion by at best, a few seconds, and only put more pressure on your db right when it doesn't need it.

Solution 2:

I have seen this before.

We had a cron doing a mysqldump of a database with MyISM tables. Because of the MyISM mysql dump would lock entire tables. causing the queries (and therefore connections) to queue up.

Solution 3:

AlexMax, Where you able to solve this issue? While there are suttle differences I am currently seeing similar problems as you describe here with 14 web servers load balanced to 8 mysql servers (each website is codded to one of the web servers). Connections will spike and 90%+ of the ones in mysql's processlist are listed as sleep with Query NULL. This will make mysql stop allowing connections and lasts approx. 2-3 minutes. Mysql 5.0.70 and PHP 5.28 on Quad Xeons (32bit for us though).