Python Django sites on Apache+mod_wsgi with nginx proxy: highly fluctuating performance

I have an Ubuntu 10.04 box running several dozen Python Django sites using mod_wsgi (embedded mode; the faster mode, if properly configured). Performance fluctuates heavily: sometimes responses are fast, sometimes there is a delay of several seconds. The Smokeping graphs are all over the place.

Recently, I also added an nginx proxy for the static content, in the hopes it would cure the highly fluctuating performance. But, even though it reduced the number of requests Apache has to process significantly, it didn't help with the main problem.

When clicking around on websites while running htop, it can be seen that sometimes requests are almost instant, whereas sometimes it causes Apache to consume 100% CPU for a few seconds. I really don't understand where this fluctuation comes from.

I have configured the mpm_worker for Apache like this:

StartServers          1
MinSpareThreads      50
MaxSpareThreads      50
ThreadLimit          64
ThreadsPerChild      50
MaxClients           50
ServerLimit          1
MaxRequestsPerChild  0
MaxMemFree           2048

1 server with 50 threads, max 50 clients. Munin and apache2ctl -t both show a consistent presence of workers; they are not destroyed and created all the time. Yet the server behaves as if they were.

This tells me that once a sub interpreter is created, it should remain in memory, yet it seems sites have to reload all the time.

I also have a nginx+gunicorn box, which performs quite well. I would really like to know why Apache is so random.

This is a virtual host config:

<VirtualHost *:81>
    ServerAdmin [email protected]
    ServerName example.com

    DocumentRoot /srv/http/site/bla

    Alias /static/ /srv/http/site/static
    Alias /media/ /srv/http/site/media
    WSGIScriptAlias / /srv/http/site/passenger_wsgi.py

    <Directory />
            AllowOverride None
    </Directory>

    <Directory /srv/http/site>
            Options -Indexes FollowSymLinks MultiViews
            AllowOverride None
            Order allow,deny
            allow from all
    </Directory>
</VirtualHost>

  • Ubuntu 10.04
  • Apache 2.2.14
  • mod_wsgi 2.8
  • nginx 0.7.65

Edit: I've put some code in the settings.py file of a site that writes the date to a tmp file whenever it's loaded. I can now see that the site is not randomly reloaded all the time, so Apache must be keeping it in memory. So, that's good, except it doesn't bring me closer to an answer...
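A check of this kind might look like the sketch below. The helper name `record_load` and the log file name are hypothetical, and the process id is included only so that loads in different processes can be told apart.

```python
import os
import tempfile
import time


def record_load(marker_path=None):
    """Append one line (timestamp and process id) each time this runs."""
    if marker_path is None:
        # Hypothetical file name; any writable location works.
        marker_path = os.path.join(tempfile.gettempdir(),
                                   "django_settings_loads.log")
    line = "%s pid=%d\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), os.getpid())
    with open(marker_path, "a") as handle:
        handle.write(line)
    return marker_path


# Called at import time, so each fresh load of settings.py adds one line.
record_load()
```

If the file only gains a line when Apache is restarted, the settings module is not being re-imported between requests.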

Edit: I just found an error that might also be related to this:

  File "/usr/lib/python2.6/subprocess.py", line 633, in __init__
    errread, errwrite)

  File "/usr/lib/python2.6/subprocess.py", line 1049, in _execute_child
    self.pid = os.fork()

OSError: [Errno 12] Cannot allocate memory

The server has 600 of 2000 MB free, which should be plenty. Is there a limit that is set on Apache or WSGI somewhere?
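One way to look for such a limit is to print the resource limits and the kernel overcommit mode from inside the process that fails to fork. The sketch below is only a starting point: the helper names are hypothetical, it assumes Linux, and the numbers are only meaningful when it runs inside the Apache/mod_wsgi process rather than in a login shell. fork() can fail with ENOMEM even when free memory is reported, for example under a strict overcommit policy or an address-space limit, so these are possible causes and not a diagnosis.

```python
import resource


def describe_limit(name):
    """Return (soft, hard) for a limit, using 'unlimited' for RLIM_INFINITY."""
    soft, hard = resource.getrlimit(getattr(resource, name))

    def show(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return show(soft), show(hard)


def read_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Return the kernel overcommit mode as a string, or None if unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except IOError:
        return None


if __name__ == "__main__":
    for name in ("RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_NPROC"):
        print("%s soft=%s hard=%s" % ((name,) + describe_limit(name)))
    print("vm.overcommit_memory=%s" % read_overcommit())
```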


Solution 1:

Have you tried using New Relic to identify whether the problem lies in your web application? There is a free tier and also an initial full trial. For an overview of what it can give you, see:

  • http://lanyrd.com/2012/pycon/spcdg/

If no specific issue with the web application or a backend service it uses stands out, the WSGI server capacity analysis report may show something, as it will tell you whether you are running out of capacity. It can also tell you whether you are over-provisioned and wasting resources, which is actually quite often the case.

  • http://blog.newrelic.com/2012/09/11/introducing-capacity-analysis-for-python/
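Short of a full monitoring tool, a rough first look at which requests are slow can come from a small piece of WSGI middleware. The following is a minimal sketch and not a substitute for proper instrumentation: the class name `TimingMiddleware`, the logger name, and the one-second threshold are all assumptions made for illustration.

```python
import logging
import os
import threading
import time

log = logging.getLogger("request_timing")  # hypothetical logger name


class TimingMiddleware(object):
    """Wrap a WSGI app and log calls that take longer than a threshold."""

    def __init__(self, app, slow_seconds=1.0):
        self.app = app
        self.slow_seconds = slow_seconds

    def __call__(self, environ, start_response):
        started = time.time()
        try:
            # Measures only the time until the app returns its response
            # iterable, not the time spent sending the body to the client.
            return self.app(environ, start_response)
        finally:
            elapsed = time.time() - started
            if elapsed >= self.slow_seconds:
                log.warning(
                    "slow request %s took %.3fs pid=%d thread=%s",
                    environ.get("PATH_INFO", ""),
                    elapsed,
                    os.getpid(),
                    threading.current_thread().name,
                )
```

In the WSGI script file this would be applied as `application = TimingMiddleware(application)`. Logging the process id and thread name makes it possible to see whether the slow requests cluster in one process or thread.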

BTW, in general I would recommend against using 50 request threads in one process. You are better off using about 5 threads and multiple processes. Exactly what is best really does depend on the specific application: whether it does a lot of CPU-bound work, and how many long-running requests it has to handle. Serving a lot of static files from the same Apache can also have an impact, and daemon mode of mod_wsgi may even be a better overall solution.

You are also on a very old mod_wsgi version, although I don't believe that would cause an issue.

Finally, to avoid issues with some third party C extension modules for Python, if this is the only WSGI application on that server, set:

WSGIApplicationGroup %{GLOBAL}
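To confirm which mode and application group a site actually ends up running under, mod_wsgi exposes `mod_wsgi.process_group` and `mod_wsgi.application_group` in the WSGI environ. The sketch below is a throwaway test application built on those two keys. The function name `describe_mod_wsgi` is hypothetical, and it is worth checking that the installed mod_wsgi version sets these keys before relying on the output.

```python
def describe_mod_wsgi(environ):
    """Summarise the mod_wsgi process and application group for a request."""
    process_group = environ.get("mod_wsgi.process_group")
    application_group = environ.get("mod_wsgi.application_group")

    if process_group is None:
        mode = "not running under mod_wsgi"
    elif process_group == "":
        # An empty process group means the request ran inside Apache itself.
        mode = "embedded mode"
    else:
        mode = "daemon mode (process group %r)" % process_group

    if application_group is None:
        interpreter = "unknown"
    elif application_group == "":
        # An empty application group is the main interpreter, i.e. GLOBAL.
        interpreter = "main interpreter"
    else:
        interpreter = "sub interpreter %r" % application_group

    return mode, interpreter


def application(environ, start_response):
    """Tiny WSGI app that reports the summary as plain text."""
    mode, interpreter = describe_mod_wsgi(environ)
    body = ("%s; %s\n" % (mode, interpreter)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```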

Solution 2:

I fixed it. I converted all the production sites to run in daemon mode, each in its own process, and put all the development sites together in one shared daemon process. The Smokeping graphs are a lot better now, and performance is steady.

This still leaves me in the dark about why embedded mode had these problems, because as far as I can tell I had no process creation/destruction, but at least I have a better running server.

Edit:

As an example, in an Apache site configuration:

WSGIDaemonProcess mysite12 processes=1 threads=10 display-name=%{GROUP}
WSGIProcessGroup mysite12

And then for low priority sites I put this in wsgi.conf:

WSGIDaemonProcess developmentsites processes=1 threads=15 display-name=%{GROUP}

And then in the Apache conf of each such site:

WSGIProcessGroup developmentsites

Look at the difference (part of which is also due to the nginx proxy):

[image]