How to minimise SpamAssassin (spamd) memory use
I think you're misunderstanding the way Linux reports memory usage. When a process forks, the resulting child process shares many resources with the original process, including memory. Linux uses a technique known as copy-on-write (COW) for this: each forked child sees the same data in memory as the original process, and only when that data is written to (by the child or the parent) is the affected page copied, so that the writer ends up with its own private copy at a new location.
Until one of the processes makes changes to that data, they are sharing the same copy. As a result, I could have a process that uses 100MB of RAM, and fork it 10 times. Each of those forked processes would show 100MB of RAM being used, but if you looked at the overall memory usage on the box, it might only show that 130MB of RAM is being used (100MB shared between the processes, plus a few MB of overhead, plus another dozen MB or two for the rest of the system).
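To see the difference between "what each process shows" and "what is really used", one option is to compare RSS with PSS (proportional set size, where each shared page is divided among the processes sharing it). The following is only a rough sketch, assuming a Linux kernel that exposes /proc/<pid>/smaps; the function names and the sample numbers are made up for illustration.

```python
import re


def sum_smaps_fields(smaps_text):
    """Sum the Rss and Pss lines (in kB) found in a /proc/<pid>/smaps dump."""
    totals = {"Rss": 0, "Pss": 0}
    for line in smaps_text.splitlines():
        # Matches e.g. "Rss:      892 kB"; the colon right after the name
        # keeps related fields such as "Pss_Dirty:" from being counted.
        m = re.match(r"^(Rss|Pss):\s+(\d+) kB", line)
        if m:
            totals[m.group(1)] += int(m.group(2))
    return totals


def read_smaps(pid):
    """Read the smaps file of a process (Linux only, needs permission)."""
    with open(f"/proc/{pid}/smaps") as f:
        return f.read()


# Hypothetical excerpt: one large mapping shared with other processes
# and one small private mapping. The figures are invented.
SAMPLE = """\
Size:             102400 kB
Rss:              102400 kB
Pss:                9309 kB
Size:               2048 kB
Rss:                2048 kB
Pss:                2048 kB
"""

if __name__ == "__main__":
    print(sum_smaps_fields(SAMPLE))
```

On a real box you would call sum_smaps_fields(read_smaps(pid)) for each spamd child. Adding up the Pss totals should give a far more realistic figure than adding up the Rss totals, because shared pages are not counted once per process.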
As a final example, I have a box right now with 30 apache processes running. Each process is showing a usage of 22MB of RAM. However, when I run free -m to show my overall RAM usage, I get:
topher@crucible:/tmp$ free -m
             total       used       free     shared    buffers     cached
Mem:           349        310         39          0         24         73
-/+ buffers/cache:        212        136
Swap:          511         51        460
As you can see, this box doesn't even have enough RAM to run 30 processes if each of them were really using that much "real" RAM. Unless you're literally running out of RAM or your apps are swapping heavily, I wouldn't worry about it.
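The arithmetic behind that claim, as a tiny sketch. It uses the 30 processes and the 22MB per-process figure quoted above together with the numbers from the free -m output; the function and variable names are only illustrative.

```python
def naive_total_mb(process_count, per_process_mb):
    """Memory needed if every process really had its own private copy."""
    return process_count * per_process_mb


TOTAL_RAM_MB = 349         # "total" column of the Mem: row
USED_MINUS_CACHE_MB = 212  # "used" column of the -/+ buffers/cache row

naive = naive_total_mb(30, 22)

if __name__ == "__main__":
    print(f"naive sum: {naive} MB, physical RAM: {TOTAL_RAM_MB} MB, "
          f"actually used: {USED_MINUS_CACHE_MB} MB")
```

The naive sum comes to 660MB, nearly twice the physical RAM in the box, while only 212MB is in use once buffers and cache are discounted. The gap is the memory the processes share.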
UPDATE: Also, check out the tool smem, mentioned by jldugger in an answer to another question on Linux memory usage.
Using sa-compile you might be able to improve the matching speed of many rules.
Here's what I have done.
I have a setup where a lot of messages tend to be delivered at roughly the same time; for a series of experiments I run SA on messages which are copied to a temporary spool and then delivered by a cron job every five minutes. spamd would keep on printing "maybe you should increase the max-children parameter", and I had it raised to 40 at one point, but that left the server consuming all its swap space and crashing.
Now I have implemented a different regime where delivery is governed by a Procmail lock file. Because it was simple to do, I just use the last digit of the process ID to pick the lock file, and run with 10 children. I'm not at all sure this is optimal, but it has already helped avoid the insane load peaks I would experience from time to time.
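LINEBUF=10240
# Grab last digit of PID for lockfile
PID=$$
:0
* PID ?? ()\/[0-9]$
{ D=$MATCH }
# Skip SpamAssassin for messages larger than 512000 bytes
:0
* > 512000
{ SA="(too large)" }
# Otherwise pipe the message to spamc while holding lock file /tmp/20spamc.$D
:0Ew:/tmp/20spamc.$D
SA=| spamc -p 38783 -l -y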
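For anyone who wants the same throttling idea outside Procmail, here is a rough Python sketch of the technique: pick one of ten lock files from the last digit of the PID, so at most ten scans run at once. Note that it uses flock() rather than Procmail's own lock file mechanism, and the helper names slot_lock_path and delivery_slot are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


def slot_lock_path(pid, prefix="/tmp/20spamc."):
    """Map a PID to one of ten lock files using its last decimal digit."""
    return f"{prefix}{pid % 10}"


@contextmanager
def delivery_slot(pid=None, prefix="/tmp/20spamc."):
    """Hold an exclusive lock on this PID's slot for the duration of the block."""
    path = slot_lock_path(os.getpid() if pid is None else pid, prefix)
    with open(path, "w") as fh:
        # Assumption: flock() stands in for Procmail's lock files here.
        # This call blocks until no other process holds the same slot.
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield path
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


if __name__ == "__main__":
    with delivery_slot() as lock:
        # The call to spamc would go here.
        print("holding", lock)
```

Two deliveries whose PIDs end in the same digit wait for each other even when other slots are idle, so this is a crude limiter rather than a fair queue. That is the same trade-off as in the recipe above.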
In addition, I start up spamd with a number of ulimit restrictions. The numbers were taken from http://svn.apache.org/repos/asf/spamassassin/trunk/contrib/run-masses except that I removed the ulimit -u restriction. (I'm not sure what's going on there; 32 is way too small in any event. With something like 500 I could keep spamd running for a while, but it would eventually run into the limit.)
ulimit -v 204800
ulimit -m 204800
ulimit -n 256
#ulimit -u 32
perl -T -I lib -w spamd --min-children 2 --max-children 10 --max-spare 5 etc etc
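If you launch spamd from a wrapper script rather than a shell, the same limits can be expressed with Python's resource module. This is just a sketch of the equivalent, assuming the usual mapping of ulimit -v, -m and -n to RLIMIT_AS, RLIMIT_RSS and RLIMIT_NOFILE; the helper names are made up. The shell takes the two memory limits in KiB, whereas setrlimit wants bytes.

```python
import resource

KIB = 1024


def ulimit_to_rlimits(v_kib=204800, m_kib=204800, n_files=256):
    """Translate the ulimit -v / -m / -n values above into rlimit values."""
    return {
        resource.RLIMIT_AS: v_kib * KIB,   # ulimit -v: address space, bytes
        resource.RLIMIT_RSS: m_kib * KIB,  # ulimit -m: resident set, bytes
        resource.RLIMIT_NOFILE: n_files,   # ulimit -n: open file descriptors
    }


def apply_limits(limits):
    """Set soft and hard limits for the current process (and its children)."""
    for res, value in limits.items():
        resource.setrlimit(res, (value, value))


# Hypothetical usage: apply the limits in the child just before exec, e.g.
#   subprocess.Popen(cmd, preexec_fn=lambda: apply_limits(ulimit_to_rlimits()))
```

As far as I know, current Linux kernels do not enforce the RSS limit, so the address space limit is probably the one that actually bites.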
I guess I will end up with delivery failures if the load is too high for an extended time, but so far, it seems I have managed to reduce the load to manageable levels with this; and a bunch of failed deliveries is still much better than the machine running out of swap.