I am having issues with a server running out of physical memory, and I'm having trouble discerning whether it is caused by my application's Java process or by something else on the server. Take the following scenario:

Server physical memory: 3747MB
Java -Xms64m
Java -Xmx512m
Java -XX:MaxPermSize=512m

When I boot up the server, the OS (RHEL) reports that 487MB are being used, according to your favorite memory reporting tool (top, grep Mem /proc/meminfo, free -m, etc.). When I start my Java process (PID 123), it uses around 215MB of physical memory (resident set size, as reported by the RES column in top or RSS from ps for PID 123), taking my total used memory up to around 700MB.
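
For reference, the exact sort of commands I mean (PID 123 assumed as above):

# ps -o rss= -p 123 | awk '{ printf("RES: %.2fMB\n", $1/1024) }'
# free -m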

If I let it run for an entire day, the RES memory for my process fluctuates a little but stays generally consistent. However, the total server memory usage steadily increases by around 1500MB, taking it to a total of about 2200MB.
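
To track this over the day I can run a crude logging loop along these lines (5-minute interval and PID 123 assumed), correlating the process RES with the system-wide figures:

# while true; do date; ps -o rss= -p 123; grep -E 'MemFree|Buffers|^Cached|Swap' /proc/meminfo; sleep 300; done >> /tmp/mem.log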

If my Java heap or perm gen were growing, wouldn't that be reflected in the process's RES memory?
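
A more direct check, assuming the JDK tools are available on the server, would be something like jstat, which reports heap and perm gen occupancy; for example, sampling PID 123 every 5 seconds:

# jstat -gcutil 123 5s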

Also, I can't seem to account for that extra 1500MB anywhere.

# ps aux | awk '{ RES+=$6 } END { printf("RES: %.2fMB\n", RES/1024) }'
RES: 722.23MB
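
If the difference isn't in any process's RES, I assume it has to show up somewhere in /proc/meminfo (page cache, buffers, slab, and so on), which per-process figures don't cover:

# grep -E 'MemFree|Buffers|^Cached|Slab' /proc/meminfo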

Can anyone help me find that lost memory? I am basically trying to figure out if this is my problem with the application, or the infrastructure team's problem with their server build.


Linux reclaims memory lazily: recently used pages (most notably the page cache) are kept around rather than being scrubbed and marked "really free", on the theory that scrubbing costs effort, while leaving the data in place costs nothing and may save a lot of work if somebody needs it again. Don't worry about "free memory" reports.

Look instead at how much (if any) swap is being used. Swap is essentially disk space for memory requirements that really do overflow physical memory; disk is extremely slow, so you don't want to need it.

If you are worried about performance, install and configure monitoring software such as the venerable sar (from the sysstat package; surely there is a package for your system). It records what is going on in minute detail for later perusal, and with those reports in hand you'll know what (if anything) is your bottleneck. The quip that "premature optimization is the root of all evil" exists because people are notoriously bad at guessing where the real performance problems are, and end up "fixing" something that is working perfectly fine.
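
For example (package name assumed for RHEL), something like this shows current swap usage, ongoing paging activity, and, once sysstat's cron jobs have collected some data, the historical memory picture:

# swapon -s
# vmstat 5
# yum install sysstat
# sar -r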