Daily Apt Upgrade causing oom-killer to kill my java server

I don't know Amazon EC2 specifically, but I had similar issues on a DigitalOcean node running Jira (which is written in Java).

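One main problem with languages such as Java (Python, JavaScript, C#...) is that they use what's called garbage collection. You never have to free an object yourself; the runtime reclaims it some time after all references to it are gone. The catch is that "some time after" is not "immediately", so the process can hold a lot more memory than it actually needs at that moment.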

One issue can be that the garbage collector does not run often enough to keep memory usage as low as possible. For example, if the server gets busy, it will handle the requests before it tries to collect memory. How the collector behaves is something you can tune with command-line options when you start a Java program.
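
To see how often the collector actually runs in your setup, the JVM exposes counters through the standard java.lang.management API. The following is only a minimal sketch: the class name GcReport and the helper describe are made up for illustration, and in a real server you would log these numbers from inside the running application rather than from a separate program.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of any library.
final class GcReport {
    /** Formats one collector's counters as a single line. */
    static String describe(String name, long count, long timeMs) {
        return String.format("%s: %d collections, %d ms total", name, count, timeMs);
    }

    public static void main(String[] args) {
        // One bean per collector (for example a young and an old generation collector).
        // Counts and times are cumulative since JVM start; -1 means "not available".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(
                describe(gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
    }
}
```

If the counts barely move while memory usage keeps climbing, that is a hint the collector is not being triggered as often as you would like.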

The other options I used for memory are -Xms and -Xmx, which set the initial and the maximum size of the Java heap. When the heap reaches that maximum, the garbage collector is forced to run to reclaim as much memory as possible. Java may still stop with an OutOfMemoryError (e.g. if you have a leak, or too many connections at once requiring more memory than allowed), but as long as the limit fits comfortably in your RAM, running APT in parallel should not end up killing your Java app.
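
A quick way to verify that the limits you passed are the ones the JVM is really using is to ask the Runtime. This is a sketch with a made-up class name (HeapLimits); the 256m/512m values in the usage line are placeholders, not a recommendation.

```java
// Hypothetical helper class, not part of any library.
final class HeapLimits {
    /** Converts a byte count to whole mebibytes (rounding down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is roughly the -Xmx value; some collectors report slightly less.
        System.out.println("max heap:         " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory() is the heap currently reserved by the JVM (starts near -Xms).
        System.out.println("reserved heap:    " + toMiB(rt.totalMemory()) + " MiB");
        // freeMemory() is the unused part of the reserved heap.
        System.out.println("free in reserved: " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Compile it and run it with something like `java -Xms256m -Xmx512m HeapLimits`. Keep in mind that the heap is not the whole process: thread stacks, class metadata and native buffers come on top, so the limit should leave some headroom below the RAM you actually have.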

Finally, to fix my issue, what I did, on top of verifying my parameters, was to add a swap file. That uses some of your disk space, but I think it's worth it to keep your service from dying all the time. However, it should not be abused. If you swap all the time, it's better to get a bigger machine (double the RAM) because otherwise your system is going to be really slow. But if the automatic APT updates take around 15 minutes, it can be worth swapping a bit during that window instead of crashing.
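
To tell "swaps a bit during the update" apart from "swaps all the time", you can look at the SwapTotal and SwapFree lines in /proc/meminfo on Linux. Here is a small sketch of that check; the class name SwapCheck and the helper readKb are made up for illustration, and the parsing assumes the usual "Key:   value kB" layout of that file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper class, not part of any library. Linux only.
final class SwapCheck {
    /** Returns the value in kB for a /proc/meminfo key such as "SwapTotal", or -1 if absent. */
    static long readKb(List<String> lines, String key) {
        String prefix = key + ":";
        for (String line : lines) {
            if (line.startsWith(prefix)) {
                // Lines look like "SwapTotal:       2097148 kB".
                String[] parts = line.substring(prefix.length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of("/proc/meminfo"));
        long total = readKb(lines, "SwapTotal");
        long free = readKb(lines, "SwapFree");
        if (total <= 0) {
            System.out.println("No swap configured.");
        } else {
            System.out.println("Swap used: " + (total - free) + " kB of " + total + " kB");
        }
    }
}
```

Run it a few times during the day and once while the APT update is going. If swap is heavily used outside of the update window, that points toward needing more RAM rather than more swap.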

Another solution is to restart your service once in a while. That way all the memory the process was holding, including whatever was not yet collected, is released. Of course, any user who tries to connect while the service restarts is going to get an error (500 or 503).

Note: It could be that you get more hits around the time the APT auto-updates happen. If that's the case, you could look at changing the time at which the APT processes run. This is an annoying one, though, when it is defined as a "daily" cron job, since all the dailies run one after the other at a given time. You'd have to move all of these jobs to another time of day or change the way the APT updates are set up. (On newer Debian/Ubuntu releases the runs are scheduled by the systemd timers apt-daily.timer and apt-daily-upgrade.timer instead, which can be rescheduled on their own.)