Is it possible to get a list of running processes with a CloudWatch alarm?

We have an EC2 instance (Ubuntu) that runs a few Java-based applications, and lately we've been hit with CPU utilization spikes that trigger one of our CloudWatch alarms. By the time we log in to the server to look at the CPU utilization, things have calmed down.

What we'd love to see in one of the alarm emails is a list of running processes and their CPU utilization (%) at the time of the alarm. Is this even possible?


Solution 1:

I suggest using process accounting and running atop to collect system data snapshots every 10 minutes (the default), or every 5 minutes if you need better resolution.

apt-get install atop acct

Then you can easily check what was going on at a given time using syntax like:

atop -r atop.log.file -b 00:00 -e 00:05

The example above shows the system usage snapshots recorded between 00:00 and 00:05.
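As a rough sketch, the same replay command can be built for a window around the time an alarm fired. The function name `atop_replay_command` and the 5-minute margins are made up for illustration, and the log location `/var/log/atop/atop_YYYYMMDD` is an assumption about the Debian/Ubuntu package defaults, so check where your installation actually writes its logs. The alarm time is assumed to already be in the server's local time zone.

```python
from datetime import datetime, timedelta

# Assumed default directory for atop's daily raw logs on Debian/Ubuntu.
ATOP_LOG_DIR = "/var/log/atop"


def atop_replay_command(alarm_time, minutes_before=5, minutes_after=5,
                        log_dir=ATOP_LOG_DIR):
    """Build an `atop -r FILE -b HH:MM -e HH:MM` command around alarm_time."""
    start = alarm_time - timedelta(minutes=minutes_before)
    end = alarm_time + timedelta(minutes=minutes_after)

    # One log file covers one day, so clamp the window to the alarm's day.
    day_start = alarm_time.replace(hour=0, minute=0, second=0, microsecond=0)
    day_end = alarm_time.replace(hour=23, minute=59, second=0, microsecond=0)
    start = max(start, day_start)
    end = min(end, day_end)

    log_file = f"{log_dir}/atop_{alarm_time:%Y%m%d}"
    return ["atop", "-r", log_file,
            "-b", f"{start:%H:%M}", "-e", f"{end:%H:%M}"]
```

The returned list can be passed to `subprocess.run(...)` or joined into a string to paste into a shell. Keep in mind that a window narrower than the snapshot interval may contain no samples at all.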

Solution 2:

I don't have experience doing anything similar, but in theory, it's possible to do it with existing building blocks:

CloudWatch -> SNS -> HTTP/HTTPS -> homebrew webapp -> collect data and email it
  • Set up your CloudWatch alarm so that it publishes an SNS message to a topic when it goes off.
  • Have a webapp running on your EC2 instance which, when a particular address is hit, collects the list of running processes and emails it.
  • Add a subscription to the SNS topic with the webapp's endpoint. You can choose either HTTP or HTTPS as the protocol.
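The steps above could be sketched with nothing but the Python standard library. This is an untested-in-production illustration, not a finished service: the email addresses, the port, the local SMTP server on `localhost`, and all function names are assumptions. It relies on the `Type`, `SubscribeURL`, `Subject` and `Message` fields of the JSON body that SNS posts to HTTP/HTTPS endpoints, and it does not verify the SNS message signature, which a real deployment should do.

```python
import json
import smtplib
import subprocess
import urllib.request
from email.message import EmailMessage
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical addresses; replace with your own.
MAIL_FROM = "alarms@example.com"
MAIL_TO = "ops@example.com"


def collect_processes(limit=20):
    """Return the top `limit` processes by CPU as text (Linux procps `ps`)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,user,pcpu,pmem,etime,args", "--sort=-pcpu"],
        capture_output=True, text=True, check=True,
    ).stdout
    return "\n".join(out.splitlines()[: limit + 1])  # +1 keeps the header row


def send_email(subject, body):
    """Send via an SMTP server assumed to be listening on localhost."""
    msg = EmailMessage()
    msg["From"], msg["To"], msg["Subject"] = MAIL_FROM, MAIL_TO, subject
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


def confirm_subscription(url):
    """Fetch the SubscribeURL that SNS sends once to activate the subscription."""
    urllib.request.urlopen(url).read()


def handle_sns_message(raw_body, collect=collect_processes,
                       send=send_email, confirm=confirm_subscription):
    """Dispatch one SNS POST body and return a short status string."""
    message = json.loads(raw_body)
    kind = message.get("Type")
    if kind == "SubscriptionConfirmation":
        confirm(message["SubscribeURL"])
        return "confirmed"
    if kind == "Notification":
        subject = message.get("Subject") or "CloudWatch alarm"
        send(subject, message.get("Message", "") + "\n\n" + collect())
        return "emailed"
    return "ignored"


class SnsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        handle_sns_message(self.rfile.read(length))
        self.send_response(200)
        self.end_headers()


def run_server(port=8080):
    """Listen on all interfaces; the port number is an arbitrary choice."""
    HTTPServer(("", port), SnsHandler).serve_forever()
```

Calling `run_server()` starts the listener; the endpoint then has to be reachable from SNS. Note that `ps` reports %CPU averaged over each process's lifetime, so a short spike may look smaller than it was; swapping in `top -b -n 1` inside `collect_processes` is one alternative.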

You can combine this with the suggestion to use atop and configure your webapp to send the atop output for the most recent N minutes.