EC2 instance automatically becomes unavailable - 504 Gateway timeout

I have a t3a.micro instance running WordPress with pretty low traffic. The instance periodically becomes unavailable, resulting in a 504 Gateway Timeout error, and at that moment I can't connect to it over SSH either. It can happen at any time of day, sometimes daily and sometimes not at all for a whole week, and there is no traffic spike when it goes down. I asked AWS Support about this and got the following answer:

Upon checking from my end, I was able to see that the instance has failed the instance status check[1] several times in the past 24 hours, indicating that the instance has issues at the OS level. Upon further checking the console logs, I was able to see the following Out of Memory errors:

[ 5626.002561] Out of memory: Killed process 2050 (apache2) total-vm:552924kB, anon-rss:54868kB, file-rss:0kB, shmem-rss:32076kB, UID:33 pgtables:848kB oom_score_adj:0

[ 5674.174673] Out of memory: Killed process 1788 (apache2) total-vm:624936kB, anon-rss:51952kB, file-rss:0kB, shmem-rss:34184kB, UID:33 pgtables:856kB oom_score_adj:0

[ 5763.820732] Out of memory: Killed process 1815 (apache2) total-vm:550384kB, anon-rss:51604kB, file-rss:0kB, shmem-rss:34532kB, UID:33 pgtables:836kB oom_score_adj:0

[ 5773.275938] Out of memory: Killed process 1973 (apache2) total-vm:624744kB, anon-rss:52260kB, file-rss:0kB, shmem-rss:32136kB, UID:33 pgtables:856kB oom_score_adj:0

[ 5959.347157] Out of memory: Killed process 2014 (apache2) total-vm:552440kB, anon-rss:54020kB, file-rss:0kB, shmem-rss:28856kB, UID:33 pgtables:844kB oom_score_adj:0

[ 6438.787255] Out of memory: Killed process 2165 (apache2) total-vm:624756kB, anon-rss:51948kB, file-rss:0kB, shmem-rss:29836kB, UID:33 pgtables:856kB oom_score_adj:0

A bit about the OOM (Out of Memory) killer: if memory is used up by processes to an extent that threatens the stability of the system, the OOM killer kicks in. Its task is to keep killing processes until enough memory is freed for the rest of the processes the kernel is trying to run. It seems from the output that your instance may have exhausted its memory, so the OOM killer is being invoked by the Linux kernel.
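
For reference, these kill events can also be seen directly in the kernel log on the instance (once it is reachable again), for example:

# Recent OOM-killer events from the kernel ring buffer
dmesg -T | grep -i "out of memory"

# Or, on systemd-based distros, from the kernel messages in the journal
journalctl -k | grep -i "killed process"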

They suggested monitoring the memory usage of each process and killing or fixing whichever process uses too much memory. That doesn't seem like a very practical approach when the process in question is WordPress, which I can't modify even if it uses too much memory for a few minutes.

I have another instance, a t3.small managed by Elastic Beanstalk, that hosts a Java web application in an nginx/Tomcat environment on the Amazon Linux AMI provided by Beanstalk. The same issue happens with this instance: it goes down on its own and shows a 504 Gateway Timeout.

Question: I don't want to upgrade my instances, because they run perfectly fine with the current amount of traffic. Is there any way to handle these issues without upgrading and, of course, without constantly monitoring processes?


Solution 1:

Your main options are:

  • Work out what's using the RAM and configure it to use less RAM (best option)
  • Use an instance with more RAM (which may not fully solve the problem)
  • Use a swap file as a cheap way to increase RAM - I use a swap file just to make sure there's some spare to prevent OOM situations, so while it won't solve the problem it's a good idea (a typical setup is sketched just after this list).
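
If you do want to add a swap file, the usual recipe on a stock Linux instance is roughly the following (the 512M size and /swapfile path are just examples, adjust to taste):

# Create and enable a 512MB swap file
sudo fallocate -l 512M /swapfile
# (use: sudo dd if=/dev/zero of=/swapfile bs=1M count=512  if fallocate isn't supported on your filesystem)
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Make it persistent across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab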

I run five fairly low-volume WordPress websites, MySQL, and a few other tools on a t3a.nano (512MB RAM) plus 512MB of swap.

Here are a few ways to reduce the memory usage of a typical WordPress system (off the top of my head):

  • Limit the number of PHP threads / workers. I think I allow maybe 2-3 threads; if one isn't available the web server waits until one is, which usually isn't long. This one is key, as PHP / WordPress is a massive memory hog (a sketch of the relevant PHP-FPM settings follows this list).
  • Limit the maximum memory available to each PHP thread.
  • Turn off MySQL performance schema
  • Tune MySQL parameters to reduce memory usage. You basically just have to read the documentation, but I'll copy my current config below. This is from my Windows PC, but Linux will be similar.
  • Check your web server's memory usage; if it's Apache and usage is high, you could consider Nginx, which is fast and small.
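
To make the first two points concrete: if your stack uses PHP-FPM, the worker count and per-request memory cap live in the pool configuration. This is only a sketch - the path below is the Debian/Ubuntu default and will differ on other distros and PHP versions, and the numbers are a starting point rather than tuned values:

; /etc/php/7.4/fpm/pool.d/www.conf (path varies by distro / PHP version)
[www]
; small fixed pool instead of letting workers grow on demand
pm = static
pm.max_children = 3
; recycle each worker after N requests to contain slow memory leaks
pm.max_requests = 500
; hard cap on memory per request
php_admin_value[memory_limit] = 128M

If WordPress runs under mod_php instead (the apache2 processes in your OOM log suggest Apache), the equivalent knob is MaxRequestWorkers in the mpm_prefork configuration.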

You should also search for "LAMP (or LEMP) WordPress reduce memory usage".

Here's the MySQL config I'm using. It's my new config for MySQL 8.x and hasn't been tested, but it's quite similar to my MySQL 5 config so it should be OK. It's optimised for low memory usage, not high performance, and I'm not a MySQL expert, so it's probably not great.

[mysqld]
# set basedir to your installation path
basedir=(insert here)
# set datadir to the location of your data directory
datadir=(insert here)

# Turn off performance schema
performance_schema=OFF

# Turn off the binary log
skip-log-bin

# Disable monitoring
innodb_monitor_disable=all

# RAM optimisation settings overall
innodb_buffer_pool_size=50M
innodb_buffer_pool_instances=1
# Note: 'unbuffered' is a Windows-only value; on Linux use O_DIRECT or omit this line
innodb_flush_method=unbuffered
innodb_log_buffer_size=1048576
innodb_log_file_size=4194304
innodb_max_undo_log_size=10485760
innodb_sort_buffer_size=64K
innodb_ft_cache_size=1600000
max_connections=20
key_buffer_size=1M

# Reduce RAM: per thread or per operation settings
thread_stack=140K
thread_cache_size = 2
read_buffer_size=8200
read_rnd_buffer_size=8200
max_heap_table_size=16K
tmp_table_size=128K
temptable_max_ram=2097152
bulk_insert_buffer_size=0
join_buffer_size=128
net_buffer_length=1K

# Slow query log
slow_query_log=OFF
#long_query_time=5
#log_slow_rate_limit=1
#log_slow_rate_type=query
#log_slow_verbosity=full
#log_slow_admin_statements=ON
#log_slow_slave_statements=ON
#slow_query_log_always_write_time=1
#slow_query_log_use_global_control=all

# Logs
log_error = (wherever)
general_log = ON
general_log_file  = (wherever)
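
Whatever values you end up with, it's worth restarting the services (the MySQL unit may be called mysql, mysqld or mariadb depending on the distro) and sanity-checking the effect, for example:

sudo systemctl restart mysql
# Overall memory picture, including swap
free -m
# Top memory consumers by resident set size
ps aux --sort=-rss | head -n 10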