Elastic Beanstalk disable health state change based on 4xx responses

I have a rest api running on Elastic Beanstalk, which works great. Everything application-wise is running good, and working as expected.

The application is a rest api, used to lookup different users.

example url: http://service.com/user?uid=xxxx&anotherid=xxxx

If a user with either id's is found, the api responds with 200 OK, if not, responds with 404 Not Found as per. HTTP/1.1 status code defenitions.

It is not uncommon for our api to answer 404 Not Found on a lot of requests, and the elastic beanstalk transfers our environment from OK into Warning or even into Degraded because of this. And it looks like nginx has refused connection to the application because of this degraded state. (looks like it has a threshold of 30%+ into warningand 50%+ into degraded states. This is a problem, because the application is actually working as expected, but Elastic Beanstalks default settings thinks it is a problem, when it's really not.

Does anyone know of a way to edit the threshold of the 4xx warnings and state transitions in EB, or completely disable them?

Or should i really do a symptom-treatment and stop using 404 Not Found on a call like this? (i really do not like this option)


Solution 1:

Update: AWS EB finally includes a built-in setting for this: https://stackoverflow.com/a/51556599/1123355

Old Solution: Upon diving into the EB instance and spending several hours looking for where EB's health check daemon actually reports the status codes back to EB for evaluation, I finally found it, and came up with a patch that can serve as a perfectly fine workaround for preventing 4xx response codes from turning the environment into a Degraded environment health state, as well as pointlessly notifying you with this e-mail:

Environment health has transitioned from Ok to Degraded. 59.2 % of the requests are erroring with HTTP 4xx.

The status code reporting logic is located within healthd-appstat, a Ruby script developed by the EB team that constantly monitors /var/log/nginx/access.log and reports the status codes to EB, specifically in the following path:

/opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.2.0/gems/healthd-appstat-1.0.1/lib/healthd-appstat/plugin.rb

The following .ebextensions file will patch this Ruby script to avoid reporting 4xx response codes back to EB. This means that EB will never degrade the environment health due to 4xx errors because it just won't know that they're occurring. This also means that the "Health" page in your EB environment will always display 0 for the 4xx response code count.

container_commands:
    01-patch-healthd:
        command: "sudo /bin/sed -i 's/\\# normalize units to seconds with millisecond resolution/if status \\&\\& status.index(\"4\") == 0 then next end/g' /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.2.0/gems/healthd-appstat-1.0.1/lib/healthd-appstat/plugin.rb"
    02-restart-healthd:
        command: "sudo /usr/bin/kill $(/bin/ps aux | /bin/grep -e '/bin/bash -c healthd' | /usr/bin/awk '{ print $2 }')"
        ignoreErrors: true

Yes, it's a bit ugly, but it gets the job done, at least until the EB team provide a way to ignore 4xx errors via some configuration parameter. Include it with your application when you deploy, in the following path relative to the root directory of your project:

.ebextensions/ignore_4xx.config

Good luck, and let me know if this helped!

Solution 2:

There is a dedicated Health monitoring rule customization called Ignore HTTP 4xx (screenshot attached) Just enable it and EB will not degrade instance health on 4xx errors. enter image description here

Solution 3:

Thank you for your answer Elad Nava, I had the same problem and your solution worked perfectly for me!

However, after opening a ticket in the AWS Support Center, they recommended me to modify the nginx configuration to ignore 4xx on Health Check instead of modifying the ruby script. To do that, I also had to add a config file to the .ebextensions directory, in order to overwrite the default nginx.conf file:

files:
  "/tmp/nginx.conf":
    content: |

      # Elastic Beanstalk Managed

      # Elastic Beanstalk managed configuration file
      # Some configuration of nginx can be by placing files in /etc/nginx/conf.d
      # using Configuration Files.
      # http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/customize-containers.html
      #
      # Modifications of nginx.conf can be performed using container_commands to modify the staged version
      # located in /tmp/deployment/config/etc#nginx#nginx.conf

      # Elastic_Beanstalk
      # For more information on configuration, see:
      #   * Official English Documentation: http://nginx.org/en/docs/
      #   * Official Russian Documentation: http://nginx.org/ru/docs/

      user  nginx;
      worker_processes  auto;

      error_log  /var/log/nginx/error.log;

      pid        /var/run/nginx.pid;


      worker_rlimit_nofile 1024;

      events {
          worker_connections  1024;
      }

      http {

          ###############################
          # CUSTOM CONFIG TO IGNORE 4xx #
          ###############################

          map $status $loggable {
            ~^[4]  0;
            default 1;
          }

          map $status $modstatus {
            ~^[4]  200;
            default $status;
          }

          #####################
          # END CUSTOM CONFIG #
          #####################

          port_in_redirect off;
          include       /etc/nginx/mime.types;
          default_type  application/octet-stream;


          # This log format was modified to ignore 4xx status codes!
          log_format  main   '$remote_addr - $remote_user [$time_local] "$request" '
                             '$status $body_bytes_sent "$http_referer" '
                             '"$http_user_agent" "$http_x_forwarded_for"';

          access_log  /var/log/nginx/access.log  main;

          log_format healthd '$msec"$uri"'
                             '$modstatus"$request_time"$upstream_response_time"'
                             '$http_x_forwarded_for' if=$loggable;

          sendfile        on;
          include /etc/nginx/conf.d/*.conf;

          keepalive_timeout  1200;

      }

container_commands:
  01_modify_nginx:
    command: cp /tmp/nginx.conf /tmp/deployment/config/#etc#nginx#nginx.conf

Although this solution is quite more verbose, I personally believe that it is safer to implement, as long as it does not depend on any AWS proprietary script. What I mean is that, if for some reason AWS decides to remove or modify their ruby script (believe me or not, they love to change scripts without previous notice), there is a big chance that the workaround with sed will not work anymore.

Solution 4:

Here is a solution based off of Adriano Valente's answer. I couldn't get the $loggable bit to work, although skipping logging for the 404s seems like that would be a good solution. I simply created a new .conf file that defined the $modstatus variable, and then overwrote the healthd log format to use $modstatus in place of $status. This change also required nginx to get restarted. This is working on Elastic Beanstalk's 64bit Amazon Linux 2016.09 v2.3.1 running Ruby 2.3 (Puma).

# .ebextensions/nginx.conf

files:
  "/tmp/nginx.conf":
    content: |

      # Custom config to ignore 4xx in the health file only
      map $status $modstatus {
        ~^[4]  200;
        default $status;
      }

container_commands:
  modify_nginx_1:
    command: "cp /tmp/nginx.conf /etc/nginx/conf.d/custom_status.conf"
  modify_nginx_2:
    command: sudo sed -r -i 's@\$status@$modstatus@' /opt/elasticbeanstalk/support/conf/webapp_healthd.conf
  modify_nginx_3:
    command: sudo /etc/init.d/nginx restart