Nginx Ingress 504 timeout - EKS with ELB connected to nginx ingress

Solution 1:

We think our issue was related to this:

https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-troubleshooting.html#loopback-timeout

We're using an internal NLB with our nginx ingress controller, with targets registered by instance ID. We found that the 504 timeouts and the X second waits occurred only on applications that were sharing a node with one of our ingress controller replicas. That matches the loopback behavior described at the link above: when targets are registered by instance ID, a connection whose source and destination are the same instance times out. We used a combination of nodeSelectors, labels, taints, and tolerations to force the ingress controllers onto their own node, and it appears to have eliminated the timeouts.
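
As a rough sketch of that kind of setup, the fragment below shows the pod-template fields involved. The label role=ingress and the taint dedicated=ingress:NoSchedule are hypothetical names chosen for illustration, not the ones we actually used; the dedicated node would have to carry a matching label and taint.

```yaml
# Fragment of the ingress controller Deployment (pod template only).
# Assumes the dedicated node has the label role=ingress and the taint
# dedicated=ingress:NoSchedule; both names are hypothetical.
spec:
  template:
    spec:
      # Schedule the controller only on nodes labeled role=ingress.
      nodeSelector:
        role: ingress
      # Allow the controller onto the tainted node; pods without this
      # toleration are kept off it by the taint.
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "ingress"
        effect: "NoSchedule"
```

The nodeSelector pins the controller to the labeled node, while the taint is what keeps application pods from landing next to it.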

We also changed our externalTrafficPolicy setting to Local.
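
For reference, a minimal sketch of where that setting lives on the controller's Service. The Service name, selector, and ports here are assumptions for illustration, and the annotations assume the in-tree AWS load balancer integration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx            # hypothetical name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  # Only nodes running an ingress controller pod receive traffic,
  # and traffic is not forwarded on to pods on other nodes.
  externalTrafficPolicy: Local
  selector:
    app: ingress-nginx           # hypothetical pod label
  ports:
  - name: http
    port: 80
    targetPort: 80
  - name: https
    port: 443
    targetPort: 443
```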

Solution 2:

I had the same issue as J. Koncel: the only applications that got the 504 timeouts were the ones sharing a node with the nginx ingress controller.

Instead of using nodeSelectors and taints/tolerations, I used Pod anti-affinity: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#inter-pod-affinity-and-anti-affinity.

I added a label to the spec for my nginx-ingress-controller:

podType: ingress
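
A sketch of where that label would typically go, assuming the controller runs as a Deployment. Anti-affinity selectors match pod labels, so the label needs to be on the pod template rather than only on the Deployment itself.

```yaml
# Fragment of the nginx-ingress-controller Deployment (assumed kind).
spec:
  template:
    metadata:
      labels:
        # Pod-level label that the anti-affinity rule selects on.
        podType: ingress
```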

Then I updated the yml files for the applications that should not be scheduled on the same instance as the nginx-ingress-controller to include this in the pod spec:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: podType
          operator: In
          values:
          - ingress
      topologyKey: "kubernetes.io/hostname"