Random failed_to_connect_to_backend errors on GCE LB
I made a simple setup with two GCE instances behind a load balancer. But in the balancer logs, I can see random 502 responses with the following error: "failed_to_connect_to_backend"
Although the last health check was fine with a 200 response, my nginx logs show that the request never got through to nginx on the backend.
I can't work out what the issue is. Are there any logs showing why it failed to connect to the backend? Is it a health check issue? Are there any health check logs?
Solution 1:
Have you configured keep alive timeout correctly?
A TCP session timeout, whose value is fixed at 10 minutes (600 seconds). This session timeout is sometimes called a keepalive or idle timeout, and its value is not configurable by modifying your backend service. You must configure the web server software used by your backends so that its keepalive timeout is longer than 600 seconds to prevent connections from being closed prematurely by the backend.
This is now in the official GCP documentation. Recommended setting for nginx: keepalive_timeout 620s. Recommended setting for Apache: KeepAliveTimeout 620.
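For nginx, a minimal sketch of that setting could look like the fragment below. It assumes the directive is placed in the http block of nginx.conf (it is also valid in server and location blocks), and the 620-second value is simply the documented 600-second load balancer timeout plus a small margin. Adjust both to your own configuration.

```nginx
http {
    # Keep idle client connections open longer than the load balancer's
    # fixed 600-second timeout, so nginx is never the side that closes first.
    keepalive_timeout 620s;
}
```

After changing the value, validate the configuration with nginx -t and reload nginx. The Apache equivalent is the KeepAliveTimeout 620 directive in the main server configuration.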
https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340
https://cloud.google.com/compute/docs/load-balancing/http/
https://cloud.google.com/load-balancing/docs/https/#timeouts_and_retries