Google Cloud External Load Balancer's backend services randomly fail with 502 Server Error

Solution 1:

Glad to hear that your issue is fixed. I understand that you resolved it by creating the NEG manually through the GCP console and then editing the backend services, rather than using Terraform.

The most likely cause is a race condition. In Terraform, resources are usually defined as a chain in which each resource depends on another. Here, both the backend service creation and the network endpoint (NE) attachment depend on the NEG creation, but not on each other, so Terraform tends to run those two operations in parallel. The state of the Internet NEG is read at the moment the backend service is created or updated, so if the NE attachment has not finished by then, the backend service does not pick up the endpoint correctly. The NE attachment therefore has to complete before the backend service is created.
So, in Terraform, the backend service should list the NE attachment in its depends_on meta-argument [1], so that the backend service is created only after the NE attachment has completed.
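
As a rough illustration of that ordering, here is a minimal sketch, assuming the Google provider's `google_compute_global_network_endpoint_group`, `google_compute_global_network_endpoint`, and `google_compute_backend_service` resources. All resource labels, names, the FQDN, and the port are hypothetical placeholders, not values from your configuration, so adjust them to match your setup:

```hcl
# Internet NEG (names and values below are hypothetical placeholders).
resource "google_compute_global_network_endpoint_group" "internet_neg" {
  name                  = "example-internet-neg"
  network_endpoint_type = "INTERNET_FQDN_PORT"
  default_port          = 443
}

# NE attachment: adds the endpoint to the NEG.
resource "google_compute_global_network_endpoint" "endpoint" {
  global_network_endpoint_group = google_compute_global_network_endpoint_group.internet_neg.name
  fqdn                          = "origin.example.com"
  port                          = 443
}

# Backend service: references the NEG, and explicitly waits for the
# NE attachment so the two operations no longer run in parallel.
resource "google_compute_backend_service" "backend" {
  name     = "example-backend-service"
  protocol = "HTTPS"

  backend {
    group = google_compute_global_network_endpoint_group.internet_neg.id
  }

  depends_on = [google_compute_global_network_endpoint.endpoint]
}
```

Without the `depends_on` line, the backend service only has an implicit dependency on the NEG itself, which is why Terraform is free to create it at the same time as the endpoint attachment.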

[1] https://www.terraform.io/docs/language/meta-arguments/depends_on.html

I hope this answers your question.