What is causing Google App Engine 504 Gateway Timeout errors every 10 minutes or so?
I migrated our production website to Google App Engine last night, and now the app starts returning 504 Gateway Timeout errors after about 10-15 minutes of inactivity. I've been running a test version of the new site on GAE, in a different project, for months, and it has never had this problem. I don't understand what is going on.
Below is the GAE dashboard graph, which shows that nearly all requests were returning 5XX errors until 7am. I wasn't sure what was happening, so I decided to re-deploy the app to see if that would get things working again, and for some reason, it seemed to.
After I re-deployed, the site started responding again. I clicked around for a while, fixed a bug, re-deployed... everything was good.
Then at 7:20, all of the requests turned into 504 Gateway Timeout errors again.
I noticed it at 7:40, re-deployed without any changes, and it's back up and working again.
What is going on?
P.S. I know that a dynamic GAE instance will go idle and need time to spin back up, but that's not what is happening here. I see that delay in my dev GAE project, but there the page comes up after about 10 seconds. On my production GAE app, the browser just spins for a full minute, then shows a 504 error, and it keeps doing that until I re-deploy.
According to the OP:
He was making too many syslog() or error_log() calls in too short a time. This caused a backend issue: the app instance would crash, the frontend gateway would keep trying to talk to the dead backend instance, and the request would eventually time out.
The fix was to space out the logging calls.
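One way to space out logging is to buffer messages and emit them in batches instead of making one log call per message. The following is a minimal sketch of that idea, not necessarily what the OP did. It is written in Python for illustration; `BufferedLogger`, `max_lines`, `max_interval`, and the `sink` callback are hypothetical names, and in a PHP app the sink would wrap whatever actually writes the log (for example error_log()). The thresholds are arbitrary assumptions.

```python
import time
from typing import Callable, List


class BufferedLogger:
    """Hypothetical helper: collects log lines and writes them in batches.

    `sink` is an assumption standing in for the real log writer
    (e.g. a wrapper around PHP's error_log() or Python's logging module).
    """

    def __init__(
        self,
        sink: Callable[[str], None],
        max_lines: int = 50,          # arbitrary batch size
        max_interval: float = 5.0,    # arbitrary seconds between writes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._sink = sink
        self._max_lines = max_lines
        self._max_interval = max_interval
        self._clock = clock
        self._buffer: List[str] = []
        self._last_flush = clock()

    def log(self, message: str) -> None:
        self._buffer.append(message)
        # Write only when the batch is full or enough time has passed
        # since the last write, instead of once per message.
        if (len(self._buffer) >= self._max_lines
                or self._clock() - self._last_flush >= self._max_interval):
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            # One sink call per batch, with messages joined by newlines.
            self._sink("\n".join(self._buffer))
            self._buffer.clear()
        self._last_flush = self._clock()


if __name__ == "__main__":
    # Usage: collect batches in a list instead of writing to a real log.
    batches: List[str] = []
    logger = BufferedLogger(batches.append, max_lines=3)
    for i in range(7):
        logger.log(f"step {i}")
    logger.flush()  # write the remainder, e.g. at the end of a request
    print(len(batches))  # 3 batches for 7 messages
```

One trade-off of batching: messages still sitting in the buffer are lost if the instance dies before flush() runs, so flushing at the end of each request is a sensible default.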