Google Compute Engine Random Shutdown
Our Compute Engine instance, which runs the backend for a mobile game, shut down unexpectedly today (March 8, 2018), and the logs can't tell me which user or IP address initiated it.
I've done some digging and got into the syslog, which shows the following:
Mar 8 10:58:10 redis-prod-vm systemd[1]: Started Synchronise Hardware Clock to System Clock.
Mar 8 10:58:10 redis-prod-vm systemd[1]: Stopping Session 5 of user redis.
Mar 8 10:58:10 redis-prod-vm systemd[1]: Stopping User Manager for UID 999...
Mar 8 10:58:10 redis-prod-vm systemd[7558]: Stopping Default.
Mar 8 10:58:10 redis-prod-vm systemd[1]: Stopping Graphical Interface.
Mar 8 10:58:10 redis-prod-vm systemd[1]: Stopped target Graphical Interface.
Mar 8 10:58:10 redis-prod-vm systemd[1]: Stopping Entropy daemon using the HAVEGE algorithm...
Mar 8 10:58:10 redis-prod-vm haveged[369]: haveged: Stopping due to signal 15
Mar 8 10:58:10 redis-prod-vm haveged[369]: haveged starting up
Mar 8 10:58:10 redis-prod-vm systemd[1]: Stopping Multi-User System.
Mar 8 10:58:10 redis-prod-vm systemd[1]: Stopped target Multi-User System.
Mar 8 10:58:10 redis-prod-vm systemd[7558]: Stopped target Default.
Mar 8 10:58:10 redis-prod-vm systemd[7558]: Stopping Basic System.
Mar 8 10:58:10 redis-prod-vm systemd[7558]: Stopped target Basic System.
Mar 8 10:58:10 redis-prod-vm systemd[7558]: Stopping Paths.
Mar 8 10:58:10 redis-prod-vm systemd[7558]: Stopped target Paths.
Mar 8 10:58:10 redis-prod-vm systemd[7558]: Stopping Timers.
Mar 8 10:58:10 redis-prod-vm systemd[7558]: Stopped target Timers.
Mar 8 10:58:10 redis-prod-vm systemd[7558]: Stopping Sockets.
Mar 8 10:58:10 redis-prod-vm systemd[7558]: Stopped target Sockets.
Mar 8 10:58:10 redis-prod-vm systemd[7558]: Starting Shutdown.
Mar 8 10:58:10 redis-prod-vm systemd[7558]: Reached target Shutdown.
Mar 8 10:58:10 redis-prod-vm systemd[1]: Stopping Deferred execution scheduler...
Mar 8 10:58:10 redis-prod-vm systemd[1]: Stopping vsftpd FTP server...
Mar 8 10:58:10 redis-prod-vm systemd[1]: Stopping LSB: bitnami init script...
Mar 8 10:58:10 redis-prod-vm systemd[1]: Stopping Regular background program processing daemon...
Mar 8 10:58:10 redis-prod-vm systemd[7558]: Starting Exit the Session...
Mar 8 10:58:10 redis-prod-vm systemd[1]: Stopping Google Compute Engine user shutdown scripts...
Mar 8 10:58:10 redis-prod-vm systemd[1]: Stopping OpenBSD Secure Shell server...
This continued until the VM finished shutting down. Here's the corresponding entry from the activity log:
2018-03-08 10:59:46.073 GMT compute.instances.stop {
  "event_timestamp_us": "XXX",
  "actor": {"user": ""},
  "resource": {
    "name": "redis-prod-vm",
    "type": "instance",
    "zone": "us-central1-f",
    "id": "XXX"
  },
  "event_type": "GCE_OPERATION_DONE",
  "trace_id": "XXX",
  "operation": {"type": "operation", …
I've replaced some potentially sensitive values with XXX to be safe. Can someone please shed light on what happened?
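For what it's worth, one detail worth inspecting in the stop event above is the empty `actor.user` field: a user-initiated stop via the console or API normally records the account that issued it, so an empty value suggests the stop may have been system-initiated. A minimal sketch (using the event shape shown above, with the elided fields omitted and placeholder `XXX` values kept) that surfaces this:

```python
import json

# Stop event as logged above; elided fields omitted, placeholders kept.
event = json.loads("""
{
  "event_timestamp_us": "XXX",
  "actor": {"user": ""},
  "resource": {
    "name": "redis-prod-vm",
    "type": "instance",
    "zone": "us-central1-f",
    "id": "XXX"
  },
  "event_type": "GCE_OPERATION_DONE",
  "trace_id": "XXX"
}
""")

# An empty actor.user means no logged-in user is recorded for the stop,
# which points toward a system-initiated event (e.g. preemption or
# host maintenance) rather than a manual shutdown.
user = event.get("actor", {}).get("user", "")
initiated_by = user if user else "(no user recorded: likely system-initiated)"
print(initiated_by)
```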
I'm posting this answer to make the recommendation provided by @Taher in the comments more visible:
Please have a look at the documentation Preemptible VM instances:
A preemptible VM is an instance that you can create and run at a much lower price than normal instances. However, Compute Engine might stop (preempt) these instances if it requires access to those resources for other tasks. Preemptible instances are excess Compute Engine capacity, so their availability varies with usage.
If your apps are fault-tolerant and can withstand possible instance preemptions, then preemptible instances can reduce your Compute Engine costs significantly. For example, batch processing jobs can run on preemptible instances. If some of those instances stop during processing, the job slows but does not completely stop. Preemptible instances complete your batch processing tasks without placing additional workload on your existing instances and without requiring you to pay full price for additional normal instances.
Please check whether your instance is preemptible or not.
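One way to check this (a sketch, assuming the instance name and zone from the log excerpt above) is via the gcloud CLI:

```shell
# Prints "True" if the instance is preemptible; prints nothing otherwise.
gcloud compute instances describe redis-prod-vm \
    --zone us-central1-f \
    --format="value(scheduling.preemptible)"
```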
Also, you can follow the documentation Viewing serial port output and check whether there is any useful logging information there.
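The serial console output can also be fetched from the command line (again assuming the instance name and zone shown above):

```shell
# Dump the serial port output captured for the instance; shutdown-time
# kernel and init messages often appear here even after the VM stops.
gcloud compute instances get-serial-port-output redis-prod-vm \
    --zone us-central1-f
```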
Furthermore, please consider following the documentation Using the Logs Explorer to collect more troubleshooting information.
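As a starting point, a Logs Explorer query (or the equivalent `gcloud logging read` call) can list recent stop and preemption events; this is a sketch assuming Cloud Audit Logs are enabled for the project:

```shell
# List recent stop/preemption events for Compute Engine instances.
# The ":" operator matches a substring within the method name.
gcloud logging read \
  'resource.type="gce_instance" AND (protoPayload.methodName:"compute.instances.stop" OR protoPayload.methodName:"compute.instances.preempted")' \
  --limit=10 \
  --freshness=7d
```

If a `compute.instances.preempted` entry shows up around the shutdown time, that would confirm the instance was a preemptible VM that Compute Engine reclaimed.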