I have infrastructure running on Google Cloud using GKE. Today the Prometheus pods were evicted and could not recover. After describing the pod, I found these errors:

  Warning  FailedMount         2m17s (x42 over 95m)  kubelet, gke-production-default-pool-44cd6cd6-h9rw  Unable to mount volumes for pod "prometheus-6d7c45fc6-5zd5d_kube-system(86c0258f-81ed-11e9-93ac-42010af00102)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"prometheus-6d7c45fc6-5zd5d". list of unmounted volumes=[storage]. list of unattached volumes=[config storage prometheus-token-875gs]
  Warning  FailedAttachVolume  69s (x47 over 96m)    attachdetach-controller                             AttachVolume.Attach failed for volume "pvc-ff226d8b-3814-11e8-a63c-42010af001b0" : googleapi: Error 400: EXTERNAL_RESOURCE_NOT_FOUND - The resource '[email protected]' of type 'serviceAccount' was not found.

Further investigation showed that the service account does not exist in the IAM service.

According to the Google documentation, disabling and re-enabling the API service is supposed to solve the problem.

The thing is, this operation will most likely delete the Kubernetes cluster resources, which I do not want to happen.

So the questions are:

  1. Is there another way of solving this issue? If so, how?
  2. Does this operation really delete all the resources?

Thanks everyone for taking the time to check this question.


Solution 1:

Apparently I was able to solve my own issue by undeleting the service account. This operation is not supported by the gcloud CLI, but it is supported by the API. The request I made was:

  curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" "https://iam.googleapis.com/v1/projects/-/serviceAccounts/114592978558849211522:undelete"
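
For anyone who wants to reuse the request above, here is a minimal sketch that wraps it in two small shell functions. The function names `undelete_url` and `undelete_service_account` are hypothetical helpers I made up for illustration, not part of gcloud. The sketch assumes you pass the numeric unique ID of the deleted service account (as in the request above), that `curl` is installed, and that `gcloud` is already authenticated with an account allowed to undelete service accounts.

```shell
#!/usr/bin/env bash
# Sketch only: undelete a service account through the IAM REST API.
# undelete_url and undelete_service_account are hypothetical helper names.

# Build the undelete URL from the account's numeric unique ID.
# The "-" in place of the project ID mirrors the request shown above.
undelete_url() {
  local unique_id="$1"
  # Assumption: the ID is purely numeric, like the one used above.
  if ! [[ "$unique_id" =~ ^[0-9]+$ ]]; then
    echo "expected a numeric unique ID, got: $unique_id" >&2
    return 1
  fi
  printf 'https://iam.googleapis.com/v1/projects/-/serviceAccounts/%s:undelete' "$unique_id"
}

# Send the POST request, authenticating with the caller's gcloud access token.
undelete_service_account() {
  local url
  url="$(undelete_url "$1")" || return 1
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "$url"
}
```

After sourcing the file, calling `undelete_service_account 114592978558849211522` sends the same request as the one-liner above. Keeping the URL construction in its own function makes it easy to check the target before actually sending anything.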