Private Google Kubernetes Engine cluster can't pull images from Google Container Registry

I am trying to set up our private Kubernetes cluster in Google Cloud to pull images from Google Container Registry. I'm able to deploy sample images in the cluster without a problem, e.g. gcr.io/google-samples/hello-app:2.0. But when I try to deploy one of our own images, such as gcr.io/[OUR_PROJECT_ID]/test-image:1.0, the pods fail with ImagePullBackOff errors.

ImagePullBackOff never shows any details about what caused the error. I tried logging in to one of the cluster's nodes directly (as recommended in the troubleshooting section of Google's docs), but I can't pull the image from there either, even though pulling it works fine from a public cluster. The node doesn't seem to be a realistic troubleshooting environment, though: even the demo image, which I know deploys fine in the cluster, fails to pull from inside the node:

$ docker pull gcr.io/google-samples/hello-app:2.0
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.40/images/create?fromImage=gcr.io%2Fgoogle-samples%2Fhello-app&tag=2.0: dial unix /var/run/docker.sock: connect: permission denied

Pulling our own image also works fine from my local machine, so something about the private cluster seems to be blocking it.

How can I get more details on why the image pull failed? It would be great to see the error that Docker is actually returning.
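The pull error is recorded as an event on the pod rather than in its status, and `kubectl describe pod POD_NAME` prints those events at the bottom of its output. A minimal sketch that lists only the events for one pod, assuming `kubectl` is already configured for the cluster (the function name `show_pull_events` is hypothetical):

```bash
#!/usr/bin/env bash
# Print the recorded events for one pod. For a pod stuck in ImagePullBackOff,
# the kubelet's "Failed to pull image" event usually carries the underlying
# registry error (for example a 403), which the pod status alone hides.
# Usage: show_pull_events POD_NAME [NAMESPACE]
show_pull_events() {
  local pod="$1"
  local ns="${2:-default}"   # fall back to the "default" namespace
  # Only events whose involved object is this pod, oldest first.
  kubectl get events --namespace "$ns" \
    --field-selector "involvedObject.name=${pod}" \
    --sort-by=.metadata.creationTimestamp
}
```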

And of course if anyone knows the likely problem here and how to resolve it, I'm all ears. Private Google Access is already enabled on the network, and the cluster's service account already has access to the storage bucket used by Container Registry, so I don't think those are the issues.
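To confirm Private Google Access on the specific subnet the nodes use, a small sketch along these lines should work, assuming an authenticated `gcloud` and that you know the subnet name and region (the function name is hypothetical):

```bash
#!/usr/bin/env bash
# Print the privateIpGoogleAccess setting of one subnet.
# It must be enabled on the subnet the cluster's nodes actually live in.
# Usage: private_google_access SUBNET REGION
private_google_access() {
  local subnet="$1" region="$2"
  gcloud compute networks subnets describe "$subnet" \
    --region "$region" \
    --format="get(privateIpGoogleAccess)"
}
```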


Solution 1:

Problem solved: it was a permissions issue. It wasn't caused by the cluster being private, but by our private cluster's nodes running as a different service account.
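To check which service account a cluster's nodes run as (and therefore which account needs registry access), something like the following should work. It is a sketch assuming a zonal cluster and an authenticated `gcloud`; for a regional cluster, `--region` would replace `--zone`. The function name is hypothetical:

```bash
#!/usr/bin/env bash
# List each node pool of a GKE cluster with the service account its nodes use.
# A value of "default" means the Compute Engine default service account.
# Usage: node_service_accounts CLUSTER ZONE
node_service_accounts() {
  local cluster="$1" zone="$2"
  gcloud container node-pools list \
    --cluster "$cluster" --zone "$zone" \
    --format="table(name, config.serviceAccount)"
}
```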

The other part of the problem was that Google actually creates two buckets for Container Registry: a global one, and one specific to your location (for example, if you're in the U.S., the second bucket's name starts with us.artifacts.).
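The bucket names follow a fixed pattern: images pushed to `gcr.io` live in `artifacts.PROJECT_ID.appspot.com`, and images pushed to a regional host such as `us.gcr.io` live in `us.artifacts.PROJECT_ID.appspot.com`. A small sketch that maps a registry host to its bucket, assuming a regular (not domain-scoped) project ID; `gcr_bucket` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Map a Container Registry hostname to the Cloud Storage bucket behind it.
# Assumes a regular, non-domain-scoped project ID.
# Usage: gcr_bucket REGISTRY_HOST PROJECT_ID
gcr_bucket() {
  local host="$1" project="$2"
  case "$host" in
    gcr.io)
      echo "artifacts.${project}.appspot.com" ;;
    us.gcr.io|eu.gcr.io|asia.gcr.io)
      # "${host%%.*}" keeps the location prefix, e.g. "us".
      echo "${host%%.*}.artifacts.${project}.appspot.com" ;;
    *)
      echo "unknown registry host: ${host}" >&2
      return 1 ;;
  esac
}
```

For example, `gcr_bucket us.gcr.io my-project` prints `us.artifacts.my-project.appspot.com`.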

I'm still learning how these two buckets work, but by default the global one seems to be the one used for authentication. In any case, I gave the service account storage.objectAdmin permissions on both buckets, and it now pulls the images successfully.
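Granting a role on both buckets can be scripted with `gsutil iam ch`. A minimal sketch, assuming `gsutil` is authenticated as a user allowed to change the buckets' IAM policies; the function name and example values are hypothetical. For pulls alone, the narrower `objectViewer` role is normally enough, while `objectAdmin` matches what was granted above:

```bash
#!/usr/bin/env bash
# Grant a service account a Cloud Storage role on each of the given buckets.
# ROLE uses gsutil's short form, e.g. objectViewer or objectAdmin.
# Usage: grant_bucket_role SERVICE_ACCOUNT_EMAIL ROLE BUCKET [BUCKET...]
grant_bucket_role() {
  local sa="$1" role="$2"
  shift 2
  local bucket
  for bucket in "$@"; do
    # Stop at the first bucket whose policy update fails.
    gsutil iam ch "serviceAccount:${sa}:${role}" "gs://${bucket}" || return 1
  done
}
```

Usage, with hypothetical names:

```bash
grant_bucket_role nodes@my-project.iam.gserviceaccount.com objectAdmin \
  artifacts.my-project.appspot.com us.artifacts.my-project.appspot.com
```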