Google Kubernetes Engine node pool does not autoscale from 0 nodes
I am trying to run a machine learning job on GKE, and need to use a GPU.
I created a node pool with Tesla K80, as described in this walkthrough.
I set the minimum node size to 0, and hoped that the autoscaler would automatically determine how many nodes I needed based on my jobs:
gcloud container node-pools create [POOL_NAME] \
--accelerator type=nvidia-tesla-k80,count=1 --zone [COMPUTE_ZONE] \
--cluster [CLUSTER_NAME] --num-nodes 3 --min-nodes 0 --max-nodes 5 \
--enable-autoscaling
Initially, there are no jobs that require GPUs, so the cluster autoscaler correctly downsizes the node pool to 0.
However, when I create a job with the following resource specification:
resources:
  requests:
    nvidia.com/gpu: "1"
  limits:
    nvidia.com/gpu: "1"
the pod is stuck in Pending with Insufficient nvidia.com/gpu until I manually scale the node pool up to at least 1 node.
(Here is the full job configuration. Please note that the configuration is partially auto-generated; I have also removed some environment variables that are not pertinent to the issue.)
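For reference, a stripped-down Job of this shape would look roughly like the following (the name, image, and everything outside the resources block are placeholders, not my actual configuration):
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-training-job                          # placeholder name
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer                             # placeholder container name
        image: gcr.io/[PROJECT_ID]/trainer:latest # placeholder image
        resources:
          requests:
            nvidia.com/gpu: "1"
          limits:
            nvidia.com/gpu: "1"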
Is this a current limitation of GPU node pools, or did I overlook something?
The autoscaler supports scaling GPU node pools, including to and from 0.
One possible reason for this problem is if you have enabled node auto-provisioning (NAP) and set resource limits (via the UI or gcloud flags such as --max-cpu, --max-memory, etc.). Those limits apply to ALL autoscaling in the cluster, including node pools you created manually with autoscaling enabled (see the note in the documentation: https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-provisioning#resource_limits).
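A quick way to check whether any such limits are set on your cluster is to look at its autoscaling block (this just uses gcloud's generic --format projection; the field layout follows the GKE API):
gcloud container clusters describe [CLUSTER_NAME] --zone [COMPUTE_ZONE] \
    --format="yaml(autoscaling)"
If node auto-provisioning is enabled, any resourceLimits listed there apply cluster-wide.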
In particular, if you have enabled NAP and you want to autoscale node pools with GPUs, you need to set resource limits for GPUs as described in https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-provisioning#gpu_limits.
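For example, something along these lines (the numeric limits are placeholders; pick values that fit your cluster, and check the linked page for the exact flags available in your gcloud version):
gcloud container clusters update [CLUSTER_NAME] --zone [COMPUTE_ZONE] \
    --enable-autoprovisioning \
    --min-cpu 1 --max-cpu 64 \
    --min-memory 1 --max-memory 256 \
    --min-accelerator type=nvidia-tesla-k80,count=0 \
    --max-accelerator type=nvidia-tesla-k80,count=4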
Finally, auto-provisioning also supports GPUs, so (assuming you set the resource limits as described above) you don't actually need to create a node pool for your GPU workload at all - NAP will create one for you automatically.
===
Also, for future reference: if the autoscaler fails to create nodes for some of your pods, you can try to debug it using autoscaler events:
- On your pod (kubectl describe pod <your-pod>) there should be one of these 2 events (it may take a minute until they show up):
  - TriggeredScaleUp - the autoscaler decided to add a node for this pod.
  - NotTriggerScaleUp - the autoscaler spotted your pod, but it doesn't think any node pool can be scaled up to help it. In 1.12 and later the event contains a list of reasons why adding nodes to the different node pools wouldn't help the pod. This is usually the most useful event for debugging.
- kubectl get events -n kube-system | grep cluster-autoscaler will give you events describing all autoscaler actions (scale-up, scale-down). If a scale-up was attempted but failed for whatever reason, there will also be events describing that.
Note that events are only available in Kubernetes for 1 hour after they are created. You can see historical events in Stackdriver by going to the Cloud Console, navigating to Stackdriver -> Logging -> Logs, and choosing "GKE Cluster Operations" in the drop-down.
Finally, you can check the current status of the autoscaler by running kubectl get configmap cluster-autoscaler-status -o yaml -n kube-system.