I have deleted all the Azure AKS Kubernetes Nodes. How do I restore the Cluster to its original state?

I am new to the Azure AKS Cluster world, and while messing with a test cluster I deleted all its Nodes with kubectl delete node xxxx, thinking that the cluster would heal itself. Boy, was I wrong.

Here is the issue: when I run kubectl get nodes, I get "No resources found". Under "Node Pools" in the portal I can see that there are 3 Nodes, and I have scaled the Pool up and down, but kubectl still shows no nodes. When I run kubectl get pods, all the Pods are shown in Pending state.

Extra Info:

  • The AKS Cluster was created manually, no ARM template or script was saved.
  • The AKS Cluster is using an Availability Set (not a Scale Set) for the Pool, so I cannot add a new Pool and move the Pods there.

My questions are:

  1. How do I get the Nodes to show up in kubectl again? (The Pool still has 3 Nodes sitting there.)
  2. Can I somehow get the Cluster working again? Move the Pods somehow, somewhere?
  3. What would you do in this case?

EDIT:

  • After kubectl get nodes had returned "No resources found" for some time, 2 Nodes came back online, but one is still missing. The Pool has a count of 3. The 2 Nodes that are shown are in Ready state, but all the Pods are still in Pending state. There are no errors in Events.

New Question:

  • Is there a way to start populating the 2 Ready Nodes with the Pending Pods?

Thanks again folks.


Solution 1:

If you have run kubectl delete node, then the node is no longer registered with Kubernetes. If you were using scale sets, the best option would be to scale down and then back up again to get new nodes and have them re-register. In your scenario with availability sets, you don't have that option. You could try running a node update, which may re-register the node, or you can delete the VM and have AKS recreate it.
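Before trying any of that, it may help to confirm which VMs still exist but are no longer registered as nodes. The following is only a rough sketch: RESOURCE_GROUP, CLUSTER_NAME, the run helper and the function names are placeholders I made up, not values from your cluster. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions, not taken from the question); override via the environment.
RESOURCE_GROUP="${RESOURCE_GROUP:-my-resource-group}"
CLUSTER_NAME="${CLUSTER_NAME:-my-aks-cluster}"

# Print each command; execute it only when DRY_RUN=0, so the default is a safe preview.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Nodes currently registered with the Kubernetes API server.
list_registered_nodes() {
  run kubectl get nodes -o wide
}

# Name of the managed resource group that holds the cluster's VMs.
show_node_resource_group() {
  run az aks show --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    --query nodeResourceGroup --output tsv
}

# VMs that exist in the node resource group passed as $1.
list_node_vms() {
  run az vm list --resource-group "$1" --query "[].name" --output tsv
}
```

Any VM name returned by list_node_vms that is missing from list_registered_nodes is a machine that exists in Azure but has not re-registered with Kubernetes.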

All of that said, availability sets are no longer the recommended way to run AKS. Given that this is a test cluster, if I were you I would just delete the cluster and recreate it using VMSS.
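If you go the delete-and-recreate route, it could look roughly like the sketch below. The resource group, cluster name, node count default and helper names are all placeholders of mine, and you would need to add whatever networking or sizing options the original cluster had. Since deletion is destructive, the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions, not taken from the question); override via the environment.
RESOURCE_GROUP="${RESOURCE_GROUP:-my-resource-group}"
CLUSTER_NAME="${CLUSTER_NAME:-my-aks-cluster}"

# Print each command; execute it only when DRY_RUN=0, so the default is a safe preview.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Delete the existing cluster without an interactive confirmation prompt.
delete_cluster() {
  run az aks delete --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" --yes
}

# Recreate the cluster on Virtual Machine Scale Sets; $1 is the node count (default 3).
create_vmss_cluster() {
  run az aks create --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    --node-count "${1:-3}" --vm-set-type VirtualMachineScaleSets --generate-ssh-keys
}
```

Calling delete_cluster and then create_vmss_cluster 3 with the defaults prints both commands so you can review them before running anything for real.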

Solution 2:

Thank you all for helping here. We had a support session with the MS Support Team and, as always, the recommendation was to first upgrade the cluster to a supported AKS version and then see what to do next. I ran az aks upgrade to the next supported version, and all the nodes redeployed themselves correctly and connectivity to the API server came back. The Pods started working fine, and the cluster was back online. So, to be precise, the solution was to upgrade the Cluster to a supported AKS version using the CLI.
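For anyone landing here later, the upgrade was along the lines of the sketch below. I did not save the exact commands, so treat the resource group, cluster name and helper functions as placeholders, and take the target version from the list that az aks get-upgrades returns for your own cluster. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions, not the real values from this cluster); override via the environment.
RESOURCE_GROUP="${RESOURCE_GROUP:-my-resource-group}"
CLUSTER_NAME="${CLUSTER_NAME:-my-aks-cluster}"

# Print each command; execute it only when DRY_RUN=0, so the default is a safe preview.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# List the Kubernetes versions this cluster can be upgraded to.
list_upgrades() {
  run az aks get-upgrades --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" --output table
}

# Upgrade the cluster to the Kubernetes version passed as $1.
upgrade_cluster() {
  run az aks upgrade --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    --kubernetes-version "$1"
}

# Check that the nodes are registered again and the Pods have left Pending.
verify_cluster() {
  run kubectl get nodes
  run kubectl get pods --all-namespaces
}
```

The order is list_upgrades first, then upgrade_cluster with one of the listed versions, then verify_cluster once the upgrade has finished.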

Thanks again, folks.