CoreDNS failing due to a loop: how to feed kubelet with proper resolvConf?

This is where the investigation started: CoreDNS couldn't stay up for more than a couple of seconds, producing the following errors:

$ kubectl get pods --all-namespaces
NAMESPACE       NAME                                          READY   STATUS             RESTARTS      AGE
ingress-nginx   ingress-nginx-controller-8xcl9                1/1     Running            0             11h
ingress-nginx   ingress-nginx-controller-hwhvk                1/1     Running            0             11h
ingress-nginx   ingress-nginx-controller-xqdqx                1/1     Running            2 (10h ago)   11h
kube-system     calico-kube-controllers-684bcfdc59-cr7hr      1/1     Running            0             11h
kube-system     calico-node-62p58                             1/1     Running            2 (10h ago)   11h
kube-system     calico-node-btvdh                             1/1     Running            0             11h
kube-system     calico-node-q5bkr                             1/1     Running            0             11h
kube-system     coredns-8474476ff8-dnt6b                      0/1     CrashLoopBackOff   1 (3s ago)    5s
kube-system     coredns-8474476ff8-ftcbx                      0/1     Error              1 (2s ago)    5s
kube-system     dns-autoscaler-5ffdc7f89d-4tshm               1/1     Running            2 (10h ago)   11h
kube-system     kube-apiserver-hyzio                          1/1     Running            4 (10h ago)   11h
kube-system     kube-controller-manager-hyzio                 1/1     Running            4 (10h ago)   11h
kube-system     kube-proxy-2d8ls                              1/1     Running            0             11h
kube-system     kube-proxy-c6c4l                              1/1     Running            4 (10h ago)   11h
kube-system     kube-proxy-nzqdd                              1/1     Running            0             11h
kube-system     kube-scheduler-hyzio                          1/1     Running            5 (10h ago)   11h
kube-system     kubernetes-dashboard-548847967d-66dwz         1/1     Running            0             11h
kube-system     kubernetes-metrics-scraper-6d49f96c97-r6dz2   1/1     Running            0             11h
kube-system     nginx-proxy-dyzio                             1/1     Running            0             11h
kube-system     nginx-proxy-zyzio                             1/1     Running            0             11h
kube-system     nodelocaldns-g9wxh                            1/1     Running            0             11h
kube-system     nodelocaldns-j2qc9                            1/1     Running            4 (10h ago)   11h
kube-system     nodelocaldns-vk84j                            1/1     Running            0             11h
kube-system     registry-j5prk                                1/1     Running            0             11h
kube-system     registry-proxy-5wbhq                          1/1     Running            0             11h
kube-system     registry-proxy-77lqd                          1/1     Running            0             11h
kube-system     registry-proxy-s45p4                          1/1     Running            2 (10h ago)   11h

Running kubectl describe on that pod didn't add much to the picture:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  67s                default-scheduler  Successfully assigned kube-system/coredns-8474476ff8-dnt6b to zyzio
  Normal   Pulled     25s (x4 over 68s)  kubelet            Container image "k8s.gcr.io/coredns/coredns:v1.8.0" already present on machine
  Normal   Created    25s (x4 over 68s)  kubelet            Created container coredns
  Normal   Started    25s (x4 over 68s)  kubelet            Started container coredns
  Warning  BackOff    6s (x11 over 66s)  kubelet            Back-off restarting failed container

But viewing the logs did:

$ kubectl logs coredns-8474476ff8-dnt6b -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 5b233a0166923d642fdbca0794b712ab
CoreDNS-1.8.0
linux/amd64, go1.15.3, 054c9ae
[FATAL] plugin/loop: Loop (127.0.0.1:49048 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 2906344495550081187.9117452939332601176."

It's great that the troubleshooting documentation was linked! I started browsing that page and discovered that my /etc/resolv.conf did indeed contain the problematic loopback nameserver 127.0.0.53.
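
For reference, this is roughly what the stub file looked like (trimmed to the relevant line):

$ cat /etc/resolv.conf
nameserver 127.0.0.53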

I also found the real DNS IPs in /run/systemd/resolve/resolv.conf, but the question now is how to perform the action described in the troubleshooting documentation, which says:

Add the following to your kubelet config yaml: resolvConf: (or via command line flag --resolv-conf deprecated in 1.10). Your “real” resolv.conf is the one that contains the actual IPs of your upstream servers, and no local/loopback address. This flag tells kubelet to pass an alternate resolv.conf to Pods. For systems using systemd-resolved, /run/systemd/resolve/resolv.conf is typically the location of the “real” resolv.conf, although this can be different depending on your distribution.

So, the questions are:

  • how to find (or where to create) the mentioned kubelet config yaml,
  • at what level should I specify the resolvConf value, and
  • can it accept multiple values? I have two nameservers defined. Should they be given as separate entries or an array?

Solution 1:

/etc/resolv.conf is located on each of your nodes. You can edit it by SSHing into the node.
Then you have to restart the kubelet for the changes to take effect.

sudo systemctl restart kubelet

(If that does not work, restart your nodes with sudo reboot)


The /home/kubernetes/kubelet-config.yaml file (also located on each of your nodes) contains the kubelet's configuration. You can create a new resolv.conf file and point to it with the resolvConf field:

apiVersion: kubelet.config.k8s.io/v1beta1
...
kind: KubeletConfiguration
...
resolvConf: <location of the file>
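
For a systemd-resolved host like the one in the question, pointing the field at the upstream file named in the troubleshooting documentation quoted above should be enough (verify the path on your distribution):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf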

Important: the new configuration will only be applied to pods created after the update. It's highly recommended to drain your node before changing the configuration.
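
As a concrete sequence, using the node name from the pod listing above (zyzio); --ignore-daemonsets is needed because of the calico, kube-proxy and nodelocaldns DaemonSets, and exact flags may vary between kubectl versions:

$ kubectl drain zyzio --ignore-daemonsets
$ # on zyzio itself, after updating the kubelet config:
$ sudo systemctl restart kubelet
$ kubectl uncordon zyzio
$ # recreate the CoreDNS pods so they pick up the new resolv.conf
$ kubectl -n kube-system rollout restart deployment coredns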


can it accept multiple values? I have two nameservers defined. Should they be given as separate entries or an array?

The Kubelet Configuration documentation states that resolvConf is of type string, so it probably accepts only a single value: a path to one resolv.conf file.
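
The two nameservers you have defined therefore belong in that file, not in the kubelet config itself. A minimal sketch, with placeholder addresses:

# file referenced by resolvConf, e.g. /run/systemd/resolve/resolv.conf
nameserver 192.168.1.1
nameserver 192.168.1.2

# kubelet config
resolvConf: /run/systemd/resolve/resolv.conf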