Unable to deploy Weave CNI - PODs in CrashLoopBackOff state

(This question has been moved from Stack Overflow)

First, I apologize for the lengthy entry, but I think it's better to give as much detail as possible.

  • Host OS: Win10
  • Guest OS: Ubuntu 20.10 (Groovy)
  • Docker CE: 5:19.03.15~3-0~ubuntu-bionic
  • Kubernetes: 1.20.4-00
  • VirtualBox: 6.1.18 on Win10
    • eth0: NAT
    • eth1: Host only (

I have three control-plane nodes with a keepalived/haproxy combination installed on each of them as a "load balancer" with an IP of As a consequence the apiserver entrypoint is 'poc-lb:8443' which in turn is distributed among the control-plane nodes on port 6443. /etc/hosts on each of the nodes looks like:

  • poc-ctrl-1
  • poc-ctrl-2
  • poc-ctrl-3
  • poc-lb

I initialize the k8s cluster on poc-ctrl-1 using:

sudo kubeadm init --apiserver-advertise-address --control-plane-endpoint poc-lb:8443 --upload-certs

When it has been initialized on that node I deploy the weave CNI plugin using:

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

After the weave plugin has been deployed on the first control-plane node, I join the second and third control-plane nodes (poc-ctrl-2 & poc-ctrl-3) using a 'kubeadm join' command (--token, discovery-token and --certificate-key have been removed for brevity):

sudo kubeadm join poc-lb:8443 --control-plane --apiserver-advertise-address
sudo kubeadm join poc-lb:8443 --control-plane --apiserver-advertise-address

The nodes join without a problem; however, the weave PODs don't seem to be very happy. This is the log for the 'weave' container on poc-ctrl-1:

DEBU: 2021/03/08 15:03:32.486479 [kube-peers] Checking peer "1e:85:5b:9b:50:c5" against list &{[]}
Peer not in list; removing persisted data
INFO: 2021/03/08 15:03:32.561859 Command line options: map[conn-limit:200 datapath:datapath db-prefix:/weavedb/weave-net docker-api: expect-npc:true http-addr: ipalloc-init:consensus=0 ipalloc-range: metrics-addr: name:1e:85:5b:9b:50:c5 nickname:poc-ctrl-1 no-dns:true no-masq-local:true port:6783]
INFO: 2021/03/08 15:03:32.561901 weave  2.8.1
INFO: 2021/03/08 15:03:33.216812 Bridge type is bridged_fastdp
INFO: 2021/03/08 15:03:33.216846 Communication between peers is unencrypted.
INFO: 2021/03/08 15:03:33.224064 Our name is 1e:85:5b:9b:50:c5(poc-ctrl-1)
INFO: 2021/03/08 15:03:33.224115 Launch detected - using supplied peer list: []
INFO: 2021/03/08 15:03:33.224149 Using "no-masq-local" LocalRangeTracker
INFO: 2021/03/08 15:03:33.224155 Checking for pre-existing addresses on weave bridge
INFO: 2021/03/08 15:03:33.233984 [allocator 1e:85:5b:9b:50:c5] No valid persisted data
INFO: 2021/03/08 15:03:33.262924 [allocator 1e:85:5b:9b:50:c5] Initialising via deferred consensus
INFO: 2021/03/08 15:03:33.263027 Sniffing traffic on datapath (via ODP)
INFO: 2021/03/08 15:03:33.265856 Listening for HTTP control messages on
INFO: 2021/03/08 15:03:33.266928 Listening for metrics requests on
INFO: 2021/03/08 15:03:33.401417 Error checking version: Get "https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=5.8.0-41-generic&os=linux&signature=aQyw2dVd0f8HNRaTeZ8N3lnlww9j0P3J5P359AkeBBk%3D&version=2.8.1": dial tcp: lookup checkpoint-api.weave.works on write udp> write: operation not permitted
INFO: 2021/03/08 15:03:33.578810 [kube-peers] Added myself to peer list &{[{1e:85:5b:9b:50:c5 poc-ctrl-1}]}
DEBU: 2021/03/08 15:03:33.588343 [kube-peers] Nodes that have disappeared: map[]
INFO: 2021/03/08 15:03:33.599543 Assuming quorum size of 1
INFO: 2021/03/08 15:03:33.599784 adding entry to weaver-no-masq-local of 0
INFO: 2021/03/08 15:03:33.599809 added entry to weaver-no-masq-local of 0
DEBU: 2021/03/08 15:03:33.684752 registering for updates for node delete events
INFO: 2021/03/08 15:20:34.605758 ->[] connection accepted
INFO: 2021/03/08 15:20:34.620605 ->[|a2:18:ea:75:33:ca(poc-ctrl-3)]: connection ready; using protocol version 2
INFO: 2021/03/08 15:20:34.620811 overlay_switch ->[a2:18:ea:75:33:ca(poc-ctrl-3)] using fastdp
INFO: 2021/03/08 15:20:34.620830 ->[|a2:18:ea:75:33:ca(poc-ctrl-3)]: connection added (new peer)
INFO: 2021/03/08 15:20:34.634204 ->[|a2:18:ea:75:33:ca(poc-ctrl-3)]: connection fully established
INFO: 2021/03/08 15:20:34.723969 sleeve ->[|a2:18:ea:75:33:ca(poc-ctrl-3)]: Effective MTU verified at 1438
INFO: 2021/03/08 15:20:35.742452 Discovered remote MAC a2:18:ea:75:33:ca at a2:18:ea:75:33:ca(poc-ctrl-3)
INFO: 2021/03/08 15:20:36.352445 Discovered remote MAC ee:27:39:76:a7:5d at a2:18:ea:75:33:ca(poc-ctrl-3)
INFO: 2021/03/08 15:20:36.510082 Discovered remote MAC be:c8:b2:c2:d2:cf at a2:18:ea:75:33:ca(poc-ctrl-3)
INFO: 2021/03/08 15:21:04.875787 adding entry to weaver-no-masq-local of 0
INFO: 2021/03/08 15:21:04.875840 added entry to weaver-no-masq-local of 0
INFO: 2021/03/08 15:21:04.876883 adding entry to weaver-no-masq-local of 0
INFO: 2021/03/08 15:21:04.876905 added entry to weaver-no-masq-local of 0
INFO: 2021/03/08 15:21:04.877778 deleting entry from weaver-no-masq-local of 0
INFO: 2021/03/08 15:21:04.877792 deleted entry from weaver-no-masq-local of 0

This is the log for the 'weave' container on poc-ctrl-2:

DEBU: 2021/03/08 15:40:06.625988 [kube-peers] Checking peer "9a:7c:0f:a1:76:36" against list &{[{1e:85:5b:9b:50:c5 poc-ctrl-1}]}
Peer not in list; removing persisted data
FATA: 2021/03/08 15:40:36.654217 [kube-peers] Could not get Kubernetes version: Get "": dial tcp i/o timeout

And, finally, the log for the 'weave' container on poc-ctrl-3:

FATA: 2021/03/08 15:21:04.964921 [kube-peers] Could not update peer list: Unable to fetch ConfigMap kube-system/weave-net: Get "": dial tcp i/o timeout
INFO: 2021/03/08 15:21:04.981699 adding entry to weaver-no-masq-local of 0
INFO: 2021/03/08 15:21:04.981948 added entry to weaver-no-masq-local of 0
INFO: 2021/03/08 15:21:16.935459 ->[] attempting connection
INFO: 2021/03/08 15:21:16.936059 ->[] error during connection attempt: dial tcp :0-> connect: connection refused
FATA: 2021/03/08 15:21:35.037984 [kube-peers] could not set node status: Patch "": dial tcp i/o timeout
INFO: 2021/03/08 15:21:40.255913 ->[] attempting connection
INFO: 2021/03/08 15:21:40.256478 ->[] error during connection attempt: dial tcp :0-> connect: connection refused
INFO: 2021/03/08 15:21:59.917279 Discovered remote MAC 4a:0d:3e:de:62:b4 at 1e:85:5b:9b:50:c5(poc-ctrl-1)
INFO: 2021/03/08 15:22:30.157989 ->[] attempting connection
INFO: 2021/03/08 15:22:30.158579 ->[] error during connection attempt: dial tcp :0-> connect: connection refused
INFO: 2021/03/08 15:23:25.508244 ->[] attempting connection
INFO: 2021/03/08 15:23:25.508785 ->[] error during connection attempt: dial tcp :0-> connect: connection refused
INFO: 2021/03/08 15:24:57.982083 ->[] attempting connection
INFO: 2021/03/08 15:24:57.982653 ->[] error during connection attempt: dial tcp :0-> connect: connection refused
INFO: 2021/03/08 15:26:10.300785 ->[] attempting connection
INFO: 2021/03/08 15:26:10.301685 ->[] error during connection attempt: dial tcp :0-> connect: connection refused
INFO: 2021/03/08 15:27:42.395131 ->[] attempting connection
INFO: 2021/03/08 15:27:42.395556 ->[] error during connection attempt: dial tcp :0-> connect: connection refused
INFO: 2021/03/08 15:34:00.374000 ->[] attempting connection
INFO: 2021/03/08 15:34:00.374547 ->[] error during connection attempt: dial tcp :0-> connect: connection refused
INFO: 2021/03/08 15:40:56.090626 ->[] attempting connection
INFO: 2021/03/08 15:40:56.091130 ->[] error during connection attempt: dial tcp :0-> connect: connection refused

All of the nodes have the 'br_netfilter' module loaded and net.bridge.bridge-nf-call-iptables = 1.
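
Both settings can be confirmed on each node with a check along the following lines. This is only a sketch, not necessarily the exact commands used here; the helper name module_loaded is made up for illustration, and the /proc path is the standard location the sysctl maps to.

```shell
# Succeeds if the named kernel module appears in lsmod-style output on stdin.
# (module_loaded is a hypothetical helper name, not part of any tool.)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Only query the live system where lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | module_loaded br_netfilter; then
        echo "br_netfilter: loaded"
    else
        echo "br_netfilter: NOT loaded"
    fi
fi

# The sysctl is only present once br_netfilter is loaded.
if [ -r /proc/sys/net/bridge/bridge-nf-call-iptables ]; then
    echo "net.bridge.bridge-nf-call-iptables = $(cat /proc/sys/net/bridge/bridge-nf-call-iptables)"
fi
```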

The IP is assigned to the kubernetes service on 443/tcp, and ports 6783/tcp and 678(3|4)/udp are used by weave. Given the outputs above, I get the feeling I have some iptables-related issue. Or could it be that the packets travel via the default route (on the eth0 interface)?

ip route gives:

default via dev eth0 proto dhcp src metric 100
dev eth0 proto kernel scope link src
dev eth0 proto dhcp scope link src metric 100
dev weave proto kernel scope link src
dev docker0 proto kernel scope link src linkdown
dev eth1 proto kernel scope link src

What have I missed here?

Solution 1:

After inspecting the iptables rules I got a feeling that the IP assigned to the k8s svc MUST be routed to the "wrong" interface. I issued

sudo ip route add dev eth1

and weave started!
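
To see which interface the kernel would pick for the service IP, before and after adding such a route, something like the sketch below may help. It assumes the service in question is the default 'kubernetes' service in the 'default' namespace; route_dev is a made-up helper name that just pulls the interface out of the 'ip route get' output.

```shell
# Print the interface name that follows "dev" in `ip route get` output on stdin.
# (route_dev is a hypothetical helper name, not part of iproute2.)
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Only query the live system where both tools are available.
if command -v kubectl >/dev/null 2>&1 && command -v ip >/dev/null 2>&1; then
    # Assumption: the API server is exposed via the default 'kubernetes' service.
    SVC_IP=$(kubectl get svc kubernetes -n default -o jsonpath='{.spec.clusterIP}')
    if [ -n "$SVC_IP" ]; then
        echo "service IP $SVC_IP leaves via: $(ip route get "$SVC_IP" | route_dev)"
    fi
fi
```

If this prints eth0 (the NAT interface) on a node, traffic to the service IP is following the default route rather than the host-only network on eth1.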