Ubuntu 20.04 routing for just one IP (in the same subnet) ends up at "dev lo" instead of "dev eth0"; Kubernetes worker node can't connect to master node
I've bumped into what now seems to be a routing issue. I can no longer access one of my worker nodes (servers) from my master node (server). AFAIK it has nothing to do with Kubernetes itself; it comes down to a pure Linux networking issue.
Since the problem affects only one IP, I started troubleshooting iptables: I enabled TRACE and saw that the packet actually arrives on the master (eth0) and passes through iptables (raw > mangle > nat), but when it should move on from nat to filter it simply disappears. As I understand it, that is the point where the kernel makes its routing decision. I checked the routing and found that it is broken for just that one IP; all other addresses from the same segment work fine!?
Since I'm with a cloud provider and can't troubleshoot the underlying network, I tried reinstalling the OS (the same Ubuntu 20.04) on the master node. With a fresh install the issue was not present, so it had to be a configuration issue on my master Linux server (I then reverted the server image back from the snapshot).
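For reference, the tracing mentioned above boiled down to something like this (xx.xx.xx.96 is a placeholder for the problem IP; adapt as needed and remove the rules when done):
# trace packets addressed to the problem IP as they pass through the tables
iptables -t raw -A PREROUTING -d xx.xx.xx.96 -j TRACE
iptables -t raw -A OUTPUT -d xx.xx.xx.96 -j TRACE
# with iptables-legacy the trace lines land in the kernel log
# (with the nft backend use: xtables-monitor --trace)
dmesg -w | grep -i trace
# ask the kernel directly which route it would choose for that address
ip route get xx.xx.xx.96
# clean up afterwards
iptables -t raw -D PREROUTING -d xx.xx.xx.96 -j TRACE
iptables -t raw -D OUTPUT -d xx.xx.xx.96 -j TRACE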
root@vmi57XXXX:~# route
Kernel IP routing table
Destination     Gateway          Genmask          Flags Metric Ref    Use Iface
default         gw.provider.net  0.0.0.0          UG    0      0        0 eth0
10.244.0.0      0.0.0.0          255.255.255.0    U     0      0        0 cni0
10.244.1.0      10.244.1.0       255.255.255.0    UG    0      0        0 flannel.1
172.17.0.0      0.0.0.0          255.255.0.0      U     0      0        0 docker0
root@vmi57XXXX:~# ip route get xx.xx.xx.96
local xx.xx.xx.96 dev lo src xx.xx.xx.96 uid 0
cache <local>
root@vmi57XXXX:~# ip route get xx.xx.xx.95
xx.xx.xx.95 via xx.xx.xx.1 dev eth0 src xx.xx.xx.95 uid 0
cache
root@vmi57XXXX:~# ip route get xx.xx.xx.97
xx.xx.xx.97 via xx.xx.xx.1 dev eth0 src xx.xx.xx.97 uid 0
cache
root@vmi57XXXX:~# arp -v
Address          HWtype  HWaddress           Flags Mask  Iface
10.244.0.60      ether   8a:94:de:43:b6:0f   C           cni0
10.244.0.63      ether   1e:76:6a:60:27:f3   C           cni0
10.244.0.62      ether   36:0b:19:5e:57:87   C           cni0
gw.provider.net  ether   00:c0:1d:c0:ff:ee   C           eth0
10.244.0.64      ether   82:03:61:c5:4d:fb   C           cni0
10.244.0.50              (incomplete)                    cni0
10.244.1.0       ether   52:3d:a5:f4:c2:2c   CM          flannel.1
10.244.0.61      ether   56:19:98:79:a1:3a   C           cni0
Entries: 8 Skipped: 0 Found: 8
root@vmi57XXXX:~# ip netconf show dev eth0
inet eth0 forwarding on rp_filter off mc_forwarding off proxy_neigh off
ignore_routes_with_linkdown off
inet6 eth0 forwarding off mc_forwarding off proxy_neigh off
ignore_routes_with_linkdown off
Any clues about what's going on over there are more than welcome!!!
Thanks
EDIT: After solving the issue it's worth mentioning that this behavior was experienced with Kubernetes 1.21.2-00 and flannel as the CNI. I did the upgrade a few weeks ago, and this was the first restart of a worker node since then.
Solution 1:
SOLVED!
The bad guy was actually Kubernetes: it had set a LOCAL route on the master node that only works while Kubernetes' networking service (flannel, in my case) is functional. So when the worker node got rebooted, it could no longer reach the master node's API service (6443/tcp) and couldn't register itself with the API, which closed the vicious circle the worker node kept looping in with no luck.
I learned today about "local" routes maintained by the kernel (the routing tables present on the system are listed in /etc/iproute2/rt_tables).
ip route ls table local
local xx.xx.xx.96 dev kube-ipvs0 proto kernel scope host src xx.xx.xx.96 <<< PROBLEMATIC
local xx.xx.xx.50 dev eth0 proto kernel scope host src xx.xx.xx.50 <<< i.e. OK
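For context: the local table is consulted before the main table, which is why this single entry wins over the perfectly good eth0 route. The lookup order can be seen with ip rule (this is the stock layout; priorities may differ on customised systems):
ip rule show
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default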
Delete the offending route:
ip route del table local local xx.xx.xx.96 dev kube-ipvs0 proto kernel scope host src xx.xx.xx.96
and now it works:
root@vmi57XXXX:~# ip route get xx.xx.xx.96
xx.xx.xx.96 via xx.xx.xx.1 dev eth0 src xx.xx.xx.50 uid 0
cache
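As a final sanity check (hypothetical commands, using the master's eth0 address xx.xx.xx.50 from the output above), the rebooted worker can reach the API server again:
# from the worker node: is the master's API port reachable again?
nc -vz xx.xx.xx.50 6443
# or probe the apiserver health endpoint (may return 401/403 if anonymous auth
# is disabled, but getting any TLS response at all means routing is fixed)
curl -k https://xx.xx.xx.50:6443/healthz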