Pods in Kubernetes cannot communicate with other pods or with hosts outside the cluster
I have 2 master nodes, 3 worker nodes, and one HAProxy in front of the control plane. I have many Java microservices that communicate with each other and with a DB or Kafka outside of the Kubernetes cluster. Network access is any-to-any open on all hosts. I created a Deployment for every microservice. But when I exec into a container, I have no TCP connectivity on any port, neither between pods nor to the DB or Kafka outside of the Kubernetes cluster.
From the hosts I can telnet to the DB or Kafka, but from inside the pods I have no access.
From a host:
[root@master1 ~]# telnet oracle.local 1521
Trying 192.198.10.30...
Connected to oracle.local.
Escape character is '^]'.
^C^CConnection closed by foreign host.
[root@master1 ~]#
From a pod, for example busybox:
[root@master1 ~]# kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ # telnet 192.168.10.30 1521
telnet: can't connect to remote host (192.168.10.30): Connection timed out
Cluster status:
[root@master1 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master1.project.co Ready control-plane,master 11d v1.22.2 192.168.10.1 <none> Oracle Linux Server 8.3 5.4.17-2011.7.4.el8uek.x86_64 containerd://1.4.9
master2.project.co Ready control-plane,master 11d v1.22.2 192.168.10.2 <none> Oracle Linux Server 8.3 5.4.17-2011.7.4.el8uek.x86_64 containerd://1.4.9
worker1.project.co Ready <none> 11d v1.22.2 192.168.10.3 <none> Oracle Linux Server 8.3 5.4.17-2011.7.4.el8uek.x86_64 containerd://1.4.9
worker2.project.co Ready <none> 11d v1.22.2 192.168.10.4 <none> Oracle Linux Server 8.3 5.4.17-2011.7.4.el8uek.x86_64 containerd://1.4.9
worker3.project.co Ready <none> 11d v1.22.2 192.168.10.5 <none> Oracle Linux Server 8.3 5.4.17-2011.7.4.el8uek.x86_64 containerd://1.4.9
Describe pod busybox:
[root@master1 ~]# kubectl describe pod busybox
Name: busybox
Namespace: default
Priority: 0
Node: worker3.project.co/192.168.10.5
Start Time: Sat, 02 Oct 2021 10:27:05 +0330
Labels: run=busybox
Annotations: cni.projectcalico.org/containerID: 75d7222e8f402c68d9161a7b399df2de6b45e7194b2bb3b0b2730adbdac680c4
cni.projectcalico.org/podIP: 192.168.205.76/32
cni.projectcalico.org/podIPs: 192.168.205.76/32
Status: Pending
IP:
IPs: <none>
Containers:
busybox:
Container ID:
Image: busybox
Image ID:
Port: <none>
Host Port: <none>
Args:
sh
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-69snv (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-69snv:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 21s default-scheduler Successfully assigned default/busybox to worker3.project.co
Normal Pulling 20s kubelet Pulling image "busybox"
A number of reasons could cause this. Without proper insight into your cluster and network architecture it is hard to solve remotely, but here are some ideas (example checks for each follow after the list):
- Check whether any NetworkPolicies are applied by executing kubectl -n <namespace> get netpol. NetworkPolicies can restrict communication within the cluster as well as to the outside.
- Run a Pod with hostNetwork: true (not something to do in production, just as a test) and repeat the connectivity tests in both directions.
- Check whether your cluster's network is properly configured by tracing a network call. Are the routers configured correctly, and can they be reached by applications inside the cluster?
- Verify that your statement "network access is any to any open in all hosts" is actually true; this may well be a firewall configuration issue.
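For the NetworkPolicy check, it is usually quicker to list policies across all namespaces at once. Your pod annotations show Calico as the CNI, so Calico's own policy resources are worth a look too; the resource names below assume a manifest-based Calico install that exposes them under the crd.projectcalico.org group, so adjust if your setup differs.

# Kubernetes NetworkPolicies in every namespace
kubectl get networkpolicies --all-namespaces

# Calico-specific policies (assumes the crd.projectcalico.org CRDs are present)
kubectl get globalnetworkpolicies.crd.projectcalico.org
kubectl get networkpolicies.crd.projectcalico.org --all-namespaces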
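For the hostNetwork test, here is a minimal sketch of a host-network test pod, assuming the busybox image can be pulled on the nodes (the pod name hostnet-test is just an example). If telnet works from here but still times out from the normal busybox pod, the problem is inside the pod network (CNI/Calico) rather than on the path from the nodes to Oracle or Kafka.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostnet-test
spec:
  hostNetwork: true        # share the node's network namespace
  restartPolicy: Never
  containers:
  - name: shell
    image: busybox
    command: ["sleep", "3600"]
EOF

# repeat the connectivity test from inside the host-network pod
kubectl exec -it hostnet-test -- telnet 192.168.10.30 1521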
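For the tracing suggestion, you can compare what the pod sends with what actually leaves the node; tcpdump has to be installed on the node and the exact interfaces depend on your environment, so treat this only as a sketch.

# from inside the failing pod (busybox ships a traceroute applet)
kubectl exec -it busybox -- traceroute -n 192.168.10.30

# on the node that hosts the pod: do the SYN packets leave the host at all?
tcpdump -ni any 'host 192.168.10.30 and tcp port 1521'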
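To verify the "any to any" claim on Oracle Linux 8, check firewalld on every node and make sure the traffic Calico needs is not dropped; which ports matter depends on whether Calico runs in BGP, IPIP or VXLAN mode, so the list below is only indicative.

# on every node
systemctl is-active firewalld
firewall-cmd --list-all

# typical Calico requirements, depending on the mode in use:
#   BGP:   TCP 179
#   IPIP:  IP protocol 4
#   VXLAN: UDP 4789
# quick reachability probe between two nodes, e.g. for BGP:
telnet 192.168.10.3 179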
Bonus:
You seem to have only 2 master nodes, which does not make much sense if etcd is running within the Kubernetes cluster (kubectl -n kube-system get pods | grep etcd will show 2 pods if that is the case; a more explicit check is sketched below). Having 2 etcd members gives you exactly the same failure tolerance as a single-member cluster, but you waste resources on another VM that takes up memory, CPU, etc. Consider increasing your master nodes to 3 in order to tolerate the failure of one: a majority of the etcd cluster must always be running, and the majority of 2 is still 2.
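To see how many etcd members you actually have in a kubeadm-style setup (the pod name and certificate paths below are the kubeadm defaults and may differ in your cluster):

# static etcd pods created by kubeadm carry the component=etcd label
kubectl -n kube-system get pods -l component=etcd -o wide

# list the members through one of those pods
kubectl -n kube-system exec etcd-master1.project.co -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list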