Pods in Kubernetes cannot communicate with other pods or with hosts outside the cluster
I have 2 master nodes, 3 worker nodes, and one HAProxy in front of the control plane. I have many Java microservices that communicate with each other and with a DB or Kafka outside of the Kubernetes cluster. Network access is any-to-any open on all hosts. I created a Deployment for every microservice. But when I exec into a container, I have no TCP connectivity on any port, neither between pods nor to the DB or Kafka outside of the Kubernetes cluster.
From the hosts I can telnet to the DB or Kafka, but from inside the pods I have no access.
From a host:
[root@master1 ~]# telnet oracle.local 1521
Trying 192.198.10.30...
Connected to oracle.local.
Escape character is '^]'.
^C^CConnection closed by foreign host.
[root@master1 ~]#
From a pod, for example busybox:
[root@master1 ~]# kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ # telnet 192.168.10.30 1521
telnet: can't connect to remote host (192.168.10.30): Connection timed out
Cluster status:
[root@master1 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master1.project.co Ready control-plane,master 11d v1.22.2 192.168.10.1 <none> Oracle Linux Server 8.3 5.4.17-2011.7.4.el8uek.x86_64 containerd://1.4.9
master2.project.co Ready control-plane,master 11d v1.22.2 192.168.10.2 <none> Oracle Linux Server 8.3 5.4.17-2011.7.4.el8uek.x86_64 containerd://1.4.9
worker1.project.co Ready <none> 11d v1.22.2 192.168.10.3 <none> Oracle Linux Server 8.3 5.4.17-2011.7.4.el8uek.x86_64 containerd://1.4.9
worker2.project.co Ready <none> 11d v1.22.2 192.168.10.4 <none> Oracle Linux Server 8.3 5.4.17-2011.7.4.el8uek.x86_64 containerd://1.4.9
worker3.project.co Ready <none> 11d v1.22.2 192.168.10.5 <none> Oracle Linux Server 8.3 5.4.17-2011.7.4.el8uek.x86_64 containerd://1.4.9
Describe pod busybox:
[root@master1 ~]# kubectl describe pod busybox
Name: busybox
Namespace: default
Priority: 0
Node: worker3.project.co/192.168.10.5
Start Time: Sat, 02 Oct 2021 10:27:05 +0330
Labels: run=busybox
Annotations: cni.projectcalico.org/containerID: 75d7222e8f402c68d9161a7b399df2de6b45e7194b2bb3b0b2730adbdac680c4
cni.projectcalico.org/podIP: 192.168.205.76/32
cni.projectcalico.org/podIPs: 192.168.205.76/32
Status: Pending
IP:
IPs: <none>
Containers:
busybox:
Container ID:
Image: busybox
Image ID:
Port: <none>
Host Port: <none>
Args:
sh
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-69snv (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-69snv:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 21s default-scheduler Successfully assigned default/busybox to worker3.project.co
Normal Pulling 20s kubelet Pulling image "busybox"
A number of reasons could cause this. Without proper insight into your cluster and network architecture it is hard to solve remotely, but here are some ideas (example checks for each follow after the list):
- Check whether any NetworkPolicies are applied by executing kubectl -n <namespace> get netpol. NetworkPolicies can restrict communication within the cluster as well as to the outside.
- Run a Pod with hostNetwork: true (not something to do in production, just as a test) and repeat the connectivity tests in both directions.
- Check whether your cluster's network is properly configured by tracing a network call. Are the routers configured correctly, and can they be reached by applications inside the cluster?
- Verify that your statement "network access is any to any open in all hosts" is actually true; this may well be a firewall configuration issue.
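For the NetworkPolicy check, it is usually quicker to list policies across all namespaces at once. Your pod annotations show Calico as the CNI, so Calico's own policy resources are worth a look too; the resource names below assume a manifest-based Calico install that exposes them under the crd.projectcalico.org group, so adjust if your setup differs.

# Kubernetes NetworkPolicies in every namespace
kubectl get networkpolicies --all-namespaces

# Calico-specific policies (assumes the crd.projectcalico.org CRDs are present)
kubectl get globalnetworkpolicies.crd.projectcalico.org
kubectl get networkpolicies.crd.projectcalico.org --all-namespaces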
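For the hostNetwork test, here is a minimal sketch of a host-network test pod, assuming the busybox image can be pulled on the nodes (the pod name hostnet-test is just an example). If telnet works from here but still times out from the normal busybox pod, the problem is inside the pod network (CNI/Calico) rather than on the path from the nodes to Oracle or Kafka.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostnet-test
spec:
  hostNetwork: true        # share the node's network namespace
  restartPolicy: Never
  containers:
  - name: shell
    image: busybox
    command: ["sleep", "3600"]
EOF

# repeat the connectivity test from inside the host-network pod
kubectl exec -it hostnet-test -- telnet 192.168.10.30 1521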
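For the tracing suggestion, you can compare what the pod sends with what actually leaves the node; tcpdump has to be installed on the node and the exact interfaces depend on your environment, so treat this only as a sketch.

# from inside the failing pod (busybox ships a traceroute applet)
kubectl exec -it busybox -- traceroute -n 192.168.10.30

# on the node that hosts the pod: do the SYN packets leave the host at all?
tcpdump -ni any 'host 192.168.10.30 and tcp port 1521'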
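To verify the "any to any" claim on Oracle Linux 8, check firewalld on every node and make sure the traffic Calico needs is not dropped; which ports matter depends on whether Calico runs in BGP, IPIP or VXLAN mode, so the list below is only indicative.

# on every node
systemctl is-active firewalld
firewall-cmd --list-all

# typical Calico requirements, depending on the mode in use:
#   BGP:   TCP 179
#   IPIP:  IP protocol 4
#   VXLAN: UDP 4789
# quick reachability probe between two nodes, e.g. for BGP:
telnet 192.168.10.3 179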
Bonus:
You seem to have only 2 master nodes, which does not make much sense if etcd is running within the Kubernetes cluster (kubectl -n kube-system get pods | grep etcd will show 2 pods if that is the case; a more explicit check is sketched below). Having 2 etcd members gives you exactly the same failure tolerance as a single-member cluster, but you waste resources on another VM that takes up memory, CPU, etc. Consider increasing your master nodes to 3 in order to tolerate the failure of one: a majority of the etcd cluster must always be running, and the majority of 2 is still 2.
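To see how many etcd members you actually have in a kubeadm-style setup (the pod name and certificate paths below are the kubeadm defaults and may differ in your cluster):

# static etcd pods created by kubeadm carry the component=etcd label
kubectl -n kube-system get pods -l component=etcd -o wide

# list the members through one of those pods
kubectl -n kube-system exec etcd-master1.project.co -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list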