Dask clients running in Kubeflow cannot communicate

I am using to following snippet to start an ephemeral Dask cluster within a node of a Kubeflow pipeline:

from dask_kubernetes import KubeCluster
from distributed import Client

cluster = KubeCluster(pod_template='worker-template.yaml', deploy_mode='local') 
client = Client(cluster)
cluster.scale(2)

The cluster seems to be starting:

distributed.scheduler - INFO - Register worker <WorkerState 'tcp://10.0.2.136:40103', name: 1, status: undefined, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.2.136:40103
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://10.0.3.180:38105', name: 0, status: undefined, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.3.180:38105
distributed.core - INFO - Starting established connection

However, as soon as I try to run any calculation or check the configuration with client.get_versions(check=True), I am getting the following error:

OSError: Timed out during handshake while connecting to tcp://10.0.2.136:40103 after 30 s

According to https://github.com/dask/dask-kubernetes/issues/197#issuecomment-548786829, it may be due to Istio blocking IP communications.

Is there a way to solve this issue (ideally, without changing the setup of Kubeflow and Kubernetes)?


Solution 1:

Try this: In worker has a label with this:

 sidecar.istio.io/inject=false