Dask clients running in Kubeflow cannot communicate
I am using to following snippet to start an ephemeral Dask cluster within a node of a Kubeflow pipeline:
from dask_kubernetes import KubeCluster
from distributed import Client
cluster = KubeCluster(pod_template='worker-template.yaml', deploy_mode='local')
client = Client(cluster)
cluster.scale(2)
The cluster seems to be starting:
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://10.0.2.136:40103', name: 1, status: undefined, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.2.136:40103
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://10.0.3.180:38105', name: 0, status: undefined, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.0.3.180:38105
distributed.core - INFO - Starting established connection
However, as soon as I try to run any calculation or check the configuration with client.get_versions(check=True)
, I am getting the following error:
OSError: Timed out during handshake while connecting to tcp://10.0.2.136:40103 after 30 s
According to https://github.com/dask/dask-kubernetes/issues/197#issuecomment-548786829, it may be due to Istio blocking IP communications.
Is there a way to solve this issue (ideally, without changing the setup of Kubeflow and Kubernetes)?
Solution 1:
Try this: In worker has a label with this:
sidecar.istio.io/inject=false