Can Kubernetes cluster nodes be geographically spread out?
A couple of friends and I are thinking of setting up a Kubernetes cluster where our home servers will act as the nodes.
As our nodes will be spread out between our apartments, I am worried this will create problems when it comes to:
- Exposing services outward, since the public IPs of the nodes will be different.
- Network speed and latency between distributed storage services, since we rely on our ISP network connections to communicate between nodes. We all have around 100 to 250 Mbit/s up/down, but handling large storage volumes might still become a problem here.
What challenges do we face when spreading our nodes like this?
Are there any specific functions, tutorials or guides I can read into to learn more about how to solve this?
Is our idea even viable at all?
I'm personally very new to Kubernetes, but am very excited about it.
I am thankful for any answers.
Solution 1:
Posting this answer as a community wiki, as the topic described in the question is broad without an exact specification and won't have a definitive answer.
First of all, you'll need to know the exact requirements for your cluster, for example what kind of control plane you would like to build. There are multiple options (single master, multiple masters). You can refer to the official documentation:
- Kubernetes.io: Docs: Concepts: Architecture
If you would like to solve a particular task, you'd be better off including it in the question, as the potential issues differ between workloads.
To begin with, you could look at the kubeadm requirements for Kubernetes clusters:
- Kubernetes.io: Docs: Setup: Production environment: Kubeadm: Before you begin
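Since the nodes will sit behind different home routers, it can help to verify early that they can actually reach each other on the ports kubeadm expects. The following is a minimal sketch, not an official tool: the port numbers are the kubeadm defaults as documented on the page above, while `port_is_open` and `check_node` are hypothetical helper names. It only tests TCP reachability and says nothing about UDP traffic that a CNI overlay might need.

```python
import socket

# Default TCP ports from the kubeadm "Ports and Protocols" documentation.
# Adjust if your cluster uses non-default ports.
CONTROL_PLANE_PORTS = {
    6443: "kube-apiserver",
    2379: "etcd client API",
    2380: "etcd peer API",
    10250: "kubelet API",
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_node(host: str, ports=CONTROL_PLANE_PORTS, timeout: float = 2.0) -> dict:
    """Map each port to whether it is reachable on the given host."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling `check_node("10.8.0.1")` from another node (where `10.8.0.1` stands in for whatever address your friend's control plane node has on your shared network) returns a dict such as `{6443: True, 2379: False, ...}`, which makes it easy to spot a firewall or NAT problem before running `kubeadm join`.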
Some of the potential issues you can encounter:
- Controllers and Services are designed to spread traffic evenly between pods. You can run into network delays when traffic hops between nodes in different locations.
- Kubernetes components have predefined timeouts for their operations. With a geographically distributed cluster you'll need to tune them heavily to get predictable behavior without timeouts during cluster operations.
- NAT. All Kubernetes nodes are supposed to connect to other nodes without any address translation. You may end up needing to build a reliable VPN connection between sites.
- You could run into issues with etcd when building a highly available Kubernetes cluster, specifically with the synchronization between the etcd members. Any lost connection could lead to a lost quorum. Without quorum, no changes to the cluster are possible, and in the worst case this can break the cluster.
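To make the quorum and timeout points more concrete, here is a small sketch. The quorum arithmetic (a majority of members) is how etcd's Raft consensus works. The timing helper follows the rule of thumb from the etcd tuning guide (heartbeat interval around the round-trip time between members, election timeout at least 10 times the round-trip time); using etcd's defaults of 100 ms and 1000 ms as a floor is my own assumption, and the function names are hypothetical.

```python
import math


def quorum(members: int) -> int:
    """Number of etcd members that must agree: a strict majority."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while the cluster still has quorum."""
    return members - quorum(members)


def suggest_etcd_timing(rtt_ms: float) -> tuple:
    """Suggest (heartbeat_ms, election_timeout_ms) from a measured RTT.

    Assumption: never go below etcd's defaults (100 ms / 1000 ms).
    The values map to the --heartbeat-interval and --election-timeout flags.
    """
    rtt = math.ceil(rtt_ms)
    heartbeat = max(100, rtt)
    election = max(1000, 10 * rtt)
    return heartbeat, election


for n in range(1, 6):
    print(f"{n} members: quorum={quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

With three members (say, one per apartment) the cluster survives one home going offline, but not two. An even member count adds no extra tolerance: four members still only tolerate one failure. Measure the actual round-trip time between your homes (for example with `ping`) before picking timing values.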
As for storage, I reckon there could be issues with data replication between sites (latency between sites, the amount of data sent between them, the type of storage solution used, and ensuring all of the pods have access to the same data at the same time). You can look into the official documentation for storage concepts in Kubernetes:
- Kubernetes.io: Docs: Concepts: Storage
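For a rough feel of what the 100 to 250 Mbit/s links mentioned in the question mean for replicating a volume between sites, a back-of-the-envelope estimate can be sketched as follows. The `efficiency` factor of 0.8 is an assumption to account for protocol and VPN overhead, not a measured value, and the 100 GB volume size is just an example.

```python
def transfer_seconds(size_gb: float, link_mbit_s: float, efficiency: float = 0.8) -> float:
    """Estimate the time to copy size_gb (decimal gigabytes) over a link.

    link_mbit_s is the slower of the sender's upload and the receiver's
    download rate. efficiency is an assumed fraction of the nominal rate
    that is actually usable for payload data.
    """
    bits = size_gb * 8 * 1000**3
    usable_bits_per_second = link_mbit_s * 1_000_000 * efficiency
    return bits / usable_bits_per_second


for rate in (100, 250):
    hours = transfer_seconds(100, rate) / 3600
    print(f"100 GB at {rate} Mbit/s: about {hours:.1f} h")
```

Under these assumptions a full resync of a 100 GB volume takes roughly 2.8 hours at 100 Mbit/s and a bit over an hour at 250 Mbit/s, during which the link is saturated. That is usually fine for occasional backups, but it is a real constraint for storage systems that replicate every write synchronously.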