Kube-Reserved/System-Reserved vs Eviction Threshold

I'd like to properly prepare our self-managed clusters for resource pressure scenarios. From the docs I cannot understand the need for configuring the --eviction-hard parameter, when we can achieve the same effect by setting up proper values via --kube-reserved for kubelet and --system-reserved for system daemons.

Let me put it the other way around. Why would I need to set the reservations for the kubelet and system daemons when it seems it would suffice to configure --eviction-hard? Whenever there is resource pressure, that threshold alone should be enough to trigger a pod eviction. So what is the reason for having separate reservation options for the kubelet and system daemons?


As per official documentation:

Node Allocatable

'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now.

Node allocatable (the resources the scheduler can use to place workloads) is defined in the same documentation as:

  • Node allocatable = Node capacity - kube-reserved - system-reserved - hard eviction threshold
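To make the arithmetic concrete, here is a minimal sketch of that calculation for CPU and memory. The `Resources` type and the `node_allocatable` function are hypothetical helpers written for illustration, not part of any Kubernetes API, and the numbers are just sample values for a 16-CPU, 32Gi node. Following the documentation, the hard eviction threshold for memory is subtracted as well.

```python
from dataclasses import dataclass

MI = 1024 ** 2  # bytes in one mebibyte
GI = 1024 ** 3  # bytes in one gibibyte


@dataclass(frozen=True)
class Resources:
    """Hypothetical container for a CPU/memory pair (illustration only)."""
    cpu_millicores: int = 0
    memory_bytes: int = 0


def node_allocatable(capacity, kube_reserved, system_reserved,
                     eviction_hard_memory_bytes):
    """Allocatable = capacity - kube-reserved - system-reserved - hard eviction threshold.

    The hard eviction threshold only applies to memory here, because there
    is no CPU eviction signal.
    """
    cpu = (capacity.cpu_millicores
           - kube_reserved.cpu_millicores
           - system_reserved.cpu_millicores)
    memory = (capacity.memory_bytes
              - kube_reserved.memory_bytes
              - system_reserved.memory_bytes
              - eviction_hard_memory_bytes)
    # Clamp at zero so over-reservation never yields a negative value.
    return Resources(max(cpu, 0), max(memory, 0))


# Sample values (assumptions): 16 CPUs and 32Gi of memory on the node,
# kube-reserved cpu=1,memory=2Gi, system-reserved cpu=500m,memory=1Gi,
# eviction-hard memory.available<500Mi.
capacity = Resources(cpu_millicores=16_000, memory_bytes=32 * GI)
kube_reserved = Resources(cpu_millicores=1_000, memory_bytes=2 * GI)
system_reserved = Resources(cpu_millicores=500, memory_bytes=1 * GI)

alloc = node_allocatable(capacity, kube_reserved, system_reserved, 500 * MI)
print(f"allocatable CPU: {alloc.cpu_millicores}m")
print(f"allocatable memory: {alloc.memory_bytes / GI:.2f}Gi")
```

With these sample values the scheduler would see 14.5 CPUs and roughly 28.5Gi of memory as allocatable, so pod requests can never add up to the full node capacity.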

The two reservations are described as follows:

  • kube-reserved:

kube-reserved is meant to capture resource reservation for kubernetes system daemons like the kubelet, container runtime, node problem detector, etc. It is not meant to reserve resources for system daemons that are run as pods. kube-reserved is typically a function of pod density on the nodes.

-- Kubernetes.io: Docs: Tasks: Administer cluster: Reserve compute resources: Kube reserved

  • system-reserved:

system-reserved is meant to capture resource reservation for OS system daemons like sshd, udev, etc. system-reserved should reserve memory for the kernel too since kernel memory is not accounted to pods in Kubernetes at this time. Reserving resources for user login sessions is also recommended (user.slice in systemd world).

-- Kubernetes.io: Docs: Tasks: Administer cluster: Reserve compute resources: System reserved

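In short, the reservations act at scheduling time, while eviction is a reaction after the fact. If you do not reserve enough resources for system components and the kubelet, the scheduler counts those resources as allocatable and pods can starve the very components that keep the node running.

You can even end up in a situation where the eviction handler never comes into play because the node has already become unstable.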

It is also worth mentioning that:

With --kube-reserved and --system-reserved you can reserve the CPU needed by those components, whereas --eviction-hard has no CPU signal at all: its thresholds cover resources such as memory and ephemeral storage (disk space and inodes).
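The difference shows up directly in the flag values. The sketch below parses sample flag values with two hypothetical helper functions (`parse_reserved` and `parse_eviction_hard` are written for this illustration and are not kubelet code) and checks which mechanism mentions CPU. The flag values themselves are assumed examples, not recommendations.

```python
def parse_reserved(value):
    """Parse a reservation flag value such as 'cpu=1,memory=2Gi' into a dict."""
    return dict(item.split("=", 1) for item in value.split(",") if item)


def parse_eviction_hard(value):
    """Parse an eviction flag value such as 'memory.available<500Mi' into a dict."""
    return dict(item.split("<", 1) for item in value.split(",") if item)


# Sample flag values (assumptions for illustration).
kube_reserved = parse_reserved("cpu=1,memory=2Gi,ephemeral-storage=1Gi")
system_reserved = parse_reserved("cpu=500m,memory=1Gi,ephemeral-storage=1Gi")
eviction_hard = parse_eviction_hard("memory.available<500Mi,nodefs.available<10%")

# Resources named by the reservations vs. resources named by eviction signals.
reserved_resources = set(kube_reserved) | set(system_reserved)
eviction_resources = {signal.split(".", 1)[0] for signal in eviction_hard}

cpu_covered_by_reservation = "cpu" in reserved_resources
cpu_covered_by_eviction = "cpu" in eviction_resources

print("CPU protected by reservations:", cpu_covered_by_reservation)
print("CPU protected by eviction-hard:", cpu_covered_by_eviction)
```

In this example only the reservations mention CPU; the eviction thresholds refer to memory.available and nodefs.available, so CPU headroom for the kubelet and system daemons has to come from the reservations.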