Kubernetes Taints and Tolerations with a Twist of Real Life

Have you ever walked into a party where there’s a “VIP” section? The other guests are welcome everywhere else, but only those with the right wristband are allowed to sit there.

This is essentially how Taints and Tolerations work in Kubernetes.

Real-Life Analogy: The “No Pets” Couch

Let’s say you have a house with multiple couches. You love dogs, but one of your couches is made of expensive suede and is marked with a sign: “No Pets Allowed.”

Your pet is still free to roam the rest of the house—just not on that couch.

But wait, what if your friend has a small, well-behaved dog wearing a tag that says “Allowed on Suede Couch”? That dog is now tolerated on the special couch.

In Kubernetes:

  • The couch is the node.
  • The “No Pets Allowed” sign is a taint.
  • The “Allowed on Suede Couch” tag is a toleration.

What are Taints?

Taints allow a node to repel certain pods unless they explicitly tolerate the taint. They consist of three components:

  • Key: Identifier for the taint (e.g., dedicated, gpu, spot).
  • Value (optional): Provides more context for the key (e.g., high-memory, nvidia).
  • Effect: Determines the scheduling behavior, can be one of:
    • NoSchedule: Prevent pods without matching tolerations from scheduling onto the node.
    • PreferNoSchedule: Kubernetes will try to avoid scheduling pods without matching tolerations onto the node.
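    • NoExecute: New pods without matching tolerations are not scheduled onto the node, and pods already running there without a matching toleration are evicted. A pod whose matching toleration sets tolerationSeconds is evicted once that period has passed.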

Example of applying a taint to a node:

kubectl taint nodes <node-name> key=value:effect
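To make the placeholder form concrete, here is a sketch of how a taint appears on the node object once applied. It assumes a hypothetical node named example-node tainted with dedicated=gpu:NoSchedule; running kubectl get node example-node -o yaml should show something along these lines under spec.taints (other fields omitted):

```yaml
# Excerpt of a Node object after tainting (illustrative; node name is hypothetical)
apiVersion: v1
kind: Node
metadata:
  name: example-node
spec:
  taints:
  - key: dedicated      # taint key
    value: gpu          # optional value
    effect: NoSchedule  # one of NoSchedule, PreferNoSchedule, NoExecute
```

Appending a hyphen to the same taint specification, as in kubectl taint nodes example-node dedicated=gpu:NoSchedule-, removes the taint again.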

What are Tolerations?

Tolerations enable a pod to be scheduled onto a node with matching taints. They don’t force the pod onto specific nodes but rather permit it.

Tolerations have these components:

  • Key: Matches the taint key on the node.
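  • Operator (optional): Can be either Equal (default) or Exists. Exists matches any value for the key.
  • Value (optional): Matches the taint value on the node. Used with the Equal operator; it must be omitted when the operator is Exists.
  • Effect (optional): If set, must match the taint effect (NoSchedule, PreferNoSchedule, or NoExecute). If omitted, the toleration matches every effect for the key.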
  • tolerationSeconds (optional, for NoExecute only): Defines how long a pod can remain on the node after a taint with effect NoExecute is applied before eviction.

Example pod definition with tolerations:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: example
    image: nginx
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
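The Exists operator and the optional effect field can be sketched as follows. This is a fragment of a pod spec rather than a full manifest, each entry is an independent alternative, and the key dedicated is only an illustrative choice:

```yaml
# Fragment of a pod spec: alternative ways to write a toleration
tolerations:
# Tolerates any taint with key "dedicated" and effect NoSchedule, whatever its value
- key: "dedicated"
  operator: "Exists"
  effect: "NoSchedule"
# Tolerates taints with key "dedicated" for every effect (effect omitted)
- key: "dedicated"
  operator: "Exists"
# Tolerates every taint (empty key with Exists); use sparingly
- operator: "Exists"
```

The last form effectively opts the pod out of taint-based repulsion altogether, so it is usually kept for system-level agents that must run on every node.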

Practical Use Case

Suppose you have nodes with high-performance GPUs reserved only for Machine Learning (ML) workloads. To ensure regular workloads do not occupy these GPU nodes:

Step 1: Taint the GPU node
kubectl taint nodes gpu-node dedicated=gpu:NoSchedule

This command means pods without tolerations for this taint won’t be scheduled onto the GPU node.

Step 2: Configure ML Pods to Tolerate the GPU Node
apiVersion: v1
kind: Pod
metadata:
  name: ml-training-pod
spec:
  containers:
  - name: training-container
    image: my-ml-image:latest
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"

Now, this ML pod specifically tolerates the taint and can be scheduled onto your GPU-equipped node.
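Because a toleration only permits placement, the pod above could still land on an ordinary, untainted node. One common way to steer it is to pair the toleration with a node label and a nodeSelector. The sketch below assumes the GPU node has been labeled hardware=gpu (for example with kubectl label nodes gpu-node hardware=gpu); that label is an assumption for illustration and is not part of the setup above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-training-pod
spec:
  # Assumed label: only nodes labeled hardware=gpu are eligible
  nodeSelector:
    hardware: gpu
  containers:
  - name: training-container
    image: my-ml-image:latest
  # Same toleration as before: lets the pod onto the tainted GPU node
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
```

With both in place, the taint keeps other pods off the GPU node and the selector keeps this pod on it.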

Understanding Taint Effects

Kubernetes supports three taint effects:

  • NoSchedule: Pods without matching tolerations won’t be scheduled.
  • PreferNoSchedule: Kubernetes tries to avoid placing pods without matching tolerations but might still do so.
  • NoExecute: Pods without matching tolerations are not scheduled, and existing ones are evicted immediately (or after tolerationSeconds, if a matching toleration sets it).

Example: Node Maintenance

When performing node maintenance, you may want to remove existing workloads gracefully:

kubectl taint nodes maintenance-node maintenance=true:NoExecute

Any pods without matching tolerations will be evicted immediately.

Pods can specify tolerations with an optional eviction grace period:

apiVersion: v1
kind: Pod
metadata:
  name: database-pod
spec:
  containers:
  - name: database-container
    image: database:latest
  tolerations:
  - key: "maintenance"
    operator: "Equal"
    value: "true"
    effect: "NoExecute"
    tolerationSeconds: 300

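In this example, the pod stays bound to the node for 5 minutes (300 seconds) after the maintenance taint is applied and is then evicted, giving it time to wind down gracefully. If the taint is removed before the time is up, the pod is not evicted.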
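For contrast, a toleration that omits tolerationSeconds keeps the pod on the tainted node indefinitely. A minimal sketch, assuming a hypothetical monitoring agent that should keep running during maintenance (the pod and image names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: monitoring-agent             # hypothetical pod name
spec:
  containers:
  - name: agent
    image: monitoring-agent:latest   # placeholder image
  tolerations:
  - key: "maintenance"
    operator: "Equal"
    value: "true"
    effect: "NoExecute"
    # No tolerationSeconds: the pod is not evicted while the taint is present
```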

Final Thoughts

Taints and Tolerations are essential tools for workload isolation in Kubernetes. Use them when you want to:

  • Reserve nodes for special workloads
  • Protect sensitive or resource-intensive hardware
  • Gradually evict pods before maintenance
  • Control which pods go where
