Why Kubernetes Clusters Should Have an Odd Number of Nodes

In Kubernetes, clusters often use an odd number of nodes (such as 3, 5, or 7) for components that rely on consensus algorithms, particularly etcd. The rule applies to the nodes that run etcd (typically the control-plane nodes), not to worker nodes. This practice is rooted in the need to ensure high availability and fault tolerance while maintaining data consistency. Let’s break down the rationale by looking at the Raft consensus algorithm.

For a deeper dive into Kubernetes architecture, check out our article, Understanding the Components of Kubernetes Architecture.

Understanding Consensus in etcd and Kubernetes

What is etcd in Kubernetes?

  • etcd is a distributed key-value store that holds the state and configuration data of a Kubernetes cluster.
  • etcd uses the Raft consensus algorithm to ensure that data remains consistent across multiple nodes, even when failures occur.

How the Raft Consensus Algorithm Works

The Raft algorithm ensures that a cluster can agree on a single source of truth (state) even if some nodes fail. To commit a decision (a write or update), a quorum, meaning a majority of the nodes, must agree.

Formula for Quorum:

Quorum = floor(N / 2) + 1

Where N is the total number of nodes and floor() rounds down to the nearest whole number.
Example for 3 nodes: Quorum = floor(3 / 2) + 1 = 1 + 1 = 2
Example for 5 nodes: Quorum = floor(5 / 2) + 1 = 2 + 1 = 3
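
To make the arithmetic concrete, here is a minimal Python sketch of the quorum formula. The function names `quorum` and `fault_tolerance` are hypothetical helpers chosen for illustration and are not part of etcd or Kubernetes. Integer division (`//`) does the rounding down.

```python
def quorum(n: int) -> int:
    """Smallest majority of an n-node cluster: floor(n / 2) + 1."""
    if n < 1:
        raise ValueError("a cluster needs at least one node")
    return n // 2 + 1


def fault_tolerance(n: int) -> int:
    """Number of nodes that can fail while a quorum is still reachable."""
    return n - quorum(n)


# Print quorum and tolerated failures for cluster sizes 3 through 7.
for size in range(3, 8):
    print(f"{size} nodes: quorum={quorum(size)}, "
          f"can lose {fault_tolerance(size)}")
```

Running it shows that going from 3 to 4 nodes raises the quorum but not the number of failures the cluster can survive.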

Why Does an Odd Number Maximize Fault Tolerance?

When using an odd number of nodes:

  1. Maximized Fault Tolerance:
    • In a 3-node cluster, 1 node can fail, and the remaining 2 nodes still form a quorum.
    • In a 5-node cluster, 2 nodes can fail, and the remaining 3 nodes still form a quorum.
  2. Avoid Split-Brain Scenarios:
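    • With an even number of nodes (e.g., 4), if half the nodes fail or a network partition splits the cluster evenly (2 and 2), neither half can form a majority. Raft then refuses new writes rather than risk a split-brain scenario, in which both sides would accept conflicting updates. The data stays consistent, but the cluster is unavailable for writes until a majority is restored.
    • With an odd number of nodes, a two-way partition can never be an equal split, so one side always holds a majority and keeps working.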
  3. Efficiency:
    • An odd-sized cluster tolerates as many failures as the next larger even-sized cluster, so it reaches each level of fault tolerance with the fewest nodes.
    • Adding nodes beyond what’s necessary increases complexity without a proportional benefit.

Cluster Size   Quorum   Nodes That Can Fail   Fault Tolerance
3 Nodes        2        1                     High
4 Nodes        3        1                     No improvement
5 Nodes        3        2                     Higher
6 Nodes        4        2                     No improvement
7 Nodes        4        3                     Even Higher
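
The even-split case from the list above can also be checked by brute force. The following Python sketch is a simplified model, not etcd code: it assumes every node is healthy and a network partition divides the cluster into exactly two groups. The names `sides_with_quorum` and `has_deadlocked_split` are hypothetical.

```python
def quorum(n: int) -> int:
    """Smallest majority of an n-node cluster."""
    return n // 2 + 1


def sides_with_quorum(n: int, left: int) -> int:
    """Count the sides of a two-way partition that can still reach quorum.

    `left` nodes end up on one side and `n - left` on the other.
    """
    if not 0 <= left <= n:
        raise ValueError("left must be between 0 and n")
    q = quorum(n)
    return sum(1 for side in (left, n - left) if side >= q)


def has_deadlocked_split(n: int) -> bool:
    """True if some two-way partition leaves no side with a quorum."""
    return any(sides_with_quorum(n, left) == 0 for left in range(n + 1))


for n in (3, 4, 5, 6, 7):
    print(f"{n} nodes: a split with no majority is possible = "
          f"{has_deadlocked_split(n)}")
```

In this model at most one side can ever hold a quorum, which is why a majority rule prevents two halves from accepting conflicting writes. Only even sizes have a partition in which neither side does.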

Using an odd number of etcd nodes in Kubernetes helps the cluster remain resilient and available during node failures. Because Raft requires a majority for every write, the cluster cannot split into conflicting halves, and an odd cluster size gets the most fault tolerance out of each node.
