Rearchitecting Kubernetes for the Edge: The paper

Nov 29, 2024

Does strong consistency slow down Kubernetes?

The Rearchitecting Kubernetes for the Edge paper looks at how the strong consistency requirements of etcd, the distributed key-value store powering Kubernetes, can impact performance in Kubernetes environments. This performance degradation is particularly noticeable in edge scenarios, where Kubernetes clusters often consist of thousands of small nodes with limited RAM and CPU. My key takeaway from the paper is that while etcd may appear to be a bottleneck, its strong consistency is a necessary trade-off to maintain the core reliability and features Kubernetes promises.

Paper Summary

Containers and container orchestration have become widespread in the industry, and edge computing use cases for containerized environments are becoming more common. These edge environments typically consist of thousands of small nodes with limited resources like RAM and CPU. Kubernetes is the most popular container orchestration platform, and it relies on etcd to store the current and desired state of the cluster. This puts etcd on the path of every request related to the cluster. etcd is a strongly consistent, distributed key-value store, so every component that reads from it sees the same, up-to-date view of the cluster state.

However, this strong consistency requirement introduces challenges. Every operation on the cluster, including pod scheduling, requires multiple interactions with etcd to read and write the state. Since etcd is strongly consistent and requires replication for fault tolerance, this process is resource-intensive and can slow down performance, especially in environments with limited resources like edge nodes.

A key insight from the paper is:

Ultimately Kubernetes is limited by a fundamental design decision: the reliance on strong consistency in the datastore.

In search of a better solution, the paper proposes replacing etcd with an eventually consistent datastore that would avoid the overhead of maintaining strong consistency.

Etcd’s role in the cluster:

Figure depicting requests required for a pod to get scheduled

In a Kubernetes cluster, the current and desired state of the system is stored in etcd. The workflow for scheduling a new pod involves several steps where etcd is accessed multiple times for reads and writes.

A pod is requested, and the system reads from etcd to determine the current configuration.
The configuration is passed to the controller, which decides what actions need to be taken.
The scheduler then selects which node should run the pod.
etcd is updated, through the kubelet, to reflect the new state of the pod on the selected node.
Once the pod has been scheduled, periodic updates are also written to etcd to keep the pod's status up to date.
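To make the number of datastore round trips concrete, here is a minimal Python sketch of the steps above. It is a toy model, not Kubernetes code: `CountingStore`, `schedule_pod`, the key layout, and the choice of two status updates are all hypothetical names and values I picked for illustration, and a real cluster goes through the API server with many more requests.

```python
class CountingStore:
    """Hypothetical in-memory stand-in for etcd that counts reads and writes."""

    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

    def put(self, key, value):
        self.writes += 1
        self.data[key] = value


def schedule_pod(store, pod_name, nodes, status_updates=2):
    """Walk one pod through the workflow, touching the store at each step."""
    key = f"pods/{pod_name}"

    # Step 1: record the requested pod, then read the current configuration.
    store.put(key, {"node": None, "phase": "Pending"})
    pod = store.get(key)

    # Steps 2-3: a node is chosen. Picking the first node is an assumption
    # made only to keep the sketch short.
    node = nodes[0]

    # Step 4: write the pod's new state for the selected node.
    store.put(key, dict(pod, node=node))

    # Step 5: periodic status updates, each a read followed by a write.
    for _ in range(status_updates):
        current = store.get(key)
        store.put(key, dict(current, phase="Running"))

    return node


store = CountingStore()
schedule_pod(store, "web-1", ["edge-node-a", "edge-node-b"])
print(f"{store.reads} reads, {store.writes} writes")  # prints: 3 reads, 4 writes
```

Even in this stripped-down model a single pod costs several reads and writes, and the count grows with every status update.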

Each write along this path requires consensus among the etcd members to keep the data consistent. As the cluster grows, so does the number of interactions with etcd, and the latency introduced by etcd’s strong consistency mechanisms can become a bottleneck. The more pods you have, the more often etcd is called, and the slower the cluster can respond to changes.
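A rough way to see where the consensus cost comes from is to model a leader-based write that is only acknowledged once a majority of replicas has it. The Python sketch below is a simplification under stated assumptions: `quorum_size` and `commit_latency_ms` are hypothetical helper names, the round-trip times are made-up inputs, and disk sync, retries, and leader election are ignored.

```python
def quorum_size(replicas):
    """Smallest majority of a cluster with the given number of replicas."""
    return replicas // 2 + 1


def commit_latency_ms(follower_rtts_ms):
    """Network wait before a leader can acknowledge a write.

    The leader counts as one vote, so it needs (quorum - 1) follower
    acknowledgements. The wait is the round trip of the slowest follower
    among the fastest (quorum - 1) followers.
    """
    replicas = len(follower_rtts_ms) + 1  # followers plus the leader
    needed = quorum_size(replicas) - 1
    if needed == 0:
        return 0.0  # single-node cluster: no network wait
    return sorted(follower_rtts_ms)[needed - 1]


# Five replicas: the leader waits for two of its four followers.
print(commit_latency_ms([5, 40, 80, 200]))  # prints: 40
```

In this model every write pays at least one network round trip to other replicas, whereas a datastore that acknowledges writes locally and syncs later pays none on the write path.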

The proposed solution:

Figure depicting etcd replaced from figure 1 with an eventually consistent datastore

The paper suggests replacing etcd with an eventually consistent datastore. Unlike etcd, which requires a quorum for each write, an eventually consistent datastore can acknowledge a write without first reaching consensus, so it can respond faster. This can help reduce latency, particularly in environments where waiting on slow nodes during consensus is costly.

The paper recommends using a lazy sync approach in the background, utilizing Conflict-Free Replicated Data Types (CRDTs) to reconcile any eventual inconsistencies.
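To give a feel for how a CRDT reconciles replicas without coordination, here is a minimal Python sketch of one of the simplest designs, a last-writer-wins map. This is my own illustration and not the paper's data structure: the `merge` function, the `(timestamp, replica_id, value)` entry layout, and the pod names are all assumptions.

```python
def merge(a, b):
    """Merge two replica states of a last-writer-wins map.

    Each state maps key -> (timestamp, replica_id, value). For every key the
    entry with the larger (timestamp, replica_id) pair wins, so the result is
    the same regardless of the order in which replicas exchange state.
    """
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged


# Two replicas that accepted writes independently (hypothetical data).
replica_1 = {"pod-1": (1, "r1", "Pending")}
replica_2 = {"pod-1": (2, "r2", "Running"), "pod-2": (1, "r2", "Pending")}

# Merging in either order gives the same state.
print(merge(replica_1, replica_2) == merge(replica_2, replica_1))  # prints: True
```

Because the merge is commutative, associative, and idempotent, replicas can sync lazily in the background and still converge. The cost is that a concurrent write with an older timestamp is silently discarded, which is exactly the kind of trade-off that matters for the concerns below.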

My Take

On paper, this approach makes sense, but there are a few issues that need to be addressed in practice:

Leader Failover and Outdated Views: If the leader node goes down and a follower with an outdated view of the cluster takes over, there’s a risk of pods being created or scheduled when they aren’t needed. This could lead to duplicated resources and inefficiency.
Idempotency and Resource Waste: To avoid duplication, APIs would need to be idempotent, so that repeated requests don’t result in multiple changes. This is a significant challenge in edge environments, where resources are already constrained: if duplicate pods are scheduled because of an outdated cluster state, they waste valuable resources.
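As a small illustration of what idempotency means here, the Python sketch below creates a pod at most once per request ID, so a retried or replayed request does not produce a duplicate. `PodRegistry`, `create_pod`, and the request ID scheme are hypothetical names for this example and not a Kubernetes API. Note that this only works when all requests reach the same registry, which is precisely what an outdated replica cannot guarantee.

```python
class PodRegistry:
    """Hypothetical registry that deduplicates pod creation by request ID."""

    def __init__(self):
        self.pods = {}

    def create_pod(self, request_id, spec):
        """Create a pod once per request_id.

        Returns (pod, created). A repeated request returns the existing pod
        with created=False instead of allocating a second one.
        """
        if request_id in self.pods:
            return self.pods[request_id], False
        self.pods[request_id] = dict(spec)
        return self.pods[request_id], True


registry = PodRegistry()
registry.create_pod("req-42", {"image": "nginx"})
registry.create_pod("req-42", {"image": "nginx"})  # retry of the same request
print(len(registry.pods))  # prints: 1
```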

The eventually consistent approach could work in use cases where resource duplication is not a critical concern, but in edge environments where every byte and CPU cycle counts, these concerns become more important. In these scenarios, avoiding unnecessary resource consumption is key.

In summary, the paper provides a compelling argument for reconsidering etcd’s role in Kubernetes, especially for edge environments. The proposed shift to an eventually consistent datastore could reduce performance bottlenecks, but it introduces complexities in terms of consistency, leader failover, and resource duplication. While the paper presents an interesting alternative, the trade-offs involved in moving to eventual consistency must be carefully considered, particularly in resource-constrained environments like the edge.

Ultimately, if this were a simple switch to implement, the major players in the Kubernetes ecosystem would likely have already adopted it.

Sources:

https://dl.acm.org/doi/10.1145/3434770.3459730

Written by Ankit Trehan
