In a perfect world, highly threaded, I/O-intensive Linux containers running on Kubernetes would enjoy unimpeded access to CPU resources. Reality often falls short of this ideal. To close the gap and optimize Linux containers, application developers and DevOps teams need to understand how Linux schedules tasks and allocates CPU time.

The idea behind "real-time" containers is to let mission-critical containers with time-sensitive performance and reliability requirements coexist with non-real-time containers on the same hardware. Before embracing this strategy, it’s crucial to evaluate its practicality and feasibility. Under-resourcing real-time containers could lead to subpar application performance and compromise security efforts.

Attaining optimal CPU time for real-time containers necessitates several key capabilities and tools. These include the ability to manipulate I/O requests, a real-time container profile, a runtime environment that enables CPU requests, and a cooperative real-time operating system.

These requirements can be fulfilled by leveraging a real-time scheduler in the host kernel, installing a compatible container runtime engine version that integrates with the OS kernel scheduler, and configuring each container with parameters to handle special CPU requests and related requirements.

Regrettably, there is no officially supported solution available at present. Developing one will require collaboration between host providers, runtime engine providers, and container developers to create a stable and functional system. For now, host providers caution against the risks of altering CPU allocation. Docker, for instance, advises:

“CPU scheduling and prioritization are advanced kernel-level features. … Setting these values incorrectly can cause your host system to become unstable or unusable”.
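To make the container-level parameters concrete, here is a minimal sketch that only assembles a `docker run` command line using the real-time scheduler flags described in Docker's resource-constraint documentation (`--cpu-rt-runtime`, `--ulimit rtprio`, `--cap-add=sys_nice`). The helper name `realtime_run_args`, the image name, and the default values are illustrative assumptions, and the flags only take effect on a host whose kernel and Docker daemon are already set up for real-time scheduling.

```python
def realtime_run_args(image, rt_runtime_us=950_000, rt_priority=99):
    """Build a `docker run` argument list that requests real-time scheduling.

    Hypothetical helper: it does not start a container, it only returns
    the arguments so they can be reviewed before anything is executed.
    """
    if rt_runtime_us <= 0:
        raise ValueError("rt_runtime_us must be a positive number of microseconds")
    if not 1 <= rt_priority <= 99:
        raise ValueError("real-time priorities range from 1 to 99")
    return [
        "docker", "run",
        # Microseconds of real-time CPU the container may use per RT period.
        f"--cpu-rt-runtime={rt_runtime_us}",
        # Highest real-time priority that processes in the container may request.
        "--ulimit", f"rtprio={rt_priority}",
        # Capability needed to change scheduling policy and priority.
        "--cap-add=sys_nice",
        image,
    ]


if __name__ == "__main__":
    # "example/realtime-app" is a placeholder image name.
    print(" ".join(realtime_run_args("example/realtime-app")))
```

Given the warning above, values like these are best tried on a disposable host first.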

So, what can you do today? Your best bet is to follow the current default approach, which requires a brief historical context. In the early days of container technology, containers had no resource constraints, allowing them to utilize as much CPU as the host’s kernel scheduler permitted.

This led to various issues, with containers frequently being shortchanged on CPU resources in the free-for-all. The solution was to build CPU bandwidth control into the Linux kernel on top of the Completely Fair Scheduler (CFS). CFS itself was integrated into Linux 2.6.23 and performs the following functions:

CFS ensures equitable CPU allocation.
If the CPU time given to different tasks becomes imbalanced, CFS gives the shortchanged tasks the time they need to catch up.
CFS tracks this balance by recording each task’s CPU time as its virtual runtime. The smaller a task’s virtual runtime, the greater its recognized need for the CPU.
CFS employs “sleeper fairness” so that tasks that have been sleeping or waiting still receive their fair share of CPU when they need it.
CFS does not use priorities directly; a task’s priority instead affects how quickly its virtual runtime grows.

Source: https://t8tech.com/it/architecture/unlock-lightning-fast-linux-containers-expert-strategies-for-real-time-and-i-o-intensive-workload-optimization/

CFS keeps runnable tasks in a time-ordered structure, implemented as a red-black tree.

Operations on this red-black tree take O(log n) time, and all runnable tasks are sorted by the p->se.vruntime key. CFS always runs the leftmost task in the tree, the one with the smallest virtual runtime, and cycles through tasks so that each gets a turn on the CPU.
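The pick-the-leftmost rule can be illustrated with a small simulation. This is a sketch, not kernel code: a binary heap from Python's `heapq` stands in for the red-black tree (both return the smallest key in O(log n)), and the fixed time slice, the function name `simulate_cfs`, and the task names are assumptions made for illustration. The weight of 1024 corresponds to a nice-0 task in the kernel's weight table.

```python
import heapq

NICE_0_WEIGHT = 1024  # weight of a nice-0 task in the kernel's weight table


def simulate_cfs(weights, slices, slice_ns=1_000_000):
    """Return the CPU time (ns) each task receives after `slices` turns.

    weights maps a task name to its load weight. A heavier task accrues
    virtual runtime more slowly, so it reaches the front more often.
    """
    # Heap entries are (vruntime, name); the smallest vruntime plays the
    # role of the leftmost node of the tree. Ties fall back to the name.
    queue = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(queue)
    cpu_time = {name: 0 for name in weights}

    for _ in range(slices):
        vruntime, name = heapq.heappop(queue)   # pick the "leftmost" task
        cpu_time[name] += slice_ns              # it runs for one slice
        # Virtual runtime advances inversely to the task's weight.
        vruntime += slice_ns * NICE_0_WEIGHT / weights[name]
        heapq.heappush(queue, (vruntime, name)) # reinsert further right
    return cpu_time


if __name__ == "__main__":
    print(simulate_cfs({"api": 2048, "batch": 1024}, slices=300))
```

In this toy model, two tasks of equal weight alternate, while a task with twice the weight ends up with twice the CPU time, which is the sense in which CFS is "fair" without using fixed priorities.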

Most modern Linux container runtime engines are built on the cgroup subsystem, with CFS as the default CPU scheduler. Notably, this means that each cgroup has its own virtual runtime in the OS scheduler. The scheduler grants a cgroup a turn, during which the cgroup consumes its CPU slices, and then passes the turn to the next virtual runtime.

Consequently, it’s essential to consider cgroups in CFS not in terms of processor counts, but in terms of time allocations. Fine-tuning the CPU cgroup subsystem that governs scheduling can ensure that tasks receive relative minimum resources, and can also enforce strict caps on process tasks to prevent them from utilizing more resources than provisioned.
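To make "time allocations" concrete, the sketch below converts CPU amounts expressed in millicores into the two knobs the paragraph alludes to: a relative weight (cgroup v1 `cpu.shares`) and a hard cap (`cpu.cfs_quota_us` per `cpu.cfs_period_us`). The constants follow the conversion commonly attributed to the kubelet (1024 shares per CPU, a 100 ms period); treat them, and the function names, as assumptions to verify against your own runtime.

```python
SHARES_PER_CPU = 1024        # assumed: weight representing one full CPU
MILLICORES_PER_CPU = 1000
MIN_SHARES = 2               # assumed: smallest weight the kernel accepts
DEFAULT_PERIOD_US = 100_000  # assumed: default CFS period of 100 ms


def request_to_shares(request_millicores):
    """Relative weight: guarantees a proportion under contention, not a cap."""
    shares = request_millicores * SHARES_PER_CPU // MILLICORES_PER_CPU
    return max(MIN_SHARES, shares)


def limit_to_quota_us(limit_millicores, period_us=DEFAULT_PERIOD_US):
    """Hard cap: microseconds of CPU time allowed in each period."""
    return limit_millicores * period_us // MILLICORES_PER_CPU


if __name__ == "__main__":
    # A container asking for half a CPU and capped at one and a half CPUs.
    print(request_to_shares(500))   # 512
    print(limit_to_quota_us(1500))  # 150000
```

A quota of 150000 µs in a 100000 µs period does not mean "1.5 processors"; it means the cgroup's threads may together consume 150 ms of CPU time every 100 ms, after which they are throttled until the next period.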

[Figure omitted: how the Linux scheduler handles the CFS virtual runtime within the CFS scheduling class.]

Regrettably, there are distinct challenges to using these methods for controlling CPU allocations to containers. Under CFS, it’s not possible to designate higher-priority tasks. I/O-intensive tasks spend much of their time in I/O waits and syscalls. Because they often use only a short part of their CPU share before entering an I/O wait and yielding to other tasks, the CFS tree tends to shift these tasks to the right, gradually but inevitably reducing their priority.

The dynamic equilibrium of the CFS tree doesn’t allow tasks in a cgroup to demand equal CPU usage. It’s crucial to understand that when the tasks in a cgroup become idle, the cgroup yields its CPU shares to a global pool of leftover CPU time that other cgroups can borrow from. At the same time, tasks attached to the CFS queue inevitably share CPU resources with one another.
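The borrowing behavior can be sketched as simple proportional arithmetic. This is a simplified model under stated assumptions: only cgroups with runnable tasks compete, CPU is split in proportion to their share weights, and hard quotas are ignored. The function name `cpu_fractions` and the cgroup names are hypothetical.

```python
def cpu_fractions(shares, active):
    """Split CPU among the active cgroups in proportion to their shares.

    shares maps a cgroup name to its weight; active is the set of cgroups
    that currently have runnable tasks. Idle cgroups get nothing, and
    their portion is effectively lent to the others.
    """
    runnable = {name: shares[name] for name in active}
    total = sum(runnable.values())
    if total == 0:
        return {}
    return {name: weight / total for name, weight in runnable.items()}


if __name__ == "__main__":
    shares = {"critical": 2048, "batch": 1024, "logging": 1024}
    # Everyone busy: the critical cgroup is entitled to half the CPU.
    print(cpu_fractions(shares, {"critical", "batch", "logging"}))
    # Logging goes idle: its portion is redistributed to the other two.
    print(cpu_fractions(shares, {"critical", "batch"}))
```

The same arithmetic shows the downside: when a time-critical cgroup goes idle waiting on I/O, its portion is lent out, and it has to compete again once it wakes up.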

Therefore, creating a complete real-time container under CFS is impossible. However, it is possible to create a “soft real-time container” that can capture extra CPU and achieve solid results before having its CPU allocation degraded after its deadline.

To meet the demands of highly threaded, I/O-intensive container applications, development teams need to understand how the Completely Fair Scheduler (CFS) keeps the red-black tree balanced, and how to maximize the probability that critical tasks stay in the leftmost nodes of the tree. Leveraging the Kubernetes CPU Manager, which offers supplementary Pod management and builds on CFS mechanisms, is a prudent decision.
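As a rough illustration of what the CPU Manager expects, the sketch below checks whether a single container's resource settings match the conditions usually given for exclusive CPUs under its static policy: CPU and memory requests equal to limits, and a whole number of CPUs. It is a simplification built on assumptions: the real decision is made per Pod by the kubelet, the static policy has to be enabled on the node, and the helper names and the string comparison used for memory are illustrative only.

```python
def to_millicores(quantity):
    """Convert a CPU quantity such as "2", "1.5" or "500m" to millicores."""
    text = str(quantity).strip()
    if text.endswith("m"):
        return int(text[:-1])
    return round(float(text) * 1000)


def eligible_for_exclusive_cpus(resources):
    """Return True if one container's settings fit the static-policy pattern.

    resources mirrors the `resources` section of a container spec:
    {"requests": {...}, "limits": {...}}.
    """
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    if "cpu" not in requests or "cpu" not in limits:
        return False
    # Simplification: memory quantities are compared as written.
    if "memory" not in requests or requests["memory"] != limits.get("memory"):
        return False
    cpu_request = to_millicores(requests["cpu"])
    if cpu_request != to_millicores(limits["cpu"]):
        return False
    # Exclusive cores are only handed out for whole-CPU requests.
    return cpu_request > 0 and cpu_request % 1000 == 0


if __name__ == "__main__":
    spec = {"requests": {"cpu": "2", "memory": "1Gi"},
            "limits": {"cpu": "2", "memory": "1Gi"}}
    print(eligible_for_exclusive_cpus(spec))
```

A container pinned to its own cores no longer has to compete for position in a shared run queue on those cores, which is why this option is worth evaluating for the most latency-sensitive workloads.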

The characteristics of each task strongly influence which optimization strategies work best. Development teams need to experiment and closely monitor the resulting behavior in order to deploy soft real-time containers on Linux that satisfy their application’s requirements.
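One simple signal to monitor during such experiments is how often a container's cgroup is throttled. The sketch below parses the key-value format of a cgroup `cpu.stat` file and computes the fraction of scheduling periods in which throttling occurred. The sample text is made up for illustration, and the file location (often `/sys/fs/cgroup/cpu.stat` inside a container on a cgroup v2 host) is an assumption that varies by setup.

```python
def parse_cpu_stat(text):
    """Parse `name value` lines from a cgroup cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats):
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


# Hypothetical sample, not taken from a real system.
SAMPLE_CPU_STAT = """usage_usec 4200000
nr_periods 200
nr_throttled 50
throttled_usec 1300000
"""

if __name__ == "__main__":
    stats = parse_cpu_stat(SAMPLE_CPU_STAT)
    print(f"throttled in {throttled_ratio(stats):.0%} of periods")
```

Comparing this ratio before and after a tuning change gives a quick indication of whether a critical container is still hitting its CPU cap.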
