In the previous article, I introduced in detail the Pod, the most important concept in the Kubernetes project. In today's article, I will go into more detail about the Pod object.
Pods vs. Virtual Machines
By now, you should be very clear that it is the Pod, not the container, that is the smallest orchestration unit in the Kubernetes project. This design is reflected in the API objects, where the container becomes just a regular field within the Pod attributes. Naturally, the question arises: which attributes belong to the Pod object, and which belong to the Container?
The Pod plays the role that a "virtual machine" plays in a traditional deployment environment. This design is intended to make the migration from traditional environments (virtual machines) to Kubernetes (containers) smoother.
If we consider the Pod as the "machine" in a traditional environment and the container as the "user program" running on this "machine," then many aspects of the Pod object's design become much easier to understand.
For example, attributes related to scheduling, networking, storage, and security are essentially at the Pod level. The common feature of these attributes is that they describe the "machine" as a whole, not the "programs" running inside.
Examples include configuring the "machine's" network card (the Pod's network definition), its disk (the Pod's storage definition), and its firewall (the Pod's security definition), as well as deciding which server the "machine" runs on (the Pod's scheduling).
Kubernetes YAML
In Kubernetes, we define Pod resources through declarative YAML files. Let's look at several important Pod-level fields.
NodeSelector: This is a field that allows users to bind a Pod to a Node, as shown below:
apiVersion: v1
kind: Pod
...
spec:
  nodeSelector:
    disktype: ssd
This configuration means the Pod can only run on a node carrying the "disktype: ssd" label; if no such node is available, scheduling fails.
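For this Pod to be scheduled, at least one Node must carry the matching label. The fragment below is a hypothetical excerpt of a Node object (the name node-1 is an assumption for illustration) showing where such a label lives. In practice the label is usually added with a command such as kubectl label nodes node-1 disktype=ssd rather than by writing the Node manifest by hand.

```yaml
# Hypothetical excerpt of a Node object; only the fields relevant here are shown
apiVersion: v1
kind: Node
metadata:
  name: node-1        # hypothetical node name
  labels:
    disktype: ssd     # the label that the Pod's nodeSelector asks for
```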
NodeName: Once a Pod's nodeName field is set, Kubernetes considers the Pod already scheduled, with the assigned node name as the scheduling result. This field is therefore normally set by the scheduler, but users can also set it themselves to "trick" the scheduler; this is usually done only for testing or debugging.
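As a sketch of that testing or debugging use, the manifest below sets nodeName directly, so the scheduler treats the Pod as already scheduled and the kubelet on that node runs it. The Pod name nginx-pinned and the node name node-1 are hypothetical; if no node with that name exists, the Pod will not run.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pinned   # hypothetical Pod name
spec:
  nodeName: node-1     # hypothetical node; setting this bypasses the scheduler
  containers:
  - name: nginx
    image: nginx
```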
HostAliases: Defines entries for the Pod's hosts file (i.e., /etc/hosts), as shown below:
apiVersion: v1
kind: Pod
...
spec:
  hostAliases:
  - ip: "10.1.2.3"
    hostnames:
    - "foo.remote"
    - "bar.remote"
...
Note that in Kubernetes, if you want to set the contents of the hosts file, you must do so through this field. If you modify the hosts file directly instead, kubelet will overwrite your changes when the Pod is deleted and recreated.
Beyond these "machine"-level settings, you may notice that every attribute related to a container's Linux Namespaces is also defined at the Pod level. This is easy to understand: the Pod is designed so that its containers share as many Linux Namespaces as possible, keeping only the isolation and restriction capabilities that are truly necessary. As a result, the relationship between containers in a Pod closely resembles the relationship between programs running on the same virtual machine.
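One example of such a namespace-related, Pod-level field is shareProcessNamespace. The sketch below (the Pod and container names are illustrative) asks for the two containers to share a single PID namespace, so the nginx container's processes are visible from the shell container. Related fields such as hostNetwork, hostIPC, and hostPID, which make the Pod use the host's own namespaces, likewise sit directly under spec rather than under any individual container.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx                   # illustrative name
spec:
  shareProcessNamespace: true   # all containers in this Pod share one PID namespace
  containers:
  - name: nginx
    image: nginx
  - name: shell
    image: busybox
    stdin: true                 # keep stdin open so the shell stays alive
    tty: true                   # allocate a TTY for interactive use
```

After attaching to the shell container (for example, kubectl attach -it nginx -c shell), running ps ax there should also list the nginx processes.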
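Besides these Pod-level fields, each container has fields of its own. Among them, lifecycle deserves special attention: it defines Container Lifecycle Hooks, actions that Kubernetes triggers when the container's state changes. Consider the following example:
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
      preStop:
        exec:
          command: ["/usr/sbin/nginx", "-s", "quit"]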
This Pod YAML file comes from the official Kubernetes documentation. It is quite simple: it defines a single container using the nginx image. Note, however, that in the containers section, the container's lifecycle field sets two hooks, postStart and preStop.
What does this mean?
Let's start with postStart, which specifies an operation executed immediately after the container starts. Note that although the postStart operation is triggered after the container's ENTRYPOINT is launched, the ordering is not strictly guaranteed: when postStart starts running, the ENTRYPOINT may not have finished yet.
If postStart times out or fails, Kubernetes reports an error in the Pod's Events stating that the container failed to start, and the Pod ends up in a corresponding failure state.
Similarly, preStop is triggered before the container is terminated (for example, when the Pod is being deleted). Unlike postStart, the preStop operation runs synchronously: it blocks the container's termination, within the Pod's termination grace period, until the defined operation completes, and only then is the container sent its stop signal.
So, in this example, after the container successfully starts, it writes a "greeting message" to /usr/share/message (i.e., the operation defined by postStart). And before this container is deleted, we first call the nginx exit command (i.e., the operation defined by preStop), thus achieving a "graceful shutdown" of the container.
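The preStop hook does not block without limit, though: it runs inside the Pod's termination grace period, which defaults to 30 seconds and can be changed with terminationGracePeriodSeconds. If the hook and the container's shutdown have not finished when that period expires, the container is killed anyway. The variant below is a sketch; the value 60 is an arbitrary illustration, not a recommendation.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  terminationGracePeriodSeconds: 60   # illustrative; preStop plus shutdown must finish within this window
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      preStop:
        exec:
          command: ["/usr/sbin/nginx", "-s", "quit"]   # graceful nginx shutdown
```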
Now that you are familiar with the main fields of the Pod and its containers, let's look at the lifecycle of a Pod object in Kubernetes.
Changes in a Pod's lifecycle are mainly reflected in the Status part of the Pod API object, its third important field after Metadata and Spec. Within it, pod.status.phase represents the Pod's current state and can take the following values:
Pending. The Pod's YAML file has been submitted to Kubernetes, and the API object has been created and stored in Etcd. However, some of the Pod's containers could not be created for some reason, for example because scheduling failed.
Running. In this state, the Pod has been successfully scheduled and is bound to a specific node. All containers it contains have been successfully created, and at least one is currently running.
Succeeded. This state means that all containers in the Pod have run to completion and have exited. This situation is most common when running one-time tasks.
Failed. All containers in the Pod have terminated, and at least one of them exited abnormally (with a non-zero exit code). When you see this state, you need to debug the application in the container, for example by checking the Pod's Events and logs.
Unknown. This is an abnormal state indicating that kubelet can no longer report the Pod's status to kube-apiserver, most likely because of a communication problem between the master node and kubelet.
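To observe the Succeeded and Failed phases in practice, a one-time task needs restartPolicy set to Never (or OnFailure); with the default, Always, exited containers are restarted and the Pod never reaches Succeeded. In the hypothetical sketch below, the command exits with code 0, so the Pod should end up Succeeded; changing the command to exit 1 should instead produce Failed.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: one-shot-task        # hypothetical name
spec:
  restartPolicy: Never       # do not restart the container after it exits
  containers:
  - name: task
    image: busybox
    command: ["sh", "-c", "echo done"]   # prints "done" and exits with code 0
```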
In addition, a Pod's Status can be broken down into a finer-grained set of Conditions, including PodScheduled, Ready, Initialized, and Unschedulable. They mainly describe the specific reasons behind the current Status.
For example, if the current Status of the Pod is Pending, and the corresponding Condition is Unschedulable, it means there is a problem with its scheduling.
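In current Kubernetes versions, this situation is reported slightly differently from the list above: Unschedulable appears as the reason on a PodScheduled condition whose status is "False", rather than as a condition type of its own. A trimmed, hypothetical excerpt of such a Pod's status (as seen in kubectl get pod <name> -o yaml) might look like this:

```yaml
status:
  phase: Pending
  conditions:
  - type: PodScheduled
    status: "False"        # condition status is one of the strings "True", "False", "Unknown"
    reason: Unschedulable  # the scheduler could not place the Pod on any node
```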
Among these, the Ready condition deserves particular attention: it means the Pod has not only started normally (it is Running) but is also ready to serve requests. Running and Ready are not the same thing, and the difference is worth thinking through carefully.
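To make the distinction concrete, the trimmed, hypothetical status excerpt below belongs to a Pod that is Running but not yet Ready, for example because its application is still warming up and has not yet passed its readiness check. Other conditions and fields are omitted.

```yaml
status:
  phase: Running       # containers have been created and at least one is running
  conditions:
  - type: Ready
    status: "False"    # the Pod is not yet ready to serve traffic
```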
These status values are an important basis for judging how an application is running. In particular, when a Pod enters a state other than Running, you must be able to react quickly and start tracking down the problem based on what that state indicates, rather than frantically searching the documentation.
Summary
In today's article, I explained the Pod API object in detail, introduced the core ways of using Pods, and analyzed how fields are divided between the Pod and its containers. I hope this helps you better understand and remember the core fields of a Pod YAML file and their precise meanings.
In fact, the Pod API object is the central concept of the entire Kubernetes system, and it is also the object that the controllers I will cover later operate on.