[Kubernetes series] A walkthrough of the Pod lifecycle in K8S




Three states of a Pod

The Pod lifecycle flows Pending -> Running -> Succeeded/Failed.
A Pod's lifecycle is divided into three main stages: it starts out in the Pending phase; once at least one of its containers has started, it enters the Running phase; and it finally ends in the Succeeded or Failed phase.

While a Pod is running, the kubelet can restart containers to handle certain failure scenarios. Within the Pod, Kubernetes tracks the state of the individual containers and determines what action to take to make the Pod healthy again.

In Kubernetes, a Pod contains a specification (spec) and an actual state (status). The status of a Pod object includes a set of Pod conditions (Conditions).

A Pod will only be scheduled once in its lifetime. Once a Pod is scheduled (dispatched) to a node, the Pod will continue to run on that node until the Pod is stopped or terminated.

1. Pod life cycle

Like other standalone application containers, Pods are relatively ephemeral (rather than long-lived) entities. Pods are created, given a unique ID (UID), scheduled to a node, and run on that node until terminated (according to restart policy) or deleted.

If a node dies, the Pods scheduled to that node are scheduled for deletion after a timeout period expires.

Pods do not heal themselves. If the node a Pod is running on fails, the Pod is deleted; likewise, a Pod will not survive node resource exhaustion or node maintenance evictions. Kubernetes uses a higher-level abstraction called a controller to manage these relatively disposable Pod instances (controllers watch the shared state of the cluster through the API server and work to move the current state toward the desired state).

Any given Pod (defined by UID) is never “rescheduled” to a different node; instead, the Pod can be replaced by a new, nearly identical Pod. If desired, the new Pod’s name can remain the same, but its UID will be different.

If something claims to have the same lifetime as a Pod, such as a storage volume, that means the object will exist for as long as the Pod (with the same UID) exists. If a Pod is deleted for any reason, even when an identical replacement Pod is created, the associated object (such as the volume here) will be deleted and rebuilt.

Let’s take a look at a Pod structure diagram :

A Pod with multiple containers contains a program for pulling files and a web server, both using persistent volumes as storage shared between containers.
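As a minimal sketch of the Pod in the diagram (names, images, and paths are illustrative, not from the original), the two containers can share a volume like this — here an emptyDir volume, which shares the Pod's lifetime, stands in for the persistent volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: two-container-pod        # hypothetical name
spec:
  volumes:
    - name: shared-data
      emptyDir: {}               # lives exactly as long as the Pod; a persistentVolumeClaim could be used instead
  containers:
    - name: web-server           # serves the files the other container pulls
      image: nginx
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
    - name: file-puller          # stands in for the "program for pulling files"
      image: busybox
      command: ["sh", "-c", "echo hello > /pod-data/index.html; sleep 3600"]
      volumeMounts:
        - name: shared-data
          mountPath: /pod-data
```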

2. Pod status

A Pod’s status field is a PodStatus object, which in turn contains a phase field.

Let’s take a look at the possible phase values in the Pod lifecycle:

  • Pending: The Pod has been accepted by the cluster, but one or more of its containers have not yet been created or started. This includes time spent waiting to be scheduled and time spent downloading images.
  • Running: The Pod has been bound to a node and all containers have been created. At least one container is running, or is in the process of starting or restarting.
  • Succeeded: All containers in the Pod terminated successfully and will not be restarted.
  • Failed: All containers in the Pod have terminated, and at least one container terminated in failure.
  • Unknown: The state of the Pod could not be obtained, typically because of a communication failure with the node where the Pod should be running.

If a node dies or loses contact with other nodes in the cluster, Kubernetes sets the phase of all Pods running on the lost node to Failed .

3. Container status

Kubernetes keeps track of the state of each container in a Pod and can use container lifecycle callbacks to trigger events at specific points in the container lifecycle.

Once the scheduler dispatches a Pod to a node, the kubelet starts creating containers for the Pod through the container runtime. There are three states of the container: Waiting (waiting), Running (running) and Terminated (terminated).

To check the status of the containers in the Pod, you can execute the following command.

$ kubectl describe pod <pod-name>

  • Waiting
    If a container is not in the Running or Terminated state, it is in the Waiting state. A container in the Waiting state is still performing the operations it needs in order to start: for example, pulling the container image from a registry, or applying Secret data to the container. When you use kubectl to query a Pod that has a container in the Waiting state, you will also see a Reason field giving the reason the container is Waiting.
  • Running
    The Running state indicates that the container is executing without problems. If a postStart callback was configured, it has already run and completed. If you use kubectl to query a Pod that has a container in the Running state, you will also see information about the container entering the Running state.
  • Terminated
    A container in the Terminated state began execution and then either ran to completion or failed for some reason. If you use kubectl to query a Pod that has a container in the Terminated state, you will see the reason the container entered this state, the exit code, and the start and end times of the container's execution.

4. Container restart policy

A Pod's spec contains a restartPolicy field, whose possible values are Always, OnFailure, and Never. The default value is Always.

restartPolicy applies to all containers in the Pod, and only governs restarts of containers by the kubelet on the same node. When a container in a Pod exits, the kubelet restarts it with an exponentially increasing back-off delay (10s, 20s, 40s, ...), capped at 5 minutes. Once a container has run for 10 minutes without problems, the kubelet resets the back-off timer for that container.
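As a hedged sketch, setting a Pod-level restart policy looks like this (the name, image, and command are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restart-demo           # hypothetical name
spec:
  restartPolicy: OnFailure     # applies to all containers in the Pod; the default is Always
  containers:
    - name: worker
      image: busybox
      command: ["sh", "-c", "exit 1"]  # a non-zero exit triggers kubelet restarts with exponential back-off
```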

5. Pod conditions

A Pod has a PodStatus object, which contains an array of PodConditions; the Pod either has or has not passed each of these conditions:

  • PodScheduled : Pod has been scheduled to a node;
  • ContainersReady : All containers in the Pod are ready;
  • Initialized : All Init containers have completed successfully;
  • Ready : The Pod can serve requests and should be added to the load balancing pool for the corresponding service.
Field name          Description
type                Name of this Pod condition
status              Whether the condition applies; possible values are "True", "False", or "Unknown"
lastProbeTime       Timestamp of when the Pod condition was last probed
lastTransitionTime  Timestamp of when the Pod last transitioned from one status to another
reason              Machine-readable, UpperCamelCase text describing the reason for the condition's last transition
message             Human-readable message with details about the last status transition

5.1. Pod readiness

An application can inject extra feedback or signals into PodStatus via Pod readiness. To use this feature, set readinessGates in the Pod's spec to give the kubelet an additional set of conditions to evaluate when determining Pod readiness.

Readiness gates are determined by the current value of the Pod's status.conditions field. If Kubernetes cannot find a matching condition in status.conditions, the condition's status defaults to "False".

Let’s look at an example of a Pod’s status section:

  status:
    phase: Running
    conditions:
      - type: Initialized
        status: 'True'
        lastProbeTime: null
        lastTransitionTime: '2022-07-01T06:36:04Z'
      - type: Ready
        status: 'True'
        lastProbeTime: null
        lastTransitionTime: '2022-07-01T06:36:05Z'
      - type: ContainersReady
        status: 'True'
        lastProbeTime: null
        lastTransitionTime: '2022-07-01T06:36:05Z'
      - type: PodScheduled
        status: 'True'
        lastProbeTime: null
        lastTransitionTime: '2022-07-01T06:36:04Z'

5.2. Setting custom Pod conditions

The kubectl patch command cannot be used directly here:

$ kubectl patch

It does not support patching object status. To set status.conditions for a Pod, applications and Operators must use the PATCH action against the Pod's status — for example, by writing code with one of the Kubernetes client libraries to set custom Pod conditions for Pod readiness.

For a Pod using a custom state, the Pod will only be evaluated as ready if the following statements apply:

  • All containers in the Pod are ready;
  • All conditions listed in readinessGates have status "True".
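A sketch of a Pod spec using readinessGates (the Pod name and condition type are illustrative): the Pod will not be evaluated as Ready until some external controller patches status.conditions with this condition set to "True".

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-gate-demo        # hypothetical name
spec:
  readinessGates:
    - conditionType: "www.example.com/feature-1"  # custom condition an operator must set via the API
  containers:
    - name: app
      image: nginx
```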

When a Pod's containers are all Ready, but at least one custom condition is missing or "False", the kubelet sets the Pod's condition to ContainersReady.

6. Container Probe

Probes are periodic diagnostics performed by the kubelet on the container . To perform diagnostics, the kubelet can either execute code inside the container or make a network request.

6.1. Check mechanisms

There are four different mechanisms a probe can use to check a container. Each probe must define exactly one of these four mechanisms:

  • exec
    executes the specified command inside the container. Diagnostics are considered successful if the command exits with a return code of 0.
  • grpc
    uses gRPC to perform a remote procedure call. The target should implement gRPC health checks. Diagnosis is considered successful if the status of the response is “SERVING”. The gRPC probe is an alpha feature that is only available if you enable the “GRPCContainerProbe” feature gate.
  • httpGet
    performs an HTTP GET request to the specified port and path on the container’s IP address. If the status code of the response is greater than or equal to 200 and less than 400, the diagnosis is considered successful.
  • tcpSocket
    performs TCP checks on the specified port on the container’s IP address. Diagnostics are considered successful if the port is open. It counts as healthy if the remote system (container) closes the connection immediately after opening it.
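For example, the exec mechanism can be wired into a liveness probe like this (a sketch; the Pod name, image, and probed file path are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: exec-probe-demo            # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "touch /tmp/healthy; sleep 3600"]
      livenessProbe:
        exec:
          command: ["cat", "/tmp/healthy"]  # exit code 0 means the diagnostic succeeded
        initialDelaySeconds: 5
        periodSeconds: 10
```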

6.2. Probe results

Each probe will get one of three results:

  • Success
    The container passed the diagnostic.
  • Failure
    The container failed the diagnostic.
  • Unknown
    The diagnostic itself failed, so no action will be taken.

6.3. Probe types

For a running container, the kubelet can optionally perform three kinds of probes, and react to their results as follows:

  • livenessProbe
    indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container becomes subject to its restart policy. If the container does not provide a liveness probe, the default state is Success .
  • readinessProbe
    indicates whether the container is ready to serve requests. If the readiness probe fails, the endpoint controller removes the Pod’s IP address from the endpoint list of all services that match the Pod. The state value of the ready state before the initial delay defaults to Failure . If the container does not provide a readiness probe, the default state is Success .
  • startupProbe
    indicates whether the application in the container has been started. If a startup probe is provided, all other probes are disabled until this probe succeeds. If the startup probe fails, the kubelet will kill the container and the container will restart according to its restart policy. If the container does not provide a startup probe, the default state is Success .
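A sketch combining a liveness and a readiness probe on one container (the Pod name, image, port, and path are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probes-demo                # hypothetical name
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - containerPort: 80
      livenessProbe:               # failure kills the container; restart policy then applies
        httpGet:
          path: /
          port: 80
        periodSeconds: 10
      readinessProbe:              # failure removes the Pod from Service endpoint lists
        httpGet:
          path: /
          port: 80
        periodSeconds: 5
```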

6.4. When to use a liveness probe

If the process in your container is able to crash on its own whenever it encounters a problem or becomes unhealthy, a liveness probe is not strictly necessary; the kubelet will automatically take the correct action according to the Pod's restartPolicy.

If you want the container to be killed and restarted when a probe fails, specify a liveness probe and set restartPolicy to "Always" or "OnFailure".

6.5. When to use a readiness probe

Specify a readiness probe if you want traffic to be sent to a Pod only once its probe succeeds. In this case, the readiness probe may be the same as the liveness probe, but its presence in the spec means the Pod will receive no traffic during startup, and will only begin receiving traffic after the probe succeeds.

If you want the container to be able to enter maintenance state on its own, you can also specify a readiness probe that checks for an endpoint that is specific to readiness and thus different from liveness probes.

If your application has strict dependencies on backend services, you can implement both liveness and readiness probes. When the application itself is healthy and the liveness probe is passed, the readiness probe will additionally check whether each required backend service is available. This can help you avoid directing traffic to Pods that only return error messages.

If your container needs to load large data, configuration files, or perform migrations during startup, you can use startup probes. However, if you want to differentiate between applications that have failed and applications that are still processing their startup data, you may prefer to use readiness probes.

If you just want to be able to drain requests when a Pod is deleted, you don’t necessarily need to use a readiness probe; when a Pod is deleted, the Pod will automatically put itself in a not-ready state, regardless of whether the readiness probe exists. While waiting for the containers in the Pod to stop, the Pod will remain in the not-ready state.

6.6. When to use a startup probe

Startup probes are useful for Pods whose containers take a long time to start. Rather than configuring a long liveness probe interval, you can set up a separate configuration to probe the container as it starts up, allowing a startup time much longer than the liveness interval would permit.

If your container usually takes longer than initialDelaySeconds + failureThreshold × periodSeconds to start, you should set up a startup probe that checks the same endpoint used by the liveness probe. The default value of periodSeconds is 10 seconds. Set failureThreshold high enough to give the container time to finish starting, and avoid changing the defaults used by the liveness probe. This setting helps protect against deadlocks.
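A sketch of such a startup probe (the endpoint, port, and numbers are illustrative): with failureThreshold: 30 and periodSeconds: 10, a slow-starting container gets up to 30 × 10 = 300 seconds to come up before the kubelet kills it.

```yaml
startupProbe:
  httpGet:
    path: /healthz   # illustrative: the same endpoint the liveness probe would use
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
```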

7. Pod termination

Because Pods represent processes running on nodes in the cluster, it is important to let those processes terminate gracefully when they are no longer needed. They should generally not be killed abruptly with a KILL signal, which would give them no chance to clean up.

The design goal is to let you request deletion of a process and know when it has terminated, while also ensuring that the deletion eventually completes. When you request deletion of a Pod, the cluster records and tracks the Pod's graceful termination period rather than killing the Pod outright. Unless a forced shutdown is requested, the kubelet attempts to terminate the Pod gracefully.

Typically, the container runtime sends a TERM signal to the main process in each container. Many container runtimes honor the STOPSIGNAL value defined in the container image and send that signal instead of TERM. Once the graceful termination period has expired, the container runtime sends a KILL signal to any remaining processes, after which the Pod is removed from the API server. If the kubelet or the container runtime's management service is restarted while waiting for processes to terminate, the cluster retries from the beginning, giving the Pod its full graceful termination period.
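The length of the graceful termination period is configurable per Pod; a sketch (the Pod name and value are illustrative — the default is 30 seconds):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-demo                # hypothetical name
spec:
  terminationGracePeriodSeconds: 60  # time between the TERM and KILL signals
  containers:
    - name: app
      image: nginx
```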

Before going further, let's review the kubectl command line, since we will use it in the examples below.

$ kubectl help

kubectl controls the Kubernetes cluster manager.

 Find more information at:

Basic Commands (Beginner):
  create          Create a resource from a file or from stdin
  expose          Take a replication controller, service, deployment or pod and
                  expose it as a new Kubernetes service
  run             Run a particular image on the cluster
  set             Set specific features on objects

Basic Commands (Intermediate):
  explain         Get documentation for a resource
  get             Display one or many resources
  edit            Edit a resource on the server
  delete          Delete resources by file names, stdin, resources and names, or
                  by resources and label selector

Deploy Commands:
  rollout         Manage the rollout of a resource
  scale           Set a new size for a deployment, replica set, or replication
                  controller
  autoscale       Auto-scale a deployment, replica set, stateful set, or
                  replication controller

Cluster Management Commands:
  certificate     Modify certificate resources
  cluster-info    Display cluster information
  top             Display resource (CPU/memory) usage
  cordon          Mark node as unschedulable
  uncordon        Mark node as schedulable
  drain           Drain node in preparation for maintenance
  taint           Update the taints on one or more nodes

Troubleshooting and Debugging Commands:
  describe        Show details of a specific resource or group of resources
  logs            Print the logs for a container in a pod
  attach          Attach to a running container
  exec            Execute a command in a container
  port-forward    Forward one or more local ports to a pod
  proxy           Run a proxy to the Kubernetes API server
  cp              Copy files and directories to and from containers
  auth            Inspect authorization
  debug           Create debugging sessions for troubleshooting workloads and
                  nodes

Advanced Commands:
  diff            Diff the live version against a would-be applied version
  apply           Apply a configuration to a resource by file name or stdin
  patch           Update fields of a resource
  replace         Replace a resource by file name or stdin
  wait            Experimental: Wait for a specific condition on one or many
                  resources
  kustomize       Build a kustomization target from a directory or URL

Settings Commands:
  label           Update the labels on a resource
  annotate        Update the annotations on a resource
  completion      Output shell completion code for the specified shell (bash,
                  zsh or fish)

Other Commands:
  alpha           Commands for features in alpha
  api-resources   Print the supported API resources on the server
  api-versions    Print the supported API versions on the server, in the form of
                  "group/version"
  config          Modify kubeconfig files
  plugin          Provides utilities for interacting with plugins
  version         Print the client and server version information

Usage:
  kubectl [flags] [options]

Use "kubectl <command> --help" for more information about a given command.
Use "kubectl options" for a list of global command-line options (applies to all
commands).
Now let's walk through what happens when a Pod is deleted:

  1. You use the kubectl tool to manually delete a specific Pod, with the default grace period (30 seconds).
  2. The Pod object in the API server is updated with the deadline beyond which the Pod is considered dead, based on its graceful termination period. If you use kubectl describe to inspect the Pod you are deleting, it shows up as "Terminating". On the node where the Pod is running: as soon as the kubelet sees that the Pod has been marked as terminating (with a graceful termination deadline set), it begins the local Pod shutdown process.
  3. At the same time as the kubelet starts the graceful shutdown logic, the control plane removes the Pod from the endpoint list (and the endpoint slice list, if enabled) of every Service whose selector matches the Pod. ReplicaSets and other workload resources no longer treat the shutting-down Pod as a valid, serving replica. Pods that shut down slowly cannot keep serving traffic either, because load balancers (such as the service proxy) remove them from the endpoint lists at the start of the termination grace period.
  4. When the termination grace period expires, the kubelet triggers a forced shutdown. The container runtime sends a SIGKILL signal to all processes still running in any container of the Pod. The kubelet also cleans up the hidden pause container, if the container runtime uses one.
  5. The kubelet triggers the forced deletion of the Pod object from the API server by setting its graceful termination period to 0 (meaning immediate deletion).
  6. The API server deletes the Pod's API object, which is then no longer visible to any client.

7.1. Force termination of Pod

By default, every delete operation comes with a 30-second grace period. The kubectl delete command supports a --grace-period=<seconds> option that lets you override the default and set a value of your choosing.

Forcing the grace period to 0 means immediately removing the Pod from the API server. If the Pod is still running on a node, a forced delete operation will trigger the kubelet to perform an immediate cleanup operation.

You must pass the --force flag together with --grace-period=0 to initiate a forced delete request, for example: kubectl delete pod <pod-name> --grace-period=0 --force.

When you perform a forced delete, the API server no longer waits for confirmation from the kubelet that the Pod has terminated on the node it was running on. It removes the Pod object immediately so that a new Pod with the same name can be created. On the node, Pods that are set to terminate immediately are still given a short grace period before being forcibly killed.

7.2. Garbage collection of failed Pods

For a failed Pod, the corresponding API object remains on the cluster’s API server until the user or controller process explicitly deletes it.

The control plane deletes terminated Pods (with a phase of Succeeded or Failed) when the number of Pods exceeds the configured threshold (the terminated-pod-gc-threshold setting of kube-controller-manager). This behavior avoids resource leaks as Pods are created and terminated over time.
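As a configuration sketch, the threshold is a kube-controller-manager flag (the value shown is the documented default):

```
--terminated-pod-gc-threshold=12500
```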


The full text of this article runs to almost 10,000 words, and draws on the Pod Lifecycle section of the Kubernetes documentation.
