How to handle OOMKilled errors in Kubernetes
September 28, 2021
Featured article by Jeff Broth
Errors are a fact of life in any software development environment, and container-based development is no exception. Kubernetes has become the leading container orchestration platform, so sooner or later most teams running containers will encounter errors in a Kubernetes environment.
It is therefore important to know the common issues in Kubernetes (k8s) so that we can remedy them quickly. In this post, we will look into the OOMKilled error, a common occurrence when working with Kubernetes.
What is the OOMKilled Error?
The OOMKilled (out of memory killed) error, which is also identifiable by exit code 137, is a resource availability error. It is specifically related to memory: a container that uses more memory than it is allowed is killed, which causes the Pod to crash.
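The number 137 is not arbitrary. By convention, an exit code above 128 means the process was terminated by a signal, and 137 - 128 = 9, which is SIGKILL. A minimal Python sketch of that arithmetic, assuming a Linux or other Unix-like system (the helper name `describe_exit_code` is made up for illustration):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a readable description."""
    if code > 128:
        # By convention, 128 + N means "terminated by signal N".
        sig = signal.Signals(code - 128)
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited on its own with status {code}"


print(describe_exit_code(137))  # on Linux: killed by SIGKILL (signal 9)
```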
In Kubernetes, users can configure memory limits for the containers defined in resources such as Deployments, StatefulSets, and DaemonSets. Pods within the cluster will fail when these memory limits are exceeded. OOMKilled errors can occur due to memory limitations either at container startup or while the container is running.
Identifying OOMKilled Errors
The simplest way to identify an OOMKilled error is to check the Pod status with the kubectl get pods command, and then use the kubectl describe pod command to explore the issue further.
If a Pod crashes due to an OOMKilled error, Kubernetes clearly indicates it: the STATUS column of the get pods output shows OOMKilled for that Pod.
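The same check can be scripted against the JSON form of the Pod list. The sketch below assumes the data came from `kubectl get pods -o json`; the sample is a hypothetical, heavily trimmed stand-in for real output, and `find_oomkilled` is an illustrative helper rather than part of any Kubernetes tooling.

```python
# Hypothetical, heavily trimmed stand-in for `kubectl get pods -o json` output.
SAMPLE_POD_LIST = {
    "items": [
        {
            "metadata": {"name": "web-1"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "app",
                        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                        "lastState": {
                            "terminated": {"reason": "OOMKilled", "exitCode": 137}
                        },
                    }
                ]
            },
        },
        {
            "metadata": {"name": "db-0"},
            "status": {
                "containerStatuses": [{"name": "db", "state": {"running": {}}}]
            },
        },
    ]
}


def find_oomkilled(pod_list: dict) -> list[tuple[str, str]]:
    """Return (pod, container) pairs whose current or previous state is OOMKilled."""
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            # A restarted container reports the kill under lastState.
            for key in ("state", "lastState"):
                terminated = status.get(key, {}).get("terminated")
                if terminated and terminated.get("reason") == "OOMKilled":
                    hits.append((pod_name, status["name"]))
                    break
    return hits


print(find_oomkilled(SAMPLE_POD_LIST))
```

Checking `lastState` as well as `state` matters because a container that has already been restarted no longer shows the kill in its current state.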
Fixing the OOMKilled Error
To fix an OOMKilled error, we need to figure out the memory allocation of the failing resource. We can use the kubectl describe node command to view the current resource allocations of the Pods running on a node; they are listed in the Non-terminated Pods section of the output.
Here, what we need to focus on are the values of Memory Request and Memory Limit. As mentioned previously, users can define the following memory allocations.
- Memory Request - The amount of memory reserved for the container. The scheduler uses this value to decide which node can host the Pod.
- Memory Limit - The maximum amount of memory that can be used by the container.
These values are specified in the resources section of each container in the YAML file.
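As an illustration of where those fields sit, here is the same structure written as a Python dictionary and printed as JSON, which kubectl also accepts as a manifest format. The Pod name, container name, and image are placeholders, and the `Mi` (mebibyte) suffix is an assumption standing in for the megabyte figures used in this article.

```python
import json

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "memory-demo"},  # placeholder name
    "spec": {
        "containers": [
            {
                "name": "app",     # placeholder container name
                "image": "nginx",  # placeholder image
                "resources": {
                    # Memory reserved for the container at scheduling time.
                    "requests": {"memory": "50Mi"},
                    # Hard ceiling; exceeding it triggers an OOM kill.
                    "limits": {"memory": "100Mi"},
                },
            }
        ]
    },
}

print(json.dumps(pod_manifest, indent=2))
```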
A container's memory usage can exceed its memory request, but not its memory limit. For instance, suppose we have configured a memory request of 50MB and a memory limit of 100MB. Even though the memory request is set to 50MB, the container can use more than that, up to 100MB, and will still function normally. However, the moment its memory usage exceeds 100MB, the container will crash with the OOMKilled error.
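The request and limit behavior described above can be sketched in a few lines. This is a simplified model, not how the kubelet or the kernel implements it: `parse_memory` handles only a subset of Kubernetes quantity suffixes with whole-number values, and both function names are illustrative.

```python
# Multipliers for a subset of Kubernetes memory suffixes.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,  # binary units
    "k": 1000, "M": 1000**2, "G": 1000**3,     # decimal units
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '100Mi' or '50M' to bytes (whole numbers only)."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: the value is already in bytes


def classify_usage(usage: str, request: str, limit: str) -> str:
    """Describe where a container's memory usage sits relative to its settings."""
    used = parse_memory(usage)
    if used > parse_memory(limit):
        return "over limit: container is OOMKilled"
    if used > parse_memory(request):
        return "above request, within limit: keeps running"
    return "within request: keeps running"


for usage in ("40Mi", "80Mi", "120Mi"):
    print(usage, "->", classify_usage(usage, request="50Mi", limit="100Mi"))
```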
The simplest way to remedy an OOMKilled error is to increase the memory limit and then recreate the container. This can be done either by increasing the memory limit value while keeping the memory request value unchanged, or by raising both values proportionally.
We can use the kubectl edit command to edit the configuration if we are using a Kubernetes Deployment, StatefulSet, or DaemonSet. The command opens the live configuration in a text editor. For instance, suppose we change the memory limit from 100MB to 300MB and save the file. Kubernetes will then automatically terminate the faulty container and provision a new one.
Note – The configuration opens in the default text editor for the user's operating system. However, we can use the KUBE_EDITOR environment variable to configure a different text editor.
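The 100MB to 300MB change can also be expressed without an interactive editor. As a sketch, the function below builds a patch body that could be passed to `kubectl patch deployment <name> --patch '<body>'`. The container name `app` is hypothetical, `300Mi` stands in for the 300MB figure, and `memory_limit_patch` is an illustrative helper.

```python
import json


def memory_limit_patch(container_name: str, new_limit: str) -> str:
    """Build a JSON patch body that changes one container's memory limit."""
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Containers are matched by name when the patch is merged.
                            "name": container_name,
                            "resources": {"limits": {"memory": new_limit}},
                        }
                    ]
                }
            }
        }
    }
    return json.dumps(patch)


# "app" is a hypothetical container name.
print(memory_limit_patch("app", "300Mi"))
```

Because the patch touches the Pod template, the controller rolls out new Pods with the higher limit, just as it does after an interactive edit.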
Remember that we will not be able to directly edit the file if the YAML configuration is a Pod configuration, as the edit command does not support editing resource configurations of a Pod. In that case, we have to delete the Pod manually and recreate it with the updated configuration.
Additional Factors that relate to OOMKilled errors
Node Memory Limit
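Increasing the values in the YAML configuration only helps if the cluster can honor them. The scheduler places Pods according to their memory requests, so a Pod will be stuck in the “Pending” state indefinitely if no node has enough unreserved capacity for the requested memory. Memory limits, on the other hand, can add up to more than a node physically has. If the containers actually use that memory, the node itself can run short, which affects its ability to function properly and impacts overall cluster performance.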
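A rough way to reason about the Pending case is to compare a Pod's memory request with what is still unreserved on a node. The sketch below is a simplification that looks at memory only and ignores everything else the scheduler considers; the numbers are made up and all values are in MiB.

```python
def fits_on_node(pod_request_mib: int, allocatable_mib: int, already_requested_mib: int) -> bool:
    """Return True if the node still has enough unreserved memory for the Pod's request."""
    return already_requested_mib + pod_request_mib <= allocatable_mib


# Hypothetical node: 4096 MiB allocatable, 3900 MiB already requested by other Pods.
for request in (100, 300):
    outcome = "schedulable" if fits_on_node(request, 4096, 3900) else "stays Pending"
    print(f"request of {request} MiB: {outcome}")
```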
No Resource Specification
Resource limits are not mandatory, and users can create containers without specifying them. In these instances, the container can use all the available memory on the node. If the node itself runs low on memory, the containers without specific resource limits are more likely to get terminated.
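The reason such containers are more exposed is the Quality of Service (QoS) class Kubernetes assigns to each Pod: Pods with no requests or limits at all are classed as BestEffort, which makes them the first candidates for termination under node memory pressure. The following is a simplified sketch of that classification, not the actual Kubernetes implementation; it compares quantities as plain strings and the function name is illustrative.

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a Pod from its containers' `resources` dicts (simplified)."""
    any_set = False
    guaranteed = True
    for resources in containers:
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        if requests or limits:
            any_set = True
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(name, limit)
            # String comparison only: "1" and "1000m" would wrongly differ here.
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{}]))
print(qos_class([{"requests": {"memory": "50Mi"}, "limits": {"memory": "100Mi"}}]))
print(qos_class([{"limits": {"cpu": "500m", "memory": "100Mi"}}]))
```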
Namespace Default Memory Limits
If the K8s namespace is configured with a default memory limit (through a LimitRange object), that limit is automatically applied to containers created without an explicit memory limit. Breaching this limit can also lead to OOMKilled errors.
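How those defaults get filled in can be sketched as follows. This models only the memory defaulting rules (no minimum or maximum validation), and the function name and the default values are assumptions chosen for illustration.

```python
def apply_namespace_defaults(resources: dict, default_limit: str, default_request: str) -> dict:
    """Fill in missing memory settings the way a namespace default would (simplified)."""
    requests = dict(resources.get("requests", {}))
    limits = dict(resources.get("limits", {}))
    had_limit = "memory" in limits
    if not had_limit:
        limits["memory"] = default_limit
    if "memory" not in requests:
        # An explicit limit without a request makes the request match the limit.
        requests["memory"] = limits["memory"] if had_limit else default_request
    return {"requests": requests, "limits": limits}


# Hypothetical namespace defaults: 512Mi limit, 256Mi request.
print(apply_namespace_defaults({}, "512Mi", "256Mi"))
print(apply_namespace_defaults({"limits": {"memory": "1Gi"}}, "512Mi", "256Mi"))
print(apply_namespace_defaults({"requests": {"memory": "128Mi"}}, "512Mi", "256Mi"))
```

The first case is the one that surprises people: a container declared with no resources at all still ends up with a limit it can breach.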
The OOMKilled error is relatively uncomplicated, yet it has far-reaching consequences because it leads to Pod crashes. The best approach to dealing with it is to have a proper resource allocation strategy for your Kubernetes deployments. By carefully evaluating the resource usage of the application and the resources available in the k8s cluster, users can configure resource limits that affect neither the functionality of the application nor the node.