A Deep Dive into Node Draining and Eviction for Safe Node Removal and Workload Migration

How to Gracefully Handle Node Removal and Workload Migrations in Kubernetes with Node Draining and Eviction

Node Draining and Eviction are critical operations that allow you to gracefully handle the removal of a node from a Kubernetes cluster. These operations are essential for maintenance tasks like software upgrades, hardware replacements, or network maintenance.

Today, we will discuss how to safely remove nodes from a Kubernetes cluster and gracefully handle workload migrations, covering the following topics:

  • Understanding Node Draining and Eviction in Kubernetes.
  • How to Perform Node Draining.
  • How to Perform Node Eviction.
  • Best Practices for Node Draining and Eviction

Why Node Draining and Eviction are important

Node draining and eviction are essential for maintaining the stability and reliability of a Kubernetes cluster. They let you take a node out of service without an unplanned outage for the workloads running on it. For example, if a node needs to be taken down for maintenance or hardware replacement, you can drain it first: its pods are evicted gracefully and, where they are managed by a controller such as a Deployment, replacement pods are scheduled on other nodes before the node is shut down. Provided the workloads run enough replicas, the services remain available and responsive to the users.

On the other hand, if a node becomes unresponsive or crashes, you can use node eviction to forcefully remove the node from the cluster so that the affected workloads are recreated on other healthy nodes. This prevents the failed node from causing further issues or degrading the performance of the cluster.

Understanding Node Draining and Eviction in Kubernetes

Before we dive into the details of node draining and eviction, let’s first understand what these terms mean and when they should be used.

  • Node Draining: Node draining is the process of gracefully evicting the workloads running on a node before the node is taken down for maintenance or decommissioned from the cluster. Pods are not moved live: each evicted pod is terminated, and its controller creates a replacement on another healthy node, so disruption stays minimal as long as enough replicas keep running elsewhere.
  • Node Eviction: Kubernetes itself evicts pods, not nodes. In this article, node eviction refers to forcefully removing a node from the cluster, typically because of a hardware failure or network issue that leaves it unresponsive. When the node is removed, its pods are deleted, and those managed by a controller are recreated on other healthy nodes in the cluster.

Now that we have a basic understanding of node draining and eviction, let’s see how these operations can be performed in Kubernetes.

How to Perform Node Draining

Performing node draining in Kubernetes involves the following steps:

  • Label the node (optional): Kubernetes does not require a label for draining, but a custom label can help you and your tooling track which nodes are under maintenance. The label key and value below are only an example:
kubectl label nodes <node-name> node-role.kubernetes.io/draining=drain
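  • Cordon the node: The next step is to cordon the node, which marks it as unschedulable so that no new workloads are scheduled on it. kubectl drain also cordons the node automatically, so this explicit step is mainly useful when you want to stop scheduling well before evicting anything. This is done using the following command: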
kubectl cordon <node-name>
  • Drain the node: Once the node is cordoned, you can start draining the node by evicting the workloads from the node. This is done using the following command:
kubectl drain <node-name> --ignore-daemonsets

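The --ignore-daemonsets flag lets the drain proceed when the node runs DaemonSet-managed pods. Those pods are left in place, because the DaemonSet controller would immediately recreate them on the same node; without the flag, kubectl drain stops with an error when it finds them.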

  • Verify the drain status: kubectl drain prints each pod it evicts and returns only after the evictions have finished or failed. To double-check, list the pods that are still bound to the node using the following command:
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name> -o wide

If the output shows only DaemonSet-managed pods (and static pods, if the node runs any), the workloads have been moved off the node. kubectl get nodes should also report the node with SchedulingDisabled in its status.

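  • Remove the node: If the node will return to service after maintenance, skip this step and run kubectl uncordon <node-name> once the work is done. If the node is being decommissioned, remove it from the cluster once it has been successfully drained, using the following command: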
kubectl delete node <node-name>
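
The draining steps above can be combined into a small shell helper. The following is a minimal sketch rather than a definitive script: drain_node is a hypothetical function name, and it assumes kubectl is already configured for your cluster. It cordons the node, drains it, and then lists whatever pods are still bound to it so you can confirm that only DaemonSet pods remain.

```shell
# Sketch: cordon a node, drain it, then list the pods still bound to it.
# drain_node is a hypothetical helper name, not part of kubectl.
drain_node() {
  local node="$1"
  if [ -z "$node" ]; then
    echo "usage: drain_node <node-name>" >&2
    return 1
  fi

  # Stop new pods from being scheduled onto the node.
  kubectl cordon "$node" || return 1

  # Evict pods; DaemonSet-managed pods are skipped instead of blocking the drain.
  kubectl drain "$node" --ignore-daemonsets || return 1

  # Show what is still running there (typically only DaemonSet pods).
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$node" -o wide
}
```

You could then call it as drain_node <node-name>. The helper deliberately stops short of kubectl delete node, so that removing the node stays a separate, manual decision.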

How to Perform Node Eviction

Performing node eviction in Kubernetes involves the following steps:

  • Identify the failed node: The first step in node eviction is to identify the node that has failed or is unresponsive. You can use commands like kubectl get nodes or kubectl describe node <node-name> to get the status of the nodes in the cluster.
  • Mark the node as unschedulable: If the node is still running but needs to be taken down for maintenance or hardware replacement, you can mark the node as unschedulable using the following command:
kubectl cordon <node-name>

This will prevent any new workloads from being scheduled on the node.

  • Evict the node: If the node is unresponsive or has crashed and cannot be recovered, you can remove it from the cluster using the following command:
kubectl delete node <node-name>

This command deletes the Node object from the cluster. Pods that were bound to the node are then removed, and pods managed by a controller such as a Deployment are recreated on other healthy nodes in the cluster. Standalone pods that no controller manages are not recreated.
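
To spot candidates for removal quickly, you can filter the output of kubectl get nodes. The sketch below assumes the default column layout (NAME, STATUS, ROLES, AGE, VERSION), and not_ready_nodes is a hypothetical helper name. It prints the name of every node whose status does not start with Ready.

```shell
# Sketch: print the names of nodes whose STATUS column does not start with "Ready".
# not_ready_nodes is a hypothetical helper name, not part of kubectl.
not_ready_nodes() {
  # Assumed columns of `kubectl get nodes --no-headers`:
  # NAME STATUS ROLES AGE VERSION
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }'
}
```

Review each reported node with kubectl describe node before deleting it, since a node can report NotReady only temporarily.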

Best Practices for Node Draining and Eviction

  1. Always plan ahead: Node draining and eviction should be planned well in advance to avoid any unexpected downtime. Make sure to communicate with the stakeholders and inform them of the maintenance activities beforehand.
  2. Use appropriate drain timeouts: kubectl drain accepts a --timeout flag, which limits how long the command waits before giving up, and a --grace-period flag, which overrides the time each pod is given to terminate gracefully. Set both according to your workloads' shutdown requirements so that pods are not terminated abruptly.
  3. Check for any pending updates: Before draining a node, check whether any updates or patches are pending for it, so that they can be applied in the same maintenance window while the node is empty. This brings the node back up to date without requiring a second drain.
  4. Monitor the drain status: It is essential to monitor the drain status regularly to ensure that the workloads are being migrated successfully to other healthy nodes in the cluster.
  5. Use taints and tolerations: Taints and tolerations can be used to keep workloads off a node that is being drained or is in the process of being evicted. Cordoning a node already adds the node.kubernetes.io/unschedulable:NoSchedule taint, and you can add your own taints for finer control. This ensures that new workloads are not deployed on the node and do not cause disruptions.
  6. Test the process: It is always a good practice to test the node draining and eviction process in a non-production environment to ensure that the process works as expected and does not cause any disruptions to the running workloads.
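
As an illustration of the timeout advice above, here is a minimal sketch of a drain wrapper with explicit limits. drain_with_limits is a hypothetical helper name, and the default values (5m and 60 seconds) are assumptions you should adapt to your workloads. It passes --timeout and --grace-period to kubectl drain and reports whether the drain finished.

```shell
# Sketch: drain with an overall time limit and an explicit pod grace period.
# drain_with_limits is a hypothetical helper; the defaults are assumptions.
drain_with_limits() {
  local node="$1"
  local timeout="${2:-5m}"        # how long kubectl waits before giving up
  local grace_seconds="${3:-60}"  # seconds each pod gets to shut down

  if kubectl drain "$node" --ignore-daemonsets \
      --timeout="$timeout" --grace-period="$grace_seconds"; then
    echo "drain of $node completed"
  else
    # kubectl drain does not uncordon the node when it fails.
    echo "drain of $node failed or timed out; the node remains cordoned" >&2
    return 1
  fi
}
```

For example, drain_with_limits <node-name> 10m 120 would wait up to ten minutes overall and give each pod two minutes to shut down.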

In conclusion, node draining and eviction are critical operations that help you gracefully handle the removal of nodes from a Kubernetes cluster. By using the right tools and techniques, you can ensure that these operations are performed safely and with minimal disruption to the running workloads. It is essential to plan ahead, set appropriate timeouts, check for pending updates, monitor the drain status, use taints and tolerations, and test the process in a non-production environment.

By following these best practices, you can ensure that your Kubernetes cluster is always up-to-date and running smoothly. I hope this blog post has helped you understand how to safely remove nodes from a Kubernetes cluster and gracefully handle workload migrations.
