Step 1. Query node information
Start by querying the node details to see whether they point to the issue. Run kubectl describe node <node-name> to display the node details, then review the following sections of the output.
Conditions
The Conditions section reports the status of disk and memory. The fields have the following meanings:
- OutOfDisk indicates whether the node has run out of disk space.
- MemoryPressure shows if the node is under memory pressure.
- DiskPressure indicates if disk usage has reached a critical level.
- Ready is the main indicator for this problem. If the node is in a “Not Ready” state, this field shows “False”, or “Unknown” if the control plane has stopped receiving status updates from the node.
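To make the condition check repeatable, the fields above can also be read programmatically. The following is a minimal sketch, assuming kubectl is installed and configured for the cluster; the function names load_node and problem_conditions are hypothetical. It treats Ready as healthy when its status is "True" and every other condition as healthy when its status is "False".

```python
import json
import subprocess


def load_node(name: str) -> dict:
    """Fetch one node object as JSON through kubectl (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "node", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def problem_conditions(node: dict) -> list:
    """Describe every condition that is not in its healthy state."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        # Ready is healthy when "True"; pressure-style conditions when "False".
        healthy = "True" if cond["type"] == "Ready" else "False"
        if cond["status"] != healthy:
            reason = cond.get("reason", "no reason given")
            problems.append(f'{cond["type"]}={cond["status"]} ({reason})')
    return problems
```

Calling problem_conditions(load_node("<node-name>")) returns an empty list for a healthy node and one entry per failing condition otherwise.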
Capacity and allocatable resources
These fields show the resources available to the node, such as CPU, memory, and the number of pods it can host. Make sure that the available resources meet the needs of your cluster.
- Capacity resources are the resources the node physically has.
- Allocatable resources are the resources the node can allocate to pods: the capacity minus the overhead reserved for system and Kubernetes daemons and for eviction thresholds.
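The difference between the two fields can be computed directly from the node's status. The sketch below is illustrative only: parse_quantity and reserved are hypothetical helpers, and the parser handles just the common quantity suffixes (for example 3920m, 16Gi, 110), not exponent notation or the larger P and E units.

```python
from decimal import Decimal

# Multipliers for common Kubernetes quantity suffixes (binary and decimal).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity such as "3920m", "16Gi" or "110" to a plain number."""
    # Check two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def reserved(status: dict) -> dict:
    """Return capacity minus allocatable for every resource listed in both maps."""
    capacity = status.get("capacity", {})
    allocatable = status.get("allocatable", {})
    return {
        name: parse_quantity(capacity[name]) - parse_quantity(allocatable[name])
        for name in capacity
        if name in allocatable
    }
```

The status argument is the status field of the node object, for example the one returned by kubectl get node <node-name> -o json. CPU results are in cores and memory results are in bytes.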
Step 2. Check Kubelet logs
If the node information doesn’t provide clear insights, SSH into the affected node and check its Kubelet logs. Kubelet is the agent that runs pods on the node and reports the node’s status to the control plane, so problems with it often result in nodes being marked as “Not Ready”. Connect to the node, view the Kubelet logs (on systemd-based nodes, typically with journalctl -u kubelet), and look for the following types of errors:
- Certificate Errors indicate that the node may be unable to authenticate with the cluster due to expired or incorrect certificates.
- Authentication Errors can imply misconfigured or missing service accounts or tokens.
- Network Errors can indicate that Kubelet has trouble communicating with the control plane or other nodes.
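Scanning long logs by eye is error-prone, so a small filter can help group lines into the three error types above. This is a rough sketch: the keyword lists are assumptions chosen for illustration, not an official list of Kubelet messages, and real log wording varies by version and setup.

```python
# Hypothetical keyword lists; adjust them to the messages your nodes emit.
CATEGORIES = {
    "certificate": ("x509", "certificate"),
    "authentication": ("unauthorized", "forbidden", "token"),
    "network": ("connection refused", "i/o timeout",
                "no route to host", "no such host"),
}


def classify(line: str):
    """Return the first category whose keyword appears in the line, else None."""
    lowered = line.lower()
    for category, keywords in CATEGORIES.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


def summarize(lines) -> dict:
    """Count the matching lines per category."""
    counts = {category: 0 for category in CATEGORIES}
    for line in lines:
        category = classify(line)
        if category is not None:
            counts[category] += 1
    return counts
```

Passing the log lines to summarize, for example by reading them from sys.stdin, gives a quick count per error type. Because the first matching category wins, a line that mentions both a certificate and a refused connection is counted as a certificate error.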
Step 3. Address errors
Once you’ve identified the root cause, address the detected issues. Here are some common solutions based on the type of problem:
- Resource Exhaustion: If a node runs out of resources (CPU, memory, disk), you can scale the cluster by adding more nodes, upgrading hardware, or adjusting resource limits and requests for the pods.
- Network Issues: If Kubelet cannot reach the API server or other nodes, verify the node’s network configuration, DNS settings, or firewall rules that might block necessary communication.
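For the network case, a basic reachability test run from the affected node can tell a blocked or unreachable endpoint apart from a Kubelet misconfiguration. The sketch below only checks that a TCP connection can be opened. The helper name can_connect is hypothetical, and the host and port in the usage note are placeholders: 6443 is a common API server port, but your cluster may use a different one.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts and unreachable hosts.
        return False
```

For example, can_connect("<api-server-host>", 6443) returning False from the node suggests a DNS, routing, or firewall problem between the node and the control plane. A True result shows only that the port is reachable, not that authentication will succeed.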