Node Disruption Policies
TOC
Understanding Node Disruption in Machine Configuration
By default, when you make certain changes to the systemd units section of a MachineConfig object, the Machine Configuration Operator will drain and reboot the nodes associated with that MachineConfig. However, changes to regular file entries typically do not cause a reboot, which may result in the configuration not taking effect as expected. To address this, you can define node disruption policies to specify which types of changes should trigger node reboots or other disruption actions. These policies are configured in the MachineConfiguration object located in the cpaas-system namespace. See the example below for configuring a node disruption policy.
Once defined, the Machine Configuration Operator validates the policy to detect issues such as invalid formatting. Then, it populates the policy into the status.nodeDisruptionPolicyStatus field of the MachineConfiguration object. These user-defined policies override the cluster's default disruption settings.
A default MachineConfiguration custom resource named cluster is installed with the cluster. You can configure node disruption behavior on this resource.
What You Can Control with Node Disruption Policies
Node disruption policies allow you to define what happens when changes are made to the following configuration areas:
-
Files: You can define behavior for file changes (excluding changes to the root directory). By default, file changes do not trigger any disruption. You can modify this behavior using the
spec.defaultNodeDisruptionPolicySpecAction.filesfield. -
Systemd Units: You can create or modify
systemdservices, including enabling, disabling, or changing their state. By default, changes tosystemdunits trigger a node drain and reboot. -
SSH Public Keys: You can add or update SSH keys for the
bootuser. These changes are applied immediately by default and do not trigger a reboot or drain.
Each change is evaluated against the node disruption policy, which can trigger one or more of the following actions:
Reboot: Drain and reboot the node.None: No disruption is triggered; the change is applied silently.Drain: Drain the node without rebooting.Restart: Restart the specifiedsystemdservice.DaemonReload: Reload allsystemdunit configurations.
Example: Default Node Disruption Policy
The following is the default configuration for the cluster MachineConfiguration resource after installation:
This configuration means:
- File changes trigger no action.
- Systemd unit changes cause a node drain and reboot.
- SSH key changes are applied without disruption.
Example: Customizing File Behavior
You can change the default action for file changes to trigger a reboot:
With this configuration, any change to a managed file will cause the Machine Configuration Operator to drain and reboot the affected node.
Example: Applying Policy to a Specific File
You can also define disruption actions for a specific file path. In the following example, changes to /usr/local/bin/myapp.sh will not trigger a node reboot, but instead reload the systemd configuration and restart the related service:
In this case, when /usr/local/bin/myapp.sh is updated, the Machine Configuration Operator will reload all systemd units and restart the myapp.service—without draining or rebooting the node.