[EKS] Ensure ASG Max Size Reverts to Original Value After EKS Upgrade Workflow with Cluster Autoscaler #2500
Labels
EKS Managed Nodes
EKS
Amazon Elastic Kubernetes Service
Proposed
Community submitted issue
Description:
During an EKS managed node group (MNG) upgrade, the Auto Scaling Group (ASG) max size can temporarily increase as part of the upgrade workflow. However, if the Cluster Autoscaler (CA) triggers scale-up activities during the upgrade process (especially during the scale-down phase), the ASG max size is not reverted to its original value after the upgrade completes. This results in unpredictable increases in the ASG max size, creating operational challenges and deviating from the documented behavior.
Expected Behavior:
The EKS upgrade workflow should ensure that the ASG max size is always reverted to its original value after the upgrade completes, even if the Cluster Autoscaler scales up the node group during the upgrade process.
This behavior should align with the documented logic for temporary ASG max size increases during the scale-up phase (e.g., twice the number of availability zones or based on the maxUnavailable value).
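To make the documented rule concrete, here is a minimal sketch of the temporary max size one would expect during the scale-up phase. It assumes the increment is the larger of twice the number of Availability Zones and the `maxUnavailable` value, and that `maxUnavailable` is an absolute node count rather than a percentage. The function name and parameters are hypothetical and are not part of any EKS API; the result is best read as an upper bound, not the exact value EKS will set.

```python
def expected_temporary_max(original_max: int, az_count: int, max_unavailable: int) -> int:
    """Upper bound on the ASG max size during the upgrade's scale-up phase.

    Assumption: the increment is the larger of (2 * number of AZs) and
    maxUnavailable, with maxUnavailable given as an absolute node count.
    """
    if original_max < 0 or az_count < 1 or max_unavailable < 1:
        raise ValueError("original_max must be >= 0; az_count and max_unavailable must be >= 1")
    increment = max(2 * az_count, max_unavailable)
    return original_max + increment
```

For a node group with an original max size of 10 spread over 3 Availability Zones and `maxUnavailable` of 1, this gives a temporary ceiling of 16. Any max size above the original value that persists after the upgrade completes is the drift this issue describes.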
Current Behavior:
If the Cluster Autoscaler triggers scaling during the upgrade process, the ASG max size remains above its original value after the upgrade completes, by an amount that is hard to predict.
This causes potential overprovisioning of nodes and additional costs due to an unintended increase in ASG max size.
Proposed Solution:
Introduce a mechanism in the EKS managed node group upgrade workflow to:
Benefit to Customers:
Additional Context:
This feature is especially critical for users leveraging Cluster Autoscaler to dynamically manage node group sizes in production environments. The current behavior introduces unnecessary manual intervention to reset ASG configurations after upgrades, which this enhancement could automate and eliminate.
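Until such a mechanism exists, the manual reset can be sketched as a snapshot-and-restore pair run around the upgrade. This is a workaround sketch, not the proposed EKS-side fix. It assumes `autoscaling` is a boto3 Auto Scaling client (`boto3.client("autoscaling")`) and uses only its `describe_auto_scaling_groups` and `update_auto_scaling_group` calls. The helper names and the `RestoreResult` type are hypothetical. Whether editing the ASG of a managed node group directly is appropriate for your setup should be verified first.

```python
from dataclasses import dataclass


@dataclass
class RestoreResult:
    restored: bool
    reason: str


def _describe(autoscaling, asg_name: str) -> dict:
    """Fetch a single ASG description or raise if it does not exist."""
    resp = autoscaling.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    groups = resp["AutoScalingGroups"]
    if not groups:
        raise ValueError(f"Auto Scaling group not found: {asg_name}")
    return groups[0]


def snapshot_max_size(autoscaling, asg_name: str) -> int:
    """Record the ASG MaxSize before starting the node group upgrade."""
    return _describe(autoscaling, asg_name)["MaxSize"]


def restore_max_size(autoscaling, asg_name: str, original_max: int) -> RestoreResult:
    """Revert MaxSize after the upgrade, unless doing so would force a scale-in."""
    group = _describe(autoscaling, asg_name)
    if group["MaxSize"] == original_max:
        return RestoreResult(False, "max size already at original value")
    if group["DesiredCapacity"] > original_max:
        # Lowering MaxSize below the current capacity would shrink the group,
        # so leave it alone and let the operator decide.
        return RestoreResult(False, "desired capacity exceeds original max size")
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name, MaxSize=original_max
    )
    return RestoreResult(True, "max size reverted")
```

Call `snapshot_max_size` before triggering the upgrade and `restore_max_size` once the node group reports the update as complete. The guard on `DesiredCapacity` is a deliberate choice: if the Cluster Autoscaler has legitimately scaled past the original ceiling, the sketch reports that instead of terminating nodes.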
I was advised to open this feature request in a support case. Case ID: 173143486100288