
[EKS] Ensure ASG Max Size Reverts to Original Value After EKS Upgrade Workflow with Cluster Autoscaler #2500

Open
olekszhel opened this issue Dec 15, 2024 · 0 comments
Labels
EKS Managed Nodes, EKS (Amazon Elastic Kubernetes Service), Proposed (Community submitted issue)

Comments

@olekszhel

Description:
During an EKS managed node group (MNG) upgrade, the upgrade workflow can temporarily increase the Auto Scaling Group (ASG) max size. However, if the Cluster Autoscaler (CA) triggers a scale-up during the upgrade (especially during the scale-down phase), the ASG max size is not reverted to its original value after the upgrade completes. The max size ends up higher than configured by an unpredictable amount, which creates operational challenges and deviates from the documented behavior.

Expected Behavior:

The EKS upgrade workflow should ensure that the ASG max size is always reverted to its original value after the upgrade completes, even if the Cluster Autoscaler scales up the node group during the upgrade process.
This would match the documented logic, in which the ASG max size is raised only temporarily during the scale-up phase (e.g., by twice the number of availability zones or by the maxUnavailable value).
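
To make the documented rule concrete, here is a minimal sketch of the arithmetic. It assumes the increase is the larger of the two quantities mentioned above, which is my reading of the EKS documentation; the docs describe the availability-zone term as an upper bound ("up to twice"), so treat the result as a ceiling. The function name and the sample values are hypothetical.

```python
def temporary_max_increase(num_availability_zones: int, max_unavailable: int) -> int:
    """Upper bound on the temporary max-size increase during the scale-up phase.

    Assumption: the increase is the larger of (2 x number of AZs) and
    maxUnavailable. This is an illustration, not EKS source code.
    """
    if num_availability_zones < 1 or max_unavailable < 1:
        raise ValueError("both inputs must be positive")
    return max(2 * num_availability_zones, max_unavailable)


# Hypothetical node group: max size 10, spread over 3 AZs, maxUnavailable = 1.
original_max = 10
surge = temporary_max_increase(num_availability_zones=3, max_unavailable=1)
print(original_max + surge)  # 16
```

Under these assumptions the max size should peak at 16 during the upgrade and be back at 10 once it completes.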

Current Behavior:

If the Cluster Autoscaler triggers scaling during the upgrade process, the ASG max size remains higher than its original value after the upgrade completes, and the size of the difference is not predictable.
This can lead to overprovisioned nodes and additional cost, because the node group is now allowed to grow beyond the max size that was originally configured.

Proposed Solution:
Introduce a mechanism in the EKS managed node group upgrade workflow to:

  • Track the original ASG max size before the upgrade begins.
  • Automatically restore the original ASG max size once the upgrade process is complete, regardless of any Cluster Autoscaler-triggered activities.
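
The two steps above could look roughly like the sketch below, written from the outside as a wrapper around an upgrade. This is not how the EKS workflow is implemented; it only illustrates the track-and-restore idea. The `autoscaling` argument is assumed to be a boto3 Auto Scaling client (`boto3.client("autoscaling")`), and `preserve_max_size` and `get_max_size` are hypothetical helper names.

```python
from contextlib import contextmanager


def get_max_size(autoscaling, asg_name):
    """Read the current MaxSize of one Auto Scaling group."""
    response = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )
    groups = response["AutoScalingGroups"]
    if not groups:
        raise LookupError(f"Auto Scaling group not found: {asg_name}")
    return groups[0]["MaxSize"]


@contextmanager
def preserve_max_size(autoscaling, asg_name):
    """Record MaxSize on entry and restore it on exit if it has changed."""
    original_max = get_max_size(autoscaling, asg_name)
    try:
        yield original_max
    finally:
        # Runs whether the wrapped upgrade succeeded or raised.
        if get_max_size(autoscaling, asg_name) != original_max:
            autoscaling.update_auto_scaling_group(
                AutoScalingGroupName=asg_name,
                MaxSize=original_max,
            )
```

Usage would be `with preserve_max_size(autoscaling, "my-asg"): ...` around whatever starts the upgrade and waits for it to finish. One caveat, as I understand the Auto Scaling API: setting a max size below the group's current size also lowers the desired capacity, so a real implementation would need to decide how to handle nodes that the Cluster Autoscaler added in the meantime.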

Benefit to Customers:

  • Ensures predictable and stable behavior of ASGs post-upgrade.
  • Prevents unexpected resource usage and costs due to unintended increases in ASG max size.
  • Improves alignment between EKS upgrade workflows and documented behavior, enhancing operational trust and ease of use.

Additional Context:
This feature is especially critical for users leveraging Cluster Autoscaler to dynamically manage node group sizes in production environments. The current behavior introduces unnecessary manual intervention to reset ASG configurations after upgrades, which this enhancement could automate and eliminate.
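
For reference, the manual check this currently requires amounts to comparing max sizes recorded before the upgrade with the values seen afterwards. A minimal sketch, assuming the sizes have already been collected into plain dictionaries keyed by ASG name (the function name and the sample data are hypothetical):

```python
def find_max_size_drift(before, after):
    """Return {asg_name: (expected_max, actual_max)} for groups whose max size changed."""
    return {
        name: (expected, after[name])
        for name, expected in before.items()
        if name in after and after[name] != expected
    }


# Hypothetical snapshots taken before and after an upgrade.
before = {"ng-a": 10, "ng-b": 5}
after = {"ng-a": 16, "ng-b": 5}
print(find_max_size_drift(before, after))  # {'ng-a': (10, 16)}
```

Each group reported here has to be reset by hand today.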

I was advised to open this feature request in a support case. Case ID: 173143486100288

@olekszhel olekszhel added the Proposed Community submitted issue label Dec 15, 2024
@mikestef9 mikestef9 added EKS Amazon Elastic Kubernetes Service EKS Managed Nodes EKS Managed Nodes labels Dec 16, 2024
@mikestef9 mikestef9 changed the title from "Ensure ASG Max Size Reverts to Original Value After EKS Upgrade Workflow with Cluster Autoscaler" to "[EKS] Ensure ASG Max Size Reverts to Original Value After EKS Upgrade Workflow with Cluster Autoscaler" Dec 16, 2024