EKS Managed Node Groups - Change Disk Size #3180

Open
tonydekeizer opened this issue Oct 16, 2024 · 23 comments
@tonydekeizer

tonydekeizer commented Oct 16, 2024

Description

We have an EKS Terraform script that creates an EKS cluster with an EKS managed node group.
Here is a code snippet of the node group section:

eks_managed_node_groups = {
  default_node_group = {
    name = "managed-ondemand-r5"

    # Starting on 1.30, AL2023 is the default AMI type for EKS managed node groups
    instance_types = var.instance_types
    min_size       = var.min_node_group_size
    max_size       = var.max_node_group_size
    desired_size   = var.desired_node_group_size
  }
}

The cluster and node group are created correctly by terraform apply and are now in production.
We have since noticed that we need more storage on the nodes (logging, images, etc.) and had not realised that the default launch template gives each node only 20 GB of storage.

We added the following directive to the above:

disk_size = 100

Running terraform plan reports that no changes are required:

10:07:48.865 STDOUT terraform: No changes. Your infrastructure matches the configuration.
10:07:48.865 STDOUT terraform: Terraform has compared your real infrastructure against your configuration
10:07:48.865 STDOUT terraform: and found no differences, so no changes are needed.

After some research, it was suggested that we add two more directives to the default_node_group settings to force the change:

create_launch_template = false 
launch_template_name   = ""

On doing a terraform plan we get the following error:

09:47:18.309 STDOUT terraform: Terraform planned the following actions, but then encountered a problem:
09:47:18.310 STDOUT terraform:   # module.eks.module.eks_managed_node_group["default_node_group"].aws_launch_template.this[0] will be destroyed
09:47:18.310 STDOUT terraform:   # (because index [0] is out of range for count)

We are unsure how to get terraform to destroy and build the new node group with the increased disk size. We do not want to do this manually and get our production terraform state out of sync with the actual setup in EKS.

Not sure whether this is a bug or whether we are providing incorrect directives in our Terraform EKS script.

Cheers
Tony

  • [ X] ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]:
    20.14.0

  • Terraform version:
    1.9.0

  • Provider version(s):

  • provider registry.terraform.io/gavinbunney/kubectl v1.14.0
  • provider registry.terraform.io/hashicorp/aws v5.67.0
  • provider registry.terraform.io/hashicorp/cloudinit v2.3.5
  • provider registry.terraform.io/hashicorp/helm v2.15.0
  • provider registry.terraform.io/hashicorp/kubernetes v2.32.0
  • provider registry.terraform.io/hashicorp/null v3.2.3
  • provider registry.terraform.io/hashicorp/time v0.12.1
  • provider registry.terraform.io/hashicorp/tls v4.0.6

Reproduction Code [Required]

See above
Steps to reproduce the behavior:
See above


Expected behavior

Terraform plan and apply will destroy the existing managed node group and add a new one using the modified launch template, and hence the increased disk size.

Actual behavior

Terraform plan/apply either does nothing or fails with an error.

Terminal Output Screenshot(s)

see above

Additional context

@tonydekeizer
Author

I have done some more research on this and updated the managed node group using the following directives:

      use_custom_launch_template = false 
      disk_size   = 100

Terraform apply (using a test cluster, of course) created a new managed node group, then drained and destroyed the old node group successfully.

On checking the test cluster pods though I noticed that the aws_load_balancer_controller addon fails to start on the new nodes with the following Error log:

{"level":"error","ts":"2024-10-17T00:16:55Z","logger":"setup","msg":"unable to initialize AWS cloud","error":"failed to introspect vpcID from EC2Metadata or Node name, specify --aws-vpc-id instead if EC2Metadata is unavailable: failed to fetch VPC ID from instance metadata: EC2MetadataError

Any clues why this may have happened ?

Cheers
Tony

@tonydekeizer
Author

I was able to rectify the error with the aws-load-balancer-controller deployment by adding the container arg

--aws-vpc-id=XXXXXXXX

and restarting the deployment.

The question is why the managed node group replacement triggered by the Terraform changes above suddenly broke the nodes' ability to access EC2 metadata, so that the VPC ID now has to be declared explicitly.

Possibly an updated AMI release? I may need to test this.

@tonydekeizer
Author

Ok, I may have more information on what is happening and hopefully someone can propose a solution or fix.

In my eks terraform script I added the following to update the node group to use a larger disk size :

      use_custom_launch_template = false 
      disk_size   = 100

The use_custom_launch_template = false directive was required for the EKS Terraform module(s) to take any notice of the disk_size directive.

Comparing the launch templates of my test clusters and managed node groups, I found that the original (without use_custom_launch_template = false) had the following settings in the Advanced Details tab:
[image: screenshot not preserved]
As soon as I added the use_custom_launch_template = false directive (either on initial build or when updating an existing cluster), the associated launch template had this:
[image: screenshot not preserved]

With IMDSv2, which the EKS AMI defaults to, containerised applications that need access to EC2 metadata require the metadata response hop limit to be greater than 1, because a request from inside a container crosses an extra network hop before it reaches the instance metadata service. This is the case for the AWS Load Balancer Controller, which tries to determine the VPC ID from the managed node group's EC2 instance metadata service.

Also in my EKS terraform script I have the following directive

metadata_http_put_response_hop_limit = 2

which originally fixed this issue by setting the hop limit on the custom launch template, so the AWS Load Balancer Controller deployment worked. Because the new directive stops customisation of the launch template, the EC2 default is used instead, which is currently set to 1 under

EC2 -> Account Attributes -> Data Protection and Security -> IMDS Defaults

I changed that account default and tested again, but the newly created managed node group still has the hop limit set to 1.
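
For comparison, here is a minimal sketch of how the hop limit could be pinned while keeping the module's custom launch template. It assumes the metadata_options input of the managed node group (the same shape that appears in a later comment in this thread). The node group name is taken from the snippet at the top of the issue, and nothing here has been verified against this particular cluster.

```hcl
eks_managed_node_groups = {
  default_node_group = {
    name = "managed-ondemand-r5"

    # Assumption: use_custom_launch_template is left at its default (true),
    # so these settings end up in the launch template the module creates.
    metadata_options = {
      http_endpoint               = "enabled"
      http_tokens                 = "required" # IMDSv2 only
      http_put_response_hop_limit = 2          # one extra hop so that pods can reach IMDS
    }
  }
}
```

With a block like this in place the controller should be able to discover the VPC ID by itself, although setting vpcId explicitly remains a valid fallback.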

My only questions are :

  • Why do we need the directive use_custom_launch_template = false in order to request a different disk size?
  • Why does the non-customised launch template created by the Terraform EKS module when this directive is set force the hop limit to 1 instead of using the EC2 account default?

Cheers
Tony

@tonydekeizer
Author

More information on this.

I managed to change the terraform scripts to allow disk_size setup without breaking the aws load balancer controller deployment.

First add these directives to your managed node group setup. i.e.

use_custom_launch_template = false 
disk_size   = 100

In your eks_addon terraform script make sure you add the necessary aws_load_balancer_controller helm configuration option to set the vpcId :

aws_load_balancer_controller = {
     set = [{
       name  = "enableServiceMutatorWebhook"
       value = "false"
     },
     {
       name = "vpcId"
       value = var.vpc_id
     }]
  }

This way the controller doesn't have to query the node's EC2 metadata (and fail due to the hop limit) for the VPC ID.

Roundabout, but it works in my tests.

Previous questions still apply though :-)

Cheers
Tony

@bryantbiggs
Member

@poussa

poussa commented Oct 18, 2024

I am wondering why changing disk_size sits behind the use_custom_launch_template = false flag. Similar variables such as instance_type and capacity_type don't require that flag. As stated in the FAQ and here, you lose many of the custom launch template features. This is a pretty big tradeoff, and I don't understand why it is designed this way.

20 GB is very small for, e.g., AI workloads with big containers.

Any insights?

@bryantbiggs
Member

this is not a module question - the module defaults to creating and using a custom launch template in order to give users the widest array of options for customization. This is just how EKS managed node groups work. See https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html
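
A rough sketch of what this looks like at the resource level, assuming the AWS provider's aws_launch_template and aws_eks_node_group resources. All names and variables are hypothetical. Per the linked page, once a node group references a launch template, the disk size has to come from the template's block device mapping rather than from the node group's own disk size setting.

```hcl
variable "cluster_name" { type = string }
variable "node_role_arn" { type = string }
variable "subnet_ids" { type = list(string) }

# Hypothetical custom launch template: the root volume size lives here.
resource "aws_launch_template" "nodes" {
  name_prefix = "example-nodes-"

  block_device_mappings {
    device_name = "/dev/xvda" # must match the AMI's root device name

    ebs {
      volume_size           = 100
      volume_type           = "gp3"
      delete_on_termination = true
    }
  }
}

resource "aws_eks_node_group" "example" {
  cluster_name    = var.cluster_name
  node_group_name = "example"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  # disk_size is deliberately left unset: EKS does not accept it
  # together with a launch template.

  launch_template {
    id      = aws_launch_template.nodes.id
    version = aws_launch_template.nodes.latest_version
  }

  scaling_config {
    desired_size = 1
    max_size     = 1
    min_size     = 1
  }
}
```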

@tonydekeizer
Author

Hi
I am a little confused by this now.
If you are building a custom launch template, for very good reason as you and the documentation suggest, why can't we specify disk_size unless we tell the module not to build one?
Can you explain why this is not a module question ?

@poussa

poussa commented Oct 21, 2024

@bryantbiggs what is your recommendation for the following use case:

"In my EKS nodes, I need 100GB storage to download and store my container images in order to run them".

  1. Use disk_size = 100 and use_custom_launch_template = false, and lose many features
  2. Add a bigger device myself to eks_managed_node_group_defaults, like
     block_device_mappings = {
       xvda = {
         device_name = "/dev/xvda"
         ebs = {
           volume_size           = 100
           volume_type           = "gp3"
           iops                  = 3000
           throughput            = 125
           encrypted             = true
           delete_on_termination = true
         }
       }
     }
  3. Something else?

@bryantbiggs
Member

2

@nelsonpipo

@tonydekeizer in case you haven't done it.

Add the section specified by @poussa inside eks_managed_node_groups.
It will look like this (using your post example):

eks_managed_node_groups = {
  default_node_group = {
    name = "managed-ondemand-r5"

    block_device_mappings = {
      sda = {
        device_name = "/dev/sda"
        ebs = {
          volume_size = 100
        }
      }
    }

    # Starting on 1.30, AL2023 is the default AMI type for EKS managed node groups
    instance_types = var.instance_types
    min_size       = var.min_node_group_size
    max_size       = var.max_node_group_size
    desired_size   = var.desired_node_group_size
  }
}

@tonydekeizer
Author

Hi @nelsonpipo

Thank you. Yes, we have tested this and it works, though we added the full details as per @poussa's post, i.e.

block_device_mappings = {
        xvda = {
          device_name = "/dev/xvda"
          ebs         = {
            volume_size           = 50
            volume_type           = "gp3"
            iops                  = 3000
            throughput            = 125
            encrypted             = true
            delete_on_termination = true
          }
        }
      }

I see you suggest using the sda devlink for the device mapping. Is that for a particular reason?

Cheers
Tony

@bryantbiggs
Member

bryantbiggs commented Oct 30, 2024

If you describe the AMI, you will see the device names it uses. The Amazon Linux AMIs (AL2, AL2023) typically use /dev/xvda for the root volume; Bottlerocket uses /dev/xvda for the root volume and /dev/xvdb for the data volume.

AL2023:

aws ec2 describe-images --image-id $(aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.31/amazon-linux-2023/x86_64/standard/recommended/image_id \
    --region us-west-2 --query "Parameter.Value" --output text) --region us-west-2
{
    "Images": [
        {
            "PlatformDetails": "Linux/UNIX",
            "UsageOperation": "RunInstances",
            "BlockDeviceMappings": [
                {
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "Iops": 3000,
                        "SnapshotId": "snap-0be3ceb0e2f0255e7",
                        "VolumeSize": 20,
                        "VolumeType": "gp3",
                        "Throughput": 125,
                        "Encrypted": false
                    },
                    "DeviceName": "/dev/xvda"
                }
            ],
            "Description": "EKS-optimized Kubernetes node based on Amazon Linux 2023, (k8s: 1.31.0, containerd: 1.7.*)",
            "EnaSupport": true,
            "Hypervisor": "xen",
            "ImageOwnerAlias": "amazon",
            "Name": "amazon-eks-node-al2023-x86_64-standard-1.31-v20241024",
            "RootDeviceName": "/dev/xvda",
            "RootDeviceType": "ebs",
            "SriovNetSupport": "simple",
            "VirtualizationType": "hvm",
            "BootMode": "uefi-preferred",
            "DeprecationTime": "2026-10-24T07:07:53.000Z",
            "ImdsSupport": "v2.0",
            "ImageId": "ami-00369ea992801deb2",
            "ImageLocation": "amazon/amazon-eks-node-al2023-x86_64-standard-1.31-v20241024",
            "State": "available",
            "OwnerId": "602401143452",
            "CreationDate": "2024-10-24T07:07:53.000Z",
            "Public": true,
            "Architecture": "x86_64",
            "ImageType": "machine"
        }
    ]
}

Bottlerocket:

aws ec2 describe-images --image-id $(aws ssm get-parameter --name /aws/service/bottlerocket/aws-k8s-1.31/x86_64/latest/image_id \
    --region us-west-2 --query "Parameter.Value" --output text) --region us-west-2
{
    "Images": [
        {
            "PlatformDetails": "Linux/UNIX",
            "UsageOperation": "RunInstances",
            "BlockDeviceMappings": [
                {
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "SnapshotId": "snap-0fa9c2e0271b950f3",
                        "VolumeSize": 2,
                        "VolumeType": "gp2",
                        "Encrypted": false
                    },
                    "DeviceName": "/dev/xvda"
                },
                {
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "SnapshotId": "snap-0abff213fa53bbbef",
                        "VolumeSize": 20,
                        "VolumeType": "gp2",
                        "Encrypted": false
                    },
                    "DeviceName": "/dev/xvdb"
                }
            ],
            "Description": "bottlerocket-aws-k8s-1.31-x86_64-v1.26.1-943d9a41",
            "EnaSupport": true,
            "Hypervisor": "xen",
            "ImageOwnerAlias": "amazon",
            "Name": "bottlerocket-aws-k8s-1.31-x86_64-v1.26.1-943d9a41",
            "RootDeviceName": "/dev/xvda",
            "RootDeviceType": "ebs",
            "SriovNetSupport": "simple",
            "VirtualizationType": "hvm",
            "BootMode": "uefi-preferred",
            "DeprecationTime": "2026-10-24T21:49:18.000Z",
            "ImageId": "ami-056fd8b527acedaca",
            "ImageLocation": "amazon/bottlerocket-aws-k8s-1.31-x86_64-v1.26.1-943d9a41",
            "State": "available",
            "OwnerId": "651937483462",
            "CreationDate": "2024-10-24T21:49:18.000Z",
            "Public": true,
            "Architecture": "x86_64",
            "ImageType": "machine"
        }
    ]
}
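
Going by the Bottlerocket output above, extra space for container images would belong on the data volume, so a sketch for that AMI family might target /dev/xvdb and leave the small root device alone. This is an untested illustration: the node group name, instance type and sizes are hypothetical, and it assumes the module's ami_type and block_device_mappings inputs.

```hcl
eks_managed_node_groups = {
  bottlerocket = {
    ami_type       = "BOTTLEROCKET_x86_64"
    instance_types = ["m5.large"] # hypothetical

    block_device_mappings = {
      # Only the data volume is overridden; /dev/xvda keeps the AMI's own mapping.
      xvdb = {
        device_name = "/dev/xvdb"
        ebs = {
          volume_size           = 100
          volume_type           = "gp3"
          encrypted             = true
          delete_on_termination = true
        }
      }
    }
  }
}
```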

@tonydekeizer
Author

Thanks @bryantbiggs, makes sense. :-)

@sadath-12

Hi @tonydekeizer, I tried your example. It does create the volume, but the node's ephemeral storage does not increase when I describe it. Wondering if you came across this.

@sadath-12

 eks_managed_node_groups = {
    custom = {
      ami_id                     = data.aws_ami.ubuntu22.id
      instance_types             = var.instance_types
      enable_bootstrap_user_data = true
      min_size                   = var.min_nodes
      max_size                   = var.max_nodes
      desired_size               = var.desired_size

      block_device_mappings = {
        xvda = {
          device_name = "/dev/xvda"
          ebs         = {
            volume_size           = 500
            volume_type           = "gp3"
            iops                  = 3000
            throughput            = 125
            encrypted             = true
            delete_on_termination = true
          }
        }
      }

    }
  }

Here is my configuration

@dataviruset

> Here is my configuration

This would be great to add to the FAQ document, in the section that talks about setting disk_size. I tried setting use_custom_launch_template = false with a larger disk_size for just one node group, and ended up with network connectivity issues between the nodes instead: even though I set vpc_security_group_ids, it didn't seem to take effect.

@sadath-12

Yup, in my case the nodes ended up without permission to attach EBS volumes for PVCs.

@bryantbiggs
Member

It's the very first FAQ https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/faq.md

Also, PV/PVCs have nothing to do with the node volume

@sadath-12

sadath-12 commented Nov 29, 2024

I mean that as soon as I configured use_custom_launch_template = false, I started getting this error when creating PVCs, so I guess the template got changed to one that does not have IMDS:

could not create volume in EC2: operation error EC2: CreateVolume, get identity: get credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, request canceled, context deadline exceeded"

@dataviruset

> It's the very first FAQ https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/faq.md

Yes, but it doesn't mention the block_device_mappings solution, which lets people avoid disabling the custom launch template and the other problems that might create. It would be great to have that mentioned as an option for those who don't want to set use_custom_launch_template = false.

@sadath-12

In my case setting the right device name made it work and increased the disk size
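
One way to avoid hard-coding the device name, sketched under the assumption that the data.aws_ami.ubuntu22 data source from the configuration above is in scope and that the AWS provider's aws_ami data source exports root_device_name. The map key "root" is arbitrary, and the size is simply the one from that configuration.

```hcl
eks_managed_node_groups = {
  custom = {
    ami_id                     = data.aws_ami.ubuntu22.id
    enable_bootstrap_user_data = true

    block_device_mappings = {
      root = {
        # Take the device name from the AMI itself instead of assuming /dev/xvda.
        device_name = data.aws_ami.ubuntu22.root_device_name
        ebs = {
          volume_size           = 500
          volume_type           = "gp3"
          encrypted             = true
          delete_on_termination = true
        }
      }
    }
  }
}
```

If the mapping's device name does not match the AMI's root device, the extra volume is still created but the root filesystem keeps its original size, which would be consistent with the behaviour described earlier in this thread.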

@Daemoen

Daemoen commented Dec 12, 2024

Like others here, I've fought with this issue multiple times in the past, and again now. A while back I found block_device_mappings, and that works for the 'default' settings, but the defaults do not seem to be overridden, even when you set block_device_mappings in the managed node groups stanza:

  eks_managed_node_group_defaults = {
    metadata_options = {
      http_tokens                 = "required"
      http_endpoint               = "enabled"
      http_put_response_hop_limit = 2
    }

    iam_role_additional_policies = {
      "AmazonEC2FullAccess" = "arn:aws:iam::aws:policy/AmazonEC2FullAccess",
    }

    ebs_optimized = true

    block_device_mappings = {
      xvda = {
        device_name = "/dev/xvda" # This is the default device name for the root volume
        ebs = {
          volume_size           = 100
          volume_type           = "gp3"
          delete_on_termination = true
          encrypted             = true
        }
      }
    }

    tags = merge(local.tags, {
      Name = "${local.name}-eks-managed-node-group"
    })
  }

Actual managed node group setting:

eks_managed_node_groups = {
  general = {
    node_group_name = "company-production-eks-general-nodes"
    key_name        = "general_nodes-prod-key"

    instance_types = ["r7i.2xlarge"]
    ami_type       = "AL2023_x86_64_STANDARD"

    capacity_type = "ON_DEMAND"
    desired_size  = 3
    max_size      = 3
    min_size      = 3

    disk_size = 250
    block_device_mappings = {
      xvda = {
        device_name = "/dev/xvda" # This is the default device name for the root volume
        ebs = {
          volume_size           = 250
          volume_type           = "gp3"
          delete_on_termination = true
          encrypted             = true
        }
      }
    }

    taints = []
    labels = {}

    create_launch_template = true
    tags = {
      Name = "company-production-eks-general-nodes"
    }
  }
}

If I run a plan with the above definitions, as suggested in this thread, I would expect the general node group to get an overridden disk with a size of 250. It doesn't. Unfortunately, it gets the default disk size.

So what is the correct way to do this? The device mapping is correct; I even verified that. I'm on the latest 20.x, so it's not that a bug has since been fixed.

Edit: Updated to the correct settings for block device settings. I experimented with multiple things. None of them worked ultimately.
