It seems impossible to define default cgroups settings for tasks.
Description
We are using ECS via AWS Batch to launch multiple jobs with heavy I/O.
The heavy I/O is causing noisy-neighbor issues that we would like to limit. Specifically, Batch (and the ecs-agent) fail to spin up new Docker containers: they time out due to delays caused by the heavy I/O. And this is on a Nitro SSD.
Since there is no way to limit file I/O via AWS Batch, we wanted to configure cgroups (v1) as part of our launch configuration to accomplish this.
The partial launch configuration is:
```shell
# Get the major/minor numbers of the RAID drive and set the limits to 50%
# so one job won't block other jobs from starting.
# Note: stat -c %t/%T print the device numbers in hex; the blkio.throttle
# files expect decimal, hence the $((0x...)) conversion.
MAJOR=$((0x$(stat -c %t /dev/md0)))
MINOR=$((0x$(stat -c %T /dev/md0)))
printf "$MAJOR:$MINOR 4000000000" | sudo tee /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device > /dev/null
printf "$MAJOR:$MINOR 500000" | sudo tee /sys/fs/cgroup/blkio/blkio.throttle.read_iops_device > /dev/null
printf "$MAJOR:$MINOR 2800000000" | sudo tee /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device > /dev/null
printf "$MAJOR:$MINOR 400000" | sudo tee /sys/fs/cgroup/blkio/blkio.throttle.write_iops_device > /dev/null
sudo mkdir -p /sys/fs/cgroup/blkio/ecs
printf "$MAJOR:$MINOR 4000000000" | sudo tee /sys/fs/cgroup/blkio/ecs/blkio.throttle.read_bps_device > /dev/null
printf "$MAJOR:$MINOR 500000" | sudo tee /sys/fs/cgroup/blkio/ecs/blkio.throttle.read_iops_device > /dev/null
printf "$MAJOR:$MINOR 2800000000" | sudo tee /sys/fs/cgroup/blkio/ecs/blkio.throttle.write_bps_device > /dev/null
printf "$MAJOR:$MINOR 400000" | sudo tee /sys/fs/cgroup/blkio/ecs/blkio.throttle.write_iops_device > /dev/null
sudo systemctl daemon-reload
sudo systemctl restart docker
```
Expected Behavior
The launch configuration sets these values correctly, and since the ecs-agent starts tasks under the ecs cgroup, we expected the above to cause all tasks to inherit these settings.
Observed Behavior
All tasks start with empty cgroups and do not inherit from the ecs parent.
If I start a task and then manually adjust the cgroups from the host, that works. But since we don't know the task ID / container ID ahead of time, this is not a viable solution for us.
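For reference, the manual fix looks roughly like the sketch below. It assumes the cgroup v1 layout shown above; the task ID is hypothetical, since in reality the ID only becomes known after the agent has already created the cgroup, which is exactly the problem.

```shell
#!/bin/bash
# Sketch: manually apply a read-bandwidth limit to one already-running task.
# The task ID below is a placeholder for the directory the agent creates
# under /sys/fs/cgroup/blkio/ecs/ once the task starts.
TASK_CG=/sys/fs/cgroup/blkio/ecs/0123456789abcdef

if [ -d "$TASK_CG" ] && [ -b /dev/md0 ]; then
  # stat -c %t/%T print hex; the blkio.throttle files take decimal numbers.
  MAJOR=$((0x$(stat -c %t /dev/md0)))
  MINOR=$((0x$(stat -c %T /dev/md0)))
  echo "$MAJOR:$MINOR 4000000000" | sudo tee "$TASK_CG/blkio.throttle.read_bps_device" > /dev/null
fi
```

On a host without that task cgroup (or without /dev/md0) the script is a no-op, so it is safe to run repeatedly.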
Finally, there seems to be almost no documentation available for any of this, though limiting i/o so that multiple jobs can safely run on a single node seems to be a pretty standard use case.
We switched to cgroups v2 and Amazon Linux 2023; same problem.
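On cgroup v2 the four blkio.throttle files collapse into a single io.max file, so the equivalent configuration looks roughly like this sketch. The /sys/fs/cgroup/ecs path is an assumption; verify the slice name the agent actually uses on the AL2023 host.

```shell
#!/bin/bash
# cgroup v2 sketch: one io.max line replaces the four blkio.throttle files.
# The /sys/fs/cgroup/ecs path is an assumption; check the host's real layout.

# Build the io.max line for a device. stat prints device numbers in hex;
# io.max expects them in decimal.
iomax_line() {
  local major=$((0x$(stat -c %t "$1")))
  local minor=$((0x$(stat -c %T "$1")))
  echo "$major:$minor rbps=4000000000 wbps=2800000000 riops=500000 wiops=400000"
}

# Apply only when the target file exists and the device is present.
if [ -w /sys/fs/cgroup/ecs/io.max ] && [ -b /dev/md0 ]; then
  iomax_line /dev/md0 > /sys/fs/cgroup/ecs/io.max
fi
```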
We are testing a workaround where we use inotifywait to watch cgroups and set them once ECS spins up a task. Seems to work, but definitely feels super hacky.
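The workaround we are testing looks roughly like the sketch below (cgroup v1 layout assumed; requires inotify-tools). There is still a window between task start and the limits being applied, which is part of why it feels hacky.

```shell
#!/bin/bash
# Watch the ecs cgroup for new task directories and write the throttle
# limits into each one as the agent creates it.

ECS_CG=/sys/fs/cgroup/blkio/ecs

# Build a "major:minor value" line; stat prints hex, the kernel wants decimal.
throttle_line() {
  printf '%d:%d %s' "$((0x$(stat -c %t "$1")))" "$((0x$(stat -c %T "$1")))" "$2"
}

watch_task_cgroups() {
  inotifywait -m -q -e create --format '%w%f' "$ECS_CG" | while read -r path; do
    [ -d "$path" ] || continue
    throttle_line /dev/md0 4000000000 > "$path/blkio.throttle.read_bps_device"
    throttle_line /dev/md0 500000     > "$path/blkio.throttle.read_iops_device"
    throttle_line /dev/md0 2800000000 > "$path/blkio.throttle.write_bps_device"
    throttle_line /dev/md0 400000     > "$path/blkio.throttle.write_iops_device"
  done
}

# Started from the launch configuration, e.g.: watch_task_cgroups &
```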
Environment Details
Supporting Log Snippets
TBD