Cannot launch tasks on Ubuntu 22.04 #3227

Closed
sunds opened this issue May 27, 2022 · 9 comments

@sunds

sunds commented May 27, 2022

Summary

OS: Ubuntu 22.04 (LTS)
ECS agent version="1.61.1" commit="8dc9fdeb"

Containers will not start.

Description

err=cgroupv2 create: unable to create v2 manager: dial unix /run/systemd/private: connect: no such file or directory

The problem is that the ECS agent runs in a Docker container and /run/systemd/private is not mounted into that container. Editing the container config to add that bind mount worked around the problem.

Expected Behavior

Container runs

Observed Behavior

Launch fails due to missing bind mount

Environment Details

curl http://localhost:51678/v1/metadata
{"Cluster":"dsunds-test-1","ContainerInstanceArn":"arn:aws:ecs:us-east-1:585275055393:container-instance/dsunds-test-1/17da2f096e234930a8ea495d5cb6b575","Version":"Amazon ECS Agent - v1.61.1 (8dc9fde)"}

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04 LTS
Release: 22.04
Codename: jammy

Deployed onto bare metal server

Supporting Log Snippets

Error from ECS agent log:
cgroup: unable to create cgroup taskARN=arn:aws:ecs:us-east-1:585275055393:task/dsunds-test-1/383621ce97f643749b2c06061d345884 cgroupPath=ecstasks-383621ce97f643749b2c06061d345884.slice cgroupV2=true err=cgroupv2 create: unable to create v2 manager: dial unix /run/systemd/private: connect: no such file or directory"

The relevant part is that last error. Digging into the source, the agent is trying to make a connection to the private DBUS socket.

@sunds
Author

sunds commented May 27, 2022

It is worth noting that direct access to /run/systemd/private happens only if the dbus daemon cannot be contacted:

// NewWithContext establishes a connection to any available bus and authenticates.
// Callers should call Close() when done with the connection.
func NewWithContext(ctx context.Context) (*Conn, error) {
conn, err := NewSystemConnectionContext(ctx)
if err != nil && os.Geteuid() == 0 {
return NewSystemdConnectionContext(ctx)
}
return conn, err
}


@sunds
Author

sunds commented May 27, 2022

The problem was apparmor on this system blocking the call to DBUS.

apparmor_status
apparmor module is loaded.
38 profiles are loaded.
37 profiles are in enforce mode.
...
docker-default

Log:

May 27 03:13:12 garage kernel: [15540.770327] audit: type=1107 audit(1653621192.007:94): pid=759 uid=103 auid=4294967295 ses=4294967295 subj=? msg='apparmor="DENIED" operation="dbus_method_call" bus="system" path="/org/freedesktop/DBus" interface="org.freedesktop.DBus" member="Hello" mask="send" name="org.freedesktop.DBus" pid=5440 label="docker-default" peer_label="unconfined"
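
To spot this kind of denial in a larger log, the quoted key="value" pairs in the audit message can be pulled out with a small parser. The following is a minimal sketch; the function names parseAuditFields and isDbusDenial are hypothetical, and the sample line in main is an abbreviated copy of the log entry above. Unquoted fields such as pid=5440 are deliberately skipped.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// kvPattern matches key="value" pairs as they appear in the audit message.
var kvPattern = regexp.MustCompile(`(\w+)="([^"]*)"`)

// parseAuditFields returns the quoted key/value pairs found in one log line.
func parseAuditFields(line string) map[string]string {
	fields := make(map[string]string)
	for _, m := range kvPattern.FindAllStringSubmatch(line, -1) {
		fields[m[1]] = m[2]
	}
	return fields
}

// isDbusDenial reports whether the fields describe an AppArmor denial of a
// D-Bus operation.
func isDbusDenial(f map[string]string) bool {
	return f["apparmor"] == "DENIED" && strings.HasPrefix(f["operation"], "dbus_")
}

func main() {
	// Abbreviated version of the audit line from this thread.
	line := `audit: type=1107 msg='apparmor="DENIED" operation="dbus_method_call" bus="system" member="Hello" label="docker-default" peer_label="unconfined"`
	f := parseAuditFields(line)
	if isDbusDenial(f) {
		fmt.Printf("profile %q was denied D-Bus member %q on the %s bus\n",
			f["label"], f["member"], f["bus"])
	}
}
```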

Adding --security-opt apparmor:unconfined to the docker run command resolved this issue. However, this is not the default when the agent is installed from https://amazon-ecs-agent.s3.amazonaws.com/ecs-anywhere-install-latest.sh

Perhaps this issue should be moved to https://github.com/aws/amazon-ecs-init ?

Working command:

docker run \
  --name "/ecs-agent" \
  --runtime "runc" \
  --volume "/var/run:/var/run" \
  --volume "/var/log/ecs:/log" \
  --volume "/var/lib/ecs/data:/data" \
  --volume "/etc/ecs:/etc/ecs" \
  --volume "/var/cache/ecs:/var/cache/ecs" \
  --volume "/sys/fs/cgroup:/sys/fs/cgroup" \
  --volume "/var/lib/ecs:/var/lib/ecs" \
  --volume "/var/log/ecs/exec:/log/exec" \
  --volume "/etc/ssl:/etc/ssl:ro" \
  --volume "/root/.aws:/rotatingcreds:ro" \
  --volume "/run/docker/plugins:/run/docker/plugins:ro" \
  --volume "/etc/docker/plugins:/etc/docker/plugins:ro" \
  --volume "/usr/lib/docker/plugins:/usr/lib/docker/plugins:ro" \
  --volume "/var/lib/ecs/deps/execute-command/bin:/managed-agents/execute-command/bin:ro" \
  --volume "/var/lib/ecs/deps/execute-command/config:/managed-agents/execute-command/config" \
  --volume "/var/lib/ecs/deps/execute-command/certs:/managed-agents/execute-command/certs:ro" \
  --volume "/proc:/host/proc:ro" \
  --volume "/usr/lib:/usr/lib:ro" \
  --volume "/lib:/lib:ro" \
  --volume "/usr/lib64:/usr/lib64:ro" \
  --volume "/lib64:/lib64:ro" \
  --volume "/sbin:/host/sbin:ro" \
  --volume "/etc/alternatives:/etc/alternatives:ro" \
  --volume "/usr/sbin:/usr/sbin:ro" \
  --log-driver "json-file" \
  --log-opt max-file="4" \
  --log-opt max-size="16m" \
  --restart "" \
  --network "host" \
  --hostname "garage" \
  --expose "51678/tcp" \
  --expose "51679/tcp" \
  --env "ECS_DATADIR=/data" \
  --env "ECS_ENABLE_TASK_IAM_ROLE=true" \
  --env "ECS_UPDATE_DOWNLOAD_DIR=/var/cache/ecs" \
  --env "ECS_EXTERNAL=true" \
  --env "ECS_CLUSTER=dsunds-test-1" \
  --env "ECS_LOGFILE=/log/ecs-agent.log" \
  --env "ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true" \
  --env "ECS_VOLUME_PLUGIN_CAPABILITIES=[\"efsAuth\"]" \
  --env "ECS_UPDATES_ENABLED=true" \
  --env "ECS_AVAILABLE_LOGGING_DRIVERS=[\"json-file\",\"syslog\",\"awslogs\",\"fluentd\",\"none\"]" \
  --env "ECS_AGENT_LABELS=" \
  --env "ECS_AGENT_CONFIG_FILE_PATH=/etc/ecs/ecs.config.json" \
  --env "SSL_CERT_DIR=/etc/ssl/certs" \
  --env "ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true" \
  --env "AWS_DEFAULT_REGION=us-east-1" \
  --env "ECS_ENABLE_TASK_ENI=false" \
  --env "PATH=/host/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
  --detach \
  --entrypoint "/agent" \
  --security-opt apparmor:unconfined \
  "amazon/amazon-ecs-agent:latest" 

@Realmonia
Contributor

Thanks for reporting! Currently Ubuntu 22 is not an officially supported platform (ref). This is tracked internally, and we will post updates.

Realmonia added the kind/tracking (This issue is being tracked internally) label Aug 1, 2022
@shanet

shanet commented Oct 12, 2022

I ran into the same issue and fixed it by adding a custom apparmor profile that allows access to dbus as such:

#include <tunables/global>

profile docker-ecs-agent flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>
  network,
  capability,
  file,
  umount,

  # Host (privileged) processes may send signals to container processes.
  signal (receive) peer=unconfined,
  # dockerd may send signals to container processes (for "docker kill").
  signal (receive) peer=unconfined,
  # Container processes may send signals amongst themselves.
  signal (send,receive) peer=docker-ecs-agent,

  deny @{PROC}/* w,   # deny write for all files directly in /proc (not in a subdir)
  # deny write to files not in /proc/<number>/** or /proc/sys/**
  deny @{PROC}/{[^1-9],[^1-9][^0-9],[^1-9s][^0-9y][^0-9s],[^1-9][^0-9][^0-9][^0-9]*}/** w,
  deny @{PROC}/sys/[^k]** w,  # deny /proc/sys except /proc/sys/k* (effectively /proc/sys/kernel)
  deny @{PROC}/sys/kernel/{?,??,[^s][^h][^m]**} w,  # deny everything except shm* in /proc/sys/kernel/
  deny @{PROC}/sysrq-trigger rwklx,
  deny @{PROC}/kcore rwklx,
  deny mount,
  deny /sys/[^f]*/** wklx,
  deny /sys/f[^s]*/** wklx,
  deny /sys/fs/[^c]*/** wklx,
  deny /sys/fs/c[^g]*/** wklx,
  deny /sys/fs/cg[^r]*/** wklx,
  deny /sys/firmware/** rwklx,
  deny /sys/kernel/security/** rwklx,

  # suppress ptrace denials when using 'docker ps' or using 'ps' inside a container
  ptrace (trace,read,tracedby,readby) peer=docker-ecs-agent,

  # suppress ptrace denials when agent and process-agent are accessing /proc
  ptrace (read),

  # The ECS Agent needs access to dbus in order to launch tasks
  dbus (send, receive, bind),
}

I then ran systemctl reload apparmor to pick up the new profile, and finally ran the ECS agent with --security-opt apparmor=docker-ecs-agent to use it.

@stuart-warren

stuart-warren commented Oct 27, 2022

After talking to Canonical support about this, and to get everything straight in my head, I believe the issue is:

Ubuntu 22 now uses cgroup v2, which is a change, so

m, err := cgroupsv2.NewSystemd(parentCgroupSlice, cgroupPath, generalSlicePID, cgroupsv2.ToResources(cgroupSpec.Specs))

calls

conn, err := systemdDbus.NewWithContext(ctx)

a function that attempts to call org.freedesktop.DBus.Hello as part of the connection process.

If that fails, it will try to use the /run/systemd/private socket directly, as mentioned above.

Ubuntu 22 allows the docker-default apparmor profile to contact dbus, but only for peer-to-peer connections; it does not allow the call to org.freedesktop.DBus.Hello.

ecs-init doesn't currently mount the /run/systemd/private socket into the ecs-agent container.

If you have the ability to tweak the apparmor profile, then the above post may work for now. We are on Ubuntu Core 22 without that ability and have already had to patch ecs-init to make it start, so we will probably have to add the extra container mount point to our local patch.

@sunds
Author

sunds commented Oct 27, 2022

Thanks for the additional detail.

I recommend you either run the agent with --security-opt apparmor:unconfined or load a new apparmor profile for Docker that allows the dbus call. Running the agent with unconfined should not increase risk as it already has broad permissions and host networking.

If you want to use a modified profile, the one posted by @shanet is good. If you want to double-check, start with the Docker default profile at https://github.com/moby/moby/tree/master/profiles/apparmor and add the extra dbus directive. You can scope it a bit more tightly:

# ECS agent requires DBUS send
dbus (send)
  bus=system,


Here is my complete profile as of several weeks ago:

#include <tunables/global>


profile docker-default flags=(attach_disconnected, mediate_deleted) {

#include <abstractions/base>


network,
capability,
file,
umount,

# Host (privileged) processes may send signals to container processes.
signal (receive) peer=unconfined,
# dockerd may send signals to container processes (for "docker kill").
signal (receive) peer=unconfined,
# Container processes may send signals amongst themselves.
signal (send,receive) peer=docker-default,

# ECS agent requires DBUS send
dbus (send)
  bus=system,

deny @{PROC}/* w,   # deny write for all files directly in /proc (not in a subdir)
# deny write to files not in /proc/<number>/** or /proc/sys/**
deny @{PROC}/{[^1-9],[^1-9][^0-9],[^1-9s][^0-9y][^0-9s],[^1-9][^0-9][^0-9][^0-9/]*}/** w,
deny @{PROC}/sys/[^k]** w,  # deny /proc/sys except /proc/sys/k* (effectively /proc/sys/kernel)
deny @{PROC}/sys/kernel/{?,??,[^s][^h][^m]**} w,  # deny everything except shm* in /proc/sys/kernel/
deny @{PROC}/sysrq-trigger rwklx,
deny @{PROC}/kcore rwklx,

deny mount,

deny /sys/[^f]*/** wklx,
deny /sys/f[^s]*/** wklx,
deny /sys/fs/[^c]*/** wklx,
deny /sys/fs/c[^g]*/** wklx,
deny /sys/fs/cg[^r]*/** wklx,
deny /sys/firmware/** rwklx,
deny /sys/kernel/security/** rwklx,


# suppress ptrace denials when using 'docker ps' or using 'ps' inside a container
ptrace (trace,read,tracedby,readby) peer=docker-default,
}

Write this file to /etc/apparmor.d/docker-default

You can install Docker and then overwrite the default profile with this command:

apparmor_parser -r /etc/apparmor.d/docker-default

If this works for your case then a modified ecs-init should not be necessary.

Alternatively if you are modifying ecs-init you can run just the agent with the modified profile or unconfined.

--security-opt apparmor=your_agent_profile or
--security-opt apparmor:unconfined
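
To confirm which profile a running process actually ended up with, the kernel exposes the AppArmor label under /proc/<pid>/attr/current, in the form "docker-default (enforce)" or "unconfined". The sketch below parses that label; the function name parseAppArmorLabel is hypothetical, and passing the agent's host PID as the first argument is an assumption about how you would use it.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseAppArmorLabel splits a label such as "docker-default (enforce)" into
// profile name and mode. "unconfined" has no mode, so mode is empty.
func parseAppArmorLabel(raw string) (profile, mode string) {
	s := strings.TrimRight(raw, "\x00\n ")
	if strings.HasSuffix(s, ")") {
		if i := strings.LastIndex(s, " ("); i >= 0 {
			return s[:i], s[i+2 : len(s)-1]
		}
	}
	return s, ""
}

func main() {
	// Default to the current process; pass a PID to inspect another one.
	path := "/proc/self/attr/current"
	if len(os.Args) > 1 {
		path = "/proc/" + os.Args[1] + "/attr/current"
	}
	raw, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("cannot read AppArmor label:", err)
		return
	}
	profile, mode := parseAppArmorLabel(string(raw))
	fmt.Printf("profile=%s mode=%s\n", profile, mode)
}
```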

@yoelvd

yoelvd commented Feb 2, 2023

Thanks to @sunds and @shanet, today I was able to run some tasks in our ECS cluster on an external on-prem Docker instance running Ubuntu 22.04. Thanks again bros, keep it going!

@chienhanlin
Contributor

Thanks very much, @sunds and @shanet, for bringing up this issue and sharing the workaround with us. I was able to reproduce the issue and use the custom AppArmor profile as a workaround.


Repro setup

  • AMI: "ubuntu-minimal/images/hvm-ssd/ubuntu-jammy-22.04-amd64-minimal-20230302"
  • Registered the Ubuntu 22.04 external instance with ECS Agent version 1.69.0
$ curl -s 127.0.0.1:51678/v1/metadata | python2 -mjson.tool
{
    "Cluster": "default",
    "ContainerInstanceArn": "xxx",
    "Version": "Amazon ECS Agent - v1.69.0 (*b32ab075)"
}
  • Created a custom AppArmor profile, shared by @sunds and @shanet
  • Reloaded the AppArmor profile
  • Stopped ECS Agent and Docker
  • Restarted Docker and ECS Agent
  • Deployed an ECS task through the ECS console, and the task reached the RUNNING state
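
The metadata check in the repro steps above can also be done programmatically. This is a minimal sketch that assumes the agent's introspection endpoint is reachable at http://localhost:51678/v1/metadata, as in the curl commands in this thread; the type agentMetadata and the function decodeMetadata are hypothetical names, and only the three fields shown in the thread's output are decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// agentMetadata holds the fields shown in the /v1/metadata output above.
type agentMetadata struct {
	Cluster              string `json:"Cluster"`
	ContainerInstanceArn string `json:"ContainerInstanceArn"`
	Version              string `json:"Version"`
}

// decodeMetadata parses the JSON body returned by the endpoint.
func decodeMetadata(data []byte) (agentMetadata, error) {
	var md agentMetadata
	err := json.Unmarshal(data, &md)
	return md, err
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get("http://localhost:51678/v1/metadata")
	if err != nil {
		fmt.Println("agent not reachable:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	md, err := decodeMetadata(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("cluster=%s version=%s\n", md.Cluster, md.Version)
}
```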

As Ubuntu 22.04 is not officially supported by ECS Anywhere, and workarounds are available, this issue will be closed. Please feel free to open new issues, and track the latest supported operating systems and system architectures via the public documentation.

Thanks.

paulswartz added a commit to paulswartz/on_prem_deploy that referenced this issue Jun 15, 2023
It needs some additional permissions to work with ECS Anywhere on Ubuntu
22.

Upstream issue: aws/amazon-ecs-agent#3227
paulswartz added a commit to mbta/on_prem_deploy that referenced this issue Jun 16, 2023
It needs some additional permissions to work with ECS Anywhere on Ubuntu
22.

Upstream issue: aws/amazon-ecs-agent#3227
@sparrc
Contributor

sparrc commented Jan 16, 2024

Hi everyone, this is now supported in agent/init version 1.80.0: https://github.com/aws/amazon-ecs-agent/releases.

Support was added via this PR: #4062

Working on updating the docs now.
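
To tell whether a given host still needs the AppArmor workaround, the version string reported by the agent can be compared against 1.80.0. The following is a minimal sketch; parseAgentVersion and atLeast are hypothetical helper names, and the input format is assumed to match the "Amazon ECS Agent - v1.61.1 (8dc9fde)" strings shown earlier in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// versionPattern extracts the vMAJOR.MINOR.PATCH part of the agent string.
var versionPattern = regexp.MustCompile(`v(\d+)\.(\d+)\.(\d+)`)

// parseAgentVersion returns the numeric version found in s.
func parseAgentVersion(s string) ([3]int, error) {
	var v [3]int
	m := versionPattern.FindStringSubmatch(s)
	if m == nil {
		return v, fmt.Errorf("no version found in %q", s)
	}
	for i := range v {
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return v, err
		}
		v[i] = n
	}
	return v, nil
}

// atLeast reports whether v is greater than or equal to floor, comparing
// major, then minor, then patch.
func atLeast(v, floor [3]int) bool {
	for i := range v {
		if v[i] != floor[i] {
			return v[i] > floor[i]
		}
	}
	return true
}

func main() {
	supported := [3]int{1, 80, 0}
	v, err := parseAgentVersion("Amazon ECS Agent - v1.61.1 (8dc9fde)")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("includes the 1.80.0 change:", atLeast(v, supported))
}
```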
