Description
The ECS agent currently restarts the Service Connect (SC) relay agent only upon failure (code reference). If the relay exits due to a signal or is stopped manually, it is not started again. This not only impacts all existing SC tasks on the instance but also causes any new SC tasks placed on the instance to fail. Recovery requires restarting the ECS agent or the instance.
We should update the agent to restart the relay in all cases except when it is stopped gracefully by the ECS agent itself. At the very least, we should check its status before placing a new SC task on the instance and restart it if it is not running.
	logger.Error("Aborting cleanup for task as it is not reported as stopped", logger.Fields{
		field.TaskID:     mtask.GetID(),
		field.SentStatus: mtask.GetSentStatus().String(),
	})
	return
}
logger.Info("Cleaning up task's containers and data", logger.Fields{
	field.TaskID: mtask.GetID(),
})
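The restart policy proposed in the description (restart the relay in all cases except a graceful stop requested by the ECS agent) could be sketched roughly as below. This is only an illustration: the `exitReason` type, its constants, and `shouldRestartRelay` are hypothetical names invented for this example and do not exist in amazon-ecs-agent.

```go
package main

import "fmt"

// exitReason is a hypothetical classification of why the relay exited.
// The agent does not define this type; it exists only for illustration.
type exitReason int

const (
	exitFailure         exitReason = iota // relay exited with an error
	exitSignalled                         // relay was terminated by a signal
	exitStoppedManually                   // relay was stopped by an operator
	exitStoppedByAgent                    // graceful stop requested by the ECS agent
)

// shouldRestartRelay implements the proposed policy: restart the relay
// after every exit except a graceful stop initiated by the agent itself.
func shouldRestartRelay(reason exitReason) bool {
	return reason != exitStoppedByAgent
}

func main() {
	cases := []struct {
		name   string
		reason exitReason
	}{
		{"failure", exitFailure},
		{"signalled", exitSignalled},
		{"stopped manually", exitStoppedManually},
		{"stopped by agent", exitStoppedByAgent},
	}
	for _, c := range cases {
		fmt.Printf("%s: restart=%t\n", c.name, shouldRestartRelay(c.reason))
	}
}
```

Under this policy, the signalled exit shown in the relay logs would trigger a restart, whereas today only the failure case does.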
Expected Behavior
ECS agent should ensure the relay is running before placing any SC task on the instance.
Observed Behavior
ECS agent only bootstraps the relay before placing the first SC task on the instance and doesn't restart it when it exits without a failure (for example, due to a signal or a manual stop).
Environment Details
Supporting Log Snippets
Relay Agent logs
[2024-11-26 06:45:07.116][18][info][main] [source/server/server.cc:932] all clusters initialized. initializing init manager
[2024-11-26 06:45:07.116][18][info][config] [source/common/listener_manager/listener_manager_impl.cc:926] all dependencies initialized. starting workers
[2024-12-03 05:09:50.367][7][warning] [AppNet Agent] [Envoy process 18] Exited with code [-1]
[2024-12-03 05:09:50.368][7][warning] [AppNet Agent] [Envoy process 18] Additional Exit data: [Core Dump: false][Normal Exit: false][Process Signalled: true]
ECS Agent logs
level=warn time=2024-12-03T17:20:21Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1102 MaxAttempts=8640
level=warn time=2024-12-03T17:20:51Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1103 MaxAttempts=8640
level=warn time=2024-12-03T17:21:21Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1104 MaxAttempts=8640
level=warn time=2024-12-03T17:21:51Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1105 MaxAttempts=8640
level=warn time=2024-12-03T17:22:21Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1106 MaxAttempts=8640
level=warn time=2024-12-03T17:22:51Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1107 MaxAttempts=8640
level=warn time=2024-12-03T17:23:21Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1108 MaxAttempts=8640
Code References
amazon-ecs-agent/agent/engine/serviceconnect/manager_linux.go, line 350 in 41d593c
amazon-ecs-agent/agent/engine/task_manager.go, lines 1558 to 1570 in a080504