Always Restart Service Connect Relay agent upon exit #4459

Open
karanvasnani opened this issue Dec 20, 2024 · 0 comments

Description

Today the ECS agent restarts the Service Connect (SC) relay agent only when it fails (code reference). If the relay exits because of a signal or is stopped manually, it is not started again. This not only impacts all existing SC tasks on the instance but also causes any new SC tasks placed on the instance to fail. Recovery requires restarting the ECS agent or the instance.
We should update the agent to restart the relay in all cases except when it is stopped gracefully by the ECS agent itself. At the very least, we should check its status before placing a new SC task on the instance and restart it if it is not running.
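
To make the proposed policy concrete, here is a minimal sketch in Go of the restart decision described above. It is not taken from the ECS agent code base: `ExitInfo`, its fields, and `shouldRestartRelay` are hypothetical names introduced only for illustration, and the sketch assumes the agent can tell whether it requested the stop itself.

```go
package main

import "fmt"

// ExitInfo is a hypothetical summary of how the relay exited.
// None of these names come from the ECS agent; they are assumptions.
type ExitInfo struct {
	ExitCode       int  // exit code reported for the relay
	Signalled      bool // true if the process was terminated by a signal
	StoppedByAgent bool // true only if the ECS agent itself requested a graceful stop
}

// shouldRestartRelay implements the proposed policy: restart the relay in
// every case except a graceful stop initiated by the ECS agent.
func shouldRestartRelay(e ExitInfo) bool {
	return !e.StoppedByAgent
}

func main() {
	cases := []struct {
		name string
		info ExitInfo
	}{
		{"exited on signal", ExitInfo{ExitCode: -1, Signalled: true}},
		{"stopped manually with exit code 0", ExitInfo{ExitCode: 0}},
		{"stopped gracefully by the agent", ExitInfo{ExitCode: 0, StoppedByAgent: true}},
	}
	for _, c := range cases {
		fmt.Printf("%s: restart=%v\n", c.name, shouldRestartRelay(c.info))
	}
}
```

The point of the sketch is that the exit code and signal state do not drive the decision; only the question of who requested the stop does.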

Expected Behavior

ECS agent should ensure the relay is running before placing any SC task on the instance.

Observed Behavior

ECS agent only bootstraps the relay before placing the "first" SC task on the instance and doesn't restart it when it exits without a failure.

Environment Details

Supporting Log Snippets

Relay Agent logs

[2024-11-26 06:45:07.116][18][info][main] [source/server/server.cc:932] all clusters initialized. initializing init manager
[2024-11-26 06:45:07.116][18][info][config] [source/common/listener_manager/listener_manager_impl.cc:926] all dependencies initialized. starting workers
[2024-12-03 05:09:50.367][7][warning] [AppNet Agent] [Envoy process 18] Exited with code [-1]
[2024-12-03 05:09:50.368][7][warning] [AppNet Agent] [Envoy process 18] Additional Exit data: [Core Dump: false][Normal Exit: false][Process Signalled: true]

ECS Agent logs

level=warn time=2024-12-03T17:20:21Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1102 MaxAttempts=8640
level=warn time=2024-12-03T17:20:51Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1103 MaxAttempts=8640
level=warn time=2024-12-03T17:21:21Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1104 MaxAttempts=8640
level=warn time=2024-12-03T17:21:51Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1105 MaxAttempts=8640
level=warn time=2024-12-03T17:22:21Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1106 MaxAttempts=8640
level=warn time=2024-12-03T17:22:51Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1107 MaxAttempts=8640
level=warn time=2024-12-03T17:23:21Z msg="Blocking cleanup until the task has been reported stopped" task="service-connect-relay-f90c037a-abc1-11ef-a173-064ee580d269" sentStatus="NONE" Attempt=1108 MaxAttempts=8640