Design a new heartbeat system for PANIC Alerter #277

Open
3 tasks
dillu24 opened this issue Jun 9, 2022 · 0 comments

Rationale

PANIC has a heartbeat mechanism built on RabbitMQ. The Telegram Commands Handler (TCH) and the Slack Commands Handler (SCH) use it to report the live status of the tool. If a component has not sent a heartbeat within X seconds, the TCH and the SCH declare that component down, so a user who types the /status command is told that the component is down.
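
The timeout rule described above can be sketched as follows. This is a minimal illustration, not PANIC's actual code: the function name `component_status`, the shape of `last_heartbeats`, and the 30-second default standing in for "X" are all assumptions.

```python
import time
from typing import Dict, Optional

# Hypothetical stand-in for the "X seconds" mentioned in the issue.
HEARTBEAT_TIMEOUT_SECONDS = 30.0


def component_status(
    last_heartbeats: Dict[str, float],
    now: Optional[float] = None,
    timeout: float = HEARTBEAT_TIMEOUT_SECONDS,
) -> Dict[str, str]:
    """Map each component to "up" or "down" based on heartbeat age.

    last_heartbeats maps a component name to the UNIX timestamp of the
    most recent heartbeat received from it (assumed structure).
    """
    current = time.time() if now is None else now
    return {
        name: "up" if current - received_at <= timeout else "down"
        for name, received_at in last_heartbeats.items()
    }
```

A /status handler could call `component_status` with the timestamps it has stored and report every component marked "down".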

The heartbeat mechanism works as follows:

  • The health-checker sends a ping request to the PING rabbit exchange
  • Every manager component subscribes to the PING rabbit exchange so that whenever a ping is received, they could respond with a heartbeat
  • Upon a ping, the manager checks whether each child process is running and sends a heartbeat listing the processes that are running and those that are not
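
The manager's side of the steps above can be sketched as below, with the RabbitMQ plumbing left out. The function name `build_heartbeat` and the payload field names are hypothetical, chosen only for illustration; each child is represented by a zero-argument callable such as the bound `is_alive` method of a `multiprocessing.Process`.

```python
import time
from typing import Callable, Dict, List, Optional


def build_heartbeat(
    component_name: str,
    children: Dict[str, Callable[[], bool]],
    now: Optional[float] = None,
) -> dict:
    """Build the heartbeat a manager would send in reply to a ping.

    children maps a child process name to a callable that returns True
    while that process is alive (for example Process.is_alive).
    """
    running: List[str] = []
    dead: List[str] = []
    for name, is_alive in children.items():
        (running if is_alive() else dead).append(name)
    # Field names below are assumptions, not PANIC's actual schema.
    return {
        "component_name": component_name,
        "running_processes": running,
        "dead_processes": dead,
        "timestamp": time.time() if now is None else now,
    }
```

Note that the payload only reflects whether each child process exists, not whether it is making progress.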

It follows that the current heartbeat mechanism does not truly check whether a component is executing: a process may be running and yet be stuck or struggling. Note that the mechanism was designed this way because there was no other way to unblock a process that is waiting to consume data on a RabbitMQ blocking channel. Similarly, a monitor executes every X seconds, so while it is sleeping it could not send any heartbeats.
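
The limitation can be reproduced in a few lines. The sketch below uses a thread as a stand-in for a child process (an assumption made to keep the example small); the helper name `start_stuck_worker` is hypothetical. The worker blocks forever, yet a liveness check still reports it as running, which is all the current heartbeat can observe.

```python
import threading
from typing import Tuple


def start_stuck_worker() -> Tuple[threading.Thread, threading.Event]:
    """Start a worker that blocks until the returned event is set.

    The worker stands in for a component waiting on a blocking channel:
    it is alive, but it does no work while it waits.
    """
    release = threading.Event()
    worker = threading.Thread(target=release.wait, daemon=True)
    worker.start()
    return worker, release


if __name__ == "__main__":
    worker, release = start_stuck_worker()
    # A liveness-only check cannot tell a blocked worker from a healthy one.
    print("alive while blocked:", worker.is_alive())
    release.set()
    worker.join(timeout=1.0)
```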

Therefore, the aim of this ticket is to re-think this design and possibly come up with a better one.

Notes:

For ticket closure

Come up with a heartbeat mechanism design and do the following:

  • Present the mechanism to the team
  • Document this design on Confluence
  • Create tickets that implement this design