feat: health status implementation #1406
partial review
awesome job on the well-commented code :)
func (hc *HealthChecker) StartHealthCheck(ctx context.Context) {
	// Goroutine to listen for ticks
	// At every tick, check and update the health status of the pipeline.
	go func(ctx context.Context) {
Shouldn't the function StartHealthCheck be blocking? The daemon server already uses a goroutine to start it.
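For illustration, a blocking variant could look roughly like the sketch below. It is not the code from this PR: the unexported `healthChecker` type, the `check` probe, `refresh`, and `getStatus` are hypothetical names, and the string status is a placeholder. The loop runs in the caller's goroutine and returns only when the context is cancelled.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// healthChecker is a hypothetical, unexported stand-in for the PR's HealthChecker.
type healthChecker struct {
	mu       sync.RWMutex
	status   string
	interval time.Duration // must be > 0, otherwise time.NewTicker panics
	check    func() string // assumed probe that returns the current status
}

func newHealthChecker(interval time.Duration, check func() string) *healthChecker {
	return &healthChecker{status: "unknown", interval: interval, check: check}
}

// refresh runs the probe once and stores its result.
func (hc *healthChecker) refresh() {
	s := hc.check()
	hc.mu.Lock()
	hc.status = s
	hc.mu.Unlock()
}

// getStatus returns the most recently stored status.
func (hc *healthChecker) getStatus() string {
	hc.mu.RLock()
	defer hc.mu.RUnlock()
	return hc.status
}

// startHealthCheck blocks until ctx is cancelled and refreshes the
// status on every tick. It does not spawn a goroutine of its own.
func (hc *healthChecker) startHealthCheck(ctx context.Context) {
	ticker := time.NewTicker(hc.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			hc.refresh()
		case <-ctx.Done():
			return
		}
	}
}
```

With this shape the daemon server decides about concurrency, for example by calling `go hc.startHealthCheck(ctx)`, and cancelling `ctx` stops the ticker cleanly.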
Updated. Please check whether it looks fine now.
I see a lot of methods and types exposed in healthStatus.go. Avoid exporting types and methods if they are not used in other packages.
Signed-off-by: Sidhant Kohli <[email protected]>
@kohlisid - resolve the conflicts?
Health status definitions
Health status is divided into two parts: Resource Health and Data Criticality.
Resource Health can be "healthy (0) | unhealthy (1) | paused (3) | unknown (4)".
Resource Health purely indicates whether the pipeline is up and running.
Resource health is the max(health) over each vertex's health.
Resource health checks whether all the pods for the pipeline are in the running state, and also covers paused and unknown pipelines.
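The max(health) aggregation can be sketched as follows. This is a minimal illustration rather than the PR's implementation: the `resourceHealth` type, the `pipelineHealth` function, and the choice to report an empty vertex set as unknown are assumptions. The numeric codes are the ones listed above.

```go
package main

// resourceHealth uses the numeric codes from the description above.
type resourceHealth int

const (
	healthy   resourceHealth = 0
	unhealthy resourceHealth = 1
	paused    resourceHealth = 3
	unknown   resourceHealth = 4
)

// String returns the human-readable name of a health code.
func (h resourceHealth) String() string {
	switch h {
	case healthy:
		return "healthy"
	case unhealthy:
		return "unhealthy"
	case paused:
		return "paused"
	default:
		return "unknown"
	}
}

// pipelineHealth returns the highest health code among the vertices.
// Assumption: a pipeline with no reported vertices is treated as unknown.
func pipelineHealth(vertices map[string]resourceHealth) resourceHealth {
	if len(vertices) == 0 {
		return unknown
	}
	worst := healthy
	for _, h := range vertices {
		if h > worst {
			worst = h
		}
	}
	return worst
}
```

Because the codes are ordered, a single unhealthy vertex is enough to mark the whole pipeline unhealthy, and paused or unknown take precedence over both.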
Data Criticality, on the other hand, shows whether the pipeline is working as expected.
It reflects pending messages, lag, etc.
Data Criticality can be "ok (0) | warning (1) | critical (2)".
A backlogged pipeline can be healthy even though it has an increasing back-pressure.
For data criticality, timeline data is populated for the pipeline, and if the average usage lies above the thresholds, the corresponding state is assigned. For critical states we have the option to do a lookback and only mark the pipeline as critical if we see a predefined number of critical states within the lookback window.
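A rough sketch of the threshold and lookback logic described above, under stated assumptions: the `thresholds` struct, `classify`, `withLookback`, and the sample values are all hypothetical, and downgrading an unconfirmed critical state to warning is a choice made here for illustration, not something the PR specifies.

```go
package main

// dataCriticality uses the numeric codes from the description above.
type dataCriticality int

const (
	ok       dataCriticality = 0
	warning  dataCriticality = 1
	critical dataCriticality = 2
)

// thresholds holds hypothetical usage limits (for example, percent of buffer used).
type thresholds struct {
	warn float64
	crit float64
}

// classify averages the timeline samples and maps the average to a state.
// Assumption: an empty timeline is reported as ok.
func classify(samples []float64, t thresholds) dataCriticality {
	if len(samples) == 0 {
		return ok
	}
	sum := 0.0
	for _, s := range samples {
		sum += s
	}
	avg := sum / float64(len(samples))
	switch {
	case avg > t.crit:
		return critical
	case avg > t.warn:
		return warning
	default:
		return ok
	}
}

// withLookback confirms a critical state only if at least minCritical of the
// last `window` states (including the current one) are critical.
// Assumption: an unconfirmed critical state is reported as warning.
func withLookback(history []dataCriticality, window, minCritical int) dataCriticality {
	if len(history) == 0 {
		return ok
	}
	current := history[len(history)-1]
	if current != critical {
		return current
	}
	if window < 1 {
		window = 1
	}
	start := len(history) - window
	if start < 0 {
		start = 0
	}
	count := 0
	for _, s := range history[start:] {
		if s == critical {
			count++
		}
	}
	if count >= minCritical {
		return critical
	}
	return warning
}
```

The lookback keeps a single spike from flipping the pipeline to critical, at the cost of reacting a few evaluation periods later.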