feat: add health for monovertex #1954
Conversation
Codecov Report

Attention: Patch coverage is …

```diff
@@            Coverage Diff             @@
##             main    #1954      +/-   ##
==========================================
- Coverage   57.77%   57.66%   -0.12%
==========================================
  Files         412      414       +2
  Lines       28789    28966     +177
==========================================
+ Hits        16633    16702      +69
- Misses      11214    11326     +112
+ Partials      942      938       -4
```
```proto
message GetMonoVertexStatusRequest {
  string monovertex = 1;
}
```
Using `google.protobuf.Empty` for the request is good enough.
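A minimal sketch of that suggestion. The path shown is illustrative, assuming the daemon serves a single MonoVertex so the name doesn't need to travel in the request; note that grpc-gateway can only bind a path parameter to a field of the request message, so an `Empty` request also implies dropping any `{monovertex}` path parameter:

```proto
import "google/protobuf/empty.proto";
import "google/api/annotations.proto";

// Each daemon serves exactly one MonoVertex, so the request carries
// no information and google.protobuf.Empty is sufficient.
rpc GetMonoVertexStatus (google.protobuf.Empty) returns (GetMonoVertexStatusResponse) {
  option (google.api.http).get = "/api/v1/status"; // illustrative path
};
```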
Sure! For now I had copied over the pipeline template and was just adding the implementation. I checked, and I guess we wouldn't need it for the pipeline API either. I'll update both.
```proto
// GetPipelineWatermarks returns the watermark of the given pipeline
rpc GetPipelineWatermarks (GetPipelineWatermarksRequest) returns (GetPipelineWatermarksResponse) {
  option (google.api.http).get = "/api/v1/pipelines/{pipeline}/watermarks";
};

rpc GetPipelineStatus (GetPipelineStatusRequest) returns (GetPipelineStatusResponse) {
  option (google.api.http).get = "/api/v1/pipelines/{pipeline}/status";
};
```
Actually, we won't need the request for these two pipeline-level daemon APIs either.
This is right.
@whynowy Sorry, just to clarify: do you mean you're good with the current implementation, or good with changing them to empty? We don't use the request, so I believe we can update it to empty.
Let's use empty for MonoVertex. For Pipeline, we'd need several iterations to correct it, so let's leave it for now.
Sure, will create an issue to track that and refactor it later!
```go
// 3. Critical: The MonoVertex is not working as expected
// We need to check the following things to determine the data criticality of the MonoVertex:
// At any given instant of time what is the desired number of replicas required by the MonoVertex
// to clear out the backlog in the target state time.
```
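A sketch of the backlog-clearing check described in the comment above; the formula is an assumption inferred from the comment's wording, not lifted from the PR:

$$
\text{desiredReplicas} = \left\lceil \frac{\text{pendingMessages}}{\text{ratePerReplica} \times \text{targetTime}} \right\rceil
$$

For example, with 12,000 pending messages, a processing rate of 50 msg/s per replica, and a 60 s target, this gives ⌈12000 / 3000⌉ = 4 replicas; a MonoVertex whose current replica count sits far below that desired count would presumably be flagged critical on the data-health axis.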
What about the case where pods are continuously being restarted? Or what if the TPS is low and pending is growing very slowly?
It looks like cases such as pod restarts will be caught by the resource health check.
fixes #1952

For data health, this adds the critical status; the warning status will be added in a follow-up.
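For reference, a hypothetical shape for the status payload this adds; the field names are illustrative assumptions, not taken from the PR:

```proto
// Hypothetical response carrying the health verdict; per the description,
// the critical status lands in this PR and warning comes in a follow-up.
message MonoVertexStatus {
  string status = 1;  // e.g. "healthy" or "critical"
  string message = 2; // human-readable explanation
}

message GetMonoVertexStatusResponse {
  MonoVertexStatus status = 1;
}
```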