feat: add health for monovertex #1954

Merged
merged 16 commits into from
Aug 19, 2024

kohlisid
Contributor

@kohlisid kohlisid commented Aug 16, 2024

fixes #1952
For data health, this adds the critical status; the warning status will be added in a follow-up.

Signed-off-by: Sidhant Kohli <sidhant.kohli@gmail.com>

codecov bot commented Aug 16, 2024

Codecov Report

Attention: Patch coverage is 28.12500% with 138 lines in your changes missing coverage. Please review.

Project coverage is 57.66%. Comparing base (a52102d) to head (b059b90).
Report is 7 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| pkg/mvtxdaemon/server/service/mvtx_service.go | 0.00% | 37 Missing ⚠️ |
| server/apis/v1/health.go | 0.00% | 36 Missing ⚠️ |
| pkg/mvtxdaemon/server/service/health_status.go | 65.33% | 24 Missing and 2 partials ⚠️ |
| server/apis/v1/handler.go | 0.00% | 18 Missing ⚠️ |
| pkg/mvtxdaemon/client/restful_client.go | 0.00% | 9 Missing ⚠️ |
| pkg/mvtxdaemon/server/daemon_server.go | 0.00% | 6 Missing ⚠️ |
| pkg/mvtxdaemon/client/grpc_client.go | 0.00% | 5 Missing ⚠️ |
| server/routes/routes.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1954      +/-   ##
==========================================
- Coverage   57.77%   57.66%   -0.12%     
==========================================
  Files         412      414       +2     
  Lines       28789    28966     +177     
==========================================
+ Hits        16633    16702      +69     
- Misses      11214    11326     +112     
+ Partials      942      938       -4     


Signed-off-by: Sidhant Kohli <sidhant.kohli@gmail.com>
}

message GetMonoVertexStatusRequest {
  string monovertex = 1;
Member

Using google.protobuf.Empty for the request is good enough.

Contributor Author

@kohlisid kohlisid Aug 17, 2024

Sure! For now I had copied over the pipeline template and was just adding the implementation.
I checked, and I guess we wouldn't need it in the pipeline API either.
I'll update both.

Contributor Author

@kohlisid kohlisid Aug 17, 2024

  // GetPipelineWatermarks return the watermark of the given pipeline
  rpc GetPipelineWatermarks (GetPipelineWatermarksRequest) returns (GetPipelineWatermarksResponse) {
    option (google.api.http).get = "/api/v1/pipelines/{pipeline}/watermarks";
  };

  rpc GetPipelineStatus (GetPipelineStatusRequest) returns (GetPipelineStatusResponse) {
    option (google.api.http).get = "/api/v1/pipelines/{pipeline}/status";
  };

Actually, we won't need the request message for these two pipeline-level daemon APIs either.

Member

This is right.

Contributor Author

@whynowy
Sorry, just to clarify: do you mean that you are good with the current implementation, or are you good with changing them to empty?
We don't use the request, so I believe we can update to empty.

Member

Let's use empty for MonoVertex. For Pipeline, correcting it would take several iterations, so let's leave it for now.

Contributor Author

Sure, I will create an issue to track that and refactor later!

Signed-off-by: Sidhant Kohli <sidhant.kohli@gmail.com>
@kohlisid
Contributor Author

https://localhost:8443/api/v1/namespaces/default/mono-vertices/simple-mono-vertex/health
{
    "data": {
        "resourceHealthStatus": "healthy",
        "dataHealthStatus": "healthy",
        "resourceHealthMessage": "mono vertex \"simple-mono-vertex\" is healthy",
        "dataHealthMessage": "MonoVertex data flow is healthy",
        "resourceHealthCode": "M1",
        "dataHealthCode": "D1"
    },
    "errMsg": null
}
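
For readers who want to consume this endpoint, a minimal Go sketch of decoding the response follows. Only the JSON keys and values are taken from the sample above; the type names, `ParseHealthResponse`, and `IsHealthy` are hypothetical, and treating "healthy" on both fields as the overall healthy condition is an assumption for illustration, not something confirmed by this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HealthData mirrors the "data" object in the sample response.
// The Go names are hypothetical; only the JSON keys come from the sample.
type HealthData struct {
	ResourceHealthStatus  string `json:"resourceHealthStatus"`
	DataHealthStatus      string `json:"dataHealthStatus"`
	ResourceHealthMessage string `json:"resourceHealthMessage"`
	DataHealthMessage     string `json:"dataHealthMessage"`
	ResourceHealthCode    string `json:"resourceHealthCode"`
	DataHealthCode        string `json:"dataHealthCode"`
}

// HealthResponse is the outer envelope; errMsg is null on success,
// so it is modeled as a pointer.
type HealthResponse struct {
	Data   HealthData `json:"data"`
	ErrMsg *string    `json:"errMsg"`
}

// ParseHealthResponse decodes a raw response body.
func ParseHealthResponse(body []byte) (HealthResponse, error) {
	var resp HealthResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return HealthResponse{}, fmt.Errorf("decode health response: %w", err)
	}
	return resp, nil
}

// IsHealthy reports true when there is no error message and both the
// resource and data statuses are "healthy" (an assumed convention).
func (r HealthResponse) IsHealthy() bool {
	return r.ErrMsg == nil &&
		r.Data.ResourceHealthStatus == "healthy" &&
		r.Data.DataHealthStatus == "healthy"
}

// sample is the response body shown in the comment above.
const sample = `{
    "data": {
        "resourceHealthStatus": "healthy",
        "dataHealthStatus": "healthy",
        "resourceHealthMessage": "mono vertex \"simple-mono-vertex\" is healthy",
        "dataHealthMessage": "MonoVertex data flow is healthy",
        "resourceHealthCode": "M1",
        "dataHealthCode": "D1"
    },
    "errMsg": null
}`

func main() {
	resp, err := ParseHealthResponse([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.IsHealthy(), resp.Data.ResourceHealthCode, resp.Data.DataHealthCode)
}
```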

@kohlisid
Contributor Author

https://localhost:4327/api/v1/status
{
  "status": {
    "status": "healthy",
    "message": "MonoVertex data flow is healthy",
    "code": "D1"
  }
}
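
The daemon's status, message, and code in this sample match the `dataHealthStatus`, `dataHealthMessage`, and `dataHealthCode` values in the API server sample. Assuming the API server simply forwards these three fields (an inference from the two samples, not verified against the code), the mapping could be sketched as below. `DaemonStatus`, `DataHealth`, and `ToDataHealth` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DaemonStatus mirrors the inner "status" object of the daemon sample.
type DaemonStatus struct {
	Status  string `json:"status"`
	Message string `json:"message"`
	Code    string `json:"code"`
}

// DaemonStatusResponse is the outer envelope returned by /api/v1/status.
type DaemonStatusResponse struct {
	Status DaemonStatus `json:"status"`
}

// DataHealth holds the data-health fields as they are named in the
// API server sample response.
type DataHealth struct {
	DataHealthStatus  string `json:"dataHealthStatus"`
	DataHealthMessage string `json:"dataHealthMessage"`
	DataHealthCode    string `json:"dataHealthCode"`
}

// ToDataHealth decodes a daemon status body and copies its three fields
// into the data-health shape. The one-to-one copy is an assumption.
func ToDataHealth(body []byte) (DataHealth, error) {
	var resp DaemonStatusResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return DataHealth{}, fmt.Errorf("decode daemon status: %w", err)
	}
	return DataHealth{
		DataHealthStatus:  resp.Status.Status,
		DataHealthMessage: resp.Status.Message,
		DataHealthCode:    resp.Status.Code,
	}, nil
}

func main() {
	body := []byte(`{"status":{"status":"healthy","message":"MonoVertex data flow is healthy","code":"D1"}}`)
	dh, err := ToDataHealth(body)
	if err != nil {
		panic(err)
	}
	out, err := json.Marshal(dh)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```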

Signed-off-by: Sidhant Kohli <sidhant.kohli@gmail.com>
@kohlisid kohlisid requested a review from whynowy August 17, 2024 22:28
Signed-off-by: Sidhant Kohli <sidhant.kohli@gmail.com>
@kohlisid kohlisid self-assigned this Aug 19, 2024
@vigith vigith marked this pull request as ready for review August 19, 2024 15:38
@vigith vigith self-requested a review as a code owner August 19, 2024 15:38
Comment on lines +122 to +125
// 3. Critical: The MonoVertex is not working as expected
// We need to check the following things to determine the data criticality of the MonoVertex:
// At any given instant of time what is the desired number of replicas required by the MonoVertex
// to clear out the backlog in the target state time.
Member

What about the case when pods are continuously being restarted? Or what if the TPS is low and pending is growing very slowly?

Member

It looks like cases like pod restarting will be caught by resource health.
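
The criticality rule quoted in this thread (how many replicas are needed to clear the backlog within the target time) can be illustrated with a rough Go sketch. This is not the code from the PR: the function names, the per-replica rate estimate, and the rule "critical when the desired count exceeds the maximum replicas" are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// DesiredReplicas estimates how many replicas would be needed to drain
// `pending` messages within `targetSeconds`, given the total observed
// processing rate (messages/second) across `currentReplicas`.
// It returns ok=false when the inputs do not allow an estimate.
// Hypothetical helper; the formula is an assumption, not the PR's code.
func DesiredReplicas(pending, totalRate float64, currentReplicas int, targetSeconds float64) (desired int, ok bool) {
	if totalRate <= 0 || currentReplicas <= 0 || targetSeconds <= 0 {
		return 0, false
	}
	if pending <= 0 {
		// Nothing to drain; the current replica count is sufficient.
		return currentReplicas, true
	}
	perReplicaRate := totalRate / float64(currentReplicas)
	// Messages a single replica can clear within the target time.
	perReplicaCapacity := perReplicaRate * targetSeconds
	return int(math.Ceil(pending / perReplicaCapacity)), true
}

// IsDataCritical applies the assumed rule: the data flow is critical when
// even the maximum allowed replicas cannot clear the backlog in time.
func IsDataCritical(desired, maxReplicas int) bool {
	return desired > maxReplicas
}

func main() {
	// 12000 pending, 100 msg/s across 2 replicas, 20s target:
	// each replica clears 50*20 = 1000 messages, so 12 replicas are needed.
	desired, ok := DesiredReplicas(12000, 100, 2, 20)
	if !ok {
		panic("could not estimate desired replicas")
	}
	fmt.Println(desired, IsDataCritical(desired, 10))
}
```

A check of this shape looks only at the backlog relative to throughput. As noted in the reply above, cases such as restarting pods would be expected to surface through resource health instead.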

@vigith vigith merged commit b54a4cd into main Aug 19, 2024
26 checks passed
@vigith vigith deleted the monovtx-health branch August 19, 2024 18:23
KeranYang pushed a commit that referenced this pull request Aug 19, 2024
Signed-off-by: Sidhant Kohli <sidhant.kohli@gmail.com>
SaniyaKalamkar pushed a commit to SaniyaKalamkar/numaflow that referenced this pull request Jan 19, 2025
Signed-off-by: Sidhant Kohli <sidhant.kohli@gmail.com>
Successfully merging this pull request may close these issues.

Health Status for MonoVertex