
Reflect unhealthy providers in application status #524

Merged: 8 commits merged into wasmCloud:main on Dec 23, 2024


Vikrantpalle (Contributor)

Feature or Problem

The scaler status now reports Failed if any of the scaler's providers failed their most recent health check and are therefore in a Failed state.
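A minimal Rust sketch of that aggregation rule. It assumes a scaler can see the last known health state of each provider it manages. `StatusType`, `StatusInfo`, `ProviderState`, `TrackedProvider`, and `scaler_status` are simplified, hypothetical stand-ins, not wadm's actual types or functions.

```rust
/// Hypothetical, simplified status type; wadm's real `StatusType` has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StatusType {
    Deployed,
    Failed,
}

/// Hypothetical per-provider health state as last observed by the scaler.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProviderState {
    Running,
    Failed,
}

#[derive(Debug, Clone, PartialEq)]
struct StatusInfo {
    status_type: StatusType,
    message: String,
}

/// Hypothetical view of one provider instance tracked by a scaler.
struct TrackedProvider {
    host_id: String,
    state: ProviderState,
}

/// Reports `Failed` (listing the affected hosts) if any tracked provider
/// failed its most recent health check; otherwise reports `Deployed`.
fn scaler_status(providers: &[TrackedProvider]) -> StatusInfo {
    let failed_hosts: Vec<&str> = providers
        .iter()
        .filter(|p| p.state == ProviderState::Failed)
        .map(|p| p.host_id.as_str())
        .collect();

    if failed_hosts.is_empty() {
        StatusInfo { status_type: StatusType::Deployed, message: String::new() }
    } else {
        StatusInfo {
            status_type: StatusType::Failed,
            message: format!("Unhealthy provider on host(s): {}", failed_hosts.join(", ")),
        }
    }
}
```

A single failed provider is enough to flip the whole scaler to Failed. That mirrors the "any provider" rule above.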

Related Issues

Release Information

Consumer Impact

Testing

Unit Test(s)

Added two unit tests each for the daemon and spread scalers:

  • test_unhealthy_providers_return_unhealthy_status
  • test_healthy_providers_return_healthy_status

Acceptance or Integration

Manual Verification

Verified the scaler status after a reconcile runs, and confirmed that all unit tests pass.

Vikrantpalle requested a review from a team as a code owner on December 17, 2024 09:41
Vikrantpalle (Contributor, Author) commented Dec 17, 2024

One potential edge case: the events ProviderHealthCheckFailed -> ProviderStopped might be processed in the opposite order, ProviderStopped -> ProviderHealthCheckFailed. In that case, both the provider and the scaler remain in the Failed state. One possible fix is to decay out the provider, but I'm not sure whether wadm already does this.
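Here is one way to sidestep that ordering problem, sketched in Rust as an alternative to decaying. A health-check event only updates a provider that is still being tracked, so a late `ProviderHealthCheckFailed` cannot resurrect a provider that was already stopped. The `Event` enum and `ProviderTracker` are hypothetical simplifications. This is not how wadm currently handles these events.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ProviderState {
    Running,
    Failed,
}

/// Hypothetical events, loosely named after the ones in the comment above.
enum Event {
    ProviderStarted { host_id: String },
    ProviderStopped { host_id: String },
    ProviderHealthCheckFailed { host_id: String },
}

/// Hypothetical tracker mapping host IDs to the provider's last known state.
#[derive(Default)]
struct ProviderTracker {
    states: HashMap<String, ProviderState>,
}

impl ProviderTracker {
    fn apply(&mut self, event: &Event) {
        match event {
            Event::ProviderStarted { host_id } => {
                self.states.insert(host_id.clone(), ProviderState::Running);
            }
            Event::ProviderStopped { host_id } => {
                // Forget the provider entirely once it has stopped.
                self.states.remove(host_id);
            }
            Event::ProviderHealthCheckFailed { host_id } => {
                // Only mark providers we still track. An unconditional insert
                // here would re-add a stopped provider as Failed whenever the
                // failure event arrives after the stop event.
                if let Some(state) = self.states.get_mut(host_id) {
                    *state = ProviderState::Failed;
                }
            }
        }
    }

    fn any_failed(&self) -> bool {
        self.states.values().any(|s| *s == ProviderState::Failed)
    }
}
```

The trade-off is that a failure event arriving before its provider's start event is dropped. This guard therefore complements periodic re-checking or decay rather than replacing it.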

brooksmtownsend (Member) left a comment


Great work here, @Vikrantpalle! I have one request around how we're handling the health check event: I think we should update the status with the message from the event, but not actually do a full reconcile. Let me know if there's any more detail I can add there, and thank you for adding tests!
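Here is a simplified, synchronous sketch of that suggestion. Health-check events only rewrite the stored status, carrying the event's message, and return no commands. All other events still go through the normal reconcile path. `ProviderScaler`, `Event`, `Command`, and the stubbed `reconcile` are hypothetical. wadm's real scalers are async and track far more state.

```rust
use std::sync::RwLock;

#[derive(Debug, Clone, Copy, PartialEq)]
enum StatusType {
    Deployed,
    Failed,
}

#[derive(Debug, Clone, PartialEq)]
struct StatusInfo {
    status_type: StatusType,
    message: String,
}

/// Hypothetical command a reconcile might emit.
#[derive(Debug, PartialEq)]
enum Command {
    StartProvider,
}

/// Hypothetical events; only the health-check ones bypass reconcile.
enum Event {
    ProviderHealthCheckFailed { message: String },
    ProviderHealthCheckPassed,
    HostStarted,
}

struct ProviderScaler {
    status: RwLock<StatusInfo>,
}

impl ProviderScaler {
    fn new() -> Self {
        let initial = StatusInfo { status_type: StatusType::Deployed, message: String::new() };
        ProviderScaler { status: RwLock::new(initial) }
    }

    fn handle_event(&self, event: &Event) -> Vec<Command> {
        match event {
            // Health-check results only update the status; no commands are issued.
            Event::ProviderHealthCheckFailed { message } => {
                *self.status.write().unwrap() =
                    StatusInfo { status_type: StatusType::Failed, message: message.clone() };
                Vec::new()
            }
            Event::ProviderHealthCheckPassed => {
                *self.status.write().unwrap() =
                    StatusInfo { status_type: StatusType::Deployed, message: String::new() };
                Vec::new()
            }
            // Any other event takes the usual reconcile path.
            Event::HostStarted => self.reconcile(),
        }
    }

    /// Stub standing in for a full reconcile.
    fn reconcile(&self) -> Vec<Command> {
        vec![Command::StartProvider]
    }

    fn status(&self) -> StatusInfo {
        self.status.read().unwrap().clone()
    }
}
```

Keeping health-check handling out of reconcile means a flapping health check changes what users see without triggering extra start or stop commands.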

crates/wadm/src/scaler/daemonscaler/provider.rs (outdated, resolved)
crates/wadm/src/scaler/spreadscaler/provider.rs (outdated, resolved)
crates/wadm/src/scaler/spreadscaler/provider.rs (outdated, resolved)
Vikrantpalle marked this pull request as draft on December 18, 2024 09:15
Vikrantpalle marked this pull request as ready for review on December 18, 2024 11:27
Vikrantpalle (Contributor, Author)

I added a new Unhealthy status because I thought it would be clearer and wouldn't require matching the message inside the Failed status to figure out whether a scaler is unhealthy. Let me know if you think this isn't required; I can revert to using the Failed status like before.

@brooksmtownsend
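Here is a small sketch of the trade-off being described. It assumes a hypothetical `StatusType` with an `Unhealthy` variant. Detecting unhealthiness from a `Failed` status means string-matching the message, which breaks silently if the wording changes. A dedicated variant turns the same check into a type comparison. The message text and helper names are illustrative only.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum StatusType {
    Deployed,
    Failed,
    Unhealthy,
}

#[derive(Debug, Clone, PartialEq)]
struct StatusInfo {
    status_type: StatusType,
    message: String,
}

/// Without a dedicated variant, callers must inspect the message text,
/// so rewording the message changes the result.
fn is_unhealthy_by_message(status: &StatusInfo) -> bool {
    status.status_type == StatusType::Failed && status.message.contains("health check")
}

/// With a dedicated variant, the check depends only on the status type.
fn is_unhealthy(status: &StatusInfo) -> bool {
    status.status_type == StatusType::Unhealthy
}
```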

brooksmtownsend (Member) left a comment


Hey @Vikrantpalle, I do like the idea of the Unhealthy status. For now, though, I think we have to use StatusType::Failed, since the new status will fail to deserialize on clients where it doesn't exist yet 😕 It's not ideal, and we should probably have a fallback there, but wash doesn't have one at the moment.
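Here is an illustration of the kind of client-side fallback mentioned above. This is a hypothetical Rust sketch, not wash's actual code. An older client that predates `Unhealthy` maps any status string it doesn't recognize to `Unknown` instead of rejecting the whole response. The `ClientStatusType` subset and the lowercase wire strings are assumptions.

```rust
use std::convert::Infallible;
use std::str::FromStr;

/// Hypothetical client-side status enum that predates `Unhealthy`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ClientStatusType {
    Undeployed,
    Reconciling,
    Deployed,
    Failed,
    Unknown,
}

impl FromStr for ClientStatusType {
    type Err = Infallible;

    /// Lenient parsing: unrecognized values (e.g. a newer "unhealthy")
    /// become `Unknown` rather than an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s {
            "undeployed" => Self::Undeployed,
            "reconciling" => Self::Reconciling,
            "deployed" => Self::Deployed,
            "failed" => Self::Failed,
            _ => Self::Unknown,
        })
    }
}

/// Strict parsing without a fallback: a new variant is a hard error,
/// which is the failure mode described in the comment above.
fn parse_strict(s: &str) -> Result<ClientStatusType, String> {
    match s {
        "undeployed" => Ok(ClientStatusType::Undeployed),
        "reconciling" => Ok(ClientStatusType::Reconciling),
        "deployed" => Ok(ClientStatusType::Deployed),
        "failed" => Ok(ClientStatusType::Failed),
        _ => Err(format!("unknown status type: {s}")),
    }
}
```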

I think it's good to keep the Unhealthy status, but if you can make sure the scaler uses Failed for now, we can release this and then update the scaler to use Unhealthy in a future version.
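Here is a minimal sketch of that interim approach. It assumes a hypothetical `reported_status` helper and an `EMIT_UNHEALTHY` switch; neither comes from the PR. The scaler keeps tracking `Unhealthy` internally but reports `Failed` until clients understand the new variant.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum StatusType {
    Deployed,
    Failed,
    Unhealthy,
}

/// Hypothetical compatibility switch; flip it once clients can deserialize `Unhealthy`.
const EMIT_UNHEALTHY: bool = false;

/// Maps the internal status to the one sent to clients, downgrading
/// `Unhealthy` to `Failed` while older clients are still in use.
fn reported_status(internal: StatusType) -> StatusType {
    match internal {
        StatusType::Unhealthy if !EMIT_UNHEALTHY => StatusType::Failed,
        other => other,
    }
}
```

Keeping the mapping in one place means a later release only has to flip the switch, not touch every scaler.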

brooksmtownsend (Member) left a comment


Looks great!

brooksmtownsend merged commit 062130e into wasmCloud:main on Dec 23, 2024
5 checks passed

3 participants