chore: Add duplicate nodeclaim check for Repair Controller #1916
cb8f8b9 to 12964ef
96f3161 to e6baba8
One nit
/lgtm
/hold
e6baba8 to 6023fb7
2697970 to 164084c
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: engedaam, jmdeal.
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
if !clusterHealthy {
	c.recorder.Publish(NodeRepairBlockedUnmanagedNodeClaim(node, nodeClaim, fmt.Sprintf("more then %s nodes are unhealthy in the cluster", allowedUnhealthyPercent.String()))...)
if nodeutils.IsDuplicateNodeClaimError(err) {
	log.FromContext(ctx).Error(err, "failed to validate node health")
It's odd that we would just log the error and not return it. If you want to handle cases where the error isn't retryable and you don't want to keep requeueing, consider something like a TerminalError.
Anyway, do you need to log an error for this kind of failure? I think this should propagate up as a registration error, if it even happens at all. I'd probably just ignore it rather than log it.
I can use reconcile.TerminalError here (didn't know this was an option). The idea was just to fail loudly and not reconcile on the error. The customer should really be intervening and fixing the broken state.
IMO, you don't need to code this state in, because we don't really know of a true instance of it happening. I would be for removing the error logging that you have here and following the standard ignore pattern that we use for other errors that we don't want to reconcile on. There are options we could consider, such as marking NodeClaims as unhealthy in our status conditions if we haven't already, but handling it here feels like the wrong place to do it. If every controller handled it this way, the logs would get really noisy whenever you ran into this state.
Yeah, that seems reasonable to me. I can drop this check altogether. Gonna open a new PR to avoid needing to alter too much in this commit.
164084c to 149bc28
New changes are detected. LGTM label has been removed.
Fixes #N/A
Description
How was this change tested?
make presubmit
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.