OCPBUGS-48340: skip OperatorHubSourceError metric checking when disableAllDefaultSources is true #29435
base: master
@jianzhangbjz: This pull request references Jira Issue OCPBUGS-48340, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
/pj-rehearse periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-upgrade-ovn-ipv6 |
this isn't the correct approach, you should rather add the alert name to |
Hi @simonpasquier , thanks for the help! Could you help review it again? Thanks! |
/pj-rehearse periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-upgrade-ovn-ipv6 |
@@ -47,4 +47,5 @@ var AllowedAlertNames = []string{
	"CDIDefaultStorageClassDegraded", // Installing openshift virt with RWX storage fire an alarm, that is not relevant for most of the tests.
	"VirtHandlerRESTErrorsHigh", // https://issues.redhat.com/browse/CNV-50418
	"VirtControllerRESTErrorsHigh", // https://issues.redhat.com/browse/CNV-50418
	"OperatorHubSourceError", // https://issues.redhat.com/browse/OCPBUGS-48340, should skip this metric checking when disableAllDefaultSources is true
	"OperatorHubSourceError", // https://issues.redhat.com/browse/OCPBUGS-48340, should skip this metric checking when disableAllDefaultSources is true
	"OperatorHubSourceError", // https://issues.redhat.com/browse/OCPBUGS-48340
// https://issues.redhat.com/browse/OCPBUGS-48340
if SkipOperatorHubMetricsCheck(oc) {
	allowedAlertNames = removeElement(allowedAlertNames, "OperatorHubSourceError")
}
no need for this
func SkipOperatorHubMetricsCheck(oc *exutil.CLI) bool {
	stdout, stderr, err := oc.AsAdmin().Run("get").Args("operatorhub", "cluster", "-o=jsonpath={.spec.disableAllDefaultSources}").Outputs()
	if err != nil {
		fmt.Printf("command failed: %v\nstderr: %s\nstdout:%s", err, stderr, stdout)
	}
	return stdout == "true"
}

func removeElement(slice []string, element string) []string {
	var result []string
	for _, v := range slice {
		if v != element {
			result = append(result, v)
		}
	}
	return result
}
same here
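For readers following the two helpers in the diff above, here is a minimal standalone sketch of what they compute, without the cluster call. The names parseDisableAllDefaultSources and withoutElement are hypothetical and not part of the PR. The sketch assumes that the jsonpath output may carry surrounding whitespace and that an unset field yields an empty string.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDisableAllDefaultSources interprets the text printed for
// .spec.disableAllDefaultSources. Hypothetical helper, not part of the PR.
// Assumption: an unset field prints as an empty string, treated as false.
func parseDisableAllDefaultSources(stdout string) bool {
	return strings.TrimSpace(stdout) == "true"
}

// withoutElement returns a new slice holding every entry of names except
// those equal to name. The input slice is left untouched.
func withoutElement(names []string, name string) []string {
	result := make([]string, 0, len(names))
	for _, v := range names {
		if v != name {
			result = append(result, v)
		}
	}
	return result
}

func main() {
	allowed := []string{"VirtHandlerRESTErrorsHigh", "OperatorHubSourceError"}

	fmt.Println(parseDisableAllDefaultSources("true\n"))
	fmt.Println(withoutElement(allowed, "OperatorHubSourceError"))
	fmt.Println(len(allowed)) // the original slice keeps both entries
}
```

Unlike the version in the diff, withoutElement returns an empty non-nil slice when every entry is removed. Whether that difference matters depends on how the caller uses the result.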
Hi @simonpasquier , could you help review it again? Thanks! The logic is that add |
@jianzhangbjz Can you explain the rationale here more? This alert is only firing 7% of the time right now. Where are the default sources being disabled in CI clusters? |
Hi @joelanford, I believe they disable it in the baremetalds-devscripts-setup step; see https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-upgrade-ovn-ipv6/1877545344619778048/artifacts/e2e-metal-ipi-upgrade-ovn-ipv6/baremetalds-devscripts-setup/artifacts/root/dev-scripts/logs/06_create_cluster-2025-01-09-205310.log
2025-01-09 21:45:15 +(utils.sh:754): add_local_certificate_as_trusted(): oc patch image.config.openshift.io/cluster --patch '{"spec":{"additionalTrustedCA":{"name":"registry-config"}}}' --type=merge
2025-01-09 21:45:15 image.config.openshift.io/cluster patched
2025-01-09 21:45:15 +(./06_create_cluster.sh:81): [[ -n true ]]
2025-01-09 21:45:15 +(./06_create_cluster.sh:81): [[ true != \f\a\l\s\e ]]
2025-01-09 21:45:15 +(./06_create_cluster.sh:82): oc patch OperatorHub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'
2025-01-09 21:45:16 operatorhub.config.openshift.io/cluster patched
2025-01-09 21:45:16 +(./06_create_cluster.sh:86): [[ -n '' ]]
Besides, the OperatorHub is disabled in many scenarios, such as disconnected and proxy environments. For example, see https://github.com/openshift/release/blob/master/ci-operator/step-registry/ipi/openstack/pre/disconnected/ipi-openstack-pre-disconnected-chain.yaml and https://github.com/openshift/release/blob/14145d08a2d8b367133ab9883bdde96f8443a2f0/ci-operator/step-registry/telco5g/cnf/tests/telco5g-cnf-tests-commands.sh#L451
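For reference, the JSON Patch body passed to `oc patch` in the log excerpt above could be built programmatically as sketched below. This is illustrative only: the type patchOp and the function disableDefaultSourcesPatch are hypothetical names, and the dev-scripts step itself uses the shell command shown in the log rather than Go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one JSON Patch operation. Hypothetical type for illustration.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// disableDefaultSourcesPatch returns the JSON Patch document that sets
// spec.disableAllDefaultSources to true, matching the body in the CI log.
func disableDefaultSourcesPatch() (string, error) {
	ops := []patchOp{{
		Op:    "add",
		Path:  "/spec/disableAllDefaultSources",
		Value: true,
	}}
	b, err := json.Marshal(ops)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	patch, err := disableDefaultSourcesPatch()
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(patch)
}
```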
/pj-rehearse periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-upgrade-ovn-ipv6 |
/test e2e-aws-ovn-microshift-serial |
So as soon as a catalog is disabled, the catalog operator will remove the
The rule for this alert is defined here: https://github.com/operator-framework/operator-marketplace/blob/5776ce8e796910c2dfc98a42062f97ed98e81b2c/manifests/12_prometheus_rule.yaml#L23
I'm not a Prometheus expert, but I think the expression there will just not have results for the time ranges in which the default catalogs are disabled. If there are no results, I would think the expression would not evaluate to true. But I haven't tested this theory.
One possibility to consider: if the catalog source is unhealthy before the catalog source is disabled, the most recent sample for a given
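The theory above, that an expression with no results cannot fire the alert, can be illustrated with a toy model. This is a sketch under stated assumptions, not Prometheus's evaluator. The series type, its fields, and the `value == 0` condition are all hypothetical, and the real rule expression lives in the linked manifest and is not reproduced here. The model captures only one fact: an alerting rule yields one alert per series returned by its expression, so an empty result yields none.

```go
package main

import "fmt"

// series is a toy stand-in for one time series present at evaluation time.
// Both fields are hypothetical and do not mirror the real metric.
type series struct {
	catalog string
	value   float64
}

// evalRule mimics a comparison filter: it returns only the series whose
// value satisfies cond. With no input series, the result is empty.
func evalRule(present []series, cond func(float64) bool) []series {
	var result []series
	for _, s := range present {
		if cond(s.value) {
			result = append(result, s)
		}
	}
	return result
}

// alertFires reports whether the rule would produce at least one alert.
func alertFires(present []series, cond func(float64) bool) bool {
	return len(evalRule(present, cond)) > 0
}

func main() {
	// Assumed condition for "unhealthy"; the real expression may differ.
	unhealthy := func(v float64) bool { return v == 0 }

	// Default sources enabled and one catalog unhealthy: a series matches.
	enabled := []series{{catalog: "example-catalog", value: 0}}
	fmt.Println(alertFires(enabled, unhealthy))

	// Default sources disabled: no series exist, so nothing can match.
	var disabled []series
	fmt.Println(alertFires(disabled, unhealthy))
}
```

The model ignores how long a removed series remains visible to queries after its last sample, which is the open question raised at the end of the comment above.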
Yes, I think so.
Yes, these CatalogSources are unhealthy due to the image pulling issue, so they disabled
/hold |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jan--f, jianzhangbjz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test e2e-aws-ovn-edge-zones |
Late to review but I think that it's simpler to exclude
/hold Revision 0c45942 was retested 3 times: holding |
/retest-required |
/retest-required |
Job Failure Risk Analysis for sha: 0c45942
/hold Revision 0c45942 was retested 3 times: holding |
/unhold |
/retest-required |
/retest |
@jianzhangbjz: The following tests failed, say
When disableAllDefaultSources is true, all default CatalogSources are disabled, so there is no metric for them.