
Enhance restart proxy feature to differentiate between user workloads and kyma workloads #1249

Closed
21 tasks done
strekm opened this issue Jan 16, 2025 · 1 comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@strekm
Contributor

strekm commented Jan 16, 2025

Description

An Istio version bump requires restarting every pod that has an injected Istio sidecar, so that its proxy stays up to date with istiod. If a customer's workload is broken for any reason and its pods cannot start, we cannot reconcile the service mesh data plane to the required state. The Istio operator then keeps retrying the proxy restart, and the Istio CR cycles endlessly through Processing -> Error -> Processing. Such situations are outside the Istio module team's control, but because they surface as a failure to restart a Pod, it is the Istio module team that receives the alerts and, with them, the responsibility.

As a solution, we should limit the number of retries during the proxy restart. If a workload cannot be restarted successfully, the Istio CR should end up in the Warning state, indicating that action is required on the customer side: fixing the workload so that its proxies can be updated.
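A minimal Go sketch of the retry limit described above. It models only the Istio CR state transition after a failed restart of a customer workload; `stateAfterRestart` and `maxRestartAttempts` are hypothetical names, and the limit of 5 is an arbitrary placeholder because the issue does not fix a number:

```go
package main

import "fmt"

// State names the Istio CR states mentioned in this issue.
type State string

const (
	Ready      State = "Ready"
	Processing State = "Processing"
	Warning    State = "Warning"
)

// maxRestartAttempts is a hypothetical limit; the issue does not specify one.
const maxRestartAttempts = 5

// stateAfterRestart decides the CR state after a proxy restart attempt on a
// customer workload. failedAttempts counts consecutive failures so far,
// including the current one.
func stateAfterRestart(succeeded bool, failedAttempts int) State {
	if succeeded {
		return Ready
	}
	if failedAttempts >= maxRestartAttempts {
		// Stop retrying and signal that the customer must fix the workload.
		return Warning
	}
	// Below the limit: retry later without flipping the CR to Error.
	return Processing
}

func main() {
	for attempt := 1; attempt <= maxRestartAttempts; attempt++ {
		fmt.Printf("failed attempt %d -> %s\n", attempt, stateAfterRestart(false, attempt))
	}
	fmt.Println("successful restart ->", stateAfterRestart(true, 0))
}
```

Keeping the CR in Processing below the limit, instead of Error, is what breaks the Processing -> Error -> Processing loop; only the final outcome is surfaced.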

Consider the case of a single Pod without a parent Deployment that we cannot restart. We should decide whether to ignore it or set the Istio CR status to Warning.
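One way to detect such a Pod is to check whether any of its owner references is a controller. The Go sketch below uses simplified stand-ins for the Kubernetes `metav1.OwnerReference` and `corev1.Pod` types (only the fields used here are modeled); `hasController` is a hypothetical helper, not part of any Kubernetes library:

```go
package main

import "fmt"

// OwnerReference is a simplified stand-in for metav1.OwnerReference.
type OwnerReference struct {
	Kind       string
	Name       string
	Controller *bool
}

// Pod is a simplified stand-in for corev1.Pod, keeping only owner references.
type Pod struct {
	Name            string
	OwnerReferences []OwnerReference
}

// hasController reports whether some controller (for example the ReplicaSet
// behind a Deployment) manages the pod and would recreate it after deletion.
func hasController(p Pod) bool {
	for _, ref := range p.OwnerReferences {
		if ref.Controller != nil && *ref.Controller {
			return true
		}
	}
	return false
}

func main() {
	isController := true
	managed := Pod{
		Name: "app-7d9f-abcde",
		OwnerReferences: []OwnerReference{
			{Kind: "ReplicaSet", Name: "app-7d9f", Controller: &isController},
		},
	}
	bare := Pod{Name: "standalone"}

	for _, p := range []Pod{managed, bare} {
		if hasController(p) {
			fmt.Println(p.Name, ": can be restarted through its owner")
		} else {
			// Per the acceptance criteria, such a pod should put the Istio CR in Warning.
			fmt.Println(p.Name, ": no controller, report Warning")
		}
	}
}
```

A bare Pod that is deleted is not recreated by anything, which is why it needs explicit handling rather than the regular restart path.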

Consider the Istio CR status when a node is draining and a Deployment/Pod is in the Evicted state. This should not cause the Error status.

TODOs:

  • Implement logic that detects whether a workload belongs to a customer or to Kyma (try to look at the annotations/labels)
  • If the proxy cannot be restarted on a customer workload, put the Istio CR in the Warning state
  • If the proxy cannot be restarted on a Kyma workload, put the Istio CR in the Error state
  • Configure reconciliation retries to use exponential backoff when a proxy restart fails
  • Provide documentation
  • Provide release notes
  • Discuss within the team adjusting the pagination mechanism in the proxy restart to avoid paginating through requeueing --> we'll create a follow-up ticket for this
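The classification and status mapping from the TODOs could look like the following Go sketch. The issue does not say which labels or annotations identify a Kyma workload, so `kymaNamespace` (`kyma-system`) and `kymaManagedLabel` are assumptions for illustration, and `Workload`, `isKymaWorkload`, and `stateForFailures` are hypothetical names:

```go
package main

import "fmt"

// State names the Istio CR states mentioned in this issue.
type State string

const (
	Ready   State = "Ready"
	Warning State = "Warning"
	Error   State = "Error"
)

// Workload is a hypothetical, simplified view of a workload whose proxy restart failed.
type Workload struct {
	Namespace string
	Labels    map[string]string
}

// Both markers are assumptions; the real detection logic may use other labels or annotations.
const (
	kymaNamespace    = "kyma-system"
	kymaManagedLabel = "kyma-project.io/managed-by" // hypothetical label key
)

// isKymaWorkload reports whether a workload is treated as Kyma-owned.
func isKymaWorkload(w Workload) bool {
	if w.Namespace == kymaNamespace {
		return true
	}
	_, ok := w.Labels[kymaManagedLabel] // lookup on a nil map is safe in Go
	return ok
}

// stateForFailures maps failed restarts to a CR state:
// any Kyma failure -> Error, only customer failures -> Warning, none -> Ready.
func stateForFailures(failed []Workload) State {
	state := Ready
	for _, w := range failed {
		if isKymaWorkload(w) {
			return Error
		}
		state = Warning
	}
	return state
}

func main() {
	failed := []Workload{
		{Namespace: "shop"},
		{Namespace: "kyma-system"},
	}
	fmt.Println("customer only:", stateForFailures(failed[:1]))
	fmt.Println("customer and Kyma:", stateForFailures(failed))
	fmt.Println("no failures:", stateForFailures(nil))
}
```

Returning Error as soon as any Kyma workload fails gives Kyma failures precedence over customer ones, so a broken Kyma workload is still reported as Error even when customer workloads are failing too.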

PRs

ACs [PO]

  • issues with restarting Kyma workloads should be reported as Error
  • issues with restarting user workloads should be reported as Warning
  • a Pod without a parent should put the Istio CR in Warning
  • retries should increase the waiting time between attempts, up to a maximum backoff time
  • cover the case of a single Pod without a Deployment
  • documentation updated

DoD [Developer & Reviewer]

@strekm strekm added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 16, 2025
@strekm strekm added this to the 1.15.0 milestone Jan 16, 2025
@strekm strekm modified the milestones: 1.15.0, 1.14.0 Jan 24, 2025
@videlov videlov mentioned this issue Jan 27, 2025
@mluk-sap

Done
PR: #1253
