
Snatch docs #38 (merged, 13 commits, Jan 27, 2025)

48 changes: 38 additions & 10 deletions in docs/user/README.md
@@ -1,14 +1,42 @@
# KIM Snatch

## Overview
The KIM Snatch module is part of KIM's worker pool feature. It is a mandatory Kyma module and is deployed on all Kyma managed runtimes (SKR).

In the past, Kyma offered only one worker pool, the so-called Kyma worker pool, on which every workload was scheduled. This worker pool is mandatory and cannot be removed from a Kyma runtime. Customers have several configuration options, but it is not fully adjustable and can be too limited for customers who require special node setups.

By introducing the Kyma worker pool feature, customers can add additional worker pools to their Kyma runtime. This enables customers to introduce worker nodes that are optimized for their particular workload requirements.

**Contributor** (suggested change):
- Current: "By introducing the Kyma worker pool feature, customers can add additional worker pools to their Kyma runtime. This enables customer to introduce worker nodes, which are optimized for their particular workload requirements."
- Suggested: "With the Kyma worker pool feature, you can add additional worker pools to your Kyma runtime and introduce worker nodes optimized for their particular workload requirements."

**Contributor**:
> Lines 6 and 8 describe the Kyma worker pool. If these are two different things, I recommend naming/describing them so that it's easy to understand which is which.

**Contributor (author)**:
> Fully agreed. I've removed overlapping names and used different formatting for the Kyma worker pool.

To ensure that customer worker pools stay reserved for customer workloads, KIM Snatch was introduced. It is responsible for assigning Kyma workloads (for example, operators of Kyma modules) to the Kyma worker pool. This has several advantages:

* Kyma workloads do not allocate resources on customer worker pools. This ensures that customers have the full capacity of the worker pool available for their workloads.
* It reduces the risk of incompatibility between Kyma container images and individually configured worker pools.

## Technical Approach
The KIM Snatch module introduces a [mutating admission webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook) in Kubernetes.

It intercepts all pods that are scheduled in a Kyma-managed namespace. [KLM](https://github.com/kyma-project/lifecycle-manager) always labels managed namespaces with `operator.kyma-project.io/managed-by: kyma`, and KIM Snatch reacts only to pods scheduled in one of these labeled namespaces. Typical Kyma-managed namespaces are `kyma-system` or, if the Kyma Istio module is used, `istio`.
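Conceptually, the webhook registration restricts interception to these labeled namespaces via a `namespaceSelector`. The following is a minimal illustrative sketch, not the manifest actually shipped with the module; the webhook name, service name, and path are assumptions:

```yaml
# Illustrative sketch only. The webhook name, service name, and path are
# assumptions, not the actual KIM Snatch manifest.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: kim-snatch
webhooks:
  - name: kim-snatch.kyma-project.io   # assumed name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        name: kim-snatch               # assumed service name
        namespace: kyma-system
        path: /mutate-pod              # assumed path
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    # Only pods created in namespaces labeled by KLM are intercepted.
    namespaceSelector:
      matchLabels:
        operator.kyma-project.io/managed-by: kyma
```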

![KIM Snatch Webhook](./assets/snatch-deployment.png)

Before the pod is handed over to the Kubernetes scheduler, KIM-Snatch adds a [`nodeAffinity`](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity) to the pod's manifest. This informs the Kubernetes scheduler to prefer nodes within the Kyma worker pool for this pod.
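The injected stanza looks roughly like the following sketch. The node label key (`worker.gardener.cloud/pool`) and pool name (`cpu-worker`) are illustrative assumptions; the exact selector used by KIM Snatch may differ:

```yaml
# Sketch of the pod spec fragment after mutation. Label key and pool
# name are illustrative assumptions.
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: worker.gardener.cloud/pool
                operator: In
                values: ["cpu-worker"]
```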

## Limitations

### Using the Kyma worker pool is not enforced
Assigning a pod to a specific worker pool can have drawbacks, for example:

* Resources of the preferred worker pool may be exhausted while other worker pools still have free capacity.
* If no suitable node can be found and the node affinity is set as a "hard" rule, the pod is not scheduled at all.

To overcome these limitations, the node affinity configured on Kyma workloads is a "soft" rule (we use `preferredDuringSchedulingIgnoredDuringExecution`; for more details, see the [Kubernetes docs](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity)). The Kubernetes scheduler prefers the Kyma worker pool, but if the pod cannot be scheduled there, it also considers other worker pools.
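For contrast, a "hard" rule would use `requiredDuringSchedulingIgnoredDuringExecution`: a pod with the following affinity stays `Pending` if no node in the named pool is available, which is exactly what KIM Snatch avoids. The label key and pool name are again illustrative assumptions:

```yaml
# NOT used by KIM Snatch. Shown only to illustrate the "hard" variant.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: worker.gardener.cloud/pool
                operator: In
                values: ["cpu-worker"]
```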

### Cases when Kyma workloads are not intercepted

#### An unavailable webhook is ignored by Kubernetes
Kubernetes API calls could be heavily impacted if a mandatory admission webhook is not responsive enough. This can lead to timeouts and severe performance degradation.

To prevent such side effects, the KIM Snatch webhook is configured with a [failure-tolerating policy](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#failure-policy) that allows Kubernetes to continue in case of errors. This implies that downtimes or failures of the webhook are accepted and the affected pods are scheduled without a `nodeAffinity`.

#### Already scheduled pods are ignored by the webhook
Additionally, pods that are already scheduled and running on a worker node do not receive the `nodeAffinity`, because the webhook is only allowed to intercept pods that are not yet scheduled. This means that running pods would have to be restarted to receive the `nodeAffinity`. The webhook does not restart running pods, to avoid service interruptions or a degraded experience for customers.
Binary file added docs/user/assets/snatch-deployment.png
**Contributor** (on `docs/user/assets/snatch-deployment.png`):
> 1. In the diagram, please replace the following:
>    - Kyma Runtime (SKR) with SAP BTP, Kyma Runtime
>    - Customer with User
> 2. Is it possible to move the namespace shape so that it doesn't intersect with the channel (?) symbol?
> 3. The diagram does not follow our content guidelines, but as we are thinking of switching to TAM, let me check if the current version can be accepted.
> 4. Still, can you change it to SVG? If it's added as snatch-deployment.drawio.svg, it will be easier to edit/update if need be.
