docs: guidance for pod memory allocation #1195

Merged (2 commits) on Feb 6, 2025
4 changes: 3 additions & 1 deletion dev-docs/aks/nested-virt-internals.md
@@ -206,7 +206,7 @@ api sockets.
<summary>List some facts about all CH VMs</summary>

```sh
-find /run/vc/vm -name clh-api.sock -exec ch-remote --api-socket "{}" info ";" |
+find /run/vc/vm -name clh-api.sock -exec curl -sS --unix-socket "{}" http://./api/v1/vm.info ";" |
jq -s 'map( {
"sock": .config.vsock.socket,
"policy": .config.payload.host_data,
@@ -233,3 +233,5 @@ find /run/vc/vm -name clh-api.sock -exec ch-remote --api-socket "{}" info ";" |
]
```
</details>

The API is documented [here](https://github.com/cloud-hypervisor/cloud-hypervisor/blob/v43.0/docs/api.md).
65 changes: 65 additions & 0 deletions docs/docs/deployment.md
@@ -128,6 +128,71 @@ spec: # v1.PodSpec
runtimeClassName: contrast-cc
```

### Pod resources

Contrast workloads are deployed as one confidential virtual machine (CVM) per pod.
To configure the CVM resources correctly, Contrast workloads require a stricter specification of pod resources than standard [Kubernetes resource management].

The total memory available to the CVM is calculated from the sum of the individual containers' memory limits and a static `RuntimeClass` overhead that accounts for services running inside the CVM.
Consider the following abbreviated example resource definitions:

```yaml
kind: RuntimeClass
handler: contrast-cc
overhead:
  podFixed:
    memory: 256Mi
---
spec: # v1.PodSpec
  containers:
    - name: my-container
      image: "my-image@sha256:..."
      resources:
        limits:
          memory: 128Mi
    - name: my-sidecar
      image: "my-other-image@sha256:..."
      resources:
        limits:
          memory: 64Mi
```

Contrast launches this pod as a VM with 448MiB of memory: 192MiB for the containers and 256MiB for the Linux kernel, the Kata agent and other base processes.
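
The arithmetic behind this number can be sketched in a few lines of Python. This is only an illustration of the calculation described above, not Contrast's implementation; the helper names `parse_memory` and `vm_memory_bytes` are hypothetical, and the parser only handles the binary suffixes used in the example.

```python
# Binary-suffix multipliers for the Kubernetes quantities used in this example.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as "128Mi" to bytes (binary suffixes only)."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat the value as a plain byte count


def vm_memory_bytes(container_limits: list[str], overhead: str) -> int:
    """Sum the main containers' memory limits and add the RuntimeClass overhead.

    Init containers are deliberately not part of container_limits.
    """
    return sum(parse_memory(q) for q in container_limits) + parse_memory(overhead)


total = vm_memory_bytes(["128Mi", "64Mi"], overhead="256Mi")
print(total // 2**20, "MiB")  # 448 MiB
```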

When calculating the VM resource requirements, init containers aren't taken into account.
If you have an init container that requires large amounts of memory, you need to adjust the memory limit of one of the main containers in the pod.
Since memory can't be shared dynamically with the host, each container should have a memory limit that covers its worst-case requirements.

Kubernetes packs a node until the sum of pod _requests_ reaches the node's total memory.
Since a Contrast pod is always going to consume node memory according to the _limits_, the accounting is only correct if the request is equal to the limit.
Thus, once you have determined the memory requirements of your application, you should add a resource section to the pod specification with request and limit:

```yaml
spec: # v1.PodSpec
  containers:
    - name: my-container
      image: "my-image@sha256:..."
      resources:
        requests:
          memory: 50Mi
        limits:
          memory: 50Mi
```
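
A quick way to check a pod specification against this rule is to compare the two values per container. The following is a minimal sketch, assuming the pod spec has already been loaded into a Python dictionary; the function name `mismatched_containers` is hypothetical, and the comparison is purely textual, so `50Mi` and `51200Ki` would be reported as different.

```python
def mismatched_containers(pod_spec: dict) -> list[str]:
    """Return the names of containers whose memory request and limit disagree."""
    mismatched = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources", {})
        request = resources.get("requests", {}).get("memory")
        limit = resources.get("limits", {}).get("memory")
        # A missing limit is always flagged. A missing request is not, because
        # Kubernetes defaults the request to the limit when only a limit is set.
        if limit is None or (request is not None and request != limit):
            mismatched.append(container["name"])
    return mismatched


pod_spec = {
    "containers": [
        {
            "name": "my-container",
            "resources": {
                "requests": {"memory": "50Mi"},
                "limits": {"memory": "50Mi"},
            },
        },
        {
            "name": "my-sidecar",
            "resources": {
                "requests": {"memory": "32Mi"},
                "limits": {"memory": "64Mi"},
            },
        },
    ]
}
print(mismatched_containers(pod_spec))  # ['my-sidecar']
```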

:::note

On bare metal platforms, container images are pulled from within the guest CVM and stored in encrypted memory.
The CVM mounts a `tmpfs` for the image layers that's capped at 50% of the total VM memory.
This tmpfs holds the extracted image layers, so the uncompressed image size needs to be taken into account when setting the container limits.
Registry interfaces often show the compressed size of an image, but the decompressed image is usually 2-4 times larger if the content is mostly binary.
For example, the `nginx:stable` image reports a compressed image size of 67MiB, but storing the uncompressed layers needs about 184MiB of memory.
Although only the extracted layers are stored, and those layers are reused across containers within the same pod, the memory limit should account for both the compressed and the decompressed layers simultaneously.
Altogether, setting the limit to 10x the compressed image size should be sufficient for small to medium images.

:::
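
The sizing rule from the note above can be checked with a small calculation. This is a rough sketch under stated assumptions, not a description of how Contrast sizes the `tmpfs`: it assumes the 256MiB overhead from the earlier example, a single container, and that the compressed and decompressed layers both occupy the `tmpfs` at peak. All function names are hypothetical.

```python
TMPFS_FRACTION = 0.5   # tmpfs cap as a fraction of total VM memory (from the note)
HEURISTIC_FACTOR = 10  # rule of thumb from the note: 10x the compressed size


def heuristic_limit_mib(compressed_mib: float) -> float:
    """Suggested container memory limit for a given compressed image size."""
    return HEURISTIC_FACTOR * compressed_mib


def tmpfs_capacity_mib(limit_mib: float, overhead_mib: float) -> float:
    """Space available for image layers in a VM sized as limit + overhead."""
    return TMPFS_FRACTION * (limit_mib + overhead_mib)


def layers_fit(compressed_mib: float, uncompressed_mib: float,
               limit_mib: float, overhead_mib: float) -> bool:
    """Check whether the layers fit into the tmpfs.

    Assumption: compressed and uncompressed layers are held at the same time.
    """
    needed = compressed_mib + uncompressed_mib
    return needed <= tmpfs_capacity_mib(limit_mib, overhead_mib)


# nginx:stable figures from the note: 67MiB compressed, about 184MiB uncompressed.
limit = heuristic_limit_mib(67)
print(limit, layers_fit(67, 184, limit, overhead_mib=256))
print(layers_fit(67, 184, limit_mib=128, overhead_mib=256))
```

With these numbers, the 10x heuristic leaves ample room for the layers, while a 128MiB limit doesn't, which is the situation that leads to a "No space left on device" error during image pull.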

[Kubernetes resource management]: <https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/>

### Handling TLS

In the initialization process, the `contrast-secrets` shared volume is populated with X.509 certificates for your workload.
13 changes: 13 additions & 0 deletions docs/docs/troubleshooting.md
@@ -164,3 +164,16 @@ contrast version v0.X.0
image versions: ghcr.io/edgelesssys/contrast/coordinator@sha256:...
ghcr.io/edgelesssys/contrast/initializer@sha256:...
```

## VM runs out of memory

Since pod VMs are statically sized, a misconfigured memory limit can easily cause the VM to run out of memory.
Setting the right memory limits is even more important on bare metal, where the image layers need to be stored in the guest memory, too.
If you see an error message like this, the VM doesn't have enough space to pull images:

```
LAST SEEN TYPE REASON OBJECT MESSAGE
2m31s Warning Failed Pod/my-pod-76dc84fc75-6xn7s Error: failed to create containerd task: failed to create shim task: failed to handle layer: hasher sha256: failed to unpack [...] No space left on device (os error 28)
```

This error can be resolved by increasing the memory limit of the containers. See the [Workload deployment](deployment.md#pod-resources) guide for details.