-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect SA token rotation #440
Conversation
b47f6a7
to
3c1d004
Compare
3c1d004
to
765591c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change also removes the existing logic around the DELETE event, which is a leftover from previous changes and is now a no-op.
Could we make that a discrete change, please? It would be helpful for me if we split up some of the non-functional reorganization work into a discrete PR so the change in logic is clearer in this one.
inotifywait -m "${SERVICEACCOUNT_PATH}" -e create | | ||
while read -r directory _ filename; do | ||
target=$(realpath "$directory/$filename") | ||
if [[ (-f "$target" && "${target##*/}" == "token") || (-d "$target" && -e "$target/token") ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little surprising that we would see the creation of a directory but not the creation of the file itself. I suppose the potential downside is only that this could fire redundant updates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first part of this condition checks if the token file was created directly as /var/run/secrets/kubernetes.io/serviceaccount/token
, not symlinked. I've added this just in case, but in my tests (k3d, kind and AKS) that file is a symlink that doesn't change, i.e. inotify doesn't detect anything on it even if what it points to has changed.
The second part of the condition is what I've observed, which is the creation of a new directory with the token, to which the token
symlink points to.
Testing this in AKS results in:
[2024-12-05 21:46:54] Created CNI config /host/etc/cni/net.d/15-azure-swift-overlay.conflist
Setting up watches.
Watches established.
Setting up watches.
Watches established.
[2024-12-05 22:36:05] Detected creation of file in /var/run/secrets/kubernetes.io/serviceaccount/: ..2024_12_05_22_36_05.129366695; recreating kubeconfig file
[2024-12-05 22:36:05] Detected creation of file in /var/run/secrets/kubernetes.io/serviceaccount/: ..data_tmp; ignoring
I didn't observe any duplicate triggering.
In my last push I simplified the change by undoing the DELETE refactoring and the removal of linkerd-cni.conf.default. I moved those changes into #446 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me
Fixes linkerd/linkerd2#12573 ## Problem When deployed, the linkerd-cni pod gets its service account token mounted automatically by k8s: ```yaml - name: kube-api-access-729gv projected: defaultMode: 420 sources: - serviceAccountToken: expirationSeconds: 3607 path: token - configMap: items: - key: ca.crt path: ca.crt name: kube-root-ca.crt - downwardAPI: items: - fieldRef: apiVersion: v1 fieldPath: metadata.namespace path: namespace ``` According to this, the token is set to expire after an hour. When the linkerd-cni pod starts it deploys the file `ZZZ-linkerd-cni-kubeconfig` in to the **host** file system. That config contains the token sourced from `/var/run/secrets/kubernetes.io/serviceaccount` (mounted by the pod). When the token gets rotated after an hour, that token file is updated but `ZZZ-linkerd-cni-kubeconfig` is not updated. The `linkerd-cni` binary uses that token to connect to the kube-api, so having an outdated token should forbid it from functioning properly, which would manifest as new pods in the data plane not being able to acquire a proper network config. However, that failure isn't usually observed, except for the cases pointed out in linkerd/linkerd2#12573. The reason is that the token's actual lifetime is one year, due to kube-api's `--service-account-extend-token-expiration` [flag](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/#options) which is usually set as `true` to avoid breaking too many instances not yet adapted to use tokens with short expirations: > Turns on projected service account expiration extension during token generation, which helps safe transition from legacy token to bound service account token feature. If this flag is enabled, admission injected tokens would be extended up to 1 year to prevent unexpected failure during transition, ignoring value of service-account-max-token-expiration. ## Repro ### AKS The issue currently affects AKS clusters using OIDC keys. To reproduce, create a new cluster in AKS, making sure "Enable OIDC" and "Workload Identity" is ticked in the UI. Then install the linkerd-cni plugin, labelling the linkerd-cni DaemonSet so that its ServiceAccount token is provided via OIDC: ``` linkerd install-cni --set-string "podLabels.azure\.workload\.identity/use"="true" | kubectl apply -f - ``` And install linkerd with cni enabled, and an injected instance of emojivoto. The secret token is rotated after an hour, but the old one remains valid for a 24h. Manually rotating the key as detailed in the [docs](https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer#rotate-the-oidc-key) should invalidate the old key. After that, bouncing any emojivoto pod will prove unsuccessful with the following event being raised: ``` Warning FailedCreatePodSandBox 15s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8121291446642b272cea9ee5f083958a37bab0dd7060c4d9c06bb05fecf911d2": plugin type="linkerd-cni" name="linkerd-cni" failed (add): Unauthorized ``` ## Fix This change adds a new function `monitor_service_account_token()` that monitors the rollout of the token file; which is a symlink whose target changes as a new token is deployed. When detecting a new token file, this function calls the new `create_kubeconfig()` function. This change also removes the existing logic around the DELETE event, which is a leftover from previous changes and is now a no-op. Also, as detailed in linkerd/linkerd2#13407, the ServiceAccount token has been removed from the cni config template because it's not used, simplifying things as we can regenerate the kubeconfig file without having to touch the cni config file. Finally, the file `linkerd-cni.conf.default` has been removed as is not used. ## Test Same as with the repro above, but use the cni-plugin image that contains the fix: ``` linkerd install-cni --set-string "podLabels.azure\.workload\.identity/use"="true" --set image.name="ghcr.io/alpeb/cni-plugin" --set image.version="v1.5.3" | kubectl apply -f - ``` After an hour when the token gets rotated you should see the event in the linkerd-cni pod logs.
cbe4299
to
fbd3ba7
Compare
This change removes the `policy` entry from the cni config template, which isn't used. That contained a `__SERVICEACCOUNT_TOKEN__` placeholder, which was coupling this config file with the `ZZZ-linkerd-cni-kubeconfig` file generated by linkerd-cni. In linkerd/linkerd2-proxy-init#440 we add support for detecting changes in the mounted serviceaccount token file (see #12573), and the current change facilitates that effort. Co-authored-by: Oliver Gould <[email protected]>
Fixes linkerd/linkerd2#12573
Problem
When deployed, the linkerd-cni pod gets its service account token mounted automatically by k8s:
According to this, the token is set to expire after an hour.
When the linkerd-cni pod starts it deploys the file
ZZZ-linkerd-cni-kubeconfig
in to the host file system.That config contains the token sourced from
/var/run/secrets/kubernetes.io/serviceaccount
(mounted by the pod).When the token gets rotated after an hour, that token file is updated but
ZZZ-linkerd-cni-kubeconfig
is not updated.The
linkerd-cni
binary uses that token to connect to the kube-api, so having an outdated token should forbid it from functioning properly, which would manifest as new pods in the data plane not being able to acquire a proper network config.However, that failure isn't usually observed, except for the cases pointed out in linkerd/linkerd2#12573. The reason is that the token's actual lifetime is one year, due to kube-api's
--service-account-extend-token-expiration
flag which is usually set astrue
to avoid breaking too many instances not yet adapted to use tokens with short expirations:Repro
The issue currently affects AKS clusters using OIDC keys. To reproduce, create a new cluster in AKS, making sure "Enable OIDC" and "Workload Identity" is ticked in the UI.
Then install the linkerd-cni plugin, labelling the linkerd-cni DaemonSet so that its ServiceAccount token is provided via OIDC:
And install linkerd with cni enabled, and an injected instance of emojivoto.
The secret token is rotated after an hour, but the old one remains valid for a 24h. Manually rotating the key as detailed in the docs should invalidate the old key.
After that, bouncing any emojivoto pod will prove unsuccessful with the following event being raised:
Fix
This change adds a new function
monitor_service_account_token()
that monitors the rollout of the token file; which is a symlink whose target changes as a new token is deployed. When detecting a new token file, this function calls the newcreate_kubeconfig()
function.Also, as detailed in linkerd/linkerd2#13407, the ServiceAccount token has been removed from the cni config template because it's not used, simplifying things as we can regenerate the kubeconfig file without having to touch the cni config file.
Test
Same as with the repro above, but use the cni-plugin image that contains the fix:
After an hour when the token gets rotated you should see the event in the linkerd-cni pod logs.