-
Hello, I'm trying to use linkerd (version edge-24.7.5) on a two-node bare-metal k3s cluster. The linkerd control plane pods are running on the same node as the k3s control plane, and the workloads I'm trying to inject are running on the worker node. The pods I'm trying to inject get stuck at
I have installed linkerd using Helm and configured it to rotate certs with cert-manager using the following values:
Here is the output of
The problem usually happens after the k3s control plane has been running for some time, and restarting the control plane sometimes fixes the issue. I haven't encountered the same problem when the linkerd control plane runs on the worker node, but I assumed it's better to host the linkerd control plane on the k3s control plane node; is that right? I've tested pod connectivity across nodes using iperf3 with both TCP and UDP, and in both cases the transfer rates are good compared to what I get going host-to-host outside k3s. I've also changed my Flannel backend from vxlan to host-gw; that improved the raw transfer speeds, but it hasn't fixed this issue. At this point I'm not sure what else to try. Is this an issue with linkerd, or should I do more to rule out the CNI?
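For anyone reproducing this, a rough sketch of how I've been pulling identity/certificate errors out of the proxy logs. The namespace and workload names are placeholders, and the sample log line below is made up to show the filter; in a real cluster the input would come from `kubectl logs -n <namespace> deploy/<workload> -c linkerd-proxy`:

```shell
#!/bin/sh
# Sketch: filter linkerd-proxy log lines for identity/certificate errors.
# The sample lines below are stand-ins for real proxy output.
filter_identity_errors() {
  grep -iE 'NotValidYet|InvalidCertificate|identity'
}

printf '%s\n' \
  'WARN ThreadId(1) identity: certificate is NotValidYet' \
  'INFO ThreadId(1) outbound: connection closed' |
  filter_identity_errors
```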
Replies: 1 comment
-
Found the source of the problem: clock drift between the nodes (Raspberry Pis) exceeded the lifetime of the certificates issued by the control plane (from the logs, it looks to be on the order of 20 seconds). This meant the workloads couldn't validate the generated certificates, resulting in the `NotValidYet` error in the logs above.
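Since the fix turned out to be clock sync, here's a minimal sketch of the check that would have caught this: compare epoch timestamps from the two nodes against the ~20 s certificate lifetime seen in the logs. Gathering the remote timestamp over ssh and the node names are assumptions; on the Pis themselves, enabling NTP (e.g. `timedatectl set-ntp true`, or installing chrony) keeps the clocks in sync:

```shell
#!/bin/sh
# Sketch: flag clock drift larger than the ~20 s proxy certificate lifetime.
# On a real cluster, gather the timestamps with something like:
#   ssh pi-node-1 date +%s   and   ssh pi-node-2 date +%s
# (node names are placeholders).
MAX_DRIFT=20

# drift_ok T1 T2 -> success if |T1 - T2| <= MAX_DRIFT seconds
drift_ok() {
  d=$(( $1 - $2 ))
  if [ "$d" -lt 0 ]; then d=$(( -d )); fi
  [ "$d" -le "$MAX_DRIFT" ]
}

# Example using the local clock twice (drift is effectively zero):
if drift_ok "$(date +%s)" "$(date +%s)"; then
  echo "clocks within ${MAX_DRIFT}s of each other"
else
  echo "drift exceeds ${MAX_DRIFT}s; fix NTP before trusting short-lived certs"
fi
```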