-
It sounds like you have etcd S3 settings in your config file. Can you try with:
-
Thank you so much for the reply, Brad! Before proceeding, I need to clarify the situation here so you have all the information:
Now, I tried what you suggested. I get another error this time, and I feel like that is because I created a new cluster. What do you think?
-
FYI: Tomorrow I plan to verify whether a simple snapshot restore works when I do not recreate the cluster, and to test a simpler setup with only a single node.
-
I've had some interesting findings! I created a new single-node RKE2 cluster and did some experiments. First I tried taking a snapshot and restoring it on the same node. That worked fine (as expected). Then I recreated the node using Terraform, installed RKE2 on it and tried restoring the same snapshot again. That did not work! Based on my prior knowledge of etcd snapshots, as well as the fact that there is a section in the RKE2 docs for "Restoring a Snapshot to New Nodes", I thought this should work. Was I wrong? Have I misunderstood the docs?

Single node, no recreation
This was successful:
$ ssh rancher-node011
[admin@rancher-node011 ~]$ curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_VERSION=v1.28.10+rke2r1 sh -
[admin@rancher-node011 ~]$ sudo systemctl enable rke2-server.service
[admin@rancher-node011 ~]$ sudo systemctl start rke2-server.service
[admin@rancher-node011 ~]$ sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml create ns foo
[admin@rancher-node011 ~]$ sudo rke2 etcd-snapshot save
INFO[0000] Snapshot on-demand-rancher-node011-1720423972 saved.
[admin@rancher-node011 ~]$ sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml delete ns foo
[admin@rancher-node011 ~]$ sudo systemctl stop rke2-server
[admin@rancher-node011 ~]$ sudo rke2 server --cluster-reset --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720423972
WARN[0000] not running in CIS mode
INFO[0000] Applying Pod Security Admission Configuration
INFO[0000] Static pod cleanup in progress
INFO[0000] Logging temporary containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0000] Running temporary containerd /var/lib/rancher/rke2/bin/containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0010] Removing pod etcd-rancher-node011
INFO[0010] Removing pod kube-apiserver-rancher-node011
INFO[0020] Static pod cleanup completed successfully
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-api-server-agent-load-balancer.json: no such file or directory
INFO[0020] Starting rke2 v1.28.10+rke2r1 (b0d0d687d98f4fa015e7b30aaf2807b50edcc5d7)
INFO[0020] Managed etcd cluster bootstrap already complete and initialized
INFO[0020] Pre-restore etcd database moved to /var/lib/rancher/rke2/server/db/etcd-old-1720424151
{"level":"info","ts":"2024-07-08T09:35:51.056292+0200","caller":"snapshot/v3_snapshot.go:248","msg":"restoring snapshot","path":"/var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720423972","wal-dir":"/var/lib/rancher/rke2/server/db/etcd/member/wal","data-dir":"/var/lib/rancher/rke2/server/db/etcd","snap-dir":"/var/lib/rancher/rke2/server/db/etcd/member/snap","stack":"go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/go/pkg/mod/github.com/k3s-io/etcd/etcdutl/[email protected]/snapshot/v3_snapshot.go:254\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Restore\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:1486\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Reset\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:410\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/managed.go:71\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).Start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/cluster.go:91\ngithub.com/k3s-io/k3s/pkg/daemons/control.prepare\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:261\ngithub.com/k3s-io/k3s/pkg/daemons/control.Server\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:35\ngithub.com/k3s-io/k3s/pkg/server.StartServer\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/server/server.go:56\ngithub.com/k3s-io/k3s/pkg/cli/server.run\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:498\ngithub.com/k3s-io/k3s/pkg/cli/server.RunWithControllers\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:48\ngithub.com/rancher/rke2/pkg/rke2.Server\n\t/source/pkg/rke2/rke2.go:123\ngithub.com/rancher/rke2/pkg/cli/cmds.ServerRun\n\t/source/pkg/cli/cmds/server.go:168\ngithub.com/urfave/cli.HandleAction\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:524\ngithub.com/urfave/cli.Command.Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/command.go:175\ngithub.com/urfave/cli.(*App).Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:277\nmain.main\n\t/source/main.go:23\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
{"level":"info","ts":"2024-07-08T09:35:51.101282+0200","caller":"membership/store.go:141","msg":"Trimming membership information from the backend..."}
{"level":"info","ts":"2024-07-08T09:35:51.105414+0200","caller":"membership/cluster.go:421","msg":"added member","cluster-id":"220455547a6da141","local-member-id":"0","added-peer-id":"df593b333a718eb9","added-peer-peer-urls":["https://130.237.255.11:2380"]}
{"level":"info","ts":"2024-07-08T09:35:51.112349+0200","caller":"snapshot/v3_snapshot.go:269","msg":"restored snapshot","path":"/var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720423972","wal-dir":"/var/lib/rancher/rke2/server/db/etcd/member/wal","data-dir":"/var/lib/rancher/rke2/server/db/etcd","snap-dir":"/var/lib/rancher/rke2/server/db/etcd/member/snap"}
INFO[0020] Starting etcd for new cluster, cluster-reset=true
INFO[0020] Server node token is available at /var/lib/rancher/rke2/server/token
INFO[0020] To join server node to cluster: rke2 server -s https://130.237.255.11:9345 -t ${SERVER_NODE_TOKEN}
INFO[0020] Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /run/k3s/containerd/containerd.sock: connect: connection refused"
INFO[0020] Agent node token is available at /var/lib/rancher/rke2/server/agent-token
INFO[0020] To join agent node to cluster: rke2 agent -s https://130.237.255.11:9345 -t ${AGENT_NODE_TOKEN}
INFO[0020] Wrote kubeconfig /etc/rancher/rke2/rke2.yaml
INFO[0020] Run: rke2 kubectl
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
INFO[0020] Adding server to load balancer rke2-agent-load-balancer: 127.0.0.1:9345
INFO[0020] Running load balancer rke2-agent-load-balancer 127.0.0.1:6444 -> [127.0.0.1:9345] [default: 127.0.0.1:9345]
INFO[0020] Running load balancer rke2-api-server-agent-load-balancer 127.0.0.1:6443 -> [] [default: ]
INFO[0021] Password verified locally for node rancher-node011
INFO[0021] certificate CN=rancher-node011 signed by CN=rke2-server-ca@1720423680: notBefore=2024-07-08 07:28:00 +0000 UTC notAfter=2025-07-08 07:35:52 +0000 UTC
INFO[0021] certificate CN=system:node:rancher-node011,O=system:nodes signed by CN=rke2-client-ca@1720423680: notBefore=2024-07-08 07:28:00 +0000 UTC notAfter=2025-07-08 07:35:52 +0000 UTC
INFO[0021] Module overlay was already loaded
INFO[0021] Module nf_conntrack was already loaded
INFO[0021] Module br_netfilter was already loaded
INFO[0021] Module iptable_nat was already loaded
INFO[0021] Module iptable_filter was already loaded
INFO[0021] Runtime image index.docker.io/rancher/rke2-runtime:v1.28.10-rke2r1 bin and charts directories already exist; skipping extract
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-csi-driver.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-calico-crd.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-calico.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-canal.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-validation-webhook.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-cloud-provider.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-coredns.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-flannel.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-metrics-server.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-controller.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rancher-vsphere-cpi.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-cilium.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-multus.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-controller-crd.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rancher-vsphere-csi.yaml to set cluster configuration values
WARN[0021] SELinux is enabled on this host, but rke2 has not been started with --selinux - containerd SELinux support is disabled
INFO[0021] Logging containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0021] Running containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0022] containerd is now running
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt
INFO[0022] Image index.docker.io/rancher/rke2-cloud-provider:v1.29.3-build20240412 has already been pulled
INFO[0022] Imported docker.io/rancher/rke2-cloud-provider:v1.29.3-build20240412
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt in 4.11883ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/etcd-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20240418 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20240418
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/etcd-image.txt in 1.867412ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt in 1.903957ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-controller-manager-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-controller-manager-image.txt in 1.16302ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-proxy-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-proxy-image.txt in 521.461µs
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-scheduler-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-scheduler-image.txt in 589.768µs
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/runtime-image.txt
INFO[0022] Image index.docker.io/rancher/rke2-runtime:v1.28.10-rke2r1 has already been pulled
INFO[0022] Imported docker.io/rancher/rke2-runtime:v1.28.10-rke2r1
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/runtime-image.txt in 1.772119ms
INFO[0022] Running kubelet --address=0.0.0.0 --alsologtostderr=false --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=rancher-node011 --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --kubelet-cgroups=/rke2 --log-file=/var/lib/rancher/rke2/agent/logs/kubelet.log --log-file-max-size=50 --logtostderr=false --node-ip=130.237.255.11 --node-labels= --pod-infra-container-image=index.docker.io/rancher/mirrored-pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --stderrthreshold=FATAL --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
INFO[0022] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-08T09:35:56.338674+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0009f8700/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0025] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0027] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-08T09:36:01.339652+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0009f8540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0030] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0032] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-08T09:36:06.340487+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0009f88c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0035] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0037] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[0040] Pod for etcd not synced (pod sandbox not found), retrying
{"level":"warn","ts":"2024-07-08T09:36:11.340944+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0009f8540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0040] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0042] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-08T09:36:16.34117+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000734700/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0045] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0047] Defragmenting etcd database
INFO[0047] etcd data store connection OK
INFO[0047] Waiting for API server to become available
INFO[0047] Saving cluster bootstrap data to datastore
INFO[0047] ETCD server is now running
INFO[0047] rke2 is up and running
WARN[0047] Bootstrap key already exists
INFO[0047] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[0052] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[0054] Defragmenting etcd database
INFO[0054] Reconciling bootstrap data between datastore and disk
INFO[0054] Cluster reset: backing up certificates directory to /var/lib/rancher/rke2/server/tls-1720424185
WARN[0054] Updating bootstrap data on disk from datastore
INFO[0054] certificate CN=etcd-peer signed by CN=etcd-peer-ca@1720423680: notBefore=2024-07-08 07:28:00 +0000 UTC notAfter=2025-07-08 07:36:25 +0000 UTC
INFO[0054] certificate CN=etcd-server signed by CN=etcd-server-ca@1720423680: notBefore=2024-07-08 07:28:00 +0000 UTC notAfter=2025-07-08 07:36:25 +0000 UTC
INFO[0054] Shutting down kubelet and etcd
ERRO[0054] Kubelet exited: signal: killed
INFO[0057] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[0059] Managed etcd cluster membership has been reset, restart without --cluster-reset flag now. Backup and delete ${datadir}/server/db on each peer etcd server and rejoin the nodes
[admin@rancher-node011 ~]$ sudo systemctl start rke2-server
[admin@rancher-node011 ~]$ sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get ns
NAME              STATUS   AGE
default           Active   9m46s
foo               Active   7m4s
kube-node-lease   Active   9m46s
kube-public       Active   9m46s
kube-system       Active   9m46s

Single node, with recreation
Restoring to a newly installed node failed! Starting with the known good snapshot from previous example:
[admin@rancher-node011 ~]$ sudo cp /var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720423972 /home/admin/
[admin@rancher-node011 ~]$ sudo chown admin:admin on-demand-rancher-node011-1720423972
[admin@rancher-node011 ~]$ exit
$ scp rancher-node011:/home/admin/on-demand-rancher-node011-1720423972 .
Then I ran:
$ scp ./on-demand-rancher-node011-1720423972 rancher-node011:/home/admin/
$ ssh rancher-node011
[admin@rancher-node011 ~]$ curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_VERSION=v1.28.10+rke2r1 sh -
[admin@rancher-node011 ~]$ sudo systemctl enable rke2-server.service
[admin@rancher-node011 ~]$ sudo systemctl start rke2-server.service
[admin@rancher-node011 ~]$ sudo systemctl stop rke2-server.service
[admin@rancher-node011 ~]$ sudo rke2 server --cluster-reset --cluster-reset-restore-path=/home/admin/on-demand-rancher-node011-1720423972
WARN[0000] not running in CIS mode
INFO[0000] Applying Pod Security Admission Configuration
INFO[0000] Static pod cleanup in progress
INFO[0000] Logging temporary containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0000] Running temporary containerd /var/lib/rancher/rke2/bin/containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0010] Removing pod etcd-rancher-node011
INFO[0010] Removing pod kube-apiserver-rancher-node011
INFO[0020] Static pod cleanup completed successfully
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-api-server-agent-load-balancer.json: no such file or directory
INFO[0020] Starting rke2 v1.28.10+rke2r1 (b0d0d687d98f4fa015e7b30aaf2807b50edcc5d7)
INFO[0020] Managed etcd cluster bootstrap already complete and initialized
INFO[0020] Pre-restore etcd database moved to /var/lib/rancher/rke2/server/db/etcd-old-1720425506
{"level":"info","ts":"2024-07-08T09:58:26.601845+0200","caller":"snapshot/v3_snapshot.go:248","msg":"restoring snapshot","path":"/home/admin/on-demand-rancher-node011-1720423972","wal-dir":"/var/lib/rancher/rke2/server/db/etcd/member/wal","data-dir":"/var/lib/rancher/rke2/server/db/etcd","snap-dir":"/var/lib/rancher/rke2/server/db/etcd/member/snap","stack":"go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/go/pkg/mod/github.com/k3s-io/etcd/etcdutl/[email protected]/snapshot/v3_snapshot.go:254\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Restore\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:1486\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Reset\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:410\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/managed.go:71\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).Start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/cluster.go:91\ngithub.com/k3s-io/k3s/pkg/daemons/control.prepare\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:261\ngithub.com/k3s-io/k3s/pkg/daemons/control.Server\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:35\ngithub.com/k3s-io/k3s/pkg/server.StartServer\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/server/server.go:56\ngithub.com/k3s-io/k3s/pkg/cli/server.run\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:498\ngithub.com/k3s-io/k3s/pkg/cli/server.RunWithControllers\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:48\ngithub.com/rancher/rke2/pkg/rke2.Server\n\t/source/pkg/rke2/rke2.go:123\ngithub.com/rancher/rke2/pkg/cli/cmds.ServerRun\n\t/source/pkg/cli/cmds/server.go:168\ngithub.com/urfave/cli.HandleAction\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:524\ngithub.com/urfave/cli.Command.Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/command.go:175\ngithub.com/urfave/cli.(*App).Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:277\nmain.main\n\t/source/main.go:23\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
{"level":"info","ts":"2024-07-08T09:58:26.639679+0200","caller":"membership/store.go:141","msg":"Trimming membership information from the backend..."}
{"level":"info","ts":"2024-07-08T09:58:26.654626+0200","caller":"membership/cluster.go:421","msg":"added member","cluster-id":"220455547a6da141","local-member-id":"0","added-peer-id":"df593b333a718eb9","added-peer-peer-urls":["https://130.237.255.11:2380"]}
{"level":"info","ts":"2024-07-08T09:58:26.660707+0200","caller":"snapshot/v3_snapshot.go:269","msg":"restored snapshot","path":"/home/admin/on-demand-rancher-node011-1720423972","wal-dir":"/var/lib/rancher/rke2/server/db/etcd/member/wal","data-dir":"/var/lib/rancher/rke2/server/db/etcd","snap-dir":"/var/lib/rancher/rke2/server/db/etcd/member/snap"}
INFO[0020] Starting etcd for new cluster, cluster-reset=true
INFO[0020] Server node token is available at /var/lib/rancher/rke2/server/token
INFO[0020] Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /run/k3s/containerd/containerd.sock: connect: connection refused"
INFO[0020] To join server node to cluster: rke2 server -s https://130.237.255.11:9345 -t ${SERVER_NODE_TOKEN}
INFO[0020] Agent node token is available at /var/lib/rancher/rke2/server/agent-token
INFO[0020] To join agent node to cluster: rke2 agent -s https://130.237.255.11:9345 -t ${AGENT_NODE_TOKEN}
INFO[0020] Wrote kubeconfig /etc/rancher/rke2/rke2.yaml
INFO[0020] Run: rke2 kubectl
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
INFO[0020] Adding server to load balancer rke2-agent-load-balancer: 127.0.0.1:9345
INFO[0020] Running load balancer rke2-agent-load-balancer 127.0.0.1:6444 -> [127.0.0.1:9345] [default: 127.0.0.1:9345]
INFO[0020] Running load balancer rke2-api-server-agent-load-balancer 127.0.0.1:6443 -> [] [default: ]
INFO[0021] Password verified locally for node rancher-node011
INFO[0021] certificate CN=rancher-node011 signed by CN=rke2-server-ca@1720425416: notBefore=2024-07-08 07:56:56 +0000 UTC notAfter=2025-07-08 07:58:27 +0000 UTC
INFO[0021] certificate CN=system:node:rancher-node011,O=system:nodes signed by CN=rke2-client-ca@1720425416: notBefore=2024-07-08 07:56:56 +0000 UTC notAfter=2025-07-08 07:58:27 +0000 UTC
INFO[0021] Module overlay was already loaded
INFO[0021] Module nf_conntrack was already loaded
INFO[0021] Module br_netfilter was already loaded
INFO[0021] Module iptable_nat was already loaded
INFO[0021] Module iptable_filter was already loaded
INFO[0021] Runtime image index.docker.io/rancher/rke2-runtime:v1.28.10-rke2r1 bin and charts directories already exist; skipping extract
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-calico-crd.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-validation-webhook.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rancher-vsphere-cpi.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rancher-vsphere-csi.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-cilium.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-flannel.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-controller.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-csi-driver.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-canal.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-coredns.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-controller-crd.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-cloud-provider.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-calico.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-metrics-server.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-multus.yaml to set cluster configuration values
WARN[0021] SELinux is enabled on this host, but rke2 has not been started with --selinux - containerd SELinux support is disabled
INFO[0021] Logging containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0021] Running containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0022] containerd is now running
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt
INFO[0022] Image index.docker.io/rancher/rke2-cloud-provider:v1.29.3-build20240412 has already been pulled
INFO[0022] Imported docker.io/rancher/rke2-cloud-provider:v1.29.3-build20240412
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt in 3.05048ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/etcd-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20240418 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20240418
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/etcd-image.txt in 2.855414ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt in 2.930525ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-controller-manager-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-controller-manager-image.txt in 566.075µs
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-proxy-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-proxy-image.txt in 1.743271ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-scheduler-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-scheduler-image.txt in 1.543467ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/runtime-image.txt
INFO[0022] Image index.docker.io/rancher/rke2-runtime:v1.28.10-rke2r1 has already been pulled
INFO[0022] Imported docker.io/rancher/rke2-runtime:v1.28.10-rke2r1
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/runtime-image.txt in 1.627189ms
INFO[0022] Running kubelet --address=0.0.0.0 --alsologtostderr=false --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=rancher-node011 --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --kubelet-cgroups=/rke2 --log-file=/var/lib/rancher/rke2/agent/logs/kubelet.log --log-file-max-size=50 --logtostderr=false --node-ip=130.237.255.11 --node-labels= --pod-infra-container-image=index.docker.io/rancher/mirrored-pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --stderrthreshold=FATAL --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
INFO[0022] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-08T09:58:31.878154+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00083a1c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0025] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0025] Defragmenting etcd database
INFO[0025] etcd data store connection OK
INFO[0025] Saving cluster bootstrap data to datastore
INFO[0025] ETCD server is now running
INFO[0025] rke2 is up and running
INFO[0025] Waiting for API server to become available
panic: bootstrap data already found and encrypted with different token
goroutine 316 [running]:
github.com/k3s-io/k3s/pkg/cluster.(*Cluster).Start.func1()
/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/cluster.go:122 +0xc7
created by github.com/k3s-io/k3s/pkg/cluster.(*Cluster).Start in goroutine 1
/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/cluster.go:117 +0x5ad
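For anyone skimming these long restore logs, here is a small sketch (not part of RKE2) that picks out the line deciding the outcome of a cluster-reset run from a saved log file. The function name `classify_reset_log` and its file argument are hypothetical. The patterns are copied from the logs in this thread, so any other failure mode would come out as `unknown`.

```shell
# Classify a saved `rke2 server --cluster-reset ...` log by its deciding line.
# Assumption: the patterns below are the ones seen in this thread only.
classify_reset_log() {
  # $1: path to a file holding the captured restore output
  if grep -q 'Managed etcd cluster membership has been reset' "$1"; then
    echo success
  elif grep -q -e '^panic:' -e '^FATA\[' "$1"; then
    echo failed
  else
    echo unknown
  fi
}
```

If the restore output is saved to a file (for example by piping it through `tee`), calling the function on that file should print `failed` for the run above, since it ends in a `panic:` line.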
-
Continuing my experiments, I had similar findings for my original multi-node config. As far as I can tell, it was exactly the same except I had to set
Looking forward to your input on this! Does it sound like two (potentially separate) issues to you also?
And just for completeness, here's my new experiment:
Multi-node, no recreation
Created using the same config as in my original post. Notice how it fails when I run the cluster reset without
[admin@rancher-node011 ~]$ sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml create ns foo
[admin@rancher-node011 ~]$ sudo systemctl stop rke2-server
[admin@rancher-node012 ~]$ sudo systemctl stop rke2-server
[admin@rancher-node013 ~]$ sudo systemctl stop rke2-server
[admin@rancher-node011 ~]$ sudo rke2 server --cluster-reset --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720427121
WARN[0000] not running in CIS mode
INFO[0000] Applying Pod Security Admission Configuration
INFO[0000] Static pod cleanup in progress
INFO[0000] Logging temporary containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0000] Running temporary containerd /var/lib/rancher/rke2/bin/containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0010] Removing pod etcd-rancher-node011
INFO[0010] Removing pod kube-apiserver-rancher-node011
INFO[0020] Static pod cleanup completed successfully
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-api-server-agent-load-balancer.json: no such file or directory
INFO[0020] Starting rke2 v1.28.10+rke2r1 (b0d0d687d98f4fa015e7b30aaf2807b50edcc5d7)
INFO[0020] Managed etcd cluster bootstrap already complete and initialized
INFO[0020] Retrieving etcd snapshot /var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720427121 from S3
INFO[0020] Checking if S3 bucket k1h-rancher-dev-etcd-snapshots exists
INFO[0020] S3 bucket k1h-rancher-dev-etcd-snapshots exists
FATA[0020] starting kubernetes: preparing server: start managed database: The specified key does not exist.
[admin@rancher-node011 ~]$ sudo rke2 server --etcd-s3=false --cluster-reset --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720427121
WARN[0000] not running in CIS mode
INFO[0000] Applying Pod Security Admission Configuration
INFO[0000] Static pod cleanup in progress
INFO[0000] Logging temporary containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0000] Running temporary containerd /var/lib/rancher/rke2/bin/containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0010] Removing pod cloud-controller-manager-rancher-node011
INFO[0010] Removing pod kube-controller-manager-rancher-node011
INFO[0010] Removing pod kube-scheduler-rancher-node011
INFO[0020] Static pod cleanup completed successfully
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-api-server-agent-load-balancer.json: no such file or directory
INFO[0020] Starting rke2 v1.28.10+rke2r1 (b0d0d687d98f4fa015e7b30aaf2807b50edcc5d7)
INFO[0020] Managed etcd cluster bootstrap already complete and initialized
INFO[0020] Pre-restore etcd database moved to /var/lib/rancher/rke2/server/db/etcd-old-1720427323
{"level":"info","ts":"2024-07-08T10:28:43.336561+0200","caller":"snapshot/v3_snapshot.go:248","msg":"restoring snapshot","path":"/var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720427121","wal-dir":"/var/lib/rancher/rke2/server/db/etcd/member/wal","data-dir":"/var/lib/rancher/rke2/server/db/etcd","snap-dir":"/var/lib/rancher/rke2/server/db/etcd/member/snap","stack":"go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/go/pkg/mod/github.com/k3s-io/etcd/etcdutl/[email protected]/snapshot/v3_snapshot.go:254\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Restore\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:1486\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Reset\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:410\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/managed.go:71\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).Start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/cluster.go:91\ngithub.com/k3s-io/k3s/pkg/daemons/control.prepare\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:261\ngithub.com/k3s-io/k3s/pkg/daemons/control.Server\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:35\ngithub.com/k3s-io/k3s/pkg/server.StartServer\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/server/server.go:56\ngithub.com/k3s-io/k3s/pkg/cli/server.run\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:498\ngithub.com/k3s-io/k3s/pkg/cli/server.RunWithControllers\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:48\ngithub.com/rancher/rke2/pkg/rke2.Server\n\t/source/pkg/rke2/rke2.go:123\ngithub.com/rancher/rke2/pkg/cli/cmds.ServerRun\n\t/source/pkg/cli/cmds/server.go:168\ngithub.com/urfave/cli.HandleAction\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:524\ngithub.com/urfave/cli.Command.Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/command.go:175\ngithub.com/urfave/cli.(*App).Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:277\nmain.main\n\t/source/main.go:23\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
{"level":"info","ts":"2024-07-08T10:28:43.421335+0200","caller":"membership/store.go:141","msg":"Trimming membership information from the backend..."}
{"level":"info","ts":"2024-07-08T10:28:43.426676+0200","caller":"membership/cluster.go:421","msg":"added member","cluster-id":"220455547a6da141","local-member-id":"0","added-peer-id":"df593b333a718eb9","added-peer-peer-urls":["https://130.237.255.11:2380"]}
{"level":"info","ts":"2024-07-08T10:28:43.435079+0200","caller":"snapshot/v3_snapshot.go:269","msg":"restored snapshot","path":"/var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720427121","wal-dir":"/var/lib/rancher/rke2/server/db/etcd/member/wal","data-dir":"/var/lib/rancher/rke2/server/db/etcd","snap-dir":"/var/lib/rancher/rke2/server/db/etcd/member/snap"}
INFO[0020] Starting etcd for new cluster, cluster-reset=true
INFO[0020] Server node token is available at /var/lib/rancher/rke2/server/token
INFO[0020] To join server node to cluster: rke2 server -s https://130.237.255.11:9345 -t ${SERVER_NODE_TOKEN}
INFO[0020] Agent node token is available at /var/lib/rancher/rke2/server/agent-token
INFO[0020] Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /run/k3s/containerd/containerd.sock: connect: connection refused"
INFO[0020] To join agent node to cluster: rke2 agent -s https://130.237.255.11:9345 -t ${AGENT_NODE_TOKEN}
INFO[0020] Wrote kubeconfig /etc/rancher/rke2/rke2.yaml
INFO[0020] Run: rke2 kubectl
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
INFO[0020] Adding server to load balancer rke2-agent-load-balancer: 127.0.0.1:9345
INFO[0020] Running load balancer rke2-agent-load-balancer 127.0.0.1:6444 -> [127.0.0.1:9345] [default: 127.0.0.1:9345]
INFO[0020] Running load balancer rke2-api-server-agent-load-balancer 127.0.0.1:6443 -> [] [default: ]
INFO[0021] Password verified locally for node rancher-node011
INFO[0021] certificate CN=rancher-node011 signed by CN=rke2-server-ca@1720426618: notBefore=2024-07-08 08:16:58 +0000 UTC notAfter=2025-07-08 08:28:44 +0000 UTC
INFO[0021] certificate CN=system:node:rancher-node011,O=system:nodes signed by CN=rke2-client-ca@1720426618: notBefore=2024-07-08 08:16:58 +0000 UTC notAfter=2025-07-08 08:28:44 +0000 UTC
INFO[0021] Module overlay was already loaded
INFO[0021] Module nf_conntrack was already loaded
INFO[0021] Module br_netfilter was already loaded
INFO[0021] Module iptable_nat was already loaded
INFO[0021] Module iptable_filter was already loaded
INFO[0021] Runtime image index.docker.io/rancher/rke2-runtime:v1.28.10-rke2r1 bin and charts directories already exist; skipping extract
INFO[0021] No cluster configuration value changes necessary for manifest /var/lib/rancher/rke2/server/manifests/kube-vip.yaml
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rancher-vsphere-cpi.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rancher-vsphere-csi.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-metrics-server.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-controller.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-calico-crd.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-cloud-provider.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-calico.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-canal.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-flannel.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-multus.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-csi-driver.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-cilium.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-coredns.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-controller-crd.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-validation-webhook.yaml to set cluster configuration values
WARN[0021] SELinux is enabled on this host, but rke2 has not been started with --selinux - containerd SELinux support is disabled
INFO[0021] Logging containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0021] Running containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0022] containerd is now running
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt
INFO[0022] Image index.docker.io/rancher/rke2-cloud-provider:v1.29.3-build20240412 has already been pulled
INFO[0022] Imported docker.io/rancher/rke2-cloud-provider:v1.29.3-build20240412
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt in 2.675436ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/etcd-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20240418 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20240418
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/etcd-image.txt in 1.794877ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt in 1.831026ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-controller-manager-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-controller-manager-image.txt in 1.394561ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-proxy-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-proxy-image.txt in 604.682µs
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-scheduler-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-scheduler-image.txt in 2.490358ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/runtime-image.txt
INFO[0022] Image index.docker.io/rancher/rke2-runtime:v1.28.10-rke2r1 has already been pulled
INFO[0022] Imported docker.io/rancher/rke2-runtime:v1.28.10-rke2r1
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/runtime-image.txt in 1.677863ms
INFO[0022] Running kubelet --address=0.0.0.0 --alsologtostderr=false --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=rancher-node011 --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --kubelet-cgroups=/rke2 --log-file=/var/lib/rancher/rke2/agent/logs/kubelet.log --log-file-max-size=50 --logtostderr=false --node-ip=130.237.255.11 --node-labels= --pod-infra-container-image=index.docker.io/rancher/mirrored-pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --stderrthreshold=FATAL --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
INFO[0022] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-08T10:28:48.647685+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0009c9880/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0025] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0027] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-08T10:28:53.648939+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0009c9a40/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0030] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0032] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-08T10:28:58.64993+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000c7ec40/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0035] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0037] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[0040] Pod for etcd not synced (pod sandbox not found), retrying
{"level":"warn","ts":"2024-07-08T10:29:03.650555+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000c7ee00/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0040] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0042] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-08T10:29:08.651518+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0009c9880/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0045] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0047] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[0049] Defragmenting etcd database
INFO[0049] etcd data store connection OK
INFO[0049] Waiting for API server to become available
INFO[0049] ETCD server is now running
INFO[0049] rke2 is up and running
INFO[0049] Saving cluster bootstrap data to datastore
WARN[0049] Bootstrap key already exists
INFO[0051] Defragmenting etcd database
INFO[0051] Reconciling bootstrap data between datastore and disk
INFO[0051] Cluster reset: backing up certificates directory to /var/lib/rancher/rke2/server/tls-1720427355
WARN[0051] Updating bootstrap data on disk from datastore
INFO[0051] certificate CN=etcd-peer signed by CN=etcd-peer-ca@1720426618: notBefore=2024-07-08 08:16:58 +0000 UTC notAfter=2025-07-08 08:29:15 +0000 UTC
INFO[0051] certificate CN=etcd-server signed by CN=etcd-server-ca@1720426618: notBefore=2024-07-08 08:16:58 +0000 UTC notAfter=2025-07-08 08:29:15 +0000 UTC
INFO[0051] Shutting down kubelet and etcd
ERRO[0051] Kubelet exited: signal: killed
INFO[0052] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[0056] Managed etcd cluster membership has been reset, restart without --cluster-reset flag now. Backup and delete ${datadir}/server/db on each peer etcd server and rejoin the nodes
[admin@rancher-node011 ~]$ sudo systemctl start rke2-server
[admin@rancher-node011 ~]$ sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get ns
NAME              STATUS   AGE
default           Active   13m
foo               Active   6m16s
kube-node-lease   Active   13m
kube-public       Active   13m
kube-system       Active   13m
kube-vip          Active   12m
system-upgrade    Active   9m4s
Multi-node, with recreation
[admin@rancher-node011 ~]$ sudo cp /var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720427121 /home/admin/
[admin@rancher-node011 ~]$ sudo chown admin:admin /home/admin/on-demand-rancher-node011-1720427121
[admin@rancher-node011 ~]$ exit
$ scp rancher-node011:/home/admin/on-demand-rancher-node011-1720427121 .
Recreate the nodes using Terraform…
$ scp ./on-demand-rancher-node011-1720427121 rancher-node011:/home/admin/
[admin@rancher-node011 ~]$ sudo systemctl stop rke2-server
[admin@rancher-node012 ~]$ sudo systemctl stop rke2-server
[admin@rancher-node013 ~]$ sudo systemctl stop rke2-server
[admin@rancher-node011 ~]$ sudo rke2 server --etcd-s3=false --cluster-reset --cluster-reset-restore-path=/home/admin/on-demand-rancher-node011-1720427121
WARN[0000] not running in CIS mode
INFO[0000] Applying Pod Security Admission Configuration
INFO[0000] Static pod cleanup in progress
INFO[0000] Logging temporary containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0000] Running temporary containerd /var/lib/rancher/rke2/bin/containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0010] Removing pod etcd-rancher-node011
INFO[0010] Removing pod kube-apiserver-rancher-node011
INFO[0020] Static pod cleanup completed successfully
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-api-server-agent-load-balancer.json: no such file or directory
INFO[0020] Starting rke2 v1.28.10+rke2r1 (b0d0d687d98f4fa015e7b30aaf2807b50edcc5d7)
INFO[0020] Managed etcd cluster bootstrap already complete and initialized
INFO[0020] Pre-restore etcd database moved to /var/lib/rancher/rke2/server/db/etcd-old-1720429331
{"level":"info","ts":"2024-07-08T11:02:11.00826+0200","caller":"snapshot/v3_snapshot.go:248","msg":"restoring snapshot","path":"/home/admin/on-demand-rancher-node011-1720427121","wal-dir":"/var/lib/rancher/rke2/server/db/etcd/member/wal","data-dir":"/var/lib/rancher/rke2/server/db/etcd","snap-dir":"/var/lib/rancher/rke2/server/db/etcd/member/snap","stack":"go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/go/pkg/mod/github.com/k3s-io/etcd/etcdutl/[email protected]/snapshot/v3_snapshot.go:254\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Restore\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:1486\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Reset\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:410\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/managed.go:71\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).Start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/cluster.go:91\ngithub.com/k3s-io/k3s/pkg/daemons/control.prepare\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:261\ngithub.com/k3s-io/k3s/pkg/daemons/control.Server\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:35\ngithub.com/k3s-io/k3s/pkg/server.StartServer\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/server/server.go:56\ngithub.com/k3s-io/k3s/pkg/cli/server.run\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:498\ngithub.com/k3s-io/k3s/pkg/cli/server.RunWithControllers\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:48\ngithub.com/rancher/rke2/pkg/rke2.Server\n\t/source/pkg/rke2/rke2.go:123\ngithub.com/rancher/rke2/pkg/cli/cmds.ServerRun\n\t/source/pkg/cli/cmds/server.go:168\ngithub.com/urfave/cli.HandleAction\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:524\ngithub.com/urfave/cli.Command.Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/command.go:175\ngithub.com/urfave/cli.(*App).Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:277\nmain.main\n\t/source/main.go:23\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
{"level":"info","ts":"2024-07-08T11:02:11.114223+0200","caller":"membership/store.go:141","msg":"Trimming membership information from the backend..."}
{"level":"info","ts":"2024-07-08T11:02:11.148311+0200","caller":"membership/cluster.go:421","msg":"added member","cluster-id":"220455547a6da141","local-member-id":"0","added-peer-id":"df593b333a718eb9","added-peer-peer-urls":["https://130.237.255.11:2380"]}
{"level":"info","ts":"2024-07-08T11:02:11.158099+0200","caller":"snapshot/v3_snapshot.go:269","msg":"restored snapshot","path":"/home/admin/on-demand-rancher-node011-1720427121","wal-dir":"/var/lib/rancher/rke2/server/db/etcd/member/wal","data-dir":"/var/lib/rancher/rke2/server/db/etcd","snap-dir":"/var/lib/rancher/rke2/server/db/etcd/member/snap"}
INFO[0020] Starting etcd for new cluster, cluster-reset=true
INFO[0020] Server node token is available at /var/lib/rancher/rke2/server/token
INFO[0020] To join server node to cluster: rke2 server -s https://130.237.255.11:9345 -t ${SERVER_NODE_TOKEN}
INFO[0020] Agent node token is available at /var/lib/rancher/rke2/server/agent-token
INFO[0020] Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /run/k3s/containerd/containerd.sock: connect: connection refused"
INFO[0020] To join agent node to cluster: rke2 agent -s https://130.237.255.11:9345 -t ${AGENT_NODE_TOKEN}
INFO[0020] Wrote kubeconfig /etc/rancher/rke2/rke2.yaml
INFO[0020] Run: rke2 kubectl
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
INFO[0020] Adding server to load balancer rke2-agent-load-balancer: 127.0.0.1:9345
INFO[0020] Running load balancer rke2-agent-load-balancer 127.0.0.1:6444 -> [127.0.0.1:9345] [default: 127.0.0.1:9345]
INFO[0020] Running load balancer rke2-api-server-agent-load-balancer 127.0.0.1:6443 -> [] [default: ]
INFO[0021] Password verified locally for node rancher-node011
INFO[0021] certificate CN=rancher-node011 signed by CN=rke2-server-ca@1720428863: notBefore=2024-07-08 08:54:23 +0000 UTC notAfter=2025-07-08 09:02:12 +0000 UTC
INFO[0021] certificate CN=system:node:rancher-node011,O=system:nodes signed by CN=rke2-client-ca@1720428863: notBefore=2024-07-08 08:54:23 +0000 UTC notAfter=2025-07-08 09:02:12 +0000 UTC
INFO[0021] Module overlay was already loaded
INFO[0021] Module nf_conntrack was already loaded
INFO[0021] Module br_netfilter was already loaded
INFO[0021] Module iptable_nat was already loaded
INFO[0021] Module iptable_filter was already loaded
INFO[0021] Runtime image index.docker.io/rancher/rke2-runtime:v1.28.10-rke2r1 bin and charts directories already exist; skipping extract
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-flannel.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-controller.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-cloud-provider.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rancher-vsphere-csi.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-canal.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-cilium.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-calico.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-metrics-server.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-multus.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-validation-webhook.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-csi-driver.yaml to set cluster configuration values
INFO[0021] No cluster configuration value changes necessary for manifest /var/lib/rancher/rke2/server/manifests/kube-vip.yaml
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-coredns.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-controller-crd.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rancher-vsphere-cpi.yaml to set cluster configuration values
INFO[0021] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-calico-crd.yaml to set cluster configuration values
WARN[0021] SELinux is enabled on this host, but rke2 has not been started with --selinux - containerd SELinux support is disabled
INFO[0021] Logging containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0021] Running containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0022] containerd is now running
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt
INFO[0022] Image index.docker.io/rancher/rke2-cloud-provider:v1.29.3-build20240412 has already been pulled
INFO[0022] Imported docker.io/rancher/rke2-cloud-provider:v1.29.3-build20240412
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt in 3.71657ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/etcd-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20240418 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20240418
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/etcd-image.txt in 3.040854ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt in 1.782362ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-controller-manager-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-controller-manager-image.txt in 1.085357ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-proxy-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-proxy-image.txt in 1.012657ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/kube-scheduler-image.txt
INFO[0022] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0022] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/kube-scheduler-image.txt in 2.309809ms
INFO[0022] Pulling images from /var/lib/rancher/rke2/agent/images/runtime-image.txt
INFO[0022] Image index.docker.io/rancher/rke2-runtime:v1.28.10-rke2r1 has already been pulled
INFO[0022] Imported docker.io/rancher/rke2-runtime:v1.28.10-rke2r1
INFO[0022] Imported images from /var/lib/rancher/rke2/agent/images/runtime-image.txt in 1.900632ms
INFO[0022] Running kubelet --address=0.0.0.0 --alsologtostderr=false --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=rancher-node011 --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --kubelet-cgroups=/rke2 --log-file=/var/lib/rancher/rke2/agent/logs/kubelet.log --log-file-max-size=50 --logtostderr=false --node-ip=130.237.255.11 --node-labels= --pod-infra-container-image=index.docker.io/rancher/mirrored-pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --stderrthreshold=FATAL --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
INFO[0022] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-08T11:02:16.376379+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000ad5180/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0025] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0028] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-08T11:02:21.377144+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000873180/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0030] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0033] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-08T11:02:26.377726+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000ff81c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0035] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0038] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-08T11:02:31.37802+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000ff8380/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[0040] Pod for etcd not synced (pod sandbox not found), retrying
WARN[0040] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0043] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[0045] Defragmenting etcd database
FATA[0047] bootstrap data already found and encrypted with different token
-
Both of these are expected results:
If you have S3 enabled when restoring, you are expected to pass the name of the S3 snapshot to restore, not the path to a file on disk. If you want to use a local snapshot instead of the S3 snapshot, do not enable S3 when restoring.
You're not using the same token value. If you restore to the same nodes, the token already exists on disk and stays consistent. If you build new nodes and restore to them, a new token is generated, and it differs from the token value used previously. You should either specify the token manually in your Terraform configuration, or capture the automatically generated random token from the initial nodes and reuse it when restoring. This is covered in the docs: https://docs.rke2.io/backup_restore#other-notes-on-restoring-a-snapshot
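As a rough illustration of the two restore modes described above, here is a minimal shell sketch. It only prints the command it would run and never invokes `rke2`. The function name `build_restore_cmd` and its arguments are hypothetical; the flags are the ones already used in this thread, and the token is assumed to have been captured from the original node beforehand (the server logs here point at /var/lib/rancher/rke2/server/token).

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints a restore command, does not execute it.
# build_restore_cmd is a hypothetical helper, not part of rke2.
#
# Usage: build_restore_cmd MODE TOKEN SNAPSHOT
#   MODE=s3    -> SNAPSHOT is the snapshot *name* in the bucket
#                 (assumes S3 settings are already in the rke2 config file)
#   MODE=local -> SNAPSHOT is a *path* on disk, and S3 is disabled
#                 for this invocation
build_restore_cmd() {
  local mode="$1" token="$2" snapshot="$3"
  case "$mode" in
    s3)
      printf 'rke2 server --token %s --cluster-reset --cluster-reset-restore-path=%s\n' \
        "$token" "$snapshot"
      ;;
    local)
      printf 'rke2 server --token %s --etcd-s3=false --cluster-reset --cluster-reset-restore-path=%s\n' \
        "$token" "$snapshot"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}
```

For example, `build_restore_cmd local "$OLD_RKE2_TOKEN" /home/admin/on-demand-rancher-node011-1720512744` prints the local-file variant. Keeping the two modes in separate branches makes it harder to mix a file path with an S3-enabled restore by accident. The printed command is not shell-quoted, so treat it as something to read, not to paste blindly.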
-
I tried restoring on a new cluster while providing the old cluster token, but that still failed in the same way as my original issue (empty snapshot file):
Restore from S3 fails
[admin@rancher-node011 ~]$ sudo ls -hl /var/lib/rancher/rke2/server/db/snapshots
total 0
[admin@rancher-node011 ~]$ sudo rke2 server --token "${OLD_RKE2_TOKEN:?}" --cluster-reset --cluster-reset-restore-path on-demand-rancher-node011-1720512744
WARN[0000] not running in CIS mode
INFO[0000] Applying Pod Security Admission Configuration
INFO[0000] Static pod cleanup in progress
INFO[0000] Logging temporary containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0000] Running temporary containerd /var/lib/rancher/rke2/bin/containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0010] Removing pod etcd-rancher-node011
INFO[0010] Removing pod kube-apiserver-rancher-node011
INFO[0020] Static pod cleanup completed successfully
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
WARN[0020] remove /var/lib/rancher/rke2/agent/etc/rke2-api-server-agent-load-balancer.json: no such file or directory
INFO[0020] Starting rke2 v1.28.10+rke2r1 (b0d0d687d98f4fa015e7b30aaf2807b50edcc5d7)
INFO[0020] Managed etcd cluster bootstrap already complete and initialized
INFO[0020] Retrieving etcd snapshot on-demand-rancher-node011-1720512744 from S3
INFO[0020] Checking if S3 bucket k1h-rancher-dev-etcd-snapshots exists
INFO[0020] S3 bucket k1h-rancher-dev-etcd-snapshots exists
INFO[0020] S3 download complete for /var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720512744
INFO[0020] Pre-restore etcd database moved to /var/lib/rancher/rke2/server/db/etcd-old-1720513942
{"level":"info","ts":"2024-07-09T10:32:22.392713+0200","caller":"snapshot/v3_snapshot.go:248","msg":"restoring snapshot","path":"/var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720512744","wal-dir":"/var/lib/rancher/rke2/server/db/etcd/member/wal","data-dir":"/var/lib/rancher/rke2/server/db/etcd","snap-dir":"/var/lib/rancher/rke2/server/db/etcd/member/snap","stack":"go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/go/pkg/mod/github.com/k3s-io/etcd/etcdutl/[email protected]/snapshot/v3_snapshot.go:254\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Restore\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:1486\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Reset\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:410\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/managed.go:71\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).Start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/cluster.go:91\ngithub.com/k3s-io/k3s/pkg/daemons/control.prepare\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:261\ngithub.com/k3s-io/k3s/pkg/daemons/control.Server\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:35\ngithub.com/k3s-io/k3s/pkg/server.StartServer\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/server/server.go:56\ngithub.com/k3s-io/k3s/pkg/cli/server.run\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:498\ngithub.com/k3s-io/k3s/pkg/cli/server.RunWithControllers\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:48\ngithub.com/rancher/rke2/pkg/rke2.Server\n\t/source/pkg/rke2/rke2.go:123\ngithub.com/rancher/rke2/pkg/cli/cmds.ServerRun\n\t/source/pkg/cli/cmds/server.go:168\ngithub.com/urfave/cli.HandleAction\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:524\ngithub.com/urfave/cli.Command.Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/command.go:175\ngithub.com/urfave/cli.(*App).Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:277\nmain.main\n\t/source/main.go:23\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
FATA[0020] starting kubernetes: preparing server: start managed database: seek /var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720512744: invalid argument
[admin@rancher-node011 ~]$ sudo ls -hl /var/lib/rancher/rke2/server/db/snapshots
total 0
-rw-------. 1 root root 0 Jul 9 10:32 on-demand-rancher-node011-1720512744
Same with `--debug`
[admin@rancher-node011 ~]$ sudo rke2 server --token "${OLD_RKE2_TOKEN:?}" --cluster-reset --cluster-reset-restore-path on-demand-rancher-node011-1720512744 --debug
WARN[0000] not running in CIS mode
INFO[0000] Applying Pod Security Admission Configuration
INFO[0000] Static pod cleanup in progress
INFO[0000] Logging temporary containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0000] Running temporary containerd /var/lib/rancher/rke2/bin/containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0010] Static pod cleanup completed successfully
WARN[0010] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
WARN[0010] remove /var/lib/rancher/rke2/agent/etc/rke2-api-server-agent-load-balancer.json: no such file or directory
INFO[0010] Starting rke2 v1.28.10+rke2r1 (b0d0d687d98f4fa015e7b30aaf2807b50edcc5d7)
INFO[0010] Managed etcd cluster initializing
INFO[0010] Retrieving etcd snapshot on-demand-rancher-node011-1720512744 from S3
INFO[0010] Checking if S3 bucket k1h-rancher-dev-etcd-snapshots exists
W0709 10:39:49.511256 23074 logging.go:59] [core] [Channel #4 SubChannel #5] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
INFO[0010] S3 bucket k1h-rancher-dev-etcd-snapshots exists
DEBU[0010] Skip setting S3 snapshot cluster ID and token during cluster-reset
DEBU[0010] Downloading snapshot from s3://k1h-rancher-dev-etcd-snapshots/on-demand-rancher-node011-1720512744
DEBU[0010] Downloading snapshot metadata from s3://k1h-rancher-dev-etcd-snapshots/.metadata/on-demand-rancher-node011-1720512744
INFO[0010] S3 download complete for /var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720512744
INFO[0010] Pre-restore etcd database moved to /var/lib/rancher/rke2/server/db/etcd-old-1720514389
{"level":"info","ts":"2024-07-09T10:39:49.589084+0200","caller":"snapshot/v3_snapshot.go:248","msg":"restoring snapshot","path":"/var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720512744","wal-dir":"/var/lib/rancher/rke2/server/db/etcd/member/wal","data-dir":"/var/lib/rancher/rke2/server/db/etcd","snap-dir":"/var/lib/rancher/rke2/server/db/etcd/member/snap","stack":"go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/go/pkg/mod/github.com/k3s-io/etcd/etcdutl/[email protected]/snapshot/v3_snapshot.go:254\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Restore\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:1486\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Reset\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:410\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/managed.go:71\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).Start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/cluster.go:91\ngithub.com/k3s-io/k3s/pkg/daemons/control.prepare\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:261\ngithub.com/k3s-io/k3s/pkg/daemons/control.Server\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:35\ngithub.com/k3s-io/k3s/pkg/server.StartServer\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/server/server.go:56\ngithub.com/k3s-io/k3s/pkg/cli/server.run\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:498\ngithub.com/k3s-io/k3s/pkg/cli/server.RunWithControllers\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:48\ngithub.com/rancher/rke2/pkg/rke2.Server\n\t/source/pkg/rke2/rke2.go:123\ngithub.com/rancher/rke2/pkg/cli/cmds.ServerRun\n\t/source/pkg/cli/cmds/server.go:168\ngithub.com/urfave/cli.HandleAction\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:524\ngithub.com/urfave/cli.Command.Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/command.go:175\ngithub.com/urfave/cli.(*App).Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:277\nmain.main\n\t/source/main.go:23\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
FATA[0010] starting kubernetes: preparing server: start managed database: seek /var/lib/rancher/rke2/server/db/snapshots/on-demand-rancher-node011-1720512744: invalid argument
I also tried manually downloading the file from S3 and restoring it with S3 disabled:
Using old token but not S3
[admin@rancher-node011 ~]$ sudo rke2 server --token "${OLD_RKE2_TOKEN:?}" --etcd-s3=false --cluster-reset --cluster-reset-restore-path=/home/admin/on-demand-rancher-node011-1720512744
WARN[0000] not running in CIS mode
INFO[0000] Applying Pod Security Admission Configuration
INFO[0000] Static pod cleanup in progress
INFO[0000] Logging temporary containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0000] Running temporary containerd /var/lib/rancher/rke2/bin/containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0010] Static pod cleanup completed successfully
WARN[0010] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
WARN[0010] remove /var/lib/rancher/rke2/agent/etc/rke2-api-server-agent-load-balancer.json: no such file or directory
INFO[0010] Starting rke2 v1.28.10+rke2r1 (b0d0d687d98f4fa015e7b30aaf2807b50edcc5d7)
INFO[0010] Managed etcd cluster initializing
INFO[0010] Pre-restore etcd database moved to /var/lib/rancher/rke2/server/db/etcd-old-1720514708
{"level":"info","ts":"2024-07-09T10:45:08.714215+0200","caller":"snapshot/v3_snapshot.go:248","msg":"restoring snapshot","path":"/home/admin/on-demand-rancher-node011-1720512744","wal-dir":"/var/lib/rancher/rke2/server/db/etcd/member/wal","data-dir":"/var/lib/rancher/rke2/server/db/etcd","snap-dir":"/var/lib/rancher/rke2/server/db/etcd/member/snap","stack":"go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/go/pkg/mod/github.com/k3s-io/etcd/etcdutl/[email protected]/snapshot/v3_snapshot.go:254\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Restore\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:1486\ngithub.com/k3s-io/k3s/pkg/etcd.(*ETCD).Reset\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/etcd/etcd.go:410\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/managed.go:71\ngithub.com/k3s-io/k3s/pkg/cluster.(*Cluster).Start\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cluster/cluster.go:91\ngithub.com/k3s-io/k3s/pkg/daemons/control.prepare\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:261\ngithub.com/k3s-io/k3s/pkg/daemons/control.Server\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/daemons/control/server.go:35\ngithub.com/k3s-io/k3s/pkg/server.StartServer\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/server/server.go:56\ngithub.com/k3s-io/k3s/pkg/cli/server.run\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:498\ngithub.com/k3s-io/k3s/pkg/cli/server.RunWithControllers\n\t/go/pkg/mod/github.com/k3s-io/[email protected]/pkg/cli/server/server.go:48\ngithub.com/rancher/rke2/pkg/rke2.Server\n\t/source/pkg/rke2/rke2.go:123\ngithub.com/rancher/rke2/pkg/cli/cmds.ServerRun\n\t/source/pkg/cli/cmds/server.go:168\ngithub.com/urfave/cli.HandleAction\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:524\ngithub.com/urfave/cli.Command.Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/command.go:175\ngithub.com/urfave/cli.(*App).Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:277\nmain.main\n\t/source/main.go:23\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"}
{"level":"info","ts":"2024-07-09T10:45:08.792691+0200","caller":"membership/store.go:141","msg":"Trimming membership information from the backend..."}
{"level":"info","ts":"2024-07-09T10:45:08.818607+0200","caller":"membership/cluster.go:421","msg":"added member","cluster-id":"220455547a6da141","local-member-id":"0","added-peer-id":"df593b333a718eb9","added-peer-peer-urls":["https://130.237.255.11:2380"]}
{"level":"info","ts":"2024-07-09T10:45:08.827777+0200","caller":"snapshot/v3_snapshot.go:269","msg":"restored snapshot","path":"/home/admin/on-demand-rancher-node011-1720512744","wal-dir":"/var/lib/rancher/rke2/server/db/etcd/member/wal","data-dir":"/var/lib/rancher/rke2/server/db/etcd","snap-dir":"/var/lib/rancher/rke2/server/db/etcd/member/snap"}
INFO[0010] Starting etcd for new cluster, cluster-reset=true
INFO[0010] Server node token is available at /var/lib/rancher/rke2/server/token
INFO[0010] Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /run/k3s/containerd/containerd.sock: connect: connection refused"
INFO[0010] To join server node to cluster: rke2 server -s https://130.237.255.11:9345 -t ${SERVER_NODE_TOKEN}
INFO[0010] Agent node token is available at /var/lib/rancher/rke2/server/agent-token
INFO[0010] To join agent node to cluster: rke2 agent -s https://130.237.255.11:9345 -t ${AGENT_NODE_TOKEN}
INFO[0010] Wrote kubeconfig /etc/rancher/rke2/rke2.yaml
INFO[0010] Run: rke2 kubectl
WARN[0010] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
INFO[0010] Adding server to load balancer rke2-agent-load-balancer: 127.0.0.1:9345
INFO[0010] Running load balancer rke2-agent-load-balancer 127.0.0.1:6444 -> [127.0.0.1:9345] [default: 127.0.0.1:9345]
INFO[0010] Running load balancer rke2-api-server-agent-load-balancer 127.0.0.1:6443 -> [] [default: ]
INFO[0011] Password verified locally for node rancher-node011
INFO[0011] certificate CN=rancher-node011 signed by CN=rke2-server-ca@1720513267: notBefore=2024-07-09 08:21:07 +0000 UTC notAfter=2025-07-09 08:45:09 +0000 UTC
INFO[0011] certificate CN=system:node:rancher-node011,O=system:nodes signed by CN=rke2-client-ca@1720513267: notBefore=2024-07-09 08:21:07 +0000 UTC notAfter=2025-07-09 08:45:10 +0000 UTC
INFO[0011] Module overlay was already loaded
INFO[0011] Module nf_conntrack was already loaded
INFO[0011] Module br_netfilter was already loaded
INFO[0011] Module iptable_nat was already loaded
INFO[0011] Module iptable_filter was already loaded
INFO[0011] Runtime image index.docker.io/rancher/rke2-runtime:v1.28.10-rke2r1 bin and charts directories already exist; skipping extract
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rancher-vsphere-csi.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-calico.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-cilium.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-coredns.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-csi-driver.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rancher-vsphere-cpi.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-calico-crd.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-canal.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-validation-webhook.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-cloud-provider.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-controller-crd.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-flannel.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-metrics-server.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-multus.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-controller.yaml to set cluster configuration values
INFO[0011] No cluster configuration value changes necessary for manifest /var/lib/rancher/rke2/server/manifests/kube-vip.yaml
WARN[0011] SELinux is enabled on this host, but rke2 has not been started with --selinux - containerd SELinux support is disabled
INFO[0011] Logging containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0011] Running containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0012] containerd is now running
INFO[0012] Pulling images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt
INFO[0012] Image index.docker.io/rancher/rke2-cloud-provider:v1.29.3-build20240412 has already been pulled
INFO[0012] Imported docker.io/rancher/rke2-cloud-provider:v1.29.3-build20240412
INFO[0012] Imported images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt in 3.562557ms
INFO[0012] Pulling images from /var/lib/rancher/rke2/agent/images/etcd-image.txt
INFO[0012] Image index.docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20240418 has already been pulled
INFO[0012] Imported docker.io/rancher/hardened-etcd:v3.5.9-k3s1-build20240418
INFO[0012] Imported images from /var/lib/rancher/rke2/agent/images/etcd-image.txt in 2.135643ms
INFO[0012] Pulling images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt
INFO[0012] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0012] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0012] Imported images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt in 2.711275ms
INFO[0012] Pulling images from /var/lib/rancher/rke2/agent/images/kube-controller-manager-image.txt
INFO[0012] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0012] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0012] Imported images from /var/lib/rancher/rke2/agent/images/kube-controller-manager-image.txt in 1.393999ms
INFO[0012] Pulling images from /var/lib/rancher/rke2/agent/images/kube-proxy-image.txt
INFO[0012] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0012] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0012] Imported images from /var/lib/rancher/rke2/agent/images/kube-proxy-image.txt in 580.711µs
INFO[0012] Pulling images from /var/lib/rancher/rke2/agent/images/kube-scheduler-image.txt
INFO[0012] Image index.docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514 has already been pulled
INFO[0012] Imported docker.io/rancher/hardened-kubernetes:v1.28.10-rke2r1-build20240514
INFO[0012] Imported images from /var/lib/rancher/rke2/agent/images/kube-scheduler-image.txt in 524.169µs
INFO[0012] Pulling images from /var/lib/rancher/rke2/agent/images/runtime-image.txt
INFO[0012] Image index.docker.io/rancher/rke2-runtime:v1.28.10-rke2r1 has already been pulled
INFO[0012] Imported docker.io/rancher/rke2-runtime:v1.28.10-rke2r1
INFO[0012] Imported images from /var/lib/rancher/rke2/agent/images/runtime-image.txt in 1.572278ms
INFO[0012] Running kubelet --address=0.0.0.0 --alsologtostderr=false --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=rancher-node011 --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --kubelet-cgroups=/rke2 --log-file=/var/lib/rancher/rke2/agent/logs/kubelet.log --log-file-max-size=50 --logtostderr=false --node-ip=130.237.255.11 --node-labels= --pod-infra-container-image=index.docker.io/rancher/mirrored-pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --stderrthreshold=FATAL --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
INFO[0012] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-09T10:45:14.073422+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000c4bdc0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0015] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0018] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-09T10:45:19.073955+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000af0e00/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0020] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0023] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-09T10:45:24.075075+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000618e00/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0025] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0028] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
INFO[0030] Pod for etcd not synced (pod sandbox not found), retrying
{"level":"warn","ts":"2024-07-09T10:45:29.075868+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000af0e00/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0030] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0033] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-09T10:45:34.076052+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000af0fc0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
WARN[0035] Failed to get apiserver address from etcd: context deadline exceeded
INFO[0037] Defragmenting etcd database
INFO[0037] Reconciling bootstrap data between datastore and disk
INFO[0037] Cluster reset: backing up certificates directory to /var/lib/rancher/rke2/server/tls-1720514736
WARN[0037] Updating bootstrap data on disk from datastore
INFO[0037] certificate CN=system:admin,O=system:masters signed by CN=rke2-client-ca@1720512055: notBefore=2024-07-09 08:00:55 +0000 UTC notAfter=2025-07-09 08:45:36 +0000 UTC
INFO[0037] certificate CN=system:rke2-supervisor,O=system:masters signed by CN=rke2-client-ca@1720512055: notBefore=2024-07-09 08:00:55 +0000 UTC notAfter=2025-07-09 08:45:36 +0000 UTC
INFO[0037] certificate CN=system:kube-controller-manager signed by CN=rke2-client-ca@1720512055: notBefore=2024-07-09 08:00:55 +0000 UTC notAfter=2025-07-09 08:45:36 +0000 UTC
INFO[0037] certificate CN=system:kube-scheduler signed by CN=rke2-client-ca@1720512055: notBefore=2024-07-09 08:00:55 +0000 UTC notAfter=2025-07-09 08:45:36 +0000 UTC
INFO[0037] certificate CN=system:apiserver,O=system:masters signed by CN=rke2-client-ca@1720512055: notBefore=2024-07-09 08:00:55 +0000 UTC notAfter=2025-07-09 08:45:36 +0000 UTC
INFO[0037] certificate CN=system:kube-proxy signed by CN=rke2-client-ca@1720512055: notBefore=2024-07-09 08:00:55 +0000 UTC notAfter=2025-07-09 08:45:36 +0000 UTC
INFO[0037] certificate CN=system:rke2-controller signed by CN=rke2-client-ca@1720512055: notBefore=2024-07-09 08:00:55 +0000 UTC notAfter=2025-07-09 08:45:36 +0000 UTC
INFO[0037] certificate CN=rke2-cloud-controller-manager signed by CN=rke2-client-ca@1720512055: notBefore=2024-07-09 08:00:55 +0000 UTC notAfter=2025-07-09 08:45:36 +0000 UTC
INFO[0037] certificate CN=kube-apiserver signed by CN=rke2-server-ca@1720512055: notBefore=2024-07-09 08:00:55 +0000 UTC notAfter=2025-07-09 08:45:36 +0000 UTC
INFO[0037] certificate CN=system:auth-proxy signed by CN=rke2-request-header-ca@1720512055: notBefore=2024-07-09 08:00:55 +0000 UTC notAfter=2025-07-09 08:45:36 +0000 UTC
INFO[0037] certificate CN=etcd-client signed by CN=etcd-server-ca@1720512055: notBefore=2024-07-09 08:00:55 +0000 UTC notAfter=2025-07-09 08:45:36 +0000 UTC
INFO[0037] certificate CN=etcd-peer signed by CN=etcd-peer-ca@1720512055: notBefore=2024-07-09 08:00:55 +0000 UTC notAfter=2025-07-09 08:45:36 +0000 UTC
INFO[0037] certificate CN=etcd-server signed by CN=etcd-server-ca@1720512055: notBefore=2024-07-09 08:00:55 +0000 UTC notAfter=2025-07-09 08:45:36 +0000 UTC
INFO[0037] Shutting down kubelet and etcd
ERRO[0037] Kubelet exited: signal: killed
INFO[0038] Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6444/v1-rke2/readyz: 500 Internal Server Error
{"level":"warn","ts":"2024-07-09T10:45:38.832741+0200","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000618c40/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""}
INFO[0040] Failed to test data store connection: context deadline exceeded
INFO[0040] Waiting for etcd server to become available
INFO[0042] Managed etcd cluster membership has been reset, restart without --cluster-reset flag now. Backup and delete ${datadir}/server/db on each peer etcd server and rejoin the nodes
Granted, my little excursion with the missing node token was not relevant, but I think my original issue still stands. Do you agree, or am I missing something else here?
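For reference, the follow-up steps that the final log line asks for could look roughly like the sketch below. It only prints the commands rather than running them. The `DATADIR` default is taken from the paths in the logs above, while the `db.bak` backup name and the two helper function names are my own placeholders, not anything RKE2 defines.

```shell
#!/bin/sh
# Sketch only: prints the follow-up commands suggested by the reset message.
# DATADIR default and the ".bak" backup name are assumptions; adjust as needed.
DATADIR="${DATADIR:-/var/lib/rancher/rke2}"

# On the node where the reset ran: start the service normally again,
# i.e. without the --cluster-reset flag.
restored_node_steps() {
    echo "sudo systemctl start rke2-server"
}

# On each other server (peer): stop the service, move the old etcd data
# aside (backup and delete in one step), then start it again to rejoin.
peer_node_steps() {
    echo "sudo systemctl stop rke2-server"
    echo "sudo mv ${DATADIR}/server/db ${DATADIR}/server/db.bak"
    echo "sudo systemctl start rke2-server"
}

restored_node_steps
peer_node_steps
```

Printing instead of executing is deliberate, since deleting `server/db` on the wrong node would destroy the restored data.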
-
Hi! I just want to say that I'm in talks with NetApp support to try to debug this. I have sent them detailed logs and am awaiting a response.
-
Environmental Info:
RKE2 Version:
v1.28.10+rke2r1
Node(s) CPU architecture, OS, and Version:
Linux rancher-node011 5.14.0-284.30.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 25 09:13:12 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
3 servers, 0 agents. Configured to be an upstream cluster to deploy Rancher.
Describe the bug:
It seems like RKE2 cannot restore etcd snapshots stored in an S3 bucket.
Steps To Reproduce:
Create a (HA) cluster with the following config:
Take an etcd snapshot:
sudo rke2 etcd-snapshot save
Stop the rke2-server service on all nodes and then run this command on the initial server node:
Expected behavior:
I expect a successful cluster reset restoring to the specified etcd snapshot.
Actual behavior:
Additional context / logs:
This bears some resemblance to these Issues:
I've tried restoring it via a manual download too. This fails with another error: