kube-proxy gets stuck if the master is recreated on a new instance #56720
Comments
Curious: how is the kube-proxy health check configured? Agreed, it is odd that no errors are logged.
Hello @MrHohn, we have this health check:
kube-proxy usually talks to the master directly by IP, so if you change the master IP, kube-proxy gets lost?
Interesting, we are using … In general, I think this issue is related to the fact that we hard-kill the master VM, and some TCP connections get stuck in the ESTABLISHED state. We had similar problems with Calico (projectcalico/calico#1420) and with our own node-controller (which uses the plain client-go library). If the VM serving the k8s API is killed, the TCP connection stays open on the client side until the client drops it (for me it took 10-17 minutes). Even TCP keepalive (which is enabled by default in client-go/golang-net) does not help. In the end, the only solution for the node-controller (an app that uses client-go) was to have Kubernetes restart the pod if it cannot connect to the API. Here is my question in client-go. I hope someone will answer it at some point, as I think this is most likely the cause of similar issues. Here is a similar k8s issue that may also be related (I've attached a lot of tcpdumps there).
P.S. @calvix and I are from the same company.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
The issue is not fixed, but we have workarounds in place that help us mitigate it.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
kube-proxy gets stuck after the master goes down and is recreated on a new machine.
We run kube-proxy as a DaemonSet, and under normal circumstances kube-proxy works fine. We run k8s nodes as immutable instances: if there is a reboot, stop, or error, the node is recreated as a whole new machine with a new IP, MAC, and everything else. Etcd data is stored on persistent storage, but the OS is not.
The K8s API endpoint stays the same.
This leads to an issue: when the master is "recreated", kube-proxy ends up in a weird stuck state in which it doesn't work. We run health checks on kube-proxy, but they do not trigger any restart, because kube-proxy thinks it is healthy and there is not a single log entry indicating that anything is wrong. To fix it, we need to kill all kube-proxy pods, and then it works again.
My wild assumption is that kube-proxy holds an open connection to the k8s API, and if the master is recreated with a new IP, kube-proxy keeps using the old, non-working connection.
What you expected to happen:
kube-proxy periodically checks whether its current connection to the K8s API is still valid and, if not, forces a reconnection.
How to reproduce it (as minimally and precisely as possible):
… service (they should not work properly), or create a new one and test that new service.
Anything else we need to know?:
Environment:
Kubernetes version (kubectl version): …, but we also got similar behavior on 1.8.1
Kernel (uname -a):
@kubernetes/sig-network-bugs