Windows nodes can't reach service network #77
I found a solution to this, which was to add the --ip-masq arg to flanneld.exe. Is there a reason this wouldn't be the default, when the Linux default is to use it?
It was included in the old scripts. Probably just got lost in the new version. Feel free to make a PR with the fix!
Will create a PR. It seems odd to add it to the command line in run.ps1; I will try to find a better way that is similar to the default Linux kube-flannel.yml.
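For anyone hitting this in the meantime, a minimal sketch of the workaround, assuming run.ps1 launches flanneld.exe directly; the paths, interface name, and other flags below are placeholders, not the real script contents:

```powershell
# Hypothetical excerpt showing where --ip-masq could be appended when starting flanneld.exe.
# The kubeconfig path, interface name, and binary location are illustrative assumptions.
$flannelArgs = @(
    "--kubeconfig-file=C:\k\config",   # assumed kubeconfig location
    "--iface=Ethernet",                # assumed node interface name
    "--ip-masq"                        # mirrors the Linux kube-flannel.yml default of masquerading outbound traffic
)
Start-Process -FilePath "C:\flannel\flanneld.exe" -ArgumentList $flannelArgs -NoNewWindow -Wait
```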
Hmm, curious that this hasn't been a problem for us so far. AFAIK service IPs work today; I would guess that this is covered by one of the conformance tests that is passing for us: https://k8s-testgrid.appspot.com/sig-windows#kubeadm-windows-gcp-k8s-stable
I too am having connection issues when assigning a service with type LoadBalancer. I tried your suggested fix of applying ip-masq. From my Linux nodes, services work as expected; from Windows there is only connectivity from within the cluster, and external IPs are not getting routed for some reason. Any help is appreciated; I have been looking into this issue for days now. Running K8s 1.18.3, Flannel 0.12, and custom images to support 1909. The network is host-gw/l2bridge, as I have never had success with vxlan.
As a workaround, I set up an Ingress for one of the services; then it works fine (I guess because the ingress is hosted on Linux, and the connection inside the cluster works fine). However, the other ports I have opened up cannot be assigned to the ingress, as they are non-HTTP protocols.
As yet another workaround, for non-HTTP protocols, I had to remove the LoadBalancer type and opt in for NodePort. Then I needed to reconfigure my router to translate a normal port to a node port. Do note that the two workarounds only apply to Windows nodes; Linux nodes work as expected. I really hope the SIG team will start to investigate these issues further. Thanks.
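For reference, a sketch of that NodePort workaround applied from PowerShell; the service name, selector, and ports are placeholders for your own non-HTTP service:

```powershell
# Placeholder Service manifest: expose the non-HTTP port as a NodePort instead of a
# LoadBalancer, then manually forward the public port on the router to <node-ip>:30432.
@'
apiVersion: v1
kind: Service
metadata:
  name: my-tcp-service          # placeholder name
spec:
  type: NodePort
  selector:
    app: my-tcp-app             # placeholder selector
  ports:
  - protocol: TCP
    port: 5432                  # in-cluster service port (placeholder)
    targetPort: 5432            # container port (placeholder)
    nodePort: 30432             # must fall in the default 30000-32767 NodePort range
'@ | kubectl apply -f -
```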
What works for me: once all the pods are up and running on the Windows node, I run Stop-Service kubelet. Then the whole networking works: LoadBalancer works, and ping from inside the pods works to all locations.
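To spell out that workaround (it only stops the kubelet service, so it is more of a diagnostic hint than a fix; the node will go NotReady while kubelet is stopped):

```powershell
# Reported workaround: once all pods on the Windows node are Running, stop kubelet.
Stop-Service kubelet
# Connectivity reportedly recovers; start kubelet again to return the node to normal operation.
Start-Service kubelet
```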
Note that a couple of weeks ago we added a couple of Kube-Proxy fixes for Windows hosts for problems that can lead to the symptoms described in this issue. Can you re-try with a Kube-Proxy version that has these fixes, please? @sbangari and @Keith-Mange, FYI.
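A quick way to confirm which kube-proxy build a Windows node is actually running; the C:\k path and the service name below are assumptions based on a common sig-windows setup and may differ:

```powershell
# Print the kube-proxy build on the Windows node; adjust the path to your install location.
& "C:\k\kube-proxy.exe" --version

# If kube-proxy runs as a Windows service, check that this binary is the one actually in use.
Get-Service kube-proxy | Select-Object Name, Status, StartType
```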
Tested today with v1.18.4; no difference.
@Stefanbs23: Thank you for trying it! Unfortunately, v1.18.x doesn't seem to have these changes. That said, they should theoretically be in the next v1.19.x release.
@JocelynBerrendonner Sorry, I'm fairly new to Kubernetes, so I'm probably misunderstanding how this works. Isn't it something about the routing (which AFAIK is governed by the CNI plugin) or the kube-proxy instance on the master node (which is Linux) that would be causing the problem? In other words, isn't the Windows kube-proxy instance only responsible for dealing with services whose pods are scheduled on Windows (which is not the case with coredns)?
Welp, empirically at least I'm wrong. I guess I need to do more research on how network traffic flows through the system with the "Service" concept in Kubernetes.
@masaeedu: this bug indeed tracks a Windows issue and has nothing to do with Linux. Each instance of Kube-Proxy is responsible for plumbing service connectivity on the node it runs on. The problem you describe seems unrelated to this bug; let's open a different issue to track it!
I am having this same issue (#103) and am not sure how to resolve it. I have tried a number of things.
I'm having the same issue with the master node (Linux, v1.19.3) and worker node (Windows, v1.19.0).
@sbangari: heads up
I cannot reach the service in the Windows node from the Linux node.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-contributor-experience at kubernetes/community.
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-contributor-experience at kubernetes/community.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@fejta-bot: Closing this issue.
Having the same issue on a deployment with flannel vxlan using Kubernetes 1.21 on both the control plane (Linux) and the node (Windows Server Core 2004). I can send data between pods on the Windows node, but have no connectivity from these pods to the outside or to any service IP hosted on either the Linux or Windows nodes.
@joaoestevinho: You can't reopen an issue/PR unless you authored it or you are a collaborator.
This seems to still be an active problem.
I have the following setup:
Windows 2019-1909
Kubernetes 1.18.2
Control Plane: CentOS 7.7 with k8s 1.18.2 built with kubeadm
CNI: flannel with vxlan, using the proper VXLAN ID and UDP port for Windows compatibility (see the reference snippet after this list)
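For reference, these are the VXLAN settings the Kubernetes "adding Windows nodes" guide calls for in the net-conf.json section of kube-flannel.yml; the pod CIDR is only the common kubeadm/flannel default and may differ in your cluster:

```powershell
# Expected VXLAN settings for Windows compatibility: VNI 4096 and UDP port 4789.
# 10.244.0.0/16 is the usual flannel default and is only an example here.
$netConf = @'
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan",
    "VNI": 4096,
    "Port": 4789
  }
}
'@
```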
I followed the PrepareNode.ps1 script here to get the 1909 server ready, but had to build my own kube-proxy and kube-flannel Windows images, as those don't support 1909. I had to build the setup.exe on another system and just ADD it into the container, as there isn't a golang:servercore1909 image to use as the build image.
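A rough sketch of that build approach, assuming the binaries were already compiled elsewhere; the Dockerfile contents, file names, and image tag are illustrative only:

```powershell
# Hypothetical Dockerfile: copy pre-built binaries onto a 1909 base instead of building
# inside a golang:servercore image (which does not exist for 1909).
@'
FROM mcr.microsoft.com/windows/servercore:1909
ADD setup.exe /setup.exe
ADD flanneld.exe /flanneld.exe
'@ | Set-Content Dockerfile

# Build and push to a registry the cluster can pull from (placeholder tag).
docker build -t my-registry/kube-flannel-windows:1909 .
docker push my-registry/kube-flannel-windows:1909
```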
I've followed the instructions at https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/adding-windows-nodes/ to get everything up and running. I can successfully get a pod running a servercore 1909 image. When I exec into this pod, I can ping all the Linux cluster node IPs just fine. Route tables look accurate.
However, when I try to reach the service network, including coredns, my connections time out. So I can't do DNS lookups whatsoever.
I can reach outside my cluster fine as well (nslookup using our physical DNS server IP addresses).
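To make the symptom concrete, this is roughly what I run from inside the Windows pod; the pod name is a placeholder, and 10.96.0.10 is just the common kubeadm default for the kube-dns ClusterIP:

```powershell
# Open a shell inside the Windows pod (replace <windows-pod> with the actual pod name).
kubectl exec -it <windows-pod> -- powershell

# Inside the pod: pings to node IPs succeed, but anything on the service network times out.
Test-NetConnection 10.96.0.10 -Port 53                                     # times out
Resolve-DnsName kubernetes.default.svc.cluster.local -Server 10.96.0.10    # fails
Resolve-DnsName example.com -Server <physical-dns-ip>                      # works, matching the note above
```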
The only thing that doesn't seem to be working is service network connectivity. The node does have a proper route to the service network, and I can see that \etc\cni\net.d\10-flannel.conf has the correct ExceptionList for OutBoundNAT covering both the service and pod networks, and also has a ROUTE type endpoint policy with the destination set to the service network and NeedEncap: true.
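For anyone comparing their setup, this is roughly how I inspect that config on the node; the path is the one mentioned above, and the exact JSON layout can differ between flannel/sig-windows versions:

```powershell
# Dump the CNI config on the Windows node and check the policies described above.
Get-Content C:\etc\cni\net.d\10-flannel.conf -Raw

# Things to look for:
#  - an OutBoundNAT policy whose ExceptionList contains both the pod and service CIDRs
#  - a ROUTE endpoint policy with DestinationPrefix set to the service CIDR and NeedEncap: true
```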