Networking issues with calico on windows nodes - no internet connectivity #378

Open
Breee opened this issue Oct 11, 2024 · 2 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.


Breee commented Oct 11, 2024

Describe the bug

  • All nodes can ping each other --> Good
  • Linux pods work as expected --> Good
  • Pods started on Windows nodes have no network connectivity (not even DNS) --> BAD
$ k exec iis-demo-795f98f84d-lmrx5 -it --  nslookup google.com
DNS request timed out.
    timeout was 2 seconds.
Server:  UnKnown
Address:  10.42.0.10

DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.

$ k exec iis-demo-795f98f84d-lmrx5 -it -- ipconfig

Windows IP Configuration
Ethernet adapter vEthernet (8cc68cfad871ed1e55d5408e88553e50ecbf1e420975154524012f33b9ecf69c_Calico):
   Connection-specific DNS Suffix  . : default.svc.cluster.local
   Link-local IPv6 Address . . . . . : fe80::854b:7b85:43c9:44a7%34
   IPv4 Address. . . . . . . . . . . : 10.42.249.79
   Subnet Mask . . . . . . . . . . . : 255.255.255.192
   Default Gateway . . . . . . . . . : 10.42.249.65
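
As a quick sanity check on the addressing above, here is a minimal Python sketch using only the standard `ipaddress` module. The pod address, mask, gateway, and DNS server are copied from the output above; the service CIDR `10.42.0.0/17` is the one configured in the manifests in this report. It only shows that the pod's addressing is internally consistent and that the DNS server is off-subnet, so DNS queries have to leave through the gateway. It does not diagnose the fault itself.

```python
import ipaddress

# Values copied from the ipconfig and nslookup output in this report.
pod_ip = ipaddress.ip_address("10.42.249.79")
gateway = ipaddress.ip_address("10.42.249.65")
dns_server = ipaddress.ip_address("10.42.0.10")

# Subnet derived from the pod address and the 255.255.255.192 mask.
pod_subnet = ipaddress.ip_network("10.42.249.79/255.255.255.192", strict=False)

# Service CIDR as configured in the cluster manifest of this report.
service_cidr = ipaddress.ip_network("10.42.0.0/17")

print(pod_subnet)                   # 10.42.249.64/26
print(gateway in pod_subnet)        # True: the gateway is on-link for the pod
print(dns_server in service_cidr)   # True: 10.42.0.10 is a service IP
print(dns_server in pod_subnet)     # False: DNS traffic must go via the gateway
```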

To Reproduce

Cluster API manifest for kubeadm

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: ${CI_COMMIT_REF_NAME}
    cni-windows: calico
    windows: enabled
  name: ${CI_COMMIT_REF_NAME}
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 10.42.128.0/17
    services:
      cidrBlocks:
      - 10.42.0.0/17
 [..]
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: ${K8S_CLUSTER}-windows-no-domain-join
  namespace: default
spec:
  template:
    spec:
      verbosity: 5
      files:
      - content: |-
          Add-MpPreference -ExclusionProcess C:/opt/cni/bin/calico.exe
          Add-MpPreference -ExclusionProcess C:/opt/cni/bin/calico-ipam.exe
        path: C:/defender-exclude-calico.ps1
        permissions: "0744"
      joinConfiguration:
        nodeRegistration:
          criSocket: npipe:////./pipe/containerd-containerd
          ignorePreflightErrors: ["IsPrivilegedUser"]
          kubeletExtraArgs:
            cloud-provider: external
            v: "2"
            windows-priorityclass: ABOVE_NORMAL_PRIORITY_CLASS
            hostname-override: '{{ ds.meta_data["local_hostname"] }}'
          name: '{{ ds.meta_data["local_hostname"] }}'
      preKubeadmCommands:
      - mklink /d c:\etc\kubernetes\ssl c:\etc\kubernetes\pki
      postKubeadmCommands:
      - nssm set kubelet start SERVICE_AUTO_START
      - powershell C:/defender-exclude-calico.ps1

calico installation values:

installation:
  enabled: true
  # Configures Calico networking.
  serviceCIDRs:
    - 10.42.0.0/17
  calicoNetwork:
    bgp: Disabled
    linuxDataplane: Iptables
    windowsDataplane: HNS
    ipPools:
    - name: default-ipv4-ippool
      blockSize: 26
      cidr: 10.42.128.0/17
      encapsulation: VXLAN
      natOutgoing: Enabled
      nodeSelector: all()

apiServer:
  enabled: true
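
To rule out a simple CIDR mistake in the values above, a small Python sketch (standard library only) can confirm that the pod pool and the service CIDR do not overlap, and show how many /26 blocks the pool provides. The CIDRs and block size are taken from the configuration above; the variable names are only illustrative.

```python
import ipaddress

# CIDRs and block size copied from the configuration in this report.
pod_pool = ipaddress.ip_network("10.42.128.0/17")
service_cidr = ipaddress.ip_network("10.42.0.0/17")
block_size = 26

# The two ranges must be disjoint for routing to be unambiguous.
overlaps = pod_pool.overlaps(service_cidr)

# Number of /26 blocks in the /17 pool, and addresses per block.
num_blocks = 2 ** (block_size - pod_pool.prefixlen)
addresses_per_block = 2 ** (32 - block_size)

print(overlaps)             # False
print(num_blocks)           # 512
print(addresses_per_block)  # 64
```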

script:

      export CALICO_VERSION="v3.28.1"
      kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/projectcalico/calico/${CALICO_VERSION}/manifests/operator-crds.yaml
      helm repo add projectcalico https://docs.tigera.io/calico/charts
      kubectl apply -f calico/namespace.yaml
      cat calico/endpoints.yaml  | \
        envsubst | \
        kubectl apply -f -
      cat calico/values.yaml  | \
      envsubst | \
      helm install --version ${CALICO_VERSION} --namespace tigera-operator  calico projectcalico/tigera-operator --values -
      sleep 30
      while ! kubectl get installation default; do
          echo "Waiting for installation default to exist..."
          sleep 10
      done
      while ! kubectl get ippool default-ipv4-ippool; do
          echo "Waiting for ippool default-ipv4-ippool to exist..."
          sleep 10
      done
      kubectl patch ippool default-ipv4-ippool --type='json' -p='[{"op": "replace", "path": "/spec/vxlanMode", "value": "Always"}]'
      while ! kubectl get ipamconfig default; do
          echo "Waiting for ipamconfig default to exist..."
          sleep 10
      done
      kubectl patch ipamconfig default --type merge --patch='{"spec": {"strictAffinity": true}}'
      curl -L https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/calico/kube-proxy/kube-proxy.yml | sed 's/KUBE_PROXY_VERSION/v1.31.1/g' | kubectl apply -f -
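
The `while ! kubectl get ...` loops in the script wait forever if a resource never appears. A bounded variant could look like the Python sketch below. This is not what the script above does: `wait_until` and `resource_exists` are hypothetical helper names, and the 300-second timeout is an arbitrary assumption. The only external command used is `kubectl get <kind> <name>`, as in the script.

```python
import subprocess
import time
from typing import Callable


def wait_until(check: Callable[[], bool], interval: float = 10.0,
               timeout: float = 300.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


def resource_exists(kind: str, name: str) -> bool:
    """Return True if `kubectl get <kind> <name>` exits with status 0."""
    result = subprocess.run(["kubectl", "get", kind, name],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

For example, `wait_until(lambda: resource_exists("ippool", "default-ipv4-ippool"))` returns `False` after the timeout instead of blocking the pipeline indefinitely.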

full rendered installation:

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  annotations:
    meta.helm.sh/release-name: calico
    meta.helm.sh/release-namespace: tigera-operator
  creationTimestamp: "2024-10-09T11:56:28Z"
  finalizers:
  - operator.tigera.io/installation-controller
  - tigera.io/operator-cleanup
  - operator.tigera.io/apiserver-controller
  generation: 5
  labels:
    app.kubernetes.io/managed-by: Helm
  name: default
  resourceVersion: "786311"
  uid: 7dcb3345-8343-4fe8-8133-3c74e193515c
spec:
  calicoNetwork:
    bgp: Disabled
    hostPorts: Enabled
    ipPools:
    - allowedUses:
      - Workload
      - Tunnel
      blockSize: 26
      cidr: 10.42.128.0/17
      disableBGPExport: false
      encapsulation: VXLAN
      name: default-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: all()
    linuxDataplane: Iptables
    linuxPolicySetupTimeoutSeconds: 0
    multiInterfaceMode: None
    nodeAddressAutodetectionV4:
      kubernetes: NodeInternalIP
    windowsDataplane: HNS
  cni:
    ipam:
      type: Calico
    type: Calico
  controlPlaneReplicas: 2
  flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
  imagePullSecrets: []
  kubeletVolumePluginPath: /var/lib/kubelet
  kubernetesProvider: ""
  logging:
    cni:
      logFileMaxAgeDays: 30
      logFileMaxCount: 10
      logFileMaxSize: 100Mi
      logSeverity: Info
  nodeUpdateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  nonPrivileged: Disabled
  serviceCIDRs:
  - 10.42.0.0/17
  variant: Calico
  windowsNodes:
    cniBinDir: /opt/cni/bin
    cniConfigDir: /etc/cni/net.d
    cniLogDir: /var/log/calico/cni
status:
  calicoVersion: v3.28.1
  computed:
    calicoNetwork:
      bgp: Disabled
      hostPorts: Enabled
      ipPools:
      - allowedUses:
        - Workload
        - Tunnel
        blockSize: 26
        cidr: 10.42.128.0/17
        disableBGPExport: false
        encapsulation: VXLAN
        name: default-ipv4-ippool
        natOutgoing: Enabled
        nodeSelector: all()
      linuxDataplane: Iptables
      linuxPolicySetupTimeoutSeconds: 0
      multiInterfaceMode: None
      nodeAddressAutodetectionV4:
        kubernetes: NodeInternalIP
      windowsDataplane: HNS
    cni:
      ipam:
        type: Calico
      type: Calico
    controlPlaneReplicas: 2
    flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
    kubeletVolumePluginPath: /var/lib/kubelet
    logging:
      cni:
        logFileMaxAgeDays: 30
        logFileMaxCount: 10
        logFileMaxSize: 100Mi
        logSeverity: Info
    nodeUpdateStrategy:
      rollingUpdate:
        maxUnavailable: 1
      type: RollingUpdate
    nonPrivileged: Disabled
    serviceCIDRs:
    - 10.42.0.0/17
    variant: Calico
    windowsNodes:
      cniBinDir: /opt/cni/bin
      cniConfigDir: /etc/cni/net.d
      cniLogDir: /var/log/calico/cni
  conditions:
  - lastTransitionTime: "2024-10-11T06:36:14Z"
    message: All Objects Available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-10-11T06:36:14Z"
    message: All objects available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-10-11T06:36:14Z"
    message: All Objects Available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "False"
    type: Progressing
  mtu: 1450
  variant: Calico

IPAM

$ k get ipamconfig default -o yaml
apiVersion: crd.projectcalico.org/v1
kind: IPAMConfig
metadata:
  annotations:
    projectcalico.org/metadata: '{"creationTimestamp":null}'
  creationTimestamp: "2024-10-09T11:57:01Z"
  generation: 2
  name: default
  resourceVersion: "885"
  uid: d17c15b9-2059-4a2a-b8f0-87614d2570b3
spec:
  autoAllocateBlocks: true
  strictAffinity: true

pool

$ k get ippools. default-ipv4-ippool  -o yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  creationTimestamp: "2024-10-09T11:56:34Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: tigera-operator
  name: default-ipv4-ippool
  resourceVersion: "1455"
  uid: fa602211-17ff-4a75-aca4-099d0983e81e
spec:
  allowedUses:
  - Workload
  - Tunnel
  blockSize: 26
  cidr: 10.42.128.0/17
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Always

Expected behavior
Pods on Windows nodes have internet connectivity and working DNS resolution.

Kubernetes (please complete the following information):

  • Windows Server version: Windows Server 2022 Standard 10.0.20348.2700
  • Kubernetes Version: 1.31.1
  • CNI: calico v3.28.1

Additional context

calico-windows-node: attached as log file
caliconode.log

annotations:

$ k get nodes -o yaml | grep projectcalico.org
      projectcalico.org/IPv4Address: 10.13.18.42/24
      projectcalico.org/IPv4VXLANTunnelAddr: 10.42.183.128
      projectcalico.org/IPv4Address: 10.13.18.64/24
      projectcalico.org/IPv4VXLANTunnelAddr: 10.42.170.192
      projectcalico.org/IPv4Address: 10.13.18.9/24
      projectcalico.org/IPv4VXLANTunnelAddr: 10.42.254.64
      projectcalico.org/IPv4Address: 10.13.18.8/24
      projectcalico.org/IPv4VXLANTunnelAddr: 10.42.237.128
      projectcalico.org/IPv4Address: 10.12.12.157/16
      projectcalico.org/IPv4VXLANTunnelAddr: 10.42.249.65
      projectcalico.org/VXLANTunnelMACAddr: 00:15:5d:6c:b8:ef
      projectcalico.org/IPv4Address: 10.12.12.35/16
      projectcalico.org/IPv4VXLANTunnelAddr: 10.42.215.129
      projectcalico.org/VXLANTunnelMACAddr: 00:15:5d:16:e4:e5
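
A short Python sketch (standard library only) to check that every VXLAN tunnel address above lies inside the pool `10.42.128.0/17` and that each node sits in its own /26 block, which is what `strictAffinity: true` should produce. Pairing each `IPv4Address` with the `IPv4VXLANTunnelAddr` on the following line is an assumption based on the order of the grep output.

```python
import ipaddress

pool = ipaddress.ip_network("10.42.128.0/17")
block_size = 26

# node IP -> VXLAN tunnel address, paired by adjacency in the grep output
# (assumption: consecutive annotation lines belong to the same node).
tunnel_addrs = {
    "10.13.18.42": "10.42.183.128",
    "10.13.18.64": "10.42.170.192",
    "10.13.18.9": "10.42.254.64",
    "10.13.18.8": "10.42.237.128",
    "10.12.12.157": "10.42.249.65",
    "10.12.12.35": "10.42.215.129",
}

# Tunnel addresses that fall outside the configured pool (expected: none).
outside_pool = [t for t in tunnel_addrs.values()
                if ipaddress.ip_address(t) not in pool]

# The /26 block each tunnel address belongs to.
blocks = {node: ipaddress.ip_network(f"{t}/{block_size}", strict=False)
          for node, t in tunnel_addrs.items()}

print(outside_pool)                    # []
print(len(set(blocks.values())))       # 6, one distinct block per node
print(blocks["10.12.12.157"])          # 10.42.249.64/26
```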

nodes

$ k get nodes -o wide
NAME                      STATUS   ROLES           AGE   VERSION   INTERNAL-IP    EXTERNAL-IP    OS-IMAGE                       KERNEL-VERSION      CONTAINER-RUNTIME
clippy-md-0-2kt4j-4wthh   Ready    <none>          43h   v1.31.1   10.13.18.42    10.13.18.42    Ubuntu 20.04.6 LTS             5.4.0-195-generic   containerd://1.7.22
clippy-md-0-2kt4j-9l6xw   Ready    <none>          43h   v1.31.1   10.13.18.64    10.13.18.64    Ubuntu 20.04.6 LTS             5.4.0-195-generic   containerd://1.7.22
clippy-md-0-2kt4j-ffmgj   Ready    <none>          43h   v1.31.1   10.13.18.9     10.13.18.9     Ubuntu 20.04.6 LTS             5.4.0-195-generic   containerd://1.7.22
clippy-nn78j              Ready    control-plane   43h   v1.31.1   10.13.18.8     10.13.18.8     Ubuntu 20.04.6 LTS             5.4.0-195-generic   containerd://1.7.22
cw2-ptmtr-nwbvs           Ready    <none>          25h   v1.31.1   10.12.12.157   10.12.12.157   Windows Server 2022 Standard   10.0.20348.2700     containerd://1.7.22
win-w85jm-ldgdk           Ready    <none>          63m   v1.31.1   10.12.12.35    10.12.12.35    Windows Server 2022 Standard   10.0.20348.2700     containerd://1.7.22
$ Get-HnsNetwork
ActivityId             : 9F7B4A29-2870-42B5-B73B-10FF10F5ADB5
AdditionalParams       :
CurrentEndpointCount   : 0
DNSServerCompartment   : 3
DrMacAddress           : 00-15-5D-6C-B8-EF
Extensions             : {@{Id=E7C3B2F0-F3C5-48DF-AF2B-10FED6D72E7A; IsEnabled=False;
                         Name=Microsoft Windows Filtering Platform},
                         @{Id=F74F241B-440F-4433-BB28-00F89EAD20D8; IsEnabled=True;
                         Name=Microsoft Azure VFP Switch Extension},
                         @{Id=430BDADD-BAB0-41AB-A369-94B67FA5BE0A; IsEnabled=True;
                         Name=Microsoft NDIS Capture}}
Flags                  : 0
Health                 : @{LastErrorCode=0; LastUpdateTime=133731058207475297}
ID                     : 7DA4BC16-24A3-4A60-92FB-EAF3196E1FA0
IPv6                   : False
LayeredOn              : 8FA2C2D1-16E0-4693-9DCB-10B8459164D1
MacPools               : {@{EndMacAddress=00-15-5D-C8-FF-FF;
                         StartMacAddress=00-15-5D-C8-F0-00}}
ManagementIP           : 10.12.12.157
MaxConcurrentEndpoints : 0
Name                   : External
Policies               : {}
State                  : 1
Subnets                : {@{AdditionalParams=; AddressPrefix=192.168.255.0/30; Flags=0;
                         GatewayAddress=192.168.255.1; Health=;
                         ID=CA943AB7-B803-4CF8-B107-85B8FEAF949C;
                         IpSubnets=System.Object[]; ObjectType=5;
                         Policies=System.Object[]; State=0}}
TotalEndpoints         : 0
Type                   : Overlay
Version                : 55834574851
Resources              : @{AdditionalParams=; AllocationOrder=1;
                         Allocators=System.Object[]; CompartmentOperationTime=0; Flags=0;
                         Health=; ID=9F7B4A29-2870-42B5-B73B-10FF10F5ADB5;
                         PortOperationTime=0; State=1; SwitchOperationTime=0;
                         VfpOperationTime=0; parentId=D26B2287-32EE-41BD-A150-EDA2DBB20A30}

ActivityId             : 8C92A66D-7817-40AE-AFE6-0BB5824D54D9
AdditionalParams       :
CurrentEndpointCount   : 0
DNSServerCompartment   : 4
DrMacAddress           : 00-15-5D-6C-B8-EF
Extensions             : {@{Id=E7C3B2F0-F3C5-48DF-AF2B-10FED6D72E7A; IsEnabled=False;
                         Name=Microsoft Windows Filtering Platform},
                         @{Id=F74F241B-440F-4433-BB28-00F89EAD20D8; IsEnabled=True;
                         Name=Microsoft Azure VFP Switch Extension},
                         @{Id=430BDADD-BAB0-41AB-A369-94B67FA5BE0A; IsEnabled=True;
                         Name=Microsoft NDIS Capture}}
Flags                  : 0
Health                 : @{LastErrorCode=0; LastUpdateTime=133731058414671503}
ID                     : D1DBE980-BE3E-4049-AA5D-93A4EBEF45B2
IPv6                   : False
LayeredOn              : 8FA2C2D1-16E0-4693-9DCB-10B8459164D1
MacPools               : {@{EndMacAddress=00-15-5D-55-1F-FF;
                         StartMacAddress=00-15-5D-55-10-00}}
ManagementIP           : 10.12.12.157
MaxConcurrentEndpoints : 1
Name                   : Calico
Policies               : {@{DestinationPrefix=10.42.183.128/26;
                         DistributedRouterMacAddress=66-ef-b3-b4-4c-c8; IsolationId=4096;
                         ProviderAddress=10.13.18.42; Type=RemoteSubnetRoute},
                         @{DestinationPrefix=10.42.237.128/26;
                         DistributedRouterMacAddress=66-52-57-21-f1-b0; IsolationId=4096;
                         ProviderAddress=10.13.18.8; Type=RemoteSubnetRoute},
                         @{DestinationPrefix=10.42.170.192/26;
                         DistributedRouterMacAddress=66-84-f6-b1-67-9c; IsolationId=4096;
                         ProviderAddress=10.13.18.64; Type=RemoteSubnetRoute},
                         @{DestinationPrefix=10.42.215.128/26;
                         DistributedRouterMacAddress=00-15-5d-16-e4-e5; IsolationId=4096;
                         ProviderAddress=10.12.12.35; Type=RemoteSubnetRoute}...}
State                  : 1
Subnets                : {@{AdditionalParams=; AddressPrefix=10.42.249.64/26; Flags=0;
                         GatewayAddress=10.42.249.65; Health=;
                         ID=019DD141-81FA-4F45-B567-356817EA46DC;
                         IpSubnets=System.Object[]; ObjectType=5;
                         Policies=System.Object[]; State=0}}
TotalEndpoints         : 2
Type                   : Overlay
Version                : 55834574851
Resources              : @{AdditionalParams=; AllocationOrder=1;
                         Allocators=System.Object[]; CompartmentOperationTime=0; Flags=0;
                         Health=; ID=8C92A66D-7817-40AE-AFE6-0BB5824D54D9;
                         PortOperationTime=0; State=1; SwitchOperationTime=0;
                         VfpOperationTime=0; parentId=D26B2287-32EE-41BD-A150-EDA2DBB20A30}
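
To cross-check the `RemoteSubnetRoute` policies of the Calico HNS network against the node annotations, a minimal Python sketch (standard library only). Only the four routes visible in the output are included, since the policy list is truncated with `...`. The node-to-tunnel pairing again assumes that adjacent annotation lines belong to the same node.

```python
import ipaddress

# ProviderAddress -> DestinationPrefix, from the visible HNS policies only.
hns_routes = {
    "10.13.18.42": "10.42.183.128/26",
    "10.13.18.8": "10.42.237.128/26",
    "10.13.18.64": "10.42.170.192/26",
    "10.12.12.35": "10.42.215.128/26",
}

# Node IP -> VXLAN tunnel address, from the node annotations
# (assumption: paired by adjacency in the grep output).
node_tunnels = {
    "10.13.18.42": "10.42.183.128",
    "10.13.18.8": "10.42.237.128",
    "10.13.18.64": "10.42.170.192",
    "10.12.12.35": "10.42.215.129",
}

# A route is consistent if the node's tunnel address lies in its prefix.
mismatches = [
    node for node, prefix in hns_routes.items()
    if ipaddress.ip_address(node_tunnels[node])
    not in ipaddress.ip_network(prefix)
]

# The local subnet should contain the local node's tunnel address,
# which HNS also reports as the gateway.
local_subnet = ipaddress.ip_network("10.42.249.64/26")
local_ok = ipaddress.ip_address("10.42.249.65") in local_subnet

print(mismatches)  # []
print(local_ok)    # True
```

For the visible entries, the HNS routes agree with the annotations, so the remote subnet routes themselves do not look misconfigured.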

Breee commented Oct 14, 2024

A workaround is described in microsoft/Windows-Containers#516.
We should add a hint to the guide.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 12, 2025