
kube-webhook-certgen create secret job not recreating existing CA secret if parameters change #12767

Closed
botchk opened this issue Jan 31, 2025 · 10 comments
Labels
needs-kind: Indicates a PR lacks a `kind/foo` label and requires one.
needs-priority
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@botchk

botchk commented Jan 31, 2025

What happened:

When changing the controller.name of an existing installation, the admission webhook starts failing all ingress requests with a certificate validation error:

Error from server (InternalError): error when creating "ingress-hinterseer.yaml": Internal error occurred: failed calling webhook
"validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://ingress-nginx-hansi-hinterseer-admission.default.svc:443/networking/v1/ingresses?timeout=10s":
tls: failed to verify certificate: x509: certificate is valid for ingress-nginx-hansi-admission,
ingress-nginx-hansi-admission.default.svc, not ingress-nginx-hansi-hinterseer-admission.default.svc

This situation can only be resolved manually:

# Delete the existing CA secret
kubectl delete secret -n ingress-nginx-hansi ingress-nginx-hansi-admission
# Rerun the create secret job, to create the secret with the new CA
# Rerun the patch secret job, to inject the new CA from the secret to the validating webhook configuration
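
One way to re-run both jobs in a Helm-based installation is simply another helm upgrade, since the chart installs the admission jobs as pre-/post-upgrade hooks (a sketch under that assumption, using the release name from this report):

# After deleting the secret, re-running the upgrade re-triggers the hook jobs:
# the create job generates a fresh CA secret and the patch job injects the new
# caBundle into the ValidatingWebhookConfiguration
helm upgrade ingress-nginx ingress-nginx -f values.yaml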

Related issues have already been reported, for example #5968

What you expected to happen:

When changing controller.name and updating an existing installation, the admission webhook should use a valid certificate and not reject ingress requests with an error.

The responsible code in ingress-nginx/kube-webhook-certgen can be found here: https://github.com/kubernetes/ingress-nginx/blob/main/images/kube-webhook-certgen/rootfs/cmd/create.go#L27-L34

Two possible fixes could be:

  • Don't check whether the secret already exists; always update the secret when the job runs
  • Check whether an existing secret still matches the generated CA and requested hosts; if not, update the secret with the new CA

This would remove the need for manual intervention and avoid certificate validation errors whose cause is not immediately obvious.
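
The mismatch can be confirmed by inspecting the certificate stored in the existing secret (a sketch; assumes kube-webhook-certgen's secret layout, where the serving certificate is stored under the cert key, and the names from this report):

# Print the SANs of the certificate currently stored in the admission secret
kubectl get secret -n ingress-nginx-hansi ingress-nginx-hansi-admission \
  -o jsonpath='{.data.cert}' | base64 -d \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
# After the rename, the expected DNS name
# ingress-nginx-hansi-hinterseer-admission.default.svc is missing from the output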

NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version): 1.9.6

Kubernetes version (use kubectl version): v1.29.12-eks-2d5f260

Environment:

  • Cloud provider or hardware configuration: Dell Precision 5690

  • OS (e.g. from /etc/os-release): Ubuntu 22.04.5 LTS

  • Kernel (e.g. uname -a): 6.8.0-52-generic

  • Install tools:

    • kind
  • Basic cluster related info:

    • Server Version: v1.29.12
      
  • How was the ingress-nginx-controller installed:

    • If helm was used then please show output of helm ls -A | grep -i ingress
    • helm ls -A | grep -i ingress                           
      ingress-nginx   default         2               2025-02-03 14:36:02.869574307 +0100 CET deployed        ingress-nginx-4.12.0    1.12.0
      
    • If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>
    • helm -n default get values ingress-nginx
      USER-SUPPLIED VALUES:
      controller:
        name: hansi-hinterseer
      
  • Current State of the controller:

    • kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.12.0
              helm.sh/chart=ingress-nginx-4.12.0
Annotations:  meta.helm.sh/release-name: ingress-nginx
              meta.helm.sh/release-namespace: default
Controller:   k8s.io/ingress-nginx
Events:       <none>
  • kubectl -n <ingresscontrollernamespace> get all -A -o wide
NAMESPACE            NAME                                                 READY   STATUS    RESTARTS   AGE   IP           NODE                 NOMINATED NODE   READINESS GATES
default              pod/ingress-nginx-hansi-hinterseer-f6799499b-tgp5c   1/1     Running   0          11m   10.244.0.9   kind-control-plane   <none>           <none>
kube-system          pod/coredns-76f75df574-92k42                         1/1     Running   0          21m   10.244.0.3   kind-control-plane   <none>           <none>
kube-system          pod/coredns-76f75df574-s7dxv                         1/1     Running   0          21m   10.244.0.4   kind-control-plane   <none>           <none>
kube-system          pod/etcd-kind-control-plane                          1/1     Running   0          21m   172.19.0.4   kind-control-plane   <none>           <none>
kube-system          pod/kindnet-j5lz4                                    1/1     Running   0          21m   172.19.0.4   kind-control-plane   <none>           <none>
kube-system          pod/kube-apiserver-kind-control-plane                1/1     Running   0          21m   172.19.0.4   kind-control-plane   <none>           <none>
kube-system          pod/kube-controller-manager-kind-control-plane       1/1     Running   0          21m   172.19.0.4   kind-control-plane   <none>           <none>
kube-system          pod/kube-proxy-b2lkg                                 1/1     Running   0          21m   172.19.0.4   kind-control-plane   <none>           <none>
kube-system          pod/kube-scheduler-kind-control-plane                1/1     Running   0          21m   172.19.0.4   kind-control-plane   <none>           <none>
local-path-storage   pod/local-path-provisioner-6c5f9c5b66-fpp95          1/1     Running   0          21m   10.244.0.2   kind-control-plane   <none>           <none>

NAMESPACE     NAME                                               TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
default       service/ingress-nginx-hansi-hinterseer             LoadBalancer   10.96.252.175   <pending>     80:31356/TCP,443:32386/TCP   11m   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
default       service/ingress-nginx-hansi-hinterseer-admission   ClusterIP      10.96.185.32    <none>        443/TCP                      11m   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
default       service/kubernetes                                 ClusterIP      10.96.0.1       <none>        443/TCP                      21m   <none>
kube-system   service/kube-dns                                   ClusterIP      10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP       21m   k8s-app=kube-dns

NAMESPACE     NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE   CONTAINERS    IMAGES                                          SELECTOR
kube-system   daemonset.apps/kindnet      1         1         1       1            1           kubernetes.io/os=linux   21m   kindnet-cni   docker.io/kindest/kindnetd:v20241212-9f82dd49   app=kindnet
kube-system   daemonset.apps/kube-proxy   1         1         1       1            1           kubernetes.io/os=linux   21m   kube-proxy    registry.k8s.io/kube-proxy:v1.29.12             k8s-app=kube-proxy

NAMESPACE            NAME                                             READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS               IMAGES                                                                                                                     SELECTOR
default              deployment.apps/ingress-nginx-hansi-hinterseer   1/1     1            1           11m   controller               registry.k8s.io/ingress-nginx/controller:v1.12.0@sha256:e6b8de175acda6ca913891f0f727bca4527e797d52688cbe9fec9040d6f6b6fa   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
kube-system          deployment.apps/coredns                          2/2     2            2           21m   coredns                  registry.k8s.io/coredns/coredns:v1.11.1                                                                                    k8s-app=kube-dns
local-path-storage   deployment.apps/local-path-provisioner           1/1     1            1           21m   local-path-provisioner   docker.io/kindest/local-path-provisioner:v20241212-8ac705d0                                                                app=local-path-provisioner

NAMESPACE            NAME                                                       DESIRED   CURRENT   READY   AGE   CONTAINERS               IMAGES                                                                                                                     SELECTOR
default              replicaset.apps/ingress-nginx-hansi-hinterseer-f6799499b   1         1         1       11m   controller               registry.k8s.io/ingress-nginx/controller:v1.12.0@sha256:e6b8de175acda6ca913891f0f727bca4527e797d52688cbe9fec9040d6f6b6fa   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=f6799499b
kube-system          replicaset.apps/coredns-76f75df574                         2         2         2       21m   coredns                  registry.k8s.io/coredns/coredns:v1.11.1                                                                                    k8s-app=kube-dns,pod-template-hash=76f75df574
local-path-storage   replicaset.apps/local-path-provisioner-6c5f9c5b66          1         1         1       21m   local-path-provisioner   docker.io/kindest/local-path-provisioner:v20241212-8ac705d0
  • kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
NAME                                             READY   STATUS    RESTARTS   AGE
ingress-nginx-hansi-hinterseer-f6799499b-tgp5c   1/1     Running   0          12m
  • kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
Name:                     ingress-nginx-hansi-hinterseer
Namespace:                default
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.12.0
                          helm.sh/chart=ingress-nginx-4.12.0
Annotations:              meta.helm.sh/release-name: ingress-nginx
                          meta.helm.sh/release-namespace: default
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.96.252.175
IPs:                      10.96.252.175
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  31356/TCP
Endpoints:                10.244.0.9:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  32386/TCP
Endpoints:                10.244.0.9:443
Session Affinity:         None
External Traffic Policy:  Cluster
Internal Traffic Policy:  Cluster
Events:                   <none>

  • Current state of ingress object, if applicable:
    • Not needed

How to reproduce this issue:

  1. Install Kind https://kind.sigs.k8s.io/docs/user/quick-start/
  2. Install Helm https://helm.sh/docs/intro/install/
  3. Create Kind cluster
    • kind create cluster --image "kindest/node:v1.29.12"
  4. Fetch and install ingress nginx with helm
    • helm fetch --version 4.12.0 ingress-nginx/ingress-nginx --untar
    • cat values.yaml
      controller:
        name: hansi
      
    • helm install ingress-nginx ingress-nginx -f values.yaml
    
  5. Create an ingress resource. A service is not needed here since we are only testing the admission webhook
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hansi
spec:
  ingressClassName: nginx
  rules:
    - host: hansi-hinterseer.at
      http:
        paths:
          - backend:
              service:
                name: hansi
                port:
                  number: 80
            path: /
            pathType: Prefix
EOF
  6. Everything should succeed up to this point. Now change the ingress-nginx installation values and upgrade:
cat values.yaml
controller:
  name: hansi-hinterseer

helm upgrade ingress-nginx ingress-nginx -f values.yaml
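
Before creating the second ingress, the stale secret is already visible; its AGE shows it survived the upgrade and still holds the CA generated for the old name (exact secret name depends on the chart's naming):

# The admission secret predates the upgrade
kubectl get secrets -n default | grep admission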
  7. Create an ingress resource. Validation fails with an error:

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hinterseer
spec:
  ingressClassName: nginx
  rules:
    - host: hansi-hinterseer.at
      http:
        paths:
          - backend:
              service:
                name: hinterseer
                port:
                  number: 80
            path: /
            pathType: Prefix
EOF
Error from server (InternalError): error when creating "ingress-hinterseer.yaml": Internal error occurred: failed calling webhook
"validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://ingress-nginx-hansi-hinterseer-admission.default.svc:443/networking/v1/ingresses?timeout=10s":
tls: failed to verify certificate: x509: certificate is valid for ingress-nginx-hansi-admission,
ingress-nginx-hansi-admission.default.svc, not ingress-nginx-hansi-hinterseer-admission.default.svc

Anything else we need to know:
-

@botchk added the kind/bug label (Categorizes issue or PR as related to a bug.) on Jan 31, 2025
@k8s-ci-robot added the needs-triage label (Indicates an issue or PR lacks a `triage/foo` label and requires one.) on Jan 31, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@longwuyuan
Contributor

/remove-kind bug
The issue description you have provided is too cryptic and too succinct. It does not provide the information needed by readers of a GitHub issue to reproduce, or even understand, the problem.

The new bug report template asks questions that are visible in the description post. Answer those questions, and then kindly add the exact install or upgrade command used, the version of the controller you attempted to install, and so on.

@k8s-ci-robot added the needs-kind label (Indicates a PR lacks a `kind/foo` label and requires one.) and removed the kind/bug label on Jan 31, 2025
@botchk
Author

botchk commented Feb 3, 2025

@longwuyuan thanks for the feedback. I updated the description to make it clearer.

@longwuyuan
Contributor

Thanks. It is clearer now.
AFAIK, changing the name of the controller, and thereby automatically regenerating the webhook-certgen CA, is not an existing feature of the controller.

Also, there is no possibility that such a feature will be added and supported, because there are no resources to allocate for such work or for the future maintenance of such a feature.

Some use cases may seem to justify such a feature, but in normal use cases, where a change outside the cert impacts fields in the cert such as Subject and Subject Alternative Name, the answer would most often be a reinstall.

@botchk
Author

botchk commented Feb 4, 2025

It's definitely an edge case that can happen in different ways, e.g. a new installation finding an existing CA secret from a previous installation that was not cleaned up manually. The challenge mostly comes from needing to dive into the webhook certgen code to understand the actual issue, since the certificate verification error message does not directly point at the cause.

Issues like #5968 show that others are running into these cases.

Since the change would be concentrated in the webhook certgen service, I would be willing to contribute it myself. But as you already mentioned, this would then depend on whether there are resources to review and maintain it in the future.

@longwuyuan
Contributor

Thank you for the update, because now there is a point to discuss.

In the original issue description you are talking about changing the controller.name spec of an existing instance of the controller. In your latest post above you contradict that by mentioning a new installation finding an existing CA secret from a previous install that was not cleaned up. Help clarify this contradiction.

We have announced that if there is a PR that improves the controller to fix an existing problem, it will be categorized as a "bug-fix" and accepted, because it comes without the worries of a "new feature".

I think that currently the "normal" expectation for changing the controller.name spec is that it will not occur as an upgrade, and that the impact is similar to an uninstall and reinstall. Let me know your thoughts, and then the clarified issue description can be commented on by a maintainer developer.

@botchk
Author

botchk commented Feb 4, 2025

Sorry for the confusion. I indeed encountered this issue in two different ways, since the underlying common factor is the CA secret:

  • An upgrade of an existing Helm installation, changing only the controller.name. The name of the CA secret stays the same, so a new valid CA is not generated
  • A new Helm installation in a namespace that previously had an ingress-nginx installation using the same release name. The previous CA secret was not cleaned up manually, so the new installation does not generate a new valid CA secret (see the guard sketched below)
  • There are also other scenarios in which this issue can be encountered, e.g. an installation applied directly from manifests rather than Helm, where the --host parameters of the create secret job are changed due to custom requirements

I agree with your statement that changing the controller.name of an existing installation can be treated as a new installation. Nevertheless, since it can also affect new installations due to a leftover secret from a previous installation, could this be treated as a "bug-fix"? It would improve the installation experience, since otherwise knowledge of the CA cert generation and manual intervention are required.
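
For the second scenario, a pre-install guard that removes a leftover admission secret would avoid the stale CA entirely (a hypothetical sketch; the secret name is illustrative and must match your release):

# Hypothetical guard: drop a leftover admission secret from a previous release
# so the create job generates a fresh CA on install
kubectl delete secret -n default ingress-nginx-hansi-admission --ignore-not-found
helm install ingress-nginx ingress-nginx -f values.yaml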

@longwuyuan
Contributor

OK. My observations are:

  • If I am not wrong, this relates to code that was forked and is now maintained in this project. So that means, if there is any complication involved, maintenance by the project will not be done.

  • I agree that if you change the controller.name spec, a user will land in this trouble. But now it is clear that the project has to decide whether changing controller.name is a new feature or a newly supported use case. We are not creating new features for sure, and now we know we are unable to support new use cases like this one.

  • It cannot be a bug, because we made multiple releases last year, so more users would have reported this as a bug. Since only the rare use case you describe is the problem, this is not a bug.

  • So it seems that the lack of documentation for the uninitiated user, on the certgen and the related jobs/patches and on handling this problem, is an area of potential improvement.

  • I think you can propose documentation in the FAQ section.

  • If you do want to address this with code, I think editing the issue description with a proper summary is the first requirement. An absolute requirement is also capturing the perspectives and realities of both you and the project, as per my comments. The heavily occupied developer maintainers may then look at such an informative PR and comment on it.

  • Sorry for the situation, but the critical information here is that we actually deprecated several highly used and highly beneficial features, all because we cannot allocate resources for anything more than the Ingress API spec that the controller is expected to deliver securely out of the box.

@botchk
Author

botchk commented Feb 5, 2025

Thanks for your elaborate response. I don't want to keep you busy any longer with a non-critical issue, and will therefore close it. I don't think FAQ documentation would add much here. Threads like #5968 are already quite helpful for understanding the problem and resolving it manually. Similarly, there is now also this thread to point people in the right direction.

@longwuyuan
Contributor

I wish to emphasize that the intention to improve is extremely high. But we have these situations, arising out of a lack of developer time, where the precise processing of a problem is not conventional.
