TLS auth secret not found using controller v1.11.4 #12620

Closed
vramperez opened this issue Jan 2, 2025 · 15 comments
Labels
  • needs-kind: Indicates a PR lacks a `kind/foo` label and requires one.
  • needs-priority
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • triage/needs-information: Indicates an issue needs more information in order to work on it.

Comments

@vramperez

vramperez commented Jan 2, 2025

What happened:

After updating the Helm chart to version 4.11.4 (controller v1.11.4), the controller can no longer find the secret specified with the nginx.ingress.kubernetes.io/auth-tls-secret annotation. This had been working fine in previous versions, and the secret still exists with the same value. As a result, all requests return a 403. I also tried chart version 4.12.0 in case it was an incompatibility with Kubernetes 1.31, but the behaviour is the same.

k logs ingress-nginx-controller-vb8vw --tail 100000 | grep -i "ca-secret"
E0102 09:48:02.038394       7 annotations.go:219] "error reading Ingress annotation" err="error obtaining certificate: local SSL certificate internal-develop-ci/ca-secret was not found" name="CertificateAuth" ingress="internal-develop-ci/app-tls-ingress"

The error also appears in nginx.conf:

# error obtaining certificate: local SSL certificate internal-develop-ci/ca-secret was not found
return 403;
  • Ingress object:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: prod
    meta.helm.sh/release-name: app
    meta.helm.sh/release-namespace: internal-develop-ci
    nginx.ingress.kubernetes.io/auth-tls-pass-certificate-to-upstream: "true"
    nginx.ingress.kubernetes.io/auth-tls-secret: internal-develop-ci/ca-secret
    nginx.ingress.kubernetes.io/auth-tls-verify-client: optional
    nginx.ingress.kubernetes.io/auth-tls-verify-depth: "3"
    nginx.ingress.kubernetes.io/backend-protocol: HTTP
    nginx.ingress.kubernetes.io/configuration-snippet: |
      if ($ssl_client_verify != "SUCCESS") { return 403; }
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/use-regex: "true"
  creationTimestamp: "2024-07-08T07:48:06Z"
  generation: 7
  labels:
    app.kubernetes.io/managed-by: Helm
  name: app-tls-ingress
  namespace: internal-develop-ci
  resourceVersion: "511512779"
  uid: 7a3b7ed6-e1ff-411c-907c-849ccb3f5efc
spec:
  ingressClassName: nginx
  rules:
  - host: app.develop.ci.help
    http:
      paths:
      - backend:
          service:
            name: app-svc
            port:
              number: 80
        path: /pemea
        pathType: Prefix
  tls:
  - hosts:
    - app.develop.ci.help
    secretName: app-tls-secret
status:
  loadBalancer:
    ingress:
    - hostname: a0fd247c95da34021b7ad53b770c03dd-0187727592fb99ea.elb.eu-west-1.amazonaws.com

What you expected to happen:

That it would continue to work normally, validating the client certificate against the CA chain specified in the ca.crt field of the ca-secret secret.
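
For reference, a quick way to exercise the mutual-TLS path from outside the cluster is a curl call with a client certificate; client.crt and client.key below are placeholders for a certificate issued by one of the CAs in ca-secret:

# With auth-tls-verify-client: optional, the configuration-snippet above
# returns 403 whenever the client certificate is missing or not verified
curl -v --cert client.crt --key client.key https://app.develop.ci.help/pemea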

NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version):

ingress-nginx-controller-6qdqb:/etc/nginx$ /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.11.4
  Build:         ba0f2ee37f032c9f11967b74862c60a43ed59b36
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.25.5

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version):

kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.31.3-eks-56e63d8

Environment:

  • How was the ingress-nginx-controller installed:
    • If helm was used then please show output of helm ls -A | grep -i ingress:
NAME         	NAMESPACE          	REVISION	UPDATED                                	STATUS  	CHART               	APP VERSION
ingress-nginx	nginx-ingress	64      	2025-01-02 10:47:15.078656077 +0100 CET	deployed	ingress-nginx-4.11.4	1.11.4 
  • If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>
USER-SUPPLIED VALUES:
controller:
  admissionWebhooks:
    patch:
      image:
        digest: ""
  allowSnippetAnnotations: true
  config:
    proxy-body-size: 600m
    stream-access-log-path: /dev/null
    use-proxy-protocol: false
  image:
    digest: ""
    digestChroot: ""
  ingressClass: nginx
  ingressClassResource:
    default: true
    name: nginx
  kind: DaemonSet
  livenessProbe:
    periodSeconds: 30
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      namespace: monitoring
  priorityClassName: app-infra-priority
  readinessProbe:
    periodSeconds: 3
  resources:
    limits:
      cpu: 900m
    requests:
      cpu: 100m
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "600"
      service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
      service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
    nodePorts:
      http: "31685"
      https: "30761"
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
tcp:
  "5349": internal-develop-ci/app-webrtc-turn-svc:5349
udp: {}

  • Current State of the controller:
    • kubectl describe ingressclasses
  kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.11.4
              helm.sh/chart=ingress-nginx-4.11.4
Annotations:  ingressclass.kubernetes.io/is-default-class: true
              meta.helm.sh/release-name: ingress-nginx
Controller:   k8s.io/ingress-nginx
Events:       <none>
@vramperez added the kind/bug label (Categorizes issue or PR as related to a bug.) on Jan 2, 2025
@k8s-ci-robot added the needs-triage label (Indicates an issue or PR lacks a `triage/foo` label and requires one.) on Jan 2, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@longwuyuan
Contributor

Can you remove the namespace and use only the secret name, to see how the error message changes?

@longwuyuan
Contributor

/triage needs-information

@k8s-ci-robot added the triage/needs-information label (Indicates an issue needs more information in order to work on it.) on Jan 3, 2025
@vramperez
Author

Hello, thank you for your quick response.

If I delete the namespace, the error says that the format is invalid. More specifically:

E0103 07:40:46.458866       7 annotations.go:219] "error reading Ingress annotation" err="location denied, reason: invalid format (namespace/name) found in 'ca-secret'" name="CertificateAuth" ingress="internal-develop-ci/app-tls-ingress"

It's weird, because if I roll back with Helm to the immediately previous version, 4.11.3 (controller v1.11.3), everything works perfectly again without changing anything else.
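
For anyone reproducing this, the rollback amounts to pinning the chart back to 4.11.3, assuming the usual chart repo alias (the release lives in the nginx-ingress namespace here):

# Pin the chart version while keeping the values shown earlier in this issue
helm -n nginx-ingress upgrade ingress-nginx ingress-nginx/ingress-nginx --version 4.11.3 --reuse-values
# ...or simply roll back to the previous Helm revision
helm -n nginx-ingress rollback ingress-nginx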

@longwuyuan
Contributor

Can you share your screen or write up steps to reproduce this on a kind cluster?

@longwuyuan
Contributor

The problem is that readers have to trust that you have not made a human error, such as creating the secret without a CA. Secrets usually contain only a cert and key, so we need proof of how you added the CA to that secret.

@longwuyuan
Contributor

I recall creating a secret with three components (cert, key, and CA cert), but I would have to go back and search for how I did that for a test.
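
For reference, one way to create such a three-component secret with kubectl, assuming the cert, key, and CA bundle exist as local files (the secret name below is just a placeholder; for the auth-tls-secret annotation the controller reads the ca.crt key):

kubectl -n internal-develop-ci create secret generic my-mtls-secret \
  --from-file=tls.crt=tls.crt \
  --from-file=tls.key=tls.key \
  --from-file=ca.crt=ca.crt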

@vramperez
Author

vramperez commented Jan 3, 2025

Hi, right now I don't have time to put together a minimal example to reproduce it, but I will post one here as soon as I can.

For the moment, I'm sharing the content of the ca.crt field of the ca-secret secret that the controller says it cannot find after the update (in JSON format due to GitHub's attachment restrictions on YAML files).

kubectl -n internal-develop-ci get secret ca-secret -o json > ca-secret.json
(attachment: ca-secret.json)

As can be seen, it is just the Ubuntu 24.04 trusted CA chain.

I understand that I need to demonstrate how to reproduce the error with a minimal example, but I thought that by reviewing the controller changes from v1.11.3 to v1.11.4 you might be able to spot a change that affects this.

@longwuyuan
Contributor

longwuyuan commented Jan 3, 2025

I could help if you explain everything about that secret: who issued the cert, how you created the secret, the kubectl describe of the secret (with hashes redacted), the kubectl get of the secret, the kubectl get events of that namespace, the logs, and so on.
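
For completeness, the commands being asked for would be along these lines (namespace and resource names taken from this report; adjust as needed):

kubectl -n internal-develop-ci describe secret ca-secret
kubectl -n internal-develop-ci get secret ca-secret -o yaml
kubectl -n internal-develop-ci get events --sort-by=.lastTimestamp
kubectl -n nginx-ingress logs ds/ingress-nginx-controller --tail=200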

@vramperez
Author

Hi, I've found the problem. I hadn't seen this error line, which is more descriptive:
W0103 11:19:24.955854 7 backend_ssl.go:47] Error obtaining X.509 certificate: unexpected error creating SSL Cert: x509: negative serial number

It turns out that the ca-secret secret that produces the error, which is a CA chain, contains a CA with a negative serial number. Up to controller v1.11.3 this was not a problem, but from v1.11.4 onwards it is, because Go has been updated to 1.23.3 and, although it is not mentioned in the release notes, x509.ParseCertificate now documents:

Before Go 1.23, ParseCertificate accepted certificates with negative serial numbers. This behavior can be restored by including "x509negativeserial=1" in the GODEBUG environment variable.

Thanks to this stackoverflow post.
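
A possible stop-gap, untested here, is to set that GODEBUG value on the controller pods, for example directly on the DaemonSet (resource names below match this install and may differ elsewhere):

kubectl -n nginx-ingress set env daemonset/ingress-nginx-controller GODEBUG=x509negativeserial=1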

I will try to update the CA chain so that it no longer contains a certificate with a negative serial number, which should solve my case.
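
To find which certificate in the bundle is the offender, something along these lines should work: split the PEM bundle and print each certificate's serial and subject; a negative serial shows up with a leading minus sign (exact formatting varies by OpenSSL version).

# Dump the bundle from the secret (key name ca.crt, as in this issue)
kubectl -n internal-develop-ci get secret ca-secret -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
# Split into one file per certificate and print each serial and subject
awk 'BEGIN{n=0} /BEGIN CERTIFICATE/{n++} {print > ("ca-" n ".pem")}' ca.crt
for f in ca-*.pem; do echo "$f: $(openssl x509 -in "$f" -noout -serial -subject)"; done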

@longwuyuan
Contributor

Thanks for updating. Please close the issue once it is resolved.

@vramperez
Author

I'm closing the issue: I can confirm that once there is no CA with a negative serial number in the chain, the problem is solved. Thanks!

@longwuyuan
Contributor

/remove-kind bug

@k8s-ci-robot k8s-ci-robot added needs-kind Indicates a PR lacks a `kind/foo` label and requires one. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jan 3, 2025
@logan2211

This issue is affecting me when using the default Debian bullseye trust store located at /etc/ssl/certs/ca-certificates.crt as the ca.crt contents in my nginx.ingress.kubernetes.io/proxy-ssl-secret.

After v1.11.3, ingress-nginx throws "backend_ssl.go:47] Error obtaining X.509 certificate: unexpected error creating SSL Cert: x509: negative serial number" and fails to read the default CA certificate trust store from Debian bullseye (and most likely many other distros).

This is caused by golang/go#65085 / https://go-review.googlesource.com/c/go/+/562343
The commit itself specifically acknowledges: "There is ... one trusted certificate I could find in the web pki which has a negative serial number."

CATCert (who is responsible for this root cert) successfully argued for its inclusion in Mozilla's trust store, stating:

*** We agree with that point, since 2002 the RFC 3280 and later RFC 5280 force CAs not to issue negative numbers. The fact is that our root CA created in 2003 is not accomplishing that point because in design time was used the RFC2459 that didn't have this requirement about serial numbers. Anyway it is specified in RFCs 3280 and 5280 that "Certificate users SHOULD be prepared to gracefully handle such certificates", so the practical experience is that we have not been reported yet for any interoperability problems caused by this issue. Furthermore we are planning to create another root certificate during 2012 with SHA-256 algorithm, so we will take special care at this point. We hope this will not be a blocking point for the inclusion of CATCert's root certificate in Firefox navigator.

To me it seems premature that the commit was merged in conflict with past precedent regarding this particular root certificate, knowingly breaking Go's ability to accept operating-system default PKI trust stores.

Trust-manager recently worked around Go's regression by allowing negative serial numbers when using Go 1.23, which is how I would propose ingress-nginx resolve this absent a better solution.

@longwuyuan
Contributor

@logan2211 we rely on feedback, so this could be significant. An action item becomes implied for the project if enough users are impacted.

But a discussion is needed, so please join the Slack channel ingress-nginx-dev and see if the maintainers engage. You can also join the ingress-nginx community meeting: https://github.com/kubernetes/community/tree/master/sig-network
