Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support exec readinessProbes for sidecar serving containers #15484

Open
FloMedja opened this issue Aug 26, 2024 · 10 comments · May be fixed by #15773
Open

Support exec readinessProbes for sidecar serving containers #15484

FloMedja opened this issue Aug 26, 2024 · 10 comments · May be fixed by #15773
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Issues which should be fixed (post-triage)

Comments

@FloMedja
Copy link

FloMedja commented Aug 26, 2024

Knative version v1.15.0

Expected Behavior

The Knative serving supports the exec readiness probes for serving sidecar containers.

I call serving sidecar containers the containers that didn't have the port defined.

Actual Behavior

The knative serving reconciler triggers an error because the exec readiness probe is not supported in the serving sidecar containers. It is due to the fact that the applyReadinessProbeDefaults is used to compute the probes for the main and sidecar serving containers. The function needs a port to create a TCPProbe in the queue container. That TCPProbe is required for serving containers exec readiness probes.

Question : Why do we need to have a TCP Probe on the queue container for the exec probes of the serving containers ?

Reflection : If ports definition were allowed for sidecar containers, this issue would be trivial to fix. I see that an issue is open for but is on ice for a while.

Contribution: I can help fixing the issue when we will decide on a common implementation.

Steps to Reproduce the Problem

  1. Create a knative serving with this yaml :
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  annotations:
    # Add necessary annotations
    serving.knative.dev/istio-gateways: knative-serving/knative-external-ingress-gateway # Change If you don't use istio gateway
  name: helloworld-tcp-error
  namespace: default # Customize your namespace
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/max-scale: "1"
        autoscaling.knative.dev/min-scale: "1"
    spec:
      containerConcurrency: 0
      containers:
        - image: ghcr.io/knative/helloworld-go:latest
          name: app
          env:
            - name: TARGET
              value: "Go Sample v1"
          ports:
            - containerPort: 8080
              protocol: TCP
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - cat /proc/1/status
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - "CAP_SYS_ADMIN"
            runAsNonRoot: true
            runAsUser: 1000
            seccompProfile:
              type: RuntimeDefault
        - name: sidecar
          image: registry.k8s.io/busybox
          args:
            - /bin/sh
            - -c
            - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - cat /proc/1/status
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - "CAP_SYS_ADMIN"
            runAsNonRoot: true
            runAsUser: 1000
            seccompProfile:
              type: RuntimeDefault
      timeoutSeconds: 60
  traffic:
    - latestRevision: true
      percent: 100
  1. retrieve the container events with kubectl events -n <namespace> you will see an error similar as :
5s (x21 over 38m)        Warning   InternalError           Revision/helloworld-tcp-error-00001                            failed to create deployment "helloworld-tcp-error-00001-deployment": failed to make deployment: failed to create PodSpec: failed to create queue-proxy container: sidecar readiness probe does not define probe port on container: sidecar
@FloMedja FloMedja added the kind/bug Categorizes issue or PR as related to a bug. label Aug 26, 2024
@github-staff github-staff deleted a comment Aug 27, 2024
@FloMedja
Copy link
Author

@skonto Can I have your input on this please ?
Thanks :)

@FloMedja
Copy link
Author

@ReToCode Can I have your input on this issue please ?
Thanks :)

@github-staff github-staff deleted a comment from Lxx-c Oct 23, 2024
@github-staff github-staff deleted a comment from Lxx-c Oct 23, 2024
@github-staff github-staff deleted a comment from Lxx-c Oct 23, 2024
@FloMedja
Copy link
Author

FloMedja commented Nov 7, 2024

@skonto @ReToCode hi, gentle ping :)

Copy link

github-actions bot commented Feb 6, 2025

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 6, 2025
@FloMedja
Copy link
Author

FloMedja commented Feb 6, 2025

/remove-lifecycle stale

@knative-prow knative-prow bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 6, 2025
@skonto
Copy link
Contributor

skonto commented Feb 7, 2025

Hi @FloMedja,

We need to probe early at the QP side to (before kubelet to speed up readiness) and by design (there are more details in the doc) we don't allow exec probes because we cant call exec from the queue proxy. There is a special treatment for that case.

cc @dprotaso

@FloMedja
Copy link
Author

FloMedja commented Feb 7, 2025

Thanks @skonto . The documentation explains in more detail what I noticed in the code. At the moment , the exec probe on the main container is delegate to Kubernetes. In that case the QP rely on a TCP probe to the exposed port of the main container. When a exec probe is defined on a sidecar container it is just ignored.

Would it be a good idea to also delegate the exec probe to Kubernetes when defined on sidecar containers ? The QP won't have a TCP Probe to rely on in that case, but that won't be an issue since no traffic should be directly forwarded to the sidecar container.

@dprotaso
Copy link
Member

dprotaso commented Feb 9, 2025

Would it be a good idea to also delegate the exec probe to Kubernetes when defined on sidecar containers ?

I think this is the path forward. If there is any exec probes in the Pod we should just defer to kubernetes to signal the health of that Pod and avoid any optimizations that we have.

We already do this with exec probes on the main container

/triage accepted

@knative-prow knative-prow bot added the triage/accepted Issues which should be fixed (post-triage) label Feb 9, 2025
@FloMedja
Copy link
Author

@dprotaso @skonto Thanks for the answers. I will move forward with the implementation.

@FloMedja
Copy link
Author

FloMedja commented Feb 12, 2025

@dprotaso @skonto I have a MR #15773. I will need a member ok knative organization to review and mark it as /ok-to-test.
Thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Issues which should be fixed (post-triage)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants
@dprotaso @skonto @FloMedja and others