This repository has been archived by the owner on Nov 3, 2024. It is now read-only.

Children of objects handled in Response are pruned, although they aren't touched by the Router #91

Open
antoineco opened this issue Jul 7, 2023 · 1 comment

Comments

@antoineco

antoineco commented Jul 7, 2023

Context

I am using baaah to manage Tekton TaskRun objects.

When baaah creates those TaskRuns, they are automatically labeled with apply.acorn.io/hash, as expected.

Problem

Tekton propagates labels from the parent TaskRun to its child Pods. baaah, however, uses the aforementioned label to determine which objects to prune, and ends up pruning Pods it doesn't manage.

Demonstration

Here is a debug session as an illustration; the code snippets show where I set breakpoints:

existing, err := a.list(gvk, set, objs)
if err != nil {
return fmt.Errorf("failed to list %s for %s: %w", gvk, debugID, err)
}

(dlv) print set
k8s.io/apimachinery/pkg/labels.Selector(k8s.io/apimachinery/pkg/labels.internalSelector) [
        {
                key: "apply.acorn.io/hash",
                operator: "=",
                strValues: []string len: 1, cap: 1, [
                        "576cd23303466727e7fab759412925c6f2bf8934",
                ],},
]
(dlv) print existing
map[github.com/acorn-io/baaah/pkg/apply/objectset.ObjectKey]sigs.k8s.io/controller-runtime/pkg/client.Object [
        {Name: "pod-managed-by-tekton", Namespace: "default"}: *k8s.io/api/core/v1.Pod {
                TypeMeta: (*"k8s.io/apimachinery/pkg/apis/meta/v1.TypeMeta")(0xc000a3a000),
                ObjectMeta: (*"k8s.io/apimachinery/pkg/apis/meta/v1.ObjectMeta")(0xc000a3a020),
                Spec: (*"k8s.io/api/core/v1.PodSpec")(0xc000a3a108),
                Status: (*"k8s.io/api/core/v1.PodStatus")(0xc000a3a328),},
        {Name: "pod-managed-by-baaah", Namespace: "default"}: *k8s.io/api/core/v1.Pod {
                TypeMeta: (*"k8s.io/apimachinery/pkg/apis/meta/v1.TypeMeta")(0xc000a3a428),
                ObjectMeta: (*"k8s.io/apimachinery/pkg/apis/meta/v1.ObjectMeta")(0xc000a3a448),
                Spec: (*"k8s.io/api/core/v1.PodSpec")(0xc000a3a530),
                Status: (*"k8s.io/api/core/v1.PodStatus")(0xc000a3a750),},
]

var toReplace []objectset.ObjectKey
toCreate, toDelete, toUpdate := compareSets(existing, objs)
// check for resources in the objectset but under a different version of the same group/kind
toDelete = a.filterCrossVersion(allObjs, gvk, toDelete)

(dlv) print toDelete
[]github.com/acorn-io/baaah/pkg/apply/objectset.ObjectKey len: 1, cap: 1, [
        {
                Name: "pod-managed-by-tekton",
                Namespace: "default",},
]

As seen above, baaah matches the Pod managed by Tekton and ends up pruning it, even though the Router's client never interacted with that particular Pod.

@antoineco
Author

antoineco commented Jul 8, 2023

Here is a sample program that reproduces the issue. Tekton must be running.

go.mod

module repro

go 1.20

require (
        github.com/acorn-io/baaah v0.0.0-20230707151126-5d519d272865
        github.com/tektoncd/pipeline v0.47.0
        k8s.io/api v0.27.3
        k8s.io/apimachinery v0.27.3
)

main.go

package main

import (
	"context"
	"fmt"
	"os/signal"
	"syscall"

	"github.com/acorn-io/baaah"
	"github.com/acorn-io/baaah/pkg/router"

	corev1 "k8s.io/api/core/v1"
	kerrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/runtime"

	pipelinev1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1"
)

func main() {
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer cancel()

	r, err := newRouter()
	if err != nil {
		panic(fmt.Errorf("initializing router: %w", err))
	}

	if err = r.Start(ctx); err != nil {
		panic(fmt.Errorf("starting router: %w", err))
	}

	select {}
}

func newRouter() (*router.Router, error) {
	s := runtime.NewScheme()

	var sb runtime.SchemeBuilder
	sb.Register(corev1.AddToScheme, pipelinev1.AddToScheme)

	if err := sb.AddToScheme(s); err != nil {
		return nil, fmt.Errorf("adding types to scheme: %w", err)
	}

	r, err := baaah.DefaultRouter("issue91", s)
	if err != nil {
		return nil, fmt.Errorf("creating baaah router: %w", err)
	}

	sel, err := labels.Parse("issue-number=91")
	if err != nil {
		return nil, fmt.Errorf("parsing labels selector: %w", err)
	}

	r.Type(&corev1.Pod{}).
		Selector(sel).
		HandlerFunc(func(req router.Request, resp router.Response) error {
			p := req.Object.(*corev1.Pod)

			tr := &pipelinev1.TaskRun{}
			err := req.Get(tr, p.Namespace, p.Name)
			switch {
			case kerrors.IsNotFound(err):
				tr = newTaskRun(p.Namespace, p.Name)
			case err != nil:
				return fmt.Errorf("getting TaskRun: %w", err)
			}

			resp.Objects(tr)
			return nil
		})

	return r, nil
}

func newTaskRun(ns, name string) *pipelinev1.TaskRun {
	return &pipelinev1.TaskRun{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: ns,
			Name:      name,
		},
		Spec: pipelinev1.TaskRunSpec{
			TaskSpec: &pipelinev1.TaskSpec{
				Steps: []pipelinev1.Step{{
					Image:   "alpine",
					Command: []string{"sleep", "20"},
				}},
			},
		},
	}
}

go run ./
kubectl run --labels='issue-number=91' --image nginx:alpine issue91
watch -n 0.3 kubectl get pods,taskruns.tekton.dev

Notice how the Pod issue91-pod gets terminated by baaah, then re-created by Tekton, multiple times until Tekton gives up:

Every 0.3s: kubectl get pods,taskruns.tekton.dev

NAME              READY   STATUS        RESTARTS   AGE
pod/issue91       1/1     Running       0          12s
pod/issue91-pod   0/1     Terminating   0          4s    <----------- pruned by baaah

NAME                         SUCCEEDED   REASON    STARTTIME   COMPLETIONTIME
taskrun.tekton.dev/issue91   Unknown     Pending   11s
