
bug: blade create k8s container-container remove does not work as expected #1092

Closed
spencercjh opened this issue Jan 13, 2025 · 2 comments · Fixed by chaosblade-io/chaosblade-exec-cri#22

spencercjh (Contributor) commented Jan 13, 2025

Issue Description

bug report

Describe what happened (or what feature you want)

There is a bug in the container deletion logic for the containerd runtime in chaosblade-exec-cri. The current implementation only removes the container's metadata from the bucket via `ContainerService().Delete`; it does not delete the associated task (including its processes) or the container's rootfs snapshot.

This causes kubelet to fail to update the container state via CRI, which prevents a new container from being created for the affected Pod.
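
For reference, the failing path boils down to roughly the following (a minimal sketch, assuming c.cclient is a *containerd.Client as in the examples below):

// Approximately what the current implementation does: this only removes
// the container's metadata record from containerd's bucket, leaving the
// task (and its processes) and the rootfs snapshot behind.
if err := c.cclient.ContainerService().Delete(ctx, containerId); err != nil {
    return err
}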

Describe what you expected to happen

The target container should be restarted, with a non-zero exit code such as 137.

How to reproduce it (as minimally and precisely as possible)

  1. Use blade create k8s container-container remove to remove a container in a Pod in Kubernetes.
  2. Watch the Pods; the ready-container count SHOULD change (e.g. 1/1 to 0/1 during the experiment, then back to 1/1 once the new container starts).
  3. Instead, kubectl exec -it $THE_TARGET_POD -c $TARGET_CONTAINER -- bash leads to an internal error from the CRI because the container is not found.

Tell us your environment

  • containerd 1.6.5
  • the latest ChaosBlade

Anything else we need to know?

I think the current chaosblade-exec-cri implementation doesn't make much sense and deviates from the "cri" in the repo's name (which implies interacting with the container runtime via the CRI interface).
A pragmatic compromise is to follow containerd's cmd/ctr/commands/containers/containers.go to remove containers together with their associated resources.

Here are some examples that worked in my environment as fixes:

Use containerd.Container

// Assumed imports:
//   "syscall"
//   "github.com/containerd/containerd"

// Load the container
container, err := c.cclient.LoadContainer(ctx, containerId)
if err != nil {
    return err
}
// Look up the container's task (fails if the container has none)
task, err := container.Task(ctx, nil)
if err != nil {
    return err
}

// Stop the task if it's running
if task != nil {
    // Send SIGKILL
    if err := task.Kill(ctx, syscall.SIGKILL); err != nil {
        return err
    }
    // Delete the task (WithProcessKill kills any remaining processes first)
    if _, err := task.Delete(ctx, containerd.WithProcessKill); err != nil {
        return err
    }
}
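
If full removal is the goal, ctr's containers.go goes on to delete the container record together with its rootfs snapshot; a hedged sketch (but see my follow-up comment below: completely removing the container confuses kubelet's bookkeeping):

// Remove the container record and clean up its rootfs snapshot,
// mirroring what ctr does once the task has been deleted.
if err := container.Delete(ctx, containerd.WithSnapshotCleanup); err != nil {
    return err
}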

Use the existing containerd client's Service API

// Assumed imports:
//   "syscall"
//   tasks "github.com/containerd/containerd/api/services/tasks/v1"

// Kill the container's task directly via the tasks service.
if _, err := c.cclient.TaskService().Kill(ctx, &tasks.KillRequest{
	ContainerID: containerId,
	Signal:      uint32(syscall.SIGKILL),
}); err != nil {
	return err
}

return nil
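
This is the least invasive variant: the task dies from SIGKILL, the containerd CRI plugin observes the exit and reports the container as terminated with exit code 137 (128 + 9), and kubelet then recreates the container according to the Pod's restartPolicy.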

Use k8s.io/cri-client

// Assumed imports:
//   "time"
//   "go.opentelemetry.io/otel/trace/noop"
//   cripkg "k8s.io/cri-client/pkg"
//   "k8s.io/klog/v2"

klogger := klog.FromContext(ctx)
// Connect to containerd's CRI endpoint as a remote runtime service.
criSvc, err := cripkg.NewRemoteRuntimeService(
	"unix:///run/containerd/containerd.sock",
	15*time.Second,
	noop.NewTracerProvider(),
	&klogger,
)
if err != nil {
	return err
}

// Stop the container via CRI; 0 means no graceful-termination period.
if err := criSvc.StopContainer(ctx, containerId, 0); err != nil {
	return err
}

return nil
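
Note that the third argument to StopContainer is the grace period in seconds; passing 0 asks the runtime to kill the container right away, matching the SIGKILL behaviour of the other two variants. Going through CRI should also keep the CRI plugin's own status store consistent, since the stop is performed by the plugin itself.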

Any of the above will restart the container in the Pod and leave a restart record.

[screenshot: the restarted container's restart record]

spencercjh (Contributor, Author) commented Jan 14, 2025

In my real-world testing, crictl and kubelet behave abnormally if the container (with all its information) is removed completely, so I changed the code shown in the issue.

The main problem is that it corrupts the containerStatuses field of the Pod (from kubectl get pods -o yaml): the last container shown is wrong (it is not the one you removed), and the container's restart information and the Pod's restart count become inaccurate (e.g. the count stays at 0 forever). I'm sorry I don't have enough time to read through the kube-apiserver, kubelet, and containerd CRI plugin server code to explain why this happens.

I took a cursory look at the containerd CRI plugin code, which has a Status StatusStorage field on its container type; it looks like the CRI plugin stores additional container status information for use by Kubernetes.

xcaspar (Member) commented Jan 22, 2025

The core problem in ChaosBlade is that the stop-container operation is missing from its RemoveContainer implementation, which also lacks the task kill operation. All of the repair solutions above are fine; I suggest the second one, "Use the existing containerd client's Service API".
