Reconsider design of restart events issued when rolling Kafka Pods #10958

scholzj · 2024-12-15T22:27:04Z

Currently, when the Kafka pods are rolled, we issue Kubernetes Events describing the reason for the restart. It is done only for the Kafka, Connect and MM2 node restarts. The events are issued to the Pods as the main objects.

This approach has several issues:

- Pods have many events when they are restarted, so the restart reason event issued by the Strimzi CO is easily lost among them.
- Very often, the restart reason is "Pod has old revision", which means that the Pod definition has changed. But the root cause of the change could be, for example, an updated listener certificate or something similar, and the event does not show that.

I think we should consider the future of the events used by the operator. Two options for how to deal with them come to mind:

1. We can issue them with the custom resource as the main object they reference (i.e. the regarding field). That would make it easier to find the events as the custom resource will have only our events and not the events related to the Pod lifecycle. The Pod might be referenced as the related resource if needed. Issuing the events to the custom resource might also make it easier to consider other situations when we might want to issue events.
2. We can simply remove them. There seem to be some users using them, but I think it is a relatively small number of users. So removing them might help to simplify our codebase and testing.
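
To illustrate the first option, here is a minimal sketch of what such an event body could look like in the events.k8s.io/v1 API, with the custom resource in the regarding field and the Pod in the related field. It is written in Python purely for illustration and is not the operator's code. The helper names, the controller and instance names, and the action and reason strings are all assumptions.

```python
from datetime import datetime, timezone


def object_ref(api_version, kind, namespace, name):
    """Build a Kubernetes ObjectReference as a plain dict."""
    return {
        "apiVersion": api_version,
        "kind": kind,
        "namespace": namespace,
        "name": name,
    }


def build_restart_event(namespace, cluster, pod, reason, note):
    """Build an events.k8s.io/v1 Event that points at the Kafka custom resource.

    Hypothetical helper: the controller, instance, and action values below
    are illustrative assumptions, not confirmed operator behavior.
    """
    now = datetime.now(timezone.utc)
    return {
        "apiVersion": "events.k8s.io/v1",
        "kind": "Event",
        "metadata": {
            "generateName": f"{cluster}-restart-",
            "namespace": namespace,
        },
        # eventTime is a MicroTime: RFC 3339 with microsecond precision.
        "eventTime": now.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        "reportingController": "strimzi.io/cluster-operator",  # assumed name
        "reportingInstance": "strimzi-cluster-operator",  # assumed name
        "action": "StrimziInitiatedPodRestart",  # assumed action string
        "reason": reason,
        "type": "Normal",
        "note": note,
        # Main object: the custom resource, so its event list holds only our events.
        "regarding": object_ref("kafka.strimzi.io/v1beta2", "Kafka", namespace, cluster),
        # Secondary object: the Pod that is actually being restarted.
        "related": object_ref("v1", "Pod", namespace, pod),
    }


event = build_restart_event(
    namespace="kafka",
    cluster="my-cluster",
    pod="my-cluster-kafka-0",
    reason="PodHasOldRevision",  # illustrative reason string
    note="Listener certificate was updated, so the Pod definition changed",
)
```

With this shape, the note field could also carry the root cause (such as the updated certificate) rather than only the generic restart reason.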

ppatierno · 2024-12-17T12:56:28Z

I have never used them directly, but my guess is that Kube events are useful for some users (as you said, we do have such users, even if not that many). Even the idea of integrating self-healing could rely on events in the future.
My take on this is to go with option 1. I agree that our events could be "missed" among the Pod lifecycle events. My question is: which custom resource are you referring to? Our Pods are related to the StrimziPodSet resource, so are you referring to it and then using "related" to specify the specific Pod?

scholzj · 2024-12-17T13:26:14Z

I was thinking of the Kafka, KafkaConnect, KafkaMirrorMaker2, etc. resources.

katheris · 2025-01-02T15:37:09Z

I would also vote for option 1 of having them use the custom resource as the main object they reference. I think it is useful to be able to see the restart as an event rather than having to always look through the operator logs.

scholzj added enhancement needs-triage labels Dec 15, 2024

scholzj changed the title ~~Reconsider design restart events issued when rolling Kafka Pods~~ Reconsider design of restart events issued when rolling Kafka Pods Dec 15, 2024
