Currently, when the Kafka pods are rolled, we issue Kubernetes Events describing the reason for the restart. This is done only for the Kafka, Connect, and MM2 node restarts, and the events are issued with the Pods as their main objects.
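For context, this means the restart reason lands in the Pod's own event stream. A minimal sketch of pulling those events back out, assuming the Fabric8 client and hypothetical namespace/Pod names (`myproject`, `my-cluster-kafka-0`):

```java
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class ListPodRestartEvents {
    public static void main(String[] args) {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            // The restart-reason events share the Pod's event stream with
            // kubelet/scheduler lifecycle events, which is why they get lost.
            client.events().v1().events().inNamespace("myproject").list().getItems().stream()
                    .filter(e -> e.getRegarding() != null
                            && "Pod".equals(e.getRegarding().getKind())
                            && "my-cluster-kafka-0".equals(e.getRegarding().getName()))
                    .forEach(e -> System.out.println(e.getReason() + ": " + e.getNote()));
        }
    }
}
```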
This approach has several issues:
- Pods accumulate many events when they are restarted. When the restart-reason event is issued by the Strimzi Cluster Operator, it is easily lost among them.
- Very often, the restart reason is `Pod has old revision`, which only means that the Pod definition has changed. The root cause of the change could be, for example, an updated listener certificate or something similar.
I think we should consider the future of the events used by the operator. Two options for how to deal with them come to my mind:
1. We can issue them with the custom resource as the main object they reference (i.e. the `regarding` field). That would make the events easier to find, as the custom resource would carry only our events and not the events related to the Pod lifecycle. The Pod might be referenced as the `related` resource if needed (see the sketch after this list). Issuing the events against the custom resource might also make it easier to cover other situations where we might want to issue events.
2. We can simply remove them. There seem to be some users relying on them, but I think it is a relatively small number. Removing them would help to simplify our codebase and testing.
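As a rough sketch of option 1 (not the operator's actual code; the reason, action, and resource names below are made up for illustration), issuing an event with the Kafka custom resource in `regarding` and the rolled Pod in `related` could look like this with the Fabric8 client:

```java
import io.fabric8.kubernetes.api.model.MicroTimeBuilder;
import io.fabric8.kubernetes.api.model.ObjectReference;
import io.fabric8.kubernetes.api.model.ObjectReferenceBuilder;
import io.fabric8.kubernetes.api.model.events.v1.Event;
import io.fabric8.kubernetes.api.model.events.v1.EventBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class RestartEventSketch {
    public static void main(String[] args) {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            String namespace = "myproject";

            // Hypothetical resource names, for illustration only.
            ObjectReference kafkaCr = new ObjectReferenceBuilder()
                    .withApiVersion("kafka.strimzi.io/v1beta2")
                    .withKind("Kafka")
                    .withName("my-cluster")
                    .withNamespace(namespace)
                    .build();
            ObjectReference pod = new ObjectReferenceBuilder()
                    .withApiVersion("v1")
                    .withKind("Pod")
                    .withName("my-cluster-kafka-0")
                    .withNamespace(namespace)
                    .build();

            Event event = new EventBuilder()
                    .withNewMetadata()
                        .withGenerateName("strimzi-restart-event-")
                        .withNamespace(namespace)
                    .endMetadata()
                    .withRegarding(kafkaCr)   // the Kafka CR becomes the main object
                    .withRelated(pod)         // the rolled Pod is attached as the related object
                    .withReason("PodHasOldRevision")
                    .withNote("Rolling update: Pod definition changed (e.g. updated listener certificate)")
                    .withType("Normal")
                    .withAction("StrimziRestart")
                    .withReportingController("strimzi.io/cluster-operator")
                    .withReportingInstance("strimzi-cluster-operator-0")
                    .withEventTime(new MicroTimeBuilder()
                            .withTime(ZonedDateTime.now().format(
                                    DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSXXX")))
                            .build())
                    .build();

            client.events().v1().events().inNamespace(namespace).resource(event).create();
        }
    }
}
```

With the custom resource as `regarding`, listing events for the Kafka resource would return only the operator's own events, while the `related` reference still identifies the specific Pod that was rolled.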
scholzj changed the title from "Reconsider design restart events issued when rolling Kafka Pods" to "Reconsider design of restart events issued when rolling Kafka Pods" on Dec 15, 2024.
I have never used them directly, but my guess is that Kube events are useful for some users (as you said, we have some, even if not many). Even the idea of integrating self-healing could rely on events in the future.
My take on this is to go with option 1. I agree that our events could be "missed" among the Pod lifecycle events. My question is which custom resource you are referring to: our pods belong to the StrimziPodSet resource, so are you referring to that and then using `related` to point at the specific Pod?
I would also vote for option 1, having the events use the custom resource as the main object they reference. It is useful to be able to see the restart as an event rather than always having to look through the operator logs.