-
From what I see in the code - KPO should use the delete_namespaced_pod method. Depending on the parameters you pass, different grace periods can be set: 0 for immediate deletion, the default coming from your pod template, or any other value you choose. Also, in case the pod cannot be deleted for whatever reason, you should see a log message. One scenario where I can see this happening is when the grace period is long enough and your running task also needs to respond to the various signals (likely TERM/HUP) that K8S uses to kill your task. If you have long-running, badly written C code that does not respond to signals, or you do not handle signal propagation in your container (for example by using bash as the entrypoint, see https://www.kaggle.com/code/residentmario/best-practices-for-propagating-signals-on-docker), the initial signals that K8S sends might be swallowed and the pod will continue running until the grace period expires. That's the theory. In theory, either forcing the grace period to 0 or properly handling signals and terminating your in-container process should do the job. But maybe there are other problems I am not aware of.
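For illustration, a minimal sketch of what forcing the grace period to 0 could look like on the operator - task id, name and image below are placeholders, and depending on your cncf.kubernetes provider version the import path may still be airflow.providers.cncf.kubernetes.operators.kubernetes_pod:

    from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

    long_task = KubernetesPodOperator(
        task_id="long_task",                  # placeholder
        name="long-task",                     # placeholder
        image="my-registry/long-job:latest",  # placeholder
        # Ask Kubernetes to delete the pod immediately (grace period 0) when
        # Airflow kills the task, instead of waiting for the pod's default
        # terminationGracePeriodSeconds.
        termination_grace_period=0,
        dag=dag,
    )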
-
Thanks @potiuk for the quick reply! I was actually able to reproduce this behavior with a PythonOperator that just waits for 30s. If you mark it as failed from the UI, it is marked red, but it keeps running in the background, and once you refresh the frontend after 30s the task is shown as green. For reference, this is the DAG:

    from datetime import datetime
    import time

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def wait_callable():
        # Sleep in one-second steps so progress shows up in the task log.
        for i in range(1, 30):
            print(f'sleep #{i}')
            time.sleep(1)

    default_args = {
        'owner': 'airflow',
        'start_date': datetime(2020, 3, 4)
    }

    dag = DAG("wait_dag_python",
              default_args=default_args,
              schedule=None)

    wait_task = PythonOperator(
        task_id='wait_task',
        python_callable=wait_callable,
        dag=dag
    )

The log file shows a printed line every 1s regardless of whether you fail the task or not.
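To check whether the signal theory from the reply above applies, one option is to instrument the callable to log (and exit on) any TERM it receives - a minimal sketch, with the caveat that this replaces any handler Airflow itself installs in the task process, and that whether a SIGTERM is delivered at all when you mark the task failed depends on the executor/runner setup:

    import signal
    import sys
    import time

    def wait_callable():
        def on_term(signum, frame):
            # If this line never shows up in the task log after marking the
            # task failed, no signal is reaching the process at all.
            print(f'received signal {signum}, stopping')
            sys.exit(1)

        # Note: this replaces whatever SIGTERM handler the task runner may
        # have installed for this process.
        signal.signal(signal.SIGTERM, on_term)

        for i in range(1, 30):
            print(f'sleep #{i}')
            time.sleep(1)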
-
We have a long-running task implemented as a KubernetesPodOperator.
When we mark the task as failed, it is marked red immediately in the UI and the rest of the pipeline finishes accordingly. However, Airflow does not kill the pod, and when the pod finishes, the task turns green even though the DAG finished long ago.
Is there any way to tell Airflow to kill the pod in Kubernetes when we mark the task as failed?
Tested with both versions 2.7.3 and 2.10.4.
Looks similar to #37263, but in our case is_delete_operator_pod does not have any effect.
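For reference, a simplified sketch of how the operator is configured (task id, name and image are placeholders). On newer cncf.kubernetes provider versions the deprecated is_delete_operator_pod flag is, as far as I know, spelled on_finish_action instead:

    from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

    long_pod_task = KubernetesPodOperator(
        task_id="long_pod_task",            # placeholder
        name="long-pod-task",               # placeholder
        image="our-registry/long-job:1.0",  # placeholder
        is_delete_operator_pod=True,        # deprecated spelling
        # on_finish_action="delete_pod",    # equivalent on newer providers
        dag=dag,
    )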