@SameerMesiah97
Contributor

@SameerMesiah97 SameerMesiah97 commented Dec 7, 2025

Description

This PR introduces a new on_finish_action mode, delete_active_pod, which deletes pods that are in the 'Pending' or 'Running' state when the task finishes. This allows Airflow to clean up zombie or orphaned pods that never reach a terminal phase.
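As a rough illustration of the intended semantics, the deletion decision can be sketched as a pure function. This is a simplified sketch, not the actual diff: the two enum classes are local stand-ins named after the identifiers quoted in this conversation, should_delete_pod is a hypothetical helper, and the real process_pod_deletion may apply additional checks.

```python
from enum import Enum


class OnFinishAction(str, Enum):
    """Local stand-in; values follow the mode names used in this PR."""

    DELETE_POD = "delete_pod"
    DELETE_SUCCEEDED_POD = "delete_succeeded_pod"
    DELETE_ACTIVE_POD = "delete_active_pod"  # new mode proposed here


class PodPhase(str, Enum):
    """Local stand-in for the Kubernetes pod phases relevant to this PR."""

    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"


def should_delete_pod(action: OnFinishAction, phase: PodPhase) -> bool:
    """Hypothetical helper: decide whether cleanup should delete the pod."""
    if action == OnFinishAction.DELETE_POD:
        # Unconditional deletion, regardless of phase.
        return True
    if action == OnFinishAction.DELETE_SUCCEEDED_POD:
        # Unchanged behavior: active pods are left alone.
        return phase == PodPhase.SUCCEEDED
    if action == OnFinishAction.DELETE_ACTIVE_POD:
        # New behavior: only pods that never reached a terminal phase.
        return phase in (PodPhase.PENDING, PodPhase.RUNNING)
    return False
```

Under this sketch, a 'Running' pod is deleted with delete_active_pod but kept with delete_succeeded_pod, which is the backward-compatibility property this PR aims to preserve.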

Why introduce a new flag instead of changing delete_succeeded_pod?

The behavior of delete_succeeded_pod was left unchanged to avoid breaking backward compatibility. Changing its semantics to also delete active pods could alter expectations for existing users who rely on the current behavior for debugging or operational workflows. Introducing a new explicit flag ensures clarity while preserving existing behavior.

Tests

The deletion logic inside process_pod_deletion previously had no direct unit test coverage.

This PR adds a parameterized test that verifies pod deletion behavior across all supported modes ('delete_pod', 'delete_succeeded_pod', and the new 'delete_active_pod') and pod phases ('Pending', 'Running', 'Completed and Succeeded'). This ensures that the new flag behaves correctly and that existing behavior remains stable.
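The shape of such a test can be sketched as a mode-by-phase table checked against a mocked pod manager. This is an assumption-laden illustration rather than the test added in this PR: the process_pod_deletion function below is a simplified stand-in that takes its collaborators as arguments, the pod is a SimpleNamespace instead of a real Kubernetes object, and the terminal phase is written as 'Succeeded'.

```python
from types import SimpleNamespace
from unittest import mock


def process_pod_deletion(on_finish_action, pod, pod_manager):
    """Simplified stand-in for the operator method under test."""
    phase = pod.status.phase
    should_delete = (
        on_finish_action == "delete_pod"
        or (on_finish_action == "delete_succeeded_pod" and phase == "Succeeded")
        or (on_finish_action == "delete_active_pod" and phase in ("Pending", "Running"))
    )
    if should_delete:
        pod_manager.delete_pod(pod)


# (on_finish_action, pod phase, whether delete_pod is expected to be called)
CASES = [
    ("delete_pod", "Pending", True),
    ("delete_pod", "Running", True),
    ("delete_pod", "Succeeded", True),
    ("delete_succeeded_pod", "Pending", False),
    ("delete_succeeded_pod", "Running", False),
    ("delete_succeeded_pod", "Succeeded", True),
    ("delete_active_pod", "Pending", True),
    ("delete_active_pod", "Running", True),
    ("delete_active_pod", "Succeeded", False),
]


def run_cases():
    """Run every case and report whether the observed call matched the table."""
    outcomes = []
    for mode, phase, expected in CASES:
        pod = SimpleNamespace(status=SimpleNamespace(phase=phase))
        pod_manager = mock.Mock()  # records calls instead of talking to a cluster
        process_pod_deletion(mode, pod, pod_manager)
        outcomes.append(pod_manager.delete_pod.called == expected)
    return outcomes
```

In a pytest suite, the same table would typically be passed to pytest.mark.parametrize so that each combination is reported as its own test case.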

Existing tests have been updated where applicable to handle the 'delete_active_pod' cleanup mode.

Documentation

Docstring updated to reflect the new delete_active_pod option.

Related Issues

closes: #59083

@shubham36deshpande
@SameerMesiah97, I see you have used the condition

self.on_finish_action == OnFinishAction.DELETE_ACTIVE_POD
and (pod.status.phase == PodPhase.RUNNING or pod.status.phase == PodPhase.PENDING)

Isn't this true for all running tasks?

Also, can we implement a time-based mechanism for this? Some pods might stay in the Pending state for 1-2 minutes due to resource constraints, and we don't want to delete them.

@SameerMesiah97
Contributor Author

SameerMesiah97 commented Dec 7, 2025

> @SameerMesiah97, I see you have used the condition
>
> self.on_finish_action == OnFinishAction.DELETE_ACTIVE_POD
> and (pod.status.phase == PodPhase.RUNNING or pod.status.phase == PodPhase.PENDING)
>
> Isn't this true for all running tasks?

process_pod_deletion is not called arbitrarily; it is called on cleanup, i.e. when the task has terminated or completed. You are right that the condition matches every running pod, but the code path in this PR will not pick a pod up unless its owning task triggers the clean_up function. Beyond that, there are two other scenarios in which process_pod_deletion is called:

  1. reattach_on_restart is set to 'True' and an existing pod is found. The 'on_finish_action' setting decides whether to delete this pod.
  2. A pod with duplicate labels is found, but this code path is only activated when 'on_finish_action' is set to 'delete_pod'.

In both scenarios, only the pod(s) belonging to the task that calls these functions are affected.

> Also, can we implement a time-based mechanism for this? Some pods might stay in the Pending state for 1-2 minutes due to resource constraints, and we don't want to delete them.

The motivation behind this PR is to give users the option to delete zombie pods that remain active after the task has terminated. We could add a wait period before deleting active pods, but I am not sure it belongs in this function, which is intended to facilitate cleanup rather than to fetch existing pods when the task begins execution. Perhaps this is out of scope?
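For reference, if a wait period were added later, one possible shape is an age check applied before deletion. This is a hypothetical sketch and not part of this PR: active_pod_is_stale, fake_pod, and the grace_period parameter are invented for illustration, and the pod age is assumed to be measured from metadata.creation_timestamp.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace


def active_pod_is_stale(pod, grace_period, now=None):
    """Return True if the pod is Pending/Running and older than grace_period.

    Hypothetical helper: the age is assumed to be measured from
    pod.metadata.creation_timestamp (a timezone-aware datetime).
    """
    now = now or datetime.now(timezone.utc)
    if pod.status.phase not in ("Pending", "Running"):
        return False
    return now - pod.metadata.creation_timestamp >= grace_period


def fake_pod(phase, created_at):
    """Build a minimal pod-like object for illustration only."""
    return SimpleNamespace(
        status=SimpleNamespace(phase=phase),
        metadata=SimpleNamespace(creation_timestamp=created_at),
    )
```

With a grace period of two minutes, a pod that has been Pending for 30 seconds would be left alone, while one that has been Pending for five minutes would be eligible for deletion.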

Thank you for the feedback.

@SameerMesiah97 SameerMesiah97 force-pushed the 59083-Active-Pod-Deletion branch 2 times, most recently from b4b9593 to 3d300d6 Compare December 7, 2025 19:56
@SameerMesiah97 SameerMesiah97 force-pushed the 59083-Active-Pod-Deletion branch from 3d300d6 to 2f19864 Compare December 8, 2025 20:21
Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues

Development

Successfully merging this pull request may close these issues.

Kubernetes Pod Operator Task keeps Pod with Pending/Running state in detele_succeded_pod

2 participants