
Fix/clusterqueue cascade deletion #434

Open
salexo wants to merge 2 commits into main from fix/clusterqueue-cascade-deletion



@salexo (Collaborator) commented Mar 24, 2026

Description

Recreating the KaiwoQueueConfig while active workloads were attached to Kueue ClusterQueues caused a deadlock. This PR fixes it by removing the owner references from the ClusterQueues to the KaiwoQueueConfig.

Replace ownerReference-based ownership of ClusterQueues with a
kaiwo.silogen.ai/managed-by label. The ownerReference caused Kubernetes
GC to cascade-delete all ClusterQueues when KaiwoQueueConfig was
deleted, but Kueue's resource-in-use finalizer blocked deletion of
queues with active workloads, creating a permanent deadlock where
terminating queues blocked creation of replacements with the same name.
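A minimal Go sketch of the label-based ownership and the migration of existing queues, assuming simplified stand-in types instead of the real `metav1.ObjectMeta` and Kueue `ClusterQueue`. The PR names only the label key; the value `kaiwo` and the helper `migrateClusterQueue` are hypothetical.

```go
package main

import "fmt"

// Assumed label value; the PR names only the key.
const managedByLabel, managedByValue = "kaiwo.silogen.ai/managed-by", "kaiwo"

// OwnerReference and ClusterQueue are simplified stand-ins for
// metav1.OwnerReference and Kueue's ClusterQueue.
type OwnerReference struct{ Kind, Name string }

type ClusterQueue struct {
	Name            string
	Labels          map[string]string
	OwnerReferences []OwnerReference
}

// migrateClusterQueue (hypothetical) labels the queue as Kaiwo-managed and
// strips ownerReferences pointing at a KaiwoQueueConfig, so deleting the
// config can no longer cascade to the queue. It reports whether an Update
// is needed.
func migrateClusterQueue(cq *ClusterQueue) bool {
	changed := false
	if cq.Labels == nil {
		cq.Labels = map[string]string{}
	}
	if cq.Labels[managedByLabel] != managedByValue {
		cq.Labels[managedByLabel] = managedByValue
		changed = true
	}
	kept := cq.OwnerReferences[:0] // filter in place
	for _, ref := range cq.OwnerReferences {
		if ref.Kind == "KaiwoQueueConfig" {
			changed = true
			continue
		}
		kept = append(kept, ref)
	}
	cq.OwnerReferences = kept
	return changed
}

func main() {
	cq := &ClusterQueue{
		Name:            "default",
		OwnerReferences: []OwnerReference{{Kind: "KaiwoQueueConfig", Name: "kaiwo"}},
	}
	fmt.Println(migrateClusterQueue(cq), cq.Labels[managedByLabel], len(cq.OwnerReferences)) // true kaiwo 0
	fmt.Println(migrateClusterQueue(cq))                                                     // false
}
```

Because the helper is idempotent, it is safe to run on every reconciliation; only the first pass over a pre-existing queue reports a change.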

Changes:
- Replace SetControllerReference with managed-by label on ClusterQueues
- Filter ClusterQueue list to only Kaiwo-managed queues
- Skip terminating ClusterQueues instead of trying to update them
- Add RequeueAfter on sync failure so the reconciler retries
- Watch ClusterQueues so deletions trigger re-reconciliation
- Migrate existing ClusterQueues by adding label and stripping stale
  ownerReferences on first reconciliation
- Fix double r.Delete() call in the cleanup loop
@salexo requested a review from AVSuni as a code owner on March 24, 2026 08:42
Verifies that ClusterQueues created by the operator have the
managed-by label, no ownerReferences, and survive KaiwoQueueConfig
deletion without being cascade-deleted.
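A toy model of what that test checks, assuming a simplified garbage collector that removes a dependent once every owner it lists has been deleted. `survivesGC`, `checkManagedQueue`, the label value `kaiwo` and the config name `kaiwo` are hypothetical; the actual test runs against a cluster.

```go
package main

import "fmt"

// Assumed label value; the PR names only the key.
const managedByLabel, managedByValue = "kaiwo.silogen.ai/managed-by", "kaiwo"

type OwnerReference struct{ Kind, Name string }

// ClusterQueue is a simplified stand-in for Kueue's ClusterQueue.
type ClusterQueue struct {
	Name            string
	Labels          map[string]string
	OwnerReferences []OwnerReference
}

// survivesGC is a toy model of Kubernetes garbage collection after the owner
// deletedKind/deletedName is removed: a dependent is collected only when every
// ownerReference it lists points at the deleted owner.
func survivesGC(cq ClusterQueue, deletedKind, deletedName string) bool {
	if len(cq.OwnerReferences) == 0 {
		return true // no owners, so nothing for the GC to cascade from
	}
	for _, ref := range cq.OwnerReferences {
		if ref.Kind != deletedKind || ref.Name != deletedName {
			return true // another owner remains (assumed to still exist)
		}
	}
	return false
}

// checkManagedQueue mirrors the checks the PR's test describes: the queue has
// the managed-by label and no ownerReferences.
func checkManagedQueue(cq ClusterQueue) error {
	if cq.Labels[managedByLabel] != managedByValue {
		return fmt.Errorf("clusterqueue %q: missing %s label", cq.Name, managedByLabel)
	}
	if len(cq.OwnerReferences) != 0 {
		return fmt.Errorf("clusterqueue %q: unexpected ownerReferences %v", cq.Name, cq.OwnerReferences)
	}
	return nil
}

func main() {
	// "kaiwo" as the KaiwoQueueConfig name is hypothetical.
	before := ClusterQueue{Name: "old", OwnerReferences: []OwnerReference{{Kind: "KaiwoQueueConfig", Name: "kaiwo"}}}
	after := ClusterQueue{Name: "new", Labels: map[string]string{managedByLabel: managedByValue}}
	fmt.Println(checkManagedQueue(before) != nil, survivesGC(before, "KaiwoQueueConfig", "kaiwo")) // true false
	fmt.Println(checkManagedQueue(after) == nil, survivesGC(after, "KaiwoQueueConfig", "kaiwo"))   // true true
}
```

With an ownerReference, the queue is a GC dependent of the config and is marked for deletion with it; if Kueue's finalizer then holds it, the name stays blocked. With only a label, the GC has nothing to cascade from.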
@salexo force-pushed the fix/clusterqueue-cascade-deletion branch from eac9cad to 733586d on March 24, 2026 09:10
@bjorn-amd (Collaborator) left a comment


👌


2 participants