Requeue zone update when context is cancelled#1965

Open
vrutkovs wants to merge 5 commits into master from context-cancelled

@vrutkovs
Collaborator

Attach a more detailed error every time we cancel the context. Requeue the request if the cancellation occurred during zone processing. This would prevent some zones from being left untouched, as otherwise the controller would restart from scratch.

Fixes #1962

Contributor

@cubic-dev-ai bot left a comment

2 issues found across 8 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/controller/operator/controllers_test.go">

<violation number="1" location="internal/controller/operator/controllers_test.go:355">
P2: This test uses an unreachable `context.Canceled`+`ErrZone` error shape, so it does not verify the real requeue-on-zone-cancel path.</violation>
</file>

<file name="internal/controller/operator/factory/vmdistributed/zone.go">

<violation number="1" location="internal/controller/operator/factory/vmdistributed/zone.go:406">
P1: `WithCancelCause` alone is not enough here: `wait.PollUntilContextCancel` only returns `ctx.Err()`, so `ErrZone` never reaches the reconcile code and the new requeue path will not trigger.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@vrutkovs force-pushed the context-cancelled branch from 998bb02 to 3e2db8b Compare March 13, 2026 13:31
@vrutkovs force-pushed the context-cancelled branch from 303cbe5 to d35cbc1 Compare March 13, 2026 15:10
@AndrewChubatiuk
Contributor

This solution most likely doesn't cover the case described in the issue, where VMCluster reconciliation returns context.Canceled for some reason and VMDistributed then waits for its readiness forever.

contextCancelErrorsTotal.Inc()
var errZone *vmdistributed.ErrZone
if errors.As(err, &errZone) {
	return ctrl.Result{Requeue: true}, nil
Contributor

Why not just ignore the cause set in cmd/main.go and requeue in all other cases?
Then the other causes are not needed.

Contributor

Added it here: 00bc895

Collaborator Author

Good idea, thanks

Contributor

Also, let's keep only this cancelWithCause and drop the rest.

Collaborator Author

This would be removed in #1964 anyway? I want to keep this PR minimal so that it can be backported to 0.68 (not so sure about #1964).

Perhaps it's easier to merge these changes into #1964?

Contributor

@AndrewChubatiuk Mar 16, 2026

I mean all the WithCancelCause calls added in this PR. Let's drop all of them except the one that actually affects reconcile behaviour. #1964 was initially opened for a different purpose: it keeps only one function for handling reconcile errors and processes all reconcile errors in that function.

@vrutkovs force-pushed the context-cancelled branch from 7303054 to 62338e1 Compare March 16, 2026 08:19
@vrutkovs force-pushed the context-cancelled branch from 62338e1 to fa3e1b3 Compare March 16, 2026 08:45

Development

Successfully merging this pull request may close these issues.

handleReconcileErr swallows context.Canceled without requeueing, permanently dropping CRs from work queue

2 participants