
reconcile: wait till PVC is in bound phase #1971

Open
AndrewChubatiuk wants to merge 1 commit into master from wait-for-pvc-status


AndrewChubatiuk (Contributor) commented Mar 16, 2026

fixes #1970

Summary by cubic

Wait for PVCs to reach the Bound phase before proceeding in reconcile and during StatefulSet PVC expansion. This removes race conditions where pods start while PVCs are still Pending or terminating.

  • Bug Fixes
    • Added waitForPVCBound with 1s polling and a 5s timeout; respects PVC generation and errors on unexpected phases or termination.
    • Wrapped PVC create/update in a conflict-retry flow and skipped updates for PVCs with a non-zero DeletionTimestamp.
    • Invoked the wait after PVC reconcile and in updateSTSPVC to ensure expansions complete before continuing.

Written for commit 83d01f0. Summary will update on new commits.

}

return updatePVC(ctx, rclient, &existingObj, newObj, prevObj, owner)
func waitForPVCBound(ctx context.Context, rclient client.Client, nsn types.NamespacedName, generation int64) error {
Collaborator

Let's add unit tests for this function.

if !pvc.DeletionTimestamp.IsZero() {
	return true, fmt.Errorf("cannot wait for PVC=%s, which is being terminated", nsn.String())
}
if generation > pvc.Generation {
Collaborator

I'm not sure why we need generation here at all. Why would the PVC generation match the Deployment's generation?

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

4 issues found across 3 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/controller/operator/factory/reconcile/reconcile.go">

<violation number="1" location="internal/controller/operator/factory/reconcile/reconcile.go:27">
P1: The new PVC wait uses a hardcoded 5s timeout, so PVC binding can fail much earlier than the operator's configured readiness deadlines.</violation>
</file>

<file name="internal/controller/operator/factory/reconcile/statefulset_pvc_expand.go">

<violation number="1" location="internal/controller/operator/factory/reconcile/statefulset_pvc_expand.go:124">
P1: `waitForPVCBound` is called with the StatefulSet name instead of the PVC name, so it polls a non-existent PVC and times out.</violation>
</file>

<file name="internal/controller/operator/factory/reconcile/pvc.go">

<violation number="1" location="internal/controller/operator/factory/reconcile/pvc.go:51">
P1: Terminating PVCs no longer short-circuit successfully; this unconditional wait turns the previous skip path into a reconcile error.</violation>

<violation number="2" location="internal/controller/operator/factory/reconcile/pvc.go:51">
P1: Waiting for PVCs to reach `Bound` here can block valid `WaitForFirstConsumer` claims before their Deployment is created.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
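On the `statefulset_pvc_expand.go` finding: the StatefulSet controller names each claim `<volumeClaimTemplate name>-<StatefulSet name>-<ordinal>`, so a wait keyed by the StatefulSet name alone would not find any PVC. A minimal sketch of deriving the claim names, assuming ordinals run from 0 to replicas-1 (the default); `stsPVCNames` and the example names are hypothetical and not taken from the operator.

```go
package main

import "fmt"

// stsPVCNames returns the PVC names created for one volumeClaimTemplate
// of a StatefulSet, assuming ordinals 0..replicas-1.
func stsPVCNames(template, stsName string, replicas int) []string {
	names := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		names = append(names, fmt.Sprintf("%s-%s-%d", template, stsName, i))
	}
	return names
}

func main() {
	// Hypothetical template and StatefulSet names, for illustration only.
	for _, name := range stsPVCNames("data", "example-sts", 2) {
		fmt.Println(name)
	}
}
```

Each of these names, not the StatefulSet name, would be the key to pass to a per-claim wait.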


var (
	pvcWaitBoundIntervalCheck = 1 * time.Second
	pvcWaitReadyTimeout       = 5 * time.Second
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 16, 2026

P1: The new PVC wait uses a hardcoded 5s timeout, so PVC binding can fail much earlier than the operator's configured readiness deadlines.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/controller/operator/factory/reconcile/reconcile.go, line 27:

<comment>The new PVC wait uses a hardcoded 5s timeout, so PVC binding can fail much earlier than the operator's configured readiness deadlines.</comment>

<file context>
@@ -23,6 +23,8 @@ import (
 
 var (
+	pvcWaitBoundIntervalCheck = 1 * time.Second
+	pvcWaitReadyTimeout       = 5 * time.Second
 	podWaitReadyIntervalCheck = 50 * time.Millisecond
 	appWaitReadyDeadline      = 5 * time.Second
</file context>

	case !existingObj.CreationTimestamp.IsZero():
		generation = existingObj.Generation
	}
	return waitForPVCBound(ctx, rclient, nsn, generation)
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 16, 2026

P1: Waiting for PVCs to reach Bound here can block valid WaitForFirstConsumer claims before their Deployment is created.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/controller/operator/factory/reconcile/pvc.go, line 51:

<comment>Waiting for PVCs to reach `Bound` here can block valid `WaitForFirstConsumer` claims before their Deployment is created.</comment>

<file context>
@@ -20,24 +21,58 @@ import (
+	case !existingObj.CreationTimestamp.IsZero():
+		generation = existingObj.Generation
 	}
+	return waitForPVCBound(ctx, rclient, nsn, generation)
+}
 
</file context>
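For context, a claim whose StorageClass uses `volumeBindingMode: WaitForFirstConsumer` stays Pending until a pod that uses it is scheduled, so waiting for Bound before creating the workload cannot succeed. A hedged sketch of a guard that would cover both `pvc.go` findings, if they turn out to be valid, follows. The `claim` type and `shouldWaitForBound` are hypothetical, and looking up the StorageClass to fill `BindingMode` is not shown.

```go
package main

import "fmt"

// claim is a hypothetical, trimmed-down view of a PVC plus the
// volumeBindingMode of its StorageClass.
type claim struct {
	Phase       string // "Pending", "Bound" or "Lost"
	Terminating bool   // true when DeletionTimestamp is set
	BindingMode string // "Immediate" or "WaitForFirstConsumer"
}

// shouldWaitForBound reports whether reconcile should block on this claim.
func shouldWaitForBound(c claim) bool {
	switch {
	case c.Terminating:
		return false // keep the earlier behavior: skip terminating claims
	case c.Phase == "Bound":
		return false // nothing to wait for
	case c.BindingMode == "WaitForFirstConsumer":
		return false // binds only after a consuming pod is scheduled
	}
	return true
}

func main() {
	fmt.Println(shouldWaitForBound(claim{Phase: "Pending", BindingMode: "Immediate"}))
	fmt.Println(shouldWaitForBound(claim{Phase: "Pending", BindingMode: "WaitForFirstConsumer"}))
}
```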


Development

Successfully merging this pull request may close these issues.

Cluster is operational prematurely on PVC expand
