
Fix ResourceClaim cleanup on v1beta2 clusters#642

Open
iaroslav-reflection wants to merge 2 commits into ai-dynamo:main from iaroslav-reflection:reflectionai/v0.1.0-alpha.8-dra-v1beta2


@iaroslav-reflection

Summary

  • keep the preferred resource.k8s.io/v1 ResourceClaim cleanup path
  • fall back to resource.k8s.io/v1beta2 when the API server does not serve v1
  • route PCS/PCLQ (PodCliqueSet/PodClique) ResourceClaim list and delete cleanup through the shared fallback helpers

Before / After

| Area | Before | After |
|---|---|---|
| ResourceClaim delete | Finalizer cleanup used resource.k8s.io/v1 only. | Cleanup tries v1 first and falls back to v1beta2 on Kubernetes no-match errors. |
| ResourceClaim delete collection | PCS/PCLQ delete paths could fail before matching labels if the cluster lacked v1. | Delete collection uses the same v1 -> v1beta2 fallback. |
| Finalizer verification list | Cleanup verification listed ResourceClaims as v1 metadata only. | Metadata listing falls back to v1beta2, so finalizers can verify cleanup on Kubernetes 1.33 clusters. |

Validation

  • go test ./internal/resourceclaim ./internal/controller/podcliqueset/components/resourceclaim ./internal/controller/podclique/components/resourceclaim
  • go test ./internal/resourceclaim -run 'Test(DeleteResourceClaim|DeleteResourceClaims|ListResourceClaimMetadata)' -count=1 -v

Context

Some Kubernetes 1.33 clusters serve DRA ResourceClaims as resource.k8s.io/v1beta2 but not resource.k8s.io/v1. The old cleanup code returned a RESTMapper no-match error in that case, which could leave Grove finalizers stuck even when there were no matching ResourceClaims to delete.
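The stuck-finalizer scenario can be reproduced in a small, dependency-free Go sketch: a v1-only verification returns the no-match error on a cluster that serves only v1beta2, so the finalizer is never released, while the fallback path lists zero remaining claims and lets cleanup complete. `listFunc`, `remainingV1Only`, `remainingWithFallback`, and `canRemoveFinalizer` are hypothetical names, not Grove's code; the list function stands in for a metadata-only list already filtered by the owner's labels.

```go
package main

import (
	"errors"
	"fmt"
)

// noMatchError is a hypothetical stand-in for the RESTMapper no-match error
// (real code would detect it with meta.IsNoMatchError from k8s.io/apimachinery).
type noMatchError struct{ version string }

func (e *noMatchError) Error() string {
	return "no matches for kind ResourceClaim in version resource.k8s.io/" + e.version
}

// listFunc is a hypothetical abstraction over a metadata-only list call: it
// returns the names of ResourceClaims that still match the owner's labels.
type listFunc func(version string) ([]string, error)

// remainingV1Only mirrors the old behavior: only v1 is queried.
func remainingV1Only(list listFunc) ([]string, error) {
	return list("v1")
}

// remainingWithFallback queries v1 and retries with v1beta2 on a no-match error.
func remainingWithFallback(list listFunc) ([]string, error) {
	names, err := list("v1")
	var nm *noMatchError
	if errors.As(err, &nm) {
		return list("v1beta2")
	}
	return names, err
}

// canRemoveFinalizer reports whether cleanup is verified: the list call
// succeeded and no matching claims remain.
func canRemoveFinalizer(remaining func(listFunc) ([]string, error), list listFunc) (bool, error) {
	names, err := remaining(list)
	if err != nil {
		return false, err
	}
	return len(names) == 0, nil
}

func main() {
	// A cluster that serves ResourceClaims only as v1beta2 and has no
	// matching claims left for this owner.
	beta2Only := func(version string) ([]string, error) {
		if version != "v1beta2" {
			return nil, &noMatchError{version: version}
		}
		return nil, nil
	}

	ok, err := canRemoveFinalizer(remainingV1Only, beta2Only)
	fmt.Println("v1 only:", ok, err) // v1 only: false no matches for kind ResourceClaim in version resource.k8s.io/v1
	ok, err = canRemoveFinalizer(remainingWithFallback, beta2Only)
	fmt.Println("with fallback:", ok, err) // with fallback: true <nil>
}
```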

copy-pr-bot (bot) commented Jun 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

A collaborator commented:

ResourceClaim creation uses v1. Won't this be an issue on Kubernetes 1.33, where the served version is v1beta2?

A collaborator commented:

Similarly, the Get call uses only v1 and does not support v1beta2.

saturley-hall (Member) commented Jun 3, 2026

/ok to test feebecc

Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
nvda-mesharma commented:

/ok to test 90f52d1

dillon-cullinan commented:

/ok to test 90f52d1


Labels: none yet. Projects: none yet. 6 participants.