Open
Conversation
Contributor
This was referenced Mar 31, 2026
scotwells
added a commit
that referenced
this pull request
Apr 1, 2026
> [!NOTE] > CI shows 6 failing e2e tests — these are pre-existing failures unrelated to this PR. See #549 for the fix. ## Summary - Project creation could stall for up to 10 minutes while another project was being deleted. The controller now handles deletions without blocking other work. - Deletion progress is visible via a new `ResourceCleanup` status condition on the project, so operators can see exactly what phase the cleanup is in. - The controller now processes up to 4 projects concurrently instead of 1. ## What changed The project controller previously ran a synchronous cleanup operation during deletion that could block for up to 10 minutes. During that time, no other project — including newly created ones — could be reconciled. The cleanup is now split into two non-blocking steps: issuing delete commands (fast) and checking whether resources have drained (cheap poll). The controller tracks progress via a `ResourceCleanup` condition with three reasons: - `CleanupStarted` → delete commands are being issued - `CleanupAwaitingCompletion` → waiting for resources to be removed - `CleanupComplete` → all resources removed, finalizer can be released If resources haven't drained after a check, the controller transitions back to `CleanupStarted` to automatically re-issue delete commands on the next reconcile. ## Test plan - [x] Existing unit tests pass (`task test:unit`) - [ ] Delete a project and verify the `ResourceCleanup` condition progresses through `CleanupStarted` → `CleanupAwaitingCompletion` → `CleanupComplete` - [ ] Create a new project while another project is being deleted — creation should not be delayed - [ ] Kill the controller pod mid-deletion and verify cleanup resumes after restart - [x] End-to-end tests pass (`task test:end-to-end`) — new `project-deletion` test passes in 20.7s 🤖 Generated with [Claude Code](https://claude.com/claude-code)
scotwells
added a commit
that referenced
this pull request
Apr 2, 2026
## Summary - Wires the existing finalizer into the UserInvitation controller so invitation-related PolicyBindings (`getinvitation` and `acceptinvitation`) are deleted before an invitation is removed - Removes a redundant manual PolicyBinding deletion in the acceptance path since the finalizer now handles cleanup of both bindings - Adds unit tests verifying finalizer registration and PolicyBinding garbage collection - Adds an end-to-end Chainsaw test for the full invitation lifecycle ## Context When a UserInvitation was deleted (accepted, expired, or revoked), the associated PolicyBindings in `milo-system` were left behind because the controller's finalizer was registered but never invoked during reconciliation. This produced 49 orphaned PolicyBindings in production stuck in `ResourceSelectorValidationFailed` state. Closes #535 ## Test plan - [x] Unit tests verify finalizer is added on first reconcile - [x] Unit tests verify both PolicyBindings are deleted on UserInvitation deletion - [x] Existing unit tests updated to account for finalizer registration cycle - [x] Chainsaw e2e test validates full lifecycle: create invitation → verify PolicyBindings → delete invitation → verify cleanup - [x] e2e test passes in CI > [!NOTE] > The CI workflow reports a failure due to pre-existing e2e test issues on `main` (`note-multicluster-subject`, `clusternote-multicluster-subject`, `crm-note-contact-lifecycle`). These are unrelated to this PR and are being resolved in #549. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
scotwells
added a commit
that referenced
this pull request
Apr 2, 2026
## Summary - Adds a finalizer to the UserController that deletes all OrganizationMemberships referencing a user before allowing the User object to be removed - Extends the OrganizationMembership validation webhook to allow last-owner membership deletion when the referenced User is being deleted (verified via direct API server read, not cache) - Fixes the Organization webhook and UserInvitation controller to set correct User ownerReferences on OrganizationMemberships - Adds a two-pass self-delete in the OrganizationMembership controller for existing orphaned memberships where the User no longer exists ## Context When a User was deleted, OrganizationMembership resources referencing that user were not cleaned up because they had no User ownerReference (webhook creation path) or had a malformed one (invitation path used `.Group` instead of `.String()` for APIVersion). The orphaned memberships left their owned PolicyBindings stuck in `SubjectValidationFailed` state. Closes #536 ## Test plan - [x] Webhook tests verify deletion is allowed when User has DeletionTimestamp - [x] Webhook tests verify deletion is allowed when User is already gone - [x] Webhook tests verify last-owner guard still blocks when User is active - [x] All existing unit tests pass - [x] Chainsaw e2e test validates full lifecycle: create user + org → verify membership → delete user → verify cleanup - [x] e2e test passes in CI > [!NOTE] > The CI workflow reports a failure due to pre-existing e2e test issues on `main` (`note-multicluster-subject`, `clusternote-multicluster-subject`, `crm-note-contact-lifecycle`). These are unrelated to this PR and are being resolved in #549. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
- Add control plane reachability wait to notes multicluster tests - Add quota grant for CRM note lifecycle test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
05f42e6 to
c9f2743
Compare
The apiserver was being OOMKilled at 512Mi during e2e tests when multiple chainsaw tests run concurrently, causing cascading failures. Bump memory limit to 1Gi and CPU limit to 1 core, with requests raised proportionally to 512Mi memory and 200m CPU. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Context
Three e2e tests have been failing on main for some time:
note-multicluster-subjectandclusternote-multicluster-subject— wrote to the project control plane immediately afterProject Ready=true, but the proxy wasn't reachable yet. Fixed by adding aGrantCreationPolicy+ wait forResourceGrantin the project control plane (same pattern asmulti-cluster-enforcement).crm-note-contact-lifecycle— Note creation was rejected with "Insufficient quota resources available" because noResourceGrantexisted. Fixed by adding a grant in the test setup.Test plan
🤖 Generated with Claude Code