fix: Add controller finalizer when policy is active state only#1775

Merged: jvanz merged 8 commits into main from finalizer-removal on May 28, 2026

@jvanz jvanz commented May 27, 2026

Description

This changes the controller reconciliation loop so that the controller's finalizer is added to a policy resource only when the policy is in the active state. As a result, policies can be removed without issues after the kubewarden-controller is uninstalled.

In other words, when the policy is in:

  • Scheduled status: no Finalizer (new), no {Validating,Mutating}WebhookConfiguration. Can be safely deleted. Will be garbage-collected if the CRD is removed.
  • Active status: has an associated {Validating,Mutating}WebhookConfiguration, must not be garbage-collected away or we break the cluster. Therefore, it also has a Finalizer (as it already had prior to this change).
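
The rule in the two bullets above can be sketched as a small predicate. This is only an illustration, not the controller's code: the `PolicyStatus` type, the two constants, and `needsFinalizer` are hypothetical names, and only the two states discussed here are modelled.

```go
package main

import "fmt"

// PolicyStatus models only the two states discussed above. The type and
// constant names are illustrative, not the controller's real API.
type PolicyStatus string

const (
	StatusScheduled PolicyStatus = "scheduled"
	StatusActive    PolicyStatus = "active"
)

// needsFinalizer reports whether a policy in the given state should carry
// the controller finalizer. Only an active policy has a webhook
// configuration that must be cleaned up before the policy goes away.
func needsFinalizer(status PolicyStatus) bool {
	return status == StatusActive
}

func main() {
	for _, s := range []PolicyStatus{StatusScheduled, StatusActive} {
		fmt.Printf("%s: finalizer=%t\n", s, needsFinalizer(s))
	}
}
```

Keeping the decision in one predicate makes the invariant easy to check: a finalizer exists only while there is a webhook configuration to protect.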

Tests

Beyond the e2e tests, I have a bash script that simulates uninstalling the Helm charts to check that this works as expected:

#!/bin/bash
set -e

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

log() {
    echo -e "${BLUE}==>${NC} $1"
}

success() {
    echo -e "${GREEN}✓${NC} $1"
}

error() {
    echo -e "${RED}✗${NC} $1"
}

warn() {
    echo -e "${YELLOW}!${NC} $1"
}

# Configuration
CLUSTER_NAME="${CLUSTER_NAME:-kubewarden-test}"
NAMESPACE="${NAMESPACE:-kubewarden}"
CONTROLLER_IMAGE="ghcr.io/kubewarden/adm-controller/controller:dev"
AUDIT_SCANNER_IMAGE="ghcr.io/kubewarden/adm-controller/audit-scanner:dev"
POLICY_SERVER_IMAGE="ghcr.io/kubewarden/adm-controller/policy-server:dev"

# Check if cluster exists
if kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then
    warn "Cluster ${CLUSTER_NAME} already exists"
    read -p "Delete and recreate? (y/N) " -n 1 -r
    echo
    if [[ $REPLY =~ ^[Yy]$ ]]; then
        log "Deleting existing cluster..."
        kind delete cluster --name "${CLUSTER_NAME}"
    else
        log "Using existing cluster"
        REUSE_CLUSTER=true
    fi
fi

# Create cluster if needed
if [[ "$REUSE_CLUSTER" != "true" ]]; then
    log "Creating kind cluster: ${CLUSTER_NAME}"
    kind create cluster --name "${CLUSTER_NAME}"
    success "Cluster created"
fi

# Set kubectl context
kubectl config use-context "kind-${CLUSTER_NAME}"

# Load images into cluster
log "Loading images into kind cluster..."
kind load docker-image "${CONTROLLER_IMAGE}" --name "${CLUSTER_NAME}"
kind load docker-image "${AUDIT_SCANNER_IMAGE}" --name "${CLUSTER_NAME}"
kind load docker-image "${POLICY_SERVER_IMAGE}" --name "${CLUSTER_NAME}"
success "Images loaded"

# Create namespace
log "Creating namespace: ${NAMESPACE}"
kubectl create namespace "${NAMESPACE}" --dry-run=client -o yaml | kubectl apply -f -
kubectl label namespace "${NAMESPACE}" \
    pod-security.kubernetes.io/enforce=restricted \
    pod-security.kubernetes.io/enforce-version=latest \
    --overwrite

# Install CRDs chart
log "Installing kubewarden-crds chart..."
helm upgrade --install kubewarden-crds \
    ./charts/kubewarden-crds \
    --namespace "${NAMESPACE}" \
    --wait
success "kubewarden-crds installed"

# Install controller chart
log "Installing kubewarden-controller chart..."
helm upgrade --install kubewarden-controller \
    ./charts/kubewarden-controller \
    --namespace "${NAMESPACE}" \
    --set image.tag=dev \
    --set auditScanner.image.tag=dev \
    --set logLevel=debug \
    --wait \
    --timeout 5m
success "kubewarden-controller installed"

# Wait for controller to be ready
log "Waiting for controller deployment..."
kubectl wait --for=condition=available --timeout=300s \
    deployment/kubewarden-controller -n "${NAMESPACE}"
success "Controller is ready"

# Create a PolicyServer
log "Creating PolicyServer..."
cat <<EOF | kubectl apply -f -
apiVersion: policies.kubewarden.io/v1
kind: PolicyServer
metadata:
  name: default
  namespace: ${NAMESPACE}
spec:
  image: ${POLICY_SERVER_IMAGE}
  replicas: 1
  serviceAccountName: kubewarden-controller
EOF

# Wait for PolicyServer deployment
log "Waiting for PolicyServer deployment..."
kubectl wait --for=condition=available --timeout=300s \
    deployment/policy-server-default -n "${NAMESPACE}" 2>/dev/null || true
success "PolicyServer deployed"

# Create a sample ClusterAdmissionPolicy
log "Creating sample ClusterAdmissionPolicy..."
cat <<EOF | kubectl apply -f -
apiVersion: policies.kubewarden.io/v1
kind: ClusterAdmissionPolicy
metadata:
  name: privileged-pods
spec:
  module: registry://ghcr.io/kubewarden/policies/pod-privileged:v0.2.5
  rules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      resources: ["pods"]
      operations: ["CREATE", "UPDATE"]
  mutating: false
  policyServer: default
EOF
success "Policy created"

# Show initial state
echo ""
log "Initial state:"
echo ""

echo -e "${BLUE}Policies:${NC}"
kubectl get clusteradmissionpolicies -o custom-columns=\
NAME:.metadata.name,\
STATUS:.status.policyStatus,\
FINALIZERS:.metadata.finalizers

echo ""
echo -e "${BLUE}PolicyServers:${NC}"
kubectl get policyservers -n "${NAMESPACE}"

echo ""
echo -e "${BLUE}ValidatingWebhookConfigurations:${NC}"
kubectl get validatingwebhookconfigurations -l app.kubernetes.io/part-of=kubewarden

echo ""
warn "Press Enter to uninstall kubewarden-controller chart..."
read

# Uninstall controller chart
log "Uninstalling kubewarden-controller chart..."
helm uninstall kubewarden-controller -n "${NAMESPACE}" --wait --timeout 2m
success "kubewarden-controller uninstalled"

# Check policy state after controller uninstall
echo ""
log "State after controller uninstall:"
echo ""

echo -e "${BLUE}Policies (checking finalizers):${NC}"
POLICIES=$(kubectl get clusteradmissionpolicies -o json)
if echo "$POLICIES" | jq -e '.items | length > 0' >/dev/null 2>&1; then
    echo "$POLICIES" | jq -r '.items[] |
        "\(.metadata.name):
          Status: \(.status.policyStatus // "unknown")
          Finalizers: \(.metadata.finalizers // [] | join(", ") | if . == "" then "NONE" else . end)"'

    # Check if any policies have finalizers
    HAS_FINALIZERS=$(echo "$POLICIES" | jq -r '.items[] | select(.metadata.finalizers != null and (.metadata.finalizers | length > 0)) | .metadata.name')
    if [ -z "$HAS_FINALIZERS" ]; then
        success "No policies have finalizers (expected behavior)"
    else
        error "Policies still have finalizers:"
        echo "$HAS_FINALIZERS"
    fi
else
    warn "No policies found"
fi

echo ""
echo -e "${BLUE}ValidatingWebhookConfigurations:${NC}"
WEBHOOKS=$(kubectl get validatingwebhookconfigurations -l app.kubernetes.io/part-of=kubewarden --no-headers 2>/dev/null | wc -l)
if [ "$WEBHOOKS" -eq 0 ]; then
    success "No webhooks remaining (expected behavior)"
else
    error "Webhooks still exist:"
    kubectl get validatingwebhookconfigurations -l app.kubernetes.io/part-of=kubewarden
fi

echo ""
warn "Press Enter to uninstall kubewarden-crds chart (this should NOT hang)..."
read

# Uninstall CRDs chart (this should complete quickly)
log "Uninstalling kubewarden-crds chart..."
START_TIME=$(date +%s)
if timeout 10s helm uninstall kubewarden-crds -n "${NAMESPACE}" --wait; then
    END_TIME=$(date +%s)
    DURATION=$((END_TIME - START_TIME))
    success "kubewarden-crds uninstalled in ${DURATION} seconds (expected: < 5s)"
else
    error "CRD uninstall timed out or failed (indicates finalizer blocking)"
fi

# Verify policies are deleted
echo ""
REMAINING_POLICIES=$(kubectl get clusteradmissionpolicies --no-headers 2>/dev/null | wc -l)
if [ "$REMAINING_POLICIES" -eq 0 ]; then
    success "All policies deleted"
else
    error "${REMAINING_POLICIES} policies still remain"
    kubectl get clusteradmissionpolicies
fi

echo ""
log "Test complete!"
echo ""
echo -e "${GREEN}Summary:${NC}"
echo "  - Controller chart uninstalled successfully"
echo "  - Policies should have NO finalizers after controller uninstall"
echo "  - Webhooks should be removed"
echo "  - CRD chart uninstall should complete in < 5 seconds"

echo ""
warn "Keep cluster for inspection? (y/N)"
read -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
    log "Deleting cluster..."
    kind delete cluster --name "${CLUSTER_NAME}"
    success "Cluster deleted"
else
    echo ""
    echo -e "${BLUE}Cluster preserved. To inspect:${NC}"
    echo "  kubectl config use-context kind-${CLUSTER_NAME}"
    echo ""
    echo -e "${BLUE}To delete later:${NC}"
    echo "  kind delete cluster --name ${CLUSTER_NAME}"
fi

@jvanz jvanz self-assigned this May 27, 2026
Copilot AI review requested due to automatic review settings May 27, 2026 19:36
@jvanz jvanz requested a review from a team as a code owner May 27, 2026 19:36

Copilot AI left a comment

Pull request overview

This PR changes policy finalizer handling so controller-managed finalizers are no longer added by admission webhooks at creation time, and are instead added during reconciliation once policy activation can proceed. It also adds Helm pre-delete cleanup logic and tests around the finalizer lifecycle.

Changes:

  • Removed finalizer injection from policy and policy group defaulters, with unit test expectations updated.
  • Added finalizer add/remove logic to the policy reconciler around active/unavailable policy server states.
  • Added a Helm pre-delete hook fallback and new controller integration tests for finalizer lifecycle behavior.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| internal/controller/policy_subreconciler.go | Moves finalizer management into policy reconciliation. |
| internal/controller/policy_finalizer_test.go | Adds integration-style tests for finalizer lifecycle scenarios. |
| charts/kubewarden-controller/templates/pre-delete-hook.yaml | Expands uninstall hook to delete PolicyServers and patch remaining policy finalizers. |
| api/policies/v1/clusteradmissionpolicygroup_webhook.go | Stops adding finalizer during cluster policy group defaulting. |
| api/policies/v1/clusteradmissionpolicygroup_webhook_test.go | Updates defaulter test to expect no finalizer. |
| api/policies/v1/clusteradmissionpolicy_webhook.go | Stops adding finalizer during cluster policy defaulting. |
| api/policies/v1/clusteradmissionpolicy_webhook_test.go | Updates defaulter test to expect no finalizer. |
| api/policies/v1/admissionpolicygroup_webhook.go | Stops adding finalizer during namespaced policy group defaulting. |
| api/policies/v1/admissionpolicygroup_webhook_test.go | Updates defaulter test to expect no finalizer. |
| api/policies/v1/admissionpolicy_webhook.go | Stops adding finalizer during namespaced policy defaulting. |
| api/policies/v1/admissionpolicy_webhook_test.go | Updates defaulter test to expect no finalizer. |

Comment thread internal/controller/policy_finalizer_test.go Outdated
Comment thread charts/kubewarden-controller/templates/pre-delete-hook.yaml Outdated
Comment thread charts/kubewarden-controller/templates/pre-delete-hook.yaml Outdated
Comment thread internal/controller/policy_finalizer_test.go Outdated
Comment thread internal/controller/policy_subreconciler.go Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

Comment thread internal/controller/policy_finalizer_test.go Outdated
Comment thread internal/controller/policy_finalizer_test.go Outdated
Comment thread internal/controller/policy_finalizer_test.go Outdated
Comment thread internal/controller/policy_finalizer_test.go Outdated
Comment thread internal/controller/policy_finalizer_test.go Outdated
Comment thread internal/controller/policy_finalizer_test.go Outdated
Comment thread internal/controller/policy_finalizer_test.go Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Comment thread internal/controller/policy_subreconciler.go
@jvanz jvanz force-pushed the finalizer-removal branch 2 times, most recently from 8089dfd to a8a2d87 Compare May 28, 2026 03:43
jvanz added 4 commits May 28, 2026 00:59
Mutating webhooks added finalizers to all policies at creation, even if
the policy never became active, leaving policies with finalizers but no
webhooks. The reconciler now handles finalizers based on actual webhook
state: add when the policy becomes active, remove when the webhook is
deleted. This updates all four policy webhook types and their tests.

Assisted-by: Claude Code
Signed-off-by: José Guilherme Vanz <jguilhermevanz@suse.com>
The reconciler adds finalizers before creating webhooks and removes them
when the PolicyServer is deleted or unavailable. This prevents webhook
leaks if the controller crashes between the two steps. Removing the
finalizer when the PolicyServer is gone lets policies be deleted without
blocking CRD removal.

Assisted-by: Claude Code
Signed-off-by: José Guilherme Vanz <jguilhermevanz@suse.com>
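
The ordering argument in the commit message above (finalizer first, webhook second) can be illustrated with a tiny in-memory model. Everything here is hypothetical: the `policy` struct, `activate`, and `leaked` are stand-ins invented for the sketch, and the "crash" is just an early return.

```go
package main

import (
	"errors"
	"fmt"
)

// policy is a hypothetical in-memory stand-in for a policy resource.
type policy struct {
	finalizer bool // controller finalizer is present
	webhook   bool // webhook configuration exists
}

var errCrash = errors.New("simulated crash")

// activate records the finalizer first and creates the webhook second.
// failAfterFirstStep simulates the controller dying between the two steps.
func activate(p *policy, failAfterFirstStep bool) error {
	p.finalizer = true
	if failAfterFirstStep {
		return errCrash
	}
	p.webhook = true
	return nil
}

// leaked reports the unsafe state: a webhook that no finalizer protects.
func leaked(p policy) bool {
	return p.webhook && !p.finalizer
}

func main() {
	var p policy
	err := activate(&p, true)
	fmt.Println("error:", err, "leaked:", leaked(p))
}
```

With this order, an interruption can leave a finalizer without a webhook, which the next reconciliation can repair, but never a webhook without a finalizer.
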
Updates the pre-delete hook to wait for the policy server deletion. This
gives the controller time to reconcile the policies of the policy
server.

Assisted-by: Claude Code
Signed-off-by: José Guilherme Vanz <jguilhermevanz@suse.com>
…rver deleted

Tests now check that finalizers appear when policies reach active state
and disappear when the PolicyServer is deleted. Also covers the upgrade
case where policies might have the old pre-1.14 finalizer that needs
cleaning up.

Assisted-by: Claude Code
Signed-off-by: José Guilherme Vanz <jguilhermevanz@suse.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Comment thread e2e/clusteradmissionpolicy_test.go Outdated
Comment on lines 268 to 273
// Verify policy has finalizer before deletion
require.True(t, containsFinalizer(policy.GetFinalizers(), constants.KubewardenFinalizer),
	"Policy should have finalizer before deletion")

// Delete the policy
err := cfg.Client().Resources().Delete(ctx, policy)

Done in a6aebc2.

Comment thread e2e/clusteradmissionpolicy_test.go Outdated
Comment on lines +470 to +471
require.True(t, containsFinalizer(policy.GetFinalizers(), constants.KubewardenFinalizer),
	"Policy should have finalizer before deletion")

Done in a6aebc2, which tackles this and the previous copilot review comment.

Comment thread e2e/clusteradmissionpolicy_test.go Outdated
Comment on lines +635 to +651
// Verify finalizer is removed from policy
err = wait.For(conditions.New(cfg.Client().Resources()).ResourceMatch(
	&policiesv1.ClusterAdmissionPolicy{ObjectMeta: metav1.ObjectMeta{Name: policyName}},
	func(object k8s.Object) bool {
		p := object.(*policiesv1.ClusterAdmissionPolicy)
		return !containsFinalizer(p.GetFinalizers(), constants.KubewardenFinalizer)
	},
), wait.WithTimeout(testTimeout), wait.WithInterval(testPollInterval))
require.NoError(t, err, "Finalizer should be removed when PolicyServer is deleted")

// Verify policy status transitioned to scheduled
var policy policiesv1.ClusterAdmissionPolicy
err = cfg.Client().Resources().Get(ctx, policyName, "", &policy)
require.NoError(t, err)
require.Equal(t, policiesv1.PolicyStatusScheduled, policy.Status.PolicyStatus,
	"Policy should be scheduled after PolicyServer deletion")

Done in 87dc788.

Comment on lines +162 to +166
controllerutil.RemoveFinalizer(policy, constants.KubewardenFinalizer)
controllerutil.RemoveFinalizer(policy, constants.KubewardenFinalizerPre114)
if err := r.Update(ctx, policy); err != nil {
	return ctrl.Result{}, fmt.Errorf("cannot update policy: %w", err)
}

This is valid.
In addition, this new block in reconcilePolicyServerUnavailable() is the same as the one we already have in reconcilePolicyDeletion(), so we can extract it into a function.
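
As a rough sketch of what such an extracted helper might look like, here is a standard-library-only version that works on a plain slice of finalizer names. It is not the refactor from this PR: `removeControllerFinalizers` is a hypothetical name, and the two constant values are placeholders, since the real strings behind `constants.KubewardenFinalizer` and `constants.KubewardenFinalizerPre114` are not shown in this thread.

```go
package main

import "fmt"

// Placeholder values: the real finalizer strings are not shown in this
// thread, so these are assumptions made only for the example.
const (
	kubewardenFinalizer       = "example.test/current-finalizer"
	kubewardenFinalizerPre114 = "example.test/pre-1.14-finalizer"
)

// removeControllerFinalizers returns the list without either controller
// finalizer, plus whether anything was removed, so that a caller can skip
// the API update when the list is already clean.
func removeControllerFinalizers(finalizers []string) ([]string, bool) {
	kept := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f == kubewardenFinalizer || f == kubewardenFinalizerPre114 {
			continue
		}
		kept = append(kept, f)
	}
	return kept, len(kept) != len(finalizers)
}

func main() {
	kept, changed := removeControllerFinalizers([]string{
		kubewardenFinalizerPre114, "other/finalizer", kubewardenFinalizer,
	})
	fmt.Println(kept, changed)
}
```

Returning the changed flag is a design choice of this sketch: it lets both call sites share one code path and avoid a needless write when neither finalizer is present.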

This commit on my fork addresses this and also DRYs two functions into one: viccuad@817f51b

Please review; I'm happy to push it to this PR's branch.

I like this refactor, do you want to push a new PR?

Pushed the commit here

Merge `deleteWebhookConfiguration()` and `reconcilePolicyDeletion()`
into `removePolicyWebhooksAndFinalizers()`.

This removes repetition and gives us a single place where the webhooks are
removed and the finalizers are changed.

Signed-off-by: Víctor Cuadrado Juan <vcuadradojuan@suse.de>
Comment thread e2e/clusteradmissionpolicy_test.go Outdated
require.NoError(t, err, "ValidatingWebhookConfiguration should be deleted")

// Wait for policy to be deleted (finalizer removed)
err = wait.For(conditions.New(cfg.Client().Resources()).ResourceDeleted(policy),

This can never succeed, as we add a special integrationTestFinalizer in the policy factories: https://github.com/kubewarden/adm-controller/blob/finalizer-removal/api/policies/v1/factories.go#L225-L234

This is the reason why the e2e tests fail on CI.

Pushed 5401119 that fixes the e2e test errors.

@flavio flavio left a comment

It looks good. We need to address the failures and some of the comments from Copilot, as highlighted by Victor.

viccuad added 2 commits May 28, 2026 11:17
The behavior has now changed: finalizers are no longer added to policies
on creation, but only when they get their associated
WebhookConfigurations.

Don't check for actual policy deletion, as that will never happen.

Leave around the `integrationTestsFinalizer`.

Signed-off-by: Víctor Cuadrado Juan <vcuadradojuan@suse.de>
When reconciling a policy whose PolicyServer is gone, 2 separate API
writes happen:
1. Metadata write: removePolicyWebhooksAndFinalizers calls Update() and
   persists the finalizer removal.
2. Status write: control returns to reconcile, which calls
   r.Status().Update(). This persists PolicyStatusScheduled.

These are two distinct round-trips to the API server. Between them, an
observer (the e2e test polling via Get) can land in any of three states:
- Old finalizer + old status (Active): before write #1 lands
- No finalizer + old status (Active): between writes #1 and #2
- No finalizer + new status (Scheduled): after write #2 lands

The fix waits until both conditions hold simultaneously, so the test
cannot assert against the intermediate state.

Signed-off-by: Víctor Cuadrado Juan <vcuadradojuan@suse.de>
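
The race described in the commit message above can be replayed in a small, self-contained model. This is a sketch, not the e2e test: `snapshot`, `settled`, `waitFor`, and `replay` are hypothetical names, the status strings are placeholders, and the poll loop omits the sleep a real test would use.

```go
package main

import (
	"errors"
	"fmt"
)

// snapshot is a hypothetical view of one Get on the policy.
type snapshot struct {
	hasFinalizer bool
	status       string
}

// settled is true only when both writes are visible in the same read.
func settled(s snapshot) bool {
	return !s.hasFinalizer && s.status == "scheduled"
}

// waitFor polls get up to attempts times and returns the first snapshot
// that satisfies cond. A real test would sleep between polls.
func waitFor(get func() snapshot, cond func(snapshot) bool, attempts int) (snapshot, error) {
	for i := 0; i < attempts; i++ {
		if s := get(); cond(s) {
			return s, nil
		}
	}
	return snapshot{}, errors.New("condition not met before giving up")
}

// replay returns a getter that walks through a non-empty list of states
// and then keeps returning the last one, imitating successive reads.
func replay(states []snapshot) func() snapshot {
	i := 0
	return func() snapshot {
		s := states[i]
		if i < len(states)-1 {
			i++
		}
		return s
	}
}

func main() {
	states := []snapshot{
		{hasFinalizer: true, status: "active"},     // before write #1
		{hasFinalizer: false, status: "active"},    // between writes #1 and #2
		{hasFinalizer: false, status: "scheduled"}, // after write #2
	}
	s, err := waitFor(replay(states), settled, 10)
	fmt.Println(s, err)
}
```

Checking both fields inside a single condition is what removes the flakiness: a read that lands between the two writes simply fails the condition and is polled again.
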
The factories don't add the finalizer, so we need to fetch
the latest version of the policy to check that it now has the
finalizers.

In the past it was working, because policies had finalizers since
creation.

Signed-off-by: Víctor Cuadrado Juan <vcuadradojuan@suse.de>

viccuad commented May 28, 2026

Tackled the open comments, ready for another review. The e2e tests pass locally for me.

edit: the e2e tests passed on my fork (same commits) https://github.com/viccuad/adm-controller/actions/runs/26567167995/job/78264737960

@jvanz jvanz left a comment

LGTM. I cannot "approve" my own PR. But I'm fine with the latest changes from Victor.

@jvanz jvanz merged commit 8d3a4f6 into main May 28, 2026
55 checks passed
@github-project-automation github-project-automation Bot moved this from Pending review to Done in Kubewarden Admission Controller May 28, 2026
@viccuad viccuad changed the title from "Add controller finalizer when policy is active state only" to "fix: Add controller finalizer when policy is active state only" May 28, 2026
@viccuad viccuad deleted the finalizer-removal branch May 28, 2026 12:20
@viccuad viccuad mentioned this pull request May 28, 2026
11 tasks