Add real-cluster E2E release gate #83
Pull request overview
This PR introduces a real-cluster E2E “release gate” by making `make test-e2e` run the existing playground chaos suite against an actual Kubernetes cluster, and by wiring that into GitHub Actions so release publishing is blocked on the E2E run. It also updates playground setup and docs/examples to align with Helm CRD ownership behavior and CI execution.
Changes:
- Replace the placeholder `make test-e2e` with profile-driven `playground-chaos run-all` execution (smoke/release/full) and add Make targets for smoke/profile runs.
- Add a reusable kind+Calico E2E workflow and a trigger workflow (nightly/manual/PR label), and require the E2E gate in the release workflow before publishing.
- Adjust playground setup for CI (skip image build) and CRD installation behavior; update docs/examples to remove the now-obsolete `installCRDs` value guidance.
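
The profile-driven selection in the first change can be pictured with a minimal Go sketch. Only the profile names (`smoke`, `release`, `full`) and the function name `SelectForProfile` come from this PR. The string-based scenario type, the allowlist contents, the return signature, and `ValidateProfile` are assumptions made for illustration, not the code in `internal/playground/runner/profile.go`.

```go
package main

import "fmt"

// Profile names an E2E scenario subset. The three names come from the PR;
// everything else in this sketch is hypothetical.
type Profile string

const (
	ProfileSmoke   Profile = "smoke"
	ProfileRelease Profile = "release"
	ProfileFull    Profile = "full"
)

// allowlists maps curated profiles to scenario names. The scenario names
// are invented for illustration; "full" has no allowlist and selects all.
var allowlists = map[Profile][]string{
	ProfileSmoke:   {"pod-kill"},
	ProfileRelease: {"pod-kill", "network-partition"},
}

// ValidateProfile rejects unknown profile names.
func ValidateProfile(p Profile) error {
	switch p {
	case ProfileSmoke, ProfileRelease, ProfileFull:
		return nil
	}
	return fmt.Errorf("unknown profile %q (want smoke, release, or full)", p)
}

// SelectForProfile returns the scenarios from all that the profile allows,
// preserving the order of all.
func SelectForProfile(all []string, p Profile) ([]string, error) {
	if err := ValidateProfile(p); err != nil {
		return nil, err
	}
	if p == ProfileFull {
		// Copy so callers cannot mutate the registry slice.
		return append([]string(nil), all...), nil
	}
	allowed := make(map[string]bool)
	for _, name := range allowlists[p] {
		allowed[name] = true
	}
	var selected []string
	for _, name := range all {
		if allowed[name] {
			selected = append(selected, name)
		}
	}
	return selected, nil
}

func main() {
	all := []string{"pod-kill", "network-partition", "disk-pressure"}
	for _, p := range []Profile{ProfileSmoke, ProfileRelease, ProfileFull} {
		got, err := SelectForProfile(all, p)
		fmt.Println(p, got, err)
	}
}
```

Keeping the filter a pure function over names makes it easy to unit test without a cluster, which is one plausible reason for a data-driven design.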
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| WISHLIST.md | Marks #32 complete and adds follow-up #43 for backup/PITR-specific E2E scenarios. |
| playground/setup.sh | Adds CI-friendly image-build skip and a toggle for relying on Helm-installed CRDs vs manual CRD apply. |
| Makefile | Implements real test-e2e/test-e2e-smoke targets and adds a profile-filtered chaos runner target. |
| internal/playground/runner/profile.go | Introduces E2E profiles and scenario selection filtering. |
| internal/playground/runner/profile_test.go | Adds unit tests for profile validation and selection behavior. |
| internal/playground/runner/profile_registry_test.go | Verifies profiles select the intended subset from the registered scenarios set. |
| internal/playground/runner/junit.go | Ensures the JUnit output directory exists before writing the report. |
| examples/production-values.yaml | Removes installCRDs example since CRD ownership guidance changed. |
| examples/argocd-application.yaml | Removes installCRDs override from Argo CD Helm values example. |
| docs/docs/production-install-examples.mdx | Updates CRD ownership/install guidance to reflect Helm CRD behavior. |
| docs/docs/install-production.mdx | Removes --set installCRDs=true/false guidance; clarifies Helm CRD upgrade limitations. |
| docs/docs/gitops.mdx | Simplifies CRD ownership table and clarifies install/upgrade sequencing. |
| cmd/playground-chaos/main.go | Adds --profile flag and applies profile filtering in run-all. |
| charts/bloodraven/values.yaml | Removes the installCRDs value from chart values. |
| AGENTS.md | Documents new E2E Make targets and profile-based chaos runs. |
| .github/workflows/release.yml | Adds an E2E release-profile gate job and makes publishing jobs depend on it. |
| .github/workflows/README.md | Documents the new E2E workflows and the profile matrix. |
| .github/workflows/e2e.yml | Adds the trigger workflow for nightly/manual/PR-label E2E runs. |
| .github/workflows/_e2e.yml | Adds the reusable kind+Calico cluster workflow that deploys the playground and runs E2E. |
| .github/kind/e2e-calico.yaml | Adds a kind config that disables default CNI and supports Calico-enforced NetworkPolicy testing. |
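
The `junit.go` row above describes creating the JUnit output directory before the report is written. A minimal sketch of that pattern in Go, assuming a hypothetical `writeReport` helper and an invented output path (neither is taken from the PR), could look like this:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeReport is a hypothetical helper: it creates the parent directory of
// path (if missing) and then writes the report bytes.
func writeReport(path string, report []byte) error {
	// MkdirAll is a no-op when the directory already exists.
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return fmt.Errorf("create report directory: %w", err)
	}
	return os.WriteFile(path, report, 0o644)
}

func main() {
	dir, err := os.MkdirTemp("", "junit-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// The nested directories do not exist yet; writeReport creates them.
	out := filepath.Join(dir, "nested", "results", "junit.xml")
	if err := writeReport(out, []byte("<testsuites></testsuites>\n")); err != nil {
		panic(err)
	}
	fmt.Println("wrote", filepath.Base(out))
}
```

Without the `os.MkdirAll` call, `os.WriteFile` fails when a `--junit-out` style path points into a directory that a fresh CI workspace has not created yet.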
Addressed both unresolved Copilot review threads and resolved them:
Verification run after the fixes:
Skipped threads: none.
Final update after CI:
Summary
- […] `make test-e2e` with profile-driven real-cluster playground chaos runs (`smoke`, `release`, `full`).

Test Plan
- `go test ./internal/playground/runner`: PASS
- `make build-playground-chaos`: PASS
- `make vet`: PASS
- `npm run build --prefix docs`: PASS
- `npm run verify:llms --prefix docs`: PASS
- `make test-unit`: PASS
- `make test-component`: PASS
- `git diff --check`: PASS
- `make lint`: SKIPPED locally (`golangci-lint` not installed)

Real kind/k3d E2E was not run locally in this harness; the added workflow creates a fresh kind+Calico cluster and runs the selected profile.
Megamind Artifacts
- `.tmp/megamind-wishlist-32/plans/final.md`
- `.tmp/megamind-wishlist-32/reviews/validated-findings.md`
- `.tmp/megamind-wishlist-32/reviews/fixed-review.md`
- `.tmp/megamind-wishlist-32/final/local-gates.md`

Megamind Educational Appendix
Journey
- […] (`.tmp/megamind-wishlist-32/briefs/request.md`, `.tmp/megamind-wishlist-32/briefs/context.md`).
- […] `playground-chaos` runner instead of inventing a new E2E framework; avoid false NetworkPolicy coverage on CNIs that do not enforce policies; make release blocking explicit; define PR smoke scope; upload forensics; and do not pretend backup/PITR coverage exists without scenarios (`.tmp/megamind-wishlist-32/critiques/mbot-critique.md`).
- […] (`.tmp/megamind-wishlist-32/plans/second-draft.md`, `.tmp/megamind-wishlist-32/plans/final.md`).
- […] `make test-e2e`, `make test-e2e-smoke`, `playground-chaos run-all --profile`, profile selection tests, kind+Calico reusable workflow, nightly/manual/PR-label triggers, release workflow dependency, CI-friendly playground setup, JUnit directory creation, and docs/examples cleanup for Helm CRD ownership (`.tmp/megamind-wishlist-32/agents/e2e-gate-coder-final.md`, `.tmp/megamind-wishlist-32/final/diff-stat.txt`).
- […] `0s` before Calico, Calico is mandatory, node readiness uses `kubectl wait nodes`, JUnit parent directories are created, PR E2E reruns on synchronize while labeled, workflow concurrency was added, and live-registry tests guard profile drift (`.tmp/megamind-wishlist-32/reviews/validated-findings.md`, `.tmp/megamind-wishlist-32/fixes/review-fixes-final.md`).
- […] `make lint` was skipped because `golangci-lint` was not installed, and real kind/k3d E2E was not run inside this harness (`.tmp/megamind-wishlist-32/final/local-gates.md`, `.tmp/megamind-wishlist-32/final/delivery.md`).

Design Decisions
- […] `cmd/playground-chaos` as the E2E harness.
- […] `test/e2e` Go/Ginkgo/e2e-framework suite.
- `.tmp/megamind-wishlist-32/critiques/mbot-critique.md`; final plan scope; `cmd/playground-chaos/main.go`; `Makefile`.
- […] `smoke`, `release`, and `full` profiles.
- […] `run-all`, or an undefined PR smoke subset.
- `internal/playground/runner/profile.go`; `internal/playground/runner/profile_test.go`; `internal/playground/runner/profile_registry_test.go`; `Makefile`.
- […] `run-all` profile as `full`, but make `make test-e2e` default to `release`.
- […] `playground-chaos run-all` users while making the canonical E2E gate the curated release profile.
- […] `run-all` default semantics to release.
- `DefaultProfile = ProfileFull` in `internal/playground/runner/profile.go`; `E2E_PROFILE ?= release` in `Makefile`.
- `.tmp/megamind-wishlist-32/critiques/mbot-critique.md`; `.github/kind/e2e-calico.yaml`; `.github/workflows/_e2e.yml`.
- […] `release.yml`.
- `.github/workflows/release.yml`; critique release-enforcement findings.
- […] `_e2e.yml` workflow plus trigger workflow.
- `.github/workflows/_e2e.yml`; `.github/workflows/e2e.yml`; `.github/workflows/release.yml`; `.github/workflows/README.md`.
- […] `crds/` directory, not an `installCRDs` value.
- […] `crds/` based on values; the previous value was misleading. CI setup skips manual Bloodraven CRD apply so fresh Helm install exercises chart CRDs.
- […] `installCRDs=true/false`.
- `playground/setup.sh`; removed value in `charts/bloodraven/values.yaml`; docs/example edits; `.tmp/megamind-wishlist-32/reviews/validated-findings.md`.
- `.tmp/megamind-wishlist-32/reviews/validated-findings.md`; `WISHLIST.md`; `.tmp/megamind-wishlist-32/final/pr-body.md`.

Architecture
    flowchart TD
      subgraph Local[Local / Make entrypoints]
        A[make test-e2e\nE2E_PROFILE defaults release] --> B[bin/playground-chaos run-all]
        A2[make test-e2e-smoke] --> B
        A3[make chaos-run-all-profile PROFILE=...] --> B
      end
      subgraph Runner[playground-chaos]
        B --> C[--profile validation\nsmoke | release | full]
        C --> D[runner.SelectForProfile]
        D --> E[Executor runs selected scenarios]
        E --> F[JUnit XML]
        E --> G[chaos-results forensics on failure]
      end
      subgraph CI[GitHub Actions]
        H[e2e.yml\nschedule/manual/PR label] --> I[_e2e.yml reusable]
        R[release.yml e2e-gate] --> I
        I --> J[kind bloodraven-e2e\nCalico CNI]
        J --> K[build + load playground images]
        K --> L[playground/setup.sh\nSKIP_IMAGE_BUILD=1\nBLOODRAVEN_SETUP_HELM_INSTALL_CRDS=1]
        L --> A
        F --> M[upload JUnit]
        G --> N[upload forensics/kind/setup logs]
      end
      R --> O[draft/docker publishing require e2e-gate]

Key module boundaries:
- `internal/playground/runner/profile.go` is intentionally small and data-driven: profile constants, allowlists, validation, and `SelectForProfile(all, profile)` filtering. It does not know Kubernetes or scenario internals.
- `cmd/playground-chaos/main.go` owns CLI parsing and wires `--profile` only into `run-all`; single-scenario `run` remains unchanged.
- `Makefile` is the local contract: `test-e2e` builds the runner and runs `run-all --profile=$(E2E_PROFILE) --auto-reset --continue-on-failure --junit-out=...`; `test-e2e-smoke` is a convenience wrapper.
- `.github/workflows/_e2e.yml` is the CI implementation: build runner/images, create kind with Calico, install Calico, load images, deploy playground, run the requested profile, then upload JUnit and failure artifacts.
- `playground/setup.sh` gained CI-safe behavior by honoring prebuilt/preloaded images (`SKIP_IMAGE_BUILD`) and skipping manual Bloodraven CRD application when Helm should install chart CRDs on a fresh cluster (`BLOODRAVEN_SETUP_HELM_INSTALL_CRDS`).
- […] `installCRDs` value and explain that Helm installs chart CRDs on first install while upgrades need explicit CRD review/application.

Lessons
- […] `crds/` install on first install and are not templated by values or automatically upgraded. Avoid fake `installCRDs` toggles unless the chart actually implements them.
- […] `AGENTS.md` so future agents route unit/component/envtest/real-cluster scenario work to the right directories and runner.

Evidence
- `.tmp/megamind-wishlist-32/briefs/request.md`; `.tmp/megamind-wishlist-32/briefs/context.md`.
- `.tmp/megamind-wishlist-32/critiques/mbot-critique.md`.
- `.tmp/megamind-wishlist-32/plans/final.md`.
- `internal/playground/runner/profile.go`.
- `internal/playground/runner/profile_test.go`; `internal/playground/runner/profile_registry_test.go`.
- […] `--profile` for `run-all`, validates values, filters scenarios, and still writes JUnit.
- `cmd/playground-chaos/main.go`.
- `Makefile`; diff against `main` shows removal of the `TESTING_2.0.md` placeholder.
- `internal/playground/runner/junit.go`; validated finding #5.
- `.github/kind/e2e-calico.yaml`; `.github/workflows/_e2e.yml`.
- `.github/workflows/e2e.yml`.
- […] `e2e-gate` before draft/docker, with Helm/publish transitively dependent.
- `.github/workflows/release.yml`.
- `playground/setup.sh`; `.github/workflows/_e2e.yml` env.
- […] `installCRDs` value and docs/examples references were removed because Helm CRDs are not value-controlled.
- `charts/bloodraven/values.yaml`; `docs/docs/install-production.mdx`; `docs/docs/gitops.mdx`; `docs/docs/production-install-examples.mdx`; `examples/argocd-application.yaml`; `examples/production-values.yaml`; `.tmp/megamind-wishlist-32/reviews/fixed-review.md`.
- `WISHLIST.md` item #43; `.tmp/megamind-wishlist-32/reviews/validated-findings.md`; `.tmp/megamind-wishlist-32/final/pr-body.md`.
- `.tmp/megamind-wishlist-32/final/local-gates.md`; `.tmp/megamind-wishlist-32/agents/e2e-gate-coder-final.md`.
- `.tmp/megamind-wishlist-32/final/delivery.md`; `.tmp/megamind-wishlist-32/final/diff-stat.txt`.
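
To illustrate the CLI boundary described under the module boundaries (a `--profile` flag wired only into `run-all`, with `full` as the runner default while `make test-e2e` passes `release`), here is a minimal sketch using Go's standard `flag` package. The function name `parseRunAll` and the error text are assumptions for illustration; the actual parsing in `cmd/playground-chaos/main.go` may be structured differently.

```go
package main

import (
	"flag"
	"fmt"
)

// parseRunAll parses the arguments of a hypothetical run-all subcommand and
// returns the validated profile name. The "full" default mirrors the
// DefaultProfile = ProfileFull decision recorded in the PR.
func parseRunAll(args []string) (string, error) {
	fs := flag.NewFlagSet("run-all", flag.ContinueOnError)
	profile := fs.String("profile", "full", "scenario profile: smoke, release, or full")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *profile {
	case "smoke", "release", "full":
		return *profile, nil
	}
	return "", fmt.Errorf("invalid --profile %q (want smoke, release, or full)", *profile)
}

func main() {
	// No flag keeps the existing run-all behavior; a Makefile target can
	// opt into the curated gate by passing --profile=release explicitly.
	for _, args := range [][]string{nil, {"--profile=release"}, {"--profile=bogus"}} {
		profile, err := parseRunAll(args)
		fmt.Printf("args=%v profile=%q err=%v\n", args, profile, err)
	}
}
```

Using a dedicated `flag.FlagSet` per subcommand is one way to keep `--profile` out of the single-scenario `run` command, since flags registered on one set are not visible to another.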