Add MysqlStandbyCluster CRD + Phase 0 cross-cluster DR runbook by colinmollenhour · Pull Request #84 · ShipStream/bloodraven

colinmollenhour · 2026-05-23T07:32:51Z

AI Megamind - By: Claude Code (Claude Sonnet orchestrating; Opus + GPT-5.5 + Gemini-3.1-Pro for review/critique)

Summary

First slice of WISHLIST #7 — cross-region/cross-cluster DR. Implements Phases 0 + 1 of the multi-phase plan: an end-to-end cross-cluster DR runbook over the existing MysqlFailoverGroup + initFromBackup + PITR surface, plus a new MysqlStandbyCluster CRD with a passive verifier reconciler.

Phase 0 (docs runbook). New docs/docs/multi-cluster-dr.mdx is a complete recovery walkthrough using only today's CRDs. An on-call engineer can recover into another cluster using this doc alone — no new operator features required. Includes topology, IAM policy, encryption-passphrase distribution, source-fencing checklist, recovery commands, DNS cutover, and prose failback narrative.
Phase 1 (passive verifier CR). New MysqlStandbyCluster CRD (api/v1alpha1/mysqlstandbycluster_types.go, 523 lines) plus reconciler (internal/controller/standbycluster_reconciler.go, 747 lines). The CR declares a DR relationship from the DR cluster's side: scan the shared S3 bucket on freshness.discoveryInterval (default 5m), pick the newest dump by @.json end timestamp (not lex order — a real bug; see "Notes" below), read per-site PITR manifests, publish BucketReadable and SourceConfigKnown conditions. No activation, no promotion, no writes into MySQL — those land in Phase 2 (continuous restore verification) and Phase 3 (dr-activate), which are explicit follow-up PRs.

The CRD ships the full v1alpha1 schema (template, freshness, activate blocks) so the API surface is locked once and Phase 2/3 can read those fields without bumping the CRD version (no conversion webhook is available — see docs/docs/known-limitations.mdx).

What's NOT in this PR (explicit follow-ups)

Phase	Feature	Why deferred
2	`Restorable` condition powered by `MysqlBackupVerification` reuse; `dr-cursors/<name>.json` retention-floor sentinel	Hard prereq on WISHLIST #43 (real-cluster PITR/backup E2E scenarios), which is also still open
3	`dr-activate` kubectl plugin verb; activation state machine (Validating → Restoring → Replaying → Provisioning → Activated); materialization of the active `MysqlFailoverGroup`	Builds on Phase 2's `Restorable` gate
4	Symmetric failback CR + runbook	Builds on Phase 3

The full plan and per-phase scope are in .tmp/megamind-dr-7/plans/second-draft.md (not committed; available locally during the run).

Test plan

Notes for reviewers

This PR went through Megamind's full review/fix loop:

Three-model MBOT critique of the wishlist line surfaced 30 deduplicated findings; the critic dissent on the transport choice (Gemini wanted network-mediated; Opus + GPT-5.5 picked object-store-mediated) was resolved with the transport=ObjectStore|Network enum where Network is reserved for v2.
One planning agent produced a 2353-line implementation plan with explicit Phase 0..4 sequencing.
Three coding agents (CRD scaffolding, reconciler, docs) implemented Phases 0 + 1 in disjoint work packages.
Three-reviewer ultra-review (bugs / runtime / craft) returned 31 findings. 18 routed to fixes (most notable: dump selection was using sort.Strings() on GenerateName-suffixed directory names — picks the wrong dump; replaced with @.json end-timestamp comparison). 5 were explicitly deferred to Phase 2/3 (e.g. standby metrics, GenerationChangedPredicate hardening) and tracked in .tmp/megamind-dr-7/reviews/validated-findings.md.
Three fix agents addressed every validated finding. One fixed-review pass caught a trivial Helm-chart CRD copy drift; one-line re-copy.
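
The dump-selection bug is easy to reproduce in isolation. The following is a minimal sketch, not the reconciler's code: `dumpInfo`, `newestByName`, `newestByEnd`, `mustParse`, and the sample directory names are hypothetical, and the sketch assumes only that each dump exposes an end timestamp read from its @.json metadata. Because GenerateName appends a random suffix, lexicographic order of directory names says nothing about which dump finished last.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// dumpInfo is a hypothetical summary of one dump directory: its name in the
// bucket and the end timestamp recorded in the dump's metadata.
type dumpInfo struct {
	Name string
	End  time.Time
}

// newestByName reproduces the buggy approach: sort directory names
// lexicographically and take the last one. Returns "" for an empty input.
func newestByName(dumps []dumpInfo) string {
	if len(dumps) == 0 {
		return ""
	}
	names := make([]string, 0, len(dumps))
	for _, d := range dumps {
		names = append(names, d.Name)
	}
	sort.Strings(names)
	return names[len(names)-1]
}

// newestByEnd picks the dump whose end timestamp is latest.
// Returns "" for an empty input.
func newestByEnd(dumps []dumpInfo) string {
	if len(dumps) == 0 {
		return ""
	}
	best := dumps[0]
	for _, d := range dumps[1:] {
		if d.End.After(best.End) {
			best = d
		}
	}
	return best.Name
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	dumps := []dumpInfo{
		{Name: "orders-full-zk2p9", End: mustParse("2026-05-20T04:10:00Z")},
		{Name: "orders-full-b7x4q", End: mustParse("2026-05-22T04:12:00Z")},
	}
	fmt.Println("by name:", newestByName(dumps)) // the older dump wins
	fmt.Println("by end: ", newestByEnd(dumps))  // the newer dump wins
}
```

With these two sample names, the lexicographic pick is `orders-full-zk2p9` even though `orders-full-b7x4q` finished two days later.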

Megamind Educational Brief

Educational brief — Cross-cluster DR (WISHLIST #7)

Status: future-state design. No PR has been opened. The
authoritative spec is
plans/second-draft.md (2353 lines).
This brief compresses it for reviewers and future agents.

Run: .tmp/megamind-dr-7/ · Mode: planning-only · Readiness:
READY_TO_START with zero unresolved decisions.

Journey

How the wishlist line traveled through Megamind's planning loop:

Resolution. User invocation "Help me plan out WISHLIST.md item #7"
resolved against WISHLIST.md:21 (cross-region/cross-cluster DR as a
first-class feature). The line bundles four distinct ideas — new CR,
continuous shipping, one-command promote, runbook — that the critics
treated as separable products.
Context capture. briefs/context.md grounded the planning in the
existing surface: per-cluster operator, sidecar archiver gated on
!@@read_only (intra-pod, primary-only), dr-only site role
(intra-cluster, never auto-promoted), full-backup + PITR archive in
S3, initFromBackup + pointInTime already deployed.
MBOT critique. Three critics — Claude Opus (max thinking),
OpenAI GPT-5.5 (xhigh), Google Gemini 3.1 Pro (high) — ran in
parallel against the wishlist line and produced ~30 deduplicated
findings: 7 contradictions/hidden-assumptions (C-*), 10 failure-mode
gaps (F-*), 8 architectural decisions needing an explicit choice
(D-*), 5 naming concerns (N-*), 8 scope-discipline items (S-*).
Critic dissent (one item). D-1, the DR transport choice: Gemini
argued for network-mediated first (cross-cluster MySQL replication)
and deferring continuous S3 replay. Opus + GPT leaned
object-store-mediated because today's surface already does the work. The
collector sided with the 2-of-3 majority; the rejected option survives
as a reserved transport=Network enum so v2 can revisit without a
CRD bump.
Recommended-defaults table. The critique closed with a 9-row
table (D-1..D-8, S-3) that became the seed for the planning pass. The
table is the source of every "Source" citation in the Design Decisions
section below.
Single planning pass. One Claude Opus agent applied the defaults
end-to-end, producing the 16-section second draft (goals, phasing,
CRD shape, state machine, conditions, metrics, IAM/RBAC, DNS, test
plan, docs, risks, readiness). No MBOD/bundled-decisions phase ran
because the critique left zero open multi-option questions. Status
landed at READY_TO_START.

Design Decisions

Each row resolves a critique finding. "Source" cites the plan section
or current-code path that grounds it.

ID	Decision	Choice	Alternative rejected	Source
D-1	DR transport	Object-store-mediated (S3 + PITR archive already in place). `transport=ObjectStore`.	Network-mediated (`CHANGE REPLICATION SOURCE` across clusters). Reserved as `transport=Network` enum for v2.	`plans/second-draft.md` §1.1, §4.4; critique §3 (D-1, Gemini dissent)
N-1	Kind name	`MysqlStandbyCluster` (short name `msc`).	`MysqlDRTarget` — collides with `SiteRoleDROnly` (`api/v1alpha1/types.go:280-283`), which is the intra-cluster passive role and cannot be auto-promoted. Also rejected: `MysqlClusterReplica`, `MysqlRemoteFollower`, `MysqlDRPair`.	`plans/second-draft.md` §4.2; critique §4 (N-1)
N-3	Activation verb	`kubectl bloodraven dr-activate`	`promote` — already means zero-RPO intra-cluster switchover with `transactionsLost=0` (`cmd/kubectl-bloodraven/promote.go:23-46`). Reusing it across a non-zero-RPO cross-cluster path misleads operators.	`plans/second-draft.md` §6.5; critique §4 (N-3)
D-2	CRD residency	Target-side only. The CR lives on the DR cluster and declares "consume from bucket X, promote on confirm." The source operator never knows the relationship exists.	Source-side CR, or symmetric pair. Both require an out-of-band linking step. Failback is achieved by symmetry — drop a new standby CR on the original source.	`plans/second-draft.md` §4.3; critique §3 (D-2)
D-5	Promotion contract	Spec confirm-token (RFC 3339). `spec.activate.confirm` must parse and be strictly greater than `status.activation.confirmTokenUsed`. Mirrors `restoreInPlace.confirm`.	Annotation (source operator never sees it) or one-shot `MysqlPromote` CR (extra Kind for negligible gain).	`plans/second-draft.md` §6.1; `api/v1alpha1/backup_types.go:723-732`
S-3	Split-brain stance	Accept-loss with after-the-fact audit. Controller does not check whether the source is still writable. Operator owns the risk; runbook says "fence source first."	Bucket-fence sentinel object (TTL'd, source-written) as a hard interlock. Deferred to follow-up; the `transport` enum makes adding `spec.activate.requireSourceFenceTTL` non-breaking.	`plans/second-draft.md` §1.2, §6.7, §15.4; critique §5 (S-3)
D-6	DNS cutover	Operator writes `DNSEndpoint` in both clusters; user runs external-dns symmetrically. Bloodraven owns per-MFG records; the application-facing record (weighted-CNAME / GSLB / manual flip) is user-owned.	Operator-driven cross-provider DNS cutover. Blast radius too large; explicit non-goal.	`plans/second-draft.md` §12; `internal/platform/dns.go:23-31`; `api/v1alpha1/types.go:371-384`
D-3	Bucket IAM	User-provisioned. Read-only at DR (`s3:ListBucket`, `s3:GetObject`) plus a tightly-scoped write on `dr-cursors/*` only. Runbook publishes the minimum policy.	Operator-managed IAM (STS / bucket policy automation). Out of v1 scope.	`plans/second-draft.md` §11.1; critique §3 (D-3)
D-4	Encryption passphrase	User-managed. Source-side passphrase Secret is manually mirrored to DR namespace; preflight (Validating phase) validates non-empty.	Operator-driven passphrase distribution. Unanimous critic rejection.	`plans/second-draft.md` §11.2; `docs/docs/backup-encryption.mdx:217-271`
F-3	Failback shape	Symmetric: a second `MysqlStandbyCluster` on the original source cluster pointing at the new primary's bucket prefix. No dedicated `MysqlFailback` Kind.	Dedicated failback CR. Near-duplicate of the standby Kind; semantic difference is zero from operator's perspective.	`plans/second-draft.md` §7; critique §2 (F-3)
D-7	Artifact ownership across clusters	DR-side bucket scan + synthetic shadow `MysqlBackup` CRs. DR controller writes phase-`Succeeded` `MysqlBackup` CRs annotated `dr.bloodraven.shipstream.io/synthetic=true`; the existing verification reconciler is taught a single predicate to accept them.	Source-side mirroring of `MysqlBackup` CRs (GitOps or otherwise). Violates "only the bucket is the cross-cluster bus."	`plans/second-draft.md` §5.2.1
D-8	Single CR vs multiple	Single Kind with `transport` discriminator (matches `BackupStorage.Type` precedent in `api/v1alpha1/backup_types.go:388-444`).	Separate Kinds for each transport mode. Causes API surface bloat.	`plans/second-draft.md` §4.4
F-2	PITR pruning vs DR consumer	`dr-cursors/<ns>-<name>.json` retention-floor sentinel. DR controller refreshes every 5m (TTL 60m); source operator's `/pitr-cutoff` returns `min(MysqlBackup_retention, oldest_required_across_cursors)`.	Source-side CR coordination, or accepting the race.	`plans/second-draft.md` §5.3; `cmd/bloodraven/main.go:388-410`; `internal/sidecar/binlog_archiver.go:350-458`
S-1, S-2	Phasing	Phase 0 (docs runbook over existing surface) ships first. Phases 1..4 layer the CR, verification, activation, failback on top. Each phase ships independent value; Phase 0 is the floor if Phases 1+ slip.	Big-bang ship of CR + activation. Critics unanimous that writing the runbook surfaces the gaps the CR must close.	`plans/second-draft.md` §2, §3
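
The D-5 confirm-token rule (parse as RFC 3339, then require a value strictly greater than the last consumed token) can be sketched as follows. This is an illustration under assumptions, not the controller's implementation: `validateConfirm` and the two error values are hypothetical names, and treating an empty `confirmTokenUsed` as "no token consumed yet" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	errConfirmParse = errors.New("confirm is not a valid RFC 3339 timestamp")
	errConfirmStale = errors.New("confirm must be strictly greater than confirmTokenUsed")
)

// validateConfirm checks a requested confirm token against the last consumed
// one. used may be empty when no activation has consumed a token yet
// (assumption for this sketch).
func validateConfirm(confirm, used string) error {
	c, err := time.Parse(time.RFC3339, confirm)
	if err != nil {
		return errConfirmParse
	}
	if used == "" {
		return nil
	}
	u, err := time.Parse(time.RFC3339, used)
	if err != nil {
		return fmt.Errorf("stored confirmTokenUsed is unparseable: %w", err)
	}
	if !c.After(u) {
		return errConfirmStale
	}
	return nil
}

func main() {
	// A later timestamp is accepted.
	fmt.Println(validateConfirm("2026-05-23T07:40:00Z", "2026-05-23T07:32:51Z"))
	// Replaying the same token is rejected.
	fmt.Println(validateConfirm("2026-05-23T07:32:51Z", "2026-05-23T07:32:51Z"))
	// Anything that is not RFC 3339 is rejected.
	fmt.Println(validateConfirm("tomorrow", ""))
}
```

Comparing parsed times rather than raw strings keeps the check correct when two tokens use different UTC offsets.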

Architecture

Bloodraven on main (commit 5b5f0b0) is a single-cluster Kubernetes
operator: each MysqlFailoverGroup is one logical database with 2-16
sites that all live in the same cluster. The sidecar binlog archiver
runs only on the active primary (gated on !@@read_only), uploads
sealed binlogs to a shared S3 prefix, and the operator drives PITR
pruning from /pitr-cutoff. Today's "DR into another cluster" is a
manual checklist: stand up a fresh MFG in the target cluster with
spec.initFromBackup pointing at the source bucket, mirror passphrase
Secrets, flip DNS. There is no CR tracking the relationship, no
freshness signal, no consumer-side retention guard, no audit-grade
promote.

WISHLIST #7 introduces one new Kind — MysqlStandbyCluster — that
lives on the DR cluster, declares the relationship, continuously
verifies the latest dump + PITR window is restorable, and on a
confirm-token-gated dr-activate materializes a writable
MysqlFailoverGroup loaded from the source archive. The only
cross-cluster bus is the shared object store. Each operator stays
single-cluster: no federation, no operator-to-operator RPC.

Diagram 1 — End-state two-cluster topology

flowchart LR
    subgraph SourceCluster["Source cluster (e.g. us-west-prod)"]
        direction TB
        SOp["Bloodraven operator"]
        MFG["MysqlFailoverGroup (orders)"]
        SidePri["Sidecar (active primary)<br/>!@@read_only ⇒ writes binlogs"]
        SideRep["Sidecar (replicas)<br/>@@read_only ⇒ idle"]
        SOp -->|"reconciles"| MFG
        MFG --> SidePri
        MFG --> SideRep
    end

    subgraph Bucket["Shared S3 bucket (cross-cluster bus)"]
        direction TB
        Dumps["&lt;prefix&gt;/&lt;mysqlbackup-name&gt;/<br/>(full dumps + @.json)"]
        Binlogs["&lt;prefix&gt;/binlogs/<br/>(sealed binlogs + per-site manifest)"]
        Cursors["&lt;prefix&gt;/dr-cursors/&lt;name&gt;.json<br/>(retention floor sentinel)"]
    end

    subgraph DRCluster["DR cluster (e.g. us-east-prod)"]
        direction TB
        DOp["Bloodraven operator<br/>+ MysqlStandbyClusterReconciler"]
        MSC["MysqlStandbyCluster CR<br/>(verifier mode)"]
        MBVer["MysqlBackupVerification (periodic)<br/>+ synthetic MysqlBackup CRs"]
        FutureMFG["Materialized MysqlFailoverGroup<br/>(not yet created — Phase 3 only)"]
        DOp -->|"reconciles"| MSC
        MSC -->|"Owns"| MBVer
        MSC -.->|"materializes on dr-activate"| FutureMFG
    end

    SidePri -->|"PUT sealed binlogs"| Binlogs
    SOp -->|"PUT full dumps (Job)"| Dumps
    SOp -->|"GET dr-cursors/*.json<br/>during /pitr-cutoff"| Cursors

    MSC -->|"GET (list + read) dumps, binlogs"| Dumps
    MSC -->|"GET binlog manifests"| Binlogs
    MSC -->|"PUT dr-cursors/&lt;name&gt;.json<br/>(only object DR writes)"| Cursors

    classDef src fill:#fee,stroke:#900
    classDef dr fill:#eef,stroke:#009
    classDef bus fill:#ffd,stroke:#960
    class SourceCluster,SOp,MFG,SidePri,SideRep src
    class DRCluster,DOp,MSC,MBVer,FutureMFG dr
    class Bucket,Dumps,Binlogs,Cursors bus

The asymmetry is the design's defining feature:

The source cluster writes the bucket: full dumps via backup Jobs
(operator-driven), sealed binlogs via the sidecar archiver (gated on
!@@read_only, so the upload happens only on the active primary
and switches over within one scan cycle on failover).
The DR cluster only reads dumps + binlogs. The one exception
is the dr-cursors/<name>.json sentinel — a tiny per-standby file
the DR controller refreshes every 5 minutes (TTL 60m) to bound the
source operator's /pitr-cutoff and prevent it from pruning binlogs
a DR consumer still needs (critique F-2).
IAM follows the asymmetry: s3:ListBucket + s3:GetObject on the
whole prefix; s3:PutObject + s3:DeleteObject scoped to
dr-cursors/* only.
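
The retention-floor rule `min(MysqlBackup_retention, oldest_required_across_cursors)` can be sketched as below. The `cursor` type, its field names, and `pitrCutoff` are hypothetical; the brief does not give the JSON schema of the sentinel. The sketch also assumes that a cursor not refreshed within the 60m TTL no longer holds the floor, which is one reading of "TTL 60m" rather than a confirmed behavior.

```go
package main

import (
	"fmt"
	"time"
)

// cursor is a hypothetical decoded dr-cursors/<name>.json object.
type cursor struct {
	OldestRequired time.Time // earliest binlog time the standby still needs
	RefreshedAt    time.Time // last time the DR controller rewrote the file
}

// cursorTTL matches the 60m TTL quoted in the brief.
const cursorTTL = 60 * time.Minute

// pitrCutoff returns the time before which binlogs may be pruned: the
// retention cutoff, lowered to the oldest requirement of any live cursor.
func pitrCutoff(retentionCutoff time.Time, cursors []cursor, now time.Time) time.Time {
	cutoff := retentionCutoff
	for _, c := range cursors {
		if now.Sub(c.RefreshedAt) > cursorTTL {
			continue // stale cursor: assumed to no longer hold the floor
		}
		if c.OldestRequired.Before(cutoff) {
			cutoff = c.OldestRequired
		}
	}
	return cutoff
}

// fixedNow returns a constant instant so the example is deterministic.
func fixedNow() time.Time { return time.Date(2026, 5, 23, 8, 0, 0, 0, time.UTC) }

// hoursBefore returns t shifted h hours into the past.
func hoursBefore(t time.Time, h int) time.Time {
	return t.Add(-time.Duration(h) * time.Hour)
}

func main() {
	now := fixedNow()
	retention := hoursBefore(now, 7*24) // retention alone would prune at 7 days
	cursors := []cursor{
		{OldestRequired: hoursBefore(now, 9*24), RefreshedAt: now},                  // live
		{OldestRequired: hoursBefore(now, 30*24), RefreshedAt: hoursBefore(now, 3)}, // stale
	}
	fmt.Println(pitrCutoff(retention, cursors, now))
}
```

In this example the live cursor pulls the cutoff back from seven days to nine, while the cursor last refreshed three hours ago is ignored.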

Diagram 2 — `MysqlStandbyCluster` activation state machine

Mirrors plans/second-draft.md §9. One transition per reconcile so
operator restarts land on a well-defined observable state.

stateDiagram-v2
    [*] --> None
    None: "" (no activation requested)

    None --> Validating: "confirm set & valid<br/>Restorable=True (or acceptUnverified=true)<br/>not already Activated"

    Validating --> Restoring: "spec snapshot taken<br/>template MFG name free or owned by this CR<br/>preflight passed"
    Validating --> Failed: "RFC3339 parse fail<br/>confirm ≤ confirmTokenUsed<br/>Restorable stale + !acceptUnverified<br/>TemplateInvalid"

    Restoring --> Replaying: "materialized MFG<br/>status.restore.phase == Succeeded"
    Restoring --> Failed: "MFG status.restore.phase == Failed<br/>(RestoreFailed) or MaterializedGroupCollision"

    Replaying --> Provisioning: "initFromBackup.pointInTime applied (or N/A)<br/>target GTID covers source dump GTID"
    Replaying --> Failed: "PitrReplayFailed<br/>(GTID mismatch)"

    Provisioning --> Activated: "MFG status.activeSite != ''<br/>Ready=True condition stamped"
    Provisioning --> Failed: "wall-clock &gt; spec.activate.restoreTimeout<br/>(ProvisioningTimeout)"

    Activated --> [*]: "terminal success<br/>Active=True, ActivationInProgress=False"
    Failed --> [*]: "terminal failure<br/>confirmTokenUsed NOT bumped — edit confirm to retry"

Key invariants:

confirmTokenUsed is monotonically non-decreasing. A retry after
Failed requires the user to bump spec.activate.confirm to a
strictly-greater RFC 3339 timestamp (or use --auto-confirm /
kubectl bloodraven dr-activate).
Every transition writes status before the next phase's work
starts. Crash semantics: the next reconcile reads the current phase,
re-runs idempotent work (e.g. CreateOrUpdate on the materialized
MFG), and re-checks the exit condition. Pattern matches
PlannedFailoverReconciler.handle* in
internal/controller/planned_failover_reconciler.go:138-152.
Post-Activated the controller stops processing new confirm
edits and emits an ActivationLocked event. A second activation is
always a fresh CR.
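
The "one transition per reconcile" discipline can be sketched as a pure function from the persisted phase and the outcome of that phase's idempotent work to the next phase. The phase names come from the state diagram; the `outcome` type and `next` are hypothetical, and leaving `None` unchanged on a failure outcome is an assumption (the diagram shows no `None` to `Failed` edge).

```go
package main

import "fmt"

type phase string

const (
	phaseNone         phase = ""
	phaseValidating   phase = "Validating"
	phaseRestoring    phase = "Restoring"
	phaseReplaying    phase = "Replaying"
	phaseProvisioning phase = "Provisioning"
	phaseActivated    phase = "Activated"
	phaseFailed       phase = "Failed"
)

// outcome is what the current phase's idempotent work reported.
type outcome int

const (
	pending outcome = iota // exit condition not met yet; stay in place
	done                   // exit condition met; advance exactly one phase
	failed                 // unrecoverable error for this activation attempt
)

// next returns the phase to persist after one reconcile. It never skips a
// phase, so a restart always resumes from a well-defined observable state.
func next(cur phase, o outcome) phase {
	switch cur {
	case phaseActivated, phaseFailed:
		return cur // terminal phases never move
	}
	if o == pending {
		return cur
	}
	if o == failed {
		if cur == phaseNone {
			return cur // assumption: nothing to fail before activation starts
		}
		return phaseFailed
	}
	switch cur {
	case phaseNone:
		return phaseValidating
	case phaseValidating:
		return phaseRestoring
	case phaseRestoring:
		return phaseReplaying
	case phaseReplaying:
		return phaseProvisioning
	case phaseProvisioning:
		return phaseActivated
	}
	return cur
}

func main() {
	p := phaseNone
	for i := 0; i < 5; i++ {
		p = next(p, done)
		fmt.Println(p)
	}
}
```

Five successful reconciles walk the happy path from `Validating` to `Activated`; a crash between any two of them leaves the persisted phase as the only input needed to resume.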

Diagram 3 — DR-event lifecycle (with failback)

sequenceDiagram
    autonumber
    participant Apps as "Applications"
    participant SrcOp as "Source operator"
    participant Bucket as "Shared S3 bucket"
    participant DrOp as "DR operator"
    participant MSC as "MysqlStandbyCluster CR"
    participant DrMFG as "Materialized MysqlFailoverGroup"
    participant Admin as "Admin"

    Note over SrcOp,Bucket: "Steady state (Phase 1 + 2)"
    SrcOp->>Bucket: "PUT full dumps + sealed binlogs"
    DrOp->>Bucket: "LIST + GET (discovery loop, 5m)"
    DrOp->>MSC: "stamp status.discovered, BucketReadable=True"
    DrOp->>Bucket: "PUT dr-cursors/&lt;name&gt;.json (5m refresh)"
    DrOp->>DrOp: "scheduled MysqlBackupVerification (cron, default 0 4 * * *)"
    DrOp->>MSC: "Restorable=True; bloodraven_dr_restorable_timestamp_seconds gauge"

    Note over SrcOp,DrOp: "Source cluster loss"
    SrcOp--xApps: "primary unreachable / cluster API down"
    Admin->>Admin: "confirm source down (3 signals: /active-site 503,<br/>API server unreachable, MySQL TCP unreachable)"

    Note over Admin,MSC: "Activation (Phase 3)"
    Admin->>MSC: "kubectl bloodraven dr-activate &lt;msc&gt; --confirm $(date -u +%FT%TZ) --wait"
    DrOp->>MSC: "Validating: parse confirm, snapshot discovered.dumpName/Loc/GTID"
    DrOp->>DrMFG: "Restoring: create MFG with spec=template.spec + synthesized initFromBackup"
    DrMFG->>Bucket: "GET dump + binlogs (existing initFromBackup path)"
    DrMFG->>DrOp: "status.restore.phase=Succeeded"
    DrOp->>MSC: "Replaying: validate target GTID ⊇ source dump GTID"
    DrOp->>MSC: "Provisioning: wait Ready=True, activeSite set"
    DrOp->>MSC: "Activated: stamp materializedFailoverGroup,<br/>Active=True, emit StandbyActivated event"
    DrMFG-->>Apps: "writable (after DNS cutover by admin)"

    Note over Admin,Bucket: "DNS cutover (D-6) — user-driven"
    Admin->>Apps: "flip weighted-CNAME / external-dns ownership"

    Note over SrcOp,Bucket: "Source returns (Phase 4 failback)"
    SrcOp->>SrcOp: "original cluster comes back"
    Admin->>SrcOp: "delete old MFG + PVCs (destructive, manual)"
    Admin->>DrMFG: "ensure spec.backup.profiles[].storage.s3.prefix uses<br/>new directional layout (e.g. orders/east/)"
    Admin->>SrcOp: "apply *new* MysqlStandbyCluster pointing at DR cluster's prefix"
    SrcOp->>Bucket: "discovery + verification against DR's new bucket prefix"
    SrcOp->>SrcOp: "Restorable=True"
    Admin->>SrcOp: "kubectl bloodraven dr-activate (failback) — original cluster becomes standby of new primary"

The symmetry of MysqlStandbyCluster is the failback story: the same
Kind/controller/state-machine runs in both directions. No new "failback"
Kind, no swap-direction operation; just a second standby CR pointing
the other way. The plan calls this "current-state-driven, not
identity-driven" — exactly the same discipline as in-cluster
fail-back, where a returning original primary wins promotion only if
it wins the normal GTID-freshest candidate path.

CR shape (top-level fields from plan §8)

MysqlStandbyClusterSpec (shipstream.io/v1alpha1, namespace-scoped,
shortname msc, categories bloodraven;mysql;dr):

transport — ObjectStore (only honored in v1) or reserved Network.
source — failoverGroupName, optional namespace/cluster (informational), storage (mirrors BackupStorage), profileName, optional decryption (mirrors BackupDecryptionSpec).
template — embedded MysqlFailoverGroupSpec declared at standby-CR-creation time so activation is not a YAML scramble during an incident; plus name of the MFG to materialize.
freshness — discoveryInterval (5m default), verifySchedule cron (default 0 4 * * * UTC), verifyTimeZone, maxStaleness (48h default), suspend, retentionFloorRefresh (5m default).
activate — confirm (required RFC 3339), optional pointInTime (mirrors PointInTimeSpec), acceptUnverified (bypass Restorable gate), restoreTimeout (2h default).

Status carries discovered, lastVerified, activation (the full
StandbyActivationStatus audit block with source/target GTID, PITR
stop datetime, replayed binlog count, materialized active site,
reason, message), materializedFailoverGroup, and conditions.
Conditions: BucketReadable, SourceConfigKnown, Restorable,
ActivationInProgress, Active.
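
As an illustration of the `freshness` defaults listed above, here is a minimal sketch of default application and a staleness check. The struct is a simplified stand-in, not the type in api/v1alpha1/mysqlstandbycluster_types.go; field types are assumed, and `isStale` assumes that a verification older than `maxStaleness` counts as stale.

```go
package main

import (
	"fmt"
	"time"
)

// freshness is a simplified, hypothetical mirror of the freshness block.
type freshness struct {
	DiscoveryInterval     time.Duration
	VerifySchedule        string
	MaxStaleness          time.Duration
	RetentionFloorRefresh time.Duration
}

// withDefaults fills unset fields with the defaults quoted in the brief.
func withDefaults(f freshness) freshness {
	if f.DiscoveryInterval == 0 {
		f.DiscoveryInterval = 5 * time.Minute
	}
	if f.VerifySchedule == "" {
		f.VerifySchedule = "0 4 * * *"
	}
	if f.MaxStaleness == 0 {
		f.MaxStaleness = 48 * time.Hour
	}
	if f.RetentionFloorRefresh == 0 {
		f.RetentionFloorRefresh = 5 * time.Minute
	}
	return f
}

// isStale reports whether a verification of the given age exceeds the
// configured staleness budget (assumed semantics).
func isStale(age time.Duration, f freshness) bool {
	return age > f.MaxStaleness
}

func main() {
	f := withDefaults(freshness{MaxStaleness: 24 * time.Hour})
	fmt.Println(f.DiscoveryInterval, f.VerifySchedule, f.MaxStaleness)
	fmt.Println(isStale(30*time.Hour, f))
}
```

In a real CRD the same defaults could instead be declared with `+kubebuilder:default` markers; the brief does not say which approach the types file takes.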

Phasing (plan §2)

Phase 0 — docs/docs/multi-cluster-dr.mdx runbook over existing CRDs only. Ships first; surfaces every gap Phases 1+ must close. Required for v1 floor.
Phase 1 — MysqlStandbyCluster CR + controller in passive verifier mode. Discovery loop populates status.discovered; stamps BucketReadable and SourceConfigKnown. No load, no materialization.
Phase 2 — Continuous DR readiness. Synthetic MysqlBackup CRs (annotated dr.bloodraven.shipstream.io/synthetic=true); CronJob-scheduled MysqlBackupVerification runs; Restorable condition; bloodraven_dr_restorable_timestamp_seconds gauge. The source operator's /pitr-cutoff starts honoring dr-cursors/*.json. Hard prereq: WISHLIST #43 PITR E2E scenarios.
Phase 3 — dr-activate (kubectl plugin) + spec confirm-token; full activation state machine; materialized MFG. New verb name picked deliberately to not collide with intra-cluster promote (zero-RPO).
Phase 4 — Failback runbook + symmetric-CR rehearsal. No CRD changes — the Kind is already symmetric.

Two crucial reuse points

The new controller is essentially a scheduler around primitives that
already exist.

Existing MysqlBackupVerification powers Phase 2 readiness. The
verification reconciler already restores a backup into an ephemeral
mysqld and (optionally) replays binlogs to validate the dump. The
only new code on that path is a single predicate flip in
internal/controller/backup_verification_reconciler.go to accept
MysqlBackup CRs carrying the synthetic annotation and resolve
their location from MysqlBackup.status.location.
Existing initFromBackup + pointInTime powers Phase 3
activation. The Restoring phase synthesizes an initFromBackup
block pointing at the discovered dump location (+ optional
pointInTime) and creates the materialized MFG. From there, the
normal greenfield bootstrap path runs unchanged — restore Job,
sentinel write, replica clone, DNSEndpoint write, isFreshDeploy
gating. The standby controller's job at that point is purely to
wait for the existing status.restore.phase=Succeeded and
Ready=True signals.

This is the design's lever: almost every primitive Phase 2/3 needs
already exists. The new CR is a scheduler + audit layer that names
the relationship; nearly all the heavy machinery (S3 client, BRV1
header parsing, dump load via mysqlsh util.loadDump, binlog replay,
DNSEndpoint, condition surface, metrics shape) is reused verbatim.
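
The "single predicate flip" on the verification path might look roughly like the sketch below. Only the annotation key and the phase-`Succeeded` and `status.location` details come from the brief; the `backup` struct, the `LocallyProduced` flag, and both function names are hypothetical stand-ins for whatever the existing reconciler checks today.

```go
package main

import "fmt"

// syntheticAnnotation is the annotation key named in the brief.
const syntheticAnnotation = "dr.bloodraven.shipstream.io/synthetic"

// backup is a hypothetical, flattened stand-in for a MysqlBackup CR.
type backup struct {
	Annotations     map[string]string
	Phase           string // status.phase
	Location        string // status.location
	LocallyProduced bool   // hypothetical: created by a backup Job in this cluster
}

// isSynthetic reports whether the DR controller wrote this backup as a
// shadow of an object it discovered in the bucket.
func isSynthetic(annotations map[string]string) bool {
	return annotations[syntheticAnnotation] == "true"
}

// acceptForVerification is the widened predicate: a succeeded backup with a
// resolvable location qualifies if it is local or synthetic.
func acceptForVerification(b backup) bool {
	if b.Phase != "Succeeded" || b.Location == "" {
		return false
	}
	return b.LocallyProduced || isSynthetic(b.Annotations)
}

func main() {
	shadow := backup{
		Annotations: map[string]string{syntheticAnnotation: "true"},
		Phase:       "Succeeded",
		Location:    "s3://example-bucket/orders/orders-full-b7x4q/",
	}
	fmt.Println(acceptForVerification(shadow))
}
```

The bucket URL is a placeholder. The point of the shape is that the synthetic case adds one disjunct and leaves the rest of the verification path untouched.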

Lessons

A naming collision is a critique-phase finding, not an
implementation-review finding. All three critics independently
surfaced N-1 — MysqlDRTarget vs SiteRoleDROnly. If the planning
pass had gone first, the name would have shipped, gone to
implementation review, and been renamed at the worst possible time
(after generated DeepCopy code + Helm chart edits + docs are in
flight). The MBOT critique catches naming hazards before anyone
writes a Go file.
Make the only cross-cluster bus explicit in the first diagram.
Diagram 1 puts the shared S3 bucket dead center with its three
subprefixes, and labels the directionality of every arrow. The
trust boundary becomes obvious immediately — and the asymmetry
("source writes, DR reads, except for one tiny sentinel object")
catches the C-1/F-2 critiques in one image. A reviewer who only
reads the diagram still knows the answer to "what runs the shipper
on the DR side?" (nothing).
MBOT critique value comes from picking models with different
failure modes. Opus + GPT + Gemini disagreed on exactly one
thing — the D-1 transport choice — and that disagreement was the
most valuable finding in the entire critique. The decision became
visible (object-store-first, with Network reserved as a
forward-compatible enum) rather than buried in a single agent's
default. When models agree on everything, the critique is probably
rubber-stamping; when one dissents on one item, the planner has a
real tradeoff to write down.
Existing primitives drive the CRD shape, not the other way
around. The template field, the synthetic-MysqlBackup trick,
the confirm-token pattern, the phase enum vocabulary — every one of
these mirrors something already in the codebase
(MysqlFailoverGroupSpec, MysqlBackup shape, RestoreInPlaceSpec,
PlannedFailoverPhase). The CR is dense with "mirrors X" /
"analog of Y" citations on purpose: it makes the v1 surface
forward-compatible with the existing operator's discipline, and it
keeps the implementation small because most of the heavy code is
already there.
Phasing lets the docs ship before the code. Phase 0 (runbook
over existing CRDs only) is independently useful: an on-call
engineer at 03:00 can recover into another cluster using only the
existing surface. That floor de-risks every later phase — if Phase 1+
slips a release, users still have a documented recovery path. The
rest of the wishlist line's gaps (no freshness signal, no audit-
grade promote) become enhancements over a working baseline, not
blockers for shipping anything.
Cross-cluster split-brain is a policy decision, not a
technology decision. S-3 was the single hardest call. The plan
resolves it as "accept-loss with audit" + runbook + post-hoc
divergent-GTID detection on rejoin — explicitly, in §1.2 non-goals
and §6.7. Critically, the transport discriminator preserves the
option to add a spec.activate.requireSourceFenceTTL bucket
sentinel in v2 without a CRD bump. The lesson generalizes: declare
the v1 stance up front (it's documented in non-goals) so design
review doesn't re-litigate it; and leave a forward-compatible knob
for future interlock-mode without committing to it now.
The kubectl-plugin verb name encodes a contract. promote
ships a specific guarantee (drain → GTID catch-up →
transactionsLost=0). dr-activate cannot offer that contract.
Different verb. Operators reading docs or running history-search
immediately know which contract they're invoking — N-3 is a tiny
decision with disproportionate operator-experience leverage.
Megamind's planning loop wins when readiness gates are tight.
This run landed at READY_TO_START with zero unresolved
decisions because the recommended-defaults table closed every
open [D-*] / [N-*] / [S-*] item with a concrete pick. A
planning agent applying defaults that don't close every open
finding produces a draft with TODOs; that's where implementation
cycles start spinning. The discipline is: if the critique can't
produce a default, the critique is not done.

Evidence

Claim-to-source table. Diagram and design-decision rows ground in
specific plan sections; runtime/contract claims ground in
current-code paths verified at planning time.

Claim	Source
Source operator is the sole bucket writer for dumps + binlogs	`internal/sidecar/binlog_archiver.go:239,537` (`IsReadOnly` gate); `internal/controller/backup_reconciler.go:60` (backup-Job RBAC); `plans/second-draft.md` §1.1, §3.3 step 1
Sidecar archiver runs only on the active primary, per-pod	`internal/sidecar/binlog_archiver.go:531-537` (read-only check); critique §1 (C-1, F-9)
`dr-only` site role is intra-cluster, never auto-promoted	`api/v1alpha1/types.go:280-283`; `docs/docs/multi-site.mdx:14-23`; `docs/docs/known-limitations.mdx:60-63`
Existing DR is "ad-hoc `initFromBackup` in another cluster"	`WISHLIST.md:21`; `briefs/context.md` §"What DR today actually looks like"; `api/v1alpha1/backup_types.go:552-616`
Diagram 1 (topology) grounded	`plans/second-draft.md` §1.1, §3.3 (bullet 1 topology overview), §5.3 (cursor file), §11.1 (IAM asymmetry)
Diagram 2 (activation state machine) grounded	`plans/second-draft.md` §9 (entire section); enum at §8.3 (`StandbyActivationPhase`); idempotency rules at §9.3-§9.4
Diagram 3 (DR-event lifecycle + failback) grounded	`plans/second-draft.md` §6 (activation flow), §7 (failback runbook), §12.4 (DNS event copy)
`MysqlStandbyCluster` Kind name (N-1)	`plans/second-draft.md` §4.2; critique §4 (N-1) "MysqlStandbyCluster is the clearest"
Object-store transport choice with `Network` reserved (D-1)	`plans/second-draft.md` §1.2, §4.4, §8.2 (`StandbyTransport` enum); critique §3 (D-1) "object-store-mediated DR is the natural extension"
`dr-activate` verb chosen over `promote` (N-3)	`plans/second-draft.md` §6.5; `cmd/kubectl-bloodraven/promote.go:23-46` (`transactionsLost=0` contract); critique §4 (N-3)
Target-side-only CRD residency (D-2)	`plans/second-draft.md` §4.3; critique §3 (D-2) "the cluster running the command and the cluster being promoted are the same"
Spec confirm-token gate pattern (D-5)	`plans/second-draft.md` §6.1; `api/v1alpha1/backup_types.go:723-732` (`RestoreInPlaceSpec.Confirm`)
Phase enum mirrors `RestoreInPlacePhase`	`plans/second-draft.md` §8.3 (`StandbyActivationPhase`); `api/v1alpha1/backup_types.go:762-810`
Split-brain stance "accept-loss with audit" (S-3)	`plans/second-draft.md` §1.2, §6.7, §15.4, §15.7; critique §5 (S-3); `docs/docs/durability-and-rpo.mdx:94-118` (divergent-GTID detection)
DR cluster reads only, plus dr-cursors writes	`plans/second-draft.md` §5.3, §11.1; IAM policy in §11.1 (`DRReadOnly` + `DRCursorWrite` statements scoped to `dr-cursors/*`)
`dr-cursors/<name>.json` retention-floor sentinel (F-2 mitigation)	`plans/second-draft.md` §5.3, §15.5; `cmd/bloodraven/main.go:388-410` (`/pitr-cutoff` handler); `internal/sidecar/binlog_archiver.go:350-458` (archive pruning)
`Restorable` condition powered by `MysqlBackupVerification` (S-6)	`plans/second-draft.md` §5.2; `api/v1alpha1/mysqlbackupverification_types.go` (existing CRD); reuse via synthetic-MysqlBackup annotation predicate flip at `internal/controller/backup_verification_reconciler.go`
Activation reuses `initFromBackup` + `pointInTime` unchanged	`plans/second-draft.md` §6.2-§6.3, §9 (Restoring phase); `api/v1alpha1/backup_types.go:552-616` (`InitFromBackupSpec` shape); `api/v1alpha1/backup_types.go:191-210` (`PointInTimeSpec`)
`template` field declared at CR-create-time, not activation-time	`plans/second-draft.md` §4.4 (last bullet) "user declares site list, DNS hostname, storage class, credentials secret at standby-CR-creation time, not at activation time (otherwise activation becomes a YAML scramble in an incident)"
Materialized MFG owner ref with `BlockOwnerDeletion=false`	`plans/second-draft.md` §4.3, §9.2 (Restoring phase work) — deleting standby after activation does NOT cascade-delete the writable MFG
Second activation is locked post-`Activated`; users delete-and-recreate to re-fire	`plans/second-draft.md` §6.6; `ActivationLocked` event in §10.5
Failback is symmetric — second `MysqlStandbyCluster` on returning original source (F-3)	`plans/second-draft.md` §7.1-§7.5; critique §2 (F-3)
Directional bucket prefix recommendation (e.g. `orders/east/`)	`plans/second-draft.md` §7.2 step 3, §7.4 (future automation note)
DNS handled by external-dns symmetrically; operator only writes `DNSEndpoint` (D-6)	`plans/second-draft.md` §12; `internal/platform/dns.go:23-31`; `api/v1alpha1/types.go:371-384`
Encryption passphrase mirrored manually; preflight-validated (D-4)	`plans/second-draft.md` §11.2; `docs/docs/backup-encryption.mdx:217-271`; `api/v1alpha1/backup_types.go:343-349` (`BackupDecryptionSpec` reuse)
IAM policy minimum (D-3)	`plans/second-draft.md` §11.1 (JSON policy verbatim)
Per-cluster operator with leader election; no federation (S-8)	`briefs/context.md` §"Operator/sidecar facts"; `plans/second-draft.md` §1.2 (non-goal)
Metrics: `bloodraven_dr_` series mirror `bloodraven_backup_` shape	`plans/second-draft.md` §10.3; `internal/metrics/metrics.go:114-200,163-170` (existing pattern)
Hard prereq on WISHLIST #43 (PITR E2E scenarios) for Phase 2	`plans/second-draft.md` §2.2, §15.1; `WISHLIST.md:17` (#43); critique §5 (S-5)
Critic dissent on D-1 transport choice (Opus + GPT vs Gemini)	`critiques/mbot-critique.md` §"Where the critics disagreed"
Ledger: planning-only run, zero unresolved decisions at end	`final/ledger.md`; `plans/second-draft.md` §16
CRD evolution constraint (no conversion webhook on `v1alpha1`)	`docs/docs/known-limitations.mdx:18-19`; `plans/second-draft.md` §15.12
Per-MFG Helm RBAC is hand-maintained; new Kind requires mirror	`plans/second-draft.md` §11.4.2; CLAUDE.md "Pre-PR gate" §5; `charts/bloodraven/templates/clusterrole.yaml:48-77`
Synthetic `MysqlBackup` annotation contract for verification reuse	`plans/second-draft.md` §5.2.1; annotations `dr.bloodraven.shipstream.io/synthetic=true`, `dr.bloodraven.shipstream.io/source-bucket=…`
Cross-cluster RPO floor `max_binlog_size ÷ throughput` + upload latency	`plans/second-draft.md` §1.2, §10.4; `docs/docs/durability-and-rpo.mdx:142-167`; critique §1 (C-3, F-6)

End of brief. Length target ~400-700 lines; this brief is within that
budget while staying grounded in the artifacts listed in the
"Required reading" section of the prompt.

This is the first slice of WISHLIST #7 ("Cross-region/cross-cluster DR as a first-class feature") and implements Phases 0 + 1 of the plan at .tmp/megamind-dr-7/plans/second-draft.md.

Phase 0 (docs runbook): docs/docs/multi-cluster-dr.mdx is a complete end-to-end recovery runbook that works against the existing surface (MysqlFailoverGroup + initFromBackup + PITR archive). An on-call engineer can recover into another cluster using only this doc and today's CRDs. Phases 1+ layer first-class tooling on top.

Phase 1 (passive verifier CR): new MysqlStandbyCluster CRD declares a DR relationship from the DR cluster's side. The reconciler scans the shared S3 bucket on a configurable cadence (default 5m), discovers the newest full dump (by @.json `end` timestamp, not lex order) and reads per-site PITR manifests, then publishes BucketReadable and SourceConfigKnown conditions. No activation, no promotion, no writes into MySQL; that machinery lands in Phases 2 and 3 (deferred to follow-up PRs).

The CRD ships the full v1alpha1 schema (template/freshness/activate blocks) so the API surface is locked once and Phase 2/3 can read those fields without bumping the CRD version (no conversion webhook available, per docs/docs/known-limitations.mdx).

Implementation went through Megamind's review/fix loop:

- Three-model MBOT critique of the wishlist line → consolidated defaults
- One planning agent produced the 2353-line plan
- Three coding agents (CRD scaffolding, reconciler, docs) implemented Phase 0 + 1 in parallel work packages
- Three-reviewer ultra-review (bugs/runtime/craft) returned 31 findings; 18 routed to fixes (S/R/D/T bundles), 5 explicitly deferred to Phase 2/3
- Three fix agents addressed every validated finding; one fixed-review pass plus a one-line trailing chart-CRD refresh
- All local gates green: go build, go vet, golangci-lint, race-test suite, generate/manifests clean

See .tmp/megamind-dr-7/ for run artifacts (plan, critique, reviews, fixes, educational brief).
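
The "by `end` timestamp, not lex order" selection can be illustrated as follows. This is a minimal sketch, not the reconciler's code: `dumpMeta`, `newestDump`, the key names, and the timestamp layout are all assumptions made for the example; only the `end` field name comes from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// dumpMeta holds the single field this sketch reads from a dump's @.json.
type dumpMeta struct {
	End string `json:"end"`
}

// dumpEndLayout is an ASSUMED layout for the "end" value.
const dumpEndLayout = "2006-01-02 15:04:05"

// newestDump returns the key whose @.json "end" timestamp is latest.
// metas maps a (hypothetical) dump directory key to its raw @.json bytes.
func newestDump(metas map[string][]byte) (string, error) {
	var bestKey string
	var bestEnd time.Time
	for key, raw := range metas {
		var m dumpMeta
		if err := json.Unmarshal(raw, &m); err != nil {
			continue // skip unreadable metadata rather than fail the scan
		}
		end, err := time.Parse(dumpEndLayout, m.End)
		if err != nil {
			continue
		}
		// Ties break on the key so the result does not depend on map order.
		if bestKey == "" || end.After(bestEnd) || (end.Equal(bestEnd) && key < bestKey) {
			bestKey, bestEnd = key, end
		}
	}
	if bestKey == "" {
		return "", fmt.Errorf("no dump with a parseable end timestamp")
	}
	return bestKey, nil
}

func main() {
	metas := map[string][]byte{
		"dumps/weekly-z/": []byte(`{"end":"2026-05-01 02:00:00"}`),
		"dumps/adhoc-a/":  []byte(`{"end":"2026-05-20 02:00:00"}`),
	}
	// Lexicographic order would pick "dumps/weekly-z/"; the timestamp picks the other.
	fmt.Println(newestDump(metas))
}
```

Comparing parsed timestamps keeps the choice correct even when dump names do not sort chronologically.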

The envtest tests created MysqlStandbyCluster CRs with a near-empty template.spec, which passed the local fake-client tests but failed admission in CI:

  spec.source.storage.s3.credentialsSecret: Invalid value: ""
  spec.template.spec.dns.hostname: Invalid value: ""
  spec.template.spec.sites: Required value

The CRD embeds the full MysqlFailoverGroupSpec under spec.template.spec, so admission validates the template at standby-cluster create time, not only at Phase 3 activation. The fixture now mirrors examples/minimal-failovergroup.yaml (two primary-candidate sites with zone/taintNodeSelector/lbIP/storage and a DNS hostname) and sets a non-empty credentialsSecret on the S3 source.

A new ensureEnvtestS3CredsSecret helper provisions the dummy Secret referenced by spec.source.storage.s3.credentialsSecret so the reconciler's resolveS3CredsToDir path (which always runs before the SetNewStoreFunc injection point) finds a valid Secret. Each of the three envtest tests now calls it before creating the CR.

The miss: `make test-envtest` was not run locally before the original push; only the unit + component suites were. CLAUDE.md's Pre-PR gate requires test-envtest when CRD validation is touched.

colinmollenhour · 2026-05-24T07:49:40Z

+	prefix string,
+) (*standbyScanResult, error) {
+	// List everything under the prefix. For large archives this may be
+	// many thousands of keys; ArchiveStore handles pagination internally.


AI Ultra Review · Commit: bafb7c1 · Role: runtime · Flagged by: Pi

The discovery loop lists the entire backup prefix every 5 minutes by default. In production that prefix contains dump shard files plus the full archived-binlog history, so work and S3 List cost scale with total archive age/size for every MysqlStandbyCluster, not just with metadata objects. That can throttle or slow controller workers in long-lived installations. Please bound discovery to metadata/index prefixes (or maintain a small sentinel/index object) rather than scanning the full archive namespace on each reconcile.

AI Review Response · Commit: fe09da2 · By: OpenCode with GPT 5.5

I addressed the worker-hang part of this by wrapping discovery scans in a bounded context timeout, and the sibling-prefix safety issue in a separate resolved thread. I did not add a new metadata index/sentinel object in this commit because the current ArchiveStore contract only exposes List/Get and there is no existing writer-side index format for dumps.

Do you want Phase 1 to block on introducing a durable dump index/sentinel format now, or is bounded-time prefix scanning acceptable for this phase with the index format tracked as a follow-up?

colinmollenhour · 2026-05-24T07:49:53Z

AI Ultra Review · Commit: bafb7c1 · Roles: bugs, runtime, craft · Models: Pi

Posted 4 inline findings after validating and deduplicating the role outputs.

Agent	Found	Validated	False Positives	Unique Finds	Shared Finds	Accuracy	Composite Score
Pi	9	8	1	4	0	89%	6

Best/worst agent: no differentiation; only the Pi-backed participant was available in this harness run.

Role	Found	Validated	Unique-to-role	Accuracy
bugs	3	3	1	100%
runtime	3	3	1	100%
craft	3	2	0	67%

Most validated signal came from bugs and runtime; craft contributed duplicate coverage for the dump-selection and lastScanAt issues but also produced one false positive about generated artifacts that were present in the PR and excluded only from review bucketing.

colinmollenhour · 2026-05-24T08:10:29Z

AI Ultra Review · Commit: bafb7c1 · Roles: bugs, runtime, craft · Models: GPT, Gemini, Qwen

Full-branch re-review posted 5 additional inline findings. Four validated findings from this pass were already covered by the prior ultra-review, so I did not duplicate those comments: encrypted source archive discovery, lexicographic dump-candidate truncation, stale LastScanAt, and unbounded full-prefix listing.

Agent	Found	Validated	False Positives	Unique Finds	Shared Finds	Accuracy	Composite Score
GPT	8	8	0	3	5	100%	11
Gemini	13	5	8	1	4	38%	-10
Qwen	9	2	7	0	2	22%	-12

Best agent by composite score: GPT. Worst agent by composite score: Qwen. Gemini produced the most new signal for prefix-boundary and envtest issues, but also returned several lower-confidence craft items that did not clear validation.

Role	Found	Validated	Unique-to-role	Accuracy
bugs	6	4	1	67%
runtime	8	4	1	50%
craft	16	6	3	38%

Most validated signal came from craft for docs/test issues and from runtime for production-safety hazards. No role produced zero validated issues.

colinmollenhour · 2026-05-24T08:33:36Z

AI Review Response · Commit: fe09da2 · By: OpenCode with GPT 5.5 · Summary

Pushed commit fe09da2 (Address standby cluster review feedback) and resolved 8 review threads.

Fixes included: wired spec.source.decryption.passphraseSecret into PITRConfig.PassphraseFile; added a bounded discovery scan context; normalized S3 prefixes and enforced slash-bounded candidate matching; removed the lexicographic dump-candidate cap and added a >10 non-time-sorted dump regression test; updated LastScanAt on every successful scan while keeping BucketScanned event suppression separate; corrected the /active-site runbook endpoint; added required spec.template to the docs sample; and changed the envtest to fail on unexpected reconcile errors.
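
As an illustration of the prefix normalization and slash-bounded matching mentioned above, here is a minimal sketch. `normalizePrefix` and `underPrefix` are hypothetical helper names, not necessarily those used in the commit; the point is that a sibling prefix such as `orders/east-old/` must not match `orders/east`.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePrefix trims surrounding slashes and appends exactly one trailing
// slash, so "orders/east", "/orders/east" and "orders/east//" compare equal.
// An empty prefix stays empty (the bucket root).
func normalizePrefix(p string) string {
	p = strings.Trim(p, "/")
	if p == "" {
		return ""
	}
	return p + "/"
}

// underPrefix reports whether key lies inside prefix on a path-segment boundary.
func underPrefix(key, prefix string) bool {
	return strings.HasPrefix(strings.TrimLeft(key, "/"), normalizePrefix(prefix))
}

func main() {
	fmt.Println(underPrefix("orders/east/dump-1/@.json", "orders/east"))     // true
	fmt.Println(underPrefix("orders/east-old/dump-1/@.json", "orders/east")) // false
}
```

A plain `strings.HasPrefix(key, "orders/east")` would accept both keys, which is the sibling-prefix hazard this normalization avoids.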

Posted one follow-up question on the remaining unresolved thread about whether Phase 1 must introduce a durable dump index/sentinel format now, or whether bounded-time prefix scanning is acceptable with indexing tracked as follow-up. I skipped resolving that thread pending the reviewer decision.

Validation: go test ./internal/controller -run TestMysqlStandbyCluster passes. The targeted envtest command could not run locally because this checkout excludes all test/envtest files without the required build constraints; CI Test / Envtest is running. Current PR state: branch pushed, working tree clean, Generate Check passed, other CI checks are in progress.

colinmollenhour added 2 commits May 23, 2026 07:31