Add MysqlStandbyCluster CRD + Phase 0 cross-cluster DR runbook #84
colinmollenhour wants to merge 3 commits into
This is the first slice of WISHLIST #7 ("Cross-region/cross-cluster DR as a first-class feature") and implements Phases 0 + 1 of the plan at .tmp/megamind-dr-7/plans/second-draft.md.

Phase 0 (docs runbook) — docs/docs/multi-cluster-dr.mdx is a complete end-to-end recovery runbook that works against the existing surface (MysqlFailoverGroup + initFromBackup + PITR archive). An on-call engineer can recover into another cluster using only this doc and today's CRDs. Phases 1+ layer first-class tooling on top.

Phase 1 (passive verifier CR) — new MysqlStandbyCluster CRD declares a DR relationship from the DR cluster's side. The reconciler scans the shared S3 bucket on a configurable cadence (default 5m), discovers the newest full dump (by @.json `end` timestamp, not lex order) and reads per-site PITR manifests, then publishes BucketReadable and SourceConfigKnown conditions. No activation, no promotion, no writes into MySQL — that machinery lands in Phases 2 and 3 (deferred to follow-up PRs).

The CRD ships the full v1alpha1 schema (template/freshness/activate blocks) so the API surface is locked once and Phase 2/3 can read those fields without bumping the CRD version (no conversion webhook available, per docs/docs/known-limitations.mdx).

Implementation went through Megamind's review/fix loop:
- Three-model MBOT critique of the wishlist line → consolidated defaults
- One planning agent produced the 2353-line plan
- Three coding agents (CRD scaffolding, reconciler, docs) implemented Phase 0 + 1 in parallel work packages
- Three-reviewer ultra-review (bugs/runtime/craft) returned 31 findings; 18 routed to fixes (S/R/D/T bundles), 5 explicitly deferred to Phase 2/3
- Three fix agents addressed every validated finding; one fixed-review pass plus a one-line trailing chart-CRD refresh
- All local gates green: go build, go vet, golangci-lint, race-test suite, generate/manifests clean

See .tmp/megamind-dr-7/ for run artifacts (plan, critique, reviews, fixes, educational brief).
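As a rough illustration of the `end`-timestamp selection described above (not the PR's actual reconciler code), the sketch below picks the newest dump by completion time instead of by directory name. The `dumpCandidate` type, `newestDump`, and `mustTime` are hypothetical names, and the example assumes the `end` value has already been read from each dump's @.json and parsed into a `time.Time`.

```go
package main

import (
	"fmt"
	"time"
)

// dumpCandidate is a hypothetical in-memory view of one dump directory.
type dumpCandidate struct {
	Name string    // directory name, e.g. a GenerateName-suffixed backup name
	End  time.Time // completion time taken from the dump's @.json metadata
}

// newestDump returns the candidate with the latest End time.
// ok is false when there are no candidates. On a tie the earlier
// slice element wins, because After is a strict comparison.
func newestDump(cands []dumpCandidate) (best dumpCandidate, ok bool) {
	for _, c := range cands {
		if !ok || c.End.After(best.End) {
			best, ok = c, true
		}
	}
	return best, ok
}

// mustTime parses an RFC 3339 timestamp and panics on bad input (demo helper).
func mustTime(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	cands := []dumpCandidate{
		{Name: "orders-full-zzq8k", End: mustTime("2025-01-01T04:00:00Z")},
		{Name: "orders-full-abc12", End: mustTime("2025-01-02T04:00:00Z")},
	}
	best, _ := newestDump(cands)
	fmt.Println(best.Name) // prints "orders-full-abc12"
}
```

With these two candidates a lexicographic sort would place `orders-full-zzq8k` last, while the timestamp comparison selects `orders-full-abc12`, the dump that actually finished later.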
The envtest tests created MysqlStandbyCluster CRs with a near-empty template.spec, which passed the local fake-client tests but failed admission in CI:

    spec.source.storage.s3.credentialsSecret: Invalid value: ""
    spec.template.spec.dns.hostname: Invalid value: ""
    spec.template.spec.sites: Required value

The CRD embeds the full MysqlFailoverGroupSpec under spec.template.spec, so admission validates the template at standby-cluster create time, not only at Phase 3 activation.

The fixture now mirrors examples/minimal-failovergroup.yaml (two primary-candidate sites with zone/taintNodeSelector/lbIP/storage and a DNS hostname) and sets a non-empty credentialsSecret on the S3 source.

A new ensureEnvtestS3CredsSecret helper provisions the dummy Secret referenced by spec.source.storage.s3.credentialsSecret so the reconciler's resolveS3CredsToDir path (which always runs before the SetNewStoreFunc injection point) finds a valid Secret. Each of the three envtest tests now calls it before creating the CR.

The miss: `make test-envtest` was not run locally before the original push; only the unit + component suites were. CLAUDE.md's Pre-PR gate requires test-envtest when CRD validation is touched.
        prefix string,
    ) (*standbyScanResult, error) {
        // List everything under the prefix. For large archives this may be
        // many thousands of keys; ArchiveStore handles pagination internally.
AI Ultra Review · Commit: bafb7c1 · Role: runtime · Flagged by: Pi
The discovery loop lists the entire backup prefix every 5 minutes by default. In production that prefix contains dump shard files plus the full archived-binlog history, so work and S3 List cost scale with total archive age/size for every MysqlStandbyCluster, not just with metadata objects. That can throttle or slow controller workers in long-lived installations. Please bound discovery to metadata/index prefixes (or maintain a small sentinel/index object) rather than scanning the full archive namespace on each reconcile.
AI Review Response · Commit: fe09da2 · By: OpenCode with GPT 5.5
I addressed the worker-hang part of this by wrapping discovery scans in a bounded context timeout, and the sibling-prefix safety issue in a separate resolved thread. I did not add a new metadata index/sentinel object in this commit because the current ArchiveStore contract only exposes List/Get and there is no existing writer-side index format for dumps.
Do you want Phase 1 to block on introducing a durable dump index/sentinel format now, or is bounded-time prefix scanning acceptable for this phase with the index format tracked as a follow-up?
Posted 4 inline findings after validating and deduplicating the role outputs.
Best/worst agent: no differentiation; only the Pi-backed participant was available in this harness run.
Most validated signal came from
Full-branch re-review posted 5 additional inline findings. Four validated findings from this pass were already covered by the prior ultra-review, so I did not duplicate those comments: encrypted source archive discovery, lexicographic dump-candidate truncation, stale
Best agent by composite score: GPT. Worst agent by composite score: Qwen. Gemini produced the most new signal for prefix-boundary and envtest issues, but also returned several lower-confidence craft items that did not clear validation.
Most validated signal came from craft for docs/test issues and from runtime for production-safety hazards. No role produced zero validated issues.

Pushed commit

Fixes included: wired

Posted one follow-up question on the remaining unresolved thread about whether Phase 1 must introduce a durable dump index/sentinel format now, or whether bounded-time prefix scanning is acceptable with indexing tracked as follow-up. I skipped resolving that thread pending the reviewer decision.

Validation:
Summary
First slice of WISHLIST #7 — cross-region/cross-cluster DR. Implements Phases 0 + 1 of the multi-phase plan: an end-to-end cross-cluster DR runbook over the existing `MysqlFailoverGroup` + `initFromBackup` + PITR surface, plus a new `MysqlStandbyCluster` CRD with a passive verifier reconciler.

- Phase 0: `docs/docs/multi-cluster-dr.mdx` is a complete recovery walkthrough using only today's CRDs. An on-call engineer can recover into another cluster using this doc alone — no new operator features required. Includes topology, IAM policy, encryption-passphrase distribution, source-fencing checklist, recovery commands, DNS cutover, and prose failback narrative.
- Phase 1: `MysqlStandbyCluster` CRD (`api/v1alpha1/mysqlstandbycluster_types.go`, 523 lines) plus reconciler (`internal/controller/standbycluster_reconciler.go`, 747 lines). The CR declares a DR relationship from the DR cluster's side: scan the shared S3 bucket on `freshness.discoveryInterval` (default 5m), pick the newest dump by `@.json end` timestamp (not lex order — a real bug; see "Notes" below), read per-site PITR manifests, publish `BucketReadable` and `SourceConfigKnown` conditions. No activation, no promotion, no writes into MySQL — those land in Phase 2 (continuous restore verification) and Phase 3 (dr-activate), which are explicit follow-up PRs.

The CRD ships the full v1alpha1 schema (`template`, `freshness`, `activate` blocks) so the API surface is locked once and Phase 2/3 can read those fields without bumping the CRD version (no conversion webhook is available — see `docs/docs/known-limitations.mdx`).

What's NOT in this PR (explicit follow-ups)

- Phase 2: `Restorable` condition powered by `MysqlBackupVerification` reuse; `dr-cursors/<name>.json` retention-floor sentinel
- Phase 3: `dr-activate` kubectl plugin verb; activation state machine (Validating → Restoring → Replaying → Provisioning → Activated); materialization of the active `MysqlFailoverGroup`
- … `Restorable` gate

The full plan and per-phase scope are in `.tmp/megamind-dr-7/plans/second-draft.md` (not committed; available locally during the run).

Test plan

- `make generate && make manifests` — clean, no further drift on re-run
- `diff config/crd/bases/shipstream.io_mysqlstandbyclusters.yaml charts/bloodraven/crds/shipstream.io_mysqlstandbyclusters.yaml` — byte-identical
- `go build ./...` — success
- `make vet`
- `make lint` (`golangci-lint run ./...`)
- `go test -race -count=1 ./internal/... ./test/component/` — full suite, includes the new standby-cluster reconciler tests (13 original + 9 fix-pass extras)
- `kubectl apply --dry-run=client -f examples/standby-cluster.yaml` against the new CRD — accepted
- … `test/envtest/standbycluster_test.go`

Notes for reviewers

This PR went through Megamind's full review/fix loop:

- … `transport=ObjectStore|Network` enum where `Network` is reserved for v2.
- … `sort.Strings()` on `GenerateName`-suffixed directory names — picks the wrong dump; replaced with `@.json end`-timestamp comparison). 5 were explicitly deferred to Phase 2/3 (e.g. standby metrics, GenerationChangedPredicate hardening) and tracked in `.tmp/megamind-dr-7/reviews/validated-findings.md`.

Megamind Educational Brief
Educational brief — Cross-cluster DR (WISHLIST #7)
Journey
How the wishlist line traveled through Megamind's planning loop:
- `Help me plan out WISHLIST.md item #7` resolved against `WISHLIST.md:21` (cross-region/cross-cluster DR as a first-class feature). The line bundles four distinct ideas — new CR, continuous shipping, one-command promote, runbook — that the critics treated as separable products.
- … `briefs/context.md` grounded the planning in the existing surface: per-cluster operator, sidecar archiver gated on `!@@read_only` (intra-pod, primary-only), `dr-only` site role (intra-cluster, never auto-promoted), full-backup + PITR archive in S3, `initFromBackup` + `pointInTime` already deployed.
- … OpenAI GPT-5.5 (xhigh), Google Gemini 3.1 Pro (high) — ran in parallel against the wishlist line and produced ~30 deduplicated findings: 7 contradictions/hidden-assumptions (`C-*`), 10 failure-mode gaps (`F-*`), 8 architectural decisions needing an explicit choice (`D-*`), 5 naming concerns (`N-*`), 8 scope-discipline items (`S-*`).
- … argued for network-mediated first (cross-cluster MySQL replication) and deferring continuous S3 replay. Opus + GPT lean object-store-mediated because today's surface already does the work. The collector sided with the 2-of-3 majority; the rejected option survives as a reserved `transport=Network` enum so v2 can revisit without a CRD bump.
- … table (D-1..D-8, S-3) that became the seed for the planning pass. The table is the source of every "Source" citation in the Design Decisions section below.
- … end-to-end, producing the 16-section second draft (goals, phasing, CRD shape, state machine, conditions, metrics, IAM/RBAC, DNS, test plan, docs, risks, readiness). No MBOD/bundled-decisions phase ran because the critique left zero open multi-option questions. Status landed at `READY_TO_START`.

Design Decisions
Each row resolves a critique finding. "Source" cites the plan section
or current-code path that grounds it.
- `transport=ObjectStore`. … `CHANGE REPLICATION SOURCE` across clusters). Reserved as `transport=Network` enum for v2. Source: `plans/second-draft.md` §1.1, §4.4; critique §3 (D-1, Gemini dissent)
- `MysqlStandbyCluster` (short name `msc`). … `MysqlDRTarget` — collides with `SiteRoleDROnly` (`api/v1alpha1/types.go:280-283`), which is the intra-cluster passive role and cannot be auto-promoted. Also rejected: `MysqlClusterReplica`, `MysqlRemoteFollower`, `MysqlDRPair`. Source: `plans/second-draft.md` §4.2; critique §4 (N-1)
- `kubectl bloodraven dr-activate`. … `promote` — already means zero-RPO intra-cluster switchover with `transactionsLost=0` (`cmd/kubectl-bloodraven/promote.go:23-46`). Reusing it across a non-zero-RPO cross-cluster path misleads operators. Source: `plans/second-draft.md` §6.5; critique §4 (N-3)
- … Source: `plans/second-draft.md` §4.3; critique §3 (D-2)
- … `spec.activate.confirm` must parse and be strictly greater than `status.activation.confirmTokenUsed`. Mirrors `restoreInPlace.confirm`. … `MysqlPromote` CR (extra Kind for negligible gain). Source: `plans/second-draft.md` §6.1; `api/v1alpha1/backup_types.go:723-732`
- … `transport` enum makes adding `spec.activate.requireSourceFenceTTL` non-breaking. Source: `plans/second-draft.md` §1.2, §6.7, §15.4; critique §5 (S-3)
- … `DNSEndpoint` in both clusters; user runs external-dns symmetrically. Bloodraven owns per-MFG records; the application-facing record (weighted-CNAME / GSLB / manual flip) is user-owned. Source: `plans/second-draft.md` §12; `internal/platform/dns.go:23-31`; `api/v1alpha1/types.go:371-384`
- … `s3:ListBucket`, `s3:GetObject`) plus a tightly-scoped write on `dr-cursors/*` only. Runbook publishes the minimum policy. Source: `plans/second-draft.md` §11.1; critique §3 (D-3)
- … Source: `plans/second-draft.md` §11.2; `docs/docs/backup-encryption.mdx:217-271`
- … `MysqlStandbyCluster` on the original source cluster pointing at the new primary's bucket prefix. No dedicated `MysqlFailback` Kind. Source: `plans/second-draft.md` §7; critique §2 (F-3)
- … `MysqlBackup` CRs. DR controller writes phase-Succeeded `MysqlBackup` CRs annotated `dr.bloodraven.shipstream.io/synthetic=true`; the existing verification reconciler is taught a single predicate to accept them. … `MysqlBackup` CRs (GitOps or otherwise). Violates "only the bucket is the cross-cluster bus." Source: `plans/second-draft.md` §5.2.1
- … `transport` discriminator (matches `BackupStorage.Type` precedent in `api/v1alpha1/backup_types.go:388-444`). Source: `plans/second-draft.md` §4.4
- … `dr-cursors/<ns>-<name>.json` retention-floor sentinel. DR controller refreshes every 5m (TTL 60m); source operator's `/pitr-cutoff` returns `min(MysqlBackup_retention, oldest_required_across_cursors)`. Source: `plans/second-draft.md` §5.3; `cmd/bloodraven/main.go:388-410`; `internal/sidecar/binlog_archiver.go:350-458`
- … Source: `plans/second-draft.md` §2, §3

Architecture
Bloodraven on `main` (commit `5b5f0b0`) is a single-cluster Kubernetes operator: each `MysqlFailoverGroup` is one logical database with 2-16 sites that all live in the same cluster. The sidecar binlog archiver runs only on the active primary (gated on `!@@read_only`), uploads sealed binlogs to a shared S3 prefix, and the operator drives PITR pruning from `/pitr-cutoff`. Today's "DR into another cluster" is a manual checklist: stand up a fresh MFG in the target cluster with `spec.initFromBackup` pointing at the source bucket, mirror passphrase Secrets, flip DNS. There is no CR tracking the relationship, no freshness signal, no consumer-side retention guard, no audit-grade promote.

WISHLIST #7 introduces one new Kind — `MysqlStandbyCluster` — that lives on the DR cluster, declares the relationship, continuously verifies the latest dump + PITR window is restorable, and on a confirm-token-gated `dr-activate` materializes a writable `MysqlFailoverGroup` loaded from the source archive. The only cross-cluster bus is the shared object store. Each operator stays single-cluster: no federation, no operator-to-operator RPC.
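The consumer-side retention guard that today's surface lacks is specified in the Design Decisions as `/pitr-cutoff` returning `min(MysqlBackup_retention, oldest_required_across_cursors)`, with each cursor refreshed every 5m and carrying a 60m TTL. A minimal sketch of that calculation follows. The `cursor` struct, `pitrCutoff`, and `ts` are hypothetical names, and treating an expired cursor as ignored is an assumption about what the TTL means rather than something the plan excerpt states.

```go
package main

import (
	"fmt"
	"time"
)

// cursor is a hypothetical decoded dr-cursors/<name>.json object.
type cursor struct {
	OldestRequired time.Time // earliest point the standby still needs binlogs from
	RefreshedAt    time.Time // when the DR controller last rewrote the object
}

// cursorTTL is the 60m TTL quoted in the design table.
const cursorTTL = 60 * time.Minute

// pitrCutoff returns the earliest of the source's own retention cutoff and
// every live cursor's floor. Cursors older than cursorTTL are skipped
// (assumption) so an abandoned standby cannot pin the archive forever.
func pitrCutoff(retention time.Time, cursors []cursor, now time.Time) time.Time {
	cutoff := retention
	for _, c := range cursors {
		if now.Sub(c.RefreshedAt) > cursorTTL {
			continue // stale cursor: no longer honored
		}
		if c.OldestRequired.Before(cutoff) {
			cutoff = c.OldestRequired
		}
	}
	return cutoff
}

// ts parses an RFC 3339 timestamp and panics on bad input (demo helper).
func ts(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	now := ts("2025-01-10T12:00:00Z")
	cursors := []cursor{
		{OldestRequired: ts("2025-01-07T00:00:00Z"), RefreshedAt: ts("2025-01-10T11:58:00Z")}, // live
		{OldestRequired: ts("2025-01-01T00:00:00Z"), RefreshedAt: ts("2025-01-10T09:00:00Z")}, // expired
	}
	cutoff := pitrCutoff(ts("2025-01-08T00:00:00Z"), cursors, now)
	fmt.Println(cutoff.Format(time.RFC3339)) // prints "2025-01-07T00:00:00Z"
}
```

In the example the live cursor pulls the cutoff one day earlier than the source's own retention would, while the cursor last refreshed three hours ago is past its TTL and has no effect.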
Diagram 1 — End-state two-cluster topology
    flowchart LR
      subgraph SourceCluster["Source cluster (e.g. us-west-prod)"]
        direction TB
        SOp["Bloodraven operator"]
        MFG["MysqlFailoverGroup (orders)"]
        SidePri["Sidecar (active primary)<br/>!@@read_only ⇒ writes binlogs"]
        SideRep["Sidecar (replicas)<br/>@@read_only ⇒ idle"]
        SOp -->|"reconciles"| MFG
        MFG --> SidePri
        MFG --> SideRep
      end
      subgraph Bucket["Shared S3 bucket (cross-cluster bus)"]
        direction TB
        Dumps["<prefix>/<mysqlbackup-name>/<br/>(full dumps + @.json)"]
        Binlogs["<prefix>/binlogs/<br/>(sealed binlogs + per-site manifest)"]
        Cursors["<prefix>/dr-cursors/<name>.json<br/>(retention floor sentinel)"]
      end
      subgraph DRCluster["DR cluster (e.g. us-east-prod)"]
        direction TB
        DOp["Bloodraven operator<br/>+ MysqlStandbyClusterReconciler"]
        MSC["MysqlStandbyCluster CR<br/>(verifier mode)"]
        MBVer["MysqlBackupVerification (periodic)<br/>+ synthetic MysqlBackup CRs"]
        FutureMFG["Materialized MysqlFailoverGroup<br/>(not yet created — Phase 3 only)"]
        DOp -->|"reconciles"| MSC
        MSC -->|"Owns"| MBVer
        MSC -.->|"materializes on dr-activate"| FutureMFG
      end
      SidePri -->|"PUT sealed binlogs"| Binlogs
      SOp -->|"PUT full dumps (Job)"| Dumps
      SOp -->|"GET dr-cursors/*.json<br/>during /pitr-cutoff"| Cursors
      MSC -->|"GET (list + read) dumps, binlogs"| Dumps
      MSC -->|"GET binlog manifests"| Binlogs
      MSC -->|"PUT dr-cursors/<name>.json<br/>(only object DR writes)"| Cursors
      classDef src fill:#fee,stroke:#900
      classDef dr fill:#eef,stroke:#009
      classDef bus fill:#ffd,stroke:#960
      class SourceCluster,SOp,MFG,SidePri,SideRep src
      class DRCluster,DOp,MSC,MBVer,FutureMFG dr
      class Bucket,Dumps,Binlogs,Cursors bus

The asymmetry is the design's defining feature:
- … (operator-driven), sealed binlogs via the sidecar archiver (gated on `!@@read_only`, so the upload happens only on the active primary and switches over within one scan cycle on failover).
- … is the `dr-cursors/<name>.json` sentinel — a tiny per-standby file the DR controller refreshes every 5 minutes (TTL 60m) to bound the source operator's `/pitr-cutoff` and prevent it from pruning binlogs a DR consumer still needs (critique F-2).
- … `s3:ListBucket` + `s3:GetObject` on the whole prefix; `s3:PutObject` + `s3:DeleteObject` scoped to `dr-cursors/*` only.

Diagram 2 — `MysqlStandbyCluster` activation state machine

Mirrors `plans/second-draft.md` §9. One transition per reconcile so operator restarts land on a well-defined observable state.
    stateDiagram-v2
      [*] --> None
      None: "" (no activation requested)
      None --> Validating: "confirm set & valid<br/>Restorable=True (or AcceptUnverified=true)<br/>not already Activated"
      Validating --> Restoring: "spec snapshot taken<br/>template MFG name free or owned by this CR<br/>preflight passed"
      Validating --> Failed: "RFC3339 parse fail<br/>confirm ≤ confirmTokenUsed<br/>Restorable stale + !acceptUnverified<br/>TemplateInvalid"
      Restoring --> Replaying: "materialized MFG<br/>status.restore.phase == Succeeded"
      Restoring --> Failed: "MFG status.restore.phase == Failed<br/>(RestoreFailed) or MaterializedGroupCollision"
      Replaying --> Provisioning: "initFromBackup.pointInTime applied (or N/A)<br/>target GTID covers source dump GTID"
      Replaying --> Failed: "PitrReplayFailed<br/>(GTID mismatch)"
      Provisioning --> Activated: "MFG status.activeSite != ''<br/>Ready=True condition stamped"
      Provisioning --> Failed: "wall-clock > spec.activate.restoreTimeout<br/>(ProvisioningTimeout)"
      Activated --> [*]: "terminal success<br/>Active=True, ActivationInProgress=False"
      Failed --> [*]: "terminal failure<br/>confirmTokenUsed NOT bumped — edit confirm to retry"

Key invariants:
- `confirmTokenUsed` is monotonically non-decreasing. A retry after `Failed` requires the user to bump `spec.activate.confirm` to a strictly-greater RFC 3339 timestamp (or use `--auto-confirm`/`kubectl bloodraven dr-activate`).
- … starts. Crash semantics: the next reconcile reads the current phase, re-runs idempotent work (e.g. `CreateOrUpdate` on the materialized MFG), and re-checks the exit condition. Pattern matches `PlannedFailoverReconciler.handle*` in `internal/controller/planned_failover_reconciler.go:138-152`.
- … `Activated` the controller stops processing new `confirm` edits and emits an `ActivationLocked` event. A second activation is always a fresh CR.
Diagram 3 — DR-event lifecycle (with failback)
    sequenceDiagram
      autonumber
      participant Apps as "Applications"
      participant SrcOp as "Source operator"
      participant Bucket as "Shared S3 bucket"
      participant DrOp as "DR operator"
      participant MSC as "MysqlStandbyCluster CR"
      participant DrMFG as "Materialized MysqlFailoverGroup"
      participant Admin as "Admin"
      Note over SrcOp,Bucket: "Steady state (Phase 1 + 2)"
      SrcOp->>Bucket: "PUT full dumps + sealed binlogs"
      DrOp->>Bucket: "LIST + GET (discovery loop, 5m)"
      DrOp->>MSC: "stamp status.discovered, BucketReadable=True"
      DrOp->>Bucket: "PUT dr-cursors/<name>.json (5m refresh)"
      DrOp->>DrOp: "scheduled MysqlBackupVerification (cron, default 0 4 * * *)"
      DrOp->>MSC: "Restorable=True; bloodraven_dr_restorable_timestamp_seconds gauge"
      Note over SrcOp,DrOp: "Source cluster loss"
      SrcOp--xApps: "primary unreachable / cluster API down"
      Admin->>Admin: "confirm source down (3 signals: /active-site 503,<br/>API server unreachable, MySQL TCP unreachable)"
      Note over Admin,MSC: "Activation (Phase 3)"
      Admin->>MSC: "kubectl bloodraven dr-activate <msc> --confirm $(date -u +%FT%TZ) --wait"
      DrOp->>MSC: "Validating: parse confirm, snapshot discovered.dumpName/Loc/GTID"
      DrOp->>DrMFG: "Restoring: create MFG with spec=template.spec + synthesized initFromBackup"
      DrMFG->>Bucket: "GET dump + binlogs (existing initFromBackup path)"
      DrMFG->>DrOp: "status.restore.phase=Succeeded"
      DrOp->>MSC: "Replaying: validate target GTID ⊇ source dump GTID"
      DrOp->>MSC: "Provisioning: wait Ready=True, activeSite set"
      DrOp->>MSC: "Activated: stamp materializedFailoverGroup,<br/>Active=True, emit StandbyActivated event"
      DrMFG-->>Apps: "writable (after DNS cutover by admin)"
      Note over Admin,Bucket: "DNS cutover (D-6) — user-driven"
      Admin->>Apps: "flip weighted-CNAME / external-dns ownership"
      Note over SrcOp,Bucket: "Source returns (Phase 4 failback)"
      SrcOp->>SrcOp: "original cluster comes back"
      Admin->>SrcOp: "delete old MFG + PVCs (destructive, manual)"
      Admin->>DrMFG: "ensure spec.backup.profiles[].storage.s3.prefix uses<br/>new directional layout (e.g. orders/east/)"
      Admin->>SrcOp: "apply *new* MysqlStandbyCluster pointing at DR cluster's prefix"
      SrcOp->>Bucket: "discovery + verification against DR's new bucket prefix"
      SrcOp->>SrcOp: "Restorable=True"
      Admin->>SrcOp: "kubectl bloodraven dr-activate (failback) — original cluster becomes standby of new primary"

The symmetry of `MysqlStandbyCluster` is the failback story: the same Kind/controller/state-machine runs in both directions. No new "failback" Kind, no swap-direction operation; just a second standby CR pointing the other way. The plan calls this "current-state-driven, not identity-driven" — exactly the same discipline as in-cluster fail-back, where a returning original primary wins promotion only if it wins the normal GTID-freshest candidate path.
CR shape (top-level fields from plan §8)
`MysqlStandbyClusterSpec` (`shipstream.io/v1alpha1`, namespace-scoped, shortname `msc`, categories `bloodraven;mysql;dr`):

- `transport` — `ObjectStore` (only honored in v1) or reserved `Network`.
- `source` — `failoverGroupName`, optional `namespace`/`cluster` (informational), `storage` (mirrors `BackupStorage`), `profileName`, optional `decryption` (mirrors `BackupDecryptionSpec`).
- `template` — embedded `MysqlFailoverGroupSpec` declared at standby-CR-creation time so activation is not a YAML scramble during an incident; plus `name` of the MFG to materialize.
- `freshness` — `discoveryInterval` (5m default), `verifySchedule` cron (default `0 4 * * *` UTC), `verifyTimeZone`, `maxStaleness` (48h default), `suspend`, `retentionFloorRefresh` (5m default).
- `activate` — `confirm` (required RFC 3339), optional `pointInTime` (mirrors `PointInTimeSpec`), `acceptUnverified` (bypass Restorable gate), `restoreTimeout` (2h default).

Status carries `discovered`, `lastVerified`, `activation` (the full `StandbyActivationStatus` audit block with source/target GTID, PITR stop datetime, replayed binlog count, materialized active site, reason, message), `materializedFailoverGroup`, and `conditions`.

Conditions: `BucketReadable`, `SourceConfigKnown`, `Restorable`, `ActivationInProgress`, `Active`.

Phasing (plan §2)

- Phase 0: `docs/docs/multi-cluster-dr.mdx` runbook over existing CRDs only. Ships first; surfaces every gap Phases 1+ must close. Required for v1 floor.
- Phase 1: `MysqlStandbyCluster` CR + controller in passive verifier mode. Discovery loop populates `status.discovered`; stamps `BucketReadable` and `SourceConfigKnown`. No load, no materialization.
- Phase 2: … `MysqlBackup` CRs (annotated `dr.bloodraven.shipstream.io/synthetic=true`); CronJob-scheduled `MysqlBackupVerification` runs; `Restorable` condition; `bloodraven_dr_restorable_timestamp_seconds` gauge. Source operator gains `dr-cursors/*.json` honor in `/pitr-cutoff`. Hard prereq: WISHLIST #43 PITR E2E scenarios.
- Phase 3: `dr-activate` (kubectl plugin) + spec confirm-token; full activation state machine; materialized MFG. New verb name picked deliberately to not collide with intra-cluster `promote` (zero-RPO).

Two crucial reuse points
The new controller is essentially a scheduler around primitives that
already exist.
- `MysqlBackupVerification` powers Phase 2 readiness. The verification reconciler already restores a backup into an ephemeral mysqld and (optionally) replays binlogs to validate the dump. The only new code on that path is a single predicate flip in `internal/controller/backup_verification_reconciler.go` to accept `MysqlBackup` CRs carrying the synthetic annotation and resolve their location from `MysqlBackup.status.location`.
- `initFromBackup` + `pointInTime` powers Phase 3 activation. The Restoring phase synthesizes an `initFromBackup` block pointing at the discovered dump location (+ optional `pointInTime`) and creates the materialized MFG. From there, the normal greenfield bootstrap path runs unchanged — restore Job, sentinel write, replica clone, DNSEndpoint write, `isFreshDeploy` gating. The standby controller's job at that point is purely to wait for the existing `status.restore.phase=Succeeded` and `Ready=True` signals.

This is the design's lever: almost every primitive Phase 2/3 needs already exists. The new CR is a scheduler + audit layer that names the relationship; nearly all the heavy machinery (S3 client, BRV1 header parsing, dump load via `mysqlsh util.loadDump`, binlog replay, `DNSEndpoint`, condition surface, metrics shape) is reused verbatim.

Lessons
A naming collision is a critique-phase finding, not an
implementation-review finding. All three critics independently
surfaced N-1 — `MysqlDRTarget` vs `SiteRoleDROnly`. If the planning
pass had gone first, the name would have shipped, gone to
implementation review, and been renamed at the worst possible time
(after generated DeepCopy code + Helm chart edits + docs are in
flight). The MBOT critique catches naming hazards before anyone
writes a Go file.
Make the only cross-cluster bus explicit in the first diagram.
Diagram 1 puts the shared S3 bucket dead center with its three
subprefixes, and labels the directionality of every arrow. The
trust boundary becomes obvious immediately — and the asymmetry
("source writes, DR reads, except for one tiny sentinel object")
catches the C-1/F-2 critiques in one image. A reviewer who only
reads the diagram still knows the answer to "what runs the shipper
on the DR side?" (nothing).
MBOT critique value comes from picking models with different
failure modes. Opus + GPT + Gemini disagreed on exactly one
thing — the D-1 transport choice — and that disagreement was the
most valuable finding in the entire critique. The decision became
visible (object-store-first, with `Network` reserved as a
forward-compatible enum) rather than buried in a single agent's
default. When models agree on everything, the critique is probably
rubber-stamping; when one dissents on one item, the planner has a
real tradeoff to write down.
Existing primitives drive the CRD shape, not the other way
around. The `template` field, the synthetic-`MysqlBackup` trick,
the confirm-token pattern, the phase enum vocabulary — every one of
these mirrors something already in the codebase
(`MysqlFailoverGroupSpec`, `MysqlBackup` shape, `RestoreInPlaceSpec`,
`PlannedFailoverPhase`). The CR is dense with `mirrors X`/`analog of Y`
citations on purpose: it makes the v1 surface
forward-compatible with the existing operator's discipline, and it
keeps the implementation small because most of the heavy code is
already there.
Phasing lets the docs ship before the code. Phase 0 (runbook
over existing CRDs only) is independently useful: an on-call
engineer at 03:00 can recover into another cluster using only the
existing surface. That floor de-risks every later phase — if Phase 1+
slips a release, users still have a documented recovery path. The
rest of the wishlist line's gaps (no freshness signal, no audit-
grade promote) become enhancements over a working baseline, not
blockers for shipping anything.
Cross-cluster split-brain is a policy decision, not a
technology decision. S-3 was the single hardest call. The plan
resolves it as "accept-loss with audit" + runbook + post-hoc
divergent-GTID detection on rejoin — explicitly, in §1.2 non-goals
and §6.7. Critically, the `transport` discriminator preserves the
option to add a `spec.activate.requireSourceFenceTTL` bucket
sentinel in v2 without a CRD bump. The lesson generalizes: declare
the v1 stance up front (it's documented in non-goals) so design
review doesn't re-litigate it; and leave a forward-compatible knob
for future interlock-mode without committing to it now.
The kubectl-plugin verb name encodes a contract.
`promote` ships a specific guarantee (drain → GTID catch-up →
`transactionsLost=0`). `dr-activate` cannot offer that contract.
Different verb. Operators reading docs or running history-search
immediately know which contract they're invoking — N-3 is a tiny
decision with disproportionate operator-experience leverage.
Megamind's planning loop wins when readiness gates are tight.
This run landed at `READY_TO_START` with zero unresolved
decisions because the recommended-defaults table closed every
open `[D-*]`/`[N-*]`/`[S-*]` item with a concrete pick. A
planning agent applying defaults that don't close every open
finding produces a draft with TODOs; that's where implementation
cycles start spinning. The discipline is: if the critique can't
produce a default, the critique is not done.
Evidence
Claim-to-source table. Diagram and design-decision rows ground in
specific plan sections; runtime/contract claims ground in
current-code paths verified at planning time.
| Claim | Source |
| --- | --- |
| … | `internal/sidecar/binlog_archiver.go:239,537` (`IsReadOnly` gate); `internal/controller/backup_reconciler.go:60` (backup-Job RBAC); `plans/second-draft.md` §1.1, §3.3 step 1 |
| … | `internal/sidecar/binlog_archiver.go:531-537` (read-only check); critique §1 (C-1, F-9) |
| `dr-only` site role is intra-cluster, never auto-promoted | `api/v1alpha1/types.go:280-283`; `docs/docs/multi-site.mdx:14-23`; `docs/docs/known-limitations.mdx:60-63` |
| … `initFromBackup` in another cluster" | `WISHLIST.md:21`; `briefs/context.md` §"What DR today actually looks like"; `api/v1alpha1/backup_types.go:552-616` |
| … | `plans/second-draft.md` §1.1, §3.3 (bullet 1 topology overview), §5.3 (cursor file), §11.1 (IAM asymmetry) |
| … | `plans/second-draft.md` §9 (entire section); enum at §8.3 (`StandbyActivationPhase`); idempotency rules at §9.3-§9.4 |
| … | `plans/second-draft.md` §6 (activation flow), §7 (failback runbook), §12.4 (DNS event copy) |
| `MysqlStandbyCluster` Kind name (D-1/N-1) | `plans/second-draft.md` §4.2; critique §4 (N-1) "MysqlStandbyCluster is the clearest" |
| … `Network` reserved (D-1) | `plans/second-draft.md` §1.2, §4.4, §8.2 (`StandbyTransport` enum); critique §3 (D-1) "object-store-mediated DR is the natural extension" |
| `dr-activate` verb chosen over `promote` (N-3) | `plans/second-draft.md` §6.5; `cmd/kubectl-bloodraven/promote.go:23-46` (`transactionsLost=0` contract); critique §4 (N-3) |
| … | `plans/second-draft.md` §4.3; critique §3 (D-2) "the cluster running the command and the cluster being promoted are the same" |
| … | `plans/second-draft.md` §6.1; `api/v1alpha1/backup_types.go:723-732` (`RestoreInPlaceSpec.Confirm`) |
| … `RestoreInPlacePhase` | `plans/second-draft.md` §8.3 (`StandbyActivationPhase`); `api/v1alpha1/backup_types.go:762-810` |
| … | `plans/second-draft.md` §1.2, §6.7, §15.4, §15.7; critique §5 (S-3); `docs/docs/durability-and-rpo.mdx:94-118` (divergent-GTID detection) |
| … | `plans/second-draft.md` §5.3, §11.1; IAM policy in §11.1 (`DRReadOnly` + `DRCursorWrite` statements scoped to `dr-cursors/*`) |
| `dr-cursors/<name>.json` retention-floor sentinel (F-2 mitigation) | `plans/second-draft.md` §5.3, §15.5; `cmd/bloodraven/main.go:388-410` (`/pitr-cutoff` handler); `internal/sidecar/binlog_archiver.go:350-458` (archive pruning) |
| `Restorable` condition powered by `MysqlBackupVerification` (S-6) | `plans/second-draft.md` §5.2; `api/v1alpha1/mysqlbackupverification_types.go` (existing CRD); reuse via synthetic-MysqlBackup annotation predicate flip at `internal/controller/backup_verification_reconciler.go` |
| … `initFromBackup` + `pointInTime` unchanged | `plans/second-draft.md` §6.2-§6.3, §9 (Restoring phase); `api/v1alpha1/backup_types.go:552-616` (`InitFromBackupSpec` shape); `api/v1alpha1/backup_types.go:191-210` (`PointInTimeSpec`) |
| `template` field declared at CR-create-time, not activation-time | `plans/second-draft.md` §4.4 (last bullet) "user declares site list, DNS hostname, storage class, credentials secret at standby-CR-creation time, not at activation time (otherwise activation becomes a YAML scramble in an incident)" |
| … `BlockOwnerDeletion=false` | `plans/second-draft.md` §4.3, §9.2 (Restoring phase work) — deleting standby after activation does NOT cascade-delete the writable MFG |
| … `Activated`; users delete-and-recreate to re-fire | `plans/second-draft.md` §6.6; `ActivationLocked` event in §10.5 |
| … `MysqlStandbyCluster` on returning original source (F-3) | `plans/second-draft.md` §7.1-§7.5; critique §2 (F-3) |
| … `orders/east/`) | `plans/second-draft.md` §7.2 step 3, §7.4 (future automation note) |
| … `DNSEndpoint` (D-6) | `plans/second-draft.md` §12; `internal/platform/dns.go:23-31`; `api/v1alpha1/types.go:371-384` |
| … | `plans/second-draft.md` §11.2; `docs/docs/backup-encryption.mdx:217-271`; `api/v1alpha1/backup_types.go:343-349` (`BackupDecryptionSpec` reuse) |
| … | `plans/second-draft.md` §11.1 (JSON policy verbatim) |
| … | `briefs/context.md` §"Operator/sidecar facts"; `plans/second-draft.md` §1.2 (non-goal) |
| `bloodraven_dr_*` series mirror `bloodraven_backup_*` shape | `plans/second-draft.md` §10.3; `internal/metrics/metrics.go:114-200,163-170` (existing pattern) |
| … | `plans/second-draft.md` §2.2, §15.1; `WISHLIST.md:17` (#43); critique §5 (S-5) |
| … | `critiques/mbot-critique.md` §"Where the critics disagreed" |
| … | `final/ledger.md`; `plans/second-draft.md` §16 |
| … `v1alpha1`) | `docs/docs/known-limitations.mdx:18-19`; `plans/second-draft.md` §15.12 |
| … | `plans/second-draft.md` §11.4.2; CLAUDE.md "Pre-PR gate" §5; `charts/bloodraven/templates/clusterrole.yaml:48-77` |
| … `MysqlBackup` annotation contract for verification reuse | `plans/second-draft.md` §5.2.1; annotations `dr.bloodraven.shipstream.io/synthetic=true`, `dr.bloodraven.shipstream.io/source-bucket=…` |
| … `max_binlog_size ÷ throughput` + upload latency | `plans/second-draft.md` §1.2, §10.4; `docs/docs/durability-and-rpo.mdx:142-167`; critique §1 (C-3, F-6) |

End of brief. Length target ~400-700 lines; this brief is within that
budget while staying grounded in the artifacts listed in the
"Required reading" section of the prompt.