From 0c00254b330091e16264a989b39e02b1a494734a Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Thu, 4 Jun 2026 16:37:16 -0700 Subject: [PATCH 01/15] docs: add schema migration rfc --- docs/developer/rfc/lazy-schema-migration.md | 316 ++++++++++++++++++++ 1 file changed, 316 insertions(+) create mode 100644 docs/developer/rfc/lazy-schema-migration.md diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md new file mode 100644 index 00000000..3f6c8f74 --- /dev/null +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -0,0 +1,316 @@ +# RFC 002: Lazy Schema Migration for Lock-File Fingerprints + +- **Status**: Draft +- **Author**: @damcilva +- **Created**: 2026-06-04 +- **Related code**: + - [`internal/fingerprint/fingerprint.go`](../../../internal/fingerprint/fingerprint.go) — `ComputeIdentity`, `combineInputs` + - [`internal/lockfile/lockfile.go`](../../../internal/lockfile/lockfile.go) — `ComponentLock`, version gate + - [`internal/projectconfig/fingerprint_test.go`](../../../internal/projectconfig/fingerprint_test.go) — field-inclusion audit + - [`internal/app/azldev/core/components/resolver.go`](../../../internal/app/azldev/core/components/resolver.go) — `computeFreshnessStatus` + +## Background + +### Lock files and fingerprints + +`azldev` tracks the resolved state of each component in a per-component lock file under `locks/.lock`. 
A lock pins the upstream commit and records a content **fingerprint** of every input that affects the component's build output: + +```go +// internal/lockfile/lockfile.go +type ComponentLock struct { + Version int // lock file FORMAT version, currently 1 + ImportCommit string // write-once fork point + UpstreamCommit string // resolved upstream commit + ManualBump int // mass-rebuild counter + InputFingerprint string // sha256 of all render inputs + ResolutionInputHash string // sha256 of upstream-resolution inputs +} +``` + +The fingerprint is computed by [`fingerprint.ComputeIdentity`](../../../internal/fingerprint/fingerprint.go). Its core is a single structural hash of the resolved component config: + +```go +// hashstructure walks every exported field of ComponentConfig. +// Fields tagged `fingerprint:"-"` are excluded; everything else is included. +configHash, err := hashstructure.Hash(component, hashstructure.FormatV2, + &hashstructure.HashOptions{TagName: "fingerprint"}) +``` + +`configHash` is then folded together with the source identity, overlay file hashes, manual bump, and distro release version into a domain-separated SHA256 (`combineInputs`). Field inclusion is policed by [`TestAllFingerprintedFieldsHaveDecision`](../../../internal/projectconfig/fingerprint_test.go): every field of every fingerprinted struct must be consciously categorized as **included** (no tag) or **excluded** (`fingerprint:"-"`). The safe default is *included* — a new field contributes to the hash unless told otherwise. + +Drift is detected in [`resolver.go`](../../../internal/app/azldev/core/components/resolver.go): `computeFreshnessStatus` → `checkFingerprintFreshness` recomputes the identity and compares it to `InputFingerprint`, yielding `FreshnessCurrent` or `FreshnessStale`. `component update` ([`update.go`](../../../internal/app/azldev/cmds/component/update.go)) re-stamps the lock and flips a user-visible `Changed` flag whenever the fingerprint moves. 
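As a rough illustration of the fold-and-compare flow described above, here is a minimal, stdlib-only sketch of a domain-separated SHA-256 combiner and the matching freshness comparison. The names `fingerprintInputs`, `writeField`, `combine`, and `isFresh` are hypothetical, and the field labels and encoding are assumptions made for this sketch; the real `combineInputs` and `checkFingerprintFreshness` may well differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"hash"
	"strconv"
)

// fingerprintInputs is a hypothetical stand-in for the values the RFC says
// are folded into InputFingerprint. The real types live in azldev.
type fingerprintInputs struct {
	ConfigHash     uint64   // structural hash of the resolved config
	SourceIdentity string   // for example, the resolved upstream commit
	OverlayHashes  []string // overlay file hashes, assumed to be in a stable order
	ManualBump     int
	ReleaseVer     string
}

// writeField appends one labeled, length-prefixed field to the hash so that
// neighbouring fields cannot blur into each other ("ab"+"c" vs "a"+"bc").
func writeField(h hash.Hash, label string, value []byte) {
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(label)))
	h.Write(n[:])
	h.Write([]byte(label))
	binary.BigEndian.PutUint64(n[:], uint64(len(value)))
	h.Write(n[:])
	h.Write(value)
}

// combine folds all inputs into a single "sha256:..." string.
func combine(in fingerprintInputs) string {
	h := sha256.New()
	var cfg [8]byte
	binary.BigEndian.PutUint64(cfg[:], in.ConfigHash)
	writeField(h, "config", cfg[:])
	writeField(h, "source", []byte(in.SourceIdentity))
	for _, o := range in.OverlayHashes {
		writeField(h, "overlay", []byte(o))
	}
	writeField(h, "bump", []byte(strconv.Itoa(in.ManualBump)))
	writeField(h, "releasever", []byte(in.ReleaseVer))
	return "sha256:" + hex.EncodeToString(h.Sum(nil))
}

// isFresh mirrors the drift check: recompute, then compare with the stored value.
func isFresh(stored string, in fingerprintInputs) bool {
	return combine(in) == stored
}

func main() {
	in := fingerprintInputs{ConfigHash: 42, SourceIdentity: "abc123", ReleaseVer: "4.0"}
	stored := combine(in)
	fmt.Println("unchanged inputs fresh:", isFresh(stored, in))
	in.ManualBump++
	fmt.Println("after manual bump fresh:", isFresh(stored, in))
}
```

Length-prefixing each labeled field is one common way to get domain separation. It is used here only to show why the combined hash moves whenever any single input moves.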
+ +### The three version axes + +As the tool matures, three *independent* notions of "version" are emerging. Conflating them is the source of the problems in this RFC: + +| Axis | Versions what | Lives where | Exists today? | +| ---- | ------------- | ----------- | ------------- | +| **Config schema version** | on-disk TOML field shape | load / migration layer | No | +| **Fingerprint algorithm version** | how inputs fold into the hash | `fingerprint` combiner | No (implicitly v1) | +| **Lock file format version** | lock file serialization | `lockfile` | Yes (`Version = 1`) | + +### The problem + +Because field inclusion defaults to *included*, **adding any new fingerprinted config field re-hashes every component**, even components that never set the field. `hashstructure` hashes a zero-value field identically to a present-but-empty field — but *differently* from a field that does not exist in the struct at all. So the moment the Go struct gains `Foo string`, every component's `configHash` changes, every `InputFingerprint` changes, and every `*.lock` shows drift on the next `component update`. + +Concretely: we add field `foo` and set `foo = "baz"` on package `bar`. The desired outcome is that **only** `bar.lock` drifts. The actual outcome today is that **all** lock files drift. + +**The root concern is git churn, not rebuilds.** The mass rebuild is a knock-on effect; the thing we actually want to protect is the **lock-file diff in a PR**. A change that touches one package should produce exactly one changed `*.lock` — ideally zero changed bytes in any other lock file, in any way. Lock files should change *only* when there is a real, per-component change. Clean diffs keep PRs reviewable, keep `git blame` meaningful, and make "this lock moved" a trustworthy signal that *that component's* inputs actually changed. The rebuild fan-out follows for free once the diffs are clean. 
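The zero-value-versus-absent behaviour described above can be reproduced with a small stdlib-only stand-in. This sketch does not use `hashstructure`; `shapeHash` is a hypothetical helper that hashes field names and values (and, unlike `hashstructure`, ignores the type name), and `configBefore` / `configAfter` are illustrative types rather than the real `ComponentConfig`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"reflect"
)

// configBefore models a config struct before the new field exists.
type configBefore struct {
	Name string
}

// configAfter models the same struct after gaining a field `Foo`.
type configAfter struct {
	Name string
	Foo  string // newly added, unset on most components
}

// shapeHash hashes every exported field name and value of a struct.
// With omitZero=false a zero-value field still contributes its name, which is
// the include-always behaviour. With omitZero=true such fields are skipped.
func shapeHash(v interface{}, omitZero bool) string {
	rv := reflect.ValueOf(v)
	rt := rv.Type()
	h := sha256.New()
	for i := 0; i < rt.NumField(); i++ {
		f := rv.Field(i)
		if omitZero && f.IsZero() {
			continue
		}
		name := rt.Field(i).Name
		fmt.Fprintf(h, "%d:%s=%q;", len(name), name, fmt.Sprint(f.Interface()))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	before := configBefore{Name: "bar"}
	unset := configAfter{Name: "bar"}            // component that never sets foo
	set := configAfter{Name: "bar", Foo: "baz"} // the one component that does

	fmt.Println("include-always, unset field drifts:",
		shapeHash(before, false) != shapeHash(unset, false))
	fmt.Println("omit-zero, unset field is invisible:",
		shapeHash(before, true) == shapeHash(unset, true))
	fmt.Println("omit-zero, setter still drifts:",
		shapeHash(unset, true) != shapeHash(set, true))
}
```

Under these assumptions, only the component that sets `Foo` would see its hash move once zero-value fields are omitted, which is the outcome the `bar.lock` example asks for.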
+ +There is a harder variant lurking behind the additive case: **non-additive** schema changes — renaming a field, removing one, changing a baked-in default, or fixing a bug in the hashing logic itself. These legitimately change the *meaning* of the config without changing user intent, and we will eventually need to absorb them without forcing every consumer to rebuild. + +### Goals + +- **G1 (primary, non-functional): no spurious lock-file diffs.** Landing a config-schema or hashing change must not rewrite `*.lock` files for components whose effective inputs are unchanged — not even to bump a version field. Soft requirement (strongly preferred, not a hard gate), but it shapes which solutions are acceptable: it rules out any eager "migrate everything" pass. +- **G2: only real changes drift.** A lock changes iff that component's build-effective inputs changed. +- **G3: piecemeal, lazy migration.** Schema/algorithm evolution rolls out per-component, riding along with independent changes, never as a big-bang. +- **G4: additive fields are drift-neutral by construction.** Adding an unset field should be invisible to every existing lock with no author effort beyond declaring intent. +- **G5: correctness backstop preserved.** Never silently under-rebuild: a genuine input change must always drift its lock. 
+ +## Problem inventory + +| # | Problem | Root cause | Severity | +| - | ------- | ---------- | -------- | +| 1 | Adding a config field drifts every lock, even unaffected components | Field inclusion defaults to *included*; zero-value ≠ absent in struct hash | Mass rebuild | +| 2 | No way to land a semantically no-op schema change (rename/move) without drift | Fingerprint hashes raw struct shape, not normalized intent | Mass rebuild | +| 3 | No way to evolve the hashing algorithm (bugfix, input reorder) without drift | `combineInputs` has no version; old and new outputs are incomparable | Mass rebuild + lock churn | +| 4 | No on-disk config schema version | `ConfigFile` has a `$schema` URL but no version field | Blocks managed migration | +| 5 | Migration is all-or-nothing | Freshness check is binary match/no-match against one stored hash | No piecemeal rollout | + +Problems 1–3 share a shape: a change that *should* be invisible to most components is forced to be visible to all of them, because the fingerprint cannot distinguish "input changed" from "encoding changed." Problem 4 is the missing primitive for managed config evolution. Problem 5 is the property we actually want from any solution — **per-component, lazy** migration, where a lock upgrades only when something independently touches it. + +## How fingerprinting works today (detail) + +```text +ComponentConfig ──hashstructure(TagName:"fingerprint")──► configHash (uint64) + │ +SourceIdentity ───────────────────────────────────────────┐ │ +OverlayFileHashes ────────────────────────────────────────┤ │ +ManualBump ───────────────────────────────────────────────┤ ▼ +ReleaseVer ───────────────────────────────────────────► combineInputs ──► "sha256:…" (InputFingerprint) +``` + +Two properties of `hashstructure` v2.0.2 are load-bearing for this RFC: + +1. **No per-field `omitempty`.** The only field tags it recognizes are `-`/`ignore` (skip) and `set`/`string` (encoding). 
A zero-value field is hashed; it is not skipped. +2. **It honors the `Includable` interface.** If the value (or a pointer to it) implements `HashInclude(field string, v interface{}) (bool, error)`, the walker calls it per field and omits the field when it returns `false`. **An omitted field hashes identically to a field that was never declared.** There is also a global `IgnoreZeroValue` option that skips *all* zero-value fields. + +The struct's type name *is* part of the hash (`hashstructure` mixes in `reflect.Type.Name()`), but that name does not change when fields are added, so it is irrelevant to drift. + +One constraint: the top-level value passed to `hashstructure.Hash` is not addressable, so an `Includable` implementation must use a **value receiver** to be seen for the root struct. + +## Change taxonomy + +Not every config change should be treated the same way. The right mechanism depends on what kind of change it is. This taxonomy drives the design. + +| Class | Example | Should unaffected locks drift? 
| Mechanism | +| ----- | ------- | ------------------------------ | --------- | +| **Additive field** | new `foo` field, unset on most components | No — only setters drift | Default omitempty (Layer 1); no version bump | +| **Additive with non-zero default** | new field defaulted to `"auto"` via defaults merge | No | Algorithm version + replay (Layer 2) | +| **Rename / move** | `foo` → `bar`, same semantics | No | Schema migration → canonical hash (Layer 3) + Layer 2 | +| **Semantic change** | meaning of `foo` changes; output differs | Yes — that's correct | None; drift is intended | +| **Hashing bugfix** | overlay ordering bug in `combineInputs` | No | Algorithm version + replay (Layer 2) | +| **Field removal** | drop deprecated `foo` | No, if nobody set it | Migration drops field; Layer 2 for setters | + +The recurring requirement across the "No" rows is the same: **distinguish a change in user intent from a change in encoding, and only drift on the former.** Note the first row: with omitempty as the *default* (Layer 1), additive fields need no version bump and no replay at all — they are hash-neutral by construction. Layer 2 then carries only the genuinely hard cases (rows 2, 5). + +## Research + +### `hashstructure` options + +- **`Includable` (per-field callback)** keeps existing hashes byte-identical: fields that don't opt into omission hash exactly as they do today. This is the only option that solves Problem 1 *without* itself triggering a mass rebuild. +- **`IgnoreZeroValue` (global)** is simpler to wire but flips the hash of *every* struct that has any zero-value field — i.e. it is itself a mass-rebuild event, and it removes our ability to say "this empty field is meaningful." Rejected for the default path. + +### How other tools version lock state + +- **Cargo (`Cargo.lock`)** carries an explicit `version = 4` at the top of the lock and teaches `cargo` to read older versions, upgrading in place on the next write. 
Migration is lazy — touching the lock upgrades it. +- **npm (`package-lock.json`)** uses `lockfileVersion` and supports reading v1/v2/v3, rewriting to the current version on install. +- **Terraform state** stores a `version` and a `terraform_version`; state is upgraded forward on use, never downgraded. +- **Go modules** avoid the problem entirely by hashing *content* (`h1:` dirhashes) rather than a struct shape, so adding metadata fields never perturbs existing sums. + +The common pattern: an **integer version stamped into the persisted artifact**, plus the ability to **read and replay older versions**, plus **lazy forward-migration on write**. Our `ComponentLock.Version` already provides the slot; today we only ever reject mismatches instead of migrating. + +### Where the hashing logic should live + +A natural question (raised during design) is whether to move hashing onto the config types as a method. The hashing logic decomposes into two separable jobs: + +1. **Pure config hash** — `hashstructure.Hash(component, …)` plus field-inclusion policy. This is genuinely *about the config type*; `HashInclude` is already a method on it. +2. **Combiner / orchestration** — reads overlay file contents (needs `opctx.FS`), folds in source identity / releasever / bump, applies domain separation, and (Layer 2) selects an algorithm version. None of these are config fields. + +Moving (1) onto the type improves cohesion and version-locality. Moving (2) onto the type would drag I/O and cross-cutting algorithm versioning into `projectconfig` (a pure data package that `lockfile` imports), and would scatter the centralized field-inclusion audit. The combiner must own algorithm versioning because "I changed how overlays fold in" is not a per-type concern. **Recommendation: a hybrid seam** — expose `ComponentConfig.ConfigHash()` on the type; keep the combiner in `fingerprint`. + +## Proposed approach + +The design is **layered**, not a single switch. 
Each layer is independently shippable and addresses a distinct row of the taxonomy. Layers 1 and 2 cover the immediate need (Problems 1–3); Layer 3 is the forward-looking config-schema-version axis (Problem 4) and can follow later. + +### Layer 1 — Omitempty as the default inclusion policy + +Today the safe default is *include-always*: a new field contributes to the hash even at zero value. We **flip the default to omitempty** (include only when non-zero) and make the inclusion policy an explicit, exhaustive, CI-enforced choice per field. + +Every fingerprinted field must carry one of three `fingerprint` tag values: + +| Tag | Meaning | When to use | +| --- | ------- | ----------- | +| `fingerprint:"omitempty"` | included **only when non-zero** (the new default) | almost all fields | +| `fingerprint:"always"` | included even at zero value | fields whose **zero value is build-meaningful** (e.g. a `bool` that defaults true, where `false` must rebuild) | +| `fingerprint:"-"` | excluded from the hash entirely | paths, publish routing, runtime state | + +There is no untagged state. `TestAllFingerprintedFieldsHaveDecision` is rewritten to assert that **every** field of every fingerprinted struct carries a valid tag value — failing CI on any bare field. This is *simpler* than today's audit: it no longer maintains an `expectedExclusions` registry, it just checks for tag presence and validity. The conscious decision moves to the point of field definition, where the author has the context to judge whether zero is meaningful. + +Implement `Includable` on each fingerprinted struct, delegating to one shared helper: + +```go +// includeFingerprintField reports whether a field participates in the hash. +// "-" fields never reach here (hashstructure skips them first). "always" fields +// are included unconditionally; "omitempty" (the default) is included only when +// the resolved value is non-zero. 
+func includeFingerprintField(t reflect.Type, field string, v reflect.Value) (bool, error) { + sf, ok := t.FieldByName(field) + if !ok { + return true, nil + } + switch sf.Tag.Get("fingerprint") { + case "always": + return true, nil + default: // "omitempty" + return !v.IsZero(), nil + } +} + +// Value receiver: the root struct passed to hashstructure.Hash is not addressable. +func (c ComponentConfig) HashInclude(field string, v interface{}) (bool, error) { + return includeFingerprintField(reflect.TypeOf(c), field, reflect.ValueOf(v)) +} +``` + +**Why flipping the default is safe — fingerprints see the resolved config.** The usual objection to blanket omitempty is the false-negative footgun: a field whose zero is meaningful gets omitted and collides with "unset," so two semantically different configs hash the same and a rebuild is missed. That objection assumes we hash *raw user input*. We do not. `ComputeIdentity` runs on the **resolved, post-merge** config (`*result.config`, after defaults are applied). The omit predicate is therefore "the *resolved value* equals Go-zero," not "the user didn't type it." Consequences: + +- Two configs that both resolve a field to zero build identically → hashing them the same is **correct**, not a collision. +- "Unset" never reaches the hasher — it has already been resolved to its default. If the default is non-zero, the field is non-zero and is included anyway. If the default *is* zero, then unset and explicit-zero resolve identically → same build → same hash → correct. + +So the classic false-negative requires absence ≠ zero-default *at the point of hashing*, and post-merge resolution closes that gap. The load-bearing invariant is **G5's guarantee restated structurally: the fingerprint must see exactly the build-effective resolved config.** That invariant must already hold, or fingerprinting is broken independently of this change. 
The `fingerprint:"always"` escape hatch (plus the mandatory-tag audit) is cheap insurance against the invariant silently drifting later — e.g. if someone applies a default *after* fingerprinting. + +**Result:** additive fields are drift-neutral **by construction** (G4) — an unset field omits identically to a field that never existed, with no version bump and no replay. Only setters drift (G2). The cost is one tag per field (verbose but mechanical) and two genuine edge cases (see below). + +#### Edge cases under default omitempty + +- **Meaningful zero with a non-zero default** (e.g. `int Jobs` defaulting to `4`, where `0` means serial). Post-merge: unset → `4` (included), explicit `0` → `0` (omitted-by-omitempty). These build differently *and* hash differently, so there is no collision — they are consistent. Such fields rarely trigger omission at all because the default keeps them non-zero. Tag them `always` only if a zero value must be distinguishable from a future change of default. +- **nil vs empty slice.** `reflect.Value.IsZero` on a slice is `IsNil`. A missing TOML key → nil → omitted; `key = []` → non-nil empty → included. Default omitempty thus makes nil-vs-empty a hash distinction that include-always collapses. Almost never observable, but it is a real behavioral edge; `always` forces both to hash. + +**Adopting this flip is itself a fingerprint-algorithm change** (every config's hash moves), so it does not land for free — it is absorbed by Layer 2's versioned replay rather than by rewriting locks. See Layer 2. + +### Layer 2 — Versioned fingerprint with lazy replay (algorithm and default changes) + +Stamp the algorithm version into the lock and teach the freshness check to **replay** older versions: + +1. Add `FingerprintVersion int` (`toml:"fingerprint-version,omitempty"`) to `ComponentLock`. Old locks read as `0` = baseline. The lock **format** `Version` stays `1`; this is a *content* version and is fully backward compatible. +2. 
Turn `ComputeIdentity` into a thin dispatcher over a small registry of historical compute functions, keyed by version. Keep the last *N* versions: + + ```go + var fingerprinters = map[int]computeFn{ + 1: computeV1, // current algorithm + 2: computeV2, // e.g. fixes overlay-ordering bug, or absorbs a new default + } + const currentFingerprintVersion = 2 + ``` + +3. In `checkFingerprintFreshness`, compute at the **current** version. On mismatch, if `lock.FingerprintVersion < current`, recompute at the lock's recorded version. If *that* matches the stored hash, the inputs are unchanged and only the algorithm evolved → treat as `FreshnessCurrent` and flag for silent re-stamp. Otherwise → `FreshnessStale`. +4. `component update` always stamps `FingerprintVersion = current` when it writes. Migration is therefore **lazy and per-component**: a lock upgrades only when something independently touches it. + +This resolves Problems 2 (for default changes), 3 (hashing bugfixes), and 5 (piecemeal rollout). It is the same lazy-forward-migration pattern Cargo/npm use, specialized to a content hash. + +#### Churn-avoidance policies (G1) + +The version stamp is itself a potential source of spurious diffs — the exact thing G1 forbids. Two policies keep it invisible until a real change forces a write: + +- **`fingerprint-version` is `omitempty` in TOML.** A baseline (`version 0/absent`) lock that is never otherwise touched never materializes the field, so its bytes stay identical. The field only appears in a lock that was *already* being rewritten for an independent reason. Existing checked-in locks therefore produce **zero diff** on the day this lands. +- **Re-stamp only on a real write; never write to advance the version.** The "silent re-stamp" in step 3 is *piggybacked* onto a write that is already happening — it must never be its own trigger. 
`component update` must keep its existing write-on-change guard: if nothing else changed, the version bump alone does **not** dirty the lock. (Concretely, the equivalent of `if !result.Changed && !resHashChanged { return false, nil }` stays in force; the re-stamp rides the `Changed` path, it does not create one.) + +Together these make migration strictly opportunistic: a lock advances its version the next time its component changes for real, and not one commit sooner. + +#### First concrete use: the Layer 1 switchover + +Flipping the inclusion default to omitempty (Layer 1) moves every config's hash, so it cannot ship as a free additive change — it is **Layer 2's first real customer.** It registers as `computeV2` (omitempty default) alongside `computeV1` (include-always), bumps `currentFingerprintVersion`, and is absorbed by replay: every existing lock recomputes clean at v1, is recognized as unchanged-inputs, and re-stamps to v2 *only when next written* per the churn policy above. No mass regen, no flag day. And because omitempty makes all future additive changes hash-neutral by construction (G4), it permanently **shrinks** the set of changes that need a Layer 2 version event at all — Layer 1 is both the first user of Layer 2 and the thing that reduces Layer 2's future workload. + +### Layer 3 — Config schema version and canonical migration (future) + +This is the on-disk TOML axis. It is **independent** of the fingerprint axis and only needed once we make *non-additive* TOML changes (rename/move/remove fields in the file format itself). + +1. Add an explicit `schema-version` to the config file (distinct from the existing `$schema` URL, which is for editor validation). +2. At **load time**, migrate older config shapes forward into the single latest canonical struct *before* anything hashes them. Fingerprinting stays blissfully unaware of file-format history. +3. 
Pair with the **hybrid seam**: expose `ComponentConfig.ConfigHash()` on the type (pure struct hash + inclusion policy); keep the combiner in `fingerprint`. + +The critical invariant: **migrate old TOML → latest canonical struct, then hash once.** A semantically no-op migration (rename `foo`→`bar`) must produce the *same* canonical struct, hence the same hash, hence no drift — handled by Layer 2's replay only if the *encoding* changed, and by Layer 3's normalization for the *file shape*. Do **not** keep parallel `V1.Hash()`/`V2.Hash()` methods on versioned structs: that couples the lock to a Go type identity instead of a simple integer, and forces two independent code paths to agree on a hash forever. + +### Layer interaction + +```text +TOML on disk ──Layer 3: migrate to canonical struct──► ComponentConfig + │ + Layer 1: HashInclude omits zero fields (default omitempty) + ▼ + Layer 2: ComputeIdentity[version] ──► InputFingerprint + │ + lazy replay + re-stamp on update + ▼ + locks/.lock +``` + +## Design decisions + +### D1 — `Includable` vs `IgnoreZeroValue` + +Both omit zero values; the difference is **control granularity and escape hatches.** + +| | `Includable` per-field (chosen) | `IgnoreZeroValue` global | +| --- | --- | --- | +| Meaningful empties | Preserved via `fingerprint:"always"` | Lost — no opt-out | +| Per-field intent | Explicit, CI-audited | Invisible | +| Wiring | One helper + value-receiver method per struct | One option flag | + +`IgnoreZeroValue` is a blunt global switch with no way to keep a build-meaningful zero. `Includable` gives the same default behavior **plus** the `always` escape hatch and a point-of-definition audit. Both move every hash once on adoption — that cost is absorbed by Layer 2 either way (see the switchover note), so it is not a differentiator. + +### D2 — Mandatory explicit tags, default omitempty + +Every fingerprinted field must carry `fingerprint:"-"`, `"omitempty"`, or `"always"` — there is no untagged state. 
Rationale: + +- The *unsafe* failure direction is the false-negative (a meaningful field omitted → missed rebuild). Defaulting to omitempty tilts toward that direction, so the safety check must be loud, not implicit. +- A mandatory tag forces the "is this field's zero value build-meaningful?" decision **at the point of definition**, where the author has the context — better locality than a far-away exclusions registry. +- It *simplifies* the audit: assert every field has a valid tag value; delete the `expectedExclusions` map entirely. + +Fully implicit (omitempty default, no tags, no audit) was rejected — it removes the only guard against the unsafe direction. `fingerprint:"omitempty"` mirrors Go's own `json:",omitempty"`; `"always"` and `"-"` read unambiguously alongside it. + +### D3 — Content version vs format version in the lock + +Reusing `ComponentLock.Version` for the algorithm would force a format-version bump (and the strict `Parse` gate would reject old locks outright). A separate `FingerprintVersion` keeps the format stable and old locks readable, enabling lazy migration instead of hard rejection. + +### D4 — Method-on-type hashing + +Adopt the **hybrid seam**: pure `ConfigHash()` on the config type, combiner in `fingerprint`. A full move was rejected (layering regression: I/O + crypto + algorithm versioning do not belong on a data type). See [Research](#where-the-hashing-logic-should-live). + +## Alternatives considered + +- **Global `IgnoreZeroValue`** — see D1. Same default behavior but no per-field escape hatch for meaningful zeros and no point-of-definition audit. Rejected. +- **Implicit omitempty (no mandatory tags, no audit)** — see D2. Removes the only guard against the unsafe false-negative direction. Rejected in favor of mandatory 3-way tags. 
+- **Content-hash the rendered config** (Go-modules style) instead of struct-hashing — would sidestep field-shape sensitivity, but we deliberately exclude many fields (`paths`, `publish`, snapshots) from the fingerprint, so a blanket content hash over-captures. Rejected. +- **Parallel versioned structs with per-struct `Hash()`** — couples locks to Go type identity and duplicates hashing logic per version. Rejected in favor of Layer 2's integer-versioned combiner + Layer 3 canonical migration. +- **Bump lock format `Version` and migrate eagerly** — eager migration rewrites every lock at once, the exact mass-churn we are trying to avoid. Rejected in favor of lazy per-component re-stamp. + +## Incremental delivery + +1. **PR A (Layer 1)**: shared `includeFingerprintField` helper + `HashInclude` on `ComponentConfig` and `PackageConfig`; tag every fingerprinted field with one of `-`/`omitempty`/`always`; rewrite the field-decision audit to assert valid-tag presence and drop the `expectedExclusions` registry. **Note:** flipping the default moves every hash, so PR A must land *with or after* PR B's version machinery — it registers as `computeV2`, not as a standalone change. Unit test: an unset `omitempty` field is hash-invisible; setting it drifts; an `always` field drifts even at zero. +2. **PR B (Layer 2)**: `FingerprintVersion` on `ComponentLock`; version-dispatched `ComputeIdentity`; replay + re-stamp in `checkFingerprintFreshness` and `update.go`. Unit test: old-version lock with unchanged inputs → `Current`; changed inputs → `Stale`; re-stamp on update. +3. **PR C (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) — set a new `omitempty` field on a single component and assert only that lock drifts. +4. **PR D (Layer 3, later)**: `schema-version` field, load-time canonical migration, `ComponentConfig.ConfigHash()` seam. Gated on the first real non-additive TOML change. + +Each PR is independently revertible. 
Because the Layer 1 default flip is a hash-moving change, PRs A and B ship together (or B first); the `fingerprint-version` omitempty stamp and churn policies ensure existing locks see zero diff until independently touched. Layer 3 migrates lazily on next write. + +## Open questions + +1. How many historical fingerprint versions should the registry retain before dropping the oldest? (Trade-off: replay coverage vs. dead code.) +2. Should a lazy re-stamp during a *read-only* command (`render`, `build` freshness check) write the lock back, or defer all writes to `component update`? Writing on read is surprising; deferring means freshness checks stay slightly slower until the next update. +3. For Layer 3, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration. +4. Should `omitempty` semantics use `reflect.Value.IsZero()` (Go's notion) or a config-aware notion of "unset" (e.g. nil pointer vs empty string)? Pointers would make "set to empty" expressible but complicate the structs. +5. Do we want a `component update --rehash` escape hatch that force-advances `FingerprintVersion` across the whole project (for when a change *is* intended to be global)? +6. Can the audit go further than tag-presence and *statically* flag fields whose zero value is likely meaningful (e.g. a `bool` defaulting true) and nudge toward `always`? Or is the point-of-definition tag plus code review sufficient? 
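For reference when weighing questions 1 and 2, the Layer 2 replay check can be sketched as below. This is a toy under stated assumptions: `computeV1` / `computeV2` are placeholder algorithms, `checkFreshness` is a hypothetical name for the logic inside `checkFingerprintFreshness`, and the sketch assumes that an absent `FingerprintVersion` (read as `0`) denotes the version-1 baseline.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// lock holds only the two fields the replay logic needs.
type lock struct {
	FingerprintVersion int // 0 when the TOML key is absent
	InputFingerprint   string
}

func digest(s string) string {
	sum := sha256.Sum256([]byte(s))
	return "sha256:" + hex.EncodeToString(sum[:])
}

// Placeholder algorithms: same inputs, different encoding per version.
func computeV1(inputs string) string { return digest("v1|" + inputs) }
func computeV2(inputs string) string { return digest("v2|" + inputs) }

var fingerprinters = map[int]func(string) string{
	1: computeV1,
	2: computeV2,
}

const currentFingerprintVersion = 2

// checkFreshness reports whether the lock is fresh and whether it should be
// re-stamped the next time it is written for an independent reason.
func checkFreshness(l lock, inputs string) (fresh, restamp bool) {
	if fingerprinters[currentFingerprintVersion](inputs) == l.InputFingerprint {
		return true, false
	}
	recorded := l.FingerprintVersion
	if recorded == 0 {
		recorded = 1 // assumption of this sketch: absent means baseline
	}
	if recorded < currentFingerprintVersion {
		if old, ok := fingerprinters[recorded]; ok && old(inputs) == l.InputFingerprint {
			return true, true // inputs unchanged, only the algorithm evolved
		}
	}
	return false, false
}

func main() {
	inputs := "resolved-config-and-sources"
	old := lock{InputFingerprint: computeV1(inputs)} // version key absent

	fresh, restamp := checkFreshness(old, inputs)
	fmt.Println("old lock, same inputs:", fresh, restamp)

	fresh, restamp = checkFreshness(old, inputs+"-changed")
	fmt.Println("old lock, changed inputs:", fresh, restamp)
}
```

The check itself never writes. It only reports that a re-stamp is pending, which keeps the write-on-change guard as the sole trigger for touching a lock file.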
From 78870c3e39bdc3a9575d01a632b66ff6db4cc361 Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Fri, 5 Jun 2026 10:17:49 -0700 Subject: [PATCH 02/15] update --- docs/developer/rfc/lazy-schema-migration.md | 177 ++++++++++++++++---- 1 file changed, 147 insertions(+), 30 deletions(-) diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md index 3f6c8f74..af24f6da 100644 --- a/docs/developer/rfc/lazy-schema-migration.md +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -7,7 +7,9 @@ - [`internal/fingerprint/fingerprint.go`](../../../internal/fingerprint/fingerprint.go) — `ComputeIdentity`, `combineInputs` - [`internal/lockfile/lockfile.go`](../../../internal/lockfile/lockfile.go) — `ComponentLock`, version gate - [`internal/projectconfig/fingerprint_test.go`](../../../internal/projectconfig/fingerprint_test.go) — field-inclusion audit - - [`internal/app/azldev/core/components/resolver.go`](../../../internal/app/azldev/core/components/resolver.go) — `computeFreshnessStatus` + - [`internal/app/azldev/core/components/resolver.go`](../../../internal/app/azldev/core/components/resolver.go) — `computeFreshnessStatus`, `BuildDirtyChange` + - [`internal/app/azldev/cmds/component/update.go`](../../../internal/app/azldev/cmds/component/update.go) — `Changed` decision, re-stamp write + - [`internal/app/azldev/core/sources/synthistory.go`](../../../internal/app/azldev/core/sources/synthistory.go) — `FindFingerprintChanges` (synthetic changelog/release) ## Background @@ -47,7 +49,7 @@ As the tool matures, three *independent* notions of "version" are emerging. Conf | Axis | Versions what | Lives where | Exists today? 
| | ---- | ------------- | ----------- | ------------- | | **Config schema version** | on-disk TOML field shape | load / migration layer | No | -| **Fingerprint algorithm version** | how inputs fold into the hash | `fingerprint` combiner | No (implicitly v1) | +| **Lock content-hash version** | how inputs fold into the lock's stored hashes (`InputFingerprint` *and* `ResolutionInputHash`) | `fingerprint` combiner | No (implicitly v1) | | **Lock file format version** | lock file serialization | `lockfile` | Yes (`Version = 1`) | ### The problem @@ -158,6 +160,8 @@ Every fingerprinted field must carry one of three `fingerprint` tag values: There is no untagged state. `TestAllFingerprintedFieldsHaveDecision` is rewritten to assert that **every** field of every fingerprinted struct carries a valid tag value — failing CI on any bare field. This is *simpler* than today's audit: it no longer maintains an `expectedExclusions` registry, it just checks for tag presence and validity. The conscious decision moves to the point of field definition, where the author has the context to judge whether zero is meaningful. +**`Includable` is resolved per-struct — every fingerprinted struct needs the method.** `hashstructure` looks up `Includable` on each struct it walks (and the whole tree is non-addressable, since the root is passed by value), so a `HashInclude` on `ComponentConfig` alone governs only `ComponentConfig`'s own fields. On any nested struct that lacks its own value-receiver `HashInclude`, the `omitempty`/`always` tags are **decorative** — `hashstructure` natively understands only `-`/`ignore`/`set`/`string`, so the tag passes the CI audit while the field is still hashed at zero, and G4 silently holds only at the top level. 
The audit (`fingerprint_test.go` registers ~10 fingerprinted structs: `ComponentConfig`, `ComponentBuildConfig`, `CheckConfig`, `PackageConfig`, `ComponentOverlay`, `SpecSource`, `DistroReference`, `SourceFileReference`, `ReleaseConfig`, `ComponentRenderConfig`) must therefore **also assert that every registered struct implements `Includable`** — so a new fingerprinted struct cannot ship with inert tags. All registered structs get the one-line delegating method. + Implement `Includable` on each fingerprinted struct, delegating to one shared helper: ```go @@ -165,7 +169,7 @@ Implement `Includable` on each fingerprinted struct, delegating to one shared he // "-" fields never reach here (hashstructure skips them first). "always" fields // are included unconditionally; "omitempty" (the default) is included only when // the resolved value is non-zero. -func includeFingerprintField(t reflect.Type, field string, v reflect.Value) (bool, error) { +func includeFingerprintField(t reflect.Type, field string, val reflect.Value) (bool, error) { sf, ok := t.FieldByName(field) if !ok { return true, nil @@ -174,13 +178,20 @@ func includeFingerprintField(t reflect.Type, field string, v reflect.Value) (boo case "always": return true, nil default: // "omitempty" - return !v.IsZero(), nil + return !val.IsZero(), nil } } // Value receiver: the root struct passed to hashstructure.Hash is not addressable. +// +// CRITICAL: hashstructure calls HashInclude(field, innerV) where innerV is +// ALREADY a reflect.Value (the field's value), boxed into the interface{}. +// So we must TYPE-ASSERT it, not reflect.ValueOf it. reflect.ValueOf(v) would +// describe the reflect.Value struct itself (always non-zero) → !IsZero() always +// true → omitempty silently never fires and Layer 1 no-ops. Verified against +// hashstructure v2.0.2 hashstructure.go:346 (`include.HashInclude(name, innerV)`). 
func (c ComponentConfig) HashInclude(field string, v interface{}) (bool, error) { - return includeFingerprintField(reflect.TypeOf(c), field, reflect.ValueOf(v)) + return includeFingerprintField(reflect.TypeOf(c), field, v.(reflect.Value)) } ``` @@ -196,42 +207,110 @@ So the classic false-negative requires absence ≠ zero-default *at the point of #### Edge cases under default omitempty - **Meaningful zero with a non-zero default** (e.g. `int Jobs` defaulting to `4`, where `0` means serial). Post-merge: unset → `4` (included), explicit `0` → `0` (omitted-by-omitempty). These build differently *and* hash differently, so there is no collision — they are consistent. Such fields rarely trigger omission at all because the default keeps them non-zero. Tag them `always` only if a zero value must be distinguishable from a future change of default. -- **nil vs empty slice.** `reflect.Value.IsZero` on a slice is `IsNil`. A missing TOML key → nil → omitted; `key = []` → non-nil empty → included. Default omitempty thus makes nil-vs-empty a hash distinction that include-always collapses. Almost never observable, but it is a real behavioral edge; `always` forces both to hash. +- **nil vs empty slice.** `reflect.Value.IsZero` on a slice is `IsNil`. A missing TOML key → nil → omitted; `key = []` → non-nil empty → included. Default omitempty thus makes nil-vs-empty a hash distinction that include-always collapses. Almost never observable — but a TOML formatter that strips empty arrays (or any round-trip that maps `[]`→absent) would flip hashes. **Tag rule: for any slice/map field where an explicit-empty value is reachable and build-meaningful, prefer `fingerprint:"always"`** so nil and empty both hash and the distinction can't silently move a fingerprint. **Adopting this flip is itself a fingerprint-algorithm change** (every config's hash moves), so it does not land for free — it is absorbed by Layer 2's versioned replay rather than by rewriting locks. See Layer 2. 
-### Layer 2 — Versioned fingerprint with lazy replay (algorithm and default changes) +### Layer 2 — Versioned lock content with lazy replay (algorithm and default changes) -Stamp the algorithm version into the lock and teach the freshness check to **replay** older versions: +Stamp one **lock content-hash version** into the lock and teach the freshness check to **replay** older versions. The version governs *both* stored hashes (`InputFingerprint` and `ResolutionInputHash`) — they live in one lock, share one write event, and a single integer is the natural fit (see [scope note](#both-hashes-share-one-version) for why one version, not two): -1. Add `FingerprintVersion int` (`toml:"fingerprint-version,omitempty"`) to `ComponentLock`. Old locks read as `0` = baseline. The lock **format** `Version` stays `1`; this is a *content* version and is fully backward compatible. -2. Turn `ComputeIdentity` into a thin dispatcher over a small registry of historical compute functions, keyed by version. Keep the last *N* versions: +1. Add `LockContentVersion int` (`toml:"lock-content-version,omitempty"`) to `ComponentLock`. **An absent field reads as `1`** — the current, pre-RFC algorithms — *not* `0`. (`0` is the Go zero value but no `v0` exists; map the zero to the baseline at read time: `ver := lock.LockContentVersion; if ver == 0 { ver = 1 }`.) The lock **format** `Version` stays `1`; this is a *content* version and is fully backward compatible. +2. Turn the combiner into a thin dispatcher over a small registry of historical algorithms, keyed by version. Each entry pairs the two compute functions; when only one algorithm changes, the other slot **reuses** the prior function (no version-neutral hash moves for the untouched one). Keep versions back to a declared floor (see [Registry floor](#registry-floor-and-forced-migration)): ```go - var fingerprinters = map[int]computeFn{ - 1: computeV1, // current algorithm - 2: computeV2, // e.g. 
fixes overlay-ordering bug, or absorbs a new default + type lockAlgo struct { + fingerprint computeFn // produces InputFingerprint + resolution resolveFn // produces ResolutionInputHash + } + var lockAlgos = map[int]lockAlgo{ + 1: {computeFP1, computeRes1}, // current (pre-RFC) algorithms — the implicit baseline + 2: {computeFP2, computeRes1}, // omitempty default (Layer 1); resolution UNCHANGED → reuse v1 fn } - const currentFingerprintVersion = 2 + const currentLockContentVersion = 2 + const minSupportedLockContentVersion = 1 ``` -3. In `checkFingerprintFreshness`, compute at the **current** version. On mismatch, if `lock.FingerprintVersion < current`, recompute at the lock's recorded version. If *that* matches the stored hash, the inputs are unchanged and only the algorithm evolved → treat as `FreshnessCurrent` and flag for silent re-stamp. Otherwise → `FreshnessStale`. -4. `component update` always stamps `FingerprintVersion = current` when it writes. Migration is therefore **lazy and per-component**: a lock upgrades only when something independently touches it. +3. In `checkFingerprintFreshness`, compute at the **current** version. On mismatch, if the lock's recorded version `< current`, recompute at the lock's recorded version. If *that* matches the stored hash, the inputs are unchanged and only the algorithm evolved → treat as `FreshnessCurrent` and flag for silent re-stamp. Otherwise → `FreshnessStale`. (Phase 1 wires this for the fingerprint hash; the resolution hash reuses `computeRes1` until its algorithm first changes — see scope note.) +4. `component update` stamps `LockContentVersion = current` **only when it is already writing for an independent reason** (see the churn policy below). Migration is therefore **lazy and per-component**: a lock upgrades only when something independently touches it. This resolves Problems 2 (for default changes), 3 (hashing bugfixes), and 5 (piecemeal rollout). 
It is the same lazy-forward-migration pattern Cargo/npm use, specialized to a content hash. +#### Both hashes share one version + +`ComponentLock` carries two persisted content hashes: `InputFingerprint` (render inputs, via `hashstructure` + `Includable`) and `ResolutionInputHash` (upstream-resolution inputs — a flat SHA256 over seven explicit fields in `ComputeResolutionHash`, *not* a struct walk, so the omitempty/`Includable` story does not apply to it). Both have the **same evolution problem**: appending an input or reordering the fold moves every lock's hash → G1 churn. + +We version them with **one shared integer**, not two axes, because: they co-locate in a single lock, they are written in the same `update` pass, and a paired registry lets either evolve independently while the other reuses its prior function. Two separate version fields would double the floor/replay/`--rehash` machinery for an input set (`ResolutionInputHash`) that changes rarely — YAGNI. + +**Phasing.** Naming the field `lock-content-version` *now* is the one expensive-to-reverse decision (it is baked into the on-disk TOML schema the moment Layer 2 ships; renaming a persisted key is itself a migration). The fingerprint replay is wired in the first Layer 2 PR. **Resolution-hash replay is reserved, not yet wired** — the registry slot exists and `computeRes1` is reused, so the day `ComputeResolutionHash` first changes we add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`), with no schema change. Critically, `ResolutionInputHash` does **not** feed the synthetic changelog path, so its churn is a one-line lock rewrite + a wasted re-resolution, never a phantom release (unlike `InputFingerprint`; see [Downstream consumers](#downstream-fingerprint-consumers-blast-radius)). + #### Churn-avoidance policies (G1) The version stamp is itself a potential source of spurious diffs — the exact thing G1 forbids. 
Two policies keep it invisible until a real change forces a write: -- **`fingerprint-version` is `omitempty` in TOML.** A baseline (`version 0/absent`) lock that is never otherwise touched never materializes the field, so its bytes stay identical. The field only appears in a lock that was *already* being rewritten for an independent reason. Existing checked-in locks therefore produce **zero diff** on the day this lands. -- **Re-stamp only on a real write; never write to advance the version.** The "silent re-stamp" in step 3 is *piggybacked* onto a write that is already happening — it must never be its own trigger. `component update` must keep its existing write-on-change guard: if nothing else changed, the version bump alone does **not** dirty the lock. (Concretely, the equivalent of `if !result.Changed && !resHashChanged { return false, nil }` stays in force; the re-stamp rides the `Changed` path, it does not create one.) +- **`lock-content-version` is `omitempty` in TOML.** A baseline (absent / version `1`) lock that is never otherwise touched never materializes the field, so its bytes stay identical. The field only appears in a lock that was *already* being rewritten for an independent reason. Existing checked-in locks therefore produce **zero diff** on the day this lands. +- **The `Changed` decision must replay *before* it compares — this is the subtle seam.** The naive read of the existing guard `if !result.Changed && !resHashChanged { return false, nil }` suggests the re-stamp harmlessly "rides the `Changed` path." **It does not.** In [`update.go`](../../../internal/app/azldev/cmds/component/update.go), `result.Changed` is set to `true` the instant `lock.InputFingerprint != identity.Fingerprint` — and `identity` is computed at the *current* version. That comparison sits **upstream** of the write guard. 
So after the v1→v2 switchover, the current-version hash differs from every stored v1 hash, `Changed` flips for ~every component, and we get exactly the mass auto-release-bump + mass lock rewrite G1 forbids. The fix is mandatory, not incidental: + + ```go + // Replay at the lock's recorded version BEFORE deciding Changed. + lockVer := lock.LockContentVersion + if lockVer == 0 { + lockVer = 1 + } + replayed, err := fingerprint.ComputeIdentityAt(lockVer, *result.config, releaseVer, opts) + if err != nil { + return false, err // a failed replay must not be mistaken for a changed input + } + if lock.InputFingerprint != replayed.Fingerprint { + result.Changed = true // a REAL input change under the lock's own algorithm + } + // else: hashes match under the old algorithm → inputs unchanged, only the + // algorithm moved → NOT Changed. Advance the version only if some other real + // change is already dirtying this lock. + if result.Changed { // re-stamp piggybacks a real write; never its own trigger + lock.InputFingerprint = identity.Fingerprint // current-version hash + lock.LockContentVersion = currentLockContentVersion // hash and version move in lockstep + } + ``` + + The principle: **"changed?" is judged under the lock's own algorithm version; the stored hash is only upgraded to the current version when the lock is already dirty for a real reason.** (When resolution replay is wired, the same replay-before-compare applies to the `resHashChanged` silent-write guard.) Together these make migration strictly opportunistic: a lock advances its version the next time its component changes for real, and not one commit sooner. +#### Registry floor and forced migration + +Lazy migration means an untouched lock can sit at an old version **indefinitely** (G3 by design). That makes "keep the last *N* versions" a **correctness cliff, not a tuning knob**: if pruning drops the compute function a lock still depends on, replay becomes impossible → forced `FreshnessStale` → the mass rebuild/rewrite (and, via the downstream-consumer analysis below, mass changelog churn) the whole design exists to avoid. 
So the floor must be explicit and paired with an escape hatch, decided now: + +- **`minSupportedLockContentVersion`** is a hard floor. A lock below it cannot be replayed and is treated as `Stale`. Dropping a registry entry is therefore a deliberate, breaking, announced act — never incidental cleanup. +- **`component update --rehash`** (Open Q#5, promoted to a requirement) force-advances every lock to the current version in one deliberate pass. This is the *only* sanctioned way to retire an old version: rehash the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. + +**Mixed-toolchain hazard.** `go-toml` silently drops unknown fields, so an *older* azldev binary that rewrites a lock a newer binary had stamped will strip `lock-content-version`, regressing it to the baseline. On the next new-binary run the stored (baseline-replayed) hash won't match the current algorithm → spurious `Changed` + bump. This is the classic down-migration trap. Mitigation is a documented invariant ("all writers of a given `locks/` tree must be ≥ the version that tree was last stamped at"), enforced in CI by pinning the azldev version; a hard guard (refuse to write a lock whose on-disk version exceeds the binary's `currentLockContentVersion`) is a possible belt-and-suspenders. + +#### Replaying across a changed input set — `{a,b,c}` → `{a,b,d}` + +A lock stores **one opaque hash string** plus its `LockContentVersion`; it does *not* store the individual inputs. 
So when the measured set changes — say the fingerprint stops measuring `c` and starts measuring `d` — an existing lock (whose stored hash was computed over `{a,b,c}` at v1) is reconciled the only way an opaque hash allows: **recompute and compare, at the lock's own version.** + +Split the change into its two halves; they are handled independently: + +- **Adding `d`** is the additive case — `d` is tagged `omitempty`, so for any component that doesn't set it the hash is byte-identical (G4). Free. No version bump. +- **Dropping `c`** is what forces the version bump, and it is reconciled by replay: + 1. `computeFP2` (measures `{a,b,d}`) ≠ stored hash → mismatch. + 2. lock version (1) < current (2) → **replay `computeFP1`** (still measures `{a,b,c}`). + 3. v1-replay == stored hash? **Yes** → `a,b,c` unchanged since the lock was written; only the *measurement* evolved → `FreshnessCurrent`, lazy re-stamp. **No** → a real input moved → `Stale`, rebuild. Both correct. + +So the bump is **not breaking**: replay answers "were the *old* inputs unchanged?" without rebuilding. + +**The load-bearing constraint the rest of Layer 2 assumes implicitly:** *a replay function reads the live config struct.* `computeFP1` is Go code in **today's** binary, reading fields off **today's** struct. That is fine when the struct shape is unchanged (the omitempty flip, a combiner bugfix, a changed default — all replay against the same fields). But **physically deleting field `c` from the struct breaks `computeFP1`** — it can no longer read `c`, cannot reproduce the `{a,b,c}` hash, and every lock that set `c` is forced `Stale`. Removal-from-the-struct is therefore the one edit that silently defeats replay. + +The way around it is a **deprecate-then-delete** two-step, both non-breaking: + +1. **Bump to v2 measuring `{a,b,d}` but keep field `c` in the struct**, tagged `fingerprint:"-"` so `computeFP2` ignores it while `computeFP1` can still read it for replay. 
Every old lock replays clean at v1, is recognized as unchanged, lazy re-stamps to v2. Zero forced rebuilds. +2. **Only after the floor passes v1** (`minSupportedLockContentVersion = 2`, ideally after a deliberate `--rehash`) physically delete field `c`. `computeFP1` is already retired, so nothing reads `c` anymore. + +> **Invariant:** a field may be physically removed from the config struct only after *every* registry entry that measured it has been retired below `minSupportedLockContentVersion`. Equivalently: retained replay functions and the struct they read must stay in sync — you cannot delete a field a live version still needs. + +This makes "drop an input" a lazy, per-component migration rather than a fleet-wide rebuild — at the cost of carrying a deprecated field on the struct until its replay function ages out. + #### First concrete use: the Layer 1 switchover -Flipping the inclusion default to omitempty (Layer 1) moves every config's hash, so it cannot ship as a free additive change — it is **Layer 2's first real customer.** It registers as `computeV2` (omitempty default) alongside `computeV1` (include-always), bumps `currentFingerprintVersion`, and is absorbed by replay: every existing lock recomputes clean at v1, is recognized as unchanged-inputs, and re-stamps to v2 *only when next written* per the churn policy above. No mass regen, no flag day. And because omitempty makes all future additive changes hash-neutral by construction (G4), it permanently **shrinks** the set of changes that need a Layer 2 version event at all — Layer 1 is both the first user of Layer 2 and the thing that reduces Layer 2's future workload. 
+Flipping the inclusion default to omitempty (Layer 1) moves every config's hash, so it cannot ship as a free additive change — it is **Layer 2's first real customer.** It registers as the `computeFP2` algorithm (omitempty default) alongside `computeFP1` (include-always), bumps `currentLockContentVersion` to 2, and is absorbed by replay: every existing lock recomputes clean at v1, is recognized as unchanged-inputs, and re-stamps to v2 *only when next written* per the churn policy above. (The resolution slot is unchanged across this bump — v2 reuses `computeRes1`.) No mass regen, no flag day. And because omitempty makes all future additive changes hash-neutral by construction (G4), it permanently **shrinks** the set of changes that need a Layer 2 version event at all — Layer 1 is both the first user of Layer 2 and the thing that reduces Layer 2's future workload. ### Layer 3 — Config schema version and canonical migration (future) @@ -243,6 +322,8 @@ This is the on-disk TOML axis. It is **independent** of the fingerprint axis and The critical invariant: **migrate old TOML → latest canonical struct, then hash once.** A semantically no-op migration (rename `foo`→`bar`) must produce the *same* canonical struct, hence the same hash, hence no drift — handled by Layer 2's replay only if the *encoding* changed, and by Layer 3's normalization for the *file shape*. Do **not** keep parallel `V1.Hash()`/`V2.Hash()` methods on versioned structs: that couples the lock to a Go type identity instead of a simple integer, and forces two independent code paths to agree on a hash forever. +**Caveat — `hashstructure` hashes the struct type name.** It mixes `reflect.Type.Name()` into the hash, so a Layer-3 migration that moves content into a *renamed* Go struct changes the fingerprint even when the content is byte-identical. 
"Rename is drift-neutral" therefore holds only if the canonical struct **keeps the original type name**, or the rename is shipped as a Layer-2 version bump that absorbs it. Prefer keeping the type name; reserve the version bump for when the type genuinely must be renamed. + ### Layer interaction ```text @@ -257,6 +338,39 @@ TOML on disk ──Layer 3: migrate to canonical struct──► ComponentConfig locks/.lock ``` +## Downstream fingerprint consumers (blast radius) + +The versioned-replay story in Layer 2 must hold for **every** reader of `InputFingerprint`, not just the two paths it grew up around. This is the migration blast-radius map; each consumer's behavior under a v1→v2 switchover is stated explicitly. + +| Consumer | Reads | Compares | Migration behavior required | +| -------- | ----- | -------- | --------------------------- | +| `checkFingerprintFreshness` (resolver) | recomputed identity | vs stored hash | Replay at lock version (Layer 2 core) | +| `component update` `Changed` decision | recomputed identity | vs stored hash | **Replay before `Changed`** (see churn policy / M2 seam) | +| `synthistory.FindFingerprintChanges` | stored hash strings across git history | adjacent commits | **No change needed — if migration stays lazy** | +| `synthistory.BuildDirtyChange` | recomputed (current ver) | vs stored `headLock` hash | **Replay at headLock version** before declaring dirty | +| `ResolutionInputHash` staleness/write | recomputed resolution hash | vs stored | **Shares the version; replay reserved, not yet wired** | + +### The synthetic changelog/release path is the real hazard + +[`synthistory.go`](../../../internal/app/azldev/core/sources/synthistory.go) turns fingerprint movement into **user-visible, shipped** package state — `%autochangelog` entries and `%autorelease` increments. There are two distinct comparators, and the design resolves them asymmetrically. 
+ +- **`FindFingerprintChanges` (historical walker)** does a raw, version-blind string compare of `InputFingerprint` across the lock's git history and emits a synthetic changelog/release entry on every change. Making it genuinely version-aware is hard-to-infeasible — it only has committed *strings*, no inputs to replay. **It does not need to be**, *provided migration stays strictly lazy.* Under the churn policy, a version bump only ever rides a commit where a real input also changed, so there is never a version-only commit in history for the walker to misread. The migration folds honestly into that real change's entry. **This is a design decision, not a code fix:** the v1→v2 conversion is an *accepted, per-component, notable* changelog event that piggybacks a real change. + - **Trap:** this only holds while migration is lazy. A fleet-wide `--rehash` (or the M2 bug where `Changed` flips for everyone) converts *phantom* → *honest-but-fleet-wide* — a truthful but fleet-wide release bump, i.e. **G1 is dead.** "Accept as notable" is therefore conditional on **migration never riding a version-only or fleet-wide write** (the `--rehash` floor pass excepted, because it is deliberate and operator-driven). +- **`BuildDirtyChange` (live dirty check)** compares a *recomputed* current-version (v2) hash against the *stored* (possibly v1) `headLock.InputFingerprint` and declares dirty on inequality. "Accept as notable" does **not** save this path: post-switchover an *unchanged* component would read **dirty on every `render`/`build`** until re-stamped — a persistent, recurring spurious signal, worse than a one-time entry. The fix is **free**: it is the *same replay Layer 2 already owes the freshness check* — replay at `headLock`'s recorded version before declaring dirty. One additional call site for logic already being written, no new mechanism. + +**Net:** M1 is not "make the changelog walker version-aware" (hard, maybe infeasible). 
It is two things already on the books — (1) the strict lazy churn policy, so the walker never sees a version-only commit; and (2) extend the freshness replay to `BuildDirtyChange`, one extra call site. + +### `ResolutionInputHash` — shares the version, replay deferred + +`ComponentLock` carries a *second* persisted content hash, `ResolutionInputHash`, with its own staleness logic and its own silent-write path (it writes when only `resHashChanged`, never flipping `Changed`). It has the **identical** evolution problem as `InputFingerprint`: any future change to `ComputeResolutionHash`'s algorithm moves every lock's hash — exactly the mass-churn this RFC exists to prevent. + +The single `lock-content-version` covers it (see [Both hashes share one version](#both-hashes-share-one-version)). What differs is **blast radius**, which is why we wire its replay later, not now: + +- `ResolutionInputHash` does **not** feed `synthistory` — so an algorithm change can never mint a phantom changelog/release (the M1 hazard is fingerprint-only). Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption. +- It is a flat seven-field SHA256, not a struct walk, so the Layer 1 omitempty flip leaves it untouched — it has no pending v1→v2 event. Its registry slot stays `computeRes1` until its inputs genuinely change. + +**Decision:** name the field for the general case now (`lock-content-version`); wire fingerprint replay in Layer 2's first PR; reserve resolution replay (slot present, prior fn reused) and wire it the day `ComputeResolutionHash` first changes — a localized follow-up with no schema change. This fixes the one irreversible thing (the persisted key name) without speculative code (KISS/YAGNI on the second replay). 
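The replay rule that the consumers in this section share (compute at the current version, fall back to the lock's recorded version on mismatch, respect the floor) can be sketched as a single helper. This is a self-contained illustration under stated assumptions, not the RFC's implementation: `CheckFreshness`, `digest`, the `Freshness` type, and the toy `{a,b,c}` / `{a,b,d}` input maps are all hypothetical stand-ins for `computeFP1`, `computeFP2`, and the real config struct.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Freshness mirrors the RFC's two outcomes; the names are illustrative.
type Freshness int

const (
	Current Freshness = iota
	Stale
)

// hashFn stands in for one versioned fingerprint function.
type hashFn func(in map[string]string) string

// digest hashes its parts with a separator so ("ab","c") != ("a","bc").
func digest(parts ...string) string {
	h := sha256.New()
	for _, p := range parts {
		h.Write([]byte(p))
		h.Write([]byte{0})
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

// registry is a toy version table: v1 measures {a,b,c}, v2 measures {a,b,d}.
var registry = map[int]hashFn{
	1: func(in map[string]string) string { return digest("v1", in["a"], in["b"], in["c"]) },
	2: func(in map[string]string) string { return digest("v2", in["a"], in["b"], in["d"]) },
}

const (
	currentVersion = 2
	minSupported   = 1
)

// CheckFreshness returns the freshness status and whether a re-stamp should
// piggyback on the next real write. It never triggers a write itself.
func CheckFreshness(lockVer int, stored string, in map[string]string) (Freshness, bool) {
	if lockVer == 0 { // absent field reads as the baseline
		lockVer = 1
	}
	if registry[currentVersion](in) == stored {
		return Current, false
	}
	if lockVer >= currentVersion || lockVer < minSupported {
		return Stale, false // nothing older to replay, or below the floor
	}
	replay, ok := registry[lockVer]
	if !ok {
		return Stale, false // entry was pruned: replay is impossible
	}
	if replay(in) == stored {
		return Current, true // inputs unchanged, only the algorithm moved
	}
	return Stale, false
}

func main() {
	in := map[string]string{"a": "1", "b": "2", "c": "3"}
	stored := registry[1](in) // a lock written before the version field existed

	st, restamp := CheckFreshness(0, stored, in)
	fmt.Println(st == Current, restamp) // true true

	in["a"] = "changed"
	st, restamp = CheckFreshness(0, stored, in)
	fmt.Println(st == Stale, restamp) // true false
}
```

In this sketch a single function would serve both the freshness check and the dirty check, which is the sense in which the second call site adds no new mechanism.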
+ ## Design decisions ### D1 — `Includable` vs `IgnoreZeroValue` @@ -283,34 +397,37 @@ Fully implicit (omitempty default, no tags, no audit) was rejected — it remove ### D3 — Content version vs format version in the lock -Reusing `ComponentLock.Version` for the algorithm would force a format-version bump (and the strict `Parse` gate would reject old locks outright). A separate `FingerprintVersion` keeps the format stable and old locks readable, enabling lazy migration instead of hard rejection. +Reusing `ComponentLock.Version` for the algorithm would force a format-version bump (and the strict `Parse` gate would reject old locks outright). A separate `LockContentVersion` keeps the format stable and old locks readable, enabling lazy migration instead of hard rejection. It is named for the *general* case — it versions every content hash the lock stores (`InputFingerprint` now, `ResolutionInputHash` when its replay is wired) — because the persisted TOML key is the one thing that is expensive to rename after ship. ### D4 — Method-on-type hashing Adopt the **hybrid seam**: pure `ConfigHash()` on the config type, combiner in `fingerprint`. A full move was rejected (layering regression: I/O + crypto + algorithm versioning do not belong on a data type). See [Research](#where-the-hashing-logic-should-live). +Two constraints keep the seam from eroding back into the rejected methods-on-type design: **`ConfigHash()` must stay version-frozen** (it computes exactly one algorithm; it does *not* dispatch over versions — a single method "can't replay its own past"), and **the combiner is the sole version authority.** Version dispatch lives entirely in `fingerprint`'s registry; `ConfigHash()` is just the current pure-config step it calls. Keep `ConfigHash()` unexported-or-narrow if practical, so callers cannot route around the registry to get a raw, version-agnostic hash. + ## Alternatives considered - **Global `IgnoreZeroValue`** — see D1. 
Same default behavior but no per-field escape hatch for meaningful zeros and no point-of-definition audit. Rejected. - **Implicit omitempty (no mandatory tags, no audit)** — see D2. Removes the only guard against the unsafe false-negative direction. Rejected in favor of mandatory 3-way tags. -- **Content-hash the rendered config** (Go-modules style) instead of struct-hashing — would sidestep field-shape sensitivity, but we deliberately exclude many fields (`paths`, `publish`, snapshots) from the fingerprint, so a blanket content hash over-captures. Rejected. +- **Content-hash the rendered config** (Go-modules style) instead of struct-hashing. The naive version of this — "hash all the bytes" — over-captures, since we deliberately exclude many fields (`paths`, `publish`, snapshots) from the fingerprint. The *stronger* form is a **canonical-projection hash**: serialize only the included fields, keys sorted, and hash those bytes — immune to field-shape drift without per-field reflection tags. We still stay with `hashstructure` + `Includable` because our inclusion policy is **conditional** (omitempty = include-if-non-zero, evaluated on the resolved value), which a static byte serializer would have to re-implement anyway — so the projection hash buys field-shape immunity at the cost of reimplementing the very predicate `Includable` already gives us, plus a second serialization format to keep stable forever. Rejected on that basis, but recorded as the principled alternative; it is the one foundational choice that would be expensive to reverse post-adoption. - **Parallel versioned structs with per-struct `Hash()`** — couples locks to Go type identity and duplicates hashing logic per version. Rejected in favor of Layer 2's integer-versioned combiner + Layer 3 canonical migration. - **Bump lock format `Version` and migrate eagerly** — eager migration rewrites every lock at once, the exact mass-churn we are trying to avoid. Rejected in favor of lazy per-component re-stamp. 
## Incremental delivery -1. **PR A (Layer 1)**: shared `includeFingerprintField` helper + `HashInclude` on `ComponentConfig` and `PackageConfig`; tag every fingerprinted field with one of `-`/`omitempty`/`always`; rewrite the field-decision audit to assert valid-tag presence and drop the `expectedExclusions` registry. **Note:** flipping the default moves every hash, so PR A must land *with or after* PR B's version machinery — it registers as `computeV2`, not as a standalone change. Unit test: an unset `omitempty` field is hash-invisible; setting it drifts; an `always` field drifts even at zero. -2. **PR B (Layer 2)**: `FingerprintVersion` on `ComponentLock`; version-dispatched `ComputeIdentity`; replay + re-stamp in `checkFingerprintFreshness` and `update.go`. Unit test: old-version lock with unchanged inputs → `Current`; changed inputs → `Stale`; re-stamp on update. +1. **PR A (Layer 1)**: shared `includeFingerprintField` helper + a delegating value-receiver `HashInclude` on **every** fingerprinted struct (all ~10 registered in `fingerprint_test.go`, not just `ComponentConfig`/`PackageConfig` — see the per-struct resolution note in Layer 1); tag every fingerprinted field with one of `-`/`omitempty`/`always`; rewrite the field-decision audit to (a) assert valid-tag presence and (b) assert every registered struct implements `Includable`, then drop the `expectedExclusions` registry. **Note:** flipping the default moves every hash, so PR A must land *with or after* PR B's version machinery — it registers as the `computeFP2` algorithm, not a standalone change. Unit tests: a zeroed `omitempty` field hashes **equal to its absence-equivalent** (not merely "setting it drifts" — that positive-direction test passes even if `HashInclude` is a no-op, so it must be paired with the zero-equals-absent assertion that actually exercises omission); an `always` field drifts even at zero. +2. 
**PR B (Layer 2)**: `LockContentVersion` on `ComponentLock` (+ `ComponentLockData` and `populateFromLock`, so the replay site can read the version); a paired version registry (fingerprint + resolution compute fns) with a `minSupportedLockContentVersion` floor; fingerprint replay-before-`Changed` in `update.go`; fingerprint replay in `checkFingerprintFreshness` **and `BuildDirtyChange`** (same replay logic, two call sites). Resolution-hash replay is *reserved* — the registry slot reuses `computeRes1`; not wired until `ComputeResolutionHash` first changes. Unit tests: old-version lock with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write. 3. **PR C (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) — set a new `omitempty` field on a single component and assert only that lock drifts. 4. **PR D (Layer 3, later)**: `schema-version` field, load-time canonical migration, `ComponentConfig.ConfigHash()` seam. Gated on the first real non-additive TOML change. -Each PR is independently revertible. Because the Layer 1 default flip is a hash-moving change, PRs A and B ship together (or B first); the `fingerprint-version` omitempty stamp and churn policies ensure existing locks see zero diff until independently touched. Layer 3 migrates lazily on next write. +Each PR is independently revertible. Because the Layer 1 default flip is a hash-moving change, PRs A and B ship together (or B first); the `lock-content-version` omitempty stamp and churn policies ensure existing locks see zero diff until independently touched. Layer 3 migrates lazily on next write. ## Open questions -1. How many historical fingerprint versions should the registry retain before dropping the oldest? (Trade-off: replay coverage vs. dead code.) -2. Should a lazy re-stamp during a *read-only* command (`render`, `build` freshness check) write the lock back, or defer all writes to `component update`? 
Writing on read is surprising; deferring means freshness checks stay slightly slower until the next update. -3. For Layer 3, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration. -4. Should `omitempty` semantics use `reflect.Value.IsZero()` (Go's notion) or a config-aware notion of "unset" (e.g. nil pointer vs empty string)? Pointers would make "set to empty" expressible but complicate the structs. -5. Do we want a `component update --rehash` escape hatch that force-advances `FingerprintVersion` across the whole project (for when a change *is* intended to be global)? -6. Can the audit go further than tag-presence and *statically* flag fields whose zero value is likely meaningful (e.g. a `bool` defaulting true) and nudge toward `always`? Or is the point-of-definition tag plus code review sufficient? +1. Should a lazy re-stamp during a *read-only* command (`render`, `build` freshness check) write the lock back, or defer all writes to `component update`? Writing on read is surprising; deferring means freshness checks stay slightly slower until the next update. (Leaning: defer all writes to `update`, keeping reads side-effect-free.) +2. For Layer 3, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration. +3. Should `omitempty` semantics use `reflect.Value.IsZero()` (Go's notion) or a config-aware notion of "unset" (e.g. nil pointer vs empty string)? Pointers would make "set to empty" expressible but complicate the structs. +4. Can the audit go further than tag-presence and *statically* flag fields whose zero value is likely meaningful (e.g. a `bool` defaulting true) and nudge toward `always`? Or is the point-of-definition tag plus code review sufficient? +5. 
Should the mixed-toolchain hazard get a hard write-time guard (refuse to write a lock whose on-disk version exceeds the binary's `currentLockContentVersion`), or is the CI version-pin invariant enough? + +*Resolved in-text (recorded here so they aren't re-litigated):* registry retention is a **floor**, not "last N" (M8 / Registry floor); `--rehash` is the sanctioned forced-migration pass (promoted from a question to a requirement); absent `LockContentVersion` reads as `1`; one shared `lock-content-version` covers both stored hashes, with resolution-hash replay reserved (slot present, fn reused) until `ComputeResolutionHash` first changes. From 3ed59c0f9c4eef3c657f2c6e7ae4b81e915be0d0 Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Fri, 5 Jun 2026 16:08:35 -0700 Subject: [PATCH 03/15] update 2 --- docs/developer/rfc/lazy-schema-migration.md | 480 ++++++++++++-------- 1 file changed, 288 insertions(+), 192 deletions(-) diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md index af24f6da..8e9a56c6 100644 --- a/docs/developer/rfc/lazy-schema-migration.md +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -1,15 +1,17 @@ -# RFC 002: Lazy Schema Migration for Lock-File Fingerprints +# RFC 002: Lock-File Fingerprint Reset and Lazy Schema Migration - **Status**: Draft - **Author**: @damcilva - **Created**: 2026-06-04 - **Related code**: - - [`internal/fingerprint/fingerprint.go`](../../../internal/fingerprint/fingerprint.go) — `ComputeIdentity`, `combineInputs` - - [`internal/lockfile/lockfile.go`](../../../internal/lockfile/lockfile.go) — `ComponentLock`, version gate + - [`internal/fingerprint/fingerprint.go`](../../../internal/fingerprint/fingerprint.go) — `ComputeIdentity`, `ComputeResolutionHash`, `combineInputs` + - [`internal/lockfile/lockfile.go`](../../../internal/lockfile/lockfile.go) — `ComponentLock`, `Parse` format-version gate - 
[`internal/projectconfig/fingerprint_test.go`](../../../internal/projectconfig/fingerprint_test.go) — field-inclusion audit - - [`internal/app/azldev/core/components/resolver.go`](../../../internal/app/azldev/core/components/resolver.go) — `computeFreshnessStatus`, `BuildDirtyChange` + - [`internal/app/azldev/core/components/resolver.go`](../../../internal/app/azldev/core/components/resolver.go) — `computeFreshnessStatus`, `checkFingerprintFreshness` - [`internal/app/azldev/cmds/component/update.go`](../../../internal/app/azldev/cmds/component/update.go) — `Changed` decision, re-stamp write - - [`internal/app/azldev/core/sources/synthistory.go`](../../../internal/app/azldev/core/sources/synthistory.go) — `FindFingerprintChanges` (synthetic changelog/release) + - [`internal/app/azldev/cmds/component/changed.go`](../../../internal/app/azldev/cmds/component/changed.go) — `classifyComponent`, `haveMatchingFingerprints` (CI classification) + - [`internal/app/azldev/core/sources/synthistory.go`](../../../internal/app/azldev/core/sources/synthistory.go) — `FindFingerprintChanges`, `BuildDirtyChange` (synthetic changelog/release) + - [`internal/app/azldev/core/sources/sourceprep.go`](../../../internal/app/azldev/core/sources/sourceprep.go) — `computeCurrentFingerprint` ## Background @@ -46,11 +48,11 @@ Drift is detected in [`resolver.go`](../../../internal/app/azldev/core/component As the tool matures, three *independent* notions of "version" are emerging. Conflating them is the source of the problems in this RFC: -| Axis | Versions what | Lives where | Exists today? 
| -| ---- | ------------- | ----------- | ------------- | -| **Config schema version** | on-disk TOML field shape | load / migration layer | No | -| **Lock content-hash version** | how inputs fold into the lock's stored hashes (`InputFingerprint` *and* `ResolutionInputHash`) | `fingerprint` combiner | No (implicitly v1) | -| **Lock file format version** | lock file serialization | `lockfile` | Yes (`Version = 1`) | +| Axis | Versions what | Lives where | Exists today? | Forced-migration verb | +| ---- | ------------- | ----------- | ------------- | --------------------- | +| **Config schema version** | on-disk TOML field shape | load / migration layer | No | `config migrate` (future) | +| **Lock content-hash version** | how inputs fold into the lock's stored hashes (`InputFingerprint` *and* `ResolutionInputHash`) | `fingerprint` combiner | No (implicitly v1) | `component migrate` | +| **Lock file format version** | lock file serialization | `lockfile` | Yes (`Version = 1`) | — (frozen at `1`) | ### The problem @@ -62,13 +64,33 @@ Concretely: we add field `foo` and set `foo = "baz"` on package `bar`. The desir There is a harder variant lurking behind the additive case: **non-additive** schema changes — renaming a field, removing one, changing a baked-in default, or fixing a bug in the hashing logic itself. These legitimately change the *meaning* of the config without changing user intent, and we will eventually need to absorb them without forcing every consumer to rebuild. +### The substrate problem: replay only works if old algorithms stay frozen + +The natural fix for non-additive change is **versioned replay**: stamp an algorithm version into the lock, keep the old algorithm around, and when a lock is behind, recompute with *its* algorithm to ask "were the inputs actually unchanged, or did only the encoding move?" If unchanged, accept the lock without a rebuild. 
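
The replay idea described above can be sketched as a small registry of per-version compute functions. This is a minimal illustration, not the RFC's implementation: `Inputs`, `Lock`, `registry`, `hashParts`, and `checkFreshness` are hypothetical names, the integer `ContentVersion` field stands in for however the version ends up being stored, and the two algorithms differ only in fold order to mimic an encoding-only change.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Inputs is a hypothetical stand-in for a component's build-effective inputs.
type Inputs struct{ Name, Release string }

// Lock is a hypothetical stand-in for the persisted lock state.
type Lock struct {
	ContentVersion int    // algorithm version that produced Fingerprint
	Fingerprint    string // stored digest
}

// hashParts folds length-prefixed parts into one SHA-256 digest.
func hashParts(parts ...string) string {
	h := sha256.New()
	for _, p := range parts {
		fmt.Fprintf(h, "%d:%s;", len(p), p)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// registry maps a content version to the algorithm that produced it.
// Version 2 only reorders the fold, so user intent is unchanged.
var registry = map[int]func(Inputs) string{
	1: func(in Inputs) string { return hashParts("v1", in.Release, in.Name) },
	2: func(in Inputs) string { return hashParts("v2", in.Name, in.Release) },
}

// checkFreshness replays the lock's own algorithm version and compares the
// result with the stored digest. It reports true when the inputs are unchanged.
func checkFreshness(lock Lock, in Inputs) (bool, error) {
	compute, ok := registry[lock.ContentVersion]
	if !ok {
		return false, fmt.Errorf("unsupported content version %d", lock.ContentVersion)
	}
	return compute(in) == lock.Fingerprint, nil
}

func main() {
	in := Inputs{Name: "bar", Release: "1"}
	old := Lock{ContentVersion: 1, Fingerprint: registry[1](in)}

	fresh, err := checkFreshness(old, in)
	fmt.Println(fresh, err)

	fresh, err = checkFreshness(old, Inputs{Name: "bar", Release: "2"})
	fmt.Println(fresh, err)
}
```

The sketch only holds if `registry[1]` keeps returning what it returned when the lock was written. That is the freezing property the rest of this section examines.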
+ +Replay only works if an old algorithm function can faithfully reproduce the hash it produced when the lock was written. **On the current `hashstructure` substrate, it cannot** — a "frozen" algorithm function is not actually frozen: + +- Its body is `hashstructure.Hash(component, …)`, which **reflects over the live Go struct**. Add a field later and the old function now sees that field (at zero value, included) → its output moves → it can no longer reproduce the historical hash. So *adding* a field breaks *replay of older versions*, which is exactly the additive case we are trying to make free. +- It also resolves the live **method set**: once `ComponentConfig` implements `Includable`, the same `hashstructure.Hash` call silently switches inclusion behavior, with no per-call opt-out (the interface is resolved automatically). + +The consequence is sharp: an incremental "flip the default to omitempty, lazily migrate" plan **cannot keep its central promise.** "Additive fields are drift-neutral by construction" holds only for locks already at the new version; for the older locks that lazy migration deliberately leaves alone, the next field addition forces a hash change anyway. You do not avoid the mass rebuild — you defer it to the first field addition, and you build the whole replay apparatus on a substrate that makes replay unsound. + +### The opportunity: a coordinated cutover is already scheduled + +The project has a **dev→prod environment cutover** coming that forces a full rebuild regardless. This is a *coordinated cutover* — a one-time, distro-wide switch with no mixed-version window, the sanctioned moment to make changes that cannot be made lazily. That changes the calculus completely. The entire "lazy" framing exists to *avoid* a mass update; if exactly one sanctioned mass update is already on the calendar, the strategy inverts: + +> **Lazy migration is for the cheap and additive. 
The one free rebuild is a budget — spend it exclusively on the one-way doors that are cheap now and a coordinated-cutover-only change later.** + +This RFC therefore has two parts: **(1)** a one-time **reset** at the dev→prod cutover that replaces the hashing substrate with one whose old algorithms are *genuinely* frozen, and **(2)** a **post-reset lazy migration** mechanism (versioned registry + replay) that rides that clean substrate for the rare genuine algorithm change thereafter. Part 2 is what the original "lazy" design was reaching for; part 1 is what makes it sound. + ### Goals -- **G1 (primary, non-functional): no spurious lock-file diffs.** Landing a config-schema or hashing change must not rewrite `*.lock` files for components whose effective inputs are unchanged — not even to bump a version field. Soft requirement (strongly preferred, not a hard gate), but it shapes which solutions are acceptable: it rules out any eager "migrate everything" pass. -- **G2: only real changes drift.** A lock changes iff that component's build-effective inputs changed. -- **G3: piecemeal, lazy migration.** Schema/algorithm evolution rolls out per-component, riding along with independent changes, never as a big-bang. -- **G4: additive fields are drift-neutral by construction.** Adding an unset field should be invisible to every existing lock with no author effort beyond declaring intent. -- **G5: correctness backstop preserved.** Never silently under-rebuild: a genuine input change must always drift its lock. +- **G1 (primary, non-functional): no spurious lock-file diffs *after the reset*.** Once prod locks exist, landing a config-schema or hashing change must not rewrite `*.lock` files for components whose effective inputs are unchanged. The reset itself is the *one* sanctioned exception, absorbed by the already-scheduled rebuild. +- **G2: only real changes drift.** Post-reset, a lock changes iff that component's build-effective inputs changed. 
+- **G3: piecemeal, lazy migration post-reset.** Genuine algorithm evolution after the reset rolls out per-component, riding independent changes, never as a big-bang. +- **G4: additive fields are drift-neutral by construction — *truly*, not just for new locks.** On the projection substrate (below) an unset additive field is invisible to *every* lock including old ones, because old algorithms pin an explicit field list and never reflect over the live struct. +- **G5: correctness backstop preserved.** Never silently under-rebuild: a genuine input change must always drift its lock. Replay may accept encoding/over-capture changes; it must never mask a behavior-changing one. +- **G6 (new, hard): back-compatible reads for synthetic history.** The new binary must still **read** pre-reset locks across git history (synthetic changelog/release walks them), even though it **writes** only the new format. Reading never recomputes a historical hash — it compares stored strings only. ## Problem inventory @@ -79,8 +101,9 @@ There is a harder variant lurking behind the additive case: **non-additive** sch | 3 | No way to evolve the hashing algorithm (bugfix, input reorder) without drift | `combineInputs` has no version; old and new outputs are incomparable | Mass rebuild + lock churn | | 4 | No on-disk config schema version | `ConfigFile` has a `$schema` URL but no version field | Blocks managed migration | | 5 | Migration is all-or-nothing | Freshness check is binary match/no-match against one stored hash | No piecemeal rollout | +| 6 | Versioned replay is unsound on the current substrate | "Frozen" algorithm = `hashstructure.Hash` over the **live** struct/method-set; adding a field moves the old function's output | Replay cannot reproduce historical hashes | -Problems 1–3 share a shape: a change that *should* be invisible to most components is forced to be visible to all of them, because the fingerprint cannot distinguish "input changed" from "encoding changed." 
Problem 4 is the missing primitive for managed config evolution. Problem 5 is the property we actually want from any solution — **per-component, lazy** migration, where a lock upgrades only when something independently touches it. +Problems 1–5 share a shape: a change that *should* be invisible to most components is forced to be visible to all of them, because the fingerprint cannot distinguish "input changed" from "encoding changed." Problem 4 is the missing primitive for managed config evolution. Problem 5 is the property we want from any post-reset solution — **per-component, lazy** migration. Problem 6 is the one that kills the *incremental* path outright: the very mechanism that would make problems 1–3 free (versioned replay) is unsound while the substrate reflects the live struct. Fixing 6 is what the reset buys. ## How fingerprinting works today (detail) @@ -98,9 +121,15 @@ Two properties of `hashstructure` v2.0.2 are load-bearing for this RFC: 1. **No per-field `omitempty`.** The only field tags it recognizes are `-`/`ignore` (skip) and `set`/`string` (encoding). A zero-value field is hashed; it is not skipped. 2. **It honors the `Includable` interface.** If the value (or a pointer to it) implements `HashInclude(field string, v interface{}) (bool, error)`, the walker calls it per field and omits the field when it returns `false`. **An omitted field hashes identically to a field that was never declared.** There is also a global `IgnoreZeroValue` option that skips *all* zero-value fields. -The struct's type name *is* part of the hash (`hashstructure` mixes in `reflect.Type.Name()`), but that name does not change when fields are added, so it is irrelevant to drift. +The struct's type name *is* part of the hash (`hashstructure` mixes in `reflect.Type.Name()`), so a rename of the Go type moves every hash even when content is byte-identical. 
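
To make the live-struct hazard concrete, here is a toy reflective hasher. It is deliberately not `hashstructure`'s real algorithm: `reflectiveHash`, `configAtV1`, and `configLater` are hypothetical, and the two struct types simulate the same config type before and after a field is added. The type name is left out of the hash so that only the field set differs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"reflect"
)

// configAtV1 is the struct shape when the "v1" locks were written.
type configAtV1 struct {
	Name string
}

// configLater is the same config after a field was added; Foo is left unset.
type configLater struct {
	Name string
	Foo  string
}

// reflectiveHash walks whatever exported fields the value has right now and
// includes each one, zero or not. It accepts struct values only.
func reflectiveHash(v interface{}) string {
	rv := reflect.ValueOf(v)
	rt := rv.Type()
	h := sha256.New()
	for i := 0; i < rt.NumField(); i++ {
		fmt.Fprintf(h, "%s=%q;", rt.Field(i).Name, rv.Field(i).Interface())
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	before := reflectiveHash(configAtV1{Name: "bar"})
	after := reflectiveHash(configLater{Name: "bar"})
	// Same user intent, different digest: the added zero-value field was hashed.
	fmt.Println(before == after)
}
```

Any function whose body is a reflective walk inherits this behavior, whatever version label it carries.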
+ +**Why this substrate cannot host frozen replay.** Every property above is resolved *at hash time against the live program*, not against a pinned description of the v1 encoding: -One constraint: the top-level value passed to `hashstructure.Hash` is not addressable, so an `Includable` implementation must use a **value receiver** to be seen for the root struct. +- The set of fields walked is whatever the struct has *now* — add a field, and last year's `computeFP1` (whose body is still just `hashstructure.Hash(component)`) now includes it. +- Whether `Includable` is consulted depends on whether the type implements it *now* — not on what was true when v1 locks were written. +- A `value` vs `pointer` receiver subtlety even decides whether the root struct's `HashInclude` is seen at all (the top-level value is not addressable). + +A function meant to be "the v1 algorithm, forever" therefore changes meaning every time the struct or its method set changes. That is the disqualifier for the incremental plan (Problem 6) and the motivation for the projection substrate below, whose v1 function pins an explicit field list and is immune to all three. ## Change taxonomy @@ -108,21 +137,28 @@ Not every config change should be treated the same way. The right mechanism depe | Class | Example | Should unaffected locks drift? 
| Mechanism | | ----- | ------- | ------------------------------ | --------- | -| **Additive field** | new `foo` field, unset on most components | No — only setters drift | Default omitempty (Layer 1); no version bump | -| **Additive with non-zero default** | new field defaulted to `"auto"` via defaults merge | No | Algorithm version + replay (Layer 2) | -| **Rename / move** | `foo` → `bar`, same semantics | No | Schema migration → canonical hash (Layer 3) + Layer 2 | -| **Semantic change** | meaning of `foo` changes; output differs | Yes — that's correct | None; drift is intended | -| **Hashing bugfix** | overlay ordering bug in `combineInputs` | No | Algorithm version + replay (Layer 2) | -| **Field removal** | drop deprecated `foo` | No, if nobody set it | Migration drops field; Layer 2 for setters | +| **Additive field** | new `foo` field, unset on most components | No — only setters drift | **Free, no bump.** Add `foo` to the current `projectVN` as omit-if-zero; a component that leaves it unset emits identical bytes, so no shipped hash moves. Setters drift (correct). | +| **Additive with non-zero default** | new field defaulted to `"auto"` via defaults merge | No | **Bump + replay.** The default resolves non-zero on *every* component, so it is emitted everywhere and would move every hash — omit-if-zero can't save it. Ship `projectV(N+1)` that emits it; old locks **replay at their version** (which didn't emit it), match their stored digest → recognized unchanged → lazy re-stamp, no rebuild. | +| **Rename / move** | `foo` → `bar`, same semantics | No | **Schema migration + bump + replay.** Migrate old TOML → canonical struct (the rename lands in the struct), then ship `projectV(N+1)` that emits the renamed field. Old locks replay at their version and are recognized unchanged → lazy re-stamp, no rebuild. 
| +| **Semantic change** | meaning of `foo` changes; output differs | Yes — that's correct | **None.** The build output genuinely differs, so the lock *should* drift. Replay at the old version would (correctly) mismatch → `Stale` → rebuild. Nothing to suppress. | +| **Hashing bugfix** | overlay ordering bug in the combiner | No | **Bump + replay.** Ship the fixed combiner as the version-`N+1` half of `computeFP(N+1)`; old locks replay at the old (buggy) version. If their inputs are unchanged the buggy digest still matches → recognized unchanged → lazy re-stamp to the fixed version, no rebuild. | +| **Newly measured input** | start folding in a new overlay source or identity element | No | **Bump + replay.** A non-config input is added in the combiner half of `computeFP(N+1)` (a config field would go in `projectV(N+1)`). Old locks replay at their version, which didn't fold it in, match their stored digest → recognized unchanged → lazy re-stamp, no rebuild. **Caveat:** until a lock migrates, replay is *blind* to the new input, so a change to it reads as fresh (false-fresh) — if it is build-critical, force a `component migrate` pass instead of riding lazy adoption (see [churn-avoidance](#churn-avoidance-policies-g1)). | +| **Field removal** | drop deprecated `foo` | No, if nobody set it | **Deprecate-then-delete (+ bump for setters).** Bump to a `projectV(N+1)` that stops emitting `foo` but **keep the field on the struct** so the old `projectVN` can still read it for replay. Only after the floor passes that version (ideally after a `component migrate`) physically delete the field. Setters drift on the bump; non-setters replay clean. 
| + +The recurring requirement across the "No" rows is the same: **distinguish a change in user intent from a change in encoding, and only drift on the former.** Note the first row: on the projection substrate, a new field is added to `projectVN` as *omit-if-zero*, so a component that does not set it emits identical bytes and stays hash-neutral — *for every lock, old or new*, because old configs never set the brand-new field. Adding it does not move any existing hash (no shipped lock set it), so it needs no version bump. Part 2 then carries only the genuinely hard cases (rows 2, 5, and post-reset renames/removals). The shared move in every "Bump + replay" row is the same primitive — **increment the content version, keep the old `projectVN` as a frozen replay function, and let unchanged locks re-stamp lazily** — detailed in [Part 2](#part-2--post-reset-lazy-migration). -The recurring requirement across the "No" rows is the same: **distinguish a change in user intent from a change in encoding, and only drift on the former.** Note the first row: with omitempty as the *default* (Layer 1), additive fields need no version bump and no replay at all — they are hash-neutral by construction. Layer 2 then carries only the genuinely hard cases (rows 2, 5). +> **`projectVN`** is shorthand used throughout this RFC for the hand-written *projection function* introduced by this design (defined in [Substrate options](#substrate-options) and [The projection substrate](#the-projection-substrate)). The `N` is the lock content version: `projectV1` is the function that names and serializes exactly the fields content-version 1 measures, `projectV2` the next algorithm, and so on. Each `projectVN` is frozen once shipped — that is the whole point. ## Research -### `hashstructure` options +### Substrate options -- **`Includable` (per-field callback)** keeps existing hashes byte-identical: fields that don't opt into omission hash exactly as they do today. 
This is the only option that solves Problem 1 *without* itself triggering a mass rebuild. -- **`IgnoreZeroValue` (global)** is simpler to wire but flips the hash of *every* struct that has any zero-value field — i.e. it is itself a mass-rebuild event, and it removes our ability to say "this empty field is meaningful." Rejected for the default path. +Two substrates can produce a content fingerprint of the resolved config. The difference that matters here is **whether an old algorithm function can be frozen.** + +- **`hashstructure` + `Includable` (rejected as the substrate).** Keeps existing hashes byte-identical and gives per-field omission via `HashInclude`. But, as established above (Problem 6), a function built on `hashstructure.Hash` reflects over the live struct and method set, so it cannot be a frozen historical algorithm. It also requires a value-receiver `HashInclude` on *every* nested fingerprinted struct and a subtle `v.(reflect.Value)` type-assert to work at all — brittle plumbing in service of a substrate that still can't host sound replay. +- **Canonical projection + stdlib hash (chosen).** Split the two jobs `hashstructure` fuses — *field selection* and *hashing* — into explicit steps. A `projectVN` function names the exact fields version N measures, emits them in a canonical, sorted, self-delimiting byte form, and an stdlib `sha256` hashes those bytes. Because `projectV1` references an **explicit, pinned field list**, it does not see fields added later, does not depend on the type's method set, and does not depend on receiver subtleties. It is a genuinely frozen pure function — the property replay requires. The cost is owning a small projection encoder plus **golden hash vectors** per version (a checked-in `(config, version) → hash` table) so "frozen" is CI-enforced, not merely intended. + +The projection substrate is what makes G4 true for old locks and what makes Part 2's replay sound. It is adopted at the reset (below), not incrementally. 
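
A minimal sketch of the canonical-projection idea, under stated assumptions: `Config`, `field`, `encode`, `projectV1`, `projectV1PlusFoo`, and `configHash` are illustrative names, the values are plain strings, and the length-prefix layout is one possible self-delimiting encoding rather than the one the RFC will ship.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"sort"
)

// Config is a hypothetical stand-in for the resolved component config.
type Config struct {
	Name    string
	Release string
	Foo     string // added after projectV1 was written
}

// field is one named value in a projection.
type field struct {
	key, value string
	always     bool // emit even when the value is zero
}

// appendLP appends a 4-byte big-endian length followed by the string bytes.
func appendLP(dst []byte, s string) []byte {
	var n [4]byte
	binary.BigEndian.PutUint32(n[:], uint32(len(s)))
	dst = append(dst, n[:]...)
	return append(dst, s...)
}

// encode emits fields sorted by key, skipping zero values unless always is set.
func encode(fields []field) []byte {
	sort.Slice(fields, func(i, j int) bool { return fields[i].key < fields[j].key })
	var out []byte
	for _, f := range fields {
		if f.value == "" && !f.always {
			continue
		}
		out = appendLP(out, f.key)
		out = appendLP(out, f.value)
	}
	return out
}

// projectV1 names an explicit, pinned field list. It never looks at Foo.
func projectV1(c Config) []byte {
	return encode([]field{
		{key: "name", value: c.Name, always: true},
		{key: "release", value: c.Release},
	})
}

// projectV1PlusFoo shows the additive case: foo is listed as omit-if-zero.
func projectV1PlusFoo(c Config) []byte {
	return encode([]field{
		{key: "name", value: c.Name, always: true},
		{key: "release", value: c.Release},
		{key: "foo", value: c.Foo},
	})
}

// configHash hashes projection bytes with stdlib SHA-256.
func configHash(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	base := Config{Name: "bar", Release: "1"}
	fmt.Println(configHash(projectV1(base)) == configHash(projectV1PlusFoo(base)))
}
```

Because the field list is written out by hand, adding a field to the struct changes nothing until a projection names it, and a field named as omit-if-zero leaves every config that does not set it with identical bytes.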
### How other tools version lock state @@ -131,207 +167,259 @@ The recurring requirement across the "No" rows is the same: **distinguish a chan - **Terraform state** stores a `version` and a `terraform_version`; state is upgraded forward on use, never downgraded. - **Go modules** avoid the problem entirely by hashing *content* (`h1:` dirhashes) rather than a struct shape, so adding metadata fields never perturbs existing sums. -The common pattern: an **integer version stamped into the persisted artifact**, plus the ability to **read and replay older versions**, plus **lazy forward-migration on write**. Our `ComponentLock.Version` already provides the slot; today we only ever reject mismatches instead of migrating. +The common pattern: an **integer version stamped into the persisted artifact**, plus the ability to **read and replay older versions**, plus **lazy forward-migration on write**. We keep `ComponentLock.Version` (the lock *format* slot) fixed at `1` and carry the *content* version **inside the `InputFingerprint` token** (`v:sha256:…`) rather than in a separate struct field — one atomic value, no version/digest desync, no new TOML field for an old binary to mishandle. The Go-modules lesson is the deepest one: hashing *content* rather than struct shape is what makes additive metadata free — the canonical-projection substrate is our version of that lesson. ### Where the hashing logic should live -A natural question (raised during design) is whether to move hashing onto the config types as a method. The hashing logic decomposes into two separable jobs: +With the projection substrate the fingerprint algorithm decomposes into two steps. **Both are versioned together** by the single lock content version — the version pins the *entire* fingerprint computation, not just the field list: + +1. **Projection** — `projectVN(config)` names and serializes the config fields version N measures. 
This is *about the config type*, but it is data extraction, not hashing: it returns canonical **bytes**, not a hash. +2. **Combiner / orchestration** — reads overlay file contents (needs `opctx.FS`), folds in source identity / releasever / bump, applies domain separation, and runs `sha256` over the projection bytes plus those non-config inputs. None of these are config fields, but the combiner equally decides *what is measured*: starting to fold in a new overlay source, adding an identity input, or reordering the fold all change the digest exactly as a projection change does. -1. **Pure config hash** — `hashstructure.Hash(component, …)` plus field-inclusion policy. This is genuinely *about the config type*; `HashInclude` is already a method on it. -2. **Combiner / orchestration** — reads overlay file contents (needs `opctx.FS`), folds in source identity / releasever / bump, applies domain separation, and (Layer 2) selects an algorithm version. None of these are config fields. +So the per-version compute function in the registry is the **whole algorithm** — `computeFPN` = `projectVN` + the combiner step frozen at version N. "Watching another field" splits cleanly: if it is a *config* field, it goes in `projectV(N+1)`; if it is a *non-config* input (a new overlay source, a new identity element), it goes in the combiner half of `computeFP(N+1)`. Either way it is a content-version bump absorbed by replay, never a silent hash move. The combiner is the **sole version authority**: it owns the registry and the dispatch, and `projectVN` is just the frozen config-extraction step it calls. -Moving (1) onto the type improves cohesion and version-locality. Moving (2) onto the type would drag I/O and cross-cutting algorithm versioning into `projectconfig` (a pure data package that `lockfile` imports), and would scatter the centralized field-inclusion audit. The combiner must own algorithm versioning because "I changed how overlays fold in" is not a per-type concern. 
**Recommendation: a hybrid seam** — expose `ComponentConfig.ConfigHash()` on the type; keep the combiner in `fingerprint`. +Expose the projection on (or beside) the config type and keep the combiner in `fingerprint`. **Do not** expose a `ConfigHash()` method on the type: a method that returns a finished hash both drags a hashing concern onto a data type *and* tempts callers to route around the version registry to get a raw, version-agnostic hash. Returning bytes from `projectVN` keeps the type ignorant of versioning and crypto. ## Proposed approach -The design is **layered**, not a single switch. Each layer is independently shippable and addresses a distinct row of the taxonomy. Layers 1 and 2 cover the immediate need (Problems 1–3); Layer 3 is the forward-looking config-schema-version axis (Problem 4) and can follow later. +The design has **two parts** with very different cost profiles: -### Layer 1 — Omitempty as the default inclusion policy +1. **Part 1 — the reset (one coordinated cutover).** At the dev→prod cutover, swap the hashing substrate to canonical projection, declare the post-cutover projection as content-version **v1**, and spend the already-scheduled rebuild on every change that is *cheap now and a one-way door later* (the irreversible changes). Pre-reset locks already committed to **git history** stay readable and are never recomputed (the back-compat invariant below); a pre-reset lock in the **working tree** is force-rehashed to the `v1:` token on its first post-reset `update`. +2. **Part 2 — post-reset lazy migration (below).** A versioned registry + replay, now riding the *frozen* projection functions, absorbs the rare genuine algorithm change after the cutover, lazily and per-component, with no second coordinated cutover. -Today the safe default is *include-always*: a new field contributes to the hash even at zero value. 
We **flip the default to omitempty** (include only when non-zero) and make the inclusion policy an explicit, exhaustive, CI-enforced choice per field. +The original "lazy" instinct was right for Part 2 and wrong for Part 1: there is no way to make a substrate swap or a batch of one-way-door normalizations free, so they must ride the one rebuild we are already paying for. Everything that *can* be lazy (additive fields) is pushed into Part 2 and costs nothing. -Every fingerprinted field must carry one of three `fingerprint` tag values: +## Part 1 — The reset -| Tag | Meaning | When to use | -| --- | ------- | ----------- | -| `fingerprint:"omitempty"` | included **only when non-zero** (the new default) | almost all fields | -| `fingerprint:"always"` | included even at zero value | fields whose **zero value is build-meaningful** (e.g. a `bool` that defaults true, where `false` must rebuild) | -| `fingerprint:"-"` | excluded from the hash entirely | paths, publish routing, runtime state | +### The projection substrate -There is no untagged state. `TestAllFingerprintedFieldsHaveDecision` is rewritten to assert that **every** field of every fingerprinted struct carries a valid tag value — failing CI on any bare field. This is *simpler* than today's audit: it no longer maintains an `expectedExclusions` registry, it just checks for tag presence and validity. The conscious decision moves to the point of field definition, where the author has the context to judge whether zero is meaningful. +Replace `hashstructure.Hash(component, …)` with an explicit two-step pipeline: -**`Includable` is resolved per-struct — every fingerprinted struct needs the method.** `hashstructure` looks up `Includable` on each struct it walks (and the whole tree is non-addressable, since the root is passed by value), so a `HashInclude` on `ComponentConfig` alone governs only `ComponentConfig`'s own fields. 
On any nested struct that lacks its own value-receiver `HashInclude`, the `omitempty`/`always` tags are **decorative** — `hashstructure` natively understands only `-`/`ignore`/`set`/`string`, so the tag passes the CI audit while the field is still hashed at zero, and G4 silently holds only at the top level. The audit (`fingerprint_test.go` registers ~10 fingerprinted structs: `ComponentConfig`, `ComponentBuildConfig`, `CheckConfig`, `PackageConfig`, `ComponentOverlay`, `SpecSource`, `DistroReference`, `SourceFileReference`, `ReleaseConfig`, `ComponentRenderConfig`) must therefore **also assert that every registered struct implements `Includable`** — so a new fingerprinted struct cannot ship with inert tags. All registered structs get the one-line delegating method. +```text +ComponentConfig ──projectV1(cfg)──► canonical bytes ──sha256──► configHash + (explicit field list, (stdlib) + sorted keys, emit-if-nonzero) +``` -Implement `Includable` on each fingerprinted struct, delegating to one shared helper: +`projectV1` is hand-written and names exactly the fields v1 measures. It emits a canonical, sorted, self-delimiting byte stream (length-prefixed keys + values) so distinct field sets cannot collide, and it omits a field when its **resolved value is zero** (the omitempty behavior, now a property of the encoder, not a struct tag). A field whose zero value is build-meaningful is simply listed as *always-emit* in `projectV1`. -```go -// includeFingerprintField reports whether a field participates in the hash. -// "-" fields never reach here (hashstructure skips them first). "always" fields -// are included unconditionally; "omitempty" (the default) is included only when -// the resolved value is non-zero. 
-func includeFingerprintField(t reflect.Type, field string, val reflect.Value) (bool, error) { - sf, ok := t.FieldByName(field) - if !ok { - return true, nil - } - switch sf.Tag.Get("fingerprint") { - case "always": - return true, nil - default: // "omitempty" - return !val.IsZero(), nil - } -} +Three things this buys that `hashstructure` could not: + +- **Frozen by construction.** `projectV1` references a pinned field list, so adding `Foo` to the struct later is invisible to it — `projectV1`'s output for an old config is unchanged. This is what makes Part 2's replay sound (Problem 6) and G4 true for *old* locks, not just new ones. +- **No method-set / receiver magic.** No `Includable`, no per-nested-struct method, no `v.(reflect.Value)` type-assert footgun. Field selection is ordinary code. +- **Golden-vector enforced.** A checked-in table of `(config, version) → hash` vectors is asserted in CI, so any accidental change to a historical `projectVN` fails the build. "Frozen" stops being a promise and becomes a test. + +The cost is owning the projection encoder and the golden vectors. That cost is paid once, at the reset, against a rebuild we are already doing. + +### Baseline v1 — omit-if-zero, no include-always legacy + +Because the reset rebuilds everything, there is **no pre-existing population to stay byte-compatible with.** That removes the single biggest constraint of the incremental plan: we do **not** need an `include-always` compatibility mode to preserve today's hashes. `projectV1` is the omit-if-zero projection from day one. There is no `computeFP1 = legacy include-always` entry to carry forever — the registry's floor *starts* at the clean projection. -// Value receiver: the root struct passed to hashstructure.Hash is not addressable. -// -// CRITICAL: hashstructure calls HashInclude(field, innerV) where innerV is -// ALREADY a reflect.Value (the field's value), boxed into the interface{}. -// So we must TYPE-ASSERT it, not reflect.ValueOf it. 
reflect.ValueOf(v) would -// describe the reflect.Value struct itself (always non-zero) → !IsZero() always -// true → omitempty silently never fires and Layer 1 no-ops. Verified against -// hashstructure v2.0.2 hashstructure.go:346 (`include.HashInclude(name, innerV)`). -func (c ComponentConfig) HashInclude(field string, v interface{}) (bool, error) { - return includeFingerprintField(reflect.TypeOf(c), field, v.(reflect.Value)) +```go +// projectV1 emits the canonical byte form of the fields v1 measures. +// Field selection is explicit code, not reflection — this is what freezes it. +// emit() length-prefixes key+value so distinct field sets cannot collide; +// it skips a field when the resolved value is zero (the omit-if-zero default). +func projectV1(c *ComponentConfig) []byte { + var b canonicalBuf + b.emit("upstream", c.Upstream) // omit-if-zero + b.emit("patches", c.Patches) // omit-if-zero (nil and [] both → absent) + b.emitAlways("strip_debug", c.StripDebug) // always: zero (false) is build-meaningful + // … one line per measured field, in a fixed order … + return b.Bytes() } ``` -**Why flipping the default is safe — fingerprints see the resolved config.** The usual objection to blanket omitempty is the false-negative footgun: a field whose zero is meaningful gets omitted and collides with "unset," so two semantically different configs hash the same and a rebuild is missed. That objection assumes we hash *raw user input*. We do not. `ComputeIdentity` runs on the **resolved, post-merge** config (`*result.config`, after defaults are applied). The omit predicate is therefore "the *resolved value* equals Go-zero," not "the user didn't type it." Consequences: +**Why omit-if-zero is safe — fingerprints see the resolved config.** The usual objection to blanket omit-if-zero is the false-negative footgun: a field whose zero is meaningful gets omitted and collides with "unset," so two semantically different configs hash the same and a rebuild is missed. 
That objection assumes we hash *raw user input*. We do not. `ComputeIdentity` runs on the **resolved, post-merge** config (`*result.config`, after defaults are applied). The omit predicate is therefore "the *resolved value* equals Go-zero," not "the user didn't type it." Consequences: - Two configs that both resolve a field to zero build identically → hashing them the same is **correct**, not a collision. -- "Unset" never reaches the hasher — it has already been resolved to its default. If the default is non-zero, the field is non-zero and is included anyway. If the default *is* zero, then unset and explicit-zero resolve identically → same build → same hash → correct. +- "Unset" never reaches the hasher — it has already been resolved to its default. If the default is non-zero, the field is non-zero and is emitted anyway. If the default *is* zero, then unset and explicit-zero resolve identically → same build → same hash → correct. + +So the classic false-negative requires absence ≠ zero-default *at the point of hashing*, and post-merge resolution closes that gap. The load-bearing invariant is **G5's guarantee restated structurally: the fingerprint must see exactly the build-effective resolved config.** That invariant must already hold, or fingerprinting is broken independently of this change. `emitAlways` is the escape hatch for the rare field whose zero value is build-meaningful. + +**Result:** additive fields are drift-neutral **by construction** (G4) — a newly added field, listed omit-if-zero in `projectVN`, emits nothing for any component that does not set it, so it is invisible to every lock that leaves it unset, old or new. Adding it moves no existing hash (no shipped lock could have set a field that did not yet exist), so it needs no version bump. Only setters drift (G2). + +#### Edge cases under omit-if-zero + +- **Meaningful zero with a non-zero default** (e.g. `int Jobs` defaulting to `4`, where `0` means serial). 
Post-merge: unset → `4` (emitted), explicit `0` → omitted. These build differently *and* hash differently, so there is no collision — they are consistent. Use `emitAlways` only if a zero value must be distinguishable from a future change of default. +- **nil vs empty slice.** A missing TOML key → nil → omitted; `key = []` → non-nil empty → emitted. For any slice/map field where an explicit-empty value is reachable and build-meaningful, use `emitAlways` so nil and empty both hash. + +### The reset load-out — what to spend the free rebuild on + +The reset rebuild is a budget. Spend it on the irreversible / cutover-only changes; **do not** spend it on anything Part 2 can do lazily for free. Priority order: -So the classic false-negative requires absence ≠ zero-default *at the point of hashing*, and post-merge resolution closes that gap. The load-bearing invariant is **G5's guarantee restated structurally: the fingerprint must see exactly the build-effective resolved config.** That invariant must already hold, or fingerprinting is broken independently of this change. The `fingerprint:"always"` escape hatch (plus the mandatory-tag audit) is cheap insurance against the invariant silently drifting later — e.g. if someone applies a default *after* fingerprinting. +1. **Switch the substrate to canonical projection.** Foundational, one-way, enables everything else. (Above.) +2. **Establish `projectV1` as omit-if-zero with no include-always legacy.** The compatibility mode never enters the registry, so it never has to age out. +3. **Keep the lock *format* `Version` at `1` — the content-version token carries the reset.** The reset adds **no new TOML field** (the atomic token in item 4 reuses `InputFingerprint`) and touches **no** pinning field (`upstream-commit`, `import-commit`, `manual-bump`), so an old binary still parses a reset lock and reads everything it needs to *queue a build*. 
The substrate swap rides entirely on the content-version machinery (Part 2): pre-reset locks carry a legacy (prefix-less) token below the registry floor, and the reset is simply the **first forced upgrade** of the fleet to the `v1:` token. This also makes the one real mixed-toolchain risk self-correcting: if an old binary ever rewrites a reset lock with its legacy-substrate hash, the next new-binary run sees a sub-floor token and **force-rehashes** it back to `v1` — a clean forced upgrade, never silent corruption (next subsection). +4. **Adopt an atomic, self-describing `v1:sha256:…` token** for the stored hash, so the version and the digest can never desync (closes the re-stamp/desync class of bug where the version field and the hash field are written independently). +5. **Unify on `sha256` everywhere**, retiring the `uint64`→decimal-string wart from the `hashstructure` era. One hash format, one encoding. +6. **Do every pending rename / default-normalization now.** Renaming a field, moving content between structs, or changing a baked-in default is a one-way door under Part 2 (it needs a version bump + replay); at the reset it is free because everything rebuilds anyway. This is where the schema-axis "hardest cases" get absorbed cheaply. -**Result:** additive fields are drift-neutral **by construction** (G4) — an unset field omits identically to a field that never existed, with no version bump and no replay. Only setters drift (G2). The cost is one tag per field (verbose but mechanical) and two genuine edge cases (see below). +**Anti-goal:** do *not* burn reset budget on additive fields — Part 2 handles those for free, forever. The single success criterion for the load-out is that **no second coordinated cutover is ever needed**: after the reset, every future change must be expressible as either a free additive field or a lazy Part 2 version bump. 
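The projection substrate from the load-out's first item can be illustrated with a small encoder. This is a minimal sketch, not the RFC's implementation: `canonicalBuf`, `writeChunk`, the string-only `emit` signature, the 8-byte length prefix, and the `Later` field are all assumptions made for illustration, and `config` is a simplified stand-in for the resolved `ComponentConfig`.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"strconv"
)

// canonicalBuf is a hypothetical stand-in for the encoder the RFC names.
type canonicalBuf struct{ buf []byte }

// writeChunk appends an 8-byte big-endian length followed by the bytes of s,
// so chunk boundaries are unambiguous.
func (b *canonicalBuf) writeChunk(s string) {
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(s)))
	b.buf = append(b.buf, n[:]...)
	b.buf = append(b.buf, s...)
}

// emitAlways writes the key and value even when the value is empty.
func (b *canonicalBuf) emitAlways(key, value string) {
	b.writeChunk(key)
	b.writeChunk(value)
}

// emit writes the key and value only when the value is non-empty (omit-if-zero).
func (b *canonicalBuf) emit(key, value string) {
	if value != "" {
		b.emitAlways(key, value)
	}
}

// config is a simplified stand-in for the resolved ComponentConfig.
type config struct {
	Upstream   string
	StripDebug bool
	Later      string // added after v1 was frozen; projectV1 never names it
}

// projectV1 lists exactly the fields v1 measures, in a fixed order.
func projectV1(c config) []byte {
	var b canonicalBuf
	b.emit("upstream", c.Upstream)
	b.emitAlways("strip_debug", strconv.FormatBool(c.StripDebug))
	return b.buf
}

// configHash runs sha256 over the projection bytes.
func configHash(c config) string {
	sum := sha256.Sum256(projectV1(c))
	return hex.EncodeToString(sum[:])
}

func main() {
	base := config{Upstream: "abc123"}
	withLater := config{Upstream: "abc123", Later: "set"}
	// The unlisted field is invisible to projectV1, so both hashes agree.
	fmt.Println(configHash(base) == configHash(withLater))
}
```

Because field selection is ordinary code, a later struct field cannot reach the v1 hash unless someone edits `projectV1`, and a golden-vector test would catch that edit.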
-#### Edge cases under default omitempty +### The lock changes at the reset — atomic token + forced upgrade -- **Meaningful zero with a non-zero default** (e.g. `int Jobs` defaulting to `4`, where `0` means serial). Post-merge: unset → `4` (included), explicit `0` → `0` (omitted-by-omitempty). These build differently *and* hash differently, so there is no collision — they are consistent. Such fields rarely trigger omission at all because the default keeps them non-zero. Tag them `always` only if a zero value must be distinguishable from a future change of default. -- **nil vs empty slice.** `reflect.Value.IsZero` on a slice is `IsNil`. A missing TOML key → nil → omitted; `key = []` → non-nil empty → included. Default omitempty thus makes nil-vs-empty a hash distinction that include-always collapses. Almost never observable — but a TOML formatter that strips empty arrays (or any round-trip that maps `[]`→absent) would flip hashes. **Tag rule: for any slice/map field where an explicit-empty value is reachable and build-meaningful, prefer `fingerprint:"always"`** so nil and empty both hash and the distinction can't silently move a fingerprint. +The stored hash becomes a single self-describing token: -**Adopting this flip is itself a fingerprint-algorithm change** (every config's hash moves), so it does not land for free — it is absorbed by Layer 2's versioned replay rather than by rewriting locks. See Layer 2. +```text +input-fingerprint = "v1:sha256:9f86d0…" # <version>:<algorithm>:<digest> +``` + +One field carries both the content version and the digest, so they cannot be written out of step (a class of desync bug the prior split-field design was exposed to). Parsing splits on `:`; an absent prefix on a pre-reset lock reads as the legacy format. + +The lock **format** `Version` stays at `1`.
The on-disk *schema* is unchanged — same fields, same TOML shape — so an old binary still parses a reset lock and reads its pins (`upstream-commit`, `import-commit`, `manual-bump`), which is all it needs to queue a build. What changes is the *value* of `InputFingerprint`: the substrate swap is expressed purely as a content-version step, and the reset is the **first forced upgrade** to the `v1:` token. The existing singleton `Parse` gate (`Version == 1`) is left untouched; all substrate/version reconciliation routes through the content-version registry instead of a format gate. + +Recovery from a sub-`v1` token is the **same mechanism** as the reset itself: a token with no `v:` prefix (or a version below `minSupportedLockContentVersion`) cannot be replayed, so it is treated as `Stale` and **force-rehashed** to the current version on the next `update`. One code path unifies three cases: + +- **Pre-reset locks** carry a legacy decimal hash with no prefix → force-rehashed to `v1` at the reset. +- **An old binary that rewrites a reset lock** stamps its legacy-substrate hash (no prefix) → the next new-binary run force-rehashes it back to `v1`. The mischief is self-correcting, never silent corruption. +- **A future floor raise** (after a deliberate `component migrate`) retires an old `v` the same way. + +This is the one place back-compatibility is load-bearing, and it is satisfied without a format bump: old binaries read pins and build; the fingerprint value reconciles by version. See the next section for why reading *historical* locks never needs to recompute their hash at all. 
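The single force-rehash code path described above can be sketched as a token parser plus a floor check. This is an illustration under stated assumptions: the token shape `v<N>:sha256:<hex>` is inferred from the `v1:sha256:…` example, and `parseToken` and `needsForceRehash` are hypothetical names. How a token *above* the binary's current version should be treated is not specified in the text, so the sketch leaves that case out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Floor below which a token cannot be replayed (name taken from the RFC).
const minSupportedLockContentVersion = 1

// parseToken splits "v<N>:sha256:<hex>" into its version and digest.
// ok is false for a prefix-less (legacy) or otherwise malformed token.
func parseToken(token string) (version int, digest string, ok bool) {
	parts := strings.SplitN(token, ":", 3)
	if len(parts) != 3 || parts[1] != "sha256" || parts[2] == "" {
		return 0, "", false
	}
	if !strings.HasPrefix(parts[0], "v") {
		return 0, "", false
	}
	n, err := strconv.Atoi(parts[0][1:])
	if err != nil || n < 1 {
		return 0, "", false
	}
	return n, parts[2], true
}

// needsForceRehash reports whether a stored token is legacy or sub-floor,
// in which case it is treated as stale and rewritten on the next update.
func needsForceRehash(token string) bool {
	version, _, ok := parseToken(token)
	return !ok || version < minSupportedLockContentVersion
}

func main() {
	tokens := []string{
		"v1:sha256:9f86d0",    // post-reset token
		"1234567890123456789", // legacy decimal hash, no prefix
		"v0:sha256:9f86d0",    // below the floor
	}
	for _, token := range tokens {
		fmt.Printf("%q force-rehash=%v\n", token, needsForceRehash(token))
	}
}
```

Pre-reset locks, locks rewritten by an old binary, and locks stranded by a future floor raise all take the same `needsForceRehash` branch, which is the unification the section describes.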
+ +### Back-compat invariant — synthetic history reads stored strings, never recomputes + +The reset is only safe because of a property of the codebase verified against the source: **nothing that reads a *historical* lock ever recomputes a fingerprint for it.** Every historical reader compares the *stored* hash strings; the only code that recomputes a fingerprint does so for the **current working tree against HEAD**, never against an arbitrary past commit. Concretely: + +| Reader | What it does with a historical lock | Recomputes? | +| ------ | ----------------------------------- | ----------- | +| `synthistory.FindFingerprintChanges` | walks `lockfile.ShowAtCommit`→`Parse`, compares `InputFingerprint` *strings* between adjacent commits | No | +| `synthistory.BuildDirtyChange` | compares the precomputed current fingerprint to HEAD's stored string | No (HEAD only) | +| `sourceprep.computeCurrentFingerprint` | the *only* `ComputeIdentity` call on this surface — computes for the **current tree**, compares to HEAD's stored hash | Current tree only | + +The consequence: **swapping the substrate is invisible to synthetic history.** A pre-reset (legacy-token) lock and a post-reset `v1:` lock are just two different opaque strings at two different commits; the walker reports "changed" across the reset commit (correct — it *is* a notable, deliberate, fleet-wide event, the coordinated cutover) and never tries to recompute either side. Applying historic overlays likewise reads stored lock fields and needs no hash recomputation. + +> **Invariant (must hold forever):** synthetic history and historic-overlay application operate on **stored lock fields only.** No reader recomputes a fingerprint for a historical commit. This is precisely what lets a frozen `projectVN` be *forward-only*: it never has to reproduce a hash from a different substrate generation, only hashes the lock that the *current* binary writes. 
A future change that recomputes a historical fingerprint would break this and must be rejected in review. + +This invariant — no reader recomputes a historical fingerprint — is the complete back-compatibility story: **new-reads-old by string, never-recompute-old by algorithm.** The lock *format* never bumps, so old and new binaries parse every lock identically; only the *interpretation* of the fingerprint value evolves, and that rides the content-version registry. + +## Part 2 — Post-reset lazy migration + +The reset gives us a clean, frozen substrate. Part 2 is the machinery that rides it for the rare genuine algorithm change *after* the cutover — lazily, per-component, with no second coordinated cutover. This is the original "lazy" design, now sound because `projectVN` is genuinely frozen. -### Layer 2 — Versioned lock content with lazy replay (algorithm and default changes) +### Versioned lock content with lazy replay (algorithm changes) -Stamp one **lock content-hash version** into the lock and teach the freshness check to **replay** older versions. The version governs *both* stored hashes (`InputFingerprint` and `ResolutionInputHash`) — they live in one lock, share one write event, and a single integer is the natural fit (see [scope note](#both-hashes-share-one-version) for why one version, not two): +Stamp one **lock content-hash version** into the lock (the `v1:` prefix of the atomic token) and teach the freshness check to **replay** older versions. The version governs *both* stored hashes (`InputFingerprint` and `ResolutionInputHash`) — they live in one lock, share one write event, and a single integer is the natural fit (see [scope note](#both-hashes-share-one-version) for why one version, not two): -1. Add `LockContentVersion int` (`toml:"lock-content-version,omitempty"`) to `ComponentLock`. **An absent field reads as `1`** — the current, pre-RFC algorithms — *not* `0`. 
(`0` is the Go zero value but no `v0` exists; map the zero to the baseline at read time: `ver := lock.LockContentVersion; if ver == 0 { ver = 1 }`.) The lock **format** `Version` stays `1`; this is a *content* version and is fully backward compatible. +1. The content version lives in the atomic `v:sha256:…` token (it is **not** the lock *format* `Version`, which stays at `1`). The registry floor *starts* at `1` = the projection baseline; there is no legacy pre-projection algorithm in the registry, because pre-reset locks are never replayed (they are read-only history, per the invariant above). A pre-reset lock's prefix-less token is therefore *below* the floor and reconciled by force-rehash, not replay. 2. Turn the combiner into a thin dispatcher over a small registry of historical algorithms, keyed by version. Each entry pairs the two compute functions; when only one algorithm changes, the other slot **reuses** the prior function (no version-neutral hash moves for the untouched one). Keep versions back to a declared floor (see [Registry floor](#registry-floor-and-forced-migration)): ```go type lockAlgo struct { - fingerprint computeFn // produces InputFingerprint - resolution resolveFn // produces ResolutionInputHash + fingerprint computeFn // produces the InputFingerprint digest + resolution resolveFn // produces the ResolutionInputHash digest } var lockAlgos = map[int]lockAlgo{ - 1: {computeFP1, computeRes1}, // current (pre-RFC) algorithms — the implicit baseline - 2: {computeFP2, computeRes1}, // omitempty default (Layer 1); resolution UNCHANGED → reuse v1 fn + 1: {computeFP1, computeRes1}, // projection + combiner baseline, established at the reset + // a future GENUINE algorithm change appends: 2: {computeFP2, computeRes1} } - const currentLockContentVersion = 2 - const minSupportedLockContentVersion = 1 + const currentLockContentVersion = 1 // == the reset baseline; bumps only on a real algo change + const minSupportedLockContentVersion = 1 // floor; raise 
only after a deliberate `component migrate` ``` -3. In `checkFingerprintFreshness`, compute at the **current** version. On mismatch, if the lock's recorded version `< current`, recompute at the lock's recorded version. If *that* matches the stored hash, the inputs are unchanged and only the algorithm evolved → treat as `FreshnessCurrent` and flag for silent re-stamp. Otherwise → `FreshnessStale`. (Phase 1 wires this for the fingerprint hash; the resolution hash reuses `computeRes1` until its algorithm first changes — see scope note.) -4. `component update` stamps `LockContentVersion = current` **only when it is already writing for an independent reason** (see the churn policy below). Migration is therefore **lazy and per-component**: a lock upgrades only when something independently touches it. +3. In `checkFingerprintFreshness`, compute at the **current** version. On mismatch, if the lock's token version `< current`, recompute at the lock's token version. If *that* matches the stored digest, the inputs are unchanged and only the algorithm evolved → treat as `FreshnessCurrent` and flag for silent re-stamp. Otherwise → `FreshnessStale`. (The resolution hash reuses `computeRes1` until its algorithm first changes — see scope note.) +4. `component update` re-stamps the token to the **current** version **only when it is already writing for an independent reason** (see the churn policy below). Migration is therefore **lazy and per-component**: a lock upgrades only when something independently touches it. This resolves Problems 2 (for default changes), 3 (hashing bugfixes), and 5 (piecemeal rollout). It is the same lazy-forward-migration pattern Cargo/npm use, specialized to a content hash. 
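Steps 3 and 4 above can be sketched as a freshness check that tries the current version first and replays the lock's own version only on mismatch. This is a simplified illustration, not the actual `checkFingerprintFreshness`: the `inputs` struct, the `digest` helper, and the v2 registry entry (standing in for some future genuine algorithm change) are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// inputs is a stand-in for everything the fingerprint measures.
type inputs struct{ Upstream, Patches string }

type computeFn func(inputs) string

// digest hashes length-prefixed parts so part boundaries are unambiguous.
func digest(parts ...string) string {
	h := sha256.New()
	for _, p := range parts {
		fmt.Fprintf(h, "%d:%s", len(p), p)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// Hypothetical registry: v2 reorders the fold, which moves every digest.
var lockAlgos = map[int]computeFn{
	1: func(in inputs) string { return digest("v1", in.Upstream, in.Patches) },
	2: func(in inputs) string { return digest("v2", in.Patches, in.Upstream) },
}

const (
	currentVersion      = 2
	minSupportedVersion = 1
)

// checkFreshness reports whether the lock is fresh, and whether it should be
// silently re-stamped the next time it is written for an independent reason.
func checkFreshness(lockVersion int, stored string, in inputs) (fresh, restamp bool) {
	if lockAlgos[currentVersion](in) == stored {
		return true, false // already at the current version
	}
	if lockVersion >= currentVersion || lockVersion < minSupportedVersion {
		return false, false // nothing older to replay, or below the floor
	}
	replay, ok := lockAlgos[lockVersion]
	if !ok {
		return false, false
	}
	if replay(in) == stored {
		return true, true // inputs unchanged; only the algorithm moved
	}
	return false, false
}

func main() {
	in := inputs{Upstream: "abc123", Patches: "fix.patch"}
	stored := lockAlgos[1](in) // a lock written under v1

	fresh, restamp := checkFreshness(1, stored, in)
	fmt.Println(fresh, restamp)

	in.Patches = "other.patch" // a real input change
	fresh, restamp = checkFreshness(1, stored, in)
	fmt.Println(fresh, restamp)
}
```

The first call reports the v1 lock as fresh with a pending re-stamp; the second reports it stale because the replayed v1 digest no longer matches.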
#### Both hashes share one version -`ComponentLock` carries two persisted content hashes: `InputFingerprint` (render inputs, via `hashstructure` + `Includable`) and `ResolutionInputHash` (upstream-resolution inputs — a flat SHA256 over seven explicit fields in `ComputeResolutionHash`, *not* a struct walk, so the omitempty/`Includable` story does not apply to it). Both have the **same evolution problem**: appending an input or reordering the fold moves every lock's hash → G1 churn. +`ComponentLock` carries two persisted content hashes: `InputFingerprint` (render inputs, via `projectVN` + `sha256`) and `ResolutionInputHash` (upstream-resolution inputs — a flat SHA256 over seven explicit fields in `ComputeResolutionHash`). Both have the **same evolution problem**: appending an input or reordering the fold moves every lock's hash → G1 churn. -We version them with **one shared integer**, not two axes, because: they co-locate in a single lock, they are written in the same `update` pass, and a paired registry lets either evolve independently while the other reuses its prior function. Two separate version fields would double the floor/replay/`--rehash` machinery for an input set (`ResolutionInputHash`) that changes rarely — YAGNI. +We version them with **one shared integer** (the token's `v` prefix), not two axes, because: they co-locate in a single lock, they are written in the same `update` pass, and a paired registry lets either evolve independently while the other reuses its prior function. Two separate version fields would double the floor/replay/migrate machinery for an input set (`ResolutionInputHash`) that changes rarely — YAGNI. -**Phasing.** Naming the field `lock-content-version` *now* is the one expensive-to-reverse decision (it is baked into the on-disk TOML schema the moment Layer 2 ships; renaming a persisted key is itself a migration). The fingerprint replay is wired in the first Layer 2 PR. 
**Resolution-hash replay is reserved, not yet wired** — the registry slot exists and `computeRes1` is reused, so the day `ComputeResolutionHash` first changes we add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`), with no schema change. Critically, `ResolutionInputHash` does **not** feed the synthetic changelog path, so its churn is a one-line lock rewrite + a wasted re-resolution, never a phantom release (unlike `InputFingerprint`; see [Downstream consumers](#downstream-fingerprint-consumers-blast-radius)). +**Phasing.** The atomic token format (`v:sha256:…`) is fixed at the reset, so there is no expensive-to-reverse key-naming decision left for Part 2. Fingerprint replay is wired in Part 2's first PR. **Resolution-hash replay is reserved, not yet wired** — the registry slot exists and `computeRes1` is reused, so the day `ComputeResolutionHash` first changes we add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`), with no schema change. Critically, `ResolutionInputHash` does **not** feed the synthetic changelog path, so its churn is a one-line lock rewrite + a wasted re-resolution, never a phantom release (unlike `InputFingerprint`; see [Downstream consumers](#downstream-fingerprint-consumers-blast-radius)). #### Churn-avoidance policies (G1) -The version stamp is itself a potential source of spurious diffs — the exact thing G1 forbids. Two policies keep it invisible until a real change forces a write: - -- **`lock-content-version` is `omitempty` in TOML.** A baseline (absent / version `1`) lock that is never otherwise touched never materializes the field, so its bytes stay identical. The field only appears in a lock that was *already* being rewritten for an independent reason. Existing checked-in locks therefore produce **zero diff** on the day this lands. 
-- **The `Changed` decision must replay *before* it compares — this is the subtle seam.** The naive read of the existing guard `if !result.Changed && !resHashChanged { return false, nil }` suggests the re-stamp harmlessly "rides the `Changed` path." **It does not.** In [`update.go`](../../../internal/app/azldev/cmds/component/update.go), `result.Changed` is set to `true` the instant `lock.InputFingerprint != identity.Fingerprint` — and `identity` is computed at the *current* version. That comparison sits **upstream** of the write guard. So after the v1→v2 switchover, the current-version hash differs from every stored v1 hash, `Changed` flips for ~every component, and we get exactly the mass auto-release-bump + mass lock rewrite G1 forbids. The fix is mandatory, not incidental: - - ```go - // Replay at the lock's recorded version BEFORE deciding Changed. - lockVer := lock.LockContentVersion - if lockVer == 0 { - lockVer = 1 - } - replayed, _ := fingerprint.ComputeIdentityAt(lockVer, *result.config, releaseVer, opts) - if lock.InputFingerprint != replayed.Fingerprint { - result.Changed = true // a REAL input change under the lock's own algorithm - } - // else: hashes match under the old algorithm → inputs unchanged, only the - // algorithm moved → NOT Changed. Advance the version only if some other real - // change is already dirtying this lock. - lock.InputFingerprint = identity.Fingerprint // current-version hash - if result.Changed { // re-stamp piggybacks a real write; never its own trigger - lock.LockContentVersion = currentLockContentVersion - } - ``` - - The principle: **"changed?" is judged under the lock's own algorithm version; the stored hash is only upgraded to the current version when the lock is already dirty for a real reason.** (When resolution replay is wired, the same replay-before-compare applies to the `resHashChanged` silent-write guard.) 
- -Together these make migration strictly opportunistic: a lock advances its version the next time its component changes for real, and not one commit sooner. +The version stamp is itself a potential source of spurious diffs — the exact thing G1 forbids. The rule that prevents it is one idea: **judge "changed?" by replaying the lock's *own* version, not the current one.** Everything below follows from that. + +**Why the obvious approach is wrong.** Today `update.go` sets `result.Changed = true` the instant `lock.InputFingerprint != identity.Fingerprint`, where `identity` is computed at the **current** version. That comparison sits *upstream* of the write guard `if !result.Changed && !resHashChanged { return false, nil }`. So the moment you ship a v1→v2 *algorithm* change, the current-version hash differs from every stored v1 token, `Changed` flips for **~every component at once**, and you get the mass auto-release-bump + mass lock rewrite G1 exists to prevent. The version stamp cannot "harmlessly ride the `Changed` path" — it *triggers* it. + +**The fix: replay before you compare.** Recompute at the lock's recorded version first, and only call it changed if *that* disagrees: + +```go +// Replay at the lock token's recorded version BEFORE deciding Changed. +lockVer := parseTokenVersion(lock.InputFingerprint) // "v1:sha256:…" → 1 +replayed := fingerprint.ComputeIdentityAt(lockVer, *result.config, releaseVer, opts) +if lock.InputFingerprint != replayed.Token() { + result.Changed = true // a REAL input change under the lock's own algorithm +} +// else: tokens match under the old algorithm → inputs unchanged, only the +// algorithm moved → NOT Changed. + +// Re-stamp to the current version ONLY when the lock is already dirty for a +// real reason — the version upgrade piggybacks a real write, never triggers one. 
+if result.Changed { + lock.InputFingerprint = identity.Token() // current version + digest, written together +} +``` + +This makes migration strictly **opportunistic**: a lock advances its version the next time its component changes for real, and not one commit sooner. Because the version lives *inside* the atomic token, a lock at `v1` with unchanged inputs keeps its exact `v1:sha256:…` bytes — there is no separate version field to materialize and no zero-diff bookkeeping. (When resolution replay is wired, the same replay-before-compare guards the `resHashChanged` write.) + +**The unavoidable flip side — false-fresh on a newly-measured input.** "Replay at the lock's own version" is what buys churn-avoidance, but it is the *same* property that creates a blind spot, because replaying `computeFP(old)` is **blind to any input that version did not measure.** Concretely, when v2 starts folding in an input v1 never touched (the [*Newly measured input*](#change-taxonomy) row): + +- A change to that **new** input on a still-`v1` lock replays at v1, which ignores it → digest still matches → **`Changed = false`** → the change is silently treated as fresh. +- The new input only takes effect on that lock when the lock migrates to v2 — i.e. the next time it is dirtied for an *independent* reason, or via `component migrate`. + +This is correct *by contract* (a v1 lock promises freshness under the v1 input set, which excludes the new input), and harmless for a cosmetic input. But for a **build-critical** new input it is a latent-stale hazard: artifacts can lag the new input by an unbounded number of commits. **Decision rule:** if a newly-measured input must take effect fleet-wide immediately, do **not** rely on lazy adoption — pair the version bump with a deliberate `component migrate` (see [Registry floor and forced migration](#registry-floor-and-forced-migration)). Lazy adoption is the default; `component migrate` is the opt-in for inputs that cannot wait. 
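The false-fresh blind spot is easiest to see with two toy algorithm versions. In this sketch (all names hypothetical), v1 measures only `Upstream`, while v2 also folds in a newly measured `Overlay` input; `changed` judges the lock under its own recorded version, as the replay rule requires.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// inputs is a stand-in for the measured state; Overlay is the new input.
type inputs struct{ Upstream, Overlay string }

// digest hashes length-prefixed parts so part boundaries are unambiguous.
func digest(parts ...string) string {
	h := sha256.New()
	for _, p := range parts {
		fmt.Fprintf(h, "%d:%s", len(p), p)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// v1 never looks at Overlay; v2 starts measuring it.
var algos = map[int]func(inputs) string{
	1: func(in inputs) string { return digest("v1", in.Upstream) },
	2: func(in inputs) string { return digest("v2", in.Upstream, in.Overlay) },
}

// changed replays the lock's own version and compares against the stored digest.
func changed(lockVersion int, stored string, in inputs) bool {
	return algos[lockVersion](in) != stored
}

func main() {
	before := inputs{Upstream: "abc123", Overlay: "old"}
	after := inputs{Upstream: "abc123", Overlay: "new"}

	// A lock still at v1: the overlay edit is invisible to the replay.
	fmt.Println(changed(1, algos[1](before), after))

	// The same lock after migrating to v2: the overlay edit is detected.
	fmt.Println(changed(2, algos[2](before), after))
}
```

The first comparison reports no change even though the overlay moved, which is the latent-stale hazard; the second reports a change, which is why a build-critical new input should be paired with a deliberate `component migrate`.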
#### Registry floor and forced migration Lazy migration means an untouched lock can sit at an old version **indefinitely** (G3 by design). That makes "keep the last *N* versions" a **correctness cliff, not a tuning knob**: if pruning drops the compute function a lock still depends on, replay becomes impossible → forced `FreshnessStale` → the mass rebuild/rewrite (and, via the downstream-consumer analysis below, mass changelog churn) the whole design exists to avoid. So the floor must be explicit and paired with an escape hatch, decided now: - **`minSupportedLockContentVersion`** is a hard floor. A lock below it cannot be replayed and is treated as `Stale`. Dropping a registry entry is therefore a deliberate, breaking, announced act — never incidental cleanup. -- **`component update --rehash`** (Open Q#5, promoted to a requirement) force-advances every lock to the current version in one deliberate pass. This is the *only* sanctioned way to retire an old version: rehash the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. +- **`component migrate`** (Open Q#5, promoted to a requirement) force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. 
**Contract:** it is *offline* — it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** flip the release signal (unlike `--bump`). A migration that re-resolved or bumped would no longer be a pure version advance. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal — each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). -**Mixed-toolchain hazard.** `go-toml` silently drops unknown fields, so an *older* azldev binary that rewrites a lock a newer binary had stamped will strip `lock-content-version`, regressing it to the baseline. On the next new-binary run the stored (baseline-replayed) hash won't match the current algorithm → spurious `Changed` + bump. This is the classic down-migration trap. Mitigation is a documented invariant ("all writers of a given `locks/` tree must be ≥ the version that tree was last stamped at"), enforced in CI by pinning the azldev version; a hard guard (refuse to write a lock whose on-disk version exceeds the binary's `currentLockContentVersion`) is a possible belt-and-suspenders. +**Mixed-toolchain hazard — handled by force-rehash, not a format gate.** The classic trap is an older binary regressing a newer lock. Because the lock *format* never bumps, an old binary *can* write a reset lock — but the **atomic token** makes that harmless: it stamps a legacy (prefix-less) or lower-`v` hash, which the next new-binary run detects as sub-floor and **force-rehashes** to the current version. Self-correcting, never silent corruption. 
The symmetric residual is a binary that predates a content-version `v2` and meets a `v2` token it cannot replay: it must **error** (the token version exceeds its `currentLockContentVersion`), not silently restamp at `v1`. A one-line write guard (refuse to write a token whose version exceeds the binary's `currentLockContentVersion`) plus the CI version-pin closes that direction. #### Replaying across a changed input set — `{a,b,c}` → `{a,b,d}` -A lock stores **one opaque hash string** plus its `LockContentVersion`; it does *not* store the individual inputs. So when the measured set changes — say the fingerprint stops measuring `c` and starts measuring `d` — an existing lock (whose stored hash was computed over `{a,b,c}` at v1) is reconciled the only way an opaque hash allows: **recompute and compare, at the lock's own version.** +A lock stores **one atomic token** (`v:sha256:…`); it does *not* store the individual inputs. So when the measured set changes — say the fingerprint stops measuring `c` and starts measuring `d` — an existing lock is reconciled the only way an opaque digest allows: **recompute and compare, at the lock's own version.** Split the change into its two halves; they are handled independently: -- **Adding `d`** is the additive case — `d` is tagged `omitempty`, so for any component that doesn't set it the hash is byte-identical (G4). Free. No version bump. +- **Adding `d`** is the additive case — `projectV1` never listed `d`, so for any lock at v1 the digest is byte-identical whether or not the struct now has `d` (G4, *truly* — the property `hashstructure` could not give). Free. No version bump. - **Dropping `c`** is what forces the version bump, and it is reconciled by replay: - 1. `computeFP2` (measures `{a,b,d}`) ≠ stored hash → mismatch. - 2. lock version (1) < current (2) → **replay `computeFP1`** (still measures `{a,b,c}`). - 3. v1-replay == stored hash? 
**Yes** → `a,b,c` unchanged since the lock was written; only the *measurement* evolved → `FreshnessCurrent`, lazy re-stamp. **No** → a real input moved → `Stale`, rebuild. Both correct. + 1. `computeFP2` (measures `{a,b,d}`) ≠ stored digest → mismatch. + 2. token version (1) < current (2) → **replay `computeFP1`** (still measures `{a,b,c}`). + 3. v1-replay == stored digest? **Yes** → `a,b,c` unchanged since the lock was written; only the *measurement* evolved → `FreshnessCurrent`, lazy re-stamp. **No** → a real input moved → `Stale`, rebuild. Both correct. So the bump is **not breaking**: replay answers "were the *old* inputs unchanged?" without rebuilding. -**The load-bearing constraint the rest of Layer 2 assumes implicitly:** *a replay function reads the live config struct.* `computeFP1` is Go code in **today's** binary, reading fields off **today's** struct. That is fine when the struct shape is unchanged (the omitempty flip, a combiner bugfix, a changed default — all replay against the same fields). But **physically deleting field `c` from the struct breaks `computeFP1`** — it can no longer read `c`, cannot reproduce the `{a,b,c}` hash, and every lock that set `c` is forced `Stale`. Removal-from-the-struct is therefore the one edit that silently defeats replay. +**The one constraint replay still imposes: a retained `projectVN` must be able to read every field it lists.** Unlike the `hashstructure` substrate, `projectV1` is immune to field *additions* (it never reflects the live struct). It is *not* immune to field *removal*: `projectV1` names `c` explicitly, so physically deleting `c` from the struct stops `projectV1` from compiling. Removal is therefore the one edit still gated by a **deprecate-then-delete** two-step, both non-breaking: -The way around it is a **deprecate-then-delete** two-step, both non-breaking: +1. 
**Bump to v2 measuring `{a,b,d}` but keep field `c` on the struct** so `projectV1` can still read it for replay (`projectV2` simply does not list `c`). Every old lock replays clean at v1, is recognized as unchanged, lazy re-stamps to v2. Zero forced rebuilds. +2. **Only after the floor passes v1** (`minSupportedLockContentVersion = 2`, ideally after a deliberate `component migrate`) physically delete field `c` and `projectV1`. -1. **Bump to v2 measuring `{a,b,d}` but keep field `c` in the struct**, tagged `fingerprint:"-"` so `computeFP2` ignores it while `computeFP1` can still read it for replay. Every old lock replays clean at v1, is recognized as unchanged, lazy re-stamps to v2. Zero forced rebuilds. -2. **Only after the floor passes v1** (`minSupportedLockContentVersion = 2`, ideally after a deliberate `--rehash`) physically delete field `c`. `computeFP1` is already retired, so nothing reads `c` anymore. +> **Invariant:** a field may be physically removed from the config struct only after *every* retained `projectVN` that lists it has been retired below `minSupportedLockContentVersion`. Retained projection functions and the struct they read must stay in sync — you cannot delete a field a live version still names. -> **Invariant:** a field may be physically removed from the config struct only after *every* registry entry that measured it has been retired below `minSupportedLockContentVersion`. Equivalently: retained replay functions and the struct they read must stay in sync — you cannot delete a field a live version still needs. +This makes "drop an input" a lazy, per-component migration rather than a fleet-wide rebuild — at the cost of carrying a deprecated field on the struct until its projection function ages out. -This makes "drop an input" a lazy, per-component migration rather than a fleet-wide rebuild — at the cost of carrying a deprecated field on the struct until its replay function ages out. 
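
A minimal sketch of the deprecate-then-delete step, assuming a toy projection encoder. The `Config` struct, the string-only `emit`/`emitAlways` helpers, and the byte layout are hypothetical; only the names `projectV1`/`projectV2` and the `{a,b,c}` → `{a,b,d}` sets come from the text above.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// Config is a hypothetical config struct mid-migration: C is deprecated but
// kept so projectV1 still compiles; D is the newly measured field.
type Config struct {
	A, B string
	C    string // deprecated: delete only after v1 is retired below the floor
	D    string
}

// emit writes name=value only when value is non-zero (omit-if-zero).
func emit(buf *bytes.Buffer, name, value string) {
	if value != "" {
		emitAlways(buf, name, value)
	}
}

// emitAlways writes name=value even when value is empty.
func emitAlways(buf *bytes.Buffer, name, value string) {
	fmt.Fprintf(buf, "%d:%s=%d:%s\n", len(name), name, len(value), value)
}

// projectV1 is frozen: it names exactly {a,b,c} and never reads D.
func projectV1(c Config) []byte {
	var buf bytes.Buffer
	emit(&buf, "a", c.A)
	emit(&buf, "b", c.B)
	emit(&buf, "c", c.C)
	return buf.Bytes()
}

// projectV2 measures {a,b,d}; it simply does not list c.
func projectV2(c Config) []byte {
	var buf bytes.Buffer
	emit(&buf, "a", c.A)
	emit(&buf, "b", c.B)
	emit(&buf, "d", c.D)
	return buf.Bytes()
}

// sum hashes projection bytes into a hex digest.
func sum(b []byte) string { return fmt.Sprintf("%x", sha256.Sum256(b)) }

func main() {
	old := Config{A: "1", B: "2", C: "3"}
	withD := old
	withD.D = "4"

	// Adding D is invisible to v1 (free), but visible to v2.
	fmt.Println(sum(projectV1(old)) == sum(projectV1(withD)))
	fmt.Println(sum(projectV2(old)) == sum(projectV2(withD)))
}
```

This prints `true` then `false`. Deleting field `C` from `Config` would stop `projectV1` from compiling, which is the compile-time form of the invariant: the field can go only once `projectV1` is gone too.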
+#### First post-reset customer -#### First concrete use: the Layer 1 switchover +The reset establishes `projectV1` directly; it is *not* itself a Part 2 version event (it rides the rebuild, not replay). Part 2's machinery therefore sits idle until the **first genuine algorithm change after the cutover** — e.g. a `computeFP2` that fixes an overlay-folding bug, folds in a newly measured input, or changes a baked-in default. That change registers `computeFP2`, bumps `currentLockContentVersion` to 2, and is absorbed by replay with no second coordinated cutover. Because the projection substrate makes additive config changes hash-neutral by construction (G4), the *only* changes that ever need a Part 2 version event are genuine non-additive algorithm changes — a deliberately small set. -Flipping the inclusion default to omitempty (Layer 1) moves every config's hash, so it cannot ship as a free additive change — it is **Layer 2's first real customer.** It registers as the `computeFP2` algorithm (omitempty default) alongside `computeFP1` (include-always), bumps `currentLockContentVersion` to 2, and is absorbed by replay: every existing lock recomputes clean at v1, is recognized as unchanged-inputs, and re-stamps to v2 *only when next written* per the churn policy above. (The resolution slot is unchanged across this bump — v2 reuses `computeRes1`.) No mass regen, no flag day. And because omitempty makes all future additive changes hash-neutral by construction (G4), it permanently **shrinks** the set of changes that need a Layer 2 version event at all — Layer 1 is both the first user of Layer 2 and the thing that reduces Layer 2's future workload. +## Config schema version and canonical migration (future) -### Layer 3 — Config schema version and canonical migration (future) - -This is the on-disk TOML axis. It is **independent** of the fingerprint axis and only needed once we make *non-additive* TOML changes (rename/move/remove fields in the file format itself). 
+This is the on-disk TOML axis. It is **independent** of the fingerprint axis and only needed once we make *non-additive* TOML changes (rename/move/remove fields in the file format itself) that were *not* already absorbed by the reset's normalization pass. Most of the hardest cases are spent at the reset (load-out item 6); this axis covers whatever non-additive TOML change arises *after*. 1. Add an explicit `schema-version` to the config file (distinct from the existing `$schema` URL, which is for editor validation). -2. At **load time**, migrate older config shapes forward into the single latest canonical struct *before* anything hashes them. Fingerprinting stays blissfully unaware of file-format history. -3. Pair with the **hybrid seam**: expose `ComponentConfig.ConfigHash()` on the type (pure struct hash + inclusion policy); keep the combiner in `fingerprint`. +2. At **load time**, migrate older config shapes forward into the single latest canonical struct *before* anything hashes them. Fingerprinting stays blissfully unaware of file-format history. A `config migrate` command (sibling to today's `config schema` / `config dump`) makes this an explicit, reviewable pass that rewrites stale TOML files in place to the current `schema-version`. +3. The projection substrate already provides the clean seam: `projectVN` reads the post-migration canonical struct; the combiner stays in `fingerprint`. No `ConfigHash()` method is added (see [the seam note](#where-the-hashing-logic-should-live)). -The critical invariant: **migrate old TOML → latest canonical struct, then hash once.** A semantically no-op migration (rename `foo`→`bar`) must produce the *same* canonical struct, hence the same hash, hence no drift — handled by Layer 2's replay only if the *encoding* changed, and by Layer 3's normalization for the *file shape*. 
Do **not** keep parallel `V1.Hash()`/`V2.Hash()` methods on versioned structs: that couples the lock to a Go type identity instead of a simple integer, and forces two independent code paths to agree on a hash forever. +The critical invariant: **migrate old TOML → latest canonical struct, then project once.** A semantically no-op migration (rename `foo`→`bar`) must produce the *same* canonical struct, hence the same projection bytes, hence no drift. This is what keeps the schema axis **orthogonal** to the lock axis: a faithful `config migrate` is a pure re-encoding that moves *no* fingerprint, so it never triggers a `component migrate`. If a TOML change genuinely alters build meaning, that is a content-version bump (Part 2), not a `config migrate`. -**Caveat — `hashstructure` hashes the struct type name.** It mixes `reflect.Type.Name()` into the hash, so a Layer-3 migration that moves content into a *renamed* Go struct changes the fingerprint even when the content is byte-identical. "Rename is drift-neutral" therefore holds only if the canonical struct **keeps the original type name**, or the rename is shipped as a Layer-2 version bump that absorbs it. Prefer keeping the type name; reserve the version bump for when the type genuinely must be renamed. +**Resolved by projection:** the old `hashstructure` caveat — that it mixed `reflect.Type.Name()` into the hash, so renaming a Go struct moved every fingerprint even with identical content — **no longer applies.** `projectVN` hashes only the explicit field bytes it emits, never the type name. A struct rename is now genuinely drift-neutral. 
-### Layer interaction +## Pipeline ```text -TOML on disk ──Layer 3: migrate to canonical struct──► ComponentConfig +TOML on disk ──migrate to canonical struct (schema axis)──► ComponentConfig │ - Layer 1: HashInclude omits zero fields (default omitempty) + projectVN: emit explicit fields, omit-if-zero ▼ - Layer 2: ComputeIdentity[version] ──► InputFingerprint + combiner: sha256 over projection + overlays + identity │ lazy replay + re-stamp on update ▼ @@ -340,94 +428,102 @@ TOML on disk ──Layer 3: migrate to canonical struct──► ComponentConfig ## Downstream fingerprint consumers (blast radius) -The versioned-replay story in Layer 2 must hold for **every** reader of `InputFingerprint`, not just the two paths it grew up around. This is the migration blast-radius map; each consumer's behavior under a v1→v2 switchover is stated explicitly. +The versioned-replay story in Part 2 must hold for **every** reader of `InputFingerprint`, not just the two paths it grew up around. This is the post-reset migration blast-radius map; each consumer's behavior under a Part 2 v1→v2 algorithm switchover is stated explicitly. (The *reset itself* is invisible to these consumers as analyzed under [Back-compat invariant](#back-compat-invariant--synthetic-history-reads-stored-strings-never-recomputes): they compare stored strings, and pre-reset locks are never recomputed.) 
 | Consumer | Reads | Compares | Migration behavior required |
 | -------- | ----- | -------- | --------------------------- |
-| `checkFingerprintFreshness` (resolver) | recomputed identity | vs stored hash | Replay at lock version (Layer 2 core) |
-| `component update` `Changed` decision | recomputed identity | vs stored hash | **Replay before `Changed`** (see churn policy / M2 seam) |
-| `synthistory.FindFingerprintChanges` | stored hash strings across git history | adjacent commits | **No change needed — if migration stays lazy** |
-| `synthistory.BuildDirtyChange` | recomputed (current ver) | vs stored `headLock` hash | **Replay at headLock version** before declaring dirty |
+| `checkFingerprintFreshness` (resolver) | recomputed identity | vs stored token | Replay at token version (Part 2 core) |
+| `component update` `Changed` decision | recomputed identity | vs stored token | **Replay before `Changed`** (see churn policy seam) |
+| `changed.go` `classifyComponent` / `haveMatchingFingerprints` (CI classifier) | stored token strings | version-blind compare | **Replay-aware compare** — a v1 token must match its v2 re-stamp as "same" |
+| `synthistory.FindFingerprintChanges` | stored token strings across git history | adjacent commits | **No change needed — if migration stays lazy** |
+| `synthistory.BuildDirtyChange` | recomputed (current ver) | vs stored `headLock` token | **Replay at headLock version** before declaring dirty |
 | `ResolutionInputHash` staleness/write | recomputed resolution hash | vs stored | **Shares the version; replay reserved, not yet wired** |

+The `changed.go` classifier is the easily-missed fifth consumer: [`classifyComponent`](../../../internal/app/azldev/cmds/component/changed.go) and `haveMatchingFingerprints` do raw, version-blind token compares to decide CI classification. Post-switchover a v1 token and its semantically-identical v2 re-stamp are different strings, so a naive compare would misclassify the component as changed.
It needs the same replay-aware comparison as the freshness check (compare at the older token's version), not a raw string equality. + ### The synthetic changelog/release path is the real hazard [`synthistory.go`](../../../internal/app/azldev/core/sources/synthistory.go) turns fingerprint movement into **user-visible, shipped** package state — `%autochangelog` entries and `%autorelease` increments. There are two distinct comparators, and the design resolves them asymmetrically. - **`FindFingerprintChanges` (historical walker)** does a raw, version-blind string compare of `InputFingerprint` across the lock's git history and emits a synthetic changelog/release entry on every change. Making it genuinely version-aware is hard-to-infeasible — it only has committed *strings*, no inputs to replay. **It does not need to be**, *provided migration stays strictly lazy.* Under the churn policy, a version bump only ever rides a commit where a real input also changed, so there is never a version-only commit in history for the walker to misread. The migration folds honestly into that real change's entry. **This is a design decision, not a code fix:** the v1→v2 conversion is an *accepted, per-component, notable* changelog event that piggybacks a real change. - - **Trap:** this only holds while migration is lazy. A fleet-wide `--rehash` (or the M2 bug where `Changed` flips for everyone) converts *phantom* → *honest-but-fleet-wide* — a truthful but fleet-wide release bump, i.e. **G1 is dead.** "Accept as notable" is therefore conditional on **migration never riding a version-only or fleet-wide write** (the `--rehash` floor pass excepted, because it is deliberate and operator-driven). -- **`BuildDirtyChange` (live dirty check)** compares a *recomputed* current-version (v2) hash against the *stored* (possibly v1) `headLock.InputFingerprint` and declares dirty on inequality. 
"Accept as notable" does **not** save this path: post-switchover an *unchanged* component would read **dirty on every `render`/`build`** until re-stamped — a persistent, recurring spurious signal, worse than a one-time entry. The fix is **free**: it is the *same replay Layer 2 already owes the freshness check* — replay at `headLock`'s recorded version before declaring dirty. One additional call site for logic already being written, no new mechanism. + - **Trap:** this only holds while migration is lazy. A fleet-wide `component migrate` (or a regression where `Changed` flips for everyone) converts *phantom* → *honest-but-fleet-wide* — a truthful but fleet-wide release bump, i.e. **G1 is dead.** "Accept as notable" is therefore conditional on **migration never riding a version-only or fleet-wide write** (the `component migrate` floor pass and the one-time reset excepted, because they are deliberate and operator-driven). +- **`BuildDirtyChange` (live dirty check)** compares a *recomputed* current-version (v2) hash against the *stored* (possibly v1) `headLock.InputFingerprint` and declares dirty on inequality. "Accept as notable" does **not** save this path: post-switchover an *unchanged* component would read **dirty on every `render`/`build`** until re-stamped — a persistent, recurring spurious signal, worse than a one-time entry. The fix is **free**: it is the *same replay Part 2 already owes the freshness check* — replay at `headLock`'s recorded version before declaring dirty. One additional call site for logic already being written, no new mechanism. -**Net:** M1 is not "make the changelog walker version-aware" (hard, maybe infeasible). It is two things already on the books — (1) the strict lazy churn policy, so the walker never sees a version-only commit; and (2) extend the freshness replay to `BuildDirtyChange`, one extra call site. +**Net:** the changelog-walker concern is not "make the walker version-aware" (hard, maybe infeasible). 
It is two things already on the books — (1) the strict lazy churn policy, so the walker never sees a version-only commit; and (2) extend the freshness replay to `BuildDirtyChange` and the `changed.go` classifier, a few extra call sites for logic already being written. The reset commit is the single deliberate exception: it *is* a fleet-wide notable event, the coordinated cutover, intentionally visible. ### `ResolutionInputHash` — shares the version, replay deferred `ComponentLock` carries a *second* persisted content hash, `ResolutionInputHash`, with its own staleness logic and its own silent-write path (it writes when only `resHashChanged`, never flipping `Changed`). It has the **identical** evolution problem as `InputFingerprint`: any future change to `ComputeResolutionHash`'s algorithm moves every lock's hash — exactly the mass-churn this RFC exists to prevent. -The single `lock-content-version` covers it (see [Both hashes share one version](#both-hashes-share-one-version)). What differs is **blast radius**, which is why we wire its replay later, not now: +The single shared content version (the token's `v` prefix) covers it (see [Both hashes share one version](#both-hashes-share-one-version)). What differs is **blast radius**, which is why we wire its replay later, not now: -- `ResolutionInputHash` does **not** feed `synthistory` — so an algorithm change can never mint a phantom changelog/release (the M1 hazard is fingerprint-only). Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption. -- It is a flat seven-field SHA256, not a struct walk, so the Layer 1 omitempty flip leaves it untouched — it has no pending v1→v2 event. Its registry slot stays `computeRes1` until its inputs genuinely change. +- `ResolutionInputHash` does **not** feed `synthistory` — so an algorithm change can never mint a phantom changelog/release (that hazard is fingerprint-only). 
Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption. +- It is a flat seven-field SHA256, not a struct walk, so the projection substrate leaves it untouched — it has no pending version event. Its registry slot stays `computeRes1` until its inputs genuinely change. -**Decision:** name the field for the general case now (`lock-content-version`); wire fingerprint replay in Layer 2's first PR; reserve resolution replay (slot present, prior fn reused) and wire it the day `ComputeResolutionHash` first changes — a localized follow-up with no schema change. This fixes the one irreversible thing (the persisted key name) without speculative code (KISS/YAGNI on the second replay). +**Decision:** the atomic token format is fixed at the reset, so there is no irreversible key-naming decision left; wire fingerprint replay in Part 2's first PR; reserve resolution replay (slot present, prior fn reused) and wire it the day `ComputeResolutionHash` first changes — a localized follow-up with no schema change. KISS/YAGNI on the second replay. ## Design decisions -### D1 — `Includable` vs `IgnoreZeroValue` +### D1 — Canonical projection vs `hashstructure` + `Includable` -Both omit zero values; the difference is **control granularity and escape hatches.** +Both can omit zero values; the decisive difference is **whether an old algorithm can be frozen**, which `Includable` cannot deliver (Problem 6). 
-| | `Includable` per-field (chosen) | `IgnoreZeroValue` global |
+| | Canonical projection (chosen) | `hashstructure` + `Includable` |
 | --- | --- | --- |
-| Meaningful empties | Preserved via `fingerprint:"always"` | Lost — no opt-out |
-| Per-field intent | Explicit, CI-audited | Invisible |
-| Wiring | One helper + value-receiver method per struct | One option flag |
+| Old algorithm frozen | Yes — explicit pinned field list | No — reflects the live struct/method-set |
+| Sound replay (Part 2) | Yes | No (the disqualifier) |
+| Meaningful empties | `emitAlways` per field | `fingerprint:"always"` per field |
+| Type-name in hash | No (rename is drift-neutral) | Yes (rename moves every hash) |
+| Plumbing | Projection encoder + golden vectors | Value-receiver `HashInclude` on every nested struct + `v.(reflect.Value)` assert |

-`IgnoreZeroValue` is a blunt global switch with no way to keep a build-meaningful zero. `Includable` gives the same default behavior **plus** the `always` escape hatch and a point-of-definition audit. Both move every hash once on adoption — that cost is absorbed by Layer 2 either way (see the switchover note), so it is not a differentiator.
+`Includable` keeps today's hashes byte-identical, which mattered for an *incremental* rollout — but that property is worthless once the reset rebuilds everything anyway, and it comes attached to a substrate that makes replay unsound. Projection trades byte-compatibility (which we are spending on the coordinated cutover regardless) for frozen replay (which we need forever). Adopted at the reset.

-### D2 — Mandatory explicit tags, default omitempty
+### D2 — Explicit field lists + golden vectors over reflection tags

-Every fingerprinted field must carry `fingerprint:"-"`, `"omitempty"`, or `"always"` — there is no untagged state. Rationale:
+Field selection lives in `projectVN` as ordinary, explicit Go code (one `emit`/`emitAlways` line per measured field), not in struct tags read by a reflective walker.
Rationale: -- The *unsafe* failure direction is the false-negative (a meaningful field omitted → missed rebuild). Defaulting to omitempty tilts toward that direction, so the safety check must be loud, not implicit. -- A mandatory tag forces the "is this field's zero value build-meaningful?" decision **at the point of definition**, where the author has the context — better locality than a far-away exclusions registry. -- It *simplifies* the audit: assert every field has a valid tag value; delete the `expectedExclusions` map entirely. +- The *unsafe* failure direction is the false-negative (a meaningful field silently omitted → missed rebuild). An explicit list makes "what does v1 measure?" greppable in one function, and the **golden-vector test** turns any accidental change to a historical projection into a CI failure — a far stronger guard than a tag-presence audit. +- It forces the "is this field's zero value build-meaningful?" decision at the call site (`emit` vs `emitAlways`), with full context. +- It removes the `Includable` nested-struct trap entirely: there is no per-struct method to forget, no decorative tag that passes the audit while silently hashing a zero. -Fully implicit (omitempty default, no tags, no audit) was rejected — it removes the only guard against the unsafe direction. `fingerprint:"omitempty"` mirrors Go's own `json:",omitempty"`; `"always"` and `"-"` read unambiguously alongside it. +The cost is writing `projectVN` by hand instead of leaning on reflection. That is the point: hand-written selection is what makes the function frozen and auditable. -### D3 — Content version vs format version in the lock +### D3 — Atomic self-describing token; no format bump, reconcile via force-rehash -Reusing `ComponentLock.Version` for the algorithm would force a format-version bump (and the strict `Parse` gate would reject old locks outright). 
A separate `LockContentVersion` keeps the format stable and old locks readable, enabling lazy migration instead of hard rejection. It is named for the *general* case — it versions every content hash the lock stores (`InputFingerprint` now, `ResolutionInputHash` when its replay is wired) — because the persisted TOML key is the one thing that is expensive to rename after ship. +The stored hash is a single `v:sha256:` token, not separate version and digest fields. One field, written atomically, so the version and the digest can never desync (the class of bug a split-field design invites when one is written and the other is not). -### D4 — Method-on-type hashing +The lock **format** `Version` stays at `1`. An earlier draft bumped it (1→2) as a poison pill to stop old binaries touching reset locks, but that also stops them reading pins to *queue a build* — too blunt. Instead, back-compat rests on two cheaper properties: the format is unchanged so every binary parses every lock, and the content-version registry **force-rehashes** any sub-floor token (legacy, or downgraded by an old binary) up to the current version. Old binaries stay useful (read pins, build); their only possible mischief — writing a legacy-substrate hash — is self-correcting on the next new-binary run, not silent corruption. Back-compat is therefore: **same format forever, reconcile fingerprints by version, never recompute history.** -Adopt the **hybrid seam**: pure `ConfigHash()` on the config type, combiner in `fingerprint`. A full move was rejected (layering regression: I/O + crypto + algorithm versioning do not belong on a data type). See [Research](#where-the-hashing-logic-should-live). 
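
The reconcile-by-version rule under D3 amounts to a three-way classification of the stored token. The sketch below is one possible reading, not the azldev implementation: the `action` type, the `classify` function, and the concrete floor and current values are hypothetical, and a legacy hash is assumed to be any string without a `v<N>:` prefix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical values chosen so every branch is reachable.
const (
	minSupportedLockContentVersion = 2
	currentLockContentVersion      = 3
)

type action int

const (
	replay      action = iota // floor <= version <= current: replay at its own version
	forceRehash               // legacy or sub-floor: recompute at the current version
	refuse                    // newer than this binary: error out, never restamp
)

// classify decides how a binary should treat a stored fingerprint token.
func classify(tok string) action {
	head, _, ok := strings.Cut(tok, ":")
	if !ok || !strings.HasPrefix(head, "v") {
		return forceRehash // prefix-less legacy hash
	}
	ver, err := strconv.Atoi(head[1:])
	if err != nil {
		return forceRehash // not a recognizable version prefix: treat as legacy
	}
	switch {
	case ver > currentLockContentVersion:
		return refuse
	case ver < minSupportedLockContentVersion:
		return forceRehash
	default:
		return replay
	}
}

func main() {
	for _, tok := range []string{"deadbeef", "v1:sha256:aa", "v2:sha256:bb", "v4:sha256:cc"} {
		fmt.Println(tok, classify(tok))
	}
}
```

With a floor of 2 and a current version of 3, the four sample tokens classify as force-rehash, force-rehash, replay, and refuse. Only the `refuse` branch needs a write guard; the other two converge on the current version without any change to the lock format.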
+### D4 — Project to bytes, not a `ConfigHash()` method on the type -Two constraints keep the seam from eroding back into the rejected methods-on-type design: **`ConfigHash()` must stay version-frozen** (it computes exactly one algorithm; it does *not* dispatch over versions — a single method "can't replay its own past"), and **the combiner is the sole version authority.** Version dispatch lives entirely in `fingerprint`'s registry; `ConfigHash()` is just the current pure-config step it calls. Keep `ConfigHash()` unexported-or-narrow if practical, so callers cannot route around the registry to get a raw, version-agnostic hash. +`projectVN(config) []byte` returns canonical bytes; the combiner in `fingerprint` owns the `sha256` and the version dispatch. A `ConfigHash()` method that returns a finished hash was rejected: it drags crypto + versioning onto a data type, and it tempts callers to route around the version registry to get a raw, version-agnostic hash. Returning bytes keeps the config type ignorant of versioning, and keeps the combiner the **sole version authority**. See [the seam note](#where-the-hashing-logic-should-live). ## Alternatives considered -- **Global `IgnoreZeroValue`** — see D1. Same default behavior but no per-field escape hatch for meaningful zeros and no point-of-definition audit. Rejected. -- **Implicit omitempty (no mandatory tags, no audit)** — see D2. Removes the only guard against the unsafe false-negative direction. Rejected in favor of mandatory 3-way tags. -- **Content-hash the rendered config** (Go-modules style) instead of struct-hashing. The naive version of this — "hash all the bytes" — over-captures, since we deliberately exclude many fields (`paths`, `publish`, snapshots) from the fingerprint. The *stronger* form is a **canonical-projection hash**: serialize only the included fields, keys sorted, and hash those bytes — immune to field-shape drift without per-field reflection tags. 
We still stay with `hashstructure` + `Includable` because our inclusion policy is **conditional** (omitempty = include-if-non-zero, evaluated on the resolved value), which a static byte serializer would have to re-implement anyway — so the projection hash buys field-shape immunity at the cost of reimplementing the very predicate `Includable` already gives us, plus a second serialization format to keep stable forever. Rejected on that basis, but recorded as the principled alternative; it is the one foundational choice that would be expensive to reverse post-adoption. -- **Parallel versioned structs with per-struct `Hash()`** — couples locks to Go type identity and duplicates hashing logic per version. Rejected in favor of Layer 2's integer-versioned combiner + Layer 3 canonical migration. -- **Bump lock format `Version` and migrate eagerly** — eager migration rewrites every lock at once, the exact mass-churn we are trying to avoid. Rejected in favor of lazy per-component re-stamp. +- **Incremental lazy migration on the `hashstructure` substrate** (the original plan): flip the inclusion default to omitempty via `Includable`, version the lock content, and migrate lazily — *without* a reset. Rejected: Problem 6 makes its central promise unkeepable. A "frozen" replay function built on `hashstructure.Hash` reflects the live struct, so the first field addition after the switchover moves the old algorithm's output and forces a rehash anyway. The incremental path therefore does not actually avoid a coordinated cutover — it defers one to the first field addition, on a substrate that makes replay unsound. With a coordinated cutover already scheduled (the dev→prod cutover), spending it once on a clean projection substrate strictly dominates. +- **Global `IgnoreZeroValue`** — a blunt switch that omits *all* zero fields with no escape hatch for build-meaningful zeros, and still on the non-frozen `hashstructure` substrate. Rejected. 
+- **Parallel versioned structs with per-struct `Hash()`** — couples locks to Go type identity and duplicates hashing logic per version. Rejected in favor of Part 2's integer-versioned combiner over frozen projections. +- **Bump the lock format `Version` 1→2 as a poison pill** (an earlier draft's choice) — makes old binaries hard-reject reset locks. Rejected: it also blocks old binaries from reading pins to queue a build, and it is unnecessary, since the content-version registry already force-rehashes any sub-floor or downgraded token (D3). Same-format + force-rehash keeps old binaries useful without risking silent corruption. +- **Eager fleet-wide migration as the steady-state mechanism** — rewriting every lock on every algorithm change is the mass-churn the design exists to prevent. Rejected for the steady state. The *reset* is a deliberate, one-time, operator-driven eager pass riding an already-scheduled rebuild — the sanctioned exception, not the rule; `component migrate` is its post-reset equivalent for retiring an old version. ## Incremental delivery -1. **PR A (Layer 1)**: shared `includeFingerprintField` helper + a delegating value-receiver `HashInclude` on **every** fingerprinted struct (all ~10 registered in `fingerprint_test.go`, not just `ComponentConfig`/`PackageConfig` — see the per-struct resolution note in Layer 1); tag every fingerprinted field with one of `-`/`omitempty`/`always`; rewrite the field-decision audit to (a) assert valid-tag presence and (b) assert every registered struct implements `Includable`, then drop the `expectedExclusions` registry. **Note:** flipping the default moves every hash, so PR A must land *with or after* PR B's version machinery — it registers as the `computeFP2` algorithm, not a standalone change. 
Unit tests: a zeroed `omitempty` field hashes **equal to its absence-equivalent** (not merely "setting it drifts" — that positive-direction test passes even if `HashInclude` is a no-op, so it must be paired with the zero-equals-absent assertion that actually exercises omission); an `always` field drifts even at zero. -2. **PR B (Layer 2)**: `LockContentVersion` on `ComponentLock` (+ `ComponentLockData` and `populateFromLock`, so the replay site can read the version); a paired version registry (fingerprint + resolution compute fns) with a `minSupportedLockContentVersion` floor; fingerprint replay-before-`Changed` in `update.go`; fingerprint replay in `checkFingerprintFreshness` **and `BuildDirtyChange`** (same replay logic, two call sites). Resolution-hash replay is *reserved* — the registry slot reuses `computeRes1`; not wired until `ComputeResolutionHash` first changes. Unit tests: old-version lock with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write. -3. **PR C (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) — set a new `omitempty` field on a single component and assert only that lock drifts. -4. **PR D (Layer 3, later)**: `schema-version` field, load-time canonical migration, `ComponentConfig.ConfigHash()` seam. Gated on the first real non-additive TOML change. +The reset (Part 1) must land as one coherent change at the dev→prod cutover; its pieces are independently reviewable but ship together because they all move the hash. + +1. **PR A (substrate)**: `projectVN` encoder (`canonicalBuf`, `emit`/`emitAlways`), `projectV1` with the explicit field list, `sha256` combiner, and the golden-vector test. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. Unit tests: a field absent from `projectV1` is invisible to the digest; `emitAlways` fields hash even at zero; golden vectors pin the v1 output. +2. 
**PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. Lock format `Version` stays `1`. Ships at the cutover; absorbed by the scheduled rebuild. Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. +3. **PR C (Part 2 machinery)**: the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`), `ComputeIdentityAt`, replay-before-`Changed` in `update.go`, and replay in `checkFingerprintFreshness`, `BuildDirtyChange`, **and the `changed.go` classifier**. Resolution replay reserved (slot reuses `computeRes1`). With only `v1` registered this is inert but proven. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write. +4. **PR D (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) — add a field absent from `projectV1` and set it on one component; assert only that lock drifts and every other lock is byte-identical. +5. **PR E (config schema axis, later)**: `schema-version` field + load-time canonical migration + the `config migrate` command. Gated on the first post-reset non-additive TOML change not already absorbed by the reset's normalization pass. -Each PR is independently revertible. Because the Layer 1 default flip is a hash-moving change, PRs A and B ship together (or B first); the `lock-content-version` omitempty stamp and churn policies ensure existing locks see zero diff until independently touched. Layer 3 migrates lazily on next write. +Each PR is independently revertible up to the cutover. PRs A–B land together at the dev→prod cutover (they move every hash and are absorbed by the scheduled rebuild); PR C is inert until the first post-reset algorithm change; PRs D–E follow. ## Open questions 1. 
Should a lazy re-stamp during a *read-only* command (`render`, `build` freshness check) write the lock back, or defer all writes to `component update`? Writing on read is surprising; deferring means freshness checks stay slightly slower until the next update. (Leaning: defer all writes to `update`, keeping reads side-effect-free.) -2. For Layer 3, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration. -3. Should `omitempty` semantics use `reflect.Value.IsZero()` (Go's notion) or a config-aware notion of "unset" (e.g. nil pointer vs empty string)? Pointers would make "set to empty" expressible but complicate the structs. -4. Can the audit go further than tag-presence and *statically* flag fields whose zero value is likely meaningful (e.g. a `bool` defaulting true) and nudge toward `always`? Or is the point-of-definition tag plus code review sufficient? -5. Should the mixed-toolchain hazard get a hard write-time guard (refuse to write a lock whose on-disk version exceeds the binary's `currentLockContentVersion`), or is the CI version-pin invariant enough? +2. For the config schema axis, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration. +3. Should the omit-if-zero predicate use `reflect.Value.IsZero()` (Go's notion) or a config-aware notion of "unset" (e.g. nil pointer vs empty string)? `projectVN` makes this a per-field choice in code, so it can differ field to field — but a default convention is still worth settling. +4. What is the canonical byte encoding for `projectVN` (length-prefixed key+value? a stable subset of TOML/CBOR?), and how are golden vectors stored and regenerated? This is the one substrate detail that is expensive to change after the reset. +5. 
Should the residual mixed-toolchain case get a hard write-time guard (refuse to write a token whose version exceeds `currentLockContentVersion`), or is force-rehash on read + the CI version-pin enough? (The operator escape hatch is `component migrate`; this question is only about the *automatic* guard.) -*Resolved in-text (recorded here so they aren't re-litigated):* registry retention is a **floor**, not "last N" (M8 / Registry floor); `--rehash` is the sanctioned forced-migration pass (promoted from a question to a requirement); absent `LockContentVersion` reads as `1`; one shared `lock-content-version` covers both stored hashes, with resolution-hash replay reserved (slot present, fn reused) until `ComputeResolutionHash` first changes. +*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling); one shared content version covers both stored hashes, with resolution-hash replay reserved (slot present, fn reused) until `ComputeResolutionHash` first changes. 
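
The token rules recorded above (one atomic token, a prefix-less legacy value read as sub-floor, force-rehash below the floor) might be sketched as follows. This is only an illustration under stated assumptions: `parseToken`, `formatToken`, and `needsForceRehash` are hypothetical helper names, the version-carrying token shape is taken from the `v1:sha256:` example in the delivery plan, and both constants are set to `1` purely for the demo. It does not address the open question about tokens newer than the running binary.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Assumed values for illustration only; the RFC names these constants
// but this sketch picks 1 for both.
const (
	minSupportedLockContentVersion = 1
	currentLockContentVersion      = 1
)

// parseToken splits a stored fingerprint token into (version, digest).
// Anything that is not "v<number>:sha256:<digest>" is reported as
// version 0, i.e. below any floor (legacy or malformed).
func parseToken(tok string) (int, string) {
	parts := strings.SplitN(tok, ":", 3)
	if len(parts) == 3 && parts[1] == "sha256" && strings.HasPrefix(parts[0], "v") {
		if n, err := strconv.Atoi(parts[0][1:]); err == nil && n > 0 {
			return n, parts[2]
		}
	}
	return 0, tok
}

// formatToken writes version and digest as one value, so the two
// cannot be updated independently.
func formatToken(version int, digest string) string {
	return fmt.Sprintf("v%d:sha256:%s", version, digest)
}

// needsForceRehash reports whether a stored token sits below the floor
// and should be recomputed at the current version.
func needsForceRehash(tok string) bool {
	v, _ := parseToken(tok)
	return v < minSupportedLockContentVersion
}

func main() {
	for _, tok := range []string{"sha256:abc123", formatToken(currentLockContentVersion, "abc123")} {
		fmt.Printf("%-22s force-rehash=%v\n", tok, needsForceRehash(tok))
	}
}
```

Keeping the parse tolerant (unknown shapes degrade to version 0 rather than an error) matches the stated intent that every binary can still read every lock; a stricter parser would be a different design choice.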
From eff913d438538c4899a6f1da87f9dcd2901ee141 Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Fri, 5 Jun 2026 17:47:56 -0700 Subject: [PATCH 04/15] update 3 --- docs/developer/rfc/lazy-schema-migration.md | 190 ++++++++++++++------ 1 file changed, 135 insertions(+), 55 deletions(-) diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md index 8e9a56c6..65f403c9 100644 --- a/docs/developer/rfc/lazy-schema-migration.md +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -88,7 +88,7 @@ This RFC therefore has two parts: **(1)** a one-time **reset** at the dev→prod - **G1 (primary, non-functional): no spurious lock-file diffs *after the reset*.** Once prod locks exist, landing a config-schema or hashing change must not rewrite `*.lock` files for components whose effective inputs are unchanged. The reset itself is the *one* sanctioned exception, absorbed by the already-scheduled rebuild. - **G2: only real changes drift.** Post-reset, a lock changes iff that component's build-effective inputs changed. - **G3: piecemeal, lazy migration post-reset.** Genuine algorithm evolution after the reset rolls out per-component, riding independent changes, never as a big-bang. -- **G4: additive fields are drift-neutral by construction — *truly*, not just for new locks.** On the projection substrate (below) an unset additive field is invisible to *every* lock including old ones, because old algorithms pin an explicit field list and never reflect over the live struct. +- **G4: additive fields are drift-neutral by construction — *truly*, not just for new locks.** On the projection substrate (below) an unset additive field is invisible to *every* lock including old ones, because old versions emit only the fields their tags include — a field added later is not in any shipped version's tag set, so it cannot move an existing hash. 
- **G5: correctness backstop preserved.** Never silently under-rebuild: a genuine input change must always drift its lock. Replay may accept encoding/over-capture changes; it must never mask a behavior-changing one. - **G6 (new, hard): back-compatible reads for synthetic history.** The new binary must still **read** pre-reset locks across git history (synthetic changelog/release walks them), even though it **writes** only the new format. Reading never recomputes a historical hash — it compares stored strings only. @@ -129,7 +129,7 @@ The struct's type name *is* part of the hash (`hashstructure` mixes in `reflect. - Whether `Includable` is consulted depends on whether the type implements it *now* — not on what was true when v1 locks were written. - A `value` vs `pointer` receiver subtlety even decides whether the root struct's `HashInclude` is seen at all (the top-level value is not addressable). -A function meant to be "the v1 algorithm, forever" therefore changes meaning every time the struct or its method set changes. That is the disqualifier for the incremental plan (Problem 6) and the motivation for the projection substrate below, whose v1 function pins an explicit field list and is immune to all three. +A function meant to be "the v1 algorithm, forever" therefore changes meaning every time the struct or its method set changes. That is the disqualifier for the incremental plan (Problem 6) and the motivation for the projection substrate below, whose v1 projection emits only its version-tagged fields and reads neither the method set nor the type name — immune to all three. ## Change taxonomy @@ -137,17 +137,19 @@ Not every config change should be treated the same way. The right mechanism depe | Class | Example | Should unaffected locks drift? 
| Mechanism | | ----- | ------- | ------------------------------ | --------- | -| **Additive field** | new `foo` field, unset on most components | No — only setters drift | **Free, no bump.** Add `foo` to the current `projectVN` as omit-if-zero; a component that leaves it unset emits identical bytes, so no shipped hash moves. Setters drift (correct). | -| **Additive with non-zero default** | new field defaulted to `"auto"` via defaults merge | No | **Bump + replay.** The default resolves non-zero on *every* component, so it is emitted everywhere and would move every hash — omit-if-zero can't save it. Ship `projectV(N+1)` that emits it; old locks **replay at their version** (which didn't emit it), match their stored digest → recognized unchanged → lazy re-stamp, no rebuild. | -| **Rename / move** | `foo` → `bar`, same semantics | No | **Schema migration + bump + replay.** Migrate old TOML → canonical struct (the rename lands in the struct), then ship `projectV(N+1)` that emits the renamed field. Old locks replay at their version and are recognized unchanged → lazy re-stamp, no rebuild. | +| **Additive field** | new `foo` field, unset on most components | No — only setters drift | **Free, no bump.** Tag the new field `vN..*` (current version, omit-if-zero); a component that leaves it unset emits identical bytes, so no shipped hash moves — adding an omit-if-zero field to the live version is the one output-preserving no-bump edit. Setters drift (correct). | +| **Additive with non-zero default** | new field defaulted to `"auto"` via defaults merge | No | **Bump + replay.** The default resolves non-zero on *every* component, so it is emitted everywhere and would move every hash — omit-if-zero can't save it. Bump and tag the field `v(N+1)..*`; old locks **replay at their version** (whose set excludes it), match their stored digest → recognized unchanged → lazy re-stamp, no rebuild. 
| +| **Default change on an *existing* field** | bump `jobs` default `4`→`8` | Yes — every component's effective input moved | **Not lazy-maskable.** Replay recomputes the *current* config (now resolving to `8`) under the old algorithm → `jobs=8` ≠ stored `jobs=4` → honest fleet-wide drift; replay cannot suppress it because the resolved value genuinely changed for everyone. Escape hatch: `config migrate` writes the *old* resolved value explicitly (`jobs=4`) into each config **before** moving the default — existing components then pin the old value (no drift) and only new components pick up `8`. Without that pre-pass it is a legitimate (if large) fleet rebuild, not a bug. | +| **Rename / move** | `foo` → `bar`, same semantics | No | **Schema migration + bump + replay.** Migrate old TOML → canonical struct (the rename lands in the struct), then tag the renamed field `v(N+1)..*`. Old locks replay at their version and are recognized unchanged → lazy re-stamp, no rebuild. | | **Semantic change** | meaning of `foo` changes; output differs | Yes — that's correct | **None.** The build output genuinely differs, so the lock *should* drift. Replay at the old version would (correctly) mismatch → `Stale` → rebuild. Nothing to suppress. | | **Hashing bugfix** | overlay ordering bug in the combiner | No | **Bump + replay.** Ship the fixed combiner as the version-`N+1` half of `computeFP(N+1)`; old locks replay at the old (buggy) version. If their inputs are unchanged the buggy digest still matches → recognized unchanged → lazy re-stamp to the fixed version, no rebuild. | -| **Newly measured input** | start folding in a new overlay source or identity element | No | **Bump + replay.** A non-config input is added in the combiner half of `computeFP(N+1)` (a config field would go in `projectV(N+1)`). Old locks replay at their version, which didn't fold it in, match their stored digest → recognized unchanged → lazy re-stamp, no rebuild. 
**Caveat:** until a lock migrates, replay is *blind* to the new input, so a change to it reads as fresh (false-fresh) — if it is build-critical, force a `component migrate` pass instead of riding lazy adoption (see [churn-avoidance](#churn-avoidance-policies-g1)). | -| **Field removal** | drop deprecated `foo` | No, if nobody set it | **Deprecate-then-delete (+ bump for setters).** Bump to a `projectV(N+1)` that stops emitting `foo` but **keep the field on the struct** so the old `projectVN` can still read it for replay. Only after the floor passes that version (ideally after a `component migrate`) physically delete the field. Setters drift on the bump; non-setters replay clean. | +| **Newly measured input** | start folding in a new overlay source or identity element | No | **Bump + replay.** A non-config input is added in the combiner half of `computeFP(N+1)` (a config field would be tagged `v(N+1)..*` instead). Old locks replay at their version, which didn't fold it in, match their stored digest → recognized unchanged → lazy re-stamp, no rebuild. **Caveat:** until a lock migrates, replay is *blind* to the new input, so a change to it reads as fresh (false-fresh) — if it is build-critical, force a `component migrate` pass instead of riding lazy adoption (see [churn-avoidance](#churn-avoidance-policies-g1)). | +| **Field removal** | drop deprecated `foo` | No, if nobody set it | **Deprecate-then-delete (+ bump for setters).** Close the field's range at the prior version (`vK..*` → `vK..vN`, so v(N+1) stops measuring it) but **keep the field on the struct** so older versions can still read it for replay. Only after the floor passes vN (ideally after a `component migrate`) physically delete the field. Setters drift on the bump; non-setters replay clean. 
| +| **Resurrected field** | re-measure a previously-dropped `foo` | Depends — only if its value moved | **Tag edit (+ bump).** Append a new range to the field's set (`v1..v3,v8..*`) so v8+ measures it again while v1–v7 stay byte-identical (golden-vector-enforced). If the field was already physically deleted, bring it back as a fresh additive field tagged `v8..*`. The earlier life and the revival never collide because each version's output is pinned independently. | -The recurring requirement across the "No" rows is the same: **distinguish a change in user intent from a change in encoding, and only drift on the former.** Note the first row: on the projection substrate, a new field is added to `projectVN` as *omit-if-zero*, so a component that does not set it emits identical bytes and stays hash-neutral — *for every lock, old or new*, because old configs never set the brand-new field. Adding it does not move any existing hash (no shipped lock set it), so it needs no version bump. Part 2 then carries only the genuinely hard cases (rows 2, 5, and post-reset renames/removals). The shared move in every "Bump + replay" row is the same primitive — **increment the content version, keep the old `projectVN` as a frozen replay function, and let unchanged locks re-stamp lazily** — detailed in [Part 2](#part-2--post-reset-lazy-migration). +The recurring requirement across the "No" rows is the same: **distinguish a change in user intent from a change in encoding, and only drift on the former.** Note the first row: on the projection substrate, a new field is added to `projectVN` as *omit-if-zero*, so a component that does not set it emits identical bytes and stays hash-neutral — *for every lock, old or new*, because old configs never set the brand-new field. Adding it does not move any existing hash (no shipped lock set it), so it needs no version bump. Part 2 then carries only the genuinely hard cases (rows 2, 5, and post-reset renames/removals). 
The shared move in every "Bump + replay" row is the same primitive — **increment the content version, keep the old `projectVN` as a frozen replay projection, and let unchanged locks re-stamp lazily** — detailed in [Part 2](#part-2--post-reset-lazy-migration). -> **`projectVN`** is shorthand used throughout this RFC for the hand-written *projection function* introduced by this design (defined in [Substrate options](#substrate-options) and [The projection substrate](#the-projection-substrate)). The `N` is the lock content version: `projectV1` is the function that names and serializes exactly the fields content-version 1 measures, `projectV2` the next algorithm, and so on. Each `projectVN` is frozen once shipped — that is the whole point. +> **`projectVN`** is shorthand used throughout this RFC for the canonical *projection at content-version N* introduced by this design (defined in [Substrate options](#substrate-options) and [The projection substrate](#the-projection-substrate)). It is **not** N hand-written functions: it is a single generic walker, `project(cfg, N)`, whose per-field membership is declared in version-set tags on the struct fields (see [Version-tagged field selection](#version-tagged-field-selection)). `projectV1` means `project(cfg, 1)` — the fields whose tag set includes v1; `projectV2` the next version, and so on. Each version's projection is frozen once shipped (its tags never move; golden vectors enforce it) — that is the whole point. ## Research @@ -156,7 +158,7 @@ The recurring requirement across the "No" rows is the same: **distinguish a chan Two substrates can produce a content fingerprint of the resolved config. The difference that matters here is **whether an old algorithm function can be frozen.** - **`hashstructure` + `Includable` (rejected as the substrate).** Keeps existing hashes byte-identical and gives per-field omission via `HashInclude`. 
But, as established above (Problem 6), a function built on `hashstructure.Hash` reflects over the live struct and method set, so it cannot be a frozen historical algorithm. It also requires a value-receiver `HashInclude` on *every* nested fingerprinted struct and a subtle `v.(reflect.Value)` type-assert to work at all — brittle plumbing in service of a substrate that still can't host sound replay. -- **Canonical projection + stdlib hash (chosen).** Split the two jobs `hashstructure` fuses — *field selection* and *hashing* — into explicit steps. A `projectVN` function names the exact fields version N measures, emits them in a canonical, sorted, self-delimiting byte form, and an stdlib `sha256` hashes those bytes. Because `projectV1` references an **explicit, pinned field list**, it does not see fields added later, does not depend on the type's method set, and does not depend on receiver subtleties. It is a genuinely frozen pure function — the property replay requires. The cost is owning a small projection encoder plus **golden hash vectors** per version (a checked-in `(config, version) → hash` table) so "frozen" is CI-enforced, not merely intended. +- **Canonical projection + stdlib hash (chosen).** Split the two jobs `hashstructure` fuses — *field selection* and *hashing* — into explicit steps. Field selection is **declared per field** as a version-set in the `fingerprint` tag (`fingerprint:"v1..*"`); a single generic walker, `project(cfg, N)`, emits the fields whose set includes version N in a canonical, sorted, self-delimiting byte form, and an stdlib `sha256` hashes those bytes. Because a shipped version's tag membership is **fixed and golden-vector-pinned**, `project(cfg, 1)` does not see fields added later, does not depend on the type's method set, and does not depend on receiver subtleties. It is a genuinely frozen pure function of `(cfg, version)` — the property replay requires. 
The cost is owning a small projection encoder, the version-set tags, and **golden hash vectors** per version (a checked-in `(config, version) → hash` table) so "frozen" is CI-enforced, not merely intended. The projection substrate is what makes G4 true for old locks and what makes Part 2's replay sound. It is adopted at the reset (below), not incrementally. @@ -169,6 +171,8 @@ The projection substrate is what makes G4 true for old locks and what makes Part The common pattern: an **integer version stamped into the persisted artifact**, plus the ability to **read and replay older versions**, plus **lazy forward-migration on write**. We keep `ComponentLock.Version` (the lock *format* slot) fixed at `1` and carry the *content* version **inside the `InputFingerprint` token** (`v:sha256:…`) rather than in a separate struct field — one atomic value, no version/digest desync, no new TOML field for an old binary to mishandle. The Go-modules lesson is the deepest one: hashing *content* rather than struct shape is what makes additive metadata free — the canonical-projection substrate is our version of that lesson. +**Where we go *beyond* the precedent (stated honestly).** All four tools above keep exactly **one** active algorithm: Cargo/npm/Terraform rewrite the *whole* artifact to the current version on next touch (eager-on-write), and Go modules sidestep replay entirely by never re-migrating semantics. **None of them keeps N historical hashing algorithms alive simultaneously across an indefinitely-unmigrated fleet** — which is exactly Part 2's behavior. The citations support "version stamp + lazy forward-migrate on write"; they do *not* cover "frozen algorithms coexisting forever." That coexistence is justified here on its own terms (it is what avoids a fleet rebuild on every algorithm change), and its one real cost — append-only registry growth — is bounded by the [floor-advance cadence](#registry-floor-and-forced-migration), not by precedent. 
+ ### Where the hashing logic should live With the projection substrate the fingerprint algorithm decomposes into two steps. **Both are versioned together** by the single lock content version — the version pins the *entire* fingerprint computation, not just the field list: @@ -196,36 +200,92 @@ The original "lazy" instinct was right for Part 2 and wrong for Part 1: there is Replace `hashstructure.Hash(component, …)` with an explicit two-step pipeline: ```text -ComponentConfig ──projectV1(cfg)──► canonical bytes ──sha256──► configHash - (explicit field list, (stdlib) - sorted keys, emit-if-nonzero) +ComponentConfig ──project(cfg,1)──► canonical bytes ──sha256──► configHash + (version-tagged fields, (stdlib) + sorted keys, emit-if-nonzero) ``` -`projectV1` is hand-written and names exactly the fields v1 measures. It emits a canonical, sorted, self-delimiting byte stream (length-prefixed keys + values) so distinct field sets cannot collide, and it omits a field when its **resolved value is zero** (the omitempty behavior, now a property of the encoder, not a struct tag). A field whose zero value is build-meaningful is simply listed as *always-emit* in `projectV1`. +`projectV1` is the projection at version 1 — `project(cfg, 1)`. Field membership is declared **on each struct field** as a version-set in the `fingerprint` tag (`fingerprint:"v1..*"`); a single generic walker emits, in stable key order, every field whose set includes the target version, length-prefixing key+value so distinct field sets cannot collide. It omits a field when its **resolved value is zero** (omit-if-zero, an encoder property now, not a struct-tag toggle); a range prefixed with `!` (e.g. `!v1..*`) always-emits, for fields whose zero is build-meaningful. There is no per-version function — only the generic walker parametrized by version. (Grammar and recovery semantics: [Version-tagged field selection](#version-tagged-field-selection) below.) 
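
To make the walker concrete, here is a minimal sketch of `project(cfg, N)` over a flat struct. Several things in it are assumptions rather than the RFC's decided design: `demoConfig` stands in for `ComponentConfig`, `member` and `parseVersion` are hypothetical helper names, values are encoded with `fmt.Sprint`, and each key and value gets an 8-byte big-endian length prefix (the actual canonical encoding is still an open question). Nested structs, slices, and maps are deliberately out of scope.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"reflect"
	"sort"
	"strconv"
	"strings"
)

// demoConfig is a hypothetical stand-in for ComponentConfig (flat, exported fields only).
type demoConfig struct {
	Name    string `fingerprint:"v1..*"`  // measured from v1, omit-if-zero
	Jobs    int    `fingerprint:"!v1..*"` // zero is meaningful: always emit
	Foo     string `fingerprint:"v2..*"`  // additive field introduced at v2
	Comment string `fingerprint:"-"`      // never measured
}

// parseVersion parses "v3" into 3.
func parseVersion(s string) (int, error) {
	if !strings.HasPrefix(s, "v") {
		return 0, fmt.Errorf("bad version %q", s)
	}
	return strconv.Atoi(s[1:])
}

// member reports whether tag measures version n and, if so, whether the
// range that holds n is always-emit ("!").
func member(tag string, n int) (in, always bool, err error) {
	switch tag {
	case "":
		return false, false, fmt.Errorf("missing fingerprint tag")
	case "-":
		return false, false, nil
	}
	for _, m := range strings.Split(tag, ",") {
		bang := strings.HasPrefix(m, "!")
		loS, hiS, isRange := strings.Cut(strings.TrimPrefix(m, "!"), "..")
		lo, perr := parseVersion(loS)
		if perr != nil {
			return false, false, perr
		}
		hi := lo // a bare "v3" covers only v3
		if isRange && hiS == "*" {
			hi = n // open-ended: always reaches n
		} else if isRange {
			if hi, perr = parseVersion(hiS); perr != nil {
				return false, false, perr
			}
		}
		if n >= lo && n <= hi {
			return true, bang, nil
		}
	}
	return false, false, nil
}

// project emits canonical bytes for the fields that version n measures.
func project(cfg any, n int) ([]byte, error) {
	v, t := reflect.ValueOf(cfg), reflect.TypeOf(cfg)
	var pairs [][2]string
	for i := 0; i < t.NumField(); i++ {
		in, always, err := member(t.Field(i).Tag.Get("fingerprint"), n)
		if err != nil {
			return nil, fmt.Errorf("field %s: %w", t.Field(i).Name, err)
		}
		if in && (always || !v.Field(i).IsZero()) {
			pairs = append(pairs, [2]string{t.Field(i).Name, fmt.Sprint(v.Field(i).Interface())})
		}
	}
	sort.Slice(pairs, func(a, b int) bool { return pairs[a][0] < pairs[b][0] })
	var buf []byte
	for _, p := range pairs {
		for _, s := range p { // length-prefix key, then value
			buf = binary.BigEndian.AppendUint64(buf, uint64(len(s)))
			buf = append(buf, s...)
		}
	}
	return buf, nil
}

func main() {
	for n := 1; n <= 2; n++ {
		b, err := project(demoConfig{Name: "curl"}, n)
		if err != nil {
			panic(err)
		}
		fmt.Printf("v%d:sha256:%x\n", n, sha256.Sum256(b))
	}
}
```

Because `Foo` is unset in the demo config, the v1 and v2 projections produce the same bytes, which is the additive-field neutrality the design relies on. A missing tag surfaces here as a runtime error; the RFC's intent is a build or CI failure, which this sketch does not model.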
Three things this buys that `hashstructure` could not: -- **Frozen by construction.** `projectV1` references a pinned field list, so adding `Foo` to the struct later is invisible to it — `projectV1`'s output for an old config is unchanged. This is what makes Part 2's replay sound (Problem 6) and G4 true for *old* locks, not just new ones. -- **No method-set / receiver magic.** No `Includable`, no per-nested-struct method, no `v.(reflect.Value)` type-assert footgun. Field selection is ordinary code. -- **Golden-vector enforced.** A checked-in table of `(config, version) → hash` vectors is asserted in CI, so any accidental change to a historical `projectVN` fails the build. "Frozen" stops being a promise and becomes a test. +- **Frozen by construction.** A version's field set is fixed by tags that never change for a shipped version (golden vectors enforce it), so adding `Foo` to the struct later is invisible to `project(cfg, 1)` — its output for an old config is unchanged. This is what makes Part 2's replay sound (Problem 6) and G4 true for *old* locks, not just new ones. +- **No method-set / receiver magic.** No `Includable`, no per-nested-struct method, no `v.(reflect.Value)` type-assert footgun. Selection is a declarative tag the walker reads. +- **Golden-vector enforced.** A checked-in table of `(config, version) → hash` vectors is asserted in CI, so any accidental change to a historical projection — a tag edit that moves a shipped version's membership — fails the build. "Frozen" stops being a promise and becomes a test. The cost is owning the projection encoder and the golden vectors. That cost is paid once, at the reset, against a rebuild we are already doing. +### Version-tagged field selection + +Field membership in each version's projection is declared **on the struct field**, as a version-set in the existing `fingerprint` tag — not in a hand-written per-version function. One generic walker, `project(cfg, N)`, emits every field whose set includes `N`. 
This is the chosen mechanism; hand-written `projectVN` functions are the [Option B alternative](#alternatives-considered). + +**Grammar** (deliberately small): + +```ebnf +tag = "-" | member, { ",", member } ; +member = [ "!" ], range ; (* leading "!" ⇒ always-emit for this range *) +range = version, [ "..", ( version | "*" ) ] ; +version = "v", digit, { digit } ; +``` + +| Tag | Meaning | +| --- | --- | +| *(absent)* | **build failure** — every fingerprinted field must carry an explicit decision | +| `-` | never measured (unchanged from today) | +| `v1..*` | measured from v1 onward, omit-if-zero — the common "active field" case | +| `v1..v4` | measured v1–v4, then dropped | +| `v3..*` | introduced at v3 | +| `v1..v4,v6..*` | measured v1–v4, **dropped at v5, brought back at v6** | +| `!v1..*` | measured v1 onward, **always-emit** (zero value still hashes) | +| `v1..v4,!v5..*` | omit-if-zero v1–v4, then **always-emit from v5** (the temporal toggle) | + +`*` resolves to "this version and every later one," so an *active* field never needs a tag edit across a version bump — only a field that is *dropped* at the bump gets its range closed (`v1..*` → `v1..vN`). + +**Recovery is the property that justifies the range syntax.** The hard requirement: if we drop a field, then versions later realize we need it again, we must be able to bring it back *without* disturbing any frozen historical hash. The rule that guarantees it: **you only ever *add* a range for the *new* version; you never edit a shipped version's membership.** Walk it: + +- `Foo` tagged `v1..*`. At the v2 bump we drop it → edit to `v1..v1`. v1 still emits `Foo`; v2+ does not. +- At v5 we need it back → edit to `v1..v1,v5..*`. **v1's membership is unchanged (still in the set), v2–v4 unchanged (still out), only v5+ is added.** + +Every frozen output is byte-preserved, and the **golden vectors prove it**: the edit `v1..v1` → `v1..v1,v5..*` must leave the v1–v4 vectors identical or CI fails. 
The grammar lets you *express* the non-contiguous set; the golden vectors *forbid* rewriting history while doing so. Two recovery flavors, both covered: (a) field still on the struct (lingering for replay) → reopen its range; (b) field already physically deleted (floor passed it) → bring-back is just a fresh additive field tagged `vN..*`. Same outcome, no special case. + +**Always-emit is per-range, for the same reason.** Whether a field's *zero value emits* can change over time just as its membership can — so `!` flags an individual range, not the whole field. `v1..v4,!v5..*` means omit-if-zero through v4, then always-emit from v5. Toggling it is an *output-changing* edit (a zero-valued field starts or stops emitting), so it lands as a new range at a new version exactly like a drop/re-add — same output-preservation rule, same golden-vector enforcement. The walker just asks "which range holds N, and is it `!`?" This is why the earlier whole-field `always` flag was wrong: it could not be toggled temporally without abandoning the generic walker. + +**What tags version, and what they don't.** Tags version *membership* — which fields a version measures. They do **not** version *encoding* — how a field's bytes are formed, or how the combiner folds non-field inputs. So the generic walker absorbs additive / removal / bring-back changes as pure tag edits (zero code), while a genuine encoding or combiner change still ships as versioned code in `computeFP(N+1)` (the walker output + the combiner step frozen at N). The taxonomy's non-additive rows are exactly that small set. + +**Enforcement**, three layers: + +1. **No tag → build failure** restores the safe default the projection otherwise gives up (G-1): a forgotten field fails loudly instead of silently dropping out of the hash. The tag *is* the per-field completeness ledger — no separate audit, no field→key bridge. +2. 
**Well-formedness:** ranges parse, are sorted and non-overlapping, and name no version above `currentLockContentVersion` (open `*` excepted); a `!` prefix is per-range and orthogonal to those checks. A malformed or future-referencing set fails the build. +3. **Golden vectors** pin `(config, version) → hash`, so any edit that changes a *shipped* version's output for an existing config fails CI. The precise rule is **output-preservation**, not literal membership-freezing: adding an *omit-if-zero* field to the current version (`vN..*`) is the one no-bump edit, because no existing config set it so no existing vector moves; every *output-changing* edit instead defines a **new** version (closing or opening a range at the bump), leaving all earlier versions byte-identical. You can introduce membership for the new version freely, but you can never silently rewrite a shipped hash. + +This retires the `expectedExclusions` map in `fingerprint_test.go` outright: "no tag → fail" makes every decision explicit and local, and golden vectors catch accidental include→exclude edits — the map's two jobs, both subsumed. + ### Baseline v1 — omit-if-zero, no include-always legacy Because the reset rebuilds everything, there is **no pre-existing population to stay byte-compatible with.** That removes the single biggest constraint of the incremental plan: we do **not** need an `include-always` compatibility mode to preserve today's hashes. `projectV1` is the omit-if-zero projection from day one. There is no `computeFP1 = legacy include-always` entry to carry forever — the registry's floor *starts* at the clean projection. ```go -// projectV1 emits the canonical byte form of the fields v1 measures. -// Field selection is explicit code, not reflection — this is what freezes it. -// emit() length-prefixes key+value so distinct field sets cannot collide; -// it skips a field when the resolved value is zero (the omit-if-zero default). 
-func projectV1(c *ComponentConfig) []byte { +// Membership is declared per field as a version-set tag; one generic walker +// emits the fields whose set includes the target version, in stable key order. +// What freezes a version is that its tags never change once shipped (golden +// vectors enforce it) — not that the walker is bespoke. +type ComponentConfig struct { + Upstream string `fingerprint:"v1..*"` // measured v1+, omit-if-zero + Patches []string `fingerprint:"v1..*"` // omit-if-zero (nil and [] both → absent) + StripDebug bool `fingerprint:"!v1..*"` // always-emit: zero (false) is build-meaningful + Internal string `fingerprint:"-"` // never measured + // … every fingerprinted field carries an explicit tag; absent ⇒ build failure … +} + +func project(c *ComponentConfig, version int) []byte { var b canonicalBuf - b.emit("upstream", c.Upstream) // omit-if-zero - b.emit("patches", c.Patches) // omit-if-zero (nil and [] both → absent) - b.emitAlways("strip_debug", c.StripDebug) // always: zero (false) is build-meaningful - // … one line per measured field, in a fixed order … + for _, f := range fingerprintFields(c) { // reflection, cached, sorted by key + r := f.set.rangeContaining(version) + if r == nil { + continue // field not measured at this version + } + b.emit(f.key, f.value, r.always) // r.always (the range's '!') ⇒ emit even when zero + } return b.Bytes() } ``` @@ -235,14 +295,14 @@ func projectV1(c *ComponentConfig) []byte { - Two configs that both resolve a field to zero build identically → hashing them the same is **correct**, not a collision. - "Unset" never reaches the hasher — it has already been resolved to its default. If the default is non-zero, the field is non-zero and is emitted anyway. If the default *is* zero, then unset and explicit-zero resolve identically → same build → same hash → correct. -So the classic false-negative requires absence ≠ zero-default *at the point of hashing*, and post-merge resolution closes that gap. 
The load-bearing invariant is **G5's guarantee restated structurally: the fingerprint must see exactly the build-effective resolved config.** That invariant must already hold, or fingerprinting is broken independently of this change. `emitAlways` is the escape hatch for the rare field whose zero value is build-meaningful. +So the classic false-negative requires absence ≠ zero-default *at the point of hashing*, and post-merge resolution closes that gap. The load-bearing invariant is **G5's guarantee restated structurally: the fingerprint must see exactly the build-effective resolved config.** That invariant must already hold, or fingerprinting is broken independently of this change. A `!`-prefixed range is the escape hatch for the rare field whose zero value is build-meaningful. **Result:** additive fields are drift-neutral **by construction** (G4) — a newly added field, listed omit-if-zero in `projectVN`, emits nothing for any component that does not set it, so it is invisible to every lock that leaves it unset, old or new. Adding it moves no existing hash (no shipped lock could have set a field that did not yet exist), so it needs no version bump. Only setters drift (G2). #### Edge cases under omit-if-zero -- **Meaningful zero with a non-zero default** (e.g. `int Jobs` defaulting to `4`, where `0` means serial). Post-merge: unset → `4` (emitted), explicit `0` → omitted. These build differently *and* hash differently, so there is no collision — they are consistent. Use `emitAlways` only if a zero value must be distinguishable from a future change of default. -- **nil vs empty slice.** A missing TOML key → nil → omitted; `key = []` → non-nil empty → emitted. For any slice/map field where an explicit-empty value is reachable and build-meaningful, use `emitAlways` so nil and empty both hash. +- **Meaningful zero with a non-zero default** (e.g. `int Jobs` defaulting to `4`, where `0` means serial). Post-merge: unset → `4` (emitted), explicit `0` → omitted. 
These build differently *and* hash differently, so there is no collision — they are consistent. Use a `!` range only if a zero value must be distinguishable from a future change of default. +- **nil vs empty slice.** A missing TOML key → nil → omitted; `key = []` → non-nil empty → emitted. For any slice/map field where an explicit-empty value is reachable and build-meaningful, use a `!` range so nil and empty both hash. ### The reset load-out — what to spend the free rebuild on @@ -255,7 +315,7 @@ The reset rebuild is a budget. Spend it on the irreversible / cutover-only chang 5. **Unify on `sha256` everywhere**, retiring the `uint64`→decimal-string wart from the `hashstructure` era. One hash format, one encoding. 6. **Do every pending rename / default-normalization now.** Renaming a field, moving content between structs, or changing a baked-in default is a one-way door under Part 2 (it needs a version bump + replay); at the reset it is free because everything rebuilds anyway. This is where the schema-axis "hardest cases" get absorbed cheaply. -**Anti-goal:** do *not* burn reset budget on additive fields — Part 2 handles those for free, forever. The single success criterion for the load-out is that **no second coordinated cutover is ever needed**: after the reset, every future change must be expressible as either a free additive field or a lazy Part 2 version bump. +**Anti-goal:** do *not* burn reset budget on additive fields — Part 2 handles those for free, forever. The success criterion for the load-out is that **no *routine* change ever forces a second coordinated cutover**: after the reset, every ordinary change must be expressible as either a free additive field or a lazy Part 2 version bump. Retiring an *old* content version is the one sanctioned exception — a fleet-wide `component migrate` is itself a deliberate, planned, reset-grade event (see [Registry floor](#registry-floor-and-forced-migration)); the goal is that nothing *unplanned* ever forces one. 
### The lock changes at the reset — atomic token + forced upgrade @@ -315,6 +375,19 @@ Stamp one **lock content-hash version** into the lock (the `v1:` prefix of the a } const currentLockContentVersion = 1 // == the reset baseline; bumps only on a real algo change const minSupportedLockContentVersion = 1 // floor; raise only after a deliberate `component migrate` + + // init enforces the registry/floor contract: every version in + // [minSupported, current] MUST have an entry, or replay panics at + // runtime instead of failing the build. The map and the two consts are + // edited independently, so this assertion is load-bearing, not decorative. + func init() { + for v := minSupportedLockContentVersion; v <= currentLockContentVersion; v++ { + if _, ok := lockAlgos[v]; !ok { + panic(fmt.Sprintf("lockAlgos missing version %d in [%d,%d]", + v, minSupportedLockContentVersion, currentLockContentVersion)) + } + } + } ``` 3. In `checkFingerprintFreshness`, compute at the **current** version. On mismatch, if the lock's token version `< current`, recompute at the lock's token version. If *that* matches the stored digest, the inputs are unchanged and only the algorithm evolved → treat as `FreshnessCurrent` and flag for silent re-stamp. Otherwise → `FreshnessStale`. (The resolution hash reuses `computeRes1` until its algorithm first changes — see scope note.) @@ -369,9 +442,10 @@ This is correct *by contract* (a v1 lock promises freshness under the v1 input s Lazy migration means an untouched lock can sit at an old version **indefinitely** (G3 by design). That makes "keep the last *N* versions" a **correctness cliff, not a tuning knob**: if pruning drops the compute function a lock still depends on, replay becomes impossible → forced `FreshnessStale` → the mass rebuild/rewrite (and, via the downstream-consumer analysis below, mass changelog churn) the whole design exists to avoid. 
So the floor must be explicit and paired with an escape hatch, decided now: - **`minSupportedLockContentVersion`** is a hard floor. A lock below it cannot be replayed and is treated as `Stale`. Dropping a registry entry is therefore a deliberate, breaking, announced act — never incidental cleanup. -- **`component migrate`** (Open Q#5, promoted to a requirement) force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. **Contract:** it is *offline* — it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** flip the release signal (unlike `--bump`). A migration that re-resolved or bumped would no longer be a pure version advance. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal — each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). +- **`component migrate`** (Open Q#5, promoted to a requirement) force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. 
**Contract:** it is *offline* — it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). It *does*, however, move every token's digest — advancing the algorithm version is the whole point — so a fleet-wide migrate **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved token as notable, exactly as [the synthetic-history trap](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. That is *why* migrate is reset-grade and rare, not a free background sweep — the release churn is the deliberate cost of retiring a version. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal — each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). +- **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine — left alone, the registry, golden vectors, and deprecated tombstone fields grow **append-only** (a real cost the opaque-token model accepts; see the manifest alternative). Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion − minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. The spread, not the absolute version number, is the quantity kept small. -**Mixed-toolchain hazard — handled by force-rehash, not a format gate.** The classic trap is an older binary regressing a newer lock. 
Because the lock *format* never bumps, an old binary *can* write a reset lock — but the **atomic token** makes that harmless: it stamps a legacy (prefix-less) or lower-`v` hash, which the next new-binary run detects as sub-floor and **force-rehashes** to the current version. Self-correcting, never silent corruption. The symmetric residual is a binary that predates a content-version `v2` and meets a `v2` token it cannot replay: it must **error** (the token version exceeds its `currentLockContentVersion`), not silently restamp at `v1`. A one-line write guard (refuse to write a token whose version exceeds the binary's `currentLockContentVersion`) plus the CI version-pin closes that direction. +**Mixed-toolchain hazard — bounded by the version-pin, not auto-repair.** The classic trap is an older binary regressing a newer lock. Because the lock *format* never bumps, an old binary *can* write a reset lock, stamping a legacy (prefix-less) or lower-`v` hash. In the **working tree** this is self-correcting: the next new-binary run detects the sub-floor token and force-rehashes it to the current version. But "self-correcting" stops at the working tree — if a downgraded lock is **committed**, `FindFingerprintChanges` reads `v1 → legacy → v1` as two real release events, and a published `%autorelease` increment cannot be withdrawn. So the load-bearing guard against *committed* phantom releases is the **CI version-pin**: post-cutover, no old binary may run the `update`-and-commit step. (The force-rehash only cleans the working tree; it does not undo history.) The *symmetric* residual — a binary that predates content-version `v2` meeting a `v2` token it cannot replay — is closed by a **required** write-time guard (Open Q#5, now a requirement): refuse to write a token whose version exceeds the binary's `currentLockContentVersion`, erroring rather than silently restamping at `v1`. 
Note this guard lives in the binary doing the write, so it constrains *newer-but-not-newest* binaries; it does **not** retroactively constrain a genuinely *old* binary — that direction is the version-pin's job. #### Replaying across a changed input set — `{a,b,c}` → `{a,b,d}` @@ -387,14 +461,14 @@ Split the change into its two halves; they are handled independently: So the bump is **not breaking**: replay answers "were the *old* inputs unchanged?" without rebuilding. -**The one constraint replay still imposes: a retained `projectVN` must be able to read every field it lists.** Unlike the `hashstructure` substrate, `projectV1` is immune to field *additions* (it never reflects the live struct). It is *not* immune to field *removal*: `projectV1` names `c` explicitly, so physically deleting `c` from the struct stops `projectV1` from compiling. Removal is therefore the one edit still gated by a **deprecate-then-delete** two-step, both non-breaking: +**The one constraint replay still imposes: a field a retained version still measures must stay on the struct.** The projection is immune to field *additions* (the walker only emits fields whose tag set includes the target version, so a new field is invisible to old versions). It is *not* immune to field *removal*: v1 still measures `c` (its tag set includes v1) and the retained v1 golden vector sets `c`, so physically deleting `c` from the struct makes that vector's config unconstructable → the golden-vector test fails to build. (Hand-written `projectVN` functions would make this a *compile* error instead — a marginally stronger guard the tag walker trades for an equally-blocking CI one; see [D2](#d2--version-tagged-field-selection--golden-vectors).) Removal is therefore the one edit still gated by a **deprecate-then-delete** two-step, both non-breaking: -1. **Bump to v2 measuring `{a,b,d}` but keep field `c` on the struct** so `projectV1` can still read it for replay (`projectV2` simply does not list `c`). 
Every old lock replays clean at v1, is recognized as unchanged, lazy re-stamps to v2. Zero forced rebuilds. +1. **Bump to v2 measuring `{a,b,d}` but keep field `c` on the struct** so the v1 projection can still read it for replay (close `c`'s tag to `v1..v1`, so v2 does not measure it). Every old lock replays clean at v1, is recognized as unchanged, lazy re-stamps to v2. Zero forced rebuilds. 2. **Only after the floor passes v1** (`minSupportedLockContentVersion = 2`, ideally after a deliberate `component migrate`) physically delete field `c` and `projectV1`. -> **Invariant:** a field may be physically removed from the config struct only after *every* retained `projectVN` that lists it has been retired below `minSupportedLockContentVersion`. Retained projection functions and the struct they read must stay in sync — you cannot delete a field a live version still names. +> **Invariant:** a field may be physically removed from the config struct only after *every* retained version whose tag set includes it has been retired below `minSupportedLockContentVersion`. Retained versions and the struct they read must stay in sync — you cannot delete a field a live version's golden vector still sets. -This makes "drop an input" a lazy, per-component migration rather than a fleet-wide rebuild — at the cost of carrying a deprecated field on the struct until its projection function ages out. +This makes "drop an input" a lazy, per-component migration rather than a fleet-wide rebuild — at the cost of carrying a deprecated field on the struct until the last version measuring it ages out. #### First post-reset customer @@ -410,7 +484,7 @@ This is the on-disk TOML axis. It is **independent** of the fingerprint axis and The critical invariant: **migrate old TOML → latest canonical struct, then project once.** A semantically no-op migration (rename `foo`→`bar`) must produce the *same* canonical struct, hence the same projection bytes, hence no drift. 
This is what keeps the schema axis **orthogonal** to the lock axis: a faithful `config migrate` is a pure re-encoding that moves *no* fingerprint, so it never triggers a `component migrate`. If a TOML change genuinely alters build meaning, that is a content-version bump (Part 2), not a `config migrate`. -**Resolved by projection:** the old `hashstructure` caveat — that it mixed `reflect.Type.Name()` into the hash, so renaming a Go struct moved every fingerprint even with identical content — **no longer applies.** `projectVN` hashes only the explicit field bytes it emits, never the type name. A struct rename is now genuinely drift-neutral. +**Resolved by projection:** the old `hashstructure` caveat — that it mixed `reflect.Type.Name()` into the hash, so renaming a Go struct moved every fingerprint even with identical content — **no longer applies.** The generic walker, `project(cfg, N)`, hashes only the field keys and value bytes it emits, never the type name. A struct rename is now genuinely drift-neutral — **pinned by a golden test** (rename a fingerprinted struct while keeping its fields identical → byte-identical digest), so the property is CI-enforced, not just asserted here. 
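
A rough sketch of what such a rename test could assert, under stated assumptions: `RenamedConfig` and `digestFields` are hypothetical, field names stand in for the walker's keys, only `string` and `bool` fields are handled, and every field is emitted (no omit-if-zero) to keep the example short. The point is only that nothing in the digest path reads the struct's type name.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"reflect"
	"sort"
)

// Two struct types with identical fields but different Go type names.
type ComponentConfig struct {
	Upstream   string
	StripDebug bool
}

type RenamedConfig struct {
	Upstream   string
	StripDebug bool
}

// digestFields hashes (field name, field value) pairs in sorted key order.
// It never calls reflect.Type.Name(), so the struct's own name cannot
// influence the result.
func digestFields(v any) ([32]byte, error) {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Struct {
		return [32]byte{}, fmt.Errorf("want struct, got %s", rv.Kind())
	}
	rt := rv.Type()
	encoded := map[string][]byte{}
	keys := make([]string, 0, rt.NumField())
	for i := 0; i < rt.NumField(); i++ {
		name, fv := rt.Field(i).Name, rv.Field(i)
		switch fv.Kind() {
		case reflect.String:
			encoded[name] = []byte(fv.String())
		case reflect.Bool:
			encoded[name] = []byte{0}
			if fv.Bool() {
				encoded[name] = []byte{1}
			}
		default:
			return [32]byte{}, fmt.Errorf("unsupported kind %s on %s", fv.Kind(), name)
		}
		keys = append(keys, name)
	}
	sort.Strings(keys)

	var buf bytes.Buffer
	for _, k := range keys {
		// Length-prefix key and value so the stream is self-delimiting.
		for _, chunk := range [][]byte{[]byte(k), encoded[k]} {
			var n [8]byte
			binary.BigEndian.PutUint64(n[:], uint64(len(chunk)))
			buf.Write(n[:])
			buf.Write(chunk)
		}
	}
	return sha256.Sum256(buf.Bytes()), nil
}

func main() {
	before, _ := digestFields(ComponentConfig{Upstream: "abc123", StripDebug: true})
	after, _ := digestFields(RenamedConfig{Upstream: "abc123", StripDebug: true})
	fmt.Println("rename is drift-neutral:", before == after)
}
```

A real golden test would compare against a checked-in digest rather than a second live struct, so that a change to the encoder itself is also caught.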
## Pipeline @@ -434,12 +508,17 @@ The versioned-replay story in Part 2 must hold for **every** reader of `InputFin | -------- | ----- | -------- | --------------------------- | | `checkFingerprintFreshness` (resolver) | recomputed identity | vs stored token | Replay at token version (Part 2 core) | | `component update` `Changed` decision | recomputed identity | vs stored token | **Replay before `Changed`** (see churn policy seam) | -| `changed.go` `classifyComponent` / `haveMatchingFingerprints` (CI classifier) | stored token strings | version-blind compare | **Replay-aware compare** — a v1 token must match its v2 re-stamp as "same" | +| `changed.go` `classifyComponent` / `haveMatchingFingerprints` (CI classifier) | stored token strings (two historical git refs) | string compare | **String-only — must NOT replay** (no inputs available, and replaying historical configs would violate the no-recompute invariant); kept honest by strict-lazy churn, exactly like `FindFingerprintChanges` | | `synthistory.FindFingerprintChanges` | stored token strings across git history | adjacent commits | **No change needed — if migration stays lazy** | | `synthistory.BuildDirtyChange` | recomputed (current ver) | vs stored `headLock` token | **Replay at headLock version** before declaring dirty | | `ResolutionInputHash` staleness/write | recomputed resolution hash | vs stored | **Shares the version; replay reserved, not yet wired** | -The `changed.go` classifier is the easily-missed fifth consumer: [`classifyComponent`](../../../internal/app/azldev/cmds/component/changed.go) and `haveMatchingFingerprints` do raw, version-blind token compares to decide CI classification. Post-switchover a v1 token and its semantically-identical v2 re-stamp are different strings, so a naive compare would misclassify the component as changed. It needs the same replay-aware comparison as the freshness check (compare at the older token's version), not a raw string equality. 
+**Two comparator classes, not one — and only one of them can replay.** The consumers split cleanly by *what they hold*: + +- **Current-tree comparators** (`checkFingerprintFreshness`, `update`'s `Changed`, `BuildDirtyChange`) recompute against *live inputs*, so they **can and must** replay at the stored token's version. Feasible and invariant-safe. +- **Stored-vs-stored historical comparators** (`FindFingerprintChanges`, `changed.go`'s `classifyComponent`/`haveMatchingFingerprints`) hold only committed token *strings* from two git refs — no config, no FS, no inputs. They **cannot** replay, and replaying would require recomputing a historical fingerprint, which the [forever-invariant](#back-compat-invariant--synthetic-history-reads-stored-strings-never-recomputes) forbids outright. Both stay **string-only**, kept honest by the *same* strict-lazy churn policy: under lazy migration a v1→v2 re-stamp only ever rides a commit whose inputs genuinely changed, so a raw string compare never sees a version-only delta. + +The `changed.go` classifier was the easily-missed member of the *second* class. The fix is **not** to make it replay-aware (impossible, and invariant-violating) — it is to confirm it lives under the strict-lazy guarantee, exactly as `FindFingerprintChanges` does. An earlier draft of this table wrongly demanded replay here; that obligation is removed. ### The synthetic changelog/release path is the real hazard @@ -460,7 +539,7 @@ The single shared content version (the token's `v` prefix) covers it (see [Bo - `ResolutionInputHash` does **not** feed `synthistory` — so an algorithm change can never mint a phantom changelog/release (that hazard is fingerprint-only). Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption. - It is a flat seven-field SHA256, not a struct walk, so the projection substrate leaves it untouched — it has no pending version event. 
Its registry slot stays `computeRes1` until its inputs genuinely change. -**Decision:** the atomic token format is fixed at the reset, so there is no irreversible key-naming decision left; wire fingerprint replay in Part 2's first PR; reserve resolution replay (slot present, prior fn reused) and wire it the day `ComputeResolutionHash` first changes — a localized follow-up with no schema change. KISS/YAGNI on the second replay. +**Decision:** the atomic token format is fixed at the reset, so there is no irreversible key-naming decision left; wire fingerprint replay in Part 2's first PR; reserve resolution replay (slot present, prior fn reused) and wire it the day `ComputeResolutionHash` first changes — a localized follow-up with no schema change. KISS/YAGNI on the second replay. **One constraint for that day:** give `ResolutionInputHash` its *own* version prefix (decoupled from the `InputFingerprint` token) when replay is wired. Sharing one integer is fine *now* because resolution has no pending version event; but a *resolution-only* algorithm change must not be forced to advance the `InputFingerprint` token — that field feeds `FindFingerprintChanges`, so bumping it for a resolution change would mint a phantom release. Independent prefixes keep a resolution bump off the release-bearing field. 
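
To illustrate why independent prefixes matter, here is a small sketch of version-prefixed tokens checked against two separate registries. Everything beyond the `vN:` prefix is an assumption: the digest layout, the `check` and `hashWith` helpers, the three-state result, and the stand-in algorithms are hypothetical. It also goes straight to the token's own version instead of computing at the current version first, which is assumed to give the same verdicts.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// computeFn stands in for one frozen per-version algorithm.
type computeFn func(inputs string) string

type freshness int

const (
	stale freshness = iota
	current
	currentRestamp // inputs unchanged, only the algorithm version moved
)

// parseToken splits "vN:<digest>".
func parseToken(tok string) (int, string, error) {
	prefix, digest, ok := strings.Cut(tok, ":")
	if !ok || len(prefix) < 2 || prefix[0] != 'v' {
		return 0, "", fmt.Errorf("token %q has no version prefix", tok)
	}
	v, err := strconv.Atoi(prefix[1:])
	if err != nil {
		return 0, "", fmt.Errorf("token %q: bad version: %w", tok, err)
	}
	return v, digest, nil
}

// check replays at the token's own version when it is older than cur.
func check(tok, inputs string, algos map[int]computeFn, cur int) (freshness, error) {
	v, digest, err := parseToken(tok)
	if err != nil {
		return stale, nil // legacy prefix-less token: treated as sub-floor
	}
	if v > cur {
		return stale, fmt.Errorf("token version %d exceeds supported %d", v, cur)
	}
	fn, ok := algos[v]
	if !ok {
		return stale, nil // below the registry floor, cannot replay
	}
	if fn(inputs) != digest {
		return stale, nil
	}
	if v < cur {
		return currentRestamp, nil
	}
	return current, nil
}

// hashWith builds a stand-in algorithm; label keeps versions distinct.
func hashWith(label string) computeFn {
	return func(inputs string) string {
		sum := sha256.Sum256([]byte(label + "\x00" + inputs))
		return hex.EncodeToString(sum[:])
	}
}

func main() {
	// Separate registries: resolution has moved to v2, the fingerprint has not.
	fpAlgos := map[int]computeFn{1: hashWith("fp1")}
	resAlgos := map[int]computeFn{1: hashWith("res1"), 2: hashWith("res2")}

	fpTok := "v1:" + fpAlgos[1]("a,b,c")
	resTok := "v1:" + resAlgos[1]("a,b,c")

	fp, _ := check(fpTok, "a,b,c", fpAlgos, 1)
	res, _ := check(resTok, "a,b,c", resAlgos, 2)
	fmt.Println("fingerprint token untouched:", fp == current)
	fmt.Println("resolution token needs restamp only:", res == currentRestamp)
}
```

With a shared integer, the resolution bump in this example would have forced the fingerprint token to `v2` as well, which is the phantom-release case the constraint is meant to rule out.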
## Design decisions @@ -470,23 +549,24 @@ Both can omit zero values; the decisive difference is **whether an old algorithm | | Canonical projection (chosen) | `hashstructure` + `Includable` | | --- | --- | --- | -| Old algorithm frozen | Yes — explicit pinned field list | No — reflects the live struct/method-set | +| Old algorithm frozen | Yes — version-tagged fields, golden-vector pinned | No — reflects the live struct/method-set | | Sound replay (Part 2) | Yes | No (the disqualifier) | -| Meaningful empties | `emitAlways` per field | `fingerprint:"always"` per field | +| Meaningful empties | `!`-prefixed range per field | `fingerprint:"always"` per field | | Type-name in hash | No (rename is drift-neutral) | Yes (rename moves every hash) | -| Plumbing | Projection encoder + golden vectors | Value-receiver `HashInclude` on every nested struct + `v.(reflect.Value)` assert | +| Plumbing | Generic walker + version tags + golden vectors | Value-receiver `HashInclude` on every nested struct + `v.(reflect.Value)` assert | `Includable` keeps today's hashes byte-identical, which mattered for an *incremental* rollout — but that property is worthless once the reset rebuilds everything anyway, and it comes attached to a substrate that makes replay unsound. Projection trades byte-compatibility (which we are spending on the coordinated cutover regardless) for frozen replay (which we need forever). Adopted at the reset. -### D2 — Explicit field lists + golden vectors over reflection tags +### D2 — Version-tagged field selection + golden vectors -Field selection lives in `projectVN` as ordinary, explicit Go code (one `emit`/`emitAlways` line per measured field), not in struct tags read by a reflective walker. Rationale: +Field membership lives in a per-field version-set tag (`fingerprint:"v1..*"`) read by one generic walker — not in N hand-written functions, and not in the binary include/exclude tag of today's reflective audit. 
Rationale: -- The *unsafe* failure direction is the false-negative (a meaningful field silently omitted → missed rebuild). An explicit list makes "what does v1 measure?" greppable in one function, and the **golden-vector test** turns any accidental change to a historical projection into a CI failure — a far stronger guard than a tag-presence audit. -- It forces the "is this field's zero value build-meaningful?" decision at the call site (`emit` vs `emitAlways`), with full context. -- It removes the `Includable` nested-struct trap entirely: there is no per-struct method to forget, no decorative tag that passes the audit while silently hashing a zero. +- **The unsafe direction is the false-negative** (a meaningful field silently omitted → missed rebuild → stale artifact). A *mandatory* tag — absent → build failure — makes the include/exclude decision impossible to forget, restoring the safe default a bare hand-written list quietly gives up (G-1). The tag *is* the per-field completeness ledger: no separate audit, no field→emit-key bridge. +- **Version-awareness is declarative.** A field's whole lifecycle — introduced at v3, dropped at v5, revived at v8 — is one greppable string on the field (`v3..v4,v8..*`), not a diff smeared across three function bodies. Recovery (bring-back) is *expressible* precisely because the set is non-contiguous. +- **Golden vectors freeze it.** Editing a shipped version's output changes a checked-in `(config, version) → hash` vector → CI failure — the same backstop that would protect hand-written functions, minus the per-version boilerplate. +- **It retires `expectedExclusions`.** The map in `fingerprint_test.go` exists to (a) force a decision on every field and (b) catch accidental exclusion-tag removal; no-tag-fail and golden vectors subsume both. -The cost is writing `projectVN` by hand instead of leaning on reflection. That is the point: hand-written selection is what makes the function frozen and auditable. 
+The one thing hand-written functions do better: field *removal* is **compile-enforced** there (deleting a field a retained `projectVN` still names won't compile), where the tag walker downgrades it to an equally-blocking *golden-vector* build failure. That single marginal loss buys declarative lifecycles, native completeness, and first-class recovery — a good trade. The hand-written variant is kept as [Option B](#alternatives-considered). ### D3 — Atomic self-describing token; no format bump, reconcile via force-rehash @@ -496,7 +576,7 @@ The lock **format** `Version` stays at `1`. An earlier draft bumped it (1→2) a ### D4 — Project to bytes, not a `ConfigHash()` method on the type -`projectVN(config) []byte` returns canonical bytes; the combiner in `fingerprint` owns the `sha256` and the version dispatch. A `ConfigHash()` method that returns a finished hash was rejected: it drags crypto + versioning onto a data type, and it tempts callers to route around the version registry to get a raw, version-agnostic hash. Returning bytes keeps the config type ignorant of versioning, and keeps the combiner the **sole version authority**. See [the seam note](#where-the-hashing-logic-should-live). +`project(config, version) []byte` returns canonical bytes; the combiner in `fingerprint` owns the `sha256` and the version dispatch. A `ConfigHash()` method that returns a finished hash was rejected: it drags crypto + versioning onto a data type, and it tempts callers to route around the version registry to get a raw, version-agnostic hash. Returning bytes keeps the config type ignorant of versioning, and keeps the combiner the **sole version authority**. See [the seam note](#where-the-hashing-logic-should-live). ## Alternatives considered @@ -505,14 +585,16 @@ The lock **format** `Version` stays at `1`. An earlier draft bumped it (1→2) a - **Parallel versioned structs with per-struct `Hash()`** — couples locks to Go type identity and duplicates hashing logic per version. 
Rejected in favor of Part 2's integer-versioned combiner over frozen projections. - **Bump the lock format `Version` 1→2 as a poison pill** (an earlier draft's choice) — makes old binaries hard-reject reset locks. Rejected: it also blocks old binaries from reading pins to queue a build, and it is unnecessary, since the content-version registry already force-rehashes any sub-floor or downgraded token (D3). Same-format + force-rehash keeps old binaries useful without risking silent corruption. - **Eager fleet-wide migration as the steady-state mechanism** — rewriting every lock on every algorithm change is the mass-churn the design exists to prevent. Rejected for the steady state. The *reset* is a deliberate, one-time, operator-driven eager pass riding an already-scheduled rebuild — the sanctioned exception, not the rule; `component migrate` is its post-reset equivalent for retiring an old version. +- **Hand-written per-version `projectVN` selection functions (instead of version tags).** Each version gets a bespoke `func projectVN(c) []byte` with one explicit `emit`/`emitAlways` line per measured field. *Win:* field removal is **compile-enforced** — deleting a struct field a retained `projectVN` still names won't compile (the tag walker downgrades this to a CI-time golden-vector failure). *Losses:* membership is smeared across N function bodies instead of one declarative tag per field; "bring a field back a few versions later" has no first-class expression (you re-add an `emit` line, with nothing tying it to the field's earlier life); and the mandatory-decision property (G-1) needs a *separate* completeness test with an awkward field→emit-key bridge, where the tag simply *is* the ledger. Rejected in favor of version tags: the declarative lifecycle, native completeness, and expressible recovery outweigh trading one compile-time guard for an equally-blocking CI guard. 
+- **Per-field hash manifest in the lock (instead of one opaque token).** Store `{field → hash}` (à la `go.sum`) rather than a single `v:sha256:…` digest. *Genuine wins:* dropping a field becomes ignoring its manifest line — no projection kept alive for replay, so the **deprecate-then-delete two-step and the registry-retirement deadlock** (the append-only growth above) both vanish; and the stored-vs-stored historical comparators become structural set-diffs rather than version-blind string compares. *Why the opaque token still wins for azldev:* (1) the projection substrate **already** delivers additive immunity (G4) — the manifest's headline draw — so that advantage is moot, not additive; (2) the manifest does **not** kill the false-fresh hazard, contrary to first impression — an old lock has *no line* for a newly-measured input, so there is still no baseline to detect a change to it (the blind spot is relocated, not removed); (3) it makes *algorithm evolution* — the entire point of Part 2 — **harder**, needing per-field versioning where the token needs one integer for the whole algorithm; and (4) it bloats every lock to O(fields × components) (the well-known `go.sum` size cost). The manifest is the better tool for a *static* input set that mainly grows and shrinks; the opaque token + single version is the better tool for an *evolving hashing algorithm*, which is azldev's actual problem. Recorded explicitly because the reset bakes the storage model in — token-vs-manifest is irreversible after PR B — and the retirement deadlock the manifest would have dissolved is instead answered by the floor-advance cadence above. ## Incremental delivery The reset (Part 1) must land as one coherent change at the dev→prod cutover; its pieces are independently reviewable but ship together because they all move the hash. -1. 
**PR A (substrate)**: `projectVN` encoder (`canonicalBuf`, `emit`/`emitAlways`), `projectV1` with the explicit field list, `sha256` combiner, and the golden-vector test. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. Unit tests: a field absent from `projectV1` is invisible to the digest; `emitAlways` fields hash even at zero; golden vectors pin the v1 output. +1. **PR A (substrate)**: the canonical encoder (`canonicalBuf`, `emit` with a per-range always flag), the generic tag-driven `project(cfg, N)` walker + version-set tag parser, **version tags on every fingerprinted field** (absent → build failure), the `sha256` combiner, and golden vectors. The mandatory-tag test replaces both the retired `TestAllFingerprintedFieldsHaveDecision` audit and its `expectedExclusions` map — the G-1 fix is now native to the tag, no field→key bridge needed. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. Unit tests: a field tagged `v2..*` is invisible to a v1 projection; a `!`-prefixed range hashes even at zero; a field with **no** `fingerprint` tag fails the build; golden vectors pin v1; editing a shipped version's tag membership so an existing config's output moves fails a golden vector; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser. 2. **PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. Lock format `Version` stays `1`. Ships at the cutover; absorbed by the scheduled rebuild. Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. -3. 
**PR C (Part 2 machinery)**: the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`), `ComputeIdentityAt`, replay-before-`Changed` in `update.go`, and replay in `checkFingerprintFreshness`, `BuildDirtyChange`, **and the `changed.go` classifier**. Resolution replay reserved (slot reuses `computeRes1`). With only `v1` registered this is inert but proven. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write. +3. **PR C (Part 2 machinery)**: the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`), `ComputeIdentityAt`, and replay at the three *current-tree* sites — replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, and `BuildDirtyChange`. The two *historical* comparators (`FindFingerprintChanges`, `changed.go`'s `classifyComponent`) stay **string-only** and rely only on the strict-lazy guarantee, not replay. Resolution replay reserved (slot reuses `computeRes1`). **Not fully inert:** this PR switches the live current-tree compares from raw-string to replay-aware *on merge* — only the *registry dispatch* is dormant while just `v1` exists, and `BuildDirtyChange`'s replay is a hard prerequisite for any later PR that registers `v2`. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write. 4. **PR D (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) — add a field absent from `projectV1` and set it on one component; assert only that lock drifts and every other lock is byte-identical. 5. **PR E (config schema axis, later)**: `schema-version` field + load-time canonical migration + the `config migrate` command. Gated on the first post-reset non-additive TOML change not already absorbed by the reset's normalization pass. 
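The replay behaviour that PR C's unit tests describe (unchanged inputs under an old version stay `Current`, changed inputs go `Stale`) can be sketched as below. This is a minimal illustration under stated assumptions, not the RFC's implementation: the names `lockAlgos`, `currentLockContentVersion`, and `minSupportedLockContentVersion` come from the text, while the `identityFn` signature, the toy `computeV1`/`computeV2` algorithms, and `checkFreshness` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// identityFn computes a content digest from resolved inputs.
// The real signature is not given in the RFC; this one is an assumption.
type identityFn func(inputs []string) string

// digest hashes the inputs joined by a separator. Two different separators
// stand in for two hypothetical algorithm versions that encode differently.
func digest(sep string, inputs []string) string {
	sum := sha256.Sum256([]byte(strings.Join(inputs, sep)))
	return hex.EncodeToString(sum[:])
}

func computeV1(inputs []string) string { return digest("\n", inputs) }
func computeV2(inputs []string) string { return digest("\x00", inputs) }

const (
	minSupportedLockContentVersion = 1 // the floor: below this, no replay
	currentLockContentVersion      = 2 // what new stamps would use
)

// lockAlgos maps a content version to its frozen compute function.
var lockAlgos = map[int]identityFn{1: computeV1, 2: computeV2}

type freshness string

const (
	freshnessCurrent freshness = "current"
	freshnessStale   freshness = "stale"
)

// checkFreshness replays the algorithm the lock was stamped with, so an
// algorithm bump alone never reports drift. Sub-floor or unknown versions
// cannot be replayed and are treated as stale.
func checkFreshness(storedVersion int, storedDigest string, inputs []string) freshness {
	algo, ok := lockAlgos[storedVersion]
	if storedVersion < minSupportedLockContentVersion || !ok {
		return freshnessStale
	}
	if algo(inputs) == storedDigest {
		return freshnessCurrent
	}
	return freshnessStale
}

func main() {
	inputs := []string{"upstream=abc123", "patches=fix.patch"}
	stored := computeV1(inputs) // a lock stamped by a v1 binary

	fmt.Println(checkFreshness(1, stored, inputs))
	fmt.Println(checkFreshness(1, stored, append(inputs, "strip-debug=true")))
	fmt.Println(checkFreshness(0, stored, inputs))
}
```

The point of the sketch is that the comparison is made at the *stored* version, not at `currentLockContentVersion`; restamping to the current version would be a separate step that only happens on an already-dirty write.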
@@ -523,7 +605,5 @@ Each PR is independently revertible up to the cutover. PRs A–B land together a 1. Should a lazy re-stamp during a *read-only* command (`render`, `build` freshness check) write the lock back, or defer all writes to `component update`? Writing on read is surprising; deferring means freshness checks stay slightly slower until the next update. (Leaning: defer all writes to `update`, keeping reads side-effect-free.) 2. For the config schema axis, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration. 3. Should the omit-if-zero predicate use `reflect.Value.IsZero()` (Go's notion) or a config-aware notion of "unset" (e.g. nil pointer vs empty string)? `projectVN` makes this a per-field choice in code, so it can differ field to field — but a default convention is still worth settling. -4. What is the canonical byte encoding for `projectVN` (length-prefixed key+value? a stable subset of TOML/CBOR?), and how are golden vectors stored and regenerated? This is the one substrate detail that is expensive to change after the reset. -5. Should the residual mixed-toolchain case get a hard write-time guard (refuse to write a token whose version exceeds `currentLockContentVersion`), or is force-rehash on read + the CI version-pin enough? (The operator escape hatch is `component migrate`; this question is only about the *automatic* guard.) 
-*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling); one shared content version covers both stored hashes, with resolution-hash replay reserved (slot present, fn reused) until `ComputeResolutionHash` first changes. 
+*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; the **canonical byte encoding is the existing length-prefixed `:=:` form** used by `combineInputs`, committed and pinned by golden vectors at the reset (former Open Q#4 — a precondition for PR A, not an open question, because the reset makes it irreversible); the **version write-guard is a requirement, not an option** (former Open Q#5): a binary refuses to write a token whose version exceeds its own `currentLockContentVersion`, and the CI version-pin prevents *old* binaries from committing downgrades; **field membership is declared in mandatory per-field version-set tags** (`fingerprint:"v1..*"`; absent → build failure, `!`-prefix for always-emit), read by one generic walker — this restores "forgotten field → loud build failure" (G-1) natively and retires the `expectedExclusions` map; baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling) and is itself a deliberate release-grade event; one shared content version covers both stored hashes now, with resolution-hash replay reserved (slot present, fn reused) and given its **own** prefix when `ComputeResolutionHash` first changes. 
From d45148fc507e3e62cda757fc008aa7fef81ad9d4 Mon Sep 17 00:00:00 2001
From: Daniel McIlvaney
Date: Mon, 8 Jun 2026 16:07:15 -0700
Subject: [PATCH 05/15] update 4

---
 docs/developer/rfc/lazy-schema-migration.md | 95 +++++++++++++--------
 1 file changed, 61 insertions(+), 34 deletions(-)

diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md
index 65f403c9..1bd76d5c 100644
--- a/docs/developer/rfc/lazy-schema-migration.md
+++ b/docs/developer/rfc/lazy-schema-migration.md
@@ -241,6 +241,10 @@ version = "v", digit, { digit } ;
 
 `*` resolves to "this version and every later one," so an *active* field never needs a tag edit across a version bump — only a field that is *dropped* at the bump gets its range closed (`v1..*` → `v1..vN`).
 
+**The emit-key is the field's TOML key, frozen — never the Go identifier.** The walker emits each field under a stable string key and sorts by it; that key is the field's **`toml:` name** (or an explicit `key=` in the tag for the rare keyless field), pinned once and treated as part of the frozen output. It is deliberately *not* the Go field name, so a cosmetic Go rename (`Foo`→`Bar`, same TOML key, same tag) is genuinely byte-neutral — making the [struct-rename drift-neutral claim](#config-schema-version-and-canonical-migration-future) true at the *field* level too, not just the type level. Renaming the *emit-key* is an output-changing edit and therefore a version bump like any other.
+
+**The grammar is frozen at three operators** (`..` range, `!` always-emit, `*` open-end). It deliberately re-invents protobuf's `reserved` field-range discipline, and protobuf survived because `reserved` never grew. Adding a fourth operator is an RFC-grade change, not a tag edit — cheap insurance against a bespoke mini-language accreting edge cases.
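To make the three-operator grammar concrete, here is a hedged sketch of a version-set parser. Only the tag examples (`v1..*`, `v3..v4,v8..*`, `!v1..*`, `-`) and the method name `rangeContaining` appear in the RFC; the types `versionRange` and `versionSet`, the function names, the rejection of `v0`, and the choice to require `..` in every range are assumptions made for illustration. The "no version above `currentLockContentVersion`" check is left out to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// openEnd is the sentinel upper bound for '*' (this version and every later one).
const openEnd = math.MaxInt

// versionRange is one inclusive span of content versions.
type versionRange struct {
	lo, hi int
	always bool // '!' prefix: emit even when the value is zero
}

type versionSet []versionRange

// parseVersion accepts "v" followed by one or more digits (v0 is rejected here
// as an assumption; the RFC's baseline is v1).
func parseVersion(s string) (int, error) {
	if !strings.HasPrefix(s, "v") {
		return 0, fmt.Errorf("version %q: missing 'v' prefix", s)
	}
	n, err := strconv.ParseUint(s[1:], 10, 31) // rejects signs and empty input
	if err != nil || n == 0 {
		return 0, fmt.Errorf("version %q: want v1, v2, ...", s)
	}
	return int(n), nil
}

// parseVersionSet parses a fingerprint tag. An empty tag is an error (the
// mandatory-tag rule); "-" yields an empty set (never measured).
func parseVersionSet(tag string) (versionSet, error) {
	switch tag {
	case "":
		return nil, errors.New("fingerprint tag is mandatory")
	case "-":
		return versionSet{}, nil
	}
	var set versionSet
	prevHi := 0
	for _, part := range strings.Split(tag, ",") {
		r := versionRange{always: strings.HasPrefix(part, "!")}
		lo, hi, ok := strings.Cut(strings.TrimPrefix(part, "!"), "..")
		if !ok {
			return nil, fmt.Errorf("range %q: missing '..'", part)
		}
		var err error
		if r.lo, err = parseVersion(lo); err != nil {
			return nil, err
		}
		if hi == "*" {
			r.hi = openEnd
		} else if r.hi, err = parseVersion(hi); err != nil {
			return nil, err
		}
		// Well-formedness: sorted, non-overlapping, and not inverted.
		if r.lo <= prevHi || r.hi < r.lo {
			return nil, fmt.Errorf("range %q: unsorted, overlapping, or inverted", part)
		}
		prevHi = r.hi
		set = append(set, r)
	}
	return set, nil
}

// rangeContaining returns the range that measures version v, or nil.
func (s versionSet) rangeContaining(v int) *versionRange {
	for i := range s {
		if s[i].lo <= v && v <= s[i].hi {
			return &s[i]
		}
	}
	return nil
}

func main() {
	set, err := parseVersionSet("v3..v4,!v8..*")
	if err != nil {
		panic(err)
	}
	for v := 1; v <= 9; v++ {
		fmt.Printf("v%d measured=%t\n", v, set.rangeContaining(v) != nil)
	}
}
```

Because a non-contiguous set is just a sorted list of ranges, bringing a dropped field back only ever appends a new range; no earlier range needs to be edited.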
+ **Recovery is the property that justifies the range syntax.** The hard requirement: if we drop a field, then versions later realize we need it again, we must be able to bring it back *without* disturbing any frozen historical hash. The rule that guarantees it: **you only ever *add* a range for the *new* version; you never edit a shipped version's membership.** Walk it: - `Foo` tagged `v1..*`. At the v2 bump we drop it → edit to `v1..v1`. v1 still emits `Foo`; v2+ does not. @@ -252,13 +256,28 @@ Every frozen output is byte-preserved, and the **golden vectors prove it**: the **What tags version, and what they don't.** Tags version *membership* — which fields a version measures. They do **not** version *encoding* — how a field's bytes are formed, or how the combiner folds non-field inputs. So the generic walker absorbs additive / removal / bring-back changes as pure tag edits (zero code), while a genuine encoding or combiner change still ships as versioned code in `computeFP(N+1)` (the walker output + the combiner step frozen at N). The taxonomy's non-additive rows are exactly that small set. -**Enforcement**, three layers: +> **The walker still reflects the live struct — tags freeze only *membership*, golden vectors freeze the rest.** Be honest about this: `project(cfg, N)` reflects the live struct exactly as the rejected `hashstructure` substrate did (Problem 6). The version-set tag re-freezes *which fields* a version measures, but three other things the walker reads from the live struct are **not** frozen by the tag — the **emit-key** (frozen instead by the immutable-TOML-key rule above), the **per-field encoding/type** (how a value becomes bytes), and the **zero-predicate** (what counts as omittable). All three are re-frozen by **golden-vector coverage**, not by code structure: a change to any of them moves a covered vector and fails CI. 
That makes [golden-vector coverage](#golden-vector-coverage-is-the-load-bearing-invariant) the load-bearing invariant of the whole substrate — the place this design trades a *structural* guarantee for a *test* guarantee, the way the atomic token (D3) trades the other way. It is acceptable only because the coverage rule below is stated and enforced. + +**Enforcement**, four layers — the first three are cheap syntactic guards, the fourth is the load-bearing one: + +1. **No tag → build failure** makes the include/exclude decision impossible to *forget*: a field with no `fingerprint` tag fails the build. This restores the safe default the bare projection gives up (G-1's *forgotten-field* case). +2. **Well-formedness:** ranges parse, are sorted and non-overlapping, and name no version above `currentLockContentVersion` (open `*` excepted); a `!` prefix is per-range and orthogonal. A malformed or future-referencing set fails the build. +3. **Exclusion ledger (kept, not retired).** A field tagged `fingerprint:"-"` is the **dangerous** direction — it removes the field from the hash, the G5-violating way to ship a stale artifact — so it gets *more* scrutiny, not less. An enumerated list (the surviving half of today's `expectedExclusions`) names every `-` field with a one-line justification; adding or removing a `-` requires editing the list, so an accidental exclusion fails CI. (An *earlier* draft of this RFC claimed mandatory tags "retire `expectedExclusions` outright" — that overreached: mandatory tags fix the *forgotten*-field default, but only an independent ledger catches a *wrongly-excluded* field, since no coverage test exercises a field that claims to be unmeasured.) +4. **Golden-vector coverage** — the keystone (next). 
+
+#### Golden-vector coverage is the load-bearing invariant
+
+Frozen-ness moved from a *structural* property (the rejected hand-written functions would not *compile* if you deleted a named field) to a *test* property (the golden-vector table forbids the live walker from drifting). That trade is sound **only if the corpus actually exercises every field of every retained version.** State it as a first-class, enforced invariant:
 
-1. **No tag → build failure** restores the safe default the projection otherwise gives up (G-1): a forgotten field fails loudly instead of silently dropping out of the hash. The tag *is* the per-field completeness ledger — no separate audit, no field→key bridge.
-2. **Well-formedness:** ranges parse, are sorted and non-overlapping, and name no version above `currentLockContentVersion` (open `*` excepted); a `!` prefix is per-range and orthogonal to those checks. A malformed or future-referencing set fails the build.
-3. **Golden vectors** pin `(config, version) → hash`, so any edit that changes a *shipped* version's output for an existing config fails CI. The precise rule is **output-preservation**, not literal membership-freezing: adding an *omit-if-zero* field to the current version (`vN..*`) is the one no-bump edit, because no existing config set it so no existing vector moves; every *output-changing* edit instead defines a **new** version (closing or opening a range at the bump), leaving all earlier versions byte-identical. You can introduce membership for the new version freely, but you can never silently rewrite a shipped hash.
+> **Coverage invariant:** every field whose tag set includes a retained version MUST appear, **non-zero**, in at least one retained golden vector — and a discrimination check must vary it and assert the version's hash *moves*. A field that is never exercised non-zero is invisible to CI, so any drift in its emit-key, encoding, or zero-handling would pass silently.
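One way the coverage invariant and its discrimination check could be tested is sketched below. Everything here is a toy under stated assumptions: `toyConfig` stands in for `ComponentConfig`, `hashV1` is an invented single-version projection (it treats any tag other than `-` as measured and uses a made-up byte encoding), and `uncovered` and `discriminates` are hypothetical helper names, not functions from the codebase.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// toyConfig mirrors the shape of the RFC's example struct.
type toyConfig struct {
	Upstream   string `toml:"upstream" fingerprint:"v1..*"`
	StripDebug bool   `toml:"strip-debug" fingerprint:"!v1..*"`
	Internal   string `toml:"-" fingerprint:"-"`
}

// hashV1 is a toy projection plus hash: measured fields only, sorted by TOML
// key, omit-if-zero unless the tag starts with '!'. The encoding is invented.
func hashV1(c toyConfig) [32]byte {
	v := reflect.ValueOf(c)
	var parts []string
	for i := 0; i < v.NumField(); i++ {
		f := v.Type().Field(i)
		tag := f.Tag.Get("fingerprint")
		if tag == "-" || (!strings.HasPrefix(tag, "!") && v.Field(i).IsZero()) {
			continue
		}
		parts = append(parts, fmt.Sprintf("%s=%v", f.Tag.Get("toml"), v.Field(i).Interface()))
	}
	sort.Strings(parts)
	return sha256.Sum256([]byte(strings.Join(parts, "\n")))
}

// uncovered lists the TOML keys of measured fields that are zero in every
// golden config, i.e. fields the corpus never exercises.
func uncovered(goldens []toyConfig) []string {
	t := reflect.TypeOf(toyConfig{})
	var missing []string
	for i := 0; i < t.NumField(); i++ {
		if t.Field(i).Tag.Get("fingerprint") == "-" {
			continue
		}
		covered := false
		for _, g := range goldens {
			if !reflect.ValueOf(g).Field(i).IsZero() {
				covered = true
			}
		}
		if !covered {
			missing = append(missing, t.Field(i).Tag.Get("toml"))
		}
	}
	return missing
}

// discriminates varies one field (by Go name) and reports whether the hash moves.
func discriminates(g toyConfig, goName string) bool {
	mutated := g
	f := reflect.ValueOf(&mutated).Elem().FieldByName(goName)
	switch f.Kind() {
	case reflect.String:
		f.SetString(f.String() + "-changed")
	case reflect.Bool:
		f.SetBool(!f.Bool())
	default:
		return false // unknown field, or a kind this toy cannot vary
	}
	return hashV1(g) != hashV1(mutated)
}

func main() {
	goldens := []toyConfig{{Upstream: "https://example.invalid/src.git"}}
	fmt.Println("uncovered:", uncovered(goldens))
	fmt.Println("Upstream discriminates:", discriminates(goldens[0], "Upstream"))
	fmt.Println("Internal discriminates:", discriminates(goldens[0], "Internal"))
}
```

In this toy corpus `strip-debug` is reported as uncovered because no golden config sets it non-zero, which is exactly the gap the invariant is meant to surface before a drift in that field's handling can slip through.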
 
-This retires the `expectedExclusions` map in `fingerprint_test.go` outright: "no tag → fail" makes every decision explicit and local, and golden vectors catch accidental include→exclude edits — the map's two jobs, both subsumed.
+This single rule mechanically closes three holes at once, each otherwise a silent stale-artifact (G5) bug that escapes CI *precisely because* some field is unset in the vectors:
+
+- **Wrong/narrow range** (e.g. `v1..v1` on a field that is build-effective at v2, or a typo'd gap `v1..v4,v6..*` dropping v5): the field is unmeasured at the current version, so the discrimination check at that version fails — varying it leaves the hash unmoved.
+- **Emit-key drift** (a field rename that reached the key): the covered config now emits under a different key → its vector moves → CI fails.
+- **Encoding/type drift** (a field's bytes change shape under the live walker): same — the covered vector moves.
+
+Without coverage, all three pass CI whenever the field happens to be zero in the corpus. With it, "the golden vectors prove frozen-ness" becomes a true statement instead of an aspiration. This is the substrate's one *test*-enforced (not structurally-enforced) guarantee, and it must be wired in PR A alongside the vectors themselves.
 
 ### Baseline v1 — omit-if-zero, no include-always legacy
 
@@ -270,21 +289,29 @@ Because the reset rebuilds everything, there is **no pre-existing population to
 // What freezes a version is that its tags never change once shipped (golden
 // vectors enforce it) — not that the walker is bespoke.
 type ComponentConfig struct {
-	Upstream   string   `fingerprint:"v1..*"`  // measured v1+, omit-if-zero
-	Patches    []string `fingerprint:"v1..*"`  // omit-if-zero (nil and [] both → absent)
-	StripDebug bool     `fingerprint:"!v1..*"` // always-emit: zero (false) is build-meaningful
-	Internal   string   `fingerprint:"-"`      // never measured
+	Upstream   string   `toml:"upstream" fingerprint:"v1..*"`     // key "upstream", omit-if-zero
+	Patches    []string `toml:"patches" fingerprint:"v1..*"`      // omit-if-zero
+	StripDebug bool     `toml:"strip-debug" fingerprint:"!v1..*"` // always-emit: zero (false) is build-meaningful
+	Internal   string   `toml:"-" fingerprint:"-"`                // never measured
 	// … every fingerprinted field carries an explicit tag; absent ⇒ build failure …
 }
 
+// fingerprintFields recurses into nested fingerprinted structs (same scope as
+// the retired audit), returning each leaf field with its frozen TOML key, its
+// version-set, and the resolved value. The key — not the Go field name — is what
+// gets emitted, so a Go rename is byte-neutral.
 func project(c *ComponentConfig, version int) []byte {
 	var b canonicalBuf
-	for _, f := range fingerprintFields(c) { // reflection, cached, sorted by key
+	for _, f := range fingerprintFields(c) { // reflection, cached, sorted by TOML key
 		r := f.set.rangeContaining(version)
 		if r == nil {
 			continue // field not measured at this version
 		}
-		b.emit(f.key, f.value, r.always) // r.always (the range's '!') ⇒ emit even when zero
+		// omit when the resolved value IsZero, UNLESS this range is '!' (always-emit).
+		if !r.always && f.value.IsZero() {
+			continue
+		}
+		b.emit(f.key, f.value)
 	}
 	return b.Bytes()
 }
@@ -301,8 +328,10 @@ So the classic false-negative requires absence ≠ zero-default *at the point of
 
 #### Edge cases under omit-if-zero
 
+The omit predicate is **`reflect.Value.IsZero()`**, one global rule for every field (resolving former Open Q#3); `!` is the only per-field override.
The consequences need stating because `IsZero` is type-specific: + - **Meaningful zero with a non-zero default** (e.g. `int Jobs` defaulting to `4`, where `0` means serial). Post-merge: unset → `4` (emitted), explicit `0` → omitted. These build differently *and* hash differently, so there is no collision — they are consistent. Use a `!` range only if a zero value must be distinguishable from a future change of default. -- **nil vs empty slice.** A missing TOML key → nil → omitted; `key = []` → non-nil empty → emitted. For any slice/map field where an explicit-empty value is reachable and build-meaningful, use a `!` range so nil and empty both hash. +- **nil vs empty slice — they hash *differently* under `IsZero`.** A nil slice is zero → omitted; a non-nil empty slice (`[]`) is **not** zero → emitted. If post-merge resolution can produce *either* nil or `[]` for the same intent, that ambiguity would move a hash — so the rule is: **resolution must normalize to one canonical form**, and where an explicit-empty value is build-meaningful and reachable, tag the field `!` so nil and empty both emit and stay distinguishable. This is a constraint on the resolver, pinned by a golden vector, not a free-for-all. ### The reset load-out — what to spend the free rebuild on @@ -399,7 +428,7 @@ This resolves Problems 2 (for default changes), 3 (hashing bugfixes), and 5 (pie `ComponentLock` carries two persisted content hashes: `InputFingerprint` (render inputs, via `projectVN` + `sha256`) and `ResolutionInputHash` (upstream-resolution inputs — a flat SHA256 over seven explicit fields in `ComputeResolutionHash`). Both have the **same evolution problem**: appending an input or reordering the fold moves every lock's hash → G1 churn. 
-We version them with **one shared integer** (the token's `v` prefix), not two axes, because: they co-locate in a single lock, they are written in the same `update` pass, and a paired registry lets either evolve independently while the other reuses its prior function. Two separate version fields would double the floor/replay/migrate machinery for an input set (`ResolutionInputHash`) that changes rarely — YAGNI. +We version them with **one shared integer** (the token's `v` prefix), not two axes, because: they co-locate in a single lock, they are written in the same `update` pass, and a paired registry lets either evolve independently while the other reuses its prior function. Two separate version fields would double the floor/replay/migrate machinery for an input set (`ResolutionInputHash`) that changes rarely — YAGNI. **The shared integer is permanent, made safe by digest-comparison.** The one hazard a shared prefix could create: a *resolution-only* algorithm bump drags the `InputFingerprint` token's prefix `v1`→`v2` while its digest is unchanged (the fingerprint algorithm was reused), and a *full-token* changelog walker would misread that prefix move as a release. We close it not by splitting the version but by having the historical changelog/classifier comparators compare the **digest** (the `:` tail), stripping the `v:` prefix: a resolution-only bump moves the prefix but not the digest → no phantom release; a real input change moves the digest → fires. Both fields are always co-written in the same `update` pass and the prefix advances whenever *either* algorithm advances, so the single prefix stays an honest version for both. (See [the synthetic-history path](#the-synthetic-changelogrelease-path-is-the-real-hazard).) **Phasing.** The atomic token format (`v:sha256:…`) is fixed at the reset, so there is no expensive-to-reverse key-naming decision left for Part 2. Fingerprint replay is wired in Part 2's first PR. 
**Resolution-hash replay is reserved, not yet wired** — the registry slot exists and `computeRes1` is reused, so the day `ComputeResolutionHash` first changes we add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`), with no schema change. Critically, `ResolutionInputHash` does **not** feed the synthetic changelog path, so its churn is a one-line lock rewrite + a wasted re-resolution, never a phantom release (unlike `InputFingerprint`; see [Downstream consumers](#downstream-fingerprint-consumers-blast-radius)). @@ -442,8 +471,8 @@ This is correct *by contract* (a v1 lock promises freshness under the v1 input s Lazy migration means an untouched lock can sit at an old version **indefinitely** (G3 by design). That makes "keep the last *N* versions" a **correctness cliff, not a tuning knob**: if pruning drops the compute function a lock still depends on, replay becomes impossible → forced `FreshnessStale` → the mass rebuild/rewrite (and, via the downstream-consumer analysis below, mass changelog churn) the whole design exists to avoid. So the floor must be explicit and paired with an escape hatch, decided now: - **`minSupportedLockContentVersion`** is a hard floor. A lock below it cannot be replayed and is treated as `Stale`. Dropping a registry entry is therefore a deliberate, breaking, announced act — never incidental cleanup. -- **`component migrate`** (Open Q#5, promoted to a requirement) force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. 
**Contract:** it is *offline* — it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). It *does*, however, move every token's digest — advancing the algorithm version is the whole point — so a fleet-wide migrate **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved token as notable, exactly as [the synthetic-history trap](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. That is *why* migrate is reset-grade and rare, not a free background sweep — the release churn is the deliberate cost of retiring a version. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal — each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). -- **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine — left alone, the registry, golden vectors, and deprecated tombstone fields grow **append-only** (a real cost the opaque-token model accepts; see the manifest alternative). Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion − minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. The spread, not the absolute version number, is the quantity kept small. +- **`component migrate`** (Open Q#5, promoted to a requirement) force-advances every lock to the current content version in one deliberate pass. 
This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. **Contract:** it is *offline* — it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). It *does*, however, move every *fingerprint* digest when it retires a fingerprint algorithm — advancing that algorithm is the whole point — so a fleet-wide migrate of that kind **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved digest as notable, exactly as [the synthetic-history trap](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. (A migrate that retires only a *resolution* algorithm moves the shared prefix but not the `InputFingerprint` digest, so it is correctly release-silent.) That is *why* migrate is reset-grade and rare, not a free background sweep — the release churn is the deliberate cost of retiring a version. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal — each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). +- **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine — left alone, the registry, golden vectors, and deprecated tombstone fields grow **append-only** (a real cost the opaque-token model accepts; see the manifest alternative). 
Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion − minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. The spread, not the absolute version number, is the quantity kept small. **Residual, stated honestly:** if genuine algorithm changes arrive *faster* than planned rebuilds, the spread ceiling becomes a hard stop that *forces* an unplanned, release-grade `component migrate` — the one thing the design says is always deliberate. The ceiling does not eliminate the expensive event; it bounds the backlog by *converting* an unbounded version spread into an occasional forced migrate. That is the accepted price of lazy-forever coexistence, not a contradiction to hide. **Mixed-toolchain hazard — bounded by the version-pin, not auto-repair.** The classic trap is an older binary regressing a newer lock. Because the lock *format* never bumps, an old binary *can* write a reset lock, stamping a legacy (prefix-less) or lower-`v` hash. In the **working tree** this is self-correcting: the next new-binary run detects the sub-floor token and force-rehashes it to the current version. But "self-correcting" stops at the working tree — if a downgraded lock is **committed**, `FindFingerprintChanges` reads `v1 → legacy → v1` as two real release events, and a published `%autorelease` increment cannot be withdrawn. So the load-bearing guard against *committed* phantom releases is the **CI version-pin**: post-cutover, no old binary may run the `update`-and-commit step. (The force-rehash only cleans the working tree; it does not undo history.) 
The *symmetric* residual — a binary that predates content-version `v2` meeting a `v2` token it cannot replay — is closed by a **required** write-time guard (Open Q#5, now a requirement): refuse to write a token whose version exceeds the binary's `currentLockContentVersion`, erroring rather than silently restamping at `v1`. Note this guard lives in the binary doing the write, so it constrains *newer-but-not-newest* binaries; it does **not** retroactively constrain a genuinely *old* binary — that direction is the version-pin's job. @@ -461,7 +490,7 @@ Split the change into its two halves; they are handled independently: So the bump is **not breaking**: replay answers "were the *old* inputs unchanged?" without rebuilding. -**The one constraint replay still imposes: a field a retained version still measures must stay on the struct.** The projection is immune to field *additions* (the walker only emits fields whose tag set includes the target version, so a new field is invisible to old versions). It is *not* immune to field *removal*: v1 still measures `c` (its tag set includes v1) and the retained v1 golden vector sets `c`, so physically deleting `c` from the struct makes that vector's config unconstructable → the golden-vector test fails to build. (Hand-written `projectVN` functions would make this a *compile* error instead — a marginally stronger guard the tag walker trades for an equally-blocking CI one; see [D2](#d2--version-tagged-field-selection--golden-vectors).) Removal is therefore the one edit still gated by a **deprecate-then-delete** two-step, both non-breaking: +**The one constraint replay still imposes: a field a retained version still measures must stay on the struct.** The projection is immune to field *additions* (the walker only emits fields whose tag set includes the target version, so a new field is invisible to old versions). 
It is *not* immune to field *removal*: v1 still measures `c` (its tag set includes v1) and the retained v1 golden vector sets `c`, so physically deleting `c` from the struct makes that vector's config unconstructable → the golden-vector test fails to build. (Hand-written `projectVN` functions would make this a *compile* error instead — a marginally stronger guard the tag walker trades for an equally-blocking CI one; see [D2](#d2--version-tagged-field-selection--golden-vector-coverage).) Removal is therefore the one edit still gated by a **deprecate-then-delete** two-step, both non-breaking:
1. **Bump to v2 measuring `{a,b,d}` but keep field `c` on the struct** so the v1 projection can still read it for replay (close `c`'s tag to `v1..v1`, so v2 does not measure it). Every old lock replays clean at v1, is recognized as unchanged, lazy re-stamps to v2. Zero forced rebuilds.
2. **Only after the floor passes v1** (`minSupportedLockContentVersion = 2`, ideally after a deliberate `component migrate`) physically delete field `c` and `projectV1`.
@@ -484,7 +513,7 @@ This is the on-disk TOML axis. It is **independent** of the fingerprint axis and
The critical invariant: **migrate old TOML → latest canonical struct, then project once.** A semantically no-op migration (rename `foo`→`bar`) must produce the *same* canonical struct, hence the same projection bytes, hence no drift. This is what keeps the schema axis **orthogonal** to the lock axis: a faithful `config migrate` is a pure re-encoding that moves *no* fingerprint, so it never triggers a `component migrate`. If a TOML change genuinely alters build meaning, that is a content-version bump (Part 2), not a `config migrate`.
-**Resolved by projection:** the old `hashstructure` caveat — that it mixed `reflect.Type.Name()` into the hash, so renaming a Go struct moved every fingerprint even with identical content — **no longer applies.** `projectVN` hashes only the explicit field bytes it emits, never the type name.
A struct rename is now genuinely drift-neutral — **pinned by a golden test** (rename a fingerprinted struct while keeping its fields identical → byte-identical digest), so the property is CI-enforced, not just asserted here. +**Resolved by projection:** the old `hashstructure` caveat — that it mixed `reflect.Type.Name()` into the hash, so renaming a Go struct moved every fingerprint even with identical content — **no longer applies.** The walker hashes only the explicit field bytes it emits, under each field's **frozen TOML key**, never the Go type or field name. So *both* a struct-type rename **and** a cosmetic field rename (`Foo`→`Bar`, same `toml:` key) are genuinely drift-neutral — **pinned by golden tests** (rename a fingerprinted struct, and rename a field while keeping its TOML key → byte-identical digest in both cases), so the property is CI-enforced, not just asserted here. Renaming the *TOML key itself* is an output-changing edit and takes a version bump like any other. ## Pipeline @@ -508,27 +537,29 @@ The versioned-replay story in Part 2 must hold for **every** reader of `InputFin | -------- | ----- | -------- | --------------------------- | | `checkFingerprintFreshness` (resolver) | recomputed identity | vs stored token | Replay at token version (Part 2 core) | | `component update` `Changed` decision | recomputed identity | vs stored token | **Replay before `Changed`** (see churn policy seam) | -| `changed.go` `classifyComponent` / `haveMatchingFingerprints` (CI classifier) | stored token strings (two historical git refs) | string compare | **String-only — must NOT replay** (no inputs available, and replaying historical configs would violate the no-recompute invariant); kept honest by strict-lazy churn, exactly like `FindFingerprintChanges` | -| `synthistory.FindFingerprintChanges` | stored token strings across git history | adjacent commits | **No change needed — if migration stays lazy** | +| `changed.go` `classifyComponent` / 
`haveMatchingFingerprints` (CI classifier) | stored token strings (two historical git refs) | **digest compare** (strip `v:` prefix) | **String-only — must NOT replay** (no inputs available, and replaying historical configs would violate the no-recompute invariant); comparing the digest makes it immune to version-only deltas | +| `synthistory.FindFingerprintChanges` | stored token strings across git history | **digest of adjacent commits** (strip `v:` prefix) | **String-only; digest-compare** so a version-only re-stamp (including a resolution-only bump) never fires a release | | `synthistory.BuildDirtyChange` | recomputed (current ver) | vs stored `headLock` token | **Replay at headLock version** before declaring dirty | | `ResolutionInputHash` staleness/write | recomputed resolution hash | vs stored | **Shares the version; replay reserved, not yet wired** | **Two comparator classes, not one — and only one of them can replay.** The consumers split cleanly by *what they hold*: - **Current-tree comparators** (`checkFingerprintFreshness`, `update`'s `Changed`, `BuildDirtyChange`) recompute against *live inputs*, so they **can and must** replay at the stored token's version. Feasible and invariant-safe. -- **Stored-vs-stored historical comparators** (`FindFingerprintChanges`, `changed.go`'s `classifyComponent`/`haveMatchingFingerprints`) hold only committed token *strings* from two git refs — no config, no FS, no inputs. They **cannot** replay, and replaying would require recomputing a historical fingerprint, which the [forever-invariant](#back-compat-invariant--synthetic-history-reads-stored-strings-never-recomputes) forbids outright. Both stay **string-only**, kept honest by the *same* strict-lazy churn policy: under lazy migration a v1→v2 re-stamp only ever rides a commit whose inputs genuinely changed, so a raw string compare never sees a version-only delta. 
+- **Stored-vs-stored historical comparators** (`FindFingerprintChanges`, `changed.go`'s `classifyComponent`/`haveMatchingFingerprints`) hold only committed token *strings* from two git refs — no config, no FS, no inputs. They **cannot** replay, and replaying would require recomputing a historical fingerprint, which the [forever-invariant](#back-compat-invariant--synthetic-history-reads-stored-strings-never-recomputes) forbids outright. Both stay **string-only**, and both compare the **digest** (stripping the `v:` version prefix), which makes them inherently immune to version-only deltas — a v1→v2 re-stamp with an unchanged digest reads as "no change." (Strict-lazy churn is still the policy that keeps re-stamps from riding no-op commits in the first place, but the comparators no longer *depend* on it for correctness.) The `changed.go` classifier was the easily-missed member of the *second* class. The fix is **not** to make it replay-aware (impossible, and invariant-violating) — it is to confirm it lives under the strict-lazy guarantee, exactly as `FindFingerprintChanges` does. An earlier draft of this table wrongly demanded replay here; that obligation is removed. +**Why this contract is enforced by prose, not a type — and why that is acceptable here.** A reviewer-vigilance rule across ≥5 comparison sites is exactly the kind of discipline this RFC elsewhere converts to structure (the atomic token, D3). A stronger guard exists: model the stored hash as an opaque `fingerprint.Token` type (unexported internals) routed through one `Reconcile(lock) → {Fresh | Stale | RestampTo(v)}` API, so a *7th* consumer physically cannot write raw `==` and silently land in the wrong class. We **defer** it deliberately, for one reason that distinguishes it from the token-vs-manifest decision: it is **not a one-way door.** The `Token` type is a pure post-reset refactor that can land any time without touching the on-disk format. 
And the failure mode of the prose contract is *safe-by-construction*: a forgotten replay site compares a stored token to a current-version recompute, which under a version mismatch yields **in**equality → a spurious `Stale`/`Changed` → a wasteful rebuild (G1 churn), **never** a false match → stale artifact (G5). The dangerous direction is unreachable by omission. So the `Token` type is a *recommended follow-up* (it would also tidy N1's six existing sites), not a reset precondition. Tracked, not blocking. + ### The synthetic changelog/release path is the real hazard [`synthistory.go`](../../../internal/app/azldev/core/sources/synthistory.go) turns fingerprint movement into **user-visible, shipped** package state — `%autochangelog` entries and `%autorelease` increments. There are two distinct comparators, and the design resolves them asymmetrically. -- **`FindFingerprintChanges` (historical walker)** does a raw, version-blind string compare of `InputFingerprint` across the lock's git history and emits a synthetic changelog/release entry on every change. Making it genuinely version-aware is hard-to-infeasible — it only has committed *strings*, no inputs to replay. **It does not need to be**, *provided migration stays strictly lazy.* Under the churn policy, a version bump only ever rides a commit where a real input also changed, so there is never a version-only commit in history for the walker to misread. The migration folds honestly into that real change's entry. **This is a design decision, not a code fix:** the v1→v2 conversion is an *accepted, per-component, notable* changelog event that piggybacks a real change. - - **Trap:** this only holds while migration is lazy. A fleet-wide `component migrate` (or a regression where `Changed` flips for everyone) converts *phantom* → *honest-but-fleet-wide* — a truthful but fleet-wide release bump, i.e. 
**G1 is dead.** "Accept as notable" is therefore conditional on **migration never riding a version-only or fleet-wide write** (the `component migrate` floor pass and the one-time reset excepted, because they are deliberate and operator-driven). +- **`FindFingerprintChanges` (historical walker)** compares `InputFingerprint` across the lock's git history and emits a synthetic changelog/release entry on every change. It compares the **digest** (stripping the `v:` version prefix), not the full token — a one-line string operation, not the infeasible version-aware replay (it has only committed *strings*, no inputs). So a version-only re-stamp (a lazy v1→v2 with an unchanged digest, or a resolution-only bump that advances the shared prefix) is **invisible** to it; only a moved digest — a genuine input change — fires. The migration folds honestly into the real change's entry that carries it. **Design decision:** the v1→v2 conversion is an *accepted, per-component, notable* changelog event that piggybacks a real change, and digest-comparison makes that robust by construction rather than reliant on perfect lazy discipline. + - **`component migrate` is release-grade *when it moves digests*.** A migrate that retires a *fingerprint* algorithm re-stamps every unchanged lock from `computeFP1`'s digest to `computeFP2`'s — the digests move, the walker fires, and the fleet-wide release is the deliberate cost ([registry floor](#registry-floor-and-forced-migration)). A migrate that retires only a *resolution* algorithm moves the shared prefix but not the `InputFingerprint` digest, so it is correctly release-silent. Either way the firing is *honest* (a real digest move), never a phantom-prefix artifact — which is exactly what the strict-lazy policy used to guard against and digest-comparison now guarantees structurally. 
- **`BuildDirtyChange` (live dirty check)** compares a *recomputed* current-version (v2) hash against the *stored* (possibly v1) `headLock.InputFingerprint` and declares dirty on inequality. "Accept as notable" does **not** save this path: post-switchover an *unchanged* component would read **dirty on every `render`/`build`** until re-stamped — a persistent, recurring spurious signal, worse than a one-time entry. The fix is **free**: it is the *same replay Part 2 already owes the freshness check* — replay at `headLock`'s recorded version before declaring dirty. One additional call site for logic already being written, no new mechanism.
-**Net:** the changelog-walker concern is not "make the walker version-aware" (hard, maybe infeasible). It is two things already on the books — (1) the strict lazy churn policy, so the walker never sees a version-only commit; and (2) extend the freshness replay to `BuildDirtyChange` and the `changed.go` classifier, a few extra call sites for logic already being written. The reset commit is the single deliberate exception: it *is* a fleet-wide notable event, the coordinated cutover, intentionally visible.
+**Net:** the changelog-walker concern is not "make the walker version-aware" (hard, maybe infeasible). It is two cheap things — (1) the historical comparators (`FindFingerprintChanges`, `changed.go`) compare the **digest**, so a version-only delta never fires; and (2) extend the *current-tree* replay to `BuildDirtyChange` (which *does* hold live inputs), one call site for logic already being written. The reset commit is the single deliberate exception: it *is* a fleet-wide notable event, the coordinated cutover, intentionally visible.
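The digest-compare rule for the historical comparators can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the RFC's implementation: the helper names `digestOf` and `sameDigest` are hypothetical, the token is assumed to start with a `v` plus decimal digits plus `:` version prefix, and a legacy prefix-less token is assumed to be compared unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// digestOf returns the token with a leading version prefix
// ("v" + one or more digits + ":") removed. Anything that does not
// match that shape (assumed: a legacy prefix-less token) is returned as-is.
// Hypothetical helper, named here for illustration only.
func digestOf(token string) string {
	if !strings.HasPrefix(token, "v") {
		return token
	}
	colon := strings.IndexByte(token, ':')
	if colon < 2 { // no colon, or "v:" with no digits
		return token
	}
	for _, r := range token[1:colon] {
		if r < '0' || r > '9' {
			return token
		}
	}
	return token[colon+1:]
}

// sameDigest is the string-only comparison a historical comparator
// could use: it ignores the version prefix and compares the rest.
func sameDigest(a, b string) bool {
	return digestOf(a) == digestOf(b)
}

func main() {
	// Version-only re-stamp: digest unchanged, so no change is reported.
	fmt.Println(sameDigest("v1:sha256:abc", "v2:sha256:abc")) // true
	// Genuine input change: digest moved, so a change is reported.
	fmt.Println(sameDigest("v1:sha256:abc", "v1:sha256:def")) // false
}
```

The comparison stays a pure string operation, so it needs no config, filesystem, or replay, which is the property the historical comparators rely on.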
### `ResolutionInputHash` — shares the version, replay deferred
@@ -539,7 +570,7 @@ The single shared content version (the token's `v` prefix) covers it (see [Bo
- `ResolutionInputHash` does **not** feed `synthistory` — so an algorithm change can never mint a phantom changelog/release (that hazard is fingerprint-only). Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption.
- It is a flat seven-field SHA256, not a struct walk, so the projection substrate leaves it untouched — it has no pending version event. Its registry slot stays `computeRes1` until its inputs genuinely change.
-**Decision:** the atomic token format is fixed at the reset, so there is no irreversible key-naming decision left; wire fingerprint replay in Part 2's first PR; reserve resolution replay (slot present, prior fn reused) and wire it the day `ComputeResolutionHash` first changes — a localized follow-up with no schema change. KISS/YAGNI on the second replay. **One constraint for that day:** give `ResolutionInputHash` its *own* version prefix (decoupled from the `InputFingerprint` token) when replay is wired. Sharing one integer is fine *now* because resolution has no pending version event; but a *resolution-only* algorithm change must not be forced to advance the `InputFingerprint` token — that field feeds `FindFingerprintChanges`, so bumping it for a resolution change would mint a phantom release. Independent prefixes keep a resolution bump off the release-bearing field.
+**Decision:** the atomic token format is fixed at the reset, so there is no irreversible key-naming decision left; wire fingerprint replay in Part 2's first PR; reserve resolution replay (slot present, prior fn reused) and wire it the day `ComputeResolutionHash` first changes — a localized follow-up with no schema change. KISS/YAGNI on the second replay.
**One shared version, kept honest by digest-comparison — no split.** An earlier draft planned to give `ResolutionInputHash` its *own* prefix when replay is wired, reasoning that a resolution-only bump must not advance the release-bearing `InputFingerprint` token. That split is unnecessary: the historical changelog/classifier comparators compare the **digest**, not the version prefix (see [Both hashes share one version](#both-hashes-share-one-version)), so advancing the shared prefix for a resolution-only change moves no digest and mints no release. The shared integer therefore stays permanent; when `ComputeResolutionHash` first changes we add `computeRes2`, bump the shared version, and re-stamp both fields together — the `InputFingerprint` digest is unchanged (its algorithm was reused), so the changelog walker stays correctly silent. ## Design decisions @@ -557,16 +588,13 @@ Both can omit zero values; the decisive difference is **whether an old algorithm `Includable` keeps today's hashes byte-identical, which mattered for an *incremental* rollout — but that property is worthless once the reset rebuilds everything anyway, and it comes attached to a substrate that makes replay unsound. Projection trades byte-compatibility (which we are spending on the coordinated cutover regardless) for frozen replay (which we need forever). Adopted at the reset. -### D2 — Version-tagged field selection + golden vectors +### D2 — Version-tagged field selection + golden-vector coverage Field membership lives in a per-field version-set tag (`fingerprint:"v1..*"`) read by one generic walker — not in N hand-written functions, and not in the binary include/exclude tag of today's reflective audit. Rationale: -- **The unsafe direction is the false-negative** (a meaningful field silently omitted → missed rebuild → stale artifact). 
A *mandatory* tag — absent → build failure — makes the include/exclude decision impossible to forget, restoring the safe default a bare hand-written list quietly gives up (G-1). The tag *is* the per-field completeness ledger: no separate audit, no field→emit-key bridge. +- **The unsafe direction is the false-negative** (a meaningful field silently omitted → missed rebuild → stale artifact). A *mandatory* tag — absent → build failure — makes the include/exclude decision impossible to *forget* (G-1's forgotten-field case). The *wrongly-excluded* case (a `-` tag on a build-effective field) is caught by the kept exclusion ledger, and the *wrongly-included-but-unmeasured* case by golden-vector coverage — see [Enforcement](#golden-vector-coverage-is-the-load-bearing-invariant). - **Version-awareness is declarative.** A field's whole lifecycle — introduced at v3, dropped at v5, revived at v8 — is one greppable string on the field (`v3..v4,v8..*`), not a diff smeared across three function bodies. Recovery (bring-back) is *expressible* precisely because the set is non-contiguous. -- **Golden vectors freeze it.** Editing a shipped version's output changes a checked-in `(config, version) → hash` vector → CI failure — the same backstop that would protect hand-written functions, minus the per-version boilerplate. -- **It retires `expectedExclusions`.** The map in `fingerprint_test.go` exists to (a) force a decision on every field and (b) catch accidental exclusion-tag removal; no-tag-fail and golden vectors subsume both. - -The one thing hand-written functions do better: field *removal* is **compile-enforced** there (deleting a field a retained `projectVN` still names won't compile), where the tag walker downgrades it to an equally-blocking *golden-vector* build failure. That single marginal loss buys declarative lifecycles, native completeness, and first-class recovery — a good trade. The hand-written variant is kept as [Option B](#alternatives-considered). 
+- **Honest cost: frozen-ness is now *test*-enforced, not *structurally* enforced.** The walker reflects the live struct (the very thing Problem 6 rejected); only the version-set tag re-freezes membership, and golden-vector **coverage** re-freezes the emit-key, encoding, and zero-predicate. This is a real trade — a hand-written `projectVN` would make field removal a *compile* error — accepted because the coverage invariant turns it into an equally-blocking CI failure, and because the declarative lifecycle, native completeness, and first-class recovery are worth it. The hand-written variant is kept as [Option B](#alternatives-considered). ### D3 — Atomic self-describing token; no format bump, reconcile via force-rehash @@ -592,9 +620,9 @@ The lock **format** `Version` stays at `1`. An earlier draft bumped it (1→2) a The reset (Part 1) must land as one coherent change at the dev→prod cutover; its pieces are independently reviewable but ship together because they all move the hash. -1. **PR A (substrate)**: the canonical encoder (`canonicalBuf`, `emit` with a per-range always flag), the generic tag-driven `project(cfg, N)` walker + version-set tag parser, **version tags on every fingerprinted field** (absent → build failure), the `sha256` combiner, and golden vectors. The mandatory-tag test replaces both the retired `TestAllFingerprintedFieldsHaveDecision` audit and its `expectedExclusions` map — the G-1 fix is now native to the tag, no field→key bridge needed. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. Unit tests: a field tagged `v2..*` is invisible to a v1 projection; a `!`-prefixed range hashes even at zero; a field with **no** `fingerprint` tag fails the build; golden vectors pin v1; editing a shipped version's tag membership so an existing config's output moves fails a golden vector; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser. +1. 
**PR A (substrate)**: the canonical encoder (`canonicalBuf`, `emit` with a per-range always flag), the generic tag-driven `project(cfg, N)` walker (recursing into nested fingerprinted structs) + version-set tag parser, **version tags on every fingerprinted field** (absent → build failure), the frozen **TOML-key** emit rule, the `reflect.Value.IsZero()` omit-predicate, the `sha256` combiner, the golden vectors **and the coverage invariant** (every field measured at a retained version appears non-zero in ≥1 vector, with a discrimination check). The mandatory-tag test plus the slimmed exclusion ledger replace the retired `TestAllFingerprintedFieldsHaveDecision` audit — the inclusion default is now native to the tag, the exclusion default stays ledgered. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. Unit tests: a field tagged `v2..*` is invisible to a v1 projection; a `!`-prefixed range hashes even at zero; a field with **no** `fingerprint` tag fails the build; a **nested** fingerprinted struct with a tagless field fails the build; a **Go-field rename keeping the TOML key** yields a byte-identical digest; the **coverage/discrimination** test fails when a build-effective field is tagged too narrowly (`v1..v1` at current `v2`); golden vectors pin v1; editing a shipped version's output for an existing config fails a golden vector; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser. 2. **PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. Lock format `Version` stays `1`. Ships at the cutover; absorbed by the scheduled rebuild. Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. -3. 
**PR C (Part 2 machinery)**: the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`), `ComputeIdentityAt`, and replay at the three *current-tree* sites — replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, and `BuildDirtyChange`. The two *historical* comparators (`FindFingerprintChanges`, `changed.go`'s `classifyComponent`) stay **string-only** and rely only on the strict-lazy guarantee, not replay. Resolution replay reserved (slot reuses `computeRes1`). **Not fully inert:** this PR switches the live current-tree compares from raw-string to replay-aware *on merge* — only the *registry dispatch* is dormant while just `v1` exists, and `BuildDirtyChange`'s replay is a hard prerequisite for any later PR that registers `v2`. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write. +3. **PR C (Part 2 machinery)**: the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`), `ComputeIdentityAt`, and replay at the three *current-tree* sites — replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, and `BuildDirtyChange`. The two *historical* comparators (`FindFingerprintChanges`, `changed.go`'s `classifyComponent`) switch to **digest-compare** (strip the `v:` prefix), not replay. Resolution replay reserved (slot reuses `computeRes1`). **Not fully inert:** this PR switches the live current-tree compares from raw-string to replay-aware *on merge* — only the *registry dispatch* is dormant while just `v1` exists, and `BuildDirtyChange`'s replay is a hard prerequisite for any later PR that registers `v2`. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event. 4. 
**PR D (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) — add a field absent from `projectV1` and set it on one component; assert only that lock drifts and every other lock is byte-identical.
5. **PR E (config schema axis, later)**: `schema-version` field + load-time canonical migration + the `config migrate` command. Gated on the first post-reset non-additive TOML change not already absorbed by the reset's normalization pass.
@@ -604,6 +632,5 @@ Each PR is independently revertible up to the cutover. PRs A–B land together a
1. Should a lazy re-stamp during a *read-only* command (`render`, `build` freshness check) write the lock back, or defer all writes to `component update`? Writing on read is surprising; deferring means freshness checks stay slightly slower until the next update. (Leaning: defer all writes to `update`, keeping reads side-effect-free.)
2. For the config schema axis, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration.
-3. Should the omit-if-zero predicate use `reflect.Value.IsZero()` (Go's notion) or a config-aware notion of "unset" (e.g. nil pointer vs empty string)? `projectVN` makes this a per-field choice in code, so it can differ field to field — but a default convention is still worth settling.
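To make the omit-if-zero question concrete, the sketch below shows what `reflect.Value.IsZero()` reports for a few field shapes. It is illustrative only: the `sample` struct and the `zeroFields` helper are hypothetical and are not part of `azldev`; only the standard-library `reflect` behaviour is being demonstrated.

```go
package main

import (
	"fmt"
	"reflect"
)

// sample is a hypothetical config fragment, used only to show
// how IsZero treats strings, pointers, and slices.
type sample struct {
	Name    string
	Retries *int
	Tags    []string
}

// zeroFields maps each field name of a struct value to the result
// of reflect.Value.IsZero for that field.
func zeroFields(v any) map[string]bool {
	rv := reflect.ValueOf(v)
	out := make(map[string]bool, rv.NumField())
	for i := 0; i < rv.NumField(); i++ {
		out[rv.Type().Field(i).Name] = rv.Field(i).IsZero()
	}
	return out
}

func main() {
	// Empty string, nil pointer, and nil slice are all zero.
	fmt.Println(zeroFields(sample{}))
	// prints: map[Name:true Retries:true Tags:true]

	// A non-nil pointer to 0 and an empty non-nil slice are NOT zero,
	// so an omit-if-zero predicate based on IsZero would emit them.
	n := 0
	fmt.Println(zeroFields(sample{Retries: &n, Tags: []string{}}))
	// prints: map[Name:true Retries:false Tags:false]
}
```

The second case is the one a "config-aware unset" notion might treat differently: under `IsZero`, an explicitly set pointer or an empty but allocated slice counts as present.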
-*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; the **canonical byte encoding is the existing length-prefixed `:=:` form** used by `combineInputs`, committed and pinned by golden vectors at the reset (former Open Q#4 — a precondition for PR A, not an open question, because the reset makes it irreversible); the **version write-guard is a requirement, not an option** (former Open Q#5): a binary refuses to write a token whose version exceeds its own `currentLockContentVersion`, and the CI version-pin prevents *old* binaries from committing downgrades; **field membership is declared in mandatory per-field version-set tags** (`fingerprint:"v1..*"`; absent → build failure, `!`-prefix for always-emit), read by one generic walker — this restores "forgotten field → loud build failure" (G-1) natively and retires the `expectedExclusions` map; baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling) and is itself a deliberate release-grade event; one shared content version covers both stored hashes now, with resolution-hash replay reserved (slot present, fn reused) and given its **own** prefix when `ComputeResolutionHash` first changes. 
+*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; the **canonical byte encoding is the existing length-prefixed `:=:` form** used by `combineInputs`, committed and pinned by golden vectors at the reset (former Open Q#4 — a precondition for PR A, not an open question, because the reset makes it irreversible); the **version write-guard is a requirement, not an option** (former Open Q#5): a binary refuses to write a token whose version exceeds its own `currentLockContentVersion`, and the CI version-pin prevents *old* binaries from committing downgrades; **field membership is declared in mandatory per-field version-set tags** (`fingerprint:"v1..*"`; absent → build failure, `!`-prefix for always-emit), read by one generic walker — this restores "forgotten field → loud build failure" (G-1) natively; the **emit-key is the frozen TOML key** (never the Go field name, so a field rename is byte-neutral), the **omit-predicate is `reflect.Value.IsZero()`** (former Open Q#3), and the tag DSL is **frozen at three operators**; frozen-ness rests on the **golden-vector coverage invariant** (every field measured at a retained version appears non-zero in ≥1 vector, with a discrimination check) plus a **kept exclusion ledger** for `-` fields (the inclusion default is native to the tag; the *exclusion* default stays ledgered because it is the G5-dangerous direction); baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a 
historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling) and is itself a deliberate release-grade event; one shared content version covers both stored hashes **permanently** (no split) — the historical changelog/classifier comparators compare the **digest** (stripping the `v:` prefix), so advancing the shared prefix for a resolution-only algorithm change moves no digest and mints no release; resolution replay stays reserved (slot present, `computeRes1` reused) until `ComputeResolutionHash` first changes. From 8889b9185d4a430084de88d6923bc5c95c1f8d50 Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Mon, 8 Jun 2026 16:25:32 -0700 Subject: [PATCH 06/15] update 5 --- docs/developer/rfc/lazy-schema-migration.md | 33 ++++++++++++++------- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md index 1bd76d5c..6f234402 100644 --- a/docs/developer/rfc/lazy-schema-migration.md +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -241,9 +241,20 @@ version = "v", digit, { digit } ; `*` resolves to "this version and every later one," so an *active* field never needs a tag edit across a version bump — only a field that is *dropped* at the bump gets its range closed (`v1..*` → `v1..vN`). -**The emit-key is the field's TOML key, frozen — never the Go identifier.** The walker emits each field under a stable string key and sorts by it; that key is the field's **`toml:` name** (or an explicit `key=` in the tag for the rare keyless field), pinned once and treated as part of the frozen output. 
It is deliberately *not* the Go field name, so a cosmetic Go rename (`Foo`→`Bar`, same TOML key, same tag) is genuinely byte-neutral — making the [struct-rename drift-neutral claim](#config-schema-version-and-canonical-migration-future) true at the *field* level too, not just the type level. Renaming the *emit-key* is an output-changing edit and therefore a version bump like any other. +**The emit-key is the field's TOML key, frozen — never the Go identifier.** The walker emits each field under a stable string key and sorts by it; that key is the field's **`toml:` name**, or — for a field with no usable TOML key — an explicit `key=` member in the `fingerprint` tag (grammar below). The key is pinned once and treated as part of the frozen output. It is deliberately *not* the Go field name, so a cosmetic Go rename (`Foo`→`Bar`, same TOML key, same tag) is genuinely byte-neutral — making the [struct-rename drift-neutral claim](#config-schema-version-and-canonical-migration-future) true at the *field* level too, not just the type level. Renaming the *emit-key* is an output-changing edit and therefore a version bump like any other. **Duplicate emit-keys within one retained version fail the build** (two fields resolving to the same key would collide and alias — a silent G5 hazard), so the key set is checked for uniqueness at every retained version. -**The grammar is frozen at three operators** (`..` range, `!` always-emit, `*` open-end). It deliberately re-invents protobuf's `reserved` field-range discipline, and protobuf survived because `reserved` never grew. Adding a fourth operator is an RFC-grade change, not a tag edit — cheap insurance against a bespoke mini-language accreting edge cases. 
+**The grammar is frozen at three range-operators** (`..` range, `!` always-emit, `*` open-end), plus the orthogonal `key=` override: + +```ebnf +tag = "-" | [ keyopt, "," ], set ; +keyopt = "key=", identifier ; (* emit-key override; default is the toml: name *) +set = member, { ",", member } ; +member = [ "!" ], range ; (* leading "!" ⇒ always-emit for this range *) +range = version, [ "..", ( version | "*" ) ] ; +version = "v", digit, { digit } ; +``` + +It deliberately re-invents protobuf's `reserved` field-range discipline, and protobuf survived because `reserved` never grew. Adding a fourth *range*-operator is an RFC-grade change, not a tag edit — cheap insurance against a bespoke mini-language accreting edge cases. **Recovery is the property that justifies the range syntax.** The hard requirement: if we drop a field, then versions later realize we need it again, we must be able to bring it back *without* disturbing any frozen historical hash. The rule that guarantees it: **you only ever *add* a range for the *new* version; you never edit a shipped version's membership.** Walk it: @@ -261,7 +272,7 @@ Every frozen output is byte-preserved, and the **golden vectors prove it**: the **Enforcement**, four layers — the first three are cheap syntactic guards, the fourth is the load-bearing one: 1. **No tag → build failure** makes the include/exclude decision impossible to *forget*: a field with no `fingerprint` tag fails the build. This restores the safe default the bare projection gives up (G-1's *forgotten-field* case). -2. **Well-formedness:** ranges parse, are sorted and non-overlapping, and name no version above `currentLockContentVersion` (open `*` excepted); a `!` prefix is per-range and orthogonal. A malformed or future-referencing set fails the build. +2. 
**Well-formedness:** ranges parse, are sorted and non-overlapping, and name no version above `currentLockContentVersion` (open `*` excepted); a `!` prefix is per-range and orthogonal; **emit-keys are unique within every retained version**. A malformed, future-referencing, or key-colliding set fails the build. 3. **Exclusion ledger (kept, not retired).** A field tagged `fingerprint:"-"` is the **dangerous** direction — it removes the field from the hash, the G5-violating way to ship a stale artifact — so it gets *more* scrutiny, not less. An enumerated list (the surviving half of today's `expectedExclusions`) names every `-` field with a one-line justification; adding or removing a `-` requires editing the list, so an accidental exclusion fails CI. (An *earlier* draft of this RFC claimed mandatory tags "retire `expectedExclusions` outright" — that overreached: mandatory tags fix the *forgotten*-field default, but only an independent ledger catches a *wrongly-excluded* field, since no coverage test exercises a field that claims to be unmeasured.) 4. **Golden-vector coverage** — the keystone (next). @@ -271,9 +282,11 @@ Frozen-ness moved from a *structural* property (the rejected hand-written functi > **Coverage invariant:** every field whose tag set includes a retained version MUST appear, **non-zero**, in at least one retained golden vector — and a discrimination check must vary it and assert the version's hash *moves*. A field that is never exercised non-zero is invisible to CI, so any drift in its emit-key, encoding, or zero-handling would pass silently. +**The oracle must be independent of the tag — the keystone's keystone.** The subtle trap: if the discrimination check derives *which versions to test* from the same range tag it is meant to police, a wrong-narrow tag silences its own check. 
A field build-effective today but mistagged `v1..v1` (while `currentLockContentVersion` is `v2`) tells a tag-derived generator "only check v1" — so the v2 discrimination check the invariant promises **never runs**, and the field silently drops out of the current hash (G5). The fix is a tag-independent oracle: **every fingerprinted field on the struct is expected to be measured-and-discriminating at the *current* version, unless it appears in a reviewed *dropped-fields ledger*** (sibling to the `-` exclusion ledger, naming each field intentionally closed at version N with a justification). The coverage test reads that ledger, *not* the range tag, to decide what to assert — so a range that excludes the current version without a matching ledger entry fails CI. The tag is the implementation; the ledger is the oracle; they are checked against each other. + This single rule mechanically closes three holes at once, each otherwise a silent stale-artifact (G5) bug that escapes CI *precisely because* some field is unset in the vectors: -- **Wrong/narrow range** (e.g. `v1..v1` on a field that is build-effective at v2, or a typo'd gap `v1..v4,v6..*` dropping v5): the field is unmeasured at the current version, so the discrimination check at that version fails — varying it leaves the hash unmoved. +- **Wrong/narrow range** (e.g. `v1..v1` on a field that is build-effective at v2, or a typo'd gap `v1..v4,v6..*` dropping v5): the field is not in the dropped-fields ledger, so the oracle still expects a *current-version* discrimination move — the tag says it is unmeasured there, the assertion fires, CI fails. (This is the case the tag-independent oracle exists to catch.) - **Emit-key drift** (a field rename that reached the key): the covered config now emits under a different key → its vector moves → CI fails. - **Encoding/type drift** (a field's bytes change shape under the live walker): same — the covered vector moves. 
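Reviewer's sketch (not part of the patch): the tag grammar, the well-formedness guard (enforcement layer 2), and the tag-independent oracle described in this hunk could look roughly like the Go below. Every identifier here (`parseTag`, `fieldTag`, `versionRange`, `measuredAt`, `missingFromLedger`, `openEnd`) is hypothetical and chosen for illustration; the RFC names only the grammar and the rules. The sketch also assumes that a `key=` value is any non-empty string and that versions are plain integers.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// openEnd stands in for the "*" upper bound ("this version and every later one").
const openEnd = int(^uint(0) >> 1)

// versionRange is one member of a version set, inclusive at both ends.
type versionRange struct {
	lo, hi int
	always bool // leading "!": emit even when the value is zero
}

// fieldTag is the parsed form of one `fingerprint:"..."` struct tag.
type fieldTag struct {
	excluded bool   // the tag was "-"
	key      string // optional key= emit-key override
	ranges   []versionRange
}

// parseVersion parses "v" followed by one or more digits.
func parseVersion(s string) (int, error) {
	if len(s) < 2 || s[0] != 'v' || strings.Trim(s[1:], "0123456789") != "" {
		return 0, fmt.Errorf("bad version %q", s)
	}
	return strconv.Atoi(s[1:])
}

// parseTag parses a tag and applies the well-formedness rules: ranges must be
// sorted, non-overlapping, and must not name a version above current.
func parseTag(tag string, current int) (fieldTag, error) {
	if tag == "" {
		return fieldTag{}, fmt.Errorf("missing fingerprint tag")
	}
	if tag == "-" {
		return fieldTag{excluded: true}, nil
	}
	var out fieldTag
	members := strings.Split(tag, ",")
	if strings.HasPrefix(members[0], "key=") {
		out.key = strings.TrimPrefix(members[0], "key=")
		members = members[1:]
		if out.key == "" {
			return fieldTag{}, fmt.Errorf("empty key= in %q", tag)
		}
	}
	if len(members) == 0 {
		return fieldTag{}, fmt.Errorf("no version set in %q", tag)
	}
	prevHi := -1
	for _, m := range members {
		r := versionRange{always: strings.HasPrefix(m, "!")}
		loStr, hiStr, isRange := strings.Cut(strings.TrimPrefix(m, "!"), "..")
		var err error
		if r.lo, err = parseVersion(loStr); err != nil {
			return fieldTag{}, err
		}
		switch {
		case !isRange:
			r.hi = r.lo // a single version, e.g. "v2"
		case hiStr == "*":
			r.hi = openEnd
		default:
			if r.hi, err = parseVersion(hiStr); err != nil {
				return fieldTag{}, err
			}
		}
		if r.lo > r.hi || r.lo <= prevHi {
			return fieldTag{}, fmt.Errorf("unsorted or overlapping range %q", m)
		}
		if r.lo > current || (r.hi != openEnd && r.hi > current) {
			return fieldTag{}, fmt.Errorf("range %q names a future version", m)
		}
		out.ranges = append(out.ranges, r)
		prevHi = r.hi
	}
	return out, nil
}

// measuredAt reports whether the field is in the hash at version v, and
// whether it is always emitted there.
func (t fieldTag) measuredAt(v int) (measured, always bool) {
	for _, r := range t.ranges {
		if r.lo <= v && v <= r.hi {
			return true, r.always
		}
	}
	return false, false
}

// missingFromLedger is the oracle check: it lists fields whose tag skips the
// current version but which have no entry in the dropped-fields ledger.
func missingFromLedger(tags map[string]fieldTag, dropped map[string]bool, current int) []string {
	var bad []string
	for name, t := range tags {
		if t.excluded {
			continue // "-" fields are policed by the exclusion ledger instead
		}
		if m, _ := t.measuredAt(current); !m && !dropped[name] {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad)
	return bad
}
```

With current version 2, a field tagged `v1..v1` and no ledger entry is reported by `missingFromLedger`, which is the wrong/narrow-range case from the bullet list above. Adding the field to the ledger silences the report. The ledger is passed in separately from the tags on purpose, so a mistagged field cannot suppress its own check.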
@@ -472,7 +485,7 @@ Lazy migration means an untouched lock can sit at an old version **indefinitely* - **`minSupportedLockContentVersion`** is a hard floor. A lock below it cannot be replayed and is treated as `Stale`. Dropping a registry entry is therefore a deliberate, breaking, announced act — never incidental cleanup. - **`component migrate`** (Open Q#5, promoted to a requirement) force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. **Contract:** it is *offline* — it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). It *does*, however, move every *fingerprint* digest when it retires a fingerprint algorithm — advancing that algorithm is the whole point — so a fleet-wide migrate of that kind **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved digest as notable, exactly as [the synthetic-history trap](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. (A migrate that retires only a *resolution* algorithm moves the shared prefix but not the `InputFingerprint` digest, so it is correctly release-silent.) That is *why* migrate is reset-grade and rare, not a free background sweep — the release churn is the deliberate cost of retiring a version. 
The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal — each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). -- **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine — left alone, the registry, golden vectors, and deprecated tombstone fields grow **append-only** (a real cost the opaque-token model accepts; see the manifest alternative). Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion − minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. The spread, not the absolute version number, is the quantity kept small. **Residual, stated honestly:** if genuine algorithm changes arrive *faster* than planned rebuilds, the spread ceiling becomes a hard stop that *forces* an unplanned, release-grade `component migrate` — the one thing the design says is always deliberate. The ceiling does not eliminate the expensive event; it bounds the backlog by *converting* an unbounded version spread into an occasional forced migrate. That is the accepted price of lazy-forever coexistence, not a contradiction to hide. +- **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine — left alone, the registry, golden vectors, and deprecated tombstone fields grow **append-only** (a real cost the opaque-token model accepts; see the manifest alternative). Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion − minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. 
The spread, not the absolute version number, is the quantity kept small. **Early-warning ramp:** the ceiling is a *warning at ceiling−1*, a hard failure only at the ceiling — so an approaching floor-raise surfaces as a heads-up on the PR *before* the one that registers `v(N+1)`, converting the forced migrate from a surprise blocking failure into a planned event (the design's stated goal that nothing *unplanned* ever forces a migrate). **Residual, stated honestly:** if genuine algorithm changes arrive *faster* than planned rebuilds, the ceiling still ultimately *forces* an unplanned, release-grade `component migrate`. The ceiling does not eliminate the expensive event; it bounds the backlog by *converting* an unbounded version spread into an occasional forced migrate, with one version of advance notice. That is the accepted price of lazy-forever coexistence, not a contradiction to hide. **Mixed-toolchain hazard — bounded by the version-pin, not auto-repair.** The classic trap is an older binary regressing a newer lock. Because the lock *format* never bumps, an old binary *can* write a reset lock, stamping a legacy (prefix-less) or lower-`v` hash. In the **working tree** this is self-correcting: the next new-binary run detects the sub-floor token and force-rehashes it to the current version. But "self-correcting" stops at the working tree — if a downgraded lock is **committed**, `FindFingerprintChanges` reads `v1 → legacy → v1` as two real release events, and a published `%autorelease` increment cannot be withdrawn. So the load-bearing guard against *committed* phantom releases is the **CI version-pin**: post-cutover, no old binary may run the `update`-and-commit step. (The force-rehash only cleans the working tree; it does not undo history.) 
The *symmetric* residual — a binary that predates content-version `v2` meeting a `v2` token it cannot replay — is closed by a **required** write-time guard (Open Q#5, now a requirement): refuse to write a token whose version exceeds the binary's `currentLockContentVersion`, erroring rather than silently restamping at `v1`. Note this guard lives in the binary doing the write, so it constrains *newer-but-not-newest* binaries; it does **not** retroactively constrain a genuinely *old* binary — that direction is the version-pin's job. @@ -547,9 +560,9 @@ The versioned-replay story in Part 2 must hold for **every** reader of `InputFin - **Current-tree comparators** (`checkFingerprintFreshness`, `update`'s `Changed`, `BuildDirtyChange`) recompute against *live inputs*, so they **can and must** replay at the stored token's version. Feasible and invariant-safe. - **Stored-vs-stored historical comparators** (`FindFingerprintChanges`, `changed.go`'s `classifyComponent`/`haveMatchingFingerprints`) hold only committed token *strings* from two git refs — no config, no FS, no inputs. They **cannot** replay, and replaying would require recomputing a historical fingerprint, which the [forever-invariant](#back-compat-invariant--synthetic-history-reads-stored-strings-never-recomputes) forbids outright. Both stay **string-only**, and both compare the **digest** (stripping the `v:` version prefix), which makes them inherently immune to version-only deltas — a v1→v2 re-stamp with an unchanged digest reads as "no change." (Strict-lazy churn is still the policy that keeps re-stamps from riding no-op commits in the first place, but the comparators no longer *depend* on it for correctness.) -The `changed.go` classifier was the easily-missed member of the *second* class. The fix is **not** to make it replay-aware (impossible, and invariant-violating) — it is to confirm it lives under the strict-lazy guarantee, exactly as `FindFingerprintChanges` does. 
An earlier draft of this table wrongly demanded replay here; that obligation is removed. +The `changed.go` classifier was the easily-missed member of the *second* class. The fix is **not** to make it replay-aware (impossible, and invariant-violating) — it is to give it the same **digest-compare** as `FindFingerprintChanges`, so a version-only delta reads as "no change." An earlier draft of this table wrongly demanded replay here; that obligation is removed. -**Why this contract is enforced by prose, not a type — and why that is acceptable here.** A reviewer-vigilance rule across ≥5 comparison sites is exactly the kind of discipline this RFC elsewhere converts to structure (the atomic token, D3). A stronger guard exists: model the stored hash as an opaque `fingerprint.Token` type (unexported internals) routed through one `Reconcile(lock) → {Fresh | Stale | RestampTo(v)}` API, so a *7th* consumer physically cannot write raw `==` and silently land in the wrong class. We **defer** it deliberately, for one reason that distinguishes it from the token-vs-manifest decision: it is **not a one-way door.** The `Token` type is a pure post-reset refactor that can land any time without touching the on-disk format. And the failure mode of the prose contract is *safe-by-construction*: a forgotten replay site compares a stored token to a current-version recompute, which under a version mismatch yields **in**equality → a spurious `Stale`/`Changed` → a wasteful rebuild (G1 churn), **never** a false match → stale artifact (G5). The dangerous direction is unreachable by omission. So the `Token` type is a *recommended follow-up* (it would also tidy N1's six existing sites), not a reset precondition. Tracked, not blocking. 
+**This contract is enforced by a type, not prose — the `fingerprint.Token` choke-point lands in PR C.** A reviewer-vigilance rule across the five-plus comparison sites is exactly the kind of discipline this RFC elsewhere converts to structure (the atomic token, D3), and the digest-comparison rework *raised* the stakes: it scattered `v:` prefix-parsing across two historical + three current-tree sites, and it blessed a **copyable** "digest-compare two stored strings" template. The residual hazard is therefore no longer mere *omission* — a forgotten replay at a current-tree site fails *safely* toward inequality → spurious `Stale`/`Changed` → wasteful rebuild (G1 churn, never G5) — it is *mis-classification*: a future consumer that holds live inputs but cargo-cults the **historical** stored-string template never looks at those inputs → silently accepts a stale tree → reachable G5. Omission is safe; mis-classification is not, and only structure closes it. So PR C (which already edits every one of those sites) introduces an opaque `fingerprint.Token` type (unexported internals) with **one strict parser** — `ParseToken` accepts only `sha256:` (legacy) and `v:sha256:`, treats any malformed token as *changed* (never normalizing a parse failure to an empty digest), and is the *sole* way to read a stored hash — routed through a single `Reconcile(lock) → {Fresh | Stale | RestampTo(v)}` API. A raw `==` on a stored hash outside that package will not compile. This converts the prose two-class contract into a compiler-enforced one at zero marginal cost, because the sites are being touched anyway. (Not a one-way door — the `Token` type has no on-disk-format dependency — but there is no reason to touch all five sites twice and carry the mis-classification window in between.) ### The synthetic changelog/release path is the real hazard @@ -620,9 +633,9 @@ The lock **format** `Version` stays at `1`. 
An earlier draft bumped it (1→2) a The reset (Part 1) must land as one coherent change at the dev→prod cutover; its pieces are independently reviewable but ship together because they all move the hash. -1. **PR A (substrate)**: the canonical encoder (`canonicalBuf`, `emit` with a per-range always flag), the generic tag-driven `project(cfg, N)` walker (recursing into nested fingerprinted structs) + version-set tag parser, **version tags on every fingerprinted field** (absent → build failure), the frozen **TOML-key** emit rule, the `reflect.Value.IsZero()` omit-predicate, the `sha256` combiner, the golden vectors **and the coverage invariant** (every field measured at a retained version appears non-zero in ≥1 vector, with a discrimination check). The mandatory-tag test plus the slimmed exclusion ledger replace the retired `TestAllFingerprintedFieldsHaveDecision` audit — the inclusion default is now native to the tag, the exclusion default stays ledgered. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. Unit tests: a field tagged `v2..*` is invisible to a v1 projection; a `!`-prefixed range hashes even at zero; a field with **no** `fingerprint` tag fails the build; a **nested** fingerprinted struct with a tagless field fails the build; a **Go-field rename keeping the TOML key** yields a byte-identical digest; the **coverage/discrimination** test fails when a build-effective field is tagged too narrowly (`v1..v1` at current `v2`); golden vectors pin v1; editing a shipped version's output for an existing config fails a golden vector; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser. +1. 
**PR A (substrate)**: the canonical encoder (`canonicalBuf`, `emit` with a per-range always flag), the generic tag-driven `project(cfg, N)` walker (recursing into nested fingerprinted structs) + version-set tag parser, **version tags on every fingerprinted field** (absent → build failure), the frozen **TOML-key** emit rule, the `reflect.Value.IsZero()` omit-predicate, the `sha256` combiner, the golden vectors **and the coverage invariant** (every field measured at a retained version appears non-zero in ≥1 vector, with a discrimination check). The mandatory-tag test plus the slimmed exclusion ledger replace the retired `TestAllFingerprintedFieldsHaveDecision` audit — the inclusion default is now native to the tag, the exclusion default stays ledgered. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. Unit tests: a field tagged `v2..*` is invisible to a v1 projection; a `!`-prefixed range hashes even at zero; a field with **no** `fingerprint` tag fails the build; a **nested** fingerprinted struct with a tagless field fails the build; a **Go-field rename keeping the TOML key** yields a byte-identical digest; two fields colliding on one emit-key fail the build; a field whose range excludes the current version without a **dropped-fields-ledger** entry fails the coverage/discrimination check; the **coverage/discrimination** test fails when a build-effective field is tagged too narrowly (`v1..v1` at current `v2`); golden vectors pin v1; editing a shipped version's output for an existing config fails a golden vector; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser. 2. **PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. Lock format `Version` stays `1`. Ships at the cutover; absorbed by the scheduled rebuild. 
Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. -3. **PR C (Part 2 machinery)**: the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`), `ComputeIdentityAt`, and replay at the three *current-tree* sites — replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, and `BuildDirtyChange`. The two *historical* comparators (`FindFingerprintChanges`, `changed.go`'s `classifyComponent`) switch to **digest-compare** (strip the `v:` prefix), not replay. Resolution replay reserved (slot reuses `computeRes1`). **Not fully inert:** this PR switches the live current-tree compares from raw-string to replay-aware *on merge* — only the *registry dispatch* is dormant while just `v1` exists, and `BuildDirtyChange`'s replay is a hard prerequisite for any later PR that registers `v2`. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event. +3. **PR C (Part 2 machinery)**: the opaque **`fingerprint.Token` type** (unexported internals) with the single strict `ParseToken` (accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match) and the `Reconcile(lock) → {Fresh | Stale | RestampTo(v)}` API; the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **all five** comparison sites through `Token`/`Reconcile` — replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, `BuildDirtyChange` (the three current-tree sites), plus digest-compare in `FindFingerprintChanges` and `changed.go`'s `classifyComponent` (the two historical sites). Resolution replay reserved (slot reuses `computeRes1`). 
**Not fully inert:** this PR switches the live compares from raw-string to `Token`-routed *on merge* — only the *registry dispatch* is dormant while just `v1` exists, and `BuildDirtyChange`'s replay is a hard prerequisite for any later PR that registers `v2`. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a stored hash outside the `fingerprint` package fails to compile. 4. **PR D (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) — add a field absent from `projectV1` and set it on one component; assert only that lock drifts and every other lock is byte-identical. 5. **PR E (config schema axis, later)**: `schema-version` field + load-time canonical migration + the `config migrate` command. Gated on the first post-reset non-additive TOML change not already absorbed by the reset's normalization pass. @@ -633,4 +646,4 @@ Each PR is independently revertible up to the cutover. PRs A–B land together a 1. Should a lazy re-stamp during a *read-only* command (`render`, `build` freshness check) write the lock back, or defer all writes to `component update`? Writing on read is surprising; deferring means freshness checks stay slightly slower until the next update. (Leaning: defer all writes to `update`, keeping reads side-effect-free.) 2. For the config schema axis, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration. 
-*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; the **canonical byte encoding is the existing length-prefixed `:=:` form** used by `combineInputs`, committed and pinned by golden vectors at the reset (former Open Q#4 — a precondition for PR A, not an open question, because the reset makes it irreversible); the **version write-guard is a requirement, not an option** (former Open Q#5): a binary refuses to write a token whose version exceeds its own `currentLockContentVersion`, and the CI version-pin prevents *old* binaries from committing downgrades; **field membership is declared in mandatory per-field version-set tags** (`fingerprint:"v1..*"`; absent → build failure, `!`-prefix for always-emit), read by one generic walker — this restores "forgotten field → loud build failure" (G-1) natively; the **emit-key is the frozen TOML key** (never the Go field name, so a field rename is byte-neutral), the **omit-predicate is `reflect.Value.IsZero()`** (former Open Q#3), and the tag DSL is **frozen at three operators**; frozen-ness rests on the **golden-vector coverage invariant** (every field measured at a retained version appears non-zero in ≥1 vector, with a discrimination check) plus a **kept exclusion ledger** for `-` fields (the inclusion default is native to the tag; the *exclusion* default stays ledgered because it is the G5-dangerous direction); baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a 
historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling) and is itself a deliberate release-grade event; one shared content version covers both stored hashes **permanently** (no split) — the historical changelog/classifier comparators compare the **digest** (stripping the `v:` prefix), so advancing the shared prefix for a resolution-only algorithm change moves no digest and mints no release; resolution replay stays reserved (slot present, `computeRes1` reused) until `ComputeResolutionHash` first changes. +*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; the **canonical byte encoding is the existing length-prefixed `:=:` form** used by `combineInputs`, committed and pinned by golden vectors at the reset (former Open Q#4 — a precondition for PR A, not an open question, because the reset makes it irreversible); the **version write-guard is a requirement, not an option** (former Open Q#5): a binary refuses to write a token whose version exceeds its own `currentLockContentVersion`, and the CI version-pin prevents *old* binaries from committing downgrades; **field membership is declared in mandatory per-field version-set tags** (`fingerprint:"v1..*"`; absent → build failure, `!`-prefix for always-emit), read by one generic walker — this restores "forgotten field → loud build failure" (G-1) natively; the **emit-key is the frozen TOML key** (never the Go field name, so a field rename is byte-neutral; `key=` overrides for keyless fields; duplicate emit-keys fail the build), the **omit-predicate is `reflect.Value.IsZero()`** (former Open Q#3), and the tag DSL 
is **frozen at three range-operators** (`..`, `!`, `*`) plus the orthogonal `key=`; frozen-ness rests on the **golden-vector coverage invariant** (every field measured at a retained version appears non-zero in ≥1 vector, with a discrimination check) whose oracle is a **tag-independent dropped-fields ledger** (a field absent from it must discriminate at the *current* version), plus a **kept exclusion ledger** for `-` fields (the inclusion default is native to the tag; the *exclusion* default stays ledgered because it is the G5-dangerous direction); the stored hash is read only through an opaque **`fingerprint.Token`** type with one strict parser and a `Reconcile` API (adopted in PR C, closing the comparator mis-classification hazard structurally rather than by prose); baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling) and is itself a deliberate release-grade event; one shared content version covers both stored hashes **permanently** (no split) — the historical changelog/classifier comparators compare the **digest** (stripping the `v:` prefix), so advancing the shared prefix for a resolution-only algorithm change moves no digest and mints no release; resolution replay stays reserved (slot present, `computeRes1` reused) until `ComputeResolutionHash` first changes. 
From dddb71595561d4a81d9209fce86819e922202545 Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Mon, 8 Jun 2026 16:35:18 -0700 Subject: [PATCH 07/15] cleanup --- docs/developer/rfc/lazy-schema-migration.md | 54 ++++++++++----------- 1 file changed, 26 insertions(+), 28 deletions(-) diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md index 6f234402..d223ca10 100644 --- a/docs/developer/rfc/lazy-schema-migration.md +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -139,7 +139,7 @@ Not every config change should be treated the same way. The right mechanism depe | ----- | ------- | ------------------------------ | --------- | | **Additive field** | new `foo` field, unset on most components | No — only setters drift | **Free, no bump.** Tag the new field `vN..*` (current version, omit-if-zero); a component that leaves it unset emits identical bytes, so no shipped hash moves — adding an omit-if-zero field to the live version is the one output-preserving no-bump edit. Setters drift (correct). | | **Additive with non-zero default** | new field defaulted to `"auto"` via defaults merge | No | **Bump + replay.** The default resolves non-zero on *every* component, so it is emitted everywhere and would move every hash — omit-if-zero can't save it. Bump and tag the field `v(N+1)..*`; old locks **replay at their version** (whose set excludes it), match their stored digest → recognized unchanged → lazy re-stamp, no rebuild. | -| **Default change on an *existing* field** | bump `jobs` default `4`→`8` | Yes — every component's effective input moved | **Not lazy-maskable.** Replay recomputes the *current* config (now resolving to `8`) under the old algorithm → `jobs=8` ≠ stored `jobs=4` → honest fleet-wide drift; replay cannot suppress it because the resolved value genuinely changed for everyone. 
Escape hatch: `config migrate` writes the *old* resolved value explicitly (`jobs=4`) into each config **before** moving the default — existing components then pin the old value (no drift) and only new components pick up `8`. Without that pre-pass it is a legitimate (if large) fleet rebuild, not a bug. | +| **Default change on an *existing* field** | bump `jobs` default `4`→`8` | Yes — every component's effective input moved | **Not lazy-maskable.** Replay recomputes the *current* config (now resolving to `8`) under the old algorithm → `jobs=8` ≠ stored `jobs=4` → genuine fleet-wide drift; replay cannot suppress it because the resolved value genuinely changed for everyone. Escape hatch: `config migrate` writes the *old* resolved value explicitly (`jobs=4`) into each config **before** moving the default — existing components then pin the old value (no drift) and only new components pick up `8`. Without that pre-pass it is a legitimate (if large) fleet rebuild, not a bug. | | **Rename / move** | `foo` → `bar`, same semantics | No | **Schema migration + bump + replay.** Migrate old TOML → canonical struct (the rename lands in the struct), then tag the renamed field `v(N+1)..*`. Old locks replay at their version and are recognized unchanged → lazy re-stamp, no rebuild. | | **Semantic change** | meaning of `foo` changes; output differs | Yes — that's correct | **None.** The build output genuinely differs, so the lock *should* drift. Replay at the old version would (correctly) mismatch → `Stale` → rebuild. Nothing to suppress. | | **Hashing bugfix** | overlay ordering bug in the combiner | No | **Bump + replay.** Ship the fixed combiner as the version-`N+1` half of `computeFP(N+1)`; old locks replay at the old (buggy) version. If their inputs are unchanged the buggy digest still matches → recognized unchanged → lazy re-stamp to the fixed version, no rebuild. 
| @@ -171,7 +171,7 @@ The projection substrate is what makes G4 true for old locks and what makes Part The common pattern: an **integer version stamped into the persisted artifact**, plus the ability to **read and replay older versions**, plus **lazy forward-migration on write**. We keep `ComponentLock.Version` (the lock *format* slot) fixed at `1` and carry the *content* version **inside the `InputFingerprint` token** (`v:sha256:…`) rather than in a separate struct field — one atomic value, no version/digest desync, no new TOML field for an old binary to mishandle. The Go-modules lesson is the deepest one: hashing *content* rather than struct shape is what makes additive metadata free — the canonical-projection substrate is our version of that lesson. -**Where we go *beyond* the precedent (stated honestly).** All four tools above keep exactly **one** active algorithm: Cargo/npm/Terraform rewrite the *whole* artifact to the current version on next touch (eager-on-write), and Go modules sidestep replay entirely by never re-migrating semantics. **None of them keeps N historical hashing algorithms alive simultaneously across an indefinitely-unmigrated fleet** — which is exactly Part 2's behavior. The citations support "version stamp + lazy forward-migrate on write"; they do *not* cover "frozen algorithms coexisting forever." That coexistence is justified here on its own terms (it is what avoids a fleet rebuild on every algorithm change), and its one real cost — append-only registry growth — is bounded by the [floor-advance cadence](#registry-floor-and-forced-migration), not by precedent. +**Where this design goes beyond the precedent.** All four tools above keep exactly **one** active algorithm: Cargo/npm/Terraform rewrite the *whole* artifact to the current version on next touch (eager-on-write), and Go modules sidestep replay entirely by never re-migrating semantics. 
**None of them keeps N historical hashing algorithms alive simultaneously across an indefinitely-unmigrated fleet** — which is exactly Part 2's behavior. The citations support "version stamp + lazy forward-migrate on write"; they do *not* cover "frozen algorithms coexisting forever." That coexistence is justified here on its own terms (it is what avoids a fleet rebuild on every algorithm change), and its one real cost — append-only registry growth — is bounded by the [floor-advance cadence](#registry-floor-and-forced-migration), not by precedent. ### Where the hashing logic should live @@ -263,17 +263,17 @@ It deliberately re-invents protobuf's `reserved` field-range discipline, and pro Every frozen output is byte-preserved, and the **golden vectors prove it**: the edit `v1..v1` → `v1..v1,v5..*` must leave the v1–v4 vectors identical or CI fails. The grammar lets you *express* the non-contiguous set; the golden vectors *forbid* rewriting history while doing so. Two recovery flavors, both covered: (a) field still on the struct (lingering for replay) → reopen its range; (b) field already physically deleted (floor passed it) → bring-back is just a fresh additive field tagged `vN..*`. Same outcome, no special case. -**Always-emit is per-range, for the same reason.** Whether a field's *zero value emits* can change over time just as its membership can — so `!` flags an individual range, not the whole field. `v1..v4,!v5..*` means omit-if-zero through v4, then always-emit from v5. Toggling it is an *output-changing* edit (a zero-valued field starts or stops emitting), so it lands as a new range at a new version exactly like a drop/re-add — same output-preservation rule, same golden-vector enforcement. The walker just asks "which range holds N, and is it `!`?" This is why the earlier whole-field `always` flag was wrong: it could not be toggled temporally without abandoning the generic walker. 
+**Always-emit is per-range, for the same reason.** Whether a field's *zero value emits* can change over time just as its membership can — so `!` flags an individual range, not the whole field. `v1..v4,!v5..*` means omit-if-zero through v4, then always-emit from v5. Toggling it is an *output-changing* edit (a zero-valued field starts or stops emitting), so it lands as a new range at a new version exactly like a drop/re-add — same output-preservation rule, same golden-vector enforcement. The walker just asks "which range holds N, and is it `!`?" A whole-field always-flag could not express this temporal toggle without a per-version walker. **What tags version, and what they don't.** Tags version *membership* — which fields a version measures. They do **not** version *encoding* — how a field's bytes are formed, or how the combiner folds non-field inputs. So the generic walker absorbs additive / removal / bring-back changes as pure tag edits (zero code), while a genuine encoding or combiner change still ships as versioned code in `computeFP(N+1)` (the walker output + the combiner step frozen at N). The taxonomy's non-additive rows are exactly that small set. -> **The walker still reflects the live struct — tags freeze only *membership*, golden vectors freeze the rest.** Be honest about this: `project(cfg, N)` reflects the live struct exactly as the rejected `hashstructure` substrate did (Problem 6). The version-set tag re-freezes *which fields* a version measures, but three other things the walker reads from the live struct are **not** frozen by the tag — the **emit-key** (frozen instead by the immutable-TOML-key rule above), the **per-field encoding/type** (how a value becomes bytes), and the **zero-predicate** (what counts as omittable). All three are re-frozen by **golden-vector coverage**, not by code structure: a change to any of them moves a covered vector and fails CI. 
That makes [golden-vector coverage](#golden-vector-coverage-is-the-load-bearing-invariant) the load-bearing invariant of the whole substrate — the place this design trades a *structural* guarantee for a *test* guarantee, the way the atomic token (D3) trades the other way. It is acceptable only because the coverage rule below is stated and enforced. +> **The walker still reflects the live struct — tags freeze only *membership*, golden vectors freeze the rest.** `project(cfg, N)` reflects the live struct exactly as the rejected `hashstructure` substrate did (Problem 6). The version-set tag re-freezes *which fields* a version measures, but three other things the walker reads from the live struct are **not** frozen by the tag — the **emit-key** (frozen instead by the immutable-TOML-key rule above), the **per-field encoding/type** (how a value becomes bytes), and the **zero-predicate** (what counts as omittable). All three are re-frozen by **golden-vector coverage**, not by code structure: a change to any of them moves a covered vector and fails CI. [Golden-vector coverage](#golden-vector-coverage-is-the-load-bearing-invariant) is therefore the load-bearing invariant of the whole substrate — the one place this design trades a *structural* guarantee for a *test* guarantee. It is sound only because the coverage rule below is stated and enforced. **Enforcement**, four layers — the first three are cheap syntactic guards, the fourth is the load-bearing one: -1. **No tag → build failure** makes the include/exclude decision impossible to *forget*: a field with no `fingerprint` tag fails the build. This restores the safe default the bare projection gives up (G-1's *forgotten-field* case). +1. **No tag → build failure** makes the include/exclude decision impossible to *forget*: a field with no `fingerprint` tag fails the build. 
This restores the safe inclusion default the bare projection gives up — a forgotten field would otherwise drop silently out of the hash (a G5 stale-artifact hazard). 2. **Well-formedness:** ranges parse, are sorted and non-overlapping, and name no version above `currentLockContentVersion` (open `*` excepted); a `!` prefix is per-range and orthogonal; **emit-keys are unique within every retained version**. A malformed, future-referencing, or key-colliding set fails the build. -3. **Exclusion ledger (kept, not retired).** A field tagged `fingerprint:"-"` is the **dangerous** direction — it removes the field from the hash, the G5-violating way to ship a stale artifact — so it gets *more* scrutiny, not less. An enumerated list (the surviving half of today's `expectedExclusions`) names every `-` field with a one-line justification; adding or removing a `-` requires editing the list, so an accidental exclusion fails CI. (An *earlier* draft of this RFC claimed mandatory tags "retire `expectedExclusions` outright" — that overreached: mandatory tags fix the *forgotten*-field default, but only an independent ledger catches a *wrongly-excluded* field, since no coverage test exercises a field that claims to be unmeasured.) +3. **Exclusion ledger (kept, not retired).** A field tagged `fingerprint:"-"` is the **dangerous** direction — it removes the field from the hash, the G5-violating way to ship a stale artifact — so it gets *more* scrutiny, not less. An enumerated list (the surviving half of today's `expectedExclusions`) names every `-` field with a one-line justification; adding or removing a `-` requires editing the list, so an accidental exclusion fails CI. Mandatory tags fix the *forgotten*-field default, but only this independent ledger catches a *wrongly-excluded* field — no coverage test exercises a field that claims to be unmeasured. 4. **Golden-vector coverage** — the keystone (next). 
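The well-formedness guard in layer 2 can be sketched as a small parser for the version-set tag. This is a hedged illustration rather than the walker's real code: it assumes every range is written `lo..hi` (as in `v1..v4,!v5..*`), it does not handle the `-` exclusion tag or the `key=` override, and the names `versionRange`, `parseVersionSet`, and `measuredAt` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// open marks a range whose upper bound is "*".
const open = math.MaxInt

// versionRange is one inclusive range of content versions.
type versionRange struct {
	lo, hi     int
	alwaysEmit bool // the range carried a leading "!"
}

// parseV accepts "v" followed by a positive decimal number.
func parseV(s string) (int, error) {
	if !strings.HasPrefix(s, "v") {
		return 0, fmt.Errorf("version %q: missing v prefix", s)
	}
	n, err := strconv.Atoi(s[1:])
	if err != nil || n < 1 {
		return 0, fmt.Errorf("version %q: want a positive integer", s)
	}
	return n, nil
}

// parseVersionSet parses a tag and enforces the layer-2 rules: ranges parse,
// are sorted and non-overlapping, and name no version above current
// (an open "*" upper bound excepted).
func parseVersionSet(tag string, current int) ([]versionRange, error) {
	var out []versionRange
	for _, part := range strings.Split(tag, ",") {
		var r versionRange
		if strings.HasPrefix(part, "!") {
			r.alwaysEmit = true
			part = part[1:]
		}
		bounds := strings.Split(part, "..")
		if len(bounds) != 2 {
			return nil, fmt.Errorf("range %q: want lo..hi", part)
		}
		lo, err := parseV(bounds[0])
		if err != nil {
			return nil, err
		}
		r.lo, r.hi = lo, open
		if bounds[1] != "*" {
			if r.hi, err = parseV(bounds[1]); err != nil {
				return nil, err
			}
		}
		if r.lo > r.hi {
			return nil, fmt.Errorf("range %q: lower bound above upper bound", part)
		}
		if r.lo > current || (r.hi != open && r.hi > current) {
			return nil, fmt.Errorf("range %q: names a version above v%d", part, current)
		}
		if len(out) > 0 && r.lo <= out[len(out)-1].hi {
			return nil, fmt.Errorf("range %q: unsorted or overlapping", part)
		}
		out = append(out, r)
	}
	return out, nil
}

// measuredAt answers the walker's question: which range holds n, and is it "!"?
func measuredAt(set []versionRange, n int) (included, alwaysEmit bool) {
	for _, r := range set {
		if n >= r.lo && n <= r.hi {
			return true, r.alwaysEmit
		}
	}
	return false, false
}

func main() {
	set, err := parseVersionSet("v1..v4,!v5..*", 5)
	if err != nil {
		fmt.Println("bad tag:", err)
		return
	}
	for _, n := range []int{3, 5} {
		in, always := measuredAt(set, n)
		fmt.Printf("v%d: included=%t alwaysEmit=%t\n", n, in, always)
	}
}
```

Because an open range stores the largest possible upper bound, the overlap check also rejects any range listed after a `*` range. This guard covers only the syntax of a tag; whether a syntactically valid tag is the right one remains the job of the ledgers and golden vectors.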
#### Golden-vector coverage is the load-bearing invariant @@ -282,7 +282,7 @@ Frozen-ness moved from a *structural* property (the rejected hand-written functi > **Coverage invariant:** every field whose tag set includes a retained version MUST appear, **non-zero**, in at least one retained golden vector — and a discrimination check must vary it and assert the version's hash *moves*. A field that is never exercised non-zero is invisible to CI, so any drift in its emit-key, encoding, or zero-handling would pass silently. -**The oracle must be independent of the tag — the keystone's keystone.** The subtle trap: if the discrimination check derives *which versions to test* from the same range tag it is meant to police, a wrong-narrow tag silences its own check. A field build-effective today but mistagged `v1..v1` (while `currentLockContentVersion` is `v2`) tells a tag-derived generator "only check v1" — so the v2 discrimination check the invariant promises **never runs**, and the field silently drops out of the current hash (G5). The fix is a tag-independent oracle: **every fingerprinted field on the struct is expected to be measured-and-discriminating at the *current* version, unless it appears in a reviewed *dropped-fields ledger*** (sibling to the `-` exclusion ledger, naming each field intentionally closed at version N with a justification). The coverage test reads that ledger, *not* the range tag, to decide what to assert — so a range that excludes the current version without a matching ledger entry fails CI. The tag is the implementation; the ledger is the oracle; they are checked against each other. +**The oracle must be independent of the tag.** The subtle trap: if the discrimination check derives *which versions to test* from the same range tag it is meant to police, a wrong-narrow tag silences its own check. 
A field build-effective today but mistagged `v1..v1` (while `currentLockContentVersion` is `v2`) tells a tag-derived generator "only check v1" — so the v2 discrimination check the invariant promises **never runs**, and the field silently drops out of the current hash (G5). The fix is a tag-independent oracle: **every fingerprinted field on the struct is expected to be measured-and-discriminating at the *current* version, unless it appears in a reviewed *dropped-fields ledger*** (sibling to the `-` exclusion ledger, naming each field intentionally closed at version N with a justification). The coverage test reads that ledger, *not* the range tag, to decide what to assert — so a range that excludes the current version without a matching ledger entry fails CI. The tag is the implementation; the ledger is the oracle; they are checked against each other. This single rule mechanically closes three holes at once, each otherwise a silent stale-artifact (G5) bug that escapes CI *precisely because* some field is unset in the vectors: @@ -367,7 +367,7 @@ The stored hash becomes a single self-describing token: input-fingerprint = "v1:sha256:9f86d0…" # :: ``` -One field carries both the content version and the digest, so they cannot be written out of step (a class of desync bug the prior split-field design was exposed to). Parsing splits on `:`; an absent prefix on a pre-reset lock reads as the legacy format. +One field carries both the content version and the digest, so they cannot be written out of step (the desync bug a split version/digest field invites). Parsing splits on `:`; an absent prefix on a pre-reset lock reads as the legacy format. The lock **format** `Version` stays at `1`. The on-disk *schema* is unchanged — same fields, same TOML shape — so an old binary still parses a reset lock and reads its pins (`upstream-commit`, `import-commit`, `manual-bump`), which is all it needs to queue a build. 
What changes is the *value* of `InputFingerprint`: the substrate swap is expressed purely as a content-version step, and the reset is the **first forced upgrade** to the `v1:` token. The existing singleton `Parse` gate (`Version == 1`) is left untouched; all substrate/version reconciliation routes through the content-version registry instead of a format gate. @@ -441,9 +441,9 @@ This resolves Problems 2 (for default changes), 3 (hashing bugfixes), and 5 (pie `ComponentLock` carries two persisted content hashes: `InputFingerprint` (render inputs, via `projectVN` + `sha256`) and `ResolutionInputHash` (upstream-resolution inputs — a flat SHA256 over seven explicit fields in `ComputeResolutionHash`). Both have the **same evolution problem**: appending an input or reordering the fold moves every lock's hash → G1 churn. -We version them with **one shared integer** (the token's `v` prefix), not two axes, because: they co-locate in a single lock, they are written in the same `update` pass, and a paired registry lets either evolve independently while the other reuses its prior function. Two separate version fields would double the floor/replay/migrate machinery for an input set (`ResolutionInputHash`) that changes rarely — YAGNI. **The shared integer is permanent, made safe by digest-comparison.** The one hazard a shared prefix could create: a *resolution-only* algorithm bump drags the `InputFingerprint` token's prefix `v1`→`v2` while its digest is unchanged (the fingerprint algorithm was reused), and a *full-token* changelog walker would misread that prefix move as a release. We close it not by splitting the version but by having the historical changelog/classifier comparators compare the **digest** (the `:` tail), stripping the `v:` prefix: a resolution-only bump moves the prefix but not the digest → no phantom release; a real input change moves the digest → fires. 
Both fields are always co-written in the same `update` pass and the prefix advances whenever *either* algorithm advances, so the single prefix stays an honest version for both. (See [the synthetic-history path](#the-synthetic-changelogrelease-path-is-the-real-hazard).) +We version them with **one shared integer** (the token's `v` prefix), not two axes, because: they co-locate in a single lock, they are written in the same `update` pass, and a paired registry lets either evolve independently while the other reuses its prior function. Two separate version fields would double the floor/replay/migrate machinery for an input set (`ResolutionInputHash`) that changes rarely — YAGNI. **The shared integer is permanent, made safe by digest-comparison.** The one hazard a shared prefix could create: a *resolution-only* algorithm bump drags the `InputFingerprint` token's prefix `v1`→`v2` while its digest is unchanged (the fingerprint algorithm was reused), and a *full-token* changelog walker would misread that prefix move as a release. We close it not by splitting the version but by having the historical changelog/classifier comparators compare the **digest** (the `:` tail), stripping the `v:` prefix: a resolution-only bump moves the prefix but not the digest → no phantom release; a real input change moves the digest → fires. Both fields are always co-written in the same `update` pass and the prefix advances whenever *either* algorithm advances, so the single prefix is a correct version for both. (See [the synthetic-history path](#the-synthetic-changelogrelease-path-is-the-real-hazard).) -**Phasing.** The atomic token format (`v:sha256:…`) is fixed at the reset, so there is no expensive-to-reverse key-naming decision left for Part 2. Fingerprint replay is wired in Part 2's first PR. 
**Resolution-hash replay is reserved, not yet wired** — the registry slot exists and `computeRes1` is reused, so the day `ComputeResolutionHash` first changes we add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`), with no schema change. Critically, `ResolutionInputHash` does **not** feed the synthetic changelog path, so its churn is a one-line lock rewrite + a wasted re-resolution, never a phantom release (unlike `InputFingerprint`; see [Downstream consumers](#downstream-fingerprint-consumers-blast-radius)). +**Phasing.** The atomic token format (`v:sha256:…`) is fixed at the reset. Fingerprint replay is wired in Part 2's first PR; **resolution-hash replay is reserved, not yet wired** — the slot exists and `computeRes1` is reused, so the day `ComputeResolutionHash` first changes we add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`), with no schema change. The deferral is safe because of its smaller blast radius — see [`ResolutionInputHash`](#resolutioninputhash--shares-the-version-replay-deferred). #### Churn-avoidance policies (G1) @@ -485,7 +485,7 @@ Lazy migration means an untouched lock can sit at an old version **indefinitely* - **`minSupportedLockContentVersion`** is a hard floor. A lock below it cannot be replayed and is treated as `Stale`. Dropping a registry entry is therefore a deliberate, breaking, announced act — never incidental cleanup. - **`component migrate`** (Open Q#5, promoted to a requirement) force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. 
Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. **Contract:** it is *offline* — it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). It *does*, however, move every *fingerprint* digest when it retires a fingerprint algorithm — advancing that algorithm is the whole point — so a fleet-wide migrate of that kind **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved digest as notable, exactly as [the synthetic-history trap](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. (A migrate that retires only a *resolution* algorithm moves the shared prefix but not the `InputFingerprint` digest, so it is correctly release-silent.) That is *why* migrate is reset-grade and rare, not a free background sweep — the release churn is the deliberate cost of retiring a version. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal — each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). -- **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine — left alone, the registry, golden vectors, and deprecated tombstone fields grow **append-only** (a real cost the opaque-token model accepts; see the manifest alternative). 
Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion − minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. The spread, not the absolute version number, is the quantity kept small. **Early-warning ramp:** the ceiling is a *warning at ceiling−1*, a hard failure only at the ceiling — so an approaching floor-raise surfaces as a heads-up on the PR *before* the one that registers `v(N+1)`, converting the forced migrate from a surprise blocking failure into a planned event (the design's stated goal that nothing *unplanned* ever forces a migrate). **Residual, stated honestly:** if genuine algorithm changes arrive *faster* than planned rebuilds, the ceiling still ultimately *forces* an unplanned, release-grade `component migrate`. The ceiling does not eliminate the expensive event; it bounds the backlog by *converting* an unbounded version spread into an occasional forced migrate, with one version of advance notice. That is the accepted price of lazy-forever coexistence, not a contradiction to hide. +- **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine — left alone, the registry, golden vectors, and deprecated tombstone fields grow **append-only** (a real cost the opaque-token model accepts; see the manifest alternative). Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion − minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. The spread, not the absolute version number, is the quantity kept small. 
**Early-warning ramp:** the ceiling is a *warning at ceiling−1*, a hard failure only at the ceiling — so an approaching floor-raise surfaces as a heads-up on the PR *before* the one that registers `v(N+1)`, converting the forced migrate from a surprise blocking failure into a planned event (the design's goal that nothing *unplanned* ever forces a migrate). **Residual:** if genuine algorithm changes arrive *faster* than planned rebuilds, the ceiling still ultimately *forces* an unplanned, release-grade `component migrate`. The ceiling does not eliminate the expensive event; it bounds the backlog by *converting* an unbounded version spread into an occasional forced migrate, with one version of advance notice. This is the accepted cost of lazy-forever coexistence. **Mixed-toolchain hazard — bounded by the version-pin, not auto-repair.** The classic trap is an older binary regressing a newer lock. Because the lock *format* never bumps, an old binary *can* write a reset lock, stamping a legacy (prefix-less) or lower-`v` hash. In the **working tree** this is self-correcting: the next new-binary run detects the sub-floor token and force-rehashes it to the current version. But "self-correcting" stops at the working tree — if a downgraded lock is **committed**, `FindFingerprintChanges` reads `v1 → legacy → v1` as two real release events, and a published `%autorelease` increment cannot be withdrawn. So the load-bearing guard against *committed* phantom releases is the **CI version-pin**: post-cutover, no old binary may run the `update`-and-commit step. (The force-rehash only cleans the working tree; it does not undo history.) The *symmetric* residual — a binary that predates content-version `v2` meeting a `v2` token it cannot replay — is closed by a **required** write-time guard (Open Q#5, now a requirement): refuse to write a token whose version exceeds the binary's `currentLockContentVersion`, erroring rather than silently restamping at `v1`. 
Note this guard lives in the binary doing the write, so it constrains *newer-but-not-newest* binaries; it does **not** retroactively constrain a genuinely *old* binary — that direction is the version-pin's job. @@ -560,30 +560,28 @@ The versioned-replay story in Part 2 must hold for **every** reader of `InputFin - **Current-tree comparators** (`checkFingerprintFreshness`, `update`'s `Changed`, `BuildDirtyChange`) recompute against *live inputs*, so they **can and must** replay at the stored token's version. Feasible and invariant-safe. - **Stored-vs-stored historical comparators** (`FindFingerprintChanges`, `changed.go`'s `classifyComponent`/`haveMatchingFingerprints`) hold only committed token *strings* from two git refs — no config, no FS, no inputs. They **cannot** replay, and replaying would require recomputing a historical fingerprint, which the [forever-invariant](#back-compat-invariant--synthetic-history-reads-stored-strings-never-recomputes) forbids outright. Both stay **string-only**, and both compare the **digest** (stripping the `v:` version prefix), which makes them inherently immune to version-only deltas — a v1→v2 re-stamp with an unchanged digest reads as "no change." (Strict-lazy churn is still the policy that keeps re-stamps from riding no-op commits in the first place, but the comparators no longer *depend* on it for correctness.) -The `changed.go` classifier was the easily-missed member of the *second* class. The fix is **not** to make it replay-aware (impossible, and invariant-violating) — it is to give it the same **digest-compare** as `FindFingerprintChanges`, so a version-only delta reads as "no change." An earlier draft of this table wrongly demanded replay here; that obligation is removed. +The `changed.go` classifier is the easily-missed member of the *second* class: it must get the same **digest-compare** as `FindFingerprintChanges`, so a version-only delta reads as "no change" — not a replay (which it cannot do, holding no inputs). 
-**This contract is enforced by a type, not prose — the `fingerprint.Token` choke-point lands in PR C.** A reviewer-vigilance rule across the five-plus comparison sites is exactly the kind of discipline this RFC elsewhere converts to structure (the atomic token, D3), and the digest-comparison rework *raised* the stakes: it scattered `v:` prefix-parsing across two historical + three current-tree sites, and it blessed a **copyable** "digest-compare two stored strings" template. The residual hazard is therefore no longer mere *omission* — a forgotten replay at a current-tree site fails *safely* toward inequality → spurious `Stale`/`Changed` → wasteful rebuild (G1 churn, never G5) — it is *mis-classification*: a future consumer that holds live inputs but cargo-cults the **historical** stored-string template never looks at those inputs → silently accepts a stale tree → reachable G5. Omission is safe; mis-classification is not, and only structure closes it. So PR C (which already edits every one of those sites) introduces an opaque `fingerprint.Token` type (unexported internals) with **one strict parser** — `ParseToken` accepts only `sha256:` (legacy) and `v:sha256:`, treats any malformed token as *changed* (never normalizing a parse failure to an empty digest), and is the *sole* way to read a stored hash — routed through a single `Reconcile(lock) → {Fresh | Stale | RestampTo(v)}` API. A raw `==` on a stored hash outside that package will not compile. This converts the prose two-class contract into a compiler-enforced one at zero marginal cost, because the sites are being touched anyway. (Not a one-way door — the `Token` type has no on-disk-format dependency — but there is no reason to touch all five sites twice and carry the mis-classification window in between.) 
+**This contract is enforced by a type, not prose — the `fingerprint.Token` choke-point.** A reviewer-vigilance rule across the five-plus comparison sites is the kind of discipline this RFC elsewhere converts to structure (the atomic token, D3), and digest-comparison widens the surface: `v:` prefix-parsing now lives at two historical + three current-tree sites, and the "digest-compare two stored strings" pattern is **copyable**. The residual hazard is therefore not mere *omission* — a forgotten replay at a current-tree site fails *safely* toward inequality → spurious `Stale`/`Changed` → wasteful rebuild (G1 churn, never G5) — it is *mis-classification*: a future consumer that holds live inputs but copies the **historical** stored-string template never looks at those inputs → silently accepts a stale tree → reachable G5. Omission is safe; mis-classification is not, and only structure closes it. So an opaque `fingerprint.Token` type (unexported internals) carries **one strict parser** — `ParseToken` accepts only `sha256:` (legacy) and `v:sha256:`, treats any malformed token as *changed* (never normalizing a parse failure to an empty digest), and is the *sole* way to read a stored hash — routed through a single `Reconcile(lock) → {Fresh | Stale | RestampTo(v)}` API. A raw `==` on a stored hash outside that package will not compile. This lands in PR C, which already edits every one of those sites; it has no on-disk-format dependency, so there is no reason to touch all five sites twice and carry the mis-classification window in between. ### The synthetic changelog/release path is the real hazard [`synthistory.go`](../../../internal/app/azldev/core/sources/synthistory.go) turns fingerprint movement into **user-visible, shipped** package state — `%autochangelog` entries and `%autorelease` increments. There are two distinct comparators, and the design resolves them asymmetrically. 
-- **`FindFingerprintChanges` (historical walker)** compares `InputFingerprint` across the lock's git history and emits a synthetic changelog/release entry on every change. It compares the **digest** (stripping the `v:` version prefix), not the full token — a one-line string operation, not the infeasible version-aware replay (it has only committed *strings*, no inputs). So a version-only re-stamp (a lazy v1→v2 with an unchanged digest, or a resolution-only bump that advances the shared prefix) is **invisible** to it; only a moved digest — a genuine input change — fires. The migration folds honestly into the real change's entry that carries it. **Design decision:** the v1→v2 conversion is an *accepted, per-component, notable* changelog event that piggybacks a real change, and digest-comparison makes that robust by construction rather than reliant on perfect lazy discipline. - - **`component migrate` is release-grade *when it moves digests*.** A migrate that retires a *fingerprint* algorithm re-stamps every unchanged lock from `computeFP1`'s digest to `computeFP2`'s — the digests move, the walker fires, and the fleet-wide release is the deliberate cost ([registry floor](#registry-floor-and-forced-migration)). A migrate that retires only a *resolution* algorithm moves the shared prefix but not the `InputFingerprint` digest, so it is correctly release-silent. Either way the firing is *honest* (a real digest move), never a phantom-prefix artifact — which is exactly what the strict-lazy policy used to guard against and digest-comparison now guarantees structurally. +- **`FindFingerprintChanges` (historical walker)** compares `InputFingerprint` across the lock's git history and emits a synthetic changelog/release entry on every change. It compares the **digest** (stripping the `v:` version prefix), not the full token — a one-line string operation, not the infeasible version-aware replay (it has only committed *strings*, no inputs). 
So a version-only re-stamp (a lazy v1→v2 with an unchanged digest, or a resolution-only bump that advances the shared prefix) is **invisible** to it; only a moved digest — a genuine input change — fires, and the migration folds into the real change's entry that carries it. The v1→v2 conversion is thus an *accepted, per-component, notable* changelog event that piggybacks a real change, guaranteed by digest-comparison rather than by lazy-discipline. + - **`component migrate` is release-grade *when it moves digests*.** A migrate that retires a *fingerprint* algorithm re-stamps every unchanged lock from `computeFP1`'s digest to `computeFP2`'s — the digests move, the walker fires, and the fleet-wide release is the deliberate cost ([registry floor](#registry-floor-and-forced-migration)). A migrate that retires only a *resolution* algorithm moves the shared prefix but not the `InputFingerprint` digest, so it is correctly release-silent. Either way the firing tracks a real digest move, never a bare prefix change. - **`BuildDirtyChange` (live dirty check)** compares a *recomputed* current-version (v2) hash against the *stored* (possibly v1) `headLock.InputFingerprint` and declares dirty on inequality. "Accept as notable" does **not** save this path: post-switchover an *unchanged* component would read **dirty on every `render`/`build`** until re-stamped — a persistent, recurring spurious signal, worse than a one-time entry. The fix is **free**: it is the *same replay Part 2 already owes the freshness check* — replay at `headLock`'s recorded version before declaring dirty. One additional call site for logic already being written, no new mechanism. **Net:** the changelog-walker concern is not "make the walker version-aware" (hard, maybe infeasible). 
It is two cheap things — (1) the historical comparators (`FindFingerprintChanges`, `changed.go`) compare the **digest**, so a version-only delta never fires; and (2) extend the *current-tree* replay to `BuildDirtyChange` (which *does* hold live inputs), one call site for logic already being written. The reset commit is the single deliberate exception: it *is* a fleet-wide notable event, the coordinated cutover, intentionally visible. ### `ResolutionInputHash` — shares the version, replay deferred -`ComponentLock` carries a *second* persisted content hash, `ResolutionInputHash`, with its own staleness logic and its own silent-write path (it writes when only `resHashChanged`, never flipping `Changed`). It has the **identical** evolution problem as `InputFingerprint`: any future change to `ComputeResolutionHash`'s algorithm moves every lock's hash — exactly the mass-churn this RFC exists to prevent. +`ComponentLock` carries a *second* persisted content hash, `ResolutionInputHash`, with its own staleness logic and its own silent-write path (it writes when only `resHashChanged`, never flipping `Changed`). It has the **identical** evolution problem as `InputFingerprint`, and the single shared content version covers it (see [Both hashes share one version](#both-hashes-share-one-version)). Two things make its replay safe to defer: -The single shared content version (the token's `v` prefix) covers it (see [Both hashes share one version](#both-hashes-share-one-version)). What differs is **blast radius**, which is why we wire its replay later, not now: +- **Smaller blast radius.** `ResolutionInputHash` does **not** feed `synthistory`, so an algorithm change can never mint a phantom changelog/release (that hazard is fingerprint-only). Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption. 
+- **No pending change.** It is a flat seven-field SHA256, not a struct walk, so the projection substrate leaves it untouched. Its registry slot stays `computeRes1` until its inputs genuinely change. -- `ResolutionInputHash` does **not** feed `synthistory` — so an algorithm change can never mint a phantom changelog/release (that hazard is fingerprint-only). Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption. -- It is a flat seven-field SHA256, not a struct walk, so the projection substrate leaves it untouched — it has no pending version event. Its registry slot stays `computeRes1` until its inputs genuinely change. - -**Decision:** the atomic token format is fixed at the reset, so there is no irreversible key-naming decision left; wire fingerprint replay in Part 2's first PR; reserve resolution replay (slot present, prior fn reused) and wire it the day `ComputeResolutionHash` first changes — a localized follow-up with no schema change. KISS/YAGNI on the second replay. **One shared version, kept honest by digest-comparison — no split.** An earlier draft planned to give `ResolutionInputHash` its *own* prefix when replay is wired, reasoning that a resolution-only bump must not advance the release-bearing `InputFingerprint` token. That split is unnecessary: the historical changelog/classifier comparators compare the **digest**, not the version prefix (see [Both hashes share one version](#both-hashes-share-one-version)), so advancing the shared prefix for a resolution-only change moves no digest and mints no release. The shared integer therefore stays permanent; when `ComputeResolutionHash` first changes we add `computeRes2`, bump the shared version, and re-stamp both fields together — the `InputFingerprint` digest is unchanged (its algorithm was reused), so the changelog walker stays correctly silent. 
+**Decision (KISS/YAGNI):** wire fingerprint replay in Part 2's first PR; reserve resolution replay (slot present, prior fn reused) and wire it the day `ComputeResolutionHash` first changes — add `computeRes2`, bump the shared version, re-stamp both fields together. No separate resolution prefix is needed: digest-comparison keeps the shared version correct for both ([above](#both-hashes-share-one-version)). ## Design decisions @@ -605,15 +603,15 @@ Both can omit zero values; the decisive difference is **whether an old algorithm Field membership lives in a per-field version-set tag (`fingerprint:"v1..*"`) read by one generic walker — not in N hand-written functions, and not in the binary include/exclude tag of today's reflective audit. Rationale: -- **The unsafe direction is the false-negative** (a meaningful field silently omitted → missed rebuild → stale artifact). A *mandatory* tag — absent → build failure — makes the include/exclude decision impossible to *forget* (G-1's forgotten-field case). The *wrongly-excluded* case (a `-` tag on a build-effective field) is caught by the kept exclusion ledger, and the *wrongly-included-but-unmeasured* case by golden-vector coverage — see [Enforcement](#golden-vector-coverage-is-the-load-bearing-invariant). +- **The unsafe direction is the false-negative** (a meaningful field silently omitted → missed rebuild → stale artifact, a G5 violation). A *mandatory* tag — absent → build failure — makes the include/exclude decision impossible to *forget*. The *wrongly-excluded* case (a `-` tag on a build-effective field) is caught by the kept exclusion ledger, and the *wrongly-included-but-unmeasured* case by golden-vector coverage — see [Enforcement](#golden-vector-coverage-is-the-load-bearing-invariant). - **Version-awareness is declarative.** A field's whole lifecycle — introduced at v3, dropped at v5, revived at v8 — is one greppable string on the field (`v3..v4,v8..*`), not a diff smeared across three function bodies. 
Recovery (bring-back) is *expressible* precisely because the set is non-contiguous. -- **Honest cost: frozen-ness is now *test*-enforced, not *structurally* enforced.** The walker reflects the live struct (the very thing Problem 6 rejected); only the version-set tag re-freezes membership, and golden-vector **coverage** re-freezes the emit-key, encoding, and zero-predicate. This is a real trade — a hand-written `projectVN` would make field removal a *compile* error — accepted because the coverage invariant turns it into an equally-blocking CI failure, and because the declarative lifecycle, native completeness, and first-class recovery are worth it. The hand-written variant is kept as [Option B](#alternatives-considered). +- **Cost: frozen-ness is *test*-enforced, not *structurally* enforced.** The walker reflects the live struct (the very thing Problem 6 rejected); only the version-set tag re-freezes membership, and golden-vector **coverage** re-freezes the emit-key, encoding, and zero-predicate. A hand-written `projectVN` would make field removal a *compile* error; the coverage invariant turns it into an equally-blocking CI failure instead, in exchange for the declarative lifecycle, native completeness, and first-class recovery. The hand-written variant is kept as [Option B](#alternatives-considered). ### D3 — Atomic self-describing token; no format bump, reconcile via force-rehash The stored hash is a single `v:sha256:` token, not separate version and digest fields. One field, written atomically, so the version and the digest can never desync (the class of bug a split-field design invites when one is written and the other is not). -The lock **format** `Version` stays at `1`. An earlier draft bumped it (1→2) as a poison pill to stop old binaries touching reset locks, but that also stops them reading pins to *queue a build* — too blunt. 
Instead, back-compat rests on two cheaper properties: the format is unchanged so every binary parses every lock, and the content-version registry **force-rehashes** any sub-floor token (legacy, or downgraded by an old binary) up to the current version. Old binaries stay useful (read pins, build); their only possible mischief — writing a legacy-substrate hash — is self-correcting on the next new-binary run, not silent corruption. Back-compat is therefore: **same format forever, reconcile fingerprints by version, never recompute history.** +The lock **format** `Version` stays at `1`. Bumping it to `2` as a poison pill — to stop old binaries touching reset locks — is too blunt: it also stops them reading pins to *queue a build*. Instead, back-compat rests on two cheaper properties: the format is unchanged so every binary parses every lock, and the content-version registry **force-rehashes** any sub-floor token (legacy, or downgraded by an old binary) up to the current version. Old binaries stay useful (read pins, build); their only possible mischief — writing a legacy-substrate hash — is self-correcting on the next new-binary run, not silent corruption. Back-compat is therefore: **same format forever, reconcile fingerprints by version, never recompute history.** ### D4 — Project to bytes, not a `ConfigHash()` method on the type @@ -624,10 +622,10 @@ The lock **format** `Version` stays at `1`. An earlier draft bumped it (1→2) a - **Incremental lazy migration on the `hashstructure` substrate** (the original plan): flip the inclusion default to omitempty via `Includable`, version the lock content, and migrate lazily — *without* a reset. Rejected: Problem 6 makes its central promise unkeepable. A "frozen" replay function built on `hashstructure.Hash` reflects the live struct, so the first field addition after the switchover moves the old algorithm's output and forces a rehash anyway. 
The incremental path therefore does not actually avoid a coordinated cutover — it defers one to the first field addition, on a substrate that makes replay unsound. With a coordinated cutover already scheduled (the dev→prod cutover), spending it once on a clean projection substrate strictly dominates. - **Global `IgnoreZeroValue`** — a blunt switch that omits *all* zero fields with no escape hatch for build-meaningful zeros, and still on the non-frozen `hashstructure` substrate. Rejected. - **Parallel versioned structs with per-struct `Hash()`** — couples locks to Go type identity and duplicates hashing logic per version. Rejected in favor of Part 2's integer-versioned combiner over frozen projections. -- **Bump the lock format `Version` 1→2 as a poison pill** (an earlier draft's choice) — makes old binaries hard-reject reset locks. Rejected: it also blocks old binaries from reading pins to queue a build, and it is unnecessary, since the content-version registry already force-rehashes any sub-floor or downgraded token (D3). Same-format + force-rehash keeps old binaries useful without risking silent corruption. +- **Bump the lock format `Version` 1→2 as a poison pill** — makes old binaries hard-reject reset locks. Rejected: it also blocks old binaries from reading pins to queue a build, and it is unnecessary, since the content-version registry already force-rehashes any sub-floor or downgraded token (D3). Same-format + force-rehash keeps old binaries useful without risking silent corruption. - **Eager fleet-wide migration as the steady-state mechanism** — rewriting every lock on every algorithm change is the mass-churn the design exists to prevent. Rejected for the steady state. The *reset* is a deliberate, one-time, operator-driven eager pass riding an already-scheduled rebuild — the sanctioned exception, not the rule; `component migrate` is its post-reset equivalent for retiring an old version. 
-- **Hand-written per-version `projectVN` selection functions (instead of version tags).** Each version gets a bespoke `func projectVN(c) []byte` with one explicit `emit`/`emitAlways` line per measured field. *Win:* field removal is **compile-enforced** — deleting a struct field a retained `projectVN` still names won't compile (the tag walker downgrades this to a CI-time golden-vector failure). *Losses:* membership is smeared across N function bodies instead of one declarative tag per field; "bring a field back a few versions later" has no first-class expression (you re-add an `emit` line, with nothing tying it to the field's earlier life); and the mandatory-decision property (G-1) needs a *separate* completeness test with an awkward field→emit-key bridge, where the tag simply *is* the ledger. Rejected in favor of version tags: the declarative lifecycle, native completeness, and expressible recovery outweigh trading one compile-time guard for an equally-blocking CI guard. -- **Per-field hash manifest in the lock (instead of one opaque token).** Store `{field → hash}` (à la `go.sum`) rather than a single `v:sha256:…` digest. *Genuine wins:* dropping a field becomes ignoring its manifest line — no projection kept alive for replay, so the **deprecate-then-delete two-step and the registry-retirement deadlock** (the append-only growth above) both vanish; and the stored-vs-stored historical comparators become structural set-diffs rather than version-blind string compares. 
*Why the opaque token still wins for azldev:* (1) the projection substrate **already** delivers additive immunity (G4) — the manifest's headline draw — so that advantage is moot, not additive; (2) the manifest does **not** kill the false-fresh hazard, contrary to first impression — an old lock has *no line* for a newly-measured input, so there is still no baseline to detect a change to it (the blind spot is relocated, not removed); (3) it makes *algorithm evolution* — the entire point of Part 2 — **harder**, needing per-field versioning where the token needs one integer for the whole algorithm; and (4) it bloats every lock to O(fields × components) (the well-known `go.sum` size cost). The manifest is the better tool for a *static* input set that mainly grows and shrinks; the opaque token + single version is the better tool for an *evolving hashing algorithm*, which is azldev's actual problem. Recorded explicitly because the reset bakes the storage model in — token-vs-manifest is irreversible after PR B — and the retirement deadlock the manifest would have dissolved is instead answered by the floor-advance cadence above. +- **Hand-written per-version `projectVN` selection functions (instead of version tags).** Each version gets a bespoke `func projectVN(c) []byte` with one explicit `emit`/`emitAlways` line per measured field. *Win:* field removal is **compile-enforced** — deleting a struct field a retained `projectVN` still names won't compile (the tag walker downgrades this to a CI-time golden-vector failure). *Losses:* membership is smeared across N function bodies instead of one declarative tag per field; "bring a field back a few versions later" has no first-class expression (you re-add an `emit` line, with nothing tying it to the field's earlier life); and the mandatory-decision property needs a *separate* completeness test with an awkward field→emit-key bridge, where the tag simply *is* the ledger. 
Rejected in favor of version tags: the declarative lifecycle, native completeness, and expressible recovery outweigh trading one compile-time guard for an equally-blocking CI guard. +- **Per-field hash manifest in the lock (instead of one opaque token).** Store `{field → hash}` (à la `go.sum`) rather than a single `v:sha256:…` digest. *Genuine wins:* dropping a field becomes ignoring its manifest line — no projection kept alive for replay, so the **deprecate-then-delete two-step and the registry-retirement deadlock** (the append-only growth above) both vanish; and the stored-vs-stored historical comparators become structural set-diffs rather than version-blind string compares. *Why the opaque token still wins for azldev:* (1) the projection substrate **already** delivers additive immunity (G4) — the manifest's headline draw — so that advantage is moot, not additive; (2) the manifest does **not** kill the false-fresh hazard — an old lock has *no line* for a newly-measured input, so there is still no baseline to detect a change to it (the blind spot is relocated, not removed); (3) it makes *algorithm evolution* — the entire point of Part 2 — **harder**, needing per-field versioning where the token needs one integer for the whole algorithm; and (4) it bloats every lock to O(fields × components) (the well-known `go.sum` size cost). The manifest is the better tool for a *static* input set that mainly grows and shrinks; the opaque token + single version is the better tool for an *evolving hashing algorithm*, which is azldev's actual problem. Recorded explicitly because the reset bakes the storage model in — token-vs-manifest is irreversible after PR B — and the retirement deadlock the manifest would have dissolved is instead answered by the floor-advance cadence above. ## Incremental delivery @@ -646,4 +644,4 @@ Each PR is independently revertible up to the cutover. PRs A–B land together a 1. 
Should a lazy re-stamp during a *read-only* command (`render`, `build` freshness check) write the lock back, or defer all writes to `component update`? Writing on read is surprising; deferring means freshness checks stay slightly slower until the next update. (Leaning: defer all writes to `update`, keeping reads side-effect-free.) 2. For the config schema axis, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration. -*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; the **canonical byte encoding is the existing length-prefixed `:=:` form** used by `combineInputs`, committed and pinned by golden vectors at the reset (former Open Q#4 — a precondition for PR A, not an open question, because the reset makes it irreversible); the **version write-guard is a requirement, not an option** (former Open Q#5): a binary refuses to write a token whose version exceeds its own `currentLockContentVersion`, and the CI version-pin prevents *old* binaries from committing downgrades; **field membership is declared in mandatory per-field version-set tags** (`fingerprint:"v1..*"`; absent → build failure, `!`-prefix for always-emit), read by one generic walker — this restores "forgotten field → loud build failure" (G-1) natively; the **emit-key is the frozen TOML key** (never the Go field name, so a field rename is byte-neutral; `key=` overrides for keyless fields; duplicate emit-keys fail the build), the **omit-predicate is `reflect.Value.IsZero()`** (former Open Q#3), and the tag DSL is **frozen at three range-operators** (`..`, `!`, `*`) plus the orthogonal `key=`; frozen-ness rests on the **golden-vector coverage invariant** (every field measured at a retained version appears non-zero in ≥1 
vector, with a discrimination check) whose oracle is a **tag-independent dropped-fields ledger** (a field absent from it must discriminate at the *current* version), plus a **kept exclusion ledger** for `-` fields (the inclusion default is native to the tag; the *exclusion* default stays ledgered because it is the G5-dangerous direction); the stored hash is read only through an opaque **`fingerprint.Token`** type with one strict parser and a `Reconcile` API (adopted in PR C, closing the comparator mis-classification hazard structurally rather than by prose); baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling) and is itself a deliberate release-grade event; one shared content version covers both stored hashes **permanently** (no split) — the historical changelog/classifier comparators compare the **digest** (stripping the `v:` prefix), so advancing the shared prefix for a resolution-only algorithm change moves no digest and mints no release; resolution replay stays reserved (slot present, `computeRes1` reused) until `ComputeResolutionHash` first changes. 
+*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; the **canonical byte encoding is the existing length-prefixed `:=:` form** used by `combineInputs`, committed and pinned by golden vectors at the reset (former Open Q#4 — a precondition for PR A, not an open question, because the reset makes it irreversible); the **version write-guard is a requirement, not an option** (former Open Q#5): a binary refuses to write a token whose version exceeds its own `currentLockContentVersion`, and the CI version-pin prevents *old* binaries from committing downgrades; **field membership is declared in mandatory per-field version-set tags** (`fingerprint:"v1..*"`; absent → build failure, `!`-prefix for always-emit), read by one generic walker — restoring "forgotten field → loud build failure" natively; the **emit-key is the frozen TOML key** (never the Go field name, so a field rename is byte-neutral; `key=` overrides for keyless fields; duplicate emit-keys fail the build), the **omit-predicate is `reflect.Value.IsZero()`** (former Open Q#3), and the tag DSL is **frozen at three range-operators** (`..`, `!`, `*`) plus the orthogonal `key=`; frozen-ness rests on the **golden-vector coverage invariant** (every field measured at a retained version appears non-zero in ≥1 vector, with a discrimination check) whose oracle is a **tag-independent dropped-fields ledger** (a field absent from it must discriminate at the *current* version), plus a **kept exclusion ledger** for `-` fields (the inclusion default is native to the tag; the *exclusion* default stays ledgered because it is the G5-dangerous direction); the stored hash is read only through an opaque **`fingerprint.Token`** type with one strict parser and a `Reconcile` API (adopted in PR C, closing the comparator 
mis-classification hazard at compile time); baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling) and is itself a deliberate release-grade event; one shared content version covers both stored hashes **permanently** (no split) — the historical changelog/classifier comparators compare the **digest** (stripping the `v:` prefix), so advancing the shared prefix for a resolution-only algorithm change moves no digest and mints no release; resolution replay stays reserved (slot present, `computeRes1` reused) until `ComputeResolutionHash` first changes. From 4ee92600e9063c8f53a8b597a69a096594d63fa2 Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Mon, 8 Jun 2026 17:30:08 -0700 Subject: [PATCH 08/15] update 6 --- docs/developer/rfc/lazy-schema-migration.md | 128 ++++++++++---------- 1 file changed, 63 insertions(+), 65 deletions(-) diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md index d223ca10..8b84da29 100644 --- a/docs/developer/rfc/lazy-schema-migration.md +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -89,7 +89,7 @@ This RFC therefore has two parts: **(1)** a one-time **reset** at the dev→prod - **G2: only real changes drift.** Post-reset, a lock changes iff that component's build-effective inputs changed. 
 - **G3: piecemeal, lazy migration post-reset.** Genuine algorithm evolution after the reset rolls out per-component, riding independent changes, never as a big-bang.
 - **G4: additive fields are drift-neutral by construction — *truly*, not just for new locks.** On the projection substrate (below) an unset additive field is invisible to *every* lock including old ones, because old versions emit only the fields their tags include — a field added later is not in any shipped version's tag set, so it cannot move an existing hash.
-- **G5: correctness backstop preserved.** Never silently under-rebuild: a genuine input change must always drift its lock. Replay may accept encoding/over-capture changes; it must never mask a behavior-changing one.
+- **G5: correctness backstop preserved — relative to the lock's own content version.** Never silently under-rebuild: a genuine input change must drift any lock *whose version measures that input*. An input a lock's version does not measure (a field introduced later, a not-yet-adopted newly-measured input) is correctly invisible until the lock migrates — lazy non-adoption is by contract, not a miss. Replay may accept encoding/over-capture changes; it must never mask a behavior-changing one within the lock's own measured set.
 - **G6 (new, hard): back-compatible reads for synthetic history.** The new binary must still **read** pre-reset locks across git history (synthetic changelog/release walks them), even though it **writes** only the new format. Reading never recomputes a historical hash — it compares stored strings only.

 ## Problem inventory
@@ -137,7 +137,7 @@ Not every config change should be treated the same way. The right mechanism depe

 | Class | Example | Should unaffected locks drift? | Mechanism |
 | ----- | ------- | ------------------------------ | --------- |
-| **Additive field** | new `foo` field, unset on most components | No — only setters drift | **Free, no bump.** Tag the new field `vN..*` (current version, omit-if-zero); a component that leaves it unset emits identical bytes, so no shipped hash moves — adding an omit-if-zero field to the live version is the one output-preserving no-bump edit. Setters drift (correct). |
+| **Additive field** | new `foo` field, unset on most components | No | **Free, no bump.** Tag the new field `vN..*` (current version, omit-if-zero); a component that leaves it unset emits identical bytes, so no shipped hash moves — adding an omit-if-zero field to the live version is the one output-preserving no-bump edit. A setter whose lock is *already at* version N drifts; a setter on an older, un-migrated lock is left unchanged (false-fresh) until that lock next re-stamps or is migrated — the same lazy contract as a newly-measured input. To force the field onto the whole fleet now, do an explicit `component migrate`. **Tagging a build-meaningful-zero field `!vN..*` (always-emit) on the live version is *not* this case** — see the note below. |
 | **Additive with non-zero default** | new field defaulted to `"auto"` via defaults merge | No | **Bump + replay.** The default resolves non-zero on *every* component, so it is emitted everywhere and would move every hash — omit-if-zero can't save it. Bump and tag the field `v(N+1)..*`; old locks **replay at their version** (whose set excludes it), match their stored digest → recognized unchanged → lazy re-stamp, no rebuild. |
 | **Default change on an *existing* field** | bump `jobs` default `4`→`8` | Yes — every component's effective input moved | **Not lazy-maskable.** Replay recomputes the *current* config (now resolving to `8`) under the old algorithm → `jobs=8` ≠ stored `jobs=4` → genuine fleet-wide drift; replay cannot suppress it because the resolved value genuinely changed for everyone. Escape hatch: `config migrate` writes the *old* resolved value explicitly (`jobs=4`) into each config **before** moving the default — existing components then pin the old value (no drift) and only new components pick up `8`. Without that pre-pass it is a legitimate (if large) fleet rebuild, not a bug. |
 | **Rename / move** | `foo` → `bar`, same semantics | No | **Schema migration + bump + replay.** Migrate old TOML → canonical struct (the rename lands in the struct), then tag the renamed field `v(N+1)..*`. Old locks replay at their version and are recognized unchanged → lazy re-stamp, no rebuild. |
@@ -149,7 +149,9 @@ Not every config change should be treated the same way. The right mechanism depe
 The recurring requirement across the "No" rows is the same: **distinguish a change in user intent from a change in encoding, and only drift on the former.** Note the first row: on the projection substrate, a new field is added to `projectVN` as *omit-if-zero*, so a component that does not set it emits identical bytes and stays hash-neutral — *for every lock, old or new*, because old configs never set the brand-new field. Adding it does not move any existing hash (no shipped lock set it), so it needs no version bump. Part 2 then carries only the genuinely hard cases (rows 2, 5, and post-reset renames/removals). The shared move in every "Bump + replay" row is the same primitive — **increment the content version, keep the old `projectVN` as a frozen replay projection, and let unchanged locks re-stamp lazily** — detailed in [Part 2](#part-2--post-reset-lazy-migration).
-> **`projectVN`** is shorthand used throughout this RFC for the canonical *projection at content-version N* introduced by this design (defined in [Substrate options](#substrate-options) and [The projection substrate](#the-projection-substrate)). It is **not** N hand-written functions: it is a single generic walker, `project(cfg, N)`, whose per-field membership is declared in version-set tags on the struct fields (see [Version-tagged field selection](#version-tagged-field-selection)). `projectV1` means `project(cfg, 1)` — the fields whose tag set includes v1; `projectV2` the next version, and so on. Each version's projection is frozen once shipped (its tags never move; golden vectors enforce it) — that is the whole point. +**Adding a field as `!` (always-emit) to a *live* version is a version-bump event, not a free additive.** A zero-valued `!` field emits bytes for *every* component, including those that never set it, so it moves every lock the instant it lands on the current version — the opposite of "leave old locks alone." Build-meaningful-zero fields must therefore be introduced at a *new* version (`!v(N+1)..*`) and absorbed by replay, exactly like any other non-additive change. Only omit-if-zero additions (`vN..*`) are free on the live version. + +> **`projectVN`** is shorthand used throughout this RFC for the canonical *projection at content-version N* introduced by this design (defined in [Substrate options](#substrate-options) and [The projection substrate](#the-projection-substrate)). It is a per-version function `projectVN(cfg) []byte` — **generated** from declarative version-set tags on the struct fields (see [Version-tagged field selection](#version-tagged-field-selection)), not hand-written. `projectV1` measures the fields whose tag set includes v1; `projectV2` the next version, and so on. 
Each generated `projectVN` is frozen once shipped (its source tags never move; the generated code is checked in; golden vectors backstop it) — that is the whole point. ## Research @@ -158,7 +160,7 @@ The recurring requirement across the "No" rows is the same: **distinguish a chan Two substrates can produce a content fingerprint of the resolved config. The difference that matters here is **whether an old algorithm function can be frozen.** - **`hashstructure` + `Includable` (rejected as the substrate).** Keeps existing hashes byte-identical and gives per-field omission via `HashInclude`. But, as established above (Problem 6), a function built on `hashstructure.Hash` reflects over the live struct and method set, so it cannot be a frozen historical algorithm. It also requires a value-receiver `HashInclude` on *every* nested fingerprinted struct and a subtle `v.(reflect.Value)` type-assert to work at all — brittle plumbing in service of a substrate that still can't host sound replay. -- **Canonical projection + stdlib hash (chosen).** Split the two jobs `hashstructure` fuses — *field selection* and *hashing* — into explicit steps. Field selection is **declared per field** as a version-set in the `fingerprint` tag (`fingerprint:"v1..*"`); a single generic walker, `project(cfg, N)`, emits the fields whose set includes version N in a canonical, sorted, self-delimiting byte form, and an stdlib `sha256` hashes those bytes. Because a shipped version's tag membership is **fixed and golden-vector-pinned**, `project(cfg, 1)` does not see fields added later, does not depend on the type's method set, and does not depend on receiver subtleties. It is a genuinely frozen pure function of `(cfg, version)` — the property replay requires. The cost is owning a small projection encoder, the version-set tags, and **golden hash vectors** per version (a checked-in `(config, version) → hash` table) so "frozen" is CI-enforced, not merely intended. 
+- **Canonical projection + stdlib hash (chosen).** Split the two jobs `hashstructure` fuses — *field selection* and *hashing* — into explicit steps. Field selection is **declared per field** as a version-set in the `fingerprint` tag (`fingerprint:"v1..*"`); a `go generate` step emits a per-version `projectVN(cfg) []byte` function that serializes the fields whose set includes version N in a canonical, sorted, self-delimiting byte form, and an stdlib `sha256` hashes those bytes. Because a shipped `projectVN` is frozen checked-in code, it does not see fields added later, does not depend on the type's method set, and does not depend on receiver subtleties. It is a genuinely frozen pure function of `(cfg)` per version — the property replay requires. The cost is owning the generator and **golden hash vectors** per version (a checked-in `(config, version) → hash` table) so the generator itself is CI-backstopped. The projection substrate is what makes G4 true for old locks and what makes Part 2's replay sound. It is adopted at the reset (below), not incrementally. @@ -200,24 +202,24 @@ The original "lazy" instinct was right for Part 2 and wrong for Part 1: there is Replace `hashstructure.Hash(component, …)` with an explicit two-step pipeline: ```text -ComponentConfig ──project(cfg,1)──► canonical bytes ──sha256──► configHash - (version-tagged fields, (stdlib) +ComponentConfig ──projectV1(cfg)──► canonical bytes ──sha256──► configHash + (generated from v1 tags, (stdlib) sorted keys, emit-if-nonzero) ``` -`projectV1` is the projection at version 1 — `project(cfg, 1)`. Field membership is declared **on each struct field** as a version-set in the `fingerprint` tag (`fingerprint:"v1..*"`); a single generic walker emits, in stable key order, every field whose set includes the target version, length-prefixing key+value so distinct field sets cannot collide. 
It omits a field when its **resolved value is zero** (omit-if-zero, an encoder property now, not a struct-tag toggle); a range prefixed with `!` (e.g. `!v1..*`) always-emits, for fields whose zero is build-meaningful. There is no per-version function — only the generic walker parametrized by version. (Grammar and recovery semantics: [Version-tagged field selection](#version-tagged-field-selection) below.) +`projectV1` is the projection at version 1. Field membership is declared **on each struct field** as a version-set in the `fingerprint` tag (`fingerprint:"v1..*"`); a `go generate` step reads those tags and emits a frozen `projectV1(cfg) []byte` function that emits, in stable key order, every field whose set includes v1, length-prefixing key+value so distinct field sets cannot collide. It omits a field when its **resolved value is zero** (omit-if-zero); a range prefixed with `!` (e.g. `!v1..*`) always-emits, for fields whose zero is build-meaningful. The generated functions are checked in and the registry dispatches to them. (Grammar, generation, and recovery semantics: [Version-tagged field selection](#version-tagged-field-selection) below.) Three things this buys that `hashstructure` could not: -- **Frozen by construction.** A version's field set is fixed by tags that never change for a shipped version (golden vectors enforce it), so adding `Foo` to the struct later is invisible to `project(cfg, 1)` — its output for an old config is unchanged. This is what makes Part 2's replay sound (Problem 6) and G4 true for *old* locks, not just new ones. -- **No method-set / receiver magic.** No `Includable`, no per-nested-struct method, no `v.(reflect.Value)` type-assert footgun. Selection is a declarative tag the walker reads. -- **Golden-vector enforced.** A checked-in table of `(config, version) → hash` vectors is asserted in CI, so any accidental change to a historical projection — a tag edit that moves a shipped version's membership — fails the build. 
"Frozen" stops being a promise and becomes a test. +- **Frozen by construction.** A shipped `projectVN`'s body is fixed checked-in code (its source tags never move), so adding `Foo` to the struct later cannot change `projectV1`'s output for an old config. This is what makes Part 2's replay sound (Problem 6) and G4 true for *old* locks, not just new ones. +- **No method-set / receiver magic.** No `Includable`, no per-nested-struct method, no `v.(reflect.Value)` type-assert footgun. Selection is a declarative tag the generator reads. +- **Removal is a compile error; rename is byte-neutral.** A generated `projectVN` references each measured field by its literal Go path and emits a literal key, so deleting a field a retained version still measures won't compile, and renaming the Go field changes nothing. Golden vectors backstop the generator itself. The cost is owning the projection encoder and the golden vectors. That cost is paid once, at the reset, against a rebuild we are already doing. ### Version-tagged field selection -Field membership in each version's projection is declared **on the struct field**, as a version-set in the existing `fingerprint` tag — not in a hand-written per-version function. One generic walker, `project(cfg, N)`, emits every field whose set includes `N`. This is the chosen mechanism; hand-written `projectVN` functions are the [Option B alternative](#alternatives-considered). +Field membership in each version's projection is declared **on the struct field**, as a version-set in the existing `fingerprint` tag. A `go generate` step reads those tags and **emits** a per-version `projectVN(cfg) []byte` function — the tags are the source of truth, the generated functions are the artifact. This is the chosen mechanism; a runtime reflective walker and hand-written functions are the [alternatives](#alternatives-considered). 
**Grammar** (deliberately small): @@ -241,7 +243,7 @@ version = "v", digit, { digit } ; `*` resolves to "this version and every later one," so an *active* field never needs a tag edit across a version bump — only a field that is *dropped* at the bump gets its range closed (`v1..*` → `v1..vN`). -**The emit-key is the field's TOML key, frozen — never the Go identifier.** The walker emits each field under a stable string key and sorts by it; that key is the field's **`toml:` name**, or — for a field with no usable TOML key — an explicit `key=` member in the `fingerprint` tag (grammar below). The key is pinned once and treated as part of the frozen output. It is deliberately *not* the Go field name, so a cosmetic Go rename (`Foo`→`Bar`, same TOML key, same tag) is genuinely byte-neutral — making the [struct-rename drift-neutral claim](#config-schema-version-and-canonical-migration-future) true at the *field* level too, not just the type level. Renaming the *emit-key* is an output-changing edit and therefore a version bump like any other. **Duplicate emit-keys within one retained version fail the build** (two fields resolving to the same key would collide and alias — a silent G5 hazard), so the key set is checked for uniqueness at every retained version. +**The emit-key is the field's TOML key, frozen — never the Go identifier.** The generated function emits each field under a stable string key and sorts by it; that key is the field's **`toml:` name**, or — for a field with no usable TOML key — an explicit `key=` member in the `fingerprint` tag (grammar below). The generator emits it as a literal, so it is pinned as part of the frozen output. It is deliberately *not* the Go field name, so a cosmetic Go rename (`Foo`→`Bar`, same TOML key, same tag) is byte-neutral — making the [struct-rename drift-neutral claim](#config-schema-version-and-canonical-migration-future) true at the *field* level too, not just the type level. 
Renaming the *emit-key* is an output-changing edit and therefore a version bump like any other. **Duplicate emit-keys within one retained version fail generation** (two fields resolving to the same key would collide and alias — a silent G5 hazard), so the generator checks key uniqueness at every retained version. **The grammar is frozen at three range-operators** (`..` range, `!` always-emit, `*` open-end), plus the orthogonal `key=` override: @@ -263,70 +265,58 @@ It deliberately re-invents protobuf's `reserved` field-range discipline, and pro Every frozen output is byte-preserved, and the **golden vectors prove it**: the edit `v1..v1` → `v1..v1,v5..*` must leave the v1–v4 vectors identical or CI fails. The grammar lets you *express* the non-contiguous set; the golden vectors *forbid* rewriting history while doing so. Two recovery flavors, both covered: (a) field still on the struct (lingering for replay) → reopen its range; (b) field already physically deleted (floor passed it) → bring-back is just a fresh additive field tagged `vN..*`. Same outcome, no special case. -**Always-emit is per-range, for the same reason.** Whether a field's *zero value emits* can change over time just as its membership can — so `!` flags an individual range, not the whole field. `v1..v4,!v5..*` means omit-if-zero through v4, then always-emit from v5. Toggling it is an *output-changing* edit (a zero-valued field starts or stops emitting), so it lands as a new range at a new version exactly like a drop/re-add — same output-preservation rule, same golden-vector enforcement. The walker just asks "which range holds N, and is it `!`?" A whole-field always-flag could not express this temporal toggle without a per-version walker. +**Always-emit is per-range, for the same reason.** Whether a field's *zero value emits* can change over time just as its membership can — so `!` flags an individual range, not the whole field. `v1..v4,!v5..*` means omit-if-zero through v4, then always-emit from v5. 
Toggling it is an *output-changing* edit (a zero-valued field starts or stops emitting), so it lands as a new range at a new version exactly like a drop/re-add — same output-preservation rule, same golden-vector backstop. The generator simply emits, for each version, whether that field's range is `!`. A whole-field always-flag could not express this temporal toggle. -**What tags version, and what they don't.** Tags version *membership* — which fields a version measures. They do **not** version *encoding* — how a field's bytes are formed, or how the combiner folds non-field inputs. So the generic walker absorbs additive / removal / bring-back changes as pure tag edits (zero code), while a genuine encoding or combiner change still ships as versioned code in `computeFP(N+1)` (the walker output + the combiner step frozen at N). The taxonomy's non-additive rows are exactly that small set. +**What tags version, and what they don't.** Tags version *membership* — which fields a version measures. They do **not** version *encoding* — how a field's bytes are formed, or how the combiner folds non-field inputs. So a pure tag edit (additive / removal / bring-back) regenerates with no hand-written code, while a genuine encoding or combiner change still ships as versioned code in `computeFP(N+1)` (the projection output + the combiner step frozen at N). The taxonomy's non-additive rows are exactly that small set. -> **The walker still reflects the live struct — tags freeze only *membership*, golden vectors freeze the rest.** `project(cfg, N)` reflects the live struct exactly as the rejected `hashstructure` substrate did (Problem 6). The version-set tag re-freezes *which fields* a version measures, but three other things the walker reads from the live struct are **not** frozen by the tag — the **emit-key** (frozen instead by the immutable-TOML-key rule above), the **per-field encoding/type** (how a value becomes bytes), and the **zero-predicate** (what counts as omittable). 
All three are re-frozen by **golden-vector coverage**, not by code structure: a change to any of them moves a covered vector and fails CI. [Golden-vector coverage](#golden-vector-coverage-is-the-load-bearing-invariant) is therefore the load-bearing invariant of the whole substrate — the one place this design trades a *structural* guarantee for a *test* guarantee. It is sound only because the coverage rule below is stated and enforced. +**Escape hatch — the registry already is one; a per-field one is deferred.** The whole-function hatch is free: the registry is `map[int]computeFn` and does not care whether an entry was generated or hand-written, so a version the generator cannot express is simply *not generated* — you drop a hand-written `computeFPN` into the map instead. No new mechanism. A *per-field* hatch (massaging one field's encoding inside an otherwise-generated function — e.g. `fingerprint:"v1..*,enc=sortedSlice"`) is **deliberately not built now**: custom encoding is an *encoding* concern, which the rule above already routes through a versioned-code bump, and adding an `enc=` operator is an RFC-grade grammar change (the grammar is frozen at three range-operators + `key=`). Note the cost either way: a hand-written or hand-edited version drops back to **golden-vectors-only** — it loses regeneration-idempotence and the generator's completeness/coverage guards — so the hatch is for rare, deliberate cases, not routine use. -**Enforcement**, four layers — the first three are cheap syntactic guards, the fourth is the load-bearing one: +> **What codegen freezes structurally, and what golden vectors backstop.** Generation reflects the live struct — but at *build time*, and its output is **frozen checked-in code**, so the runtime projection never reflects the live struct (the way the rejected `hashstructure` substrate did at hash time, Problem 6). 
Three things the tag alone does not pin are pinned by the generated code instead: the **emit-key** (a literal string), **field membership** (a literal field list per version), and **field removal** (a retained `projectVN` references the field by Go path, so deleting it won't compile). Two things the compiler cannot independently judge — the **per-field encoding/type** (whether the emitted bytes are *right*) and the **zero-predicate** — are caught by regeneration-idempotence (CI runs `go generate`; any diff to a retained `projectVN` fails) and ultimately by **golden vectors**, which catch a generator bug that would move a shipped version's bytes. Golden-vector coverage is therefore a *backstop* behind compiler + generator, not the sole load-bearing guarantee — the design keeps its structural guarantees structural. -1. **No tag → build failure** makes the include/exclude decision impossible to *forget*: a field with no `fingerprint` tag fails the build. This restores the safe inclusion default the bare projection gives up — a forgotten field would otherwise drop silently out of the hash (a G5 stale-artifact hazard). -2. **Well-formedness:** ranges parse, are sorted and non-overlapping, and name no version above `currentLockContentVersion` (open `*` excepted); a `!` prefix is per-range and orthogonal; **emit-keys are unique within every retained version**. A malformed, future-referencing, or key-colliding set fails the build. -3. **Exclusion ledger (kept, not retired).** A field tagged `fingerprint:"-"` is the **dangerous** direction — it removes the field from the hash, the G5-violating way to ship a stale artifact — so it gets *more* scrutiny, not less. An enumerated list (the surviving half of today's `expectedExclusions`) names every `-` field with a one-line justification; adding or removing a `-` requires editing the list, so an accidental exclusion fails CI. 
Mandatory tags fix the *forgotten*-field default, but only this independent ledger catches a *wrongly-excluded* field — no coverage test exercises a field that claims to be unmeasured. -4. **Golden-vector coverage** — the keystone (next). +**Enforcement**, in order of strength: -#### Golden-vector coverage is the load-bearing invariant +1. **Compiler.** A generated `projectVN` references each measured field by literal Go path and emits a literal key: deleting a field a retained version measures won't compile, and the key cannot silently drift to the Go identifier. +2. **Generator (generate-time).** The generator **enumerates every field reachable from a fingerprinted root, recursing by field type** — this walk is the completeness guard, and it auto-discovers nested structs, replacing today's hand-maintained `fingerprintedStructs` list (which can silently go stale when a new nested type is added). It then refuses to emit on: a reached field with **no tag** (the include/exclude decision is mandatory — the generator must *fail* on an unrecognized field, never silently skip it, or a forgotten field drops out of the hash, a G5 hazard); a malformed, future-referencing, overlapping, or key-colliding tag set; or a `-`-tagged field absent from the **exclusion ledger** (an enumerated list — the surviving half of today's `expectedExclusions` — naming every `-` field with a justification, so an accidental exclusion fails generation). It also enforces the [coverage oracle](#golden-vector-coverage-the-backstop). +3. **Regeneration-idempotence.** CI runs `go generate` and fails on any diff to a retained `projectVN` — a shipped version's emitted code cannot change without an intentional regeneration that the diff surfaces. +4. **Golden vectors** — the semantic backstop (next). 
-Frozen-ness moved from a *structural* property (the rejected hand-written functions would not *compile* if you deleted a named field) to a *test* property (the golden-vector table forbids the live walker from drifting). That trade is sound **only if the corpus actually exercises every field of every retained version.** State it as a first-class, enforced invariant: +#### Golden-vector coverage: the backstop -> **Coverage invariant:** every field whose tag set includes a retained version MUST appear, **non-zero**, in at least one retained golden vector — and a discrimination check must vary it and assert the version's hash *moves*. A field that is never exercised non-zero is invisible to CI, so any drift in its emit-key, encoding, or zero-handling would pass silently. +Compiler + generator + regeneration-idempotence carry the structural load; golden vectors are the semantic backstop that catches a *generator* bug moving a shipped version's bytes. The backstop is only as good as its corpus, so the corpus must exercise every field of every retained version: -**The oracle must be independent of the tag.** The subtle trap: if the discrimination check derives *which versions to test* from the same range tag it is meant to police, a wrong-narrow tag silences its own check. A field build-effective today but mistagged `v1..v1` (while `currentLockContentVersion` is `v2`) tells a tag-derived generator "only check v1" — so the v2 discrimination check the invariant promises **never runs**, and the field silently drops out of the current hash (G5). The fix is a tag-independent oracle: **every fingerprinted field on the struct is expected to be measured-and-discriminating at the *current* version, unless it appears in a reviewed *dropped-fields ledger*** (sibling to the `-` exclusion ledger, naming each field intentionally closed at version N with a justification). 
The coverage test reads that ledger, *not* the range tag, to decide what to assert — so a range that excludes the current version without a matching ledger entry fails CI. The tag is the implementation; the ledger is the oracle; they are checked against each other. +> **Coverage invariant:** every field whose tag set includes a retained version MUST appear, **non-zero**, in at least one retained golden vector, with a discrimination check that varies it and asserts the version's hash *moves*. A field never exercised non-zero is invisible to the backstop. -This single rule mechanically closes three holes at once, each otherwise a silent stale-artifact (G5) bug that escapes CI *precisely because* some field is unset in the vectors: +**The coverage oracle must be independent of the tag** (enforced by the generator). If the discrimination check derived *which versions to test* from the same range tag it polices, a wrong-narrow tag would silence its own check: a field build-effective today but mistagged `v1..v1` (while `currentLockContentVersion` is `v2`) would tell a tag-derived check "only check v1," so the promised v2 check never runs and the field silently drops out of the current hash (G5). The fix: **every fingerprinted field is expected to be measured-and-discriminating at the *current* version unless it appears in a reviewed *dropped-fields ledger*** (sibling to the `-` exclusion ledger, naming each field intentionally closed at version N). The oracle reads that ledger by struct-reflection, *not* the range tag — so a range that excludes the current version without a matching ledger entry fails generation. (PR-A acceptance criterion, with a unit test that the oracle catches the narrow-tag hole.) -- **Wrong/narrow range** (e.g. 
`v1..v1` on a field that is build-effective at v2, or a typo'd gap `v1..v4,v6..*` dropping v5): the field is not in the dropped-fields ledger, so the oracle still expects a *current-version* discrimination move — the tag says it is unmeasured there, the assertion fires, CI fails. (This is the case the tag-independent oracle exists to catch.) -- **Emit-key drift** (a field rename that reached the key): the covered config now emits under a different key → its vector moves → CI fails. -- **Encoding/type drift** (a field's bytes change shape under the live walker): same — the covered vector moves. +Non-zero coverage alone is necessary but not sufficient; four more obligations close holes a single `"foo"`-valued vector leaves open, each a generator responsibility: -Without coverage, all three pass CI whenever the field happens to be zero in the corpus. With it, "the golden vectors prove frozen-ness" becomes a true statement instead of an aspiration. This is the substrate's one *test*-enforced (not structurally-enforced) guarantee, and it must be wired in PR A alongside the vectors themselves. +- **`!`-zero behavior.** Dropping `!` silently stops emitting a build-meaningful zero (G5). Every retained `!` range needs a **zero-valued** discrimination vector. +- **Encoding across the value space.** A `"foo"` vector misses an encoder change affecting only delimiter bytes, multibyte runes, or multi-entry slices/maps. Add per-encoder **property/fuzz** vectors. (A missed encoder change fails toward G1 over-drift, except under a collision that the length-prefixed form makes unlikely.) +- **nil-vs-empty-slice is a resolver invariant.** Under `IsZero()` a nil slice omits and `[]` emits; the guarantee that "resolution normalizes to one canonical form" is spread across the resolver rather than enforced in one place. Pin it with a **resolver-side canonical-form test** over fingerprint-sensitive slice/map fields. (A `!` on an all-zero nested *struct* has the same subtlety — settle in PR A.)
+- **Enumerator completeness.** The coverage corpus is checked against the **generator's own field enumeration** (above), so it cannot drift from what is measured — a newly-added field or nested struct that the generator reaches but the corpus does not cover fails the backstop, not just the generator. ### Baseline v1 — omit-if-zero, no include-always legacy Because the reset rebuilds everything, there is **no pre-existing population to stay byte-compatible with.** That removes the single biggest constraint of the incremental plan: we do **not** need an `include-always` compatibility mode to preserve today's hashes. `projectV1` is the omit-if-zero projection from day one. There is no `computeFP1 = legacy include-always` entry to carry forever — the registry's floor *starts* at the clean projection. ```go -// Membership is declared per field as a version-set tag; one generic walker -// emits the fields whose set includes the target version, in stable key order. -// What freezes a version is that its tags never change once shipped (golden -// vectors enforce it) — not that the walker is bespoke. +// You write tags; a `go generate` step emits the per-version projection. type ComponentConfig struct { Upstream string `toml:"upstream" fingerprint:"v1..*"` // key "upstream", omit-if-zero Patches []string `toml:"patches" fingerprint:"v1..*"` // omit-if-zero StripDebug bool `toml:"strip-debug" fingerprint:"!v1..*"` // always-emit: zero (false) is build-meaningful Internal string `toml:"-" fingerprint:"-"` // never measured - // … every fingerprinted field carries an explicit tag; absent ⇒ build failure … + // … every fingerprinted field carries an explicit tag; absent ⇒ generation fails … } -// fingerprintFields recurses into nested fingerprinted structs (same scope as -// the retired audit), returning each leaf field with its frozen TOML key, its -// version-set, and the resolved value. 
The key — not the Go field name — is what -// gets emitted, so a Go rename is byte-neutral. -func project(c *ComponentConfig, version int) []byte { +// GENERATED — do not edit. The body is a literal field list, so deleting a +// field it names won't compile, and the emit-key is a literal string. +func projectV1(c *ComponentConfig) []byte { var b canonicalBuf - for _, f := range fingerprintFields(c) { // reflection, cached, sorted by TOML key - r := f.set.rangeContaining(version) - if r == nil { - continue // field not measured at this version - } - // omit when the resolved value IsZero, UNLESS this range is '!' (always-emit). - if !r.always && f.value.IsZero() { - continue - } - b.emit(f.key, f.value) - } - return b.Bytes() + b.emit("patches", c.Patches) // omit-if-zero + b.emitAlways("strip-debug", c.StripDebug) // '!' → emit even when zero + b.emit("upstream", c.Upstream) // omit-if-zero + return b.Bytes() // keys sorted at generation time } ``` @@ -373,7 +363,7 @@ The lock **format** `Version` stays at `1`. The on-disk *schema* is unchanged Recovery from a sub-`v1` token is the **same mechanism** as the reset itself: a token with no `v:` prefix (or a version below `minSupportedLockContentVersion`) cannot be replayed, so it is treated as `Stale` and **force-rehashed** to the current version on the next `update`. One code path unifies three cases: -- **Pre-reset locks** carry a legacy decimal hash with no prefix → force-rehashed to `v1` at the reset. +- **Pre-reset locks** carry a legacy `sha256:` hash with no `v:` prefix → force-rehashed to `v1` at the reset. - **An old binary that rewrites a reset lock** stamps its legacy-substrate hash (no prefix) → the next new-binary run force-rehashes it back to `v1`. The mischief is self-correcting, never silent corruption. - **A future floor raise** (after a deliberate `component migrate`) retires an old `v` the same way. 
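A minimal sketch of the unified force-rehash rule described in the three cases above. It assumes the stored token has the textual form `v<N>:sha256:<hex>` (inferred from the `v1:sha256:` example elsewhere in this RFC; the exact grammar is not fixed by this section). The helper names `tokenVersion` and `mustForceRehash` are hypothetical, and the floor value is illustrative only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Illustrative floor; the RFC names this identifier, the value here is an assumption.
const minSupportedLockContentVersion = 1

// tokenVersion returns the content version encoded in a stored token.
// ok is false when the token has no usable "v<N>:sha256:<digest>" shape,
// which covers both legacy prefix-less hashes and malformed input.
func tokenVersion(token string) (version int, ok bool) {
	prefix, rest, found := strings.Cut(token, ":")
	if !found || !strings.HasPrefix(prefix, "v") {
		return 0, false
	}
	// ParseUint rejects signs and empty strings, so "v", "v+1" and "v-1" all fail.
	n, err := strconv.ParseUint(prefix[1:], 10, 31)
	if err != nil || n == 0 {
		return 0, false
	}
	digest, hasAlgo := strings.CutPrefix(rest, "sha256:")
	if !hasAlgo || digest == "" {
		return 0, false
	}
	return int(n), true
}

// mustForceRehash reports whether a stored token cannot be replayed and so
// must be treated as Stale and re-stamped at the current version.
func mustForceRehash(token string) bool {
	v, ok := tokenVersion(token)
	return !ok || v < minSupportedLockContentVersion
}

func main() {
	tokens := []string{
		"sha256:abc123",    // pre-reset or old-binary token: no version prefix
		"v1:sha256:abc123", // at the floor: replayable
		"v0:sha256:abc123", // below any valid version
		"v1:md5:abc123",    // unknown digest algorithm: malformed
	}
	for _, tok := range tokens {
		fmt.Printf("%-18s force-rehash=%v\n", tok, mustForceRehash(tok))
	}
}
```

All three cases in the list collapse into the single `!ok || v < floor` test, which is the point of routing them through one code path. Tokens whose version exceeds what the binary understands are deliberately left out of this sketch.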
@@ -503,7 +493,7 @@ Split the change into its two halves; they are handled independently: So the bump is **not breaking**: replay answers "were the *old* inputs unchanged?" without rebuilding. -**The one constraint replay still imposes: a field a retained version still measures must stay on the struct.** The projection is immune to field *additions* (the walker only emits fields whose tag set includes the target version, so a new field is invisible to old versions). It is *not* immune to field *removal*: v1 still measures `c` (its tag set includes v1) and the retained v1 golden vector sets `c`, so physically deleting `c` from the struct makes that vector's config unconstructable → the golden-vector test fails to build. (Hand-written `projectVN` functions would make this a *compile* error instead — a marginally stronger guard the tag walker trades for an equally-blocking CI one; see [D2](#d2--version-tagged-field-selection--golden-vector-coverage).) Removal is therefore the one edit still gated by a **deprecate-then-delete** two-step, both non-breaking: +**The one constraint replay still imposes: a field a retained version still measures must stay on the struct.** The projection is immune to field *additions* (a generated `projectVN` only emits the fields its version's tags include, so a new field is invisible to old versions). It is *not* immune to field *removal*: the generated `projectV1` references `c` by literal Go path, so physically deleting `c` from the struct **won't compile** while v1 is retained. Removal is therefore the one edit still gated by a **deprecate-then-delete** two-step, both non-breaking: 1. **Bump to v2 measuring `{a,b,d}` but keep field `c` on the struct** so the v1 projection can still read it for replay (close `c`'s tag to `v1..v1`, so v2 does not measure it). Every old lock replays clean at v1, is recognized as unchanged, lazy re-stamps to v2. Zero forced rebuilds. 2. 
**Only after the floor passes v1** (`minSupportedLockContentVersion = 2`, ideally after a deliberate `component migrate`) physically delete field `c` and `projectV1`. @@ -526,7 +516,7 @@ This is the on-disk TOML axis. It is **independent** of the fingerprint axis and The critical invariant: **migrate old TOML → latest canonical struct, then project once.** A semantically no-op migration (rename `foo`→`bar`) must produce the *same* canonical struct, hence the same projection bytes, hence no drift. This is what keeps the schema axis **orthogonal** to the lock axis: a faithful `config migrate` is a pure re-encoding that moves *no* fingerprint, so it never triggers a `component migrate`. If a TOML change genuinely alters build meaning, that is a content-version bump (Part 2), not a `config migrate`. -**Resolved by projection:** the old `hashstructure` caveat — that it mixed `reflect.Type.Name()` into the hash, so renaming a Go struct moved every fingerprint even with identical content — **no longer applies.** The walker hashes only the explicit field bytes it emits, under each field's **frozen TOML key**, never the Go type or field name. So *both* a struct-type rename **and** a cosmetic field rename (`Foo`→`Bar`, same `toml:` key) are genuinely drift-neutral — **pinned by golden tests** (rename a fingerprinted struct, and rename a field while keeping its TOML key → byte-identical digest in both cases), so the property is CI-enforced, not just asserted here. Renaming the *TOML key itself* is an output-changing edit and takes a version bump like any other. +**Resolved by projection:** the old `hashstructure` caveat — that it mixed `reflect.Type.Name()` into the hash, so renaming a Go struct moved every fingerprint even with identical content — **no longer applies.** The generated projection emits only the explicit field bytes, under each field's **frozen TOML key**, never the Go type or field name. 
So *both* a struct-type rename **and** a cosmetic field rename (`Foo`→`Bar`, same `toml:` key) are genuinely drift-neutral — **pinned by golden tests** (rename a fingerprinted struct, and rename a field while keeping its TOML key → byte-identical digest in both cases), so the property is CI-enforced, not just asserted here. Renaming the *TOML key itself* is an output-changing edit and takes a version bump like any other. ## Pipeline @@ -562,7 +552,14 @@ The versioned-replay story in Part 2 must hold for **every** reader of `InputFin The `changed.go` classifier is the easily-missed member of the *second* class: it must get the same **digest-compare** as `FindFingerprintChanges`, so a version-only delta reads as "no change" — not a replay (which it cannot do, holding no inputs). -**This contract is enforced by a type, not prose — the `fingerprint.Token` choke-point.** A reviewer-vigilance rule across the five-plus comparison sites is the kind of discipline this RFC elsewhere converts to structure (the atomic token, D3), and digest-comparison widens the surface: `v:` prefix-parsing now lives at two historical + three current-tree sites, and the "digest-compare two stored strings" pattern is **copyable**. The residual hazard is therefore not mere *omission* — a forgotten replay at a current-tree site fails *safely* toward inequality → spurious `Stale`/`Changed` → wasteful rebuild (G1 churn, never G5) — it is *mis-classification*: a future consumer that holds live inputs but copies the **historical** stored-string template never looks at those inputs → silently accepts a stale tree → reachable G5. Omission is safe; mis-classification is not, and only structure closes it. 
So an opaque `fingerprint.Token` type (unexported internals) carries **one strict parser** — `ParseToken` accepts only `sha256:` (legacy) and `v:sha256:`, treats any malformed token as *changed* (never normalizing a parse failure to an empty digest), and is the *sole* way to read a stored hash — routed through a single `Reconcile(lock) → {Fresh | Stale | RestampTo(v)}` API. A raw `==` on a stored hash outside that package will not compile. This lands in PR C, which already edits every one of those sites; it has no on-disk-format dependency, so there is no reason to touch all five sites twice and carry the mis-classification window in between. +**This contract is enforced by types, not prose — the `fingerprint.Token` choke-point.** A reviewer-vigilance rule across the five-plus comparison sites is the kind of discipline this RFC elsewhere converts to structure (the atomic token, D3), and digest-comparison widens the surface: `v:` prefix-parsing now lives at two historical + three current-tree sites, and the "digest-compare two stored strings" pattern is **copyable**. The residual hazard is therefore not mere *omission* — a forgotten replay at a current-tree site fails *safely* toward inequality → spurious `Stale`/`Changed` → wasteful rebuild (G1 churn, never G5) — it is *mis-classification*: a future consumer that holds live inputs but copies the **historical** stored-string template never looks at those inputs → silently accepts a stale tree → reachable G5. Omission is safe; mis-classification is not, and only structure closes it. + +The fix is a **two-type split**, because a single token type cannot tell the two comparator classes apart: + +- **`StoredToken`** — parsed from a lock by the *sole* strict parser `ParseToken` (accepts only `sha256:` legacy and `v:sha256:`; any malformed token is treated as *changed*, never normalized to an empty digest). 
It exposes `SameDigest(other StoredToken)` and nothing else — it holds no inputs, so a site that has only stored strings *physically cannot* perform a freshness decision. +- **`FreshToken`** — obtainable *only* from `ComputeIdentityAt(version, config, …)`, so constructing one requires live inputs. It exposes `Reconcile(stored StoredToken) → {Fresh | Stale | RestampTo(v)}`. + +A historical site holding two `StoredToken`s can call `SameDigest` but cannot fabricate a `FreshToken`, so it cannot accidentally pose as a current-tree freshness check; a current-tree site must obtain a `FreshToken` to reconcile, which forces it through live inputs. The *assignment* documents the class, and the mis-classification path is unconstructible rather than merely discouraged. Both types are **non-comparable** (an unexported `_ [0]func()` field), so a raw `==` on a token outside the `fingerprint` package fails to compile — closing the copy-the-`==` path too. (Unexported fields alone would *not* do this: a struct of comparable unexported fields is still `==`-comparable from any package; the non-comparable sentinel is what blocks it.) This lands in PR C, which already edits every one of those sites; it has no on-disk-format dependency, so there is no reason to touch all five sites twice and carry the mis-classification window in between. 
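The two-type split can be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: `computeIdentityAt` is a hypothetical stand-in for `ComputeIdentityAt` that hashes a plain string, `currentVersion` stands in for `currentLockContentVersion`, and the `Outcome` shape and the rule "re-stamp when the replayed version is older than current" are guesses at how `Reconcile` might report `{Fresh | Stale | RestampTo(v)}`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// currentVersion stands in for currentLockContentVersion; the value is illustrative.
const currentVersion = 2

// StoredToken is a hash read back from a lock. It carries no live inputs, so
// the only question it can answer is whether two stored digests match.
type StoredToken struct {
	_       [0]func() // non-comparable sentinel: `a == b` on this type does not compile
	version int
	digest  string
}

// SameDigest compares digests only, ignoring the version prefix.
func (s StoredToken) SameDigest(other StoredToken) bool {
	return s.digest == other.digest
}

// FreshToken is only produced by computeIdentityAt, i.e. from live inputs.
type FreshToken struct {
	_       [0]func()
	version int
	digest  string
}

// Outcome is a hypothetical encoding of {Fresh | Stale | RestampTo(v)}.
type Outcome int

const (
	Stale Outcome = iota
	Fresh
	Restamp
)

// Reconcile compares a token replayed at the stored version against the lock.
// The second result is the version to re-stamp to, and is meaningful only
// when the outcome is Restamp.
func (f FreshToken) Reconcile(stored StoredToken) (Outcome, int) {
	if f.version != stored.version || f.digest != stored.digest {
		return Stale, 0
	}
	if stored.version < currentVersion {
		return Restamp, currentVersion
	}
	return Fresh, 0
}

// computeIdentityAt is a stand-in: it hashes the inputs under a
// version-specific prefix so that different versions yield different digests.
func computeIdentityAt(version int, inputs string) FreshToken {
	sum := sha256.Sum256([]byte(fmt.Sprintf("v%d\x00%s", version, inputs)))
	return FreshToken{version: version, digest: hex.EncodeToString(sum[:])}
}

func main() {
	inputs := "upstream=example"
	old := computeIdentityAt(1, inputs)
	stored := StoredToken{version: old.version, digest: old.digest} // what a v1 lock holds

	outcome, v := computeIdentityAt(1, inputs).Reconcile(stored)
	fmt.Println(outcome == Restamp, v) // unchanged inputs, older version

	outcome, _ = computeIdentityAt(1, inputs+" patched").Reconcile(stored)
	fmt.Println(outcome == Stale) // changed inputs
}
```

In a real package the fields would be unexported and the constructors would live inside `fingerprint`, so a caller outside it could neither build a `FreshToken` without inputs nor write `stored == other`. Here everything sits in one package only to keep the example runnable.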
### The synthetic changelog/release path is the real hazard @@ -595,17 +592,17 @@ Both can omit zero values; the decisive difference is **whether an old algorithm | Sound replay (Part 2) | Yes | No (the disqualifier) | | Meaningful empties | `!`-prefixed range per field | `fingerprint:"always"` per field | | Type-name in hash | No (rename is drift-neutral) | Yes (rename moves every hash) | -| Plumbing | Generic walker + version tags + golden vectors | Value-receiver `HashInclude` on every nested struct + `v.(reflect.Value)` assert | +| Plumbing | Version tags + generator + golden vectors | Value-receiver `HashInclude` on every nested struct + `v.(reflect.Value)` assert | `Includable` keeps today's hashes byte-identical, which mattered for an *incremental* rollout — but that property is worthless once the reset rebuilds everything anyway, and it comes attached to a substrate that makes replay unsound. Projection trades byte-compatibility (which we are spending on the coordinated cutover regardless) for frozen replay (which we need forever). Adopted at the reset. -### D2 — Version-tagged field selection + golden-vector coverage +### D2 — Version-tagged field selection, generated -Field membership lives in a per-field version-set tag (`fingerprint:"v1..*"`) read by one generic walker — not in N hand-written functions, and not in the binary include/exclude tag of today's reflective audit. Rationale: +Field membership lives in a per-field version-set tag (`fingerprint:"v1..*"`); a `go generate` step emits the per-version `projectVN` functions from those tags. This is the chosen mechanism over both a runtime reflective walker and hand-written functions — it takes the declarative authoring of the former and the compile-time guarantees of the latter. Rationale: -- **The unsafe direction is the false-negative** (a meaningful field silently omitted → missed rebuild → stale artifact, a G5 violation). 
A *mandatory* tag — absent → build failure — makes the include/exclude decision impossible to *forget*. The *wrongly-excluded* case (a `-` tag on a build-effective field) is caught by the kept exclusion ledger, and the *wrongly-included-but-unmeasured* case by golden-vector coverage — see [Enforcement](#golden-vector-coverage-is-the-load-bearing-invariant). -- **Version-awareness is declarative.** A field's whole lifecycle — introduced at v3, dropped at v5, revived at v8 — is one greppable string on the field (`v3..v4,v8..*`), not a diff smeared across three function bodies. Recovery (bring-back) is *expressible* precisely because the set is non-contiguous. -- **Cost: frozen-ness is *test*-enforced, not *structurally* enforced.** The walker reflects the live struct (the very thing Problem 6 rejected); only the version-set tag re-freezes membership, and golden-vector **coverage** re-freezes the emit-key, encoding, and zero-predicate. A hand-written `projectVN` would make field removal a *compile* error; the coverage invariant turns it into an equally-blocking CI failure instead, in exchange for the declarative lifecycle, native completeness, and first-class recovery. The hand-written variant is kept as [Option B](#alternatives-considered). +- **The unsafe direction is the false-negative** (a meaningful field silently omitted → missed rebuild → stale artifact, a G5 violation). A *mandatory* tag — absent → generation fails — makes the include/exclude decision impossible to *forget*. The *wrongly-excluded* case (a `-` tag on a build-effective field) is caught by the kept exclusion ledger, and the *wrongly-included-but-unmeasured* case by the [coverage backstop](#golden-vector-coverage-the-backstop). +- **Version-awareness is declarative.** A field's whole lifecycle — introduced at v3, dropped at v5, revived at v8 — is one greppable string on the field (`v3..v4,v8..*`), inexpressible in hand-written form, with no diff smeared across function bodies. 
+- **Frozen-ness stays structural.** Because the generated functions are checked-in code, a retained `projectVN` references each field by literal Go path — deleting a measured field won't compile, the emit-key is a literal, and regeneration-idempotence (CI `go generate` + diff) pins a shipped version's output. Golden vectors are the semantic backstop behind that, not the sole guarantee. This recovers the hand-written model's compile guarantee that a *runtime* reflective walker gives up (its output would reflect the live struct at hash time — Problem 6 one layer down), while keeping the DSL's declarative lifecycle. The cost is a generator; the project already runs `go generate` via `mage` (`stringer`/`mockgen` precedent), so the infra cost is low. ### D3 — Atomic self-describing token; no format bump, reconcile via force-rehash @@ -624,16 +621,17 @@ The lock **format** `Version` stays at `1`. Bumping it to `2` as a poison pill - **Parallel versioned structs with per-struct `Hash()`** — couples locks to Go type identity and duplicates hashing logic per version. Rejected in favor of Part 2's integer-versioned combiner over frozen projections. - **Bump the lock format `Version` 1→2 as a poison pill** — makes old binaries hard-reject reset locks. Rejected: it also blocks old binaries from reading pins to queue a build, and it is unnecessary, since the content-version registry already force-rehashes any sub-floor or downgraded token (D3). Same-format + force-rehash keeps old binaries useful without risking silent corruption. - **Eager fleet-wide migration as the steady-state mechanism** — rewriting every lock on every algorithm change is the mass-churn the design exists to prevent. Rejected for the steady state. The *reset* is a deliberate, one-time, operator-driven eager pass riding an already-scheduled rebuild — the sanctioned exception, not the rule; `component migrate` is its post-reset equivalent for retiring an old version. 
-- **Hand-written per-version `projectVN` selection functions (instead of version tags).** Each version gets a bespoke `func projectVN(c) []byte` with one explicit `emit`/`emitAlways` line per measured field. *Win:* field removal is **compile-enforced** — deleting a struct field a retained `projectVN` still names won't compile (the tag walker downgrades this to a CI-time golden-vector failure). *Losses:* membership is smeared across N function bodies instead of one declarative tag per field; "bring a field back a few versions later" has no first-class expression (you re-add an `emit` line, with nothing tying it to the field's earlier life); and the mandatory-decision property needs a *separate* completeness test with an awkward field→emit-key bridge, where the tag simply *is* the ledger. Rejected in favor of version tags: the declarative lifecycle, native completeness, and expressible recovery outweigh trading one compile-time guard for an equally-blocking CI guard. +- **Runtime reflective walker for field selection (instead of generated functions).** One generic `project(cfg, N)` reflects the struct at hash time and emits the fields whose version-set includes N. Least code, and it shares the tag syntax with the chosen approach. Rejected: it reflects the *live* struct at hash time — Problem 6 one layer down — so its frozen-ness rests entirely on golden-vector coverage (test discipline), and field removal degrades from a compile error to a CI failure. Codegen keeps the same tags but moves the reflection to *generate* time and freezes the output as checked-in code, recovering the compile guarantee. +- **Hand-written per-version `projectVN` functions (instead of generating them from tags).** Each version gets a bespoke function with one explicit `emit`/`emitAlways` line per measured field. 
Same compile guarantees as codegen (removal won't compile, literal emit-key), but: membership is smeared across N function bodies; "bring a field back a few versions later" has no first-class expression (you re-add an `emit` line, nothing ties it to the field's earlier life); and the mandatory-decision and coverage properties need separate bookkeeping the tags otherwise carry. Codegen is the same runtime with declarative authoring — strictly preferable given the existing `go generate` infrastructure. - **Per-field hash manifest in the lock (instead of one opaque token).** Store `{field → hash}` (à la `go.sum`) rather than a single `v:sha256:…` digest. *Genuine wins:* dropping a field becomes ignoring its manifest line — no projection kept alive for replay, so the **deprecate-then-delete two-step and the registry-retirement deadlock** (the append-only growth above) both vanish; and the stored-vs-stored historical comparators become structural set-diffs rather than version-blind string compares. *Why the opaque token still wins for azldev:* (1) the projection substrate **already** delivers additive immunity (G4) — the manifest's headline draw — so that advantage is moot, not additive; (2) the manifest does **not** kill the false-fresh hazard — an old lock has *no line* for a newly-measured input, so there is still no baseline to detect a change to it (the blind spot is relocated, not removed); (3) it makes *algorithm evolution* — the entire point of Part 2 — **harder**, needing per-field versioning where the token needs one integer for the whole algorithm; and (4) it bloats every lock to O(fields × components) (the well-known `go.sum` size cost). The manifest is the better tool for a *static* input set that mainly grows and shrinks; the opaque token + single version is the better tool for an *evolving hashing algorithm*, which is azldev's actual problem. 
Recorded explicitly because the reset bakes the storage model in — token-vs-manifest is irreversible after PR B — and the retirement deadlock the manifest would have dissolved is instead answered by the floor-advance cadence above. ## Incremental delivery The reset (Part 1) must land as one coherent change at the dev→prod cutover; its pieces are independently reviewable but ship together because they all move the hash. -1. **PR A (substrate)**: the canonical encoder (`canonicalBuf`, `emit` with a per-range always flag), the generic tag-driven `project(cfg, N)` walker (recursing into nested fingerprinted structs) + version-set tag parser, **version tags on every fingerprinted field** (absent → build failure), the frozen **TOML-key** emit rule, the `reflect.Value.IsZero()` omit-predicate, the `sha256` combiner, the golden vectors **and the coverage invariant** (every field measured at a retained version appears non-zero in ≥1 vector, with a discrimination check). The mandatory-tag test plus the slimmed exclusion ledger replace the retired `TestAllFingerprintedFieldsHaveDecision` audit — the inclusion default is now native to the tag, the exclusion default stays ledgered. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. 
Unit tests: a field tagged `v2..*` is invisible to a v1 projection; a `!`-prefixed range hashes even at zero; a field with **no** `fingerprint` tag fails the build; a **nested** fingerprinted struct with a tagless field fails the build; a **Go-field rename keeping the TOML key** yields a byte-identical digest; two fields colliding on one emit-key fail the build; a field whose range excludes the current version without a **dropped-fields-ledger** entry fails the coverage/discrimination check; the **coverage/discrimination** test fails when a build-effective field is tagged too narrowly (`v1..v1` at current `v2`); golden vectors pin v1; editing a shipped version's output for an existing config fails a golden vector; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser. +1. **PR A (substrate)**: the **projection generator** (`go generate`) — reads the version-set tags and emits the per-version `projectVN(cfg) []byte` functions (literal emits, sorted keys) plus golden-vector and coverage scaffolding — the canonical encoder (`canonicalBuf`, `emit`/`emitAlways`), the version-set tag parser, the frozen **TOML-key** emit rule, the `reflect.Value.IsZero()` omit-predicate, the `sha256` combiner, and the golden vectors. Generate-time guards: a fingerprinted field with **no tag** fails generation; the slimmed **exclusion ledger** and **dropped-fields ledger** replace the retired `TestAllFingerprintedFieldsHaveDecision` audit; **regeneration-idempotence** (CI `go generate` + `git diff --exit-code`) pins shipped versions. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. 
Tests: a field tagged `v2..*` is absent from generated `projectV1`; a `!` range emits at zero; a field with **no** `fingerprint` tag fails generation; a **nested** fingerprinted struct with a tagless field fails generation; deleting a field a retained `projectVN` names **fails to compile**; a **Go-field rename keeping the TOML key** yields a byte-identical digest; two fields colliding on one emit-key fail generation; the coverage oracle (by struct-reflection, not the tag) fails when a build-effective field is tagged too narrowly (`v1..v1` at current `v2`) and is not in the dropped-fields ledger; golden vectors pin v1; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser. 2. **PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. Lock format `Version` stays `1`. Ships at the cutover; absorbed by the scheduled rebuild. Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. -3. **PR C (Part 2 machinery)**: the opaque **`fingerprint.Token` type** (unexported internals) with the single strict `ParseToken` (accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match) and the `Reconcile(lock) → {Fresh | Stale | RestampTo(v)}` API; the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **all five** comparison sites through `Token`/`Reconcile` — replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, `BuildDirtyChange` (the three current-tree sites), plus digest-compare in `FindFingerprintChanges` and `changed.go`'s `classifyComponent` (the two historical sites). Resolution replay reserved (slot reuses `computeRes1`). 
**Not fully inert:** this PR switches the live compares from raw-string to `Token`-routed *on merge* — only the *registry dispatch* is dormant while just `v1` exists, and `BuildDirtyChange`'s replay is a hard prerequisite for any later PR that registers `v2`. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a stored hash outside the `fingerprint` package fails to compile. +3. **PR C (Part 2 machinery)**: the **two-type token split** — `StoredToken` (parsed by the sole strict `ParseToken`: accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match; exposes `SameDigest` only) and `FreshToken` (from `ComputeIdentityAt`, exposes `Reconcile(stored) → {Fresh | Stale | RestampTo(v)}`), both **non-comparable** (`_ [0]func()`); the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **all five** comparison sites through these types — replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, `BuildDirtyChange` (the three current-tree sites, via `FreshToken.Reconcile`), plus `StoredToken.SameDigest` in `FindFingerprintChanges` and `changed.go`'s `classifyComponent` (the two historical sites). Resolution replay reserved (slot reuses `computeRes1`). **Not fully inert:** this PR switches the live compares from raw-string to token-routed *on merge* — only the *registry dispatch* is dormant while just `v1` exists, and `BuildDirtyChange`'s replay is a hard prerequisite for any later PR that registers `v2`. 
Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a token outside the `fingerprint` package fails to compile; a historical site cannot construct a `FreshToken` (no live inputs). 4. **PR D (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) — add a field absent from `projectV1` and set it on one component; assert only that lock drifts and every other lock is byte-identical. 5. **PR E (config schema axis, later)**: `schema-version` field + load-time canonical migration + the `config migrate` command. Gated on the first post-reset non-additive TOML change not already absorbed by the reset's normalization pass. @@ -644,4 +642,4 @@ Each PR is independently revertible up to the cutover. PRs A–B land together a 1. Should a lazy re-stamp during a *read-only* command (`render`, `build` freshness check) write the lock back, or defer all writes to `component update`? Writing on read is surprising; deferring means freshness checks stay slightly slower until the next update. (Leaning: defer all writes to `update`, keeping reads side-effect-free.) 2. For the config schema axis, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration. 
-*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; the **canonical byte encoding is the existing length-prefixed `:=:` form** used by `combineInputs`, committed and pinned by golden vectors at the reset (former Open Q#4 — a precondition for PR A, not an open question, because the reset makes it irreversible); the **version write-guard is a requirement, not an option** (former Open Q#5): a binary refuses to write a token whose version exceeds its own `currentLockContentVersion`, and the CI version-pin prevents *old* binaries from committing downgrades; **field membership is declared in mandatory per-field version-set tags** (`fingerprint:"v1..*"`; absent → build failure, `!`-prefix for always-emit), read by one generic walker — restoring "forgotten field → loud build failure" natively; the **emit-key is the frozen TOML key** (never the Go field name, so a field rename is byte-neutral; `key=` overrides for keyless fields; duplicate emit-keys fail the build), the **omit-predicate is `reflect.Value.IsZero()`** (former Open Q#3), and the tag DSL is **frozen at three range-operators** (`..`, `!`, `*`) plus the orthogonal `key=`; frozen-ness rests on the **golden-vector coverage invariant** (every field measured at a retained version appears non-zero in ≥1 vector, with a discrimination check) whose oracle is a **tag-independent dropped-fields ledger** (a field absent from it must discriminate at the *current* version), plus a **kept exclusion ledger** for `-` fields (the inclusion default is native to the tag; the *exclusion* default stays ledgered because it is the G5-dangerous direction); the stored hash is read only through an opaque **`fingerprint.Token`** type with one strict parser and a `Reconcile` API (adopted in PR C, closing the comparator 
mis-classification hazard at compile time); baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling) and is itself a deliberate release-grade event; one shared content version covers both stored hashes **permanently** (no split) — the historical changelog/classifier comparators compare the **digest** (stripping the `v:` prefix), so advancing the shared prefix for a resolution-only algorithm change moves no digest and mints no release; resolution replay stays reserved (slot present, `computeRes1` reused) until `ComputeResolutionHash` first changes. 
+*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; the **canonical byte encoding is the existing length-prefixed `:=:` form** used by `combineInputs`, committed and pinned by golden vectors at the reset (former Open Q#4 — a precondition for PR A, not an open question, because the reset makes it irreversible); the **version write-guard is a requirement, not an option** (former Open Q#5): a binary refuses to write a token whose version exceeds its own `currentLockContentVersion`, and the CI version-pin prevents *old* binaries from committing downgrades; **field membership is declared in mandatory per-field version-set tags** (`fingerprint:"v1..*"`; absent → generation fails, `!`-prefix for always-emit) from which a **`go generate` step emits the per-version `projectVN` functions** — restoring "forgotten field → loud failure" natively and making field removal a *compile* error; the **emit-key is the frozen TOML key** (never the Go field name, so a field rename is byte-neutral; `key=` overrides for keyless fields; duplicate emit-keys fail generation), the **omit-predicate is `reflect.Value.IsZero()`** (former Open Q#3), and the tag DSL is **frozen at three range-operators** (`..`, `!`, `*`) plus the orthogonal `key=`; frozen-ness rests on the **compiler + generator + regeneration-idempotence**, with the **golden-vector coverage invariant** (every field measured at a retained version exercised non-zero in ≥1 vector, with a tag-independent discrimination oracle) as the semantic backstop whose oracle is a **tag-independent dropped-fields ledger** (a field absent from it must discriminate at the *current* version), plus a **kept exclusion ledger** for `-` fields (the inclusion default is native to the tag; the *exclusion* default stays ledgered because it is the 
G5-dangerous direction); the stored hash is read only through a **two-type token split** — `StoredToken` (digest-compare only) and `FreshToken` (reconcile, requires live inputs), both non-comparable — adopted in PR C, closing both the raw-`==` and comparator mis-classification hazards at compile time; baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling) and is itself a deliberate release-grade event; one shared content version covers both stored hashes **permanently** (no split) — the historical changelog/classifier comparators compare the **digest** (stripping the `v:` prefix), so advancing the shared prefix for a resolution-only algorithm change moves no digest and mints no release; resolution replay stays reserved (slot present, `computeRes1` reused) until `ComputeResolutionHash` first changes. 
From 06e3e9bfbc9eb030966d53b4d5d0418f0cf56814 Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Tue, 9 Jun 2026 13:14:00 -0700 Subject: [PATCH 09/15] update 7 --- docs/developer/rfc/lazy-schema-migration.md | 61 ++++++++++++++------- 1 file changed, 40 insertions(+), 21 deletions(-) diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md index 8b84da29..c5cd93e6 100644 --- a/docs/developer/rfc/lazy-schema-migration.md +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -151,7 +151,7 @@ The recurring requirement across the "No" rows is the same: **distinguish a chan **Adding a field as `!` (always-emit) to a *live* version is a version-bump event, not a free additive.** A zero-valued `!` field emits bytes for *every* component, including those that never set it, so it moves every lock the instant it lands on the current version — the opposite of "leave old locks alone." Build-meaningful-zero fields must therefore be introduced at a *new* version (`!v(N+1)..*`) and absorbed by replay, exactly like any other non-additive change. Only omit-if-zero additions (`vN..*`) are free on the live version. -> **`projectVN`** is shorthand used throughout this RFC for the canonical *projection at content-version N* introduced by this design (defined in [Substrate options](#substrate-options) and [The projection substrate](#the-projection-substrate)). It is a per-version function `projectVN(cfg) []byte` — **generated** from declarative version-set tags on the struct fields (see [Version-tagged field selection](#version-tagged-field-selection)), not hand-written. `projectV1` measures the fields whose tag set includes v1; `projectV2` the next version, and so on. Each generated `projectVN` is frozen once shipped (its source tags never move; the generated code is checked in; golden vectors backstop it) — that is the whole point. 
+> **`projectVN`** is shorthand used throughout this RFC for the canonical *projection at content-version N* introduced by this design (defined in [Substrate options](#substrate-options) and [The projection substrate](#the-projection-substrate)). It is a per-version function `projectVN(cfg) []byte` — **generated** from declarative version-set tags on the struct fields (see [Version-tagged field selection](#version-tagged-field-selection)), not hand-written. `projectV1` measures the fields whose tag set includes v1; `projectV2` the next version, and so on. Each generated `projectVN` freezes once *superseded* (the next version is registered): its source tags no longer move, its generated code is checked in, and golden vectors backstop it. The live version stays editable for output-preserving additions — that is the whole point. ## Research @@ -211,7 +211,7 @@ ComponentConfig ──projectV1(cfg)──► canonical bytes ──sha256── Three things this buys that `hashstructure` could not: -- **Frozen by construction.** A shipped `projectVN`'s body is fixed checked-in code (its source tags never move), so adding `Foo` to the struct later cannot change `projectV1`'s output for an old config. This is what makes Part 2's replay sound (Problem 6) and G4 true for *old* locks, not just new ones. +- **Frozen by construction.** A *superseded* `projectVN`'s body is fixed checked-in code (CI fails on any diff to it), so adding `Foo` to the struct later cannot change a historical `projectV1`'s output for an old config. (The *live* version's projector stays mutable for output-preserving additions — see [enforcement](#version-tagged-field-selection); "frozen" means `version < current`.) This is what makes Part 2's replay sound (Problem 6) and G4 true for *old* locks, not just new ones. - **No method-set / receiver magic.** No `Includable`, no per-nested-struct method, no `v.(reflect.Value)` type-assert footgun. Selection is a declarative tag the generator reads. 
- **Removal is a compile error; rename is byte-neutral.** A generated `projectVN` references each measured field by its literal Go path and emits a literal key, so deleting a field a retained version still measures won't compile, and renaming the Go field changes nothing. Golden vectors backstop the generator itself. @@ -277,12 +277,17 @@ Every frozen output is byte-preserved, and the **golden vectors prove it**: the 1. **Compiler.** A generated `projectVN` references each measured field by literal Go path and emits a literal key: deleting a field a retained version measures won't compile, and the key cannot silently drift to the Go identifier. 2. **Generator (generate-time).** The generator **enumerates every field reachable from a fingerprinted root, recursing by field type** — this walk is the completeness guard, and it auto-discovers nested structs, replacing today's hand-maintained `fingerprintedStructs` list (which can silently go stale when a new nested type is added). It then refuses to emit on: a reached field with **no tag** (the include/exclude decision is mandatory — the generator must *fail* on an unrecognized field, never silently skip it, or a forgotten field drops out of the hash, a G5 hazard); a malformed, future-referencing, overlapping, or key-colliding tag set; or a `-`-tagged field absent from the **exclusion ledger** (an enumerated list — the surviving half of today's `expectedExclusions` — naming every `-` field with a justification, so an accidental exclusion fails generation). It also enforces the [coverage oracle](#golden-vector-coverage-the-backstop). -3. **Regeneration-idempotence.** CI runs `go generate` and fails on any diff to a retained `projectVN` — a shipped version's emitted code cannot change without an intentional regeneration that the diff surfaces. +3. 
**Regeneration-idempotence.** CI runs `go generate` and fails on any diff to a **strictly-historical** `projectVN` (`version < currentLockContentVersion`) — a *superseded* version's emitted code cannot change without an intentional, diff-surfaced regeneration. The **live** version's `projectVN` is deliberately *mutable*: an output-preserving omit-if-zero addition regenerates it (a new `b.emit` line) and that diff is expected, not a violation. "Frozen" throughout this RFC means **superseded** (`version < current`), not "shipped" — a version freezes when the next one is registered, not the moment it first ships. 4. **Golden vectors** — the semantic backstop (next). #### Golden-vector coverage: the backstop -Compiler + generator + regeneration-idempotence carry the structural load; golden vectors are the semantic backstop that catches a *generator* bug moving a shipped version's bytes. The backstop is only as good as its corpus, so the corpus must exercise every field of every retained version: +Compiler + generator + regeneration-idempotence carry the structural load; golden vectors are the semantic backstop that catches a *generator* bug moving a shipped version's bytes. Two properties make the backstop structural rather than discipline: + +- **Expected digests are hand-authored and never generator-emitted.** If the `(config, version) → digest` table were regenerated in lockstep with the projector code, a "delete-everything-and-regenerate" commit would move *both* the code and its own expected values, and the backstop would silently agree with itself. The expected digests are therefore hand-committed; `go generate` may scaffold *cases* but must never write the expected values. A mutation to any retained vector is a hard CI failure, not a moved line a reviewer must notice. 
+- **Retained-version manifest.** The generator validates each retained version against a checked-in manifest (the field set + emit-keys that version measures); generation fails if a retained version's entry lacks a compatible live path, *unless* it is below `minSupportedLockContentVersion`. This is what keeps field-removal structural under the normal *delete + regenerate + commit* workflow (where the compile guard alone would be bypassed) — so the negative test is **delete + regenerate + build**, not just delete. + +The backstop is only as good as its corpus, so the corpus must exercise every field of every retained version: > **Coverage invariant:** every field whose tag set includes a retained version MUST appear, **non-zero**, in at least one retained golden vector, with a discrimination check that varies it and asserts the version's hash *moves*. A field never exercised non-zero is invisible to the backstop. @@ -292,7 +297,8 @@ Non-zero coverage alone is necessary but not sufficient; four more obligations c - **`!`-zero behavior.** Dropping `!` silently stops emitting a build-meaningful zero (G5). Every retained `!` range needs a **zero-valued** discrimination vector. - **Encoding across the value space.** A `"foo"` vector misses an encoder change affecting only delimiter bytes, multibyte runes, or multi-entry slices/maps. Add per-encoder **property/fuzz** vectors. (Fails toward G1 over-drift except under a collision the length-prefixed form makes unlikely.) -- **nil-vs-empty-slice is a resolver invariant.** Under `IsZero()` a nil slice omits and `[]` emits; "resolution normalizes to one canonical form" is distributed. Pin it with a **resolver-side canonical-form test** over fingerprint-sensitive slice/map fields. (A `!` on an all-zero nested *struct* has the same subtlety — settle in PR A.) 
+- **nil-vs-empty is a resolver invariant — slices *and* maps.** Under `IsZero()` a nil slice/map omits and a non-nil empty one (`[]`, `{}`) emits, and `mergo.Merge` can yield either for the same intent. "Resolution normalizes to one canonical form" is a *distributed* property; pin it with a **resolver-side canonical-form test** over every fingerprint-sensitive slice **and map** field, not only per-field vectors. +- **`!` on an all-zero nested struct emits.** `IsZero()` on a struct is true iff every sub-field is zero, so a `!`-tagged nested struct whose fields all resolve to zero would otherwise be omitted; the generated sub-projector treats a `!` range as "emit the (recursively projected) value even when the struct `IsZero`," so a build-meaningful all-zero struct still hashes. Covered by a zero-valued discrimination vector like any other `!` range. - **Enumerator completeness.** The coverage corpus is checked against the **generator's own field enumeration** (above), so it cannot drift from what is measured — a newly-added field or nested struct that the generator reaches but the corpus does not cover fails the backstop, not just the generator. ### Baseline v1 — omit-if-zero, no include-always legacy @@ -302,10 +308,11 @@ Because the reset rebuilds everything, there is **no pre-existing population to ```go // You write tags; a `go generate` step emits the per-version projection. 
type ComponentConfig struct { - Upstream string `toml:"upstream" fingerprint:"v1..*"` // key "upstream", omit-if-zero - Patches []string `toml:"patches" fingerprint:"v1..*"` // omit-if-zero - StripDebug bool `toml:"strip-debug" fingerprint:"!v1..*"` // always-emit: zero (false) is build-meaningful - Internal string `toml:"-" fingerprint:"-"` // never measured + Upstream string `toml:"upstream" fingerprint:"v1..*"` // key "upstream", omit-if-zero + Patches []string `toml:"patches" fingerprint:"v1..*"` // omit-if-zero + Defines map[string]string `toml:"defines" fingerprint:"v1..*"` // map: emitted in sorted-key order + StripDebug bool `toml:"strip-debug" fingerprint:"!v1..*"` // always-emit: zero (false) is build-meaningful + Internal string `toml:"-" fingerprint:"-"` // never measured // … every fingerprinted field carries an explicit tag; absent ⇒ generation fails … } @@ -313,13 +320,21 @@ type ComponentConfig struct { // field it names won't compile, and the emit-key is a literal string. func projectV1(c *ComponentConfig) []byte { var b canonicalBuf + b.emitMap("defines", c.Defines) // entries emitted in sorted-key order (deterministic) b.emit("patches", c.Patches) // omit-if-zero b.emitAlways("strip-debug", c.StripDebug) // '!' → emit even when zero b.emit("upstream", c.Upstream) // omit-if-zero - return b.Bytes() // keys sorted at generation time + return b.Bytes() // field order fixed at generation time } ``` +**The generated encoding contract — frozen per version, fully specified before PR A.** The golden vectors bake the byte encoding in irreversibly at the reset, so every value type's serialization is a one-way door and must be pinned now, not discovered later. 
The contract: + +- **Maps emit in sorted-key order.** This is the one guarantee `hashstructure` gave for free that the projection must re-establish: a naive `range` over a Go map is **non-deterministic** (randomized iteration), so a generated `b.emitMap` must sort entries by key and emit each as `:=:` under the field key. Without this, an unchanged config hashes differently across runs → intermittent spurious lock drift (the exact G1/G2 catastrophe this RFC exists to prevent). Measured map fields are real (`Defines`, `Packages`), so this is mandatory, with a fuzz vector (≥2 keys, varying insert order → identical digest). +- **Value-slot encoding is defined per type, not left to `%v`.** `bool` → `"true"`/`"false"`; integers → base-10; `[]T` → each element as its own length-prefixed sub-value in slice order (not a JSON blob); `map` → as above. Interfaces, generics, and external types either get an explicit encoding or **fail generation** — no silent `fmt`-style fallback that a dependency could change underneath us. +- **Nested struct values are emitted by a frozen per-version sub-projector, never by runtime reflection.** If a generated `projectV1` emitted a `[]ComponentOverlay` by delegating to a *live* reflective encoder, adding a field to `ComponentOverlay` later would change `projectV1`'s output at hash time — Problem 6 reborn one layer down. The generator therefore emits a literal per-version projector for each nested struct type too; element/value projectors are frozen exactly like top-level ones. +- **Recursion prunes at `fingerprint:"-"`, per edge.** The completeness walk descends only through *measured* fields and only into struct kinds (through slice-element, map-value, and pointer-element types), treating defined scalar types as leaves. A `-` tag stops the walk at that **edge** (the field isn't measured, so its subtree isn't either) — not at the type (the same type reached through an *included* field elsewhere is still enumerated there). 
This breaks the real `ComponentConfig → SourceConfigFile → … → ComponentConfig` cycle (both back-edges are already `-`), and a **visited-type memo** on the included graph guards against any future included-path cycle. An untagged field reached on an *included* edge fails generation (the mandatory-decision guard); fields under a `-` edge are never reached, so they need no tag. + **Why omit-if-zero is safe — fingerprints see the resolved config.** The usual objection to blanket omit-if-zero is the false-negative footgun: a field whose zero is meaningful gets omitted and collides with "unset," so two semantically different configs hash the same and a rebuild is missed. That objection assumes we hash *raw user input*. We do not. `ComputeIdentity` runs on the **resolved, post-merge** config (`*result.config`, after defaults are applied). The omit predicate is therefore "the *resolved value* equals Go-zero," not "the user didn't type it." Consequences: - Two configs that both resolve a field to zero build identically → hashing them the same is **correct**, not a collision. @@ -552,14 +567,14 @@ The versioned-replay story in Part 2 must hold for **every** reader of `InputFin The `changed.go` classifier is the easily-missed member of the *second* class: it must get the same **digest-compare** as `FindFingerprintChanges`, so a version-only delta reads as "no change" — not a replay (which it cannot do, holding no inputs). -**This contract is enforced by types, not prose — the `fingerprint.Token` choke-point.** A reviewer-vigilance rule across the five-plus comparison sites is the kind of discipline this RFC elsewhere converts to structure (the atomic token, D3), and digest-comparison widens the surface: `v:` prefix-parsing now lives at two historical + three current-tree sites, and the "digest-compare two stored strings" pattern is **copyable**. 
The residual hazard is therefore not mere *omission* — a forgotten replay at a current-tree site fails *safely* toward inequality → spurious `Stale`/`Changed` → wasteful rebuild (G1 churn, never G5) — it is *mis-classification*: a future consumer that holds live inputs but copies the **historical** stored-string template never looks at those inputs → silently accepts a stale tree → reachable G5. Omission is safe; mis-classification is not, and only structure closes it. +**This contract is enforced by types, not prose — the `fingerprint.Token` choke-point.** A reviewer-vigilance rule across the comparison sites is the kind of discipline this RFC elsewhere converts to structure (the atomic token, D3), and digest-comparison widens the surface: `v:` prefix-parsing now lives at three historical + three current-tree sites, and the "digest-compare two stored strings" pattern is **copyable**. The residual hazard is therefore not mere *omission* — a forgotten replay at a current-tree site fails *safely* toward inequality → spurious `Stale`/`Changed` → wasteful rebuild (G1 churn, never G5) — it is *mis-classification*: a future consumer that holds live inputs but copies the **historical** stored-string template never looks at those inputs → silently accepts a stale tree → reachable G5. Omission is safe; mis-classification is not, and only structure closes it. The fix is a **two-type split**, because a single token type cannot tell the two comparator classes apart: - **`StoredToken`** — parsed from a lock by the *sole* strict parser `ParseToken` (accepts only `sha256:` legacy and `v:sha256:`; any malformed token is treated as *changed*, never normalized to an empty digest). It exposes `SameDigest(other StoredToken)` and nothing else — it holds no inputs, so a site that has only stored strings *physically cannot* perform a freshness decision. -- **`FreshToken`** — obtainable *only* from `ComputeIdentityAt(version, config, …)`, so constructing one requires live inputs. 
It exposes `Reconcile(stored StoredToken) → {Fresh | Stale | RestampTo(v)}`. +- **`FreshToken`** — obtainable *only* from `ComputeIdentityAt(version, config, …)`, so constructing a *valid* one requires live inputs. Its zero value (`var f FreshToken`) is still syntactically constructible, so it must **fail closed**: a `FreshToken` carries a validity bit set only by the constructor, and `Reconcile` on an unset one errors (or returns `Stale`), never silently treats an empty digest as a match. It exposes `Reconcile(stored StoredToken) → {Fresh | Stale | RestampTo(v)}`. -A historical site holding two `StoredToken`s can call `SameDigest` but cannot fabricate a `FreshToken`, so it cannot accidentally pose as a current-tree freshness check; a current-tree site must obtain a `FreshToken` to reconcile, which forces it through live inputs. The *assignment* documents the class, and the mis-classification path is unconstructible rather than merely discouraged. Both types are **non-comparable** (an unexported `_ [0]func()` field), so a raw `==` on a token outside the `fingerprint` package fails to compile — closing the copy-the-`==` path too. (Unexported fields alone would *not* do this: a struct of comparable unexported fields is still `==`-comparable from any package; the non-comparable sentinel is what blocks it.) This lands in PR C, which already edits every one of those sites; it has no on-disk-format dependency, so there is no reason to touch all five sites twice and carry the mis-classification window in between. +A historical site holding two `StoredToken`s can call `SameDigest` but cannot fabricate a `FreshToken`, so it cannot accidentally pose as a current-tree freshness check; a current-tree site must obtain a `FreshToken` to reconcile, which forces it through live inputs. The *assignment* documents the class, and the mis-classification path is unconstructible rather than merely discouraged. 
Both types are **non-comparable** (an unexported `_ [0]func()` field), so a raw `==` on a token outside the `fingerprint` package fails to compile — closing the copy-the-`==` path too. (Unexported fields alone would *not* do this: a struct of comparable unexported fields is still `==`-comparable from any package; the non-comparable sentinel is what blocks it.) This lands in PR C, which already edits every one of those sites; it has no on-disk-format dependency, so there is no reason to touch them twice and carry the mis-classification window in between. ### The synthetic changelog/release path is the real hazard @@ -575,11 +590,13 @@ A historical site holding two `StoredToken`s can call `SameDigest` but cannot fa `ComponentLock` carries a *second* persisted content hash, `ResolutionInputHash`, with its own staleness logic and its own silent-write path (it writes when only `resHashChanged`, never flipping `Changed`). It has the **identical** evolution problem as `InputFingerprint`, and the single shared content version covers it (see [Both hashes share one version](#both-hashes-share-one-version)). Two things make its replay safe to defer: -- **Smaller blast radius.** `ResolutionInputHash` does **not** feed `synthistory`, so an algorithm change can never mint a phantom changelog/release (that hazard is fingerprint-only). Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption. +- **Smaller blast radius.** `ResolutionInputHash` does **not** feed `synthistory`, so an algorithm change can never mint a phantom changelog/release (that hazard is fingerprint-only). Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption. 
(One honest caveat: a resolution-only bump moves the `input-fingerprint` *line bytes* too — the `v:` prefix advances — on a component whose render inputs did not change. Digest-comparison immunizes consumers from a phantom release, but it slightly erodes the raw "this `.lock` line moved ⇒ inputs changed" reviewer signal; the *digest*, not the line, is the trustworthy signal post-reset.) - **No pending change.** It is a flat seven-field SHA256, not a struct walk, so the projection substrate leaves it untouched. Its registry slot stays `computeRes1` until its inputs genuinely change. **Decision (KISS/YAGNI):** wire fingerprint replay in Part 2's first PR; reserve resolution replay (slot present, prior fn reused) and wire it the day `ComputeResolutionHash` first changes — add `computeRes2`, bump the shared version, re-stamp both fields together. No separate resolution prefix is needed: digest-comparison keeps the shared version correct for both ([above](#both-hashes-share-one-version)). +**The one genuinely-deferred seam — restamp-on-write.** The churn fix restamps the `InputFingerprint` token only `if result.Changed`, but a resolution-only write is `resHashChanged && !Changed` (it overwrites `ResolutionInputHash` without flipping `Changed`). While both hashes share v1 this is inert; but once resolution replay is wired, a resolution-only write could advance the shared version on one field and leave the `InputFingerprint` prefix lagging — two prefixes in one lock, which `parseTokenVersion` would then read inconsistently. The pin, applied when resolution replay lands: **restamp the shared prefix on `Changed || resHashChanged`** (or make `InputFingerprint` the sole prefix authority, advanced on any write). This cannot manifest before the first resolution-algorithm bump, so it is explicitly deferred — named here so the future author does not rediscover it the hard way. 
+ ## Design decisions ### D1 — Canonical projection vs `hashstructure` + `Includable` @@ -594,7 +611,7 @@ Both can omit zero values; the decisive difference is **whether an old algorithm | Type-name in hash | No (rename is drift-neutral) | Yes (rename moves every hash) | | Plumbing | Version tags + generator + golden vectors | Value-receiver `HashInclude` on every nested struct + `v.(reflect.Value)` assert | -`Includable` keeps today's hashes byte-identical, which mattered for an *incremental* rollout — but that property is worthless once the reset rebuilds everything anyway, and it comes attached to a substrate that makes replay unsound. Projection trades byte-compatibility (which we are spending on the coordinated cutover regardless) for frozen replay (which we need forever). Adopted at the reset. +`Includable` keeps today's hashes byte-identical, which mattered for an *incremental* rollout — but that property is worthless once the reset rebuilds everything anyway, and it comes attached to a substrate that makes replay unsound. (Verified against `hashstructure` v2.0.2, the pinned version in `go.mod`: it reflects the live struct and method set at hash time and mixes `reflect.Type.Name()` into the digest — the properties Problem 6 turns on.) Projection trades byte-compatibility (which we are spending on the coordinated cutover regardless) for frozen replay (which we need forever). Adopted at the reset. ### D2 — Version-tagged field selection, generated @@ -602,7 +619,9 @@ Field membership lives in a per-field version-set tag (`fingerprint:"v1..*"`); a - **The unsafe direction is the false-negative** (a meaningful field silently omitted → missed rebuild → stale artifact, a G5 violation). A *mandatory* tag — absent → generation fails — makes the include/exclude decision impossible to *forget*. 
The *wrongly-excluded* case (a `-` tag on a build-effective field) is caught by the kept exclusion ledger, and the *wrongly-included-but-unmeasured* case by the [coverage backstop](#golden-vector-coverage-the-backstop). - **Version-awareness is declarative.** A field's whole lifecycle — introduced at v3, dropped at v5, revived at v8 — is one greppable string on the field (`v3..v4,v8..*`), inexpressible in hand-written form, with no diff smeared across function bodies. -- **Frozen-ness stays structural.** Because the generated functions are checked-in code, a retained `projectVN` references each field by literal Go path — deleting a measured field won't compile, the emit-key is a literal, and regeneration-idempotence (CI `go generate` + diff) pins a shipped version's output. Golden vectors are the semantic backstop behind that, not the sole guarantee. This recovers the hand-written model's compile guarantee that a *runtime* reflective walker gives up (its output would reflect the live struct at hash time — Problem 6 one layer down), while keeping the DSL's declarative lifecycle. The cost is a generator; the project already runs `go generate` via `mage` (`stringer`/`mockgen` precedent), so the infra cost is low. +- **Frozen-ness stays structural.** Because the generated functions are checked-in code, a retained `projectVN` references each field by literal Go path — deleting a measured field won't compile, the emit-key is a literal, and regeneration-idempotence (CI `go generate` + diff) pins a shipped version's output. Golden vectors are the semantic backstop behind that, not the sole guarantee. This recovers the hand-written model's compile guarantee that a *runtime* reflective walker gives up (its output would reflect the live struct at hash time — Problem 6 one layer down), while keeping the DSL's declarative lifecycle. 
+ +The `go generate` *infrastructure* already exists (`stringer`/`mockgen` via `mage`), so the marginal cost is low — but the projection generator's **stakes** are categorically higher than those tools, and the design treats it accordingly. A `stringer` bug is cosmetic; a `mockgen` bug breaks test compilation and is caught instantly. A **projection-generator bug silently moves a shipped version's bytes → fleet-wide G5 (stale, undetectable except by the corpus) or G1 (mass churn).** The generator is therefore a first-class, fingerprint-load-bearing production artifact with its own test suite, and **regeneration-idempotence is a required CI gate** (the [`.github/workflows/generate.yml`](../../../.github/workflows/generate.yml) check is mandatory, never skippable) — without it the freeze degrades from structural to test-discipline. That is precisely why the coverage oracle and hand-frozen golden digests above are mandatory, not optional. ### D3 — Atomic self-describing token; no format bump, reconcile via force-rehash @@ -630,16 +649,16 @@ The lock **format** `Version` stays at `1`. Bumping it to `2` as a poison pill The reset (Part 1) must land as one coherent change at the dev→prod cutover; its pieces are independently reviewable but ship together because they all move the hash. 1. **PR A (substrate)**: the **projection generator** (`go generate`) — reads the version-set tags and emits the per-version `projectVN(cfg) []byte` functions (literal emits, sorted keys) plus golden-vector and coverage scaffolding — the canonical encoder (`canonicalBuf`, `emit`/`emitAlways`), the version-set tag parser, the frozen **TOML-key** emit rule, the `reflect.Value.IsZero()` omit-predicate, the `sha256` combiner, and the golden vectors. 
Generate-time guards: a fingerprinted field with **no tag** fails generation; the slimmed **exclusion ledger** and **dropped-fields ledger** replace the retired `TestAllFingerprintedFieldsHaveDecision` audit; **regeneration-idempotence** (CI `go generate` + `git diff --exit-code`) pins shipped versions. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. Tests: a field tagged `v2..*` is absent from generated `projectV1`; a `!` range emits at zero; a field with **no** `fingerprint` tag fails generation; a **nested** fingerprinted struct with a tagless field fails generation; deleting a field a retained `projectVN` names **fails to compile**; a **Go-field rename keeping the TOML key** yields a byte-identical digest; two fields colliding on one emit-key fail generation; the coverage oracle (by struct-reflection, not the tag) fails when a build-effective field is tagged too narrowly (`v1..v1` at current `v2`) and is not in the dropped-fields ledger; golden vectors pin v1; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser. -2. **PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. Lock format `Version` stays `1`. Ships at the cutover; absorbed by the scheduled rebuild. Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. -3. 
**PR C (Part 2 machinery)**: the **two-type token split** — `StoredToken` (parsed by the sole strict `ParseToken`: accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match; exposes `SameDigest` only) and `FreshToken` (from `ComputeIdentityAt`, exposes `Reconcile(stored) → {Fresh | Stale | RestampTo(v)}`), both **non-comparable** (`_ [0]func()`); the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **all five** comparison sites through these types — replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, `BuildDirtyChange` (the three current-tree sites, via `FreshToken.Reconcile`), plus `StoredToken.SameDigest` in `FindFingerprintChanges` and `changed.go`'s `classifyComponent` (the two historical sites). Resolution replay reserved (slot reuses `computeRes1`). **Not fully inert:** this PR switches the live compares from raw-string to token-routed *on merge* — only the *registry dispatch* is dormant while just `v1` exists, and `BuildDirtyChange`'s replay is a hard prerequisite for any later PR that registers `v2`. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a token outside the `fingerprint` package fails to compile; a historical site cannot construct a `FreshToken` (no live inputs). +2. **PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. 
Lock format `Version` stays `1`, asserted by a named-constant test (`currentVersion == 1`) with a comment that the *content* version lives in the token prefix, not here — so a future format bump cannot silently break every historical read through `lockfile.Parse`. Ships at the cutover; absorbed by the scheduled rebuild. Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. +3. **PR C (Part 2 machinery)**: the **two-type token split** — `StoredToken` (parsed by the sole strict `ParseToken`: accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match; exposes `SameDigest` only) and `FreshToken` (from `ComputeIdentityAt`, exposes `Reconcile(stored) → {Fresh | Stale | RestampTo(v)}`, fails closed on its zero value), both **non-comparable** (`_ [0]func()`); the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **every** comparison and compute site through these types. The **current-tree** sites (via `FreshToken.Reconcile`): replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, `BuildDirtyChange`, and the second `ComputeIdentity` caller `bumpComponents` (`update.go`); plus the `computeCurrentFingerprint` (`sourceprep.go`) return-type cascade `string → FreshToken`. The **historical** sites (via `StoredToken.SameDigest`): `FindFingerprintChanges`, `changed.go`'s `classifyComponent`, **and `haveMatchingFingerprints`**. ⚠️ **`haveMatchingFingerprints` is security-load-bearing, not a mechanical changelog sibling:** it gates the cache-poisoning integrity check (`if result.SourcesChange && haveMatchingFingerprints(...)` in `changed.go`). 
If only `classifyComponent` is converted and this site is missed, the first legitimate `v2` bump makes a version-only re-stamp compare unequal → the integrity violation is **never recorded → tamper evidence silently swallowed**. It must convert to digest-compare in the same PR. Resolution replay reserved (slot reuses `computeRes1`). **Ordering gate (CI-enforced):** `currentLockContentVersion > 1` is forbidden unless `BuildDirtyChange` already routes through `Reconcile` — otherwise registering `v2` makes every component read persistently dirty on every `render`/`build`. **Not fully inert:** this PR switches the live compares from raw-string to token-routed *on merge* — only the *registry dispatch* is dormant while just `v1` exists. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event and does **not** suppress `haveMatchingFingerprints`; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a token outside the `fingerprint` package fails to compile; a zero-value `FreshToken`/`StoredToken` fails closed; a historical site cannot construct a `FreshToken`; the registry `init()` panics on a `[minSupported,current]` gap. 4. **PR D (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) — add a field absent from `projectV1` and set it on one component; assert only that lock drifts and every other lock is byte-identical. 5. **PR E (config schema axis, later)**: `schema-version` field + load-time canonical migration + the `config migrate` command. Gated on the first post-reset non-additive TOML change not already absorbed by the reset's normalization pass. +6. 
**PR F (forced lock migration, gated on the first `v2`)**: the `component migrate` command (the only sanctioned floor-raise; the prescribed fix for a build-critical newly-measured input) and the CI spread-ceiling on `currentLockContentVersion − minSupportedLockContentVersion`. The floor machinery (`minSupportedLockContentVersion`, the `init()` gap-panic) goes live the moment `v2` registers but is unusable without this, so PR F must land with or before the first `v2` — the same gating shape as PR E. -Each PR is independently revertible up to the cutover. PRs A–B land together at the dev→prod cutover (they move every hash and are absorbed by the scheduled rebuild); PR C is inert until the first post-reset algorithm change; PRs D–E follow. +Each PR is independently revertible up to the cutover. PRs A–B land together at the dev→prod cutover (they move every hash and are absorbed by the scheduled rebuild); PR C switches the live compares to token-routed on merge, but its registry dispatch stays dormant until the first post-reset algorithm change; PR D follows; PRs E–F are gated on the first post-reset schema/algorithm change respectively. ## Open questions -1. Should a lazy re-stamp during a *read-only* command (`render`, `build` freshness check) write the lock back, or defer all writes to `component update`? Writing on read is surprising; deferring means freshness checks stay slightly slower until the next update. (Leaning: defer all writes to `update`, keeping reads side-effect-free.) -2. For the config schema axis, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration. +1. For the config schema axis, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration. 
*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; the **canonical byte encoding is the existing length-prefixed `:=:` form** used by `combineInputs`, committed and pinned by golden vectors at the reset (former Open Q#4 — a precondition for PR A, not an open question, because the reset makes it irreversible); the **version write-guard is a requirement, not an option** (former Open Q#5): a binary refuses to write a token whose version exceeds its own `currentLockContentVersion`, and the CI version-pin prevents *old* binaries from committing downgrades; **field membership is declared in mandatory per-field version-set tags** (`fingerprint:"v1..*"`; absent → generation fails, `!`-prefix for always-emit) from which a **`go generate` step emits the per-version `projectVN` functions** — restoring "forgotten field → loud failure" natively and making field removal a *compile* error; the **emit-key is the frozen TOML key** (never the Go field name, so a field rename is byte-neutral; `key=` overrides for keyless fields; duplicate emit-keys fail generation), the **omit-predicate is `reflect.Value.IsZero()`** (former Open Q#3), and the tag DSL is **frozen at three range-operators** (`..`, `!`, `*`) plus the orthogonal `key=`; frozen-ness rests on the **compiler + generator + regeneration-idempotence**, with the **golden-vector coverage invariant** (every field measured at a retained version exercised non-zero in ≥1 vector, with a tag-independent discrimination oracle) as the semantic backstop whose oracle is a **tag-independent dropped-fields ledger** (a field absent from it must discriminate at the *current* version), plus a **kept exclusion ledger** for `-` fields (the inclusion default is native to the tag; the *exclusion* default stays ledgered because it is the 
G5-dangerous direction); the stored hash is read only through a **two-type token split** — `StoredToken` (digest-compare only) and `FreshToken` (reconcile, requires live inputs), both non-comparable — adopted in PR C, closing both the raw-`==` and comparator mis-classification hazards at compile time; baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling) and is itself a deliberate release-grade event; one shared content version covers both stored hashes **permanently** (no split) — the historical changelog/classifier comparators compare the **digest** (stripping the `v:` prefix), so advancing the shared prefix for a resolution-only algorithm change moves no digest and mints no release; resolution replay stays reserved (slot present, `computeRes1` reused) until `ComputeResolutionHash` first changes. 
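
A minimal sketch of the stored-token half of the two-type split, for orientation only. It assumes the token is either `sha256:` followed by 64 hex characters (legacy) or `v`, a positive integer, `:sha256:`, and the same digest form. Only `StoredToken`, `ParseToken`, `SameDigest`, and the `_ [0]func()` field come from the text above; the field names, `ErrMalformedToken`, the "version 0 means legacy" convention, and the exact digest-length check are illustrative assumptions, not the RFC's final API.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// exampleHex is the SHA-256 of the empty string, used only as sample data.
const exampleHex = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

// ErrMalformedToken is a hypothetical sentinel for any token ParseToken rejects.
var ErrMalformedToken = errors.New("malformed fingerprint token")

// StoredToken is a parsed on-disk hash. The zero-length func array makes the
// struct non-comparable, so a raw == between two tokens does not compile.
type StoredToken struct {
	_       [0]func()
	version int    // assumed convention: 0 = legacy prefix-less token
	digest  string // 64 lowercase hex characters; empty on the zero value
}

// ParseToken accepts only the two shapes described above and rejects
// everything else rather than returning a token with an empty digest.
func ParseToken(s string) (StoredToken, error) {
	version := 0
	rest := s
	if strings.HasPrefix(s, "v") {
		num, tail, ok := strings.Cut(s[1:], ":")
		n, err := strconv.Atoi(num)
		// Round-tripping through Itoa rejects spellings such as "+1" or "01".
		if !ok || err != nil || n < 1 || strconv.Itoa(n) != num {
			return StoredToken{}, ErrMalformedToken
		}
		version, rest = n, tail
	}
	if !strings.HasPrefix(rest, "sha256:") {
		return StoredToken{}, ErrMalformedToken
	}
	digest := strings.TrimPrefix(rest, "sha256:")
	raw, err := hex.DecodeString(digest)
	if err != nil || len(raw) != 32 {
		return StoredToken{}, ErrMalformedToken
	}
	return StoredToken{version: version, digest: strings.ToLower(digest)}, nil
}

// SameDigest compares digests only, ignoring the version prefix. It fails
// closed: a zero-value (unparsed) token never matches anything, itself included.
func (t StoredToken) SameDigest(other StoredToken) bool {
	return t.digest != "" && t.digest == other.digest
}

func main() {
	v1, _ := ParseToken("v1:sha256:" + exampleHex)
	v2, _ := ParseToken("v2:sha256:" + exampleHex)
	_, err := ParseToken("v1:sha256:")
	fmt.Println("version-only delta reads as same digest:", v1.SameDigest(v2))
	fmt.Println("empty digest rejected:", err != nil)
}
```

Because `SameDigest` is the only comparison the type exposes, a historical site such as `classifyComponent` or `haveMatchingFingerprints` cannot accidentally compare the full token string and misread a version-only re-stamp as a change.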
From e0960d759af62b6257ac5712a9f7c75e0e258647 Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Tue, 9 Jun 2026 14:13:56 -0700 Subject: [PATCH 10/15] update 8 --- docs/developer/rfc/lazy-schema-migration.md | 54 ++++++++++++--------- 1 file changed, 30 insertions(+), 24 deletions(-) diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md index c5cd93e6..5a08a69a 100644 --- a/docs/developer/rfc/lazy-schema-migration.md +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -297,7 +297,7 @@ Non-zero coverage alone is necessary but not sufficient; four more obligations c - **`!`-zero behavior.** Dropping `!` silently stops emitting a build-meaningful zero (G5). Every retained `!` range needs a **zero-valued** discrimination vector. - **Encoding across the value space.** A `"foo"` vector misses an encoder change affecting only delimiter bytes, multibyte runes, or multi-entry slices/maps. Add per-encoder **property/fuzz** vectors. (Fails toward G1 over-drift except under a collision the length-prefixed form makes unlikely.) -- **nil-vs-empty is a resolver invariant — slices *and* maps.** Under `IsZero()` a nil slice/map omits and a non-nil empty one (`[]`, `{}`) emits, and `mergo.Merge` can yield either for the same intent. "Resolution normalizes to one canonical form" is a *distributed* property; pin it with a **resolver-side canonical-form test** over every fingerprint-sensitive slice **and map** field, not only per-field vectors. +- **nil-vs-empty is a resolver invariant — slices *and* maps.** Under `IsZero()` a nil slice/map omits and a non-nil empty one (`[]`, `{}`) emits, and `mergo.Merge` can yield either for the same intent. This is the one correctness assumption the design *preserves rather than proves*: "resolution normalizes nil-vs-empty to one canonical form" is a *distributed* property across every merge site. 
PR A must (a) name the **single resolver chokepoint** that owns canonicalization (so it is one enforced place, not a convention scattered across call sites), (b) **inventory** every fingerprint-sensitive slice/map field, and (c) write the resolver-side canonical-form test **first** — before the projection lands, or the over-drift is latent. (To verify: the current resolver is not yet confirmed to canonicalize here.) - **`!` on an all-zero nested struct emits.** `IsZero()` on a struct is true iff every sub-field is zero, so a `!`-tagged nested struct whose fields all resolve to zero would otherwise be omitted; the generated sub-projector treats a `!` range as "emit the (recursively projected) value even when the struct `IsZero`," so a build-meaningful all-zero struct still hashes. Covered by a zero-valued discrimination vector like any other `!` range. - **Enumerator completeness.** The coverage corpus is checked against the **generator's own field enumeration** (above), so it cannot drift from what is measured — a newly-added field or nested struct that the generator reaches but the corpus does not cover fails the backstop, not just the generator. @@ -330,8 +330,9 @@ func projectV1(c *ComponentConfig) []byte { **The generated encoding contract — frozen per version, fully specified before PR A.** The golden vectors bake the byte encoding in irreversibly at the reset, so every value type's serialization is a one-way door and must be pinned now, not discovered later. The contract: -- **Maps emit in sorted-key order.** This is the one guarantee `hashstructure` gave for free that the projection must re-establish: a naive `range` over a Go map is **non-deterministic** (randomized iteration), so a generated `b.emitMap` must sort entries by key and emit each as `:=:` under the field key. Without this, an unchanged config hashes differently across runs → intermittent spurious lock drift (the exact G1/G2 catastrophe this RFC exists to prevent). 
Measured map fields are real (`Defines`, `Packages`), so this is mandatory, with a fuzz vector (≥2 keys, varying insert order → identical digest). -- **Value-slot encoding is defined per type, not left to `%v`.** `bool` → `"true"`/`"false"`; integers → base-10; `[]T` → each element as its own length-prefixed sub-value in slice order (not a JSON blob); `map` → as above. Interfaces, generics, and external types either get an explicit encoding or **fail generation** — no silent `fmt`-style fallback that a dependency could change underneath us. +- **Composite omission is by *projected* emptiness, not raw `IsZero()`.** A nested struct or a map entry is `reflect.Value.IsZero()` only when **every** sub-field is zero — *including excluded (`fingerprint:"-"`) children* — so a global `IsZero()` predicate would leak: a measured composite whose only non-zero content is an excluded child is not `IsZero`, so it would emit, and the digest would move on a change that touched no *measured* input. (Real case: `ComponentConfig.Build` is measured but holds the excluded `Failure`/`Hints`; setting only `build.failure.expected` makes `Build` raw-non-zero while every measured sub-field is zero → false lock drift + a phantom release.) The rule: a composite is omitted when its **frozen sub-projector emits no measured bytes** (unless tagged `!`), *not* when the raw value `IsZero`. Scalars keep plain `IsZero()`. The coverage backstop gains the **inverse** check it was missing: a **negative discrimination vector** per `-` field (and per all-`-`-value map entry) that varies that field alone and asserts the digest does **not** move. 
+- **Maps emit in sorted-key order; an entry whose value projects empty still emits its key.** A naive `range` over a Go map is **non-deterministic** (randomized iteration), so a generated `b.emitMap` must sort entries by key and emit each as `:=:` under the field key — the one guarantee `hashstructure` gave for free that the projection must re-establish, else an unchanged config hashes differently across runs (intermittent spurious drift). **Map-key membership is itself measured:** an entry whose *value* projects to empty still emits its key, so `{"baz":{}}` ≠ `{}` (matching today's `hashstructure`, which hashes map keys). Tests: a fuzz vector (≥2 keys, varying insert order → identical digest) **and** a key-varying vector (add an empty-value entry → digest moves). The natural "set a non-zero value" vector would *not* exercise this, so it must be written explicitly. +- **Value-slot encoding is defined per type, not left to `%v`.** `bool` → `"true"`/`"false"`; integers → base-10; `[]T` → each element as its own length-prefixed sub-value in slice order (not a JSON blob); `map` → as above. A **named scalar type** (e.g. `fileutils.HashType`, `SpecSourceType`, `ComponentOverlayType`, `ReleaseCalculation`) encodes by its **underlying `reflect.Kind`** (named string/int/bool → the underlying kind) — these are measured fields, so they must *not* fail generation. Only genuinely un-encodable shapes (interfaces, generics, pointers to external types, `time.Time`/`[]byte`-style special-cases not present in today's measured graph) **fail generation** rather than fall back to a `fmt`-style encoding a dependency could change underneath us. The v1 encoding test enumerates every named scalar in the measured graph. 
- **Nested struct values are emitted by a frozen per-version sub-projector, never by runtime reflection.** If a generated `projectV1` emitted a `[]ComponentOverlay` by delegating to a *live* reflective encoder, adding a field to `ComponentOverlay` later would change `projectV1`'s output at hash time — Problem 6 reborn one layer down. The generator therefore emits a literal per-version projector for each nested struct type too; element/value projectors are frozen exactly like top-level ones. - **Recursion prunes at `fingerprint:"-"`, per edge.** The completeness walk descends only through *measured* fields and only into struct kinds (through slice-element, map-value, and pointer-element types), treating defined scalar types as leaves. A `-` tag stops the walk at that **edge** (the field isn't measured, so its subtree isn't either) — not at the type (the same type reached through an *included* field elsewhere is still enumerated there). This breaks the real `ComponentConfig → SourceConfigFile → … → ComponentConfig` cycle (both back-edges are already `-`), and a **visited-type memo** on the included graph guards against any future included-path cycle. An untagged field reached on an *included* edge fails generation (the mandatory-decision guard); fields under a `-` edge are never reached, so they need no tag. @@ -361,6 +362,7 @@ The reset rebuild is a budget. Spend it on the irreversible / cutover-only chang 4. **Adopt an atomic, self-describing `v1:sha256:…` token** for the stored hash, so the version and the digest can never desync (closes the re-stamp/desync class of bug where the version field and the hash field are written independently). 5. **Unify on `sha256` everywhere**, retiring the `uint64`→decimal-string wart from the `hashstructure` era. One hash format, one encoding. 6. 
**Do every pending rename / default-normalization now.** Renaming a field, moving content between structs, or changing a baked-in default is a one-way door under Part 2 (it needs a version bump + replay); at the reset it is free because everything rebuilds anyway. This is where the schema-axis "hardest cases" get absorbed cheaply. +7. **Resolve each field's mandatory tag — and bank the free corrections.** The "absent ⇒ generation fails" rule forces a conscious decision on every fingerprinted field at the reset, which is the moment to *fix* existing mistakes for free. Concretely, tag `ComponentConfig.Packages` `fingerprint:"-"`: every `PackageConfig` field is publish-only (`Publish`, itself `-`), so the map measures nothing build-effective — yet today `hashstructure` hashes its *keys*, so adding a publish-only package name already triggers a spurious rebuild. Excluding it at the reset retires that existing G1 churn at zero cost. Audit the whole struct for the same pattern (a measured composite whose every leaf is `-`). **Anti-goal:** do *not* burn reset budget on additive fields — Part 2 handles those for free, forever. The success criterion for the load-out is that **no *routine* change ever forces a second coordinated cutover**: after the reset, every ordinary change must be expressible as either a free additive field or a lazy Part 2 version bump. Retiring an *old* content version is the one sanctioned exception — a fleet-wide `component migrate` is itself a deliberate, planned, reset-grade event (see [Registry floor](#registry-floor-and-forced-migration)); the goal is that nothing *unplanned* ever forces one. @@ -446,9 +448,11 @@ This resolves Problems 2 (for default changes), 3 (hashing bugfixes), and 5 (pie `ComponentLock` carries two persisted content hashes: `InputFingerprint` (render inputs, via `projectVN` + `sha256`) and `ResolutionInputHash` (upstream-resolution inputs — a flat SHA256 over seven explicit fields in `ComputeResolutionHash`). 
Both have the **same evolution problem**: appending an input or reordering the fold moves every lock's hash → G1 churn. -We version them with **one shared integer** (the token's `v` prefix), not two axes, because: they co-locate in a single lock, they are written in the same `update` pass, and a paired registry lets either evolve independently while the other reuses its prior function. Two separate version fields would double the floor/replay/migrate machinery for an input set (`ResolutionInputHash`) that changes rarely — YAGNI. **The shared integer is permanent, made safe by digest-comparison.** The one hazard a shared prefix could create: a *resolution-only* algorithm bump drags the `InputFingerprint` token's prefix `v1`→`v2` while its digest is unchanged (the fingerprint algorithm was reused), and a *full-token* changelog walker would misread that prefix move as a release. We close it not by splitting the version but by having the historical changelog/classifier comparators compare the **digest** (the `:` tail), stripping the `v:` prefix: a resolution-only bump moves the prefix but not the digest → no phantom release; a real input change moves the digest → fires. Both fields are always co-written in the same `update` pass and the prefix advances whenever *either* algorithm advances, so the single prefix is a correct version for both. (See [the synthetic-history path](#the-synthetic-changelogrelease-path-is-the-real-hazard).) +We version them with **one shared integer**, not two axes, because: they co-locate in a single lock, they are written in the same `update` pass, and a paired registry lets either evolve independently while the other reuses its prior function. Two separate version axes would double the floor/replay/migrate machinery for an input set (`ResolutionInputHash`) that changes rarely — YAGNI. -**Phasing.** The atomic token format (`v:sha256:…`) is fixed at the reset. 
Fingerprint replay is wired in Part 2's first PR; **resolution-hash replay is reserved, not yet wired** — the slot exists and `computeRes1` is reused, so the day `ComputeResolutionHash` first changes we add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`), with no schema change. The deferral is safe because of its smaller blast radius — see [`ResolutionInputHash`](#resolutioninputhash--shares-the-version-replay-deferred). +**`InputFingerprint` is the sole prefix authority; `ResolutionInputHash` stays bare.** The shared version is physically stored **only** in `InputFingerprint`'s `v:` prefix. `ResolutionInputHash` carries **no prefix** — it remains a bare `sha256:` digest. This is the decisive choice that prevents a cross-field desync: the **first fingerprint-only `v2`** already advances the shared prefix, and if `ResolutionInputHash` *also* carried it, `resolver.go`'s raw string compare of the whole field would see a prefix-only move (`v1:…X` → `v2:…X`, `computeRes1` unchanged) and mark resolution stale → fleet-wide re-resolution for nothing. With the prefix living only in `InputFingerprint`, the resolver compares a bare digest that does not move on a fingerprint bump. The "shared version" therefore means: the integer in `InputFingerprint`'s prefix selects which `computeResN` produced `ResolutionInputHash` during replay (read from the one prefix), not that resolution stores its own copy. This also keeps `InputFingerprint` the only release-bearing field, so the historical changelog/classifier comparators — which compare the **digest**, stripping the `v:` prefix — never see a phantom move on a version-only re-stamp. (See [the synthetic-history path](#the-synthetic-changelogrelease-path-is-the-real-hazard).) + +**Phasing.** The atomic token format (`v:sha256:…`) is fixed at the reset. 
Fingerprint replay is wired in Part 2's first PR; **resolution-hash replay is reserved, not yet wired** — the slot exists and `computeRes1` is reused, so the day `ComputeResolutionHash` first changes we add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`). Because `ResolutionInputHash` is bare and prefix-free, a fingerprint-only bump before that day is a no-op for the resolver — the deferral is genuinely safe, not merely small-blast-radius. See [`ResolutionInputHash`](#resolutioninputhash--bare-digest-replay-deferred). #### Churn-avoidance policies (G1) @@ -489,10 +493,10 @@ This is correct *by contract* (a v1 lock promises freshness under the v1 input s Lazy migration means an untouched lock can sit at an old version **indefinitely** (G3 by design). That makes "keep the last *N* versions" a **correctness cliff, not a tuning knob**: if pruning drops the compute function a lock still depends on, replay becomes impossible → forced `FreshnessStale` → the mass rebuild/rewrite (and, via the downstream-consumer analysis below, mass changelog churn) the whole design exists to avoid. So the floor must be explicit and paired with an escape hatch, decided now: - **`minSupportedLockContentVersion`** is a hard floor. A lock below it cannot be replayed and is treated as `Stale`. Dropping a registry entry is therefore a deliberate, breaking, announced act — never incidental cleanup. -- **`component migrate`** (Open Q#5, promoted to a requirement) force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. 
**Contract:** it is *offline* — it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). It *does*, however, move every *fingerprint* digest when it retires a fingerprint algorithm — advancing that algorithm is the whole point — so a fleet-wide migrate of that kind **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved digest as notable, exactly as [the synthetic-history trap](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. (A migrate that retires only a *resolution* algorithm moves the shared prefix but not the `InputFingerprint` digest, so it is correctly release-silent.) That is *why* migrate is reset-grade and rare, not a free background sweep — the release churn is the deliberate cost of retiring a version. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal — each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). +- **`component migrate`** (Open Q#5, promoted to a requirement) force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. 
**Contract:** it is *offline* — it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). It *does*, however, move every *fingerprint* digest when it retires a fingerprint algorithm — advancing that algorithm is the whole point — so a fleet-wide migrate of that kind **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved digest as notable, exactly as [the synthetic-history trap](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. (A migrate that retires only a *resolution* algorithm rewrites only the bare, prefix-free `ResolutionInputHash` — which `synthistory` never reads — so it is correctly release-silent.) That is *why* migrate is reset-grade and rare, not a free background sweep — the release churn is the deliberate cost of retiring a version. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal — each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). - **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine — left alone, the registry, golden vectors, and deprecated tombstone fields grow **append-only** (a real cost the opaque-token model accepts; see the manifest alternative). Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion − minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. The spread, not the absolute version number, is the quantity kept small. 
**Early-warning ramp:** the ceiling is a *warning at ceiling−1*, a hard failure only at the ceiling — so an approaching floor-raise surfaces as a heads-up on the PR *before* the one that registers `v(N+1)`, converting the forced migrate from a surprise blocking failure into a planned event (the design's goal that nothing *unplanned* ever forces a migrate). **Residual:** if genuine algorithm changes arrive *faster* than planned rebuilds, the ceiling still ultimately *forces* an unplanned, release-grade `component migrate`. The ceiling does not eliminate the expensive event; it bounds the backlog by *converting* an unbounded version spread into an occasional forced migrate, with one version of advance notice. This is the accepted cost of lazy-forever coexistence. -**Mixed-toolchain hazard — bounded by the version-pin, not auto-repair.** The classic trap is an older binary regressing a newer lock. Because the lock *format* never bumps, an old binary *can* write a reset lock, stamping a legacy (prefix-less) or lower-`v` hash. In the **working tree** this is self-correcting: the next new-binary run detects the sub-floor token and force-rehashes it to the current version. But "self-correcting" stops at the working tree — if a downgraded lock is **committed**, `FindFingerprintChanges` reads `v1 → legacy → v1` as two real release events, and a published `%autorelease` increment cannot be withdrawn. So the load-bearing guard against *committed* phantom releases is the **CI version-pin**: post-cutover, no old binary may run the `update`-and-commit step. (The force-rehash only cleans the working tree; it does not undo history.) The *symmetric* residual — a binary that predates content-version `v2` meeting a `v2` token it cannot replay — is closed by a **required** write-time guard (Open Q#5, now a requirement): refuse to write a token whose version exceeds the binary's `currentLockContentVersion`, erroring rather than silently restamping at `v1`. 
Note this guard lives in the binary doing the write, so it constrains *newer-but-not-newest* binaries; it does **not** retroactively constrain a genuinely *old* binary — that direction is the version-pin's job. +**Mixed-toolchain hazard — bounded by the version-pin, not auto-repair.** The classic trap is an older binary regressing a newer lock. Because the lock *format* never bumps, an old binary *can* write a reset lock, stamping a legacy (prefix-less) or lower-`v` hash. In the **working tree** this is self-correcting: the next new-binary run detects the sub-floor token and force-rehashes it to the current version. But "self-correcting" stops at the working tree — if a downgraded lock is **committed**, `FindFingerprintChanges` reads `v1 → legacy → v1` as two real release events, and a published `%autorelease` increment cannot be withdrawn. So the load-bearing guard against *committed* phantom releases is the **CI version-pin**: post-cutover, no old binary may run the `update`-and-commit step. Concretely, that means the lock-writing CI job runs from a **pinned build image (by digest, not a floating tag)** rebuilt from the cutover commit or later, and **no other path reaches the `update`-and-commit step** — local developer binaries do not commit locks; only the pinned job does. (The force-rehash only cleans the working tree; it does not undo history.) The *symmetric* residual — a binary that predates content-version `v2` meeting a `v2` token it cannot replay — is closed by a **required** write-time guard (Open Q#5, now a requirement): refuse to write a token whose version exceeds the binary's `currentLockContentVersion`, erroring rather than silently restamping at `v1`. Note this guard lives in the binary doing the write, so it constrains *newer-but-not-newest* binaries; it does **not** retroactively constrain a genuinely *old* binary — that direction is the version-pin's job. 
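
The write-time guard described above can be sketched as follows. This is a minimal illustration, not the RFC's implementation: `tokenVersion`, `guardWrite`, and `errTokenTooNew` are hypothetical names, the token shape is assumed to match the `v1:sha256:X` examples used elsewhere in this document, and the sketch reads the guard as "inspect the stored token before overwriting it".

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// currentLockContentVersion is the newest content version this
// (hypothetical) binary knows how to compute.
const currentLockContentVersion = 1

// errTokenTooNew is an illustrative sentinel, not an existing azldev error.
var errTokenTooNew = errors.New("stored token is newer than this binary can replay")

// tokenVersion returns the content version of a stored token.
// A prefix-less "sha256:..." token is treated as legacy (version 0).
// The digest payload itself is not validated here.
func tokenVersion(token string) (int, error) {
	if strings.HasPrefix(token, "sha256:") {
		return 0, nil
	}
	prefix, rest, found := strings.Cut(token, ":")
	if !found || !strings.HasPrefix(prefix, "v") || !strings.HasPrefix(rest, "sha256:") {
		return 0, fmt.Errorf("malformed token %q", token)
	}
	// ParseUint rejects signs and empty strings, so "v+1" and "v" are malformed.
	n, err := strconv.ParseUint(prefix[1:], 10, 31)
	if err != nil || n == 0 {
		return 0, fmt.Errorf("malformed token version in %q", token)
	}
	return int(n), nil
}

// guardWrite errors instead of letting the caller restamp a token whose
// version exceeds what this binary supports.
func guardWrite(storedToken string) error {
	v, err := tokenVersion(storedToken)
	if err != nil {
		return err
	}
	if v > currentLockContentVersion {
		return fmt.Errorf("%w: stored v%d, binary supports up to v%d",
			errTokenTooNew, v, currentLockContentVersion)
	}
	return nil
}

func main() {
	for _, stored := range []string{"sha256:abc", "v1:sha256:abc", "v2:sha256:abc"} {
		fmt.Printf("%-16s -> %v\n", stored, guardWrite(stored))
	}
}
```

As the paragraph above notes, a check like this only constrains binaries that already contain it; a genuinely old binary is still the version-pin's responsibility.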
#### Replaying across a changed input set — `{a,b,c}` → `{a,b,d}` @@ -555,10 +559,12 @@ The versioned-replay story in Part 2 must hold for **every** reader of `InputFin | -------- | ----- | -------- | --------------------------- | | `checkFingerprintFreshness` (resolver) | recomputed identity | vs stored token | Replay at token version (Part 2 core) | | `component update` `Changed` decision | recomputed identity | vs stored token | **Replay before `Changed`** (see churn policy seam) | -| `changed.go` `classifyComponent` / `haveMatchingFingerprints` (CI classifier) | stored token strings (two historical git refs) | **digest compare** (strip `v:` prefix) | **String-only — must NOT replay** (no inputs available, and replaying historical configs would violate the no-recompute invariant); comparing the digest makes it immune to version-only deltas | -| `synthistory.FindFingerprintChanges` | stored token strings across git history | **digest of adjacent commits** (strip `v:` prefix) | **String-only; digest-compare** so a version-only re-stamp (including a resolution-only bump) never fires a release | +| `bumpComponents` (`update.go`) | recomputed identity | vs stored token | Current-tree replay (second `ComputeIdentity` caller) | +| `changed.go` `classifyComponent` (CI classifier) | stored token strings (two historical git refs) | **digest compare** (strip `v:` prefix) | **String-only — must NOT replay** (no inputs available; replaying historical configs would violate the no-recompute invariant) | +| `changed.go` `haveMatchingFingerprints` (⚠️ cache-poisoning integrity gate) | stored token strings | **digest compare** (strip `v:` prefix) | **String-only; security-load-bearing** — a version-only delta must read as "same" or the integrity check is silently skipped | +| `synthistory.FindFingerprintChanges` | stored token strings across git history | **digest of adjacent commits** (strip `v:` prefix) | **String-only; digest-compare** so a version-only re-stamp never fires a 
release | | `synthistory.BuildDirtyChange` | recomputed (current ver) | vs stored `headLock` token | **Replay at headLock version** before declaring dirty | -| `ResolutionInputHash` staleness/write | recomputed resolution hash | vs stored | **Shares the version; replay reserved, not yet wired** | +| `ResolutionInputHash` staleness/write | recomputed resolution hash | vs stored **bare** digest | **No prefix** (bare `sha256:`); fingerprint-only bumps never touch it; replay reserved | **Two comparator classes, not one — and only one of them can replay.** The consumers split cleanly by *what they hold*: @@ -572,30 +578,30 @@ The `changed.go` classifier is the easily-missed member of the *second* class: i The fix is a **two-type split**, because a single token type cannot tell the two comparator classes apart: - **`StoredToken`** — parsed from a lock by the *sole* strict parser `ParseToken` (accepts only `sha256:` legacy and `v:sha256:`; any malformed token is treated as *changed*, never normalized to an empty digest). It exposes `SameDigest(other StoredToken)` and nothing else — it holds no inputs, so a site that has only stored strings *physically cannot* perform a freshness decision. -- **`FreshToken`** — obtainable *only* from `ComputeIdentityAt(version, config, …)`, so constructing a *valid* one requires live inputs. Its zero value (`var f FreshToken`) is still syntactically constructible, so it must **fail closed**: a `FreshToken` carries a validity bit set only by the constructor, and `Reconcile` on an unset one errors (or returns `Stale`), never silently treats an empty digest as a match. It exposes `Reconcile(stored StoredToken) → {Fresh | Stale | RestampTo(v)}`. +- **`FreshToken`** — obtainable *only* from `ComputeIdentityAt(version, config, …)`, so constructing a *valid* one requires live inputs. 
Its zero value (`var f FreshToken`) is still syntactically constructible, so it **fails safe**: a `FreshToken` carries a validity bit set only by the constructor, and `Reconcile` on an unset one **returns `Stale`** (never errors, never `Fresh`). `Stale` is the fail-*safe* answer on a path whose job is "rebuild when in doubt" — a zero token means "no freshness evidence," so it triggers a rebuild (G1 churn at worst) and never blocks a `build`/`render`/`--check-only`, where an `error` would be fail-*stop* and could take the fleet down on an accidental zero-value path. It exposes `Reconcile(stored StoredToken) → {Fresh | Stale | RestampTo(v)}`. (Belt-and-suspenders: a named test `var f FreshToken; assert f.Reconcile(stored) == Stale`, and if feasible a vet/lint check that no site reconciles a statically-zero token — so a *programming* mistake is still caught loudly without coupling runtime behavior to it.) + +A historical site holding two `StoredToken`s can call `SameDigest` but cannot fabricate a `FreshToken`, so it cannot accidentally pose as a current-tree freshness check; a current-tree site must obtain a `FreshToken` to reconcile, which forces it through live inputs. The *assignment* documents the class, and the mis-classification path is unconstructible rather than merely discouraged. Both types are **non-comparable** (an unexported `_ [0]func()` field), so a raw `==` on a token outside the `fingerprint` package fails to compile. (Unexported fields alone would *not* do this: a struct of comparable unexported fields is still `==`-comparable from any package; the non-comparable sentinel is what blocks it.) -A historical site holding two `StoredToken`s can call `SameDigest` but cannot fabricate a `FreshToken`, so it cannot accidentally pose as a current-tree freshness check; a current-tree site must obtain a `FreshToken` to reconcile, which forces it through live inputs. 
The *assignment* documents the class, and the mis-classification path is unconstructible rather than merely discouraged. Both types are **non-comparable** (an unexported `_ [0]func()` field), so a raw `==` on a token outside the `fingerprint` package fails to compile — closing the copy-the-`==` path too. (Unexported fields alone would *not* do this: a struct of comparable unexported fields is still `==`-comparable from any package; the non-comparable sentinel is what blocks it.) This lands in PR C, which already edits every one of those sites; it has no on-disk-format dependency, so there is no reason to touch them twice and carry the mis-classification window in between. +For the choke-point to be *structural* and not merely conventional, the **lock fields must be token-typed, not raw `string`**: as long as `ComponentLock.InputFingerprint`/`ResolutionInputHash` stay exported strings, `lock.InputFingerprint == other.InputFingerprint` still compiles and the raw-compare pattern stays copyable. So PR C changes those fields to `StoredToken` (TOML marshal/unmarshal routing through `ParseToken`, so every read crosses the strict parser), or hides the raw string behind an accessor that returns a `StoredToken`. Only then does "enforced by types, not prose" hold end-to-end. This lands in PR C, which already edits every comparison site; it has no on-disk-format dependency (the *bytes* are unchanged — only the Go field type), so there is no reason to touch them twice and carry the mis-classification window in between. ### The synthetic changelog/release path is the real hazard [`synthistory.go`](../../../internal/app/azldev/core/sources/synthistory.go) turns fingerprint movement into **user-visible, shipped** package state — `%autochangelog` entries and `%autorelease` increments. There are two distinct comparators, and the design resolves them asymmetrically.
-- **`FindFingerprintChanges` (historical walker)** compares `InputFingerprint` across the lock's git history and emits a synthetic changelog/release entry on every change. It compares the **digest** (stripping the `v:` version prefix), not the full token — a one-line string operation, not the infeasible version-aware replay (it has only committed *strings*, no inputs). So a version-only re-stamp (a lazy v1→v2 with an unchanged digest, or a resolution-only bump that advances the shared prefix) is **invisible** to it; only a moved digest — a genuine input change — fires, and the migration folds into the real change's entry that carries it. The v1→v2 conversion is thus an *accepted, per-component, notable* changelog event that piggybacks a real change, guaranteed by digest-comparison rather than by lazy-discipline. - - **`component migrate` is release-grade *when it moves digests*.** A migrate that retires a *fingerprint* algorithm re-stamps every unchanged lock from `computeFP1`'s digest to `computeFP2`'s — the digests move, the walker fires, and the fleet-wide release is the deliberate cost ([registry floor](#registry-floor-and-forced-migration)). A migrate that retires only a *resolution* algorithm moves the shared prefix but not the `InputFingerprint` digest, so it is correctly release-silent. Either way the firing tracks a real digest move, never a bare prefix change. +- **`FindFingerprintChanges` (historical walker)** compares `InputFingerprint` across the lock's git history and emits a synthetic changelog/release entry on every change. It compares the **digest** (stripping the `v:` version prefix), not the full token — a one-line string operation, not the infeasible version-aware replay (it has only committed *strings*, no inputs). So a version-only re-stamp (a lazy v1→v2 migration with an unchanged digest) is **invisible** to it; only a moved digest — a genuine input change — fires, and the migration folds into the real change's entry that carries it. 
The v1→v2 conversion is thus an *accepted, per-component, notable* changelog event that piggybacks a real change, guaranteed by digest-comparison rather than by lazy-discipline. + - **`component migrate` is release-grade *when it moves digests*.** A migrate that retires a *fingerprint* algorithm re-stamps every unchanged lock from `computeFP1`'s digest to `computeFP2`'s — the digests move, the walker fires, and the fleet-wide release is the deliberate cost ([registry floor](#registry-floor-and-forced-migration)). A migrate that retires only a *resolution* algorithm rewrites only the bare `ResolutionInputHash` (which `synthistory` never reads), so it is correctly release-silent. Either way the firing tracks a real `InputFingerprint` digest move. - **`BuildDirtyChange` (live dirty check)** compares a *recomputed* current-version (v2) hash against the *stored* (possibly v1) `headLock.InputFingerprint` and declares dirty on inequality. "Accept as notable" does **not** save this path: post-switchover an *unchanged* component would read **dirty on every `render`/`build`** until re-stamped — a persistent, recurring spurious signal, worse than a one-time entry. The fix is **free**: it is the *same replay Part 2 already owes the freshness check* — replay at `headLock`'s recorded version before declaring dirty. One additional call site for logic already being written, no new mechanism. **Net:** the changelog-walker concern is not "make the walker version-aware" (hard, maybe infeasible). It is two cheap things — (1) the historical comparators (`FindFingerprintChanges`, `changed.go`) compare the **digest**, so a version-only delta never fires; and (2) extend the *current-tree* replay to `BuildDirtyChange` (which *does* hold live inputs), one call site for logic already being written. The reset commit is the single deliberate exception: it *is* a fleet-wide notable event, the coordinated cutover, intentionally visible. 
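
The digest comparison that the historical comparators rely on can be sketched as below. This is an illustration under stated assumptions: `digestOf`, `sameDigest`, and `countReleaseEvents` are hypothetical helpers (not the actual `synthistory` or `changed.go` code), tokens are assumed to look like `sha256:X` or `v1:sha256:X` as in the examples above, and the digest payload is deliberately left unvalidated.

```go
package main

import (
	"fmt"
	"strings"
)

// digestOf returns the "sha256:..." part of a stored token, dropping an
// optional "v" + digits + ":" version prefix. ok is false for any other shape.
func digestOf(token string) (digest string, ok bool) {
	rest := token
	if strings.HasPrefix(token, "v") {
		prefix, after, found := strings.Cut(token, ":")
		if !found || len(prefix) < 2 {
			return "", false
		}
		for _, r := range prefix[1:] {
			if r < '0' || r > '9' {
				return "", false
			}
		}
		rest = after
	}
	if !strings.HasPrefix(rest, "sha256:") || len(rest) == len("sha256:") {
		return "", false
	}
	return rest, true
}

// sameDigest compares two stored tokens by digest only. A malformed token
// on either side reads as "changed", never as a silent match.
func sameDigest(a, b string) bool {
	da, okA := digestOf(a)
	db, okB := digestOf(b)
	return okA && okB && da == db
}

// countReleaseEvents walks stored tokens oldest-first and counts the
// adjacent pairs whose digest moved. No inputs are recomputed.
func countReleaseEvents(history []string) int {
	events := 0
	for i := 1; i < len(history); i++ {
		if !sameDigest(history[i-1], history[i]) {
			events++
		}
	}
	return events
}

func main() {
	history := []string{"sha256:X", "v1:sha256:Y", "v2:sha256:Y", "v2:sha256:Z"}
	fmt.Println("version-only re-stamp is same digest:", sameDigest("v1:sha256:Y", "v2:sha256:Y"))
	fmt.Println("release events in history:", countReleaseEvents(history))
}
```

In this sketch the reset boundary (`sha256:X` to `v1:sha256:Y`) and the genuine input change (`Y` to `Z`) each count once, while the `v1` to `v2` re-stamp with an unchanged digest does not.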
-### `ResolutionInputHash` — shares the version, replay deferred +### `ResolutionInputHash` — bare digest, replay deferred -`ComponentLock` carries a *second* persisted content hash, `ResolutionInputHash`, with its own staleness logic and its own silent-write path (it writes when only `resHashChanged`, never flipping `Changed`). It has the **identical** evolution problem as `InputFingerprint`, and the single shared content version covers it (see [Both hashes share one version](#both-hashes-share-one-version)). Two things make its replay safe to defer: +`ComponentLock` carries a *second* persisted content hash, `ResolutionInputHash`, with its own staleness logic and its own silent-write path (it writes when only `resHashChanged`, never flipping `Changed`). It has the **identical** evolution problem as `InputFingerprint`, but two properties make its replay safe to defer: -- **Smaller blast radius.** `ResolutionInputHash` does **not** feed `synthistory`, so an algorithm change can never mint a phantom changelog/release (that hazard is fingerprint-only). Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption. (One honest caveat: a resolution-only bump moves the `input-fingerprint` *line bytes* too — the `v:` prefix advances — on a component whose render inputs did not change. Digest-comparison immunizes consumers from a phantom release, but it slightly erodes the raw "this `.lock` line moved ⇒ inputs changed" reviewer signal; the *digest*, not the line, is the trustworthy signal post-reset.) +- **Smaller blast radius.** `ResolutionInputHash` does **not** feed `synthistory`, so an algorithm change can never mint a phantom changelog/release (that hazard is fingerprint-only). Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption. 
- **No pending change.** It is a flat seven-field SHA256, not a struct walk, so the projection substrate leaves it untouched. Its registry slot stays `computeRes1` until its inputs genuinely change. -**Decision (KISS/YAGNI):** wire fingerprint replay in Part 2's first PR; reserve resolution replay (slot present, prior fn reused) and wire it the day `ComputeResolutionHash` first changes — add `computeRes2`, bump the shared version, re-stamp both fields together. No separate resolution prefix is needed: digest-comparison keeps the shared version correct for both ([above](#both-hashes-share-one-version)). - -**The one genuinely-deferred seam — restamp-on-write.** The churn fix restamps the `InputFingerprint` token only `if result.Changed`, but a resolution-only write is `resHashChanged && !Changed` (it overwrites `ResolutionInputHash` without flipping `Changed`). While both hashes share v1 this is inert; but once resolution replay is wired, a resolution-only write could advance the shared version on one field and leave the `InputFingerprint` prefix lagging — two prefixes in one lock, which `parseTokenVersion` would then read inconsistently. The pin, applied when resolution replay lands: **restamp the shared prefix on `Changed || resHashChanged`** (or make `InputFingerprint` the sole prefix authority, advanced on any write). This cannot manifest before the first resolution-algorithm bump, so it is explicitly deferred — named here so the future author does not rediscover it the hard way. +**Decision (KISS/YAGNI):** wire fingerprint replay in Part 2's first PR. `ResolutionInputHash` stays a **bare `sha256:` digest with no `v:` prefix** (the prefix lives only in `InputFingerprint` — see [Both hashes share one version](#both-hashes-share-one-version)), so the resolver compares it directly and a fingerprint-only bump never touches it. 
The day `ComputeResolutionHash` first changes, add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`); decide *then* whether resolution needs its own prefix or reads the shared one. Because resolution carries no prefix today, the desync hazard of a shared prefix on two fields **does not exist** — there is no restamp-on-write seam to defer. ## Design decisions @@ -650,15 +656,15 @@ The reset (Part 1) must land as one coherent change at the dev→prod cutover; i 1. **PR A (substrate)**: the **projection generator** (`go generate`) — reads the version-set tags and emits the per-version `projectVN(cfg) []byte` functions (literal emits, sorted keys) plus golden-vector and coverage scaffolding — the canonical encoder (`canonicalBuf`, `emit`/`emitAlways`), the version-set tag parser, the frozen **TOML-key** emit rule, the `reflect.Value.IsZero()` omit-predicate, the `sha256` combiner, and the golden vectors. Generate-time guards: a fingerprinted field with **no tag** fails generation; the slimmed **exclusion ledger** and **dropped-fields ledger** replace the retired `TestAllFingerprintedFieldsHaveDecision` audit; **regeneration-idempotence** (CI `go generate` + `git diff --exit-code`) pins shipped versions. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. 
Tests: a field tagged `v2..*` is absent from generated `projectV1`; a `!` range emits at zero; a field with **no** `fingerprint` tag fails generation; a **nested** fingerprinted struct with a tagless field fails generation; deleting a field a retained `projectVN` names **fails to compile**; a **Go-field rename keeping the TOML key** yields a byte-identical digest; two fields colliding on one emit-key fail generation; the coverage oracle (by struct-reflection, not the tag) fails when a build-effective field is tagged too narrowly (`v1..v1` at current `v2`) and is not in the dropped-fields ledger; golden vectors pin v1; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser. 2. **PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. Lock format `Version` stays `1`, asserted by a named-constant test (`currentVersion == 1`) with a comment that the *content* version lives in the token prefix, not here — so a future format bump cannot silently break every historical read through `lockfile.Parse`. Ships at the cutover; absorbed by the scheduled rebuild. Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. -3. **PR C (Part 2 machinery)**: the **two-type token split** — `StoredToken` (parsed by the sole strict `ParseToken`: accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match; exposes `SameDigest` only) and `FreshToken` (from `ComputeIdentityAt`, exposes `Reconcile(stored) → {Fresh | Stale | RestampTo(v)}`, fails closed on its zero value), both **non-comparable** (`_ [0]func()`); the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **every** comparison and compute site through these types. 
The **current-tree** sites (via `FreshToken.Reconcile`): replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, `BuildDirtyChange`, and the second `ComputeIdentity` caller `bumpComponents` (`update.go`); plus the `computeCurrentFingerprint` (`sourceprep.go`) return-type cascade `string → FreshToken`. The **historical** sites (via `StoredToken.SameDigest`): `FindFingerprintChanges`, `changed.go`'s `classifyComponent`, **and `haveMatchingFingerprints`**. ⚠️ **`haveMatchingFingerprints` is security-load-bearing, not a mechanical changelog sibling:** it gates the cache-poisoning integrity check (`if result.SourcesChange && haveMatchingFingerprints(...)` in `changed.go`). If only `classifyComponent` is converted and this site is missed, the first legitimate `v2` bump makes a version-only re-stamp compare unequal → the integrity violation is **never recorded → tamper evidence silently swallowed**. It must convert to digest-compare in the same PR. Resolution replay reserved (slot reuses `computeRes1`). **Ordering gate (CI-enforced):** `currentLockContentVersion > 1` is forbidden unless `BuildDirtyChange` already routes through `Reconcile` — otherwise registering `v2` makes every component read persistently dirty on every `render`/`build`. **Not fully inert:** this PR switches the live compares from raw-string to token-routed *on merge* — only the *registry dispatch* is dormant while just `v1` exists. 
Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event and does **not** suppress `haveMatchingFingerprints`; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a token outside the `fingerprint` package fails to compile; a zero-value `FreshToken`/`StoredToken` fails closed; a historical site cannot construct a `FreshToken`; the registry `init()` panics on a `[minSupported,current]` gap. +3. **PR C (Part 2 machinery)**: the **two-type token split** — `StoredToken` (parsed by the sole strict `ParseToken`: accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match; exposes `SameDigest` only) and `FreshToken` (from `ComputeIdentityAt`, exposes `Reconcile(stored) → {Fresh | Stale | RestampTo(v)}`, fails closed on its zero value), both **non-comparable** (`_ [0]func()`); the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **every** comparison and compute site through these types. The **current-tree** sites (via `FreshToken.Reconcile`): replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, `BuildDirtyChange`, and the second `ComputeIdentity` caller `bumpComponents` (`update.go`); plus the `computeCurrentFingerprint` (`sourceprep.go`) return-type cascade `string → FreshToken`. The **historical** sites (via `StoredToken.SameDigest`): `FindFingerprintChanges`, `changed.go`'s `classifyComponent`, **and `haveMatchingFingerprints`**. ⚠️ **`haveMatchingFingerprints` is security-load-bearing, not a mechanical changelog sibling:** it gates the cache-poisoning integrity check (`if result.SourcesChange && haveMatchingFingerprints(...)` in `changed.go`). 
If only `classifyComponent` is converted and this site is missed, the first legitimate `v2` bump makes a version-only re-stamp compare unequal → the integrity violation is **never recorded → tamper evidence silently swallowed**. It must convert to digest-compare in the same PR. Resolution replay reserved (slot reuses `computeRes1`). **Ordering gate (CI-enforced):** `currentLockContentVersion > 1` is forbidden unless `BuildDirtyChange` already routes through `Reconcile` — otherwise registering `v2` makes every component read persistently dirty on every `render`/`build`. The gate is necessary but not sufficient (it does not prove `haveMatchingFingerprints` converted), so it is paired with a **named acceptance test**: `from="v1:sha256:X"`, `to="v2:sha256:X"` ⇒ `haveMatchingFingerprints` returns **true** — a missed conversion fails CI rather than silently disabling the integrity check. **Not fully inert:** this PR switches the live compares from raw-string to token-routed *on merge* — only the *registry dispatch* is dormant while just `v1` exists. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event and does **not** suppress `haveMatchingFingerprints`; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a token outside the `fingerprint` package fails to compile; a zero-value `FreshToken`/`StoredToken` fails closed; a historical site cannot construct a `FreshToken`; the registry `init()` panics on a `[minSupported,current]` gap. 4. **PR D (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) — add a field absent from `projectV1` and set it on one component; assert only that lock drifts and every other lock is byte-identical. 5. 
**PR E (config schema axis, later)**: `schema-version` field + load-time canonical migration + the `config migrate` command. Gated on the first post-reset non-additive TOML change not already absorbed by the reset's normalization pass. -6. **PR F (forced lock migration, gated on the first `v2`)**: the `component migrate` command (the only sanctioned floor-raise; the prescribed fix for a build-critical newly-measured input) and the CI spread-ceiling on `currentLockContentVersion − minSupportedLockContentVersion`. The floor machinery (`minSupportedLockContentVersion`, the `init()` gap-panic) goes live the moment `v2` registers but is unusable without this, so PR F must land with or before the first `v2` — the same gating shape as PR E. +6. **PR F (forced lock migration, gated on the first floor raise)**: the `component migrate` command (the only sanctioned floor-raise; the prescribed fix for a build-critical newly-measured input) and the CI spread-ceiling on `currentLockContentVersion − minSupportedLockContentVersion`. **Gating:** a `v2` bump *without* PR F is safe — v1 stays in the registry and the floor stays at 1, so unmigrated locks still replay. PR F is required only before **raising `minSupportedLockContentVersion` above 1** (retiring v1), since that is what makes un-migrated locks unreplayable. A CI gate forbids raising the floor unless `component migrate` exists. So PR F is decoupled from the first `v2` and gated on the first floor raise. -Each PR is independently revertible up to the cutover. PRs A–B land together at the dev→prod cutover (they move every hash and are absorbed by the scheduled rebuild); PR C is inert until the first post-reset algorithm change; PR D follows; PRs E–F are gated on the first post-reset schema/algorithm change respectively. +Each PR is independently revertible up to the cutover. 
PRs A–B land together at the dev→prod cutover (they move every hash and are absorbed by the scheduled rebuild); PR C is inert until the first post-reset algorithm change; PR D follows; PR E is gated on the first post-reset schema change, PR F on the first floor raise. ## Open questions 1. For the config schema axis, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration. -*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; the **canonical byte encoding is the existing length-prefixed `:=:` form** used by `combineInputs`, committed and pinned by golden vectors at the reset (former Open Q#4 — a precondition for PR A, not an open question, because the reset makes it irreversible); the **version write-guard is a requirement, not an option** (former Open Q#5): a binary refuses to write a token whose version exceeds its own `currentLockContentVersion`, and the CI version-pin prevents *old* binaries from committing downgrades; **field membership is declared in mandatory per-field version-set tags** (`fingerprint:"v1..*"`; absent → generation fails, `!`-prefix for always-emit) from which a **`go generate` step emits the per-version `projectVN` functions** — restoring "forgotten field → loud failure" natively and making field removal a *compile* error; the **emit-key is the frozen TOML key** (never the Go field name, so a field rename is byte-neutral; `key=` overrides for keyless fields; duplicate emit-keys fail generation), the **omit-predicate is `reflect.Value.IsZero()`** (former Open Q#3), and the tag DSL is **frozen at three range-operators** (`..`, `!`, `*`) plus the orthogonal `key=`; frozen-ness rests on the **compiler + generator + regeneration-idempotence**, with the 
**golden-vector coverage invariant** (every field measured at a retained version exercised non-zero in ≥1 vector, with a tag-independent discrimination oracle) as the semantic backstop whose oracle is a **tag-independent dropped-fields ledger** (a field absent from it must discriminate at the *current* version), plus a **kept exclusion ledger** for `-` fields (the inclusion default is native to the tag; the *exclusion* default stays ledgered because it is the G5-dangerous direction); the stored hash is read only through a **two-type token split** — `StoredToken` (digest-compare only) and `FreshToken` (reconcile, requires live inputs), both non-comparable — adopted in PR C, closing both the raw-`==` and comparator mis-classification hazards at compile time; baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling) and is itself a deliberate release-grade event; one shared content version covers both stored hashes **permanently** (no split) — the historical changelog/classifier comparators compare the **digest** (stripping the `v:` prefix), so advancing the shared prefix for a resolution-only algorithm change moves no digest and mints no release; resolution replay stays reserved (slot present, `computeRes1` reused) until `ComputeResolutionHash` first changes. 
+*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; the **canonical byte encoding is the existing length-prefixed `:=:` form** used by `combineInputs`, committed and pinned by golden vectors at the reset (former Open Q#4 — a precondition for PR A, not an open question, because the reset makes it irreversible); the **version write-guard is a requirement, not an option** (former Open Q#5): a binary refuses to write a token whose version exceeds its own `currentLockContentVersion`, and the CI version-pin prevents *old* binaries from committing downgrades; **field membership is declared in mandatory per-field version-set tags** (`fingerprint:"v1..*"`; absent → generation fails, `!`-prefix for always-emit) from which a **`go generate` step emits the per-version `projectVN` functions** — restoring "forgotten field → loud failure" natively and making field removal a *compile* error; the **emit-key is the frozen TOML key** (never the Go field name, so a field rename is byte-neutral; `key=` overrides for keyless fields; duplicate emit-keys fail generation), the **omit-predicate is `reflect.Value.IsZero()`** (former Open Q#3), and the tag DSL is **frozen at three range-operators** (`..`, `!`, `*`) plus the orthogonal `key=`; frozen-ness rests on the **compiler + generator + regeneration-idempotence**, with the **golden-vector coverage invariant** (every field measured at a retained version exercised non-zero in ≥1 vector, with a tag-independent discrimination oracle) as the semantic backstop whose oracle is a **tag-independent dropped-fields ledger** (a field absent from it must discriminate at the *current* version), plus a **kept exclusion ledger** for `-` fields (the inclusion default is native to the tag; the *exclusion* default stays ledgered because it is the 
G5-dangerous direction); the stored hash is read only through a **two-type token split** — `StoredToken` (digest-compare only) and `FreshToken` (reconcile, requires live inputs), both non-comparable — adopted in PR C, closing both the raw-`==` and comparator mis-classification hazards at compile time; baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling) and is itself a deliberate release-grade event; one shared content version governs both stored hashes, stored **only** in `InputFingerprint`'s `v:` prefix — `ResolutionInputHash` stays a **bare** `sha256:` digest, so a fingerprint-only bump never touches it and the resolver sees no spurious move; the historical changelog/classifier comparators compare the **digest** (stripping the `v:` prefix), so a version-only re-stamp moves no digest and mints no release; resolution replay stays reserved (slot present, `computeRes1` reused) until `ComputeResolutionHash` first changes. 
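
The digest-compare rule for the historical comparators can be sketched in Go as follows. This is a minimal illustration, not the RFC's implementation: `storedToken`, `parseStored`, and `sameDigest` are hypothetical names, the token shapes are assumed to be a bare `sha256:` digest or the same digest behind a `v` plus version number prefix (as in the `v1:sha256:X` examples), and the digest is checked only for non-emptiness rather than for hex length.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// storedToken is a hypothetical parsed form of a stored fingerprint string.
// version 0 stands for a legacy token with no version prefix.
type storedToken struct {
	version uint64
	digest  string
	ok      bool // false for any malformed input
}

// parseStored accepts only "sha256:<digest>" and "v<N>:sha256:<digest>".
// Anything else yields ok == false, so the caller can fail closed.
func parseStored(s string) storedToken {
	rest := s
	var version uint64
	if strings.HasPrefix(s, "v") {
		head, tail, found := strings.Cut(s[1:], ":")
		if !found {
			return storedToken{}
		}
		// ParseUint rejects signs and empty strings, keeping the parse strict.
		n, err := strconv.ParseUint(head, 10, 31)
		if err != nil || n < 1 {
			return storedToken{}
		}
		version, rest = n, tail
	}
	if !strings.HasPrefix(rest, "sha256:") {
		return storedToken{}
	}
	digest := strings.TrimPrefix(rest, "sha256:")
	if digest == "" {
		return storedToken{} // never allow an empty-digest match
	}
	return storedToken{version: version, digest: digest, ok: true}
}

// sameDigest compares two stored strings by digest only, ignoring the
// version prefix. A malformed token on either side reads as "changed".
func sameDigest(a, b string) bool {
	ta, tb := parseStored(a), parseStored(b)
	if !ta.ok || !tb.ok {
		return false
	}
	return ta.digest == tb.digest
}

func main() {
	fmt.Println(sameDigest("v1:sha256:X", "v2:sha256:X")) // version-only re-stamp
	fmt.Println(sameDigest("sha256:X", "v1:sha256:Y"))    // digest moved
	fmt.Println(sameDigest("v2:sha256:", "v2:sha256:"))   // malformed on both sides
}
```

Under these assumptions a version-only re-stamp compares as unchanged, a moved digest compares as changed, and two malformed tokens never compare equal, which is the fail-closed behaviour the comparators need.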
From b33568b7fbda4b28088b5fc2e29b469f3b631106 Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Tue, 9 Jun 2026 15:14:04 -0700 Subject: [PATCH 11/15] cleanup 2 --- docs/developer/rfc/lazy-schema-migration.md | 41 +++++++++++++++------ 1 file changed, 30 insertions(+), 11 deletions(-) diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md index 5a08a69a..93432f97 100644 --- a/docs/developer/rfc/lazy-schema-migration.md +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -73,7 +73,7 @@ Replay only works if an old algorithm function can faithfully reproduce the hash - Its body is `hashstructure.Hash(component, …)`, which **reflects over the live Go struct**. Add a field later and the old function now sees that field (at zero value, included) → its output moves → it can no longer reproduce the historical hash. So *adding* a field breaks *replay of older versions*, which is exactly the additive case we are trying to make free. - It also resolves the live **method set**: once `ComponentConfig` implements `Includable`, the same `hashstructure.Hash` call silently switches inclusion behavior, with no per-call opt-out (the interface is resolved automatically). -The consequence is sharp: an incremental "flip the default to omitempty, lazily migrate" plan **cannot keep its central promise.** "Additive fields are drift-neutral by construction" holds only for locks already at the new version; for the older locks that lazy migration deliberately leaves alone, the next field addition forces a hash change anyway. You do not avoid the mass rebuild — you defer it to the first field addition, and you build the whole replay apparatus on a substrate that makes replay unsound. 
+An incremental "flip the default to omitempty, lazily migrate" plan therefore **cannot keep its central promise.** "Additive fields are drift-neutral by construction" holds only for locks already at the new version; for the older locks that lazy migration deliberately leaves alone, the next field addition forces a hash change anyway. You do not avoid the mass rebuild — you defer it to the first field addition, and you build the whole replay apparatus on a substrate that makes replay unsound. ### The opportunity: a coordinated cutover is already scheduled @@ -151,7 +151,7 @@ The recurring requirement across the "No" rows is the same: **distinguish a chan **Adding a field as `!` (always-emit) to a *live* version is a version-bump event, not a free additive.** A zero-valued `!` field emits bytes for *every* component, including those that never set it, so it moves every lock the instant it lands on the current version — the opposite of "leave old locks alone." Build-meaningful-zero fields must therefore be introduced at a *new* version (`!v(N+1)..*`) and absorbed by replay, exactly like any other non-additive change. Only omit-if-zero additions (`vN..*`) are free on the live version. -> **`projectVN`** is shorthand used throughout this RFC for the canonical *projection at content-version N* introduced by this design (defined in [Substrate options](#substrate-options) and [The projection substrate](#the-projection-substrate)). It is a per-version function `projectVN(cfg) []byte` — **generated** from declarative version-set tags on the struct fields (see [Version-tagged field selection](#version-tagged-field-selection)), not hand-written. `projectV1` measures the fields whose tag set includes v1; `projectV2` the next version, and so on. Each generated `projectVN` freezes once *superseded* (the next version is registered): its source tags no longer move, its generated code is checked in, and golden vectors backstop it. 
The live version stays editable for output-preserving additions — that is the whole point. +> **`projectVN`** is shorthand used throughout this RFC for the canonical *projection at content-version N* introduced by this design (defined in [Substrate options](#substrate-options) and [The projection substrate](#the-projection-substrate)). It is a per-version function `projectVN(cfg) []byte` — **generated** from declarative version-set tags on the struct fields (see [Version-tagged field selection](#version-tagged-field-selection)), not hand-written. `projectV1` measures the fields whose tag set includes v1; `projectV2` the next version, and so on. Each generated `projectVN` freezes once *superseded* (the next version is registered): its source tags no longer move, its generated code is checked in, and golden vectors backstop it. The live version stays editable for output-preserving additions. ## Research @@ -193,7 +193,7 @@ The design has **two parts** with very different cost profiles: 1. **Part 1 — the reset (one coordinated cutover).** At the dev→prod cutover, swap the hashing substrate to canonical projection, declare the post-cutover projection as content-version **v1**, and spend the already-scheduled rebuild on every change that is *cheap now and a one-way door later* (the irreversible changes). Pre-reset locks already committed to **git history** stay readable and are never recomputed (the back-compat invariant below); a pre-reset lock in the **working tree** is force-rehashed to the `v1:` token on its first post-reset `update`. 2. **Part 2 — post-reset lazy migration (below).** A versioned registry + replay, now riding the *frozen* projection functions, absorbs the rare genuine algorithm change after the cutover, lazily and per-component, with no second coordinated cutover. 
-The original "lazy" instinct was right for Part 2 and wrong for Part 1: there is no way to make a substrate swap or a batch of one-way-door normalizations free, so they must ride the one rebuild we are already paying for. Everything that *can* be lazy (additive fields) is pushed into Part 2 and costs nothing. +Part 1 cannot be made lazy: there is no way to make a substrate swap or a batch of one-way-door normalizations free, so they ride the one rebuild we are already paying for. Everything that *can* be lazy (additive fields) is pushed into Part 2 and costs nothing. ## Part 1 — The reset @@ -256,7 +256,7 @@ range = version, [ "..", ( version | "*" ) ] ; version = "v", digit, { digit } ; ``` -It deliberately re-invents protobuf's `reserved` field-range discipline, and protobuf survived because `reserved` never grew. Adding a fourth *range*-operator is an RFC-grade change, not a tag edit — cheap insurance against a bespoke mini-language accreting edge cases. +This mirrors protobuf's `reserved` field-range discipline. Adding a fourth *range*-operator is an RFC-grade change, not a tag edit — cheap insurance against a bespoke mini-language accreting edge cases. **Recovery is the property that justifies the range syntax.** The hard requirement: if we drop a field, then versions later realize we need it again, we must be able to bring it back *without* disturbing any frozen historical hash. The rule that guarantees it: **you only ever *add* a range for the *new* version; you never edit a shipped version's membership.** Walk it: @@ -493,7 +493,7 @@ This is correct *by contract* (a v1 lock promises freshness under the v1 input s Lazy migration means an untouched lock can sit at an old version **indefinitely** (G3 by design). 
That makes "keep the last *N* versions" a **correctness cliff, not a tuning knob**: if pruning drops the compute function a lock still depends on, replay becomes impossible → forced `FreshnessStale` → the mass rebuild/rewrite (and, via the downstream-consumer analysis below, mass changelog churn) the whole design exists to avoid. So the floor must be explicit and paired with an escape hatch, decided now: - **`minSupportedLockContentVersion`** is a hard floor. A lock below it cannot be replayed and is treated as `Stale`. Dropping a registry entry is therefore a deliberate, breaking, announced act — never incidental cleanup. -- **`component migrate`** (Open Q#5, promoted to a requirement) force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. **Contract:** it is *offline* — it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). It *does*, however, move every *fingerprint* digest when it retires a fingerprint algorithm — advancing that algorithm is the whole point — so a fleet-wide migrate of that kind **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved digest as notable, exactly as [the synthetic-history trap](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. (A migrate that retires only a *resolution* algorithm rewrites only the bare, prefix-free `ResolutionInputHash` — which `synthistory` never reads — so it is correctly release-silent.) 
That is *why* migrate is reset-grade and rare, not a free background sweep — the release churn is the deliberate cost of retiring a version. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal — each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). +- **`component migrate`** (Open Q#5, promoted to a requirement) force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. **Contract:** it is *offline* — it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). It *does*, however, move every *fingerprint* digest when it retires a fingerprint algorithm, so a fleet-wide migrate of that kind **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved digest as notable, exactly as [the synthetic-history trap](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. (A migrate that retires only a *resolution* algorithm rewrites only the bare, prefix-free `ResolutionInputHash` — which `synthistory` never reads — so it is correctly release-silent.) Migrate is therefore rare: the release churn is the deliberate cost of retiring a version. 
The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal — each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). - **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine — left alone, the registry, golden vectors, and deprecated tombstone fields grow **append-only** (a real cost the opaque-token model accepts; see the manifest alternative). Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion − minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. The spread, not the absolute version number, is the quantity kept small. **Early-warning ramp:** the ceiling is a *warning at ceiling−1*, a hard failure only at the ceiling — so an approaching floor-raise surfaces as a heads-up on the PR *before* the one that registers `v(N+1)`, converting the forced migrate from a surprise blocking failure into a planned event (the design's goal that nothing *unplanned* ever forces a migrate). **Residual:** if genuine algorithm changes arrive *faster* than planned rebuilds, the ceiling still ultimately *forces* an unplanned, release-grade `component migrate`. The ceiling does not eliminate the expensive event; it bounds the backlog by *converting* an unbounded version spread into an occasional forced migrate, with one version of advance notice. This is the accepted cost of lazy-forever coexistence. **Mixed-toolchain hazard — bounded by the version-pin, not auto-repair.** The classic trap is an older binary regressing a newer lock. Because the lock *format* never bumps, an old binary *can* write a reset lock, stamping a legacy (prefix-less) or lower-`v` hash. 
In the **working tree** this is self-correcting: the next new-binary run detects the sub-floor token and force-rehashes it to the current version. But "self-correcting" stops at the working tree — if a downgraded lock is **committed**, `FindFingerprintChanges` reads `v1 → legacy → v1` as two real release events, and a published `%autorelease` increment cannot be withdrawn. So the load-bearing guard against *committed* phantom releases is the **CI version-pin**: post-cutover, no old binary may run the `update`-and-commit step. Concretely, that means the lock-writing CI job runs from a **pinned build image (by digest, not a floating tag)** rebuilt from the cutover commit or later, and **no other path reaches the `update`-and-commit step** — local developer binaries do not commit locks; only the pinned job does. (The force-rehash only cleans the working tree; it does not undo history.) The *symmetric* residual — a binary that predates content-version `v2` meeting a `v2` token it cannot replay — is closed by a **required** write-time guard (Open Q#5, now a requirement): refuse to write a token whose version exceeds the binary's `currentLockContentVersion`, erroring rather than silently restamping at `v1`. Note this guard lives in the binary doing the write, so it constrains *newer-but-not-newest* binaries; it does **not** retroactively constrain a genuinely *old* binary — that direction is the version-pin's job. 
@@ -561,7 +561,7 @@ The versioned-replay story in Part 2 must hold for **every** reader of `InputFin | `component update` `Changed` decision | recomputed identity | vs stored token | **Replay before `Changed`** (see churn policy seam) | | `bumpComponents` (`update.go`) | recomputed identity | vs stored token | Current-tree replay (second `ComputeIdentity` caller) | | `changed.go` `classifyComponent` (CI classifier) | stored token strings (two historical git refs) | **digest compare** (strip `v:` prefix) | **String-only — must NOT replay** (no inputs available; replaying historical configs would violate the no-recompute invariant) | -| `changed.go` `haveMatchingFingerprints` (⚠️ cache-poisoning integrity gate) | stored token strings | **digest compare** (strip `v:` prefix) | **String-only; security-load-bearing** — a version-only delta must read as "same" or the integrity check is silently skipped | +| `changed.go` `haveMatchingFingerprints` (cache-poisoning integrity gate) | stored token strings | **digest compare** (strip `v:` prefix) | **String-only; security-load-bearing** — a version-only delta must read as "same" or the integrity check is silently skipped | | `synthistory.FindFingerprintChanges` | stored token strings across git history | **digest of adjacent commits** (strip `v:` prefix) | **String-only; digest-compare** so a version-only re-stamp never fires a release | | `synthistory.BuildDirtyChange` | recomputed (current ver) | vs stored `headLock` token | **Replay at headLock version** before declaring dirty | | `ResolutionInputHash` staleness/write | recomputed resolution hash | vs stored **bare** digest | **No prefix** (bare `sha256:`); fingerprint-only bumps never touch it; replay reserved | @@ -617,7 +617,7 @@ Both can omit zero values; the decisive difference is **whether an old algorithm | Type-name in hash | No (rename is drift-neutral) | Yes (rename moves every hash) | | Plumbing | Version tags + generator + golden vectors | Value-receiver 
`HashInclude` on every nested struct + `v.(reflect.Value)` assert | -`Includable` keeps today's hashes byte-identical, which mattered for an *incremental* rollout — but that property is worthless once the reset rebuilds everything anyway, and it comes attached to a substrate that makes replay unsound. (Verified against `hashstructure` v2.0.2, the pinned version in `go.mod`: it reflects the live struct and method set at hash time and mixes `reflect.Type.Name()` into the digest — the properties Problem 6 turns on.) Projection trades byte-compatibility (which we are spending on the coordinated cutover regardless) for frozen replay (which we need forever). Adopted at the reset. +`Includable` keeps today's hashes byte-identical, which mattered for an *incremental* rollout — but that no longer matters once the reset rebuilds everything anyway, and it comes attached to a substrate that makes replay unsound. (Verified against `hashstructure` v2.0.2, the pinned version in `go.mod`: it reflects the live struct and method set at hash time and mixes `reflect.Type.Name()` into the digest — the properties Problem 6 turns on.) Projection trades byte-compatibility (which we are spending on the coordinated cutover regardless) for frozen replay (which we need forever). Adopted at the reset. ### D2 — Version-tagged field selection, generated @@ -641,14 +641,14 @@ The lock **format** `Version` stays at `1`. Bumping it to `2` as a poison pill ## Alternatives considered -- **Incremental lazy migration on the `hashstructure` substrate** (the original plan): flip the inclusion default to omitempty via `Includable`, version the lock content, and migrate lazily — *without* a reset. Rejected: Problem 6 makes its central promise unkeepable. A "frozen" replay function built on `hashstructure.Hash` reflects the live struct, so the first field addition after the switchover moves the old algorithm's output and forces a rehash anyway. 
The incremental path therefore does not actually avoid a coordinated cutover — it defers one to the first field addition, on a substrate that makes replay unsound. With a coordinated cutover already scheduled (the dev→prod cutover), spending it once on a clean projection substrate strictly dominates. +- **Incremental lazy migration on the `hashstructure` substrate** (the original plan): flip the inclusion default to omitempty via `Includable`, version the lock content, and migrate lazily — *without* a reset. Rejected: Problem 6 makes its central promise unkeepable. A "frozen" replay function built on `hashstructure.Hash` reflects the live struct, so the first field addition after the switchover moves the old algorithm's output and forces a rehash anyway. The incremental path therefore does not actually avoid a coordinated cutover — it defers one to the first field addition, on a substrate that makes replay unsound. With a coordinated cutover already scheduled (the dev→prod cutover), spending it once on a clean projection substrate is the better trade. - **Global `IgnoreZeroValue`** — a blunt switch that omits *all* zero fields with no escape hatch for build-meaningful zeros, and still on the non-frozen `hashstructure` substrate. Rejected. - **Parallel versioned structs with per-struct `Hash()`** — couples locks to Go type identity and duplicates hashing logic per version. Rejected in favor of Part 2's integer-versioned combiner over frozen projections. - **Bump the lock format `Version` 1→2 as a poison pill** — makes old binaries hard-reject reset locks. Rejected: it also blocks old binaries from reading pins to queue a build, and it is unnecessary, since the content-version registry already force-rehashes any sub-floor or downgraded token (D3). Same-format + force-rehash keeps old binaries useful without risking silent corruption. 
- **Eager fleet-wide migration as the steady-state mechanism** — rewriting every lock on every algorithm change is the mass-churn the design exists to prevent. Rejected for the steady state. The *reset* is a deliberate, one-time, operator-driven eager pass riding an already-scheduled rebuild — the sanctioned exception, not the rule; `component migrate` is its post-reset equivalent for retiring an old version. - **Runtime reflective walker for field selection (instead of generated functions).** One generic `project(cfg, N)` reflects the struct at hash time and emits the fields whose version-set includes N. Least code, and it shares the tag syntax with the chosen approach. Rejected: it reflects the *live* struct at hash time — Problem 6 one layer down — so its frozen-ness rests entirely on golden-vector coverage (test discipline), and field removal degrades from a compile error to a CI failure. Codegen keeps the same tags but moves the reflection to *generate* time and freezes the output as checked-in code, recovering the compile guarantee. - **Hand-written per-version `projectVN` functions (instead of generating them from tags).** Each version gets a bespoke function with one explicit `emit`/`emitAlways` line per measured field. Same compile guarantees as codegen (removal won't compile, literal emit-key), but: membership is smeared across N function bodies; "bring a field back a few versions later" has no first-class expression (you re-add an `emit` line, nothing ties it to the field's earlier life); and the mandatory-decision and coverage properties need separate bookkeeping the tags otherwise carry. Codegen is the same runtime with declarative authoring — strictly preferable given the existing `go generate` infrastructure. -- **Per-field hash manifest in the lock (instead of one opaque token).** Store `{field → hash}` (à la `go.sum`) rather than a single `v:sha256:…` digest. 
*Genuine wins:* dropping a field becomes ignoring its manifest line — no projection kept alive for replay, so the **deprecate-then-delete two-step and the registry-retirement deadlock** (the append-only growth above) both vanish; and the stored-vs-stored historical comparators become structural set-diffs rather than version-blind string compares. *Why the opaque token still wins for azldev:* (1) the projection substrate **already** delivers additive immunity (G4) — the manifest's headline draw — so that advantage is moot, not additive; (2) the manifest does **not** kill the false-fresh hazard — an old lock has *no line* for a newly-measured input, so there is still no baseline to detect a change to it (the blind spot is relocated, not removed); (3) it makes *algorithm evolution* — the entire point of Part 2 — **harder**, needing per-field versioning where the token needs one integer for the whole algorithm; and (4) it bloats every lock to O(fields × components) (the well-known `go.sum` size cost). The manifest is the better tool for a *static* input set that mainly grows and shrinks; the opaque token + single version is the better tool for an *evolving hashing algorithm*, which is azldev's actual problem. Recorded explicitly because the reset bakes the storage model in — token-vs-manifest is irreversible after PR B — and the retirement deadlock the manifest would have dissolved is instead answered by the floor-advance cadence above. +- **Per-field hash manifest in the lock (instead of one opaque token).** Store `{field → hash}` (à la `go.sum`) rather than a single `v:sha256:…` digest. *Genuine wins:* dropping a field becomes ignoring its manifest line — no projection kept alive for replay, so the **deprecate-then-delete two-step and the registry-retirement deadlock** (the append-only growth above) both vanish; and the stored-vs-stored historical comparators become structural set-diffs rather than version-blind string compares. 
*Why the opaque token still wins for azldev:* (1) the projection substrate **already** delivers additive immunity (G4) — the manifest's headline draw — so that advantage is moot, not additive; (2) the manifest does **not** kill the false-fresh hazard — an old lock has *no line* for a newly-measured input, so there is still no baseline to detect a change to it (the blind spot is relocated, not removed); (3) it makes *algorithm evolution* — the entire point of Part 2 — **harder**, needing per-field versioning where the token needs one integer for the whole algorithm; and (4) it bloats every lock to O(fields × components) (the well-known `go.sum` size cost). The manifest is the better tool for a *static* input set that mainly grows and shrinks; the opaque token + single version is the better tool for an *evolving hashing algorithm*, which is azldev's actual problem. The reset bakes the storage model in — token-vs-manifest is irreversible after PR B — and the retirement deadlock the manifest would have dissolved is instead answered by the floor-advance cadence above. ## Incremental delivery @@ -656,7 +656,7 @@ The reset (Part 1) must land as one coherent change at the dev→prod cutover; i 1. **PR A (substrate)**: the **projection generator** (`go generate`) — reads the version-set tags and emits the per-version `projectVN(cfg) []byte` functions (literal emits, sorted keys) plus golden-vector and coverage scaffolding — the canonical encoder (`canonicalBuf`, `emit`/`emitAlways`), the version-set tag parser, the frozen **TOML-key** emit rule, the `reflect.Value.IsZero()` omit-predicate, the `sha256` combiner, and the golden vectors. Generate-time guards: a fingerprinted field with **no tag** fails generation; the slimmed **exclusion ledger** and **dropped-fields ledger** replace the retired `TestAllFingerprintedFieldsHaveDecision` audit; **regeneration-idempotence** (CI `go generate` + `git diff --exit-code`) pins shipped versions. 
Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. Tests: a field tagged `v2..*` is absent from generated `projectV1`; a `!` range emits at zero; a field with **no** `fingerprint` tag fails generation; a **nested** fingerprinted struct with a tagless field fails generation; deleting a field a retained `projectVN` names **fails to compile**; a **Go-field rename keeping the TOML key** yields a byte-identical digest; two fields colliding on one emit-key fail generation; the coverage oracle (by struct-reflection, not the tag) fails when a build-effective field is tagged too narrowly (`v1..v1` at current `v2`) and is not in the dropped-fields ledger; golden vectors pin v1; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser. 2. **PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. Lock format `Version` stays `1`, asserted by a named-constant test (`currentVersion == 1`) with a comment that the *content* version lives in the token prefix, not here — so a future format bump cannot silently break every historical read through `lockfile.Parse`. Ships at the cutover; absorbed by the scheduled rebuild. Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. -3. 
**PR C (Part 2 machinery)**: the **two-type token split** — `StoredToken` (parsed by the sole strict `ParseToken`: accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match; exposes `SameDigest` only) and `FreshToken` (from `ComputeIdentityAt`, exposes `Reconcile(stored) → {Fresh | Stale | RestampTo(v)}`, fails closed on its zero value), both **non-comparable** (`_ [0]func()`); the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **every** comparison and compute site through these types. The **current-tree** sites (via `FreshToken.Reconcile`): replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, `BuildDirtyChange`, and the second `ComputeIdentity` caller `bumpComponents` (`update.go`); plus the `computeCurrentFingerprint` (`sourceprep.go`) return-type cascade `string → FreshToken`. The **historical** sites (via `StoredToken.SameDigest`): `FindFingerprintChanges`, `changed.go`'s `classifyComponent`, **and `haveMatchingFingerprints`**. ⚠️ **`haveMatchingFingerprints` is security-load-bearing, not a mechanical changelog sibling:** it gates the cache-poisoning integrity check (`if result.SourcesChange && haveMatchingFingerprints(...)` in `changed.go`). If only `classifyComponent` is converted and this site is missed, the first legitimate `v2` bump makes a version-only re-stamp compare unequal → the integrity violation is **never recorded → tamper evidence silently swallowed**. It must convert to digest-compare in the same PR. Resolution replay reserved (slot reuses `computeRes1`). **Ordering gate (CI-enforced):** `currentLockContentVersion > 1` is forbidden unless `BuildDirtyChange` already routes through `Reconcile` — otherwise registering `v2` makes every component read persistently dirty on every `render`/`build`. 
The gate is necessary but not sufficient (it does not prove `haveMatchingFingerprints` converted), so it is paired with a **named acceptance test**: `from="v1:sha256:X"`, `to="v2:sha256:X"` ⇒ `haveMatchingFingerprints` returns **true** — a missed conversion fails CI rather than silently disabling the integrity check. **Not fully inert:** this PR switches the live compares from raw-string to token-routed *on merge* — only the *registry dispatch* is dormant while just `v1` exists. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event and does **not** suppress `haveMatchingFingerprints`; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a token outside the `fingerprint` package fails to compile; a zero-value `FreshToken`/`StoredToken` fails closed; a historical site cannot construct a `FreshToken`; the registry `init()` panics on a `[minSupported,current]` gap. +3. **PR C (Part 2 machinery)**: the **two-type token split** — `StoredToken` (parsed by the sole strict `ParseToken`: accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match; exposes `SameDigest` only) and `FreshToken` (from `ComputeIdentityAt`, exposes `Reconcile(stored) → {Fresh | Stale | RestampTo(v)}`, fails closed on its zero value), both **non-comparable** (`_ [0]func()`); the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **every** comparison and compute site through these types. 
The **current-tree** sites (via `FreshToken.Reconcile`): replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, `BuildDirtyChange`, and the second `ComputeIdentity` caller `bumpComponents` (`update.go`); plus the `computeCurrentFingerprint` (`sourceprep.go`) return-type cascade `string → FreshToken`. The **historical** sites (via `StoredToken.SameDigest`): `FindFingerprintChanges`, `changed.go`'s `classifyComponent`, **and `haveMatchingFingerprints`**. **`haveMatchingFingerprints` is security-load-bearing:** it gates the cache-poisoning integrity check (`if result.SourcesChange && haveMatchingFingerprints(...)` in `changed.go`). If only `classifyComponent` is converted and this site is missed, the first legitimate `v2` bump makes a version-only re-stamp compare unequal → the integrity violation is **never recorded → tamper evidence silently swallowed**. It must convert to digest-compare in the same PR. Resolution replay reserved (slot reuses `computeRes1`). **Ordering gate (CI-enforced):** `currentLockContentVersion > 1` is forbidden unless `BuildDirtyChange` already routes through `Reconcile` — otherwise registering `v2` makes every component read persistently dirty on every `render`/`build`. The gate is necessary but not sufficient (it does not prove `haveMatchingFingerprints` converted), so it is paired with a **named acceptance test**: `from="v1:sha256:X"`, `to="v2:sha256:X"` ⇒ `haveMatchingFingerprints` returns **true** — a missed conversion fails CI rather than silently disabling the integrity check. **Not fully inert:** this PR switches the live compares from raw-string to token-routed *on merge* — only the *registry dispatch* is dormant while just `v1` exists. 
Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event and does **not** suppress `haveMatchingFingerprints`; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a token outside the `fingerprint` package fails to compile; a zero-value `FreshToken`/`StoredToken` fails closed; a historical site cannot construct a `FreshToken`; the registry `init()` panics on a `[minSupported,current]` gap. 4. **PR D (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) — add a field absent from `projectV1` and set it on one component; assert only that lock drifts and every other lock is byte-identical. 5. **PR E (config schema axis, later)**: `schema-version` field + load-time canonical migration + the `config migrate` command. Gated on the first post-reset non-additive TOML change not already absorbed by the reset's normalization pass. 6. **PR F (forced lock migration, gated on the first floor raise)**: the `component migrate` command (the only sanctioned floor-raise; the prescribed fix for a build-critical newly-measured input) and the CI spread-ceiling on `currentLockContentVersion − minSupportedLockContentVersion`. **Gating:** a `v2` bump *without* PR F is safe — v1 stays in the registry and the floor stays at 1, so unmigrated locks still replay. PR F is required only before **raising `minSupportedLockContentVersion` above 1** (retiring v1), since that is what makes un-migrated locks unreplayable. A CI gate forbids raising the floor unless `component migrate` exists. So PR F is decoupled from the first `v2` and gated on the first floor raise. @@ -667,4 +667,23 @@ Each PR is independently revertible up to the cutover. PRs A–B land together a 1. 
For the config schema axis, does `schema-version` live per-config-file or per-component? Per-file is simpler; per-component allows mixed-version projects during migration. -*Resolved in-text (recorded here so they aren't re-litigated):* the reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover; the substrate is canonical projection (frozen `projectVN` + golden vectors), not `hashstructure`; the **canonical byte encoding is the existing length-prefixed `:=:` form** used by `combineInputs`, committed and pinned by golden vectors at the reset (former Open Q#4 — a precondition for PR A, not an open question, because the reset makes it irreversible); the **version write-guard is a requirement, not an option** (former Open Q#5): a binary refuses to write a token whose version exceeds its own `currentLockContentVersion`, and the CI version-pin prevents *old* binaries from committing downgrades; **field membership is declared in mandatory per-field version-set tags** (`fingerprint:"v1..*"`; absent → generation fails, `!`-prefix for always-emit) from which a **`go generate` step emits the per-version `projectVN` functions** — restoring "forgotten field → loud failure" natively and making field removal a *compile* error; the **emit-key is the frozen TOML key** (never the Go field name, so a field rename is byte-neutral; `key=` overrides for keyless fields; duplicate emit-keys fail generation), the **omit-predicate is `reflect.Value.IsZero()`** (former Open Q#3), and the tag DSL is **frozen at three range-operators** (`..`, `!`, `*`) plus the orthogonal `key=`; frozen-ness rests on the **compiler + generator + regeneration-idempotence**, with the **golden-vector coverage invariant** (every field measured at a retained version exercised non-zero in ≥1 vector, with a tag-independent discrimination oracle) as the semantic backstop whose oracle is a **tag-independent dropped-fields ledger** (a field absent from it must discriminate at the 
*current* version), plus a **kept exclusion ledger** for `-` fields (the inclusion default is native to the tag; the *exclusion* default stays ledgered because it is the G5-dangerous direction); the stored hash is read only through a **two-type token split** — `StoredToken` (digest-compare only) and `FreshToken` (reconcile, requires live inputs), both non-comparable — adopted in PR C, closing both the raw-`==` and comparator mis-classification hazards at compile time; baseline `v1` is omit-if-zero with **no** include-always legacy in the registry; the lock format `Version` stays at `1` (old binaries keep reading pins to build); the substrate swap and any old-binary downgrade are reconciled by **force-rehashing** sub-floor tokens, not a format gate; the stored hash is an **atomic** `v:sha256:` token; back-compat rests on the verified invariant that **no reader recomputes a historical fingerprint** (synthetic history and historic-overlay application read stored strings only); registry retention is a **floor**, not "last N"; `component migrate` is the post-reset forced-migration pass (lock axis; `config migrate` is its schema-axis sibling) and is itself a deliberate release-grade event; one shared content version governs both stored hashes, stored **only** in `InputFingerprint`'s `v:` prefix — `ResolutionInputHash` stays a **bare** `sha256:` digest, so a fingerprint-only bump never touches it and the resolver sees no spurious move; the historical changelog/classifier comparators compare the **digest** (stripping the `v:` prefix), so a version-only re-stamp moves no digest and mints no release; resolution replay stays reserved (slot present, `computeRes1` reused) until `ComputeResolutionHash` first changes. +## Decisions settled in the body + +Indexed here so they are not re-litigated; each is argued in full at the linked section. 
+ +| Decision | Where | +| -------- | ----- | +| Reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover | §The opportunity | +| Substrate is canonical projection (generated `projectVN` + golden vectors), not `hashstructure` | [§Substrate options](#substrate-options) | +| Field selection is **codegen** from mandatory per-field version-set tags (absent ⇒ generation fails); `go generate` emits the per-version `projectVN` | [§Version-tagged field selection](#version-tagged-field-selection) | +| Emit-key = frozen TOML key (`key=` override; duplicate keys fail generation); omit-predicate = `reflect.Value.IsZero()` (composite omission by *projected* emptiness) | [§Version-tagged field selection](#version-tagged-field-selection) | +| Tag DSL frozen at three range-operators (`..` `!` `*`) plus the orthogonal `key=` | [§Version-tagged field selection](#version-tagged-field-selection) | +| Canonical byte encoding = existing length-prefixed `:=:`; maps sorted-key; per-type value slots — pinned irreversibly at the reset | §The projection substrate | +| Frozen-ness = compiler + generator + regeneration-idempotence; golden-vector coverage (tag-independent dropped-fields oracle) is the backstop; exclusion ledger kept for `-` fields | [§Golden-vector coverage](#golden-vector-coverage-the-backstop) | +| Stored hash = atomic `v:sha256:` token; lock format `Version` stays `1`; sub-floor/downgraded tokens reconciled by force-rehash | §The lock changes at the reset | +| Stored hash read only through the two-type token split (`StoredToken`/`FreshToken`, non-comparable), adopted in PR C | [§Downstream consumers](#downstream-fingerprint-consumers-blast-radius) | +| Version write-guard required (refuse to write above the binary's `currentLockContentVersion`); CI version-pin blocks old-binary commits | [§Registry floor](#registry-floor-and-forced-migration) | +| Back-compat: no reader recomputes a historical fingerprint (synthetic history / overlays read 
stored strings only) | [§Back-compat invariant](#back-compat-invariant--synthetic-history-reads-stored-strings-never-recomputes) | +| Registry retention is a floor, not "last N"; `component migrate` is the forced-migration pass (a deliberate release-grade event) | [§Registry floor](#registry-floor-and-forced-migration) | +| One content version, stored only in `InputFingerprint`'s prefix; `ResolutionInputHash` stays bare; resolution replay reserved | [§Both hashes share one version](#both-hashes-share-one-version) | +| Historical comparators compare the digest (strip `v:`), so version-only re-stamps mint no release | [§Synthetic changelog path](#the-synthetic-changelogrelease-path-is-the-real-hazard) | From 94db8db63f298a37c182ae336a769edfff064b48 Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Tue, 9 Jun 2026 16:28:16 -0700 Subject: [PATCH 12/15] update 9 --- docs/developer/rfc/lazy-schema-migration.md | 348 ++++++++++---------- 1 file changed, 183 insertions(+), 165 deletions(-) diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md index 93432f97..d689d748 100644 --- a/docs/developer/rfc/lazy-schema-migration.md +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -4,14 +4,14 @@ - **Author**: @damcilva - **Created**: 2026-06-04 - **Related code**: - - [`internal/fingerprint/fingerprint.go`](../../../internal/fingerprint/fingerprint.go) — `ComputeIdentity`, `ComputeResolutionHash`, `combineInputs` - - [`internal/lockfile/lockfile.go`](../../../internal/lockfile/lockfile.go) — `ComponentLock`, `Parse` format-version gate - - [`internal/projectconfig/fingerprint_test.go`](../../../internal/projectconfig/fingerprint_test.go) — field-inclusion audit - - [`internal/app/azldev/core/components/resolver.go`](../../../internal/app/azldev/core/components/resolver.go) — `computeFreshnessStatus`, `checkFingerprintFreshness` - - 
[`internal/app/azldev/cmds/component/update.go`](../../../internal/app/azldev/cmds/component/update.go) — `Changed` decision, re-stamp write - - [`internal/app/azldev/cmds/component/changed.go`](../../../internal/app/azldev/cmds/component/changed.go) — `classifyComponent`, `haveMatchingFingerprints` (CI classification) - - [`internal/app/azldev/core/sources/synthistory.go`](../../../internal/app/azldev/core/sources/synthistory.go) — `FindFingerprintChanges`, `BuildDirtyChange` (synthetic changelog/release) - - [`internal/app/azldev/core/sources/sourceprep.go`](../../../internal/app/azldev/core/sources/sourceprep.go) — `computeCurrentFingerprint` + - [`internal/fingerprint/fingerprint.go`](../../../internal/fingerprint/fingerprint.go) - `ComputeIdentity`, `ComputeResolutionHash`, `combineInputs` + - [`internal/lockfile/lockfile.go`](../../../internal/lockfile/lockfile.go) - `ComponentLock`, `Parse` format-version gate + - [`internal/projectconfig/fingerprint_test.go`](../../../internal/projectconfig/fingerprint_test.go) - field-inclusion audit + - [`internal/app/azldev/core/components/resolver.go`](../../../internal/app/azldev/core/components/resolver.go) - `computeFreshnessStatus`, `checkFingerprintFreshness` + - [`internal/app/azldev/cmds/component/update.go`](../../../internal/app/azldev/cmds/component/update.go) - `Changed` decision, re-stamp write + - [`internal/app/azldev/cmds/component/changed.go`](../../../internal/app/azldev/cmds/component/changed.go) - `classifyComponent`, `haveMatchingFingerprints` (CI classification) + - [`internal/app/azldev/core/sources/synthistory.go`](../../../internal/app/azldev/core/sources/synthistory.go) - `FindFingerprintChanges`, `BuildDirtyChange` (synthetic changelog/release) + - [`internal/app/azldev/core/sources/sourceprep.go`](../../../internal/app/azldev/core/sources/sourceprep.go) - `computeCurrentFingerprint` ## Background @@ -40,7 +40,7 @@ configHash, err := hashstructure.Hash(component, hashstructure.FormatV2, 
&hashstructure.HashOptions{TagName: "fingerprint"}) ``` -`configHash` is then folded together with the source identity, overlay file hashes, manual bump, and distro release version into a domain-separated SHA256 (`combineInputs`). Field inclusion is policed by [`TestAllFingerprintedFieldsHaveDecision`](../../../internal/projectconfig/fingerprint_test.go): every field of every fingerprinted struct must be consciously categorized as **included** (no tag) or **excluded** (`fingerprint:"-"`). The safe default is *included* — a new field contributes to the hash unless told otherwise. +`configHash` is then folded together with the source identity, overlay file hashes, manual bump, and distro release version into a domain-separated SHA256 (`combineInputs`). Field inclusion is policed by [`TestAllFingerprintedFieldsHaveDecision`](../../../internal/projectconfig/fingerprint_test.go): every field of every fingerprinted struct must be consciously categorized as **included** (no tag) or **excluded** (`fingerprint:"-"`). The safe default is *included* - a new field contributes to the hash unless told otherwise. Drift is detected in [`resolver.go`](../../../internal/app/azldev/core/components/resolver.go): `computeFreshnessStatus` → `checkFingerprintFreshness` recomputes the identity and compares it to `InputFingerprint`, yielding `FreshnessCurrent` or `FreshnessStale`. `component update` ([`update.go`](../../../internal/app/azldev/cmds/component/update.go)) re-stamps the lock and flips a user-visible `Changed` flag whenever the fingerprint moves. @@ -52,34 +52,34 @@ As the tool matures, three *independent* notions of "version" are emerging. 
Conf | ---- | ------------- | ----------- | ------------- | --------------------- | | **Config schema version** | on-disk TOML field shape | load / migration layer | No | `config migrate` (future) | | **Lock content-hash version** | how inputs fold into the lock's stored hashes (`InputFingerprint` *and* `ResolutionInputHash`) | `fingerprint` combiner | No (implicitly v1) | `component migrate` | -| **Lock file format version** | lock file serialization | `lockfile` | Yes (`Version = 1`) | — (frozen at `1`) | +| **Lock file format version** | lock file serialization | `lockfile` | Yes (`Version = 1`) | - (frozen at `1`) | ### The problem -Because field inclusion defaults to *included*, **adding any new fingerprinted config field re-hashes every component**, even components that never set the field. `hashstructure` hashes a zero-value field identically to a present-but-empty field — but *differently* from a field that does not exist in the struct at all. So the moment the Go struct gains `Foo string`, every component's `configHash` changes, every `InputFingerprint` changes, and every `*.lock` shows drift on the next `component update`. +Because field inclusion defaults to *included*, **adding any new fingerprinted config field re-hashes every component**, even components that never set the field. `hashstructure` hashes a zero-value field identically to a present-but-empty field - but *differently* from a field that does not exist in the struct at all. So the moment the Go struct gains `Foo string`, every component's `configHash` changes, every `InputFingerprint` changes, and every `*.lock` shows drift on the next `component update`. Concretely: we add field `foo` and set `foo = "baz"` on package `bar`. The desired outcome is that **only** `bar.lock` drifts. The actual outcome today is that **all** lock files drift. 
-**The root concern is git churn, not rebuilds.** The mass rebuild is a knock-on effect; the thing we actually want to protect is the **lock-file diff in a PR**. A change that touches one package should produce exactly one changed `*.lock` — ideally zero changed bytes in any other lock file, in any way. Lock files should change *only* when there is a real, per-component change. Clean diffs keep PRs reviewable, keep `git blame` meaningful, and make "this lock moved" a trustworthy signal that *that component's* inputs actually changed. The rebuild fan-out follows for free once the diffs are clean. +**The root concern is git churn, not rebuilds.** The mass rebuild is a knock-on effect; the thing we actually want to protect is the **lock-file diff in a PR**. A change that touches one package should produce exactly one changed `*.lock` - ideally zero changed bytes in any other lock file, in any way. Lock files should change *only* when there is a real, per-component change. Clean diffs keep PRs reviewable, keep `git blame` meaningful, and make "this lock moved" a trustworthy signal that *that component's* inputs actually changed. The rebuild fan-out follows for free once the diffs are clean. -There is a harder variant lurking behind the additive case: **non-additive** schema changes — renaming a field, removing one, changing a baked-in default, or fixing a bug in the hashing logic itself. These legitimately change the *meaning* of the config without changing user intent, and we will eventually need to absorb them without forcing every consumer to rebuild. +There is a harder variant lurking behind the additive case: **non-additive** schema changes - renaming a field, removing one, changing a baked-in default, or fixing a bug in the hashing logic itself. These legitimately change the *meaning* of the config without changing user intent, and we will eventually need to absorb them without forcing every consumer to rebuild. 
### The substrate problem: replay only works if old algorithms stay frozen The natural fix for non-additive change is **versioned replay**: stamp an algorithm version into the lock, keep the old algorithm around, and when a lock is behind, recompute with *its* algorithm to ask "were the inputs actually unchanged, or did only the encoding move?" If unchanged, accept the lock without a rebuild. -Replay only works if an old algorithm function can faithfully reproduce the hash it produced when the lock was written. **On the current `hashstructure` substrate, it cannot** — a "frozen" algorithm function is not actually frozen: +Replay only works if an old algorithm function can faithfully reproduce the hash it produced when the lock was written. **On the current `hashstructure` substrate, it cannot** - a "frozen" algorithm function is not actually frozen: - Its body is `hashstructure.Hash(component, …)`, which **reflects over the live Go struct**. Add a field later and the old function now sees that field (at zero value, included) → its output moves → it can no longer reproduce the historical hash. So *adding* a field breaks *replay of older versions*, which is exactly the additive case we are trying to make free. - It also resolves the live **method set**: once `ComponentConfig` implements `Includable`, the same `hashstructure.Hash` call silently switches inclusion behavior, with no per-call opt-out (the interface is resolved automatically). -An incremental "flip the default to omitempty, lazily migrate" plan therefore **cannot keep its central promise.** "Additive fields are drift-neutral by construction" holds only for locks already at the new version; for the older locks that lazy migration deliberately leaves alone, the next field addition forces a hash change anyway. You do not avoid the mass rebuild — you defer it to the first field addition, and you build the whole replay apparatus on a substrate that makes replay unsound. 
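
Versioned replay, as described in this section, can be sketched as follows. This is a minimal illustration, assuming each version's algorithm is a function that names its fields explicitly rather than reflecting over the live struct. The names `inputs`, `algos`, `computeAt`, and `reconcile` are hypothetical, and the string concatenation is a placeholder for a real canonical encoding.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// inputs is a hypothetical, already-resolved view of one component.
type inputs struct {
	Name string
	Jobs string
	Foo  string // added after version 1 shipped
}

// algos maps a content version to its projection. Each function lists its
// fields explicitly, so adding a struct field later cannot change its output.
// The plain concatenation is a placeholder, not a canonical encoding.
var algos = map[int]func(inputs) string{
	1: func(in inputs) string { return "name=" + in.Name + ";jobs=" + in.Jobs },
	2: func(in inputs) string { return "name=" + in.Name + ";jobs=" + in.Jobs + ";foo=" + in.Foo },
}

const currentVersion = 2

// computeAt hashes the inputs with the algorithm registered for version.
// It reports false when the version is not in the registry.
func computeAt(version int, in inputs) (string, bool) {
	project, ok := algos[version]
	if !ok {
		return "", false
	}
	sum := sha256.Sum256([]byte(project(in)))
	return hex.EncodeToString(sum[:]), true
}

type outcome string

const (
	fresh   outcome = "fresh"
	stale   outcome = "stale"
	restamp outcome = "restamp"
)

// reconcile replays the lock's own version against the live inputs.
// An unknown version or a digest mismatch is treated as stale.
func reconcile(storedVersion int, storedDigest string, in inputs) outcome {
	got, ok := computeAt(storedVersion, in)
	if !ok || got != storedDigest {
		return stale
	}
	if storedVersion < currentVersion {
		return restamp // inputs unchanged, only the encoding is behind
	}
	return fresh
}

func main() {
	original := inputs{Name: "bar", Jobs: "4"}
	stored, _ := computeAt(1, original)

	fmt.Println(reconcile(1, stored, original))                         // unchanged inputs, old version
	fmt.Println(reconcile(1, stored, inputs{Name: "bar", Jobs: "8"})) // inputs really changed
	fmt.Println(reconcile(7, stored, original))                         // version not in the registry
}
```

The sketch only holds together because the version 1 function never sees `Foo`; replace its body with a reflective walk over `inputs` and the first case stops matching as soon as the struct grows.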
+An incremental "flip the default to omitempty, lazily migrate" plan therefore **cannot keep its central promise.** "Additive fields are drift-neutral by construction" holds only for locks already at the new version; for the older locks that lazy migration deliberately leaves alone, the next field addition forces a hash change anyway. You do not avoid the mass rebuild - you defer it to the first field addition, and you build the whole replay apparatus on a substrate that makes replay unsound. ### The opportunity: a coordinated cutover is already scheduled -The project has a **dev→prod environment cutover** coming that forces a full rebuild regardless. This is a *coordinated cutover* — a one-time, distro-wide switch with no mixed-version window, the sanctioned moment to make changes that cannot be made lazily. That changes the calculus completely. The entire "lazy" framing exists to *avoid* a mass update; if exactly one sanctioned mass update is already on the calendar, the strategy inverts: +The project has a **dev→prod environment cutover** coming that forces a full rebuild regardless. This is a *coordinated cutover* - a one-time, distro-wide switch with no mixed-version window, the sanctioned moment to make changes that cannot be made lazily. That changes the calculus completely. The entire "lazy" framing exists to *avoid* a mass update; if exactly one sanctioned mass update is already on the calendar, the strategy inverts: -> **Lazy migration is for the cheap and additive. The one free rebuild is a budget — spend it exclusively on the one-way doors that are cheap now and a coordinated-cutover-only change later.** +> **Lazy migration is for the cheap and additive. 
The one free rebuild is a budget - spend it exclusively on the one-way doors that are cheap now and a coordinated-cutover-only change later.** This RFC therefore has two parts: **(1)** a one-time **reset** at the dev→prod cutover that replaces the hashing substrate with one whose old algorithms are *genuinely* frozen, and **(2)** a **post-reset lazy migration** mechanism (versioned registry + replay) that rides that clean substrate for the rare genuine algorithm change thereafter. Part 2 is what the original "lazy" design was reaching for; part 1 is what makes it sound. @@ -88,9 +88,9 @@ This RFC therefore has two parts: **(1)** a one-time **reset** at the dev→prod - **G1 (primary, non-functional): no spurious lock-file diffs *after the reset*.** Once prod locks exist, landing a config-schema or hashing change must not rewrite `*.lock` files for components whose effective inputs are unchanged. The reset itself is the *one* sanctioned exception, absorbed by the already-scheduled rebuild. - **G2: only real changes drift.** Post-reset, a lock changes iff that component's build-effective inputs changed. - **G3: piecemeal, lazy migration post-reset.** Genuine algorithm evolution after the reset rolls out per-component, riding independent changes, never as a big-bang. -- **G4: additive fields are drift-neutral by construction — *truly*, not just for new locks.** On the projection substrate (below) an unset additive field is invisible to *every* lock including old ones, because old versions emit only the fields their tags include — a field added later is not in any shipped version's tag set, so it cannot move an existing hash. -- **G5: correctness backstop preserved — relative to the lock's own content version.** Never silently under-rebuild: a genuine input change must drift any lock *whose version measures that input*. 
An input a lock's version does not measure (a field introduced later, a not-yet-adopted newly-measured input) is correctly invisible until the lock migrates — lazy non-adoption is by contract, not a miss. Replay may accept encoding/over-capture changes; it must never mask a behavior-changing one within the lock's own measured set. -- **G6 (new, hard): back-compatible reads for synthetic history.** The new binary must still **read** pre-reset locks across git history (synthetic changelog/release walks them), even though it **writes** only the new format. Reading never recomputes a historical hash — it compares stored strings only. +- **G4: additive fields are drift-neutral by construction - *truly*, not just for new locks.** On the projection substrate (below) an unset additive field is invisible to *every* lock including old ones, because old versions emit only the fields their tags include - a field added later is not in any shipped version's tag set, so it cannot move an existing hash. +- **G5: correctness backstop preserved - relative to the lock's own content version.** Never silently under-rebuild: a genuine input change must drift any lock *whose version measures that input*. An input a lock's version does not measure (a field introduced later, a not-yet-adopted newly-measured input) is correctly invisible until the lock migrates - lazy non-adoption is by contract, not a miss. Replay may accept encoding/over-capture changes; it must never mask a behavior-changing one within the lock's own measured set. +- **G6 (new, hard): back-compatible reads for synthetic history.** The new binary must still **read** pre-reset locks across git history (synthetic changelog/release walks them), even though it **writes** only the new format. Reading never recomputes a historical hash - it compares stored strings only. 
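
G6's "compares stored strings only" can be illustrated with a small digest-only comparator. This is a sketch, not the proposed `fingerprint` package: `storedToken`, `parseToken`, and `sameDigest` are hypothetical lower-case stand-ins, the accepted shapes are assumed to be a bare `sha256:` digest or the same digest behind a `v1`-style version prefix, and hex validation of the digest is left out for brevity.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// storedToken is a parsed lock fingerprint. The zero-length func array makes
// the struct non-comparable, so a raw == between two tokens does not compile.
type storedToken struct {
	_       [0]func()
	version int    // 0 means a bare digest with no version prefix
	digest  string // text after "sha256:"
	valid   bool   // false for the zero value and for malformed input
}

// parseToken accepts "sha256:DIGEST" or "vN:sha256:DIGEST" with N >= 1.
// Anything else yields an invalid token rather than an empty digest.
func parseToken(s string) storedToken {
	version, rest := 0, s
	if strings.HasPrefix(s, "v") {
		head, tail, found := strings.Cut(s[1:], ":")
		n, err := strconv.Atoi(head)
		// Round-tripping through Itoa rejects forms like "+1" or "01".
		if !found || err != nil || n < 1 || strconv.Itoa(n) != head {
			return storedToken{}
		}
		version, rest = n, tail
	}
	if !strings.HasPrefix(rest, "sha256:") {
		return storedToken{}
	}
	digest := strings.TrimPrefix(rest, "sha256:")
	if digest == "" {
		return storedToken{}
	}
	return storedToken{version: version, digest: digest, valid: true}
}

// sameDigest compares digests only and ignores the version prefix.
// Invalid tokens never match anything, including each other.
func sameDigest(a, b storedToken) bool {
	return a.valid && b.valid && a.digest == b.digest
}

func main() {
	// A version-only re-stamp: same digest, so not a change.
	fmt.Println(sameDigest(parseToken("v1:sha256:abc"), parseToken("v2:sha256:abc")))
	// A real change: digests differ.
	fmt.Println(sameDigest(parseToken("sha256:abc"), parseToken("v1:sha256:def")))
	// A malformed token is treated as changed, never as silently equal.
	fmt.Println(sameDigest(parseToken("v2:sha256:"), parseToken("v2:sha256:")))
}
```

Nothing here recomputes a hash: both sides are strings read from locks, which is why a comparator of this shape can walk pre-reset history without the old algorithm being present.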
## Problem inventory @@ -103,7 +103,7 @@ This RFC therefore has two parts: **(1)** a one-time **reset** at the dev→prod | 5 | Migration is all-or-nothing | Freshness check is binary match/no-match against one stored hash | No piecemeal rollout | | 6 | Versioned replay is unsound on the current substrate | "Frozen" algorithm = `hashstructure.Hash` over the **live** struct/method-set; adding a field moves the old function's output | Replay cannot reproduce historical hashes | -Problems 1–5 share a shape: a change that *should* be invisible to most components is forced to be visible to all of them, because the fingerprint cannot distinguish "input changed" from "encoding changed." Problem 4 is the missing primitive for managed config evolution. Problem 5 is the property we want from any post-reset solution — **per-component, lazy** migration. Problem 6 is the one that kills the *incremental* path outright: the very mechanism that would make problems 1–3 free (versioned replay) is unsound while the substrate reflects the live struct. Fixing 6 is what the reset buys. +Problems 1-5 share a shape: a change that *should* be invisible to most components is forced to be visible to all of them, because the fingerprint cannot distinguish "input changed" from "encoding changed." Problem 4 is the missing primitive for managed config evolution. Problem 5 is the property we want from any post-reset solution - **per-component, lazy** migration. Problem 6 is the one that kills the *incremental* path outright: the very mechanism that would make problems 1-3 free (versioned replay) is unsound while the substrate reflects the live struct. Fixing 6 is what the reset buys. ## How fingerprinting works today (detail) @@ -125,11 +125,11 @@ The struct's type name *is* part of the hash (`hashstructure` mixes in `reflect. 
**Why this substrate cannot host frozen replay.** Every property above is resolved *at hash time against the live program*, not against a pinned description of the v1 encoding: -- The set of fields walked is whatever the struct has *now* — add a field, and last year's `computeFP1` (whose body is still just `hashstructure.Hash(component)`) now includes it. -- Whether `Includable` is consulted depends on whether the type implements it *now* — not on what was true when v1 locks were written. +- The set of fields walked is whatever the struct has *now* - add a field, and last year's `computeFP1` (whose body is still just `hashstructure.Hash(component)`) now includes it. +- Whether `Includable` is consulted depends on whether the type implements it *now* - not on what was true when v1 locks were written. - A `value` vs `pointer` receiver subtlety even decides whether the root struct's `HashInclude` is seen at all (the top-level value is not addressable). -A function meant to be "the v1 algorithm, forever" therefore changes meaning every time the struct or its method set changes. That is the disqualifier for the incremental plan (Problem 6) and the motivation for the projection substrate below, whose v1 projection emits only its version-tagged fields and reads neither the method set nor the type name — immune to all three. +A function meant to be "the v1 algorithm, forever" therefore changes meaning every time the struct or its method set changes. That is the disqualifier for the incremental plan (Problem 6) and the motivation for the projection substrate below, whose v1 projection emits only its version-tagged fields and reads neither the method set nor the type name - immune to all three. ## Change taxonomy @@ -137,21 +137,21 @@ Not every config change should be treated the same way. The right mechanism depe | Class | Example | Should unaffected locks drift? 
| Mechanism | | ----- | ------- | ------------------------------ | --------- | -| **Additive field** | new `foo` field, unset on most components | No | **Free, no bump.** Tag the new field `vN..*` (current version, omit-if-zero); a component that leaves it unset emits identical bytes, so no shipped hash moves — adding an omit-if-zero field to the live version is the one output-preserving no-bump edit. A setter whose lock is *already at* version N drifts; a setter on an older, un-migrated lock is left unchanged (false-fresh) until that lock next re-stamps or is migrated — the same lazy contract as a newly-measured input. To force the field onto the whole fleet now, do an explicit `component migrate`. **Tagging a build-meaningful-zero field `!vN..*` (always-emit) on the live version is *not* this case** — see the note below. | -| **Additive with non-zero default** | new field defaulted to `"auto"` via defaults merge | No | **Bump + replay.** The default resolves non-zero on *every* component, so it is emitted everywhere and would move every hash — omit-if-zero can't save it. Bump and tag the field `v(N+1)..*`; old locks **replay at their version** (whose set excludes it), match their stored digest → recognized unchanged → lazy re-stamp, no rebuild. | -| **Default change on an *existing* field** | bump `jobs` default `4`→`8` | Yes — every component's effective input moved | **Not lazy-maskable.** Replay recomputes the *current* config (now resolving to `8`) under the old algorithm → `jobs=8` ≠ stored `jobs=4` → genuine fleet-wide drift; replay cannot suppress it because the resolved value genuinely changed for everyone. Escape hatch: `config migrate` writes the *old* resolved value explicitly (`jobs=4`) into each config **before** moving the default — existing components then pin the old value (no drift) and only new components pick up `8`. Without that pre-pass it is a legitimate (if large) fleet rebuild, not a bug. 
| +| **Additive field** | new `foo` field, unset on most components | No | **Free, no bump.** Tag the new field `vN..*` (current version, omit-if-zero); a component that leaves it unset emits identical bytes, so no shipped hash moves - adding an omit-if-zero field to the live version is the one output-preserving no-bump edit. A setter whose lock is *already at* version N drifts; a setter on an older, un-migrated lock is left unchanged (false-fresh) until that lock next re-stamps or is migrated - the same lazy contract as a newly-measured input. To force the field onto the whole fleet now, do an explicit `component migrate`. **Tagging a build-meaningful-zero field `!vN..*` (always-emit) on the live version is *not* this case** - see the note below. | +| **Additive with non-zero default** | new field defaulted to `"auto"` via defaults merge | No | **Bump + replay.** The default resolves non-zero on *every* component, so it is emitted everywhere and would move every hash - omit-if-zero can't save it. Bump and tag the field `v(N+1)..*`; old locks **replay at their version** (whose set excludes it), match their stored digest → recognized unchanged → lazy re-stamp, no rebuild. | +| **Default change on an *existing* field** | bump `jobs` default `4`→`8` | Yes - every component's effective input moved | **Not lazy-maskable.** Replay recomputes the *current* config (now resolving to `8`) under the old algorithm → `jobs=8` ≠ stored `jobs=4` → genuine fleet-wide drift; replay cannot suppress it because the resolved value genuinely changed for everyone. Escape hatch: `config migrate` writes the *old* resolved value explicitly (`jobs=4`) into each config **before** moving the default - existing components then pin the old value (no drift) and only new components pick up `8`. Without that pre-pass it is a legitimate (if large) fleet rebuild, not a bug. 
| | **Rename / move** | `foo` → `bar`, same semantics | No | **Schema migration + bump + replay.** Migrate old TOML → canonical struct (the rename lands in the struct), then tag the renamed field `v(N+1)..*`. Old locks replay at their version and are recognized unchanged → lazy re-stamp, no rebuild. | -| **Semantic change** | meaning of `foo` changes; output differs | Yes — that's correct | **None.** The build output genuinely differs, so the lock *should* drift. Replay at the old version would (correctly) mismatch → `Stale` → rebuild. Nothing to suppress. | +| **Semantic change** | meaning of `foo` changes; output differs | Yes - that's correct | **None.** The build output genuinely differs, so the lock *should* drift. Replay at the old version would (correctly) mismatch → `Stale` → rebuild. Nothing to suppress. | | **Hashing bugfix** | overlay ordering bug in the combiner | No | **Bump + replay.** Ship the fixed combiner as the version-`N+1` half of `computeFP(N+1)`; old locks replay at the old (buggy) version. If their inputs are unchanged the buggy digest still matches → recognized unchanged → lazy re-stamp to the fixed version, no rebuild. | -| **Newly measured input** | start folding in a new overlay source or identity element | No | **Bump + replay.** A non-config input is added in the combiner half of `computeFP(N+1)` (a config field would be tagged `v(N+1)..*` instead). Old locks replay at their version, which didn't fold it in, match their stored digest → recognized unchanged → lazy re-stamp, no rebuild. **Caveat:** until a lock migrates, replay is *blind* to the new input, so a change to it reads as fresh (false-fresh) — if it is build-critical, force a `component migrate` pass instead of riding lazy adoption (see [churn-avoidance](#churn-avoidance-policies-g1)). 
| +| **Newly measured input** | start folding in a new overlay source or identity element | No | **Bump + replay.** A non-config input is added in the combiner half of `computeFP(N+1)` (a config field would be tagged `v(N+1)..*` instead). Old locks replay at their version, which didn't fold it in, match their stored digest → recognized unchanged → lazy re-stamp, no rebuild. **Caveat:** until a lock migrates, replay is *blind* to the new input, so a change to it reads as fresh (false-fresh) - if it is build-critical, force a `component migrate` pass instead of riding lazy adoption (see [churn-avoidance](#churn-avoidance-policies-g1)). | | **Field removal** | drop deprecated `foo` | No, if nobody set it | **Deprecate-then-delete (+ bump for setters).** Close the field's range at the prior version (`vK..*` → `vK..vN`, so v(N+1) stops measuring it) but **keep the field on the struct** so older versions can still read it for replay. Only after the floor passes vN (ideally after a `component migrate`) physically delete the field. Setters drift on the bump; non-setters replay clean. | -| **Resurrected field** | re-measure a previously-dropped `foo` | Depends — only if its value moved | **Tag edit (+ bump).** Append a new range to the field's set (`v1..v3,v8..*`) so v8+ measures it again while v1–v7 stay byte-identical (golden-vector-enforced). If the field was already physically deleted, bring it back as a fresh additive field tagged `v8..*`. The earlier life and the revival never collide because each version's output is pinned independently. | +| **Resurrected field** | re-measure a previously-dropped `foo` | Depends - only if its value moved | **Tag edit (+ bump).** Append a new range to the field's set (`v1..v3,v8..*`) so v8+ measures it again while v1-v7 stay byte-identical (golden-vector-enforced). If the field was already physically deleted, bring it back as a fresh additive field tagged `v8..*`. 
The earlier life and the revival never collide because each version's output is pinned independently. | -The recurring requirement across the "No" rows is the same: **distinguish a change in user intent from a change in encoding, and only drift on the former.** Note the first row: on the projection substrate, a new field is added to `projectVN` as *omit-if-zero*, so a component that does not set it emits identical bytes and stays hash-neutral — *for every lock, old or new*, because old configs never set the brand-new field. Adding it does not move any existing hash (no shipped lock set it), so it needs no version bump. Part 2 then carries only the genuinely hard cases (rows 2, 5, and post-reset renames/removals). The shared move in every "Bump + replay" row is the same primitive — **increment the content version, keep the old `projectVN` as a frozen replay projection, and let unchanged locks re-stamp lazily** — detailed in [Part 2](#part-2--post-reset-lazy-migration). +The recurring requirement across the "No" rows is the same: **distinguish a change in user intent from a change in encoding, and only drift on the former.** Note the first row: on the projection substrate, a new field is added to `projectVN` as *omit-if-zero*, so a component that does not set it emits identical bytes and stays hash-neutral - *for every lock, old or new*, because old configs never set the brand-new field. Adding it does not move any existing hash (no shipped lock set it), so it needs no version bump. Part 2 then carries only the genuinely hard cases (rows 2, 5, and post-reset renames/removals). The shared move in every "Bump + replay" row is the same primitive - **increment the content version, keep the old `projectVN` as a frozen replay projection, and let unchanged locks re-stamp lazily** - detailed in [Part 2](#part-2-post-reset-lazy-migration). 
-**Adding a field as `!` (always-emit) to a *live* version is a version-bump event, not a free additive.** A zero-valued `!` field emits bytes for *every* component, including those that never set it, so it moves every lock the instant it lands on the current version — the opposite of "leave old locks alone." Build-meaningful-zero fields must therefore be introduced at a *new* version (`!v(N+1)..*`) and absorbed by replay, exactly like any other non-additive change. Only omit-if-zero additions (`vN..*`) are free on the live version. +**Adding a field as `!` (always-emit) to a *live* version is a version-bump event, not a free additive.** A zero-valued `!` field emits bytes for *every* component, including those that never set it, so it moves every lock the instant it lands on the current version - the opposite of "leave old locks alone." Build-meaningful-zero fields must therefore be introduced at a *new* version (`!v(N+1)..*`) and absorbed by replay, exactly like any other non-additive change. Only omit-if-zero additions (`vN..*`) are free on the live version. -> **`projectVN`** is shorthand used throughout this RFC for the canonical *projection at content-version N* introduced by this design (defined in [Substrate options](#substrate-options) and [The projection substrate](#the-projection-substrate)). It is a per-version function `projectVN(cfg) []byte` — **generated** from declarative version-set tags on the struct fields (see [Version-tagged field selection](#version-tagged-field-selection)), not hand-written. `projectV1` measures the fields whose tag set includes v1; `projectV2` the next version, and so on. Each generated `projectVN` freezes once *superseded* (the next version is registered): its source tags no longer move, its generated code is checked in, and golden vectors backstop it. The live version stays editable for output-preserving additions. 
+> **`projectVN`** is shorthand used throughout this RFC for the canonical *projection at content-version N* introduced by this design (defined in [Substrate options](#substrate-options) and [The projection substrate](#the-projection-substrate)). It is a per-version function `projectVN(cfg) []byte` - **generated** from declarative version-set tags on the struct fields (see [Version-tagged field selection](#version-tagged-field-selection)), not hand-written. `projectV1` measures the fields whose tag set includes v1; `projectV2` the next version, and so on. Each generated `projectVN` freezes once *superseded* (the next version is registered): its source tags no longer move, its generated code is checked in, and golden vectors backstop it. The live version stays editable for output-preserving additions. ## Research @@ -159,30 +159,30 @@ The recurring requirement across the "No" rows is the same: **distinguish a chan Two substrates can produce a content fingerprint of the resolved config. The difference that matters here is **whether an old algorithm function can be frozen.** -- **`hashstructure` + `Includable` (rejected as the substrate).** Keeps existing hashes byte-identical and gives per-field omission via `HashInclude`. But, as established above (Problem 6), a function built on `hashstructure.Hash` reflects over the live struct and method set, so it cannot be a frozen historical algorithm. It also requires a value-receiver `HashInclude` on *every* nested fingerprinted struct and a subtle `v.(reflect.Value)` type-assert to work at all — brittle plumbing in service of a substrate that still can't host sound replay. -- **Canonical projection + stdlib hash (chosen).** Split the two jobs `hashstructure` fuses — *field selection* and *hashing* — into explicit steps. 
Field selection is **declared per field** as a version-set in the `fingerprint` tag (`fingerprint:"v1..*"`); a `go generate` step emits a per-version `projectVN(cfg) []byte` function that serializes the fields whose set includes version N in a canonical, sorted, self-delimiting byte form, and an stdlib `sha256` hashes those bytes. Because a shipped `projectVN` is frozen checked-in code, it does not see fields added later, does not depend on the type's method set, and does not depend on receiver subtleties. It is a genuinely frozen pure function of `(cfg)` per version — the property replay requires. The cost is owning the generator and **golden hash vectors** per version (a checked-in `(config, version) → hash` table) so the generator itself is CI-backstopped. +- **`hashstructure` + `Includable` (rejected as the substrate).** Keeps existing hashes byte-identical and gives per-field omission via `HashInclude`. But, as established above (Problem 6), a function built on `hashstructure.Hash` reflects over the live struct and method set, so it cannot be a frozen historical algorithm. It also requires a value-receiver `HashInclude` on *every* nested fingerprinted struct and a subtle `v.(reflect.Value)` type-assert to work at all - brittle plumbing in service of a substrate that still can't host sound replay. +- **Canonical projection + stdlib hash (chosen).** Split the two jobs `hashstructure` fuses - *field selection* and *hashing* - into explicit steps. Field selection is **declared per field** as a version-set in the `fingerprint` tag (`fingerprint:"v1..*"`); a `go generate` step emits a per-version `projectVN(cfg) []byte` function that serializes the fields whose set includes version N in a canonical, sorted, self-delimiting byte form, and an stdlib `sha256` hashes those bytes. Because a shipped `projectVN` is frozen checked-in code, it does not see fields added later, does not depend on the type's method set, and does not depend on receiver subtleties. 
It is a genuinely frozen pure function of `(cfg)` per version - the property replay requires. The cost is owning the generator and **golden hash vectors** per version (a checked-in `(config, version) → hash` table) so the generator itself is CI-backstopped. The projection substrate is what makes G4 true for old locks and what makes Part 2's replay sound. It is adopted at the reset (below), not incrementally. ### How other tools version lock state -- **Cargo (`Cargo.lock`)** carries an explicit `version = 4` at the top of the lock and teaches `cargo` to read older versions, upgrading in place on the next write. Migration is lazy — touching the lock upgrades it. +- **Cargo (`Cargo.lock`)** carries an explicit `version = 4` at the top of the lock and teaches `cargo` to read older versions, upgrading in place on the next write. Migration is lazy - touching the lock upgrades it. - **npm (`package-lock.json`)** uses `lockfileVersion` and supports reading v1/v2/v3, rewriting to the current version on install. - **Terraform state** stores a `version` and a `terraform_version`; state is upgraded forward on use, never downgraded. - **Go modules** avoid the problem entirely by hashing *content* (`h1:` dirhashes) rather than a struct shape, so adding metadata fields never perturbs existing sums. -The common pattern: an **integer version stamped into the persisted artifact**, plus the ability to **read and replay older versions**, plus **lazy forward-migration on write**. We keep `ComponentLock.Version` (the lock *format* slot) fixed at `1` and carry the *content* version **inside the `InputFingerprint` token** (`v:sha256:…`) rather than in a separate struct field — one atomic value, no version/digest desync, no new TOML field for an old binary to mishandle. The Go-modules lesson is the deepest one: hashing *content* rather than struct shape is what makes additive metadata free — the canonical-projection substrate is our version of that lesson. 
+The common pattern: an **integer version stamped into the persisted artifact**, plus the ability to **read and replay older versions**, plus **lazy forward-migration on write**. We keep `ComponentLock.Version` (the lock *format* slot) fixed at `1` and carry the *content* version **inside the `InputFingerprint` token** (`v:sha256:…`) rather than in a separate struct field - one atomic value, no version/digest desync, no new TOML field for an old binary to mishandle. The Go-modules lesson is the deepest one: hashing *content* rather than struct shape is what makes additive metadata free - the canonical-projection substrate is our version of that lesson. -**Where this design goes beyond the precedent.** All four tools above keep exactly **one** active algorithm: Cargo/npm/Terraform rewrite the *whole* artifact to the current version on next touch (eager-on-write), and Go modules sidestep replay entirely by never re-migrating semantics. **None of them keeps N historical hashing algorithms alive simultaneously across an indefinitely-unmigrated fleet** — which is exactly Part 2's behavior. The citations support "version stamp + lazy forward-migrate on write"; they do *not* cover "frozen algorithms coexisting forever." That coexistence is justified here on its own terms (it is what avoids a fleet rebuild on every algorithm change), and its one real cost — append-only registry growth — is bounded by the [floor-advance cadence](#registry-floor-and-forced-migration), not by precedent. +**Where this design goes beyond the precedent.** All four tools above keep exactly **one** active algorithm: Cargo/npm/Terraform rewrite the *whole* artifact to the current version on next touch (eager-on-write), and Go modules sidestep replay entirely by never re-migrating semantics. **None of them keeps N historical hashing algorithms alive simultaneously across an indefinitely-unmigrated fleet** - which is exactly Part 2's behavior. 
The citations support "version stamp + lazy forward-migrate on write"; they do *not* cover "frozen algorithms coexisting forever." That coexistence is justified here on its own terms (it is what avoids a fleet rebuild on every algorithm change), and its one real cost - append-only registry growth - is bounded by the [floor-advance cadence](#registry-floor-and-forced-migration), not by precedent. ### Where the hashing logic should live -With the projection substrate the fingerprint algorithm decomposes into two steps. **Both are versioned together** by the single lock content version — the version pins the *entire* fingerprint computation, not just the field list: +With the projection substrate the fingerprint algorithm decomposes into two steps. **Both are versioned together** by the single lock content version - the version pins the *entire* fingerprint computation, not just the field list: -1. **Projection** — `projectVN(config)` names and serializes the config fields version N measures. This is *about the config type*, but it is data extraction, not hashing: it returns canonical **bytes**, not a hash. -2. **Combiner / orchestration** — reads overlay file contents (needs `opctx.FS`), folds in source identity / releasever / bump, applies domain separation, and runs `sha256` over the projection bytes plus those non-config inputs. None of these are config fields, but the combiner equally decides *what is measured*: starting to fold in a new overlay source, adding an identity input, or reordering the fold all change the digest exactly as a projection change does. +1. **Projection** - `projectVN(config)` names and serializes the config fields version N measures. This is *about the config type*, but it is data extraction, not hashing: it returns canonical **bytes**, not a hash. +2. 
**Combiner / orchestration** - reads overlay file contents (needs `opctx.FS`), folds in source identity / releasever / bump, applies domain separation, and runs `sha256` over the projection bytes plus those non-config inputs. None of these are config fields, but the combiner equally decides *what is measured*: starting to fold in a new overlay source, adding an identity input, or reordering the fold all change the digest exactly as a projection change does. -So the per-version compute function in the registry is the **whole algorithm** — `computeFPN` = `projectVN` + the combiner step frozen at version N. "Watching another field" splits cleanly: if it is a *config* field, it goes in `projectV(N+1)`; if it is a *non-config* input (a new overlay source, a new identity element), it goes in the combiner half of `computeFP(N+1)`. Either way it is a content-version bump absorbed by replay, never a silent hash move. The combiner is the **sole version authority**: it owns the registry and the dispatch, and `projectVN` is just the frozen config-extraction step it calls. +So the per-version compute function in the registry is the **whole algorithm** - `computeFPN` = `projectVN` + the combiner step frozen at version N. "Watching another field" splits cleanly: if it is a *config* field, it goes in `projectV(N+1)`; if it is a *non-config* input (a new overlay source, a new identity element), it goes in the combiner half of `computeFP(N+1)`. Either way it is a content-version bump absorbed by replay, never a silent hash move. The combiner is the **sole version authority**: it owns the registry and the dispatch, and `projectVN` is just the frozen config-extraction step it calls. Expose the projection on (or beside) the config type and keep the combiner in `fingerprint`. 
**Do not** expose a `ConfigHash()` method on the type: a method that returns a finished hash both drags a hashing concern onto a data type *and* tempts callers to route around the version registry to get a raw, version-agnostic hash. Returning bytes from `projectVN` keeps the type ignorant of versioning and crypto. @@ -190,12 +190,12 @@ Expose the projection on (or beside) the config type and keep the combiner in `f The design has **two parts** with very different cost profiles: -1. **Part 1 — the reset (one coordinated cutover).** At the dev→prod cutover, swap the hashing substrate to canonical projection, declare the post-cutover projection as content-version **v1**, and spend the already-scheduled rebuild on every change that is *cheap now and a one-way door later* (the irreversible changes). Pre-reset locks already committed to **git history** stay readable and are never recomputed (the back-compat invariant below); a pre-reset lock in the **working tree** is force-rehashed to the `v1:` token on its first post-reset `update`. -2. **Part 2 — post-reset lazy migration (below).** A versioned registry + replay, now riding the *frozen* projection functions, absorbs the rare genuine algorithm change after the cutover, lazily and per-component, with no second coordinated cutover. +1. **Part 1 - the reset (one coordinated cutover).** At the dev→prod cutover, swap the hashing substrate to canonical projection, declare the post-cutover projection as content-version **v1**, and spend the already-scheduled rebuild on every change that is *cheap now and a one-way door later* (the irreversible changes). Pre-reset locks already committed to **git history** stay readable and are never recomputed (the back-compat invariant below); a pre-reset lock in the **working tree** is force-rehashed to the `v1:` token on its first post-reset `update`. +2. 
**Part 2 - post-reset lazy migration (below).** A versioned registry + replay, now riding the *frozen* projection functions, absorbs the rare genuine algorithm change after the cutover, lazily and per-component, with no second coordinated cutover. Part 1 cannot be made lazy: there is no way to make a substrate swap or a batch of one-way-door normalizations free, so they ride the one rebuild we are already paying for. Everything that *can* be lazy (additive fields) is pushed into Part 2 and costs nothing. -## Part 1 — The reset +## Part 1: The reset ### The projection substrate @@ -211,7 +211,7 @@ ComponentConfig ──projectV1(cfg)──► canonical bytes ──sha256── Three things this buys that `hashstructure` could not: -- **Frozen by construction.** A *superseded* `projectVN`'s body is fixed checked-in code (CI fails on any diff to it), so adding `Foo` to the struct later cannot change a historical `projectV1`'s output for an old config. (The *live* version's projector stays mutable for output-preserving additions — see [enforcement](#version-tagged-field-selection); "frozen" means `version < current`.) This is what makes Part 2's replay sound (Problem 6) and G4 true for *old* locks, not just new ones. +- **Frozen by construction.** A *superseded* `projectVN`'s body is fixed checked-in code (CI fails on any diff to it), so adding `Foo` to the struct later cannot change a historical `projectV1`'s output for an old config. (The *live* version's projector stays mutable for output-preserving additions - see [enforcement](#version-tagged-field-selection); "frozen" means `version < current`.) This is what makes Part 2's replay sound (Problem 6) and G4 true for *old* locks, not just new ones. - **No method-set / receiver magic.** No `Includable`, no per-nested-struct method, no `v.(reflect.Value)` type-assert footgun. Selection is a declarative tag the generator reads. 
- **Removal is a compile error; rename is byte-neutral.** A generated `projectVN` references each measured field by its literal Go path and emits a literal key, so deleting a field a retained version still measures won't compile, and renaming the Go field changes nothing. Golden vectors backstop the generator itself. @@ -219,7 +219,7 @@ The cost is owning the projection encoder and the golden vectors. That cost is p ### Version-tagged field selection -Field membership in each version's projection is declared **on the struct field**, as a version-set in the existing `fingerprint` tag. A `go generate` step reads those tags and **emits** a per-version `projectVN(cfg) []byte` function — the tags are the source of truth, the generated functions are the artifact. This is the chosen mechanism; a runtime reflective walker and hand-written functions are the [alternatives](#alternatives-considered). +Field membership in each version's projection is declared **on the struct field**, as a version-set in the existing `fingerprint` tag. A `go generate` step reads those tags and **emits** a per-version `projectVN(cfg) []byte` function - the tags are the source of truth, the generated functions are the artifact. This is the chosen mechanism; a runtime reflective walker and hand-written functions are the [alternatives](#alternatives-considered). 
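
As a rough illustration of what the generator has to decide per field, the sketch below parses a version-set tag such as `v1..v4,!v6..*` and answers "is this field measured at version N, and does its zero value emit?". The names `span`, `parseSet`, and `measuredAt` are hypothetical, and the handling of `key=` members (skipped) and of a bare `v3` (treated as the single version v3) is an assumption. It is a runtime toy for reading the tags, not the build-time generator itself.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// span is one range of a version set. hi == 0 stands for the open end "*".
type span struct {
	lo, hi int
	always bool // range was prefixed with "!" (always-emit)
}

// parseVersion accepts "v" followed by one or more digits.
func parseVersion(s string) (int, error) {
	if len(s) < 2 || s[0] != 'v' {
		return 0, fmt.Errorf("bad version %q", s)
	}
	for _, r := range s[1:] {
		if r < '0' || r > '9' {
			return 0, fmt.Errorf("bad version %q", s)
		}
	}
	n, err := strconv.Atoi(s[1:])
	if err != nil || n < 1 {
		return 0, fmt.Errorf("bad version %q", s)
	}
	return n, nil
}

// parseSet turns a fingerprint tag into ranges. "-" means never measured;
// an empty tag is an error because every field needs an explicit decision.
func parseSet(tag string) ([]span, error) {
	if tag == "" {
		return nil, fmt.Errorf("missing fingerprint decision")
	}
	if tag == "-" {
		return nil, nil
	}
	var out []span
	for _, part := range strings.Split(tag, ",") {
		if strings.HasPrefix(part, "key=") {
			continue // emit-key override; not modelled in this sketch
		}
		var r span
		if strings.HasPrefix(part, "!") {
			r.always = true
			part = part[1:]
		}
		loStr, hiStr, hasHi := strings.Cut(part, "..")
		lo, err := parseVersion(loStr)
		if err != nil {
			return nil, err
		}
		r.lo, r.hi = lo, lo
		if hasHi {
			if hiStr == "*" {
				r.hi = 0
			} else {
				hi, err := parseVersion(hiStr)
				if err != nil {
					return nil, err
				}
				if hi < lo {
					return nil, fmt.Errorf("empty range %q", part)
				}
				r.hi = hi
			}
		}
		out = append(out, r)
	}
	return out, nil
}

// measuredAt reports whether version v falls in any range, and whether
// that range is always-emit.
func measuredAt(set []span, v int) (included, always bool) {
	for _, r := range set {
		if v >= r.lo && (r.hi == 0 || v <= r.hi) {
			return true, r.always
		}
	}
	return false, false
}

func main() {
	set, err := parseSet("v1..v4,!v6..*")
	if err != nil {
		panic(err)
	}
	for _, v := range []int{4, 5, 6} {
		in, always := measuredAt(set, v)
		fmt.Println(v, in, always)
	}
}
```

The sketch does not reject overlapping ranges or check emit-key uniqueness; a real generator would presumably need both.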
**Grammar** (deliberately small): @@ -232,18 +232,18 @@ version = "v", digit, { digit } ; | Tag | Meaning | | --- | --- | -| *(absent)* | **build failure** — every fingerprinted field must carry an explicit decision | +| *(absent)* | **build failure** - every fingerprinted field must carry an explicit decision | | `-` | never measured (unchanged from today) | -| `v1..*` | measured from v1 onward, omit-if-zero — the common "active field" case | -| `v1..v4` | measured v1–v4, then dropped | +| `v1..*` | measured from v1 onward, omit-if-zero - the common "active field" case | +| `v1..v4` | measured v1-v4, then dropped | | `v3..*` | introduced at v3 | -| `v1..v4,v6..*` | measured v1–v4, **dropped at v5, brought back at v6** | +| `v1..v4,v6..*` | measured v1-v4, **dropped at v5, brought back at v6** | | `!v1..*` | measured v1 onward, **always-emit** (zero value still hashes) | -| `v1..v4,!v5..*` | omit-if-zero v1–v4, then **always-emit from v5** (the temporal toggle) | +| `v1..v4,!v5..*` | omit-if-zero v1-v4, then **always-emit from v5** (the temporal toggle) | -`*` resolves to "this version and every later one," so an *active* field never needs a tag edit across a version bump — only a field that is *dropped* at the bump gets its range closed (`v1..*` → `v1..vN`). +`*` resolves to "this version and every later one," so an *active* field never needs a tag edit across a version bump - only a field that is *dropped* at the bump gets its range closed (`v1..*` → `v1..vN`). -**The emit-key is the field's TOML key, frozen — never the Go identifier.** The generated function emits each field under a stable string key and sorts by it; that key is the field's **`toml:` name**, or — for a field with no usable TOML key — an explicit `key=` member in the `fingerprint` tag (grammar below). The generator emits it as a literal, so it is pinned as part of the frozen output. 
It is deliberately *not* the Go field name, so a cosmetic Go rename (`Foo`→`Bar`, same TOML key, same tag) is byte-neutral — making the [struct-rename drift-neutral claim](#config-schema-version-and-canonical-migration-future) true at the *field* level too, not just the type level. Renaming the *emit-key* is an output-changing edit and therefore a version bump like any other. **Duplicate emit-keys within one retained version fail generation** (two fields resolving to the same key would collide and alias — a silent G5 hazard), so the generator checks key uniqueness at every retained version. +**The emit-key is the field's TOML key, frozen - never the Go identifier.** The generated function emits each field under a stable string key and sorts by it; that key is the field's **`toml:` name**, or - for a field with no usable TOML key - an explicit `key=` member in the `fingerprint` tag (grammar below). The generator emits it as a literal, so it is pinned as part of the frozen output. It is deliberately *not* the Go field name, so a cosmetic Go rename (`Foo`→`Bar`, same TOML key, same tag) is byte-neutral - making the [struct-rename drift-neutral claim](#config-schema-version-and-canonical-migration-future) true at the *field* level too, not just the type level. Renaming the *emit-key* is an output-changing edit and therefore a version bump like any other. **Duplicate emit-keys within one retained version fail generation** (two fields resolving to the same key would collide and alias - a silent G5 hazard), so the generator checks key uniqueness at every retained version. **The grammar is frozen at three range-operators** (`..` range, `!` always-emit, `*` open-end), plus the orthogonal `key=` override: @@ -256,54 +256,54 @@ range = version, [ "..", ( version | "*" ) ] ; version = "v", digit, { digit } ; ``` -This mirrors protobuf's `reserved` field-range discipline. 
Adding a fourth *range*-operator is an RFC-grade change, not a tag edit — cheap insurance against a bespoke mini-language accreting edge cases. +This mirrors protobuf's `reserved` field-range discipline. Adding a fourth *range*-operator is an RFC-grade change, not a tag edit - cheap insurance against a bespoke mini-language accreting edge cases. **Recovery is the property that justifies the range syntax.** The hard requirement: if we drop a field, then versions later realize we need it again, we must be able to bring it back *without* disturbing any frozen historical hash. The rule that guarantees it: **you only ever *add* a range for the *new* version; you never edit a shipped version's membership.** Walk it: - `Foo` tagged `v1..*`. At the v2 bump we drop it → edit to `v1..v1`. v1 still emits `Foo`; v2+ does not. -- At v5 we need it back → edit to `v1..v1,v5..*`. **v1's membership is unchanged (still in the set), v2–v4 unchanged (still out), only v5+ is added.** +- At v5 we need it back → edit to `v1..v1,v5..*`. **v1's membership is unchanged (still in the set), v2-v4 unchanged (still out), only v5+ is added.** -Every frozen output is byte-preserved, and the **golden vectors prove it**: the edit `v1..v1` → `v1..v1,v5..*` must leave the v1–v4 vectors identical or CI fails. The grammar lets you *express* the non-contiguous set; the golden vectors *forbid* rewriting history while doing so. Two recovery flavors, both covered: (a) field still on the struct (lingering for replay) → reopen its range; (b) field already physically deleted (floor passed it) → bring-back is just a fresh additive field tagged `vN..*`. Same outcome, no special case. +Every frozen output is byte-preserved, and the **golden vectors prove it**: the edit `v1..v1` → `v1..v1,v5..*` must leave the v1-v4 vectors identical or CI fails. The grammar lets you *express* the non-contiguous set; the golden vectors *forbid* rewriting history while doing so. 
Two recovery flavors, both covered: (a) field still on the struct (lingering for replay) → reopen its range; (b) field already physically deleted (floor passed it) → bring-back is just a fresh additive field tagged `vN..*`. Same outcome, no special case. -**Always-emit is per-range, for the same reason.** Whether a field's *zero value emits* can change over time just as its membership can — so `!` flags an individual range, not the whole field. `v1..v4,!v5..*` means omit-if-zero through v4, then always-emit from v5. Toggling it is an *output-changing* edit (a zero-valued field starts or stops emitting), so it lands as a new range at a new version exactly like a drop/re-add — same output-preservation rule, same golden-vector backstop. The generator simply emits, for each version, whether that field's range is `!`. A whole-field always-flag could not express this temporal toggle. +**Always-emit is per-range, for the same reason.** Whether a field's *zero value emits* can change over time just as its membership can - so `!` flags an individual range, not the whole field. `v1..v4,!v5..*` means omit-if-zero through v4, then always-emit from v5. Toggling it is an *output-changing* edit (a zero-valued field starts or stops emitting), so it lands as a new range at a new version exactly like a drop/re-add - same output-preservation rule, same golden-vector backstop. The generator simply emits, for each version, whether that field's range is `!`. A whole-field always-flag could not express this temporal toggle. -**What tags version, and what they don't.** Tags version *membership* — which fields a version measures. They do **not** version *encoding* — how a field's bytes are formed, or how the combiner folds non-field inputs. So a pure tag edit (additive / removal / bring-back) regenerates with no hand-written code, while a genuine encoding or combiner change still ships as versioned code in `computeFP(N+1)` (the projection output + the combiner step frozen at N). 
The taxonomy's non-additive rows are exactly that small set. +**What tags version, and what they don't.** Tags version *membership* - which fields a version measures. They do **not** version *encoding* - how a field's bytes are formed, or how the combiner folds non-field inputs. So a pure tag edit (additive / removal / bring-back) regenerates with no hand-written code, while a genuine encoding or combiner change still ships as versioned code in `computeFP(N+1)` (the projection output + the combiner step frozen at N). The taxonomy's non-additive rows are exactly that small set. -**Escape hatch — the registry already is one; a per-field one is deferred.** The whole-function hatch is free: the registry is `map[int]computeFn` and does not care whether an entry was generated or hand-written, so a version the generator cannot express is simply *not generated* — you drop a hand-written `computeFPN` into the map instead. No new mechanism. A *per-field* hatch (massaging one field's encoding inside an otherwise-generated function — e.g. `fingerprint:"v1..*,enc=sortedSlice"`) is **deliberately not built now**: custom encoding is an *encoding* concern, which the rule above already routes through a versioned-code bump, and adding an `enc=` operator is an RFC-grade grammar change (the grammar is frozen at three range-operators + `key=`). Note the cost either way: a hand-written or hand-edited version drops back to **golden-vectors-only** — it loses regeneration-idempotence and the generator's completeness/coverage guards — so the hatch is for rare, deliberate cases, not routine use. +**Escape hatch - the registry already is one; a per-field one is deferred.** The whole-function hatch is free: the registry is `map[int]computeFn` and does not care whether an entry was generated or hand-written, so a version the generator cannot express is simply *not generated* - you drop a hand-written `computeFPN` into the map instead. No new mechanism. 
A *per-field* hatch (massaging one field's encoding inside an otherwise-generated function - e.g. `fingerprint:"v1..*,enc=sortedSlice"`) is **deliberately not built now**: custom encoding is an *encoding* concern, which the rule above already routes through a versioned-code bump, and adding an `enc=` operator is an RFC-grade grammar change (the grammar is frozen at three range-operators + `key=`). Note the cost either way: a hand-written or hand-edited version drops back to **golden-vectors-only** - it loses regeneration-idempotence and the generator's completeness/coverage guards - so the hatch is for rare, deliberate cases, not routine use. -> **What codegen freezes structurally, and what golden vectors backstop.** Generation reflects the live struct — but at *build time*, and its output is **frozen checked-in code**, so the runtime projection never reflects the live struct (the way the rejected `hashstructure` substrate did at hash time, Problem 6). Three things the tag alone does not pin are pinned by the generated code instead: the **emit-key** (a literal string), **field membership** (a literal field list per version), and **field removal** (a retained `projectVN` references the field by Go path, so deleting it won't compile). Two things the compiler cannot independently judge — the **per-field encoding/type** (whether the emitted bytes are *right*) and the **zero-predicate** — are caught by regeneration-idempotence (CI runs `go generate`; any diff to a retained `projectVN` fails) and ultimately by **golden vectors**, which catch a generator bug that would move a shipped version's bytes. Golden-vector coverage is therefore a *backstop* behind compiler + generator, not the sole load-bearing guarantee — the design keeps its structural guarantees structural. 
+> **What codegen freezes structurally, and what golden vectors backstop.** Generation reflects the live struct - but at *build time*, and its output is **frozen checked-in code**, so the runtime projection never reflects the live struct (the way the rejected `hashstructure` substrate did at hash time, Problem 6). Three things the tag alone does not pin are pinned by the generated code instead: the **emit-key** (a literal string), **field membership** (a literal field list per version), and **field removal** (a retained `projectVN` references the field by Go path, so deleting it won't compile). Two things the compiler cannot independently judge - the **per-field encoding/type** (whether the emitted bytes are *right*) and the **zero-predicate** - are caught by regeneration-idempotence (CI runs `go generate`; any diff to a retained `projectVN` fails) and ultimately by **golden vectors**, which catch a generator bug that would move a shipped version's bytes. Golden-vector coverage is therefore a *backstop* behind compiler + generator, not the sole load-bearing guarantee - the design keeps its structural guarantees structural. **Enforcement**, in order of strength: 1. **Compiler.** A generated `projectVN` references each measured field by literal Go path and emits a literal key: deleting a field a retained version measures won't compile, and the key cannot silently drift to the Go identifier. -2. **Generator (generate-time).** The generator **enumerates every field reachable from a fingerprinted root, recursing by field type** — this walk is the completeness guard, and it auto-discovers nested structs, replacing today's hand-maintained `fingerprintedStructs` list (which can silently go stale when a new nested type is added). 
It then refuses to emit on: a reached field with **no tag** (the include/exclude decision is mandatory — the generator must *fail* on an unrecognized field, never silently skip it, or a forgotten field drops out of the hash, a G5 hazard); a malformed, future-referencing, overlapping, or key-colliding tag set; or a `-`-tagged field absent from the **exclusion ledger** (an enumerated list — the surviving half of today's `expectedExclusions` — naming every `-` field with a justification, so an accidental exclusion fails generation). It also enforces the [coverage oracle](#golden-vector-coverage-the-backstop). -3. **Regeneration-idempotence.** CI runs `go generate` and fails on any diff to a **strictly-historical** `projectVN` (`version < currentLockContentVersion`) — a *superseded* version's emitted code cannot change without an intentional, diff-surfaced regeneration. The **live** version's `projectVN` is deliberately *mutable*: an output-preserving omit-if-zero addition regenerates it (a new `b.emit` line) and that diff is expected, not a violation. "Frozen" throughout this RFC means **superseded** (`version < current`), not "shipped" — a version freezes when the next one is registered, not the moment it first ships. -4. **Golden vectors** — the semantic backstop (next). +2. **Generator (generate-time).** The generator **enumerates every field reachable from a fingerprinted root, recursing by field type** - this walk is the completeness guard, and it auto-discovers nested structs, replacing today's hand-maintained `fingerprintedStructs` list (which can silently go stale when a new nested type is added). 
It then refuses to emit on: a reached field with **no tag** (the include/exclude decision is mandatory - the generator must *fail* on an unrecognized field, never silently skip it, or a forgotten field drops out of the hash, a G5 hazard); a malformed, future-referencing, overlapping, or key-colliding tag set; or a `-`-tagged field absent from the **exclusion ledger** (an enumerated list - the surviving half of today's `expectedExclusions` - naming every `-` field with a justification, so an accidental exclusion fails generation). It also enforces the [coverage oracle](#golden-vector-coverage-the-backstop). +3. **Regeneration-idempotence.** CI runs `go generate` and fails on any diff to a **strictly-historical** `projectVN` (`version < currentLockContentVersion`) - a *superseded* version's emitted code cannot change without an intentional, diff-surfaced regeneration. The **live** version's `projectVN` is deliberately *mutable*: an output-preserving omit-if-zero addition regenerates it (a new `b.emit` line) and that diff is expected, not a violation. "Frozen" throughout this RFC means **superseded** (`version < current`), not "shipped" - a version freezes when the next one is registered, not the moment it first ships. +4. **Golden vectors** - the semantic backstop (next). #### Golden-vector coverage: the backstop Compiler + generator + regeneration-idempotence carry the structural load; golden vectors are the semantic backstop that catches a *generator* bug moving a shipped version's bytes. Two properties make the backstop structural rather than discipline: - **Expected digests are hand-authored and never generator-emitted.** If the `(config, version) → digest` table were regenerated in lockstep with the projector code, a "delete-everything-and-regenerate" commit would move *both* the code and its own expected values, and the backstop would silently agree with itself. 
The expected digests are therefore hand-committed; `go generate` may scaffold *cases* but must never write the expected values. A mutation to any retained vector is a hard CI failure, not a moved line a reviewer must notice. -- **Retained-version manifest.** The generator validates each retained version against a checked-in manifest (the field set + emit-keys that version measures); generation fails if a retained version's entry lacks a compatible live path, *unless* it is below `minSupportedLockContentVersion`. This is what keeps field-removal structural under the normal *delete + regenerate + commit* workflow (where the compile guard alone would be bypassed) — so the negative test is **delete + regenerate + build**, not just delete. +- **Retained-version manifest.** The generator validates each retained version against a checked-in manifest (the field set + emit-keys that version measures); generation fails if a retained version's entry lacks a compatible live path, *unless* it is below `minSupportedLockContentVersion`. This is what keeps field-removal structural under the normal *delete + regenerate + commit* workflow (where the compile guard alone would be bypassed) - so the negative test is **delete + regenerate + build**, not just delete. The backstop is only as good as its corpus, so the corpus must exercise every field of every retained version: > **Coverage invariant:** every field whose tag set includes a retained version MUST appear, **non-zero**, in at least one retained golden vector, with a discrimination check that varies it and asserts the version's hash *moves*. A field never exercised non-zero is invisible to the backstop. -**The coverage oracle must be independent of the tag** (enforced by the generator). 
If the discrimination check derived *which versions to test* from the same range tag it polices, a wrong-narrow tag would silence its own check: a field build-effective today but mistagged `v1..v1` (while `currentLockContentVersion` is `v2`) would tell a tag-derived check "only check v1," so the promised v2 check never runs and the field silently drops out of the current hash (G5). The fix: **every fingerprinted field is expected to be measured-and-discriminating at the *current* version unless it appears in a reviewed *dropped-fields ledger*** (sibling to the `-` exclusion ledger, naming each field intentionally closed at version N). The oracle reads that ledger by struct-reflection, *not* the range tag — so a range that excludes the current version without a matching ledger entry fails generation. (PR-A acceptance criterion, with a unit test that the oracle catches the narrow-tag hole.) +**The coverage oracle must be independent of the tag** (enforced by the generator). If the discrimination check derived *which versions to test* from the same range tag it polices, a wrong-narrow tag would silence its own check: a field build-effective today but mistagged `v1..v1` (while `currentLockContentVersion` is `v2`) would tell a tag-derived check "only check v1," so the promised v2 check never runs and the field silently drops out of the current hash (G5). The fix: **every fingerprinted field is expected to be measured-and-discriminating at the *current* version unless it appears in a reviewed *dropped-fields ledger*** (sibling to the `-` exclusion ledger, naming each field intentionally closed at version N). The oracle reads that ledger by struct-reflection, *not* the range tag - so a range that excludes the current version without a matching ledger entry fails generation. (PR-A acceptance criterion, with a unit test that the oracle catches the narrow-tag hole.) 
Non-zero coverage alone is necessary but not sufficient; four more obligations close holes a single `"foo"`-valued vector leaves open, each a generator responsibility: - **`!`-zero behavior.** Dropping `!` silently stops emitting a build-meaningful zero (G5). Every retained `!` range needs a **zero-valued** discrimination vector. - **Encoding across the value space.** A `"foo"` vector misses an encoder change affecting only delimiter bytes, multibyte runes, or multi-entry slices/maps. Add per-encoder **property/fuzz** vectors. (Fails toward G1 over-drift except under a collision the length-prefixed form makes unlikely.) -- **nil-vs-empty is a resolver invariant — slices *and* maps.** Under `IsZero()` a nil slice/map omits and a non-nil empty one (`[]`, `{}`) emits, and `mergo.Merge` can yield either for the same intent. This is the one correctness assumption the design *preserves rather than proves*: \"resolution normalizes nil-vs-empty to one canonical form\" is a *distributed* property across every merge site. PR A must (a) name the **single resolver chokepoint** that owns canonicalization (so it is one enforced place, not a convention scattered across call sites), (b) **inventory** every fingerprint-sensitive slice/map field, and (c) write the resolver-side canonical-form test **first** — before the projection lands, or the over-drift is latent. (Verify the current resolver's behavior here is not yet confirmed to canonicalize.) +- **nil-vs-empty is a resolver invariant - slices *and* maps.** Under `IsZero()` a nil slice/map omits and a non-nil empty one (`[]`, `{}`) emits, and the resolver's `mergo.Merge(…, WithOverride, WithAppendSlice)` ([`component.go`](../../../internal/projectconfig/component.go) `MergeUpdatesFrom`, `ResolveComponentConfig`) can yield *either* for the same intent depending on merge order, with **no post-merge normalization today**. This is the one correctness assumption the design *preserves rather than proves*. 
PR A closes it structurally: (a) a single named chokepoint - `canonicalizeForFingerprint(cfg)` at the **end of `ResolveComponentConfig`** - owns nil-vs-empty normalization, so it is one enforced place, not a convention scattered across merge sites; (b) it carries an **inventory** of every fingerprint-sensitive slice/map field; and (c) its canonical-form test is written **first - before any golden vector is authored**, or a vector bakes in a non-deterministic encoding. (This and the scalar-slice row of the encoding table are the same question - settle them together.) This is the single most load-bearing PR-A gate. - **`!` on an all-zero nested struct emits.** `IsZero()` on a struct is true iff every sub-field is zero, so a `!`-tagged nested struct whose fields all resolve to zero would otherwise be omitted; the generated sub-projector treats a `!` range as "emit the (recursively projected) value even when the struct `IsZero`," so a build-meaningful all-zero struct still hashes. Covered by a zero-valued discrimination vector like any other `!` range. -- **Enumerator completeness.** The coverage corpus is checked against the **generator's own field enumeration** (above), so it cannot drift from what is measured — a newly-added field or nested struct that the generator reaches but the corpus does not cover fails the backstop, not just the generator. +- **Enumerator completeness.** The coverage corpus is checked against the **generator's own field enumeration** (above), so it cannot drift from what is measured - a newly-added field or nested struct that the generator reaches but the corpus does not cover fails the backstop, not just the generator. 
-### Baseline v1 — omit-if-zero, no include-always legacy
+### Baseline v1: omit-if-zero, no include-always legacy
 
-Because the reset rebuilds everything, there is **no pre-existing population to stay byte-compatible with.** That removes the single biggest constraint of the incremental plan: we do **not** need an `include-always` compatibility mode to preserve today's hashes. `projectV1` is the omit-if-zero projection from day one. There is no `computeFP1 = legacy include-always` entry to carry forever — the registry's floor *starts* at the clean projection.
+Because the reset rebuilds everything, there is **no pre-existing population to stay byte-compatible with.** That removes the single biggest constraint of the incremental plan: we do **not** need an `include-always` compatibility mode to preserve today's hashes. `projectV1` is the omit-if-zero projection from day one. There is no `computeFP1 = legacy include-always` entry to carry forever - the registry's floor *starts* at the clean projection.
 
 ```go
 // You write tags; a `go generate` step emits the per-version projection.
@@ -316,7 +316,7 @@ type ComponentConfig struct {
 	// … every fingerprinted field carries an explicit tag; absent ⇒ generation fails …
 }
 
-// GENERATED — do not edit. The body is a literal field list, so deleting a
+// GENERATED - do not edit. The body is a literal field list, so deleting a
 // field it names won't compile, and the emit-key is a literal string.
 func projectV1(c *ComponentConfig) []byte {
 	var b canonicalBuf
@@ -328,45 +328,63 @@ func projectV1(c *ComponentConfig) []byte {
 }
 ```
 
-**The generated encoding contract — frozen per version, fully specified before PR A.** The golden vectors bake the byte encoding in irreversibly at the reset, so every value type's serialization is a one-way door and must be pinned now, not discovered later.
The contract: +**The generated encoding contract - frozen per version, fully specified before PR A.** The golden vectors bake the byte encoding in irreversibly at the reset, so every value type's serialization is a one-way door and must be pinned now, not discovered later. The contract: -- **Composite omission is by *projected* emptiness, not raw `IsZero()`.** A nested struct or a map entry is `reflect.Value.IsZero()` only when **every** sub-field is zero — *including excluded (`fingerprint:"-"`) children* — so a global `IsZero()` predicate would leak: a measured composite whose only non-zero content is an excluded child is not `IsZero`, so it would emit, and the digest would move on a change that touched no *measured* input. (Real case: `ComponentConfig.Build` is measured but holds the excluded `Failure`/`Hints`; setting only `build.failure.expected` makes `Build` raw-non-zero while every measured sub-field is zero → false lock drift + a phantom release.) The rule: a composite is omitted when its **frozen sub-projector emits no measured bytes** (unless tagged `!`), *not* when the raw value `IsZero`. Scalars keep plain `IsZero()`. The coverage backstop gains the **inverse** check it was missing: a **negative discrimination vector** per `-` field (and per all-`-`-value map entry) that varies that field alone and asserts the digest does **not** move. -- **Maps emit in sorted-key order; an entry whose value projects empty still emits its key.** A naive `range` over a Go map is **non-deterministic** (randomized iteration), so a generated `b.emitMap` must sort entries by key and emit each as `:=:` under the field key — the one guarantee `hashstructure` gave for free that the projection must re-establish, else an unchanged config hashes differently across runs (intermittent spurious drift). 
**Map-key membership is itself measured:** an entry whose *value* projects to empty still emits its key, so `{"baz":{}}` ≠ `{}` (matching today's `hashstructure`, which hashes map keys). Tests: a fuzz vector (≥2 keys, varying insert order → identical digest) **and** a key-varying vector (add an empty-value entry → digest moves). The natural "set a non-zero value" vector would *not* exercise this, so it must be written explicitly. -- **Value-slot encoding is defined per type, not left to `%v`.** `bool` → `"true"`/`"false"`; integers → base-10; `[]T` → each element as its own length-prefixed sub-value in slice order (not a JSON blob); `map` → as above. A **named scalar type** (e.g. `fileutils.HashType`, `SpecSourceType`, `ComponentOverlayType`, `ReleaseCalculation`) encodes by its **underlying `reflect.Kind`** (named string/int/bool → the underlying kind) — these are measured fields, so they must *not* fail generation. Only genuinely un-encodable shapes (interfaces, generics, pointers to external types, `time.Time`/`[]byte`-style special-cases not present in today's measured graph) **fail generation** rather than fall back to a `fmt`-style encoding a dependency could change underneath us. The v1 encoding test enumerates every named scalar in the measured graph. -- **Nested struct values are emitted by a frozen per-version sub-projector, never by runtime reflection.** If a generated `projectV1` emitted a `[]ComponentOverlay` by delegating to a *live* reflective encoder, adding a field to `ComponentOverlay` later would change `projectV1`'s output at hash time — Problem 6 reborn one layer down. The generator therefore emits a literal per-version projector for each nested struct type too; element/value projectors are frozen exactly like top-level ones. 
-- **Recursion prunes at `fingerprint:"-"`, per edge.** The completeness walk descends only through *measured* fields and only into struct kinds (through slice-element, map-value, and pointer-element types), treating defined scalar types as leaves. A `-` tag stops the walk at that **edge** (the field isn't measured, so its subtree isn't either) — not at the type (the same type reached through an *included* field elsewhere is still enumerated there). This breaks the real `ComponentConfig → SourceConfigFile → … → ComponentConfig` cycle (both back-edges are already `-`), and a **visited-type memo** on the included graph guards against any future included-path cycle. An untagged field reached on an *included* edge fails generation (the mandatory-decision guard); fields under a `-` edge are never reached, so they need no tag. +- **Composite omission is by *projected* emptiness, not raw `IsZero()`.** A nested struct or a map/slice-of-struct entry is `reflect.Value.IsZero()` only when **every** sub-field is zero - *including excluded (`fingerprint:"-"`) children* - so a global `IsZero()` predicate would leak: a measured composite whose only non-zero content is an excluded child is not `IsZero`, so it would emit, and the digest would move on a change that touched no *measured* input. **This is a hazard the generator must design out, not a current bug:** today `ComponentConfig.Build` holds only the already-excluded `Failure`/`Hints` (both `fingerprint:"-"`), and today's `hashstructure` drops them *in its own walk*, so setting `build.failure.expected` moves no hash today - the leak appears only if the new projector naively reflects the parent through one global `IsZero()`. The rule that forecloses it: a composite is omitted when its **frozen sub-projector emits no measured bytes** (unless tagged `!`), *not* when the raw value `IsZero`. 
So the predicate **splits by kind: scalar leaves (including scalar slices like `[]string`) keep plain `IsZero()`; composites (nested struct, map, slice-of-struct) use projected emptiness** - see the **v1 encoding table** below for the per-type rule, including where slices and maps fall. The coverage backstop gains the **inverse** check it was missing: a **negative discrimination vector** per `-` field (and per all-`-`-value map entry) that varies that field alone and asserts the digest does **not** move. +- **Maps emit in sorted-key order; an entry whose value projects empty still emits its key.** A naive `range` over a Go map is **non-deterministic** (randomized iteration), so a generated `b.emitMap` must sort entries by key and emit each as `:=:` under the field key - the one guarantee `hashstructure` gave for free that the projection must re-establish, else an unchanged config hashes differently across runs (intermittent spurious drift). **Map-key membership is itself measured:** an entry whose *value* projects to empty still emits its key, so `{"baz":{}}` ≠ `{}` (matching today's `hashstructure`, which hashes map keys). Tests: a fuzz vector (≥2 keys, varying insert order → identical digest) **and** a key-varying vector (add an empty-value entry → digest moves). The natural "set a non-zero value" vector would *not* exercise this, so it must be written explicitly. +- **Value-slot encoding is defined per type, not left to `%v`.** `bool` → `"true"`/`"false"`; integers → base-10; `[]T` → each element as its own length-prefixed sub-value in slice order (not a JSON blob); `map` → as above. A **named scalar type** (e.g. `fileutils.HashType`, `SpecSourceType`, `ComponentOverlayType`, `ReleaseCalculation`) encodes by its **underlying `reflect.Kind`** (named string/int/bool → the underlying kind) - these are measured fields, so they must *not* fail generation. 
Only genuinely un-encodable shapes (interfaces, generics, pointers to external types, `time.Time`/`[]byte`-style special-cases not present in today's measured graph) **fail generation** rather than fall back to a `fmt`-style encoding a dependency could change underneath us. The v1 encoding test enumerates every named scalar in the measured graph. +- **Nested struct values are emitted by a frozen per-version sub-projector, never by runtime reflection.** If a generated `projectV1` emitted a `[]ComponentOverlay` by delegating to a *live* reflective encoder, adding a field to `ComponentOverlay` later would change `projectV1`'s output at hash time - Problem 6 reborn one layer down. The generator therefore emits a literal per-version projector for each nested struct type too; element/value projectors are frozen exactly like top-level ones. +- **Recursion prunes at `fingerprint:"-"`, per edge.** The completeness walk descends only through *measured* fields and only into struct kinds (through slice-element, map-value, and pointer-element types), treating defined scalar types as leaves. A `-` tag stops the walk at that **edge** (the field isn't measured, so its subtree isn't either) - not at the type (the same type reached through an *included* field elsewhere is still enumerated there). This breaks the real `ComponentConfig → SourceConfigFile → … → ComponentConfig` cycle (both back-edges are already `-`), and a **visited-type memo** on the included graph guards against any future included-path cycle. An untagged field reached on an *included* edge fails generation (the mandatory-decision guard); fields under a `-` edge are never reached, so they need no tag. -**Why omit-if-zero is safe — fingerprints see the resolved config.** The usual objection to blanket omit-if-zero is the false-negative footgun: a field whose zero is meaningful gets omitted and collides with "unset," so two semantically different configs hash the same and a rebuild is missed. 
That objection assumes we hash *raw user input*. We do not. `ComputeIdentity` runs on the **resolved, post-merge** config (`*result.config`, after defaults are applied). The omit predicate is therefore "the *resolved value* equals Go-zero," not "the user didn't type it." Consequences:
+**v1 encoding table (normative - every measured kind, pinned irreversibly at the reset).** This is the single source of truth the prose above points at; the golden vectors bake it in.
+
+| Go kind (v1 examples) | Encoded as | Omitted when | Notes |
+| --- | --- | --- | --- |
+| `string` (`upstream`) | raw bytes | `IsZero` (`""`) | scalar leaf |
+| `bool` (`strip-debug`) | `true` / `false` | `IsZero` (`false`), **unless `!`** | `!` emits the build-meaningful `false` |
+| `int` (`manual-bump`) | base-10 | `IsZero` (`0`) | scalar leaf |
+| named scalar (`SpecSourceType`, `fileutils.HashType`, `ComponentOverlayType`, `ReleaseCalculation`) | by **underlying `reflect.Kind`** | underlying `IsZero` | measured - must **not** fail generation |
+| `[]string` (`patches`) | length-prefixed elements, slice order | nil **or** empty | **scalar slice**: membership measured; nil≡`[]` collapsed by the resolver chokepoint (below), then pinned by a golden vector |
+| `[]Struct` (`[]ComponentOverlay`) | each element via its frozen sub-projector | no element projects bytes | **composite slice**: each element kept/dropped by *its own* projected emptiness |
+| `map[string]string` (`defines`) | sorted-key `:k=:v` | no entries | key membership measured (`{"k":""}` ≠ `{}`) |
+| `map[string]Struct` (`map[string]PackageConfig`) | sorted-key; value via frozen sub-projector | no entries | **excluded `-` in v1**; if ever included, this row governs |
+| nested struct (`build`, `spec`) | frozen per-version sub-projector | sub-projector emits no measured bytes, **unless `!`** | **projected** emptiness, not raw `IsZero` |
+| `*Struct` | follow if non-nil; element via sub-projector | nil, or points to a projected-empty value | |
+| interface / type param / `func` / `chan` / pointer-to-external / `time.Time`·`[]byte`-style | - | - | **fails generation** (no silent `fmt` fallback); none present in the v1 graph |
+
+**Cost of pruning at `-`, and its tripwire (G5 guard).** Excluding a composite also removes its subtree from the completeness walk, so a *future* build-effective field added under an excluded type would be **silently unmeasured**. `Packages map[string]PackageConfig` is excluded today because `PackageConfig` holds only the publish-only, `-`-tagged `Publish` - correct now, but `PackageConfig` is documented as growable. You cannot both kill the key-churn (needs excluding the map) *and* keep the per-leaf guard alive (needs an included edge), so the exclusion carries an **external tripwire**: a CI test asserting the excluded type's field set stays within its known-inert set (a new `PackageConfig` field fails CI → forces re-evaluation of the parent exclusion), recorded against its exclusion-ledger entry.
+
+**Why omit-if-zero is safe - fingerprints see the resolved config.** The usual objection to blanket omit-if-zero is the false-negative footgun: a field whose zero is meaningful gets omitted and collides with "unset," so two semantically different configs hash the same and a rebuild is missed. That objection assumes we hash *raw user input*. We do not. `ComputeIdentity` runs on the **resolved, post-merge** config (`*result.config`, after defaults are applied). The omit predicate is therefore "the *resolved value* equals Go-zero," not "the user didn't type it." Consequences:
 
 - Two configs that both resolve a field to zero build identically → hashing them the same is **correct**, not a collision.
-- "Unset" never reaches the hasher — it has already been resolved to its default. If the default is non-zero, the field is non-zero and is emitted anyway. If the default *is* zero, then unset and explicit-zero resolve identically → same build → same hash → correct.
+- "Unset" never reaches the hasher - it has already been resolved to its default. If the default is non-zero, the field is non-zero and is emitted anyway. If the default *is* zero, then unset and explicit-zero resolve identically → same build → same hash → correct. So the classic false-negative requires absence ≠ zero-default *at the point of hashing*, and post-merge resolution closes that gap. The load-bearing invariant is **G5's guarantee restated structurally: the fingerprint must see exactly the build-effective resolved config.** That invariant must already hold, or fingerprinting is broken independently of this change. A `!`-prefixed range is the escape hatch for the rare field whose zero value is build-meaningful. -**Result:** additive fields are drift-neutral **by construction** (G4) — a newly added field, listed omit-if-zero in `projectVN`, emits nothing for any component that does not set it, so it is invisible to every lock that leaves it unset, old or new. Adding it moves no existing hash (no shipped lock could have set a field that did not yet exist), so it needs no version bump. Only setters drift (G2). +**Result:** additive fields are drift-neutral **by construction** (G4) - a newly added field, listed omit-if-zero in `projectVN`, emits nothing for any component that does not set it, so it is invisible to every lock that leaves it unset, old or new. Adding it moves no existing hash (no shipped lock could have set a field that did not yet exist), so it needs no version bump. Only setters drift (G2). #### Edge cases under omit-if-zero -The omit predicate is **`reflect.Value.IsZero()`**, one global rule for every field (resolving former Open Q#3); `!` is the only per-field override. 
The consequences need stating because `IsZero` is type-specific: +The omit predicate **splits by kind** - scalar leaves use `reflect.Value.IsZero()`, composites (nested struct, map, slice-of-struct) use projected emptiness (the encoding contract above is the single source of truth); `!` is the only per-field override. A few `IsZero` consequences on the **scalar leaves** still need stating, because `IsZero` is type-specific: -- **Meaningful zero with a non-zero default** (e.g. `int Jobs` defaulting to `4`, where `0` means serial). Post-merge: unset → `4` (emitted), explicit `0` → omitted. These build differently *and* hash differently, so there is no collision — they are consistent. Use a `!` range only if a zero value must be distinguishable from a future change of default. -- **nil vs empty slice — they hash *differently* under `IsZero`.** A nil slice is zero → omitted; a non-nil empty slice (`[]`) is **not** zero → emitted. If post-merge resolution can produce *either* nil or `[]` for the same intent, that ambiguity would move a hash — so the rule is: **resolution must normalize to one canonical form**, and where an explicit-empty value is build-meaningful and reachable, tag the field `!` so nil and empty both emit and stay distinguishable. This is a constraint on the resolver, pinned by a golden vector, not a free-for-all. +- **Meaningful zero with a non-zero default** (e.g. `int Jobs` defaulting to `4`, where `0` means serial). Post-merge: unset → `4` (emitted), explicit `0` → omitted. These build differently *and* hash differently, so there is no collision - they are consistent. Use a `!` range only if a zero value must be distinguishable from a future change of default. +- **nil vs empty slice - they hash *differently* under `IsZero`.** A nil slice is zero → omitted; a non-nil empty slice (`[]`) is **not** zero → emitted. 
If post-merge resolution can produce *either* nil or `[]` for the same intent, that ambiguity would move a hash - so the rule is: **resolution must normalize to one canonical form**, and where an explicit-empty value is build-meaningful and reachable, tag the field `!` so nil and empty both emit and stay distinguishable. This is a constraint on the resolver, pinned by a golden vector, not a free-for-all. -### The reset load-out — what to spend the free rebuild on +### The reset load-out: what to spend the free rebuild on The reset rebuild is a budget. Spend it on the irreversible / cutover-only changes; **do not** spend it on anything Part 2 can do lazily for free. Priority order: 1. **Switch the substrate to canonical projection.** Foundational, one-way, enables everything else. (Above.) 2. **Establish `projectV1` as omit-if-zero with no include-always legacy.** The compatibility mode never enters the registry, so it never has to age out. -3. **Keep the lock *format* `Version` at `1` — the content-version token carries the reset.** The reset adds **no new TOML field** (the atomic token in item 4 reuses `InputFingerprint`) and touches **no** pinning field (`upstream-commit`, `import-commit`, `manual-bump`), so an old binary still parses a reset lock and reads everything it needs to *queue a build*. The substrate swap rides entirely on the content-version machinery (Part 2): pre-reset locks carry a legacy (prefix-less) token below the registry floor, and the reset is simply the **first forced upgrade** of the fleet to the `v1:` token. This also makes the one real mixed-toolchain risk self-correcting: if an old binary ever rewrites a reset lock with its legacy-substrate hash, the next new-binary run sees a sub-floor token and **force-rehashes** it back to `v1` — a clean forced upgrade, never silent corruption (next subsection). +3. 
**Keep the lock *format* `Version` at `1` - the content-version token carries the reset.** The reset adds **no new TOML field** (the atomic token in item 4 reuses `InputFingerprint`) and touches **no** pinning field (`upstream-commit`, `import-commit`, `manual-bump`), so an old binary still parses a reset lock and reads everything it needs to *queue a build*. The substrate swap rides entirely on the content-version machinery (Part 2): pre-reset locks carry a legacy (prefix-less) token below the registry floor, and the reset is simply the **first forced upgrade** of the fleet to the `v1:` token. This also makes the one real mixed-toolchain risk self-correcting: if an old binary ever rewrites a reset lock with its legacy-substrate hash, the next new-binary run sees a sub-floor token and **force-rehashes** it back to `v1` - a clean forced upgrade, never silent corruption (next subsection). 4. **Adopt an atomic, self-describing `v1:sha256:…` token** for the stored hash, so the version and the digest can never desync (closes the re-stamp/desync class of bug where the version field and the hash field are written independently). 5. **Unify on `sha256` everywhere**, retiring the `uint64`→decimal-string wart from the `hashstructure` era. One hash format, one encoding. 6. **Do every pending rename / default-normalization now.** Renaming a field, moving content between structs, or changing a baked-in default is a one-way door under Part 2 (it needs a version bump + replay); at the reset it is free because everything rebuilds anyway. This is where the schema-axis "hardest cases" get absorbed cheaply. -7. **Resolve each field's mandatory tag — and bank the free corrections.** The "absent ⇒ generation fails" rule forces a conscious decision on every fingerprinted field at the reset, which is the moment to *fix* existing mistakes for free. 
Concretely, tag `ComponentConfig.Packages` `fingerprint:"-"`: every `PackageConfig` field is publish-only (`Publish`, itself `-`), so the map measures nothing build-effective — yet today `hashstructure` hashes its *keys*, so adding a publish-only package name already triggers a spurious rebuild. Excluding it at the reset retires that existing G1 churn at zero cost. Audit the whole struct for the same pattern (a measured composite whose every leaf is `-`). +7. **Resolve each field's mandatory tag - and bank the free corrections.** The "absent ⇒ generation fails" rule forces a conscious decision on every fingerprinted field at the reset, which is the moment to *fix* existing mistakes for free. Concretely, tag `ComponentConfig.Packages` `fingerprint:"-"`: every `PackageConfig` field is publish-only (`Publish`, itself `-`), so the map measures nothing build-effective - yet today `hashstructure` hashes its *keys*, so adding a publish-only package name already triggers a spurious rebuild. Excluding it at the reset retires that existing G1 churn at zero cost. Audit the whole struct for the same pattern (a measured composite whose every leaf is `-`). -**Anti-goal:** do *not* burn reset budget on additive fields — Part 2 handles those for free, forever. The success criterion for the load-out is that **no *routine* change ever forces a second coordinated cutover**: after the reset, every ordinary change must be expressible as either a free additive field or a lazy Part 2 version bump. Retiring an *old* content version is the one sanctioned exception — a fleet-wide `component migrate` is itself a deliberate, planned, reset-grade event (see [Registry floor](#registry-floor-and-forced-migration)); the goal is that nothing *unplanned* ever forces one. +**Anti-goal:** do *not* burn reset budget on additive fields - Part 2 handles those for free, forever. 
The success criterion for the load-out is that **no *routine* change ever forces a second coordinated cutover**: after the reset, every ordinary change must be expressible as either a free additive field or a lazy Part 2 version bump. Retiring an *old* content version is the one sanctioned exception - a fleet-wide `component migrate` is itself a deliberate, planned, reset-grade event (see [Registry floor](#registry-floor-and-forced-migration)); the goal is that nothing *unplanned* ever forces one. -### The lock changes at the reset — atomic token + forced upgrade +### The lock changes at the reset: atomic token + forced upgrade The stored hash becomes a single self-describing token: @@ -376,7 +394,7 @@ input-fingerprint = "v1:sha256:9f86d0…" # :: One field carries both the content version and the digest, so they cannot be written out of step (the desync bug a split version/digest field invites). Parsing splits on `:`; an absent prefix on a pre-reset lock reads as the legacy format. -The lock **format** `Version` stays at `1`. The on-disk *schema* is unchanged — same fields, same TOML shape — so an old binary still parses a reset lock and reads its pins (`upstream-commit`, `import-commit`, `manual-bump`), which is all it needs to queue a build. What changes is the *value* of `InputFingerprint`: the substrate swap is expressed purely as a content-version step, and the reset is the **first forced upgrade** to the `v1:` token. The existing singleton `Parse` gate (`Version == 1`) is left untouched; all substrate/version reconciliation routes through the content-version registry instead of a format gate. +The lock **format** `Version` stays at `1`. The on-disk *schema* is unchanged - same fields, same TOML shape - so an old binary still parses a reset lock and reads its pins (`upstream-commit`, `import-commit`, `manual-bump`), which is all it needs to queue a build. 
What changes is the *value* of `InputFingerprint`: the substrate swap is expressed purely as a content-version step, and the reset is the **first forced upgrade** to the `v1:` token. The existing singleton `Parse` gate (`Version == 1`) is left untouched; all substrate/version reconciliation routes through the content-version registry instead of a format gate. Recovery from a sub-`v1` token is the **same mechanism** as the reset itself: a token with no `v:` prefix (or a version below `minSupportedLockContentVersion`) cannot be replayed, so it is treated as `Stale` and **force-rehashed** to the current version on the next `update`. One code path unifies three cases: @@ -386,7 +404,7 @@ Recovery from a sub-`v1` token is the **same mechanism** as the reset itself: a This is the one place back-compatibility is load-bearing, and it is satisfied without a format bump: old binaries read pins and build; the fingerprint value reconciles by version. See the next section for why reading *historical* locks never needs to recompute their hash at all. -### Back-compat invariant — synthetic history reads stored strings, never recomputes +### Back-compat invariant: synthetic history reads stored strings, never recomputes The reset is only safe because of a property of the codebase verified against the source: **nothing that reads a *historical* lock ever recomputes a fingerprint for it.** Every historical reader compares the *stored* hash strings; the only code that recomputes a fingerprint does so for the **current working tree against HEAD**, never against an arbitrary past commit. 
Concretely: @@ -394,21 +412,21 @@ The reset is only safe because of a property of the codebase verified against th | ------ | ----------------------------------- | ----------- | | `synthistory.FindFingerprintChanges` | walks `lockfile.ShowAtCommit`→`Parse`, compares `InputFingerprint` *strings* between adjacent commits | No | | `synthistory.BuildDirtyChange` | compares the precomputed current fingerprint to HEAD's stored string | No (HEAD only) | -| `sourceprep.computeCurrentFingerprint` | the *only* `ComputeIdentity` call on this surface — computes for the **current tree**, compares to HEAD's stored hash | Current tree only | +| `sourceprep.computeCurrentFingerprint` | the *only* `ComputeIdentity` call on this surface - computes for the **current tree**, compares to HEAD's stored hash | Current tree only | -The consequence: **swapping the substrate is invisible to synthetic history.** A pre-reset (legacy-token) lock and a post-reset `v1:` lock are just two different opaque strings at two different commits; the walker reports "changed" across the reset commit (correct — it *is* a notable, deliberate, fleet-wide event, the coordinated cutover) and never tries to recompute either side. Applying historic overlays likewise reads stored lock fields and needs no hash recomputation. +The consequence: **swapping the substrate is invisible to synthetic history.** A pre-reset (legacy-token) lock and a post-reset `v1:` lock are just two different opaque strings at two different commits; the walker reports "changed" across the reset commit (correct - it *is* a notable, deliberate, fleet-wide event, the coordinated cutover) and never tries to recompute either side. Applying historic overlays likewise reads stored lock fields and needs no hash recomputation. > **Invariant (must hold forever):** synthetic history and historic-overlay application operate on **stored lock fields only.** No reader recomputes a fingerprint for a historical commit. 
This is precisely what lets a frozen `projectVN` be *forward-only*: it never has to reproduce a hash from a different substrate generation, only hashes the lock that the *current* binary writes. A future change that recomputes a historical fingerprint would break this and must be rejected in review. -This invariant — no reader recomputes a historical fingerprint — is the complete back-compatibility story: **new-reads-old by string, never-recompute-old by algorithm.** The lock *format* never bumps, so old and new binaries parse every lock identically; only the *interpretation* of the fingerprint value evolves, and that rides the content-version registry. +This invariant - no reader recomputes a historical fingerprint - is the complete back-compatibility story: **new-reads-old by string, never-recompute-old by algorithm.** The lock *format* never bumps, so old and new binaries parse every lock identically; only the *interpretation* of the fingerprint value evolves, and that rides the content-version registry. -## Part 2 — Post-reset lazy migration +## Part 2: Post-reset lazy migration -The reset gives us a clean, frozen substrate. Part 2 is the machinery that rides it for the rare genuine algorithm change *after* the cutover — lazily, per-component, with no second coordinated cutover. This is the original "lazy" design, now sound because `projectVN` is genuinely frozen. +The reset gives us a clean, frozen substrate. Part 2 is the machinery that rides it for the rare genuine algorithm change *after* the cutover - lazily, per-component, with no second coordinated cutover. This is the original "lazy" design, now sound because `projectVN` is genuinely frozen. ### Versioned lock content with lazy replay (algorithm changes) -Stamp one **lock content-hash version** into the lock (the `v1:` prefix of the atomic token) and teach the freshness check to **replay** older versions. 
The version governs *both* stored hashes (`InputFingerprint` and `ResolutionInputHash`) — they live in one lock, share one write event, and a single integer is the natural fit (see [scope note](#both-hashes-share-one-version) for why one version, not two): +Stamp one **lock content-hash version** into the lock (the `v1:` prefix of the atomic token) and teach the freshness check to **replay** older versions. The version governs *both* stored hashes (`InputFingerprint` and `ResolutionInputHash`) - they live in one lock, share one write event, and a single integer is the natural fit (see [scope note](#both-hashes-share-one-version) for why one version, not two): 1. The content version lives in the atomic `v:sha256:…` token (it is **not** the lock *format* `Version`, which stays at `1`). The registry floor *starts* at `1` = the projection baseline; there is no legacy pre-projection algorithm in the registry, because pre-reset locks are never replayed (they are read-only history, per the invariant above). A pre-reset lock's prefix-less token is therefore *below* the floor and reconciled by force-rehash, not replay. 2. Turn the combiner into a thin dispatcher over a small registry of historical algorithms, keyed by version. Each entry pairs the two compute functions; when only one algorithm changes, the other slot **reuses** the prior function (no version-neutral hash moves for the untouched one). Keep versions back to a declared floor (see [Registry floor](#registry-floor-and-forced-migration)): @@ -439,26 +457,26 @@ Stamp one **lock content-hash version** into the lock (the `v1:` prefix of the a } ``` -3. In `checkFingerprintFreshness`, compute at the **current** version. On mismatch, if the lock's token version `< current`, recompute at the lock's token version. If *that* matches the stored digest, the inputs are unchanged and only the algorithm evolved → treat as `FreshnessCurrent` and flag for silent re-stamp. Otherwise → `FreshnessStale`. 
(The resolution hash reuses `computeRes1` until its algorithm first changes — see scope note.) +3. In `checkFingerprintFreshness`, compute at the **current** version. On mismatch, if the lock's token version `< current`, recompute at the lock's token version. If *that* matches the stored digest, the inputs are unchanged and only the algorithm evolved → treat as `FreshnessCurrent` and flag for silent re-stamp. Otherwise → `FreshnessStale`. (The resolution hash reuses `computeRes1` until its algorithm first changes - see scope note.) 4. `component update` re-stamps the token to the **current** version **only when it is already writing for an independent reason** (see the churn policy below). Migration is therefore **lazy and per-component**: a lock upgrades only when something independently touches it. This resolves Problems 2 (for default changes), 3 (hashing bugfixes), and 5 (piecemeal rollout). It is the same lazy-forward-migration pattern Cargo/npm use, specialized to a content hash. #### Both hashes share one version -`ComponentLock` carries two persisted content hashes: `InputFingerprint` (render inputs, via `projectVN` + `sha256`) and `ResolutionInputHash` (upstream-resolution inputs — a flat SHA256 over seven explicit fields in `ComputeResolutionHash`). Both have the **same evolution problem**: appending an input or reordering the fold moves every lock's hash → G1 churn. +`ComponentLock` carries two persisted content hashes: `InputFingerprint` (render inputs, via `projectVN` + `sha256`) and `ResolutionInputHash` (upstream-resolution inputs - a flat SHA256 over seven explicit fields in `ComputeResolutionHash`). Both have the **same evolution problem**: appending an input or reordering the fold moves every lock's hash → G1 churn. 
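
The evolution problem named above can be seen in a small sketch. It assumes a length-prefixed fold over ordered string fields; `foldFields` and the field values are hypothetical stand-ins, and the real `ComputeResolutionHash` may encode its seven inputs differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// foldFields is a hypothetical flat, ordered hash over explicit inputs.
// It is NOT the real ComputeResolutionHash, only an illustration of the shape.
func foldFields(fields ...string) string {
	h := sha256.New()
	var lenBuf [8]byte
	for _, f := range fields {
		// Length-prefix each field so adjacent values cannot run together.
		binary.BigEndian.PutUint64(lenBuf[:], uint64(len(f)))
		h.Write(lenBuf[:])
		h.Write([]byte(f))
	}
	return "sha256:" + hex.EncodeToString(h.Sum(nil))
}

func main() {
	base := foldFields("upstream-url", "branch", "distro")

	// Appending an input moves the digest even when the new input is unset.
	appended := foldFields("upstream-url", "branch", "distro", "")

	// Reordering the fold moves the digest even though the values are identical.
	reordered := foldFields("branch", "upstream-url", "distro")

	fmt.Println(base == appended)  // false
	fmt.Println(base == reordered) // false
}
```

Under this assumption neither change can be made in place without moving every stored digest, which is why the resolution hash needs the same versioned replay as the fingerprint.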
-We version them with **one shared integer**, not two axes, because: they co-locate in a single lock, they are written in the same `update` pass, and a paired registry lets either evolve independently while the other reuses its prior function. Two separate version axes would double the floor/replay/migrate machinery for an input set (`ResolutionInputHash`) that changes rarely — YAGNI. +We version them with **one shared integer**, not two axes, because: they co-locate in a single lock, they are written in the same `update` pass, and a paired registry lets either evolve independently while the other reuses its prior function. Two separate version axes would double the floor/replay/migrate machinery for an input set (`ResolutionInputHash`) that changes rarely - YAGNI. -**`InputFingerprint` is the sole prefix authority; `ResolutionInputHash` stays bare.** The shared version is physically stored **only** in `InputFingerprint`'s `v:` prefix. `ResolutionInputHash` carries **no prefix** — it remains a bare `sha256:` digest. This is the decisive choice that prevents a cross-field desync: the **first fingerprint-only `v2`** already advances the shared prefix, and if `ResolutionInputHash` *also* carried it, `resolver.go`'s raw string compare of the whole field would see a prefix-only move (`v1:…X` → `v2:…X`, `computeRes1` unchanged) and mark resolution stale → fleet-wide re-resolution for nothing. With the prefix living only in `InputFingerprint`, the resolver compares a bare digest that does not move on a fingerprint bump. The "shared version" therefore means: the integer in `InputFingerprint`'s prefix selects which `computeResN` produced `ResolutionInputHash` during replay (read from the one prefix), not that resolution stores its own copy. This also keeps `InputFingerprint` the only release-bearing field, so the historical changelog/classifier comparators — which compare the **digest**, stripping the `v:` prefix — never see a phantom move on a version-only re-stamp. 
(See [the synthetic-history path](#the-synthetic-changelogrelease-path-is-the-real-hazard).) +**`InputFingerprint` is the sole prefix authority; `ResolutionInputHash` stays bare.** The shared version is physically stored **only** in `InputFingerprint`'s `v:` prefix. `ResolutionInputHash` carries **no prefix** - it remains a bare `sha256:` digest. This is the decisive choice that prevents the *fingerprint-bump* desync: the **first fingerprint-only `v2`** already advances the shared prefix, and if `ResolutionInputHash` *also* carried it, `resolver.go`'s raw string compare of the whole field would see a prefix-only move (`v1:…X` → `v2:…X`, `computeRes1` unchanged) and mark resolution stale → fleet-wide re-resolution for nothing. With the prefix living only in `InputFingerprint`, the resolver compares a bare digest that does not move on a fingerprint bump. (This closes the fingerprint-bump direction only; the *symmetric* resolution-only-write desync stays dormant until `computeRes2` and is held by the structural tripwire in [`ResolutionInputHash`](#resolutioninputhash-bare-digest-replay-deferred).) The "shared version" therefore means: the integer in `InputFingerprint`'s prefix selects which `computeResN` produced `ResolutionInputHash` during replay (read from the one prefix), not that resolution stores its own copy. This also keeps `InputFingerprint` the only release-bearing field, so the historical changelog/classifier comparators - which compare the **digest**, stripping the `v:` prefix - never see a phantom move on a version-only re-stamp. (See [the synthetic-history path](#the-synthetic-changelogrelease-path-is-the-real-hazard).) -**Phasing.** The atomic token format (`v:sha256:…`) is fixed at the reset. 
Fingerprint replay is wired in Part 2's first PR; **resolution-hash replay is reserved, not yet wired** — the slot exists and `computeRes1` is reused, so the day `ComputeResolutionHash` first changes we add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`). Because `ResolutionInputHash` is bare and prefix-free, a fingerprint-only bump before that day is a no-op for the resolver — the deferral is genuinely safe, not merely small-blast-radius. See [`ResolutionInputHash`](#resolutioninputhash--bare-digest-replay-deferred). +**Phasing.** The atomic token format (`v:sha256:…`) is fixed at the reset. Fingerprint replay is wired in Part 2's first PR; **resolution-hash replay is reserved, not yet wired** - the slot exists and `computeRes1` is reused, so the day `ComputeResolutionHash` first changes we add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`). Because `ResolutionInputHash` is bare and prefix-free, a fingerprint-only bump before that day is a no-op for the resolver - the deferral is genuinely safe, not merely small-blast-radius. See [`ResolutionInputHash`](#resolutioninputhash-bare-digest-replay-deferred). #### Churn-avoidance policies (G1) -The version stamp is itself a potential source of spurious diffs — the exact thing G1 forbids. The rule that prevents it is one idea: **judge "changed?" by replaying the lock's *own* version, not the current one.** Everything below follows from that. +The version stamp is itself a potential source of spurious diffs - the exact thing G1 forbids. The rule that prevents it is one idea: **judge "changed?" by replaying the lock's *own* version, not the current one.** Everything below follows from that. 
-**Why the obvious approach is wrong.** Today `update.go` sets `result.Changed = true` the instant `lock.InputFingerprint != identity.Fingerprint`, where `identity` is computed at the **current** version. That comparison sits *upstream* of the write guard `if !result.Changed && !resHashChanged { return false, nil }`. So the moment you ship a v1→v2 *algorithm* change, the current-version hash differs from every stored v1 token, `Changed` flips for **~every component at once**, and you get the mass auto-release-bump + mass lock rewrite G1 exists to prevent. The version stamp cannot "harmlessly ride the `Changed` path" — it *triggers* it. +**Why the obvious approach is wrong.** Today `update.go` sets `result.Changed = true` the instant `lock.InputFingerprint != identity.Fingerprint`, where `identity` is computed at the **current** version. That comparison sits *upstream* of the write guard `if !result.Changed && !resHashChanged { return false, nil }`. So the moment you ship a v1→v2 *algorithm* change, the current-version hash differs from every stored v1 token, `Changed` flips for **~every component at once**, and you get the mass auto-release-bump + mass lock rewrite G1 exists to prevent. The version stamp cannot "harmlessly ride the `Changed` path" - it *triggers* it. **The fix: replay before you compare.** Recompute at the lock's recorded version first, and only call it changed if *that* disagrees: @@ -473,38 +491,38 @@ if lock.InputFingerprint != replayed.Token() { // algorithm moved → NOT Changed. // Re-stamp to the current version ONLY when the lock is already dirty for a -// real reason — the version upgrade piggybacks a real write, never triggers one. +// real reason - the version upgrade piggybacks a real write, never triggers one. 
if result.Changed { lock.InputFingerprint = identity.Token() // current version + digest, written together } ``` -This makes migration strictly **opportunistic**: a lock advances its version the next time its component changes for real, and not one commit sooner. Because the version lives *inside* the atomic token, a lock at `v1` with unchanged inputs keeps its exact `v1:sha256:…` bytes — there is no separate version field to materialize and no zero-diff bookkeeping. (When resolution replay is wired, the same replay-before-compare guards the `resHashChanged` write.) +This makes migration strictly **opportunistic**: a lock advances its version the next time its component changes for real, and not one commit sooner. Because the version lives *inside* the atomic token, a lock at `v1` with unchanged inputs keeps its exact `v1:sha256:…` bytes - there is no separate version field to materialize and no zero-diff bookkeeping. (When resolution replay is wired, the same replay-before-compare guards the `resHashChanged` write.) 
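
Because the version lives inside the token, every decision about a lock starts with parsing that one string. The following is a minimal sketch of that step, not the actual implementation: `token`, `parseToken`, `classify`, and the three action names are hypothetical, and the constant values are placeholders. Only the constant names, the `v1:sha256:…` shape, and the three outcomes (compare at the lock's own version, force-rehash below the floor, refuse above the current version) come from this RFC.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Placeholder values; the names follow the RFC, the numbers are illustrative.
const (
	minSupportedLockContentVersion = 1
	currentLockContentVersion      = 2
)

// token is a parsed InputFingerprint value (hypothetical type).
// Version 0 stands for a legacy, prefix-less pre-reset value.
type token struct {
	Version int
	Digest  string
}

// parseToken splits on the first ':'; a missing "v<number>" prefix reads as legacy.
func parseToken(s string) (token, error) {
	if s == "" {
		return token{}, errors.New("empty fingerprint")
	}
	head, rest, found := strings.Cut(s, ":")
	if found && len(head) > 1 && head[0] == 'v' {
		// ParseUint rejects signs, so "v+1" and "v-1" fall through to legacy.
		if n, err := strconv.ParseUint(head[1:], 10, 31); err == nil && n > 0 {
			if rest == "" {
				return token{}, errors.New("versioned token has no digest")
			}
			return token{Version: int(n), Digest: rest}, nil
		}
	}
	return token{Version: 0, Digest: s}, nil
}

type action int

const (
	compareAtOwnVersion action = iota // in range: replay at the token's version
	forceRehash                       // legacy or sub-floor: treat as stale
	refuseWrite                       // newer than this binary understands
)

// classify maps a parsed token onto the three outcomes described in the text.
func classify(t token) action {
	switch {
	case t.Version > currentLockContentVersion:
		return refuseWrite
	case t.Version < minSupportedLockContentVersion:
		return forceRehash
	default:
		return compareAtOwnVersion
	}
}

func main() {
	for _, s := range []string{"v1:sha256:9f86d0", "sha256:9f86d0", "v3:sha256:9f86d0"} {
		t, err := parseToken(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> version %d, action %d\n", s, t.Version, classify(t))
	}
}
```

Keeping the parse and the classification separate means the bare `ResolutionInputHash` can be compared as a plain string without going through `parseToken` at all.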
-**The unavoidable flip side — false-fresh on a newly-measured input.** "Replay at the lock's own version" is what buys churn-avoidance, but it is the *same* property that creates a blind spot, because replaying `computeFP(old)` is **blind to any input that version did not measure.** Concretely, when v2 starts folding in an input v1 never touched (the [*Newly measured input*](#change-taxonomy) row): +**The unavoidable flip side - false-fresh on a newly-measured input.** "Replay at the lock's own version" is what buys churn-avoidance, but it is the *same* property that creates a blind spot, because replaying `computeFP(old)` is **blind to any input that version did not measure.** Concretely, when v2 starts folding in an input v1 never touched (the [*Newly measured input*](#change-taxonomy) row): - A change to that **new** input on a still-`v1` lock replays at v1, which ignores it → digest still matches → **`Changed = false`** → the change is silently treated as fresh. -- The new input only takes effect on that lock when the lock migrates to v2 — i.e. the next time it is dirtied for an *independent* reason, or via `component migrate`. +- The new input only takes effect on that lock when the lock migrates to v2 - i.e. the next time it is dirtied for an *independent* reason, or via `component migrate`. -This is correct *by contract* (a v1 lock promises freshness under the v1 input set, which excludes the new input), and harmless for a cosmetic input. But for a **build-critical** new input it is a latent-stale hazard: artifacts can lag the new input by an unbounded number of commits. **Decision rule:** if a newly-measured input must take effect fleet-wide immediately, do **not** rely on lazy adoption — pair the version bump with a deliberate `component migrate` (see [Registry floor and forced migration](#registry-floor-and-forced-migration)). Lazy adoption is the default; `component migrate` is the opt-in for inputs that cannot wait. 
+This is correct *by contract* (a v1 lock promises freshness under the v1 input set, which excludes the new input), and harmless for a cosmetic input. But for a **build-critical** new input it is a latent-stale hazard: artifacts can lag the new input by an unbounded number of commits. **Decision rule:** if a newly-measured input must take effect fleet-wide immediately, do **not** rely on lazy adoption - pair the version bump with a deliberate `component migrate` (see [Registry floor and forced migration](#registry-floor-and-forced-migration)). Lazy adoption is the default; `component migrate` is the opt-in for inputs that cannot wait. #### Registry floor and forced migration Lazy migration means an untouched lock can sit at an old version **indefinitely** (G3 by design). That makes "keep the last *N* versions" a **correctness cliff, not a tuning knob**: if pruning drops the compute function a lock still depends on, replay becomes impossible → forced `FreshnessStale` → the mass rebuild/rewrite (and, via the downstream-consumer analysis below, mass changelog churn) the whole design exists to avoid. So the floor must be explicit and paired with an escape hatch, decided now: -- **`minSupportedLockContentVersion`** is a hard floor. A lock below it cannot be replayed and is treated as `Stale`. Dropping a registry entry is therefore a deliberate, breaking, announced act — never incidental cleanup. -- **`component migrate`** (Open Q#5, promoted to a requirement) force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception — it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. 
**Contract:** it is *offline* — it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). It *does*, however, move every *fingerprint* digest when it retires a fingerprint algorithm, so a fleet-wide migrate of that kind **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved digest as notable, exactly as [the synthetic-history trap](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. (A migrate that retires only a *resolution* algorithm rewrites only the bare, prefix-free `ResolutionInputHash` — which `synthistory` never reads — so it is correctly release-silent.) Migrate is therefore rare: the release churn is the deliberate cost of retiring a version. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal — each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). -- **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine — left alone, the registry, golden vectors, and deprecated tombstone fields grow **append-only** (a real cost the opaque-token model accepts; see the manifest alternative). Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion − minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. The spread, not the absolute version number, is the quantity kept small. 
**Early-warning ramp:** the ceiling is a *warning at ceiling−1*, a hard failure only at the ceiling — so an approaching floor-raise surfaces as a heads-up on the PR *before* the one that registers `v(N+1)`, converting the forced migrate from a surprise blocking failure into a planned event (the design's goal that nothing *unplanned* ever forces a migrate). **Residual:** if genuine algorithm changes arrive *faster* than planned rebuilds, the ceiling still ultimately *forces* an unplanned, release-grade `component migrate`. The ceiling does not eliminate the expensive event; it bounds the backlog by *converting* an unbounded version spread into an occasional forced migrate, with one version of advance notice. This is the accepted cost of lazy-forever coexistence. +- **`minSupportedLockContentVersion`** is a hard floor. A lock below it cannot be replayed and is treated as `Stale`. Dropping a registry entry is therefore a deliberate, breaking, announced act - never incidental cleanup. +- **`component migrate`** force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception - it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. **Contract:** it is *offline* - it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). 
It *does*, however, move every *fingerprint* digest when it retires a fingerprint algorithm, so a fleet-wide migrate of that kind **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved digest as notable, exactly as [the synthetic-history trap](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. (A migrate that retires only a *resolution* algorithm rewrites only the bare, prefix-free `ResolutionInputHash` - which `synthistory` never reads - so it is correctly release-silent.) Migrate is therefore rare: the release churn is the deliberate cost of retiring a version. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal - each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). +- **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine - left alone, the registry, golden vectors, and deprecated tombstone fields grow **append-only** (a real cost the opaque-token model accepts; see the manifest alternative). Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion - minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. The spread, not the absolute version number, is the quantity kept small. **Early-warning ramp:** the ceiling is a *warning at ceiling-1*, a hard failure only at the ceiling - so an approaching floor-raise surfaces as a heads-up on the PR *before* the one that registers `v(N+1)`, converting the forced migrate from a surprise blocking failure into a planned event (the design's goal that nothing *unplanned* ever forces a migrate). 
**Residual:** if genuine algorithm changes arrive *faster* than planned rebuilds, the ceiling still ultimately *forces* an unplanned, release-grade `component migrate`. The ceiling does not eliminate the expensive event; it bounds the backlog by *converting* an unbounded version spread into an occasional forced migrate, with one version of advance notice. This is the accepted cost of lazy-forever coexistence. -**Mixed-toolchain hazard — bounded by the version-pin, not auto-repair.** The classic trap is an older binary regressing a newer lock. Because the lock *format* never bumps, an old binary *can* write a reset lock, stamping a legacy (prefix-less) or lower-`v` hash. In the **working tree** this is self-correcting: the next new-binary run detects the sub-floor token and force-rehashes it to the current version. But "self-correcting" stops at the working tree — if a downgraded lock is **committed**, `FindFingerprintChanges` reads `v1 → legacy → v1` as two real release events, and a published `%autorelease` increment cannot be withdrawn. So the load-bearing guard against *committed* phantom releases is the **CI version-pin**: post-cutover, no old binary may run the `update`-and-commit step. Concretely, that means the lock-writing CI job runs from a **pinned build image (by digest, not a floating tag)** rebuilt from the cutover commit or later, and **no other path reaches the `update`-and-commit step** — local developer binaries do not commit locks; only the pinned job does. (The force-rehash only cleans the working tree; it does not undo history.) The *symmetric* residual — a binary that predates content-version `v2` meeting a `v2` token it cannot replay — is closed by a **required** write-time guard (Open Q#5, now a requirement): refuse to write a token whose version exceeds the binary's `currentLockContentVersion`, erroring rather than silently restamping at `v1`. 
Note this guard lives in the binary doing the write, so it constrains *newer-but-not-newest* binaries; it does **not** retroactively constrain a genuinely *old* binary — that direction is the version-pin's job. +**Mixed-toolchain hazard - bounded by the version-pin, not auto-repair.** The classic trap is an older binary regressing a newer lock. Because the lock *format* never bumps, an old binary *can* write a reset lock, stamping a legacy (prefix-less) or lower-`v` hash. In the **working tree** this is self-correcting: the next new-binary run detects the sub-floor token and force-rehashes it to the current version. But "self-correcting" stops at the working tree - if a downgraded lock is **committed**, `FindFingerprintChanges` reads `v1 → legacy → v1` as two real release events, and a published `%autorelease` increment cannot be withdrawn. So the load-bearing guard against *committed* phantom releases is the **CI version-pin**: post-cutover, no old binary may run the `update`-and-commit step. Concretely, that means the lock-writing CI job runs from a **pinned build image (by digest, not a floating tag)** rebuilt from the cutover commit or later, and **no other path reaches the `update`-and-commit step** - local developer binaries do not commit locks; only the pinned job does. (The force-rehash only cleans the working tree; it does not undo history.) The *symmetric* residual - a binary that predates content-version `v2` meeting a `v2` token it cannot replay - is closed by a **required** write-time guard: refuse to write a token whose version exceeds the binary's `currentLockContentVersion`, erroring rather than silently restamping at `v1`. Note this guard lives in the binary doing the write, so it constrains *newer-but-not-newest* binaries; it does **not** retroactively constrain a genuinely *old* binary - that direction is the version-pin's job. 
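
The write-time guard described above can be sketched as follows. This is a minimal illustration, not the RFC's implementation: `versionToStamp` and `ErrTokenTooNew` are hypothetical names, the value `1` for `currentLockContentVersion` is illustrative, and the stored version is assumed to have already been parsed from the lock's token (`0` standing for a legacy, prefix-less token).

```go
package main

import (
	"errors"
	"fmt"
)

// currentLockContentVersion is the newest content version this binary can
// compute. The value 1 is illustrative only.
const currentLockContentVersion = 1

// ErrTokenTooNew is a hypothetical sentinel error for this sketch.
var ErrTokenTooNew = errors.New("stored token is newer than this binary")

// versionToStamp returns the content version a write may stamp, given the
// version parsed from the lock's existing token (0 for a legacy token).
// A token newer than the binary is refused rather than restamped lower.
func versionToStamp(storedVersion int) (int, error) {
	if storedVersion > currentLockContentVersion {
		return 0, fmt.Errorf("%w: token v%d, binary supports up to v%d",
			ErrTokenTooNew, storedVersion, currentLockContentVersion)
	}
	return currentLockContentVersion, nil
}

func main() {
	for _, stored := range []int{0, 1, 2} {
		v, err := versionToStamp(stored)
		if err != nil {
			fmt.Println("refused:", err)
			continue
		}
		fmt.Printf("stored v%d -> stamp v%d\n", stored, v)
	}
}
```

The check only helps in binaries that contain it, which is why a genuinely old binary remains the version-pin's responsibility.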
-#### Replaying across a changed input set — `{a,b,c}` → `{a,b,d}` +#### Replaying across a changed input set: `{a,b,c}` → `{a,b,d}` -A lock stores **one atomic token** (`v:sha256:…`); it does *not* store the individual inputs. So when the measured set changes — say the fingerprint stops measuring `c` and starts measuring `d` — an existing lock is reconciled the only way an opaque digest allows: **recompute and compare, at the lock's own version.** +A lock stores **one atomic token** (`v:sha256:…`); it does *not* store the individual inputs. So when the measured set changes - say the fingerprint stops measuring `c` and starts measuring `d` - an existing lock is reconciled the only way an opaque digest allows: **recompute and compare, at the lock's own version.** Split the change into its two halves; they are handled independently: -- **Adding `d`** is the additive case — `projectV1` never listed `d`, so for any lock at v1 the digest is byte-identical whether or not the struct now has `d` (G4, *truly* — the property `hashstructure` could not give). Free. No version bump. +- **Adding `d`** is the additive case - `projectV1` never listed `d`, so for any lock at v1 the digest is byte-identical whether or not the struct now has `d` (G4, *truly* - the property `hashstructure` could not give). Free. No version bump. - **Dropping `c`** is what forces the version bump, and it is reconciled by replay: 1. `computeFP2` (measures `{a,b,d}`) ≠ stored digest → mismatch. 2. token version (1) < current (2) → **replay `computeFP1`** (still measures `{a,b,c}`). @@ -517,13 +535,13 @@ So the bump is **not breaking**: replay answers "were the *old* inputs unchanged 1. **Bump to v2 measuring `{a,b,d}` but keep field `c` on the struct** so the v1 projection can still read it for replay (close `c`'s tag to `v1..v1`, so v2 does not measure it). Every old lock replays clean at v1, is recognized as unchanged, lazy re-stamps to v2. Zero forced rebuilds. 2. 
**Only after the floor passes v1** (`minSupportedLockContentVersion = 2`, ideally after a deliberate `component migrate`) physically delete field `c` and `projectV1`. -> **Invariant:** a field may be physically removed from the config struct only after *every* retained version whose tag set includes it has been retired below `minSupportedLockContentVersion`. Retained versions and the struct they read must stay in sync — you cannot delete a field a live version's golden vector still sets. +> **Invariant:** a field may be physically removed from the config struct only after *every* retained version whose tag set includes it has been retired below `minSupportedLockContentVersion`. Retained versions and the struct they read must stay in sync - you cannot delete a field a live version's golden vector still sets. -This makes "drop an input" a lazy, per-component migration rather than a fleet-wide rebuild — at the cost of carrying a deprecated field on the struct until the last version measuring it ages out. +This makes "drop an input" a lazy, per-component migration rather than a fleet-wide rebuild - at the cost of carrying a deprecated field on the struct until the last version measuring it ages out. #### First post-reset customer -The reset establishes `projectV1` directly; it is *not* itself a Part 2 version event (it rides the rebuild, not replay). Part 2's machinery therefore sits idle until the **first genuine algorithm change after the cutover** — e.g. a `computeFP2` that fixes an overlay-folding bug, folds in a newly measured input, or changes a baked-in default. That change registers `computeFP2`, bumps `currentLockContentVersion` to 2, and is absorbed by replay with no second coordinated cutover. Because the projection substrate makes additive config changes hash-neutral by construction (G4), the *only* changes that ever need a Part 2 version event are genuine non-additive algorithm changes — a deliberately small set. 
+The reset establishes `projectV1` directly; it is *not* itself a Part 2 version event (it rides the rebuild, not replay). Part 2's machinery therefore sits idle until the **first genuine algorithm change after the cutover** - e.g. a `computeFP2` that fixes an overlay-folding bug, folds in a newly measured input, or changes a baked-in default. That change registers `computeFP2`, bumps `currentLockContentVersion` to 2, and is absorbed by replay with no second coordinated cutover. Because the projection substrate makes additive config changes hash-neutral by construction (G4), the *only* changes that ever need a Part 2 version event are genuine non-additive algorithm changes - a deliberately small set. ## Config schema version and canonical migration (future) @@ -535,7 +553,7 @@ This is the on-disk TOML axis. It is **independent** of the fingerprint axis and The critical invariant: **migrate old TOML → latest canonical struct, then project once.** A semantically no-op migration (rename `foo`→`bar`) must produce the *same* canonical struct, hence the same projection bytes, hence no drift. This is what keeps the schema axis **orthogonal** to the lock axis: a faithful `config migrate` is a pure re-encoding that moves *no* fingerprint, so it never triggers a `component migrate`. If a TOML change genuinely alters build meaning, that is a content-version bump (Part 2), not a `config migrate`. -**Resolved by projection:** the old `hashstructure` caveat — that it mixed `reflect.Type.Name()` into the hash, so renaming a Go struct moved every fingerprint even with identical content — **no longer applies.** The generated projection emits only the explicit field bytes, under each field's **frozen TOML key**, never the Go type or field name. 
So *both* a struct-type rename **and** a cosmetic field rename (`Foo`→`Bar`, same `toml:` key) are genuinely drift-neutral — **pinned by golden tests** (rename a fingerprinted struct, and rename a field while keeping its TOML key → byte-identical digest in both cases), so the property is CI-enforced, not just asserted here. Renaming the *TOML key itself* is an output-changing edit and takes a version bump like any other. +**Resolved by projection:** the old `hashstructure` caveat - that it mixed `reflect.Type.Name()` into the hash, so renaming a Go struct moved every fingerprint even with identical content - **no longer applies.** The generated projection emits only the explicit field bytes, under each field's **frozen TOML key**, never the Go type or field name. So *both* a struct-type rename **and** a cosmetic field rename (`Foo`→`Bar`, same `toml:` key) are genuinely drift-neutral - **pinned by golden tests** (rename a fingerprinted struct, and rename a field while keeping its TOML key → byte-identical digest in both cases), so the property is CI-enforced, not just asserted here. Renaming the *TOML key itself* is an output-changing edit and takes a version bump like any other. ## Pipeline @@ -553,115 +571,115 @@ TOML on disk ──migrate to canonical struct (schema axis)──► ComponentC ## Downstream fingerprint consumers (blast radius) -The versioned-replay story in Part 2 must hold for **every** reader of `InputFingerprint`, not just the two paths it grew up around. This is the post-reset migration blast-radius map; each consumer's behavior under a Part 2 v1→v2 algorithm switchover is stated explicitly. (The *reset itself* is invisible to these consumers as analyzed under [Back-compat invariant](#back-compat-invariant--synthetic-history-reads-stored-strings-never-recomputes): they compare stored strings, and pre-reset locks are never recomputed.) 
+The versioned-replay story in Part 2 must hold for **every** reader of `InputFingerprint`, not just the two paths it grew up around. This is the post-reset migration blast-radius map; each consumer's behavior under a Part 2 v1→v2 algorithm switchover is stated explicitly. (The *reset itself* is invisible to these consumers as analyzed under [Back-compat invariant](#back-compat-invariant-synthetic-history-reads-stored-strings-never-recomputes): they compare stored strings, and pre-reset locks are never recomputed.) | Consumer | Reads | Compares | Migration behavior required | | -------- | ----- | -------- | --------------------------- | | `checkFingerprintFreshness` (resolver) | recomputed identity | vs stored token | Replay at token version (Part 2 core) | | `component update` `Changed` decision | recomputed identity | vs stored token | **Replay before `Changed`** (see churn policy seam) | | `bumpComponents` (`update.go`) | recomputed identity | vs stored token | Current-tree replay (second `ComputeIdentity` caller) | -| `changed.go` `classifyComponent` (CI classifier) | stored token strings (two historical git refs) | **digest compare** (strip `v:` prefix) | **String-only — must NOT replay** (no inputs available; replaying historical configs would violate the no-recompute invariant) | -| `changed.go` `haveMatchingFingerprints` (cache-poisoning integrity gate) | stored token strings | **digest compare** (strip `v:` prefix) | **String-only; security-load-bearing** — a version-only delta must read as "same" or the integrity check is silently skipped | +| `changed.go` `classifyComponent` (CI classifier) | stored token strings (two historical git refs) | **digest compare** (strip `v:` prefix) | **String-only - must NOT replay** (no inputs available; replaying historical configs would violate the no-recompute invariant) | +| `changed.go` `haveMatchingFingerprints` (cache-poisoning integrity gate) | stored token strings | **digest compare** (strip `v:` prefix) | 
**String-only; security-load-bearing** - a version-only delta must read as "same" or the integrity check is silently skipped | | `synthistory.FindFingerprintChanges` | stored token strings across git history | **digest of adjacent commits** (strip `v:` prefix) | **String-only; digest-compare** so a version-only re-stamp never fires a release | | `synthistory.BuildDirtyChange` | recomputed (current ver) | vs stored `headLock` token | **Replay at headLock version** before declaring dirty | | `ResolutionInputHash` staleness/write | recomputed resolution hash | vs stored **bare** digest | **No prefix** (bare `sha256:`); fingerprint-only bumps never touch it; replay reserved | -**Two comparator classes, not one — and only one of them can replay.** The consumers split cleanly by *what they hold*: +**Two comparator classes, not one - and only one of them can replay.** The consumers split cleanly by *what they hold*: - **Current-tree comparators** (`checkFingerprintFreshness`, `update`'s `Changed`, `BuildDirtyChange`) recompute against *live inputs*, so they **can and must** replay at the stored token's version. Feasible and invariant-safe. -- **Stored-vs-stored historical comparators** (`FindFingerprintChanges`, `changed.go`'s `classifyComponent`/`haveMatchingFingerprints`) hold only committed token *strings* from two git refs — no config, no FS, no inputs. They **cannot** replay, and replaying would require recomputing a historical fingerprint, which the [forever-invariant](#back-compat-invariant--synthetic-history-reads-stored-strings-never-recomputes) forbids outright. Both stay **string-only**, and both compare the **digest** (stripping the `v:` version prefix), which makes them inherently immune to version-only deltas — a v1→v2 re-stamp with an unchanged digest reads as "no change." (Strict-lazy churn is still the policy that keeps re-stamps from riding no-op commits in the first place, but the comparators no longer *depend* on it for correctness.) 
+- **Stored-vs-stored historical comparators** (`FindFingerprintChanges`, `changed.go`'s `classifyComponent`/`haveMatchingFingerprints`) hold only committed token *strings* from two git refs - no config, no FS, no inputs. They **cannot** replay, and replaying would require recomputing a historical fingerprint, which the [forever-invariant](#back-compat-invariant-synthetic-history-reads-stored-strings-never-recomputes) forbids outright. Both stay **string-only**, and both compare the **digest** (stripping the `v:` version prefix), which makes them inherently immune to version-only deltas - a v1→v2 re-stamp with an unchanged digest reads as "no change." (Strict-lazy churn is still the policy that keeps re-stamps from riding no-op commits in the first place, but the comparators no longer *depend* on it for correctness.) -The `changed.go` classifier is the easily-missed member of the *second* class: it must get the same **digest-compare** as `FindFingerprintChanges`, so a version-only delta reads as "no change" — not a replay (which it cannot do, holding no inputs). +The `changed.go` classifier is the easily-missed member of the *second* class: it must get the same **digest-compare** as `FindFingerprintChanges`, so a version-only delta reads as "no change" - not a replay (which it cannot do, holding no inputs). -**This contract is enforced by types, not prose — the `fingerprint.Token` choke-point.** A reviewer-vigilance rule across the comparison sites is the kind of discipline this RFC elsewhere converts to structure (the atomic token, D3), and digest-comparison widens the surface: `v:` prefix-parsing now lives at three historical + three current-tree sites, and the "digest-compare two stored strings" pattern is **copyable**. 
The residual hazard is therefore not mere *omission* — a forgotten replay at a current-tree site fails *safely* toward inequality → spurious `Stale`/`Changed` → wasteful rebuild (G1 churn, never G5) — it is *mis-classification*: a future consumer that holds live inputs but copies the **historical** stored-string template never looks at those inputs → silently accepts a stale tree → reachable G5. Omission is safe; mis-classification is not, and only structure closes it. +**This contract is enforced by types, not prose - the `fingerprint.Token` choke-point.** A reviewer-vigilance rule across the comparison sites is the kind of discipline this RFC elsewhere converts to structure (the atomic token, D3), and digest-comparison widens the surface: `v:` prefix-parsing now lives at three historical + three current-tree sites, and the "digest-compare two stored strings" pattern is **copyable**. The residual hazard is therefore not mere *omission* - a forgotten replay at a current-tree site fails *safely* toward inequality → spurious `Stale`/`Changed` → wasteful rebuild (G1 churn, never G5) - it is *mis-classification*: a future consumer that holds live inputs but copies the **historical** stored-string template never looks at those inputs → silently accepts a stale tree → reachable G5. Omission is safe; mis-classification is not, and only structure closes it. The fix is a **two-type split**, because a single token type cannot tell the two comparator classes apart: -- **`StoredToken`** — parsed from a lock by the *sole* strict parser `ParseToken` (accepts only `sha256:` legacy and `v:sha256:`; any malformed token is treated as *changed*, never normalized to an empty digest). It exposes `SameDigest(other StoredToken)` and nothing else — it holds no inputs, so a site that has only stored strings *physically cannot* perform a freshness decision. 
-- **`FreshToken`** — obtainable *only* from `ComputeIdentityAt(version, config, …)`, so constructing a *valid* one requires live inputs. Its zero value (`var f FreshToken`) is still syntactically constructible, so it **fails safe**: a `FreshToken` carries a validity bit set only by the constructor, and `Reconcile` on an unset one **returns `Stale`** (never errors, never `Fresh`). `Stale` is the fail-*safe* answer on a path whose job is "rebuild when in doubt" — a zero token means "no freshness evidence," so it triggers a rebuild (G1 churn at worst) and never blocks a `build`/`render`/`--check-only`, where an `error` would be fail-*stop* and could take the fleet down on an accidental zero-value path. It exposes `Reconcile(stored StoredToken) → {Fresh | Stale | RestampTo(v)}`. (Belt-and-suspenders: a named test `var f FreshToken; assert f.Reconcile(stored) == Stale`, and if feasible a vet/lint check that no site reconciles a statically-zero token — so a *programming* mistake is still caught loudly without coupling runtime behavior to it.) +- **`StoredToken`** - parsed from a lock by the *sole* strict parser `ParseToken` (accepts only `sha256:` legacy and `v:sha256:`; any malformed token is treated as *changed*, never normalized to an empty digest). It exposes `SameDigest(other StoredToken)` and nothing else - it holds no inputs, so a site that has only stored strings *physically cannot* perform a freshness decision. +- **`FreshToken`** - obtainable *only* from `ComputeIdentityAt(version, config, …)`, so constructing a *valid* one requires live inputs. Its zero value (`var f FreshToken`) is still syntactically constructible, so it **fails safe**: a `FreshToken` carries a validity bit set only by the constructor, and `Reconcile` on an unset one **returns `Stale`** (never errors, never `Fresh`). 
`Stale` is the fail-*safe* answer on a path whose job is "rebuild when in doubt" - a zero token means "no freshness evidence," so it triggers a rebuild (G1 churn at worst) and never blocks a `build`/`render`/`--check-only`, where an `error` would be fail-*stop* and could take the fleet down on an accidental zero-value path. It exposes `Reconcile(stored StoredToken) → {Fresh | Stale | RestampTo(v)}`. (Belt-and-suspenders: a named test `var f FreshToken; assert f.Reconcile(stored) == Stale`, and if feasible a vet/lint check that no site reconciles a statically-zero token - so a *programming* mistake is still caught loudly without coupling runtime behavior to it.) A historical site holding two `StoredToken`s can call `SameDigest` but cannot fabricate a `FreshToken`, so it cannot accidentally pose as a current-tree freshness check; a current-tree site must obtain a `FreshToken` to reconcile, which forces it through live inputs. The *assignment* documents the class, and the mis-classification path is unconstructible rather than merely discouraged. Both types are **non-comparable** (an unexported `_ [0]func()` field), so a raw `==` on a token outside the `fingerprint` package fails to compile. (Unexported fields alone would *not* do this: a struct of comparable unexported fields is still `==`-comparable from any package; the non-comparable sentinel is what blocks it.) -For the choke-point to be *structural* and not merely conventional, the **lock fields must be token-typed, not raw `string`**: as long as `ComponentLock.InputFingerprint`/`ResolutionInputHash` stay exported strings, `lock.InputFingerprint == other.InputFingerprint` still compiles and the raw-compare pattern stays copyable. So PR C changes those fields to `StoredToken` (TOML marshal/unmarshal routing through `ParseToken`, so every read crosses the strict parser), or hides the raw string behind an accessor that returns a `StoredToken`. Only then does \"enforced by types, not prose\" hold end-to-end. 
This lands in PR C, which already edits every comparison site; it has no on-disk-format dependency (the *bytes* are unchanged \u2014 only the Go field type), so there is no reason to touch them twice and carry the mis-classification window in between. +For the choke-point to be *structural* and not merely conventional, the **lock fields must be token-typed, not raw `string`**: as long as `ComponentLock.InputFingerprint`/`ResolutionInputHash` stay exported strings, `lock.InputFingerprint == other.InputFingerprint` still compiles and the raw-compare pattern stays copyable. So PR C changes those fields to `StoredToken` (TOML marshal/unmarshal routing through `ParseToken`, so every read crosses the strict parser), or hides the raw string behind an accessor that returns a `StoredToken`. Only then does \"enforced by types, not prose\" hold end-to-end. This lands in PR C, which already edits every comparison site. **The on-disk bytes are not automatically unchanged, though; the field *form* decides it** (verified empirically against go-toml/v2 v2.3.1, the pinned version). `omitempty` decides emptiness by reflecting a struct's *exported* fields **before** consulting `TextMarshaler`, so a token struct whose digest sits in an *unexported* field is judged empty and **dropped even when set**: a populated `input-fingerprint` silently vanishes, while a *non-*`omitempty` value struct instead emits a spurious `resolution-input-hash = ''` line. Two byte-neutral forms survive: (a) an **accessor**, keeping the on-disk field a `string` and exposing a `StoredToken` via method with writes routed through `ParseToken`, byte-neutral *by construction* (the serialized type never changes) and `==`-proof for every other package; or (b) a value struct with an **exported** digest field (so `omitempty` tracks it) plus a custom marshal that renders it as a bare string. The pointer form (`*StoredToken`) is byte-neutral but reintroduces a silent pointer-`==`, so it is rejected. 
**PR-C acceptance gate:** a golden round-trip test proving a real local lock's bytes are unchanged across the conversion, so the property is *tested*, not asserted. Either accepted form lands in PR C with no separate on-disk-format bump. ### The synthetic changelog/release path is the real hazard -[`synthistory.go`](../../../internal/app/azldev/core/sources/synthistory.go) turns fingerprint movement into **user-visible, shipped** package state — `%autochangelog` entries and `%autorelease` increments. There are two distinct comparators, and the design resolves them asymmetrically. +[`synthistory.go`](../../../internal/app/azldev/core/sources/synthistory.go) turns fingerprint movement into **user-visible, shipped** package state - `%autochangelog` entries and `%autorelease` increments. There are two distinct comparators, and the design resolves them asymmetrically. -- **`FindFingerprintChanges` (historical walker)** compares `InputFingerprint` across the lock's git history and emits a synthetic changelog/release entry on every change. It compares the **digest** (stripping the `v:` version prefix), not the full token — a one-line string operation, not the infeasible version-aware replay (it has only committed *strings*, no inputs). So a version-only re-stamp (a lazy v1→v2 migration with an unchanged digest) is **invisible** to it; only a moved digest — a genuine input change — fires, and the migration folds into the real change's entry that carries it. The v1→v2 conversion is thus an *accepted, per-component, notable* changelog event that piggybacks a real change, guaranteed by digest-comparison rather than by lazy-discipline. - - **`component migrate` is release-grade *when it moves digests*.** A migrate that retires a *fingerprint* algorithm re-stamps every unchanged lock from `computeFP1`'s digest to `computeFP2`'s — the digests move, the walker fires, and the fleet-wide release is the deliberate cost ([registry floor](#registry-floor-and-forced-migration)). 
A migrate that retires only a *resolution* algorithm rewrites only the bare `ResolutionInputHash` (which `synthistory` never reads), so it is correctly release-silent. Either way the firing tracks a real `InputFingerprint` digest move. -- **`BuildDirtyChange` (live dirty check)** compares a *recomputed* current-version (v2) hash against the *stored* (possibly v1) `headLock.InputFingerprint` and declares dirty on inequality. "Accept as notable" does **not** save this path: post-switchover an *unchanged* component would read **dirty on every `render`/`build`** until re-stamped — a persistent, recurring spurious signal, worse than a one-time entry. The fix is **free**: it is the *same replay Part 2 already owes the freshness check* — replay at `headLock`'s recorded version before declaring dirty. One additional call site for logic already being written, no new mechanism. +- **`FindFingerprintChanges` (historical walker)** compares `InputFingerprint` across the lock's git history and emits a synthetic changelog/release entry on every change. It compares the **digest** (stripping the `v:` version prefix), not the full token - a one-line string operation, not the infeasible version-aware replay (it has only committed *strings*, no inputs). So a version-only re-stamp (a lazy v1→v2 migration with an unchanged digest) is **invisible** to it; only a moved digest - a genuine input change - fires, and the migration folds into the real change's entry that carries it. The v1→v2 conversion is thus an *accepted, per-component, notable* changelog event that piggybacks a real change, guaranteed by digest-comparison rather than by lazy-discipline. + - **`component migrate` is release-grade *when it moves digests*.** A migrate that retires a *fingerprint* algorithm re-stamps every unchanged lock from `computeFP1`'s digest to `computeFP2`'s - the digests move, the walker fires, and the fleet-wide release is the deliberate cost ([registry floor](#registry-floor-and-forced-migration)). 
A migrate that retires only a *resolution* algorithm rewrites only the bare `ResolutionInputHash` (which `synthistory` never reads), so it is correctly release-silent. Either way the firing tracks a real `InputFingerprint` digest move. +- **`BuildDirtyChange` (live dirty check)** compares a *recomputed* current-version (v2) hash against the *stored* (possibly v1) `headLock.InputFingerprint` and declares dirty on inequality. "Accept as notable" does **not** save this path: post-switchover an *unchanged* component would read **dirty on every `render`/`build`** until re-stamped - a persistent, recurring spurious signal, worse than a one-time entry. The fix is **free**: it is the *same replay Part 2 already owes the freshness check* - replay at `headLock`'s recorded version before declaring dirty. One additional call site for logic already being written, no new mechanism. -**Net:** the changelog-walker concern is not "make the walker version-aware" (hard, maybe infeasible). It is two cheap things — (1) the historical comparators (`FindFingerprintChanges`, `changed.go`) compare the **digest**, so a version-only delta never fires; and (2) extend the *current-tree* replay to `BuildDirtyChange` (which *does* hold live inputs), one call site for logic already being written. The reset commit is the single deliberate exception: it *is* a fleet-wide notable event, the coordinated cutover, intentionally visible. +**Net:** the changelog-walker concern is not "make the walker version-aware" (hard, maybe infeasible). It is two cheap things - (1) the historical comparators (`FindFingerprintChanges`, `changed.go`) compare the **digest**, so a version-only delta never fires; and (2) extend the *current-tree* replay to `BuildDirtyChange` (which *does* hold live inputs), one call site for logic already being written. The reset commit is the single deliberate exception: it *is* a fleet-wide notable event, the coordinated cutover, intentionally visible. 
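
The digest-compare used by the historical comparators might look roughly like the sketch below. It rests on assumptions: the versioned token is taken to have the shape `v2:sha256:<64 lowercase hex>` (the text here shows only `v:sha256:…`), the version is taken to start at 1, and `DigestChanged` and `isLowerHex` are hypothetical helpers. `ParseToken`, `StoredToken`, and `SameDigest` follow the names used above, but their bodies are illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// StoredToken is a parsed lock token. The zero-length func array makes the
// struct non-comparable, so a raw `a == b` on two tokens does not compile.
type StoredToken struct {
	_       [0]func()
	version int    // 0 for a legacy token; unused by the historical sites
	digest  string // the 64 hex characters after "sha256:"
}

var errMalformed = errors.New("malformed token")

func isLowerHex(s string) bool {
	for _, r := range s {
		if !(r >= '0' && r <= '9') && !(r >= 'a' && r <= 'f') {
			return false
		}
	}
	return true
}

// ParseToken accepts only "sha256:<hex>" (legacy) and "v<N>:sha256:<hex>".
func ParseToken(raw string) (StoredToken, error) {
	version, rest := 0, raw
	if strings.HasPrefix(raw, "v") {
		head, tail, found := strings.Cut(raw, ":")
		num := head[1:]
		n, err := strconv.Atoi(num)
		// The Itoa round-trip rejects forms like "v+2" or "v02".
		if !found || err != nil || n < 1 || strconv.Itoa(n) != num {
			return StoredToken{}, fmt.Errorf("%w: %q", errMalformed, raw)
		}
		version, rest = n, tail
	}
	hexPart, ok := strings.CutPrefix(rest, "sha256:")
	if !ok || len(hexPart) != 64 || !isLowerHex(hexPart) {
		return StoredToken{}, fmt.Errorf("%w: %q", errMalformed, raw)
	}
	return StoredToken{version: version, digest: hexPart}, nil
}

// SameDigest ignores the version prefix and compares digests only.
func (t StoredToken) SameDigest(other StoredToken) bool {
	return t.digest == other.digest
}

// DigestChanged treats any malformed token as changed, never as equal.
func DigestChanged(oldRaw, newRaw string) bool {
	a, errA := ParseToken(oldRaw)
	b, errB := ParseToken(newRaw)
	if errA != nil || errB != nil {
		return true
	}
	return !a.SameDigest(b)
}

func main() {
	d := "sha256:" + strings.Repeat("0123456789abcdef", 4)
	fmt.Println("version-only re-stamp changed:", DigestChanged("v1:"+d, "v2:"+d))
	fmt.Println("malformed pair changed:", DigestChanged("garbage", "garbage"))
}
```

Because the parser fails closed, two identical but malformed strings still read as changed, which matches the rule that a malformed token is never normalized to an empty digest.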
-### `ResolutionInputHash` — bare digest, replay deferred
+### `ResolutionInputHash`: bare digest, replay deferred

 `ComponentLock` carries a *second* persisted content hash, `ResolutionInputHash`, with its own staleness logic and its own silent-write path (it writes when only `resHashChanged`, never flipping `Changed`). It has the **identical** evolution problem as `InputFingerprint`, but two properties make its replay safe to defer:

 - **Smaller blast radius.** `ResolutionInputHash` does **not** feed `synthistory`, so an algorithm change can never mint a phantom changelog/release (that hazard is fingerprint-only). Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption.
 - **No pending change.** It is a flat seven-field SHA256, not a struct walk, so the projection substrate leaves it untouched. Its registry slot stays `computeRes1` until its inputs genuinely change.

-**Decision (KISS/YAGNI):** wire fingerprint replay in Part 2's first PR. `ResolutionInputHash` stays a **bare `sha256:` digest with no `v:` prefix** (the prefix lives only in `InputFingerprint` — see [Both hashes share one version](#both-hashes-share-one-version)), so the resolver compares it directly and a fingerprint-only bump never touches it. The day `ComputeResolutionHash` first changes, add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`); decide *then* whether resolution needs its own prefix or reads the shared one. Because resolution carries no prefix today, the desync hazard of a shared prefix on two fields **does not exist** — there is no restamp-on-write seam to defer.
+**Decision (KISS/YAGNI):** wire fingerprint replay in Part 2's first PR.
`ResolutionInputHash` stays a **bare `sha256:` digest with no `v:` prefix** (the prefix lives only in `InputFingerprint` - see [Both hashes share one version](#both-hashes-share-one-version)), so the resolver compares it directly and a fingerprint-only bump never touches it. The day `ComputeResolutionHash` first changes, add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`); decide *then* whether resolution needs its own prefix or reads the shared one. Because resolution carries no prefix and is compared bare today, a fingerprint-only bump never touches it - so the shared-prefix desync is **dormant, not eliminated**, and wakes only when resolution gains a second algorithm. The seam: the prefix advances only on `result.Changed`, while a resolution-only write takes the independent `resHashChanged` path ([`update.go`](../../../internal/app/azldev/cmds/component/update.go)). So once `computeRes2` exists, a resolution-only write would advance the bare digest while the shared prefix stays at `v1`, and replay would select `computeRes1` → permanent false-stale. This is **safe-direction (G1 churn, never a missed rebuild) and dormant** (resolution replay is reserved; one algorithm today), so it gates no PR now. To stop it shipping silently the day it matters, the guard is **structural, not prose**: registering a second resolution algorithm **fails the build** (a registry `init()`-time assertion) unless the desync is resolved - either resolution takes its own prefix, or a resolution-only write also re-stamps `InputFingerprint`'s prefix to current (same digest) behind a CI gate mirroring the dirty-change gate. The decision is *forced* at `computeRes2`, not forgotten. 
 ## Design decisions

-### D1 — Canonical projection vs `hashstructure` + `Includable`
+### D1: Canonical projection vs `hashstructure` + `Includable`

 Both can omit zero values; the decisive difference is **whether an old algorithm can be frozen**, which `Includable` cannot deliver (Problem 6).

 | | Canonical projection (chosen) | `hashstructure` + `Includable` |
 | --- | --- | --- |
-| Old algorithm frozen | Yes — version-tagged fields, golden-vector pinned | No — reflects the live struct/method-set |
+| Old algorithm frozen | Yes - version-tagged fields, golden-vector pinned | No - reflects the live struct/method-set |
 | Sound replay (Part 2) | Yes | No (the disqualifier) |
 | Meaningful empties | `!`-prefixed range per field | `fingerprint:"always"` per field |
 | Type-name in hash | No (rename is drift-neutral) | Yes (rename moves every hash) |
 | Plumbing | Version tags + generator + golden vectors | Value-receiver `HashInclude` on every nested struct + `v.(reflect.Value)` assert |

-`Includable` keeps today's hashes byte-identical, which mattered for an *incremental* rollout — but that no longer matters once the reset rebuilds everything anyway, and it comes attached to a substrate that makes replay unsound. (Verified against `hashstructure` v2.0.2, the pinned version in `go.mod`: it reflects the live struct and method set at hash time and mixes `reflect.Type.Name()` into the digest — the properties Problem 6 turns on.) Projection trades byte-compatibility (which we are spending on the coordinated cutover regardless) for frozen replay (which we need forever). Adopted at the reset.
+`Includable` keeps today's hashes byte-identical, which mattered for an *incremental* rollout - but that no longer matters once the reset rebuilds everything anyway, and it comes attached to a substrate that makes replay unsound.
(Verified against `hashstructure` v2.0.2, the pinned version in `go.mod`: it reflects the live struct and method set at hash time and mixes `reflect.Type.Name()` into the digest - the properties Problem 6 turns on.) Projection trades byte-compatibility (which we are spending on the coordinated cutover regardless) for frozen replay (which we need forever). Adopted at the reset. -### D2 — Version-tagged field selection, generated +### D2: Version-tagged field selection, generated -Field membership lives in a per-field version-set tag (`fingerprint:"v1..*"`); a `go generate` step emits the per-version `projectVN` functions from those tags. This is the chosen mechanism over both a runtime reflective walker and hand-written functions — it takes the declarative authoring of the former and the compile-time guarantees of the latter. Rationale: +Field membership lives in a per-field version-set tag (`fingerprint:"v1..*"`); a `go generate` step emits the per-version `projectVN` functions from those tags. This is the chosen mechanism over both a runtime reflective walker and hand-written functions - it takes the declarative authoring of the former and the compile-time guarantees of the latter. Rationale: -- **The unsafe direction is the false-negative** (a meaningful field silently omitted → missed rebuild → stale artifact, a G5 violation). A *mandatory* tag — absent → generation fails — makes the include/exclude decision impossible to *forget*. The *wrongly-excluded* case (a `-` tag on a build-effective field) is caught by the kept exclusion ledger, and the *wrongly-included-but-unmeasured* case by the [coverage backstop](#golden-vector-coverage-the-backstop). -- **Version-awareness is declarative.** A field's whole lifecycle — introduced at v3, dropped at v5, revived at v8 — is one greppable string on the field (`v3..v4,v8..*`), inexpressible in hand-written form, with no diff smeared across function bodies. 
-- **Frozen-ness stays structural.** Because the generated functions are checked-in code, a retained `projectVN` references each field by literal Go path — deleting a measured field won't compile, the emit-key is a literal, and regeneration-idempotence (CI `go generate` + diff) pins a shipped version's output. Golden vectors are the semantic backstop behind that, not the sole guarantee. This recovers the hand-written model's compile guarantee that a *runtime* reflective walker gives up (its output would reflect the live struct at hash time — Problem 6 one layer down), while keeping the DSL's declarative lifecycle. +- **The unsafe direction is the false-negative** (a meaningful field silently omitted → missed rebuild → stale artifact, a G5 violation). A *mandatory* tag - absent → generation fails - makes the include/exclude decision impossible to *forget*. The *wrongly-excluded* case (a `-` tag on a build-effective field) is caught by the kept exclusion ledger, and the *wrongly-included-but-unmeasured* case by the [coverage backstop](#golden-vector-coverage-the-backstop). +- **Version-awareness is declarative.** A field's whole lifecycle - introduced at v3, dropped at v5, revived at v8 - is one greppable string on the field (`v3..v4,v8..*`), inexpressible in hand-written form, with no diff smeared across function bodies. +- **Frozen-ness stays structural.** Because the generated functions are checked-in code, a retained `projectVN` references each field by literal Go path - deleting a measured field won't compile, the emit-key is a literal, and regeneration-idempotence (CI `go generate` + diff) pins a shipped version's output. Golden vectors are the semantic backstop behind that, not the sole guarantee. This recovers the hand-written model's compile guarantee that a *runtime* reflective walker gives up (its output would reflect the live struct at hash time - Problem 6 one layer down), while keeping the DSL's declarative lifecycle. 
-The `go generate` *infrastructure* already exists (`stringer`/`mockgen` via `mage`), so the marginal cost is low — but the projection generator's **stakes** are categorically higher than those tools, and the design treats it accordingly. A `stringer` bug is cosmetic; a `mockgen` bug breaks test compilation and is caught instantly. A **projection-generator bug silently moves a shipped version's bytes → fleet-wide G5 (stale, undetectable except by the corpus) or G1 (mass churn).** The generator is therefore a first-class, fingerprint-load-bearing production artifact with its own test suite, and **regeneration-idempotence is a required CI gate** (the [`.github/workflows/generate.yml`](../../../.github/workflows/generate.yml) check is mandatory, never skippable) — without it the freeze degrades from structural to test-discipline. That is precisely why the coverage oracle and hand-frozen golden digests above are mandatory, not optional. +The `go generate` *infrastructure* already exists (`stringer`/`mockgen` via `mage`), so the marginal cost is low - but the projection generator's **stakes** are categorically higher than those tools, and the design treats it accordingly. A `stringer` bug is cosmetic; a `mockgen` bug breaks test compilation and is caught instantly. A **projection-generator bug silently moves a shipped version's bytes → fleet-wide G5 (stale, undetectable except by the corpus) or G1 (mass churn).** The generator is therefore a first-class, fingerprint-load-bearing production artifact with its own test suite, and **regeneration-idempotence is a required CI gate** (the [`.github/workflows/generate.yml`](../../../.github/workflows/generate.yml) check is mandatory, never skippable) - without it the freeze degrades from structural to test-discipline. That is precisely why the coverage oracle and hand-frozen golden digests above are mandatory, not optional. 
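To make the version-set tag concrete, here is a minimal parser sketch for strings such as `v3..v4,v8..*`. It is not the RFC's generator code: `versionRange`, `parseVersionSet`, and `includes` are hypothetical names, and the grammar (comma-separated `vA..vB` or `vA..*` ranges, an optional `!` prefix per range, `-` for exclusion, and an empty tag rejected) is inferred from the examples in the text rather than taken from a specification.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// versionRange is one inclusive range of algorithm versions.
type versionRange struct {
	from, to int  // to == 0 means open-ended ("*")
	always   bool // "!" prefix: emit the field even when it is zero
}

// parseBound parses a bound such as "v3" into 3.
func parseBound(s string) (int, error) {
	if !strings.HasPrefix(s, "v") {
		return 0, fmt.Errorf("bound %q: missing v prefix", s)
	}
	n, err := strconv.ParseUint(s[1:], 10, 31)
	if err != nil || n == 0 {
		return 0, fmt.Errorf("bound %q: not a positive version", s)
	}
	return int(n), nil
}

// parseVersionSet parses a tag under the grammar assumed in this sketch.
// An empty tag is an error (the decision is mandatory); "-" yields an empty
// set, meaning the field is excluded from every version.
func parseVersionSet(tag string) ([]versionRange, error) {
	if tag == "" {
		return nil, fmt.Errorf("missing fingerprint tag: a decision is mandatory")
	}
	if tag == "-" {
		return nil, nil
	}
	var set []versionRange
	for _, part := range strings.Split(tag, ",") {
		var r versionRange
		if strings.HasPrefix(part, "!") {
			r.always = true
			part = part[1:]
		}
		lo, hi, ok := strings.Cut(part, "..")
		if !ok {
			return nil, fmt.Errorf("range %q: expected vA..vB or vA..*", part)
		}
		var err error
		if r.from, err = parseBound(lo); err != nil {
			return nil, err
		}
		if hi != "*" {
			if r.to, err = parseBound(hi); err != nil {
				return nil, err
			}
			if r.to < r.from {
				return nil, fmt.Errorf("range %q: end precedes start", part)
			}
		}
		set = append(set, r)
	}
	return set, nil
}

// includes reports whether version n falls inside any range of the set.
func includes(set []versionRange, n int) bool {
	for _, r := range set {
		if n >= r.from && (r.to == 0 || n <= r.to) {
			return true
		}
	}
	return false
}

func main() {
	set, err := parseVersionSet("v3..v4,v8..*")
	if err != nil {
		panic(err)
	}
	for n := 1; n <= 9; n++ {
		fmt.Printf("v%d measured: %v\n", n, includes(set, n))
	}
}
```

A generator built on a parser like this would call `includes(set, N)` once per field at generate time and write a literal emit line into `projectVN` for each hit, which is what moves the reflection out of the runtime path.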
-### D3 — Atomic self-describing token; no format bump, reconcile via force-rehash
+### D3: Atomic self-describing token; no format bump, reconcile via force-rehash

 The stored hash is a single `v:sha256:` token, not separate version and digest fields. One field, written atomically, so the version and the digest can never desync (the class of bug a split-field design invites when one is written and the other is not).

-The lock **format** `Version` stays at `1`. Bumping it to `2` as a poison pill — to stop old binaries touching reset locks — is too blunt: it also stops them reading pins to *queue a build*. Instead, back-compat rests on two cheaper properties: the format is unchanged so every binary parses every lock, and the content-version registry **force-rehashes** any sub-floor token (legacy, or downgraded by an old binary) up to the current version. Old binaries stay useful (read pins, build); their only possible mischief — writing a legacy-substrate hash — is self-correcting on the next new-binary run, not silent corruption. Back-compat is therefore: **same format forever, reconcile fingerprints by version, never recompute history.**
+The lock **format** `Version` stays at `1`. Bumping it to `2` as a poison pill - to stop old binaries touching reset locks - is too blunt: it also stops them reading pins to *queue a build*. Instead, back-compat rests on two cheaper properties: the format is unchanged so every binary parses every lock, and the content-version registry **force-rehashes** any sub-floor token (legacy, or downgraded by an old binary) up to the current version. Old binaries stay useful (read pins, build); their only possible mischief - writing a legacy-substrate hash - is self-correcting on the next new-binary run, not silent corruption.
Back-compat is therefore: **same format forever, reconcile fingerprints by version, never recompute history.** -### D4 — Project to bytes, not a `ConfigHash()` method on the type +### D4: Project to bytes, not a `ConfigHash()` method on the type `project(config, version) []byte` returns canonical bytes; the combiner in `fingerprint` owns the `sha256` and the version dispatch. A `ConfigHash()` method that returns a finished hash was rejected: it drags crypto + versioning onto a data type, and it tempts callers to route around the version registry to get a raw, version-agnostic hash. Returning bytes keeps the config type ignorant of versioning, and keeps the combiner the **sole version authority**. See [the seam note](#where-the-hashing-logic-should-live). ## Alternatives considered -- **Incremental lazy migration on the `hashstructure` substrate** (the original plan): flip the inclusion default to omitempty via `Includable`, version the lock content, and migrate lazily — *without* a reset. Rejected: Problem 6 makes its central promise unkeepable. A "frozen" replay function built on `hashstructure.Hash` reflects the live struct, so the first field addition after the switchover moves the old algorithm's output and forces a rehash anyway. The incremental path therefore does not actually avoid a coordinated cutover — it defers one to the first field addition, on a substrate that makes replay unsound. With a coordinated cutover already scheduled (the dev→prod cutover), spending it once on a clean projection substrate is the better trade. -- **Global `IgnoreZeroValue`** — a blunt switch that omits *all* zero fields with no escape hatch for build-meaningful zeros, and still on the non-frozen `hashstructure` substrate. Rejected. -- **Parallel versioned structs with per-struct `Hash()`** — couples locks to Go type identity and duplicates hashing logic per version. Rejected in favor of Part 2's integer-versioned combiner over frozen projections. 
-- **Bump the lock format `Version` 1→2 as a poison pill** — makes old binaries hard-reject reset locks. Rejected: it also blocks old binaries from reading pins to queue a build, and it is unnecessary, since the content-version registry already force-rehashes any sub-floor or downgraded token (D3). Same-format + force-rehash keeps old binaries useful without risking silent corruption. -- **Eager fleet-wide migration as the steady-state mechanism** — rewriting every lock on every algorithm change is the mass-churn the design exists to prevent. Rejected for the steady state. The *reset* is a deliberate, one-time, operator-driven eager pass riding an already-scheduled rebuild — the sanctioned exception, not the rule; `component migrate` is its post-reset equivalent for retiring an old version. -- **Runtime reflective walker for field selection (instead of generated functions).** One generic `project(cfg, N)` reflects the struct at hash time and emits the fields whose version-set includes N. Least code, and it shares the tag syntax with the chosen approach. Rejected: it reflects the *live* struct at hash time — Problem 6 one layer down — so its frozen-ness rests entirely on golden-vector coverage (test discipline), and field removal degrades from a compile error to a CI failure. Codegen keeps the same tags but moves the reflection to *generate* time and freezes the output as checked-in code, recovering the compile guarantee. -- **Hand-written per-version `projectVN` functions (instead of generating them from tags).** Each version gets a bespoke function with one explicit `emit`/`emitAlways` line per measured field. 
Same compile guarantees as codegen (removal won't compile, literal emit-key), but: membership is smeared across N function bodies; "bring a field back a few versions later" has no first-class expression (you re-add an `emit` line, nothing ties it to the field's earlier life); and the mandatory-decision and coverage properties need separate bookkeeping the tags otherwise carry. Codegen is the same runtime with declarative authoring — strictly preferable given the existing `go generate` infrastructure. -- **Per-field hash manifest in the lock (instead of one opaque token).** Store `{field → hash}` (à la `go.sum`) rather than a single `v:sha256:…` digest. *Genuine wins:* dropping a field becomes ignoring its manifest line — no projection kept alive for replay, so the **deprecate-then-delete two-step and the registry-retirement deadlock** (the append-only growth above) both vanish; and the stored-vs-stored historical comparators become structural set-diffs rather than version-blind string compares. *Why the opaque token still wins for azldev:* (1) the projection substrate **already** delivers additive immunity (G4) — the manifest's headline draw — so that advantage is moot, not additive; (2) the manifest does **not** kill the false-fresh hazard — an old lock has *no line* for a newly-measured input, so there is still no baseline to detect a change to it (the blind spot is relocated, not removed); (3) it makes *algorithm evolution* — the entire point of Part 2 — **harder**, needing per-field versioning where the token needs one integer for the whole algorithm; and (4) it bloats every lock to O(fields × components) (the well-known `go.sum` size cost). The manifest is the better tool for a *static* input set that mainly grows and shrinks; the opaque token + single version is the better tool for an *evolving hashing algorithm*, which is azldev's actual problem. 
The reset bakes the storage model in — token-vs-manifest is irreversible after PR B — and the retirement deadlock the manifest would have dissolved is instead answered by the floor-advance cadence above. +- **Incremental lazy migration on the `hashstructure` substrate** (the original plan): flip the inclusion default to omitempty via `Includable`, version the lock content, and migrate lazily - *without* a reset. Rejected: Problem 6 makes its central promise unkeepable. A "frozen" replay function built on `hashstructure.Hash` reflects the live struct, so the first field addition after the switchover moves the old algorithm's output and forces a rehash anyway. The incremental path therefore does not actually avoid a coordinated cutover - it defers one to the first field addition, on a substrate that makes replay unsound. With a coordinated cutover already scheduled (the dev→prod cutover), spending it once on a clean projection substrate is the better trade. +- **Global `IgnoreZeroValue`** - a blunt switch that omits *all* zero fields with no escape hatch for build-meaningful zeros, and still on the non-frozen `hashstructure` substrate. Rejected. +- **Parallel versioned structs with per-struct `Hash()`** - couples locks to Go type identity and duplicates hashing logic per version. Rejected in favor of Part 2's integer-versioned combiner over frozen projections. +- **Bump the lock format `Version` 1→2 as a poison pill** - makes old binaries hard-reject reset locks. Rejected: it also blocks old binaries from reading pins to queue a build, and it is unnecessary, since the content-version registry already force-rehashes any sub-floor or downgraded token (D3). Same-format + force-rehash keeps old binaries useful without risking silent corruption. +- **Eager fleet-wide migration as the steady-state mechanism** - rewriting every lock on every algorithm change is the mass-churn the design exists to prevent. Rejected for the steady state. 
The *reset* is a deliberate, one-time, operator-driven eager pass riding an already-scheduled rebuild - the sanctioned exception, not the rule; `component migrate` is its post-reset equivalent for retiring an old version. +- **Runtime reflective walker for field selection (instead of generated functions).** One generic `project(cfg, N)` reflects the struct at hash time and emits the fields whose version-set includes N. Least code, and it shares the tag syntax with the chosen approach. Rejected: it reflects the *live* struct at hash time - Problem 6 one layer down - so its frozen-ness rests entirely on golden-vector coverage (test discipline), and field removal degrades from a compile error to a CI failure. Codegen keeps the same tags but moves the reflection to *generate* time and freezes the output as checked-in code, recovering the compile guarantee. +- **Hand-written per-version `projectVN` functions (instead of generating them from tags).** Each version gets a bespoke function with one explicit `emit`/`emitAlways` line per measured field. Same compile guarantees as codegen (removal won't compile, literal emit-key), but: membership is smeared across N function bodies; "bring a field back a few versions later" has no first-class expression (you re-add an `emit` line, nothing ties it to the field's earlier life); and the mandatory-decision and coverage properties need separate bookkeeping the tags otherwise carry. Codegen is the same runtime with declarative authoring - strictly preferable given the existing `go generate` infrastructure. +- **Per-field hash manifest in the lock (instead of one opaque token).** Store `{field → hash}` (à la `go.sum`) rather than a single `v:sha256:…` digest. 
*Genuine wins:* dropping a field becomes ignoring its manifest line - no projection kept alive for replay, so the **deprecate-then-delete two-step and the registry-retirement deadlock** (the append-only growth above) both vanish; and the stored-vs-stored historical comparators become structural set-diffs rather than version-blind string compares. *Why the opaque token still wins for azldev:* (1) the projection substrate **already** delivers additive immunity (G4) - the manifest's headline draw - so that advantage is moot, not additive; (2) the manifest does **not** kill the false-fresh hazard - an old lock has *no line* for a newly-measured input, so there is still no baseline to detect a change to it (the blind spot is relocated, not removed); (3) it makes *algorithm evolution* - the entire point of Part 2 - **harder**, needing per-field versioning where the token needs one integer for the whole algorithm; and (4) it bloats every lock to O(fields × components) (the well-known `go.sum` size cost). The manifest is the better tool for a *static* input set that mainly grows and shrinks; the opaque token + single version is the better tool for an *evolving hashing algorithm*, which is azldev's actual problem. The reset bakes the storage model in - token-vs-manifest is irreversible after PR B - and the retirement deadlock the manifest would have dissolved is instead answered by the floor-advance cadence above. ## Incremental delivery The reset (Part 1) must land as one coherent change at the dev→prod cutover; its pieces are independently reviewable but ship together because they all move the hash. -1. 
**PR A (substrate)**: the **projection generator** (`go generate`) — reads the version-set tags and emits the per-version `projectVN(cfg) []byte` functions (literal emits, sorted keys) plus golden-vector and coverage scaffolding — the canonical encoder (`canonicalBuf`, `emit`/`emitAlways`), the version-set tag parser, the frozen **TOML-key** emit rule, the `reflect.Value.IsZero()` omit-predicate, the `sha256` combiner, and the golden vectors. Generate-time guards: a fingerprinted field with **no tag** fails generation; the slimmed **exclusion ledger** and **dropped-fields ledger** replace the retired `TestAllFingerprintedFieldsHaveDecision` audit; **regeneration-idempotence** (CI `go generate` + `git diff --exit-code`) pins shipped versions. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. Tests: a field tagged `v2..*` is absent from generated `projectV1`; a `!` range emits at zero; a field with **no** `fingerprint` tag fails generation; a **nested** fingerprinted struct with a tagless field fails generation; deleting a field a retained `projectVN` names **fails to compile**; a **Go-field rename keeping the TOML key** yields a byte-identical digest; two fields colliding on one emit-key fail generation; the coverage oracle (by struct-reflection, not the tag) fails when a build-effective field is tagged too narrowly (`v1..v1` at current `v2`) and is not in the dropped-fields ledger; golden vectors pin v1; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser. -2. **PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. Lock format `Version` stays `1`, asserted by a named-constant test (`currentVersion == 1`) with a comment that the *content* version lives in the token prefix, not here — so a future format bump cannot silently break every historical read through `lockfile.Parse`. Ships at the cutover; absorbed by the scheduled rebuild. 
Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. -3. **PR C (Part 2 machinery)**: the **two-type token split** — `StoredToken` (parsed by the sole strict `ParseToken`: accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match; exposes `SameDigest` only) and `FreshToken` (from `ComputeIdentityAt`, exposes `Reconcile(stored) → {Fresh | Stale | RestampTo(v)}`, fails closed on its zero value), both **non-comparable** (`_ [0]func()`); the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **every** comparison and compute site through these types. The **current-tree** sites (via `FreshToken.Reconcile`): replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, `BuildDirtyChange`, and the second `ComputeIdentity` caller `bumpComponents` (`update.go`); plus the `computeCurrentFingerprint` (`sourceprep.go`) return-type cascade `string → FreshToken`. The **historical** sites (via `StoredToken.SameDigest`): `FindFingerprintChanges`, `changed.go`'s `classifyComponent`, **and `haveMatchingFingerprints`**. **`haveMatchingFingerprints` is security-load-bearing:** it gates the cache-poisoning integrity check (`if result.SourcesChange && haveMatchingFingerprints(...)` in `changed.go`). If only `classifyComponent` is converted and this site is missed, the first legitimate `v2` bump makes a version-only re-stamp compare unequal → the integrity violation is **never recorded → tamper evidence silently swallowed**. It must convert to digest-compare in the same PR. Resolution replay reserved (slot reuses `computeRes1`). 
**Ordering gate (CI-enforced):** `currentLockContentVersion > 1` is forbidden unless `BuildDirtyChange` already routes through `Reconcile` — otherwise registering `v2` makes every component read persistently dirty on every `render`/`build`. The gate is necessary but not sufficient (it does not prove `haveMatchingFingerprints` converted), so it is paired with a **named acceptance test**: `from="v1:sha256:X"`, `to="v2:sha256:X"` ⇒ `haveMatchingFingerprints` returns **true** — a missed conversion fails CI rather than silently disabling the integrity check. **Not fully inert:** this PR switches the live compares from raw-string to token-routed *on merge* — only the *registry dispatch* is dormant while just `v1` exists. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event and does **not** suppress `haveMatchingFingerprints`; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a token outside the `fingerprint` package fails to compile; a zero-value `FreshToken`/`StoredToken` fails closed; a historical site cannot construct a `FreshToken`; the registry `init()` panics on a `[minSupported,current]` gap. -4. **PR D (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) — add a field absent from `projectV1` and set it on one component; assert only that lock drifts and every other lock is byte-identical. +1. 
**PR A (substrate)**: the **projection generator** (`go generate`) - reads the version-set tags and emits the per-version `projectVN(cfg) []byte` functions (literal emits, sorted keys) plus golden-vector and coverage scaffolding - the canonical encoder (`canonicalBuf`, `emit`/`emitAlways`), the version-set tag parser, the frozen **TOML-key** emit rule, the **split omit-predicate** (scalar leaves `IsZero`, composites projected emptiness), the `sha256` combiner, and the golden vectors. Generate-time guards: a fingerprinted field with **no tag** fails generation; the slimmed **exclusion ledger** and **dropped-fields ledger** replace the retired `TestAllFingerprintedFieldsHaveDecision` audit; **regeneration-idempotence** (CI `go generate` + `git diff --exit-code`) pins shipped versions. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. Tests: a field tagged `v2..*` is absent from generated `projectV1`; a `!` range emits at zero; a field with **no** `fingerprint` tag fails generation; a **nested** fingerprinted struct with a tagless field fails generation; deleting a field a retained `projectVN` names **fails to compile**; a **Go-field rename keeping the TOML key** yields a byte-identical digest; two fields colliding on one emit-key fail generation; a `!`-tagged nested struct whose every child is `-` (so its mandatory `!`-zero discrimination vector is unsatisfiable) is rejected as degenerate; the coverage oracle (by struct-reflection, not the tag) fails when a build-effective field is tagged too narrowly (`v1..v1` at current `v2`) and is not in the dropped-fields ledger; golden vectors pin v1; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser. +2. **PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. 
Lock format `Version` stays `1`, asserted by a named-constant test (`currentVersion == 1`) with a comment that the *content* version lives in the token prefix, not here - so a future format bump cannot silently break every historical read through `lockfile.Parse`. Ships at the cutover; absorbed by the scheduled rebuild. The `hashstructure` import and its `go.mod` entry are removed here, since no caller survives the switch. Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. +3. **PR C (Part 2 machinery)**: the **two-type token split** - `StoredToken` (parsed by the sole strict `ParseToken`: accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match; exposes `SameDigest` only) and `FreshToken` (from `ComputeIdentityAt`, exposes `Reconcile(stored) → {Fresh | Stale | RestampTo(v)}`, fails closed on its zero value), both **non-comparable** (`_ [0]func()`); the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **every** comparison and compute site through these types. The **current-tree** sites (via `FreshToken.Reconcile`): replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, `BuildDirtyChange`, and the second `ComputeIdentity` caller `bumpComponents` (`update.go`); plus the `computeCurrentFingerprint` (`sourceprep.go`) return-type cascade `string → FreshToken`. The **historical** sites (via `StoredToken.SameDigest`): `FindFingerprintChanges`, `changed.go`'s `classifyComponent`, **and `haveMatchingFingerprints`**. **`haveMatchingFingerprints` is security-load-bearing:** it gates the cache-poisoning integrity check (`if result.SourcesChange && haveMatchingFingerprints(...)` in `changed.go`). 
If only `classifyComponent` is converted and this site is missed, the first legitimate `v2` bump makes a version-only re-stamp compare unequal → the integrity violation is **never recorded → tamper evidence silently swallowed**. It must convert to digest-compare in the same PR. Resolution replay reserved (slot reuses `computeRes1`). **Ordering gate (CI-enforced):** `currentLockContentVersion > 1` is forbidden unless `BuildDirtyChange` already routes through `Reconcile` - otherwise registering `v2` makes every component read persistently dirty on every `render`/`build`. The gate is necessary but not sufficient (it does not prove `haveMatchingFingerprints` converted), so it is paired with a **named acceptance test**: `from="v1:sha256:X"`, `to="v2:sha256:X"` ⇒ `haveMatchingFingerprints` returns **true** - a missed conversion fails CI rather than silently disabling the integrity check. **Not fully inert:** this PR switches the live compares from raw-string to token-routed *on merge* - only the *registry dispatch* is dormant while just `v1` exists. 
Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event and does **not** suppress `haveMatchingFingerprints`; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a token outside the `fingerprint` package fails to compile; a zero-value `FreshToken`/`StoredToken` fails closed; a historical site cannot construct a `FreshToken`; the registry `init()` panics on a `[minSupported,current]` gap; a named `classifyComponent({name:"v1:sha256:X"}, {name:"v2:sha256:X"}) == Unchanged` (the third raw historical compare, with no CI gate of its own); a `BuildDirtyChange(v2-token, headLock-v1-same-digest) == nil` (a `RestampTo` must not mint a dirty synthetic commit; the existing "not a changelog event" test exercises `FindFingerprintChanges`, not this path); a malformed token round-trips its **original raw bytes** through `MarshalText`, so a malformed lock is never rewritten on save (no spurious `FindFingerprintChanges` event). +4. **PR D (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) - add a field absent from `projectV1` and set it on one component; assert only that lock drifts and every other lock is byte-identical. 5. **PR E (config schema axis, later)**: `schema-version` field + load-time canonical migration + the `config migrate` command. Gated on the first post-reset non-additive TOML change not already absorbed by the reset's normalization pass. -6. **PR F (forced lock migration, gated on the first floor raise)**: the `component migrate` command (the only sanctioned floor-raise; the prescribed fix for a build-critical newly-measured input) and the CI spread-ceiling on `currentLockContentVersion − minSupportedLockContentVersion`. 
**Gating:** a `v2` bump *without* PR F is safe — v1 stays in the registry and the floor stays at 1, so unmigrated locks still replay. PR F is required only before **raising `minSupportedLockContentVersion` above 1** (retiring v1), since that is what makes un-migrated locks unreplayable. A CI gate forbids raising the floor unless `component migrate` exists. So PR F is decoupled from the first `v2` and gated on the first floor raise. +6. **PR F (forced lock migration, gated on the first floor raise)**: the `component migrate` command (the only sanctioned floor-raise; the prescribed fix for a build-critical newly-measured input) and the CI spread-ceiling on `currentLockContentVersion - minSupportedLockContentVersion`. **Gating:** a `v2` bump *without* PR F is safe - v1 stays in the registry and the floor stays at 1, so unmigrated locks still replay. PR F is required only before **raising `minSupportedLockContentVersion` above 1** (retiring v1), since that is what makes un-migrated locks unreplayable. A CI gate forbids raising the floor unless `component migrate` exists. So PR F is decoupled from the first `v2` and gated on the first floor raise **or** the first content-version bump whose decision rule demands immediate fleet-wide adoption (a build-critical newly-measured input, which cannot wait for lazy migration). -Each PR is independently revertible up to the cutover. PRs A–B land together at the dev→prod cutover (they move every hash and are absorbed by the scheduled rebuild); PR C is inert until the first post-reset algorithm change; PR D follows; PR E is gated on the first post-reset schema change, PR F on the first floor raise. +Each PR is independently revertible up to the cutover. PRs A-B land together at the dev→prod cutover (they move every hash and are absorbed by the scheduled rebuild); PR C is inert until the first post-reset algorithm change; PR D follows; PR E is gated on the first post-reset schema change, PR F on the first floor raise. 
## Open questions @@ -676,14 +694,14 @@ Indexed here so they are not re-litigated; each is argued in full at the linked | Reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover | §The opportunity | | Substrate is canonical projection (generated `projectVN` + golden vectors), not `hashstructure` | [§Substrate options](#substrate-options) | | Field selection is **codegen** from mandatory per-field version-set tags (absent ⇒ generation fails); `go generate` emits the per-version `projectVN` | [§Version-tagged field selection](#version-tagged-field-selection) | -| Emit-key = frozen TOML key (`key=` override; duplicate keys fail generation); omit-predicate = `reflect.Value.IsZero()` (composite omission by *projected* emptiness) | [§Version-tagged field selection](#version-tagged-field-selection) | +| Emit-key = frozen TOML key (`key=` override; duplicate keys fail generation); omit-predicate splits - scalar leaves `IsZero`, composites *projected* emptiness | [§Version-tagged field selection](#version-tagged-field-selection) | | Tag DSL frozen at three range-operators (`..` `!` `*`) plus the orthogonal `key=` | [§Version-tagged field selection](#version-tagged-field-selection) | -| Canonical byte encoding = existing length-prefixed `:=:`; maps sorted-key; per-type value slots — pinned irreversibly at the reset | §The projection substrate | +| Canonical byte encoding = existing length-prefixed `:=:`; maps sorted-key; per-type value slots - pinned irreversibly at the reset | §The projection substrate | | Frozen-ness = compiler + generator + regeneration-idempotence; golden-vector coverage (tag-independent dropped-fields oracle) is the backstop; exclusion ledger kept for `-` fields | [§Golden-vector coverage](#golden-vector-coverage-the-backstop) | | Stored hash = atomic `v:sha256:` token; lock format `Version` stays `1`; sub-floor/downgraded tokens reconciled by force-rehash | §The lock changes at the reset | | Stored hash read only 
through the two-type token split (`StoredToken`/`FreshToken`, non-comparable), adopted in PR C | [§Downstream consumers](#downstream-fingerprint-consumers-blast-radius) | | Version write-guard required (refuse to write above the binary's `currentLockContentVersion`); CI version-pin blocks old-binary commits | [§Registry floor](#registry-floor-and-forced-migration) | -| Back-compat: no reader recomputes a historical fingerprint (synthetic history / overlays read stored strings only) | [§Back-compat invariant](#back-compat-invariant--synthetic-history-reads-stored-strings-never-recomputes) | +| Back-compat: no reader recomputes a historical fingerprint (synthetic history / overlays read stored strings only) | [§Back-compat invariant](#back-compat-invariant-synthetic-history-reads-stored-strings-never-recomputes) | | Registry retention is a floor, not "last N"; `component migrate` is the forced-migration pass (a deliberate release-grade event) | [§Registry floor](#registry-floor-and-forced-migration) | | One content version, stored only in `InputFingerprint`'s prefix; `ResolutionInputHash` stays bare; resolution replay reserved | [§Both hashes share one version](#both-hashes-share-one-version) | | Historical comparators compare the digest (strip `v:`), so version-only re-stamps mint no release | [§Synthetic changelog path](#the-synthetic-changelogrelease-path-is-the-real-hazard) | From 2c49876b372fc2a91b03dee50c20a72a7a0cafa2 Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Wed, 10 Jun 2026 16:15:49 -0700 Subject: [PATCH 13/15] update 10 --- docs/developer/rfc/lazy-schema-migration.md | 72 ++++++++++----------- 1 file changed, 36 insertions(+), 36 deletions(-) diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md index d689d748..9d8273bb 100644 --- a/docs/developer/rfc/lazy-schema-migration.md +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -160,7 +160,7 @@ The recurring requirement across the "No" rows is the 
same: **distinguish a chan Two substrates can produce a content fingerprint of the resolved config. The difference that matters here is **whether an old algorithm function can be frozen.** - **`hashstructure` + `Includable` (rejected as the substrate).** Keeps existing hashes byte-identical and gives per-field omission via `HashInclude`. But, as established above (Problem 6), a function built on `hashstructure.Hash` reflects over the live struct and method set, so it cannot be a frozen historical algorithm. It also requires a value-receiver `HashInclude` on *every* nested fingerprinted struct and a subtle `v.(reflect.Value)` type-assert to work at all - brittle plumbing in service of a substrate that still can't host sound replay. -- **Canonical projection + stdlib hash (chosen).** Split the two jobs `hashstructure` fuses - *field selection* and *hashing* - into explicit steps. Field selection is **declared per field** as a version-set in the `fingerprint` tag (`fingerprint:"v1..*"`); a `go generate` step emits a per-version `projectVN(cfg) []byte` function that serializes the fields whose set includes version N in a canonical, sorted, self-delimiting byte form, and an stdlib `sha256` hashes those bytes. Because a shipped `projectVN` is frozen checked-in code, it does not see fields added later, does not depend on the type's method set, and does not depend on receiver subtleties. It is a genuinely frozen pure function of `(cfg)` per version - the property replay requires. The cost is owning the generator and **golden hash vectors** per version (a checked-in `(config, version) → hash` table) so the generator itself is CI-backstopped. +- **Canonical projection + stdlib hash (chosen).** Split the two jobs `hashstructure` fuses - *field selection* and *hashing* - into explicit steps. 
Field selection is **declared per field** as a version-set in the `fingerprint` tag (`fingerprint:"v1..*"`); a `go generate` step emits a per-version `projectVN(cfg) []byte` function that serializes the fields whose set includes version N in a canonical, sorted, self-delimiting byte form, and a stdlib `sha256` hashes those bytes. Because a generated `projectVN` is checked-in code, a *superseded* version (one the next version has replaced) does not see fields added later, does not depend on the type's method set, and does not depend on receiver subtleties. It is a genuinely frozen pure function of `(cfg)` per version - the property replay requires. The cost is owning the generator and **golden hash vectors** per version (a checked-in `(config, version) → hash` table) so the generator itself is CI-backstopped. The projection substrate is what makes G4 true for old locks and what makes Part 2's replay sound. It is adopted at the reset (below), not incrementally. @@ -221,7 +221,7 @@ The cost is owning the projection encoder and the golden vectors. That cost is p Field membership in each version's projection is declared **on the struct field**, as a version-set in the existing `fingerprint` tag. A `go generate` step reads those tags and **emits** a per-version `projectVN(cfg) []byte` function - the tags are the source of truth, the generated functions are the artifact. This is the chosen mechanism; a runtime reflective walker and hand-written functions are the [alternatives](#alternatives-considered).
-**Grammar** (deliberately small): +**Grammar** (core shape; the orthogonal `key=` override is omitted here for clarity and folded into the complete, authoritative grammar below): ```ebnf tag = "-" | member, { ",", member } ; @@ -271,7 +271,7 @@ Every frozen output is byte-preserved, and the **golden vectors prove it**: the **Escape hatch - the registry already is one; a per-field one is deferred.** The whole-function hatch is free: the registry is `map[int]computeFn` and does not care whether an entry was generated or hand-written, so a version the generator cannot express is simply *not generated* - you drop a hand-written `computeFPN` into the map instead. No new mechanism. A *per-field* hatch (massaging one field's encoding inside an otherwise-generated function - e.g. `fingerprint:"v1..*,enc=sortedSlice"`) is **deliberately not built now**: custom encoding is an *encoding* concern, which the rule above already routes through a versioned-code bump, and adding an `enc=` operator is an RFC-grade grammar change (the grammar is frozen at three range-operators + `key=`). Note the cost either way: a hand-written or hand-edited version drops back to **golden-vectors-only** - it loses regeneration-idempotence and the generator's completeness/coverage guards - so the hatch is for rare, deliberate cases, not routine use. -> **What codegen freezes structurally, and what golden vectors backstop.** Generation reflects the live struct - but at *build time*, and its output is **frozen checked-in code**, so the runtime projection never reflects the live struct (the way the rejected `hashstructure` substrate did at hash time, Problem 6). Three things the tag alone does not pin are pinned by the generated code instead: the **emit-key** (a literal string), **field membership** (a literal field list per version), and **field removal** (a retained `projectVN` references the field by Go path, so deleting it won't compile). 
Two things the compiler cannot independently judge - the **per-field encoding/type** (whether the emitted bytes are *right*) and the **zero-predicate** - are caught by regeneration-idempotence (CI runs `go generate`; any diff to a retained `projectVN` fails) and ultimately by **golden vectors**, which catch a generator bug that would move a shipped version's bytes. Golden-vector coverage is therefore a *backstop* behind compiler + generator, not the sole load-bearing guarantee - the design keeps its structural guarantees structural. +> **What codegen freezes structurally, and what golden vectors backstop.** Generation reflects the live struct - but at *build time*, and its output is **frozen checked-in code**, so the runtime projection never reflects the live struct (the way the rejected `hashstructure` substrate did at hash time, Problem 6). Three things the tag alone does not pin are pinned by the generated code instead: the **emit-key** (a literal string), **field membership** (a literal field list per version), and **field removal** (a retained `projectVN` references the field by Go path, so deleting it won't compile). Two things the compiler cannot independently judge - the **per-field encoding/type** (whether the emitted bytes are *right*) and the **zero-predicate** - are caught by regeneration-idempotence (CI runs `go generate`; any diff to a retained `projectVN` fails) and ultimately by **golden vectors**, which catch a generator bug that would move a retained version's bytes. Golden-vector coverage is therefore a *backstop* behind compiler + generator, not the sole load-bearing guarantee - the design keeps its structural guarantees structural. 
**Enforcement**, in order of strength: @@ -282,7 +282,7 @@ Every frozen output is byte-preserved, and the **golden vectors prove it**: the #### Golden-vector coverage: the backstop -Compiler + generator + regeneration-idempotence carry the structural load; golden vectors are the semantic backstop that catches a *generator* bug moving a shipped version's bytes. Two properties make the backstop structural rather than discipline: +Compiler + generator + regeneration-idempotence carry the structural load; golden vectors are the semantic backstop that catches a *generator* bug moving a retained version's bytes. Two properties make the backstop structural rather than discipline: - **Expected digests are hand-authored and never generator-emitted.** If the `(config, version) → digest` table were regenerated in lockstep with the projector code, a "delete-everything-and-regenerate" commit would move *both* the code and its own expected values, and the backstop would silently agree with itself. The expected digests are therefore hand-committed; `go generate` may scaffold *cases* but must never write the expected values. A mutation to any retained vector is a hard CI failure, not a moved line a reviewer must notice. - **Retained-version manifest.** The generator validates each retained version against a checked-in manifest (the field set + emit-keys that version measures); generation fails if a retained version's entry lacks a compatible live path, *unless* it is below `minSupportedLockContentVersion`. This is what keeps field-removal structural under the normal *delete + regenerate + commit* workflow (where the compile guard alone would be bypassed) - so the negative test is **delete + regenerate + build**, not just delete. @@ -297,8 +297,8 @@ Non-zero coverage alone is necessary but not sufficient; four more obligations c - **`!`-zero behavior.** Dropping `!` silently stops emitting a build-meaningful zero (G5). 
Every retained `!` range needs a **zero-valued** discrimination vector. - **Encoding across the value space.** A `"foo"` vector misses an encoder change affecting only delimiter bytes, multibyte runes, or multi-entry slices/maps. Add per-encoder **property/fuzz** vectors. (Fails toward G1 over-drift except under a collision the length-prefixed form makes unlikely.) -- **nil-vs-empty is a resolver invariant - slices *and* maps.** Under `IsZero()` a nil slice/map omits and a non-nil empty one (`[]`, `{}`) emits, and the resolver's `mergo.Merge(…, WithOverride, WithAppendSlice)` ([`component.go`](../../../internal/projectconfig/component.go) `MergeUpdatesFrom`, `ResolveComponentConfig`) can yield *either* for the same intent depending on merge order, with **no post-merge normalization today**. This is the one correctness assumption the design *preserves rather than proves*. PR A closes it structurally: (a) a single named chokepoint - `canonicalizeForFingerprint(cfg)` at the **end of `ResolveComponentConfig`** - owns nil-vs-empty normalization, so it is one enforced place, not a convention scattered across merge sites; (b) it carries an **inventory** of every fingerprint-sensitive slice/map field; and (c) its canonical-form test is written **first - before any golden vector is authored**, or a vector bakes in a non-deterministic encoding. (This and the scalar-slice row of the encoding table are the same question - settle them together.) This is the single most load-bearing PR-A gate. -- **`!` on an all-zero nested struct emits.** `IsZero()` on a struct is true iff every sub-field is zero, so a `!`-tagged nested struct whose fields all resolve to zero would otherwise be omitted; the generated sub-projector treats a `!` range as "emit the (recursively projected) value even when the struct `IsZero`," so a build-meaningful all-zero struct still hashes. Covered by a zero-valued discrimination vector like any other `!` range. 
+- **nil-vs-empty is a resolver invariant - scalar slices only.** It bites only where `IsZero` distinguishes nil from non-nil-empty *and* the predicate is plain `IsZero`: i.e. **scalar slices** (`[]string`). Composites (maps, slice-of-struct, nested structs) omit by *projected emptiness*, under which a nil and an empty container both project no bytes, so they need no normalization (maps are swept in only defensively). For scalar slices, the resolver's `mergo.Merge(…, WithOverride, WithAppendSlice)` ([`component.go`](../../../internal/projectconfig/component.go) `MergeUpdatesFrom`, `ResolveComponentConfig`) can yield nil *or* `[]` for the same intent depending on merge order, with **no post-merge normalization today**. This is the one correctness assumption the design *preserves rather than proves*. PR A closes it structurally: (a) a single named chokepoint - `canonicalizeForFingerprint(cfg)` at the **end of `ResolveComponentConfig`** - collapses every fingerprint-sensitive scalar slice to one canonical form (**nil**), so it is one enforced place, not a convention scattered across merge sites; (b) a **CI cross-check binds that chokepoint's scalar-slice inventory to the generator's enumerated scalar-slice measured-field set** (the same anti-staleness pattern as the coverage corpus), so a newly-added scalar slice cannot silently miss normalization; and (c) its canonical-form test is written **first - before any golden vector is authored**, or a vector bakes in a non-deterministic encoding. **Every omit predicate runs *after* `canonicalizeForFingerprint`**, so the projector never sees a non-canonical slice. This is the single most load-bearing PR-A gate. 
+- **`!` on a projected-empty nested struct emits.** A `!`-tagged nested struct whose *measured* children all resolve to zero would otherwise be omitted by projected emptiness (it emits no measured bytes); the generated sub-projector treats a `!` range as "emit the recursively projected value even when the sub-projector emits no measured bytes," so a build-meaningful all-measured-zero struct still hashes - **including the case where only an excluded (`-`) child is non-zero**, which a global `IsZero` predicate would mis-handle (the struct reads raw-non-zero, so the `!` trigger keyed on `IsZero` would never fire). Covered by a discrimination vector whose measured children are all zero - one variant with a non-zero excluded child - asserting the `!` struct still emits. - **Enumerator completeness.** The coverage corpus is checked against the **generator's own field enumeration** (above), so it cannot drift from what is measured - a newly-added field or nested struct that the generator reaches but the corpus does not cover fails the backstop, not just the generator. ### Baseline v1: omit-if-zero, no include-always legacy @@ -330,7 +330,7 @@ func projectV1(c *ComponentConfig) []byte { **The generated encoding contract - frozen per version, fully specified before PR A.** The golden vectors bake the byte encoding in irreversibly at the reset, so every value type's serialization is a one-way door and must be pinned now, not discovered later. The contract: -- **Composite omission is by *projected* emptiness, not raw `IsZero()`.** A nested struct or a map/slice-of-struct entry is `reflect.Value.IsZero()` only when **every** sub-field is zero - *including excluded (`fingerprint:"-"`) children* - so a global `IsZero()` predicate would leak: a measured composite whose only non-zero content is an excluded child is not `IsZero`, so it would emit, and the digest would move on a change that touched no *measured* input. 
**This is a hazard the generator must design out, not a current bug:** today `ComponentConfig.Build` holds only the already-excluded `Failure`/`Hints` (both `fingerprint:"-"`), and today's `hashstructure` drops them *in its own walk*, so setting `build.failure.expected` moves no hash today - the leak appears only if the new projector naively reflects the parent through one global `IsZero()`. The rule that forecloses it: a composite is omitted when its **frozen sub-projector emits no measured bytes** (unless tagged `!`), *not* when the raw value `IsZero`. So the predicate **splits by kind: scalar leaves (including scalar slices like `[]string`) keep plain `IsZero()`; composites (nested struct, map, slice-of-struct) use projected emptiness** - see the **v1 encoding table** below for the per-type rule, including where slices and maps fall. The coverage backstop gains the **inverse** check it was missing: a **negative discrimination vector** per `-` field (and per all-`-`-value map entry) that varies that field alone and asserts the digest does **not** move. +- **Composite omission is by *projected* emptiness, not raw `IsZero()`.** A nested struct or a map/slice-of-struct entry is `reflect.Value.IsZero()` only when **every** sub-field is zero - *including excluded (`fingerprint:"-"`) children* - so a global `IsZero()` predicate would leak: a measured composite whose only non-zero content is an excluded child is not `IsZero`, so it would emit, and the digest would move on a change that touched no *measured* input. 
**This is a hazard the generator must design out, not a current bug:** today `ComponentConfig.Build`'s only *excluded* children are `Failure`/`Hints` (both `fingerprint:"-"`) - they sit alongside five measured children (`With`, `Without`, `Defines`, `Undefines`, `Check`) - and today's `hashstructure` drops the excluded pair *in its own walk*, so setting only `build.failure.expected` moves no hash today; the leak would appear only if the new projector naively reflected the parent through one global `IsZero()`. The rule that forecloses it: a composite is omitted when its **frozen sub-projector emits no measured bytes** (unless tagged `!`), *not* when the raw value `IsZero`. So the predicate **splits by kind: scalar leaves (including scalar slices like `[]string`, which the resolver chokepoint first collapses to canonical nil) keep plain `IsZero()`; composites (nested struct, map, slice-of-struct) use projected emptiness** - see the **v1 encoding table** below for the per-type rule, including where slices and maps fall. The coverage backstop gains the **inverse** check it was missing: a **negative discrimination vector** per `-` field (and per all-`-`-value map entry) that varies that field alone and asserts the digest does **not** move. - **Maps emit in sorted-key order; an entry whose value projects empty still emits its key.** A naive `range` over a Go map is **non-deterministic** (randomized iteration), so a generated `b.emitMap` must sort entries by key and emit each as `:=:` under the field key - the one guarantee `hashstructure` gave for free that the projection must re-establish, else an unchanged config hashes differently across runs (intermittent spurious drift). **Map-key membership is itself measured:** an entry whose *value* projects to empty still emits its key, so `{"baz":{}}` ≠ `{}` (matching today's `hashstructure`, which hashes map keys). 
Tests: a fuzz vector (≥2 keys, varying insert order → identical digest) **and** a key-varying vector (add an empty-value entry → digest moves). The natural "set a non-zero value" vector would *not* exercise this, so it must be written explicitly. - **Value-slot encoding is defined per type, not left to `%v`.** `bool` → `"true"`/`"false"`; integers → base-10; `[]T` → each element as its own length-prefixed sub-value in slice order (not a JSON blob); `map` → as above. A **named scalar type** (e.g. `fileutils.HashType`, `SpecSourceType`, `ComponentOverlayType`, `ReleaseCalculation`) encodes by its **underlying `reflect.Kind`** (named string/int/bool → the underlying kind) - these are measured fields, so they must *not* fail generation. Only genuinely un-encodable shapes (interfaces, generics, pointers to external types, `time.Time`/`[]byte`-style special-cases not present in today's measured graph) **fail generation** rather than fall back to a `fmt`-style encoding a dependency could change underneath us. The v1 encoding test enumerates every named scalar in the measured graph. - **Nested struct values are emitted by a frozen per-version sub-projector, never by runtime reflection.** If a generated `projectV1` emitted a `[]ComponentOverlay` by delegating to a *live* reflective encoder, adding a field to `ComponentOverlay` later would change `projectV1`'s output at hash time - Problem 6 reborn one layer down. The generator therefore emits a literal per-version projector for each nested struct type too; element/value projectors are frozen exactly like top-level ones. 
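The map and slice rules in the contract above can be sketched in a few lines. The names `canonicalBuf`, `emit`, and `emitMap` come from this RFC, but their signatures here are assumptions, `lp`, `emitSlice`, and `digestOf` are hypothetical helpers, and the record layout (decimal length prefixes with `=` and `;` separators) is chosen for illustration only, not the frozen v1 byte form. The sketch shows the two properties the tests call for: insertion order does not move the digest, and adding an entry with an empty value does.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// canonicalBuf accumulates self-delimiting records (illustrative layout).
type canonicalBuf struct{ sb strings.Builder }

// lp returns s with a decimal length prefix, so delimiter bytes inside s
// cannot be confused with record structure.
func lp(s string) string { return strconv.Itoa(len(s)) + ":" + s }

// emit writes one length-prefixed key/value record.
func (b *canonicalBuf) emit(key, value string) {
	b.sb.WriteString(lp(key) + "=" + lp(value) + ";")
}

// emitMap writes entries in sorted-key order. An entry with an empty value
// still emits its key, so key membership is measured; an empty map emits nothing.
func (b *canonicalBuf) emitMap(field string, m map[string]string) {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		b.emit(field, lp(k)+"="+lp(m[k]))
	}
}

// emitSlice writes each element as its own length-prefixed sub-value, in
// slice order. A nil or empty slice emits nothing.
func (b *canonicalBuf) emitSlice(field string, items []string) {
	if len(items) == 0 {
		return
	}
	var v strings.Builder
	for _, item := range items {
		v.WriteString(lp(item))
	}
	b.emit(field, v.String())
}

// digestOf hashes the canonical bytes of two example fields.
func digestOf(defines map[string]string, patches []string) string {
	var b canonicalBuf
	b.emitMap("defines", defines)
	b.emitSlice("patches", patches)
	sum := sha256.Sum256([]byte(b.sb.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	first := map[string]string{"foo": "1", "bar": "2"}
	second := map[string]string{}
	second["bar"] = "2"
	second["foo"] = "1"

	// Same entries, different insertion order: identical digest.
	fmt.Println(digestOf(first, nil) == digestOf(second, nil)) // prints true

	// An empty-valued entry is still a measured key.
	empty := map[string]string{}
	withKey := map[string]string{"baz": ""}
	fmt.Println(digestOf(empty, nil) == digestOf(withKey, nil)) // prints false
}
```

Length-prefixing each slice element is what keeps `["a", "b"]` and `["ab"]` apart without relying on a separator that could appear inside a value.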
@@ -342,9 +342,9 @@ func projectV1(c *ComponentConfig) []byte { | --- | --- | --- | --- | | `string` (`upstream`) | raw bytes | `IsZero` (`""`) | scalar leaf | | `bool` (`strip-debug`) | `true` / `false` | `IsZero` (`false`), **unless `!`** | `!` emits the build-meaningful `false` | -| `int` (`manual-bump`) | base-10 | `IsZero` (`0`) | scalar leaf | +| `int` (forward-looking) | base-10 | `IsZero` (`0`) | scalar leaf; no plain `int` in the v1 measured graph (`manual-bump` is a combiner input, not projected) | | named scalar (`SpecSourceType`, `fileutils.HashType`, `ComponentOverlayType`, `ReleaseCalculation`) | by **underlying `reflect.Kind`** | underlying `IsZero` | measured - must **not** fail generation | -| `[]string` (`patches`) | length-prefixed elements, slice order | nil **or** empty | **scalar slice**: membership measured; nil≡`[]` collapsed by the resolver chokepoint (below), then pinned by a golden vector | +| `[]string` (`patches`) | length-prefixed elements, slice order | canonical nil (post-`canonicalizeForFingerprint`) | **scalar slice**: the resolver chokepoint (above) collapses nil≡`[]` to nil, then plain `IsZero` omits it; pinned by a golden vector. 
Tag `!` (exempt from collapse) to keep an explicit empty build-meaningful | | `[]Struct` (`[]ComponentOverlay`) | each element via its frozen sub-projector | no element projects bytes | **composite slice**: each element kept/dropped by *its own* projected emptiness | | `map[string]string` (`defines`) | sorted-key `:k=:v` | no entries | key membership measured (`{"k":""}` ≠ `{}`) | | `map[string]Struct` (`map[string]PackageConfig`) | sorted-key; value via frozen sub-projector | no entries | **excluded `-` in v1**; if ever included, this row governs | @@ -352,7 +352,7 @@ func projectV1(c *ComponentConfig) []byte { | `*Struct` | follow if non-nil; element via sub-projector | nil, or points to a projected-empty value | | | interface / type param / `func` / `chan` / pointer-to-external / `time.Time`·`[]byte`-style | - | - | **fails generation** (no silent `fmt` fallback); none present in the v1 graph | -**Cost of pruning at `-`, and its tripwire (G5 guard).** Excluding a composite also removes its subtree from the completeness walk, so a *future* build-effective field added under an excluded type would be **silently unmeasured**. `Packages map[string]PackageConfig` is excluded today because `PackageConfig` holds only the publish-only, `-`-tagged `Publish` - correct now, but `PackageConfig` is documented as growable. You cannot both kill the key-churn (needs excluding the map) *and* keep the per-leaf guard alive (needs an included edge), so the exclusion carries an **external tripwire**: a CI test asserting the excluded type's field set stays within its known-inert set (a new `PackageConfig` field fails CI → forces re-evaluation of the parent exclusion), recorded against its exclusion-ledger entry. +**Cost of pruning at `-`, and its tripwire (G5 guard).** Excluding a composite also removes its subtree from the completeness walk, so a *future* build-effective field added under an excluded type would be **silently unmeasured**. 
`Packages map[string]PackageConfig` is measured today (untagged); it **will be excluded at the reset** (load-out item 7), and once excluded its subtree leaves the completeness walk. The exclusion is sound because `PackageConfig` holds only the publish-only, `-`-tagged `Publish` - but `PackageConfig` is documented as growable. You cannot both kill the key-churn (needs excluding the map) *and* keep the per-leaf guard alive (needs an included edge), so the exclusion carries an **external tripwire**: a CI test asserting the excluded type's field set stays within its known-inert set (a new `PackageConfig` field fails CI → forces re-evaluation of the parent exclusion), recorded against its exclusion-ledger entry. **Why omit-if-zero is safe - fingerprints see the resolved config.** The usual objection to blanket omit-if-zero is the false-negative footgun: a field whose zero is meaningful gets omitted and collides with "unset," so two semantically different configs hash the same and a rebuild is missed. That objection assumes we hash *raw user input*. We do not. `ComputeIdentity` runs on the **resolved, post-merge** config (`*result.config`, after defaults are applied). The omit predicate is therefore "the *resolved value* equals Go-zero," not "the user didn't type it." Consequences: @@ -368,7 +368,7 @@ So the classic false-negative requires absence ≠ zero-default *at the point of The omit predicate **splits by kind** - scalar leaves use `reflect.Value.IsZero()`, composites (nested struct, map, slice-of-struct) use projected emptiness (the encoding contract above is the single source of truth); `!` is the only per-field override. A few `IsZero` consequences on the **scalar leaves** still need stating, because `IsZero` is type-specific: - **Meaningful zero with a non-zero default** (e.g. `int Jobs` defaulting to `4`, where `0` means serial). Post-merge: unset → `4` (emitted), explicit `0` → omitted. 
These build differently *and* hash differently, so there is no collision - they are consistent. Use a `!` range only if a zero value must be distinguishable from a future change of default. -- **nil vs empty slice - they hash *differently* under `IsZero`.** A nil slice is zero → omitted; a non-nil empty slice (`[]`) is **not** zero → emitted. If post-merge resolution can produce *either* nil or `[]` for the same intent, that ambiguity would move a hash - so the rule is: **resolution must normalize to one canonical form**, and where an explicit-empty value is build-meaningful and reachable, tag the field `!` so nil and empty both emit and stay distinguishable. This is a constraint on the resolver, pinned by a golden vector, not a free-for-all. +- **nil vs empty scalar slice - canonicalized before projection.** Raw `IsZero` would treat a nil slice (omitted) and a non-nil empty `[]` (emitted) differently, and resolution can produce either for the same intent. The rule (stated once in [the coverage section](#golden-vector-coverage-the-backstop)): `canonicalizeForFingerprint` collapses every fingerprint-sensitive scalar slice to **nil** *before* the projector runs, so plain `IsZero` then omits it deterministically. Where an explicit-empty value is build-meaningful, tag the field `!` **and exempt it from collapse**, so nil and `[]` both emit and stay distinguishable. Pinned by a golden vector, enforced at the one chokepoint - not a per-call-site convention. ### The reset load-out: what to spend the free rebuild on @@ -468,15 +468,15 @@ This resolves Problems 2 (for default changes), 3 (hashing bugfixes), and 5 (pie We version them with **one shared integer**, not two axes, because: they co-locate in a single lock, they are written in the same `update` pass, and a paired registry lets either evolve independently while the other reuses its prior function. 
Two separate version axes would double the floor/replay/migrate machinery for an input set (`ResolutionInputHash`) that changes rarely - YAGNI. -**`InputFingerprint` is the sole prefix authority; `ResolutionInputHash` stays bare.** The shared version is physically stored **only** in `InputFingerprint`'s `v:` prefix. `ResolutionInputHash` carries **no prefix** - it remains a bare `sha256:` digest. This is the decisive choice that prevents the *fingerprint-bump* desync: the **first fingerprint-only `v2`** already advances the shared prefix, and if `ResolutionInputHash` *also* carried it, `resolver.go`'s raw string compare of the whole field would see a prefix-only move (`v1:…X` → `v2:…X`, `computeRes1` unchanged) and mark resolution stale → fleet-wide re-resolution for nothing. With the prefix living only in `InputFingerprint`, the resolver compares a bare digest that does not move on a fingerprint bump. (This closes the fingerprint-bump direction only; the *symmetric* resolution-only-write desync stays dormant until `computeRes2` and is held by the structural tripwire in [`ResolutionInputHash`](#resolutioninputhash-bare-digest-replay-deferred).) The "shared version" therefore means: the integer in `InputFingerprint`'s prefix selects which `computeResN` produced `ResolutionInputHash` during replay (read from the one prefix), not that resolution stores its own copy. This also keeps `InputFingerprint` the only release-bearing field, so the historical changelog/classifier comparators - which compare the **digest**, stripping the `v:` prefix - never see a phantom move on a version-only re-stamp. (See [the synthetic-history path](#the-synthetic-changelogrelease-path-is-the-real-hazard).) +**`InputFingerprint` is the sole prefix authority; `ResolutionInputHash` stays bare.** The shared version is physically stored **only** in `InputFingerprint`'s `v:` prefix. `ResolutionInputHash` carries **no prefix** - it remains a bare `sha256:` digest. 
This is the decisive choice that prevents the *fingerprint-bump* desync: the **first fingerprint-only `v2`** already advances the shared prefix, and if `ResolutionInputHash` *also* carried it, `resolver.go`'s raw string compare of the whole field would see a prefix-only move (`v1:…X` → `v2:…X`, `computeRes1` unchanged) and mark resolution stale → fleet-wide re-resolution for nothing. With the prefix living only in `InputFingerprint`, the resolver compares a bare digest that does not move on a fingerprint bump. (This closes the fingerprint-bump direction only; the *symmetric* resolution-only-write desync stays dormant until `computeRes2` and is held by the structural tripwire in [`ResolutionInputHash`](#resolutioninputhash-bare-digest-replay-deferred).) The "shared version" therefore means: the integer in `InputFingerprint`'s prefix selects which `computeResN` produced `ResolutionInputHash` during replay (read from the one prefix), not that resolution stores its own copy. This also keeps `InputFingerprint` the only release-bearing field, so the historical changelog/classifier comparators - which compare the **digest**, stripping the `v:` prefix - never see a phantom move on a version-only re-stamp. (See [the synthetic changelog/release path](#the-synthetic-changelogrelease-path-is-the-real-hazard).) -**Phasing.** The atomic token format (`v:sha256:…`) is fixed at the reset. Fingerprint replay is wired in Part 2's first PR; **resolution-hash replay is reserved, not yet wired** - the slot exists and `computeRes1` is reused, so the day `ComputeResolutionHash` first changes we add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`). Because `ResolutionInputHash` is bare and prefix-free, a fingerprint-only bump before that day is a no-op for the resolver - the deferral is genuinely safe, not merely small-blast-radius. 
See [`ResolutionInputHash`](#resolutioninputhash-bare-digest-replay-deferred). +**Phasing.** The atomic token format (`v:sha256:…`) is fixed at the reset. Fingerprint replay is wired in Part 2's first PR; **resolution-hash replay is reserved, not yet wired** - the slot exists and `computeRes1` is reused, so the day `ComputeResolutionHash` first changes we add `computeRes2` and extend replay to its comparison sites (the resolution-staleness branch of `computeFreshnessStatus` + the `resHashChanged` silent-write guard in `updateResolutionHash`). Because `ResolutionInputHash` is bare and prefix-free, a fingerprint-only bump before that day is a no-op for the resolver - the deferral is genuinely safe, not merely small-blast-radius. See [`ResolutionInputHash`](#resolutioninputhash-bare-digest-replay-deferred). #### Churn-avoidance policies (G1) The version stamp is itself a potential source of spurious diffs - the exact thing G1 forbids. The rule that prevents it is one idea: **judge "changed?" by replaying the lock's *own* version, not the current one.** Everything below follows from that. -**Why the obvious approach is wrong.** Today `update.go` sets `result.Changed = true` the instant `lock.InputFingerprint != identity.Fingerprint`, where `identity` is computed at the **current** version. That comparison sits *upstream* of the write guard `if !result.Changed && !resHashChanged { return false, nil }`. So the moment you ship a v1→v2 *algorithm* change, the current-version hash differs from every stored v1 token, `Changed` flips for **~every component at once**, and you get the mass auto-release-bump + mass lock rewrite G1 exists to prevent. The version stamp cannot "harmlessly ride the `Changed` path" - it *triggers* it. +**The mass-churn trap.** Today `update.go` sets `result.Changed = true` the instant `lock.InputFingerprint != identity.Fingerprint`, where `identity` is computed at the **current** version. 
That comparison sits *upstream* of the write guard `if !result.Changed && !resHashChanged { return false, nil }`. So the moment you ship a v1→v2 *algorithm* change, the current-version hash differs from every stored v1 token, `Changed` flips for **~every component at once**, and you get the mass auto-release-bump + mass lock rewrite G1 exists to prevent. The version stamp cannot "harmlessly ride the `Changed` path" - it *triggers* it. **The fix: replay before you compare.** Recompute at the lock's recorded version first, and only call it changed if *that* disagrees: @@ -511,8 +511,8 @@ This is correct *by contract* (a v1 lock promises freshness under the v1 input s Lazy migration means an untouched lock can sit at an old version **indefinitely** (G3 by design). That makes "keep the last *N* versions" a **correctness cliff, not a tuning knob**: if pruning drops the compute function a lock still depends on, replay becomes impossible → forced `FreshnessStale` → the mass rebuild/rewrite (and, via the downstream-consumer analysis below, mass changelog churn) the whole design exists to avoid. So the floor must be explicit and paired with an escape hatch, decided now: - **`minSupportedLockContentVersion`** is a hard floor. A lock below it cannot be replayed and is treated as `Stale`. Dropping a registry entry is therefore a deliberate, breaking, announced act - never incidental cleanup. -- **`component migrate`** force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception - it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. 
**Contract:** it is *offline* - it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). It *does*, however, move every *fingerprint* digest when it retires a fingerprint algorithm, so a fleet-wide migrate of that kind **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved digest as notable, exactly as [the synthetic-history trap](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. (A migrate that retires only a *resolution* algorithm rewrites only the bare, prefix-free `ResolutionInputHash` - which `synthistory` never reads - so it is correctly release-silent.) Migrate is therefore rare: the release churn is the deliberate cost of retiring a version. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal - each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). -- **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine - left alone, the registry, golden vectors, and deprecated tombstone fields grow **append-only** (a real cost the opaque-token model accepts; see the manifest alternative). Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion - minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. The spread, not the absolute version number, is the quantity kept small. 
**Early-warning ramp:** the ceiling is a *warning at ceiling-1*, a hard failure only at the ceiling - so an approaching floor-raise surfaces as a heads-up on the PR *before* the one that registers `v(N+1)`, converting the forced migrate from a surprise blocking failure into a planned event (the design's goal that nothing *unplanned* ever forces a migrate). **Residual:** if genuine algorithm changes arrive *faster* than planned rebuilds, the ceiling still ultimately *forces* an unplanned, release-grade `component migrate`. The ceiling does not eliminate the expensive event; it bounds the backlog by *converting* an unbounded version spread into an occasional forced migrate, with one version of advance notice. This is the accepted cost of lazy-forever coexistence. +- **`component migrate`** force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception - it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. **Contract:** it is *offline* - it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). It *does*, however, move every *fingerprint* digest when it retires a fingerprint algorithm, so a fleet-wide migrate of that kind **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved digest as notable, exactly as [the synthetic changelog/release path](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. 
(A migrate that retires only a *resolution* algorithm rewrites only the bare, prefix-free `ResolutionInputHash` - which `synthistory` never reads - so it is correctly release-silent.) Migrate is therefore rare: the release churn is the deliberate cost of retiring a version. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal - each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). +- **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine - left alone, the registry and deprecated tombstone fields grow **append-only**, and the golden corpus grows **multiplicatively** (O(retained-versions × measured-fields)) (a real cost the opaque-token model accepts; see the manifest alternative). Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion - minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. The spread, not the absolute version number, is the quantity kept small. **Early-warning ramp:** the ceiling is a *warning at ceiling-1*, a hard failure only at the ceiling - so an approaching floor-raise surfaces as a heads-up on the PR *before* the one that registers `v(N+1)`, converting the forced migrate from a surprise blocking failure into a planned event (the design's goal that nothing *unplanned* ever forces a migrate). The ceiling and its ramp are installed by PR F; versions registered before PR F accumulate unguarded, which is acceptable because the floor stays at 1 and every version remains replayable until the first floor-raise. 
**Residual:** if genuine algorithm changes arrive *faster* than planned rebuilds, the ceiling still ultimately *forces* an unplanned, release-grade `component migrate`. The ceiling does not eliminate the expensive event; it bounds the backlog by *converting* an unbounded version spread into an occasional forced migrate, with one version of advance notice. This is the accepted cost of lazy-forever coexistence. **Mixed-toolchain hazard - bounded by the version-pin, not auto-repair.** The classic trap is an older binary regressing a newer lock. Because the lock *format* never bumps, an old binary *can* write a reset lock, stamping a legacy (prefix-less) or lower-`v` hash. In the **working tree** this is self-correcting: the next new-binary run detects the sub-floor token and force-rehashes it to the current version. But "self-correcting" stops at the working tree - if a downgraded lock is **committed**, `FindFingerprintChanges` reads `v1 → legacy → v1` as two real release events, and a published `%autorelease` increment cannot be withdrawn. So the load-bearing guard against *committed* phantom releases is the **CI version-pin**: post-cutover, no old binary may run the `update`-and-commit step. Concretely, that means the lock-writing CI job runs from a **pinned build image (by digest, not a floating tag)** rebuilt from the cutover commit or later, and **no other path reaches the `update`-and-commit step** - local developer binaries do not commit locks; only the pinned job does. (The force-rehash only cleans the working tree; it does not undo history.) The *symmetric* residual - a binary that predates content-version `v2` meeting a `v2` token it cannot replay - is closed by a **required** write-time guard: refuse to write a token whose version exceeds the binary's `currentLockContentVersion`, erroring rather than silently restamping at `v1`. 
Note this guard lives in the binary doing the write, so it constrains *newer-but-not-newest* binaries; it does **not** retroactively constrain a genuinely *old* binary - that direction is the version-pin's job. @@ -548,12 +548,12 @@ The reset establishes `projectV1` directly; it is *not* itself a Part 2 version This is the on-disk TOML axis. It is **independent** of the fingerprint axis and only needed once we make *non-additive* TOML changes (rename/move/remove fields in the file format itself) that were *not* already absorbed by the reset's normalization pass. Most of the hardest cases are spent at the reset (load-out item 6); this axis covers whatever non-additive TOML change arises *after*. 1. Add an explicit `schema-version` to the config file (distinct from the existing `$schema` URL, which is for editor validation). -2. At **load time**, migrate older config shapes forward into the single latest canonical struct *before* anything hashes them. Fingerprinting stays blissfully unaware of file-format history. A `config migrate` command (sibling to today's `config schema` / `config dump`) makes this an explicit, reviewable pass that rewrites stale TOML files in place to the current `schema-version`. +2. At **load time**, migrate older config shapes forward into the single latest canonical struct *before* anything hashes them. Fingerprinting stays unaware of file-format history. A `config migrate` command (sibling to today's `config schema` / `config dump`) makes this an explicit, reviewable pass that rewrites stale TOML files in place to the current `schema-version`. 3. The projection substrate already provides the clean seam: `projectVN` reads the post-migration canonical struct; the combiner stays in `fingerprint`. No `ConfigHash()` method is added (see [the seam note](#where-the-hashing-logic-should-live)). 
The critical invariant: **migrate old TOML → latest canonical struct, then project once.** A semantically no-op migration (rename `foo`→`bar`) must produce the *same* canonical struct, hence the same projection bytes, hence no drift. This is what keeps the schema axis **orthogonal** to the lock axis: a faithful `config migrate` is a pure re-encoding that moves *no* fingerprint, so it never triggers a `component migrate`. If a TOML change genuinely alters build meaning, that is a content-version bump (Part 2), not a `config migrate`. -**Resolved by projection:** the old `hashstructure` caveat - that it mixed `reflect.Type.Name()` into the hash, so renaming a Go struct moved every fingerprint even with identical content - **no longer applies.** The generated projection emits only the explicit field bytes, under each field's **frozen TOML key**, never the Go type or field name. So *both* a struct-type rename **and** a cosmetic field rename (`Foo`→`Bar`, same `toml:` key) are genuinely drift-neutral - **pinned by golden tests** (rename a fingerprinted struct, and rename a field while keeping its TOML key → byte-identical digest in both cases), so the property is CI-enforced, not just asserted here. Renaming the *TOML key itself* is an output-changing edit and takes a version bump like any other. +**Resolved by projection:** the old `hashstructure` caveat - that it mixed `reflect.Type.Name()` into the hash, so renaming a Go struct moved every fingerprint even with identical content - **no longer applies.** The generated projection emits only the explicit field bytes, under each field's **frozen TOML key**, never the Go type or field name. So *both* a struct-type rename **and** a cosmetic field rename (`Foo`→`Bar`, same `toml:` key) are genuinely drift-neutral - **pinned by golden tests** (rename a fingerprinted struct, and rename a field while keeping its TOML key → byte-identical digest in both cases), so the property is CI-enforced. 
Renaming the *TOML key itself* is an output-changing edit and takes a version bump like any other. ## Pipeline @@ -571,17 +571,17 @@ TOML on disk ──migrate to canonical struct (schema axis)──► ComponentC ## Downstream fingerprint consumers (blast radius) -The versioned-replay story in Part 2 must hold for **every** reader of `InputFingerprint`, not just the two paths it grew up around. This is the post-reset migration blast-radius map; each consumer's behavior under a Part 2 v1→v2 algorithm switchover is stated explicitly. (The *reset itself* is invisible to these consumers as analyzed under [Back-compat invariant](#back-compat-invariant-synthetic-history-reads-stored-strings-never-recomputes): they compare stored strings, and pre-reset locks are never recomputed.) +The versioned-replay story in Part 2 must hold for **every** reader of `InputFingerprint`, not just the two paths it grew up around. This is the post-reset migration blast-radius map; each consumer's behavior under a Part 2 v1→v2 algorithm switchover is stated explicitly. (The *reset itself* does not engage this Part 2 replay machinery - these consumers compare stored strings and pre-reset locks are never recomputed; the reset commit's digest move is its one intended visible event, the coordinated cutover, as analyzed under [Back-compat invariant](#back-compat-invariant-synthetic-history-reads-stored-strings-never-recomputes).) 
| Consumer | Reads | Compares | Migration behavior required | | -------- | ----- | -------- | --------------------------- | | `checkFingerprintFreshness` (resolver) | recomputed identity | vs stored token | Replay at token version (Part 2 core) | | `component update` `Changed` decision | recomputed identity | vs stored token | **Replay before `Changed`** (see churn policy seam) | -| `bumpComponents` (`update.go`) | recomputed identity | vs stored token | Current-tree replay (second `ComputeIdentity` caller) | +| `bumpComponents` (`update.go`) | recomputed identity | **writes unconditionally** (no compare) | Forced-change writer: takes `FreshToken.Token()`, not `Reconcile` (always `Changed` by design) | | `changed.go` `classifyComponent` (CI classifier) | stored token strings (two historical git refs) | **digest compare** (strip `v:` prefix) | **String-only - must NOT replay** (no inputs available; replaying historical configs would violate the no-recompute invariant) | | `changed.go` `haveMatchingFingerprints` (cache-poisoning integrity gate) | stored token strings | **digest compare** (strip `v:` prefix) | **String-only; security-load-bearing** - a version-only delta must read as "same" or the integrity check is silently skipped | | `synthistory.FindFingerprintChanges` | stored token strings across git history | **digest of adjacent commits** (strip `v:` prefix) | **String-only; digest-compare** so a version-only re-stamp never fires a release | -| `synthistory.BuildDirtyChange` | recomputed (current ver) | vs stored `headLock` token | **Replay at headLock version** before declaring dirty | +| `synthistory.BuildDirtyChange` | precomputed current-ver fingerprint vs stored `headLock` token | digest/token compare | **Caller replays** at `headLock` version and passes the replayed token (`BuildDirtyChange` holds no inputs - see below) | | `ResolutionInputHash` staleness/write | recomputed resolution hash | vs stored **bare** digest | **No prefix** (bare `sha256:`); 
fingerprint-only bumps never touch it; replay reserved | **Two comparator classes, not one - and only one of them can replay.** The consumers split cleanly by *what they hold*: @@ -595,12 +595,12 @@ The `changed.go` classifier is the easily-missed member of the *second* class: i The fix is a **two-type split**, because a single token type cannot tell the two comparator classes apart: -- **`StoredToken`** - parsed from a lock by the *sole* strict parser `ParseToken` (accepts only `sha256:` legacy and `v:sha256:`; any malformed token is treated as *changed*, never normalized to an empty digest). It exposes `SameDigest(other StoredToken)` and nothing else - it holds no inputs, so a site that has only stored strings *physically cannot* perform a freshness decision. +- **`StoredToken`** - parsed from a lock by the *sole* strict parser `ParseToken` (accepts only `sha256:` legacy and `v:sha256:`; any malformed token is treated as *changed* **for comparison**, never normalized to an empty digest - but round-tripped **verbatim on write**, so a malformed lock is never silently rewritten, see PR C). It exposes `SameDigest(other StoredToken)` and nothing else - it holds no inputs, so a site that has only stored strings *physically cannot* perform a freshness decision. - **`FreshToken`** - obtainable *only* from `ComputeIdentityAt(version, config, …)`, so constructing a *valid* one requires live inputs. Its zero value (`var f FreshToken`) is still syntactically constructible, so it **fails safe**: a `FreshToken` carries a validity bit set only by the constructor, and `Reconcile` on an unset one **returns `Stale`** (never errors, never `Fresh`). `Stale` is the fail-*safe* answer on a path whose job is "rebuild when in doubt" - a zero token means "no freshness evidence," so it triggers a rebuild (G1 churn at worst) and never blocks a `build`/`render`/`--check-only`, where an `error` would be fail-*stop* and could take the fleet down on an accidental zero-value path. 
It exposes `Reconcile(stored StoredToken) → {Fresh | Stale | RestampTo(v)}`. (Belt-and-suspenders: a named test `var f FreshToken; assert f.Reconcile(stored) == Stale`, and if feasible a vet/lint check that no site reconciles a statically-zero token - so a *programming* mistake is still caught loudly without coupling runtime behavior to it.) A historical site holding two `StoredToken`s can call `SameDigest` but cannot fabricate a `FreshToken`, so it cannot accidentally pose as a current-tree freshness check; a current-tree site must obtain a `FreshToken` to reconcile, which forces it through live inputs. The *assignment* documents the class, and the mis-classification path is unconstructible rather than merely discouraged. Both types are **non-comparable** (an unexported `_ [0]func()` field), so a raw `==` on a token outside the `fingerprint` package fails to compile. (Unexported fields alone would *not* do this: a struct of comparable unexported fields is still `==`-comparable from any package; the non-comparable sentinel is what blocks it.) -For the choke-point to be *structural* and not merely conventional, the **lock fields must be token-typed, not raw `string`**: as long as `ComponentLock.InputFingerprint`/`ResolutionInputHash` stay exported strings, `lock.InputFingerprint == other.InputFingerprint` still compiles and the raw-compare pattern stays copyable. So PR C changes those fields to `StoredToken` (TOML marshal/unmarshal routing through `ParseToken`, so every read crosses the strict parser), or hides the raw string behind an accessor that returns a `StoredToken`. Only then does \"enforced by types, not prose\" hold end-to-end. This lands in PR C, which already edits every comparison site. **The on-disk bytes are not automatically unchanged, though; the field *form* decides it** (verified empirically against go-toml/v2 v2.3.1, the pinned version). 
`omitempty` decides emptiness by reflecting a struct's *exported* fields **before** consulting `TextMarshaler`, so a token struct whose digest sits in an *unexported* field is judged empty and **dropped even when set**: a populated `input-fingerprint` silently vanishes, while a *non-*`omitempty` value struct instead emits a spurious `resolution-input-hash = ''` line. Two byte-neutral forms survive: (a) an **accessor**, keeping the on-disk field a `string` and exposing a `StoredToken` via method with writes routed through `ParseToken`, byte-neutral *by construction* (the serialized type never changes) and `==`-proof for every other package; or (b) a value struct with an **exported** digest field (so `omitempty` tracks it) plus a custom marshal that renders it as a bare string. The pointer form (`*StoredToken`) is byte-neutral but reintroduces a silent pointer-`==`, so it is rejected. **PR-C acceptance gate:** a golden round-trip test proving a real local lock's bytes are unchanged across the conversion, so the property is *tested*, not asserted. Either accepted form lands in PR C with no separate on-disk-format bump. +For the choke-point to be *structural* and not merely conventional, the **lock fields must be token-typed, not raw `string`**: as long as `ComponentLock.InputFingerprint`/`ResolutionInputHash` stay exported strings, `lock.InputFingerprint == other.InputFingerprint` still compiles and the raw-compare pattern stays copyable. So PR C changes those fields to `StoredToken` (TOML marshal/unmarshal routing through `ParseToken`, so every read crosses the strict parser), or hides the raw string behind an accessor that returns a `StoredToken`. Only then does \"enforced by types, not prose\" hold end-to-end. This lands in PR C, which already edits every comparison site. 
**The on-disk bytes are not automatically unchanged, though; the field *form* decides it** (verified empirically against go-toml/v2 v2.3.1, the pinned version; the reproduction is banked as a committed `_test.go` artifact in PR C, so "verified empirically" has a citation). `omitempty` decides emptiness by reflecting a struct's *exported* fields **before** consulting `TextMarshaler`, so a token struct whose digest sits in an *unexported* field is judged empty and **dropped even when set**: a populated `input-fingerprint` silently vanishes, while a *non-*`omitempty` value struct instead emits a spurious `resolution-input-hash = ''` line. Two byte-neutral forms survive: (a) an **accessor**, keeping the on-disk field a `string` and exposing a `StoredToken` via method with writes routed through `ParseToken`, byte-neutral *by construction* (the serialized type never changes) and `==`-proof for every other package; or (b) a value struct with an **exported** digest field (so `omitempty` tracks it) plus a custom marshal that renders it as a bare string. The pointer form (`*StoredToken`) is byte-neutral but reintroduces a silent pointer-`==`, so it is rejected. **PR-C acceptance gate:** a golden round-trip test proving a real local lock's bytes are unchanged across the conversion, so the property is *tested*, not asserted. Either accepted form lands in PR C with no separate on-disk-format bump. ### The synthetic changelog/release path is the real hazard @@ -608,9 +608,9 @@ For the choke-point to be *structural* and not merely conventional, the **lock f - **`FindFingerprintChanges` (historical walker)** compares `InputFingerprint` across the lock's git history and emits a synthetic changelog/release entry on every change. It compares the **digest** (stripping the `v:` version prefix), not the full token - a one-line string operation, not the infeasible version-aware replay (it has only committed *strings*, no inputs). 
So a version-only re-stamp (a lazy v1→v2 migration with an unchanged digest) is **invisible** to it; only a moved digest - a genuine input change - fires, and the migration folds into the real change's entry that carries it. The v1→v2 conversion is thus an *accepted, per-component, notable* changelog event that piggybacks a real change, guaranteed by digest-comparison rather than by lazy-discipline. - **`component migrate` is release-grade *when it moves digests*.** A migrate that retires a *fingerprint* algorithm re-stamps every unchanged lock from `computeFP1`'s digest to `computeFP2`'s - the digests move, the walker fires, and the fleet-wide release is the deliberate cost ([registry floor](#registry-floor-and-forced-migration)). A migrate that retires only a *resolution* algorithm rewrites only the bare `ResolutionInputHash` (which `synthistory` never reads), so it is correctly release-silent. Either way the firing tracks a real `InputFingerprint` digest move. -- **`BuildDirtyChange` (live dirty check)** compares a *recomputed* current-version (v2) hash against the *stored* (possibly v1) `headLock.InputFingerprint` and declares dirty on inequality. "Accept as notable" does **not** save this path: post-switchover an *unchanged* component would read **dirty on every `render`/`build`** until re-stamped - a persistent, recurring spurious signal, worse than a one-time entry. The fix is **free**: it is the *same replay Part 2 already owes the freshness check* - replay at `headLock`'s recorded version before declaring dirty. One additional call site for logic already being written, no new mechanism. +- **`BuildDirtyChange` (live dirty check)** compares a *recomputed* current-version (v2) hash against the *stored* (possibly v1) `headLock.InputFingerprint` and declares dirty on inequality. 
Post-switchover an *unchanged* component would otherwise read **dirty on every `render`/`build`** until re-stamped - a persistent, recurring spurious signal, worse than a one-time entry. The fix is the *same replay the freshness check already owes*, but with one signature caveat: **`BuildDirtyChange(currentFingerprint string, headLock *ComponentLock, currentUpstreamCommit string)` carries no `config`/`opts`**, so it physically cannot replay. The **caller** (which holds `*result.config`) must replay at `headLock`'s recorded version and pass the already-replayed current-version token in; `BuildDirtyChange` then digest-compares as before. So it is a caller-side change (or a signature widening to take the inputs), not a one-liner inside `BuildDirtyChange` - reusing logic already being written, but crossing the `sources` package boundary. -**Net:** the changelog-walker concern is not "make the walker version-aware" (hard, maybe infeasible). It is two cheap things - (1) the historical comparators (`FindFingerprintChanges`, `changed.go`) compare the **digest**, so a version-only delta never fires; and (2) extend the *current-tree* replay to `BuildDirtyChange` (which *does* hold live inputs), one call site for logic already being written. The reset commit is the single deliberate exception: it *is* a fleet-wide notable event, the coordinated cutover, intentionally visible. +**Net:** the changelog-walker concern is not "make the walker version-aware" (hard, maybe infeasible). It is two cheap things - (1) the historical comparators (`FindFingerprintChanges`, `changed.go`) compare the **digest**, so a version-only delta never fires; and (2) extend the *current-tree* replay to `BuildDirtyChange`'s **caller** (which holds live inputs; `BuildDirtyChange` itself receives the replayed token), reusing logic already being written. The reset commit is the single deliberate exception: it *is* a fleet-wide notable event, the coordinated cutover, intentionally visible. 
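
Point (1) can be sketched as follows. This is a minimal illustration, assuming tokens shaped like the `v1:sha256:X` examples used elsewhere in this RFC, with a legacy bare `sha256:` form and no version prefix. `digestOf`, `sameDigest` and `allDigits` are hypothetical names for this sketch only; they are not the RFC's `ParseToken` or `StoredToken.SameDigest`, and the hex payload is deliberately not validated here.

```go
package main

import (
	"fmt"
	"strings"
)

// allDigits reports whether s is a non-empty run of ASCII digits.
func allDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// digestOf returns the bare "sha256:..." digest of a stored token, dropping an
// optional "v<N>:" version prefix. Hypothetical helper for illustration only;
// it does not validate the hex payload the way a strict parser would.
func digestOf(token string) (string, bool) {
	rest := token
	if strings.HasPrefix(token, "v") {
		ver, after, found := strings.Cut(token[1:], ":")
		if !found || !allDigits(ver) {
			return "", false
		}
		rest = after
	}
	if !strings.HasPrefix(rest, "sha256:") || len(rest) == len("sha256:") {
		return "", false
	}
	return rest, true
}

// sameDigest compares two stored tokens by digest only. A malformed token on
// either side yields false, so it reads as "changed" and never as a match.
func sameDigest(a, b string) bool {
	da, okA := digestOf(a)
	db, okB := digestOf(b)
	return okA && okB && da == db
}

func main() {
	// Version-only re-stamp: the prefix moved, the digest did not.
	fmt.Println(sameDigest("v1:sha256:ab12", "v2:sha256:ab12"))
	// Genuine input change: the digest moved.
	fmt.Println(sameDigest("v1:sha256:ab12", "v2:sha256:cd34"))
	// Malformed tokens never compare equal, even to themselves.
	fmt.Println(sameDigest("v1:", "v1:"))
}
```

Only the first comparison matches, which is the property the historical comparators rely on: a version-only delta never fires, while a moved digest or an unparseable token always reads as changed.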
### `ResolutionInputHash`: bare digest, replay deferred @@ -619,7 +619,7 @@ For the choke-point to be *structural* and not merely conventional, the **lock f - **Smaller blast radius.** `ResolutionInputHash` does **not** feed `synthistory`, so an algorithm change can never mint a phantom changelog/release (that hazard is fingerprint-only). Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption. - **No pending change.** It is a flat seven-field SHA256, not a struct walk, so the projection substrate leaves it untouched. Its registry slot stays `computeRes1` until its inputs genuinely change. -**Decision (KISS/YAGNI):** wire fingerprint replay in Part 2's first PR. `ResolutionInputHash` stays a **bare `sha256:` digest with no `v:` prefix** (the prefix lives only in `InputFingerprint` - see [Both hashes share one version](#both-hashes-share-one-version)), so the resolver compares it directly and a fingerprint-only bump never touches it. The day `ComputeResolutionHash` first changes, add `computeRes2` and extend replay to its one comparison site (`checkResolutionFreshness` + the `resHashChanged` silent-write guard in `update.go`); decide *then* whether resolution needs its own prefix or reads the shared one. Because resolution carries no prefix and is compared bare today, a fingerprint-only bump never touches it - so the shared-prefix desync is **dormant, not eliminated**, and wakes only when resolution gains a second algorithm. The seam: the prefix advances only on `result.Changed`, while a resolution-only write takes the independent `resHashChanged` path ([`update.go`](../../../internal/app/azldev/cmds/component/update.go)). So once `computeRes2` exists, a resolution-only write would advance the bare digest while the shared prefix stays at `v1`, and replay would select `computeRes1` → permanent false-stale. 
This is **safe-direction (G1 churn, never a missed rebuild) and dormant** (resolution replay is reserved; one algorithm today), so it gates no PR now. To stop it shipping silently the day it matters, the guard is **structural, not prose**: registering a second resolution algorithm **fails the build** (a registry `init()`-time assertion) unless the desync is resolved - either resolution takes its own prefix, or a resolution-only write also re-stamps `InputFingerprint`'s prefix to current (same digest) behind a CI gate mirroring the dirty-change gate. The decision is *forced* at `computeRes2`, not forgotten. +**Decision (KISS/YAGNI):** wire fingerprint replay in Part 2's first PR. `ResolutionInputHash` stays a **bare `sha256:` digest with no `v:` prefix** (the prefix lives only in `InputFingerprint` - see [Both hashes share one version](#both-hashes-share-one-version)), so the resolver compares it directly and a fingerprint-only bump never touches it. The day `ComputeResolutionHash` first changes, add `computeRes2` and extend replay to its comparison sites (the resolution-staleness branch of `computeFreshnessStatus` + the `resHashChanged` silent-write guard in `updateResolutionHash`). The shared-prefix choice is **already settled** - resolution reads `InputFingerprint`'s prefix; it does not grow its own - so the only deferred work is *wiring* that replay. Deferral is safe because resolution carries no prefix and is compared bare today, so the shared-prefix desync is **dormant, not eliminated**, waking only when resolution gains a second algorithm. The seam: the prefix advances only on `result.Changed`, while a resolution-only write takes the independent `resHashChanged` path ([`update.go`](../../../internal/app/azldev/cmds/component/update.go)). So once `computeRes2` exists, a resolution-only write would advance the bare digest while the shared prefix stays at `v1`, and replay would select `computeRes1` → permanent false-stale. 
This is **safe-direction (G1 churn, never a missed rebuild) and dormant** (resolution replay is reserved; one algorithm today), so it gates no PR now. To stop it shipping silently the day it matters, the registry carries a **forced-conversation tripwire** - it surfaces the desync for review, it does not by itself verify the fix: give `lockAlgo` a `resolutionVersion int` field and have the registry `init()` panic (**failing at startup/test-time, caught by any package test in CI - not by `go build`**) if any entry's `resolutionVersion` exceeds the floor's without a `resolutionPrefixHandled` constant set, reducing the predicate to a `len`/const check. Remediation - resolution taking its own prefix, or a resolution-only write co-stamping `InputFingerprint`'s prefix to current (same digest) - is a property of `updateResolutionHash`, a *different* package the registry loop cannot see, so it is verified by a **named CI test mirroring PR C's dirty-change gate**, not by the tripwire. The decision is *forced* at `computeRes2`, not forgotten. ## Design decisions @@ -643,9 +643,9 @@ Field membership lives in a per-field version-set tag (`fingerprint:"v1..*"`); a - **The unsafe direction is the false-negative** (a meaningful field silently omitted → missed rebuild → stale artifact, a G5 violation). A *mandatory* tag - absent → generation fails - makes the include/exclude decision impossible to *forget*. The *wrongly-excluded* case (a `-` tag on a build-effective field) is caught by the kept exclusion ledger, and the *wrongly-included-but-unmeasured* case by the [coverage backstop](#golden-vector-coverage-the-backstop). - **Version-awareness is declarative.** A field's whole lifecycle - introduced at v3, dropped at v5, revived at v8 - is one greppable string on the field (`v3..v4,v8..*`), inexpressible in hand-written form, with no diff smeared across function bodies. 
-- **Frozen-ness stays structural.** Because the generated functions are checked-in code, a retained `projectVN` references each field by literal Go path - deleting a measured field won't compile, the emit-key is a literal, and regeneration-idempotence (CI `go generate` + diff) pins a shipped version's output. Golden vectors are the semantic backstop behind that, not the sole guarantee. This recovers the hand-written model's compile guarantee that a *runtime* reflective walker gives up (its output would reflect the live struct at hash time - Problem 6 one layer down), while keeping the DSL's declarative lifecycle. +- **Frozen-ness stays structural.** Because the generated functions are checked-in code, a retained `projectVN` references each field by literal Go path - deleting a measured field won't compile, the emit-key is a literal, and regeneration-idempotence (CI `go generate` + diff) pins a superseded version's output. Golden vectors are the semantic backstop behind that, not the sole guarantee. This recovers the hand-written model's compile guarantee that a *runtime* reflective walker gives up (its output would reflect the live struct at hash time - Problem 6 one layer down), while keeping the DSL's declarative lifecycle. -The `go generate` *infrastructure* already exists (`stringer`/`mockgen` via `mage`), so the marginal cost is low - but the projection generator's **stakes** are categorically higher than those tools, and the design treats it accordingly. A `stringer` bug is cosmetic; a `mockgen` bug breaks test compilation and is caught instantly. 
A **projection-generator bug silently moves a shipped version's bytes → fleet-wide G5 (stale, undetectable except by the corpus) or G1 (mass churn).** The generator is therefore a first-class, fingerprint-load-bearing production artifact with its own test suite, and **regeneration-idempotence is a required CI gate** (the [`.github/workflows/generate.yml`](../../../.github/workflows/generate.yml) check is mandatory, never skippable) - without it the freeze degrades from structural to test-discipline. That is precisely why the coverage oracle and hand-frozen golden digests above are mandatory, not optional. +The `go generate` *infrastructure* already exists (`stringer`/`mockgen` via `mage`), so the marginal cost is low - but the projection generator's **stakes** are categorically higher than those tools, and the design treats it accordingly. A `stringer` bug is cosmetic; a `mockgen` bug breaks test compilation and is caught instantly. A **projection-generator bug silently moves a retained version's bytes → fleet-wide G5 (stale, undetectable except by the corpus) or G1 (mass churn).** The generator is therefore a first-class, fingerprint-load-bearing production artifact with its own test suite, and **regeneration-idempotence is a required CI gate** (the [`.github/workflows/generate.yml`](../../../.github/workflows/generate.yml) check is mandatory, never skippable) - without it the freeze degrades from structural to test-discipline. That is precisely why the coverage oracle and hand-frozen golden digests above are mandatory, not optional. ### D3: Atomic self-describing token; no format bump, reconcile via force-rehash @@ -666,20 +666,20 @@ The lock **format** `Version` stays at `1`. Bumping it to `2` as a poison pill - - **Eager fleet-wide migration as the steady-state mechanism** - rewriting every lock on every algorithm change is the mass-churn the design exists to prevent. Rejected for the steady state. 
The *reset* is a deliberate, one-time, operator-driven eager pass riding an already-scheduled rebuild - the sanctioned exception, not the rule; `component migrate` is its post-reset equivalent for retiring an old version. - **Runtime reflective walker for field selection (instead of generated functions).** One generic `project(cfg, N)` reflects the struct at hash time and emits the fields whose version-set includes N. Least code, and it shares the tag syntax with the chosen approach. Rejected: it reflects the *live* struct at hash time - Problem 6 one layer down - so its frozen-ness rests entirely on golden-vector coverage (test discipline), and field removal degrades from a compile error to a CI failure. Codegen keeps the same tags but moves the reflection to *generate* time and freezes the output as checked-in code, recovering the compile guarantee. - **Hand-written per-version `projectVN` functions (instead of generating them from tags).** Each version gets a bespoke function with one explicit `emit`/`emitAlways` line per measured field. Same compile guarantees as codegen (removal won't compile, literal emit-key), but: membership is smeared across N function bodies; "bring a field back a few versions later" has no first-class expression (you re-add an `emit` line, nothing ties it to the field's earlier life); and the mandatory-decision and coverage properties need separate bookkeeping the tags otherwise carry. Codegen is the same runtime with declarative authoring - strictly preferable given the existing `go generate` infrastructure. -- **Per-field hash manifest in the lock (instead of one opaque token).** Store `{field → hash}` (à la `go.sum`) rather than a single `v:sha256:…` digest. 
*Genuine wins:* dropping a field becomes ignoring its manifest line - no projection kept alive for replay, so the **deprecate-then-delete two-step and the registry-retirement deadlock** (the append-only growth above) both vanish; and the stored-vs-stored historical comparators become structural set-diffs rather than version-blind string compares. *Why the opaque token still wins for azldev:* (1) the projection substrate **already** delivers additive immunity (G4) - the manifest's headline draw - so that advantage is moot, not additive; (2) the manifest does **not** kill the false-fresh hazard - an old lock has *no line* for a newly-measured input, so there is still no baseline to detect a change to it (the blind spot is relocated, not removed); (3) it makes *algorithm evolution* - the entire point of Part 2 - **harder**, needing per-field versioning where the token needs one integer for the whole algorithm; and (4) it bloats every lock to O(fields × components) (the well-known `go.sum` size cost). The manifest is the better tool for a *static* input set that mainly grows and shrinks; the opaque token + single version is the better tool for an *evolving hashing algorithm*, which is azldev's actual problem. The reset bakes the storage model in - token-vs-manifest is irreversible after PR B - and the retirement deadlock the manifest would have dissolved is instead answered by the floor-advance cadence above. +- **Per-field hash manifest in the lock (instead of one opaque token).** Store `{field → hash}` (à la `go.sum`) rather than a single `v:sha256:…` digest. *Genuine wins:* dropping a field becomes ignoring its manifest line - no projection kept alive for replay, so the **deprecate-then-delete two-step and the registry-retirement deadlock** (the append-only growth above) both vanish; and the stored-vs-stored historical comparators become structural set-diffs rather than version-blind string compares. 
*Why the opaque token still wins for azldev:* (1) the projection substrate **already** delivers additive immunity (G4) - the manifest's headline draw - so that advantage is moot, not additive; (2) the manifest does **not** kill the false-fresh hazard - an old lock has *no line* for a newly-measured input, so there is still no baseline to detect a change to it (the blind spot is relocated, not removed); (3) it makes *algorithm evolution* - the entire point of Part 2 - **harder**, needing per-field versioning where the token needs one integer for the whole algorithm; and (4) it bloats every lock to O(fields × components) (the well-known `go.sum` size cost). The manifest is the better tool for a *static* input set that mainly grows and shrinks; the opaque token + single version is the better tool for an *evolving hashing algorithm*, which is azldev's actual problem. There is no *incremental* middle path between them: sha256 is non-homomorphic, so per-field digests cannot be combined into the whole-config digest without re-hashing - the storage choice is genuinely binary (whole-config projection vs full per-field manifest), not a spectrum. The reset bakes the storage model in - token-vs-manifest is irreversible after PR B - and the retirement deadlock the manifest would have dissolved is instead answered by the floor-advance cadence above. ## Incremental delivery The reset (Part 1) must land as one coherent change at the dev→prod cutover; its pieces are independently reviewable but ship together because they all move the hash. -1. 
**PR A (substrate)**: the **projection generator** (`go generate`) - reads the version-set tags and emits the per-version `projectVN(cfg) []byte` functions (literal emits, sorted keys) plus golden-vector and coverage scaffolding - the canonical encoder (`canonicalBuf`, `emit`/`emitAlways`), the version-set tag parser, the frozen **TOML-key** emit rule, the **split omit-predicate** (scalar leaves `IsZero`, composites projected emptiness), the `sha256` combiner, and the golden vectors. Generate-time guards: a fingerprinted field with **no tag** fails generation; the slimmed **exclusion ledger** and **dropped-fields ledger** replace the retired `TestAllFingerprintedFieldsHaveDecision` audit; **regeneration-idempotence** (CI `go generate` + `git diff --exit-code`) pins shipped versions. Pure addition alongside the existing path; not yet wired into `ComputeIdentity`. Tests: a field tagged `v2..*` is absent from generated `projectV1`; a `!` range emits at zero; a field with **no** `fingerprint` tag fails generation; a **nested** fingerprinted struct with a tagless field fails generation; deleting a field a retained `projectVN` names **fails to compile**; a **Go-field rename keeping the TOML key** yields a byte-identical digest; two fields colliding on one emit-key fail generation; a `!`-tagged nested struct whose every child is `-` (so its mandatory `!`-zero discrimination vector is unsatisfiable) is rejected as degenerate; the coverage oracle (by struct-reflection, not the tag) fails when a build-effective field is tagged too narrowly (`v1..v1` at current `v2`) and is not in the dropped-fields ledger; golden vectors pin v1; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser. +1. 
**PR A (substrate)**: the **projection generator** (`go generate`) - reads the version-set tags and emits the per-version `projectVN(cfg) []byte` functions (literal emits, sorted keys) plus golden-vector and coverage scaffolding - the canonical encoder (`canonicalBuf`, `emit`/`emitAlways`), the version-set tag parser, the frozen **TOML-key** emit rule, the **split omit-predicate** (scalar leaves `IsZero`, composites projected emptiness), the `sha256` combiner, and the golden vectors. Generate-time guards: a fingerprinted field with **no tag** fails generation; the slimmed **exclusion ledger** and **dropped-fields ledger** replace the retired `TestAllFingerprintedFieldsHaveDecision` audit; **regeneration-idempotence** (CI `go generate` + `git diff --exit-code`) pins superseded versions. Additive alongside the existing fingerprint path (it does retire the `TestAllFingerprintedFieldsHaveDecision` audit, replaced by the two ledgers above); not yet wired into `ComputeIdentity`. Tests: a field tagged `v2..*` is absent from generated `projectV1`; a `!` range emits at zero; a field with **no** `fingerprint` tag fails generation; a **nested** fingerprinted struct with a tagless field fails generation; deleting a field a retained `projectVN` names **fails to compile**; a **Go-field rename keeping the TOML key** yields a byte-identical digest; two fields colliding on one emit-key fail generation; a `!`-tagged nested struct whose every child is `-` (so its mandatory `!`-zero discrimination vector is unsatisfiable) is rejected as degenerate; the coverage oracle (by struct-reflection, not the tag) fails when a build-effective field is tagged too narrowly (`v1..v1` at current `v2`) and is not in the dropped-fields ledger; golden vectors pin v1; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser; a CI assertion that `PackageConfig`'s field set stays within its known publish-only set (a new field fails CI, forcing re-evaluation of the parent `-` exclusion - the 
prune-at-`-` tripwire). 2. **PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. Lock format `Version` stays `1`, asserted by a named-constant test (`currentVersion == 1`) with a comment that the *content* version lives in the token prefix, not here - so a future format bump cannot silently break every historical read through `lockfile.Parse`. Ships at the cutover; absorbed by the scheduled rebuild. The `hashstructure` import and its `go.mod` entry are removed here, since no caller survives the switch. Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. -3. **PR C (Part 2 machinery)**: the **two-type token split** - `StoredToken` (parsed by the sole strict `ParseToken`: accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match; exposes `SameDigest` only) and `FreshToken` (from `ComputeIdentityAt`, exposes `Reconcile(stored) → {Fresh | Stale | RestampTo(v)}`, fails closed on its zero value), both **non-comparable** (`_ [0]func()`); the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **every** comparison and compute site through these types. The **current-tree** sites (via `FreshToken.Reconcile`): replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, `BuildDirtyChange`, and the second `ComputeIdentity` caller `bumpComponents` (`update.go`); plus the `computeCurrentFingerprint` (`sourceprep.go`) return-type cascade `string → FreshToken`. The **historical** sites (via `StoredToken.SameDigest`): `FindFingerprintChanges`, `changed.go`'s `classifyComponent`, **and `haveMatchingFingerprints`**. 
**`haveMatchingFingerprints` is security-load-bearing:** it gates the cache-poisoning integrity check (`if result.SourcesChange && haveMatchingFingerprints(...)` in `changed.go`). If only `classifyComponent` is converted and this site is missed, the first legitimate `v2` bump makes a version-only re-stamp compare unequal → the integrity violation is **never recorded → tamper evidence silently swallowed**. It must convert to digest-compare in the same PR. Resolution replay reserved (slot reuses `computeRes1`). **Ordering gate (CI-enforced):** `currentLockContentVersion > 1` is forbidden unless `BuildDirtyChange` already routes through `Reconcile` - otherwise registering `v2` makes every component read persistently dirty on every `render`/`build`. The gate is necessary but not sufficient (it does not prove `haveMatchingFingerprints` converted), so it is paired with a **named acceptance test**: `from="v1:sha256:X"`, `to="v2:sha256:X"` ⇒ `haveMatchingFingerprints` returns **true** - a missed conversion fails CI rather than silently disabling the integrity check. **Not fully inert:** this PR switches the live compares from raw-string to token-routed *on merge* - only the *registry dispatch* is dormant while just `v1` exists. 
Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event and does **not** suppress `haveMatchingFingerprints`; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a token outside the `fingerprint` package fails to compile; a zero-value `FreshToken`/`StoredToken` fails closed; a historical site cannot construct a `FreshToken`; the registry `init()` panics on a `[minSupported,current]` gap; a named `classifyComponent({name:"v1:sha256:X"}, {name:"v2:sha256:X"}) == Unchanged` (the third raw historical compare, with no CI gate of its own); a `BuildDirtyChange(v2-token, headLock-v1-same-digest) == nil` (a `RestampTo` must not mint a dirty synthetic commit; the existing "not a changelog event" test exercises `FindFingerprintChanges`, not this path); a malformed token round-trips its **original raw bytes** through `MarshalText`, so a malformed lock is never rewritten on save (no spurious `FindFingerprintChanges` event). +3. **PR C (Part 2 machinery)**: the **two-type token split** - `StoredToken` (parsed by the sole strict `ParseToken`: accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match; exposes `SameDigest` only) and `FreshToken` (from `ComputeIdentityAt`, exposes `Reconcile(stored) → {Fresh | Stale | RestampTo(v)}`, fails closed on its zero value), both **non-comparable** (`_ [0]func()`); the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **every** comparison and compute site through these types. 
The **current-tree** sites (via `FreshToken.Reconcile`): replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, and `BuildDirtyChange`'s caller (which holds the inputs; `BuildDirtyChange` itself receives the replayed token); plus the `computeCurrentFingerprint` (`sourceprep.go`) return-type cascade `string → FreshToken`. `bumpComponents` (`update.go`) is the *second* `ComputeIdentity` caller but a **forced-change writer** (always `Changed`, no stored-token comparison), so it takes `FreshToken.Token()` to write the new token unconditionally - **not** `Reconcile`, whose `Stale`-on-zero would let a zero/malformed token slip silently into the lock. The **historical** sites (via `StoredToken.SameDigest`): `FindFingerprintChanges`, `changed.go`'s `classifyComponent`, **and `haveMatchingFingerprints`**. **`haveMatchingFingerprints` is security-load-bearing:** it gates the cache-poisoning integrity check (`if result.SourcesChange && haveMatchingFingerprints(...)` in `changed.go`). If only `classifyComponent` is converted and this site is missed, the first legitimate `v2` bump makes a version-only re-stamp compare unequal → the integrity violation is **never recorded → tamper evidence silently swallowed**. It must convert to digest-compare in the same PR. Resolution replay reserved (slot reuses `computeRes1`). **Ordering gate (CI-enforced):** `currentLockContentVersion > 1` is forbidden unless `BuildDirtyChange` already routes through `Reconcile` - otherwise registering `v2` makes every component read persistently dirty on every `render`/`build`. The gate is necessary but not sufficient (it does not prove `haveMatchingFingerprints` converted), so it is paired with a **named acceptance test**: `from="v1:sha256:X"`, `to="v2:sha256:X"` ⇒ `haveMatchingFingerprints` returns **true** - a missed conversion fails CI rather than silently disabling the integrity check. 
**Not fully inert:** this PR switches the live compares from raw-string to token-routed *on merge* - only the *registry dispatch* is dormant while just `v1` exists. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event and does **not** suppress `haveMatchingFingerprints`; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a token outside the `fingerprint` package fails to compile; a zero-value `FreshToken`/`StoredToken` fails closed; a historical site cannot construct a `FreshToken`; the registry `init()` panics on a `[minSupported,current]` gap; a named `classifyComponent({name:"v1:sha256:X"}, {name:"v2:sha256:X"}) == Unchanged` (the third raw historical compare, with no CI gate of its own); a `BuildDirtyChange(v2-token, headLock-v1-same-digest) == nil` (a `RestampTo` must not mint a dirty synthetic commit; the existing "not a changelog event" test exercises `FindFingerprintChanges`, not this path); a **discriminating** `BuildDirtyChange` test where the current-version digest *moved* but the caller's replay at `headLock`'s version *matches* ⇒ `nil` (proves the replay actually runs - the same-digest case alone would pass a prefix-stripping digest-compare that never replays); `bumpComponents` writes via `FreshToken.Token()`, and a statically-zero token surfaces an error at the write site rather than a silent `Stale`; a malformed token round-trips its **original raw bytes** through `MarshalText`, so a malformed lock is never rewritten on save (no spurious `FindFingerprintChanges` event). 4. 
**PR D (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) - add a field absent from `projectV1` and set it on one component; assert that only that lock drifts and every other lock is byte-identical. 5. **PR E (config schema axis, later)**: `schema-version` field + load-time canonical migration + the `config migrate` command. Gated on the first post-reset non-additive TOML change not already absorbed by the reset's normalization pass. 6. **PR F (forced lock migration, gated on the first floor raise)**: the `component migrate` command (the only sanctioned floor-raise; the prescribed fix for a build-critical newly-measured input) and the CI spread-ceiling on `currentLockContentVersion - minSupportedLockContentVersion`. **Gating:** a `v2` bump *without* PR F is safe - v1 stays in the registry and the floor stays at 1, so unmigrated locks still replay. PR F is required only before **raising `minSupportedLockContentVersion` above 1** (retiring v1), since that is what makes unmigrated locks unreplayable. A CI gate forbids raising the floor unless `component migrate` exists. So PR F is decoupled from the first `v2` and gated on the first floor raise **or** the first content-version bump whose decision rule demands immediate fleet-wide adoption (a build-critical newly-measured input, which cannot wait for lazy migration). -Each PR is independently revertible up to the cutover. PRs A-B land together at the dev→prod cutover (they move every hash and are absorbed by the scheduled rebuild); PR C is inert until the first post-reset algorithm change; PR D follows; PR E is gated on the first post-reset schema change, PR F on the first floor raise. +Each PR is independently revertible up to the cutover.
PRs A-B land together at the dev→prod cutover (they move every hash and are absorbed by the scheduled rebuild); PR C's registry *dispatch* is dormant until the first post-reset algorithm change (its token-routed comparisons go live on merge); PR D follows; PR E is gated on the first post-reset schema change, PR F on the first floor raise. ## Open questions @@ -687,18 +687,18 @@ Each PR is independently revertible up to the cutover. PRs A-B land together at ## Decisions settled in the body -Indexed here so they are not re-litigated; each is argued in full at the linked section. +Indexed here for quick reference; each is argued in full at the linked section. | Decision | Where | | -------- | ----- | -| Reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover | §The opportunity | +| Reset rides the already-scheduled dev→prod rebuild as the one sanctioned coordinated cutover | [§The opportunity](#the-opportunity-a-coordinated-cutover-is-already-scheduled) | | Substrate is canonical projection (generated `projectVN` + golden vectors), not `hashstructure` | [§Substrate options](#substrate-options) | | Field selection is **codegen** from mandatory per-field version-set tags (absent ⇒ generation fails); `go generate` emits the per-version `projectVN` | [§Version-tagged field selection](#version-tagged-field-selection) | | Emit-key = frozen TOML key (`key=` override; duplicate keys fail generation); omit-predicate splits - scalar leaves `IsZero`, composites *projected* emptiness | [§Version-tagged field selection](#version-tagged-field-selection) | | Tag DSL frozen at three range-operators (`..` `!` `*`) plus the orthogonal `key=` | [§Version-tagged field selection](#version-tagged-field-selection) | -| Canonical byte encoding = existing length-prefixed `:=:`; maps sorted-key; per-type value slots - pinned irreversibly at the reset | §The projection substrate | +| Canonical byte encoding = existing length-prefixed `:=:`; maps sorted-key; 
per-type value slots - pinned irreversibly at the reset | [§Baseline v1 (encoding table)](#baseline-v1-omit-if-zero-no-include-always-legacy) | | Frozen-ness = compiler + generator + regeneration-idempotence; golden-vector coverage (tag-independent dropped-fields oracle) is the backstop; exclusion ledger kept for `-` fields | [§Golden-vector coverage](#golden-vector-coverage-the-backstop) | -| Stored hash = atomic `v:sha256:` token; lock format `Version` stays `1`; sub-floor/downgraded tokens reconciled by force-rehash | §The lock changes at the reset | +| Stored hash = atomic `v:sha256:` token; lock format `Version` stays `1`; sub-floor/downgraded tokens reconciled by force-rehash | [§The lock changes at the reset](#the-lock-changes-at-the-reset-atomic-token--forced-upgrade) | | Stored hash read only through the two-type token split (`StoredToken`/`FreshToken`, non-comparable), adopted in PR C | [§Downstream consumers](#downstream-fingerprint-consumers-blast-radius) | | Version write-guard required (refuse to write above the binary's `currentLockContentVersion`); CI version-pin blocks old-binary commits | [§Registry floor](#registry-floor-and-forced-migration) | | Back-compat: no reader recomputes a historical fingerprint (synthetic history / overlays read stored strings only) | [§Back-compat invariant](#back-compat-invariant-synthetic-history-reads-stored-strings-never-recomputes) | From a2b7d1ac24bcf4aedbe9f6528a19d6c4b66a81bd Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Wed, 10 Jun 2026 16:25:55 -0700 Subject: [PATCH 14/15] cleanup 3 --- docs/developer/rfc/lazy-schema-migration.md | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md index 9d8273bb..cc8a7914 100644 --- a/docs/developer/rfc/lazy-schema-migration.md +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -123,13 +123,7 @@ Two properties of `hashstructure` v2.0.2 are 
load-bearing for this RFC: The struct's type name *is* part of the hash (`hashstructure` mixes in `reflect.Type.Name()`), so a rename of the Go type moves every hash even when content is byte-identical. -**Why this substrate cannot host frozen replay.** Every property above is resolved *at hash time against the live program*, not against a pinned description of the v1 encoding: - -- The set of fields walked is whatever the struct has *now* - add a field, and last year's `computeFP1` (whose body is still just `hashstructure.Hash(component)`) now includes it. -- Whether `Includable` is consulted depends on whether the type implements it *now* - not on what was true when v1 locks were written. -- A `value` vs `pointer` receiver subtlety even decides whether the root struct's `HashInclude` is seen at all (the top-level value is not addressable). - -A function meant to be "the v1 algorithm, forever" therefore changes meaning every time the struct or its method set changes. That is the disqualifier for the incremental plan (Problem 6) and the motivation for the projection substrate below, whose v1 projection emits only its version-tagged fields and reads neither the method set nor the type name - immune to all three. +**Why this substrate cannot host frozen replay.** All three properties are resolved *at hash time against the live program*: the field set walked is whatever the struct has *now*, `Includable` is consulted only if the type implements it *now*, and a `value`-vs-`pointer` receiver subtlety even decides whether the root struct's `HashInclude` is seen at all (the top-level value is not addressable). This is [the substrate problem](#the-substrate-problem-replay-only-works-if-old-algorithms-stay-frozen) made concrete - Problem 6 - and exactly what the projection substrate below avoids by reading none of them. ## Change taxonomy @@ -659,10 +653,10 @@ The lock **format** `Version` stays at `1`. 
Bumping it to `2` as a poison pill - ## Alternatives considered -- **Incremental lazy migration on the `hashstructure` substrate** (the original plan): flip the inclusion default to omitempty via `Includable`, version the lock content, and migrate lazily - *without* a reset. Rejected: Problem 6 makes its central promise unkeepable. A "frozen" replay function built on `hashstructure.Hash` reflects the live struct, so the first field addition after the switchover moves the old algorithm's output and forces a rehash anyway. The incremental path therefore does not actually avoid a coordinated cutover - it defers one to the first field addition, on a substrate that makes replay unsound. With a coordinated cutover already scheduled (the dev→prod cutover), spending it once on a clean projection substrate is the better trade. +- **Incremental lazy migration on the `hashstructure` substrate** (the original plan): flip the inclusion default to omitempty via `Includable`, version the lock content, and migrate lazily - *without* a reset. Rejected: [Problem 6](#the-substrate-problem-replay-only-works-if-old-algorithms-stay-frozen) makes its central promise unkeepable - the first field addition after the switchover moves the "frozen" algorithm's output and forces a rehash anyway, so it does not avoid a coordinated cutover, only defers one to that addition on a substrate that makes replay unsound. With a cutover already scheduled, spending it once on a clean projection substrate is the better trade. - **Global `IgnoreZeroValue`** - a blunt switch that omits *all* zero fields with no escape hatch for build-meaningful zeros, and still on the non-frozen `hashstructure` substrate. Rejected. - **Parallel versioned structs with per-struct `Hash()`** - couples locks to Go type identity and duplicates hashing logic per version. Rejected in favor of Part 2's integer-versioned combiner over frozen projections. 
-- **Bump the lock format `Version` 1→2 as a poison pill** - makes old binaries hard-reject reset locks. Rejected: it also blocks old binaries from reading pins to queue a build, and it is unnecessary, since the content-version registry already force-rehashes any sub-floor or downgraded token (D3). Same-format + force-rehash keeps old binaries useful without risking silent corruption. +- **Bump the lock format `Version` 1→2 as a poison pill** - rejected; see [D3](#d3-atomic-self-describing-token-no-format-bump-reconcile-via-force-rehash). It would block old binaries from reading pins to queue a build, and is unnecessary: the content-version registry already force-rehashes any sub-floor or downgraded token without a format bump. - **Eager fleet-wide migration as the steady-state mechanism** - rewriting every lock on every algorithm change is the mass-churn the design exists to prevent. Rejected for the steady state. The *reset* is a deliberate, one-time, operator-driven eager pass riding an already-scheduled rebuild - the sanctioned exception, not the rule; `component migrate` is its post-reset equivalent for retiring an old version. - **Runtime reflective walker for field selection (instead of generated functions).** One generic `project(cfg, N)` reflects the struct at hash time and emits the fields whose version-set includes N. Least code, and it shares the tag syntax with the chosen approach. Rejected: it reflects the *live* struct at hash time - Problem 6 one layer down - so its frozen-ness rests entirely on golden-vector coverage (test discipline), and field removal degrades from a compile error to a CI failure. Codegen keeps the same tags but moves the reflection to *generate* time and freezes the output as checked-in code, recovering the compile guarantee. - **Hand-written per-version `projectVN` functions (instead of generating them from tags).** Each version gets a bespoke function with one explicit `emit`/`emitAlways` line per measured field. 
Same compile guarantees as codegen (removal won't compile, literal emit-key), but: membership is smeared across N function bodies; "bring a field back a few versions later" has no first-class expression (you re-add an `emit` line, nothing ties it to the field's earlier life); and the mandatory-decision and coverage properties need separate bookkeeping the tags otherwise carry. Codegen is the same runtime with declarative authoring - strictly preferable given the existing `go generate` infrastructure. @@ -687,7 +681,7 @@ Each PR is independently revertible up to the cutover. PRs A-B land together at ## Decisions settled in the body -Indexed here for quick reference; each is argued in full at the linked section. +Indexed here for quick reference; each is argued in full at the linked section. The four headline trade-offs also have summary tables in [Design decisions](#design-decisions) (D1-D4). | Decision | Where | | -------- | ----- | From a54fc4b3e44d54dfeb50a5195253292b09169992 Mon Sep 17 00:00:00 2001 From: Daniel McIlvaney Date: Wed, 10 Jun 2026 17:18:20 -0700 Subject: [PATCH 15/15] update 11 --- docs/developer/rfc/lazy-schema-migration.md | 75 ++++++++++++++------- 1 file changed, 50 insertions(+), 25 deletions(-) diff --git a/docs/developer/rfc/lazy-schema-migration.md b/docs/developer/rfc/lazy-schema-migration.md index cc8a7914..f1986bc0 100644 --- a/docs/developer/rfc/lazy-schema-migration.md +++ b/docs/developer/rfc/lazy-schema-migration.md @@ -123,7 +123,7 @@ Two properties of `hashstructure` v2.0.2 are load-bearing for this RFC: The struct's type name *is* part of the hash (`hashstructure` mixes in `reflect.Type.Name()`), so a rename of the Go type moves every hash even when content is byte-identical. 
-**Why this substrate cannot host frozen replay.** All three properties are resolved *at hash time against the live program*: the field set walked is whatever the struct has *now*, `Includable` is consulted only if the type implements it *now*, and a `value`-vs-`pointer` receiver subtlety even decides whether the root struct's `HashInclude` is seen at all (the top-level value is not addressable). This is [the substrate problem](#the-substrate-problem-replay-only-works-if-old-algorithms-stay-frozen) made concrete - Problem 6 - and exactly what the projection substrate below avoids by reading none of them. +**Why this substrate cannot host frozen replay.** All three properties are resolved *at hash time against the live program*: the field set walked is whatever the struct has *now*, `Includable` is consulted only if the type implements it *now*, and a `value`-vs-`pointer` receiver subtlety even decides whether the root struct's `HashInclude` is seen at all (the top-level value is not addressable). This is [the substrate problem](#the-substrate-problem-replay-only-works-if-old-algorithms-stay-frozen) made concrete - Problem 6 - and exactly what the projection substrate below avoids by reading none of these, nor the type name (the fourth live-program dependency, noted above). ## Change taxonomy @@ -201,7 +201,7 @@ ComponentConfig ──projectV1(cfg)──► canonical bytes ──sha256── sorted keys, emit-if-nonzero) ``` -`projectV1` is the projection at version 1. Field membership is declared **on each struct field** as a version-set in the `fingerprint` tag (`fingerprint:"v1..*"`); a `go generate` step reads those tags and emits a frozen `projectV1(cfg) []byte` function that emits, in stable key order, every field whose set includes v1, length-prefixing key+value so distinct field sets cannot collide. It omits a field when its **resolved value is zero** (omit-if-zero); a range prefixed with `!` (e.g. `!v1..*`) always-emits, for fields whose zero is build-meaningful. 
The generated functions are checked in and the registry dispatches to them. (Grammar, generation, and recovery semantics: [Version-tagged field selection](#version-tagged-field-selection) below.) +`projectV1` is the projection at version 1. Field membership is declared **on each struct field** as a version-set in the `fingerprint` tag (`fingerprint:"v1..*"`); a `go generate` step reads those tags and emits a `projectV1(cfg) []byte` function that emits, in stable key order, every field whose set includes v1, length-prefixing key+value so distinct field sets cannot collide. It omits a field when its **resolved value is zero** (omit-if-zero); a range prefixed with `!` (e.g. `!v1..*`) always-emits, for fields whose zero is build-meaningful. The generated functions are checked in and the registry dispatches to them. (Grammar, generation, and recovery semantics: [Version-tagged field selection](#version-tagged-field-selection) below.) Three things this buys that `hashstructure` could not: @@ -271,7 +271,7 @@ Every frozen output is byte-preserved, and the **golden vectors prove it**: the 1. **Compiler.** A generated `projectVN` references each measured field by literal Go path and emits a literal key: deleting a field a retained version measures won't compile, and the key cannot silently drift to the Go identifier. 2. **Generator (generate-time).** The generator **enumerates every field reachable from a fingerprinted root, recursing by field type** - this walk is the completeness guard, and it auto-discovers nested structs, replacing today's hand-maintained `fingerprintedStructs` list (which can silently go stale when a new nested type is added). 
It then refuses to emit on: a reached field with **no tag** (the include/exclude decision is mandatory - the generator must *fail* on an unrecognized field, never silently skip it, or a forgotten field drops out of the hash, a G5 hazard); a malformed, future-referencing, overlapping, or key-colliding tag set; or a `-`-tagged field absent from the **exclusion ledger** (an enumerated list - the surviving half of today's `expectedExclusions` - naming every `-` field with a justification, so an accidental exclusion fails generation). It also enforces the [coverage oracle](#golden-vector-coverage-the-backstop). -3. **Regeneration-idempotence.** CI runs `go generate` and fails on any diff to a **strictly-historical** `projectVN` (`version < currentLockContentVersion`) - a *superseded* version's emitted code cannot change without an intentional, diff-surfaced regeneration. The **live** version's `projectVN` is deliberately *mutable*: an output-preserving omit-if-zero addition regenerates it (a new `b.emit` line) and that diff is expected, not a violation. "Frozen" throughout this RFC means **superseded** (`version < current`), not "shipped" - a version freezes when the next one is registered, not the moment it first ships. +3. **Regeneration-idempotence.** CI runs `go generate` and fails on **any uncommitted diff** to the generated projectors (`git diff --exit-code` over *all* of them, live and superseded alike) - so whatever the generator emits must always be committed. This blanket check enforces *idempotence* (committed code == what regenerates); it is **not** itself the freeze. The distinct *freeze* property - a **superseded** version's bytes never move - is enforced by the golden vectors + retained-version manifest. 
The **live** version's `projectVN` is deliberately *mutable*: an output-preserving omit-if-zero addition regenerates it (a new `b.emit` line), and you commit that expected diff; a *superseded* version regenerating to different bytes is what the golden vectors catch. "Frozen" throughout this RFC means **superseded** (`version < current`), not "shipped" - a version freezes when the next one is registered, not the moment it first ships. 4. **Golden vectors** - the semantic backstop (next). #### Golden-vector coverage: the backstop @@ -291,7 +291,7 @@ Non-zero coverage alone is necessary but not sufficient; four more obligations c - **`!`-zero behavior.** Dropping `!` silently stops emitting a build-meaningful zero (G5). Every retained `!` range needs a **zero-valued** discrimination vector. - **Encoding across the value space.** A `"foo"` vector misses an encoder change affecting only delimiter bytes, multibyte runes, or multi-entry slices/maps. Add per-encoder **property/fuzz** vectors. (Fails toward G1 over-drift except under a collision the length-prefixed form makes unlikely.) -- **nil-vs-empty is a resolver invariant - scalar slices only.** It bites only where `IsZero` distinguishes nil from non-nil-empty *and* the predicate is plain `IsZero`: i.e. **scalar slices** (`[]string`). Composites (maps, slice-of-struct, nested structs) omit by *projected emptiness*, under which a nil and an empty container both project no bytes, so they need no normalization (maps are swept in only defensively). For scalar slices, the resolver's `mergo.Merge(…, WithOverride, WithAppendSlice)` ([`component.go`](../../../internal/projectconfig/component.go) `MergeUpdatesFrom`, `ResolveComponentConfig`) can yield nil *or* `[]` for the same intent depending on merge order, with **no post-merge normalization today**. This is the one correctness assumption the design *preserves rather than proves*. 
PR A closes it structurally: (a) a single named chokepoint - `canonicalizeForFingerprint(cfg)` at the **end of `ResolveComponentConfig`** - collapses every fingerprint-sensitive scalar slice to one canonical form (**nil**), so it is one enforced place, not a convention scattered across merge sites; (b) a **CI cross-check binds that chokepoint's scalar-slice inventory to the generator's enumerated scalar-slice measured-field set** (the same anti-staleness pattern as the coverage corpus), so a newly-added scalar slice cannot silently miss normalization; and (c) its canonical-form test is written **first - before any golden vector is authored**, or a vector bakes in a non-deterministic encoding. **Every omit predicate runs *after* `canonicalizeForFingerprint`**, so the projector never sees a non-canonical slice. This is the single most load-bearing PR-A gate. +- **nil-vs-empty is a resolver invariant - scalar slices only.** It bites only where `IsZero` distinguishes nil from non-nil-empty *and* the predicate is plain `IsZero`: i.e. **scalar slices** (`[]string`). Composites (maps, slice-of-struct, nested structs) omit by *projected emptiness*, under which a nil and an empty container both project no bytes, so they need no normalization (maps are swept in only defensively). For scalar slices, the resolver's `mergo.Merge(…, WithOverride, WithAppendSlice)` ([`component.go`](../../../internal/projectconfig/component.go) `MergeUpdatesFrom`, `ResolveComponentConfig`) can yield nil *or* `[]` for the same intent depending on merge order, with **no post-merge normalization today**. This is the one correctness assumption the design *preserves rather than proves*. 
PR A closes it structurally by running it **at the hash boundary**: (a) the single named chokepoint `canonicalizeForFingerprint(cfg)` is invoked **inside `ComputeIdentityAt`, immediately before `projectVN`** (not merely at the end of `ResolveComponentConfig`), so *every* path into the hasher is canonicalized regardless of how the caller obtained the config - `ComputeIdentity` accepts any `ComponentConfig`, so a resolver-end-only placement would leave the caller↔chokepoint link an unenforced convention; (b) a **CI cross-check binds that chokepoint's scalar-slice inventory to the generator's enumerated scalar-slice measured-field set** (generator-emitted, not a hand-maintained list - the same anti-staleness pattern as the coverage corpus), so a newly-added scalar slice cannot silently miss normalization; and (c) its canonical-form test is written **first - before any golden vector is authored**, or a vector bakes in a non-deterministic encoding. Because it runs at the hash boundary, **every omit predicate sees only canonical slices** - structural, not a per-call-site convention. This is the single most load-bearing PR-A gate. - **`!` on a projected-empty nested struct emits.** A `!`-tagged nested struct whose *measured* children all resolve to zero would otherwise be omitted by projected emptiness (it emits no measured bytes); the generated sub-projector treats a `!` range as "emit the recursively projected value even when the sub-projector emits no measured bytes," so a build-meaningful all-measured-zero struct still hashes - **including the case where only an excluded (`-`) child is non-zero**, which a global `IsZero` predicate would mis-handle (the struct reads raw-non-zero, so the `!` trigger keyed on `IsZero` would never fire). Covered by a discrimination vector whose measured children are all zero - one variant with a non-zero excluded child - asserting the `!` struct still emits. 
- **Enumerator completeness.** The coverage corpus is checked against the **generator's own field enumeration** (above), so it cannot drift from what is measured - a newly-added field or nested struct that the generator reaches but the corpus does not cover fails the backstop, not just the generator. @@ -336,9 +336,9 @@ func projectV1(c *ComponentConfig) []byte { | --- | --- | --- | --- | | `string` (`upstream`) | raw bytes | `IsZero` (`""`) | scalar leaf | | `bool` (`strip-debug`) | `true` / `false` | `IsZero` (`false`), **unless `!`** | `!` emits the build-meaningful `false` | -| `int` (forward-looking) | base-10 | `IsZero` (`0`) | scalar leaf; no plain `int` in the v1 measured graph (`manual-bump` is a combiner input, not projected) | +| `int` (forward-looking) | base-10 | `IsZero` (`0`) | scalar leaf; no plain `int` is measured by `projectV1` today (`manual-bump` is a combiner input, not projected) | | named scalar (`SpecSourceType`, `fileutils.HashType`, `ComponentOverlayType`, `ReleaseCalculation`) | by **underlying `reflect.Kind`** | underlying `IsZero` | measured - must **not** fail generation | -| `[]string` (`patches`) | length-prefixed elements, slice order | canonical nil (post-`canonicalizeForFingerprint`) | **scalar slice**: the resolver chokepoint (above) collapses nil≡`[]` to nil, then plain `IsZero` omits it; pinned by a golden vector. Tag `!` (exempt from collapse) to keep an explicit empty build-meaningful | +| `[]string` (`patches`) | length-prefixed elements, slice order | canonical nil (post-`canonicalizeForFingerprint`) | **scalar slice**: the resolver chokepoint (above) collapses nil≡`[]` to nil, then plain `IsZero` omits it; pinned by a golden vector. 
Tag `!` (exempt from collapse) to keep an explicit empty distinguishable *from omitted* (not nil-vs-`[]`; see edge cases) | | `[]Struct` (`[]ComponentOverlay`) | each element via its frozen sub-projector | no element projects bytes | **composite slice**: each element kept/dropped by *its own* projected emptiness | | `map[string]string` (`defines`) | sorted-key `:k=:v` | no entries | key membership measured (`{"k":""}` ≠ `{}`) | | `map[string]Struct` (`map[string]PackageConfig`) | sorted-key; value via frozen sub-projector | no entries | **excluded `-` in v1**; if ever included, this row governs | @@ -355,14 +355,14 @@ func projectV1(c *ComponentConfig) []byte { So the classic false-negative requires absence ≠ zero-default *at the point of hashing*, and post-merge resolution closes that gap. The load-bearing invariant is **G5's guarantee restated structurally: the fingerprint must see exactly the build-effective resolved config.** That invariant must already hold, or fingerprinting is broken independently of this change. A `!`-prefixed range is the escape hatch for the rare field whose zero value is build-meaningful. -**Result:** additive fields are drift-neutral **by construction** (G4) - a newly added field, listed omit-if-zero in `projectVN`, emits nothing for any component that does not set it, so it is invisible to every lock that leaves it unset, old or new. Adding it moves no existing hash (no shipped lock could have set a field that did not yet exist), so it needs no version bump. Only setters drift (G2). +**Result:** additive fields are drift-neutral **by construction** (G4) - a newly added field, listed omit-if-zero in `projectVN`, emits nothing for any component that does not set it, so it is invisible to every lock that leaves it unset, old or new. Adding it moves no existing hash (no shipped lock could have set a field that did not yet exist), so it needs no version bump. 
Only setters drift (G2) - and a setter on an older, un-migrated lock stays false-fresh until that lock next re-stamps to a version that measures the field. #### Edge cases under omit-if-zero The omit predicate **splits by kind** - scalar leaves use `reflect.Value.IsZero()`, composites (nested struct, map, slice-of-struct) use projected emptiness (the encoding contract above is the single source of truth); `!` is the only per-field override. A few `IsZero` consequences on the **scalar leaves** still need stating, because `IsZero` is type-specific: - **Meaningful zero with a non-zero default** (e.g. `int Jobs` defaulting to `4`, where `0` means serial). Post-merge: unset → `4` (emitted), explicit `0` → omitted. These build differently *and* hash differently, so there is no collision - they are consistent. Use a `!` range only if a zero value must be distinguishable from a future change of default. -- **nil vs empty scalar slice - canonicalized before projection.** Raw `IsZero` would treat a nil slice (omitted) and a non-nil empty `[]` (emitted) differently, and resolution can produce either for the same intent. The rule (stated once in [the coverage section](#golden-vector-coverage-the-backstop)): `canonicalizeForFingerprint` collapses every fingerprint-sensitive scalar slice to **nil** *before* the projector runs, so plain `IsZero` then omits it deterministically. Where an explicit-empty value is build-meaningful, tag the field `!` **and exempt it from collapse**, so nil and `[]` both emit and stay distinguishable. Pinned by a golden vector, enforced at the one chokepoint - not a per-call-site convention. +- **nil vs empty scalar slice - canonicalized before projection.** Raw `IsZero` would treat a nil slice (omitted) and a non-nil empty `[]` (emitted) differently, and resolution can produce either for the same intent. 
The rule (detailed in [the coverage section](#golden-vector-coverage-the-backstop)): `canonicalizeForFingerprint` collapses every fingerprint-sensitive scalar slice to **nil** *before* the projector runs, so plain `IsZero` then omits it deterministically. Where an explicit-empty value is build-meaningful, tag the field `!` **and exempt it from collapse**, so an explicit empty *emits* and stays distinguishable **from omitted**. (The v1 length-prefixed element encoding does not by itself separate nil from `[]` - both are zero elements - so `!` distinguishes emitted-empty from omitted, *not* nil from `[]`; a build-meaningful nil-vs-`[]` split would need an explicit discriminator added to the encoding table plus a golden vector, which no measured scalar slice needs today.) Pinned by a golden vector, enforced at the one chokepoint - not a per-call-site convention. ### The reset load-out: what to spend the free rebuild on @@ -372,7 +372,7 @@ The reset rebuild is a budget. Spend it on the irreversible / cutover-only chang 2. **Establish `projectV1` as omit-if-zero with no include-always legacy.** The compatibility mode never enters the registry, so it never has to age out. 3. **Keep the lock *format* `Version` at `1` - the content-version token carries the reset.** The reset adds **no new TOML field** (the atomic token in item 4 reuses `InputFingerprint`) and touches **no** pinning field (`upstream-commit`, `import-commit`, `manual-bump`), so an old binary still parses a reset lock and reads everything it needs to *queue a build*. The substrate swap rides entirely on the content-version machinery (Part 2): pre-reset locks carry a legacy (prefix-less) token below the registry floor, and the reset is simply the **first forced upgrade** of the fleet to the `v1:` token. 
This also makes the one real mixed-toolchain risk self-correcting: if an old binary ever rewrites a reset lock with its legacy-substrate hash, the next new-binary run sees a sub-floor token and **force-rehashes** it back to `v1` - a clean forced upgrade, never silent corruption (next subsection). 4. **Adopt an atomic, self-describing `v1:sha256:…` token** for the stored hash, so the version and the digest can never desync (closes the re-stamp/desync class of bug where the version field and the hash field are written independently). -5. **Unify on `sha256` everywhere**, retiring the `uint64`→decimal-string wart from the `hashstructure` era. One hash format, one encoding. +5. **Unify on `sha256` everywhere**, retiring the `uint64`→decimal-string artifact from the `hashstructure` era. One hash format, one encoding. 6. **Do every pending rename / default-normalization now.** Renaming a field, moving content between structs, or changing a baked-in default is a one-way door under Part 2 (it needs a version bump + replay); at the reset it is free because everything rebuilds anyway. This is where the schema-axis "hardest cases" get absorbed cheaply. 7. **Resolve each field's mandatory tag - and bank the free corrections.** The "absent ⇒ generation fails" rule forces a conscious decision on every fingerprinted field at the reset, which is the moment to *fix* existing mistakes for free. Concretely, tag `ComponentConfig.Packages` `fingerprint:"-"`: every `PackageConfig` field is publish-only (`Publish`, itself `-`), so the map measures nothing build-effective - yet today `hashstructure` hashes its *keys*, so adding a publish-only package name already triggers a spurious rebuild. Excluding it at the reset retires that existing G1 churn at zero cost. Audit the whole struct for the same pattern (a measured composite whose every leaf is `-`). 
@@ -408,6 +408,8 @@ The reset is only safe because of a property of the codebase verified against th | `synthistory.BuildDirtyChange` | compares the precomputed current fingerprint to HEAD's stored string | No (HEAD only) | | `sourceprep.computeCurrentFingerprint` | the *only* `ComputeIdentity` call on this surface - computes for the **current tree**, compares to HEAD's stored hash | Current tree only | +*(This table is **current** behaviour - the readers as they exist today. PR C refines the comparison **granularity** to digest-level, stripping the `v:` prefix, and adds caller-side replay to `BuildDirtyChange`, but preserves the string-only / no-recompute property exactly; the post-PR-C target is the [blast-radius table](#downstream-fingerprint-consumers-blast-radius).)* + The consequence: **swapping the substrate is invisible to synthetic history.** A pre-reset (legacy-token) lock and a post-reset `v1:` lock are just two different opaque strings at two different commits; the walker reports "changed" across the reset commit (correct - it *is* a notable, deliberate, fleet-wide event, the coordinated cutover) and never tries to recompute either side. Applying historic overlays likewise reads stored lock fields and needs no hash recomputation. > **Invariant (must hold forever):** synthetic history and historic-overlay application operate on **stored lock fields only.** No reader recomputes a fingerprint for a historical commit. This is precisely what lets a frozen `projectVN` be *forward-only*: it never has to reproduce a hash from a different substrate generation, only hashes the lock that the *current* binary writes. A future change that recomputes a historical fingerprint would break this and must be rejected in review. @@ -454,7 +456,7 @@ Stamp one **lock content-hash version** into the lock (the `v1:` prefix of the a 3. In `checkFingerprintFreshness`, compute at the **current** version. 
On mismatch, if the lock's token version `< current`, recompute at the lock's token version. If *that* matches the stored digest, the inputs are unchanged and only the algorithm evolved → treat as `FreshnessCurrent` and flag for silent re-stamp. Otherwise → `FreshnessStale`. (The resolution hash reuses `computeRes1` until its algorithm first changes - see scope note.) 4. `component update` re-stamps the token to the **current** version **only when it is already writing for an independent reason** (see the churn policy below). Migration is therefore **lazy and per-component**: a lock upgrades only when something independently touches it. -This resolves Problems 2 (for default changes), 3 (hashing bugfixes), and 5 (piecemeal rollout). It is the same lazy-forward-migration pattern Cargo/npm use, specialized to a content hash. +This resolves Problems 2 (rename/move and additive-default changes via replay; a change to an *existing* measured default genuinely drifts and is migrate-or-`config migrate`-gated, per the [taxonomy](#change-taxonomy)), 3 (hashing bugfixes), and 5 (piecemeal rollout). It is the same lazy-forward-migration pattern Cargo/npm use, specialized to a content hash. #### Both hashes share one version @@ -462,7 +464,7 @@ This resolves Problems 2 (for default changes), 3 (hashing bugfixes), and 5 (pie We version them with **one shared integer**, not two axes, because: they co-locate in a single lock, they are written in the same `update` pass, and a paired registry lets either evolve independently while the other reuses its prior function. Two separate version axes would double the floor/replay/migrate machinery for an input set (`ResolutionInputHash`) that changes rarely - YAGNI. -**`InputFingerprint` is the sole prefix authority; `ResolutionInputHash` stays bare.** The shared version is physically stored **only** in `InputFingerprint`'s `v:` prefix. `ResolutionInputHash` carries **no prefix** - it remains a bare `sha256:` digest. 
This is the decisive choice that prevents the *fingerprint-bump* desync: the **first fingerprint-only `v2`** already advances the shared prefix, and if `ResolutionInputHash` *also* carried it, `resolver.go`'s raw string compare of the whole field would see a prefix-only move (`v1:…X` → `v2:…X`, `computeRes1` unchanged) and mark resolution stale → fleet-wide re-resolution for nothing. With the prefix living only in `InputFingerprint`, the resolver compares a bare digest that does not move on a fingerprint bump. (This closes the fingerprint-bump direction only; the *symmetric* resolution-only-write desync stays dormant until `computeRes2` and is held by the structural tripwire in [`ResolutionInputHash`](#resolutioninputhash-bare-digest-replay-deferred).) The "shared version" therefore means: the integer in `InputFingerprint`'s prefix selects which `computeResN` produced `ResolutionInputHash` during replay (read from the one prefix), not that resolution stores its own copy. This also keeps `InputFingerprint` the only release-bearing field, so the historical changelog/classifier comparators - which compare the **digest**, stripping the `v:` prefix - never see a phantom move on a version-only re-stamp. (See [the synthetic changelog/release path](#the-synthetic-changelogrelease-path-is-the-real-hazard).) +**`InputFingerprint` is the sole prefix authority; `ResolutionInputHash` stays bare.** The shared version is physically stored **only** in `InputFingerprint`'s `v:` prefix. `ResolutionInputHash` carries **no prefix** - it remains a bare `sha256:` digest. This is the decisive choice that prevents the *fingerprint-bump* desync: the **first fingerprint-only `v2`** already advances the shared prefix, and if `ResolutionInputHash` *also* carried it, `resolver.go`'s raw string compare of the whole field would see a prefix-only move (`v1:…X` → `v2:…X`, `computeRes1` unchanged) and mark resolution stale → fleet-wide re-resolution for nothing. 
With the prefix living only in `InputFingerprint`, the resolver compares a bare digest that does not move on a fingerprint bump. (This closes the fingerprint-bump direction only; the *symmetric* resolution-only-write desync stays dormant until `computeRes2` and is held by the structural tripwire in [`ResolutionInputHash`](#resolutioninputhash-bare-digest-replay-deferred).) **The bare invariant is type-enforced, not conventional:** `ResolutionInputHash` parses through a distinct `BareDigestToken` whose parser *rejects* any `v:` prefix, so a stray `resolution-input-hash = "v2:sha256:X"` fails to parse rather than silently weakening the invariant - `InputFingerprint`'s token type accepts the prefix, the resolution type forbids it. The "shared version" therefore means: the integer in `InputFingerprint`'s prefix selects which `computeResN` produced `ResolutionInputHash` during replay (read from the one prefix), not that resolution stores its own copy. This also keeps `InputFingerprint` the only release-bearing field, so the historical changelog/classifier comparators - which compare the **digest**, stripping the `v:` prefix - never see a phantom move on a version-only re-stamp. (See [the synthetic changelog/release path](#the-synthetic-changelogrelease-path-is-the-real-hazard).) **Phasing.** The atomic token format (`v:sha256:…`) is fixed at the reset. Fingerprint replay is wired in Part 2's first PR; **resolution-hash replay is reserved, not yet wired** - the slot exists and `computeRes1` is reused, so the day `ComputeResolutionHash` first changes we add `computeRes2` and extend replay to its comparison sites (the resolution-staleness branch of `computeFreshnessStatus` + the `resHashChanged` silent-write guard in `updateResolutionHash`). Because `ResolutionInputHash` is bare and prefix-free, a fingerprint-only bump before that day is a no-op for the resolver - the deferral is genuinely safe, not merely small-blast-radius. 
See [`ResolutionInputHash`](#resolutioninputhash-bare-digest-replay-deferred). @@ -505,7 +507,7 @@ This is correct *by contract* (a v1 lock promises freshness under the v1 input s Lazy migration means an untouched lock can sit at an old version **indefinitely** (G3 by design). That makes "keep the last *N* versions" a **correctness cliff, not a tuning knob**: if pruning drops the compute function a lock still depends on, replay becomes impossible → forced `FreshnessStale` → the mass rebuild/rewrite (and, via the downstream-consumer analysis below, mass changelog churn) the whole design exists to avoid. So the floor must be explicit and paired with an escape hatch, decided now: - **`minSupportedLockContentVersion`** is a hard floor. A lock below it cannot be replayed and is treated as `Stale`. Dropping a registry entry is therefore a deliberate, breaking, announced act - never incidental cleanup. -- **`component migrate`** force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. Note this pass is a deliberate G1 exception - it *is* the eager migration G1 normally forbids, made safe by being explicit and operator-driven rather than a silent side effect. **Contract:** it is *offline* - it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). 
It *does*, however, move every *fingerprint* digest when it retires a fingerprint algorithm, so a fleet-wide migrate of that kind **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved digest as notable, exactly as [the synthetic changelog/release path](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. (A migrate that retires only a *resolution* algorithm rewrites only the bare, prefix-free `ResolutionInputHash` - which `synthistory` never reads - so it is correctly release-silent.) Migrate is therefore rare: the release churn is the deliberate cost of retiring a version. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal - each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). +- **`component migrate`** force-advances every lock to the current content version in one deliberate pass. This is the *only* sanctioned way to retire an old version: migrate the fleet first (one intentional, reviewed, fleet-wide commit), then raise the floor. This pass is the **post-reset counterpart to the reset's sanctioned exception**: the eager migration G1 normally forbids, made safe by being explicit, planned, and operator-driven rather than a silent side effect. **Contract:** it is *offline* - it loads each lock, recomputes the fingerprint at `currentLockContentVersion`, and rewrites the token; it does **not** re-resolve upstream (`upstream-commit`/`import-commit` untouched, unlike `update --force-recalculate`) and does **not** touch the manual-bump counter (unlike `--bump`). 
It *does*, however, move every *fingerprint* digest when it retires a fingerprint algorithm, so a fleet-wide migrate of that kind **is a fleet-wide, release-grade event**: `FindFingerprintChanges` reads each moved digest as notable, exactly as [the synthetic changelog/release path](#the-synthetic-changelogrelease-path-is-the-real-hazard) warns. (A migrate that retires only a *resolution* algorithm rewrites only the bare, prefix-free `ResolutionInputHash` - which `synthistory` never reads - so it is correctly release-silent.) Migrate is therefore rare: the release churn is the deliberate cost of retiring a version. The on-disk *config* axis has its own verb, [`config migrate`](#config-schema-version-and-canonical-migration-future); the two are orthogonal - each lives with the artifact its command group already owns (`component` writes locks, `config` owns the TOML). - **Floor-advance cadence.** Because raising the floor requires a release-grade `component migrate`, pruning cannot be routine - left alone, the registry and deprecated tombstone fields grow **append-only**, and the golden corpus grows **multiplicatively** (O(retained-versions × measured-fields)) (a real cost the opaque-token model accepts; see the manifest alternative). Policy: piggyback floor-raises onto *already-planned* mass rebuilds (the next environment cutover or a major release), and enforce a CI ceiling on the `currentLockContentVersion - minSupportedLockContentVersion` *spread* so the backlog cannot grow unbounded between those planned events. The spread, not the absolute version number, is the quantity kept small. **Early-warning ramp:** the ceiling is a *warning at ceiling-1*, a hard failure only at the ceiling - so an approaching floor-raise surfaces as a heads-up on the PR *before* the one that registers `v(N+1)`, converting the forced migrate from a surprise blocking failure into a planned event (the design's goal that nothing *unplanned* ever forces a migrate). 
The ceiling and its ramp are installed by PR F; versions registered before PR F accumulate unguarded, which is acceptable because the floor stays at 1 and every version remains replayable until the first floor-raise. **Residual:** if genuine algorithm changes arrive *faster* than planned rebuilds, the ceiling still ultimately *forces* an unplanned, release-grade `component migrate`. The ceiling does not eliminate the expensive event; it bounds the backlog by *converting* an unbounded version spread into an occasional forced migrate, with one version of advance notice. This is the accepted cost of lazy-forever coexistence. **Mixed-toolchain hazard - bounded by the version-pin, not auto-repair.** The classic trap is an older binary regressing a newer lock. Because the lock *format* never bumps, an old binary *can* write a reset lock, stamping a legacy (prefix-less) or lower-`v` hash. In the **working tree** this is self-correcting: the next new-binary run detects the sub-floor token and force-rehashes it to the current version. But "self-correcting" stops at the working tree - if a downgraded lock is **committed**, `FindFingerprintChanges` reads `v1 → legacy → v1` as two real release events, and a published `%autorelease` increment cannot be withdrawn. So the load-bearing guard against *committed* phantom releases is the **CI version-pin**: post-cutover, no old binary may run the `update`-and-commit step. Concretely, that means the lock-writing CI job runs from a **pinned build image (by digest, not a floating tag)** rebuilt from the cutover commit or later, and **no other path reaches the `update`-and-commit step** - local developer binaries do not commit locks; only the pinned job does. (The force-rehash only cleans the working tree; it does not undo history.) 
The *symmetric* residual - a binary that predates content-version `v2` meeting a `v2` token it cannot replay - is closed by a **required** write-time guard: refuse to write a token whose version exceeds the binary's `currentLockContentVersion`, erroring rather than silently restamping at `v1`. Note this guard lives in the binary doing the write, so it constrains *newer-but-not-newest* binaries; it does **not** retroactively constrain a genuinely *old* binary - that direction is the version-pin's job. @@ -535,7 +537,7 @@ This makes "drop an input" a lazy, per-component migration rather than a fleet-w #### First post-reset customer -The reset establishes `projectV1` directly; it is *not* itself a Part 2 version event (it rides the rebuild, not replay). Part 2's machinery therefore sits idle until the **first genuine algorithm change after the cutover** - e.g. a `computeFP2` that fixes an overlay-folding bug, folds in a newly measured input, or changes a baked-in default. That change registers `computeFP2`, bumps `currentLockContentVersion` to 2, and is absorbed by replay with no second coordinated cutover. Because the projection substrate makes additive config changes hash-neutral by construction (G4), the *only* changes that ever need a Part 2 version event are genuine non-additive algorithm changes - a deliberately small set. +The reset establishes `projectV1` directly; it is *not* itself a Part 2 version event (it rides the rebuild, not replay). Part 2's machinery therefore sits idle until the **first genuine algorithm change after the cutover** - e.g. a `computeFP2` that fixes an overlay-folding bug or folds in a newly measured input. (A default change that genuinely alters build output is *not* in this set - it drifts by design, per the [taxonomy](#change-taxonomy); only an output-preserving rename/default-normalization rides replay.) 
That change registers `computeFP2`, bumps `currentLockContentVersion` to 2, and is absorbed by replay with no second coordinated cutover. Because the projection substrate makes additive config changes hash-neutral by construction (G4), the *only* changes that ever need a Part 2 version event are genuine non-additive algorithm changes - a deliberately small set. ## Config schema version and canonical migration (future) @@ -565,7 +567,7 @@ TOML on disk ──migrate to canonical struct (schema axis)──► ComponentC ## Downstream fingerprint consumers (blast radius) -The versioned-replay story in Part 2 must hold for **every** reader of `InputFingerprint`, not just the two paths it grew up around. This is the post-reset migration blast-radius map; each consumer's behavior under a Part 2 v1→v2 algorithm switchover is stated explicitly. (The *reset itself* does not engage this Part 2 replay machinery - these consumers compare stored strings and pre-reset locks are never recomputed; the reset commit's digest move is its one intended visible event, the coordinated cutover, as analyzed under [Back-compat invariant](#back-compat-invariant-synthetic-history-reads-stored-strings-never-recomputes).) +The versioned-replay story in Part 2 must hold for **every** reader of `InputFingerprint`, not just the two paths it grew up around. This is the post-reset migration blast-radius map; each consumer's behavior under a Part 2 v1→v2 algorithm switchover is stated explicitly. **This table is the post-PR-C target state**; for *current* behaviour (raw-string compares, no replay) see the [back-compat reader table](#back-compat-invariant-synthetic-history-reads-stored-strings-never-recomputes). 
(The *reset itself* does not engage this Part 2 replay machinery - these consumers compare stored strings and pre-reset locks are never recomputed; the reset commit's digest move is its one intended visible event, the coordinated cutover, as analyzed under [Back-compat invariant](#back-compat-invariant-synthetic-history-reads-stored-strings-never-recomputes).) | Consumer | Reads | Compares | Migration behavior required | | -------- | ----- | -------- | --------------------------- | @@ -575,12 +577,12 @@ The versioned-replay story in Part 2 must hold for **every** reader of `InputFin | `changed.go` `classifyComponent` (CI classifier) | stored token strings (two historical git refs) | **digest compare** (strip `v:` prefix) | **String-only - must NOT replay** (no inputs available; replaying historical configs would violate the no-recompute invariant) | | `changed.go` `haveMatchingFingerprints` (cache-poisoning integrity gate) | stored token strings | **digest compare** (strip `v:` prefix) | **String-only; security-load-bearing** - a version-only delta must read as "same" or the integrity check is silently skipped | | `synthistory.FindFingerprintChanges` | stored token strings across git history | **digest of adjacent commits** (strip `v:` prefix) | **String-only; digest-compare** so a version-only re-stamp never fires a release | -| `synthistory.BuildDirtyChange` | precomputed current-ver fingerprint vs stored `headLock` token | digest/token compare | **Caller replays** at `headLock` version and passes the replayed token (`BuildDirtyChange` holds no inputs - see below) | +| `synthistory.BuildDirtyChange` | current tree replayed at `headLock` version vs stored `headLock` token | digest compare | **Caller replays** at `headLock` version and passes that token; `BuildDirtyChange` holds no inputs (see below) | | `ResolutionInputHash` staleness/write | recomputed resolution hash | vs stored **bare** digest | **No prefix** (bare `sha256:`); fingerprint-only bumps never touch 
it; replay reserved | **Two comparator classes, not one - and only one of them can replay.** The consumers split cleanly by *what they hold*: -- **Current-tree comparators** (`checkFingerprintFreshness`, `update`'s `Changed`, `BuildDirtyChange`) recompute against *live inputs*, so they **can and must** replay at the stored token's version. Feasible and invariant-safe. +- **Current-tree comparators** (`checkFingerprintFreshness`, `update`'s `Changed`, and `BuildDirtyChange`'s *caller*) hold *live inputs*, so they **can and must** replay at the stored token's version. (`BuildDirtyChange` itself takes only strings; its caller replays and passes a `headLock`-version token, see [the synthetic changelog/release path](#the-synthetic-changelogrelease-path-is-the-real-hazard).) Feasible and invariant-safe. - **Stored-vs-stored historical comparators** (`FindFingerprintChanges`, `changed.go`'s `classifyComponent`/`haveMatchingFingerprints`) hold only committed token *strings* from two git refs - no config, no FS, no inputs. They **cannot** replay, and replaying would require recomputing a historical fingerprint, which the [forever-invariant](#back-compat-invariant-synthetic-history-reads-stored-strings-never-recomputes) forbids outright. Both stay **string-only**, and both compare the **digest** (stripping the `v:` version prefix), which makes them inherently immune to version-only deltas - a v1→v2 re-stamp with an unchanged digest reads as "no change." (Strict-lazy churn is still the policy that keeps re-stamps from riding no-op commits in the first place, but the comparators no longer *depend* on it for correctness.) The `changed.go` classifier is the easily-missed member of the *second* class: it must get the same **digest-compare** as `FindFingerprintChanges`, so a version-only delta reads as "no change" - not a replay (which it cannot do, holding no inputs). 
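
The digest-compare shared by the historical comparators can be sketched as follows. This is a minimal illustration rather than the RFC's implementation: `digestOf` and `sameDigest` are hypothetical helper names, and the token shape (an optional `v` + integer + `:` prefix ahead of a `sha256:` digest) is an assumption drawn from the descriptions in this document.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// digestOf returns the token with any leading version prefix removed.
// Hypothetical helper: it assumes a token is either a bare "sha256:..."
// digest or the same digest behind a "v" + unsigned integer + ":" prefix.
func digestOf(token string) string {
	head, rest, found := strings.Cut(token, ":")
	if !found || !strings.HasPrefix(head, "v") {
		return token // no prefix at all, or the first segment is not a version
	}
	if _, err := strconv.ParseUint(head[1:], 10, 32); err != nil {
		return token // "v" not followed by an unsigned integer: leave untouched
	}
	return rest
}

// sameDigest compares two stored tokens by digest only, so a version-only
// re-stamp (same digest, different prefix) reads as "no change".
func sameDigest(a, b string) bool {
	return digestOf(a) == digestOf(b)
}

func main() {
	fmt.Println(sameDigest("v1:sha256:abc", "v2:sha256:abc")) // version-only delta
	fmt.Println(sameDigest("v1:sha256:abc", "v1:sha256:def")) // genuine digest move
}
```

Both functions take only strings, which matches the constraint on the historical comparators: they hold two committed tokens and no inputs, so nothing here recomputes a fingerprint.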
@@ -594,7 +596,7 @@ The fix is a **two-type split**, because a single token type cannot tell the two A historical site holding two `StoredToken`s can call `SameDigest` but cannot fabricate a `FreshToken`, so it cannot accidentally pose as a current-tree freshness check; a current-tree site must obtain a `FreshToken` to reconcile, which forces it through live inputs. The *assignment* documents the class, and the mis-classification path is unconstructible rather than merely discouraged. Both types are **non-comparable** (an unexported `_ [0]func()` field), so a raw `==` on a token outside the `fingerprint` package fails to compile. (Unexported fields alone would *not* do this: a struct of comparable unexported fields is still `==`-comparable from any package; the non-comparable sentinel is what blocks it.) -For the choke-point to be *structural* and not merely conventional, the **lock fields must be token-typed, not raw `string`**: as long as `ComponentLock.InputFingerprint`/`ResolutionInputHash` stay exported strings, `lock.InputFingerprint == other.InputFingerprint` still compiles and the raw-compare pattern stays copyable. So PR C changes those fields to `StoredToken` (TOML marshal/unmarshal routing through `ParseToken`, so every read crosses the strict parser), or hides the raw string behind an accessor that returns a `StoredToken`. Only then does \"enforced by types, not prose\" hold end-to-end. This lands in PR C, which already edits every comparison site. **The on-disk bytes are not automatically unchanged, though; the field *form* decides it** (verified empirically against go-toml/v2 v2.3.1, the pinned version; the reproduction is banked as a committed `_test.go` artifact in PR C, so "verified empirically" has a citation). 
`omitempty` decides emptiness by reflecting a struct's *exported* fields **before** consulting `TextMarshaler`, so a token struct whose digest sits in an *unexported* field is judged empty and **dropped even when set**: a populated `input-fingerprint` silently vanishes, while a *non-*`omitempty` value struct instead emits a spurious `resolution-input-hash = ''` line. Two byte-neutral forms survive: (a) an **accessor**, keeping the on-disk field a `string` and exposing a `StoredToken` via method with writes routed through `ParseToken`, byte-neutral *by construction* (the serialized type never changes) and `==`-proof for every other package; or (b) a value struct with an **exported** digest field (so `omitempty` tracks it) plus a custom marshal that renders it as a bare string. The pointer form (`*StoredToken`) is byte-neutral but reintroduces a silent pointer-`==`, so it is rejected. **PR-C acceptance gate:** a golden round-trip test proving a real local lock's bytes are unchanged across the conversion, so the property is *tested*, not asserted. Either accepted form lands in PR C with no separate on-disk-format bump. +For the choke-point to be *structural* and not merely conventional, the **lock fields must be token-typed, not raw `string`**: as long as `ComponentLock.InputFingerprint`/`ResolutionInputHash` stay exported strings, `lock.InputFingerprint == other.InputFingerprint` still compiles and the raw-compare pattern stays copyable. So PR C changes those fields to `StoredToken` (TOML marshal/unmarshal routing through `ParseToken`, so every read crosses the strict parser), or hides the raw string behind an accessor that returns a `StoredToken`. Only then does \"enforced by types, not prose\" hold end-to-end. This lands in PR C, which already edits every comparison site. 
**The on-disk bytes are not automatically unchanged, though; the field *form* decides it** (verified empirically against go-toml/v2 v2.3.1, the pinned version; the reproduction is banked as a committed `_test.go` artifact in PR C, so "verified empirically" has a citation). `omitempty` decides emptiness by reflecting a struct's *exported* fields **before** consulting `TextMarshaler`, so a token struct whose digest sits in an *unexported* field is judged empty and **dropped even when set**: a populated `input-fingerprint` silently vanishes, while a *non-*`omitempty` value struct instead emits a spurious `resolution-input-hash = ''` line. Two byte-neutral forms survive, and they are **not** equally strong. (b) a value struct with an **exported** digest field (so `omitempty` tracks it) plus a custom marshal rendering a bare string is the **structurally stronger** choice and is **preferred for the "enforced by types" claim**: the lock field is a non-comparable token type, so a raw `==` fails to compile *everywhere*, including inside the `lockfile` package. (a) an **accessor** keeping the on-disk field a `string` and exposing a `StoredToken` via method (writes routed through `ParseToken`) is byte-neutral *by construction* (the serialized type never changes) but **weaker** - the raw `string` field stays `==`-able from within its own package, so the guarantee there rests on local discipline. The pointer form (`*StoredToken`) is byte-neutral but reintroduces a silent pointer-`==`, so it is rejected. **PR-C acceptance gate:** a golden round-trip test proving a real local lock's bytes are unchanged across the conversion, so the property is *tested*, not asserted. Either accepted form lands in PR C with no separate on-disk-format bump. 
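
A sketch of form (b) follows, under stated assumptions: the field names `Version` and `Digest`, and the `v<int>:` rendering in `MarshalText`, are illustrative choices rather than the layout PR C will use. How a particular TOML library consults `MarshalText` or `omitempty` is outside the scope of this sketch.

```go
package main

import "fmt"

// StoredToken sketches form (b): a value struct with an exported digest
// field. The zero-length array of funcs is the non-comparable sentinel:
// func values are not comparable, so neither is the array, so neither is
// the struct, and `a == b` on two StoredToken values fails to compile.
type StoredToken struct {
	_       [0]func()
	Version int    // illustrative name for the content version in the prefix
	Digest  string // exported so emptiness can be observed by reflection
}

// SameDigest is the only comparison the type offers; it ignores the version.
func (t StoredToken) SameDigest(other StoredToken) bool {
	return t.Digest == other.Digest
}

// MarshalText renders the token as a single string (encoding.TextMarshaler).
// The exact rendering is an assumption made for this example.
func (t StoredToken) MarshalText() ([]byte, error) {
	return []byte(fmt.Sprintf("v%d:%s", t.Version, t.Digest)), nil
}

func main() {
	a := StoredToken{Version: 1, Digest: "sha256:abc"}
	b := StoredToken{Version: 2, Digest: "sha256:abc"}

	fmt.Println(a.SameDigest(b))

	text, err := b.MarshalText()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(text))

	// fmt.Println(a == b) // does not compile: StoredToken is not comparable
}
```

Because the sentinel is part of the type itself, the compile-time refusal applies inside the declaring package as well, which is the property that separates form (b) from the accessor form (a).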
### The synthetic changelog/release path is the real hazard @@ -602,7 +604,7 @@ For the choke-point to be *structural* and not merely conventional, the **lock f - **`FindFingerprintChanges` (historical walker)** compares `InputFingerprint` across the lock's git history and emits a synthetic changelog/release entry on every change. It compares the **digest** (stripping the `v:` version prefix), not the full token - a one-line string operation, not the infeasible version-aware replay (it has only committed *strings*, no inputs). So a version-only re-stamp (a lazy v1→v2 migration with an unchanged digest) is **invisible** to it; only a moved digest - a genuine input change - fires, and the migration folds into the real change's entry that carries it. The v1→v2 conversion is thus an *accepted, per-component, notable* changelog event that piggybacks a real change, guaranteed by digest-comparison rather than by lazy-discipline. - **`component migrate` is release-grade *when it moves digests*.** A migrate that retires a *fingerprint* algorithm re-stamps every unchanged lock from `computeFP1`'s digest to `computeFP2`'s - the digests move, the walker fires, and the fleet-wide release is the deliberate cost ([registry floor](#registry-floor-and-forced-migration)). A migrate that retires only a *resolution* algorithm rewrites only the bare `ResolutionInputHash` (which `synthistory` never reads), so it is correctly release-silent. Either way the firing tracks a real `InputFingerprint` digest move. -- **`BuildDirtyChange` (live dirty check)** compares a *recomputed* current-version (v2) hash against the *stored* (possibly v1) `headLock.InputFingerprint` and declares dirty on inequality. Post-switchover an *unchanged* component would otherwise read **dirty on every `render`/`build`** until re-stamped - a persistent, recurring spurious signal, worse than a one-time entry. 
The fix is the *same replay the freshness check already owes*, but with one signature caveat: **`BuildDirtyChange(currentFingerprint string, headLock *ComponentLock, currentUpstreamCommit string)` carries no `config`/`opts`**, so it physically cannot replay. The **caller** (which holds `*result.config`) must replay at `headLock`'s recorded version and pass the already-replayed current-version token in; `BuildDirtyChange` then digest-compares as before. So it is a caller-side change (or a signature widening to take the inputs), not a one-liner inside `BuildDirtyChange` - reusing logic already being written, but crossing the `sources` package boundary. +- **`BuildDirtyChange` (live dirty check)** compares a *recomputed* current-version (v2) hash against the *stored* (possibly v1) `headLock.InputFingerprint` and declares dirty on inequality. Post-switchover an *unchanged* component would otherwise read **dirty on every `render`/`build`** until re-stamped - a persistent, recurring spurious signal, worse than a one-time entry. The fix is the *same replay the freshness check already owes*, but with one signature caveat: **`BuildDirtyChange(currentFingerprint string, headLock *ComponentLock, currentUpstreamCommit string)` carries no `config`/`opts`**, so it physically cannot replay. The **caller** (which holds the live config) must compute the current tree's fingerprint *at `headLock`'s recorded version* - yielding a **`headLock`-version** token, not a current-version one - and pass that in; `BuildDirtyChange` then digest-compares as before. (Replaying at the *current* version instead would compare a `v2` digest against the stored `v1` one and read dirty forever - the very pathology this fixes; the replay must target `headLock`'s version, matching the [discriminating PR-C test](#incremental-delivery).) 
**Dataflow caveat:** today `trySyntheticHistory` computes `currentFingerprint` *before* `buildSyntheticCommits` parses the HEAD lock ([`sourceprep.go`](../../../internal/app/azldev/core/sources/sourceprep.go) feeds a string into [`synthistory.go`](../../../internal/app/azldev/core/sources/synthistory.go)), so the fix **moves the fingerprint computation after the HEAD-lock parse** (or threads the inputs down to it) - a caller-side dataflow change crossing the `sources` package boundary, not a one-liner inside `BuildDirtyChange`. **Net:** the changelog-walker concern is not "make the walker version-aware" (hard, maybe infeasible). It is two cheap things - (1) the historical comparators (`FindFingerprintChanges`, `changed.go`) compare the **digest**, so a version-only delta never fires; and (2) extend the *current-tree* replay to `BuildDirtyChange`'s **caller** (which holds live inputs; `BuildDirtyChange` itself receives the replayed token), reusing logic already being written. The reset commit is the single deliberate exception: it *is* a fleet-wide notable event, the coordinated cutover, intentionally visible. @@ -613,7 +615,30 @@ For the choke-point to be *structural* and not merely conventional, the **lock f - **Smaller blast radius.** `ResolutionInputHash` does **not** feed `synthistory`, so an algorithm change can never mint a phantom changelog/release (that hazard is fingerprint-only). Worst case is a one-line `resolution-input-hash` rewrite per lock plus a wasted re-resolution that usually yields the same commit. Churn, not corruption. - **No pending change.** It is a flat seven-field SHA256, not a struct walk, so the projection substrate leaves it untouched. Its registry slot stays `computeRes1` until its inputs genuinely change. -**Decision (KISS/YAGNI):** wire fingerprint replay in Part 2's first PR. 
`ResolutionInputHash` stays a **bare `sha256:` digest with no `v:` prefix** (the prefix lives only in `InputFingerprint` - see [Both hashes share one version](#both-hashes-share-one-version)), so the resolver compares it directly and a fingerprint-only bump never touches it. The day `ComputeResolutionHash` first changes, add `computeRes2` and extend replay to its comparison sites (the resolution-staleness branch of `computeFreshnessStatus` + the `resHashChanged` silent-write guard in `updateResolutionHash`). The shared-prefix choice is **already settled** - resolution reads `InputFingerprint`'s prefix; it does not grow its own - so the only deferred work is *wiring* that replay. Deferral is safe because resolution carries no prefix and is compared bare today, so the shared-prefix desync is **dormant, not eliminated**, waking only when resolution gains a second algorithm. The seam: the prefix advances only on `result.Changed`, while a resolution-only write takes the independent `resHashChanged` path ([`update.go`](../../../internal/app/azldev/cmds/component/update.go)). So once `computeRes2` exists, a resolution-only write would advance the bare digest while the shared prefix stays at `v1`, and replay would select `computeRes1` → permanent false-stale. This is **safe-direction (G1 churn, never a missed rebuild) and dormant** (resolution replay is reserved; one algorithm today), so it gates no PR now. To stop it shipping silently the day it matters, the registry carries a **forced-conversation tripwire** - it surfaces the desync for review, it does not by itself verify the fix: give `lockAlgo` a `resolutionVersion int` field and have the registry `init()` panic (**failing at startup/test-time, caught by any package test in CI - not by `go build`**) if any entry's `resolutionVersion` exceeds the floor's without a `resolutionPrefixHandled` constant set, reducing the predicate to a `len`/const check. 
Remediation - resolution taking its own prefix, or a resolution-only write co-stamping `InputFingerprint`'s prefix to current (same digest) - is a property of `updateResolutionHash`, a *different* package the registry loop cannot see, so it is verified by a **named CI test mirroring PR C's dirty-change gate**, not by the tripwire. The decision is *forced* at `computeRes2`, not forgotten. +**Decision (KISS/YAGNI):** wire fingerprint replay in Part 2's first PR. `ResolutionInputHash` stays a **bare `sha256:` digest with no `v:` prefix** (the prefix lives only in `InputFingerprint` - see [Both hashes share one version](#both-hashes-share-one-version)), so the resolver compares it directly and a fingerprint-only bump never touches it. The day `ComputeResolutionHash` first changes, add `computeRes2` and extend replay to its comparison sites (the resolution-staleness branch of `computeFreshnessStatus` + the `resHashChanged` silent-write guard in `updateResolutionHash`). The shared-prefix choice is **already settled** - resolution reads `InputFingerprint`'s prefix; it does not grow its own - so the only deferred work is *wiring* that replay. Deferral is safe because resolution carries no prefix and is compared bare today, so the shared-prefix desync is **dormant, not eliminated**, waking only when resolution gains a second algorithm. The seam: the prefix advances only on `result.Changed`, while a resolution-only write takes the independent `resHashChanged` path ([`update.go`](../../../internal/app/azldev/cmds/component/update.go)). So once `computeRes2` exists, a resolution-only write would advance the bare digest while the shared prefix stays at `v1`, and replay would select `computeRes1` → permanent false-stale. This is **safe-direction (G1 churn, never a missed rebuild) and dormant** (resolution replay is reserved; one algorithm today), so it gates no PR now. 
To stop it shipping silently the day it matters, the registry carries a **forced-conversation tripwire** - a *behavioural* CI guard (an `init()` panic), not a structural one: it surfaces the desync for review but does not by itself verify the fix. Give `lockAlgo` a `resolutionVersion int` field and have `init()` panic when `max(entry.resolutionVersion) > minSupportedLockContentVersion` while a `resolutionPrefixHandled` constant is unset. It is a **max-loop, not a `len` count** - a fingerprint-only bump grows `len(lockAlgos)` to 2 with every `resolutionVersion` still `1`, so a length check would panic on correct code. The panic fires at **startup/test-time (caught by any package test in CI), not at `go build`**. The actual remediation is a resolution-only write co-stamping `InputFingerprint`'s prefix to current (same digest); [the shared-prefix choice is already settled](#both-hashes-share-one-version), so resolution does **not** grow its own prefix. That co-stamp lives in `updateResolutionHash`, a *different* package the registry loop cannot see, so flipping `resolutionPrefixHandled` only silences the panic - the fix itself is verified by a **named CI test** asserting that a resolution-only write (`resHashChanged && !Changed`) leaves `InputFingerprint`'s prefix `== currentLockContentVersion`, mirroring PR C's dirty-change gate. The decision is *forced* at `computeRes2`, not forgotten.
+
+```go
+// Added only when computeRes2 is first registered (the [minSupported,current]
+// gap check from the baseline registry stays; this is the additional guard):
+type lockAlgo struct {
+	fingerprint       computeFn
+	resolution        resolveFn
+	resolutionVersion int // bumped only when `resolution` itself changes
+}
+
+const resolutionPrefixHandled = false // flip true once updateResolutionHash co-stamps the prefix
+
+func init() {
+	// ... existing [minSupported,current] gap check ...
+	maxRes := 0
+	for _, a := range lockAlgos {
+		maxRes = max(maxRes, a.resolutionVersion)
+	}
+	if maxRes > minSupportedLockContentVersion && !resolutionPrefixHandled {
+		panic("resolution algorithm advanced past floor without prefix co-stamp; see ResolutionInputHash")
+	}
+}
+```

 ## Design decisions

@@ -645,7 +670,7 @@ The `go generate` *infrastructure* already exists (`stringer`/`mockgen` via `mag

 The stored hash is a single `v:sha256:` token, not separate version and digest fields. One field, written atomically, so the version and the digest can never desync (the class of bug a split-field design invites when one is written and the other is not).

-The lock **format** `Version` stays at `1`. Bumping it to `2` as a poison pill - to stop old binaries touching reset locks - is too blunt: it also stops them reading pins to *queue a build*. Instead, back-compat rests on two cheaper properties: the format is unchanged so every binary parses every lock, and the content-version registry **force-rehashes** any sub-floor token (legacy, or downgraded by an old binary) up to the current version. Old binaries stay useful (read pins, build); their only possible mischief - writing a legacy-substrate hash - is self-correcting on the next new-binary run, not silent corruption. Back-compat is therefore: **same format forever, reconcile fingerprints by version, never recompute history.**
+The lock **format** `Version` stays at `1`. Bumping it to `2` as a poison pill - to stop old binaries touching reset locks - is too blunt: it also stops them reading pins to *queue a build*. Instead, back-compat rests on two cheaper properties: the format is unchanged so every binary parses every lock, and the content-version registry **force-rehashes** any sub-floor token (legacy, or downgraded by an old binary) up to the current version.
Old binaries stay useful (read pins, build); their only possible misbehaviour - writing a legacy-substrate hash - self-corrects on the next new-binary run **in the working tree**. A *committed* downgrade is the one residual it does **not** fix (it would read as a `v1 → legacy → v1` release pair): that case is prevented by the pinned lock-writing CI path, not by force-rehash (see [mixed-toolchain hazard](#registry-floor-and-forced-migration)). Back-compat is therefore: **same format forever, reconcile fingerprints by version, never recompute history.** ### D4: Project to bytes, not a `ConfigHash()` method on the type @@ -660,15 +685,15 @@ The lock **format** `Version` stays at `1`. Bumping it to `2` as a poison pill - - **Eager fleet-wide migration as the steady-state mechanism** - rewriting every lock on every algorithm change is the mass-churn the design exists to prevent. Rejected for the steady state. The *reset* is a deliberate, one-time, operator-driven eager pass riding an already-scheduled rebuild - the sanctioned exception, not the rule; `component migrate` is its post-reset equivalent for retiring an old version. - **Runtime reflective walker for field selection (instead of generated functions).** One generic `project(cfg, N)` reflects the struct at hash time and emits the fields whose version-set includes N. Least code, and it shares the tag syntax with the chosen approach. Rejected: it reflects the *live* struct at hash time - Problem 6 one layer down - so its frozen-ness rests entirely on golden-vector coverage (test discipline), and field removal degrades from a compile error to a CI failure. Codegen keeps the same tags but moves the reflection to *generate* time and freezes the output as checked-in code, recovering the compile guarantee. - **Hand-written per-version `projectVN` functions (instead of generating them from tags).** Each version gets a bespoke function with one explicit `emit`/`emitAlways` line per measured field. 
Same compile guarantees as codegen (removal won't compile, literal emit-key), but: membership is smeared across N function bodies; "bring a field back a few versions later" has no first-class expression (you re-add an `emit` line, nothing ties it to the field's earlier life); and the mandatory-decision and coverage properties need separate bookkeeping the tags otherwise carry. Codegen is the same runtime with declarative authoring - strictly preferable given the existing `go generate` infrastructure. -- **Per-field hash manifest in the lock (instead of one opaque token).** Store `{field → hash}` (à la `go.sum`) rather than a single `v:sha256:…` digest. *Genuine wins:* dropping a field becomes ignoring its manifest line - no projection kept alive for replay, so the **deprecate-then-delete two-step and the registry-retirement deadlock** (the append-only growth above) both vanish; and the stored-vs-stored historical comparators become structural set-diffs rather than version-blind string compares. *Why the opaque token still wins for azldev:* (1) the projection substrate **already** delivers additive immunity (G4) - the manifest's headline draw - so that advantage is moot, not additive; (2) the manifest does **not** kill the false-fresh hazard - an old lock has *no line* for a newly-measured input, so there is still no baseline to detect a change to it (the blind spot is relocated, not removed); (3) it makes *algorithm evolution* - the entire point of Part 2 - **harder**, needing per-field versioning where the token needs one integer for the whole algorithm; and (4) it bloats every lock to O(fields × components) (the well-known `go.sum` size cost). The manifest is the better tool for a *static* input set that mainly grows and shrinks; the opaque token + single version is the better tool for an *evolving hashing algorithm*, which is azldev's actual problem. 
There is no *incremental* middle path between them: sha256 is non-homomorphic, so per-field digests cannot be combined into the whole-config digest without re-hashing - the storage choice is genuinely binary (whole-config projection vs full per-field manifest), not a spectrum. The reset bakes the storage model in - token-vs-manifest is irreversible after PR B - and the retirement deadlock the manifest would have dissolved is instead answered by the floor-advance cadence above. +- **Per-field hash manifest in the lock (instead of one opaque token).** Store `{field → hash}` (à la `go.sum`) rather than a single `v:sha256:…` digest. *Genuine wins:* dropping a field becomes ignoring its manifest line - no projection kept alive for replay, so the **deprecate-then-delete two-step and the registry-retirement deadlock** (the append-only growth above) both vanish; and the stored-vs-stored historical comparators become structural set-diffs rather than version-blind string compares. *Why the opaque token still wins for azldev:* (1) the projection substrate **already** delivers additive immunity (G4) - the manifest's headline draw - so that advantage is moot, not additive; (2) the manifest does **not** kill the false-fresh hazard - an old lock has *no line* for a newly-measured input, so there is still no baseline to detect a change to it (the blind spot is relocated, not removed); (3) it makes *algorithm evolution* - the entire point of Part 2 - **harder**, needing per-field versioning where the token needs one integer for the whole algorithm; and (4) it bloats every lock to O(fields × components) (the well-known `go.sum` size cost). The manifest is the better tool for a *static* input set that mainly grows and shrinks; the opaque token + single version is the better tool for an *evolving hashing algorithm*, which is azldev's actual problem. 
(sha256 is non-homomorphic, so per-field digests cannot be *cheaply* recombined into the whole-config digest without re-hashing; a Merkle-style hybrid is possible but buys nothing here - it still needs per-field versioning, still relocates rather than removes the false-fresh blind spot, and still bloats the lock. The rejection rests on that cost, not on the absence of a middle option.) The reset bakes the storage model in - token-vs-manifest is irreversible after PR B - and the retirement deadlock the manifest would have dissolved is instead answered by the floor-advance cadence above. ## Incremental delivery The reset (Part 1) must land as one coherent change at the dev→prod cutover; its pieces are independently reviewable but ship together because they all move the hash. 1. **PR A (substrate)**: the **projection generator** (`go generate`) - reads the version-set tags and emits the per-version `projectVN(cfg) []byte` functions (literal emits, sorted keys) plus golden-vector and coverage scaffolding - the canonical encoder (`canonicalBuf`, `emit`/`emitAlways`), the version-set tag parser, the frozen **TOML-key** emit rule, the **split omit-predicate** (scalar leaves `IsZero`, composites projected emptiness), the `sha256` combiner, and the golden vectors. Generate-time guards: a fingerprinted field with **no tag** fails generation; the slimmed **exclusion ledger** and **dropped-fields ledger** replace the retired `TestAllFingerprintedFieldsHaveDecision` audit; **regeneration-idempotence** (CI `go generate` + `git diff --exit-code`) pins superseded versions. Additive alongside the existing fingerprint path (it does retire the `TestAllFingerprintedFieldsHaveDecision` audit, replaced by the two ledgers above); not yet wired into `ComputeIdentity`. 
Tests: a field tagged `v2..*` is absent from generated `projectV1`; a `!` range emits at zero; a field with **no** `fingerprint` tag fails generation; a **nested** fingerprinted struct with a tagless field fails generation; deleting a field a retained `projectVN` names **fails to compile**; a **Go-field rename keeping the TOML key** yields a byte-identical digest; two fields colliding on one emit-key fail generation; a `!`-tagged nested struct whose every child is `-` (so its mandatory `!`-zero discrimination vector is unsatisfiable) is rejected as degenerate; the coverage oracle (by struct-reflection, not the tag) fails when a build-effective field is tagged too narrowly (`v1..v1` at current `v2`) and is not in the dropped-fields ledger; golden vectors pin v1; a non-contiguous set (`v1..v1,v3..*`) round-trips through the parser; a CI assertion that `PackageConfig`'s field set stays within its known publish-only set (a new field fails CI, forcing re-evaluation of the parent `-` exclusion - the prune-at-`-` tripwire). -2. **PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. Lock format `Version` stays `1`, asserted by a named-constant test (`currentVersion == 1`) with a comment that the *content* version lives in the token prefix, not here - so a future format bump cannot silently break every historical read through `lockfile.Parse`. Ships at the cutover; absorbed by the scheduled rebuild. The `hashstructure` import and its `go.mod` entry are removed here, since no caller survives the switch. Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. -3. 
**PR C (Part 2 machinery)**: the **two-type token split** - `StoredToken` (parsed by the sole strict `ParseToken`: accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match; exposes `SameDigest` only) and `FreshToken` (from `ComputeIdentityAt`, exposes `Reconcile(stored) → {Fresh | Stale | RestampTo(v)}`, fails closed on its zero value), both **non-comparable** (`_ [0]func()`); the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **every** comparison and compute site through these types. The **current-tree** sites (via `FreshToken.Reconcile`): replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, and `BuildDirtyChange`'s caller (which holds the inputs; `BuildDirtyChange` itself receives the replayed token); plus the `computeCurrentFingerprint` (`sourceprep.go`) return-type cascade `string → FreshToken`. `bumpComponents` (`update.go`) is the *second* `ComputeIdentity` caller but a **forced-change writer** (always `Changed`, no stored-token comparison), so it takes `FreshToken.Token()` to write the new token unconditionally - **not** `Reconcile`, whose `Stale`-on-zero would let a zero/malformed token slip silently into the lock. The **historical** sites (via `StoredToken.SameDigest`): `FindFingerprintChanges`, `changed.go`'s `classifyComponent`, **and `haveMatchingFingerprints`**. **`haveMatchingFingerprints` is security-load-bearing:** it gates the cache-poisoning integrity check (`if result.SourcesChange && haveMatchingFingerprints(...)` in `changed.go`). If only `classifyComponent` is converted and this site is missed, the first legitimate `v2` bump makes a version-only re-stamp compare unequal → the integrity violation is **never recorded → tamper evidence silently swallowed**. It must convert to digest-compare in the same PR. Resolution replay reserved (slot reuses `computeRes1`). 
**Ordering gate (CI-enforced):** `currentLockContentVersion > 1` is forbidden unless `BuildDirtyChange` already routes through `Reconcile` - otherwise registering `v2` makes every component read persistently dirty on every `render`/`build`. The gate is necessary but not sufficient (it does not prove `haveMatchingFingerprints` converted), so it is paired with a **named acceptance test**: `from="v1:sha256:X"`, `to="v2:sha256:X"` ⇒ `haveMatchingFingerprints` returns **true** - a missed conversion fails CI rather than silently disabling the integrity check. **Not fully inert:** this PR switches the live compares from raw-string to token-routed *on merge* - only the *registry dispatch* is dormant while just `v1` exists. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event and does **not** suppress `haveMatchingFingerprints`; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a token outside the `fingerprint` package fails to compile; a zero-value `FreshToken`/`StoredToken` fails closed; a historical site cannot construct a `FreshToken`; the registry `init()` panics on a `[minSupported,current]` gap; a named `classifyComponent({name:"v1:sha256:X"}, {name:"v2:sha256:X"}) == Unchanged` (the third raw historical compare, with no CI gate of its own); a `BuildDirtyChange(v2-token, headLock-v1-same-digest) == nil` (a `RestampTo` must not mint a dirty synthetic commit; the existing "not a changelog event" test exercises `FindFingerprintChanges`, not this path); a **discriminating** `BuildDirtyChange` test where the current-version digest *moved* but the caller's replay at `headLock`'s version *matches* ⇒ `nil` (proves the replay actually runs - the same-digest case alone would pass a prefix-stripping digest-compare 
that never replays); `bumpComponents` writes via `FreshToken.Token()`, and a statically-zero token surfaces an error at the write site rather than a silent `Stale`; a malformed token round-trips its **original raw bytes** through `MarshalText`, so a malformed lock is never rewritten on save (no spurious `FindFingerprintChanges` event). +2. **PR B (reset cutover)**: switch `ComputeIdentity` to `projectV1`; adopt the atomic `v1:sha256:` token; unify on sha256. Lock format `Version` stays `1`, asserted by a named-constant test (`currentVersion == 1`) with a comment that the *content* version lives in the token prefix, not here - so a future format bump cannot silently break every historical read through `lockfile.Parse`. Ships at the cutover; absorbed by the scheduled rebuild. The `hashstructure` import and its `go.mod` entry are removed here, since no caller survives the switch. The reconciliation PR B needs is **minimal** - parse the token, treat a prefix-less or below-`v1` token as stale, force-rehash to `v1` against a hardcoded floor of `1`; the general versioned registry that *replays* higher versions is PR C (Part 2). Unit tests: a legacy prefix-less token is read as sub-floor and force-rehashed to `v1`; a `v1:` token round-trips; an old binary (format `1`) still parses pins from a reset lock. +3. **PR C (Part 2 machinery)**: the **two-type token split** - `StoredToken` (parsed by the sole strict `ParseToken`: accepts only `sha256:` and `v:sha256:`, malformed → *changed*, never an empty-digest false match; exposes `SameDigest` only) and `FreshToken` (from `ComputeIdentityAt`, exposes `Reconcile(stored) → {Fresh | Stale | RestampTo(v)}`, fails closed on its zero value), both **non-comparable** (`_ [0]func()`); the version registry (`lockAlgos`, `currentLockContentVersion`, `minSupportedLockContentVersion`); `ComputeIdentityAt`; and routing **every** comparison and compute site through these types. 
The **current-tree** sites (via `FreshToken.Reconcile`): replay-before-`Changed` in `update.go`, `checkFingerprintFreshness`, and `BuildDirtyChange`'s caller (which holds the inputs; `BuildDirtyChange` itself receives the replayed token); plus the `computeCurrentFingerprint` (`sourceprep.go`) return-type cascade `string → FreshToken`. `bumpComponents` (`update.go`) is the *second* `ComputeIdentity` caller but a **forced-change writer** (always `Changed`, no stored-token comparison), so it takes `FreshToken.Token()` to write the new token unconditionally - **not** `Reconcile`, whose `Stale`-on-zero would let a zero/malformed token slip silently into the lock. The **historical** sites (via `StoredToken.SameDigest`): `FindFingerprintChanges`, `changed.go`'s `classifyComponent`, **and `haveMatchingFingerprints`**. **`haveMatchingFingerprints` is security-load-bearing:** it gates the cache-poisoning integrity check (`if result.SourcesChange && haveMatchingFingerprints(...)` in `changed.go`). If only `classifyComponent` is converted and this site is missed, the first legitimate `v2` bump makes a version-only re-stamp compare unequal → the integrity violation is **never recorded → tamper evidence silently swallowed**. It must convert to digest-compare in the same PR. Resolution replay reserved (slot reuses `computeRes1`). **Ordering gate (CI-enforced):** `currentLockContentVersion > 1` is forbidden unless `BuildDirtyChange`'s caller already replays at the `headLock` version (computing the current tree at the stored token's version before the digest-compare) - otherwise registering `v2` makes every component read persistently dirty on every `render`/`build`. 
This gate is **structural for `BuildDirtyChange` only**; the two *historical* sites it cannot prove are each covered by a **release-blocking named test** (behavioural, not structural): `haveMatchingFingerprints` (`from="v1:sha256:X"`, `to="v2:sha256:X"` ⇒ returns **true**, or the cache-poisoning check silently disables) and `FindFingerprintChanges` (a version-only re-stamp across two commits emits **no** synthetic release, or a phantom `%autorelease` ships). A missed conversion fails CI rather than silently shipping the regression. **Not fully inert:** this PR switches the live compares from raw-string to token-routed *on merge* - only the *registry dispatch* is dormant while just `v1` exists. Unit tests: a synthetic `v1`/`v2` pair with unchanged inputs → `Current` and **not** `Changed`; changed inputs → `Stale`; re-stamp only on an already-dirty write; a digest-identical `v1`→`v2` re-stamp is **not** a changelog event and does **not** suppress `haveMatchingFingerprints`; the reset boundary `sha256:X`→`v1:sha256:Y` fires exactly once; a malformed token is treated as changed, never silently equal; a raw `==` on a token outside the `fingerprint` package fails to compile; a zero-value `FreshToken`/`StoredToken` fails closed; a historical site cannot construct a `FreshToken`; the registry `init()` panics on a `[minSupported,current]` gap; a named `classifyComponent({name:"v1:sha256:X"}, {name:"v2:sha256:X"}) == Unchanged` (the third raw historical compare, with no CI gate of its own); a `BuildDirtyChange(v2-token, headLock-v1-same-digest) == nil` (a `RestampTo` must not mint a dirty synthetic commit; the existing "not a changelog event" test exercises `FindFingerprintChanges`, not this path); a **discriminating** `BuildDirtyChange` test where the current-version digest *moved* but the caller's replay at `headLock`'s version *matches* ⇒ `nil` (proves the replay actually runs - the same-digest case alone would pass a prefix-stripping digest-compare that never replays); 
`bumpComponents` writes via `FreshToken.Token()`, and a statically-zero token surfaces an error at the write site rather than a silent `Stale`; a malformed token round-trips its **original raw bytes** through `MarshalText`, so a malformed lock is never rewritten on save (no spurious `FindFingerprintChanges` event).
 4. **PR D (validation)**: scenario test (in the style of `scenario/component_changed_test.go`) - add a field absent from `projectV1` and set it on one component; assert only that lock drifts and every other lock is byte-identical.
 5. **PR E (config schema axis, later)**: `schema-version` field + load-time canonical migration + the `config migrate` command. Gated on the first post-reset non-additive TOML change not already absorbed by the reset's normalization pass.
 6. **PR F (forced lock migration, gated on the first floor raise)**: the `component migrate` command (the only sanctioned floor-raise; the prescribed fix for a build-critical newly-measured input) and the CI spread-ceiling on `currentLockContentVersion - minSupportedLockContentVersion`. **Gating:** a `v2` bump *without* PR F is safe - v1 stays in the registry and the floor stays at 1, so unmigrated locks still replay. PR F is required only before **raising `minSupportedLockContentVersion` above 1** (retiring v1), since that is what makes un-migrated locks unreplayable. A CI gate forbids raising the floor unless `component migrate` exists. So PR F is decoupled from the first `v2` and gated on the first floor raise **or** the first content-version bump whose decision rule demands immediate fleet-wide adoption (a build-critical newly-measured input, which cannot wait for lazy migration).
@@ -690,8 +715,8 @@ Indexed here for quick reference; each is argued in full at the linked section.
 | Field selection is **codegen** from mandatory per-field version-set tags (absent ⇒ generation fails); `go generate` emits the per-version `projectVN` | [§Version-tagged field selection](#version-tagged-field-selection) |
 | Emit-key = frozen TOML key (`key=` override; duplicate keys fail generation); omit-predicate splits - scalar leaves `IsZero`, composites *projected* emptiness | [§Version-tagged field selection](#version-tagged-field-selection) |
 | Tag DSL frozen at three range-operators (`..` `!` `*`) plus the orthogonal `key=` | [§Version-tagged field selection](#version-tagged-field-selection) |
-| Canonical byte encoding = existing length-prefixed `:=:`; maps sorted-key; per-type value slots - pinned irreversibly at the reset | [§Baseline v1 (encoding table)](#baseline-v1-omit-if-zero-no-include-always-legacy) |
-| Frozen-ness = compiler + generator + regeneration-idempotence; golden-vector coverage (tag-independent dropped-fields oracle) is the backstop; exclusion ledger kept for `-` fields | [§Golden-vector coverage](#golden-vector-coverage-the-backstop) |
+| Canonical byte encoding = existing length-prefixed `:=:`; maps sorted-key; per-type value slots - pinned irreversibly at the reset | [§Baseline v1: encoding table](#baseline-v1-omit-if-zero-no-include-always-legacy) |
+| Frozen-ness = compiler + generator + regeneration-idempotence; golden-vector coverage (tag-independent dropped-fields oracle) is the backstop; exclusion ledger kept for `-` fields | [§Golden-vector coverage: the backstop](#golden-vector-coverage-the-backstop) |
 | Stored hash = atomic `v:sha256:` token; lock format `Version` stays `1`; sub-floor/downgraded tokens reconciled by force-rehash | [§The lock changes at the reset](#the-lock-changes-at-the-reset-atomic-token--forced-upgrade) |
 | Stored hash read only through the two-type token split (`StoredToken`/`FreshToken`, non-comparable), adopted in PR C | [§Downstream consumers](#downstream-fingerprint-consumers-blast-radius) |
 | Version write-guard required (refuse to write above the binary's `currentLockContentVersion`); CI version-pin blocks old-binary commits | [§Registry floor](#registry-floor-and-forced-migration) |