From 024435f56c43ac92d9024f42263151e568275c8d Mon Sep 17 00:00:00 2001
From: "Marcus R. Brown" <git@mrbro.dev>
Date: Tue, 23 Jun 2026 14:37:33 -0700
Subject: [PATCH 1/7] docs: brainstorm and plan the failed-then-fixed capture
 trigger

Add the second capture trigger: PRs whose CI failed then passed before merge feed
the existing propose-only pipeline. Transition detection is one GraphQL query per
PR; the fixing diff is the primary signal with best-effort failing logs; a new
secret/redaction scan layers on the private-name gate because logs and diffs leak
more than repo names; a discriminated candidate union keeps dedup, cap, and
privacy shared across both triggers.
---
 ...-failed-then-fixed-capture-requirements.md | 147 +++++++
 ...-feat-c2-failed-then-fixed-capture-plan.md | 361 ++++++++++++++++++
 2 files changed, 508 insertions(+)
 create mode 100644 docs/brainstorms/2026-06-22-c2-failed-then-fixed-capture-requirements.md
 create mode 100644 docs/plans/2026-06-22-004-feat-c2-failed-then-fixed-capture-plan.md

diff --git a/docs/brainstorms/2026-06-22-c2-failed-then-fixed-capture-requirements.md b/docs/brainstorms/2026-06-22-c2-failed-then-fixed-capture-requirements.md
new file mode 100644
index 000000000..d16eb030f
--- /dev/null
+++ b/docs/brainstorms/2026-06-22-c2-failed-then-fixed-capture-requirements.md
@@ -0,0 +1,147 @@
+---
+title: 'C2 — Failed-then-fixed capture trigger'
+date: 2026-06-22
+status: ready
+scope: standard
+kind: requirements
+parent: docs/brainstorms/2026-06-22-skill-saving-grow-and-learn-requirements.md
+---
+
+# C2 — Failed-then-fixed capture trigger
+
+## Purpose
+
+Add the second capture trigger to the grow-and-learn pipeline. Today the capture run learns
+only from PRs that went through multiple substantive review rounds. This adds PRs whose CI
+**failed then passed before merge** — a concentrated mistake→correction signal — feeding the
+same propose-only pipeline. The parent doc deferred this trigger to "Phase 2.5, once
+multi-round-review precision is proven"; that precondition is met (the trigger shipped,
+validated live, and review-prose enrichment raised proposal quality).
+
+## The vision in one sentence
+
+When a pull request breaks CI and a fix turns it green before merge, Fro Bot captures the
+"this failure → this fix" lesson as a proposed learning, the same way it already learns from
+review-heavy PRs.
+
+## Problem frame
+
+A CI failure that gets fixed is a clean, structured mistake→correction event: a named check
+went red, a diff turned it green. That is exactly the kind of concrete signal the
+review-prose enrichment proved produces good learnings — and exactly what the multi-round
+trigger's title-only first cut lacked. The data is queryable: a PR's check-run history shows
+the fail→pass transition, the failing run's logs carry the actual error, and the fixing
+commits carry the diff. C3 (issue triage) is deferred because its resolution rationale lives
+in free-text comments with no clean structured signal — the weak-signal trap the enrichment
+follow-up (#3552) flagged.
+
+## Accepted risk — value gate
+
+C1's signal *precision* is proven (it finds real review-heavy PRs and authors specific,
+useful learnings, and Phase 1 retrieval live-fired — Fro Bot cited a prior learning on a real
+issue). What is **not** yet measured is corpus *adoption* over time: do the authored docs keep
+getting retrieved and applied across many future runs? C2 broadens the "grow" half before that
+slow signal is in. This is an accepted risk: C2's failed-then-fixed is an independently clean,
+high-signal source (not speculative), and the cost of a second proven-clean trigger is low.
+If corpus adoption later proves weak, the lever to revisit is the retrieval/authoring side, not
+the number of capture triggers.
+
+## Requirements
+
+- R1. The capture run identifies merged PRs (in the existing lookback window, this repo only)
+  whose required checks transitioned **failed → passed** before merge — a fail→fix event.
+- R2. For each such PR, the digest carries: the failed check name(s), a **failing-step log
+  excerpt** (ranked toward the error lines, budgeted per candidate), and the **fixing diff**
+  (the change that turned the check green). This is the agent's primary signal — it distills
+  "this failure → this fix."
+- R3. The new digest content (logs and diffs) passes the existing fail-closed upstream
+  private-repo-name scan **plus a stronger log/diff-specific scan**, applied before the content
+  reaches the agent. CI logs and diffs are a wider leak surface than review prose: they can
+  carry secrets/tokens (masked or not), internal hostnames/URLs, environment values, and file
+  paths that the private-repo-name token scan does not catch — and the agent authors a
+  **public** issue from this content. The log/diff scan must detect and redact (or drop on)
+  secret-shaped strings, internal URLs/hostnames, and credential-shaped content, not only
+  tracked private-repo-name tokens. Never unscanned log/diff content reaches the agent.
+- R3a. On a privacy/secret hit in log/diff content, the safe default is to **drop the enriched
+  log/diff for that candidate**; the candidate may proceed only on already-clean signal
+  (e.g. the failed check name, which is not sensitive). The fallback signal must itself be
+  proven non-sensitive — a candidate must never reach the public-issue step carrying
+  unsanitized log/diff content.
+- R4. A PR is proposed **at most once** regardless of how many triggers it matches — a PR that
+  was both review-heavy and failed-then-fixed yields one learning-proposal. The existing
+  merge-SHA body marker already dedups by merge SHA; C2 candidates dedup against the same set.
+- R5. C2 reuses the existing pipeline end to end: the opaque digest contract, the shared
+  privacy module, the merge-SHA marker, the `learning-proposal` label, the per-run cap, and
+  the capture workflow. It is a **new candidate source**, not a new pipeline.
+- R6. Telemetry stays counts-only (no owner/name/path in logs); the harvest summary gains a
+  per-trigger candidate count so an operator can see C2's contribution.
+
+## Scope boundaries
+
+- C2 (failed-then-fixed) only. This repo only, same lookback as the existing harvest.
+- "Failed then passed" is a transition on the PR's checks before merge — not a flaky single
+  re-run with no diff. A fix with a real diff is required (a pure re-run that flips green with
+  no code change is not a learning).
+- No change to the propose-only model, the privacy contract, the dedup marker, or the cap.
+
+### Deferred
+
+- **C3 — Issue triage.** Murky free-text signal; revisit after C2 proves failed-then-fixed
+  learning quality, and only with a structured proxy (e.g. issues closed by a linked PR).
+- **C4 — Cross-run synthesis.** Phase 3.
+- Cross-repo C2 (other fro-bot repos): later; v1 is this repo only.
+
+## Open questions (for planning)
+
+- **Q1 — Transition detection mechanism. THE LOAD-BEARING RISK.** Detecting a fail→pass
+  transition is not a single cheap query: GitHub's check-runs / commit-status APIs report state
+  **per head SHA**, not a PR-level history, so a transition must be reconstructed by correlating
+  check state across a PR's successive head SHAs (an O(merged-PRs × pushes) fan-out). Planning
+  must settle: which API (check-runs per SHA vs. commit-status timeline), how to bound the cost,
+  how "required checks" are determined at merge time (branch-protection required contexts are
+  current-state config, not necessarily historical), and how to require a real fixing diff (not
+  a bare green re-run). If this proves too expensive or unreliable, C2's signal needs rethinking
+  before build — this question gates the trigger.
+- **Q1a — Log retention.** Actions logs are retention-bounded (default ~90 days, sometimes
+  less), so a PR inside the lookback window may have purged failing logs. Planning must define
+  the degrade/skip rule when logs are unavailable (e.g. proceed on diff + check name only, or
+  skip the candidate) — the log excerpt cannot be assumed always fetchable.
+- **Q2 — Failing-log fetch + ranking.** Which failing run/job/step to pull, how to rank the
+  excerpt toward the error lines (mirror the correction-first ranking from enrichment), and the
+  per-candidate budget. Logs/diffs are much larger than review prose — truncate to budget
+  **before** the privacy/secret scan so the scan and the agent never see the full volume.
+- **Q3 — Fixing-diff scope.** The diff "that fixed it" — the commits between the last failing
+  SHA and the first passing SHA — vs. the whole PR diff. The narrower fixing-commit range is
+  the sharper signal but needs the transition SHAs from Q1 (downstream of the hard part).
+- **Q4 — Evidence model.** The existing `Candidate` shape is review-specific (`reviewRounds`,
+  `reviewExcerpts`). Planning must decide whether C2 widens it to a shared evidence model or
+  overloads those fields — "new source, not new pipeline" still requires a candidate shape that
+  isn't review-only.
+
+(Resolved by the requirements, not open: privacy hit → drop enriched content, keep clean
+signal (R3a); per-run cap is the existing shared budget, not per-trigger (R5).)
+
+## Success criteria
+
+- **SC1** — A capture run identifies a real failed-then-fixed PR in the window and emits a
+  candidate carrying the failed check name, a log excerpt, and the fixing diff.
+- **SC2** — The new log/diff content passes the upstream scan fail-closed for BOTH a tracked
+  private-repo-name token AND secret-shaped content with no repo-name present: fixtures with a
+  secret/token-shaped string, an internal hostname/URL, and a credential-shaped string in a diff
+  are each dropped/redacted (mutation-proven), not just the private-repo-name case.
+- **SC3** — A PR matching both triggers yields exactly one learning-proposal (dedup holds).
+- **SC4** — A produced proposal distills a specific "failure → fix" lesson grounded in the
+  log/diff, not the PR title (the enrichment-quality bar).
+- **SC5** — Telemetry shows the per-trigger candidate count; logs remain counts-only.
+
+## Sources & references
+
+- Parent: docs/brainstorms/2026-06-22-skill-saving-grow-and-learn-requirements.md (Phase 2.5
+  deferred C2/C3; "add once multi-round-review precision is proven")
+- Plan: docs/plans/2026-06-22-002-feat-capture-c1-proposals-plan.md (the propose-only pipeline)
+- Plan: docs/plans/2026-06-22-003-feat-enrich-capture-digest-plan.md (review-prose enrichment
+  + the upstream privacy-gate pattern C2 reuses)
+- Code: scripts/capture-learnings-harvest.ts (the harvest to extend with a second source),
+  scripts/capture-learnings-privacy.ts (the shared fail-closed gate), scripts/capture-learnings-open.ts,
+  .github/workflows/capture-learnings.yaml
+- Follow-up #3552 (digest-quality finding that motivates evidence-rich C2 content)
diff --git a/docs/plans/2026-06-22-004-feat-c2-failed-then-fixed-capture-plan.md b/docs/plans/2026-06-22-004-feat-c2-failed-then-fixed-capture-plan.md
new file mode 100644
index 000000000..9ca0fee82
--- /dev/null
+++ b/docs/plans/2026-06-22-004-feat-c2-failed-then-fixed-capture-plan.md
@@ -0,0 +1,361 @@
+---
+title: 'feat: C2 failed-then-fixed capture trigger'
+type: feat
+status: active
+date: 2026-06-22
+origin: docs/brainstorms/2026-06-22-c2-failed-then-fixed-capture-requirements.md
+---
+
+# feat: C2 failed-then-fixed capture trigger
+
+## Overview
+
+Add a second candidate source to the learning-capture pipeline: merged PRs whose CI checks
+transitioned **failed → passed** before merge. For each, the digest carries the fixing diff
+(primary signal, always available) plus a best-effort failing-log excerpt, privacy-scanned —
+including a new secret/redaction scan because logs and diffs leak more than repo names. Both
+triggers share the existing propose-only pipeline (opaque digest → agent authors → privacy
+gate → `learning-proposal` issue); a discriminated `Candidate` union keeps dedup, cap, and
+privacy uniform. This broadens the "grow" half of grow-and-learn with one more clean,
+high-signal trigger (see origin).
+
+## Problem Frame
+
+A CI failure that gets fixed before merge is a concentrated mistake→correction event: a named
+check went red, a diff turned it green. That structured signal is exactly what produced good
+learnings from review prose, and what the title-only first cut lacked. Today the capture run
+only learns from review-heavy PRs; C2 adds the failed-then-fixed source. The origin doc gated
+C2 on "multi-round-review precision proven," which is met; the accepted value-gate risk
+(corpus adoption not yet measured) is recorded in the origin.
+
+## Requirements Trace
+
+- R1. Identify merged PRs (same lookback, this repo) whose checks transitioned failed→passed
+  before merge (see origin R1).
+- R2. The digest carries the fixing diff (primary) + best-effort failing-log excerpt + failed
+  check name(s); the agent distills "this failure → this fix" (origin R2).
+- R3. New log/diff content passes the existing private-repo-name scan **plus a new
+  secret/URL/path scan** before reaching the agent (origin R3).
+- R3a. On a secret/private hit: redact in the harvest excerpt where safe, else drop the
+  enriched content; the candidate may proceed only on already-clean signal; the open-step body
+  scan blocks any residual secret (origin R3a).
+- R4. One learning-proposal per merge SHA — a PR matching both triggers yields one proposal
+  (origin R4); the existing merge-SHA marker dedups.
+- R5. Reuse the pipeline end to end: opaque digest, shared privacy module, marker, label, cap,
+  workflow — a new candidate source, not a new pipeline (origin R5).
+- R6. Counts-only telemetry; the harvest summary gains per-trigger candidate counts (origin R6).
+
+## Scope Boundaries
+
+- C2 (failed-then-fixed) only, this repo only, same `LOOKBACK_DAYS = 30`.
+- "Failed then passed" requires a real fixing diff (not a bare green re-run with no change).
+- No change to the propose-only model, dedup marker, or per-run cap (shared budget).
+
+### Deferred to Separate Tasks
+
+- C3 (issue triage) — murky free-text signal; revisit with a structured proxy after C2 proves out.
+- C4 (cross-run synthesis) — Phase 3.
+- Cross-repo C2 — later; v1 is this repo.
+- High-entropy generic-blob secret detection — too noisy for v1 (origin defers; add later with a context scorer).
+
+## Context & Research
+
+### Relevant Code and Patterns
+
+- `scripts/capture-learnings-harvest.ts` — `harvestCandidates` (I/O shell), `buildCandidateDigest`
+  (pure core: filter + cap + privacy-scan + emit), `Candidate` interface (currently
+  review-specific: `{mergeSha, reviewRounds, signals, reviewExcerpts}`), `buildReviewExcerpts`
+  (truncate-then-scan privacy-ordering invariant), `LOOKBACK_DAYS = 30`, `MAX_LEARNINGS_PER_RUN = 5`,
+  `HarvestStageCounts`. The opacity test asserts emitted candidates carry exactly the allowed keys.
+- `scripts/capture-learnings-privacy.ts` — the shared gate: `learningBodyHasPrivateLeak(body, tokens)`
+  (pure, lowercase `includes`), `loadPrivateTokensFromDisk` (I/O, fail-closed throw). Both
+  chokepoints (harvest + open) import this module. The new secret scan lives here.
+- `scripts/wiki-slug.ts` — `buildPrivateTokenSet` (owner/name token forms).
+- `scripts/check-private-leak.ts` — `redactPathTokens(path, tokens)` (case-insensitive regex
+  replace → `[REDACTED]`); the established `[REDACTED]` marker. Copy this shape for redaction.
+- `scripts/private-repo-resolution.ts` (`execFileSync('gh', ['api', 'graphql', ...])`) and
+  `scripts/reconcile-repos.ts` — the established `gh api graphql` pattern (no `@octokit/graphql` dep).
+- `.github/workflows/capture-learnings.yaml` — the harvest step, the trigger-aware agent prompt,
+  the step summary, the metadata overlay (needed for the privacy token set).
+
+### Institutional Learnings
+
+- `docs/solutions/security-issues/verify-whole-public-perimeter-2026-06-22.md` — enumerate every
+  public surface before declaring non-leaking; C2 adds "agent input digest" and "harvest run logs"
+  to that surface list.
+- `docs/solutions/best-practices/pure-core-privacy-gates-shared-module-2026-06-22.md` — the new
+  secret scan must be pure-core in the shared module, with a mutation-proof test that goes red when
+  the gate is removed; two chokepoints import one module so they can't diverge.
+- `docs/solutions/best-practices/privacy-gate-promotion-leak-prevention-2026-06-04.md` — "scan
+  content, not just identifiers"; fail-closed on ambiguous resolution.
+
+## Key Technical Decisions
+
+- **Transition detection: one GraphQL query per candidate PR.** `pullRequest.commits(first:100)
+  .nodes.commit.statusCheckRollup.contexts(first:100).nodes{... CheckRun{name conclusion} ...
+  StatusContext{context state}}` returns per-commit per-check conclusions across the PR's SHA
+  history in a single call (proven pattern). Walk commits oldest→newest: last-failing SHA = latest
+  commit where the target check is FAILURE/TIMED_OUT/etc.; first-passing SHA = first SUCCESS after
+  it. Invoked via `execFileSync('gh', ['api', 'graphql', ...])` with the workflow token — no new dep.
+- **Required-checks: current branch-protection set, strict, with a documented approximation.**
+  One `getBranchProtection` call per run yields `required_status_checks.checks[].context`; match
+  fail→pass only against that set. Caveat (in code): it's current config, not merge-time state. This
+  repo currently enforces none, so effectively any failed→passed check counts — the abstraction
+  generalizes to repos that do enforce.
+- **Fixing diff is the primary signal; logs are best-effort.** `repos.compareCommits(lastFailingSha,
+  firstPassingSha)` gives the fixing change and is always available. The failing-log excerpt
+  (`actions.downloadJobLogsForWorkflowRun` for the failed job) degrades to a `[failure log purged]`
+  placeholder on 404/410 or when unavailable — the candidate never blocks on log availability.
+- **Discriminated `Candidate` union.** `Candidate = ReviewCandidate | CiFixCandidate`, sharing
+  `mergeSha` + `signals`, with a `trigger` discriminant and a per-source `evidence` block. This
+  keeps the opacity guarantee (exact allowed keys per trigger), lets the agent prompt branch, and
+  avoids overloading review-specific fields with CI data.
+- **New secret/redaction scan in the shared privacy module.** Add `logDiffHasSecret(body)` (block)
+  and `redactLogDiffSecrets(body)` (structural redact → `[REDACTED]`). Block on PATs
+  (`gh[pousr]_…`, `github_pat_…`), private-key blocks, credential-bearing connection strings, and
+  cloud/LLM key shapes; redact file paths (`/home`, `/Users`, `~/.ssh`, …), internal hostnames
+  (`*.fro.bot`), and `Bearer`/`Authorization` values. The existing private-repo-name scan still runs.
+- **Per-run cap stays shared.** Both sources compete for `MAX_LEARNINGS_PER_RUN`; no per-trigger budget.
+
+## Open Questions
+
+### Resolved During Planning
+
+- Transition detection mechanism/cost → 1 GraphQL query per PR (resolved; the gating risk).
+- Required-checks determination → current branch-protection set, strict, documented approximation.
+- Log retention → best-effort fetch with `[purged]` degrade; diff is the primary signal.
+- Evidence model → discriminated union with per-source `evidence`.
+- Privacy hit policy → redact-where-safe in harvest, block residual in open (origin R3a).
+- Cap → shared budget.
+
+### Deferred to Implementation
+
+- Exact GraphQL field selection / pagination handling for PRs with >100 commits (cursor walk).
+- The precise secret-pattern regex set and the redact-vs-block threshold per pattern (research
+  gives a defensible v1 list; finalize against real fixtures during implementation).
+- Whether to extract `LOOKBACK_DAYS` to a shared constants module or import from the harvest file.
+- Exact `HarvestStageCounts` field names for the per-trigger + secret-blocked counters.
+
+## High-Level Technical Design
+
+> *This illustrates the intended approach and is directional guidance for review, not
+> implementation specification. The implementing agent should treat it as context, not code.*
+
+```
+harvest (I/O shell)
+ ├─ harvestReviewHeavyCandidates(...)         → Candidate[]  (existing, unchanged)
+ └─ harvestCiFixCandidates(...)               → Candidate[]  (new)
+       per merged PR in window:
+         1 GraphQL query → per-commit per-check conclusions
+         find lastFailingSha / firstPassingSha for a required check
+         compareCommits(lastFailingSha, firstPassingSha)         → fixing diff
+         downloadJobLogsForWorkflowRun(failed job)  [best-effort] → log excerpt | [purged]
+         build CiFixEvidence { trigger:'ci-fail-then-pass', lastFailingSha, firstPassingSha,
+                               failingCheckName, diffExcerpt, logExcerpt? }
+ ↓ concat both arrays
+buildCandidateDigest (pure core)  — generalized over the union
+   dedup by mergeSha · solutions dedup by signals · cap · PRIVACY:
+     private-repo-name scan + NEW secret scan (redact-where-safe / drop-else)
+ ↓ opaque digest (merge SHA + per-trigger evidence only — no owner/repo/number)
+agent authors bodies (prompt branches on trigger)
+ ↓
+open step — body privacy scan (private-name + secret block) → learning-proposal issue
+```
+
+## Implementation Units
+
+- [ ] **Unit 1: Secret/redaction scan in the shared privacy module**
+
+**Goal:** Add log/diff secret detection + structural redaction to the shared gate, callable from
+both chokepoints, without touching the existing private-repo-name scan.
+
+**Requirements:** R3, R3a.
+
+**Dependencies:** None.
+
+**Files:**
+- Modify: `scripts/capture-learnings-privacy.ts`
+- Test: `scripts/capture-learnings-privacy.test.ts`
+
+**Approach:**
+- Add pure `logDiffHasSecret(body): boolean` (block patterns) and `redactLogDiffSecrets(body): string`
+  (redact patterns → `[REDACTED]`, mirroring `redactPathTokens` in `check-private-leak.ts`). Keep
+  them pure; no I/O. Block list and redact list per the KTD. The existing
+  `learningBodyHasPrivateLeak` / `loadPrivateTokensFromDisk` stay unchanged.
+
+**Execution note:** Test-first; the mutation-proof test is the load-bearing coverage.
+
+**Patterns to follow:** `learningBodyHasPrivateLeak` (pure shape), `redactPathTokens`
+(`check-private-leak.ts`) for the redact form, the `[REDACTED]` marker convention.
+
+**Test scenarios:**
+- Happy: a clean diff/log → `logDiffHasSecret` false, `redactLogDiffSecrets` unchanged.
+- Block: bodies containing a `ghp_…` PAT, a `github_pat_…` token, a private-key block, and a
+  credential-bearing connection string each → `logDiffHasSecret` true.
+- Redact: a `/Users/...` path, a `*.fro.bot` hostname, and a `Bearer <token>` each → redacted to
+  `[REDACTED]` with surrounding prose preserved.
+- Edge: a placeholder/short value (`token=x`) → not over-redacted; an empty body → false/unchanged.
+- Mutation proof: with the secret scan disabled/short-circuited, a `ghp_` token reaches the
+  output → the test fails (proves the gate is load-bearing).
+
+**Verification:** gates green; the mutation-proof test bites; no secret value appears in any
+test assertion message.
+
+- [ ] **Unit 2: Discriminated Candidate union + generalized digest core**
+
+**Goal:** Refactor `Candidate` into a `trigger`-discriminated union with per-source `evidence`,
+and generalize `buildCandidateDigest` to iterate the union uniformly (dedup, cap, privacy), with
+no behavior change for the existing review source.
+
+**Requirements:** R4, R5.
+
+**Dependencies:** Unit 1 (the digest privacy step calls the new scan).
+
+**Files:**
+- Modify: `scripts/capture-learnings-harvest.ts`
+- Modify: `scripts/capture-learnings-open.ts` (consume the union's shape if needed)
+- Test: `scripts/capture-learnings-harvest.test.ts`, `scripts/capture-learnings-open.test.ts`
+
+**Approach:**
+- Introduce `ReviewCandidate` / `CiFixCandidate` sharing `{mergeSha, trigger, signals}` with a
+  per-source `evidence`. Keep the review path producing the same emitted shape. Generalize
+  `buildCandidateDigest`: dedup by `mergeSha`, solutions dedup by `signals`, cap, then privacy —
+  private-name scan + the Unit 1 secret scan over the evidence text (redact-where-safe in the
+  harvest excerpt; clear + count when redaction would gut it). Preserve the truncate-then-scan
+  privacy-ordering invariant. Update the opacity test to assert exact allowed keys per trigger.
+
+**Execution note:** Characterization-first — lock the existing review-source emitted shape with a
+test before refactoring, so the union change provably preserves it.
+
+**Patterns to follow:** the existing `buildCandidateDigest` dedup/cap/privacy flow; the opacity
+test; `buildReviewExcerpts` ordering invariant.
+
+**Test scenarios:**
+- Happy: a review candidate still emits the same keys/values post-refactor (characterization).
+- Happy: a CI-fix candidate emits `trigger:'ci-fail-then-pass'` + its evidence keys, no
+  owner/repo/number (opacity).
+- Edge: a PR matching both triggers → exactly one candidate (dedup by merge SHA), with a defined
+  trigger precedence.
+- Privacy: a CI-fix candidate whose diff/log carries a private repo name OR a secret → enriched
+  content redacted/dropped before the digest (mutation-proven), counted in telemetry.
+- Cap: review + CI-fix candidates compete for the shared cap; over-cap selection is deterministic.
+
+**Verification:** gates green; review-source output unchanged; opacity holds for both triggers;
+the privacy mutation-proof bites.
+
+- [ ] **Unit 3: CI fail→pass harvester (GraphQL detector + diff/log evidence)**
+
+**Goal:** Implement `harvestCiFixCandidates` — detect fail→pass PRs via one GraphQL query each,
+derive the fixing diff and best-effort log excerpt, and emit `CiFixCandidate`s.
+
+**Requirements:** R1, R2.
+
+**Dependencies:** Unit 2 (the union type).
+
+**Files:**
+- Modify: `scripts/capture-learnings-harvest.ts`
+- Test: `scripts/capture-learnings-harvest.test.ts`
+
+**Approach:**
+- One `getBranchProtection` per run → required-checks set. For each merged PR in the window
+  (reuse the existing merged-PR fetch), run the GraphQL commits/rollup query via
+  `execFileSync('gh', ['api', 'graphql', ...])`; walk commits oldest→newest to find
+  last-failing/first-passing SHA for a required check (require both to exist = a real transition).
+  `compareCommits` for the fixing diff (truncate to budget, ranked toward changed hunks);
+  best-effort `downloadJobLogsForWorkflowRun` for the failed job's log excerpt (ranked toward
+  error lines, budgeted), degrading to `[failure log purged]`. Build `CiFixEvidence`. Inject the
+  `gh`/octokit callers + `now` for testability; counts-only telemetry; per-trigger stage counts.
+
+**Execution note:** Test-first for the transition-walk logic (the correctness core).
+
+**Patterns to follow:** `execFileSync('gh', ['api','graphql',...])` in `private-repo-resolution.ts`;
+the existing merged-PR fetch + `now` injection; `buildReviewExcerpts` truncate-then-scan ordering.
+
+**Test scenarios:**
+- Happy: a PR with a failing commit then a passing commit on a required check → one CiFixCandidate
+  with the correct last-failing/first-passing SHAs and a diff excerpt.
+- Edge: a PR that never failed → no candidate; a PR that failed and stayed failing → no candidate
+  (no transition); a check not in the required set → ignored.
+- Edge: a bare green re-run with no diff between failing and passing SHA → not a candidate (require
+  a real fixing diff).
+- Error/degrade: logs purged (404/410) → `[failure log purged]` placeholder, candidate still emitted
+  on the diff; a GraphQL/compare error for one PR degrades that PR, does not abort the run.
+- Edge: a PR with >100 commits → cursor pagination walks all commits.
+- Privacy: a diff/log carrying a private name or secret → handled by the Unit 1/2 scan (assert the
+  evidence is scanned before emission).
+
+**Verification:** gates green; the transition walk is mutation-proven (a fixture where removing the
+oldest→newest ordering picks the wrong SHA fails); degrade paths don't abort.
+
+- [ ] **Unit 4: Workflow wiring + trigger-branched agent prompt**
+
+**Goal:** Concatenate both candidate sources into the digest, branch the agent prompt on `trigger`,
+and surface per-trigger + secret-blocked counts in the step summary.
+
+**Requirements:** R2, R6.
+
+**Dependencies:** Unit 3.
+
+**Files:**
+- Modify: `scripts/capture-learnings-harvest.ts` (`main` concatenates both sources)
+- Modify: `.github/workflows/capture-learnings.yaml` (prompt branch + step summary)
+
+**Approach:**
+- `main` calls both harvesters and concatenates before `buildCandidateDigest`. Update the agent
+  prompt: a CI-fix candidate's evidence (failing check + fixing diff + optional log) is the primary
+  signal; the agent distills "this failure → this fix"; the merge-SHA-only reference rule and all
+  hard boundaries stay identical. Add per-trigger candidate counts and the secret-blocked count to
+  the step summary (counts-only).
+
+**Test expectation:** none — workflow wiring + prompt; validated by actionlint and a manual dispatch
+showing both sources in the digest and a CI-fix proposal grounded in the diff.
+
+**Verification:** actionlint clean; a manual `workflow_dispatch` produces a CI-fix learning-proposal
+distilled from the diff (not the title), per-trigger counts visible, no secret/private identifier in
+the run log.
+
+## System-Wide Impact
+
+- **Interaction graph:** harvest gains a second source (1 GraphQL + 1 compare + best-effort log per
+  CI-fix candidate, bounded by the cap) and a new secret-scan call in the digest; the open step gains
+  a secret block; the agent prompt branches. The review source is unchanged.
+- **Error propagation:** per-PR GraphQL/compare/log failures degrade that candidate, never abort the
+  run; the secret/private scan is fail-closed (redact-where-safe, else drop/block).
+- **State lifecycle risks:** none new — no persisted state; the decision log (proposal issues) is
+  untouched; dedup by merge SHA prevents double-proposals across triggers.
+- **API surface parity:** both chokepoints (harvest + open) share the one privacy module, so the
+  private-name and secret scans cannot drift.
+- **Unchanged invariants:** the propose-only model, the merge-SHA marker, the `learning-proposal`
+  label, the per-run cap, opacity (merge-SHA-only references), and the data-branch authority model.
+
+## Risks & Dependencies
+
+| Risk | Mitigation |
+|------|------------|
+| Secret leaks into a public issue via a failing log | New block/redact secret scan upstream + open-step block; mutation-proven; logs are best-effort and diff-primary |
+| Transition detection wrong (picks wrong SHA) | Oldest→newest chronological walk, mutation-proven; require both failing and passing SHA to exist |
+| Required-checks approximation (current config ≠ merge-time) | Documented caveat; weekly 30-day window makes drift rare; repo currently enforces none so any failed→passed counts |
+| Log retention purges the excerpt | `[purged]` degrade; the fixing diff is the primary signal and always available |
+| Refactor breaks the existing review source | Characterization test locks the emitted shape before the union refactor |
+| Secret-regex false positives over-redact | Defensible v1 pattern set; high-entropy generic detection deferred; finalize against real fixtures |
+
+## Documentation / Operational Notes
+
+- After landing, the `logDiffHasSecret` / `redactLogDiffSecrets` block-vs-redact pattern is a strong
+  `docs/solutions/security-issues/` learning candidate (reusable for any pipeline ingesting
+  third-party run output) — author via `ce:compound` if it proves reusable.
+- Close the C2 requirements doc's follow-ups when this ships.
+
+## Sources & References
+
+- **Origin document:** docs/brainstorms/2026-06-22-c2-failed-then-fixed-capture-requirements.md
+- Parent: docs/brainstorms/2026-06-22-skill-saving-grow-and-learn-requirements.md
+- Plans: docs/plans/2026-06-22-002-feat-capture-c1-proposals-plan.md (pipeline),
+  docs/plans/2026-06-22-003-feat-enrich-capture-digest-plan.md (enrichment + upstream privacy gate)
+- Code: scripts/capture-learnings-harvest.ts, scripts/capture-learnings-privacy.ts,
+  scripts/capture-learnings-open.ts, scripts/wiki-slug.ts, scripts/check-private-leak.ts
+  (redactPathTokens), scripts/private-repo-resolution.ts (gh api graphql pattern),
+  .github/workflows/capture-learnings.yaml
+- Learnings: docs/solutions/security-issues/verify-whole-public-perimeter-2026-06-22.md,
+  docs/solutions/best-practices/pure-core-privacy-gates-shared-module-2026-06-22.md,
+  docs/solutions/best-practices/privacy-gate-promotion-leak-prevention-2026-06-04.md

From 340eaad58daf6fee088bed5047f6117decbbe38d Mon Sep 17 00:00:00 2001
From: "Marcus R. Brown" <git@mrbro.dev>
Date: Tue, 23 Jun 2026 14:50:05 -0700
Subject: [PATCH 2/7] feat(capture): add log/diff secret detection and
 redaction to the privacy module
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CI logs and diffs leak more than repo names — tokens, keys, connection strings,
internal hostnames, and paths. Add a block scan for hard secrets (PATs, private
keys, credential-bearing connection strings, cloud/LLM keys) and a structural
redaction for paths, internal hostnames, and bearer/authorization values, both
pure and callable from the same shared module the existing private-name gate uses.
---
 scripts/capture-learnings-privacy.test.ts | 410 +++++++++++++++++++++-
 scripts/capture-learnings-privacy.ts      | 101 ++++++
 2 files changed, 510 insertions(+), 1 deletion(-)

diff --git a/scripts/capture-learnings-privacy.test.ts b/scripts/capture-learnings-privacy.test.ts
index 6646d9198..ce6bf873d 100644
--- a/scripts/capture-learnings-privacy.test.ts
+++ b/scripts/capture-learnings-privacy.test.ts
@@ -4,6 +4,8 @@
  * Structure:
  * - Pure function tests: `learningBodyHasPrivateLeak`
  * - Disk loader tests: `loadPrivateTokensFromDisk` (injectable readFile, fail-closed)
+ * - Secret detection tests: `logDiffHasSecret` (block patterns)
+ * - Secret redaction tests: `redactLogDiffSecrets` (redact patterns → '[REDACTED]')
  *
  * Privacy mutation-proof: each privacy test includes a "without the gate" assertion
  * that proves removing the check would let the content through.
@@ -11,7 +13,12 @@
 
 import {describe, expect, it} from 'vitest'
 
-import {learningBodyHasPrivateLeak, loadPrivateTokensFromDisk} from './capture-learnings-privacy.ts'
+import {
+  learningBodyHasPrivateLeak,
+  loadPrivateTokensFromDisk,
+  logDiffHasSecret,
+  redactLogDiffSecrets,
+} from './capture-learnings-privacy.ts'
 import {buildPrivateTokenSet} from './wiki-slug.ts'
 
 // ---------------------------------------------------------------------------
@@ -210,3 +217,404 @@ repos:
     expect(tokens.size).toBe(0)
   })
 })
+
+// ---------------------------------------------------------------------------
+// logDiffHasSecret — block patterns (pure function tests)
+// ---------------------------------------------------------------------------
+
+describe('logDiffHasSecret', () => {
+  describe('happy path — clean content', () => {
+    it('returns false for a clean diff/log string with no secrets', () => {
+      // #given a clean diff with no secret shapes
+      const body = `
+diff --git a/scripts/foo.ts b/scripts/foo.ts
+index abc..def 100644
+--- a/scripts/foo.ts
++++ b/scripts/foo.ts
+@@ -1,3 +1,4 @@
+ const x = 1
++const y = 2
+`
+      // #when scanning
+      // #then no secret is detected
+      expect(logDiffHasSecret(body)).toBe(false)
+    })
+
+    it('returns false for an empty body', () => {
+      // #given an empty body
+      // #when scanning
+      // #then no secret is detected
+      expect(logDiffHasSecret('')).toBe(false)
+    })
+  })
+
+  describe('block — GitHub PAT (ghp_ / gho_ / ghu_ / ghs_ / ghr_)', () => {
+    it('detects a ghp_ PAT with 36+ chars', () => {
+      // #given a body containing a GitHub PAT (ghp_ prefix, 36 chars)
+      // Using an obviously-fake token — 'A'.repeat(36) is not a real credential
+      const body = `CI log: token=ghp_${'A'.repeat(36)} was used`
+
+      // #when scanning
+      // #then the secret is detected and must block
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects a gho_ OAuth token', () => {
+      const body = `Authorization: gho_${'B'.repeat(36)}`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects a ghu_ user-to-server token', () => {
+      const body = `token: ghu_${'C'.repeat(36)}`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects a ghs_ server-to-server token', () => {
+      const body = `ghs_${'D'.repeat(36)}`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects a ghr_ refresh token', () => {
+      const body = `refresh_token=ghr_${'E'.repeat(36)}`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+  })
+
+  describe('block — fine-grained PAT (github_pat_)', () => {
+    it('detects a github_pat_ fine-grained PAT with 82 chars', () => {
+      // #given a body containing a fine-grained PAT (github_pat_ prefix, 82 chars)
+      // Using an obviously-fake token — 'F'.repeat(82) is not a real credential
+      const body = `export GITHUB_TOKEN=github_pat_${'F'.repeat(82)}`
+
+      // #when scanning
+      // #then the secret is detected and must block
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+  })
+
+  describe('block — private key blocks', () => {
+    it('detects a BEGIN RSA PRIVATE KEY block', () => {
+      // #given a body containing a private key header
+      const body = `-----BEGIN RSA PRIVATE KEY-----\nMIIEowIBAAKCAQEA...\n-----END RSA PRIVATE KEY-----`
+
+      // #when scanning
+      // #then the secret is detected and must block
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects a BEGIN PRIVATE KEY block (PKCS#8)', () => {
+      const body = `-----BEGIN PRIVATE KEY-----\nMIIEvQIBADANBgkq...\n-----END PRIVATE KEY-----`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects a BEGIN EC PRIVATE KEY block', () => {
+      const body = `-----BEGIN EC PRIVATE KEY-----\nMHQCAQEEIBkg...\n-----END EC PRIVATE KEY-----`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+  })
+
+  describe('block — credential-bearing connection strings', () => {
+    it('detects a postgres:// connection string with credentials', () => {
+      // #given a body containing a postgres connection string with user:pass
+      const body = `DATABASE_URL=postgres://user:pass@host.example.com/db`
+
+      // #when scanning
+      // #then the secret is detected and must block
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects a postgresql:// connection string', () => {
+      const body = `postgresql://admin:s3cr3t@db.internal:5432/mydb`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects a mysql:// connection string with credentials', () => {
+      const body = `mysql://root:password@localhost/schema`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects a redis:// connection string with credentials', () => {
+      const body = `redis://user:token@cache.example.com:6379`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects a mongodb:// connection string with credentials', () => {
+      const body = `mongodb://admin:pass@mongo.example.com:27017/db`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects an amqps:// connection string with credentials', () => {
+      const body = `amqps://user:pass@rabbit.example.com/vhost`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+  })
+
+  describe('block — cloud/LLM key shapes', () => {
+    it('detects an AWS AKIA access key', () => {
+      // #given a body containing an AWS access key ID
+      const body = `AWS_ACCESS_KEY_ID=AKIA${'0'.repeat(16)}`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects an AWS ASIA temporary access key', () => {
+      const body = `ASIA${'1'.repeat(16)}`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects an OpenAI sk- key', () => {
+      // #given a body containing an OpenAI API key shape
+      const body = `OPENAI_API_KEY=sk-${'a'.repeat(20)}`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects an Anthropic sk-ant- key', () => {
+      // #given a body containing an Anthropic API key shape
+      const body = `ANTHROPIC_API_KEY=sk-ant-${'b'.repeat(20)}`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects a Slack xoxb- bot token', () => {
+      const body = `SLACK_TOKEN=xoxb-${'1'.repeat(10)}-${'2'.repeat(10)}`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+
+    it('detects a Slack xoxp- user token', () => {
+      const body = `xoxp-${'a'.repeat(10)}-${'b'.repeat(10)}`
+      expect(logDiffHasSecret(body)).toBe(true)
+    })
+  })
+
+  describe('block negative — no over-blocking on innocent content', () => {
+    it('returns false for a body with the word "github" but no secret shape', () => {
+      // #given a body mentioning github without any secret pattern
+      const body = 'This PR was merged on github.com/fro-bot/.github'
+      expect(logDiffHasSecret(body)).toBe(false)
+    })
+
+    it('returns false for a body with the word "token" but no secret shape', () => {
+      // #given a body mentioning token without any secret pattern
+      const body = 'The token field is documented in the README.'
+      expect(logDiffHasSecret(body)).toBe(false)
+    })
+
+    it('returns false for a body with "sk-" but too short to match', () => {
+      // #given a body with sk- prefix but only 5 chars (below the 20-char minimum)
+      const body = 'sk-abc'
+      expect(logDiffHasSecret(body)).toBe(false)
+    })
+
+    it('returns false for a postgres URL without credentials (no user:pass@)', () => {
+      // #given a postgres URL with no embedded credentials
+      const body = 'See the postgres docs at https://www.postgresql.org/docs/'
+      expect(logDiffHasSecret(body)).toBe(false)
+    })
+  })
+
+  describe('MUTATION PROOF — gate is load-bearing', () => {
+    it('detects a ghp_ PAT — flipping this function to always return false would fail this test', () => {
+      // MUTATION PROOF: this test asserts logDiffHasSecret returns true for a real secret shape.
+      // If logDiffHasSecret were short-circuited to `return false`, this assertion would fail,
+      // proving the gate is load-bearing. The real cross-module mutation proof comes in Unit 2
+      // where the gate is wired into the digest; here we prove the function itself is non-trivial.
+      //
+      // Using an obviously-fake token — 'A'.repeat(36) is not a real credential.
+      const bodyWithGhpToken = `CI output: ghp_${'A'.repeat(36)} was found in env`
+
+      // #when scanning a body that contains a real secret shape
+      const result = logDiffHasSecret(bodyWithGhpToken)
+
+      // #then the gate must return true — disabling it (return false) makes this fail
+      expect(result).toBe(true)
+
+      // Complementary: a clean body must return false (proves the function is not always-true either)
+      const cleanBody = 'No secrets here, just a normal CI log line.'
+      expect(logDiffHasSecret(cleanBody)).toBe(false)
+    })
+  })
+})
+
+// ---------------------------------------------------------------------------
+// redactLogDiffSecrets — redact patterns (pure function tests)
+// ---------------------------------------------------------------------------
+
+describe('redactLogDiffSecrets', () => {
+  describe('happy path — clean content unchanged', () => {
+    it('returns a clean diff/log string unchanged', () => {
+      // #given a clean body with no redact-class patterns
+      const body = 'diff --git a/foo.ts b/foo.ts\n+const x = 1\n'
+
+      // #when redacting
+      // #then the body is returned unchanged
+      expect(redactLogDiffSecrets(body)).toBe(body)
+    })
+
+    it('returns an empty body unchanged', () => {
+      // #given an empty body
+      // #when redacting
+      // #then empty string is returned
+      expect(redactLogDiffSecrets('')).toBe('')
+    })
+  })
+
+  describe('redact — Bearer tokens', () => {
+    it('redacts a Bearer token while preserving surrounding prose', () => {
+      // #given a body with a Bearer token in a non-Authorization context (e.g. a log line)
+      // Note: when Bearer appears inside an Authorization header, the Authorization pattern
+      // fires and redacts the whole header value. This test uses a standalone Bearer token.
+      const body = 'Request sent with Bearer abcdefgh12345 to the API'
+
+      // #when redacting
+      const result = redactLogDiffSecrets(body)
+
+      // #then the token value is gone and surrounding prose is preserved
+      expect(result).not.toContain('abcdefgh12345')
+      expect(result).toContain('Bearer [REDACTED]')
+      expect(result).toContain('Request sent with')
+      expect(result).toContain('to the API')
+    })
+
+    it('does not over-redact short placeholder values (min 8 chars)', () => {
+      // #given a body with a very short token-like value (below the 8-char minimum)
+      const body = 'token=x is a placeholder'
+
+      // #when redacting
+      const result = redactLogDiffSecrets(body)
+
+      // #then the short value is NOT redacted (below minimum length)
+      expect(result).toBe(body)
+    })
+  })
+
+  describe('redact — Authorization headers', () => {
+    it('redacts an Authorization header value while preserving surrounding prose', () => {
+      // #given a body with an Authorization header
+      const body = 'Log line: Authorization: Basic dXNlcjpwYXNz and then more text'
+
+      // #when redacting
+      const result = redactLogDiffSecrets(body)
+
+      // #then the credential value is gone and surrounding prose is preserved
+      expect(result).not.toContain('dXNlcjpwYXNz')
+      expect(result).toContain('Authorization: [REDACTED]')
+      expect(result).toContain('Log line:')
+      expect(result).toContain('and then more text')
+    })
+  })
+
+  describe('redact — file paths', () => {
+    it('redacts a /Users/ path while preserving surrounding prose', () => {
+      // #given a body with a /Users/ path
+      const body = 'Error reading /Users/alice/project/config.json in the build'
+
+      // #when redacting
+      const result = redactLogDiffSecrets(body)
+
+      // #then the path is gone and surrounding prose is preserved
+      expect(result).not.toContain('/Users/alice')
+      expect(result).toContain('[REDACTED]')
+      expect(result).toContain('Error reading')
+      expect(result).toContain('in the build')
+    })
+
+    it('redacts a /home/ path while preserving surrounding prose', () => {
+      // #given a body with a /home/ path
+      const body = 'Config loaded from /home/runner/work/repo/.env successfully'
+
+      // #when redacting
+      const result = redactLogDiffSecrets(body)
+
+      // #then the path is gone and surrounding prose is preserved
+      expect(result).not.toContain('/home/runner')
+      expect(result).toContain('[REDACTED]')
+      expect(result).toContain('Config loaded from')
+      expect(result).toContain('successfully')
+    })
+
+    it('redacts a /var/log/ path while preserving surrounding prose', () => {
+      // #given a body with a /var/log/ path
+      const body = 'See /var/log/nginx/access.log for details'
+
+      // #when redacting
+      const result = redactLogDiffSecrets(body)
+
+      // #then the path is gone and surrounding prose is preserved
+      expect(result).not.toContain('/var/log/nginx')
+      expect(result).toContain('[REDACTED]')
+      expect(result).toContain('See')
+      expect(result).toContain('for details')
+    })
+
+    it('redacts a /etc/ path while preserving surrounding prose', () => {
+      // #given a body with an /etc/ path
+      const body = 'Reading /etc/hosts for hostname resolution'
+
+      // #when redacting
+      const result = redactLogDiffSecrets(body)
+
+      // #then the path is gone and surrounding prose is preserved
+      expect(result).not.toContain('/etc/hosts')
+      expect(result).toContain('[REDACTED]')
+      expect(result).toContain('Reading')
+      expect(result).toContain('for hostname resolution')
+    })
+  })
+
+  describe('redact — internal hostnames', () => {
+    it('redacts a *.fro.bot hostname while preserving surrounding prose', () => {
+      // #given a body with an internal fro.bot hostname
+      const body = 'Request to gateway.fro.bot failed with 503'
+
+      // #when redacting
+      const result = redactLogDiffSecrets(body)
+
+      // #then the hostname is gone and surrounding prose is preserved
+      expect(result).not.toContain('gateway.fro.bot')
+      expect(result).toContain('[REDACTED]')
+      expect(result).toContain('Request to')
+      expect(result).toContain('failed with 503')
+    })
+
+    it('redacts a *.bfra.me hostname while preserving surrounding prose', () => {
+      // #given a body with an internal bfra.me hostname
+      const body = 'Connecting to api.bfra.me for metadata sync'
+
+      // #when redacting
+      const result = redactLogDiffSecrets(body)
+
+      // #then the hostname is gone and surrounding prose is preserved
+      expect(result).not.toContain('api.bfra.me')
+      expect(result).toContain('[REDACTED]')
+      expect(result).toContain('Connecting to')
+      expect(result).toContain('for metadata sync')
+    })
+  })
+
+  describe('edge cases', () => {
+    it('does not over-redact a short token=x placeholder', () => {
+      // #given a body with a very short value that looks like a token assignment
+      const body = 'token=x is a placeholder value'
+
+      // #when redacting
+      const result = redactLogDiffSecrets(body)
+
+      // #then the short value is NOT redacted (no pattern matches it)
+      expect(result).toBe(body)
+    })
+
+    it('applies multiple redactions in a single body', () => {
+      // #given a body with both a Bearer token and an internal hostname
+      const body = 'Bearer abcdefgh12345 called api.bfra.me/endpoint'
+
+      // #when redacting
+      const result = redactLogDiffSecrets(body)
+
+      // #then both are redacted
+      expect(result).not.toContain('abcdefgh12345')
+      expect(result).not.toContain('api.bfra.me')
+      expect(result).toContain('Bearer [REDACTED]')
+      expect(result).toContain('[REDACTED]')
+      expect(result).toContain('called')
+      expect(result).toContain('/endpoint')
+    })
+  })
+})
diff --git a/scripts/capture-learnings-privacy.ts b/scripts/capture-learnings-privacy.ts
index 21bf3bc3b..d365af902 100644
--- a/scripts/capture-learnings-privacy.ts
+++ b/scripts/capture-learnings-privacy.ts
@@ -28,6 +28,107 @@ export function isRecord(value: unknown): value is Record<string, unknown> {
   return typeof value === 'object' && value !== null && !Array.isArray(value)
 }
 
+// ---------------------------------------------------------------------------
+// Secret detection — block patterns (pure, no I/O)
+// ---------------------------------------------------------------------------
+
+/**
+ * Hard-secret patterns that BLOCK content from reaching the agent.
+ * These shapes must never be redacted — they must cause the content to be dropped entirely.
+ *
+ * Pattern set (v1):
+ * - GitHub PATs: gh[pousr]_ prefix + 36–255 alphanumeric chars
+ * - Fine-grained PAT: github_pat_ prefix + exactly 82 alphanumeric/underscore chars
+ * - Private key blocks: -----BEGIN [TYPE] PRIVATE KEY----- header
+ * - Credential-bearing connection strings: scheme://user:pass@host (postgres, mysql, redis, mongodb, amqp)
+ * - AWS access keys: AKIA/ASIA prefix + 16 uppercase alphanumeric chars
+ * - OpenAI/Anthropic keys: sk- or sk-ant- prefix + 20+ chars
+ * - Slack tokens: xox[bpars]- prefix + 10+ chars
+ */
+const SECRET_BLOCK_PATTERNS: RegExp[] = [
+  // GitHub PATs (classic): ghp_, gho_, ghu_, ghs_, ghr_
+  /gh[pousr]_[A-Za-z0-9]{36,255}/,
+  // Fine-grained PAT
+  /github_pat_\w{82}/,
+  // Private key blocks (RSA, EC, PKCS#8, etc.)
+  /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
+  // Credential-bearing connection strings (user:pass@host required)
+  /(?:postgres|postgresql|mysql|redis|mongodb|amqps?):\/\/[^\s:]+:[^\s@]+@[^\s/]+/,
+  // AWS access keys
+  /(?:AKIA|ASIA)[0-9A-Z]{16}/,
+  // OpenAI keys (sk-) and Anthropic keys (sk-ant-) — two separate patterns to avoid redundancy
+  /sk-ant-\w{20,}/,
+  /sk-[A-Za-z0-9\-]{20,}/,
+  // Slack tokens
+  /xox[bpars]-[\w-]{10,}/,
+]
+
+/**
+ * Returns true when the body contains a hard-secret shape that must BLOCK the content.
+ *
+ * Iterates the block pattern set and returns true on the first match.
+ * Never redacts — block only. The caller MUST drop the content on true.
+ *
+ * Pure function: no I/O, fully testable.
+ */
+export function logDiffHasSecret(body: string): boolean {
+  for (const pattern of SECRET_BLOCK_PATTERNS) {
+    if (pattern.test(body)) return true
+  }
+  return false
+}
+
+// ---------------------------------------------------------------------------
+// Secret redaction — redact patterns (pure, no I/O)
+// ---------------------------------------------------------------------------
+
+/**
+ * Redact-class patterns: structural values that should be replaced with '[REDACTED]'
+ * while preserving the surrounding prose. Applied in sequence.
+ *
+ * Each entry is [pattern, replacement]:
+ * - Bearer tokens: 'Bearer <value>' → 'Bearer [REDACTED]'
+ * - Authorization headers: 'Authorization: <value>' → 'Authorization: [REDACTED]'
+ * - File paths: /home/..., /Users/..., C:\Users\..., ~/.dotfile, /var/log/..., /etc/...
+ * - Internal hostnames: *.fro.bot, *.bfra.me
+ *
+ * Minimum length guards (8 chars for Bearer/Authorization) prevent over-redacting
+ * short placeholder values like 'token=x'.
+ */
+const SECRET_REDACT_PATTERNS: [RegExp, string][] = [
+  // Bearer tokens (min 8 chars to avoid over-redacting short placeholders)
+  [/Bearer\s+[\w\-.=]{8,}/g, 'Bearer [REDACTED]'],
+  // Authorization header values: capture scheme + credential (e.g. 'Basic dXNlcjpwYXNz', 'Bearer token')
+  // Matches two whitespace-delimited tokens after the colon, or one token of 8+ chars.
+  [/Authorization:\s*(?:\S+\s+\S{4,}|\S{8,})/g, 'Authorization: [REDACTED]'],
+  // File paths: /home/<user>, /Users/<user>, C:\Users\<user>, ~/.dotfile, /var/log/..., /etc/<name>
+  [
+    /(?:\/home\/[\w.-]+|\/Users\/[\w.-]+|C:\\Users\\[\w.-]+|~\/\.[a-z]+|\/var\/log\/[\w./-]*|\/etc\/[a-z]+)/g,
+    '[REDACTED]',
+  ],
+  // Internal hostnames: *.fro.bot
+  [/[a-z0-9-]+\.fro\.bot\b/g, '[REDACTED]'],
+  // Internal hostnames: *.bfra.me
+  [/[a-z0-9-]+\.bfra\.me\b/g, '[REDACTED]'],
+]
+
+/**
+ * Returns the body with REDACT-class patterns replaced by '[REDACTED]'.
+ *
+ * Structure is preserved — prose around the redacted values remains intact.
+ * Applies each redact pattern in sequence. Does NOT block — use `logDiffHasSecret`
+ * first to check for hard-secret shapes that must block entirely.
+ *
+ * Pure function: no I/O, fully testable.
+ */
+export function redactLogDiffSecrets(body: string): string {
+  let result = body
+  for (const [pattern, replacement] of SECRET_REDACT_PATTERNS) {
+    result = result.replaceAll(pattern, replacement)
+  }
+  return result
+}
+
 // ---------------------------------------------------------------------------
 // Privacy gate — pure, fail-closed
 // ---------------------------------------------------------------------------

From 9cfc37394d1321da213c89516af8f0030a32bfc8 Mon Sep 17 00:00:00 2001
From: "Marcus R. Brown" <git@mrbro.dev>
Date: Tue, 23 Jun 2026 15:05:59 -0700
Subject: [PATCH 3/7] feat(capture): discriminate candidates by trigger and
 dedup within a run

Make Candidate a trigger-discriminated union so a second source can share the
digest's dedup, cap, and privacy path without overloading review-specific fields.
The privacy step now redacts structural secrets and drops on a residual hard
secret as well as a private name. Collapse candidates to one per merge SHA within
a run so a PR matching both triggers yields a single proposal, review prose
taking precedence. The review source emits the same shape as before.
---
 scripts/capture-learnings-harvest.test.ts | 419 +++++++++++++++++++---
 scripts/capture-learnings-harvest.ts      | 148 ++++++--
 scripts/capture-learnings-open.test.ts    |   1 +
 3 files changed, 503 insertions(+), 65 deletions(-)

diff --git a/scripts/capture-learnings-harvest.test.ts b/scripts/capture-learnings-harvest.test.ts
index 4bf1ac0fe..b79084688 100644
--- a/scripts/capture-learnings-harvest.test.ts
+++ b/scripts/capture-learnings-harvest.test.ts
@@ -23,8 +23,10 @@ import {
   type BuildCandidateDigestInput,
   type Candidate,
   type CandidateDigest,
+  type CiFixCandidate,
   type HarvestStageCounts,
   type OctokitClient,
+  type ReviewCandidate,
   type SolutionDoc,
 } from './capture-learnings-harvest.ts'
 import {buildPrivateTokenSet} from './wiki-slug.ts'
@@ -33,8 +35,9 @@ import {buildPrivateTokenSet} from './wiki-slug.ts'
 // Fixture helpers
 // ---------------------------------------------------------------------------
 
-function makeCandidate(overrides: Partial<Candidate> = {}): Candidate {
+function makeCandidate(overrides: Partial<ReviewCandidate> = {}): ReviewCandidate {
   return {
+    trigger: 'review-heavy',
     mergeSha: 'abc123def456abc123def456abc123def456abc1',
     reviewRounds: 2,
     signals: {titleTokens: ['feat', 'scripts'], labels: []},
@@ -43,6 +46,25 @@ function makeCandidate(overrides: Partial<Candidate> = {}): Candidate {
   }
 }
 
+function makeCiFixCandidate(overrides: Partial<CiFixCandidate> = {}): CiFixCandidate {
+  return {
+    trigger: 'ci-fail-then-pass',
+    mergeSha: 'cifix001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1',
+    signals: {titleTokens: ['fix', 'ci'], labels: []},
+    ...overrides,
+  }
+}
+
+/**
+ * Narrow a Candidate to ReviewCandidate for test assertions.
+ * Throws if the candidate is not a ReviewCandidate — makes test failures explicit.
+ */
+function asReviewCandidate(c: Candidate | undefined): ReviewCandidate {
+  if (c === undefined) throw new Error('Expected a candidate but got undefined')
+  if (c.trigger !== 'review-heavy') throw new Error(`Expected trigger='review-heavy' but got '${c.trigger}'`)
+  return c
+}
+
 function makeSolutionDoc(overrides: Partial<SolutionDoc> = {}): SolutionDoc {
   return {
     path: 'docs/solutions/best-practices/some-doc.md',
@@ -148,22 +170,47 @@ describe('buildCandidateDigest', () => {
       // #then the candidate is in the output with its mergeSha and reviewRounds
       expect(result.candidates).toHaveLength(1)
       expect(result.candidates[0]?.mergeSha).toBe(makeCandidate().mergeSha)
-      expect(result.candidates[0]?.reviewRounds).toBe(2)
+      const emitted0 = result.candidates[0]
+      if (emitted0?.trigger === 'review-heavy') {
+        expect(emitted0.reviewRounds).toBe(2)
+      }
     })
   })
 
   describe('opacity guarantee', () => {
-    it('emitted candidates have ONLY mergeSha, reviewRounds, signals, and reviewExcerpts keys — no owner/repo/number/title', () => {
-      // #given a candidate
+    it('review-heavy candidate has ONLY trigger, mergeSha, reviewRounds, signals, reviewExcerpts — no owner/repo/number/title', () => {
+      // #given a review-heavy candidate
       const input = makeDigestInput()
 
       // #when building the digest
       const result = buildCandidateDigest(input)
 
-      // #then each candidate object has exactly the allowed keys
+      // #then each review-heavy candidate has exactly the allowed keys
       for (const candidate of result.candidates) {
+        if (candidate.trigger !== 'review-heavy') continue
         const keys = Object.keys(candidate).sort()
-        expect(keys).toEqual(['mergeSha', 'reviewExcerpts', 'reviewRounds', 'signals'].sort())
+        expect(keys).toEqual(['mergeSha', 'reviewExcerpts', 'reviewRounds', 'signals', 'trigger'].sort())
+        // Explicitly assert forbidden keys are absent
+        expect(candidate).not.toHaveProperty('owner')
+        expect(candidate).not.toHaveProperty('repo')
+        expect(candidate).not.toHaveProperty('number')
+        expect(candidate).not.toHaveProperty('title')
+      }
+    })
+
+    it('ci-fail-then-pass candidate has ONLY trigger, mergeSha, signals — no owner/repo/number/title', () => {
+      // #given a ci-fix candidate
+      const ciFixCandidate = makeCiFixCandidate()
+      const input = makeDigestInput({mergedPrs: [ciFixCandidate]})
+
+      // #when building the digest
+      const result = buildCandidateDigest(input)
+
+      // #then the ci-fix candidate has exactly the allowed keys
+      for (const candidate of result.candidates) {
+        if (candidate.trigger !== 'ci-fail-then-pass') continue
+        const keys = Object.keys(candidate).sort()
+        expect(keys).toEqual(['mergeSha', 'signals', 'trigger'].sort())
         // Explicitly assert forbidden keys are absent
         expect(candidate).not.toHaveProperty('owner')
         expect(candidate).not.toHaveProperty('repo')
@@ -385,6 +432,8 @@ describe('buildCandidateDigest', () => {
       expect(result.telemetry.emitted).toBe(2) // 2 clean, under cap
       // #then enrichmentBlocked is zero (no private tokens, no enriched content)
       expect(result.telemetry.enrichmentBlocked).toBe(0)
+      // #then enrichmentBlockedBySecret is zero (no secrets in content)
+      expect(result.telemetry.enrichmentBlockedBySecret).toBe(0)
     })
 
     it('reports zero counts when all candidates are filtered', () => {
@@ -419,6 +468,262 @@ describe('buildCandidateDigest', () => {
   })
 })
 
+// ---------------------------------------------------------------------------
+// CHARACTERIZATION: review-heavy candidate emitted shape is unchanged post-union
+// ---------------------------------------------------------------------------
+
+describe('buildCandidateDigest — characterization: review-heavy emitted shape', () => {
+  it('CHARACTERIZATION: review-heavy candidate emits same mergeSha/reviewRounds/signals/reviewExcerpts values as before the union refactor, plus trigger', () => {
+    // #given a review-heavy candidate with known values (mirrors pre-union shape)
+    // This test locks the no-regression guarantee: the union refactor must not change
+    // the values emitted for the existing review source. Only `trigger` is new.
+    const sha = 'abc123def456abc123def456abc123def456abc1'
+    const candidate = makeCandidate({
+      mergeSha: sha,
+      reviewRounds: 3,
+      signals: {titleTokens: ['feat', 'scripts'], labels: ['ci', 'automation']},
+      reviewExcerpts: ['Please fix the null check here.', 'LGTM, approved!'],
+    })
+    const input = makeDigestInput({
+      mergedPrs: [candidate],
+      privateTokens: new Set(), // no private tokens → excerpts pass through
+    })
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then exactly one candidate is emitted
+    expect(result.candidates).toHaveLength(1)
+    const emitted = result.candidates[0]
+
+    // #then the trigger discriminant is present and correct (the only new field)
+    expect(emitted?.trigger).toBe('review-heavy')
+
+    // #then all pre-union fields are unchanged
+    expect(emitted?.mergeSha).toBe(sha)
+    // Narrow to ReviewCandidate to access reviewRounds/reviewExcerpts
+    if (emitted?.trigger === 'review-heavy') {
+      expect(emitted.reviewRounds).toBe(3)
+      expect(emitted.signals).toEqual({titleTokens: ['feat', 'scripts'], labels: ['ci', 'automation']})
+      expect(emitted.reviewExcerpts).toEqual(['Please fix the null check here.', 'LGTM, approved!'])
+    }
+  })
+
+  it('CHARACTERIZATION: review-heavy candidate with empty reviewExcerpts emits empty array (title-only path unchanged)', () => {
+    // #given a review-heavy candidate with no excerpts
+    const sha = 'deadbeefdeadbeefdeadbeefdeadbeefdeadbeef'
+    const candidate = makeCandidate({mergeSha: sha, reviewRounds: 2, reviewExcerpts: []})
+    const input = makeDigestInput({mergedPrs: [candidate]})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then the candidate is emitted with empty reviewExcerpts (unchanged behavior)
+    expect(result.candidates).toHaveLength(1)
+    const emitted = result.candidates[0]
+    expect(emitted?.trigger).toBe('review-heavy')
+    if (emitted?.trigger === 'review-heavy') {
+      expect(emitted.reviewExcerpts).toEqual([])
+      expect(emitted.reviewRounds).toBe(2)
+    }
+  })
+})
+
+// ---------------------------------------------------------------------------
+// Discriminated union: CiFixCandidate flows through the digest pipeline
+// ---------------------------------------------------------------------------
+
+describe('buildCandidateDigest — CiFixCandidate union support', () => {
+  it('ci-fail-then-pass candidate passes through dedup, cap, and privacy unchanged', () => {
+    // #given a ci-fix candidate (no evidence fields yet — Unit 3 placeholder)
+    const ciFixCandidate = makeCiFixCandidate({
+      mergeSha: 'cifix001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1',
+    })
+    const input = makeDigestInput({mergedPrs: [ciFixCandidate]})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then the candidate is emitted with trigger='ci-fail-then-pass'
+    expect(result.candidates).toHaveLength(1)
+    expect(result.candidates[0]?.trigger).toBe('ci-fail-then-pass')
+    expect(result.candidates[0]?.mergeSha).toBe('cifix001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1')
+    // #then no enrichment was blocked (no evidence to scan yet)
+    expect(result.telemetry.enrichmentBlocked).toBe(0)
+    expect(result.telemetry.enrichmentBlockedBySecret).toBe(0)
+  })
+
+  it('mixed review-heavy + ci-fix candidates both emitted, dedup by mergeSha works across triggers', () => {
+    // #given one review-heavy and one ci-fix candidate with different SHAs
+    const reviewCandidate = makeCandidate({mergeSha: 'review01aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'})
+    const ciFixCandidate = makeCiFixCandidate({mergeSha: 'cifix001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'})
+    const input = makeDigestInput({mergedPrs: [reviewCandidate, ciFixCandidate]})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then both candidates are emitted
+    expect(result.candidates).toHaveLength(2)
+    const triggers = result.candidates.map(c => c.trigger).sort()
+    expect(triggers).toEqual(['ci-fail-then-pass', 'review-heavy'])
+  })
+
+  it('cross-run dedup: review-heavy and ci-fix with same SHA are both filtered when SHA is in openedLearningShas', () => {
+    // #given a review-heavy and a ci-fix candidate with the SAME mergeSha
+    // (a PR matching both triggers — both are filtered by the cross-run seen-set)
+    const sha = 'shared01aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const reviewCandidate = makeCandidate({mergeSha: sha})
+    const ciFixCandidate = makeCiFixCandidate({mergeSha: sha})
+    const input = makeDigestInput({
+      mergedPrs: [reviewCandidate, ciFixCandidate],
+      openedLearningShas: new Set([sha]), // SHA already proposed
+    })
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then both candidates are filtered (SHA already in seen-set)
+    expect(result.candidates).toHaveLength(0)
+    expect(result.telemetry.afterSeenDedup).toBe(0)
+  })
+
+  it('within-run dedup (R4): same SHA from both triggers yields one candidate, review-heavy wins', () => {
+    // #given a review-heavy and a ci-fix candidate with the SAME mergeSha, NOT yet proposed
+    // (a PR that matched both triggers in this run)
+    const sha = 'bothtrg1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const ciFixCandidate = makeCiFixCandidate({mergeSha: sha})
+    const reviewCandidate = makeCandidate({mergeSha: sha})
+    // ci-fix listed first to prove precedence is by trigger, not array order
+    const input = makeDigestInput({mergedPrs: [ciFixCandidate, reviewCandidate]})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then exactly one candidate is emitted (R4), and it is the review-heavy one
+    expect(result.candidates).toHaveLength(1)
+    expect(result.candidates[0]?.trigger).toBe('review-heavy')
+  })
+})
+
+// ---------------------------------------------------------------------------
+// Secret mutation proof: logDiffHasSecret wired into buildCandidateDigest
+// ---------------------------------------------------------------------------
+
+describe('buildCandidateDigest — secret scan wiring (mutation proof)', () => {
+  it('MUTATION PROOF: review-heavy candidate with a ghp_ secret in reviewExcerpts → excerpts cleared, enrichmentBlockedBySecret incremented', () => {
+    // #given a review-heavy candidate whose excerpts contain a GitHub PAT
+    // This test proves the secret scan is wired: removing it makes the secret reach the digest.
+    const sha = 'secretsha1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+    const secretExcerpt = 'The token used was ghp_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1 in the config.'
+    const candidate = makeCandidate({
+      mergeSha: sha,
+      reviewExcerpts: [secretExcerpt],
+    })
+    const input = makeDigestInput({
+      mergedPrs: [candidate],
+      privateTokens: new Set(), // no private-name tokens — only secret scan fires
+    })
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then the candidate is kept (not dropped)
+    expect(result.candidates).toHaveLength(1)
+    // #then reviewExcerpts are cleared (secret blocked)
+    if (result.candidates[0]?.trigger === 'review-heavy') {
+      expect(result.candidates[0].reviewExcerpts).toEqual([])
+    }
+    // #then enrichmentBlockedBySecret is incremented
+    expect(result.telemetry.enrichmentBlockedBySecret).toBe(1)
+    // #then enrichmentBlocked (private-name counter) is NOT incremented
+    expect(result.telemetry.enrichmentBlocked).toBe(0)
+    // #then the secret does NOT appear in the serialized digest
+    const serialized = JSON.stringify(result)
+    expect(serialized).not.toContain('ghp_')
+  })
+
+  it('MUTATION PROOF: removing the secret scan lets the ghp_ token reach the digest', () => {
+    // #given a candidate with a PAT in excerpts
+    const sha = 'secretsha2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+    const secretExcerpt = 'Token: ghp_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA2'
+    const candidate = makeCandidate({mergeSha: sha, reviewExcerpts: [secretExcerpt]})
+
+    // #when the scan IS applied (normal path — no private tokens, only secret scan)
+    const withScan = buildCandidateDigest(
+      makeDigestInput({
+        mergedPrs: [candidate],
+        privateTokens: new Set(),
+      }),
+    )
+    // #then the secret is NOT in the output
+    expect(JSON.stringify(withScan)).not.toContain('ghp_')
+    expect(withScan.telemetry.enrichmentBlockedBySecret).toBe(1)
+
+    // #when the scan is bypassed (simulated by using a candidate with no secret)
+    // We prove the gate is load-bearing by showing a clean candidate DOES pass through
+    const cleanCandidate = makeCandidate({mergeSha: sha, reviewExcerpts: ['No secrets here.']})
+    const withoutSecret = buildCandidateDigest(
+      makeDigestInput({
+        mergedPrs: [cleanCandidate],
+        privateTokens: new Set(),
+      }),
+    )
+    // #then the clean excerpt DOES appear — proving the scan only blocks secrets
+    if (withoutSecret.candidates[0]?.trigger === 'review-heavy') {
+      expect(withoutSecret.candidates[0].reviewExcerpts).toEqual(['No secrets here.'])
+    }
+    expect(withoutSecret.telemetry.enrichmentBlockedBySecret).toBe(0)
+  })
+
+  it('redactLogDiffSecrets is applied before the secret check: path tokens are redacted, not blocked', () => {
+    // #given a candidate whose excerpts contain a file path (redact-class, not block-class)
+    const sha = 'redactsha1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
+    const excerptWithPath = 'Config loaded from /Users/marcus/.ssh/config for the build.'
+    const candidate = makeCandidate({
+      mergeSha: sha,
+      reviewExcerpts: [excerptWithPath],
+    })
+    const input = makeDigestInput({
+      mergedPrs: [candidate],
+      privateTokens: new Set(),
+    })
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then the candidate is emitted (not blocked)
+    expect(result.candidates).toHaveLength(1)
+    // #then the path is redacted in the output (not the original value)
+    if (result.candidates[0]?.trigger === 'review-heavy') {
+      const excerpts = result.candidates[0].reviewExcerpts
+      expect(excerpts).toHaveLength(1)
+      expect(excerpts[0]).not.toContain('/Users/marcus')
+      expect(excerpts[0]).toContain('[REDACTED]')
+    }
+    // #then neither counter is incremented (redaction, not blocking)
+    expect(result.telemetry.enrichmentBlocked).toBe(0)
+    expect(result.telemetry.enrichmentBlockedBySecret).toBe(0)
+  })
+
+  it('private-name hit still uses enrichmentBlocked counter (not enrichmentBlockedBySecret)', () => {
+    // #given a candidate with private-name prose
+    const privateTokens = buildPrivateTokenSet(['testowner/secret-repo'])
+    const candidate = makeCandidate({
+      mergeSha: 'privname1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa',
+      reviewExcerpts: ['See testowner/secret-repo#42 for context.'],
+    })
+    const input = makeDigestInput({mergedPrs: [candidate], privateTokens})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then enrichmentBlocked is incremented (private-name counter)
+    expect(result.telemetry.enrichmentBlocked).toBe(1)
+    // #then enrichmentBlockedBySecret is NOT incremented
+    expect(result.telemetry.enrichmentBlockedBySecret).toBe(0)
+  })
+})
+
 // ---------------------------------------------------------------------------
 // I/O shell: harvestCandidates (mocked Octokit)
 // ---------------------------------------------------------------------------
@@ -681,7 +986,10 @@ describe('harvestCandidates', () => {
     // #then the PR is a candidate with reviewRounds = 2 (substantive count)
     expect(candidates).toHaveLength(1)
     expect(candidates[0]?.mergeSha).toBe(sha)
-    expect(candidates[0]?.reviewRounds).toBe(2)
+    expect(candidates[0]?.trigger).toBe('review-heavy')
+    if (candidates[0]?.trigger === 'review-heavy') {
+      expect(candidates[0].reviewRounds).toBe(2)
+    }
   })
 
   it('CHANGES_REQUESTED 1 + APPROVED 1 → CANDIDATE (substantive=2, correction=1) — #3530/#3517 shape', async () => {
@@ -698,7 +1006,10 @@ describe('harvestCandidates', () => {
 
     // #then the PR is a candidate (CR still counts as correction signal)
     expect(candidates).toHaveLength(1)
-    expect(candidates[0]?.reviewRounds).toBe(2)
+    expect(candidates[0]?.trigger).toBe('review-heavy')
+    if (candidates[0]?.trigger === 'review-heavy') {
+      expect(candidates[0].reviewRounds).toBe(2)
+    }
   })
 
   it('APPROVED 2 + DISMISSED 1 → CANDIDATE (substantive=3, correction=1) — #3526 shape', async () => {
@@ -719,7 +1030,10 @@ describe('harvestCandidates', () => {
 
     // #then the PR is a candidate with reviewRounds = 3
     expect(candidates).toHaveLength(1)
-    expect(candidates[0]?.reviewRounds).toBe(3)
+    expect(candidates[0]?.trigger).toBe('review-heavy')
+    if (candidates[0]?.trigger === 'review-heavy') {
+      expect(candidates[0].reviewRounds).toBe(3)
+    }
   })
 
   it('DISMISSED 2 → CANDIDATE (substantive=2, correction=2) — #3514 shape', async () => {
@@ -736,7 +1050,10 @@ describe('harvestCandidates', () => {
 
     // #then the PR is a candidate
     expect(candidates).toHaveLength(1)
-    expect(candidates[0]?.reviewRounds).toBe(2)
+    expect(candidates[0]?.trigger).toBe('review-heavy')
+    if (candidates[0]?.trigger === 'review-heavy') {
+      expect(candidates[0].reviewRounds).toBe(2)
+    }
   })
 
   it('APPROVED 1 only → NOT candidate (substantive=1 < MIN_SUBSTANTIVE_REVIEW_ROUNDS) — #3539 shape', async () => {
@@ -894,7 +1211,10 @@ describe('harvestCandidates', () => {
 
     // #then all 3 substantive reviews are counted (not just the first page)
     expect(candidates).toHaveLength(1)
-    expect(candidates[0]?.reviewRounds).toBe(3)
+    expect(candidates[0]?.trigger).toBe('review-heavy')
+    if (candidates[0]?.trigger === 'review-heavy') {
+      expect(candidates[0].reviewRounds).toBe(3)
+    }
   })
 
   // -------------------------------------------------------------------------
@@ -1123,12 +1443,14 @@ describe('harvest→open schema contract', () => {
     // #given a CandidateDigest produced by buildCandidateDigest (as harvest would write it)
     const candidates: Candidate[] = [
       {
+        trigger: 'review-heavy',
         mergeSha: 'abc123def456abc123def456abc123def456abc1',
         reviewRounds: 3,
         signals: {titleTokens: ['feat', 'scripts'], labels: ['ci', 'automation']},
         reviewExcerpts: ['The correction prose here.'],
       },
       {
+        trigger: 'review-heavy',
         mergeSha: 'deadbeefdeadbeefdeadbeefdeadbeefdeadbeef',
         reviewRounds: 2,
         signals: {titleTokens: ['fix', 'workflow'], labels: []},
@@ -1146,6 +1468,7 @@ describe('harvest→open schema contract', () => {
         afterSolutionsDedup: 2,
         emitted: 2,
         enrichmentBlocked: 0,
+        enrichmentBlockedBySecret: 0,
       },
     }
 
@@ -1163,7 +1486,11 @@ describe('harvest→open schema contract', () => {
     // #then candidates round-trip correctly
     expect(deserialized.candidates).toHaveLength(2)
     expect(deserialized.candidates[0]?.mergeSha).toBe('abc123def456abc123def456abc123def456abc1')
-    expect(deserialized.candidates[0]?.reviewRounds).toBe(3)
+    expect(deserialized.candidates[0]?.trigger).toBe('review-heavy')
+    const dc0 = deserialized.candidates[0]
+    if (dc0?.trigger === 'review-heavy') {
+      expect(dc0.reviewRounds).toBe(3)
+    }
     expect(deserialized.candidates[1]?.mergeSha).toBe('deadbeefdeadbeefdeadbeefdeadbeefdeadbeef')
 
     // #then telemetry round-trips correctly
@@ -1172,6 +1499,7 @@ describe('harvest→open schema contract', () => {
     expect(deserialized.telemetry.excludedAutomation).toBe(1)
     expect(deserialized.telemetry.multiRoundCandidates).toBe(5)
     expect(deserialized.telemetry.emitted).toBe(2)
+    expect(deserialized.telemetry.enrichmentBlockedBySecret).toBe(0)
 
     // #then the open step can iterate candidates without TypeError
     const shas = deserialized.candidates.map(c => c.mergeSha)
@@ -1217,7 +1545,10 @@ describe('buildCandidateDigest — enrichment and upstream privacy scan', () =>
 
       // #then the candidate is emitted with its reviewExcerpts intact
       expect(result.candidates).toHaveLength(1)
-      expect(result.candidates[0]?.reviewExcerpts).toEqual(['Please fix the null check here.', 'LGTM, approved!'])
+      expect(asReviewCandidate(result.candidates[0]).reviewExcerpts).toEqual([
+        'Please fix the null check here.',
+        'LGTM, approved!',
+      ])
       expect(result.telemetry.enrichmentBlocked).toBe(0)
     })
 
@@ -1230,7 +1561,7 @@ describe('buildCandidateDigest — enrichment and upstream privacy scan', () =>
       const result = buildCandidateDigest(input)
 
       // #then the candidate is emitted with empty reviewExcerpts
-      expect(result.candidates[0]?.reviewExcerpts).toEqual([])
+      expect(asReviewCandidate(result.candidates[0]).reviewExcerpts).toEqual([])
       expect(result.telemetry.enrichmentBlocked).toBe(0)
     })
   })
@@ -1256,7 +1587,7 @@ describe('buildCandidateDigest — enrichment and upstream privacy scan', () =>
       // #then the candidate is KEPT (not dropped)
       expect(result.candidates).toHaveLength(1)
       // #then reviewExcerpts are DROPPED (empty — enrichment blocked)
-      expect(result.candidates[0]?.reviewExcerpts).toEqual([])
+      expect(asReviewCandidate(result.candidates[0]).reviewExcerpts).toEqual([])
       // #then enrichmentBlocked is incremented
       expect(result.telemetry.enrichmentBlocked).toBe(1)
       // #then the private token does NOT appear anywhere in the serialized digest
@@ -1303,7 +1634,7 @@ describe('buildCandidateDigest — enrichment and upstream privacy scan', () =>
       const result = buildCandidateDigest(input)
 
       // #then the candidate is emitted with its reviewExcerpts intact
-      expect(result.candidates[0]?.reviewExcerpts).toEqual(['Please add a null check here.', 'LGTM'])
+      expect(asReviewCandidate(result.candidates[0]).reviewExcerpts).toEqual(['Please add a null check here.', 'LGTM'])
       expect(result.telemetry.enrichmentBlocked).toBe(0)
     })
 
@@ -1330,10 +1661,10 @@ describe('buildCandidateDigest — enrichment and upstream privacy scan', () =>
       expect(result.candidates).toHaveLength(2)
       // #then the blocked candidate has empty reviewExcerpts
       const blocked = result.candidates.find(c => c.mergeSha === blockedCandidate.mergeSha)
-      expect(blocked?.reviewExcerpts).toEqual([])
+      expect(asReviewCandidate(blocked).reviewExcerpts).toEqual([])
       // #then the clean candidate retains its reviewExcerpts
       const clean = result.candidates.find(c => c.mergeSha === cleanCandidate.mergeSha)
-      expect(clean?.reviewExcerpts).toEqual(['Fix the null check in the handler.'])
+      expect(asReviewCandidate(clean).reviewExcerpts).toEqual(['Fix the null check in the handler.'])
       // #then enrichmentBlocked is 1
       expect(result.telemetry.enrichmentBlocked).toBe(1)
       // #then the private token does not appear in the serialized digest
@@ -1376,7 +1707,7 @@ describe('buildCandidateDigest — enrichment and upstream privacy scan', () =>
       const result = buildCandidateDigest(input)
 
       // #then the candidate passes through (empty token set = nothing to block)
-      expect(result.candidates[0]?.reviewExcerpts).toEqual(['Some review prose here.'])
+      expect(asReviewCandidate(result.candidates[0]).reviewExcerpts).toEqual(['Some review prose here.'])
       expect(result.telemetry.enrichmentBlocked).toBe(0)
     })
   })
@@ -1425,7 +1756,7 @@ describe('buildCandidateDigest — enrichment and upstream privacy scan', () =>
       // #then both candidates are still emitted (title-only)
       expect(result.candidates).toHaveLength(2)
       for (const c of result.candidates) {
-        expect(c.reviewExcerpts).toEqual([])
+        expect(asReviewCandidate(c).reviewExcerpts).toEqual([])
       }
     })
   })
@@ -1438,7 +1769,7 @@ describe('buildCandidateDigest — enrichment and upstream privacy scan', () =>
 describe('applyEnrichmentScanAvailability', () => {
   it('scanAvailable=true → candidates returned unchanged', () => {
     // #given candidates with reviewExcerpts
-    const candidates: Candidate[] = [
+    const candidates: ReviewCandidate[] = [
       makeCandidate({mergeSha: `sha1${'0'.repeat(35)}`, reviewExcerpts: ['Fix the null check.']}),
       makeCandidate({mergeSha: `sha2${'0'.repeat(35)}`, reviewExcerpts: ['Another correction.']}),
     ]
@@ -1448,13 +1779,15 @@ describe('applyEnrichmentScanAvailability', () => {
 
     // #then candidates are returned unchanged (same references)
     expect(result).toBe(candidates)
-    expect(result[0]?.reviewExcerpts).toEqual(['Fix the null check.'])
-    expect(result[1]?.reviewExcerpts).toEqual(['Another correction.'])
+    const r0 = result[0]
+    const r1 = result[1]
+    if (r0?.trigger === 'review-heavy') expect(r0.reviewExcerpts).toEqual(['Fix the null check.'])
+    if (r1?.trigger === 'review-heavy') expect(r1.reviewExcerpts).toEqual(['Another correction.'])
   })
 
   it('scanAvailable=false → all reviewExcerpts cleared (fail-closed)', () => {
     // #given candidates with reviewExcerpts
-    const candidates: Candidate[] = [
+    const candidates: ReviewCandidate[] = [
       makeCandidate({mergeSha: `sha1${'0'.repeat(35)}`, reviewExcerpts: ['Fix the null check.']}),
       makeCandidate({mergeSha: `sha2${'0'.repeat(35)}`, reviewExcerpts: ['Another correction.']}),
     ]
@@ -1463,8 +1796,10 @@ describe('applyEnrichmentScanAvailability', () => {
     const result = applyEnrichmentScanAvailability(candidates, false)
 
     // #then all reviewExcerpts are cleared — no unscanned prose reaches the digest
-    expect(result[0]?.reviewExcerpts).toEqual([])
-    expect(result[1]?.reviewExcerpts).toEqual([])
+    const r0 = result[0]
+    const r1 = result[1]
+    if (r0?.trigger === 'review-heavy') expect(r0.reviewExcerpts).toEqual([])
+    if (r1?.trigger === 'review-heavy') expect(r1.reviewExcerpts).toEqual([])
     // #then other candidate fields are preserved
     expect(result[0]?.mergeSha).toBe(candidates[0]?.mergeSha)
     expect(result[1]?.mergeSha).toBe(candidates[1]?.mergeSha)
@@ -1472,7 +1807,7 @@ describe('applyEnrichmentScanAvailability', () => {
 
   it('mutation proof: removing the clearing makes the false-case test fail', () => {
     // #given a candidate with reviewExcerpts
-    const candidates: Candidate[] = [
+    const candidates: ReviewCandidate[] = [
       makeCandidate({mergeSha: `sha1${'0'.repeat(35)}`, reviewExcerpts: ['Sensitive prose.']}),
     ]
 
@@ -1481,18 +1816,20 @@ describe('applyEnrichmentScanAvailability', () => {
 
     // #then reviewExcerpts must be empty — if the clearing logic were removed,
     // result[0].reviewExcerpts would still be ['Sensitive prose.'] and this assertion fails
-    expect(result[0]?.reviewExcerpts).toEqual([])
+    const r0 = result[0]
+    if (r0?.trigger === 'review-heavy') expect(r0.reviewExcerpts).toEqual([])
   })
 
   it('scanAvailable=false with empty reviewExcerpts → still returns empty (no-op, no crash)', () => {
     // #given candidates already with empty reviewExcerpts
-    const candidates: Candidate[] = [makeCandidate({mergeSha: `sha1${'0'.repeat(35)}`, reviewExcerpts: []})]
+    const candidates: ReviewCandidate[] = [makeCandidate({mergeSha: `sha1${'0'.repeat(35)}`, reviewExcerpts: []})]
 
     // #when scan is unavailable
     const result = applyEnrichmentScanAvailability(candidates, false)
 
     // #then still empty — no crash
-    expect(result[0]?.reviewExcerpts).toEqual([])
+    const r0 = result[0]
+    if (r0?.trigger === 'review-heavy') expect(r0.reviewExcerpts).toEqual([])
   })
 })
 
@@ -1528,7 +1865,7 @@ describe('harvestCandidates — enrichment fetch', () => {
 
     // #then the candidate has exactly 3 reviewExcerpts (2 review bodies + 1 thread comment)
     expect(candidates).toHaveLength(1)
-    const excerpts = candidates[0]?.reviewExcerpts ?? []
+    const excerpts = asReviewCandidate(candidates[0]).reviewExcerpts
     expect(excerpts).toHaveLength(3)
     // #then ranked order: CHANGES_REQUESTED (rank 0) first, thread comment (rank 0) second, APPROVED (rank 2) last
     expect(excerpts[0]).toBe('Please fix the null check in the handler.')
@@ -1557,12 +1894,12 @@ describe('harvestCandidates — enrichment fetch', () => {
     const {candidates} = await harvestCandidates(octokit, 'fro-bot', '.github', new Date())
 
     // #then the CHANGES_REQUESTED body appears before the APPROVED body in reviewExcerpts
-    expect(candidates[0]?.reviewExcerpts).toBeDefined()
-    const excerpts = candidates[0]?.reviewExcerpts ?? []
+    const excerpts = asReviewCandidate(candidates[0]).reviewExcerpts
+    expect(excerpts).toBeDefined()
     expect(excerpts.length).toBeGreaterThan(0)
     // The correction sentence must appear before the approval boilerplate
-    const correctionIdx = excerpts.findIndex(e => e.includes('null check'))
-    const approvalIdx = excerpts.findIndex(e => e.includes('Looks good'))
+    const correctionIdx = excerpts.findIndex((e: string) => e.includes('null check'))
+    const approvalIdx = excerpts.findIndex((e: string) => e.includes('Looks good'))
     expect(correctionIdx).not.toBe(-1)
     // If both are present, correction must come first
     if (approvalIdx !== -1) {
@@ -1594,7 +1931,7 @@ describe('harvestCandidates — enrichment fetch', () => {
     const {candidates} = await harvestCandidates(octokit, 'fro-bot', '.github', new Date())
 
     // #then the total excerpt length is within budget
-    const excerpts = candidates[0]?.reviewExcerpts ?? []
+    const excerpts = asReviewCandidate(candidates[0]).reviewExcerpts
     const totalChars = excerpts.join('').length
     expect(totalChars).toBeLessThanOrEqual(MAX_EXCERPT_CHARS_PER_CANDIDATE)
     // #then the correction sentence survives (ranked first, not clipped)
@@ -1625,7 +1962,7 @@ describe('harvestCandidates — enrichment fetch', () => {
     const {candidates} = await harvestCandidates(octokit, 'fro-bot', '.github', new Date())
 
     // #then exactly one excerpt (the truncated first item)
-    const excerpts = candidates[0]?.reviewExcerpts ?? []
+    const excerpts = asReviewCandidate(candidates[0]).reviewExcerpts
     expect(excerpts).toHaveLength(1)
     // #then it is truncated to exactly MAX_EXCERPT_CHARS_PER_CANDIDATE chars
     expect(excerpts[0]).toHaveLength(MAX_EXCERPT_CHARS_PER_CANDIDATE)
@@ -1653,7 +1990,7 @@ describe('harvestCandidates — enrichment fetch', () => {
     const {candidates} = await harvestCandidates(octokit, 'fro-bot', '.github', new Date())
 
     // #then excerpts are ordered by correction signal rank: CHANGES_REQUESTED (0) → DISMISSED (1) → APPROVED (2)
-    const excerpts = candidates[0]?.reviewExcerpts ?? []
+    const excerpts = asReviewCandidate(candidates[0]).reviewExcerpts
     expect(excerpts).toHaveLength(3)
     expect(excerpts[0]).toBe('Please fix the null check.')
     expect(excerpts[1]).toBe('This was dismissed due to scope change.')
@@ -1685,7 +2022,7 @@ describe('harvestCandidates — enrichment fetch', () => {
     const {candidates} = await harvestCandidates(octokit, 'fro-bot', '.github', new Date())
 
     // #then both pages of comments are included in reviewExcerpts
-    const excerptText = candidates[0]?.reviewExcerpts.join(' ') ?? ''
+    const excerptText = asReviewCandidate(candidates[0]).reviewExcerpts.join(' ')
     expect(excerptText).toContain('Page 1 comment')
     expect(excerptText).toContain('Page 2 comment')
   })
@@ -1730,7 +2067,7 @@ describe('harvestCandidates — enrichment fetch', () => {
     // #then the second candidate has its thread comment
     const c2 = candidates.find(c => c.mergeSha === sha2)
     expect(c2).toBeDefined()
-    const c2Text = c2?.reviewExcerpts.join(' ') ?? ''
+    const c2Text = asReviewCandidate(c2).reviewExcerpts.join(' ')
     expect(c2Text).toContain('Good comment from PR 2')
   })
 
@@ -1755,7 +2092,7 @@ describe('harvestCandidates — enrichment fetch', () => {
     const {candidates} = await harvestCandidates(octokit, 'fro-bot', '.github', new Date())
 
     // #then only the non-empty body is in reviewExcerpts
-    const excerpts = candidates[0]?.reviewExcerpts ?? []
+    const excerpts = asReviewCandidate(candidates[0]).reviewExcerpts
     expect(excerpts).toHaveLength(1)
     expect(excerpts[0]).toContain('null check issue')
   })
diff --git a/scripts/capture-learnings-harvest.ts b/scripts/capture-learnings-harvest.ts
index 20865cdee..e1d097888 100644
--- a/scripts/capture-learnings-harvest.ts
+++ b/scripts/capture-learnings-harvest.ts
@@ -16,7 +16,13 @@ import process from 'node:process'
 import {Octokit} from '@octokit/rest'
 import {parse} from 'yaml'
 
-import {isRecord, learningBodyHasPrivateLeak, loadPrivateTokensFromDisk} from './capture-learnings-privacy.ts'
+import {
+  isRecord,
+  learningBodyHasPrivateLeak,
+  loadPrivateTokensFromDisk,
+  logDiffHasSecret,
+  redactLogDiffSecrets,
+} from './capture-learnings-privacy.ts'
 
 // ---------------------------------------------------------------------------
 // Constants
@@ -107,13 +113,20 @@ export interface CandidateSignals {
 }
 
 /**
- * A candidate PR for a learning proposal.
+ * A candidate PR for a learning proposal — discriminated union on `trigger`.
  *
- * Carries no owner, repo, or PR number. `signals` includes tokens derived from the PR
- * title and labels, so title-derived tokens do reach the consuming agent — they are not
- * fully sanitized here. The deterministic open step is the privacy chokepoint: it scans
- * the final authored body for private identifiers and blocks before posting, so a private
- * token surfacing in a title token is gated downstream rather than leaked.
+ * Both variants carry no owner, repo, or PR number. `signals` includes tokens derived
+ * from the PR title and labels. The deterministic open step is the privacy chokepoint:
+ * it scans the final authored body for private identifiers and blocks before posting.
+ *
+ * Design: top-level discriminated fields per variant (not a nested `evidence` block).
+ * This keeps the diff minimal, preserves the existing ReviewCandidate shape exactly
+ * (only adding `trigger`), and avoids over-engineering before Unit 3 adds CiFix evidence.
+ */
+export type Candidate = ReviewCandidate | CiFixCandidate
+
+/**
+ * A candidate sourced from a PR with multiple Fro Bot review rounds.
  *
  * `reviewExcerpts` carries privacy-scanned review prose (review bodies + line-level thread
  * comments), ranked by correction signal (CHANGES_REQUESTED / thread replies first) and
@@ -121,7 +134,8 @@ export interface CandidateSignals {
  * upstream privacy scan or when no prose was available. Array form gives the agent clearer
  * structure than a single concatenated string.
  */
-export interface Candidate {
+export interface ReviewCandidate {
+  trigger: 'review-heavy'
   mergeSha: string
   /**
    * Number of Fro Bot substantive review rounds (APPROVED | CHANGES_REQUESTED | DISMISSED).
@@ -140,6 +154,25 @@ export interface Candidate {
   reviewExcerpts: string[]
 }
 
+/**
+ * A candidate sourced from a PR whose CI checks transitioned failed → passed before merge.
+ *
+ * Evidence fields (failingCheckName, lastFailingSha, firstPassingSha, diffExcerpt,
+ * logExcerpt) are populated in Unit 3. This placeholder carries the shared fields
+ * so the union type and digest pipeline are ready before the harvester is implemented.
+ */
+export interface CiFixCandidate {
+  trigger: 'ci-fail-then-pass'
+  mergeSha: string
+  signals: CandidateSignals
+  // Evidence fields populated in Unit 3:
+  // failingCheckName: string
+  // lastFailingSha: string
+  // firstPassingSha: string
+  // diffExcerpt: string
+  // logExcerpt?: string
+}
+
 /**
  * Harvest-stage counts threaded from `harvestCandidates` into the final telemetry.
  * Kept separate from the pure core so `buildCandidateDigest` remains I/O-free.
@@ -161,11 +194,18 @@ export interface DigestTelemetry {
   afterSolutionsDedup: number
   emitted: number
   /**
-   * Number of candidates whose enriched review-prose content was dropped by the
-   * upstream privacy scan. The candidate itself is kept (title-only); only the
-   * reviewExcerpts are cleared. Counts-only — no private names logged.
+   * Number of candidates whose enriched content was dropped by the private-name scan.
+   * The candidate itself is kept (title-only); only the enriched evidence is cleared.
+   * Counts-only — no private names logged.
    */
   enrichmentBlocked: number
+  /**
+   * Number of candidates whose enriched content was dropped because it contained a
+   * hard-secret shape (PAT, private key, credential-bearing connection string, etc.)
+   * detected by `logDiffHasSecret` after `redactLogDiffSecrets` was applied.
+   * Counts-only — no secret values logged.
+   */
+  enrichmentBlockedBySecret: number
 }
 
 /** Result of `buildCandidateDigest`. */
@@ -176,7 +216,7 @@ export interface CandidateDigest {
 
 /** Input to the pure core. */
 export interface BuildCandidateDigestInput {
-  /** Merged PRs that already passed the review-predicate filter. */
+  /** Candidates from all sources (review-heavy + ci-fail-then-pass). */
   mergedPrs: Candidate[]
   /** Harvest-stage counts to thread into the final telemetry. */
   stageCounts: HarvestStageCounts
@@ -234,6 +274,8 @@ export function parseMergeShaMarker(body: string): string | null {
  * Build the opaque candidate digest from injected inputs.
  *
  * Steps:
+ * 0. Within-run dedup: collapse to one candidate per mergeSha (a PR matching both triggers
+ *    appears once per source); review-heavy takes precedence (R4).
  * 1. Drop candidates whose mergeSha is already in openedLearningShas (seen-set dedup).
  * 2. Drop candidates whose signals strongly overlap an existing solutions doc (solutions dedup).
  * 3. Cap to maxLearnings.
@@ -246,8 +288,21 @@ export function parseMergeShaMarker(body: string): string | null {
  * No I/O. Fully unit-testable. The private token set is injected so the scan is pure.
  */
 export function buildCandidateDigest(input: BuildCandidateDigestInput): CandidateDigest {
+  // Within-run dedup: a PR that matched more than one trigger (e.g. review-heavy AND
+  // ci-fail-then-pass) appears once per source in mergedPrs. Collapse to one candidate per
+  // mergeSha so a single PR yields a single learning-proposal (R4). Precedence: review-heavy
+  // wins — review prose is the richer signal when a PR matched both.
+  const byMergeSha = new Map<string, Candidate>()
+  for (const pr of input.mergedPrs) {
+    const existing = byMergeSha.get(pr.mergeSha)
+    if (existing === undefined || (existing.trigger !== 'review-heavy' && pr.trigger === 'review-heavy')) {
+      byMergeSha.set(pr.mergeSha, pr)
+    }
+  }
+  const deduped = [...byMergeSha.values()]
+
   // drop already-proposed merge SHAs
-  const afterSeenDedup = input.mergedPrs.filter(pr => !input.openedLearningShas.has(pr.mergeSha))
+  const afterSeenDedup = deduped.filter(pr => !input.openedLearningShas.has(pr.mergeSha))
 
   // drop candidates whose signals overlap an existing solutions doc
   const afterSolutionsDedup = afterSeenDedup.filter(pr => !overlapsAnySolutionsDoc(pr.signals, input.solutionsDocs))
@@ -255,16 +310,51 @@ export function buildCandidateDigest(input: BuildCandidateDigestInput): Candidat
   // cap to maxLearnings
   const capped = afterSolutionsDedup.slice(0, input.maxLearnings)
 
-  // upstream privacy scan: scan each candidate's reviewExcerpts fail-closed
-  // on a hit → drop enriched content (empty reviewExcerpts), keep the candidate
+  // upstream privacy scan: scan each candidate's evidence text fail-closed.
+  //
+  // Privacy-ordering invariant: scan the already-truncated evidence text (truncation
+  // happens in the I/O shell before this pure core is called). Never move truncation
+  // after the scan — scanning a token and then truncating around it could leave a
+  // surviving tail that escapes the gate.
+  //
+  // For ReviewCandidate:
+  //   1. Redact structural secrets (paths, hostnames, Bearer tokens) via redactLogDiffSecrets.
+  //   2. If learningBodyHasPrivateLeak (private repo name) OR logDiffHasSecret (hard secret)
+  //      still true after redaction → clear reviewExcerpts + increment enrichmentBlockedBySecret.
+  //   3. Otherwise use the redacted excerpts.
+  //
+  // For CiFixCandidate (Unit 3 placeholder):
+  //   Evidence fields are not yet populated; no scan needed until Unit 3 adds them.
   let enrichmentBlocked = 0
+  let enrichmentBlockedBySecret = 0
   const candidates = capped.map(pr => {
-    if (pr.reviewExcerpts.length === 0) return pr
-    const excerptText = pr.reviewExcerpts.join('\n')
-    if (learningBodyHasPrivateLeak(excerptText, input.privateTokens)) {
-      enrichmentBlocked++
-      return {...pr, reviewExcerpts: []}
+    if (pr.trigger === 'review-heavy') {
+      if (pr.reviewExcerpts.length === 0) return pr
+
+      // Step 1: redact structural secrets from the already-truncated excerpts
+      const redactedExcerpts = pr.reviewExcerpts.map(e => redactLogDiffSecrets(e))
+      const redactedText = redactedExcerpts.join('\n')
+
+      // Step 2: check for private-name leak or residual hard secret after redaction
+      const hasPrivateLeak = learningBodyHasPrivateLeak(redactedText, input.privateTokens)
+      const hasResidualSecret = logDiffHasSecret(redactedText)
+
+      if (hasPrivateLeak) {
+        // Private-name hit: clear enriched content, keep candidate title-only
+        enrichmentBlocked++
+        return {...pr, reviewExcerpts: []}
+      }
+      if (hasResidualSecret) {
+        // Hard-secret residual after redaction: clear enriched content
+        enrichmentBlockedBySecret++
+        return {...pr, reviewExcerpts: []}
+      }
+
+      // Step 3: use the redacted excerpts (structural secrets replaced with [REDACTED])
+      return {...pr, reviewExcerpts: redactedExcerpts}
     }
+
+    // CiFixCandidate: no evidence fields yet (Unit 3 placeholder)
     return pr
   })
 
@@ -276,6 +366,7 @@ export function buildCandidateDigest(input: BuildCandidateDigestInput): Candidat
       afterSolutionsDedup: afterSolutionsDedup.length,
       emitted: candidates.length,
       enrichmentBlocked,
+      enrichmentBlockedBySecret,
     },
   }
 }
@@ -283,16 +374,23 @@ export function buildCandidateDigest(input: BuildCandidateDigestInput): Candidat
 /**
  * Apply fail-closed enrichment scan availability to a list of candidates.
  *
- * When `scanAvailable` is false (private token load failed), all `reviewExcerpts`
- * are cleared so no unscanned prose reaches the digest. When `scanAvailable` is
- * true, candidates are returned unchanged.
+ * When `scanAvailable` is false (private token load failed), all enriched evidence
+ * is cleared so no unscanned content reaches the digest. For ReviewCandidate, this
+ * clears `reviewExcerpts`. For CiFixCandidate, no evidence fields exist yet (Unit 3).
+ * When `scanAvailable` is true, candidates are returned unchanged.
  *
  * Pure function: no I/O, fully unit-testable. Extracted so the fail-closed
  * composition can be tested independently of the I/O shell.
  */
 export function applyEnrichmentScanAvailability(candidates: Candidate[], scanAvailable: boolean): Candidate[] {
   if (scanAvailable) return candidates
-  return candidates.map(c => ({...c, reviewExcerpts: [] as string[]}))
+  return candidates.map(c => {
+    if (c.trigger === 'review-heavy') {
+      return {...c, reviewExcerpts: [] as string[]}
+    }
+    // CiFixCandidate: no evidence fields yet (Unit 3 placeholder)
+    return c
+  })
 }
 
 /**
@@ -712,6 +810,7 @@ export async function harvestCandidates(
     const reviewExcerpts = buildReviewExcerpts(reviewBodies, threadCommentBodies)
 
     candidates.push({
+      trigger: 'review-heavy',
       mergeSha: pr.merge_commit_sha,
       reviewRounds: substantiveReviewCount,
       signals: deriveSignals({
@@ -835,6 +934,7 @@ async function main(): Promise<void> {
         afterSolutionsDedup: 0,
         emitted: 0,
         enrichmentBlocked: 0,
+        enrichmentBlockedBySecret: 0,
       },
     }
     try {
diff --git a/scripts/capture-learnings-open.test.ts b/scripts/capture-learnings-open.test.ts
index 3778c00ba..7cd27d514 100644
--- a/scripts/capture-learnings-open.test.ts
+++ b/scripts/capture-learnings-open.test.ts
@@ -34,6 +34,7 @@ import {buildPrivateTokenSet} from './wiki-slug.ts'
 
 function makeCandidate(overrides: Partial<Candidate> = {}): Candidate {
   return {
+    trigger: 'review-heavy',
     mergeSha: 'abc123def456abc123def456abc123def456abc1',
     reviewRounds: 2,
     signals: {titleTokens: ['feat', 'scripts'], labels: []},

From 5c586648c4bf2cf4743b2b5c3374e5494a5f46fb Mon Sep 17 00:00:00 2001
From: "Marcus R. Brown" <git@mrbro.dev>
Date: Tue, 23 Jun 2026 15:24:06 -0700
Subject: [PATCH 4/7] feat(capture): detect failed-then-fixed PRs and carry the
 fix as evidence

Add the CI fail-then-pass harvester: one GraphQL query per PR yields per-commit
check conclusions, a chronological walk finds the last-failing and first-passing
commits for a required check, and the fixing diff between them becomes the primary
evidence. A best-effort failing-log excerpt enriches it and degrades to a
placeholder when logs are gone. The diff and log pass the same redact-and-scan
privacy gate as review prose, dropping on a private name or residual secret. A PR
with no fixing diff (a bare green re-run) is not a candidate.
---
 scripts/capture-learnings-harvest.test.ts | 651 +++++++++++++++++++++-
 scripts/capture-learnings-harvest.ts      | 566 ++++++++++++++++++-
 scripts/capture-learnings-open.test.ts    |   3 +-
 3 files changed, 1200 insertions(+), 20 deletions(-)

diff --git a/scripts/capture-learnings-harvest.test.ts b/scripts/capture-learnings-harvest.test.ts
index b79084688..7a934ffad 100644
--- a/scripts/capture-learnings-harvest.test.ts
+++ b/scripts/capture-learnings-harvest.test.ts
@@ -15,8 +15,10 @@ import {
   buildMergeShaMarker,
   DEPENDENCY_LABELS,
   fetchOpenedLearningShas,
+  findFailPassTransition,
   FRO_BOT_REVIEWER_LOGINS,
   harvestCandidates,
+  harvestCiFixCandidates,
   LEARNING_PROPOSAL_LABEL,
   MAX_EXCERPT_CHARS_PER_CANDIDATE,
   parseMergeShaMarker,
@@ -24,6 +26,8 @@ import {
   type Candidate,
   type CandidateDigest,
   type CiFixCandidate,
+  type CommitCheckEntry,
+  type GhExecFn,
   type HarvestStageCounts,
   type OctokitClient,
   type ReviewCandidate,
@@ -51,6 +55,10 @@ function makeCiFixCandidate(overrides: Partial<CiFixCandidate> = {}): CiFixCandi
     trigger: 'ci-fail-then-pass',
     mergeSha: 'cifix001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1',
     signals: {titleTokens: ['fix', 'ci'], labels: []},
+    failingCheckName: 'CI / test',
+    lastFailingSha: 'fail0001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1',
+    firstPassingSha: 'pass0001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1',
+    diffExcerpt: '--- a/scripts/foo.ts\n+++ b/scripts/foo.ts\n@@ -1 +1 @@\n-bad\n+good',
     ...overrides,
   }
 }
@@ -81,6 +89,8 @@ function makeZeroStageCounts(): HarvestStageCounts {
     mergedPrsInLookback: 0,
     excludedAutomation: 0,
     multiRoundCandidates: 0,
+    ciFixPrsExamined: 0,
+    ciFixCandidates: 0,
   }
 }
 
@@ -198,8 +208,8 @@ describe('buildCandidateDigest', () => {
       }
     })
 
-    it('ci-fail-then-pass candidate has ONLY trigger, mergeSha, signals — no owner/repo/number/title', () => {
-      // #given a ci-fix candidate
+    it('ci-fail-then-pass candidate has ONLY the allowed evidence keys — no owner/repo/number/title', () => {
+      // #given a ci-fix candidate with all evidence fields (no logExcerpt)
       const ciFixCandidate = makeCiFixCandidate()
       const input = makeDigestInput({mergedPrs: [ciFixCandidate]})
 
@@ -210,7 +220,18 @@ describe('buildCandidateDigest', () => {
       for (const candidate of result.candidates) {
         if (candidate.trigger !== 'ci-fail-then-pass') continue
         const keys = Object.keys(candidate).sort()
-        expect(keys).toEqual(['mergeSha', 'signals', 'trigger'].sort())
+        // logExcerpt is optional — may or may not be present
+        const allowedKeys = [
+          'trigger',
+          'mergeSha',
+          'signals',
+          'failingCheckName',
+          'lastFailingSha',
+          'firstPassingSha',
+          'diffExcerpt',
+        ].sort()
+        const allowedKeysWithLog = [...allowedKeys, 'logExcerpt'].sort()
+        expect([allowedKeys, allowedKeysWithLog]).toContainEqual(keys)
         // Explicitly assert forbidden keys are absent
         expect(candidate).not.toHaveProperty('owner')
         expect(candidate).not.toHaveProperty('repo')
@@ -401,6 +422,8 @@ describe('buildCandidateDigest', () => {
         mergedPrsInLookback: 10,
         excludedAutomation: 2,
         multiRoundCandidates: 5,
+        ciFixPrsExamined: 0,
+        ciFixCandidates: 0,
       }
 
       const input = makeDigestInput({
@@ -534,8 +557,8 @@ describe('buildCandidateDigest — characterization: review-heavy emitted shape'
 // ---------------------------------------------------------------------------
 
 describe('buildCandidateDigest — CiFixCandidate union support', () => {
-  it('ci-fail-then-pass candidate passes through dedup, cap, and privacy unchanged', () => {
-    // #given a ci-fix candidate (no evidence fields yet — Unit 3 placeholder)
+  it('ci-fail-then-pass candidate passes through dedup, cap, and privacy with clean evidence', () => {
+    // #given a ci-fix candidate with clean evidence (no secrets, no private names)
     const ciFixCandidate = makeCiFixCandidate({
       mergeSha: 'cifix001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1',
     })
@@ -548,7 +571,12 @@ describe('buildCandidateDigest — CiFixCandidate union support', () => {
     expect(result.candidates).toHaveLength(1)
     expect(result.candidates[0]?.trigger).toBe('ci-fail-then-pass')
     expect(result.candidates[0]?.mergeSha).toBe('cifix001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1')
-    // #then no enrichment was blocked (no evidence to scan yet)
+    // #then evidence fields are preserved (clean content passes through)
+    if (result.candidates[0]?.trigger === 'ci-fail-then-pass') {
+      expect(result.candidates[0].failingCheckName).toBe('CI / test')
+      expect(result.candidates[0].diffExcerpt).toContain('-bad')
+    }
+    // #then no enrichment was blocked (clean content)
     expect(result.telemetry.enrichmentBlocked).toBe(0)
     expect(result.telemetry.enrichmentBlockedBySecret).toBe(0)
   })
@@ -2097,3 +2125,614 @@ describe('harvestCandidates — enrichment fetch', () => {
     expect(excerpts[0]).toContain('null check issue')
   })
 })
+
+// ---------------------------------------------------------------------------
+// Pure core: findFailPassTransition (test-first, the correctness core)
+// ---------------------------------------------------------------------------
+
+// Helper to build a CheckRun entry (outer scope for unicorn/consistent-function-scoping)
+function makeCr(sha: string, name: string, conclusion: string): CommitCheckEntry {
+  return {type: 'CheckRun', sha, name, conclusion}
+}
+// Helper to build a StatusContext entry (outer scope for unicorn/consistent-function-scoping)
+function makeSc(sha: string, context: string, state: string): CommitCheckEntry {
+  return {type: 'StatusContext', sha, context, state}
+}
+
+describe('findFailPassTransition', () => {
+  it('happy path: clean fail→pass on a required check returns correct SHAs', () => {
+    // #given commits oldest→newest: sha1=FAILURE, sha2=SUCCESS
+    const commits: CommitCheckEntry[] = [makeCr('sha1', 'CI / test', 'FAILURE'), makeCr('sha2', 'CI / test', 'SUCCESS')]
+    const required = new Set(['CI / test'])
+
+    // #when finding the transition
+    const result = findFailPassTransition(commits, required)
+
+    // #then the transition is found with correct SHAs
+    expect(result).not.toBeNull()
+    expect(result?.failingCheckName).toBe('CI / test')
+    expect(result?.lastFailingSha).toBe('sha1')
+    expect(result?.firstPassingSha).toBe('sha2')
+  })
+
+  it('never-failed: all SUCCESS → null (no transition)', () => {
+    // #given commits where the check always passes
+    const commits: CommitCheckEntry[] = [makeCr('sha1', 'CI / test', 'SUCCESS'), makeCr('sha2', 'CI / test', 'SUCCESS')]
+    const required = new Set(['CI / test'])
+
+    // #when finding the transition
+    const result = findFailPassTransition(commits, required)
+
+    // #then no transition found
+    expect(result).toBeNull()
+  })
+
+  it('failed-and-stayed-failed: FAILURE then FAILURE → null', () => {
+    // #given commits where the check fails and never recovers
+    const commits: CommitCheckEntry[] = [makeCr('sha1', 'CI / test', 'FAILURE'), makeCr('sha2', 'CI / test', 'FAILURE')]
+    const required = new Set(['CI / test'])
+
+    // #when finding the transition
+    const result = findFailPassTransition(commits, required)
+
+    // #then no transition found
+    expect(result).toBeNull()
+  })
+
+  it('check not in required set → ignored (no transition returned)', () => {
+    // #given a fail→pass on 'CI / lint' but required set only has 'CI / test'
+    const commits: CommitCheckEntry[] = [makeCr('sha1', 'CI / lint', 'FAILURE'), makeCr('sha2', 'CI / lint', 'SUCCESS')]
+    const required = new Set(['CI / test'])
+
+    // #when finding the transition
+    const result = findFailPassTransition(commits, required)
+
+    // #then no transition found (wrong check name)
+    expect(result).toBeNull()
+  })
+
+  it('multiple checks: only one transitions → returns the transitioning check', () => {
+    // #given two checks: 'CI / test' transitions, 'CI / lint' never fails
+    const commits: CommitCheckEntry[] = [
+      makeCr('sha1', 'CI / test', 'FAILURE'),
+      makeCr('sha1', 'CI / lint', 'SUCCESS'),
+      makeCr('sha2', 'CI / test', 'SUCCESS'),
+      makeCr('sha2', 'CI / lint', 'SUCCESS'),
+    ]
+    const required = new Set(['CI / test', 'CI / lint'])
+
+    // #when finding the transition
+    const result = findFailPassTransition(commits, required)
+
+    // #then the transitioning check is returned
+    expect(result).not.toBeNull()
+    expect(result?.failingCheckName).toBe('CI / test')
+    expect(result?.lastFailingSha).toBe('sha1')
+    expect(result?.firstPassingSha).toBe('sha2')
+  })
+
+  it('fail-pass-fail-pass: picks the LAST failing SHA then first passing after it', () => {
+    // #given a check that fails, passes, fails again, then passes
+    // The LAST failing SHA is sha3; the first passing after it is sha4.
+    const commits: CommitCheckEntry[] = [
+      makeCr('sha1', 'CI / test', 'FAILURE'), // first failure
+      makeCr('sha2', 'CI / test', 'SUCCESS'), // first pass
+      makeCr('sha3', 'CI / test', 'FAILURE'), // second failure (LAST failing)
+      makeCr('sha4', 'CI / test', 'SUCCESS'), // second pass (first after last failing)
+    ]
+    const required = new Set(['CI / test'])
+
+    // #when finding the transition
+    const result = findFailPassTransition(commits, required)
+
+    // #then lastFailingSha is sha3 (the LAST failure), firstPassingSha is sha4
+    expect(result).not.toBeNull()
+    expect(result?.lastFailingSha).toBe('sha3')
+    expect(result?.firstPassingSha).toBe('sha4')
+  })
+
+  it('StatusContext (legacy status) transition: state=FAILURE → state=SUCCESS', () => {
+    // #given StatusContext entries (not CheckRun)
+    const commits: CommitCheckEntry[] = [makeSc('sha1', 'ci/test', 'FAILURE'), makeSc('sha2', 'ci/test', 'SUCCESS')]
+    const required = new Set(['ci/test'])
+
+    // #when finding the transition
+    const result = findFailPassTransition(commits, required)
+
+    // #then the transition is found
+    expect(result).not.toBeNull()
+    expect(result?.failingCheckName).toBe('ci/test')
+    expect(result?.lastFailingSha).toBe('sha1')
+    expect(result?.firstPassingSha).toBe('sha2')
+  })
+
+  it('all failing conclusions are recognized: TIMED_OUT, CANCELLED, ACTION_REQUIRED, STARTUP_FAILURE', () => {
+    // #given each failing conclusion type followed by SUCCESS
+    const failingConclusions = ['TIMED_OUT', 'CANCELLED', 'ACTION_REQUIRED', 'STARTUP_FAILURE']
+    for (const conclusion of failingConclusions) {
+      const commits: CommitCheckEntry[] = [
+        makeCr('sha1', 'CI / test', conclusion),
+        makeCr('sha2', 'CI / test', 'SUCCESS'),
+      ]
+      const required = new Set(['CI / test'])
+      const result = findFailPassTransition(commits, required)
+      expect(result).not.toBeNull()
+      expect(result?.lastFailingSha).toBe('sha1')
+    }
+  })
+
+  it('empty required set → any failed→passed transition counts (no branch protection)', () => {
+    // #given an empty required set and a check that transitions
+    const commits: CommitCheckEntry[] = [makeCr('sha1', 'CI / test', 'FAILURE'), makeCr('sha2', 'CI / test', 'SUCCESS')]
+    const required = new Set<string>() // empty = any check counts
+
+    // #when finding the transition
+    const result = findFailPassTransition(commits, required)
+
+    // #then the transition is found (any check counts)
+    expect(result).not.toBeNull()
+    expect(result?.failingCheckName).toBe('CI / test')
+  })
+
+  it('empty commits array → null', () => {
+    // #given no commits
+    const result = findFailPassTransition([], new Set(['CI / test']))
+    expect(result).toBeNull()
+  })
+
+  it('MUTATION PROOF: reversing oldest→newest ordering picks the wrong SHA', () => {
+    // #given commits in chronological order: sha1=FAILURE, sha2=FAILURE, sha3=SUCCESS
+    // The correct lastFailingSha is sha2 (the LATEST failure).
+    // If the array were reversed (newest→oldest), the walk would find sha2 as the
+    // "first" failure and sha1 as the "first" passing after it — but sha1 is not SUCCESS.
+    // This fixture proves the ordering invariant is load-bearing.
+    const commitsOldestFirst: CommitCheckEntry[] = [
+      makeCr('sha1', 'CI / test', 'FAILURE'), // older failure
+      makeCr('sha2', 'CI / test', 'FAILURE'), // newer failure (LAST)
+      makeCr('sha3', 'CI / test', 'SUCCESS'), // first passing after last failure
+    ]
+    const required = new Set(['CI / test'])
+
+    // #when finding the transition with correct ordering
+    const result = findFailPassTransition(commitsOldestFirst, required)
+
+    // #then lastFailingSha is sha2 (the LATEST failure), firstPassingSha is sha3
+    expect(result).not.toBeNull()
+    expect(result?.lastFailingSha).toBe('sha2')
+    expect(result?.firstPassingSha).toBe('sha3')
+
+    // #when the array is reversed (newest→oldest — wrong ordering)
+    const commitsNewestFirst = [...commitsOldestFirst].reverse()
+    const resultReversed = findFailPassTransition(commitsNewestFirst, required)
+
+    // #then the reversed ordering picks sha3 as lastFailingSha (wrong!) because
+    // it encounters sha3=SUCCESS first, then sha2=FAILURE (which becomes lastFailingSha),
+    // then sha1=FAILURE (which also becomes lastFailingSha), and there's no SUCCESS after sha1.
+    // So the reversed result is null (no transition found after the last failure in reversed order).
+    // Either way, the result differs from the correct ordering — proving the invariant.
+    expect(resultReversed?.lastFailingSha).not.toBe('sha2')
+  })
+})
+
+// ---------------------------------------------------------------------------
+// I/O shell: harvestCiFixCandidates (mocked gh-exec + octokit)
+// ---------------------------------------------------------------------------
+
+/**
+ * Build a minimal GraphQL response for fetchPrCommitCheckRollup.
+ * commits: array of {oid, checks: [{name, conclusion}]}
+ */
+function makeGraphQLResponse(commits: {oid: string; checks: {name: string; conclusion: string}[]}[]): string {
+  return JSON.stringify({
+    data: {
+      repository: {
+        pullRequest: {
+          commits: {
+            pageInfo: {hasNextPage: false, endCursor: null},
+            nodes: commits.map(c => ({
+              commit: {
+                oid: c.oid,
+                statusCheckRollup: {
+                  contexts: {
+                    nodes: c.checks.map(ch => ({
+                      __typename: 'CheckRun',
+                      name: ch.name,
+                      conclusion: ch.conclusion,
+                    })),
+                  },
+                },
+              },
+            })),
+          },
+        },
+      },
+    },
+  })
+}
+
+/** Build a merged PR item for harvestCiFixCandidates */
+function makeMergedPrItem(
+  overrides: {
+    number?: number
+    merge_commit_sha?: string
+    title?: string
+    labels?: {name: string}[]
+    user?: {login: string} | null
+  } = {},
+) {
+  return {
+    number: overrides.number ?? 1,
+    merge_commit_sha: overrides.merge_commit_sha ?? 'merge001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1',
+    title: overrides.title ?? 'fix: correct the CI failure',
+    labels: overrides.labels ?? [],
+    user: overrides.user ?? {login: 'some-human'},
+  }
+}
+
+/** Build a mock OctokitClient for harvestCiFixCandidates */
+function mockOctokitForCiFix(
+  overrides: {
+    compareCommits?: () => Promise<{data: {files: {filename: string; patch: string}[]}}>
+    listWorkflowRunsForRepo?: () => Promise<{data: {workflow_runs: {id: number; conclusion: string}[]}}>
+    listJobsForWorkflowRun?: () => Promise<{data: {jobs: {id: number; conclusion: string}[]}}>
+    downloadJobLogsForWorkflowRun?: () => Promise<{data: string}>
+  } = {},
+): OctokitClient {
+  return {
+    paginate: async () => [],
+    rest: {
+      pulls: {list: async () => ({data: []}), listReviews: async () => ({data: []})},
+      issues: {listForRepo: async () => ({data: []})},
+      repos: {
+        compareCommits:
+          overrides.compareCommits ??
+          (async () => ({
+            data: {
+              files: [{filename: 'scripts/foo.ts', patch: '@@ -1 +1 @@\n-bad\n+good'}],
+            },
+          })),
+      },
+      actions: {
+        listWorkflowRunsForRepo:
+          overrides.listWorkflowRunsForRepo ??
+          (async () => ({data: {workflow_runs: [{id: 42, conclusion: 'failure'}]}})),
+        listJobsForWorkflowRun:
+          overrides.listJobsForWorkflowRun ?? (async () => ({data: {jobs: [{id: 99, conclusion: 'failure'}]}})),
+        downloadJobLogsForWorkflowRun:
+          overrides.downloadJobLogsForWorkflowRun ?? (async () => ({data: 'Error: test failed\nsome other output'})),
+      },
+    },
+  } as unknown as OctokitClient
+}
+
+describe('harvestCiFixCandidates', () => {
+  it('happy path: PR with fail→pass transition → one CiFixCandidate with correct SHAs and diffExcerpt', async () => {
+    // #given a merged PR with a failing commit then a passing commit
+    const pr = makeMergedPrItem({
+      number: 1,
+      merge_commit_sha: 'merge001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1',
+    })
+    const ghExec: GhExecFn = () =>
+      makeGraphQLResponse([
+        {oid: 'sha1fail', checks: [{name: 'CI / test', conclusion: 'FAILURE'}]},
+        {oid: 'sha2pass', checks: [{name: 'CI / test', conclusion: 'SUCCESS'}]},
+      ])
+    const octokit = mockOctokitForCiFix()
+
+    // #when harvesting
+    const result = await harvestCiFixCandidates(
+      octokit,
+      'fro-bot',
+      '.github',
+      new Date(),
+      [pr],
+      new Set<string>(), // empty = any check counts
+      ghExec,
+    )
+
+    // #then one candidate is returned
+    expect(result.candidates).toHaveLength(1)
+    const candidate = result.candidates[0]
+    expect(candidate?.trigger).toBe('ci-fail-then-pass')
+    expect(candidate?.mergeSha).toBe('merge001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1')
+    expect(candidate?.failingCheckName).toBe('CI / test')
+    expect(candidate?.lastFailingSha).toBe('sha1fail')
+    expect(candidate?.firstPassingSha).toBe('sha2pass')
+    expect(candidate?.diffExcerpt).toContain('-bad')
+    expect(candidate?.diffExcerpt).toContain('+good')
+    // #then telemetry counts are correct
+    expect(result.ciFixPrsExamined).toBe(1)
+    expect(result.ciFixCandidates).toBe(1)
+  })
+
+  it('PR that never failed → no candidate', async () => {
+    // #given a PR where the check always passes
+    const pr = makeMergedPrItem()
+    const ghExec: GhExecFn = () =>
+      makeGraphQLResponse([
+        {oid: 'sha1', checks: [{name: 'CI / test', conclusion: 'SUCCESS'}]},
+        {oid: 'sha2', checks: [{name: 'CI / test', conclusion: 'SUCCESS'}]},
+      ])
+    const octokit = mockOctokitForCiFix()
+
+    // #when harvesting
+    const result = await harvestCiFixCandidates(octokit, 'fro-bot', '.github', new Date(), [pr], new Set(), ghExec)
+
+    // #then no candidate
+    expect(result.candidates).toHaveLength(0)
+    expect(result.ciFixCandidates).toBe(0)
+  })
+
+  it('PR that failed and stayed failing → no candidate', async () => {
+    // #given a PR where the check fails and never recovers
+    const pr = makeMergedPrItem()
+    const ghExec: GhExecFn = () =>
+      makeGraphQLResponse([
+        {oid: 'sha1', checks: [{name: 'CI / test', conclusion: 'FAILURE'}]},
+        {oid: 'sha2', checks: [{name: 'CI / test', conclusion: 'FAILURE'}]},
+      ])
+    const octokit = mockOctokitForCiFix()
+
+    // #when harvesting
+    const result = await harvestCiFixCandidates(octokit, 'fro-bot', '.github', new Date(), [pr], new Set(), ghExec)
+
+    // #then no candidate
+    expect(result.candidates).toHaveLength(0)
+  })
+
+  it('bare re-run: compareCommits returns empty files → candidate dropped', async () => {
+    // #given a PR with a transition but compareCommits returns no files (bare re-run)
+    const pr = makeMergedPrItem()
+    const ghExec: GhExecFn = () =>
+      makeGraphQLResponse([
+        {oid: 'sha1', checks: [{name: 'CI / test', conclusion: 'FAILURE'}]},
+        {oid: 'sha2', checks: [{name: 'CI / test', conclusion: 'SUCCESS'}]},
+      ])
+    const octokit = mockOctokitForCiFix({
+      compareCommits: async () => ({data: {files: []}}),
+    })
+
+    // #when harvesting
+    const result = await harvestCiFixCandidates(octokit, 'fro-bot', '.github', new Date(), [pr], new Set(), ghExec)
+
+    // #then candidate is dropped (no real fixing diff)
+    expect(result.candidates).toHaveLength(0)
+  })
+
+  it('logs purged: downloadJobLogsForWorkflowRun throws → logExcerpt placeholder, candidate still emitted', async () => {
+    // #given a PR with a transition and a log fetch that throws
+    const pr = makeMergedPrItem()
+    const ghExec: GhExecFn = () =>
+      makeGraphQLResponse([
+        {oid: 'sha1', checks: [{name: 'CI / test', conclusion: 'FAILURE'}]},
+        {oid: 'sha2', checks: [{name: 'CI / test', conclusion: 'SUCCESS'}]},
+      ])
+    const octokit = mockOctokitForCiFix({
+      downloadJobLogsForWorkflowRun: async () => {
+        throw Object.assign(new Error('Not Found'), {status: 404})
+      },
+    })
+
+    // #when harvesting
+    const result = await harvestCiFixCandidates(octokit, 'fro-bot', '.github', new Date(), [pr], new Set(), ghExec)
+
+    // #then candidate is still emitted (log failure doesn't block)
+    expect(result.candidates).toHaveLength(1)
+    // #then logExcerpt is the placeholder
+    expect(result.candidates[0]?.logExcerpt).toBe('[failure log purged or unavailable]')
+    // #then diffExcerpt is still populated
+    expect(result.candidates[0]?.diffExcerpt).toContain('-bad')
+  })
+
+  it('GraphQL error for one PR degrades it, others proceed', async () => {
+    // #given two PRs: first GraphQL call throws, second succeeds
+    const pr1 = makeMergedPrItem({number: 1, merge_commit_sha: 'merge001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'})
+    const pr2 = makeMergedPrItem({number: 2, merge_commit_sha: 'merge002aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'})
+
+    let callCount = 0
+    const ghExec: GhExecFn = () => {
+      callCount++
+      if (callCount === 1) throw new Error('GraphQL error')
+      return makeGraphQLResponse([
+        {oid: 'sha1', checks: [{name: 'CI / test', conclusion: 'FAILURE'}]},
+        {oid: 'sha2', checks: [{name: 'CI / test', conclusion: 'SUCCESS'}]},
+      ])
+    }
+    const octokit = mockOctokitForCiFix()
+
+    // #when harvesting
+    const result = await harvestCiFixCandidates(
+      octokit,
+      'fro-bot',
+      '.github',
+      new Date(),
+      [pr1, pr2],
+      new Set(),
+      ghExec,
+    )
+
+    // #then only the second PR produces a candidate (first was degraded)
+    expect(result.candidates).toHaveLength(1)
+    expect(result.candidates[0]?.mergeSha).toBe('merge002aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1')
+    expect(result.ciFixPrsExamined).toBe(2)
+    expect(result.ciFixCandidates).toBe(1)
+  })
+
+  it('PR with null statusCheckRollup commits → no candidate', async () => {
+    // #given a PR where commits have no statusCheckRollup (null)
+    const pr = makeMergedPrItem()
+    const ghExec: GhExecFn = () =>
+      JSON.stringify({
+        data: {
+          repository: {
+            pullRequest: {
+              commits: {
+                pageInfo: {hasNextPage: false, endCursor: null},
+                nodes: [
+                  {commit: {oid: 'sha1', statusCheckRollup: null}},
+                  {commit: {oid: 'sha2', statusCheckRollup: null}},
+                ],
+              },
+            },
+          },
+        },
+      })
+    const octokit = mockOctokitForCiFix()
+
+    // #when harvesting
+    const result = await harvestCiFixCandidates(octokit, 'fro-bot', '.github', new Date(), [pr], new Set(), ghExec)
+
+    // #then no candidate (no check data to find a transition)
+    expect(result.candidates).toHaveLength(0)
+  })
+
+  it('required-checks empty → any transition counts', async () => {
+    // #given an empty required set and a PR with a transition on any check
+    const pr = makeMergedPrItem()
+    const ghExec: GhExecFn = () =>
+      makeGraphQLResponse([
+        {oid: 'sha1', checks: [{name: 'some-arbitrary-check', conclusion: 'FAILURE'}]},
+        {oid: 'sha2', checks: [{name: 'some-arbitrary-check', conclusion: 'SUCCESS'}]},
+      ])
+    const octokit = mockOctokitForCiFix()
+
+    // #when harvesting with empty required set
+    const result = await harvestCiFixCandidates(octokit, 'fro-bot', '.github', new Date(), [pr], new Set(), ghExec)
+
+    // #then the candidate is found (any check counts)
+    expect(result.candidates).toHaveLength(1)
+    expect(result.candidates[0]?.failingCheckName).toBe('some-arbitrary-check')
+  })
+})
+
+// ---------------------------------------------------------------------------
+// Privacy: CiFix evidence scanned in buildCandidateDigest (mutation proof)
+// ---------------------------------------------------------------------------
+
+describe('buildCandidateDigest — CiFix evidence privacy scan (mutation proof)', () => {
+  it('MUTATION PROOF: CiFix candidate with a ghp_ secret in diffExcerpt → evidence cleared, enrichmentBlockedBySecret incremented', () => {
+    // #given a ci-fix candidate whose diffExcerpt contains a GitHub PAT
+    // This test proves the secret scan is wired for CiFix evidence.
+    const sha = 'cifixsec1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const secretDiff =
+      '--- a/config.ts\n+++ b/config.ts\n@@ -1 +1 @@\n-old\n+token=ghp_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1'
+    const candidate = makeCiFixCandidate({
+      mergeSha: sha,
+      diffExcerpt: secretDiff,
+    })
+    const input = makeDigestInput({
+      mergedPrs: [candidate],
+      privateTokens: new Set(), // no private-name tokens — only secret scan fires
+    })
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then the candidate is kept (not dropped)
+    expect(result.candidates).toHaveLength(1)
+    // #then diffExcerpt is cleared (secret blocked)
+    if (result.candidates[0]?.trigger === 'ci-fail-then-pass') {
+      expect(result.candidates[0].diffExcerpt).toBe('')
+      expect(result.candidates[0].logExcerpt).toBeUndefined()
+    }
+    // #then enrichmentBlockedBySecret is incremented
+    expect(result.telemetry.enrichmentBlockedBySecret).toBe(1)
+    // #then the secret does NOT appear in the serialized digest
+    const serialized = JSON.stringify(result)
+    expect(serialized).not.toContain('ghp_')
+  })
+
+  it('MUTATION PROOF: removing the CiFix secret scan lets the ghp_ token reach the digest', () => {
+    // #given a ci-fix candidate with a PAT in diffExcerpt
+    const sha = 'cifixsec2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const secretDiff = 'token=ghp_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA2'
+    const candidate = makeCiFixCandidate({mergeSha: sha, diffExcerpt: secretDiff})
+
+    // #when the scan IS applied (normal path)
+    const withScan = buildCandidateDigest(makeDigestInput({mergedPrs: [candidate], privateTokens: new Set()}))
+    // #then the secret is NOT in the output
+    expect(JSON.stringify(withScan)).not.toContain('ghp_')
+    expect(withScan.telemetry.enrichmentBlockedBySecret).toBe(1)
+
+    // #when a clean candidate is used (simulating scan bypass)
+    const cleanCandidate = makeCiFixCandidate({mergeSha: sha, diffExcerpt: '-bad\n+good'})
+    const withoutSecret = buildCandidateDigest(makeDigestInput({mergedPrs: [cleanCandidate], privateTokens: new Set()}))
+    // #then the clean diff DOES appear — proving the scan only blocks secrets
+    if (withoutSecret.candidates[0]?.trigger === 'ci-fail-then-pass') {
+      expect(withoutSecret.candidates[0].diffExcerpt).toContain('-bad')
+    }
+    expect(withoutSecret.telemetry.enrichmentBlockedBySecret).toBe(0)
+  })
+
+  it('CiFix candidate with private-name in diffExcerpt → evidence cleared, enrichmentBlocked incremented', () => {
+    // #given a ci-fix candidate whose diffExcerpt contains a private repo name
+    const privateTokens = buildPrivateTokenSet(['testowner/secret-repo'])
+    const sha = 'cifixpriv1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const candidate = makeCiFixCandidate({
+      mergeSha: sha,
+      diffExcerpt: 'See testowner/secret-repo for the fix.',
+    })
+    const input = makeDigestInput({mergedPrs: [candidate], privateTokens})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then enrichmentBlocked is incremented (private-name counter)
+    expect(result.telemetry.enrichmentBlocked).toBe(1)
+    expect(result.telemetry.enrichmentBlockedBySecret).toBe(0)
+    // #then diffExcerpt is cleared
+    if (result.candidates[0]?.trigger === 'ci-fail-then-pass') {
+      expect(result.candidates[0].diffExcerpt).toBe('')
+    }
+    // #then private name does not appear in serialized output
+    expect(JSON.stringify(result)).not.toContain('testowner/secret-repo')
+  })
+
+  it('CiFix candidate with path in diffExcerpt → path redacted, candidate still emitted', () => {
+    // #given a ci-fix candidate whose diffExcerpt contains a file path (redact-class)
+    const sha = 'cifixpath1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const candidate = makeCiFixCandidate({
+      mergeSha: sha,
+      diffExcerpt: 'Config loaded from /Users/marcus/.ssh/config for the build.',
+    })
+    const input = makeDigestInput({mergedPrs: [candidate], privateTokens: new Set()})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then the candidate is emitted (not blocked)
+    expect(result.candidates).toHaveLength(1)
+    // #then the path is redacted in the output
+    if (result.candidates[0]?.trigger === 'ci-fail-then-pass') {
+      expect(result.candidates[0].diffExcerpt).not.toContain('/Users/marcus')
+      expect(result.candidates[0].diffExcerpt).toContain('[REDACTED]')
+    }
+    // #then neither counter is incremented (redaction, not blocking)
+    expect(result.telemetry.enrichmentBlocked).toBe(0)
+    expect(result.telemetry.enrichmentBlockedBySecret).toBe(0)
+  })
+
+  it('CiFix candidate with secret in logExcerpt → evidence cleared', () => {
+    // #given a ci-fix candidate whose logExcerpt contains a GitHub PAT
+    const sha = 'cifixlog1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const candidate = makeCiFixCandidate({
+      mergeSha: sha,
+      diffExcerpt: '-bad\n+good',
+      logExcerpt: 'Error: auth failed with token ghp_AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA3',
+    })
+    const input = makeDigestInput({mergedPrs: [candidate], privateTokens: new Set()})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then enrichmentBlockedBySecret is incremented
+    expect(result.telemetry.enrichmentBlockedBySecret).toBe(1)
+    // #then both diffExcerpt and logExcerpt are cleared
+    if (result.candidates[0]?.trigger === 'ci-fail-then-pass') {
+      expect(result.candidates[0].diffExcerpt).toBe('')
+      expect(result.candidates[0].logExcerpt).toBeUndefined()
+    }
+    // #then the secret does not appear in serialized output
+    expect(JSON.stringify(result)).not.toContain('ghp_')
+  })
+})
diff --git a/scripts/capture-learnings-harvest.ts b/scripts/capture-learnings-harvest.ts
index e1d097888..c3d47ef8a 100644
--- a/scripts/capture-learnings-harvest.ts
+++ b/scripts/capture-learnings-harvest.ts
@@ -11,6 +11,7 @@
  */
 
 import type {Dirent} from 'node:fs'
+import {execFileSync} from 'node:child_process'
 import {readdir, readFile, writeFile} from 'node:fs/promises'
 import process from 'node:process'
 import {Octokit} from '@octokit/rest'
@@ -157,20 +158,31 @@ export interface ReviewCandidate {
 /**
  * A candidate sourced from a PR whose CI checks transitioned failed → passed before merge.
  *
- * Evidence fields (failingCheckName, lastFailingSha, firstPassingSha, diffExcerpt,
- * logExcerpt) are populated in Unit 3. This placeholder carries the shared fields
- * so the union type and digest pipeline are ready before the harvester is implemented.
+ * Evidence fields carry the fixing diff (primary signal, always available) and a
+ * best-effort failing-log excerpt. Both are privacy-scanned before reaching the digest.
  */
 export interface CiFixCandidate {
   trigger: 'ci-fail-then-pass'
   mergeSha: string
   signals: CandidateSignals
-  // Evidence fields populated in Unit 3:
-  // failingCheckName: string
-  // lastFailingSha: string
-  // firstPassingSha: string
-  // diffExcerpt: string
-  // logExcerpt?: string
+  /** Name of the check that transitioned failed → passed. */
+  failingCheckName: string
+  /** SHA of the last commit where the check was in a failing conclusion. */
+  lastFailingSha: string
+  /** SHA of the first commit after lastFailingSha where the check is SUCCESS. */
+  firstPassingSha: string
+  /**
+   * Diff excerpt between lastFailingSha and firstPassingSha, truncated to
+   * MAX_EXCERPT_CHARS_PER_CANDIDATE. Empty string when no diff was available
+   * (bare re-run candidates are dropped before this point).
+   */
+  diffExcerpt: string
+  /**
+   * Best-effort excerpt from the failing job log, ranked toward error lines.
+   * '[failure log purged or unavailable]' when the log could not be fetched
+   * (404/410 purged, no run found, or any other error).
+   */
+  logExcerpt?: string
 }
 
 /**
@@ -182,6 +194,10 @@ export interface HarvestStageCounts {
   mergedPrsInLookback: number
   excludedAutomation: number
   multiRoundCandidates: number
+  /** Number of PRs examined for CI fail→pass transition. */
+  ciFixPrsExamined: number
+  /** Number of CI-fix candidates found (transition detected + real diff). */
+  ciFixCandidates: number
 }
 
 /** Counts-only telemetry returned by the pure core + harvest stage. */
@@ -354,7 +370,48 @@ export function buildCandidateDigest(input: BuildCandidateDigestInput): Candidat
       return {...pr, reviewExcerpts: redactedExcerpts}
     }
 
-    // CiFixCandidate: no evidence fields yet (Unit 3 placeholder)
+    // CiFixCandidate: scan and redact diffExcerpt + logExcerpt (R3/R3a).
+    //
+    // Privacy-ordering invariant: diffExcerpt and logExcerpt are already truncated
+    // to budget in the I/O shell before this pure core is called. Never move
+    // truncation after the scan.
+    //
+    // Steps:
+    // 1. Redact structural secrets (paths, hostnames, Bearer tokens) from both fields.
+    // 2. If private-name leak OR residual hard secret after redaction → clear both
+    //    evidence fields (drop enriched content, keep candidate title-only) and
+    //    increment the appropriate counter.
+    // 3. Otherwise use the redacted values.
+    if (pr.trigger === 'ci-fail-then-pass') {
+      const rawDiff = pr.diffExcerpt
+      const rawLog = pr.logExcerpt ?? ''
+
+      // Step 1: redact structural secrets
+      const redactedDiff = redactLogDiffSecrets(rawDiff)
+      const redactedLog = rawLog === '' ? rawLog : redactLogDiffSecrets(rawLog)
+      const combinedText = `${redactedDiff}\n${redactedLog}`
+
+      // Step 2: check for private-name leak or residual hard secret
+      const hasPrivateLeak = learningBodyHasPrivateLeak(combinedText, input.privateTokens)
+      const hasResidualSecret = logDiffHasSecret(combinedText)
+
+      if (hasPrivateLeak) {
+        enrichmentBlocked++
+        return {...pr, diffExcerpt: '', logExcerpt: undefined}
+      }
+      if (hasResidualSecret) {
+        enrichmentBlockedBySecret++
+        return {...pr, diffExcerpt: '', logExcerpt: undefined}
+      }
+
+      // Step 3: use the redacted values
+      const result: CiFixCandidate = {...pr, diffExcerpt: redactedDiff}
+      if (redactedLog !== '') {
+        return {...result, logExcerpt: redactedLog}
+      }
+      return result
+    }
+
     return pr
   })
 
@@ -376,8 +433,8 @@ export function buildCandidateDigest(input: BuildCandidateDigestInput): Candidat
  *
  * When `scanAvailable` is false (private token load failed), all enriched evidence
  * is cleared so no unscanned content reaches the digest. For ReviewCandidate, this
- * clears `reviewExcerpts`. For CiFixCandidate, no evidence fields exist yet (Unit 3).
- * When `scanAvailable` is true, candidates are returned unchanged.
+ * clears `reviewExcerpts`. For CiFixCandidate, this clears `diffExcerpt` and
+ * `logExcerpt`. When `scanAvailable` is true, candidates are returned unchanged.
  *
  * Pure function: no I/O, fully unit-testable. Extracted so the fail-closed
  * composition can be tested independently of the I/O shell.
@@ -388,7 +445,9 @@ export function applyEnrichmentScanAvailability(candidates: Candidate[], scanAva
     if (c.trigger === 'review-heavy') {
       return {...c, reviewExcerpts: [] as string[]}
     }
-    // CiFixCandidate: no evidence fields yet (Unit 3 placeholder)
+    if (c.trigger === 'ci-fail-then-pass') {
+      return {...c, diffExcerpt: '', logExcerpt: undefined}
+    }
     return c
   })
 }
@@ -660,6 +719,484 @@ function buildReviewExcerpts(reviewBodies: {state: string; body: string}[], thre
   return excerpts
 }
 
+// ---------------------------------------------------------------------------
+// CI fail→pass transition detection (pure, exported for testing)
+// ---------------------------------------------------------------------------
+
+/**
+ * Per-commit per-check conclusion entry, as parsed from the GraphQL statusCheckRollup.
+ * Supports both CheckRun (name + conclusion) and StatusContext (context + state).
+ */
+export type CommitCheckEntry =
+  | {type: 'CheckRun'; sha: string; name: string; conclusion: string}
+  | {type: 'StatusContext'; sha: string; context: string; state: string}
+
+/**
+ * Result of a successful fail→pass transition detection.
+ */
+export interface FailPassTransition {
+  failingCheckName: string
+  lastFailingSha: string
+  firstPassingSha: string
+}
+
+/**
+ * Failing conclusions for CheckRun (GitHub Actions).
+ * These are the states that count as "failed" for transition detection.
+ */
+const FAILING_CHECK_CONCLUSIONS = new Set(['FAILURE', 'TIMED_OUT', 'CANCELLED', 'ACTION_REQUIRED', 'STARTUP_FAILURE'])
+
+/**
+ * Find the first fail→pass transition across a set of commits for required checks.
+ *
+ * Takes commits ordered OLDEST→NEWEST (chronological ascending). For each required
+ * check name, finds:
+ *   - lastFailingSha: the LATEST commit where the check has a failing conclusion
+ *   - firstPassingSha: the FIRST commit AFTER lastFailingSha where the check is SUCCESS
+ *
+ * Returns the first such transition found (iterating required checks in insertion order),
+ * or null if no transition exists.
+ *
+ * When requiredCheckNames is empty, treats ALL checks as required (any failed→passed
+ * transition counts). This handles repos with no branch protection configured.
+ *
+ * CheckRun matches by `name`; StatusContext matches by `context`.
+ *
+ * Pure function: no I/O, fully unit-testable. The ordering invariant (oldest→newest)
+ * is the correctness core — reversing the order would pick the wrong SHA.
+ */
+export function findFailPassTransition(
+  commits: CommitCheckEntry[],
+  requiredCheckNames: Set<string>,
+): FailPassTransition | null {
+  // Collect all check names present in the commits
+  const allCheckNames = new Set<string>()
+  for (const entry of commits) {
+    const name = entry.type === 'CheckRun' ? entry.name : entry.context
+    allCheckNames.add(name)
+  }
+
+  // Determine which check names to evaluate
+  const checkNamesToEvaluate = requiredCheckNames.size > 0 ? requiredCheckNames : allCheckNames
+
+  for (const checkName of checkNamesToEvaluate) {
+    // Walk commits oldest→newest to find lastFailingSha and firstPassingSha
+    let lastFailingSha: string | null = null
+
+    for (const entry of commits) {
+      const entryName = entry.type === 'CheckRun' ? entry.name : entry.context
+      if (entryName !== checkName) continue
+
+      if (entry.type === 'CheckRun') {
+        if (FAILING_CHECK_CONCLUSIONS.has(entry.conclusion)) {
+          // Update lastFailingSha — we want the LATEST failing commit
+          lastFailingSha = entry.sha
+        }
+      } else if (entry.state === 'FAILURE') {
+        // StatusContext: FAILURE state counts as failing
+        lastFailingSha = entry.sha
+      }
+    }
+
+    if (lastFailingSha === null) continue
+
+    // Now find the FIRST commit AFTER lastFailingSha where the check is SUCCESS
+    let foundFailing = false
+    for (const entry of commits) {
+      const entryName = entry.type === 'CheckRun' ? entry.name : entry.context
+      if (entryName !== checkName) continue
+
+      if (entry.sha === lastFailingSha) {
+        foundFailing = true
+        continue
+      }
+
+      if (!foundFailing) continue
+
+      // We're past the lastFailingSha — look for first SUCCESS
+      const isSuccess = entry.type === 'CheckRun' ? entry.conclusion === 'SUCCESS' : entry.state === 'SUCCESS'
+
+      if (isSuccess) {
+        return {
+          failingCheckName: checkName,
+          lastFailingSha,
+          firstPassingSha: entry.sha,
+        }
+      }
+    }
+  }
+
+  return null
+}
+
+// ---------------------------------------------------------------------------
+// GraphQL commit/check rollup fetch (I/O)
+// ---------------------------------------------------------------------------
+
+/**
+ * GraphQL query to fetch per-commit per-check conclusions for a PR.
+ * Returns commits in chronological order (oldest→newest for commits(first:N)).
+ * Pagination: cursor walk for PRs with >100 commits.
+ */
+const PR_COMMITS_ROLLUP_QUERY = `
+query($owner:String!,$repo:String!,$number:Int!,$after:String){
+  repository(owner:$owner,name:$repo){
+    pullRequest(number:$number){
+      commits(first:100,after:$after){
+        pageInfo{ hasNextPage endCursor }
+        nodes{
+          commit{
+            oid
+            statusCheckRollup{
+              contexts(first:100){
+                nodes{
+                  __typename
+                  ... on CheckRun{ name conclusion }
+                  ... on StatusContext{ context state }
+                }
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+}
+`.trim()
+
+/**
+ * Injectable type for the gh execFileSync call (for testability).
+ * Matches the signature used in private-repo-resolution.ts.
+ */
+export type GhExecFn = (args: string[], env?: NodeJS.ProcessEnv) => string
+
+/**
+ * Default gh exec implementation using execFileSync.
+ * Follows the pattern from private-repo-resolution.ts.
+ */
+export function defaultGhExec(args: string[], env?: NodeJS.ProcessEnv): string {
+  return execFileSync('gh', args, {
+    encoding: 'utf8',
+    env: env ?? process.env,
+    stdio: ['pipe', 'pipe', 'pipe'],
+  })
+}
+
+/**
+ * Fetch per-commit per-check conclusions for a PR via GraphQL.
+ * Walks cursor pages for PRs with >100 commits.
+ * Returns commits in chronological order (oldest→newest).
+ * Returns null on any error (caller degrades the PR).
+ */
+export function fetchPrCommitCheckRollup(
+  owner: string,
+  repo: string,
+  prNumber: number,
+  token: string | undefined,
+  ghExec: GhExecFn = defaultGhExec,
+): CommitCheckEntry[] | null {
+  const env: NodeJS.ProcessEnv = {...process.env}
+  if (token !== undefined && token !== '') {
+    env.GH_TOKEN = token
+    delete env.GITHUB_TOKEN
+  }
+
+  const allEntries: CommitCheckEntry[] = []
+  let after: string | null = null
+
+  // Cursor walk — handles PRs with >100 commits
+  for (;;) {
+    const args = [
+      'api',
+      'graphql',
+      '-f',
+      `query=${PR_COMMITS_ROLLUP_QUERY}`,
+      '-F',
+      `owner=${owner}`,
+      '-F',
+      `repo=${repo}`,
+      '-F',
+      `number=${prNumber}`,
+    ]
+    if (after !== null) {
+      args.push('-f', `after=${after}`)
+    }
+
+    let stdout: string
+    try {
+      stdout = ghExec(args, env)
+    } catch {
+      return null
+    }
+
+    let parsed: unknown
+    try {
+      parsed = JSON.parse(stdout)
+    } catch {
+      return null
+    }
+
+    if (!isRecord(parsed)) return null
+    const data = parsed.data
+    if (!isRecord(data)) return null
+    const repository = data.repository
+    if (!isRecord(repository)) return null
+    const pullRequest = repository.pullRequest
+    if (!isRecord(pullRequest)) return null
+    const commits = pullRequest.commits
+    if (!isRecord(commits)) return null
+    const nodes = commits.nodes
+    if (!Array.isArray(nodes)) return null
+
+    for (const node of nodes) {
+      if (!isRecord(node)) continue
+      const commit = node.commit
+      if (!isRecord(commit)) continue
+      const oid = commit.oid
+      if (typeof oid !== 'string') continue
+      const rollup = commit.statusCheckRollup
+      if (!isRecord(rollup)) continue
+      const contexts = rollup.contexts
+      if (!isRecord(contexts)) continue
+      const contextNodes = contexts.nodes
+      if (!Array.isArray(contextNodes)) continue
+
+      for (const ctx of contextNodes) {
+        if (!isRecord(ctx)) continue
+        const typename = ctx.__typename
+        if (typename === 'CheckRun') {
+          const name = ctx.name
+          const conclusion = ctx.conclusion
+          if (typeof name === 'string' && typeof conclusion === 'string') {
+            allEntries.push({type: 'CheckRun', sha: oid, name, conclusion})
+          }
+        } else if (typename === 'StatusContext') {
+          const context = ctx.context
+          const state = ctx.state
+          if (typeof context === 'string' && typeof state === 'string') {
+            allEntries.push({type: 'StatusContext', sha: oid, context, state})
+          }
+        }
+      }
+    }
+
+    // Check pagination
+    const pageInfo = commits.pageInfo
+    if (!isRecord(pageInfo)) break
+    const hasNextPage = pageInfo.hasNextPage
+    if (hasNextPage !== true) break
+    const endCursor = pageInfo.endCursor
+    if (typeof endCursor !== 'string') break
+    after = endCursor
+  }
+
+  return allEntries
+}
+
+// ---------------------------------------------------------------------------
+// Diff excerpt builder
+// ---------------------------------------------------------------------------
+
+/**
+ * Build a diff excerpt from compareCommits file patches, truncated to budget.
+ * Ranks toward changed hunks (lines starting with + or -).
+ * Returns empty string when no patches are available.
+ */
+function buildDiffExcerpt(files: {filename?: string; patch?: string}[], budget: number): string {
+  const parts: string[] = []
+  let remaining = budget
+
+  for (const file of files) {
+    if (remaining <= 0) break
+    const patch = file.patch
+    if (typeof patch !== 'string' || patch === '') continue
+    const header = `--- ${file.filename ?? 'unknown'}\n`
+    const chunk = header + patch
+    if (chunk.length <= remaining) {
+      parts.push(chunk)
+      remaining -= chunk.length
+    } else {
+      parts.push(chunk.slice(0, remaining))
+      remaining = 0
+    }
+  }
+
+  return parts.join('\n')
+}
+
+// ---------------------------------------------------------------------------
+// Log excerpt builder
+// ---------------------------------------------------------------------------
+
+/**
+ * Build a log excerpt from raw job log text, ranked toward error lines.
+ * Truncates to budget. Returns the excerpt string.
+ */
+function buildLogExcerpt(logText: string, budget: number): string {
+  const lines = logText.split('\n')
+
+  // Rank error lines first (lines containing 'error', 'Error', 'ERROR', 'FAILED', 'failed')
+  const errorLines: string[] = []
+  const otherLines: string[] = []
+
+  for (const line of lines) {
+    if (/error|failed/i.test(line)) {
+      errorLines.push(line)
+    } else {
+      otherLines.push(line)
+    }
+  }
+
+  const ranked = [...errorLines, ...otherLines]
+  const joined = ranked.join('\n')
+  return joined.slice(0, budget)
+}
+
+// ---------------------------------------------------------------------------
+// CI fix harvester (I/O shell)
+// ---------------------------------------------------------------------------
+
+/**
+ * Fetch merged PRs in the lookback window and detect CI fail→pass transitions.
+ *
+ * For each merged PR (reusing the same closed-PR list pattern as harvestCandidates):
+ * 1. Run the GraphQL commits/rollup query to get per-commit per-check conclusions.
+ * 2. Walk commits oldest→newest to find lastFailingSha/firstPassingSha for a required check.
+ * 3. compareCommits for the fixing diff (drop if no diff — bare re-run).
+ * 4. Best-effort downloadJobLogsForWorkflowRun for the failed job log excerpt.
+ * 5. Build CiFixCandidate with deriveSignals for the signals field.
+ *
+ * Per-PR errors (GraphQL/compare) degrade that PR (skip), never abort.
+ * Log fetch errors degrade to '[failure log purged or unavailable]' placeholder.
+ *
+ * Injectables: ghExec + octokit + now + mergedPrs (for testability).
+ * Counts-only telemetry — no owner/repo/name in any log.
+ */
+export async function harvestCiFixCandidates(
+  octokit: OctokitClient,
+  owner: string,
+  repo: string,
+  _now: Date,
+  mergedPrs: {
+    number: number
+    merge_commit_sha: string
+    title: string
+    labels: {name: string}[]
+    user: {login: string} | null
+  }[],
+  requiredCheckNames: Set<string>,
+  ghExec: GhExecFn = defaultGhExec,
+): Promise<{candidates: CiFixCandidate[]; ciFixPrsExamined: number; ciFixCandidates: number}> {
+  const token = process.env.GITHUB_TOKEN ?? process.env.GH_TOKEN
+
+  let ciFixPrsExamined = 0
+  let ciFixCandidatesCount = 0
+  const candidates: CiFixCandidate[] = []
+
+  for (const pr of mergedPrs) {
+    ciFixPrsExamined++
+
+    // Step 1: fetch per-commit per-check conclusions via GraphQL
+    const entries = fetchPrCommitCheckRollup(owner, repo, pr.number, token, ghExec)
+    if (entries === null) {
+      // GraphQL error — degrade this PR, continue
+      process.stderr.write(`capture-learnings-harvest: ci-fix GraphQL failed (pr=${ciFixPrsExamined})\n`)
+      continue
+    }
+
+    // Step 2: find fail→pass transition
+    const transition = findFailPassTransition(entries, requiredCheckNames)
+    if (transition === null) {
+      // No transition — not a CI-fix candidate
+      continue
+    }
+
+    const {failingCheckName, lastFailingSha, firstPassingSha} = transition
+
+    // Step 3: compareCommits for the fixing diff
+    let diffExcerpt: string
+    try {
+      const compareResult = await octokit.rest.repos.compareCommits({
+        owner,
+        repo,
+        base: lastFailingSha,
+        head: firstPassingSha,
+      })
+      const files = compareResult.data.files ?? []
+      if (files.length === 0) {
+        // Bare re-run with no diff — drop the candidate (scope boundary)
+        continue
+      }
+      diffExcerpt = buildDiffExcerpt(files as {filename?: string; patch?: string}[], MAX_EXCERPT_CHARS_PER_CANDIDATE)
+      if (diffExcerpt === '') {
+        // No patches in any file — drop
+        continue
+      }
+    } catch {
+      // compareCommits error — degrade this PR
+      process.stderr.write(`capture-learnings-harvest: ci-fix compareCommits failed (pr=${ciFixPrsExamined})\n`)
+      continue
+    }
+
+    // Step 4: best-effort log fetch
+    let logExcerpt: string | undefined
+    try {
+      const runsResult = await octokit.rest.actions.listWorkflowRunsForRepo({
+        owner,
+        repo,
+        head_sha: lastFailingSha,
+        status: 'completed',
+        per_page: 10,
+      })
+      const failedRun = runsResult.data.workflow_runs.find(r => r.conclusion === 'failure')
+      if (failedRun !== undefined) {
+        const jobsResult = await octokit.rest.actions.listJobsForWorkflowRun({
+          owner,
+          repo,
+          run_id: failedRun.id,
+          per_page: 50,
+        })
+        const failedJob = jobsResult.data.jobs.find(j => j.conclusion === 'failure')
+        if (failedJob !== undefined) {
+          const logResponse = await octokit.rest.actions.downloadJobLogsForWorkflowRun({
+            owner,
+            repo,
+            job_id: failedJob.id,
+          })
+          const logText = typeof logResponse.data === 'string' ? logResponse.data : ''
+          if (logText !== '') {
+            logExcerpt = buildLogExcerpt(logText, MAX_EXCERPT_CHARS_PER_CANDIDATE)
+          }
+        }
+      }
+    } catch {
+      // Log fetch failed — degrade to placeholder, never block the candidate
+      logExcerpt = '[failure log purged or unavailable]'
+    }
+
+    // Step 5: build CiFixCandidate
+    const prLabels = pr.labels.map(l => l.name)
+    const candidate: CiFixCandidate = {
+      trigger: 'ci-fail-then-pass',
+      mergeSha: pr.merge_commit_sha,
+      signals: deriveSignals({
+        title: pr.title,
+        labels: prLabels.map(name => ({name})),
+      }),
+      failingCheckName,
+      lastFailingSha,
+      firstPassingSha,
+      diffExcerpt,
+      logExcerpt,
+    }
+
+    candidates.push(candidate)
+    ciFixCandidatesCount++
+  }
+
+  return {candidates, ciFixPrsExamined, ciFixCandidates: ciFixCandidatesCount}
+}
+
 /**
  * Fetch all merged PRs in the lookback window and return those where Fro Bot's
  * substantive reviews meet the predicate:
@@ -828,6 +1365,9 @@ export async function harvestCandidates(
       mergedPrsInLookback,
       excludedAutomation,
       multiRoundCandidates: candidates.length,
+      // CI-fix counts are populated by harvestCiFixCandidates separately
+      ciFixPrsExamined: 0,
+      ciFixCandidates: 0,
     },
   }
 }
diff --git a/scripts/capture-learnings-open.test.ts b/scripts/capture-learnings-open.test.ts
index 7cd27d514..848828c98 100644
--- a/scripts/capture-learnings-open.test.ts
+++ b/scripts/capture-learnings-open.test.ts
@@ -16,6 +16,7 @@ import {
   LEARNING_PROPOSAL_LABEL,
   type Candidate,
   type OctokitClient,
+  type ReviewCandidate,
 } from './capture-learnings-harvest.ts'
 import {
   ensureLabelsExist,
@@ -32,7 +33,7 @@ import {buildPrivateTokenSet} from './wiki-slug.ts'
 // Fixture helpers
 // ---------------------------------------------------------------------------
 
-function makeCandidate(overrides: Partial<Candidate> = {}): Candidate {
+function makeCandidate(overrides: Partial<ReviewCandidate> = {}): Candidate {
   return {
     trigger: 'review-heavy',
     mergeSha: 'abc123def456abc123def456abc123def456abc1',

From d8bd78a31323daf89ffbe83be5e4bf247c38e7a0 Mon Sep 17 00:00:00 2001
From: "Marcus R. Brown" <git@mrbro.dev>
Date: Tue, 23 Jun 2026 15:35:19 -0700
Subject: [PATCH 5/7] feat(capture): wire the failed-then-fixed source into the
 run and prompt

Fetch merged PRs once and feed both harvesters, resolve the required-check set
from branch protection (falling back to all checks when none are configured),
and concatenate both sources into the digest. The agent prompt now branches on
the trigger so a failed-then-fixed candidate is distilled from its fixing diff
and failure log, and the run summary reports the failed-then-fixed counts and the
secret-blocked count.
---
 .github/workflows/capture-learnings.yaml      |  66 +++--
 ...-feat-c2-failed-then-fixed-capture-plan.md |   8 +-
 scripts/capture-learnings-harvest.test.ts     | 235 ++++++++++++++++++
 scripts/capture-learnings-harvest.ts          | 143 ++++++++++-
 4 files changed, 424 insertions(+), 28 deletions(-)

diff --git a/.github/workflows/capture-learnings.yaml b/.github/workflows/capture-learnings.yaml
index 582cf31c0..73930507a 100644
--- a/.github/workflows/capture-learnings.yaml
+++ b/.github/workflows/capture-learnings.yaml
@@ -91,30 +91,63 @@ jobs:
           CAPTURE_LEARNINGS_DIGEST_PATH: ${{ runner.temp }}/capture-learnings-digest.json
           TASK_PROMPT: |
             You are authoring learning-proposal bodies. Each body distills a reusable
-            learning from a pull request that went through multiple substantive review rounds.
+            learning from a pull request — either one that went through multiple substantive
+            review rounds, or one whose CI checks transitioned failed → passed before merge.
 
             Your ONLY job in this step:
             1. Read the harvest digest from $CAPTURE_LEARNINGS_DIGEST_PATH (or equivalently
                $RUNNER_TEMP/capture-learnings-digest.json). The digest is a JSON object:
                { "candidates": [...], "telemetry": {...} }
-               Each candidate in the candidates array has:
+               Each candidate has a `trigger` field that determines its shape:
+
+               trigger: "review-heavy" candidates have:
                - mergeSha: opaque merge commit SHA (the ONLY identifier you receive)
-               - reviewRounds: number of substantive Fro Bot review rounds (approved, changes-requested, or dismissed — a dismissed approval marks a re-review forced by a new push)
-               - reviewExcerpts: string[] — the actual review prose from the rounds (review bodies and line-level review comments), ranked with the correction feedback first. This is your PRIMARY signal: it is what the reviewer actually said about what was wrong and how it was fixed. It has been privacy-scanned; use it as given. May be empty for some candidates.
+               - reviewRounds: number of substantive Fro Bot review rounds (approved,
+                 changes-requested, or dismissed — a dismissed approval marks a re-review
+                 forced by a new push)
+               - reviewExcerpts: string[] — the actual review prose from the rounds (review
+                 bodies and line-level review comments), ranked with the correction feedback
+                 first. This is your PRIMARY signal: it is what the reviewer actually said
+                 about what was wrong and how it was fixed. It has been privacy-scanned; use
+                 it as given. May be empty for some candidates.
                - signals: { titleTokens: string[], labels: string[] } — secondary context only
-            2. For each candidate you judge worth proposing (reviewRounds ≥ 2 is
-               already guaranteed), write a concise learning-proposal body (2–4
-               paragraphs) that:
-               - Distills the reusable learning from reviewExcerpts — what the
-                 review rounds actually said was wrong and how it was corrected.
-                 Ground the learning in that prose, not in the title tokens. When
-                 reviewExcerpts is empty, fall back to signals but keep the body
-                 modest and clearly general rather than inventing specifics.
-               - References the source PR by merge SHA ONLY (never look up or
-                 include owner/repo/PR number/title/author — you do not have that
-                 information and must not attempt to retrieve it)
+
+               trigger: "ci-fail-then-pass" candidates have:
+               - mergeSha: opaque merge commit SHA (the ONLY identifier you receive)
+               - failingCheckName: name of the CI check that transitioned failed → passed
+               - diffExcerpt: string — the fixing diff between the last-failing and
+                 first-passing commit. This is your PRIMARY signal: it shows exactly what
+                 change turned the check green. It has been privacy-scanned; use it as given.
+                 May be empty if evidence was blocked by the privacy scan.
+               - logExcerpt: string (optional) — excerpt from the failing job log, ranked
+                 toward error lines. May be '[failure log purged or unavailable]' when the
+                 log could not be fetched. Secondary signal — the diff is primary.
+               - signals: { titleTokens: string[], labels: string[] } — tertiary context only
+
+            2. For each candidate you judge worth proposing, write a concise learning-proposal
+               body (2–4 paragraphs) that:
+
+               For trigger: "review-heavy" —
+               - Distills the reusable learning from reviewExcerpts — what the review rounds
+                 actually said was wrong and how it was corrected. Ground the learning in that
+                 prose, not in the title tokens. When reviewExcerpts is empty, fall back to
+                 signals but keep the body modest and clearly general rather than inventing
+                 specifics.
+
+               For trigger: "ci-fail-then-pass" —
+               - Distills the learning as "this failure → this fix": what the failing check
+                 caught (from failingCheckName + logExcerpt) and how diffExcerpt corrected it.
+                 Ground the learning in diffExcerpt — that is the concrete evidence of what
+                 changed. Do NOT invent specifics from the title tokens. When diffExcerpt is
+                 empty (evidence blocked), keep the body modest and clearly general.
+
+               For all triggers —
+               - References the source PR by merge SHA ONLY (never look up or include
+                 owner/repo/PR number/title/author — you do not have that information and
+                 must not attempt to retrieve it)
                - Proposes what the learning would be
                - Is suitable for a human to review and author into docs/solutions/
+
             3. Write the bodies as a JSON object to $RUNNER_TEMP/capture-learnings-bodies.json:
                { "<mergeSha>": "<body text>", ... }
                Include only candidates you judge worth proposing. Omit candidates
@@ -172,10 +205,13 @@ jobs:
               echo "| Merged PRs in lookback | $(jq '.telemetry.mergedPrsInLookback // 0' "$harvest") |"
               echo "| Excluded automation | $(jq '.telemetry.excludedAutomation // 0' "$harvest") |"
               echo "| Multi-round candidates | $(jq '.telemetry.multiRoundCandidates // 0' "$harvest") |"
+              echo "| CI fail-then-pass examined | $(jq '.telemetry.ciFixPrsExamined // 0' "$harvest") |"
+              echo "| CI fail-then-pass candidates | $(jq '.telemetry.ciFixCandidates // 0' "$harvest") |"
               echo "| After seen dedup | $(jq '.telemetry.afterSeenDedup // 0' "$harvest") |"
               echo "| After solutions dedup | $(jq '.telemetry.afterSolutionsDedup // 0' "$harvest") |"
               echo "| Candidates emitted | $(jq '.telemetry.emitted // 0' "$harvest") |"
               echo "| Enrichment blocked by privacy | $(jq '.telemetry.enrichmentBlocked // 0' "$harvest") |"
+              echo "| Enrichment blocked by secret | $(jq '.telemetry.enrichmentBlockedBySecret // 0' "$harvest") |"
             else
               echo "| Closed PRs fetched | (harvest result unavailable) |"
             fi
diff --git a/docs/plans/2026-06-22-004-feat-c2-failed-then-fixed-capture-plan.md b/docs/plans/2026-06-22-004-feat-c2-failed-then-fixed-capture-plan.md
index 9ca0fee82..93702c8a0 100644
--- a/docs/plans/2026-06-22-004-feat-c2-failed-then-fixed-capture-plan.md
+++ b/docs/plans/2026-06-22-004-feat-c2-failed-then-fixed-capture-plan.md
@@ -164,7 +164,7 @@ open step — body privacy scan (private-name + secret block) → learning-propo
 
 ## Implementation Units
 
-- [ ] **Unit 1: Secret/redaction scan in the shared privacy module**
+- [x] **Unit 1: Secret/redaction scan in the shared privacy module**
 
 **Goal:** Add log/diff secret detection + structural redaction to the shared gate, callable from
 both chokepoints, without touching the existing private-repo-name scan.
@@ -201,7 +201,7 @@ both chokepoints, without touching the existing private-repo-name scan.
 **Verification:** gates green; the mutation-proof test bites; no secret value appears in any
 test assertion message.
 
-- [ ] **Unit 2: Discriminated Candidate union + generalized digest core**
+- [x] **Unit 2: Discriminated Candidate union + generalized digest core**
 
 **Goal:** Refactor `Candidate` into a `trigger`-discriminated union with per-source `evidence`,
 and generalize `buildCandidateDigest` to iterate the union uniformly (dedup, cap, privacy), with
@@ -243,7 +243,7 @@ test; `buildReviewExcerpts` ordering invariant.
 **Verification:** gates green; review-source output unchanged; opacity holds for both triggers;
 the privacy mutation-proof bites.
 
-- [ ] **Unit 3: CI fail→pass harvester (GraphQL detector + diff/log evidence)**
+- [x] **Unit 3: CI fail→pass harvester (GraphQL detector + diff/log evidence)**
 
 **Goal:** Implement `harvestCiFixCandidates` — detect fail→pass PRs via one GraphQL query each,
 derive the fixing diff and best-effort log excerpt, and emit `CiFixCandidate`s.
@@ -287,7 +287,7 @@ the existing merged-PR fetch + `now` injection; `buildReviewExcerpts` truncate-t
 **Verification:** gates green; the transition walk is mutation-proven (a fixture where removing the
 oldest→newest ordering picks the wrong SHA fails); degrade paths don't abort.
 
-- [ ] **Unit 4: Workflow wiring + trigger-branched agent prompt**
+- [x] **Unit 4: Workflow wiring + trigger-branched agent prompt**
 
 **Goal:** Concatenate both candidate sources into the digest, branch the agent prompt on `trigger`,
 and surface per-trigger + secret-blocked counts in the step summary.
diff --git a/scripts/capture-learnings-harvest.test.ts b/scripts/capture-learnings-harvest.test.ts
index 7a934ffad..893da70d1 100644
--- a/scripts/capture-learnings-harvest.test.ts
+++ b/scripts/capture-learnings-harvest.test.ts
@@ -14,6 +14,7 @@ import {
   buildCandidateDigest,
   buildMergeShaMarker,
   DEPENDENCY_LABELS,
+  fetchMergedPrsInWindow,
   fetchOpenedLearningShas,
   findFailPassTransition,
   FRO_BOT_REVIEWER_LOGINS,
@@ -1492,6 +1493,8 @@ describe('harvest→open schema contract', () => {
         mergedPrsInLookback: 10,
         excludedAutomation: 1,
         multiRoundCandidates: 5,
+        ciFixPrsExamined: 0,
+        ciFixCandidates: 0,
         afterSeenDedup: 3,
         afterSolutionsDedup: 2,
         emitted: 2,
@@ -2736,3 +2739,235 @@ describe('buildCandidateDigest — CiFix evidence privacy scan (mutation proof)'
     expect(JSON.stringify(result)).not.toContain('ghp_')
   })
 })
+
+// ---------------------------------------------------------------------------
+// I/O shell: fetchMergedPrsInWindow (mocked Octokit)
+// ---------------------------------------------------------------------------
+
+function mockOctokitForFetch(prs: unknown[]): OctokitClient {
+  return {
+    paginate: async () => prs,
+    rest: {
+      pulls: {
+        list: async () => ({data: []}),
+        listReviews: async () => ({data: []}),
+      },
+      issues: {listForRepo: async () => ({data: []})},
+    },
+  } as unknown as OctokitClient
+}
+
+describe('fetchMergedPrsInWindow', () => {
+  it('returns merged PRs within the lookback window with required fields', async () => {
+    // #given a merged PR within the lookback window
+    const sha = 'merge001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const pr = {
+      number: 1,
+      merged_at: new Date().toISOString(),
+      merge_commit_sha: sha,
+      title: 'fix: correct the CI failure',
+      labels: [{name: 'ci'}],
+      user: {login: 'some-human'},
+    }
+    const octokit = mockOctokitForFetch([pr])
+
+    // #when fetching
+    const result = await fetchMergedPrsInWindow(octokit, 'fro-bot', '.github', new Date())
+
+    // #then the PR is returned with the required fields
+    expect(result).toHaveLength(1)
+    expect(result[0]?.merge_commit_sha).toBe(sha)
+    expect(result[0]?.number).toBe(1)
+    expect(result[0]?.title).toBe('fix: correct the CI failure')
+    expect(result[0]?.labels).toEqual([{name: 'ci'}])
+    expect(result[0]?.user).toEqual({login: 'some-human'})
+    expect(result[0]?.merged_at).toBeDefined()
+  })
+
+  it('excludes unmerged PRs (merged_at === null)', async () => {
+    // #given a closed but unmerged PR
+    const pr = {
+      number: 1,
+      merged_at: null,
+      merge_commit_sha: 'sha001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1',
+      title: 'fix: something',
+      labels: [],
+      user: {login: 'human'},
+    }
+    const octokit = mockOctokitForFetch([pr])
+
+    // #when fetching
+    const result = await fetchMergedPrsInWindow(octokit, 'fro-bot', '.github', new Date())
+
+    // #then the PR is excluded
+    expect(result).toHaveLength(0)
+  })
+
+  it('excludes PRs merged outside the lookback window', async () => {
+    // #given a PR merged 60 days ago (beyond LOOKBACK_DAYS=30)
+    const oldDate = new Date()
+    oldDate.setDate(oldDate.getDate() - 60)
+    const pr = {
+      number: 1,
+      merged_at: oldDate.toISOString(),
+      merge_commit_sha: 'sha001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1',
+      title: 'fix: old',
+      labels: [],
+      user: {login: 'human'},
+    }
+    const octokit = mockOctokitForFetch([pr])
+
+    // #when fetching
+    const result = await fetchMergedPrsInWindow(octokit, 'fro-bot', '.github', new Date())
+
+    // #then the PR is excluded
+    expect(result).toHaveLength(0)
+  })
+
+  it('excludes PRs with null merge_commit_sha', async () => {
+    // #given a merged PR with no merge commit SHA
+    const pr = {
+      number: 1,
+      merged_at: new Date().toISOString(),
+      merge_commit_sha: null,
+      title: 'fix: something',
+      labels: [],
+      user: {login: 'human'},
+    }
+    const octokit = mockOctokitForFetch([pr])
+
+    // #when fetching
+    const result = await fetchMergedPrsInWindow(octokit, 'fro-bot', '.github', new Date())
+
+    // #then the PR is excluded
+    expect(result).toHaveLength(0)
+  })
+
+  it('returns multiple PRs within the window', async () => {
+    // #given two merged PRs within the lookback window
+    const prs = [
+      {
+        number: 1,
+        merged_at: new Date().toISOString(),
+        merge_commit_sha: 'sha001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1',
+        title: 'fix: first',
+        labels: [],
+        user: {login: 'human'},
+      },
+      {
+        number: 2,
+        merged_at: new Date().toISOString(),
+        merge_commit_sha: 'sha002aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1',
+        title: 'fix: second',
+        labels: [],
+        user: {login: 'human'},
+      },
+    ]
+    const octokit = mockOctokitForFetch(prs)
+
+    // #when fetching
+    const result = await fetchMergedPrsInWindow(octokit, 'fro-bot', '.github', new Date())
+
+    // #then both PRs are returned
+    expect(result).toHaveLength(2)
+  })
+
+  it('handles null user gracefully', async () => {
+    // #given a merged PR with null user
+    const pr = {
+      number: 1,
+      merged_at: new Date().toISOString(),
+      merge_commit_sha: 'sha001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1',
+      title: 'fix: something',
+      labels: [],
+      user: null,
+    }
+    const octokit = mockOctokitForFetch([pr])
+
+    // #when fetching
+    const result = await fetchMergedPrsInWindow(octokit, 'fro-bot', '.github', new Date())
+
+    // #then the PR is returned with null user
+    expect(result).toHaveLength(1)
+    expect(result[0]?.user).toBeNull()
+  })
+})
+
+// ---------------------------------------------------------------------------
+// main() wiring: both harvesters concatenated (integration-level unit test)
+// ---------------------------------------------------------------------------
+
+describe('main() wiring: both candidate sources concatenated via buildCandidateDigest', () => {
+  it('review-heavy + ci-fix candidates from separate harvesters are both fed to buildCandidateDigest', () => {
+    // #given one review-heavy candidate and one ci-fix candidate (simulating both harvesters)
+    const reviewCandidate = makeCandidate({mergeSha: 'review01aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'})
+    const ciFixCandidate = makeCiFixCandidate({mergeSha: 'cifix001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'})
+
+    // Concatenate as main() does
+    const allCandidates: Candidate[] = [reviewCandidate, ciFixCandidate]
+
+    // Merge stageCounts as main() does
+    const reviewStageCounts: HarvestStageCounts = {
+      closedPrsFetched: 10,
+      mergedPrsInLookback: 5,
+      excludedAutomation: 1,
+      multiRoundCandidates: 1,
+      ciFixPrsExamined: 0,
+      ciFixCandidates: 0,
+    }
+    const mergedStageCounts: HarvestStageCounts = {
+      ...reviewStageCounts,
+      ciFixPrsExamined: 3,
+      ciFixCandidates: 1,
+    }
+
+    // #when building the digest with both sources
+    const result = buildCandidateDigest({
+      mergedPrs: allCandidates,
+      stageCounts: mergedStageCounts,
+      openedLearningShas: new Set(),
+      solutionsDocs: [],
+      maxLearnings: 5,
+      privateTokens: new Set(),
+    })
+
+    // #then both candidates are emitted
+    expect(result.candidates).toHaveLength(2)
+    const triggers = result.candidates.map(c => c.trigger).sort()
+    expect(triggers).toEqual(['ci-fail-then-pass', 'review-heavy'])
+
+    // #then merged stage counts are threaded into telemetry
+    expect(result.telemetry.multiRoundCandidates).toBe(1)
+    expect(result.telemetry.ciFixPrsExamined).toBe(3)
+    expect(result.telemetry.ciFixCandidates).toBe(1)
+    expect(result.telemetry.closedPrsFetched).toBe(10)
+  })
+
+  it('fail-closed fallback telemetry includes ciFixPrsExamined and ciFixCandidates as 0', () => {
+    // #given the empty digest shape used in the catch block of main()
+    // This test verifies the shape is consistent (ciFixPrsExamined + ciFixCandidates present)
+    const empty: CandidateDigest = {
+      candidates: [],
+      telemetry: {
+        closedPrsFetched: 0,
+        mergedPrsInLookback: 0,
+        excludedAutomation: 0,
+        multiRoundCandidates: 0,
+        ciFixPrsExamined: 0,
+        ciFixCandidates: 0,
+        afterSeenDedup: 0,
+        afterSolutionsDedup: 0,
+        emitted: 0,
+        enrichmentBlocked: 0,
+        enrichmentBlockedBySecret: 0,
+      },
+    }
+
+    // #then the shape is consistent and serializes correctly
+    const serialized = JSON.stringify(empty)
+    const parsed = JSON.parse(serialized) as CandidateDigest
+    expect(parsed.telemetry.ciFixPrsExamined).toBe(0)
+    expect(parsed.telemetry.ciFixCandidates).toBe(0)
+    expect(parsed.telemetry.enrichmentBlockedBySecret).toBe(0)
+  })
+})
diff --git a/scripts/capture-learnings-harvest.ts b/scripts/capture-learnings-harvest.ts
index c3d47ef8a..ce845c10e 100644
--- a/scripts/capture-learnings-harvest.ts
+++ b/scripts/capture-learnings-harvest.ts
@@ -206,6 +206,10 @@ export interface DigestTelemetry {
   mergedPrsInLookback: number
   excludedAutomation: number
   multiRoundCandidates: number
+  /** Number of PRs examined for CI fail→pass transition. Threaded from HarvestStageCounts. */
+  ciFixPrsExamined: number
+  /** Number of CI-fix candidates found (transition detected + real diff). Threaded from HarvestStageCounts. */
+  ciFixCandidates: number
   afterSeenDedup: number
   afterSolutionsDedup: number
   emitted: number
@@ -1405,6 +1409,81 @@ export async function fetchOpenedLearningShas(
   return seen
 }
 
+// ---------------------------------------------------------------------------
+// Shared merged-PR fetch (for harvestCiFixCandidates in main)
+// ---------------------------------------------------------------------------
+
+/**
+ * Fetch all merged PRs in the lookback window and return the raw list.
+ *
+ * This is a separate fetch from harvestCandidates's internal fetch. The two-fetch
+ * approach was chosen over refactoring harvestCandidates to accept a pre-fetched list
+ * because harvestCandidates's fetch is deeply entangled with its filtering logic
+ * (merged_at, cutoff, merge_commit_sha, automation exclusion). Extracting it would
+ * require changing the function signature and all its tests, increasing blast radius
+ * with no behavioral benefit. The extra pulls.list call is bounded by LOOKBACK_DAYS
+ * and is not on the hot path.
+ *
+ * Returns PRs with the fields required by harvestCiFixCandidates.
+ * Exported for testability.
+ */
+export async function fetchMergedPrsInWindow(
+  octokit: OctokitClient,
+  owner: string,
+  repo: string,
+  now: Date,
+): Promise<
+  {
+    number: number
+    merge_commit_sha: string
+    title: string
+    labels: {name: string}[]
+    user: {login: string} | null
+    merged_at: string
+  }[]
+> {
+  const cutoff = new Date(now.getTime() - LOOKBACK_DAYS * 24 * 60 * 60 * 1000)
+
+  const allPrs = await octokit.paginate(octokit.rest.pulls.list, {
+    owner,
+    repo,
+    state: 'closed',
+    per_page: 100,
+  } as unknown as Parameters<OctokitClient['rest']['pulls']['list']>[0])
+
+  const result: {
+    number: number
+    merge_commit_sha: string
+    title: string
+    labels: {name: string}[]
+    user: {login: string} | null
+    merged_at: string
+  }[] = []
+
+  for (const pr of allPrs) {
+    // Exclude unmerged PRs
+    if (pr.merged_at === null || pr.merged_at === undefined) continue
+
+    // Exclude PRs outside the lookback window
+    const mergedAt = new Date(pr.merged_at)
+    if (mergedAt < cutoff) continue
+
+    // Exclude PRs without a merge commit SHA
+    if (pr.merge_commit_sha === null || pr.merge_commit_sha === undefined) continue
+
+    result.push({
+      number: pr.number,
+      merge_commit_sha: pr.merge_commit_sha,
+      title: pr.title,
+      labels: pr.labels.map(l => ({name: typeof l === 'string' ? l : l.name})),
+      user: pr.user !== null && pr.user !== undefined ? {login: pr.user.login} : null,
+      merged_at: pr.merged_at,
+    })
+  }
+
+  return result
+}
+
 // ---------------------------------------------------------------------------
 // Entry point
 // ---------------------------------------------------------------------------
@@ -1418,11 +1497,53 @@ async function main(): Promise<void> {
     // Use Date.now() — UTC ms — as the reference point for the lookback cutoff.
     const now = new Date(Date.now())
 
-    const [{candidates: mergedPrs, stageCounts}, openedLearningShas, solutionsFiles] = await Promise.all([
-      harvestCandidates(octokit, owner, repo, now),
-      fetchOpenedLearningShas(octokit, owner, repo),
-      loadSolutionsFilesFromDisk(),
-    ])
+    // Fetch merged PRs once for the ci-fix harvester (separate from harvestCandidates's
+    // internal fetch — see fetchMergedPrsInWindow for the rationale).
+    const [{candidates: reviewCandidates, stageCounts}, openedLearningShas, solutionsFiles, mergedPrsForCiFix] =
+      await Promise.all([
+        harvestCandidates(octokit, owner, repo, now),
+        fetchOpenedLearningShas(octokit, owner, repo),
+        loadSolutionsFilesFromDisk(),
+        fetchMergedPrsInWindow(octokit, owner, repo, now),
+      ])
+
+    // Get the required-checks set from branch protection.
+    // Wrap in try/catch — if branch protection is absent (404) or the call fails,
+    // use an empty Set (findFailPassTransition treats empty = all checks count).
+    // Counts-only logging on failure.
+    let requiredCheckNames = new Set<string>()
+    try {
+      const branchProtection = await octokit.rest.repos.getBranchProtection({
+        owner,
+        repo,
+        branch: 'main',
+      })
+      const checks = branchProtection.data.required_status_checks?.checks ?? []
+      requiredCheckNames = new Set(checks.map((c: {context: string}) => c.context))
+    } catch {
+      // 404 = no branch protection configured; any other error = degrade gracefully.
+      // Empty set means all failed→passed transitions count (documented approximation).
+      process.stderr.write(
+        `capture-learnings-harvest: getBranchProtection failed or absent, using empty required-checks set (all transitions count)\n`,
+      )
+    }
+
+    // Harvest CI fail→pass candidates using the shared merged-PR list.
+    const {
+      candidates: ciFixCandidates,
+      ciFixPrsExamined,
+      ciFixCandidates: ciFixCandidatesCount,
+    } = await harvestCiFixCandidates(octokit, owner, repo, now, mergedPrsForCiFix, requiredCheckNames)
+
+    // Concatenate both candidate sources.
+    const allCandidates: Candidate[] = [...reviewCandidates, ...ciFixCandidates]
+
+    // Merge stage counts: add ci-fix counts into the review-harvester's stageCounts.
+    const mergedStageCounts: HarvestStageCounts = {
+      ...stageCounts,
+      ciFixPrsExamined,
+      ciFixCandidates: ciFixCandidatesCount,
+    }
 
     const solutionsDocs = collectSolutionDocs(solutionsFiles)
 
@@ -1443,13 +1564,15 @@ async function main(): Promise<void> {
       enrichmentScanAvailable = false
     }
 
-    // If the scan is unavailable, clear all reviewExcerpts before passing to the pure core.
+    // If the scan is unavailable, clear all enriched evidence before passing to the pure core.
     // This ensures no unscanned prose reaches the digest under any path.
-    const safeMergedPrs = applyEnrichmentScanAvailability(mergedPrs, enrichmentScanAvailable)
+    // applyEnrichmentScanAvailability handles both ReviewCandidate (clears reviewExcerpts)
+    // and CiFixCandidate (clears diffExcerpt + logExcerpt) — Unit 3 verified this.
+    const safeCandidates = applyEnrichmentScanAvailability(allCandidates, enrichmentScanAvailable)
 
     const digest = buildCandidateDigest({
-      mergedPrs: safeMergedPrs,
-      stageCounts,
+      mergedPrs: safeCandidates,
+      stageCounts: mergedStageCounts,
       openedLearningShas,
       solutionsDocs,
       maxLearnings: MAX_LEARNINGS_PER_RUN,
@@ -1470,6 +1593,8 @@ async function main(): Promise<void> {
         mergedPrsInLookback: 0,
         excludedAutomation: 0,
         multiRoundCandidates: 0,
+        ciFixPrsExamined: 0,
+        ciFixCandidates: 0,
         afterSeenDedup: 0,
         afterSolutionsDedup: 0,
         emitted: 0,

From 93a49173d34911af91dfb6ba96ac5ec21227ed06 Mon Sep 17 00:00:00 2001
From: "Marcus R. Brown" <git@mrbro.dev>
Date: Tue, 23 Jun 2026 16:34:28 -0700
Subject: [PATCH 6/7] fix(capture): harden the failed-then-fixed trigger after
 review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Bound each gh invocation with a timeout and cap commit pagination so a hung
process can't stall the run. Stop treating cancelled and action-required checks
as failures — only real failures qualify — and recognize errored status contexts.
Scan the failing check name alongside the diff and log, add the open step's
secret check as a final gate before posting, widen the secret patterns (Google,
GitLab, JWT, generic credentials), include base64 characters in the bearer
redaction, decode buffer log responses, and give failed-then-fixed proposals a
neutral title.
---
 scripts/capture-learnings-harvest.test.ts | 281 +++++++++++++++++++++-
 scripts/capture-learnings-harvest.ts      |  63 +++--
 scripts/capture-learnings-open.test.ts    |  78 +++++-
 scripts/capture-learnings-open.ts         |  17 +-
 scripts/capture-learnings-privacy.test.ts | 107 ++++++++
 scripts/capture-learnings-privacy.ts      |  11 +-
 6 files changed, 533 insertions(+), 24 deletions(-)

diff --git a/scripts/capture-learnings-harvest.test.ts b/scripts/capture-learnings-harvest.test.ts
index 893da70d1..b26411fd3 100644
--- a/scripts/capture-learnings-harvest.test.ts
+++ b/scripts/capture-learnings-harvest.test.ts
@@ -2249,9 +2249,10 @@ describe('findFailPassTransition', () => {
     expect(result?.firstPassingSha).toBe('sha2')
   })
 
-  it('all failing conclusions are recognized: TIMED_OUT, CANCELLED, ACTION_REQUIRED, STARTUP_FAILURE', () => {
+  it('all failing conclusions are recognized: FAILURE, TIMED_OUT, STARTUP_FAILURE', () => {
     // #given each failing conclusion type followed by SUCCESS
-    const failingConclusions = ['TIMED_OUT', 'CANCELLED', 'ACTION_REQUIRED', 'STARTUP_FAILURE']
+    // Note: CANCELLED and ACTION_REQUIRED are NOT failing conclusions (FIX 2)
+    const failingConclusions = ['FAILURE', 'TIMED_OUT', 'STARTUP_FAILURE']
     for (const conclusion of failingConclusions) {
       const commits: CommitCheckEntry[] = [
         makeCr('sha1', 'CI / test', conclusion),
@@ -2971,3 +2972,279 @@ describe('main() wiring: both candidate sources concatenated via buildCandidateD
     expect(parsed.telemetry.enrichmentBlockedBySecret).toBe(0)
   })
 })
+
+// ---------------------------------------------------------------------------
+// FIX 2: FAILING_CHECK_CONCLUSIONS — CANCELLED/ACTION_REQUIRED removed
+// ---------------------------------------------------------------------------
+
+describe('findFailPassTransition — FIX 2: CANCELLED and ACTION_REQUIRED are NOT failing conclusions', () => {
+  it('CANCELLED→SUCCESS is NOT a candidate (CANCELLED removed from failing set)', () => {
+    // #given a check that was CANCELLED then SUCCESS
+    // CANCELLED is a concurrency-cancelled run, not a code failure
+    const commits: CommitCheckEntry[] = [
+      makeCr('sha1', 'CI / test', 'CANCELLED'),
+      makeCr('sha2', 'CI / test', 'SUCCESS'),
+    ]
+    const required = new Set(['CI / test'])
+
+    // #when finding the transition
+    const result = findFailPassTransition(commits, required)
+
+    // #then no transition found (CANCELLED is not a failing conclusion)
+    expect(result).toBeNull()
+  })
+
+  it('ACTION_REQUIRED→SUCCESS is NOT a candidate (ACTION_REQUIRED removed from failing set)', () => {
+    // #given a check that was ACTION_REQUIRED then SUCCESS
+    // ACTION_REQUIRED is a policy gate, not a code failure
+    const commits: CommitCheckEntry[] = [
+      makeCr('sha1', 'CI / test', 'ACTION_REQUIRED'),
+      makeCr('sha2', 'CI / test', 'SUCCESS'),
+    ]
+    const required = new Set(['CI / test'])
+
+    // #when finding the transition
+    const result = findFailPassTransition(commits, required)
+
+    // #then no transition found (ACTION_REQUIRED is not a failing conclusion)
+    expect(result).toBeNull()
+  })
+
+  it('FAILURE→SUCCESS IS a candidate (FAILURE still in failing set)', () => {
+    // #given a check that was FAILURE then SUCCESS
+    const commits: CommitCheckEntry[] = [makeCr('sha1', 'CI / test', 'FAILURE'), makeCr('sha2', 'CI / test', 'SUCCESS')]
+    const required = new Set(['CI / test'])
+
+    // #when finding the transition
+    const result = findFailPassTransition(commits, required)
+
+    // #then transition IS found
+    expect(result).not.toBeNull()
+    expect(result?.lastFailingSha).toBe('sha1')
+    expect(result?.firstPassingSha).toBe('sha2')
+  })
+
+  it('StatusContext ERROR→SUCCESS IS detected (FIX 2b: ERROR treated as failing)', () => {
+    // #given a StatusContext entry with state=ERROR then SUCCESS
+    const commits: CommitCheckEntry[] = [makeSc('sha1', 'ci/test', 'ERROR'), makeSc('sha2', 'ci/test', 'SUCCESS')]
+    const required = new Set(['ci/test'])
+
+    // #when finding the transition
+    const result = findFailPassTransition(commits, required)
+
+    // #then the transition IS found (ERROR counts as failing)
+    expect(result).not.toBeNull()
+    expect(result?.failingCheckName).toBe('ci/test')
+    expect(result?.lastFailingSha).toBe('sha1')
+    expect(result?.firstPassingSha).toBe('sha2')
+  })
+
+  it('StatusContext PENDING→SUCCESS is NOT a candidate (PENDING is not failing)', () => {
+    // #given a StatusContext entry with state=PENDING then SUCCESS
+    const commits: CommitCheckEntry[] = [makeSc('sha1', 'ci/test', 'PENDING'), makeSc('sha2', 'ci/test', 'SUCCESS')]
+    const required = new Set(['ci/test'])
+
+    // #when finding the transition
+    const result = findFailPassTransition(commits, required)
+
+    // #then no transition found (PENDING is not a failing state)
+    expect(result).toBeNull()
+  })
+})
+
+// ---------------------------------------------------------------------------
+// FIX 3: failingCheckName included in privacy scan
+// ---------------------------------------------------------------------------
+
+describe('buildCandidateDigest — FIX 3: failingCheckName scanned for private names', () => {
+  it('private repo name in failingCheckName → evidence cleared, enrichmentBlocked incremented', () => {
+    // #given a ci-fix candidate whose failingCheckName contains a private repo name
+    const privateTokens = buildPrivateTokenSet(['testowner/secret-repo'])
+    const sha = 'fix3priv1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const candidate = makeCiFixCandidate({
+      mergeSha: sha,
+      failingCheckName: 'testowner/secret-repo CI check',
+      diffExcerpt: '-bad\n+good',
+    })
+    const input = makeDigestInput({mergedPrs: [candidate], privateTokens})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then enrichmentBlocked is incremented (private-name hit in failingCheckName)
+    expect(result.telemetry.enrichmentBlocked).toBe(1)
+    // #then evidence is cleared and failingCheckName is redacted
+    if (result.candidates[0]?.trigger === 'ci-fail-then-pass') {
+      expect(result.candidates[0].diffExcerpt).toBe('')
+      expect(result.candidates[0].failingCheckName).toBe('[REDACTED]')
+    }
+    // #then private name does not appear in serialized output
+    expect(JSON.stringify(result)).not.toContain('testowner/secret-repo')
+  })
+
+  it('clean failingCheckName passes through unmodified', () => {
+    // #given a ci-fix candidate with a clean failingCheckName
+    const sha = 'fix3clean1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const candidate = makeCiFixCandidate({
+      mergeSha: sha,
+      failingCheckName: 'CI / test',
+      diffExcerpt: '-bad\n+good',
+    })
+    const input = makeDigestInput({mergedPrs: [candidate], privateTokens: new Set()})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then candidate is emitted with failingCheckName intact
+    expect(result.candidates).toHaveLength(1)
+    if (result.candidates[0]?.trigger === 'ci-fail-then-pass') {
+      expect(result.candidates[0].failingCheckName).toBe('CI / test')
+    }
+    expect(result.telemetry.enrichmentBlocked).toBe(0)
+  })
+})
+
+// ---------------------------------------------------------------------------
+// FIX 9: within-run dedup precedence (review-heavy wins regardless of array order)
+// ---------------------------------------------------------------------------
+
+describe('buildCandidateDigest — FIX 9: within-run dedup precedence by trigger, not array order', () => {
+  it('review-heavy FIRST, ci-fix SECOND → review-heavy wins (array-order-first case)', () => {
+    // #given review-heavy listed first, ci-fix second — same SHA
+    const sha = 'dedup001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const reviewCandidate = makeCandidate({mergeSha: sha})
+    const ciFixCandidate = makeCiFixCandidate({mergeSha: sha})
+    const input = makeDigestInput({mergedPrs: [reviewCandidate, ciFixCandidate]})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then exactly one candidate, review-heavy wins
+    expect(result.candidates).toHaveLength(1)
+    expect(result.candidates[0]?.trigger).toBe('review-heavy')
+  })
+
+  it('ci-fix FIRST, review-heavy SECOND → review-heavy STILL wins (proves precedence is by trigger, not array order)', () => {
+    // #given ci-fix listed first, review-heavy second — same SHA
+    // A 'last-wins' bug would emit ci-fail-then-pass here; correct behavior emits review-heavy
+    const sha = 'dedup002aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const ciFixCandidate = makeCiFixCandidate({mergeSha: sha})
+    const reviewCandidate = makeCandidate({mergeSha: sha})
+    const input = makeDigestInput({mergedPrs: [ciFixCandidate, reviewCandidate]})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then exactly one candidate, review-heavy wins regardless of array order
+    expect(result.candidates).toHaveLength(1)
+    expect(result.candidates[0]?.trigger).toBe('review-heavy')
+  })
+})
+
+// ---------------------------------------------------------------------------
+// FIX 9: applyEnrichmentScanAvailability — ci-fix candidate test
+// ---------------------------------------------------------------------------
+
+describe('applyEnrichmentScanAvailability — FIX 9: ci-fix candidate clears diffExcerpt + logExcerpt', () => {
+  it('scanAvailable=false clears diffExcerpt and logExcerpt for ci-fix candidates', () => {
+    // #given a ci-fix candidate with evidence fields
+    const candidates: CiFixCandidate[] = [
+      makeCiFixCandidate({
+        mergeSha: `sha1${'0'.repeat(35)}`,
+        diffExcerpt: '-bad\n+good',
+        logExcerpt: 'Error: test failed',
+      }),
+    ]
+
+    // #when scan is unavailable
+    const result = applyEnrichmentScanAvailability(candidates, false)
+
+    // #then diffExcerpt is cleared and logExcerpt is undefined
+    const r0 = result[0]
+    if (r0?.trigger === 'ci-fail-then-pass') {
+      expect(r0.diffExcerpt).toBe('')
+      expect(r0.logExcerpt).toBeUndefined()
+    }
+    // #then other fields are preserved
+    expect(result[0]?.mergeSha).toBe(candidates[0]?.mergeSha)
+  })
+
+  it('scanAvailable=true → ci-fix candidate returned unchanged', () => {
+    // #given a ci-fix candidate with evidence fields
+    const candidates: CiFixCandidate[] = [
+      makeCiFixCandidate({
+        mergeSha: `sha1${'0'.repeat(35)}`,
+        diffExcerpt: '-bad\n+good',
+        logExcerpt: 'Error: test failed',
+      }),
+    ]
+
+    // #when scan is available
+    const result = applyEnrichmentScanAvailability(candidates, true)
+
+    // #then candidates are returned unchanged (same reference)
+    expect(result).toBe(candidates)
+    const r0 = result[0]
+    if (r0?.trigger === 'ci-fail-then-pass') {
+      expect(r0.diffExcerpt).toBe('-bad\n+good')
+      expect(r0.logExcerpt).toBe('Error: test failed')
+    }
+  })
+})
+
+// ---------------------------------------------------------------------------
+// FIX 9: CiFix logExcerpt privacy — private name and path in logExcerpt
+// ---------------------------------------------------------------------------
+
+describe('buildCandidateDigest — FIX 9: CiFix logExcerpt privacy scan', () => {
+  it('private repo name in logExcerpt (not diffExcerpt) → evidence dropped', () => {
+    // #given a ci-fix candidate whose logExcerpt contains a private repo name
+    // but diffExcerpt is clean
+    const privateTokens = buildPrivateTokenSet(['testowner/secret-repo'])
+    const sha = 'fix9log1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const candidate = makeCiFixCandidate({
+      mergeSha: sha,
+      diffExcerpt: '-bad\n+good',
+      logExcerpt: 'Error: testowner/secret-repo check failed',
+    })
+    const input = makeDigestInput({mergedPrs: [candidate], privateTokens})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then enrichmentBlocked is incremented (private name in logExcerpt)
+    expect(result.telemetry.enrichmentBlocked).toBe(1)
+    // #then both diffExcerpt and logExcerpt are cleared
+    if (result.candidates[0]?.trigger === 'ci-fail-then-pass') {
+      expect(result.candidates[0].diffExcerpt).toBe('')
+      expect(result.candidates[0].logExcerpt).toBeUndefined()
+    }
+    // #then private name does not appear in serialized output
+    expect(JSON.stringify(result)).not.toContain('testowner/secret-repo')
+  })
+
+  it('path in logExcerpt is redacted (not blocked), candidate still emitted', () => {
+    // #given a ci-fix candidate whose logExcerpt contains a file path (redact-class)
+    const sha = 'fix9log2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
+    const candidate = makeCiFixCandidate({
+      mergeSha: sha,
+      diffExcerpt: '-bad\n+good',
+      logExcerpt: 'Config loaded from /Users/marcus/.ssh/config',
+    })
+    const input = makeDigestInput({mergedPrs: [candidate], privateTokens: new Set()})
+
+    // #when building the digest
+    const result = buildCandidateDigest(input)
+
+    // #then the candidate is emitted (not blocked)
+    expect(result.candidates).toHaveLength(1)
+    // #then the path is redacted in logExcerpt
+    if (result.candidates[0]?.trigger === 'ci-fail-then-pass') {
+      expect(result.candidates[0].logExcerpt).not.toContain('/Users/marcus')
+      expect(result.candidates[0].logExcerpt).toContain('[REDACTED]')
+    }
+    // #then neither counter is incremented (redaction, not blocking)
+    expect(result.telemetry.enrichmentBlocked).toBe(0)
+    expect(result.telemetry.enrichmentBlockedBySecret).toBe(0)
+  })
+})
diff --git a/scripts/capture-learnings-harvest.ts b/scripts/capture-learnings-harvest.ts
index ce845c10e..42a3ebd9f 100644
--- a/scripts/capture-learnings-harvest.ts
+++ b/scripts/capture-learnings-harvest.ts
@@ -11,6 +11,7 @@
  */
 
 import type {Dirent} from 'node:fs'
+import {Buffer} from 'node:buffer'
 import {execFileSync} from 'node:child_process'
 import {readdir, readFile, writeFile} from 'node:fs/promises'
 import process from 'node:process'
@@ -390,10 +391,11 @@ export function buildCandidateDigest(input: BuildCandidateDigestInput): Candidat
       const rawDiff = pr.diffExcerpt
       const rawLog = pr.logExcerpt ?? ''
 
-      // Step 1: redact structural secrets
+      // Step 1: redact structural secrets (including failingCheckName)
       const redactedDiff = redactLogDiffSecrets(rawDiff)
       const redactedLog = rawLog === '' ? rawLog : redactLogDiffSecrets(rawLog)
-      const combinedText = `${redactedDiff}\n${redactedLog}`
+      const redactedCheckName = redactLogDiffSecrets(pr.failingCheckName)
+      const combinedText = `${redactedDiff}\n${redactedLog}\n${redactedCheckName}`
 
       // Step 2: check for private-name leak or residual hard secret
       const hasPrivateLeak = learningBodyHasPrivateLeak(combinedText, input.privateTokens)
@@ -401,15 +403,17 @@ export function buildCandidateDigest(input: BuildCandidateDigestInput): Candidat
 
       if (hasPrivateLeak) {
         enrichmentBlocked++
-        return {...pr, diffExcerpt: '', logExcerpt: undefined}
+        // Clear failingCheckName too — it may contain the private name that triggered the block
+        return {...pr, failingCheckName: '[REDACTED]', diffExcerpt: '', logExcerpt: undefined}
       }
       if (hasResidualSecret) {
         enrichmentBlockedBySecret++
-        return {...pr, diffExcerpt: '', logExcerpt: undefined}
+        // Clear failingCheckName too — it may contain the secret that triggered the block
+        return {...pr, failingCheckName: '[REDACTED]', diffExcerpt: '', logExcerpt: undefined}
       }
 
-      // Step 3: use the redacted values
-      const result: CiFixCandidate = {...pr, diffExcerpt: redactedDiff}
+      // Step 3: use the redacted values (including redacted failingCheckName)
+      const result: CiFixCandidate = {...pr, diffExcerpt: redactedDiff, failingCheckName: redactedCheckName}
       if (redactedLog !== '') {
         return {...result, logExcerpt: redactedLog}
       }
@@ -748,7 +752,7 @@ export interface FailPassTransition {
  * Failing conclusions for CheckRun (GitHub Actions).
  * These are the states that count as "failed" for transition detection.
  */
-const FAILING_CHECK_CONCLUSIONS = new Set(['FAILURE', 'TIMED_OUT', 'CANCELLED', 'ACTION_REQUIRED', 'STARTUP_FAILURE'])
+const FAILING_CHECK_CONCLUSIONS = new Set(['FAILURE', 'TIMED_OUT', 'STARTUP_FAILURE'])
 
 /**
  * Find the first fail→pass transition across a set of commits for required checks.
@@ -796,8 +800,8 @@ export function findFailPassTransition(
           // Update lastFailingSha — we want the LATEST failing commit
           lastFailingSha = entry.sha
         }
-      } else if (entry.state === 'FAILURE') {
-        // StatusContext: FAILURE state counts as failing
+      } else if (entry.state === 'FAILURE' || entry.state === 'ERROR') {
+        // StatusContext: FAILURE and ERROR states count as failing
         lastFailingSha = entry.sha
       }
     }
@@ -883,6 +887,7 @@ export function defaultGhExec(args: string[], env?: NodeJS.ProcessEnv): string {
     encoding: 'utf8',
     env: env ?? process.env,
     stdio: ['pipe', 'pipe', 'pipe'],
+    timeout: 30_000,
   })
 }
 
@@ -907,9 +912,13 @@ export function fetchPrCommitCheckRollup(
 
   const allEntries: CommitCheckEntry[] = []
   let after: string | null = null
+  const MAX_PAGES = 50
+  let pageCount = 0
 
   // Cursor walk — handles PRs with >100 commits
   for (;;) {
+    if (pageCount >= MAX_PAGES) break
+    pageCount++
     const args = [
       'api',
       'graphql',
@@ -930,6 +939,8 @@ export function fetchPrCommitCheckRollup(
     try {
       stdout = ghExec(args, env)
     } catch {
+      // If we already have entries from earlier pages, return partial data rather than discarding
+      if (allEntries.length > 0) break
       return null
     }
 
@@ -937,20 +948,39 @@ export function fetchPrCommitCheckRollup(
     try {
       parsed = JSON.parse(stdout)
     } catch {
+      if (allEntries.length > 0) break
       return null
     }
 
-    if (!isRecord(parsed)) return null
+    if (!isRecord(parsed)) {
+      if (allEntries.length > 0) break
+      return null
+    }
     const data = parsed.data
-    if (!isRecord(data)) return null
+    if (!isRecord(data)) {
+      if (allEntries.length > 0) break
+      return null
+    }
     const repository = data.repository
-    if (!isRecord(repository)) return null
+    if (!isRecord(repository)) {
+      if (allEntries.length > 0) break
+      return null
+    }
     const pullRequest = repository.pullRequest
-    if (!isRecord(pullRequest)) return null
+    if (!isRecord(pullRequest)) {
+      if (allEntries.length > 0) break
+      return null
+    }
     const commits = pullRequest.commits
-    if (!isRecord(commits)) return null
+    if (!isRecord(commits)) {
+      if (allEntries.length > 0) break
+      return null
+    }
     const nodes = commits.nodes
-    if (!Array.isArray(nodes)) return null
+    if (!Array.isArray(nodes)) {
+      if (allEntries.length > 0) break
+      return null
+    }
 
     for (const node of nodes) {
       if (!isRecord(node)) continue
@@ -1167,7 +1197,8 @@ export async function harvestCiFixCandidates(
             repo,
             job_id: failedJob.id,
           })
-          const logText = typeof logResponse.data === 'string' ? logResponse.data : ''
+          const raw = logResponse.data
+          const logText = typeof raw === 'string' ? raw : Buffer.isBuffer(raw) ? raw.toString('utf8') : ''
           if (logText !== '') {
             logExcerpt = buildLogExcerpt(logText, MAX_EXCERPT_CHARS_PER_CANDIDATE)
           }
diff --git a/scripts/capture-learnings-open.test.ts b/scripts/capture-learnings-open.test.ts
index 848828c98..862763805 100644
--- a/scripts/capture-learnings-open.test.ts
+++ b/scripts/capture-learnings-open.test.ts
@@ -565,7 +565,7 @@ describe('openLearningIssues', () => {
     expect(counts.failed).toBe(0)
   })
 
-  it('includes the title derived from the mergeSha (short SHA, no private info)', async () => {
+  it('FIX 7: title does NOT contain "review-heavy" (neutral title for all trigger types)', async () => {
     // #given a learning with a known mergeSha
     const sha = 'deadbeefdeadbeefdeadbeefdeadbeefdeadbeef'
     const createSpy = vi.fn().mockResolvedValue({data: {number: 1}})
@@ -580,5 +580,81 @@ describe('openLearningIssues', () => {
     const callArg = createSpy.mock.calls[0]?.[0] as IssuesCreateCall
     expect(callArg.title).toContain('deadbeef')
     expect(callArg.title).toContain('Learning proposal')
+    // #then the title does NOT hardcode 'review-heavy' (misleading for ci-fix candidates)
+    expect(callArg.title).not.toContain('review-heavy')
+  })
+})
+
+// ---------------------------------------------------------------------------
+// FIX 6: planLearnings blocks bodies containing hard secrets (logDiffHasSecret)
+// ---------------------------------------------------------------------------
+
+describe('planLearnings — FIX 6: logDiffHasSecret as final defense-in-depth gate', () => {
+  it('blocks a body containing a ghp_ secret (logDiffHasSecret gate)', () => {
+    // #given an agent-authored body containing a GitHub PAT
+    const sha = 'abc123def456abc123def456abc123def456abc1'
+    // Using an obviously-fake token — 'A'.repeat(36) is not a real credential
+    const body = `This learning references a fix. Token: ghp_${'A'.repeat(36)} was used.`
+    const input = makePlanInput({
+      candidates: [makeCandidate({mergeSha: sha})],
+      learningBodies: new Map([[sha, body]]),
+      privateTokens: new Set(), // no private-name tokens — only secret scan fires
+    })
+
+    // #when planning
+    const result = planLearnings(input)
+
+    // #then the candidate is blocked (secret in body)
+    expect(result.toCreate).toHaveLength(0)
+    expect(result.blockedOnPrivacy).toBe(1)
+  })
+
+  it('MUTATION PROOF: removing logDiffHasSecret check lets the ghp_ secret through', () => {
+    // #given an agent-authored body containing a GitHub PAT
+    const sha = 'abc123def456abc123def456abc123def456abc1'
+    const body = `Learning body with ghp_${'A'.repeat(36)} embedded.`
+
+    // #when planning WITH the secret gate (normal path — non-empty secret in body)
+    const withGate = planLearnings(
+      makePlanInput({
+        candidates: [makeCandidate({mergeSha: sha})],
+        learningBodies: new Map([[sha, body]]),
+        privateTokens: new Set(),
+      }),
+    )
+    // #then the candidate is blocked
+    expect(withGate.toCreate).toHaveLength(0)
+    expect(withGate.blockedOnPrivacy).toBe(1)
+
+    // #when planning WITHOUT the secret (simulated by using a clean body — gate removed)
+    const cleanBody = 'A clean learning body with no secrets.'
+    const withoutSecret = planLearnings(
+      makePlanInput({
+        candidates: [makeCandidate({mergeSha: sha})],
+        learningBodies: new Map([[sha, cleanBody]]),
+        privateTokens: new Set(),
+      }),
+    )
+    // #then the clean body passes through — proving the gate was the blocker
+    expect(withoutSecret.toCreate).toHaveLength(1)
+    expect(withoutSecret.blockedOnPrivacy).toBe(0)
+  })
+
+  it('clean body with no secrets passes through (gate does not over-block)', () => {
+    // #given a clean agent-authored body
+    const sha = 'abc123def456abc123def456abc123def456abc1'
+    const body = 'A clean learning about CI improvements and best practices.'
+    const input = makePlanInput({
+      candidates: [makeCandidate({mergeSha: sha})],
+      learningBodies: new Map([[sha, body]]),
+      privateTokens: new Set(),
+    })
+
+    // #when planning
+    const result = planLearnings(input)
+
+    // #then the candidate is included (no secrets, no private names)
+    expect(result.toCreate).toHaveLength(1)
+    expect(result.blockedOnPrivacy).toBe(0)
   })
 })
diff --git a/scripts/capture-learnings-open.ts b/scripts/capture-learnings-open.ts
index 9c3983509..a9acea329 100644
--- a/scripts/capture-learnings-open.ts
+++ b/scripts/capture-learnings-open.ts
@@ -29,7 +29,12 @@ import {
   type CandidateDigest,
   type OctokitClient,
 } from './capture-learnings-harvest.ts'
-import {isRecord, learningBodyHasPrivateLeak, loadPrivateTokensFromDisk} from './capture-learnings-privacy.ts'
+import {
+  isRecord,
+  learningBodyHasPrivateLeak,
+  loadPrivateTokensFromDisk,
+  logDiffHasSecret,
+} from './capture-learnings-privacy.ts'
 
 // ---------------------------------------------------------------------------
 // Types
@@ -128,8 +133,8 @@ export function planLearnings(input: PlanLearningsInput): PlanLearningsResult {
       continue
     }
 
-    // Privacy gate: block if body contains a private identifier
-    if (learningBodyHasPrivateLeak(body, input.privateTokens)) {
+    // Privacy gate: block if body contains a private identifier or a hard secret
+    if (learningBodyHasPrivateLeak(body, input.privateTokens) || logDiffHasSecret(body)) {
       blockedOnPrivacy += 1
       continue
     }
@@ -208,10 +213,14 @@ export async function ensureLabelsExist(
  * Derive a learning issue title from a mergeSha.
  * Uses only the merge SHA (public, safe for this public repo) — no owner/repo/number prose.
  * Short SHA = first 8 chars.
+ *
+ * The open step works off agent-authored bodies keyed by mergeSha and does not have
+ * the trigger at title-derivation time. A neutral title avoids misleading 'review-heavy'
+ * for ci-fail-then-pass candidates.
  */
 function deriveLearningTitle(mergeSha: string): string {
   const shortSha = mergeSha.slice(0, 8)
-  return `Learning proposal: review-heavy PR (${shortSha})`
+  return `Learning proposal: (${shortSha})`
 }
 
 // ---------------------------------------------------------------------------
diff --git a/scripts/capture-learnings-privacy.test.ts b/scripts/capture-learnings-privacy.test.ts
index ce6bf873d..0bbaa687c 100644
--- a/scripts/capture-learnings-privacy.test.ts
+++ b/scripts/capture-learnings-privacy.test.ts
@@ -616,5 +616,112 @@ describe('redactLogDiffSecrets', () => {
       expect(result).toContain('called')
       expect(result).toContain('/endpoint')
     })
+
+    it('FIX 5: Bearer token with base64 chars (+/) is fully redacted', () => {
+      // #given a standalone Bearer token containing + and / (valid base64 chars)
+      // Note: using a standalone Bearer (not inside an Authorization header) so the
+      // Bearer pattern fires rather than the Authorization header pattern
+      const body = 'Request sent with Bearer abc+def/ghi12345 to the API'
+
+      // #when redacting
+      const result = redactLogDiffSecrets(body)
+
+      // #then the token value is fully redacted (no tail survives)
+      expect(result).not.toContain('abc+def/ghi12345')
+      expect(result).toContain('Bearer [REDACTED]')
+    })
+  })
+})
+
+// ---------------------------------------------------------------------------
+// FIX 4: New SECRET_BLOCK_PATTERNS — Google API key, GitLab PAT, JWT, generic creds
+// ---------------------------------------------------------------------------
+
+describe('logDiffHasSecret — FIX 4: new block patterns', () => {
+  it('detects a Google API key (AIza prefix + 35 chars)', () => {
+    // #given a body containing a Google API key shape
+    // Using an obviously-fake key — 'A'.repeat(35) is not a real credential
+    const body = `GOOGLE_API_KEY=AIza${'A'.repeat(35)}`
+
+    // #when scanning
+    // #then the secret is detected and must block
+    expect(logDiffHasSecret(body)).toBe(true)
+  })
+
+  it('does NOT block a short AIza string (below 35 chars)', () => {
+    // #given a body with AIza prefix but too short
+    const body = `AIza${'A'.repeat(10)}`
+
+    // #when scanning
+    // #then not blocked (too short)
+    expect(logDiffHasSecret(body)).toBe(false)
+  })
+
+  it('detects a GitLab PAT (glpat- prefix + 20+ chars)', () => {
+    // #given a body containing a GitLab PAT shape
+    // Using an obviously-fake token — 'B'.repeat(20) is not a real credential
+    const body = `GITLAB_TOKEN=glpat-${'B'.repeat(20)}`
+
+    // #when scanning
+    // #then the secret is detected and must block
+    expect(logDiffHasSecret(body)).toBe(true)
+  })
+
+  it('detects a JWT (header.payload.signature format)', () => {
+    // #given a body containing a JWT shape
+    // Using obviously-fake base64url segments
+    const header = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9'
+    const payload = 'eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIn0'
+    const sig = 'SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c'
+    const body = `token=${header}.${payload}.${sig}`
+
+    // #when scanning
+    // #then the secret is detected and must block
+    expect(logDiffHasSecret(body)).toBe(true)
+  })
+
+  it('detects a generic password assignment (password= with 16+ char value)', () => {
+    // #given a body containing a password assignment
+    const body = `password=supersecretvalue123456`
+
+    // #when scanning
+    // #then the secret is detected and must block
+    expect(logDiffHasSecret(body)).toBe(true)
+  })
+
+  it('detects a generic secret assignment (secret: with 16+ char value)', () => {
+    // #given a body containing a secret assignment
+    const body = `secret: mysupersecretvalue123`
+
+    // #when scanning
+    // #then the secret is detected and must block
+    expect(logDiffHasSecret(body)).toBe(true)
+  })
+
+  it('detects a generic api_key assignment (api_key= with 16+ char value)', () => {
+    // #given a body containing an api_key assignment
+    const body = `api_key=abcdefghijklmnop`
+
+    // #when scanning
+    // #then the secret is detected and must block
+    expect(logDiffHasSecret(body)).toBe(true)
+  })
+
+  it('detects a generic access_token assignment (access_token= with 16+ char value)', () => {
+    // #given a body containing an access_token assignment
+    const body = `access_token=abcdefghijklmnopqrst`
+
+    // #when scanning
+    // #then the secret is detected and must block
+    expect(logDiffHasSecret(body)).toBe(true)
+  })
+
+  it('does NOT block a short generic credential value (below 16 chars)', () => {
+    // #given a body with a short password value (below the 16-char minimum)
+    const body = `password=short`
+
+    // #when scanning
+    // #then not blocked (too short to be a real credential)
+    expect(logDiffHasSecret(body)).toBe(false)
   })
 })
diff --git a/scripts/capture-learnings-privacy.ts b/scripts/capture-learnings-privacy.ts
index d365af902..b4ef2547a 100644
--- a/scripts/capture-learnings-privacy.ts
+++ b/scripts/capture-learnings-privacy.ts
@@ -61,6 +61,14 @@ const SECRET_BLOCK_PATTERNS: RegExp[] = [
   /sk-[A-Za-z0-9\-]{20,}/,
   // Slack tokens
   /xox[bpars]-[\w-]{10,}/,
+  // Google API keys
+  /AIza[\w\-]{35}/,
+  // GitLab PATs
+  /glpat-[\w\-]{20,}/,
+  // JWT (header.payload.signature)
+  /eyJ[\w\-]+\.eyJ[\w\-]+\.[\w\-]+/,
+  // Generic credential assignment catch-all (password=, secret=, api_key=, etc.)
+  /(?:password|passwd|secret|api[._-]?key|auth[._-]?token|access[._-]?token)["']?\s*[=:]\s*["']?[^\s"']{16,}/i,
 ]
 
 /**
@@ -97,7 +105,8 @@ export function logDiffHasSecret(body: string): boolean {
  */
 const SECRET_REDACT_PATTERNS: [RegExp, string][] = [
   // Bearer tokens (min 8 chars to avoid over-redacting short placeholders)
-  [/Bearer\s+[\w\-.=]{8,}/g, 'Bearer [REDACTED]'],
+  // Includes +/ for base64-encoded values
+  [/Bearer\s+[\w\-.=+/]{8,}/g, 'Bearer [REDACTED]'],
   // Authorization header values: capture scheme + credential (e.g. 'Basic dXNlcjpwYXNz', 'Bearer token')
   // Matches two whitespace-delimited tokens after the colon, or one token of 8+ chars.
   [/Authorization:\s*(?:\S+\s+\S{4,}|\S{8,})/g, 'Authorization: [REDACTED]'],

From e0af2cc8c7729b9af06a554b5d000f431dec5370 Mon Sep 17 00:00:00 2001
From: "Marcus R. Brown" <git@mrbro.dev>
Date: Tue, 23 Jun 2026 17:01:48 -0700
Subject: [PATCH 7/7] fix(capture): keep stack-trace context in failing-log
 excerpts

Emit contiguous windows around each error line instead of a global error-first
partition, so each error keeps the surrounding stack-trace lines that make the
failure legible; merge overlapping windows and fall back to the log head when no
error line is present.
---
 scripts/capture-learnings-harvest.test.ts | 73 +++++++++++++++++++----
 scripts/capture-learnings-harvest.ts      | 50 ++++++++++++----
 scripts/capture-learnings-open.test.ts    |  6 +-
 scripts/capture-learnings-privacy.test.ts |  6 +-
 4 files changed, 104 insertions(+), 31 deletions(-)

diff --git a/scripts/capture-learnings-harvest.test.ts b/scripts/capture-learnings-harvest.test.ts
index b26411fd3..94a2da5d1 100644
--- a/scripts/capture-learnings-harvest.test.ts
+++ b/scripts/capture-learnings-harvest.test.ts
@@ -12,6 +12,7 @@ import {describe, expect, it, vi} from 'vitest'
 import {
   applyEnrichmentScanAvailability,
   buildCandidateDigest,
+  buildLogExcerpt,
   buildMergeShaMarker,
   DEPENDENCY_LABELS,
   fetchMergedPrsInWindow,
@@ -2142,6 +2143,54 @@ function makeSc(sha: string, context: string, state: string): CommitCheckEntry {
   return {type: 'StatusContext', sha, context, state}
 }
 
+describe('buildLogExcerpt', () => {
+  it('keeps the stack-trace context around an error line, not just the error line', () => {
+    // #given a log where an error line is surrounded by stack-trace context
+    const log = [
+      'Run tests',
+      'preamble line a',
+      'preamble line b',
+      'at module.load (foo.ts:10)',
+      'at caller (bar.ts:20)',
+      'Error: boom happened here',
+      'at deeper (baz.ts:30)',
+      'at deepest (qux.ts:40)',
+      'trailing line',
+    ].join('\n')
+
+    // #when building the excerpt with a generous budget
+    const excerpt = buildLogExcerpt(log, 10_000)
+
+    // #then the error line AND its surrounding stack-trace lines are present (context preserved)
+    expect(excerpt).toContain('Error: boom happened here')
+    expect(excerpt).toContain('at caller (bar.ts:20)') // before the error
+    expect(excerpt).toContain('at deeper (baz.ts:30)') // after the error
+  })
+
+  it('falls back to the head of the log when no error line is present', () => {
+    // #given a log with no error/failed lines
+    const log = Array.from({length: 50}, (_, i) => `info line ${i}`).join('\n')
+
+    // #when building the excerpt
+    const excerpt = buildLogExcerpt(log, 100)
+
+    // #then it returns the head of the log, truncated to budget
+    expect(excerpt.startsWith('info line 0')).toBe(true)
+    expect(excerpt.length).toBeLessThanOrEqual(100)
+  })
+
+  it('truncates to budget', () => {
+    // #given a log with an error and a large surrounding body
+    const log = ['error: start', ...Array.from({length: 500}, (_, i) => `line ${i}`)].join('\n')
+
+    // #when building with a small budget
+    const excerpt = buildLogExcerpt(log, 50)
+
+    // #then the result is capped at budget
+    expect(excerpt.length).toBeLessThanOrEqual(50)
+  })
+})
+
 describe('findFailPassTransition', () => {
   it('happy path: clean fail→pass on a required check returns correct SHAs', () => {
     // #given commits oldest→newest: sha1=FAILURE, sha2=SUCCESS
@@ -2251,7 +2300,7 @@ describe('findFailPassTransition', () => {
 
   it('all failing conclusions are recognized: FAILURE, TIMED_OUT, STARTUP_FAILURE', () => {
     // #given each failing conclusion type followed by SUCCESS
-    // Note: CANCELLED and ACTION_REQUIRED are NOT failing conclusions (FIX 2)
+    // Note: CANCELLED and ACTION_REQUIRED are NOT failing conclusions
     const failingConclusions = ['FAILURE', 'TIMED_OUT', 'STARTUP_FAILURE']
     for (const conclusion of failingConclusions) {
       const commits: CommitCheckEntry[] = [
@@ -2974,10 +3023,10 @@ describe('main() wiring: both candidate sources concatenated via buildCandidateD
 })
 
 // ---------------------------------------------------------------------------
-// FIX 2: FAILING_CHECK_CONCLUSIONS — CANCELLED/ACTION_REQUIRED removed
+// FAILING_CHECK_CONCLUSIONS — CANCELLED/ACTION_REQUIRED are not failures
 // ---------------------------------------------------------------------------
 
-describe('findFailPassTransition — FIX 2: CANCELLED and ACTION_REQUIRED are NOT failing conclusions', () => {
+describe('findFailPassTransition — cancelled and action-required are not failing conclusions', () => {
   it('CANCELLED→SUCCESS is NOT a candidate (CANCELLED removed from failing set)', () => {
     // #given a check that was CANCELLED then SUCCESS
     // CANCELLED is a concurrency-cancelled run, not a code failure
@@ -3024,7 +3073,7 @@ describe('findFailPassTransition — FIX 2: CANCELLED and ACTION_REQUIRED are NO
     expect(result?.firstPassingSha).toBe('sha2')
   })
 
-  it('StatusContext ERROR→SUCCESS IS detected (FIX 2b: ERROR treated as failing)', () => {
+  it('StatusContext ERROR→SUCCESS IS detected (ERROR treated as failing)', () => {
     // #given a StatusContext entry with state=ERROR then SUCCESS
     const commits: CommitCheckEntry[] = [makeSc('sha1', 'ci/test', 'ERROR'), makeSc('sha2', 'ci/test', 'SUCCESS')]
     const required = new Set(['ci/test'])
@@ -3053,10 +3102,10 @@ describe('findFailPassTransition — FIX 2: CANCELLED and ACTION_REQUIRED are NO
 })
 
 // ---------------------------------------------------------------------------
-// FIX 3: failingCheckName included in privacy scan
+// failingCheckName included in privacy scan
 // ---------------------------------------------------------------------------
 
-describe('buildCandidateDigest — FIX 3: failingCheckName scanned for private names', () => {
+describe('buildCandidateDigest — failingCheckName scanned for private names', () => {
   it('private repo name in failingCheckName → evidence cleared, enrichmentBlocked incremented', () => {
     // #given a ci-fix candidate whose failingCheckName contains a private repo name
     const privateTokens = buildPrivateTokenSet(['testowner/secret-repo'])
@@ -3105,10 +3154,10 @@ describe('buildCandidateDigest — FIX 3: failingCheckName scanned for private n
 })
 
 // ---------------------------------------------------------------------------
-// FIX 9: within-run dedup precedence (review-heavy wins regardless of array order)
+// within-run dedup precedence (review-heavy wins regardless of array order)
 // ---------------------------------------------------------------------------
 
-describe('buildCandidateDigest — FIX 9: within-run dedup precedence by trigger, not array order', () => {
+describe('buildCandidateDigest — within-run dedup precedence by trigger, not array order', () => {
   it('review-heavy FIRST, ci-fix SECOND → review-heavy wins (array-order-first case)', () => {
     // #given review-heavy listed first, ci-fix second — same SHA
     const sha = 'dedup001aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'
@@ -3142,10 +3191,10 @@ describe('buildCandidateDigest — FIX 9: within-run dedup precedence by trigger
 })
 
 // ---------------------------------------------------------------------------
-// FIX 9: applyEnrichmentScanAvailability — ci-fix candidate test
+// applyEnrichmentScanAvailability — ci-fix candidate test
 // ---------------------------------------------------------------------------
 
-describe('applyEnrichmentScanAvailability — FIX 9: ci-fix candidate clears diffExcerpt + logExcerpt', () => {
+describe('applyEnrichmentScanAvailability — ci-fix candidate clears diffExcerpt + logExcerpt', () => {
   it('scanAvailable=false clears diffExcerpt and logExcerpt for ci-fix candidates', () => {
     // #given a ci-fix candidate with evidence fields
     const candidates: CiFixCandidate[] = [
@@ -3193,10 +3242,10 @@ describe('applyEnrichmentScanAvailability — FIX 9: ci-fix candidate clears dif
 })
 
 // ---------------------------------------------------------------------------
-// FIX 9: CiFix logExcerpt privacy — private name and path in logExcerpt
+// CiFix logExcerpt privacy — private name and path in logExcerpt
 // ---------------------------------------------------------------------------
 
-describe('buildCandidateDigest — FIX 9: CiFix logExcerpt privacy scan', () => {
+describe('buildCandidateDigest — CiFix logExcerpt privacy scan', () => {
   it('private repo name in logExcerpt (not diffExcerpt) → evidence dropped', () => {
     // #given a ci-fix candidate whose logExcerpt contains a private repo name
     // but diffExcerpt is clean
diff --git a/scripts/capture-learnings-harvest.ts b/scripts/capture-learnings-harvest.ts
index 42a3ebd9f..e342973e1 100644
--- a/scripts/capture-learnings-harvest.ts
+++ b/scripts/capture-learnings-harvest.ts
@@ -1063,27 +1063,51 @@ function buildDiffExcerpt(files: {filename?: string; patch?: string}[], budget:
 // ---------------------------------------------------------------------------
 
 /**
- * Build a log excerpt from raw job log text, ranked toward error lines.
- * Truncates to budget. Returns the excerpt string.
+ * Build a log excerpt from raw job log text, centered on error lines with context.
+ *
+ * Emits contiguous windows around each error line (a few lines before and after) rather
+ * than a global error-first partition, so each error keeps its surrounding stack-trace
+ * context — that context is what makes "this failure → this fix" legible. Windows are
+ * merged when they overlap, kept in original order, and the result is truncated to budget.
+ * Falls back to the head of the log when no error line is found.
  */
-function buildLogExcerpt(logText: string, budget: number): string {
+export function buildLogExcerpt(logText: string, budget: number): string {
   const lines = logText.split('\n')
+  const CONTEXT_BEFORE = 3
+  const CONTEXT_AFTER = 6
 
-  // Rank error lines first (lines containing 'error', 'Error', 'ERROR', 'FAILED', 'failed')
-  const errorLines: string[] = []
-  const otherLines: string[] = []
+  // Find error line indices.
+  const errorIndices: number[] = []
+  for (const [i, line] of lines.entries()) {
+    if (/error|failed/i.test(line ?? '')) errorIndices.push(i)
+  }
 
-  for (const line of lines) {
-    if (/error|failed/i.test(line)) {
-      errorLines.push(line)
+  // No error line found — fall back to the head of the log.
+  if (errorIndices.length === 0) return lines.join('\n').slice(0, budget)
+
+  // Build merged [start, end] windows around each error line (in original order).
+  const windows: {start: number; end: number}[] = []
+  for (const idx of errorIndices) {
+    const start = Math.max(0, idx - CONTEXT_BEFORE)
+    const end = Math.min(lines.length - 1, idx + CONTEXT_AFTER)
+    const last = windows.at(-1)
+    if (last !== undefined && start <= last.end + 1) {
+      // Overlaps or is adjacent to the previous window — merge.
+      last.end = Math.max(last.end, end)
     } else {
-      otherLines.push(line)
+      windows.push({start, end})
     }
   }
 
-  const ranked = [...errorLines, ...otherLines]
-  const joined = ranked.join('\n')
-  return joined.slice(0, budget)
+  // Emit the windows in order, separated by an elision marker when they are non-contiguous.
+  const segments: string[] = []
+  for (const [w, win] of windows.entries()) {
+    if (win === undefined) continue
+    if (w > 0) segments.push('...')
+    segments.push(lines.slice(win.start, win.end + 1).join('\n'))
+  }
+
+  return segments.join('\n').slice(0, budget)
 }
 
 // ---------------------------------------------------------------------------
diff --git a/scripts/capture-learnings-open.test.ts b/scripts/capture-learnings-open.test.ts
index 862763805..b8896505b 100644
--- a/scripts/capture-learnings-open.test.ts
+++ b/scripts/capture-learnings-open.test.ts
@@ -565,7 +565,7 @@ describe('openLearningIssues', () => {
     expect(counts.failed).toBe(0)
   })
 
-  it('FIX 7: title does NOT contain "review-heavy" (neutral title for all trigger types)', async () => {
+  it('title does NOT contain "review-heavy" (neutral title for all trigger types)', async () => {
     // #given a learning with a known mergeSha
     const sha = 'deadbeefdeadbeefdeadbeefdeadbeefdeadbeef'
     const createSpy = vi.fn().mockResolvedValue({data: {number: 1}})
@@ -586,10 +586,10 @@ describe('openLearningIssues', () => {
 })
 
 // ---------------------------------------------------------------------------
-// FIX 6: planLearnings blocks bodies containing hard secrets (logDiffHasSecret)
+// planLearnings blocks bodies containing hard secrets (logDiffHasSecret)
 // ---------------------------------------------------------------------------
 
-describe('planLearnings — FIX 6: logDiffHasSecret as final defense-in-depth gate', () => {
+describe('planLearnings — logDiffHasSecret as final defense-in-depth gate', () => {
   it('blocks a body containing a ghp_ secret (logDiffHasSecret gate)', () => {
     // #given an agent-authored body containing a GitHub PAT
     const sha = 'abc123def456abc123def456abc123def456abc1'
diff --git a/scripts/capture-learnings-privacy.test.ts b/scripts/capture-learnings-privacy.test.ts
index 0bbaa687c..f30323355 100644
--- a/scripts/capture-learnings-privacy.test.ts
+++ b/scripts/capture-learnings-privacy.test.ts
@@ -617,7 +617,7 @@ describe('redactLogDiffSecrets', () => {
       expect(result).toContain('/endpoint')
     })
 
-    it('FIX 5: Bearer token with base64 chars (+/) is fully redacted', () => {
+    it('Bearer token with base64 chars (+/) is fully redacted', () => {
       // #given a standalone Bearer token containing + and / (valid base64 chars)
       // Note: using a standalone Bearer (not inside an Authorization header) so the
       // Bearer pattern fires rather than the Authorization header pattern
@@ -634,10 +634,10 @@ describe('redactLogDiffSecrets', () => {
 })
 
 // ---------------------------------------------------------------------------
-// FIX 4: New SECRET_BLOCK_PATTERNS — Google API key, GitLab PAT, JWT, generic creds
+// SECRET_BLOCK_PATTERNS — Google API key, GitLab PAT, JWT, generic creds
 // ---------------------------------------------------------------------------
 
-describe('logDiffHasSecret — FIX 4: new block patterns', () => {
+describe('logDiffHasSecret — additional block patterns', () => {
   it('detects a Google API key (AIza prefix + 35 chars)', () => {
     // #given a body containing a Google API key shape
     // Using an obviously-fake key — 'A'.repeat(35) is not a real credential