Skip to content

Doctor scan: session scope, auto-hydration, catalog identity, append-only redaction trail#599

Merged
rsnodgrass merged 5 commits into
mainfrom
ryan/doctor-redact
May 11, 2026
Merged

Doctor scan: session scope, auto-hydration, catalog identity, append-only redaction trail#599
rsnodgrass merged 5 commits into
mainfrom
ryan/doctor-redact

Conversation

@rsnodgrass
Copy link
Copy Markdown
Contributor

@rsnodgrass rsnodgrass commented May 11, 2026

Summary

Three related fixes plus one feature on the session credential-scan path. Bundled because they share the same code, surfaced together while auditing the SageOx ox ledger, and only make sense as a coherent story: detect, hydrate, identify, persist.

  • ox-zukxox doctor --check=ledger-secrets / ox session redact-history had three structural bugs that combined to produce 99.4% false-positives and miss real PATs. Fix: scope to sessions/ only; auto-hydrate LFS pointers via openSessionContent so dehydrated sessions are actually readable; apply the SecretPattern.Keywords pre-screen the scan path was bypassing.
  • ox-def1 — split the broad github_token detector into per-prefix variants (github_personal_access_token, github_oauth_token, github_user_to_server_token, github_server_token, github_refresh_token, github_fine_grained_pat) each with Keywords guards at GitHub's published shapes ({36} / {76} / {82}).
  • ox-8bfh — append-only per-session redaction trail. Every redact-history rewrite now records a RedactionPass entry in the session's meta.json with a UUIDv7 pass id, applied-at, applied-by, catalog version + sha256 hash, and per-finding metadata (detector + file + line — never the matched bytes, per ox-zyg7). Operator sees the catalog id in stdout at run time so they know which ruleset is about to scan; future audits read meta.Redactions to answer "did we run, when, with which rules, and what was caught?"

Architecture

```mermaid
flowchart LR
A[ox session redact-history] --> B[hydrateAllSessionsForScan
visible LFS Batch fetch]
B --> C[scanLedgerForSecrets
session-scoped + keyword pre-screen]
C --> D{any unpushed findings?}
D -->|yes| E[per-file interactive prompt]
E --> F[redactFileInPlace]
F --> G[writeRedactionPassesPerSession
MutateSessionMeta + flock]
G --> H[stageAndAmendRedactedFiles
amend commit]
H --> I[ox session push]
```

```mermaid
flowchart TD
R[Redactor.NewRedactor] --> C[computeCatalogIdentity]
C --> V[CatalogVersion: 'ox-secrets-N41-b848e43b']
C --> H[CatalogHash: sha256 of canonicalized pattern set]
V --> P[RedactionPass in meta.json]
H --> P
```

Test plan

  • go test ./internal/session/... ./cmd/ox/ -run 'TestRedact\|TestScan\|TestSecret\|TestCheckLedger\|TestLedgerOrigin\|TestCatalog\|TestSplit\|TestSummarize\|TestWriteRedact' — all green
  • Regression tests for: pointer-file pre-screen (ox-zukx), keyword guard against UUID false-positives (ox-zukx), per-prefix GitHub detectors (ox-def1), catalog hash sensitivity to rule tampering (ox-8bfh), append-only RedactionPass history (ox-8bfh)
  • End-to-end on the live SageOx ox ledger: 350-file LFS hydration pass, 40 findings surfaced (vs 2155 pre-fix), catalog id visible in terminal output
  • Reviewer: confirm MutateSessionMeta flock contract (the same path that prevented ox-lrrq from recurring) is sufficient for concurrent daemon writes during a long redact-history run

Related tickets

  • ox-zukx — doctor scan: scope / hydration / keyword pre-screen (this PR)
  • ox-def1 — per-prefix GitHub PAT detectors (this PR)
  • ox-8bfh — redaction transparency: catalog id + append-only meta.json trail (this PR)
  • ox-lrrq — root-cause: meta.json merge-conflict markers from daemon write path; ledger working tree separately fixed (commits 17c62eb7 + 8a364d97 + 3fd5a225 on the ledger repo). The underlying write-path bug stays open as ox-lrrq.

Safety notes

  • Redaction tool's append-only audit trail uses lfs.MutateSessionMeta (advisory flock + atomic rename) — the same primitive that prevents the ox-lrrq race; this change does not widen the concurrent-write surface.
  • Per ox-zyg7, RedactionEntry carries detector + file + line only, never matched bytes. New regression tests assert no Original field appears on the type.
  • Cache-only design (per project rule) preserved: hydration writes to .sageox/cache/sessions/<name>/, never the in-place git-tracked path.

Summary by CodeRabbit

  • New Features

    • Per-session redaction history: persist an append-only redaction pass per affected session and record catalog identity with each pass.
    • Deterministic catalog identity (version + hash) for credential-detection rules.
  • Bug Fixes

    • Safer OID logging to avoid panics on short OIDs.
    • Tightened Heroku key detection and refined GitHub token detectors with keyword pre-screening.
  • Improvements

    • Session-scoped secret scanning with LFS hydration pre-scan, session/file accounting, stricter pass/fail when hydration coverage fails, and clearer scan summaries.
  • Tests

    • Updated and added tests validating session-scoped scanning, hydration behavior, redaction recording, and catalog identity.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 11, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 976869da-0946-4cce-be43-57cd70d6a49c

📥 Commits

Reviewing files that changed from the base of the PR and between 1b4624e and fe4f405.

📒 Files selected for processing (9)
  • cmd/ox/doctor_ledger_secrets.go
  • cmd/ox/doctor_ledger_secrets_test.go
  • cmd/ox/session_redact_history.go
  • cmd/ox/session_redact_hydrate.go
  • cmd/ox/session_redact_record_test.go
  • internal/lfs/meta.go
  • internal/session/redact_rules_test.go
  • internal/session/secrets.go
  • internal/session/secrets_test.go

📝 Walkthrough

Walkthrough

Refactors ledger secret scanning to operate per-session with a pre-scan LFS hydration pass, records per-session redaction audit passes, splits detectors with keyword pre-screening, and computes a deterministic catalog identity for the detector set.

Changes

Session Scanning & Redaction Audit

Layer / File(s) Summary
Secret Pattern Updates & Keyword Pre-Screening
internal/session/secrets.go, internal/session/secrets_test.go
GitHub credentials split into per-prefix detectors with Keywords guards; Heroku key gains Keywords guard and [REDACTED_HEROKU_KEY] label; redaction/detection APIs pre-screen inputs by Keywords after lowercasing.
Catalog Identity Computation & Caching
internal/session/catalog.go, internal/session/catalog_test.go, internal/session/secrets.go
Patterns canonicalized and deterministically hashed; Redactor caches catalogVersion and catalogHash; identity recomputed on pattern mutations; accessors CatalogVersion() and CatalogHash() added; tests validate determinism and sensitivity.
Redaction Audit Metadata Types
internal/lfs/meta.go
SessionMeta gains append-only Redactions field; new exported types RedactionPass, RedactionPassSummary, and RedactionEntry capture per-pass identity, catalog identity, summaries, and per-finding entries.
Redaction Recording & Persistence
cmd/ox/session_redact_record.go, cmd/ox/session_redact_record_test.go
splitSessionPath parses sessions/<name>/<file>; writeRedactionPassesPerSession generates a UUIDv7 pass ID, builds lfs.RedactionPass with catalog identity and summary, appends to each session meta.json via lfs.MutateSessionMeta (errors if meta missing), and returns mutated meta paths; summarizeRedactionEntries computes totals and per-detector counts; tests validate append behavior and privacy of persisted data.
Pre-Scan LFS Hydration
cmd/ox/session_redact_hydrate.go, cmd/ox/session_hydrate.go
New hydrateAllSessionsForScan detects pointer stubs, skips already-hydrated files, hydrates required files via openSessionContent, prints progress, and returns a hydrateSummary with inspected/hydrated/failed counts; OID preview logging guarded to avoid slicing panics.
Session-Scoped Scanning Implementation
cmd/ox/doctor_ledger_secrets.go
scanLedgerForSecrets(projectRoot, ledgerPath) now iterates sessions/<session>/ only, filters by allowlisted extensions, resolves/hydrates session files via resolveSessionContentForScan/openSessionContent, enforces size cap, aggregates per-detector counts without retaining matched bytes, and updates session/file/hydration counters; ledgerSecretsScanResult extended with session coverage counters.
Doctor Ledger Scanning Integration
cmd/ox/doctor_ledger_secrets.go, cmd/ox/doctor_ledger_secrets_test.go
checkLedgerSecrets runs pre-scan hydration, merges hydration outcomes into the scan result, builds a session/hydration-aware scope string, and treats hydration-failure coverage as an incompatible condition for “clean” results; tests assert session-scoped metrics and out-of-scope ignores.
Redaction History Workflow Orchestration
cmd/ox/session_redact_history.go
Cobra entrypoints split session audit (dry-run) and session redact (destructive), compute ProjectRoot for LFS-aware resolution, run pre-hydration, print Redactor catalog identity, accumulate lfs.RedactionEntry per session during interactive redaction, write a redaction pass per affected session before staging/amend, and re-scan; enumerateRedactHistoryFindings re-walks ledgerPath/sessions/* and resolves content via openSessionContent.
Tests & Fixtures
cmd/ox/*_test.go, internal/session/*_test.go
Update/add tests for session-scoped scanning metrics, allowlist and size-cap behavior, session-scoped redact workflow fixtures and minimalMetaJSON, redaction pass append tests, catalog identity determinism tests, stricter GitHub token fixtures, and small pre-push message update.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

  • sageox/ox#562: Touches openSessionContent and cache-only hydration unification used by this PR.
  • sageox/ox#564: Related session hydration behavior changes that overlap pre-scan logic.
  • sageox/ox#597: Overlaps ledger secret scanning and session-redaction code paths modified here.

Suggested labels

session, sageox

Poem

🐰 I hopped through session folders bright,
Hydrated stubs into the light.
Patterns sorted, catalogs known,
Each redaction stamped and sown.
Audit carrots tucked in rows—🥕 hooray!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: session scope, auto-hydration, catalog identity, and append-only redaction trail are all core features introduced in this PR.
Docstring Coverage ✅ Passed Docstring coverage is 90.24% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ryan/doctor-redact

Comment @coderabbitai help to get the list of available commands and usage tips.

@rsnodgrass rsnodgrass marked this pull request as ready for review May 11, 2026 18:20
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cmd/ox/doctor_ledger_secrets_test.go (1)

125-130: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

This fixture no longer exercises the size-cap path.

sessions/huge.jsonl is directly under sessions/, but the scanner now only walks sessions/<session>/.... The test passes because the file is out of scope, not because the size cap is honored.

Suggested fix
 	work := makeLedgerForAuditTest(t, map[string]string{
-		"sessions/huge.jsonl": string(buf),
+		"sessions/huge/raw.jsonl": string(buf),
+		"sessions/huge/meta.json": "{}",
 	})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ox/doctor_ledger_secrets_test.go` around lines 125 - 130, The test uses
makeLedgerForAuditTest and scanLedgerForSecrets but puts the huge file at
"sessions/huge.jsonl", which the scanner no longer walks because it expects
files under "sessions/<session>/..."; move the fixture into a session
subdirectory (e.g., "sessions/<session_id>/huge.jsonl") so the scanner visits it
and the size-cap path is exercised, then re-run scanLedgerForSecrets and assert
the expected size-cap behavior.
🧹 Nitpick comments (1)
cmd/ox/session_redact_record_test.go (1)

61-136: 🏗️ Heavy lift

Add the concurrent MutateSessionMeta case before calling this done.

This only proves sequential appends. It will not catch the remaining risk called out in the PR description: a long redact-history run interleaving with a daemon/meta update and losing either the new RedactionPass or the concurrent mutation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ox/session_redact_record_test.go` around lines 61 - 136, Test currently
only exercises sequential appends; add a concurrent MutateSessionMeta case to
simulate a daemon/meta update racing with writeRedactionPassesPerSession so we
catch lost writes. Modify TestWriteRedactionPassesPerSession_AppendsToMeta to
spawn a goroutine that calls lfs.MutateSessionMeta (or the appropriate mutate
helper) to apply a distinct mutation to the same sessDir while
writeRedactionPassesPerSession is running, coordinate with channels or a
sync.WaitGroup to ensure the mutation happens during the redact pass, then after
both complete read the meta with lfs.ReadSessionMeta and assert both the
redaction pass and the concurrent mutation are present (i.e., length of
Redactions includes the new pass and the mutation-produced changes still exist)
to validate concurrent-safe append behavior of writeRedactionPassesPerSession.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/ox/doctor_ledger_secrets_test.go`:
- Around line 109-110: The test currently asserts absence of the old
"github_token" slug; update it to assert absence of the new per-prefix GitHub
detector slugs instead: replace the single assert.NotContains(t,
result.Findings, "github_token", ...) with assertions that ensure
result.Findings does not contain "github_personal_access_token" and any other
GitHub detector slugs (e.g. loop over a slice of expected GitHub slugs and call
assert.NotContains for each). Keep references to result.Findings and use
assert.NotContains so the test fails if any of the per-prefix GitHub slugs
surface.

In `@cmd/ox/doctor_ledger_secrets.go`:
- Around line 251-288: The SessionsSkipped counter is being incremented whenever
no file was read (anyFileRead) which conflates "no allowlisted/readable files"
with actual read/hydration failures; update the loop around
resolveSessionContentForScan and scanLedgerFileForSecrets to distinguish three
outcomes per session: (1) anyFileRead (at least one allowlisted, size-ok file
scanned) — keep result.FilesScanned and the existing flags; (2) hydration/read
failures — detect when resolveSessionContentForScan returns an error and set a
new boolean (e.g., sessionHydrationFailed) and increment a new result counter
(e.g., SessionsHydrationFailed) instead of treating it as skipped; and (3) truly
empty/unscannable sessions (only unsupported extensions or oversize files) —
increment result.SessionsSkipped only when no files were scanned and no
hydration/read errors occurred. Use the existing symbols hydratedThisSession,
anyFileRead, resolveSessionContentForScan, scanLedgerFileForSecrets,
ledgerSecretsScanExts, ledgerSecretsSizeCap, and update result fields
accordingly (add the new SessionsHydrationFailed or similar field).
- Around line 149-155: The logic currently returns PassedCheck when
result.SessionsSkipped > 0 but len(result.Findings) == 0; change this so
incomplete scans do not report a clean pass: if result.SessionsSkipped > 0 and
there are no findings, return a non-passing result (e.g., call FailedCheck or an
existing "incomplete" status helper instead of PassedCheck) and include the
sessions-skipped detail in the scope/message; update the branch around
result.SessionsSkipped / len(result.Findings) (referencing
result.SessionsSkipped, result.Findings, PassedCheck and the local name/scope
variables) so hydration/read failures downgrade the status rather than allowing
a PassedCheck.

In `@cmd/ox/session_redact_history.go`:
- Around line 385-436: The enumeration loop silently drops files when
openSessionContent, os.Stat, os.Open, or scanner.Scan fail; update the scanning
code (around the loop that calls openSessionContent, checks
ledgerSecretsSizeCap, opens files, and uses bufio.Scanner and
redactor.ScanForSecrets) to surface these failures instead of swallowing them:
either return an error immediately (e.g., wrap and return the underlying error)
or collect errors into an aggregated error/list and return a single “enumeration
incomplete” error at the end (or set a boolean and return a specific sentinel
error) so callers of the function can detect incomplete enumeration; ensure
failures from openSessionContent, os.Stat, os.Open, and scanner.Err() are
captured and included in the returned error message (reference symbols:
openSessionContent, ledgerSecretsSizeCap, redactHistoryFinding,
bufio.NewScanner/Scanner.Scan).

In `@cmd/ox/session_redact_hydrate.go`:
- Around line 67-70: The code silently skips unreadable session directories by
doing `files, err := os.ReadDir(sessionDir)` and `if err != nil { continue }`;
instead, on error you must record the failure and notify the operator: increment
the session hydration `Failed` counter/metric, emit a clear error log
referencing `sessionDir` and the `err`, and then continue; update any
per-session summary state so the hydration summary reflects the skipped session
as failed rather than claiming coverage. Ensure you change the `if err != nil`
branch in the block that calls `os.ReadDir(sessionDir)` to perform these actions
(increment `Failed`, log `sessionDir`+`err`) before continuing.

In `@cmd/ox/session_redact_record_test.go`:
- Around line 109-116: The current guard iterating over got1.Redactions and
doing `_ = e` is a no-op; replace it with real assertions: serialize the record
payload (the same bytes that would be persisted) and assert that it does not
contain any original/matched secret bytes, and/or use reflection to inspect each
lfs.RedactionEntry in got1.Redactions -> p.Entries to assert that no field named
"Original" or any byte/slice field contains non-empty secret data; ensure you
assert on the serialized output and on concrete struct fields rather than a
blank `_ = e` to prevent future regressions.

In `@internal/lfs/meta.go`:
- Around line 118-125: Multiple code paths perform read-modify-write on
meta.json without the flock in MutateSessionMeta, risking lost concurrent
updates; refactor each listed location (session_push_summary.go handling Files
near lines ~487–499, session_regenerate.go updates ~606–725,
session_upload_cmd.go reads/writes around 107 and 146–150, agent_session.go
around 1309–1329, and internal/daemon/agentwork/session_finalize.go ~1437–1440)
to call MutateSessionMeta(ctx, sessionDir, func(meta *SessionMeta)
(*SessionMeta, error) { ... }) and do all reads/changes to meta.Files and
Redactions inside that callback, returning the modified meta (or nil+err) so
WriteSessionMeta/WriteSessionMetaOnly are no longer called directly outside the
locked mutation.

In `@internal/session/secrets.go`:
- Around line 643-666: ScanForSecrets applies the MatchesKeyword gate but
RedactString, RedactStringWithDetails and ContainsSecrets do not, causing
inconsistent detection; update those methods (RedactString,
RedactStringWithDetails, ContainsSecrets on type Redactor) to perform the same
pre-check on each pattern using p.MatchesKeyword(lowerInput) (and skip patterns
with nil Pattern) before running regex MatchString or replacement logic,
mirroring the loop behavior used in ScanForSecrets so keyword-guarded patterns
are evaluated only when the lowercased input contains a keyword.
- Around line 107-140: The current regex patterns in internal/session/secrets.go
for github tokens use open-ended ranges (e.g., `{36,255}`, `{76,255}`,
`{82,255}`) which permit arbitrarily long lookalikes; tighten these to the fixed
suffix lengths described in the PR by changing the Pattern values for the
entries named "github_personal_access_token", "github_oauth_token",
"github_user_to_server_token", "github_server_token" to use `{36}` and for
"github_refresh_token" to `{76}` and "github_fine_grained_pat" to `{82}` (e.g.,
regexp.MustCompile(`ghp_[A-Za-z0-9]{36}`) etc.), leaving Redact/Keywords as-is
and updating any related tests to expect the stricter matches.

---

Outside diff comments:
In `@cmd/ox/doctor_ledger_secrets_test.go`:
- Around line 125-130: The test uses makeLedgerForAuditTest and
scanLedgerForSecrets but puts the huge file at "sessions/huge.jsonl", which the
scanner no longer walks because it expects files under "sessions/<session>/...";
move the fixture into a session subdirectory (e.g.,
"sessions/<session_id>/huge.jsonl") so the scanner visits it and the size-cap
path is exercised, then re-run scanLedgerForSecrets and assert the expected
size-cap behavior.

---

Nitpick comments:
In `@cmd/ox/session_redact_record_test.go`:
- Around line 61-136: Test currently only exercises sequential appends; add a
concurrent MutateSessionMeta case to simulate a daemon/meta update racing with
writeRedactionPassesPerSession so we catch lost writes. Modify
TestWriteRedactionPassesPerSession_AppendsToMeta to spawn a goroutine that calls
lfs.MutateSessionMeta (or the appropriate mutate helper) to apply a distinct
mutation to the same sessDir while writeRedactionPassesPerSession is running,
coordinate with channels or a sync.WaitGroup to ensure the mutation happens
during the redact pass, then after both complete read the meta with
lfs.ReadSessionMeta and assert both the redaction pass and the concurrent
mutation are present (i.e., length of Redactions includes the new pass and the
mutation-produced changes still exist) to validate concurrent-safe append
behavior of writeRedactionPassesPerSession.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2c6cd9b8-88e4-4580-a1bf-9bea4d68a369

📥 Commits

Reviewing files that changed from the base of the PR and between 716881e and 89d7a88.

📒 Files selected for processing (12)
  • cmd/ox/doctor_ledger_secrets.go
  • cmd/ox/doctor_ledger_secrets_test.go
  • cmd/ox/session_hydrate.go
  • cmd/ox/session_redact_history.go
  • cmd/ox/session_redact_hydrate.go
  • cmd/ox/session_redact_record.go
  • cmd/ox/session_redact_record_test.go
  • internal/lfs/meta.go
  • internal/session/catalog.go
  • internal/session/catalog_test.go
  • internal/session/secrets.go
  • internal/session/secrets_test.go

Comment thread cmd/ox/doctor_ledger_secrets_test.go Outdated
Comment thread cmd/ox/doctor_ledger_secrets.go Outdated
Comment thread cmd/ox/doctor_ledger_secrets.go
Comment thread cmd/ox/session_redact_history.go
Comment thread cmd/ox/session_redact_hydrate.go
Comment thread cmd/ox/session_redact_record_test.go Outdated
Comment thread internal/lfs/meta.go Outdated
Comment thread internal/session/secrets.go
Comment thread internal/session/secrets.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
cmd/ox/session_redact_history.go (4)

193-212: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use getLedgerPath() instead of reimplementing CLI ledger resolution.

This helper duplicates the local-config/default fallback chain instead of delegating to the canonical resolver used by other cmd/ox commands, so future ledger-path fixes can drift here.

Based on learnings, getLedgerPath() "encapsulates the full resolution chain (local config → default fallback) and is used consistently across murmur.go and doctor_ledger_git.go."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ox/session_redact_history.go` around lines 193 - 212,
resolveLedgerPathForSessionCmd reimplements the ledger path resolution logic
instead of using the canonical resolver; replace its manual local-config/default
fallback logic with a call to getLedgerPath() (and preserve the existing
override behavior: if override != "" return it), then validate the returned path
as before (e.g., check ledger.Exists(path)) so this command uses the same
resolution chain as other cmd/ox commands like murmur.go and
doctor_ledger_git.go.

679-731: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

redactFileInPlace is only safe for JSONL.

The scan now surfaces .json, .md, .txt, and .vtt, but this rewriter serializes each line through session.NewRawWriterTruncate. Approving redaction on meta.json, notes, or transcripts will rewrite the document as JSONL records instead of preserving its original format.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ox/session_redact_history.go` around lines 679 - 731, The current
redactFileInPlace function always reserializes each scanned line via
session.NewRawWriterTruncate and rw.WriteRaw, which converts arbitrary files
into JSONL; restrict this behavior to JSONL-only files by detecting the file
type/extension before processing (e.g., only run the scanner+WriteRaw path for
.jsonl/.json-lines), and for other extensions (.json, .md, .txt, .vtt) either
(a) skip reserialization and perform in-place text redaction by writing raw
bytes to the temp file (preserving original format and line endings) using a
simple os.Create/Write loop, or (b) bypass redactFileInPlace and use a separate
function that preserves original serialization; update references to
session.NewRawWriterTruncate, rw.WriteRaw, rw.CloseAndSync and the scanner path
accordingly so non-JSONL inputs are not emitted as JSONL.

360-363: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Redaction is editing the tracked file, not the hydrated bytes that were scanned.

Findings are produced from openSessionContent(...), but the destructive step rewrites filepath.Join(opts.LedgerPath, path). For dehydrated LFS sessions, that tracked path is still the pointer stub, so an approved redaction can modify different bytes than the audit actually matched.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ox/session_redact_history.go` around lines 360 - 363, The current
redaction calls redactFileInPlace on filepath.Join(opts.LedgerPath, path) which
mutates the tracked file (likely an LFS pointer) instead of the hydrated bytes
that were scanned by openSessionContent; change the flow to obtain the exact
hydrated content used for finding secrets (from openSessionContent or its
returned value), run the redactor on those hydrated bytes, and then persist the
redacted data back to the same storage/object that was actually scanned (not the
LFS pointer stub). Specifically, in the code path around openSessionContent and
the loop that currently calls redactFileInPlace(abs, redactor), replace the
direct in-place edit with: get hydrated bytes used to generate findings, apply
redactor to those bytes, and write the result back to the correct target (either
overwrite the hydrated file used for scanning or update the LFS object/content
store) so the bytes being changed match the bytes that were audited; ensure you
continue to reference opts.LedgerPath and path to locate the correct
session/file metadata but do not edit the pointer stub directly.

787-795: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reuse one scanner across prompts.

Creating a new bufio.Scanner on the same reader for every prompt consumes buffered input when the scanner is discarded. With piped or scripted stdin in the for _, fileFindings := range byPath loop, subsequent prompts hit EOF and skip remaining files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ox/session_redact_history.go` around lines 787 - 795, The bug is creating
a new bufio.Scanner for each prompt which consumes buffered stdin and causes
later prompts to see EOF; instead create a single scanner once and reuse it
across prompts by changing readRedactHistoryAnswer to accept a *bufio.Scanner
(or accept the shared scanner object) and call scanner.Scan()/scanner.Text()
inside that function; instantiate the scanner before the for _, fileFindings :=
range byPath loop and pass the same scanner into calls to
readRedactHistoryAnswer so you no longer construct a new bufio.NewScanner(stdin)
per prompt.
♻️ Duplicate comments (3)
cmd/ox/session_redact_history.go (1)

481-506: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enumeration can still silently drop actionable findings.

This loop keeps continue-ing on openSessionContent, os.Stat, os.Open, and scanner failures, so files can be counted by the aggregate scan and then disappear from the interactive list with no signal. Please fail the enumeration as incomplete instead of swallowing these paths.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ox/session_redact_history.go` around lines 481 - 506, The enumeration
currently swallows failures by using continue on openSessionContent, os.Stat,
os.Open and ignoring scanner errors, causing missing files to be silently
dropped; change these continues so the caller is notified (either by returning
an error or setting/returning an "incomplete" boolean) instead of skipping the
path: replace the continue on openSessionContent/Stat/Open with an immediate
return of a contextual error (or set a package-level/return flag) so the scan
fails as incomplete, ensure files opened with os.Open are closed via defer
f.Close(), and after the scanner loop check scanner.Err() and propagate that
error (or mark incomplete) rather than ignoring it; keep the existing out
[]redactHistoryFinding appends for successful scans.
cmd/ox/doctor_ledger_secrets.go (2)

251-288: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SessionsSkipped is still mixing failures with “nothing scannable.”

!anyFileRead increments SessionsSkipped for sessions that may only have unsupported or oversize files, while per-file hydration/reopen/scan failures are silently skipped inside the loop. Because anyFileRead/FilesScanned are set before scanLedgerFileForSecrets proves the file was actually readable, the later "hydration failed" summary is still unreliable.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ox/doctor_ledger_secrets.go` around lines 251 - 288, The session-level
skip logic is inaccurate because FilesScanned and anyFileRead are set before
scanLedgerFileForSecrets actually succeeds; move/result updates so that only
successful scans mark the session as having readable files: keep
hydratedThisSession logic as-is, but remove result.FilesScanned++ and
anyFileRead = true from before calling scanLedgerFileForSecrets and instead set
anyFileRead = true and increment result.FilesScanned only after
scanLedgerFileForSecrets returns nil; leave size/extension checks and
resolveSessionContentForScan behavior unchanged so failed hydration or scan
still silently continue without marking the session as “readable,” and keep
incrementing result.SessionsHydrated when hydratedThisSession is true and
increment result.SessionsSkipped only when anyFileRead is false after the loop.

153-155: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't mark an incomplete scan as clean.

This still returns PassedCheck when SessionsSkipped > 0. If any session was unscannable, hydration/read failures can still masquerade as a clean audit result here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ox/doctor_ledger_secrets.go` around lines 153 - 155, The check currently
returns PassedCheck when result.Findings is empty even if some sessions were
unscannable; change the logic in the block that inspects result.Findings to also
check result.SessionsSkipped (or equivalent field on result) and, if
SessionsSkipped > 0, do not call PassedCheck(name, ...); instead return a
non-passing status (e.g., a Partial/IncompleteCheck or FailedCheck with a
message noting sessions skipped) so incomplete scans are not reported as clean;
update the branch that uses PassedCheck, referencing result.Findings,
result.SessionsSkipped, and the PassedCheck call.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@cmd/ox/session_redact_history.go`:
- Around line 193-212: resolveLedgerPathForSessionCmd reimplements the ledger
path resolution logic instead of using the canonical resolver; replace its
manual local-config/default fallback logic with a call to getLedgerPath() (and
preserve the existing override behavior: if override != "" return it), then
validate the returned path as before (e.g., check ledger.Exists(path)) so this
command uses the same resolution chain as other cmd/ox commands like murmur.go
and doctor_ledger_git.go.
- Around line 679-731: The current redactFileInPlace function always
reserializes each scanned line via session.NewRawWriterTruncate and rw.WriteRaw,
which converts arbitrary files into JSONL; restrict this behavior to JSONL-only
files by detecting the file type/extension before processing (e.g., only run the
scanner+WriteRaw path for .jsonl/.json-lines), and for other extensions (.json,
.md, .txt, .vtt) either (a) skip reserialization and perform in-place text
redaction by writing raw bytes to the temp file (preserving original format and
line endings) using a simple os.Create/Write loop, or (b) bypass
redactFileInPlace and use a separate function that preserves original
serialization; update references to session.NewRawWriterTruncate, rw.WriteRaw,
rw.CloseAndSync and the scanner path accordingly so non-JSONL inputs are not
emitted as JSONL.
- Around line 360-363: The current redaction calls redactFileInPlace on
filepath.Join(opts.LedgerPath, path) which mutates the tracked file (likely an
LFS pointer) instead of the hydrated bytes that were scanned by
openSessionContent; change the flow to obtain the exact hydrated content used
for finding secrets (from openSessionContent or its returned value), run the
redactor on those hydrated bytes, and then persist the redacted data back to the
same storage/object that was actually scanned (not the LFS pointer stub).
Specifically, in the code path around openSessionContent and the loop that
currently calls redactFileInPlace(abs, redactor), replace the direct in-place
edit with: get hydrated bytes used to generate findings, apply redactor to those
bytes, and write the result back to the correct target (either overwrite the
hydrated file used for scanning or update the LFS object/content store) so the
bytes being changed match the bytes that were audited; ensure you continue to
reference opts.LedgerPath and path to locate the correct session/file metadata
but do not edit the pointer stub directly.
- Around line 787-795: The bug is creating a new bufio.Scanner for each prompt
which consumes buffered stdin and causes later prompts to see EOF; instead
create a single scanner once and reuse it across prompts by changing
readRedactHistoryAnswer to accept a *bufio.Scanner (or accept the shared scanner
object) and call scanner.Scan()/scanner.Text() inside that function; instantiate
the scanner before the for _, fileFindings := range byPath loop and pass the
same scanner into calls to readRedactHistoryAnswer so you no longer construct a
new bufio.NewScanner(stdin) per prompt.

---

Duplicate comments:
In `@cmd/ox/doctor_ledger_secrets.go`:
- Around line 251-288: The session-level skip logic is inaccurate because
FilesScanned and anyFileRead are set before scanLedgerFileForSecrets actually
succeeds; move/result updates so that only successful scans mark the session as
having readable files: keep hydratedThisSession logic as-is, but remove
result.FilesScanned++ and anyFileRead = true from before calling
scanLedgerFileForSecrets and instead set anyFileRead = true and increment
result.FilesScanned only after scanLedgerFileForSecrets returns nil; leave
size/extension checks and resolveSessionContentForScan behavior unchanged so
failed hydration or scan still silently continue without marking the session as
“readable,” and keep incrementing result.SessionsHydrated when
hydratedThisSession is true and increment result.SessionsSkipped only when
anyFileRead is false after the loop.
- Around line 153-155: The check currently returns PassedCheck when
result.Findings is empty even if some sessions were unscannable; change the
logic in the block that inspects result.Findings to also check
result.SessionsSkipped (or equivalent field on result) and, if SessionsSkipped >
0, do not call PassedCheck(name, ...); instead return a non-passing status
(e.g., a Partial/IncompleteCheck or FailedCheck with a message noting sessions
skipped) so incomplete scans are not reported as clean; update the branch that
uses PassedCheck, referencing result.Findings, result.SessionsSkipped, and the
PassedCheck call.

In `@cmd/ox/session_redact_history.go`:
- Around line 481-506: The enumeration currently swallows failures by using
continue on openSessionContent, os.Stat, os.Open and ignoring scanner errors,
causing missing files to be silently dropped; change these continues so the
caller is notified (either by returning an error or setting/returning an
"incomplete" boolean) instead of skipping the path: replace the continue on
openSessionContent/Stat/Open with an immediate return of a contextual error (or
set a package-level/return flag) so the scan fails as incomplete, ensure files
opened with os.Open are closed via defer f.Close(), and after the scanner loop
check scanner.Err() and propagate that error (or mark incomplete) rather than
ignoring it; keep the existing out []redactHistoryFinding appends for successful
scans.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 08d5077c-88b4-41c9-a2f3-57ffdc22e2bd

📥 Commits

Reviewing files that changed from the base of the PR and between 89d7a88 and 5e83877.

📒 Files selected for processing (4)
  • cmd/ox/doctor_ledger_secrets.go
  • cmd/ox/session_redact_history.go
  • cmd/ox/session_redact_record.go
  • cmd/ox/session_redact_record_test.go
✅ Files skipped from review due to trivial changes (1)
  • cmd/ox/session_redact_record_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/ox/session_redact_record.go

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cmd/ox/session_redact_history_test.go (2)

211-248: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert the redaction audit write in meta.json.

This test seeds meta.json for the new append-only redaction pass path, but it never verifies that a pass entry was actually appended. Adding that assertion would lock in the ox-8bfh behavior and catch regressions in MutateSessionMeta integration.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ox/session_redact_history_test.go` around lines 211 - 248, The test
TestRunRedactHistoryWorkflow_InteractiveYesRedacts currently checks file
redaction and snapshot backup but doesn't assert that MutateSessionMeta actually
appended the redaction audit to sessions/leaky/meta.json; after calling
runRedactHistoryWorkflow, read and parse the working-tree meta.json (or its JSON
bytes) and assert it contains the expected redaction pass entry (e.g., a
"redaction_pass"/"RedactionPass" object or an appended entry in whatever
slice/field MutateSessionMeta writes), so the test verifies that the
MutateSessionMeta integration updated meta.json as intended.

281-306: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten quit-flow assertion to guarantee no mutation on abort.

The test currently proves sess-b is unchanged, but it can still pass if sess-a was modified before handling "q". Capture and compare sess-a before/after as well, so quit semantics are unambiguous.

Suggested test hardening
 func TestRunRedactHistoryWorkflow_QuitAbortsWithoutWritingMore(t *testing.T) {
 	work := makeLedgerWithCommitAndOrigin(t, map[string]string{
 		"sessions/sess-a/raw.jsonl": "AKIAIOSFODNN7EXAMPLE\n",
 		"sessions/sess-b/raw.jsonl": "AKIAIOSFODNN7EXAMPLE\n",
 	})
+	aPath := filepath.Join(work, "sessions/sess-a/raw.jsonl")
+	originalA, err := os.ReadFile(aPath)
+	require.NoError(t, err)
 	bPath := filepath.Join(work, "sessions/sess-b/raw.jsonl")
 	originalB, err := os.ReadFile(bPath)
 	require.NoError(t, err)
 	out := &bytes.Buffer{}

@@
 	require.NoError(t, err)

+	stillA, err := os.ReadFile(aPath)
+	require.NoError(t, err)
+	assert.Equal(t, originalA, stillA, "q should abort without mutating current file")
+
 	// sess-b raw.jsonl must be unchanged (we quit before reaching it OR after the first prompt)
 	stillThere, err := os.ReadFile(bPath)
 	require.NoError(t, err)
 	assert.Equal(t, originalB, stillThere)
 	assert.Contains(t, out.String(), "Aborting")
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/ox/session_redact_history_test.go` around lines 281 - 306, The test
should also capture and assert that "sessions/sess-a/raw.jsonl" was not mutated
to guarantee quit aborted before any file was processed: before calling
runRedactHistoryWorkflow read and save the contents of the sess-a file (e.g.
create aPath := filepath.Join(work, "sessions/sess-a/raw.jsonl") and originalA,
err := os.ReadFile(aPath)), then after the workflow completes read the file
again and require/assert that originalA == stillThereA; keep the existing checks
for sess-b, out buffer and "Aborting" unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@cmd/ox/session_redact_history_test.go`:
- Around line 211-248: The test
TestRunRedactHistoryWorkflow_InteractiveYesRedacts currently checks file
redaction and snapshot backup but doesn't assert that MutateSessionMeta actually
appended the redaction audit to sessions/leaky/meta.json; after calling
runRedactHistoryWorkflow, read and parse the working-tree meta.json (or its JSON
bytes) and assert it contains the expected redaction pass entry (e.g., a
"redaction_pass"/"RedactionPass" object or an appended entry in whatever
slice/field MutateSessionMeta writes), so the test verifies that the
MutateSessionMeta integration updated meta.json as intended.
- Around line 281-306: The test should also capture and assert that
"sessions/sess-a/raw.jsonl" was not mutated to guarantee quit aborted before any
file was processed: before calling runRedactHistoryWorkflow read and save the
contents of the sess-a file (e.g. create aPath := filepath.Join(work,
"sessions/sess-a/raw.jsonl") and originalA, err := os.ReadFile(aPath)), then
after the workflow completes read the file again and require/assert that
originalA == stillThereA; keep the existing checks for sess-b, out buffer and
"Aborting" unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 31f25d9a-2db9-4457-af0a-d4515262f6a1

📥 Commits

Reviewing files that changed from the base of the PR and between 5e83877 and 1b4624e.

📒 Files selected for processing (2)
  • cmd/ox/prepush_scan_test.go
  • cmd/ox/session_redact_history_test.go

@rsnodgrass rsnodgrass merged commit 8967a47 into main May 11, 2026
2 of 3 checks passed
@rsnodgrass rsnodgrass deleted the ryan/doctor-redact branch May 11, 2026 22:09
galexy added a commit that referenced this pull request May 12, 2026
* fix(doctor): post-eeqi credential checks + correct user guidance

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: SageOx <ox@sageox.ai>
SageOx-Session: https://sageox.ai/repo/repo_019c5812-01e9-7b7d-b5b1-321c471c9777/sessions/2026-05-11T22-36-galexy-OxE2aB/view

* fix(doctor): fail (not warn) on invalid API URL; defuse PAT literal in tests

Co-Authored-By: SageOx <ox@sageox.ai>
SageOx-Session: https://sageox.ai/repo/repo_019c5812-01e9-7b7d-b5b1-321c471c9777/sessions/2026-05-11T22-36-galexy-OxE2aB/view

* fix(comments): drop stray --check=ledger-secrets reference from #599

Co-Authored-By: SageOx <ox@sageox.ai>
SageOx-Session: https://sageox.ai/repo/repo_019c5812-01e9-7b7d-b5b1-321c471c9777/sessions/2026-05-11T22-36-galexy-OxE2aB/view

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: SageOx <ox@sageox.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant