Skip to content

Add semantic drift monitor with CLI and security hardening#2

Open
Knapp-Kevin wants to merge 2 commits into
ssdavidai:masterfrom
Knapp-Kevin:master
Open

Add semantic drift monitor with CLI and security hardening#2
Knapp-Kevin wants to merge 2 commits into
ssdavidai:masterfrom
Knapp-Kevin:master

Conversation

@Knapp-Kevin
Copy link
Copy Markdown

This PR introduces an optional Semantic Drift Monitor for Surveyor, along with a focused round of security and robustness hardening.

The drift monitor adds temporal awareness to Surveyor’s clustering output by comparing cluster structure across runs. It does not alter clustering behavior or writeback logic. It is fully optional and disabled by default.

The security fixes address availability, path confinement, atomic writes, retention, and config validation.


What’s Included

1. Semantic Drift Monitor (Optional)

New module:

  • drift.py
    • Persists cluster snapshots to JSON
    • Compares latest two snapshots
    • Computes:
      • Jaccard similarity
      • Membership churn
      • New clusters
      • Dissolved clusters
      • Split/merge heuristics
    • Writes drift reports to JSON payloads in .log files

Surveyor integration:

  • Drift runs after clustering and labeling
  • No changes to clustering logic
  • No changes to writeback behavior
  • Fully non-blocking
  • Wrapped in try/except to prevent pipeline failure

CLI additions:

  • alfred drift status
  • alfred drift history [--limit N]
  • alfred drift show <timestamp|filename>

Configuration:

features:
  semantic_drift_monitor: false

semantic_drift:
  snapshot_retention: 10
  similarity_threshold: 0.5
  warn_on_high_drift: true

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an optional “Semantic Drift Monitor” to Surveyor to persist cluster snapshots, compare consecutive runs, and expose drift summaries via new CLI commands, along with config + documentation updates.

Changes:

  • Introduce DriftMonitor to snapshot cluster membership and write drift reports.
  • Integrate drift processing into the Surveyor daemon and add alfred drift {status|history|show} CLI.
  • Add config schema + examples/docs for feature flag and drift settings.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/alfred/surveyor/drift.py New snapshot + drift comparison logic, report writing, pruning, and path confinement for report reads
src/alfred/surveyor/drift_cli.py New CLI handlers for drift status/history/show
src/alfred/surveyor/daemon.py Integrates drift processing into the Surveyor pipeline after clustering/labeling
src/alfred/surveyor/config.py Adds FeatureFlags and SemanticDriftConfig to typed config loading
src/alfred/cli.py Adds top-level alfred drift command wiring and dispatch
src/alfred/quickstart.py Includes drift feature/config defaults in generated config
src/alfred/_bundled/config.yaml.example Documents new config keys in bundled example
config.yaml.example Documents new config keys in root example
docs/Surveyor.md Documents drift settings and artifacts
docs/CLI-Commands.md Documents new alfred drift commands
INITIAL_SECURITY_REVIEW.md Adds security review notes for drift monitor

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 167 to +170
all_changed = result.changed_semantic | result.changed_structural
if not all_changed:
log.info("daemon.no_changed_clusters")
return
else:
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The control flow now logs daemon.no_changed_clusters but still continues into drift processing and then logs daemon.labeling_complete. This results in two potentially contradictory log events for the no-change case. Consider returning early after no_changed_clusters, or renaming/guarding the later log so it only fires when labeling actually ran (or rename it to reflect the full step including drift).

Copilot uses AI. Check for mistakes.
Comment on lines +191 to +194
cluster_key = f"semantic_{cid}"
from .state import ClusterState
from datetime import datetime, timezone
self.state.clusters[cluster_key] = ClusterState(
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imports inside the per-cluster loop (from .state import ClusterState, from datetime import ...) reduce readability and are executed repeatedly (even if cached). Since ClusterState and datetime/timezone are already safe to import at module scope, prefer moving these imports to the top of the file (or at least above the loop).

Copilot uses AI. Check for mistakes.
Comment thread docs/Surveyor.md
Comment on lines 195 to +206
All configuration lives in `config.yaml` under the `surveyor` section.

### Full Example

```yaml
features:
semantic_drift_monitor: false

semantic_drift:
snapshot_retention: 10
similarity_threshold: 0.5
warn_on_high_drift: true
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs say “All configuration lives in config.yaml under the surveyor section.” but the example immediately places features: and semantic_drift: at the top level (and the loader reads them from the top level). Update the wording (or the example structure) so users aren’t misled about where these keys must live.

Copilot uses AI. Check for mistakes.
Comment thread docs/CLI-Commands.md

The surveyor runs its 4-stage pipeline once and exits. As part of `alfred up`, it runs as a daemon with configurable intervals.

`alfred drift ...` commands are available only when `features.semantic_drift_monitor` is enabled.
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence is inaccurate: the alfred drift ... commands are still available when the feature flag is disabled; they just print a message and exit. Suggest rephrasing to clarify that drift artifacts/reports are only generated when enabled, and the CLI will report “disabled” otherwise.

Suggested change
`alfred drift ...` commands are available only when `features.semantic_drift_monitor` is enabled.
`alfred drift ...` commands are always available, but drift artifacts/reports are only generated when `features.semantic_drift_monitor` is enabled; otherwise, the CLI reports that semantic drift monitoring is disabled and exits.

Copilot uses AI. Check for mistakes.
Comment on lines +31 to +35
print(f"Compared clusters: {latest['previous_cluster_count']} -> {latest['current_cluster_count']}")
print(f"Cluster delta: {latest['cluster_count_delta']}")
print(f"New clusters: {len(latest.get('new_clusters', []))}")
print(f"Dissolved clusters: {len(latest.get('dissolved_clusters', []))}")
print(f"Overall churn: {round(latest.get('overall_churn', 0.0) * 100, 2)}%")
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmd_status indexes required fields on latest using latest[...] (e.g., previous_cluster_count). If a drift report JSON is tampered with or partially written but still valid JSON, this will raise KeyError and crash the CLI. Prefer latest.get(...) with sensible defaults (and/or validate required keys after loading) so alfred drift status remains robust.

Copilot uses AI. Check for mistakes.
Comment thread src/alfred/cli.py
Comment on lines +488 to +498
def cmd_drift(args: argparse.Namespace) -> None:
raw = _load_unified_config(args.config)

if "surveyor" not in raw:
print("Surveyor is not configured in this config file.")
return

from alfred.surveyor.config import load_from_unified
from alfred.surveyor import drift_cli as dcli

config = load_from_unified(raw)
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmd_drift doesn’t call _setup_logging_from_config(raw), so structlog warnings from drift artifact parsing (and any other surveyor logging used by the drift CLI) may be missing or inconsistently formatted compared to other commands (e.g. alfred surveyor, alfred temporal). Consider initializing logging the same way here.

Copilot uses AI. Check for mistakes.
ssdavidai added a commit that referenced this pull request Mar 27, 2026
newtonium-errant added a commit to newtonium-errant/alfred that referenced this pull request Apr 18, 2026
Second of 5 commits for Voice Stage 2a-wk2. Extends the talker session
schema with two new top-level fields and threads them through all close
paths (explicit, timeout, startup sweep, shutdown).

- session._build_session_frontmatter: adds session_type (default "note")
  and continues_from (default None) kwargs; emits both at the top level
  of the frontmatter so Dataview and vault search can filter directly.
- session.close_session: new kwargs, written to the vault record AND to
  the closed_sessions state summary (plan open question ssdavidai#5 — state-only
  continuation lookup for wk2; body-parser fallback is wk3).
- session.{resolve_on_startup,check_timeouts}: read _session_type /
  _continues_from off active dict with safe "note" / None defaults so
  wk1 state rehydrates cleanly (plan open question ssdavidai#2 — backwards-compat
  via .get).
- bot._open_session_with_stash: accepts model/session_type/continues_from
  kwargs, stashes _session_type / _continues_from on the active dict.
  /end reads them back and passes them into close_session.
- daemon shutdown path: reads both off active dict before close.

Field name is "model" (not "model_used") — matches wk1 records so the
two cohorts share one telemetry schema (plan open question ssdavidai#2).

Tests: tests/telegram/test_session_frontmatter.py (2 tests).
pytest: 24/24 passing (22 prior + 2 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
newtonium-errant added a commit to newtonium-errant/alfred that referenced this pull request Apr 26, 2026
Code-reviewer P1 ssdavidai#2 — release-blocker for Phase 1 Hypatia.

`_validate_type` ran before `check_scope` on `vault_create`, gating
against the canonical 20-type `KNOWN_TYPES` set only. Every Hypatia
`document` / `concept` / `source` / `citation` / `template` and every
KAL-LE `pattern` / `principle` create raised ``VaultError: Unknown
type`` and never reached the scope-policy check — a misleading error
that pointed at the wrong gate. The new `KNOWN_TYPES_KALLE` /
`KNOWN_TYPES_HYPATIA` schema constants existed only to populate the
scope's create-allowlists; nothing wired them into the type gate.

Smoke check (master, pre-fix):
    $ ALFRED_VAULT_SCOPE=kalle alfred vault create pattern foo
    {"error": "Unknown type: 'pattern'. Valid: account, asset, ..."}
    $ ALFRED_VAULT_SCOPE=hypatia alfred vault create document foo
    {"error": "Unknown type: 'document'. Valid: account, asset, ..."}

Smoke check (post-fix):
    pattern under kalle → SUCCESS, document under hypatia → SUCCESS,
    pattern under hypatia → "scope 'hypatia' can only create hypatia
    types" (caught by the second gate, not the first), document with
    no scope → "Unknown type" (canonical-only preserved for
    untriggered scopes).

Implementation:

* New `schema.KNOWN_TYPES_BY_SCOPE: dict[str, set[str]]` maps each
  extension scope to the union of canonical types plus its own set.
  Two-layer contract documented inline: this gate accepts the type;
  `check_scope`'s create-allowlist enforces the per-scope policy.
* `_validate_type(record_type, scope=None)` consults the union when
  scope is set; default `None` preserves byte-for-byte behavior so
  every caller that doesn't propagate scope (talker `_execute_tool`,
  capture-extract, instructor) stays on canonical types only.
* `vault_create` and `vault_list` thread their existing `scope` kwarg
  into the type gate. `vault/cli.py` already sources scope from
  `ALFRED_VAULT_SCOPE`; the create / list handlers now forward it
  through. CLI agent paths (curator, janitor, distiller, instructor's
  subprocess fallback, every external `alfred vault` invocation) get
  the fix automatically.

Tests: 13 new cases in `test_hypatia_scope.py` exercise `vault_create`
end-to-end — every Hypatia type, both KAL-LE types, cross-scope leak
(document under kalle / pattern under hypatia both rejected at the
type gate), Salem regression guard (note under talker still works),
default scope=None still rejects extension types.

Note: the talker `_execute_tool` in `telegram/conversation.py` still
hardcodes `scope="talker"` — a separate Hypatia release-blocker for
the LIVE bot path (vs the CLI agent path this commit unblocks). That
one's a wider contract change (per-instance scope plumbing) and
filed for a follow-up commit.

Pattern-trigger note for `CLAUDE.md` Vault Operations Layer:
"validation gate ordering" applies to every future instance with
its own type set (V.E.R.A. RRTS, STAY-C). Add the scope's
`KNOWN_TYPES_<NAME>` set AND a `KNOWN_TYPES_BY_SCOPE` entry, or
the type gate silently rejects before scope enforcement runs.
newtonium-errant added a commit to newtonium-errant/alfred that referenced this pull request May 7, 2026
Per ``feedback_multi_instance_wiring_pattern.md`` — three wiring bugs
hit in one session (May 1) that single-instance review couldn't
catch. V.E.R.A. + STAY-C launches gated on Path C decision but
config-ready; this scaffolding ships before the next instance to
prevent repeat bug-discovery cycles.

Pre-existing state (surprising finding ssdavidai#1):
  Both Piece A (``wire_transport_app``) and Piece C (per-route smoke)
  were ALREADY shipped as part of the GCal Phase A+ work in earlier
  sessions. ``wire_transport_app`` exists with all 7 register helpers
  consolidated; ``test_transport_route_smoke.py`` exists with
  full-config smoke + negative-control proof-of-catch + route-count
  pin. This commit closes the GAPS in those scaffolds.

Piece A enhancements (in src/alfred/transport/server.py):
  * Every optional kwarg in ``wire_transport_app`` now emits a
    debug-level skip log when omitted. Per
    ``feedback_intentionally_left_blank.md``: an audit can now
    distinguish "feature X intentionally not wired (KAL-LE has no
    GCal)" from "feature X forgotten (developer missed kwarg)".
    Pre-fix, both states looked identical (silence)
  * 7 new debug events:
      transport.wire_transport_app.vault_path_skipped
      transport.wire_transport_app.send_fn_skipped
      transport.wire_transport_app.pending_items_aggregate_skipped
      transport.wire_transport_app.pending_items_resolver_skipped
      transport.wire_transport_app.peer_inbox_skipped
      transport.wire_transport_app.gcal_skipped
      transport.wire_transport_app.gcal_intended_on_skipped
    Each carries a ``reason`` field naming the operational consequence
    of the skip ("/peer/send will return 501", "outbound/send will
    return 503", etc.)
  * Net behavior change: zero (only adds log emissions)

Piece C enhancements (new tests/test_transport_route_smoke_per_instance.py):
  * Parametrized smoke over THREE instance personas — Salem
    (canonical owner, full features) / KAL-LE (peer-only, no GCal,
    no aggregate) / Hypatia (minimal: vault + send + peer_inbox
    only). The original smoke uses a single "fully-wired" config; per-
    instance parametrization catches "instance X enabled feature Y but
    didn't wire Z" gaps that single-config smoke can't surface
  * Persona-specific contract assertions:
      - test_kalle_canonical_routes_return_404_not_owned: peer-only
        instance must return 404 ``canonical_not_owned`` on
        /canonical/* (not 500 — handler must check ownership before
        reading vault_path)
      - test_hypatia_pending_resolve_returns_501_when_resolver_unwired:
        instance without resolver must return non-500 (handler must
        check resolver presence before invoking)
      - test_kalle_outbound_send_works_without_gcal: negative control
        — GCal-skip doesn't cascade to other route surfaces
  * test_every_register_helper_has_wire_transport_app_kwarg —
    structural test that pins the contract: every register_* helper
    exported from peer_handlers (excluding route-installers) must
    have a corresponding wire_transport_app kwarg. Catches Flavor 3
    at PR-review time: "developer added new register_* but forgot
    wire_transport_app entry → silent wiring gap until production"
    fails this test instead of waiting for production traffic
  * test_wire_transport_app_logs_skip_for_omitted_kwargs — pins the
    Piece A debug-log contract. All 7 expected ``_skipped`` events
    must fire when wire_transport_app is called with only
    ``instance_name``. Uses structlog.testing.capture_logs (per
    feedback_structlog_assertion_patterns)

Why three personas + not the live config files:
  * Live config.salem.yaml / config.kalle.yaml / config.hypatia.yaml
    reference env vars (tokens, paths) that don't exist in CI
  * The personas here are built from typed dataclasses with test-stub
    values — exercises the same wire_transport_app path the daemon
    uses, with the full app surface, but without per-environment
    config-file loading
  * Future instance shapes (V.E.R.A. with RRTS GCal, STAY-C with
    client-cal) extend by adding a fourth parametrize entry that
    mirrors its kwarg subset — same scaffolding pattern

Surprising finding ssdavidai#2 (filed as P2 followup, NOT fixed in this commit):
  The talker daemon does NOT pass ``peer_inbox_callable`` to
  ``wire_transport_app`` (line ~619 in src/alfred/telegram/daemon.py).
  Result: every inbound /peer/send returns 501
  ``peer_inbox_not_configured``. This is exactly the kind of "defined
  but not wired" Flavor 3 gap the meta-pattern warns about — and the
  per-instance smoke would have caught it as a route returning 501.
  Per CLAUDE.md "don't add features beyond what the task requires" —
  flagging for team-lead decision rather than fixing here. The
  smoke tests don't fail the build because 501 is in
  ``_ACCEPTABLE_STATUSES`` (correctly-wired-but-rejecting), so the
  gap is invisible at the smoke layer too. Consider:
    (a) wire peer_inbox in the daemon (add closure that posts to
        Telegram) and remove 501 from acceptable
    (b) leave 501 acceptable + add a separate test that asserts
        peer_inbox IS wired on Salem
  Either way, a separate ticket.

Tests:
  * test_transport_route_smoke_per_instance.py: 3 parametrized smoke
    invocations (salem / kalle / hypatia) + 3 persona-specific
    contract assertions + 2 structural tests = 8 test functions, 10
    test cases (3 parametrized + 7 standalone)
  * Existing test_transport_route_smoke.py: unchanged, still pins the
    full-wiring contract + negative control + route count

Followups filed (P3, NOT addressed here per scope discipline):
  * peer_inbox wiring gap in talker daemon (see surprising finding ssdavidai#2)
  * The 10 zero-arg ``load_config()`` calls in transport/client.py
    (Flavor 2 sites flagged in the May 1 memo but production callers
    always pass config explicitly, so no production bug today)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
newtonium-errant added a commit to newtonium-errant/alfred that referenced this pull request May 8, 2026
Two new bot-layer slash commands gated by per-instance opt-in:
* /train [--cluster <name>] [<text>] — saves raw essay at
  document/essay/<slug>.md, enqueues async voice-profile extraction.
* /method-source [<text>] (registered as /method_source per PTB
  command-name rules) — saves raw method/system source at
  source/<slug>.md, enqueues async method-profile extraction.

Three-tier voice schema:
* Leaf: voice/<slug>.md per /train invocation.
* Cluster: voice/cluster/<name>.md when ≥2 leaves share a cluster tag.
* Overall: voice/Andrew Voice Profile.md when ≥2 cluster summaries exist.
Method side is leaf-only (source + method record per /method-source).

Architecture:
* Sub-2s ack: handler saves raw record via vault_create + appends
  one job to a per-instance JSONL extraction queue, then replies.
* Async worker: lives inside the talker daemon as another asyncio
  task (sweeper / heartbeat / transport pattern). Drains the queue
  every poll_seconds, calls Opus per job, writes the structured
  record, flips the raw record's extraction_status, DMs the operator.
* Cluster + overall builders trigger after a leaf write whose
  cluster tag matches ≥2 existing leaves.

Vault routing fix (regression for f006c48e 2026-05-06):
* Added 4 top-level types (essay, voice, voice-cluster, method) to
  KNOWN_TYPES_HYPATIA + HYPATIA_CREATE_TYPES + STATUS_BY_TYPE +
  TYPE_DIRECTORY. essay routes to document/essay/<name>.md fixing
  the pre-fix bug where vault_create type=note with set_fields={
  "type": "essay"} landed at note/ rather than document/essay/.
* Body-mutation matrix: hypatia gains body_replace on voice +
  voice-cluster + method (re-extraction path); essay deliberately
  omitted (raw essays are write-once).

Includes prompt-tuner refinements to the 4 extraction prompts
(VOICE_EXTRACTION_PROMPT, METHOD_EXTRACTION_PROMPT, VOICE_CLUSTER_PROMPT,
VOICE_OVERALL_PROMPT) — evidence-anchoring rule (verbatim quotes per label),
intentionally-left-blank exits for thin/incoherent inputs.

Code-reviewer P1 fixes folded in:
1. Idempotency on real titles: _write_structured_record +
   maybe_rebuild_cluster were checking existence at slug-derived
   paths while vault_create writes verbatim names; mismatch crashed
   re-extraction with VaultError("File already exists") for any
   title with apostrophes / spaces / capitals. Fixed by using the
   verbatim raw_name / cluster_name for both check + write.
2. /method_source underscore mismatch: PTB delivers /method_source
   (underscore — hyphens illegal in CommandHandler names) but
   parse_method_source_args was passing "method-source" to the
   regex, silently failing the multi-line newline-preservation path.
   Fixed by passing "method_source" (matching delivery).

Schema verification per prompt-tuner request:
* Added 4 intentionally-left-blank status sentinels to STATUS_BY_TYPE
  (insufficient-evidence, no-overall-invariants, incoherent-cluster,
  not-a-method) so the prompt-tuner's empty-state exits flow end-
  to-end into vault rather than getting silently dropped behind a
  default "active". Writer paths now pass through LLM-emitted
  status (defaulting to "active" only when absent).
* Other new optional fields (insufficient_reason, source_attribution,
  leaf_titles, etc.) require no schema change — vault is permissive
  on extra frontmatter keys outside the LIST_FIELDS / REQUIRED_FIELDS
  / STATUS_BY_TYPE constraints.

Tests: +71 (voice_train) + 12 (hypatia_scope) covering routing fix,
P1 ssdavidai#1 idempotency on apostrophe / multi-word names, P1 ssdavidai#2 underscore-
form preservation + botname suffix, schema sentinel passthrough end-
to-end, scope admit/deny matrix, async worker happy + failure paths,
cluster + overall threshold gating.

Per-instance neutrality preserved: voice_train module is config-
flippable; helpers (queue, slug, profile schema) take instance/scope
arguments and never hardcode "hypatia". Salem / KAL-LE adoption is
a config flip + scope-allowlist edit.

Test environment note: the bash sandbox in this session blocks
pytest / python invocations (per feedback_subagent_bash_permission_
inheritance.md), so this commit relies on careful manual review +
the prior pre-fix sweep where 3226 passed / 40 failed (matching
master's 3164 / 40 baseline plus the 62 new tests in voice_train).
The post-fix tests follow the same patterns as the pre-fix ones —
team lead should run the full sweep after merge to confirm zero
regressions before push.

Co-Authored-By: prompt-tuner <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants