Merged
104 changes: 41 additions & 63 deletions .dev/status/current-handoff.md
@@ -1,7 +1,7 @@
# agent-memory current handoff

Status: AI-authored draft. Not yet human-approved.
Last updated: 2026-05-01 01:55 KST
Last updated: 2026-05-01 10:21 KST

## Trigger for the next session

@@ -16,7 +16,7 @@ read this file first. Do not ask the user to restate context. Verify repo state,

## Ready-to-say answer

agent-memory has finished release and Hermes QA through v0.1.37. The current slice fixes observation data-quality issues found during real dogfood QA. The branch is `fix/observation-dogfood-quality` and the worktree is `/Users/reddit/Project/agent-memory/.worktrees/observation-dogfood-quality`. The goals are to remove query previews, keep the synthetic pre-LLM payload from `hermes hooks doctor/test` from polluting dogfood observations, add audit quality warnings for insufficient data and empty retrievals, and make approve/review lazily migrate existing DBs that lack the `memory_status_transitions` table. I also confirmed an E2E run in which real Hermes used information retrieved from agent-memory in its answer.
agent-memory has finished release and Hermes QA through v0.1.38. The current work is the next slice of Priority 5 dogfood/noise monitoring: a read-only observation review candidate report. The branch is `feat/observation-review-candidates` and the worktree is `/Users/reddit/Project/agent-memory/.worktrees/observation-review-candidates`. The goal is to connect the top injected refs from `observations audit` to `review explain`, replacement/supersedes chains, `graph inspect` summaries, and copy-paste follow-up commands. It performs no automatic cleanup or mutation; it only strengthens forensic review.

## Current repo state

@@ -32,15 +32,17 @@ Expected GitHub identity:

Verified base before this slice:

- latest completed release: `v0.1.37`
- v0.1.37 added read-only `agent-memory observations audit` and was published to GitHub/npm/PyPI.
- local Hermes hook uses `/Users/reddit/.agent-memory/runtime/v0.1.37/.venv/bin/agent-memory` against `/Users/reddit/.agent-memory/memory.db`.
- latest completed release: `v0.1.38`
- v0.1.38 removed observation query previews, skipped Hermes doctor/test synthetic observations, added audit quality warnings, and verified a real Hermes turn used an agent-memory fact.
- local Hermes hook uses `/Users/reddit/.agent-memory/runtime/v0.1.38/.venv/bin/agent-memory` against `/Users/reddit/.agent-memory/memory.db`.
- root checkout was clean on `main...origin/main` except local-only untracked state.
- open PRs were `[]`.

Active slice/worktree:

- branch: `fix/observation-dogfood-quality`
- worktree: `/Users/reddit/Project/agent-memory/.worktrees/observation-dogfood-quality`
- intended release after merge: likely `v0.1.38`
- branch: `feat/observation-review-candidates`
- worktree: `/Users/reddit/Project/agent-memory/.worktrees/observation-review-candidates`
- intended release after merge: likely `v0.1.39`

Expected local untracked artifacts to preserve in the root checkout:

@@ -52,86 +54,62 @@ Expected local untracked artifacts to preserve in the root checkout:

Do not delete or commit these unless the user explicitly asks.

## Current slice: observation dogfood data quality
## Current slice: observation review candidates

Goal:

- Keep observation telemetry useful for real dogfood QA.
- Avoid storing prompt-like query previews.
- Avoid synthetic hook doctor/test payloads polluting observation audits.
- Make audit explicitly report low-signal data states.
- Ensure existing DBs lazily migrate missing lifecycle tables encountered during real local QA.
- Keep dogfood/noise monitoring read-only.
- Turn `observations audit` top refs into actionable forensic review candidates.
- Help operators see lifecycle status, replacement chains, and relation graph hints without exposing raw user queries or mutating memory.

Implemented so far in the active worktree:

- `record_retrieval_observation` now writes `query_preview = None` for new observations.
- Hermes pre-LLM hook detects the deterministic `hermes hooks doctor/test` payload:
- session_id `test-session`
- user_message `What is the weather?`
- empty conversation_history
- is_first_turn true
- model `gpt-4`
- platform `cli`
- Synthetic doctor/test payloads still exercise hook context injection but do not write dogfood observation rows.
- `observations audit` now returns:
- `empty_retrieval_ratio`
- `quality_warnings`
- `no_observations`
- `low_observation_count`
- `high_empty_retrieval_ratio`
- `memory_status_transitions` now has lazy/idempotent schema ensure used by initialize, status update, and status history paths.
- New CLI:
- `agent-memory observations review-candidates <db_path> --limit N --top N --frequent-threshold N`
- Output contract:
- `kind: retrieval_observation_review_candidates`
- `read_only: true`
- nested `observation_audit` payload
- `candidates[]` derived from `top_memory_refs`
- fact refs include `review_explain` payload equivalent to `review explain fact`
- graph summary includes depth-1 relation neighbor refs and edge count
- signals include existing audit signals plus `has_replacement` and `has_graph_relations` when applicable
- copy-paste commands for `review explain`, `review replacements`, and `graph inspect`
- Refactored the CLI `review explain` command to reuse `_fact_review_explanation_payload`.
- Docs updated:
- `README.md`
- `docs/hermes-dogfood.md`
- Tests added/updated in `tests/test_cli.py`:
- query preview is absent from observation list output
- audit reports low-signal empty retrievals
- approve-fact migrates existing DBs missing `memory_status_transitions`
- Hermes hook synthetic doctor payload skips observation write
- Hermes hook context includes retrieved memory content when line budget allows
- Test added in `tests/test_cli.py`:
- `test_python_module_cli_observations_review_candidates_explains_top_refs_without_mutation_or_raw_queries`
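The doctor/test skip described above can be sketched as follows. This is a minimal illustration, not the hook's actual code: `is_synthetic_doctor_payload` and `should_record_observation` are hypothetical names, and the hook payload is assumed to be a plain dict keyed by the field names listed in this handoff.

```python
from typing import Any, Mapping

# Scalar fields of the deterministic `hermes hooks doctor/test` payload, taken from this
# handoff. The key names and the dict shape of the real hook payload are assumptions.
SYNTHETIC_DOCTOR_FIELDS: Mapping[str, Any] = {
    "session_id": "test-session",
    "user_message": "What is the weather?",
    "is_first_turn": True,
    "model": "gpt-4",
    "platform": "cli",
}


def is_synthetic_doctor_payload(payload: Mapping[str, Any]) -> bool:
    """Return True only when every fixture field matches (hypothetical helper)."""
    for key, expected in SYNTHETIC_DOCTOR_FIELDS.items():
        if payload.get(key) != expected:
            return False
    # The fixture sends an empty conversation history; a missing key is treated as empty.
    return not payload.get("conversation_history")


def should_record_observation(payload: Mapping[str, Any]) -> bool:
    # Context injection still runs for every payload; only the observation write is skipped.
    return not is_synthetic_doctor_payload(payload)
```

Requiring every field to match keeps the check narrow: a real first-turn CLI session that happens to ask about the weather still differs in `session_id` and is recorded.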

Verification so far:

- RED confirmed:
- query_preview still present
- synthetic doctor payload wrote observation rows
- audit lacked `empty_retrieval_ratio`/`quality_warnings`
- existing DB without `memory_status_transitions` failed approve with sqlite OperationalError
- `observations review-candidates` was not a valid subcommand.
- GREEN focused:
- `uv run pytest tests/test_cli.py::test_python_module_cli_approve_fact_migrates_existing_database_without_status_transition_table tests/test_cli.py::test_python_module_cli_retrieve_observe_records_secret_safe_local_observation tests/test_cli.py::test_python_module_cli_observations_audit_reports_low_signal_empty_retrievals tests/test_cli.py::test_python_module_cli_hermes_pre_llm_hook_skips_synthetic_doctor_observation tests/test_cli.py::test_python_module_cli_hermes_pre_llm_hook_injects_retrieved_memory_context -q`
- `5 passed`

Live local Hermes QA already confirmed on the v0.1.37 runtime before this patch:

- Created a temporary approved fact in `/Users/reddit/.agent-memory/memory.db` with marker `AM_LIVE_E2E_1777567838` scoped to `/Users/reddit/Project/agent-memory`.
- Direct hook check confirmed:
- `direct_hook_contains_marker=True`
- `direct_hook_contains_agent_memory_context=True`
- `direct_hook_contains_retrieved_fact=True`
- Actual Hermes command confirmed the model used injected memory:
- `hermes --accept-hooks -z "What is the Hermes live E2E QA marker? Return only the marker and nothing else."`
- output contained `AM_LIVE_E2E_1777567838`
- Cleanup done:
- test fact id 2 deprecated with reason `live E2E QA cleanup`
- `review explain` showed `visible_in_default_retrieval: false`
- During live QA, an existing DB migration gap was discovered:
- approve failed until `agent-memory init ~/.agent-memory/memory.db` created `memory_status_transitions`
- this is now covered by the new lazy migration test/fix.
- `uv run pytest tests/test_cli.py::test_python_module_cli_observations_review_candidates_explains_top_refs_without_mutation_or_raw_queries -q`
- `1 passed`
- Broader focused:
- audit, review-candidates, review explain, graph inspect CLI tests
- `4 passed`
- Help smoke:
- `PYTHONPATH=src uv run python -m agent_memory.api.cli observations review-candidates --help`
- exit 0
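A minimal sketch of the lazy, idempotent schema ensure behind the approve-fact migration fix, using Python's standard `sqlite3` module. Only the table name comes from this handoff; the column layout and the helper names (`ensure_status_transitions_table`, `record_status_transition`, `status_history`) are hypothetical.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: only the table name comes from this handoff.
STATUS_TRANSITIONS_DDL = """
CREATE TABLE IF NOT EXISTS memory_status_transitions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    memory_type TEXT NOT NULL,
    memory_id INTEGER NOT NULL,
    from_status TEXT,
    to_status TEXT NOT NULL,
    reason TEXT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def ensure_status_transitions_table(conn: sqlite3.Connection) -> None:
    # IF NOT EXISTS makes this idempotent, so every code path can call it cheaply.
    conn.execute(STATUS_TRANSITIONS_DDL)


def record_status_transition(
    conn: sqlite3.Connection,
    memory_type: str,
    memory_id: int,
    from_status: Optional[str],
    to_status: str,
    reason: Optional[str] = None,
) -> None:
    # Ensuring first means an older DB without the table no longer fails with
    # sqlite3.OperationalError ("no such table").
    ensure_status_transitions_table(conn)
    conn.execute(
        "INSERT INTO memory_status_transitions"
        " (memory_type, memory_id, from_status, to_status, reason) VALUES (?, ?, ?, ?, ?)",
        (memory_type, memory_id, from_status, to_status, reason),
    )
    conn.commit()


def status_history(conn: sqlite3.Connection, memory_type: str, memory_id: int) -> list:
    # Read paths ensure the table too, so history on an old DB returns [] instead of raising.
    ensure_status_transitions_table(conn)
    return conn.execute(
        "SELECT from_status, to_status, reason FROM memory_status_transitions"
        " WHERE memory_type = ? AND memory_id = ? ORDER BY id",
        (memory_type, memory_id),
    ).fetchall()
```

Calling the ensure step from initialize, write, and read paths is what removes the need for a manual `agent-memory init` on an existing DB.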

Remaining before PR:

1. Run the broader focused test group and full local verification:
1. Run full local verification:
- `uv run pytest tests/ -q`
- `uv run python scripts/check_release_metadata.py`
- `uv run python scripts/smoke_release_readiness.py`
- `npm pack --dry-run`
- `git diff --check`
- `node --check bin/agent-memory.js`
2. Run a real smoke test of observation list/audit on a temp DB and confirm `query_preview` is null and no raw secret-like text appears.
2. Run a real temp-DB smoke test of `observations review-candidates` and confirm no raw secret-like query text appears.
3. Run a static secret scan of the diff.
4. Create PR, watch CI, merge, follow release-sync/publish/published smoke/Hermes QA.
5. After installing v0.1.38, rerun Hermes hook doctor and one real E2E check on the new runtime.
5. After installing v0.1.39, rerun Hermes hook doctor and run the installed `observations review-candidates` against the existing local DB.
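For the static diff secret scan step, one possible approach is sketched below. The regex set and the function names are hypothetical and deliberately small, not the project's scan rules; the scan reports only the pattern name and diff line number, so a match is never echoed back to the terminal.

```python
import re
import subprocess

# Hypothetical, deliberately small pattern set; not the project's actual scan rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_token": re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),
    "openai_style_key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_added_lines(diff_text: str) -> list:
    """Return (pattern_name, diff_line_number) pairs; matched text is never returned."""
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        # Only added lines matter; skip the "+++ b/path" file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((name, lineno))
    return findings


def scan_branch_diff(base: str = "origin/main") -> list:
    # Three-dot diff: changes on HEAD since its merge base with `base`.
    diff = subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return scan_added_lines(diff)
```

Run `scan_branch_diff()` from the worktree. An empty list only means these few patterns did not match, which is a weaker guarantee than a clean result from a dedicated scanner.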

## Next natural slice after this one

After this data-quality fix is released and Hermes QA passes, continue dogfood/noise monitoring using the cleaner audit data. Avoid mutating cleanup or broader graph retrieval until there are enough real, non-synthetic observations to justify ranking/scope changes.
After the read-only review candidate report is released and dogfooded, continue gathering real observation data. If enough non-synthetic observations accumulate, the next likely work is retrieval quality diagnostics for high empty-retrieval ratios or scope/ranking misses. Avoid automatic cleanup/deprecation until the review candidate workflow has been used on real local data.
3 changes: 2 additions & 1 deletion README.md
@@ -109,9 +109,10 @@ For local dogfood and noise monitoring, retrievals can leave a secret-safe obser
agent-memory retrieve "$DB" "How should I install agent-memory?" --preferred-scope user:default --observe cli
agent-memory observations list "$DB" --limit 20
agent-memory observations audit "$DB" --limit 200 --top 10 --frequent-threshold 3
agent-memory observations review-candidates "$DB" --limit 200 --top 10 --frequent-threshold 3
```

Use the observation log and audit report to spot frequently injected or surprising memories before changing retrieval behavior. The audit output is read-only JSON with surface/scope counts, empty-retrieval count and ratio, quality warnings such as `low_observation_count` or `high_empty_retrieval_ratio`, top injected memory refs, current status for known refs, and simple signals such as `frequently_injected` and `current_status_not_approved`. Treat it as local operator telemetry, not a synced analytics stream.
Use the observation log and audit report to spot frequently injected or surprising memories before changing retrieval behavior. The audit output is read-only JSON with surface/scope counts, empty-retrieval count and ratio, quality warnings such as `low_observation_count` or `high_empty_retrieval_ratio`, top injected memory refs, current status for known refs, and simple signals such as `frequently_injected` and `current_status_not_approved`. `observations review-candidates` is also read-only; it turns the top audit refs into forensic candidates with fact review explanations, replacement-chain hints, graph-neighborhood summaries, and copy-paste follow-up commands such as `review explain`, `review replacements`, and `graph inspect`. Treat these reports as local operator telemetry, not a synced analytics feature or an automatic cleanup workflow.
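The empty-retrieval ratio and quality warnings in the audit report can be sketched roughly as below. Only the warning names and the `empty_retrieval_ratio`/`quality_warnings` keys come from this README; the thresholds, the `memory_refs` field, the other keys, and `audit_quality` itself are assumptions for illustration.

```python
from typing import Any, Dict, List

# Hypothetical cutoffs: the warning names come from the audit output, the numbers do not.
LOW_OBSERVATION_COUNT = 20
HIGH_EMPTY_RETRIEVAL_RATIO = 0.5


def audit_quality(observations: List[Dict[str, Any]]) -> Dict[str, Any]:
    total = len(observations)
    # An observation counts as empty when it injected no memory refs (assumed field name).
    empty = sum(1 for obs in observations if not obs.get("memory_refs"))
    # 0.0 when there are no rows (assumed; the real report may use null instead).
    ratio = empty / total if total else 0.0
    warnings: List[str] = []
    if total == 0:
        warnings.append("no_observations")
    else:
        if total < LOW_OBSERVATION_COUNT:
            warnings.append("low_observation_count")
        if ratio >= HIGH_EMPTY_RETRIEVAL_RATIO:
            warnings.append("high_empty_retrieval_ratio")
    return {
        "observation_count": total,
        "empty_retrieval_count": empty,
        "empty_retrieval_ratio": ratio,
        "quality_warnings": warnings,
    }
```

Emitting `no_observations` on its own, rather than also flagging a low count, keeps the first warning an operator sees pointed at hook setup instead of at data volume.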

## Hermes quickstart

10 changes: 10 additions & 0 deletions docs/hermes-dogfood.md
@@ -48,10 +48,20 @@ Hermes pre-LLM hook retrievals write a secret-safe local observation row to the
```bash
agent-memory observations list ~/.agent-memory/memory.db --limit 20
agent-memory observations audit ~/.agent-memory/memory.db --limit 200 --top 10 --frequent-threshold 3
agent-memory observations review-candidates ~/.agent-memory/memory.db --limit 200 --top 10 --frequent-threshold 3
```

Use this before tuning ranking or adding broader graph traversal: first confirm which memories are frequently injected, which scopes are active, whether retrieval is often empty, and whether any frequently injected refs are now deprecated/disputed/missing. The audit command is read-only and summarizes local observation rows without emitting raw query text or query previews. Keep this data local unless you intentionally export it.

`observations review-candidates` is the next read-only step after audit. It keeps the same secret-safe observation summary, then expands each top ref into a forensic candidate:

- fact refs include the same lifecycle explanation as `agent-memory review explain fact ...`.
- replacement/supersedes chains are surfaced as candidate signals instead of mutating anything.
- relation graph neighbors are summarized so you know when `agent-memory graph inspect ...` is worth running.
- the JSON includes copy-paste follow-up commands for `review explain`, `review replacements`, and `graph inspect`.

Do not treat review candidates as automatic cleanup recommendations. They are a short list for human review; approve/deprecate/supersede decisions should still be explicit curation actions.
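As a hedged illustration of how the candidate signals above could be derived for one ref: the signal names come from the audit and review-candidates output, but `candidate_signals`, its inputs, and the threshold comparison (`>=`) are assumptions, not the actual implementation.

```python
from typing import List, Optional


def candidate_signals(
    injection_count: int,
    frequent_threshold: int,
    current_status: Optional[str],
    replacement_ref: Optional[str],
    graph_edge_count: int,
) -> List[str]:
    """Hypothetical derivation of review-candidate signals for a single memory ref."""
    signals = []
    # Assumed comparison: reaching the --frequent-threshold value counts as frequent.
    if injection_count >= frequent_threshold:
        signals.append("frequently_injected")
    # Deprecated, disputed, or missing (None) refs all differ from "approved".
    if current_status != "approved":
        signals.append("current_status_not_approved")
    if replacement_ref is not None:
        signals.append("has_replacement")
    if graph_edge_count > 0:
        signals.append("has_graph_relations")
    return signals
```

Every input here comes from a read-only lookup, so a signal list like this can only point a human at `review explain`, `review replacements`, or `graph inspect`; it never deprecates anything on its own.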

When the audit reports `quality_warnings`, treat them as QA signals rather than cleanup instructions:

- `no_observations`: Hermes has not produced dogfood observation data yet; check hook install/allowlist and run a real prompt.