Local-first experience memory for coding agents.
Index raw Oh My Pi session transcripts, then recall provenance-backed decisions, runbooks, and gotchas — without ever modifying OMP state.
Install · Quick start · How it works · CLI · MCP server · Contributing
A forensic and experience memory over your actual coding sessions. It reads the session JSONL files already on disk and writes only to its own local index database. Every result traces back to the exact conversation and exchange that produced it, so you can answer questions like "where did we solve this before", "what did the agent actually say", and "which session decided X".
Read-only with respect to OMP state: it never edits, compresses, or curates OMP's own memory. It indexes the raw transcripts and exposes them through a CLI and an MCP server.
- Why not just use OMP memory?
- What it does
- Install
- Quick start
- How it works
- CLI
- Benchmarks
- MCP server
- Development
- Requirements
- Layout
- Contributing
- Acknowledgements
- License
OMP Hindsight and Mnemopi are curated operational memory for the agent's working context. omp-episodic-memory is read-only forensic transcript memory: exact evidence, raw-session citations, and a reviewable inbox for derived decisions, runbooks, and gotchas. It never modifies OMP state.
| Need | Use OMP native memory | Use omp-episodic-memory |
|---|---|---|
| Remember preferences | Yes. Store durable user and project preferences for automatic use. | Optional. Use transcripts as evidence before promoting a preference. |
| Auto-recall at session start | Yes. Native memory is built for ambient context. | No. Recall is explicit and task-scoped. |
| Audit a prior session verbatim | No. Curated memory is compressed. | Yes. Search and read exact transcript exchanges. |
| Trace memory to exact exchange | Limited. Summaries may omit raw provenance. | Yes. Every result carries session, path, and ordinal. |
| Review before promoting derived memory | Limited. Native memory is already curated. | Yes. extract proposes records into an approve/reject inbox. |
| Avoid writing to OMP state | No. Native memory is OMP state. | Yes. The index is separate and read-only toward OMP. |
| Debug stale/contradictory memory | Limited. You see the current distilled view. | Yes. Compare decisions, supersession, and source evidence. |
| Project | Lane | Relationship |
|---|---|---|
| Native OMP memory | Curated working memory for agents. | Complement: this tool supplies transcript evidence before you update native memory. |
| Obra episodic-memory | Episodic memory for agent experience. | Closest neighbor, with this project focused on local OMP transcript forensics and review. |
| Mem0 | General-purpose application memory service/framework. | Different lane: this project favors local-first evidence over broad app memory APIs. |
| Zep/Graphiti | Graph-backed long-term memory for agents and apps. | Different lane: this project keeps a narrow, read-only transcript index with provenance. |
| Letta | Stateful agent runtime with memory as part of the agent system. | Different lane: this project is an external audit and recall layer for existing OMP sessions. |
OMP's built-in memory is curated and compressed — a distilled view optimized for the agent's working context. That is useful, but it is lossy: the original wording, the dead ends, and the precise moment a decision was made are gone.
This tool takes the opposite stance. It indexes the raw transcripts as they sit on disk and gives you provenance back to the exact conversation and exchange. Use it to answer:
- Where did we solve this before?
- What did the agent actually say (verbatim), not the summary?
- Which session decided X, and what was the reasoning at the time?
The index is read-only with respect to OMP state. Derived memory (decisions, gotchas, runbooks) is proposed into a separate reviewable inbox — nothing is asserted into your knowledge base without an explicit approve step.
This is not a competitor to general-purpose agent memory frameworks (Mem0, Zep, Letta) or to OMP-native curation (Hindsight). Its lane is narrow on purpose: raw-transcript provenance plus reviewable derived memory for OMP coding sessions.
- Hybrid search: FTS5 keyword retrieval and sqlite-vec vector retrieval fused with Reciprocal Rank Fusion (RRF). Modes: both, vector, text.
- Typed, reviewable derived memory: decisions, gotchas, and runbooks extracted from transcripts into an approve/reject inbox. Nothing enters the knowledge base without review.
- recall_for_task evidence bundles: task-scoped retrieval that returns supporting evidence with a confidence score and abstains when the index has nothing relevant, rather than fabricating an answer.
- Temporal project graph: entities and time-bounded edges, with decision supersession and a memory diff to see what changed since a given date.
- Pinned project-context blocks: durable, project-scoped context surfaced alongside recall.
- Recall eval harness: a reproducible benchmark over question/session fixtures that reports recall, ranking, abstention, and latency metrics as a regression guardrail.
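The abstention behaviour of the evidence bundles can be pictured with a small sketch. Everything here is hypothetical: the `Hit` and `Bundle` shapes, the `buildBundle` name, and the score thresholds are illustrative assumptions, not the project's actual types or tuning.

```typescript
// Hypothetical shapes and thresholds, for illustration only.
interface Hit {
  sessionId: string;
  ordinal: number;
  score: number; // assumed relevance score in [0, 1]
}

type Bundle =
  | { abstained: true; reason: string }
  | { abstained: false; confidence: "high" | "medium" | "low"; evidence: Hit[] };

function buildBundle(hits: Hit[], minScore = 0.3, limit = 5): Bundle {
  // Keep only hits that clear the relevance floor, best first.
  const relevant = hits
    .filter((hit) => hit.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);

  // Abstain explicitly instead of returning weak evidence.
  if (relevant.length === 0) {
    return { abstained: true, reason: "no indexed exchange cleared the relevance threshold" };
  }

  // Map the best score onto a coarse confidence tier.
  const top = relevant[0].score;
  const confidence = top >= 0.75 ? "high" : top >= 0.5 ? "medium" : "low";
  return { abstained: false, confidence, evidence: relevant };
}
```

The point of the tagged union is that a caller has to check `abstained` before reading any evidence, so an empty result cannot be mistaken for an answer.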
Requires Node.js 20 or newer.
npm install -g omp-episodic-memory # global CLI: omp-episodic

Or run without installing:
npx -y -p omp-episodic-memory omp-episodic index
npx -y -p omp-episodic-memory omp-episodic search "family tree research"

omp-episodic index # index all sessions
omp-episodic search "family tree research"
omp-episodic stats

The default index path is ${XDG_DATA_HOME:-~/.local/share}/omp-episodic-memory/index.db. Override it with OMP_EPISODIC_DB or --db PATH.
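The path resolution described above could look roughly like the following. This is a sketch under assumptions: `resolveDbPath` is a hypothetical helper, and letting `--db` win over `OMP_EPISODIC_DB` is an assumed precedence that the text does not state.

```typescript
// Hypothetical helper mirroring the documented defaults; not the project's actual code.
type Env = Record<string, string | undefined>;

function resolveDbPath(env: Env, homeDir: string, dbFlag?: string): string {
  // Assumption: an explicit --db PATH takes precedence over the environment.
  if (dbFlag) return dbFlag;
  // Next, the OMP_EPISODIC_DB override.
  if (env.OMP_EPISODIC_DB) return env.OMP_EPISODIC_DB;
  // Otherwise ${XDG_DATA_HOME:-~/.local/share}/omp-episodic-memory/index.db.
  // `||` treats an empty XDG_DATA_HOME like an unset one, matching the shell `:-` form.
  const dataHome = env.XDG_DATA_HOME || `${homeDir}/.local/share`;
  return `${dataHome}/omp-episodic-memory/index.db`;
}
```

Passing the environment and home directory in as arguments keeps the function pure, which makes the precedence easy to test.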
| Stage | What happens |
|---|---|
| Parse | Walks ${OMP_SESSIONS_DIR:-~/.omp/agent/sessions}/**/*.jsonl, assembling each user turn plus the assistant reply that followed into an Exchange, including tool calls/results, command text, file paths, error state, details, and exit status. |
| Embed | Uses Xenova/all-MiniLM-L6-v2 (384-d) via @xenova/transformers. First run downloads the model if it is not already cached. No API keys are required. Tool event text contributes to retrieval. |
| Store | Writes exchanges, serialized tool events, FTS5 keyword tables, and a sqlite-vec vec0 vector table to local SQLite. |
| Search | Fuses vector and keyword branches with Reciprocal Rank Fusion (RRF). Supports both, vector, text, tool-name filters, and tool-error filters. |
| Derive | Extracts typed memory (decisions, gotchas, runbooks) into a reviewable inbox; builds a temporal entity/edge graph with supersession. |
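For readers unfamiliar with the Search stage, Reciprocal Rank Fusion scores each result as the sum of 1 / (k + rank) over the ranked lists it appears in. A minimal sketch follows; the `rrfFuse` name, the exchange ids, and k = 60 (a commonly used constant) are assumptions, and the project's own implementation in src/search.ts may differ.

```typescript
// Minimal RRF sketch. rrfFuse and k = 60 are illustrative assumptions.
function rrfFuse(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Ranks are 1-based, so the first hit contributes 1 / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first; ties are broken by id for deterministic output.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

// Hypothetical exchange ids from the vector and keyword branches.
const vectorHits = ["ex-7", "ex-2", "ex-9"];
const keywordHits = ["ex-2", "ex-4", "ex-7"];
const fused = rrfFuse([vectorHits, keywordHits]);
console.log(fused.map((hit) => hit.id)); // [ 'ex-2', 'ex-7', 'ex-4', 'ex-9' ]
```

Because only ranks are combined, the vector distances and keyword scores never need to be normalised against each other. Results found by both branches rise above results found by only one.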
omp-episodic index # index all sessions
omp-episodic search "sqlite-vec" --mode text # keyword-only search
omp-episodic search "Command exited with code 1" --mode text --tool bash --tool-error true
omp-episodic recall "fix flaky vector search" # task-scoped evidence bundle
omp-episodic stats # index statistics
omp-episodic recall "fix flaky vector search" --ui # OMP-styled TTY panel

| Command | Description |
|---|---|
| index | Index OMP transcripts into the local SQLite database. |
| search | Hybrid search over indexed exchanges (--mode both\|vector\|text, --tool NAME, --tool-error true\|false). |
| recall | Build a task-scoped evidence bundle with confidence and abstention; supports the same tool filters as search. |
| stats | Show index statistics (exchanges, sessions, date range). |
| extract | Propose typed derived memories (decisions/gotchas/runbooks) into the inbox. |
| inbox | List derived memories by status (pending/approved/rejected/superseded). |
| approve | Approve a pending derived memory by id. |
| reject | Reject a derived memory by id, with an optional reason. |
| memories | Search approved/derived memories by query, type, project, or status. |
| graph | Build or inspect the temporal project graph (entities and edges). |
| diff | Show what derived memory changed since a given date. |
| eval | Run the recall eval harness over a question/session fixture set. |
| context | Show pinned project-context blocks plus recent approved decisions/gotchas/runbooks. |
| blocks | Manage pinned project-context blocks (list, set <kind>, rm <id>). |
Common flags: --mode both|vector|text, --limit N, --after YYYY-MM-DD, --before YYYY-MM-DD, --project P, --json, --ui, --plain, --db PATH, --sessions DIR, --max N.
Terminal polish: --ui (or OMP_EPISODIC_UI=1) enables an ANSI π recall panel for TTY output on search, recall, stats, and inbox. Piped output, --json, and --plain always stay deterministic for scripts and tests.
Environment:
| Variable | Purpose |
|---|---|
| OMP_EPISODIC_DB | Index database path. |
| OMP_SESSIONS_DIR | Default session corpus for CLI indexing. |
| OMP_EPISODIC_SESSIONS_DIR | Root allowed by the MCP read tool. Set this if you index a non-default session directory. |
| XDG_DATA_HOME | Base directory for the default index path. |
The eval command runs a reproducible recall benchmark over a fixture set of exact, decision, procedural, temporal, multi-session, gotcha, runbook, contradiction, and abstention questions:
omp-episodic eval --questions <file> --sessions <dir> --mode text

It builds (or reuses, with --no-build) an index from the fixtures, runs each question through recall, and reports:
| Metric | Meaning |
|---|---|
| Recall@1 / Recall@5 | Fraction of questions whose expected source appears in the top 1 / top 5 results. |
| MRR | Mean reciprocal rank of the expected source. |
| Abstention accuracy | Fraction of unanswerable questions on which recall correctly abstains. |
| False-positive rate | Fraction of unanswerable questions answered anyway (confident when it should abstain). |
| p50 / p95 latency | Median and tail per-query latency. |
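The ranking and abstention metrics in the table can be computed as sketched below. The `QuestionResult` shape and `scoreRun` function are hypothetical, and treating an empty result list as an abstention is an assumption about how the harness records it.

```typescript
// Hypothetical result shape; the project's own eval types may differ.
interface QuestionResult {
  expected: string | null; // expected source id, or null when the question is unanswerable
  ranked: string[];        // returned source ids in rank order; empty is assumed to mean "abstained"
}

interface Scores {
  recallAt1: number;
  recallAt5: number;
  mrr: number;
  abstentionAccuracy: number;
  falsePositiveRate: number;
}

function scoreRun(results: QuestionResult[]): Scores {
  const answerable = results.filter((r) => r.expected !== null);
  const unanswerable = results.filter((r) => r.expected === null);

  // 1-based rank of the expected source, or 0 when it was not returned at all.
  const ranks = answerable.map((r) => r.ranked.indexOf(r.expected as string) + 1);
  const fraction = (count: number, total: number): number => (total === 0 ? 0 : count / total);

  const abstained = unanswerable.filter((r) => r.ranked.length === 0).length;
  return {
    recallAt1: fraction(ranks.filter((rank) => rank === 1).length, ranks.length),
    recallAt5: fraction(ranks.filter((rank) => rank >= 1 && rank <= 5).length, ranks.length),
    // Mean reciprocal rank: a miss contributes 0.
    mrr: fraction(ranks.reduce((sum, rank) => sum + (rank > 0 ? 1 / rank : 0), 0), ranks.length),
    abstentionAccuracy: fraction(abstained, unanswerable.length),
    falsePositiveRate: fraction(unanswerable.length - abstained, unanswerable.length),
  };
}
```

Defined this way, abstention accuracy and false-positive rate are complements over the unanswerable questions, so one going up means the other goes down.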
Current baseline on the bundled synthetic fixtures (text mode):
| Metric | Result |
|---|---|
| Scored recall questions | 30 |
| Recall@5 | 100% |
| Abstention false-positive rate | 0% |
| p95 latency | < 500ms |
| Extraction precision | 92.6% |
| Unlabeled extraction candidates | 0 |
| Duplicate rate | 0% |
These numbers are on small synthetic fixtures. They are a regression guardrail to catch retrieval/abstention regressions, not a leaderboard claim about real-world corpora.
The bench command runs the recall benchmark and the extraction-quality
benchmark together, scoring both against a two-tier threshold model:
omp-episodic bench --questions <file> --sessions <dir> --labels <file> --mode text

- Gates are CI-blocking floors: at least 30 scored recall questions, Recall@5 ≥ 85%, abstention-FP < 10%, p95 < 500ms, extraction precision ≥ 80%, zero unlabeled extraction candidates, and duplicate rate < 10%. A failed gate exits non-zero, so CI goes red.
- Targets are the aspirational SOTA bars (extraction precision ≥ 85%, Recall@1 ≥ 85%, MRR ≥ 0.80). They are reported with → when unmet but never fail the build. They mark the gap you close by growing the gold set.
CI runs this exact command on every push (see .github/workflows/ci.yml).
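The two-tier model can be sketched as two lists of checks, where only the gates affect the exit code. The thresholds below are the ones quoted above. The `Metrics` shape, the `evaluate` function, and the report line format are hypothetical and not taken from src/bench.ts.

```typescript
// Hypothetical metrics shape; field names are illustrative.
interface Metrics {
  scoredQuestions: number;
  recallAt1: number;
  recallAt5: number;
  mrr: number;
  abstentionFpRate: number;
  p95Ms: number;
  extractionPrecision: number;
  unlabeledCandidates: number;
  duplicateRate: number;
}

interface Check {
  name: string;
  pass: (m: Metrics) => boolean;
}

// Gates: CI-blocking floors.
const gates: Check[] = [
  { name: "scored recall questions >= 30", pass: (m) => m.scoredQuestions >= 30 },
  { name: "Recall@5 >= 85%", pass: (m) => m.recallAt5 >= 0.85 },
  { name: "abstention-FP < 10%", pass: (m) => m.abstentionFpRate < 0.1 },
  { name: "p95 < 500ms", pass: (m) => m.p95Ms < 500 },
  { name: "extraction precision >= 80%", pass: (m) => m.extractionPrecision >= 0.8 },
  { name: "zero unlabeled candidates", pass: (m) => m.unlabeledCandidates === 0 },
  { name: "duplicate rate < 10%", pass: (m) => m.duplicateRate < 0.1 },
];

// Targets: reported, never blocking.
const targets: Check[] = [
  { name: "extraction precision >= 85%", pass: (m) => m.extractionPrecision >= 0.85 },
  { name: "Recall@1 >= 85%", pass: (m) => m.recallAt1 >= 0.85 },
  { name: "MRR >= 0.80", pass: (m) => m.mrr >= 0.8 },
];

function evaluate(m: Metrics): { exitCode: number; lines: string[] } {
  const lines: string[] = [];
  let exitCode = 0;
  for (const gate of gates) {
    const ok = gate.pass(m);
    if (!ok) exitCode = 1; // any failed gate turns CI red
    lines.push(`${ok ? "PASS" : "FAIL"} gate: ${gate.name}`);
  }
  for (const target of targets) {
    // Unmet targets are flagged with an arrow but leave the exit code alone.
    lines.push(`${target.pass(m) ? "ok" : "→"} target: ${target.name}`);
  }
  return { exitCode, lines };
}
```

A real runner would pass `exitCode` to `process.exit`. Keeping targets out of that path is what lets the aspirational bars sit above current performance without breaking the build.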
The bundled fixture precision baseline is measured on a synthetic set. To measure and improve extraction quality on your own transcripts, label real candidates:
- Generate a labels template from your sessions (one row per extracted candidate, pre-filled correct: true):
  omp-episodic label-scaffold --sessions ~/.omp/agent/sessions > my-labels.jsonl
- Review each row in my-labels.jsonl. Each carries title, matchedText, and rule context. Flip correct to false for any candidate that is noise (a false positive), and tighten titleSubstring if the default first-four-words match is too broad. The eval loader reads only sessionId, ordinal, type, titleSubstring, and correct; the context fields are ignored.
- Re-run the bench against your labeled set to see real precision:
  omp-episodic bench --questions <file> --sessions ~/.omp/agent/sessions --labels my-labels.jsonl
As the labeled set grows and precision climbs past the 85% target, raise the
gate floor in src/bench.ts to lock in the gain.
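To make the label format concrete, here is a sketch of reading a labels file and computing precision from it. The five fields are the ones the loader is described as reading. The `parseLabels` and `precision` helpers, the sample rows, and the definition of precision as correct rows over all labeled rows are assumptions for illustration.

```typescript
// The five fields the eval loader is described as reading; other fields are ignored.
interface LabelRow {
  sessionId: string;
  ordinal: number;
  type: string; // e.g. "decision", "gotcha", "runbook"
  titleSubstring: string;
  correct: boolean;
}

// Hypothetical parser: one JSON object per non-empty line.
function parseLabels(jsonl: string): LabelRow[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const row = JSON.parse(line);
      return {
        sessionId: String(row.sessionId),
        ordinal: Number(row.ordinal),
        type: String(row.type),
        titleSubstring: String(row.titleSubstring),
        correct: row.correct === true,
      };
    });
}

// Assumed definition: candidates labeled correct divided by all labeled candidates.
function precision(rows: LabelRow[]): number {
  return rows.length === 0 ? 0 : rows.filter((row) => row.correct).length / rows.length;
}

// Made-up sample rows; "title" stands in for the ignored context fields.
const sampleLabels = [
  '{"sessionId":"s-001","ordinal":4,"type":"decision","titleSubstring":"Use sqlite-vec for","correct":true,"title":"context only"}',
  '{"sessionId":"s-001","ordinal":9,"type":"gotcha","titleSubstring":"Model download stalls on","correct":true}',
  '{"sessionId":"s-002","ordinal":2,"type":"runbook","titleSubstring":"Let me check the","correct":false}',
].join("\n");

console.log(precision(parseLabels(sampleLabels)).toFixed(3)); // 0.667
```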
The package ships a second binary, omp-episodic-mcp (./dist/mcp-server.js), that runs the MCP stdio server. Register it in any harness that speaks MCP (Claude Code, Codex, Oh My Pi).
Using the published package via npx (the -p flag names the package to install; the trailing omp-episodic-mcp argument selects the bin, which differs from the package name):
{
"mcpServers": {
"omp-episodic-memory": {
"command": "npx",
"args": ["-y", "-p", "omp-episodic-memory", "omp-episodic-mcp"]
}
}
}

If installed globally (npm install -g omp-episodic-memory), the omp-episodic-mcp command is on your PATH:
{
"mcpServers": {
"omp-episodic-memory": {
"command": "omp-episodic-mcp"
}
}
}

For a local checkout, build first (bun run build) and point at the file directly:
{
"mcpServers": {
"omp-episodic-memory": {
"command": "node",
"args": ["/absolute/path/to/omp-episodic-memory/dist/mcp-server.js"]
}
}
}

Tools:
| Tool | Purpose |
|---|---|
| search | Hybrid retrieval over indexed sessions. Returns markdown or JSON. |
| read | Reads a full session transcript by path, constrained to the configured sessions root. |
| recall_for_task | Task-scoped evidence bundle with confidence tiers and explicit abstention. |
| list_gotchas | Approved failure-mode memories for a project/task, so the agent avoids repeating a known mistake. |
| get_project_context | Pinned project context plus recent approved decisions, gotchas, and runbooks. |
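Constraining the read tool to a sessions root amounts to a path-containment check. The sketch below shows one common way to do it with Node's path module. `isInsideRoot` is a hypothetical name, and the real server may add checks this sketch omits, such as resolving symlinks.

```typescript
import * as path from "node:path";

// Hypothetical containment check; does not resolve symlinks.
function isInsideRoot(root: string, candidate: string): boolean {
  const resolvedRoot = path.resolve(root);
  // Relative candidates are interpreted against the root; ".." segments are collapsed.
  const resolved = path.resolve(resolvedRoot, candidate);
  const rel = path.relative(resolvedRoot, resolved);
  // Reject the root itself, anything that climbs out of it, and paths on another drive.
  const escapes = rel === ".." || rel.startsWith(".." + path.sep);
  return rel !== "" && !escapes && !path.isAbsolute(rel);
}
```

Comparing the relative path, rather than testing a string prefix, avoids accepting a sibling directory such as `/data/sessions-old` when the root is `/data/sessions`.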
The MCP server starts an embedding-model prewarm in the background. A first vector search can still be slow if the model cache is cold or the download has not finished; mode: "text" avoids the embedding path.
Local development uses Bun:
bun install # install dependencies
bun run check # type-check (tsc --noEmit)
bun run test # run the test suite

Tests run on Node's built-in test runner via tsx (node --import tsx --test).
- Node.js 20+
- Bun for local development commands
- A platform supported by better-sqlite3 and sqlite-vec
- Network access on first embedding run unless the Transformers.js model is already cached
| File | Role |
|---|---|
| src/types.ts | Shared contract and portable defaults. |
| src/parser.ts | OMP JSONL to Exchange[] parser. See FORMAT.md. |
| src/db.ts | SQLite schema, read-only open path, and upsert/re-embed writes. |
| src/embeddings.ts | MiniLM embedding singleton with balanced user/assistant truncation. |
| src/indexer.ts | Crawl, embed, upsert, and persist pipeline. |
| src/search.ts | Hybrid RRF retrieval. |
| src/cli.ts | CLI commands: index, search, recall, stats, extract, inbox, approve, reject, memories, graph, diff, eval, context, blocks. |
| src/blocks.ts | Pinned project-context blocks and the project-context aggregator. |
| src/mcp-server.ts | MCP stdio server: search, read, recall_for_task, list_gotchas, get_project_context. |
Contributions are welcome. To get started:
- Fork and clone the repo, then run bun install.
- Make your change with a focused commit.
- Run the gates locally before opening a PR:
  bun run check # type-check
  bun run test # test suite
- Open a pull request describing the change and its motivation. CI runs the type-check, test suite, and the OMP-MemBench gate on every push.
Bug reports and feature requests are tracked in GitHub Issues. See RELEASING.md for the release process and CHANGELOG.md for the version history.
omp-episodic-memory was inspired in part by Jesse Vincent's
episodic-memory, which brings
semantic recall to Claude Code and Codex conversations.
This project is an independent Oh My Pi-focused implementation. Its emphasis is raw OMP transcript provenance, reviewable derived memories, task-scoped recall, gotchas/runbooks, and recall-quality evaluation.
MIT © omp-episodic-memory contributors