🧠 omp-episodic-memory

Local-first experience memory for coding agents.

Index raw Oh My Pi session transcripts, then recall provenance-backed decisions, runbooks, and gotchas — without ever modifying OMP state.

Install · Quick start · How it works · CLI · MCP server · Contributing

omp-episodic-memory is a forensic and experience memory over your actual coding sessions. It reads the session JSONL files already on disk and writes only to its own local index database. Every result traces back to the exact conversation and exchange that produced it, so you can answer questions like "where did we solve this before", "what did the agent actually say", and "which session decided X".

Read-only with respect to OMP state: it never edits, compresses, or curates OMP's own memory. It indexes the raw transcripts and exposes them through a CLI and an MCP server.

Why not just use OMP memory?

What this is

OMP Hindsight and Mnemopi are curated operational memory for the agent's working context. omp-episodic-memory is read-only forensic transcript memory: exact evidence, raw-session citations, and a reviewable inbox for derived decisions, runbooks, and gotchas. It never modifies OMP state.

| Need | Use OMP native memory | Use omp-episodic-memory |
| --- | --- | --- |
| Remember preferences | Yes. Store durable user and project preferences for automatic use. | Optional. Use transcripts as evidence before promoting a preference. |
| Auto-recall at session start | Yes. Native memory is built for ambient context. | No. Recall is explicit and task-scoped. |
| Audit a prior session verbatim | No. Curated memory is compressed. | Yes. Search and read exact transcript exchanges. |
| Trace memory to exact exchange | Limited. Summaries may omit raw provenance. | Yes. Every result carries session, path, and ordinal. |
| Review before promoting derived memory | Limited. Native memory is already curated. | Yes. `extract` proposes records into an approve/reject inbox. |
| Avoid writing to OMP state | No. Native memory is OMP state. | Yes. The index is separate and read-only toward OMP. |
| Debug stale/contradictory memory | Limited. You see the current distilled view. | Yes. Compare decisions, supersession, and source evidence. |

| Project | Lane | Relationship |
| --- | --- | --- |
| Native OMP memory | Curated working memory for agents. | Complement: this tool supplies transcript evidence before you update native memory. |
| Obra episodic-memory | Episodic memory for agent experience. | Closest neighbor, with this project focused on local OMP transcript forensics and review. |
| Mem0 | General-purpose application memory service/framework. | Different lane: this project favors local-first evidence over broad app memory APIs. |
| Zep/Graphiti | Graph-backed long-term memory for agents and apps. | Different lane: this project keeps a narrow, read-only transcript index with provenance. |
| Letta | Stateful agent runtime with memory as part of the agent system. | Different lane: this project is an external audit and recall layer for existing OMP sessions. |

OMP's built-in memory is curated and compressed — a distilled view optimized for the agent's working context. That is useful, but it is lossy: the original wording, the dead ends, and the precise moment a decision was made are gone.

This tool takes the opposite stance. It indexes the raw transcripts as they sit on disk and gives you provenance back to the exact conversation and exchange. Use it to answer:

Where did we solve this before?
What did the agent actually say (verbatim), not the summary?
Which session decided X, and what was the reasoning at the time?

The index is read-only with respect to OMP state. Derived memory (decisions, gotchas, runbooks) is proposed into a separate reviewable inbox — nothing is asserted into your knowledge base without an explicit approve step.

This is not a competitor to general-purpose agent memory frameworks (Mem0, Zep, Letta) or to OMP-native curation (Hindsight). Its lane is narrow on purpose: raw-transcript provenance plus reviewable derived memory for OMP coding sessions.

What it does

Hybrid search — FTS5 keyword retrieval and sqlite-vec vector retrieval fused with Reciprocal Rank Fusion (RRF). Modes: both, vector, text.
Typed, reviewable derived memory — decisions, gotchas, and runbooks extracted from transcripts into an approve/reject inbox. Nothing enters the knowledge base without review.
recall_for_task evidence bundles — task-scoped retrieval that returns supporting evidence with a confidence score and abstains when the index has nothing relevant, rather than fabricating an answer.
Temporal project graph — entities and time-bounded edges, with decision supersession and a memory diff to see what changed since a given date.
Pinned project-context blocks — durable, project-scoped context surfaced alongside recall.
Recall eval harness — a reproducible benchmark over question/session fixtures that reports recall, ranking, abstention, and latency metrics as a regression guardrail.

Install

Requires Node.js 20 or newer.

```
npm install -g omp-episodic-memory   # global CLI: omp-episodic
```

Or run without installing:

```
npx -y -p omp-episodic-memory omp-episodic index
npx -y -p omp-episodic-memory omp-episodic search "family tree research"
```

Quick start

```
omp-episodic index                       # index all sessions
omp-episodic search "family tree research"
omp-episodic stats
```

The default index path is ${XDG_DATA_HOME:-~/.local/share}/omp-episodic-memory/index.db. Override it with OMP_EPISODIC_DB or --db PATH.
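
The lookup described above can be sketched as a small pure function. This is an illustration rather than the project's code: the name `resolveDbPath`, its input shape, and the precedence of `--db` over `OMP_EPISODIC_DB` are assumptions. Only the default path and the two override mechanisms come from the text.

```typescript
import * as path from "node:path";

/** Inputs that decide where the index lives (hypothetical shape, for illustration). */
interface DbPathInputs {
  dbFlag?: string; // value passed to --db PATH, if any
  env: Record<string, string | undefined>;
  home: string; // the user's home directory
}

// Assumed precedence: --db flag, then OMP_EPISODIC_DB, then the XDG-based default.
function resolveDbPath({ dbFlag, env, home }: DbPathInputs): string {
  if (dbFlag) return dbFlag;
  if (env.OMP_EPISODIC_DB) return env.OMP_EPISODIC_DB;
  // Mirrors ${XDG_DATA_HOME:-~/.local/share}: fall back when unset or empty.
  const dataHome = env.XDG_DATA_HOME || path.join(home, ".local", "share");
  return path.join(dataHome, "omp-episodic-memory", "index.db");
}

const defaultPath = resolveDbPath({ env: {}, home: "/home/dev" });
console.log(defaultPath);
```

In a real CLI the inputs would presumably come from `process.env` and `os.homedir()`; passing them in keeps the sketch easy to test.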

How it works

| Stage | What happens |
| --- | --- |
| Parse | Walks `${OMP_SESSIONS_DIR:-~/.omp/agent/sessions}/*/.jsonl`, assembling each user turn plus the assistant reply that followed into an `Exchange`, including tool calls/results, command text, file paths, error state, details, and exit status. |
| Embed | Uses `Xenova/all-MiniLM-L6-v2` (384-d) via `@xenova/transformers`. First run downloads the model if it is not already cached. No API keys are required. Tool event text contributes to retrieval. |
| Store | Writes exchanges, serialized tool events, FTS5 keyword tables, and a `sqlite-vec` `vec0` vector table to local SQLite. |
| Search | Fuses vector and keyword branches with Reciprocal Rank Fusion (RRF). Supports `both`, `vector`, `text`, tool-name filters, and tool-error filters. |
| Derive | Extracts typed memory (decisions, gotchas, runbooks) into a reviewable inbox; builds a temporal entity/edge graph with supersession. |
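
To make the Search stage concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked id lists. It shows the general technique, not the code in `src/search.ts`: the function name, the example ids, and the constant `k = 60` (a value commonly used with RRF) are assumptions.

```typescript
/** One ranked list of exchange ids, best match first. */
type Ranking = string[];

// Each list contributes 1 / (k + rank) per id; ids found by both branches add up.
// k = 60 is a conventional default, assumed here rather than taken from the project.
function reciprocalRankFusion(
  rankings: Ranking[],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical results from the vector and keyword branches.
const vectorHits: Ranking = ["ex-3", "ex-1", "ex-7"];
const keywordHits: Ranking = ["ex-1", "ex-9", "ex-3"];
const fused = reciprocalRankFusion([vectorHits, keywordHits]);
console.log(fused.map((hit) => hit.id)); // [ 'ex-1', 'ex-3', 'ex-9', 'ex-7' ]
```

Because RRF uses only ranks, the vector distances and FTS5 scores never need to be put on a common scale, which is the usual reason to prefer it for hybrid retrieval.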

CLI

```
omp-episodic index                              # index all sessions
omp-episodic search "sqlite-vec" --mode text    # keyword-only search
omp-episodic search "Command exited with code 1" --mode text --tool bash --tool-error true
omp-episodic recall "fix flaky vector search"   # task-scoped evidence bundle
omp-episodic stats                              # index statistics
omp-episodic recall "fix flaky vector search" --ui   # OMP-styled TTY panel
```

Command reference

| Command | Description |
| --- | --- |
| `index` | Index OMP transcripts into the local SQLite database. |
| `search` | Hybrid search over indexed exchanges (`--mode both\|vector\|text`, `--tool NAME`, `--tool-error true\|false`). |
| `recall` | Build a task-scoped evidence bundle with confidence and abstention; supports the same tool filters as `search`. |
| `stats` | Show index statistics (exchanges, sessions, date range). |
| `extract` | Propose typed derived memories (decisions/gotchas/runbooks) into the inbox. |
| `inbox` | List derived memories by status (pending/approved/rejected/superseded). |
| `approve` | Approve a pending derived memory by id. |
| `reject` | Reject a derived memory by id, with an optional reason. |
| `memories` | Search approved/derived memories by query, type, project, or status. |
| `graph` | Build or inspect the temporal project graph (entities and edges). |
| `diff` | Show what derived memory changed since a given date. |
| `eval` | Run the recall eval harness over a question/session fixture set. |
| `context` | Show pinned project-context blocks plus recent approved decisions/gotchas/runbooks. |
| `blocks` | Manage pinned project-context blocks (`list`, `set <kind>`, `rm <id>`). |

Common flags: --mode both|vector|text, --limit N, --after YYYY-MM-DD, --before YYYY-MM-DD, --project P, --json, --ui, --plain, --db PATH, --sessions DIR, --max N.

Terminal polish: --ui (or OMP_EPISODIC_UI=1) enables an ANSI π recall panel for TTY output on search, recall, stats, and inbox. Piped output, --json, and --plain always stay deterministic for scripts and tests.
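
The rule in the paragraph above reduces to one boolean decision. The sketch below is an assumption-laden illustration: `OutputOptions` and `shouldRenderPanel` are hypothetical names, and only the inputs (`--ui`, `OMP_EPISODIC_UI=1`, `--json`, `--plain`, TTY vs piped) come from the text.

```typescript
/** Hypothetical view of the flags and environment that affect rendering. */
interface OutputOptions {
  ui?: boolean; // --ui
  json?: boolean; // --json
  plain?: boolean; // --plain
  isTTY: boolean; // false when output is piped
  env: Record<string, string | undefined>;
}

// Deterministic output wins: --json, --plain, or a non-TTY stream disables the panel.
function shouldRenderPanel(o: OutputOptions): boolean {
  if (o.json || o.plain || !o.isTTY) return false;
  return o.ui === true || o.env.OMP_EPISODIC_UI === "1";
}

console.log(shouldRenderPanel({ ui: true, isTTY: true, env: {} })); // true
console.log(shouldRenderPanel({ ui: true, isTTY: false, env: {} })); // false
```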

Environment:

| Variable | Purpose |
| --- | --- |
| `OMP_EPISODIC_DB` | Index database path. |
| `OMP_SESSIONS_DIR` | Default session corpus for CLI indexing. |
| `OMP_EPISODIC_SESSIONS_DIR` | Root allowed by the MCP `read` tool. Set this if you index a non-default session directory. |
| `XDG_DATA_HOME` | Base directory for the default index path. |

Benchmarks

The eval command runs a reproducible recall benchmark over a fixture set of exact, decision, procedural, temporal, multi-session, gotcha, runbook, contradiction, and abstention questions:

```
omp-episodic eval --questions <file> --sessions <dir> --mode text
```

It builds (or reuses, with --no-build) an index from the fixtures, runs each question through recall, and reports:

| Metric | Meaning |
| --- | --- |
| Recall@1 / Recall@5 | Fraction of questions whose expected source appears in the top 1 / top 5 results. |
| MRR | Mean reciprocal rank of the expected source. |
| Abstention accuracy | Fraction of unanswerable questions on which recall correctly abstains. |
| False-positive rate | Fraction of unanswerable questions answered anyway (confident when it should abstain). |
| p50 / p95 latency | Median and tail per-query latency. |
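
For readers who want the metric definitions in executable form, here is a minimal sketch. It follows the standard definitions in the table, not the harness source: the function names, the `Rank` type, and the nearest-rank percentile method are assumptions.

```typescript
/** 1-based rank of the expected source in a question's results, or null if absent. */
type Rank = number | null;

/** Fraction of questions whose expected source appears in the top k results. */
function recallAtK(ranks: Rank[], k: number): number {
  if (ranks.length === 0) return 0;
  return ranks.filter((r) => r !== null && r <= k).length / ranks.length;
}

/** Mean of 1/rank, counting a missing source as 0. */
function meanReciprocalRank(ranks: Rank[]): number {
  if (ranks.length === 0) return 0;
  let total = 0;
  for (const r of ranks) total += r === null ? 0 : 1 / r;
  return total / ranks.length;
}

/** Fraction of unanswerable questions on which recall abstained. */
function abstentionAccuracy(abstained: boolean[]): number {
  if (abstained.length === 0) return 0;
  return abstained.filter(Boolean).length / abstained.length;
}

/** Nearest-rank percentile, one of several common definitions (assumed here). */
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.max(0, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[index];
}

// Hypothetical run: four answerable questions and their latencies in ms.
const ranks: Rank[] = [1, 3, null, 2];
const latenciesMs = [12, 40, 18, 25];
console.log(recallAtK(ranks, 1), recallAtK(ranks, 5)); // 0.25 0.75
console.log(percentile(latenciesMs, 50), percentile(latenciesMs, 95)); // 18 40
```

Since both abstention metrics are fractions of the same unanswerable set, the false-positive rate is simply `1 - abstentionAccuracy(...)`.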

Current baseline on the bundled synthetic fixtures (text mode):

| Metric | Result |
| --- | --- |
| Scored recall questions | 30 |
| Recall@5 | 100% |
| Abstention false-positive rate | 0% |
| p95 latency | < 500ms |
| Extraction precision | 92.6% |
| Unlabeled extraction candidates | 0 |
| Duplicate rate | 0% |

These numbers are on small synthetic fixtures. They are a regression guardrail to catch retrieval/abstention regressions, not a leaderboard claim about real-world corpora.

OMP-MemBench (combined gate)

The bench command runs the recall benchmark and the extraction-quality benchmark together, scoring both against a two-tier threshold model:

```
omp-episodic bench --questions <file> --sessions <dir> --labels <file> --mode text
```

Gates are CI-blocking floors: at least 30 scored recall questions, Recall@5 ≥ 85%, abstention-FP < 10%, p95 < 500ms, extraction precision ≥ 80%, zero unlabeled extraction candidates, and duplicate rate < 10%. A failed gate exits non-zero, so CI goes red.
Targets are the aspirational SOTA bars (extraction precision ≥ 85%, Recall@1 ≥ 85%, MRR ≥ 0.80). They are reported with → when unmet but never fail the build. They mark the gap you close by growing the gold set.
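
The gate floors listed above can be expressed as a table of checks. This is a sketch only: `BenchResult` and `failedGates` are hypothetical names and do not reflect the actual shape of `src/bench.ts`. The thresholds are the ones stated in the text.

```typescript
/** Hypothetical summary of one bench run; rates are fractions in [0, 1]. */
interface BenchResult {
  scoredQuestions: number;
  recallAt5: number;
  abstentionFpRate: number;
  p95LatencyMs: number;
  extractionPrecision: number;
  unlabeledCandidates: number;
  duplicateRate: number;
}

/** Returns the names of the gates that failed; an empty list means the build passes. */
function failedGates(r: BenchResult): string[] {
  const checks: Array<[string, boolean]> = [
    ["scored recall questions >= 30", r.scoredQuestions >= 30],
    ["Recall@5 >= 85%", r.recallAt5 >= 0.85],
    ["abstention-FP < 10%", r.abstentionFpRate < 0.1],
    ["p95 < 500ms", r.p95LatencyMs < 500],
    ["extraction precision >= 80%", r.extractionPrecision >= 0.8],
    ["zero unlabeled extraction candidates", r.unlabeledCandidates === 0],
    ["duplicate rate < 10%", r.duplicateRate < 0.1],
  ];
  return checks.filter(([, ok]) => !ok).map(([name]) => name);
}

// Values from the baseline table; 499 is a placeholder because the table says only "< 500ms".
const baseline: BenchResult = {
  scoredQuestions: 30,
  recallAt5: 1.0,
  abstentionFpRate: 0,
  p95LatencyMs: 499,
  extractionPrecision: 0.926,
  unlabeledCandidates: 0,
  duplicateRate: 0,
};
console.log(failedGates(baseline)); // []
```

A caller would exit non-zero whenever the returned list is non-empty, which is what turns CI red. Targets would be reported separately and never affect the exit code.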

CI runs this exact command on every push (see .github/workflows/ci.yml).

Growing the extraction gold set on your real sessions

The bundled fixture precision baseline is measured on a synthetic set. To measure and improve extraction quality on your own transcripts, label real candidates:

Generate a labels template from your sessions (one row per extracted candidate, pre-filled correct: true):
```
omp-episodic label-scaffold --sessions ~/.omp/agent/sessions > my-labels.jsonl
```
Review each row in my-labels.jsonl. Each carries title, matchedText, and rule context. Flip correct to false for any candidate that is noise (a false positive), and tighten titleSubstring if the default first-four-words match is too broad. The eval loader reads only sessionId, ordinal, type, titleSubstring, and correct; the context fields are ignored.

Re-run the bench against your labeled set to see real precision:

```
omp-episodic bench --questions <file> --sessions ~/.omp/agent/sessions --labels my-labels.jsonl
```

As the labeled set grows and precision climbs past the 85% target, raise the gate floor in src/bench.ts to lock in the gain.
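
As a rough illustration of the labeling workflow above, the sketch below reads a labels file and computes precision as the share of candidates marked correct. The five field names come from the text. Everything else is an assumption: the field types, the example rows, the `type` spellings, and the names `parseLabels` and `labeledPrecision`.

```typescript
/** The five fields the eval loader is described as reading; types are assumed. */
interface ExtractionLabel {
  sessionId: string;
  ordinal: number;
  type: string; // e.g. "decision", "gotcha", "runbook" (assumed spellings)
  titleSubstring: string;
  correct: boolean;
}

/** Parses one JSON object per line and drops every other (context) field. */
function parseLabels(jsonl: string): ExtractionLabel[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const row = JSON.parse(line) as Record<string, unknown>;
      return {
        sessionId: String(row.sessionId),
        ordinal: Number(row.ordinal),
        type: String(row.type),
        titleSubstring: String(row.titleSubstring),
        correct: row.correct === true,
      };
    });
}

/** Share of labeled candidates marked correct. */
function labeledPrecision(labels: ExtractionLabel[]): number {
  if (labels.length === 0) return 0;
  return labels.filter((label) => label.correct).length / labels.length;
}

// Two made-up rows: one kept as correct, one flipped to false during review.
const sampleLabels = [
  '{"sessionId":"s1","ordinal":4,"type":"decision","titleSubstring":"Use sqlite-vec for","correct":true,"title":"context only"}',
  '{"sessionId":"s1","ordinal":9,"type":"gotcha","titleSubstring":"Model download on first","correct":false}',
].join("\n");
const parsedLabels = parseLabels(sampleLabels);
console.log(labeledPrecision(parsedLabels)); // 0.5
```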

MCP server

The package ships a second binary, omp-episodic-mcp (./dist/mcp-server.js), that runs the MCP stdio server. Register it in any harness that speaks MCP (Claude Code, Codex, Oh My Pi).

Using the published package via npx (the -p flag selects the named bin, since it differs from the package name):

```
{
  "mcpServers": {
    "omp-episodic-memory": {
      "command": "npx",
      "args": ["-y", "-p", "omp-episodic-memory", "omp-episodic-mcp"]
    }
  }
}
```

If installed globally (npm install -g omp-episodic-memory), the omp-episodic-mcp command is on your PATH:

```
{
  "mcpServers": {
    "omp-episodic-memory": {
      "command": "omp-episodic-mcp"
    }
  }
}
```

For a local checkout, build first (bun run build) and point at the file directly:

```
{
  "mcpServers": {
    "omp-episodic-memory": {
      "command": "node",
      "args": ["/absolute/path/to/omp-episodic-memory/dist/mcp-server.js"]
    }
  }
}
```

Tools:

| Tool | Purpose |
| --- | --- |
| `search` | Hybrid retrieval over indexed sessions. Returns markdown or JSON. |
| `read` | Reads a full session transcript by path, constrained to the configured sessions root. |
| `recall_for_task` | Task-scoped evidence bundle with confidence tiers and explicit abstention. |
| `list_gotchas` | Approved failure-mode memories for a project/task, so the agent avoids repeating a known mistake. |
| `get_project_context` | Pinned project context plus recent approved decisions, gotchas, and runbooks. |
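
The `read` tool's root constraint is a standard path-containment check. One common way to write it is sketched below. This is not the server's actual implementation: `isInsideRoot` is a hypothetical helper, and the paths are made up.

```typescript
import * as path from "node:path";

/** True when `candidate` resolves to a location strictly inside `root`. */
function isInsideRoot(root: string, candidate: string): boolean {
  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(resolvedRoot, candidate);
  const rel = path.relative(resolvedRoot, resolved);
  if (rel === "" || path.isAbsolute(rel)) return false; // the root itself, or another drive
  return rel !== ".." && !rel.startsWith(".." + path.sep); // reject anything that climbs out
}

const sessionsRoot = "/home/dev/.omp/agent/sessions";
console.log(isInsideRoot(sessionsRoot, "/home/dev/.omp/agent/sessions/p1/a.jsonl")); // true
console.log(isInsideRoot(sessionsRoot, "/home/dev/.omp/agent/sessions/../secrets.txt")); // false
```

A purely lexical check like this does not follow symlinks. A stricter variant would resolve both paths with `fs.realpath` before comparing them.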

The MCP server starts an embedding-model prewarm in the background. A first vector search can still be slow if the model cache is cold or the download has not finished; mode: "text" avoids the embedding path.

Development

Local development uses Bun:

```
bun install        # install dependencies
bun run check      # type-check (tsc --noEmit)
bun run test       # run the test suite
```

Tests run on Node's built-in test runner via tsx (node --import tsx --test).

Requirements

Node.js 20+
Bun for local development commands
A platform supported by better-sqlite3 and sqlite-vec
Network access on first embedding run unless the Transformers.js model is already cached

Layout

| File | Role |
| --- | --- |
| `src/types.ts` | Shared contract and portable defaults. |
| `src/parser.ts` | OMP JSONL to `Exchange[]` parser. See `FORMAT.md`. |
| `src/db.ts` | SQLite schema, read-only open path, and upsert/re-embed writes. |
| `src/embeddings.ts` | MiniLM embedding singleton with balanced user/assistant truncation. |
| `src/indexer.ts` | Crawl, embed, upsert, and persist pipeline. |
| `src/search.ts` | Hybrid RRF retrieval. |
| `src/cli.ts` | CLI commands: index, search, recall, stats, extract, inbox, approve, reject, memories, graph, diff, eval, context, blocks. |
| `src/blocks.ts` | Pinned project-context blocks and the project-context aggregator. |
| `src/mcp-server.ts` | MCP stdio server: `search`, `read`, `recall_for_task`, `list_gotchas`, `get_project_context`. |

Contributing

Contributions are welcome. To get started:

Fork and clone the repo, then run bun install.
Make your change with a focused commit.

Run the gates locally before opening a PR:

```
bun run check      # type-check
bun run test       # test suite
```

Open a pull request describing the change and its motivation. CI runs the type-check, test suite, and the OMP-MemBench gate on every push.

Bug reports and feature requests are tracked in GitHub Issues. See RELEASING.md for the release process and CHANGELOG.md for the version history.

Acknowledgements

omp-episodic-memory was inspired in part by Jesse Vincent's episodic-memory, which brings semantic recall to Claude Code and Codex conversations.

This project is an independent Oh My Pi-focused implementation. Its emphasis is raw OMP transcript provenance, reviewable derived memories, task-scoped recall, gotchas/runbooks, and recall-quality evaluation.

License

Built for Oh My Pi coding sessions.
