Skip to content

wolfiesch/omp-episodic-memory

Repository files navigation

🧠 omp-episodic-memory

Local-first experience memory for coding agents.

Index raw Oh My Pi session transcripts, then recall provenance-backed decisions, runbooks, and gotchas — without ever modifying OMP state.

CI npm version Install with npx Node.js License: MIT PRs Welcome

Install · Quick start · How it works · CLI · MCP server · Contributing


A forensic and experience memory over your actual coding sessions. It reads the session JSONL files already on disk and writes only to its own local index database. Every result traces back to the exact conversation and exchange that produced it, so you can answer questions like "where did we solve this before", "what did the agent actually say", and "which session decided X".

Read-only with respect to OMP state: it never edits, compresses, or curates OMP's own memory. It indexes the raw transcripts and exposes them through a CLI and an MCP server.

Table of contents

Why not just use OMP memory?

What this is

OMP Hindsight and Mnemopi are curated operational memory for the agent's working context. omp-episodic-memory is read-only forensic transcript memory: exact evidence, raw-session citations, and a reviewable inbox for derived decisions, runbooks, and gotchas. It never modifies OMP state.

Need Use OMP native memory Use omp-episodic-memory
Remember preferences Yes. Store durable user and project preferences for automatic use. Optional. Use transcripts as evidence before promoting a preference.
Auto-recall at session start Yes. Native memory is built for ambient context. No. Recall is explicit and task-scoped.
Audit a prior session verbatim No. Curated memory is compressed. Yes. Search and read exact transcript exchanges.
Trace memory to exact exchange Limited. Summaries may omit raw provenance. Yes. Every result carries session, path, and ordinal.
Review before promoting derived memory Limited. Native memory is already curated. Yes. extract proposes records into an approve/reject inbox.
Avoid writing to OMP state No. Native memory is OMP state. Yes. The index is separate and read-only toward OMP.
Debug stale/contradictory memory Limited. You see the current distilled view. Yes. Compare decisions, supersession, and source evidence.
Project Lane Relationship
Native OMP memory Curated working memory for agents. Complement: this tool supplies transcript evidence before you update native memory.
Obra episodic-memory Episodic memory for agent experience. Closest neighbor, with this project focused on local OMP transcript forensics and review.
Mem0 General-purpose application memory service/framework. Different lane: this project favors local-first evidence over broad app memory APIs.
Zep/Graphiti Graph-backed long-term memory for agents and apps. Different lane: this project keeps a narrow, read-only transcript index with provenance.
Letta Stateful agent runtime with memory as part of the agent system. Different lane: this project is an external audit and recall layer for existing OMP sessions.

OMP's built-in memory is curated and compressed — a distilled view optimized for the agent's working context. That is useful, but it is lossy: the original wording, the dead ends, and the precise moment a decision was made are gone.

This tool takes the opposite stance. It indexes the raw transcripts as they sit on disk and gives you provenance back to the exact conversation and exchange. Use it to answer:

  • Where did we solve this before?
  • What did the agent actually say (verbatim), not the summary?
  • Which session decided X, and what was the reasoning at the time?

The index is read-only with respect to OMP state. Derived memory (decisions, gotchas, runbooks) is proposed into a separate reviewable inbox — nothing is asserted into your knowledge base without an explicit approve step.

This is not a competitor to general-purpose agent memory frameworks (Mem0, Zep, Letta) or to OMP-native curation (Hindsight). Its lane is narrow on purpose: raw-transcript provenance plus reviewable derived memory for OMP coding sessions.

What it does

  • Hybrid search — FTS5 keyword retrieval and sqlite-vec vector retrieval fused with Reciprocal Rank Fusion (RRF). Modes: both, vector, text.
  • Typed, reviewable derived memory — decisions, gotchas, and runbooks extracted from transcripts into an approve/reject inbox. Nothing enters the knowledge base without review.
  • recall_for_task evidence bundles — task-scoped retrieval that returns supporting evidence with a confidence score and abstains when the index has nothing relevant, rather than fabricating an answer.
  • Temporal project graph — entities and time-bounded edges, with decision supersession and a memory diff to see what changed since a given date.
  • Pinned project-context blocks — durable, project-scoped context surfaced alongside recall.
  • Recall eval harness — a reproducible benchmark over question/session fixtures that reports recall, ranking, abstention, and latency metrics as a regression guardrail.

Install

Requires Node.js 20 or newer.

npm install -g omp-episodic-memory   # global CLI: omp-episodic

Or run without installing:

npx -y -p omp-episodic-memory omp-episodic index
npx -y -p omp-episodic-memory omp-episodic search "family tree research"

Quick start

omp-episodic index                       # index all sessions
omp-episodic search "family tree research"
omp-episodic stats

The default index path is ${XDG_DATA_HOME:-~/.local/share}/omp-episodic-memory/index.db. Override it with OMP_EPISODIC_DB or --db PATH.

How it works

Stage What happens
Parse Walks ${OMP_SESSIONS_DIR:-~/.omp/agent/sessions}/**/*.jsonl, assembling each user turn plus the assistant reply that followed into an Exchange, including tool calls/results, command text, file paths, error state, details, and exit status.
Embed Uses Xenova/all-MiniLM-L6-v2 (384-d) via @xenova/transformers. First run downloads the model if it is not already cached. No API keys are required. Tool event text contributes to retrieval.
Store Writes exchanges, serialized tool events, FTS5 keyword tables, and a sqlite-vec vec0 vector table to local SQLite.
Search Fuses vector and keyword branches with Reciprocal Rank Fusion (RRF). Supports both, vector, text, tool-name filters, and tool-error filters.
Derive Extracts typed memory (decisions, gotchas, runbooks) into a reviewable inbox; builds a temporal entity/edge graph with supersession.

CLI

omp-episodic index                              # index all sessions
omp-episodic search "sqlite-vec" --mode text    # keyword-only search
omp-episodic search "Command exited with code 1" --mode text --tool bash --tool-error true
omp-episodic recall "fix flaky vector search"   # task-scoped evidence bundle
omp-episodic stats                              # index statistics
omp-episodic recall "fix flaky vector search" --ui # OMP-styled TTY panel

Command reference

Command Description
index Index OMP transcripts into the local SQLite database.
search Hybrid search over indexed exchanges (--mode both|vector|text, --tool NAME, --tool-error true|false).
recall Build a task-scoped evidence bundle with confidence and abstention; supports the same tool filters as search.
stats Show index statistics (exchanges, sessions, date range).
extract Propose typed derived memories (decisions/gotchas/runbooks) into the inbox.
inbox List derived memories by status (pending/approved/rejected/superseded).
approve Approve a pending derived memory by id.
reject Reject a derived memory by id, with an optional reason.
memories Search approved/derived memories by query, type, project, or status.
graph Build or inspect the temporal project graph (entities and edges).
diff Show what derived memory changed since a given date.
eval Run the recall eval harness over a question/session fixture set.
context Show pinned project-context blocks plus recent approved decisions/gotchas/runbooks.
blocks Manage pinned project-context blocks (list, set <kind>, rm <id>).

Common flags: --mode both|vector|text, --limit N, --after YYYY-MM-DD, --before YYYY-MM-DD, --project P, --json, --ui, --plain, --db PATH, --sessions DIR, --max N.

Terminal polish: --ui (or OMP_EPISODIC_UI=1) enables an ANSI π recall panel for TTY output on search, recall, stats, and inbox. Piped output, --json, and --plain always stay deterministic for scripts and tests.

Environment:

Variable Purpose
OMP_EPISODIC_DB Index database path.
OMP_SESSIONS_DIR Default session corpus for CLI indexing.
OMP_EPISODIC_SESSIONS_DIR Root allowed by the MCP read tool. Set this if you index a non-default session directory.
XDG_DATA_HOME Base directory for the default index path.

Benchmarks

The eval command runs a reproducible recall benchmark over a fixture set of exact, decision, procedural, temporal, multi-session, gotcha, runbook, contradiction, and abstention questions:

omp-episodic eval --questions <file> --sessions <dir> --mode text

It builds (or reuses, with --no-build) an index from the fixtures, runs each question through recall, and reports:

Metric Meaning
Recall@1 / Recall@5 Fraction of questions whose expected source appears in the top 1 / top 5 results.
MRR Mean reciprocal rank of the expected source.
Abstention accuracy Fraction of unanswerable questions on which recall correctly abstains.
False-positive rate Fraction of unanswerable questions answered anyway (confident when it should abstain).
p50 / p95 latency Median and tail per-query latency.

Current baseline on the bundled synthetic fixtures (text mode):

Metric Result
Scored recall questions 30
Recall@5 100%
Abstention false-positive rate 0%
p95 latency < 500ms
Extraction precision 92.6%
Unlabeled extraction candidates 0
Duplicate rate 0%

These numbers are on small synthetic fixtures. They are a regression guardrail to catch retrieval/abstention regressions, not a leaderboard claim about real-world corpora.

OMP-MemBench (combined gate)

The bench command runs the recall benchmark and the extraction-quality benchmark together, scoring both against a two-tier threshold model:

omp-episodic bench --questions <file> --sessions <dir> --labels <file> --mode text
  • Gates are CI-blocking floors: at least 30 scored recall questions, Recall@5 ≥ 85%, abstention-FP < 10%, p95 < 500ms, extraction precision ≥ 80%, zero unlabeled extraction candidates, and duplicate rate < 10%. A failed gate exits non-zero, so CI goes red.
  • Targets are the aspirational SOTA bars (extraction precision ≥ 85%, Recall@1 ≥ 85%, MRR ≥ 0.80). They are reported with when unmet but never fail the build. They mark the gap you close by growing the gold set.

CI runs this exact command on every push (see .github/workflows/ci.yml).

Growing the extraction gold set on your real sessions

The bundled fixture precision baseline is measured on a synthetic set. To measure and improve extraction quality on your own transcripts, label real candidates:

  1. Generate a labels template from your sessions (one row per extracted candidate, pre-filled correct: true):

    omp-episodic label-scaffold --sessions ~/.omp/agent/sessions > my-labels.jsonl
  2. Review each row in my-labels.jsonl. Each carries title, matchedText, and rule context. Flip correct to false for any candidate that is noise (a false positive), and tighten titleSubstring if the default first-four- words match is too broad. The eval loader reads only sessionId, ordinal, type, titleSubstring, and correct; the context fields are ignored.

  3. Re-run the bench against your labeled set to see real precision:

    omp-episodic bench --questions <file> --sessions ~/.omp/agent/sessions --labels my-labels.jsonl

As the labeled set grows and precision climbs past the 85% target, raise the gate floor in src/bench.ts to lock in the gain.

MCP server

The package ships a second binary, omp-episodic-mcp (./dist/mcp-server.js), that runs the MCP stdio server. Register it in any harness that speaks MCP (Claude Code, Codex, Oh My Pi).

Using the published package via npx (the -p flag selects the named bin, since it differs from the package name):

{
  "mcpServers": {
    "omp-episodic-memory": {
      "command": "npx",
      "args": ["-y", "-p", "omp-episodic-memory", "omp-episodic-mcp"]
    }
  }
}

If installed globally (npm install -g omp-episodic-memory), the omp-episodic-mcp command is on your PATH:

{
  "mcpServers": {
    "omp-episodic-memory": {
      "command": "omp-episodic-mcp"
    }
  }
}

For a local checkout, build first (bun run build) and point at the file directly:

{
  "mcpServers": {
    "omp-episodic-memory": {
      "command": "node",
      "args": ["/absolute/path/to/omp-episodic-memory/dist/mcp-server.js"]
    }
  }
}

Tools:

Tool Purpose
search Hybrid retrieval over indexed sessions. Returns markdown or JSON.
read Reads a full session transcript by path, constrained to the configured sessions root.
recall_for_task Task-scoped evidence bundle with confidence tiers and explicit abstention.
list_gotchas Approved failure-mode memories for a project/task, so the agent avoids repeating a known mistake.
get_project_context Pinned project context plus recent approved decisions, gotchas, and runbooks.

The MCP server starts an embedding-model prewarm in the background. A first vector search can still be slow if the model cache is cold or the download has not finished; mode: "text" avoids the embedding path.

Development

Local development uses Bun:

bun install        # install dependencies
bun run check      # type-check (tsc --noEmit)
bun run test       # run the test suite

Tests run on Node's built-in test runner via tsx (node --import tsx --test).

Requirements

  • Node.js 20+
  • Bun for local development commands
  • A platform supported by better-sqlite3 and sqlite-vec
  • Network access on first embedding run unless the Transformers.js model is already cached

Layout

File Role
src/types.ts Shared contract and portable defaults.
src/parser.ts OMP JSONL to Exchange[] parser. See FORMAT.md.
src/db.ts SQLite schema, read-only open path, and upsert/re-embed writes.
src/embeddings.ts MiniLM embedding singleton with balanced user/assistant truncation.
src/indexer.ts Crawl, embed, upsert, and persist pipeline.
src/search.ts Hybrid RRF retrieval.
src/cli.ts CLI commands: index, search, recall, stats, extract, inbox, approve, reject, memories, graph, diff, eval, context, blocks.
src/blocks.ts Pinned project-context blocks and the project-context aggregator.
src/mcp-server.ts MCP stdio server: search, read, recall_for_task, list_gotchas, get_project_context.

Contributing

Contributions are welcome. To get started:

  1. Fork and clone the repo, then run bun install.

  2. Make your change with a focused commit.

  3. Run the gates locally before opening a PR:

    bun run check      # type-check
    bun run test       # test suite
  4. Open a pull request describing the change and its motivation. CI runs the type-check, test suite, and the OMP-MemBench gate on every push.

Bug reports and feature requests are tracked in GitHub Issues. See RELEASING.md for the release process and CHANGELOG.md for the version history.

Acknowledgements

omp-episodic-memory was inspired in part by Jesse Vincent's episodic-memory, which brings semantic recall to Claude Code and Codex conversations.

This project is an independent Oh My Pi-focused implementation. Its emphasis is raw OMP transcript provenance, reviewable derived memories, task-scoped recall, gotchas/runbooks, and recall-quality evaluation.

License

MIT © omp-episodic-memory contributors

Built for Oh My Pi coding sessions.

About

Hybrid semantic and keyword search over Oh My Pi session transcripts

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors