
riz007/somtum


Local-first memory and prompt-cache layer for Claude Code.


Somtum (Thai: ส้มตำ) is named after the vibrant, shredded green papaya salad. Like its namesake, Somtum blends many ingredients: it extracts durable observations from your Claude Code sessions (decisions, bugfixes, learnings, file summaries), stores them in a local SQLite database, and injects the relevant ones back into context the next time you work on the same project.

Zero-config: one somtum init and every session end is captured automatically. No server, no cloud account, no mandatory tuning.



v2.0.0 — Global DB (~/.somtum/global.db) · cross-project workspace recall · memory deduplication (superseded_by) · --show-superseded on somtum list · stats instrumentation fix · dashboard dark-mode redesign

v1.5.0 — Multi-page VitePress docs site · somtum list · somtum reset · somtum forget --all · embeddings timeout safety · config crash-resilience · injection.max_chars wired up · warm-start race fix · auth-error hints

v1.3.0 — Auto-inject memories on every prompt · update MCP tool · warm-start after compaction · false-hit detection · workspace scope · suggest-claude-md · stale memory detection in doctor


Why Somtum?

LLM agents like Claude Code start every session with a blank slate. That leads to:

  • Repetitive context — re-explaining the same architectural choices every session
  • Regressions — Claude suggests a fix you already tried and discarded
  • Token waste — reading large files just to "set the scene"

Somtum gives Claude a long-term memory. Once a decision is made or a bug is fixed, it's remembered across all future sessions — without bloating your context window.

What a session looks like with Somtum

Without Somtum                    With Somtum
────────────────────              ──────────────────────────────────────
Session 1: "We use pnpm           Session 1: same work
           because of workspace
           hoisting"

Session 2: Claude suggests        Session 2: Claude already knows about
           npm, you correct it       pnpm, the auth decisions, and the
           again                     bugfixes from last week

How it works

At the end of each Claude Code session, Somtum reads the session transcript and asks Claude Haiku to extract the parts worth keeping — decisions, bug fixes, things learned. Those observations are stored locally in SQLite. On every subsequent prompt, Somtum automatically retrieves the most relevant memories and injects them into context — no manual recall needed.

Memory lifecycle

┌─────────────────────────────────────────────────────────────┐
│                    Claude Code Session                      │
│                                                             │
│       you code · debug · review · make decisions            │
└──────────────────────────────┬──────────────────────────────┘
                               │ SessionEnd / PreCompact
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                     Capture Pipeline                        │
│                                                             │
│  session transcript ──► Haiku extracts observations         │
│                                                             │
│      decisions · bug fixes · learnings · commands           │
│                                                             │
│  PreCompact ─── writes warm-start file ──► next session     │
└──────────────────────────────┬──────────────────────────────┘
                               │ persisted locally
                               ▼
                 ┌─────────────────────────┐
                 │  ~/.somtum/projects/    │
                 │     <project-hash>/     │
                 │                         │
                 │  db.sqlite              │
                 │  index.md               │
                 │  memories/YYYY-MM/      │
                 └────────────┬────────────┘
                              │ every prompt (UserPromptSubmit)
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                 Auto-Inject Pipeline (new)                  │
│                                                             │
│  1. Prompt cache lookup (exact + fuzzy match)               │
│  2. BM25 recall — top-k relevant memories                   │
│  3. Warm-start context (if session just compacted)          │
│                                                             │
│      all injected as additionalContext automatically        │
└─────────────────────────────────────────────────────────────┘

What gets captured — a concrete example

You work a session debugging an auth bug and refactoring a module. At session end, Somtum extracts something like:

[
  {
    "kind": "bugfix",
    "title": "JWT refresh loop caused by missing expiry check",
    "body": "The refresh token loop was triggered because we checked token.exp < Date.now() instead of token.exp < Date.now() / 1000. Unix timestamps are in seconds, not milliseconds.",
    "files": ["src/auth/refresh.ts"]
  },
  {
    "kind": "decision",
    "title": "Use pnpm workspaces — npm hoisting breaks shared types",
    "body": "Switched from npm to pnpm because npm's hoisting puts shared type packages in the wrong node_modules scope, breaking type inference across packages.",
    "files": ["package.json", "pnpm-workspace.yaml"]
  }
]

Next session, when you ask "why are we using pnpm?" or touch src/auth/refresh.ts, Claude finds these memories and already has the context.
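The bugfix memory above comes down to a unit mismatch. A minimal sketch of the corrected check, assuming a hypothetical helper name (`isTokenExpired`) that is not part of Somtum or of the example project:

```typescript
// Hypothetical helper illustrating the fix described in the bugfix memory.
// `expSeconds` is the JWT "exp" claim: seconds since the Unix epoch.
function isTokenExpired(expSeconds: number, nowMs: number = Date.now()): boolean {
  // Date.now() returns milliseconds, so convert to seconds before comparing.
  const nowSeconds = Math.floor(nowMs / 1000);
  return expSeconds < nowSeconds;
}

// The buggy comparison, kept for contrast: a seconds value is always far
// smaller than a milliseconds value, so every token looks expired and the
// client keeps refreshing.
function isTokenExpiredBuggy(expSeconds: number, nowMs: number = Date.now()): boolean {
  return expSeconds < nowMs;
}
```

Passing `nowMs` explicitly keeps the check deterministic in tests; production callers can rely on the default.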

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Claude Code / Agent                    │
└──────────┬──────────────────────────────┬───────────────────┘
           │ hooks                        │ MCP tools
           ▼                              ▼
┌─────────────────────┐         ┌──────────────────────────┐
│  Hooks              │         │   MCP Tools              │
│                     │         │                          │
│  UserPromptSubmit ──┼─cache──▶│ cache_lookup             │
│                   ──┼─inject─▶│ recall / get             │
│  SessionEnd ────────┼─capture▶│ remember / update        │
│  PreCompact ────────┼─warmst─▶│ forget                   │
│  PreToolUse (Read) ─┼─gate───▶│ stats                    │
│                     │         │ report_false_hit          │
└──────────┬──────────┘         └────────────┬─────────────┘
           │                                 │
           ▼                                 ▼
┌─────────────────────────────────────────────────────────────┐
│                      Core (TypeScript)                      │
│                                                             │
│  ┌──────────────┐  ┌─────────────────┐  ┌───────────────┐  │
│  │ PromptCache  │  │  MemoryStore    │  │   Retriever   │  │
│  │              │  │                 │  │               │  │
│  │ exact hash   │  │ observations    │  │ bm25(default) │  │
│  │ fuzzy embed  │  │ scope: project  │  │ embeddings    │  │
│  │ fingerprint  │  │         global  │  │ index         │  │
│  │ false_hits   │  │       workspace │  │ hybrid        │  │
│  └──────────────┘  │ last_confirmed  │  └───────────────┘  │
│                    └─────────────────┘                      │
└─────────────────────────────────┬───────────────────────────┘
                                  │
                                  ▼
                    ┌─────────────────────────────┐
                    │  SQLite WAL + ~/.somtum/     │
                    │  /projects/<hash>/           │
                    │    db.sqlite                 │
                    │    index.md                  │
                    │    memories/YYYY-MM/<ulid>.md│
                    │  /session/lh_<id>.json       │
                    │  /warmstart/ws_<id>.json     │
                    └─────────────────────────────┘
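The PromptCache box lists an exact match and a fuzzy, embedding-based match. The sketch below shows one way such a two-stage lookup could work. It is an illustration under assumptions, not Somtum's implementation: the `CacheEntry` shape, the `lookup` function, and the use of a normalized string comparison in place of a real hash are all hypothetical. The 0.92 default mirrors `cache.fuzzy_threshold` from the config reference.

```typescript
interface CacheEntry {
  prompt: string;
  response: string;
  embedding: number[]; // assumed: a precomputed vector for the prompt
}

// Cosine similarity between two vectors; returns 0 for zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Stage 1: exact match on the normalized prompt (stand-in for a hash lookup).
// Stage 2: best cosine match, accepted only at or above the threshold.
function lookup(
  entries: CacheEntry[],
  prompt: string,
  promptEmbedding: number[],
  fuzzyThreshold: number = 0.92,
): { entry: CacheEntry; match: "exact" | "fuzzy" } | null {
  const normalized = prompt.trim().toLowerCase();
  for (const e of entries) {
    if (e.prompt.trim().toLowerCase() === normalized) return { entry: e, match: "exact" };
  }
  let best: CacheEntry | null = null;
  let bestScore = -1;
  for (const e of entries) {
    const score = cosineSimilarity(e.embedding, promptEmbedding);
    if (score > bestScore) {
      best = e;
      bestScore = score;
    }
  }
  return best !== null && bestScore >= fuzzyThreshold ? { entry: best, match: "fuzzy" } : null;
}
```

Raising the threshold trades cache hits for fewer false hits, which is the signal `report_false_hit` is meant to collect.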

Retrieval strategies

| Strategy | How it works | Best for | Cost |
| --- | --- | --- | --- |
| bm25 | Keyword search over title + body + tags (SQLite FTS5, no external dependencies) | Exact terms, offline setups | Near-zero |
| embeddings | Semantic similarity using a 30 MB local model (bge-small-en-v1.5, runs fully in-process) | "What did we decide about auth?" style queries | ~5 ms at 10k memories |
| index | Sends a compact memory catalog to Haiku; the model picks relevant IDs | Paraphrased or fuzzy queries | 1 Haiku API call |
| hybrid | BM25 + embeddings results merged and re-ranked by Haiku | General case (best recall) | BM25 + embeddings + 1 Haiku call |

Default is bm25 — works offline, no setup. Enable hybrid once you have embeddings downloaded.

Caution: Setting strategy=hybrid without enabling embeddings causes a silent fallback to BM25 while paying hybrid overhead. Run somtum doctor — if it shows strategy=hybrid alongside embeddings: disabled, fix it:

somtum config set retrieval.strategy bm25       # match what's actually running
# or, to use real hybrid:
somtum config set retrieval.embeddings.enabled true && somtum reindex
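The mismatch described in the caution can be expressed as a small check. This is a sketch under assumptions: `effectiveStrategy` and the `RetrievalConfig` shape are hypothetical names modelled on the config keys, and only the hybrid fallback documented above is handled.

```typescript
type Strategy = "bm25" | "embeddings" | "index" | "hybrid";

interface RetrievalConfig {
  strategy: Strategy;
  embeddings: { enabled: boolean };
}

// Hypothetical helper: reports which strategy actually runs and flags the
// one mismatch the caution describes (hybrid requested, embeddings disabled).
function effectiveStrategy(cfg: RetrievalConfig): { strategy: Strategy; warning?: string } {
  if (cfg.strategy === "hybrid" && !cfg.embeddings.enabled) {
    return {
      strategy: "bm25",
      warning: "strategy=hybrid but embeddings: disabled; falling back to bm25",
    };
  }
  return { strategy: cfg.strategy };
}
```

A check like this surfaces the fallback explicitly, so the configured strategy and the one actually running never drift apart unnoticed.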

Requirements

  • Node 20+
  • Claude Code — Somtum hooks into Claude Code's SessionEnd, UserPromptSubmit, and PreToolUse events
  • ANTHROPIC_API_KEY (optional) — if set, Somtum uses the Anthropic API directly for extraction. If not set, Somtum falls back to the claude CLI that ships with Claude Code, so no separate API key is required for Claude Code subscribers.

Install

npm install -g somtum

pnpm users: pnpm add -g somtum works if you have run pnpm setup first (sets PNPM_HOME). If you haven't, use npm above.

yarn users: yarn global add is not supported in Yarn v2+ (Berry). Use npm above.

From source

git clone https://github.com/riz007/somtum
cd somtum
pnpm install
pnpm build
pnpm link --global

Native module note

Somtum uses better-sqlite3, which contains a native C++ addon. On most platforms (macOS, Linux x64/arm64, Windows x64) a prebuilt binary is downloaded automatically. On Alpine Linux / musl or unusual architectures, the addon compiles from source — python, make, and gcc must be available. If the install fails with a node-gyp error, install those build tools and retry.


Quickstart

Before you begin — pick your setup

Using Claude Code (subscription)? You do not need an API key. Somtum calls the claude CLI that ships with Claude Code. Confirm it is installed by running which claude. If nothing prints, reinstall Claude Code or add its binary to your PATH.

Using the Anthropic API directly? Add your key to your shell profile (not just the current terminal tab):

# Add to ~/.zshrc or ~/.bashrc
export ANTHROPIC_API_KEY="sk-ant-..."
# Then reload the profile in the current terminal (run this; do not add it to the profile)
source ~/.zshrc

Both paths work. Most Claude Code users should use the subscription path (Option A in Step 1).


Step 1 — Choose your extraction backend

Somtum needs to call a Claude model at session end to extract observations. Pick one:

Option A: Claude Code subscription (no extra setup)

If you already have Claude Code installed, you're done. Somtum calls claude --print automatically when no API key is present. Skip to Step 2.

Option B: Direct Anthropic API key (optional — faster, lets you pick the model)

Add to ~/.zshrc (or ~/.bashrc):

export ANTHROPIC_API_KEY="sk-ant-..."

Then reload:

source ~/.zshrc

The key must be in your shell profile, not just exported in an open terminal tab. The SessionEnd hook inherits the environment of the shell that started Claude Code — not the current terminal.

Step 2 — Install inside a Claude Code project

Run this from the root of the project you work on with Claude Code:

somtum init

To enable all features at once (recommended):

somtum init --all
# Installs:
#   - SessionEnd capture hook     (memory extraction)
#   - UserPromptSubmit cache hook (prompt cache lookup)
#   - PreToolUse file-gating hook (large file summarization)
#   - MCP server in .mcp.json     (Claude can call recall/remember tools)

Step 3 — Verify your setup

Run doctor immediately after init to confirm everything is connected:

somtum doctor

All checks should show ✓. The two most important are:

  • api_key — confirms Somtum has a way to call Claude for extraction
  • hooks_installed — confirms the SessionEnd hook is registered

Do not start your first session until all checks pass. Each failing check shows an inline fix command.

Step 4 — Use Claude Code normally

Open a Claude Code session from the same directory where you ran somtum init. Work as you normally would. When the session ends, the hook extracts observations automatically in the background (capped at 90 seconds).

Step 5 — Check your memory

# How many observations were captured?
somtum stats

# Search memory
somtum search "auth jwt rotation"
somtum search "why we use pnpm" --strategy hybrid

# Open the visual dashboard
somtum serve

If somtum stats shows memories 0 after a session, see Troubleshooting.


Verifying the setup

After your first Claude Code session ends:

1. Check the hook log

cat ~/.somtum/hook.log

A successful run:

2026-04-30T10:15:42.123Z [post_session] starting
2026-04-30T10:15:44.891Z [post_session] ok — inserted=4 superseded=1 cache=2 summaries=1

Using the claude CLI fallback (no API key):

2026-04-30T10:15:42.123Z [post_session] starting
2026-04-30T10:15:42.124Z [post_session] ANTHROPIC_API_KEY not set — will use claude CLI fallback
2026-04-30T10:15:44.891Z [post_session] ok — inserted=4 superseded=1 cache=2 summaries=1

Neither backend available:

2026-04-30T10:15:42.123Z [post_session] ERROR: Neither ANTHROPIC_API_KEY nor the claude CLI is available.

2. Check stats

somtum stats

You should see memories > 0 after a substantive session. Short or trivial sessions (no decisions, no bug fixes) correctly return 0 — the extractor only keeps durable observations.

3. Run doctor

somtum doctor

All checks should show ✓. The api_key and hooks_installed checks are the two most commonly failing.


Dashboard

somtum serve
# Opens http://localhost:3000

The dashboard has four views:

  • Memory browser — searchable, filterable list of all captured observations. Switch between BM25, hybrid, and embeddings strategies live. Click any memory to see its full body, files, and tags.
  • Knowledge graph — nodes are memories, edges connect memories that share files or tags. Click a node to open it in the detail panel.
  • Analytics — kind breakdown, cache hit rate, retrieval strategy usage, top-referenced files.
  • Forget button — soft-delete any memory directly from the browser.

| Flag | Default | Description |
| --- | --- | --- |
| --port <n> | 3000 | Listen on a custom port |
| --no-open | | Start server without opening the browser |

Press Ctrl-C to stop.


CLI Reference

Setup

| Command | Description |
| --- | --- |
| somtum init | Install the SessionEnd capture hook |
| somtum init --cache | Also install the UserPromptSubmit cache + auto-inject hook |
| somtum init --file-gating | Also install the PreToolUse file-gating hook |
| somtum init --all | Install all hooks + MCP server |
| somtum init --force | Reinstall even if hooks already present |
| somtum doctor | Check DB health, migrations, hooks, API key, breakeven ratio, stale memories |

Memory

| Command | Description |
| --- | --- |
| somtum list | List stored memories (most recent first, superseded hidden by default) |
| somtum list --kind decision | Filter by kind: decision, learning, bugfix, command, or file_summary |
| somtum list --limit 20 | Limit to 20 results |
| somtum list --json | Machine-readable JSON output |
| somtum list --show-superseded | Include memories that have been superseded by a newer duplicate |
| somtum search <query> | Search observations (default: bm25 strategy) |
| somtum search <query> --strategy hybrid | Force a specific retrieval strategy |
| somtum search <query> -k 16 | Return more results |
| somtum show <id> | Print the full body of an observation |
| somtum remember | Manually store an observation |
| somtum forget <id> | Soft-delete an observation by id |
| somtum forget --all | Soft-delete all observations in the current project |
| somtum edit <id> | Open an observation body in $EDITOR |
| somtum rebuild | Regenerate index.md from all observations |
| somtum reindex | Recompute embeddings (after enabling embeddings or changing model) |
| somtum suggest-claude-md | Suggest CLAUDE.md additions from accumulated observations (interactive) |
| somtum suggest-claude-md --dry-run | Preview suggestions without writing |
| somtum suggest-claude-md --yes --limit 20 | Auto-confirm, limit to top 20 by tokens saved |

Stats & Visibility

| Command | Description |
| --- | --- |
| somtum stats | Tokens saved, cache hit rate, retrieval breakdown |
| somtum stats --json | Machine-readable JSON output |
| somtum serve | Open the visual dashboard in the browser |
| somtum serve --port <n> | Use a custom port (default 3000) |
| somtum serve --no-open | Start server without opening the browser |

Data Management

| Command | Description |
| --- | --- |
| somtum export | Export observations to stdout as JSON |
| somtum export --format jsonl --output obs.jsonl | Export as JSONL file |
| somtum export --format markdown | Export as readable Markdown |
| somtum export --include-deleted | Include soft-deleted entries |
| somtum import <file> | Import observations from JSON or JSONL |
| somtum purge --older-than 30d | Hard-delete soft-deleted entries older than 30 days |
| somtum purge --older-than 30d --dry-run | Preview without deleting |
| somtum reset | Permanently wipe all memories for the current project (asks to confirm) |
| somtum reset --yes | Skip confirmation (useful in CI or scripts) |

Configuration

| Command | Description |
| --- | --- |
| somtum config get | Print the full resolved config |
| somtum config get retrieval.strategy | Read a single key (dot-separated) |
| somtum config set retrieval.strategy hybrid | Write to .somtum/config.json |
| somtum config set retrieval.embeddings.enabled true --global | Write to ~/.somtum/config.json |

Sync

| Command | Description |
| --- | --- |
| somtum sync status | Compare local vs remote observation count |
| somtum sync push | Export and scp observations to remote |
| somtum sync pull | scp from remote and merge into local DB |

Set your remote: somtum config set sync.remote "user@host:/path/.somtum/projects/<id>"

Somtum uses hostname-aware syncing — merging observations from multiple machines without data loss.


MCP Server

When you run somtum init --all, Somtum registers an MCP server that Claude can call autonomously during a session:

| Tool | What Claude does with it |
| --- | --- |
| recall | Searches memories when unsure about a project detail. Merges project + global DB results. Accepts strategy |
| get | Retrieves full observation bodies by ID. Checks global DB if not found in project DB. Bumps last_confirmed_at |
| remember | Stores an observation manually. scope='global' routes to ~/.somtum/global.db; returns stored_in field |
| update | Updates an existing observation's title, body, tags, or files. Redaction applied |
| cache_lookup | Checks the prompt cache directly |
| report_false_hit | Reports that a cached response didn't answer the question (tunes fuzzy threshold data) |
| forget | Soft-deletes an observation |
| stats | Reports tokens saved, cache hit rate, false-hit count, corpus size, and global_memories count |

Every MCP response includes a tokens field so Claude can account for retrieval cost.

Memory scope

Observations now carry a scope field:

| Scope | Meaning | Use it when |
| --- | --- | --- |
| project | Default. Visible only in this project. | Most decisions, bugfixes, and learnings. |
| workspace | Shared across projects via the recall MCP tool. | Team conventions, preferred libraries, global rules. |
| global | Same as workspace; reserved for personal preferences that span all your projects. | Your personal coding preferences. |

# Store a workspace-scoped observation from within a session:
remember("Always use pnpm for Node projects", body="...", scope="workspace")

Storage Layout

~/.somtum/
├── config.json                         ← global config (merged with project config)
├── hook.log                            ← timestamped log of every hook execution
├── global.db                           ← global-scope memories (scope='global'), queried
│                                         alongside every project DB on recall + auto-inject
├── session/
│   └── lh_<id>.json                    ← last cache-hit state per project (false-hit detection)
│                                         files older than 24 h are evicted automatically
├── warmstart/
│   └── ws_<id>_<timestamp>.json        ← warm-start context written after PreCompact (30 min TTL)
│                                         timestamped so concurrent windows don't clobber each other
└── projects/
    └── <project_id>/
        ├── db.sqlite                   ← source of truth (SQLite WAL)
        ├── index.md                    ← human-readable mirror (regenerated)
        └── memories/
            └── YYYY-MM/
                └── <ulid>.md           ← per-observation markdown files

The project ID is derived from the git remote URL (or directory path if no remote). The same project maps to the same ID across machines as long as the remote URL matches.
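To make the derivation rule concrete, here is a rough sketch of how a stable ID could be computed from the remote URL, with the directory path as fallback. Everything in it is an assumption for illustration: the `projectId` and `fnv1a` names are hypothetical, and the 32-bit FNV-1a hash is only a dependency-free stand-in, since the source does not say which hash Somtum uses.

```typescript
// Stand-in hash (32-bit FNV-1a over UTF-16 code units), rendered as 8 hex chars.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit int
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical derivation: prefer the git remote URL, else the directory path.
function projectId(remoteUrl: string | null, directoryPath: string): string {
  const key = remoteUrl !== null && remoteUrl.trim() !== "" ? remoteUrl.trim() : directoryPath;
  return fnv1a(key);
}
```

Because the remote URL takes priority, two clones of the same repository in different directories resolve to the same ID, while projects without a remote are distinguished by path.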

SQLite is the source of truth. Edit observations with somtum edit <id>, not by hand.


Configuration

Global config lives at ~/.somtum/config.json. Per-project config at .somtum/config.json overrides it (deep merge).
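A deep merge means nested keys are combined instead of whole sections being replaced. The following is a minimal sketch of that behaviour, not Somtum's actual merge code; `deepMerge` is a hypothetical name, and replacing arrays wholesale (instead of concatenating them) is an assumption.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(v: Json | undefined): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Keys in `override` win; nested objects are merged recursively.
// Arrays and scalars are replaced as a whole (assumed behaviour).
function deepMerge(base: JsonObject, override: JsonObject): JsonObject {
  const out: JsonObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = out[key];
    out[key] = isObject(existing) && isObject(value) ? deepMerge(existing, value) : value;
  }
  return out;
}

// Global config enables embeddings; the project only changes the strategy.
const globalConfig: JsonObject = {
  retrieval: { strategy: "bm25", k: 8, embeddings: { enabled: true } },
};
const projectConfig: JsonObject = { retrieval: { strategy: "hybrid" } };
const resolved = deepMerge(globalConfig, projectConfig);
```

Here `resolved.retrieval` keeps `k` and `embeddings` from the global file and takes only `strategy` from the project file.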

Most common settings

# Enable semantic (embedding-based) search — downloads a 30 MB model once
somtum config set retrieval.embeddings.enabled true
somtum reindex

# Switch to hybrid retrieval (BM25 + embeddings + rerank) for best recall
somtum config set retrieval.strategy hybrid

# Use LLM-based retrieval (no embeddings required, costs one Haiku call per query)
somtum config set retrieval.index.enabled true
somtum config set retrieval.strategy index

# Disable file-gating (on by default — intercepts large file reads and serves cached summary)
somtum config set file_gating.enabled false

# Limit observations extracted per session (default: 10)
somtum config set extraction.max_observations_per_session 5

# Control automatic memory injection on every prompt (default: on)
somtum config set injection.enabled false          # turn off auto-inject
somtum config set injection.k 5                    # inject more memories (default: 3)
somtum config set injection.max_chars 3000         # raise injection size cap (default: 1500)

Full config reference

{
  "extraction": {
    "model": "claude-haiku-4-5-20251001",
    "trigger": ["SessionEnd", "PreCompact"],
    "max_observations_per_session": 10,
  },
  "cache": {
    "enabled": true,
    "fuzzy_match": true,
    "fuzzy_threshold": 0.92, // raise to 0.95 once you have false-hit signal
    "max_entries": 10000,
    "ttl_days": 90,
  },
  "retrieval": {
    "strategy": "bm25", // bm25 | embeddings | index | hybrid
    "k": 8,
    "rerank_model": "claude-haiku-4-5-20251001",
    "bm25": { "enabled": true },
    "embeddings": {
      "enabled": false, // set true to download the 30 MB ONNX model
      "model": "Xenova/bge-small-en-v1.5",
    },
    "index": {
      "enabled": false, // set true to use Haiku as the retriever
      "model": "claude-haiku-4-5-20251001",
    },
  },
  // Auto-inject: BM25-retrieved memories prepended to every UserPromptSubmit.
  // Uses the hot path (< 2 ms at 1k memories). Disable if you prefer pull-only.
  "injection": {
    "enabled": true,
    "k": 3,                   // max memories injected per prompt
    "max_chars": 1500,        // hard cap on injected text
    "min_relevance_score": 0, // raise (e.g. 1.0) to only inject high-scoring matches
    "show_budget": true,      // prepend "[somtum] injected N/M memories (~X tokens)" line
  },
  "file_gating": {
    "enabled": true,          // intercepts large file reads; serves cached summary instead
    "min_file_size_tokens": 300,
    "exclude_globs": ["**/*.env", "**/secrets/**"],
  },
  "privacy": {
    "telemetry": false,
    "redact_patterns": [
      "api[_-]?key\\s*[:=]\\s*[\"']?[A-Za-z0-9_\\-]{8,}[\"']?",
      "bearer\\s+[A-Za-z0-9_\\-.]+",
      "sk-[A-Za-z0-9_\\-]{20,}",
      "xox[baprs]-[A-Za-z0-9-]{10,}",
      "AKIA[0-9A-Z]{16}",
    ],
  },
  "sync": {
    "enabled": false,
    "backend": "ssh",
    "remote": null, // e.g. "user@host:/home/user/.somtum/projects/<id>"
  },
}
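The three injection settings interact: k caps the count, max_chars caps the size, and min_relevance_score filters weak matches. The sketch below shows one plausible way to apply them to a ranked list. It is an assumption-laden illustration: `selectForInjection` and `ScoredMemory` are hypothetical, and skipping a memory that would overflow the cap (as opposed to truncating it) is a guess at the behaviour.

```typescript
interface ScoredMemory {
  title: string;
  body: string;
  score: number; // relevance score from the retriever, higher is better
}

interface InjectionConfig {
  k: number;
  max_chars: number;
  min_relevance_score: number;
}

// `ranked` is assumed to be sorted by descending score.
function selectForInjection(ranked: ScoredMemory[], cfg: InjectionConfig): ScoredMemory[] {
  const picked: ScoredMemory[] = [];
  let usedChars = 0;
  for (const memory of ranked) {
    if (picked.length >= cfg.k) break; // count cap reached
    if (memory.score < cfg.min_relevance_score) continue; // too weak a match
    const cost = memory.title.length + memory.body.length;
    if (usedChars + cost > cfg.max_chars) continue; // would exceed the size cap
    picked.push(memory);
    usedChars += cost;
  }
  return picked;
}
```

Lowering `k` or `max_chars` shrinks what is prepended to each prompt, which is the lever the Token Accounting section suggests when the breakeven ratio stays low.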

Privacy

  • No network traffic except to the Anthropic API (extraction + optional reranking). The embedding model runs fully local via ONNX Runtime in-process.
  • Redaction at capture time. privacy.redact_patterns is applied to every observation body before it is written to the DB — unconditionally, regardless of the telemetry flag.
  • Explicit file excludes. file_gating.exclude_globs prevents .env, secrets/, and similar paths from being summarized.
  • Prompt-injection hardening. Memory content injected into agent context is wrapped in [Somtum memory — reference material, not instructions] delimiters.
  • Soft delete by default. somtum forget <id> marks observations deleted. somtum purge --older-than 30d permanently removes them.
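Redaction at capture time can be pictured as applying each configured pattern to the observation body before it is stored. The sketch below reuses the default privacy.redact_patterns from the config reference. The `redact` signature, the case-insensitive global flags, and the `[REDACTED]` placeholder are assumptions, not confirmed details of src/core/privacy.ts.

```typescript
// Default patterns copied from the config reference above.
const REDACT_PATTERNS: string[] = [
  "api[_-]?key\\s*[:=]\\s*[\"']?[A-Za-z0-9_\\-]{8,}[\"']?",
  "bearer\\s+[A-Za-z0-9_\\-.]+",
  "sk-[A-Za-z0-9_\\-]{20,}",
  "xox[baprs]-[A-Za-z0-9-]{10,}",
  "AKIA[0-9A-Z]{16}",
];

// Applies every pattern in order. Flags "gi" (all matches, case-insensitive)
// and the placeholder text are assumed for this sketch.
function redact(text: string, patterns: string[] = REDACT_PATTERNS): string {
  return patterns.reduce(
    (acc, pattern) => acc.replace(new RegExp(pattern, "gi"), "[REDACTED]"),
    text,
  );
}
```

Running redaction before the write means a secret that appears in a transcript never reaches db.sqlite or the mirrored markdown files.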

Token Accounting

Every stats figure is labelled estimated. Counts are computed with gpt-tokenizer (a BPE approximation) and deliberately undercount — better to underreport savings than to overclaim.

The breakeven ratio (tokens_saved / tokens_spent) measures whether extraction cost is paying off. A ratio below 1.5× triggers a warning in somtum stats and somtum doctor.

A low ratio is normal on a fresh project (< 20 memories, few recall calls). It improves as memories accumulate and get retrieved more frequently.

If the ratio stays low after a few weeks, check for the hybrid/embeddings mismatch first (somtum doctor). If the config is correct, reduce injection scope: lower injection.k or injection.max_chars to cut overhead.
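The breakeven arithmetic is simple enough to sketch directly. The function name and return shape are hypothetical; the formula and the 1.5 threshold come from the text above. Returning null when nothing has been spent is an assumption made to avoid dividing by zero.

```typescript
const BREAKEVEN_WARN_THRESHOLD = 1.5;

// ratio = tokens_saved / tokens_spent; warn when the ratio is below 1.5.
function breakeven(
  tokensSaved: number,
  tokensSpent: number,
): { ratio: number | null; warn: boolean } {
  if (tokensSpent <= 0) {
    // Nothing spent yet, so the ratio is undefined (assumed handling).
    return { ratio: null, warn: false };
  }
  const ratio = tokensSaved / tokensSpent;
  return { ratio, warn: ratio < BREAKEVEN_WARN_THRESHOLD };
}
```

For example, 1,200 estimated tokens saved against 1,000 spent gives a ratio of 1.2 and a warning, while 3,000 saved against 1,000 spent gives 3.0 and no warning.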


Performance

| Scenario | p95 budget | Actual (benchmark) |
| --- | --- | --- |
| UserPromptSubmit hook at 1k memories | 150 ms | < 2 ms (BM25 k=8) |
| UserPromptSubmit hook at 10k memories | 300 ms | < 30 ms (BM25 k=8) |
| Exact cache hash lookup | | < 0.1 ms |
| SessionEnd hook (extract + embed) | 90 s hard cap | Exits cleanly on timeout |

Run benchmarks yourself:

pnpm test:bench

Development

pnpm install
pnpm typecheck        # strict TypeScript check
pnpm test             # vitest unit + golden tests
pnpm test:golden      # retrieval recall@k per strategy
pnpm test:bench       # hot-path latency benchmarks
pnpm lint             # eslint
pnpm fmt              # prettier
pnpm build            # tsc + copy migrations + copy dashboard → dist/

Project layout

src/
  cli/
    index.ts          # commander CLI entry point
    init.ts           # somtum init — installs hooks + MCP config
    serve.ts          # somtum serve — local dashboard server
    stats.ts          # somtum stats
    doctor.ts         # somtum doctor — health checks
    hook.ts           # internal: dispatches hook events by name
    search.ts / show.ts / forget.ts / edit.ts
    list.ts               # somtum list
    reset.ts              # somtum reset — wipe project DB
    export.ts / import.ts / purge.ts / sync.ts / rebuild.ts / reindex.ts
    config_cmd.ts
    suggest_claude_md.ts  # somtum suggest-claude-md
  core/
    db.ts             # SQLite setup, migration runner
    store.ts          # MemoryStore — CRUD for observations
    cache.ts          # PromptCache — exact + fuzzy lookup
    retriever/        # bm25, embeddings, hybrid, index, factory
    extractor.ts      # session transcript → observations (Claude Haiku)
    index_gen.ts      # renders index.md (incremental past 1k obs)
    memory_files.ts   # writes memories/<YYYY-MM>/<ulid>.md
    retrieval_stats.ts
    embeddings.ts     # Embedder interface + encode/decode utils
    privacy.ts        # redact() — runs on every capture
    tokens.ts         # gpt-tokenizer wrapper
  hooks/
    post_session.ts   # SessionEnd/PreCompact: extract → store → index → warm-start
    pre_prompt.ts     # UserPromptSubmit: cache lookup + auto-inject + false-hit detection
    pre_read.ts       # PreToolUse: file gating
  mcp/               # MCP server + tool implementations
  dashboard/
    index.html        # single-page dashboard (served by somtum serve)
  config.ts          # global + project config merge
  index.ts           # public API for embedding Somtum
src/db/migrations/   # NNN_name.sql migration files
test/
  golden/            # per-strategy retrieval golden sets
  bench/             # hot-path latency benchmarks
  fixtures/          # synthetic session transcripts

Adding a new observation kind

  1. Extend the zod enum in src/core/schema.ts
  2. Update the extractor prompt in src/core/extractor.ts
  3. Add a fixture in test/fixtures/ and an assertion
  4. Update src/core/index_gen.ts to render the new section

Adding a new MCP tool

  1. Define args + response with zod in src/mcp/tools.ts
  2. Register it in src/mcp/server.ts
  3. Response must include a tokens field
  4. Add an integration test in src/mcp/server.test.ts

Troubleshooting

somtum stats shows memories 0 after a session

Check the hook log first:

cat ~/.somtum/hook.log

claude CLI not found and no ANTHROPIC_API_KEY set

  • If you use Claude Code: run which claude — if nothing prints, reinstall Claude Code or add its binary to your PATH.
  • If you prefer the direct API: add export ANTHROPIC_API_KEY="sk-ant-..." to ~/.zshrc and source ~/.zshrc. Must be in your profile, not just exported in the current terminal tab.

Run somtum doctor — the api_key check tells you exactly which backend is available.

Hook not installed in the right directory

somtum init writes the hook to .claude/settings.json in the directory where you ran it. If you launch Claude Code from a different directory, it reads a different settings file.

Fix: run somtum init from the same directory you use to launch Claude Code.

cd ~/my-project
somtum init
claude   # must be launched from ~/my-project

Short or trivial session

If the session had no decisions, bug fixes, or learnings (e.g. you just asked Claude to say hello), the extractor correctly returns 0 observations.


somtum serve opens the browser but shows "Connection refused"

This was a bug fixed in v1.1.0. Upgrade:

npm install -g somtum@latest

If you installed from source, rebuild:

pnpm build

somtum serve — port already in use

somtum serve --port 3001

Agent appears to keep running after session ends

The SessionEnd hook has a hard 90-second timeout. If sessions appear stuck, verify you are on v1.1.0+:

somtum --version
tail -20 ~/.somtum/hook.log

Installation fails (node-gyp / better-sqlite3)

Ensure build tools are installed:

  • macOS: xcode-select --install
  • Ubuntu/Debian: sudo apt-get install build-essential python3
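  • Windows: install the Visual Studio Build Tools with the "Desktop development with C++" workload, plus Python 3 (the windows-build-tools npm package is deprecated and does not work with current Node versions)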

Embeddings are slow or the model won't download

The first somtum reindex downloads a ~30 MB ONNX model from Hugging Face. This requires internet access and may be slow. Subsequent runs use the cached model.

On an air-gapped machine or if you prefer not to use embeddings:

somtum config set retrieval.embeddings.enabled false
somtum config set retrieval.strategy bm25

BM25 works fully offline and is fast at any corpus size.


Claude doesn't seem to have context from previous sessions

Auto-inject is the first thing to check. Since v1.3.0, Somtum automatically injects top-k memories into every UserPromptSubmit via the cache hook — no manual recall step needed.

  1. Confirm the cache hook is installed: somtum doctor → look for hooks_installed ✓
  2. If not installed: somtum init --cache (or somtum init --all)
  3. Confirm injection is enabled: somtum config get injection.enabled → should be true
  4. Check that memories actually exist: somtum stats → memories > 0

Using the MCP server (somtum init --all), Claude can also call recall directly when uncertain. If it's not happening:

  1. Confirm .mcp.json exists: cat .mcp.json
  2. Restart Claude Code to pick up the MCP config

Stale memory warning in somtum doctor

doctor warns when memories are older than 90 days with no confirmed retrievals. These are observations that have never come up in a search. Options:

# Review them before deciding
somtum search "old topic"

# Promote useful ones to workspace scope via MCP
remember("...", scope="workspace")

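# Remove irrelevant ones: soft-delete them first, because purge only
# hard-deletes entries that are already soft-deleted
somtum forget <id>
somtum purge --older-than 90d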
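The stale rule described above (older than 90 days, never confirmed by a retrieval) can be sketched as a predicate. The `MemoryMeta` shape and the `created_at` field name are assumptions; `last_confirmed_at` is the field the get tool is documented to bump.

```typescript
interface MemoryMeta {
  id: string;
  created_at: string; // ISO timestamp (assumed field name)
  last_confirmed_at: string | null; // null when no retrieval has confirmed it
}

const STALE_AFTER_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// A memory is stale when it is older than 90 days and was never confirmed.
function isStale(memory: MemoryMeta, now: Date): boolean {
  const ageDays = (now.getTime() - Date.parse(memory.created_at)) / MS_PER_DAY;
  return ageDays > STALE_AFTER_DAYS && memory.last_confirmed_at === null;
}
```

Filtering a memory list with this predicate yields the candidates to review, promote, or forget.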

Starting fresh — wiping all memories

To hard-reset a project's memory (irreversible):

somtum reset
# Permanently delete all memories for this project? [y/N] y
# somtum: reset complete — project <id> wiped.

To just clear everything softly (recoverable via somtum export --include-deleted):

somtum forget --all

Contributing

Contributions are welcome! See CONTRIBUTING.md for the guide.

Important: This project uses changesets for versioning. Every PR must include a changeset file generated by running pnpm changeset.


License

Licensed under the MIT License.