A Claude Code skill that adds a write spec, placement routing, and a hard index budget
on top of built-in auto-memory — so recall still works at file 170, not just file 15.
Claude Code's auto-memory (v2.1.59+) gets the mechanics right: plain markdown in ~/.claude/projects/<slug>/memory/, a MEMORY.md index loaded at session start, topic files read on demand. What it doesn't give you is discipline. After a few months of real use, three failure modes show up:
- Fragmentation. The same topic accretes across five near-duplicate files. Claude updates none of them — or the wrong one — and every session re-learns what a previous session already knew.
- Vague descriptions. Native recall has no semantic search. Discovery is literally "scan the index lines, pick by description, Read the file". An entry indexed as "Supabase issues" will never be found again.
- Silent truncation. Only the first 200 lines or 25,000 characters of MEMORY.md load per session — whichever comes first. The rest is dropped with no error you'd notice. Your newest entries, usually appended at the bottom, quietly stop existing. It looks like Claude got dumber; it's actually an index overflow.
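To make the truncation failure concrete, here is a minimal sketch that simulates which index lines would survive the load. It assumes the two limits behave exactly as described above (a 200-line cap and a 25,000-character cap, with the tighter one winning); `loaded_prefix` and `dropped_lines` are hypothetical helper names, not part of Claude Code or this skill.

```python
MAX_LINES = 200      # line limit quoted in the text above
MAX_CHARS = 25_000   # character limit quoted in the text above


def loaded_prefix(index_text: str, max_lines: int = MAX_LINES,
                  max_chars: int = MAX_CHARS) -> str:
    """Return the part of the index that survives both limits."""
    # Keep at most max_lines lines, then at most max_chars characters.
    # Applying both caps models "whichever comes first" (an assumption
    # about the runtime, based only on the description above).
    first_lines = index_text.splitlines(keepends=True)[:max_lines]
    return "".join(first_lines)[:max_chars]


def dropped_lines(index_text: str) -> list[str]:
    """Return index lines that are cut off or missing after truncation."""
    kept_len = len(loaded_prefix(index_text))
    dropped, offset = [], 0
    for line in index_text.splitlines(keepends=True):
        offset += len(line)
        if offset > kept_len:  # this line ends beyond the loaded prefix
            dropped.append(line.rstrip("\r\n"))
    return dropped


if __name__ == "__main__":
    # Synthetic index with 250 short entries: the line cap bites first.
    sample = "".join(f"- entry {i}\n" for i in range(250))
    lost = dropped_lines(sample)
    print(len(lost), "entries would not load; first lost:", lost[0])
```

Running something like this against a real MEMORY.md shows which entries sit past the cut, which are typically the newest ones appended at the bottom.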
This skill treats memory as an engineered retrieval system with a fixed budget, not a journal: a typed write spec, a routing rule for where each fact belongs, and a character-accurate budget for the index — plus a one-command audit that flags drift before it costs you.
Three questions, three mechanisms.
Every entry is one file, one topic: <type>_<topic>.md with required frontmatter (name / description / type).
- feedback: a lesson learned. Must carry a Why section: without the reasoning, the next session re-litigates the same decision and often lands on the same wrong conclusion. The Why is what makes a lesson stick.
- reference: deep topic reference, SOPs, reusable pipelines.
- project: state snapshots (architecture, in-flight work, decision records).
The description is the retrieval key, so it must pack scenario keywords + the conclusion — "browser-side supabase.from() mutation deadlocks after tab switch, use fetch() against the REST API", never "Supabase issues". If you can't imagine the future question that would match it, the description isn't done.
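The write spec can be checked mechanically. The sketch below is one possible validator, not the skill's actual audit script. It assumes frontmatter is a simple `key: value` block between `---` lines, that topics are lowercase snake_case, and that a Why section is a markdown heading starting with "Why"; all three are assumptions for illustration, and `check_entry` is a hypothetical name.

```python
import re

REQUIRED_FIELDS = ("name", "description", "type")
# Assumed naming pattern for <type>_<topic>.md with a snake_case topic.
FILENAME_RE = re.compile(r"^(feedback|reference|project)_[a-z0-9_]+\.md$")
# Assumed shape of a Why section: a markdown heading that starts with "Why".
WHY_RE = re.compile(r"^#+\s*Why\b", re.IGNORECASE | re.MULTILINE)


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse a minimal key: value block delimited by --- lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as missing frontmatter


def check_entry(filename: str, text: str) -> list[str]:
    """Return a list of spec violations for one memory file."""
    problems = []
    if not FILENAME_RE.match(filename):
        problems.append("naming violation")
    fields = parse_frontmatter(text)
    if not fields:
        problems.append("missing frontmatter")
    else:
        for field in REQUIRED_FIELDS:
            if not fields.get(field):
                problems.append(f"missing field: {field}")
    if fields.get("type") == "feedback" and not WHY_RE.search(text):
        problems.append("feedback missing Why")
    return problems
```

A file named `feedback_supabase_mutation_deadlock.md` with all three fields and a Why heading returns an empty list; a file named `Supabase issues.md` fails the naming check before its content is even read.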
Write it where you'll trip over it. Not everything belongs in memory:
| Trigger | Destination | Mechanism |
|---|---|---|
| Staring at one line/block; the WHY fits in a sentence | Inline code comment | 100% hit rate when editing that file, zero index cost |
| A multi-point contract over a known set of files | .claude/rules/*.md + paths: | Auto-injected when matching paths are touched |
| A scenario / error class / cross-file or platform pitfall | Memory (this system) | Recalled via index description |
| A rule for every session (commands, stack, preferences) | CLAUDE.md | Always loaded |
The routing comes with guardrails against over-migration: if removing the specific file still leaves a general lesson, it stays in memory; platform/SDK behavior stays in memory; tombstones and investigation SOPs stay in memory. Empirically (full audit of a 175-file production library), only ~15% of entries bind to a single file, and most of those already existed as code comments. Index bloat comes from weak governance, not from file-local junk.
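The routing table and its guardrails can be read as a small decision function. The sketch below is one interpretation, not the skill's implementation: the `Fact` fields and the precedence order (always-loaded rules first, guardrails second, file-local destinations last) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    """Hypothetical description of something worth recording."""
    applies_every_session: bool = False      # commands, stack, preferences
    bound_to_single_spot: bool = False       # one line/block, WHY fits in a sentence
    contract_over_known_files: bool = False  # multi-point contract over a known file set
    general_lesson_survives: bool = False    # lesson still holds if the file is removed
    platform_or_sdk_behavior: bool = False
    tombstone_or_sop: bool = False           # tombstone or investigation SOP


def route(fact: Fact) -> str:
    """Pick a destination for a fact, following the table above."""
    if fact.applies_every_session:
        return "CLAUDE.md"
    # Guardrails against over-migration: these always stay in memory,
    # even when the fact also looks file-local.
    if (fact.general_lesson_survives
            or fact.platform_or_sdk_behavior
            or fact.tombstone_or_sop):
        return "memory"
    if fact.bound_to_single_spot:
        return "inline code comment"
    if fact.contract_over_known_files:
        return ".claude/rules/*.md"
    # Scenarios, error classes, and cross-file pitfalls land here.
    return "memory"
```

For example, `route(Fact(bound_to_single_spot=True, platform_or_sdk_behavior=True))` returns `"memory"`: the guardrail wins over the file-local destination.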
The 200-line / 25K-character load limit is the hard wall the whole design leans against. The index is budgeted like a cache, not grown like a log:
- Tiered line budget: crown entries (⭐⭐⭐ "read this before touching X") get ≤160 chars and live near the top of MEMORY.md as truncation insurance; normal entries get ≤110 chars; the long tail packs into aggregate lines (topic → [a](f1.md) · [b](f2.md) · [c](f3.md), 5–10 files per line).
- Short link labels: [foo-bar](feedback_foo_bar.md), never the full filename twice. That saves 12–45 characters per line, thousands across a real library.
- Hub pages: a group past ~15 entries gets a hub memory that maps the sub-topics; the index keeps one line.
- String length, not bytes: the official docs phrase the limit as "25KB", but the enforced measure is string length, i.e. characters. The gap matters for CJK text, where each character takes 3 bytes in UTF-8. Verified by locating the actual cut: a 37.9KB / 27,972-character index truncated exactly at the 25,000-character mark, whereas a 25,000-byte cut would have landed ~35% earlier in the file. Calibrate in bytes and you'll truncate while believing you have headroom. (We did. Twice.)
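To see how far the two measures diverge on a given index, a sketch like the following reports both and locates where a byte-based cut would fall in characters. It uses Python's `len`, which counts Unicode code points; whether the runtime counts exactly the same unit for every character is an assumption here. `measure` and `char_offset_of_byte_cut` are hypothetical helper names.

```python
def measure(text: str) -> tuple[int, int]:
    """Return (characters, UTF-8 bytes) for an index string."""
    return len(text), len(text.encode("utf-8"))


def char_offset_of_byte_cut(text: str, byte_limit: int) -> int:
    """Count how many whole characters fit in the first byte_limit UTF-8 bytes."""
    used = 0
    for i, ch in enumerate(text):
        used += len(ch.encode("utf-8"))
        if used > byte_limit:
            return i  # character i is the first one that does not fit
    return len(text)


if __name__ == "__main__":
    # 20 CJK characters (3 bytes each in UTF-8) plus 3 ASCII characters.
    sample = "记忆索引" * 5 + "abc"
    chars, size = measure(sample)
    print(f"{chars} chars, {size} bytes")
    print("a 30-byte cut lands at character", char_offset_of_byte_cut(sample, 30))
```

On a CJK-heavy index the two numbers differ substantially, so a budget should be stated and measured in the same unit the limit is enforced in.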
And when the index still overflows, slim in this order: migrate file-local entries out → delete true tombstones → move crowns to the top → tier the budget → compress descriptions last. Compressing first is the intuitive move and the wrong one — it trades recall quality for characters.
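The aggregate-line and short-label conventions from the tiered budget above could be generated along these lines. This is a sketch under assumptions: the label is derived by dropping the type prefix and the `.md` extension and swapping underscores for hyphens (matching the `[foo-bar](feedback_foo_bar.md)` example), and files are chunked at a fixed count per line. Both helper names are hypothetical, and the code needs Python 3.9+ for `str.removesuffix`.

```python
def short_label(filename: str) -> str:
    """Turn feedback_foo_bar.md into foo-bar (drop type prefix and extension)."""
    stem = filename.removesuffix(".md")
    _type, _, topic = stem.partition("_")
    return (topic or stem).replace("_", "-")


def aggregate_lines(topic: str, filenames: list[str],
                    per_line: int = 10) -> list[str]:
    """Pack long-tail files into index lines of at most per_line links each."""
    lines = []
    for start in range(0, len(filenames), per_line):
        chunk = filenames[start:start + per_line]
        links = " · ".join(f"[{short_label(name)}]({name})" for name in chunk)
        lines.append(f"{topic} → {links}")
    return lines
```

Fixed-size chunking can leave a final line with fewer than five links; grouping by sub-topic instead of by count may read better in a real index.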
Full trade-off notes → references/design-philosophy.md
The built-in system is the substrate — this skill never fights it, it makes Claude write to a spec the substrate rewards:
| | Auto-memory alone | With this skill |
|---|---|---|
| Write spec | None — Claude improvises per session | 3 types, required frontmatter, Why on feedback |
| Decision protection | Conclusions get re-litigated | Why section preserves the reasoning |
| Placement | Everything lands in memory | 4-level routing; file-local facts become code comments |
| Index size | Grows until silent truncation | Budgeted in characters, audited, tiered |
| Duplication | New file per session whim | Update-over-create, frozen groups, hub pages |
| Health check | None | One-command audit: 5 hard checks + 6 soft signals + compliance score |
| At month six | "Claude forgot" | 175-file library at 100% hard-rule compliance |
For semantic retrieval over chunked storage, look at vector-backed tools like Mem0, Letta, or Zep — different problem. The native runtime does no embedding recall, so the leverage isn't in adding vectors; it's in making the index the runtime does read actually work.
These rules weren't designed on a whiteboard. They come from a production library (175 files, CJK-heavy) that hit every failure mode first: an index that silently truncated at 43KB on disk (≈31K chars, well past the 25K limit nobody had measured), a bytes-vs-characters miscalibration that re-broke it a month later, groups past 20 entries with routine recall misses. Under the current ruleset: index at 22.7K/25K chars and 190/200 lines, 100% hard-rule compliance, zero broken links — and a 483-line CLAUDE.md split down to 176 lines with every displaced fact verified findable in the library afterwards.
Sample audit output:
Memory audit · 2026-07-02 · 175 files
Hard checks (must be zero):
missing frontmatter 0
frontmatter fields 0
feedback missing Why 0
naming violations 0
broken MEMORY.md links 0
Soft signals:
oversized files 88
groups over 15 entries 3
untouched 30+ days 67
not in MEMORY.md 0
MEMORY.md size 22765/23000 chars, 190/190 lines OK
index lines >160 chars 35
... (per-item detail listings omitted) ...
Hard-rule compliance: 100.0% (0 violations / 175 files)
Target: 95% or higher
Paste this into any Claude Code session:
Install the claude-memory-manager skill from
https://github.com/jau123/claude-memory-manager
Claude handles the rest. To verify, say "audit memory" in a new session.
Or install manually
git clone https://github.com/jau123/claude-memory-manager.git && \
mkdir -p ~/.claude/skills/memory-management/templates && \
cp claude-memory-manager/SKILL.md ~/.claude/skills/memory-management/ && \
cp claude-memory-manager/templates/audit-memory.template.sh \
~/.claude/skills/memory-management/templates/
Per-project audit script + CLAUDE.md memory protocol → INSTALL.md
The skill activates from natural language. No slash command.
You: "Record today's wildcard bug fix"
→ Claude writes one feedback_*.md entry: filename, frontmatter,
Why section, How-to-apply.
You: "Review the session"
→ Claude walks the recent session, surfaces 3–5 candidates, asks
which to keep.
You: "Audit memory"
→ Runs scripts/audit-memory.sh, reports compliance, lists files
that need splitting.
Full trigger reference → SKILL.md
- Single-project scope. One memory directory per skill instance; no cross-project consolidation.
- No semantic ranking. The audit is pattern matching (grep + filename + frontmatter); it won't catch "two files describe the same concept in different words."
- Bash + standard Unix tools. Tested on macOS bash 3.2 and Linux bash 5.x; Windows / git-bash untested.
- No concurrency safety. Don't run the audit while another session is mid-write.
- Overkill for small libraries. Below ~10 entries or a month of project age, the built-in auto-memory is sufficient and the schema overhead doesn't pay off.
MIT · Issues and PRs welcome at jau123/claude-memory-manager.