Releases: Mibayy/token-savior
v4.0.0 — single 'optimized' profile, 97.9% @ −80% tokens
What's new
v4.0 simplifies Token Savior to a single recommended profile that just works.
pip install "token-savior-recall[mcp]"
{
"mcpServers": {
"token-savior-recall": {
"command": "/path/to/venv/bin/token-savior",
"env": {
"TOKEN_SAVIOR_PROFILE": "optimized",
"WORKSPACE_ROOTS": "/path/to/project"
}
}
}
}
That's it. `optimized` bundles `tiny_plus` + thin schemas + capture-disabled.
Bench results
| Metric | Plain Opus 4.7 | Token Savior v4.0 |
|---|---|---|
| Score | 78.3% | 97.9% (188/192) |
| Active tokens / task | 17 221 | 3 395 (−80%) |
| Wall time / task | 110.6 s | 18.9 s (−83%) |
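The percentages in the table can be re-derived from the raw figures. The snippet below is only a sanity check; the helper name `pct_change` is an illustrative assumption and not part of Token Savior.

```python
def pct_change(baseline: float, new: float) -> float:
    """Relative change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0


# Raw figures copied from the table above.
tokens = pct_change(17221, 3395)   # active tokens per task
wall = pct_change(110.6, 18.9)     # wall time per task, in seconds
score = 188 / 192 * 100.0          # tasks passed out of 192

print(f"tokens {tokens:.0f}%, wall {wall:.0f}%, score {score:.1f}%")
```

Rounded to the table's precision, this gives −80% tokens, −83% wall time, and a 97.9% score.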
Honest correction vs v3.5
v3.5 framed the ts CLI as the headline ("drop the MCP"). The Run D mini-bench showed CLI mode is actually +337% tokens vs MCP on Claude Code — Bash wrapping + no protocol-native compact stub eats the gain.
The v4.0 pitch is honest: MCP is the win on Claude Code. The CLI is kept as a portability utility for agents that don't speak MCP (Cursor, Aider, scripts), nothing more.
Migration from v3.x
- `TS_PROFILE=tiny_plus` + `TS_THIN_SCHEMAS=1` → simply `TOKEN_SAVIOR_PROFILE=optimized`
- `tiny_plus` / `tiny` / `lean` / `ultra` / `code_mode` still work (legacy aliases)
- No breaking change in the dispatcher itself
PyPI: https://pypi.org/project/token-savior-recall/4.0.0/
Bench: https://github.com/Mibayy/tsbench/blob/main/BENCHMARK-SUMMARY.md
v3.5.0 — CLI release (drop the MCP overhead)
What's new
Token Savior is now a CLI binary drop-in for any AI coding agent. The MCP server still works (legacy path), but the CLI avoids the ~15k tokens / task structural overhead Claude Code injects with an attached MCP server.
pip install token-savior-recall
ts use /path/to/project
ts get my_function
Bench results (tsbench v3.0)
| Metric | Plain Opus 4.7 | Token Savior v3.5.0 |
|---|---|---|
| Score | 78.3% | 97.9% (188/192) |
| Active tokens / task | 17 221 | 3 395 (−80%) |
| Wall time / task | 110.6 s | 18.9 s (−83%) |
vs. v2.9 (Apr 26 record, 192/192): −14% tokens, −29% wall time, −2% score.
Key changes
- NEW `ts` CLI binary + daemon mode (Unix socket, 10× faster per call vs. cold fork)
- NEW `TS_THIN_SCHEMAS=1` strips the sub-property descriptions from each inputSchema (−44% manifest)
- FIX 5/6 memory hooks now respect `TS_MEMORY_DISABLE=1` (was leaking ~10k tokens / task)
- PERF mcp.* imports lazy → cold start 1.5 s → 685 ms (−55%)
- NEW systemd service `scripts/ts-daemon.service`
- NEW `ts daemon warm` preloads the symbol cache for a project
- DOC README CLI-first, MCP setup moved to Legacy section
Backward compat
The MCP server is unchanged. An existing Claude Code mcp-config still works and benefits from the same internal optimizations. Set `TS_PROFILE=tiny_plus`, `TS_THIN_SCHEMAS=1`, `TS_CAPTURE_DISABLED=1`, and `TS_MEMORY_DISABLE=1` in your env to get the v3 numbers.
Install
pip install token-savior-recall # CLI mode (recommended)
pip install "token-savior-recall[mcp]" # CLI + MCP legacy
Full README: https://github.com/Mibayy/token-savior#readme
Benchmark methodology : https://github.com/Mibayy/tsbench/blob/main/BENCHMARK-SUMMARY.md
v3.4.0 — auto profile
One profile to rule them all
The new auto profile sizes its manifest from your actual telemetry instead of asking you to pick a hand-tuned static subset.
How it works
- Essentials (always exposed): `switch_project`, `list_projects`, `get_git_status`, `ts_search`, `ts_execute`. ~1 KT.
- Hot core: top-K tools (default K=10) ranked by your persisted `tool_call_counts`. Tune K with `TS_AUTO_HOT_K`.
- Cold start: falls back to the `tiny_plus` baseline so the first session is still usable before any telemetry exists.
Resulting manifest: ~15-18 tools / ~2-3 KT, converging to your personal usage after a few sessions.
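The selection logic described above can be sketched roughly as follows. This is a minimal illustration, not the shipped implementation: the function name `select_auto_manifest`, the shape of the telemetry mapping, and the contents of the fallback list are all assumptions.

```python
from collections import Counter

# Always-exposed tools, as listed in the release notes.
ESSENTIALS = ["switch_project", "list_projects", "get_git_status",
              "ts_search", "ts_execute"]


def select_auto_manifest(tool_call_counts, fallback_tools, hot_k=10):
    """Return the essentials plus the top-K most-called tools.

    tool_call_counts: mapping of tool name -> persisted call count (assumed shape).
    fallback_tools: baseline list used when no telemetry exists (cold start).
    """
    if not tool_call_counts:
        hot = list(fallback_tools)
    else:
        ranked = Counter(tool_call_counts).most_common()
        hot = [name for name, _ in ranked if name not in ESSENTIALS][:hot_k]
    # Keep order stable and drop duplicates.
    manifest = []
    for name in ESSENTIALS + hot:
        if name not in manifest:
            manifest.append(name)
    return manifest
```

With telemetry present the hot core follows call counts; with an empty mapping the fallback list is used, mirroring the cold-start behaviour.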
Use it
export TOKEN_SAVIOR_PROFILE=auto
Profile cleanup
- `core`, `nav`, `lean`, `ultra`, `tiny`, `tiny_plus` are now deprecated. Setting any of them prints a stderr notice pointing to `auto`. They will be removed in v4.0.0.
- `full` stays as a first-class debug / power-user profile.
- `code_mode` keeps its own slot: it's an execution-mode switch (JS sandbox), orthogonal to manifest size.
Numbers
- Tests: 1476 passed, 2 skipped (+7 new auto-profile cases)
- 8 profiles → effective 3 (`auto`, `full`, `code_mode`)
- Manifest with telemetry data: ~2-3 KT vs full ~9 KT
Install
pip install -U token-savior-recall
v3.3.0 — warm worker pool + 6-axis quality pass
Highlights
⚡ Warm worker pool for ts_execute — 31× faster
Long-lived Node worker handles all ts_execute calls in sequence with isolated vm.createContext() per script. Cold spawn 67.96 ms/call → warm reuse 2.20 ms/call.
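The shipped worker is a Node process using `vm.createContext()`. As a language-neutral illustration of the same idea (one long-lived worker, jobs handled in sequence, a fresh scope per script), here is a small Python analogy. The class name `WarmWorker` and the `result` convention are assumptions made for this sketch, and `exec` is not a security sandbox.

```python
import queue
import threading


class WarmWorker:
    """One long-lived worker thread that runs scripts one at a time."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            job = self._jobs.get()
            if job is None:          # shutdown sentinel
                break
            source, reply = job
            scope = {}               # fresh namespace per script, nothing leaks between calls
            try:
                exec(source, scope)
                reply.put(("ok", scope.get("result")))
            except Exception as exc:
                reply.put(("error", repr(exc)))

    def run(self, source):
        """Submit a script and block until the worker answers."""
        reply = queue.Queue(maxsize=1)
        self._jobs.put((source, reply))
        return reply.get()

    def close(self):
        self._jobs.put(None)
        self._thread.join()
```

The startup cost is paid once in `__init__`; every `run` call reuses the same worker. For the quoted figures, 67.96 / 2.20 ≈ 30.9, which rounds to the headline 31×.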
🧹 Six quality improvements
- Test isolation: latency.py writes redirected to per-session temp DB so pytest never pollutes prod memory.db. Fixed 2 memory_viewer FTS flakes.
- Quieter logs: HuggingFace `UserWarning` from embeddings.py silenced.
- Node 24 CI: actions/checkout@v5 + setup-python@v6, which clears the deprecation banner.
- README: full Code Mode section with discovery flow example.
- set_project_root guard: idempotent on already-registered projects (audit showed 32% of calls did redundant reindex). New `force=true` flag preserves rebuild path.
- Process hardening: scripts/preflight.sh (ruff + pytest) mandatory before push.
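The `set_project_root` guard amounts to an idempotent registration with an explicit escape hatch. A minimal sketch of that pattern, assuming a hypothetical `ProjectRegistry` class and an injected indexer callable (neither is taken from the codebase):

```python
class ProjectRegistry:
    """Tracks indexed project roots and skips redundant reindexing."""

    def __init__(self, indexer):
        self._indexer = indexer   # callable that builds an index for a root
        self._indexes = {}

    def set_project_root(self, root, force=False):
        """Index root once; later calls are no-ops unless force is set."""
        if root in self._indexes and not force:
            return False          # already registered, nothing rebuilt
        self._indexes[root] = self._indexer(root)
        return True
```

Returning a boolean makes it easy for a caller to log how often a rebuild was actually needed.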
Numbers
- ts_execute latency: 31× faster warm vs cold
- Tools advertised (full): 68
- Tests: 1469 passed, 2 skipped
Install
pip install -U token-savior-recall
v3.2.0 — Code Mode complete
Code Mode is now fully operational
v3.1.0 shipped the sandbox + ts_execute. v3.2.0 makes the discovery flow + manifest savings work end-to-end.
🎨 Typed facade auto-generated from MCP inputSchema
Every tool's TypeScript signature is now derived from its real inputSchema. Required fields are non-optional, optional fields use ?, enums become string-literal unions, arrays use Array<T>. Before:
find_symbol: (args?: Record<string, unknown>) => Promise<unknown>;
After:
find_symbol: (args?: { name?: string; names?: Array<string>; level?: number; hints?: boolean; project?: string }) => Promise<unknown>;
📦 New `code_mode` profile (4 tools, ~1.5 KT manifest)
`TOKEN_SAVIOR_PROFILE=code_mode` exposes only `ts_execute`, `ts_search`, `switch_project`, `list_projects`. −83% manifest tokens vs full. Discovery on demand via ts_search.
🔎 `ts_search(format='ts')`
ts_search now accepts `format='ts'` which returns one-line TS signatures instead of full JSONSchema — roughly half the tokens per match. Auto-selected when profile=code_mode.
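The schema-to-signature mapping used for the typed facade and for `format='ts'` (required fields non-optional, optional fields with `?`, enums as string-literal unions, arrays as `Array<T>`) can be sketched in a few lines. The helper names `ts_type` and `ts_signature` are hypothetical, and making `args` itself optional only when no field is required is an assumption of this sketch.

```python
def ts_type(schema):
    """Map a JSON Schema fragment to a TypeScript type string."""
    if "enum" in schema:
        return " | ".join("'" + str(v) + "'" for v in schema["enum"])
    kind = schema.get("type")
    if kind == "array":
        inner = ts_type(schema.get("items", {}))
        return "Array<" + inner + ">"
    primitives = {"string": "string", "integer": "number",
                  "number": "number", "boolean": "boolean"}
    return primitives.get(kind, "unknown")


def ts_signature(name, input_schema):
    """Render one facade line for a tool from its inputSchema."""
    required = set(input_schema.get("required", []))
    fields = []
    for prop, sub in input_schema.get("properties", {}).items():
        marker = "" if prop in required else "?"
        fields.append(prop + marker + ": " + ts_type(sub))
    args_marker = "" if required else "?"   # assumption: args optional only if nothing is required
    body = "; ".join(fields)
    return name + ": (args" + args_marker + ": { " + body + " }) => Promise<unknown>;"
```

Fed a schema with the five optional `find_symbol` properties, this yields the same one-line signature as the "After" example above.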
Suggested flow under code_mode
```
- ts_search(query='find symbol then deps', format='ts')
  → returns find_symbol + get_dependents signatures
- ts_execute(script=`
const sym = await tools.find_symbol({ name: 'foo' });
const deps = await tools.get_dependents({ name: sym.symbol });
return { sym, deps };
`)
→ 1 round-trip, full result returned
```
Numbers
- Tools advertised (full profile): 68
- Tests: 1464 passed, 2 skipped (+5 since v3.1.0)
- Manifest math: full ~9 KT, lean ~7 KT, tiny_plus ~2.5 KT, code_mode ~1.5 KT
Install
```bash
pip install -U token-savior-recall
```
v3.1.0 — Code Mode + Windows fix + optim pass
Highlights
🚀 Code Mode (new!)
A single new tool ts_execute(script, timeout_ms) runs a JS function body in a Node sandbox with a typed 34-tool facade. Chain await tools.find_symbol(...) → await tools.get_function_source(...) → await tools.get_dependents(...) in one round-trip instead of three. Adapted from Cloudflare's Code Mode for MCP pattern.
- Cold subprocess: ~50 ms · 3-tool chain: ~50 ms total
- Disabled with `TS_CODE_MODE_DISABLE=1`
- Allowlist: 34 navigation + graph + edit + git + audit tools
🪟 Windows MCP unblocked (#27)
Every subprocess.run in production code now passes stdin=subprocess.DEVNULL, fixing the IOCP ProactorEventLoop deadlock on Python 3.14 Windows. Credit to @arkancrow for the root-cause analysis.
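The fix boils down to never letting a child process inherit the server's stdin. A minimal sketch of such a call, using only the standard library; the wrapper name `run_quiet` is an illustrative assumption:

```python
import subprocess
import sys


def run_quiet(cmd, timeout=30):
    """Run a child process without letting it inherit the parent's stdin."""
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,   # child sees EOF instead of the parent's stdio pipe
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


if __name__ == "__main__":
    done = run_quiet([sys.executable, "-c", "print('ok')"])
    print(done.returncode, done.stdout.strip())
```

Routing every call through one wrapper like this keeps the `stdin` setting from being forgotten at individual call sites.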
🧰 Operational hardening
- Hooks: 5 sites in `memory-pretooluse.sh` / `memory-userprompt.sh` switched to stdin pipe, eliminating 1620 recurring `json.JSONDecodeError` entries in `hook-errors.log`.
- Sessions table: empty-row guard added to `session_end`. Initial purge cleared 11 582 rows of bench noise (11 612 → 30, DB 8.6 → 6.2 MB).
- Latency tracer: new `tool_latency` SQLite table records per-call `(ts, tool, project, duration_ms, status, error_type)`. Overhead measured at < 1 ms.
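A per-call latency tracer of the kind described can be sketched with `sqlite3` and a context manager. The column names come from the release notes; the column types, the `traced` helper, and the `"ok"` / `"error"` status values are assumptions.

```python
import sqlite3
import time
from contextlib import contextmanager

# Column names follow the release notes; the types are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tool_latency (
    ts REAL NOT NULL,
    tool TEXT NOT NULL,
    project TEXT,
    duration_ms REAL NOT NULL,
    status TEXT NOT NULL,
    error_type TEXT
)
"""


@contextmanager
def traced(conn, tool, project=None):
    """Time the wrapped call and append one row, whether it succeeds or fails."""
    start = time.perf_counter()
    status, error_type = "ok", None
    try:
        yield
    except Exception as exc:
        status, error_type = "error", type(exc).__name__
        raise
    finally:
        duration_ms = (time.perf_counter() - start) * 1000.0
        conn.execute(
            "INSERT INTO tool_latency VALUES (?, ?, ?, ?, ?, ?)",
            (time.time(), tool, project, duration_ms, status, error_type),
        )
        conn.commit()
```

Usage is `with traced(conn, "find_symbol", "my_project"): ...` around the tool call, after running `conn.execute(SCHEMA)` once.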
Numbers
- Tools advertised: 68 (was 67)
- Tests: 1459 passed, 2 skipped (was 1450)
Install
pip install -U token-savior-recall
Or with uvx:
uvx token-savior-recall
v2.6.0 — Memory Engine Phase 1+2 + tsbench 97.8%
tsbench (90 paired tasks · Opus 4.7)
| Metric | Plain Claude | Token Savior | Δ |
|---|---|---|---|
| Score | 120/180 (66.7%) | 176/180 (97.8%) | +31.1pp |
| Active tokens | 1.55M | 821k | −47% |
| Wall time | 166min | 36min | −78% |
| W/T/L | — | 40 / 48 / 2 | |
8 of 11 categories at 100% (audit, bug_fixing, code_generation, code_review, config_infra, documentation, git, refactoring, writing_tests).
Bench-driven fixes
- `CLAUDE_PROJECT_ROOT` env auto-promotes active project at boot
- Explicit `project=` hint auto-promotes active project on first call
- `TS_WARM_START=1` pre-builds index at server start
- `get_full_context` defaults to compact mode (source head 80 lines + names-only deps)
- Empty-result `_suggestion` on `search_codebase` and `get_dependents`
- Lower defaults on noisy analyses (`analyze_config`, `find_dead_code`, `find_semantic_duplicates`)
- `lean` profile (59 tools) confirmed as bench default
Memory Engine (Phase 1+2)
- `<private>` tag stripper, content_hash dedup O(1), `ts://obs/{id}` citation URIs
- PreToolUse-Read hook → file-context injection
- Structured session-end rollup (FTS5, 6 fields)
- Progressive disclosure formalized (Layer 1/2/3)
- sqlite-vec hybrid search + RRF fusion (graceful FTS fallback)
- Opt-in web viewer at `127.0.0.1:$TS_VIEWER_PORT` (htmx + SSE)
- LLM auto-extraction on PostToolUse (opt-in via `TS_AUTO_EXTRACT=1`)
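Reciprocal Rank Fusion (RRF), named in the hybrid-search item above, merges several ranked lists by summing 1 / (k + rank) per document. The sketch below shows the general technique only; the function name `rrf_fuse` and the constant k = 60 (the value commonly used in the literature) are assumptions, not values confirmed for Token Savior.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    fts_hits = ["a", "b", "c"]      # keyword ranking (hypothetical ids)
    vector_hits = ["b", "d", "a"]   # embedding ranking (hypothetical ids)
    print(rrf_fuse([fts_hits, vector_hits]))
```

Passing a single list returns it unchanged, which is one simple way to express a fallback to FTS-only results when vector search is unavailable.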
Stats
- Tools: 105 (lean profile: 59)
- Tests: 1318/1318 ✅
- Vector search: `sqlite-vec` + `sentence-transformers/all-MiniLM-L6-v2`
Install
pip install token-savior-recall==2.6.0
Full changelog: CHANGELOG.md
v2.1.1 — Token Savior Recall
Highlights
75 MCP tools · 891/891 tests · 97% token reduction · Persistent memory across sessions
This is the recommended release of the v2.1.x line. v2.1.0 was tagged but never published as a Release — use v2.1.1 instead (CI fixes + private assets removed, no functional change).
Phase 2 — Advanced Context Engine
- Program slicing via backward Data Dependency Graph (Python AST)
  → `get_backward_slice(name, variable, line)` returns the minimal instructions affecting a variable at a given line. 92% token reduction on debug workflows.
- Knapsack context packing (greedy fractional, Dantzig 1957)
  → `pack_context(query, budget_tokens)` returns the optimal symbol bundle within a fixed token budget.
- PageRank / RWR on the dependency graph (Tong, Faloutsos, Pan 2006)
  → `get_relevance_cluster(name, budget)` ranks symbols by mathematical relevance, capturing indirect dependencies BFS misses.
- Markov predictive prefetching with daemon warm cache
  → After each tool call, the next most-likely tool is pre-computed in background. 77.8% prediction accuracy on common chains.
- Proof-carrying edits via static-analysis EditSafety certificate
  → `verify_edit` and `apply_symbol_change_and_validate` now attach a non-blocking cert: signature preservation, exception unchanged, side-effect unchanged.
- Semantic AST hash (alpha-conversion + docstring stripping)
  → `find_semantic_duplicates` detects functions equivalent modulo variable renaming. Falls back to text hash on syntax errors.
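To make the last item concrete, here is one way a semantic hash with alpha-conversion and docstring stripping could look, using the standard `ast` module. It is a simplified sketch, not the project's code: the name `semantic_hash` is hypothetical, only arguments and assigned names are renamed, and normalizing the function name to a placeholder is an extra assumption.

```python
import ast
import hashlib


def semantic_hash(source: str) -> str:
    """Hash code so that renaming local variables does not change the digest."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Fall back to a plain text hash, as the release notes describe.
        return hashlib.sha256(source.encode()).hexdigest()

    aliases = {}

    def alias(name):
        # First-seen bound name becomes v0, the next v1, and so on.
        return aliases.setdefault(name, "v" + str(len(aliases)))

    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.name = "f"                      # ignore the function's own name
            if ast.get_docstring(node) is not None:
                node.body = node.body[1:] or [ast.Pass()]
        elif isinstance(node, ast.arg):
            alias(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            alias(node.id)

    # Second pass: rewrite every occurrence of a bound name; globals stay as-is.
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            node.arg = aliases[node.arg]
        elif isinstance(node, ast.Name) and node.id in aliases:
            node.id = aliases[node.id]

    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()
```

Two functions that differ only in parameter and local names (and in their docstrings) get the same digest, while a change of operator or of a called builtin produces a different one.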
Phase 1 — Core Optimizations
- Symbol-level content hashing — 19x reindex speedup on targeted edits
- 2-level semantic hash (signature + body) — precise breaking change detection
- Conversation Symbol Cache (CSC) — 93% token savings on re-accessed symbols
- Lattice of Abstractions L0→L3 — 94-97% compression vs full source
Memory Engine
- 16 memory tools, 8 lifecycle hooks, 12 observation types
- LRU scoring: `0.4 × recency + 0.3 × access + 0.3 × type_priority`
- Delta injection at SessionStart — only changed observations are re-injected (50-70% savings vs full refresh)
- Auto-promotion (note × 5 → convention, warning × 5 → guardrail)
- Contradiction detection, auto-linking, TTL per type
- Mode system (`code` / `review` / `debug` / `infra` / `silent`) with auto-detection
- CLI `ts memory {status,list,search,get,save,top,why,doctor,relink}`
- Telegram feed for critical observations (optional)
- Markdown export + git versioning
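The scoring formula in the list above is a plain weighted sum. A small sketch of how it could drive eviction, assuming each component is already normalized to the range 0..1; the `Observation` dataclass and the `evict_candidates` helper are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    name: str
    recency: float        # assumed normalized: 1.0 = just touched, 0.0 = oldest
    access: float         # assumed normalized access frequency
    type_priority: float  # assumed normalized priority of the observation type


def retention_score(obs: Observation) -> float:
    """Weighted sum from the release notes: 0.4 recency + 0.3 access + 0.3 type priority."""
    return 0.4 * obs.recency + 0.3 * obs.access + 0.3 * obs.type_priority


def evict_candidates(observations, keep):
    """Return the lowest-scoring observations beyond the first `keep`."""
    ranked = sorted(observations, key=retention_score, reverse=True)
    return ranked[keep:]
```

Because recency carries the largest single weight, a fresh but rarely used note can still outrank an old one with moderate access and priority.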
Refactor
- `_build_line_offsets` extracted to `models.py` as shared helper (15 annotators dedup'd)
Manifest optimization
- 80 → 75 tools (-6%), 42K → 36K chars (-14%), ~1500 tokens/session saved
v2.1.1 patch (vs v2.1.0)
- Resolve 6 ruff errors (5 E731 lambda assignments inlined in `_warm_cache_async`, 1 F401 unused import)
- Remove `assets/` directory from repo (private marketing assets accidentally committed) and add to `.gitignore`
- No functional change vs v2.1.0
Install
```bash
uvx token-savior-recall
```
Or development install: see README.
🤖 Generated with Claude Code
v1.0.0 — Production Stable
Token Savior v1.0.0 -- Release Notes
25 files changed, 217 symbols affected. 865 tests passing.
Architecture overhaul
The server has been restructured from a monolithic 2,400-line server.py into focused modules:
- `tool_schemas.py` -- all 53 MCP tool schemas extracted (server.py reduced to 1,002 lines)
- `cache_ops.py` -- `CacheManager` class for persistent JSON cache (save, load, legacy migration)
- `slot_manager.py` -- `SlotManager` + `_ProjectSlot` for multi-project lifecycle
- `brace_matcher.py` -- shared `find_brace_end_*` for C, C#, Rust, Go annotators
- `query_api.py` -- `ProjectQueryEngine` class (22 methods + `as_dict()`) replaces 705-line closure
Performance
- LazyLines: file content lazy-loaded from disk on demand instead of stored in cache. Cache size reduced ~57%, idle RAM reduced proportionally.
- Manual serialization: `CacheManager.index_to_dict()` does zero-copy field-by-field serialization instead of `dataclasses.asdict()`.
- scandir batching: `_check_mtime_changes` uses `os.scandir()` per directory.
- Regex cache: module-level `_WORD_BOUNDARY_CACHE` avoids recompiling patterns.
- File limits: `ProjectIndexer` gains `max_files` param (env: `TOKEN_SAVIOR_MAX_FILES`, default 10,000).
Bug fixes
- Path traversal: `create_checkpoint` validates paths with `os.path.commonpath`.
- Triple save: `_dirty` flag pattern ensures `_save_cache` is called at most once per execution path.
- Output truncation: `get_dependents` and `get_change_impact` gained `max_total_chars` (default 50,000).
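A path-traversal check built on `os.path.commonpath` typically resolves both paths and confirms the common prefix is the project root. A minimal sketch of that check; the helper name `is_within_root` is hypothetical and the real `create_checkpoint` validation may differ in detail.

```python
import os


def is_within_root(root, candidate):
    """True if candidate resolves to a location inside root."""
    root_real = os.path.realpath(root)
    cand_real = os.path.realpath(os.path.join(root_real, candidate))
    try:
        return os.path.commonpath([root_real, cand_real]) == root_real
    except ValueError:
        # Raised, for example, when the paths are on different Windows drives.
        return False
```

Resolving with `realpath` first matters: a plain string prefix test would accept `../` sequences and sibling directories whose names merely start with the root's name.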
Tool fusions
- `get_changed_symbols`: now accepts optional `ref` parameter (replaces `get_changed_symbols_since_ref`)
- `apply_symbol_change_and_validate`: now accepts `rollback_on_failure` parameter (replaces `apply_symbol_change_validate_with_rollback`)
Deprecated (removal in v1.1.0)
| Deprecated tool | Use instead |
|---|---|
| `get_changed_symbols_since_ref` | `get_changed_symbols(ref=...)` |
| `apply_symbol_change_validate_with_rollback` | `apply_symbol_change_and_validate(rollback_on_failure=true)` |
Both inject a `_deprecated` field in their response with migration instructions. Their schemas are marked `"deprecated": true`.
Annotator refactoring
- `annotate_rust`: 6 extracted handlers (`_handle_rust_impl`, `_handle_rust_macro`, `_handle_rust_struct`, `_handle_rust_enum`, `_handle_rust_trait`, `_handle_rust_fn`). Complexity dropped from 211 to under 150.
- `annotate_csharp`: 8 extracted handlers (`_handle_csharp_namespace`, `_handle_csharp_type`, `_extract_type_methods`, etc.). Complexity dropped from 201 to under 150.
- `AnnotatorProtocol`: `typing.Protocol` + `runtime_checkable` for annotator type safety.
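For readers unfamiliar with the `typing.Protocol` + `runtime_checkable` combination, the sketch below shows the general pattern. The protocol name `Annotator`, its `annotate` method, and the `LineCounter` class are hypothetical; the actual `AnnotatorProtocol` signature is not given in these notes.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Annotator(Protocol):
    """Structural type: anything with a matching annotate() method qualifies."""

    def annotate(self, source: str) -> dict: ...


class LineCounter:
    """Toy annotator that never inherits from Annotator."""

    def annotate(self, source: str) -> dict:
        return {"lines": len(source.splitlines())}


def register(annotator):
    # isinstance() only checks that the method exists, not its signature.
    if not isinstance(annotator, Annotator):
        raise TypeError("object does not implement annotate()")
    return annotator
```

Static checkers verify the full signature, while the runtime `isinstance` check gives a cheap guard at registration time.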
New modules
| Module | Purpose |
|---|---|
| `src/token_savior/tool_schemas.py` | 53 tool schemas + DEPRECATED_TOOLS set |
| `src/token_savior/cache_ops.py` | CacheManager (save/load/migrate) |
| `src/token_savior/slot_manager.py` | SlotManager + _ProjectSlot |
| `src/token_savior/brace_matcher.py` | Per-language brace matching |
Test coverage
| Suite | Tests |
|---|---|
| `test_cache_ops.py` | 12 |
| `test_slot_manager.py` | 13 |
| `test_server_integration.py` | 5 |
| `test_annotator_protocol.py` | 4 |
| `test_tool_schemas.py` | 9 |
| Total project | 865 |
Benchmarks
- `benchmarks/run_benchmarks.py`: automated benchmarks on FastAPI + CPython measuring index time, RAM, query response time, and cache size.
- `.github/workflows/benchmark.yml`: GitHub Action for release benchmarks.
v0.9.0 — Initial release
First public release of Token Savior — structural code intelligence MCP server.
What's included
- Symbol-level code navigation (get_function_source, get_class_source, find_symbol)
- Dependency graphs (get_dependencies, get_dependents, get_change_impact)
- Git-aware structural diff (get_changed_symbols, build_commit_summary)
- Dead code detection and hotspot analysis
- Multi-project support with sub-millisecond queries
- 87% token reduction on large codebases