Reasonix is opinionated, not general. Every abstraction is justified by a DeepSeek-specific behavior or economic property. If it's generic, we don't ship it.
The product north star: coding agent that stays cheap enough to leave on. A tool that quietly burns $200/month on a background project is one nobody uses. Every subsystem below is answerable to that goal.
Problem. DeepSeek bills cached input at ~10% of the miss rate. Automatic prefix caching activates only when the exact byte prefix of the previous request matches. Most agent loops reorder, rewrite, or inject fresh timestamps each turn — cache hit rate in practice: <20%.
Solution. Partition the context into three regions:
┌─────────────────────────────────────────┐
│ IMMUTABLE PREFIX │ ← fixed for session
│ system + tool_specs + few_shots │ cache hit candidate
├─────────────────────────────────────────┤
│ APPEND-ONLY LOG │ ← grows monotonically
│ [assistant₁][tool₁][assistant₂]... │ preserves prefix of prior turns
├─────────────────────────────────────────┤
│ VOLATILE SCRATCH │ ← reset each turn
│ R1 thought, transient plan state │ never sent upstream
└─────────────────────────────────────────┘
Invariants:
- Prefix is computed once per session, hashed, and pinned.
- Log entries are serialized in append order; no rewrites.
- Scratch is distilled via Pillar 2 before any information from it is folded into the log.
Metric. prompt_cache_hit_tokens / (hit + miss) exposed per-turn and
aggregated per-session. Visible in the TUI's top-bar cache cell.
Each tool declares parallelSafe?: boolean (default false). The loop
dispatcher groups consecutive parallel-safe calls into chunks and races
them via Promise.allSettled; the first non-parallel-safe call ends the
chunk and runs alone (serial barrier — read-after-write order
preserved). Tool-result yields and history append still land in declared
order regardless of which call settles first, so the model sees the
same shape it would under a fully serial dispatch.
| Env var | Default | Effect |
|---|---|---|
REASONIX_PARALLEL_MAX |
3 (hard cap 16) |
Max chunk size. |
REASONIX_TOOL_DISPATCH=serial |
unset | Forces serial dispatch — escape hatch. |
Built-in opt-ins: read-only filesystem (read_file, list_directory,
directory_tree, search_files, search_content, get_file_info),
web (web_search, web_fetch), recall_memory, semantic_search,
isolated child loops (run_skill, spawn_subagent), in-memory job
queries (job_output, list_jobs). Mutating / side-effecting tools
stay default. MCP-bridged tools default false — third-party tools
opt in only when the server explicitly declares parallel safety.
Problem. Empirical DeepSeek failure modes:
- Tool-call JSON emitted inside
<think>, missing from the final message. - Arguments dropped when schema has >10 params or deeply nested objects.
- Same tool called repeatedly with identical args (call-storm).
- Truncated JSON due to
max_tokenshit mid-structure.
Solution. Four passes:
flatten— schemas with >10 leaf params or depth >2 are auto-detected onToolRegistry.register()and presented to the model in dot-notation form.dispatch()re-nests the args before calling the user'sfn.scavenge— regex + JSON parser sweepsreasoning_contentfor any tool call the model forgot to emit intool_calls.truncation— detect unbalanced JSON and repair by closing braces or requesting a continuation completion.storm— identical(tool, args)tuple within a sliding window → suppress the call, inject a reflection turn.
Problem. Coding agents that default to the frontier model (v4-pro, ~12× flash cost) and accumulate full tool results in context are $150-$250/month for active users. Most turns don't need frontier reasoning; most sessions re-pay for tool results that were only useful once.
Solution. Four complementary mechanisms, none of which require manual tuning in the common case:
The three presets trade model tier and reasoning effort:
| Preset | Model | Effort | Cost |
|---|---|---|---|
flash |
v4-flash |
max |
1× |
auto (default) |
v4-flash → v4-pro on hard turns |
max |
1–3× |
pro |
v4-pro |
max |
~12× |
All auxiliary calls — forceSummaryAfterIterLimit, subagent spawns,
truncation repair retries — hard-code v4-flash + effort=high regardless
of the user's preset. There's no reason to pay pro rates for "paraphrase
these tool results into prose" or for an explore subagent's grep chain.
Every tool result in the log exceeding TURN_END_RESULT_CAP_TOKENS (3000)
is shrunk to that cap when a turn ends. The model had the full text for
the turn that read it; subsequent turns see a compact summary and can
re-read if needed. One extra read_file call is vastly cheaper than
dragging 12KB through every future prompt.
A proactive 40% context-ratio threshold runs the same shrink pre-emptively inside long multi-iter turns before the 80% emergency threshold fires.
Users switch between flash and pro via /model flash or /model pro
(persistent — applies to every turn until changed). Model can also be
set in .reasonix/settings.json under the model key. No one-shot
arming; no forgotten revert risk when switching is explicit and sticky.
History. Pre-0.50.0,
/prooffered single-turn arming — type/pro, the next turn ran on v4-pro then auto-disarmed. Removed in 0.50.0 (#1657, #1630) when presets were collapsed into direct model selection.
The model itself decides when a task exceeds its current tier. If a task
clearly needs stronger reasoning, the model emits a <<<NEEDS_PRO>>>
marker as the first line of its response. The system aborts the current
flash call and retries the turn on pro. Two forms:
<<<NEEDS_PRO>>>— bare marker, no rationale.<<<NEEDS_PRO: <reason>>>>— includes a one-sentence rationale the user sees in a warning row.
On the pro tier, the marker is a no-op — pro is the top, so the contract tells the model it can't escalate further. This is purely self-report: there is no failure-counter threshold, no scavenge/storm counting, no automatic escalation based on tool errors.
Per-turn and session cost are colored in the StatsPanel:
turn $0.003— green <$0.05, yellow $0.05–0.20, red ≥$0.20session $0.12— same scale ×10
src/
├── client.ts # DeepSeek client (fetch + SSE)
├── loop.ts # Pillar 1 + 3 — CacheFirstLoop
├── repair/ # Pillar 2 pipeline
│ ├── index.ts
│ ├── scavenge.ts
│ ├── flatten.ts
│ ├── truncation.ts
│ └── storm.ts
├── prompt-fragments.ts # TUI_FORMATTING_RULES, NEGATIVE_CLAIM_RULE —
│ # reused by main + subagent + skill prompts
├── code/prompt.ts # reasonix code main system prompt
├── tools/ # Tool implementations
│ ├── filesystem.ts # read / list / search / edit / write
│ ├── shell.ts # run_command + run_background (JobRegistry)
│ ├── jobs.ts # background-process registry
│ ├── memory.ts # remember / forget / list user memories
│ ├── skills.ts # list + invoke SKILL.md playbooks
│ ├── subagent.ts # spawn_subagent — flash+high by default
│ ├── plan.ts # submit_plan (review gate)
│ └── web.ts # web_search, web_fetch (multi-engine: Bing, Baidu, SearXNG, Metaso and API engines)
├── mcp/ # MCP client + bridge (stdio + SSE)
├── memory.ts # ImmutablePrefix / AppendOnlyLog / VolatileScratch
├── project-memory.ts # REASONIX.md loader
├── user-memory.ts # ~/.reasonix/memory/ store (project + global)
├── skills.ts # built-in explore + research skills
├── session.ts # JSONL session persistence
├── telemetry.ts # cost + cache-hit accounting + SessionSummary
├── tokenizer.ts # DeepSeek V3 tokenizer (ported)
├── usage.ts # ~/.reasonix/usage.jsonl roll-up
├── types.ts # ChatMessage, ToolCall, ToolSpec
├── index.ts # library barrel
└── cli/
├── index.ts # commander entry
├── resolve.ts # config + CLI flag precedence
├── commands/ # chat, code, run, stats, sessions, ...
└── ui/
├── App.tsx # root Ink component (~1984 LOC, was 2931)
├── LiveRows.tsx # spinner rows (OngoingTool / Status / ...)
├── EventLog.tsx # Historical row rendering
├── StatsPanel.tsx # top bar + cost badges
├── PromptInput.tsx # cursor-aware multi-line input
├── PlanConfirm.tsx # submit_plan review modal
├── ShellConfirm.tsx # run_command approval modal
├── EditConfirm.tsx # per-edit review modal
├── markdown.tsx # Ink-native markdown renderer
├── edit-history.ts # EditHistoryEntry + formatters
├── useEditHistory.ts # /undo, /history, /show state machine
├── useCompletionPickers.ts # slash, @, slash-arg pickers
├── useSessionInfo.ts # balance + models + updates fetch
├── useSubagent.ts # subagent sink wiring
└── slash/ # /-command implementation
├── types.ts # SlashContext, SlashResult, ...
├── commands.ts # SLASH_COMMANDS data + parse + suggest
├── helpers.ts # git, memory, token formatters
├── dispatch.ts # registry + handleSlash lookup
└── handlers/ # per-topic: basic, mcp, memory,
# skill, admin, observability, edits,
# jobs, sessions, model (/pro lives here)
Files kept small by design: the largest module under cli/ui/ is 2K
lines (App.tsx), every handler under slash/handlers/ is ≤200 lines,
every hook under cli/ui/ is ≤310 lines. Adding a new slash command
means editing one handler file and one registry line.
- v0.0.x — Pillar 1 end-to-end, repair pipeline complete, Ink TUI scaffold.
- v0.1 — τ-bench numbers published, streaming polish, transcript replay.
- v0.3 — MCP client (stdio + SSE), session persistence.
- v0.4.x —
reasonix codewith SEARCH/REPLACE edits, review/auto gate, background jobs, hooks. - v0.5.x — V4 model support, skills, memory, subagents, actionable error messages.
- v0.6 —
- Cost control (flash-first defaults, auto-compaction,
/proone-shot, failure-triggered escalation, cost badges). deepseek-chat/deepseek-reasonerscheduled for deprecation — all user-facing surfaces updated tov4-flash/v4-pro.- Shared prompt fragments (
TUI_FORMATTING_RULES,NEGATIVE_CLAIM_RULE). - UI refactor: App.tsx split into 6 hooks/components, slash.ts split into 13 per-topic modules.
- Cost control (flash-first defaults, auto-compaction,
- v0.31 (current) —
branch+harvestfeatures removed entirely (the parallel-sample selector and Pillar 2 plan-state extractor); both rarely paid for themselves and bloated the slash surface.
- Multi-agent orchestration as a first-class concept (subagents are a cost-reduction mechanism, not a coordination primitive).
- RAG / vector retrieval.
- Support for non-DeepSeek backends (an OpenAI-compatible shim would
work today via
--modeloverride, but is not tested). - Web UI / SaaS.
- Automatic cost escalation without user-visible announcement. Every pro-tier model call is surfaced; silent escalation was considered and rejected.