Claude Code's local assistant. JARVES offloads file operations, shell execution, and routine LLM tasks from the Claude API, saving tokens on every task that a local model or a zero-LLM path can handle.
The biggest Claude token costs in daily use:
| Task | Without JARVES | With JARVES | Saving |
|---|---|---|---|
| Read a 600-line Python file | ~12,000 tokens | — | — |
| `/outline` that file | — | ~640 tokens returned | ~11,360 tokens |
| `/grep` for one function | — | ~130 tokens returned | ~11,870 tokens |
| `/summarize` a config file | — | ~200 tokens returned | ~11,800 tokens |
| `/run` git log, ls, find | — | 0 tokens | 100% |
| `/write` or `/patch` a file | Edit + Read round-trip | 0 tokens | 100% |
Zero-LLM endpoints (`/grep`, `/outline`, `/tree`, `/exists`, `/write`, `/patch`, `/run`, `/read`) never touch any model — they're pure Python and respond in <50ms.

LLM endpoints (`/summarize`, `/codegen`, `/ask`) route to a local Ollama model. No Claude API call, no cost.
```
Claude Code
  │ HTTP POST (localhost:7860)
  ▼
jarves.py (Flask)
  │
  ├── Zero-LLM path (instant, no model)
  │     /run /read /grep /outline /tree /exists /write /patch
  │
  └── Local-LLM path (Ollama, no cloud)
        /ask /summarize /codegen /batch
```
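Every endpoint is a plain HTTP route on `localhost:7860`, so anything that can POST JSON can drive either path without the helper module. A minimal sketch of a direct call, using the `/grep` parameters from the endpoint tables below; the response key is taken from its Returns column:

```python
# Call a zero-LLM endpoint over raw HTTP (no `j` helper needed).
# Params follow the /grep row of the endpoint table below; the
# {"matches": ...} envelope comes from its Returns column.
import requests

resp = requests.post(
    "http://localhost:7860/grep",
    json={"path": "~/project/app.py", "pattern": "def process", "context": 3},
    timeout=10,
)
print(resp.json()["matches"])
```

The `J` helper in the usage example below wraps these same routes.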
```bash
# https://ollama.com
ollama pull qwen3:4b
ollama create qwen3-4b-jarves -f Modelfile.qwen3-4b

# Semantic memory (optional but recommended)
ollama pull nomic-embed-text

pip install flask requests numpy rich

python jarves.py
# Server at http://localhost:7860
```

```python
import sys; sys.path.insert(0, '/path/to/jarves')
from j import J
# Zero-LLM — instant, no model cost
J.exists("~/project/file.py") # existence check
J.outline("~/project/app.py") # function/class map
J.grep("~/project/app.py", "def process", context=3) # search with context
J.tree("~/project", depth=2) # directory tree
J.write("~/project/config.py", "KEY = 'value'") # write file
J.patch("~/project/config.py", "old_val", "new") # find-and-replace
J.run("git log --oneline -5") # shell command
# Local-LLM — no Claude API tokens
J.summarize("~/project/big_file.py", focus="error handling")
J.codegen("write a function to flatten a nested list")
J.ask("what does this regex do: r'\\d{3}-\\d{4}'")
# Batch multiple ops in one call
J.batch([
("outline", "~/project/app.py"),
("run", "pytest --tb=short"),
("exists", "~/project/.env"),
])
```

**Zero-LLM endpoints**

| Endpoint | Method | Key params | Returns |
|---|---|---|---|
| `/run` | POST | `cmd`, `timeout` | `{output}` |
| `/read` | POST | `path`, `limit` | `{content}` |
| `/grep` | POST | `path`, `pattern`, `context` | `{matches}` |
| `/outline` | POST | `path` | `{outline}` — func/class map |
| `/tree` | POST | `path`, `depth` | `{tree}` |
| `/exists` | POST | `path` | `{exists, is_file, size}` |
| `/write` | POST | `path`, `content` | `{result}` |
| `/patch` | POST | `path`, `old`, `new` | `{result}` |
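The file-op endpoints compose into zero-token workflows. A sketch of a check-then-edit flow, assuming only the params and response keys in the table above; the paths and values are illustrative:

```python
# Check-then-edit using zero-LLM endpoints only. Param names and
# response keys come from the table above; everything else is made up.
import requests

BASE = "http://localhost:7860"
path = "~/project/config.py"

info = requests.post(f"{BASE}/exists", json={"path": path}).json()
if info["exists"]:
    # Find-and-replace in place
    requests.post(f"{BASE}/patch",
                  json={"path": path, "old": "DEBUG = True", "new": "DEBUG = False"})
else:
    # Create the file fresh
    requests.post(f"{BASE}/write",
                  json={"path": path, "content": "DEBUG = False\n"})
```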
**Local-LLM and memory endpoints**

| Endpoint | Method | Key params | Returns |
|---|---|---|---|
| `/ask` | POST | `task`, `max_tokens` | `{result}` |
| `/summarize` | POST | `path` or `text`, `focus` | `{summary}` |
| `/codegen` | POST | `task`, `lang` | `{code}` |
| `/batch` | POST | `{tasks: [...]}` | `{results: [...]}` |
| `/note` | POST | `key`, `value` | `{saved}` |
| `/memory/clear` | POST | — | `{cleared}` |
| `/status` | GET | — | `{status, model, tokens_saved_est}` |
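`/batch` bundles several operations into one round-trip. A sketch of the call, assuming the `[op, arg]` task shape mirrors the `J.batch()` tuples in the usage example; the exact wire schema is an assumption:

```python
# One /batch call mixing zero-LLM and shell ops. Task entries mirror
# the J.batch() tuples above; the exact wire schema is an assumption.
import requests

resp = requests.post(
    "http://localhost:7860/batch",
    json={"tasks": [
        ["outline", "~/project/app.py"],
        ["run", "pytest --tb=short"],
        ["exists", "~/project/.env"],
    ]},
    timeout=120,  # generous in case a task routes to the local LLM
)
for result in resp.json()["results"]:
    print(result)
```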
**Models**

| Modelfile | Base | Size | Notes |
|---|---|---|---|
| `Modelfile.qwen3-4b` | `qwen3:4b` | 2.5 GB | Recommended — good for Apple Silicon |
| `Modelfile.qwen3` | `qwen3:8b` | 5.2 GB | Better quality, slower |
| `Modelfile.gemma3` | `gemma3:4b` | 3.3 GB | Fallback |
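For reference, an Ollama Modelfile of this kind is only a few lines. A minimal sketch; the shipped Modelfiles' actual base parameters and system prompt are assumptions:

```
# Hypothetical contents: the real Modelfile.qwen3-4b may differ.
FROM qwen3:4b
PARAMETER temperature 0.3
SYSTEM """You are JARVES, a concise local assistant. Answer directly."""
```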
Tested on Apple Silicon (M-series). Runs entirely on-device via Ollama.
```
Zero-LLM ops:  7/7 passed   avg response: 0.02s
Local-LLM ops: 3/3 passed   avg response: 20-37s

Tokens saved estimate (one session): ~30,000+
Saving per /outline call: ~11,600 tokens
Saving per /grep call:    ~11,900 tokens
```
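These savings are estimates rather than exact counts. One plausible way to compute them, assuming the common ~4-characters-per-token heuristic; the function names and the heuristic are illustrative, not JARVES's actual accounting:

```python
# Rough token-savings estimate: tokens Claude would have spent reading
# the full file, minus tokens in the compact result actually returned.
# The 4-chars-per-token heuristic is an assumption.
def est_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def tokens_saved(full_file: str, returned: str) -> int:
    return est_tokens(full_file) - est_tokens(returned)

# e.g. a 48,000-char file vs a 2,500-char outline:
# 12,000 - 625 = 11,375 tokens saved, in line with the table above
```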
Zero-LLM endpoints are always <50ms. LLM endpoints (`/summarize`, `/codegen`) take 15-40s on qwen3:4b due to chain-of-thought — use them for background tasks, not interactive queries.
- "Does this file have a
process_datafunction?" →J.grep("file.py", "def process_data")— 0 tokens, instant - "What's in this project?" →
J.tree("~/project")— 0 tokens, compact output - "I need to understand this 800-line file" →
J.summarize("file.py", focus="main logic")— local LLM, no API cost - "Write a helper function for X" →
J.codegen("...")— local LLM, no API cost - "Patch this config value" →
J.patch("config.py", "old", "new")— 0 tokens, instant
- Python 3.9+
- Ollama running locally
- `flask`, `requests`, `numpy`, `rich`

```bash
pip install flask requests numpy rich
```

| Version | Changes |
|---|---|
| v6 "Secretary" | +5 new zero-LLM endpoints: /grep, /outline, /tree, /exists, /write, /patch; qwen3:4b; token savings counter |
| v5 | Core architecture: /ask auto-routing, /run, /read, /summarize, /codegen, semantic memory |
MIT