This is the hands-on book. It assumes nothing beyond "I can compile a C++17 program". By the end you will know how to:
- read the Primer below first if you've never traced a tool call end-to-end — it's the bottom-up walkthrough that makes the rest of the manual click
- compile easyai and download a model
- run `easyai-local` and talk to it
- host `easyai-server` and call it from Claude Code, OpenAI SDKs, or curl
- embed `easyai::Engine` in your own program (local llama.cpp)
- embed `easyai::Client` in your own program (remote OpenAI-compatible server, with local tools)
- drive a remote server end-to-end with `easyai-cli`, including a planning tool and live system-observability tools
- write a custom tool, with typed parameters
- tune the sampler with presets and runtime overrides
- deploy easyai-server as a hardened Linux service and operate it
- debug common issues (context overflow, malformed tool calls, GPU fallback, TLS, rate limits)
| Part | Chapter | What you get |
|---|---|---|
| Primer | How a tool call works on the wire | The 10 steps from your Tool declaration to the model's final answer — bytes, Jinja templates, parsing, dispatch, follow-up turn |
| 1 | Getting set up | Prereqs, repo layout, building, GPUs, models |
| 2 | Using the binaries | easyai-local, easyai-server, easyai-mcp-server, easyai-cli, easyai-agent, easyai-chat, easyai-recipes |
| 3 | Embedding libeasyai | Agent (3-line hello), Backend (local↔remote), Engine API top-to-bottom, callbacks, presets, tools, escape hatches |
| 4 | Embedding libeasyai-cli | Client API top-to-bottom — your code drives a remote model with local tools |
| 5 | Authoring custom tools | Builder API, schemas, sandboxes, error handling, the Plan tool, system_* tools cookbook |
| 6 | Deploying easyai-server | Single-binary install, systemd unit, nginx TLS termination, per-model INI profiles, multiple-server fan-out |
| 7 | Operating the server | /health and /metrics, presets at runtime, log rotation, crash capture |
| 8 | Performance & tuning | KV cache types, flash-attn, mlock, ngl auto-fit, RoPE/YaRN context extension, GPU split mode, sampler choices |
| 9 | Recipes (cookbook) | Real prompts + flag combinations, including the planning agent, papers digest, host triage |
| 10 | Troubleshooting | Build, GPU, runtime, model, tool, network, TLS issues |
| 11 | Design references | Pointers into design.md for the deeper "why" |
If you're new, read Primer → 1 → 2 → 3. If you want to ship something to a remote model right now, jump to Part 4. If you want to write your own tool, Part 5 is the cookbook. If something's broken, Part 10 has a triage matrix.
Read this first. Everything else in the manual makes more sense once you can see the bytes flowing between you, easyai, and the model. We trace one tool call end-to-end, bottom-up — your C++ declaration → Jinja-rendered prompt → model output → parse → dispatch → result → next turn. No magic.
The example: a get_weather tool the model invokes for "what's
the weather in Lisbon?".
auto weather = easyai::Tool::builder("get_weather")
.short_describe("Current weather for a city (metric units).")
.describe("Return the current weather for a city, in metric "
"units. Caches per-city for 5 minutes. Returns "
"'<temp>°C, <conditions>' or an error.")
.param("city", "string", "City name, e.g. 'Lisbon'", /*required=*/true)
.handle([](const easyai::ToolCall & c) -> easyai::ToolResult {
std::string city = easyai::args::get_string_or(c.arguments_json, "city", "");
if (city.empty()) return easyai::ToolResult::error("'city' is required");
return easyai::ToolResult::ok("23 °C, sunny");
})
.build();
engine.add_tool(weather);
engine.chat("what's the weather in Lisbon?");
The Tool carries five fields: name, description (the full
manual), short_description (one-line trigger), parameters_json
(JSON schema), handler.
Shape-C wire shape (since 2026-05-26). Per-turn `<tools>` blocks ship `name + short_description + schema` (~2 000 tokens saved vs. the full description). Models reach for `tool_lookup(name="get_weather")` to pull the full multi-line body when they need it. Builders that set only `.describe(...)` keep working: `wire_description()` falls back to the first 120 chars of the full description.
At this point easyai does nothing except remember the tool — no prompt has been built yet.
When chat() runs, easyai hands the conversation + tool catalogue
to llama.cpp's common_chat_templates_inputs. Conceptually:
inputs = {
messages: [
{ role: "system", content: "<your system prompt>" },
{ role: "user", content: "what's the weather in Lisbon?" }
],
tools: [
{
type: "function",
function: {
name: "get_weather",
description: "Return the current weather for a city, in metric units.",
parameters: { type: "object",
properties: { city: { type:"string", description:"…" } },
required: ["city"] }
}
}
],
add_generation_prompt: true
}
easyai itself never decides how to format this — the chat template baked into the GGUF does. The template is Jinja, shipped by the model author. Different model families use different markup, which is why easyai needs three parallel recovery layers (Step 5).
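To make the hand-off concrete, the sketch below is a hand-written stand-in for what a Qwen-style template does with these inputs. It is an approximation under stated assumptions: `Message` and `render_qwen_like` are hypothetical names, not easyai or llama.cpp APIs, and a real template is Jinja shipped inside the GGUF, usually with more instructions in the system block.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the conceptual `inputs.messages`.
struct Message {
    std::string role;
    std::string content;
};

// Approximates a Qwen-style rendering: ChatML markers around every message,
// the tool catalogue appended to the system message, and an open assistant
// header at the end so the model continues from there.
std::string render_qwen_like(const std::vector<Message> & messages,
                             const std::vector<std::string> & tool_json,
                             bool add_generation_prompt) {
    std::string out;
    for (const auto & m : messages) {
        out += "<|im_start|>" + m.role + "\n" + m.content;
        if (m.role == "system" && !tool_json.empty()) {
            out += "\n# Tools\n<tools>\n";
            for (const auto & t : tool_json) out += t + "\n";
            out += "</tools>";
        }
        out += "<|im_end|>\n";
    }
    if (add_generation_prompt) out += "<|im_start|>assistant\n";
    return out;
}
```

The point of the sketch is only that the prompt is one flat string: swapping the model swaps this function, not your `Tool`.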
The Jinja template walks inputs.messages and inputs.tools and
emits a single string of bytes. Here are three approximate
renderings — your model's exact bytes depend on its template,
but the shape is universal:
Qwen3 / Qwen2.5 (<tool_call> + JSON):
<|im_start|>system
<your system prompt>
# Tools
You have access to the following tools. To use one, emit a
<tool_call>...</tool_call> block with the call as JSON.
<tools>
{"type":"function","function":{"name":"get_weather","description":"…","parameters":{…}}}
</tools>
<|im_end|>
<|im_start|>user
what's the weather in Lisbon?<|im_end|>
<|im_start|>assistant
Hermes-2-Pro (<tool_call> wrapping XML-ish):
<|im_start|>system
<your system prompt>
You may call tools using:
<tool_call>
<function=NAME>
<parameter=KEY>VALUE</parameter>
</function>
</tool_call>
Tools available:
- get_weather(city: string): Return the current weather for a city.
<|im_end|>
<|im_start|>user
what's the weather in Lisbon?<|im_end|>
<|im_start|>assistant
ChatML / OpenAI-style (function-call JSON in a fenced block):
<|im_start|>system
<your system prompt>
# Functions
```json
[{"name":"get_weather","description":"…","parameters":{…}}]
```
Call a function by emitting a fenced JSON block with name +
arguments.
<|im_end|>
<|im_start|>user
what's the weather in Lisbon?<|im_end|>
<|im_start|>assistant

The same `Tool` you declared in C++ ends up in different markup
in each. You don't need to care: easyai handles all three.

### Step 4 — the model emits a turn

The model decodes one token at a time. Each token streams to
your `on_token` callback in real time. When the model decides
to call the tool, the **raw text it produces** in a Qwen-family
model looks something like:

I'll check the current weather in Lisbon. <tool_call>{"name":"get_weather","arguments":{"city":"Lisbon"}}</tool_call>

The model then emits an end-of-turn token (e.g. `<|im_end|>`) and
stops. At this point easyai has the full raw turn as a string.

### Step 5 — easyai parses the tool call (PEG + 3 recovery layers)

`parse_assistant` (in `src/engine.cpp`) tries the parsers in
order, fastest path first:
1. **PEG parser** (llama.cpp's `common_chat_parse`). Knows the
model family (selected from the GGUF metadata at load time)
and runs a grammar-driven parse. Hits ~99 % of well-behaved
turns.
2. **Qwen recovery scanner** (`recover_qwen_tool_calls`). For
`<tool_call>{…}</tool_call>` blocks the PEG dropped because
of an inner-brace edge case.
3. **Hermes recovery scanner** (`recover_hermes_tool_calls`).
For `<function=NAME><parameter=K>V</parameter></function>`
markup that the PEG sometimes fails to assemble.
4. **Markdown recovery scanner** (`recover_markdown_tool_calls`).
Last-resort heuristic for "🔧 get_weather(city='Lisbon')"
and similar prose-style emissions some weak models prefer.

All four converge on the same shape — `common_chat_msg` with a
`content` field (visible reply, may be empty) and a
`tool_calls[]` array. After parsing, easyai materialises:

```cpp
struct ParsedAssistantTurn {
    std::string content = "I'll check the current weather in Lisbon.";
    std::string reasoning_content = ""; // any <think>…</think> block
    std::vector<ToolCall> tool_calls = {
        ToolCall {
            .id = "call_0",
            .name = "get_weather",
            .arguments_json = "{\"city\":\"Lisbon\"}"
        }
    };
    std::string finish_reason = "tool_calls";
};
```
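
The Qwen recovery layer can be pictured as a plain string scan over the raw turn. This is a minimal sketch of the idea, not the code in `src/engine.cpp`: `ScannedTurn` and `scan_tool_calls` are hypothetical names, and each payload is returned as raw text rather than parsed JSON.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical result type: visible prose plus the raw JSON of each call.
struct ScannedTurn {
    std::string content;
    std::vector<std::string> payloads;
};

// Collects every <tool_call>…</tool_call> payload and keeps the text around
// the blocks as content. An unterminated block is left in the content.
ScannedTurn scan_tool_calls(const std::string & raw) {
    const std::string kOpen  = "<tool_call>";
    const std::string kClose = "</tool_call>";
    ScannedTurn out;
    std::size_t pos = 0;
    for (;;) {
        std::size_t b = raw.find(kOpen, pos);
        if (b == std::string::npos) break;
        std::size_t e = raw.find(kClose, b + kOpen.size());
        if (e == std::string::npos) break;
        out.content += raw.substr(pos, b - pos);
        out.payloads.push_back(
            raw.substr(b + kOpen.size(), e - b - kOpen.size()));
        pos = e + kClose.size();
    }
    out.content += raw.substr(pos);
    // Drop trailing whitespace left behind where a block was removed.
    while (!out.content.empty() &&
           std::isspace(static_cast<unsigned char>(out.content.back()))) {
        out.content.pop_back();
    }
    return out;
}
```

Applied to the Step 4 output, the scan yields the sentence as `content` and one payload holding the JSON call.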
For each entry in tool_calls, easyai looks up the registered
Tool by name and invokes its handler. The handler runs on
the same thread that called chat() — it's just a function
call. No threads, no IPC, no fork. Your handler is free to
block, do HTTP, hit the disk, anything.
// Inside Engine::Impl::dispatch_tool (simplified):
auto * t = find_tool_by_name(call.name); // Step 1
if (!t) {
return ToolResult::error("unknown tool: " + call.name);
}
ToolResult result = t->handler(call); // Step 2 — your code
if (on_tool) on_tool(call, result); // Step 3 — observability hook
return result;
For our example:
ToolCall in = { .name = "get_weather", .arguments_json = "{\"city\":\"Lisbon\"}", … };
ToolResult out = { .content = "23 °C, sunny", .is_error = false };
A new message of role tool is appended:
HistoryMessage {
role = "tool",
content = "23 °C, sunny",
tool_name = "get_weather",
tool_call_id = "call_0" // matches the call's id
};
The tool_call_id is what links the result to the assistant's
preceding tool_calls[i].id. Without it, the chat template
can't pair them up and weak models hallucinate the result back
to themselves.
easyai re-runs Jinja over the now-extended history and feeds it back into llama.cpp. Approximate Qwen3 rendering:
…(system + user same as before)…
<|im_start|>assistant
I'll check the current weather in Lisbon.
<tool_call>{"name":"get_weather","arguments":{"city":"Lisbon"}}</tool_call><|im_end|>
<|im_start|>tool
<tool_response>
{"name":"get_weather","content":"23 °C, sunny"}
</tool_response><|im_end|>
<|im_start|>assistant
The model now sees its own previous tool_call AND the result it got back. It picks up where it left off.
The current weather in Lisbon is 23 °C and sunny.<|im_end|>
finish_reason is stop. No more tool_calls. easyai returns
the visible content from chat().
If the model emits another tool_call instead of stopping,
easyai loops back to Step 6. Each loop counts as one hop;
the budget is Engine::max_tool_hops() (default 8, bumped to
99999 when bash is enabled). Hop exhaustion returns whatever
the latest visible content is, plus a last_error() string.
There's a second safety net: if the model produces an
"announce-only" turn ("Let me search…", "I'll look that up…")
without actually emitting a tool_call, easyai discards that
turn, appends a corrective synthetic user message ("don't
announce, execute"), and retries up to
Engine::max_incomplete_retries() times (default 10). This is
the looks_like_announce_phrase predicate in src/engine.cpp,
and it's why weak / 1-bit-quant models still drive a
multi-tool flow reliably.
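
The hop budget can be sketched as a small loop. This is an illustration under assumptions, not easyai's implementation: `Turn`, `LoopResult`, `Generate`, `RunTool` and `run_chat_loop` are hypothetical names, the model and the tool are passed in as plain callbacks, and the announce-only retry path is left out.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified turn: a tool request or a final answer.
struct Turn {
    std::string content;    // visible text
    std::string tool_name;  // empty when the model did not call a tool
};

struct LoopResult {
    std::string content;     // latest visible content
    std::string last_error;  // non-empty only when the hop budget ran out
    int hops = 0;            // tool dispatches performed
};

using Generate = std::function<Turn(const std::vector<std::string> &)>;
using RunTool  = std::function<std::string(const std::string &)>;

// Generate, dispatch, append the result, generate again, until the model
// stops calling tools or the budget is spent.
LoopResult run_chat_loop(const Generate & generate, const RunTool & run_tool,
                         int max_tool_hops) {
    LoopResult r;
    std::vector<std::string> tool_results;  // stands in for the history
    for (;;) {
        Turn t = generate(tool_results);
        if (!t.content.empty()) r.content = t.content;
        if (t.tool_name.empty()) return r;              // finish_reason "stop"
        if (r.hops >= max_tool_hops) {                  // budget exhausted
            r.last_error = "tool hop budget exhausted";
            return r;
        }
        tool_results.push_back(run_tool(t.tool_name));  // one hop
        ++r.hops;
    }
}
```

On exhaustion the sketch returns the latest visible content together with an error string, mirroring the behaviour described above.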
- You write one `Tool` declaration; it works across model families. The Jinja template + the recovery layers absorb the dialect differences.
- You can swap models freely. Qwen → Hermes → DeepSeek-R1 → a Bonsai-class 1-bit quant — your tool code is unchanged.
- You can reason about correctness. Every byte of every step above is reproducible: turn on `--verbose` (CLI) or open `/tmp/easyai-<pid>-<epoch>.log` and you see the rendered prompt, the raw model output, the parsed `tool_calls`, the dispatch summary, and the next-turn render — in order.
When something goes wrong — model never calls the tool, calls the wrong tool, hallucinates an argument — you read the log top-to-bottom against this primer and find the layer that broke. Most of the time it's Step 3 (your tool description wasn't clear enough for the model) or Step 6 (your handler returned an error the model couldn't recover from).
Now that you have the picture, the rest of the manual fills in the practical details: Part 3 / Part 4 show how to wire this up in C++; Part 5 is the cookbook for declaring tools that the model actually calls correctly.
easyai expects llama.cpp as a sibling directory:
develop/
├── easyai/ # this project
└── llama.cpp/ # https://github.com/ggml-org/llama.cpp
Clone llama.cpp if you haven't:
cd ~/develop
git clone https://github.com/ggml-org/llama.cpp

| Required | Why |
|---|---|
| CMake ≥ 3.18 | build system |
| A C++17 compiler | the library is C++17 |
| (Apple) Xcode CLT | Metal headers for GPU acceleration |
| (Linux/Win) Vulkan SDK | optional; pass -DGGML_VULKAN=ON |
| Optional | Used by |
|---|---|
| libcurl | the unified web tool (action=search / fetch) |
On macOS:
brew install cmake curl

| Target | Configure command |
|---|---|
| Apple Silicon / Intel Mac (Metal) | cmake -S . -B build -DCMAKE_BUILD_TYPE=Release |
| NVIDIA GPU (CUDA) | cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON |
| AMD / Intel / cross-vendor (Vulkan) | cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_VULKAN=ON |
| AMD on Linux (ROCm/HIP) | cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100 |
| CPU-only (any OS) | cmake -S . -B build -DCMAKE_BUILD_TYPE=Release (then run with -ngl 0) |
NVIDIA / CUDA — install the CUDA Toolkit so nvcc is on PATH. If
CMake complains about an unknown architecture, pin one explicitly:
-DCMAKE_CUDA_ARCHITECTURES=89 (e.g. for RTX 4090) or use native.
AMD / Vulkan — install the Vulkan SDK
(LunarG on Win/macOS, distro
vulkan-tools libvulkan-dev on Linux). On Linux, also install the GPU
driver's Vulkan ICD (mesa-vulkan-drivers for AMD/Intel, NVIDIA driver
ships its own).
AMD / ROCm — set AMDGPU_TARGETS to your card's gfx version.
Check with rocminfo.
CPU-only — same configure command as Metal but always pass -ngl 0 at
runtime (or engine.gpu_layers(0) in code) so layers stay on CPU.
cd easyai
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release # use the right line above
cmake --build build -j
Outputs land in build/:
build/easyai-local # local-only REPL (loads a GGUF in-process)
build/easyai-cli # agentic REPL talking to a remote OpenAI-compat endpoint
# (full doc: easyai-cli.md)
build/easyai-server # HTTP server + webui (full doc: easyai-server.md)
build/easyai-mcp-server # standalone MCP-only HTTP daemon, no model loaded,
# sized for thousands of parallel clients
# (full doc: easyai-mcp-server.md)
build/easyai-agent # demo agent (every tool + a custom one)
build/easyai-chat # bare REPL (no tools)
build/libeasyai.dylib # the library
If the configure step says easyai: libcurl found — web tool enabled,
the unified web tool's search and fetch actions work out of the box
(no extra service to run).
Tiny, fast, decent at tools — start here:
mkdir -p models
curl -L -o models/qwen2.5-1.5b-instruct-q4_k_m.gguf \
  'https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/qwen2.5-1.5b-instruct-q4_k_m.gguf?download=true'
For real work upgrade to Qwen2.5-7B-Instruct or Llama-3.1-8B-Instruct.
./build/easyai-local -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf
You'll see something like:
[easyai-local] loaded models/qwen2.5-1.5b-instruct-q4_k_m.gguf
backend=MTL0 ctx=4096 tools=7 preset=precise
type '/help' for commands, '/quit' to exit
> what's 2+2
2 + 2 equals 4.
>
Try a tool:
> What time is it right now in UTC?
[tool] datetime -> {"utc":"2026-04-25T10:20:49Z","local":"…"}
The current UTC time is 2026-04-25 10:20:49.
Try a preset:
> creative
[preset → creative]
> write a haiku about silicon
Quiet wafer hums,
moonlit traces drink the dawn —
glass dreams in the dust.
creative 0.9 … does both at once: switch preset, override temperature
just for this generation, then run the rest of the line as a prompt.
Use /help to list every preset; /system <text> to swap the system
prompt mid-session; /reset to wipe history.
./build/easyai-server -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf
./build/easyai-server -m models/...gguf --sandbox ./work --allow-bash
./build/easyai-server -m models/...gguf -s system.txt
Without -s, the server boots up as Deep — an expert system
engineer persona built into the default system prompt. Deep
operates a TIME → THINK → PLAN → EXECUTE → VERIFY loop and treats
datetime as the first tool call any time the answer touches "now"
or "today". Operators who want a different voice supply their own
--system "<text>" or -s persona.txt — Deep is the default, not
hardcoded.
--allow-fs enables the unified fs tool (action=read / write /
list / glob / grep / check_path / cwd / sandbox); pair with
--sandbox <dir> to scope it under <dir> (otherwise it operates
against the process's cwd). --allow-bash adds the shell tool,
also pinned to <dir> when --sandbox is set. All three default
OFF — fresh installs don't expose write access or shell to the
model until the operator opts in. Note that --sandbox <dir> alone does NOT
register `fs_*`; prior versions implied it, but as of 2026-05-08 the flags are
honoured independently, so an operator can run with a sandbox
boundary and no `fs_*` registered.
If you pass -s system.txt, that text becomes the default system
prompt for any request that doesn't already include one.
Open http://127.0.0.1:8080 in a browser to use the bundled webui, or
talk to it via curl:
curl http://127.0.0.1:8080/v1/chat/completions \
-H 'Content-Type: application/json' \
  -d '{"model":"easyai","messages":[{"role":"user","content":"Hi!"}]}'
Use whatever Claude Code's "OpenAI-compatible base URL" setting is called in
your version (--api-base, env var, or settings file) and set it to
http://127.0.0.1:8080/v1. Anything Claude Code declares as a tool will be
forwarded; anything it doesn't declare will use easyai's built-in toolbelt.
from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-checked")
print(client.chat.completions.create(
model="easyai",
messages=[{"role":"user","content":"Hi!"}]
).choices[0].message.content)
Every request body can carry temperature, top_p, top_k. Or your user
can put a preset right in the message:
{ "messages": [{"role":"user","content":"creative 0.9 write me a poem"}] }
The server peels creative 0.9 off, applies the override, and the model
sees just write me a poem.
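
One way to picture the peeling step is a small prefix parser. This is a sketch under assumptions, not the server's code: `PeeledPrompt`, `split_word` and `peel_preset` are hypothetical names, the set of preset names is supplied by the caller, and a number is only treated as a temperature when it directly follows a known preset.

```cpp
#include <cctype>
#include <cstdlib>
#include <optional>
#include <set>
#include <string>
#include <utility>

struct PeeledPrompt {
    std::string preset;                 // empty when no preset prefix was found
    std::optional<double> temperature;  // set only when a number follows it
    std::string prompt;                 // the text the model should see
};

// Splits off the first whitespace-delimited word; the second element is the
// remainder with its leading whitespace removed.
static std::pair<std::string, std::string> split_word(const std::string & s) {
    const char * ws = " \t";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return {"", ""};
    std::size_t e = s.find_first_of(ws, b);
    if (e == std::string::npos) return {s.substr(b), ""};
    std::size_t r = s.find_first_not_of(ws, e);
    return {s.substr(b, e - b),
            r == std::string::npos ? std::string() : s.substr(r)};
}

PeeledPrompt peel_preset(const std::string & message,
                         const std::set<std::string> & presets) {
    PeeledPrompt out;
    auto [word, rest] = split_word(message);
    if (presets.count(word) == 0) {  // no preset prefix: pass through untouched
        out.prompt = message;
        return out;
    }
    out.preset = word;
    auto [maybe_num, tail] = split_word(rest);
    char * end = nullptr;
    double v = maybe_num.empty() ? 0.0 : std::strtod(maybe_num.c_str(), &end);
    bool numeric = !maybe_num.empty() &&
                   std::isdigit(static_cast<unsigned char>(maybe_num[0])) &&
                   end != nullptr && *end == '\0';
    if (numeric) { out.temperature = v; out.prompt = tail; }
    else         { out.prompt = rest; }
    return out;
}
```

A message that does not start with a known preset comes back unchanged, so ordinary prompts are never altered.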
./build/easyai-agent -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf
Look at examples/agent.cpp to see how the tools are registered. The
inline flip_coin example is six lines.
If you remember nothing else, remember this:
// hello.cpp
#include "easyai/easyai.hpp"
#include <iostream>  // std::cout
int main() {
    easyai::Agent a("models/qwen2.5-1.5b-instruct-q4_k_m.gguf");
    std::cout << a.ask("What's 2+2?") << "\n";
}
Agent is the friendly Tier-1 façade. Construct, ask, print.
Default toolset (datetime + the unified web tool) is wired in;
the unified fs tool and bash stay off until you opt in via
.sandbox() or .allow_bash(). Streaming output is one chained
call away:
easyai::Agent a("model.gguf");
a.system ("Be terse.")
.sandbox ("./workspace")
.preset ("creative")
.on_token([](auto p){ std::cout << p << std::flush; });
a.ask("Read README.md and summarise it.");
A remote model works the same way:
// Against a local easyai-server:
auto local = easyai::Agent::remote("http://127.0.0.1:8080/v1");
// Or against a hosted endpoint, with an API key:
auto hosted = easyai::Agent::remote("https://api.openai.com/v1",
                                    std::getenv("OPENAI_API_KEY"));
Agent is built on top of Backend (3.1.5) which is built on top
of Engine (3.1) and Client (3.10). When you need access to the
underlying knobs, agent.backend() is the escape hatch — it returns
the materialised Backend & so you can reach into Engine::* /
Client::* setters that Agent doesn't surface directly.
CMake:
find_package(easyai 0.1 REQUIRED)
add_executable(hello hello.cpp)
target_link_libraries(hello PRIVATE easyai::engine easyai::cli)
Agent lives in libeasyai-cli (because it can transparently
dispatch to either flavour of Backend), so link both targets.
If you only need the local engine, drop easyai::cli and use
easyai::Engine directly (3.1).
// hello.cpp
#include "easyai/easyai.hpp"
#include <cstdio>    // std::fprintf
#include <iostream>  // std::cout
#include <string>
int main() {
    easyai::Engine engine;
    engine.model("models/qwen2.5-1.5b-instruct-q4_k_m.gguf")
          .gpu_layers(99)
          .system("Be concise.")
          .on_token([](const std::string & t){ std::cout << t << std::flush; });
    if (!engine.load()) {
        std::fprintf(stderr, "load failed: %s\n", engine.last_error().c_str());
        return 1;
    }
    engine.chat("What's 2+2?");
    return 0;
}
Add it to CMakeLists.txt:
add_executable(hello hello.cpp)
target_link_libraries(hello PRIVATE easyai)
If your project lives outside this tree and you've installed easyai
(cmake --install build --prefix /usr/local), use find_package
instead:
find_package(easyai 0.1 REQUIRED)
add_executable(hello hello.cpp)
target_link_libraries(hello PRIVATE easyai::engine)
easyai::engine is the link target for libeasyai.so (local llama.cpp
wrapper). For the OpenAI-protocol client described in 3.9, swap to
easyai::cli (or link both side by side).
If your program needs to handle EITHER a local -m model.gguf
flavour OR a remote --url base flavour without if-tree
duplication, the abstraction you want is easyai::Backend:
std::unique_ptr<easyai::Backend> b;
if (!url.empty()) {
easyai::RemoteBackend::Config rc;
rc.base_url = url;
rc.api_key = api_key;
rc.with_tools = true; // dispatch tools locally
b = std::make_unique<easyai::RemoteBackend>(std::move(rc));
} else {
easyai::LocalBackend::Config lc;
lc.model_path = model_path;
lc.sandbox = "./workspace";
lc.allow_bash = true;
b = std::make_unique<easyai::LocalBackend>(std::move(lc));
}
std::string err;
if (!b->init(err)) { std::cerr << err << "\n"; return 1; }
b->set_system("Be terse.");
auto reply = b->chat("hello?", [](auto p){ std::cout << p << std::flush; });
Backend is the Tier-3 abstraction Agent is built on top of. Use
it when you want the local↔remote switch but still want to manage the
chat loop yourself, register custom tools, or hook tool callbacks.
The Config struct exposes every Engine/Client setting that's
relevant to "configuring an agent" (sampling preset, sandbox,
allow_bash, KV cache controls for local, TLS/timeout for remote).
LocalBackend ships in libeasyai; RemoteBackend in
libeasyai-cli. Linking only the engine library gives you the
local flavour; adding easyai::cli adds the remote flavour without
duplicating the abstract base.
The 6-line shape:
engine.add_tool(
easyai::Tool::builder("today_is")
.describe("Returns the day of the week.")
.handle([](const easyai::ToolCall &){
return easyai::ToolResult::ok("Saturday"); })
    .build());
With typed parameters:
engine.add_tool(
easyai::Tool::builder("send_email")
.describe("Send an email via the company SMTP relay.")
.param("to", "string", "Recipient address", /*required=*/true)
.param("subject", "string", "Subject line", true)
.param("body", "string", "Plain-text body", true)
.param("cc", "string", "Optional CC address",false)
.handle([](const easyai::ToolCall & call){
std::string to, subject, body, cc;
easyai::args::get_string(call.arguments_json, "to", to);
easyai::args::get_string(call.arguments_json, "subject", subject);
easyai::args::get_string(call.arguments_json, "body", body);
easyai::args::get_string(call.arguments_json, "cc", cc);
if (to.empty()) return easyai::ToolResult::error("missing 'to'");
if (subject.empty()) return easyai::ToolResult::error("missing 'subject'");
// … your real send code …
return easyai::ToolResult::ok("sent.");
})
    .build());
Tool::builder automatically synthesises the JSON schema. If you need
something fancier (nested objects, enums) build the schema string yourself
and use Tool::make(name, description, schema_json, handler).
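
To see what "synthesising the schema" amounts to, here is a minimal sketch that turns a flat parameter list into the kind of JSON string shown in the Primer. It is an assumption about the shape of the output, not the builder's real code: `ParamSpec`, `json_quote` and `synthesize_schema` are hypothetical names, and the escaping covers only quotes, backslashes and newlines.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of the arguments passed to .param(...).
struct ParamSpec {
    std::string name;
    std::string type;
    std::string description;
    bool required;
};

// Minimal JSON string quoting: handles ", \ and newline only.
static std::string json_quote(const std::string & s) {
    std::string out = "\"";
    for (char c : s) {
        if (c == '"' || c == '\\') { out += '\\'; out += c; }
        else if (c == '\n')        { out += "\\n"; }
        else                       { out += c; }
    }
    return out + "\"";
}

// Builds {"type":"object","properties":{...},"required":[...]}.
std::string synthesize_schema(const std::vector<ParamSpec> & params) {
    std::string props, required;
    for (const auto & p : params) {
        if (!props.empty()) props += ",";
        props += json_quote(p.name) + ":{\"type\":" + json_quote(p.type) +
                 ",\"description\":" + json_quote(p.description) + "}";
        if (p.required) {
            if (!required.empty()) required += ",";
            required += json_quote(p.name);
        }
    }
    return "{\"type\":\"object\",\"properties\":{" + props +
           "},\"required\":[" + required + "]}";
}
```

A flat list like this cannot express nested objects or enums, which is where a hand-written schema string comes in.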
Read this before shipping a tool. The model never sees your handler. It sees only the tool's `name`, `description`, and `parameters_json`. Treat that text as the contract; vague or under-described tools produce malformed calls or the wrong action, and the model will not learn from a stack trace.
Two patterns ship in src/. Use whichever fits your tool's shape.
The default for a tool that does one thing. Examples in-tree:
bash and datetime (the standalone single-action tools). Recipe:
- Open with one sentence: what does this tool do?
- State Required vs. Optional parameters explicitly.
- Describe the output shape — what does the model see back? ("one path per line, sorted", "two lines: UTC + local", etc.)
- Show one or two concrete example payloads.
- List error / edge-case behavior (truncation, missing path, regex flavor, …).
- Each `.param()` description leads with `Required.` or `Optional.`, then the constraint and default.
Concrete: this is the polished read_file description (excerpted from
src/builtin_tools.cpp):
Tool::builder("read_file")
.describe(
"Read a UTF-8 text file from disk and return its contents.\n"
"\n"
"The filesystem you see is rooted at `/`; use paths like "
"`/report.md` or `/docs/spec.md`. Required: path. Optional: "
"offset, limit (default returns the first 64 KB; pass offset to "
"page through larger files).\n"
"\n"
"Examples:\n"
" {path:\"/report.md\"}\n"
" {path:\"/docs/spec.md\", offset:65536, limit:65536}\n"
"\n"
"Errors return a single-line message starting with `error:`. "
"Reading a binary file returns the raw bytes — prefer the "
"dedicated tools or `bash` (e.g. `file <path>`) for those.")
.param("path", "string",
"Required. File path under the sandbox root, e.g. "
"`/report.md` or `/docs/spec.md`.", true)
.param("offset", "integer",
"Optional. Skip this many bytes from the start of the file "
"before reading. Default 0. Use the previous read's "
"(offset + bytes_returned) to page forward.", false)
.param("limit", "integer",
"Optional. Maximum bytes to return. Default 65536 (64 KB). "
"Larger files are truncated; raise this only when you "
"really need a bigger chunk.", false)
.handle(...)
    .build();
Compare this to the original one-line "Read a UTF-8 text file...":
the model now knows exactly what it gets back, has a paginating
example, and has been told what NOT to use this for.
A single tool that dispatches on a top-level action field. Reach for
this when you have N closely-related operations that share state and
parameters: plan (add / update / delete / list). Recipe:
- Open with the purpose sentence + `Pick an action; the parameters needed depend on which action you choose. N actions are supported:`
- One section per action: `action="X"` heading, then Required / Optional, then 2–4 example payloads (literal, copy-pasteable).
- Closing notes: shared semantics (status enum, id format, etc.).
- Per-property `.description` strings lead with which actions consume them: `"Used by add / update / delete. ..."`. The model maps each parameter to the right action without re-reading the description body.
Concrete: this is how the plan tool's description is laid out — see
src/plan.cpp:Plan::tool(). The closing line — "The 'items' array
MUST be a real JSON array, not a quoted string." — is there because
real models repeatedly emitted "items": "[...]" in production. Bake
those lessons into the description.
Models follow examples more reliably than they parse JSON-schema constraints. A description that shows a valid call is worth ten that describe the schema in prose. Three concrete failure modes that better descriptions prevent:
- The model invents a parameter name (`"file"` instead of `"path"`) because the description used the word "file" without showing the exact key.
- The model omits a required field because the schema marked it required but the description didn't repeat that.
- The model mixes shapes from two actions (`{action:"add", id:"1"}`) because the description says `add` accepts `text` somewhere but doesn't show what an `add` payload looks like end-to-end.
A good description prevents most misfires. The rest are the cost of running real models on tool calls. The library is built around the assumption that tool handlers will see imperfect input.
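
As a taste of what tolerating imperfect input means inside a handler, the sketch below accepts an integer whether the model sent it bare or wrapped in quotes. It is an illustration only: `lenient_int` is a hypothetical name, it works on the raw JSON value text rather than a parsed document, and it makes no claim about how the shipped helpers are implemented.

```cpp
#include <cctype>
#include <optional>
#include <stdexcept>
#include <string>

// raw: the JSON value text exactly as the model emitted it,
// e.g. 42 (spec form) or "42" (quoted integer literal).
std::optional<long long> lenient_int(std::string raw) {
    // Strip one pair of surrounding double quotes, if present.
    if (raw.size() >= 2 && raw.front() == '"' && raw.back() == '"') {
        raw = raw.substr(1, raw.size() - 2);
    }
    if (raw.empty()) return std::nullopt;
    std::size_t first_digit = (raw[0] == '-') ? 1 : 0;
    if (first_digit == raw.size()) return std::nullopt;  // a lone "-"
    for (std::size_t i = first_digit; i < raw.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(raw[i]))) {
            return std::nullopt;  // not an integer literal
        }
    }
    try {
        return std::stoll(raw);
    } catch (const std::out_of_range &) {
        return std::nullopt;  // too large for long long
    }
}
```

Returning an empty optional rather than throwing lets the handler answer with an error that shows the correct shape.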
1. Use the lenient args::* helpers. Don't roll your own JSON probe
in a handler — the shipped helpers already accept the shapes models
actually emit:
| Helper | Accepts (beyond the spec form) |
|---|---|
| `get_string` | `42` (number → `"42"`), `true` / `false` (bool → string) |
| `get_int` | `"42"` (quoted integer literal) |
| `get_bool` | `"true"` / `"false"` / `"1"` / `"0"` |
| `get_array` | `"[{...}]"` (stringified JSON array, unwrapped and re-parsed) |
2. Infer required fields when the model omits them. When the schema
requires action but the model leaves it out, look at the fields that
are present and pick the most likely intent. The pattern from
src/plan.cpp:
if (action.empty()) {
if (items present and first item has text) action = "add";
else if (items present and first item has status) action = "update";
else if (items present and first item has id) action = "delete";
else if (top-level text is present) action = "add";
else if (top-level status is present) action = "update";
else if (top-level id is present) action = "delete";
else action = "list";
}
For multi-action tools where the same payload could mean different
things, disambiguate using current state: in plan we check
whether the supplied id already exists — if not, the model is
creating; if so, the model is updating.
3. Map common synonyms. Models pick near-miss verbs all the time.
if (action == "create" || action == "append" ||
action == "insert" || action == "new") action = "add";
else if (action == "modify" || action == "change" ||
action == "edit" || action == "set") action = "update";
else if (action == "remove" || action == "rm") action = "delete";
else if (action == "show" || action == "get" ||
         action == "view")                       action = "list";
4. Errors that teach. When you must reject a call, return an error whose body shows the correct shape inline:
return ToolResult::error(
    "plan: 'add' needs either text or items. Examples: "
    "{action:\"add\", text:\"my step\"} or "
    "{action:\"add\", items:[{text:\"a\"}, {text:\"b\"}]}. "
    "items must be a real JSON array, not a quoted string.");
The model receives a copy-pasteable example for its next call instead of a cryptic hint.
5. Coalesce notifications across batched mutations. When a handler mutates shared state in a loop (e.g. plan items) and that state has subscribers (UI, telemetry), use an RAII guard to fold the per-item callbacks into one fire at scope exit. Otherwise the UI re-renders once per item:
{
Plan::Batch batch(*self); // begin batch
for (const auto & e : items) self->add(...);
} // single on_change here
The Plan::Batch guard is in easyai/plan.hpp; the same pattern
applies to any on_change-style observable in a tool.
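
For an observable of your own, the guard can be as small as a depth counter plus a dirty flag. The sketch below is a generic illustration of the pattern, not the contents of `easyai/plan.hpp`: `TodoList` and its members are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class TodoList {
public:
    std::function<void()> on_change;  // subscriber (UI, telemetry, ...)

    void add(std::string text) {
        items_.push_back(std::move(text));
        notify();
    }
    std::size_t size() const { return items_.size(); }

    // RAII guard: while at least one Batch is alive, notifications are
    // deferred; the outermost guard fires a single on_change on destruction.
    class Batch {
    public:
        explicit Batch(TodoList & list) : list_(list) { ++list_.batch_depth_; }
        ~Batch() {
            if (--list_.batch_depth_ == 0 && list_.dirty_) {
                list_.dirty_ = false;
                if (list_.on_change) list_.on_change();
            }
        }
        Batch(const Batch &) = delete;
        Batch & operator=(const Batch &) = delete;
    private:
        TodoList & list_;
    };

private:
    void notify() {
        if (batch_depth_ > 0) { dirty_ = true; return; }  // defer
        if (on_change) on_change();
    }
    std::vector<std::string> items_;
    int  batch_depth_ = 0;
    bool dirty_ = false;
};
```

Because the callback runs from a destructor, it should not throw. The depth counter also makes nested guards safe: only the outermost one fires.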
- One-sentence purpose at the top of `describe()`.
- Required vs. Optional parameters listed explicitly in prose.
- Output shape described (lines? sorted? errors?).
- At least one concrete example payload.
- Per-`.param()` description leads with `Required.` / `Optional.`, then constraints, then default.
- Errors reference the correct shape inline.
- Lenient `args::*` helpers used (no hand-rolled JSON parsing).
- If multi-action: action inference + synonym mapping in place.
- If batching: callbacks coalesced (RAII guard).
A single unified fs tool with eight sub-actions covers every file
operation. Pass a root directory:
engine.add_tool(easyai::tools::fs("./workspace"));
Sub-actions selected by the action parameter: read, write,
list, glob, grep, check_path, cwd, sandbox. The model
calls them as fs(action="read", path="report.md") etc.
Paths sent by the model are anchored to the root by iterating path
components and dropping any .., ., or absolute markers before
joining onto the root. Total containment by construction — there is
no path the model can construct that escapes.
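
The anchoring rule reads naturally as a loop over path components. This is a minimal sketch of the rule as described, written with `std::filesystem`; `anchor_to_root` is a hypothetical name, and the sketch is purely lexical, so it says nothing about symlinks that already exist inside the root.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Joins a model-supplied path onto the sandbox root, dropping "..", ".",
// empty components and any absolute marker (root name or root directory).
fs::path anchor_to_root(const fs::path & root, const std::string & model_path) {
    fs::path out = root;
    for (const fs::path & part : fs::path(model_path)) {
        if (part.empty() || part == "." || part == "..") continue;
        if (part.has_root_name() || part.has_root_directory()) continue;
        out /= part;  // only plain names are ever appended
    }
    return out;
}
```

Note that `..` is dropped rather than resolved, so `docs/../report.md` lands at `docs/report.md` under the root instead of climbing a level.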
The model sees a virtual /-rooted filesystem (/report.md,
/docs/spec.md); the real sandbox path is hidden from descriptions
and result messages.
fs(action="sandbox") is the model's escape hatch when it does need
the real on-disk path (typing it back in chat, invoking bash with
absolute paths, etc.). It captures the configured root at
registration so the answer is pinned — distinct from
fs(action="cwd"), which reports the process's live cwd and can
drift.
Instead of hand-rolling the standard tool registration:
easyai::cli::Toolbelt()
.sandbox ("./workspace") // adds the unified `fs` tool
.allow_bash() // adds bash; ALSO ensures `fs` is on
.with_plan (plan) // adds the plan tool
.apply (engine); // or .apply(client) for the remote variant
The Toolbelt always includes datetime + the unified web tool.
The unified fs tool is enabled by either .sandbox() or
.allow_bash() — bash is strictly more permissive than fs, so
allowing bash without fs is incoherent (the model would fall back
to cat > file for ordinary writes). A fresh agent installation
that calls neither still can't expose write or shell.
engine.add_tool(easyai::tools::bash("./workspace"));
engine.max_tool_hops(99999); // bash flows span many turns
bash is a /bin/sh -c runner. Output (stdout + stderr) is captured
and capped at 32 KiB; per-command timeout defaults to 30 s, max 300 s
(SIGTERM, then SIGKILL +2 s grace). The cwd is pinned to the root.
This is NOT a hardened sandbox — the command runs with your user privileges. It's appropriate for local single-user agents; for anything multi-tenant or production, run easyai-server inside a container / firejail / unprivileged user.
The unified fs tool's cwd action returns the absolute path of the
process's current working directory at call time. Pair it with
--sandbox: the CLIs and server chdir into the sandbox at startup,
so what fs(action="cwd") reports is exactly the directory bash
operates inside, and the same root every other fs action resolves
its RELATIVE paths against. Models that don't already know the path
should call it once at the start of a task; for any subsequent file
op, relative paths just work.
The Toolbelt adds the unified fs tool automatically when
allow_fs or allow_bash is on; the cwd action ships with it.
The authoritative guide is EXTERNAL_TOOLS.md at the repo root. It covers quickstart, ten recipes, anti-patterns, corner cases, sanity warnings, the collaboration workflow, and full troubleshooting. The sub-sections below are a quick reference; refer to that document when actually writing or reviewing manifests.
For tools that wrap an existing CLI binary (uname, pgrep, git,
internal scripts, etc.) you declare them in JSON manifest files
inside a directory. The --external-tools DIR flag is supported by
easyai-local, easyai-cli, and easyai-server. The directory is
scanned for files matching EASYAI-<name>.tools (top-level, exact,
case-sensitive); per-file fault isolation means a syntax error in
one file does NOT prevent the others from loading.
From C++:
// Directory model (recommended):
auto loaded = easyai::load_external_tools_from_dir(dir, /*reserved=*/{});
for (const auto & e : loaded.errors) std::fprintf(stderr, "error: %s\n", e.c_str());
for (const auto & w : loaded.warnings) std::fprintf(stderr, "warn: %s\n", w.c_str());
for (auto & t : loaded.tools) engine.add_tool(std::move(t));
// Single-file (for unit tests / programmatic use):
auto one = easyai::load_external_tools_from_json(path, /*reserved=*/{});
if (!one.error.empty()) { std::fprintf(stderr, "%s\n", one.error.c_str()); return 1; }
for (auto & t : one.tools) engine.add_tool(std::move(t));
Manifest schema (one entry; see examples/EASYAI-example.tools and
EXTERNAL_TOOLS.md for more):
{
"version": 1,
"tools": [
{
"name": "list_processes",
"description": "List running processes whose name matches a regex pattern.",
"command": "/usr/bin/pgrep",
"argv": ["-a", "{pattern}"],
"parameters": {
"type": "object",
"properties": {
"pattern": { "type": "string", "description": "Regex." }
},
"required": ["pattern"]
},
"timeout_ms": 5000,
"max_output_bytes": 65536,
"cwd": "$SANDBOX",
"env_passthrough": ["PATH"],
"stderr": "discard",
"treat_nonzero_exit_as_error": false
}
]
}
Field-by-field reference
| Field | Required | Notes |
|---|---|---|
| name | yes | ^[a-zA-Z][a-zA-Z0-9_]{0,63}$. Must not collide with built-ins (bash, read_file, …) or already-registered tools. |
| description | yes | Plain-English text for the model. 1..4096 chars. The model uses this to decide when to call your tool, so write it well. |
| command | yes | Absolute path to a regular, executable file. Relative names are rejected at load (no PATH search → no PATH-hijack risk). |
| argv | yes | Array of strings. Each element is either a literal (no { or }) or exactly "{paramname}". Embedded placeholders ("--flag={x}") are rejected; split into two elements (["--flag", "{x}"]) instead. |
| parameters | optional | JSON-Schema-shaped: {type:"object", properties:{...}, required:[...]}. Types accepted: string, integer, number, boolean. |
| timeout_ms | optional | Default 10000. Clamped to [100, 300000]. |
| max_output_bytes | optional | Default 65536. Clamped to [1024, 4 MiB]. Excess output is silently discarded; the response notes the truncation. |
| cwd | optional | Either an absolute path or the magic token "$SANDBOX" which resolves to the process's CWD at load time. Default: "$SANDBOX". |
| env_passthrough | optional | Allowlist of parent-process env vars to inherit. Default empty: the subprocess gets a clean env. Add "PATH", "HOME", etc. only when the wrapped command needs them. |
| stderr | optional | "merge" (default) or "discard". |
| treat_nonzero_exit_as_error | optional | Default true. Set false for tools whose non-zero exit is informational (pgrep returns 1 when nothing matches). |
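The name and argv rules in the table are simple enough to check mechanically. The sketch below shows one way such a check could look; `valid_tool_name` and `classify_argv_element` are hypothetical helpers for illustration, and the pattern used for the placeholder's parameter name is an assumption (the table only fixes the pattern for tool names).

```cpp
#include <regex>
#include <string>

enum class ArgvKind { Literal, Placeholder, Invalid };

// Tool name rule from the table: ^[a-zA-Z][a-zA-Z0-9_]{0,63}$
bool valid_tool_name(const std::string & name) {
    static const std::regex pattern("^[a-zA-Z][a-zA-Z0-9_]{0,63}$");
    return std::regex_match(name, pattern);
}

// One argv template element is either a literal (no braces at all) or
// exactly "{paramname}". Anything else, e.g. "--flag={x}", is invalid.
// ASSUMPTION: parameter names are letters, digits and underscores.
ArgvKind classify_argv_element(const std::string & s) {
    static const std::regex placeholder(R"(\{[a-zA-Z][a-zA-Z0-9_]*\})");
    if (s.find('{') == std::string::npos && s.find('}') == std::string::npos)
        return ArgvKind::Literal;
    return std::regex_match(s, placeholder) ? ArgvKind::Placeholder
                                            : ArgvKind::Invalid;
}
```

Rejecting embedded placeholders at load time is what makes the "one argument, one argv slot" guarantee easy to reason about later.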
Security guarantees — these are enforced, not aspirational:
- No shell. The runner uses fork+execve with an argv array. The model's argument never passes through a shell parser, so quoting / ; / backticks / $(…) cannot escape its argv slot.
- Absolute command path. Validated at load (regular file + executable bit). No PATH lookup, no PATH-hijack.
- Whole-element placeholders only. A model argument fills exactly one argv element; it can't be concatenated into a literal.
- Schema-validated arguments. Type errors are surfaced as a ToolResult::error before anything is spawned. Required-but-missing arguments are rejected.
- Hard caps. Manifest size (1 MiB), tools per manifest (128), params per tool (32), env passthrough size (16), argv elements (256), per-arg bytes (4 KiB). Each cap closes a class of DoS.
- Clean env by default. Only listed env_passthrough vars inherit. LD_PRELOAD, PATH, etc. don't leak in unless asked.
- Closed stdin. No way to feed the subprocess from the model.
- Process-group timeout. SIGTERM to the group on timeout_ms, SIGKILL after a 1 s grace. This kills any grandchildren the command spawned, not just the top-level process.
- Inherited fds closed. All fds ≥ 3 are closed in the child before exec, so the agent's HTTP transport / log files / database handles do not leak into the spawned command.
The manifest is the operator's deploy artefact — treat it like a sudoers file. Anyone who can write it can run arbitrary commands as the agent's user.
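To see why "no shell" matters, here is a minimal POSIX sketch of a fork+execve runner with a clean environment. `run_no_shell` is a hypothetical function written for this example, not the library's runner, and it deliberately omits the timeout, output cap, stdin closing, and fd hygiene listed above.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run `command` (absolute path) with `args`, capturing stdout into `out`.
// No shell is involved: each string becomes exactly one argv element.
// Returns true only if the child exited with status 0.
bool run_no_shell(const std::string & command,
                  const std::vector<std::string> & args,
                  std::string & out) {
    int fds[2];
    if (pipe(fds) != 0) return false;

    // Build argv before forking. This sketch uses the full path as argv[0].
    std::vector<char *> argv;
    argv.push_back(const_cast<char *>(command.c_str()));
    for (const auto & a : args) argv.push_back(const_cast<char *>(a.c_str()));
    argv.push_back(nullptr);
    char * const envp[] = { nullptr };   // clean environment

    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return false; }
    if (pid == 0) {                       // child
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execve(command.c_str(), argv.data(), envp);
        _exit(127);                       // only reached if execve failed
    }

    close(fds[1]);                        // parent reads until EOF
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        out.append(buf, static_cast<std::size_t>(n));
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Passing an argument such as `$(whoami); rm -rf /` to /bin/echo through this function just prints the text back, because nothing ever parses it as shell syntax.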
Argv-injection via leading dashes. The library guarantees that a
model's argument fills exactly one argv slot — quoting and shell
metacharacters can't escape it. What the library cannot know is
whether the wrapped command treats a value starting with - as a
flag. If your tool wraps a binary that accepts options
(pgrep "-V", grep "-r", find "-delete", …), insert the
end-of-options sentinel "--" as a literal argv element BEFORE the
placeholder:
"argv": ["-a", "--", "{pattern}"]
GNU coreutils, util-linux, git, grep, ripgrep, find, and pgrep all
honour --. Integer/number/boolean parameters are immune (they're
not strings) and don't need this. See
examples/EASYAI-example.tools and EXTERNAL_TOOLS.md for the
pattern.
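The effect of the sentinel is easiest to see by expanding a template by hand. The following sketch assumes a simplified model in which all arguments are already strings; `expand_argv` is a hypothetical helper, not the library's substitution code.

```cpp
#include <map>
#include <string>
#include <vector>

// Expand an argv template: an element of the form "{name}" is replaced,
// as a whole, by the argument value; any other element is copied as-is.
// A missing argument becomes an empty string.
std::vector<std::string> expand_argv(
        const std::vector<std::string> & tmpl,
        const std::map<std::string, std::string> & args) {
    std::vector<std::string> out;
    for (const auto & el : tmpl) {
        if (el.size() >= 3 && el.front() == '{' && el.back() == '}') {
            auto it = args.find(el.substr(1, el.size() - 2));
            out.push_back(it != args.end() ? it->second : std::string());
        } else {
            out.push_back(el);
        }
    }
    return out;
}
```

With the template `["-a", "--", "{pattern}"]` and a model-supplied pattern of `-V`, the result is `-a -- -V`: the value still occupies one slot, and the wrapped command sees it after `--`, so it is treated as an operand rather than an option.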
# enable from the CLIs (DIR contains EASYAI-*.tools files)
easyai-local --sandbox ./work --external-tools ./tools.d
easyai-cli --sandbox ./work --external-tools ./tools.d --url http://...
easyai-server -m model.gguf --sandbox /srv/agent --external-tools /etc/easyai/external-tools
The default install ships /etc/easyai/external-tools/ empty;
operators drop EASYAI-<name>.tools files in to add tools. The
systemd unit always passes --external-tools so a restart picks up
new files.
This section is the practical companion to §3.3.4. It assumes you've read the schema and security model and now want to actually ship a manifest.
Recipe 1 — read-only system inspector (no parameters).
Useful for "give the model the ability to talk about the host" without any attack surface beyond reading public OS state.
{
"name": "host_status",
"description": "Return the current time, uptime, number of logged-in users, and load averages of the host. Use when the user asks 'how is the box doing'.",
"command": "/usr/bin/uptime",
"argv": [],
"parameters": { "type": "object", "properties": {} },
"timeout_ms": 2000,
"max_output_bytes": 4096,
"cwd": "$SANDBOX",
"env_passthrough": [],
"stderr": "discard"
}
Teaches: zero-parameter tool, conservative timeout_ms and
max_output_bytes, empty env, stderr: "discard" keeps the model's
context clean of harmless noise.
Recipe 2 — code search via ripgrep (with -- sentinel).
{
"name": "code_search",
"description": "Search the project tree for a literal string or regex. Returns file:line:match. Limit yourself to specific patterns — broad searches are slow and noisy.",
"command": "/usr/bin/rg",
"argv": [
"--no-heading",
"--line-number",
"--max-count", "100",
"--",
"{pattern}",
"."
],
"parameters": {
"type": "object",
"properties": {
"pattern": {
"type": "string",
"description": "Literal string or regex to search for. Quote multi-word phrases."
}
},
"required": ["pattern"]
},
"timeout_ms": 15000,
"max_output_bytes": 262144,
"cwd": "$SANDBOX",
"env_passthrough": ["HOME"],
"stderr": "merge",
"treat_nonzero_exit_as_error": false
}
Teaches: "--" sentinel before the string placeholder so a
model-supplied pattern of "-r" or "--type=cpp" is interpreted as a
search pattern, not a flag. treat_nonzero_exit_as_error: false
because rg returns 1 when the pattern is not found — that's
informational, not a failure.
Recipe 3 — JSON filter via jq (no shell escaping headaches).
{
"name": "json_filter",
"description": "Apply a jq expression to an existing JSON file in the sandbox. The 'filter' argument is the jq expression (e.g. '.users[] | .email').",
"command": "/usr/bin/jq",
"argv": ["--", "{filter}", "{file}"],
"parameters": {
"type": "object",
"properties": {
"filter": { "type": "string", "description": "jq expression." },
"file": { "type": "string", "description": "Path to a JSON file inside the sandbox." }
},
"required": ["filter", "file"]
},
"timeout_ms": 5000,
"max_output_bytes": 65536,
"cwd": "$SANDBOX"
}
Teaches: complex filter strings with quotes / pipes / $() are passed
through as a single argv element — no shell, no escaping. The library
guarantees {filter} fills exactly one argv slot regardless of its
contents.
Recipe 4 — internal CLI with a credential.
{
"name": "deploy_status",
"description": "Return the deployment status of a service from our internal control plane. Service must be one we own.",
"command": "/opt/internal/bin/deploy-cli",
"argv": ["status", "--", "{service}"],
"parameters": {
"type": "object",
"properties": {
"service": { "type": "string", "description": "Service name (e.g. 'billing-api')." }
},
"required": ["service"]
},
"timeout_ms": 10000,
"max_output_bytes": 32768,
"cwd": "$SANDBOX",
"env_passthrough": ["DEPLOY_TOKEN", "HOME", "PATH"]
}
Teaches: opt-in env passthrough for credentials. DEPLOY_TOKEN is
read from the agent's own environment at every call. A running
process's environment is fixed at startup, so after rotating the
token in the systemd unit, restart the service for the new value
to reach the tool. Without the allowlist the subprocess gets a
clean env.
Recipe 5 — Python one-liner (no shell, but execve still works).
{
"name": "python_eval",
"description": "Evaluate a SHORT Python expression and return its repr. Single line, no imports beyond math/datetime/json. Use for arithmetic / date math the model would otherwise get wrong.",
"command": "/usr/bin/python3",
"argv": ["-c", "{expr}"],
"parameters": {
"type": "object",
"properties": {
"expr": { "type": "string", "description": "Python expression. Output goes to stdout via print(repr(...)). Example: 'print(repr(__import__(\"datetime\").datetime.now()))'." }
},
"required": ["expr"]
},
"timeout_ms": 3000,
"max_output_bytes": 16384,
"cwd": "$SANDBOX",
"env_passthrough": []
}
Teaches: -c "{expr}" works because {expr} is one argv element.
python3 -c 'print(1)' and python3 -c 'print(1); __import__("os").system("rm -rf /")' reach
python the same way; whether to allow it is your policy decision
(this is essentially bash with a Python-shaped surface area —
narrower, but still arbitrary code execution).
Recipe 6 — git porcelain (integer parameter, no leading-dash worry).
{
"name": "git_log",
"description": "Show the last N commits of the repository in the sandbox. Format: short hash, author, ISO date, subject.",
"command": "/usr/bin/git",
"argv": ["log", "--max-count", "{count}", "--pretty=format:%h %an %ad %s", "--date=iso-strict"],
"parameters": {
"type": "object",
"properties": {
"count": { "type": "integer", "description": "1..100 commits to show." }
},
"required": ["count"]
},
"timeout_ms": 8000,
"max_output_bytes": 131072,
"cwd": "$SANDBOX",
"env_passthrough": ["HOME", "PATH"]
}
Teaches: integer parameters skip the leading-dash worry: 1 cannot
be parsed as a flag. HOME is needed for ~/.gitconfig; PATH is
needed because git invokes sub-commands like git-log via PATH.
Recipe 7 — long-running task with cooperative timeout.
{
"name": "build_project",
"description": "Run the build script in the sandbox. Returns build log. May take up to 5 minutes.",
"command": "/usr/bin/make",
"argv": ["-j", "{jobs}"],
"parameters": {
"type": "object",
"properties": {
"jobs": { "type": "integer", "description": "Parallel jobs (1..16)." }
},
"required": ["jobs"]
},
"timeout_ms": 300000,
"max_output_bytes": 4194304,
"cwd": "$SANDBOX",
"env_passthrough": ["HOME", "PATH", "CC", "CXX"],
"stderr": "merge"
}
Teaches: timeout_ms at the 5-min ceiling, max_output_bytes at the
4-MiB ceiling. On timeout the runner sends SIGTERM to the process
group (so child compilers die too), then SIGKILL after 1 s. Make
gets one chance to flush — typical build systems handle this fine.
| Situation | What happens |
|---|---|
| Binary doesn't exist when manifest loads | Load fails with a precise error. Agent doesn't start. |
| Binary removed AFTER successful load | Tool call returns exit=127 external_tool: execve failed. |
| Binary on a stalled NFS mount | Call blocks until timeout_ms, then SIGTERM/SIGKILL the way a hung child gets killed anywhere. |
| Required parameter missing in model's call | ToolResult::error before fork(). |
| Optional parameter missing | Empty string substituted into the argv slot. (For tools that distinguish "" from "not present", make the param required.) |
| Extra keys in the model's JSON arguments | Silently ignored. |
| Two manifest entries with the same name | Load fails with duplicate tool name. |
| Manifest entry shadows a built-in (e.g. names itself bash) | Load fails with collision error. |
| argv is [] (empty array) | Valid. Command runs with argv[0] = basename(command) and nothing else. |
| Model sends "true" (string) for a boolean parameter | Validation rejects: expected boolean. |
| Model sends 1.5 for an integer parameter | Validation rejects: expected integer. nlohmann's is_number_integer() is strict. |
| Model sends NaN / Infinity for a number parameter | Rejected: must be a finite number. |
| Manifest is edited while the agent is running | No effect; the manifest is loaded once at startup. Restart to pick up changes. |
| cwd: "$SANDBOX" but the agent didn't chdir | $SANDBOX is captured at LOAD time. The manifest's cwd resolves to whatever the process CWD was when load_external_tools_from_json ran. The CLIs chdir(--sandbox) BEFORE loading, so $SANDBOX ≡ --sandbox. |
| Want LD_PRELOAD to leak through | Not by default. Listed env vars are validated against ^[a-zA-Z][a-zA-Z0-9_]{0,63}$; LD_PRELOAD matches the pattern, so it can be passed, but only if you opt in explicitly by listing it in env_passthrough. |
| stderr: "discard" and the command writes to stderr | Bytes go to /dev/null. Model never sees them. |
| Command writes binary / non-UTF-8 bytes to stdout | Captured as-is and returned as a std::string; the model will see invalid bytes. Best-effort. |
| Command spawns a daemon (forks, parent exits) | Parent reaped immediately; daemon survives but inherits stdout=pipe with no reader; first write gets SIGPIPE and dies. Don't use this design for daemonising commands. |
| Subprocess takes 3 seconds; timeout_ms = 5000 | Runs to completion, output captured normally. |
| Subprocess outputs more than max_output_bytes | Excess is silently discarded (drained, not buffered). Response notes [truncated at N bytes]. The child stays unblocked. |
| Subprocess writes 1 KB then sleeps 30 minutes | Output captured immediately; SIGTERM on timeout_ms, SIGKILL after 1 s grace. |
| Two concurrent calls to the same tool | Each forks its own subprocess. No shared state on the library side. The wrapped command is responsible for its own concurrency. |
| Agent (parent) crashes mid-call | On Linux, PR_SET_PDEATHSIG(SIGKILL) ensures the subprocess dies with the agent. Otherwise it'd reparent to PID 1 and survive. |
| Manifest path is /dev/zero or a directory | slurp() rejects with manifest is not a regular file. |
| Manifest is 2 MB | Rejected: manifest exceeds 1048576 bytes. |
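Several rows above rely on the same mechanism: SIGTERM to the whole process group at the deadline, then SIGKILL after a grace period. The POSIX sketch below illustrates that sequence under simplifying assumptions; `run_with_timeout` is a hypothetical function, it polls with a short sleep rather than using an event loop, and it does not capture output.

```cpp
#include <chrono>
#include <thread>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run `path` with `argv` in its own process group. At `timeout`, send
// SIGTERM to the group; if the child is still alive after `grace`, send
// SIGKILL. Returns the raw wait status, or -1 on fork/wait failure.
int run_with_timeout(const char * path, char * const argv[],
                     std::chrono::milliseconds timeout,
                     std::chrono::milliseconds grace) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        setpgid(0, 0);                    // child leads a new process group
        char * const envp[] = { nullptr };
        execve(path, argv, envp);
        _exit(127);                       // only reached if execve failed
    }
    setpgid(pid, pid);                    // set in the parent too, to avoid a race

    using clock = std::chrono::steady_clock;
    auto deadline = clock::now() + timeout;
    bool term_sent = false;
    int status = 0;
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) return status;      // child finished (or was signalled)
        if (r < 0) return -1;
        if (clock::now() >= deadline) {
            if (!term_sent) {
                kill(-pid, SIGTERM);      // negative pid targets the whole group
                term_sent = true;
                deadline = clock::now() + grace;
            } else {
                kill(-pid, SIGKILL);
                if (waitpid(pid, &status, 0) < 0) return -1;
                return status;
            }
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}
```

Signalling the group rather than the single pid is what takes down compilers, pipelines, or other grandchildren the command started.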
DO:
- Use absolute paths in command (the loader requires it; don't fight it).
- Insert a "--" literal element before any string placeholder for binaries that accept options (rg, grep, find, pgrep, kill, …).
- Set treat_nonzero_exit_as_error: false for tools where non-zero is informational (pgrep, grep, diff).
- Match timeout_ms and max_output_bytes to the worst plausible case for that tool, not a global default. Short-running status tools should have small caps so a hung command doesn't waste the 5-min ceiling.
- Use env_passthrough to pass exactly the env vars the wrapped command needs (HOME, PATH, sometimes LANG, TZ, a credential token). Default [] and grow only when something fails.
- Spend real time on description text: that string is how the model picks WHICH tool to call. Mention edge cases ("returns empty when nothing matches"), expected use ("call this AFTER web(action="search")"), and units ("returns kilobytes").
- Name parameters to match the wrapped CLI's vocabulary (pattern if the binary calls it pattern, not regex).
- Group related tools in one manifest; the --tools allowlist applies after load, so a single big manifest is fine for the operator.
- Validate the manifest dir before deploy: easyai-local --no-tools --external-tools ./tools.d (no model call, just load; exits cleanly if valid, errors emitted to stderr if not).
- Run easyai-server as a dedicated unprivileged user when external tools are in play; the security guarantees stop shell injection, not "runs with your full uid".
DON'T:
- Don't put a placeholder inside a larger string; split into separate elements. ["--flag={x}"] is rejected at load. Use ["--flag", "{x}"].
- Don't reach for the bash builtin to wrap a command you could declare in the manifest. The manifest gives you fork+execve safety, schema validation, hard timeouts, fd hygiene, env hygiene. bash gives you none of those.
- Don't rely on the agent's CWD matching what you think it is; be explicit with cwd: "$SANDBOX" or an absolute path.
- Don't put credentials in argv: they end up in /proc/<pid>/cmdline (world-readable on most distros). Use env_passthrough instead.
- Don't declare more than ~10–15 tools in a single manifest unless the model is large. Every tool's name + description + schema is serialised into the prompt on every turn; too many tools means too much token budget eaten before the user's actual question.
- Don't use the manifest for tools that need shared in-process state (a database connection pool, an HTTP client with a session cookie). Those are C++ tools: Tool::builder().handle(...).
- Don't expose interactive tools (vim, nano, more): stdin is closed, so they'll behave strangely or hang until timeout.
- Don't expose tools that fork-and-exit (daemonising launchers): the daemon's stdout is the now-orphaned pipe, and its first write SIGPIPEs.
# Loads the manifest, prints the resulting tool list, exits.
# Any load error fails the command — wire it into your CI.
easyai-local --no-tools --external-tools ./tools.d --print-models 2>&1 \
  | grep -E "(loaded|error)"
You can also unit-test a manifest from C++:
auto loaded = easyai::load_external_tools_from_json("mytools.json", {});
assert(loaded.error.empty() && "manifest invalid");
assert(loaded.tools.size() == kExpectedToolCount);
for (const auto & t : loaded.tools) {
// sanity-check the auto-generated parameter schema
auto schema = nlohmann::json::parse(t.parameters_json);
assert(schema["type"] == "object");
}
- Not a sandbox. External tools run with the agent's full uid/gid. Network, FS, signals: everything the agent can do, the tool can do. The library closes inheritance leaks; it doesn't isolate.
- Not a process supervisor. No restart-on-failure, no PID file, no log rotation. Each call is a one-shot fork+exec.
- Not async. A tool call blocks the agent loop until it returns or times out. Latency budget = timeout_ms.
- Not stateful. Each call gets a fresh subprocess. If you need state, write a C++ tool with a captured std::shared_ptr to a state object.
The authoritative guide is
RAG.md. The summary below is a quick reference.
The knowledge tools give the agent a tool surface for remembering things across sessions. Under the hood they use a passive RAG technique — keyword-indexed Markdown files the agent saves and searches itself, with no embedding model or vector store. Seven independent tools:
for (auto & t : easyai::tools::knowledge_split_tools("/var/lib/easyai/rag"))
engine.add_tool(std::move(t));
// knowledge_save keywords[], content, fix?
// knowledge_append keywords[], content
// knowledge_search keywords[], max_results=10
// knowledge_load keywords[][1..4]
// knowledge_list prefix?, max=50
// knowledge_delete keywords[]
// knowledge_keywords min_count=1, max=200
Or via the --memory <dir> flag in easyai-server, easyai-cli, and
easyai-local (the legacy --RAG flag is still accepted as an
alias). The systemd-installed server passes
--memory /var/lib/easyai/rag by default.
Keywords ARE the identifier — there is no separate title parameter.
Sorted keywords joined by _ become the filename.
"python async" produces async_python.md. Immutable entries use
the fix- prefix (e.g. fix-async_python.md).
Each entry is one Markdown file in the configured directory:
keywords: async, python
Body content here. Free-form UTF-8 up to 256 KB.
Operator-readable, hand-editable, grep-able.
Constraints: keywords match ^[A-Za-z0-9._+-]+$ (≤ 32
bytes), 1..8 keywords per entry, content ≤ 256 KiB, max 4 loads per
call.
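The keyword-to-filename rule and the constraints above fit in a few lines. The sketch below is an illustration only: `entry_filename` is a hypothetical helper, and the byte-wise sort order and the absence of de-duplication are assumptions, since the text specifies only "sorted keywords joined by _".

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Build the entry filename from its keywords, or return std::nullopt when
// the keywords break the documented constraints (1..8 keywords, each
// 1..32 bytes, characters limited to A-Z a-z 0-9 . _ + -).
std::optional<std::string> entry_filename(std::vector<std::string> keywords,
                                          bool fixed = false) {
    if (keywords.empty() || keywords.size() > 8) return std::nullopt;
    for (const auto & k : keywords) {
        if (k.empty() || k.size() > 32) return std::nullopt;
        for (char c : k) {
            const bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                            (c >= '0' && c <= '9') ||
                            c == '.' || c == '_' || c == '+' || c == '-';
            if (!ok) return std::nullopt;
        }
    }
    std::sort(keywords.begin(), keywords.end());   // ASSUMPTION: byte-wise order
    std::string name = fixed ? "fix-" : "";        // immutable entries get "fix-"
    for (std::size_t i = 0; i < keywords.size(); ++i) {
        if (i > 0) name += '_';
        name += keywords[i];
    }
    return name + ".md";
}
```

Sorting before joining is what makes "python async" and "async python" land on the same file.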
The model is encouraged (in the tool descriptions) to save
aggressively, search before assuming it doesn't know something, and
delete stale entries to keep the index sharp. See RAG.md for the
full workflow including document ingestion, the positive cycle, and
the operator's audit / backup recipes.
Just register on_token:
engine.on_token([](const std::string & piece){
std::cout << piece << std::flush;
});
Pieces are substrings of UTF-8 tokens. Most pieces are full tokens, but multi-byte characters can split across pieces, so buffer if you need character-precise rendering.
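One way to do that buffering is to hold back any incomplete UTF-8 sequence at the end of what has arrived so far. The sketch below is a hypothetical helper, not part of easyai; it assumes the stream as a whole is valid UTF-8 and does not try to repair invalid input.

```cpp
#include <cstddef>
#include <string>

// Number of bytes at the end of `s` that form an incomplete UTF-8
// sequence (0 if `s` ends on a character boundary).
std::size_t incomplete_utf8_tail(const std::string & s) {
    const std::size_t n = s.size();
    // Walk back over continuation bytes (10xxxxxx) to find the lead byte.
    for (std::size_t back = 1; back <= 4 && back <= n; ++back) {
        const unsigned char c = static_cast<unsigned char>(s[n - back]);
        if ((c & 0xC0) == 0x80) continue;       // continuation byte
        std::size_t need = 1;                   // ASCII or unexpected byte
        if      ((c & 0xE0) == 0xC0) need = 2;
        else if ((c & 0xF0) == 0xE0) need = 3;
        else if ((c & 0xF8) == 0xF0) need = 4;
        return need > back ? back : 0;          // sequence still short?
    }
    return 0;
}

class Utf8Buffer {
public:
    // Feed one piece; returns everything that can be rendered so far,
    // always ending on a character boundary.
    std::string push(const std::string & piece) {
        pending_ += piece;
        const std::size_t hold = incomplete_utf8_tail(pending_);
        std::string ready = pending_.substr(0, pending_.size() - hold);
        pending_.erase(0, ready.size());
        return ready;
    }
private:
    std::string pending_;
};
```

Inside an on_token callback you would print `buffer.push(piece)` instead of `piece`; a character split across two pieces is then emitted only once its last byte arrives.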
engine.on_tool([](const easyai::ToolCall & c, const easyai::ToolResult & r){
log_metric("tool_call", { {"name", c.name}, {"is_error", r.is_error} });
});
The callback fires after every dispatched tool, success or failure.
engine.clear_history(); // wipes history + KV cache + sampler state
engine.system("You are now a different assistant.");
If you want to programmatically replay a conversation, e.g. when restoring from a database:
engine.replace_history({
{"system", "You are a helpful assistant."},
{"user", "What's the capital of Brazil?"},
{"assistant", "Brasília."},
{"user", "And of France?"},
});
auto reply = engine.chat(""); // generate the next assistant turn

const easyai::Preset * p = easyai::find_preset("creative");
if (p) engine.set_sampling(p->temperature, p->top_p, p->top_k, p->min_p);
Or to honour a chat-line command from your own UI:
auto pr = easyai::parse_preset(user_line);
if (!pr.applied.empty()) {
engine.set_sampling(pr.temperature, pr.top_p, pr.top_k, pr.min_p);
user_line = user_line.substr(pr.consumed); // strip prefix
}
engine.chat(user_line);

In this chapter, you'll learn to:
- understand what an "AI tool" really is (it's just a C++ function!)
- write a tool that returns today's date
- write a tool that fetches live weather from the internet
- give your agent both tools and watch it answer real questions
- recognise when to reach for the more advanced building blocks
You don't need to know: llama.cpp, JSON Schema, Jinja templates, or anything about how language models work under the hood.
This is the chapter every other chapter has been pointing at. When people ask "what's so cool about easyai?" — this is the answer. You're going to give a small AI model two new abilities in about fifty lines of code, and at the end you'll have a working agent that genuinely reaches out to the internet on your behalf.
There's a finished version of everything below in
examples/recipes.cpp. Build it now so you can compare:
cmake --build build -j --target easyai-recipes
We'll come back to that binary at the end and run it.
Imagine you hire a brilliant intern. They're fast, polite, and they know almost everything — but they joined the company yesterday so they don't know your customer database, they don't have your VPN, and they can't see today's calendar. How do you make them useful?
You give them a phone book of internal services and you tell them: "if anyone asks about a customer, call this number; if they ask about billing, call that one."
That's exactly what a tool is to an AI model. Each tool you register is a phone-book entry with three fields. The model reads the first two; easyai keeps the third for itself:
| Field | What goes here | Read by |
|---|---|---|
| name | A short identifier, e.g. today_is, weather | the model |
| description | One sentence: what does this do, when should I use it? | the model |
| handler | A normal C++ function that gets called for you | easyai |
When the model decides "I should use the weather tool", easyai catches that intent, runs your handler with whatever arguments the model picked, and feeds the result back so the model can finish its answer.
The whole dance, drawn out:
user model easyai your handler
│ │ │ │
│ "What's the weather │ │ │
│ in São Paulo?" ───▶ │ │ │
│ │ "I'll call │ │
│ │ weather(city=…)" ──▶│ │
│ │ │ weather(...) ───▶ │
│ │ │ │ … HTTP call …
│ │ │ ◀──── "São Paulo: │
│ │ ◀──── tool result ────│ ⛅ +24°C" │
│ │ │ │
│ "São Paulo is a │ │ │
│ pleasant 24°C…" ◀────│ │ │
You write the handler. Everything else is automatic.
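If it helps to see the "everything else" as code, the core of the dance is a lookup from tool name to handler. The sketch below uses hypothetical stand-in types (`Result`, `Handler`, `Registry`) invented for this illustration; easyai's real Tool, ToolCall, and ToolResult types do more, including parsing the model's output and feeding the result into the next turn.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins, not easyai's real types.
struct Result {
    bool        is_error;
    std::string text;
};
using Handler = std::function<Result(const std::string & args_json)>;

class Registry {
public:
    void add(const std::string & name, Handler handler) {
        handlers_[name] = std::move(handler);
    }

    // Called when the model asks for tool `name`. The returned text is
    // what would be handed back to the model as the tool's reply.
    Result dispatch(const std::string & name, const std::string & args_json) const {
        auto it = handlers_.find(name);
        if (it == handlers_.end()) return Result{true, "unknown tool: " + name};
        return it->second(args_json);
    }

private:
    std::map<std::string, Handler> handlers_;
};
```

Registering a handler under "today_is" and calling `dispatch("today_is", "{}")` returns that handler's result; asking for a name that was never registered yields an error result the model can react to.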
Most tiny AI models have no idea what today's date is. Their training data ended months (sometimes years) ago. Ask Qwen2.5-1.5B "what's today's date?" and you'll usually get a confident-sounding hallucination.
Let's fix that with eight lines.
easyai::Tool today_is() {
return easyai::Tool::builder("today_is")
.describe("Returns today's date in ISO-8601 format (YYYY-MM-DD, UTC).")
.handle([](const easyai::ToolCall &) {
auto now = std::chrono::system_clock::now();
auto t = std::chrono::system_clock::to_time_t(now);
char buf[16];
std::strftime(buf, sizeof(buf), "%Y-%m-%d", std::gmtime(&t));
return easyai::ToolResult::ok(buf);
})
.build();
}
Read it line by line. There's nothing magical:
- Tool::builder("today_is"): pick a name for the tool. Use snake_case. This is the name the model will speak when it wants to use the tool.
- .describe(...): write a one-line description that you'd give a smart intern. "Returns today's date in ISO-8601 format" is crystal-clear. "Useful for date stuff" would not be.
- .handle(...): the C++ that does the real work. Here it's a little lambda that calls the standard library. No llama.cpp, no JSON, no AI-specific API.
- ToolResult::ok(buf): pack the string into a success result. Whatever you pass here is what the model sees back as the tool's reply.
- .build(): turn the recipe into the actual Tool object.
Tip. The description is the only hint the model has about when to call your tool. Write it for an LLM, not for your IDE. Be specific, give an example output, mention units.
easyai::Engine engine;
engine.model("models/qwen2.5-1.5b-instruct-q4_k_m.gguf")
.add_tool(today_is()) // ← your new tool
.load();
engine.chat("What's the date today?");
That's it. Eight lines for the tool plus three for the wiring, and your agent now has reliable date access.
Try it. Wrap the snippet above in a main(), link against easyai, build, and run. Or just look at examples/recipes.cpp; it's the same code, ready to go.
Today's date is fun, but the real point of giving an AI tools is so it can reach out to systems you control: your database, your APIs, your filesystem, the internet.
Let's write a weather tool. We'll use wttr.in — a free,
no-signup service that takes a city name and replies in plain text:
$ curl 'https://wttr.in/Sao%20Paulo?format=3'
São Paulo: ⛅ +24°C
That's the whole API. Our job is to wrap that in a tool.
We're going to do this in four small steps so nothing feels like a leap.
This time the tool needs a parameter (city). The builder makes
that one extra line:
easyai::Tool::builder("weather")
.describe("Returns the current weather for a city. Backed by wttr.in "
"— free, no API key, plain-text reply.")
.param("city", "string",
"City name, e.g. 'Berlin' or 'Sao Paulo'. Required.",
/*required=*/true)
param(name, type, description, required) is all you ever need.
The valid type values are:
| type | C++ in your handler |
|---|---|
| "string" | std::string via args::get_string_or(...) |
| "integer" | long long via args::get_int_or(...) |
| "number" | double via args::get_double_or(...) |
| "boolean" | bool via args::get_bool_or(...) |
| "array" | parse the JSON yourself |
| "object" | parse the JSON yourself |
Heads-up. Tiny models occasionally forget required parameters. Always validate inside your handler — see step 3.
The model packs the arguments into a JSON blob (e.g.
{"city":"Sao Paulo"}). easyai gives you a tiny scanner so you
don't need a JSON library:
.handle([](const easyai::ToolCall & call) {
std::string city = easyai::args::get_string_or(
call.arguments_json, "city", "");
if (city.empty()) {
return easyai::ToolResult::error("missing 'city' argument");
}
...
That one call to get_string_or replaces the four-line
"declare, get, check, default" pattern you'd write in plain C++.
The full helper menu:
| Helper | Returns… |
|---|---|
| args::get_string_or(json, key, default) | the value, or your default |
| args::get_int_or(json, key, default) | same idea, long long |
| args::get_double_or(json, key, default) | same idea, double |
| args::get_bool_or(json, key, default) | same idea, bool |
| args::has(json, key) | bool: did the model fill it in? |
(There's an older bool args::get_string(json, key, &out) form
that's still around when you need to tell "absent" apart from
"present but empty".)
Anything you can do in C++ goes here: hit a REST API, query SQLite, shell out to a Python script, send a Slack message, ring a bell on the desk next to you. In our case it's an HTTP GET, and libcurl takes about ten lines:
CURL * h = curl_easy_init();
char * escaped = curl_easy_escape(h, city.c_str(), 0); // URL-safe
std::string url = "https://wttr.in/";
url += escaped ? escaped : city.c_str();
url += "?format=3"; // one-line summary
if (escaped) curl_free(escaped);
std::string body;
curl_easy_setopt(h, CURLOPT_URL, url.c_str());
curl_easy_setopt(h, CURLOPT_USERAGENT, "easyai-recipes/0.1");
curl_easy_setopt(h, CURLOPT_FOLLOWLOCATION, 1L);
curl_easy_setopt(h, CURLOPT_TIMEOUT, 15L);
curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, capture_body); // see recipes.cpp
curl_easy_setopt(h, CURLOPT_WRITEDATA, &body);
CURLcode rc = curl_easy_perform(h);
long code = 0;
curl_easy_getinfo(h, CURLINFO_RESPONSE_CODE, &code);
curl_easy_cleanup(h);
Don't panic at the libcurl block: copy and paste it into any tool that needs the network and tweak the URL. The boilerplate is the same every time. Half of your future tools will be exactly this shape.
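The block above refers to a capture_body callback that lives in recipes.cpp. For readers without the file open, a write callback of this kind typically looks like the sketch below. This is an assumption about its shape based on libcurl's documented CURLOPT_WRITEFUNCTION signature; the actual function in recipes.cpp may differ.

```cpp
#include <cstddef>
#include <string>

// libcurl write callback: append each received chunk to the std::string
// that was registered with CURLOPT_WRITEDATA. libcurl treats a return
// value different from the number of bytes received as an error.
std::size_t capture_body(char * data, std::size_t size, std::size_t nmemb,
                         void * userdata) {
    const std::size_t bytes = size * nmemb;
    auto * body = static_cast<std::string *>(userdata);
    body->append(data, bytes);
    return bytes;
}
```

libcurl may call the function several times per response, which is why it appends rather than assigns.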
if (rc != CURLE_OK) {
return easyai::ToolResult::error(
std::string("HTTP transport error: ") + curl_easy_strerror(rc));
}
if (code >= 400) {
return easyai::ToolResult::error(
"wttr.in returned HTTP " + std::to_string(code));
}
return easyai::ToolResult::ok(body);

Two return flavours, only:
- ToolResult::ok(text): the model sees text as the reply.
- ToolResult::error(msg): easyai marks the message as a failure so the model knows to recover (try a different tool, ask the user, apologise).

Why this matters. When a tool errors, well-trained models do the right thing. They don't pretend the call worked. They tell the user "the weather service is unavailable, want me to try again later?" Use
error for anything that isn't a success.
The whole main() is in examples/recipes.cpp:
easyai::Engine engine;
engine.model(model_path)
.context(4096)
.gpu_layers(99)
.system("You are a concise assistant. Use tools whenever they help.")
.add_tool(today_is())
.add_tool(weather())
.on_token([](const std::string & p){ std::cout << p << std::flush; });
if (!engine.load()) {
std::fprintf(stderr, "load failed: %s\n", engine.last_error().c_str());
return 1;
}
engine.chat("What's today's date, and what's the weather in Sao Paulo right now?");

Run it:
$ ./build/easyai-recipes models/qwen2.5-1.5b-instruct-q4_k_m.gguf
[recipes] backend=Metal ctx=4096 tools=2
Today is 2026-04-26. São Paulo currently shows ⛅ +24°C, so light
clothes with a thin layer for the evening should be perfect.
- The model received your one English sentence.
- It noticed it didn't know the date, so it called today_is.
- It noticed it didn't know the weather, so it called weather with {"city":"Sao Paulo"}.
- easyai ran both your handlers, captured both replies, and fed them back into the model.
- The model wove them into one fluent answer.
You wrote two C++ functions. easyai did the rest.
This sub-section is a quick tour of doors you can walk through next. Each is fully optional.
Mix and match types, mark some as optional, use _or helpers to
thread defaults right through:
easyai::Tool::builder("send_alert")
.describe("Push a one-line alert to the on-call channel.")
.param("text", "string", "Message body. Required.", true)
.param("severity", "string", "info | warning | critical. Default 'info'.", false)
.param("notify_now", "boolean", "Page on-call immediately?", false)
.handle([](const easyai::ToolCall & call) {
auto text = easyai::args::get_string_or(call.arguments_json, "text", "");
auto severity = easyai::args::get_string_or(call.arguments_json, "severity", "info");
auto pageNow = easyai::args::get_bool_or (call.arguments_json, "notify_now", false);
if (text.empty()) return easyai::ToolResult::error("missing 'text'");
// … your real send code …
return easyai::ToolResult::ok("alert dispatched.");
})
.build();

The builder makes a flat JSON-Schema (just properties + required).
For 95% of tools that's plenty. Need enums, nested objects, arrays?
Drop down to Tool::make() with a hand-written schema:
engine.add_tool(easyai::Tool::make(
"create_ticket",
"Open a Jira ticket.",
R"({
"type": "object",
"properties": {
"project": { "type": "string" },
"summary": { "type": "string" },
"priority": { "type": "string", "enum": ["P0","P1","P2","P3"] },
"labels": { "type": "array", "items": { "type": "string" } }
},
"required": ["project","summary"]
})",
[](const easyai::ToolCall & call) {
// parse with nlohmann::json (vendored at ../llama.cpp/vendor) …
return easyai::ToolResult::ok("JRA-1234");
}));

Same engine, same callback shape, full schema control.
- 3.2.1 Writing reliable tool descriptions: the contract with the model. Single-action vs. multi-action patterns, the per-param() description style used in-tree, and the tolerance shims (synonym mapping, action inference, error messages that teach) that keep tools robust when the model goes off-spec.
- src/builtin_tools.cpp: the unified web and fs tools and bash. All written with the exact API you've been using. No internal magic; copy any of them as a starting point.
- examples/agent.cpp: every built-in plus a one-liner flip_coin for the shortest possible custom tool.
- 3.3 Sandboxed filesystem tools: expose a directory to the model without giving away the whole disk.
- 3.5 Listening for tool calls: log every dispatch, light up a UI spinner, push to Prometheus.
If you want practice, pick one and tell us what you came up with:
- now(): current time in any timezone (parameter tz).
- coin_flip(): heads/tails (no parameters).
- roll_dice(): count + sides parameters.
- unit_convert(): temp/length/weight; HTTP-free.
- wikipedia_summary(): calls en.wikipedia.org/api/rest_v1/page/summary/<title>.
- slack_post(): your incoming-webhook URL goes in code.
- sqlite_query(): read-only, parameter sql. Sandbox to one DB.
- git_log(): last N commits of a sandboxed repo.
- prometheus_query(): point at your local /api/v1/query endpoint.
- home_assistant(): toggle a light by entity ID. Now you've built the front-end of a smart home.
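To make one of these concrete, here is a minimal sketch of the logic a roll_dice() tool might wrap. The DiceRoll struct, the function names, and the range limits are all hypothetical choices for illustration; the easyai handler wiring (reading count and sides with the args helpers, returning ToolResult::ok or ToolResult::error) is left out so the core stays testable on its own.

```cpp
#include <cstddef>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the individual faces plus their sum.
struct DiceRoll {
    std::vector<int> faces;
    int total = 0;
};

// Rolls `count` dice with `sides` faces each. Returns false (and leaves
// `out` untouched) when the arguments fall outside the illustrative limits,
// which is where a tool handler would return an error result.
bool roll_dice(int count, int sides, std::mt19937 & rng, DiceRoll & out) {
    if (count < 1 || count > 100 || sides < 2 || sides > 1000) return false;
    std::uniform_int_distribution<int> face(1, sides);
    DiceRoll r;
    for (int i = 0; i < count; ++i) {
        const int f = face(rng);
        r.faces.push_back(f);
        r.total += f;
    }
    out = r;
    return true;
}

// Formats a roll as "2 + 5 = 7", a short line the model can read as text.
std::string format_roll(const DiceRoll & r) {
    std::ostringstream os;
    for (std::size_t i = 0; i < r.faces.size(); ++i) {
        if (i) os << " + ";
        os << r.faces[i];
    }
    os << " = " << r.total;
    return os.str();
}
```

Passing the random engine in as a parameter keeps the function deterministic under test: seed it once in main() for real use, or with a fixed value in a unit test.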
You're done with the chapter. Anything you can call from C++, you can hand to your AI agent. That's the entire promise of easyai as a framework — and you have everything you need.
Use this when you want one assistant turn out (no internal tool loop) so you can decide what to do with any tool calls — exactly what the HTTP server does when the client provides its own tools:
engine.push_message("user", "Call get_weather for Tokyo.");
auto turn = engine.generate_one();
if (turn.finish_reason == "tool_calls") {
for (size_t i = 0; i < turn.tool_calls.size(); ++i) {
const auto & [name, args] = turn.tool_calls[i];
std::string result = my_remote_executor(name, args);
engine.push_message("tool", result, name, turn.tool_call_ids[i]);
}
auto final = engine.generate_one(); // model digests tool result
std::cout << final.content;
} else {
std::cout << turn.content;
}

libeasyai-cli is the network-side counterpart of libeasyai. Same
fluent API, same Tool registration model, same agentic loop — the
model runs on a remote /v1/chat/completions endpoint while your
tools execute locally.
// remote.cpp
#include "easyai/client.hpp"
#include "easyai/builtin_tools.hpp"
#include "easyai/plan.hpp"
#include <cstdio>
#include <cstdlib>
#include <iostream>
int main() {
easyai::Client cli;
cli.endpoint("http://ai.local:8080")
.api_key(std::getenv("EASYAI_API_KEY") ? std::getenv("EASYAI_API_KEY") : "")
.model("EasyAi")
.system("You are a planning agent. Be concise.")
.temperature(0.2f)
.top_p(0.92f)
.seed(42);
cli.add_tool(easyai::tools::datetime());
cli.add_tool(easyai::tools::web_search());
cli.add_tool(easyai::tools::web_fetch());
easyai::Plan plan;
plan.on_change([](const easyai::Plan & p){
std::cout << "\n[plan]\n";
p.render(std::cout);
});
cli.add_tool(plan.tool());
cli.on_token ([](const std::string & p){ std::cout << p << std::flush; });
cli.on_reason([](const std::string & p){ std::cerr << p << std::flush; });
cli.on_tool ([](const easyai::ToolCall & call, const easyai::ToolResult & r){
std::fprintf(stderr, "%s %s(%s)\n",
r.is_error ? "✗" : "🔧",
call.name.c_str(),
call.arguments_json.c_str());
});
std::string answer = cli.chat("Resumo dos 3 papers mais citados sobre Mamba este ano.");
if (answer.empty() && !cli.last_error().empty()) {
std::fprintf(stderr, "error: %s\n", cli.last_error().c_str());
return 1;
}
std::cout << "\n";
return 0;
}

CMake, find_package style (after cmake --install):
find_package(easyai 0.1 REQUIRED)
add_executable(remote remote.cpp)
target_link_libraries(remote PRIVATE easyai::cli)

easyai::cli transitively pulls easyai::engine so Tool / Plan /
the easyai::tools::* factories are available without extra link
flags.
Sampling and penalty knobs are all there as fluent setters:
temperature, top_p, top_k, min_p, repeat_penalty,
frequency_penalty, presence_penalty, seed, max_tokens,
stop(vector), extra_body_json (free-form JSON merged last so it can
override anything the typed setters wrote, useful for non-standard
server extensions like {"reasoning_effort":"high"}).
Server management without touching curl:
std::vector<easyai::RemoteModel> models;
cli.list_models(models);
std::vector<easyai::RemoteTool> remote_tools;
cli.list_remote_tools(remote_tools); // GET /v1/tools
if (!cli.health()) std::fprintf(stderr, "down: %s\n", cli.last_error().c_str());
std::string props_json;
cli.props(props_json); // GET /props (raw JSON)
std::string prom_text;
cli.metrics(prom_text); // GET /metrics (Prometheus)
cli.set_preset("creative"); // POST /v1/preset

The easyai-cli binary (examples/cli.cpp) is a
ready-to-run reference for all of the above — REPL or one-shot, every
sampling knob exposed as a flag, seven management subcommands
(--list-models, --list-tools, --list-remote-tools, --health,
--props, --metrics, --set-preset NAME).
This part is the deep dive on easyai::Client. Use this when the
model lives on another machine (or another process) and you want
your code to drive the conversation with locally-executed tools.
That's the canonical "agent" architecture — model is rented, brain
trusts itself, hands stay on your laptop.
- Builds a valid OpenAI /v1/chat/completions request body.
- Streams the SSE response back, splitting delta.content, delta.reasoning_content, and incremental delta.tool_calls into your callbacks as they arrive.
- When the model emits finish_reason="tool_calls", dispatches the matching easyai::Tool in your process, captures the result, and re-issues the request with the tool message appended, repeating until the model emits a non-tool finish_reason.
- Caps the agentic loop at 8 hops (matches Engine::chat_continue).
- Stores the conversation as raw OpenAI-shape JSON strings internally so no JSON type ever leaks through the public ABI.
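The dispatch-and-reissue behaviour in the list above can be sketched without any networking. This is an illustration of the control flow only, not the library's implementation: ModelTurn, LoopResult, and run_agent_loop are hypothetical names, the model and the tools are plain callbacks, and history is a flat list of strings rather than OpenAI-shape JSON.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one assistant turn.
struct ModelTurn {
    std::string finish_reason;            // "stop", "tool_calls", ...
    std::string content;                  // visible reply, if any
    std::vector<std::string> tool_names;  // tools the model asked for
};

struct LoopResult {
    std::string content;
    int hops = 0;
    bool hit_cap = false;   // true when the hop limit ended the loop
};

// Ask the model, run any requested tools, append their results to history,
// and ask again, until the model stops asking or the hop cap is reached.
LoopResult run_agent_loop(
    const std::function<ModelTurn(const std::vector<std::string> &)> & ask_model,
    const std::function<std::string(const std::string &)> & run_tool,
    int max_hops = 8)
{
    std::vector<std::string> history;
    LoopResult res;
    while (res.hops < max_hops) {
        const ModelTurn turn = ask_model(history);
        ++res.hops;
        if (turn.finish_reason != "tool_calls") {
            res.content = turn.content;   // final answer
            return res;
        }
        for (const auto & name : turn.tool_names) {
            history.push_back("tool:" + name + "=" + run_tool(name));
        }
    }
    res.hit_cap = true;   // model kept calling tools; give up
    return res;
}
```

The cap matters for models that loop on a failing tool: without it, one bad tool description can turn a single chat() call into an unbounded series of requests.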
#include "easyai/client.hpp"
#include "easyai/builtin_tools.hpp"
#include "easyai/plan.hpp"
easyai::Client cli;
cli.endpoint("http://ai.local:8080") // any /v1/chat/completions URL
.api_key(std::getenv("OPENAI_API_KEY") ? std::getenv("OPENAI_API_KEY") : "")
.model("EasyAi") // request body 'model' field
.system("You are a planning agent. Be concise.")
.timeout_seconds(86400) // connect + read (24 h — multi-hour agentic sessions)
.http_retries(5) // extra attempts on transient failures (default 5; 0 disables)
.verbose(false); // true = log SSE traffic to stderr

endpoint accepts any HTTP or HTTPS URL. When the build was linked
with OpenSSL (default if libssl-dev is present at configure time)
HTTPS just works. For dev with a self-signed cert:
cli.tls_insecure(true); // skip peer cert verification
// or:
cli.ca_cert_path("/etc/ssl/certs/internal-ca.pem"); // trust a custom CA

Every standard OpenAI / llama-server / easyai-server field is a fluent setter. Pin only the ones you care about; leaving any of them alone keeps the server's default in effect.
cli.temperature(0.2f)
.top_p(0.92f)
.top_k(50)
.min_p(0.03f) // llama-server / easyai
.repeat_penalty(1.04f) // anti-loop default; pass 1.0 to disable
.frequency_penalty(0.05f) // per-token count penalty, [0.0, 2.0]
.presence_penalty(0.1f) // per-token-seen penalty, [-2.0, 2.0]
.seed(42) // deterministic; -1 = randomise
.max_tokens(12288)
.stop({ "\n\nUSER:", "\n\nQ:" });

For non-standard server fields (reasoning_effort, tool_choice,
provider-specific extensions) there's an escape hatch:
cli.extra_body_json(R"({"reasoning_effort":"high","logit_bias":{"50256":-100}})");

The string MUST parse as a JSON object; its keys merge into the request body last, so they override anything the typed setters wrote (handy for emergency one-offs).
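The "merged last" rule is ordinary last-writer-wins. A tiny sketch, assuming the request body is modelled as a map from key to raw JSON value text (Body and merge_last are hypothetical names, not part of the library):

```cpp
#include <map>
#include <string>

// Hypothetical model of a request body: key -> raw JSON value text.
using Body = std::map<std::string, std::string>;

// Copies the typed-setter fields, then lets every key from the extra
// object overwrite or extend them.
Body merge_last(Body typed, const Body & extra) {
    for (const auto & kv : extra) {
        typed[kv.first] = kv.second;
    }
    return typed;
}
```

So a temperature set through the typed setter and again inside the extra JSON ends up with the extra JSON's value, while keys that appear on only one side pass through unchanged.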
Same easyai::Tool type used by Engine. The handler runs in your
process when the model picks the tool.
// Built-in tools (compiled into libeasyai):
cli.add_tool(easyai::tools::datetime());
cli.add_tool(easyai::tools::web_search());
cli.add_tool(easyai::tools::web_fetch());
cli.add_tool(easyai::tools::fs_read_file("/data")); // sandbox to /data
cli.add_tool(easyai::tools::fs_list_dir ("/data"));
// Built-in plan tool — separate object so you can render its state.
easyai::Plan plan;
plan.on_change([](const easyai::Plan & p){
std::cout << "\n[plan]\n";
p.render(std::cout);
});
cli.add_tool(plan.tool());
// Your own tool, inline:
cli.add_tool(easyai::Tool::builder("flip_coin")
.describe("Returns 'heads' or 'tails' with uniform probability.")
.handle([](const easyai::ToolCall &){
return easyai::ToolResult::ok((std::rand() & 1) ? "heads" : "tails");
}).build());

There is no API difference between a Tool registered on Engine
and one registered on Client — your authoring code is portable
across "local model" and "remote model" deployments.
cli.on_token([](const std::string & piece) {
std::fputs(piece.c_str(), stdout);
std::fflush(stdout);
});
cli.on_reason([](const std::string & piece) {
// Optional: render the model's hidden reasoning in dim grey.
std::fprintf(stderr, "\033[2m%s\033[0m", piece.c_str());
});
cli.on_tool([](const easyai::ToolCall & call,
const easyai::ToolResult & r) {
std::fprintf(stderr, "[tool] %s%s -> %s\n",
r.is_error ? "FAIL " : "",
call.name.c_str(),
r.content.substr(0, 120).c_str());
});

on_reason is opt-in by design; many UIs hide reasoning by default
(it's noisy, and some servers don't emit it at all). on_token is
the visible reply; on_tool fires once per dispatched tool round-trip
(call + result already paired).
Composing extra behaviour onto on_tool. Each callback slot is
single-valued — calling cli.on_tool(...) again replaces the
previous handler, it does not chain. If you want to add a checkpoint
or audit step on top of the canonical UI handler that
easyai::ui::Streaming::attach(cli) installs, use the public
forwarder Streaming::notify_tool(call, result) and wrap both:
easyai::ui::Streaming streaming(spinner, stats, style);
streaming.attach(cli); // sets the canonical UI on_tool handler
cli.on_tool([&](const easyai::ToolCall & c,
const easyai::ToolResult & r) {
streaming.notify_tool(c, r); // canonical UI (tool indicator,
// dim styling, plan re-render)
checkpoint_to_disk(cli); // your extra work
});

This pattern is how easyai-cli saves .easyai_session after every
tool dispatch (so a force-exit mid-turn still leaves the
conversation up to the last completed tool on disk).
std::string answer = cli.chat("Resumo dos 3 papers mais citados sobre Mamba este ano.");
if (answer.empty() && !cli.last_error().empty()) {
std::fprintf(stderr, "error: %s\n", cli.last_error().c_str());
std::exit(1);
}

chat() pushes the user message into history, runs the agentic loop,
and returns the final visible content. Successive chat() calls
keep the conversation going (history is preserved). To start over:
cli.clear_history();

For more control (e.g. injecting tool results from outside), use
chat_continue() after pushing your own messages onto history via
the lower-level shape — but chat() is what 99% of agents want.
Each method maps 1:1 to the matching easyai-server route, returns
true on success, and writes diagnostic detail to last_error() on
failure. Together they make the lib enough to script and recreate a
server's state from scratch.
std::vector<easyai::RemoteModel> models;
cli.list_models(models); // GET /v1/models
std::vector<easyai::RemoteTool> tools;
cli.list_remote_tools(tools); // GET /v1/tools (easyai extension)
if (!cli.health()) { // GET /health
std::fprintf(stderr, "down: %s\n", cli.last_error().c_str());
}
std::string props;
cli.props(props); // GET /props (raw JSON)
std::string prom;
cli.metrics(prom); // GET /metrics (Prometheus text)
cli.set_preset("creative"); // POST /v1/preset

Everything above is exposed as flags on examples/cli.cpp.
Read its source to see one possible "wire it all up" pattern; lift
chunks into your own app verbatim.
# REPL with the default tool set (datetime, plan, web_search,
# web_fetch, system_*); EASYAI_URL / EASYAI_API_KEY env vars work too.
easyai-cli --url http://ai.local:8080
# One-shot scripted call with a custom tool whitelist:
easyai-cli --url https://api.openai.com \
--api-key $OPENAI_API_KEY --model gpt-4o-mini \
--tools datetime,plan,web_search,web_fetch \
-p "Investigate today's most-cited mamba arxiv papers; produce a 5-bullet summary."
# Pin sampling + add stop sequences:
easyai-cli --url http://ai.local:8080 \
--temperature 0.0 --top-p 0.9 --seed 42 --stop "USER:" --stop "Q:" \
-p "Translate the next sentence to PT-BR: ..."
# Non-standard reasoning_effort field via --extra-json:
easyai-cli --url https://api.openai.com --api-key $K --model o1-preview \
--extra-json '{"reasoning_effort":"high"}' \
-p "Plan the Mars-mission trajectory."
# List local tools and exit (what the model will be told about):
easyai-cli --url http://x --list-tools
# List server-side tools (easyai-server-only extension):
easyai-cli --url http://ai.local:8080 --list-remote-tools

REPL specials inside the interactive mode:
| Command | Effect |
|---|---|
| /exit /quit | leave |
| /clear | clear conversation history (keep tools + system) |
| /reset | clear history AND clear plan |
| /plan | re-print the plan checklist |
| /tools | list locally-registered tools |
| /help | show specials |
This is the cookbook for adding tools the model can call. Every tool
in libeasyai's built-in set was written exactly the way you'll
write yours.
struct Tool {
std::string name;
std::string description;
std::string parameters_json; // JSON schema
ToolHandler handler; // std::function<ToolResult(const ToolCall &)>;
};

Four fields. The first three feed the chat template's tool-call section so the model knows what's available; the fourth is your callback (a std::function, so lambdas with captures work too).
Builder (the typed shorthand, generates the JSON schema for you):
easyai::Tool::builder("weather")
.describe("Return the current weather for a city, in metric units.")
.param("city", "string", "Name of the city, e.g. 'Lisbon'", /*required=*/true)
.param("units", "string", "'metric' (default) or 'imperial'.", false)
.handle([](const easyai::ToolCall & c) -> easyai::ToolResult {
std::string city = easyai::args::get_string_or(c.arguments_json, "city", "");
std::string units = easyai::args::get_string_or(c.arguments_json, "units", "metric");
if (city.empty()) return easyai::ToolResult::error("'city' is required");
// …call wttr.in…
return easyai::ToolResult::ok("23 °C, sunny");
})
.build();

Tool::make (raw schema string, when you need nested objects /
enums / oneOf that the typed param API can't express):
easyai::Tool::make(
"rgba_set",
"Set the LED RGBA at index.",
R"({"type":"object",
"properties":{
"i":{"type":"integer","minimum":0,"maximum":31},
"color":{"type":"object","properties":{
"r":{"type":"integer"},"g":{"type":"integer"},
"b":{"type":"integer"},"a":{"type":"integer"}
},"required":["r","g","b"]}
},
"required":["i","color"]})",
[](const easyai::ToolCall & c) -> easyai::ToolResult {
// For nested args, parse the JSON yourself; nlohmann is vendored
// by llama.cpp at vendor/nlohmann/json.hpp if you want it.
return easyai::ToolResult::ok("set");
});

easyai::args::* are tiny single-level scanners. They're enough for
~95% of tool authors:
std::string q = args::get_string_or(c.arguments_json, "q", "");
long long max = args::get_int_or (c.arguments_json, "max", 10);
bool dry = args::get_bool_or (c.arguments_json, "dry_run", false);
double t = args::get_double_or(c.arguments_json, "threshold", 0.5);
bool has = args::has (c.arguments_json, "verbose");

For nested args (objects, arrays of objects), include
<nlohmann/json.hpp> in your handler and parse normally — no easyai
limitation there.
return easyai::ToolResult::ok("the answer is 42");
return easyai::ToolResult::error("network unreachable");

error results are tagged is_error=true so the streaming layer can
render them differently (✗ instead of 🔧 in the cli-remote
output). The model still sees the content — it's just hinted that
the call failed.
Best practices:
- Keep ok-content short and structured (the model reads it as plain text; line breaks are fine).
- Truncate raw output to a reasonable budget — 8–16 KB is plenty.
- Format errors as imperative ("missing 'path' argument") — the model will often retry with the fix.
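For the truncation advice, a minimal sketch of a helper that trims raw output to a byte budget. The name truncate_for_model and the wording of the marker line are assumptions made for illustration; the one non-obvious detail is backing up so the cut never lands inside a multi-byte UTF-8 character.

```cpp
#include <cstddef>
#include <string>

// Keeps at most `budget` bytes of `text` and appends a one-line marker so
// the model knows the result is partial. The marker itself is added on top
// of the budget.
std::string truncate_for_model(const std::string & text,
                               std::size_t budget = 8 * 1024) {
    if (text.size() <= budget) return text;
    std::size_t cut = budget;
    // UTF-8 continuation bytes have the form 10xxxxxx. If the first dropped
    // byte is one of them, the cut would split a character, so back up to
    // the start of that character.
    while (cut > 0 &&
           (static_cast<unsigned char>(text[cut]) & 0xC0) == 0x80) {
        --cut;
    }
    return text.substr(0, cut) + "\n[truncated " +
           std::to_string(text.size() - cut) + " bytes]";
}
```

A handler would call this on the raw text just before passing it to ToolResult::ok.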
When the model emits write(file_path=…) and gets unknown tool: write back, the rest of the turn is usually wasted retrying the
hallucinated name. tool_lookup is the read-only escape hatch
PLUS the on-demand "full manual" lookup (since 2026-05-26 it ships
with TWO modes):
engine.add_tool(easyai::tools::tool_lookup([&engine]() {
easyai::tools::ToolCatalog v;
for (const auto & t : engine.tools()) {
v.push_back({ t.name, t.wire_description(), t.description });
}
return v;
}));

Register it last so the snapshot it returns covers everything
else. The lambda re-reads engine.tools() at every call, so even
tools added dynamically after tool_lookup show up. (Client has
the same tools() accessor, so the wiring is identical.)
Two modes:
- No arguments → INDEX view. Numbered list of name: short trigger, one line per tool. Cheap, scannable. What the model sees:

      1. datetime: Return the current UTC and local date/time. Call for 'now'/'today'/'latest'…
      2. web: Search the web and fetch URLs. action=search|fetch. Reply MUST end with a `Sources:` block…
      3. fs: Filesystem: read/write/edit/list/glob/grep in sandbox. Batch with action="ops"…
      4. tool_lookup: List or inspect tools registered this session. No args → index; name="<substring>" → full manual…

- name="<substring>" → MANUAL view. Full multi-line description (rules, examples, edge cases) for every tool whose name matches the substring. This is the expanded help text the model drills into when the index trigger isn't enough.
The split is what makes the Shape-C wire format work end-to-end:
the per-turn <tools> block ships the short trigger only (saves
~2 000 tokens), and tool_lookup(name="fs") returns the full
manual on demand. See AI_TOOLS.md "Shape-C wire shape" for the
wire-side picture.
name="<substring>" is case-insensitive partial match. A no-match
result returns a clear "(no tools match: …)" string rather than an
empty list, so the model never confuses an empty filter with an
empty catalogue.
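The matching rule is small enough to sketch. This is an illustration under stated assumptions, not the tool's source: match_tools is a hypothetical name, the catalogue is reduced to a list of names, and only ASCII case is folded.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// ASCII lower-casing helper for the comparison below.
static std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Returns every catalogue name that contains `needle`, ignoring case.
// An empty result is the case where the real tool answers with its
// explicit "no tools match" text instead of an empty list.
std::vector<std::string> match_tools(const std::vector<std::string> & catalog,
                                     const std::string & needle) {
    std::vector<std::string> hits;
    const std::string n = lower(needle);
    for (const auto & name : catalog) {
        if (lower(name).find(n) != std::string::npos) hits.push_back(name);
    }
    return hits;
}
```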
The companion is the system-prompt tools block: easyai::preamble::tools_block(view) lives in libeasyai and is shared by server,
local, and cli. Together they take the model from "guess and retry"
to "verify first."
The built-in fs_* family takes a root directory and refuses to
escape it (.. and absolute paths are rejected). The check is
path-component aware — a sandbox at /srv/user rejects
/srv/userMALICIOUS/secret (no string-prefix match). Symlinks
resolve through fs::weakly_canonical and the resolved path must
contain the root as a prefix on path-component boundaries.
Last-millisecond symlink swaps (TOCTOU) are defeated by O_NOFOLLOW
on the open() call. Full details in
SECURITY_AUDIT.md §1, §18.3.
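To show why a component-wise comparison differs from a string-prefix test, here is a minimal sketch built on std::filesystem. The function name is_inside_root is hypothetical and this is not the library's code: it covers only the containment check, not the O_NOFOLLOW step at open() time.

```cpp
#include <algorithm>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// True when `candidate`, after normalisation, lies at or below `root_in`.
// Comparison is done path component by path component, so "/srv/user" does
// not accept "/srv/userMALICIOUS". Any filesystem error fails closed.
bool is_inside_root(const fs::path & root_in, const fs::path & candidate) {
    std::error_code ec;
    fs::path root = fs::weakly_canonical(root_in, ec);
    if (ec) return false;
    const fs::path target = fs::weakly_canonical(candidate, ec);
    if (ec) return false;
    // A trailing '/' leaves an empty last component; drop it before comparing.
    if (!root.has_filename()) root = root.parent_path();
    const auto diff = std::mismatch(root.begin(), root.end(),
                                    target.begin(), target.end());
    return diff.first == root.end();   // every root component matched
}
```

weakly_canonical resolves symlinks in the part of the path that exists and normalises ".." in the rest, which is why "/srv/user/../etc/passwd" is rejected even when nothing under /srv exists yet.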
If you're new to easyai's threat model, the operator-facing
60-second TL;DR lives at the top of the audit: what easyai blocks
for you, what's your responsibility, and the three knobs that
matter most. Read SECURITY_AUDIT.md §0
before going to production.
Pattern for your own filesystem-touching tools:
easyai::Tool::builder("read_log")
.describe("Read the last N lines of a service log under /var/log.")
.param("name", "string", "Service name (e.g. 'easyai-server.service').", true)
.param("n", "integer", "How many lines (max 5000). Default 200.", false)
.handle([](const easyai::ToolCall & c) -> easyai::ToolResult {
std::string name = easyai::args::get_string_or(c.arguments_json, "name", "");
if (name.find('/') != std::string::npos)
return easyai::ToolResult::error("name must not contain slashes");
std::filesystem::path p = std::filesystem::path("/var/log") / (name + ".log");
if (!std::filesystem::exists(p))
return easyai::ToolResult::error("no log: " + p.string());
// …tail the file…
return easyai::ToolResult::ok("…");
})
.build();

easyai::Plan is a checklist with four sub-actions exposed as a
single tool:
easyai::Plan plan;
cli.add_tool(plan.tool()); // or engine.add_tool(...)
plan.on_change([](const easyai::Plan & p){
std::cout << "\n=== plan ===\n";
p.render(std::cout, /*color=*/true); // ANSI-styled checklist
});

The model sees ONE tool with an action enum:
| Action | Single-item shape | Batch shape (max 20) |
|---|---|---|
| add | text="…" | items=[{text}, {text}, …] |
| update | id="3", text?="…", status?="working" | items=[{id, text?, status?}, …] |
| delete | id="3" (or id="all" to wipe) | items=[{id}, {id}, …] |
| list | (no fields) | n/a |
Statuses: pending (default on add) → working → done, plus
error (model flags the step as failed) and deleted (soft
delete — the entry stays in the list rendered struck-through, so
the user can see what was abandoned). Terminal rendering with
color=true: bold for active items, dim for done, red for error,
strikethrough+dim for deleted.
The single-tool / multi-action shape is a deliberate trade — it keeps the model's tool-pick fan-out small and lets weak / 1-bit-quant models stay fluent. The tool description tells the model explicitly "never re-add to mutate a step — use update", which closes the duplicate-id failure mode that earlier versions had.
Works reliably with any tool-call-capable model (Qwen 2.5+, Llama 3+, DeepSeek, OpenAI o-series, Anthropic Claude via OpenAI-compat proxies). On non-trivial multi-step tasks, prompt it to "use the plan tool to break the task into steps and tick them off as you go".
You can also seed the plan from your code before letting the model take over:
plan.add("fetch arxiv listing");
plan.add("triage by citation count");
plan.add("draft 5-bullet digest");
// Or programmatically advance / mark error / soft-delete:
plan.update("1", /*text=*/"", /*status=*/"working");
plan.update("2", /*text=*/"triage by citation count + h-index", "");
plan.remove("3"); // marks "deleted"; stays visible, struck through

examples/cli.cpp ships four inline system_* tools that
read /proc/* and report back. The whole pattern is:
- Read a /proc file with ifstream.
- Parse it (helper functions live in namespace systools).
- Format a human-readable string.
- Return ToolResult::ok(text).
These tools turn the cli-remote process into an observability agent that can answer "is the server paging?", "which CPU is hot?", "what swap device is configured?" — entirely model-driven. Look at the file from the top (~line 60) for a guided tour with comments.
To add your own:
- system_disk_usage: df -h worth of info (read /proc/mounts, call statvfs).
- system_processes: ps-equivalent (walk /proc/<pid>/stat).
- system_network: interfaces + traffic counters (/proc/net/dev).
Copy the existing helpers and ship.
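As a starting point for system_network, a minimal sketch of the parse step for /proc/net/dev. IfaceCounters and parse_net_dev are hypothetical names (they are not the helpers in namespace systools). The parser takes any std::istream, so it can be exercised with canned text on machines that have no /proc.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: one interface and its byte counters.
struct IfaceCounters {
    std::string   name;
    std::uint64_t rx_bytes = 0;
    std::uint64_t tx_bytes = 0;
};

// Parses text in the /proc/net/dev layout: two header lines, then one line
// per interface of the form "name: <8 receive fields> <8 transmit fields>".
// Received bytes are field 1 and transmitted bytes are field 9.
std::vector<IfaceCounters> parse_net_dev(std::istream & in) {
    std::vector<IfaceCounters> out;
    std::string line;
    while (std::getline(in, line)) {
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;   // header lines have no ':'
        IfaceCounters c;
        c.name = line.substr(0, colon);
        c.name.erase(0, c.name.find_first_not_of(" \t"));   // trim left padding
        std::istringstream fields(line.substr(colon + 1));
        std::uint64_t v[16] = {};
        int n = 0;
        while (n < 16 && (fields >> v[n])) ++n;
        if (n < 9) continue;                         // malformed line, skip it
        c.rx_bytes = v[0];
        c.tx_bytes = v[8];
        out.push_back(c);
    }
    return out;
}
```

In a tool handler, open std::ifstream on /proc/net/dev, pass it in, format the vector as a few lines of text, and return that through ToolResult::ok.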
Once you call cli.add_tool(...) (or engine.add_tool(...)) the
tool is registered for the lifetime of that object. There's no
"unregister" — destroy the Client/Engine to drop them. This is by
design: the tool list is a property of the conversation contract
(the model was told what's available); changing it mid-flight would
confuse the chat-template renderer.
If you need conditional tools per-conversation, build a fresh
Client for that conversation. Client is move-only; constructing
one is cheap (no I/O until chat()).
The official path on Linux is scripts/install_easyai_server.sh.
Run it from a fresh checkout:
git clone https://github.com/solariun/easy.git
git clone https://github.com/ggml-org/llama.cpp.git
cd easy
sudo scripts/install_easyai_server.sh \
--model /path/to/your-model.gguf \
--webui-title "Box AI" \
--enable-now

It detects the GPU backend (nvidia-smi → CUDA, rocminfo → ROCm,
vulkaninfo / AMD lspci → Vulkan, else CPU), builds the right
flavour, installs the libs into /usr/lib/easyai/ (isolated from
system), creates an easyai system user, and drops a hardened
systemd unit with mlock, flash-attn, q8_0 KV cache, Bearer auth,
and Prometheus /metrics.
libeasyai-cli already speaks HTTPS, but easyai-server itself is
plain HTTP by design. Terminate TLS at nginx:
server {
listen 443 ssl http2;
server_name ai.example.com;
ssl_certificate /etc/letsencrypt/live/ai.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/ai.example.com/privkey.pem;
location / {
proxy_pass http://127.0.0.1:8080;
proxy_http_version 1.1;
proxy_set_header Connection "";
# SSE keepalive — thinking models can hold a stream for many
# minutes between visible tokens; pick a value at least as
# high as the server's --http-timeout (default 600 s).
proxy_buffering off;
proxy_read_timeout 1800;
proxy_send_timeout 1800;
}
}

Then point the client at https://ai.example.com and the build's
OpenSSL link will Just Work.
Run one easyai-server per model on different ports
(--port 8080 / --port 8081). Have your client switch between
them via cli.endpoint(...) — there is no notion of "model swap"
inside a single server process by design.
sudo scripts/install_easyai_server.sh --upgrade

The script does git fetch + git pull --ff-only + rebuild +
systemctl restart easyai-server in that order. In-flight SSE
streams are aborted when the old process dies; the client gets a
HTTP request failed: connection closed error. The client's
built-in HTTP retry layer (--http-retries, default 5) does NOT
re-issue mid-stream because the model has already produced visible
tokens — the upgrade window is briefly visible to active users.
For zero-downtime upgrades, run two backends behind a load
balancer and drain one at a time.
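The retry rule described above (retry transient failures, but never replay a request that has already streamed tokens) can be sketched as a small policy function. Attempt, Outcome, and run_with_retries are hypothetical names used for illustration; the client's actual retry layer is not shown in this manual.

```cpp
#include <functional>

// Hypothetical outcome of one HTTP attempt.
struct Attempt {
    bool ok = false;               // request completed successfully
    bool streamed_tokens = false;  // user-visible tokens were already emitted
};

struct Outcome {
    bool ok = false;
    int  attempts = 0;
};

// Runs `attempt` up to 1 + max_retries times. A failure after tokens have
// been streamed is final, because replaying would duplicate visible output.
Outcome run_with_retries(const std::function<Attempt()> & attempt,
                         int max_retries = 5) {
    Outcome out;
    for (int i = 0; i <= max_retries; ++i) {
        const Attempt a = attempt();
        ++out.attempts;
        if (a.ok) { out.ok = true; return out; }
        if (a.streamed_tokens) return out;   // mid-stream failure: stop here
    }
    return out;   // retries exhausted
}
```

Under this policy a restart that lands before the first token is invisible to the user, while one that lands mid-reply surfaces as an error, which matches the upgrade behaviour described above.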
The INI file supports per-model override sections. When the server
loads a model, it resolves symlinks, strips the path to
basename-without-extension, and does a case-insensitive substring
match against all [MODEL_*] section names. The longest match wins.
Keys are the same as [ENGINE] — temperature, top_p, top_k,
min_p, repeat_penalty, presence_penalty, frequency_penalty,
max_tokens, context, ngl, flash_attn, cache_type_k,
cache_type_v, rope_scaling, rope_freq_scale, yarn_orig_ctx,
split_mode, etc. Only include keys you want to override; omitted
keys keep the [ENGINE] value.
Precedence: CLI flags > MODEL_<match> > [ENGINE] > hardcoded.
Example: loading Qwen3-Coder-Next-Q6_K_M.gguf (or a symlink
ai.gguf pointing at it) matches both [MODEL_Qwen3] and
[MODEL_Qwen3-Coder-Next] — the latter wins because
"Qwen3-Coder-Next" is a longer substring match.
[MODEL_Qwen3-Coder]
temperature = 0.2
top_p = 0.92
top_k = 50
min_p = 0.03
repeat_penalty = 1.04
presence_penalty = 0.1
frequency_penalty = 0.05
context = 262144
rope_scaling = yarn
rope_freq_scale = 2
yarn_orig_ctx = 131072
[MODEL_DeepSeek]
temperature = 0.6
top_p = 0.95
top_k = 40
min_p = 0.05
repeat_penalty = 1.0
presence_penalty = 0.0
frequency_penalty = 0.0
context = 131072

The installer writes these as commented-out examples in
/etc/easyai/easyai.ini.
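The section-matching rule from earlier in this chapter (basename without extension, case-insensitive substring, longest match wins) can be sketched as follows. pick_model_section is a hypothetical name, section names are passed without their MODEL_ prefix, and symlink resolution is left out; the server's own implementation may differ in detail.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>
#include <vector>

// ASCII lower-casing helper for the case-insensitive comparison.
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Returns the longest section suffix that occurs inside the model file's
// stem, or an empty string when none matches (the [ENGINE] values apply).
std::string pick_model_section(const std::string & model_path,
                               const std::vector<std::string> & suffixes) {
    const std::string stem =
        to_lower(std::filesystem::path(model_path).stem().string());
    std::string best;
    for (const auto & s : suffixes) {
        if (s.empty()) continue;
        if (stem.find(to_lower(s)) != std::string::npos &&
            s.size() > best.size()) {
            best = s;
        }
    }
    return best;
}
```

With suffixes Qwen3, Qwen3-Coder-Next, and DeepSeek, a file named Qwen3-Coder-Next-Q6_K_M.gguf selects Qwen3-Coder-Next, while a plain Qwen3 model falls back to the shorter Qwen3 section.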
Stateless except for whatever you put in /var/lib/easyai/
(model files, sandboxed fs_* roots). Snapshot that directory.
- GET /health: JSON status. Cheap, use it as a liveness probe.
- GET /metrics: Prometheus text exposition (only when --metrics was passed). Counters: easyai_requests_total, easyai_tool_calls_total, easyai_errors_total.
- GET /props: full server config snapshot (n_ctx, model alias, build info).
curl -H 'Content-Type: application/json' \
-d '{"preset":"creative"}' \
http://ai.local:8080/v1/preset

Or from the lib:
cli.set_preset("creative");

Affects subsequent requests until changed again. Per-request sampling (set in the request body) still wins for that one call.
--enable-verbose (installer flag) or --verbose (binary flag) makes
the engine log raw model output, parser actions, and SSE events to
stderr. Tail it with journalctl -u easyai-server -f.
scripts/install_easyai_server.sh installs systemd-coredump and
sets LimitCORE=infinity on the unit. When the process dies:
coredumpctl list easyai-server.service
coredumpctl gdb <PID> # opens gdb on the most recent core

-c, --ctx N sets the model's sequence window. Bigger ctx = more
KV cache memory per token. Rule of thumb on Vulkan/RADV with
gfx1035: keep ctx + n_predict ≤ what fits in --ngl auto.
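To get a feel for how the context size drives memory, a back-of-the-envelope estimate helps. The sketch below uses the common approximation of two tensors (K and V) per layer, each holding n_ctx × n_kv_heads × head_dim elements. ModelShape and kv_cache_bytes are hypothetical names and the shape in the usage note is made up; real backends add padding and scratch buffers on top.

```cpp
#include <cstdint>

// Hypothetical description of the attention layout of a model.
struct ModelShape {
    int n_layers;     // transformer blocks
    int n_kv_heads;   // key/value heads (fewer than query heads with GQA)
    int head_dim;     // elements per head
};

// Approximate KV-cache size in bytes for one sequence of n_ctx tokens.
// bytes_per_element is 2.0 for f16; quantised cache types use less.
double kv_cache_bytes(const ModelShape & m, std::int64_t n_ctx,
                      double bytes_per_element = 2.0) {
    return 2.0                       // one K and one V tensor per layer
         * m.n_layers
         * static_cast<double>(n_ctx)
         * m.n_kv_heads
         * m.head_dim
         * bytes_per_element;
}
```

For an illustrative shape of 32 layers, 8 KV heads, and head_dim 128, a 4096-token context comes to 512 MiB at f16, and the figure scales linearly with ctx.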
--cache-type-k q8_0 --cache-type-v q8_0 roughly halves KV memory vs.
default f16 (q8_0 stores about 8.5 bits per value instead of 16) with negligible quality loss for chat workloads. The
installer ships q8_0 by default.
--flash-attn enables fused attention — faster + less memory on
backends that support it. CUDA and Metal: yes. Vulkan: works on
RDNA2+ with recent llama.cpp (validated on gfx1035).
--mlock pins the model in RAM so the OS can't page it out under
pressure. Required on the AI box because GTT-mapped pages would
otherwise be swap candidates. Needs LimitMEMLOCK=infinity in the
systemd unit (the installer sets this).
What each knob does. At every step the model emits a probability
distribution over the whole vocabulary (~100k+ tokens); these knobs
decide how a token is picked from it. They work in sequence — the
cutters (top_k, top_p, min_p) narrow the candidate pool over
the raw distribution, then temperature controls how randomly the
final token is drawn from the survivors.
temperature— focus-vs-risk dial; divides the logits before softmax.→ 0is greedy (always the top token: deterministic).0.2–0.5keeps the model tight on format, syntax, and facts.1.0is the unmodified distribution.> 1.0flattens the curve so unlikely tokens get a real chance — more creative, more prone to error. The main behaviour dial.top_k— fixed tail cut: keep the K most-probable tokens, discard the rest. Non-adaptive; a cheap guardrail against junk from the long tail.top_p(nucleus) — adaptive tail cut: keep the smallest set of top tokens summing to P. Tiny nucleus when the model is confident, large when it's unsure.min_p— adaptive too, but anchored to the top token: keep tokens withprob ≥ min_p × prob_of_top.min_p 0.5keeps only what's within 2× of the best — aggressive, very focused.
They stack — tightening all of them at once is redundant. Practical
rule: pick one adaptive cutter (top_p ~0.9–0.95 or min_p ~0.03–0.1), leave top_k generous as a backstop (50 default), and
use temperature as the real behaviour dial. Low temperature
(0.2–0.6) for code / agentic / structured output; higher (0.8–1.2) for
creative work; lean conservative on heavily quantised models
(quantisation already adds logit noise, and high temperature amplifies
it into real errors).
Presets order (project-wide default is precise):
deterministic— temp 0.0, greedy. Same prompt → byte-identical reply. Reproducibility / CI / eval harnesses.precise— temp 0.2, min_p 0.10. Default. Code, math, factual Q&A, tool-call workloads, structured output.balanced— temp 0.7. General-purpose chat, summarisation, casual Q&A.creative— temp 1.0, top_p 0.95. Brainstorming, fiction, marketing copy.wild— temp 1.4 + relaxed. Pure exploration; don't ship it.
See README.md §Sampling presets for
the full Behaviour / Pick when… table.
Per-request: pin temp + top_p + top_k + min_p in the request body
(via the --temperature / --top-p / etc. flags on cli-remote, or
the matching Client::* setters in code). These reset every turn.
Penalties bias generation against tokens that have already been produced. Three knobs, three failure modes:
| Knob | Form | Default | What it bites on |
|---|---|---|---|
repeat_penalty |
multiplicative on logits in recent window | 1.04 | tight literal repetition ("I'll write X / Let me write X / OK, creating X") |
frequency_penalty |
additive, proportional to token count | 0.05 | over-use of common tokens ("the the the"); range [0.0, 2.0] |
presence_penalty |
additive, fixed cost per token-already-seen at all | 0.1 | topic stickiness without per-occurrence ramp-up |
repeat_penalty (default 1.04) is a light anti-loop safety net for
thinking models that otherwise rephrase the same intent before
acting. It works for short turns. On long agentic flows (10+
tool hops) it starts misfiring — by the fifth fs_read_file call
the literal tokens of the tool name fall inside the window, the
model paraphrases ("read_file", "fs_read"), the dispatcher fails
with "unknown tool".
frequency_penalty (default 0.05) applies an additive cost
proportional to how many times a token has appeared. Unlike
repeat_penalty, the cost grows with each occurrence, so it
penalises frequent re-use harder than a one-off repeat.
Set via [ENGINE] frequency_penalty / --frequency-penalty.
presence_penalty (default 0.1) is the lever for the other
failure mode. A fixed per-token-seen cost discourages re-introducing
the same vocabulary without the per-occurrence ramp of
repeat_penalty, so calling fs_read_file for the tenth time costs
the same as the second.
The production AI box ships repeat_penalty=1.04 +
frequency_penalty=0.05 + presence_penalty=0.1 — a balanced
triple that tested better on long agentic flows than a single heavy
repeat_penalty. Operators with shorter chat workloads can keep
a simpler pairing (repeat_penalty=1.15, others at 0).
Persistence: penalties are set at startup via the INI / CLI flags
([ENGINE] repeat_penalty / presence_penalty / frequency_penalty,
--repeat-penalty / --presence-penalty / --frequency-penalty) and
persist across requests. Per-request set_sampling() only
resets the shapers (temp / top_p / top_k / min_p) — the penalties
stick. This is deliberate: they're operator-tuned guardrails, not
per-call stylistic knobs.
The full design rationale (math, failure modes, layered API) lives
in design.md §4b.
Engine::chat() caps at 8 tool hops; Client::chat() does the
same. A model that runs away calling tools without converging will
hit the cap and bail out with the last partial answer. Visible in
verbose mode as [easyai] hop 7: ….
When ctx_size exceeds the model's native training context, RoPE
scaling lets the model extrapolate. Three Engine setters control it:
rope_scaling(type)—"none"(default),"linear", or"yarn". YaRN is the recommended method for large extensions (2x+).rope_freq_scale(scale)— frequency scale factor.0.0= model default. Pass2to double the effective context.yarn_orig_ctx(ctx)— YaRN original context length.0= use model default. Set to the model's training context (e.g.131072for a 128K-trained model) when extending with YaRN.
CLI / INI equivalents: --rope-scaling, --rope-scale,
--yarn-orig-ctx / [ENGINE] rope_scaling, rope_freq_scale,
yarn_orig_ctx.
The installer defaults ship rope_scaling=yarn, rope_freq_scale=2,
yarn_orig_ctx=131072 — doubling a 128K model to the default
ctx_size=262144.
Controls how model layers distribute across multiple GPUs:
split_mode(mode)—"none"(single GPU),"layer"(default, split layers across GPUs),"row", or"tensor".
CLI / INI: -sm / --split-mode / [ENGINE] split_mode.
Single-GPU setups should use "none" (the installer default).
Multi-GPU rigs benefit from "layer" or "row" depending on the
model size vs. per-GPU VRAM.
Since the rename, two independent binaries cover the two use cases. No more dual-mode flag juggling on a single binary.
| Binary | What it loads | Library link |
|---|---|---|
easyai-local |
a local GGUF (no HTTP at all) | libeasyai |
easyai-cli |
a remote /v1/chat/completions |
libeasyai-cli |
easyai-cli (remote) supports the standard TLS + agentic flags:
| Flag | Default | Effect |
|---|---|---|
--insecure-tls |
off | Skip peer certificate verification (DEV ONLY, https only). |
--ca-cert <path> |
system | Trust a custom CA bundle (PEM) for https:// endpoints. |
--timeout SECONDS |
86400 (24h) | Read+write timeout — sized for multi-hour agentic sessions. The timer only fires on TRUE silence; every SSE delta resets it, so the value isn't a wall-clock budget on the turn, just a "no progress for X seconds" cutoff. EASYAI_TIMEOUT env. |
--http-retries N |
5 | Extra attempts on transient HTTP failures (connect refused, read timeout, 5xx). 0 disables. Logged on stderr. EASYAI_HTTP_RETRIES env. |
Both binaries share the same preset commands, the same /help, and the
same streaming-aware <think> stripper (--no-reasoning on easyai-cli,
--no-think on easyai-local).
# point at easyai-server running on the LAN
./build/easyai-cli --url http://10.0.0.5:8080
# point at openai.com (env vars EASYAI_API_KEY also work)
./build/easyai-cli --url https://api.openai.com/v1 \
--api-key sk-... \
--model gpt-4o-mini
# point at a llama-server / vLLM / ollama endpoint — anything that speaks /v1
./build/easyai-cli --url http://127.0.0.1:11434/v1 --model llama3.1:8bOne-shot mode for scripting:
# Local one-liner; banners go to stderr so capturing stdout is clean.
answer=$(./build/easyai-local -m model.gguf -p "summarise: $(cat file.txt)")
# Remote with reasoning suppressed:
./build/easyai-cli --url http://localhost:8080 --no-reasoning \
-p "explain BGP route reflectors in two sentences" \
> brief.md--no-think (local) and --no-reasoning (remote) strip <think>…</think>
(and <thinking>…</thinking>) blocks from output. The filter is
streaming-aware and works even when the open or close tag is split across
two model-emitted token chunks.
web_search works out of the box — it talks to DuckDuckGo's HTML endpoint
directly via libcurl. There is nothing to configure and no API key.
If DDG starts rate-limiting your IP (rare), the tool returns an explicit
error message instead of silently failing. If you need a different backend
(Bing, Brave, your own SearXNG), the implementation lives in
src/builtin_tools.cpp::web_search() — copy that handler, swap the URL and
the regex pair, and register your variant via engine.add_tool(my_search()).
Pass --ngl 0 (CLI/server) or engine.gpu_layers(0) (lib).
Just don't add it. There is no global "remove" — easyai has no global
state. To run easyai-local without any tools at all:
./build/easyai-local -m … --no-toolsFor the server: --no-local-tools (renamed from --no-tools so the
flag's scope is unambiguous now that easyai-server can also be an
MCP client — --no-local-tools skips the LOCAL toolbelt only,
leaving the knowledge_* tools, external-tools, and any tools fetched via
--mcp intact).
easyai-server is a drop-in replacement for llama-server for almost
every flag a deployment script cares about. A long-running production
launch looks like:
./build/easyai-server \
--model /var/lib/easyai/models/ai.gguf \
--alias SolariunAI_Box \
--host 0.0.0.0 --port 8080 \
--ctx 262144 \
--ngl 99 \
--threads 8 --threads-batch 8 \
--flash-attn \
--cache-type-k q8_0 --cache-type-v q8_0 \
--mlock --no-mmap \
--preset balanced --temperature 0.2 --top-p 0.92 --top-k 50 \
--api-key "$EASYAI_API_KEY" \
--metrics \
--system-file /etc/easyai/system.txt \
--sandbox /var/lib/easyai/workspaceFlag map vs. llama-server:
| llama-server flag | easyai-server flag |
|---|---|
-m / --model |
-m / --model |
--host / --port |
--host / --port |
-a / --alias |
-a / --alias |
-c / --ctx-size |
-c / --ctx |
--n-gpu-layers |
--ngl |
-t / --threads |
-t / --threads |
-tb / --threads-batch |
-tb / --threads-batch |
-fa / --flash-attn |
-fa / --flash-attn |
-ctk / -ctv |
-ctk / -ctv |
--mlock / --no-mmap |
--mlock / --no-mmap |
--api-key |
--api-key |
--metrics |
--metrics |
--reasoning <on/off> |
--reasoning <on/off> |
--override-kv |
--override-kv |
--frequency-penalty |
--frequency-penalty |
-sm / --split-mode |
-sm / --split-mode |
--rope-scaling |
--rope-scaling |
--rope-freq-scale |
--rope-scale |
--yarn-orig-ctx |
--yarn-orig-ctx |
-np / --parallel |
accepted; warns since the engine is single-context |
When --api-key is set, every /v1/* request must carry
Authorization: Bearer <key>. /, /health, and /metrics stay open
(useful for liveness probes and Prometheus scrapes).
/metrics exposes Prometheus-style counters
(easyai_requests_total, easyai_errors_total, easyai_tool_calls_total)
that you can wire into Grafana or alertmanager.
The server speaks plain HTTP and supports CORS. Stick nginx/Caddy in front to add TLS, auth, and rate limiting. Example Caddyfile:
ai.example.com {
reverse_proxy 127.0.0.1:8080
basicauth {
gus $2a$14$… # bcrypt hash of password
}
}
Run one easyai-server per model on different ports, then add a tiny
proxy that maps model field → upstream port. The single-mutex design
inside one server is the right unit; between servers you scale by process.
- Did the GGUF download fully? Check the file size; small files often mean HTML 404 pages.
- Wrong architecture? llama.cpp prints the supported-arch list during load
with
--verbose. Addengine.verbose(true). - On macOS, run
xcode-select --installonce if Metal headers are missing.
Conversations grow until the KV cache fills. Either:
engine.clear_history()between turnsengine.context(8192)for a longer window (subject to model training)- In the server, this can't happen because every request resets the engine.
Smaller models (≤ 1B parameters) often miss the chat template's tool-call
syntax. easyai catches the parser exception and returns the raw text as the
assistant message. To see what the model emitted, set engine.verbose(true).
Move up to a 3-7B model with native tool-calling support (Qwen2.5-Instruct, Llama-3.1-Instruct, Mistral-Nemo) and the issue disappears.
The model invented a tool name that isn't registered. easyai injects a
ToolResult::error("unknown tool: …") into the conversation; usually the
model recovers next turn. If it doesn't, lower the temperature or be
more specific in your system prompt.
A filesystem tool was called with a path outside the root you passed to
fs_read_file("…") etc. By design — pick a wider root or move the file in.
Something inside chat() threw. The engine remains usable for the next
request (we lock + reset on every call). Check engine.verbose(true) and
re-run for stack-level detail in stderr.
It's actually fine — the printed line "stopped cleanly" tells the truth.
Some shells/wrappers report a non-zero code because of the signal, but
main() returned 0.
The server appends an AUTHORITATIVE preamble to whichever system
message reaches the model (--inject-datetime on is the default).
The preamble has up to three blocks:
# AUTHORITATIVE DATE/TIME— current wall-clock + timezone.# KNOWLEDGE CUTOFF— training-cutoff hint + rule to verify post-cutoff facts.# MEMORY VOCABULARY— top-40 keyword index when--memoryis set (so the model can dispatchknowledge_searchwithout first callingknowledge_keywords).
For regression testing the preamble can be disabled per-request without restarting the server:
curl http://ai.local:8080/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'X-Easyai-Inject: off' \
-d '{"model":"easyai","messages":[{"role":"user","content":"What year is it?"}]}'Header values:
off— skip the preamble for this request only.on— force injection on this request even when the server was launched with--inject-datetime off.- (anything else, or absent header) — defer to the server flag.
WHY DEFAULT ON: most production deployments want the model to trust the server clock, flag post-cutoff facts as uncertain, and know what's in its persistent memory. Turning the preamble off removes a real safety net — only do it for A/B QA runs where you're explicitly comparing pre-injection behaviour.
The same builder is exposed as a library API (since 2026-05-16) so third-party hosts of libeasyai get the same behaviour without copying the format:
#include "easyai/preamble.hpp"
std::string preamble = easyai::preamble::build({
/* inject_datetime = */ true,
/* knowledge_cutoff = */ "2024-10",
/* memory_root = */ "/var/lib/myapp/rag",
});
engine.system(default_system + preamble);Each block is conditional: pass inject_datetime=false to skip the
date/time + cutoff blocks (useful when the remote server already
handles them), pass memory_root="" to skip the memory vocabulary
block. The function is stateless — recomputes every call (fresh
date, fresh directory scan for the memory index). Safe to call on
the hot path; the directory scan is ~10-50ms for typical stores.
Configure-time message in the cmake summary:
-- easyai-cli: OpenSSL NOT found — HTTPS endpoints will be rejected at runtime
Install libssl-dev (Debian / Ubuntu) or openssl-devel (Fedora /
RHEL), wipe the build dir, and reconfigure. At runtime the
Client::endpoint("https://…") call will then succeed.
If your server uses a self-signed cert, either:
cli.tls_insecure(true);— DEV ONLY, skips peer verification.cli.ca_cert_path("/path/to/ca.pem");— trust a custom CA bundle.
The cli-remote binary exposes the same as --insecure-tls and
--ca-cert PATH.
The full text after the colon is what cpp-httplib reported. Note
that the client retries transient failures up to 5 times by
default (configurable via --http-retries N / cli.http_retries(n)
/ EASYAI_HTTP_RETRIES); each retry logs to stderr without
--verbose:
[easyai-cli] HTTP attempt 1/6 failed (Could not establish connection); retrying in 250ms
[easyai-cli] HTTP attempt 2/6 failed (Could not establish connection); retrying in 500ms
…
[easyai-cli] HTTP attempt 6/6 failed (…) — retry budget exhausted
If you see the budget-exhausted line, the underlying cause is one of:
Connection refused— the server isn't listening on that host/port. Check--urlvalue andnc -vz host port.SSL handshake failed— TLS mismatch. Check the cert hostname matches what you're connecting to, the chain is complete, and that your client's CA store has the issuer (or pass--ca-cert).read timeout/Failed to read connection— the model is taking longer than--timeout. Default is now 86400 s (24 h), raised from the older 600 s default specifically to accommodate thinking models with long deliberation phases. Bump further if needed (--timeout 3600orcli.timeout_seconds(3600)); also bump the server side (easyai-server --http-timeout 3600) so the listen socket matches.
Retries do NOT fire mid-stream — once the model has emitted any visible token the layer surfaces the partial response instead of re-issuing (which would duplicate output). For mid-stream cuts the fix is the timeout, not the retry budget.
The cumulative ctx counter on easyai-server's webui needs the new
ctx_used field that the server only added in commit d7f638e.
On older builds you'll see the per-request count instead — upgrade
the server or just ignore the bar's percentage.
After 2-3 successful tool calls some Qwen3 fine-tunes give up on the
XML format and output *🔧 toolname(args)* in markdown instead.
Engine recovers automatically (commit 46903e3); look for this line
in journalctl -u easyai-server:
[easyai] recovered N tool call(s) from markdown markers (model abandoned <tool_call> syntax — agentic loop continues)
If you see the message and the loop continues, you're fine. If not,
add --enable-verbose and check journalctl for [easyai] hop N raw tail: lines — those show what the model actually emitted, which
helps tune the system prompt.
DuckDuckGo's HTML endpoint serves a CAPTCHA / "anomaly" page when it suspects a bot. Wait a minute, lower request rate, or use a different network. No API key option exists — that's the point of the DDG-HTML approach.
Knowledge cutoff is real and the model can't tell what date it is
unless told. Easiest fix: enable datetime in the tool list and
prompt it to call that first when in doubt. An even harder
constraint can be enforced at the server level — see the upcoming
"authoritative datetime injection" feature on easyai-server (commit
soon to follow).
If you want to go deeper:
design.md— internal architecture and "why" decisions, including Section 0 (full dependency inventory: llama, cpp-httplib, nlohmann::json, libcurl, OpenSSL, …) and Section 5b (the OpenAI-protocol client agentic loop).include/easyai/engine.hpp— every public method of the local engine, with doc comments.include/easyai/client.hpp— every public method of the OpenAI client lib, mirroringengine.hppshape.include/easyai/tool.hpp—Tool,ToolCall,ToolResult,Tool::Builder(used identically byEngineandClient).include/easyai/plan.hpp—Planchecklist +Plan::tool()factory.include/easyai/builtin_tools.hpp— factories fordatetime,web_search,web_fetch,fs_*.include/easyai/presets.hpp— sampling presets and the runtime override parser (/temp,creative 0.9, …).src/engine.cpp— thechat()loop is annotated step by step; three-layer tool-call recovery (Qwen / Hermes / markdown) lives inparse_assistant.src/client.cpp— HTTP/SSE transport, agentic loop mirroringEngine::chat_continue, request-body assembly with the full sampling/penalty surface.src/plan.cpp— multi-action plan tool withadd/start/done/list.examples/server.cpp— the per-request flow is annotated; great starting point for a custom HTTP layer.examples/cli.cpp— REPL + management subcommands + inlinesystem_*tools, doubles as the cookbook for adding your own tool to aClient-based agent.scripts/install_easyai_server.sh— production deployment as a hardened systemd unit on Linux (CUDA / ROCm / Vulkan / CPU auto-detect, mlock, flash-attn, q8_0 KV).cmake/easyaiConfig.cmake.in— the find_package shim;find_package(easyai 0.1 REQUIRED)returnseasyai::engine(libeasyai) andeasyai::cli(libeasyai-cli) as IMPORTED targets your project links against.SESSION_NOTES.md— running project journal: recent commits, pending validations, common pitfalls. Useful for resuming context in a fresh chat.README.md— top-level pitch + selective-build cheatsheet.
Happy hacking.