Releases: solariun/easyai
v0.6.0 — memory tool, MTP speculative decoding, multi-arch CI
90 commits since v0.5.4 — 55 files, +11926 / −2104. Highlights below.
Memory tool (passive RAG)
The rag tool has been renamed to memory, and the underlying mechanism is now framed as a passive RAG technique. The tool description now carries INVIOLABLE rules ("save what you learn", "memory before internet", "skills are first-class"), and the system prompt enforces a strict information pipeline: memory → web → answer. The keyword vocabulary built from saved memories is now auto-injected into every prompt (server and local), so the model knows which topics it can recall before it tries. The verify/save/re-verify loop that the model could fall into when --memory was enabled no longer occurs.
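The vocabulary injection can be pictured roughly as follows. This is a minimal Python sketch, not easyai's code: the memory layout (a title mapped to a keyword list), the names build_memory_vocabulary / inject_vocabulary, and the wording of the injected line are all hypothetical.

```python
from typing import Dict, List

def build_memory_vocabulary(memories: Dict[str, List[str]]) -> List[str]:
    """Collect a sorted, de-duplicated keyword list from saved memories.

    `memories` maps a memory title to its keywords; this layout is a
    hypothetical stand-in for however easyai actually stores memories.
    """
    vocab = {kw.strip().lower() for kws in memories.values() for kw in kws if kw.strip()}
    return sorted(vocab)

def inject_vocabulary(system_prompt: str, vocab: List[str]) -> str:
    """Append the vocabulary so the model knows what it can recall before it tries."""
    if not vocab:
        return system_prompt  # nothing saved yet: leave the prompt untouched
    return system_prompt + "\n\nMemory topics available for recall: " + ", ".join(vocab)
```

Because the list is rebuilt from saved memories, a newly saved memory widens the advertised topics on the next prompt without any change to the prompt template.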
MTP speculative decoding
Multi-token-prediction (MTP) speculative decoding is now actually wired through the decode loop; previously it only set the parameters. ctx_tgt/ctx_dft are now passed into params.speculative.draft before common_speculative_init, and the draft KV cache is trimmed both between draft() and process() and at the top of each MTP iteration, clearing the M-RoPE position drift left behind by rejected drafts. Configure it via --spec-type draft-mtp and --spec-draft-n-max, or use the installer shortcut --mtp. Speculative decoding with draft-simple is now wired as well.
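For readers new to the draft/verify cycle, here is greedy speculative decoding in plain Python, including the draft-cache trimming mentioned above. It is an illustration under stated assumptions, not the llama.cpp or easyai implementation: both "models" are toy next-token functions, the KV cache is modelled as a list of processed tokens, and speculative_step is a hypothetical name. MTP typically drafts with the model's own extra prediction heads rather than a separate draft model, but the accept/reject logic is the same.

```python
from typing import Callable, List

# A "model" here is just a greedy next-token function over the context so far.
NextToken = Callable[[List[int]], int]

def speculative_step(ctx: List[int], draft_cache: List[int], draft: NextToken,
                     target: NextToken, n_max: int) -> List[int]:
    """Run one greedy draft-then-verify round; extend ctx and return the new tokens."""
    base = len(ctx)
    # Top of the iteration: never keep draft positions beyond the committed
    # context, then catch the draft up on tokens it has not processed yet
    # (e.g. the target's correction token from the previous round).
    del draft_cache[base:]
    draft_cache.extend(ctx[len(draft_cache):])

    # Draft phase: the cheap model proposes up to n_max tokens.
    proposal: List[int] = []
    for _ in range(n_max):
        tok = draft(ctx + proposal)
        proposal.append(tok)
        draft_cache.append(tok)

    # Verify phase: keep draft tokens while the target agrees with them.
    accepted: List[int] = []
    for tok in proposal:
        want = target(ctx + accepted)
        if want != tok:
            accepted.append(want)  # first disagreement: take the target's token
            break
        accepted.append(tok)
    else:
        accepted.append(target(ctx + accepted))  # all accepted: one bonus token

    # Trim rejected draft positions so they cannot leak into the next round.
    n_ok = len(accepted) - 1  # draft tokens that matched (correction/bonus excluded)
    del draft_cache[base + n_ok:]
    ctx.extend(accepted)
    return accepted
```

Every kept token equals what the target would have produced greedily, so the output matches plain greedy decoding. In a real engine the verify loop is one batched target pass, which is where the speedup comes from.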
Multi-arch build pipeline
.github/workflows/build.yml and release.yml ship CI for x86_64, aarch64, and armhf, with cross-compile toolchain files under cmake/toolchains/. llama.cpp is cloned as a sibling rather than checked out into the workspace (actions/checkout rejected paths outside it).
CLI session persistence + shell mode
.easyai_session now checkpoints after every tool dispatch, so a forced exit leaves the session intact. The --continue (opt-in) and --compress flags landed alongside a /compress REPL command. Every [cli] knob is now settable in the INI, with layered $HOME/.easyai/ lookup and a resources/ subdirectory. The new --session-file <name> overrides the default session file, and --no-local-session seeds from the session read-only. --shell mode adds a green prompt and a green prompt-eval dot.
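One common way to make a per-dispatch checkpoint survive a forced exit is write-to-temp-then-rename. The Python sketch below assumes that approach and a JSON message list; the notes do not say how .easyai_session is actually written, and checkpoint_session is a hypothetical name.

```python
import contextlib
import json
import os
import tempfile

def checkpoint_session(path: str, messages: list) -> None:
    """Atomically replace the session file with the current message list."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(prefix=".session-", suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(messages, f)
            f.flush()
            os.fsync(f.fileno())  # make the bytes durable before the rename
        os.replace(tmp, path)     # readers see either the old or the new file
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
```

If the process dies mid-write, only the temp file is incomplete; the previous checkpoint stays readable.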
Filesystem tool overhaul
fs gained edit, append, and ops (batches of up to 20 operations). edit displays the diff, auto-inserts a seam newline (a HIGH-severity fix for silent file corruption), and reports the post-edit window. read is line-based with start_line, shows line numbers by default, and reports the total line count. glob matches top-level files. Dispatch in read/list/grep is friendlier, with file-vs-directory hints. A green-dot tool marker shows when a tool fires.
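The seam-newline fix is easiest to see on a line-range splice. This hypothetical edit_lines helper (Python, not the fs tool's actual code) replaces lines start..end and returns the new text plus a unified diff. Without the guard, replacing line 2 of "a\nb\nc\n" with "B" would produce "a\nBc\n".

```python
import difflib
from typing import Tuple

def edit_lines(text: str, start: int, end: int, new: str) -> Tuple[str, str]:
    """Replace 1-based inclusive lines start..end with `new`; return (result, diff)."""
    lines = text.splitlines(keepends=True)
    head, tail = lines[:start - 1], lines[end:]
    # Seam guard, head side: if the text before the edit lacks a final newline
    # (appending past the last line), add one so `new` starts on its own line.
    if head and new and not head[-1].endswith("\n"):
        head[-1] += "\n"
    # Seam guard, tail side: if content follows, `new` must end in a newline,
    # otherwise its last line silently fuses with the next line.
    if tail and new and not new.endswith("\n"):
        new += "\n"
    result = "".join(head) + new + "".join(tail)
    diff = "".join(difflib.unified_diff(
        lines, result.splitlines(keepends=True), "before", "after"))
    return result, diff
```

Calling it with start one past the last line (and end equal to the last line) appends; the head-side guard then covers the append case of the same bug.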
Tools surface — split by default
--tools-mode {unified,split,both} selects the tool surface; split exposes one tool per verb. The default flipped from unified to split, so each verb now appears as its own tool. Tool descriptions were rewritten to be direct (~415 lines cut), and the system "Tool notes" block is now a one-line index.
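A rough model of what the three modes expose, in Python. The `<tool>_<verb>` naming for split tools and the tool_surface name are assumptions made for illustration; the notes only say that each verb becomes its own tool.

```python
from typing import Dict, List

def tool_surface(tools: Dict[str, List[str]], mode: str = "split") -> List[str]:
    """List the tool names exposed to the model for a given --tools-mode."""
    unified = sorted(tools)                        # e.g. "fs", verb chosen by an argument
    split = sorted(f"{name}_{verb}"                # e.g. "fs_read", one tool per verb
                   for name, verbs in tools.items() for verb in verbs)
    if mode == "unified":
        return unified
    if mode == "split":
        return split
    if mode == "both":
        return unified + split
    raise ValueError(f"unknown tools mode: {mode!r}")
```

Split mode trades a longer tool list for schemas that each describe a single action, which tends to be easier for smaller models to call correctly.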
Web search — cascading backends
Web search now cascades through five engines in order: google → brave → ddg-lite → bing → ddg. All backends switched to a Netscape 4.79 user-agent (dropped the Chrome XHR persona). New backends: brave HTML, ddg-lite, and bing RSS.
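The cascade amounts to "first backend that answers wins". A minimal sketch, assuming a backend falls through on either an error or an empty result page (the notes do not specify which conditions trigger fallback); the search callables are stand-ins, not easyai's backends.

```python
from typing import Callable, List, Sequence, Tuple

# Order from the release notes; each entry pairs a backend name with a callable.
BACKEND_ORDER = ["google", "brave", "ddg-lite", "bing", "ddg"]
Backend = Tuple[str, Callable[[str], List[str]]]

def cascading_search(query: str, backends: Sequence[Backend]) -> Tuple[str, List[str]]:
    """Return (backend_name, results) from the first backend that yields results."""
    failures = []
    for name, search in backends:
        try:
            results = search(query)
        except Exception as exc:  # network error, block page, parse failure
            failures.append(f"{name}: {exc}")
            continue
        if results:
            return name, results
        failures.append(f"{name}: no results")
    raise RuntimeError("all search backends failed (" + "; ".join(failures) + ")")
```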
Prompt + preamble
The AUTHORITATIVE preamble is now centralised in libeasyai (easyai::preamble::build), so server, local, and cli share one builder. The CITE SOURCES rule, which the model had been ignoring, is now triple-reinforced across the prompt and the web tool. Scope and citation rules are tighter. The model is now told to stop when it has enough information; the user can always refine.
WebUI
Tool display shows the diff and metrics; prompt-eval has its own format. The spinner shows instantaneous t/s. The dark-blue speed report now emits on tool dispatch, speed drop, and finish again (fixing a regression). Conversations get an always-visible X button for deletion. The tone/preset chip is hidden; the UI pins to the server-launched preset. AI Box logo: green → blue gradient with a softer two-layer cyan aura, icon 50% larger. The thinking-label shimmer sweep is restored and re-enters on every multi-hop prompt-eval pass. Per-batch prompt-eval progress mirrors llama-server's prompt_progress.
Engine recovery + server compat
The engine now recovers from incomplete-tool turns across requests and recovers bare <toolname> tool-call tags. Zero-work retries draw on the max_incomplete_retries budget; the server no longer flags zero-work turns as incomplete; the CLI no longer persists incomplete turns to .easyai_session. A GPU DeviceLost during KV cache clear is now caught instead of crashing the server. New --chat-template-file and --reasoning-format flags pass through to llama-server; qwen3-think.jinja ships as an editable reference template (refreshed on upgrade).
Installer
install_easyai_server.sh gained a --mdns-hostname flag (default ai → ai.local) and now defaults it to the current system hostname. --mtp turns on MTP with a single flag. Restart attempts are capped at 2. ttm.pages_limit now upserts an existing value instead of skipping it. A system.txt_template ships, but no active system.txt is installed. Backticks in unit/INI heredocs are now escaped; they had been executing as command substitution. --force now performs a total clean rewrite. The [ENGINE] block documents spec_type / spec_draft_n_max. The service ExecStart no longer carries -m; the model path is set only in the INI.
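The ttm.pages_limit change turns "skip if present" into an upsert. Sketched in Python for a line-oriented key=value file; the real installer is a shell script, and both the file format and the upsert_setting name here are assumptions.

```python
def upsert_setting(text: str, key: str, value: str) -> str:
    """Set key=value in a line-oriented config: replace if present, else append."""
    out, found = [], False
    for line in text.splitlines():
        if "=" in line and line.split("=", 1)[0].strip() == key:
            if not found:
                out.append(f"{key}={value}")  # overwrite the first occurrence in place
                found = True
            continue  # drop any later duplicates of the same key
        out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

Running it twice with the same arguments yields the same file, which is what makes re-running the installer safe.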
Security
Seventh security pass landed: 1 HIGH, 1 MEDIUM, 1 LOW. The fs(action="edit") seam-newline fix above was the HIGH. SSL-only httplib calls are now guarded behind CPPHTTPLIB_OPENSSL_SUPPORT. Installer heredoc backtick escaping (above) closed a command-substitution path.
Plan tool
Description and schema tightened to remove ambiguity. Step text hard-capped at 80 chars; numbered-list anti-pattern is rejected.
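A validator enforcing both plan rules might look like this. The 80-character limit comes from the notes; the regular expression for the numbered-list anti-pattern and the name validate_plan_step are guesses at what "numbered list" means here.

```python
import re

MAX_STEP_CHARS = 80
# Hypothetical pattern for a leading list marker: "1. ", "2) ", or "Step 3".
NUMBERED = re.compile(r"^\s*(\d+[.)](\s|$)|step\s+\d+\b)", re.IGNORECASE)

def validate_plan_step(text: str) -> str:
    """Return the trimmed step text, or raise ValueError explaining the rejection."""
    step = text.strip()
    if not step:
        raise ValueError("step text is empty")
    if len(step) > MAX_STEP_CHARS:
        raise ValueError(f"step text is {len(step)} chars; the limit is {MAX_STEP_CHARS}")
    if "\n" in step or NUMBERED.match(step):
        raise ValueError("one step per entry: do not number steps or embed a list")
    return step
```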
Other
- easyai-cli: first Ctrl+C stops generation, second quits the REPL
- easyai-cli: --continue flipped back to opt-in (default OFF)
- build_macos.sh: auto-clone llama.cpp sibling, --upgrade flag, --install for system-wide install
- memory_append: upsert behaviour
- fs.glob/rag/memory: accept natural strings, tolerate stringified-array args
- Snprintf truncation warning on gcc 15 fixed
- Docs refresh: MTP, sampling knobs (temperature/top_k/top_p/min_p), memory vocab auto-injection, audit count (3/4 → 7), brand SVG inlined in server.cpp (xxd codegen step dropped)
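"Tolerate stringified-array args" covers models that send a JSON array encoded as a string. A possible normalizer, sketched in Python under a hypothetical name (normalize_list_arg): it accepts a real list, a string holding a JSON array, or a plain natural-language string.

```python
import json
from typing import Any, List

def normalize_list_arg(value: Any) -> List[str]:
    """Coerce a tool argument into a list of strings."""
    if value is None:
        return []
    if isinstance(value, list):
        return [str(v) for v in value]
    if isinstance(value, str):
        text = value.strip()
        if text.startswith("["):
            try:
                parsed = json.loads(text)  # '["*.py", "*.md"]' becomes a real list
            except json.JSONDecodeError:
                parsed = None
            if isinstance(parsed, list):
                return [str(v) for v in parsed]
        return [text] if text else []  # natural string: a single item
    return [str(value)]
```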
Full Changelog: v0.5.5...v0.6.0
v0.5.4 — `python3` result rendered with the executed snippet
The tool result returned by python3 now opens with a fenced
python code block carrying the snippet that just ran, followed
by a [python3 executed] notification line, then the exit code and
captured output. Chat UIs that render markdown (the embedded webui,
typical chat clients) display the code with syntax highlighting, so
an operator skimming the conversation transcript can see what
executed without having to expand the raw tool-call JSON.
Result shape
~~~
```python
<the snippet>
```
[python3 executed]
exit=0
<captured stdout+stderr>
~~~
Notes
- The model's code argument is what gets rendered. The kPythonSandboxPreamble (the disk-restriction monkey-patch added in v0.5.2) is deliberately stripped from the displayed source; only the user-supplied snippet appears in the block, so transcripts stay readable.
- Spawn-side errors (pipe / fork failure, where the interpreter never ran) still surface unwrapped, so the error message stays the actual cause and isn't dressed up with a misleading "executed" notice.
- The OUTPUT section of the tool description now documents the result format, so the model knows it doesn't need to add its own markdown wrapping.
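The result shape and notes above can be summarized in a short formatter. This is an illustrative Python sketch with hypothetical names (format_python3_result, spawn_error), not easyai's C++ code; the sandbox preamble never enters it because only the model's code argument is passed in.

```python
from typing import Optional

FENCE = "`" * 3  # a markdown code fence, built here to keep this example readable

def format_python3_result(code: str, exit_code: int, output: str,
                          spawn_error: Optional[str] = None) -> str:
    """Render a python3 tool result: fenced snippet, notice, exit code, output.

    `code` is the model's own argument; any sandbox preamble is prepended only
    when executing, so it never reaches this function.
    """
    if spawn_error is not None:
        # The interpreter never ran: return the real cause, no "executed" notice.
        return spawn_error
    shown = code.rstrip("\n")
    return (f"{FENCE}python\n{shown}\n{FENCE}\n"
            f"[python3 executed]\nexit={exit_code}\n{output}")
```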
Smoke-tested 4/4: oneliner, multiline (preserves indentation),
denied-disk (preamble still triggers, error appears below the
code block), missing-arg (returns unwrapped error, no exec).
3 files changed, 99 insertions(+), 1 deletion(-).
v0.5.3 — Server METRICS line always-on, default every 5 minutes
The periodic METRICS log line in easyai-server is now emitted
unconditionally, not just under --verbose. Default interval moved
from 1 second to 300 seconds (5 minutes) so the line is low-
overhead enough to leave on permanently in production.
Why
Operators need CPU / memory / GPU / load / TCP-state / TIME_WAIT-
pressure telemetry in journalctl whether or not they're chasing a
debug session. Gating on --verbose meant production deployments
that left verbose off were flying blind on the slow-moving
signals (RSS creep, ephemeral-port exhaustion, fd usage) that
catch problems before they surface as visible failures.
Behaviour shift
- metrics_thread starts whenever metrics_interval > 0 (previously it also required --verbose).
- metrics_interval default 1 → 300 seconds.
- --verbose now governs only the per-request → / ← arrival / completion lines. The METRICS line is independent.
- Installer easyai.ini template bumped from metrics_interval = 60 to metrics_interval = 300 to match.
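The decoupled loop can be sketched with a stop event whose wait() doubles as an interruptible sleep. Python here for brevity; start_metrics_thread and the emit callback are hypothetical stand-ins for metrics_thread and the METRICS line formatter.

```python
import threading
from typing import Callable, Optional

def start_metrics_thread(interval_s: float, emit: Callable[[], None],
                         stop: threading.Event) -> Optional[threading.Thread]:
    """Start the periodic METRICS emitter; nothing here consults a verbose flag."""
    if interval_s <= 0:
        return None  # metrics_interval = 0 disables the line entirely

    def loop() -> None:
        # Event.wait() returns False on timeout (time to emit) and True once
        # stop is set, so shutdown never waits out a full interval.
        while not stop.wait(interval_s):
            emit()

    thread = threading.Thread(target=loop, name="metrics", daemon=True)
    thread.start()
    return thread
```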
Migration
Existing operators who pinned [SERVER] metrics_interval in their
INI keep their value — only the unspecified default shifts. Bump
DOWN (60, 30, 5) when actively troubleshooting; set 0 to disable.
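The migration rule (a pinned value wins, otherwise 300, and 0 disables) maps directly onto a configparser fallback. A sketch assuming the [SERVER] section layout shown in these notes; metrics_interval_from_ini is a hypothetical name.

```python
import configparser

DEFAULT_METRICS_INTERVAL = 300  # seconds; the new unspecified default

def metrics_interval_from_ini(ini_text: str) -> int:
    """Resolve [SERVER] metrics_interval: a pinned value wins, else 300; 0 disables."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    # fallback also covers a missing [SERVER] section, not just a missing key.
    return cfg.getint("SERVER", "metrics_interval", fallback=DEFAULT_METRICS_INTERVAL)
```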
Smoke-tested
10 METRICS ticks fired in 30s with --metrics-interval=2 and no
--verbose. The default-300 banner renders correctly when no
flag is passed.
5 files changed, 117 insertions(+), 39 deletions(-).
v0.5.2 — `python3` default-on + sandbox-rooted disk surface
Promotes the python3 tool (added in v0.5.1) from explicit
opt-in to default-on whenever the operator has signalled "the
model can touch files" — same gate as fs: --sandbox set OR
--allow-bash on. The embedded webui in easyai-server inherits
this for free since the systemd unit always ships with
--sandbox /var/lib/easyai/workspace.
Highlights
- python3 defaults ON. The flag table now mirrors fs: it auto-registers when --sandbox is set or --allow-bash is on. --allow-python is gone; --no-python is the new opt-out (and [SERVER] allow_python defaults on; set it to off in the INI for the same effect).
- Disk access is mechanically restricted to the sandbox root via a ~25-line Python preamble auto-prefixed to every snippet. The preamble monkey-patches builtins.open / io.open / os.open to reject any path whose realpath resolves outside the cwd Python was chdir'd into. open("/etc/passwd") raises PermissionError with a message pointing the model to fs(action=...).
- Description rewritten to forbid disk use. The new prose reads as a contract: "USE FOR testing, calculation, data processing, networking, information gathering. NEVER USE FOR DISK — every disk operation has a fs(action=...) equivalent."
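The preamble's idea can be reproduced in a few lines of Python. This is a sketch in the spirit of kPythonSandboxPreamble, not its actual text: install_sandbox_guard is a hypothetical name, the check uses os.path.commonpath against a root that defaults to the current directory, and integer file descriptors pass through so already-open streams keep working. Like the real preamble, it guards against accidents only; ctypes or subprocess bypass it.

```python
import builtins
import io
import os
from typing import Callable, Optional

def install_sandbox_guard(root: Optional[str] = None) -> Callable[[], None]:
    """Patch open()/io.open()/os.open() to refuse paths outside `root`.

    `root` defaults to the current working directory; relative paths are
    resolved against the cwd, so the two should match. Returns a function
    that restores the original callables.
    """
    root = os.path.realpath(root if root is not None else os.getcwd())
    orig_open, orig_io_open, orig_os_open = builtins.open, io.open, os.open

    def check(path) -> None:
        if isinstance(path, int):
            return  # raw file descriptor (e.g. stdout's fd): nothing to resolve
        # realpath follows symlinks and "..", so both escape routes are caught.
        real = os.path.realpath(os.fsdecode(os.fspath(path)))
        if os.path.commonpath([real, root]) != root:
            raise PermissionError(
                f"{path!r} is outside the sandbox; use fs(action=...) for disk access")

    def guarded_open(file, *args, **kwargs):
        check(file)
        return orig_open(file, *args, **kwargs)

    def guarded_os_open(path, flags, *args, **kwargs):
        check(path)
        return orig_os_open(path, flags, *args, **kwargs)

    builtins.open = guarded_open
    io.open = guarded_open
    os.open = guarded_os_open

    def restore() -> None:
        builtins.open, io.open, os.open = orig_open, orig_io_open, orig_os_open

    return restore
```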
What this gives you
The webui can now do quick computations, JSON wrangling, regex,
date math, network probes, urllib HTTP fetches — all without
the operator having to remember --allow-python. And model
authors who follow the description's contract will use the
right tool for disk (fs), with the sandbox-restriction
preamble as defense-in-depth for the cases where they don't.
Defense-in-depth, NOT a hardened sandbox
The model can still escape via import ctypes; ctypes.CDLL("libc.so.6").open(...), subprocess.run, or os.system.
The protection is against ACCIDENT, not adversarial intent —
same threat model bash has had since day one.
Smoke-tested
10/10 cases pass:
- sandbox_read_ok / sandbox_write_ok / sandbox_subdir_ok
- etc_passwd_blocked / dotdot_blocked / os_open_blocked
- pathlib_blocked (caught through pathlib.py's internal
open() call)
- compute_ok (Decimal math) / network_ok (gethostbyname)
- stdout_ok (fd-int passthrough lets sys.stdout.write work)
Migration
v0.5.1 callers passing --allow-python get an "unknown arg"
error — drop the flag (python3 is on by default now). Callers
that explicitly didn't want python3 should pass --no-python
(or set [SERVER] allow_python = off in the INI).
13 files changed, 387 insertions(+), 153 deletions(-).
v0.5.1 — `python3` tool: isolated stdlib snippet runner
A second shell-class executor alongside bash. Runs Python 3
snippets via python3 -I -S -E -c <code> — isolated mode: no
PYTHON* env, no site-packages, no cwd on sys.path. Standard library
only; third-party imports fail with ModuleNotFoundError, by design.
Predictable behaviour regardless of host Python configuration.
Highlights
- python3(code, timeout_sec?) runs the snippet via python3 -I -S -E -c <code>. cwd is pinned to --sandbox; fds 3+ are closed before exec; SIGTERM/SIGKILL deadline (default 30s, max 300s); 32 KB stdout+stderr cap.
- --allow-python opt-in flag, mirroring --allow-bash. Wired through every binary (easyai-cli, easyai-local, easyai-server, easyai-mcp-server) plus the [SERVER] allow_python INI key.
- Operator-facing live mirror controlled by --no-show-python / [cli] show_python, independent of --no-show-bash so each can be quieted on its own.
- Internally, bash and python3 now share one run_capped_subprocess helper, so the fork/fd-close/chdir/drain/wait machinery lives in only one place.
- python3 reserved name added to the EASYAI-*.tools collision check so external manifests cannot shadow it.
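The runner contract above (isolated flags, pinned cwd, closed fds, TERM-then-KILL deadline, output cap) can be approximated with Python's subprocess module. This sketch is not run_capped_subprocess: run_python3 is a hypothetical name, it launches sys.executable instead of resolving python3 from PATH, and it caps output after collecting it rather than while draining.

```python
import subprocess
import sys
from typing import Optional, Tuple

OUTPUT_CAP = 32 * 1024            # bytes of combined stdout+stderr kept
DEFAULT_TIMEOUT, MAX_TIMEOUT = 30.0, 300.0

def run_python3(code: str, cwd: Optional[str] = None,
                timeout_sec: float = DEFAULT_TIMEOUT,
                grace_sec: float = 2.0) -> Tuple[int, str, bool]:
    """Run a stdlib-only snippet; return (exit_code, capped_output, timed_out)."""
    timeout_sec = min(max(timeout_sec, 0.1), MAX_TIMEOUT)
    proc = subprocess.Popen(
        # -I isolated mode, -S skip site import, -E ignore PYTHON* env vars
        [sys.executable, "-I", "-S", "-E", "-c", code],
        cwd=cwd,                   # pin the working directory (the sandbox)
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # one merged stream, as the tool reports it
        close_fds=True,            # on POSIX: close fds 3+ in the child
    )
    timed_out = False
    try:
        out, _ = proc.communicate(timeout=timeout_sec)
    except subprocess.TimeoutExpired:
        timed_out = True
        proc.terminate()           # SIGTERM first...
        try:
            out, _ = proc.communicate(timeout=grace_sec)
        except subprocess.TimeoutExpired:
            proc.kill()            # ...then SIGKILL if it ignores the TERM
            out, _ = proc.communicate()
    return proc.returncode, out[:OUTPUT_CAP].decode("utf-8", "replace"), timed_out
```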
Why
Data manipulation — JSON wrangling, regex, Decimal math, date
arithmetic, statistics — is one Python snippet but a maze of shell
quoting in bash. Giving the model both means it picks the right
tool for the shape of the task.
NOT a hardened sandbox
The interpreter has the caller's full uid/gid and import os /
import socket / import subprocess all work. The -I -S -E
flags constrain startup (no env, no site-packages, no cwd on
sys.path), not capabilities. Threat model is the same as bash
— hence the same explicit-opt-in gate.
Internal
enum class CappedExecKind { Bash, Python3 } selects the exec
call in the child; everything else is shared. An async-signal-safe
write(2) of tool_label + ": exec failed" replaces the fprintf that
mixed with the parent's stdio in the failure path. The [bash] $ ... /
[python3] $ ... banners are parameterized on tool_label, so the
operator-facing mirror reuses the same shape.
Smoke-tested
print(2+2), JSON via stdlib, isolated check (sys.path[0] = the
stdlib zip, not cwd — confirms -I), stderr capture, raise
SystemExit(2), missing arg, timeout, third-party import
correctly fails with ModuleNotFoundError.
18 files changed, 732 insertions(+), 342 deletions(-).