Disciplined minimal code orchestrator with gated multi-model review before subagents ever start writing code.
HermesUltraCode is a plugin for Hermes Agent (Nous Research), not a new agent runtime. It wraps the moment Hermes hands a task to a subagent and runs that task past a neutral reviewer on a different model lab before any subagent starts writing code; the reviewer can only tighten the task or block it, never rewrite it, and every decision is recorded in an immutable, secret-redacted audit trail. "Orchestrator" names the mode of working (plan, review, then dispatch); it rides on your existing Hermes orchestrator and does not replace it.
Install and first run (full detail in Quick start):
hermes plugins install MahdiHedhli/HermesUltraCode
# configure a reviewer on a DIFFERENT lab than your orchestrator in ~/.hermes/.env
# (Quick start has a keyless xAI-proxy example), then:
hermes plugins enable hermesultracode
hermes gateway restart # or just start a fresh `hermes` session
⚠️ It fails closed until a reviewer is configured. With no reviewer set, the gate blocks every `delegate_task` rather than dispatch a worker un-vetted. Set the reviewer env first.
Then drive it with /ultracode <task>, or open the UltraCode tab in hermes dashboard (under
Plugins, beside Kanban).
A native tab in the Hermes web dashboard (sidebar, right under Kanban) during a real multi-agent build — the orchestrator + subagents in parallel, each with its task, tool log, and the reviewer's tightening directive. Ships inside the plugin — zero extra install.
Same tab, more sub-views — Plan (left): the orchestrator's live build stages (done · active · to do) from its todo tool; Audit (right): the immutable trail — blast-radius tiers, per-dispatch verdicts/decisions, and the fail-closed flag so silent degradation is visible. The same views are also a build-free standalone dashboard (hermes ultracode-dashboard) when you want it decoupled.
A self-contained pre-dispatch prompt gate, neckbeard generation discipline (ported from Ponytail), and observability dashboard layered onto an existing Hermes Agent (Nous Research) orchestrator/worker setup. It does not rebuild Hermes — it wraps the subagent-dispatch boundary so no worker prompt reaches a subagent un-vetted, biases generation toward minimalism, and keeps an ISO 27001-grade trail of every decision.
orchestrator base prompt ─▶ [ GATE ] ─▶ dispatched prompt ─▶ worker
│ classify blast radius (code)
│ reviewer (a DIFFERENT lab) → structured verdict
│ tighten-only validation (code)
│ release decision (code, not chat)
└─▶ immutable, redacted audit row ─▶ dashboard / MCP
- ✂️ Tighten-only by construction — the reviewer may append constraints or block, never rewrite. `dispatched = base (verbatim) + directives`, proven in code, not trusted to the model.
- 🔒 Fail-closed, always — a missing, late, garbled, or quota-starved verdict blocks-and-escalates. Silence is never a pass.
- 🧬 Reviewed by a different lab — on purpose — the reviewer must run on a different model lab than the orchestrator (enforced at startup). It can't grade its own homework, and because it shares neither the orchestrator's training data nor its failure modes, it catches the blind spots a same-lineage model waves through. (more ↓)
- 🧾 Audited like evidence — one immutable, secret-redacted row per dispatch; `UPDATE`/`DELETE` blocked at the database; JSON/CSV export.
- 🪶 Zero runtime dependencies — stdlib-only core, 208 offline tests, the model provider mocked.
These are the design invariants, enforced in code rather than by trusting the model:
- Fail closed. A missing/unparseable verdict, reviewer error, timeout, quota exhaustion, or empty response is not a pass. Silence degrades to block-and-escalate, never to silent pass-through. — `core/gate.py:_fail_closed`
- The reviewer is neutral, not adversarial. Its objective is to maximise worker success against the guidelines. A no-op (zero added directives) is a good outcome scored as success. The word "adversarial" never appears in its role/system prompt. — `core/gate.py:REVIEWER_SYSTEM_PROMPT`
- Tighten-only by construction. The base prompt is immutable. The reviewer may only append constraints or block; it can never replace, delete, broaden, or grant tool access. Enforced structurally + validated in code. — `core/tighten.py`
- The release decision lives in code. The dispatcher refuses to release until a structured, present, parseable, passing verdict exists. No agent negotiates release in chat. — `core/gate.py:Gate.review_and_dispatch`
- Lift, don't fork. Neckbeard ships as vendored ruleset text — no marketplace plugin, no Node hooks. The dashboard is a standalone panel, not a fork of the Hermes SPA. — `ruleset/neckbeard.md`, `web/`
- Neckbeard runs on the orchestrator and workers, never the reviewer. Minimalism is a generation-time bias. — `core/neckbeard.py:inject_ruleset`
- The protected set is extended for compliance. Neckbeard's carve-outs (security, input validation, data-loss, accessibility) plus observability/structured logging, audit logging, idempotency, and retries/backoff. Never pruned — they are the ISO 27001 evidence trail. — `ruleset/neckbeard.md`, `core/gate.py`
- The prompt-under-review is untrusted data. It may carry text from issues, PRs, or UX feedback. The reviewer evaluates embedded instructions as data; it never executes them. — `core/gate.py:build_review_prompt`
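The fail-closed rule can be illustrated with a small sketch. It is not the code in `core/gate.py`: the function name `parse_verdict_or_block`, the `fail_closed` field on the returned dict, and the exact checks are assumptions made for illustration. The point is only that every unusable reviewer response maps to a block, never to a pass.

```python
import json

# The three verdict values come from the verdict schema in this README.
VALID_VERDICTS = ("pass", "revise", "block")


def parse_verdict_or_block(raw):
    """Return a verdict dict; anything unusable becomes a synthetic block.

    Hypothetical helper: the name and return shape are illustrative only.
    """

    def fail_closed(reason):
        return {
            "verdict": "block",
            "added_directives": [],
            "rationale": reason,
            "fail_closed": True,
        }

    # Empty or missing response: silence is never a pass.
    if raw is None or not str(raw).strip():
        return fail_closed("empty reviewer response")
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return fail_closed("unparseable verdict")
    if not isinstance(data, dict) or data.get("verdict") not in VALID_VERDICTS:
        return fail_closed("missing or unknown verdict")
    directives = data.get("added_directives", [])
    if not isinstance(directives, list) or not all(
        isinstance(d, str) for d in directives
    ):
        return fail_closed("malformed directives")
    return {**data, "added_directives": directives, "fail_closed": False}
```

A timeout or provider error would be handled the same way by the caller: catch it and treat the result as a block.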
The reviewer can't share the orchestrator's blind spots. This is the headline feature, and a big part of why the gate is worth running: the reviewer runs on a different model lab than the orchestrator — and the gate refuses to start if they match. That one rule buys two distinct things:
- An independent second opinion. Every worker dispatch is vetted by a model that had no hand in writing it. The reviewer is neutral — its job is to maximise the worker's success against the project guidelines — so it tightens the prompt or blocks it; it never rubber-stamps to look agreeable.
- Bias & blind-spot diversity. A model reviewing its own family's output shares that family's training data, post-training, sycophancy, and systematic failure modes — so it is structurally blind to exactly the mistakes it is most prone to make. A genuinely different lab brings different failure modes, shrinking the set of errors that slip past both models.
That's why the check is on lab, not model size: Anthropic reviewing Nous is
divergence; GPT-4o reviewing GPT-4o-mini is not. The gate hard-fails at startup
(validate_distinct_providers, case-insensitive) rather than quietly review with a
same-lineage model. Route the reviewer through OpenRouter, a Hermes provider, or the local
hermes proxy — any lab, as long as it isn't the orchestrator's.
— core/providers.py, tests/test_provider_distinct.py
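As a rough illustration of the startup check, here is a minimal sketch of a case-insensitive lab comparison. The function name `check_distinct_labs`, the whitespace stripping, and the error messages are assumptions; the real check is `validate_distinct_providers` in `core/providers.py`, whose signature is not shown in this README.

```python
def check_distinct_labs(orchestrator_lab, reviewer_lab):
    """Raise ValueError unless both labs are set and differ (case-insensitive).

    Hypothetical stand-in for the startup check described above.
    """
    orch = (orchestrator_lab or "").strip().casefold()
    rev = (reviewer_lab or "").strip().casefold()
    if not orch or not rev:
        # An unset lab cannot be proven distinct, so refuse to start.
        raise ValueError("orchestrator and reviewer labs must both be set")
    if orch == rev:
        raise ValueError(
            f"reviewer lab {reviewer_lab!r} matches orchestrator lab "
            f"{orchestrator_lab!r}; pick a different lab for the reviewer"
        )


# Different labs pass silently; matching labs fail regardless of case.
check_distinct_labs("nous", "anthropic")
try:
    check_distinct_labs("OpenAI", "openai")
except ValueError as exc:
    print(f"startup refused: {exc}")
```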
Three parts, one repo. Storage and model providers sit behind interfaces so the core stays testable and portable.
| Part | Location | Notes |
|---|---|---|
| Gate core | core/ | Provider- and storage-agnostic. Review loop, blast-radius tiering, tighten validation, audit trail. No Hermes import, no un-mockable network in the hot path. |
| Hermes plugin | `__init__.py` + plugin.yaml + adapters/hermes_hook.py | The Hermes-coupled layer. register(ctx) hooks tool_request (tighten) + pre_tool_call (block) on delegate_task; fails closed if the gate can't be configured. Verified against Hermes's real PluginManager. |
| Dashboard + read API | server/, web/ | Read-only views over the store, mirroring Hermes dashboard security. Optional read-only MCP server. |
The reviewer is a model call routed behind a provider interface, on a different lab from the orchestrator (see Cross-lab review above). The core never imports Hermes and makes no un-mockable network call, so it stays unit-testable offline with the provider mocked.
{
"verdict": "pass | revise | block",
"added_directives": ["string"],
"rationale": "string",
"scope_assessment": "in_scope | needs_narrowing | out_of_scope",
"round": 0,
"reviewer_model": "string"
}

The reviewer never returns a rewritten prompt — only directives to append. That is what makes tighten-only structural rather than a fragile semantic diff:
dispatched_prompt = base_prompt (verbatim) + rendered(added_directives)
- `pass` + empty directives → dispatch the base unchanged (the no-op, a good answer).
- `revise` → append directives, run the tighten validator, re-review or dispatch.
- `block` → do not dispatch; escalate or log per tier.
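The composition rule above can be sketched in a few lines. This is an illustration, not the contents of `core/tighten.py`: the function names, the "Additional constraints:" heading, and the bullet rendering are assumptions. It covers only the structural prefix property; the real validator also rejects directives that broaden scope or grant tool access.

```python
def compose_dispatched(base_prompt, added_directives):
    """Return the base prompt verbatim, followed by any rendered directives."""
    cleaned = [d.strip() for d in added_directives if d.strip()]
    if not cleaned:
        # The no-op case: dispatch the base unchanged.
        return base_prompt
    rendered = "".join(f"\n- {d}" for d in cleaned)
    return base_prompt + "\n\nAdditional constraints:" + rendered


def is_append_only(base_prompt, dispatched_prompt):
    """Structural check: the dispatched prompt must start with the base."""
    return dispatched_prompt.startswith(base_prompt)


base = "Fix the login bug."
dispatched = compose_dispatched(base, ["Do not modify files under auth/."])
print(is_append_only(base, dispatched))  # True: the base survives verbatim
```

Because the reviewer only ever supplies the list of directives, there is no rewritten prompt to diff against the base; the prefix check is enough to prove the base was not edited.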
| Tier | Trigger | Review | Round-cap fallback |
|---|---|---|---|
| merge_adjacent | carries merge authority | frontier; block = hard stop | escalate to human |
| elevated | protected paths (auth/crypto/CI/infra) or over file/cost threshold | frontier | escalate to human |
| standard | ordinary code change | frontier | auto-accept last base, log dissent |
| trivial | read-only or single-file | skip frontier (or cheap model) | n/a |
On invariants 1 & 5 together: a reviewer error / timeout / unparseable verdict always fails closed to a block (criterion 1) — it never reaches the standard auto-accept. The standard-tier auto-accept (criterion 5) is reachable only through the round cap being exhausted by genuine, valid `revise` verdicts, and it is recorded as `dispatched_fallback` with `dissent_logged=true`. It is a governed, audited policy decision, not a silent bypass.
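The interaction between the round cap and fail-closed behaviour can be sketched as a small decision function. This is a simplified model under stated assumptions: every `revise` triggers a re-review, the trivial tier (which normally skips frontier review) is not modelled, and the decision labels other than `dispatched_fallback` are invented for illustration.

```python
def decide(tier, verdicts, round_cap=2):
    """Map a sequence of per-round verdicts to a release decision.

    `verdicts` holds "pass", "revise", "block", or None (a reviewer
    failure such as a timeout). Hypothetical helper, not the gate's API.
    """
    for round_no in range(round_cap):
        verdict = verdicts[round_no] if round_no < len(verdicts) else None
        if verdict not in ("pass", "revise", "block"):
            # Missing or invalid verdict: fail closed, never auto-accept.
            return {"decision": "blocked", "fail_closed": True, "dissent_logged": False}
        if verdict == "pass":
            return {"decision": "dispatched", "fail_closed": False, "dissent_logged": False}
        if verdict == "block":
            return {"decision": "blocked", "fail_closed": False, "dissent_logged": False}
        # verdict == "revise": fall through to the next round.
    # Round cap exhausted by valid revise verdicts only.
    if tier == "standard":
        return {"decision": "dispatched_fallback", "fail_closed": False, "dissent_logged": True}
    return {"decision": "escalated", "fail_closed": False, "dissent_logged": False}


print(decide("standard", ["revise", "revise"])["decision"])  # dispatched_fallback
print(decide("standard", ["revise", None])["decision"])      # blocked
```

The second call shows the property described above: a reviewer failure in any round ends in a block, so the auto-accept path is only reachable through valid `revise` verdicts.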
In Hermes (most users) — install, point the reviewer at a different lab, then drive
it with the /ultracode slash command:
hermes plugins install MahdiHedhli/HermesUltraCode
# configure the reviewer in ~/.hermes/.env (a DIFFERENT lab than your orchestrator).
# example: route the reviewer through Hermes's own xAI proxy (keyless):
cat >> ~/.hermes/.env <<'ENV'
HERMESULTRACODE_REVIEWER_BASE_URL=http://127.0.0.1:8649/v1/chat/completions
HERMESULTRACODE_REVIEWER_LAB=xai
HERMESULTRACODE_REVIEWER_MODEL=grok-4.3
HERMESULTRACODE_ORCH_LAB=openai # your orchestrator's lab — must differ from REVIEWER_LAB
ENV
hermes proxy start --provider xai --host 127.0.0.1 --port 8649 # keep running (reviewer's route)
hermes plugins enable hermesultracode
hermes gateway restart # or just start a fresh `hermes` session

Then, inside a Hermes session:
/ultracode <task> # delegate <task> to a subagent — gate-reviewed (tightened, or blocked)
/ultracode status # gate state + a clickable dashboard link (auto-starts on loopback)
The dashboard needs zero extra setup — it surfaces three ways:
- a native tab in the Hermes web dashboard (`hermes dashboard`) — under Plugins, right beside Kanban. It ships inside the plugin (`dashboard/`), so `hermes plugins install` is all it takes;
- an auto-started standalone on session load — a one-click `http://127.0.0.1:9120/?token=…` link (the page reads `?token=` and connects itself);
- `hermes ultracode-dashboard` to launch the standalone manually.
Update later with hermes plugins update hermesultracode + a restart.
Local / standalone (dev) — no Hermes needed, provider mocked:
# full test suite (offline, stdlib unittest — no pip install needed)
python -m unittest discover -s tests # or: pip install -e ".[dev]" && pytest
# gate-on vs gate-off benchmark
python -m bench.harness --out bench_results.json
# read-only dashboard (loopback + ephemeral token); paste the printed token
python -m server --store gate_audit.sqlite3 --bench bench_results.json
# (optional) read-only MCP server, and a live smoke test via the Hermes proxy
python -m server.mcp_server --store gate_audit.sqlite3
python -m bench.smoke_hermes

The live smoke test exercises the whole gate end-to-end with a real reviewer call routed through
hermes proxy (Hermes's local OpenAI-compatible endpoint). xAI Grok is a genuinely
different lab from the Nous orchestrator, so this is a faithful test of invariant 6, not
a workaround. It runs a benign task (→ dispatch), a protected-path task (→ the live model
appends the extended-protected-set directives — audit logging, idempotency, retries/
backoff, validation — then dispatches), and a prompt-injection-laden base (→ fail-closed
block). This is also what surfaced the tighten validator's precision tuning below.
HermesUltraCode is a first-class Hermes plugin — the repo root is the plugin
(plugin.yaml + __init__.py), so it installs natively with hermes plugins install
straight from the public repo (which is the distribution channel — no central registry
required). The flow below is verified end-to-end against a live Hermes (install → enable
→ the runtime loads register() and the hermes ultracode-dashboard command appears).
# 1. install from GitHub — clones the repo and discovers the plugin
hermes plugins install MahdiHedhli/HermesUltraCode
# 2. configure the reviewer FIRST (persist in ~/.hermes/.env so it survives restarts).
# The reviewer must be a DIFFERENT lab than your orchestrator — startup enforces it.
export HERMESULTRACODE_REVIEWER_API_KEY=sk-or-... # e.g. an OpenRouter key
export HERMESULTRACODE_REVIEWER_LAB=anthropic # ≠ HERMESULTRACODE_ORCH_LAB (default: nous)
export HERMESULTRACODE_REVIEWER_MODEL=anthropic/claude-3.5-sonnet
# …or point the reviewer at a local proxy instead of a key:
# export HERMESULTRACODE_REVIEWER_BASE_URL=http://127.0.0.1:8649/v1/chat/completions
# 3. enable + reload so the plugin loads into the runtime
hermes plugins enable hermesultracode
hermes gateway restart # or just start a new `hermes` session
⚠️ It fails closed by design. Until a reviewer is configured, the gate blocks every `delegate_task` rather than dispatch a worker un-vetted (invariant 1). Set the reviewer env before you enable it — otherwise subagent delegation stops until you do (or you `hermes plugins disable hermesultracode`). The gate's hooks/middleware enforce regardless of toolset activation; the read-only query tools live in a `hermesultracode` toolset you can enable when you want the agent to query verdicts itself.
register(ctx) wires the gate into the real Hermes dispatch seam — verified against
Hermes's own PluginManager/PluginContext:
| Seam | Hermes mechanism | Gate behavior |
|---|---|---|
| Tighten | tool_request middleware on delegate_task (runs first; rewrites args) | rewrites the subagent's goal → base verbatim + appended directives. Batch-aware: a parallel delegate_task(tasks=[…]) is reviewed/tightened per task, so fan-out isn't degraded to sequential |
| Block | pre_tool_call hook on delegate_task (returns {"action":"block"}) | refuses a dispatch the gate didn't release; fail-closed if unconfigured (per task in a batch) |
| Observe | register_tool (gate_metrics, gate_audit_query, gate_recent_verdicts) | the Hermes agent can answer "show me today's gate verdicts" |
| Plan | register_command('ultracode', …) — planning is the default; yolo bypasses | scoping pass (questions + target dir) before a build; the scope-first skill (discoverable via skills.external_dirs, see below) makes the agent ask via clarify |
| Directory | Gate.workspace_directive seeded into every file-writing review | tightens each build to declare and stay within a target directory (off via HERMESULTRACODE_DIRECTORY_DIRECTIVE=0) |
| Coordinate | Gate.coordination_directive seeded into each task of a parallel batch (≥2 siblings) | advisory tighten: concurrent subagents coordinate by contract, not by reading each other's in-progress files (off via HERMESULTRACODE_COORDINATION_DIRECTIVE=0) |
| Route | core/router.py annotates each released dispatch (on via HERMESULTRACODE_ROUTING=1) | advisory cost-aware routing: the cheapest capable worker, preferring a ~free local model; risk-gated off local for elevated/merge work. See Cost-aware routing |
| Neckbeard | register_skill('neckbeard', …) + skills/neckbeard/SKILL.md | the minimalism ruleset as an installable skill |
| Dashboard | a native Hermes web-dashboard tab (dashboard/ plugin: manifest + SDK-React bundle + FastAPI plugin_api.py) + register_cli_command('ultracode-dashboard', …) | a tab beside Kanban (zero extra install), or the build-free standalone read API |
Both seams receive the same tool_call_id, so the gate (a reviewer model call) runs
once per dispatch and both seams read the cached decision. register() never raises
and always installs the pre_tool_call hook: if the reviewer can't be configured (or
its lab matches the orchestrator's), the hook blocks every delegate_task rather than let
a worker run un-vetted (invariant 1).
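The "review once, read twice" behaviour can be pictured with a small cache keyed by `tool_call_id`. This sketch does not use the Hermes plugin API: the class `GateDecisionCache`, the two seam functions, and the decision dict shape (`released`, `goal`) are hypothetical. Only the `{"action": "block"}` return value and the idea of sharing one decision per `tool_call_id` come from the description above.

```python
class GateDecisionCache:
    """Run the (expensive) review once per tool_call_id and remember it."""

    def __init__(self, review_fn):
        self._review_fn = review_fn  # callable: goal -> decision dict
        self._decisions = {}         # sketch only: unbounded, no locking

    def decision_for(self, tool_call_id, goal):
        if tool_call_id not in self._decisions:
            try:
                self._decisions[tool_call_id] = self._review_fn(goal)
            except Exception:
                # Reviewer failure: fail closed, keep the base goal untouched.
                self._decisions[tool_call_id] = {"released": False, "goal": goal}
        return self._decisions[tool_call_id]


def tighten_seam(cache, tool_call_id, args):
    """First seam: swap in the tightened goal if the gate released it."""
    decision = cache.decision_for(tool_call_id, args["goal"])
    return {**args, "goal": decision["goal"]} if decision["released"] else args


def block_seam(cache, tool_call_id, args):
    """Second seam: block anything the gate did not release."""
    decision = cache.decision_for(tool_call_id, args["goal"])
    return None if decision["released"] else {"action": "block"}
```

In this sketch the second seam never triggers a second reviewer call, because the decision is looked up by `tool_call_id` rather than by the (possibly already tightened) goal text.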
Run subagents on the cheapest model that can actually do the task, and strongly prefer a local model (LM Studio / Ollama on your own box) whose marginal cost is ~0. This ships in three parts, default-off and safe:
① Run subagents on your local box (config, zero code). hermes ultracode-local probes
your local endpoint, shows the loaded models, and prints (or --apply writes, after a backup)
the delegation: block that points subagents at it:
delegation:
  base_url: http://localhost:1234/v1 # LM Studio; Ollama = :11434/v1
  model: google/gemma-4-31b
  api_mode: openai

Your orchestrator and the UltraCode cross-lab reviewer stay on their cloud labs — only the workers move local, so the reviewer ≠ orchestrator rule is untouched.
② Advisory cost router (HERMESULTRACODE_ROUTING=1). On every released dispatch, the
router (core/router.py) picks the cheapest model that clears the task's required capability
tier — max(blast-radius tier, the reviewer's difficulty hint) — out of a catalog
(override with HERMESULTRACODE_MODEL_CATALOG). A local model is preferred by an
effective_cost that costs it at ~electricity scaled by a single local_bias knob, but is
risk-gated off elevated / merge-authority work no matter how cheap, capped at a
local_trusted_tier (default 2, so a quantized local model never takes tier-3 work), and
skipped if a non-blocking liveness probe says the box is down. The decision is advisory: it
never blocks and never changes the dispatched prompt — it annotates the audit row with the
model it would pick and the dollars saved vs cloud, surfaced in /api/metrics → routing
and the dashboard. Tune with HERMESULTRACODE_LOCAL_BIAS, …_LOCAL_TRUSTED_TIER,
…_LOCAL_BASE_URL, …_LOCAL_BOX_WATTS, …_USD_PER_KWH.
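The selection rule described above can be sketched as follows. This is not `core/router.py`: the `Model` dataclass, the numeric ranks in `TIER_RANK`, the per-task cost figures, and the way `local_bias` multiplies a local model's cost are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability_tier: int
    usd_per_task: float   # for a local model: an electricity estimate
    local: bool = False


# Assumed numeric ranks for the blast-radius tiers.
TIER_RANK = {"trivial": 0, "standard": 1, "elevated": 2, "merge_adjacent": 3}
RISK_GATED = {"elevated", "merge_adjacent"}


def pick_model(catalog, blast_tier, difficulty_hint, local_alive=True,
               local_bias=1.0, local_trusted_tier=2):
    """Return the cheapest eligible model, or None if nothing qualifies."""
    required = max(TIER_RANK[blast_tier], difficulty_hint)

    def eligible(m):
        if m.capability_tier < required:
            return False
        if m.local:
            # Risk overrides cost: no local model for elevated/merge work,
            # above its trusted tier, or when the box is down.
            if blast_tier in RISK_GATED or not local_alive:
                return False
            if required > local_trusted_tier:
                return False
        return True

    def effective_cost(m):
        return m.usd_per_task * local_bias if m.local else m.usd_per_task

    candidates = [m for m in catalog if eligible(m)]
    return min(candidates, key=effective_cost) if candidates else None


catalog = [
    Model("local-model", 2, 0.001, local=True),
    Model("cloud-small", 2, 0.02),
    Model("cloud-frontier", 3, 0.30),
]
print(pick_model(catalog, "standard", 1).name)  # local-model
print(pick_model(catalog, "elevated", 1).name)  # cloud-small
```

Since the router is advisory, a result like this would only annotate the audit row; it would not change which worker actually runs.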
The Cost sub-tab — dollars saved vs an all-cloud baseline, the local-vs-cloud split, and a per-dispatch table of what the router picked and why. Standard/trivial work lands on the local model; elevated/merge_adjacent is risk-gated to cloud regardless of price.
③ Per-task binding (upstream). Hermes binds the worker model from one global delegation block, so true per-task routing needs a small, additive Hermes change. The advisory router flips to binding with a one-line adapter change the day it lands — see docs/upstream-routing.md.
Routing is a sibling of the security gate, not part of it: the gate stays tighten-only / fail-closed; the router is a separate, deterministic cost/observability layer. Risk overrides cost, always.
The gate is automatic. The pre_tool_call + tool_request hooks fire in the agent
process on every delegate_task, in any UI — so the primary flow is just: give the agent
a task as a normal message, and the gate tightens or blocks the delegation. You don't need
a command for the gate to work.
/ultracode adds an explicit trigger + text views, and the dashboard surfaces on every
session start:
- `/ultracode <task>` — plan first (the default). A generic request first gets a scoping pass: clarifying questions, a proposed target directory, files, acceptance criteria, and out-of-scope — so the build is disciplined before a line is written. The reviewer model tailors it; a deterministic scaffold is the offline fallback. `/ultracode plan <task>` is the explicit form.
- `/ultracode yolo <task>` — skip planning and build. Bypasses the scoping pass only — the gate still reviews, tightens, and (for file-writing work) appends the target-directory directive. In the CLI it dispatches `delegate_task`; in the TUI (a slash command runs in a worker subprocess with no agent context) it can't spawn a subagent, so it reports the gate's decision and hands you the approved goal to send as a message. Blocked tasks are reported, never dispatched.
- Directory discipline (automatic). Every file-writing delegation — via the command or the agent's own `delegate_task` — is tightened with “declare a target directory and write only within it.” It's a tighten-only policy directive (never a block) seeded into the gate; disable with `HERMESULTRACODE_DIRECTORY_DIRECTIVE=0`. The scope-first skill makes the agent establish that directory up front interactively — it drives Hermes's `clarify` tool (button prompts, one decision at a time) when you send a build request as a normal message; the one-shot `/ultracode plan` can't (no agent loop in a slash subprocess). Bypass with “yolo”.
- Text views (work in the TUI, since they're just printed): `/ultracode help` (all commands), `status` (gate + dashboard link), `agents` (active subagents + recent output), `verdicts` (recent gate decisions), `dashboard` (opens the browser). The command is reachable by typing it and is listed in `/commands`; the TUI's `/` quick-menu and `/help` are built-in surfaces that don't enumerate plugin commands (we don't fork the TUI).
- On session start the plugin auto-starts the read-only dashboard (daemon thread, ephemeral token, loopback) and logs a one-click `http://127.0.0.1:9120/?token=…` URL — the page reads `?token=` and connects itself. Disable with `HERMESULTRACODE_AUTO_DASHBOARD=0`; change the port with `HERMESULTRACODE_DASHBOARD_PORT`.
A plugin's register_skill only makes a skill loadable by qualified name
(hermesultracode:scope-first) — Hermes does not list plugin skills in the agent's
skills index, so the orchestrator never sees them (plugins.py:
“plugin skills are opt-in explicit loads only”). For the agent to discover scope-first
(and use clarify on a build) or neckbeard, point Hermes at the plugin's skills dir via
skills.external_dirs in ~/.hermes/config.yaml:
skills:
  external_dirs: ['/Users/<you>/.hermes/plugins/hermesultracode/skills']

This adds both skills to the index (and as /scope-first / /neckbeard commands)
without touching the curated ~/.hermes/skills/ tree. Restart Hermes to pick it up.
Both skills are also published to the Hermes skill hub at
MahdiHedhli/skills — tap it and install from the hub:
hermes skills tap add MahdiHedhli/skills # add to the hub's browse/search
hermes skills search scope-first
hermes skills install MahdiHedhli/skills/scope-first # plan-first discipline
hermes skills install MahdiHedhli/skills/neckbeard # minimalism ruleset
`adapters/hermes_hook.py` (the `HermesDispatchGate` mapping) and `__init__.py` (the `register(ctx)` entry) are the only Hermes-coupled files; the gate `core/` stays portable and Hermes-free. To embed the gate without the plugin, drive `core.gate.Gate.review_and_dispatch(goal, meta)` directly.
"Doesn't write less code — writes the correct code." Lazy, not negligent: climb the ladder (YAGNI → stdlib → platform → installed dep → one line), but never prune the protected set. (…then leaves 40 lines of review comments explaining why your version was wrong, and refuses the dependency because the maintainer made a questionable decision in 2019.)
Credit: Neckbeard is a fork of the Ponytail ruleset (MIT), renamed for this project — only the vendored text was lifted, nothing executable. Thanks to the original Ponytail authors.
The vendored ruleset (ruleset/neckbeard.md, MIT, no marketplace plugin, no Node
hooks) is injected into orchestrator and worker prompt assembly via
core/neckbeard.inject_ruleset — and refused for the reviewer. Every shortcut in this
codebase is tagged with a neckbeard: comment naming its upgrade path; those markers are
harvested into the dashboard's debt ledger (core/neckbeard.harvest_markers).
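To make the debt-ledger idea concrete, here is a minimal sketch of harvesting such markers from source text. The marker syntax (`# neckbeard: <note>` in a Python comment), the function name `collect_debt_markers`, and the ledger entry shape are assumptions; the real harvester is `core/neckbeard.harvest_markers` and may differ on all three.

```python
import re

# Assumed marker form: a '#' comment containing 'neckbeard:' and a note.
MARKER = re.compile(r"#\s*neckbeard:\s*(?P<note>.+)$")


def collect_debt_markers(source, filename="<memory>"):
    """Return one ledger entry per marker comment found in `source`."""
    ledger = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = MARKER.search(line)
        if match:
            ledger.append(
                {"file": filename, "line": lineno, "note": match.group("note").strip()}
            )
    return ledger


sample = (
    "import http.server\n"
    "# neckbeard: stdlib server for now; upgrade path is a real ASGI app\n"
    "PORT = 9120\n"
)
for entry in collect_debt_markers(sample, "server/read_api.py"):
    print(entry["file"], entry["line"], entry["note"])
```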
Applied to this repo, the ladder produced: a stdlib-only core, an http.server-based read
API (no Flask/FastAPI), a server-rendered dashboard (no React build step — see
web/README.md for the documented React 19 + Vite + Tailwind upgrade
path), and SQLite for the audit store (no new dependency).
The same views ship two ways — pick the integrated one or the decoupled one:
- Native tab in the Hermes web dashboard (`hermes dashboard`, the React UI) — `dashboard/`. UltraCode is a first-class dashboard plugin (`manifest.json` → a tab at `/ultracode`, `position: after:skills`, beside Kanban) — the same mechanism Kanban itself uses. The frontend (`dashboard/dist/index.js`) is hand-authored plain JS using the host's React via the plugin SDK — no npm build on our side; the backend (`dashboard/plugin_api.py`) is a thin FastAPI router mounted at `/api/plugins/hermesultracode/`, reusing the same `server/views` over the shared store, behind the dashboard's own session auth. Because the web dashboard runs in a different process than the agent, the live agent/plan view is mirrored to SQLite (a `progress_snapshot` row, redacted on write) so the tab can read it. It ships inside the plugin — nothing extra to install.
- Standalone (`hermes ultracode-dashboard`, or `python -m server`) — server-rendered static HTML, no build step, in-process, fully decoupled. Loopback + ephemeral token, defaults to `http://127.0.0.1:9120` (use `--port 9123` if taken). Auto-starts on session load.
Both are read-only over the audit store — empty until the gate records dispatches (configure a
reviewer, then run a delegate_task). The views:
- Live — a command center for the running build: an orchestrator card (its own tool activity), a grid of agent cards — each active subagent's goal, status, last tool, tool count, elapsed, a per-agent tool log, and the reviewer's tightening directives for that agent (the gate's influence, joined from the audit store) — an activity feed tagging orchestrator vs agent, and completed-agent output. A single batched `delegate_task` fans out into multiple agents here, each gate-reviewed independently. Fed by Hermes's `subagent_start` / `post_tool_call` / `subagent_stop` hooks + the in-process `list_active_subagents()` registry (secret-redacted, ephemeral).
- Plan — the orchestrator's live build stages (done · active · to do) with a progress bar, captured by observing its `todo` tool writes.
- Queue — pending dispatches with blast-radius tier badges.
- Gate panel (per dispatch) — verdict, round count, the appended directives (the actual "tighten"), rationale, reviewer model, final decision.
- Audit trail — the immutable log, filterable by tier/verdict/date, JSON + CSV export.
- Neckbeard — the debt ledger and protected-set violations the gate blocked.
- Metrics — first-pass worker success gate-on vs off, guideline-violation rate, gate latency p50/p95, added token cost per dispatch.
- Fail-closed counter — dispatches blocked due to reviewer error/timeout/quota, so silent degradation is visible rather than hidden.
Read API security (mirrors the Hermes web dashboard): binds loopback (127.0.0.1) by
default on port 9120, requires an ephemeral session token in the X-Gate-Session-Token
header on every /api/* route, restricts CORS to localhost origins, validates the Host
header against an allowlist (DNS-rebinding defense), redacts secrets on any config
surfaced, and refuses to bind a non-loopback host without a token.
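Two of these checks, the Host allowlist and the session token, can be sketched with the standard library alone. This is not `server/read_api.py`: the function names, the 401/403 status codes, and the plain-dict header lookup are assumptions (real HTTP header names are case-insensitive, and IPv6 hosts are not handled here).

```python
import hmac

# Assumed allowlist: loopback names only.
ALLOWED_HOSTS = {"127.0.0.1", "localhost"}


def host_allowed(host_header):
    """Accept only allowlisted hostnames (DNS-rebinding defense)."""
    name = host_header.split(":", 1)[0].strip().lower()
    return name in ALLOWED_HOSTS


def token_ok(headers, expected_token):
    """Compare the supplied token in constant time."""
    supplied = headers.get("X-Gate-Session-Token", "")
    return hmac.compare_digest(supplied.encode(), expected_token.encode())


def authorize(path, headers, expected_token):
    """Return an HTTP status for the request: 200, 401, or 403."""
    if not host_allowed(headers.get("Host", "")):
        return 403
    if path.startswith("/api/") and not token_ok(headers, expected_token):
        return 401
    return 200


headers = {"Host": "127.0.0.1:9120", "X-Gate-Session-Token": "example-token"}
print(authorize("/api/metrics", headers, "example-token"))  # 200
```

`hmac.compare_digest` is used so the comparison time does not leak how many leading characters of the token matched.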
One immutable row per dispatch (core/store.py, default core/store_sqlite.py):
{id, ts, base_prompt, added_directives, dispatched_prompt, verdict, tier, reviewer_model, decision, round_count, …} plus observability fields (latency, added tokens) and
compliance flags (fail_closed, dissent_logged, escalated, neckbeard_block). Secrets are
redacted on write (core/redact.py). UPDATE/DELETE are blocked by SQLite triggers —
append-only by construction. Exportable to JSON and CSV. Storage is behind an interface;
a Cloudflare D1 adapter is a later swap against the same seam (the seam is left, not built).
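The append-only property can be demonstrated with SQLite triggers from Python's standard library. The table below is a cut-down stand-in with three columns, and the trigger names and error message are assumptions; the real schema lives in `core/store_sqlite.py`.

```python
import sqlite3

# Illustrative schema: a tiny audit table plus triggers that abort
# any UPDATE or DELETE, leaving INSERT as the only permitted write.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit (
    id INTEGER PRIMARY KEY,
    ts TEXT NOT NULL,
    decision TEXT NOT NULL
);
CREATE TRIGGER IF NOT EXISTS audit_no_update BEFORE UPDATE ON audit
BEGIN
    SELECT RAISE(ABORT, 'audit rows are immutable');
END;
CREATE TRIGGER IF NOT EXISTS audit_no_delete BEFORE DELETE ON audit
BEGIN
    SELECT RAISE(ABORT, 'audit rows are immutable');
END;
"""


def open_store(path=":memory:"):
    """Open (or create) the store and install the append-only triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def try_mutation(conn, sql):
    """Run a statement; return the error message if the database refuses it."""
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError as exc:
        return str(exc)
    return None


conn = open_store()
conn.execute(
    "INSERT INTO audit (ts, decision) VALUES ('2024-01-01T00:00:00Z', 'dispatched')"
)
print(try_mutation(conn, "UPDATE audit SET decision = 'blocked'"))
print(try_mutation(conn, "DELETE FROM audit"))
```

Enforcing this in the database rather than in application code means a bug or a stray query in any client of the store still cannot alter history.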
hermesultracode/ # repo root = the Hermes plugin
__init__.py plugin.yaml # Hermes plugin entry: register(ctx) + manifest
core/ gate.py verdict.py tighten.py tiering.py providers.py
store.py store_sqlite.py redact.py config.py neckbeard.py
adapters/ hermes_hook.py # HermesDispatchGate: tool_request + pre_tool_call
ruleset/ neckbeard.md # vendored, MIT, no marketplace plugin/hooks
skills/ neckbeard/SKILL.md # neckbeard as an installable Hermes skill
server/ read_api.py views.py mcp_server.py __main__.py
web/ dashboard.html app.js styles.css README.md
bench/ harness.py smoke_hermes.py tasks.example.json
tests/ test_tighten.py test_failclosed.py test_tiering.py
test_provider_distinct.py test_round_cap.py test_store.py
test_gate.py test_redact.py test_neckbeard.py test_read_api.py
test_adapter.py test_bench.py test_mcp.py test_plugin.py helpers.py
config.example.json pyproject.toml README.md
| # | Criterion | Test(s) |
|---|---|---|
| 1 | Never releases without a present, parseable, passing verdict; bypass fails closed | test_failclosed.py, test_gate.py |
| 2 | Dispatched prompt contains base verbatim; append-only; grants/edits rejected | test_tighten.py |
| 3 | Reviewer provider ≠ orchestrator provider; identical config fails at startup | test_provider_distinct.py |
| 4 | Reviewer error/timeout/quota/empty → not-a-pass, fails closed per tier | test_failclosed.py |
| 5 | Round cap honored (default 2); tier-specific fallback fires | test_round_cap.py |
| 6 | Tiering classifies merge/protected/trivial correctly | test_tiering.py |
| 7 | Immutable audit row; secrets redacted; JSON + CSV export | test_store.py, test_redact.py |
| 8 | Neckbeard ruleset present + injected; extended protected set; no plugin/hooks | test_neckbeard.py |
| 9 | Dashboard views + read API security (token/CORS/Host/redaction) | test_read_api.py |
| 10 | Benchmark runs gate-on vs gate-off, emits four metrics | test_bench.py |
| 11 | Test suite passes; invariant tests present and green | python -m unittest discover -s tests |
The orchestrator and worker pool (assumed to exist on Hermes — this wraps their dispatch boundary). CI optimization, video/QA capture, post-staging log monitoring. The real task corpus. Auth beyond loopback + token + optional OAuth gate. The D1 storage adapter (the seam is left; it is not built here).




