
feat: contracted-grid auth + self-routing commands + persona persistence/cognition fixes #1726

Merged
joelteply merged 32 commits into canary from feat/persona-seed-self-heal
Jun 22, 2026

Conversation

@joelteply

Contributor

The full stretch since canary (32 commits, each individually validated). Three coherent bodies of work, all on headless Rust, all proven live where applicable.

1. Self-routing command infrastructure

DynCommand object + ActionCommand base trait → stateless self-registration, one dynamic registry (no switch/list duplication), commands/list. Procedural param adaptation + schema exposure so every interface (cu CLI, persona tools, SDKs) adapts from one source. Identity flows into Ctx (listed == callable, per identity). Pure-Rust cu start/stop + client; legacy Node start orchestrator quarantined. Docs: COMMAND-ORGANIZATION, BUILD-AND-PACKAGING.

2. The contracted grid — capability-grant auth (READY FOR COMPUTE)

End-to-end signed-grant authorization so the grid can sell compute. Proven E2E with two real airc peers (issue → present → verify → run; tier-deny holds without a grant):

3. Persona persistence + cognition (Asha, proven live)

  • Persistent identity: seed::ensure_seed self-heals the seed on every bootstrap + preserves birth time; regression test pins write-path == resumer-scan-path. Live: Asha resumes as herself (resumed_count=1, same id 90e758b2, 12 engrams intact) across a restart.
  • Reasoning separation: TextGenerationResponse.reasoning + extract_reasoning strip <think> at the adapter boundary (server reasoning_content → inline split → unclosed-runaway → empty text). Fixed the leak where the persona dumped its whole chain-of-thought; reasoning captured for the harness, room sees clean text.
  • Thinking toggle: ThinkingMode + Qwen3 /no_think soft-switch; the local unsloth reasoning gateway defaults to Suppress (env override UNSLOTH_THINKING=on). Live: Asha answers clean + correct ("144", "Blue.", "4:30pm").
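
The inline-split and unclosed-runaway cases of the reasoning separation can be sketched as below. This is a minimal illustration, not the adapter's actual code: `split_reasoning` and `Split` are hypothetical names, only the first `<think>` block is handled, and the server-side `reasoning_content` path is left out.

```rust
/// Hypothetical result shape: the text the room sees plus captured reasoning.
#[derive(Debug, PartialEq)]
pub struct Split {
    pub text: String,
    pub reasoning: Option<String>,
}

/// Split the first inline `<think>...</think>` block out of a model reply.
pub fn split_reasoning(raw: &str) -> Split {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";

    // No tag at all: the whole reply is visible text.
    let Some(start) = raw.find(OPEN) else {
        return Split { text: raw.trim().to_string(), reasoning: None };
    };
    let before = &raw[..start];
    let rest = &raw[start + OPEN.len()..];

    match rest.find(CLOSE) {
        // Closed block: reasoning is the tag body, text is everything around it.
        Some(end) => {
            let after = &rest[end + CLOSE.len()..];
            Split {
                text: format!("{before}{after}").trim().to_string(),
                reasoning: Some(rest[..end].trim().to_string()),
            }
        }
        // Unclosed runaway: everything after the tag is treated as reasoning,
        // so the visible text is only what preceded the tag (usually empty).
        None => Split {
            text: before.trim().to_string(),
            reasoning: Some(rest.trim().to_string()),
        },
    }
}
```

Under these assumptions a reply of `<think>12*12</think>144` yields the visible text `144`, with the reasoning kept separately for the harness.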

Validation

Workspace cargo check clean; touched-module sweep green (ai::openai_adapter 11, routing::grid_capability 5, epoch_watermark 4, persona::seed 8, citizen_path 6, grant_issuance 2, grid_trust_policy 7, command_handler 15, …); E2E + persona integration tests green; all three live behaviors proven on the rebuilt core.

🤖 Generated with Claude Code

joelteply and others added 30 commits June 21, 2026 16:39
…outing foundation

The routing-side erasure + the first base-trait shape, so a command becomes a
self-contained routable object and a command author writes only a `run` body.

- `DynCommand`: object-safe, type-erased command the kernel can hold in a flat
  `name -> Arc<dyn DynCommand>` map and route to directly (no per-module match
  arm, no prefix double-routing). Blanket impl makes EVERY `CommandHandler` a
  `DynCommand` for free; `invoke` delegates to the existing `dispatch`, so the
  routing side and the typed authoring side share one `CommandSpec` and can't
  drift.
- `ActionCommand`: fire-and-forget verb shape with blanket `CommandSpec` (Bare
  wire) + `CommandHandler` impls. Implementing the shape IS implementing the
  command — the chain `ActionCommand ⟹ CommandSpec ⟹ CommandHandler ⟹
  DynCommand` means declare the shape, get the routable object. Cross-cutting
  policy (`ACCESS` default AiSafe) is declared per command, not re-implemented.
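
A rough sketch of the blanket-impl idea, heavily simplified: the intermediate `CommandSpec` and `CommandHandler` layers are collapsed into one step, params and results are plain strings rather than typed JSON, and the signatures here are assumptions for illustration, not the real trait definitions.

```rust
use std::sync::Arc;

/// Typed authoring side (simplified): the author writes only a `run` body.
pub trait ActionCommand: Send + Sync + 'static {
    const NAME: &'static str;
    fn run(&self, params: &str) -> Result<String, String>;
}

/// Object-safe, type-erased side the kernel can hold in a flat map.
pub trait DynCommand: Send + Sync {
    fn name(&self) -> &'static str;
    fn invoke(&self, params: &str) -> Result<String, String>;
}

/// Blanket impl: declaring the shape yields the routable object for free.
impl<T: ActionCommand> DynCommand for T {
    fn name(&self) -> &'static str {
        T::NAME
    }
    fn invoke(&self, params: &str) -> Result<String, String> {
        // Delegates to the typed side, so the two sides cannot drift.
        self.run(params)
    }
}

/// A stateless, ping-shaped command that captures no dependencies.
pub struct Ping;

impl ActionCommand for Ping {
    const NAME: &'static str = "ping";
    fn run(&self, _params: &str) -> Result<String, String> {
        Ok(r#"{"ok":true}"#.to_string())
    }
}

/// Erase the concrete type, as a registry would before storing the command.
pub fn erased_ping() -> Arc<dyn DynCommand> {
    Arc::new(Ping)
}
```

The caller of `erased_ping()` sees only `dyn DynCommand`, which is what lets the kernel route by name without a per-module match arm.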

Validated against two outliers in isolation (not yet wired into the executor
hot path): a stateless action (ping-shaped, captures no deps) and a stateful,
dep-holding action (owns an Arc'd counter, tightens ACCESS to Privileged) —
both route identically through the type-erased object. Error-mapping at the
erased boundary preserved (bad params → named `invalid` refusal).

Anchoring design: docs/architecture/COMMAND-ORGANIZATION.md — self-routing map
(typed-path-wins, prefix/ServiceModule fallback during migration), composition
via `ctx.call` through the same chain, and machine+environment-agnostic
execution (cross-tower routing + `Provided` adapters) with latency as a
first-class constraint.

Slice 1 of #42. Next: boot-time command_map + executor consult (typed path
wins, fallback preserved), then QueryCommand/CrudCommand/SessionCommand.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…re-Rust `cu` client

Slice 2 of #42 — the DynCommand object map is now consulted on EVERY live
dispatch path, and `ping` is migrated end-to-end onto it (proven via a pure-Rust
`cu ping`, no Node).

Runtime wiring (typed-path-wins, prefix/ServiceModule fallback preserved):
- `ServiceModule::commands()` — default-empty hook; a module contributes its
  self-routing DynCommand objects (each owns its deps), so the kernel routes a
  name straight to the object with no per-module match arm.
- `ModuleRegistry` — `command_objects: name -> Arc<dyn DynCommand>` map, populated
  at register() from each module's commands(), with a duplicate-name panic
  (the registry is the backstop for the collision check that the "no central list" design removes). New
  `route_object()` (O(1), lock-free) + `list_command_objects()`.
- `dispatch_object_with_panic_guard()` — catch_unwind guard for object dispatch,
  mirroring the module path (persona tool calls converge here).
- Consult added to ALL three live paths: `CommandExecutor::execute_inner`,
  `Runtime::route_command` (the IPC/`cu` socket route), and `route_command_sync`
  (rayon). Object map wins before prefix routing. (Unifying these three into one
  path is the COMMAND-ORGANIZATION.md follow-up.)
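
The typed-path-wins rule with a duplicate-name panic might look roughly like this. It is a sketch under stated assumptions: `Registry`, its `dispatch` helper, and the closure-based fallback are hypothetical stand-ins for `ModuleRegistry` and the prefix/ServiceModule router, and a plain `HashMap` stands in for whatever lock-free structure the real registry uses.

```rust
use std::collections::HashMap;
use std::sync::Arc;

pub trait DynCommand: Send + Sync {
    fn invoke(&self, params: &str) -> Result<String, String>;
}

#[derive(Default)]
pub struct Registry {
    command_objects: HashMap<&'static str, Arc<dyn DynCommand>>,
}

impl Registry {
    /// Register a command object; a second registration of a name panics.
    pub fn register(&mut self, name: &'static str, cmd: Arc<dyn DynCommand>) {
        if self.command_objects.insert(name, cmd).is_some() {
            panic!("duplicate command name: {name}");
        }
    }

    /// O(1) lookup of the type-erased object by name.
    pub fn route_object(&self, name: &str) -> Option<Arc<dyn DynCommand>> {
        self.command_objects.get(name).cloned()
    }

    /// Typed object path wins; otherwise fall back to the legacy router.
    pub fn dispatch(
        &self,
        name: &str,
        params: &str,
        fallback: impl Fn(&str, &str) -> Result<String, String>,
    ) -> Result<String, String> {
        match self.route_object(name) {
            Some(cmd) => cmd.invoke(params),
            None => fallback(name, params),
        }
    }
}

/// Illustrative command object.
pub struct Ping;

impl DynCommand for Ping {
    fn invoke(&self, _params: &str) -> Result<String, String> {
        Ok("pong".to_string())
    }
}
```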

ping migrated: `PingCommand` is now an `ActionCommand` (one type + a `run` body;
CommandSpec/CommandHandler/DynCommand all blanket-derived), removed from
HealthModule's command_prefixes and match arm, exposed via commands(). Off the
prefix table, onto the typed object map.

`cu` — the pure-Rust CLI client (`src/bin/cu.rs`), replacing the legacy Node
`./jtag`. `cu <command> [json]` dispatches through the SAME uniform Connection
every client uses (CLI/persona/web/mobile), over the core IPC socket via
CoreIpcTransport (same transport as continuum-mcp). No tsx, no bundle, no Node.

Validated live: built the core + cu directly with cargo (no npm start), ran the
core on its socket, `cu ping` → {"ok":true,"roundTripMs":0} through route_object.
Plus 10 unit/integration tests green (blanket-chain outliers, registry routing,
executor + health typed-path).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…start is THE start

Move-first excision of the Node `npm start` poison (Joel: "move shit first,
compilation blows up, makes it easy to find all the smell").

- `git mv tools/scripts/parallel-start.sh → legacy/node-startup/`. `legacy/` is
  NOT a cargo workspace member, NOT referenced by any npm script / Dockerfile /
  CI workflow, with a README marking it dead and off-limits to editing.
- Both `start` scripts now point at the EXISTING pure-Rust `start-server.sh`
  (root package.json already did; src/package.json was the poison path →
  parallel-start.sh). Dropped the `desktop:legacy` pointer. Verified no live
  (non-comment) consumers of parallel-start.sh remain.
- `start-server.sh` now also builds the `cu` CLI client alongside continuum-mcp,
  so the headless start produces core + mcp + cu — pure Rust, no Node.
- .gitignore: add `tools/models/` (the current voice/avatar model download path;
  the workers→core/tools restructure left the old `src/workers/models/` rule
  stale, so large model binaries were no longer ignored).

`npm start` (from root or src) is now the headless Rust core via cargo run; the
Node orchestrator that broke on stale `cd workers` / scene-gen is out of the path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…deterministic)

The "working persona on the clean command infra" regression guard, with zero live
deps (no inference, no airc, no models). A persona's CommandToolExecutor routes a
`ping` tool call through the uniform Connection → InProcessTransport →
CommandExecutor → execute_inner → route_object → the ping DynCommand (migrated via
ActionCommand, off the prefix table), and the bare PingResult comes back.

Proves the command-infra cleanup didn't break the persona's ability to ACT, and
that the self-routing typed path serves personas — not just internal callers. The
existing suite covered the prefix path (test/echo); this covers the new object
path end-to-end on the persona's real dispatch route.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ommands

`cu` is now the reliable upstart, not just the client (Joel: "make it cu",
"reliable upstart", "cu-driven start"):

- `cu start` — locate `tools/scripts/start-server.sh` (env override or walk up
  from cwd), spawn it in its OWN session (setsid) so the core outlives the CLI,
  log to /tmp/continuum-core-start.log, write a pidfile, and poll `ping` until the
  core is ready (or fail loud with the log tail). Idempotent: no-op if a core
  already answers. start-server.sh stays the pure-Rust implementation detail
  (cargo run, per-platform GPU features, no Node).
- `cu stop` — SIGTERM the recorded process group (setsid made the core a group
  leader, so cargo + core are reaped together), pkill fallback if no pidfile,
  remove the socket.
- `cu <command> [json]` — unchanged dispatch through the uniform Connection.

Validated live: `cu stop` (clean) → `cu start` (core ready in 28s, detached) →
`cu ping` {"ok":true} → `cu start` idempotent ("already running"). No npm, no
Node, no manual launch/poll dance.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…sting/packaging)

The strict, design-first contract that makes startup + testing reliable (the two
weakest links) so personas-on-airc is a dependable foundation to build clients on
and roll out as dockerized/k8s nodes (Joel: "adherence to strict principles and
design first").

Defines: the foundation thesis (install → start → personas on airc → iterate WITH
them → clients on top → dockerized nodes → k8s); 8 strict principles (one
pure-Rust startup, headless core + equal clients, modular units = build units =
containers, deterministic layered testing, no Node in the foundation, move-first
excision, and a SINGLE DYNAMIC command surface — cu calls every command, no
duplicated lists / switch-on-name); the modular unit table (core/mcp/cu/inference/
livekit/unsloth/clients); the cu-driven startup; the three test layers; the
Docker/k8s rollout shape (existing compose + per-unit Dockerfiles → continuum node
as the k8s unit); the Node boundary (web client only); status + next slices.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ts (13GB) were unignored

The cargo workspace root is the repo root, so `cargo run --manifest-path
core/continuum-core/...` (start-server.sh / cu start) builds into /target — and
.gitignore had NO `target` pattern at all (the workers→core restructure left it
uncovered), so 13GB of build artifacts were stageable. Add /target/ and
**/target/. Canonical build target stays $HOME/.continuum/cache/cargo-target.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…emony, single-source discovery)

Kills the "every command needs a host module to expose it" friction and makes the
command catalog dynamically discoverable from the ONE registry.

- `register_stateless_command!(T)` — a stateless command (no deps, `Default`)
  self-registers BOTH its static descriptor AND a runtime constructor via
  inventory. `ModuleRegistry::new()` seeds the typed object map from these
  (`stateless_command_objects()`), so the command is live on the typed path with
  NO host module, NO `commands()` override, NO match arm. Dep-holding commands
  still come from a module's `commands()` (their deps must be constructed).
  Duplicate-name panic guards both paths.
- `commands/` tree (per COMMAND-ORGANIZATION.md): self-contained command files,
  no central list. First inhabitant: `commands/catalog.rs`.
- `commands/list` — dynamic, single-source command discovery: returns a snapshot
  of `command_registry()` (name, description, access, wire, params type), optional
  name filter. Clients/trays/cu never hardcode a catalog — they call this and
  adapt. It's itself a zero-ceremony stateless command (dogfoods the mechanism).
- `ping` migrated to `register_stateless_command!` — dropped HealthModule's
  `commands()` override; ping is now a pure stateless command, no ceremony.

ts-rs bindings generated (protocol/typescript/commands/). 14 tests green incl.
the catalog self-listing + filter, ping still routing via the typed object map,
and the persona executing ping through it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…, no per-command code

The CLI edge of the uniform param-adaptation principle (Joel: "ideally it
naturally translates or adapts all params, so meet humans or AIs in the middle at
every interface… automatically though, not switch statements, procedural").

`cu <command> [args]` adapts params with ONE generic rule for ALL commands — never
a per-command switch:
- nothing → `{}`
- a single positional JSON object/array → verbatim (the AI / tool-call path)
- `--key value` / `--flag` → a JSON object built by one loop: keys normalized
  kebab/snake → camelCase (`--round-trip-ms` → `roundTripMs`, matching the
  canonical wire fields), values coerced by trying JSON first (`5`→number,
  `true`→bool, `{…}`→object) then falling back to string, bare flag → true.

So a human types `cu ping --message hi` and the typed command receives
`{"message":"hi"}`; an AI sends the JSON object directly; both hit the same
command. Schema-AWARE coercion/validation lands when the registry exposes param
JSON schemas via commands/list — same single source, every interface adapts.
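
The single generic rule can be sketched without any per-command code. This is an approximation, not the `cu` source: `Value`, `camel_case`, `coerce`, and `adapt` are illustrative names, the "try JSON first" step is reduced to bool/number detection (no objects or arrays), the positional JSON path is omitted, and the `--key=value` form is included as an assumption about a reasonable CLI.

```rust
/// Simplified stand-in for a JSON value (objects/arrays omitted).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Number(f64),
    Text(String),
}

/// `--round-trip-ms` or `round_trip_ms` becomes `roundTripMs`.
pub fn camel_case(key: &str) -> String {
    let mut out = String::new();
    let mut upper_next = false;
    for ch in key.trim_start_matches('-').chars() {
        if ch == '-' || ch == '_' {
            upper_next = true;
        } else if upper_next {
            out.extend(ch.to_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}

/// Try bool, then number, then fall back to a string.
pub fn coerce(raw: &str) -> Value {
    if let Ok(b) = raw.parse::<bool>() {
        return Value::Bool(b);
    }
    // Require a digit so words like "inf" or "nan" stay strings.
    let looks_numeric = raw.chars().any(|c| c.is_ascii_digit());
    match raw.parse::<f64>() {
        Ok(n) if looks_numeric => Value::Number(n),
        _ => Value::Text(raw.to_string()),
    }
}

/// One loop for every command: `--key value`, `--key=value`, bare `--flag`.
pub fn adapt(args: &[&str]) -> Vec<(String, Value)> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < args.len() {
        let arg = args[i];
        i += 1;
        // Non-flag positionals are skipped in this sketch.
        let Some(flag) = arg.strip_prefix("--") else { continue };
        if let Some((key, value)) = flag.split_once('=') {
            out.push((camel_case(key), coerce(value)));
        } else if i < args.len() && !args[i].starts_with("--") {
            out.push((camel_case(flag), coerce(args[i])));
            i += 1;
        } else {
            out.push((camel_case(flag), Value::Bool(true)));
        }
    }
    out
}
```

With this rule `adapt(&["--message", "hi"])` produces the single pair `("message", Text("hi"))`, which corresponds to the `{"message":"hi"}` object described above.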

Validated live: `cu ping --message hi` → {"ok":true}; `cu commands/list --filter
commands/` → the live catalog. 2 adapter unit tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e adapts (symmetry)

Each command's params now carry a JSON Schema derived AUTOMATICALLY from the Rust
type, so every interface handles any command from the one canonical schema — "all
SDKs automatically handle the rust command, across environments, symmetry" (Joel).
It's adapters everywhere over one source, thin code each.

- `CommandSpec::params_schema()` — provided method, default `Null`. The base
  traits override it to derive the schema via `schemars` (ActionCommand::Params:
  JsonSchema). So a command declared (or ported) onto a base trait gains a real
  schema with ZERO extra code; manual CommandSpec impls stay `Null` until migrated
  — breakage-free. CommandDescriptor carries `params_schema`.
- The adapters (one schema → each paradigm):
  - AI / RAG: `persona_tools` projects the schema into the tool `input_schema`
    (was an open object — the reasoner now sees real fields).
  - cu / CLI: `cu <cmd> --help` renders the manual as bash flags (property →
    --kebab, type, description, required) from the same schema — "the manual
    matches the paradigm." Plus the existing procedural `--key value` adapter.
  - web / mobile / RAG: `commands/list` returns `paramsSchema` to build forms /
    tool-schemas — single source, no per-command code.
- schemars dep added (uuid1). PingParams/CommandsListParams/EchoParams derive
  JsonSchema.

Validated live: `cu commands/list --filter ping` returns the derived schema
(Option<String> → ["string","null"], doc-comments as descriptions); `cu ping
--help` renders `--message <string>`. 12 tests green (schema projection, catalog,
cu adapters + help renderer).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tity, every interface

Threads the authenticated caller through the typed dispatch path so a command can
gate/scope/compose BY identity — and makes discovery (commands/list) honor it, so
"what's listed == what you can call" holds at the CLI and persona, matching the
call gate that was already enforced. Cross-grid identity (airc-verified sender)
keeps flowing the same way.

- `caller_trust(caller)` — ONE source for the caller→trust rule (local/substrate →
  Owner; airc-sourced → Provisional ceiling). `GridTrustAuthPolicy::gate` refactored
  to use it (behavior preserved; tests green) so the gate and any trust-aware
  consumer can't drift.
- `Ctx.caller: Option<CallerIdentity>` threaded via `dispatch_with_caller` →
  `DynCommand::invoke(params, caller)` → `dispatch_object_with_panic_guard`. The
  executor passes the identity it just gated on (persona / cross-grid airc sender);
  local in-process + IPC pass `None` (owner). Module `handle_command` path
  unchanged (legacy, owner-local).
- `commands/list` filters by `caller_trust(ctx.caller)` + `is_command_authorized` —
  the SAME rule the gate uses. Local owner sees all; a Provisional persona/peer sees
  only its authorized surface. Test: provisional ⊆ owner, and every listed command
  is callable at its trust.
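
A minimal sketch of "one rule for both the gate and discovery", under simplifying assumptions: each command carries a hypothetical `min_trust` field in place of the real `ACCESS` levels, and `Descriptor`, `is_authorized`, and `list_for` are illustrative names rather than the actual `is_command_authorized` and `commands/list` implementations.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Trust {
    Blocked,
    Provisional,
    Trusted,
    Owner,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallerSource {
    Local,
    Airc,
}

#[derive(Debug, Clone)]
pub struct CallerIdentity {
    pub source: CallerSource,
    pub peer_id: String,
}

/// The single caller-to-trust rule: no caller or a local caller is the owner,
/// an airc-sourced caller gets the Provisional ceiling.
pub fn caller_trust(caller: Option<&CallerIdentity>) -> Trust {
    match caller.map(|c| c.source) {
        None | Some(CallerSource::Local) => Trust::Owner,
        Some(CallerSource::Airc) => Trust::Provisional,
    }
}

/// Hypothetical descriptor: `min_trust` stands in for the real ACCESS level.
pub struct Descriptor {
    pub name: &'static str,
    pub min_trust: Trust,
}

/// Shared authorization check used by both the gate and the listing.
pub fn is_authorized(cmd: &Descriptor, trust: Trust) -> bool {
    trust >= cmd.min_trust
}

/// Discovery filtered by the same rule, so listed == callable.
pub fn list_for(all: &[Descriptor], caller: Option<&CallerIdentity>) -> Vec<&'static str> {
    let trust = caller_trust(caller);
    all.iter()
        .filter(|d| is_authorized(d, trust))
        .map(|d| d.name)
        .collect()
}
```

Because the listing calls the same `caller_trust` and `is_authorized` functions the gate would, the Provisional surface is a subset of the owner surface by construction.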

Status: identity is now available in Ctx for handlers and gates discovery. Full
composition-propagation (a handler re-dispatching via `ctx.call` as the same
caller) is the next step — the caller is now in Ctx to enable it. Gating the local
IPC path through the executor is NOT needed for correctness (local == owner by
policy); it'd only matter for scoped local identities (future).

19 tests green (gate refactor, identity-gated list, command/handler/catalog/persona).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ty-propagation status

A command composing another propagates the ORIGINAL caller via
`execute_with_caller(sub, params, ctx.caller.clone())`, and the gate enforces that
caller's trust on the sub-call. New test `composed_call_propagates_caller_no_
escalation`: an airc/Provisional caller composing into `data/delete` (Owner-only) is
gate-FORBIDDEN; the local owner passes the gate — no escalation, identity flows
through composition (and, by the same mechanism, across the grid via the airc-
verified caller).

COMMAND-ORGANIZATION.md updated to state the real status: identity propagation
works today (ctx.caller + execute_with_caller); the typed `ctx.call::<C>(p)` sugar
(an executor handle on Ctx so a handler can't forget to pass ctx.caller) is the
remaining ergonomic follow-up on this foundation.

19 executor tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…y command

`system/info` (version + pid) as a stateless ActionCommand in its own file:
register_stateless_command! and it's instantly callable via cu/persona/SDKs with a
derived param schema + ACL gating, no wiring elsewhere. The "minimal code per
command" shape the ported catalog will follow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…composition test, schema/edge notes

Two adversarial reviewers (security + correctness) audited the branch. Security
verdict: NO exploitable escalation; gate refactor behavior-preserving. Fixes for
the actionable findings:

- H3 (perf): `command_registry()` was rebuilding the descriptor Vec + running
  schemars reflection (schema_for!) for EVERY command on EVERY call — so
  commands/list and the persona tool surface were O(commands × reflection) per
  call. Now built once into a OnceLock and cloned out.
- H1 (cu correctness): `cu cmd --key=value` was mis-parsed into a junk
  `{"key=value": true}` key. Now splits on the first `=` (both `--key value` and
  `--key=value` work). Extracted `coerce()`. Test added.
- L1 (test): the composition test only exercised the gate. Replaced with a REAL
  composing handler (`Composer: ActionCommand`) that composes `data/delete` with
  `ctx.caller.clone()` — proving identity propagates through a handler and an
  airc/Provisional caller can't escalate (owner can).
- L2 (test): commands/list identity-gating test now asserts the Provisional surface
  is non-empty (subset check no longer vacuous) and ≤ owner surface.
- M3 (schema): cu `--help` renders a nested-type `$ref` as its type name instead of
  `<value>`; persona_tools tool_input_schema_from carries a TODO for nested
  $defs/$ref (latent — all current params are flat).
- M2 (doc): caller_trust carries a TODO that every airc caller maps to Provisional
  (Blocked peers not yet distinguished — needs the airc↔grid trust bridge).
- C1 (doc): the typed object path on the IPC route deliberately bypasses per-MODULE
  metrics/concurrency (objects are module-independent); documented at the site +
  flagged per-command observability as the command-framework's slice.
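
The H3 fix (build once, clone out) follows a standard `std::sync::OnceLock` pattern. The sketch below assumes a simplified `CommandDescriptor` and a hypothetical `build_registry` in place of the real inventory walk and schemars reflection; the `BUILDS` counter exists only to make the single build observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

#[derive(Debug, Clone, PartialEq)]
pub struct CommandDescriptor {
    pub name: &'static str,
    pub params_schema: String,
}

/// Counts how often the expensive build runs (illustration only).
pub static BUILDS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the expensive step: descriptor collection plus schema reflection.
fn build_registry() -> Vec<CommandDescriptor> {
    BUILDS.fetch_add(1, Ordering::SeqCst);
    vec![CommandDescriptor {
        name: "ping",
        params_schema: r#"{"type":"object"}"#.to_string(),
    }]
}

/// Built on first use, then cloned out on every later call.
pub fn command_registry() -> Vec<CommandDescriptor> {
    static CACHE: OnceLock<Vec<CommandDescriptor>> = OnceLock::new();
    CACHE.get_or_init(build_registry).clone()
}
```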

Not fixed here (reported, tracked): TCP IPC listener treats remote connections as
local Owner (pre-existing CRITICAL, config-gated to 127.0.0.1 by default — needs a
non-Owner caller for TCP-sourced requests); composition propagation is
author-discipline (the `ctx.compose` helper that forces it is the next slice);
AllowAllPolicy default (one refactor from bypass — consider GridTrust default).

55 tests green (52 lib + 3 cu).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nticated-Owner hole

Adversarial security review found: the TCP IPC listener funneled into the same
`handle_client` → `route_command(caller=None)` = local Owner, UNGATED. With the
Docker `0.0.0.0` bind, anyone who could open the port got unauthenticated Owner
command execution (data/delete, grid/trust, …). Pre-existing, but `route_object`
now rides that path too. Closed:

- New `CallerSource::Tcp` (honest provenance — an unauthenticated remote socket,
  distinct from airc's verified envelope) + `CallerIdentity::tcp(peer_id)`.
- `caller_trust(Tcp)` = Provisional ceiling (remote, never Owner) — same one-source
  rule the gate uses. So TCP can run the AiSafe surface + ai/generate but is
  FORBIDDEN from every Owner-gated command.
- `handle_client` now takes the connection's `caller`: the Unix socket passes
  `None` (owner-by-locality — the operator on the box), the TCP listener stamps
  `CallerIdentity::tcp(nil)`. A boundary ACL-gate (`caller_trust` +
  `is_command_authorized`) refuses Owner-gated commands for remote callers before
  dispatch. The caller is also threaded into `Runtime::route_command` so the typed
  object path / composition sees the REMOTE identity (no escalation via a composing
  command over TCP), not silently Owner.

Unix-socket behavior unchanged (local owner). Test: `tcp_caller_is_remote_not_owner`
(Provisional, ai/generate allowed, data/delete|grid/trust|grid/pair forbidden).
58 lib tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…open-TCP residual

Second adversarial pass on the TCP fix: verdict = Owner-execution hole genuinely
closed, no bypass. Addressing the two residual risks it surfaced:

- Default-AiSafe migration footgun: destructive `data/*` are safe only because
  unregistered (unclassified→Owner default-deny). `ActionCommand` defaults ACCESS
  to AiSafe, so migrating one to a command object and forgetting
  `const ACCESS = Privileged` would silently expose it at Provisional (i.e. over
  TCP / to cross-grid peers). New regression test
  `destructive_data_commands_stay_owner_only` (data/delete|update|truncate|
  clear-all) trips CI if that ever happens.
- Open-TCP residual: documented at the TCP listener that the Provisional AiSafe
  surface (arbitrary data/list reads, chat/send writes, ai/generate) is reachable
  UNauthenticated over a non-loopback bind. TODO(authenticated-tcp): shared-secret /
  signed handshake (+ optional sub-Provisional read-only ceiling) before relying on
  0.0.0.0; pairs with the airc↔grid per-peer trust bridge. Until then: don't bind
  0.0.0.0 on an untrusted network.

acl tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…(NodeRegistry is wrong key space)

The airc↔grid trust bridge mechanism (task #38): a gate that resolves a remote
caller's REAL grid TrustLevel instead of the flat Provisional ceiling — built as a
validated SEAM, but deliberately NOT wired in production yet.

- `PeerTrustSource` trait (airc peer_id → TrustLevel) — the abstraction the gate
  depends on, so it's not coupled to any concrete store; mock-tested.
- `GridTrustAuthPolicy::with_trust_source(..)` + `resolve_trust`: a remote
  (Airc/Tcp) caller's registered trust CAPPED at Trusted (REMOTE_TRUST_CEILING —
  Owner is local-only, a remote peer can never reach Owner-gated commands);
  Blocked → denied; unknown → Provisional. `new()` keeps the flat ceiling.
- Test `per_peer_trust_bridge_blocks_blocked_and_caps_remote_at_trusted` proves the
  logic with a mock source: Blocked denied everything, Trusted graduated but
  data/delete still local-only, Owner-registered peer capped at Trusted, unknown →
  Provisional.
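
The resolution rule the test describes can be sketched as a small pure function. The method name `trust_of`, the `MockSource` type, and the use of `None` to mean "deny" are assumptions made for this illustration; only `PeerTrustSource`, `TrustLevel`, and `REMOTE_TRUST_CEILING` are names taken from the commit message.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TrustLevel {
    Blocked,
    Provisional,
    Trusted,
    Owner,
}

/// Owner is local-only, so remote callers are capped one level below it.
pub const REMOTE_TRUST_CEILING: TrustLevel = TrustLevel::Trusted;

/// Abstraction the gate depends on (airc peer_id to registered trust).
pub trait PeerTrustSource {
    fn trust_of(&self, peer_id: &str) -> Option<TrustLevel>;
}

/// Map-backed mock, in the spirit of the mock-tested seam.
pub struct MockSource(pub HashMap<String, TrustLevel>);

impl PeerTrustSource for MockSource {
    fn trust_of(&self, peer_id: &str) -> Option<TrustLevel> {
        self.0.get(peer_id).copied()
    }
}

/// Returns `None` when the caller must be denied outright.
pub fn resolve_remote_trust(source: &dyn PeerTrustSource, peer_id: &str) -> Option<TrustLevel> {
    match source.trust_of(peer_id) {
        // A Blocked peer is denied everything.
        Some(TrustLevel::Blocked) => None,
        // A registered peer keeps its level, capped at the remote ceiling.
        Some(level) => Some(level.min(REMOTE_TRUST_CEILING)),
        // An unknown peer falls back to the flat Provisional default.
        None => Some(TrustLevel::Provisional),
    }
}
```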

WHY NOT WIRED: adversarial self-review caught that the grid `NodeRegistry` is keyed
by transport ADDRESS (`address_to_node_id` → Tailscale IP / Reticulum hash), NOT by
the airc `peer_id` the `CallerIdentity` carries — different identity spaces. Wiring
it would silently no-op (every airc caller → "unknown" → Provisional) AND mislead
(grid/trust by address wouldn't gate airc callers). So `NodeRegistry` does NOT impl
`PeerTrustSource`, and the IPC gate keeps `new()` (flat ceiling — honest, zero
behavior change). The seam activates when a real peer_id-keyed airc trust source
exists — the airc↔grid identity unification (task #38).

Net: the gate's behavior is unchanged in production; the bridge is a tested seam
ready for the airc-side trust source. 12 trust/acl tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…y unification (design)

The keystone design (Joel: "unification is everything, do it right"): identity,
authorization, and the grid economy unified into ONE cryptographically-signed
object — airc's `grid_auth` SignedCapabilityGrant / SignedMeshMembership.

A peer doesn't assert who/what it is — it PRESENTS a grant the owner signed, and
the executing node VERIFIES it (issuer-pin → sig → key-binding → mesh → expiry,
stateless) and authorizes iff `grant.grants(command)` (capabilities use the SAME
vocabulary as command names). This DISSOLVES the two-identity-space problem (the
grant binds peer_id + pubkey, verified against the owner's key — no shared trust
store, no address↔peer mismatch) and IS the contracted/for-sale grid (a paid grant
= capabilities + expiry, signed, revocable by epoch).

Specifies: the model (membership→tier→ACL + capability→grants(command), one
verifier); the airc primitives (have, public: grid_auth); the continuum gate
integration (verify on dispatch, Owner stays local-only, composition propagates the
verdict); why it's the identity unification; issuance + transport + consumer-side
epoch anti-replay; 3-phase plan (membership-tier → capability grants → economy);
the cross-repo split (airc: envelope transport + issuance; continuum: verifying
gate + epoch store + capability map); open questions.

The continuum gate seam (GridTrustAuthPolicy/resolve_trust/cap) stays valid as the
gate shape — this is what it verifies against. Next: Phase 1 (verify
SignedMeshMembership → tier), a joint continuum+airc slice.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…cted-grid gate core)

Phase 2 core of the identity unification (docs/grid/GRID-CAPABILITY-AUTH.md): the
continuum-side engine that verifies an airc `SignedCapabilityGrant` and authorizes
a command from it. Identity + authorization + contract = one signed object.

- `Ed25519GrantVerifier`: impl of airc's `grid_auth::GrantVerifier` using the
  substrate's ed25519 (verify_strict — same primitive as L1-6 envelope sigs).
- `GrantAuthorizer::authorize_command(signed, presenting_pubkey, command, now)`:
  verify via grid_auth (issuer-pin → sig → key-binding → mesh → expiry, stateless)
  → consumer-side epoch anti-replay (reject a superseded lower epoch; revocation =
  higher-epoch empty-caps grant) → `grant.grants(command)`. Returns a TYPED
  `GrantAuthOutcome` (Authorized / Invalid(GrantVerdict) / Superseded / NotGranted)
  so the gate + audit see exactly why.
- The capability vocabulary IS the command vocabulary (`grants("ai/generate")`) —
  no parallel namespace. Owner-gated commands are never delegated (a grant confers
  only its named capabilities).

This is the verification CORE — the heart of the contracted/for-sale gate. It's a
tested SEAM, NOT yet wired to live dispatch: the airc command envelope doesn't carry
grants yet (the airc-side transport slice). When it does, CommandRequestHandler
extracts the grant + presenting key and calls authorize_command from the gate.

4 tests green: valid-grant-authorizes (+ NotGranted for others), typed rejections
(UntrustedIssuer/BadSignature/KeyMismatch — stolen grant can't ride another peer),
epoch anti-replay + revocation, and the REAL ed25519 signature verify + tamper
rejection.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ngine

Adversarial review of be5254d (verdict: crypto/trust-roots/key-binding sound;
not exploitable today — unwired). Must-fixes folded:

- TOCTOU (2.1): epoch check + advance are now ONE atomic `entry()` critical
  section — a superseded epoch can't pass its check while a higher epoch commits
  in the gap. Multi-thread stress test (gated `stress-tests`) proves monotonicity
  under concurrent same-grantee presentation.
- Revocation actually revokes (2.2): the watermark advances on ANY valid grant
  (latest-epoch-authoritative, airc's model), so a higher-epoch empty-caps grant
  supersedes the old real-caps grant. Fixed the test that had enshrined the broken
  behavior — it now asserts the revoked grant returns Superseded.
- Boundary-aware capability match (4): `confers()` matches exact OR on a `/`
  boundary (`ai/generate` confers `ai/generate/stream`, NOT `ai/generatex`) —
  consistent with the command-ACL's prefix rules, never a bare starts_with.
- Test integrity (7.1): verifier is injectable (`with_verifier`); tests now drive
  the REAL `authorize_command` with a stub (no duplicated logic). Added
  malformed-proof reject vectors (wrong-length key/sig).
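
Two of the fixes above lend themselves to a short sketch: the boundary-aware capability match and the single-critical-section epoch check. This is illustrative only. `EpochWatermark` here is a `Mutex<HashMap>` stand-in (the real store type is not shown in this message), and treating an equal epoch as acceptable is an assumption.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Exact match, or a match on a `/` boundary; never a bare `starts_with`.
pub fn confers(capability: &str, command: &str) -> bool {
    command == capability
        || command
            .strip_prefix(capability)
            .map_or(false, |rest| rest.starts_with('/'))
}

/// Latest epoch seen per grantee.
#[derive(Default)]
pub struct EpochWatermark {
    latest: Mutex<HashMap<String, u64>>,
}

impl EpochWatermark {
    /// Check and advance under one lock, so a superseded epoch cannot pass its
    /// check while a higher epoch commits in between. Returns `false` when the
    /// presented epoch is lower than the recorded watermark.
    pub fn check_and_advance(&self, grantee: &str, epoch: u64) -> bool {
        let mut map = self.latest.lock().unwrap();
        let slot = map.entry(grantee.to_string()).or_insert(epoch);
        if epoch < *slot {
            return false;
        }
        // Any valid grant advances the watermark, including an empty-caps
        // revocation grant at a higher epoch.
        *slot = epoch;
        true
    }
}
```

Once epoch 5 has been presented for a grantee, re-presenting epoch 3 returns `false`, which is the behavior the revocation test relies on.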

Hard gates documented before live wiring (2.3/5.1/3.1): persist + bound the epoch
watermark (volatile/unbounded today), and the presenting key MUST come from the
authenticated sender, never the grant body.

6 tests green (5 + stress).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…transport

Adopts airc#1276 (the capability-grant transport continuum's GrantAuthorizer
verifies against): grid_auth::SignedCapabilityGrant::sign, Airc::peer_public_key,
HEADER_AIRC_CAPABILITY_GRANT. Pulls the ~50-commit canary delta since 72824ba —
the ai/generate 5090 compute-lease facility (#1242), the ai/embedding grid
facility (#1239-1241), TranscriptKind::ChannelPurposePublished + channel_purpose
(the typed room-purpose seam for RoomPurposeSource), relay self-election +
stream-plane crypto, and StatusResponse.connected_lan_peers.

ABI deltas are additive to types continuum only decodes (connected_lan_peers is
#[serde(default)]; the new TranscriptKind variant has no exhaustive match against
it), so the bump is decode-compatible.

Validated: cargo check -p continuum-core --features metal,accelerate clean;
routing::{grid_capability, grid_trust_policy, command_handler} tests green
(5 + 5 + 13).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The gate-side half of the capability-grant wiring — the receiving seam a verified
grant flows into, built + tested independently of the airc producer.

- CallerIdentity gains `granted_capabilities`: the capability tags a transport
  boundary CRYPTOGRAPHICALLY VERIFIED for this dispatch (conferred by an
  owner-signed SignedCapabilityGrant, populated ONLY after
  GrantAuthorizer::authorize_command returns Authorized against the authenticated
  sender key). Default empty; `with_granted_capabilities` builder for the boundary.
- GridTrustAuthPolicy::gate adds the contracted-grid fast-path: if a caller's
  verified granted_capabilities confer the command, it's authorized regardless of
  the tier ceiling — the explicit signed contract overrides the coarse default
  trust. Gated on trust > Blocked so a grant can't resurrect a Blocked peer.
- grid_capability::confers is now pub(crate) — the gate re-checks granted caps
  through the SAME boundary-aware match rule (one source of truth, no divergent copy).

Sound because the field is populated ONLY post-verification by a boundary; no
local/Tcp constructor sets it. The airc command handler is the producer (next
slice — needs Airc::own_public_key + owner-key provenance + epoch-watermark
persistence). Until then the field stays empty and the gate is unchanged in
behavior.
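
A minimal sketch of the fast-path ordering described above, not the real
GridTrustAuthPolicy::gate. The `Trust` variants, the `Caller` shape, and the
exact-string capability match are assumptions for illustration; the real gate
re-checks through grid_capability::confers.

```rust
use std::collections::HashSet;

/// Hypothetical trust tiers, ordered lowest to highest by declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Trust {
    Blocked,
    Visitor,
    Owner,
}

/// Hypothetical caller: a coarse tier plus capabilities a boundary verified.
struct Caller {
    trust: Trust,
    granted_capabilities: HashSet<String>,
}

/// Returns true when `caller` may run `command`, whose tier ceiling is `required`.
fn gate(caller: &Caller, command: &str, required: Trust) -> bool {
    // A Blocked peer is denied first, so a grant cannot resurrect it.
    if caller.trust == Trust::Blocked {
        return false;
    }
    // Fast-path: a verified grant conferring the command overrides the tier.
    // (Exact match stands in for the boundary-aware `confers` rule.)
    if caller.granted_capabilities.contains(command) {
        return true;
    }
    // Otherwise fall back to plain tier gating.
    caller.trust >= required
}
```

With an empty `granted_capabilities` set the function reduces to the tier
check alone, which matches the "unchanged in behavior" claim.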

Tests: grid_trust_policy verified_grant_overrides_tier_ceiling_for_conferred_command
+ verified_grant_does_not_resurrect_a_blocked_peer; auth_policy (9) + grid_capability
(5) + grid_trust_policy (7) all green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Satisfies the review HARD GATE: the consumer-side anti-replay watermark was
in-memory + unbounded, so a node restart reopened the entire replay window (a
peer could re-present a grant the owner already superseded). The grid expects
mundane restarts, so this must be durable + bounded before grants gate live
traffic.

- New EpochWatermarkStore trait (routing/epoch_watermark.rs) behind which the
  anti-replay state lives:
  - InMemoryEpochWatermark — DashMap, atomic per-grantee (default for tests).
  - SqliteEpochWatermark — durable (survives restart), bounded (evict_older_than
    drops entries no live grant could reference, expiry-aligned by updated_at_ms).
    Atomic check-and-advance runs in a serialized write transaction via
    spawn_blocking, off the async executor (substrate concurrency style).
- GrantAuthorizer holds Arc<dyn EpochWatermarkStore>; authorize_command is now
  async and consults the store. new() keeps in-memory; with_watermark() /
  with_verifier_and_watermark() inject the durable store for the live path.
  A store error fails CLOSED → GrantAuthOutcome::WatermarkUnavailable (deny),
  never authorizes a grant whose replay status is unknown.
- VerifyContext (holding a non-Sync &dyn GrantVerifier) is scoped to drop before
  the await so authorize_command's future is Send — required for the
  multi-threaded handler runtime.
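
The Send point in the last bullet can be illustrated with a small compile-time
sketch. The trait, the `authorize` signature, and `watermark_allows` are
hypothetical stand-ins, not the real GrantAuthorizer API. A `&dyn GrantVerifier`
is not Send because the trait object is not Sync, so the borrow is confined to
a block that ends before the `.await`.

```rust
use std::future::Future;

/// Hypothetical verifier trait; note there is no `Sync` bound.
trait GrantVerifier {
    fn verify(&self, grant: &str) -> bool;
}

struct AcceptNonEmpty;

impl GrantVerifier for AcceptNonEmpty {
    fn verify(&self, grant: &str) -> bool {
        !grant.is_empty()
    }
}

/// Stand-in for the async watermark consultation.
async fn watermark_allows(_grant: &str) -> bool {
    true
}

async fn authorize(verifier: Box<dyn GrantVerifier + Send>, grant: String) -> bool {
    let signature_ok = {
        // `ctx` is a non-Send borrow; it lives only inside this block.
        let ctx: &dyn GrantVerifier = &*verifier;
        ctx.verify(&grant)
    }; // `ctx` is gone before the await point below.
    signature_ok && watermark_allows(&grant).await
}

/// Compiles only if the future can move between threads.
fn assert_send<F: Future + Send>(future: F) -> F {
    future
}
```

Passing `authorize(..)` to `assert_send` is the whole check: it fails to
compile if a non-Send value is held across the await.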

Tests: epoch_watermark anti-replay on BOTH impls + durability-across-reopen +
bounded eviction (4); grid_capability decision path migrated to async (5);
both stress concurrency proofs run through the REAL SQLite path. Full routing suite
260 green.
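
A rough in-memory sketch of the check-and-advance, eviction, and fail-closed
behavior described above. It uses a Mutex-guarded HashMap rather than DashMap
or SQLite, and every name and signature here is hypothetical. It also assumes
a grant at the current epoch may be re-presented and only a lower (superseded)
epoch is rejected; the real rule may differ.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
enum WatermarkError {
    Unavailable,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Authorized,
    Replayed,
    WatermarkUnavailable,
}

struct Entry {
    epoch: u64,
    updated_at_ms: u64,
}

struct InMemoryWatermark {
    entries: Mutex<HashMap<String, Entry>>,
}

impl InMemoryWatermark {
    fn new() -> Self {
        Self { entries: Mutex::new(HashMap::new()) }
    }

    /// Ok(true) if the epoch is current or newer (watermark advanced),
    /// Ok(false) if it is older than the stored watermark.
    fn check_and_advance(&self, grantee: &str, epoch: u64, now_ms: u64) -> Result<bool, WatermarkError> {
        // The lock makes check + advance one atomic step per store.
        let mut entries = self.entries.lock().map_err(|_| WatermarkError::Unavailable)?;
        let entry = entries
            .entry(grantee.to_string())
            .or_insert(Entry { epoch, updated_at_ms: now_ms });
        if epoch < entry.epoch {
            return Ok(false);
        }
        entry.epoch = epoch;
        entry.updated_at_ms = now_ms;
        Ok(true)
    }

    /// Drops entries last touched before `cutoff_ms`; returns how many were removed.
    fn evict_older_than(&self, cutoff_ms: u64) -> Result<usize, WatermarkError> {
        let mut entries = self.entries.lock().map_err(|_| WatermarkError::Unavailable)?;
        let before = entries.len();
        entries.retain(|_, e| e.updated_at_ms >= cutoff_ms);
        Ok(before - entries.len())
    }
}

/// Fail closed: a store error never authorizes.
fn decide(store_result: Result<bool, WatermarkError>) -> Outcome {
    match store_result {
        Ok(true) => Outcome::Authorized,
        Ok(false) => Outcome::Replayed,
        Err(_) => Outcome::WatermarkUnavailable,
    }
}
```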

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The producer half of the capability-grant wiring — built + tested against the
gate seam, not yet installed on the live boot path (that needs the GrantAuthorizer
constructed from airc identity + mesh + a durable watermark, next slice).

- CommandRequestHandler gains an optional GrantAuthorizer (with_grant_authorizer);
  new() keeps the tier-only default (grants ignored).
- parse_envelope decodes the optional base64 HEADER_AIRC_CAPABILITY_GRANT into a
  typed SignedCapabilityGrant (ParsedEnvelope.presented_grant). A present-but-
  undecodable header is surfaced loudly, never silently dropped.
- process_request verifies a presented grant via the authorizer against the
  AUTHENTICATED sender key (airc.peer_public_key(sender) — the enrolled key from
  the same registry that signature-verified the envelope, NOT the grant's self-
  asserted grantee_pubkey: the review's hard gate #3). On Authorized, the grant's
  conferred capabilities ride into the gate via CallerIdentity::with_granted_
  capabilities; otherwise the caller falls back to tier gating.
- Dispatch refactored into dispatch_request(executor, parsed, caller); the static
  process_request_via keeps its exact prior behavior (plain authenticated caller,
  no grant) for tests + the LocalGridTransport fixture.
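
The key-binding rule in process_request can be sketched as follows. This is an
illustration under assumptions: `Registry`, `PresentedGrant`, and
`capabilities_for` are hypothetical, and signature, mesh, and epoch checks are
omitted. The point is only that the comparison key comes from the enrolment
registry for the authenticated sender, never from a lookup driven by the grant.

```rust
use std::collections::HashMap;

/// Hypothetical grant body; `grantee_pubkey` is self-asserted by the presenter.
struct PresentedGrant {
    grantee_pubkey: [u8; 32],
    capabilities: Vec<String>,
}

/// Hypothetical enrolment registry: authenticated sender id -> enrolled key.
struct Registry {
    enrolled: HashMap<String, [u8; 32]>,
}

/// Capabilities the grant confers on `sender`, or none (tier gating applies).
fn capabilities_for(registry: &Registry, sender: &str, grant: &PresentedGrant) -> Vec<String> {
    match registry.enrolled.get(sender) {
        // The authenticated key must equal the key the grant was issued to.
        Some(key) if *key == grant.grantee_pubkey => grant.capabilities.clone(),
        // Unknown sender or a grant issued to someone else: confer nothing.
        _ => Vec::new(),
    }
}
```

A grant copied from another peer therefore confers nothing, because the
presenter's enrolled key does not match the grantee key inside it.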

Tests: parse_envelope decodes a presented grant + rejects a malformed grant header;
the no-grant default stays None. command_handler suite 15 green.
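
The three header outcomes (absent, decodable, present but undecodable) map
onto `Option::transpose`. A small sketch, assuming a hypothetical helper name
and a caller-supplied decoder in place of the real base64 + grant decoding:

```rust
/// Absent header -> Ok(None); decodable -> Ok(Some(grant));
/// present but undecodable -> Err, so it is never silently dropped.
fn parse_presented_grant<T, E>(
    header: Option<&str>,
    decode: impl Fn(&str) -> Result<T, E>,
) -> Result<Option<T>, E> {
    // Option<Result<T, E>> becomes Result<Option<T>, E>.
    header.map(decode).transpose()
}
```

Any fallible decoder fits; integer parsing is enough to exercise all three
cases.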

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…sona path

Closes the loop: every persona that boots now VERIFIES presented capability
grants. A visiting peer presenting an owner-signed grant gets the conferred
command past its tier ceiling; absent/invalid grants fall back to tier gating.

- airc bump a7ae4f4 → 4aa717d (airc#1277): Airc::mesh_identity is now pub.
- build_grant_authorizer(airc, home): constructs the per-persona GrantAuthorizer.
  "This node is the owner": trusted issuer = the node's OWN enrolled ed25519 key
  (self-enrolled at Airc::open — it signs the grants it hands out); expected mesh
  = the node's own mesh (airc.mesh_identity()); anti-replay = a DURABLE
  SqliteEpochWatermark under <persona-home>/grant_watermark.sqlite (survives
  restart — the review hard gate). Typed GrantAuthorizerBuildError; provider≠owner
  (pinned-issuer-key distribution) is the deferred generalization.
- PersonaCommandInboundPump::spawn takes the authorizer and builds the handler via
  with_grant_authorizer. Both PersonaAircRuntime install sites (bootstrap +
  install_command_pump) build it first; a build failure is a typed bootstrap
  failure (PersonaAircRuntimeError::GrantAuthorizerBuild), never a silent
  fall-through to an unverified path.

Validated: cargo check (metal,accelerate) clean; the production-shape
persona_command_inbound_pump integration test passes (persona answers a
tier-gated command through the installed pump + authorizer); routing lib suite
262 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The grantee side of the contracted grid — the SEND half. A node holds grants an
owner issued it and presents them so the owner can authorize otherwise-tier-denied
commands.

- PresentedGrantStore trait + InMemoryPresentedGrantStore (routing/presented_grant_store.rs):
  base64 grants keyed by TARGET peer (the owner that will verify). Latest-wins on
  insert so a re-issued / higher-epoch grant supersedes; sync lookup for the
  outbound hot path.
- AircTransport gains an optional grant store (with_grant_store); on a peer-targeted
  dispatch it stamps the held grant onto HEADER_AIRC_CAPABILITY_GRANT. Room /
  wildcard targets have no single verifier, so nothing is stamped. None = present
  nothing (unchanged tier-gated behavior).

Pairs with the receive path (handler verifies) + issuance (airc#1278 sign_grant +
the grid/grant/issue command, next). Tests: store holds/presents/supersedes per
target; airc_transport suite 16 green.
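
A sketch of the store semantics above: latest-wins per target peer, and
nothing stamped for room or wildcard targets. `Target`, `GrantStore`, and
`header_for` are hypothetical names, not the real PresentedGrantStore trait.

```rust
use std::collections::HashMap;

/// Hypothetical dispatch targets.
enum Target {
    Peer(String),
    Room(String),
    Wildcard,
}

/// Base64 grants keyed by the peer that will verify them.
#[derive(Default)]
struct GrantStore {
    by_peer: HashMap<String, String>,
}

impl GrantStore {
    /// Latest-wins: a re-issued grant replaces the one held for that peer.
    fn insert(&mut self, peer: &str, grant_b64: &str) {
        self.by_peer.insert(peer.to_string(), grant_b64.to_string());
    }

    /// Header value to stamp for this dispatch, if any.
    fn header_for(&self, target: &Target) -> Option<&str> {
        match target {
            Target::Peer(peer) => self.by_peer.get(peer).map(String::as_str),
            // No single verifier, so nothing is presented.
            Target::Room(_) | Target::Wildcard => None,
        }
    }
}
```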

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…erify → run

The proof the grid is ready for compute. Two REAL airc peers over the loopback
fixture, the production install/gate/send paths — no mocks:

- Owner exposes a tier-DENIED command (compute/echo) behind GridTrustAuthPolicy +
  the inbound pump with build_grant_authorizer.
- WITHOUT a grant: the remote peer is DENIED (the gate holds).
- Owner ISSUES a grant for the grantee conferring exactly compute/echo
  (Airc::sign_grant, airc#1278).
- Grantee PRESENTS it (InMemoryPresentedGrantStore + AircTransport stamps
  HEADER_AIRC_CAPABILITY_GRANT).
- Owner VERIFIES (handler → GrantAuthorizer, against the authenticated sender key
  + durable watermark) and RUNS the command, echoing the params back through the
  full chain.

Both halves asserted: no grant → denied (no auth hole), valid grant → runs (the
grid can sell compute). Also bumps airc 4aa717d → 55790e1 (airc#1278 sign_grant).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The reusable issuance primitive, symmetric to build_grant_authorizer (verify):
issue_grant(airc, issued_at_ms, params) composes a CapabilityGrant + Airc::sign_grant
+ base64, returning the blob a grantee presents. Binds the grantee's AUTHENTICATED
key (from the owner's enrolment), the owner's mesh, and the owner's signature — all
from the one airc handle so issuer / mesh / grantee-key can't drift from what the
verifier checks. Typed IssueGrantError; fail-closed (never returns a partial grant).

Any surface holding an owner airc handle (a persona runtime, a future
grid/grant/issue command) wraps this — the primitive is identity-agnostic; "which
identity issues" stays the caller's decision.

Dogfooded: the end-to-end contracted-grid test now mints its grant via issue_grant
(replacing the inline construction) and still proves issue → present → verify → run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…y grants

The operator front door over routing::grant_issuance::issue_grant — closes the
contracted-grid loop with a command instead of hand-written Rust.

- GrantIssuanceModule (modules/grant_issuance.rs): holds the
  PersonaAircRuntimeRegistry (shared with the instance manager); handle_command
  decodes { issuerPersonaId, grantee, capabilities, expiresAtMs?, epoch? },
  resolves the issuing persona's live airc handle, and returns the base64 grant
  blob to deliver. A non-running issuer is a hard error (it owns the signing key —
  never fabricated).
- Registered at the live boot site (ipc/mod.rs) alongside the instance manager.
- OWNER-ONLY: grid/grant/issue is outside the cross-grid ACL allow-list, so it
  falls to the ""=Owner wildcard — only the local operator can sell its personas'
  compute; a remote peer can never mint grants. Pinned with an acl regression.

Each persona is its own owner selling ITS compute, so the issuer is a persona's
airc identity (issuerPersonaId names which). Tests: malformed-request +
issuer-not-running error paths (2); acl owner-only pin; the happy path is proven
end-to-end in tests/capability_grant_e2e.rs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…self

Persistent identity worked for Asha (her seed.json persona_id matches her live
peer 90e758b2) but two fragilities could re-mint a stranger and orphan her
engrams — exactly how personas-archive/ filled with 9 strangers:

1. The seed was written ONLY on FreshlyMinted, best-effort + non-fatal. A single
   failed mint-write, or a later deleted/corrupt seed, left her home (engrams +
   airc key) on disk but unresumable → next boot minted a stranger.
2. Agreement between the bootstrap WRITE path (citizen_home_path(..).parent()/seed.json)
   and the resumer READ path (citizens_kind_dir) was untested, and that exact
   divergence (resumer hard-coding `personas/` vs `citizens/personas/`) is what
   created the strangers originally.

Fixes:
- seed::ensure_seed(seed_path, persona_id, agent_name, fallback_created_at_ms):
  idempotent upsert that runs on EVERY bootstrap (mint AND resume). Self-heals a
  missing/corrupt seed from the live identity, and PRESERVES created_at_ms from an
  existing seed (her birth time is stable — a naive rewrite would reset her age
  every boot). persona_instance_manager now always calls it (drops the
  FreshlyMinted gate) + the stale `personas/` path comment is corrected to
  `citizens/personas/`.
- citizen_path: regression test pinning seed-write-path == resumer-scan-path for a
  Persona (the stranger-minting bug can never silently return).

Tests: ensure_seed creates-missing / preserves-birth-time-on-resume / heals-corrupt
(3); the path-agreement pin; seed 8, citizen_path 6, instance_manager 5,
resume_or_mint 6 all green.
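
The birth-time rule can be shown as a pure function, leaving file I/O and JSON
aside. This is a sketch with a hypothetical `Seed` struct and `reconcile_seed`
helper, not the real seed::ensure_seed; `existing` is None when the seed file
is missing or corrupt.

```rust
/// Hypothetical in-memory view of seed.json.
#[derive(Debug, Clone, PartialEq)]
struct Seed {
    persona_id: String,
    agent_name: String,
    created_at_ms: u64,
}

/// Computes the seed to write on every bootstrap (mint and resume alike).
fn reconcile_seed(
    existing: Option<&Seed>,
    persona_id: &str,
    agent_name: &str,
    fallback_created_at_ms: u64,
) -> Seed {
    Seed {
        // Identity fields always come from the live identity (self-heal).
        persona_id: persona_id.to_string(),
        agent_name: agent_name.to_string(),
        // Birth time comes from the existing seed when there is one.
        created_at_ms: existing
            .map(|seed| seed.created_at_ms)
            .unwrap_or(fallback_created_at_ms),
    }
}
```

Feeding the result back in as `existing` returns the same seed, which is the
idempotence the upsert relies on.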

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
joelteply and others added 2 commits June 22, 2026 00:29
…apter boundary

Asha leaked her entire <think>… chain-of-thought into the room (and looped inside
it without ever emitting an answer). Root cause: a reasoning model's reasoning was
never separated from its user-facing text. unsloth's /v1 (llama.cpp backend) emits
<think>…</think> INLINE in `content` (verified live: no `reasoning_content` field),
the OpenAI adapter passed it straight into `text`, and the cognition cleaner only
stripped `<thinking>` (with -ing) — which never matched `<think>`.

Fix at the adapter boundary (where the model's output contract belongs):
- TextGenerationResponse gains `reasoning: Option<String>` — reasoning is captured
  (for the glass-box harness + memory) and stripped from `text`, so it can NEVER
  reach the room. Uniform across adapters; ts-rs binding regenerated.
- openai_adapter::extract_reasoning(content, reasoning_content): precedence —
  (1) a server `reasoning_content` field (vLLM-style) wins; (2) inline
  <think>…</think> is split out, answer = text outside the block; (3) an UNCLOSED
  <think> (the runaway loop) yields EMPTY text so the caller refuses to post,
  never leaking raw reasoning. Wired into the response parse; OpenAIMessage now
  reads reasoning_content too.
- Other adapters set reasoning: None (anthropic: extended-thinking is a follow-up,
  doesn't leak; llamacpp: TODO to reuse extract_reasoning if it serves a reasoning
  model locally — not Asha's path).

Tests: extract_reasoning over well-formed / unclosed-runaway / server-field /
plain+empty-think (4); adapter + response_validator suites green (9 + 7); all
TextGenerationResponse constructors updated.
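
A sketch of the three-step precedence, returning (text, reasoning). It mirrors
the description above but is not the real openai_adapter code. Assumptions: an
empty `<think></think>` block yields no reasoning, only the first block is
handled, and when the server field is present the content is passed through
trimmed without scanning for inline tags.

```rust
const OPEN: &str = "<think>";
const CLOSE: &str = "</think>";

/// Trimmed copy of `s`, or None if nothing but whitespace remains.
fn non_empty(s: &str) -> Option<String> {
    let trimmed = s.trim();
    if trimmed.is_empty() { None } else { Some(trimmed.to_string()) }
}

/// Splits model output into user-facing text and captured reasoning.
fn extract_reasoning(content: &str, reasoning_content: Option<&str>) -> (String, Option<String>) {
    // (1) A server-provided reasoning field wins.
    if let Some(reasoning) = reasoning_content.and_then(non_empty) {
        return (content.trim().to_string(), Some(reasoning));
    }
    // No inline block at all: plain text, no reasoning.
    let Some(start) = content.find(OPEN) else {
        return (content.trim().to_string(), None);
    };
    let after_open = &content[start + OPEN.len()..];
    match after_open.find(CLOSE) {
        // (2) Closed block: the answer is everything outside it.
        Some(end) => {
            let before = &content[..start];
            let after = &after_open[end + CLOSE.len()..];
            let text = format!("{}{}", before, after).trim().to_string();
            (text, non_empty(&after_open[..end]))
        }
        // (3) Unclosed block (runaway): empty text, so nothing gets posted.
        None => (String::new(), non_empty(after_open)),
    }
}
```

An empty returned text is the signal for the caller to refuse to post.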

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…hink switch

The reasoning-strip (9d3f895) cleans the OUTPUT; this addresses the INPUT side —
the model thinks on EVERY turn (even "say hello"), burning latency and feeding the
runaway loop. Verified live which mechanism works: the gateway IGNORES
chat_template_kwargs.enable_thinking for this forged model, but Qwen3's `/no_think`
SOFT-SWITCH appended to the user turn works — empty <think></think> + direct answer.

- ThinkingMode { Default, Suppress } on the OpenAI adapter config; format_messages
  appends `/no_think` to the last user message when Suppress (apply_no_think_switch
  — model-specific token owned at the adapter boundary; higher layers stay model-
  agnostic).
- The local unsloth/GGUF reasoning gateway defaults to Suppress (this 4B's thinking
  rambles + loops, and it answers correctly without it). Operator override
  `UNSLOTH_THINKING=on` re-enables it — the reasoning-strip still protects the room.
  Cloud providers keep their default.

Gateway-level for now; per-task/per-request thinking is the follow-up (a recipe
that needs deliberation re-enables it). Tests: apply_no_think_switch targets the
last user turn + no-ops without one; adapter suite 11 green.
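
A sketch of the soft-switch and the env override. The `Message` struct, the
space separator before `/no_think`, and `thinking_mode_from_env` are
assumptions for illustration; only the `/no_think` token, the last-user-turn
rule, and `UNSLOTH_THINKING=on` come from the description above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ThinkingMode {
    Default,
    Suppress,
}

/// Hypothetical chat message shape.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

/// Appends the Qwen3 soft-switch to the last user turn when suppressing.
fn apply_no_think_switch(messages: &mut [Message], mode: ThinkingMode) {
    if mode != ThinkingMode::Suppress {
        return;
    }
    // Search from the end; no user turn means nothing to change.
    if let Some(last_user) = messages.iter_mut().rev().find(|m| m.role == "user") {
        last_user.content.push_str(" /no_think");
    }
}

/// Gateway default is Suppress; only the value "on" re-enables thinking.
fn thinking_mode_from_env(value: Option<&str>) -> ThinkingMode {
    match value {
        Some("on") => ThinkingMode::Default,
        _ => ThinkingMode::Suppress,
    }
}
```

A caller could pass `std::env::var("UNSLOTH_THINKING").ok().as_deref()` to
`thinking_mode_from_env`; keeping the parse separate from the env read makes
it testable without touching process state.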

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@joelteply joelteply merged commit cd7d655 into canary Jun 22, 2026
6 checks passed
@joelteply joelteply deleted the feat/persona-seed-self-heal branch June 22, 2026 05:53