Skip to content

0.6.10 - deterministic waiting, safe self-healing, token controls, domain understanding#2

Merged
divshekhar merged 33 commits into
mainfrom
feat/iris-init-vite-plugin
Jun 18, 2026
Merged

0.6.10 - deterministic waiting, safe self-healing, token controls, domain understanding#2
divshekhar merged 33 commits into
mainfrom
feat/iris-init-vite-plugin

Conversation

@divshekhar

@divshekhar divshekhar commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

What this is

The 0.6.10 feature release — live-verified, not 1.0.0 (see plan/V1-ROADMAP.md for the 1.0 gate). Bumps all 10
publishable @syrin/iris-* packages 0.5.0 → 0.6.10.

Highlights (full list in CHANGELOG [0.6.10])

Deterministic waiting (M3) — kills the #1 flake cause (sleeps)

  • New { kind: "settled", quietMs } predicate: passes once network + structural-DOM activity is quiet for quietMs. Ambient
    dom.text/animation churn (count-ups, spinners) is ignored so an animated page still settles.
  • iris_act_and_wait with no until now auto-settles — "act, then wait for quiet" in one call, the documented replacement
    for a fixed sleep.

Self-healing (M5) — heal the locator, never the intent

  • iris_flow_heal apply:true re-replays the healed flow and re-asserts its success consequence; if a rebound locator resolves
    but the consequence no longer fires, the write is refused (status:consequence_broken, file untouched).

Token efficiency (M2) — every read tool self-budgets

  • iris_query gains limit + count_only; iris_network/iris_console gain limit + a cost:{bytes,tokens} hint
    (most-recent-N, non-silent truncation).

Domain understanding (M4)

  • iris_domain now reports mustHold per flow — the success consequence that must hold — so an agent can answer "what are the
    critical flows and what must hold for each?" from the model alone.

Rigor
success-oracle thesis, now enforced at the live-assert layer too).

Reliability

  • Browser observers (network/route/console) now restore the true original global on teardown instead of a bound copy.

Verification

  • Unit: build 10/10, lint 9/9, typecheck 18/18, unit tests 16/16 packages (~978 tests).
  • Live e2e on this build: full Playwright battery 14/14 (Vite/React demo + Next.js 15 + api) → no regression; a new
    new-features-test.mjs spec 7/7 positively exercises the new surfaces in a real browser — including settled resolving
    against the demo's count-up animations (the exact ambient-churn condition the fix targets).

Summary by CodeRabbit

  • New Features

    • Deterministic settling predicates for reliable page waits
    • iris_act_and_wait auto-settles when until is omitted
    • iris_query now supports limit and count_only pagination controls
    • Network and console tools now support limit parameter
    • New iris_domain tool for inspecting app's testable surface
    • New iris init command for guided project setup with Vite/Next/Claude/Cursor integration
  • Bug Fixes

    • Browser teardown now fully restores patched globals to original identities
    • iris_flow_heal re-verifies success consequences; refuses to persist if broken
  • Documentation

    • Updated getting-started guides for new setup command and Vite plugin integration
    • Added token-efficiency guidance for diffed snapshots

divshekhar and others added 30 commits June 18, 2026 02:04
Collapse Iris integration from a 6-step manual process into `npx @syrin/iris
init` plus a dev-server restart, and add a Vite plugin that makes the whole
React integration a single config line.

@syrin/iris-vite-plugin (exposed as @syrin/iris/vite):
- iris() with apply:'serve' so Vite drops it from `vite build` entirely —
  production-leak protection is structural, not a user-managed env gate
- reuses the babel plugin to stamp data-iris-source, and injects a dev-only
  iris.connect() via transformIndexHtml (no entry-file edit needed)
- options: port/session/token/sourceMapping/inject; non-default port is baked
  into the injected connect() url
- integration test runs Vite's real resolveConfig to prove the plugin is
  filtered out of the build pipeline and present in serve

iris init:
- detects framework (Next/Vite/HTML), React major, and package manager
- merges an iris entry into .mcp.json without clobbering, runs the dependency
  install, and for Vite patches the config with iris()
- bail-to-manual safety: only auto-patches the obvious shape, otherwise prints
  the exact paste-in snippet rather than half-editing a build config; Next
  config wrap + layout mount are intentionally manual
- pure modules (detect/mcp-config/vite-config/plan) + a thin injectable IO
  shell, fully unit-tested; --dry-run, --no-mcp, --no-install, --yes, --port

Docs: getting-started.md now leads with the init fast path and recommends the
Vite plugin.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…script

An inline <script type="module"> injected via transformIndexHtml is not run
through Vite's import resolution, so its bare `import '@syrin/iris'` failed in
the browser with "Failed to resolve module specifier". Found by driving the
real demo app with a headless browser.

Serve the connect code as a real module (resolveId + load) referenced by a
<script src>, so Vite resolves the import normally. Verified end-to-end against
the demo: session connects, the react adapter loads, iris_inspect returns the
component source file:line, and a real `vite build` emits zero Iris artifacts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…entry under pnpm

Three issues found by a deep end-to-end stress test (adversarial config shapes +
a faithful npm-tarball install driven by a real browser):

1. A .mcp.json with comments/trailing commas (jsonc — what the docs show and the
   MCP convention) crashed init with an uncaught JSON.parse SyntaxError. Now it
   bails to a manual instruction and never rewrites the file (preserving comments).

2. `iris init --port N` wrote the port into .mcp.json but injected a bare iris(),
   so the browser SDK connected to 4400 while the bridge listened on N. The port
   now flows into iris({ port: N }) in the Vite patch and into the Next/HTML
   connect snippets, so bridge and SDK always agree.

3. The CLI bin silently no-opped under pnpm installs: the bin shim runs the
   symlinked node_modules/@syrin/iris path as argv[1] while ESM import.meta.url
   is the realpath, so the entry guard was false and main() never ran — breaking
   the documented `npx @syrin/iris init` / `mcp`. The guard now also compares the
   realpath of argv[1].

Verified end-to-end: packed the real tarball, installed it into a fresh Vite +
React 19 app, ran `iris init` via the .bin shim, drove a headless browser — the
plugin connected the session and iris_inspect returned src/App.tsx:8 — and a
real `vite build` emitted zero Iris artifacts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oject

The bridge + MCP server is a single process that serves every project, so it
should be registered once — not re-added to a .mcp.json in each repo. `iris init`
now runs `claude mcp add iris -s user` (idempotent: probes `claude mcp get iris`
first, skips if present) instead of writing a project .mcp.json. Per project,
only the SDK (the iris() Vite plugin / connect call) is added.

- new init/mcp.ts: builds the `claude mcp add -s user` command, an existence
  probe, an availability probe, and a manual fallback (printed when the claude
  CLI is absent, with a Cursor-style global-config snippet too)
- run.ts gains a quiet `probe` IO capability; gathers claudeCli + mcpExists
- removed init/mcp-config.ts (the project .mcp.json JSON merge) — no longer used
- --port flows into the global registration too, so bridge and SDK agree
- docs (getting-started, integrate-with-claude-code): register once, globally

Verified end-to-end in an isolated CLAUDE_CONFIG_DIR: init writes the iris
server to the user config (no project .mcp.json), and a re-run reports ALREADY
instead of erroring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`iris init` now registers Iris once, globally, for every supported agent it
detects — not just Claude:

- Claude Code: `claude mcp add iris -s user` (as before)
- Cursor: merges an `iris` entry into ~/.cursor/mcp.json (Cursor has no CLI, but
  its global MCP config is a small dedicated file — safe to merge, unlike
  ~/.claude.json). Idempotent; never clobbers an existing entry; bails to manual
  on unparseable jsonc.

Detection is per-agent (claude CLI present / ~/.cursor exists), so each gets its
own step and only installed agents are touched. If neither is found, a single
manual instruction is printed. --port flows into both registrations. The init IO
gained homeDir() and absolute-path support for writing the global Cursor file.

Verified end-to-end in an isolated HOME: with Cursor present, init writes
~/.cursor/mcp.json (and registers Claude too); a re-run reports ALREADY for both.
Confirmed the real ~/.claude.json and ~/.cursor configs were untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nly flows

Iris now tells an agent, the moment it saves a flow, whether that flow actually
asserts an observable consequence — or whether it will pass even when the
feature is broken. iris_flow_save returns assertions.grade:

  - asserted       — a step or the success end-condition asserts a signal/network
                     consequence (a wrong element can't fake it)
  - presence-only  — only element-presence checks; a locator healed to the WRONG
                     element can still pass (the failure self-healing vendors admit)
  - assertion-free — acts but asserts nothing observable

For non-"asserted" grades it returns a warning telling the agent to add a
consequence assertion (iris_annotate assert-signal / assert-net / success-state).

Grounded in: Fowler "Assertion-Free Testing" (100% coverage, 0 assertions),
Kent C. Dodds "make your test fail", mabl/qate.ai (healed-wrong-locator ships
bugs green), and AI agents agreeing with human pass/fail only ~68% of the time —
so the flow must carry a real oracle, not rely on the agent eyeballing success.

First slice of V1-ROADMAP M1 (intent-anchored assertions). Pure classifier +
7 unit tests; wired into iris_flow_save.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ed oracle)

flow.success was checked only by the @syrin/iris-test CI spec runner — the live
iris_flow_replay MCP tool an agent calls never evaluated it, so a replay reported
ok even when the end-condition (the actual intent) was never met.

Move the success oracle (successToPredicate + assertSuccess) down into iris-server
as flow-success.ts so ONE implementation is shared by both the MCP tool and the
spec runner — no divergent oracle. @syrin/iris-test/success-assert now re-exports
it (its 12 tests still pass against the moved code).

iris_flow_replay now, after every step runs clean, asserts the success
end-condition as a real consequence: a signal/net success that never fires makes
the replay status:error "intent not satisfied" even though all locators
resolved — catching the regression a healed-but-wrong locator ships green
(mabl/qate.ai). Dynamic (LLM-output) success fields are skipped, symmetric with
the step layer. A `success` result row is appended so the agent sees the verdict.

V1-ROADMAP M1 slice 2 (corrected: shared oracle, not a duplicate in replayFlow).
Discovered + fixed via the full suite catching a turbo-cached masking failure.
8 new server tests; iris-server 488, iris-test 97; all 16 tasks green uncached.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…before huge reads

High-volume read results now carry cost:{ bytes, tokens } (tokens estimated at
~chars/4) so an agent can re-scope BEFORE spending context on a large body —
re-scope a snapshot (mode:interactive/status or a tighter `scope`) or narrow a
query (`name`/`scope`). Pure helpers in output-budget.ts (estimateTokens,
sizeCost, withSizeCost) reuse the existing JSON-byte idiom; the wrapper is pure
and unit-tested, the handlers just .then(withSizeCost).

Grounded in: by ~step 15 a screenshot/Playwright-MCP agent accrues 60–80K tokens
of stale accessibility-tree data and starts hallucinating selectors that don't
exist; a single screenshot is up to ~4,784 tokens. Giving the agent a cheap
size signal up front is the first lever of V1-ROADMAP M2 (token efficiency).

11 output-budget tests (+8); 493 server tests; all 16 tasks green uncached.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…what changed

After the first snapshot, an agent can pass diff:true to get back only the
delta since its last look of the same scope/mode — mode:delta with added/removed
lines, or mode:unchanged — instead of the full accessibility tree again. A route
change automatically resets to a full snapshot (a cross-page delta would be
meaningless), so the agent never sees a misleading diff.

Why: screenshot/Playwright-MCP agents accrue 60–80K tokens of stale a11y data
over a session and start hallucinating selectors that no longer exist. Returning
only the change set attacks both the token cost AND that stale-context failure in
one move. The agent-facing cost is the MCP result, so the delta is computed
server-side, reusing the same normalize+diff the baseline layer uses.

Pure decision (snapshotDelta) + a route-invalidated, bounded SnapshotCache +
applySnapshotDelta shaper — all unit-tested without a browser. V1-ROADMAP M2.

10 new tests; 503 server tests; all 16 tasks green uncached.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A reproducible, regression-guarding benchmark (runs the shipped sizeCost /
estimateTokens / applySnapshotDelta, not mocks) over a representative 150-row
operational dashboard — the shape that makes full snapshots expensive.

Measured:
  full re-snapshot   = 4,246 tok (16,983 bytes)
  diff (1 row changed) =    60 tok  → 99% saved
  unchanged re-snapshot =  17 tok

Asserts the diff is <10% of a full re-read and ≥90% saved, so the M2 win can't
silently regress. This is the data behind the "token efficient" claim — the
absolute figure uses the ~chars/4 heuristic, the relative saving is robust.
V1-ROADMAP M0 (first numbers; real-app capture is the follow-up).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…efore testing

New tool + pure builder that synthesizes every saved flow and the registered
capabilities into a compact domain model an agent reads BEFORE driving the app:
each flow with its assertion grade and the signals/testids it exercises, coverage
counts, and — the differentiator — GAPS: declared signals/testids that NO flow
asserts (untested intent), plus flows that assert no consequence. A one-line
summary headlines it.

Why: automation "checks the DOM, not the intent" — tests pass while integrated
bugs ship (Bolton; Fowler). Pointing the agent straight at "you declared signal
order:placed but no flow asserts it" is domain understanding + rigor in one read,
instead of crawling the whole app or reading all the source. Reuses the flow
assertion classifier so the two reinforce each other.

Pure buildDomainModel (no IO) + a thin iris_domain tool reading .iris/flows/ +
.iris/contract.json (no browser). V1-ROADMAP M4 first slice.

5 domain-model tests; 511 server tests; all 16 tasks green uncached.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ver the intent)

When two present testids tie at the minimum edit distance, the prior nearest-pick
was an arbitrary lexical tiebreak — exactly how a self-healing rebind lands on
the WRONG element and ships a bug green (the failure mabl/qate.ai admit). Drift
now carries `ambiguous:true` in that case, and the heal proposal layer refuses to
auto-rebind an ambiguous drift: it still surfaces the drift + a nearest candidate
for a human/agent to choose, but never applies a coin-flip.

- protocol Drift gains optional `ambiguous`
- flow-replay: pure `nearestIsAmbiguous` (≥2 candidates at min distance) sets it
- heal.proposeRebindWith refuses when drift.ambiguous (regardless of confidence)

Pairs with the M1 replay success oracle (which catches a wrong rebind at run
time): this stops the wrong rebind from ever being proposed. V1-ROADMAP M5.

6 new tests; 517 server tests; all 16 tasks green uncached.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…grades, heal)

Eight iterations shipped capabilities the human/agent docs didn't mention —
features are invisible without docs. Surfaced across the three read surfaces:

- token-efficiency.md: diffed snapshots section with the measured numbers
  (full 4,246 tok vs diff 60 tok, ~99% saved) + the cost-preview note.
- agent-cheatsheet.md: "read iris_domain first" in Start-here, iris_domain in
  the core tool set, and a token note covering diff:true, cost, assertion grades,
  and ambiguous-heal refusal.
- usage.md: iris_snapshot diff:true + cost documented; new iris_domain reference
  entry (flows + coverage + untested-intent gaps).

Docs only; no code change. V1-ROADMAP M8 (discoverability slice).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… quality)

iris_domain now folds .iris/project.json run history into the model and returns
`riskRanked` (flow names worst-first) plus a per-flow `risk:{ level, reason,
lastStatus }`. Risk is the WORSE of two signals:
  - run history: last run errored/drifted = high; passed-but-with-console/network
    errors = medium; passed clean = low; never run = unknown.
  - assertion quality: an assertion-free flow is medium EVEN when it passed —
    a green run of a flow that asserts nothing is false confidence.

Test-priority order is high > medium > unknown (unvalidated) > low, so the agent
tests the genuinely riskiest journeys first instead of re-running known-good ones.
Synthesizes "domain-aware" + "rigorous": a flow that looks green but can't catch a
regression surfaces as risk. Pure flow-risk.ts; iris_domain loads the history.
V1-ROADMAP M4 (deepening).

13 new tests (flow-risk + domain); 528 server tests; all 16 tasks green uncached.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the risk-ranking work: the one-line `summary` now names the flow to
test first ("test first: <name> (<reason>)") when run history flags a high/medium
risk flow — so the single most actionable fact (where the real regression risk
is) is in the headline, not buried in the arrays. Omitted when the top flow is
only low risk (no false alarm).

Pure; 2 new tests; 529 server tests; all 16 tasks green uncached.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…utputSchema

The handler returned bare flow-name strings while its declared outputSchema
promises {name, path, createdAt?} objects — so a schema-validating MCP client
rejected the result ("Expected object, received string"). Caught for real while
driving the live demo via MCP. The unit test had codified the wrong (string[])
shape, so it passed while the contract was violated — corrected too.

Now maps each name → {name, path: flowPath(irisRoot, name)}. 529 server tests;
all gates green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…endation)

Observed live driving the demo: a single login flooded iris_observe with 319
events / ~37KB because the dashboard's count-up animations emit a dom.text per
frame — the agent only avoided the token hit because it knew to pass filters.
Now costHint adds a `recommendation` when a timeline is large (>=80 events or
>=8KB) telling the agent to pass filters:[...] or max_events next time. Surfaced
in iris_observe's cost output schema. Backed by real session data; V1-ROADMAP M2.

3 new tests; 532 server tests; all 16 tasks green uncached. (No daemon restart —
the live demo session is untouched.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rdown

The network observer stored window.fetch.bind(window) and assigned that
bound copy back on teardown, so window.fetch was never restored to its
original identity — violating the "fully reversible" contract in types.ts.
Keep the true original for restore; use a separate window-bound copy only
for invocation (fetch throws "Illegal invocation" on the wrong `this`).

Adds the first test coverage for installNetwork, pinning GET->500 emission
(the case a live MCP session appeared to miss; the observer itself is sound,
so that miss was a long-session/window artifact, not a code bug), POST
method capture, reject-path status:0, and teardown identity.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eardown

Same bind-on-restore bug audited from the network observer fix: route.ts
and console.ts stored bound copies (history.pushState.bind / console[m].bind)
and assigned those back on teardown, so the globals were never restored to
their original identity — breaking the "reversible" contract. Keep the true
originals for restore; use bound copies only for invocation.

Adds first-ever observer-level tests for both (emit + teardown identity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-based flakiness)

The #1 flake cause in agent-driven UI testing is async-wait + concurrency
handled with fixed sleeps. Adds a { kind: 'settled', quietMs } predicate:
the page is settled when no network/DOM/animation activity has occurred for
quietMs (default 500). "No activity in the last N ms" is relative to now, so
the buffer clock is injected via PredicateSession.elapsed() (CLAUDE.md rule 7)
and the wait loop's poll interval flips it to pass once activity stops.

Usable today via iris_wait_for and iris_act_and_wait's `until` (PredicateSchema
already gates both), composes inside allOf with the consequence you expect, and
both tool descriptions now steer agents to it over raw sleeps. FlowReplaySession
gains elapsed(); test fakes across server + the public @syrin/iris-test package
updated to match. All 16 packages green, 537 server tests (+5 settled cases).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (M3)

Makes "act then wait for the page to go quiet" a single zero-config call: omit
`until` on iris_act_and_wait and it now waits for the { kind: 'settled' }
predicate (network/DOM/animation idle) instead of requiring a predicate — the
documented alternative to a fixed sleep. Explicit predicates are unchanged;
to assert a consequence AND settle, allOf them. Tool + field descriptions
updated to steer agents here. Adds onEvent to the live-control test fake so the
wait loop is exercised. 538 server tests green (+1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…5 — heal the locator, never the intent)

iris_flow_heal apply:true rewrote a drifted testid to its nearest match and
wrote it to disk WITHOUT checking that the healed flow still does what it was
testing. A rebind can resolve to a real but wrong element (a look-alike control)
that no longer triggers the flow's success signal/net — persisting that ships a
green test that asserts nothing. The maintenance-tax research is explicit:
locator-only healing covers <1/3 of failures and can ship bugs.

Now, before writing, apply re-replays the healed flow in memory and re-asserts
its success consequence. If the consequence no longer fires, the write is
REFUSED (status:consequence_broken, file untouched) and the proposal is still
surfaced for a human. Flows with no declared success heal but say so loudly
(the rebind couldn't be verified — add a success-state assertion).

Extracts the pure applyHealChanges (shared by FlowStore.heal's on-disk writer
and the in-memory verification so both rewrite identically). 541 server tests
green (+3: verified-heal, refused-broken-consequence, unverified-no-success).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A broad query (by=role value=button) on a busy page returns every descriptor,
spending agent context on data it may not need. Adds two opt-in trims to the
result the AGENT sees:
- count_only: return just { count }, drop the element array (when you only need
  "how many match?").
- limit: keep the first N descriptors; if more matched, carry total +
  truncated:true so the trim is never silent — the agent narrows with name/scope
  rather than assuming it saw everything.

Pure, result-shape-tolerant paginateQueryResult (non-{elements} results pass
through). 548 server tests green (+7).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both tools returned every matching call/log uncapped and carried no size hint —
unlike iris_observe/iris_query/iris_snapshot. A buggy page spams the console and
a wide `since` window returns many calls, both spending agent context. Adds a
`limit` (most-recent-N via the already-tested applyEventBudget, reporting total +
droppedOldest so a trim is never silent) and a `cost:{bytes,tokens}` hint so the
agent can self-budget — consistent with the other read tools. 551 server tests
green (+3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…w (M4)

iris_domain told an agent that a flow asserts SOMETHING (boolean) but not WHAT
its success depends on. The M4 bar is that the agent can answer "what are the
critical flows and what must hold for each?" from Iris alone — the "what must
hold" was missing. Each DomainFlowSummary now carries mustHold: the human label
of the flow's success consequence (signal name / net URL), or undefined when the
flow declares no success (then asserts:false — it tests nothing observable).
552 server tests green (+1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…control, safe-heal, and domain features

Records this session's user-facing changes under [Unreleased]: the settled
predicate + act_and_wait auto-settle (M3), iris_query/network/console token
controls + cost hints (M2), iris_domain mustHold (M4), consequence-verified
flow_heal (M5), and the observer teardown-identity fix. (Also folds in a
pre-existing cosmetic prettier reflow of the 0.5.0 section.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nsequence (rigor)

The success-oracle thesis ("green means the feature worked") was enforced for
saved flows (classifyFlowAssertions) but not for ad-hoc iris_assert. An agent
that asserts only { element }/{ text } presence gets a green verdict even though
a healed-but-wrong element or a stale render can satisfy it (Fowler, Assertion-
Free Testing; Dodds, Make Your Test Fail). iris_assert now attaches `advice` to a
PASSING presence-only assertion, steering toward a { signal }/{ net } consequence
(or an allOf with one). Never on a failing verdict or when a consequence is
already asserted. Pure isPresenceOnlyAssertion walks allOf/anyOf/not. 563 server
tests green (+11).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rts, and read caps

Surfaces tonight's habits in the canonical quick-reference so they're discoverable
to agents and humans, not just buried in tool descriptions:
- Never sleep — iris_act_and_wait with no `until` settles; { kind:"settled" } waits
  for network/DOM/animation idle; allOf a consequence with settled.
- Assert a consequence ({signal}/{net}) over presence ({element}/{text}); a passing
  presence-only assert returns `advice`.
- Cap broad reads: iris_query limit/count_only, iris_network/iris_console limit + cost.
- flow_heal refuses to write a rebind that breaks the success consequence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Self-review caught a real bug in tonight's settled predicate: it counted
dom.text and animation events as activity, so any page with an ambient effect —
a count-up counter, spinner, or pulsing dot — emits an event every frame and
NEVER goes quiet, making act->settle time out at 4s on a healthy page. Observed
live in iter 18: one login flooded 319 dom.text events from the dashboard's
count-up animations. Same trap that deprecated Playwright's networkidle.

SETTLE_ACTIVITY is now network + structural DOM (added/removed/attr) only —
the real "app is doing work" signals. Ambient text/anim churn is ignored;
for an animation-gated outcome, assert the specific consequence instead. Tool
descriptions, cheatsheet, and CHANGELOG corrected to "network + DOM idle".
564 server tests green (+1 ambient-animation regression test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ignal false pass)

Self-review of the M5 heal verification found a real honesty gap: assertSuccess
evaluated the success predicate against the WHOLE session buffer (no since floor).
In flow_heal this is acute — one heal call does two replays (the pre-heal drift
replay's prefix, then the verify replay), so a success signal emitted by the
first could satisfy the verify even when the rebound step never fired it: a
false-positive heal, the exact thing the feature exists to prevent. flow_replay
had the same staleness across runs in one session.

assertSuccess now takes a `since` floor (WaitForSignal gains the param, already
honored by waitForPredicate); flow_replay and the heal verify each capture the
cursor before their replay so only that replay's events count — the same
since-floor honesty principle used by iris_assert/wait_for. 565 server tests
green (+1 floor regression test); flow-replay test fake gains elapsed().

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
divshekhar and others added 2 commits June 18, 2026 16:52
Positively exercises tonight's additions end-to-end in a real browser (the
existing battery only proves no-regression): iris_query count_only/limit, the
settled predicate + act_and_wait auto-settle, and the iris_assert presence-only
advice. The settled check is run against the demo's count-up animations on
purpose — the exact ambient dom.text churn that the iter-29 fix must tolerate;
it now settles (quietForMs ~307) instead of timing out. 7/7 live.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ontrols, domain mustHold

Bumps all 10 publishable @syrin/iris-* packages 0.5.0 -> 0.6.10 and promotes the
CHANGELOG [Unreleased] section to [0.6.10]. This is the live-verified feature
release (NOT 1.0.0 — see plan/V1-ROADMAP.md for the 1.0 gate): full e2e battery
14/14 + a new-features spec 7/7 green on a real browser across the Vite/React
demo, Next.js 15, and the api.

Cleanup: removed a throwaway headless-launcher script (untracked) and its leaked
browser process. No dead code to remove otherwise — noUnusedLocals + no-unused-vars
are enforced and green; knip's "unused files" are all intentional
dynamically-spawned e2e specs and standalone asset scripts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@divshekhar, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 45 minutes and 26 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 08274d45-919f-4fb3-a8e8-888c7bcc4f0e

📥 Commits

Reviewing files that changed from the base of the PR and between 3b0e8e0 and 777a60b.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (2)
  • packages/vite-plugin/package.json
  • pre-commit.sh
📝 Walkthrough

Walkthrough

This PR adds new Iris tool behaviors, flow and domain analysis, iris init, a Vite plugin package and export path, browser observer teardown fixes, end-to-end and unit coverage, documentation updates, and release/version metadata for v0.6.10.

Changes

Core release changes

Layer / File(s) Summary
Tool runtime controls and predicates
packages/server/src/tools/*, packages/server/src/events/predicate.ts, packages/server/src/session/output-budget.ts, apps/e2e/specs/new-features-test.mjs, docs/agent-cheatsheet.md, docs/token-efficiency.md, docs/usage.md
Adds settled predicate support, snapshot diff responses, query pagination, cost metadata, observe recommendations, passing presence-only assert advice, and network or console limit handling, with tests and docs.
Flow grading and domain summaries
packages/server/src/flows/flow-classify.ts, packages/server/src/flows/flow-success.ts, packages/server/src/domain/*, packages/server/src/index.ts, packages/test/src/*
Adds flow assertion grading, success assertion helpers, domain-model and risk-ranking builders, the iris_domain tool, richer flow save/list outputs, public re-exports, and shared test-side usage.
Replay success verification and heal safeguards
packages/protocol/src/*, packages/server/src/flows/flow-replay.ts, packages/server/src/flows/heal.ts, packages/server/src/flows/flow-tools.ts, packages/server/src/flows/flows.ts, packages/server/src/flows/*test.ts
Adds replay time-floor support, ambiguous drift reporting, reusable heal application, replay success verification, and refusal to persist healed flows when the declared consequence no longer passes.
Vite plugin package and SDK export
packages/vite-plugin/*, packages/iris/src/vite.ts, packages/iris/package.json, packages/iris/tsup.config.ts, docs/getting-started.md
Introduces the Vite plugin package, exposes @syrin/iris/vite, adds tests and README coverage, updates build references, and documents plugin-based Vite and React wiring.
Init command and project wiring
packages/server/src/cli.ts, packages/server/src/init/*, docs/getting-started.md, docs/integrate-with-claude-code.md
Adds the iris init CLI flow, project detection, MCP registration planning for Claude and Cursor, install and framework-specific wiring steps, Node-backed execution, and updated setup guidance.
Browser observer restoration and release metadata
packages/browser/src/observers/*, CHANGELOG.md, packages/*/package.json
Restores original browser global identities after console, fetch, and route observer teardown, adds regression tests, updates the changelog, and bumps package versions to 0.6.10.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as iris init
  participant Runner as runInit
  participant Planner as buildPlan
  participant Claude as claude mcp add
  participant Cursor as ~/.cursor/mcp.json
  participant Project as vite.config.ts

  CLI->>Runner: init options
  Runner->>Planner: gather inputs and build plan
  Planner-->>Runner: MCP, install, framework steps
  Runner->>Claude: register iris MCP when planned
  Runner->>Cursor: merge iris server config when planned
  Runner->>Project: patch Vite config when planned
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~105 minutes

Poem

🐇 I sniffed through tools and trails anew,
Found quieter waits and snapshots too.
A Vite burrow bloomed beside the lane,
While heals now test their hops again.
With tidy paws, the globals came back home.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 46.75% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main features of this release: deterministic waiting, self-healing, token controls, and domain understanding are all core changes reflected in the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/iris-init-vite-plugin

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

…6j-xcff + 2 moderates) and add build+audit to pre-commit

audit: vite-plugin's devDependency vite "^5" resolved to 5.4.21, flagged by 1 high
(server.fs.deny bypass) + 2 moderate (optimized-deps path traversal, launch-editor
NTLM disclosure) advisories (all vite <=6.4.2). Bumped to "^7" so it dedupes to the
already-in-tree patched vite 7.3.5; peer stays >=4. `pnpm audit --audit-level high`
now reports no known vulnerabilities; vite-plugin builds + 12 tests pass under vite 7.

pre-commit: the hook was missing two CI steps. It now mirrors .github/workflows/ci.yml
one-for-one — added `pnpm build` (step 2) and `pnpm audit --audit-level high` (step 7),
so a green commit is a green CI run and a high+ vuln is caught before push, not in CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/getting-started.md (1)

443-443: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align the troubleshooting package name with Step 3.

This line points to @syrin/iris-babel-plugin, while Step 3 instructs @syrin/iris/babel. Keeping both names in one guide is confusing and can misdirect setup.

Suggested doc fix
-- Wire up `@syrin/iris-babel-plugin` (Step 3). Without it, only component identity is available.
+- Wire up `@syrin/iris/babel` (Step 3). Without it, only component identity is available.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/getting-started.md` at line 443, The troubleshooting section mentions
`@syrin/iris-babel-plugin` but Step 3 of the guide references
`@syrin/iris/babel`. Update the package name reference at line 443 to use
`@syrin/iris/babel` instead to match the package name specified in Step 3,
ensuring consistency throughout the documentation and preventing setup
confusion.
🟡 Minor comments (8)
packages/browser/src/observers/console.test.ts-27-36 (1)

27-36: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert forwarding explicitly in the first test.

The test name says forwarding is verified, but only event emission is asserted. Add an assertion that the original console.error path is invoked so this regression is actually covered.

Suggested patch
 it('emits CONSOLE_ERROR and still forwards to the original console', () => {
+  const errorSpy = vi.spyOn(console, 'error').mockImplementation(() => {});
   const { emit, events } = collect();
   teardown = installConsole(emit);

   console.error('boom', 42);

   expect(events).toHaveLength(1);
   expect(events[0]?.type).toBe(EventType.CONSOLE_ERROR);
   expect(events[0]?.data.message).toBe('boom 42');
+  expect(errorSpy).toHaveBeenCalledWith('boom', 42);
+  errorSpy.mockRestore();
 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/browser/src/observers/console.test.ts` around lines 27 - 36, The
test verifies event emission but does not assert that the original console.error
method is actually being forwarded. Add a spy or mock of the original
console.error method before calling installConsole, then after the console.error
call, add an assertion to verify that the original console.error was invoked
with the correct arguments ('boom' and 42) to ensure the forwarding behavior is
actually covered by the test.
CHANGELOG.md-7-7 (1)

7-7: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the missing link definition for 0.6.10.

The heading uses [0.6.10] reference-link syntax, but there is no matching link definition at the bottom, so this version entry won’t hyperlink.

📝 Proposed fix
+[0.6.10]: https://github.com/syrin-labs/iris/releases/tag/v0.6.10
 [0.5.0]: https://github.com/syrin-labs/iris/releases/tag/v0.5.0
 [0.4.0]: https://github.com/syrin-labs/iris/releases/tag/v0.4.0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CHANGELOG.md` at line 7, The heading for version 0.6.10 uses markdown
reference-link syntax with [0.6.10], but there is no corresponding link
definition at the bottom of the CHANGELOG.md file to complete the reference. Add
a link definition at the end of the file following the pattern of other version
links in the document to match the [0.6.10] reference in the heading and enable
proper hyperlinking for this version entry.
docs/usage.md-267-267 (1)

267-267: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add mustHold? to the documented iris_domain flow shape.

Line 267 omits mustHold?, but buildDomainModel sets it when a flow has success (see packages/server/src/domain/domain-model.ts Line 117). The docs currently under-specify the returned payload.

Doc fix
-- `iris_domain({})` → `{ flowCount, flows: [{ name, steps, grade, asserts, signals, testids, warning?, risk? }], declared: { testids, signals, stores }, coverage: { asserted, presenceOnly, assertionFree }, gaps: { unassertedFlows, declaredUntestedSignals, declaredUntestedTestids }, riskRanked, summary }`
+- `iris_domain({})` → `{ flowCount, flows: [{ name, steps, grade, asserts, signals, testids, mustHold?, warning?, risk? }], declared: { testids, signals, stores }, coverage: { asserted, presenceOnly, assertionFree }, gaps: { unassertedFlows, declaredUntestedSignals, declaredUntestedTestids }, riskRanked, summary }`
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/usage.md` at line 267, The `iris_domain({})` return type documentation
at line 267 of docs/usage.md is missing the `mustHold?` optional property from
the flow object shape. Add `mustHold?` to the documented flow properties in the
return type definition to match what `buildDomainModel` actually sets in the
packages/server/src/domain/domain-model.ts file when a flow has a success
condition.
packages/server/src/flows/flows.heal.test.ts-85-93 (1)

85-93: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the heal test session cursor-aware so stale-signal protection is actually exercised.

heal(..., apply: true) verifies success using a since floor from session.elapsed(). Here, eventsSince() ignores the cursor and elapsed() is always 0, so buffered signals can satisfy verification even when they should be considered stale.

Suggested test-harness fix
-  eventsSince(): IrisEvent[] {
-    return this.events;
+  eventsSince(cursor: number): IrisEvent[] {
+    return this.events.filter((event) => event.t >= cursor);
   }
@@
   elapsed(): number {
-    return 0;
+    return 1;
   }
@@
 function signalEvent(name: string): IrisEvent {
-  return { t: 0, type: EventType.SIGNAL, sessionId: 's', data: { name, data: {} } };
+  return { t: 1, type: EventType.SIGNAL, sessionId: 's', data: { name, data: {} } };
 }

Also applies to: 96-99

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/flows/flows.heal.test.ts` around lines 85 - 93, The test
mock session needs to be made cursor-aware to properly exercise stale-signal
protection. Modify the mock session to track a cursor position and elapsed time:
make eventsSince() accept and use a cursor parameter to filter events rather
than returning all events, and make elapsed() return an actual tracked value
(either incrementing or controllable) instead of always returning 0. This will
ensure that heal(..., apply: true) verification with the since floor from
session.elapsed() properly validates that buffered signals outside the time
window are considered stale.
packages/server/src/tools/snapshot-cost.test.ts-6-11 (1)

6-11: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove design-doc identifiers from comments, test title, and log labels.

M0/M2 markers are internal design-doc style identifiers and should not appear in comments/test strings.

✏️ Suggested edit
- * M0 — a reproducible measurement of the M2 token wins, run over the REAL shipped functions
+ * Reproducible measurement of token savings, run over the real shipped functions
@@
-describe('M0 — snapshot token cost (M2 wins, measured on real functions)', () => {
+describe('Snapshot token cost regression (measured on real functions)', () => {
@@
-      `[M0] full re-snapshot=${String(fullTokens)} tok  diff=${String(deltaTokens)} tok  saved=${String(savedPct)}%`,
+      `[snapshot-cost] full re-snapshot=${String(fullTokens)} tok  diff=${String(deltaTokens)} tok  saved=${String(savedPct)}%`,
@@
-    console.log(`[M0] unchanged re-snapshot=${String(unchangedTokens)} tok`);
+    console.log(`[snapshot-cost] unchanged re-snapshot=${String(unchangedTokens)} tok`);
@@
-      `[M0] cost preview: ${String(preview.tokens)} tok / ${String(preview.bytes)} bytes`,
+      `[snapshot-cost] cost preview: ${String(preview.tokens)} tok / ${String(preview.bytes)} bytes`,

As per coding guidelines, "Never include internal tracking tags in comments, file names, directory names, or test descriptions. Do not use design-doc reference codes (letter + digit patterns like N5, G4, M8, P2, F1, R1) or internal version strings."

Also applies to: 39-40, 59-60, 78-78, 87-88

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/tools/snapshot-cost.test.ts` around lines 6 - 11, Remove
the internal design-doc identifiers M0 and M2 from the comment block in the
snapshot-cost.test.ts file. Specifically, remove "M0 — a reproducible
measurement of the" and replace it with a direct description, and remove "M2"
references so the comment describes the measurement approach without internal
tracking codes. Apply the same fix to all other locations in the file where
these design-doc reference patterns appear (at lines 39-40, 59-60, 78, and
87-88), ensuring all letter-digit patterns like N5, G4, M8, etc. are removed
from comments, test titles, and log labels throughout the file.

Source: Coding guidelines

apps/e2e/specs/new-features-test.mjs-1-1 (1)

1-1: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove the internal changelog tag from the file comment.

[Unreleased] in an inline comment violates the internal-tag prohibition.

✏️ Suggested edit
-// Live verification of the features added in the [Unreleased] CHANGELOG section, against the real
+// Live verification of the features added in the CHANGELOG section, against the real

As per coding guidelines, "Never include internal tracking tags in comments, file names, directory names, or test descriptions."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/e2e/specs/new-features-test.mjs` at line 1, The file comment at the
start of new-features-test.mjs contains the internal changelog tag [Unreleased]
which violates the coding guideline prohibiting internal tracking tags in
comments. Remove the reference to [Unreleased] and the CHANGELOG section from
the comment, keeping only the core description of what the file does for live
feature verification.

Source: Coding guidelines

packages/server/src/tools/assert-grade.ts-40-46 (1)

40-46: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Mixed predicates are incorrectly flagged as presence-only.

At Line 44-46 and Line 60, allOf/anyOf with { kind: 'element' } plus a non-presence predicate like { kind: 'settled' } still returns presence-only advice. That contradicts the contract in this file and causes false advice emission.

Proposed fix
 interface PredicateKinds {
   /** A signal/net leaf is present — proves the app did something a wrong element cannot fake. */
   consequence: boolean;
   /** An element/text leaf is present — a weak presence check. */
   presence: boolean;
+  /** Any non-presence, non-consequence leaf (route/console/animation/settled). */
+  other: boolean;
 }

 function walk(predicate: Predicate): PredicateKinds {
   switch (predicate.kind) {
     case 'signal':
     case 'net':
-      return { consequence: true, presence: false };
+      return { consequence: true, presence: false, other: false };
     case 'element':
     case 'text':
-      return { consequence: false, presence: true };
+      return { consequence: false, presence: true, other: false };
     case 'route':
     case 'console':
     case 'animation':
     case 'settled':
-      // Observable but not the weak presence pattern we nudge — neither flags the advice.
-      return { consequence: false, presence: false };
+      return { consequence: false, presence: false, other: true };
     case 'allOf':
     case 'anyOf': {
       const subs = predicate.predicates.map(walk);
       return {
         consequence: subs.some((s) => s.consequence),
         presence: subs.some((s) => s.presence),
+        other: subs.some((s) => s.other),
       };
     }
     case 'not':
       return walk(predicate.predicate);
   }
 }

 export function isPresenceOnlyAssertion(predicate: Predicate): boolean {
   const kinds = walk(predicate);
-  return kinds.presence && !kinds.consequence;
+  return kinds.presence && !kinds.consequence && !kinds.other;
 }

Also applies to: 58-60

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/tools/assert-grade.ts` around lines 40 - 46, The walk
function's handling of allOf and anyOf predicates incorrectly marks mixed
predicates (containing both presence-only and non-presence predicates) as
presence-only. In the return statement for these cases, the presence property is
computed using subs.some() which returns true if ANY sub-predicate is
presence-only; this should instead use subs.every() to return true only when ALL
sub-predicates are presence-only. This ensures that a combined predicate is only
flagged as presence-only when all of its constituent predicates are
presence-only, fixing the false advice emission issue that contradicts the
file's contract. Apply the same fix to both the allOf/anyOf case block starting
around line 42 and the similar logic at line 58-60.
packages/server/src/init/mcp.ts-42-45 (1)

42-45: ⚠️ Potential issue | 🟡 Minor

Extract CLI command strings as named constants.

The function uses inline strings ('get', 'add', 'user', '-s', '--', '--version') for CLI arguments. Per the coding guidelines, all domain/wire/UI strings must be named constants. Define constants for these command tokens (e.g., const MCP_GET_SUBCOMMAND = 'get', const MCP_ADD_SUBCOMMAND = 'add', const SCOPE_USER = 'user', etc.) and reference them instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/init/mcp.ts` around lines 42 - 45, The
claudeExistsProbe() function and other functions in this file use inline string
literals for CLI command arguments (such as 'get', 'add', 'user', '-s', '--',
'--version') instead of named constants. Define named constants at the top of
the file for each of these CLI command tokens (e.g., MCP_GET_SUBCOMMAND,
MCP_ADD_SUBCOMMAND, SCOPE_USER, etc.) following the same naming pattern as the
existing CLAUDE_CLI, MCP_SUBCOMMAND, and MCP_SERVER_NAME constants, then replace
all inline string occurrences throughout the file with references to these new
constants.
🧹 Nitpick comments (10)
packages/server/src/flows/flow-success.ts (1)

30-75: ⚡ Quick win

Extract domain/wire string literals into named constants.

This segment introduces multiple free domain/wire literals ('signal', 'net', 'element', 'allOf', 'testid', 'role', 'name', 'net', 'success'). Please promote them to named constants to match repository rules.

Proposed refactor
+const PredicateKind = {
+  signal: 'signal',
+  net: 'net',
+  element: 'element',
+  allOf: 'allOf',
+} as const;
+
+const ElementQueryKey = {
+  testid: 'testid',
+  role: 'role',
+  name: 'name',
+} as const;
+
+const SuccessLabelFallback = {
+  net: 'net',
+  success: 'success',
+} as const;
+
 export function successLabel(success: FlowExpect): string {
   if (success.signal !== undefined) return success.signal;
-  if (success.net !== undefined) return success.net.urlContains ?? success.net.method ?? 'net';
-  return success.element?.testid ?? success.element?.name ?? success.element?.role ?? 'success';
+  if (success.net !== undefined)
+    return success.net.urlContains ?? success.net.method ?? SuccessLabelFallback.net;
+  return (
+    success.element?.testid ??
+    success.element?.name ??
+    success.element?.role ??
+    SuccessLabelFallback.success
+  );
 }
 ...
-      const query: Record<string, string> = {};
-      if (testid !== undefined) query['testid'] = testid;
-      if (element.role !== undefined) query['role'] = element.role;
-      if (element.name !== undefined) query['name'] = element.name;
-      if (Object.keys(query).length > 0) parts.push({ kind: 'element', query });
+      const query: Record<string, string> = {};
+      if (testid !== undefined) query[ElementQueryKey.testid] = testid;
+      if (element.role !== undefined) query[ElementQueryKey.role] = element.role;
+      if (element.name !== undefined) query[ElementQueryKey.name] = element.name;
+      if (Object.keys(query).length > 0) parts.push({ kind: PredicateKind.element, query });

As per coding guidelines, “All domain/wire/UI strings must be named constants, never free strings in code.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/flows/flow-success.ts` around lines 30 - 75, Extract all
hardcoded string literals from the successLabel and successToPredicate functions
into named constants at the module level. Replace all occurrences of 'signal',
'net', 'element', 'allOf', 'testid', 'role', 'name', and 'success' string
literals with these named constants throughout both functions to comply with the
domain/wire string literal guidelines.

Source: Coding guidelines

packages/server/src/domain/domain-model.test.ts (1)

59-67: ⚡ Quick win

Add a regression test for “anchor signal only” coverage.

Please add a case where a flow references a signal anchor but does not assert expect.signal/success.signal, and verify declaredUntestedSignals still flags it. That protects the stated “asserted signal” contract.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/domain/domain-model.test.ts` around lines 59 - 67, Add a
new test case in the test file after the existing test that verifies when a flow
defines a signal anchor (via the signal property in flow()) but does not assert
it with expect.signal or success.signal, the signal is still properly flagged in
declaredUntestedSignals. Use buildDomainModel, flow, and contract functions
similar to the existing test pattern to construct a scenario where a flow
references a signal without asserting it, then verify that signal appears in the
gaps.declaredUntestedSignals array to protect the stated asserted signal
contract.
packages/server/src/domain/flow-risk.test.ts (1)

29-52: ⚡ Quick win

Add a PRESENCE_ONLY risk test case.

Current coverage skips the FlowAssertionGrade.PRESENCE_ONLY branch, so risk-priority regressions there won’t be caught.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/domain/flow-risk.test.ts` around lines 29 - 52, The test
suite for flowRisk is missing coverage for the FlowAssertionGrade.PRESENCE_ONLY
case, which could allow regressions to go undetected. Add a new test case in the
describe block that exercises the PRESENCE_ONLY grade by calling flowRisk with
FlowAssertionGrade.PRESENCE_ONLY as the first argument and a run result as the
second argument (similar to the existing test cases like the assertion-free flow
test), then assert the expected RiskLevel that should be returned for this grade
to ensure the branch is properly tested and documented.
packages/server/src/events/predicate.ts (1)

363-373: ⚡ Quick win

Extract newly introduced wire strings to named constants.

Line 363 and Line 372 introduce new inline domain/wire strings (note and failureReason). Please hoist these to named constants to match the repository rule and keep message contracts centralized.

As per coding guidelines, "All domain/wire/UI strings must be named constants, never free strings in code."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/events/predicate.ts` around lines 363 - 373, The inline
strings `'no activity to settle'` (used as the note value) and the failureReason
string template (used in the failureReason property) need to be extracted to
named constants following the repository's coding guidelines. Define these
strings as named constants at the module level or in a centralized constants
location, then replace the inline string literals with references to those
constants. Ensure the failureReason constant properly handles the dynamic values
by using a template string with the placeholders for lastType, quietForMs, and
quietMs.

Source: Coding guidelines

packages/server/src/events/predicate.test.ts (1)

228-311: ⚡ Quick win

Consolidate repeated predicate/event literals into named test constants.

The new settled suite adds many inline domain/wire strings (e.g., predicate kinds, signal names, URLs, messages). Please move these to file-level constants for compliance and easier fixture reuse.

As per coding guidelines, "All domain/wire/UI strings must be named constants, never free strings in code."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/events/predicate.test.ts` around lines 228 - 311, Extract
all inline domain/wire/UI strings from the settled predicate test suite (the
describe block starting at line 228) into file-level named constants. This
includes predicate kinds like 'settled', 'signal', and 'allOf', signal names
like 'deploy:shipped' and 'x', URLs like '/api/x' and '/api/deploy', event
property names and values, and failure reason strings like 'not settled'.
Replace all occurrences of these inline literals throughout the test cases with
references to the corresponding named constants to ensure compliance with the
coding guideline that all domain/wire/UI strings must be named constants.

Source: Coding guidelines

packages/server/src/tools/tools.near-miss.test.ts (1)

79-127: ⚡ Quick win

Extract newly added event/tool literals to shared constants.

The new token-budget tests add inline domain/wire strings (URLs, messages, tool-facing literals). Please promote them to named constants for compliance and consistency with other fixtures.

As per coding guidelines, "All domain/wire/UI strings must be named constants, never free strings in code."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/tools/tools.near-miss.test.ts` around lines 79 - 127, The
token budget tests in the describe block contain hardcoded string literals for
URLs, console messages, and log levels that should be extracted to named
constants. Identify all inline strings in the test cases including the URLs
'/1', '/2', '/3' used in iris_network tests, the messages 'a', 'b', 'c' used in
iris_console tests, and the level string 'error' used in the iris_console test.
Create named constants for each of these values and replace the inline strings
throughout the test methods (in the ev() calls, handler calls, and expect
assertions) with references to these constants to ensure compliance with the
coding guideline that all domain/wire strings must be named constants.

Source: Coding guidelines

packages/server/src/tools/assert-grade.test.ts (1)

5-60: ⚡ Quick win

Apply the string-constant rule in this new test suite.

This file introduces many inline domain/wire strings in predicates and assertions. Please define named constants and reuse them across cases.

As per coding guidelines, "All domain/wire/UI strings must be named constants, never free strings in code."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/tools/assert-grade.test.ts` around lines 5 - 60, The test
suite in isPresenceOnlyAssertion contains many inline string literals for
predicate kinds (element, text, signal, net, allOf, not, route, settled,
console) and test data values (button, Saved, order:placed, /api/order, Done,
Welcome, /success, error, etc.). Define named constants at the top of the test
file for all these domain/wire/UI strings and replace each inline string literal
throughout all test cases in the describe block with references to the
corresponding constant. This ensures consistency and makes the strings reusable
across the test suite.

Source: Coding guidelines

packages/server/src/session/output-budget.ts (1)

46-46: ⚡ Quick win

Extract the recommendation message into a named constant.

This introduces a free domain/wire string inline in a .ts file; move it to a named constant and format it from parameters.

♻️ Suggested change
+const LARGE_TIMELINE_RECOMMENDATION_TEMPLATE =
+  'large timeline ({events} events, ~{tokens} tokens) — pass filters:[...] (e.g. ["signal","net"]) or max_events to scope your next call and cut tokens';
+
 export function costHint(payload: unknown, events: number, droppedOldest = 0): CostHint {
   const json = JSON.stringify(payload) ?? '';
   const bytes = json.length;
   const base: CostHint = droppedOldest > 0 ? { events, bytes, droppedOldest } : { events, bytes };
   if (events >= LARGE_TIMELINE_EVENTS || bytes >= LARGE_TIMELINE_BYTES) {
-    base.recommendation = `large timeline (${String(events)} events, ~${String(estimateTokens(json))} tokens) — pass filters:[...] (e.g. ["signal","net"]) or max_events to scope your next call and cut tokens`;
+    base.recommendation = LARGE_TIMELINE_RECOMMENDATION_TEMPLATE
+      .replace('{events}', String(events))
+      .replace('{tokens}', String(estimateTokens(json)));
   }
   return base;
 }

As per coding guidelines, "All domain/wire/UI strings must be named constants, never free strings in code."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/session/output-budget.ts` at line 46, Extract the
hardcoded recommendation message string currently assigned to
base.recommendation into a named constant at the module level. The constant
should contain the template with placeholders for the dynamic values (events
count and estimated tokens), and then reference this constant in the assignment
while still providing the formatted parameters. This ensures the domain/wire
string is not a free string in the code but follows the guideline of being a
named constant.

Source: Coding guidelines

packages/server/src/init/run.test.ts (1)

61-67: ⚡ Quick win

Add a regression test for malformed package.json.

The suite checks the missing-file case but not invalid JSON content. A targeted test here will prevent reintroducing the crash path.

🧪 Suggested test case
 describe('runInit', () => {
   it('errors cleanly without a package.json', () => {
     const io = memoryIo({});
     const r = runInit(OPTS, io);
     expect(r.ok).toBe(false);
     expect(io.lines.join('\n')).toContain('No package.json');
   });

+  it('errors cleanly with an invalid package.json', () => {
+    const io = memoryIo({ 'package.json': '{ "name": "bad-json",' });
+    const r = runInit(OPTS, io);
+    expect(r.ok).toBe(false);
+    expect(io.lines.join('\n')).toContain('Invalid package.json');
+  });
+
   it('registers iris globally via the claude CLI (not a project .mcp.json) and patches vite', () => {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/init/run.test.ts` around lines 61 - 67, Add a new
regression test case in the same test suite after the existing "errors cleanly
without a package.json" test. Create a test that verifies the runInit function
handles malformed JSON content gracefully. Use memoryIo to provide a
package.json file with invalid JSON syntax (such as incomplete or syntactically
broken JSON), call runInit with OPTS and the io object, and verify that the
result has ok set to false and that io.lines contains an appropriate error
message indicating the JSON parsing failure. This test will prevent crashes when
package.json exists but contains invalid JSON.
packages/server/src/init/vite-config.test.ts (1)

30-34: ⚡ Quick win

Add a regression case for “import exists, plugin missing”.

Current idempotency coverage only verifies fully patched input. Add a test where the config already imports @syrin/iris/vite but plugins lacks iris(); expected result should still be APPLY.

Suggested test
+  it('does not treat import-only configs as already patched', () => {
+    const src = `import { defineConfig } from 'vite';
+import { iris } from '`@syrin/iris/vite`';
+import react from '`@vitejs/plugin-react`';
+
+export default defineConfig({
+  plugins: [react()],
+});
+`;
+    const r = patchViteConfig(src);
+    expect(r.kind).toBe(VitePatchKind.APPLY);
+  });

Also applies to: 56-61

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/init/vite-config.test.ts` around lines 30 - 34, Add a new
test case after the existing idempotency test in the test file to cover the
regression scenario where a Vite config already has the import statement for
`@syrin/iris/vite` but is missing the iris() plugin in the plugins array. Create a
test config string that includes the import line but lacks the iris() function
call in the plugins configuration, then call patchViteConfig on this config and
assert that the returned kind is VitePatchKind.APPLY, confirming that the
function correctly identifies and applies the patch for this incomplete state.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/e2e/specs/new-features-test.mjs`:
- Around line 54-91: The test code lacks proper error handling and resource
cleanup. If any await statement throws an error before reaching b.close() and
server.close(), resources will leak and hang the CI. Wrap all the test logic
(from chromium.launch through the final chk call) in a try block, and move the
b.close() and server.close() calls into a finally block that runs regardless of
success or failure. Keep the process.exit call after the finally block to ensure
it only executes after cleanup is complete.

In `@packages/server/src/domain/domain-model.ts`:
- Around line 174-190: Extract all hardcoded strings in the coverage status
message building code into named constants at the module or class level. This
includes the "No saved flows yet" message, the template strings used for
constructing the parts array such as "flow"/"flows", "test first: ", "declared
signal(s) no flow asserts (", and "flow(s) assert no consequence". Replace each
free string in the code with a reference to its corresponding named constant to
ensure consistency and maintainability of domain strings throughout the
codebase.
- Around line 77-85: The flowSignals function incorrectly includes anchor
signals in addition to asserted signals, which causes flows that only reference
a signal anchor to be treated as tested even when they don't actually assert or
expect that signal. Remove the line that adds anchor signals to the set (the
condition checking if step.anchor.kind === AnchorKind.SIGNAL) so that only
signals that are explicitly asserted through expect.signal or success.signal are
counted as covered by the flow.

In `@packages/server/src/domain/domain-tools.ts`:
- Around line 23-37: The outputSchema.flows object definition is missing the
optional risk field that is actually returned by the handler's buildDomainModel
function call, causing a schema contract mismatch for consumers. Add an optional
risk field to the Zod schema object within the flows definition (around the
z.object block), using the same pattern as the other optional fields like
mustHold and warning, declaring it as z.string().optional() or with an
appropriate type based on what buildDomainModel returns.
- Around line 17-19: Extract all free string literals from the tool definition
(including the long description string in the description field, and any strings
at the locations mentioned in lines 31-33 and 57-58) into named constants at the
top of the file, then replace each free string with a reference to its
corresponding constant. This ensures all domain/wire/UI strings follow the
repository policy of being named constants rather than inline literals.

In `@packages/server/src/domain/flow-risk.ts`:
- Around line 64-75: The gradeRisk function currently maps
FlowAssertionGrade.PRESENCE_ONLY to RiskLevel.LOW, which incorrectly ranks it as
equivalent to consequence-asserting flows. Instead, change the return value for
the PRESENCE_ONLY condition to return a higher risk level (such as
RiskLevel.MEDIUM) to properly reflect that presence-only assertions represent a
weaker oracle and should be prioritized higher in triage according to the
module's risk prioritization strategy.

In `@packages/server/src/flows/flow-tools.ts`:
- Around line 109-117: The assertions schema object in the outputSchema (which
contains fields like grade, hasConsequenceAssertion, totalSteps,
consequenceSteps, weakSteps, and warning) is missing the successIsConsequence
field that is actually returned by the classifyFlowAssertions function on line
143. Add the successIsConsequence field to the z.object() schema definition with
an appropriate zod type (likely z.boolean()) to match the payload being returned
and prevent strict schema consumers from rejecting the response.
- Around line 246-247: The condition for `stepsClean` includes `steps.length >
0`, which prevents success verification when a flow has zero replayed steps.
This allows flows with declared `success` values but no steps to bypass the
`assertSuccess` check. Remove the `steps.length > 0` requirement from the
`stepsClean` variable definition so that success verification occurs regardless
of whether there are any steps, ensuring that any declared success consequence
is properly validated. Apply this same fix to the similar pattern at lines
483-484.

In `@packages/server/src/init/cursor.ts`:
- Around line 39-44: The parseConfig function currently returns a success result
with an empty config object when encountering non-object JSON roots or null
values, which incorrectly allows automatic rewrites that could overwrite
existing configuration. Modify the parseConfig function to return a result
indicating MANUAL handling mode instead of ok: true when the parsed JSON root is
not an object or is null. Additionally, apply the same MANUAL handling approach
for invalid mcpServers shapes in the configuration validation logic (in the
range around lines 54-67) to prevent unintended rewrites of existing
configuration content.

In `@packages/server/src/init/mcp.ts`:
- Around line 38-50: Extract the inline CLI command tokens ('add', 'user',
'get', '--version') into named constants at the module level, following the
no-free-strings rule. Define constants for each token (e.g., similar to how
CLAUDE_CLI, MCP_SUBCOMMAND, and MCP_SERVER_NAME are defined), then replace all
occurrences of these hardcoded strings in the command builder function and the
claudeExistsProbe and claudeAvailableProbe functions with references to the
newly created constants. This ensures all domain/wire strings are captured as
named constants throughout the module.

In `@packages/server/src/init/plan.ts`:
- Around line 68-255: Extract the inline string literals for step titles and
details throughout the file into named constants at the top, following the
pattern of CLAUDE_MCP_TITLE and CURSOR_MCP_TITLE. This applies to strings in the
installStep, viteSteps, nextSteps, and buildPlan functions including titles like
"Install dependency", "Vite plugin", "IrisDev component", "Next config
(withIris)", "Mount IrisDev", "Connect snippet", and their corresponding detail
messages. Replace each inline string literal with the appropriate constant
reference to ensure all domain/UI strings are centralized and follow the
repository's coding guidelines.

In `@packages/server/src/init/run.ts`:
- Around line 70-71: The JSON.parse call in the gatherPlanInput function at line
71 can throw an exception if the pkgRaw string contains malformed JSON, causing
the CLI to crash. Wrap the JSON.parse(pkgRaw) call in a try-catch block to
handle the parse error gracefully, and either throw a custom error with a
user-friendly message or return an error state that the caller can handle.
Similarly, check the related code around lines 157-166 where similar JSON
parsing may occur and apply the same error handling pattern to ensure all JSON
parsing in the runInit path handles malformed input without crashing.

In `@packages/server/src/init/vite-config.ts`:
- Around line 49-57: The ALREADY detection in patchViteConfig function is too
broad because checking only if the source includes IRIS_MARKER can match
occurrences in comments or imports without verifying the iris plugin is actually
in the plugins array. Replace the simple source.includes(IRIS_MARKER) check with
a more robust validation that confirms the iris() plugin call is present in the
plugins array, similar to how PLUGINS_ARRAY.test(source) validates the plugins
array structure. This ensures ALREADY is only returned when the plugin injection
is truly complete.

In `@packages/server/src/tools/tools.ts`:
- Line 1299: The tools.ts file exceeds the repository's 500-line file limit at
1299 lines. Refactor by extracting related tool definition groups (such as
snapshot tools, query tools, assert tools, and network-console tools) into
separate dedicated modules. Then import these extracted tool groups back into
tools.ts and compose them together in the DOMAIN_TOOLS spread operator to
maintain the same public API while keeping each file within the 500-line limit.
- Around line 339-356: The snapshot cache key generation in the handler function
uses a fallback value of 'default' when sessionId is not provided, causing
different sessions without an explicit sessionId to share the same cache entry
in the process-wide SNAPSHOT_CACHE. This creates privacy and correctness issues
as unrelated sessions can get deltas computed from another session's prior
snapshot. Remove the fallback to 'default' in the sessionId field passed to
applySnapshotDelta and ensure each session has a uniquely identifiable sessionId
rather than silently defaulting multiple sessions to the same cache key value.

---

Outside diff comments:
In `@docs/getting-started.md`:
- Line 443: The troubleshooting section mentions `@syrin/iris-babel-plugin` but
Step 3 of the guide references `@syrin/iris/babel`. Update the package name
reference at line 443 to use `@syrin/iris/babel` instead to match the package
name specified in Step 3, ensuring consistency throughout the documentation and
preventing setup confusion.

---

Minor comments:
In `@apps/e2e/specs/new-features-test.mjs`:
- Line 1: The file comment at the start of new-features-test.mjs contains the
internal changelog tag [Unreleased] which violates the coding guideline
prohibiting internal tracking tags in comments. Remove the reference to
[Unreleased] and the CHANGELOG section from the comment, keeping only the core
description of what the file does for live feature verification.

In `@CHANGELOG.md`:
- Line 7: The heading for version 0.6.10 uses markdown reference-link syntax
with [0.6.10], but there is no corresponding link definition at the bottom of
the CHANGELOG.md file to complete the reference. Add a link definition at the
end of the file following the pattern of other version links in the document to
match the [0.6.10] reference in the heading and enable proper hyperlinking for
this version entry.

In `@docs/usage.md`:
- Line 267: The `iris_domain({})` return type documentation at line 267 of
docs/usage.md is missing the `mustHold?` optional property from the flow object
shape. Add `mustHold?` to the documented flow properties in the return type
definition to match what `buildDomainModel` actually sets in the
packages/server/src/domain/domain-model.ts file when a flow has a success
condition.

In `@packages/browser/src/observers/console.test.ts`:
- Around line 27-36: The test verifies event emission but does not assert that
the original console.error method is actually being forwarded. Add a spy or mock
of the original console.error method before calling installConsole, then after
the console.error call, add an assertion to verify that the original
console.error was invoked with the correct arguments ('boom' and 42) to ensure
the forwarding behavior is actually covered by the test.

In `@packages/server/src/flows/flows.heal.test.ts`:
- Around line 85-93: The test mock session needs to be made cursor-aware to
properly exercise stale-signal protection. Modify the mock session to track a
cursor position and elapsed time: make eventsSince() accept and use a cursor
parameter to filter events rather than returning all events, and make elapsed()
return an actual tracked value (either incrementing or controllable) instead of
always returning 0. This will ensure that heal(..., apply: true) verification
with the since floor from session.elapsed() properly validates that buffered
signals outside the time window are considered stale.

In `@packages/server/src/init/mcp.ts`:
- Around line 42-45: The claudeExistsProbe() function and other functions in
this file use inline string literals for CLI command arguments (such as 'get',
'add', 'user', '-s', '--', '--version') instead of named constants. Define named
constants at the top of the file for each of these CLI command tokens (e.g.,
MCP_GET_SUBCOMMAND, MCP_ADD_SUBCOMMAND, SCOPE_USER, etc.) following the same
naming pattern as the existing CLAUDE_CLI, MCP_SUBCOMMAND, and MCP_SERVER_NAME
constants, then replace all inline string occurrences throughout the file with
references to these new constants.

In `@packages/server/src/tools/assert-grade.ts`:
- Around line 40-46: The walk function's handling of allOf and anyOf predicates
incorrectly marks mixed predicates (containing both presence-only and
non-presence predicates) as presence-only. In the return statement for these
cases, the presence property is computed using subs.some() which returns true if
ANY sub-predicate is presence-only; this should instead use subs.every() to
return true only when ALL sub-predicates are presence-only. This ensures that a
combined predicate is only flagged as presence-only when all of its constituent
predicates are presence-only, fixing the false advice emission issue that
contradicts the file's contract. Apply the same fix to both the allOf/anyOf case
block starting around line 42 and the similar logic at line 58-60.

In `@packages/server/src/tools/snapshot-cost.test.ts`:
- Around line 6-11: Remove the internal design-doc identifiers M0 and M2 from
the comment block in the snapshot-cost.test.ts file. Specifically, remove "M0 —
a reproducible measurement of the" and replace it with a direct description, and
remove "M2" references so the comment describes the measurement approach without
internal tracking codes. Apply the same fix to all other locations in the file
where these design-doc reference patterns appear (at lines 39-40, 59-60, 78, and
87-88), ensuring all letter-digit patterns like N5, G4, M8, etc. are removed
from comments, test titles, and log labels throughout the file.

---

Nitpick comments:
In `@packages/server/src/domain/domain-model.test.ts`:
- Around line 59-67: Add a new test case in the test file after the existing
test that verifies when a flow defines a signal anchor (via the signal property
in flow()) but does not assert it with expect.signal or success.signal, the
signal is still properly flagged in declaredUntestedSignals. Use
buildDomainModel, flow, and contract functions similar to the existing test
pattern to construct a scenario where a flow references a signal without
asserting it, then verify that signal appears in the
gaps.declaredUntestedSignals array to protect the stated asserted signal
contract.

In `@packages/server/src/domain/flow-risk.test.ts`:
- Around line 29-52: The test suite for flowRisk is missing coverage for the
FlowAssertionGrade.PRESENCE_ONLY case, which could allow regressions to go
undetected. Add a new test case in the describe block that exercises the
PRESENCE_ONLY grade by calling flowRisk with FlowAssertionGrade.PRESENCE_ONLY as
the first argument and a run result as the second argument (similar to the
existing test cases like the assertion-free flow test), then assert the expected
RiskLevel that should be returned for this grade to ensure the branch is
properly tested and documented.

In `@packages/server/src/events/predicate.test.ts`:
- Around line 228-311: Extract all inline domain/wire/UI strings from the
settled predicate test suite (the describe block starting at line 228) into
file-level named constants. This includes predicate kinds like 'settled',
'signal', and 'allOf', signal names like 'deploy:shipped' and 'x', URLs like
'/api/x' and '/api/deploy', event property names and values, and failure reason
strings like 'not settled'. Replace all occurrences of these inline literals
throughout the test cases with references to the corresponding named constants
to ensure compliance with the coding guideline that all domain/wire/UI strings
must be named constants.

In `@packages/server/src/events/predicate.ts`:
- Around line 363-373: The inline strings `'no activity to settle'` (used as the
note value) and the failureReason string template (used in the failureReason
property) need to be extracted to named constants following the repository's
coding guidelines. Define these strings as named constants at the module level
or in a centralized constants location, then replace the inline string literals
with references to those constants. Ensure the failureReason constant properly
handles the dynamic values by using a template string with the placeholders for
lastType, quietForMs, and quietMs.

In `@packages/server/src/flows/flow-success.ts`:
- Around line 30-75: Extract all hardcoded string literals from the successLabel
and successToPredicate functions into named constants at the module level.
Replace all occurrences of 'signal', 'net', 'element', 'allOf', 'testid',
'role', 'name', and 'success' string literals with these named constants
throughout both functions to comply with the domain/wire string literal
guidelines.

In `@packages/server/src/init/run.test.ts`:
- Around line 61-67: Add a new regression test case in the same test suite after
the existing "errors cleanly without a package.json" test. Create a test that
verifies the runInit function handles malformed JSON content gracefully. Use
memoryIo to provide a package.json file with invalid JSON syntax (such as
incomplete or syntactically broken JSON), call runInit with OPTS and the io
object, and verify that the result has ok set to false and that io.lines
contains an appropriate error message indicating the JSON parsing failure. This
test will prevent crashes when package.json exists but contains invalid JSON.

In `@packages/server/src/init/vite-config.test.ts`:
- Around line 30-34: Add a new test case after the existing idempotency test in
the test file to cover the regression scenario where a Vite config already has
the import statement for `@syrin/iris/vite` but is missing the iris() plugin in
the plugins array. Create a test config string that includes the import line but
lacks the iris() function call in the plugins configuration, then call
patchViteConfig on this config and assert that the returned kind is
VitePatchKind.APPLY, confirming that the function correctly identifies and
applies the patch for this incomplete state.

In `@packages/server/src/session/output-budget.ts`:
- Line 46: Extract the hardcoded recommendation message string currently
assigned to base.recommendation into a named constant at the module level. The
constant should contain the template with placeholders for the dynamic values
(events count and estimated tokens), and then reference this constant in the
assignment while still providing the formatted parameters. This ensures the
domain/wire string is not a free string in the code but follows the guideline of
being a named constant.

In `@packages/server/src/tools/assert-grade.test.ts`:
- Around line 5-60: The test suite in isPresenceOnlyAssertion contains many
inline string literals for predicate kinds (element, text, signal, net, allOf,
not, route, settled, console) and test data values (button, Saved, order:placed,
/api/order, Done, Welcome, /success, error, etc.). Define named constants at the
top of the test file for all these domain/wire/UI strings and replace each
inline string literal throughout all test cases in the describe block with
references to the corresponding constant. This ensures consistency and makes the
strings reusable across the test suite.

In `@packages/server/src/tools/tools.near-miss.test.ts`:
- Around line 79-127: The token budget tests in the describe block contain
hardcoded string literals for URLs, console messages, and log levels that should
be extracted to named constants. Identify all inline strings in the test cases
including the URLs '/1', '/2', '/3' used in iris_network tests, the messages
'a', 'b', 'c' used in iris_console tests, and the level string 'error' used in
the iris_console test. Create named constants for each of these values and
replace the inline strings throughout the test methods (in the ev() calls,
handler calls, and expect assertions) with references to these constants to
ensure compliance with the coding guideline that all domain/wire strings must be
named constants.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 47fd35ff-a94f-48c2-880e-ff8bf8781c13

📥 Commits

Reviewing files that changed from the base of the PR and between ff54d8d and 3b0e8e0.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (89)
  • CHANGELOG.md
  • apps/e2e/specs/new-features-test.mjs
  • docs/agent-cheatsheet.md
  • docs/getting-started.md
  • docs/integrate-with-claude-code.md
  • docs/token-efficiency.md
  • docs/usage.md
  • packages/babel-plugin/package.json
  • packages/browser/package.json
  • packages/browser/src/observers/console.test.ts
  • packages/browser/src/observers/console.ts
  • packages/browser/src/observers/network.test.ts
  • packages/browser/src/observers/network.ts
  • packages/browser/src/observers/route.test.ts
  • packages/browser/src/observers/route.ts
  • packages/eslint-plugin/package.json
  • packages/iris/package.json
  • packages/iris/src/vite.ts
  • packages/iris/tsup.config.ts
  • packages/next/package.json
  • packages/protocol/package.json
  • packages/protocol/src/constants.ts
  • packages/protocol/src/types.ts
  • packages/react/package.json
  • packages/server/package.json
  • packages/server/src/cli.test.ts
  • packages/server/src/cli.ts
  • packages/server/src/domain/domain-model.test.ts
  • packages/server/src/domain/domain-model.ts
  • packages/server/src/domain/domain-tools.ts
  • packages/server/src/domain/flow-risk.test.ts
  • packages/server/src/domain/flow-risk.ts
  • packages/server/src/events/predicate.test.ts
  • packages/server/src/events/predicate.ts
  • packages/server/src/flows/flow-classify.test.ts
  • packages/server/src/flows/flow-classify.ts
  • packages/server/src/flows/flow-replay.dynamic.test.ts
  • packages/server/src/flows/flow-replay.test.ts
  • packages/server/src/flows/flow-replay.ts
  • packages/server/src/flows/flow-success.test.ts
  • packages/server/src/flows/flow-success.ts
  • packages/server/src/flows/flow-tools.ts
  • packages/server/src/flows/flows.heal.test.ts
  • packages/server/src/flows/flows.ts
  • packages/server/src/flows/heal-proposal.test.ts
  • packages/server/src/flows/heal.ts
  • packages/server/src/flows/tools.flow-replay.test.ts
  • packages/server/src/flows/tools.flows.test.ts
  • packages/server/src/index.ts
  • packages/server/src/init/cursor.test.ts
  • packages/server/src/init/cursor.ts
  • packages/server/src/init/detect.test.ts
  • packages/server/src/init/detect.ts
  • packages/server/src/init/mcp.test.ts
  • packages/server/src/init/mcp.ts
  • packages/server/src/init/node-io.ts
  • packages/server/src/init/plan.test.ts
  • packages/server/src/init/plan.ts
  • packages/server/src/init/run.test.ts
  • packages/server/src/init/run.ts
  • packages/server/src/init/snippets.ts
  • packages/server/src/init/vite-config.test.ts
  • packages/server/src/init/vite-config.ts
  • packages/server/src/session/output-budget.test.ts
  • packages/server/src/session/output-budget.ts
  • packages/server/src/session/tools.live-control.test.ts
  • packages/server/src/tools/assert-grade.test.ts
  • packages/server/src/tools/assert-grade.ts
  • packages/server/src/tools/query-paginate.test.ts
  • packages/server/src/tools/query-paginate.ts
  • packages/server/src/tools/snapshot-cost.test.ts
  • packages/server/src/tools/snapshot-delta.test.ts
  • packages/server/src/tools/snapshot-delta.ts
  • packages/server/src/tools/tool-names.ts
  • packages/server/src/tools/tools.assert-advice.test.ts
  • packages/server/src/tools/tools.near-miss.test.ts
  • packages/server/src/tools/tools.ts
  • packages/test/package.json
  • packages/test/src/flow-spec.test.ts
  • packages/test/src/register.test.ts
  • packages/test/src/success-assert.test.ts
  • packages/test/src/success-assert.ts
  • packages/vite-plugin/README.md
  • packages/vite-plugin/package.json
  • packages/vite-plugin/src/build.integration.test.ts
  • packages/vite-plugin/src/index.test.ts
  • packages/vite-plugin/src/index.ts
  • packages/vite-plugin/tsconfig.json
  • tsconfig.json

Comment on lines +54 to +91
const b = await chromium.launch({ headless: true });
const p = await b.newPage();
await p.goto('http://localhost:4310/?session=demo', { waitUntil: 'networkidle' });
for (let i = 0; i < 200 && server.bridge.sessions.count() === 0; i++) await sleep(50);

console.log('\n=== Iris × new features (:4310) ===');
chk('dashboard SDK connected', server.bridge.sessions.count() > 0);

// count_only — just the match count, no descriptors.
const co = await T('iris_query', { by: 'role', value: 'button', count_only: true });
chk('iris_query count_only returns a count, drops elements', typeof co.count === 'number' && co.count >= 1 && co.elements === undefined, `count=${co.count}`);

// limit — cap descriptors; when more matched, total + truncated flag it.
const lim = await T('iris_query', { by: 'role', value: 'button', limit: 1 });
const moreThanOne = (co.count ?? 0) > 1;
chk('iris_query limit caps descriptors (truncated when more)', (lim.elements?.length ?? 0) <= 1 && (!moreThanOne || (lim.truncated === true && lim.total === co.count)), `returned=${lim.elements?.length}, total=${lim.total ?? 'n/a'}`);

// Auth (pre-filled) → dashboard with its count-up animations.
await T('iris_act_and_wait', { ref: await refOf('testid', 'login-submit'), action: 'click', until: { kind: 'signal', name: 'auth:granted' }, timeout_ms: 5000 });
chk('login → dashboard', (await refOf('testid', 'nav-deployments')) !== null);

// settled wait — the dashboard's count-up counters emit dom.text every frame; settle must STILL
// resolve (the ambient-animation fix). Pre-fix this would time out at 4s with pass:false.
const settled = await T('iris_wait_for', { predicate: { kind: 'settled', quietMs: 300 }, timeout_ms: 4000 });
chk('settled resolves despite count-up animation churn', settled.pass === true, JSON.stringify(settled.evidence ?? {}));

// act_and_wait with NO `until` → auto-settle after a nav click; verdict carries settled evidence.
const aw = await T('iris_act_and_wait', { ref: await refOf('testid', 'nav-deployments'), action: 'click' });
chk('act_and_wait (no until) auto-settles', aw.verdict?.pass === true && aw.verdict?.evidence?.settled === true, JSON.stringify(aw.verdict?.evidence ?? {}));

// presence-only advice — a PASSING element assertion is nudged toward a consequence.
const adv = await T('iris_assert', { predicate: { kind: 'element', query: { testid: 'deploy-list' } } });
chk('iris_assert presence-only attaches advice', adv.pass === true && typeof adv.advice === 'string' && adv.advice.includes('consequence'), adv.advice ? 'advice present' : 'no advice');

console.log(`\n${fail === 0 ? '✅ NEW FEATURES VERIFIED' : '❌ FAILED'} (${pass} passed, ${fail} failed)`);
await b.close();
await server.close();
process.exit(fail === 0 ? 0 : 1);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guarantee teardown with try/finally around the e2e run.

Any thrown await in the flow skips b.close()/server.close(), which can leak resources and hang CI jobs.

🛠️ Suggested hardening
-const b = await chromium.launch({ headless: true });
-const p = await b.newPage();
-await p.goto('http://localhost:4310/?session=demo', { waitUntil: 'networkidle' });
-for (let i = 0; i < 200 && server.bridge.sessions.count() === 0; i++) await sleep(50);
+let b;
+let p;
+try {
+  b = await chromium.launch({ headless: true });
+  p = await b.newPage();
+  await p.goto('http://localhost:4310/?session=demo', { waitUntil: 'networkidle' });
+  for (let i = 0; i < 200 && server.bridge.sessions.count() === 0; i++) await sleep(50);
@@
-console.log(`\n${fail === 0 ? '✅ NEW FEATURES VERIFIED' : '❌ FAILED'} (${pass} passed, ${fail} failed)`);
-await b.close();
-await server.close();
+  console.log(`\n${fail === 0 ? '✅ NEW FEATURES VERIFIED' : '❌ FAILED'} (${pass} passed, ${fail} failed)`);
+} finally {
+  await p?.close();
+  await b?.close();
+  await server.close();
+}
 process.exit(fail === 0 ? 0 : 1);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/e2e/specs/new-features-test.mjs` around lines 54 - 91, The test code
lacks proper error handling and resource cleanup. If any await statement throws
an error before reaching b.close() and server.close(), resources will leak and
hang the CI. Wrap all the test logic (from chromium.launch through the final chk
call) in a try block, and move the b.close() and server.close() calls into a
finally block that runs regardless of success or failure. Keep the process.exit
call after the finally block to ensure it only executes after cleanup is
complete.

Comment on lines +77 to +85
function flowSignals(flow: FlowFile): string[] {
const set = new Set<string>();
for (const step of flatten(flow.steps)) {
if (step.anchor.kind === AnchorKind.SIGNAL) set.add(step.anchor.name);
if (step.expect?.signal !== undefined) set.add(step.expect.signal);
}
if (flow.success?.signal !== undefined) set.add(flow.success.signal);
return [...set];
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Signal-gap computation currently over-credits non-asserting flows.

On Line 80, flowSignals includes anchor signals, but declaredUntestedSignals is defined as signals that no flow asserts. A flow that only references a signal anchor (without expect.signal/success.signal) will be treated as covered, which can suppress real intent gaps.

Suggested fix
 function flowSignals(flow: FlowFile): string[] {
   const set = new Set<string>();
   for (const step of flatten(flow.steps)) {
-    if (step.anchor.kind === AnchorKind.SIGNAL) set.add(step.anchor.name);
     if (step.expect?.signal !== undefined) set.add(step.expect.signal);
   }
   if (flow.success?.signal !== undefined) set.add(flow.success.signal);
   return [...set];
 }

Also applies to: 133-136

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/domain/domain-model.ts` around lines 77 - 85, The
flowSignals function incorrectly includes anchor signals in addition to asserted
signals, which causes flows that only reference a signal anchor to be treated as
tested even when they don't actually assert or expect that signal. Remove the
line that adds anchor signals to the set (the condition checking if
step.anchor.kind === AnchorKind.SIGNAL) so that only signals that are explicitly
asserted through expect.signal or success.signal are counted as covered by the
flow.

Comment on lines +174 to +190
return 'No saved flows yet — record the critical journeys (iris_record_start) so the agent learns the app.';
}
const parts = [
`${String(flowCount)} flow${flowCount === 1 ? '' : 's'}: ${String(coverage.asserted)} asserted, ${String(coverage.presenceOnly)} presence-only, ${String(coverage.assertionFree)} assertion-free`,
];
if (topRisk !== undefined) {
parts.push(`test first: ${topRisk.name} (${topRisk.reason})`);
}
if (gaps.declaredUntestedSignals.length > 0) {
parts.push(
`${String(gaps.declaredUntestedSignals.length)} declared signal(s) no flow asserts (${gaps.declaredUntestedSignals.join(', ')})`,
);
}
if (gaps.unassertedFlows.length > 0) {
parts.push(`${String(gaps.unassertedFlows.length)} flow(s) assert no consequence`);
}
return parts.join('. ') + '.';

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Extract runtime/domain strings into named constants.

Lines 174–190 introduce multiple free strings in tool-facing output. This violates the repo rule and makes wording changes harder to maintain consistently.

As per coding guidelines, “All domain/wire/UI strings must be named constants, never free strings in code.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/domain/domain-model.ts` around lines 174 - 190, Extract
all hardcoded strings in the coverage status message building code into named
constants at the module or class level. This includes the "No saved flows yet"
message, the template strings used for constructing the parts array such as
"flow"/"flows", "test first: ", "declared signal(s) no flow asserts (", and
"flow(s) assert no consequence". Replace each free string in the code with a
reference to its corresponding named constant to ensure consistency and
maintainability of domain strings throughout the codebase.

Source: Coding guidelines

Comment on lines +17 to +19
description:
'Read the app domain model BEFORE testing: every saved flow with its assertion grade, the consequence that MUST hold for it (mustHold = what it actually tests), the anchors/signals it exercises, plus GAPS — declared signals/testids that NO flow asserts (untested intent), and flows that assert no observable consequence. Use this to decide what to test and where the real risk is, instead of crawling the whole app. Reads .iris/flows/ + .iris/contract.json (no browser needed).',
inputSchema: {},

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Move tool-facing strings to named constants.

This file introduces multiple free strings in tool description and schema metadata. Please extract them to constants to satisfy the repo string policy.

As per coding guidelines, “All domain/wire/UI strings must be named constants, never free strings in code.”

Also applies to: 31-33, 57-58

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/domain/domain-tools.ts` around lines 17 - 19, Extract all
free string literals from the tool definition (including the long description
string in the description field, and any strings at the locations mentioned in
lines 31-33 and 57-58) into named constants at the top of the file, then replace
each free string with a reference to its corresponding constant. This ensures
all domain/wire/UI strings follow the repository policy of being named constants
rather than inline literals.

Source: Coding guidelines

Comment on lines +23 to +37
z.object({
name: z.string(),
steps: z.number(),
grade: z.string(),
asserts: z.boolean(),
mustHold: z
.string()
.optional()
.describe(
'The success consequence that must hold for this flow (what it actually tests).',
),
warning: z.string().optional(),
signals: z.array(z.string()),
testids: z.array(z.string()),
}),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

outputSchema.flows is missing the optional risk field returned by the handler.

The handler returns buildDomainModel(...) (Line 71), and DomainFlowSummary includes risk when run history exists. But outputSchema.flows doesn’t declare it, creating a wire-contract mismatch for schema-aware consumers.

Suggested schema addition
       flows: z.array(
         z.object({
           name: z.string(),
           steps: z.number(),
           grade: z.string(),
           asserts: z.boolean(),
@@
           warning: z.string().optional(),
           signals: z.array(z.string()),
           testids: z.array(z.string()),
+          risk: z
+            .object({
+              level: z.string(),
+              reason: z.string(),
+              lastStatus: z.string().optional(),
+            })
+            .optional(),
         }),
       ),

Also applies to: 61-72

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/domain/domain-tools.ts` around lines 23 - 37, The
outputSchema.flows object definition is missing the optional risk field that is
actually returned by the handler's buildDomainModel function call, causing a
schema contract mismatch for consumers. Add an optional risk field to the Zod
schema object within the flows definition (around the z.object block), using the
same pattern as the other optional fields like mustHold and warning, declaring
it as z.string().optional() or with an appropriate type based on what
buildDomainModel returns.

Comment on lines +68 to +255
const CLAUDE_MCP_TITLE = 'MCP server (Claude, global)';
const CURSOR_MCP_TITLE = 'MCP server (Cursor, global)';

function claudeMcpStep(input: PlanInput): Step | null {
if (!input.claudeCli) return null;
if (input.mcpExists) {
return {
title: CLAUDE_MCP_TITLE,
target: MCP_TARGET,
status: StepStatus.ALREADY,
detail: 'iris already registered (install once, used by every project)',
};
}
const cmd = claudeAddCommand(input.options.port);
return {
title: CLAUDE_MCP_TITLE,
target: MCP_TARGET,
status: StepStatus.APPLY,
detail: 'register iris globally for all projects',
exec: { command: cmd.command, args: cmd.args, fallback: cmd.display },
};
}

function cursorMcpStep(input: PlanInput): Step | null {
if (!input.cursorPresent) return null;
const r = mergeCursorConfig(input.cursorConfig, input.options.port);
if (r.status === CursorMergeStatus.ALREADY) {
return {
title: CURSOR_MCP_TITLE,
target: input.cursorConfigPath,
status: StepStatus.ALREADY,
detail: 'iris already in Cursor global config',
};
}
if (r.status === CursorMergeStatus.MANUAL) {
return {
title: CURSOR_MCP_TITLE,
target: input.cursorConfigPath,
status: StepStatus.MANUAL,
detail: `couldn't parse ${input.cursorConfigPath} — add this server by hand:\n "iris": ${JSON.stringify(cursorServerEntry(input.options.port))}`,
};
}
return {
title: CURSOR_MCP_TITLE,
target: input.cursorConfigPath,
status: StepStatus.APPLY,
detail: 'register iris in Cursor global config',
write: { path: input.cursorConfigPath, content: r.content },
};
}

/** One global registration per detected agent (Claude + Cursor). Falls back to a manual note. */
function mcpSteps(input: PlanInput): Step[] {
if (!input.options.mcp) {
return [
{
title: 'MCP server (global)',
target: MCP_TARGET,
status: StepStatus.SKIP,
detail: '--no-mcp',
},
];
}
const steps = [claudeMcpStep(input), cursorMcpStep(input)].filter((s): s is Step => s !== null);
if (steps.length > 0) return steps;
// No supported agent detected — print the one-time global instructions.
return [
{
title: 'MCP server (global)',
target: MCP_TARGET,
status: StepStatus.MANUAL,
detail: mcpManual(input.options.port),
},
];
}

function installStep(input: PlanInput): Step {
const pm = input.detection.packageManager;
const command = installCommand(pm, IRIS_PACKAGE);
if (!input.options.install) {
return {
title: 'Install dependency',
target: 'package.json',
status: StepStatus.MANUAL,
detail: command,
};
}
const parts = installCommandParts(pm, IRIS_PACKAGE);
return {
title: 'Install dependency',
target: 'package.json',
status: StepStatus.APPLY,
detail: command,
exec: { command: parts.command, args: parts.args, fallback: command },
};
}

function viteSteps(input: PlanInput): Step[] {
const cfg = input.viteConfig;
const port = input.options.port;
if (cfg === null) {
return [
{
title: 'Vite plugin',
target: 'vite.config',
status: StepStatus.MANUAL,
detail: viteManual(port),
},
];
}
const patch = patchViteConfig(cfg.source, port);
if (patch.kind === VitePatchKind.ALREADY) {
return [
{
title: 'Vite plugin',
target: cfg.path,
status: StepStatus.ALREADY,
detail: 'iris() already in plugins',
},
];
}
if (patch.kind === VitePatchKind.MANUAL) {
return [
{
title: 'Vite plugin',
target: cfg.path,
status: StepStatus.MANUAL,
detail: `${patch.reason}\n\n${viteManual(port)}`,
},
];
}
return [
{
title: 'Vite plugin',
target: cfg.path,
status: StepStatus.APPLY,
detail: 'add iris() to plugins (also injects connect())',
write: { path: cfg.path, content: patch.code },
},
];
}

function nextSteps(input: PlanInput): Step[] {
const configFile = input.nextConfigFile ?? 'next.config.mjs';
const devFile: Step = input.nextIrisDevExists
? {
title: 'IrisDev component',
target: NEXT_IRIS_DEV_PATH,
status: StepStatus.ALREADY,
detail: 'file exists',
}
: {
title: 'IrisDev component',
target: NEXT_IRIS_DEV_PATH,
status: StepStatus.APPLY,
detail: 'create dev-only connect component',
write: { path: NEXT_IRIS_DEV_PATH, content: nextIrisDevFile(input.options.port) },
};
return [
devFile,
{
title: 'Next config (withIris)',
target: configFile,
status: StepStatus.MANUAL,
detail: nextConfigManual(configFile),
},
{
title: 'Mount IrisDev',
target: 'app/layout.tsx',
status: StepStatus.MANUAL,
detail: NEXT_LAYOUT_MANUAL,
},
];
}

export function buildPlan(input: PlanInput): Plan {
const steps: Step[] = [...mcpSteps(input), installStep(input)];
if (input.detection.framework === Framework.VITE) {
steps.push(...viteSteps(input));
} else if (input.detection.framework === Framework.NEXT) {
steps.push(...nextSteps(input));
} else {
steps.push({
title: 'Connect snippet',
target: 'index.html',
status: StepStatus.MANUAL,
detail: htmlManual(input.options.port),
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Extract remaining step title/detail literals into named constants.

This file still has many inline domain/UI strings in plan steps; centralizing them as constants will align with the repo rule and reduce drift in CLI messaging.

As per coding guidelines, All domain/wire/UI strings must be named constants, never free strings in code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/init/plan.ts` around lines 68 - 255, Extract the inline
string literals for step titles and details throughout the file into named
constants at the top, following the pattern of CLAUDE_MCP_TITLE and
CURSOR_MCP_TITLE. This applies to strings in the installStep, viteSteps,
nextSteps, and buildPlan functions including titles like "Install dependency",
"Vite plugin", "IrisDev component", "Next config (withIris)", "Mount IrisDev",
"Connect snippet", and their corresponding detail messages. Replace each inline
string literal with the appropriate constant reference to ensure all domain/UI
strings are centralized and follow the repository's coding guidelines.

Source: Coding guidelines

Comment on lines +70 to +71
function gatherPlanInput(options: InitOptions, io: InitIo, pkgRaw: string): PlanInput {
const pkg: unknown = JSON.parse(pkgRaw);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle malformed package.json without crashing iris init.

Line 71 can throw during parse, and that exception is currently uncaught in the runInit path. This turns a recoverable input error into a hard CLI failure.

💡 Proposed fix
 export function runInit(options: InitOptions, io: InitIo): InitResult {
   const pkgRaw = io.readFile(PACKAGE_JSON);
   if (pkgRaw === null) {
     io.print('No package.json found. Run `iris init` from your project root.');
     return { ok: false, applied: 0, manual: 0 };
   }

-  const plan = buildPlan(gatherPlanInput(options, io, pkgRaw));
+  let plan: Plan;
+  try {
+    plan = buildPlan(gatherPlanInput(options, io, pkgRaw));
+  } catch {
+    io.print('Invalid package.json. Fix JSON syntax and re-run `iris init`.');
+    return { ok: false, applied: 0, manual: 0 };
+  }
   const failed = options.dryRun ? new Set<string>() : applyEffects(plan, io);
   return report(plan, options.dryRun, failed, io);
 }

Also applies to: 157-166

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/init/run.ts` around lines 70 - 71, The JSON.parse call in
the gatherPlanInput function at line 71 can throw an exception if the pkgRaw
string contains malformed JSON, causing the CLI to crash. Wrap the
JSON.parse(pkgRaw) call in a try-catch block to handle the parse error
gracefully, and either throw a custom error with a user-friendly message or
return an error state that the caller can handle. Similarly, check the related
code around lines 157-166 where similar JSON parsing may occur and apply the
same error handling pattern to ensure all JSON parsing in the runInit path
handles malformed input without crashing.

Comment on lines +49 to +57
export function patchViteConfig(source: string, port?: number): VitePatch {
if (source.includes(IRIS_MARKER)) {
return { kind: VitePatchKind.ALREADY };
}
if (!PLUGINS_ARRAY.test(source)) {
return { kind: VitePatchKind.MANUAL, reason: NO_PLUGINS_REASON };
}
return { kind: VitePatchKind.APPLY, code: insertImport(insertPlugin(source, port)) };
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

ALREADY detection is too broad and can skip required plugin injection.

Checking only @syrin/iris/vite presence can return ALREADY when the config has the import (or even just a comment) but no iris(...) in plugins, leaving setup incomplete.

Suggested fix
 const IRIS_MARKER = '`@syrin/iris/vite`';
+const IRIS_PLUGIN_IN_PLUGINS = /plugins\s*:\s*\[[\s\S]*?\biris\s*\(/;

 function insertImport(source: string): string {
+  if (source.includes(IRIS_MARKER)) return source;
   const matches = [...source.matchAll(IMPORT_LINE)];
   const last = matches[matches.length - 1];
   if (last?.index === undefined) {
     return `${VITE_IMPORT}\n${source}`;
   }
   const end = last.index + last[0].length;
   return `${source.slice(0, end)}\n${VITE_IMPORT}${source.slice(end)}`;
 }

 export function patchViteConfig(source: string, port?: number): VitePatch {
-  if (source.includes(IRIS_MARKER)) {
+  if (source.includes(IRIS_MARKER) && IRIS_PLUGIN_IN_PLUGINS.test(source)) {
     return { kind: VitePatchKind.ALREADY };
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/init/vite-config.ts` around lines 49 - 57, The ALREADY
detection in patchViteConfig function is too broad because checking only if the
source includes IRIS_MARKER can match occurrences in comments or imports without
verifying the iris plugin is actually in the plugins array. Replace the simple
source.includes(IRIS_MARKER) check with a more robust validation that confirms
the iris() plugin call is present in the plugins array, similar to how
PLUGINS_ARRAY.test(source) validates the plugins array structure. This ensures
ALREADY is only returned when the plugin injection is truly complete.

Comment on lines +339 to +356
handler: (deps, args) => {
const sessionId = asString(args['sessionId']);
const mode = asString(args['mode']) ?? SnapshotMode.FULL;
return commandOrThrow(deps, sessionId, IrisCommand.SNAPSHOT, {
scope: args['scope'],
mode: args['mode'] ?? SnapshotMode.FULL,
}),
mode,
}).then((raw) =>
withSizeCost(
applySnapshotDelta(
raw,
{
sessionId: sessionId ?? 'default',
scope: asString(args['scope']) ?? '',
mode,
diff: args['diff'] === true,
},
SNAPSHOT_CACHE,
),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Snapshot diff cache key can collide across sessions when sessionId is omitted.

At Line 350, sessionId ?? 'default' is used with a process-wide SNAPSHOT_CACHE. Different sessions that omit sessionId can share one cache key and get deltas computed from another session’s prior snapshot, which is both incorrect and a privacy boundary risk.

Proposed fix
     handler: (deps, args) => {
-      const sessionId = asString(args['sessionId']);
+      const requestedSessionId = asString(args['sessionId']);
+      const resolvedSessionId = deps.sessions.resolve(requestedSessionId).id;
       const mode = asString(args['mode']) ?? SnapshotMode.FULL;
-      return commandOrThrow(deps, sessionId, IrisCommand.SNAPSHOT, {
+      return commandOrThrow(deps, resolvedSessionId, IrisCommand.SNAPSHOT, {
         scope: args['scope'],
         mode,
       }).then((raw) =>
         withSizeCost(
           applySnapshotDelta(
             raw,
             {
-              sessionId: sessionId ?? 'default',
+              sessionId: resolvedSessionId,
               scope: asString(args['scope']) ?? '',
               mode,
               diff: args['diff'] === true,
             },
             SNAPSHOT_CACHE,
           ),
         ),
       );
     },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/tools/tools.ts` around lines 339 - 356, The snapshot
cache key generation in the handler function uses a fallback value of 'default'
when sessionId is not provided, causing different sessions without an explicit
sessionId to share the same cache entry in the process-wide SNAPSHOT_CACHE. This
creates privacy and correctness issues as unrelated sessions can get deltas
computed from another session's prior snapshot. Remove the fallback to 'default'
in the sessionId field passed to applySnapshotDelta and ensure each session has
a uniquely identifiable sessionId rather than silently defaulting multiple
sessions to the same cache key value.

},
// iris_capabilities (live | fromDisk) + iris_contract_save. See contract-tools.ts.
...CONTRACT_TOOLS,
...DOMAIN_TOOLS,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

tools.ts exceeds the repository’s 500-line file limit.

This file now reaches at least Line 1299. Please split tool definitions into smaller modules (for example: snapshot/query/assert/network-console groups) and compose them in TOOLS to stay within the limit.

As per coding guidelines, **/*.{ts,tsx,js,jsx} requires that no file exceeds 500 lines.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/tools/tools.ts` at line 1299, The tools.ts file exceeds
the repository's 500-line file limit at 1299 lines. Refactor by extracting
related tool definition groups (such as snapshot tools, query tools, assert
tools, and network-console tools) into separate dedicated modules. Then import
these extracted tool groups back into tools.ts and compose them together in the
DOMAIN_TOOLS spread operator to maintain the same public API while keeping each
file within the 500-line limit.

Source: Coding guidelines

@divshekhar divshekhar merged commit ad2ff44 into main Jun 18, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant