Releases: syrin-labs/iris
v0.8.0
[0.8.0] — 2026-06-20
The "developers love it" release. 0.7.0 won the agent; 0.8.0 wins the human — the dev who watches the
agent work, points at what's wrong, and trusts the green.
Added
- Human review marks: "annotate the bug where you see it" (`packages/browser`, `packages/server`,
`packages/protocol`). A dev-only "Flag a bug" button rides with the presenter: the human toggles
it, clicks the element that looks wrong, types what's wrong, and Iris drops a numbered pin and emits a
`HUMAN_MARK`. The mark carries the element's re-resolvable anchor (the same durable address a
recorded flow uses) and the source `file:line`, so the agent fixes the exact element and code
instead of guessing. The agent drains marks with the new `iris_review` tool: each pending mark comes with
a ready-to-act `fix` hint (`Open src/Checkout.tsx:42 and fix: <note>. Then iris_review { resolve: m1 }`);
reading never consumes a mark, and `resolve` retires it once fixed. Off the deterministic benchmark
path (human-driven); `pnpm bench` is unchanged.
- First-run readiness + loop intro:
`iris_wait_ready` (`packages/server`). Call it right after
init: it blocks until the app's SDK connects (returns instantly if a session already exists, so zero
latency on the happy path and on the benchmark), or times out with a `recovery` hint. Smooths the
most common first-5-minutes footgun: the agent's first real call racing the WebSocket connect. Its
ready response also carries a one-line `loop` guide (look → act → observe → assert → regress, plus
the human-flag → `iris_review` loop), so a fresh agent learns how to drive Iris on its first call
without reading docs. Pure, with an injected clock/sleep; off the benchmark path.
- Deterministic visual regression:
`iris_viewport` (`packages/server`). Pin the driven page to a
fixed viewport size (clamped to sane bounds) so a screenshot baseline is reproducible across machines.
This is the last missing piece of CI-stable visual diffing, alongside the already-shipped `iris_visual_diff`
`masks` (neutralize volatile regions) and a frozen clock (`iris_clock`). Drive-only, additive; off the
benchmark path. Provider-driven and tested via a fake page, like `iris_network_mock`.
- CDP network mock / intercept:
`iris_network_mock` (`packages/server`). On a driven page
(`iris drive`), stub a request deterministically: return a `500`, force offline (abort), or delay a
response, so "verify the app handles a failed payment" is one declared rule with no backend changes. The
matcher is pure (first rule whose url-substring + optional method matches wins → fulfill/abort/continue)
and the Playwright `page.route` wiring is driven in tests with a fake Page/Route. Needs a driven
browser; returns a `recommendation` to `iris drive` otherwise. Off the agent/benchmark path.
- `iris status` shows sessions + health at a glance (`packages/server`). The daemon exposes a
local `GET /status`; `iris status` now reports each connected tab (url, throttled, stale, pending
human marks) and the session count, not just "running: pid". This delivers the plan's
"no more pkill in a README" daemon DX. Local-only, off the agent/benchmark path.
- Actionable error recovery (
`packages/server`). Every tool error returned to the agent now carries
a `recovery` hint when the failure is recognized (the no-session footgun, multiple/unknown sessions,
a throttled tab, a missing baseline/recording, the pairing-token config), so the first 5 minutes never
dead-end on "what do I do now?". Conservative: an unrecognized error gets no invented advice.
- The panel always reflects the agent's real state:
`iris_yield` (`packages/server`,
`packages/browser`, `packages/protocol`). A human watching the browser must never see "live" when the
agent has actually stopped. The agent signals its turn boundary with `iris_yield({ mode: "waiting" })`
(done responding, will resume on your next message) or `{ mode: "ask", note }` (blocked, needs your
answer; the question shows on the panel); the session is revived automatically on the agent's next
call. Taught as the mandatory last step in the session lease, the loop guide, and the skill, and it's
agent-independent (Codex / OpenCode / Claude / Hermes). The panel renders each handback distinctly
via a `PRESENTER` `tone`: waiting = calm teal ✋, ask = amber ❓ pulse, agent crashed/disconnected =
amber ⚠ pulse, a clean end = calm green. When the last agent's MCP connection drops, the daemon ends
every session and pushes the "switch to your terminal" notice (verified end-to-end through a SIGKILL-ed
agent). Off the benchmark path.
- Don't lose a panel prompt in the death-race (
`packages/server`, `packages/protocol`). If the human
types a message into the panel at the exact moment the agent stops, it used to land in a dead inbox; now
both the agent-detach and idle paths fold any unread note into the end banner, quoted and labeled
`Undelivered (paste into your terminal): "…"`, so the words are surfaced back, not silently dropped.
- Replay a saved flow from the panel, no agent (
`packages/browser`, `packages/server`,
`packages/protocol`). The daemon pushes the saved-flow names to the HUD on connect; the human clicks
▶ on a flow and it re-runs with no agent in the loop. The page animates via the normal replay path
and the ✓ / ⚠ drift / ✗ verdict lands in the same activity log they watch the agent in. The dev plays
the regression suite directly. Off the benchmark path (a panel-driven control, not a tool).
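
The pure matcher described for `iris_network_mock` above (first rule whose URL substring and optional method match wins, otherwise continue) can be sketched roughly as follows. This is an illustration only, not Iris's actual code: the `MockRule` and `Decision` shapes, the `decide` function name, and the example URLs are all assumptions made for the sketch.

```typescript
// Hypothetical rule and decision shapes; Iris's real types may differ.
type MockAction =
  | { kind: "fulfill"; status: number; body?: string; delayMs?: number }
  | { kind: "abort" };

interface MockRule {
  urlContains: string; // substring matched against the request URL
  method?: string;     // optional HTTP method; omitted means "any method"
  action: MockAction;
}

type Decision = MockAction | { kind: "continue" };

// Pure first-match-wins matcher: no I/O, so it can be tested without a browser.
function decide(rules: readonly MockRule[], url: string, method: string): Decision {
  for (const rule of rules) {
    const urlMatches = url.includes(rule.urlContains);
    const methodMatches =
      rule.method === undefined || rule.method.toUpperCase() === method.toUpperCase();
    if (urlMatches && methodMatches) return rule.action;
  }
  // No rule matched: let the request through untouched.
  return { kind: "continue" };
}

// Example rule set: fail payment POSTs with a 500, take every other API call offline.
const rules: MockRule[] = [
  { urlContains: "/api/pay", method: "POST", action: { kind: "fulfill", status: 500 } },
  { urlContains: "/api/", action: { kind: "abort" } },
];
```

Because rule order decides the outcome, the more specific payment rule is listed before the broader `/api/` rule. Keeping the decision separate from the `page.route` wiring is what lets a fake Page/Route drive it in tests.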
Changed
- Internal cohesion split (no behavior change): `SessionManager` moved to its own
`session-manager.ts`, and the on-disk-artifact constants to `flow-constants.ts`, bringing both
parent files back under the 500-line cap. All public import paths are unchanged (re-exported).
Fixed
- Panel composer is now multi-line (`packages/browser`). The HUD message box was a single-line
`<input>` that sent on any Enter; it's a `<textarea>` now: Enter sends, Shift+Enter inserts a
newline, and it auto-grows to fit.
- Flag mode keeps the right cursors (`packages/browser`). In "Flag a bug" mode every element showed
the crosshair, including the Flag button and its popover, which are clickable; they keep the pointer
cursor now. And the hover outline that boxes the element under the cursor no longer snaps jumpily: it
waits for the cursor to rest (~130 ms), then glides into place on an ease and fades in.
v0.6.10
[0.6.10] — 2026-06-18
Added
- Deterministic waiting: the `settled` predicate (`packages/server`). A new predicate
`{ kind: "settled", quietMs }` passes once network + structural-DOM activity has been quiet for
`quietMs` (default 500 ms); ambient `dom.text`/animation churn (count-ups, spinners) is ignored so
an animated page can still settle. Usable in `iris_wait_for` and `iris_assert`, and composable inside
`allOf` with the consequence you expect. Replaces fixed sleeps, the #1 cause of flaky agent tests.
- `iris_act_and_wait` auto-settle (`packages/server`). Omit `until` and the tool waits for the page
to settle instead of requiring a predicate: "act, then wait for quiet" is now a single zero-config
call, the documented alternative to a sleep.
- `iris_query` token controls (`packages/server`): `limit` (cap returned descriptors; reports
`total` + `truncated` so a trim is never silent) and `count_only` (return just the match count).
- `iris_network` / `iris_console` token controls (`packages/server`): `limit` (keep the most
recent N matches, reporting `total` + `droppedOldest`) and a `cost:{bytes,tokens}` hint, matching the
other read tools so the agent can self-budget everywhere.
- `iris_domain` `mustHold` per flow (`packages/server`): each flow now reports the success
consequence that must hold for it (signal name / net URL), so an agent can answer "what are the
critical flows and what must hold for each?" from the domain model alone.
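
The quiet-window idea behind the `settled` predicate can be illustrated with a small tracker. This is a sketch under assumptions, not the shipped implementation: the `QuietTracker` class, the `ActivityKind` names other than `dom.text`, and the injected `now` function are hypothetical.

```typescript
// Hypothetical activity categories; only network and structural-DOM activity
// count toward "settled" in this sketch.
type ActivityKind = "network" | "dom.structure" | "dom.text" | "animation";

class QuietTracker {
  private readonly now: () => number;
  private lastActivityAt: number;

  // `now` is injected so the tracker can be tested without real timers.
  constructor(now: () => number) {
    this.now = now;
    this.lastActivityAt = now();
  }

  record(kind: ActivityKind): void {
    // Ambient churn (text count-ups, spinners) must not reset the quiet window.
    if (kind === "dom.text" || kind === "animation") return;
    this.lastActivityAt = this.now();
  }

  // True once nothing relevant has happened for at least `quietMs`.
  isSettled(quietMs: number = 500): boolean {
    return this.now() - this.lastActivityAt >= quietMs;
  }
}
```

A waiter would poll `isSettled` (or schedule a check at `lastActivityAt + quietMs`) rather than sleeping for a fixed duration, so a fast page returns early and a slow page is not cut off.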
Changed
- Self-healing now verifies the consequence before persisting (`packages/server`). `iris_flow_heal`
with `apply:true` re-replays the healed flow and re-asserts its success consequence; if a rebound
locator resolves but the flow no longer satisfies its intent, the write is refused
(`status:consequence_broken`, file untouched). It heals the locator, never the intent.
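
The verify-before-persist ordering can be sketched as below. Only the `consequence_broken` status comes from the notes above; the `Flow` shape, the `healed` status, the `HealDeps` callbacks, and the `applyHeal` name are assumptions, and the callbacks are kept synchronous for brevity where a real replay would be asynchronous.

```typescript
// Hypothetical shapes for illustration only.
interface Flow {
  name: string;
  steps: string[];
}

type HealStatus = "healed" | "consequence_broken";

interface HealDeps {
  // Re-runs the flow and reports whether its success consequence still holds.
  replayHolds: (flow: Flow) => boolean;
  // Persists the healed flow to disk.
  write: (flow: Flow) => void;
}

function applyHeal(healed: Flow, deps: HealDeps): HealStatus {
  // Re-replay and re-assert the consequence before touching the file.
  if (!deps.replayHolds(healed)) {
    return "consequence_broken"; // write refused, file untouched
  }
  deps.write(healed);
  return "healed";
}
```

The point of the ordering is that a locator which merely resolves is not enough: the write happens only after the flow is shown to still satisfy its intent.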
Fixed
- Browser observers fully restore patched globals on teardown (`packages/browser`). The network,
route, and console observers stored a bound copy and assigned it back on teardown, so `window.fetch`
/ `history.pushState` / `console.*` were never restored to their original identity. They now keep the
true original for restore and a bound copy only for invocation.
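
The identity problem is easy to reproduce: `fn.bind(obj)` returns a new function, so assigning the bound copy back on teardown leaves the global different from what it was. A minimal sketch of the fixed pattern follows, using a plain `Host` object as a stand-in for `window`; the `Host` interface, `fetchLike` property, and `observe` function are hypothetical names for this example.

```typescript
// Stand-in for a host object such as `window`; a plain object keeps the
// sketch runnable outside a browser.
interface Host {
  fetchLike: (url: string) => string;
}

// Patches host.fetchLike to report each call, and returns a teardown function.
function observe(host: Host, onCall: (url: string) => void): () => void {
  const original = host.fetchLike;    // true original, kept only for restore
  const invoke = original.bind(host); // bound copy, used only for invocation

  host.fetchLike = (url: string): string => {
    onCall(url);
    return invoke(url);
  };

  // Teardown restores the exact original reference, not the bound copy.
  return () => {
    host.fetchLike = original;
  };
}
```

After teardown, `host.fetchLike === original` holds again, which matters to any other code that compares or re-patches the same global.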
v0.5.0
[0.5.0] — 2026-06-15
Added
- `iris mcp`: smart proxy with auto-start (`packages/server`). Run `iris mcp --drive <url>` and you're
done: it starts the daemon if one isn't running, waits for it to be ready, then bridges Claude Code's stdin/stdout to the daemon's SSE endpoint. Users no longer manage the daemon manually.
- `iris mcp --drive <url>` / `iris serve --drive <url>`: pass a URL and Iris launches its own
Playwright browser at that URL, giving the agent full autonomous control without relying on the user's open browser tab.
- `iris mcp --headed` / `--headed` flag: opt in to a visible browser window so you can watch exactly what the agent is doing.
- Three new update MCP tools (`packages/server`):
  - `iris_version_info`: returns the installed version, execution kind (npx / global / local), and
    whether a newer version is available on npm.
  - `iris_apply_update`: upgrades Iris in place; requires `confirm: true` to actually run.
  - `iris_rollback`: downgrades to the previous version; requires `confirm: true`.
- Presenter mode (`packages/browser`, `packages/server`): `iris.connect({ present: true })` mounts a
dev-only HUD overlay that the agent can control; `iris_narrate` shows a caption and `iris_highlight`
draws a ring around any element. The HUD is excluded from snapshots and tree-shaken in production.
- Unified `SKILL.md` at repo root. A single skill file auto-detects the mode: setup wizard on first
run (no `.iris.json`), live-app testing on every run after. Covers Claude Code, OpenCode, Codex CLI, Cursor, Windsurf, VS Code, and Zed MCP config formats.
- `.iris.json` project config: written after first-run setup; persists `port`, `headed`,
`framework`, and `harnesses` so subsequent runs need zero questions.
- `dev:iris` script in `apps/demo`: a second Vite dev server on port 4310, isolated from the user's normal dev port.
Fixed
- All-throttled session auto-selection (`packages/server`). When every connected tab is hidden
(e.g. the user is in VS Code with Chrome on another desktop), `SessionManager.resolve()` now picks the session with the freshest heartbeat instead of throwing `"multiple sessions connected"`.
- Presenter HUD shows on bridge connect: the overlay now mounts as soon as the SDK connects to the bridge, not only after the first `iris_narrate` call.
- `iris_narrate` MCP schema validation: relaxed the output schema so the tool no longer rejects responses from narration calls.
- `iris_inspect` / `iris_clock` output schemas: relaxed to pass through extra fields instead of stripping them, fixing spurious validation errors.
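
The all-throttled fallback in `SessionManager.resolve()` might look something like the following. Only the freshest-heartbeat rule and the `"multiple sessions connected"` message come from the notes above; the `Session` fields, the `resolveSession` function, the `"no session connected"` message, and the handling of non-throttled tabs are assumptions for this sketch.

```typescript
// Hypothetical session shape; field names are assumptions.
interface Session {
  id: string;
  throttled: boolean;      // true when the tab is hidden/backgrounded
  lastHeartbeatAt: number; // epoch milliseconds of the latest heartbeat
}

function resolveSession(sessions: readonly Session[]): Session {
  if (sessions.length === 0) throw new Error("no session connected");

  const [firstActive, ...otherActive] = sessions.filter((s) => !s.throttled);
  if (firstActive !== undefined && otherActive.length === 0) return firstActive;
  if (firstActive !== undefined) throw new Error("multiple sessions connected");

  // Every tab is throttled: fall back to the freshest heartbeat instead of throwing.
  return sessions.reduce((best, s) => (s.lastHeartbeatAt > best.lastHeartbeatAt ? s : best));
}
```

Using the heartbeat as the tie-breaker favors the tab the user touched most recently, which is usually the one they expect the agent to drive.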