The full reference and cookbook. If you haven't set up Iris yet, start with Getting Started.
Contents
- How Iris helps you
- Core concepts
- The tools — full reference
- The predicate DSL — full reference
- Actions — full list
- Snapshot modes & scoping
- Cookbook: real situations
- Regression: baselines & diff
- Recording a flow
- Autonomous exploration
- Turning your test cases into agent checks
- Token discipline
- Best practices & gotchas
- FAQ
- Security & privacy
You mostly talk to your agent in plain English — "add X and verify it works." The agent uses Iris under the hood. Here's the value, by situation:
- You stop being the agent's eyes. Today you build a feature, then you click through the browser to check it. With Iris the agent checks its own work and only comes back when it's actually verified — or with a precise reason it failed.
- Silent breakage gets caught. A console error, a 500 on one locale, a button that quietly disappeared after a refactor — humans skim past these; Iris asserts on them.
- The fix loop closes. When something's wrong, Iris reports the evidence — the failing network call, the console stack, and (on React) the source file:line to edit.
- It's cheap enough to run constantly. ~100 tokens per verified interaction means the agent can verify on every edit, not just at the end (see token-efficiency).
- Your manual QA becomes automated. The checklist you never turned into Playwright tests? Your agent runs it now (see §11).
Who benefits most: anyone shipping dashboards, internal tools, SaaS apps — behavior-heavy UIs with lots of forms, lists, modals, and API calls that change often.
The loop: look → act → observe → assert.
- Look with `iris_snapshot` (what's on screen) or `iris_query` (find a specific thing).
- Act with `iris_act` (click/fill/…). It returns a `since` cursor, a timestamp marker.
- Observe with `iris_observe({ since })`: everything the app did after that action.
- Assert with `iris_assert({ predicate })`: verify it, get evidence.
Refs. Elements are addressed by stable handles like e7. You get them from snapshot
or query, then pass them to act/inspect. A ref re-resolves to its element across
re-renders; if the element is gone, you get a clear error.
Evidence, not prose. Every tool returns structured data — counts, the matching network call, the snapshot delta — so the agent reasons over facts, not a vibe.
Sessions. Each connected browser tab is a session (named via iris.connect({ session })).
With one tab open you never specify it; with several, pass sessionId.
List connected tabs. → { sessions: [{ sessionId, url, title, lastSeenMs, hidden, focused, throttled }] }.
lastSeenMs is the silence since the tab last reported (not time-since-connect); throttled is
true when the tab is hidden or has gone quiet — a throttled tab silently no-ops timers/rAF/pointer.
A semantic, accessibility-tree view of the page.
- args: `mode?: 'full' | 'interactive' | 'status'` (default `full`), `scope?` (CSS selector or ref), `diff?: boolean`, `sessionId?`.
- returns: `{ tree, status: { route, title, visibleDialogs }, nodes, truncated, cost: { bytes, tokens } }`.
- `diff: true` returns only what changed since your last snapshot of the same scope/mode: `{ mode: 'delta', delta: { added, removed, addedCount, removedCount } }` or `{ mode: 'unchanged' }` (no full tree). The first call (and any call after a route change) still returns the full tree. ~99% fewer tokens to re-look after an action; see token-efficiency.md.
- `cost` is an estimated size of the result; re-scope (`mode`/`scope`) before reading if large.
Find elements (Testing-Library semantics).
- args: `by: 'role'|'text'|'label'|'placeholder'|'testid'|'alt'`, `value`, `name?` (for role), `scope?`, `sessionId?`.
- returns: `{ elements: [{ ref, role, name, value?, states, visible, text? }] }`.
iris_query({ by: "role", value: "button", name: "Save" }) // → ref + descriptor

Deep detail on one element, including the signals a snapshot/a11y tree omits, so you can tell "present" from "actually usable / on-theme".
- args: `ref`, `sessionId?`.
- returns: descriptor + `tag` + `box` + `occluded` (another element covers its center: a z-index/overlay bug) + `styles { color, backgroundColor, opacity, cursor, display, visibility }` + `theme { colorToken, backgroundToken, offTheme, tokenCount }` (compliance vs the app's `:root` design tokens; `offTheme:true` flags an off-palette color) + `component { componentStack, source?: { file, line, column } }` (with `@syrin/iris-react`).
- Use it to catch present-but-broken UI: `opacity:0` / `box` 0×0 / `occluded:true` (invisible or unclickable), `cursor` not `pointer` (dead control), `offTheme:true` (off-design-token color).
Perform one action / several in order.
- `iris_act` args: `ref`, `action`, `args?`, `refuseWhenThrottled?`, `sessionId?`. → `{ since, dispatched, settled, settleReason, result, session, warning? }` where `result = { ok, ref, action, dispatched, settled, settleReason, effect }`. The `session` block `{ lastSeenMs, throttled, focused }` (F2) reports tab health on every act; when `throttled` is true a `warning` string is also attached. Pass `refuseWhenThrottled: true` to hard-fail instead of warning (opt-in; default is warn-only so background testing never breaks).
- `iris_act_sequence` args: `steps: [{ ref, action, args? }]`. → `{ since, dispatched, result }` where `result = { ok, count, effects: [...], steps: [...] }` (one `effect` per step; each step carries its own `dispatched`/`settled`/`settleReason`).
- See §5 for the action list.
Dispatch vs settle (F1). The action is two phases: the dispatch (the synchronous click/fill —
this is what can fail) and the settle (waiting one animation frame so React's commit lands before
we return). The settle is bounded (~200ms): in a throttled/background tab requestAnimationFrame
never fires, so Iris falls back to a timer and resolves anyway. A settle timeout is therefore never an
error — iris_act resolves with settled:false, settleReason:"timeout" and the dispatch (the click)
has still landed. Only a real dispatch failure (stale ref, wrong element type) throws.
| top-level field | meaning |
|---|---|
| `dispatched` | the action dispatched without throwing (mirror of `effect.dispatched`) |
| `settled` | a real animation frame flushed within the budget; `false` = the fallback timer fired |
| `settleReason` | `"timeout"` when the fallback fired (throttled tab), else `null` |
result.effect — best-effort evidence the action landed. All probes are cheap and capture
only the immediate effect (one microtask + one rAF after dispatch); async, network-driven
re-renders show up in iris_observe, not here.
| field | meaning |
|---|---|
| `dispatched` | always `true` (if we couldn't dispatch, the tool throws instead) |
| `targetMatched` | the ref still resolved to a connected element |
| `visible` | element was visible at the start of the action |
| `enabled` | element was not `disabled` / `aria-disabled` at the start |
| `defaultPrevented` | a handler called `preventDefault()` on the primary cancelable event. Only meaningful for click/dblclick/hover/fill/type/clear/press/upload/drag; always `false` for non-cancelable events (focus/blur/select/check/uncheck/submit/scrollIntoView) |
| `focusMoved` | `"<prevRef>-><newRef>"` if `document.activeElement` changed, else `null` (body counts as `null`) |
| `valueChanged` | fill/type/clear only: input value before !== after; otherwise `false` |
| `domMutatedWithin` | count of `MutationObserver` records seen in the window |
| `occluded` | click/dblclick only: the click point hit-tested to a foreign element (an overlay is on top). Synthetic dispatch still delivered the event, but a real user could not click it; treat the target as visually blocked. `false` when not click-like or not hit-testable |
| `occludedBy` | the ref of the element actually on top at the click point when occluded, else `null` |
| `scrolledIntoView` | click/dblclick only: the target was off-viewport, so Iris scrolled it into view before dispatch |
Use it to distinguish failure modes: visible:false/enabled:false/targetMatched:false →
your action missed; the tool throwing → it never dispatched; occluded:true → the control is
covered by something (a real user is blocked even though the synthetic event landed);
defaultPrevented:true or all of valueChanged:false/focusMoved:null/domMutatedWithin:0 →
the app didn't react.
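The failure modes above can be folded into a small classifier. This is a minimal sketch, not part of Iris: `ActEffect` is a hypothetical type holding only the `result.effect` fields named in the table, `classifyEffect` is an illustrative helper, and the order in which the checks run is an assumption.

```typescript
// Hypothetical subset of the `result.effect` fields described above.
interface ActEffect {
  targetMatched: boolean;
  visible: boolean;
  enabled: boolean;
  defaultPrevented: boolean;
  focusMoved: string | null;
  valueChanged: boolean;
  domMutatedWithin: number;
  occluded: boolean;
}

type ActOutcome = "missed" | "blocked" | "no-reaction" | "landed";

// Maps an effect to one failure mode. Check order is an assumption:
// a missed target is reported before an overlay, an overlay before "no reaction".
function classifyEffect(e: ActEffect): ActOutcome {
  if (!e.targetMatched || !e.visible || !e.enabled) return "missed";
  if (e.occluded) return "blocked";
  const inert = !e.valueChanged && e.focusMoved === null && e.domMutatedWithin === 0;
  if (e.defaultPrevented || inert) return "no-reaction";
  return "landed";
}
```

A thrown error from the tool itself (the action never dispatched) stays outside this helper, since no `effect` exists in that case.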
Clicks run the code, they don't push pixels. A click/dblclick fires the full
pointerdown → mousedown → focus → pointerup → mouseup → click sequence directly on the resolved
element — so pointer- and focus-gated handlers fire the way they do for a real user, with no
coordinate gesture to be intercepted by the presenter HUD or missed off-screen. This is the default
even when native CDP real input is configured (inputMode:"synthetic",
inputModeReason:"synthetic-click-preferred"). Before dispatch Iris hit-tests the click point
(occluded) and scrolls an off-screen target in (scrolledIntoView), so a blocked or off-viewport
target is reported, never silently "successful". For the rare case that needs a trusted native
click — a native file picker, clipboard, or an isTrusted-gated handler — pass args:{ native:true }
to drive it through CDP. hover/drag still use native pointer input (they need real hit-testing).
Cookbook — "Did my action even land?"
const { result } = iris_act({ ref: saveBtn, action: 'click' });
if (result.effect.defaultPrevented) {
// a handler blocked the default — the click was swallowed
} else if (result.effect.domMutatedWithin === 0) {
// dispatched cleanly but the app rendered nothing — likely a dead control
}

The timeline + summary of what happened.
- args: `window_ms?` (default 2000) or `since?` (cursor from an act), `filters?` (event-type names), `max_events?` (cap the timeline to the most recent N), `sessionId?`.
- returns: `{ window_ms, events: [...], summary: { network, domAdded, domRemoved, routeChanges, consoleErrors, animations, signals }, cost: { events, bytes, droppedOldest? } }`.
- Output budget. Every result carries a `cost:{ events, bytes }` hint so you can self-budget your next call. When `max_events` truncates the timeline, the dropped count is surfaced as `cost.droppedOldest`, never a silent cap. (The presenter HUD's own animations are filtered out of the timeline automatically, so `observe` shows the app, not the instrument.)
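To make the `max_events` / `cost.droppedOldest` contract concrete, here is a small sketch of a cap that keeps the most recent N events and reports what it dropped. It is illustrative only: `TimelineEvent`, `capTimeline`, and the byte estimate (JSON string length) are assumptions, not Iris internals.

```typescript
// Hypothetical event shape; real Iris events carry more fields.
interface TimelineEvent { t: number; type: string }

interface CappedTimeline {
  events: TimelineEvent[];
  cost: { events: number; bytes: number; droppedOldest?: number };
}

// Keeps the most recent `maxEvents` entries and surfaces the dropped count.
// `bytes` is approximated by the JSON text length (an assumption).
function capTimeline(events: TimelineEvent[], maxEvents?: number): CappedTimeline {
  const kept =
    maxEvents !== undefined && events.length > maxEvents
      ? events.slice(events.length - maxEvents)
      : events;
  const dropped = events.length - kept.length;
  const cost: CappedTimeline["cost"] = {
    events: kept.length,
    bytes: JSON.stringify(kept).length,
  };
  if (dropped > 0) cost.droppedOldest = dropped; // only present when truncated
  return { events: kept, cost };
}
```

A caller can then check `cost.droppedOldest` before trusting that the timeline is complete.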
Act, then wait for a predicate — the whole act→observe→assert loop in one hop.
- args: `ref`, `action`, `args?`, `until: <predicate>`, `timeout_ms?` (default 4000; 0 = evaluate once), `refuseWhenThrottled?`, `sessionId?`.
- returns: `{ effect, verdict, trace, session, warning? }`. `effect` is the action result (`{ ok, ref, action }`), `verdict` is `{ pass, evidence?, failureReason? }`, `trace` is the reaction report of everything the app did after the action, and `session` (F2) is the tab-health block `{ lastSeenMs, throttled, focused }` (with a `warning` when throttled). A failing `verdict` still returns `effect` + `trace` so you can see what did happen. The predicate is automatically floored at this act's cursor, so it only matches events the action actually caused.
Block until a predicate holds (or time out). Looks both backward (recent buffer) and forward.
- args: `predicate`, `timeout_ms?` (default 4000), `since?`, `sessionId?`.
- No stale-signal false passes. By default the evaluation window is floored at your last act's cursor, so a signal/network/console/animation event buffered before the action can never satisfy the predicate (the report's "validation 68 == 68 was a lie" footgun). Pass an explicit `since` (an act/observe cursor) to widen or narrow the window deliberately. Element/text predicates query the live DOM and are unaffected by `since`.
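The flooring rule can be pictured as a filter over the event buffer. The sketch below is an illustration under stated assumptions: `BufferedEvent` and `eventsInWindow` are hypothetical names, cursors are treated as plain numbers, and the lower bound is assumed to be inclusive.

```typescript
// Hypothetical buffered event; `t` is the event's cursor/timestamp.
interface BufferedEvent {
  t: number;
  kind: "net" | "console" | "signal" | "animation";
  name: string;
}

// An explicit `since` wins; otherwise the last act's cursor is the floor.
// Whether the bound is inclusive is an assumption of this sketch.
function eventsInWindow(
  buffer: BufferedEvent[],
  lastActCursor: number,
  since?: number,
): BufferedEvent[] {
  const floor = since ?? lastActCursor;
  return buffer.filter((event) => event.t >= floor);
}
```

With the default floor, an event buffered before the act is simply not a candidate, which is what prevents a stale signal from satisfying a predicate.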
Verify a predicate; optionally wait for it.
- args: `predicate`, `timeout_ms?` (0 = evaluate once), `since?`, `sessionId?`.
- Same `since` default as `iris_wait_for`: scoped to your last act so a stale buffered event can't fake a pass; override with an explicit `since`.
- returns: `{ pass, evidence, failureReason?, session, warning? }`. On failure includes a near-miss (e.g. "found the dialog but not visible", or "no button named 'Submit'; saw: Cancel"). The `session` block `{ lastSeenMs, throttled, focused }` (F2) reports tab health on every assert; when throttled a `warning` is attached so you never assert against a tab that is silently no-oping.
Fast targeted lookups without a full timeline.
- `iris_network({ since?, method?, urlContains?, status? })` → `{ calls }`
- `iris_console({ level?, since? })` → `{ logs }`
- `iris_animations()` → running/recent animations.
The app-advertised testable surface (registered via iris.describe). Call this first to learn
what to assert on without reading source.
`iris_capabilities({ sessionId? })` → `{ testids, signals, stores, flows }`
iris_sessions also surfaces a hasCapabilities flag per session so you know when it's worth
calling. Returns empty arrays (never errors) if the app advertised nothing.
Read the app's domain model before testing: a synthesis of every saved flow + the registered
capabilities. Tells you what to test and where the real risk is without crawling the app. Reads
.iris/flows/ + .iris/contract.json — no browser needed.
- `iris_domain({})` → `{ flowCount, flows: [{ name, steps, grade, asserts, signals, testids, warning?, risk? }], declared: { testids, signals, stores }, coverage: { asserted, presenceOnly, assertionFree }, gaps: { unassertedFlows, declaredUntestedSignals, declaredUntestedTestids }, riskRanked, summary }`
- `gaps` is the point: `declaredUntestedSignals` are intents the app emits that no flow asserts (untested behavior); `unassertedFlows` act but verify no consequence. Close them with a flow + a consequence assertion (`iris_annotate`).
- `riskRanked` orders flow names worst-first by combining run history (`.iris/project.json`: recently failed/drifted, or passed-with-errors) with assertion quality (a green assertion-free flow is still risky). Test these first. Each flow's `risk` carries `{ level, reason, lastStatus? }`.
Read live framework/store state directly instead of inferring it from the DOM — §17.
`iris_state({ store?, ref?, path?, depth?, sessionId? })` → `{ stores, component? }`, or `{ store, path, found, value, availableKeys?, storeNames }` when `path`/`depth` is given.
Store reads are the reliable path. The ref component read is best-effort and bounded: when the
component state can't be read it returns component: { ok: false, reason: "component-state-unavailable" }
rather than hanging.
Scope big stores so you don't pay for them. A whole store can be tens of KB. Narrow the read:
- `path` extracts a dot-path sub-tree relative to the named `store` (numeric segments index arrays), e.g. `iris_state({ store:"workspace", path:"captionCache.v3.0.text" })`.
- `depth` collapses anything deeper than N levels to a compact size marker (`{…7 keys}`, `[Array(120)]`) so you can skim a store's shape before drilling in.
- A wrong `path` returns `{ found:false, availableKeys:[...] }`, the keys that were present where the walk stopped, so a mistyped path is self-correcting, not a bare `null`.
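The dot-path behaviour described above can be sketched as a plain object walk. This is an illustration of the idea, not Iris's implementation: `readPath` and `ReadResult` are hypothetical names, and the result carries only the `found`, `value`, and `availableKeys` fields.

```typescript
type ReadResult =
  | { found: true; value: unknown }
  | { found: false; availableKeys: string[] };

// Walks `path` segment by segment; numeric segments index arrays because
// array indices are ordinary property keys. On a miss, reports the keys
// that were present at the point where the walk stopped.
function readPath(store: unknown, path: string): ReadResult {
  let node: unknown = store;
  for (const segment of path.split(".")) {
    if (node === null || typeof node !== "object") {
      return { found: false, availableKeys: [] }; // hit a primitive too early
    }
    const container = node as Record<string, unknown>;
    if (!(segment in container)) {
      return { found: false, availableKeys: Object.keys(container) };
    }
    node = container[segment];
  }
  return { found: true, value: node };
}
```

For a store shaped like `{ captionCache: { v3: [{ text: "hi" }] } }`, the path `captionCache.v3.0.text` resolves to the string, while `captionCache.v4` misses and lists `v3` as the available key.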
A page can be thrashing — committing many React renders a second — while the DOM stays visually
identical. The DOM/screenshot tools see an idle page; only a tool inside the runtime sees the commit
rate. Iris exposes it as a registered store you read with iris_state:
// app entry — MUST run before react-dom loads, so import it FIRST (React reads the devtools hook
// at renderer-inject time). It augments a real React DevTools hook if present; host-safe (no-ops on
// any failure, never breaks the app).
import { installRenderMeter } from '@syrin/iris';
installRenderMeter();

iris_state({ store: "__iris_renders", path: "commits" }) // → total React commits (monotonic)
// read it, do an action (or wait a window), read again → the delta is the commit count for that span.

A render storm shows up as a commit count that climbs with no corresponding DOM mutation: a perf regression invisible to any outside-the-page tool.
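One way to turn two commit readings into a verdict is sketched below. Everything here is an assumption for illustration: `looksLikeRenderStorm` is a hypothetical helper, the 30 commits-per-second threshold is arbitrary, and the DOM mutation count is assumed to come from whatever you already observe for the same span.

```typescript
// Flags a span where React committed often while the DOM did not change.
// The default threshold is an illustrative guess, not an Iris default.
function looksLikeRenderStorm(
  commitsBefore: number,
  commitsAfter: number,
  domMutations: number,
  spanMs: number,
  maxCommitsPerSecond = 30,
): boolean {
  if (spanMs <= 0) return false; // no meaningful rate for an empty span
  const commitsPerSecond = (commitsAfter - commitsBefore) / (spanMs / 1000);
  return commitsPerSecond > maxCommitsPerSecond && domMutations === 0;
}
```

Tune the threshold to your app; a page with legitimate high-frequency updates will usually show DOM mutations alongside the commits and so will not be flagged.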
Show the agent's intent on the page, and control time (toasts/debounces/auto-dismiss) — §16.
Regression detection — §8.
Capture a flow's reaction report and compile it into a replayable program — §9.
iris_record_stop also returns a cost:{ events, bytes } hint alongside the reaction report so you
can gauge the recording's size.
List interactive elements + console-error count for autonomous exploration — §10.
iris_contract_save, iris_flow_save / iris_flow_save_recorded / iris_flow_list /
iris_flow_load / iris_flow_replay / iris_flow_verify, iris_flow_heal, iris_annotate —
record once, replay forever (anchored on testid/signal — or an auto-derived component/source anchor
when there's no testid), with legible drift + self-heal. Full guide:
Flows, the recorder & self-healing.
- `iris_flow_verify({ names?, sessionId? })`: the regression-suite call. It replays EVERY saved flow (or a subset) deterministically and returns one verdict `{ status, passed, failed, failures: [{ flow, verdict, whatChanged, whereInSource, nextAction }] }`. Passing flows are counted; only failures carry detail. Run it after any change: one call, no LLM per flow.
- Decision envelope: on a drift/fail, `iris_flow_replay` (and each `iris_flow_verify` failure) returns the actionable fix: `whatChanged`, `whereInSource` (file:line), and a one-line `nextAction` (e.g. "rebind the anchor to 'new-deploy', or update the flow if intended").
iris_end_session, iris_resume, iris_messages — the human can pause the agent, send it a
correction, or end the session from the floating panel; the agent receives guidance on its next
tool call. Full guide: Human-in-the-loop control.
The dev clicks "Flag a bug" in the running app, points at the element that looks wrong, and types what's wrong (⌘/Ctrl+Enter to send). Each flag becomes a mark the agent drains:
iris_review({ sessionId })
→ { marks: [{ id: "m1", note: "this button is misaligned", label: "button \"Pay\"",
source: { file: "src/Checkout.tsx", line: 42 },
fix: "Open src/Checkout.tsx:42 and fix: this button is misaligned. Then iris_review { resolve: \"m1\" }" }],
pendingCount: 1 }
Each pending mark carries the human note, the element label, the source file:line (when the
framework stamped one), and a ready-to-act fix hint. Open the file, apply the fix, then
iris_review({ resolve: "m1" }) — the human watching the panel sees "✓ fixed: …" land. Reading
never consumes a mark, so you can list → fix → verify → resolve. iris_sessions also reports
pendingMarks so you notice flagged bugs during normal orientation.
On a page Iris drives (iris drive), make a request return a 500, force it offline, or delay it — so
testing error/edge states is one declared rule, no backend changes:
iris_network_mock({ mocks: [{ urlContains: "/api/pay", method: "POST", status: 500 }] })
→ { applied: true, count: 1 } // now the checkout POST returns 500 — verify the failure UI
iris_network_mock({ mocks: [{ urlContains: "/api/feed", abort: true }] }) // simulate offline
iris_network_mock({ clear: true }) // turn mocking off
First matching rule wins (urlContains + optional case-insensitive method). Needs a driven browser;
without one it returns a recommendation pointing at iris drive.
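The "first matching rule wins" behaviour can be pictured as a linear search. The sketch below is illustrative: `MockRule`, `MockRequest`, and `firstMatchingRule` are hypothetical names that mirror only the rule fields shown in the examples above.

```typescript
// Mirrors the rule fields used in the examples above.
interface MockRule {
  urlContains: string;
  method?: string;
  status?: number;
  abort?: boolean;
}

interface MockRequest { url: string; method: string }

// Returns the first rule whose `urlContains` is a substring of the URL and
// whose optional `method` matches case-insensitively; later rules are ignored.
function firstMatchingRule(rules: MockRule[], req: MockRequest): MockRule | undefined {
  return rules.find(
    (rule) =>
      req.url.includes(rule.urlContains) &&
      (rule.method === undefined ||
        rule.method.toLowerCase() === req.method.toLowerCase()),
  );
}
```

Because order decides the winner, list the most specific rules first and any broad catch-all (for example a bare `/api`) last.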
Pin the driven page to a fixed viewport so a screenshot baseline is reproducible across machines:
iris_viewport({ width: 1280, height: 800 }) // set once, before iris_screenshot / iris_visual_diff
→ { applied: true, width: 1280, height: 800 }
This is one of three knobs for CI-stable visual regression — set them together:
- `iris_viewport({ width, height })`: same dimensions on every machine.
- `iris_clock({ freeze: true })`: kill animation/time jitter so the pixels are stable.
- `iris_visual_diff({ baseline, masks: [{ x, y, width, height }] })`: neutralize volatile regions (clocks, avatars, ids) so only real changes fail.
A predicate declares what should be true. iris_assert / iris_wait_for evaluate it
against the live DOM + the event buffer.
// An element exists / is in a state
{ "kind": "element", "query": { "role": "dialog", "name": "Confirm" }, "state": "visible" }
// query supports: role, name, text, label, placeholder, testid, alt, scope
// state: visible | hidden | enabled | disabled | checked | expanded | focused | present
// add "absent": true to assert it is NOT there (regression / removal)
// Visible text anywhere (optionally scoped via an element query instead)
{ "kind": "text", "contains": "Saved successfully", "visible": true }
// A network call happened
{ "kind": "net", "method": "POST", "urlContains": "/api/order", "status": 200, "since": 1820 }
// Navigation
{ "kind": "route", "pathname": "/success" } // or: "contains": "/success"
// Console / errors
{ "kind": "console", "level": "error", "absent": true } // "no errors during this flow"
// Animation
{ "kind": "animation", "name": "dialog-in", "completed": true }
// An app-emitted signal (webhook/websocket/store change you surfaced via iris.signal)
{ "kind": "signal", "name": "webhook:received", "dataMatches": { "provider": "stripe", "id": "*" } }
// A registered store's VALUE — the source of truth no DOM/network read can reach. Walks a dot-path
// (numeric array indices) and matches `equals`: a literal, omitted = presence, or a
// { $gte | $lte | $gt | $lt | $contains | $length } operator pattern. Catches a UI-vs-store desync
// (a deploy that only LOOKS shipped) deterministically, in one call — no LLM, no DOM scraping.
{ "kind": "state", "store": "app", "path": "deployments.0.status", "equals": "live" }

A state assertion is graded as a consequence (a wrong element or stale render cannot fake it),
and is usable the same three ways anywhere predicates flow: ad-hoc (iris_assert / iris_act_and_wait
until), as a flow step invariant (iris_annotate { kind: "assert-state", statePath, store?, equals? }),
and as a flow's golden end-condition (iris_annotate { kind: "success-state", statePath, … }). On a
miss it names the real store value and the keys that were available — legible, not a blind fail.
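The `equals` forms (literal, omitted, operator pattern) can be sketched as one matcher. The operator names come from the text above, but their exact semantics here are assumptions, and `matchesEquals` / `OperatorPattern` are hypothetical names rather than Iris exports.

```typescript
// Operator names are from the text; their semantics below are assumptions.
type OperatorPattern = {
  $gte?: number; $lte?: number; $gt?: number; $lt?: number;
  $contains?: unknown; $length?: number;
};

function isOperatorPattern(x: unknown): x is OperatorPattern {
  if (typeof x !== "object" || x === null || Array.isArray(x)) return false;
  const keys = Object.keys(x);
  return keys.length > 0 && keys.every((k) => k.startsWith("$"));
}

function matchesEquals(actual: unknown, equals?: unknown): boolean {
  if (equals === undefined) return actual !== undefined; // omitted = presence
  if (!isOperatorPattern(equals)) {
    // Literal: compared by JSON text, so object key order matters in this sketch.
    return JSON.stringify(actual) === JSON.stringify(equals);
  }
  const n = typeof actual === "number" ? actual : NaN; // NaN fails every comparison
  if (equals.$gte !== undefined && !(n >= equals.$gte)) return false;
  if (equals.$lte !== undefined && !(n <= equals.$lte)) return false;
  if (equals.$gt !== undefined && !(n > equals.$gt)) return false;
  if (equals.$lt !== undefined && !(n < equals.$lt)) return false;
  if (equals.$contains !== undefined) {
    const needle = equals.$contains;
    const hit = typeof actual === "string"
      ? typeof needle === "string" && actual.includes(needle)
      : Array.isArray(actual) && actual.includes(needle);
    if (!hit) return false;
  }
  if (equals.$length !== undefined) {
    const len = typeof actual === "string" || Array.isArray(actual) ? actual.length : -1;
    if (len !== equals.$length) return false;
  }
  return true;
}
```

Here `$contains` is read as substring or array membership and `$length` as an exact length; check the Iris reference for the authoritative definitions.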
{ "allOf": [ <predicate>, <predicate>, … ] } // every one must hold
{ "anyOf": [ <predicate>, … ] } // at least one
{ "not": <predicate> }

- `timeout_ms` (on `assert`/`wait_for`): wait up to N ms for it to become true.
- `since` (on `net`/`console` leaves): only consider events after this cursor (from `act`).
dataMatches uses shallow JSON matching; * means "present, any value".
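As a rough picture of shallow matching with the `*` wildcard, consider the sketch below. It is not Iris's matcher: `dataMatches` here is a hypothetical standalone function, and comparing values by their JSON text is an assumption.

```typescript
// Every key in `pattern` must be present in `data`. "*" accepts any value;
// other values are compared by JSON text (an assumption of this sketch).
// Keys in `data` that the pattern does not mention are ignored.
function dataMatches(
  pattern: Record<string, unknown>,
  data: Record<string, unknown>,
): boolean {
  return Object.entries(pattern).every(([key, expected]) => {
    if (!(key in data)) return false; // key must be present
    if (expected === "*") return true; // wildcard: present, any value
    return JSON.stringify(data[key]) === JSON.stringify(expected);
  });
}
```

So a pattern `{ provider: "stripe", id: "*" }` accepts any payload from Stripe that carries an `id`, whatever other fields it has.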
iris_act({ ref, action, args }):
| action | args | notes |
|---|---|---|
| `click` / `dblclick` | — | dispatches a real click |
| `hover` | — | mouseover+mouseenter (triggers JS hover state) |
| `focus` / `blur` | — | |
| `fill` | `{ value }` | sets value via React-safe native setter + input/change |
| `type` | `{ text }` | appends to current value |
| `clear` | — | empties an input |
| `select` | `{ value }` | `<select>` option |
| `check` / `uncheck` | — | checkbox/radio |
| `submit` | — | submits the element's `<form>` |
| `press` | `{ key }` | keydown/up (default Enter) |
| `scrollIntoView` | — | |
| `upload` | `{ name, content?, type? }` | sets a file on `<input type=file>` |
| `drag` | `{ toRef }` | pointer-based drag (dnd-kit / rbd) + HTML5 DnD |
| `webmcp` | `{ tool, params }` | calls a `navigator.modelContext` tool if the site exposes one |
iris_snapshot has three modes — pick the cheapest that answers your question:
- `status` (~30 tokens): route, visible dialogs, counters. "Where am I, is a modal open?"
- `interactive` (~100 tokens): only actionable elements (buttons, inputs, tabs…). "What can I click?" Non-interactive content (e.g. 1,000 list rows) is skipped.
- `full`: the whole semantic tree. Use only when you truly need everything.
scope narrows any snapshot or query to a subtree — a CSS selector
(scope: "[data-testid=item-list]") or a ref. This is the main lever for keeping payloads
small and queries unambiguous on big pages.
Each is phrased as the situation you're in, then how the agent verifies it.
const { since } = iris_act({ ref: iconBtn, action: "click" })
iris_assert({ timeout_ms: 2000, predicate: { allOf: [
{ kind: "element", query: { role: "dialog" }, state: "visible" },
{ kind: "console", level: "error", absent: true }
]}})

const { since } = iris_act({ ref: saveBtn, action: "click" })
iris_assert({ timeout_ms: 3000, predicate: { allOf: [
{ kind: "net", method: "PUT", urlContains: "/api/profile", status: 200, since },
{ kind: "text", contains: "Saved", visible: true }
]}})

Act in section A, navigate to B, assert there:
iris_act({ ref: notifyBtn, action: "click" }) // in "Items"
iris_act({ ref: notificationsTab, action: "click" }) // go to "Notifications"
iris_assert({ timeout_ms: 2000,
predicate: { kind: "text", contains: "New item queued", visible: true } })

const { since } = iris_act({ ref: addBtn, action: "click" })
iris_assert({ predicate: { kind: "net", urlContains: "/api/items", status: 202, since } }) // accepted
iris_assert({ predicate: { kind: "element",
query: { text: name, scope: "[data-testid=item-list]" }, absent: true } }) // not yet
// …later: click your Refresh button, then wait for it…
iris_act({ ref: refreshBtn, action: "click" })
iris_wait_for({ timeout_ms: 5000, predicate: { kind: "element",
query: { text: name, scope: "[data-testid=item-list]" }, state: "visible" } })

Don't scroll and eyeball; query finds it regardless of position:
iris_assert({ timeout_ms: 3000, predicate: { kind: "element",
query: { text: "Invoice #4821", scope: "[data-testid=item-list]" }, state: "visible" } })

Note: if your list is virtualized (react-window/virtuoso), off-screen rows aren't in the DOM yet. Scroll-to-find support is on the roadmap; for now scroll the container or assert against the data via an `iris.signal`.
iris_act({ ref: emailRef, action: "fill", args: { value: "admin@acme.com" } })
iris_act({ ref: pwRef, action: "fill", args: { value: "•••••••" } })
const { since } = iris_act({ ref: submitRef, action: "click" })
iris_assert({ timeout_ms: 3000, predicate: { allOf: [
{ kind: "net", method: "POST", urlContains: "/api/login", status: 200, since },
{ kind: "element", query: { role: "heading", name: "Dashboard" }, state: "visible" }
]}})
// And the failure path:
iris_assert({ predicate: { allOf: [
{ kind: "net", urlContains: "/api/login", status: 401 },
{ kind: "element", query: { role: "alert" }, state: "visible" }
]}})

iris_assert({ predicate: { kind: "console", level: "error", absent: true } })

const { since } = iris_act({ ref: generateBtn, action: "click" })
iris_assert({ timeout_ms: 15000, predicate: { allOf: [
{ kind: "net", method: "POST", urlContains: "/api/generate", status: 200, since },
{ kind: "element", query: { testid: "script-output" }, state: "visible" }
]}})

iris_act({ ref: fileInput, action: "upload", args: { name: "pitch.mp4", type: "video/mp4" } })
const { since } = iris_act({ ref: analyzeBtn, action: "click" })
iris_assert({ timeout_ms: 15000, predicate: { allOf: [
{ kind: "net", method: "POST", urlContains: "/api/score", status: 200, since },
{ kind: "element", query: { role: "dialog", name: "Score result" }, state: "visible" },
{ kind: "text", contains: "/ 100", visible: true }
]}})

const before = iris_inspect({ ref }).styles.backgroundColor
iris_act({ ref, action: "hover" })
const after = iris_inspect({ ref }).styles.backgroundColor
// assert before !== after

Pure CSS `:hover` styling needs a real pointer; drive hover effects from JS state (or use a Playwright real-hover) if you need pixel-exact `:hover`. Iris reads computed style after the JS state change.
Surface it from your app, then assert on it:
// in your app
iris.signal('webhook:received', { provider: 'stripe', event: 'payment_intent.succeeded' });
iris.state('cart', { items: 3 });
// Advertise your testable surface at init so the agent learns it without reading source.
// Call this once at module load (before connect); it merges idempotently across HMR reloads.
iris.describe({
testids: ['cart-badge', 'toast'],
signals: ['webhook:received'],
stores: ['cart'],
flows: [{ name: 'checkout', steps: ['fill address', 'pay', 'see confirmation'] }],
});

The agent reads this back with `iris_capabilities()`; see §3.
iris_assert({ timeout_ms: 30000, predicate: {
kind: "signal", name: "webhook:received", dataMatches: { provider: "stripe" } } })

Signals only help if you actually emit one whenever user-visible state changes. The
@syrin/iris-eslint-plugin package ships one rule, iris/require-signal-on-mutation, that flags any
function which calls a configured store mutator but never fires the signal callee in
the same body — so the signal map can't silently fall behind the store.
// eslint.config.mjs
import iris from '@syrin/iris-eslint-plugin';
export default [
{
plugins: { iris },
rules: {
'iris/require-signal-on-mutation': [
'error',
{ mutators: ['set', 'reorderSections', 'addSection'], signalCallee: 'irisSignal' },
],
},
},
];

`mutators` lists the callee names that change state; `signalCallee` (default
['irisSignal', 'signal']) is the name that counts as firing a signal. See
packages/eslint-plugin/README.md for scoping and
matching details.
The "did anything silently break/disappear?" workflow.
// after you've confirmed a screen is good:
iris_baseline_save({ name: "checkout-ok" })
// later, after a change:
iris_diff({ baseline: "checkout-ok" })
// → { removed: ["- button \"Export\""], added: ["- alert \"Card declined\""],
//   consoleErrors: 2, routeChanged: false }

diff ignores volatile ref ids and compares the semantic structure, so you get real
ADDED/REMOVED elements plus the current console-error count. Great as a guardrail the agent
runs after each edit: "diff against checkout-ok; fail if anything interactive was removed
or console errors increased."
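The guardrail quoted above could be expressed as a small check over the diff result. This is a sketch under loud assumptions: `DiffResult` mirrors only the fields in the example output, the list of interactive roles is a guess, and reading the role as the first word of an entry such as `- button "Export"` is inferred from that one example.

```typescript
// Mirrors the fields in the example iris_diff output above.
interface DiffResult {
  removed: string[];
  added: string[];
  consoleErrors: number;
  routeChanged: boolean;
}

// Assumed set of roles that count as "interactive" for this guardrail.
const INTERACTIVE_ROLES = [
  "button", "link", "textbox", "checkbox", "radio", "combobox", "tab", "menuitem",
];

// Entries look like `- button "Export"`; the role is taken to be the first word.
function roleOf(entry: string): string {
  return entry.replace(/^-\s*/, "").split(/\s+/)[0] ?? "";
}

function guardrail(
  diff: DiffResult,
  baselineConsoleErrors: number,
): { ok: boolean; reasons: string[] } {
  const reasons: string[] = [];
  for (const entry of diff.removed) {
    if (INTERACTIVE_ROLES.includes(roleOf(entry))) {
      reasons.push(`interactive element removed: ${entry}`);
    }
  }
  if (diff.consoleErrors > baselineConsoleErrors) {
    reasons.push(
      `console errors rose from ${baselineConsoleErrors} to ${diff.consoleErrors}`,
    );
  }
  return { ok: reasons.length === 0, reasons };
}
```

The baseline console-error count has to be remembered by the caller, since the diff result in the example reports only the current count.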
The semantic iris_diff above never flakes. For an actual pixel diff (iris_screenshot +
iris_visual_diff, driven mode), three knobs make it CI-stable instead of flaky:
iris_viewport({ width: 1280, height: 800 }) // 1. same size on every machine
iris_clock({ freeze: true }) // 2. no animation/time jitter
iris_screenshot({ name: "checkout-ok" }) // capture the baseline
// …later, after a change, at the same viewport + frozen clock:
iris_visual_diff({ baseline: "checkout-ok", masks: [{ x: 0, y: 0, width: 200, height: 24 }] })
// → { matched: false, changedPixels, ratio, region, diffPath } // 3. masks ignore volatile regions

Without all three, a pixel diff fails on a different window size, a mid-animation frame, or a live clock/avatar: the classic reasons teams give up on screenshot tests. With them, only a real visual change fails.
Capture everything that happens across a span — useful for "run my whole checkout flow and tell me what happened," or to keep a known-good trace.
iris_record_start({ recordingName: "checkout" })
// …agent performs the flow (iris_act / iris_act_sequence)…
iris_record_stop({ recordingName: "checkout" })
// → {
// recordingName,
// program: { version, steps: [{ tool, args: { by:"testid", value, action, args }, stable }] },
// events: [...ordered timeline...],
// summary: { network, domAdded, … },
// warning? // present when some steps could not be bound to a testid
// }

`iris_record_stop` returns a compiled, replayable program: the agent's `iris_act` /
iris_act_sequence invocations captured during the span, with each ref normalized to its
element's data-testid where resolvable. Re-run it later:
iris_replay({ recordingName: "checkout" })
// re-resolves each step by testid and re-runs the actions in order
// → { recordingName, ok, steps: [{ tool, ok, error?, note? }] } // stops at the first failure

Limitation. Normalization to a stable testid only works for elements that have a
data-testid. A step whose element has none is stored in ref form (stable: false) and
iris_record_stop returns a warning; replay best-effort re-uses the stored ref, which is
only valid within the same live session and is not portable across reloads. Add data-testid
to the elements you want replay-stable.
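Before relying on a recording across reloads, it can help to list which steps fell back to ref form. The sketch below assumes the `program` shape shown in the `iris_record_stop` output above; `RecordedStep`, `RecordedProgram`, and `unstableStepIndexes` are hypothetical names.

```typescript
// Shape taken from the iris_record_stop example above; field optionality is assumed.
interface RecordedStep {
  tool: string;
  args: { by?: string; value?: string; action?: string };
  stable: boolean;
}

interface RecordedProgram { version: number; steps: RecordedStep[] }

// Returns the zero-based indexes of steps stored in ref form (`stable: false`),
// i.e. the steps whose elements still need a data-testid.
function unstableStepIndexes(program: RecordedProgram): number[] {
  const indexes: number[] = [];
  program.steps.forEach((step, i) => {
    if (!step.stable) indexes.push(i);
  });
  return indexes;
}
```

An empty result means every step is anchored on a testid; otherwise the indexes point at the steps to fix first.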
Have the agent crawl and stress a screen without a script:
iris_explore({ scope: "main" })
// → { interactive: [ { ref, desc }, … ], consoleErrors, hint }

The agent then acts on each ref, observes the reaction, and reports anomalies (failed requests, console errors, dead controls). Good for "click everything on this page and tell me what breaks."
If you already have test cases — a QA checklist, acceptance criteria, a spreadsheet, manual steps — you can hand them to your agent and have it run + verify each against the live app. Each case becomes a predicate:
| Test case (English) | Iris check |
|---|---|
| Login with valid creds lands on the dashboard | allOf[ net /api/login 200, element heading "Dashboard" visible ] |
| Submitting the form shows a success toast | text "Saved" visible (+ net … 200) |
| Deleting an item removes it from the list | element {text, scope:list} absent: true |
| No console errors on the checkout page | console level:error absent:true |
| Export button visible for admins, hidden for viewers | element {role:button, name:Export} visible / absent |
| Clicking a row opens the detail drawer | element {role:dialog} visible |
A practical workflow:
"Here are our 12 dashboard test cases. For each, drive the app with Iris and tell me pass/fail with evidence. For any failure, show the source file to fix."
This is the sweet spot: the manual cases you never automated become things the agent runs in seconds, on every change. It complements your CI Playwright/Cypress suite (which gates releases) — Iris is the in-loop checklist while you build.
Iris is cheap by design (benchmark), but keep it that way:
- Prefer iris_query + iris_assert (~30 tokens each) over snapshots inside the loop.
- Use mode: "interactive" or "status", not "full".
- Use scope to look at just the relevant subtree.
- Reach for mode: "full" only when you truly need the whole page.
- Accessibility = legibility. Real roles, labels, and data-testids make queries precise and stable. It's also just good a11y.
- Stable handles for controls. Prefer data-testid over names that include dynamic counts (e.g. "Notifications (3)"), because the count changes the accessible name.
- Always thread since. Pass the cursor from iris_act into observe/assert so you only consider what happened after the action.
- Use timeout_ms for async. Don't assert instantly on something that arrives over the network or after a re-render.
- Watch session.throttled (F2). Background tabs throttle timers/rAF/pointer gestures, so an act can silently no-op. Every iris_act/iris_assert/iris_act_and_wait result carries session: { lastSeenMs, throttled, focused } and, when throttled, a warning. Refocus the tab (or run it foregrounded) before driving; pass refuseWhenThrottled: true to hard-fail instead.
- Scope big pages. On dashboards with hundreds of elements, scope queries to the panel you care about.
- Never breaks your app. Observers are additive and reversible (iris.disconnect() restores patched globals). It won't interfere with your app's behavior.
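To see why threading since matters, here is a small sketch of the filtering idea. The TimelineEvent shape and the eventsSince helper are hypothetical, and the strict "after" comparison is an assumption; the point is only that a cursor keeps older noise out of the verdict.

```typescript
// Hypothetical event shape: only the timestamp matters for this sketch.
type TimelineEvent = { t: number; kind: string; detail: string };

// Keep only events that happened after the cursor returned by the action.
function eventsSince(events: TimelineEvent[], since: number): TimelineEvent[] {
  return events.filter((e) => e.t > since);
}

// A console error logged before the click is excluded from the check.
const timeline: TimelineEvent[] = [
  { t: 100, kind: "console", detail: "error: stale warning from page load" },
  { t: 205, kind: "network", detail: "POST /api/items 200" },
];
const afterClick = eventsSince(timeline, 200);
```

Without the cursor, the page-load error at t=100 would be attributed to the click.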
Does this run in production? No — keep iris.connect() behind a dev guard. The SDK is
side-effect-free and tree-shakes out of prod builds.
Do I have to change my components? No, for basic look/act/observe. You'll get better
results by adding data-testids and labels where the agent needs precision.
Does it work without React? Yes. The core (DOM/network/route/console/animation/snapshot/actions) is framework-agnostic. React gets component + source mapping; Vue/Svelte adapters are on the roadmap.
Can it judge whether my UI looks good? No. Iris verifies behavior, not aesthetics. Visual/pixel correctness and "does it feel right" remain human (or a visual-diff tool).
Does it replace Playwright/Cypress? No — those are your scripted CI suite. Iris is for in-loop verification while the agent codes, and for the cases you never automated. They compose.
How does it compare to Playwright MCP / Chrome DevTools MCP? Those let an agent drive/inspect a separate browser; Iris verifies your own running app (real session/auth) with assertions + regression as first-class, far more cheaply. See the README comparison.
Multiple tabs/apps? Each is a session; pass sessionId to any tool when more than one is
connected (iris_sessions lists them).
- Dev-only, localhost-only by default. The bridge binds 127.0.0.1; the SDK is meant for dev builds.
- No telemetry. Nothing phones home. Baselines/recordings are local.
- Network bodies aren't captured by default, only method/url/status/timing. Body capture is opt-in and runs through a redactor (drop password/token/secret/… + your patterns).
- Additive & reversible. Iris patches fetch/History/console defensively and restores them on disconnect; it will not break the app under test.
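For a feel of what "runs through a redactor" can mean, here is a minimal sketch. It is not Iris's redactor: the redact function, the DEFAULT_SENSITIVE list, and the choice to match on key names are assumptions made for illustration.

```typescript
// Default key patterns taken from the list above; extra patterns are caller-supplied.
const DEFAULT_SENSITIVE: RegExp[] = [/password/i, /token/i, /secret/i];

// Recursively drop any field whose key matches a sensitive pattern.
function redact(value: unknown, extra: RegExp[] = []): unknown {
  const patterns = DEFAULT_SENSITIVE.concat(extra);
  if (Array.isArray(value)) return value.map((v) => redact(v, extra));
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      if (patterns.some((p) => p.test(key))) continue; // drop the field entirely
      out[key] = redact(source[key], extra);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Dropping the field, rather than masking its value, means a captured body never reveals even the length of a secret.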
Turn it on when connecting:
iris.connect({ session: 'my-app', present: true, pace: 450 });
You get, in the page itself:
- a glowing border while the agent is working,
- a synthetic cursor that flies to each target before acting,
- click ripples, hover rings, and a status HUD ("Clicking button "Save"… ✓ passed"),
- a per-action pacing delay (pace, ms) so a human can follow.
All presenter DOM uses data-iris-* and is excluded from snapshots/observers, so it never
pollutes what the agent sees. Use setIgnoreSelectors([...]) to also hide your own dev
widgets.
A session starts on the agent's first activity and must reliably end even when the agent
misbehaves. Iris is an MCP tool, so the agent (Claude) can crash, disconnect, or simply forget
to call iris_end_session — and a backgrounded tab's own timers are throttled by the browser.
So the Node server owns liveness, not the browser tab:
- Agent goes idle / forgets to end → a server-side reaper (immune to tab throttling) ends the session after idleEndMs of no agent commands and pushes the end to the browser. A backgrounded tab still receives that push, so you can switch windows and come back to a correctly-ended HUD.
- Agent (MCP client) disconnects cleanly → every active session ends at once.
- Agent kills the Iris server process (so no push can arrive) → the SDK self-ends the session after it can't reach the bridge for BRIDGE_LOST_MS (~15s), showing "lost connection to Iris."
- Slow-but-alive agent → if it goes quiet long enough to auto-end and then acts again, the session revives automatically (an explicit iris_end_session stays terminal).
Tune the idle window with iris_session({ idleEndMs }) — it updates both the browser timer and
the server reaper. The human keeps the panel (with Copy/Export of the run) after any end.
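The idle and revive rules above can be sketched as a small state machine. This is an illustrative model only: SessionReaper and its method names are hypothetical, and time is passed in explicitly so the behavior does not depend on any real timer.

```typescript
type SessionState = { lastCommandAt: number; ended: boolean; explicitEnd: boolean };

class SessionReaper {
  private sessions = new Map<string, SessionState>();
  private idleEndMs: number;

  constructor(idleEndMs: number) {
    this.idleEndMs = idleEndMs;
  }

  // Any agent command refreshes liveness and revives an auto-ended session.
  onCommand(id: string, now: number): void {
    const s = this.sessions.get(id);
    if (s === undefined) {
      this.sessions.set(id, { lastCommandAt: now, ended: false, explicitEnd: false });
      return;
    }
    if (s.explicitEnd) return; // an explicit end stays terminal
    s.lastCommandAt = now;
    s.ended = false;
  }

  endExplicitly(id: string): void {
    const s = this.sessions.get(id);
    if (s !== undefined) {
      s.ended = true;
      s.explicitEnd = true;
    }
  }

  // Run from a server-side timer; returns the ids ended on this sweep.
  sweep(now: number): string[] {
    const ended: string[] = [];
    this.sessions.forEach((s, id) => {
      if (!s.ended && now - s.lastCommandAt >= this.idleEndMs) {
        s.ended = true;
        ended.push(id);
      }
    });
    return ended;
  }

  isEnded(id: string): boolean {
    const s = this.sessions.get(id);
    return s === undefined ? true : s.ended;
  }
}
```

Because the sweep runs on the server, a throttled browser tab cannot delay it; the tab only needs to receive the resulting push.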
So the human sees what the agent is about to do and why:
iris_narrate({ text: "Adding a beat, then checking the section count goes up" })
It renders on the HUD. (The agent's private reasoning isn't visible to Iris; narration is how it surfaces intent on the page.)
Fast-forward toasts, debounces, auto-dismiss, and commit-on-blur without waiting:
iris_clock({ freeze: true }) // freeze app timers (Date.now/setTimeout/setInterval)
iris_act({ ref: "e9", action: "click" })
iris_clock({ advanceMs: 5000 }) // jump 5s — the auto-dismiss fires now, deterministically
iris_assert({ predicate: { kind: "element", query: { role: "alert" }, absent: true } })
iris_clock({ reset: true }) // restore real timers
It does not freeze requestAnimationFrame/microtasks (React's scheduler keeps running),
and Iris's own internal timers are insulated, so freezing never stalls the tools.
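If the freeze/advance idea is new to you, a virtual clock is the underlying technique. The sketch below is a generic illustration, not how iris_clock is implemented; VirtualClock and its methods are hypothetical names, and only setTimeout-style timers are modeled.

```typescript
type PendingTimer = { id: number; at: number; fn: () => void };

class VirtualClock {
  private now = 0;
  private nextId = 1;
  private timers: PendingTimer[] = [];

  // Schedule a callback relative to virtual time; nothing fires on its own.
  setTimeout(fn: () => void, delayMs: number): number {
    const id = this.nextId++;
    this.timers.push({ id, at: this.now + delayMs, fn });
    return id;
  }

  // Jump forward, firing due timers in time order (ties fire in schedule order).
  advance(ms: number): void {
    const target = this.now + ms;
    for (;;) {
      const due = this.timers
        .filter((t) => t.at <= target)
        .sort((a, b) => a.at - b.at || a.id - b.id)[0];
      if (due === undefined) break;
      this.timers = this.timers.filter((t) => t.id !== due.id);
      this.now = due.at;
      due.fn(); // a callback may schedule further timers
    }
    this.now = target;
  }

  time(): number {
    return this.now;
  }
}
```

With a clock like this, a 4-second auto-dismiss toast can be verified by advancing 4000 ms in one step instead of waiting in real time.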
- blur now fires a bubbling focusout, so React's commit-on-blur (onBlur) runs: inline editors and form fields commit.
- fill/type focus first so a later blur commits.
- hover accepts { holdMs } to dwell, so timer-gated reveals mount; then wait_for the revealed nodes.
- drag yields a frame between phases (React flushes between steps) and accepts { data: { mime, value } } for custom dataTransfer payloads.
{
  "kind": "signal",
  "name": "chat:edit-applied",
  "dataMatches": { "count": { "$gte": 1 }, "sections": { "$contains": "hook" } }
}
// operators: $gte $lte $gt $lt $contains (array/substring) $length ; "*" = present
On a failed signal assert, the result includes a near-miss: the signals that did fire
with that name + their data. And iris_observe's summary now includes domChanged (in-place
text/attribute re-renders, not just added/removed nodes).
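To make the operator list concrete, here is a sketch of how such a matcher could evaluate dataMatches against a signal's data. The function names are hypothetical and several semantics are assumptions: plain values compare by strict equality, $length means exact length, and "*" means the key is present.

```typescript
// Operator names come from the list above; exact semantics are assumed.
type Operators = {
  $gte?: number; $lte?: number; $gt?: number; $lt?: number;
  $contains?: unknown; // array element or substring
  $length?: number;    // assumed: exact length of a string or array
};
type Matcher = Operators | string | number | boolean;

function matchesValue(actual: unknown, expected: Matcher): boolean {
  if (expected === "*") return actual !== undefined; // "*" = present
  if (typeof expected !== "object") return actual === expected; // assumed: literal equality
  const num = typeof actual === "number" ? actual : NaN; // NaN fails every comparison
  if (expected.$gte !== undefined && !(num >= expected.$gte)) return false;
  if (expected.$lte !== undefined && !(num <= expected.$lte)) return false;
  if (expected.$gt !== undefined && !(num > expected.$gt)) return false;
  if (expected.$lt !== undefined && !(num < expected.$lt)) return false;
  const sized = typeof actual === "string" || Array.isArray(actual) ? actual : undefined;
  if ("$contains" in expected) {
    if (sized === undefined) return false;
    if (typeof sized === "string") {
      const needle = expected.$contains;
      if (typeof needle !== "string" || sized.indexOf(needle) === -1) return false;
    } else if (sized.indexOf(expected.$contains) === -1) {
      return false;
    }
  }
  if (expected.$length !== undefined && (sized === undefined || sized.length !== expected.$length)) return false;
  return true;
}

// Every key in the spec must match the corresponding field of the signal data.
function dataMatches(data: Record<string, unknown>, spec: Record<string, Matcher>): boolean {
  return Object.keys(spec).every((key) => matchesValue(data[key], spec[key]));
}
```

Applied to the predicate above, data of { count: 2, sections: ["intro", "hook"] } would match, while { count: 0, sections: ["intro"] } would not.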
These close the "is the action trusted?" gap, so you can tell "my action missed" from "the app didn't react" from "the tool didn't dispatch."
Every iris_act result now carries an effect:
{ since, dispatched, settled, settleReason,
result: { ok: true, ref, action, dispatched, settled, settleReason, testid,
effect: { dispatched, targetMatched, visible, enabled, defaultPrevented,
focusMoved: "e11->e12"|null, valueChanged, domMutatedWithin } } }
settled:false, settleReason:"timeout" means the settle frame did not flush within the budget
(a throttled/background tab) — this is not a failure: the dispatch landed and the tool resolved.
Read it to disambiguate failures instantly: targetMatched:false = your ref was stale;
defaultPrevented:true = a handler cancelled it; domMutatedWithin:0 + valueChanged:false
= the app didn't react.
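The reading rules above translate directly into a small triage helper an agent-side script could use. This is a sketch under stated assumptions: the Effect type mirrors the fields shown above, while diagnose, the Diagnosis labels, and the order of the checks are hypothetical.

```typescript
// Field names mirror the effect object shown above.
type Effect = {
  dispatched: boolean;
  targetMatched: boolean;
  visible: boolean;
  enabled: boolean;
  defaultPrevented: boolean;
  focusMoved: string | null;
  valueChanged: boolean;
  domMutatedWithin: number;
};

type Diagnosis =
  | "tool-did-not-dispatch"
  | "stale-ref"
  | "handler-cancelled"
  | "app-did-not-react"
  | "reacted";

// Check the most upstream cause first, then work toward the app's reaction.
function diagnose(e: Effect): Diagnosis {
  if (!e.dispatched) return "tool-did-not-dispatch";
  if (!e.targetMatched) return "stale-ref";
  if (e.defaultPrevented) return "handler-cancelled";
  if (e.domMutatedWithin === 0 && !e.valueChanged) return "app-did-not-react";
  return "reacted";
}
```

A "stale-ref" result suggests re-querying for a fresh ref before retrying, whereas "app-did-not-react" points at the app's handler rather than the tool.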
iris_act_and_wait({ ref, action, args?, until: <predicate>, timeout_ms })
// → { effect, verdict: { pass, evidence, failureReason? }, trace: <reaction report> }
Performs the action (with settle so React commits land in the window), waits for until, and
returns the action's effect + the verdict + the full causal trace. Collapses four calls into
one.
No need to broadcast a signal for every fact. Register stores in your app:
import { registerStore } from '@syrin/iris';
registerStore('workspace', () => useWorkspace.getState());
iris_state({ store: "workspace" }) // → { stores: { workspace: {…} } }
iris_state({ ref: "e9" }) // → { component: { ok: true, component, hooks } } or { component: { ok: false, reason: "component-state-unavailable" } }
// Scope a large store instead of paying for the whole thing:
iris_state({ store: "workspace", path: "captionCache.v3" }) // → { found: true, value: {…} }
iris_state({ store: "workspace", depth: 1 }) // → top-level keys, deeper values collapsed to "{…N keys}"
iris_state({ store: "workspace", path: "nope" }) // → { found: false, availableKeys: ["captionCache", "version", …] }
Store reads are the reliable path; ref reads degrade to a structured failure rather than blocking.
path (dot-path, numeric segments index arrays) and depth keep a 60KB store from becoming a token
tax — and a wrong path returns the keys that were there, so it's self-correcting.
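The path and depth behavior described above can be sketched as two small functions. This models the documented results, not Iris's code: readPath and collapse are hypothetical names, arrays are collapsed with the same "{…N keys}" placeholder as objects, and a miss on a non-object returns an empty key list; both of those are assumptions.

```typescript
type Lookup =
  | { found: true; value: unknown }
  | { found: false; availableKeys: string[] };

// Walk a dot-path; numeric segments index arrays because "1" in ["x","y"] is true.
function readPath(store: unknown, path: string): Lookup {
  let current: unknown = store;
  for (const segment of path.split(".")) {
    if (current === null || typeof current !== "object") {
      return { found: false, availableKeys: [] };
    }
    const container = current as Record<string, unknown>;
    if (!(segment in container)) {
      return { found: false, availableKeys: Object.keys(container) };
    }
    current = container[segment];
  }
  return { found: true, value: current };
}

// Keep `depth` levels of structure; replace anything deeper with a size placeholder.
function collapse(value: unknown, depth: number): unknown {
  if (value === null || typeof value !== "object") return value;
  const record = value as Record<string, unknown>;
  const keys = Object.keys(record);
  if (depth <= 0) return `{…${keys.length} keys}`;
  if (Array.isArray(value)) return value.map((v) => collapse(v, depth - 1));
  const out: Record<string, unknown> = {};
  for (const key of keys) out[key] = collapse(record[key], depth - 1);
  return out;
}
```

Returning the available keys on a miss is what makes a wrong path cheap: the next call can be corrected without reading the whole store.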
Declare it once so the agent learns the surface without reading source:
import { registerCapabilities } from '@syrin/iris';
registerCapabilities({ testids: [...], signals: [...], stores: [...], flows: [...] });
iris_capabilities() // → { testids, signals, stores, flows }
iris_record_start → drive the flow → iris_record_stop returns a compiled program
(steps bound to testids/signals, not volatile refs). iris_replay({ recordingName }) re-executes it —
your flow becomes a deterministic regression run, not a checklist.
Iris drives actions by dispatching JS events from inside the page. That covers click, fill,
type, select, submit, press, and HTML5 drag — but it cannot trigger browser-native pointer
behavior: onMouseEnter/onMouseLeave, hover-gated reveals, and pointer-library drags rely on
the browser's real hit-testing, which synthetic events don't drive.
Clicks are synthetic by default — on purpose. Even with real input configured, click/dblclick
run the occlusion-honest synthetic path (full pointerdown→…→click sequence + an occlusion hit-test +
off-viewport auto-scroll), reporting inputModeReason:"synthetic-click-preferred". There's no
coordinate gesture for the presenter HUD to intercept or to miss off-screen, and synthetic dispatch
reaches the resolved element directly. Reserve native clicks for the rare isTrusted-gated case
(native file picker, clipboard) with args:{ native:true }. Real input remains the path for
hover/drag, which genuinely need the browser's hit-testing. Every iris_act result tells you
which path ran:
{ since, dispatched, settled, inputMode: "synthetic" | "real", inputModeReason?, result, session, warning? }
When inputMode is "synthetic" and the target has hover/enter handlers, the result carries a
warning so you know a hover may be a no-op — you never have to reverse-engineer it.
inputModeReason — never a silent fallback. When real input is configured but a pointer
act still ran synthetic, the result says why, so per-element inconsistency is diagnosable
instead of mysterious:
| inputModeReason | meaning / fix |
|---|---|
| page-not-correlated-to-a-cdp-target | no CDP page matches the session URL, usually a fresh tab or a CDP target that isn't this page |
| element-not-locatable | the element had no box (off-screen / stale ref); scrollIntoView first |
| drag-target-unresolved | a drag's toRef was missing or not locatable |
| provider-declined / provider-error | the CDP provider declined or threw (the latter also sets warning) |
| not-a-pointer-action | fill/type/etc.; these are always synthetic by design |
| synthetic-click-preferred | a click/dblclick ran the occlusion-honest synthetic path by default; pass args:{ native:true } to force a trusted native click |
(No inputModeReason is set when real input simply isn't configured — synthetic is the expected default there.)
Point Iris's server at a Chrome DevTools (CDP) endpoint; it then drives real pointer input
(via Playwright connectOverCDP) at the element's box for hover/drag (and for click/dblclick
only when you pass args:{ native:true } — clicks default to synthetic), and reports
inputMode: "real".
- Launch your browser with remote debugging:
# Chrome/Chromium
google-chrome --remote-debugging-port=9222 http://localhost:3000
- Tell the Iris server where it is, via the MCP config env:
// .mcp.json
{
  "mcpServers": {
    "iris": {
      "command": "npx",
      "args": ["@syrin/iris"],
      "env": { "IRIS_CDP_URL": "http://localhost:9222" }
    }
  }
}
That's it. Iris correlates the CDP page to your SDK session by URL; pointer actions now fire
native hover/enter so hover-gated suggestion panels, tooltips, and pointer-based drag become
drivable. Everything else is unchanged, and with no IRIS_CDP_URL set, Iris stays in the
synthetic (zero-dependency, in-page) mode — Playwright is an optional dependency loaded only
when you opt in.
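Putting the rules together, the happy-path decision between synthetic and real input can be summarized as below. This is a sketch of the documented behavior, not the server's code: chooseInputMode and its option names are hypothetical, and runtime fallbacks (element-not-locatable, provider-error, and so on) are deliberately left out.

```typescript
type Action =
  | "click" | "dblclick" | "hover" | "drag"
  | "fill" | "type" | "select" | "submit" | "press";

type Decision = { inputMode: "synthetic" | "real"; inputModeReason?: string };

function chooseInputMode(
  action: Action,
  opts: { cdpUrl?: string; native?: boolean }, // cdpUrl stands in for IRIS_CDP_URL
): Decision {
  // Real input not configured: synthetic is the expected default, so no reason is set.
  if (!opts.cdpUrl) return { inputMode: "synthetic" };

  const isClick = action === "click" || action === "dblclick";
  const isPointer = isClick || action === "hover" || action === "drag";
  if (!isPointer) return { inputMode: "synthetic", inputModeReason: "not-a-pointer-action" };

  // Clicks stay synthetic unless the caller explicitly asks for a native click.
  if (isClick && !opts.native) {
    return { inputMode: "synthetic", inputModeReason: "synthetic-click-preferred" };
  }
  return { inputMode: "real" }; // hover, drag, or an explicit native click
}
```

Reading it top to bottom gives the same order of questions to ask when an act ran synthetic unexpectedly: is CDP configured, is it a pointer action, and is it a click without native: true.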
SPA navigation is handled. The URL correlation tracks client-side route changes (pushState/replaceState/popstate), so real input keeps working after your app navigates into a sub-route, e.g. the hover/quick-edit cluster on a /workspace view stays drivable. (Before 0.3.6 the reported session URL froze at load, so real input silently dropped to synthetic after the first SPA navigation; if you see inputModeReason:"page-not-correlated-to-a-cdp-target", upgrade to ≥ 0.3.6.)
Watching the agent (presenter, M5.8). With present: true the activity border now glows once while the agent is busy and fades when idle (no per-action strobe); the HUD sits bottom-center, shows a READING vs ACTING chip so you can tell observation from action at a glance, and iris_narrate lines are queued with a minimum on-screen dwell so none flash by unread.
Limitation: un-scriptable tabs. Iris observes/drives a tab through the in-page SDK + (optionally) CDP; it cannot bring to front or recover a browser tab the OS won't let it script (e.g. a backgrounded or non-default-browser tab reporting hidden:true/throttled:true). When that happens, iris_sessions and every act/assert result carry a session.recommendation saying so and pointing to iris drive <url> for a guaranteed scriptable context: refocus the tab, or use iris drive.
iris_snapshot({ mode: "interactive" })
// - tab "Overview" (ref=e2)
// - button "Add item" (ref=e5)
// status: { route: "/dashboard", visibleDialogs: [] }
iris_snapshot({ diff: true }) // after an action: only the change set
// { mode: "delta", delta: { added: ['- alert "Saved!"'], removed: [], addedCount: 1, removedCount: 0 } }
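The delta shape in the diff example can be reproduced with a plain line-set comparison. This is only a sketch of the idea: diffSnapshots is a hypothetical helper, and it assumes a snapshot can be treated as a list of unique text lines, which may not match how Iris computes its change set.

```typescript
type Delta = { added: string[]; removed: string[]; addedCount: number; removedCount: number };

// Compare two snapshots line by line; order within each list follows the input order.
function diffSnapshots(before: string[], after: string[]): Delta {
  const beforeSet = new Set(before);
  const afterSet = new Set(after);
  const added = after.filter((line) => !beforeSet.has(line));
  const removed = before.filter((line) => !afterSet.has(line));
  return { added, removed, addedCount: added.length, removedCount: removed.length };
}

const beforeSave = ['- tab "Overview"', '- button "Add item"'];
const afterSave = ['- tab "Overview"', '- button "Add item"', '- alert "Saved!"'];
const delta = diffSnapshots(beforeSave, afterSave);
```

Sending only the delta is what keeps a post-action look cheap: the unchanged lines never reach the agent.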