rwyn ("arwin") means run what you need.
rwyn is a stage-aware planner and executor for change-driven verification. Given a code change, it determines which repository requirements are plausibly at risk, gathers and weighs the relevant evidence, constructs the smallest practical plan it can justify for the current stage of the code lifecycle, and executes that plan automatically.
The goal is simple: get the confidence you need with the least unnecessary work.
Install rwyn:
curl -fsSL https://get.rwyn.dev/install.sh | sh
brew install rwyn
cargo install rwyn
Initialize a repository:
cd your-repo
rwyn setup
# have the agent inspect the raw sources and write .rwyn/config.yaml
rwyn doctor
rwyn build
rwyn run --stage save
rwyn plan --stage merge
rwyn explain
For contributors or local development from source:
cargo install --path .
Or build a release binary locally:
cargo build --release
./target/release/rwyn --help
Set up a repository in six steps:
- install the CLI
- run rwyn setup to print the agent setup contract and raw evidence sources
- have the agent inspect the repo and write .rwyn/config.yaml
- run rwyn doctor
- run rwyn build
- run rwyn run --stage save
The preferred bootstrap path is agent-driven: rwyn provides the schema,
validation surface, and raw evidence inventory; the agent reads the repository and
writes the model. CI files, scripts, docs, and manifests are evidence. They are
not imported as truth by brittle provider-specific setup code.
rwyn setup does not mutate the repo. It tells the agent what to inspect and
where to write the model:
.rwyn/config.yaml
- an initial set of stages
- an initial set of steps
- obvious repository structure and toolchain assumptions
- suggested CI wiring
A minimal config looks like:
requirements:
  - id: tests-pass
    description: TypeScript tests pass
stages:
  save:
    default_confidence: medium
  commit:
    default_confidence: high
  merge:
    default_confidence: certain
steps:
  - id: test
    kind: test
    command: bun test
    inputs:
      - "src/**/*.ts"
    satisfies:
      - tests-pass
Bootstrap is the heavy-agent phase of the loop described in How The Model Is Built And Improved; after the first session, the same skill drives lighter ongoing iteration.
If you are using Claude Code or Codex, the best initial setup flow is:
- run rwyn setup
- ask the agent to inspect the raw sources it reports and scaffold .rwyn/config.yaml
- have it add declarative plugins for obvious repo-specific structure
- have it run rwyn setup inspect to compare observed repo surfaces with the configured model
- have it run rwyn doctor
- have it run rwyn build
- have it run rwyn plan --stage save
- have it explain any surprising selections
The agent uses the rwyn skill or plugin surface for setup. Repository truth lives in config and plugins, not in prompts.
The model improves as the repository gives rwyn more information:
- a minimal working config
- declared prerequisites and hard relationships
- dynamic evidence such as coverage
- plugins for hidden structure
- gaps, replay, and compare for ongoing refinement
Practical habits that move the model forward:
- keep steps narrow and scopeable
- declare obvious prerequisites and hard relationships explicitly
- collect coverage so test scoping and confidence improve
- model generated artifacts and hidden dependencies with plugins
- treat repeated expensive early-stage work as a sign that the repo needs a cheaper signal
Coverage tells rwyn what code a step actually exercises, which sharpens scoping and confidence beyond what declared and static evidence can give.
Use this loop to keep dynamic evidence current:
rwyn coverage status
rwyn coverage refresh
rwyn coverage refresh --background
rwyn coverage refresh --all --background
rwyn coverage refresh --all --background --max-parallel 4
rwyn coverage refresh --step go-test-coverage --scope ./pkg/service
rwyn coverage refresh --all --dry-run --json --full-statuses
rwyn coverage jobs
rwyn coverage ingest --input path/to/lcov.info --format lcov --step bun-test
Useful evidence is:
- incremental
- scope-aware
- fresh enough to trust
- shared across local runs, CI, and agents when possible
Add Plugins When The Repo Has Hidden Structure
Plugins capture repository truth that is real but not obvious from plain file layout:
- generated-artifact relationships
- hidden dependency edges
- interface-to-implementation links
- path-derived scopes
- repository-specific structure that affects relevance or confidence
Most teams still treat verification as tribal knowledge.
Developers learn rules like "if you touch this area, run these tests." CI pipelines encode partial logic in scattered configs and scripts. Agents miss important repo-specific checks, or overrun by falling back to "run everything."
rwyn exists to replace that folklore with a repository model. Instead of teaching every human and every agent what to run for every kind of change, the repository declares what it cares about once, and rwyn plans and executes from that model everywhere.
rwyn is built around a small set of concepts:
- requirement: A property the repository wants to hold, such as formatting being correct, generated artifacts being current, relevant builds succeeding, or relevant tests passing.
- step: An executable action that provides evidence about, verifies, satisfies, or helps satisfy one or more requirements.
- evidence: The raw information rwyn uses to decide what is relevant, which steps are useful, and when a plan is sufficient.
- plan: A stage-specific decision about which steps to run, in what order, at what scope, for which requirements.
- stage: A repo-defined lifecycle checkpoint with a default confidence target for relevant requirements.
- confidence: A global concept applied per requirement. Different requirements do not redefine what confidence means; they differ in what evidence is needed to reach it.
The model is many-to-many:
- a requirement may be supported by multiple steps
- a step may support multiple requirements
- some steps fully satisfy a requirement
- some steps provide only partial evidence
That lets the repository express realities like:
- a formatter satisfying a formatting requirement
- a non-mutating formatting step verifying the same requirement
- a narrow unit test providing partial evidence about a broader integration risk
- a generation step satisfying an artifact-freshness requirement that later verification depends on
The same logical requirement may admit different operational strategies at different stages. The repository model defines those choices explicitly.
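The many-to-many model can be pictured as a small lookup structure. The sketch below is illustrative only: the step ids, requirement ids, and the "full"/"partial" strength labels are hypothetical names chosen for this example, not rwyn's actual schema.

```python
from collections import defaultdict

# Hypothetical step -> {requirement: strength} declarations.
# "full" means the step fully satisfies the requirement;
# "partial" means it only contributes some evidence.
STEP_SUPPORT = {
    "format-write": {"formatting": "full"},
    "format-check": {"formatting": "full"},
    "unit-test": {"unit-tests-pass": "full", "integration-risk": "partial"},
    "generate-bindings": {"bindings-current": "full"},
}


def steps_by_requirement(step_support):
    """Invert step -> requirements into requirement -> [(step, strength)]."""
    index = defaultdict(list)
    for step_id, supports in step_support.items():
        for requirement_id, strength in supports.items():
            index[requirement_id].append((step_id, strength))
    return dict(index)


index = steps_by_requirement(STEP_SUPPORT)
print(index["formatting"])        # one requirement, two supporting steps
print(index["integration-risk"])  # one step, partial evidence only
```

Reading the index in both directions shows why a planner can choose between alternative steps for the same requirement.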
rwyn is fundamentally an evidence system.
For a given change, rwyn first asks which requirements have non-zero plausible risk. Then it asks which steps provide the best next evidence for those requirements at the current stage. Planning stops when every relevant requirement reaches its effective confidence target.
This means rwyn distinguishes between two phases:
- selection: Which requirements are plausibly in play for this change?
- planning: Which steps should run now so each relevant requirement reaches the confidence needed for this stage?
A plan gathers enough evidence so that every relevant requirement reaches its target with the least unnecessary work. The planning question is:
what is the cheapest evidence I can gather now that reduces the chance of later-stage failure enough for this stage?
If a slow step is genuinely necessary at an early stage, rwyn runs it. Repeated expensive early-stage work is diagnostic: the repo is missing a cheaper earlier signal for that risk.
Evidence remains inspectable. rwyn can explain:
- why a requirement is relevant
- why a step is useful
- why the selected plan is sufficient
Those are three distinct layers of evidence:
- requirement evidence for relevance
- step evidence for usefulness
- plan evidence for sufficiency
Relevance is computed from a stack of evidence sources, with stronger evidence preferred before weaker evidence:
- declared repository knowledge
- static structural evidence
- semantic or AST-level evidence
- dynamic execution evidence such as coverage or traces
- historical empirical evidence
- heuristics and priors last
Freshness, scope, reliability, cost, contradiction, and recency are all inputs to the calculation.
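As a rough illustration of "stronger evidence preferred before weaker evidence", the sketch below orders evidence by tier and skips anything that is not fresh. It is a deliberate simplification: the real calculation weighs all of the inputs listed above together, and the `Tier`, `Evidence`, and `strongest_usable` names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Evidence tiers from the list above; a lower value means stronger evidence."""
    DECLARED = 0
    STATIC = 1
    SEMANTIC = 2
    DYNAMIC = 3
    HISTORICAL = 4
    HEURISTIC = 5


@dataclass(frozen=True)
class Evidence:
    tier: Tier
    source: str
    fresh: bool = True


def strongest_usable(evidence):
    """Return the strongest fresh piece of evidence, or None if nothing is usable."""
    usable = [e for e in evidence if e.fresh]
    return min(usable, key=lambda e: e.tier, default=None)


candidates = [
    Evidence(Tier.HEURISTIC, "path-name prior"),
    Evidence(Tier.DYNAMIC, "coverage atom", fresh=False),
    Evidence(Tier.STATIC, "import graph edge"),
]
best = strongest_usable(candidates)
print(best.source)  # import graph edge
```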
Confidence is the probability, for a given requirement, that the selected subset of relevant checks catches what the full set of relevant checks would catch, measured against observed outcomes.
A target of 0.75 for a requirement means: calibrated against observed history, the selected subset is expected to catch the same set of failures the full set would catch with at least 0.75 probability. 1 - 0.75 = 0.25 is the acceptable probability that a failure surfaces later.
Confidence is tracked per relevant requirement, not as one global score for the whole change.
For each change:
- rwyn identifies the requirements with non-zero plausible risk.
- Each relevant requirement gets a confidence estimate from the evidence the planner has, the priors it carries, and the calibration accumulated from prior runs.
- Candidate steps are evaluated by how much useful evidence they provide relative to cost.
- rwyn keeps selecting steps until every relevant requirement reaches its effective confidence target.
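The selection loop can be sketched as a greedy search. This is not rwyn's planner: it assumes each step contributes an independent probability per requirement, combines contributions with a noisy-OR rule (an assumption of this sketch), and picks the step with the best shortfall reduction per unit of cost. All step names, costs, and contribution values are made up.

```python
def combine(current, contribution):
    """Noisy-OR update (assumed here): adding evidence never lowers confidence."""
    return 1.0 - (1.0 - current) * (1.0 - contribution)


def greedy_plan(targets, steps):
    """Select steps until every requirement meets its target.

    targets: {requirement_id: target in [0, 1]}
    steps:   {step_id: (cost > 0, {requirement_id: contribution in [0, 1]})}
    Returns (selected step ids in order, confidence reached per requirement).
    """
    confidence = {req: 0.0 for req in targets}
    remaining = dict(steps)
    selected = []

    def shortfall(conf):
        # Total distance still missing across all requirements.
        return sum(max(0.0, targets[r] - conf[r]) for r in targets)

    while shortfall(confidence) > 1e-9 and remaining:
        best_id, best_ratio, best_conf = None, 0.0, None
        for step_id, (cost, contributions) in remaining.items():
            trial = dict(confidence)
            for req, value in contributions.items():
                if req in trial:
                    trial[req] = combine(trial[req], value)
            ratio = (shortfall(confidence) - shortfall(trial)) / cost
            if ratio > best_ratio:
                best_id, best_ratio, best_conf = step_id, ratio, trial
        if best_id is None:  # nothing left helps; targets are unreachable
            break
        selected.append(best_id)
        confidence = best_conf
        del remaining[best_id]
    return selected, confidence


targets = {"tests-pass": 0.75, "formatting": 0.50}
steps = {
    "format-check": (1.0, {"formatting": 1.0}),
    "unit-test": (5.0, {"tests-pass": 0.8}),
    "full-suite": (60.0, {"tests-pass": 1.0}),
}
plan, reached = greedy_plan(targets, steps)
print(plan)  # ['format-check', 'unit-test']
```

In this toy input the expensive `full-suite` step is never selected because the cheaper `unit-test` step already lifts `tests-pass` past its 0.75 target.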
The stage ladder is a probability budget spread across the lifecycle: early-stage checks accept a higher probability of missed failures because later stages re-verify at higher targets.
Confidence targets inherit cleanly:
stage default -> requirement override
A stage supplies a default confidence target for the requirements relevant at that lifecycle point, declared with the default_confidence: field in .rwyn/config.yaml. Every stage must declare one; missing defaults are surfaced by rwyn doctor.
Confidence is configured on one global scale. Repositories can use either named labels or numeric values, and both resolve to the same underlying targets.
The built-in confidence labels map to:
| Label | Numeric target |
|---|---|
| low | 0.25 |
| medium | 0.50 |
| high | 0.75 |
| very_high | 0.90 |
| certain | 1.00 |
Numeric values use the same 0.00 to 1.00 scale. For example, confidence: 0.85 sets a stricter target than high and a looser target than very_high.
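The label table and the "stage default -> requirement override" rule can be expressed in a few lines. This is a minimal sketch using the label values stated above; the function names are hypothetical and the validation behavior is an assumption, not a description of how rwyn itself reports errors.

```python
LABELS = {
    "low": 0.25,
    "medium": 0.50,
    "high": 0.75,
    "very_high": 0.90,
    "certain": 1.00,
}


def resolve_confidence(value):
    """Resolve a named label or a numeric value to a target in [0, 1]."""
    if isinstance(value, str):
        if value not in LABELS:
            raise ValueError(f"unknown confidence label: {value!r}")
        return LABELS[value]
    target = float(value)
    if not 0.0 <= target <= 1.0:
        raise ValueError(f"confidence must be between 0.00 and 1.00, got {target}")
    return target


def effective_target(stage_default, requirement_override=None):
    """A requirement override, when present, replaces the stage default."""
    chosen = stage_default if requirement_override is None else requirement_override
    return resolve_confidence(chosen)


print(effective_target("high"))               # 0.75
print(effective_target("high", 0.85))         # 0.85
print(effective_target("medium", "certain"))  # 1.0
```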
Within a single planning pass, confidence accumulation is monotonic: adding valid evidence only maintains or increases confidence for a requirement.
Calibration is empirical. In a fresh repo, targets are reached using declared evidence and priors; as run history accumulates, calibration sharpens. The planner reports per requirement whether its confidence number is calibrated against history or still relying on priors.
The planner is one artifact of rwyn's verification model. Building and maintaining that model is the rest of the system.
The model is built and improved in three modes that share the same machinery:
- Bootstrap. Turn a fresh repo into a usable model in one good agent session. Sources include programmatic analysis (file structure, AST, language detection), dynamic evidence (coverage when collected), and AI-elicited declared knowledge (the strongest tier in the evidence stack). The only evidence source legitimately missing at this point is historical outcomes; everything else can be in place from the first session.
- Iteration. When rwyn misses a failure or pays too much for confidence, the diagnostic surface (gaps, explain, replay) describes the gap as honestly as it can: clean attribution where the data supports it, candidate causes where it does not, and explicit "I cannot tell" where it cannot. The agent reads that report, proposes a model change, validates with replay, and commits.
- Calibration. Background sharpening of probability estimates from accumulated runs. The planner's predictions become more honest as outcomes flow back into the model.
rwyn combines declared semantics with empirical evidence.
Users declare what they know for sure:
- explicit requirement and step relationships
- prerequisites
- obvious full-satisfaction cases
- stage configuration
- scope rules
- repo-specific structure
Everything else is learned empirically over time:
- how predictive a step really is for a requirement
- which early steps substitute well for broader later steps
- how much confidence a scoped run really buys
- which failure surfaces are under-modeled
- where the repo is missing a cheaper earlier signal
Declared configuration remains authoritative for planning and execution. When observed outcomes repeatedly contradict declared assumptions, rwyn surfaces the divergence through warnings, reports, and recommendations.
When a step fails at a later stage, rwyn looks backward to find the earlier stages where the same step was a candidate but was skipped. The miss type comes from why it was skipped:
- Selection miss. The relevance gate filtered the step out wrongly. Fix: tighten relevance.
- Weight miss. The step was a candidate, but the planner believed another step substituted for it. Fix: adjust evidence weights.
- Set miss. The step was not a candidate for the relevant requirement at the earlier stage. Fix: declare or learn the link.
- Link miss. The change-to-step relationship was not modeled at the earlier stage. Fix: add a plugin or declared edge.
Failures where no current step would have caught the problem (a novel failure mode, an unmodeled risk) are a different gap class — "no earlier signal exists for this failure type" — and surface separately, not as miss attribution.
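The four miss types amount to a lookup from "why the step was skipped earlier" to a miss class and a suggested fix. The sketch below illustrates that mapping; the skip-reason strings are hypothetical identifiers invented for this example, not values rwyn is known to emit.

```python
from enum import Enum


class Miss(Enum):
    SELECTION = "selection miss"
    WEIGHT = "weight miss"
    SET = "set miss"
    LINK = "link miss"


# Hypothetical skip reasons recorded at the earlier stage, mapped to the
# miss type and the fix described in the list above.
ATTRIBUTION = {
    "filtered_by_relevance": (Miss.SELECTION, "tighten relevance"),
    "substituted_by_other_step": (Miss.WEIGHT, "adjust evidence weights"),
    "not_candidate_for_requirement": (Miss.SET, "declare or learn the link"),
    "change_not_linked_to_step": (Miss.LINK, "add a plugin or declared edge"),
}


def attribute_miss(skip_reason):
    """Return (miss type, suggested fix), or None for an unrecognized reason."""
    return ATTRIBUTION.get(skip_reason)


miss, fix = attribute_miss("substituted_by_other_step")
print(miss.value, "->", fix)  # weight miss -> adjust evidence weights
```

Returning `None` for an unknown reason keeps this lookup separate from the "no earlier signal exists" gap class, which is not a miss attribution at all.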
rwyn produces diagnostics and accepts model changes. The orchestration of "read gap → propose change → validate → commit" lives in the skill. The bundled Claude Code and Codex skills are reference drivers; the loop they implement is one example among many.
JSON outputs (rwyn gaps --json, rwyn explain --json, rwyn plan --json) are the public APIs the loop writes against. Their schemas are stable across versions.
For each change, rwyn produces a run record — the durable artifact that powers replay, compare, gaps, and calibration over time.
A run record contains:
- identity: change ref (commit, diff, or range), stage, environment, timestamp, rwyn version, model state hash
- plan: the selected steps, the candidate steps that were skipped, the per-requirement confidence reached, scopes
- decisions: for each candidate step, why it was selected or skipped (the data that powers explain and miss attribution)
- outcomes: per executed step, pass/fail, duration, exit code, captured evidence (coverage paths, traces)
- provenance: source of the record (a local rwyn run, a CI run, or an external ingest)
Plans are proposals before execution and records after. The same object survives both phases, so intent, outcomes, and later attribution all reference the same artifact.
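To make the shape of a run record concrete, here is a minimal sketch of one as a Python dataclass that round-trips through JSON. The field names and values are hypothetical and only mirror the categories listed above; the actual on-disk schema may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    # identity
    change_ref: str
    stage: str
    environment: str
    timestamp: str
    rwyn_version: str
    model_state_hash: str
    # plan
    selected_steps: list = field(default_factory=list)
    skipped_steps: list = field(default_factory=list)
    confidence_reached: dict = field(default_factory=dict)
    # decisions: step id -> why it was selected or skipped
    decisions: dict = field(default_factory=dict)
    # outcomes: step id -> {"passed", "duration_s", "exit_code"}
    outcomes: dict = field(default_factory=dict)
    # provenance: "local", "ci", or "ingest"
    provenance: str = "local"


record = RunRecord(
    change_ref="origin/main...HEAD",
    stage="commit",
    environment="local",
    timestamp="2024-01-01T00:00:00Z",
    rwyn_version="0.0.0",
    model_state_hash="example-hash",
    selected_steps=["unit-test"],
    skipped_steps=["full-suite"],
    confidence_reached={"tests-pass": 0.8},
    decisions={"full-suite": "unit-test provided sufficient evidence"},
)

# Before execution the record is a proposal; outcomes are filled in afterwards.
record.outcomes["unit-test"] = {"passed": True, "duration_s": 4.2, "exit_code": 0}

encoded = json.dumps(asdict(record), indent=2)
restored = RunRecord(**json.loads(encoded))
print(restored == record)  # True
```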
By default, run records live locally in .rwyn/runs/ as JSON files, one per run, and that directory is gitignored. The schema is stable across versions and admits external sources, so any record — local, CI, future hosted — can be ingested by any environment.
The engine ships two primitives:
- rwyn export runs: write records out for transport (CI artifacts, archival, manual sharing)
- rwyn ingest runs <path>: bring records from elsewhere into the local model
Local↔remote sync is orchestrated by the skill. A typical flow: CI uploads .rwyn/runs/ as a build artifact at the end of a stage; the skill, on git pull or session start, downloads new artifacts and ingests them.
An opt-in runs_storage: git_branch mode stores records on a parallel branch like rwyn/runs. The tradeoffs (repo bloat, paths and outcomes in git history, a tool writing to a branch) are why it is opt-in.
rwyn works like this:
- Model the repository.
- Map a change onto that model.
- Select requirements with non-zero plausible risk.
- Evaluate candidate steps as evidence.
- Build the smallest practical sufficient plan for the stage.
- Execute that plan with ordering, prerequisites, and environment contracts preserved.
- Record outcomes and feed them back into future planning.
During development:
rwyn run --stage save
rwyn run --stage commit
When work is pushed remotely:
rwyn run --stage push
Before or during integration:
rwyn run --stage merge
When the result surprises you:
rwyn explain
rwyn gaps
When the repository model needs work:
rwyn doctor
rwyn gaps
rwyn is stage-aware, but stages are repo-defined lifecycle checkpoints, not platform nouns like "PR" or "merge queue".
A stage provides:
- a default confidence target for relevant requirements
- a lifecycle marker that steps reference to declare when they apply
The planner's objective is already cheapest sufficient evidence, so cost lives in the planner, not in stage configuration. Which steps run when is decided by step-level stage applicability.
The default stage vocabulary is:
- save
- commit
- push
- merge
- post_merge
- release
These are examples, not a universal lifecycle. Repos define the stages that match how they actually work, including names like:
- nightly
- staging
- hotfix
- perf
- security
- deploy
Stages are also flexible enough to support immediate local or operational goals, such as keeping the workspace healthy, validating post-merge behavior, or preparing release artifacts.
The command surface is small and role-oriented.
Most commands operate on the same core inputs:
- stage: Which lifecycle checkpoint you are planning for, such as save, commit, push, merge, or a repo-defined custom stage.
- change: The change under consideration. By default this is the current local diff, but it can also be a base/head range, a commit, a pushed change, or an explicit diff artifact.
- scope overrides: Optional narrowing or explicit step selection when a user wants to override automatic planning.
- output mode: Human-readable explanation by default, with machine-readable output available for CI, agents, and tooling.
Most commands use flags in the shape of:
--stage <stage>
--base <rev>
--head <rev>
--change <change-ref>
--step <step-id>
--scope <scope>
--json
Print the agent setup contract for a repository.
setup does not try to import CI or write a guessed model. It inventories raw
sources an agent should inspect, such as manifests, CI files, scripts, docs,
toolchain files, generated-artifact tooling, coverage sources, and repo-specific
linkage clues.
The expected agent outcome is a full initial setup, not a starter file: write
.rwyn/config.yaml, add declarative plugins for hidden repo structure, wire and
seed coverage for scopeable checks, run representative plan/explain probes,
and leave notes for any coverage jobs or environment contracts that are still
running or blocked.
Examples:
rwyn setup
rwyn setup --json
rwyn setup inspect
rwyn setup inspect --json
rwyn setup inspect is the setup dossier command. It is intentionally factual:
it reports mechanically observed repo surfaces, raw source pointers, configured
requirements and steps, configured plugins and profile groups, coverage evidence,
and schema-level alignment gaps between those facts. It does not grep for magic
words or infer repo intent from loose text; the raw sources are there so the
client-side agent can run the searches and make the judgment. Findings are
labeled as observed, configured, candidate, or unknown so a client-side agent can
do the judgment work without rwyn pretending to be a repo expert.
When CI exposes independently scheduled jobs with separate commands and failure
surfaces, the setup agent should usually model those as separate steps instead
of one aggregate convenience command. Aggregates can be useful local shortcuts,
but they hide runtime cost and make CI comparison less precise. Independently
required steps should have distinct satisfies requirements; a shared aggregate
requirement can make them look like alternatives to the planner at cheaper
stages.
The intended setup loop is:
- rwyn setup
- agent reads the raw evidence and writes or edits .rwyn/config.yaml and plugins
- rwyn setup inspect
- agent resolves, documents, or intentionally ignores the remaining pointers
- rwyn doctor, rwyn build, representative plan/explain, and coverage probes
- repeat until the inspect dossier and probe behavior both look credible
Bootstrap rwyn in a repository.
init initializes a small starter repository model and gets simple repos to a
usable baseline quickly. For serious monorepos, prefer the agent-driven
rwyn setup flow.
rwyn init is responsible for:
- detecting languages, tools, and common repo patterns
- inferring an initial set of requirements and steps
- creating .rwyn/config.yaml
- suggesting stage defaults
- suggesting CI wiring
- optionally scaffolding plugins for common repo-specific structure
Examples:
rwyn init
rwyn init --yes
rwyn init --stage-defaults save,commit,push,merge
Validate installation, repo model, tools, environment, evidence state, and integrations.
doctor is the trust and diagnosis command. It answers questions like:
- is rwyn installed correctly?
- did the repo load the configuration I expect?
- are required tools available?
- are required environment contracts satisfied?
- is the repository model stale or broken?
- is coverage or other evidence missing or obviously inconsistent?
Examples:
rwyn doctor
rwyn doctor --json
rwyn doctor --stage merge
JSON doctor output includes summary.overall plus ok/warn/error counts, followed
by the detailed result list.
Build or refresh repository structure and derived evidence indexes.
build refreshes the repository model itself. In a mature setup it may happen automatically when needed; the explicit command is for debugging, CI bootstrap, and large repo changes.
Examples:
rwyn build
rwyn build --full
rwyn build --refresh
Plan the change, maintain coverage evidence for the selected plan, replan, and optionally execute the final commands.
verify is the beta day-to-day command for repositories with coverage-backed
scoping. By default it runs the coverage maintenance loop but does not execute
the final verification commands. Pass --run to execute the final plan, or
--dry-run to preview the coverage refresh queue and final command selection
without collecting coverage or running checks.
Examples:
rwyn verify --stage pr --base develop
rwyn verify --stage pr --base develop --run
rwyn verify --stage pr --base develop --dry-run
rwyn verify --stage pr --base develop --background
rwyn verify --stage pr --base develop --json --full-scopes --full-statuses
The loop is:
- select the initial plan from the diff
- inspect coverage for the selected checks and scopes
- refresh stale or missing scoped coverage evidence unless disabled
- replan from the updated evidence
- run the final plan only when --run is present
Use --no-maintain-coverage when you want the old "plan only from existing
evidence" behavior. Use --background for long coverage refreshes; status is
reported through rwyn coverage jobs.
Plan and execute the right steps for the current change and stage.
run is the primary command and the one most day-to-day use lives in.
run is responsible for:
- selecting relevant requirements
- evaluating candidate steps as evidence
- building a sufficient plan for the requested stage
- executing the selected steps in the right order
- recording results for replay, comparison, analytics, and learning
Examples:
rwyn run --stage save
rwyn run --stage commit
rwyn run --stage merge
rwyn run --stage merge --json
run also supports explicit user intent when needed:
rwyn run --stage save --step rust-test
rwyn run --stage commit --scope src/foo.ts
rwyn run --stage merge --change origin/main...HEAD
Show the selected plan without executing it.
plan shows:
- what would run
- why it would run
- what is being scoped
- what prerequisites would be pulled in
- what confidence targets are driving the decision
Examples:
rwyn plan --stage save
rwyn plan --stage merge
rwyn plan --stage merge --json
rwyn plan --stage merge --json --full-scopes
rwyn plan --stage merge --change origin/main...HEAD
JSON plan output is compact by default: each item reports scope count, an
8-entry preview, and whether the list was truncated. It also includes a summary
and grouped step fanout so agents can see profile expansion, scope totals, and
selection-reason previews without reading every item. Use --full-scopes when
an agent needs every rendered scope, every grouped reason, and the full rendered
command.
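The compact-by-default behavior can be illustrated with a small helper that reports a count, a bounded preview, and a truncation flag. This is only a sketch of the idea: the key names `count`, `preview`, and `truncated` are hypothetical and are not taken from rwyn's JSON schema.

```python
def compact_scopes(scopes, preview_size=8):
    """Summarize a scope list as a count, a bounded preview, and a truncation flag."""
    return {
        "count": len(scopes),
        "preview": list(scopes[:preview_size]),
        "truncated": len(scopes) > preview_size,
    }


scopes = [f"./pkg/service{i}" for i in range(12)]
summary = compact_scopes(scopes)
print(summary["count"], len(summary["preview"]), summary["truncated"])  # 12 8 True
```

A consumer that sees `truncated` set would rerun the plan with --full-scopes to obtain every rendered scope.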
Explain a single planning decision.
explain operates on one decision at a time — the most recent plan, or a specific target like a file, requirement, or step. It answers:
- why a requirement is relevant
- why a step was selected
- why a scope was chosen
- why a broader or cheaper alternative was not chosen
- why the final plan is sufficient for the stage
For model-wide introspection — where the model itself is wrong, weak, or contradicted by observed outcomes — use gaps.
Examples:
rwyn explain
rwyn explain path/to/file.ts
rwyn explain --step integration-tests
rwyn explain --requirement formatting
Explain text output is compact by default: it shows outgoing graph edges,
grouped plan fanout, and a bounded item-detail preview with bounded scope and
contribution previews. Use --json when an agent needs the complete item list.
Manage coverage and related dynamic execution evidence.
Coverage is one evidence source among many. These commands let the repo inspect, refresh, collect, and ingest coverage without treating it as the whole system.
Examples:
rwyn coverage status
rwyn coverage refresh
rwyn coverage refresh --background
rwyn coverage refresh --all --background
rwyn coverage refresh --all --background --max-parallel 4
rwyn coverage refresh --step go-test-coverage --scope ./pkg/service
rwyn coverage refresh --all --dry-run --json --full-statuses
rwyn coverage jobs
rwyn coverage ingest --input path/to/lcov.info --format lcov --step bun-test
Use --background for expensive refresh queues. It starts the same refresh work
as a local job, writes a durable JSON job record and log under .rwyn/jobs, and
returns immediately so an agent can continue setup or planning while coverage
collects. rwyn coverage jobs reports observed process state, elapsed time, log
freshness, log size, and active process-group work such as child count, total CPU
and memory, and the busiest child process. New refresh jobs also record queue
progress, the active coverage step, estimated percent complete, and ETA or
lower-bound ETA from configured step duration estimates. If an old job record
claims to be running but its process is gone and no final status was written,
coverage jobs reconciles it to orphaned with an explicit error.
Use --all when setting up or reseeding a repo to inspect every stage-eligible
coverage step instead of only the diff-derived refresh queue. Coverage freshness
is per atom, not per run: an atom may be a file today and can grow to a function,
line range, or test-derived unit as parsers learn more detail. Fresh atoms are
kept, uncovered atoms are retained as evidence that the code was instrumented
but not hit, stale atoms are mapped back to the narrowest runnable scope rwyn
can derive, and full coverage is used only when an atom cannot be refreshed with
a scoped command. Pass --force when you intentionally want to reseed fresh or
uncovered atoms too.
Coverage status is intentionally evidence-shaped:
- fresh: observed coverage evidence is current for the source fingerprint
- uncovered: the native report instrumented the atom and recorded zero hits
- stale: the atom was observed before, but its source fingerprint changed
- missing: no trustworthy coverage evidence exists for that atom or scope
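The four statuses follow from two facts about an atom: whether a record exists for the current source fingerprint, and whether it recorded any hits. The sketch below encodes that decision order; the record keys `fingerprint` and `hits` are hypothetical, and treating an absent record as "missing" is a simplification of "no trustworthy evidence".

```python
def coverage_status(record, current_fingerprint):
    """Classify one coverage atom.

    record: None when nothing was ever observed, otherwise a dict with the
            hypothetical keys "fingerprint" and "hits".
    """
    if record is None:
        return "missing"
    if record["fingerprint"] != current_fingerprint:
        return "stale"
    if record["hits"] == 0:
        return "uncovered"
    return "fresh"


print(coverage_status(None, "abc"))                                # missing
print(coverage_status({"fingerprint": "old", "hits": 3}, "abc"))   # stale
print(coverage_status({"fingerprint": "abc", "hits": 0}, "abc"))   # uncovered
print(coverage_status({"fingerprint": "abc", "hits": 3}, "abc"))   # fresh
```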
rwyn coverage status is scoped to the current diff-derived plan. If no
coverage atoms are selected for that plan, the command says so explicitly and
points agents to rwyn plan or rwyn coverage refresh --all --dry-run for a
full configured coverage evidence view.
Coverage output is compact by default. JSON includes status totals, a bounded
preview, per-status preview buckets, and a bounded refresh-job preview so agents
can see examples of stale, missing, uncovered, fresh, and queued work without
reading the complete atom or job list. Use --full-statuses only when an agent
needs the complete raw atom and refresh-job lists.
For new code, rwyn treats missing coverage as an exploratory refresh. If a
diff-derived plan has already inferred a runnable scope, and a changed file has
no coverage atom yet, coverage refresh queues that scope once. After the native
coverage command runs, zero-hit executable ranges become uncovered evidence
instead of staying in a rerun loop.
Coverage reports with explicit scope_targets provide direct evidence for the
runnable scopes that produced the report. When a report has no explicit targets,
rwyn only infers scopes from records that match the changed atom; unrelated
records from the same full run are not treated as runnable scopes.
Use --max-parallel N to let coverage refresh run multiple refresh jobs at
once. Parallelism is conservative: a coverage step is serial unless it declares
parallel_safe: true. Mark a step parallel-safe only when its command writes
mutable outputs to rwyn-provided locations like {run_dir}, {output},
{junit}, or {go_test_json}, and does not clobber fixed repo paths, shared
databases, service ports, or tool artifacts. rwyn does not assume worktree
isolation for you. This is separate from batch_scopes: batching reduces native
coverage runs for one step, while split-on-failure isolates unreliable batches
without assuming the step is safe to run concurrently.
Use --step and --scope when you already know the coverage evidence to
refresh. --step accepts either a coverage step id or the covered check id; use
multiple --scope flags to refresh only those runnable scopes.
When ingesting an externally produced report for a profiled coverage step, pass the covered check, coverage profile, coverage step, and profile variant together:
rwyn coverage ingest \
  --input path/to/lcov.info \
  --format lcov \
  --step forge-test \
  --profile forge-lcov \
  --coverage-step forge-test-coverage \
  --profile-variant contracts_features:opcm-v2
The variant is part of coverage freshness. Unprofiled evidence must not satisfy a profile-specific check.
For Foundry, prefer ingesting attribution JSON when available:
rwyn coverage ingest \
  --input packages/contracts-bedrock/coverage-attribution.json \
  --format foundry-attribution \
  --step forge-test-prod \
  --profile forge-attribution \
  --coverage-step forge-test-prod-coverage \
  --source-root packages/contracts-bedrock
Coverage is modeled as ordinary kind: coverage steps, so build/setup
prerequisites belong in requires, inputs, and outputs, just like any other
step. The coverage-specific fields only say which check the evidence covers and
which artifacts to parse.
Full refresh queues run cheaper coverage steps first and continue independent coverage steps after a failure, then report any failed jobs at the end.
Ingest external evidence or historical results.
This command family brings externally generated evidence into rwyn's model, including coverage, execution reports, CI artifacts, and learned priors.
Examples:
rwyn ingest coverage path/to/lcov.info
rwyn ingest runs path/to/run-records/
rwyn ingest evidence path/to/report.json
Re-evaluate historical changes against the current model.
replay answers: if the current planner had existed in the past, what would it have chosen, and what would it have missed?
This matters for:
- validating model changes
- measuring recall
- understanding regressions
- improving trust before changing policy
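"Measuring recall" can be made concrete as the fraction of historical step failures that the current model would have selected. The sketch below assumes a simplified history format and a caller-supplied `replan` function; both are hypothetical stand-ins for whatever replay actually reads and computes.

```python
def replay_recall(history, replan):
    """Fraction of historical failures the current planner would have caught.

    history: list of (change, failed_step_ids) pairs from past runs.
    replan:  function mapping a change to the set of step ids the current
             model would select for it.
    Returns None when the history contains no failures.
    """
    caught = total = 0
    for change, failed_steps in history:
        selected = replan(change)
        for step_id in failed_steps:
            total += 1
            caught += step_id in selected
    return caught / total if total else None


history = [
    ("change-1", {"unit-test"}),
    ("change-2", {"integration-test"}),
    ("change-3", set()),
]
# Toy planner that always selects only the unit tests.
recall = replay_recall(history, lambda change: {"unit-test"})
print(recall)  # 0.5
```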
Examples:
rwyn replay
rwyn replay --stage merge
rwyn replay --since 30d
Compare behavior across stages, environments, or time.
compare helps answer questions like:
- what changed between local and CI behavior?
- what changed after a policy update?
- why does merge run more than commit here?
- where do plans diverge in ways that matter?
Examples:
rwyn compare --group change
rwyn compare --stage commit --stage merge
rwyn compare --environment local --environment ci
Surface where the model itself is wrong, weak, or contradicted.
Where explain introspects a single decision, gaps introspects the model against ground truth and accumulated outcomes. It surfaces two classes of gaps:
- correctness gaps: missing early signals, contradicted declarations, under-modeled requirements, weak evidence paths.
- efficiency gaps: expensive early-stage work, broad steps that need narrower scopes, missing cheaper proxies, repeated unnecessary evidence gathering.
Calibration of evidence weights from observed outcomes happens automatically as runs accumulate; gaps is how that calibration surfaces.
Examples:
rwyn gaps
rwyn gaps --stage commit
rwyn gaps --kind efficiency
rwyn gaps --json
Inspect, validate, and edit effective configuration.
This command family answers questions like:
- what config is actually in effect?
- where did this setting come from?
- how are stage defaults resolving?
- what does this requirement or step currently look like?
Examples:
rwyn config show
rwyn config show --effective
rwyn config explain stages.merge
rwyn config validate
Manage declarative repository-model extensions.
Plugins define repository-specific structure and evidence logic in the repo model.
Examples:
rwyn plugin list
rwyn plugin validate
rwyn plugin scaffold relation
Scaffold, inspect, and validate CI integration.
Examples:
rwyn ci init github-actions
rwyn ci init circleci
rwyn ci doctor
rwyn ci show
The primary config surface is .rwyn/config.yaml.
It describes:
- requirements
- steps
- stages
- plugins
- runtime paths
- evidence and learning policy
Split files are fine for larger repos, but the default experience is one obvious entry point.
Example:
graph: .rwyn/graph.json
coverage_data: .rwyn/coverage-data
runs_dir: .rwyn/runs
jobs_dir: .rwyn/jobs
requirements:
  - id: rust-tests-pass
    description: Rust unit and integration tests pass
  - id: typescript-tests-pass
    description: TypeScript tests pass
  - id: bindings-current
    description: Generated Go bindings match the Solidity sources
  - id: contracts-tests-pass
    description: Solidity contracts tests pass
plugins:
  - id: solidity-interface-link
    type: relation
    from: "interfaces/**/*.sol"
    to: "src/**/*.sol"
    edge: imports
    match_rule:
      by: normalized_basename
      from_strip_prefix: I
  - id: solidity-bindings
    type: generate
    from: "src/**/*.sol"
    to: "bindings/**/*.go"
    match_rule:
      by: normalized_basename
profile_groups:
  - id: contracts_features
    profiles:
      - id: main
        env: {}
      - id: custom-gas-token
        env:
          SYS_FEATURE__CUSTOM_GAS_TOKEN: "true"
      - id: opcm-v2
        env:
          DEV_FEATURE__OPCM_V2: "true"
resources:
  cargo:
    max_parallel: 1
stages:
  save:
    default_confidence: medium
  commit:
    default_confidence: high
  merge:
    default_confidence: certain
steps:
  - id: rust-test
    name: Rust tests
    kind: test
    language: rust
    command: cargo test --all-targets --all-features
    tools: [cargo]
    resource_group: cargo
    inputs:
      - "src/**/*.rs"
    satisfies:
      - rust-tests-pass
  - id: bun-test
    name: Bun tests
    kind: test
    language: typescript
    command: bun test
    scopeable: true
    scope_flag: ""
    scope_type: test_paths
    tools: [bun]
    inputs:
      - "src/**/*.ts"
      - "src/**/*.tsx"
    scope_inputs:
      - "src/**/*.test.ts"
      - "src/**/*.test.tsx"
      - "src/**/*.spec.ts"
      - "src/**/*.spec.tsx"
    satisfies:
      - typescript-tests-pass
  - id: bun-test-coverage
    name: Bun test coverage
    kind: coverage
    language: typescript
    coverage_for: bun-test
    coverage_profile: default
    coverage_format: lcov
    coverage_language: typescript
    coverage_output: lcov.info
    coverage_junit: junit.xml
    command: >-
      bun test --coverage --coverage-reporter=lcov --coverage-dir {run_dir}
      --reporter=junit --reporter-outfile {junit} {args}
    parallel_safe: true
    scopeable: true
    scope_flag: ""
    scope_type: test_paths
    tools: [bun]
    inputs:
      - "src/**/*.ts"
      - "src/**/*.tsx"
    scope_inputs:
      - "src/**/*.test.ts"
      - "src/**/*.test.tsx"
      - "src/**/*.spec.ts"
      - "src/**/*.spec.tsx"
  - id: forge-test
    name: Forge tests
    kind: test
    language: solidity
    command: forge test
    inputs:
      - "src/**/*.sol"
      - "test/**/*.sol"
    impacted_by:
      profile_groups: [contracts_features]
    satisfies:
      - contracts-tests-pass
Coverage steps run from the repository root. {output}, {junit},
{go_test_json}, and {run_dir} expand to paths under the coverage work
directory, and {args} expands to rwyn-rendered scope arguments. Use
coverage_source_root when a native coverage report emits paths relative to a
package or project directory instead of the repository root; rwyn stores
coverage atoms as repo-relative paths but strips that source root again when it
renders path scopes for the native command.
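The placeholder expansion and source-root handling described above can be pictured with a small Python sketch. This is an illustration only, not rwyn's implementation: the function names, the work directory path, and the whitespace normalization are assumptions, and {go_test_json} is omitted for brevity.

```python
from pathlib import PurePosixPath


def render_command(template, work_dir, output_name, junit_name, scope_args):
    """Expand coverage placeholders into paths under a work directory.

    Hypothetical helper: the real expansion rules may differ.
    """
    work = PurePosixPath(work_dir)
    values = {
        "{run_dir}": str(work),
        "{output}": str(work / output_name),
        "{junit}": str(work / junit_name),
        "{args}": " ".join(scope_args),
    }
    rendered = template
    for placeholder, value in values.items():
        rendered = rendered.replace(placeholder, value)
    # Collapse the extra whitespace left behind when {args} is empty.
    return " ".join(rendered.split())


def to_native_scope(repo_relative_path, source_root):
    """Strip the source root so the native runner sees package-relative paths.

    Raises ValueError if the path is not under source_root.
    """
    return str(PurePosixPath(repo_relative_path).relative_to(source_root))


command = render_command(
    "bun test --coverage --coverage-dir {run_dir} --reporter-outfile {junit} {args}",
    work_dir=".rwyn/coverage-work/job-1",  # assumed location, for illustration
    output_name="lcov.info",
    junit_name="junit.xml",
    scope_args=["src/foo.test.ts"],
)
native = to_native_scope(
    "packages/contracts-bedrock/test/Foo.t.sol", "packages/contracts-bedrock"
)
```

Here the stored atom keeps the repo-relative path, while the value handed to the native command is relative to the source root.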
Use scope_inputs when a scoped command's dependency surface is broader than
the runnable scope universe. inputs means "these files can affect this step";
scope_inputs means "these files/packages are valid values rwyn may pass to the
runner as a scope." For example, a coverage command may depend on both
src/**/*.sol and test/**/*.sol but only accept test/**/*.t.sol as
--match-path targets.
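To make the inputs / scope_inputs distinction concrete, here is a minimal Python sketch. The glob translation is a simplified stand-in written for this example (it only understands `**/` and `*`), and the file paths are hypothetical; rwyn's own matcher may behave differently.

```python
import re


def glob_to_regex(pattern):
    """Translate a small glob subset: '**/' spans directories, '*' stays in one segment."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def matches_any(path, patterns):
    return any(glob_to_regex(p).match(path) for p in patterns)


# Values taken from the example in the text above.
inputs = ["src/**/*.sol", "test/**/*.sol"]
scope_inputs = ["test/**/*.t.sol"]


def affects_step(path):
    """A change to this file can affect the step."""
    return matches_any(path, inputs)


def is_valid_scope(path):
    """This file may be passed to the runner as a scope value."""
    return matches_any(path, scope_inputs)
```

A source file such as src/L1/Foo.sol affects the step but is never handed to the runner as a scope, while test/L1/Foo.t.sol is both an input and a valid scope.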
Coverage reports should include explicit scope_targets whenever a scoped
refresh produced the report. Without explicit targets, rwyn can only infer a
scope from records that match the changed atom; records that merely appeared in
the same broad report are not scope evidence. Coverage steps read that evidence
through their coverage_for check, so the refresh command and the covered test
plan use the same scope model.
When coverage refresh --all, coverage refresh --step <coverage-step>, or a
forced full refresh targets a scopeable coverage step and no explicit --scope
is provided, rwyn discovers the configured scope_inputs universe. By default it
queues one refresh job per discovered scope. This is precise for native reports
that only describe one requested package, file, crate, or test scope at a time:
each report records the requested scope as scope_targets, so later stale atoms
can map back to the narrowest runnable scope instead of inheriting every file
that happened to appear in a full report. If the native runner can accept many
scopes in one command and still emit precise attribution, set
batch_scopes: true; rwyn will pass the discovered scopes together as one
refresh job and rely on the report's per-record attribution to recover the
narrow evidence. If a batched refresh fails, rwyn retries by splitting the scope
set and keeps successful smaller reports; the remaining failures name the scopes
that still need repo-specific attention.
If the native runner is flaky, set coverage_attempts: N on the coverage step.
rwyn will retry the same scoped coverage job before splitting it, and it only
ingests coverage from a successful complete attempt. Keep retry loops outside
native coverage commands when the native tool rewrites coverage artifacts during
reruns.
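The retry-then-split behavior from the two paragraphs above can be sketched as follows. This is a hedged illustration, not rwyn's scheduler: `run_batch` is a hypothetical callback that runs one coverage job and returns True on success, and halving the scope set is an assumed splitting strategy.

```python
def refresh_with_split(scopes, run_batch, attempts=1):
    """Run a batched refresh; on failure, retry, then split and recurse.

    Returns (successful_batches, failed_scopes).
    """
    if not scopes:
        return [], []
    # Retry the same scoped job before splitting (coverage_attempts: N).
    if any(run_batch(scopes) for _ in range(attempts)):
        return [list(scopes)], []
    if len(scopes) == 1:
        # Cannot split further: this scope needs repo-specific attention.
        return [], list(scopes)
    mid = len(scopes) // 2
    ok_left, bad_left = refresh_with_split(scopes[:mid], run_batch, attempts)
    ok_right, bad_right = refresh_with_split(scopes[mid:], run_batch, attempts)
    return ok_left + ok_right, bad_left + bad_right


def fake_runner(scopes):
    # Stand-in for a native coverage run: any batch containing pkg/c fails.
    return "pkg/c" not in scopes


succeeded, failed = refresh_with_split(
    ["pkg/a", "pkg/b", "pkg/c", "pkg/d"], fake_runner, attempts=2
)
```

The successful smaller batches are kept, and only the scope that still fails on its own is reported.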
If a repository has a convenience coverage command that chains multiple native
coverage runners, split it into separate coverage steps when those runners accept
different scope flags. A scoped coverage step should wrap one coherent native
runner surface.
LCOV, Go coverprofile, and LLVM coverage JSON ingestion keeps both hit ranges
and zero-hit executable ranges. Foundry attribution JSON keeps per-test hit
ranges, exact test identities, and record-level scope targets, so one full
native coverage run can teach rwyn which tests and runnable test files exercise
each source atom without rerunning coverage per atom. When a step uses
scope_join: foundry_tests, rwyn preserves both dimensions: file/path evidence
renders as --match-path, exact test evidence renders as --match-contract and
--match-test, and records that have both use both filters together. If changed
line coverage exists, rwyn scopes from the records covering those changed lines;
otherwise it falls back to the file-level records for the changed file. A report
with only zero-hit executable code is valid coverage evidence; it should become
uncovered, not fail ingestion.
For Rust LCOV from Cargo workspaces, rwyn can mechanically attribute source
records to Cargo package scopes by walking to the nearest Cargo.toml package
manifest. That lets a package-scoped cargo llvm-cov setup refresh exact Cargo
package atoms even when the LCOV format only reports source paths.
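A minimal sketch of the "walk to the nearest Cargo.toml" idea, assuming the set of directories that contain a package manifest is already known. The function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def nearest_package_dir(source_path, package_dirs):
    """Return the closest ancestor directory that holds a package manifest.

    package_dirs is an assumed, precomputed set of directories containing a
    Cargo.toml with a [package] section. Returns None when nothing matches.
    """
    for parent in PurePosixPath(source_path).parents:
        if str(parent) in package_dirs:
            return str(parent)
    return None


packages = {"crates/core", "crates/cli"}
owner = nearest_package_dir("crates/core/src/parser/mod.rs", packages)
```

An LCOV record for crates/core/src/parser/mod.rs would be attributed to the crates/core package scope under this scheme.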
When source files are available, rwyn filters native line ranges against the
source text before creating coverage atoms. Blank lines, comment-only lines, and
delimiter-only lines such as bare braces are ignored generically so coverage
maps describe code, not formatter or syntax noise.
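The line filter described above might look roughly like this. It is a simplified sketch: it assumes `//` line comments only, ignores block comments, and the delimiter set is a guess.

```python
def is_code_line(text, comment_prefixes=("//",)):
    """True when a line carries code rather than blank, comment, or delimiter noise."""
    stripped = text.strip()
    if not stripped:
        return False  # blank line
    if stripped.startswith(comment_prefixes):
        return False  # comment-only line
    if all(ch in "{}()[];," for ch in stripped):
        return False  # delimiter-only line such as a bare brace
    return True


def executable_lines(source, start, end):
    """Filter a native 1-based inclusive line range against the source text."""
    lines = source.splitlines()
    return [
        number
        for number in range(start, end + 1)
        if number <= len(lines) and is_code_line(lines[number - 1])
    ]


source = "fn add(a: i32, b: i32) -> i32 {\n    // sum the inputs\n\n    a + b\n}\n"
kept = executable_lines(source, 1, 5)
```

Of the five lines in the native range, only the function signature and the expression survive as coverage atoms.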
Coverage steps are not parallel by default. Set parallel_safe: true only after
the command writes every variable artifact to rwyn-provided paths and does not
share mutable repo-local output with another refresh job.
Profile groups are selective. A step without impacted_by runs once. A step
with impacted_by.profile_groups expands across the named profiles, with each
profile's env applied at execution time. Coverage freshness is tracked per
coverage profile and per selected profile variant, so baseline coverage does not
satisfy feature-flag coverage. Use impacted_by.profiles with group:profile
entries when only specific profiles affect a step.
Profile selection uses evidence rather than a representative variant. When a changed plan item has multiple profile variants and rwyn has profile-specific coverage for the covered step, it compares the changed file and line atoms against each variant's coverage and selects the smallest deterministic set of variants that covers the observed changed atoms. If coverage cannot answer, rwyn falls back to all configured variants for that step.
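One way to read "smallest deterministic set of variants" is a small set-cover search over sorted variant names. The sketch below is an interpretation, not rwyn's algorithm: it treats "coverage cannot answer" as "no variant covers any changed atom", and the atom strings are made up for the example.

```python
from itertools import combinations


def select_variants(changed_atoms, coverage_by_variant):
    """Pick the smallest set of variants covering the observed changed atoms."""
    names = sorted(coverage_by_variant)  # sorted for deterministic tie-breaking
    observed = set()
    for name in names:
        observed |= coverage_by_variant[name] & changed_atoms
    if not observed:
        return names  # coverage cannot answer: fall back to every variant
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(coverage_by_variant[n] for n in combo))
            if observed <= covered:
                return list(combo)
    return names


coverage = {
    "main": {"src/Gas.sol:10"},
    "custom-gas-token": {"src/Gas.sol:10", "src/Gas.sol:11"},
    "opcm-v2": set(),
}
chosen = select_variants({"src/Gas.sol:10", "src/Gas.sol:11"}, coverage)
fallback = select_variants({"src/New.sol:1"}, coverage)
```

The exhaustive search is only practical for a handful of variants, which matches the small profile groups shown in the config example.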
Scopes propagate through declared prerequisites when both steps explicitly
declare compatible scope_type values. For example, if a scoped test requires a
build step and both steps use compatible path scopes, rwyn carries the selected
test paths back to the build step and renders them with the build step's own
scope_flag/scope_join settings. Non-scopeable prerequisites and incompatible
scope types stay unscoped.
Use scope_type: go_packages for Go commands whose native target is an exact
package path such as ./pkg/service. Use scope_type: packages for package
systems whose rendered scope is intentionally recursive, such as
./pkg/service/.... Use scope_type: compile_paths for build commands whose
native target is a file path to compile. Direct source changes become direct
file scopes, while compatible test path scopes can still propagate backward from
dependent test or coverage steps.
- id: go-test
  name: Go tests
  kind: test
  language: go
  command: go test
  scopeable: true
  scope_type: go_packages
  inputs:
    - "**/*.go"
- id: go-test-coverage
  name: Go test coverage
  kind: coverage
  language: go
  coverage_for: go-test
  coverage_profile: go-cover
  coverage_format: go_coverprofile
  coverage_language: go
  coverage_output: coverage.out
  command: "go test -coverprofile={output} {args}"
  parallel_safe: true
  scopeable: true
  scope_type: go_packages
  inputs:
    - "**/*.go"
  requires:
    - generated-fixtures
If broad coverage needs package discovery, generated artifacts, service setup, or
CI-like environment, model those as normal steps and add them to requires.
Explicit CLI flags still override config when needed.
Requirements are first-class declared objects. Each one names a property the repository wants to hold; steps reference requirements to declare what they provide evidence for.
A requirement describes:
- identity (id, optional human-readable name)
- description
- optional confidence override (replaces the stage default for this requirement when relevant)
Example:
requirements:
  - id: rust-tests-pass
    description: Rust unit and integration tests pass
  - id: security-checks-pass
    description: All critical security checks pass
    confidence: certain # always certain, regardless of stage default
  - id: bindings-current
    description: Generated Go bindings match the Solidity sources
Steps reference requirements by id, with relationship strength:
- satisfies: the step's success fully addresses the requirement
- evidence_for: the step is candidate evidence for the requirement; its lift is learned from outcomes
steps:
  - id: cargo-fmt-check
    satisfies:
      - formatting-clean
  - id: rust-test
    satisfies:
      - rust-tests-pass
    evidence_for:
      - bindings-current # rust tests indirectly exercise generated bindings
evidence_for contributes zero confidence until the planner has enough observed outcomes to calibrate the lift. The declaration marks the step as candidate evidence: when it runs (because it satisfies something else, or because it is cheap), its outcomes accumulate against the requirement and a learned weight emerges over time.
A mutating step (a formatter applying fixes) and a non-mutating step (a formatter in check mode) are two different steps. Each can declare stage applicability — stages: [list] to limit to specific stages, exclude_stages: [list] to remove specific ones, or neither to apply at every stage. The planner picks from stage-eligible steps:
steps:
  - id: cargo-fmt
    kind: format
    mutating: true
    stages: [save, commit]
    satisfies:
      - formatting-clean
  - id: cargo-fmt-check
    kind: format
    stages: [merge, push]
    satisfies:
      - formatting-clean
Mutation is a step property recorded on the step itself; behavior across stages is controlled by which step is listed where.
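The stage applicability rule (stages: as an allowlist, exclude_stages: as a blocklist, neither meaning every stage) can be expressed in a few lines. This is a sketch over plain dictionaries mirroring the YAML above, not rwyn's internal representation.

```python
def applies_at(step, stage):
    """True when the step is eligible at the given stage."""
    allowlist = step.get("stages")
    if allowlist is not None and stage not in allowlist:
        return False
    return stage not in step.get("exclude_stages", [])


def eligible_steps(steps, stage):
    return [step["id"] for step in steps if applies_at(step, stage)]


steps = [
    {"id": "cargo-fmt", "mutating": True, "stages": ["save", "commit"]},
    {"id": "cargo-fmt-check", "stages": ["merge", "push"]},
    {"id": "rust-test"},  # no stage fields: applies everywhere
]
at_save = eligible_steps(steps, "save")
at_merge = eligible_steps(steps, "merge")
```

At save the mutating formatter is eligible; at merge only the check-mode step is, so both stages can satisfy formatting-clean without the merge stage rewriting files.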
A step describes:
- identity and kind
- command
- inputs and outputs
- explicit prerequisites for non-file dependencies (requires:)
- which requirements it satisfies or provides evidence_for
- stage applicability (stages: to allowlist, exclude_stages: to blocklist; defaults to all stages)
- whether it mutates (mutating: true)
- whether and how it can be scoped
- toolchain requirements
- required environment variables
- optional dynamic evidence such as coverage
Explicit step invocation uses the normal planner and executor, so prerequisites, layering, and evidence rules still apply.
Examples:
rwyn plan --step rust-test
rwyn plan --step bun-test --scope src/foo.test.ts
rwyn plan --step lint --step test --step-scope test=src/foo.ts
rwyn run always executes; rwyn plan never does. They share the same arg shape, so any preview is the same invocation with plan instead of run.
Step ordering is derived from declared inputs and outputs by default. If step B's inputs include a path that step A's outputs produce — directly, or via a generate-type plugin relationship — the planner runs A before B without anything explicit.
For dependencies that are not file-based (a service that must be running, a setup script that exports environment, a remote resource that must be initialized), declare them explicitly with requires::
steps:
  - id: db-migrate
    kind: setup
    command: ./scripts/migrate.sh
    stages: [save, commit, merge]
  - id: integration-test
    kind: test
    command: bun test --integration
    requires: [db-migrate]
    inputs:
      - "src/**/*.ts"
    satisfies:
      - integration-tests-pass
The planner combines implicit (file-derived) and explicit (requires:) ordering into a single dependency graph and executes steps in valid topological order. Cycles are surfaced by rwyn doctor.
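Combining file-derived and explicit ordering into one graph can be sketched with Python's standard graphlib module. This is an illustration under simplifying assumptions: inputs and outputs are compared as exact paths rather than globs, plugin-derived edges are ignored, and the step definitions are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(steps):
    """Return step ids in a valid topological order.

    Raises graphlib.CycleError when the combined graph has a cycle.
    """
    producers = {}
    for step in steps:
        for path in step.get("outputs", []):
            producers[path] = step["id"]

    # Start from explicit requires: edges, then add implicit file-derived ones.
    graph = {step["id"]: set(step.get("requires", [])) for step in steps}
    for step in steps:
        for path in step.get("inputs", []):
            producer = producers.get(path)
            if producer is not None and producer != step["id"]:
                graph[step["id"]].add(producer)
    return list(TopologicalSorter(graph).static_order())


steps = [
    {"id": "go-test", "inputs": ["bindings/token.go"]},
    {"id": "gen-bindings", "outputs": ["bindings/token.go"]},
    {"id": "integration-test", "requires": ["db-migrate"]},
    {"id": "db-migrate"},
]
order = execution_order(steps)
```

Any order returned places gen-bindings before go-test and db-migrate before integration-test; a cycle raises CycleError, which is the kind of condition rwyn doctor reports.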
If independent steps contend for the same local resource, declare scheduling
constraints instead of fake prerequisites. requires: means semantic dependency;
resources and mutexes only limit concurrency inside an otherwise parallel layer.
resources:
  cargo:
    max_parallel: 1
steps:
  - id: crate-a-test
    kind: test
    command: cargo test -p crate-a
    resource_group: cargo
  - id: db-integration
    kind: test
    command: ./scripts/integration-db.sh
    mutexes: [local-test-db]
rwyn plan still shows prerequisite layers separately, and includes a resources
section when selected steps have scheduling constraints.
Steps can also declare environment contracts:
steps:
  - id: slice-v5
    name: Slice adapter v5
    kind: test
    language: typescript
    command: bun run scripts/integration-run.ts --adapter v5
    tools: [bun]
    required_env:
      - FELDERA_API_URL
      - FELDERA_API_TOKEN
rwyn integrates with existing CI systems:
- CI remains the execution substrate
- rwyn becomes the planner and executor
- local development, agents, and CI all use the same verification model
rwyn also works when adopted entirely locally. When both local and CI runs are routed through it, plans, evidence, and outcomes reinforce each other over time.
A CI setup looks like:
- name: Install rwyn
  run: curl -fsSL https://get.rwyn.dev/install.sh | sh
- name: Run merge-stage verification
  run: rwyn run --stage merge
CI bootstrap commands look like:
rwyn ci init github-actions
rwyn ci init circleci
rwyn ci doctor
rwyn treats coverage as one dynamic execution signal among many, used for scoping and confidence updates.
Coverage and other evidence are:
- incremental
- atom-based
- scope-aware when a stale atom can be mapped back to a runnable scope
- freshness-aware by source fingerprint
- explicit about zero-hit executable code as uncovered
- exploratory for new or otherwise unobserved changed code
- reusable across local runs, CI, and agents
Examples:
rwyn coverage status
rwyn coverage refresh
rwyn coverage refresh --background
rwyn coverage refresh --all --background
rwyn coverage refresh --all --background --max-parallel 4
rwyn coverage refresh --all --dry-run --json --full-statuses
rwyn coverage jobs
Executed plans also produce normalized run records that feed replay, compare, and gaps. Calibration of evidence weights from those records happens automatically; the loop that uses them to improve the model over time is described in How The Model Is Built And Improved.
Repo-specific structure lives in declarative repository knowledge: hidden dependency relationships, generated-artifact relationships, path-to-scope derivation, and any repository-specific structure that affects relevance or confidence.
The plugin DSL is extensible. New types can be added as the engine learns new repository patterns; the existing types remain stable.
rwyn normally treats comment-only source diffs as text-only so doc/comment edits do not fan out into full semantic checks. Some ecosystems put executable metadata in comments, such as inline test-runner directives. Declare those as config, not core heuristics:
semantic_comment_patterns:
  - id: test-runner-directives
    paths:
      - "tests/**/*.ext"
    contains: "runner-config:"
Use contains: for stable literals or regex: when the directive has multiple spellings. Matching added or removed lines are treated as semantic changes for planning.
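The contains / regex matching described above can be sketched like this. The sketch only covers line matching: the paths: filter is omitted, and the dictionary shape and the regex value are assumptions made for the example.

```python
import re


def is_semantic_line(line, pattern):
    """True when a changed comment line matches a declared directive pattern."""
    if "contains" in pattern:
        return pattern["contains"] in line
    if "regex" in pattern:
        return re.search(pattern["regex"], line) is not None
    return False


def diff_is_semantic(changed_lines, patterns):
    """True when any added or removed line matches any pattern."""
    return any(
        is_semantic_line(line, pattern)
        for line in changed_lines
        for pattern in patterns
    )


patterns = [
    {"id": "test-runner-directives", "contains": "runner-config:"},
    # Hypothetical second pattern showing regex for multiple spellings.
    {"id": "runner-directive-variants", "regex": r"runner-(config|cfg):"},
]
directive_edit = diff_is_semantic(["// runner-cfg: timeout=30"], patterns)
typo_fix = diff_is_semantic(["// fix typo in description"], patterns)
```

A comment-only diff that touches a directive is planned as a semantic change, while an ordinary comment edit stays text-only.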
Declares an edge between two sets of files. When a file in from: changes, files matched in to: are treated as semantically affected, and the planner uses the edge during relevance computation. The edge: label is a free-form string that surfaces in explain output ("touched via interface link") but does not drive planning logic — the planner cares that an edge exists, not what it is named.
- id: solidity-interface-link
  type: relation
  from: "interfaces/**/*.sol"
  to: "src/**/*.sol"
  edge: imports
  match_rule:
    by: normalized_basename
    from_strip_prefix: I
Declares that files in from: produce files in to:. Two effects: a generator step runs before any step that consumes the output, and changes to from: invalidate the freshness of the corresponding to: files until regenerated.
- id: solidity-bindings
  type: generate
  from: "src/**/*.sol"
  to: "bindings/**/*.go"
  match_rule:
    by: normalized_basename
Derives an execution scope for a scopeable step from a change. When changed files match from:, the named target_step:'s scope becomes the matching to: paths. Lets the planner narrow a broad step to the part of the repo a change actually affects, instead of running it across everything.
- id: typescript-tests-by-module
  type: scope
  target_step: bun-test
  from: "src/**/*.ts"
  to: "tests/**/*.test.ts"
  match_rule:
    by: normalized_basename
How from: and to: glob matches are paired. Three modes ship today; more may be added as the engine grows.
| Mode | Behavior | Notes |
|---|---|---|
| normalized_basename | Match by filename, stripped of optional prefix/suffix | Use from_strip_prefix / from_strip_suffix to normalize before comparison |
| directory_path | Match by directory path | Useful for "src/X/* maps to tests/X/*" style mappings |
| regex | Capture groups in from:, substitution in to: | Most flexible escape hatch; use when neither basename nor directory matching fits |
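As a rough picture of the normalized_basename mode, the sketch below pairs interface files with implementations by comparing file stems after stripping a prefix. It is an assumption-laden illustration: the normalization (file stem, plain prefix strip) and the file names are invented for the example, and the real matcher may normalize differently.

```python
from pathlib import PurePosixPath


def normalized_basename(path, strip_prefix="", strip_suffix=""):
    """File stem with an optional prefix and suffix removed."""
    stem = PurePosixPath(path).stem
    if strip_prefix and stem.startswith(strip_prefix):
        stem = stem[len(strip_prefix):]
    if strip_suffix and stem.endswith(strip_suffix):
        stem = stem[: -len(strip_suffix)]
    return stem


def pair_by_basename(from_files, to_files, from_strip_prefix=""):
    """Pair each from-file with every to-file sharing its normalized basename."""
    by_name = {}
    for target in to_files:
        by_name.setdefault(normalized_basename(target), []).append(target)
    pairs = []
    for source in from_files:
        key = normalized_basename(source, strip_prefix=from_strip_prefix)
        for target in by_name.get(key, []):
            pairs.append((source, target))
    return pairs


pairs = pair_by_basename(
    ["interfaces/IToken.sol", "interfaces/IVault.sol"],
    ["src/Token.sol", "src/Bridge.sol"],
    from_strip_prefix="I",
)
```

With from_strip_prefix: I, a change to interfaces/IToken.sol links to src/Token.sol, while IVault.sol has no counterpart and produces no edge.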
The goal is to keep repository truth in the repository model itself.
This repo includes an official Claude Code plugin and marketplace layout:
- marketplace: .claude-plugin/marketplace.json
- plugin manifest: plugins/rwyn/.claude-plugin/plugin.json
Local testing:
claude --plugin-dir ./plugins/rwyn
Public install after adding this repo as a marketplace:
claude plugin marketplace add smartcontracts/rwyn
claude plugin install rwyn@rwyn-plugins
This repo also includes a Codex plugin scaffold:
- marketplace entry: .agents/plugins/marketplace.json
- plugin manifest: plugins/rwyn/.codex-plugin/plugin.json
The bundled skill content lives under plugins/rwyn/skills/.
These skills are reference drivers for the loop described in How The Model Is Built And Improved. They demonstrate one good bootstrap-and-iteration flow against rwyn's diagnostic surface; users can replace them with their own.
Notable bundled skills:
- rwyn: Operate and debug an existing rwyn workflow.
- setup: Inspect a repo, scaffold .rwyn/config.yaml, and add declarative transforms.
- doctor: Diagnose a repo's rwyn setup and verification surface.
- select: Explain and inspect the chosen plan for a change.
- plan: Preview what rwyn would execute without running it.
- explain: Explain why a file or change selected a given plan item.
There is a parity harness at scripts/benchmark-parity.sh.
It compares rwyn against a legacy selector on a commit corpus and reports:
- selected item count
- missing selections vs legacy
- extra selections vs legacy
- per-commit runtime
Run the core checks locally:
cargo fmt
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-targets --all-features
MIT, see LICENSE.