rwyn

rwyn ("arwin") means run what you need.

rwyn is a stage-aware planner and executor for change-driven verification. Given a code change, it determines which repository requirements are plausibly at risk, gathers and weighs the relevant evidence, constructs the smallest practical plan it can justify for the current stage of the code lifecycle, and executes that plan automatically.

The goal is simple: get the confidence you need with the least unnecessary work.

Install And Get Started

Install rwyn with any one of the following:

curl -fsSL https://get.rwyn.dev/install.sh | sh
brew install rwyn
cargo install rwyn

Initialize a repository:

cd your-repo
rwyn setup
# have the agent inspect the raw sources and write .rwyn/config.yaml
rwyn doctor
rwyn build
rwyn run --stage save
rwyn plan --stage merge
rwyn explain

For contributors or local development from source:

cargo install --path .

Or build a release binary locally:

cargo build --release
./target/release/rwyn --help

Set Up A Repo

Set up a repository in six steps:

  1. install the CLI
  2. run rwyn setup to print the agent setup contract and raw evidence sources
  3. have the agent inspect the repo and write .rwyn/config.yaml
  4. run rwyn doctor
  5. run rwyn build
  6. run rwyn run --stage save

The preferred bootstrap path is agent-driven: rwyn provides the schema, validation surface, and raw evidence inventory; the agent reads the repository and writes the model. CI files, scripts, docs, and manifests are evidence. They are not imported as truth by brittle provider-specific setup code.

rwyn setup does not mutate the repo. It tells the agent what to inspect and what the resulting model should contain:

  • .rwyn/config.yaml
  • an initial set of stages
  • an initial set of steps
  • obvious repository structure and toolchain assumptions
  • suggested CI wiring

A minimal config looks like:

requirements:
  - id: tests-pass
    description: TypeScript tests pass

stages:
  save:
    default_confidence: medium
  commit:
    default_confidence: high
  merge:
    default_confidence: certain

steps:
  - id: test
    kind: test
    command: bun test
    inputs:
      - "src/**/*.ts"
    satisfies:
      - tests-pass

Set Up With An Agent

Bootstrap is the heavy-agent phase of the loop described in How The Model Is Built And Improved; after the first session, the same skill drives lighter ongoing iteration.

If you are using Claude Code or Codex, the best initial setup flow is:

  1. run rwyn setup
  2. ask the agent to inspect the raw sources it reports and scaffold .rwyn/config.yaml
  3. have it add declarative plugins for obvious repo-specific structure
  4. have it run rwyn setup inspect to compare observed repo surfaces with the configured model
  5. have it run rwyn doctor
  6. have it run rwyn build
  7. have it run rwyn plan --stage save
  8. have it explain any surprising selections

The agent uses the rwyn skill or plugin surface for setup. Repository truth lives in config and plugins, not in prompts.

Improving The Model

The model improves as the repository gives rwyn more information:

  1. a minimal working config
  2. declared prerequisites and hard relationships
  3. dynamic evidence such as coverage
  4. plugins for hidden structure
  5. gaps, replay, and compare for ongoing refinement

Practical habits that move the model forward:

  • keep steps narrow and scopeable
  • declare obvious prerequisites and hard relationships explicitly
  • collect coverage so test scoping and confidence improve
  • model generated artifacts and hidden dependencies with plugins
  • treat repeated expensive early-stage work as a sign that the repo needs a cheaper signal

Collect Better Evidence

Coverage tells rwyn what code a step actually exercises, which sharpens scoping and confidence beyond what declared and static evidence can give.

Use this loop to keep dynamic evidence current:

rwyn coverage status
rwyn coverage refresh
rwyn coverage refresh --background
rwyn coverage refresh --all --background
rwyn coverage refresh --all --background --max-parallel 4
rwyn coverage refresh --step go-test-coverage --scope ./pkg/service
rwyn coverage refresh --all --dry-run --json --full-statuses
rwyn coverage jobs
rwyn coverage ingest --input path/to/lcov.info --format lcov --step bun-test

Useful evidence is:

  • incremental
  • scope-aware
  • fresh enough to trust
  • shared across local runs, CI, and agents when possible

Add Plugins When The Repo Has Hidden Structure

Plugins capture repository truth that is real but not obvious from plain file layout:

  • generated-artifact relationships
  • hidden dependency edges
  • interface-to-implementation links
  • path-derived scopes
  • repository-specific structure that affects relevance or confidence

Why rwyn exists

Most teams still treat verification as tribal knowledge.

Developers learn rules like "if you touch this area, run these tests." CI pipelines encode partial logic in scattered configs and scripts. Agents miss important repo-specific checks, or overrun by falling back to "run everything."

rwyn exists to replace that folklore with a repository model. Instead of teaching every human and every agent what to run for every kind of change, the repository declares what it cares about once, and rwyn plans and executes from that model everywhere.

Fundamentals

rwyn is built around a small set of concepts:

  • requirement: A property the repository wants to hold, such as formatting being correct, generated artifacts being current, relevant builds succeeding, or relevant tests passing.
  • step: An executable action that provides evidence about, verifies, satisfies, or helps satisfy one or more requirements.
  • evidence: The raw information rwyn uses to decide what is relevant, which steps are useful, and when a plan is sufficient.
  • plan: A stage-specific decision about which steps to run, in what order, at what scope, for which requirements.
  • stage: A repo-defined lifecycle checkpoint with a default confidence target for relevant requirements.
  • confidence: A global concept applied per requirement. Different requirements do not redefine what confidence means; they differ in what evidence is needed to reach it.

The model is many-to-many:

  • a requirement may be supported by multiple steps
  • a step may support multiple requirements
  • some steps fully satisfy a requirement
  • some steps provide only partial evidence

That lets the repository express realities like:

  • a formatter satisfying a formatting requirement
  • a non-mutating formatting step verifying the same requirement
  • a narrow unit test providing partial evidence about a broader integration risk
  • a generation step satisfying an artifact-freshness requirement that later verification depends on

The same logical requirement may admit different operational strategies at different stages. The repository model defines those choices explicitly.

How rwyn Thinks

rwyn is fundamentally an evidence system.

For a given change, rwyn first asks which requirements have non-zero plausible risk. Then it asks which steps provide the best next evidence for those requirements at the current stage. Planning stops when every relevant requirement reaches its effective confidence target.

This means rwyn distinguishes between two phases:

  • selection: Which requirements are plausibly in play for this change?
  • planning: Which steps should run now so each relevant requirement reaches the confidence needed for this stage?

A plan gathers enough evidence so that every relevant requirement reaches its target with the least unnecessary work. The planning question is:

what is the cheapest evidence I can gather now that reduces the chance of later-stage failure enough for this stage?

If a slow step is genuinely necessary at an early stage, rwyn runs it. Repeated expensive early-stage work is diagnostic: the repo is missing a cheaper earlier signal for that risk.

Evidence Model

Evidence remains inspectable. rwyn can explain:

  • why a requirement is relevant
  • why a step is useful
  • why the selected plan is sufficient

Those are three distinct layers of evidence:

  • requirement evidence for relevance
  • step evidence for usefulness
  • plan evidence for sufficiency

Relevance is computed from a stack of evidence sources, with stronger evidence preferred before weaker evidence:

  • declared repository knowledge
  • static structural evidence
  • semantic or AST-level evidence
  • dynamic execution evidence such as coverage or traces
  • historical empirical evidence
  • heuristics and priors last

Freshness, scope, reliability, cost, contradiction, and recency are all inputs to the calculation.
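
The preference order above can be sketched as an ordered enum. This is a minimal illustration, not rwyn's implementation: the `EvidenceTier` names and the `strongest` helper are hypothetical, and the sketch ignores the freshness, reliability, and cost inputs just mentioned.

```python
from enum import IntEnum


class EvidenceTier(IntEnum):
    """Hypothetical tiers; a lower value means stronger evidence."""
    DECLARED = 0
    STATIC = 1
    SEMANTIC = 2
    DYNAMIC = 3
    HISTORICAL = 4
    HEURISTIC = 5


def strongest(available):
    """Return the strongest tier present, or None if there is no evidence."""
    return min(available, default=None)


print(strongest({EvidenceTier.DYNAMIC, EvidenceTier.STATIC}).name)  # STATIC
```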

Confidence Model

Confidence is the probability, for a given requirement, that the selected subset of relevant checks catches what the full set of relevant checks would catch, measured against observed outcomes.

A target of 0.75 for a requirement means: calibrated against observed history, the selected subset is expected to catch the same set of failures the full set would catch with at least 0.75 probability. 1 - 0.75 = 0.25 is the acceptable probability that a failure surfaces later.

Confidence is tracked per relevant requirement, not as one global score for the whole change.

For each change:

  1. rwyn identifies the requirements with non-zero plausible risk.
  2. Each relevant requirement gets a confidence estimate from the evidence the planner has, the priors it carries, and the calibration accumulated from prior runs.
  3. Candidate steps are evaluated by how much useful evidence they provide relative to cost.
  4. rwyn keeps selecting steps until every relevant requirement reaches its effective confidence target.
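
The four steps above can be sketched as a greedy loop. This is an illustration under stated assumptions, not rwyn's planner: the `Step` type, the noisy-OR `combine` rule, and the gain-per-cost ranking are all hypothetical choices, and a greedy pass is not guaranteed to find the smallest plan.

```python
from dataclasses import dataclass


@dataclass
class Step:
    id: str
    cost: float     # expected cost, e.g. seconds; must be > 0
    evidence: dict  # requirement id -> probability contribution in [0, 1]


def combine(current, added):
    """Noisy-OR combination (an assumption); it never decreases confidence."""
    return 1.0 - (1.0 - current) * (1.0 - added)


def plan(steps, targets):
    """Greedily pick steps until every requirement in `targets` meets its target."""
    confidence = {req: 0.0 for req in targets}
    remaining = list(steps)
    selected = []

    def gain(step):
        # Count only confidence gained toward targets that are still unmet.
        total = 0.0
        for req, p in step.evidence.items():
            if req in targets and confidence[req] < targets[req]:
                new = min(combine(confidence[req], p), targets[req])
                total += new - confidence[req]
        return total

    while any(confidence[r] < targets[r] for r in targets):
        best = max(remaining, key=lambda s: gain(s) / s.cost, default=None)
        if best is None or gain(best) <= 0.0:
            break  # nothing left that helps; some target is unreachable
        remaining.remove(best)
        selected.append(best.id)
        for req, p in best.evidence.items():
            if req in confidence:
                confidence[req] = combine(confidence[req], p)
    return selected, confidence


steps = [
    Step("unit", 10, {"tests-pass": 0.6}),
    Step("integration", 120, {"tests-pass": 1.0}),
    Step("lint", 2, {"formatting": 1.0}),
]
print(plan(steps, {"tests-pass": 0.5, "formatting": 0.5})[0])  # ['lint', 'unit']
```

With a `tests-pass` target of 1.0 the same loop also pulls in `integration`, which mirrors the idea that a slow step runs when it is genuinely needed.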

The stage ladder is a probability budget spread across the lifecycle: early-stage checks accept a higher probability of missed failures because later stages re-verify at higher targets.

Confidence targets inherit cleanly:

stage default -> requirement override

A stage supplies a default confidence target for the requirements relevant at that lifecycle point, declared with the default_confidence: field in .rwyn/config.yaml. Every stage must declare one; missing defaults are surfaced by rwyn doctor.
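
The "every stage must declare one" rule can be sketched as a small check over a parsed config. This is only an illustration of the rule: `stages_missing_default` is a hypothetical helper, the config is shown as an already-parsed dict, and it is not how rwyn doctor is implemented.

```python
def stages_missing_default(config):
    """Return the names of stages that declare no default_confidence."""
    stages = config.get("stages") or {}
    return sorted(
        name for name, body in stages.items()
        if "default_confidence" not in (body or {})
    )


config = {
    "stages": {
        "save": {"default_confidence": "medium"},
        "commit": {"default_confidence": "high"},
        "merge": {},  # default omitted on purpose
    }
}
print(stages_missing_default(config))  # ['merge']
```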

Confidence is configured on one global scale. Repositories can use either named labels or numeric values, and both resolve to the same underlying targets.

The built-in confidence labels map to:

Label       Numeric target
low         0.25
medium      0.50
high        0.75
very_high   0.90
certain     1.00

Numeric values use the same 0.00 to 1.00 scale. For example, confidence: 0.85 sets a stricter target than high and a looser target than very_high.
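
As a sketch of how labels, numeric values, and the stage default -> requirement override rule fit together, assuming hypothetical helper names (`resolve`, `effective_target`) that are not part of rwyn:

```python
LABELS = {
    "low": 0.25,
    "medium": 0.50,
    "high": 0.75,
    "very_high": 0.90,
    "certain": 1.00,
}


def resolve(value):
    """Turn a label or a number into a numeric target on the 0.00-1.00 scale."""
    if isinstance(value, str):
        return LABELS[value]  # raises KeyError for an unknown label
    target = float(value)
    if not 0.0 <= target <= 1.0:
        raise ValueError(f"confidence target out of range: {target}")
    return target


def effective_target(stage_default, requirement_override=None):
    """Stage default -> requirement override: the override wins when present."""
    chosen = stage_default if requirement_override is None else requirement_override
    return resolve(chosen)


print(effective_target("high"))        # 0.75
print(effective_target("high", 0.85))  # 0.85
```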

Within a single planning pass, confidence accumulation is monotonic: adding valid evidence only maintains or increases confidence for a requirement.

Calibration is empirical. In a fresh repo, targets are reached using declared evidence and priors; as run history accumulates, calibration sharpens. The planner reports per requirement whether its confidence number is calibrated against history or still relying on priors.

How The Model Is Built And Improved

The planner is one artifact of rwyn's verification model. Building and maintaining that model is the rest of the system.

The model is built and improved in three modes that share the same machinery:

  • Bootstrap. Turn a fresh repo into a usable model in one good agent session. Sources include programmatic analysis (file structure, AST, language detection), dynamic evidence (coverage when collected), and AI-elicited declared knowledge (the strongest tier in the evidence stack). The only evidence source legitimately missing at this point is historical outcomes; everything else can be in place from the first session.
  • Iteration. When rwyn misses a failure or pays too much for confidence, the diagnostic surface (gaps, explain, replay) describes the gap as honestly as it can: clean attribution where the data supports it, candidate causes where it does not, and explicit "I cannot tell" where it cannot. The agent reads that report, proposes a model change, validates with replay, and commits.
  • Calibration. Background sharpening of probability estimates from accumulated runs. The planner's predictions become more honest as outcomes flow back into the model.

Declared And Learned

rwyn combines declared semantics with empirical evidence.

Users declare what they know for sure:

  • explicit requirement and step relationships
  • prerequisites
  • obvious full-satisfaction cases
  • stage configuration
  • scope rules
  • repo-specific structure

Everything else is learned empirically over time:

  • how predictive a step really is for a requirement
  • which early steps substitute well for broader later steps
  • how much confidence a scoped run really buys
  • which failure surfaces are under-modeled
  • where the repo is missing a cheaper earlier signal

Declared configuration remains authoritative for planning and execution. When observed outcomes repeatedly contradict declared assumptions, rwyn surfaces the divergence through warnings, reports, and recommendations.

Misses Are Typed

When a step fails at a later stage, rwyn looks backward to the earlier stages where that step could have run but was skipped. The miss type comes from why it was skipped:

  • Selection miss. The relevance gate wrongly filtered the step out. Fix: correct the relevance rules so the step is admitted.
  • Weight miss. The step was a candidate, but the planner believed another step substituted for it. Fix: adjust evidence weights.
  • Set miss. The step was not a candidate for the relevant requirement at the earlier stage. Fix: declare or learn the link.
  • Link miss. The change-to-step relationship was not modeled at the earlier stage. Fix: add a plugin or declared edge.

Failures where no current step would have caught the problem (a novel failure mode, an unmodeled risk) are a different gap class — "no earlier signal exists for this failure type" — and surface separately, not as miss attribution.
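
The attribution rule can be sketched as a lookup from skip reason to miss type. The `SkipReason` values below are hypothetical names for the four causes listed above; rwyn's actual decision records may encode them differently.

```python
from enum import Enum


class SkipReason(Enum):
    """Hypothetical reasons a step was skipped at an earlier stage."""
    FILTERED_BY_RELEVANCE = "filtered_by_relevance"
    SUBSTITUTED = "substituted"
    NOT_A_CANDIDATE = "not_a_candidate"
    LINK_NOT_MODELED = "link_not_modeled"


MISS_TYPE = {
    SkipReason.FILTERED_BY_RELEVANCE: "selection miss",
    SkipReason.SUBSTITUTED: "weight miss",
    SkipReason.NOT_A_CANDIDATE: "set miss",
    SkipReason.LINK_NOT_MODELED: "link miss",
}


def classify_miss(skip_reason):
    """Map a skip reason to a miss type.

    None stands for "no existing step would have caught this", which is
    reported as a separate gap class rather than as a miss.
    """
    if skip_reason is None:
        return "no earlier signal"
    return MISS_TYPE[skip_reason]


print(classify_miss(SkipReason.SUBSTITUTED))  # weight miss
```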

The Skill Is The Loop's Driver

rwyn produces diagnostics and accepts model changes. The orchestration of "read gap → propose change → validate → commit" lives in the skill. The bundled Claude Code and Codex skills are reference drivers; the loop they implement is one example among many.

JSON outputs (rwyn gaps --json, rwyn explain --json, rwyn plan --json) are the public APIs the loop writes against. Their schemas are stable across versions.

What rwyn Produces

For each change, rwyn produces a run record — the durable artifact that powers replay, compare, gaps, and calibration over time.

A run record contains:

  • identity: change ref (commit, diff, or range), stage, environment, timestamp, rwyn version, model state hash
  • plan: the selected steps, the candidate steps that were skipped, the per-requirement confidence reached, scopes
  • decisions: for each candidate step, why it was selected or skipped — the data that powers explain and miss attribution
  • outcomes: per executed step, pass/fail, duration, exit code, captured evidence (coverage paths, traces)
  • provenance: source of the record (a local rwyn run, a CI run, or an external ingest)

Plans are proposals before execution and records after. The same object survives both phases, so intent, outcomes, and later attribution all reference the same artifact.
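
A run record with these five groups might be shaped roughly as follows. Every field name and value here is an illustrative assumption; the real schema is whatever rwyn writes to its run record files.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    # identity
    change_ref: str
    stage: str
    environment: str
    timestamp: str
    rwyn_version: str
    model_state_hash: str
    # plan
    selected_steps: list = field(default_factory=list)
    skipped_steps: list = field(default_factory=list)
    confidence_reached: dict = field(default_factory=dict)  # requirement -> value
    # decisions: step id -> why it was selected or skipped
    decisions: dict = field(default_factory=dict)
    # outcomes: step id -> result details; empty until the plan is executed
    outcomes: dict = field(default_factory=dict)
    # provenance: "local", "ci", or "ingest"
    provenance: str = "local"

    @property
    def executed(self):
        """A plan becomes a record once outcomes are attached."""
        return bool(self.outcomes)


record = RunRecord(
    change_ref="origin/main...HEAD",
    stage="merge",
    environment="local",
    timestamp="2025-01-01T00:00:00Z",   # placeholder value
    rwyn_version="0.0.0",               # placeholder value
    model_state_hash="example",         # placeholder value
    selected_steps=["test"],
    decisions={"test": "inputs matched src/**/*.ts"},
)
record.outcomes["test"] = {"passed": True, "duration_s": 4.2, "exit_code": 0}
print(json.dumps(asdict(record), indent=2))
```

Keeping proposal and result in one object is what lets intent, outcomes, and later attribution point at the same artifact.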

Storage And Sync

By default, run records live locally in .rwyn/runs/ as JSON files, one per run, and that directory is gitignored. The schema is stable across versions and admits external sources, so any record — local, CI, future hosted — can be ingested by any environment.

The engine ships two primitives:

  • rwyn export runs — write records out for transport (CI artifacts, archival, manual sharing)
  • rwyn ingest runs <path> — bring records from elsewhere into the local model

Local↔remote sync is orchestrated by the skill. A typical flow: CI uploads .rwyn/runs/ as a build artifact at the end of a stage; the skill, on git pull or session start, downloads new artifacts and ingests them.

An opt-in runs_storage: git_branch mode stores records on a parallel branch like rwyn/runs. The tradeoffs (repo bloat, paths and outcomes in git history, a tool writing to a branch) are why it is opt-in.

How It Works

rwyn works like this:

  1. Model the repository.
  2. Map a change onto that model.
  3. Select requirements with non-zero plausible risk.
  4. Evaluate candidate steps as evidence.
  5. Build the smallest practical sufficient plan for the stage.
  6. Execute that plan with ordering, prerequisites, and environment contracts preserved.
  7. Record outcomes and feed them back into future planning.

Typical Workflow

During development:

rwyn run --stage save
rwyn run --stage commit

When work is pushed remotely:

rwyn run --stage push

Before or during integration:

rwyn run --stage merge

When the result surprises you:

rwyn explain
rwyn gaps

When the repository model needs work:

rwyn doctor
rwyn gaps

Stage Model

rwyn is stage-aware, but stages are repo-defined lifecycle checkpoints, not platform nouns like "PR" or "merge queue".

A stage provides:

  • a default confidence target for relevant requirements
  • a lifecycle marker that steps reference to declare when they apply

The planner's objective is already cheapest sufficient evidence, so cost lives in the planner, not in stage configuration. Which steps run when is decided by step-level stage applicability.

The default stage vocabulary is:

  • save
  • commit
  • push
  • merge
  • post_merge
  • release

These are examples, not a universal lifecycle. Repos define the stages that match how they actually work, including names like:

  • nightly
  • staging
  • hotfix
  • perf
  • security
  • deploy

Stages are also flexible enough to support immediate local or operational goals, such as keeping the workspace healthy, validating post-merge behavior, or preparing release artifacts.

Command Overview

The command surface is small and role-oriented.

Shared Inputs

Most commands operate on the same core inputs:

  • stage: Which lifecycle checkpoint you are planning for, such as save, commit, push, merge, or a repo-defined custom stage.
  • change: The change under consideration. By default this is the current local diff, but it can also be a base/head range, a commit, a pushed change, or an explicit diff artifact.
  • scope overrides: Optional narrowing or explicit step selection when a user wants to override automatic planning.
  • output mode: Human-readable explanation by default, with machine-readable output available for CI, agents, and tooling.

Most commands use flags in the shape of:

--stage <stage>
--base <rev>
--head <rev>
--change <change-ref>
--step <step-id>
--scope <scope>
--json

Setup And Health

rwyn setup

Print the agent setup contract for a repository.

setup does not try to import CI or write a guessed model. It inventories raw sources an agent should inspect, such as manifests, CI files, scripts, docs, toolchain files, generated-artifact tooling, coverage sources, and repo-specific linkage clues.

The expected agent outcome is a full initial setup, not a starter file: write .rwyn/config.yaml, add declarative plugins for hidden repo structure, wire and seed coverage for scopeable checks, run representative plan/explain probes, and leave notes for any coverage jobs or environment contracts that are still running or blocked.

Examples:

rwyn setup
rwyn setup --json
rwyn setup inspect
rwyn setup inspect --json

rwyn setup inspect is the setup dossier command. It is intentionally factual: it reports mechanically observed repo surfaces, raw source pointers, configured requirements and steps, configured plugins and profile groups, coverage evidence, and schema-level alignment gaps between those facts. It does not grep for magic words or infer repo intent from loose text; the raw sources are there so the client-side agent can run the searches and make the judgment. Findings are labeled as observed, configured, candidate, or unknown so a client-side agent can do the judgment work without rwyn pretending to be a repo expert.

When CI exposes independently scheduled jobs with separate commands and failure surfaces, the setup agent should usually model those as separate steps instead of one aggregate convenience command. Aggregates can be useful local shortcuts, but they hide runtime cost and make CI comparison less precise. Independently required steps should have distinct satisfies requirements; a shared aggregate requirement can make them look like alternatives to the planner at cheaper stages.

The intended setup loop is:

  1. rwyn setup
  2. agent reads the raw evidence and writes or edits .rwyn/config.yaml and plugins
  3. rwyn setup inspect
  4. agent resolves, documents, or intentionally ignores the remaining pointers
  5. rwyn doctor, rwyn build, representative plan/explain, and coverage probes
  6. repeat until the inspect dossier and probe behavior both look credible

rwyn init

Bootstrap rwyn in a repository.

init creates a small starter repository model and gets simple repos to a usable baseline quickly. For serious monorepos, prefer the agent-driven rwyn setup flow.

rwyn init is responsible for:

  • detecting languages, tools, and common repo patterns
  • inferring an initial set of requirements and steps
  • creating .rwyn/config.yaml
  • suggesting stage defaults
  • suggesting CI wiring
  • optionally scaffolding plugins for common repo-specific structure

Examples:

rwyn init
rwyn init --yes
rwyn init --stage-defaults save,commit,push,merge

rwyn doctor

Validate installation, repo model, tools, environment, evidence state, and integrations.

doctor is the trust and diagnosis command. It answers questions like:

  • is rwyn installed correctly?
  • did the repo load the configuration I expect?
  • are required tools available?
  • are required environment contracts satisfied?
  • is the repository model stale or broken?
  • is coverage or other evidence missing or obviously inconsistent?

Examples:

rwyn doctor
rwyn doctor --json
rwyn doctor --stage merge

JSON doctor output includes summary.overall plus ok/warn/error counts, followed by the detailed result list.
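
A CI gate over that JSON could look like the sketch below. It assumes the ok/warn/error counts sit next to `overall` under `summary` and that the detailed list is keyed `results`; both are guesses at the layout, so check real `rwyn doctor --json` output before relying on them.

```python
import json


def doctor_exit_code(doctor_json, fail_on_warn=False):
    """Return 0 if the report is acceptable, 1 otherwise.

    The key layout is an assumption; see the note above.
    """
    summary = json.loads(doctor_json)["summary"]
    if summary.get("error", 0) > 0:
        return 1
    if fail_on_warn and summary.get("warn", 0) > 0:
        return 1
    return 0


sample = '{"summary": {"overall": "warn", "ok": 5, "warn": 1, "error": 0}, "results": []}'
print(doctor_exit_code(sample))                     # 0
print(doctor_exit_code(sample, fail_on_warn=True))  # 1
```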

rwyn build

Build or refresh repository structure and derived evidence indexes.

build refreshes the repository model itself. In a mature setup it may happen automatically when needed; the explicit command is for debugging, CI bootstrap, and large repo changes.

Examples:

rwyn build
rwyn build --full
rwyn build --refresh

Planning And Execution

rwyn verify

Plan the change, maintain coverage evidence for the selected plan, replan, and optionally execute the final commands.

verify is the beta day-to-day command for repositories with coverage-backed scoping. By default it runs the coverage maintenance loop but does not execute the final verification commands. Pass --run to execute the final plan, or --dry-run to preview the coverage refresh queue and final command selection without collecting coverage or running checks.

Examples:

rwyn verify --stage pr --base develop
rwyn verify --stage pr --base develop --run
rwyn verify --stage pr --base develop --dry-run
rwyn verify --stage pr --base develop --background
rwyn verify --stage pr --base develop --json --full-scopes --full-statuses

The loop is:

  1. select the initial plan from the diff
  2. inspect coverage for the selected checks and scopes
  3. refresh stale or missing scoped coverage evidence unless disabled
  4. replan from the updated evidence
  5. run the final plan only when --run is present

Use --no-maintain-coverage when you want the old "plan only from existing evidence" behavior. Use --background for long coverage refreshes; status is reported through rwyn coverage jobs.
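
The control flow of that loop can be sketched with injected callables. All four callables are hypothetical stand-ins for rwyn internals; the `run` and `maintain_coverage` parameters mirror the --run and --no-maintain-coverage flags.

```python
def verify(select_plan, stale_scopes, refresh, execute,
           run=False, maintain_coverage=True):
    """Plan, refresh coverage, replan, and optionally execute."""
    plan = select_plan()                 # 1. initial plan from the diff
    if maintain_coverage:
        scopes = stale_scopes(plan)      # 2. inspect coverage for the plan
        if scopes:
            refresh(scopes)              # 3. refresh stale or missing evidence
            plan = select_plan()         # 4. replan from the updated evidence
    if run:
        execute(plan)                    # 5. only when run is requested
    return plan


calls = []
final = verify(
    select_plan=lambda: calls.append("plan") or ["test"],
    stale_scopes=lambda plan: ["src/foo.ts"],
    refresh=lambda scopes: calls.append("refresh"),
    execute=lambda plan: calls.append("execute"),
)
print(calls)  # ['plan', 'refresh', 'plan']
```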

rwyn run

Plan and execute the right steps for the current change and stage.

run is the primary command; most day-to-day use goes through it.

run is responsible for:

  • selecting relevant requirements
  • evaluating candidate steps as evidence
  • building a sufficient plan for the requested stage
  • executing the selected steps in the right order
  • recording results for replay, comparison, analytics, and learning

Examples:

rwyn run --stage save
rwyn run --stage commit
rwyn run --stage merge
rwyn run --stage merge --json

run also supports explicit user intent when needed:

rwyn run --stage save --step rust-test
rwyn run --stage commit --scope src/foo.ts
rwyn run --stage merge --change origin/main...HEAD

rwyn plan

Show the selected plan without executing it.

plan shows:

  • what would run
  • why it would run
  • what is being scoped
  • what prerequisites would be pulled in
  • what confidence targets are driving the decision

Examples:

rwyn plan --stage save
rwyn plan --stage merge
rwyn plan --stage merge --json
rwyn plan --stage merge --json --full-scopes
rwyn plan --stage merge --change origin/main...HEAD

JSON plan output is compact by default: each item reports scope count, an 8-entry preview, and whether the list was truncated. It also includes a summary and grouped step fanout so agents can see profile expansion, scope totals, and selection-reason previews without reading every item. Use --full-scopes when an agent needs every rendered scope, every grouped reason, and the full rendered command.
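
The compact-versus-full behavior can be sketched as below. The key names (`scope_count`, `scopes`, `truncated`) are hypothetical; only the 8-entry preview size and the truncation flag come from the description above.

```python
PREVIEW_SIZE = 8  # the 8-entry preview described above


def compact_scopes(scopes, full=False):
    """Summarize a scope list the way the compact JSON output is described."""
    if full:  # corresponds to --full-scopes
        return {"scope_count": len(scopes), "scopes": list(scopes), "truncated": False}
    return {
        "scope_count": len(scopes),
        "scopes": list(scopes[:PREVIEW_SIZE]),
        "truncated": len(scopes) > PREVIEW_SIZE,
    }


scopes = [f"pkg/{i}" for i in range(12)]
summary = compact_scopes(scopes)
print(summary["scope_count"], len(summary["scopes"]), summary["truncated"])  # 12 8 True
```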

rwyn explain

Explain a single planning decision.

explain operates on one decision at a time — the most recent plan, or a specific target like a file, requirement, or step. It answers:

  • why a requirement is relevant
  • why a step was selected
  • why a scope was chosen
  • why a broader or cheaper alternative was not chosen
  • why the final plan is sufficient for the stage

For model-wide introspection — where the model itself is wrong, weak, or contradicted by observed outcomes — use gaps.

Examples:

rwyn explain
rwyn explain path/to/file.ts
rwyn explain --step integration-tests
rwyn explain --requirement formatting

Explain text output is compact by default: it shows outgoing graph edges, grouped plan fanout, and a bounded item-detail preview with bounded scope and contribution previews. Use --json when an agent needs the complete item list.

Evidence And Diagnostics

rwyn coverage ...

Manage coverage and related dynamic execution evidence.

Coverage is one evidence source among many. These commands let the repo inspect, refresh, collect, and ingest coverage without treating it as the whole system.

Examples:

rwyn coverage status
rwyn coverage refresh
rwyn coverage refresh --background
rwyn coverage refresh --all --background
rwyn coverage refresh --all --background --max-parallel 4
rwyn coverage refresh --step go-test-coverage --scope ./pkg/service
rwyn coverage refresh --all --dry-run --json --full-statuses
rwyn coverage jobs
rwyn coverage ingest --input path/to/lcov.info --format lcov --step bun-test

Use --background for expensive refresh queues. It starts the same refresh work as a local job, writes a durable JSON job record and log under .rwyn/jobs, and returns immediately so an agent can continue setup or planning while coverage collects. rwyn coverage jobs reports observed process state, elapsed time, log freshness, log size, and active process-group work such as child count, total CPU and memory, and the busiest child process. New refresh jobs also record queue progress, the active coverage step, estimated percent complete, and ETA or lower-bound ETA from configured step duration estimates. If an old job record claims to be running but its process is gone and no final status was written, coverage jobs reconciles it to orphaned with an explicit error.
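
The orphan reconciliation rule at the end of that paragraph can be sketched as follows. The record keys and the error text are hypothetical, and the liveness check is passed in as a boolean rather than probing a real process.

```python
def reconcile(record, process_alive):
    """Mark a job as orphaned when it claims to run but its process is gone."""
    if record.get("status") == "running" and not process_alive:
        return {
            **record,
            "status": "orphaned",
            "error": "process exited without writing a final status",
        }
    return record


job = {"id": "refresh-1", "status": "running"}
print(reconcile(job, process_alive=False)["status"])  # orphaned
print(reconcile(job, process_alive=True)["status"])   # running
```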

Use --all when setting up or reseeding a repo to inspect every stage-eligible coverage step instead of only the diff-derived refresh queue. Coverage freshness is per atom, not per run: an atom may be a file today and can grow to a function, line range, or test-derived unit as parsers learn more detail. Fresh atoms are kept, uncovered atoms are retained as evidence that the code was instrumented but not hit, stale atoms are mapped back to the narrowest runnable scope rwyn can derive, and full coverage is used only when an atom cannot be refreshed with a scoped command. Pass --force when you intentionally want to reseed fresh or uncovered atoms too.

Coverage status is intentionally evidence-shaped:

  • fresh: observed coverage evidence is current for the source fingerprint
  • uncovered: the native report instrumented the atom and recorded zero hits
  • stale: the atom was observed before, but its source fingerprint changed
  • missing: no trustworthy coverage evidence exists for that atom or scope
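
The four statuses can be read as a small decision function. This is a sketch of the definitions above under an assumed data shape (`AtomEvidence` is hypothetical), and the order of the checks is an assumption as well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AtomEvidence:
    fingerprint: str  # source fingerprint recorded when coverage was collected
    hits: int         # hit count from the native coverage report


def coverage_status(current_fingerprint: str, evidence: Optional[AtomEvidence]) -> str:
    if evidence is None:
        return "missing"    # no trustworthy evidence for this atom
    if evidence.fingerprint != current_fingerprint:
        return "stale"      # observed before, but the source changed
    if evidence.hits == 0:
        return "uncovered"  # instrumented, zero hits recorded
    return "fresh"


print(coverage_status("abc", AtomEvidence("abc", 3)))  # fresh
print(coverage_status("abc", AtomEvidence("old", 3)))  # stale
```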

rwyn coverage status is scoped to the current diff-derived plan. If no coverage atoms are selected for that plan, the command says so explicitly and points agents to rwyn plan or rwyn coverage refresh --all --dry-run for a full configured coverage evidence view.

Coverage output is compact by default. JSON includes status totals, a bounded preview, per-status preview buckets, and a bounded refresh-job preview so agents can see examples of stale, missing, uncovered, fresh, and queued work without reading the complete atom or job list. Use --full-statuses only when an agent needs the complete raw atom and refresh-job lists.

For new code, rwyn treats missing coverage as an exploratory refresh. If a diff-derived plan has already inferred a runnable scope, and a changed file has no coverage atom yet, coverage refresh queues that scope once. After the native coverage command runs, zero-hit executable ranges become uncovered evidence instead of staying in a rerun loop.

Coverage reports with explicit scope_targets provide direct evidence for the runnable scopes that produced the report. When a report has no explicit targets, rwyn only infers scopes from records that match the changed atom; unrelated records from the same full run are not treated as runnable scopes.

Use --max-parallel N to let coverage refresh run multiple refresh jobs at once. Parallelism is conservative: a coverage step is serial unless it declares parallel_safe: true. Mark a step parallel-safe only when its command writes mutable outputs to rwyn-provided locations like {run_dir}, {output}, {junit}, or {go_test_json}, and does not clobber fixed repo paths, shared databases, service ports, or tool artifacts. rwyn does not assume worktree isolation for you. This is separate from batch_scopes: batching reduces native coverage runs for one step, while split-on-failure isolates unreliable batches without assuming the step is safe to run concurrently.

Use --step and --scope when you already know the coverage evidence to refresh. --step accepts either a coverage step id or the covered check id; use multiple --scope flags to refresh only those runnable scopes.

When ingesting an externally produced report for a profiled coverage step, pass the covered check, coverage profile, coverage step, and profile variant together:

rwyn coverage ingest \
  --input path/to/lcov.info \
  --format lcov \
  --step forge-test \
  --profile forge-lcov \
  --coverage-step forge-test-coverage \
  --profile-variant contracts_features:opcm-v2

The variant is part of coverage freshness. Unprofiled evidence must not satisfy a profile-specific check.

For Foundry, prefer ingesting attribution JSON when available:

rwyn coverage ingest \
  --input packages/contracts-bedrock/coverage-attribution.json \
  --format foundry-attribution \
  --step forge-test-prod \
  --profile forge-attribution \
  --coverage-step forge-test-prod-coverage \
  --source-root packages/contracts-bedrock

Coverage is modeled as ordinary kind: coverage steps, so build/setup prerequisites belong in requires, inputs, and outputs, just like any other step. The coverage-specific fields only say which check the evidence covers and which artifacts to parse.

Full refresh queues run cheaper coverage steps first and continue independent coverage steps after a failure, then report any failed jobs at the end.
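
That queue behavior can be sketched as a cheapest-first loop that collects failures instead of stopping. The `(step_id, estimated_cost)` job shape and the `run` callable are hypothetical, and the sketch treats every job as independent.

```python
def run_refresh_queue(jobs, run):
    """Run every job cheapest-first and return the ids of jobs that failed.

    `jobs` is a list of (step_id, estimated_cost) pairs; `run` returns True
    on success. Both are stand-ins for rwyn internals.
    """
    failed = []
    for step_id, _cost in sorted(jobs, key=lambda job: job[1]):
        if not run(step_id):
            failed.append(step_id)  # keep going; report at the end
    return failed


order = []


def fake_run(step_id):
    order.append(step_id)
    return step_id != "go-cov"


print(run_refresh_queue([("forge-cov", 300), ("go-cov", 40), ("bun-cov", 90)], fake_run))
# ['go-cov']
```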

rwyn ingest ...

Ingest external evidence or historical results.

This command family brings externally generated evidence into rwyn's model, including coverage, execution reports, CI artifacts, and learned priors.

Examples:

rwyn ingest coverage path/to/lcov.info
rwyn ingest runs path/to/run-records/
rwyn ingest evidence path/to/report.json

rwyn replay

Re-evaluate historical changes against the current model.

replay answers: if the current planner had existed in the past, what would it have chosen, and what would it have missed?

This matters for:

  • validating model changes
  • measuring recall
  • understanding regressions
  • improving trust before changing policy

Examples:

rwyn replay
rwyn replay --stage merge
rwyn replay --since 30d
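The "what would it have missed?" question can be read as a recall measurement. The following is a minimal sketch under that assumption; `replay_recall` and `missed` are hypothetical helpers, and the step ids are only sample data.

```python
def replay_recall(selected: set[str], failed: set[str]) -> float:
    """Fraction of historically failing steps that the current plan would have run."""
    if not failed:
        return 1.0  # nothing failed, so nothing could have been missed
    return len(selected & failed) / len(failed)


def missed(selected: set[str], failed: set[str]) -> set[str]:
    """Historically failing steps that the current plan would have skipped."""
    return failed - selected


# One historical change: what the current planner selects vs. what actually failed.
selected = {"rust-test", "bun-test"}
failed = {"bun-test", "forge-test"}
recall = replay_recall(selected, failed)
skipped = missed(selected, failed)
```

Here half of the failing steps would have been selected, and `forge-test` is the step a model change would need to account for.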

rwyn compare

Compare behavior across stages, environments, or time.

compare helps answer questions like:

  • what changed between local and CI behavior?
  • what changed after a policy update?
  • why does merge run more than commit here?
  • where do plans diverge in ways that matter?

Examples:

rwyn compare --group change
rwyn compare --stage commit --stage merge
rwyn compare --environment local --environment ci

rwyn gaps

Surface where the model itself is wrong, weak, or contradicted.

Where explain introspects a single decision, gaps introspects the model against ground truth and accumulated outcomes. It surfaces two classes of gaps:

  • correctness gaps: missing early signals, contradicted declarations, under-modeled requirements, weak evidence paths.
  • efficiency gaps: expensive early-stage work, broad steps that need narrower scopes, missing cheaper proxies, repeated unnecessary evidence gathering.

Calibration of evidence weights from observed outcomes happens automatically as runs accumulate; gaps is how that calibration surfaces.

Examples:

rwyn gaps
rwyn gaps --stage commit
rwyn gaps --kind efficiency
rwyn gaps --json

Configuration And Integration

rwyn config ...

Inspect, validate, and edit effective configuration.

This command family answers questions like:

  • what config is actually in effect?
  • where did this setting come from?
  • how are stage defaults resolving?
  • what does this requirement or step currently look like?

Examples:

rwyn config show
rwyn config show --effective
rwyn config explain stages.merge
rwyn config validate

rwyn plugin ...

Manage declarative repository-model extensions.

Plugins define repository-specific structure and evidence logic in the repo model.

Examples:

rwyn plugin list
rwyn plugin validate
rwyn plugin scaffold relation

rwyn ci ...

Scaffold, inspect, and validate CI integration.

Examples:

rwyn ci init github-actions
rwyn ci init circleci
rwyn ci doctor
rwyn ci show

Configuration Model

The primary config surface is .rwyn/config.yaml.

It describes:

  • requirements
  • steps
  • stages
  • plugins
  • runtime paths
  • evidence and learning policy

Split files are fine for larger repos, but the default experience is one obvious entry point.

Example:

graph: .rwyn/graph.json
coverage_data: .rwyn/coverage-data
runs_dir: .rwyn/runs
jobs_dir: .rwyn/jobs

requirements:
  - id: rust-tests-pass
    description: Rust unit and integration tests pass

  - id: typescript-tests-pass
    description: TypeScript tests pass

  - id: bindings-current
    description: Generated Go bindings match the Solidity sources

  - id: contracts-tests-pass
    description: Solidity contracts tests pass

plugins:
  - id: solidity-interface-link
    type: relation
    from: "interfaces/**/*.sol"
    to: "src/**/*.sol"
    edge: imports
    match_rule:
      by: normalized_basename
      from_strip_prefix: I

  - id: solidity-bindings
    type: generate
    from: "src/**/*.sol"
    to: "bindings/**/*.go"
    match_rule:
      by: normalized_basename

profile_groups:
  - id: contracts_features
    profiles:
      - id: main
        env: {}
      - id: custom-gas-token
        env:
          SYS_FEATURE__CUSTOM_GAS_TOKEN: "true"
      - id: opcm-v2
        env:
          DEV_FEATURE__OPCM_V2: "true"

resources:
  cargo:
    max_parallel: 1

stages:
  save:
    default_confidence: medium
  commit:
    default_confidence: high
  merge:
    default_confidence: certain

steps:
  - id: rust-test
    name: Rust tests
    kind: test
    language: rust
    command: cargo test --all-targets --all-features
    tools: [cargo]
    resource_group: cargo
    inputs:
      - "src/**/*.rs"
    satisfies:
      - rust-tests-pass

  - id: bun-test
    name: Bun tests
    kind: test
    language: typescript
    command: bun test
    scopeable: true
    scope_flag: ""
    scope_type: test_paths
    tools: [bun]
    inputs:
      - "src/**/*.ts"
      - "src/**/*.tsx"
    scope_inputs:
      - "src/**/*.test.ts"
      - "src/**/*.test.tsx"
      - "src/**/*.spec.ts"
      - "src/**/*.spec.tsx"
    satisfies:
      - typescript-tests-pass

  - id: bun-test-coverage
    name: Bun test coverage
    kind: coverage
    language: typescript
    coverage_for: bun-test
    coverage_profile: default
    coverage_format: lcov
    coverage_language: typescript
    coverage_output: lcov.info
    coverage_junit: junit.xml
    command: >-
      bun test --coverage --coverage-reporter=lcov --coverage-dir {run_dir}
      --reporter=junit --reporter-outfile {junit} {args}
    parallel_safe: true
    scopeable: true
    scope_flag: ""
    scope_type: test_paths
    tools: [bun]
    inputs:
      - "src/**/*.ts"
      - "src/**/*.tsx"
    scope_inputs:
      - "src/**/*.test.ts"
      - "src/**/*.test.tsx"
      - "src/**/*.spec.ts"
      - "src/**/*.spec.tsx"

  - id: forge-test
    name: Forge tests
    kind: test
    language: solidity
    command: forge test
    inputs:
      - "src/**/*.sol"
      - "test/**/*.sol"
    impacted_by:
      profile_groups: [contracts_features]
    satisfies:
      - contracts-tests-pass

Coverage steps run from the repository root. {output}, {junit}, {go_test_json}, and {run_dir} expand to paths under the coverage work directory, and {args} expands to rwyn-rendered scope arguments. Use coverage_source_root when a native coverage report emits paths relative to a package or project directory instead of the repository root; rwyn stores coverage atoms as repo-relative paths but strips that source root again when it renders path scopes for the native command.

Use scope_inputs when a scoped command's dependency surface is broader than the runnable scope universe. inputs means "these files can affect this step"; scope_inputs means "these files/packages are valid values rwyn may pass to the runner as a scope." For example, a coverage command may depend on both src/**/*.sol and test/**/*.sol but only accept test/**/*.t.sol as --match-path targets.
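The distinction can be sketched with the Solidity example above. This is an illustration, not rwyn's matcher: `glob_to_regex` is a hypothetical helper that supports only `**/` and `*`, and the file paths are made up.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a small glob subset: '**/' spans directories, '*' stays in one segment."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern[i] == "*":
            out.append(r"[^/]*")     # anything except a path separator
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matches_any(path: str, patterns: list[str]) -> bool:
    return any(glob_to_regex(p).match(path) for p in patterns)


inputs = ["src/**/*.sol", "test/**/*.sol"]  # files that can affect the step
scope_inputs = ["test/**/*.t.sol"]          # files that are valid runnable scopes


def affects_step(path: str) -> bool:
    return matches_any(path, inputs)


def is_valid_scope(path: str) -> bool:
    return matches_any(path, scope_inputs)
```

With these patterns, a change to `src/L1/Portal.sol` affects the step but is never passed to the runner as a scope, while `test/L1/Portal.t.sol` is both an input and a valid scope value.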

Coverage reports should include explicit scope_targets whenever a scoped refresh produced the report. Without explicit targets, rwyn can only infer a scope from records that match the changed atom; records that merely appeared in the same broad report are not scope evidence. Coverage steps read that evidence through their coverage_for check, so the refresh command and the covered test plan use the same scope model.

When coverage refresh --all, coverage refresh --step <coverage-step>, or a forced full refresh targets a scopeable coverage step and no explicit --scope is provided, rwyn discovers the configured scope_inputs universe. By default it queues one refresh job per discovered scope. This is precise for native reports that only describe one requested package, file, crate, or test scope at a time: each report records the requested scope as scope_targets, so later stale atoms can map back to the narrowest runnable scope instead of inheriting every file that happened to appear in a full report.

If the native runner can accept many scopes in one command and still emit precise attribution, set batch_scopes: true; rwyn will pass the discovered scopes together as one refresh job and rely on the report's per-record attribution to recover the narrow evidence. If a batched refresh fails, rwyn retries by splitting the scope set and keeps successful smaller reports; the remaining failures name the scopes that still need repo-specific attention.

If the native runner is flaky, set coverage_attempts: N on the coverage step. rwyn will retry the same scoped coverage job before splitting it, and it only ingests coverage from a successful complete attempt. Keep retry loops outside native coverage commands when the native tool rewrites coverage artifacts during reruns.

If a repository has a convenience coverage command that chains multiple native coverage runners, split it into separate coverage steps when those runners accept different scope flags. A scoped coverage step should wrap one coherent native runner surface.
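The retry-then-split behavior for batched refreshes can be sketched as a recursive procedure. The README does not say how rwyn divides a failing scope set, so the halving strategy below is an assumption, and `refresh_with_split` and `run` are hypothetical names.

```python
from typing import Callable, Sequence


def refresh_with_split(
    scopes: Sequence[str],
    run: Callable[[Sequence[str]], bool],
    attempts: int = 1,
) -> tuple[list[list[str]], list[str]]:
    """Return (batches that succeeded, scopes that still fail when run alone)."""
    batch = list(scopes)
    if not batch:
        return [], []
    # Retry the same batch first; only a complete successful attempt counts.
    for _ in range(attempts):
        if run(batch):
            return [batch], []
    if len(batch) == 1:
        return [], batch  # cannot split further: report this scope as failing
    # Assumption: split in half and keep whatever smaller batches succeed.
    mid = len(batch) // 2
    ok_left, bad_left = refresh_with_split(batch[:mid], run, attempts)
    ok_right, bad_right = refresh_with_split(batch[mid:], run, attempts)
    return ok_left + ok_right, bad_left + bad_right


def fake_runner(batch: Sequence[str]) -> bool:
    """Stand-in for the native coverage command: any batch containing pkg/c fails."""
    return "pkg/c" not in batch


ok, failing = refresh_with_split(["pkg/a", "pkg/b", "pkg/c", "pkg/d"], fake_runner)
```

In this run the successful smaller batches are kept and only `pkg/c` is left over as the scope that needs attention.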

LCOV, Go coverprofile, and LLVM coverage JSON ingestion keeps both hit ranges and zero-hit executable ranges.

Foundry attribution JSON keeps per-test hit ranges, exact test identities, and record-level scope targets, so one full native coverage run can teach rwyn which tests and runnable test files exercise each source atom without rerunning coverage per atom. When a step uses scope_join: foundry_tests, rwyn preserves both dimensions: file/path evidence renders as --match-path, exact test evidence renders as --match-contract and --match-test, and records that have both use both filters together. If changed line coverage exists, rwyn scopes from the records covering those changed lines; otherwise it falls back to the file-level records for the changed file.

A report with only zero-hit executable code is valid coverage evidence; it should become uncovered, not fail ingestion.

For Rust LCOV from Cargo workspaces, rwyn can mechanically attribute source records to Cargo package scopes by walking to the nearest Cargo.toml package manifest. That lets a package-scoped cargo llvm-cov setup refresh exact Cargo package atoms even when the LCOV format only reports source paths.

When source files are available, rwyn filters native line ranges against the source text before creating coverage atoms. Blank lines, comment-only lines, and delimiter-only lines such as bare braces are ignored generically so coverage maps describe code, not formatter or syntax noise.

Coverage steps are not parallel by default. Set parallel_safe: true only after the command writes every variable artifact to rwyn-provided paths and does not share mutable repo-local output with another refresh job.
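The Cargo package attribution described above amounts to a walk up the directory tree. A minimal sketch follows, assuming the set of package manifests is already known; `nearest_package_dir` is a hypothetical helper and the paths are sample data.

```python
from pathlib import PurePosixPath
from typing import Optional


def nearest_package_dir(source: str, manifests: set[str]) -> Optional[str]:
    """Walk up from a source file to the closest directory holding a Cargo.toml.

    Assumes `manifests` lists only package manifests (repo-relative paths),
    not virtual workspace manifests.
    """
    for parent in PurePosixPath(source).parents:
        if str(parent / "Cargo.toml") in manifests:
            return str(parent)
    return None


manifests = {"crates/core/Cargo.toml", "crates/cli/Cargo.toml"}
core_dir = nearest_package_dir("crates/core/src/parser/mod.rs", manifests)
orphan = nearest_package_dir("docs/example.rs", manifests)
```

A source record under `crates/core/src/` maps to the `crates/core` package, while a file with no manifest above it maps to nothing and would need another attribution path.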

Profile groups are selective. A step without impacted_by runs once. A step with impacted_by.profile_groups expands across the named profiles, with each profile's env applied at execution time. Coverage freshness is tracked per coverage profile and per selected profile variant, so baseline coverage does not satisfy feature-flag coverage. Use impacted_by.profiles with group:profile entries when only specific profiles affect a step.

Profile selection uses evidence rather than a representative variant. When a changed plan item has multiple profile variants and rwyn has profile-specific coverage for the covered step, it compares the changed file and line atoms against each variant's coverage and selects the smallest deterministic set of variants that covers the observed changed atoms. If coverage cannot answer, rwyn falls back to all configured variants for that step.
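Selecting "the smallest deterministic set of variants" is a small set-cover problem. The sketch below is one way to do it, not rwyn's implementation: `select_variants` is a hypothetical function, atoms are shown as made-up `file:line` strings, and treating an uncovered changed atom as "coverage cannot answer" is an assumption.

```python
from itertools import combinations


def select_variants(changed: set[str], coverage: dict[str, set[str]]) -> list[str]:
    """Pick the smallest set of variants whose coverage includes every changed atom."""
    names = sorted(coverage)  # sorted order keeps the choice deterministic
    covered_anywhere = set().union(*coverage.values())
    if not changed or not changed <= covered_anywhere:
        return names  # coverage cannot answer: fall back to all configured variants
    # Try every combination in increasing size; the first hit is a smallest cover.
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if changed <= set().union(*(coverage[n] for n in combo)):
                return list(combo)
    return names


coverage = {
    "main": {"a.sol:10", "a.sol:11"},
    "custom-gas-token": {"a.sol:10", "b.sol:5"},
    "opcm-v2": {"c.sol:1"},
}
one = select_variants({"a.sol:10"}, coverage)
two = select_variants({"a.sol:11", "b.sol:5"}, coverage)
fallback = select_variants({"z.sol:1"}, coverage)
```

The exhaustive search is exponential in the number of variants, which is acceptable for the handful of profiles a group typically declares; a greedy cover would be faster but is not guaranteed to be smallest.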

Scopes propagate through declared prerequisites when both steps explicitly declare compatible scope_type values. For example, if a scoped test requires a build step and both steps use compatible path scopes, rwyn carries the selected test paths back to the build step and renders them with the build step's own scope_flag/scope_join settings. Non-scopeable prerequisites and incompatible scope types stay unscoped.

Use scope_type: go_packages for Go commands whose native target is an exact package path such as ./pkg/service. Use scope_type: packages for package systems whose rendered scope is intentionally recursive, such as ./pkg/service/.... Use scope_type: compile_paths for build commands whose native target is a file path to compile. Direct source changes become direct file scopes, while compatible test path scopes can still propagate backward from dependent test or coverage steps.

  - id: go-test
    name: Go tests
    kind: test
    language: go
    command: go test
    scopeable: true
    scope_type: go_packages
    inputs:
      - "**/*.go"

  - id: go-test-coverage
    name: Go test coverage
    kind: coverage
    language: go
    coverage_for: go-test
    coverage_profile: go-cover
    coverage_format: go_coverprofile
    coverage_language: go
    coverage_output: coverage.out
    command: "go test -coverprofile={output} {args}"
    parallel_safe: true
    scopeable: true
    scope_type: go_packages
    inputs:
      - "**/*.go"
    requires:
      - generated-fixtures

If broad coverage needs package discovery, generated artifacts, service setup, or CI-like environment, model those as normal steps and add them to requires.

Explicit CLI flags still override config when needed.

Requirements

Requirements are first-class declared objects. Each one names a property the repository wants to hold; steps reference requirements to declare what they provide evidence for.

A requirement describes:

  • identity (id, optional human-readable name)
  • description
  • optional confidence override (replaces the stage default for this requirement when relevant)

Example:

requirements:
  - id: rust-tests-pass
    description: Rust unit and integration tests pass

  - id: security-checks-pass
    description: All critical security checks pass
    confidence: certain    # always certain, regardless of stage default

  - id: bindings-current
    description: Generated Go bindings match the Solidity sources

Steps reference requirements by id, with relationship strength:

  • satisfies: the step's success fully addresses the requirement
  • evidence_for: the step is candidate evidence for the requirement; its lift is learned from outcomes

Example:

steps:
  - id: cargo-fmt-check
    satisfies:
      - formatting-clean

  - id: rust-test
    satisfies:
      - rust-tests-pass
    evidence_for:
      - bindings-current     # rust tests indirectly exercise generated bindings

evidence_for contributes zero confidence until the planner has enough observed outcomes to calibrate the lift. The declaration marks the step as candidate evidence: when it runs (because it satisfies something else, or because it is cheap), its outcomes accumulate against the requirement and a learned weight emerges over time.

A mutating step (a formatter applying fixes) and a non-mutating step (a formatter in check mode) are two different steps. Each can declare stage applicability — stages: [list] to limit to specific stages, exclude_stages: [list] to remove specific ones, or neither to apply at every stage. The planner picks from stage-eligible steps:

steps:
  - id: cargo-fmt
    kind: format
    mutating: true
    stages: [save, commit]
    satisfies:
      - formatting-clean

  - id: cargo-fmt-check
    kind: format
    stages: [merge, push]
    satisfies:
      - formatting-clean

Mutation is a step property recorded on the step itself; behavior across stages is controlled by which step is listed where.
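The stage applicability rule can be sketched as a small predicate. This is an illustration with steps modeled as plain dictionaries; `applies_at` is a hypothetical name, and the handling of a step that declares both fields is an assumption.

```python
def applies_at(step: dict, stage: str) -> bool:
    """stages: is an allowlist, exclude_stages: a blocklist, neither means every stage."""
    allow = step.get("stages")
    if allow is not None and stage not in allow:
        return False
    # Assumption: when both fields are present, the blocklist is applied as well.
    return stage not in step.get("exclude_stages", [])


steps = [
    {"id": "cargo-fmt", "mutating": True, "stages": ["save", "commit"]},
    {"id": "cargo-fmt-check", "stages": ["merge", "push"]},
    {"id": "rust-test"},  # no stage fields: eligible everywhere
]
at_save = [s["id"] for s in steps if applies_at(s, "save")]
at_merge = [s["id"] for s in steps if applies_at(s, "merge")]
```

At `save` the mutating formatter is eligible and the check-mode one is not; at `merge` the roles swap, which is how the two formatter steps divide the lifecycle between them.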

Steps And Execution

A step describes:

  • identity and kind
  • command
  • inputs and outputs
  • explicit prerequisites for non-file dependencies (requires:)
  • which requirements it satisfies or provides evidence_for
  • stage applicability (stages: to allowlist, exclude_stages: to blocklist; defaults to all stages)
  • whether it mutates (mutating: true)
  • whether and how it can be scoped
  • toolchain requirements
  • required environment variables
  • optional dynamic evidence such as coverage

Explicit step invocation uses the normal planner and executor, so prerequisites, layering, and evidence rules still apply.

Examples:

rwyn plan --step rust-test
rwyn plan --step bun-test --scope src/foo.test.ts
rwyn plan --step lint --step test --step-scope test=src/foo.ts

rwyn run always executes; rwyn plan never does. They share the same arg shape, so any preview is the same invocation with plan instead of run.

Step ordering is derived from declared inputs and outputs by default. If step B's inputs include a path that step A's outputs produce — directly, or via a generate-type plugin relationship — the planner runs A before B without anything explicit.

For dependencies that are not file-based (a service that must be running, a setup script that exports environment, a remote resource that must be initialized), declare them explicitly with requires:

steps:
  - id: db-migrate
    kind: setup
    command: ./scripts/migrate.sh
    stages: [save, commit, merge]

  - id: integration-test
    kind: test
    command: bun test --integration
    requires: [db-migrate]
    inputs:
      - "src/**/*.ts"
    satisfies:
      - integration-tests-pass

The planner combines implicit (file-derived) and explicit (requires:) ordering into a single dependency graph and executes steps in valid topological order. Cycles are surfaced by rwyn doctor.
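Combining the two kinds of edges can be sketched with Python's standard `graphlib`. This is not rwyn's planner: `plan_order` is a hypothetical function, `fnmatchcase` stands in for the real glob matcher, and the `generate-bindings` step and its paths are made up for the example.

```python
from fnmatch import fnmatchcase
from graphlib import CycleError, TopologicalSorter


def plan_order(steps: dict[str, dict]) -> list[str]:
    """Order steps so producers and explicit prerequisites run first."""
    # Explicit edges: each step depends on everything in its requires: list.
    graph = {name: set(step.get("requires", [])) for name, step in steps.items()}
    # Implicit edges: a consumer depends on any step whose outputs match its inputs.
    for consumer, c in steps.items():
        for producer, p in steps.items():
            if producer != consumer and any(
                fnmatchcase(out, pattern)
                for out in p.get("outputs", [])
                for pattern in c.get("inputs", [])
            ):
                graph[consumer].add(producer)
    # static_order() raises CycleError when the graph contains a cycle.
    return list(TopologicalSorter(graph).static_order())


steps = {
    "generate-bindings": {"outputs": ["bindings/vault.go"]},
    "go-test": {"inputs": ["bindings/*.go"]},
    "db-migrate": {},
    "integration-test": {"requires": ["db-migrate"], "inputs": ["src/*.ts"]},
}
order = plan_order(steps)
```

Any valid order places `generate-bindings` before `go-test` and `db-migrate` before `integration-test`. A pair of steps that require each other raises `CycleError`, which is the kind of problem the text says rwyn doctor surfaces.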

If independent steps contend for the same local resource, declare scheduling constraints instead of fake prerequisites. requires: means semantic dependency; resources and mutexes only limit concurrency inside an otherwise parallel layer.

resources:
  cargo:
    max_parallel: 1

steps:
  - id: crate-a-test
    kind: test
    command: cargo test -p crate-a
    resource_group: cargo

  - id: db-integration
    kind: test
    command: ./scripts/integration-db.sh
    mutexes: [local-test-db]

rwyn plan still shows prerequisite layers separately, and includes a resources section when selected steps have scheduling constraints.

Steps can also declare environment contracts:

steps:
  - id: slice-v5
    name: Slice adapter v5
    kind: test
    language: typescript
    command: bun run scripts/integration-run.ts --adapter v5
    tools: [bun]
    required_env:
      - FELDERA_API_URL
      - FELDERA_API_TOKEN

CI Integration

rwyn integrates with existing CI systems:

  • CI remains the execution substrate
  • rwyn becomes the planner and executor
  • local development, agents, and CI all use the same verification model

rwyn works when adopted entirely locally. When local development and CI are both routed through it, plans, evidence, and outcomes reinforce each other over time.

A CI setup looks like:

- name: Install rwyn
  run: curl -fsSL https://get.rwyn.dev/install.sh | sh

- name: Run merge-stage verification
  run: rwyn run --stage merge

CI bootstrap commands look like:

rwyn ci init github-actions
rwyn ci init circleci
rwyn ci doctor

Coverage And Dynamic Evidence

rwyn treats coverage as one dynamic execution signal among many, used for scoping and confidence updates.

Coverage and other evidence are:

  • incremental
  • atom-based
  • scope-aware when a stale atom can be mapped back to a runnable scope
  • freshness-aware by source fingerprint
  • explicit about zero-hit executable code as uncovered
  • exploratory for new or otherwise unobserved changed code
  • reusable across local runs, CI, and agents

Examples:

rwyn coverage status
rwyn coverage refresh
rwyn coverage refresh --background
rwyn coverage refresh --all --background
rwyn coverage refresh --all --background --max-parallel 4
rwyn coverage refresh --all --dry-run --json --full-statuses
rwyn coverage jobs

Executed plans also produce normalized run records that feed replay, compare, and gaps. Calibration of evidence weights from those records happens automatically; the loop that uses them to improve the model over time is described in How The Model Is Built And Improved.

Repository Modeling And Plugins

Repo-specific structure lives in declarative repository knowledge: hidden dependency relationships, generated-artifact relationships, path-to-scope derivation, and any repository-specific structure that affects relevance or confidence.

The plugin DSL is extensible. New types can be added as the engine learns new repository patterns; the existing types remain stable.

Semantic Comment Patterns

rwyn normally treats comment-only source diffs as text-only so doc/comment edits do not fan out into full semantic checks. Some ecosystems put executable metadata in comments, such as inline test-runner directives. Declare those as config, not core heuristics:

semantic_comment_patterns:
  - id: test-runner-directives
    paths:
      - "tests/**/*.ext"
    contains: "runner-config:"

Use contains: for stable literals or regex: when the directive has multiple spellings. Matching added or removed lines are treated as semantic changes for planning.
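The matching rule can be sketched as follows. This is an illustration only: `is_semantic_line` and `comment_diff_is_semantic` are hypothetical helpers, path filtering is omitted, and the sample directive spellings are invented.

```python
import re
from typing import Iterable, Optional


def is_semantic_line(
    text: str, contains: Optional[str] = None, regex: Optional[str] = None
) -> bool:
    """True when a changed comment line matches a declared directive pattern."""
    if contains is not None and contains in text:
        return True
    if regex is not None and re.search(regex, text):
        return True
    return False


def comment_diff_is_semantic(changed_lines: Iterable[str], **pattern) -> bool:
    """A comment-only diff is semantic if any added or removed line matches."""
    return any(is_semantic_line(line, **pattern) for line in changed_lines)


directive = comment_diff_is_semantic(
    ["// runner-config: timeout=30"], contains="runner-config:"
)
typo_fix = comment_diff_is_semantic(["// fix typo in docs"], contains="runner-config:")
```

A diff that touches the directive line is planned like a code change, while an ordinary comment edit stays text-only.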

relation

Declares an edge between two sets of files. When a file in from: changes, files matched in to: are treated as semantically affected, and the planner uses the edge during relevance computation. The edge: label is a free-form string that surfaces in explain output ("touched via interface link") but does not drive planning logic — the planner cares that an edge exists, not what it is named.

- id: solidity-interface-link
  type: relation
  from: "interfaces/**/*.sol"
  to: "src/**/*.sol"
  edge: imports
  match_rule:
    by: normalized_basename
    from_strip_prefix: I
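How normalized_basename pairing with from_strip_prefix: I might work can be sketched as below. The normalization details are assumptions (this version drops the directory and extension and strips the prefix only from the from: side), and `normalized_basename` and `pair_by_basename` are hypothetical helpers operating on made-up file names.

```python
from pathlib import PurePosixPath


def normalized_basename(path: str, strip_prefix: str = "") -> str:
    """File name without directory or extension, minus an optional prefix."""
    stem = PurePosixPath(path).stem
    if strip_prefix and stem.startswith(strip_prefix):
        stem = stem[len(strip_prefix):]
    return stem


def pair_by_basename(
    from_files: list[str], to_files: list[str], from_strip_prefix: str = ""
) -> dict[str, list[str]]:
    """Map each from: file to the to: files that share its normalized basename."""
    by_name: dict[str, list[str]] = {}
    for target in to_files:
        by_name.setdefault(normalized_basename(target), []).append(target)
    return {
        src: by_name.get(normalized_basename(src, from_strip_prefix), [])
        for src in from_files
    }


edges = pair_by_basename(
    ["interfaces/IVault.sol", "interfaces/IOracle.sol"],
    ["src/core/Vault.sol", "src/core/Registry.sol"],
    from_strip_prefix="I",
)
```

Here `interfaces/IVault.sol` links to `src/core/Vault.sol`, and `IOracle.sol` links to nothing because no `Oracle.sol` exists. A plain prefix strip would also turn a file such as `Inbox.sol` into `nbox`, so a real matcher may need a stricter rule.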

generate

Declares that files in from: produce files in to:. Two effects: a generator step runs before any step that consumes the output, and changes to from: invalidate the freshness of the corresponding to: files until regenerated.

- id: solidity-bindings
  type: generate
  from: "src/**/*.sol"
  to: "bindings/**/*.go"
  match_rule:
    by: normalized_basename

scope

Derives an execution scope for a scopeable step from a change. When changed files match from:, the named target_step:'s scope becomes the matching to: paths. Lets the planner narrow a broad step to the part of the repo a change actually affects, instead of running it across everything.

- id: typescript-tests-by-module
  type: scope
  target_step: bun-test
  from: "src/**/*.ts"
  to: "tests/**/*.test.ts"
  match_rule:
    by: normalized_basename

match_rule.by Modes

How from: and to: glob matches are paired. Three modes ship today; more may be added as the engine grows.

| Mode | Behavior | Notes |
| --- | --- | --- |
| normalized_basename | Match by filename, stripped of optional prefix/suffix | Use from_strip_prefix / from_strip_suffix to normalize before comparison |
| directory_path | Match by directory path | Useful for "src/X/* maps to tests/X/*" style mappings |
| regex | Capture groups in from:, substitution in to: | Most flexible escape hatch; use when neither basename nor directory matching fits |

The goal is to keep repository truth in the repository model itself.

Plugins

Claude Code

This repo includes an official Claude Code plugin and marketplace layout.

Local testing:

claude --plugin-dir ./plugins/rwyn

Public install after adding this repo as a marketplace:

claude plugin marketplace add smartcontracts/rwyn
claude plugin install rwyn@rwyn-plugins

Codex

This repo also includes a Codex plugin scaffold.

The bundled skill content lives under plugins/rwyn/skills/.

These skills are reference drivers for the loop described in How The Model Is Built And Improved. They demonstrate one good bootstrap-and-iteration flow against rwyn's diagnostic surface; users can replace them with their own.

Notable bundled skills:

  • rwyn: Operate and debug an existing rwyn workflow.
  • setup: Inspect a repo, scaffold .rwyn/config.yaml, and add declarative transforms.
  • doctor: Diagnose a repo's rwyn setup and verification surface.
  • select: Explain and inspect the chosen plan for a change.
  • plan: Preview what rwyn would execute without running it.
  • explain: Explain why a file or change selected a given plan item.

Benchmarking

There is a parity harness at scripts/benchmark-parity.sh.

It compares rwyn against a legacy selector on a commit corpus and reports:

  • selected item count
  • missing selections vs legacy
  • extra selections vs legacy
  • per-commit runtime

Development

Run the core checks locally:

cargo fmt
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-targets --all-features

License

MIT, see LICENSE.
