feat: Issue #18 Phase 1 — autonomous agent orchestration baseline by stevei101 · Pull Request #50 · stevedores-org/oxidizedgraph

stevei101 · 2026-06-02T23:21:06Z

Summary

Implements Phase 1 of the autonomous AI agent orchestration roadmap (EPIC1, EPIC3, EPIC4 baseline):

execution — RunContext, append-only TransitionLog, TracedRunner, ReplayRunner, StateValidator for traceable, auditable runs
tools — ToolPolicyEngine (fail-closed, blocked patterns, approval routing), SubprocessSandbox (timeout + cwd roots), ToolNodeConfig (policy + per-tool timeout)
guardrails — QualityGateNode, ReviewFinding schema, RiskClassifier, merge-blocker routing (passed / gate_failed / needs_approval)

Also adds autonomous_dev_workflow example, docs/ROADMAP-18.md, README roadmap updates, and a cargo test job in validate.yml for PRs touching Rust sources.

Closes #22 (EPIC4 Code Quality Guardrails baseline).

Test plan

cargo test (107 unit + 9 integration tests)
cargo run --example autonomous_dev_workflow
Verify CI rust job on this PR

Made with Cursor

Deliver EPIC1 traced execution, EPIC3 tool policy/sandbox, and EPIC4 quality gates so agent workflows can run with traceability, bounded tool risk, and CI-style merge routing before shipping changes.

Closes #22

Co-authored-by: Cursor <cursoragent@cursor.com>

stevei101

Code review

Overview

Phase 1 of the autonomous-agent orchestration roadmap (#18). Three new modules — execution (RunContext, TransitionLog, TracedRunner, ReplayRunner, StateValidator), tools (ToolPolicyEngine, SubprocessSandbox, ToolNodeConfig), guardrails (QualityGateNode, ReviewFinding, RiskClassifier, merge routing) — plus an example, docs, and a cargo test CI job. 2007 LOC, 22 files, 107 + 9 tests.

Solid baseline. The shape (per-module types, clean re-exports through prelude, dependency injection via traits for the command runner / sandbox executor) is sound and matches the existing project style. Issues below are mostly hygiene + a few real bugs.

Correctness — real bugs

1. truncate_log will panic on multi-byte UTF-8. src/guardrails/gate.rs:
```
fn truncate_log(s: &str, max: usize) -> String {
    if s.len() <= max { s.to_string() } else { format!("{}…", &s[..max]) }
}
```
&s[..max] panics if max lands inside a multi-byte char. Tool output (especially cargo output containing emoji or other non-ASCII symbols) routinely contains multi-byte UTF-8. Use chars().take(max).collect::<String>() (which counts characters rather than bytes) or floor_char_boundary.
2. Policy blocked_patterns bypassed for RequireApproval tools. src/tools/policy.rs::evaluate checks approval_required before blocked_patterns. So an approval-required shell tool whose command contains sudo or rm -rf / will route through approval and (presumably) execute on human OK, but the human shouldn't even see those. Reorder: deny → blocked_patterns → approval_required → allow → fail-closed.
3. QualityGateConfig.block_on_failure is dead. It is set in rust_defaults() and the example but never read by QualityGateNode::execute. The node routes purely off merge_blocker.blocked. Either honour the flag or drop the field.
4. ReviewFinding::error(...).at("Cargo.toml", 1) is a fake location. src/guardrails/gate.rs::run_checks always tags failed-check findings as Cargo.toml:1 regardless of which check failed. This is misleading in PR comments. Drop the .at(...) call when there's no real location, or parse the location out of the check output.
5. ReplayRunner doesn't replay; it compares. The naming oversells. ReplayRunner::compare(expected, actual) is a log diff. The actual primitive needed to "validate deterministic replay" is to re-run the graph from a recorded log + initial state and check that the produced log matches. Consider renaming to TransitionLogDiff or wiring up an actual re-execution path (the stub output_from_kind hints at the intent but is dead code).
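
For the truncate_log panic, a minimal sketch of a boundary-safe variant follows. It keeps the byte-based limit of the quoted function and walks back to the nearest character boundary with str::is_char_boundary; this is one possible fix under that assumption, not necessarily the one the project adopted.

```rust
/// Truncate `s` to at most `max` bytes without splitting a UTF-8 character.
/// Sketch only: same signature as the quoted function, byte-based limit kept.
fn truncate_log(s: &str, max: usize) -> String {
    if s.len() <= max {
        return s.to_string();
    }
    // Walk back from `max` to the nearest char boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}…", &s[..end])
}
```

With this version, a limit that lands inside a multi-byte character shortens the output by up to three bytes instead of panicking.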

Correctness — minor

6. Iteration counter type mismatch. TracedRunner::run_loop uses iterations: u32; state.iteration is usize; TransitionRecord has both (iteration: u32, state_iteration: usize). Pick one type and propagate it.
7. resolve_next silently routes Continue(None) to END but warns on a missing Transition(key) edge. This is inconsistent. Either both should warn, or document why Continue(None) ending the graph is silent by design.
8. Policy command_hint extraction assumes arguments["command"]. Many shell tools use script, cmd, or bash_cmd instead. Document the convention or take the argument name as a config field on the policy.
9. SubprocessSandbox::validate_working_dir uses blocking std::fs::canonicalize inside async code. It won't deadlock the Tokio runtime, but it does block a worker thread for the duration of the filesystem call, which is a smell. Use tokio::fs::canonicalize or spawn_blocking.
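
For the command_hint point, one option is to make the argument names configurable. The sketch below is illustrative only: CommandHintConfig, arg_keys, and with_keys are hypothetical names, and a HashMap<String, String> stands in for whatever JSON argument type the real policy receives.

```rust
use std::collections::HashMap;

/// Hypothetical config: which argument keys may carry the shell command.
struct CommandHintConfig {
    arg_keys: Vec<String>,
}

impl Default for CommandHintConfig {
    /// Mirrors the behaviour described in the review: only "command" is checked.
    fn default() -> Self {
        Self { arg_keys: vec!["command".to_string()] }
    }
}

impl CommandHintConfig {
    fn with_keys(keys: &[&str]) -> Self {
        Self { arg_keys: keys.iter().map(|k| k.to_string()).collect() }
    }

    /// Return the first configured argument that is present, in key order.
    fn command_hint<'a>(&self, arguments: &'a HashMap<String, String>) -> Option<&'a str> {
        self.arg_keys
            .iter()
            .find_map(|key| arguments.get(key).map(String::as_str))
    }
}
```

A tool that passes its command as script is invisible to the default config but is picked up once script is added to the key list, which is the gap the review item describes.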

Architecture / convention

10. src/nodes/quality_gate.rs is just a re-export of src/guardrails. That gives two import paths for the same types (crate::nodes::QualityGateNode and crate::guardrails::QualityGateNode). Pick one canonical location. The guardrails home reads better: built-in node modules should be discoverable via nodes only when they don't have a richer home.
11. SubprocessSandbox is a subprocess runner with a timeout, not a sandbox. It does:
- Timeout ✅
- cwd allowlist ✅ (but default = empty = any cwd)
- sh -c "$cmd": no fs/process/network isolation, classic shell-injection surface
Either rename to SubprocessRunner or be explicit in the docstring that this is the absolute minimum and a "real" sandbox needs namespaces/seccomp/etc. The current name suggests a security boundary it doesn't provide.
12. SandboxConfig::default() allows any cwd. A sandbox that defaults to "no cwd restriction" is barely a sandbox. Either tighten the default or rename the constructor to permissive.
13. Asymmetric ReviewFinding constructors. Only ::error exists. Add ::info and ::warning for symmetry; otherwise people will reach for struct-literal syntax, which couples callers to internal fields.
14. use super::transition::TransitionLog; appears after the TracedRunResult definition. Stylistic: Rust allows it, but conventional placement is at the top with the other use declarations. The current order made me re-read to confirm scope.
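
To illustrate the SandboxConfig default point, here is a sketch of a deny-by-default configuration with an explicit permissive() constructor. The field names (allowed_roots, allow_any_cwd) and the with_root helper are assumptions for illustration and may not match the PR's actual struct; the check is synchronous to keep the example dependency-free.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical shape of the sandbox config; field names are assumptions.
#[derive(Debug, Clone)]
struct SandboxConfig {
    allowed_roots: Vec<PathBuf>,
    allow_any_cwd: bool,
}

impl Default for SandboxConfig {
    /// Deny by default: no roots configured means no cwd is accepted.
    fn default() -> Self {
        Self { allowed_roots: Vec::new(), allow_any_cwd: false }
    }
}

impl SandboxConfig {
    /// Opting out of the cwd restriction has to be spelled out by the caller.
    fn permissive() -> Self {
        Self { allowed_roots: Vec::new(), allow_any_cwd: true }
    }

    fn with_root(mut self, root: impl Into<PathBuf>) -> Self {
        self.allowed_roots.push(root.into());
        self
    }

    /// Canonicalize `cwd` and accept it only if it sits under an allowed root.
    fn validate_working_dir(&self, cwd: &Path) -> io::Result<PathBuf> {
        let cwd = cwd.canonicalize()?;
        if self.allow_any_cwd {
            return Ok(cwd);
        }
        for root in &self.allowed_roots {
            // Roots that cannot be canonicalized (e.g. missing) never match.
            if let Ok(root) = root.canonicalize() {
                if cwd.starts_with(&root) {
                    return Ok(cwd);
                }
            }
        }
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("{} is outside the allowed roots", cwd.display()),
        ))
    }
}
```

Canonicalizing both sides before the starts_with comparison keeps symlinks and .. segments from slipping past the allowlist. This only restricts the working directory; it adds no filesystem, process, or network isolation.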

Performance

15. QualityGateNode::run_checks runs checks sequentially. For independent gates (fmt, clippy, test), tokio::join!/buffer_unordered would run them concurrently. Big win for the canonical Rust defaults: fmt is ~1s, clippy 30s+, test 60s+, and they're independent.
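
The idea can be sketched without any async runtime. The example below swaps in std::thread::scope as a dependency-free stand-in for tokio::join!; the Check alias, CheckResult, and run_checks_parallel are hypothetical names, and a real gate would spawn cargo subprocesses rather than call plain functions.

```rust
use std::thread;

/// Stand-in for a quality check; a real gate would run a subprocess here.
type Check = fn() -> bool;

#[derive(Debug)]
struct CheckResult {
    name: String,
    passed: bool,
}

/// Run all checks concurrently and return results in the original order.
fn run_checks_parallel(checks: Vec<(&str, Check)>) -> Vec<CheckResult> {
    thread::scope(|scope| {
        // Start every check first so they overlap in time.
        let handles: Vec<_> = checks
            .into_iter()
            .map(|(name, check)| (name.to_string(), scope.spawn(check)))
            .collect();

        // Join in submission order; a panicking check counts as a failure.
        handles
            .into_iter()
            .map(|(name, handle)| CheckResult {
                name,
                passed: handle.join().unwrap_or(false),
            })
            .collect()
    })
}
```

Total wall-clock time is then bounded by the slowest check rather than the sum of all three. Results are collected in submission order so findings stay stable between runs.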

Test coverage

16. No test for the policy bypass bug (#2).
17. No test for SubprocessSandbox::validate_working_dir.
18. No test for StateValidator's schema-based path; only the require_key path is exercised.
19. ReplayRunner tests only test the comparator, not full replay. Reinforces the naming concern (#5).
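
A regression test for the policy bypass (#2) only needs an approval-required tool and a command that matches a blocked pattern. The sketch below uses stand-in types: Policy, Decision, and shell_policy are hypothetical and do not reproduce the real ToolPolicyEngine API. The evaluation order is the one proposed in the review.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny(String),
    RequireApproval,
}

/// Stand-in for the policy engine's configuration (names are assumptions).
#[derive(Default)]
struct Policy {
    denied_tools: Vec<String>,
    blocked_patterns: Vec<String>,
    approval_required: Vec<String>,
    allowed_tools: Vec<String>,
}

impl Policy {
    /// Order: deny -> blocked_patterns -> approval_required -> allow -> fail-closed.
    fn evaluate(&self, tool: &str, command: &str) -> Decision {
        if self.denied_tools.iter().any(|t| t == tool) {
            return Decision::Deny(format!("tool `{tool}` is denied"));
        }
        if let Some(p) = self.blocked_patterns.iter().find(|p| command.contains(p.as_str())) {
            return Decision::Deny(format!("command matches blocked pattern `{p}`"));
        }
        if self.approval_required.iter().any(|t| t == tool) {
            return Decision::RequireApproval;
        }
        if self.allowed_tools.iter().any(|t| t == tool) {
            return Decision::Allow;
        }
        // Fail closed: anything not explicitly listed is rejected.
        Decision::Deny(format!("tool `{tool}` is not allowlisted"))
    }
}

/// Example policy: `shell` needs approval, and two patterns are always blocked.
fn shell_policy() -> Policy {
    Policy {
        blocked_patterns: vec!["sudo".to_string(), "rm -rf /".to_string()],
        approval_required: vec!["shell".to_string()],
        ..Policy::default()
    }
}
```

The key assertion is that evaluate("shell", "sudo rm -rf /") returns Deny rather than RequireApproval, while a harmless command on the same tool still routes to approval.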

CI

20. cargo test job has no paths: filter. Will run on doc-only PRs. Minor.

Security

21. The big one is #11: the "Sandbox" branding. If autonomous agents in the wild end up trusting SubprocessSandbox because its name suggests isolation, the project may inherit incidents. Either ship real isolation primitives (rootless containers, seccomp profile, network namespace) or pick a name that doesn't claim more than it delivers.
22. Policy RequireApproval has no approval primitive yet. The PR description claims "approval routing"; the actual flow returns an error message in ToolResult. There's no human-in-the-loop handshake. Worth flagging in docs/ROADMAP-18.md or the PR body what "approval routing" means vs. what's deferred.

Priority for follow-up

Fix #1 (UTF-8 panic) — real crash risk.
Fix #2 (policy ordering) — security correctness.
Fix #4 (Cargo.toml:1 fake location) — quality of agent outputs.
Resolve #11 (Sandbox naming/scope) — sets expectations for downstream consumers.
Either honour or remove #3 (block_on_failure).

Nothing here blocks the Phase 1 baseline landing — happy to see the surface area shape up. Worth a fixup pass on the bugs before Phase 2 builds on it.

stevei101 commented Jun 3, 2026

This was referenced Jun 3, 2026

Status: Issue #18 roadmap tracker #51

Open

Roadmap: Autonomous AI Agent Orchestration for Code Development #18

Open

stevei101 merged commit c5a5f0b into develop Jun 3, 2026
5 checks passed

stevei101 deleted the feat/issue-18-autonomous-orchestration-phase1 branch June 3, 2026 22:36

stevei101 mentioned this pull request Jun 3, 2026

fix: quality gate medium-risk review routing (Issue #18 follow-up) #52

Merged

3 tasks

stevei101 merged 1 commit into develop from feat/issue-18-autonomous-orchestration-phase1
