A thin project-orchestration layer over a safe-by-construction local-execution substrate for mini-swe-agent–style coding agents.
JustAi sits between an engineering goal and the bash actions that fulfill it. It decomposes the goal into chunks sized to a budgeted bash-action loop, dispatches each chunk through a sandboxed runner, classifies failures with a structured taxonomy, and learns from every run. The substrate underneath — safe-mini — is what keeps that loop trustworthy on private repos: scoped worktrees, scrubbed environment, command and path guards, and an incident artifact emitted for every action.
Coding agents that decide one bash command at a time are powerful and dangerous in the same breath. A single misstep — a stray rm -rf, an accidental .env commit, a misclassified shell expansion — turns a productive run into an incident. Most current agentic frameworks address this by limiting capability: smaller toolboxes, narrower allowlists, more layers of approval.
JustAi takes the opposite approach.
Sandbox the boundary, not the capability.
Inside a properly-scoped boundary — a fresh worktree, scrubbed env, guarded path — give the agent broad capability. Outside the boundary, deny by default. Agents can solve real engineering tasks end-to-end, with leaks made structurally impossible rather than rule-checked away.
JustAi is the orchestration UX on top of that boundary. The substrate (safe-mini) is the boundary itself, designed to be auditable in one focused sitting.
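As a rough illustration of "sandbox the boundary", the sketch below shows two of the guards named above: a scrubbed environment and a path guard. It is not safe-mini's code; `SAFE_ENV_KEYS`, `scrub_env`, and `inside_boundary` are hypothetical names chosen for this example, and the allowlist contents are an assumption.

```python
from pathlib import Path

# Hypothetical allowlist: only these variables survive into the agent's shell.
SAFE_ENV_KEYS = {"PATH", "HOME", "LANG", "TERM"}


def scrub_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env that keeps only allowlisted keys (deny by default)."""
    return {key: value for key, value in env.items() if key in SAFE_ENV_KEYS}


def inside_boundary(worktree: Path, candidate: str) -> bool:
    """True only if candidate resolves to a location inside the worktree.

    resolve() collapses '..' segments and follows symlinks, so a path that
    escapes the worktree by either route is rejected.
    """
    root = worktree.resolve()
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

Inside the worktree nothing is filtered by command type; the guard only asks where an action lands, which is the sense in which capability stays broad while the boundary stays narrow.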
- Engineering teams who want autonomous coding-agent runs on their actual private code without unbounded blast radius.
- Researchers who want capability and safety as separable axes rather than a single dial.
- Anyone who's noticed that "make the agent more careful" doesn't scale, and is looking for a structural answer.
- Not a model. Bring your own LLM via LiteLLM.
- Not a benchmark harness: that's `local-resident`'s job (separate repo, post-closure).
- Not a runtime substrate by itself: the substrate is `safe-mini` (separate repo, post-closure).
- Not a finished product: currently a public demo and stabilization artifact. See Status.
mini-swe-agent decides one bash command at a time within a budget — about a hundred lines of agent loop. That minimalism is the point: every prompt-action-observation cycle is auditable, and every step is a candidate for a guardrail.
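The shape of such a loop can be sketched in a few lines. This is an illustrative reduction, not mini-swe-agent's actual implementation: `Policy`, `run_bash`, and `agent_loop` are hypothetical names, and the policy stands in for the LLM call.

```python
import subprocess
from typing import Callable, Optional

# A policy maps the transcript so far to the next bash command, or None to stop.
Policy = Callable[[list[tuple[str, str]]], Optional[str]]


def run_bash(command: str, timeout: float = 30.0) -> str:
    """Run one command and return combined stdout and stderr as the observation."""
    proc = subprocess.run(
        ["bash", "-c", command],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.stdout + proc.stderr


def agent_loop(
    policy: Policy,
    move_budget: int,
    execute: Callable[[str], str] = run_bash,
) -> list[tuple[str, str]]:
    """Run at most move_budget action/observation cycles, recording each one."""
    transcript: list[tuple[str, str]] = []
    for _ in range(move_budget):
        command = policy(transcript)
        if command is None:  # the policy declares the task finished
            break
        # A guardrail would sit here, between the decision and the execution.
        transcript.append((command, execute(command)))
    return transcript
```

Because `execute` is injectable, the same loop can run against a real shell, a sandboxed runner, or a stub in tests.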
The bet underneath JustAi: capability and safety should compose, not trade off. A capable agent doesn't need to be careful, if its environment makes carelessness structurally impossible. The substrate is where you spend the safety budget; the orchestrator is where you spend the productivity budget.
This pattern was validated across 54 controlled trials (6 task families × 9 configs) before this release.
Headline finding. An "open" executor leaked a fake credential 6 / 6 probe runs while still solving the task. The "safe" executor blocked 6 / 6 probes and still solved 6 / 6 tasks. Capability is preserved; the leak surface isn't.
Other findings: the reproduce_first workflow averaged 2 steps versus 3 for inspect_first; headtail and structured observations beat pure tail on noisy output, because tail dropped early failure clues; JSON and fenced-bash action protocols were equivalent in deterministic tests, leaving malformed-action rate as the open question for live-model evaluation.
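To make the tail versus headtail difference concrete, here is a minimal sketch of both truncation policies. The function names and the even head/tail split are assumptions for illustration; the actual observation policies live in the substrate and may differ.

```python
def tail(lines: list[str], limit: int) -> list[str]:
    """Keep only the last `limit` lines."""
    return lines[-limit:] if limit > 0 else []


def headtail(lines: list[str], limit: int) -> list[str]:
    """Keep the first and last lines, with a marker where the middle was cut.

    The marker is one extra line on top of `limit`.
    """
    if len(lines) <= limit:
        return list(lines)
    head_n = limit // 2
    tail_n = limit - head_n
    omitted = len(lines) - limit
    marker = f"... [{omitted} lines omitted] ..."
    return lines[:head_n] + [marker] + lines[len(lines) - tail_n:]


# A noisy run whose only useful clue is on the first line.
output = ["ERROR: import failed"] + [f"noise {i}" for i in range(98)] + ["exit 1"]
print(tail(output, 4))      # the early error line is gone
print(headtail(output, 4))  # the early error line and the exit line both survive
```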
justai/ — the orchestrator (this is what JustAi ships).
| Module | Role |
|---|---|
| `scope_planner.py` | Decomposes a goal into chunks fitted to a bash-move budget. Chunks track move budget AND observation budget. |
| `intent_gate.py` | Classifies the goal type before any execution: execution / multi-step / research / ambiguous. |
| `reviewer.py` | Pre-dispatch quality gate: catches ambiguous descriptions and missing success criteria. |
| `checkpoint.py` | Risk-level approval (R0 auto through R3 manual). |
| `agent_dispatch.py` | Runs each chunk through the substrate runner. |
| `runner_protocol.py` | Local stub of the AgentRunner Protocol; moves to safe-mini once that repo is stood up. |
| `trajectory.py` + `memory.py` | Per-step record of every run: action, file touched, observation, outcome. Vector-indexed, queryable across runs and projects. |
| `dashboard/` | Mission Control · Task Board · Trajectories · Memory · Agents · Observability. |
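One way the checkpoint, dispatch, and runner-protocol modules could fit together is sketched below. Everything here is hypothetical: the field names on `Chunk` and `RunResult`, the `run()` signature, and the `auto_up_to` threshold are assumptions, and the source does not say how R1 and R2 are handled, so this sketch treats anything above the threshold as needing a human.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Protocol


class Risk(IntEnum):
    R0 = 0  # auto-approved
    R1 = 1
    R2 = 2
    R3 = 3  # manual approval


@dataclass
class Chunk:
    description: str
    risk: Risk
    move_budget: int         # how many bash actions the chunk may take
    observation_budget: int  # how much output the agent may read back


@dataclass
class RunResult:
    solved: bool
    moves_used: int


class AgentRunner(Protocol):
    """Structural interface: any object with a matching run() qualifies."""

    def run(self, chunk: Chunk) -> RunResult: ...


def dispatch(
    chunk: Chunk,
    runner: AgentRunner,
    approved_by_human: bool = False,
    auto_up_to: Risk = Risk.R0,
) -> RunResult:
    """Gate on risk level first, then hand the chunk to the runner."""
    if chunk.risk > auto_up_to and not approved_by_human:
        raise PermissionError(f"{chunk.risk.name} chunk needs manual approval")
    return runner.run(chunk)
```

Using `typing.Protocol` means the orchestrator depends only on the shape of the runner, which is what lets the stub be swapped for a substrate-provided runner without changing `dispatch`.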
What does not ship here, by design:
- Runner, observation policies, executor policies, failure classifier, worktree provisioner: those live in `safe-mini` (substrate, separate repo).
- Benchmark task corpus and experiment-driver harness: those live in `local-resident` (researcher repo, separate).
JustAi is one of three repos that share a substrate.
Both consumers depend on safe-mini as a peer. safe-mini does not know about its consumers' domain models — it's intentionally generic, so future projects can ship on top of the same substrate.
| Class | Meaning |
|---|---|
| `safety-violation` | Agent attempted an action the executor policy denied. |
| `action-protocol-violation` | Output didn't parse as a valid action. |
| `exhausted-ideas` | Budget remained but loop converged without progress. |
| `budget-exhausted` | Move or observation budget hit the cap. |
| `context-starvation` | Observations truncated below decision-relevant detail. |
| `reward-hacking` | Test passed by means unrelated to the requested change. |
| `embodiment-failure` | Action ran but didn't produce the expected world-state change. |
Failure-classified runs feed back into the planner: chunks that hit context-starvation get larger observation budgets next time; chunks that hit safety-violation get re-decomposed around the boundary that tripped.
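The feedback rule described above could look roughly like this. The class values mirror the taxonomy table; `ChunkPlan`, `replan`, the doubling factor, and the `needs_redecomposition` flag are hypothetical stand-ins, since the planner's real data model is not shown here.

```python
from dataclasses import dataclass, replace
from enum import Enum


class FailureClass(Enum):
    SAFETY_VIOLATION = "safety-violation"
    ACTION_PROTOCOL_VIOLATION = "action-protocol-violation"
    EXHAUSTED_IDEAS = "exhausted-ideas"
    BUDGET_EXHAUSTED = "budget-exhausted"
    CONTEXT_STARVATION = "context-starvation"
    REWARD_HACKING = "reward-hacking"
    EMBODIMENT_FAILURE = "embodiment-failure"


@dataclass(frozen=True)
class ChunkPlan:
    description: str
    move_budget: int
    observation_budget: int
    needs_redecomposition: bool = False


def replan(plan: ChunkPlan, failure: FailureClass, growth: float = 2.0) -> ChunkPlan:
    """Adjust the next attempt at a chunk based on how the last one failed."""
    if failure is FailureClass.CONTEXT_STARVATION:
        # Assumed policy: multiply the observation budget by a fixed factor.
        return replace(plan, observation_budget=int(plan.observation_budget * growth))
    if failure is FailureClass.SAFETY_VIOLATION:
        # Flag the chunk so the planner splits it around the tripped boundary.
        return replace(plan, needs_redecomposition=True)
    return plan  # other classes are left unchanged in this sketch
```

Only the two classes the text names are handled; how the remaining five should feed back is left open here rather than guessed.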
The interactive demo at justai-demo.vercel.app runs a full simulated sprint — eight tasks, three agents, real-time dashboard updates — entirely in the browser. No backend dependencies. The simulation drives the same UI components the real orchestrator uses; it's a fair preview of the production experience.
Active runs, per-model cost, stage latency, sprint timeline at a glance.
Kanban with attempt count, retry, duration, and escalation history per task.
Per-run timeline with phase markers and AI-generated post-mortem analysis.
Cost-vs-quality scatter, p50/p90/p99 latency by stage, token usage trends.
Agent-pool status with current assignments; searchable trajectory + run-result corpus.
Sprint controls live in the top bar: pause, replay, speed multiplier. The simulation is deterministic at a given speed — replay produces identical trajectories.
✓ Phase 1–3 closed
✓ Phase 4 A–F control-plane reframe 2026-04-29
✓ Phase 4 G+H ruff/mypy/test cleanup 2026-04-30
✓ Phase 5 ratification 2026-04-30
✓ Phase 6 closure + v0.4.0 tag 2026-04-30
✓ Public demo browser-visible artifact live
✓ safe-mini repo stand up substrate live
□ local-resident stand up experiment driver next
□ v1.0 public release after substrate post-stand-up
The post-Phase-6 work splits in two directions. Substrate: the canonical types and AgentRunner Protocol currently stubbed in justai/runner_protocol.py move to a new safe-mini repo, which becomes a pip-installable peer dependency. Experiment driver: the 54-trial calibration matrix moves to local-resident, alongside live local-model runs against the substrate.
This repository is public as the browser-visible JustAi demo. The deeper control-plane source lives in JustinJLeopard/JustAi; the execution substrate lives in safe-mini.
| Phase | Scope | Status |
|---|---|---|
| 1–3 | original architecture | closed |
| 4 | control-plane reframe + cleanup | landed v0.4.0 |
| 5 | ratification | clean |
| 6 | closure | tagged |
Verification at closure: 370 pytest passes (+14 subtests), ruff clean, mypy clean, gitleaks clean (with narrow historical allowlist), pip-audit clean, npm-audit clean, clean-venv install passes, dashboard build + smoke pass, 3-repo plan consistent.
safe-mini is live at github.com/JustinJLeopard/safe-mini (alpha, MIT). local-resident remains the private experiment driver until it is ready to stand on its own.
Justin Leopard at Delegate & Orchestrate.
Building autonomous AI systems that orchestrate, learn, and ship.
For research and collaboration inquiries: open an issue, or contact via the website.
If this resonates and you want to follow along, watch this repo or start with the case study at delegateandorchestrate.com/work/justai.
MIT — see LICENSE.
Last updated 2026-05-06. Public demo live; substrate split underway.






