JustAi

A thin project-orchestration layer over a safe-by-construction local-execution substrate for mini-swe-agent–style coding agents.

Try the live demo · Case study · The thesis · Architecture · Status

JustAi sits between an engineering goal and the bash actions that fulfill it. It decomposes the goal into chunks sized to a budgeted bash-action loop, dispatches each chunk through a sandboxed runner, classifies failures with a structured taxonomy, and learns from every run. The substrate underneath — safe-mini — is what keeps that loop trustworthy on private repos: scoped worktrees, scrubbed environment, command and path guards, and an incident artifact emitted for every action.

Why JustAi

Coding agents that decide one bash command at a time are powerful and dangerous in the same breath. A single misstep — a stray rm -rf, an accidental .env commit, a misclassified shell expansion — turns a productive run into an incident. Most current agentic frameworks address this by limiting capability: smaller toolboxes, narrower allowlists, more layers of approval.

JustAi takes the opposite approach.

Sandbox the boundary, not the capability.

Inside a properly scoped boundary (a fresh worktree, a scrubbed env, guarded paths), give the agent broad capability. Outside the boundary, deny by default. Agents can then solve real engineering tasks end-to-end, with leaks made structurally impossible rather than rule-checked away.
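As a rough illustration of "deny by default" at the boundary, here is a minimal sketch of two of the guards named above: an environment scrub and a path guard. It is not safe-mini's implementation. The names `ENV_ALLOWLIST`, `scrub_env`, and `is_inside` are hypothetical, and the allowlist contents are an assumption.

```python
from pathlib import Path

# Hypothetical allowlist (an assumption): the only variables the agent's shell inherits.
ENV_ALLOWLIST = frozenset({"PATH", "HOME", "LANG", "TERM"})


def scrub_env(env: dict[str, str], allowlist: frozenset[str] = ENV_ALLOWLIST) -> dict[str, str]:
    """Return a copy of env holding only allowlisted variables; everything else is dropped."""
    return {key: value for key, value in env.items() if key in allowlist}


def is_inside(root: Path, candidate: str) -> bool:
    """True if candidate, interpreted relative to root, resolves to a path within root.

    resolve() collapses ".." segments and follows symlinks, so traversal attempts
    and absolute paths outside the worktree are rejected.
    """
    resolved_root = root.resolve()
    target = (resolved_root / candidate).resolve()
    return target.is_relative_to(resolved_root)  # requires Python 3.9+
```

An allowlist rather than a blocklist is what makes the scrub deny-by-default: a newly introduced secret variable is dropped without anyone having to remember to add a rule for it.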

JustAi is the orchestration UX on top of that boundary. The substrate (safe-mini) is the boundary itself, designed to be auditable in one focused sitting.

Who this is for

Engineering teams who want autonomous coding-agent runs on their actual private code without unbounded blast radius.
Researchers who want capability and safety as separable axes rather than a single dial.
Anyone who's noticed that "make the agent more careful" doesn't scale, and is looking for a structural answer.

What this is not

Not a model. Bring your own LLM via LiteLLM.
Not a benchmark harness — that's local-resident's job (separate repo, post-closure).
Not a runtime substrate by itself — the substrate is safe-mini (separate repo, post-closure).
Not a finished product — currently a public demo and stabilization artifact. See Status.

The thesis

mini-swe-agent decides one bash command at a time within a budget, using an agent loop of about a hundred lines. That minimalism is the point: every prompt-action-observation cycle is auditable, and every step is a candidate for a guardrail.
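The shape of such a budgeted loop can be sketched as follows. This is an illustrative reduction, not mini-swe-agent's or safe-mini's actual code. `Policy`, `Executor`, `run_bash`, and `run_loop` are hypothetical names, and the policy stands in for the LLM call.

```python
import subprocess
from typing import Callable, Optional

Transcript = list[tuple[str, str]]               # (action, observation) pairs
Policy = Callable[[Transcript], Optional[str]]   # next bash command, or None when done
Executor = Callable[[str], str]                  # runs a command, returns its output


def run_bash(command: str, timeout_s: float = 10.0) -> str:
    """Default executor: run one command in bash and return combined stdout and stderr."""
    result = subprocess.run(
        ["bash", "-c", command], capture_output=True, text=True, timeout=timeout_s
    )
    return result.stdout + result.stderr


def run_loop(policy: Policy, move_budget: int, execute: Executor = run_bash) -> Transcript:
    """Run at most move_budget actions, recording every action and its observation."""
    transcript: Transcript = []
    for _ in range(move_budget):
        action = policy(transcript)
        if action is None:  # the policy signals that it is finished
            break
        transcript.append((action, execute(action)))
    return transcript
```

Passing the executor in as a parameter marks the single place where a guard or sandbox would wrap each action, which is the sense in which every step is a candidate for a guardrail.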

The bet underneath JustAi: capability and safety should compose, not trade off. A capable agent doesn't need to be careful, if its environment makes carelessness structurally impossible. The substrate is where you spend the safety budget; the orchestrator is where you spend the productivity budget.

This pattern was validated across 54 controlled trials (6 task families × 9 configs) before this release.

Headline finding. An "open" executor leaked a fake credential in 6 / 6 probe runs while still solving the task. The "safe" executor blocked 6 / 6 probes and still solved 6 / 6 tasks. Capability is preserved; the leak surface isn't.

Other findings: the reproduce_first workflow averaged 2 steps vs 3 for inspect_first; headtail and structured observations beat pure tail on noisy output (tail dropped early failure clues); JSON and fenced-bash action protocols were equivalent in deterministic tests, leaving malformed-action rate as the open question for live-model evaluation.
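To make the observation-policy finding concrete, here is a small sketch of why a pure tail can drop an early failure clue that a head-plus-tail truncation keeps. The functions `tail` and `headtail` are illustrative assumptions, not the substrate's observation policies, and the even head/tail split is an arbitrary choice.

```python
def tail(lines: list[str], limit: int) -> list[str]:
    """Keep only the last `limit` lines."""
    return lines[len(lines) - limit:] if limit < len(lines) else list(lines)


def headtail(lines: list[str], limit: int) -> list[str]:
    """Keep the first and last lines within `limit`, with a marker for the elided middle.

    The marker is one extra line on top of `limit`.
    """
    if len(lines) <= limit:
        return list(lines)
    head_n = limit // 2
    tail_n = limit - head_n
    omitted = len(lines) - limit
    marker = f"... [{omitted} lines omitted] ..."
    return lines[:head_n] + [marker] + lines[len(lines) - tail_n:]


# Noisy output where the decisive clue appears on the first line.
output = ["ERROR: missing fixture 'db'"] + [f"noise {i}" for i in range(20)] + ["1 failed"]
print(tail(output, 4))      # the ERROR line is gone
print(headtail(output, 4))  # the ERROR line and the final summary both survive
```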

What the orchestrator does

justai/ — the orchestrator (this is what JustAi ships).

| Module | Role |
| --- | --- |
| `scope_planner.py` | Decomposes a goal into chunks fitted to a bash-move budget. Chunks track move budget AND observation budget. |
| `intent_gate.py` | Classifies the goal type before any execution: execution / multi-step / research / ambiguous. |
| `reviewer.py` | Pre-dispatch quality gate: catches ambiguous descriptions and missing success criteria. |
| `checkpoint.py` | Risk-level approval (R0 auto through R3 manual). |
| `agent_dispatch.py` | Runs each chunk through the substrate runner. |
| `runner_protocol.py` | Local stub of the AgentRunner Protocol; moves to `safe-mini` once that repo is stood up. |
| `trajectory.py` + `memory.py` | Per-step record of every run: action, file touched, observation, outcome. Vector-indexed, queryable across runs and projects. |
| `dashboard/` | Mission Control · Task Board · Trajectories · Memory · Agents · Observability. |
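How these modules might fit together can be sketched with a few types. Everything below is a hypothetical reading of the table, not the shipped code. The field names on `Chunk` and `RunResult`, the `run` method on `AgentRunner`, and the rule that only R0 is auto-approved are assumptions; the table states only that R0 is automatic and R3 is manual.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Iterable, Protocol


class RiskLevel(IntEnum):
    R0 = 0  # automatic approval
    R1 = 1
    R2 = 2
    R3 = 3  # manual approval


@dataclass(frozen=True)
class Chunk:
    description: str
    move_budget: int         # maximum bash actions for this chunk
    observation_budget: int  # maximum observation size (unit is an assumption)
    risk: RiskLevel = RiskLevel.R0


@dataclass(frozen=True)
class RunResult:
    solved: bool
    moves_used: int


class AgentRunner(Protocol):
    """Structural interface: any object with a matching run() satisfies it."""

    def run(self, chunk: Chunk) -> RunResult: ...


def needs_manual_approval(chunk: Chunk, auto_max: RiskLevel = RiskLevel.R0) -> bool:
    """Chunks above auto_max require a human decision (threshold is an assumption)."""
    return chunk.risk > auto_max


def dispatch(
    chunks: Iterable[Chunk], runner: AgentRunner, approve: Callable[[Chunk], bool]
) -> list[RunResult]:
    """Run each chunk through the runner, skipping chunks whose approval is withheld."""
    results: list[RunResult] = []
    for chunk in chunks:
        if needs_manual_approval(chunk) and not approve(chunk):
            continue
        results.append(runner.run(chunk))
    return results
```

Using `typing.Protocol` keeps the orchestrator decoupled from any concrete runner: a substrate runner or a test double can be passed in without inheriting from a shared base class.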

What does not ship here, by design:

Runner, observation policies, executor policies, failure classifier, worktree provisioner — those live in safe-mini (substrate, separate repo).
Benchmark task corpus and experiment-driver harness — those live in local-resident (researcher repo, separate).

Three-repo architecture

JustAi is one of three repos that share a substrate.

Both consumers, JustAi and local-resident, depend on safe-mini as a peer. safe-mini does not know about its consumers' domain models; it is intentionally generic, so future projects can ship on top of the same substrate.

The substrate's failure taxonomy

| Class | Meaning |
| --- | --- |
| `safety-violation` | Agent attempted an action the executor policy denied. |
| `action-protocol-violation` | Output didn't parse as a valid action. |
| `exhausted-ideas` | Budget remained but loop converged without progress. |
| `budget-exhausted` | Move or observation budget hit the cap. |
| `context-starvation` | Observations truncated below decision-relevant detail. |
| `reward-hacking` | Test passed by means unrelated to the requested change. |
| `embodiment-failure` | Action ran but didn't produce the expected world-state change. |

Failure-classified runs feed back into the planner: chunks that hit context-starvation get larger observation budgets next time; chunks that hit safety-violation get re-decomposed around the boundary that tripped.
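The feedback rule described above can be sketched as a small function over the taxonomy. The enum values come from the table; the `ChunkBudget` type, the doubling factor, and the function names are assumptions made for illustration, and the real planner may adjust budgets differently.

```python
from dataclasses import dataclass, replace
from enum import Enum


class FailureClass(str, Enum):
    SAFETY_VIOLATION = "safety-violation"
    ACTION_PROTOCOL_VIOLATION = "action-protocol-violation"
    EXHAUSTED_IDEAS = "exhausted-ideas"
    BUDGET_EXHAUSTED = "budget-exhausted"
    CONTEXT_STARVATION = "context-starvation"
    REWARD_HACKING = "reward-hacking"
    EMBODIMENT_FAILURE = "embodiment-failure"


@dataclass(frozen=True)
class ChunkBudget:
    move_budget: int
    observation_budget: int


def adjust_budget(budget: ChunkBudget, failure: FailureClass, growth: int = 2) -> ChunkBudget:
    """Grow the observation budget after context starvation; otherwise leave it unchanged.

    The growth factor of 2 is an assumption.
    """
    if failure is FailureClass.CONTEXT_STARVATION:
        return replace(budget, observation_budget=budget.observation_budget * growth)
    return budget


def needs_redecomposition(failure: FailureClass) -> bool:
    """Safety violations send the chunk back to the planner to be split around the boundary."""
    return failure is FailureClass.SAFETY_VIOLATION
```

For example, `adjust_budget(ChunkBudget(10, 40), FailureClass.CONTEXT_STARVATION)` yields a budget with the same 10 moves and an observation budget of 80.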

The dashboard

The interactive demo at justai-demo.vercel.app runs a full simulated sprint — eight tasks, three agents, real-time dashboard updates — entirely in the browser. No backend dependencies. The simulation drives the same UI components the real orchestrator uses; it's a fair preview of the production experience.

Mission Control

Active runs, per-model cost, stage latency, sprint timeline at a glance.

Task Board

Kanban with attempt count, retry, duration, and escalation history per task.

Trajectories

Per-run timeline with phase markers and AI-generated post-mortem analysis.

Observability

Cost-vs-quality scatter, p50/p90/p99 latency by stage, token usage trends.

Agents and Memory

Agent-pool status with current assignments; searchable trajectory + run-result corpus.

Sprint controls live in the top bar: pause, replay, speed multiplier. The simulation is deterministic at a given speed — replay produces identical trajectories.

Roadmap

✓  Phase 1–3                                    closed
✓  Phase 4 A–F      control-plane reframe       2026-04-29
✓  Phase 4 G+H      ruff/mypy/test cleanup      2026-04-30
✓  Phase 5          ratification                2026-04-30
✓  Phase 6          closure + v0.4.0 tag        2026-04-30
✓  Public demo       browser-visible artifact   live
✓  safe-mini repo    stand up substrate         live
□  local-resident   stand up experiment driver  next
□  v1.0 public      release after substrate     post-stand-up

The post-Phase-6 work splits in two directions. Substrate: the canonical types and AgentRunner Protocol currently stubbed in justai/runner_protocol.py move to a new safe-mini repo, which becomes a pip-installable peer dependency. Experiment driver: the 54-trial calibration matrix moves to local-resident, alongside live local-model runs against the substrate.

Status

This repository is public as the browser-visible JustAi demo. The deeper control-plane source lives in JustinJLeopard/JustAi; the execution substrate lives in safe-mini.

| Phase | Scope | Status |
| --- | --- | --- |
| 1–3 | original architecture | closed |
| 4 | control-plane reframe + cleanup | landed `v0.4.0` |
| 5 | ratification | clean |
| 6 | closure | tagged |

Verification at closure: 370 pytest passes (+14 subtests), ruff clean, mypy clean, gitleaks clean (with narrow historical allowlist), pip-audit clean, npm-audit clean, clean-venv install passes, dashboard build + smoke pass, 3-repo plan consistent.

safe-mini is live at github.com/JustinJLeopard/safe-mini (alpha, MIT). local-resident remains the private experiment driver until it is ready to stand on its own.

Built by

Justin Leopard at Delegate & Orchestrate.

Building autonomous AI systems that orchestrate, learn, and ship.

For research and collaboration inquiries: open an issue, or contact via the website.

If this resonates and you want to follow along, watch this repo or start with the case study at delegateandorchestrate.com/work/justai.

License

MIT — see LICENSE.

Last updated 2026-05-06. Public demo live; substrate split underway.
