This repo is built around one workflow:
take a technical launch brief, run parallel review passes for architecture, security, operations, and evals, then produce a board memo plus a machine-readable recommendation.
The checked scenario in this repo is an internal review of an AI release reviewer that would inspect pull requests before production deployment.
Start with these four files:

- `demo/input/release-review-brief.md`
- `demo/output/decision-memo.md`
- `demo/output/review.json`
- `demo/output/execution-plan.json`
If those four files make sense, the code will make sense.
The brief asks a release board a concrete question:
should a platform team approve a four-week pilot for an AI reviewer that looks at GitHub PRs, Jira context, rollout history, and deployment metadata before production release?
The answer from the checked run is not “AI is great” or “multi-agent systems are powerful.” The answer is narrower:
- approve advisory-only pilot
- do not approve enforced blocking yet
- close redaction, telemetry, EU routing, eval, and hotfix ownership gaps first
This is the kind of review where a single free-form assistant tends to blur too many concerns together. The point of the orchestrator is to keep the outputs separated long enough to be useful.
The live run generated:

- a review plan in `demo/output/execution-plan.json`
- four specialist artifacts in `demo/output/01-step-1.md` through `demo/output/04-step-4.md`
- a board memo in `demo/output/decision-memo.md`
- a scored review in `demo/output/review.json`
- an event trace in `demo/output/execution-events.json`
Summary from `demo/output/run-summary.json`:

```json
{
  "fallback_used": false,
  "steps": 4,
  "artifacts": 4,
  "total_tokens": 23446,
  "recommendation": "needs-work",
  "coverage_score": 90
}
```

The most important checked output is the board memo:

> **Approve a four-week pilot in advisory-only mode. Do not approve enforced blocking of production changes yet.**

The most important machine-readable output is the review:
```json
{
  "release_recommendation": "needs-work",
  "findings": [
    {
      "severity": "high",
      "title": "Prompt redaction and sensitive data handling remain undefined"
    }
  ]
}
```
This project does not expose a generic agent bus. It runs a fixed board review:
- planner writes the work split
- specialists run in parallel (sketched just after this list)
- synthesizer writes one memo
- reviewer scores whether the memo is actually board-ready
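A minimal sketch of that parallel pass, assuming asyncio; `run_specialist` and `MAX_CONCURRENCY` are illustrative names, not the repo's API:

```python
import asyncio

MAX_CONCURRENCY = 4  # the application, not the model, sets this

async def run_specialist(role: str, focus: str) -> str:
    # Stand-in for one model call per specialist role.
    await asyncio.sleep(0)
    return f"{role}: reviewed {focus}"

async def run_specialists(steps: list[dict]) -> list[str]:
    """Run specialist steps in parallel under a fixed concurrency cap."""
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    async def run_one(step: dict) -> str:
        async with sem:
            return await run_specialist(step["role"], step["focus"])

    # gather preserves input order, which keeps artifact ordering stable.
    return await asyncio.gather(*(run_one(s) for s in steps))
```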
The following controls stay local:
- allowed specialist roles
- plan validation
- concurrency limit
- fallback plan when the planner output is invalid
- artifact ordering
- review schema
The model writes content. The application keeps control of the review process.
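A minimal sketch of the plan-validation and fallback controls, again with illustrative names; the repo's actual logic lives in `src/multi_agent_orchestrator/orchestrator.py`:

```python
ALLOWED_ROLES = {"architecture", "security", "operations", "evals"}

# Fixed split used whenever the planner's output is invalid.
FALLBACK_PLAN = [{"role": role, "focus": "full brief"} for role in sorted(ALLOWED_ROLES)]

def validate_plan(plan: object) -> list[dict]:
    """Accept the planner's output only if every step uses an allowed role."""
    if not isinstance(plan, list) or not plan:
        return FALLBACK_PLAN
    for step in plan:
        if not isinstance(step, dict) or step.get("role") not in ALLOWED_ROLES:
            return FALLBACK_PLAN
    return plan
```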
Install:

```bash
uv sync --extra dev
```

Set Azure variables:
```bash
export AZURE_OPENAI_ENDPOINT="https://<resource>.openai.azure.com/"
export AZURE_OPENAI_API_KEY="<key>"
export AZURE_OPENAI_API_VERSION="2025-04-01-preview"
export MULTI_AGENT_PLANNER_DEPLOYMENT="gpt-5.4"
export MULTI_AGENT_SPECIALIST_DEPLOYMENT="gpt-5.2-chat"
export MULTI_AGENT_SYNTHESIZER_DEPLOYMENT="gpt-5.4"
export MULTI_AGENT_REVIEWER_DEPLOYMENT="gpt-5.4"
```
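The four `MULTI_AGENT_*_DEPLOYMENT` variables exist so each stage can run on its own deployment. A sketch of that mapping, with a hypothetical dataclass standing in for what `Settings.from_env` does:

```python
import os
from dataclasses import dataclass

@dataclass
class StageDeployments:
    planner: str
    specialist: str
    synthesizer: str
    reviewer: str

def deployments_from_env() -> StageDeployments:
    # Each stage reads its own deployment, so the structured stages
    # (planner, synthesizer, reviewer) can use a different model
    # than the specialist passes.
    return StageDeployments(
        planner=os.environ["MULTI_AGENT_PLANNER_DEPLOYMENT"],
        specialist=os.environ["MULTI_AGENT_SPECIALIST_DEPLOYMENT"],
        synthesizer=os.environ["MULTI_AGENT_SYNTHESIZER_DEPLOYMENT"],
        reviewer=os.environ["MULTI_AGENT_REVIEWER_DEPLOYMENT"],
    )
```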
Run the checked brief:

```bash
uv run python scripts/run_live_demo.py
```

That will regenerate the files in `demo/output/`.
To review your own brief:

```bash
uv run mao-run \
  --brief-file /path/to/brief.md \
  --goal "Decide whether this pilot should be approved and what controls are missing." \
  --out-dir /tmp/review-run
```

The input should be a real review brief, not a prompt toy. Good inputs usually include (a skeleton follows the list):
- target system and pilot scope
- hard requirements and SLOs
- known constraints
- open questions
- ownership gaps
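A hypothetical skeleton that covers those sections; the content is placeholder, not the checked brief:

```markdown
# Release review brief: <system>

## Target system and pilot scope
Four-week advisory pilot of <system> on <team>'s production deploys.

## Hard requirements and SLOs
- p95 review latency under <n> minutes
- no customer data leaves <region>

## Known constraints

## Open questions

## Ownership gaps
```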
Library usage:

```python
from pathlib import Path

from multi_agent_orchestrator import Orchestrator, Settings

brief = Path("demo/input/release-review-brief.md").read_text()
run = Orchestrator(settings=Settings.from_env()).run(
    goal=(
        "Decide whether the platform team should approve a pilot rollout "
        "of the AI release reviewer and define the controls required first."
    ),
    brief_title="release review brief",
    brief_markdown=brief,
)
print(run.review.release_recommendation)
print(run.final_memo_markdown)
```
Where to look:

- `src/multi_agent_orchestrator/orchestrator.py`: execution graph, fallback handling, artifact ordering.
- `src/multi_agent_orchestrator/client.py`: Azure chat client and JSON-repair pass for structured stages.
- `src/multi_agent_orchestrator/prompts.py`: board-specific stage prompts and schemas.
- `demo/README.md`: inspection order for the checked run.
- `docs/release-review-playbook.md`: what this review style is trying to protect.
- `docs/azure-foundry.md`: deployment split used for the live run.
Checks:

```bash
uv run pytest -q
uv run python -m compileall src scripts
```