Codex Council

Make Codex argue with itself before you ship.

Codex Council turns one Codex request into a small panel review: several reviewers answer independently, their work is anonymized and ranked on a rubric, and a Chairman writes the final call — with the token cost shown up front. It runs entirely inside Codex and never calls a third‑party model API.

📖 Website · 🧭 Wiki / playbook · 🔬 Real example · 🇮🇹 Italiano

One honest caveat, up front. The "diversity" here comes from isolated role prompts and anonymous review, not from multiple model vendors. That's enough to break single‑pass anchoring and surface dissent — it is not the same as a panel of independent labs. The whole project is built around saying that plainly.

When to use it

Reach for the council when a confident, wrong answer is expensive — a migration you can't undo, a regression users will hit, a tradeoff you can't call alone. For small, reversible, checkable work, plain Codex is faster.

Good fit	Skip it
Architecture decisions, risky diffs, migrations	Tiny edits and quick questions
Security, privacy, data‑loss risk	Anything you can verify yourself in a minute
Frontend/UX behavior and release go/no‑go	A task that just needs one straightforward answer

Quickstart

1. Install from the Codex Marketplace CLI, then reload Codex:

npx codex-marketplace add ercoledevs/codex-council --plugin --global -y

2. Ask for a review in chat — just say what to review and what you care about:

Use Codex Council to review this architecture decision.
Focus on blockers, rollback, and verification.

3. Accept the estimate. Before any Standard/Deep run, the council shows a token estimate and waits for your OK. "Use Codex Council" is a request, not permission to spend, and the expanded token budget needs its own separate, explicit yes.

That's it. Everything below is detail you can reach for when you need it.

How it works

One request becomes four stages:

Stage	What happens
1. First opinions	Up to six reviewers answer independently, in parallel, before seeing each other's work.
2. Anonymous review	Outputs lose their authorship (Candidate A–F) and are ranked/scored on a rubric — not on who sounds senior.
3. Aggregation	Scores combine deterministically (locally, when reviewer JSON exists). Blockers and dissent are kept, not averaged away.
4. Chairman synthesis	The main agent writes the final call from saved outputs — winner, dissent, blockers, verification. Not just "the highest score wins."
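To make stages 2 and 3 concrete, here's a minimal Python sketch of anonymizing reviewer outputs and then aggregating rubric scores deterministically. It is an illustration, not the plugin's scoring code. The function names, the input shape (reviewer name → answer), the seeded shuffle, and the mean-score ranking are all assumptions.

```python
import random
import string
from statistics import mean


def anonymize(outputs: dict[str, str], seed: int = 0) -> tuple[dict[str, str], dict[str, str]]:
    """Hypothetical helper: relabel reviewer outputs as 'Candidate A', 'Candidate B', ...

    Returns (candidates, key). Reviewers only ever see `candidates`;
    `key` (label -> reviewer) stays with the Chairman for synthesis.
    """
    reviewers = sorted(outputs)               # fixed starting order
    random.Random(seed).shuffle(reviewers)    # seeded, so the same input gives the same labels
    labels = [f"Candidate {letter}" for letter in string.ascii_uppercase[: len(reviewers)]]
    candidates = {label: outputs[name] for label, name in zip(labels, reviewers)}
    key = dict(zip(labels, reviewers))
    return candidates, key


def aggregate(scores: dict[str, list[float]], blockers: dict[str, list[str]]) -> list[dict]:
    """Rank candidates by mean rubric score; blockers are carried through, never averaged."""
    # Ties break alphabetically on the label, so the ranking is fully deterministic.
    ranked = sorted(scores, key=lambda label: (-mean(scores[label]), label))
    return [
        {"candidate": label, "mean": mean(scores[label]), "blockers": blockers.get(label, [])}
        for label in ranked
    ]
```

Keeping `key` out of the reviewers' view is what makes the ranking blind. Note also that a blocker attached to a lower-ranked candidate still shows up in the result instead of disappearing into an average.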

The final answer leads with the decision and a confidence level (high / medium / low / blocked), separates blockers from refinements, keeps dissent visible, and lists the exact verification to run. Council consensus is not proof — the verification is how you make it real.
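One way to picture the shape of that final answer is as a small record type. This is a sketch. `CouncilVerdict` and its field names are hypothetical and are not the plugin's output schema. Only the ordering (decision and confidence first, blockers kept apart from refinements, dissent and verification always listed) comes from the description above.

```python
from dataclasses import dataclass, field

CONFIDENCE_LEVELS = ("high", "medium", "low", "blocked")


@dataclass
class CouncilVerdict:
    """Hypothetical container for the Chairman's final call."""

    decision: str
    confidence: str
    blockers: list[str] = field(default_factory=list)
    refinements: list[str] = field(default_factory=list)
    dissent: list[str] = field(default_factory=list)
    verification: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence level: {self.confidence!r}")

    def render(self) -> str:
        """Lead with the decision; every section is printed, even when empty."""
        lines = [f"Decision: {self.decision} (confidence: {self.confidence})"]
        for title, items in (
            ("Blockers", self.blockers),
            ("Refinements", self.refinements),
            ("Dissent", self.dissent),
            ("Verification", self.verification),
        ):
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in items or ["(none)"])
        return "\n".join(lines)
```

Printing empty sections as "(none)" is a deliberate choice. It shows that nobody dissented, which is a different statement from leaving dissent out.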

The council

Six reviewers answer every standard run. Each one guards a different concern:

Role	Lens
Ada Lovelace — Principal Architect	boundaries, integration, maintainability, migration risk
Grace Hopper — Reliability Engineer	failure modes, tests, rollback, observability
Hypatia — Security & Governance	secrets, permissions, privacy, provenance, policy
Florence Nightingale — Product & Operator	workflow fit, docs, adoption, operational friction
Alan Turing — Contrarian Red Team	hidden assumptions, simpler alternatives, overengineering
Seymour Cray — Performance Engineer	latency, throughput, memory, cost, scale, measurement

Optional frontend gate

Turn it on with --frontend-review (or --type frontend) for UI/UX work:

Leonardo da Vinci — a brutally honest UX/UI critic. A Leonardo blocker lowers final confidence even when the technical scores are high.
Bob — a browser evidence runner. He drives a real browser and reports pass/fail. Bob never votes, and nothing is called "verified" until Bob (or equivalent browser evidence) actually ran the path.
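Both frontend rules fit in a couple of small functions. This is a sketch under stated assumptions. The 0 to 10 score scale, the thresholds, the "any technical blocker means blocked" rule, and the one-level downgrade for a Leonardo blocker are all invented for illustration. The rules themselves come from the text above: a Leonardo blocker lowers confidence, Bob never votes, and nothing counts as verified without a real browser run.

```python
LEVELS = ["high", "medium", "low"]


def frontend_confidence(mean_score: float, blocking_issues: int, leonardo_blocker: bool) -> str:
    """Hypothetical confidence rule; mean_score excludes Bob because Bob never votes."""
    if blocking_issues:
        return "blocked"  # assumption: any unresolved technical blocker blocks the call
    level = 0 if mean_score >= 8 else 1 if mean_score >= 6 else 2
    if leonardo_blocker:
        # A UX blocker costs one level even when the technical scores are high.
        level = min(level + 1, len(LEVELS) - 1)
    return LEVELS[level]


def ui_status(browser_run: bool, passed: bool) -> str:
    """Bob (or equivalent browser evidence) reports pass/fail; no run means unverified."""
    if not browser_run:
        return "unverified"
    return "verified" if passed else "failed"
```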

Modes & token budget

Pick the smallest mode that catches the risk. Escalate per blocker, not by default.

Mode	Use it for
`fast`	small, reversible, low‑risk decisions (a single Chairman pass)
`standard`	implementation, architecture, and performance decisions (six members)
`deep`	security, data loss, migrations, irreversible changes, or a close tie
`--frontend-review`	UI/UX/browser behavior (adds Leonardo + Bob)
`--type skill --skill-review`	plugin/skill usability (a cheap three‑lens panel)

Output detail is controlled by --token-budget, which defaults to compact:

Budget	When
`compact` (default)	normal decisions — tight outputs, smallest reference set
`balanced`	real tradeoffs and ambiguity — more detail, only on the risky parts
`expanded`	security/data‑loss/irreversible — full evidence. Blocked until you confirm.

Typed synthesis templates are available via --type architecture|implementation|decision|skill|frontend.
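The rule "pick the smallest mode that catches the risk" works naturally as a lookup. This is a minimal sketch. The risk tags and `council_args` are hypothetical, and the mapping simply restates the tables above. The real skill reads the risk from your prompt, not from tags. The helper never picks expanded, because that budget always needs your explicit confirmation.

```python
DEEP_RISKS = {"security", "data-loss", "migration", "irreversible", "privacy", "close-tie"}
STANDARD_RISKS = {"implementation", "architecture", "performance", "ui"}


def council_args(risks: set[str], reversible: bool = True) -> list[str]:
    """Hypothetical helper: map risk tags to the smallest mode that covers them."""
    if not reversible or risks & DEEP_RISKS:
        mode = "deep"
    elif risks & STANDARD_RISKS:
        mode = "standard"
    else:
        mode = "fast"
    # Budget stays at the compact default; expanded is never chosen automatically.
    args = ["--mode", mode, "--token-budget", "compact"]
    if "ui" in risks:
        args.append("--frontend-review")  # adds Leonardo + Bob
    return args
```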

How to prompt it

A council prompt isn't a question — it's a decision to pressure‑test. The shape that works:

[Standard|Deep|Frontend] Council: review <the specific decision>.
Context: <the diff, files, or links it should look at>.
Constraints: <hard limits — compatibility, deadline, budget>.
Return: blockers, dissent, confidence, the safest v1, and the exact verification.
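If you keep council prompts as snippets or generate them from a script, a small helper keeps the shape consistent. This is a sketch. `council_prompt` is a hypothetical name, and the result is ordinary chat text that you paste or send to Codex. It does not call the plugin.

```python
def council_prompt(decision: str, context: str, constraints: str, mode: str = "Standard") -> str:
    """Fill the four-line council prompt shape with your own decision, context, and limits."""
    if mode not in ("Standard", "Deep", "Frontend"):
        raise ValueError(f"unexpected mode: {mode}")
    lines = [
        f"{mode} Council: review {decision.rstrip('.')}.",
        f"Context: {context.rstrip('.')}.",
        f"Constraints: {constraints.rstrip('.')}.",
        "Return: blockers, dissent, confidence, the safest v1, and the exact verification.",
    ]
    return "\n".join(lines)
```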

A few ready‑to‑use examples:

Council review this diff.
Focus on regressions, missing tests, and performance impact.

Deep Council: review this migration for security, rollback, and data-loss risk.

Frontend Council: review this modal flow with Leonardo and have Bob verify
the browser interaction cases before Chairman synthesis.

Tips: name the mode (it sets cost and scrutiny), give it a real decision rather than a vibe, point at evidence, and state your hard constraints. Don't ask it to "just confirm": the council preserves dissent on purpose. Asking how the council works ("how does it work?") does not start a run, and an ambiguous request gets one clarifying line and no dispatch.

➡️ The Wiki has a 16‑recipe cookbook with paste‑ready prompts for common situations.

Tuning roles (alters)

An alter is a bounded, local tweak to how one reviewer behaves — make Ada blunter, point Seymour at database cost, tell Leonardo to stop being polite about bad UI.

Tuning is advisory only: it can sharpen focus and tone, but it can never remove blockers, dissent, verification, anonymization, or Bob's non‑voting status. Bob isn't tunable. Always preview before saving:

python3 scripts/codex_council.py alters preview --role leonardo \
  --tone "more brutally honest about confusing interaction design" \
  --domain-focus "mobile UI, modal accessibility, and click-through regressions"
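The "advisory only" rule amounts to a validation step before an alter is saved. The sketch below is deliberately crude. It assumes a plain dict of override fields, and the role ids (lowercased from the list of tunable roles), the keyword lists, and `check_alter` are illustrative. This is not how the plugin actually enforces the rule.

```python
TUNABLE_ROLES = {"ada", "grace", "hypatia", "florence", "turing", "seymour", "leonardo"}
PROTECTED = ("blocker", "dissent", "verification", "anonymiz", "bob")
WEAKENING = ("skip", "remove", "ignore", "drop")


def check_alter(role: str, fields: dict[str, str]) -> list[str]:
    """Return reasons to reject an alter; an empty list means it only adjusts focus or tone."""
    problems = []
    if role not in TUNABLE_ROLES:
        problems.append(f"role {role!r} is not tunable")
    for name, text in fields.items():
        lowered = text.lower()
        # Crude keyword check: a weakening verb aimed at a protected guarantee.
        if any(verb in lowered for verb in WEAKENING) and any(g in lowered for g in PROTECTED):
            problems.append(f"--{name} tries to weaken a protected guarantee")
    return problems
```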

See CLI reference → Tune roles for the full command set.

CLI reference

The helper script (scripts/codex_council.py) is stdlib‑only. For everyday use you don't need it at all; just ask in chat. The CLI is for traceable sessions, estimates, scoring, and stats you want to keep.

Setup & estimate

# Configure the local consumer profile used for estimates (stored with consent).
python3 scripts/codex_council.py profile --plan Plus --model GPT-5.3-Codex --reasoning medium

# Show the first-run questions when no profile exists.
python3 scripts/codex_council.py profile

# Estimate before starting, then accept the range.
python3 scripts/codex_council.py estimate --topic "Architecture Review" --mode standard --token-budget compact

# Is this a real run, or just talking about the council?
python3 scripts/codex_council.py classify-invocation --text "explain how council works"
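The decision `classify-invocation` makes can be illustrated with a deliberately naive keyword heuristic. This is not the plugin's classifier. The phrase lists and the three labels (`run`, `explain`, `clarify`) are assumptions. They mirror the rule in "How to prompt it": explaining the council is not running it, and an ambiguous ask gets one clarifying line and no dispatch.

```python
import re

EXPLAIN = re.compile(r"\b(how does|how do|explain|what is|what's)\b", re.IGNORECASE)
RUN = re.compile(r"\b(review|pressure-test|use codex council|council:)", re.IGNORECASE)


def classify(text: str) -> str:
    """Return 'explain', 'run', or 'clarify'; only 'run' would lead to a dispatch."""
    if EXPLAIN.search(text):
        return "explain"  # talking about the council is never a run
    if RUN.search(text):
        return "run"
    return "clarify"  # ambiguous: ask one clarifying line, dispatch nothing
```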

Run a traceable session

# Scaffold a session after accepting the estimate. --root is the workspace analyzed;
# artifacts are stored in plugin-local .codex-council/sessions/, never in your project.
python3 scripts/codex_council.py init --topic "Architecture Review" --root . \
  --mode standard --token-budget compact --confirm-estimate

# Frontend session (Leonardo + Bob).
python3 scripts/codex_council.py init --topic "Modal Review" --root . \
  --mode standard --frontend-review --confirm-estimate

# Compact skill/tool review session.
python3 scripts/codex_council.py init --topic "Skill Review" --root . \
  --type skill --skill-review --confirm-estimate

# The expanded budget must be confirmed explicitly with --confirm-expanded.
python3 scripts/codex_council.py init --topic "Migration Review" --root . \
  --mode deep --token-budget expanded --confirm-expanded

# Optional: ASCII banner (terminal) or a one-line dispatch announcement.
python3 scripts/codex_council.py init --topic "Architecture Review" --root . --banner
python3 scripts/codex_council.py init --topic "Decision Review" --root . --type decision --announce --confirm-estimate
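The two confirmation flags above act as a simple gate. This is a minimal sketch, assuming the rules from the Quickstart: Standard/Deep runs need an accepted estimate, and expanded needs its own explicit yes. The migration example passes only --confirm-expanded, so the sketch also treats that flag as accepting the estimate. `may_dispatch` is hypothetical and does not mirror the script's code.

```python
def may_dispatch(mode: str, token_budget: str, confirm_estimate: bool, confirm_expanded: bool) -> bool:
    """Return True only when the run has the consent the council requires."""
    if token_budget == "expanded":
        # A general "yes" or "use Codex Council" never implies expanded; it needs its own flag.
        return confirm_expanded
    if mode in {"standard", "deep"}:
        return confirm_estimate
    return True  # fast: a single Chairman pass, no estimate gate in this sketch
```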

Score, validate & close out

# Aggregate reviewer scores from a JSON file (use --compact for compact JSON).
python3 scripts/codex_council.py score --input reviews.json

# Validate a generated session.
python3 scripts/codex_council.py validate-session --session <printed-session-dir>

# End-of-session stats; --write persists stats.json and stats.md.
python3 scripts/codex_council.py stats --session <printed-session-dir> --write

# Optional: path-only raw bundle, and compact pre/post history (with consent).
python3 scripts/codex_council.py stats --session <printed-session-dir> --write --raw-bundle
python3 scripts/codex_council.py stats --session <printed-session-dir> --write --record-history

Stats are local estimates, not actual Codex token usage, billing telemetry, or exact tool‑call accounting. They separate pre_execution_estimate, post_execution_estimate, and artifact_only_tokens; if prompts or outputs are missing, coverage is reported as partial.
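Here's a rough picture of such a stats record, with coverage dropping to "partial" when artifacts are missing. The three estimate keys and the "partial" value come from the paragraph above. Everything else is an assumption for illustration: the 4-characters-per-token heuristic, the `overhead_per_call` parameter, the "full" label, and the function names.

```python
from typing import Optional


def estimate_tokens(text: str) -> int:
    """Crude local heuristic (about 4 characters per token); not real Codex accounting."""
    return (len(text) + 3) // 4


def session_stats(
    pre_estimate: int,
    prompts: list[Optional[str]],
    outputs: list[Optional[str]],
    overhead_per_call: int = 0,
) -> dict:
    """Build a stats dict; a missing prompt or output (None) makes coverage 'partial'."""
    present = [text for text in prompts + outputs if text is not None]
    artifact_tokens = sum(estimate_tokens(text) for text in present)
    complete = len(present) == len(prompts) + len(outputs)
    return {
        "pre_execution_estimate": pre_estimate,
        # Assumed: post-run estimate = saved artifacts + a fixed hidden-context overhead per call.
        "post_execution_estimate": artifact_tokens + overhead_per_call * len(prompts),
        "artifact_only_tokens": artifact_tokens,
        "coverage": "full" if complete else "partial",
    }
```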

Tune roles (alters)

# Inspect current tuning.
python3 scripts/codex_council.py alters list
python3 scripts/codex_council.py alters show --role ada

# Preview, then save (Ada, Grace, Hypatia, Florence, Turing, Seymour, Leonardo).
python3 scripts/codex_council.py alters preview   --role ada --tone "more direct" --domain-focus "API design and maintainability"
python3 scripts/codex_council.py alters configure --role ada --tone "more direct" --domain-focus "API design and maintainability"

# Reset one role or all tuning.
python3 scripts/codex_council.py alters reset --role ada
python3 scripts/codex_council.py alters reset --all

Supported fields: --domain-focus, --strictness, --tone, --risk-posture, --evidence-preference, --extra-check, --instruction. Use the CLI for changes — don't hand‑edit alter-overrides.json.
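If you script alter changes, building the argument list in Python avoids shell-quoting mistakes. In this sketch the flag names are the supported fields listed above, and the script path matches the examples. `alter_command`, `preview_then_configure`, and the underscore-to-dash keyword convention are hypothetical. The code only builds commands. Running them (for example with `subprocess.run`) is up to you.

```python
import shlex

SUPPORTED = {
    "domain_focus", "strictness", "tone", "risk_posture",
    "evidence_preference", "extra_check", "instruction",
}


def alter_command(action: str, role: str, **fields: str) -> list[str]:
    """Build argv for `alters preview` or `alters configure` from keyword fields."""
    if action not in {"preview", "configure"}:
        raise ValueError("use 'preview' first, then 'configure'")
    unknown = set(fields) - SUPPORTED
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    argv = ["python3", "scripts/codex_council.py", "alters", action, "--role", role]
    for name, value in fields.items():
        argv += ["--" + name.replace("_", "-"), value]  # domain_focus -> --domain-focus
    return argv


def preview_then_configure(role: str, **fields: str) -> list[str]:
    """Shell-quoted preview and configure lines, in the order you should run them."""
    return [shlex.join(alter_command(action, role, **fields)) for action in ("preview", "configure")]
```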

Maintain

# Strict plugin validation.
python3 scripts/codex_council.py validate --plugin-root . --strict

# Check for a newer GitHub release (--json for machine-readable output).
python3 scripts/codex_council.py check-update

# Run the test suite.
python3 -m unittest discover -s tests -v

Privacy & local state

The council keeps its runtime artifacts — session scaffolds, estimates, prompts, outputs, stats, history, and alter overrides — in plugin‑local .codex-council/, not inside your project, and the folder is gitignored. So you can reuse profiles and learning history across projects without polluting any repo.

Invocation logs are compact JSONL and never store prompt text, raw output, secrets, topics, workspace roots, or absolute paths.
The consumer profile stores only your declared plan/model/reasoning and compact aggregate history — never prompts or transcripts.
State lives in a stable parent (codex-council/.codex-council/) so tuning and history survive plugin updates. Override paths with CODEX_COUNCIL_STATE_ROOT, CODEX_COUNCIL_HOME, or CODEX_COUNCIL_SESSION_ROOT.
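An allowlist is the easiest way to keep the logging rule, and safer than a denylist. This is a minimal sketch. The field names (`event`, `mode`, `token_budget`, `members`, `duration_s`) and `log_line` are hypothetical, and the plugin's real log schema may differ. Any field not on the list, such as a prompt or topic, never reaches the file.

```python
import json
import time

ALLOWED_FIELDS = {"event", "mode", "token_budget", "members", "duration_s"}


def log_line(record: dict) -> str:
    """Serialize one compact JSONL entry, keeping only allowlisted, non-path fields."""
    entry = {"ts": int(time.time())}
    for key in sorted(record.keys() & ALLOWED_FIELDS):
        value = record[key]
        if isinstance(value, str) and ("/" in value or "\\" in value):
            continue  # never persist anything that looks like a path
        entry[key] = value
    return json.dumps(entry, separators=(",", ":"))
```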

Limits

Read these before you rely on it:

Consensus is not proof. This is an advisory workflow, not a legal, security, or compliance approval system. Always run the verification.
Not multi‑vendor diversity. Role isolation reduces single‑pass anchoring; it does not equal multiple independent model providers.
No fake UI verification. UI behavior isn't "verified" unless Bob, or equivalent browser evidence, actually ran the path.
No billing telemetry. Token reports are local heuristics, not your real Codex usage or remaining quota — check Codex Settings → Usage for that.
expanded is gated. It can consume a lot of usage, so it never runs without explicit confirmation. Prefer expanding one blocker over a whole session.
Use Deep mode for sensitive, irreversible, privacy, security, migration, or data‑loss decisions.

Install options

Project vs. global scope

# Pick scope when prompted, or set it explicitly:
npx codex-marketplace add ercoledevs/codex-council --plugin --project
npx codex-marketplace add ercoledevs/codex-council --plugin --global

# Non-interactive:
npx codex-marketplace add ercoledevs/codex-council --plugin --global -y

Restart or reload Codex after installing or updating.

Update

Re‑run the install command to pull the latest version:

npx codex-marketplace add ercoledevs/codex-council --plugin --global -y

Then reload Codex. To get notified of new versions, Watch the repo → Custom → enable Releases. You can also check from the CLI:

python3 scripts/codex_council.py check-update

Manual install

Clone into your local Codex plugin directory:

mkdir -p ~/plugins
git clone https://github.com/ercoledevs/codex-council.git ~/plugins/codex-council

Add it to your local marketplace file (usually ~/.agents/plugins/marketplace.json):

{
  "name": "codex-council",
  "source": { "source": "local", "path": "./plugins/codex-council" },
  "policy": { "installation": "AVAILABLE", "authentication": "ON_INSTALL" },
  "category": "Productivity"
}

Restart or reload Codex after adding the plugin.

Development

# Tests
python3 -m unittest discover -s tests -v

# Strict validation
python3 scripts/codex_council.py validate --plugin-root . --strict

# Before publishing, check for stray local artifacts
find . -name '.DS_Store' -o -name '._*' -o -name '__pycache__' -o -name '*.pyc'
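You can also run the same pre-publish check from Python, for example inside the test suite, without depending on `find`. This sketch uses only `pathlib` and `fnmatch` with the four patterns from the command above. `stray_artifacts` is a hypothetical name. `fnmatchcase` keeps matching case-sensitive, like `find -name`.

```python
from fnmatch import fnmatchcase
from pathlib import Path

STRAY_PATTERNS = (".DS_Store", "._*", "__pycache__", "*.pyc")


def stray_artifacts(root: str = ".") -> list[Path]:
    """Return files and folders under root whose name matches a stray-artifact pattern."""
    return sorted(
        path
        for path in Path(root).rglob("*")  # pathlib's "*" also matches dotfiles
        if any(fnmatchcase(path.name, pattern) for pattern in STRAY_PATTERNS)
    )
```

An empty list from `stray_artifacts(".")` means the tree is clean. Anything it returns can be removed before publishing.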

Repository layout:

codex-council/
├── .codex-plugin/plugin.json        # plugin manifest
├── scripts/codex_council.py         # stdlib-only helper CLI
├── skills/codex-council/            # the skill + reference docs
│   ├── SKILL.md
│   └── references/                  # roles, rubric, protocol, token budget, …
├── skills/codex-council-alters/     # role-tuning skill
├── docs/                            # the website (GitHub Pages)
├── assets/
├── tests/
└── PROVENANCE.md

The public site in docs/ is published with GitHub Pages from the main branch (/docs folder) → https://ercoledevs.github.io/codex-council/

Credits & license

Inspired by the public LLM Council pattern:

The original asks multiple independent models for answers, anonymizes them for peer review/ranking, then has a Chairman model synthesize the result. Codex Council keeps that decision shape while adapting execution to Codex roles, optional Codex subagents, and local deterministic scoring. Additional workflow patterns (single‑round critics, a separate synthesis pass, typed panels, fail‑fast setup checks, compact invocation logging) are adapted from Chris Blattman's Claude council pattern — without adding any cross‑vendor model calls. See PROVENANCE.md for details.

Licensed under the MIT License.
