Language: English · 한국어
One model editing its own code keeps reinforcing its own blind spots, so the same bug can survive multiple review passes. mangchi alternates Claude-as-editor with Codex-as-reviewer across five rotating axes (correctness / security / readability / performance / design), letting each model catch the other's blind spots until the code converges.
Claude writes, Codex critiques, Claude decides. Rotating review axes until the code stops moving or you hit the cap. Like hammering metal into shape one stroke at a time.
Without mangchi:
You ask Claude to harden auth.py. It reviews itself, says "good enough", you ship it. A week later, Codex or a different reviewer immediately spots a race condition that Claude and you both missed. Same blind spot, one model, one pass.
With mangchi:
/mangchi src/services/auth.py. Round 1: Claude edits, Codex reviews on correctness. Round 2: Claude edits, Codex reviews on security. Round 3: readability. Each round a different axis, a different model doing the critique. Stop when two rounds in a row agree nothing more is needed, or at round 5. The delta file is written separately so the original is never touched unless you say so.
- Production-bound code you want to harden before release - single-file sweep across 5 axes
- Prompt-injection and security defense code - adversarial second opinion from a different model
- Claude Max users with spare Claude tokens who want external cross-model review without running Codex by hand
- Correctness / security / readability / performance / design sweep in one loop
- Code-review PRs that need a rationale doc - mangchi writes the round-by-round history
- ddaro - worktree-based parallel Claude Code sessions with safe merge.
- prism - 5-agent parallel code review (broader, one-shot).
- triad - 3-perspective deliberation for design docs and markdown.
A Claude Code skill that hardens a single code file through iterative cross-model review. Each round picks one axis (correctness / security / readability / performance / design), sends the file to the Codex CLI for adversarial critique, lets Claude decide which issues to accept, and applies the fixes. Repeat with a different axis until the file converges.
Think of it as a code-review ping-pong where neither player gets to short-circuit - Codex can't apply its own fixes, Claude can't skip the critique.
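The ping-pong loop can be sketched roughly as follows (a minimal Python sketch; the `review`, `decide`, and `apply_fixes` hooks are hypothetical stand-ins for the Codex critique, Claude's triage, and Claude's edit, not mangchi's real interfaces):

```python
AXES = ["correctness", "security", "readability", "performance", "design"]

def refine(review, decide, apply_fixes, max_rounds=5):
    """One mangchi-style session: rotate the review axis each round,
    get an external critique, let the editor model triage it, and stop
    after two quiet rounds in a row or at the round cap."""
    clean_streak = 0
    prev_axis = None
    for rnd in range(max_rounds):
        axis = AXES[rnd % len(AXES)]     # rotation: adjacent rounds never repeat
        assert axis != prev_axis
        prev_axis = axis
        issues = review(axis)            # adversarial critique (hypothetical hook)
        accepted = decide(issues)        # editor model picks which issues to fix
        if accepted:
            apply_fixes(accepted)        # apply only the accepted fixes
            clean_streak = 0
        else:
            clean_streak += 1
        if clean_streak == 2:            # two quiet rounds in a row -> done
            return ("CONVERGED", rnd + 1)
    return ("ROUND_CAP", max_rounds)
```

With a reviewer that never finds issues this converges after two rounds; with one that always finds issues it runs to the R5 cap.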
- Existing code that needs hardening before a release, PR, or deploy
- Security review of code that accepts untrusted input
- Post-implementation cleanup when Claude wrote the first draft and you want an outside model to sanity-check it
- Before turning a prototype into production code
- Greenfield construction from scratch - mangchi hardens existing code; use a code-generation tool for net-new implementation
- Markdown or documentation review - use a deliberation tool instead (e.g. triad - mangchi operates on executable code with real fixes, not prose)
- Cross-file architectural refactors - mangchi is single-file by design
- Tiny utility files (< 80 LoC) - overhead exceeds signal
Mangchi is designed to compose with other Claude Code tools in a natural workflow:
| Stage | Tool | Role |
|---|---|---|
| Decide | deliberation tools (e.g. triad) | Multi-perspective design review before coding |
| Build | code-generation plugins (e.g. pumasi) | Parallel greenfield implementation |
| Harden | mangchi (this) | Single-file iterative cross-model review |
| Verify | existing review/test runners | Final gate before merge |
Pick the one that matches the stage you're in. Mangchi specifically targets the "code exists, needs to be better" gap.
# Codex CLI (the reviewer)
npm install -g @openai/codex
codex login # or set OPENAI_API_KEY
# Claude Code (the orchestrator)
# See https://docs.claude.com/en/docs/claude-code
mangchi is distributed through the haroom_plugins aggregator marketplace along with the other haroom plugins (ddaro, prism, triad).
# 1. Add the haroom_plugins marketplace (one time)
/plugin marketplace add https://github.com/minwoo-data/haroom_plugins.git
# 2. Install
/plugin install mangchi
Restart Claude Code after install. Upgrades are a single /plugin update - the aggregator pulls updates for every haroom plugin you have installed.
/mangchi src/services/auth.py # default: updated.* only, original untouched
/mangchi src/services/auth.py --apply=original # also Edit the original file
/mangchi src/utils/hash.py --only-axes=correctness,security # restrict axis rotation
/mangchi src/new_module.py --include-axes=necessity # default 5 + necessity opt-in
/mangchi src/auth.py --start-axis=security # R1 starts on security (not default correctness)
/mangchi src/parse.py --gate "pytest -x tests/" # require external gate before CONVERGED
/mangchi src/x.py --no-verify # skip verify loop (adversarial guarantee lost)
/mangchi --continue src-services-auth-py # resume an in-progress session (including aborted)
/mangchi --stop src-services-auth-py # force close with whatever's in state
Natural-language triggers also work (e.g. "refine with mangchi", "cross-model review"). Korean triggers are documented in the Korean README.
See skills/mangchi/references/usage.md for every flag and more examples.
| Axis | Question it asks | Default |
|---|---|---|
| correctness | Does the code behave right on every input shape? | ✓ |
| security | What attack surface does this expose? | ✓ |
| readability | Can a contributor understand & change this in 6 months? | ✓ |
| performance | Where is it wasting I/O, memory, or cycles? | ✓ |
| design | Will this still be maintainable in a year? | ✓ |
| necessity | Is this new code necessary, or does existing infra cover it? (YAGNI) | opt-in via `--include-axes=necessity` |
| robustness | Does it survive concurrency / failure-recovery / data-integrity / state-transitions? | opt-in via `--include-axes=robustness` |
Each round uses exactly one axis. Adjacent rounds cannot repeat an axis - rotation is enforced so you don't get five "correctness" rounds in a row.
robustness is a 4-sub-axis runtime-failure probe - the reviewer must walk all four sub-axes (concurrency, failure & recovery, data integrity, state transitions) per round, or mark N/A with reason. It's orthogonal to correctness (which asks "does it meet its contract on the happy path?") - robustness asks "does it survive adversarial runtime conditions?". Opt-in because pure/stateless code has no realistic signal on these axes.
To enable both opt-in axes: --include-axes=necessity,robustness.
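The "walk all four sub-axes or mark N/A with a reason" rule could be validated roughly like this (a sketch under an assumed findings shape; mangchi's actual review schema is YAML and may differ):

```python
SUB_AXES = {"concurrency", "failure_recovery", "data_integrity", "state_transitions"}

def robustness_coverage_ok(findings):
    """findings maps each sub-axis name to a finding text, or to
    'N/A: <reason>'. A hypothetical shape for illustration only."""
    if set(findings) != SUB_AXES:        # every sub-axis must appear exactly once
        return False
    for text in findings.values():
        if text.strip() == "N/A":        # N/A without a stated reason is invalid
            return False
    return True
```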
Any of:
- Two consecutive verified rounds - all ACCEPTed issues actually touched a matching diff (`locus` ±5 rule) AND zero `DISAGREE` returns from Codex's verify pass.
- Two consecutive `PASS + no_changes_suggested` - requires `correctness` AND `security` to each have been executed at least once (gaming guard; "easy axes only" PASS streaks don't count).
- R5 hard cap (prevents oscillation + token drain).
- `--gate "<cmd>"` exits 0 - external command passes (runs once before termination, or every round with `--gate-every-round`).
- User `--stop`.
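Putting those termination conditions together, a sketch of the check (state names like `VERIFIED`/`PASS` are illustrative, not mangchi's real session format; the external `--gate` command is omitted):

```python
def is_converged(history, executed_axes, round_cap=5):
    """history: per-round outcomes, newest last. 'VERIFIED' means every
    accepted issue matched a diff with zero DISAGREE on the verify pass;
    'PASS' means the reviewer suggested no changes."""
    if len(history) >= round_cap:
        return True                                    # R5 hard cap
    last_two = history[-2:]
    if len(last_two) == 2 and all(r == "VERIFIED" for r in last_two):
        return True                                    # two verified rounds
    if (len(last_two) == 2 and all(r == "PASS" for r in last_two)
            and {"correctness", "security"} <= executed_axes):
        return True                                    # gaming guard satisfied
    return False
```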
Aborts (session can be resumed with --continue after manual arbitration):
- Cumulative Codex tokens ≥ 500K
- `forced_accept_count ≥ --force-accept-threshold` (default 1 = strict)
- Codex YAML schema retry exhausted
- Per-call context window exceeded (≥ 180K tokens)
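The abort thresholds above can be expressed as a simple guard (illustrative state names and a simplified signature, not mangchi's internals):

```python
def should_abort(cum_tokens, call_tokens, forced_accepts,
                 force_accept_threshold=1, schema_retries_left=1):
    """Return an abort reason, or None if the session may continue."""
    if cum_tokens >= 500_000:           # cumulative Codex token ceiling
        return "ABORT_CUM_TOKENS"
    if call_tokens >= 180_000:          # per-call context window exceeded
        return "ABORT_CONTEXT"
    if forced_accepts >= force_accept_threshold:
        return "ABORT_FORCED_ACCEPT"    # default threshold 1 = strict
    if schema_retries_left <= 0:        # YAML schema retries exhausted
        return "ABORT_SCHEMA"
    return None
```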
- Original files are never modified without explicit `--apply=original`.
- Default writes under `docs/refinement/mangchi/<slug>/updated.*`; namespaced to avoid collision with `triad`.
- Verification loop (Phase 6) - every Claude REJECT is re-reviewed by Codex. `DISAGREE` carries the issue into the next round; two consecutive `DISAGREE` on the same issue flips to `FORCED_ACCEPT` (system-promoted, not Claude's choice).
- ACCEPT diff verification (Phase 5) - each ACCEPTed issue's `locus` must actually be touched in `git diff -w --numstat` (±5 line fuzz). No-op ACCEPTs are carried forward, not counted toward convergence.
- REJECT requires citation - a `file:LINE` or test name in the reason is mandatory (hard validation error, never silently flipped to ACCEPT).
- Shell-injection-safe Codex calls - strict tempfile + stdin pattern; no argv interpolation. All dynamic prompt content appended via `cat <file> >>` (never via unquoted variable expansion).
- Pre-flight guards - Bash 4+, file size (≤ 2000 LoC / ≤ 200KB unless `--force`), 2-PASS coverage warning if `--only-axes` excludes correctness or security.
- Token budget - per-round (80K estimate cap, `--force-round` bypass), per-call (180K context window, hard abort), cumulative (150K warn, 500K abort).
- Codex-missing handling - interactive confirmation before self-review (which disables the verify loop + `FORCED_ACCEPT`). Non-interactive envs auto-abort unless `--allow-self-review` is passed.
- Each round's Codex prompt, review response, and verify response are preserved as an audit trail (`round-N.prompt.txt`, `.codex.txt`, `.verify.txt`). `INDEX.md` summarizes all rounds.
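The ±5 locus rule in the ACCEPT diff verification amounts to a proximity check like this (a sketch; however mangchi actually derives changed line numbers from the git diff is assumed done elsewhere and out of scope here):

```python
def locus_touched(accepted_locus, changed_lines, fuzz=5):
    """An accepted issue's reported line (its 'locus') must fall within
    +/- fuzz lines of some line actually changed in the diff; otherwise
    the ACCEPT is a no-op and is carried forward, not counted."""
    return any(abs(accepted_locus - ln) <= fuzz for ln in changed_lines)
```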
Cross-model code review is empirically supported. LLMs systematically underperform when reviewing their own output - Tsui et al. (2025) document a 64.5% self-correction blind spot across 14 open-source models (arXiv:2507.02778, NeurIPS 2025 LLM Evaluation Workshop). Gong et al. (2024) find the same pattern specifically for code security: models repair their own insecure code far less successfully than code produced by a different model (arXiv:2408.10495). Semgrep (2025) confirms the complementary-failure prediction in practice - Claude and Codex caught different vulnerability classes on 11 real-world Python web apps (Semgrep blog).
Mangchi operationalizes this: Claude edits, Codex reviews, rotating axes, round-delta accumulation, audit trail per round.
See skills/mangchi/RESEARCH.md for full quotes, links, and what these sources do NOT claim.
See skills/mangchi/CASE-STUDIES.md for concrete examples of real bugs caught on real projects - bug categories, accept rates, token costs, and honest limits.
MIT - see LICENSE.
- Created by: haroom
- Built on Claude Code and the OpenAI Codex CLI
- Inspired by pumasi, which pioneered Claude-as-supervisor / Codex-as-worker patterns in Claude Code plugins