Mangchi - Iterative Code Refinement

Language: English · 한국어

One model editing its own code keeps reinforcing its own blind spots, so the same bug can survive multiple review passes. mangchi alternates Claude-as-editor with Codex-as-reviewer across five rotating axes (correctness / security / readability / performance / design), using each model to catch the other's blind spots until the code converges.

Claude writes, Codex critiques, Claude decides. The review axis rotates every round until the code stops moving or you hit the round cap. Like hammering metal into shape one stroke at a time - mangchi is Korean for "hammer".


30-second demo

Without mangchi:

You ask Claude to harden auth.py. It reviews itself, says "good enough", you ship it. A week later, Codex or a different reviewer immediately spots a race condition that Claude and you both missed. Same blind spot, one model, one pass.

With mangchi:

You run /mangchi src/services/auth.py. Round 1: Claude edits, Codex reviews for correctness. Round 2: Claude edits, Codex reviews for security. Round 3: readability. Each round gets a different axis, and a different model does the critique. The loop stops when two consecutive rounds agree nothing more is needed, or at round 5. The delta file is written separately, so the original is never touched unless you say so.

Who should use this

  • Production-bound code you want to harden before release - single-file sweep across 5 axes
  • Prompt-injection and security defense code - adversarial second opinion from a different model
  • Claude Max users who have spare Claude tokens but still want external cross-model review without running Codex by hand
  • Anyone who wants a correctness / security / readability / performance / design sweep in a single loop
  • Code-review PRs that need a rationale doc - mangchi writes the round-by-round history

Sibling tools (same marketplace)

  • ddaro - worktree-based parallel Claude Code sessions with safe merge.
  • prism - 5-agent parallel code review (broader, one-shot).
  • triad - 3-perspective deliberation for design docs and markdown.

What this is

A Claude Code skill that hardens a single code file through iterative cross-model review. Each round picks one axis (correctness / security / readability / performance / design), sends the file to the Codex CLI for adversarial critique, lets Claude decide which issues to accept, and applies the fixes. Repeat with a different axis until the file converges.

Think of it as a code-review ping-pong where neither player gets to short-circuit - Codex can't apply its own fixes, Claude can't skip the critique.
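
In sketch form, one session looks roughly like this (pure pseudocode: every helper name here is a hypothetical placeholder, and the real skill tracks far more state):

```bash
for round in 1 2 3 4 5; do               # R5 hard cap
  axis=$(pick_axis "$round")             # one axis per round, never repeating
  codex_review "$target_file" "$axis"    # Codex critiques -- it cannot edit
  claude_decide_and_apply                # Claude ACCEPTs/REJECTs, applies fixes
  converged && break                     # e.g. two quiet rounds in a row
done
```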

When to use it

  • Existing code that needs hardening before a release, PR, or deploy
  • Security review of code that accepts untrusted input
  • Post-implementation cleanup when Claude wrote the first draft and you want an outside model to sanity-check it
  • Before turning a prototype into production code

When NOT to use it

  • Greenfield code written from scratch - mangchi hardens existing code; use a code-generation tool for net-new implementation
  • Markdown or documentation review - use a deliberation tool such as triad instead; mangchi operates on executable code with real fixes, not prose
  • Cross-file architectural refactors - mangchi is single-file by design
  • Tiny utility files (< 80 LoC) - overhead exceeds signal

Related tools (ecosystem fit)

Mangchi is designed to compose with other Claude Code tools in a natural workflow:

| Stage | Tool | Role |
| --- | --- | --- |
| Decide | deliberation tools (e.g. triad) | Multi-perspective design review before coding |
| Build | code-generation plugins (e.g. pumasi) | Parallel greenfield implementation |
| Harden | mangchi (this) | Single-file iterative cross-model review |
| Verify | existing review/test runners | Final gate before merge |

Pick the one that matches the stage you're in. Mangchi specifically targets the "code exists, needs to be better" gap.

Install

Prerequisites

# Codex CLI (the reviewer)
npm install -g @openai/codex
codex login    # or set OPENAI_API_KEY

# Claude Code (the orchestrator)
# See https://docs.claude.com/en/docs/claude-code

Plugin install

mangchi is distributed through the haroom_plugins aggregator marketplace along with the other haroom plugins (ddaro, prism, triad).

# 1. Add the haroom_plugins marketplace (one time)
/plugin marketplace add https://github.com/minwoo-data/haroom_plugins.git

# 2. Install
/plugin install mangchi

Restart Claude Code after install. Upgrades are a single /plugin update - the aggregator pulls updates for every haroom plugin you have installed.

Usage

/mangchi src/services/auth.py                         # default: updated.* only, original untouched
/mangchi src/services/auth.py --apply=original        # also Edit the original file
/mangchi src/utils/hash.py --only-axes=correctness,security   # restrict axis rotation
/mangchi src/new_module.py --include-axes=necessity   # default 5 + necessity opt-in
/mangchi src/auth.py --start-axis=security            # R1 starts on security (not default correctness)
/mangchi src/parse.py --gate "pytest -x tests/"       # require external gate before CONVERGED
/mangchi src/x.py --no-verify                         # skip verify loop (adversarial guarantee lost)
/mangchi --continue src-services-auth-py              # resume an in-progress session (including aborted)
/mangchi --stop src-services-auth-py                  # force close with whatever's in state

Natural-language triggers also work (e.g. "refine with mangchi", "cross-model review"). Korean triggers are documented in the Korean README.

See skills/mangchi/references/usage.md for every flag and more examples.

The five axes (default) + necessity / robustness opt-in

| Axis | Question it asks | Default |
| --- | --- | --- |
| correctness | Does the code behave right on every input shape? | yes |
| security | What attack surface does this expose? | yes |
| readability | Can a contributor understand & change this in 6 months? | yes |
| performance | Where is it wasting I/O, memory, or cycles? | yes |
| design | Will this still be maintainable in a year? | yes |
| necessity | Is this new code necessary, or does existing infra cover it? (YAGNI) | opt-in via --include-axes=necessity |
| robustness | Does it survive concurrency, failure-recovery, data-integrity, and state-transition stress? | opt-in via --include-axes=robustness |

Each round uses exactly one axis. Adjacent rounds cannot repeat an axis - rotation is enforced so you don't get five "correctness" rounds in a row.
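
A minimal sketch of the rotation, assuming a plain round-robin over the default list (how the skill actually schedules axes, honors --start-axis, or filters with --only-axes may differ):

```bash
# Round r uses AXES[(r-1) % 5]; with more than one axis in play, adjacent
# rounds can never repeat. Illustrative only.
AXES=(correctness security readability performance design)

axis_for_round() {
  local round="$1"
  echo "${AXES[$(( (round - 1) % ${#AXES[@]} ))]}"
}

axis_for_round 1   # -> correctness
axis_for_round 3   # -> readability
```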

robustness is a 4-sub-axis runtime-failure probe - the reviewer must walk all four sub-axes (concurrency, failure & recovery, data integrity, state transitions) per round, or mark N/A with reason. It's orthogonal to correctness (which asks "does it meet its contract on the happy path?") - robustness asks "does it survive adversarial runtime conditions?". Opt-in because pure/stateless code has no realistic signal on these axes.

To enable both opt-in axes: --include-axes=necessity,robustness.

Termination (v2 schema)

Any of:

  1. Two consecutive verified rounds - all ACCEPTed issues actually touched a matching diff (locus ±5 rule) AND zero DISAGREE returns from Codex's verify pass.
  2. Two consecutive PASS + no_changes_suggested - requires correctness AND security to each have been executed at least once (gaming guard: "easy axes only" PASS streaks don't count; see the sketch after this list).
  3. R5 hard cap (prevents oscillation + token drain).
  4. --gate "<cmd>" exit 0 - external command passes (runs once before termination, or every round with --gate-every-round).
  5. User --stop.
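
A sketch of the gaming guard behind rule 2, with hypothetical argument names (mangchi's real state fields are not shown in this README):

```bash
# Rule 2 in miniature: two consecutive PASS verdicts only terminate the loop
# if the hard axes (correctness, security) have each run at least once.
pass_streak_converged() {
  local last="$1" prev="$2" ran_correctness="$3" ran_security="$4"
  [[ "$last" == "PASS" && "$prev" == "PASS" ]] || return 1
  [[ "$ran_correctness" == "yes" && "$ran_security" == "yes" ]] || return 1
  return 0
}

pass_streak_converged PASS PASS yes yes && echo "CONVERGED"    # prints CONVERGED
pass_streak_converged PASS PASS no  yes || echo "keep going"   # easy-axes-only streak
```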

Aborts (session can be resumed with --continue after manual arbitration):

  • Cumulative Codex tokens ≥ 500K
  • forced_accept_count ≥ --force-accept-threshold (default 1 = strict)
  • Codex YAML schema retry exhausted
  • Per-call context window exceeded (≥ 180K tokens)

Safety defaults

  • Original files are never modified without explicit --apply=original.
  • Default writes under docs/refinement/mangchi/<slug>/updated.*; namespaced to avoid collision with triad.
  • Verification loop (Phase 6) - every Claude REJECT is re-reviewed by Codex. DISAGREE carries the issue into the next round; two consecutive DISAGREE on the same issue flips to FORCED_ACCEPT (system-promoted, not Claude's choice).
  • ACCEPT diff verification (Phase 5) - each ACCEPTed issue's locus must actually be touched in git diff -w --numstat (±5 line fuzz; sketched after this list). No-op ACCEPTs are carried forward, not counted toward convergence.
  • REJECT requires citation - a file:LINE reference or test name in the reason is mandatory (hard validation error, never silently flipped to ACCEPT; sketched below).
  • Shell-injection-safe Codex calls - strict tempfile + stdin pattern; no argv interpolation. All dynamic prompt content is appended via cat <file> >> (never via unquoted variable expansion); see the sketch after this list.
  • Pre-flight guards - Bash 4+, file size (≤ 2000 LoC / ≤ 200KB unless --force), 2-PASS coverage warning if --only-axes excludes correctness or security.
  • Token budget - per-round (80K estimate cap, --force-round bypass), per-call (180K context window, hard abort), cumulative (150K warn, 500K abort).
  • Missing-Codex handling - interactive confirmation before falling back to self-review (which disables the verify loop + FORCED_ACCEPT). Non-interactive environments auto-abort unless --allow-self-review is passed.
  • Each round's Codex prompt, review response, and verify response are preserved as audit trail (round-N.prompt.txt, .codex.txt, .verify.txt). INDEX.md summarizes all rounds.
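
For the ACCEPT diff-verification bullet above, a sketch of the ±5 locus check. The README cites git diff -w --numstat, but numstat reports per-file change counts rather than line positions, so this sketch reads unified-diff hunk headers instead; treat it as the idea, not mangchi's actual parser:

```bash
# Does a changed hunk fall within ±5 lines of the issue's locus?
locus_touched() {
  local file="$1" locus="$2" fuzz=5 start count end
  while read -r start count; do
    count="${count:-1}"                 # "+12 @@" means a one-line hunk
    end=$(( start + count - 1 ))
    (( locus >= start - fuzz && locus <= end + fuzz )) && return 0
  done < <(git diff -w -U0 -- "$file" \
           | sed -n 's/^@@ -[0-9,]* +\([0-9]*\)[,]\{0,1\}\([0-9]*\) @@.*/\1 \2/p')
  return 1   # no-op ACCEPT: carried forward, not counted toward convergence
}
```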
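The REJECT-citation rule could be validated with something like the pattern below (entirely hypothetical; the skill's real schema check is not shown in this README):

```bash
# A REJECT reason must cite a file:LINE locus or a test name.
valid_reject_reason() {
  local reason="$1"
  [[ "$reason" =~ [A-Za-z0-9_./-]+:[0-9]+ ]] && return 0   # e.g. src/auth.py:42
  [[ "$reason" =~ test_[A-Za-z0-9_]+ ]] && return 0        # e.g. test_login_race
  return 1   # hard validation error -- never silently flipped to ACCEPT
}
```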
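And for the shell-injection bullet, a minimal sketch of the tempfile + stdin pattern. The file names and the codex exec - stdin form are my assumptions about the CLI's non-interactive mode, not mangchi's exact invocation:

```bash
# Build the Codex prompt in a tempfile; dynamic content is appended with cat,
# never expanded on a command line, so file contents cannot inject into argv.
target_file="src/services/auth.py"            # example path
prompt_file="$(mktemp)" || exit 1
trap 'rm -f "$prompt_file"' EXIT

cat axis-instructions.txt >> "$prompt_file"   # static review prompt (hypothetical file)
cat "$target_file"        >> "$prompt_file"   # file under review, byte-for-byte

# The prompt reaches Codex via stdin; nothing user-controlled touches argv.
codex exec - < "$prompt_file" > round-1.codex.txt
```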

Research backing

Cross-model code review is empirically supported. LLMs systematically underperform when reviewing their own output - Tsui et al. (2025) document a 64.5% self-correction blind spot across 14 open-source models (arXiv:2507.02778, NeurIPS 2025 LLM Evaluation Workshop). Gong et al. (2024) find the same pattern specifically for code security: models repair their own insecure code far less successfully than code produced by a different model (arXiv:2408.10495). Semgrep (2025) confirms the complementary-failure prediction in practice - Claude and Codex caught different vulnerability classes on 11 real-world Python web apps (Semgrep blog).

Mangchi operationalizes this: Claude edits, Codex reviews, rotating axes, round-delta accumulation, audit trail per round.

See skills/mangchi/RESEARCH.md for full quotes, links, and what these sources do NOT claim.

Evidence from real use

See skills/mangchi/CASE-STUDIES.md for concrete examples of real bugs caught on real projects - bug categories, accept rates, token costs, and honest limits.

License

MIT - see LICENSE.

Credits

  • Created by: haroom
  • Built on Claude Code and the OpenAI Codex CLI
  • Inspired by pumasi, which pioneered Claude-as-supervisor / Codex-as-worker patterns in Claude Code plugins
