One file. Four rules. In an 8-task pilot, Opus 4.7 + MYTHOS Harness won 7 of 8 tasks against base Opus 4.7. No SDK, no runtime — paste one Markdown file into your coding agent's system prompt.
MYTHOS Harness is a prompt doctrine you paste into a coding agent's system prompt. Four short rules push the agent to name the target before acting, tag claims as known / inferred / guessed, resist unrequested abstraction, and report what was actually run before declaring done.
It ships no model, no runtime, and no SDK — just a single Markdown file. The clean proof artifact: Opus 4.7 + MYTHOS Harness won 7 of 8 tasks (22.88 avg) vs base Opus 4.7 at 1 of 8 (21.00 avg) in an 8-task run (see benchmarks/). The average-score gap is modest; the task-win flip is the louder signal. Small-sample and setup-dependent — re-run on your own suite before relying on it.
Core 8-task run. Same task family, same single-judge setup, same environment. Bronze = Opus 4.7 + MYTHOS Harness. Slate = base Opus 4.7.
Most coding-agent failures in practice are not reasoning failures. They are:
- Acting on the wrong target (wrong file, wrong caller, wrong symptom).
- Upgrading a guessed claim into a confident one by repeating it.
- Building an abstraction the code did not need.
- Declaring done on unverified output.
MYTHOS Harness is four short sections that push back on exactly those four patterns. No config, no knobs. Copy the file, paste it in, done.
```shell
# Claude Code
cp mythos-harness.md ~/.claude/rules/

# Codex CLI, Hermes, OpenClaw, or any custom harness with a system-prompt file
cat mythos-harness.md >> your-system-prompt.md
```

That is the entire install. No SDK, no runtime, no lockfile, no build step. If it does not improve the next coding task you care about, remove the file and you are back where you started.
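If you want to confirm the copy landed before starting a session, a quick check works in any POSIX shell (the Claude Code rules path is the one used above):

```shell
# Optional sanity check: confirm the rule file is where Claude Code looks for it.
test -f ~/.claude/rules/mythos-harness.md \
  && echo "harness rule file installed" \
  || echo "rule file missing -- re-run the cp step"
```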
The public doctrine file is `mythos-harness.md`.
Drop the file into ~/.claude/rules/ (or your project's rules directory). It will load alongside your other rules. No restart required for most setups.
```shell
cp mythos-harness.md ~/.claude/rules/
```

Append the contents to your system prompt, or load it as an additional rule file if your harness supports rule layering.
Treat it as a system-prompt suffix. It is additive — it assumes your base prompt already covers tool use, safety posture, and working style.
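For a harness that takes a single system-prompt file, the suffix pattern can be sketched in one line (`base-system-prompt.md` is a placeholder for your existing prompt, not a file this repo ships):

```shell
# Concatenate your existing prompt with the doctrine, harness last,
# so MYTHOS reads as an additive suffix rather than a replacement.
cat base-system-prompt.md mythos-harness.md > combined-system-prompt.md
```

Ordering matters only in that the harness should come after your base rules; it assumes they already exist.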
No install script, no dependency tree, no lockfile. That is intentional.
Four short sections:
- Strategic depth — name the second-order effect, surface cheaper reframes, state the tradeoff you are accepting.
- Epistemic honesty — tag claims as known / inferred / guessed; unverified symbols stay unverified until checked this session.
- Abstraction restraint — prefer duplication over an abstraction you cannot name precisely; new interfaces need two real callers.
- Clean execution — finish or revert; before declaring done, list what was run, what was observed, what was not checked.
Full text: `mythos-harness.md`.
The doctrine does not tell the agent what to build. It constrains how the agent reasons before and after the edit. That is why a four-rule file can move a benchmark: the failures it targets are behavioral, not intellectual.
There are two different benchmark stories in this package:
- Core harness benchmarks — the clean evidence for MYTHOS Harness itself versus base Opus 4.7.
- Combined-workflow pilot — a broader A1–A6 experiment involving base models, harnessed models, and dual-model executor/judge workflows.
The core harness benchmark is the main proof artifact for the doctrine itself.
Full writeups live in `benchmarks/`.
Core harness benchmarks — Opus 4.7 + MYTHOS Harness vs base Opus 4.7 (the main story):
| Condition | Total score | Avg / task | Task wins |
|---|---|---|---|
| Opus 4.7 base | 168 | 21.00 | 1 / 8 |
| Opus 4.7 + MYTHOS Harness | 183 | 22.88 | 7 / 8 |
Combined-workflow pilot (separate context, not the core harness proof):
| Arm | Configuration | Avg | Wins |
|---|---|---|---|
| A1 | Opus 4.7 base | 21.875 | 1 |
| A2 | GPT-5.4 base | 15.812 | 0 |
| A3 | Opus 4.7 + MYTHOS Harness | 23.062 | 5 |
| A4 | GPT-5.4 + MYTHOS Harness | 16.312 | 0 |
| A5 | GPT-5.4 + MYTHOS Harness executor + Opus judge | 22.312 | 2 |
| A6 | Opus executor + GPT-5.4 judge | 22.875 | 4 |
Honest read of the numbers:
- The core harness benchmarks are the strongest evidence that MYTHOS Harness improves Opus 4.7 on the measured task sets.
- The combined-workflow pilot is useful context, but it is a broader systems experiment, not the cleanest proof artifact for the harness itself.
- A dual-model executor+judge arrangement (A6) beat plain base Opus, but did not beat Opus 4.7 + MYTHOS Harness on this pilot.
- The win is real but modest. This package supports a narrow claim of observed gains for Opus 4.7 + MYTHOS Harness in these small-sample runs, not a statistical or state-of-the-art claim.
- Re-run it on your own suite if you want stronger confidence.
MYTHOS Harness is a plain Markdown rule file. It is naturally compatible with:
- Claude Code and other Anthropic-style coding-agent workflows
- Codex CLI and OpenAI-style coding harnesses
- Hermes and OpenClaw style orchestrators that layer additive rule files
- Custom orchestrators running Opus, Sonnet, GPT-class models, or other capable coding agents
It is not tied to one model family. Opus 4.7 is simply where the strongest measured evidence lives right now.
- Not a model. This repository ships a prompt doctrine, not Anthropic's Mythos Preview model. See `DISCLAIMER.md` for the naming clarification.
- Not an evaluation framework. It is four pages of constraints. Measurement lived in a separate pilot harness.
- Not a replacement for a good system prompt. It is additive. If your base prompt is weak, MYTHOS will not fix that.
- Small-sample pilot evidence only. The package summarizes two harness-favoring benchmark runs in `benchmarks/CORE-HARNESS-BENCHMARKS.md`, but there is still no statistical significance test, no public reproducibility package, and no third-party replication.
- Not affiliated with Anthropic, OpenAI, or any other company. See `DISCLAIMER.md`.
- Small surface. One file, four sections. If you cannot read it in two minutes, it is too big.
- Additive, not prescriptive. It does not override your existing rules on tool use, safety, or voice.
- Honest about epistemics. "Haven't checked X" beats "X should work." That rule applies to this README too.
- Forkable. MIT licensed. Rewrite the sections in your own voice if that fits your team better.
MIT.
Chosen over Apache-2.0 because:
- The contribution is text and methodology, not code with patent surface.
- MIT is shorter, universally understood, and maximally permissive for forking and embedding in proprietary prompts.
- We explicitly want people to copy, rewrite, and re-ship this doctrine inside their own agent frameworks without ceremony.
If you need Apache-2.0-style patent protection for your use case, fork and relicense your derivative — MIT allows it.
The fastest way to improve this is to fork it, rewrite one section in your own words, run it against a task suite you care about, and publish what you saw. If you change the doctrine in a way that clearly helps, we want to read it.
Issues, PRs, and "here is what broke on my setup" notes are all welcome.
- `mythos-harness.md` — the doctrine file.
- `README.md` — this file.
- `DISCLAIMER.md` — non-affiliation and benchmark-scope notice.
- `LICENSE` — MIT.
- `assets/` — benchmark graphics, behavior diagrams, and README visual assets.
- `benchmarks/CORE-HARNESS-BENCHMARKS.md` — harness-focused benchmark evidence.
- `benchmarks/PILOT-A1-A6.md` — separate combined-workflow pilot.
- `benchmarks/PILOT-A1-A6-summary.json` — machine-readable summary of the combined-workflow pilot.
- `site/` — optional dark landing page.