BitmapAsset/mythos-harness

# MYTHOS Harness

MYTHOS Harness — name the target, tag the uncertainty, finish or revert.

One file. Four rules. In an 8-task pilot, Opus 4.7 + MYTHOS Harness beat base Opus 4.7 on 7 of 8 tasks. No SDK, no runtime — paste one Markdown file into your coding agent's system prompt.

MYTHOS Harness is a prompt doctrine you paste into a coding agent's system prompt. Four short rules push the agent to name the target before acting, tag claims as known / inferred / guessed, resist unrequested abstraction, and report what was actually run before declaring done.

It ships no model, no runtime, and no SDK — just a single Markdown file. The clean proof artifact: Opus 4.7 + MYTHOS Harness won 7 of 8 tasks (22.88 avg) vs base Opus 4.7 at 1 of 8 (21.00 avg) in an 8-task run (see benchmarks/). The average-score gap is modest; the task-win flip is the louder signal. Small-sample and setup-dependent — re-run on your own suite before relying on it.

*Figure: benchmark summary — base Opus 4.7 at 1 of 8 task wins, Opus 4.7 + MYTHOS Harness at 7 of 8, with average score bars at 21.00 and 22.88. Core 8-task run: same task family, same single-judge setup, same environment. Bronze = Opus 4.7 + MYTHOS Harness; slate = base Opus 4.7.*


## Why you might want this

Most coding-agent failures in practice are not reasoning failures. They are:

  • Acting on the wrong target (wrong file, wrong caller, wrong symptom).
  • Upgrading a guessed claim into a confident one by repeating it.
  • Building an abstraction the code did not need.
  • Declaring done on unverified output.

MYTHOS Harness is four short sections that push back on exactly those four patterns. No config, no knobs. Copy the file, paste it in, done.


## Quick start (30 seconds)

```bash
# Claude Code
cp mythos-harness.md ~/.claude/rules/

# Codex CLI, Hermes, OpenClaw, or any custom harness with a system-prompt file
cat mythos-harness.md >> your-system-prompt.md
```

That is the entire install. No SDK, no runtime, no lockfile, no build step. If it does not improve the next coding task you care about, remove the file and you are back where you started.


## Install

The public doctrine file is `mythos-harness.md`.

### Claude Code

Drop the file into ~/.claude/rules/ (or your project's rules directory). It will load alongside your other rules. No restart required for most setups.

```bash
cp mythos-harness.md ~/.claude/rules/
```

### Codex CLI / generic agent harnesses

Append the contents to your system prompt, or load it as an additional rule file if your harness supports rule layering.

### Custom orchestrator

Treat it as a system-prompt suffix. It is additive — it assumes your base prompt already covers tool use, safety posture, and working style.
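As a sketch, the suffix pattern is just concatenation at prompt-build time. The function name, paths, and the commented client call below are hypothetical, for illustration only — nothing here is a shipped API:

```python
from pathlib import Path

def build_system_prompt(base_prompt: str, doctrine_path: str) -> str:
    """Append the MYTHOS Harness doctrine as a system-prompt suffix.

    The doctrine is additive, so it goes after the base prompt, which is
    assumed to already cover tool use, safety posture, and working style.
    """
    doctrine = Path(doctrine_path).read_text(encoding="utf-8")
    return base_prompt.rstrip() + "\n\n" + doctrine

# Hypothetical usage -- the client object and its call are placeholders:
# system_prompt = build_system_prompt(BASE_PROMPT, "mythos-harness.md")
# client.chat(system=system_prompt, messages=[...])
```

Keeping the doctrine last means it layers on top of your base rules rather than replacing them, which matches the "additive, not prescriptive" design below.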

No install script, no dependency tree, no lockfile. That is intentional.


## What it contains

Four short sections:

  1. **Strategic depth** — name the second-order effect, surface cheaper reframes, state the tradeoff you are accepting.
  2. **Epistemic honesty** — tag claims as known / inferred / guessed; unverified symbols stay unverified until checked this session.
  3. **Abstraction restraint** — prefer duplication over an abstraction you cannot name precisely; new interfaces need two real callers.
  4. **Clean execution** — finish or revert; before declaring done, list what was run, what was observed, what was not checked.
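Rules 2 and 4 compose into a closing report. A hypothetical example of what that can look like in an agent's final message — the layout is illustrative; the doctrine mandates the tags and the run report, not this exact format:

```text
Done report:
- Ran: pytest tests/test_parser.py -> 12 passed            [known]
- The CLI entry point imports the same tokenizer, so the
  fix likely applies there too                             [inferred]
- Windows line endings should be unaffected                [guessed]
- Not checked: behavior on Python 3.9
```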

Full text: `mythos-harness.md`.

## How the four rules reshape agent behavior

*Figure: before/after chart of how MYTHOS Harness changes coding-agent behavior across target selection, uncertainty tagging, abstraction restraint, and execution discipline.*

The doctrine does not tell the agent what to build. It constrains how the agent reasons before and after the edit. That is why a four-rule file can move a benchmark: the failures it targets are behavioral, not intellectual.


## Measured results (summary)

There are two different benchmark stories in this package:

  1. Core harness benchmarks — the clean evidence for MYTHOS Harness itself versus base Opus 4.7.
  2. Combined-workflow pilot — a broader A1–A6 experiment involving base models, harnessed models, and dual-model executor/judge workflows.

The core harness benchmark is the main proof artifact for the doctrine itself.

Full writeups are in the benchmarks/ directory.

Core harness benchmarks — Opus 4.7 + MYTHOS Harness vs base Opus 4.7 (the main story):

| Condition | Total score | Avg / task | Task wins |
| --- | --- | --- | --- |
| Opus 4.7 base | 168 | 21.00 | 1 / 8 |
| Opus 4.7 + MYTHOS Harness | 183 | 22.88 | 7 / 8 |
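As a quick arithmetic check, the per-task averages follow directly from the totals over the 8 tasks:

```python
tasks = 8
base_total, harness_total = 168, 183

print(base_total / tasks)               # 21.0
print(round(harness_total / tasks, 2))  # 22.88 (exactly 22.875 before rounding)
```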

Combined-workflow pilot (separate context, not the core harness proof):

| Arm | Configuration | Avg | Wins |
| --- | --- | --- | --- |
| A1 | Opus 4.7 base | 21.875 | 1 |
| A2 | GPT-5.4 base | 15.812 | 0 |
| A3 | Opus 4.7 + MYTHOS Harness | 23.062 | 5 |
| A4 | GPT-5.4 + MYTHOS Harness | 16.312 | 0 |
| A5 | GPT-5.4 + MYTHOS Harness executor + Opus judge | 22.312 | 2 |
| A6 | Opus executor + GPT-5.4 judge | 22.875 | 4 |

Honest read of the numbers:

  • The core harness benchmarks are the strongest evidence that MYTHOS Harness improves Opus 4.7 on the measured task sets.
  • The combined-workflow pilot is useful context, but it is a broader systems experiment, not the cleanest proof artifact for the harness itself.
  • A dual-model executor+judge arrangement (A6) beat plain base Opus, but did not beat Opus 4.7 + MYTHOS Harness in this pilot.
  • The win is real but modest. This package supports a narrow claim of observed gains for Opus 4.7 + MYTHOS Harness in these small-sample runs, not a statistical or state-of-the-art claim.
  • Re-run it on your own suite if you want stronger confidence.

## Works with Claude Code, Codex, Hermes, OpenClaw, and custom agent harnesses

MYTHOS Harness is a plain Markdown rule file. It is naturally compatible with:

  • Claude Code and other Anthropic-style coding-agent workflows
  • Codex CLI and OpenAI-style coding harnesses
  • Hermes and OpenClaw style orchestrators that layer additive rule files
  • Custom orchestrators running Opus, Sonnet, GPT-class models, or other capable coding agents

It is not tied to one model family. Opus 4.7 is simply where the strongest measured evidence lives right now.

*Figure: MYTHOS Harness representation art — a named target vector entering a bronze reticle, uncertainty tags, and a grounded execution baseline.*


## What this is not

  • Not a model. This repository ships a prompt doctrine, not Anthropic's Mythos Preview model. See DISCLAIMER.md for the naming clarification.
  • Not an evaluation framework. It is four pages of constraints. Measurement lived in a separate pilot harness.
  • Not a replacement for a good system prompt. It is additive. If your base prompt is weak, MYTHOS will not fix that.
  • Small-sample pilot evidence only. The package summarizes two harness-favoring benchmark runs in benchmarks/CORE-HARNESS-BENCHMARKS.md, but there is still no statistical significance test, no public reproducibility package, and no third-party replication.
  • Not affiliated with Anthropic, OpenAI, or any other company. See DISCLAIMER.md.

## Design principles

  • Small surface. One file, four sections. If you cannot read it in two minutes, it is too big.
  • Additive, not prescriptive. It does not override your existing rules on tool use, safety, or voice.
  • Honest about epistemics. "Haven't checked X" beats "X should work." That rule applies to this README too.
  • Forkable. MIT licensed. Rewrite the sections in your own voice if that fits your team better.

## License

MIT.

Chosen over Apache-2.0 because:

  • The contribution is text and methodology, not code with patent surface.
  • MIT is shorter, universally understood, and maximally permissive for forking and embedding in proprietary prompts.
  • We explicitly want people to copy, rewrite, and re-ship this doctrine inside their own agent frameworks without ceremony.

If you need Apache-2.0-style patent protection for your use case, fork and relicense your derivative — MIT allows it.


## Fork, rewrite, ship

The fastest way to improve this is to fork it, rewrite one section in your own words, run it against a task suite you care about, and publish what you saw. If you change the doctrine in a way that clearly helps, we want to read it.

Issues, PRs, and "here is what broke on my setup" notes are all welcome.


## Files in this package