`codeaudit` — A Universal Bug-Audit Skill for AI Coding Agents

A single, portable, cross-agent skill that audits any git repository for logical bugs, typos, formula/calculation mistakes, inconsistencies, runtime errors, and compile-time/type/syntax errors — and writes a categorized BUG_REPORT.md. Works in Claude Code, OpenCode, Codex CLI, and any other agent that follows the open Agent Skills standard.

The problem

Every modern CLI coding agent — Claude Code, OpenCode, Codex CLI, Cursor, Gemini CLI, Copilot — already has a notion of a "skill", "slash command", or "agent". And every senior engineer has, at some point, asked an agent "find bugs in the codebase" and gotten back a shallow, hand-wavy answer that misses the categories that actually matter: typos that silently fail comparisons, formulas that compute the wrong number, near-duplicate functions that have drifted apart, code that compiles but blows up at runtime.

The problem is that bug-hunting quality scales directly with how carefully the model reasons, and most agents default to "good enough" reasoning unless you explicitly ask for the deepest mode. Worse, a less capable model will silently drop findings it can't hold in context, and return a report that looks complete but isn't.

This repo is the build/audit harness for a single skill — find-bugs — that fixes both problems:

It forces maximum reasoning depth via three independent levers (frontmatter effort: max + the trigger word ultrathink + explicit in-body instructions), so a plain request like "find bugs in the codebase" gets the same depth as if you'd typed ultrathink yourself.
It externalizes the audit's working state to a scratch directory (.bugaudit/) using a fixed JSONL schema, so a weaker model can audit a 200-file repo without losing track of earlier findings.

The result: a reproducible, file-line-precise BUG_REPORT.md for any repository, regardless of which CLI agent is running the audit.

What this repo is

This repo is not a typical code project. It is the build harness for the find-bugs skill itself. After BUILD_GUIDE.md runs, the deliverable is the find-bugs skill, placed under .agents/skills/, .claude/skills/, .opencode/skill/, and .opencode/skills/, plus a short pointer in AGENTS.md.

┌──────────────────────────────────────────────────────────────┐
│                    THIS REPO (codeaudit)                     │
│                                                              │
│   ┌─────────────┐  ┌──────────────┐  ┌──────────────────┐    │
│   │  SPEC.md    │  │ BUILD_GUIDE  │  │ PROMPT_FOR_AGENT │    │
│   │  (the why)  │  │  (the what)  │  │  (the prompt)    │    │
│   └──────┬──────┘  └──────┬───────┘  └────────┬─────────┘    │
│          │                │                   │              │
│          ▼                ▼                   ▼              │
│   ┌────────────────────────────────────────────────────┐     │
│   │   After running BUILD_GUIDE.md Steps 1-7:         │     │
│   │   .agents/skills/find-bugs/SKILL.md  (canonical)  │     │
│   │   .agents/skills/find-bugs/references/bug-taxonomy │     │
│   │   .agents/skills/find-bugs/assets/report_template │     │
│   │   .agents/skills/find-bugs/scripts/run_static_*   │     │
│   │   + 3 byte-identical mirrors                      │     │
│   │   + AGENTS.md pointer block                       │     │
│   └────────────────────────────────────────────────────┘     │
│                                                              │
│   The skill then audits any repo it's pointed at.            │
└──────────────────────────────────────────────────────────────┘

If you only want to use the skill, copy the four files in .agents/skills/find-bugs/ (plus the three mirrors and the AGENTS.md block) to your repo. If you want to build/modify the skill, edit SPEC.md and BUILD_GUIDE.md together — they form the design and implementation contract.

How it works

You (any CLI coding agent)            codeaudit skill
┌────────────┐                          ┌────────────────────┐
│  user:     │   "find bugs in the     │  SKILL.md (loaded) │
│ "find bugs │ ── codebase" ────────▶  │  + references/     │
│  in the    │                          │  + assets/         │
│ codebase"  │                          │  + scripts/        │
└────────────┘                          └─────────┬──────────┘
                                                  │
                                                  ▼
                              ┌─────────────────────────────────┐
                              │  Phase 0  Setup                 │  .bugaudit/
                              │  Phase 1  Inventory & triage    │  ├── inventory.md
                              │  Phase 2  Static analysis       │  ├── static-analysis.md
                              │  Phase 3  Per-file review       │  ├── findings.jsonl
                              │  Phase 4  Cross-file sweep      │  └── notes.md
                              │  Phase 5  Formula verification  │
                              │  Phase 6  Triage & dedup        │
                              │  Phase 7  Write BUG_REPORT.md   │  ┌───────────────┐
                              │  Phase 8  Chat summary          │  │ BUG_REPORT.md │
                              └─────────────────────────────────┘  └───────────────┘

When the skill triggers, it:

Creates a .bugaudit/ scratch directory in the target repo (the one being audited, not this one) and never touches any source file there.
Enumerates the target repo's files and classifies them into Tier 1/2/3 (entry points + calculation-shaped files get deep-reviewed; tests and generated code are skimmed).
Runs read-only static analysis (the bundled run_static_checks.sh script detects the toolchain — Node, Python, Go, Rust, Java, .NET, PHP, Ruby — and runs whatever checkers are installed).
Walks every Tier 1/2 file against the six bug categories, appending one JSON line per finding to findings.jsonl.
Performs cross-file consistency checks (drifted duplicates, mismatched API contracts, stale docs).
Verifies any arithmetic with business/scientific meaning term-by-term against the intended formula.
Triages and deduplicates the findings, assigns each a severity, then writes BUG_REPORT.md using assets/report_template.md as the skeleton.
Gives you a 1-paragraph chat summary (counts by severity + the single most important issue).

The whole thing is read-only against the audited repo by design.
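The Tier 1/2/3 classification in the inventory step can be pictured with a small sketch. The pattern lists and directory names below are hypothetical placeholders chosen for illustration; the real rules live in SKILL.md Phase 1 and may differ.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical patterns for illustration only; not taken from SKILL.md.
TIER1_PATTERNS = ["main.*", "index.*", "app.*", "*calc*", "*pricing*", "*billing*"]
TIER3_PATTERNS = ["test_*", "*_test.*", "*.test.*", "*.spec.*", "*.min.js", "*.generated.*"]
TIER3_DIRS = {"tests", "test", "__tests__", "node_modules", "vendor", "dist", "build"}


def classify(path: str) -> int:
    """Return 1, 2, or 3 for a repo-relative POSIX path."""
    p = PurePosixPath(path)
    name = p.name.lower()
    parent_dirs = {part.lower() for part in p.parts[:-1]}
    # Tests and generated code are only skimmed, so Tier 3 is checked first.
    if parent_dirs & TIER3_DIRS:
        return 3
    if any(fnmatchcase(name, pat) for pat in TIER3_PATTERNS):
        return 3
    # Entry points and calculation-shaped files get the deep review.
    if any(fnmatchcase(name, pat) for pat in TIER1_PATTERNS):
        return 1
    # Everything else is ordinary source code.
    return 2
```

Checking Tier 3 before Tier 1 is a design choice in this sketch: a file such as app.test.ts looks like an entry point by name but is still a test.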

What it finds

The taxonomy has six categories, mapped 1:1 to the findings.jsonl schema and to the six section headers in BUG_REPORT.md:

#	Category	Definition (short)
1	Logical	Control flow / boolean logic doesn't match the intent implied by naming, comments, or surrounding code.
2	Typos	A misspelled literal, key, identifier, or pattern that should byte-for-byte match another one but doesn't, causing a silent mismatch.
3	Formula / Calculation	Arithmetic, statistical, or unit-conversion logic that runs fine but computes the wrong number.
4	Inconsistencies	Two or more places that should agree (behavior, format, naming, validation, docs) but have drifted apart.
5	Runtime	Code that parses/compiles fine but can throw, crash, hang, or behave unsafely on realistic inputs.
6	Compile-time / Syntax / Type	The code would fail to compile/parse or be rejected by a type checker.

Full checklists and worked examples for each category are in .agents/skills/find-bugs/references/bug-taxonomy.md.

Severity rubric

Severity	Meaning
Critical	Crashes, data loss/corruption, security issue, or completely wrong output on a common/primary path.
High	Incorrect results or failures on common inputs/paths — materially wrong behavior, not necessarily a crash.
Medium	Wrong only on edge cases/uncommon inputs, or a real inconsistency likely to cause a bug later.
Low	Cosmetic — typos in comments/logs/UI copy, minor style inconsistencies, very-low-probability edge cases.

confidence (high/medium/low) is tracked separately from severity — a finding can be severe but uncertain (goes to the "Possible Issues" appendix) or minor but certain (goes in the main tables as Low).
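As a rough illustration of keeping the two axes separate, the sketch below splits findings into the main tables and the "Possible Issues" appendix. It assumes lowercase `severity` and `confidence` string fields, and it assumes that only `confidence: low` findings go to the appendix; the README does not say where medium-confidence findings land, so that part is a guess.

```python
SEVERITY_ORDER = ["critical", "high", "medium", "low"]


def split_for_report(findings):
    """Return (main, possible): main is sorted by severity, possible keeps input order."""
    main, possible = [], []
    for finding in findings:
        # Assumption: only low-confidence findings are routed to the appendix.
        if finding["confidence"] == "low":
            possible.append(finding)
        else:
            main.append(finding)
    # Severity decides ordering inside the main tables, never the routing.
    main.sort(key=lambda f: SEVERITY_ORDER.index(f["severity"]))
    return main, possible
```

A critical finding with low confidence therefore ends up in the appendix, while a low-severity finding with high confidence stays in the main tables.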

Repository layout

codeaudit/
├── README.md                          ← you are here
├── AGENTS.md                          ← short pointer block (always-loaded context)
├── SPEC.md                            ← design rationale (the "why")
├── BUILD_GUIDE.md                     ← runnable shell that builds the skill
├── PROMPT_FOR_AGENT.md                ← prompt to hand to a coding agent
├── BUG_REPORT.md                      ← self-audit report of this repo
│
├── .agents/skills/find-bugs/          ← canonical skill (4 files)
│   ├── SKILL.md                       ← main instructions, YAML frontmatter
│   ├── references/
│   │   └── bug-taxonomy.md            ← full category checklist
│   ├── assets/
│   │   └── report_template.md         ← BUG_REPORT.md skeleton
│   └── scripts/
│       └── run_static_checks.sh       ← read-only multi-toolchain runner
│
├── .claude/skills/find-bugs/          ← mirror (Claude Code discovers here)
├── .opencode/skill/find-bugs/         ← mirror (OpenCode singular form)
└── .opencode/skills/find-bugs/        ← mirror (OpenCode plural form)

The three mirror directories under .claude/ and .opencode/ are byte-identical to the canonical .agents/skills/find-bugs/ (verified by diff -r after every build). They are present because the open Agent Skills standard does not yet mandate a single scan path — different agents and different OpenCode releases have used both singular and plural forms. Mirroring the same folder into all of them is the safest cross-tool contract.
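For environments where diff -r is not available, the same check can be approximated in Python. This is a minimal sketch, not part of the repo: the function names are made up here, and unlike diff -r it ignores empty directories.

```python
import filecmp
from pathlib import Path

CANONICAL = ".agents/skills/find-bugs"
MIRRORS = [
    ".claude/skills/find-bugs",
    ".opencode/skill/find-bugs",
    ".opencode/skills/find-bugs",
]


def trees_identical(a: Path, b: Path) -> bool:
    """True when both trees hold the same relative files with equal bytes."""
    files_a = sorted(p.relative_to(a) for p in a.rglob("*") if p.is_file())
    files_b = sorted(p.relative_to(b) for p in b.rglob("*") if p.is_file())
    if files_a != files_b:
        return False
    # shallow=False forces a byte-by-byte comparison instead of trusting os.stat.
    return all(filecmp.cmp(a / rel, b / rel, shallow=False) for rel in files_a)


def stale_mirrors(root: Path) -> list[str]:
    """Return the mirror paths that are missing or differ from the canonical copy."""
    canonical = root / CANONICAL
    return [
        mirror
        for mirror in MIRRORS
        if not (root / mirror).is_dir() or not trees_identical(canonical, root / mirror)
    ]
```

An empty list from stale_mirrors(Path(".")) corresponds to diff -r printing nothing for all three mirrors.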

Quick start

I just want to use the skill on my own repo

Copy this whole directory (or at minimum the four .agents/skills/find-bugs/ files) to the root of your target repo.
Open your CLI coding agent in that repo.
Say "find bugs in the codebase" — or invoke explicitly with /find-bugs (Claude Code, OpenCode) or $find-bugs (Codex CLI).
A BUG_REPORT.md will be written to your repo root when the audit finishes.

I want to build the skill from scratch in a clean repo

Copy SPEC.md, BUILD_GUIDE.md, and PROMPT_FOR_AGENT.md to the target repo.
Open your CLI coding agent in that repo.
Paste the contents of PROMPT_FOR_AGENT.md (everything below the --------------- divider) as your prompt.
The agent will execute BUILD_GUIDE.md Steps 1-7 in order, then run Step 8's verification commands and report PASS/FAIL.

I want to modify the skill

Edit SPEC.md first (it's the rationale). Update the section that describes the change.
Edit BUILD_GUIDE.md to match (every shell code block in Steps 1-7 is the literal final content to write to disk).
Re-run BUILD_GUIDE.md Steps 1-7 to regenerate the skill.
Re-run Step 6 to re-mirror the canonical copy to .claude/, .opencode/skill/, and .opencode/skills/.

The 8-phase audit pipeline

┌────────────┐
│  Phase 0   │  Setup — create .bugaudit/ scratch dir (and gitignore it)
└─────┬──────┘
      ▼
┌────────────┐
│  Phase 1   │  Inventory & triage — Tier 1/2/3 classification
└─────┬──────┘
      ▼
┌────────────┐
│  Phase 2   │  Static analysis — run read-only checkers (tsc, ruff, etc.)
└─────┬──────┘
      ▼
┌────────────┐
│  Phase 3   │  Per-file deep review — every Tier 1/2 file × 6 categories
└─────┬──────┘
      ▼
┌────────────┐
│  Phase 4   │  Cross-file consistency sweep — signature mismatches,
└─────┬──────┘  drifted duplicates, stale docs, API-contract gaps
      ▼
┌────────────┐
│  Phase 5   │  Formula & calculation verification — derive the intended
└─────┬──────┘  formula symbolically, compare term-by-term
      ▼
┌────────────┐
│  Phase 6   │  Triage, dedup, severity — merge same-root-cause findings,
└─────┬──────┘  confirm severities, drop false positives
      ▼
┌────────────┐
│  Phase 7   │  Write BUG_REPORT.md — fill assets/report_template.md
└─────┬──────┘
      ▼
┌────────────┐
│  Phase 8   │  Chat summary — counts by severity + top issue, no re-paste
└────────────┘

The full details of each phase live in .agents/skills/find-bugs/SKILL.md. The full checklists for each category are in .agents/skills/find-bugs/references/bug-taxonomy.md.

Design choices worth knowing about

1. Common-denominator `SKILL.md` format

Anthropic's Agent Skills docs, OpenAI's Codex Skills docs, and third-party compatibility surveys all converge on the same minimum viable SKILL.md: a YAML frontmatter block with name and description, followed by Markdown instructions. Anything beyond that (allowed-tools, context: fork, hooks, effort) is tool-specific, and agents that don't support a given field safely ignore it. So the skill writes one canonical SKILL.md using only the universal fields plus effort: max (Claude Code's adaptive-effort system) and mirrors it everywhere.
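A quick way to sanity-check that minimum shape is sketched below. It is deliberately not a YAML parser: it only reads flat `key: value` lines between the leading `---` markers and skips indented continuation lines, so it will accept files that a strict parser rejects. The function names are illustrative.

```python
def read_frontmatter(text: str) -> dict:
    """Parse flat `key: value` pairs between the leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank line or comment
        if line[0] in " \t":
            continue  # nested or continuation line, ignored in this sketch
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        fields[key.strip()] = value.strip().strip("'\"")
    raise ValueError("frontmatter is missing its closing '---' line")


def missing_fields(text: str, required=("name", "description")) -> list:
    """Return the required keys that are absent or empty."""
    fields = read_frontmatter(text)
    return [key for key in required if not fields.get(key)]
```

Passing required=("name", "description", "effort") would cover the extra Claude Code field as well.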

2. Progressive disclosure

SKILL.md itself is kept well under the ~5,000-word guidance so the model isn't overwhelmed when the skill loads. The 6,000-word bug checklist lives in references/bug-taxonomy.md (loaded in Phase 3). The report skeleton lives in assets/report_template.md. The static-analysis runner lives in scripts/run_static_checks.sh. SKILL.md is the orchestration layer.

3. Externalize state, don't trust working memory

A weaker model auditing a 200-file repo will lose track of earlier findings if it has to hold them all in context. The skill writes everything to .bugaudit/ as it goes:

.bugaudit/
├── inventory.md           # Phase 1: file tiers + counts
├── static-analysis.md     # Phase 2: tool output
├── findings.jsonl         # Phases 3-5: one JSON line per finding
└── notes.md               # running progress log

Phase 6 (triage/dedup) becomes a mechanical "read all lines, group by root cause" operation instead of relying on recall.
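That mechanical step might look roughly like the sketch below. The actual findings.jsonl schema is defined in SKILL.md and is not reproduced in this README, so the `root_cause`, `file`, and `line` field names are assumptions made for illustration.

```python
import json
from collections import defaultdict


def load_findings(jsonl_text: str) -> list:
    """Parse one JSON object per non-blank line."""
    findings = []
    for number, line in enumerate(jsonl_text.splitlines(), 1):
        if not line.strip():
            continue
        try:
            findings.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {number} is not valid JSON: {exc}") from exc
    return findings


def group_by_root_cause(findings: list) -> dict:
    """Group findings that share a root cause; fall back to file:line when none is given."""
    groups = defaultdict(list)
    for finding in findings:
        # Hypothetical fields: root_cause, file, line.
        key = finding.get("root_cause") or f"{finding['file']}:{finding['line']}"
        groups[key].append(finding)
    return dict(groups)
```

Because every finding is already on disk, the grouping needs nothing from the model's context beyond the file itself.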

4. Three independent levers for "thinking mode max"

Layer	Mechanism	Applies to
1	`effort: max` in `SKILL.md` frontmatter	Claude Code's adaptive-effort system
2	Literal word `ultrathink` in §0	Claude Code's trigger-word preprocessing
3	Explicit in-body instructions	Any model, including DeepSeek-class and OpenCode

Why three? The skill is supposed to auto-trigger on a plain "find bugs in the codebase" — which won't contain any thinking keyword. The frontmatter and in-body instructions make the skill self-sufficient regardless of how it was invoked.

5. Read-only by default

A bug-finding skill that also starts editing files is a different (riskier) product. This skill explicitly never modifies source files. "Fix the bugs" is a deliberate follow-up task, not part of this skill.

Customization

Knob	Where to change it
Output filename / location	`SKILL.md` §1 (and `assets/report_template.md` if you want it reflected in the skeleton)
Large-repo threshold (~150 files)	`SKILL.md` Phase 1 step 5 — raise for faster/shallower, lower to force explicit "Methodology & Limitations" disclosure
Tier 1 patterns (entry points, calc-shaped files)	`SKILL.md` Phase 1 step 3 — add project-specific globs (e.g. monorepo package names)
Static-analysis coverage	`scripts/run_static_checks.sh` — add another `if [ -f <manifest> ]; then ... fi` block following the existing pattern
`findings.jsonl` schema	`SKILL.md` §3.3 — add fields like `tags: []` and update `assets/report_template.md` if they should surface in the report
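The manifest-based detection behind the static-analysis knob can be expressed as a lookup table. The sketch below is a Python analogue of that idea, not a copy of run_static_checks.sh: the manifest names are the conventional ones for each ecosystem, and which of them the bundled script actually checks is an assumption.

```python
from pathlib import Path

# Conventional manifest files per toolchain; the real script may check a different set.
MANIFESTS = {
    "package.json": "Node",
    "pyproject.toml": "Python",
    "requirements.txt": "Python",
    "go.mod": "Go",
    "Cargo.toml": "Rust",
    "pom.xml": "Java",
    "build.gradle": "Java",
    "composer.json": "PHP",
    "Gemfile": "Ruby",
}


def detect_toolchains(repo: Path) -> list[str]:
    """Return the sorted toolchains whose manifest exists at the repo root."""
    found = {tool for name, tool in MANIFESTS.items() if (repo / name).is_file()}
    # .NET projects are named after the project, so match by extension instead.
    if any(repo.glob("*.csproj")) or any(repo.glob("*.sln")):
        found.add(".NET")
    return sorted(found)
```

Adding coverage for another toolchain is then one more dictionary entry, which mirrors adding one more `if [ -f <manifest> ]` block in the shell script.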

Limitations

Very large monorepos (thousands of files) will hit the Tier 1/2 budget quickly. The skill is designed to disclose this in "Methodology & Limitations" rather than solve it. For true monorepos, scope invocations to one package at a time (the skill supports a user-specified path).
Formula verification (Phase 5) depends on domain inference — for highly specialized math (actuarial, cryptographic, ML-numerical) the model may not know the "correct" formula to compare against; such findings should naturally land at confidence: low.
Static-analysis script coverage is best-effort and assumes common CLI tool names/flags. Some projects use wrapper scripts (make lint, npm run check); the skill falls back to ad-hoc commands if the bundled script doesn't cover the project's toolchain.
No fix mode — by design. A natural follow-up skill (fix-bugs or apply-bug-fixes) could read BUG_REPORT.md and apply fixes one at a time with user confirmation. Out of scope here.

Relationship between SPEC, BUILD_GUIDE, and PROMPT

The three top-level files form a three-layer contract:

┌─────────────────────────────────────────────────────────────┐
│  SPEC.md  (the why)                                         │
│  ─────────                                                  │
│  Design rationale, compatibility matrix, severity rubric,   │
│  acceptance checklist. Read once when building or           │
│  modifying the skill.                                       │
└──────────────────────────┬──────────────────────────────────┘
                           │ informs
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  BUILD_GUIDE.md  (the what)                                 │
│  ────────────────                                           │
│  Literal, runnable shell blocks that write the skill to     │
│  disk exactly as given. Source of truth for file contents   │
│  and paths. Safe to re-run (idempotent: mkdir -p, cat >     │
│  overwrite, rm -rf before cp -r).                           │
└──────────────────────────┬──────────────────────────────────┘
                           │ executed by
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  PROMPT_FOR_AGENT.md  (the prompt)                           │
│  ──────────────────────                                     │
│  Short instruction to hand to a coding agent (e.g. DeepSeek │
│  V4 Flash) so it executes BUILD_GUIDE.md mechanically,      │
│  without reinterpreting or "improving" it.                   │
└─────────────────────────────────────────────────────────────┘

None of these three files are part of the skill itself — once built, the skill is just .agents/skills/find-bugs/ (+ mirrors + the AGENTS.md pointer). They can be deleted from the target repo after a successful build, or kept under e.g. docs/dev/ for future maintenance.

Running it on your own repo

# 1. Copy the skill into your repo (run from the codeaudit checkout).
#    Create the parent directories first so cp does not fail on a fresh repo.
mkdir -p YOUR_REPO/.agents/skills YOUR_REPO/.claude/skills YOUR_REPO/.opencode/skill YOUR_REPO/.opencode/skills
cp -r .agents/skills/find-bugs YOUR_REPO/.agents/skills/
cp -r .claude/skills/find-bugs YOUR_REPO/.claude/skills/
cp -r .opencode/skill/find-bugs YOUR_REPO/.opencode/skill/
cp -r .opencode/skills/find-bugs YOUR_REPO/.opencode/skills/

# 2. Append the AGENTS.md pointer to YOUR_REPO/AGENTS.md (the file is created if missing)
cat AGENTS.md >> YOUR_REPO/AGENTS.md

# 3. Open YOUR_REPO in your CLI agent and say:
#    "find bugs in the codebase"

When the audit finishes, you'll get:

YOUR_REPO/BUG_REPORT.md — the categorized report
YOUR_REPO/.bugaudit/ — the scratch directory (gitignore it)

Acceptance checklist

After running BUILD_GUIDE.md, confirm:

.agents/skills/find-bugs/SKILL.md exists and begins with valid YAML frontmatter containing name: find-bugs, a non-empty description, and effort: max.
references/bug-taxonomy.md and assets/report_template.md exist and are non-empty.
scripts/run_static_checks.sh exists, is executable (chmod +x applied), and runs cleanly against an empty directory.
.claude/skills/find-bugs/, .opencode/skill/find-bugs/, and .opencode/skills/find-bugs/ each contain the same four files — diff -r .agents/skills/find-bugs .claude/skills/find-bugs should print nothing.
AGENTS.md ends with the "Bug audit skill" block.
Smoke test: run the skill on a real or small test repo. Confirm BUG_REPORT.md is created with all six category headers present, a filled-in summary table, and a "Methodology & Limitations" section. Confirm .bugaudit/ was created and contains findings.jsonl.
No side effects: git status before and after — the only new paths should be BUG_REPORT.md, .bugaudit/ (if not gitignored yet), and whatever Step 6/7 added during the build.
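The "no side effects" item can be checked mechanically by filtering the output of `git status --porcelain`. The sketch below is one possible helper, with a made-up function name; it assumes the default (v1) porcelain format, where each line is a two-character status, a space, and then the path.

```python
def unexpected_paths(porcelain: str, allowed=("BUG_REPORT.md", ".bugaudit/")) -> list:
    """Return changed paths from `git status --porcelain` that are not in the allow-list."""
    unexpected = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # skip the two status characters and the separating space
        if " -> " in path:
            path = path.split(" -> ", 1)[1]  # renames are shown as "old -> new"
        path = path.strip('"')  # git quotes paths containing special characters
        is_allowed = any(
            path == entry or (entry.endswith("/") and path.startswith(entry))
            for entry in allowed
        )
        if not is_allowed:
            unexpected.append(path)
    return unexpected
```

After a build, the paths written by Steps 6 and 7 would need to be added to `allowed` before calling it.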

Optional deeper smoke test (seeded bugs)

If you want concrete proof each category is detected, create a throwaway file with one deliberate bug per category and re-run the skill against it. Examples:

Logical: if (count > 10) where the comment says "trigger when 10 or more" (should be >=).
Typo: compare status == "complete" against a constant defined as "completed".
Formula: total = price - price * 20 where 20 should be 0.20 (20% discount).
Inconsistency: two near-identical helper functions where only one handles a None/null name.
Runtime: items[0].name on a list that can be empty.
Compile/type: call a function with one fewer argument than its definition requires.

A correct run produces one finding per category referencing the right line.
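One way to build such a throwaway file is sketched below, using Python as the seeded language. The file name, function names, and `# SEEDED` markers are all hypothetical. Note that in Python the last bug is only caught by a type checker or at call time, since the interpreter compiles the file without complaint.

```python
from pathlib import Path

# Deliberately buggy source; each seeded line carries a marker comment.
SEEDED_SOURCE = '''\
COMPLETED = "completed"

def should_trigger(count):
    # Trigger when 10 or more.
    return count > 10                      # SEEDED logical: should be >=

def is_done(status):
    return status == "complete"            # SEEDED typo: constant is "completed"

def discounted(price):
    return price - price * 20              # SEEDED formula: 20% is 0.20

def greet_user(name):
    return "Hello, " + (name or "guest")

def greet_admin(name):
    return "Hello, " + name                # SEEDED inconsistency: no None handling

def first_name(items):
    return items[0].name                   # SEEDED runtime: items may be empty

def add(a, b):
    return a + b

def total():
    return add(1)                          # SEEDED compile/type: missing argument
'''


def seeded_lines(source: str = SEEDED_SOURCE) -> dict:
    """Map each seeded category to the 1-based line number of its bug."""
    lines = {}
    for number, line in enumerate(source.splitlines(), 1):
        if "# SEEDED " in line:
            category = line.split("# SEEDED ", 1)[1].split(":", 1)[0]
            lines[category] = number
    return lines


def write_seeded_file(directory: Path) -> Path:
    """Write the seeded file into `directory` and return its path."""
    target = directory / "seeded_bugs.py"
    target.write_text(SEEDED_SOURCE)
    return target
```

Comparing seeded_lines() with the line numbers in the generated BUG_REPORT.md gives a direct pass/fail per category. Stripping the marker comments before the audit makes the test less of a giveaway.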

Self-audit

This repo ran the find-bugs skill against itself. The result is BUG_REPORT.md — 1 critical, 3 high, 3 medium, 1 low finding (plus 3 low-confidence possible issues), all of which were fixed before pushing. See the report for the full list, including a real critical bug in the skill's own YAML frontmatter that a strict parser would have rejected.

License

MIT. See LICENSE (add one if missing).
