
muhammad-saadd/codeaudit


codeaudit — A Universal Bug-Audit Skill for AI Coding Agents

A single, portable, cross-agent skill that audits any git repository for logical bugs, typos, formula/calculation mistakes, inconsistencies, runtime errors, and compile-time/type/syntax errors — and writes a categorized BUG_REPORT.md. Works in Claude Code, OpenCode, Codex CLI, and any other agent that follows the open Agent Skills standard.





The problem

Every modern CLI coding agent — Claude Code, OpenCode, Codex CLI, Cursor, Gemini CLI, Copilot — already has a notion of a "skill", "slash command", or "agent". And every senior engineer has, at some point, asked an agent "find bugs in the codebase" and gotten back a shallow, hand-wavy answer that misses the categories that actually matter: typos that silently fail comparisons, formulas that compute the wrong number, near-duplicate functions that have drifted apart, code that compiles but blows up at runtime.

The problem is that bug-hunting quality scales directly with how carefully the model reasons, and most agents default to "good enough" reasoning unless you explicitly ask for the deepest mode. Worse, a less capable model will silently drop findings it can't hold in context, and return a report that looks complete but isn't.

This repo is the build/audit harness for a single skill — find-bugs — that fixes both problems:

  1. It forces maximum reasoning depth via three independent levers (frontmatter effort: max + the trigger word ultrathink + explicit in-body instructions), so a plain request like "find bugs in the codebase" gets the same depth as if you'd typed ultrathink yourself.
  2. It externalizes the audit's working state to a scratch directory (.bugaudit/) using a fixed JSONL schema, so a weaker model can audit a 200-file repo without losing track of earlier findings.

The result: a reproducible, file-line-precise BUG_REPORT.md for any repository, regardless of which CLI agent is running the audit.


What this repo is

This repo is not a typical code project. It is the build harness for the find-bugs skill itself. After BUILD_GUIDE.md runs, the deliverable is the find-bugs skill, placed under .agents/skills/, .claude/skills/, .opencode/skill/, and .opencode/skills/, plus a short pointer in AGENTS.md.

┌──────────────────────────────────────────────────────────────┐
│                    THIS REPO (codeaudit)                     │
│                                                              │
│   ┌─────────────┐  ┌──────────────┐  ┌──────────────────┐    │
│   │  SPEC.md    │  │ BUILD_GUIDE  │  │ PROMPT_FOR_AGENT │    │
│   │  (the why)  │  │  (the what)  │  │  (the prompt)    │    │
│   └──────┬──────┘  └──────┬───────┘  └────────┬─────────┘    │
│          │                │                   │              │
│          ▼                ▼                   ▼              │
│   ┌────────────────────────────────────────────────────┐     │
│   │   After running BUILD_GUIDE.md Steps 1-7:         │     │
│   │   .agents/skills/find-bugs/SKILL.md  (canonical)  │     │
│   │   .agents/skills/find-bugs/references/bug-taxonomy │     │
│   │   .agents/skills/find-bugs/assets/report_template │     │
│   │   .agents/skills/find-bugs/scripts/run_static_*   │     │
│   │   + 3 byte-identical mirrors                      │     │
│   │   + AGENTS.md pointer block                       │     │
│   └────────────────────────────────────────────────────┘     │
│                                                              │
│   The skill then audits any repo it's pointed at.            │
└──────────────────────────────────────────────────────────────┘

If you only want to use the skill, copy the four files in .agents/skills/find-bugs/ (plus the three mirrors and the AGENTS.md block) to your repo. If you want to build/modify the skill, edit SPEC.md and BUILD_GUIDE.md together — they form the design and implementation contract.


How it works

You (any CLI coding agent)            codeaudit skill
┌────────────┐                          ┌────────────────────┐
│  user:     │   "find bugs in the     │  SKILL.md (loaded) │
│ "find bugs │ ── codebase" ────────▶  │  + references/     │
│  in the    │                          │  + assets/         │
│ codebase"  │                          │  + scripts/        │
└────────────┘                          └─────────┬──────────┘
                                                  │
                                                  ▼
                              ┌─────────────────────────────────┐
                              │  Phase 0  Setup                 │  .bugaudit/
                              │  Phase 1  Inventory & triage    │  ├── inventory.md
                              │  Phase 2  Static analysis       │  ├── static-analysis.md
                              │  Phase 3  Per-file review       │  ├── findings.jsonl
                              │  Phase 4  Cross-file sweep      │  └── notes.md
                              │  Phase 5  Formula verification  │
                              │  Phase 6  Triage & dedup        │
                              │  Phase 7  Write BUG_REPORT.md   │  ┌───────────────┐
                              │  Phase 8  Chat summary          │  │ BUG_REPORT.md │
                              └─────────────────────────────────┘  └───────────────┘

When the skill triggers, it:

  1. Creates a .bugaudit/ scratch directory in the target repo (the one being audited, not this one) and never touches any source file there.
  2. Enumerates the target repo's files and classifies them into Tier 1/2/3 (entry points + calculation-shaped files get deep-reviewed; tests and generated code are skimmed).
  3. Runs read-only static analysis (the bundled run_static_checks.sh script detects the toolchain — Node, Python, Go, Rust, Java, .NET, PHP, Ruby — and runs whatever checkers are installed).
  4. Walks every Tier 1/2 file against the six bug categories, appending one JSON line per finding to findings.jsonl.
  5. Performs cross-file consistency checks (drifted duplicates, mismatched API contracts, stale docs).
  6. Verifies any arithmetic with business/scientific meaning term-by-term against the intended formula.
  7. Triages and deduplicates the findings and assigns severities, then writes BUG_REPORT.md using assets/report_template.md as the skeleton.
  8. Gives you a 1-paragraph chat summary (counts by severity + the single most important issue).

The whole thing is read-only against the audited repo by design.
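Step 2's tiering is essentially a path classifier. The sketch below is a hypothetical illustration, not the skill's actual logic (that lives in SKILL.md Phase 1): the function name classify_tier and every glob in it are assumptions chosen for the example. It shows the general shape, which is to check the skim-only patterns first so a generated or test file never gets promoted to deep review.

```shell
# Hypothetical Phase 1 classifier: prints 1, 2, or 3 for a repo-relative path.
# Every glob below is an illustrative assumption, not the list in SKILL.md.
classify_tier() {
  case "$1" in
    # Tier 3 (skim): tests, vendored, built, and generated code. Checked first.
    tests/*|*/tests/*|test/*|*/test/*|test_*|*/test_*|*_test.*|*.test.*|*.spec.*|\
    vendor/*|*/vendor/*|node_modules/*|*/node_modules/*|dist/*|*/dist/*|\
    generated/*|*/generated/*|*.generated.*|*.min.js|*_pb2.py)
      echo 3 ;;
    # Tier 1 (deep review): entry points and calculation-shaped files.
    main.*|*/main.*|index.*|*/index.*|cli.*|*/cli.*|\
    *calc*|*pric*|tax*|*/tax*|*billing*|*invoice*)
      echo 1 ;;
    # Tier 2: everything else is still reviewed file by file.
    *)
      echo 2 ;;
  esac
}

# Usage: count tracked files per tier in the repo being audited.
#   git ls-files | while IFS= read -r f; do classify_tier "$f"; done | sort | uniq -c
```

Note that tax* and */tax* are anchored to a path segment on purpose: a bare *tax* would also match syntax.py.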


What it finds

The taxonomy has six categories, mapped 1:1 to the findings.jsonl schema and to the six section headers in BUG_REPORT.md:

| # | Category | Definition (short) |
|---|----------|--------------------|
| 1 | Logical | Control flow / boolean logic doesn't match the intent implied by naming, comments, or surrounding code. |
| 2 | Typos | A misspelled literal, key, identifier, or pattern that should byte-for-byte match another one but doesn't, causing a silent mismatch. |
| 3 | Formula / Calculation | Arithmetic, statistical, or unit-conversion logic that runs fine but computes the wrong number. |
| 4 | Inconsistencies | Two or more places that should agree (behavior, format, naming, validation, docs) but have drifted apart. |
| 5 | Runtime | Code that parses/compiles fine but can throw, crash, hang, or behave unsafely on realistic inputs. |
| 6 | Compile-time / Syntax / Type | The code would fail to compile/parse or be rejected by a type checker. |

Full checklists and worked examples for each category are in .agents/skills/find-bugs/references/bug-taxonomy.md.
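Because each category maps 1:1 onto findings.jsonl, appending a finding can be a single validated line. This is a minimal sketch under stated assumptions: the helper name append_finding, the category keywords, and the field names are all hypothetical (the real schema is defined in SKILL.md §3.3), and the values are assumed to contain no quotes, backslashes, or newlines, since the sketch does no JSON escaping.

```shell
# Hypothetical category keywords, one per report section.
CATEGORIES="logical typo formula inconsistency runtime compile"

# Append one finding as one JSON line. Field names are illustrative only.
# No JSON escaping is done: values must not contain " \ or newlines.
append_finding() {
  local file="$1" category="$2" severity="$3" confidence="$4" location="$5" summary="$6"
  # Reject anything outside the six categories so every line maps to a section.
  case " $CATEGORIES " in
    *" $category "*) ;;
    *) echo "unknown category: $category" >&2; return 1 ;;
  esac
  printf '{"category":"%s","severity":"%s","confidence":"%s","location":"%s","summary":"%s"}\n' \
    "$category" "$severity" "$confidence" "$location" "$summary" >> "$file"
}

# Usage:
#   append_finding .bugaudit/findings.jsonl formula high high "src/cart.py:42" "discount uses 20, not 0.20"
```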

Severity rubric

| Severity | Meaning |
|----------|---------|
| Critical | Crashes, data loss/corruption, security issue, or completely wrong output on a common/primary path. |
| High | Incorrect results or failures on common inputs/paths: materially wrong behavior, not necessarily a crash. |
| Medium | Wrong only on edge cases/uncommon inputs, or a real inconsistency likely to cause a bug later. |
| Low | Cosmetic: typos in comments/logs/UI copy, minor style inconsistencies, very-low-probability edge cases. |

confidence (high/medium/low) is tracked separately from severity — a finding can be severe but uncertain (goes to the "Possible Issues" appendix) or minor but certain (goes in the main tables as Low).
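The two axes can be expressed as a small routing rule: confidence decides whether a finding reaches the main tables at all, and severity decides which table it lands in. A sketch, assuming medium-confidence findings stay in the main tables (the text above only says where low-confidence ones go); report_section and the "main:<severity>" output format are hypothetical.

```shell
# Hypothetical routing rule for BUG_REPORT.md sections.
# Assumption: only confidence=low is diverted; high and medium stay in the main tables.
report_section() {
  local severity="$1" confidence="$2"
  if [ "$confidence" = "low" ]; then
    echo "Possible Issues"   # uncertain findings go to the appendix, whatever their severity
    return 0
  fi
  case "$severity" in
    critical|high|medium|low) echo "main:$severity" ;;
    *) echo "invalid severity: $severity" >&2; return 1 ;;
  esac
}
```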


Repository layout

codeaudit/
├── README.md                          ← you are here
├── AGENTS.md                          ← short pointer block (always-loaded context)
├── SPEC.md                            ← design rationale (the "why")
├── BUILD_GUIDE.md                     ← runnable shell that builds the skill
├── PROMPT_FOR_AGENT.md                ← prompt to hand to a coding agent
├── BUG_REPORT.md                      ← self-audit report of this repo
│
├── .agents/skills/find-bugs/          ← canonical skill (4 files)
│   ├── SKILL.md                       ← main instructions, YAML frontmatter
│   ├── references/
│   │   └── bug-taxonomy.md            ← full category checklist
│   ├── assets/
│   │   └── report_template.md         ← BUG_REPORT.md skeleton
│   └── scripts/
│       └── run_static_checks.sh       ← read-only multi-toolchain runner
│
├── .claude/skills/find-bugs/          ← mirror (Claude Code discovers here)
├── .opencode/skill/find-bugs/         ← mirror (OpenCode singular form)
└── .opencode/skills/find-bugs/        ← mirror (OpenCode plural form)

The three mirror directories under .claude/ and .opencode/ are byte-identical to the canonical .agents/skills/find-bugs/ (verified by diff -r after every build). They are present because the open Agent Skills standard does not yet mandate a single scan path — different agents and different OpenCode releases have used both singular and plural forms. Mirroring the same folder into all of them is the safest cross-tool contract.
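The mirroring step follows the idempotent pattern described for BUILD_GUIDE.md (rm -rf before cp -r) and then proves equality with diff -r. The sketch below is a hypothetical reconstruction, not the literal Step 6 block; run it from the repo root. Removing the mirror first matters: if the destination already exists, cp -r would copy the folder into it and produce find-bugs/find-bugs.

```shell
# Hypothetical mirror step: copy the canonical skill into each scan path,
# then verify each copy is byte-identical with diff -r.
mirror_skill() {
  local canonical=".agents/skills/find-bugs"
  local mirror
  for mirror in .claude/skills/find-bugs .opencode/skill/find-bugs .opencode/skills/find-bugs; do
    rm -rf "$mirror" || return 1                   # drop stale files so the copy is exact
    mkdir -p "$(dirname "$mirror")" || return 1    # create .claude/skills etc. if missing
    cp -r "$canonical" "$mirror" || return 1       # $mirror no longer exists, so no nesting
    diff -r "$canonical" "$mirror" || return 1     # prints nothing and succeeds when identical
  done
}
```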


Quick start

I just want to use the skill on my own repo

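  1. Copy this whole directory to the root of your target repo, or at minimum the four files in .agents/skills/find-bugs/ plus the three mirror directories and the AGENTS.md pointer block (Claude Code and OpenCode discover the skill through the mirrors).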
  2. Open your CLI coding agent in that repo.
  3. Say "find bugs in the codebase" — or invoke explicitly with /find-bugs (Claude Code, OpenCode) or $find-bugs (Codex CLI).
  4. A BUG_REPORT.md will be written to your repo root when the audit finishes.

I want to build the skill from scratch in a clean repo

  1. Copy SPEC.md, BUILD_GUIDE.md, and PROMPT_FOR_AGENT.md to the target repo.
  2. Open your CLI coding agent in that repo.
  3. Paste the contents of PROMPT_FOR_AGENT.md (everything below the --------------- divider) as your prompt.
  4. The agent will execute BUILD_GUIDE.md Steps 1-7 in order, then run Step 8's verification commands and report PASS/FAIL.
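If you prefer not to select the prompt text by hand, a one-line awk filter can print everything after the divider. This assumes the divider is the first line in PROMPT_FOR_AGENT.md made only of five or more dashes (so a three-dash YAML fence would not match); prompt_body is a hypothetical helper name.

```shell
# Print every line after the first all-dash divider (5+ dashes).
# The divider line itself is not printed; later dash lines are.
prompt_body() {
  awk 'seen { print } !seen && /^-----+$/ { seen = 1 }' "$1"
}

# Usage:
#   prompt_body PROMPT_FOR_AGENT.md > /tmp/find-bugs-prompt.txt
```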

I want to modify the skill

  1. Edit SPEC.md first (it's the rationale). Update the section that describes the change.
  2. Edit BUILD_GUIDE.md to match (every shell code block in Steps 1-7 is the literal final content to write to disk).
  3. Re-run BUILD_GUIDE.md Steps 1-7 to regenerate the skill.
  4. Re-run Step 6 to re-mirror the canonical copy to .claude/, .opencode/skill/, and .opencode/skills/.

The audit pipeline (Phases 0-8)

┌────────────┐
│  Phase 0   │  Setup — create .bugaudit/ scratch dir (and gitignore it)
└─────┬──────┘
      ▼
┌────────────┐
│  Phase 1   │  Inventory & triage — Tier 1/2/3 classification
└─────┬──────┘
      ▼
┌────────────┐
│  Phase 2   │  Static analysis — run read-only checkers (tsc, ruff, etc.)
└─────┬──────┘
      ▼
┌────────────┐
│  Phase 3   │  Per-file deep review — every Tier 1/2 file × 6 categories
└─────┬──────┘
      ▼
┌────────────┐
│  Phase 4   │  Cross-file consistency sweep — signature mismatches,
└─────┬──────┘  drifted duplicates, stale docs, API-contract gaps
      ▼
┌────────────┐
│  Phase 5   │  Formula & calculation verification — derive the intended
└─────┬──────┘  formula symbolically, compare term-by-term
      ▼
┌────────────┐
│  Phase 6   │  Triage, dedup, severity — merge same-root-cause findings,
└─────┬──────┘  confirm severities, drop false positives
      ▼
┌────────────┐
│  Phase 7   │  Write BUG_REPORT.md — fill assets/report_template.md
└─────┬──────┘
      ▼
┌────────────┐
│  Phase 8   │  Chat summary — counts by severity + top issue, no re-paste
└────────────┘

The full details of each phase live in .agents/skills/find-bugs/SKILL.md. The full checklists for each category are in .agents/skills/find-bugs/references/bug-taxonomy.md.
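As a worked example of Phase 5's term-by-term comparison, take the seeded formula bug from the smoke-test section (total = price - price * 20, meant as a 20% discount). The symbols p (price), d (discount rate), and t (total) are introduced here for illustration; the intended formula is derived first, then compared against what the code computes.

```latex
\begin{aligned}
\text{intended:}     &\quad t  = p\,(1 - d), \quad d = 0.20 \;\Rightarrow\; t = 0.8\,p \\
\text{as written:}   &\quad t' = p - 20\,p = -19\,p \\
\text{term by term:} &\quad p\,d \;\text{vs.}\; 20\,p:\ \text{the rate is a percentage, not a fraction (off by a factor of } 100\text{)} \\
\text{check, } p = 50: &\quad t = 40, \quad t' = -950
\end{aligned}
```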


Design choices worth knowing about

1. Common-denominator SKILL.md format

Anthropic's Agent Skills docs, OpenAI's Codex Skills docs, and third-party compatibility surveys all converge on the same minimum viable SKILL.md: a YAML frontmatter block with name and description, followed by Markdown instructions. Anything beyond that (allowed-tools, context: fork, hooks, effort) is tool-specific but safely ignored by agents that don't support those fields. So the skill writes one canonical SKILL.md using only the universal fields plus effort: max (for Claude Code's adaptive-effort system) and mirrors it everywhere.

2. Progressive disclosure

SKILL.md itself is kept well under the ~5,000-word guidance so the model isn't overwhelmed when the skill loads. The 6,000-word bug checklist lives in references/bug-taxonomy.md (loaded in Phase 3). The report skeleton lives in assets/report_template.md. The static-analysis runner lives in scripts/run_static_checks.sh. SKILL.md is the orchestration layer.

3. Externalize state, don't trust working memory

A weaker model auditing a 200-file repo will lose track of earlier findings if it has to hold them all in context. The skill writes everything to .bugaudit/ as it goes:

.bugaudit/
├── inventory.md           # Phase 1: file tiers + counts
├── static-analysis.md     # Phase 2: tool output
├── findings.jsonl         # Phases 3-5: one JSON line per finding
└── notes.md               # running progress log

Phase 6 (triage/dedup) becomes a mechanical "read all lines, group by root cause" operation instead of relying on recall.
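A minimal sketch of that mechanical grouping, assuming each JSON line carries a "root_cause" string field with no escaped quotes inside it. Both the field and the helper name group_by_root_cause are hypothetical; the real schema is in SKILL.md §3.3. It uses only grep, sed, sort, and uniq, so it runs without jq.

```shell
# Count findings per root cause, most frequent first.
# Tolerates optional spaces after the colon ("root_cause": "x").
group_by_root_cause() {
  grep -o '"root_cause": *"[^"]*"' "$1" \
    | sed 's/^"root_cause": *"//; s/"$//' \
    | sort | uniq -c | sort -rn
}

# Usage:
#   group_by_root_cause .bugaudit/findings.jsonl
```

Any root cause with a count above 1 is a merge candidate for Phase 6.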

4. Three independent levers for "thinking mode max"

| Layer | Mechanism | Applies to |
|-------|-----------|------------|
| 1 | effort: max in SKILL.md frontmatter | Claude Code's adaptive-effort system |
| 2 | Literal word ultrathink in §0 | Claude Code's trigger-word preprocessing |
| 3 | Explicit in-body instructions | Any model, including DeepSeek-class and OpenCode |

Why three? The skill is supposed to auto-trigger on a plain "find bugs in the codebase" — which won't contain any thinking keyword. The frontmatter and in-body instructions make the skill self-sufficient regardless of how it was invoked.
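For orientation, here is a hypothetical sketch of where levers 1 to 3 could sit in SKILL.md, written with the same cat-heredoc style BUILD_GUIDE.md is described as using. Only the keys name, description, and effort come from this README; the description wording, the §0 heading, and the helper name write_skill_header are placeholders, not the shipped file.

```shell
# Hypothetical SKILL.md header. Keep ": " out of unquoted YAML values;
# a strict YAML parser rejects a plain scalar containing it.
write_skill_header() {
  mkdir -p "$(dirname "$1")" || return 1
  cat > "$1" <<'EOF'
---
name: find-bugs
description: Audit this repository for logical bugs, typos, formula mistakes, inconsistencies, runtime errors, and compile/type errors, then write BUG_REPORT.md.
effort: max
---

## §0 Reasoning depth

ultrathink. Reason at maximum depth in every phase, even if the request did not ask for it.
EOF
}
```

Lever 1 is the effort: max key, lever 2 is the literal word in §0, and the sentence after it is an instance of lever 3, which also works for models that ignore both.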

5. Read-only by default

A bug-finding skill that also starts editing files is a different (riskier) product. This skill explicitly never modifies source files. "Fix the bugs" is a deliberate follow-up task, not part of this skill.
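One way to check the read-only promise independently of git is to fingerprint every file before and after an audit. A sketch using only POSIX find, cksum, and sort; the helper name snapshot is hypothetical, and BUG_REPORT.md, .bugaudit/, and .git/ are excluded on the assumption that the audit may legitimately create the first two and that git may touch its own metadata.

```shell
# Print a sorted checksum listing of every file the audit must not modify.
snapshot() {
  ( cd "$1" && find . -type f \
      ! -path './.git/*' ! -path './.bugaudit/*' ! -path './BUG_REPORT.md' \
      -exec cksum {} + | sort )
}

# Usage:
#   before="$(snapshot YOUR_REPO)"
#   ... run the audit ...
#   [ "$before" = "$(snapshot YOUR_REPO)" ] && echo "read-only: OK"
```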


Customization

| Knob | Where to change it |
|------|--------------------|
| Output filename / location | SKILL.md §1 (and assets/report_template.md if you want it reflected in the skeleton) |
| Large-repo threshold (~150 files) | SKILL.md Phase 1 step 5: raise for faster/shallower, lower to force explicit "Methodology & Limitations" disclosure |
| Tier 1 patterns (entry points, calc-shaped files) | SKILL.md Phase 1 step 3: add project-specific globs (e.g. monorepo package names) |
| Static-analysis coverage | scripts/run_static_checks.sh: add another if [ -f <manifest> ]; then ... fi block following the existing pattern |
| findings.jsonl schema | SKILL.md §3.3: add fields like tags: [] and update assets/report_template.md if they should surface in the report |
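As an example of the static-analysis knob, here is a hypothetical extra block in the if [ -f <manifest> ] style. Dart (pubspec.yaml, dart analyze) is chosen only because it is not in the bundled list; it is assumed here to leave sources untouched, which you should confirm for your toolchain. Wrapping it in a function (check_dart, a hypothetical name) keeps it testable; in the real script it would be a bare if block.

```shell
# Hypothetical extra toolchain block for scripts/run_static_checks.sh.
check_dart() {
  if [ -f pubspec.yaml ]; then
    echo "## dart analyze"
    if command -v dart >/dev/null 2>&1; then
      dart analyze 2>&1 || true   # findings are expected; never abort the whole run
    else
      echo "(skipped: dart not installed)"
    fi
  fi
}
```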

Limitations

  • Very large monorepos (thousands of files) will hit the Tier 1/2 budget quickly. The skill is designed to disclose this in "Methodology & Limitations" rather than solve it. For true monorepos, scope invocations to one package at a time (the skill supports a user-specified path).
  • Formula verification (Phase 5) depends on domain inference — for highly specialized math (actuarial, cryptographic, ML-numerical) the model may not know the "correct" formula to compare against; such findings should naturally land at confidence: low.
  • Static-analysis script coverage is best-effort and assumes common CLI tool names/flags. Some projects use wrapper scripts (make lint, npm run check); when the bundled script doesn't cover the project's toolchain, the skill falls back to ad-hoc commands.
  • No fix mode — by design. A natural follow-up skill (fix-bugs or apply-bug-fixes) could read BUG_REPORT.md and apply fixes one at a time with user confirmation. Out of scope here.

Relationship between SPEC, BUILD_GUIDE, and PROMPT

The three top-level files form a three-layer contract:

┌─────────────────────────────────────────────────────────────┐
│  SPEC.md  (the why)                                         │
│  ─────────                                                  │
│  Design rationale, compatibility matrix, severity rubric,   │
│  acceptance checklist. Read once when building or           │
│  modifying the skill.                                       │
└──────────────────────────┬──────────────────────────────────┘
                           │ informs
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  BUILD_GUIDE.md  (the what)                                 │
│  ────────────────                                           │
│  Literal, runnable shell blocks that write the skill to     │
│  disk exactly as given. Source of truth for file contents   │
│  and paths. Safe to re-run (idempotent: mkdir -p, cat >     │
│  overwrite, rm -rf before cp -r).                           │
└──────────────────────────┬──────────────────────────────────┘
                           │ executed by
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  PROMPT_FOR_AGENT.md  (the prompt)                           │
│  ──────────────────────                                     │
│  Short instruction to hand to a coding agent (e.g. DeepSeek │
│  V4 Flash) so it executes BUILD_GUIDE.md mechanically,      │
│  without reinterpreting or "improving" it.                   │
└─────────────────────────────────────────────────────────────┘

None of these three files are part of the skill itself — once built, the skill is just .agents/skills/find-bugs/ (+ mirrors + the AGENTS.md pointer). They can be deleted from the target repo after a successful build, or kept under e.g. docs/dev/ for future maintenance.


Running it on your own repo

# 1. Copy the skill and its three mirrors into your repo
#    (mkdir -p first: cp -r fails if the parent directory doesn't exist,
#     and copying into the parent avoids a nested find-bugs/find-bugs)
for dir in .agents/skills .claude/skills .opencode/skill .opencode/skills; do
  mkdir -p "YOUR_REPO/$dir"
  cp -r "$dir/find-bugs" "YOUR_REPO/$dir/"
done

# 2. Append the AGENTS.md pointer block to YOUR_REPO/AGENTS.md
#    (>> creates the file if it doesn't exist yet)
cat AGENTS.md >> YOUR_REPO/AGENTS.md

# 3. Open YOUR_REPO in your CLI agent and say:
#    "find bugs in the codebase"

When the audit finishes, you'll get:

  • YOUR_REPO/BUG_REPORT.md — the categorized report
  • YOUR_REPO/.bugaudit/ — the scratch directory (gitignore it)
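To gitignore the scratch directory without creating duplicate entries on repeat runs, an idempotent append is enough. A sketch with a hypothetical helper name, ignore_bugaudit; if you'd rather not touch a tracked file, the same line can go into .git/info/exclude instead.

```shell
# Add .bugaudit/ to <repo>/.gitignore exactly once; safe to re-run.
ignore_bugaudit() {
  local gitignore="$1/.gitignore"
  touch "$gitignore" || return 1
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  if ! grep -qxF '.bugaudit/' "$gitignore"; then
    # Start a new line if the file doesn't already end with one.
    if [ -s "$gitignore" ] && [ -n "$(tail -c 1 "$gitignore")" ]; then
      printf '\n' >> "$gitignore"
    fi
    printf '.bugaudit/\n' >> "$gitignore"
  fi
}

# Usage:
#   ignore_bugaudit YOUR_REPO
```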

Acceptance checklist

After running BUILD_GUIDE.md, confirm:

  • .agents/skills/find-bugs/SKILL.md exists and begins with valid YAML frontmatter containing name: find-bugs, a non-empty description, and effort: max.
  • references/bug-taxonomy.md and assets/report_template.md exist and are non-empty.
  • scripts/run_static_checks.sh exists, is executable (chmod +x applied), and runs cleanly against an empty directory.
  • .claude/skills/find-bugs/, .opencode/skill/find-bugs/, and .opencode/skills/find-bugs/ each contain the same four files: diff -r between .agents/skills/find-bugs and each mirror (e.g. diff -r .agents/skills/find-bugs .claude/skills/find-bugs) should print nothing.
  • AGENTS.md ends with the "Bug audit skill" block.
  • Smoke test: run the skill on a real or small test repo. Confirm BUG_REPORT.md is created with all six category headers present, a filled-in summary table, and a "Methodology & Limitations" section. Confirm .bugaudit/ was created and contains findings.jsonl.
  • No side effects: git status before and after — the only new paths should be BUG_REPORT.md, .bugaudit/ (if not gitignored yet), and whatever Step 6/7 added during the build.
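The structural items in this checklist can be scripted; the smoke test and the git status comparison still need a real audit run. A sketch, run from the repo root, with a hypothetical helper name verify_build. Two simplifications are assumptions: it only checks that the "Bug audit skill" text appears somewhere in AGENTS.md (not that the block comes last), and it treats any non-blank text after description: as non-empty.

```shell
# Check the built skill's structure; prints reasons, then PASS or FAIL.
verify_build() {
  local skill=".agents/skills/find-bugs" ok=1 f mirror front
  # Frontmatter: line 1 must be ---; collect lines up to the closing ---.
  front="$(awk 'NR == 1 && $0 != "---" { exit } NR > 1 && $0 == "---" { exit } NR > 1 { print }' \
    "$skill/SKILL.md" 2>/dev/null)"
  printf '%s\n' "$front" | grep -qx 'name: find-bugs' || { echo "missing name: find-bugs"; ok=0; }
  printf '%s\n' "$front" | grep -qE '^description: *[^ ]' || { echo "missing or empty description"; ok=0; }
  printf '%s\n' "$front" | grep -qx 'effort: max' || { echo "missing effort: max"; ok=0; }
  for f in references/bug-taxonomy.md assets/report_template.md; do
    [ -s "$skill/$f" ] || { echo "missing or empty: $f"; ok=0; }
  done
  [ -x "$skill/scripts/run_static_checks.sh" ] || { echo "run_static_checks.sh not executable"; ok=0; }
  for mirror in .claude/skills/find-bugs .opencode/skill/find-bugs .opencode/skills/find-bugs; do
    diff -r "$skill" "$mirror" >/dev/null 2>&1 || { echo "mirror differs: $mirror"; ok=0; }
  done
  grep -q 'Bug audit skill' AGENTS.md 2>/dev/null || { echo "AGENTS.md block missing"; ok=0; }
  if [ "$ok" -eq 1 ]; then echo PASS; return 0; else echo FAIL; return 1; fi
}
```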

Optional deeper smoke test (seeded bugs)

If you want concrete proof each category is detected, create a throwaway file with one deliberate bug per category and re-run the skill against it. Examples:

  • Logical: if (count > 10) where the comment says "trigger when 10 or more" (should be >=).
  • Typo: compare status == "complete" against a constant defined as "completed".
  • Formula: total = price - price * 20 where 20 should be 0.20 (20% discount).
  • Inconsistency: two near-identical helper functions where only one handles a None/null name.
  • Runtime: items[0].name on a list that can be empty.
  • Compile/type: call a function with one fewer argument than its definition requires.
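A throwaway fixture covering all six examples above could be written like this. Python and the file name seeded_bugs.py are arbitrary choices for illustration, and write_seeded_bugs is a hypothetical helper. The fixture deliberately carries no "BUG HERE" labels, so the audit has to find each bug on its own; the one comment it keeps ("trigger when 10 or more") is part of the logical-bug setup.

```shell
# Write a fixture with one deliberate bug per category. Do not fix it.
write_seeded_bugs() {
  cat > "$1" <<'EOF'
STATUS_DONE = "completed"

def should_alert(count):
    # trigger when 10 or more
    if count > 10:
        return True
    return False

def is_done(status):
    return status == "complete"

def discounted(price):
    total = price - price * 20
    return total

def greet(name):
    return "Hello, " + name.strip()

def greet_user(name):
    if name is None:
        return "Hello, guest"
    return "Hello, " + name.strip()

def first_name(items):
    return items[0].name

def area(width, height):
    return width * height

print(area(3))
EOF
}

# Usage:
#   write_seeded_bugs seeded_bugs.py
```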

A correct run produces one finding per category referencing the right line.


Self-audit

This repo ran the find-bugs skill against itself. The result is BUG_REPORT.md — 1 critical, 3 high, 3 medium, 1 low finding (plus 3 low-confidence possible issues), all of which were fixed before pushing. See the report for the full list, including a real critical bug in the skill's own YAML frontmatter that a strict parser would have rejected.


License

MIT. See LICENSE (add one if missing).

About

Universal bug-audit skill for AI coding agents (Claude Code, OpenCode, Codex CLI). Implements a 6-category audit pipeline (logical, typos, formula, inconsistency, runtime, compile) with externalized state, three-layer max-reasoning levers, and a categorized BUG_REPORT.md output.
