battle-test

A Claude Code skill that runs a council of personas over any artifact before you commit to it.

What it is

battle-test is a Claude Code skill. Point it at any artifact — a blog draft, a lab plan, an architecture proposal, a contract section, anything you'd want a second opinion on — and it dispatches nine personas in parallel. Each one reads the artifact through their own lens, files findings tagged Critical / Material / Polish, and votes Ship / One More Round / Structural Rework. The votes don't always agree; the gate decides anyway. You get a markdown review log and an HTML report in about 60 seconds.

https://github.com/botz-pillar/battle-test/raw/main/assets/launch-video.mp4

90-second explainer. The recursive moment is real — battle-test refused to ship its own README on the first pass.

Why this exists / what skills are for

This repo is also a worked example of a Claude Code skill. If you've found yourself prompting Claude with the same scaffolding over and over for some workflow you keep doing, that's a skill waiting to be written. battle-test is mine for review-before-you-commit. Yours might be different — and the structure of this repo (skill file + persona prompts + templates + plugin manifest) is the shape that's worked for me. Read it, take what you want.

Install

claude plugin marketplace add botz-pillar/battle-test
claude plugin install battle-test@battle-test

(Claude Code installs plugins via marketplaces. The first command adds this repo as a single-plugin marketplace; the second installs the skill from it.)

Before running on your own artifacts, see Data flow & sub-processors below.

First run

Try it on the included sample first — see what a real council critique looks like before pointing it at your own work:

/battle-test examples/sample-blog-post.md

The sample is deliberately mid-tier (looks fine on first read; has hidden slop, weak claims, vague CTA). The council will surface real findings.

To preview without dispatching any Claude calls:

/battle-test examples/sample-blog-post.md --dry-run

Example output

See examples/sample-review-output.html for what running on the sample produces.

Generated, not handwritten. The output ships unedited.

The 9 personas

Persona	Role lens
`target-audience-primary`	Generic primary-audience template — customize for your domain
`domain-practitioner`	Subject-matter expert in your field
`ai-security-researcher`	Adversarial AI / agentic security expert (OWASP Agentic Threats, NIST AI RMF, MITRE ATLAS)
`instructional-designer`	Adult-learning / curriculum design
`copywriter`	Direct-response / community voice
`policy-risk-reviewer`	Regulatory / legal / brand-risk lens (directional only — not legal advice)
`audience-skeptic`	Reader who's been burned by similar content before
`future-self`	You, six months from now, rereading your own work
`contrarian`	Fatal-flaw hunter (non-voting)

How it works

Stage detection — the skill picks a stage from the filename (or --stage if you override).
Roster selection — picks 3 personas (iterate tier: target-audience-primary, domain-practitioner, plus one stage-specific or technical-anchor persona) or 5 (gate tier: same three plus future-self voting and contrarian non-voting). The full roster composition rules — including how stage-specific add-ons (instructional-designer, copywriter, audience-skeptic, policy-risk-reviewer) get layered in — are in skills/battle-test/SKILL.md.
Parallel dispatch — each persona runs in a separate subagent with allowed_tools=[]. The artifact is fenced inside a per-dispatch nonce-tagged tag so the skill can't be tricked by directives in the artifact itself.
Synthesis + verdict — the synthesizer reads validated structured outputs (counts, votes, anti-slop strings — not free-form persona prose). The verdict is computed mechanically from the validated counts and votes, not from the synthesizer's narrative reasoning. The synthesizer authors the headline prose; it does not author the verdict.

Severity + verdict

Each finding is tagged:

Critical — would materially break the artifact or fail a hostile scan. Mandatory fix.
Material — meaningful gap or weakness. Should fix before publish.
Polish — refinement that improves quality but isn't blocking.

The deterministic gate:

Verdict	Rule
Ship	All voters Ship AND zero unresolved Critical AND ≤1 unresolved Material
One More Round	Mixed votes OR ≥2 unresolved Material findings (no Critical)
Structural Rework	Any unresolved Critical OR ≥2 voters Structural Rework

The gate function is deterministic given persona votes; the persona votes themselves are LLM-generated and have run-to-run variance. Run twice on the same artifact and you may see different finding counts and (rarely) a different verdict. The gate stops the obvious slop from shipping; it doesn't claim to be a reliability instrument.

Customize it

The personas in skills/battle-test/personas/ are markdown prompt files. Edit them. The two you'll customize first:

target-audience-primary.md — your specific audience profile. The more specific, the sharper the findings.
domain-practitioner.md — a senior practitioner in your field. Replace the generic template with your domain (medical writing, legal drafting, fiction, GTM copy, security engineering — whatever fits).

The personas/AUTHORING.md file is intentionally not here. The persona file contract is inline at the top of each .md — copy any of them as a starting point and rewrite. Five sections: Identity, Vocabulary, Triggers, Anti-triggers, Anchor date.

Data flow & sub-processors

When you run battle-test, the artifact's full content is sent to Anthropic (an external AI sub-processor) as input to each persona subagent.

Anthropic is a sub-processor of any artifact you point this skill at.

If your artifact contains regulated data (PII, PHI, CUI, attorney-client material, ITAR, financial records, customer-confidential information), confirm that your organization permits sending that data to Anthropic before running. Battle-test does not make that determination for you.

For HIPAA / PCI / FedRAMP / EU AI Act / GDPR / CUI workflows: run your vendor-risk process on Anthropic before adoption. Anthropic's DPA and trust information: https://www.anthropic.com/legal

Residency: calls dispatch via the Claude Code session you're already authenticated to. Whatever residency your Claude Code config + Anthropic contract establish is what applies. Battle-test does not constrain or override that.

This tool is informational, not a compliance control. It produces a timestamped, hash-stamped record suitable for inclusion in a publishing or review-workflow audit trail. It does not satisfy a regulatory control on its own.

The pre-flight banner asks you to confirm the artifact is approved for third-party AI processing before each run. The --yes flag skips the prompt; pair it with --data-classification-confirmed in scripted use, otherwise the skill warns you on stderr.

Threat model

Before reading the security architecture, here's what battle-test is defending against. The four guarantees below are scoped to these adversaries; outside this list, no claims are made.

Adversary	Capability	What they can move
Artifact author (most common)	Authors or edits the file you point the skill at. Can embed prompt-injection payloads, role-shift directives, fake-system-message framings, hidden Unicode, fence-breakout attempts, attacker-controlled URLs / code blocks.	Can attempt to manipulate persona responses. Cannot escalate to the host — subagents have `allowed_tools=[]`.
Filename-controlled adversary	Authors or edits the file and its filename. Same as above plus: can pick a filename pattern that auto-routes to a more permissive stage (`-DRAFT.md` vs `-FINAL.md`).	Can influence which personas review the artifact when stage is auto-detected. Mitigation: pass `--stage` explicitly when reviewing third-party content; the pre-flight banner surfaces this when stage is auto-detected.
Persona-output coercer	Successfully shifts a persona's reasoning via injection.	Can produce biased findings. Cannot inject into the synthesizer — synthesizer ingests structured fields wrapped in fresh nonce-tagged fences and validates against schema; verdict is computed from validated counts/votes, not synthesizer prose.
Output-render exploiter	Hopes the HTML report renders attacker-controlled strings as live HTML.	Cannot — every interpolated string is HTML-entity escaped, CSP `default-src 'none'` blocks resource loads, no `<script>`/`<a href>`/`<img src>`/`style=` from artifact content.

What is NOT in the threat model:

Skill-file tamperer. An attacker who can edit skills/battle-test/SKILL.md, personas/*.md, or models.json on your local machine has full control of the review pipeline (and likely full code execution as your user, independent of this skill). Treat the skill files like any other code you run — review on install and on update, pin to known-good commits if you fork.
Anthropic-side compromise. Personas dispatch as Claude API calls. If the model behaves maliciously or the inference path is compromised, no part of this skill protects you. This is the same trust posture as any other Claude Code workflow.
Network adversary against your Claude Code session. Out of scope — that's a Claude Code property, not a battle-test property.

The architecture below mitigates the four in-scope adversaries. If your threat model includes more (e.g., supply-chain assurance for the skill itself), bring additional controls (signed releases, pinned hashes, sandbox isolation).

Security architecture

Four numbered guarantees, scoped to the threat model above. All four ship in v1.0.

Subagent least-privilege. Persona subagents are dispatched with allowed_tools=[]. Even on full prompt-injection success against a persona, the resulting subagent has no tools — no Read, no Bash, no network. The orchestrator pre-reads the artifact bytes and passes them inline.
Nonce-tagged input fence. Per-dispatch CSPRNG nonce on the <artifact_<nonce>> fence. Any literal </artifact_<nonce>> in the artifact is HTML-entity-escaped before interpolation, so the artifact cannot break out of the fence.
Strict HTML escaping on all output. Every interpolated string in the HTML companion (artifact quotes, persona findings, headline text — everything) passes through HTML-entity-escape. CSP default-src 'none', no <script>, no <a href> from artifact, no <img src> from artifact, no style= attributes on interpolated content.
Falsifiable security claim. examples/sample-adversarial.md ships canonical prompt-injection payloads. examples/sample-adversarial-expected.md documents the expected verdict (Structural Rework) and a regression-response procedure that explicitly handles run-to-run variance: a single SHIP verdict is treated as rare-variance, but ≥2 SHIPs in 3 consecutive runs is a regression. The claim is testable across stochastic votes, not asserted against a single fixed expected output.

The synthesizer (which combines persona outputs into the verdict) gets the same data-vs-instructions treatment: persona payloads are wrapped in their own nonce-tagged <persona_payload> tags with strict-schema validation. Persona free-text fields (findings, anti-slop quotes, strengths) are treated as data, never as synthesizer instructions. The verdict is computed mechanically from the validated vote enum and severity counters; the synthesizer cannot move the verdict by anything it writes in the headline prose. This closes the persona-text-as-laundering-channel surface that strict-schema-of-shape alone would not.

Known limitations

Indirect prompt injection. A poisoned artifact can shift persona behavior. The kit's defenses (least-privilege subagents, input fencing, output escaping, falsifiable testing) limit blast radius and make regressions detectable, but do not absolutely prevent a sophisticated injection from biasing findings. Treat third-party content with appropriate skepticism. Defense-in-depth: read the artifact yourself before handing it to any AI tool.
Persona drift across model versions. Personas are calibrated against the model IDs and rates pinned in skills/battle-test/models.json (with a rates_as_of date). When the underlying model versions change, persona behavior may shift. Bump models.json and the CHANGELOG when you re-verify.
Severity tags are LLM-self-assigned. Each persona tags its own findings Critical / Material / Polish. There is no external calibration rubric — a persona that is over-cautious or over-permissive will skew the counts that the gate function then treats as deterministic. The mechanical verdict closes the prose-laundering channel; it does not close the severity-laundering channel one column over. Treat verdicts as decision support, not adjudication.
English-only tuning. Personas work on non-English artifacts but are calibrated against English. Results on other languages will be uneven.
Run-to-run variance. LLM votes are stochastic. The gate is deterministic given the votes, but the votes themselves can shift slightly between runs.

Roadmap

These aren't promises — they're the directions the kit is most likely to grow.

An interactive /battle-test --init wizard for first-time customization.
Per-persona model overrides in models.json.
Marketplace distribution as the Claude Code plugin ecosystem matures (status as of 2026-04-28: install via claude plugin marketplace add botz-pillar/battle-test; check the CHANGELOG for current distribution).

No GitHub Action. No SaaS. No community persona-pack contribution structure — fork the repo if you want to extend it.

License + author

MIT. Maintained by Josh Botz (Pillar Security).

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.claude-plugin		.claude-plugin
.github/workflows		.github/workflows
assets		assets
commands		commands
examples		examples
skills/battle-test		skills/battle-test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

battle-test

What it is

Why this exists / what skills are for

Install

First run

Example output

The 9 personas

How it works

Severity + verdict

Customize it

Data flow & sub-processors

Threat model

Security architecture

Known limitations

Roadmap

License + author

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

battle-test

What it is

Why this exists / what skills are for

Install

First run

Example output

The 9 personas

How it works

Severity + verdict

Customize it

Data flow & sub-processors

Threat model

Security architecture

Known limitations

Roadmap

License + author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages