diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json new file mode 100644 index 0000000..a618bae --- /dev/null +++ b/.claude-plugin/marketplace.json @@ -0,0 +1,13 @@ +{ + "name": "mindbox-cloud-plugins", + "owner": { + "name": "mindbox.cloud" + }, + "plugins": [ + { + "name": "skill-review", + "source": "./plugins/skill-review", + "description": "Quick AI reviewer for Agent Skills: checks structure, workflow, references, and links. Clear report without jargon, with a summary from an exhausted data scientist." + } + ] +} diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md new file mode 100644 index 0000000..16a8167 --- /dev/null +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -0,0 +1,36 @@ +## Type of Change + +- [ ] New plugin +- [ ] Update to existing plugin +- [ ] Repository tooling / docs + +--- + +## Plugin Checklist + +> Fill this in if you are adding or modifying a plugin. Skip for tooling-only changes. + +- [ ] All content in English +- [ ] `plugin.json` has `name`, `description` (English), `version`, `author` +- [ ] `SKILL.md` frontmatter complete: `name` (kebab-case, matches folder), `description` with trigger phrases and "Don't use when", `metadata.version` +- [ ] `marketplace.json` updated +- [ ] Root `README.md` plugin table updated +- [ ] `CHANGELOG.md` updated in plugin root +- [ ] CI (`validate.yml`) passes + +--- + +## Skill Review Results + +> Skip this section for tooling/docs-only changes. + +Run `skill-review:skill-review` on the added or modified skill and paste the summary below. + +**Scope used:** Personal / Team / Repository / Full + +**Statistics:** FAIL: N, WARNING: N, PASS: N + +**Top 3 issues (if any):** +1. +2. +3. 
diff --git a/.github/workflows/validate.yml b/.github/workflows/validate.yml new file mode 100644 index 0000000..b020ca6 --- /dev/null +++ b/.github/workflows/validate.yml @@ -0,0 +1,65 @@ +name: Validate plugins + +on: + push: + pull_request: + +jobs: + validate: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Validate marketplace.json and plugin structure + run: | + set -e + + MARKETPLACE=".claude-plugin/marketplace.json" + + echo "==> Checking marketplace.json is valid JSON" + jq empty "$MARKETPLACE" + echo " OK" + + echo "==> Checking each plugin entry" + PLUGINS=$(jq -r '.plugins[].source' "$MARKETPLACE") + + for SOURCE in $PLUGINS; do + PLUGIN_DIR="${SOURCE#./}" + echo "" + echo "--- Plugin: $PLUGIN_DIR" + + # Directory exists + if [ ! -d "$PLUGIN_DIR" ]; then + echo "ERROR: directory not found: $PLUGIN_DIR" + exit 1 + fi + + # plugin.json exists and is valid JSON + MANIFEST="$PLUGIN_DIR/.claude-plugin/plugin.json" + if [ ! -f "$MANIFEST" ]; then + echo "ERROR: missing plugin.json at $MANIFEST" + exit 1 + fi + jq empty "$MANIFEST" + + # Required fields in plugin.json + for FIELD in name version author; do + VALUE=$(jq -r ".$FIELD // empty" "$MANIFEST") + if [ -z "$VALUE" ]; then + echo "ERROR: missing field '$FIELD' in $MANIFEST" + exit 1 + fi + done + + # At least one SKILL.md exists + SKILL_COUNT=$(find "$PLUGIN_DIR/skills" -name "SKILL.md" 2>/dev/null | wc -l) + if [ "$SKILL_COUNT" -eq 0 ]; then + echo "ERROR: no SKILL.md found under $PLUGIN_DIR/skills/" + exit 1 + fi + + echo " OK (${SKILL_COUNT} skill(s))" + done + + echo "" + echo "==> All plugins valid" diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..a3940ca --- /dev/null +++ b/.gitignore @@ -0,0 +1,3 @@ +*.DS_Store +.env +*.log diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..1b2eecb --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,162 @@ +# agent-skills — Developer Guide for Claude Code + +## Repository Purpose + +This is a public 
marketplace of Claude Code plugins by mindbox.cloud. Each plugin extends Claude Code with skills (slash commands), agents, hooks, or MCP servers. + +Plugins are designed to be shared with the community. All content must be in English. + +--- + +## Repository Structure + +``` +agent-skills/ +├── .claude-plugin/ +│ └── marketplace.json # marketplace registry — update when adding a plugin +├── plugins/ +│ └── / +│ ├── .claude-plugin/ +│ │ └── plugin.json # plugin manifest +│ ├── skills/ +│ │ └── / +│ │ ├── SKILL.md +│ │ └── references/ +│ ├── CHANGELOG.md +│ └── README.md +├── CLAUDE.md +├── CONTRIBUTING.md +└── README.md +``` + +--- + +## How to Add a New Plugin + +### Step 1 — Create the directory structure + +```bash +mkdir -p plugins//.claude-plugin +mkdir -p plugins//skills//references +``` + +Plugin names: kebab-case, no spaces. + +### Step 2 — Write `plugin.json` + +```json +{ + "name": "plugin-name", + "description": "One-line description in English", + "version": "1.0.0", + "author": { + "name": "Author or org name" + } +} +``` + +Required fields: `name`. Recommended: `description`, `version`, `author`. Note that CI (`validate.yml`) in this repository also fails when `name`, `version`, or `author` is missing, and the pre-merge checklist expects all four fields. + +### Step 3 — Write `SKILL.md` + +```yaml +--- +name: skill-name # kebab-case, matches folder name +description: > + What it does — one sentence. + Use when user says "...", "...", "...". + Don't use when: user asks for X (use y-skill instead), user asks for Z. +metadata: + version: 1.0.0 # semver — instruction contract version +--- + +Skill body: orchestration logic, steps, critical rules, troubleshooting. +Put domain knowledge and checklists in references/, not inline. 
+``` + +Frontmatter requirements: +- `name` — kebab-case, 1–64 chars, matches folder name +- `description` — WHAT it does + WHEN to use (trigger phrases) + "Don't use when" (negative triggers) +- `metadata.version` — semver + +### Step 4 — Register in `marketplace.json` + +Add an entry to `.claude-plugin/marketplace.json`: + +```json +{ + "name": "mindbox-cloud-plugins", + "owner": { "name": "mindbox.cloud" }, + "plugins": [ + { + "name": "plugin-name", + "source": "./plugins/plugin-name", + "description": "Same one-line description as plugin.json" + } + ] +} +``` + +### Step 5 — Write `README.md` for the plugin + +Human-facing documentation at `plugins//README.md`. Describe: what it does, review scopes (if applicable), installation command, usage trigger phrases. + +### Step 6 — Write `CHANGELOG.md` + +At `plugins//CHANGELOG.md`. Required for stage 4 lifecycle hygiene (LC03). + +### Step 7 — Update root `README.md` + +Add the plugin to the plugin table in the root README. + +--- + +## Version Policy + +> Any meaningful change to a plugin requires a `plugin.json` version bump. Without it, users with the plugin already installed will not receive the update — their client caches the old version. + +There are two independent semvers per plugin: + +| File | Version type | When to increment | +|---|---|---| +| `plugin.json` | Package release version | **Any meaningful change** — skill logic, new files, bug fixes, prompt edits | +| `SKILL.md` `metadata.version` | Instruction contract version | Tracks internal iteration of the skill logic | + +Both must be updated when making changes. They are independent — do not conflate them. 
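The bump requirement above lends itself to a mechanical check. The following is a minimal sketch, not part of this repository's CI: it assumes GNU `sort -V` is available and that versions are plain `MAJOR.MINOR.PATCH` strings without pre-release suffixes. The function name `version_bumped` is hypothetical.

```shell
# Hypothetical helper: succeeds only if NEW is strictly greater than OLD.
# Assumes GNU sort (-V) and plain MAJOR.MINOR.PATCH version strings.
version_bumped() {
  local old="$1" new="$2"
  # Equal versions mean no bump happened.
  [ "$old" != "$new" ] || return 1
  # sort -V orders versions ascending; the bump is valid if OLD sorts first.
  [ "$(printf '%s\n%s\n' "$old" "$new" | sort -V | head -n 1)" = "$old" ]
}
```

For example, `version_bumped 1.0.0 1.0.1` succeeds, while `version_bumped 1.0.0 1.0.0` and `version_bumped 1.1.0 1.0.9` fail. The same comparison could be applied to either of the two version fields, since they are tracked independently.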
+ +--- + +## Conventional Commit Style + +Use conventional commits for all changes to this repository: + +``` +feat(plugin-name): add initial public release +fix(skill-name): correct trigger phrase to avoid overtriggering +docs(skill-review): update README with new scope table +chore: add .gitignore +``` + +Scopes: use the plugin or skill name when the change is scoped to one plugin; omit scope for repo-wide changes. + +--- + +## Pre-merge Checklist + +Before merging a plugin PR: + +- [ ] All content in English +- [ ] `plugin.json` has `name`, `description` (English), `version`, `author` +- [ ] `SKILL.md` frontmatter complete: `name`, `description` with trigger phrases and "Don't use when", `metadata.version` +- [ ] `marketplace.json` updated with the new plugin entry +- [ ] Root `README.md` plugin table updated +- [ ] `CHANGELOG.md` present in the plugin root +- [ ] CI (`validate.yml`) passes + +--- + +## References + +- [Plugins reference](https://code.claude.com/docs/en/plugins-reference) +- [Plugin marketplaces](https://code.claude.com/docs/en/plugin-marketplaces) +- [Skills](https://code.claude.com/docs/en/skills) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..04d7487 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,73 @@ +# Contributing to agent-skills + +Thank you for your interest in contributing. This repository hosts reviewed, documented Claude Code plugins. We aim to keep the quality bar high so that every plugin is genuinely useful and well-structured. + +--- + +## Prerequisites + +- [Claude Code](https://claude.ai/code) installed and working +- Familiarity with the [Agent Skills Specification](https://agentskills.io/specification) +- Understanding of `SKILL.md` frontmatter and the progressive disclosure pattern + +--- + +## How to Propose a New Plugin + +1. **Open an issue first** — describe what the plugin does, who it is for, and what skills it will include. 
This avoids building something that duplicates an existing plugin or does not fit the repo's scope. +2. **Fork and branch** — create a feature branch named `feat/`. +3. **Implement the plugin** — follow the structure and requirements in [CLAUDE.md](./CLAUDE.md). +4. **Open a PR** — fill in the PR template checklist completely. + +--- + +## Plugin Quality Bar + +Every plugin in this repository must meet these criteria before merging: + +- **English only** — all content in all files must be in English. +- **`plugin.json` complete** — `name`, `description`, `version`, `author` are all present. +- **`SKILL.md` frontmatter complete** — `name` (kebab-case, matches folder), `description` with concrete trigger phrases and "Don't use when" negative triggers, `metadata.version` in semver. +- **Progressive disclosure** — workflow logic in `SKILL.md`, domain knowledge in `references/`. No monolithic skill files. +- **`marketplace.json` updated** — the new plugin is registered. +- **Root `README.md` updated** — the plugin table includes the new entry. +- **`CHANGELOG.md` present** at the plugin root. +- **CI passes** — the `validate.yml` workflow runs green. + +See [CLAUDE.md](./CLAUDE.md) for the full pre-merge checklist and structural requirements. + +--- + +## Language Requirement + +All content — skill instructions, checklists, report templates, READMEs — must be written in English. This is a hard requirement for inclusion in the public marketplace. + +If you are translating a skill from another language, preserve the logic exactly. Do not paraphrase in a way that changes meaning. Technical IDs (ST01, WF01, etc.) are not translated. + +--- + +## Pull Request Process + +1. Fill in the PR template completely — skipped checklist items will delay the review. +2. One PR per plugin or per meaningful change — do not bundle unrelated plugins in one PR. +3. A maintainer will review within a reasonable timeframe. Expect at least one round of feedback. +4. 
Squash or rebase before merging — keep the commit history clean. + +--- + +## Commit Style + +We use [Conventional Commits](https://www.conventionalcommits.org/): + +``` +feat(plugin-name): description of what was added +fix(skill-name): description of what was fixed +docs: update CONTRIBUTING.md +chore: bump CI action versions +``` + +--- + +## Code of Conduct + +Be respectful and constructive. Feedback on skill content should focus on correctness, clarity, and adherence to the quality bar — not personal preference. diff --git a/README.md b/README.md index 3cf8eb5..0986ff5 100644 --- a/README.md +++ b/README.md @@ -1 +1,50 @@ -# agent-skills \ No newline at end of file +# agent-skills + +[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE) +[![Plugins](https://img.shields.io/badge/plugins-1-blue)](#available-plugins) +[![Claude Code](https://img.shields.io/badge/Claude%20Code-plugin-orange)](https://claude.ai/code) + +Documented Claude Code plugins by [mindbox.cloud](https://mindbox.cloud/?locale=en_US) — ready to install. + +--- + +## Available Plugins + +| Plugin | Description | Skill | Install | +|--------|-------------|-------|---------| +| [skill-review](./plugins/skill-review/) | Quick AI reviewer for Agent Skills: checks structure, workflow, references, and links. Clear report without jargon, with a summary from an exhausted data scientist. | `skill-review:skill-review` | see below | + +--- + +## Quick Install + +```shell +/plugin marketplace add https://github.com/mindbox-cloud/agent-skills +/plugin install @mindbox-cloud-plugins +``` + +See each plugin's README for available skills and usage. + +--- + +## What's in this Repo + +A curated collection of Claude Code plugins built by [mindbox.cloud](https://mindbox.cloud/?locale=en_US). 
Each plugin is: + +- **Documented** — clear README, usage examples, changelog +- **English-only** — all content is in English for broad accessibility +- **MIT licensed** — free to use and adapt + +More plugins will be added over time. See [CONTRIBUTING.md](./CONTRIBUTING.md) to propose one. + +--- + +## Contributing + +New plugins are welcome. Read [CONTRIBUTING.md](./CONTRIBUTING.md) for the quality bar and process. + +--- + +## License + +[MIT](./LICENSE) diff --git a/plugins/skill-review/.claude-plugin/plugin.json b/plugins/skill-review/.claude-plugin/plugin.json new file mode 100644 index 0000000..261e8de --- /dev/null +++ b/plugins/skill-review/.claude-plugin/plugin.json @@ -0,0 +1,8 @@ +{ + "name": "skill-review", + "description": "Quick AI reviewer for Agent Skills: checks structure, workflow, references, and links. Clear report without jargon, with a summary from an exhausted data scientist.", + "version": "1.0.0", + "author": { + "name": "mindbox.cloud" + } +} diff --git a/plugins/skill-review/CHANGELOG.md b/plugins/skill-review/CHANGELOG.md new file mode 100644 index 0000000..f49758f --- /dev/null +++ b/plugins/skill-review/CHANGELOG.md @@ -0,0 +1,14 @@ +# Changelog — skill-review + +## [1.0.0] — 2026-05-26 + +### Added + +- Initial public release +- `skill-review`: Standard Review for Agent Skills + - Single-pass mode (< 500 lines) and sub-agent mode (>= 500 lines) + - 5 check groups: Structure (ST01–ST16), Workflow (WF01–WF29), + References (RF01–RF15), Links (LK01–LK07), Lifecycle (LC01–LC05) + - 20-antipattern Bingo table + - Summary from an exhausted data scientist + - 4 review scopes: Personal, Team, Repository, Full diff --git a/plugins/skill-review/README.md b/plugins/skill-review/README.md new file mode 100644 index 0000000..ce9679d --- /dev/null +++ b/plugins/skill-review/README.md @@ -0,0 +1,105 @@ +# skill-review + +AI reviewer for Agent Skills. 
Runs a Standard Review of an isolated skill folder (`SKILL.md` + optional `references/`, `scripts/`, `assets/`) and produces a structured report with a PASS/WARNING/FAIL/N/A breakdown, an **Antipattern Bingo** section, and a summary from Exhausted Vitaly. + +## Review Scopes + +The skill asks for the usage context at the start and adapts the scope accordingly: + +| Context | Stage | Scope | +|---------|-------|-------| +| Personal | 2 | Structure + Workflow (basic) + References RF04–RF12 + Links LK01, LK05 | +| Team | 3 | + portability, negative triggers, navigation, stable cross-refs | +| Repository | 4 | + routing, Lifecycle | +| I don't know | full | everything across all stages without declared target | + +## Check Groups + +### 1. Structure — form and validity + +Answers the question: is this a correctly structured Agent Skill, using Anthropic best practices for skill layout and markdown files? + +- **ST01–ST04** — folder structure, exact `SKILL.md` name, folder hygiene, correct use of `references/`, `scripts/`, `assets/` +- **ST05–ST08** — YAML frontmatter: `name`, `description`, trigger phrases, frontmatter safety +- **ST09** — `SKILL.md` size and monolith risk +- **ST10–ST13** — gross form antipatterns: skill-prompt, self-generated artifact, vibe-coded core, monolith +- **ST14–ST16** — justification of the skill as an artifact + +Almost all of Structure lives at **stage 2** — covers basic personal usability of the skill. + +### 2. Workflow — can you actually act on this skill + +The main practical layer. Checks whether the text becomes an executable procedure. 
+ +- **WF01–WF04** — imperative instructions, numbered steps, examples, troubleshooting +- **WF05–WF10** — six mandatory workflow elements: trigger, inputs, steps, checks, stop, recovery +- **WF11–WF14** — safeguards: checkpoints, planning discipline, critical rules in an explicit place +- **WF15–WF18** — preconditions, postconditions, boundary conditions, no hardcoded paths +- **WF19–WF22** — correct description of MCP steps +- **WF23–WF25** — correct handling of sub-agents +- **WF26–WF29** — handoff, external planning artifact, context management, extracting independent phases as sub-agents + +`WF12` and `WF13` are conditional: significant only for long skills with an explicit planning artifact. + +### 3. References — context correctness, progressive disclosure, routing hygiene + +- **RF04–RF12** *(stage 2)* — is the knowledge layer in `references/`, is there a file map, explicit triggers, healthy topology, has the reference layer not become a dump +- **RF01, RF03, RF13–RF15** *(stage 3)* — negative triggers, protection from overtriggering, freshness risks, gotchas, extraction of deterministic operations into `scripts/` +- **RF02** *(stage 4)* — routing to neighboring skills in the ecosystem + +### 4. Links — mechanical link integrity + +- **LK01, LK05** *(stage 2)* — do file refs exist, no orphan files +- **LK02–LK04** *(stage 3)* — are anchors and TOC alive +- **LK06** *(stage 3)* — are external URLs alive, if any +- **LK07** *(stage 3)* — are stable cross-refs used via headings, not line numbers + +### 5. Lifecycle — repository maturity + +Launched only in a full review (stage 4 or "I don't know"). 
+ +- **LC01–LC03** — mandatory stage-4 markers: owner, version, changelog/version header +- **LC04–LC05** — hygiene checks: secrets hygiene and README boundary (appear in the report but are not hard gate markers) + +## Installation + +```shell +/plugin marketplace add https://github.com/mindbox-cloud/agent-skills +/plugin install skill-review@mindbox-cloud-plugins +``` + +## Usage + +``` +review this skill +quick skill review +check my skill +``` + +Skill: `skill-review:skill-review` + +## References (source materials) + +1. **[Agent Skills Specification](https://agentskills.io/specification)** + Core spec for the skill package: folder structure, `SKILL.md`, progressive disclosure, frontmatter, directory hygiene. + +2. **[Anthropic - Agent Skills Overview](https://docs.anthropic.com/en/docs/agents-and-tools/agent-skills)** + Official description of how skills are loaded and used as reusable filesystem-based resources. + +3. **[Anthropic - Effective Context Engineering for AI Agents](https://anthropic.com/engineering/effective-context-engineering-for-ai-agents)** + Foundation for ideas around checkpoints, TODO/planning artifacts, long-context failure modes, and structured note-taking. + +4. **[SkillsBench](https://arxiv.org/abs/2602.12670)** + Research showing that curated skills and human review genuinely improve outcomes. + +5. **[Promptware Engineering](https://arxiv.org/abs/2503.02400)** + Theoretical framework: prompts and skills as first-class software artifacts. + +6. **[Anthropic - Building Effective Agents](https://anthropic.com/research/building-effective-agents)** + Practical principles of agentic workflows: simplicity, explicit contracts, transparency. + +7. **[Manus - Context Engineering for AI Agents](https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus)** + Practical text on managing long tasks and maintaining the thread of work. + +8. 
**[Bob Renze - AI Agent Subagent Orchestration](https://dev.to/bobrenze/ai-agent-subagent-orchestration-when-to-spawn-vs-when-to-do-it-yourself-4opg)** + External explanation of when to extract a phase as a sub-agent and why it reduces context overload. diff --git a/plugins/skill-review/skills/skill-review/SKILL.md b/plugins/skill-review/skills/skill-review/SKILL.md new file mode 100644 index 0000000..3fc1acf --- /dev/null +++ b/plugins/skill-review/skills/skill-review/SKILL.md @@ -0,0 +1,362 @@ +--- +name: skill-review +description: > + Quick and accessible AI reviewer for Agent Skills — for non-specialists. + Checks structure, workflow, references, and links. Produces a clear report + without technical jargon. Summary from Exhausted Vitaly. + Use when user says "quick review", "basic skill check", "review my skill", + "review for product owner", "light review", "quick check", + "check my skill", "evaluate skill quality", "what's wrong with my skill", + "review this skill". + Don't use when: user asks for a deep/nightmare review (use skill-review-nightmare), + user asks to create a new skill, to review code, + user asks to review a prompt that is not a skill. +metadata: + version: 1.6.0 +--- + +# Skill Review Novice — Autonomous Instruction for the Chat Agent + +> **Purpose:** You are an AI reviewer of Agent Skills for non-specialists. The user gives you an isolated skill folder (`SKILL.md` + optional `references/`, `scripts/`, `assets/`). Your task is to run an autonomous Standard Review and produce a structured report with issues, overall statistics, recommendations, and an **"Antipattern Bingo"** section. +> +> **Mode:** Standard Review. Clear report without jargon. The set of sub-agents depends on the selected scope. + +--- + +## Preconditions + +1. Before Step 0, check `Task`, `TodoWrite`, and file access for the skill. If anything is missing — stop and report what is lacking. +2. If the user selects logging **ON** in Step 0, verify file write capability. 
If write is unavailable — stop and suggest disabling logging. + +--- + +## CRITICAL — Mandatory Rules + +- **Questions to the user — only in Step 0**. Mandatory question: context of use + logging. Ask for log path confirmation **only if** the user enabled logging. +- **A TODO plan is MANDATORY** from the very start of the review. +- The TODO plan must cover the full review scope. For every 3–4 check items there should be a separate TODO item. Do not create one giant TODO for the entire review. +- Update statuses as the review progresses: `pending` → `in_progress` → `completed`. +- If the folder is not a valid skill directory (no `SKILL.md`) or contains more than one skill — see `## Troubleshooting`, problem 1. Do not abort the check silently. +- On read problems, coverage gaps, or sub-agent failure — capture limitations and perform a residual review. On write problems with `logs=on` — notify the user and stop the review. +- In sub-agent mode (`>= 500` lines), the orchestrator must delegate checklist checks to sub-agents via `Task`. Running a full review in the main context instead of launching sub-agents is a workflow violation. Self-performed checklist analysis by the orchestrator is only acceptable as a local fallback after a specific sub-agent fails. +- **Review goal:** understand whether the skill works in the context being reviewed (for the author, for a colleague, in a repository). +- **Novice does not compute a maturity stage.** Scope is determined by the user's choice. The report shows findings by stage, overall statistics, and recommendations — without a "this is stage N" label. +- **Language:** conduct the entire review (report, logs, TODO, sub-agent briefs) in the language the user started the conversation in. +- **Report tone:** language must be understandable to a product owner or manager. Avoid technical checklist jargon. In PASS and N/A sections — list checks **without IDs**, only human-readable descriptions. 
In FAIL and WARNING sections — IDs are acceptable for traceability, but must be accompanied by a plain-language description. + +--- + +## Step 0 — Opening Message + +**Start with a greeting and two choices in a single message:** + +> **Exhausted Vitaly, Data Scientist, at your service.** +> +> Before the review, please choose two parameters: +> +> **1. Who is your skill for?** +> This determines which checks are actually needed. +> +> - **Personal** — stage 2. A skill just for you. Key checks: workflow, examples, +> error handling, checkpoints, MCP/sub-agent safeguards if present. +> - **Team** — stage 3. A skill for 3–10 people. Additionally: preconditions, +> negative triggers, progressive disclosure, navigation. +> - **Repository** — stage 4. Skill lives in a shared collection. Additionally: +> owner, version, changelog, routing, lifecycle hygiene. +> - **I don't know** — check everything, and I'll decide later which level is relevant for me. +> +> **2. Detailed review logs:** +> - **Yes** — save each sub-agent's log to files. +> - **No** — write no files and show the full report directly in chat. + +### Log Path Confirmation + +If the user selected logging **Yes**, get a timestamp from the command line with minute precision. Do not generate it yourself. + +Example command: `date +%Y%m%d-%H%M` (result: `20260408-1430`). + +Show the user the full path to the future folder and ask for confirmation: + +> Review results will be written to: +> `{current working directory}/{skill-name}-review-{YYYYMMDD-HHMM}/` +> +> Confirm or specify a different path. + +Folder name format: `{skill-name}-review-{YYYYMMDD-HHMM}/` (e.g.: `my-skill-review-20260408-1430/`). + +If the user selected logging **No**, this step is skipped: no folder is created, no files are written. + +**After the user's response (and, if needed, path confirmation) ask no more questions.** The entire remaining review is conducted autonomously from the files. 
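The folder naming rule described above can be sketched in shell. This is a minimal illustration, not part of the skill itself: `review_dir_name` is a hypothetical helper, and its optional second argument exists only so the timestamp can be fixed for a reproducible example. By default the timestamp comes from `date +%Y%m%d-%H%M`, as the text requires.

```shell
# Hypothetical helper: builds "{skill-name}-review-{YYYYMMDD-HHMM}".
# $1 = skill name; $2 = optional timestamp (defaults to the system clock).
review_dir_name() {
  local skill_name="$1"
  # Take the timestamp from the command line, with minute precision.
  local stamp="${2:-$(date +%Y%m%d-%H%M)}"
  printf '%s-review-%s\n' "$skill_name" "$stamp"
}

# Full path to show the user for confirmation (current working directory).
printf '%s/%s/\n' "$PWD" "$(review_dir_name my-skill 20260408-1430)"
```

Keeping the timestamp as an input makes the name easy to check against the documented format before any folder is created.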
+ +### Capturing Parameters After User Response + +The orchestrator captures **two distinct entities**: + +| User's choice | Declared target stage | Review scope | +|:---|:---|:---| +| Personal | 2 | up to 2 | +| Team | 3 | up to 3 | +| Repository | 4 | full | +| I don't know | not specified | full | + +> **Important:** "I don't know" means full scope without a declared target stage. + +--- + +## Mandatory TODO Plan + +Create a preliminary TODO immediately after the user's response and refine it after collecting the manifest and choosing the mode. Break the review into blocks so that one TODO covers approximately 3–4 checks, not the entire document. Update statuses as work proceeds. + +**Example of a good breakdown:** +```text +- Check folder structure, SKILL.md presence, directory validity +- Check frontmatter: name, trigger phrases, safety +``` + +--- + +## Orchestrator Algorithm + +```text +1. Show one opening message with two choices: context + logging +2. Capture declared target stage and review scope +3. If logging ON: + 3a. Check file write + 3b. Get timestamp from command line (date +%Y%m%d-%H%M or equivalent) + 3c. Show user the full path to the results folder, await confirmation +4. Create the preliminary TODO plan for the review +5. Collect the manifest of the skill under review: file list, sizes, line counts, frontmatter, + presence of references/, scripts/, assets/. The goal of this step is routing and passing to + sub-agents; do not perform checklist content analysis at this step. +6. Capture unreadable / damaged files and read limitations, if any +7. Count the total lines of all readable files in the skill under review +8. Choose the mode based on the threshold (see "Execution Mode Selection") +9. IF SINGLE-PASS (< 500 lines): + 9a. Refine the TODO plan for compact single-pass review + 9b. Read references/instruction-singlepass.md + 9c. Execute the entire review sequentially in the main context + 9d. 
If logging ON — write report.md to the confirmed folder +10. IF SUB-AGENT (>= 500 lines): + 10a. Refine the TODO plan for sub-agent review + 10b. If logging ON — create the results folder: {skill-name}-review-{YYYYMMDD-HHMM}/ + 10c. Using the "scope → sub-agents" matrix, determine which sub-agents to launch + 10d. Launch the required sub-agents; pass scope and checklist mapping to each. + Each selected sub-agent is launched as a separate Task call. + Do not combine multiple sub-agents in one prompt. + Parallel launch of multiple separate Task calls is allowed. + 10e. Collect condensed summaries (and temp_log_path if direct log write to the output folder failed) + 10f. Run the verification gate: + - Count total FAIL / WARNING across all summaries. + - Verify that every FAIL / WARNING from every summary entered the working findings list. + - Log sub-agents not launched due to scope: "[sub-agent] — not launched: all checks outside selected scope". + - Capture cross-signals between summaries if they require a note in the final report. + 10g. If logging ON — move fallback logs to the output folder +11. Fill in "Antipattern Bingo" per references/antipattern-bingo.md. This file may be read early + to distribute Bingo assignments, but fill the final table only after receiving summaries. +12. After receiving summaries, generate the final report with findings grouped by stage + (read references/report-template.md — for sub-agent mode only). Do not read this file earlier. +``` + +> **Note:** In single-pass mode, steps 11–12 are embedded in `instruction-singlepass.md` and execute automatically in one context. In sub-agent mode, the orchestrator executes them separately using the reference files. + +--- + +## Troubleshooting + +> **General principle:** an abnormal situation must not silently abort the review. On read or coverage problems — produce the most complete report possible from available artifacts and add a `Review Limitations` section. 
On file write problems in logging ON mode — notify the user and stop the review. + +### 1. User passed a non-skill folder + +**Symptoms:** no `SKILL.md`, folder is empty, contains only README/arbitrary files, or has multiple candidate skills. +**Cause:** the user pointed to the wrong directory, or the folder is ambiguous as a review object. +**Resolution:** flag this as the **primary critical issue**. If the folder has multiple candidate skills — do not silently pick one. Perform a residual review of the folder structure and readable files, and tell the user what is missing or why the directory is ambiguous. + +### 2. Skill files cannot be read or are damaged + +**Symptoms:** read error, mojibake, garbage instead of text, truncation in a critical section, unsupported encoding — the meaning of the text cannot be reliably recovered. +**Cause:** corrupted encoding, binary file, access problems, damaged artifact. +**Resolution:** +1. Retry reading **once**. +2. If the retry fails — consider the file unreadable. +3. If `SKILL.md` cannot be read — this is the primary critical issue; perform a residual review from the folder structure and readable files. +4. If a reference/script/asset cannot be read — do not infer its content; continue the review with available files. +5. The unreadable file must appear in `Review Limitations` and, where appropriate, in recommendations. + +### 3. Sub-agent did not return a summary (sub-agent mode) + +**Symptoms:** the sub-agent did not launch, returned an empty response, crashed, or produced a clearly incorrect summary. +**Cause:** sub-agent context overflow, tool call error, lost checklist mapping. +**Resolution:** +1. Relaunch the sub-agent **once** with a narrower brief: pass only the relevant files, manifest, and mandatory self-read inputs (checklist + base rules + bingo file) instead of a full inline dump. +2. 
If the retry fails — choose one of two fallbacks: + - **Local fallback:** the orchestrator itself walks through the corresponding checklist in the main context, if the context volume is small. + - **Restricted fallback:** continue the review without this sub-agent and capture in `Review Limitations` which check groups remained incomplete. +3. Never fabricate a condensed summary for a failed sub-agent. + +### 4. File write is unavailable during review (only if logging ON) + +**Symptoms:** failed to write a log directly to the output folder, failed to save a fallback log to a temporary file, orchestrator failed to move a log or write `report.md`. +**Cause:** no write permissions, read-only filesystem, or environment restrictions. +**Resolution:** +1. Retry the write **once**. +2. If write is still unavailable — notify the user which files could not be saved, and **stop the review**. + +--- + +## Execution Mode Selection + +After collecting the manifest, count the total number of lines **in all files of the skill under review** (SKILL.md + references/ + scripts/ + assets/). 
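The line count and the 500-line threshold used by this skill can be sketched as follows. This is a minimal illustration that assumes the skill folder is readable from a shell; `count_skill_lines` and `choose_mode` are hypothetical names, and binary files under `assets/` would need separate handling.

```shell
# Hypothetical helper: total line count across all regular files in a skill folder.
count_skill_lines() {
  # cat concatenates every file; wc -l counts newline characters in the stream,
  # so a file without a trailing newline contributes one line fewer.
  find "$1" -type f -exec cat {} + | wc -l | tr -d ' '
}

# Hypothetical helper: map a total line count to the execution mode.
choose_mode() {
  if [ "$1" -lt 500 ]; then
    echo "single-pass"
  else
    echo "sub-agent"
  fi
}
```

A caller would combine the two, for example `choose_mode "$(count_skill_lines path/to/skill)"`, where the path is a placeholder for the folder under review.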
+ +| Total volume | Mode | What happens | +|:---|:---|:---| +| **< 500 lines** | **Single-pass** | Read `references/instruction-singlepass.md` and run the entire review in the main context, without sub-agents | +| **>= 500 lines** | **Sub-agent** | Launch sub-agents per the scope matrix (logic below) | + +--- + +## "Scope → Sub-agents" Matrix (sub-agent mode only) + +| Sub-agent | Scope: up to 2 | Scope: up to 3 | Scope: full | +|:---|:---|:---|:---| +| **Structure** | ST01–ST16 | ST01–ST16 | ST01–ST16 | +| **Workflow** | WF01–WF14, WF19–WF29 | WF01–WF29 | WF01–WF29 | +| **Links** | LK01, LK05 | LK01–LK07 | LK01–LK07 | +| **References** | RF04–RF12 | RF01, RF03–RF15 | RF01–RF15 | +| **Lifecycle** | Do not launch | Do not launch | LC01–LC05 | + +### Launch Rule + +If **all** of a sub-agent's checks lie above the current scope, the orchestrator **does not launch it** and logs: + +```text +[sub-agent] — not launched: all checks outside selected scope +``` + +### Skip Rule Within a Partially Launched Sub-agent + +If, for example, Workflow is launched with scope `up to 2`, then: + +- WF01–WF14, WF19–WF29 are checked; +- WF15–WF18 are marked by the orchestrator as **not checked for selected scope** (not N/A and not FAIL). + +--- + +## Orchestrator — Execution Order (sub-agent mode) + +**Important:** Do not generate the final report until summaries have been received from all successfully completed sub-agents, local fallback results are collected, and limitations for those that went to restricted fallback are captured. 
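The mode threshold, the scope matrix, and the launch and skip rules above are mechanical enough to sketch in code. The following is a minimal illustration, not part of the skill: the names `select_mode`, `expand`, and `plan` are hypothetical, and the ranges are copied from the matrix with ASCII hyphens in place of en dashes.

```python
import re

SINGLE_PASS_LIMIT = 500  # lines; threshold from the execution mode table

# Check ranges per sub-agent and scope, copied from the scope matrix.
# An empty string stands for "Do not launch".
MATRIX = {
    "Structure":  {"up to 2": "ST01-ST16", "up to 3": "ST01-ST16", "full": "ST01-ST16"},
    "Workflow":   {"up to 2": "WF01-WF14, WF19-WF29", "up to 3": "WF01-WF29", "full": "WF01-WF29"},
    "Links":      {"up to 2": "LK01, LK05", "up to 3": "LK01-LK07", "full": "LK01-LK07"},
    "References": {"up to 2": "RF04-RF12", "up to 3": "RF01, RF03-RF15", "full": "RF01-RF15"},
    "Lifecycle":  {"up to 2": "", "up to 3": "", "full": "LC01-LC05"},
}


def select_mode(total_lines):
    """Pick the execution mode from the total line count of the skill."""
    return "single-pass" if total_lines < SINGLE_PASS_LIMIT else "sub-agent"


def expand(spec):
    """Turn 'WF01-WF14, WF19' into an explicit list of check IDs."""
    ids = []
    for part in filter(None, (p.strip() for p in spec.split(","))):
        m = re.fullmatch(r"([A-Z]{2})(\d{2})(?:-[A-Z]{2}(\d{2}))?", part)
        prefix, start = m.group(1), int(m.group(2))
        end = int(m.group(3) or m.group(2))
        ids.extend(f"{prefix}{n:02d}" for n in range(start, end + 1))
    return ids


def plan(subagent, scope):
    """Apply the launch rule and the skip rule for one sub-agent."""
    full = expand(MATRIX[subagent]["full"])
    in_scope = expand(MATRIX[subagent][scope])
    return {
        "launch": bool(in_scope),  # no checks in scope: do not launch
        "check": in_scope,
        # Reported as "not checked for selected scope", never N/A or FAIL.
        "not_checked": [c for c in full if c not in in_scope],
    }
```

For example, `plan("Workflow", "up to 2")` lists WF15 through WF18 under `not_checked`, which matches the skip rule example above.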
+ +### Sub-agent Map + +| Sub-agent | Checklist | +|:---|:---| +| **Structure** | `references/checklist-structure.md` | +| **Lifecycle** | `references/checklist-lifecycle.md` | +| **Workflow** | `references/checklist-workflow.md` | +| **References** | `references/checklist-references.md` | +| **Links** | `references/checklist-links.md` | + +### Aggregation Rule + +Findings counters are maintained **in total across all checked stages**: total FAIL / WARNING / PASS / N/A. Findings are grouped in the report by stage (2 / 3 / 4) as markdown sections for navigation, but separate statistics per stage are not kept. Checks outside scope are marked "not checked for selected scope" and are not counted as N/A. + +--- + +## Sub-agent Rules (sub-agent mode only) + +> This section is not used in single-pass mode. In single-pass, all rules are already embedded in `references/instruction-singlepass.md`. + +1. **The orchestrator collects the manifest** → file list, sizes, line counts, frontmatter, references/, scripts/. The manifest is needed for routing and passing to sub-agents; it does not replace checklist review. +2. **Each sub-agent receives from the orchestrator:** + - **Files of the skill under review** (or their relevant subset) + manifest. + - **Path to its own checklist** (e.g., `references/checklist-structure.md`). + - **Path to base review rules:** `references/subagent-base-rules.md`. + - **File transfer strategy for skill files:** pass the manifest inline; the checklist, base rules, bingo file, and skill files — via paths for self-read by the sub-agent. In the sub-agent prompt, explicitly state the paths to the checklist, base rules, and bingo file. Do not pass the entire skill inline by default. + - **Path to the full bingo file:** `references/antipattern-bingo.md`. + - **Condensed summary template** (see below). 
+ - **Current review parameters:** logging (on/off; if on, pass the path to the output folder only for a direct write attempt without requesting additional permissions), review scope, declared target stage. + - **Base review rules:** the sub-agent reads them from `references/subagent-base-rules.md`. The orchestrator does not insert these rules inline and does not paraphrase them. + + > ### Instruction for Sub-agent + > + > - Before starting checks, read all mandatory inputs: your own checklist at the stated path, `references/subagent-base-rules.md`, and `references/antipattern-bingo.md`. + > - Do not begin checks until you have fully read all mandatory files. + > - If any of these files cannot be read — return an error to the orchestrator and do not continue the review. + > - For Bingo, use the full `references/antipattern-bingo.md`, but assign verdicts only for rows where `Owner` = your role. Pass other antipatterns only as `supporting signals`. + > - Antipatterns above the current scope — mark `NOT_CHECKED`. + > - After reading, work strictly according to the checklist, scope, base rules, and bingo file. + +3. **Logging (if enabled in Step 0):** The orchestrator creates the folder `{skill-name}-review-{YYYYMMDD-HHMM}/`. The sub-agent first tries to write `log-{subagent}.md` directly to the output folder. If that fails — writes the log to a temporary file and returns `temp_log_path`, and the orchestrator moves such a log to the output folder. Do not request additional permissions from the user. **If logging is off:** sub-agents return only a condensed summary. No files or folders are created. File write capability is checked before the review starts. +4. **Sub-agents return condensed summaries** to the orchestrator. A summary is the distillation of the review; when log fallback applies, `temp_log_path` is added. 
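The logging rule in item 3 (direct write first, temporary file as fallback) can be sketched as follows. This is a hedged illustration only: `output_folder_name` and `write_log` are hypothetical helper names, and the returned dictionary shape is an assumption, not a format the skill defines.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def output_folder_name(skill_name, now):
    """Build '{skill-name}-review-{YYYYMMDD-HHMM}' from a datetime."""
    return f"{skill_name}-review-{now:%Y%m%d-%H%M}"


def write_log(output_dir, subagent, text):
    """Try a direct write; on failure fall back to a temporary file.

    Returns the final path on success, or a temp_log_path that the
    orchestrator is expected to move into the output folder later.
    """
    target = Path(output_dir) / f"log-{subagent}.md"
    try:
        target.write_text(text, encoding="utf-8")
        return {"log_path": str(target), "temp_log_path": None}
    except OSError:
        # Direct write failed (missing folder, permissions, read-only FS).
        with tempfile.NamedTemporaryFile(
            "w", suffix=f"-log-{subagent}.md", delete=False, encoding="utf-8"
        ) as tmp:
            tmp.write(text)
        return {"log_path": None, "temp_log_path": tmp.name}
```

The sketch does not cover the case where the temporary write also fails; per the failure-handling section, that case ends with a user notification and a stopped review.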
+ +--- + +## Condensed Summary Format for Sub-agents + +```markdown +## [Sub-agent Name] — Summary + +### FAIL (critical) +- [ID] — [human-readable description] — [file § section] — [recommendation] + +### WARNING +- [ID] — [human-readable description] — [file § section] — [recommendation] + +### PASS +- [description], [description], [description] (list without IDs, comma-separated) + +### N/A +- [grouped by reason, without IDs]: [reason] — [number of checks] + +### Bingo Signals (your antipatterns only) +- Bingo #N: [NONE / MINOR / CRITICAL / NOT_CHECKED] — [note] + +### Supporting Signals (if you noticed another sub-agent's antipattern) +- Bingo #N (owner: [sub-agent]): [observation in 1 sentence] + +### Temp Log Handoff (only if log fallback was used) +- temp_log_path: [path to the temporary log for the orchestrator] +``` + +If logging is enabled, each FAIL/WARNING includes a reference `details: log-{subagent}.md#ID`. Add `**Detailed log:** log-{subagent}.md` at the top of the summary. + +### Sub-agent Log Format (only if logging is enabled) + +When logging is on, the sub-agent writes the log either directly to the output folder or to a temporary file on fallback. Structure: + +```markdown +# Log: [Sub-agent Name] + +## [Check ID] + +**Verdict:** PASS / FAIL / WARNING / N/A + +**Rationale:** [1–2 sentences referencing file § section] + +**Recommendation:** [if FAIL/WARNING — what to fix] + +--- + +## [next ID] +... +``` + +No full quotes from files, no step-by-step breakdown, no "arguments for and against". A brief audit trail. + +--- + +## Report Finalization (sub-agent mode) + +> In single-pass mode, finalization is embedded in `references/instruction-singlepass.md` and executes in a single context. + +After receiving summaries from all launched sub-agents: + +1. **Fill in Antipattern Bingo** — read `references/antipattern-bingo.md`, fill the table based on findings from sub-agents. Antipatterns above scope receive status `NOT_CHECKED`. +2. 
**Generate the report** — read `references/report-template.md`, fill all sections. Group findings by stage (2 / 3 / 4) as markdown sections. Summary statistics — total across all stages (FAIL / WARNING / PASS / N/A). If the review had coverage limitations, the limitations section is mandatory. After filling the counters, compare total FAIL / WARNING with the count from the verification gate. If numbers do not match — find the missing finding before publishing the report. +3. **Summary from Exhausted Vitaly** — close the report in Exhausted Vitaly's tone (sentimental, mildly sarcastic but not rude — like the melancholy robot Marvin from The Hitchhiker's Guide to the Galaxy by Douglas Adams). Exhausted Vitaly comments on findings in the context of the declared goal. No "stage N" label. For stage context he may consult `references/maturity-diagnostic.md` (stage legend). Points to the main pain and predicts what will improve if the top issues are fixed: + - target = Personal: "for solo use — [sufficient / lacking this]" + - target = Team: "for the team — [main stage 3 findings are ...]" + - target = Repository: "for the repository — [main stage 4 findings are ...]" + - "I don't know": "if for yourself — [..]; if for a team — [..]; if for a repository — [..]" +4. **Deliver the report to the user:** + - **If logging is on:** if there were `temp_log_path` entries, move those fallback logs to the confirmed folder, then write `report.md`. Output to chat a **condensed summary** (counters, main pain, top 3, path to the report). Do not duplicate the entire report in chat. + - **If logging is off:** do not create a folder, do not write `report.md` or any other files. Output the full final report to chat — in its entirety, all sections, without abbreviations. 
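The counter comparison in step 2 can be approximated mechanically. The sketch below assumes summaries follow the condensed summary format (one `- ` bullet per finding under `### FAIL` and `### WARNING`); `count_findings` and `totals` are hypothetical names, and the verification gate counts are treated as an external input.

```python
import re
from collections import Counter


def count_findings(summary):
    """Count FAIL and WARNING bullets in one condensed summary."""
    counts = Counter()
    section = None
    for line in summary.splitlines():
        if line.startswith("###"):
            heading = re.match(r"###\s+(FAIL|WARNING)\b", line)
            section = heading.group(1) if heading else None
        elif section and line.lstrip().startswith("- "):
            counts[section] += 1
    return counts


def totals(summaries):
    """Sum FAIL / WARNING across all sub-agent summaries."""
    total = Counter()
    for summary in summaries:
        total += count_findings(summary)
    return total
```

If `totals(...)` differs from the verification gate counts, the report is not ready to publish and the missing finding has to be located first.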
diff --git a/plugins/skill-review/skills/skill-review/references/antipattern-bingo.md b/plugins/skill-review/skills/skill-review/references/antipattern-bingo.md new file mode 100644 index 0000000..a0694a8 --- /dev/null +++ b/plugins/skill-review/skills/skill-review/references/antipattern-bingo.md @@ -0,0 +1,97 @@ +# Antipattern Bingo + +> After the main review, the orchestrator **must** fill in a separate report section **"Antipattern Bingo"**. + +--- + +## Verdict Rules + +For **each** antipattern, choose one of four verdicts: + +- **NONE** — the antipattern was **NOT found** in the skill (= good, clean) +- **MINOR** — **partial signs** of the antipattern (= there is a problem, but not complete); **mandatory** short note of **3–4 words** +- **CRITICAL** — the antipattern was **FOUND** in full (= bad, requires fixing) +- **NOT_CHECKED** — the antipattern belongs to a **stage above the selected scope** + +> **Note:** CRITICAL = we **found** the antipattern. This is not a severity grade for impact — it is a fact of detection. + +--- + +## Scope Fill Rule + +- If scope = up to 2 → antipatterns of stages 3 and 4 get status `NOT_CHECKED` +- If scope = up to 3 → antipatterns of stage 4 get status `NOT_CHECKED` +- If scope = full → Bingo is filled out completely + +`NOT_CHECKED` is needed so that the report does not create a false impression of cleanliness. Higher-stage antipatterns must not silently disappear and must not be counted as `NONE`. + +--- + +## Evidence Rule + +- Evaluate Bingo **only by observable artifacts** in the files of the skill under review. +- For `MINOR`, the note must be short, for example: `triggers too generic`, `path hardcoded`, `reference without map`. + +--- + +## Antipattern Reference + +> Principle: the checklist is the source of truth for checks. Bingo is a **reference table** pointing to checks, without duplicating descriptions. 
+ +| # | Antipattern | Stage | Check Reference | Owner | +|:--|:---|:---|:---|:---| +| 1 | **Huge MD** | 2 | ST09, ST13 | **Structure** | +| 2 | **Vague triggers / Triggering lottery** | 3 | ST06, ST07 | **Structure** | +| 3 | **No negative triggers** | 3 | RF01 | **References** | +| 4 | **No examples** | 2 | WF03 | **Workflow** | +| 5 | **Abstract instructions** | 2 | WF01 | **Workflow** | +| 6 | **Buried critical rules** | 2 | WF14 | **Workflow** | +| 7 | **No error handling** | 2 | WF04 | **Workflow** | +| 8 | **Content duplication** | 3 | RF05 | **References** | +| 9 | **AI generated skill** | 2 | ST11 | **Structure** | +| 10 | **Vibe-coded knowledge core** | 2 | ST12 | **Structure** | +| 11 | **Skill-prompt** | 2 | ST10 | **Structure** | +| 12 | **Hardcoded paths** | 3 | WF18 | **Workflow** | +| 13 | **Mirroring MCP schema** | 2 | WF22 | **Workflow** | +| 14 | **NL-code confusion** | 2 | WF02 | **Workflow** | +| 15 | **Schema drift risk** | 4 | WF22, LC02 | **Lifecycle** | +| 16 | **Context overfitting** | 3 | WF15 | **Workflow** | +| 17 | **Overengineering** | 2 | ST16 | **Structure** | +| 18 | **Lifecycle hygiene gap (rot risk)** | 4 | LC03 | **Lifecycle** | +| 19 | **Silent chain failures** | 2 | WF11 | **Workflow** | +| 20 | **Monolithic reference dump** | 3 | RF06, RF12 | **References** | + +--- + +## Notes on Specific Antipatterns + +- **#15 Schema drift risk** vs **#13 Mirroring MCP schema**: both use WF22, but differently. #13 is a structural fact (a schema copy lives in SKILL.md). #15 is a lifecycle risk (the schema version is not pinned, drift is not tracked). +- **#16 Context overfitting** vs **#12 Hardcoded paths**: #12 is a concrete mechanical signal (absolute paths). #16 is a broader pattern (coupling to OS, permissions, environment, implicit requirements). If the only signal is hardcoded paths, do not duplicate the verdict. 
+- **#16 in novice mode:** checked **partially** — only by mechanical portability signals (WF15: preconditions, OS, permissions, packages). Deep overfitting analysis (IN09–IN14: hidden assumptions, self-sufficiency, implicit knowledge) is only available in skill-review-nightmare with a full Intern walkthrough. + +--- + +## Report Table Template + +| # | Antipattern | Stage | Verdict | Note | +|:--|:---|:---|:---|:---| +| 1 | Huge MD | 2 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 2 | Vague triggers / Triggering lottery | 3 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 3 | No negative triggers | 3 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 4 | No examples | 2 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 5 | Abstract instructions | 2 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 6 | Buried critical rules | 2 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 7 | No error handling | 2 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 8 | Content duplication | 3 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 9 | AI generated skill | 2 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 10 | Vibe-coded knowledge core | 2 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 11 | Skill-prompt | 2 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 12 | Hardcoded paths | 3 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 13 | Mirroring MCP schema | 2 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 14 | NL-code confusion | 2 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, 
otherwise `—`] | +| 15 | Schema drift risk | 4 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 16 | Context overfitting | 3 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 17 | Overengineering | 2 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 18 | Lifecycle hygiene gap (rot risk) | 4 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 19 | Silent chain failures | 2 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 20 | Monolithic reference dump | 3 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | diff --git a/plugins/skill-review/skills/skill-review/references/checklist-lifecycle.md b/plugins/skill-review/skills/skill-review/references/checklist-lifecycle.md new file mode 100644 index 0000000..7dce0db --- /dev/null +++ b/plugins/skill-review/skills/skill-review/references/checklist-lifecycle.md @@ -0,0 +1,41 @@ +# Checklist: Ownership and Lifecycle + +> Sub-agent Lifecycle. Checks ownership, metadata hygiene, lifecycle artifacts, and hygiene of a mature skill. +> +> Result of each check: `PASS` / `FAIL` / `WARNING` / `N/A`. + +--- + +## LC01–LC03. Stage 4 Gate Markers + +> **Stage: 4** | Axis: repository and ecosystem +> +> LC01–LC03 are **marker-level stage-4 checks**: mandatory markers for transitioning to stage 4. Without them, a skill is not considered a repository skill. + +**LC01** Is `metadata.author` (skill owner) specified? + +*Context:* Orphaned skills without a maintainer are one of the most common problems. When the author leaves, the skill becomes a black box. `metadata.author` is the minimum ownership hygiene. + +**LC02** Is `metadata.version` in **semver format**? + +*Context:* `metadata.version` (semver) fixes the environment contract and artifact version. 
Without semver it is impossible to determine whether different versions of the skill are compatible with each other and with the environment. + +**LC03** Is there a **CHANGELOG** or **version header with dates** that shows which version of the instruction is current? + +*Context:* Update dates let you understand which version of the instruction is current. Acceptable forms: a version header in `SKILL.md` (`> Version: 4.0 | Date: ...`) or a `CHANGELOG.md` next to `SKILL.md` or at the plugin root — but not inside `references/` (that is L3 content for the agent, see RF07, RF08, LK05). If the skill files are not versioned and there is no changelog, lifecycle maturity cannot be confirmed by artifacts. + +--- + +## LC04–LC05. Stage 4 Hygiene Checks + +> **Stage: 4** | Axis: repository and ecosystem +> +> LC04–LC05 are **stage-4 hygiene checks**: they must appear in the review and report, but are **not hard gate markers** for the transition to stage 4. + +**LC04** Are there no **tokens, keys, passwords, or other secrets** in the skill files? + +*Context:* Secrets in skill files are a direct security risk, especially when publishing to a shared repository. Check for: API keys, access tokens, passwords, connection strings, private keys. Even in `references/` and `scripts/` — secrets are not acceptable. If the skill needs to use a secret, a delivery mechanism via environment variable or vault must be described, not a hardcode. + +**LC05** Is the human-facing `README.md` at **repository level**, not inside the skill folder? Does `SKILL.md` remain the agent instruction? + +*Context:* Skills are designed for consumption by an AI agent, not a human. A `README.md` inside the skill folder is an antipattern: extra markdown files can disrupt routing and discovery. The human-facing README lives at the repository level. 
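LC02 is one of the few lifecycle checks that can be pre-screened mechanically. A minimal sketch, assuming a simplified semver pattern (MAJOR.MINOR.PATCH with optional pre-release and build metadata); the official semver.org grammar is stricter about pre-release identifiers, and `is_semver` is a hypothetical helper name.

```python
import re

# Simplified pattern: numeric core without leading zeros, optional
# "-prerelease" and "+build" parts. Not the full official grammar.
SEMVER = re.compile(
    r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(?:-[0-9A-Za-z.-]+)?"
    r"(?:\+[0-9A-Za-z.-]+)?"
)


def is_semver(version):
    """Return True if the string looks like a semver version."""
    return SEMVER.fullmatch(version) is not None
```

A two-part value such as `4.0` does not pass this check, so a version header alone may satisfy LC03 while `metadata.version` still fails LC02.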
diff --git a/plugins/skill-review/skills/skill-review/references/checklist-links.md b/plugins/skill-review/skills/skill-review/references/checklist-links.md new file mode 100644 index 0000000..388849b --- /dev/null +++ b/plugins/skill-review/skills/skill-review/references/checklist-links.md @@ -0,0 +1,45 @@ +# Checklist: Link Integrity + +> Sub-agent Links. Checks the mechanical integrity of links within a single skill: file refs, markdown anchors, tables of contents, reachability, external URLs, stable cross-refs. +> +> Result of each check: `PASS` / `FAIL` / `WARNING` / `N/A`. + +--- + +## LK01, LK05. Basic File Link Integrity + +> **Stage: 2** | Axis: personal use + +**LK01. File references.** Do all file mentions (`references/foo.md`, `scripts/bar.py`) point to files that actually exist in the skill folder? + +*What we look for:* Broken file refs. Check every file path mentioned in `SKILL.md` and reference files — does the file exist at the stated path. + +**LK05. Reachability and orphan files.** Are all required files actually reachable from `SKILL.md` via actual links? No orphan files or broken transitions? + +*What we look for:* Mechanical integrity of the link graph. Build the graph from actual links and verify that all mentioned files resolve, and that expected files are not dangling without incoming links. This is a mechanical reachability check, not a policy assessment of topology; policy lives in RF08/RF09 in checklist-references.md. + +--- + +## LK02–LK04, LK06–LK07. Navigation, Anchors, and Stable Cross-refs + +> **Stage: 3** | Axis: team use + +**LK02. Markdown anchors.** Do all internal links in the format `[text](#section-name)` point to a real heading in the same file? + +*What we look for:* Broken internal anchors. Collect all `#anchor` links and verify that each has a corresponding heading in the same file. + +**LK03. Cross-file anchors.** Do all links in the format `[text](file.md#section)` point to a real heading in a real file? 
+ +*What we look for:* Broken cross-file anchors. Check both the file's existence and the heading's existence within it. + +**LK04. Table of contents (TOC).** If a file contains a TOC, do all its entries correspond to actual headings? + +*What we look for:* Broken TOC. Compare the TOC entries with the actual headings in the file. + +**LK06. External URLs.** If the skill contains http/https links — are they accessible? (Optional, if network access is available.) + +*What we look for:* Dead external links. Try to check the availability of each external URL. If network access is unavailable, mark `N/A`. + +**LK07. Cross-reference stability.** Do cross-references between skill files use headings/anchors (`file.md#section-name`) rather than line numbers (`file.md:42`)? + +*Context:* Line numbers break on any edit. Headings and anchors are tied to content. Line numbers are only acceptable in one-off artifacts (review logs, reports). In internal cross-refs within a skill — headings only. diff --git a/plugins/skill-review/skills/skill-review/references/checklist-references.md b/plugins/skill-review/skills/skill-review/references/checklist-references.md new file mode 100644 index 0000000..ae313c8 --- /dev/null +++ b/plugins/skill-review/skills/skill-review/references/checklist-references.md @@ -0,0 +1,91 @@ +# Checklist: References and Progressive Disclosure + +> Sub-agent References. Checks progressive disclosure, description quality, routing hygiene, and the organization policy of references/. +> +> Result of each check: `PASS` / `FAIL` / `WARNING` / `N/A`. + +--- + +## RF04–RF12. Progressive Disclosure and `references/` + +> **Stage: 2** | Axis: personal use (if references/ exists) + +**RF04** Does the workflow (how to execute) live in `SKILL.md`, and the methodology/references (what to know) in `references/`? + +*Context:* Mixing workflow and knowledge in one file increases NL-code confusion (see WF02 in checklist-workflow.md). 
`SKILL.md` owns "how to execute" (workflow, checkpoints, stop conditions). `references/` owns "what to know to execute" (methodologies, domain rules, reference material). + +**RF05** No duplication between `SKILL.md` and `references/`? + +*Context:* Single Source of Truth. When they fall out of sync, the agent receives contradictory instructions. Information must live in only one place. + +**RF06** Is each reference file focused on **one topic / scenario / class of problems**? + +*Context:* A monolithic reference of thousands of lines is a "monolithic reference dump" antipattern. The agent is forced to read everything, filling its context with irrelevant information. `Keep individual reference files focused. Agents load these on demand, so smaller files mean less use of context.` — Agent Skills Specification. + +**RF07** Does `SKILL.md` state **when** to read each reference file (explicit trigger), rather than just listing that it exists? + +*Context:* `Read references/api-errors.md if API returns non-200` — correct. `See references/ for more details` — antipattern. Without an explicit trigger, the agent either reads everything (context bloat) or reads nothing (loss of knowledge). + +**RF08** Is the primary navigation to reference files **directly from `SKILL.md`**, not only through reference→reference chains? + +*Context:* `SKILL.md` must be the orchestration layer. The agent must see the key reference files and their read triggers directly from the main workflow, not discover them through multi-step transitions. Limited cross-refs are acceptable as secondary navigation, but not as the sole entry point. + +**RF09** If a reference file links to another reference file, is the link explicitly scoped and explained? (explicit trigger, no cycles, no deep chains) + +*Context:* Cross-references are only acceptable as a local clarification: no deeper than 1 additional hop from `SKILL.md`, no cycles, with an explicit statement of when to read the next file. 
This is a policy check on topology, not a mechanical reachability check. + +**RF10** Do files in `references/` **> 100 lines** start with a **table of contents**? + +*Context:* A TOC lets the agent understand the file structure and read only the needed section rather than the whole document. The TOC must link to headings/anchors, not line numbers. Section headings must be unique and searchable. + +*Assessment rule:* the absence of a TOC in a reference file > 100 lines should by default be treated as a **WARNING** for navigation and selective loading. Use **FAIL** only if as a result the file effectively becomes an unnavigable monolith and the agent cannot identify which section to read. + +**RF11** Is content distributed across three loading levels approximately as **L1 (~10%) : L2 (~30%) : L3 (~60%)**? + +*Context:* L1 (frontmatter) loads always. L2 (body) loads on activation. L3 (references, scripts, assets) loads on demand. If 90% of content is in the body — progressive disclosure is not working. + +**RF12** If there are many reference files, is there a **file map** in `SKILL.md` or `references/INDEX.md` with the purpose of each file? + +*Context:* As references grow, the agent needs not just decomposition but a navigation map. The right pattern: `SKILL.md` as the orchestration layer + references as the selective-loading layer. Without a map, even good small files become an unnavigable collection. + +--- + +## RF01, RF03. Description Quality and Routing Hygiene + +> **Stage: 3** | Axis: team use + +**RF01** Does the description contain "**When NOT to use**" (negative triggers)? + +*Context:* Adding negative triggers reduces false activations (overtriggering) by 40–60%. This is one of the most effective techniques. Format: `Don't use when: user asks for code review (use code-review skill)`. + +**RF03** Is the description not too broad, not provoking overtriggering? + +*Context:* An overly generic description intercepts neighboring requests. 
If the description covers 5+ different domains or contains no concrete trigger phrases, the router will activate the skill on irrelevant requests. + +--- + +## RF13–RF15. Additional Reference Checks + +> **Stage: 3** | Axis: team use (if applicable) + +**RF13** Are there no **dynamic knowledge items** in `references/` that become outdated quickly: pricing, benchmarks, regulatory values, API versions? + +*Context:* If data changes frequently, it should not be stored as a static reference. Better to use retrieval / RAG / MCP, or at least explicitly mark a freshness risk. Otherwise the skill starts giving plausible-sounding but outdated answers. + +**RF14** Are important **gotchas** that apply almost always in `SKILL.md`, not buried deep in `references/`? + +*Context:* Not all knowledge is equally suited to selective loading. If the agent will almost certainly encounter a problem, it is better to surface it early in the main body. Otherwise it may not recognize the trigger for loading the needed reference file in time. + +**RF15** Are deterministic operations, validations, and transformations extracted into `scripts/` rather than described only in natural language? + +*Context:* Code is deterministic; natural language interpretation is not. If a step can be reliably validated or executed by a script, it is better to do it in a script and leave only orchestration and decision logic in `SKILL.md`. + +--- + +## RF02. Routing in the Ecosystem + +> **Stage: 4** | Axis: repository and ecosystem + +**RF02** Are specific **neighboring skills** to redirect to named? + +*Context:* Negative triggers without an alternative (`Don't use for X`) are less effective than those with one (`Don't use for X — use Y-skill instead`). A concrete alternative helps the router make the right decision. 
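RF10 above lends itself to a mechanical pre-check before the sub-agent reads the file in detail. The sketch below is a heuristic under stated assumptions: a TOC is recognized as any in-file anchor link (`[text](#anchor)`) within the first 30 lines, files of 100 lines or fewer return `N/A`, and a missing TOC defaults to `WARNING` per the assessment rule. The name `check_rf10` and the 30-line window are illustrative choices, not part of the checklist.

```python
import re

# Markdown link to an anchor in the same file, e.g. [Errors](#errors).
TOC_LINK = re.compile(r"\[[^\]]+\]\(#[^)]+\)")


def check_rf10(text, head_lines=30):
    """Heuristic RF10 verdict for one reference file's content."""
    lines = text.splitlines()
    if len(lines) <= 100:
        return "N/A"  # the check only applies to files > 100 lines
    head = "\n".join(lines[:head_lines])
    return "PASS" if TOC_LINK.search(head) else "WARNING"
```

Escalating a `WARNING` to `FAIL` still needs judgment about whether the file has become an unnavigable monolith, so that decision stays with the reviewer.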
diff --git a/plugins/skill-review/skills/skill-review/references/checklist-structure.md b/plugins/skill-review/skills/skill-review/references/checklist-structure.md new file mode 100644 index 0000000..4233344 --- /dev/null +++ b/plugins/skill-review/skills/skill-review/references/checklist-structure.md @@ -0,0 +1,99 @@ +# Checklist: Structure + +> Sub-agent Structure. Checks folder structure, frontmatter, file size, basic form antipatterns, and whether the skill is justified as an artifact. +> +> Result of each check: `PASS` / `FAIL` / `WARNING` / `N/A`. + +--- + +## ST01–ST04. Structure and Validity + +> **Stage: 2** | Axis: personal use + +**ST01** Is the folder named in `kebab-case`? (only lowercase a–z, digits 0–9, and hyphens; no spaces, underscores, or CamelCase) + +*Context:* The Agent Skills Specification requires kebab-case for the folder name. Other formats (`Notion Project Setup`, `notion_setup`, `NotionSetup`) are not picked up by many clients or cause validation errors. + +**ST02** Is the file named **exactly** `SKILL.md`? (not `skill.md`, not `SKILL.MD`, not `Skill.md`) + +*Context:* The name is case-sensitive. Agents scan directories for exactly `SKILL.md` — other variants will not be discovered. + +**ST03** No `README.md` inside the skill folder? + +*Context:* This is a structural folder-hygiene check for an isolated skill: there should be no extra human-facing markdown inside the folder that creates noise during discovery or overrides the role of `SKILL.md`. See also `LC05`: that check covers the repository-level boundary — that a README as a human artifact lives at the repository level, not inside the skill folder. + +**ST04** If `references/`, `scripts/`, `assets/` exist — are they used correctly? (references = documentation, scripts = executable code, assets = templates and resources) + +*Context:* Correct distribution across directories is the foundation of progressive disclosure. 
The agent loads files from these directories only when needed, saving up to 85–95% of tokens compared to flat loading. + +--- + +## ST05–ST08. YAML Frontmatter + +> **Stage: 2** | Axis: personal use + +**ST05** Is there a `name` field in kebab-case that matches the folder name? + +*Context:* The specification requires: name = 1–64 characters, lowercase alphanumeric + hyphens, matches the parent directory. A mismatch causes validation errors. + +**ST06** Is there a `description` field that follows the formula: **WHAT it does** + **WHEN to use** (trigger phrases)? + +*Context:* The description is the most critical part of a skill. The router uses it to decide whether to activate the skill. A vague description (`Helps with projects`) means the skill never fires or fires incorrectly. + +**ST07** Does the description contain **concrete trigger phrases**, not abstractions? + +*Context:* Good: `Use when user says "plan sprint", "create tasks", "set up project"`. Bad: `Creates sophisticated multi-page documentation systems`. Concrete phrases allow the router to accurately match user requests to the description. + +**ST08** No XML tags (`<`, `>`) anywhere in the YAML? + +*Context:* XML tags in YAML frontmatter create a prompt injection risk. The specification explicitly prohibits their use. + +--- + +## ST09. `SKILL.md` Size + +> **Stage: 2** | Axis: personal use + +**ST09** Is `SKILL.md` **< 5000 tokens / 500 lines**? + +*Context:* The specification recommends keeping `SKILL.md` under 500 lines and 5000 tokens. If exceeded — immediate decomposition: details to `references/`, scripts to `scripts/`. A bloated `SKILL.md` causes context overload and quality degradation. + +--- + +## ST10–ST13. Basic Antipatterns + +> **Stage: 2** | Axis: personal use + +**ST10 Skill-prompt.** Is the skill simply a long prompt without a clear (1) role, (2) constraints, and (3) output format? + +*Context:* Skill = role + constraints + output format. 
Without these three components — it is just a prompt with a name. Early skill users copied long prompts into `SKILL.md` and got `maybe slightly better than before`. + +**ST11 Self-generated / unvalidated skill.** Does the artifact appear to have been generated by a single command like "write me a skill" with no traces of human verification, curation, or domain adaptation? + +*Context:* Self-generated skills provide ~0 pp improvement per SkillsBench data. The problem is not the writing style — it is the absence of verified human expertise: the model paraphrases what it already "knows". + +**ST12 Vibe-coded knowledge core.** Is the methodology written "from model memory" (generic phrases, abstract best practices) rather than describing a concrete, verifiable procedure? + +*Context:* Knowledge in a skill should be a procedure for obtaining and verifying data, not an "encyclopedia from model memory". Even an expertly written skill degrades if the knowledge core remains vague and unverifiable. + +**ST13 Monolith.** Is everything dumped into a single `SKILL.md`: workflow, references, examples, API schemas, long tables? + +*Context:* Everything that is not the core workflow should live in `references/` or `scripts/`. Otherwise — context overload: the agent drowns in details and loses focus on the main process. + +--- + +## ST14–ST16. Skill Justification as an Artifact + +> **Stage: 2** | Axis: personal use + +**ST14** Is this one coherent workflow, not a set of unrelated commands? + +*Context:* One skill = one coherent action. If `SKILL.md` describes 3 independent operations — those are 3 skills, not 1. Atomicity simplifies triggering, testing, and composability. + +**ST15** Is the instruction **> 300 tokens**? (If less — it is a line in a system prompt, not a skill.) + +*Context:* Very short instructions (< 300 tokens) that are needed in every conversation are better placed in `CLAUDE.md` or a system prompt — the overhead of creating a skill will not pay off. 
+ +**ST16** Does the artifact appear to be justified as a skill? (3+ steps, a domain procedure, a repeatable workflow, or non-trivial decision points.) + +*Context:* Not every artifact should become a skill. If there is no multi-step process, domain methodology, or repeatable routine inside, this may be overengineering. An extra skill increases routing overhead and context noise with no real benefit. diff --git a/plugins/skill-review/skills/skill-review/references/checklist-workflow.md b/plugins/skill-review/skills/skill-review/references/checklist-workflow.md new file mode 100644 index 0000000..76b35e6 --- /dev/null +++ b/plugins/skill-review/skills/skill-review/references/checklist-workflow.md @@ -0,0 +1,223 @@ +# Checklist: Workflow + +> Sub-agent Workflow. Checks workflow quality: steps, checkpoints, safeguards, preconditions/postconditions, MCP, sub-agents. +> +> Result of each check: `PASS` / `FAIL` / `WARNING` / `N/A`. + +--- + +## WF01–WF04. `SKILL.md` Body — Basic Quality + +> **Stage: 2** | Axis: personal use + +**WF01** Are instructions written in **imperative** (`Run`, `Validate`, `Stop`, `Check`), not descriptively? + +*Context:* Agents follow concrete commands better than abstract descriptions. `Validate the data before proceeding` — bad. `Run python scripts/validate.py --input {filename}` — good. Imperative reduces NL-code confusion. + +**WF02** Is the workflow formatted as **numbered steps**, not narrative paragraphs? + +*Context:* Numbered steps reduce NL-code confusion — the situation where the agent confuses descriptive text and executable instructions. Rule: body is a procedure, not a wiki. + +**WF03** Are there at least 1–2 examples (input/output)? + +*Context:* Without examples, output is unstable. Examples give the agent a pattern for matching and set expectations for format, volume, and style. + +**WF04** Is there a Troubleshooting section or error handling? 
+ +*Context:* Without error handling, the agent either silently continues on failure (silent chain failure) or stops without explanation. Minimum: top 3 typical errors with cause and resolution. + +--- + +## WF05–WF10. Six Mandatory Elements + +> **Stage: 2** | Axis: personal use + +**WF05 Trigger** — when to apply +**WF06 Inputs** — what is needed as input +**WF07 Steps** — how to execute +**WF08 Checks** — how to validate +**WF09 Stop conditions** — when to stop +**WF10 Recovery** — what to do on failure + +*Context:* If even one of these elements is implicit, the skill behaves like a long prompt, not a workflow. This is one of the main criteria for the transition from stage 1 to stage 2. + +--- + +## WF11–WF14. Execution Safeguards + +> **Stage: 2** | Axis: personal use + +**WF11** Does each workflow step have a **checkpoint** or explicit transition condition to the next step? + +*Context:* Without checkpoints, the agent continues the workflow on a silent step failure — this is "silent chain failures". A good skill does not just list steps — it sets conditions: what must be true before proceeding. + +**WF12** If the workflow is longer than **4 steps** — do the instructions explicitly declare a **planning tool**, task list, or external planning artifact? + +*Context:* In long sessions, the agent easily loses its plan and starts jumping between tasks. Anthropic recommends structured note-taking / agentic memory: an explicit task list that maintains state between tool calls. + +**WF13** If planning is used, is there an **enforcement gate**: completion is not allowed while there are `pending` or `in_progress` tasks? + +*Context:* The most effective way to make planning mandatory is to prohibit completion with unclosed tasks. Otherwise the TODO list remains decorative and does not prevent context loss. 
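As an illustration only, the enforcement gate from WF13 can be sketched as a completion check over a task list. The `Task` record, the `completed` status, and the helper names `open_tasks` and `can_complete` are assumptions made for this example; they are not the API of any particular planning tool.

```python
from dataclasses import dataclass

# Statuses that block completion; `pending` and `in_progress` come from WF13.
OPEN_STATUSES = {"pending", "in_progress"}


@dataclass
class Task:
    """Hypothetical task record; a real planning tool may use other fields."""
    title: str
    status: str  # assumed values: "pending", "in_progress", "completed"


def open_tasks(tasks: list[Task]) -> list[Task]:
    """Return the tasks that are still pending or in progress."""
    return [task for task in tasks if task.status in OPEN_STATUSES]


def can_complete(tasks: list[Task]) -> bool:
    """Enforcement gate: completion is allowed only when no task is open."""
    return not open_tasks(tasks)


# Example plan with two unclosed tasks, so the gate must stay closed.
plan = [
    Task("Run structure checks", "completed"),
    Task("Run workflow checks", "in_progress"),
    Task("Write the report", "pending"),
]
blocked_by = [task.title for task in open_tasks(plan)]
```

A skill would express the same rule in prose ("do not finish while any task is `pending` or `in_progress`"); the sketch only shows that the gate is a simple, checkable condition and not a stylistic preference.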
+ +**WF14** Are critical rules, prohibitions, and stop conditions at the beginning of the skill or under explicit `CRITICAL` headers, not buried in the middle of a long text? + +*Context:* "Lost in the middle" is a well-known long-context problem: the model pays less attention to instructions in the middle of a file. The most important rules must be visible early and explicitly. + +--- + +## WF15–WF18. Preconditions / Postconditions / Boundaries + +> **Stage: 3** | Axis: team use + +**WF15** Are **preconditions** explicitly described? (OS, permissions, access, files, limits of the environment, required packages) + +*Context:* Without explicit preconditions, the skill is tied to the author's machine and environment. When a colleague runs the skill on a different OS, with different permissions, or a different Python version — the skill silently breaks. + +**Additional checks:** +- Is it stated where the input files, tokens, directories, access, and environment variables come from? +- Are there no hidden requirements for shell, package manager, alias, or runtime version? +- Are there no dependencies on "magic" files, local state, or previous chat history? +- If the skill is claimed to be portable — is this confirmed by instructions or artifacts, not by silent assumption? + +*Verdict hint:* if without this information a new person cannot start or continue a step — at minimum `WARNING`. If the step requires hidden author knowledge to continue — `FAIL`. + +**WF16** Are **postconditions** explicitly described? (What artifacts are created, what checks are passed, what "done" looks like) + +*Context:* Without postconditions, there is no completion criterion. The agent does not know whether it has reached its goal, and may stop too early or continue indefinitely. + +**WF17** Are **boundary conditions** explicitly described? 
(Where the skill should not be applied, where to hand off to a human or another skill) + +*Context:* Without boundary conditions, the skill tries to do what it is not designed for (for example, processing files > 1 GB when a bulk-processing workflow is needed). + +**WF18** Are there no **hardcoded paths**, user-specific directories, local drive names, or other author-specific paths in `SKILL.md`, `scripts/`, or references? + +*Context:* Hardcoded paths are a classic portability antipattern. A skill may work perfectly on the author's machine and silently break for colleagues due to `C:\Users\...`, `/home/alice/...`, or hardwired absolute paths. Prefer input parameters, relative paths, and explicitly described preconditions. + +--- + +## WF19–WF22. MCP Steps and Capability-Bound References + +> **Stage: 2** | Axis: personal use (if the skill uses MCP) + +**WF19** For MCP steps, are **capability, inputs, and expected outputs** described, not bare function names? + +*Context:* By default, a skill should encode the workflow contract, not mirror the tool contract. A capability-bound formulation (`call the semantic search function`) is more stable than a signature-bound one (`call search_documents(query, top_k=5)`), because it does not break on MCP rename/refactor. + +**WF20** Are **negative selectors** specified (which similar but incorrect tools not to choose)? + +*Context:* When there are several similar MCP functions, the agent may choose the wrong one. An explicit negative selector such as `Do not choose the exact ID search function or URL lookup` helps the router. + +**WF21** Are exact MCP names/parameters specified **only** for critical steps (safety, compliance, high cost of error)? + +*Context:* Capability-bound by default; signature-bound is a conscious exception. If there are several very similar functions with different error costs, or exact enums/flags matter, switch to signature-bound for those specific steps.
If exact MCP function names are needed for critical steps, prefer a single reference file with the signatures (`references/mcp-contracts.md`, `api-reference.md`). Inline signatures in `SKILL.md` are only acceptable as a localized exception. + +**WF22** Does the skill duplicate full JSON schemas or lists of parameters that already live in the MCP contract? + +*Context:* Mirroring MCP schema inside `SKILL.md` is an antipattern. The skill becomes brittle to any MCP server refactor. Better to capture the required capability in the skill, not copy the entire tool contract. If the skill does pin specific signatures per rule WF21, they must not be scattered across multiple files. Only one location is acceptable: either one block in `SKILL.md` or one dedicated reference file. Signatures spread across multiple files are an amplified FAIL signal. + +--- + +## WF23–WF25. Sub-agent Delegation + +> **Stage: 2** | Axis: personal use (if the skill uses sub-agents) + +**WF23** Are the main agent's skills passed to the sub-agent **explicitly**, rather than assuming automatic inheritance? + +*Context:* Skills are not inherited by sub-agents automatically. Required skills must be passed explicitly in the invocation. + +**WF24** Does the sub-agent brief contain: **scope, files, expected output, constraints**? + +*Context:* Vague invocations (`Implement the feature`) are an antipattern. Without a clear brief, the sub-agent cannot see the overall context and may break code or data dependencies. + +**WF25** Does the sub-agent return a **condensed summary**, not a full transcript? + +*Context:* The sub-agent architecture is useful precisely because the sub-agent can spend tens of thousands of tokens on exploration, but return only a 1000–2000 token summary. Copying the full transcript into the main thread defeats the entire purpose of delegation. + +--- + +## WF26. 
Handoff Checkpoints + +> **Stage: 2** | Axis: personal use (if there is a handoff between tools / co-skills) + +**WF26** Do checkpoints cover **handoffs between tools / co-skills**? After each cross-system step, is it clear what artifact was received, in what format, and what constitutes a valid handoff? + +*Context:* Checkpoints matter not only within a single skill (covered by WF11), but also at handoffs between tools and co-skills. Task state is most often lost at transitions between steps and tools. If MCP is described brittlely or the handoff is lost — the skill breaks even for the author. + +--- + +## WF27. External Planning Artifact + +> **Stage: 2** | Axis: personal use (if workflow is multi-step with multiple tools / co-skills) + +**WF27** If the workflow passes through several tools / co-skills, is there an external planning artifact or TODO list that survives the handoff between steps? + +*Context:* In a multi-skill environment, task state is most often lost precisely at transitions between steps and tools. An external planning artifact maintains dependent steps, execution status, and blockers that would otherwise dissolve into the thread history. + +--- + +## WF28–WF29. Context and Architecture + +> **Stage: 2** | Axis: personal use (if the workflow is long) + +**WF28** Are there explicit instructions for **context management** for long tasks? + +**Applicability:** workflow >= 5 steps, OR processing potentially large volumes of data, OR using sub-agents, OR multi-session tasks. If the workflow is short and simple — **N/A**. + +*Context:* Long tasks suffer from "lost-in-the-middle" — the agent loses focus on early decisions. Manus (Jul 2025) addresses this with the filesystem as unlimited persistent memory: `todo.md` is rewritten after each step, "reciting" goals to the end of context. Anthropic (Sep 2025) describes three techniques: compaction, structured note-taking, sub-agent architectures. 
StackOne (Jan 2026) documents 6 context overflow failure patterns. + +**Signals to check:** + +| Signal | Verdict if absent | +|---|---| +| Saving intermediate results to files (plan.md, progress.md, findings.md) | WARNING (if workflow >= 5 steps) | +| Context-refresh: re-reading the plan before each new phase | WARNING | +| Strategy for context overflow (compaction, sub-agent, respawn) | INFO (if < 10 steps), WARNING (if >= 10) | +| Checkpoints with state capture | WARNING (for critical phases) | +| TODO list as attention management (updated during the work) | INFO | + +**Scale:** +- **PASS:** there is an explicit context management strategy (files, checkpoints, refresh) +- **WARNING:** workflow is long (>= 5 steps) but none of the above signals are present +- **N/A:** workflow is short and simple (< 5 steps, no large data) + +**Recommendation on WARNING:** +> Add context management instructions to the workflow: +> 1. Save intermediate results to a file after each major phase +> 2. Context-refresh: before each new phase, re-read plan.md and the latest progress.md entries +> 3. For tasks > 10 steps: consider respawning a sub-agent with a clean context +> +> Sources: [Manus blog](https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus), [Anthropic guide](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) + +--- + +**WF29** Are there workflow phases that should be **extracted as sub-agents** but are executed inline? + +**Applicability:** workflow with >= 3 logically distinct phases. If the workflow has fewer than 3 phases, mark **N/A**. + +*Context:* Anthropic (Sep 2025): "Specialized sub-agents can handle focused tasks with clean context windows. Each subagent might explore extensively, using tens of thousands of tokens, but returns only a condensed summary." Bob Renze (Mar 2026): three criteria: isolation benefit, model specialization, restart tolerance.
nibzard/awesome-agentic-patterns: limit to 2–4 subagents; more adds coordination overhead. + +**Three criteria (all simultaneously):** + +| # | Criterion | Explanation | +|---|---|---| +| 1 | **Independence** | The phase does not depend on intermediate results from other phases | +| 2 | **Result-oriented** | Only the final result matters, not intermediate reasoning | +| 3 | **Verifiability** | The result is checkable by simple criteria (format, completeness, required fields) | + +**Algorithm:** for each workflow phase, check all 3 criteria. If all are met AND the phase is NOT structured as a sub-agent → signal. + +**Scale:** +- **PASS:** all isolatable phases are already sub-agents, OR there are no such phases +- **WARNING:** 1–2 isolatable phases running inline +- **FAIL:** 3+ isolatable phases + signs of context overload (> 10 steps, large files) +- **N/A:** workflow < 3 phases + +**Recommendation on WARNING/FAIL:** +> Consider extracting phases as sub-agents: +> - Context isolation: sub-agent explores 50K+ tokens, returns 1–2K summary +> - Parallel execution of independent phases +> - Resilience: a sub-agent crash does not kill the orchestrator +> +> Constraints: 2–4 sub-agents is optimal. Each spawn ~2–3K tokens overhead. For phases < 500 tokens, inline is cheaper. 
+> +> Sources: [Anthropic](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents), [Bob Renze](https://dev.to/bobrenze/ai-agent-subagent-orchestration-when-to-spawn-vs-when-to-do-it-yourself-4opg) diff --git a/plugins/skill-review/skills/skill-review/references/instruction-singlepass.md b/plugins/skill-review/skills/skill-review/references/instruction-singlepass.md new file mode 100644 index 0000000..a0538fe --- /dev/null +++ b/plugins/skill-review/skills/skill-review/references/instruction-singlepass.md @@ -0,0 +1,690 @@ +# Skill Review — Single-pass Mode + +> **Activation context:** This mode was activated by the orchestrator because the total volume of the skill under review is < 500 lines. All checks are executed in one context without sub-agents. The set of checks, report format, and Bingo are **identical** to sub-agent mode. + +--- + +## Preconditions + +> This mode assumes the orchestrator has already run the preflight of the parent skill. + +- `TodoWrite` and file access are expected to be available. +- If the parent orchestrator enabled logging, file write is additionally required. +- If any of these capabilities is unexpectedly unavailable, do not start the review and immediately notify the user that single-pass cannot start in the current client. + +--- + +## Mandatory Rules + +- **No questions to the user.** Review parameters (scope, declared target, logging) were already determined by the orchestrator in Step 0. +- Create a **TODO plan** for the review: one TODO item per 3–4 checks. +- Evaluate **only by observable artifacts** — do not infer what is not present in the files. +- Do not count as PASS any runs, stability, or lifecycle maturity without file confirmation. +- If a section is not applicable (no `references/`, `scripts/`, MCP, sub-agents) — mark **N/A**, not FAIL. +- Do not inflate severity: team/workspace-level patterns do not become FAIL for an isolated skill without evidence. 
+- Each issue — with a concrete recommendation: **what to fix and where**. +- **Justify before verdict:** the file and section you rely on, 1–2 sentences. +- **Evidence:** primary anchor — `file.md § Section Name`. Line numbers — only an auxiliary hint (`line ~N`). +- **Scope:** check only IDs belonging to the current scope. Do not evaluate or mark as N/A any IDs above the scope. +- Do not soften severity within the checked scope. +- If an instruction relies on hidden author knowledge — mark as **hidden assumption**. +- If a step depends on OS, shell, runtime version, permissions, working directory — flag as **portability risk**. +- Any abnormal situation (unreadable files, invalid artifact) must not silently abort the review. See `## Troubleshooting` for details. +- **Language:** run the entire review in the user's language. + +--- + +## Scope and Check Matrix + +| Scope | Structure | Workflow | References | Links | Lifecycle | +|:---|:---|:---|:---|:---|:---| +| **up to 2** | ST01–ST16 | WF01–WF14, WF19–WF29 | RF04–RF12 | LK01, LK05 | — | +| **up to 3** | ST01–ST16 | WF01–WF29 | RF01, RF03–RF15 | LK01–LK07 | — | +| **full** | ST01–ST16 | WF01–WF29 | RF01–RF15 | LK01–LK07 | LC01–LC05 | + +Checks outside scope are marked **"not checked for selected scope"** (not N/A and not FAIL). + +--- + +## Algorithm + +```text +1. Create the TODO plan for the review (compact, no sub-agents) +2. Sequentially run all checks by scope: + Part A → Part B → Part C → Part D → [Part E if scope is full] +3. Fill Antipattern Bingo (Part F) +4. Generate the final report (Part G) +5. If logging ON — write report.md; if logging OFF — do not write any files +``` + +--- + +## Troubleshooting + +> **General principle:** an abnormal situation must not silently abort the review. On file read problems — produce the most complete report possible from available artifacts and add a `Review Limitations` section. On file write problems — notify the user and stop the review. + +### 1. 
An invalid skill artifact was passed + +**Symptoms:** no `SKILL.md`, folder is empty, or an arbitrary set of files was passed instead of a skill. +**Cause:** the input artifact is not a skill folder. +**Resolution:** flag this as the primary critical issue, perform a residual review from available structure and readable files, and tell the user what is missing. + +### 2. Skill files cannot be read or are damaged + +**Symptoms:** read error, mojibake, garbage instead of text, truncation in a critical section, unsupported encoding — the meaning of the text cannot be reliably recovered. +**Cause:** corrupted encoding, binary file, access problems, or damaged artifact. +**Resolution:** +1. Retry reading **once**. +2. If the retry fails — consider the file unreadable. +3. If `SKILL.md` cannot be read — this is the primary critical issue; perform a residual review from the folder structure and readable files. +4. If a reference file cannot be read — do not infer its content; continue the review with other artifacts. +5. The unreadable file must appear in `Review Limitations`. + +### 3. File write is unavailable + +**Symptoms:** `report.md` cannot be written with logging ON. +**Cause:** no write permissions, read-only filesystem, environment restrictions. +**Resolution:** +1. Retry the write **once**. +2. If write is still unavailable — **notify the user and stop the review**. +3. Tell the user: which files could not be written, suggest checking permissions or switching to a different working directory. + +--- + +## Part A — Structure and Form (ST01–ST16) + +> Result of each check: `PASS` / `FAIL` / `WARNING` / `N/A`. +> All checks — **stage 2**. + +### A1. Structure and Validity (ST01–ST04) + +**ST01.** Is the folder in `kebab-case`? (lowercase a–z, 0–9, hyphens; no spaces, underscores, CamelCase) + +*Context:* The Agent Skills Specification requires kebab-case for the folder name. 
Other formats (`Notion Project Setup`, `notion_setup`, `NotionSetup`) are not picked up by many clients or cause validation errors. + +**ST02.** Is the file named **exactly** `SKILL.md`? (not `skill.md`, not `SKILL.MD`) + +*Context:* The name is case-sensitive. Agents scan directories for exactly `SKILL.md` — other variants will not be discovered. + +**ST03.** No `README.md` inside the skill folder? + +*Context:* This is a structural folder-hygiene check for an isolated skill: there should be no extra human-facing markdown inside the folder that creates noise during discovery or overrides the role of `SKILL.md`. See also `LC05`: that check covers the repository-level boundary — that a README as a human artifact lives at the repository level, not inside the skill folder. + +**ST04.** If `references/`, `scripts/`, `assets/` exist — used correctly? (references = documentation, scripts = code, assets = templates) + +*Context:* Correct distribution across directories is the foundation of progressive disclosure. The agent loads files from these directories only when needed, saving up to 85–95% of tokens compared to flat loading. + +### A2. YAML Frontmatter (ST05–ST08) + +**ST05.** Is there a `name` field in kebab-case that matches the folder name? + +*Context:* The specification requires: name = 1–64 characters, lowercase alphanumeric + hyphens, matches the parent directory. A mismatch causes validation errors. + +**ST06.** Is there a `description` field with the formula: **WHAT it does** + **WHEN to use** (trigger phrases)? + +*Context:* The description is the most critical part of a skill. The router uses it to decide whether to activate the skill. A vague description (`Helps with projects`) means the skill never fires or fires incorrectly. + +**ST07.** Does the description contain **concrete trigger phrases**, not abstractions? + +*Context:* Good: `Use when user says "plan sprint", "create tasks", "set up project"`. 
Bad: `Creates sophisticated multi-page documentation systems`. Concrete phrases allow the router to accurately match user requests to the description. + +**ST08.** No XML tags (`<`, `>`) anywhere in the YAML? + +*Context:* XML tags in YAML frontmatter create a prompt injection risk. The specification explicitly prohibits their use. + +### A3. Size (ST09) + +**ST09.** Is `SKILL.md` < 5000 tokens / 500 lines? + +*Context:* The specification recommends keeping `SKILL.md` under 500 lines and 5000 tokens. If either limit is exceeded, decompose immediately: move details to `references/` and scripts to `scripts/`. A bloated `SKILL.md` causes context overload and quality degradation. + +### A4. Basic Antipatterns (ST10–ST13) + +**ST10. Skill-prompt.** Is the skill simply a long prompt without a clear (1) role, (2) constraints, (3) output format? + +*Context:* Skill = role + constraints + output format. Without these three components, it is just a prompt with a name. Early skill users copied long prompts into `SKILL.md` and got `maybe slightly better than before`. + +**ST11. Self-generated skill.** Does the artifact appear to have been generated by a single command without traces of human verification or domain adaptation? + +*Context:* Self-generated skills provide ~0 pp improvement per SkillsBench data. The problem is not the writing style but the absence of verified human expertise: the model paraphrases what it already "knows". + +**ST12. Vibe-coded knowledge core.** Is the methodology written "from model memory" (generic phrases, abstract best practices) rather than a concrete, verifiable procedure? + +*Context:* Knowledge in a skill should be a procedure for obtaining and verifying data, not an "encyclopedia from model memory". Even an expertly written skill degrades if the knowledge core remains vague and unverifiable. + +**ST13. Monolith.** Is everything dumped into one `SKILL.md`: workflow, references, examples, API schemas?
+ +*Context:* Everything that is not the core workflow should live in `references/` or `scripts/`. Otherwise — context overload: the agent drowns in details and loses focus on the main process. + +### A5. Justification (ST14–ST16) + +**ST14.** One coherent workflow, not a set of unrelated commands? + +*Context:* One skill = one coherent action. If `SKILL.md` describes 3 independent operations — those are 3 skills, not 1. Atomicity simplifies triggering, testing, and composability. + +**ST15.** Is the instruction > 300 tokens? + +*Context:* Very short instructions (< 300 tokens) that are needed in every conversation are better placed in `CLAUDE.md` or a system prompt — the overhead of creating a skill will not pay off. + +**ST16.** Is the artifact justified as a skill? (3+ steps, a domain procedure, a repeatable workflow, non-trivial decision points) + +*Context:* Not every artifact should become a skill. If there is no multi-step process, domain methodology, or repeatable routine inside, this may be overengineering. An extra skill increases routing overhead and context noise with no real benefit. + +--- + +## Part B — Workflow (WF01–WF29) + +> Result of each check: `PASS` / `FAIL` / `WARNING` / `N/A`. + +### B1. Basic `SKILL.md` Body Quality (WF01–WF04) — stage 2 + +**WF01.** Are instructions in **imperative** (`Run`, `Validate`, `Stop`, `Check`), not descriptively? + +*Context:* Agents follow concrete commands better than abstract descriptions. `Validate the data before proceeding` — bad. `Run python scripts/validate.py --input {filename}` — good. Imperative reduces NL-code confusion. + +**WF02.** Is the workflow formatted as **numbered steps**, not narrative paragraphs? + +*Context:* Numbered steps reduce NL-code confusion — the situation where the agent confuses descriptive text and executable instructions. Rule: body is a procedure, not a wiki. + +**WF03.** Are there at least 1–2 examples (input/output)? + +*Context:* Without examples, output is unstable. 
Examples give the agent a pattern for matching and set expectations for format, volume, and style. + +**WF04.** Is there a Troubleshooting section or error handling? + +*Context:* Without error handling, the agent either silently continues on failure (silent chain failure) or stops without explanation. Minimum: top 3 typical errors with cause and resolution. + +### B2. Six Mandatory Elements (WF05–WF10) — stage 2 + +**WF05** Trigger — when to apply +**WF06** Inputs — what is needed as input +**WF07** Steps — how to execute +**WF08** Checks — how to validate +**WF09** Stop conditions — when to stop +**WF10** Recovery — what to do on failure + +*Context:* If even one of these elements is implicit, the skill behaves like a long prompt, not a workflow. This is one of the main criteria for the transition from stage 1 to stage 2. + +### B3. Execution Safeguards (WF11–WF14) — stage 2 + +**WF11.** Does each workflow step have a **checkpoint** or explicit transition condition? + +*Context:* Without checkpoints, the agent continues the workflow on a silent step failure — "silent chain failures". A good skill does not just list steps — it sets conditions: what must be true before proceeding. + +**WF12.** If the workflow has > 4 steps — is there a **planning tool**, task list, or external planning artifact? + +*Context:* In long sessions, the agent easily loses its plan. Anthropic recommends structured note-taking / agentic memory: an explicit task list that maintains state between tool calls. + +**WF13.** If planning is used — is there an **enforcement gate**: completion is not allowed while there are `pending`/`in_progress` tasks? + +*Context:* The most effective way to make planning mandatory is to prohibit completion with unclosed tasks. Otherwise the TODO list remains decorative and does not prevent context loss. + +**WF14.** Are critical rules and prohibitions at the beginning of the skill or under `CRITICAL` headers, not buried in the middle? 
+ +*Context:* "Lost in the middle" is a well-known long-context problem: the model pays less attention to instructions in the middle of a file. The most important rules must be visible early and explicitly. + +### B4. Preconditions / Postconditions / Boundaries (WF15–WF18) — stage 3 + +> Checked only when scope >= up to 3. + +**WF15.** Are **preconditions** explicitly described? (OS, permissions, access, files, packages, environment limits) + +*Context:* Without explicit preconditions, the skill is tied to the author's machine. When a colleague runs the skill on a different OS, with different permissions, or a different Python version — the skill silently breaks. + +**Additional checks:** +- Is it stated where input files, tokens, directories, access, and environment variables come from? +- Are there no hidden requirements for shell, package manager, alias, or runtime version? +- Are there no dependencies on "magic" files, local state, or chat history? +- If the skill is claimed to be portable — is this confirmed by instructions, not silent assumption? + +*Hint:* if without this a new person cannot continue a step — at minimum WARNING. If hidden author knowledge is required — FAIL. + +**WF16.** Are **postconditions** explicitly described? (Artifacts created, checks passed, what "done" looks like) + +*Context:* Without postconditions, there is no completion criterion. The agent does not know whether it has reached its goal. + +**WF17.** Are **boundary conditions** explicitly described? (Where not to apply, where to hand off to a human or another skill) + +*Context:* Without boundary conditions, the skill tries to do what it is not designed for. + +**WF18.** Are there no **hardcoded paths**, user-specific directories, local drive names? + +*Context:* Hardcoded paths are a classic portability antipattern. A skill may work on the author's machine and silently break for colleagues. Better to use input parameters, relative paths, and explicitly described preconditions. 
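A reviewer could approximate the WF18 check with a small scan like the one below. This is a heuristic sketch, not part of the checklist: the three regular expressions, the `find_hardcoded_paths` helper, and the sample text are all assumptions for illustration, and they only cover common home-directory layouts on Windows, Linux, and macOS.

```python
import re

# Assumed heuristics for author-specific absolute paths; not exhaustive.
HARDCODED_PATH_PATTERNS = [
    re.compile(r"[A-Za-z]:\\Users\\[^\s`]+"),  # Windows profile, e.g. C:\Users\name\...
    re.compile(r"/home/[^/\s`]+/[^\s`]*"),     # Linux home directory
    re.compile(r"/Users/[^/\s`]+/[^\s`]*"),    # macOS home directory
]


def find_hardcoded_paths(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_path) pairs for suspicious absolute paths."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for pattern in HARDCODED_PATH_PATTERNS:
            for match in pattern.finditer(line):
                hits.append((line_no, match.group(0)))
    return hits


# Hypothetical skill excerpt: steps 1 and 3 use author-specific paths,
# step 2 uses a relative path and should not be flagged.
sample = (
    "1. Read `C:\\Users\\alice\\data\\input.csv`\n"
    "2. Run python scripts/validate.py\n"
    "3. Copy the result to /home/alice/out/report.md\n"
)
hits = find_hardcoded_paths(sample)
```

A match is only a signal for the reviewer to inspect: a path quoted as a bad example inside a checklist would also be flagged, so the verdict still needs a human or agent reading of the surrounding text.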
+ +### B5. MCP Steps (WF19–WF22) — stage 2 (if the skill uses MCP) + +**WF19.** For MCP steps, are **capability, inputs, and expected outputs** described, not bare function names? + +*Context:* By default, a skill should encode the workflow contract, not mirror the tool contract. A capability-bound formulation (`call the semantic search function`) is more stable than signature-bound (`call search_documents(query, top_k=5)`), because it does not break on MCP rename/refactor. + +**WF20.** Are **negative selectors** specified — which similar but incorrect tools not to choose? + +*Context:* When there are similar MCP functions, the agent may choose the wrong one. An explicit negative selector helps the router. + +**WF21.** Are exact MCP names/parameters specified **only** for critical steps (safety, compliance, high cost of error)? + +*Context:* Capability-bound by default; signature-bound is a conscious exception. If exact names are needed for critical steps, prefer a single reference file with the signatures. Inline signatures in `SKILL.md` are only acceptable as a localized exception. + +**WF22.** Does the skill duplicate full JSON schemas / lists of parameters from the MCP contract? + +*Context:* Mirroring MCP schema inside `SKILL.md` is an antipattern. The skill becomes brittle to any MCP server refactor. If signatures are pinned per WF21, they must be localized in one place. Signatures spread across multiple files — amplified FAIL signal. + +### B6. Sub-agent Delegation (WF23–WF25) — stage 2 (if the skill uses sub-agents) + +**WF23.** Are the main agent's skills passed to the sub-agent **explicitly**, not assuming automatic inheritance? + +*Context:* Skills are not inherited by sub-agents automatically. Required skills must be passed explicitly in the invocation. + +**WF24.** Does the sub-agent brief contain: **scope, files, expected output, constraints**? + +*Context:* Vague invocations (`Implement the feature`) are an antipattern. 
Without a clear brief, the sub-agent cannot see the overall context. + +**WF25.** Does the sub-agent return a **condensed summary**, not a full transcript? + +*Context:* The sub-agent architecture is useful precisely because the sub-agent can spend tens of thousands of tokens on exploration, but return only a 1000–2000 token summary. Copying the full transcript defeats the purpose of delegation. + +### B7. Handoff and Planning (WF26–WF27) — stage 2 (if applicable) + +**WF26.** Do checkpoints cover **handoffs between tools / co-skills**? After each cross-system step, is it clear what artifact was received, in what format, and what constitutes a valid handoff? + +*Context:* Checkpoints matter not only within a single skill (WF11), but also at handoffs between tools and co-skills. Task state is most often lost at these transitions. + +**WF27.** If the workflow passes through several tools / co-skills — is there an **external planning artifact / TODO** that survives the handoff? + +*Context:* An external planning artifact maintains dependent steps, execution status, and blockers that would otherwise dissolve into the thread history. + +### B8. Context and Architecture (WF28–WF29) — stage 2 (if the workflow is long) + +**WF28.** Are there explicit instructions for **context management** for long tasks? + +**Applicability:** workflow >= 5 steps, OR large data, OR sub-agents, OR multi-session. Otherwise N/A. + +*Context:* Long tasks suffer from "lost-in-the-middle" — the agent loses focus on early decisions. Manus (Jul 2025) addresses this with the filesystem as unlimited persistent memory: `todo.md` is rewritten after each step. Anthropic (Sep 2025) describes three techniques: compaction, structured note-taking, sub-agent architectures. StackOne (Jan 2026) documents 6 context overflow failure patterns. 
+ +| Signal | Verdict if absent | +|---|---| +| Saving intermediate results to files | WARNING (>= 5 steps) | +| Context-refresh: re-reading plan before each phase | WARNING | +| Strategy for context overflow | INFO (< 10 steps), WARNING (>= 10) | +| Checkpoints with state capture | WARNING (for critical phases) | + +- **PASS:** explicit context management strategy is present +- **WARNING:** workflow is long (>= 5 steps), none of the signals above are present +- **N/A:** workflow is short and simple (< 5 steps, no large data) + +**WF29.** Are there workflow phases that should be **extracted as sub-agents** but run inline? + +**Applicability:** workflow with >= 3 distinct phases. Otherwise N/A. + +*Context:* Anthropic (Sep 2025): `Specialized sub-agents can handle focused tasks with clean context windows. Each subagent might explore extensively, using tens of thousands of tokens, but returns only a condensed summary.` Bob Renze (Mar 2026) adds three criteria: isolation benefit, model specialization, restart tolerance. Practical conclusion: 2–4 sub-agents is usually optimal; beyond that, coordination overhead starts eating the benefit. + +Three criteria (all simultaneously): (1) Independence, (2) Result-oriented, (3) Verifiability. If all are met AND the phase is NOT structured as a sub-agent → signal. + +- **PASS:** all isolatable phases are already sub-agents, OR there are none +- **WARNING:** 1–2 isolatable phases running inline +- **FAIL:** 3+ isolatable phases + context overload signs (> 10 steps, large files) +- **N/A:** workflow < 3 phases + +--- + +## Part C — References and Progressive Disclosure (RF01–RF15) + +> Result of each check: `PASS` / `FAIL` / `WARNING` / `N/A`. + +### C1. Progressive Disclosure and `references/` (RF04–RF12) — stage 2 (if references/ exists) + +**RF04.** Does the workflow (how to execute) live in `SKILL.md`, and the methodology/references (what to know) in `references/`? 
+ +*Context:* Mixing workflow and knowledge in one file increases NL-code confusion. `SKILL.md` owns "how to execute". `references/` owns "what to know to execute". + +**RF05.** Is there no duplication between `SKILL.md` and `references/`? + +*Context:* Single Source of Truth. When they fall out of sync, the agent receives contradictory instructions. + +**RF06.** Is each reference file focused on **one topic / scenario / class of problems**? + +*Context:* A monolithic reference of thousands of lines is a "monolithic reference dump" antipattern. The agent is forced to read everything. `Keep individual reference files focused. Agents load these on demand, so smaller files mean less use of context.` + +**RF07.** Does `SKILL.md` state **when** to read each reference file (explicit trigger)? + +*Context:* `Read references/api-errors.md if API returns non-200` — correct. `See references/ for more details` — antipattern. + +**RF08.** Is the primary navigation to reference files **directly from `SKILL.md`**, not only through reference→reference chains? + +*Context:* `SKILL.md` must be the orchestration layer. Limited cross-refs are acceptable as secondary navigation, but not as the sole entry point. + +**RF09.** If a reference links to another reference — is the link explicitly scoped? (explicit trigger, no cycles, no deeper than 1 hop from SKILL.md) + +*Context:* Cross-references are only acceptable as a local clarification. + +**RF10.** Do files > 100 lines start with a **table of contents**? + +*Context:* A TOC lets the agent read only the needed section. By default, treat a missing TOC in a reference > 100 lines as **WARNING**. Use **FAIL** only if the file is effectively an unnavigable monolith. + +**RF11.** Is content distributed approximately as L1 (~10%) : L2 (~30%) : L3 (~60%)? + +*Context:* L1 (frontmatter) always loads. L2 (body) loads on activation. L3 (references, scripts, assets) loads on demand. If 90% of content is in the body — progressive disclosure is not working. 
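The RF11 ratio can be estimated with a few lines of code. The sketch below is illustrative only: measuring content in characters is an assumption (the checklist does not prescribe a unit), and the helper names `split_frontmatter` and `disclosure_shares` are hypothetical.

```python
def split_frontmatter(skill_md: str) -> tuple[str, str]:
    """Split SKILL.md text into (frontmatter, body); frontmatter is empty if absent."""
    if skill_md.startswith("---\n"):
        end = skill_md.find("\n---", 3)  # closing delimiter of the YAML block
        if end != -1:
            return skill_md[4:end], skill_md[end + 4:]
    return "", skill_md


def disclosure_shares(l1_chars: int, l2_chars: int, l3_chars: int) -> dict[str, float]:
    """Return the share of content at each disclosure level (values sum to 1.0)."""
    total = l1_chars + l2_chars + l3_chars
    if total == 0:
        raise ValueError("skill has no content to measure")
    return {"L1": l1_chars / total, "L2": l2_chars / total, "L3": l3_chars / total}
```

Here `l3_chars` would be the combined size of everything under `references/`, `scripts/`, and assets. A result with `L2` close to 0.9 corresponds to the failure case described in the RF11 context.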
+ +**RF12.** If there are many reference files — is there a **file map** with the purpose of each? + +*Context:* Without a map, even good small files become an unnavigable collection. + +### C2. Description Quality and Routing (RF01, RF03) — stage 3 + +> Checked only when scope >= up to 3. + +**RF01.** Does the description contain "**When NOT to use**" (negative triggers)? + +*Context:* Adding negative triggers reduces false activations by 40–60%. Format: `Don't use when: user asks for code review (use code-review skill)`. + +**RF03.** Is the description not too broad, not provoking overtriggering? + +*Context:* An overly generic description intercepts neighboring requests. + +### C3. Additional Reference Checks (RF13–RF15) — stage 3 + +> Checked only when scope >= up to 3. + +**RF13.** Are there no **dynamic knowledge items** in references/ (pricing, benchmarks, API versions)? + +*Context:* Frequently changing data should not be stored as a static reference. + +**RF14.** Are important **gotchas** that apply almost always in `SKILL.md`, not buried deep in references/? + +*Context:* If the agent will almost certainly encounter a problem, surface it early in the main body. + +**RF15.** Are deterministic operations extracted into `scripts/`, not only described in natural language? + +*Context:* Code is deterministic; natural language interpretation is not. + +### C4. Routing in the Ecosystem (RF02) — stage 4 + +> Checked only when scope = full. + +**RF02.** Are specific **neighboring skills** to redirect to named? + +*Context:* Negative triggers without an alternative are less effective than those with one (`Don't use for X — use Y-skill instead`). + +--- + +## Part D — Link Integrity (LK01–LK07) + +> Result of each check: `PASS` / `FAIL` / `WARNING` / `N/A`. + +### D1. Basic Integrity (LK01, LK05) — stage 2 + +**LK01.** Do all file mentions (`references/foo.md`, `scripts/bar.py`) point to files that actually exist? + +*What we look for:* Broken file refs. 
Check every file path mentioned in `SKILL.md` and reference files. + +**LK05.** Are all files reachable from `SKILL.md` via actual links? No orphan files? + +*What we look for:* Mechanical integrity of the link graph. This is a mechanical reachability check, not a policy assessment of topology; policy lives in RF08/RF09. + +### D2. Navigation, Anchors, and Cross-refs (LK02–LK04, LK06–LK07) — stage 3 + +> Checked only when scope >= up to 3. + +**LK02.** Do all `[text](#section-name)` links point to a real heading in the same file? + +*What we look for:* Broken internal anchors. + +**LK03.** Do all `[text](file.md#section)` links point to a real heading in a real file? + +*What we look for:* Broken cross-file anchors. Check both the file's existence and the heading's existence. + +**LK04.** If there is a TOC — do all entries correspond to actual headings? + +*What we look for:* Broken TOC. + +**LK06.** If there are http/https links — are they accessible? (N/A if network access is unavailable) + +*What we look for:* Dead external links. + +**LK07.** Do cross-references between skill files use headings/anchors, not line numbers? + +*Context:* Line numbers break on any edit. Line numbers are only acceptable in one-off artifacts (review logs, reports). + +--- + +## Part E — Ownership and Lifecycle (LC01–LC05) + +> Checked **only** when scope = full. All checks — **stage 4**. + +### E1. Stage 4 Gate Markers (LC01–LC03) + +> Mandatory markers for transitioning to stage 4. Without them, the skill is not a repository skill. + +**LC01.** Is `metadata.author` (skill owner) specified? + +*Context:* Orphaned skills without a maintainer are one of the most common problems. When the author leaves, the skill becomes a black box. + +**LC02.** Is `metadata.version` in **semver format**? + +*Context:* `metadata.version` (semver) fixes the environment contract. Without semver it is impossible to determine compatibility between different versions. 
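LC02 is easy to check mechanically. The sketch below uses a simplified pattern, not the full grammar from the Semantic Versioning specification (for example, it does not reject leading zeros inside pre-release identifiers); the name `is_semver` is an assumption for illustration.

```python
import re

# Simplified semver shape: MAJOR.MINOR.PATCH with optional -prerelease and +build.
SEMVER_PATTERN = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?"
    r"(?:\+[0-9A-Za-z.-]+)?"
)


def is_semver(version: str) -> bool:
    """Return True if the whole string looks like a semantic version."""
    return SEMVER_PATTERN.fullmatch(version) is not None
```

With this check, values such as `1.2` or `v1.2.3` in `metadata.version` would be flagged, while `1.2.3` and `1.0.0-rc.1+build.5` would pass.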
+ +**LC03.** Is there a **CHANGELOG** or **version header with dates**? + +*Context:* Update dates let you understand which version of the instruction is current. Acceptable: a version header in `SKILL.md` or a `CHANGELOG.md` next to `SKILL.md` or at the plugin root — but not inside `references/` (that is L3 content for the agent, see RF07, RF08, LK05). + +### E2. Stage 4 Hygiene Checks (LC04–LC05) + +> Not hard gate markers, but must appear in the report. + +**LC04.** Are there no **tokens, keys, passwords, or secrets** in the skill files? + +*Context:* Secrets in skill files are a direct security risk, especially when publishing to a shared repository. + +**LC05.** Is the human-facing `README.md` at **repository level**, not inside the skill folder? + +*Context:* Skills are designed for consumption by an AI agent. A README inside the skill folder is an antipattern. + +--- + +## Part F — Antipattern Bingo + +> **Mandatory** to fill in after the main checks. + +### Verdicts + +- **NONE** — the antipattern was **NOT found** (= clean) +- **MINOR** — **partial signs** (= a problem, but not complete); add a 3–4 word note +- **CRITICAL** — **FOUND** in full (= requires fixing) +- **NOT_CHECKED** — stage is above scope + +> **CRITICAL** = we **found** the antipattern. This is not a severity grade for impact — it is a fact of detection. + +### Scope Rule + +- scope = up to 2 → stage 3 and 4 antipatterns = `NOT_CHECKED` +- scope = up to 3 → stage 4 antipatterns = `NOT_CHECKED` +- scope = full → Bingo filled completely + +### Evidence Rule + +Evaluate **only by observable artifacts**. For `MINOR`, the note is short: `triggers too generic`, `path hardcoded`, `reference without map`. 
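The Scope Rule above can be sketched as a small lookup. This is an illustration under assumptions: the mapping name `SCOPE_MAX_STAGE` and the function `apply_scope` are hypothetical, and the stage of each antipattern is taken from the Antipattern Reference table.

```python
# Highest stage that each review scope covers.
SCOPE_MAX_STAGE = {"up to 2": 2, "up to 3": 3, "full": 4}


def apply_scope(verdict: str, antipattern_stage: int, scope: str) -> str:
    """Keep the verdict if the antipattern's stage is in scope, else NOT_CHECKED."""
    if scope not in SCOPE_MAX_STAGE:
        raise ValueError(f"unknown scope: {scope!r}")
    if antipattern_stage > SCOPE_MAX_STAGE[scope]:
        return "NOT_CHECKED"
    return verdict
```

For example, a stage 4 antipattern reviewed with scope `up to 3` is reported as `NOT_CHECKED` regardless of what the files contain.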
+ +### Antipattern Reference + +| # | Antipattern | Stage | Check Reference | +|:--|:---|:---|:---| +| 1 | **Huge MD** | 2 | ST09, ST13 | +| 2 | **Vague triggers / Triggering lottery** | 3 | ST06, ST07 | +| 3 | **No negative triggers** | 3 | RF01 | +| 4 | **No examples** | 2 | WF03 | +| 5 | **Abstract instructions** | 2 | WF01 | +| 6 | **Buried critical rules** | 2 | WF14 | +| 7 | **No error handling** | 2 | WF04 | +| 8 | **Content duplication** | 3 | RF05 | +| 9 | **AI generated skill** | 2 | ST11 | +| 10 | **Vibe-coded knowledge core** | 2 | ST12 | +| 11 | **Skill-prompt** | 2 | ST10 | +| 12 | **Hardcoded paths** | 3 | WF18 | +| 13 | **Mirroring MCP schema** | 2 | WF22 | +| 14 | **NL-code confusion** | 2 | WF02 | +| 15 | **Schema drift risk** | 4 | WF22, LC02 | +| 16 | **Context overfitting** | 3 | WF15 | +| 17 | **Overengineering** | 2 | ST16 | +| 18 | **Lifecycle hygiene gap (rot risk)** | 4 | LC03 | +| 19 | **Silent chain failures** | 2 | WF11 | +| 20 | **Monolithic reference dump** | 3 | RF06, RF12 | + +### Notes + +- **#15 vs #13:** both use WF22, but #13 is a structural fact (schema copy), #15 is a lifecycle risk (drift). +- **#16 vs #12:** #12 is mechanical (absolute paths), #16 is broader (coupling to OS, permissions, environment). Do not duplicate the verdict. +- **#16 in novice mode:** checked partially — only by mechanical portability signals (WF15: preconditions, OS, permissions). 
+ +### Bingo Table for Report + +| # | Antipattern | Stage | Verdict | Note | +|:--|:---|:---|:---|:---| +| 1 | Huge MD | 2 | [NONE / MINOR / CRITICAL / NOT_CHECKED] | [if MINOR: 3–4 words, otherwise `—`] | +| 2 | Vague triggers / Triggering lottery | 3 | | | +| 3 | No negative triggers | 3 | | | +| 4 | No examples | 2 | | | +| 5 | Abstract instructions | 2 | | | +| 6 | Buried critical rules | 2 | | | +| 7 | No error handling | 2 | | | +| 8 | Content duplication | 3 | | | +| 9 | AI generated skill | 2 | | | +| 10 | Vibe-coded knowledge core | 2 | | | +| 11 | Skill-prompt | 2 | | | +| 12 | Hardcoded paths | 3 | | | +| 13 | Mirroring MCP schema | 2 | | | +| 14 | NL-code confusion | 2 | | | +| 15 | Schema drift risk | 4 | | | +| 16 | Context overfitting | 3 | | | +| 17 | Overengineering | 2 | | | +| 18 | Lifecycle hygiene gap (rot risk) | 4 | | | +| 19 | Silent chain failures | 2 | | | +| 20 | Monolithic reference dump | 3 | | | + +--- + +## Part G — Report Format + +### Language Rules + +- **FAIL / WARNING:** `[ID] — [human-readable description]` (ID for traceability, description is primary) +- **PASS:** descriptions without IDs, comma-separated +- **N/A:** grouped by reason, without IDs + +### Template + +```markdown +# Skill Review: [skill-name] + +## Overall Assessment + +| Parameter | Value | +|---|---| +| **Review level** | Standard (Novice) | +| **Review mode** | Single-pass | +| **Declared target stage** | [2 / 3 / 4 / not specified] | +| **Review scope** | [up to 2 / up to 3 / full] | + +--- + +## Review Limitations + +[Optional section. Show only if there were execution degradations: unreadable files, log write failure, context overflow, partially restricted scope.] + +> - [What went wrong] +> - [How it affected completeness] +> - [What was done as fallback] + +--- + +## Summary Statistics + +| FAIL | WARNING | PASS | N/A | +|---|---|---|---| +| [N] | [N] | [N] | [N] | + +**Review scope:** [up to 2 / up to 3 / full]. 
[If scope < full: "Stage [3, 4 / 4] checks were not performed."] + +--- + +## Stage 2 Issues — Personal Use + +### [ID] — [Human-readable title] +**Problem:** [What is wrong — specifically] +**Where:** [file § section] +**Why it matters:** [Consequences — understandable for a non-specialist] +**Recommendation:** [Concrete action] + +[If no issues:] +> All stage 2 checks passed. + +--- + +## Stage 3 Issues — Team Use + +[Same format. If not checked:] +> Stage 3 was not checked for the selected scope. + +--- + +## Stage 4 Issues — Repository + +[Same format.] + +--- + +## Passed Checks (PASS) + +[Compact paragraph without IDs, descriptions comma-separated.] + +--- + +## Not Applicable Checks (N/A) + +[Group related checks with a shared reason, without IDs.] + +--- + +## Antipattern Bingo + +[Filled-in table from Part F] + +**Total:** [N] CRITICAL, [N] MINOR, [N] NONE, [N] NOT_CHECKED. + +## Top 3 Recommendations + +**1. [Action]** [What, where, result.] +**2. [Action]** [What, where, result.] +**3. [Action]** [What, where, result.] + +--- + +## Summary from Exhausted Vitaly + +> [3–5 sentences. Tone: Marvin from The Hitchhiker's Guide to the Galaxy — melancholic, precise, to the point.] +> Exhausted Vitaly comments on findings in the context of the declared goal (Personal / Team / Repository / "I don't know"). +> No "stage N" label. Points to the main pain and predicts what will improve if the top issues are fixed. +> - Personal: "for solo use — [sufficient / lacking this]" +> - Team: "for the team — [main stage 3 findings are ...]" +> - Repository: "for the repository — [main stage 4 findings are ...]" +> - "I don't know": "if for yourself — [..]; if for a team — [..]; if for a repository — [..]" + +--- + +## Review Complete + +**Skill [strong/average/weak]** — [one sentence]. +**Main pain:** [one sentence]. +**Total:** FAIL: [N], WARNING: [N]. +**Next steps:** 1. [...] 2. [...] 3. [...] 
+``` + +### Formation Rules + +- Do not invent confirmations; do not inflate severity. +- If the review had limitations (unreadable files, invalid artifact) — the `Review Limitations` section is mandatory. +- Group findings by stages 2/3/4. Each section has three states: has findings / no issues / not checked. +- Evidence: `file § section`. Line numbers — only a hint. +- Top 3 — concrete actions, not abstractions. +- In "Review Complete" — a summary for quick scanning. +- If logging ON — write the full report to `report.md` in the folder confirmed by the user, and output a condensed summary to chat: counters, main pain, top 3, path to file. +- If logging OFF — do not write any files; output the full report to chat in its entirety. diff --git a/plugins/skill-review/skills/skill-review/references/maturity-diagnostic.md b/plugins/skill-review/skills/skill-review/references/maturity-diagnostic.md new file mode 100644 index 0000000..6bae756 --- /dev/null +++ b/plugins/skill-review/skills/skill-review/references/maturity-diagnostic.md @@ -0,0 +1,23 @@ +# Maturity Stage Legend + +> Reference for Exhausted Vitaly's contextual comments. Novice **does not compute** the maturity stage and does not determine the "actual stage" of a skill. This file is only for Exhausted Vitaly to contextualize findings relative to the user's declared goal. + +--- + +## Stages + +| Stage | Question | Value | Key Limitation | +|:---|:---|:---|:---| +| **1. Auto-generated** | Is this not garbage? | ~0: the model paraphrases what it already "knows" | Self-generated skills provide no improvement (SkillsBench) | +| **2. Personal skill** | Does the skill work for the author? | Captures one working path, reduces stochastic variance | Tied to the author's machine, permissions, data, and habits | +| **3. Team skill** | Can a colleague find, understand, and run it? | Portability, discoverability, stable navigation | No lifecycle artifacts, not embedded in an ecosystem | +| **4. 
Repository skill** | Does the skill live in an ecosystem and not rot? | Ownership, versioning, routing, lifecycle hygiene | Requires time investment; justified for skill collections | + +--- + +## Practical Notes + +- **Stage 2 — Personal skill.** Captures a working path and removes stochastic variance. Sufficient if the skill is only for you. +- **Stage 3 — Team skill.** A colleague can find, understand, and run it without verbal explanation. The main stage 2 danger is hidden coupling to the author's machine. +- **Stage 4 — Repository skill.** Owner, version, changelog, place in the ecosystem. The skill outlives its author. Changelogs and version headers matter here — they track modification history and make continuous support possible without depending on the original author's memory. +- **Important:** not every skill needs to be stage 4. A personal template is fine at stage 2. diff --git a/plugins/skill-review/skills/skill-review/references/report-template.md b/plugins/skill-review/skills/skill-review/references/report-template.md new file mode 100644 index 0000000..b8e1fcf --- /dev/null +++ b/plugins/skill-review/skills/skill-review/references/report-template.md @@ -0,0 +1,183 @@ +# Skill Review Novice — Report Template + +> The orchestrator uses this template to generate the final report after collecting condensed summaries from the launched sub-agents. + +--- + +## Report Language Rules + +Write findings in human-readable form. 
The language must be understandable to a product owner, analyst, or manager: +- **FAIL / WARNING:** `[ID] — [human-readable description]` (ID is needed for traceability, but the description is primary) +- **PASS:** descriptions only without IDs, comma-separated: `Folder in kebab-case, instructions in imperative, numbered steps` +- **N/A:** grouped by reason, without IDs: `No markdown anchors — 3 checks not applicable` + +--- + +## Evidence and Reference Rules + +- **Primary anchor:** `file.md § Section Name` — a stable reference to the skill artifact. +- **Temporary marker:** `line ~N` — an auxiliary hint for the current review. +- **Review artifact reference:** `log-subagent.md#ID` — traceability in logs (if logging is enabled). + +In findings, rely on `file § section`. Line numbers — only as an optional hint. + +--- + +## Template + +```markdown +# Skill Review: [skill-name] + +## Overall Assessment + +| Parameter | Value | +|---|---| +| **Review level** | Standard (Novice) | +| **Review mode** | [Single-pass / Sub-agent] | +| **Declared target stage** | [2 / 3 / 4 / not specified] | +| **Review scope** | [up to 2 / up to 3 / full] | + +--- + +## Review Limitations + +[Optional section. Show only if there were coverage limitations: unreadable files, sub-agent failure, partially restricted scope.] + +> - [What went wrong] +> - [How it affected the completeness of the review] +> - [What was done as fallback] + +--- + +## Summary Statistics + +| FAIL | WARNING | PASS | N/A | +|---|---|---|---| +| [N] | [N] | [N] | [N] | + +**Review scope:** [up to 2 / up to 3 / full]. 
[If scope < full: "Stage [3, 4 / 4] checks were not performed."] + +--- + +## Stage 2 Issues — Personal Use + +[If checked and there are findings:] + +### [ID] — [Human-readable title] + +**Problem:** [What exactly is wrong — specifically, in plain language] + +**Where:** [file § section] + +**Why it matters:** [Consequences — understandable for a non-specialist] + +**Recommendation:** [Concrete action: what to do, where] + +[If logging is enabled:] **Details:** `log-[subagent].md#[ID]` + +[If checked and no issues:] +> All stage 2 checks passed. + +--- + +## Stage 3 Issues — Team Use + +[If checked and there are findings — same format] + +[If checked and no issues:] +> All stage 3 checks passed. + +[If not checked:] +> Stage 3 was not checked for the selected scope. + +--- + +## Stage 4 Issues — Repository + +[Same as stage 3] + +--- + +## Not Applicable Checks (N/A) + +[Group related checks on one line with a shared reason. No IDs.] + +Example format: +> - No markdown anchors or TOC — 3 checks not applicable +> - No external URLs in the skill — 1 check not applicable +> - Skill does not use sub-agents — 3 checks not applicable + +--- + +## Antipattern Bingo + +[Insert the filled-in table from references/antipattern-bingo.md] + +**Total:** [N] CRITICAL, [N] MINOR, [N] NONE, [N] NOT_CHECKED. + +## Top 3 Recommendations + +[Each recommendation has a bold title, a concrete action, and an expected effect.] + +**1. [Action title]** +[What exactly to do. Where. What result.] + +**2. [Action title]** +[What exactly to do. Where. What result.] + +**3. [Action title]** +[What exactly to do. Where. What result.] + +--- + +## Summary from Exhausted Vitaly + +> [3–5 sentences. Tone: sentimental, mildly sarcastic but not rude — like the melancholy robot Marvin from The Hitchhiker's Guide to the Galaxy by Douglas Adams. Exhausted Vitaly comments on findings in the context of the declared goal (Personal / Team / Repository / "I don't know"). No label "stage N". 
Points to the main pain and predicts what will improve if the top issues are fixed.] +> +> Example (Personal scope, few issues): *"Well, it works. I've seen worse — mostly from optimists with access to a keyboard. Fix the missing examples and at least your future self won't have to guess what 'valid input' means."* +> +> Example (Team scope, several issues): *"Your colleagues will probably manage to run this. Probably. The lack of preconditions means the first person on a different OS gets to discover your hidden assumptions. I envy them the adventure."* + +--- + +## Passed Checks (PASS) + +[Compact paragraph. List **without IDs**, descriptions only, comma-separated.] + +Example format: +> Folder in kebab-case, SKILL.md correctly named, instructions in imperative, numbered steps, checkpoints on each step, negative triggers present, all file references valid, no orphan files. + +--- + +## Review Complete + +[Short summary after the report — 3–4 lines for quick scanning:] + +**Skill [strong/average/weak]** — [one sentence: main strength]. + +**Main pain:** [one sentence]. + +**Total:** FAIL: [N], WARNING: [N]. + +**Next steps by priority:** +1. [Action 1 — most important] +2. [Action 2] +3. [Action 3] +``` + +--- + +## Report Formation Rules + +- Do not invent confirmations of things not present in the files. +- Do not inflate severity: team/workspace-level patterns should not automatically become FAIL for an isolated skill without evidence. +- The external report must be professional. +- Review goal: understand whether the skill works in the **declared or checked context**. +- Each FAIL and WARNING must contain a reference to the detailed sub-agent log (`log-[subagent].md#[ID]`), **if logging is enabled and mode is sub-agent**. In single-pass mode, sub-agent log references are not created. If logging is off — the report is self-contained. +- In the PASS section — a compact paragraph without IDs, descriptions comma-separated. 
+- In the N/A section — group related checks without IDs. +- If the review had limitations (unreadable files, sub-agent failure) — the `Review Limitations` section is mandatory. +- Top 3 — concrete actions, not abstractions. +- In "Review Complete" — a summary for those who will not read the full report. +- **Grouping findings by stage:** FAIL and WARNING are distributed across stage 2/3/4 sections. Each section has three states: (1) checked, has findings; (2) checked, no issues; (3) not checked by scope. +- **Evidence:** primary anchor — `file § section`. Line numbers — only an auxiliary hint. diff --git a/plugins/skill-review/skills/skill-review/references/subagent-base-rules.md b/plugins/skill-review/skills/skill-review/references/subagent-base-rules.md new file mode 100644 index 0000000..381e9a9 --- /dev/null +++ b/plugins/skill-review/skills/skill-review/references/subagent-base-rules.md @@ -0,0 +1,20 @@ +# Sub-agent Base Rules + +> This file is mandatory for any sub-agent review. The sub-agent reads it after its own checklist and before starting checks. + +- Evaluate **only by observable artifacts** — do not infer what is not present in the files. +- Do not count as PASS any runs, stability, portability, or lifecycle maturity unless confirmed by files, scripts, examples, tests, or explicit instructions. +- If a section is not applicable (no `references/`, `scripts/`, MCP, sub-agents) — mark **N/A**, not FAIL. +- Do not inflate severity: team/workspace-level patterns should not automatically become FAIL for an isolated skill without evidence. +- Each issue must include a concrete recommendation: **what to fix and where**. +- **Justify before verdict:** for each check, state the file and section you rely on, and give a 1–2 sentence rationale. A full step-by-step breakdown with quotes is not required. +- **Language:** write logs, summaries, and all conclusions in the same language the user started the conversation in. 
+- If an instruction relies on hidden author knowledge (`"this is obvious"`), explicitly mark it as a **hidden assumption** and do not consider the step self-sufficient. +- If a step depends on OS, shell, package manager, runtime version, permissions, working directory, or project structure — separately flag as a **portability risk**. +- **Review scope:** check only IDs that belong to the current scope. Do not evaluate or mark as N/A any IDs above the scope. Do not soften severity within the checked scope. +- **Evidence format:** primary anchor is `file.md § Section Name`. Line numbers are only acceptable as a temporary auxiliary hint (`line ~N`). +- If a file cannot be read, appears as garbled text, or causes a decode/encoding problem — do not infer its content. Explicitly mark the artifact as unreadable and continue the review with available files. +- If you sense context overflow during the review — first narrow reading to relevant files and sections, then return only a condensed summary. Do not pull long excerpts into the main thread. +- If logging is enabled: first try to write `log-{subagent}.md` directly to the output folder, without requesting additional permissions from the user. +- If direct write fails — save the log to a temporary file and return `temp_log_path` to the orchestrator. +- If the temporary log also cannot be written — notify the orchestrator.
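The three logging rules above form a fallback chain, which might be sketched as follows. Only the file name `log-{subagent}.md` and the `temp_log_path` field come from the rules; the function name, the `status` and `log_path` keys, and the returned dictionary shape are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def write_log_with_fallback(output_dir: str, subagent: str, content: str) -> dict[str, str]:
    """Try the output folder first, then a temporary file, then report failure."""
    try:
        target = Path(output_dir) / f"log-{subagent}.md"
        target.write_text(content, encoding="utf-8")
        return {"status": "written", "log_path": str(target)}
    except OSError:
        pass  # direct write failed; fall back to a temporary file

    try:
        with tempfile.NamedTemporaryFile(
            "w", suffix=".md", prefix=f"log-{subagent}-", delete=False, encoding="utf-8"
        ) as handle:
            handle.write(content)
        return {"status": "temp", "temp_log_path": handle.name}
    except OSError as error:
        # Nothing could be written: the orchestrator must be notified.
        return {"status": "failed", "reason": str(error)}
```

The orchestrator would then branch on `status`: use `log_path` directly, relocate the file at `temp_log_path`, or record the failure in the `Review Limitations` section.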