diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
index 053743c..3c4e8a3 100644
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -6,14 +6,14 @@
   },
   "metadata": {
     "description": "Marketplace hosting the ssep plugin \u2014 Super Software Engineering Powers skills for Claude Code",
-    "version": "0.2.0"
+    "version": "0.2.1"
   },
   "plugins": [
     {
       "name": "ssep",
       "source": "./",
       "description": "Super Software Engineering Powers \u2014 specialized skills for spec review, design fidelity, feature completeness, and multi-layer testing. Complements the superpowers plugin.",
-      "version": "0.2.0",
+      "version": "0.2.1",
       "category": "engineering",
       "tags": [
         "spec-review",
diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
index c2ca0b7..a4a0fca 100644
--- a/.claude-plugin/plugin.json
+++ b/.claude-plugin/plugin.json
@@ -1,7 +1,8 @@
 {
   "name": "ssep",
   "description": "Super Software Engineering Powers \u2014 specialized skills for spec review, design fidelity, feature completeness, and multi-layer testing. Complements the superpowers plugin by filling gaps in spec/design review and integration testing workflows.",
-  "version": "0.2.0",
+  "//": "v0.2.1 added explicit anti-skip clauses to all four skill descriptions to push back against agentic 'task looks small, skip the skill' rationalization observed in real sessions.",
+  "version": "0.2.1",
   "author": {
     "name": "bill",
     "email": "bill.han@evar.co.kr"
diff --git a/CHANGELOG.md b/CHANGELOG.md
index bb46416..c3ebd7a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.2.1] — 2026-04-28
+
+### Changed
+- All four skill descriptions strengthened with explicit "Triggers even when ..." anti-skip clauses, naming the dominant skip rationalization per skill so the description itself pushes back against it:
+  - `reviewing-design-fidelity` — "even when the requested change appears trivial — single label swap, one CSS rule, one-line JSX edit"
+  - `reviewing-spec-and-policy` — "even when the requirement statement reads as a single concise sentence ... concise requirements typically hide unstated questions about default values for existing data, role/permission interactions, and state-transition edge cases"
+  - `improving-feature-completeness` — "even when the diff is small or the happy path was verified manually ... 'I already clicked through it in Playwright' does not substitute for the audit"
+  - `running-integration-tests` — "even when only one boundary is crossed ... manual verification proves the path works once but does not produce a codified regression test"
+
+### Added
+- `CLAUDE.md` — new "Trigger discipline" section documenting four real-session skip rationalizations with their countering moves, derived from a session retrospective where all four skills should have fired but were each skipped under "task looks small" reasoning. Also notes the convention that future description edits must preserve at least one explicit "Triggers even when ..." clause per skill.
+
 ## [0.2.0] — 2026-04-27
 
 ### Added
diff --git a/CLAUDE.md b/CLAUDE.md
index 33f870b..cf73169 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -9,6 +9,23 @@ This directory is a Claude Code plugin providing four specialized SE skills (spe
 - Descriptions are 3rd-person ("Reviews ...", "Authors ...") not 2nd-person ("You MUST ..."), matching official Claude Code skill guidelines.
 - Each skill explicitly cross-references `superpowers` for adjacent responsibilities so the two plugins compose without overlap.
 
+## Trigger discipline
+
+These skills are most often *skipped* not because the trigger keywords are unclear but because the task looks small in the moment. The descriptions in v0.2.1+ explicitly push back against this rationalization, but contributors editing descriptions or references should preserve and extend the pattern.
+
+Real-session skip rationalizations and the right move:
+
+| Rationalization observed in real sessions | The right move |
+|---|---|
+| "It's a single-line CSS swap, fidelity check is overkill" | Invoke `reviewing-design-fidelity`. The skill chooses scope; let it decide, not the agent. |
+| "The requirement is one sentence, no spec to audit" | Invoke `reviewing-spec-and-policy`. Concise requirements hide unstated edge cases (default values for existing data, role differences, state-transition gaps). |
+| "Happy path passed in Playwright, ship it" | Invoke `improving-feature-completeness`. State coverage (empty/loading/error/disabled) and i18n/a11y are not checked by happy-path e2e. |
+| "I already clicked through it manually, no integration test needed" | Invoke `running-integration-tests`. Manual click ≠ codified regression. |
+
+Each skill's `## When NOT to use` section defines the *only* legitimate skip cases. If the situation isn't there, invoke the skill — the skill itself decides scope (full audit vs quick check) faster than the caller can rationalize a skip.
+
+When editing skill descriptions, keep one explicit "Triggers even when ..." clause per description that names the dominant rationalization. That clause is what makes the skill harder to skip in agentic sessions where the model is biased toward "looks fast, just do it."
+
 ## Verifying changes
 
 After editing any `SKILL.md`:
diff --git a/skills/improving-feature-completeness/SKILL.md b/skills/improving-feature-completeness/SKILL.md
index cc8e54b..7ba51d2 100644
--- a/skills/improving-feature-completeness/SKILL.md
+++ b/skills/improving-feature-completeness/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: improving-feature-completeness
-description: Audits a working feature implementation for the gap between "happy path passes" and "ready to ship to production" — edge cases, error handling, loading/empty/error states, accessibility, responsive behavior, internationalization, observability, and operational hooks. Use when a feature implementation works but needs polish, when user says "완성도 높여줘", "production ready로 만들어줘", "edge case 검토", "polish this feature", "is this ready to ship?", "출시 가능한지 점검", "엣지 케이스 다 챙겼나", "PR 전 점검", "빠진 상태 없나", "ship 전 검토", or before shipping any non-trivial frontend or backend feature. Also triggers proactively right before PR creation on a non-trivial feature branch, even without explicit user request. Distinct from brainstorming (which designs the feature), test-driven-development (which implements it), and requesting-code-review (which evaluates code structure) — focuses specifically on raising the bar of an already-working implementation through a structured production-readiness audit.
+description: Audits a working feature for the gap between "happy path passes" and "production-ready" — edge cases, error handling, loading/empty/error states, a11y, responsive, i18n, observability, ops hooks. Use when a feature works but needs polish, for "완성도 높여줘", "production ready로", "edge case 검토", "polish this feature", "is this ready to ship?", "엣지 케이스 다 챙겼나", "PR 전 점검", "빠진 상태 없나", or before shipping any non-trivial feature. Also triggers proactively right before PR creation on a non-trivial branch. Triggers even when the diff is small or the happy path was verified manually; gaps emerge from state interactions (empty/loading/error/disabled, role and locale variants), not line count, and "I already clicked through it in Playwright" does not substitute for the audit. Distinct from brainstorming (designs), test-driven-development (implements), and requesting-code-review (evaluates structure); raises the bar of a working impl via production-readiness audit.
 allowed-tools: Read, Grep, Glob, Edit, Bash
 ---
 
diff --git a/skills/reviewing-design-fidelity/SKILL.md b/skills/reviewing-design-fidelity/SKILL.md
index 06e7456..b94d234 100644
--- a/skills/reviewing-design-fidelity/SKILL.md
+++ b/skills/reviewing-design-fidelity/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: reviewing-design-fidelity
-description: Reviews implemented UI against design source-of-truth (Figma frames, mockup images, design system tokens) to verify visual fidelity — spacing, typography, color, responsive behavior, interaction states, and accessibility. Use when comparing built pages to design specs, when a Figma URL is shared alongside a running URL or screenshot, or when user says "디자인 검수", "퍼블리싱 검토", "compare with figma", "check design fidelity", "픽셀 비교", "design QA", "피그마 비교해서 차이점 찾아줘", "figma vs 현재 화면", "구현이 디자인과 맞는지", "피그마대로 되어있나", "현황과 비교", "figma url 과 현재 화면 대조". Also triggers when user shares a Figma URL and asks about existing implementation (comparison implied even without explicit keyword). Skill captures both sides (Figma extraction + Playwright snapshot of live impl), runs a structured visual diff, and reports gaps prioritized by user-visible impact. Distinct from reviewing-spec-and-policy (which audits docs) and from code review (which audits code structure).
+description: Reviews implemented UI against design source-of-truth (Figma frames, mockup images, design tokens) for visual fidelity — spacing, typography, color, responsive behavior, interaction states, accessibility. Use when comparing built pages to design specs, when a Figma URL is shared alongside a live URL/screenshot, or for "디자인 검수", "퍼블리싱 검토", "compare with figma", "픽셀 비교", "design QA", "피그마대로 되어있나", "figma vs 현재 화면", "구현이 디자인과 맞는지". Also triggers when a Figma URL is shared with reference to existing implementation. Triggers even when the change appears trivial — single label swap, one CSS rule, one-line JSX edit — because apparent simplicity is the most common skip rationalization, and the skill itself decides scope (full audit vs quick check) faster than the caller can. Captures both sides (Figma + Playwright snapshot), runs structured visual diff, reports gaps by user-visible impact. Distinct from reviewing-spec-and-policy (audits docs) and code review (audits code structure).
 allowed-tools: Read, Glob, Grep, Bash, mcp__plugin_figma_figma__get_design_context, mcp__plugin_figma_figma__get_screenshot, mcp__plugin_figma_figma__get_metadata, mcp__plugin_figma_figma__get_variable_defs, mcp__plugin_figma_figma__get_code_connect_map, mcp__plugin_playwright_playwright__browser_navigate, mcp__plugin_playwright_playwright__browser_snapshot, mcp__plugin_playwright_playwright__browser_take_screenshot, mcp__plugin_playwright_playwright__browser_resize, mcp__plugin_playwright_playwright__browser_evaluate, mcp__plugin_playwright_playwright__browser_console_messages, mcp__plugin_playwright_playwright__browser_hover, mcp__plugin_playwright_playwright__browser_click
 ---
 
diff --git a/skills/reviewing-spec-and-policy/SKILL.md b/skills/reviewing-spec-and-policy/SKILL.md
index 642a2ea..c43e2b2 100644
--- a/skills/reviewing-spec-and-policy/SKILL.md
+++ b/skills/reviewing-spec-and-policy/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: reviewing-spec-and-policy
-description: Reviews product specifications, requirements documents, RFCs, PRDs, and policy documents from text, markdown, or Figma sources. Use when evaluating specs for completeness, internal consistency, edge-case coverage, ambiguity, conflicts with prior policies, or compliance gaps. Triggers on user requests like "이 기획서 검토해줘", "스펙 리뷰", "정책 검토", "review this PRD", "is this requirement complete?", "check this against [policy/RFC]", "이 기획대로 구현하면 빠진게 뭐가 있을까", "스펙 갭", "요구사항 누락", "기획서 애매한 부분", or whenever a Figma URL is shared in a planning/spec context (before implementation begins). Skill reads sources, runs a structured multi-dimensional review, and returns a prioritized gap report. Intentionally distinct from brainstorming (which generates new designs) and from reviewing-design-fidelity (which compares implementation vs design after code exists); this audits an existing artifact.
+description: Reviews product specs, requirements docs, RFCs, PRDs, and policy documents from text, markdown, or Figma. Use for completeness, consistency, edge-case coverage, ambiguity, policy conflicts, or compliance gaps. Triggers on "이 기획서 검토해줘", "스펙 리뷰", "정책 검토", "review this PRD", "is this complete?", "이 기획대로 구현하면 빠진게 뭐가 있을까", "스펙 갭", "요구사항 누락", or whenever a Figma URL is shared in a planning context. Triggers even when the requirement reads as a single concise sentence (e.g. "X should default to inactive and be activatable from detail"); concise requirements hide unstated questions about default values for existing data, role/permission interactions, and state-transition edges that a structured audit surfaces in minutes. Reads sources, runs a multi-dimensional review, returns a prioritized gap report. Distinct from brainstorming (generates new designs) and reviewing-design-fidelity (compares impl vs design); this audits an existing artifact.
 allowed-tools: Read, Grep, Glob, WebFetch, mcp__plugin_figma_figma__get_design_context, mcp__plugin_figma_figma__get_screenshot, mcp__plugin_figma_figma__get_metadata, mcp__plugin_figma_figma__get_variable_defs
 ---
 
diff --git a/skills/running-integration-tests/SKILL.md b/skills/running-integration-tests/SKILL.md
index a5d9c5e..f0683ec 100644
--- a/skills/running-integration-tests/SKILL.md
+++ b/skills/running-integration-tests/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: running-integration-tests
-description: Authors and executes tests across all levels of the test pyramid — unit, frontend-backend integration, and end-to-end browser scenarios — using Playwright MCP and standard test runners (jest, vitest, mocha). Use when verifying multi-layer behavior or when user requests "e2e 테스트", "integration 테스트", "통합 테스트", "playwright로 검증", "브라우저에서 직접 확인", "백엔드-프론트 연동 테스트", "API 통합 테스트", "staging에서 확인", "staging 검증", "배포 후 검증", "실제 화면에서 확인", "smoke test", or when a feature crosses service boundaries (API + UI + DB). Also triggers when user asks to verify a deployed feature or when a PR test plan has unchecked browser/API verification items. Helps decide which test level fits, sets up infrastructure, writes tests, runs them, reports failures with diagnostic context. Distinct from superpowers:test-driven-development (unit-level TDD); this handles integration and e2e tiers.
+description: Authors and runs tests across the test pyramid — unit, FE↔BE integration, and end-to-end browser scenarios — using Playwright MCP and runners (jest, vitest, mocha). Use when verifying multi-layer behavior or for "e2e 테스트", "integration 테스트", "통합 테스트", "playwright로 검증", "브라우저에서 직접 확인", "백엔드-프론트 연동", "API 통합 테스트", "staging 검증", "배포 후 검증", "smoke test", or when a feature crosses boundaries (API + UI + DB). Also triggers when verifying a deployed feature or when a PR test plan has unchecked browser/API items. Triggers even when only one boundary is crossed and a manual Playwright click was performed; manual verification proves the path works once but does not produce a codified regression, and integration-tier surprises cluster at exactly the boundary that "looked fine" during manual checks. Decides which test level fits, sets up infra, writes tests, runs them, reports failures with diagnostic context. Distinct from superpowers:test-driven-development (unit-level TDD); handles integration and e2e tiers.
 allowed-tools: Read, Grep, Glob, Edit, Bash, mcp__plugin_playwright_playwright__browser_navigate, mcp__plugin_playwright_playwright__browser_click, mcp__plugin_playwright_playwright__browser_type, mcp__plugin_playwright_playwright__browser_fill_form, mcp__plugin_playwright_playwright__browser_select_option, mcp__plugin_playwright_playwright__browser_press_key, mcp__plugin_playwright_playwright__browser_hover, mcp__plugin_playwright_playwright__browser_snapshot, mcp__plugin_playwright_playwright__browser_take_screenshot, mcp__plugin_playwright_playwright__browser_evaluate, mcp__plugin_playwright_playwright__browser_console_messages, mcp__plugin_playwright_playwright__browser_network_requests, mcp__plugin_playwright_playwright__browser_wait_for, mcp__plugin_playwright_playwright__browser_resize, mcp__plugin_playwright_playwright__browser_navigate_back, mcp__plugin_playwright_playwright__browser_handle_dialog, mcp__plugin_playwright_playwright__browser_close
 ---
 