From 1400b414bf844ba81fe4e4f2a95058b73ab6c216 Mon Sep 17 00:00:00 2001
From: bill <bill.han@evar.co.kr>
Date: Tue, 28 Apr 2026 11:37:02 +0900
Subject: [PATCH 1/3] feat(v0.3.0): anti-skip clauses + trigger discipline
 guidance
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

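This release strengthens invocation discipline itself rather than the
strength of the description matcher. A retrospective of a real session
showed a pattern where all four skills met their trigger keywords, yet
invocation was skipped under the rationalization that "the task looked
small". This is now blocked explicitly in both the description bodies
and CLAUDE.md.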

### Changes

- Added a "Triggers even when ..." clause to the description in all
  four SKILL.md files:
  - reviewing-design-fidelity: trivial edits such as a single label
    swap or one CSS rule still warrant invocation
  - reviewing-spec-and-policy: one-sentence requirements are the most
    common source of gaps
  - improving-feature-completeness: a small diff or manual happy-path
    verification cannot replace the audit
  - running-integration-tests: a manual click is not a codified
    regression
- New "Trigger discipline" section in CLAUDE.md: a table mapping four
  rationalizations to the right move, plus the convention that future
  contributors must preserve when editing descriptions
- plugin.json / marketplace.json version 0.2.0 → 0.3.0
- Added CHANGELOG [0.3.0] entry, stating the reason for the minor bump
  (description matcher behavior change)

### Verification

- jq -e: plugin.json and marketplace.json are both valid
- wc -l: all four SKILL.md files are under ~150 lines
  (96 / 100 / 110 / 114)
- No tabs in frontmatter
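
For readers who want to repeat these checks without jq or wc, the
sketch below approximates them in Python. It is not the project's
actual tooling: the function names, the MAX_LINES constant, and the
frontmatter parsing are assumptions made for illustration, and
is_valid_json only tests that the text parses, which is looser than
the exit-status semantics of jq -e.

```python
import json

MAX_LINES = 150  # assumed ceiling, taken from the "~150" figure above


def is_valid_json(text: str) -> bool:
    """Rough stand-in for the jq check: True when the text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def frontmatter(text: str) -> str:
    """Return the lines between the first pair of '---' fences, or ''."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i])
    return ""


def check_skill(text: str) -> list[str]:
    """Collect problems for one SKILL.md body (hypothetical helper)."""
    problems = []
    line_count = len(text.splitlines())
    if line_count >= MAX_LINES:
        problems.append(f"{line_count} lines (expected under {MAX_LINES})")
    if "\t" in frontmatter(text):
        problems.append("tab character in frontmatter")
    return problems
```

An empty list from check_skill means the file passed both the line
count and the tab check.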

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .claude-plugin/marketplace.json                |  4 ++--
 .claude-plugin/plugin.json                     |  3 ++-
 CHANGELOG.md                                   | 15 +++++++++++++++
 CLAUDE.md                                      | 17 +++++++++++++++++
 skills/improving-feature-completeness/SKILL.md |  2 +-
 skills/reviewing-design-fidelity/SKILL.md      |  2 +-
 skills/reviewing-spec-and-policy/SKILL.md      |  2 +-
 skills/running-integration-tests/SKILL.md      |  2 +-
 8 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
index 053743c..7cc1923 100644
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -6,14 +6,14 @@
   },
   "metadata": {
     "description": "Marketplace hosting the ssep plugin \u2014 Super Software Engineering Powers skills for Claude Code",
-    "version": "0.2.0"
+    "version": "0.3.0"
   },
   "plugins": [
     {
       "name": "ssep",
       "source": "./",
       "description": "Super Software Engineering Powers \u2014 specialized skills for spec review, design fidelity, feature completeness, and multi-layer testing. Complements the superpowers plugin.",
-      "version": "0.2.0",
+      "version": "0.3.0",
       "category": "engineering",
       "tags": [
         "spec-review",
diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
index c2ca0b7..8e40932 100644
--- a/.claude-plugin/plugin.json
+++ b/.claude-plugin/plugin.json
@@ -1,7 +1,8 @@
 {
   "name": "ssep",
   "description": "Super Software Engineering Powers \u2014 specialized skills for spec review, design fidelity, feature completeness, and multi-layer testing. Complements the superpowers plugin by filling gaps in spec/design review and integration testing workflows.",
-  "version": "0.2.0",
+  "//": "v0.3.0 added explicit anti-skip clauses to all four skill descriptions to push back against agentic 'task looks small, skip the skill' rationalization observed in real sessions.",
+  "version": "0.3.0",
   "author": {
     "name": "bill",
     "email": "bill.han@evar.co.kr"
diff --git a/CHANGELOG.md b/CHANGELOG.md
index bb46416..ff5aa6b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.3.0] — 2026-04-28
+
+### Changed
+- All four skill descriptions strengthened with explicit "Triggers even when ..." anti-skip clauses, naming the dominant skip rationalization per skill so the description itself pushes back against it:
+  - `reviewing-design-fidelity` — "even when the requested change appears trivial — single label swap, one CSS rule, one-line JSX edit"
+  - `reviewing-spec-and-policy` — "even when the requirement statement reads as a single concise sentence ... concise requirements typically hide unstated questions about default values for existing data, role/permission interactions, and state-transition edge cases"
+  - `improving-feature-completeness` — "even when the diff is small or the happy path was verified manually ... 'I already clicked through it in Playwright' does not substitute for the audit"
+  - `running-integration-tests` — "even when only one boundary is crossed ... manual verification proves the path works once but does not produce a codified regression test"
+
+### Added
+- `CLAUDE.md` — new "Trigger discipline" section documenting four real-session skip rationalizations with their countering moves, derived from a session retrospective where all four skills should have fired but were each skipped under "task looks small" reasoning. Also notes the convention that future description edits must preserve at least one explicit "Triggers even when ..." clause per skill.
+
+### Why this is a minor (not patch) bump
+- Description matcher behavior changes — each description gained ~150 chars of new trigger surface. This is a behavior-relevant change for the eval loop and downstream agents that cache description hashes, so the version is bumped past patch.
+
 ## [0.2.0] — 2026-04-27
 
 ### Added
diff --git a/CLAUDE.md b/CLAUDE.md
index 33f870b..a6e7eea 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -9,6 +9,23 @@ This directory is a Claude Code plugin providing four specialized SE skills (spe
 - Descriptions are 3rd-person ("Reviews ...", "Authors ...") not 2nd-person ("You MUST ..."), matching official Claude Code skill guidelines.
 - Each skill explicitly cross-references `superpowers` for adjacent responsibilities so the two plugins compose without overlap.
 
+## Trigger discipline
+
+These skills are most often *skipped* not because the trigger keywords are unclear but because the task looks small in the moment. The descriptions in v0.3.0+ explicitly push back against this rationalization, but contributors editing descriptions or references should preserve and extend the pattern.
+
+Real-session skip rationalizations and the right move:
+
+| Rationalization observed in real sessions | The right move |
+|---|---|
+| "It's a single-line CSS swap, fidelity check is overkill" | Invoke `reviewing-design-fidelity`. The skill chooses scope; let it decide, not the agent. |
+| "The requirement is one sentence, no spec to audit" | Invoke `reviewing-spec-and-policy`. Concise requirements hide unstated edge cases (default values for existing data, role differences, state-transition gaps). |
+| "Happy path passed in Playwright, ship it" | Invoke `improving-feature-completeness`. State coverage (empty/loading/error/disabled) and i18n/a11y are not checked by happy-path e2e. |
+| "I already clicked through it manually, no integration test needed" | Invoke `running-integration-tests`. Manual click ≠ codified regression. |
+
+Each skill's `## When NOT to use` section defines the *only* legitimate skip cases. If the situation isn't there, invoke the skill — the skill itself decides scope (full audit vs quick check) faster than the caller can rationalize a skip.
+
+When editing skill descriptions, keep one explicit "Triggers even when ..." clause per description that names the dominant rationalization. That clause is what makes the skill harder to skip in agentic sessions where the model is biased toward "looks fast, just do it."
+
 ## Verifying changes
 
 After editing any `SKILL.md`:
diff --git a/skills/improving-feature-completeness/SKILL.md b/skills/improving-feature-completeness/SKILL.md
index cc8e54b..09425b7 100644
--- a/skills/improving-feature-completeness/SKILL.md
+++ b/skills/improving-feature-completeness/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: improving-feature-completeness
-description: Audits a working feature implementation for the gap between "happy path passes" and "ready to ship to production" — edge cases, error handling, loading/empty/error states, accessibility, responsive behavior, internationalization, observability, and operational hooks. Use when a feature implementation works but needs polish, when user says "완성도 높여줘", "production ready로 만들어줘", "edge case 검토", "polish this feature", "is this ready to ship?", "출시 가능한지 점검", "엣지 케이스 다 챙겼나", "PR 전 점검", "빠진 상태 없나", "ship 전 검토", or before shipping any non-trivial frontend or backend feature. Also triggers proactively right before PR creation on a non-trivial feature branch, even without explicit user request. Distinct from brainstorming (which designs the feature), test-driven-development (which implements it), and requesting-code-review (which evaluates code structure) — focuses specifically on raising the bar of an already-working implementation through a structured production-readiness audit.
+description: Audits a working feature implementation for the gap between "happy path passes" and "ready to ship to production" — edge cases, error handling, loading/empty/error states, accessibility, responsive behavior, internationalization, observability, and operational hooks. Use when a feature implementation works but needs polish, when user says "완성도 높여줘", "production ready로 만들어줘", "edge case 검토", "polish this feature", "is this ready to ship?", "출시 가능한지 점검", "엣지 케이스 다 챙겼나", "PR 전 점검", "빠진 상태 없나", "ship 전 검토", or before shipping any non-trivial frontend or backend feature. Also triggers proactively right before PR creation on a non-trivial feature branch, even without explicit user request. Triggers even when the diff is small or the happy path was verified manually; completeness gaps emerge from state interactions (empty/loading/error/disabled, role variants, locale variants), not from line count, and "I already clicked through it in Playwright" does not substitute for the audit. Distinct from brainstorming (which designs the feature), test-driven-development (which implements it), and requesting-code-review (which evaluates code structure) — focuses specifically on raising the bar of an already-working implementation through a structured production-readiness audit.
 allowed-tools: Read, Grep, Glob, Edit, Bash
 ---
 
diff --git a/skills/reviewing-design-fidelity/SKILL.md b/skills/reviewing-design-fidelity/SKILL.md
index 06e7456..ee039d9 100644
--- a/skills/reviewing-design-fidelity/SKILL.md
+++ b/skills/reviewing-design-fidelity/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: reviewing-design-fidelity
-description: Reviews implemented UI against design source-of-truth (Figma frames, mockup images, design system tokens) to verify visual fidelity — spacing, typography, color, responsive behavior, interaction states, and accessibility. Use when comparing built pages to design specs, when a Figma URL is shared alongside a running URL or screenshot, or when user says "디자인 검수", "퍼블리싱 검토", "compare with figma", "check design fidelity", "픽셀 비교", "design QA", "피그마 비교해서 차이점 찾아줘", "figma vs 현재 화면", "구현이 디자인과 맞는지", "피그마대로 되어있나", "현황과 비교", "figma url 과 현재 화면 대조". Also triggers when user shares a Figma URL and asks about existing implementation (comparison implied even without explicit keyword). Skill captures both sides (Figma extraction + Playwright snapshot of live impl), runs a structured visual diff, and reports gaps prioritized by user-visible impact. Distinct from reviewing-spec-and-policy (which audits docs) and from code review (which audits code structure).
+description: Reviews implemented UI against design source-of-truth (Figma frames, mockup images, design system tokens) to verify visual fidelity — spacing, typography, color, responsive behavior, interaction states, and accessibility. Use when comparing built pages to design specs, when a Figma URL is shared alongside a running URL or screenshot, or when user says "디자인 검수", "퍼블리싱 검토", "compare with figma", "check design fidelity", "픽셀 비교", "design QA", "피그마 비교해서 차이점 찾아줘", "figma vs 현재 화면", "구현이 디자인과 맞는지", "피그마대로 되어있나", "현황과 비교", "figma url 과 현재 화면 대조". Also triggers when user shares a Figma URL and asks about existing implementation (comparison implied even without explicit keyword). Triggers even when the requested change appears trivial — single label swap, one CSS rule, one-line JSX edit — because apparent simplicity is the most common rationalization for skipping fidelity checks, and the skill itself decides scope (full audit vs quick check) faster than the caller can. Skill captures both sides (Figma extraction + Playwright snapshot of live impl), runs a structured visual diff, and reports gaps prioritized by user-visible impact. Distinct from reviewing-spec-and-policy (which audits docs) and from code review (which audits code structure).
 allowed-tools: Read, Glob, Grep, Bash, mcp__plugin_figma_figma__get_design_context, mcp__plugin_figma_figma__get_screenshot, mcp__plugin_figma_figma__get_metadata, mcp__plugin_figma_figma__get_variable_defs, mcp__plugin_figma_figma__get_code_connect_map, mcp__plugin_playwright_playwright__browser_navigate, mcp__plugin_playwright_playwright__browser_snapshot, mcp__plugin_playwright_playwright__browser_take_screenshot, mcp__plugin_playwright_playwright__browser_resize, mcp__plugin_playwright_playwright__browser_evaluate, mcp__plugin_playwright_playwright__browser_console_messages, mcp__plugin_playwright_playwright__browser_hover, mcp__plugin_playwright_playwright__browser_click
 ---
 
diff --git a/skills/reviewing-spec-and-policy/SKILL.md b/skills/reviewing-spec-and-policy/SKILL.md
index 642a2ea..6374a0b 100644
--- a/skills/reviewing-spec-and-policy/SKILL.md
+++ b/skills/reviewing-spec-and-policy/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: reviewing-spec-and-policy
-description: Reviews product specifications, requirements documents, RFCs, PRDs, and policy documents from text, markdown, or Figma sources. Use when evaluating specs for completeness, internal consistency, edge-case coverage, ambiguity, conflicts with prior policies, or compliance gaps. Triggers on user requests like "이 기획서 검토해줘", "스펙 리뷰", "정책 검토", "review this PRD", "is this requirement complete?", "check this against [policy/RFC]", "이 기획대로 구현하면 빠진게 뭐가 있을까", "스펙 갭", "요구사항 누락", "기획서 애매한 부분", or whenever a Figma URL is shared in a planning/spec context (before implementation begins). Skill reads sources, runs a structured multi-dimensional review, and returns a prioritized gap report. Intentionally distinct from brainstorming (which generates new designs) and from reviewing-design-fidelity (which compares implementation vs design after code exists); this audits an existing artifact.
+description: Reviews product specifications, requirements documents, RFCs, PRDs, and policy documents from text, markdown, or Figma sources. Use when evaluating specs for completeness, internal consistency, edge-case coverage, ambiguity, conflicts with prior policies, or compliance gaps. Triggers on user requests like "이 기획서 검토해줘", "스펙 리뷰", "정책 검토", "review this PRD", "is this requirement complete?", "check this against [policy/RFC]", "이 기획대로 구현하면 빠진게 뭐가 있을까", "스펙 갭", "요구사항 누락", "기획서 애매한 부분", or whenever a Figma URL is shared in a planning/spec context (before implementation begins). Triggers even when the requirement statement reads as a single concise sentence (e.g. "X should default to inactive and be activatable from detail"); concise requirements typically hide unstated questions about default values for existing data, role/permission interactions, and state-transition edge cases that a structured audit surfaces in minutes. Skill reads sources, runs a structured multi-dimensional review, and returns a prioritized gap report. Intentionally distinct from brainstorming (which generates new designs) and from reviewing-design-fidelity (which compares implementation vs design after code exists); this audits an existing artifact.
 allowed-tools: Read, Grep, Glob, WebFetch, mcp__plugin_figma_figma__get_design_context, mcp__plugin_figma_figma__get_screenshot, mcp__plugin_figma_figma__get_metadata, mcp__plugin_figma_figma__get_variable_defs
 ---
 
diff --git a/skills/running-integration-tests/SKILL.md b/skills/running-integration-tests/SKILL.md
index a5d9c5e..6997a12 100644
--- a/skills/running-integration-tests/SKILL.md
+++ b/skills/running-integration-tests/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: running-integration-tests
-description: Authors and executes tests across all levels of the test pyramid — unit, frontend-backend integration, and end-to-end browser scenarios — using Playwright MCP and standard test runners (jest, vitest, mocha). Use when verifying multi-layer behavior or when user requests "e2e 테스트", "integration 테스트", "통합 테스트", "playwright로 검증", "브라우저에서 직접 확인", "백엔드-프론트 연동 테스트", "API 통합 테스트", "staging에서 확인", "staging 검증", "배포 후 검증", "실제 화면에서 확인", "smoke test", or when a feature crosses service boundaries (API + UI + DB). Also triggers when user asks to verify a deployed feature or when a PR test plan has unchecked browser/API verification items. Helps decide which test level fits, sets up infrastructure, writes tests, runs them, reports failures with diagnostic context. Distinct from superpowers:test-driven-development (unit-level TDD); this handles integration and e2e tiers.
+description: Authors and executes tests across all levels of the test pyramid — unit, frontend-backend integration, and end-to-end browser scenarios — using Playwright MCP and standard test runners (jest, vitest, mocha). Use when verifying multi-layer behavior or when user requests "e2e 테스트", "integration 테스트", "통합 테스트", "playwright로 검증", "브라우저에서 직접 확인", "백엔드-프론트 연동 테스트", "API 통합 테스트", "staging에서 확인", "staging 검증", "배포 후 검증", "실제 화면에서 확인", "smoke test", or when a feature crosses service boundaries (API + UI + DB). Also triggers when user asks to verify a deployed feature or when a PR test plan has unchecked browser/API verification items. Triggers even when only one boundary is crossed in the change and a manual Playwright click was already performed; manual verification proves the path works once but does not produce a codified regression test, and integration-tier surprises cluster at exactly the boundary that "looked fine" during manual checks. Helps decide which test level fits, sets up infrastructure, writes tests, runs them, reports failures with diagnostic context. Distinct from superpowers:test-driven-development (unit-level TDD); this handles integration and e2e tiers.
 allowed-tools: Read, Grep, Glob, Edit, Bash, mcp__plugin_playwright_playwright__browser_navigate, mcp__plugin_playwright_playwright__browser_click, mcp__plugin_playwright_playwright__browser_type, mcp__plugin_playwright_playwright__browser_fill_form, mcp__plugin_playwright_playwright__browser_select_option, mcp__plugin_playwright_playwright__browser_press_key, mcp__plugin_playwright_playwright__browser_hover, mcp__plugin_playwright_playwright__browser_snapshot, mcp__plugin_playwright_playwright__browser_take_screenshot, mcp__plugin_playwright_playwright__browser_evaluate, mcp__plugin_playwright_playwright__browser_console_messages, mcp__plugin_playwright_playwright__browser_network_requests, mcp__plugin_playwright_playwright__browser_wait_for, mcp__plugin_playwright_playwright__browser_resize, mcp__plugin_playwright_playwright__browser_navigate_back, mcp__plugin_playwright_playwright__browser_handle_dialog, mcp__plugin_playwright_playwright__browser_close
 ---
 

From e206dca82da4cc63f3424673244af082e92b3d23 Mon Sep 17 00:00:00 2001
From: bill <bill.han@evar.co.kr>
Date: Tue, 28 Apr 2026 11:42:00 +0900
Subject: [PATCH 2/3] fix(v0.3.0): trim 4 skill descriptions under 1024 char
 limit

Lint workflow flagged all 4 descriptions over the 1024 char limit
after the v0.3.0 anti-skip clause additions. Trimmed redundant Korean
trigger phrases and tightened English phrasing while preserving the
new "Triggers even when ..." clause (the value-add of v0.3.0).

Final lengths:
- reviewing-design-fidelity:      989 / 1024
- reviewing-spec-and-policy:      947 / 1024
- improving-feature-completeness: 962 / 1024
- running-integration-tests:     1014 / 1024
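
The lengths above can be re-measured with a few lines of Python. This
is a minimal sketch, not the lint workflow itself: description_length
and over_limit are hypothetical names, the description is assumed to
sit on a single frontmatter line as it does in these four files, and
the count is in characters. Whether the real linter counts characters
or bytes is not stated here; the Korean trigger phrases take 3 bytes
per syllable in UTF-8, so the two measures would differ.

```python
LIMIT = 1024  # limit reported by the lint workflow


def description_length(skill_md: str) -> int:
    """Character length of the single-line `description:` value in frontmatter."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("no frontmatter fence on the first line")
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter reached without a description
        if line.startswith("description:"):
            return len(line[len("description:"):].strip())
    raise ValueError("no description field in frontmatter")


def over_limit(skill_md: str, limit: int = LIMIT) -> bool:
    """True when the description exceeds the limit (assumed to be in characters)."""
    return description_length(skill_md) > limit
```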

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 skills/improving-feature-completeness/SKILL.md | 2 +-
 skills/reviewing-design-fidelity/SKILL.md      | 2 +-
 skills/reviewing-spec-and-policy/SKILL.md      | 2 +-
 skills/running-integration-tests/SKILL.md      | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/skills/improving-feature-completeness/SKILL.md b/skills/improving-feature-completeness/SKILL.md
index 09425b7..7ba51d2 100644
--- a/skills/improving-feature-completeness/SKILL.md
+++ b/skills/improving-feature-completeness/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: improving-feature-completeness
-description: Audits a working feature implementation for the gap between "happy path passes" and "ready to ship to production" — edge cases, error handling, loading/empty/error states, accessibility, responsive behavior, internationalization, observability, and operational hooks. Use when a feature implementation works but needs polish, when user says "완성도 높여줘", "production ready로 만들어줘", "edge case 검토", "polish this feature", "is this ready to ship?", "출시 가능한지 점검", "엣지 케이스 다 챙겼나", "PR 전 점검", "빠진 상태 없나", "ship 전 검토", or before shipping any non-trivial frontend or backend feature. Also triggers proactively right before PR creation on a non-trivial feature branch, even without explicit user request. Triggers even when the diff is small or the happy path was verified manually; completeness gaps emerge from state interactions (empty/loading/error/disabled, role variants, locale variants), not from line count, and "I already clicked through it in Playwright" does not substitute for the audit. Distinct from brainstorming (which designs the feature), test-driven-development (which implements it), and requesting-code-review (which evaluates code structure) — focuses specifically on raising the bar of an already-working implementation through a structured production-readiness audit.
+description: Audits a working feature for the gap between "happy path passes" and "production-ready" — edge cases, error handling, loading/empty/error states, a11y, responsive, i18n, observability, ops hooks. Use when a feature works but needs polish, for "완성도 높여줘", "production ready로", "edge case 검토", "polish this feature", "is this ready to ship?", "엣지 케이스 다 챙겼나", "PR 전 점검", "빠진 상태 없나", or before shipping any non-trivial feature. Also triggers proactively right before PR creation on a non-trivial branch. Triggers even when the diff is small or the happy path was verified manually; gaps emerge from state interactions (empty/loading/error/disabled, role and locale variants), not line count, and "I already clicked through it in Playwright" does not substitute for the audit. Distinct from brainstorming (designs), test-driven-development (implements), and requesting-code-review (evaluates structure); raises the bar of a working impl via production-readiness audit.
 allowed-tools: Read, Grep, Glob, Edit, Bash
 ---
 
diff --git a/skills/reviewing-design-fidelity/SKILL.md b/skills/reviewing-design-fidelity/SKILL.md
index ee039d9..b94d234 100644
--- a/skills/reviewing-design-fidelity/SKILL.md
+++ b/skills/reviewing-design-fidelity/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: reviewing-design-fidelity
-description: Reviews implemented UI against design source-of-truth (Figma frames, mockup images, design system tokens) to verify visual fidelity — spacing, typography, color, responsive behavior, interaction states, and accessibility. Use when comparing built pages to design specs, when a Figma URL is shared alongside a running URL or screenshot, or when user says "디자인 검수", "퍼블리싱 검토", "compare with figma", "check design fidelity", "픽셀 비교", "design QA", "피그마 비교해서 차이점 찾아줘", "figma vs 현재 화면", "구현이 디자인과 맞는지", "피그마대로 되어있나", "현황과 비교", "figma url 과 현재 화면 대조". Also triggers when user shares a Figma URL and asks about existing implementation (comparison implied even without explicit keyword). Triggers even when the requested change appears trivial — single label swap, one CSS rule, one-line JSX edit — because apparent simplicity is the most common rationalization for skipping fidelity checks, and the skill itself decides scope (full audit vs quick check) faster than the caller can. Skill captures both sides (Figma extraction + Playwright snapshot of live impl), runs a structured visual diff, and reports gaps prioritized by user-visible impact. Distinct from reviewing-spec-and-policy (which audits docs) and from code review (which audits code structure).
+description: Reviews implemented UI against design source-of-truth (Figma frames, mockup images, design tokens) for visual fidelity — spacing, typography, color, responsive behavior, interaction states, accessibility. Use when comparing built pages to design specs, when a Figma URL is shared alongside a live URL/screenshot, or for "디자인 검수", "퍼블리싱 검토", "compare with figma", "픽셀 비교", "design QA", "피그마대로 되어있나", "figma vs 현재 화면", "구현이 디자인과 맞는지". Also triggers when a Figma URL is shared with reference to existing implementation. Triggers even when the change appears trivial — single label swap, one CSS rule, one-line JSX edit — because apparent simplicity is the most common skip rationalization, and the skill itself decides scope (full audit vs quick check) faster than the caller can. Captures both sides (Figma + Playwright snapshot), runs structured visual diff, reports gaps by user-visible impact. Distinct from reviewing-spec-and-policy (audits docs) and code review (audits code structure).
 allowed-tools: Read, Glob, Grep, Bash, mcp__plugin_figma_figma__get_design_context, mcp__plugin_figma_figma__get_screenshot, mcp__plugin_figma_figma__get_metadata, mcp__plugin_figma_figma__get_variable_defs, mcp__plugin_figma_figma__get_code_connect_map, mcp__plugin_playwright_playwright__browser_navigate, mcp__plugin_playwright_playwright__browser_snapshot, mcp__plugin_playwright_playwright__browser_take_screenshot, mcp__plugin_playwright_playwright__browser_resize, mcp__plugin_playwright_playwright__browser_evaluate, mcp__plugin_playwright_playwright__browser_console_messages, mcp__plugin_playwright_playwright__browser_hover, mcp__plugin_playwright_playwright__browser_click
 ---
 
diff --git a/skills/reviewing-spec-and-policy/SKILL.md b/skills/reviewing-spec-and-policy/SKILL.md
index 6374a0b..c43e2b2 100644
--- a/skills/reviewing-spec-and-policy/SKILL.md
+++ b/skills/reviewing-spec-and-policy/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: reviewing-spec-and-policy
-description: Reviews product specifications, requirements documents, RFCs, PRDs, and policy documents from text, markdown, or Figma sources. Use when evaluating specs for completeness, internal consistency, edge-case coverage, ambiguity, conflicts with prior policies, or compliance gaps. Triggers on user requests like "이 기획서 검토해줘", "스펙 리뷰", "정책 검토", "review this PRD", "is this requirement complete?", "check this against [policy/RFC]", "이 기획대로 구현하면 빠진게 뭐가 있을까", "스펙 갭", "요구사항 누락", "기획서 애매한 부분", or whenever a Figma URL is shared in a planning/spec context (before implementation begins). Triggers even when the requirement statement reads as a single concise sentence (e.g. "X should default to inactive and be activatable from detail"); concise requirements typically hide unstated questions about default values for existing data, role/permission interactions, and state-transition edge cases that a structured audit surfaces in minutes. Skill reads sources, runs a structured multi-dimensional review, and returns a prioritized gap report. Intentionally distinct from brainstorming (which generates new designs) and from reviewing-design-fidelity (which compares implementation vs design after code exists); this audits an existing artifact.
+description: Reviews product specs, requirements docs, RFCs, PRDs, and policy documents from text, markdown, or Figma. Use for completeness, consistency, edge-case coverage, ambiguity, policy conflicts, or compliance gaps. Triggers on "이 기획서 검토해줘", "스펙 리뷰", "정책 검토", "review this PRD", "is this complete?", "이 기획대로 구현하면 빠진게 뭐가 있을까", "스펙 갭", "요구사항 누락", or whenever a Figma URL is shared in a planning context. Triggers even when the requirement reads as a single concise sentence (e.g. "X should default to inactive and be activatable from detail"); concise requirements hide unstated questions about default values for existing data, role/permission interactions, and state-transition edges that a structured audit surfaces in minutes. Reads sources, runs a multi-dimensional review, returns a prioritized gap report. Distinct from brainstorming (generates new designs) and reviewing-design-fidelity (compares impl vs design); this audits an existing artifact.
 allowed-tools: Read, Grep, Glob, WebFetch, mcp__plugin_figma_figma__get_design_context, mcp__plugin_figma_figma__get_screenshot, mcp__plugin_figma_figma__get_metadata, mcp__plugin_figma_figma__get_variable_defs
 ---
 
diff --git a/skills/running-integration-tests/SKILL.md b/skills/running-integration-tests/SKILL.md
index 6997a12..f0683ec 100644
--- a/skills/running-integration-tests/SKILL.md
+++ b/skills/running-integration-tests/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: running-integration-tests
-description: Authors and executes tests across all levels of the test pyramid — unit, frontend-backend integration, and end-to-end browser scenarios — using Playwright MCP and standard test runners (jest, vitest, mocha). Use when verifying multi-layer behavior or when user requests "e2e 테스트", "integration 테스트", "통합 테스트", "playwright로 검증", "브라우저에서 직접 확인", "백엔드-프론트 연동 테스트", "API 통합 테스트", "staging에서 확인", "staging 검증", "배포 후 검증", "실제 화면에서 확인", "smoke test", or when a feature crosses service boundaries (API + UI + DB). Also triggers when user asks to verify a deployed feature or when a PR test plan has unchecked browser/API verification items. Triggers even when only one boundary is crossed in the change and a manual Playwright click was already performed; manual verification proves the path works once but does not produce a codified regression test, and integration-tier surprises cluster at exactly the boundary that "looked fine" during manual checks. Helps decide which test level fits, sets up infrastructure, writes tests, runs them, reports failures with diagnostic context. Distinct from superpowers:test-driven-development (unit-level TDD); this handles integration and e2e tiers.
+description: Authors and runs tests across the test pyramid — unit, FE↔BE integration, and end-to-end browser scenarios — using Playwright MCP and runners (jest, vitest, mocha). Use when verifying multi-layer behavior or for "e2e 테스트", "integration 테스트", "통합 테스트", "playwright로 검증", "브라우저에서 직접 확인", "백엔드-프론트 연동", "API 통합 테스트", "staging 검증", "배포 후 검증", "smoke test", or when a feature crosses boundaries (API + UI + DB). Also triggers when verifying a deployed feature or when a PR test plan has unchecked browser/API items. Triggers even when only one boundary is crossed and a manual Playwright click was performed; manual verification proves the path works once but does not produce a codified regression, and integration-tier surprises cluster at exactly the boundary that "looked fine" during manual checks. Decides which test level fits, sets up infra, writes tests, runs them, reports failures with diagnostic context. Distinct from superpowers:test-driven-development (unit-level TDD); handles integration and e2e tiers.
 allowed-tools: Read, Grep, Glob, Edit, Bash, mcp__plugin_playwright_playwright__browser_navigate, mcp__plugin_playwright_playwright__browser_click, mcp__plugin_playwright_playwright__browser_type, mcp__plugin_playwright_playwright__browser_fill_form, mcp__plugin_playwright_playwright__browser_select_option, mcp__plugin_playwright_playwright__browser_press_key, mcp__plugin_playwright_playwright__browser_hover, mcp__plugin_playwright_playwright__browser_snapshot, mcp__plugin_playwright_playwright__browser_take_screenshot, mcp__plugin_playwright_playwright__browser_evaluate, mcp__plugin_playwright_playwright__browser_console_messages, mcp__plugin_playwright_playwright__browser_network_requests, mcp__plugin_playwright_playwright__browser_wait_for, mcp__plugin_playwright_playwright__browser_resize, mcp__plugin_playwright_playwright__browser_navigate_back, mcp__plugin_playwright_playwright__browser_handle_dialog, mcp__plugin_playwright_playwright__browser_close
 ---
 

From 77b44b2e65d92c51b4c2889ab7afc3b1e6dd33ea Mon Sep 17 00:00:00 2001
From: bill <bill.han@evar.co.kr>
Date: Tue, 28 Apr 2026 11:44:45 +0900
Subject: [PATCH 3/3] fix: correct version number 0.3.0 -> 0.2.1 (patch level)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

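Re-evaluated the release's impact: adding the anti-skip clauses to the
skill descriptions does not introduce new trigger keywords; it reinforces
the strength of the existing triggers. Because this tightens the wording
of the description text rather than changing downstream behavior, a
patch-level bump is appropriate.

Removed the "Why minor" justification section from CHANGELOG.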

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .claude-plugin/marketplace.json | 4 ++--
 .claude-plugin/plugin.json      | 4 ++--
 CHANGELOG.md                    | 5 +----
 CLAUDE.md                       | 2 +-
 4 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
index 7cc1923..3c4e8a3 100644
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -6,14 +6,14 @@
   },
   "metadata": {
     "description": "Marketplace hosting the ssep plugin \u2014 Super Software Engineering Powers skills for Claude Code",
-    "version": "0.3.0"
+    "version": "0.2.1"
   },
   "plugins": [
     {
       "name": "ssep",
       "source": "./",
       "description": "Super Software Engineering Powers \u2014 specialized skills for spec review, design fidelity, feature completeness, and multi-layer testing. Complements the superpowers plugin.",
-      "version": "0.3.0",
+      "version": "0.2.1",
       "category": "engineering",
       "tags": [
         "spec-review",
diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
index 8e40932..a4a0fca 100644
--- a/.claude-plugin/plugin.json
+++ b/.claude-plugin/plugin.json
@@ -1,8 +1,8 @@
 {
   "name": "ssep",
   "description": "Super Software Engineering Powers \u2014 specialized skills for spec review, design fidelity, feature completeness, and multi-layer testing. Complements the superpowers plugin by filling gaps in spec/design review and integration testing workflows.",
-  "//": "v0.3.0 added explicit anti-skip clauses to all four skill descriptions to push back against agentic 'task looks small, skip the skill' rationalization observed in real sessions.",
-  "version": "0.3.0",
+  "//": "v0.2.1 added explicit anti-skip clauses to all four skill descriptions to push back against agentic 'task looks small, skip the skill' rationalization observed in real sessions.",
+  "version": "0.2.1",
   "author": {
     "name": "bill",
     "email": "bill.han@evar.co.kr"
diff --git a/CHANGELOG.md b/CHANGELOG.md
index ff5aa6b..c3ebd7a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,7 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
-## [0.3.0] — 2026-04-28
+## [0.2.1] — 2026-04-28
 
 ### Changed
 - All four skill descriptions strengthened with explicit "Triggers even when ..." anti-skip clauses, naming the dominant skip rationalization per skill so the description itself pushes back against it:
@@ -19,9 +19,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - `CLAUDE.md` — new "Trigger discipline" section documenting four real-session skip rationalizations with their countering moves, derived from a session retrospective where all four skills should have fired but were each skipped under "task looks small" reasoning. Also notes the convention that future description edits must preserve at least one explicit "Triggers even when ..." clause per skill.
 
-### Why this is a minor (not patch) bump
-- Description matcher behavior changes — each description gained ~150 chars of new trigger surface. This is a behavior-relevant change for the eval loop and downstream agents that cache description hashes, so the version is bumped past patch.
-
 ## [0.2.0] — 2026-04-27
 
 ### Added
diff --git a/CLAUDE.md b/CLAUDE.md
index a6e7eea..cf73169 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -11,7 +11,7 @@ This directory is a Claude Code plugin providing four specialized SE skills (spe
 
 ## Trigger discipline
 
-These skills are most often *skipped* not because the trigger keywords are unclear but because the task looks small in the moment. The descriptions in v0.3.0+ explicitly push back against this rationalization, but contributors editing descriptions or references should preserve and extend the pattern.
+These skills are most often *skipped* not because the trigger keywords are unclear but because the task looks small in the moment. The descriptions in v0.2.1+ explicitly push back against this rationalization, but contributors editing descriptions or references should preserve and extend the pattern.
 
 Real-session skip rationalizations and the right move: