fix(v0.2.1): anti-skip clauses + trigger discipline guidance #6
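This release strengthens invocation discipline itself rather than the
strength of the description matcher. A retrospective of real sessions showed
a pattern in which all four skills had their trigger keywords satisfied, yet
invocation was skipped under the rationalization that "the task looked small."
This change blocks that rationalization explicitly in both the description
bodies and CLAUDE.md.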
### Changes
- Added a "Triggers even when ..." clause to the description in all four SKILL.md files:
  - reviewing-design-fidelity: trivial edits such as a single label swap or
    one CSS rule still warrant invocation
  - reviewing-spec-and-policy: one-sentence requirements are the most common
    source of gaps
  - improving-feature-completeness: a small diff or a manual happy-path
    check does not replace the audit
  - running-integration-tests: a manual click-through is not a codified regression test
- Added a "Trigger discipline" section to CLAUDE.md: a table mapping four
  rationalizations to the correct behavior, plus the conventions future
  contributors must preserve when editing descriptions
- plugin.json / marketplace.json version 0.2.0 → 0.3.0
- CHANGELOG: added a [0.3.0] entry stating the reason for the minor bump
  (change in description matcher behavior)
### Verification
- jq -e: plugin.json and marketplace.json are both valid
- wc -l: all four SKILL.md files are under ~150 lines (96 / 100 / 110 / 114)
- No tab characters in the frontmatter
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
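The verification steps listed in the commit message above could also be scripted. The following is a minimal Python sketch under stated assumptions, not the project's actual lint: the function names are hypothetical, the JSON check only approximates what `jq -e .` confirms, and the frontmatter is assumed to sit between the first two `---` lines of a SKILL.md file.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON (an approximation of `jq -e .`)."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def frontmatter(text: str) -> str:
    """Return the text between the first two '---' lines, or '' if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i])
    return ""


def check_skill_file(text: str, max_lines: int = 150) -> list[str]:
    """Collect problems: too many lines, or a tab inside the frontmatter."""
    problems = []
    # Note: `wc -l` counts newline characters, so the two counts can differ
    # by one when the file has no trailing newline.
    line_count = len(text.splitlines())
    if line_count >= max_lines:
        problems.append(f"{line_count} lines (expected fewer than {max_lines})")
    if "\t" in frontmatter(text):
        problems.append("tab character in frontmatter")
    return problems
```

In use, each SKILL.md would be read from disk and passed to `check_skill_file`; an empty list means both checks passed.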
Code Review
This pull request updates the plugin to version 0.3.0, introducing explicit 'anti-skip' clauses to all four skill descriptions to prevent models from skipping tasks based on perceived simplicity. It also adds a 'Trigger discipline' section to CLAUDE.md and documents these changes in the CHANGELOG. Feedback focuses on maintaining JSON standard compliance by removing non-standard comments in plugin.json and suggests condensing the lengthened skill descriptions to optimize matching efficiency and token usage while preserving the new trigger logic.
| "//": "v0.3.0 added explicit anti-skip clauses to all four skill descriptions to push back against agentic 'task looks small, skip the skill' rationalization observed in real sessions.", | ||
| "version": "0.3.0", |
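The JSON standard does not support comments. A non-standard key such as "//" is ignored by most parsers, but it can cause errors in environments that perform strict schema validation. The background for the version update is already well explained in CHANGELOG.md, so I recommend removing this field to keep the manifest file standards-compliant.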
| "//": "v0.3.0 added explicit anti-skip clauses to all four skill descriptions to push back against agentic 'task looks small, skip the skill' rationalization observed in real sessions.", | |
| "version": "0.3.0", | |
| "version": "0.3.0", |
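As an illustration of the suggestion above: a `"//"` key is syntactically valid JSON (it is an ordinary string key), so a parser accepts it, and the risk lies with consumers that reject unknown keys. The sketch below is hypothetical, not part of the plugin; it assumes `"//"` is the only pseudo-comment key in use and drops it from a parsed manifest.

```python
import json

# Assumption: "//" is the only pseudo-comment key used in the manifests.
COMMENT_KEYS = {"//"}


def strip_comment_keys(manifest: dict) -> dict:
    """Return a copy of the manifest without top-level pseudo-comment keys."""
    return {key: value for key, value in manifest.items() if key not in COMMENT_KEYS}


raw = '{"//": "v0.3.0 added explicit anti-skip clauses", "version": "0.3.0"}'
cleaned = strip_comment_keys(json.loads(raw))  # parsing succeeds: "//" is a normal key
print(json.dumps(cleaned))  # {"version": "0.3.0"}
```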
| --- | ||
| name: improving-feature-completeness | ||
| description: Audits a working feature implementation for the gap between "happy path passes" and "ready to ship to production" — edge cases, error handling, loading/empty/error states, accessibility, responsive behavior, internationalization, observability, and operational hooks. Use when a feature implementation works but needs polish, when user says "완성도 높여줘", "production ready로 만들어줘", "edge case 검토", "polish this feature", "is this ready to ship?", "출시 가능한지 점검", "엣지 케이스 다 챙겼나", "PR 전 점검", "빠진 상태 없나", "ship 전 검토", or before shipping any non-trivial frontend or backend feature. Also triggers proactively right before PR creation on a non-trivial feature branch, even without explicit user request. Distinct from brainstorming (which designs the feature), test-driven-development (which implements it), and requesting-code-review (which evaluates code structure) — focuses specifically on raising the bar of an already-working implementation through a structured production-readiness audit. | ||
| description: Audits a working feature implementation for the gap between "happy path passes" and "ready to ship to production" — edge cases, error handling, loading/empty/error states, accessibility, responsive behavior, internationalization, observability, and operational hooks. Use when a feature implementation works but needs polish, when user says "완성도 높여줘", "production ready로 만들어줘", "edge case 검토", "polish this feature", "is this ready to ship?", "출시 가능한지 점검", "엣지 케이스 다 챙겼나", "PR 전 점검", "빠진 상태 없나", "ship 전 검토", or before shipping any non-trivial frontend or backend feature. Also triggers proactively right before PR creation on a non-trivial feature branch, even without explicit user request. Triggers even when the diff is small or the happy path was verified manually; completeness gaps emerge from state interactions (empty/loading/error/disabled, role variants, locale variants), not from line count, and "I already clicked through it in Playwright" does not substitute for the audit. Distinct from brainstorming (which designs the feature), test-driven-development (which implements it), and requesting-code-review (which evaluates code structure) — focuses specifically on raising the bar of an already-working implementation through a structured production-readiness audit. |
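The skill description is getting very long (roughly 1,000 characters or more). Claude Code's matcher selects skills based on this description, and an overly long description may dilute the weight of the key keywords or consume tokens unnecessarily. I recommend keeping the core intent of the "Triggers even when..." clause while tightening the overall wording to improve efficiency.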
| description: Audits a working feature implementation for the gap between "happy path passes" and "ready to ship to production" — edge cases, error handling, loading/empty/error states, accessibility, responsive behavior, internationalization, observability, and operational hooks. Use when a feature implementation works but needs polish, when user says "완성도 높여줘", "production ready로 만들어줘", "edge case 검토", "polish this feature", "is this ready to ship?", "출시 가능한지 점검", "엣지 케이스 다 챙겼나", "PR 전 점검", "빠진 상태 없나", "ship 전 검토", or before shipping any non-trivial frontend or backend feature. Also triggers proactively right before PR creation on a non-trivial feature branch, even without explicit user request. Triggers even when the diff is small or the happy path was verified manually; completeness gaps emerge from state interactions (empty/loading/error/disabled, role variants, locale variants), not from line count, and "I already clicked through it in Playwright" does not substitute for the audit. Distinct from brainstorming (which designs the feature), test-driven-development (which implements it), and requesting-code-review (which evaluates code structure) — focuses specifically on raising the bar of an already-working implementation through a structured production-readiness audit. | |
| description: Audits a working feature implementation for production readiness (edge cases, error handling, a11y, i18n, observability). Use when a feature works but needs polish, or when user says "완성도 높여줘", "production ready로 만들어줘", "edge case 검토", "polish this feature", "is this ready to ship?", "출시 가능한지 점검", "엣지 케이스 다 챙겼나", "PR 전 점검", "빠진 상태 없나", "ship 전 검토". Triggers proactively before PRs. Triggers even when the diff is small or happy path was verified manually; completeness gaps emerge from state interactions (empty/loading/error, roles, locales), not line count. "I already clicked through it" is not a substitute for this audit. Distinct from brainstorming, TDD, and code review. |
| --- | ||
| name: reviewing-design-fidelity | ||
| description: Reviews implemented UI against design source-of-truth (Figma frames, mockup images, design system tokens) to verify visual fidelity — spacing, typography, color, responsive behavior, interaction states, and accessibility. Use when comparing built pages to design specs, when a Figma URL is shared alongside a running URL or screenshot, or when user says "디자인 검수", "퍼블리싱 검토", "compare with figma", "check design fidelity", "픽셀 비교", "design QA", "피그마 비교해서 차이점 찾아줘", "figma vs 현재 화면", "구현이 디자인과 맞는지", "피그마대로 되어있나", "현황과 비교", "figma url 과 현재 화면 대조". Also triggers when user shares a Figma URL and asks about existing implementation (comparison implied even without explicit keyword). Skill captures both sides (Figma extraction + Playwright snapshot of live impl), runs a structured visual diff, and reports gaps prioritized by user-visible impact. Distinct from reviewing-spec-and-policy (which audits docs) and from code review (which audits code structure). | ||
| description: Reviews implemented UI against design source-of-truth (Figma frames, mockup images, design system tokens) to verify visual fidelity — spacing, typography, color, responsive behavior, interaction states, and accessibility. Use when comparing built pages to design specs, when a Figma URL is shared alongside a running URL or screenshot, or when user says "디자인 검수", "퍼블리싱 검토", "compare with figma", "check design fidelity", "픽셀 비교", "design QA", "피그마 비교해서 차이점 찾아줘", "figma vs 현재 화면", "구현이 디자인과 맞는지", "피그마대로 되어있나", "현황과 비교", "figma url 과 현재 화면 대조". Also triggers when user shares a Figma URL and asks about existing implementation (comparison implied even without explicit keyword). Triggers even when the requested change appears trivial — single label swap, one CSS rule, one-line JSX edit — because apparent simplicity is the most common rationalization for skipping fidelity checks, and the skill itself decides scope (full audit vs quick check) faster than the caller can. Skill captures both sides (Figma extraction + Playwright snapshot of live impl), runs a structured visual diff, and reports gaps prioritized by user-visible impact. Distinct from reviewing-spec-and-policy (which audits docs) and from code review (which audits code structure). |
The skill description is getting very long. It would be better to keep the key keywords and the 'anti-skip' intent while tightening the sentences to improve matching efficiency.
| description: Reviews implemented UI against design source-of-truth (Figma frames, mockup images, design system tokens) to verify visual fidelity — spacing, typography, color, responsive behavior, interaction states, and accessibility. Use when comparing built pages to design specs, when a Figma URL is shared alongside a running URL or screenshot, or when user says "디자인 검수", "퍼블리싱 검토", "compare with figma", "check design fidelity", "픽셀 비교", "design QA", "피그마 비교해서 차이점 찾아줘", "figma vs 현재 화면", "구현이 디자인과 맞는지", "피그마대로 되어있나", "현황과 비교", "figma url 과 현재 화면 대조". Also triggers when user shares a Figma URL and asks about existing implementation (comparison implied even without explicit keyword). Triggers even when the requested change appears trivial — single label swap, one CSS rule, one-line JSX edit — because apparent simplicity is the most common rationalization for skipping fidelity checks, and the skill itself decides scope (full audit vs quick check) faster than the caller can. Skill captures both sides (Figma extraction + Playwright snapshot of live impl), runs a structured visual diff, and reports gaps prioritized by user-visible impact. Distinct from reviewing-spec-and-policy (which audits docs) and from code review (which audits code structure). | |
| description: Reviews implemented UI against design source-of-truth (Figma, tokens) for visual fidelity (spacing, typography, color, a11y). Use when comparing pages to specs, or when user says "디자인 검수", "퍼블리싱 검토", "compare with figma", "check design fidelity", "픽셀 비교", "design QA", "피그마 비교해서 차이점 찾아줘", "figma vs 현재 화면", "구현이 디자인과 맞는지", "피그마대로 되어있나", "현황과 비교", "figma url 과 현재 화면 대조". Triggers even for trivial changes (single label swap, one CSS rule) as apparent simplicity often leads to skipped checks; the skill decides scope faster than the caller. Captures Figma + Playwright snapshots for structured visual diff. Distinct from spec review and code review. |
| --- | ||
| name: reviewing-spec-and-policy | ||
| description: Reviews product specifications, requirements documents, RFCs, PRDs, and policy documents from text, markdown, or Figma sources. Use when evaluating specs for completeness, internal consistency, edge-case coverage, ambiguity, conflicts with prior policies, or compliance gaps. Triggers on user requests like "이 기획서 검토해줘", "스펙 리뷰", "정책 검토", "review this PRD", "is this requirement complete?", "check this against [policy/RFC]", "이 기획대로 구현하면 빠진게 뭐가 있을까", "스펙 갭", "요구사항 누락", "기획서 애매한 부분", or whenever a Figma URL is shared in a planning/spec context (before implementation begins). Skill reads sources, runs a structured multi-dimensional review, and returns a prioritized gap report. Intentionally distinct from brainstorming (which generates new designs) and from reviewing-design-fidelity (which compares implementation vs design after code exists); this audits an existing artifact. | ||
| description: Reviews product specifications, requirements documents, RFCs, PRDs, and policy documents from text, markdown, or Figma sources. Use when evaluating specs for completeness, internal consistency, edge-case coverage, ambiguity, conflicts with prior policies, or compliance gaps. Triggers on user requests like "이 기획서 검토해줘", "스펙 리뷰", "정책 검토", "review this PRD", "is this requirement complete?", "check this against [policy/RFC]", "이 기획대로 구현하면 빠진게 뭐가 있을까", "스펙 갭", "요구사항 누락", "기획서 애매한 부분", or whenever a Figma URL is shared in a planning/spec context (before implementation begins). Triggers even when the requirement statement reads as a single concise sentence (e.g. "X should default to inactive and be activatable from detail"); concise requirements typically hide unstated questions about default values for existing data, role/permission interactions, and state-transition edge cases that a structured audit surfaces in minutes. Skill reads sources, runs a structured multi-dimensional review, and returns a prioritized gap report. Intentionally distinct from brainstorming (which generates new designs) and from reviewing-design-fidelity (which compares implementation vs design after code exists); this audits an existing artifact. |
The skill description is getting very long. It would be better to keep the key keywords and the 'anti-skip' intent while tightening the sentences to improve matching efficiency.
| description: Reviews product specifications, requirements documents, RFCs, PRDs, and policy documents from text, markdown, or Figma sources. Use when evaluating specs for completeness, internal consistency, edge-case coverage, ambiguity, conflicts with prior policies, or compliance gaps. Triggers on user requests like "이 기획서 검토해줘", "스펙 리뷰", "정책 검토", "review this PRD", "is this requirement complete?", "check this against [policy/RFC]", "이 기획대로 구현하면 빠진게 뭐가 있을까", "스펙 갭", "요구사항 누락", "기획서 애매한 부분", or whenever a Figma URL is shared in a planning/spec context (before implementation begins). Triggers even when the requirement statement reads as a single concise sentence (e.g. "X should default to inactive and be activatable from detail"); concise requirements typically hide unstated questions about default values for existing data, role/permission interactions, and state-transition edge cases that a structured audit surfaces in minutes. Skill reads sources, runs a structured multi-dimensional review, and returns a prioritized gap report. Intentionally distinct from brainstorming (which generates new designs) and from reviewing-design-fidelity (which compares implementation vs design after code exists); this audits an existing artifact. | |
| description: Reviews product specs, PRDs, and policies for completeness, consistency, and edge cases. Use for "이 기획서 검토해줘", "스펙 리뷰", "정책 검토", "review this PRD", "is this requirement complete?", "스펙 갭", "요구사항 누락", "기획서 애매한 부분", or when a Figma URL is shared in planning. Triggers even for single-sentence requirements (e.g. "X should default to inactive"); concise statements often hide unstated edge cases regarding default values, permissions, and state transitions. Returns a prioritized gap report. Distinct from brainstorming and design fidelity review. |
| --- | ||
| name: running-integration-tests | ||
| description: Authors and executes tests across all levels of the test pyramid — unit, frontend-backend integration, and end-to-end browser scenarios — using Playwright MCP and standard test runners (jest, vitest, mocha). Use when verifying multi-layer behavior or when user requests "e2e 테스트", "integration 테스트", "통합 테스트", "playwright로 검증", "브라우저에서 직접 확인", "백엔드-프론트 연동 테스트", "API 통합 테스트", "staging에서 확인", "staging 검증", "배포 후 검증", "실제 화면에서 확인", "smoke test", or when a feature crosses service boundaries (API + UI + DB). Also triggers when user asks to verify a deployed feature or when a PR test plan has unchecked browser/API verification items. Helps decide which test level fits, sets up infrastructure, writes tests, runs them, reports failures with diagnostic context. Distinct from superpowers:test-driven-development (unit-level TDD); this handles integration and e2e tiers. | ||
| description: Authors and executes tests across all levels of the test pyramid — unit, frontend-backend integration, and end-to-end browser scenarios — using Playwright MCP and standard test runners (jest, vitest, mocha). Use when verifying multi-layer behavior or when user requests "e2e 테스트", "integration 테스트", "통합 테스트", "playwright로 검증", "브라우저에서 직접 확인", "백엔드-프론트 연동 테스트", "API 통합 테스트", "staging에서 확인", "staging 검증", "배포 후 검증", "실제 화면에서 확인", "smoke test", or when a feature crosses service boundaries (API + UI + DB). Also triggers when user asks to verify a deployed feature or when a PR test plan has unchecked browser/API verification items. Triggers even when only one boundary is crossed in the change and a manual Playwright click was already performed; manual verification proves the path works once but does not produce a codified regression test, and integration-tier surprises cluster at exactly the boundary that "looked fine" during manual checks. Helps decide which test level fits, sets up infrastructure, writes tests, runs them, reports failures with diagnostic context. Distinct from superpowers:test-driven-development (unit-level TDD); this handles integration and e2e tiers. |
The skill description is getting very long. It would be better to keep the key keywords and the 'anti-skip' intent while tightening the sentences to improve matching efficiency.
| description: Authors and executes tests across all levels of the test pyramid — unit, frontend-backend integration, and end-to-end browser scenarios — using Playwright MCP and standard test runners (jest, vitest, mocha). Use when verifying multi-layer behavior or when user requests "e2e 테스트", "integration 테스트", "통합 테스트", "playwright로 검증", "브라우저에서 직접 확인", "백엔드-프론트 연동 테스트", "API 통합 테스트", "staging에서 확인", "staging 검증", "배포 후 검증", "실제 화면에서 확인", "smoke test", or when a feature crosses service boundaries (API + UI + DB). Also triggers when user asks to verify a deployed feature or when a PR test plan has unchecked browser/API verification items. Triggers even when only one boundary is crossed in the change and a manual Playwright click was already performed; manual verification proves the path works once but does not produce a codified regression test, and integration-tier surprises cluster at exactly the boundary that "looked fine" during manual checks. Helps decide which test level fits, sets up infrastructure, writes tests, runs them, reports failures with diagnostic context. Distinct from superpowers:test-driven-development (unit-level TDD); this handles integration and e2e tiers. | |
| description: Authors and executes tests across the pyramid (unit, integration, e2e) using Playwright and standard runners. Use for "e2e 테스트", "integration 테스트", "통합 테스트", "playwright로 검증", "브라우저에서 직접 확인", "API 통합 테스트", "staging 검증", "smoke test", or when features cross service boundaries. Triggers even when only one boundary is crossed and manual verification was performed; manual checks don't produce codified regressions, and surprises often cluster at "simple" boundaries. Helps choose test level and provides diagnostic context on failure. Distinct from unit-level TDD. |
Lint workflow flagged all 4 descriptions over the 1024 char limit after the v0.3.0 anti-skip clause additions. Trimmed redundant Korean trigger phrases and tightened English phrasing while preserving the new "Triggers even when ..." clause (the value-add of v0.3.0).
Final lengths:
- reviewing-design-fidelity: 989 / 1024
- reviewing-spec-and-policy: 947 / 1024
- improving-feature-completeness: 962 / 1024
- running-integration-tests: 1014 / 1024
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
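The length limit mentioned in this commit could be checked locally before pushing. This is a rough sketch, not the repository's lint workflow: the helper names are hypothetical, the `description:` value is assumed to sit on a single frontmatter line, and the limit is assumed to count characters rather than bytes.

```python
MAX_DESCRIPTION_CHARS = 1024  # limit quoted in the commit message


def description_of(frontmatter_text: str) -> str:
    """Naively extract a single-line `description:` value from frontmatter."""
    prefix = "description:"
    for line in frontmatter_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return ""


def over_limit(description: str, limit: int = MAX_DESCRIPTION_CHARS) -> bool:
    """Return True when the description exceeds the character limit."""
    return len(description) > limit
```

If the real lint measures bytes, the Korean trigger phrases would weigh more than their character count suggests, so the comparison would need `len(description.encode("utf-8"))` instead.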
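Re-evaluated the release impact: adding anti-skip clauses to the skill descriptions does not add new trigger keywords; it reinforces the strength of the existing triggers. Because this is a wording reinforcement of the description text rather than a downstream behavior change, a patch-level bump is appropriate. Removed the "Why minor" justification section from the CHANGELOG.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>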
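Summary
ssep v0.3.0: adds a "Triggers even when ..." anti-skip clause to the descriptions of all four skills and adds a Trigger discipline section to CLAUDE.md. The goal is to strengthen invocation discipline itself, not the trigger strength (keywords) of the description matcher.
Motivation (actual session retrospective)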
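All four of the following cases occurred in the previous session:
- reviewing-design-fidelity
- reviewing-spec-and-policy
- improving-feature-completeness
- running-integration-tests
The trigger keywords in the descriptions were already sufficient (a Figma URL, "디자인 검수", and so on), yet all four were skipped under a single rationalization: "the task looked small." Embedding a clause in the description itself that directly counters that rationalization is the most effective line of defense.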
AS-IS → TO-BE
Skill descriptions (4)
One clause added to each description:
- reviewing-design-fidelity
- reviewing-spec-and-policy
- improving-feature-completeness
- running-integration-tests
CLAUDE.md: new Trigger discipline section
A table mapping four rationalizations to the correct behavior, plus a convention to preserve when descriptions are edited in the future (keep at least one "Triggers even when ..." clause in each description).
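The preservation convention described above (each description keeps at least one anti-skip clause) lends itself to a mechanical check. The sketch below is an assumption-laden illustration, not something this PR ships: the function names are hypothetical, and it matches the looser prefix "Triggers even" case-insensitively so that a condensed wording such as "Triggers even for trivial changes" would also count.

```python
# Assumption: any sentence starting with this phrase counts as an anti-skip clause.
ANTI_SKIP_MARKER = "triggers even"


def has_anti_skip_clause(description: str) -> bool:
    """Return True if the description contains an anti-skip clause marker."""
    return ANTI_SKIP_MARKER in description.lower()


def missing_anti_skip(descriptions: dict[str, str]) -> list[str]:
    """Return the sorted names of skills whose description lacks the clause."""
    return sorted(
        name
        for name, description in descriptions.items()
        if not has_anti_skip_clause(description)
    )
```

A contributor could run `missing_anti_skip` over a mapping of skill name to description and treat a non-empty result as a convention violation.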
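Version
plugin.json / marketplace.json 0.2.0 → 0.3.0 (minor bump); the reason is stated in the CHANGELOG.md [0.3.0] entry.
Test plan
- jq -e . : plugin.json and marketplace.json are both valid
- wc -l skills/*/SKILL.md: 96 / 100 / 110 / 114 (all under 150)
Follow-ups (outside this PR's scope)
- Whether to apply this to the superpowers skills is a separate decision (they have a different owner).
🤖 Generated with Claude Code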