[codex] Slim watchlist skill runtime #10

Merged: dd3ok merged 2 commits into main from codex/watchlist-skill-progressive-disclosure on May 16, 2026

Conversation

@dd3ok (Owner) commented May 16, 2026

Summary

  • Slim watchlist-md runtime instructions and move lifecycle/safety detail into progressive-disclosure references.
  • Add a bundled standalone WATCHLIST validator and CI smoke coverage for installed-skill validation.
  • Move the starter watchlist into examples/, ignore generated .watchlist/WATCHLIST.md, and expand negative trigger eval coverage.

Validation

  • python3 -m unittest discover -s evals -p 'test_*.py'
  • python3 evals/check_release_metadata.py
  • python3 evals/check_policy_markers.py
  • python3 evals/check_semantic_cases.py
  • python3 evals/check_watchlist.py examples/WATCHLIST.example.md --strict-format --strict-safety --require-archive-section
  • python3 evals/check_watchlist.py .agents/skills/watchlist-md/assets/WATCHLIST.template.md --strict-format --strict-safety --require-archive-section
  • python3 .agents/skills/watchlist-md/scripts/validate_watchlist.py .agents/skills/watchlist-md/assets/WATCHLIST.template.md --strict-format --strict-safety --require-archive-section
  • git diff --check
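
The checks above can be chained so that the first failure stops the run. The following is a minimal sketch, assuming it is started from the repository root; the commands are copied from the list, while the names COMMANDS and run_all are illustrative and not part of the PR.

```python
import subprocess

# Flags and template path exactly as they appear in the Validation list.
FLAGS = ["--strict-format", "--strict-safety", "--require-archive-section"]
TEMPLATE = ".agents/skills/watchlist-md/assets/WATCHLIST.template.md"

COMMANDS = [
    ["python3", "-m", "unittest", "discover", "-s", "evals", "-p", "test_*.py"],
    ["python3", "evals/check_release_metadata.py"],
    ["python3", "evals/check_policy_markers.py"],
    ["python3", "evals/check_semantic_cases.py"],
    ["python3", "evals/check_watchlist.py", "examples/WATCHLIST.example.md", *FLAGS],
    ["python3", "evals/check_watchlist.py", TEMPLATE, *FLAGS],
    ["python3", ".agents/skills/watchlist-md/scripts/validate_watchlist.py", TEMPLATE, *FLAGS],
    ["git", "diff", "--check"],
]


def run_all(commands=COMMANDS, runner=subprocess.run):
    """Run each command in order; return the first failing command, or None."""
    for cmd in commands:
        result = runner(cmd)
        if result.returncode != 0:
            return cmd
    return None
```

Calling run_all() from the repository root and exiting non-zero when it returns a command would approximate a local pre-merge gate. The runner parameter exists only so the loop can be exercised without a checkout.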

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the watchlist-md skill by moving detailed lifecycle and safety documentation into dedicated reference files and introducing a new validation script (validate_watchlist.py). It updates the item ID format to WL-YYYYMMDD-NNN across the skill definition, prompts, and evaluation cases. Additionally, the repository structure is updated to ignore local .watchlist files and provide clearer examples. Feedback was provided regarding the scan_safety function in the new validation script, which is currently unused dead code.
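
To make the WL-YYYYMMDD-NNN item ID format concrete, here is a minimal sketch of a format check. It is not the logic in validate_watchlist.py, which this thread does not show. It assumes NNN is exactly three digits and that YYYYMMDD must be a real calendar date; is_valid_item_id is a hypothetical name.

```python
import re
from datetime import datetime

# Assumed reading of the format: literal "WL-", 8 date digits, "-", 3 sequence digits.
ID_PATTERN = re.compile(r"WL-([0-9]{8})-([0-9]{3})")


def is_valid_item_id(item_id: str) -> bool:
    """Return True if item_id looks like WL-YYYYMMDD-NNN with a real date."""
    match = ID_PATTERN.fullmatch(item_id)
    if match is None:
        return False
    try:
        # Rejects impossible dates such as month 13 or day 40.
        datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return False
    return True
```

Under these assumptions, WL-20260516-001 passes, while WL-20261340-001 (impossible date) and WL-20260516-1 (short sequence number) are rejected.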

Comment thread .agents/skills/watchlist-md/scripts/validate_watchlist.py Outdated
@dd3ok dd3ok marked this pull request as ready for review May 16, 2026 07:15
@dd3ok dd3ok merged commit 2988e37 into main May 16, 2026
4 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 019d2da6d3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +47 to +48
Add an item only when the user explicitly asks to record a future, time-gated, or
event-gated check, or has opted into pre-authorized watchlist recording. If the

P2: Preserve implicit deferred-check adds

For the existing semantic case evals/cases/add-kr-02.json, the prompt "배포가 방금 시작됐어. 30분 뒤에 에러 로그 확인해야 해." ("The deployment just started. I need to check the error logs in 30 minutes.") is still expected to trigger add_item, but it does not explicitly ask to "record" anything. This new "only when the user explicitly asks to record" rule tells agents to skip that supported event-gated deferred-check flow, so users who state that a deployment/log check must be done later will no longer get a watchlist entry unless they know the exact WATCHLIST wording.
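
The conflict described above can be illustrated with a toy comparison. This is a sketch under stated assumptions, not how the skill or its evals decide: the CASE dictionary is a hypothetical stand-in for the case file (its real schema is not shown here), and the keyword list, the delay pattern, and the "no_action" label are invented for illustration. Only the prompt and the expected add_item outcome come from the comment.

```python
import re

# Hypothetical stand-in for evals/cases/add-kr-02.json; real schema unknown.
CASE = {
    "prompt": "배포가 방금 시작됐어. 30분 뒤에 에러 로그 확인해야 해.",
    "expected": "add_item",
}

# Assumed keywords that would count as an explicit request to record.
EXPLICIT_WORDS = ("watchlist", "record", "기록")

# Assumed pattern for a Korean relative delay such as "30분 뒤" (30 minutes later).
DELAY = re.compile(r"[0-9]+\s*(?:분|시간)\s*(?:뒤|후)")


def explicit_only(prompt: str) -> str:
    """Add an item only when the prompt explicitly asks to record."""
    lowered = prompt.lower()
    if any(word in lowered for word in EXPLICIT_WORDS):
        return "add_item"
    return "no_action"


def deferred_aware(prompt: str) -> str:
    """Also add when the prompt states a delayed check ("확인" means check)."""
    if explicit_only(prompt) == "add_item":
        return "add_item"
    if DELAY.search(prompt) and "확인" in prompt:
        return "add_item"
    return "no_action"
```

With these toy rules, explicit_only misses the case prompt because it contains no recording keyword, while deferred_aware matches the expected add_item. That is the gap the comment asks the new wording to close.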
