feat: /codex skill — multi-AI second opinion + proactive suggestions #197

Merged
garrytan merged 18 commits into main from garrytan/codex-review-skill
Mar 19, 2026
@garrytan commented Mar 19, 2026

Summary

  • /codex skill — three modes: review (diff review with pass/fail gate), challenge (adversarial — tries to break your code), and consult (ask anything with session continuity)
  • Cross-model analysis — when both /review and /codex review run, shows which findings overlap and which are unique to each AI
  • Integrated into /review, /ship, /plan-eng-review — Codex second opinion offered after Claude's own review, optional gate in ship, plan critique before eng review
  • Proactive skill suggestions — gstack notices your development stage and suggests the right skill; opt out with "stop suggesting"
  • Trigger phrase validation tests — ensures all skills have "Use when" and "Proactively suggest" phrases for reliable NLP routing
  • Bug fixes from Codex adversarial challenge: scoped plan lookup (cross-project leak), mktemp for stderr (race condition), quoted path variables, .context/ gitignored (session ID leak), ARG_MAX-safe plan review
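
The cross-model analysis in the second bullet can be pictured as a set comparison. The sketch below is illustrative only: the `Finding` shape, the `compareFindings` name, and the rule that two findings match when they share a file and line are all assumptions, not something this PR defines (the skills are SKILL.md templates and emit prose).

```typescript
// Hypothetical finding shape, assumed for this sketch.
interface Finding {
  file: string;
  line: number;
  summary: string;
}

interface Comparison {
  both: Finding[];       // reported by both reviewers
  claudeOnly: Finding[]; // unique to /review
  codexOnly: Finding[];  // unique to /codex review
}

// Assumption: two findings are "the same" when they point at the same file and line.
const keyOf = (f: Finding): string => `${f.file}:${f.line}`;

function compareFindings(claude: Finding[], codex: Finding[]): Comparison {
  const claudeKeys = new Set(claude.map(keyOf));
  const codexKeys = new Set(codex.map(keyOf));
  return {
    both: claude.filter((f) => codexKeys.has(keyOf(f))),
    claudeOnly: claude.filter((f) => !codexKeys.has(keyOf(f))),
    codexOnly: codex.filter((f) => !claudeKeys.has(keyOf(f))),
  };
}
```

Findings that land in `both` are the ones two independent models agreed on, which is arguably the strongest signal a second opinion can give.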

Pre-Landing Review

No issues found. All changes are SKILL.md templates, test files, gen-skill-docs.ts, .gitignore, and generated SKILL.md files.

Test plan

  • All unit/validation tests pass (bun test — 0 failures)
  • Codex review PASS (3 P2 findings, all fixed)
  • Codex adversarial challenge run (4 critical/high, 6 medium — all addressed)
  • Merge conflicts with main resolved (careful/freeze/guard/unfreeze skills)
  • gen:skill-docs regenerates all 22 SKILL.md files successfully

🤖 Generated with Claude Code

garrytan and others added 18 commits March 18, 2026 21:11
…ult)

Three modes: code review with pass/fail gate, adversarial challenge mode,
and conversational consult with session continuity. First multi-AI skill
in gstack, wrapping OpenAI's Codex CLI.

/review offers Codex second opinion after completing its own review.
/ship offers Codex review as optional gate before pushing.
/plan-eng-review offers Codex plan critique after scope challenge.
Review Readiness Dashboard shows Codex Review as optional row.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Stub tests (free tier): verify template content — three modes, gate verdict,
session continuity, cost tracking, cross-model comparison, binary discovery,
error handling, mktemp usage, and integrations into /review, /ship, /plan-eng-review.

E2E test (paid tier): runs /codex review on vulnerable fixture repo via
session-runner, verifies output contains findings and GATE verdict.

Codex authenticates via ChatGPT OAuth (codex login), not an env var.
gpt-5.2-codex is the only model available with ChatGPT login.

All commands now use model_reasoning_effort="high" for maximum
depth — the whole point is a thorough second opinion.
…e) + web search

Review and consult use high reasoning — thorough but not slow.
Challenge (adversarial) uses xhigh — maximum depth for breaking code.
All modes enable web_search_cached so Codex can look up docs/APIs.
Use --json flag to parse codex's JSONL events, extracting reasoning
traces ([codex thinking]), tool calls ([codex ran]), and token counts.
This gives richer output than the -o flag alone — you can see what
codex thought through before its answer.
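
A rough sketch of what parsing that JSONL stream could look like. The event shapes here (`reasoning`, `command`, and `usage` types with `text`, `command`, `input_tokens`, and `output_tokens` fields) are assumptions made for illustration, and `summarizeJsonl` is a hypothetical helper; check the field names against real `--json` output before relying on them.

```typescript
interface Summary {
  lines: string[];     // human-readable trace lines
  totalTokens: number; // input + output tokens across all usage events
}

function summarizeJsonl(jsonl: string): Summary {
  const lines: string[] = [];
  let totalTokens = 0;

  for (const raw of jsonl.split("\n")) {
    const trimmed = raw.trim();
    if (trimmed === "") continue;

    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (typeof parsed !== "object" || parsed === null) continue;

    const event = parsed as Record<string, unknown>;
    const type = event["type"];
    const text = event["text"];
    const command = event["command"];

    // Assumed event types; the real stream may use different names.
    if (type === "reasoning" && typeof text === "string") {
      lines.push(`[codex thinking] ${text}`);
    } else if (type === "command" && typeof command === "string") {
      lines.push(`[codex ran] ${command}`);
    } else if (type === "usage") {
      const input = event["input_tokens"];
      const output = event["output_tokens"];
      totalTokens += (typeof input === "number" ? input : 0)
        + (typeof output === "number" ? output : 0);
    }
  }
  return { lines, totalTokens };
}
```

Unknown event types and malformed lines are skipped rather than treated as errors, so a CLI upgrade that adds new events would degrade the trace instead of breaking the skill.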

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Don't write a codex-review entry to reviews.jsonl when only the
adversarial challenge (option B) was selected — there's no gate
verdict to record, and a false entry misleads the Review Readiness
Dashboard into thinking a code review happened.
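
The rule in this commit amounts to a guard in front of the log write. A minimal sketch, assuming a hypothetical `buildLogEntry` helper and an entry shape with `skill` and `gate` fields (the real reviews.jsonl schema is not shown in this PR):

```typescript
type Mode = "review" | "challenge";
type Gate = "PASS" | "FAIL";

// Hypothetical entry shape; the actual reviews.jsonl fields may differ.
interface ReviewLogEntry {
  skill: "codex-review";
  gate: Gate;
}

// Returns the entry to append to reviews.jsonl, or null when nothing should be written.
function buildLogEntry(modesRun: Mode[], gate: Gate | null): ReviewLogEntry | null {
  // A challenge-only run produces no gate verdict, so there is nothing to record.
  if (!modesRun.includes("review") || gate === null) return null;
  return { skill: "codex-review", gate };
}
```

Returning `null` instead of a placeholder entry keeps the dashboard from counting a run that never produced a verdict.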

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After scope challenge (Step 0), offer to have Codex independently
review the plan with a brutally honest tech reviewer persona.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ng, stderr

- plan-eng-review: Codex now reads the plan file itself instead of inlining
  content as a CLI arg (avoids ARG_MAX for large plans)
- review: add missing echo to persist codex-review results to reviews.jsonl
- codex: consult mode uses $TMPERR (mktemp) instead of hardcoded stderr path
- codex + review: quote $SLUG/$BRANCH_SLUG in review log paths
- codex: scope plan lookup to current project, warn on cross-project fallback
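
The last fix (scoped plan lookup with a cross-project warning) could look roughly like the following. This is a sketch under stated assumptions: `PlanFile`, `findPlan`, and the "newest plan wins" ordering are hypothetical, since the PR does not show how plans are stored or ranked.

```typescript
// Hypothetical description of a plan file on disk.
interface PlanFile {
  project: string;    // project the plan belongs to
  path: string;
  modifiedAt: number; // epoch milliseconds
}

interface PlanLookup {
  plan: PlanFile | null;
  warning: string | null;
}

function findPlan(plans: PlanFile[], currentProject: string): PlanLookup {
  // Assumption: the most recently modified plan is the preferred one.
  const newestFirst = [...plans].sort((a, b) => b.modifiedAt - a.modifiedAt);

  // Prefer a plan that belongs to the current project.
  const scoped = newestFirst.find((p) => p.project === currentProject);
  if (scoped !== undefined) return { plan: scoped, warning: null };

  // No plans at all: nothing to return and nothing to warn about.
  if (newestFirst.length === 0) return { plan: null, warning: null };

  // Cross-project fallback: still usable, but the caller is told about it.
  const fallback = newestFirst[0];
  return {
    plan: fallback,
    warning: `No plan found for ${currentProject}; using ${fallback.path} from ${fallback.project}`,
  };
}
```

Surfacing the fallback as a warning, rather than silently returning another project's plan, is what closes the cross-project leak described in the summary.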

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex consult mode stores session IDs in .context/codex-session-id.
Without this ignore rule, session IDs could leak into commits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Preamble reads proactive config via gstack-config
- Root SKILL.md.tmpl has lifecycle map (stage → skill suggestion)
- Users can opt out ("stop suggesting") / opt in ("be proactive again")
- Restored trigger phrase validation tests (16 skills × "Use when" check)
- Added missing "Use when" trigger phrases to /debug and /office-hours
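
The trigger phrase validation is essentially a substring check over each skill's documentation. A minimal sketch, assuming the two phrases named in the PR summary; `missingTriggerPhrases` is a hypothetical helper, not the test code from this PR:

```typescript
// Phrases the PR summary says every skill needs for reliable routing.
const REQUIRED_PHRASES: string[] = ["Use when", "Proactively suggest"];

// Returns the required phrases that do not appear in a skill's SKILL.md text.
function missingTriggerPhrases(skillDoc: string): string[] {
  return REQUIRED_PHRASES.filter((phrase) => !skillDoc.includes(phrase));
}

// Example: a skill description that has only one of the two phrases.
const exampleDoc = "Use when the user asks for a second opinion on a diff.";
console.log(missingTriggerPhrases(exampleDoc));
```

In a test suite this would typically run once per skill, failing with the skill name and the list of missing phrases.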

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lls)

Merged main which added /careful, /freeze, /guard, /unfreeze skills,
analytics tracking, proactive suggest phrases, and dirty-tree handling.
Resolved conflicts by keeping both sides: codex + new safety skills in
template list, deduplicated proactive config in preamble, merged trigger
phrase tests with proactive phrase tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@garrytan changed the title from "feat: /codex skill — multi-AI second opinion platform (v0.8.0)" to "feat: /codex skill — multi-AI second opinion + proactive suggestions" on Mar 19, 2026
@garrytan merged commit d852330 into main on Mar 19, 2026