feat(skillopt): implement SkillOpt skill-document optimizer by wesleysimplicio · Pull Request #75 · wesleysimplicio/llm-project-mapper

wesleysimplicio · 2026-05-26T01:36:54Z

Summary

Implements SkillOpt (Microsoft Research — Executive Strategy for Self-Evolving Agent Skills) as a tool for this repo's .skills/ ecosystem. SkillOpt treats a natural-language skill document as the only trainable artifact and optimizes it for a frozen target model through a four-stage loop:

Rollout — score the current skill on the train split.
Reflect — turn failure/success batches into candidate edits.
Edit — apply up to budget add/delete/replace ops (the textual learning rate), skipping any op already in the rejected-edit buffer.
Gate — accept a candidate only if it improves a held-out split; otherwise its edits are buffered as negative feedback.
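
As a rough illustration of the four stages above, the loop can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in scripts/skillopt/engine.js: the names optimizeSketch, score, and meanScore, the task shape ({ directives: [...] }), and the substring-based toy scorer are all hypothetical, and only "add" edits are modelled.

```javascript
// Hypothetical sketch of a Rollout -> Reflect -> Edit -> Gate loop.
// Names and data shapes are assumptions, not the API of scripts/skillopt/engine.js.

// Toy scorer: fraction of a task's directives that appear verbatim in the skill.
function score(skill, task) {
  if (task.directives.length === 0) return 1;
  const hits = task.directives.filter((d) => skill.includes(d)).length;
  return hits / task.directives.length;
}

function meanScore(skill, tasks) {
  if (tasks.length === 0) return 0;
  return tasks.reduce((sum, t) => sum + score(skill, t), 0) / tasks.length;
}

function optimizeSketch(skill, train, holdout, { rounds = 3, budget = 2 } = {}) {
  let best = skill;
  let bestGate = meanScore(best, holdout);
  const rejected = new Set(); // rejected-edit buffer (negative feedback)

  for (let round = 0; round < rounds; round++) {
    // Rollout: find train tasks the current skill does not fully satisfy.
    const failures = train.filter((t) => score(best, t) < 1);

    // Reflect: propose one "add" edit per missing directive (deduplicated).
    const missing = new Set();
    for (const t of failures) {
      for (const d of t.directives) if (!best.includes(d)) missing.add(d);
    }

    // Edit: skip buffered edits, then apply at most `budget` of the rest.
    const edits = [...missing].filter((d) => !rejected.has(d)).slice(0, budget);
    if (edits.length === 0) break;
    const candidate = [best, ...edits].join("\n");

    // Gate: accept only a strict improvement on the held-out split.
    const candidateGate = meanScore(candidate, holdout);
    if (candidateGate > bestGate) {
      best = candidate;
      bestGate = candidateGate;
    } else {
      edits.forEach((d) => rejected.add(d));
    }
  }
  return { best, bestGate, rejected: [...rejected] };
}

// Usage with made-up tasks.
const train = [{ directives: ["Cite sources", "Use metric units"] }];
const holdout = [{ directives: ["Use metric units"] }];
const result = optimizeSketch("# Skill\nCite sources", train, holdout);
console.log(result.bestGate, result.rejected.length); // prints: 1 0
```

In this sketch the budget caps how much the document can change per round, and a candidate that fails the gate only feeds the rejected buffer, so the same edit is not proposed again.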

Output is best_skill.md, plus an optional run report and a content-addressed receipt under .catalog/receipts/ (matching the repo's receipt schema).
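
For readers unfamiliar with content-addressed receipts, the idea can be sketched like this: the file name is derived from a hash of the receipt's own bytes, so identical content always maps to the same path. The receipt fields, the SHA-256 choice, the <digest>.json naming, and the helper names canonicalize and buildReceipt are assumptions for illustration; the repo's actual receipt schema is not shown in this PR description.

```javascript
const { createHash } = require("node:crypto");

// Recursively sort object keys so that equal payloads serialize to equal bytes.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

// Hypothetical receipt builder: the path layout below is an assumption.
function buildReceipt(payload) {
  const body = JSON.stringify(canonicalize(payload));
  const digest = createHash("sha256").update(body).digest("hex");
  return { digest, body, path: `.catalog/receipts/${digest}.json` };
}

// Usage with a made-up payload: key order does not affect the address.
const a = buildReceipt({ tool: "skillopt", gate: { before: 0.5, after: 1 } });
const b = buildReceipt({ gate: { after: 1, before: 0.5 }, tool: "skillopt" });
console.log(a.digest === b.digest); // prints: true
```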

What's included

scripts/skillopt/engine.js — deterministic, dependency-free engine (optimize, reflect, applyEdits, evaluateSplit). The rollout scorer is pluggable (opts.scorer) so a real LLM adapter can replace the default heuristic without touching the loop. Runs fully offline.
bin/skillopt.js + cli.js skillopt subcommand — npx ... skillopt --suite <suite.json> [--skill ...] [--out best_skill.md] [--report ...] [--rounds N] [--budget N] [--no-receipt] [--json].
.skills/skillopt/SKILL.md — skill manifest, plus runnable example.skill.md / example.suite.json.
Tests: tests/unit/skillopt.test.js (engine + CLI, 15 tests) and tests/e2e/skillopt.spec.ts (Playwright CLI e2e with evidence attachments).
Wired into package.json bin, README "Companion tooling", CHANGELOG.md, .skills/README.md, and a .gitignore whitelist for scripts/skillopt/.
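
To make the pluggable scorer concrete, here is one way such a hook could look. The PR only states that opts.scorer exists; the (skill, task) => number signature, the helper names evaluateSplitSketch, defaultScorer, and makeStubAdapter, and the task fields are assumptions in this sketch. A real LLM adapter would most likely be asynchronous, which this synchronous example does not cover.

```javascript
// Hypothetical default heuristic: share of required directives found in the skill.
function defaultScorer(skill, task) {
  const directives = Array.isArray(task.directives) ? task.directives : [];
  if (directives.length === 0) return 1;
  return directives.filter((d) => skill.includes(d)).length / directives.length;
}

// Average score over a split, using opts.scorer when one is supplied.
function evaluateSplitSketch(skill, tasks, opts = {}) {
  const scorer = typeof opts.scorer === "function" ? opts.scorer : defaultScorer;
  if (tasks.length === 0) return 0;
  const total = tasks.reduce((sum, task) => sum + scorer(skill, task), 0);
  return total / tasks.length;
}

// Stand-in for an LLM adapter: returns canned scores keyed by a made-up task id.
function makeStubAdapter(cannedScores) {
  return (skill, task) => cannedScores[task.id] ?? 0;
}

// Usage: the loop code stays the same; only the scorer changes.
const tasks = [
  { id: "a", directives: ["Cite sources"] },
  { id: "b", directives: ["Use metric units"] },
];
const heuristic = evaluateSplitSketch("Cite sources", tasks);
const stubbed = evaluateSplitSketch("Cite sources", tasks, {
  scorer: makeStubAdapter({ a: 1, b: 0.5 }),
});
console.log(heuristic, stubbed); // prints: 0.5 0.75
```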

Design notes

The skill markdown is the only thing edited; the "model" is never fine-tuned.
A regression (or a tie) can never replace the current best, because acceptance requires a strict improvement: candidateGate > bestGate. When the suite has no holdout tasks, the gate falls back to the train split and the run reports usedHoldout: false.
Hardened against malformed suite JSON (null/non-object tasks, non-array directives) at the system boundary; the CLI exits 2 cleanly on bad input. Regex used in replace edits is escaped (no ReDoS / injection).
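
The gate rule and the hardening described in these notes can be sketched as below. This is illustrative only: the function names, the { tasks: [...] } suite shape, and the choice to drop malformed tasks (rather than reject the whole suite) are assumptions, and the actual engine and CLI may behave differently.

```javascript
// Gate: pick the split to score on, and accept only strict improvements.
function pickGateSplit(train, holdout) {
  const usedHoldout = holdout.length > 0;
  return { tasks: usedHoldout ? holdout : train, usedHoldout };
}

function shouldAccept(candidateGate, bestGate) {
  return candidateGate > bestGate; // ties and regressions are rejected
}

// Escape regex metacharacters so user text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Literal find-and-replace; the function replacement avoids "$&"-style expansion.
function applyReplace(skill, find, replacement) {
  return skill.replace(new RegExp(escapeRegExp(find), "g"), () => replacement);
}

// Boundary validation (assumed policy): throw on a malformed suite,
// drop non-object tasks, and coerce non-array directives to an empty list.
function normalizeSuite(suite) {
  if (suite === null || typeof suite !== "object" || !Array.isArray(suite.tasks)) {
    throw new TypeError("suite must be an object with a tasks array");
  }
  return suite.tasks
    .filter((t) => t !== null && typeof t === "object" && !Array.isArray(t))
    .map((t) => ({
      ...t,
      directives: Array.isArray(t.directives)
        ? t.directives.filter((d) => typeof d === "string")
        : [],
    }));
}

// Usage with made-up input.
const cleaned = normalizeSuite({ tasks: [null, 3, { directives: "x" }, { directives: ["a"] }] });
console.log(cleaned.length, applyReplace("Use a.b (v1)", "a.b (v1)", "x")); // prints: 2 Use x
```

A CLI wrapper could catch the TypeError and map it to a non-zero exit code, which would be one way to get the clean exit described above.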

Test plan

npm test — 57 pass / 5 pre-existing skips, 0 fail
npm run lint — 0 errors
npx playwright test --project=chromium — 10 pass / 1 pre-existing skip (incl. new skillopt.spec.ts)
Manual run on the example suite: gate score 0.5 -> 1, EXIT_SIGNAL: true, best_skill.md gains the missing directives
Reviewer confirms the best_skill.md diff workflow reads well

https://claude.ai/code/session_01MuMv2kN3x5s6UwjXMap2ZP

Generated by Claude Code

Add the SkillOpt loop (Rollout -> Reflect -> Edit -> Gate) from https://microsoft.github.io/SkillOpt/ as a tool for this skill ecosystem. The skill markdown is the only trainable artifact; the target model stays frozen and edits are accepted only when they improve a held-out task split, with a rejected-edit buffer providing negative feedback and an edit budget acting as the textual learning rate.

- scripts/skillopt/engine.js: deterministic, dependency-free engine (optimize, reflect, applyEdits, evaluateSplit) with a pluggable scorer so real LLM adapters can replace the default heuristic.
- bin/skillopt.js + cli.js subcommand: optimize a SKILL.md against a task suite, emit best_skill.md, an optional report, and a content-addressed receipt under .catalog/receipts/.
- .skills/skillopt: skill manifest plus runnable example fixtures.
- Unit tests (engine + CLI) and a Playwright CLI e2e with evidence.

https://claude.ai/code/session_01MuMv2kN3x5s6UwjXMap2ZP


wesleysimplicio wants to merge 1 commit into main from claude/skillopt-implementation-SHUSn
