feat: add FGA, Pipes, Feature Flags, and Radar references + evals by nicknisi · Pull Request #19 · workos/skills

nicknisi · 2026-04-14T17:21:26Z

Summary

Adds reference files for 4 WorkOS products that had no skill coverage: FGA, Pipes, Feature Flags, Radar
Updates skill router (SKILL.md) with new feature table rows, expanded description, and FGA vs RBAC disambiguation
Adds 18 eval cases (42 → 60 total) covering all four products across Node.js and Python

Eval results (2-sample, Sonnet 4.5)

Product	Avg Delta	Cases	Verdict
Pipes	+21%	4	Strong value
FGA	+11%	5	Moderate value
Radar	+10%	4	Moderate value
Feature Flags	~+1%	4	Low value (model already knows)

No regressions in existing products. All 60 cases pass dry-run. Lint and tests clean.
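
As a rough illustration of how the per-product numbers in the table above could be derived, the sketch below averages a score delta over every case and sample for a product and lists any product whose average is negative. It assumes "delta" means the score with the reference file loaded minus the baseline score; the `EvalSample` shape, its field names, and both function names are hypothetical and are not taken from the repo's eval harness.

```typescript
// Hypothetical shape of one eval sample; all field names are assumptions.
interface EvalSample {
  product: string;      // e.g. "pipes", "fga"
  caseId: string;       // e.g. "radar-node-blocklist"
  withSkill: number;    // score in [0, 1] with the reference file loaded
  withoutSkill: number; // score in [0, 1] for the baseline run
}

// Average (withSkill - withoutSkill) per product across all cases and samples.
function avgDeltaByProduct(samples: EvalSample[]): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const s of samples) {
    if (!totals[s.product]) {
      totals[s.product] = { sum: 0, count: 0 };
    }
    totals[s.product].sum += s.withSkill - s.withoutSkill;
    totals[s.product].count += 1;
  }
  const averages: Record<string, number> = {};
  for (const product of Object.keys(totals)) {
    averages[product] = totals[product].sum / totals[product].count;
  }
  return averages;
}

// Products whose average delta is below zero, i.e. product-level regressions.
function regressedProducts(averages: Record<string, number>): string[] {
  return Object.keys(averages).filter((product) => averages[product] < 0);
}
```

Under these assumptions, an empty result from `regressedProducts` would correspond to "no product-level negative deltas". With only two samples per case, small averages such as the Feature Flags figure are likely within noise.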

Gotcha sourcing

Gotchas were inferred from docs, not from observed LLM failures. Eval runs surfaced two cases where gotchas actively hurt (radar-node-blocklist, feature-flags-nextjs-check) — both fixed and re-validated before this PR.

Test plan

pnpm eval -- --dry-run loads all 60 cases
pnpm lint passes
pnpm test passes (184 tests)
Full eval run (60 cases, 2 samples) — no product-level negative deltas
Targeted re-runs on fixed cases confirm improvement
Spot-check doc URLs after merge (some may 404 if product docs moved)

Close coverage gaps for four WorkOS products that had no reference files or routing in the skill router.

New reference files (references/*.md):
- workos-fga.md: Fine-Grained Authorization (new API, not legacy warrants)
- workos-pipes.md: Pipes / Connected Apps (OAuth integrations)
- workos-feature-flags.md: Feature Flags (access token claims)
- workos-radar.md: Radar (bot/fraud detection)

New eval cases (scripts/eval/cases/*.yaml):
- fga.yaml (5 cases), pipes.yaml (4 cases), feature-flags.yaml (4 cases), radar.yaml (4 cases)
- Total cases: 42 → 60

Skill router updates (SKILL.md):
- Added 4 rows to Features routing table
- Updated frontmatter description to trigger on new products
- Expanded routing decision tree with new product slugs
- Added FGA vs RBAC disambiguation

Eval results (2-sample, Sonnet):
- Pipes: +21% avg delta (strongest new product)
- FGA: +11% avg delta
- Radar: +10% avg delta (after blocklist case fix)
- Feature Flags: ~+1% avg delta (model already knows most of this)
- No regressions in existing products

nicknisi wants to merge 1 commit into main from feat/add-fga-pipes-feature-flags-radar
