
feat: add FGA, Pipes, Feature Flags, and Radar references + evals #19

Open
nicknisi wants to merge 1 commit into main from feat/add-fga-pipes-feature-flags-radar

@nicknisi
Member

Summary

  • Adds reference files for 4 WorkOS products that had no skill coverage: FGA, Pipes, Feature Flags, Radar
  • Updates skill router (SKILL.md) with new feature table rows, expanded description, and FGA vs RBAC disambiguation
  • Adds 18 eval cases (42 → 60 total) covering all four products across Node.js and Python

Eval results (2-sample, Sonnet 4.5)

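| Product       | Avg Delta | Cases | Verdict                         |
|---------------|-----------|-------|---------------------------------|
| Pipes         | +21%      | 4     | Strong value                    |
| FGA           | +11%      | 5     | Moderate value                  |
| Radar         | +10%      | 4     | Moderate value                  |
| Feature Flags | ~+1%      | 4     | Low value (model already knows) |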

No regressions in existing products. All 60 cases pass dry-run. Lint and tests clean.
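
For readers unfamiliar with the "Avg Delta" column, here is a minimal sketch of how a per-product average delta could be computed. It assumes a delta is the mean score with the reference file loaded minus the mean baseline score, averaged over a product's cases. The `CaseResult` shape and its field names are hypothetical and are not taken from the eval harness in this repo.

```typescript
// Hypothetical result shape for one eval case; field names are assumptions.
interface CaseResult {
  product: string;
  caseId: string;
  withSkill: number[];    // one score per sample, reference file loaded
  withoutSkill: number[]; // one score per sample, baseline
}

function mean(xs: number[]): number {
  if (xs.length === 0) throw new Error("mean of empty list");
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Average, per product, of (mean with-skill score - mean baseline score).
function avgDeltaByProduct(results: CaseResult[]): Map<string, number> {
  const deltas = new Map<string, number[]>();
  for (const r of results) {
    const delta = mean(r.withSkill) - mean(r.withoutSkill);
    const list = deltas.get(r.product) ?? [];
    list.push(delta);
    deltas.set(r.product, list);
  }
  const out = new Map<string, number>();
  deltas.forEach((ds, product) => out.set(product, mean(ds)));
  return out;
}
```

With only 2 samples per case, small deltas such as the Feature Flags figure are likely within noise under this definition.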

Gotcha sourcing

Gotchas were inferred from docs, not from observed LLM failures. Eval runs surfaced two cases where a gotcha actively hurt results (radar-node-blocklist, feature-flags-nextjs-check); both were fixed and re-validated before this PR.

Test plan

  • pnpm eval -- --dry-run loads all 60 cases
  • pnpm lint passes
  • pnpm test passes (184 tests)
  • Full eval run (60 cases, 2 samples) — no product-level negative deltas
  • Targeted re-runs on fixed cases confirm improvement
  • Spot-check doc URLs after merge (some may 404 if product docs moved)
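
The post-merge URL spot-check could be scripted along the following lines. This is a sketch only: `extractUrls`, `findBrokenUrls`, and the injected `getStatus` callback are hypothetical helpers that do not exist in this repo. The status lookup is passed in so that the caller decides how to make the HTTP request.

```typescript
// Collect unique http(s) URLs from markdown text (e.g. a references/*.md file).
function extractUrls(markdown: string): string[] {
  const pattern = /https?:\/\/[^\s)<>\]"']+/g;
  const matches = markdown.match(pattern) ?? [];
  // Drop sentence punctuation that trails a bare URL.
  const cleaned = matches.map((u) => u.replace(/[.,;:]+$/, ""));
  return Array.from(new Set(cleaned));
}

// Caller-supplied lookup that resolves to an HTTP status code for a URL.
type StatusFetcher = (url: string) => Promise<number>;

// Return the URLs whose lookup failed or returned a 4xx/5xx status.
async function findBrokenUrls(
  urls: string[],
  getStatus: StatusFetcher
): Promise<string[]> {
  const broken: string[] = [];
  for (const url of urls) {
    let status: number;
    try {
      status = await getStatus(url);
    } catch {
      status = 0; // treat network errors as broken
    }
    if (status === 0 || status >= 400) broken.push(url);
  }
  return broken;
}
```

On a runtime with a global `fetch`, `getStatus` could be `async (u) => (await fetch(u)).status`. Redirects and rate limits would need separate handling.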

Close coverage gaps for four WorkOS products that had no reference files
or routing in the skill router.

New reference files (references/*.md):
- workos-fga.md — Fine-Grained Authorization (new API, not legacy warrants)
- workos-pipes.md — Pipes / Connected Apps (OAuth integrations)
- workos-feature-flags.md — Feature Flags (access token claims)
- workos-radar.md — Radar (bot/fraud detection)

New eval cases (scripts/eval/cases/*.yaml):
- fga.yaml (5 cases), pipes.yaml (4 cases),
  feature-flags.yaml (4 cases), radar.yaml (4 cases)
- Total cases: 42 → 60

Skill router updates (SKILL.md):
- Added 4 rows to Features routing table
- Updated frontmatter description to trigger on new products
- Expanded routing decision tree with new product slugs
- Added FGA vs RBAC disambiguation

Eval results (2-sample, Sonnet):
- Pipes: +21% avg delta (strongest new product)
- FGA: +11% avg delta
- Radar: +10% avg delta (after blocklist case fix)
- Feature Flags: ~+1% avg delta (model already knows most of this)
- No regressions in existing products