feat: add FGA, Pipes, Feature Flags, and Radar references + evals #19
Open
Close coverage gaps for four WorkOS products that had no reference files or routing in the skill router.

New reference files (references/*.md):
- workos-fga.md: Fine-Grained Authorization (new API, not legacy warrants)
- workos-pipes.md: Pipes / Connected Apps (OAuth integrations)
- workos-feature-flags.md: Feature Flags (access token claims)
- workos-radar.md: Radar (bot/fraud detection)

New eval cases (scripts/eval/cases/*.yaml):
- fga.yaml (5 cases), pipes.yaml (4 cases), feature-flags.yaml (4 cases), radar.yaml (4 cases)
- Total cases: 42 → 60

Skill router updates (SKILL.md):
- Added 4 rows to Features routing table
- Updated frontmatter description to trigger on new products
- Expanded routing decision tree with new product slugs
- Added FGA vs RBAC disambiguation

Eval results (2-sample, Sonnet):
- Pipes: +21% avg delta (strongest new product)
- FGA: +11% avg delta
- Radar: +10% avg delta (after blocklist case fix)
- Feature Flags: ~+1% avg delta (model already knows most of this)
- No regressions in existing products
Summary
Eval results (2-sample, Sonnet 4.5)
No regressions in existing products. All 60 cases pass dry-run. Lint and tests clean.
Gotcha sourcing
Gotchas were inferred from docs, not from observed LLM failures. Eval runs surfaced two cases where gotchas actively hurt (radar-node-blocklist, feature-flags-nextjs-check); both were fixed and re-validated before this PR.
Test plan
- `pnpm eval -- --dry-run` loads all 60 cases
- `pnpm lint` passes
- `pnpm test` passes (184 tests)