feat(docs): improve llms mdx component fidelity #7871
Walkthrough
This PR adds a markdown normalization pipeline for LLM-compatible text extraction from documentation. It introduces type-safe transformations that convert specialized UI components (callouts, code tabs, accordions, API references) into plain Markdown, with OpenAPI spec enrichment and comprehensive snapshot-driven tests.
Changes
LLM Markdown Normalization
🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 5 passed
Actionable comments posted: 2
🧹 Nitpick comments (1)
apps/docs/scripts/test-llm-markdown-fidelity.ts (1)
4-5: ⚡ Quick win
Broaden the JSX-leak assertion to the tags this transformer actually handles.
`rawComponentPattern` misses `CalloutContainer`, `CalloutTitle`, `CalloutDescription`, `TabsList`, `TabsTrigger`, and `TabsContent`, so this suite can still pass while raw JSX leaks for supported components. Please extend the pattern and add at least one snapshot that uses `<TabsList>`/`<TabsTrigger>` markup.
Suggested fix
const rawComponentPattern =
- /<(?:APIPage|CodeBlockTabs|CodeBlockTab|Tabs|Tab|Cards|Card|Accordions|Accordion|Youtube|Button|SharedContent|Steps|Step)\b/;
+ /<(?:APIPage|CalloutContainer|CalloutTitle|CalloutDescription|CodeBlockTabs|CodeBlockTabsList|CodeBlockTabsTrigger|CodeBlockTab|Tabs|TabsList|TabsTrigger|TabsContent|Tab|Cards|Card|Accordions|Accordion|Youtube|Button|SharedContent|Steps|Step)\b/;
Also applies to: 156-163
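To make the gap concrete, here is a minimal sketch of how the broadened pattern behaves on leaked markup. The pattern is copied from the suggested fix; `findLeakedComponent` is a hypothetical helper written for illustration and is not taken from the test script.

```typescript
// Broadened pattern, copied from the suggested fix.
const rawComponentPattern =
  /<(?:APIPage|CalloutContainer|CalloutTitle|CalloutDescription|CodeBlockTabs|CodeBlockTabsList|CodeBlockTabsTrigger|CodeBlockTab|Tabs|TabsList|TabsTrigger|TabsContent|Tab|Cards|Card|Accordions|Accordion|Youtube|Button|SharedContent|Steps|Step)\b/;

// Hypothetical helper: returns the name of the first leaked component tag,
// or null when the output contains none of the listed tags.
function findLeakedComponent(markdown: string): string | null {
  const match = rawComponentPattern.exec(markdown);
  // match[0] is "<TagName"; drop the leading "<".
  return match ? match[0].slice(1) : null;
}
```

The trailing `\b` is why the original pattern misses `<TabsList>`: neither `Tabs` nor `Tab` is followed by a word boundary there, so the longer names have to be listed explicitly.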
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/docs/scripts/test-llm-markdown-fidelity.ts` around lines 4 - 5, The regex rawComponentPattern currently used to detect raw JSX misses several component tags the transformer handles; update the pattern in test-llm-markdown-fidelity.ts to include CalloutContainer, CalloutTitle, CalloutDescription, TabsList, TabsTrigger, and TabsContent (update the const rawComponentPattern declaration) and add at least one snapshot test that includes <TabsList> and <TabsTrigger> markup to ensure the suite fails when those raw JSX tags leak (also apply the same regex change where the pattern is duplicated at lines 156-163).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/docs/src/lib/llm-markdown.ts`:
- Around line 416-417: The current regex that strips "<TabsList>" and
"<TabsTrigger>" uses a non-capturing alternation and can close with the wrong
tag; update the first replace in llm-markdown (the chain that calls
.replace(/<Tabs(?:List|Trigger)[\s\S]*?<\/Tabs(?:List|Trigger)>/g, "")) to
capture which subtag was opened (e.g., capture List or Trigger) and use that
same capture in the closing tag backreference so the opening and closing names
match; keep the second replace for generic Tabs/TabsContent removal as-is.
- Around line 399-433: The component-rewrite regexes in
normalizeProcessedMarkdown run on raw markdown before fenced code is protected,
so any fenced examples containing tags like CalloutContainer, CodeBlockTab, Tab,
Accordion, Step, or SharedContent get mangled; to fix, call
protectFencedCodeBlocks(markdown) first (or immediately after removing comment
blocks) and then run all component replacement .replace chains against the
returned protected string (use that variable instead of componentMarkdown),
ensuring later logic still unprotects or returns the final content; reference
normalizeProcessedMarkdown and protectFencedCodeBlocks when applying this
change.
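Both inline comments can be sketched together. This is a simplified illustration under stated assumptions, not the code in `llm-markdown.ts`: `protectFences`, `restoreFences`, and `stripTabsChrome` are hypothetical stand-ins (the real `protectFencedCodeBlocks` signature is not shown in this review), and the fence matcher only handles unindented triple-backtick fences.

```typescript
// Hypothetical stand-in for protectFencedCodeBlocks: swaps each fenced block
// for a placeholder so later regexes cannot touch code examples.
interface ProtectedMarkdown {
  text: string;
  blocks: string[];
}

function protectFences(markdown: string): ProtectedMarkdown {
  const blocks: string[] = [];
  // Simplification: only unindented triple-backtick fences are recognised.
  const text = markdown.replace(/^```[^\n]*\n[\s\S]*?^```[ \t]*$/gm, (block: string) => {
    blocks.push(block);
    return `\u0000FENCE_${blocks.length - 1}\u0000`;
  });
  return { text, blocks };
}

function restoreFences({ text, blocks }: ProtectedMarkdown): string {
  return text.replace(
    /\u0000FENCE_(\d+)\u0000/g,
    (match: string, index: string) => blocks[Number(index)] ?? match,
  );
}

// Hypothetical wrapper: protect fences first, then strip tab chrome.
function stripTabsChrome(markdown: string): string {
  const protectedMarkdown = protectFences(markdown);
  // Capture which subtag was opened (List or Trigger) and require the same
  // name on the closing tag. The non-capturing version would stop at the
  // first </TabsTrigger> inside a <TabsList> and leave a stray </TabsList>.
  const stripped = protectedMarkdown.text.replace(
    /<Tabs(List|Trigger)[\s\S]*?<\/Tabs\1>/g,
    "",
  );
  return restoreFences({ ...protectedMarkdown, text: stripped });
}
```

With this ordering, a `<TabsList>` that appears inside a fenced example survives untouched, while the same markup in prose is removed as a whole.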
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3b167470-725a-450f-8f8b-ca683f1d762b
📒 Files selected for processing (4)
- apps/docs/package.json
- apps/docs/scripts/test-llm-markdown-fidelity.ts
- apps/docs/src/lib/get-llm-text.ts
- apps/docs/src/lib/llm-markdown.ts
Summary
Resolves DR-8323.
Adds explicit markdown conversion rules for custom MDX components used by the agent-friendly `/llms.mdx` docs output.
This ensures AI agents receive readable flat markdown instead of raw JSX, dropped content, or opaque component output.
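As a rough illustration of what one such conversion rule could look like, the sketch below flattens `Tabs`/`Tab` markup into labelled sections. Everything here is an assumption made for the example: the `tabsToSections` name, the `value` attribute as the label source, and the bold-label output format are hypothetical and may not match the rules in this PR. It also skips fenced-code protection for brevity.

```typescript
// Hypothetical conversion rule: each <Tab value="..."> becomes a bold label
// followed by the tab body, and the <Tabs> wrapper is dropped.
function tabsToSections(markdown: string): string {
  return markdown
    .replace(
      /<Tab\s+value="([^"]*)"\s*>([\s\S]*?)<\/Tab>/g,
      (_match: string, label: string, body: string) =>
        `**${label}**\n\n${body.trim()}\n\n`,
    )
    // Remove the opening and closing wrapper tags.
    .replace(/<\/?Tabs\b[^>]*>/g, "")
    // Collapse the blank lines left behind by the removed tags.
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```

A regex-based rule like this keeps the label text that a plain JSX strip would drop, which is the fidelity concern the summary describes.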
Changes
- Converts `APIPage` into readable API reference markdown with method, path, parameters, request body, and responses.
- Converts `Tabs`/`Tab` content into labelled markdown sections.
- Converts `Accordions`, `Youtube`, `Cards`, `Button`, `SharedContent`, and `Steps` into readable markdown.
Verification
- `pnpm --filter docs run test:llm-markdown`
- `pnpm --filter docs run types:check`
- `/docs/llms.mdx/...` routes locally with `curl`.
Summary by CodeRabbit
Release Notes
New Features
Tests