
feat(docs): improve llms mdx component fidelity #7871

Merged
aidankmcalister merged 1 commit into main from
feat/llms-mdx-component-fidelity
May 6, 2026

Conversation

Member

@aidankmcalister aidankmcalister commented May 6, 2026

Summary

Resolves DR-8323.

Adds explicit markdown conversion rules for custom MDX components used by the agent-friendly /llms.mdx docs output.

This ensures AI agents receive readable flat markdown instead of raw JSX, dropped content, or opaque component output.

Changes

  • Convert APIPage into readable API reference markdown with method, path, parameters, request body, and responses.
  • Convert generated package-manager code tabs into labelled markdown sections.
  • Convert manual Tabs / Tab content into labelled markdown sections.
  • Convert admonitions into prefixed blockquotes.
  • Convert Accordions, Youtube, Cards, Button, SharedContent, and Steps into readable markdown.
  • Preserve raw component examples inside fenced code blocks.
  • Add a fast LLM markdown fidelity test suite.
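
As a rough illustration of two of the rules above (admonitions as prefixed blockquotes, tabs as labelled sections), here is a minimal sketch. The function names, the `TabSection` type, and the exact output format are assumptions made for this example; they are not taken from `apps/docs/src/lib/llm-markdown.ts`.

```typescript
// Hypothetical helpers for illustration only; names and output format are
// assumptions, not the functions shipped in this PR.

type TabSection = { label: string; body: string };

// Turn an admonition into a blockquote whose first line carries its type.
function calloutToBlockquote(kind: string, body: string): string {
  const label = kind.charAt(0).toUpperCase() + kind.slice(1);
  const [first, ...rest] = body.trim().split("\n");
  return [
    `> **${label}:** ${first}`,
    ...rest.map((line) => (line ? `> ${line}` : ">")),
  ].join("\n");
}

// Flatten tabs into one labelled section per tab, in source order.
function tabsToSections(tabs: TabSection[]): string {
  return tabs.map((t) => `**${t.label}**\n\n${t.body.trim()}`).join("\n\n");
}
```

With this shape, every tab's content stays in the output under its label, so nothing is lost when the interactive switcher disappears.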

Verification

  • pnpm --filter docs run test:llm-markdown
  • pnpm --filter docs run types:check
  • Manually verified representative /docs/llms.mdx/... routes locally with curl.
  • Confirmed no raw custom MDX component JSX remains in tested markdown output.

Summary by CodeRabbit

Release Notes

  • New Features

    • Implemented markdown normalization system to standardize documentation component formatting and rendering.
    • Added related pages section to documentation content for improved navigation and context.
  • Tests

    • Added snapshot-driven tests to validate markdown normalization quality across documentation components.


vercel Bot commented May 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| blog | Ready | Preview, Comment | May 6, 2026 0:38am |
| docs | Ready | Preview, Comment | May 6, 2026 0:38am |
| eclipse | Ready | Preview, Comment | May 6, 2026 0:38am |
| site | Ready | Preview, Comment | May 6, 2026 0:38am |


argos-ci Bot commented May 6, 2026

The latest updates on your projects. Learn more about Argos notifications ↗︎

| Build | Status | Details | Updated (UTC) |
| --- | --- | --- | --- |
| default (Inspect) | ✅ No changes detected | - | May 6, 2026, 12:45 PM |

Contributor

coderabbitai Bot commented May 6, 2026

Walkthrough

This PR adds a markdown normalization pipeline for LLM-compatible text extraction from documentation. It introduces type-safe transformations that convert specialized UI components (callouts, code tabs, accordions, API references) into plain Markdown, with OpenAPI spec enrichment and comprehensive snapshot-driven tests.

Changes

LLM Markdown Normalization

| Layer / File(s) | Summary |
| --- | --- |
| Core Normalization Pipeline (apps/docs/src/lib/llm-markdown.ts) | New module implementing markdown transformation functions: component block replacers (formatCallout, formatCodeBlockTab, formatCard, formatButton, formatSectionComponent), OpenAPI spec loader and operation formatters (formatApiPage, formatParameter, formatRequestBody, formatResponses), fenced code block protection, and the main normalizeProcessedMarkdown orchestrator that chains all transformations. |
| Integration & Related Content (apps/docs/src/lib/get-llm-text.ts) | Import added for normalizeProcessedMarkdown. The getLLMText function now normalizes processed markdown and includes a new "Related pages" section header with formatted links when related pages exist. |
| Test Infrastructure (apps/docs/scripts/test-llm-markdown-fidelity.ts, apps/docs/package.json) | Snapshot-based test suite covering APIPage, CodeBlockTabs, Tabs, admonitions, Accordion with YouTube, Cards, Button, SharedContent, and Steps components; verifies normalization output matches expectations and no raw MDX JSX leaks. New test:llm-markdown npm script added to execute the test suite via tsx. |
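
The "Related pages" behaviour described in the walkthrough could look roughly like the sketch below. The `RelatedPage` type, the function name, the heading level, and the link format are all assumptions for illustration; the actual `getLLMText` implementation is not shown in this conversation.

```typescript
// Hypothetical shape of a related page entry (assumption).
type RelatedPage = { title: string; url: string };

// Append a "Related pages" section only when there is something to link to.
function appendRelatedPages(markdown: string, related: RelatedPage[]): string {
  if (related.length === 0) return markdown;
  const links = related.map((p) => `- [${p.title}](${p.url})`).join("\n");
  return `${markdown.trimEnd()}\n\n## Related pages\n\n${links}\n`;
}
```

Returning the input unchanged for an empty list avoids emitting a dangling heading with no links under it.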

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately reflects the main objective: improving how custom MDX components are converted to markdown for LLM consumption, ensuring better fidelity in the agent-facing documentation output. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
apps/docs/scripts/test-llm-markdown-fidelity.ts (1)

4-5: ⚡ Quick win

Broaden the JSX-leak assertion to the tags this transformer actually handles.

rawComponentPattern misses CalloutContainer, CalloutTitle, CalloutDescription, TabsList, TabsTrigger, and TabsContent, so this suite can still pass while raw JSX leaks for supported components. Please extend the pattern and add at least one snapshot that uses <TabsList>/<TabsTrigger> markup.

Suggested fix
 const rawComponentPattern =
-  /<(?:APIPage|CodeBlockTabs|CodeBlockTab|Tabs|Tab|Cards|Card|Accordions|Accordion|Youtube|Button|SharedContent|Steps|Step)\b/;
+  /<(?:APIPage|CalloutContainer|CalloutTitle|CalloutDescription|CodeBlockTabs|CodeBlockTabsList|CodeBlockTabsTrigger|CodeBlockTab|Tabs|TabsList|TabsTrigger|TabsContent|Tab|Cards|Card|Accordions|Accordion|Youtube|Button|SharedContent|Steps|Step)\b/;

Also applies to: 156-163
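
For context, a leak check built on the broadened pattern might be sketched as follows. The pattern is copied from the suggested fix above; the helper names are hypothetical, and stripping fenced code before testing is an assumption based on the PR's stated goal of preserving raw component examples inside fences.

```typescript
// Pattern copied from the suggested fix above.
const rawComponentPattern =
  /<(?:APIPage|CalloutContainer|CalloutTitle|CalloutDescription|CodeBlockTabs|CodeBlockTabsList|CodeBlockTabsTrigger|CodeBlockTab|Tabs|TabsList|TabsTrigger|TabsContent|Tab|Cards|Card|Accordions|Accordion|Youtube|Button|SharedContent|Steps|Step)\b/;

// Build the fence marker programmatically so this example can itself sit
// inside a fenced block without confusing a markdown parser.
const FENCE = "`".repeat(3);
const fencedBlock = new RegExp(
  "^" + FENCE + "[^\\n]*\\n[\\s\\S]*?^" + FENCE + "[ \\t]*$",
  "gm",
);

// Assumption: fenced examples may legitimately contain raw components,
// so they are removed before looking for leaks.
function stripFencedCode(markdown: string): string {
  return markdown.replace(fencedBlock, "");
}

// Hypothetical helper: true when a component tag survives outside a fence.
function hasRawComponentLeak(markdown: string): boolean {
  return rawComponentPattern.test(stripFencedCode(markdown));
}
```

Because the pattern ends in `\b`, the alternatives `Tabs` and `Tab` do not match `<TabsList>` on their own, which is why the longer names have to be listed explicitly.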

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/docs/scripts/test-llm-markdown-fidelity.ts` around lines 4 - 5, The
regex rawComponentPattern currently used to detect raw JSX misses several
component tags the transformer handles; update the pattern in
test-llm-markdown-fidelity.ts to include CalloutContainer, CalloutTitle,
CalloutDescription, TabsList, TabsTrigger, and TabsContent (update the const
rawComponentPattern declaration) and add at least one snapshot test that
includes <TabsList> and <TabsTrigger> markup to ensure the suite fails when
those raw JSX tags leak (also apply the same regex change where the pattern is
duplicated at lines 156-163).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/docs/src/lib/llm-markdown.ts`:
- Around line 416-417: The current regex that strips "<TabsList>" and
"<TabsTrigger>" uses a non-capturing alternation and can close with the wrong
tag; update the first replace in llm-markdown (the chain that calls
.replace(/<Tabs(?:List|Trigger)[\s\S]*?<\/Tabs(?:List|Trigger)>/g, "")) to
capture which subtag was opened (e.g., capture List or Trigger) and use that
same capture in the closing tag backreference so the opening and closing names
match; keep the second replace for generic Tabs/TabsContent removal as-is.
- Around line 399-433: The component-rewrite regexes in
normalizeProcessedMarkdown run on raw markdown before fenced code is protected,
so any fenced examples containing tags like CalloutContainer, CodeBlockTab, Tab,
Accordion, Step, or SharedContent get mangled; to fix, call
protectFencedCodeBlocks(markdown) first (or immediately after removing comment
blocks) and then run all component replacement .replace chains against the
returned protected string (use that variable instead of componentMarkdown),
ensuring later logic still unprotects or returns the final content; reference
normalizeProcessedMarkdown and protectFencedCodeBlocks when applying this
change.
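
The two inline findings above can be sketched together: protect fenced code first, then strip `TabsList` / `TabsTrigger` with a backreference so the closing tag must match the one that was opened. Everything here is a hypothetical stand-in (`protectFences`, `restoreFences`, the `@@FENCE_n@@` placeholder, and `normalize`); the real `protectFencedCodeBlocks` and `normalizeProcessedMarkdown` signatures are not shown in this review.

```typescript
const FENCE = "`".repeat(3);
const fencedBlock = new RegExp(
  "^" + FENCE + "[^\\n]*\\n[\\s\\S]*?^" + FENCE + "[ \\t]*$",
  "gm",
);

// Hypothetical stand-in for protectFencedCodeBlocks: swap each fenced block
// for a placeholder so later regexes cannot touch its contents.
function protectFences(markdown: string): { text: string; blocks: string[] } {
  const blocks: string[] = [];
  const text = markdown.replace(fencedBlock, (block) => {
    blocks.push(block);
    return `@@FENCE_${blocks.length - 1}@@`;
  });
  return { text, blocks };
}

// Put the original fenced blocks back in place of their placeholders.
function restoreFences(text: string, blocks: string[]): string {
  return text.replace(/@@FENCE_(\d+)@@/g, (_m, i) => blocks[Number(i)] ?? "");
}

// Capture which subtag was opened and require the same name (\1) to close it.
function stripTabsListAndTriggers(markdown: string): string {
  return markdown.replace(/<Tabs(List|Trigger)\b[\s\S]*?<\/Tabs\1>/g, "");
}

// Order matters: protect, rewrite, then restore.
function normalize(markdown: string): string {
  const { text, blocks } = protectFences(markdown);
  return restoreFences(stripTabsListAndTriggers(text), blocks);
}
```

With the non-capturing alternation, a `<TabsList>` that wraps several `<TabsTrigger>` elements is closed by the first `</TabsTrigger>`, which leaves a stray `</TabsList>` behind; the backreference version removes the whole list in one match.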


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3b167470-725a-450f-8f8b-ca683f1d762b

📥 Commits

Reviewing files that changed from the base of the PR and between a7adfed and 47124a6.

📒 Files selected for processing (4)
  • apps/docs/package.json
  • apps/docs/scripts/test-llm-markdown-fidelity.ts
  • apps/docs/src/lib/get-llm-text.ts
  • apps/docs/src/lib/llm-markdown.ts

Comment thread apps/docs/src/lib/llm-markdown.ts
Comment thread apps/docs/src/lib/llm-markdown.ts
@aidankmcalister aidankmcalister merged commit e142dc7 into main May 6, 2026
16 checks passed
@aidankmcalister aidankmcalister deleted the feat/llms-mdx-component-fidelity branch May 6, 2026 14:23
2 participants