Update markdown copy logic to serve raw content via generated HTML pages instead of standalone files #2180

Open
Sachindu-Nethmin wants to merge 7 commits into wso2:main from Sachindu-Nethmin:copy-url
@Sachindu-Nethmin Sachindu-Nethmin commented Apr 15, 2026

Purpose

Adds a "View as Markdown" option to the copy-page toolbar dropdown so readers can view the raw Markdown source of any documentation
page directly in the browser without triggering a file download.

Goals

  • Allow users to view the raw Markdown source of any page as plain text in the browser.
  • Work on both mkdocs serve (localhost) and GitHub Pages with zero web-server configuration.
  • Never trigger a browser file download prompt.

Approach

MkDocs post-build hook (en/hooks/copy_markdown.py)
A new on_post_build hook walks docs_dir and, for every .md source file, creates a directory named <page>.md/ inside site_dir
containing a minimal index.html that renders the escaped Markdown content in a monospace, pre-wrap style. Because the server responds
with Content-Type: text/html for the index.html, browsers render it inline instead of triggering a download — which would happen with a
raw .md file served as application/octet-stream.
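A minimal sketch of such a hook follows. It is not the PR's actual code: the helper name render_markdown_page and the exact HTML layout are assumptions, and only the on_post_build entry point plus the docs_dir and site_dir config keys come from the description above.

```python
import html
import os


def render_markdown_page(markdown_text: str, title: str) -> str:
    """Wrap escaped Markdown in a minimal HTML page (the layout is an assumption)."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        '<body style="font-family: monospace; white-space: pre-wrap;">'
        f"{html.escape(markdown_text)}"
        "</body></html>\n"
    )


def on_post_build(config, **kwargs):
    """Mirror every .md file in docs_dir as <page>.md/index.html in site_dir."""
    docs_dir = config["docs_dir"]
    site_dir = config["site_dir"]
    for root, _dirs, files in os.walk(docs_dir):
        for filename in files:
            if not filename.endswith(".md"):
                continue
            src = os.path.join(root, filename)
            rel = os.path.relpath(src, docs_dir)
            # The output directory is literally named "<page>.md".
            dst_dir = os.path.join(site_dir, rel)
            os.makedirs(dst_dir, exist_ok=True)
            with open(src, encoding="utf-8") as f:
                text = f.read()
            with open(os.path.join(dst_dir, "index.html"), "w", encoding="utf-8") as f:
                f.write(render_markdown_page(text, filename))
```

Because the output is a directory containing index.html, a static server resolves <page>.md/ like any other directory index.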

en/hooks/copy_md.py cleanup
The previous hook wrote raw .md files to the same paths now used as directories, and registered a Tornado handler (on_serve) that would
have returned 403 on those directories. Both were removed. Page-info collection for llms.txt / llms-full.txt is retained, simplified
to use page.file.src_path directly.

JavaScript (en/docs/assets/js/copy-page.js)

  • The .cp-view click handler now navigates the current tab (window.location.href) to
    <origin><pathname-without-trailing-slash>.md, replacing the previous window.open(..., '_blank').
  • fetchFlattenedMarkdownForCurrentPage (used by "Copy page" and AI-prompt features) is updated to parse the HTML wrapper with DOMParser
    and return body.textContent, which decodes HTML entities and restores the original Markdown string.

URL derivation examples:

| Current page URL | Navigates to |
| --- | --- |
| http://localhost:8000/quick-start-guide/quick-start-guide/ | http://localhost:8000/quick-start-guide/quick-start-guide.md |
| https://mi.docs.wso2.com/en/latest/overview/overview/ | https://mi.docs.wso2.com/en/latest/overview/overview.md |
| http://localhost:8000/ | http://localhost:8000/index.md |

The server auto-redirects .md to .md/ (appending a trailing slash for the directory) and serves index.html. The final address-bar URL will
have a trailing slash — this is expected and correct.
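The derivation rule in the examples above can be sketched in Python for illustration. The real logic lives in copy-page.js; the function name markdown_view_url is hypothetical, and dropping the query string and fragment is an assumption of this sketch.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_view_url(page_url: str) -> str:
    """Map a rendered page URL to its ".md" view URL."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    # The site root has no page name, so it maps to index.md.
    md_path = f"{path}.md" if path else "/index.md"
    # Query string and fragment are dropped (an assumption of this sketch).
    return urlunsplit((parts.scheme, parts.netloc, md_path, "", ""))
```

Note that this simple rule maps a version landing page such as /en/latest/ to /en/latest.md rather than to an index.md under that directory.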

User stories

  • As a developer integrating with WSO2 MI, I want to view the raw Markdown source of a documentation page so I can copy-paste it into my
    own notes or tools without unwanted HTML formatting.
  • As an LLM/AI tool user, I want a direct plain-text view of a documentation page so I can feed it to an AI assistant without
    browser-rendered markup.

Release note

Added a "View as Markdown" option to the copy-page toolbar on the documentation site. Clicking it navigates the current tab to a plain-text
view of the page's raw Markdown source, rendered inline in the browser with no download prompt.

Documentation

N/A — this is a documentation site tooling change; it does not affect product documentation content.

Training

N/A

Certification

N/A — no impact on product certification exams.

Marketing

N/A

Automation tests

  • Unit tests: N/A — browser-side JavaScript and MkDocs hook; no unit-testable business logic introduced.
  • Integration tests: Manually verified on mkdocs serve (localhost) and confirmed the build produces site/<page>.md/index.html for every
    source Markdown file. Verified that "View as Markdown" renders inline and "Copy page" still copies clean Markdown text.

Security checks

Samples

N/A

Related PRs

N/A

Migrations (if applicable)

N/A — static site build change only; no data migration required.

Test environment

  • OS: macOS 15 (Darwin 25.3.0)
  • Python: 3.x (MkDocs hook runtime)
  • MkDocs: mkdocs-material
  • Browsers tested: Chrome, Firefox (localhost via mkdocs serve)
  • Deployment target: GitHub Pages

Learning

  • MkDocs hooks lifecycle: Used on_post_build (runs once after all pages are built) rather than on_post_page (runs per page) because
    the hook reads from docs_dir directly and doesn't need per-page context.
  • Content-Type problem with raw .md files: Servers (including GitHub Pages and Python's built-in HTTP server) serve .md files as
    application/octet-stream, triggering a download dialog. Wrapping the content in the <body> of an index.html inside a <page>.md/
    directory forces Content-Type: text/html with no server configuration.
  • html.escape + DOMParser round-trip: Python's html.escape() encodes <, >, &, " and ' (with its default quote=True) so the raw Markdown is safe inside
    <body>. On the JS side, DOMParser + body.textContent decodes the entities back to the original characters, making the same URL usable
    for both browser viewing and programmatic fetch() in the "Copy page" feature.
  • Tornado StaticFileHandler + directories: Tornado returns HTTP 403 for directory paths, so the existing on_serve handler that
    intercepted *.md requests had to be removed to let MkDocs' default handler redirect to the .md/ directory and serve index.html.
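The escape and decode round-trip described above can be checked with a short sketch. Python's html.parser stands in here for the browser's DOMParser, which is an assumption made for illustration; the class BodyText and the function extract_markdown are hypothetical names.

```python
import html
from html.parser import HTMLParser


class BodyText(HTMLParser):
    """Collect the text inside <body>, roughly like body.textContent."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &lt; before handle_data.
        super().__init__(convert_charrefs=True)
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body:
            self.chunks.append(data)


def extract_markdown(page_html: str) -> str:
    """Return the decoded text content of the page body."""
    parser = BodyText()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.chunks)


source = '# Title\n\nUse `<div class="note">` & "quotes" in *Markdown*.\n'
page = f"<html><body>{html.escape(source)}</body></html>"
assert extract_markdown(page) == source
```

The round-trip works because every character that could open a tag or an entity is escaped on the way in, so the parser sees the body as a single text run.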

Summary by CodeRabbit

  • New Features

    • Build now emits mirror HTML pages and two site indexes for direct plain-text/Markdown access.
  • Improvements

    • "Copy as Markdown" fetches the current page’s Markdown on demand and copies cleaner plain text with temporary "Copied!" feedback.
    • "View as plain text" opens the computed plain-text path; widget is hidden on homepage-like routes and menu wording clarified.

coderabbitai Bot commented Apr 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Compute and fetch a page's flattened Markdown at click time in the client UI; add two MkDocs build hooks that mirror Markdown into HTML under the site output and emit site index files; register those hooks in MkDocs config.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Client copy UI: en/docs/assets/js/copy-page.js | Replaced createCopyPageButton(markdownUrl) with a no-arg createCopyPageButton(). Added getFlattenedMarkdownUrlFromHtmlUrl to normalize HTML→.md and fetchFlattenedMarkdownForCurrentPage to fetch/parse Markdown at click time. The copy action now copies the fetched Markdown and shows feedback; "View as Markdown" navigates to the computed .md. Consolidated ChatGPT/Perplexity handlers via a selector→base-URL map, updated the Claude flow to fetch and then open https://claude.ai, added noopener,noreferrer to window.open, simplified insertion logic and homepage suppression, and adjusted UI text. |
| Build hook, HTML mirrors: en/hooks/copy_markdown.py | New on_post_build(config, **kwargs) hook walks config["docs_dir"] for .md files (respecting optional exclude globs) and mirrors each as an HTML index.html under the corresponding path in config["site_dir"], embedding the escaped Markdown and filename in the generated HTML. |
| Build hook, index files: en/hooks/copy_md.py | New hook module with global ALL_PAGES: on_pre_build resets the list, on_post_page collects unique {title,url} entries from each page's src_uri, and on_post_build writes llms.txt and llms-full.txt into config['site_dir'] with sorted Markdown link entries. |
| MkDocs config: en/mkdocs.yml | Registered two new hooks, hooks/copy_md.py and hooks/copy_markdown.py, under top-level hooks. |
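The index-file hook summarized above might look roughly like the sketch below. The three hook names, ALL_PAGES, and src_uri come from the summary; the sort key and the exact file layout are assumptions, and llms-full.txt is left out because its contents are not described here.

```python
import os

ALL_PAGES = []  # module-level list of {"title", "url"} dicts


def on_pre_build(config, **kwargs):
    """Reset the list at the start of each build."""
    ALL_PAGES.clear()


def on_post_page(output, page, config, **kwargs):
    """Record each page once, keyed by its title and source URI."""
    entry = {"title": page.title, "url": page.file.src_uri}
    if entry not in ALL_PAGES:
        ALL_PAGES.append(entry)
    return output


def on_post_build(config, **kwargs):
    """Write sorted Markdown link entries to llms.txt (layout is assumed)."""
    pages = sorted(ALL_PAGES, key=lambda p: p["url"])  # sort key is an assumption
    lines = [f"- [{p['title']}](./{p['url']})" for p in pages]
    with open(os.path.join(config["site_dir"], "llms.txt"), "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

Returning output unchanged from on_post_page keeps the rendered HTML untouched; the hook only observes pages as they pass through.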

Sequence Diagram(s)

sequenceDiagram
  participant User as "User"
  participant Browser as "Browser (copy-page.js)"
  participant Site as "Site (static .md/.html)"
  participant LLM as "External LLM (chat.openai.com / perplexity / claude.ai)"

  User->>Browser: Click "Copy" / "View as Markdown" / "Open in LLM"
  Browser->>Browser: compute flattened `.md` URL (drop hashes, strip index/.html)
  Browser->>Site: fetch computed `.md` URL
  Site-->>Browser: return HTML (mirrored `.md` content)
  Browser->>Browser: parse response via DOMParser -> extract body.textContent
  alt Copy action
    Browser->>User: write plain Markdown to clipboard, show "Copied!" feedback
  else View action
    Browser->>User: navigate window.location.href to computed `.md` URL
  else Open in LLM
    Browser->>LLM: open new tab with base-URL + prompt (noopener,noreferrer)
    Browser->>User: show temporary button feedback (e.g., "Opening..."/"Copied!")
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I flattened paths and nibbled through the night,
I fetched the markdown hidden out of sight,
Hooks planted mirrors, tidy and polite,
An index trail to follow by moonlight,
Clipboard carrots for devs — hop, delight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: replacing standalone file serving with HTML-wrapped directory-based serving for raw markdown content. |
| Description check | ✅ Passed | The description comprehensively follows the template with all major sections completed: Purpose, Goals, Approach, User stories, Release note, Documentation, Security checks, and Test environment are all thoroughly addressed. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
en/hooks/copy_md.py (1)

34-34: Drop the unused f-string prefix.

Ruff's F541 is correct here: Line 34 has no interpolation, so the f prefix just adds noise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@en/hooks/copy_md.py` at line 34, Remove the unnecessary f-string prefix on
the print call; replace the expression print(f"SUCCESS - llms.txt and
llms-full.txt generated.") with a normal string literal (print("SUCCESS -
llms.txt and llms-full.txt generated.")) so there is no unused f-string in
en/hooks/copy_md.py.
en/docs/assets/js/copy-page.js (1)

137-156: Avoid stacking document click handlers on re-init.

handleGlobalClick is recreated on every init(), so Lines 155-156 never remove the previous listener. With the observer-driven re-init path, that leaves stale document-level handlers behind. Store the active handler at module scope and unregister that exact reference before adding a new one.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@en/docs/assets/js/copy-page.js` around lines 137 - 156, The document click
handler is recreated on every init() causing stacked listeners; introduce a
module-scoped variable (e.g., activeGlobalClickHandler) to hold the current
handler reference, and before calling document.addEventListener('click',
handleGlobalClick) first check and call document.removeEventListener('click',
activeGlobalClickHandler) if set, then assign activeGlobalClickHandler =
handleGlobalClick and add it; update any teardown logic to remove
activeGlobalClickHandler as well so button/menu/setOpen use the single canonical
handler reference.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@en/docs/assets/js/copy-page.js`:
- Around line 177-183: The "View as Markdown" click handler builds mdPath
manually and misroutes version/lang landing pages; replace the manual path
creation in the cp-view click listener with a call to the existing helper
getFlattenedMarkdownUrlFromHtmlUrl(window.location.pathname) (or the appropriate
exported helper) and use its returned path for window.location.href; update the
event handler (menu.querySelector('.cp-view').addEventListener(...)) to compute
the markdown URL via getFlattenedMarkdownUrlFromHtmlUrl and then navigate to
window.location.origin + that result, leaving setOpen(false) as-is.

In `@en/hooks/copy_markdown.py`:
- Around line 9-21: The hook currently walks docs_dir and copies every .md file
(in the loop using os.walk, filename, src, rel, dst_dir), which ignores MkDocs
exclude patterns; fix it by loading the MkDocs config (e.g., via
mkdocs.config.load_config) and obtain the exclude/include patterns, then before
processing each file (after computing rel) check the file against those exclude
patterns using fnmatch or pathspec and skip any matches so excluded paths (like
includes/* and wip/*) are not copied to site_dir; ensure the check is applied in
the loop that currently tests filename.endswith(".md") and uses src/rel/dst_dir.

In `@en/hooks/copy_md.py`:
- Around line 10-13: In on_post_page, replace the use of page.file.src_path with
page.file.src_uri when building the rel_url stored in ALL_PAGES (the list
appended in the on_post_page function) so links are always normalized to
forward-slash URIs; update the line that sets rel_url to use src_uri and ensure
the rest of the existing check/appending logic against ALL_PAGES remains the
same.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 10b1e13e-0df6-40eb-8300-b2f1647462c1

📥 Commits

Reviewing files that changed from the base of the PR and between d623323 and cbe1490.

📒 Files selected for processing (4)
  • en/docs/assets/js/copy-page.js
  • en/hooks/copy_markdown.py
  • en/hooks/copy_md.py
  • en/mkdocs.yml

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
en/docs/assets/js/copy-page.js (1)

190-196: ⚠️ Potential issue | 🟠 Major

The “View as Markdown” path is still built inconsistently.

This still bypasses getFlattenedMarkdownUrlFromHtmlUrl(), so pages like /en/latest/ and /en/4.6.0/ navigate to /en/latest.md and /en/4.6.0.md instead of the mirrored .../index.md target. Reuse the same helper here so view/copy resolve the same source.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@en/docs/assets/js/copy-page.js` around lines 190 - 196, The "View as
Markdown" handler builds mdPath inconsistently; replace the manual
pathname-to-.md logic inside the menu.querySelector('.cp-view') click listener
with a call to getFlattenedMarkdownUrlFromHtmlUrl(window.location.pathname) so
it mirrors the same flattening used elsewhere (then set window.location.href to
window.location.origin + that returned path and call setOpen(false)); ensure
getFlattenedMarkdownUrlFromHtmlUrl is in scope/imported where the handler is
defined so view and copy resolve the same source.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@en/docs/assets/js/copy-page.js`:
- Around line 20-48: The current getFlattenedMarkdownUrlFromHtmlUrl function
reconstructs a source path from window.location.href which is lossy; instead
have the page template emit the actual source path (page.file.src_path) into the
HTML as a data attribute (e.g. data-src-path on body or a meta tag) and update
getFlattenedMarkdownUrlFromHtmlUrl to first read that attribute and return its
corresponding URL if present (falling back to the existing reconstruction logic
only when the attribute is missing). Locate getFlattenedMarkdownUrlFromHtmlUrl
and the copy/view/AI button code that calls it, add reading of the new data
attribute (and appropriate normalization) and ensure the returned URL points
exactly to the source src_path-derived markdown file rather than guessing from
the current href.
- Around line 218-224: The current isHomePage check hard-codes strings and uses
pathname.endsWith('/index.html'); replace it with a normalized-segment check:
read window.location.pathname into pathname, trim leading/trailing slashes,
split into segments, then set isHomePage true when pathname === '/' OR when
segments indicate a site/version landing page (e.g., segments.length === 0 or
segments.length === 1 for top-level, or segments.length === 2 where segments[0]
is a language like 'en' and segments[1] is a version token such as 'latest' or a
semver-like pattern); also handle legacy single-segment roots like 'docs-mi' by
checking against a small whitelist if needed—update the isHomePage assignment to
use this segment-based logic instead of the fixed strings and the
endsWith('/index.html') check.
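The segment-based check suggested above can be sketched in Python for illustration; the real change would go in copy-page.js. The function name is_home_page, the language set, the legacy-root whitelist, and the version pattern are all assumptions of this sketch, and it reads "top-level" narrowly as a bare language or whitelisted root.

```python
import re

LANGUAGES = {"en"}          # assumed set of language codes
LEGACY_ROOTS = {"docs-mi"}  # assumed whitelist of single-segment site roots
VERSION_RE = re.compile(r"^(latest|\d+\.\d+\.\d+)$")  # "latest" or x.y.z


def is_home_page(pathname: str) -> bool:
    """Return True for the site root and for language/version landing pages."""
    segments = [s for s in pathname.split("/") if s]
    # Treat ".../index.html" the same as the directory URL.
    if segments and segments[-1] == "index.html":
        segments.pop()
    if not segments:
        return True
    if len(segments) == 1:
        return segments[0] in LANGUAGES or segments[0] in LEGACY_ROOTS
    if len(segments) == 2:
        return segments[0] in LANGUAGES and bool(VERSION_RE.match(segments[1]))
    return False
```

Comparing normalized segments avoids maintaining a list of literal path strings, so a new version directory such as /en/4.6.0/ needs no code change.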

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bc6e8077-1ed8-4835-b44a-2891be59ba56

📥 Commits

Reviewing files that changed from the base of the PR and between cbe1490 and 04cffea.

📒 Files selected for processing (1)
  • en/docs/assets/js/copy-page.js

- Don't show the copy-page toolbar icon on homepage (/ or version root pages)
- Update home page exclusion logic to include /docs-mi/ for GitHub Pages and localhost
- Add visual feedback when copying: show 'Copied!' text in menu item
- Auto-revert 'Copied!' text after 2 seconds
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
en/hooks/copy_markdown.py (1)

10-33: ⚠️ Potential issue | 🟠 Major

Excluded files can still be published due to plugin config type mismatch.

plugins["exclude"] is handled as a dict, so exclude_patterns can stay empty and the hook may mirror excluded docs into site_dir. Also normalize rel before glob matching to avoid separator mismatches.

In MkDocs hook code, what is the type of `config["plugins"]["exclude"]`, and what is the recommended way to read plugin options like `glob` from that plugin instance?
Suggested fix
-    plugins = config.get("plugins", {})
-    if "exclude" in plugins:
-        exclude_config = plugins["exclude"]
-        if isinstance(exclude_config, dict) and "glob" in exclude_config:
-            exclude_patterns = exclude_config["glob"]
+    plugins = config.get("plugins")
+    if plugins and "exclude" in plugins:
+        exclude_plugin = plugins["exclude"]
+        exclude_patterns = list(getattr(exclude_plugin, "config", {}).get("glob", []))
@@
-            rel = os.path.relpath(src, docs_dir)
+            rel = os.path.relpath(src, docs_dir)
+            rel_posix = rel.replace(os.sep, "/")
@@
-            for pattern in exclude_patterns:
-                if fnmatch.fnmatch(rel, pattern) or fnmatch.fnmatch(rel, f"{pattern}/*"):
+            for pattern in exclude_patterns:
+                if fnmatch.fnmatch(rel_posix, pattern):
                     should_skip = True
                     break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@en/hooks/copy_markdown.py` around lines 10 - 33, The hook assumes
plugins["exclude"] is a dict but MkDocs exposes plugin instances; update the
logic that builds exclude_patterns to handle a plugin instance by retrieving its
config (e.g., plugin = plugins["exclude"]; use plugin.config.get("glob") or
similar) and fall back to the dict case if needed, ensure exclude_patterns is
always a list, and normalize rel to POSIX-style paths (e.g., replace os.sep or
use PurePosixPath/posix path) before fnmatch matching so glob separators align.
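One way to read the exclude globs defensively, sketched under the assumption that the exclude plugin stores its options in a dict-like config attribute with a glob key. The helper names get_exclude_patterns and is_excluded are hypothetical, and the plain-dict branch is kept only as a fallback.

```python
import fnmatch
import os


def get_exclude_patterns(config) -> list:
    """Return the exclude globs as a list, or [] when none are configured."""
    plugins = config.get("plugins") or {}
    if "exclude" not in plugins:
        return []
    plugin = plugins["exclude"]
    # A plugin instance keeps its options on `.config`; a plain dict is used as-is.
    options = plugin if isinstance(plugin, dict) else getattr(plugin, "config", {})
    patterns = options.get("glob", []) or []
    return [patterns] if isinstance(patterns, str) else list(patterns)


def is_excluded(rel_path: str, patterns) -> bool:
    """Match a docs-relative path against the globs using forward slashes."""
    rel_posix = rel_path.replace(os.sep, "/")
    return any(fnmatch.fnmatch(rel_posix, pattern) for pattern in patterns)
```

In the hook's walk loop, a file would be skipped when is_excluded(rel, patterns) is true, before any output directory is created.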
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@en/hooks/copy_md.py`:
- Around line 30-31: The generated markdown link text is not escaped, so titles
containing '[' or ']' will break the link; update the block that builds
full_lines (the for loop over ALL_PAGES and the full_lines.append call) to
escape square brackets in p['title'] (and optionally backslashes) before
formatting the link, e.g. replace '[' with '\[' and ']' with '\]' on p['title']
and then use that escaped title in the f"- [{...}](./{p['url']})" append.
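A small sketch of the escaping step follows. The helper names escape_link_text and link_line are hypothetical; the link format is the one quoted in the comment above.

```python
def escape_link_text(title: str) -> str:
    """Backslash-escape characters that would end Markdown link text early."""
    # Escape backslashes first so the escapes added below are not doubled.
    return title.replace("\\", "\\\\").replace("[", "\\[").replace("]", "\\]")


def link_line(title: str, url: str) -> str:
    """Build one list entry in the "- [title](./url)" format."""
    return f"- [{escape_link_text(title)}](./{url})"
```

For example, link_line("Arrays [beta]", "ref/arrays.md") yields the text `- [Arrays \[beta\]](./ref/arrays.md)`, which Markdown parsers read as a single link.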

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3e7e3d93-ac94-4ca4-9686-ab617e730f89

📥 Commits

Reviewing files that changed from the base of the PR and between e0c3aec and 4ebeb55.

📒 Files selected for processing (3)
  • en/docs/assets/js/copy-page.js
  • en/hooks/copy_markdown.py
  • en/hooks/copy_md.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • en/docs/assets/js/copy-page.js
