Update markdown copy logic to serve raw content via generated HTML pages instead of standalone files #2180
Sachindu-Nethmin wants to merge 7 commits into wso2:main
… logic to fetch processed content
…ges instead of standalone files
Note: Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough
Compute and fetch a page's flattened Markdown at click time in the client UI; add two MkDocs build hooks that mirror Markdown into HTML under the site output and emit site index files; register those hooks in the MkDocs config.
Sequence Diagram(s)
sequenceDiagram
participant User as "User"
participant Browser as "Browser (copy-page.js)"
participant Site as "Site (static .md/.html)"
participant LLM as "External LLM (chat.openai.com / perplexity / claude.ai)"
User->>Browser: Click "Copy" / "View as Markdown" / "Open in LLM"
Browser->>Site: compute flattened `.md` URL (drop hashes, strip index/.html)
Browser->>Site: fetch computed `.md` URL
Site-->>Browser: return HTML (mirrored `.md` content)
Browser->>Browser: parse response via DOMParser -> extract body.textContent
alt Copy action
Browser->>User: write plain Markdown to clipboard, show "Copied!" feedback
else View action
Browser->>User: navigate window.location.href to computed `.md` URL
else Open in LLM
Browser->>LLM: open new tab with base-URL + prompt (noopener,noreferrer)
Browser->>User: show temporary button feedback (e.g., "Opening..."/"Copied!")
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (2)
en/hooks/copy_md.py (1)
34-34: Drop the unused f-string prefix.
Ruff's F541 is correct here: Line 34 has no interpolation, so the `f` prefix just adds noise.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@en/hooks/copy_md.py` at line 34, Remove the unnecessary f-string prefix on the print call; replace the expression print(f"SUCCESS - llms.txt and llms-full.txt generated.") with a normal string literal (print("SUCCESS - llms.txt and llms-full.txt generated.")) so there is no unused f-string in en/hooks/copy_md.py.
en/docs/assets/js/copy-page.js (1)
137-156: Avoid stacking document click handlers on re-init.
`handleGlobalClick` is recreated on every `init()`, so Lines 155-156 never remove the previous listener. With the observer-driven re-init path, that leaves stale document-level handlers behind. Store the active handler at module scope and unregister that exact reference before adding a new one.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@en/docs/assets/js/copy-page.js` around lines 137 - 156, The document click handler is recreated on every init() causing stacked listeners; introduce a module-scoped variable (e.g., activeGlobalClickHandler) to hold the current handler reference, and before calling document.addEventListener('click', handleGlobalClick) first check and call document.removeEventListener('click', activeGlobalClickHandler) if set, then assign activeGlobalClickHandler = handleGlobalClick and add it; update any teardown logic to remove activeGlobalClickHandler as well so button/menu/setOpen use the single canonical handler reference.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@en/docs/assets/js/copy-page.js`:
- Around line 177-183: The "View as Markdown" click handler builds mdPath
manually and misroutes version/lang landing pages; replace the manual path
creation in the cp-view click listener with a call to the existing helper
getFlattenedMarkdownUrlFromHtmlUrl(window.location.pathname) (or the appropriate
exported helper) and use its returned path for window.location.href; update the
event handler (menu.querySelector('.cp-view').addEventListener(...)) to compute
the markdown URL via getFlattenedMarkdownUrlFromHtmlUrl and then navigate to
window.location.origin + that result, leaving setOpen(false) as-is.
In `@en/hooks/copy_markdown.py`:
- Around line 9-21: The hook currently walks docs_dir and copies every .md file
(in the loop using os.walk, filename, src, rel, dst_dir), which ignores MkDocs
exclude patterns; fix it by loading the MkDocs config (e.g., via
mkdocs.config.load_config) and obtain the exclude/include patterns, then before
processing each file (after computing rel) check the file against those exclude
patterns using fnmatch or pathspec and skip any matches so excluded paths (like
includes/* and wip/*) are not copied to site_dir; ensure the check is applied in
the loop that currently tests filename.endswith(".md") and uses src/rel/dst_dir.
In `@en/hooks/copy_md.py`:
- Around line 10-13: In on_post_page, replace the use of page.file.src_path with
page.file.src_uri when building the rel_url stored in ALL_PAGES (the list
appended in the on_post_page function) so links are always normalized to
forward-slash URIs; update the line that sets rel_url to use src_uri and ensure
the rest of the existing check/appending logic against ALL_PAGES remains the
same.
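The difference between `src_path` and `src_uri` raised in the last item only shows up when the site is built on Windows. As a minimal sketch (not the PR's code), the helper below mimics the forward-slash normalization the comment attributes to `src_uri`; `to_forward_slash_uri` is a hypothetical name used only for illustration.

```python
from pathlib import PureWindowsPath


def to_forward_slash_uri(src_path: str) -> str:
    """Return src_path with every separator rewritten as '/'.

    PureWindowsPath accepts both '\\' and '/' as separators, so it works as a
    platform-independent normalizer for relative documentation paths.
    """
    return PureWindowsPath(src_path).as_posix()


# A Windows-style path and a POSIX-style path normalize to the same link target.
windows_style = to_forward_slash_uri("install\\setup\\index.md")
posix_style = to_forward_slash_uri("install/setup/index.md")
```

Storing the normalized form in `ALL_PAGES` would keep the links written to `llms.txt` identical regardless of the build platform.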
📒 Files selected for processing (4)
- en/docs/assets/js/copy-page.js
- en/hooks/copy_markdown.py
- en/hooks/copy_md.py
- en/mkdocs.yml
Actionable comments posted: 2
♻️ Duplicate comments (1)
en/docs/assets/js/copy-page.js (1)
190-196: ⚠️ Potential issue | 🟠 Major
The “View as Markdown” path is still built inconsistently.
This still bypasses `getFlattenedMarkdownUrlFromHtmlUrl()`, so pages like `/en/latest/` and `/en/4.6.0/` navigate to `/en/latest.md` and `/en/4.6.0.md` instead of the mirrored `.../index.md` target. Reuse the same helper here so view/copy resolve the same source.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@en/docs/assets/js/copy-page.js` around lines 190 - 196, The "View as Markdown" handler builds mdPath inconsistently; replace the manual pathname-to-.md logic inside the menu.querySelector('.cp-view') click listener with a call to getFlattenedMarkdownUrlFromHtmlUrl(window.location.pathname) so it mirrors the same flattening used elsewhere (then set window.location.href to window.location.origin + that returned path and call setOpen(false)); ensure getFlattenedMarkdownUrlFromHtmlUrl is in scope/imported where the handler is defined so view and copy resolve the same source.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@en/docs/assets/js/copy-page.js`:
- Around line 20-48: The current getFlattenedMarkdownUrlFromHtmlUrl function
reconstructs a source path from window.location.href which is lossy; instead
have the page template emit the actual source path (page.file.src_path) into the
HTML as a data attribute (e.g. data-src-path on body or a meta tag) and update
getFlattenedMarkdownUrlFromHtmlUrl to first read that attribute and return its
corresponding URL if present (falling back to the existing reconstruction logic
only when the attribute is missing). Locate getFlattenedMarkdownUrlFromHtmlUrl
and the copy/view/AI button code that calls it, add reading of the new data
attribute (and appropriate normalization) and ensure the returned URL points
exactly to the source src_path-derived markdown file rather than guessing from
the current href.
- Around line 218-224: The current isHomePage check hard-codes strings and uses
pathname.endsWith('/index.html'); replace it with a normalized-segment check:
read window.location.pathname into pathname, trim leading/trailing slashes,
split into segments, then set isHomePage true when pathname === '/' OR when
segments indicate a site/version landing page (e.g., segments.length === 0 or
segments.length === 1 for top-level, or segments.length === 2 where segments[0]
is a language like 'en' and segments[1] is a version token such as 'latest' or a
semver-like pattern); also handle legacy single-segment roots like 'docs-mi' by
checking against a small whitelist if needed—update the isHomePage assignment to
use this segment-based logic instead of the fixed strings and the
endsWith('/index.html') check.
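The segment-based home-page check described in the last item can be sketched as follows. The real logic lives in `copy-page.js`; this is a Python rendering for illustration only, and the names `is_home_page`, `LANGUAGES`, `LEGACY_ROOTS` and the version pattern are assumptions rather than anything taken from the PR. Dropping a trailing `index.html` segment before counting is also an assumption.

```python
import re

# Assumed values for illustration: one language code, one legacy root,
# and "latest" or a three-part numeric version such as 4.6.0.
LANGUAGES = {"en"}
LEGACY_ROOTS = {"docs-mi"}
VERSION_TOKEN = re.compile(r"latest|\d+\.\d+\.\d+")


def is_home_page(pathname: str) -> bool:
    """Decide whether a URL path is the site root or a version landing page."""
    segments = [s for s in pathname.strip("/").split("/") if s]
    # Treat ".../index.html" the same as its containing directory.
    if segments and segments[-1] == "index.html":
        segments = segments[:-1]
    if not segments:
        return True  # "/" itself
    if len(segments) == 1:
        return segments[0] in LANGUAGES or segments[0] in LEGACY_ROOTS
    if len(segments) == 2:
        return (segments[0] in LANGUAGES
                and VERSION_TOKEN.fullmatch(segments[1]) is not None)
    return False


root = is_home_page("/")
version_root = is_home_page("/en/4.6.0/")
content_page = is_home_page("/en/latest/overview/overview/")
```

Matching on normalized segments avoids listing every version root as a literal string, which is the maintenance problem the comment points at.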
📒 Files selected for processing (1)
en/docs/assets/js/copy-page.js
- Don't show the copy-page toolbar icon on homepage (/ or version root pages)
- Update home page exclusion logic to include /docs-mi/ for GitHub Pages and localhost
- Add visual feedback when copying: show 'Copied!' text in menu item
- Auto-revert 'Copied!' text after 2 seconds
04cffea to e162dbc
Actionable comments posted: 1
♻️ Duplicate comments (1)
en/hooks/copy_markdown.py (1)
10-33: ⚠️ Potential issue | 🟠 Major
Excluded files can still be published due to plugin config type mismatch.
`plugins["exclude"]` is handled as a `dict`, so `exclude_patterns` can stay empty and the hook may mirror excluded docs into `site_dir`. Also normalize `rel` before glob matching to avoid separator mismatches.
In MkDocs hook code, what is the type of `config["plugins"]["exclude"]`, and what is the recommended way to read plugin options like `glob` from that plugin instance?
Suggested fix
- plugins = config.get("plugins", {})
- if "exclude" in plugins:
-     exclude_config = plugins["exclude"]
-     if isinstance(exclude_config, dict) and "glob" in exclude_config:
-         exclude_patterns = exclude_config["glob"]
+ plugins = config.get("plugins")
+ if plugins and "exclude" in plugins:
+     exclude_plugin = plugins["exclude"]
+     exclude_patterns = list(getattr(exclude_plugin, "config", {}).get("glob", []))
@@
- rel = os.path.relpath(src, docs_dir)
+ rel = os.path.relpath(src, docs_dir)
+ rel_posix = rel.replace(os.sep, "/")
@@
- for pattern in exclude_patterns:
-     if fnmatch.fnmatch(rel, pattern) or fnmatch.fnmatch(rel, f"{pattern}/*"):
+ for pattern in exclude_patterns:
+     if fnmatch.fnmatch(rel_posix, pattern):
          should_skip = True
          break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@en/hooks/copy_markdown.py` around lines 10 - 33, The hook assumes plugins["exclude"] is a dict but MkDocs exposes plugin instances; update the logic that builds exclude_patterns to handle a plugin instance by retrieving its config (e.g., plugin = plugins["exclude"]; use plugin.config.get("glob") or similar) and fall back to the dict case if needed, ensure exclude_patterns is always a list, and normalize rel to POSIX-style paths (e.g., replace os.sep or use PurePosixPath/posix path) before fnmatch matching so glob separators align.
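The two points in this comment (read `glob` from either a plugin instance or a plain dict, then match against a forward-slash path) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the hook's actual code: `read_exclude_globs` and `is_excluded` are hypothetical helpers, and the plugin instance is stood in for by a `SimpleNamespace` whose `config` attribute behaves like a dict.

```python
import fnmatch
from types import SimpleNamespace


def read_exclude_globs(plugins):
    """Return the exclude plugin's glob list, or [] when it is not configured."""
    plugin = plugins.get("exclude") if plugins else None
    if plugin is None:
        return []
    if isinstance(plugin, dict):  # fallback: raw dict-style configuration
        return list(plugin.get("glob", []))
    # Plugin-instance case: options are assumed to live on a dict-like .config.
    return list((getattr(plugin, "config", None) or {}).get("glob", []))


def is_excluded(rel_path, patterns):
    """Match a docs-relative path against the globs using '/' separators."""
    rel_posix = rel_path.replace("\\", "/")
    return any(fnmatch.fnmatch(rel_posix, pattern) for pattern in patterns)


# Stand-in for a plugin instance (hypothetical; real plugins are richer objects).
fake_plugins = {"exclude": SimpleNamespace(config={"glob": ["includes/*", "wip/*"]})}
patterns = read_exclude_globs(fake_plugins)
skipped = [p for p in ["includes/header.md", "wip\\draft.md", "guides/setup.md"]
           if is_excluded(p, patterns)]
```

Note that `fnmatch` lets `*` match across `/`, so a pattern such as `includes/*` also covers files in nested subdirectories.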
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@en/hooks/copy_md.py`:
- Around line 30-31: The generated markdown link text is not escaped, so titles
containing '[' or ']' will break the link; update the block that builds
full_lines (the for loop over ALL_PAGES and the full_lines.append call) to
escape square brackets in p['title'] (and optionally backslashes) before
formatting the link, e.g. replace '[' with '\[' and ']' with '\]' on p['title']
and then use that escaped title in the f"- [{...}](./{p['url']})" append.
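As a minimal sketch of the escaping this comment asks for, assuming each page entry is a dict with `title` and `url` keys as the prompt implies; `escape_link_text` and `format_index_line` are hypothetical helper names, not functions from the PR.

```python
def escape_link_text(title: str) -> str:
    """Backslash-escape characters that would end Markdown link text early."""
    # Escape backslashes first so the escapes added below are not doubled.
    return title.replace("\\", "\\\\").replace("[", "\\[").replace("]", "\\]")


def format_index_line(page: dict) -> str:
    """Build one list entry of the generated index from a page record."""
    return f"- [{escape_link_text(page['title'])}](./{page['url']})"


line = format_index_line({"title": "Connectors [Beta]", "url": "connectors/beta.md"})
```

Without the escaping, the `]` inside the title would close the link text before the intended position.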
📒 Files selected for processing (3)
- en/docs/assets/js/copy-page.js
- en/hooks/copy_markdown.py
- en/hooks/copy_md.py
🚧 Files skipped from review as they are similar to previous changes (1)
- en/docs/assets/js/copy-page.js
… page detection logic
Purpose
Adds a "View as Markdown" option to the copy-page toolbar dropdown so readers can view the raw Markdown source of any documentation
page directly in the browser without triggering a file download.
Goals
`mkdocs serve` (localhost) and GitHub Pages with zero web-server configuration.
Approach
MkDocs post-build hook (`en/hooks/copy_markdown.py`)
A new `on_post_build` hook walks `docs_dir` and, for every `.md` source file, creates a directory named `<page>.md/` inside `site_dir` containing a minimal `index.html` that renders the escaped Markdown content in a monospace, pre-wrap style. Because the server responds with `Content-Type: text/html` for the `index.html`, browsers render it inline instead of triggering a download, which would happen with a raw `.md` file served as `application/octet-stream`.
`en/hooks/copy_md.py` cleanup
The previous hook wrote raw `.md` files to the same paths now used as directories, and registered a Tornado handler (`on_serve`) that would have returned 403 on those directories. Both were removed. Page-info collection for `llms.txt`/`llms-full.txt` is retained, simplified to use `page.file.src_path` directly.
JavaScript (`en/docs/assets/js/copy-page.js`)
- The `.cp-view` click handler now navigates the current tab (`window.location.href`) to `<origin><pathname-without-trailing-slash>.md`, replacing the previous `window.open(..., '_blank')`.
- `fetchFlattenedMarkdownForCurrentPage` (used by "Copy page" and AI-prompt features) is updated to parse the HTML wrapper with `DOMParser` and return `body.textContent`, which decodes HTML entities and restores the original Markdown string.
URL derivation examples:
| Page URL | Markdown view URL |
|---|---|
| http://localhost:8000/quick-start-guide/quick-start-guide/ | http://localhost:8000/quick-start-guide/quick-start-guide.md |
| https://mi.docs.wso2.com/en/latest/overview/overview/ | https://mi.docs.wso2.com/en/latest/overview/overview.md |
| http://localhost:8000/ | http://localhost:8000/index.md |
The server auto-redirects `.md` → `.md/` (appends a trailing slash for the directory) and serves `index.html`. The final address-bar URL will have a trailing slash; this is expected and correct.
User stories
own notes or tools without unwanted HTML formatting.
browser-rendered markup.
Release note
Added a "View as Markdown" option to the copy-page toolbar on the documentation site. Clicking it redirects the current tab to a plain-text
view of the page's raw Markdown source, rendered inline in the browser with no download prompt.
Documentation
N/A — this is a documentation site tooling change; it does not affect product documentation content.
Training
N/A
Certification
N/A — no impact on product certification exams.
Marketing
N/A
Automation tests
`mkdocs serve` (localhost) and confirmed the build produces `site/<page>.md/index.html` for every source Markdown file. Verified that "View as Markdown" renders inline and "Copy page" still copies clean Markdown text.
Security checks
Samples
N/A
Related PRs
N/A
Migrations (if applicable)
N/A — static site build change only; no data migration required.
Test environment
`mkdocs serve`)
Learning
- `on_post_build` (runs once after all pages are built) rather than `on_post_page` (runs per page) because the hook reads from `docs_dir` directly and doesn't need per-page context.
- `.md` files: Servers (including GitHub Pages and Python's built-in HTTP server) serve `.md` files as `application/octet-stream`, triggering a download dialog. Wrapping content in a `<body>` inside an `index.html` directory forces `Content-Type: text/html` with no server configuration.
- `html.escape` + `DOMParser` round-trip: Python's `html.escape()` encodes `<`, `>`, `&`, `"` so the raw Markdown is safe inside `<body>`. On the JS side, `DOMParser` + `body.textContent` decodes the entities back to the original characters, making the same URL usable for both browser viewing and programmatic `fetch()` in the "Copy page" feature.
- `StaticFileHandler` + directories: Tornado returns HTTP 403 for directory paths, so the existing `on_serve` handler that intercepted `*.md` requests had to be removed to let MkDocs' default handler redirect to the `.md/` directory and serve `index.html`.
Summary by CodeRabbit
New Features
Improvements