✨ feat: generate DOCX directly from Markdown #34
Convert DOCX output straight from Markdown via pandoc's gfm reader instead of the intermediate HTML, so pandoc emits clean Word-native styles rather than HTML-derived ones.

- Branch the pipeline early in Convert; DOCX no longer builds HTML
- Add mermaid_markdown.go to extract fenced Mermaid blocks from raw Markdown and splice rendered PNGs back as image references
- Style the pandoc reference document for readable Japanese output: Yu Gothic font (theme), 10.5pt body, compact headings, plus the existing table borders; expose -docx-font to override the family
- Bump Mermaid PNG scale 2 -> 4 for sharper embedded diagrams
- Resolve user images via --resource-path, diagrams via cmd.Dir

Entire-Checkpoint: 87ea4e680115
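The Mermaid extraction and splice-back step from the commit message can be sketched roughly as below. This is not the code in mermaid_markdown.go: `extractMermaid`, `spliceImages`, and the `{{MERMAID_n}}` placeholder format are hypothetical names chosen for illustration, and the sketch assumes plain three-backtick fences only (no tilde fences, no longer fences, no info-string attributes).

```go
package main

import (
	"fmt"
	"strings"
)

// fence is the three-backtick marker that opens and closes a code fence.
var fence = strings.Repeat("`", 3)

// placeholder returns the marker left where block i was lifted out.
// The format is an assumption for this sketch, not the PR's format.
func placeholder(i int) string { return fmt.Sprintf("{{MERMAID_%d}}", i) }

// extractMermaid scans the Markdown line by line, removes each mermaid
// fence, and leaves a placeholder line in its place. Any other fence is
// copied through unchanged. An unterminated mermaid fence is dropped.
func extractMermaid(md string) (string, []string) {
	var out, blocks, cur []string
	inMermaid, inOther := false, false
	for _, line := range strings.Split(md, "\n") {
		trimmed := strings.TrimSpace(line)
		switch {
		case inMermaid && trimmed == fence:
			// Closing fence: store the block body, emit its placeholder.
			blocks = append(blocks, strings.Join(cur, "\n"))
			out = append(out, placeholder(len(blocks)-1))
			cur, inMermaid = nil, false
		case inMermaid:
			cur = append(cur, line)
		case inOther:
			out = append(out, line)
			if trimmed == fence {
				inOther = false
			}
		case trimmed == fence+"mermaid":
			inMermaid = true
		case strings.HasPrefix(trimmed, fence):
			inOther = true
			out = append(out, line)
		default:
			out = append(out, line)
		}
	}
	return strings.Join(out, "\n"), blocks
}

// spliceImages swaps each placeholder for a Markdown image reference
// pointing at the PNG rendered for that block.
func spliceImages(md string, pngs []string) string {
	for i, p := range pngs {
		md = strings.ReplaceAll(md, placeholder(i), fmt.Sprintf("![](%s)", p))
	}
	return md
}
```

Closing the placeholder with `}}` keeps index 1 from matching inside index 10 during replacement.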
📝 Walkthrough
The DOCX conversion pipeline moves from an HTML-intermediate approach to a direct Markdown-to-pandoc (…
Changes
DOCX Pipeline Overhaul
Sequence Diagram(s)
sequenceDiagram
actor CLI as CLI (-docx-font flag)
participant Conv as Converter.Convert
participant MDOCX as convertMarkdownDOCX
participant Extract as extractMermaidFromMarkdown
participant Mmdc as renderMermaidPNGs (mmdc)
participant RefDoc as buildReferenceDoc + patchReferenceDoc
participant Pandoc as pandoc (gfm reader)
CLI->>Conv: Convert(inputMD, outputPath)
Conv->>Conv: resolve srcDir, absOut
Conv->>MDOCX: mdBytes, srcDir, absOut
MDOCX->>Extract: raw Markdown bytes
Extract-->>MDOCX: rewritten Markdown + mermaid block map
MDOCX->>Mmdc: render each block → workDir/*.png
Mmdc-->>MDOCX: PNG paths
MDOCX->>MDOCX: replace placeholders with image refs
MDOCX->>MDOCX: write document.md to workDir
MDOCX->>RefDoc: pandoc --print-default-data-file reference.docx
RefDoc->>RefDoc: patchReferenceDoc (styles.xml + theme1.xml rewrites)
RefDoc-->>MDOCX: reference.docx written to workDir
MDOCX->>Pandoc: -f gfm document.md -o out.docx --reference-doc --resource-path
Pandoc-->>Conv: DOCX output file
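The final pandoc step in the diagram might be assembled along these lines. `pandocDOCXCmd` is a hypothetical helper, not a function from this PR, and the file names document.md and reference.docx are taken from the diagram. The pandoc manual states that once `--resource-path` is given, the working directory is searched only if it is listed explicitly, so this sketch adds `.` alongside the source directory; whether the PR does the same is not shown here.

```go
package main

import (
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// pandocDOCXCmd builds (but does not run) a pandoc invocation that reads
// document.md from workDir with the gfm reader and writes a DOCX to absOut.
// Hypothetical helper for illustration only.
func pandocDOCXCmd(workDir, srcDir, absOut string) *exec.Cmd {
	// "." resolves generated diagrams relative to cmd.Dir (the work dir);
	// srcDir resolves images referenced by the user's Markdown.
	resourcePath := strings.Join([]string{".", srcDir}, string(os.PathListSeparator))

	cmd := exec.Command("pandoc",
		"-f", "gfm",
		"document.md",
		"-o", absOut,
		"--reference-doc="+filepath.Join(workDir, "reference.docx"),
		"--resource-path="+resourcePath,
	)
	cmd.Dir = workDir
	return cmd
}
```

A caller would then run something like `pandocDOCXCmd(workDir, srcDir, absOut).CombinedOutput()` and include pandoc's output in the error message on failure.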
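Overview
DOCX output is now converted directly from Markdown with pandoc's gfm reader rather than going through HTML. As a result, pandoc generates clean Word-native paragraph and list styles instead of the cluttered styles derived from HTML. Comparing the output against an easy-to-read docx that a person had made by hand showed two main causes of poor readability: (1) structural disruption from the HTML route, and (2) no Japanese font being specified (the theme's East Asian typeface is empty, so rendering falls back to a default font). This change resolves both.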
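To illustrate the second cause: in an OOXML theme part (theme1.xml), the East Asian typeface is an `<a:ea typeface="..."/>` element under the major and minor font definitions, and an empty value leaves the choice to fallback. A minimal sketch of filling it in follows. `setEastAsianFont` is a hypothetical, simplified stand-in for the PR's setThemeFonts helper, whose real implementation is not shown on this page, and the regex-based rewrite is an assumption.

```go
package main

import (
	"html"
	"regexp"
)

// eaTypeface matches the opening of an East Asian typeface element in
// theme1.xml, including its (possibly empty) attribute value.
var eaTypeface = regexp.MustCompile(`<a:ea\s+typeface="[^"]*"`)

// setEastAsianFont rewrites every <a:ea typeface="..."> in the theme XML to
// use the given font family. The font name is escaped so that characters
// such as & or " cannot break the XML attribute.
// Hypothetical helper for illustration only.
func setEastAsianFont(themeXML, font string) string {
	repl := `<a:ea typeface="` + html.EscapeString(font) + `"`
	// ReplaceAllLiteralString avoids treating "$" in a font name as a
	// regexp template reference.
	return eaTypeface.ReplaceAllLiteralString(themeXML, repl)
}
```

Calling `setEastAsianFont(theme, "Yu Gothic")` leaves the `<a:latin>` and `<a:cs>` entries untouched and changes only the East Asian slot.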
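Changes
- Convert() checks the format, and DOCX goes to convertMarkdownDOCX without building HTML. The PDF path is unchanged (HTML → Chromium)
- mermaid_markdown.go (new): scans raw Markdown line by line to extract mermaid fences (other fences are preserved) and splices the rendered PNGs back in as image references
- The font can be changed with -docx-font
- User images are resolved via --resource-path; generated diagrams are resolved via cmd.Dir (the working directory)
Tests
- Added unit tests for extractMermaidFromMarkdown, fence parsing, and the reference-document styling helpers (setThemeFonts, setBodyFontSize, shrinkHeadings)
- Updated the convertDOCX tests to the new API (convertMarkdownDOCX)
- go build, go vet, gofmt, and all tests are green; the PDF integration tests show no regressions
Documentation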
Summary by CodeRabbit
New Features
-docx-font CLI option to customize font family for DOCX output (defaults to Yu Gothic for East Asian text support).
Improvements
Documentation