✨ feat: add DOCX output via pandoc #33

Merged: 135yshr merged 5 commits into main from feat/docx-output, Jun 19, 2026

135yshr (Owner) commented Jun 19, 2026

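Summary

Adds DOCX output alongside the existing Markdown → PDF conversion. The final stage now branches on Config.Format: when docx is selected, the already-built HTML is converted by the external pandoc CLI, reusing the existing HTML assembly, image copying, and Mermaid processing. PDF remains the default.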

Usage

md2pdf -format docx document.md      # → document.docx
md2pdf -o report.docx document.md    # format inferred from the extension

Changes

Add DOCX output (e402e8c)

  • Add Format (pdf|docx) and PandocPath to Config
  • Add the -format / -pandoc flags and resolveFormat (determines the format from -format or the -o extension)
  • docx.go (new): convertDOCX / findPandoc
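
The format resolution described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the signature of resolveFormat, the error wording, and the handling of unrecognised extensions are assumptions. Only the behaviour (flag or -o extension, conflict detection, PDF as the default) comes from this conversation.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveFormat is a hypothetical reconstruction. flagValue is the raw
// -format value ("" when unset); outPath is the -o value ("" when unset).
func resolveFormat(flagValue, outPath string) (string, error) {
	fromFlag := strings.ToLower(strings.TrimSpace(flagValue))
	if fromFlag != "" && fromFlag != "pdf" && fromFlag != "docx" {
		return "", fmt.Errorf("unsupported format %q (want pdf or docx)", flagValue)
	}

	// Only a recognised extension counts as a format hint (assumption).
	fromExt := ""
	switch strings.ToLower(filepath.Ext(outPath)) {
	case ".pdf":
		fromExt = "pdf"
	case ".docx":
		fromExt = "docx"
	}

	switch {
	case fromFlag != "" && fromExt != "" && fromFlag != fromExt:
		return "", fmt.Errorf("-format %s conflicts with output file %q", fromFlag, outPath)
	case fromFlag != "":
		return fromFlag, nil
	case fromExt != "":
		return fromExt, nil
	default:
		return "pdf", nil // PDF stays the default
	}
}

func main() {
	cases := [][2]string{{"docx", ""}, {"", "report.docx"}, {"", ""}, {"pdf", "report.docx"}}
	for _, c := range cases {
		format, err := resolveFormat(c[0], c[1])
		fmt.Printf("-format=%q -o=%q -> %q, err=%v\n", c[0], c[1], format, err)
	}
}
```

Returning an error on a flag/extension mismatch, rather than silently preferring one, matches the "conflict detection" mentioned in the review walkthrough.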

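Improve figure and table formatting (f746ac0)

Fixes two defects found while converting real files:

  • Images not displayed: Word cannot reliably display the SVG that pandoc embeds from HTML, so for docx output Mermaid diagrams are rendered to PNG and embedded with <img> (PDF keeps inline SVG as before)
  • Tables without borders: pandoc's default Table style has no borders, so a reference document is generated and borders are injected into its Table style (respecting OOXML element order; best effort, falling back to the pandoc defaults on failure)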

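Verification

Checked against a real document (10 tables, 4 Mermaid diagrams):

  • ✅ All diagrams are embedded as valid PNGs and display in Word
  • ✅ All tables are displayed with borders
  • ✅ Reading the docx back also works correctly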

Tests

  • Added tests for findPandoc / convertDOCX / injectTableBorders / addTableBorders / buildHTML (img injection) / resolveFormat / parseFlags
  • All unit tests, gofmt, and go vet pass

Notes

DOCX output requires pandoc (the dependency list in CLAUDE.md has been updated).

Summary by CodeRabbit

Release Notes

  • New Features

    • Added DOCX output format support alongside existing PDF functionality
    • New -format flag to explicitly select output format (pdf/docx)
    • Auto-detection of output format from file extension when not explicitly specified
    • Improved table styling in DOCX documents with visible grid borders
  • Tests

    • Added comprehensive unit tests for format resolution and output logic
    • Added validation tests for DOCX pipeline and diagram rendering
  • Documentation

    • Updated architecture documentation to describe the complete conversion pipeline and external dependencies

135yshr added 2 commits June 19, 2026 14:26
Add a -format flag (pdf|docx) and infer the format from the -o
extension. For DOCX output, the assembled HTML is converted with the
external pandoc CLI, reusing the existing Mermaid/image pipeline. PDF
remains the default.

Note: pandoc's HTML→DOCX path does not reliably embed Mermaid SVGs, so
diagrams may be dropped in DOCX output.

Entire-Checkpoint: 54049f687aa4

Word could not display the inline/standalone SVG that pandoc embeds
from HTML, so Mermaid diagrams went missing. Render them to PNG via
mmdc for DOCX output (PDF keeps inline SVG).

Pandoc's default Table style has no borders, so GFM tables rendered as
borderless text. Generate a reference document and patch the Table
style with borders (best-effort, falls back to pandoc defaults).

Entire-Checkpoint: ff0225ce25ea

coderabbitai Bot commented Jun 19, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0f53cc52-44a1-4e2c-8966-8715b336e8cc

📥 Commits

Reviewing files that changed from the base of the PR and between f746ac0 and ab95b22.

📒 Files selected for processing (9)
  • CLAUDE.md
  • README.md
  • cmd/md2pdf/flags_test.go
  • internal/converter/converter.go
  • internal/converter/mermaid.go
  • website/content/doc/architecture.md
  • website/content/doc/getting-started.md
  • website/content/doc/usage.md
  • website/layouts/index.html
📝 Walkthrough

Adds DOCX output format support to the md2pdf converter. Mermaid diagrams render to PNG files for DOCX (instead of inline SVG). HTML assembly conditionally emits <img> tags. A new docx.go module handles pandoc discovery, invocation, and OOXML table-border patching in a reference document. Converter.Convert dispatches by Config.Format, and the CLI gains -format/-pandoc flags with extension-aware output path resolution.
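
The conditional `<img>` emission mentioned in the walkthrough might look roughly like this. It is a minimal sketch: diagramHTML is a hypothetical helper (the PR does this inside buildHTML), the struct carries only the two fields named in this review, and the alt text is an assumption.

```go
package main

import (
	"fmt"
	"html"
)

// mermaidBlock mirrors only the two fields named in the review; the real
// struct has other fields that are omitted here.
type mermaidBlock struct {
	SVGContent string // inline SVG, used for PDF output
	ImagePath  string // PNG file name, set only for DOCX output
}

// diagramHTML (hypothetical) picks the markup for one rendered diagram.
func diagramHTML(b mermaidBlock) string {
	if b.ImagePath != "" {
		// DOCX path: reference the PNG so pandoc embeds a raster image.
		return fmt.Sprintf(`<img src="%s" alt="diagram">`, html.EscapeString(b.ImagePath))
	}
	// PDF path: keep the SVG inline as before.
	return b.SVGContent
}

func main() {
	fmt.Println(diagramHTML(mermaidBlock{ImagePath: "diagram_0.png"}))
	fmt.Println(diagramHTML(mermaidBlock{SVGContent: "<svg></svg>"}))
}
```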

Changes

DOCX Output Format

Layer / File(s) / Summary

Data contracts: ImagePath, Format, PandocPath
  Files: internal/converter/parser.go, internal/converter/converter.go
  Summary: mermaidBlock gains an ImagePath field for DOCX <img> emission; Config gains Format ("pdf"|"docx") and PandocPath fields.

Mermaid PNG rendering for DOCX
  Files: internal/converter/mermaid.go
  Summary: renderMermaid branches on format: when docx, renders each diagram to a PNG via new renderSingleDiagramPNG and stores the filename in block.ImagePath; resolveMmdc helper centralizes mmdc binary resolution for both paths.

HTML <img> vs SVG embedding
  Files: internal/converter/html.go, internal/converter/html_test.go
  Summary: buildHTML selects an <img src="..."> element when block.ImagePath is set, falling back to inline SVGContent; test verifies placeholder replacement and <img> injection.

DOCX conversion pipeline
  Files: internal/converter/docx.go
  Summary: New module implementing findPandoc (probes default paths), convertDOCX (builds and runs the pandoc command with optional --reference-doc), buildReferenceDoc (generates and patches a reference DOCX), and addTableBorders/injectTableBorders (rewrites word/styles.xml in-zip to insert OOXML table border fragments).

DOCX pipeline tests
  Files: internal/converter/docx_test.go
  Summary: Tests for pandoc discovery, convertDOCX argument recording and failure propagation, XML border injection ordering/idempotency, zip-level patching selectivity, and fake-pandoc helpers.

Converter.Convert format dispatch
  Files: internal/converter/converter.go
  Summary: Convert resolves the absolute output path then switches on strings.ToLower(cfg.Format) to call convertDOCX for docx or fall through to the existing headless-Chromium PDF path.

CLI -format/-pandoc flags and format-aware output path
  Files: cmd/md2pdf/flags.go, cmd/md2pdf/flags_test.go, cmd/md2pdf/main.go
  Summary: Registers -format and -pandoc flags; resolveFormat infers/validates format from the flag or -o extension with conflict detection; derives default output filename extension from resolved format; populates Config.Format/Config.PandocPath; updates success log to print the uppercase format name.

Architecture docs
  Files: CLAUDE.md
  Summary: Updates pipeline narrative to describe docx.go, Config.Format options, flags.go pandoc auto-detection, and pandoc as a DOCX-specific external dependency.
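
The format dispatch summarised in the table above can be illustrated with a small sketch. The Config here carries only the two fields named in the review, and finalStage is a hypothetical stand-in that returns a label instead of calling convertDOCX or the Chromium PDF path.

```go
package main

import (
	"fmt"
	"strings"
)

// Config carries only the two fields named in the review; the real struct
// has more.
type Config struct {
	Format     string // "pdf" or "docx"
	PandocPath string // optional explicit pandoc binary
}

// finalStage (hypothetical) reports which backend the last pipeline stage
// would use for a given configuration.
func finalStage(cfg Config) string {
	switch strings.ToLower(cfg.Format) {
	case "docx":
		return "pandoc"
	default:
		// Anything else, including an empty Format, keeps the existing PDF path.
		return "chromium"
	}
}

func main() {
	fmt.Println(finalStage(Config{Format: "DOCX"}))
	fmt.Println(finalStage(Config{}))
}
```

Lower-casing the format before switching makes values such as "DOCX" and "docx" equivalent, and an empty value falls through to PDF.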

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant flags.go
  participant Converter
  participant renderMermaid
  participant convertDOCX
  participant pandoc

  User->>flags.go: md2pdf -format docx input.md
  flags.go->>flags.go: resolveFormat("-format docx", "")
  flags.go->>Converter: Config{Format:"docx", PandocPath:...}
  Converter->>renderMermaid: blocks, format="docx"
  renderMermaid->>renderMermaid: renderSingleDiagramPNG (mmdc → .png)
  renderMermaid-->>Converter: block.ImagePath set
  Converter->>Converter: buildHTML (embeds img tags)
  Converter->>convertDOCX: absHTML, absOut
  convertDOCX->>pandoc: --print-default-data-file reference.docx
  pandoc-->>convertDOCX: reference.docx bytes
  convertDOCX->>convertDOCX: addTableBorders (patch word/styles.xml)
  convertDOCX->>pandoc: -f html -o out.docx --reference-doc patched.docx
  pandoc-->>convertDOCX: out.docx
  convertDOCX-->>User: "DOCX saved to out.docx"
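
The final pandoc call in the diagram can be sketched as an argument builder. This is an illustration under assumptions: pandocArgs is a hypothetical helper and the file names are placeholders. Only the pandoc options themselves (-f, -o, --reference-doc) are real. The example prints the command instead of running it, so it works without pandoc installed.

```go
package main

import (
	"fmt"
	"os/exec"
)

// pandocArgs (hypothetical) builds the HTML-to-DOCX invocation. An empty
// referenceDoc means "use pandoc's built-in defaults".
func pandocArgs(htmlPath, outPath, referenceDoc string) []string {
	args := []string{"-f", "html", "-o", outPath}
	if referenceDoc != "" {
		args = append(args, "--reference-doc="+referenceDoc)
	}
	return append(args, htmlPath)
}

func main() {
	args := pandocArgs("build/index.html", "out.docx", "patched.docx")

	// Look pandoc up on PATH; the PR's findPandoc also probes default paths.
	path, err := exec.LookPath("pandoc")
	if err != nil {
		fmt.Println("pandoc not found on PATH; would have run: pandoc", args)
		return
	}
	cmd := exec.Command(path, args...)
	fmt.Println("would run:", cmd.String())
}
```

Treating the reference document as optional keeps the fallback simple: if patching the table style fails, the same call is made without --reference-doc.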

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • 135yshr/md2pdf#4: Both PRs modify CLAUDE.md, with #4 establishing the initial architecture documentation that this PR updates to reflect the new DOCX pipeline and pandoc dependency.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding DOCX output support via pandoc. It is concise, clear, and directly related to the primary feature introduced across multiple files in the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

Add -format/-pandoc flags, pandoc dependency, and DOCX examples to the
README and Hugo site (usage, getting-started, architecture, homepage).

Entire-Checkpoint: 63056eabcc75

135yshr self-assigned this Jun 19, 2026

coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
cmd/md2pdf/flags_test.go (1)

48-93: ⚡ Quick win

Add a -pandoc passthrough case in parseFlags tests.

The new CLI surface includes -pandoc, but this test block doesn’t verify Config.PandocPath mapping yet.

Suggested test addition
 func TestParseFlags_DefaultOutputExtensionFollowsFormat(t *testing.T) {
@@
 	t.Run("docx inferred from output extension", func(t *testing.T) {
@@
 	})
+
+	t.Run("pandoc path passthrough", func(t *testing.T) {
+		cfg, err := parseFlags([]string{"-format", "docx", "-pandoc", "/custom/pandoc", input})
+		if err != nil {
+			t.Fatalf("parseFlags: %v", err)
+		}
+		if cfg.PandocPath != "/custom/pandoc" {
+			t.Errorf("PandocPath = %q, want %q", cfg.PandocPath, "/custom/pandoc")
+		}
+	})
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/md2pdf/flags_test.go` around lines 48 - 93, Add a new subtest within
TestParseFlags_DefaultOutputExtensionFollowsFormat to verify that the -pandoc
flag is properly handled by parseFlags. Create a test case similar to the
existing subtests that calls parseFlags with the -pandoc argument followed by a
path value, then verify that the returned config's PandocPath field is set to
the expected path value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CLAUDE.md`:
- Line 46: In the CLAUDE.md architecture documentation, correct the description
of pandoc auto-detection ownership. The current text incorrectly attributes
pandoc auto-detection to flags.go, but the actual implementation shows that
findPandoc function in internal/converter/docx.go handles the auto-detection
logic. Update the line to clarify that flags.go handles argument parsing only,
while the actual pandoc path auto-detection is performed by the findPandoc
function in the docx converter module.

In `@internal/converter/mermaid.go`:
- Around line 61-63: To prevent filename collisions between Mermaid-generated
PNGs and user-supplied images, create a dedicated subdirectory for generated
diagrams. Modify the code where pngName is defined (line 62) to include a
subdirectory prefix like _md2pdf_mermaid, and ensure pngFile on line 63
constructs the full path within that subdirectory. Additionally, create the
subdirectory in the working directory before generating files, and ensure that
the pngName variable (or the returned path used later) includes the subdirectory
so that copyImages references the isolated location and cannot collide with user
images.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f8b2bc4b-96fa-49c7-b33a-ab1b89523ab9

📥 Commits

Reviewing files that changed from the base of the PR and between 640eb17 and f746ac0.

📒 Files selected for processing (11)
  • CLAUDE.md
  • cmd/md2pdf/flags.go
  • cmd/md2pdf/flags_test.go
  • cmd/md2pdf/main.go
  • internal/converter/converter.go
  • internal/converter/docx.go
  • internal/converter/docx_test.go
  • internal/converter/html.go
  • internal/converter/html_test.go
  • internal/converter/mermaid.go
  • internal/converter/parser.go

Comment thread CLAUDE.md Outdated
Comment thread internal/converter/mermaid.go Outdated
Generated diagrams were written as diagram_N.png in the working
directory, where copyImages could overwrite them with a same-named
user image. Render them under a dedicated _md2pdf_mermaid/ subdir and
skip that prefix in copyImages.

Also add a -pandoc passthrough test and correct the CLAUDE.md note on
where pandoc is resolved (docx.go, not flags.go).

Entire-Checkpoint: 80a277ffcf1c
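
The collision fix described in this commit can be sketched as two small helpers. The directory name _md2pdf_mermaid and the diagram_N.png pattern come from the commit message. The helper names and the use of slash-separated paths are assumptions.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// mermaidDir is the reserved subdirectory for generated diagrams.
const mermaidDir = "_md2pdf_mermaid"

// diagramImagePath (hypothetical) returns the slash-separated path, relative
// to the working directory, for the n-th generated diagram.
func diagramImagePath(n int) string {
	return path.Join(mermaidDir, fmt.Sprintf("diagram_%d.png", n))
}

// isGenerated (hypothetical) reports whether an image reference points into
// the reserved subdirectory, so the image-copy step can skip it.
func isGenerated(src string) bool {
	return strings.HasPrefix(path.Clean(src), mermaidDir+"/")
}

func main() {
	p := diagramImagePath(0)
	fmt.Println(p, isGenerated(p))
	fmt.Println("diagram_0.png", isGenerated("diagram_0.png"))
}
```

With the generated files isolated this way, a user image named diagram_0.png in the document directory no longer shares a path with a rendered diagram.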

135yshr (Owner, Author) commented Jun 19, 2026

@coderabbitai review

coderabbitai Bot commented Jun 19, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Deduplicate the SVG and PNG diagram renderers into a single runMmdc
helper, and name the mmdc background colour and PNG scale constants.
Behavior-preserving: identical mmdc args, side-effect order, and error
messages (verified by independent review).

Entire-Checkpoint: 9c961b1218f1
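
A shared argument builder of the kind this refactor describes might look like the sketch below. The mmdc options used (-i, -o, -b, -s) are real mermaid-cli flags, but the constant values, the helper name mmdcArgs, and the choice to apply the scale only to PNG output are assumptions. The PR's actual runMmdc arguments are not shown in this conversation.

```go
package main

import "fmt"

// Named constants in the spirit of the refactor. The values are assumptions.
const (
	mmdcBackground = "transparent" // passed to -b (background colour)
	mmdcPNGScale   = "2"           // passed to -s (scale factor) for PNG output
)

// mmdcArgs (hypothetical) builds one mmdc invocation for either output type.
func mmdcArgs(input, output string, png bool) []string {
	args := []string{"-i", input, "-o", output, "-b", mmdcBackground}
	if png {
		// Assumption: raster output is scaled up so it stays sharp in Word.
		args = append(args, "-s", mmdcPNGScale)
	}
	return args
}

func main() {
	fmt.Println(mmdcArgs("diagram_0.mmd", "diagram_0.svg", false))
	fmt.Println(mmdcArgs("diagram_0.mmd", "diagram_0.png", true))
}
```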

135yshr merged commit 847c51a into main on Jun 19, 2026
4 checks passed
135yshr deleted the feat/docx-output branch on June 19, 2026 06:35