diff --git a/.github/workflows/notify-skills-update.yml b/.github/workflows/notify-skills-update.yml new file mode 100644 index 000000000..042fa89e9 --- /dev/null +++ b/.github/workflows/notify-skills-update.yml @@ -0,0 +1,33 @@ +# Notify cli-jaw-skills repo when OfficeCLI skills change +# +# Trigger: push to skills/** on main/agent branches +# What it does: sends repository_dispatch to cli-jaw-skills repo +# +# Required secret: SKILLS_SYNC_TOKEN +# - GitHub PAT with repo scope for lidge-jun/cli-jaw-skills +# - Set in OfficeCLI repo Settings → Secrets → Actions + +name: Notify Skills Update + +on: + push: + branches: [main, agent] + paths: + - 'skills/**' + +jobs: + notify: + runs-on: ubuntu-latest + steps: + - name: Trigger cli-jaw-skills sync + uses: peter-evans/repository-dispatch@v3 + with: + token: ${{ secrets.SKILLS_SYNC_TOKEN }} + repository: lidge-jun/cli-jaw-skills + event-type: officecli-skills-updated + client-payload: | + { + "ref": "${{ github.ref }}", + "sha": "${{ github.sha }}", + "message": ${{ toJSON(github.event.head_commit.message) }} + } diff --git a/.gitignore b/.gitignore new file mode 100644 index 000000000..d3bb7d874 --- /dev/null +++ b/.gitignore @@ -0,0 +1,22 @@ +# Build output +build-local/ +bin/ +obj/ +target/ +src/rhwp-field-bridge/target/ +*.user +*.suo +*.pdb + +# IDE +.vs/ +.vscode/ +.idea/ + +# Python +__pycache__/ +*.pyc + +# OS +.DS_Store +Thumbs.db diff --git a/99.9_test/README.md b/99.9_test/README.md new file mode 100644 index 000000000..eda46afb0 --- /dev/null +++ b/99.9_test/README.md @@ -0,0 +1,171 @@ +# 99.9 Phase A-H Manual Test Guide + +> Run from: `cd /Users/jun/Developer/new/700_projects/cli-jaw/officecli` +> CLI: `dotnet run --project src/officecli -- ` +> Python: `python3 scripts/hwpx_form_edit.py ` + +--- + +## Phase E — Security + +### E1: Path Traversal +```bash +# validate should report no errors on a clean file +dotnet run --project src/officecli -- validate 99.9_test/test_phase_e.hwpx +``` +**Pass**: No `path_traversal` 
errors + +### E2: ZIP Bomb +**Pass**: Opening any test file completes without timeout or OOM + +### E5: XXE Defense +**Pass**: `validate` does not throw on any test file (XXE would cause exception) + +--- + +## Phase A — Quick Wins + +### A1: Extended Keywords +```bash +dotnet run --project src/officecli -- view 99.9_test/test_phase_a.hwpx forms +``` +**Pass**: All 8 keywords (성명, 주소, 생년월일, 전화번호, 이메일, 직업, 학력, 자격증) appear in form field recognition + +### A3: False Positive Filter +```bash +dotnet run --project src/officecli -- view 99.9_test/test_phase_a.hwpx text +``` +**Pass**: "접수시간: 10:30" — the time value "10:30" is NOT stripped (time-related labels preserve values) + +### A4: Shape Alt Text +```bash +dotnet run --project src/officecli -- view 99.9_test/test_phase_a.hwpx text +``` +**Pass**: "면적: 100m²" renders with superscript stripped cleanly + +--- + +## Phase B — Form Enhancement + +### B1-B2: In-Cell & KV Table +```bash +dotnet run --project src/officecli -- view 99.9_test/test_phase_b.hwpx forms +``` +**Pass**: Table recognized as form with fields: 성명, 생년월일, 주소, 전화번호, 이메일, 비고 + +### B5: Checkbox Recognition +```bash +dotnet run --project src/officecli -- view 99.9_test/test_phase_b.hwpx text +``` +**Pass**: "□남 □여" and "□동의 □미동의" detected as checkbox fields + +### B6: Fill Test (Optional) +```bash +dotnet run --project src/officecli -- set 99.9_test/test_phase_b.hwpx fill --props '성명=홍길동' +dotnet run --project src/officecli -- view 99.9_test/test_phase_b.hwpx forms +``` +**Pass**: 성명 field now shows "홍길동" + +--- + +## Phase F — Text Quality + +### F1: PUA Strip +```bash +dotnet run --project src/officecli -- view 99.9_test/test_phase_f.hwpx text +``` +**Pass**: No PUA characters (U+E000-U+F8FF range) in output + +### F3: Legal Heading Detection +```bash +dotnet run --project src/officecli -- view 99.9_test/test_phase_f.hwpx outline +``` +**Pass**: "별표 1" appears as heading in outline +**Pass**: "별첨 서류는 반환하지 않습니다" does NOT appear as heading (it's body 
text) + +### F5: 1x1 Cell +```bash +dotnet run --project src/officecli -- view 99.9_test/test_phase_f.hwpx markdown +``` +**Pass**: Single-cell table rendered as structured text, not markdown table syntax + +### F7: Phone Spacing (Python) +```bash +python3 scripts/hwpx_form_edit.py extract 99.9_test/test_phase_f.hwpx +``` +**Pass**: "0 1 0 - 1 2 3 4 - 5 6 7 8" collapsed to "010-1234-5678" + +--- + +## Phase G — Parser + +### G1: Section File Regex +**Pass**: All test files open successfully (section discovery works) + +### G3: Legal Heading Detection +```bash +dotnet run --project src/officecli -- view 99.9_test/test_phase_g.hwpx outline +``` +**Pass**: "제1장 총칙" appears as h1 heading +**Pass**: "제2절 적용범위" appears as h2 heading +**Pass**: "제1장에서 언급한 바와 같이" does NOT appear as heading + +### G4: Dublin Core Metadata +```bash +dotnet run --project src/officecli -- get 99.9_test/test_phase_g.hwpx /metadata +``` +**Pass**: Returns metadata dict (may include dc:title, dc:creator if present in file) + +### G5: MIME Validation +```bash +dotnet run --project src/officecli -- validate 99.9_test/test_phase_g.hwpx +``` +**Pass**: No `package_mimetype_invalid` error + +### G7: Markdown Import +```bash +# Create a test file and import markdown +dotnet run --project src/officecli -- create 99.9_test/test_import.hwpx --type hwpx +dotnet run --project src/officecli -- import 99.9_test/test_import.hwpx --markdown "# Heading\n\n> Quote text\n\n- List item 1\n- List item 2\n\n1. Ordered 1\n2. 
Ordered 2" +dotnet run --project src/officecli -- view 99.9_test/test_import.hwpx text +``` +**Pass**: Heading, quote (with > prefix), list items all present in output + +--- + +## Phase H — Diff/Compare + +### H1: Text Compare +```bash +dotnet run --project src/officecli -- compare 99.9_test/test_phase_h_a.hwpx 99.9_test/test_phase_h_b.hwpx text +``` +**Pass**: Output shows: +- "첫 번째 문장입니다" — `unchanged` +- "두 번째 문장입니다" → "두 번째 문장이 수정되었습니다" — `modified` +- "세 번째 문장입니다" — `unchanged` +- "네 번째 문장이 추가되었습니다" — `added` + +### H5: Page Range Compare +```bash +dotnet run --project src/officecli -- compare 99.9_test/test_phase_h_a.hwpx 99.9_test/test_phase_h_b.hwpx text --pages "1" +``` +**Pass**: Returns diff filtered to section 1 only + +--- + +## Summary Checklist + +| Phase | Items | Test File | Key Check | +|-------|-------|-----------|-----------| +| E | E1-E6 | test_phase_e.hwpx | `validate` clean | +| A | A1-A4 | test_phase_a.hwpx | `view forms` shows 8 keywords | +| B | B1-B7 | test_phase_b.hwpx | `view forms` shows table fields | +| F | F1-F8 | test_phase_f.hwpx | heading detection, 1x1 cell, phone spacing | +| G | G1-G7 | test_phase_g.hwpx | heading h1/h2, MIME check, import | +| H | H1-H5 | test_phase_h_{a,b}.hwpx | compare shows unchanged/modified/added | + +### Build Status +```bash +cd officecli && dotnet build && dotnet test +``` +**Expected**: 0 errors, 189 tests pass, 2 pre-existing failures (Plan703 tests) diff --git a/99.9_test/comprehensive_form.hwpx b/99.9_test/comprehensive_form.hwpx new file mode 100644 index 000000000..1c5245685 Binary files /dev/null and b/99.9_test/comprehensive_form.hwpx differ diff --git a/99.9_test/cu_blank_01.hwpx b/99.9_test/cu_blank_01.hwpx new file mode 100644 index 000000000..44335f4f2 Binary files /dev/null and b/99.9_test/cu_blank_01.hwpx differ diff --git a/99.9_test/cu_blank_02.hwpx b/99.9_test/cu_blank_02.hwpx new file mode 100644 index 000000000..e81ff33f3 Binary files /dev/null and b/99.9_test/cu_blank_02.hwpx 
differ diff --git a/99.9_test/cu_blank_03.hwpx b/99.9_test/cu_blank_03.hwpx new file mode 100644 index 000000000..68e33ab96 Binary files /dev/null and b/99.9_test/cu_blank_03.hwpx differ diff --git a/99.9_test/cu_blank_04.hwpx b/99.9_test/cu_blank_04.hwpx new file mode 100644 index 000000000..a1118d8ba Binary files /dev/null and b/99.9_test/cu_blank_04.hwpx differ diff --git a/99.9_test/cu_template_01.hwpx b/99.9_test/cu_template_01.hwpx new file mode 100644 index 000000000..c255b2c2e Binary files /dev/null and b/99.9_test/cu_template_01.hwpx differ diff --git a/99.9_test/cu_template_01_visual_pass_01.hwpx b/99.9_test/cu_template_01_visual_pass_01.hwpx new file mode 100644 index 000000000..a053b64a5 Binary files /dev/null and b/99.9_test/cu_template_01_visual_pass_01.hwpx differ diff --git a/99.9_test/cu_template_02.hwpx b/99.9_test/cu_template_02.hwpx new file mode 100644 index 000000000..8fc680eaf Binary files /dev/null and b/99.9_test/cu_template_02.hwpx differ diff --git a/99.9_test/cu_template_02_visual_pass_01.hwpx b/99.9_test/cu_template_02_visual_pass_01.hwpx new file mode 100644 index 000000000..1c5245685 Binary files /dev/null and b/99.9_test/cu_template_02_visual_pass_01.hwpx differ diff --git a/99.9_test/cu_template_03.hwpx b/99.9_test/cu_template_03.hwpx new file mode 100644 index 000000000..fd7b40c3d Binary files /dev/null and b/99.9_test/cu_template_03.hwpx differ diff --git a/99.9_test/cu_template_03_visual_pass_01.hwpx b/99.9_test/cu_template_03_visual_pass_01.hwpx new file mode 100644 index 000000000..67087bdaf Binary files /dev/null and b/99.9_test/cu_template_03_visual_pass_01.hwpx differ diff --git a/99.9_test/cu_template_04.hwpx b/99.9_test/cu_template_04.hwpx new file mode 100644 index 000000000..da6e61e58 Binary files /dev/null and b/99.9_test/cu_template_04.hwpx differ diff --git a/99.9_test/cu_template_04_visual_pass_01.hwpx b/99.9_test/cu_template_04_visual_pass_01.hwpx new file mode 100644 index 000000000..4bf90eb9a Binary files 
/dev/null and b/99.9_test/cu_template_04_visual_pass_01.hwpx differ diff --git a/99.9_test/cu_template_05.hwpx b/99.9_test/cu_template_05.hwpx new file mode 100644 index 000000000..f3802c54d Binary files /dev/null and b/99.9_test/cu_template_05.hwpx differ diff --git a/99.9_test/cu_template_06.hwpx b/99.9_test/cu_template_06.hwpx new file mode 100644 index 000000000..c50c7830c Binary files /dev/null and b/99.9_test/cu_template_06.hwpx differ diff --git a/99.9_test/cu_template_07.hwpx b/99.9_test/cu_template_07.hwpx new file mode 100644 index 000000000..08d3c291f Binary files /dev/null and b/99.9_test/cu_template_07.hwpx differ diff --git a/99.9_test/cu_template_07_visual_pass_01.hwpx b/99.9_test/cu_template_07_visual_pass_01.hwpx new file mode 100644 index 000000000..90ae9ba5d Binary files /dev/null and b/99.9_test/cu_template_07_visual_pass_01.hwpx differ diff --git a/99.9_test/cu_template_08.hwpx b/99.9_test/cu_template_08.hwpx new file mode 100644 index 000000000..782559c7b Binary files /dev/null and b/99.9_test/cu_template_08.hwpx differ diff --git a/99.9_test/cu_template_09.hwpx b/99.9_test/cu_template_09.hwpx new file mode 100644 index 000000000..5582848bb Binary files /dev/null and b/99.9_test/cu_template_09.hwpx differ diff --git a/99.9_test/cu_template_09_visual_pass_01.hwpx b/99.9_test/cu_template_09_visual_pass_01.hwpx new file mode 100644 index 000000000..1c5245685 Binary files /dev/null and b/99.9_test/cu_template_09_visual_pass_01.hwpx differ diff --git a/99.9_test/kice_edited.hwpx b/99.9_test/kice_edited.hwpx new file mode 100644 index 000000000..fb677b90c Binary files /dev/null and b/99.9_test/kice_edited.hwpx differ diff --git a/99.9_test/kice_korean.hwpx b/99.9_test/kice_korean.hwpx new file mode 100644 index 000000000..d3480d6e8 Binary files /dev/null and b/99.9_test/kice_korean.hwpx differ diff --git a/99.9_test/kice_sample.hwpx b/99.9_test/kice_sample.hwpx new file mode 100644 index 000000000..67087bdaf Binary files /dev/null and 
b/99.9_test/kice_sample.hwpx differ diff --git a/99.9_test/test_phase_a.hwpx b/99.9_test/test_phase_a.hwpx new file mode 100644 index 000000000..f3802c54d Binary files /dev/null and b/99.9_test/test_phase_a.hwpx differ diff --git a/99.9_test/test_phase_b.hwpx b/99.9_test/test_phase_b.hwpx new file mode 100644 index 000000000..c50c7830c Binary files /dev/null and b/99.9_test/test_phase_b.hwpx differ diff --git a/99.9_test/test_phase_e.hwpx b/99.9_test/test_phase_e.hwpx new file mode 100644 index 000000000..7a53c5853 Binary files /dev/null and b/99.9_test/test_phase_e.hwpx differ diff --git a/99.9_test/test_phase_f.hwpx b/99.9_test/test_phase_f.hwpx new file mode 100644 index 000000000..774079b76 Binary files /dev/null and b/99.9_test/test_phase_f.hwpx differ diff --git a/99.9_test/test_phase_g.hwpx b/99.9_test/test_phase_g.hwpx new file mode 100644 index 000000000..782559c7b Binary files /dev/null and b/99.9_test/test_phase_g.hwpx differ diff --git a/99.9_test/test_phase_h_a.hwpx b/99.9_test/test_phase_h_a.hwpx new file mode 100644 index 000000000..35d5daef2 Binary files /dev/null and b/99.9_test/test_phase_h_a.hwpx differ diff --git a/99.9_test/test_phase_h_b.hwpx b/99.9_test/test_phase_h_b.hwpx new file mode 100644 index 000000000..7916214e7 Binary files /dev/null and b/99.9_test/test_phase_h_b.hwpx differ diff --git "a/99.9_test/\352\263\265\353\254\270_\354\203\230\355\224\214.hwpx" "b/99.9_test/\352\263\265\353\254\270_\354\203\230\355\224\214.hwpx" new file mode 100644 index 000000000..a053b64a5 Binary files /dev/null and "b/99.9_test/\352\263\265\353\254\270_\354\203\230\355\224\214.hwpx" differ diff --git a/BRANCH_STRATEGY.md b/BRANCH_STRATEGY.md new file mode 100644 index 000000000..c11fc8e77 --- /dev/null +++ b/BRANCH_STRATEGY.md @@ -0,0 +1,56 @@ +# Branch Strategy + +## Current State (2026-05-06) +- **upstream/main**: tracks `iOfficeAI/OfficeCLI` main. Pull-only, never push. +- **main**: local mirror of `upstream/main` plus compliance docs. 
+- **feat/hwpx**: active working branch on `origin`. Phase 36 HWP/HWPX evidence work (compatibility corpus, round-trip catalog, visual thresholds, provider matrix, release gate, safe-save) lives here. +- **docs/structure-init-2026-05-06**: short-lived doc branch that introduced `structure/`, fast-forwarded into `feat/hwpx`. + +## Branch Roles +- **upstream/main**: read-only upstream reference. +- **main**: clean mirror; rebased from `upstream/main`. +- **feat/hwpx**: primary working branch. All cli-jaw HWP/HWPX changes land here. +- **feature/\*** / **docs/\***: optional short-lived branches. Merge back into `feat/hwpx` (typically `--ff-only`) and delete. + +## Sync Procedure +```bash +git fetch upstream +git checkout main +git merge --ff-only upstream/main +git checkout feat/hwpx +git rebase main # or: git merge main --no-ff, depending on history needs +``` + +## Rebase Conflict Notes + +### 2026-05-14: upstream/main 1.0.91 rebase + +`upstream/main` moved from the 1.0.68-era base to 1.0.91 and introduced core plugin/exporter work while `feat/hwpx` carried native HWPX and experimental HWP bridge work. Resolve these conflicts by preserving both sides: + +- `BlankDocCreator.cs`: keep upstream `--minimal`/plugin-create support and keep native `.hwpx` creation before the plugin fallback. Unsupported-type text should list `.hwpx` plus plugin-served formats. +- `CommandBuilder.Import.cs`: keep upstream `--minimal` for DOCX and keep HWPX `--from-markdown`/`--align`; call `BlankDocCreator.Create(file, locale, minimal)` before optional HWPX markdown import. +- `CommandBuilder.cs`: register both upstream `BuildPluginsCommand` and fork `BuildCompareCommand`. +- `DocumentHandlerFactory.cs`: route `.hwpx` to `HwpxHandler`, keep `.hwp` bridge guidance, and fall back to upstream plugin handlers for unknown extensions. 
+- `CommandBuilder.View.cs` / `ResidentServer.cs`: keep upstream `pdf`/plugin forms support and HWP/HWPX modes (`forms`, `tables`, `markdown`, `objects`, `styles`, `fields`, `field`). Keep `BuildViewDescription()` when HWP bridge help is present. +- `WordHandler.Add.Text.cs`: keep upstream `sym=font:hex` handling so dump/batch round-trips do not duplicate symbol glyph text. + +After resolving, continue the rebase and verify: + +```bash +git ls-files 'src/rhwp-field-bridge/target/*' | wc -l # must be 0 +dotnet build officecli.slnx +cargo build --manifest-path src/rhwp-field-bridge/Cargo.toml +dotnet test tests/OfficeCli.Tests/OfficeCli.Tests.csproj --filter FullyQualifiedName~HwpBridge --no-build +dotnet test tests/OfficeCli.Tests/OfficeCli.Tests.csproj --no-build +``` + +## Working with `structure/` +- See [`structure/INDEX.md`](structure/INDEX.md) for the doc map. +- Update `structure/*.md` whenever the corresponding source/schema/test surface changes (sync checklist in `structure/INDEX.md`). +- Doc-only changes should not modify `.cs`, `.ts`, `.js`, `.py`, or other source files. + +## Rules +- Never push to `upstream`. +- All cli-jaw changes go on `feat/hwpx` (or short-lived branches that merge into it). +- HWP/HWPX claims must be evidence-gated per `docs/qa/phase-36-release-gate.md` — no DOCX parity language. +- Tag releases as `cjk-v{version}` on `feat/hwpx`. 
diff --git a/COMPLIANCE_NOTES.md b/COMPLIANCE_NOTES.md new file mode 100644 index 000000000..9474b8394 --- /dev/null +++ b/COMPLIANCE_NOTES.md @@ -0,0 +1,39 @@ +# OfficeCLI Fork — Apache 2.0 Compliance Notes + +## License Status +- Upstream: Apache License 2.0 (LICENSE file intact) +- Upstream NOTICE file: **none shipped** (checked v1.0.28) +- Apache 2.0 §4(b): modified files must carry prominent change notices + +## Our Fork +- Fork owner: lidge-jun +- Fork URL: https://github.com/lidge-jun/OfficeCLI +- Upstream: https://github.com/iOfficeAI/OfficeCLI + +## Planned Modifications +- CJK font handling (CjkHelper.cs) — Korean/Japanese/Chinese font metadata +- CJK language tag injection (w:lang, a:lang attributes) +- Kinsoku line-break processing +- East Asian character spacing + +## Attribution (for distribution) +If we ship a bundled binary, include this NOTICE: + +``` +OfficeCLI +Copyright (c) iOfficeAI contributors + +This product includes software developed at +iOfficeAI (https://github.com/iOfficeAI/OfficeCLI). + +Licensed under the Apache License, Version 2.0. + +Modifications by cli-jaw contributors: +- CJK (Korean/Japanese/Chinese) font handling and language tags +- CJK line-break (kinsoku) processing +- East Asian character spacing support +``` + +## Build Environment +- .NET 10.0.201 SDK installed at ~/.dotnet/ +- `dotnet build src/officecli/officecli.csproj -c Release` — Build succeeded (0 warnings, 0 errors) diff --git a/README.md b/README.md index 3d256b2ca..464446e85 100644 --- a/README.md +++ b/README.md @@ -13,6 +13,8 @@ Open-source. Single binary. No Office installation. No dependencies. Works every **English** | [中文](README_zh.md) | [日本語](README_ja.md) | [한국어](README_ko.md) +> 📂 **Working in this repo as an agent or contributor?** Start at [`structure/INDEX.md`](structure/INDEX.md) — the agent-facing source-of-truth map covering file layout, command surface, format support, providers, and Phase 36 HWP/HWPX evidence gates. +

💬 Community: Discord

@@ -172,6 +174,10 @@ officecli add deck.pptx / --type slide --prop title="Q4 Report" | Word (.docx) | ✅ | ✅ | ✅ | | Excel (.xlsx) | ✅ | ✅ | ✅ | | PowerPoint (.pptx) | ✅ | ✅ | ✅ | +| HWPX (.hwpx) | 🧪 | 🧪 | 🧪 (`Resources/base.hwpx`) | +| HWP (.hwp, binary) | 🧪 (rhwp bridge) | 🧪 (output-first + safe in-place text) | 🧪 (rhwp sidecar) | + +> 🧪 = experimental, evidence-gated. HWP/HWPX support is active on `feat/hwpx` and is **not** at DOCX parity. Operation truth lives in `officecli capabilities --json`, the corpus manifests under `tests/fixtures/{hwp,hwpx,common}`, and Phase 36 docs in `docs/qa/`. See [`structure/03-format-support.md`](structure/03-format-support.md) and [`structure/04-providers.md`](structure/04-providers.md) before making any HWP/HWPX claim. **Word** — full [i18n & RTL support](https://github.com/iOfficeAI/OfficeCLI/wiki/i18n) (per-script font slots, per-script BCP-47 lang tags `lang.latin/ea/cs`, complex-script bold/italic/size, `direction=rtl` cascading through paragraph/run/section/table/style/header/footer/docDefaults, `rtlGutter` + `pgBorders` shorthand, locale-aware page numbering for Hindi/Arabic/Thai/CJK), [paragraphs](https://github.com/iOfficeAI/OfficeCLI/wiki/word-paragraph), [runs](https://github.com/iOfficeAI/OfficeCLI/wiki/word-run), [tables](https://github.com/iOfficeAI/OfficeCLI/wiki/word-table), [styles](https://github.com/iOfficeAI/OfficeCLI/wiki/word-style), [headers/footers](https://github.com/iOfficeAI/OfficeCLI/wiki/word-header-footer), [images](https://github.com/iOfficeAI/OfficeCLI/wiki/word-picture) (PNG/JPG/GIF/SVG), [equations](https://github.com/iOfficeAI/OfficeCLI/wiki/word-equation), [comments](https://github.com/iOfficeAI/OfficeCLI/wiki/word-comment), [footnotes](https://github.com/iOfficeAI/OfficeCLI/wiki/word-footnote), [watermarks](https://github.com/iOfficeAI/OfficeCLI/wiki/word-watermark), [bookmarks](https://github.com/iOfficeAI/OfficeCLI/wiki/word-bookmark), 
[TOC](https://github.com/iOfficeAI/OfficeCLI/wiki/word-toc), [charts](https://github.com/iOfficeAI/OfficeCLI/wiki/word-chart), [hyperlinks](https://github.com/iOfficeAI/OfficeCLI/wiki/word-hyperlink), [sections](https://github.com/iOfficeAI/OfficeCLI/wiki/word-section), [form fields](https://github.com/iOfficeAI/OfficeCLI/wiki/word-formfield), [content controls (SDT)](https://github.com/iOfficeAI/OfficeCLI/wiki/word-sdt), [fields](https://github.com/iOfficeAI/OfficeCLI/wiki/word-field) (22 zero-param types + MERGEFIELD / REF / PAGEREF / SEQ / STYLEREF / DOCPROPERTY / IF), [OLE objects](https://github.com/iOfficeAI/OfficeCLI/wiki/word-ole), [document properties](https://github.com/iOfficeAI/OfficeCLI/wiki/word-document) @@ -198,7 +204,10 @@ officecli add deck.pptx / --type slide --prop title="Q4 Report" ## Installation -Ships as a single self-contained binary. The .NET runtime is embedded -- nothing to install, no runtime to manage. +Ships as a self-contained OfficeCLI binary. Experimental binary HWP support +also installs `rhwp-officecli-bridge` and `rhwp-field-bridge` sidecars beside +the binary so `.hwp` create/read/render/mutation paths can run without manual +environment variables. **One-line install:** diff --git a/README_ko.md b/README_ko.md index d1ae0e5ff..bac3bd6a8 100644 --- a/README_ko.md +++ b/README_ko.md @@ -4,7 +4,7 @@ **모든 AI 에이전트에게 Word, Excel, PowerPoint의 완전한 제어권을 — 단 한 줄의 코드로.** -오픈소스. 단일 바이너리. Office 설치 불필요. 의존성 제로. 모든 플랫폼 지원. +오픈소스. 자체 완결형 바이너리. Office 설치 불필요. 모든 플랫폼 지원. **에이전트 친화적 렌더링 엔진 내장** — 에이전트가 자신이 만든 것을 "볼" 수 있고, Office 불필요. `.docx` / `.xlsx` / `.pptx`를 HTML 또는 PNG로 렌더링하며, *렌더링 → 보기 → 수정* 루프는 바이너리가 실행되는 어디서나 닫힙니다. @@ -198,7 +198,9 @@ officecli add deck.pptx / --type slide --prop title="Q4 Report" ## 설치 -단일 자체 완결형 바이너리로 제공. .NET 런타임 내장 -- 설치할 것도, 관리할 런타임도 없습니다. +자체 완결형 OfficeCLI 바이너리로 제공됩니다. 실험적 binary HWP 지원은 +`rhwp-officecli-bridge`와 `rhwp-field-bridge` sidecar를 바이너리 옆에 함께 +설치해 `.hwp` 생성/읽기/렌더링/수정 경로가 수동 환경변수 없이 동작하게 합니다. 
**원라인 설치:** diff --git a/SKILL.md b/SKILL.md index abb461ced..48e17f931 100644 --- a/SKILL.md +++ b/SKILL.md @@ -398,7 +398,13 @@ officecli add-part # create new document part | `financial-model` | Financial models, scenarios, projections. NOT for general data analysis (route those to `excel`) | | `data-dashboard` | CSV/tabular data → KPI / analytics / executive dashboards with charts and sparklines. NOT for raw data tracking (route those to `excel`) | -Example: a fundraising deck task → `officecli load_skill pitch-deck` → use the printed rules. +### HWPX (.hwpx, Hancom) + +| Name | When to use | Layer | +|------|-------------|-------| +| `officecli-hwpx` | Korean government / OWPML documents (한글 / 공문서) | **base** | + +Example: a fundraising deck task → `officecli skill pitch-deck` → use the printed rules. --- diff --git a/build.sh b/build.sh index 68aea7653..285963787 100755 --- a/build.sh +++ b/build.sh @@ -3,6 +3,7 @@ set -e PROJECT="src/officecli/officecli.csproj" ALL_TARGETS="osx-arm64:officecli-mac-arm64 osx-x64:officecli-mac-x64 linux-x64:officecli-linux-x64 linux-arm64:officecli-linux-arm64 linux-musl-x64:officecli-linux-alpine-x64 linux-musl-arm64:officecli-linux-alpine-arm64 win-x64:officecli-win-x64.exe win-arm64:officecli-win-arm64.exe" +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" # Detect current platform RID detect_local_rid() { @@ -79,6 +80,9 @@ build_config() { mv -f "$OUTPUT/$NAME.new" "$OUTPUT/$NAME" cp "$TMPDIR/officecli.pdb" "$OUTPUT/${NAME%.*}.pdb" + "$SCRIPT_DIR/scripts/build-rhwp-sidecars.sh" "$OUTPUT" "$RID" "$CONFIG" + copy_platform_sidecar_assets "$OUTPUT" "$NAME" + rm -rf "$TMPDIR" done @@ -89,6 +93,27 @@ build_config() { ls -lh "$OUTPUT" } +copy_platform_sidecar_assets() { + local OUTPUT="$1" + local NAME="$2" + local ASSET_BASE="${NAME%.exe}" + + if [ -f "$OUTPUT/rhwp-field-bridge" ]; then + cp "$OUTPUT/rhwp-field-bridge" "$OUTPUT/${ASSET_BASE}-rhwp-field-bridge" + chmod +x "$OUTPUT/${ASSET_BASE}-rhwp-field-bridge" 2>/dev/null || true + 
fi + if [ -f "$OUTPUT/rhwp-officecli-bridge" ]; then + cp "$OUTPUT/rhwp-officecli-bridge" "$OUTPUT/${ASSET_BASE}-rhwp-officecli-bridge" + chmod +x "$OUTPUT/${ASSET_BASE}-rhwp-officecli-bridge" 2>/dev/null || true + fi + if [ -f "$OUTPUT/rhwp-field-bridge.exe" ]; then + cp "$OUTPUT/rhwp-field-bridge.exe" "$OUTPUT/${ASSET_BASE}-rhwp-field-bridge.exe" + fi + if [ -f "$OUTPUT/rhwp-officecli-bridge.exe" ]; then + cp "$OUTPUT/rhwp-officecli-bridge.exe" "$OUTPUT/${ASSET_BASE}-rhwp-officecli-bridge.exe" + fi +} + CONFIG="${1:-release}" case "$CONFIG" in diff --git a/dev-install.sh b/dev-install.sh index eb6ac25cf..6b55ee806 100755 --- a/dev-install.sh +++ b/dev-install.sh @@ -43,6 +43,7 @@ esac echo "Building officecli ($RID)..." TMPDIR=$(mktemp -d) dotnet publish "$PROJECT" -c Release -r "$RID" -o "$TMPDIR" --nologo -v quiet +"$SCRIPT_DIR/scripts/build-rhwp-sidecars.sh" "$TMPDIR" "$RID" Release echo "Build complete." # Install @@ -61,7 +62,6 @@ mkdir -p "$INSTALL_DIR" # stuck in uninterruptible `UE` state on the next code page fault. cp "$TMPDIR/$BINARY_NAME" "$INSTALL_DIR/$BINARY_NAME.new" chmod +x "$INSTALL_DIR/$BINARY_NAME.new" -rm -rf "$TMPDIR" # macOS: remove quarantine flag and ad-hoc codesign (required by AppleSystemPolicy) # Done on the staged .new copy so the live binary is never mutated in place. @@ -72,6 +72,21 @@ fi mv -f "$INSTALL_DIR/$BINARY_NAME.new" "$INSTALL_DIR/$BINARY_NAME" +for SIDECAR in rhwp-officecli-bridge rhwp-field-bridge rhwp-officecli-bridge.exe rhwp-field-bridge.exe; do + if [ ! 
-f "$TMPDIR/$SIDECAR" ]; then + continue + fi + cp "$TMPDIR/$SIDECAR" "$INSTALL_DIR/$SIDECAR.new" + chmod +x "$INSTALL_DIR/$SIDECAR.new" 2>/dev/null || true + if [ "$(uname -s)" = "Darwin" ]; then + xattr -d com.apple.quarantine "$INSTALL_DIR/$SIDECAR.new" 2>/dev/null || true + codesign -s - -f "$INSTALL_DIR/$SIDECAR.new" 2>/dev/null || true + fi + mv -f "$INSTALL_DIR/$SIDECAR.new" "$INSTALL_DIR/$SIDECAR" +done + +rm -rf "$TMPDIR" + # Hint if not in PATH case ":$PATH:" in *":$INSTALL_DIR:"*) ;; diff --git a/docs/hwp-source-inventory.md b/docs/hwp-source-inventory.md new file mode 100644 index 000000000..bdece3372 --- /dev/null +++ b/docs/hwp-source-inventory.md @@ -0,0 +1,20 @@ +# HWP/HWPX Source Inventory + +This inventory supports the HWP/HWPX capability contract. Mutable web sources require +a reproducibility identifier. Sources without one can support background context only, +not OfficeCLI capability claims. + +| Source | URL | Accessed | Observed version or commit | Retrieved artifact hash or note | Claim allowed in OfficeCLI docs | +|---|---|---|---|---|---| +| Hancom Tech HWPX format structure | https://tech.hancom.com/hwpxformat/ | 2026-05-03 KST | web page | no local archive in Phase 0; background context only | HWPX is ZIP/XML-based; this does not prove Hancom-compatible writing. | +| Hancom HWPX FAQ | https://www.hancom.com/support/faqCenter/faq/detail/2784 | 2026-05-03 KST | web page | no local archive in Phase 0; background context only | HWPX is an OWPML-based open document format registered as KS X 6101. | +| rhwp repository | https://github.com/edwardkim/rhwp | 2026-05-10 KST | `62a458aa317e962cd3d0eec6096728c172d57110` (`v0.7.10`) | `git ls-remote ... HEAD`; pinned in `src/rhwp-field-bridge/Cargo.toml` | Candidate upstream HWP/HWPX read/render/edit engine only; no OfficeCLI support claim. | +| HOP repository | https://github.com/golbin/hop | 2026-05-03 KST | `bd6839bf55f8c2819a61c120421be60c4074e2a3` | `git ls-remote ... 
HEAD` | Candidate desktop integration evidence only; no OfficeCLI support claim. | +| HOP development note | https://github.com/golbin/hop/blob/main/docs/DEVELOPMENT.md | 2026-05-03 KST | `bd6839bf55f8c2819a61c120421be60c4074e2a3` | `git ls-remote ... HEAD`; exact file content not archived in Phase 0 | Upstream HWPX save limitations remain evidence against blanket write claims. | +| Microsoft .NET single-file deployment | https://learn.microsoft.com/en-us/dotnet/core/deploying/single-file/overview | 2026-05-03 KST | web page | no local archive in Phase 0; packaging context only | Single-file publish is RID-specific and native-library behavior must be tested. | +| Wasmtime .NET embedding | https://bytecodealliance.github.io/wasmtime-dotnet/articles/intro.html | 2026-05-03 KST | web page | no local archive in Phase 0; packaging context only | Wasmtime.NET is an embedding option only; packaging and latency must be measured. | + +Rule: upstream rhwp/HOP claims are candidate capability evidence only. They are not +OfficeCLI support claims until `officecli capabilities --json` returns +`roundtrip-verified` with fixture evidence and Hancom-compatible evidence where +required. diff --git a/docs/hwpx-current-operation-inventory.md b/docs/hwpx-current-operation-inventory.md new file mode 100644 index 000000000..ed4398307 --- /dev/null +++ b/docs/hwpx-current-operation-inventory.md @@ -0,0 +1,28 @@ +# HWPX Current Operation Inventory + +This file prevents broad HWPX write claims. An operation can be advertised to cli-jaw +only when `officecli capabilities --json` reports it as `roundtrip-verified` and the +evidence files listed here are complete. + +| Operation | Current engine | Status | Evidence file(s) | Hancom evidence file | Advertise to cli-jaw? 
| +|---|---|---|---|---|---| +| `read_text` | `custom` | `experimental` | `tests/fixtures/hwpx/text-basic.golden.txt` planned | none yet | no | +| `render_svg` | `none` | `unsupported` | none | none | no | +| `fill_field` | `custom` | `experimental` | planned | none yet | no | +| `save_original` | `custom` | `experimental` | planned | none yet | no | +| `create_blank` | `custom` | `experimental` | `src/officecli/Resources/base.hwpx` | none yet | no | +| `save_as_hwp` | `rhwp-bridge` | `experimental` | `src/rhwp-field-bridge/src/main.rs` | none yet | no | + +Current public wording must be: + +```text +OfficeCLI advertises only the HWPX operations listed as roundtrip-verified in +officecli capabilities --json. +``` + +Do not use: + +```text +OfficeCLI supports HWPX writing. +OfficeCLI supports current roundtrip-verified HWPX XML-first operations. +``` diff --git a/docs/providers/rhwp-sidecar-contract.md b/docs/providers/rhwp-sidecar-contract.md new file mode 100644 index 000000000..620787a80 --- /dev/null +++ b/docs/providers/rhwp-sidecar-contract.md @@ -0,0 +1,116 @@ +# rhwp Sidecar Contract + +This contract defines the stable boundary between OfficeCLI and the experimental +rhwp provider. The current implementation still invokes concrete sidecar +commands, but new work must keep responses compatible with the request/response +schemas in `schemas/interfaces/`. + +## Provider Identity + +Every sidecar response must expose: + +- `schemaVersion` +- `operation` +- `format` +- `engineVersion` +- `warnings` +- either `data` or `error` + +The provider must include the rhwp version or pinned commit whenever available. + +## Request Shape + +Requests are described by: + +```text +schemas/interfaces/rhwp-sidecar-request.v1.schema.json +``` + +Required fields: + +- `schemaVersion` +- `operation` +- `format` + +All operations except `create_blank` must include `inputPath`. `create_blank` +and mutating/export operations must include `outputPath`. 
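+
+A minimal request for a mutating operation might look like this (an illustrative
+sketch only: the concrete paths and values are assumptions, and the
+authoritative shape is the schema file above):
+
+```json
+{
+  "schemaVersion": "1",
+  "operation": "fill-field",
+  "format": "hwp",
+  "inputPath": "input/form.hwp",
+  "outputPath": "output/form.filled.hwp"
+}
+```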
+ +## Response Shape + +Responses are described by: + +```text +schemas/interfaces/rhwp-sidecar-response.v1.schema.json +``` + +Required fields: + +- `schemaVersion` +- `ok` +- `operation` +- `format` +- `engineVersion` + +## Operation Policy + +Supported HWP operations: + +- `create-blank` +- `read-text` +- `render-svg` +- `list-fields` +- `read-field` +- `fill-field` +- `replace-text` +- `table-map` +- `set-table-cell` +- `save-as-hwp` + +Supported HWPX rhwp operations: + +- `read-text` +- `render-svg` +- `list-fields` +- `read-field` +- `fill-field` +- `replace-text` +- `save-as-hwp` + +Blocked operations must return typed errors instead of silent fallback. + +## Save Policy + +The sidecar must not overwrite the source path. Mutating operations write to an +explicit output path until safe-save transactions are implemented. + +Future in-place support must go through: + +1. temp output in the source directory; +2. provider readback; +3. semantic delta validation; +4. SVG/visual validation where available; +5. backup creation; +6. atomic replace. + +## Error Policy + +Errors must include: + +- `code` +- `message` +- `format` +- `operation` +- `engine` +- `nextCommand` when there is an obvious diagnostic command + +Examples: + +```text +bridge_not_enabled +bridge_missing +rhwp_runtime_missing +rhwp_api_missing +unsupported_operation +roundtrip_unverified +binary_hwp_write_forbidden +``` diff --git a/docs/qa/compatibility-corpus.md b/docs/qa/compatibility-corpus.md new file mode 100644 index 000000000..ab6be8cc3 --- /dev/null +++ b/docs/qa/compatibility-corpus.md @@ -0,0 +1,179 @@ +# HWP/HWPX Compatibility Corpus + +This corpus is the evidence ledger for HWP/HWPX parity work. It does not claim +format-wide fidelity. It records which concrete fixtures prove which concrete +operations. 
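+
+For example, a single fixture entry could be sketched as follows (both the
+field names and the values are assumptions derived from the fixture-entry
+list under Manifests, not the actual manifest schema):
+
+```json
+{
+  "id": "hwpx-text-basic",
+  "path": "tests/fixtures/hwpx/text-basic.hwpx",
+  "sha256": "<sha-256 of the fixture file>",
+  "byteSize": 4096,
+  "documentClasses": ["basic-text"],
+  "verifiedOperations": ["read_text"],
+  "evidenceFiles": ["tests/fixtures/hwpx/text-basic.golden.txt"],
+  "blockedOperations": ["set_table_cell"]
+}
+```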
+ +## Manifests + +```text +tests/fixtures/hwp/manifest.json +tests/fixtures/hwpx/manifest.json +tests/fixtures/common/expected-capabilities.json +``` + +Each fixture entry includes: + +- stable fixture id; +- repository-relative file path; +- SHA-256; +- byte size; +- document classes; +- verified operations; +- evidence files; +- blocked operations when relevant. + +## Claim Policy + +An HWP/HWPX operation can move toward DOCX-like agent usability only when the +same operation is present in all of these places: + +```text +capabilities --json +schema/help JSON +fixture manifest +expected-capabilities.json +tests or golden evidence +``` + +Unsupported or unverified operations must fail closed with a typed reason such +as `roundtrip_unverified`, `bridge_missing`, or `unsupported_operation`. + +## Current Coverage + +Binary HWP coverage is operation-level: + +```text +read_text +render_svg +list_fields +read_field +fill_field +replace_text +set_table_cell +``` + +HWPX coverage is provider-specific: + +```text +custom provider remains default +rhwp provider is opt-in for read/render/text replacement paths +set_table_cell remains blocked until package and Hancom compatibility gates pass +``` + +## Provider Compatibility Matrix + +Phase 36.5 adds the cross-provider matrix at: + +```text +docs/qa/provider-compatibility-matrix.md +tests/fixtures/common/provider-compatibility.json +``` + +HWPX `custom` remains the default provider; rhwp-bridge stays opt-in only and +must not be promoted to default until evidence parity is reached. HWP defaults +to `rhwp-bridge`. The matrix covers every expected-capability operation for +both `custom` and `rhwp-bridge`, with blocked provider paths carrying typed +reasons such as `unsupported_engine`, `binary_hwp_mutation_forbidden`, +`binary_hwp_write_forbidden`, `rhwp_runtime_missing`, and `rhwp_api_missing`. +Hancom is `optional` on every row; it can support +a future status promotion but must not be required by normal CI. 
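Combining the per-fixture fields listed under Manifests above, a single manifest
entry might be shaped roughly as follows. The id, hash placeholder, byte size,
and property names are illustrative;
`schemas/interfaces/compatibility-corpus.v1.schema.json` remains authoritative.

```json
{
  "id": "hwpx-text-basic",
  "path": "tests/fixtures/hwpx/text-basic.hwpx",
  "sha256": "<64-hex-digest>",
  "byteSize": 18432,
  "documentClasses": ["multi-section"],
  "verifiedOperations": ["read_text"],
  "evidence": ["tests/fixtures/hwpx/text-basic.golden.txt"],
  "blockedOperations": [
    { "operation": "set_table_cell", "reason": "roundtrip_unverified" }
  ]
}
```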
+ +## Visual Diff Thresholds + +Phase 36.4 adds the visual evidence policy at: + +```text +docs/qa/visual-diff-thresholds.md +tests/fixtures/common/visual-thresholds.json +``` + +Hard fails (page-count mismatch, missing SVG page, missing render evidence +for a visual-validated operation, and body proof markers in fixed-layout exam +sheets) cannot be tolerated. Thresholded fails (text-only layout drift, +unexpected blank render, exact SVG hash mismatch) have declared bounds. KICE +style fixed-layout exam sheets use a stricter rule: proof markers must stay out +of the visible question body and visible layout drift is `0%` unless the +requested edit explicitly changes exam content. As of Phase 36.4 only +`render_svg` is a visual-validated operation; mutation operations may declare +drift tolerance but cannot claim visual validation without a linked render +evidence file. The renderer status remains `deferred` until OfficeCLI ships a +stable in-CI renderer. + +## Round-Trip Cases + +Phase 36.3 adds an operation-level declarative round-trip catalog at: + +```text +tests/fixtures/common/roundtrip-cases.json +tests/fixtures/common/roundtrip-case.v1.schema.json +``` + +Each case declares a fixtureId, operation, provider, outputMode, args, and the +required checks: `source-unchanged`, `output-created`, `provider-readback`, +`semantic-delta`, `typed-error-if-blocked`. Mutation cases must include +`source-unchanged` and must not run with `outputMode = in-place`. Blocked cases +must include `typed-error-if-blocked` and a typed `expected.error.code`. + +Normal CI enforces declarative invariants over the catalog. Real rhwp-backed +execution is opt-in and gated on `OFFICECLI_REAL_RHWP_BIN`; Phase 36 does not +claim a full executor with semantic output comparison in normal CI. + +## Fixture Class Coverage + +Phase 36.2 records required fixture classes in each manifest under +`fixtureClassCoverage`. 
Each class must declare a state: + +```text +verified → small in-repo fixture proves the class +blocked → typed reason explains why the class is not verified +external-manual → samples are tracked outside the repo +``` + +Required classes: + +```text +multi-section +merged-cell-tables +nested-tables +pictures-bindata +headers-footers +equations +unicode-edge-cases +malformed-hwpx-package +``` + +`malformed-hwpx-package` is HWPX-only and must remain `blocked` with reason +`fixture_validation_failed`. External-manual entries must not declare +`verifiedOperations` and do not contribute to capability evidence. + +## Phase 36 Release Gate + +Phase 36 closes when all corpus, declarative round-trip, visual-threshold, and +provider-matrix gates agree. The single source of truth lives at: + +```text +docs/qa/phase-36-release-gate.md +``` + +Allowed claim: + +```text +OfficeCLI tracks HWP/HWPX support with corpus-backed operation evidence, +declarative round-trip cases, visual-threshold policy, and provider +compatibility rows. +``` + +The forbidden claim ("HWP/HWPX have DOCX parity") is enforced by +`HwpCompatibilityCorpusTests.NoDocxParityLanguageBeforeScorecard` and remains +blocked until the later parity scorecard is green. + +## Next Gates + +The corpus is intentionally small in this first slice. Later Phase 36 patches +should add fixture classes for: + +```text +footnotes/endnotes +large documents +``` diff --git a/docs/qa/phase-36-release-gate.md b/docs/qa/phase-36-release-gate.md new file mode 100644 index 000000000..879759766 --- /dev/null +++ b/docs/qa/phase-36-release-gate.md @@ -0,0 +1,70 @@ +# Phase 36 Release Gate + +Phase 36 closes only when corpus, declarative round-trip, visual-threshold, +and provider-matrix gates all agree. This document is the single source of +truth for what must be green and the language that may be claimed. 
+ +## Required Artifacts + +- `schemas/interfaces/compatibility-corpus.v1.schema.json` +- `schemas/interfaces/expected-capabilities.v1.schema.json` +- `tests/fixtures/common/expected-capabilities.json` +- `tests/fixtures/common/roundtrip-case.v1.schema.json` +- `tests/fixtures/common/roundtrip-cases.json` +- `tests/fixtures/common/visual-thresholds.json` +- `tests/fixtures/common/provider-compatibility.json` +- `tests/fixtures/hwp/manifest.json` +- `tests/fixtures/hwpx/manifest.json` +- `docs/qa/compatibility-corpus.md` +- `docs/qa/visual-diff-thresholds.md` +- `docs/qa/provider-compatibility-matrix.md` +- `docs/qa/phase-36-release-gate.md` + +## Required Tests + +```text +HwpCompatibilityCorpusTests +HwpRoundTripCorpusTests +HwpVisualDiffThresholdTests +HwpProviderCompatibilityMatrixTests +``` + +The release gate adds: + +```text +HwpCompatibilityCorpusTests.Phase36ReleaseGateRequiresAllCorpusArtifacts +HwpCompatibilityCorpusTests.NoDocxParityLanguageBeforeScorecard +HwpCompatibilityCorpusTests.BlockedOperationsRemainMachineReadable +``` + +## Acceptance Commands + +```text +dotnet build officecli.slnx +dotnet test tests/OfficeCli.Tests/OfficeCli.Tests.csproj --filter FullyQualifiedName~Hwp --no-build +dotnet test tests/OfficeCli.Tests/OfficeCli.Tests.csproj --no-build +git diff --check +git ls-files 'src/rhwp-field-bridge/target/*' | wc -l +``` + +The last command must return `0`. + +## Allowed Claim + +```text +OfficeCLI tracks HWP/HWPX support with corpus-backed operation evidence, +declarative round-trip cases, visual-threshold policy, and provider +compatibility rows. +``` + +## Forbidden Claim + +```text +HWP/HWPX have DOCX parity. +``` + +That claim is forbidden until the later parity scorecard is green and +remains enforced by `NoDocxParityLanguageBeforeScorecard` over corpus, +round-trip, visual, provider, and release-gate documents. The phrase may +appear only in a "forbidden claim" or "must not" guard context, never as +an actual capability statement. 
diff --git a/docs/qa/provider-compatibility-matrix.md b/docs/qa/provider-compatibility-matrix.md new file mode 100644 index 000000000..099ac1c73 --- /dev/null +++ b/docs/qa/provider-compatibility-matrix.md @@ -0,0 +1,97 @@ +# Provider Compatibility Matrix (Phase 36.5) + +OfficeCLI tracks HWP/HWPX provider behaviour without promoting a provider +ahead of the evidence the corpus carries. The authoritative catalog lives at: + +```text +tests/fixtures/common/provider-compatibility.json +``` + +## Schema + +Each row has: + +```text +format +operation +provider +status +defaultProvider +evidence +blockedReason +hancomLane +``` + +Allowed `status` values: + +```text +unsupported +experimental +fixture-backed +roundtrip-verified +external-manual +``` + +## Defaults + +- HWPX → `custom` is the **default provider**. `rhwp-bridge` is opt-in only and + must not be promoted to default until evidence parity is reached on every + HWPX operation it claims. +- HWP → `rhwp-bridge` is the default provider. `custom` is unsupported for + binary HWP read/render paths under `unsupported_engine` and for mutations + (`fill_field`, `replace_text`, `set_table_cell`) under the binary mutation + block. `create_blank` and `save_as_hwp` are rhwp-backed package-sidecar + operations. +- Hancom — every row carries `hancomLane = optional`. Hancom evidence may + *support* a future status promotion but never replaces corpus or round-trip + evidence and must not be required by normal CI. + +## Coverage Contract + +The matrix must include one row for each +`expected-capabilities.json` format/operation/provider tuple across: + +```text +custom +rhwp-bridge +``` + +Exactly one provider row per format/operation may set `defaultProvider = true`, +and that provider must match the format's `defaultEngine`. 
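Under that contract, a blocked provider row and its paired default row for the
same operation might look like the sketch below. Values are illustrative;
`tests/fixtures/common/provider-compatibility.json` is the authoritative
catalog.

```json
[
  {
    "format": "hwp",
    "operation": "read_text",
    "provider": "custom",
    "status": "unsupported",
    "defaultProvider": false,
    "evidence": [],
    "blockedReason": "unsupported_engine",
    "hancomLane": "optional"
  },
  {
    "format": "hwp",
    "operation": "read_text",
    "provider": "rhwp-bridge",
    "status": "experimental",
    "defaultProvider": true,
    "evidence": ["tests/fixtures/hwp/manifest.json"],
    "hancomLane": "optional"
  }
]
```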
+ +## Blocked Provider Rows + +Any row with `status = unsupported` must carry a typed `blockedReason` drawn +from the typed reason enum in +`src/officecli/Handlers/Hwp/HwpCapabilityReport.cs`: + +```text +unsupported_format +unsupported_operation +unsupported_engine +roundtrip_unverified +binary_hwp_mutation_forbidden +binary_hwp_write_forbidden +fixture_validation_failed +capability_schema_invalid +bridge_* +rhwp_runtime_missing +rhwp_api_missing +``` + +## Tests + +Enforcement lives in: + +```text +tests/OfficeCli.Tests/Hwp/HwpProviderCompatibilityMatrixTests.cs + - HwpxCustomRemainsDefault + - RhwpPromotionRequiresEvidenceParity + - HancomLaneIsOptionalNotCiRequired + - BlockedProviderRowsHaveTypedReasons + - RowsAreUnique + - MatrixCoversExpectedCapabilityProviderPairs +``` + +These run on every CI build and do not require any external Hancom +installation. diff --git a/docs/qa/visual-diff-thresholds.md b/docs/qa/visual-diff-thresholds.md new file mode 100644 index 000000000..ebc8e7e16 --- /dev/null +++ b/docs/qa/visual-diff-thresholds.md @@ -0,0 +1,93 @@ +# Visual Diff Thresholds (Phase 36.4) + +OfficeCLI tracks HWP/HWPX visual evidence policy without committing to a +specific pixel-diff implementation in CI. This document defines the contract +that any visual claim must respect. + +## Hard Fail Conditions + +A visual claim is rejected outright when any of the following occur: + +```text +page-count-mismatch +missing-expected-svg-page +missing-render-evidence-for-visual-validated-operation +body-marker-in-fixed-layout-exam +``` + +`body-marker-in-fixed-layout-exam` applies to KICE-style exam sheets and +similar fixed-layout documents. When a document is detected as a fixed-layout +exam sheet, proof text such as `[CU TEMPLATE EDIT ...]`, `VISUAL QA`, or +`edited via Hancom Office HWP UI` must not be inserted into the visible body. +Those markers change the page flow and are hard failures even if the file opens +and save/readback succeeds. 
+ +## Thresholded Fail Conditions + +Operation-level layout drift is allowed within declared bounds. The current +catalog lives at: + +```text +tests/fixtures/common/visual-thresholds.json +``` + +Allowed metrics: + +```text +layout-drift-fraction +blank-page-fraction +svg-hash-equality +``` + +Defaults: + +- text-only mutations may drift up to **2%** of layout area +- any unexpected blank page is a fail +- when a row declares exact SVG hash evidence, any drift is a fail +- fixed-layout exam sheets allow **0%** visible body layout drift unless the + requested edit explicitly changes exam content + +## Fixed-Layout Exam Sheets + +Fixed-layout exam sheets are identified by structural signals such as +`NEWSPAPER` two-column layout, exam-title text, and question numbering. Visual +QA for these files requires before/after screenshots and a manual visual review +until a stable renderer is available in CI. + +For these documents, QA proof must be stored outside the visible question body: +use screenshots, sidecar evidence, logs, or non-visible metadata. Do not insert +review markers into the first column, question body, answer choices, header +tables, or floating title/page-number objects. + +## Visual Validated Operations + +Only operations explicitly listed in `visualValidatedOperations` may make +visual claims. As of Phase 36.4 that list is `[render_svg]`. `replace_text`, +`fill_field`, and `set_table_cell` may declare *threshold tolerances* but +cannot declare a visual claim without a render evidence link in +`expected-capabilities.json`. + +## Renderer Status + +```text +rendererStatus = deferred +``` + +OfficeCLI CI does not yet ship a stable HWP/HWPX renderer. Until it does, the +threshold contract is enforced declaratively: visual evidence must be linked, +and any operation marked visual-validated without render evidence fails CI. 
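Taken together, the declared defaults above could be captured declaratively
along these lines. The key names are a sketch only;
`tests/fixtures/common/visual-thresholds.json` is the committed source of
truth.

```json
{
  "rendererStatus": "deferred",
  "visualValidatedOperations": ["render_svg"],
  "defaults": {
    "layout-drift-fraction": 0.02,
    "blank-page-fraction": 0.0
  },
  "fixedLayoutExamSheet": {
    "layout-drift-fraction": 0.0,
    "bodyProofMarkers": "hard-fail"
  }
}
```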
+ +## Tests + +The contract is enforced by: + +```text +tests/OfficeCli.Tests/Hwp/HwpVisualDiffThresholdTests.cs + - PageCountMismatchIsHardFail + - MissingRenderEvidenceFailsVisualClaim + - TextOnlyMutationUsesDeclaredThreshold + - FixedLayoutExamBodyMarkersAreHardFail + - FixedLayoutExamRuleRejectsAdHocBodyProofMarkers +``` + +These run on every CI build and do not depend on a renderer being available. diff --git a/docs/rebase-reports/2026-05-14-officecli-hwpx-rebase.hwpx b/docs/rebase-reports/2026-05-14-officecli-hwpx-rebase.hwpx new file mode 100644 index 000000000..be2311641 Binary files /dev/null and b/docs/rebase-reports/2026-05-14-officecli-hwpx-rebase.hwpx differ diff --git a/docs/rebase-reports/2026-05-14-officecli-hwpx-rebase.md b/docs/rebase-reports/2026-05-14-officecli-hwpx-rebase.md new file mode 100644 index 000000000..17ee4e89d --- /dev/null +++ b/docs/rebase-reports/2026-05-14-officecli-hwpx-rebase.md @@ -0,0 +1,53 @@ +# OfficeCLI HWPX Rebase Report + +## Overview + +This document records the 2026-05-14 rebase of the OfficeCLI `feat/hwpx` +branch onto `upstream/main` 1.0.91. + +## Repository State + +- Repository: `/Users/jun/Developer/new/700_projects/cli-jaw/officecli` +- Branch: `feat/hwpx` +- Upstream base: `upstream/main` 1.0.91 +- PATH binary: `/Users/jun/.local/bin/officecli` +- Registered target: `/Users/jun/Developer/new/700_projects/cli-jaw/officecli/build-local/officecli` +- Published OfficeCLI version: `1.0.91.0` + +## Rebase Result + +The rebase completed successfully after manual conflict resolution. The resolved +branch preserves upstream plugin, PDF, exporter, and minimal-document support +while keeping the fork's native HWPX handler and experimental HWP bridge work. + +## Conflict Resolution Notes + +- `BlankDocCreator.cs`: kept upstream plugin creation support and native `.hwpx` + creation before plugin fallback. +- `CommandBuilder.Import.cs`: kept upstream `--minimal` and fork HWPX + `--from-markdown` / `--align` import behavior. 
+- `CommandBuilder.cs`: registered both `BuildPluginsCommand` and + `BuildCompareCommand`. +- `DocumentHandlerFactory.cs`: routed `.hwpx` to `HwpxHandler`, preserved `.hwp` + bridge guidance, and kept plugin fallback for unknown extensions. +- `CommandBuilder.View.cs` and `ResidentServer.cs`: kept upstream `pdf` and + plugin forms support together with HWP/HWPX modes such as `forms`, `tables`, + `markdown`, `objects`, `styles`, `fields`, and `field`. +- `WordHandler.Add.Text.cs`: preserved upstream `sym=font:hex` handling to avoid + duplicate symbol glyph text on dump/batch round trips. + +## Verification + +- No tracked Rust `target` artifacts: passed. +- `dotnet build officecli.slnx`: passed with warnings only. +- `cargo build --manifest-path src/rhwp-field-bridge/Cargo.toml`: passed. +- HWP bridge focused tests: 36 passed, 0 failed. +- Full OfficeCLI test project: 234 passed, 0 failed. +- `build-local/officecli` republished for `osx-arm64`. +- PATH `officecli` resolves to the republished `build-local/officecli` target. + +## HWP Capability Boundary + +Binary `.hwp` creation remains unsupported by OfficeCLI capabilities. The current +safe document creation path is `.hwpx`; binary HWP mutation remains +operation-gated and requires the experimental rhwp bridge environment. diff --git a/docs/safety/safe-save-policy.md b/docs/safety/safe-save-policy.md new file mode 100644 index 000000000..d843f7a34 --- /dev/null +++ b/docs/safety/safe-save-policy.md @@ -0,0 +1,70 @@ +# Safe Save Policy + +OfficeCLI HWP and HWPX mutation must preserve the source file unless a +transactional save path proves the edited file can be reopened and validated. + +## Current policy + +The stable policy is output-first: + +```text +input.hwp -> output.hwp +input.hwpx -> output.hwpx +``` + +Commands that mutate HWP or HWPX content must require an explicit output path +until the safe-save transaction contract is implemented for that operation. +They must not overwrite the input path by default. 
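As a sketch of the output-first shape, a text replacement that follows this
policy names its destination explicitly (the exact CLI verb and flag spelling
may differ from the shipped command surface):

```text
officecli set input.hwp replace-text --prop find=마케팅 --prop value=브릿지 --prop output=out.hwp
```

The source `input.hwp` is never touched; the edit lands only in `out.hwp`.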
+ +## Transaction contract + +A safe-save transaction must follow these gates before any in-place replace: + +1. write the edited document to a temporary file in the same directory as the + final target; +2. fsync the temporary file where the platform exposes a supported flush; +3. reopen the temporary output with the same provider; +4. reopen with an alternate provider when one is available; +5. validate the expected semantic delta, such as text, field, or table-cell + changes; +6. for HWPX, validate ZIP readability, XML well-formedness, manifest references, + header references, and BinData references; +7. render SVG when the provider supports it and compare the edited render with + the expected visual tolerance; +8. write a backup before replacing an existing source file; +9. write a transaction manifest containing checks, warnings, backup path, and + verification evidence; +10. atomically replace only after every required gate passes. + +If any required check fails, the original input file must remain unchanged and +the command must return a structured validation error. + +## Interface schemas + +The policy is described by: + +```text +schemas/interfaces/save-policy.v1.schema.json +schemas/interfaces/save-transaction.v1.schema.json +``` + +`save-policy` describes which checks an operation requires. `save-transaction` +describes the result returned after an output-first or in-place save attempt. + +## HWP and HWPX boundaries + +Binary HWP has the strictest policy because corruption is difficult to inspect +manually. HWPX is ZIP/XML and easier to inspect, but it still needs package and +reference validation before in-place writes are allowed. + +Agent-facing help should keep these claims separate: + +```text +documented shape +handler-supported operation +readback-verified output +safe in-place mutation +``` + +Only the last category may overwrite an input file, and only with backup and +transaction evidence. 
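Following that contract, a successful in-place transaction result might report
its evidence like this. The field names are a sketch, with
`schemas/interfaces/save-transaction.v1.schema.json` as the authoritative
shape; the check names match the gates listed above.

```json
{
  "ok": true,
  "mode": "in-place",
  "checks": [
    "temp-write",
    "provider-readback",
    "semantic-delta",
    "backup-created",
    "manifest-write",
    "atomic-replace"
  ],
  "backupPath": "input.hwp.bak-20260505013000",
  "warnings": []
}
```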
diff --git a/install.ps1 b/install.ps1 index 1922153d6..fa99c6525 100644 --- a/install.ps1 +++ b/install.ps1 @@ -8,6 +8,7 @@ $source = $null $url = "https://github.com/$repo/releases/latest/download/$asset" $checksumUrl = "https://github.com/$repo/releases/latest/download/SHA256SUMS" $tempFile = "$env:TEMP\$binary" +$assetBase = $asset -replace '\.exe$', '' Write-Host "Downloading OfficeCLI..." try { Invoke-WebRequest -Uri $url -OutFile $tempFile @@ -80,6 +81,35 @@ if ($existing) { New-Item -ItemType Directory -Force -Path $installDir | Out-Null Copy-Item -Force $source "$installDir\$binary" +foreach ($sidecar in @("rhwp-field-bridge", "rhwp-officecli-bridge")) { + $sidecarAsset = "$assetBase-$sidecar.exe" + $sidecarTemp = "$env:TEMP\$sidecarAsset" + $sidecarTarget = Join-Path $installDir "$sidecar.exe" + $sidecarSource = $null + + Write-Host "Checking optional HWP sidecar $sidecarAsset..." + try { + Invoke-WebRequest -Uri "https://github.com/$repo/releases/latest/download/$sidecarAsset" -OutFile $sidecarTemp + $sidecarSource = $sidecarTemp + } catch { + $candidates = @(".\$sidecarAsset", ".\bin\$sidecarAsset", ".\bin\release\$sidecarAsset", ".\$sidecar.exe", ".\bin\$sidecar.exe", ".\bin\release\$sidecar.exe") + foreach ($candidate in $candidates) { + if (Test-Path $candidate) { + $sidecarSource = $candidate + break + } + } + } + + if ($sidecarSource) { + Copy-Item -Force $sidecarSource $sidecarTarget + Write-Host "Installed HWP sidecar: $sidecarTarget" + } else { + Write-Host "Optional HWP sidecar unavailable: $sidecarAsset. Binary .hwp create/read/edit will be dependency-gated." 
+ } + Remove-Item -Force $sidecarTemp -ErrorAction SilentlyContinue +} + Remove-Item -Force $tempFile -ErrorAction SilentlyContinue # Add to PATH if not already there diff --git a/install.sh b/install.sh index 3fe91a1d8..260d536c7 100755 --- a/install.sh +++ b/install.sh @@ -54,6 +54,7 @@ SOURCE="" # Step 1: Try downloading from GitHub DOWNLOAD_URL="https://github.com/$REPO/releases/latest/download/$ASSET" CHECKSUM_URL="https://github.com/$REPO/releases/latest/download/SHA256SUMS" +ASSET_BASE="${ASSET%.exe}" echo "Downloading OfficeCLI ($ASSET)..." if curl -fsSL "$DOWNLOAD_URL" -o "/tmp/$BINARY_NAME" 2>/dev/null; then # Verify checksum if available @@ -135,6 +136,45 @@ fi mv -f "$INSTALL_DIR/$BINARY_NAME.new" "$INSTALL_DIR/$BINARY_NAME" +install_sidecar() { + local sidecar="$1" + local sidecar_asset="${ASSET_BASE}-${sidecar}" + local sidecar_source="" + local tmp_path="/tmp/${sidecar_asset}" + local target_path="$INSTALL_DIR/$sidecar" + + echo "Checking optional HWP sidecar $sidecar_asset..." + if curl -fsSL "https://github.com/$REPO/releases/latest/download/$sidecar_asset" -o "$tmp_path" 2>/dev/null; then + sidecar_source="$tmp_path" + else + for candidate in "./$sidecar_asset" "./bin/$sidecar_asset" "./bin/release/$sidecar_asset" "./$sidecar" "./bin/$sidecar" "./bin/release/$sidecar"; do + if [ -f "$candidate" ]; then + sidecar_source="$candidate" + break + fi + done + fi + + if [ -z "$sidecar_source" ]; then + echo "Optional HWP sidecar unavailable: $sidecar_asset. Binary .hwp create/read/edit will be dependency-gated." 
+ rm -f "$tmp_path" + return 0 + fi + + cp "$sidecar_source" "$target_path.new" + chmod +x "$target_path.new" + if [ "$(uname -s)" = "Darwin" ]; then + xattr -d com.apple.quarantine "$target_path.new" 2>/dev/null || true + codesign -s - -f "$target_path.new" 2>/dev/null || true + fi + mv -f "$target_path.new" "$target_path" + rm -f "$tmp_path" + echo "Installed HWP sidecar: $target_path" +} + +install_sidecar "rhwp-field-bridge" +install_sidecar "rhwp-officecli-bridge" + # Auto-add to PATH if needed case ":$PATH:" in *":$INSTALL_DIR:"*) ;; diff --git a/officecli.slnx b/officecli.slnx index dd218cae2..07e3ff112 100644 --- a/officecli.slnx +++ b/officecli.slnx @@ -1,6 +1,7 @@ + diff --git a/schemas/README.md b/schemas/README.md index f67ec5271..ccb2e67ba 100644 --- a/schemas/README.md +++ b/schemas/README.md @@ -15,12 +15,21 @@ schemas/ docx/.json ← Word per-element capability pptx/.json ← PowerPoint per-element capability xlsx/.json ← Excel per-element capability + hwpx/.json ← HWPX ZIP/XML per-element capability + hwp/.json ← Binary HWP rhwp-backed operation capability + interfaces/ + *.v1.schema.json ← shared JSON contracts for capability, edit, + validation, rhwp sidecar, and safe-save flows ``` ## Editing rule Any PR that changes `Add`, `Set`, or `Get` behavior for an element **must** update the matching schema file in the same PR. CI contract tests will fail otherwise. +HWP/HWPX mutation work must also update the relevant `schemas/interfaces/*` +contract when it changes result envelopes, sidecar requests, validation output, +or save/backup behavior. + ## Not here - Narrative / tutorials / best practices → wiki (generated or hand-written at release time). 
diff --git a/schemas/help/_schema.json b/schemas/help/_schema.json index cc568f75b..2afd41167 100644 --- a/schemas/help/_schema.json +++ b/schemas/help/_schema.json @@ -9,7 +9,7 @@ "$schema": { "type": "string", "description": "pointer to this meta file for IDE tooling; ignored at runtime" }, "format": { "type": "string", - "enum": ["docx", "xlsx", "pptx"] + "enum": ["docx", "xlsx", "pptx", "hwpx", "hwp"] }, "element": { "type": "string", diff --git a/schemas/help/hwp/capabilities.json b/schemas/help/hwp/capabilities.json new file mode 100644 index 000000000..93721c321 --- /dev/null +++ b/schemas/help/hwp/capabilities.json @@ -0,0 +1,14 @@ +{ + "$schema": "../_schema.json", + "format": "hwp", + "element": "capabilities", + "container": true, + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/capabilities"] }, + "note": "Machine-readable capability surface for binary HWP. Use `officecli capabilities --json` or `officecli hwp capabilities --json` style commands when available. 
HWP support is experimental and provider-gated through rhwp.", + "properties": { + "read_text": { "type": "enum", "values": ["unsupported", "experimental", "roundtrip-verified"], "get": true, "query": true, "examples": ["officecli capabilities --json"], "enforcement": "strict" }, + "render_svg": { "type": "enum", "values": ["unsupported", "experimental", "roundtrip-verified"], "get": true, "query": true, "examples": ["officecli view file.hwp svg --json"], "enforcement": "strict" }, + "provider": { "type": "string", "get": true, "query": true, "examples": ["rhwp-bridge"], "enforcement": "strict" } + } +} diff --git a/schemas/help/hwp/field.json b/schemas/help/hwp/field.json new file mode 100644 index 000000000..75be76a0b --- /dev/null +++ b/schemas/help/hwp/field.json @@ -0,0 +1,15 @@ +{ + "$schema": "../_schema.json", + "format": "hwp", + "element": "field", + "parent": "fields", + "operations": { "add": false, "set": true, "get": true, "query": true, "remove": false }, + "paths": { "stable": ["/field[@name=NAME]", "/field[@id=ID]"], "positional": ["/field"] }, + "note": "Single field read/fill operation. 
Mutations write to an explicit output path and never overwrite the source file.", + "properties": { + "name": { "type": "string", "set": true, "get": true, "query": true, "examples": ["--field-name 회사명", "--prop name=회사명"], "enforcement": "strict" }, + "id": { "type": "number", "set": true, "get": true, "query": true, "examples": ["--field-id 1584999796", "--prop id=1584999796"], "enforcement": "strict" }, + "value": { "type": "string", "set": true, "get": true, "examples": ["--prop value=리지"], "enforcement": "strict" }, + "output": { "type": "string", "set": true, "examples": ["--prop output=out.hwp"], "enforcement": "strict" } + } +} diff --git a/schemas/help/hwp/fields.json b/schemas/help/hwp/fields.json new file mode 100644 index 000000000..559503df0 --- /dev/null +++ b/schemas/help/hwp/fields.json @@ -0,0 +1,13 @@ +{ + "$schema": "../_schema.json", + "format": "hwp", + "element": "fields", + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/fields"] }, + "note": "Read-only field discovery through the rhwp API sidecar.", + "properties": { + "fieldId": { "type": "number", "get": true, "query": true, "examples": ["1584999796"], "enforcement": "strict" }, + "name": { "type": "string", "get": true, "query": true, "examples": ["회사명"], "enforcement": "strict" }, + "value": { "type": "string", "get": true, "query": true, "examples": ["홍길동"], "enforcement": "strict" } + } +} diff --git a/schemas/help/hwp/inspect.json b/schemas/help/hwp/inspect.json new file mode 100644 index 000000000..94dd6be2f --- /dev/null +++ b/schemas/help/hwp/inspect.json @@ -0,0 +1,14 @@ +{ + "$schema": "../_schema.json", + "format": "hwp", + "element": "inspect", + "container": true, + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/inspect"] }, + "note": "Planned read-only document-map entry point for binary HWP. 
Initial schema exists so agents can discover the intended operation and current limits.", + "properties": { + "fields": { "type": "string", "get": true, "query": true, "description": "Field summary if provider supports it.", "enforcement": "report" }, + "tables": { "type": "string", "get": true, "query": true, "description": "Table coordinate summary if provider supports it.", "enforcement": "report" }, + "warnings": { "type": "string", "get": true, "query": true, "description": "Provider limitations.", "enforcement": "report" } + } +} diff --git a/schemas/help/hwp/provider-rhwp.json b/schemas/help/hwp/provider-rhwp.json new file mode 100644 index 000000000..e3d91bb02 --- /dev/null +++ b/schemas/help/hwp/provider-rhwp.json @@ -0,0 +1,14 @@ +{ + "$schema": "../_schema.json", + "format": "hwp", + "element": "provider-rhwp", + "container": true, + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/provider/rhwp"] }, + "note": "Provider contract for the experimental rhwp bridge and API sidecar.", + "properties": { + "engine": { "type": "string", "get": true, "query": true, "examples": ["rhwp-bridge"], "enforcement": "strict" }, + "rhwpVersion": { "type": "string", "get": true, "query": true, "examples": ["rhwp v0.7.9"], "enforcement": "report" }, + "apiSidecar": { "type": "string", "get": true, "query": true, "examples": ["rhwp-field-bridge"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwp/read-text.json b/schemas/help/hwp/read-text.json new file mode 100644 index 000000000..80407249f --- /dev/null +++ b/schemas/help/hwp/read-text.json @@ -0,0 +1,13 @@ +{ + "$schema": "../_schema.json", + "format": "hwp", + "element": "read-text", + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/text"] }, + "note": "Read-only text extraction for binary .hwp through the experimental rhwp bridge. 
This is an operation schema, not a DOM path schema.", + "properties": { + "format": { "type": "string", "get": true, "query": true, "examples": ["hwp"], "enforcement": "strict" }, + "text": { "type": "string", "get": true, "query": true, "examples": ["officecli view file.hwp text --json"], "enforcement": "strict" }, + "pages": { "type": "string", "get": true, "query": true, "description": "JSON array of extracted page text entries.", "enforcement": "report" } + } +} diff --git a/schemas/help/hwp/render-svg.json b/schemas/help/hwp/render-svg.json new file mode 100644 index 000000000..534504709 --- /dev/null +++ b/schemas/help/hwp/render-svg.json @@ -0,0 +1,13 @@ +{ + "$schema": "../_schema.json", + "format": "hwp", + "element": "render-svg", + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/svg"] }, + "note": "Read-only SVG rendering for visual inspection through rhwp. Use rendered SVG as validation evidence, not as proof of full edit safety.", + "properties": { + "page": { "type": "string", "get": true, "query": true, "examples": ["--page 1"], "enforcement": "strict" }, + "outputDir": { "type": "string", "get": true, "query": true, "examples": ["--output-dir ./svg"], "enforcement": "strict" }, + "sha256": { "type": "string", "get": true, "query": true, "description": "Hash of rendered SVG output.", "enforcement": "report" } + } +} diff --git a/schemas/help/hwp/replace-text.json b/schemas/help/hwp/replace-text.json new file mode 100644 index 000000000..09aae5f25 --- /dev/null +++ b/schemas/help/hwp/replace-text.json @@ -0,0 +1,17 @@ +{ + "$schema": "../_schema.json", + "format": "hwp", + "element": "replace-text", + "operations": { "add": false, "set": true, "get": false, "query": false, "remove": false }, + "paths": { "positional": ["/text"] }, + "note": "Search/replace operation for binary HWP through rhwp. Default mode writes a new output file. 
Safe in-place mode is experimental and requires --in-place --backup --verify.", + "properties": { + "find": { "type": "string", "set": true, "examples": ["--prop find=마케팅"], "enforcement": "strict" }, + "value": { "type": "string", "set": true, "examples": ["--prop value=브릿지"], "enforcement": "strict" }, + "mode": { "type": "enum", "values": ["one", "all"], "set": true, "examples": ["--prop mode=all"], "enforcement": "strict" }, + "output": { "type": "string", "set": true, "examples": ["--prop output=out.hwp"], "description": "Required unless using --in-place. Must not be combined with --in-place.", "enforcement": "strict" }, + "inPlace": { "type": "bool", "set": true, "examples": ["--in-place"], "description": "Top-level flag, not --prop. Requires --backup and --verify.", "enforcement": "strict" }, + "backup": { "type": "bool", "set": true, "examples": ["--backup"], "description": "Top-level flag required for --in-place. Creates .bak-<timestamp> before atomic replace.", "enforcement": "strict" }, + "verify": { "type": "bool", "set": true, "examples": ["--verify"], "description": "Top-level flag required for --in-place. Runs provider readback, semantic delta, and optional SVG evidence.", "enforcement": "strict" } + } +} diff --git a/schemas/help/hwp/save-policy.json b/schemas/help/hwp/save-policy.json new file mode 100644 index 000000000..cd97ca153 --- /dev/null +++ b/schemas/help/hwp/save-policy.json @@ -0,0 +1,16 @@ +{ + "$schema": "../_schema.json", + "format": "hwp", + "element": "save-policy", + "container": true, + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/save-policy"] }, + "note": "Binary HWP default mutation mode writes to explicit output paths.
Text replacement additionally supports an experimental safe in-place mode, available only with --in-place --backup --verify and only after the temp-write, readback, semantic-validation, backup, manifest, and atomic-replace gates pass.", + "properties": { + "outputRequired": { "type": "bool", "get": true, "query": true, "examples": ["true unless --in-place"], "enforcement": "strict" }, + "inPlace": { "type": "enum", "values": ["unsupported", "experimental", "stable"], "get": true, "query": true, "examples": ["experimental"], "description": "Supported for HWP /text replacement only, with --backup --verify.", "enforcement": "strict" }, + "backup": { "type": "string", "get": true, "query": true, "description": "In-place backup path policy: .bak-<timestamp>.", "examples": ["file.hwp.bak-20260505013000"], "enforcement": "strict" }, + "transactionSchema": { "type": "string", "get": true, "query": true, "examples": ["schemas/interfaces/save-transaction.v1.schema.json"], "enforcement": "report" }, + "requiredChecks": { "type": "array", "get": true, "query": true, "examples": ["temp-write,provider-readback,semantic-delta,backup-created,manifest-write,atomic-replace"], "enforcement": "strict" } + } +} diff --git a/schemas/help/hwp/table-cell.json b/schemas/help/hwp/table-cell.json new file mode 100644 index 000000000..db57b706f --- /dev/null +++ b/schemas/help/hwp/table-cell.json @@ -0,0 +1,17 @@ +{ + "$schema": "../_schema.json", + "format": "hwp", + "element": "table-cell", + "parent": "table-map", + "operations": { "add": false, "set": true, "get": true, "query": true, "remove": false }, + "paths": { "stable": ["/table/cell[@section=S,@parent-para=P,@control=C,@cell=N]"], "positional": ["/table/cell"] }, + "note": "Explicit-coordinate table cell operation for HWP. Coordinates should come from table-map.
HWPX table-cell mutation through rhwp remains blocked.", + "properties": { + "section": { "type": "number", "set": true, "get": true, "query": true, "examples": ["--prop section=0"], "enforcement": "strict" }, + "parent-para": { "type": "number", "set": true, "get": true, "query": true, "examples": ["--prop parent-para=3"], "enforcement": "strict" }, + "control": { "type": "number", "set": true, "get": true, "query": true, "examples": ["--prop control=0"], "enforcement": "strict" }, + "cell": { "type": "number", "set": true, "get": true, "query": true, "examples": ["--prop cell=0"], "enforcement": "strict" }, + "value": { "type": "string", "set": true, "examples": ["--prop value=오피스셀"], "enforcement": "strict" }, + "output": { "type": "string", "set": true, "examples": ["--prop output=out.hwp"], "enforcement": "strict" } + } +} diff --git a/schemas/help/hwp/table-map.json b/schemas/help/hwp/table-map.json new file mode 100644 index 000000000..f4656d6cd --- /dev/null +++ b/schemas/help/hwp/table-map.json @@ -0,0 +1,14 @@ +{ + "$schema": "../_schema.json", + "format": "hwp", + "element": "table-map", + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/tables"] }, + "note": "Read-only discovery of rhwp table coordinates. 
Run this before set-table-cell so agents do not guess section/control/cell coordinates.", + "properties": { + "section": { "type": "number", "get": true, "query": true, "examples": ["0"], "enforcement": "strict" }, + "parentParagraph": { "type": "number", "get": true, "query": true, "examples": ["3"], "enforcement": "strict" }, + "control": { "type": "number", "get": true, "query": true, "examples": ["0"], "enforcement": "strict" }, + "cell": { "type": "number", "get": true, "query": true, "examples": ["0"], "enforcement": "strict" } + } +} diff --git a/schemas/help/hwp/validate-output.json b/schemas/help/hwp/validate-output.json new file mode 100644 index 000000000..1aaada08f --- /dev/null +++ b/schemas/help/hwp/validate-output.json @@ -0,0 +1,15 @@ +{ + "$schema": "../_schema.json", + "format": "hwp", + "element": "validate-output", + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/validate-output"] }, + "note": "Validation contract for comparing source and edited HWP outputs. 
This is required before any future in-place mutation policy.", + "properties": { + "before": { "type": "string", "get": true, "query": true, "examples": ["--before in.hwp"], "enforcement": "strict" }, + "after": { "type": "string", "get": true, "query": true, "examples": ["--after out.hwp"], "enforcement": "strict" }, + "ok": { "type": "bool", "get": true, "query": true, "enforcement": "strict" }, + "checks": { "type": "array", "get": true, "query": true, "examples": ["provider-readback,semantic-delta,svg-render"], "enforcement": "strict" }, + "transaction": { "type": "object", "get": true, "query": true, "examples": ["save-transaction.v1"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/capabilities.json b/schemas/help/hwpx/capabilities.json new file mode 100644 index 000000000..c60ba98e3 --- /dev/null +++ b/schemas/help/hwpx/capabilities.json @@ -0,0 +1,14 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "capabilities", + "container": true, + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/capabilities"] }, + "note": "Machine-readable HWPX capability surface. 
Custom C# ZIP/XML handler remains default; rhwp is an opt-in provider for selected operations.", + "properties": { + "provider": { "type": "enum", "values": ["custom", "rhwp"], "get": true, "query": true, "examples": ["custom"], "enforcement": "strict" }, + "roundtrip": { "type": "enum", "values": ["unverified", "fixture-backed", "stable"], "get": true, "query": true, "examples": ["unverified"], "enforcement": "strict" }, + "rawXml": { "type": "enum", "values": ["unsupported", "advanced"], "get": true, "query": true, "examples": ["advanced"], "enforcement": "strict" } + } +} diff --git a/schemas/help/hwpx/connector.json b/schemas/help/hwpx/connector.json new file mode 100644 index 000000000..c99d0bdd7 --- /dev/null +++ b/schemas/help/hwpx/connector.json @@ -0,0 +1,12 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "connector", + "parent": "paragraph|section", + "operations": { "add": true, "set": true, "get": true, "query": true, "remove": true }, + "paths": { "positional": ["/section[N]/p[N]/connectLine[N]"] }, + "properties": { + "shape": { "type": "string", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop shape=straight"], "enforcement": "report" }, + "color": { "type": "color", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop color=#333333"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/default-font.json b/schemas/help/hwpx/default-font.json new file mode 100644 index 000000000..d27a96d6c --- /dev/null +++ b/schemas/help/hwpx/default-font.json @@ -0,0 +1,12 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "default-font", + "container": true, + "operations": { "add": false, "set": true, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/default-font"] }, + "properties": { + "family": { "type": "string", "set": true, "get": true, "query": true, "examples": ["--prop family=맑은 고딕"], "enforcement": "report" }, + "size": { 
"type": "font-size", "set": true, "get": true, "query": true, "examples": ["--prop size=11pt"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/diff.json b/schemas/help/hwpx/diff.json new file mode 100644 index 000000000..db024a64d --- /dev/null +++ b/schemas/help/hwpx/diff.json @@ -0,0 +1,13 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "diff", + "container": true, + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/diff"] }, + "properties": { + "before": { "type": "string", "get": true, "query": true, "examples": ["--before a.hwpx"], "enforcement": "strict" }, + "after": { "type": "string", "get": true, "query": true, "examples": ["--after b.hwpx"], "enforcement": "strict" }, + "changes": { "type": "string", "get": true, "query": true, "description": "JSON diff changes.", "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/document.json b/schemas/help/hwpx/document.json new file mode 100644 index 000000000..9621865f4 --- /dev/null +++ b/schemas/help/hwpx/document.json @@ -0,0 +1,17 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "document", + "container": true, + "operations": { "add": false, "set": true, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/document"] }, + "note": "HWPX package-level document contract for the custom ZIP/XML handler.", + "children": [ + { "element": "section", "pathSegment": "section", "cardinality": "1..n" } + ], + "properties": { + "title": { "type": "string", "set": true, "get": true, "query": true, "examples": ["--prop title=문서"], "enforcement": "report" }, + "creator": { "type": "string", "set": true, "get": true, "query": true, "examples": ["--prop creator=OfficeCLI"], "enforcement": "report" }, + "defaultFont": { "type": "string", "set": true, "get": true, "query": true, "examples": ["--prop defaultFont=맑은 고딕"], "enforcement": "report" } + } +} diff --git 
a/schemas/help/hwpx/metadata.json b/schemas/help/hwpx/metadata.json new file mode 100644 index 000000000..ad2eba332 --- /dev/null +++ b/schemas/help/hwpx/metadata.json @@ -0,0 +1,13 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "metadata", + "container": true, + "operations": { "add": false, "set": true, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/metadata"] }, + "properties": { + "title": { "type": "string", "set": true, "get": true, "query": true, "examples": ["--prop title=보고서"], "enforcement": "report" }, + "author": { "type": "string", "set": true, "get": true, "query": true, "examples": ["--prop author=OfficeCLI"], "enforcement": "report" }, + "subject": { "type": "string", "set": true, "get": true, "query": true, "examples": ["--prop subject=요약"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/paragraph.json b/schemas/help/hwpx/paragraph.json new file mode 100644 index 000000000..efd1d08f1 --- /dev/null +++ b/schemas/help/hwpx/paragraph.json @@ -0,0 +1,16 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "paragraph", + "parent": "section", + "operations": { "add": true, "set": true, "get": true, "query": true, "remove": true }, + "paths": { "stable": ["/section[N]/p[@id=ID]"], "positional": ["/section[N]/p[N]"] }, + "children": [ + { "element": "run", "pathSegment": "run", "cardinality": "0..n" } + ], + "properties": { + "text": { "type": "string", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop text=안녕하세요"], "enforcement": "strict" }, + "align": { "type": "enum", "values": ["left", "center", "right", "justify"], "add": true, "set": true, "get": true, "query": true, "examples": ["--prop align=center"], "enforcement": "report" }, + "style": { "type": "string", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop style=본문"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/patch.json 
b/schemas/help/hwpx/patch.json new file mode 100644 index 000000000..aa9ea3e3d --- /dev/null +++ b/schemas/help/hwpx/patch.json @@ -0,0 +1,12 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "patch", + "operations": { "add": true, "set": true, "get": false, "query": false, "remove": true }, + "paths": { "positional": ["/patch"] }, + "note": "Planned multi-operation HWPX patch contract. The operation is schema-visible before becoming stable so agents can discover the intended contract.", + "properties": { + "ops": { "type": "string", "add": true, "set": true, "examples": ["--ops patch.json"], "enforcement": "report" }, + "output": { "type": "string", "add": true, "set": true, "examples": ["--output out.hwpx"], "enforcement": "strict" } + } +} diff --git a/schemas/help/hwpx/picture.json b/schemas/help/hwpx/picture.json new file mode 100644 index 000000000..f3c17be4a --- /dev/null +++ b/schemas/help/hwpx/picture.json @@ -0,0 +1,13 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "picture", + "parent": "paragraph|section", + "operations": { "add": true, "set": true, "get": true, "query": true, "remove": true }, + "paths": { "positional": ["/section[N]/p[N]/pic[N]"] }, + "properties": { + "src": { "type": "string", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop src=photo.png"], "enforcement": "strict" }, + "alt": { "type": "string", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop alt=사진"], "enforcement": "report" }, + "bindata": { "type": "string", "get": true, "query": true, "description": "Referenced BinData entry.", "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/provider-custom.json b/schemas/help/hwpx/provider-custom.json new file mode 100644 index 000000000..8e4a0b37a --- /dev/null +++ b/schemas/help/hwpx/provider-custom.json @@ -0,0 +1,13 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "provider-custom", + "container": true, + 
"operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/provider/custom"] }, + "note": "Default HWPX custom C# ZIP/XML provider. Keeps broad HWPX path/raw XML coverage while rhwp remains optional.", + "properties": { + "default": { "type": "bool", "get": true, "query": true, "examples": ["true"], "enforcement": "strict" }, + "roundtripStatus": { "type": "enum", "values": ["unverified", "fixture-backed", "stable"], "get": true, "query": true, "examples": ["unverified"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/provider-rhwp.json b/schemas/help/hwpx/provider-rhwp.json new file mode 100644 index 000000000..832844395 --- /dev/null +++ b/schemas/help/hwpx/provider-rhwp.json @@ -0,0 +1,14 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "provider-rhwp", + "container": true, + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/provider/rhwp"] }, + "note": "Optional rhwp provider for selected HWPX read/render/field/replace operations. 
Not the default mutator.", + "properties": { + "engine": { "type": "string", "get": true, "query": true, "examples": ["rhwp-bridge"], "enforcement": "strict" }, + "supportedOps": { "type": "string", "get": true, "query": true, "examples": ["read_text,render_svg,fill_field,replace_text"], "enforcement": "report" }, + "blockedOps": { "type": "string", "get": true, "query": true, "examples": ["set_table_cell"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/raw-xml.json b/schemas/help/hwpx/raw-xml.json new file mode 100644 index 000000000..18cdc3014 --- /dev/null +++ b/schemas/help/hwpx/raw-xml.json @@ -0,0 +1,13 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "raw-xml", + "operations": { "add": true, "set": true, "get": true, "query": true, "remove": true }, + "paths": { "positional": ["/raw-xml"] }, + "note": "Advanced escape hatch for HWPX package XML. Must be followed by package/reference validation before any stability claim.", + "properties": { + "part": { "type": "string", "add": true, "set": true, "get": true, "query": true, "examples": ["--part Contents/section0.xml"], "enforcement": "strict" }, + "xpath": { "type": "string", "add": true, "set": true, "get": true, "query": true, "examples": ["--xpath //hp:p[1]"], "enforcement": "report" }, + "value": { "type": "string", "add": true, "set": true, "examples": ["--value ''"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/render-svg.json b/schemas/help/hwpx/render-svg.json new file mode 100644 index 000000000..02cedfa8a --- /dev/null +++ b/schemas/help/hwpx/render-svg.json @@ -0,0 +1,12 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "render-svg", + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/svg"] }, + "note": "SVG render contract for HWPX through rhwp experimental provider.", + "properties": { + "page": { "type": "string", "get": true, "query": true, 
"examples": ["--page 1"], "enforcement": "report" }, + "outputDir": { "type": "string", "get": true, "query": true, "examples": ["--output-dir ./svg"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/run.json b/schemas/help/hwpx/run.json new file mode 100644 index 000000000..7c9ad88b0 --- /dev/null +++ b/schemas/help/hwpx/run.json @@ -0,0 +1,13 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "run", + "parent": "paragraph", + "operations": { "add": true, "set": true, "get": true, "query": true, "remove": true }, + "paths": { "positional": ["/section[N]/p[N]/run[N]"] }, + "properties": { + "text": { "type": "string", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop text=문장"], "enforcement": "strict" }, + "bold": { "type": "bool", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop bold=true"], "enforcement": "report" }, + "font": { "type": "string", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop font=함초롬바탕"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/section.json b/schemas/help/hwpx/section.json new file mode 100644 index 000000000..bb13ce17a --- /dev/null +++ b/schemas/help/hwpx/section.json @@ -0,0 +1,16 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "section", + "parent": "document", + "operations": { "add": true, "set": true, "get": true, "query": true, "remove": true }, + "paths": { "positional": ["/section[N]"] }, + "children": [ + { "element": "paragraph", "pathSegment": "p", "cardinality": "0..n" }, + { "element": "table", "pathSegment": "tbl", "cardinality": "0..n" } + ], + "properties": { + "pageWidth": { "type": "length", "set": true, "get": true, "query": true, "examples": ["--prop pageWidth=210mm"], "enforcement": "report" }, + "pageHeight": { "type": "length", "set": true, "get": true, "query": true, "examples": ["--prop pageHeight=297mm"], "enforcement": "report" } + } +} diff --git 
a/schemas/help/hwpx/shape.json b/schemas/help/hwpx/shape.json new file mode 100644 index 000000000..04decd019 --- /dev/null +++ b/schemas/help/hwpx/shape.json @@ -0,0 +1,12 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "shape", + "parent": "paragraph|section", + "operations": { "add": true, "set": true, "get": true, "query": true, "remove": true }, + "paths": { "positional": ["/section[N]/p[N]/shape[N]"] }, + "properties": { + "geometry": { "type": "string", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop geometry=rect"], "enforcement": "report" }, + "fill": { "type": "color", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop fill=#00AAFF"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/style.json b/schemas/help/hwpx/style.json new file mode 100644 index 000000000..58a6dc78e --- /dev/null +++ b/schemas/help/hwpx/style.json @@ -0,0 +1,12 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "style", + "parent": "document", + "operations": { "add": true, "set": true, "get": true, "query": true, "remove": true }, + "paths": { "stable": ["/styles/style[@id=ID]"], "positional": ["/styles/style[N]"] }, + "properties": { + "id": { "type": "string", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop id=Body"], "enforcement": "report" }, + "name": { "type": "string", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop name=본문"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/table-cell.json b/schemas/help/hwpx/table-cell.json new file mode 100644 index 000000000..580a6b45e --- /dev/null +++ b/schemas/help/hwpx/table-cell.json @@ -0,0 +1,15 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "table-cell", + "parent": "table-row", + "operations": { "add": false, "set": true, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/section[N]/tbl[N]/tr[N]/tc[N]"] }, 
+ "note": "HWPX custom handler table-cell path. rhwp-backed HWPX table-cell mutation remains blocked until round-trip and Hancom compatibility gates close.", + "properties": { + "text": { "type": "string", "set": true, "get": true, "query": true, "examples": ["--prop text=셀"], "enforcement": "strict" }, + "fill": { "type": "color", "set": true, "get": true, "query": true, "examples": ["--prop fill=#FFFF00"], "enforcement": "report" }, + "rowSpan": { "type": "number", "get": true, "query": true, "enforcement": "report" }, + "colSpan": { "type": "number", "get": true, "query": true, "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/table-fill.json b/schemas/help/hwpx/table-fill.json new file mode 100644 index 000000000..d5e40d31f --- /dev/null +++ b/schemas/help/hwpx/table-fill.json @@ -0,0 +1,12 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "table-fill", + "parent": "table|table-cell", + "operations": { "add": false, "set": true, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/table/fill"] }, + "properties": { + "fill": { "type": "color", "set": true, "get": true, "query": true, "examples": ["--prop fill=#EAF2FF"], "enforcement": "strict" }, + "scope": { "type": "enum", "values": ["table", "row", "cell"], "set": true, "get": true, "query": true, "examples": ["--prop scope=cell"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/table-row.json b/schemas/help/hwpx/table-row.json new file mode 100644 index 000000000..4ca00e6ce --- /dev/null +++ b/schemas/help/hwpx/table-row.json @@ -0,0 +1,14 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "table-row", + "parent": "table", + "operations": { "add": true, "set": true, "get": true, "query": true, "remove": true }, + "paths": { "positional": ["/section[N]/tbl[N]/tr[N]"] }, + "children": [ + { "element": "table-cell", "pathSegment": "tc", "cardinality": "1..n" } + ], + "properties": { + "height": { "type": "length", 
"set": true, "get": true, "query": true, "examples": ["--prop height=12mm"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/table.json b/schemas/help/hwpx/table.json new file mode 100644 index 000000000..27ccc6721 --- /dev/null +++ b/schemas/help/hwpx/table.json @@ -0,0 +1,16 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "table", + "parent": "section|paragraph", + "operations": { "add": true, "set": true, "get": true, "query": true, "remove": true }, + "paths": { "positional": ["/section[N]/tbl[N]", "/section[N]/p[N]/tbl[N]"] }, + "children": [ + { "element": "table-row", "pathSegment": "tr", "cardinality": "1..n" } + ], + "properties": { + "rows": { "type": "number", "add": true, "get": true, "query": true, "examples": ["--prop rows=3"], "enforcement": "strict" }, + "cols": { "type": "number", "add": true, "get": true, "query": true, "examples": ["--prop cols=4"], "enforcement": "strict" }, + "fill": { "type": "color", "set": true, "get": true, "query": true, "examples": ["--prop fill=#F2F2F2"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/text.json b/schemas/help/hwpx/text.json new file mode 100644 index 000000000..7d3e9d58f --- /dev/null +++ b/schemas/help/hwpx/text.json @@ -0,0 +1,13 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "text", + "parent": "run", + "operations": { "add": true, "set": true, "get": true, "query": true, "remove": true }, + "paths": { "positional": ["/section[N]/p[N]/run[N]/t[N]"] }, + "properties": { + "value": { "type": "string", "add": true, "set": true, "get": true, "query": true, "examples": ["--prop value=텍스트"], "enforcement": "strict" }, + "find": { "type": "string", "set": true, "examples": ["--prop find=행정안전부"], "enforcement": "report" }, + "replacement": { "type": "string", "set": true, "examples": ["--prop replacement=리지정부"], "enforcement": "report" } + } +} diff --git a/schemas/help/hwpx/validate.json b/schemas/help/hwpx/validate.json new file 
mode 100644 index 000000000..3424094fe --- /dev/null +++ b/schemas/help/hwpx/validate.json @@ -0,0 +1,14 @@ +{ + "$schema": "../_schema.json", + "format": "hwpx", + "element": "validate", + "container": true, + "operations": { "add": false, "set": false, "get": true, "query": true, "remove": false }, + "paths": { "positional": ["/validate"] }, + "note": "Planned HWPX package and semantic validation entry. Safe-save transactions must include ZIP, XML, manifest/header, BinData, and reference checks before any in-place write.", + "properties": { + "strict": { "type": "bool", "get": true, "query": true, "examples": ["--strict"], "enforcement": "report" }, + "ok": { "type": "bool", "get": true, "query": true, "enforcement": "strict" }, + "safeSaveTransaction": { "type": "object", "get": true, "query": true, "examples": ["save-transaction.v1"], "enforcement": "report" } + } +} diff --git a/schemas/interfaces/capability-result.v1.schema.json b/schemas/interfaces/capability-result.v1.schema.json new file mode 100644 index 000000000..3a95ca9b2 --- /dev/null +++ b/schemas/interfaces/capability-result.v1.schema.json @@ -0,0 +1,13 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "officecli/interfaces/capability-result/v1", + "title": "OfficeCLI Capability Result", + "type": "object", + "required": ["schemaVersion", "formats"], + "properties": { + "schemaVersion": { "type": "integer" }, + "formats": { "type": "object" }, + "warnings": { "type": "array", "items": { "type": "string" } } + }, + "additionalProperties": true +} diff --git a/schemas/interfaces/compatibility-corpus.v1.schema.json b/schemas/interfaces/compatibility-corpus.v1.schema.json new file mode 100644 index 000000000..fcd88f353 --- /dev/null +++ b/schemas/interfaces/compatibility-corpus.v1.schema.json @@ -0,0 +1,327 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "officecli/interfaces/compatibility-corpus/v1", + "title": "OfficeCLI HWP/HWPX Compatibility 
Corpus", + "type": "object", + "additionalProperties": false, + "required": [ + "schemaVersion", + "format", + "description", + "fixtures" + ], + "properties": { + "schemaVersion": { + "type": "integer", + "const": 1 + }, + "format": { + "type": "string", + "enum": [ + "hwp", + "hwpx" + ] + }, + "description": { + "type": "string", + "minLength": 1 + }, + "fixtures": { + "type": "array", + "minItems": 1, + "items": { + "$ref": "#/$defs/fixture" + } + }, + "fixtureClassCoverage": { + "type": "array", + "items": { + "$ref": "#/$defs/fixtureClassCoverage" + } + } + }, + "$defs": { + "fixture": { + "type": "object", + "additionalProperties": false, + "required": [ + "id", + "path", + "sha256", + "sizeBytes", + "classes", + "verifiedOperations", + "evidence", + "notes" + ], + "properties": { + "id": { + "type": "string", + "minLength": 1 + }, + "path": { + "type": "string", + "minLength": 1 + }, + "sha256": { + "type": "string", + "pattern": "^[0-9a-f]{64}$" + }, + "sizeBytes": { + "type": "integer", + "minimum": 1 + }, + "classes": { + "type": "array", + "minItems": 1, + "items": { + "type": "string", + "minLength": 1 + } + }, + "verifiedOperations": { + "type": "array", + "minItems": 1, + "items": { + "type": "string", + "enum": [ + "read_text", + "render_svg", + "render_png", + "export_pdf", + "export_markdown", + "thumbnail", + "document_info", + "diagnostics", + "dump_controls", + "dump_pages", + "list_fields", + "read_field", + "fill_field", + "insert_text", + "replace_text", + "read_table_cell", + "scan_cells", + "set_table_cell", + "create_blank", + "save_original", + "convert_to_editable", + "native_mutation", + "native_read", + "save_as_hwp", + "table_map" + ] + } + }, + "blockedOperations": { + "type": "array", + "items": { + "$ref": "#/$defs/blockedOperation" + } + }, + "evidence": { + "type": "array", + "items": { + "type": "string", + "minLength": 1 + } + }, + "notes": { + "type": "string", + "minLength": 1 + } + } + }, + "blockedOperation": { + "type": 
"object", + "additionalProperties": false, + "required": [ + "operation", + "reason", + "notes" + ], + "properties": { + "operation": { + "type": "string", + "enum": [ + "read_text", + "render_svg", + "list_fields", + "read_field", + "fill_field", + "replace_text", + "set_table_cell", + "create_blank", + "save_original", + "save_as_hwp", + "table_map" + ] + }, + "reason": { + "type": "string", + "enum": [ + "unsupported_format", + "unsupported_operation", + "unsupported_engine", + "roundtrip_unverified", + "bridge_not_enabled", + "bridge_missing", + "bridge_timeout", + "bridge_invalid_json", + "bridge_exit_nonzero", + "rhwp_runtime_missing", + "rhwp_api_missing", + "rhwp_api_missing_or_too_old", + "binary_hwp_mutation_forbidden", + "binary_hwp_write_forbidden", + "fixture_validation_failed", + "capability_schema_invalid" + ] + }, + "notes": { + "type": "string", + "minLength": 1 + } + } + }, + "fixtureClassCoverage": { + "type": "object", + "additionalProperties": false, + "required": [ + "class", + "state", + "notes" + ], + "properties": { + "class": { + "type": "string", + "enum": [ + "multi-section", + "merged-cell-tables", + "nested-tables", + "pictures-bindata", + "headers-footers", + "equations", + "unicode-edge-cases", + "malformed-hwpx-package" + ] + }, + "state": { + "type": "string", + "enum": [ + "verified", + "blocked", + "external-manual" + ] + }, + "fixtureId": { + "type": "string", + "minLength": 1 + }, + "externalLane": { + "type": "string", + "minLength": 1 + }, + "reason": { + "type": "string", + "enum": [ + "unsupported_format", + "unsupported_operation", + "unsupported_engine", + "roundtrip_unverified", + "bridge_not_enabled", + "bridge_missing", + "bridge_timeout", + "bridge_invalid_json", + "bridge_exit_nonzero", + "rhwp_runtime_missing", + "rhwp_api_missing", + "rhwp_api_missing_or_too_old", + "binary_hwp_mutation_forbidden", + "binary_hwp_write_forbidden", + "fixture_validation_failed", + "capability_schema_invalid" + ] + }, + 
"verifiedOperations": { + "type": "array", + "items": { + "type": "string", + "enum": [ + "read_text", + "render_svg", + "list_fields", + "read_field", + "fill_field", + "replace_text", + "set_table_cell", + "create_blank", + "save_original", + "save_as_hwp", + "table_map" + ] + } + }, + "notes": { + "type": "string", + "minLength": 1 + } + }, + "allOf": [ + { + "if": { + "properties": { + "state": { + "const": "verified" + } + } + }, + "then": { + "required": [ + "fixtureId", + "verifiedOperations" + ] + } + }, + { + "if": { + "properties": { + "state": { + "const": "blocked" + } + } + }, + "then": { + "required": [ + "reason" + ], + "not": { + "required": [ + "verifiedOperations" + ] + } + } + }, + { + "if": { + "properties": { + "state": { + "const": "external-manual" + } + } + }, + "then": { + "required": [ + "externalLane" + ], + "not": { + "required": [ + "verifiedOperations" + ] + } + } + } + ] + } + } +} diff --git a/schemas/interfaces/diff-result.v1.schema.json b/schemas/interfaces/diff-result.v1.schema.json new file mode 100644 index 000000000..d876fad28 --- /dev/null +++ b/schemas/interfaces/diff-result.v1.schema.json @@ -0,0 +1,16 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "officecli/interfaces/diff-result/v1", + "title": "OfficeCLI Diff Result", + "type": "object", + "required": ["format", "before", "after", "changes"], + "properties": { + "format": { "type": "string" }, + "before": { "type": "string" }, + "after": { "type": "string" }, + "changes": { "type": "array" }, + "summary": { "type": "object" }, + "warnings": { "type": "array", "items": { "type": "string" } } + }, + "additionalProperties": true +} diff --git a/schemas/interfaces/edit-result.v1.schema.json b/schemas/interfaces/edit-result.v1.schema.json new file mode 100644 index 000000000..f6120b721 --- /dev/null +++ b/schemas/interfaces/edit-result.v1.schema.json @@ -0,0 +1,18 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": 
"officecli/interfaces/edit-result/v1", + "title": "OfficeCLI Edit Result", + "type": "object", + "required": ["format", "operation", "outputPath", "engine", "warnings"], + "properties": { + "format": { "type": "string" }, + "operation": { "type": "string" }, + "outputPath": { "type": "string" }, + "engine": { "type": "string" }, + "engineVersion": { "type": ["string", "null"] }, + "changedRanges": { "type": "array" }, + "validation": { "type": "object" }, + "warnings": { "type": "array", "items": { "type": "string" } } + }, + "additionalProperties": true +} diff --git a/schemas/interfaces/error-result.v1.schema.json b/schemas/interfaces/error-result.v1.schema.json new file mode 100644 index 000000000..591bc07bf --- /dev/null +++ b/schemas/interfaces/error-result.v1.schema.json @@ -0,0 +1,17 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "officecli/interfaces/error-result/v1", + "title": "OfficeCLI Error Result", + "type": "object", + "required": ["code", "message"], + "properties": { + "code": { "type": "string" }, + "message": { "type": "string" }, + "format": { "type": "string" }, + "operation": { "type": "string" }, + "engine": { "type": "string" }, + "nextCommand": { "type": "string" }, + "details": { "type": "object" } + }, + "additionalProperties": true +} diff --git a/schemas/interfaces/expected-capabilities.v1.schema.json b/schemas/interfaces/expected-capabilities.v1.schema.json new file mode 100644 index 000000000..2a5378f8a --- /dev/null +++ b/schemas/interfaces/expected-capabilities.v1.schema.json @@ -0,0 +1,158 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "officecli/interfaces/expected-capabilities/v1", + "title": "OfficeCLI HWP/HWPX Expected Capabilities", + "type": "object", + "additionalProperties": false, + "required": [ + "schemaVersion", + "description", + "formats" + ], + "properties": { + "schemaVersion": { + "type": "integer", + "const": 1 + }, + "description": { + "type": "string", + 
"minLength": 1 + }, + "formats": { + "type": "object", + "additionalProperties": false, + "required": [ + "hwp", + "hwpx" + ], + "properties": { + "hwp": { + "$ref": "#/$defs/formatCapabilities" + }, + "hwpx": { + "$ref": "#/$defs/formatCapabilities" + } + } + } + }, + "$defs": { + "formatCapabilities": { + "type": "object", + "additionalProperties": false, + "required": [ + "defaultEngine", + "operations" + ], + "properties": { + "defaultEngine": { + "type": "string", + "enum": [ + "custom", + "rhwp-bridge", + "none" + ] + }, + "operations": { + "type": "object", + "propertyNames": { + "enum": [ + "read_text", + "render_svg", + "render_png", + "export_pdf", + "export_markdown", + "thumbnail", + "document_info", + "diagnostics", + "dump_controls", + "dump_pages", + "list_fields", + "read_field", + "fill_field", + "insert_text", + "replace_text", + "read_table_cell", + "scan_cells", + "set_table_cell", + "create_blank", + "save_original", + "convert_to_editable", + "native_mutation", + "native_read", + "save_as_hwp" + ] + }, + "additionalProperties": { + "$ref": "#/$defs/operationCapability" + }, + "minProperties": 1 + } + } + }, + "operationCapability": { + "type": "object", + "additionalProperties": false, + "required": [ + "status", + "evidence" + ], + "properties": { + "status": { + "type": "string", + "enum": [ + "experimental", + "roundtrip-verified", + "unsupported" + ] + }, + "reason": { + "type": "string", + "enum": [ + "unsupported_format", + "unsupported_operation", + "unsupported_engine", + "roundtrip_unverified", + "bridge_not_enabled", + "bridge_missing", + "bridge_timeout", + "bridge_invalid_json", + "bridge_exit_nonzero", + "rhwp_runtime_missing", + "rhwp_api_missing", + "rhwp_api_missing_or_too_old", + "binary_hwp_mutation_forbidden", + "binary_hwp_write_forbidden", + "fixture_validation_failed", + "capability_schema_invalid" + ] + }, + "evidence": { + "type": "array", + "items": { + "type": "string", + "minLength": 1 + } + } + }, + "allOf": [ + { + 
"if": { + "properties": { + "status": { + "const": "unsupported" + } + }, + "required": [ + "status" + ] + }, + "then": { + "required": [ + "reason" + ] + } + } + ] + } + } +} diff --git a/schemas/interfaces/rhwp-provider-capabilities.v1.schema.json b/schemas/interfaces/rhwp-provider-capabilities.v1.schema.json new file mode 100644 index 000000000..91090733e --- /dev/null +++ b/schemas/interfaces/rhwp-provider-capabilities.v1.schema.json @@ -0,0 +1,17 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "officecli/interfaces/rhwp-provider-capabilities/v1", + "title": "OfficeCLI rhwp Provider Capabilities", + "type": "object", + "required": ["schemaVersion", "provider", "formats", "operations"], + "properties": { + "schemaVersion": { "type": "integer", "const": 1 }, + "provider": { "type": "string", "const": "rhwp" }, + "providerVersion": { "type": "string" }, + "rhwpVersion": { "type": "string" }, + "formats": { "type": "array", "items": { "type": "string" } }, + "operations": { "type": "object" }, + "warnings": { "type": "array", "items": { "type": "string" } } + }, + "additionalProperties": true +} diff --git a/schemas/interfaces/rhwp-sidecar-request.v1.schema.json b/schemas/interfaces/rhwp-sidecar-request.v1.schema.json new file mode 100644 index 000000000..b254588d4 --- /dev/null +++ b/schemas/interfaces/rhwp-sidecar-request.v1.schema.json @@ -0,0 +1,90 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "officecli/interfaces/rhwp-sidecar-request/v1", + "title": "OfficeCLI rhwp Sidecar Request", + "type": "object", + "required": [ + "schemaVersion", + "operation", + "format" + ], + "properties": { + "schemaVersion": { + "type": "integer", + "const": 1 + }, + "operation": { + "type": "string", + "enum": [ + "create_blank", + "read_text", + "render_svg", + "render_png", + "export_pdf", + "export_markdown", + "thumbnail", + "document_info", + "diagnostics", + "dump_controls", + "dump_pages", + "list_fields", + 
"read_field", + "fill_field", + "insert_text", + "replace_text", + "read_table_cell", + "scan_cells", + "table_map", + "set_table_cell", + "convert_to_editable", + "native_mutation", + "native_read", + "save_as_hwp" + ] + }, + "format": { + "type": "string", + "enum": [ + "hwp", + "hwpx" + ] + }, + "inputPath": { + "type": "string" + }, + "outputPath": { + "type": "string" + }, + "arguments": { + "type": "object" + }, + "json": { + "type": "boolean" + } + }, + "additionalProperties": false, + "allOf": [ + { + "if": { + "properties": { + "operation": { + "const": "create_blank" + } + }, + "required": [ + "operation" + ] + }, + "then": { + "required": [ + "outputPath" + ] + }, + "else": { + "required": [ + "inputPath" + ] + } + } + ] +} diff --git a/schemas/interfaces/rhwp-sidecar-response.v1.schema.json b/schemas/interfaces/rhwp-sidecar-response.v1.schema.json new file mode 100644 index 000000000..1f7b3e072 --- /dev/null +++ b/schemas/interfaces/rhwp-sidecar-response.v1.schema.json @@ -0,0 +1,20 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "officecli/interfaces/rhwp-sidecar-response/v1", + "title": "OfficeCLI rhwp Sidecar Response", + "type": "object", + "required": ["schemaVersion", "ok", "operation", "format", "engineVersion"], + "properties": { + "schemaVersion": { "type": "integer", "const": 1 }, + "ok": { "type": "boolean" }, + "operation": { "type": "string" }, + "format": { "type": "string", "enum": ["hwp", "hwpx"] }, + "engineVersion": { "type": "string" }, + "outputPath": { "type": "string" }, + "data": { "type": "object" }, + "validation": { "type": "object" }, + "warnings": { "type": "array", "items": { "type": "string" } }, + "error": { "$ref": "error-result.v1.schema.json" } + }, + "additionalProperties": true +} diff --git a/schemas/interfaces/save-policy.v1.schema.json b/schemas/interfaces/save-policy.v1.schema.json new file mode 100644 index 000000000..991716cb4 --- /dev/null +++ 
b/schemas/interfaces/save-policy.v1.schema.json @@ -0,0 +1,50 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "officecli/interfaces/save-policy/v1", + "title": "OfficeCLI Safe Save Policy", + "type": "object", + "required": [ + "schemaVersion", + "format", + "mode", + "outputRequired", + "backupRequired", + "transactionRequired", + "validationRequired" + ], + "properties": { + "schemaVersion": { "type": "integer", "const": 1 }, + "format": { "type": "string", "enum": ["hwp", "hwpx"] }, + "mode": { "type": "string", "enum": ["output", "in-place"] }, + "outputRequired": { "type": "boolean" }, + "backupRequired": { "type": "boolean" }, + "transactionRequired": { "type": "boolean" }, + "validationRequired": { "type": "boolean" }, + "atomicReplace": { "type": "boolean" }, + "sameDirectoryTemp": { "type": "boolean" }, + "preserveOriginalOnFailure": { "type": "boolean" }, + "allowedWithoutBackup": { "type": "boolean" }, + "requiredChecks": { + "type": "array", + "items": { + "type": "string", + "enum": [ + "temp-write", + "fsync-temp", + "provider-readback", + "alternate-provider-readback", + "semantic-delta", + "field-delta", + "table-delta", + "hwpx-package-integrity", + "svg-render", + "visual-diff", + "backup-created", + "atomic-replace" + ] + } + }, + "warnings": { "type": "array", "items": { "type": "string" } } + }, + "additionalProperties": false +} diff --git a/schemas/interfaces/save-transaction.v1.schema.json b/schemas/interfaces/save-transaction.v1.schema.json new file mode 100644 index 000000000..ad1610942 --- /dev/null +++ b/schemas/interfaces/save-transaction.v1.schema.json @@ -0,0 +1,52 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "officecli/interfaces/save-transaction/v1", + "title": "OfficeCLI Safe Save Transaction Result", + "type": "object", + "required": [ + "schemaVersion", + "ok", + "format", + "operation", + "mode", + "inputPath", + "outputPath", + "verified", + "checks", + "warnings" + ], 
+ "properties": { + "schemaVersion": { "type": "integer", "const": 1 }, + "ok": { "type": "boolean" }, + "format": { "type": "string", "enum": ["hwp", "hwpx"] }, + "operation": { "type": "string" }, + "mode": { "type": "string", "enum": ["output", "in-place"] }, + "inputPath": { "type": "string" }, + "outputPath": { "type": "string" }, + "tempPath": { "type": ["string", "null"] }, + "backupPath": { "type": ["string", "null"] }, + "manifestPath": { "type": ["string", "null"] }, + "verified": { "type": "boolean" }, + "checks": { + "type": "array", + "items": { + "type": "object", + "required": ["name", "ok"], + "properties": { + "name": { "type": "string" }, + "ok": { "type": "boolean" }, + "severity": { "type": "string", "enum": ["info", "warning", "error"] }, + "message": { "type": "string" }, + "details": { "type": "object" } + }, + "additionalProperties": true + } + }, + "semanticDelta": { "type": "object" }, + "visualDelta": { "type": "object" }, + "packageIntegrity": { "type": "object" }, + "warnings": { "type": "array", "items": { "type": "string" } }, + "error": { "$ref": "error-result.v1.schema.json" } + }, + "additionalProperties": true +} diff --git a/schemas/interfaces/validation-result.v1.schema.json b/schemas/interfaces/validation-result.v1.schema.json new file mode 100644 index 000000000..351ef508d --- /dev/null +++ b/schemas/interfaces/validation-result.v1.schema.json @@ -0,0 +1,14 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "officecli/interfaces/validation-result/v1", + "title": "OfficeCLI Validation Result", + "type": "object", + "required": ["ok", "checks"], + "properties": { + "ok": { "type": "boolean" }, + "checks": { "type": "array", "items": { "type": "object" } }, + "errors": { "type": "array", "items": { "type": "object" } }, + "warnings": { "type": "array", "items": { "type": "string" } } + }, + "additionalProperties": true +} diff --git a/scripts/build-rhwp-sidecars.sh b/scripts/build-rhwp-sidecars.sh new 
file mode 100755
index 000000000..504dc5e94
--- /dev/null
+++ b/scripts/build-rhwp-sidecars.sh
@@ -0,0 +1,129 @@
+#!/bin/bash
+set -euo pipefail
+
+if [ "$#" -lt 1 ]; then
+  echo "Usage: scripts/build-rhwp-sidecars.sh <out_dir> [rid] [Release|Debug]" >&2
+  exit 2
+fi
+
+OUT_DIR="$1"
+TARGET_RID="${2:-}"
+CONFIG="${3:-Release}"
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+ROOT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
+BRIDGE_PROJECT="$ROOT_DIR/src/rhwp-officecli-bridge/rhwp-officecli-bridge.csproj"
+API_MANIFEST="$ROOT_DIR/src/rhwp-field-bridge/Cargo.toml"
+
+detect_local_rid() {
+  local OS
+  local ARCH
+  local LIBC
+  OS=$(uname -s | tr '[:upper:]' '[:lower:]')
+  ARCH=$(uname -m)
+  LIBC="gnu"
+  if [ "$OS" = "linux" ]; then
+    if command -v ldd >/dev/null 2>&1 && ldd --version 2>&1 | grep -qi musl; then
+      LIBC="musl"
+    elif [ -f /etc/alpine-release ]; then
+      LIBC="musl"
+    fi
+  fi
+  case "$OS" in
+    darwin)
+      case "$ARCH" in
+        arm64) echo "osx-arm64" ;;
+        x86_64) echo "osx-x64" ;;
+      esac ;;
+    linux)
+      case "$ARCH" in
+        x86_64)
+          if [ "$LIBC" = "musl" ]; then echo "linux-musl-x64"; else echo "linux-x64"; fi ;;
+        aarch64|arm64)
+          if [ "$LIBC" = "musl" ]; then echo "linux-musl-arm64"; else echo "linux-arm64"; fi ;;
+      esac ;;
+  esac
+}
+
+LOCAL_RID="$(detect_local_rid)"
+if [ -z "$LOCAL_RID" ]; then
+  echo "Unsupported local platform for rhwp sidecars: $(uname -s) $(uname -m)" >&2
+  exit 1
+fi
+
+if [ -n "$TARGET_RID" ] && [ "$TARGET_RID" != "$LOCAL_RID" ]; then
+  echo "Skipping rhwp sidecars for $TARGET_RID; the local Rust sidecar build targets $LOCAL_RID."
+  exit 0
+fi
+
+mkdir -p "$OUT_DIR"
+
+BRIDGE_TMP="$(mktemp -d)"
+cleanup() {
+  rm -rf "$BRIDGE_TMP"
+}
+trap cleanup EXIT
+
+echo "Building rhwp-officecli-bridge ($LOCAL_RID)..."
+dotnet publish "$BRIDGE_PROJECT" \
+  -c "$CONFIG" \
+  -r "$LOCAL_RID" \
+  -o "$BRIDGE_TMP" \
+  --self-contained true \
+  -p:PublishSingleFile=true \
+  -p:PublishTrimmed=false \
+  --nologo -v quiet
+
+BRIDGE_OUT=""
+for candidate in \
+  "$BRIDGE_TMP/rhwp-officecli-bridge" \
+  "$BRIDGE_TMP/rhwp-officecli-bridge.exe" \
+  "$BRIDGE_TMP/rhwp-officecli-bridge.dll"
+do
+  if [ -f "$candidate" ]; then
+    BRIDGE_OUT="$candidate"
+    break
+  fi
+done
+
+if [ -z "$BRIDGE_OUT" ]; then
+  echo "rhwp-officecli-bridge publish completed but no bridge executable was found." >&2
+  exit 1
+fi
+
+cp "$BRIDGE_OUT" "$OUT_DIR/$(basename "$BRIDGE_OUT")"
+chmod +x "$OUT_DIR/$(basename "$BRIDGE_OUT")" 2>/dev/null || true
+
+echo "Building rhwp-field-bridge ($LOCAL_RID)..."
+CONFIG_LOWER="$(printf '%s' "$CONFIG" | tr '[:upper:]' '[:lower:]')"
+API_FEATURES="${OFFICECLI_RHWP_API_FEATURES:-native-skia}"
+if [ "$API_FEATURES" = "none" ]; then
+  API_FEATURES=""
+fi
+FEATURE_ARGS=()  # guarded expansion below: an empty array trips set -u on bash < 4.4
+if [ -n "$API_FEATURES" ]; then
+  FEATURE_ARGS=(--features "$API_FEATURES")
+fi
+if [ "$CONFIG_LOWER" = "release" ]; then
+  cargo build --manifest-path "$API_MANIFEST" --release ${FEATURE_ARGS[@]+"${FEATURE_ARGS[@]}"}
+  API_BIN="$ROOT_DIR/src/rhwp-field-bridge/target/release/rhwp-field-bridge"
+else
+  cargo build --manifest-path "$API_MANIFEST" ${FEATURE_ARGS[@]+"${FEATURE_ARGS[@]}"}
+  API_BIN="$ROOT_DIR/src/rhwp-field-bridge/target/debug/rhwp-field-bridge"
+fi
+
+if [ ! -f "$API_BIN" ]; then
+  echo "rhwp-field-bridge build completed but no executable was found at $API_BIN."
>&2 + exit 1 +fi + +cp "$API_BIN" "$OUT_DIR/rhwp-field-bridge" +chmod +x "$OUT_DIR/rhwp-field-bridge" + +if [ "$(uname -s)" = "Darwin" ]; then + xattr -d com.apple.quarantine "$OUT_DIR/rhwp-officecli-bridge" 2>/dev/null || true + xattr -d com.apple.quarantine "$OUT_DIR/rhwp-field-bridge" 2>/dev/null || true + codesign -s - -f "$OUT_DIR/rhwp-officecli-bridge" 2>/dev/null || true + codesign -s - -f "$OUT_DIR/rhwp-field-bridge" 2>/dev/null || true +fi + +echo "rhwp sidecars copied to $OUT_DIR" diff --git a/scripts/hwpx_form_edit.py b/scripts/hwpx_form_edit.py new file mode 100644 index 000000000..37e5535dd --- /dev/null +++ b/scripts/hwpx_form_edit.py @@ -0,0 +1,1664 @@ +#!/usr/bin/env python3 +"""HWPX Korean Document Pattern Matching and Editing Prototype. + +Classifies HWPX documents (exam, regulation, form, report, mixed) and +provides extraction/editing utilities for Korean government forms, exam +papers, regulations, and application documents. + +Usage: + python hwpx_form_edit.py classify doc.hwpx + python hwpx_form_edit.py hierarchy doc.hwpx + python hwpx_form_edit.py appendix doc.hwpx + python hwpx_form_edit.py strip-lineseg doc.hwpx output.hwpx + python hwpx_form_edit.py extract doc.hwpx + python hwpx_form_edit.py digit-headings doc.hwpx + python hwpx_form_edit.py pages doc.hwpx + python hwpx_form_edit.py problems doc.hwpx + python hwpx_form_edit.py incell doc.hwpx + python hwpx_form_edit.py markers doc.hwpx + python hwpx_form_edit.py headers-footers doc.hwpx + python hwpx_form_edit.py fill doc.hwpx output.hwpx '성명=홍길동,주소=서울' +""" + +from __future__ import annotations + +import argparse +import json +import os +import re +import shutil +import sys +import tempfile +import zipfile +from typing import Any + +# Security: Use defusedxml if available (XXE defense) +try: + import defusedxml.ElementTree as ET +except ImportError: + import xml.etree.ElementTree as ET + import warnings + warnings.warn( + "defusedxml not available - using stdlib ElementTree. 
" + "Install defusedxml for enhanced security: pip install defusedxml", + ImportWarning, + stacklevel=2, + ) + +# --------------------------------------------------------------------------- +# HWPX Namespaces +# --------------------------------------------------------------------------- + +NS = { + "hp": "urn:hancom:hwpml:2011:paragraph", + "hs": "urn:hancom:hwpml:2011:section", + "hh": "urn:hancom:hwpml:2011:head", +} + +# Fallback namespace variants (some documents use http:// or 2016 URIs) +NS_ALT = { + "hp": "http://www.hancom.co.kr/hwpml/2011/paragraph", + "hs": "http://www.hancom.co.kr/hwpml/2011/section", + "hh": "http://www.hancom.co.kr/hwpml/2011/head", +} + +# --------------------------------------------------------------------------- +# Compiled Regex Patterns (R1-R25) +# --------------------------------------------------------------------------- + +# -- Tier 1: Structure Detection -- + +# R1: Chapter/section heading (제1장 총칙, 제2절 ...) +R1_CHAPTER_HEADING = re.compile(r"^제\s*(\d+)\s*[장절편관]\s*(.+)") + +# R2: Article (제1조(목적), 제3조의2(특례)) +R2_ARTICLE = re.compile( + r"^제\s*(\d+)\s*조(?:\s*의\s*(\d+))?\s*[((]\s*(.+?)\s*[))]" +) + +# R3: Circled number item (① 항목 ...) +R3_CIRCLED_NUMBER = re.compile( + r"^\s*[①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳]\s*(.+)" +) + +# R4: Numbered list (1. 항목) +R4_NUMBERED_LIST = re.compile(r"^\s*(\d{1,2})\.\s+(.+)") + +# R5: Korean letter list (가. 
항목) +R5_KOREAN_LETTER = re.compile( + r"^\s*[가나다라마바사아자차카타파하]\.\s*(.+)" +) + +# -- Tier 2: Form Patterns -- + +# R6: Checkbox flat (□ 항목, ■ 항목) +R6_CHECKBOX_FLAT = re.compile(r"^\s*[□■☐☑]\s*(.+)") + +# R7: Inline checkbox group (구분: □ A □ B □ C) +R7_CHECKBOX_GROUP = re.compile( + r"^(.+?)[\s::]\s*[□■]\s*(.+?)(?:\s*[□■]\s*(.+?))*" +) + +# R8: Appendix reference ([별첨 제1호], [별지], [별표 2]) +R8_APPENDIX_REF = re.compile(r"\[별[첨지표]\s*(?:제?\s*(\d+)\s*호?)?\]") + +# R9: Digit-concatenated heading (3지원금 집행기준) +R9_DIGIT_HEADING = re.compile(r"^(\d{1,2})([가-힣])") + +# R10: Label-colon-value (성명: 홍길동) +R10_LABEL_COLON_VALUE = re.compile(r"([가-힣]{2,6})\s*[::]\s*(.+)") + +# -- Tier 3: Content Patterns -- + +# R11: Date (2024.03.15, 2024-3-15, 2024년 3월 15일) +R11_DATE = re.compile( + r"\d{4}[.\-/년]\s*\d{1,2}[.\-/월]\s*\d{1,2}[일]?" +) + +# R12: Currency amount (1,000,000 원) +R12_CURRENCY = re.compile(r"[\d,]+\s*원") + +# R13: Phone number (02-1234-5678, 010-1234-5678) +R13_PHONE = re.compile(r"\d{2,3}-\d{3,4}-\d{4}") + +# R14: Resident registration number (880101-1234567) +R14_RRN = re.compile(r"\d{6}-[1-4]\d{6}") + +# R15: Checkbox hierarchy markers (□=0, ○=1, -=2, *=3) +R15_CHECKBOX_HIERARCHY = re.compile(r"^([□○●◎\-\*])\s*(.+)") + +# R16: Appendix ref (same as R8, kept as alias for Tier 3 grouping) +R16_APPENDIX_REF = R8_APPENDIX_REF + +# R17: Digit heading (same as R9, kept as alias for Tier 3 grouping) +R17_DIGIT_HEADING = R9_DIGIT_HEADING + +# -- Tier 3: Shared Utilities -- + +# R18: Whitespace collapse +R18_WHITESPACE = re.compile(r"\s+") + +# R19: Trailing colon strip +R19_TRAILING_COLON = re.compile(r"[::]\s*$") + +# R20: Short Korean label heuristic (2-8 chars, Korean+spaces+parens) +R20_SHORT_KOREAN_LABEL = re.compile(r"^[\uAC00-\uD7A3\s()·]{2,8}$") + +# R21: Checkbox prefix strip +R21_CHECKBOX_PREFIX = re.compile(r"^[□○●◎\-\*]\s*") + +# R22: Chapter/section number extract +R22_CHAPTER_NUM = re.compile(r"제(\d+)[장절편]\s") + +# R23: Article number extract +R23_ARTICLE_NUM = 
re.compile(r"제(\d+)조") + +# R24: Parenthesized text extract +R24_PAREN_TEXT = re.compile(r"\((.+?)\)") + +# R25: Leading number strip +R25_LEADING_NUMBER = re.compile(r"^\d{1,2}[.)]?\s*") + +# -- Phase C: New Patterns (R26-R41) -- + +# R26: Problem/question number (KICE exam style) +R26_PROBLEM_NUMBER = re.compile( + r"^\s*(?:" + r"(\d{1,2})\s*[..]" # "1." or "1." + r"|(\d{1,2})\s*번" # "1번" + r"|\[(\d{1,2})\]" # "[1]" + r"|\((\d{1,2})\)" # "(1)" + r")" + r"\s*(.*)", + re.DOTALL, +) + +# R27-R30: Table context patterns (in-cell detection) +R27_CELL_LABEL = re.compile(r"^([가-힣]{2,6})\s*$") # Short Korean label only +R28_CELL_VALUE = re.compile(r"^[^\uAC00-\uD7A3\s]+$") # Non-Korean value only +R29_CELL_MIXED = re.compile(r"([가-힣]{2,6})\s*[::]\s*(.+)") # Label: value +R30_CELL_CHECKBOX = re.compile(r"^[□■☐☑]") # Cell starts with checkbox + +# R31-R33: Page number patterns (footer detection) +R31_PAGE_NUM_DASH = re.compile(r"^\s*-\s*\d+\s*-\s*$") # "- 5 -" +R32_PAGE_NUM_PLAIN = re.compile(r"^\s*\d+\s*$") # "5" +R33_PAGE_NUM_OF = re.compile(r"^\s*\d+\s*/\s*\d+\s*$") # "5 / 10" + +# R34-R36: Header/footer spatial markers +R34_HEADER_MARKER = re.compile(r"^(?:제목|머리말|Header)") # Header keywords +R35_FOOTER_MARKER = re.compile(r"^(?:바닥글|Footer|페이지)") # Footer keywords +R36_SHORT_LINE = re.compile(r"^.{1,15}$") # Short line (header/footer candidate) + +# R37-R40: Korean punctuation markers (kordoc P13) +R37_KR_COMMA = re.compile(r"[,,、]") # Korean/CJK comma variants +R38_KR_PERIOD = re.compile(r"[.。.]") # Korean/CJK period variants +R39_KR_SPACE_COMMA = re.compile(r"\s+,") # Space before comma (error) +R40_KR_DOUBLE_SPACE = re.compile(r" +") # Multiple spaces + +# R41: Merge-line heuristic (cross-script boundary) +R41_CROSS_SCRIPT = re.compile( + r"([\uAC00-\uD7A3])\s*$" # Korean char at end + r"|\s*([\uAC00-\uD7A3])" # or Korean char at start +) + +# --------------------------------------------------------------------------- +# Label keywords for form field detection +# 
--------------------------------------------------------------------------- + +LABEL_KEYWORDS: set[str] = { + # Personal info + "성명", "이름", "주소", "전화", "전화번호", "휴대폰", "연락처", "핸드폰", + "생년월일", "주민등록번호", "소속", "직위", "직급", "부서", + "이메일", "학교", "학년", "반", "번호", "학번", "학적", "학과", + "캠퍼스", "대학", "단과대학", + # Application-related + "신청인", "대표자", "담당자", "작성자", "확인자", "승인자", + "일시", "날짜", "기간", "장소", "목적", "사유", "비고", + # Amount/quantity + "금액", "수량", "단가", "합계", "계", "소계", + # Form-specific + "동아리명", "사업분야", "참가구분", "접수", "인원수", "아이템", + "사업명", "기관명", "단체명", "프로젝트명", + # Regulation-specific + "비목", "항목해설", "증빙", "집행", "비용항목", "지출", + "결제일", "결제금액", "카드번호", "승인번호", "사용처", + "구분", "내용", "지도교수", "검수자", "검수일", +} + +KR_CHAR_RE = re.compile(r"^[\uAC00-\uD7AF\u3131-\u318E]$") + +# --------------------------------------------------------------------------- +# XML / ZIP Helpers +# --------------------------------------------------------------------------- + + +def local_tag(el: ET.Element) -> str: + """Return the local name of an element, ignoring namespace.""" + tag = el.tag + return tag.split("}")[-1] if "}" in tag else tag + + +def has_tag(parent: ET.Element, tag_name: str) -> bool: + """Check if any descendant has the given local tag name.""" + return any(local_tag(child) == tag_name for child in parent.iter()) + + +def collect_text(el: ET.Element) -> str: + """Concatenate all text nodes under an element.""" + parts: list[str] = [] + for child in el.iter(): + if local_tag(child) == "t" and child.text: + parts.append(child.text) + return "".join(parts) + + +def find_all_paragraphs(root: ET.Element) -> list[ET.Element]: + """Return all paragraph
elements regardless of namespace.""" + return [ + el + for el in root.iter() + if el.tag.endswith("}p") and "paragraph" in el.tag + ] + + +def _list_section_files(zf: zipfile.ZipFile) -> list[str]: + """List section XML files inside a HWPX zip (Contents/section0.xml, etc.).""" + sections: list[str] = [] + for name in sorted(zf.namelist()): + if name.startswith("Contents/section") and name.endswith(".xml"): + sections.append(name) + return sections + + +def _parse_section(zf: zipfile.ZipFile, section_path: str) -> ET.Element: + """Parse a section XML file from a HWPX zip into an ElementTree root. + + Args: + zf: Open ZipFile object. + section_path: Path to section XML within the ZIP. + + Returns: + Parsed XML root element. + + Raises: + ValueError: If section_path contains suspicious path components (ZipSlip defense). + """ + # ZipSlip defense + _validate_zip_path(section_path) + + with zf.open(section_path) as f: + return ET.fromstring(f.read()) + + +def _validate_zip_path(path: str) -> None: + """Validate ZIP entry path for ZipSlip attacks. + + Args: + path: ZIP entry filename to validate. + + Raises: + ValueError: If path contains '..', absolute paths, or null bytes. + """ + if ".." in path.split("/"): + raise ValueError(f"ZipSlip attack detected: path contains '..': {path}") + if os.path.isabs(path): + raise ValueError(f"ZipSlip attack detected: absolute path: {path}") + if "\x00" in path: + raise ValueError(f"ZipSlip attack detected: null byte in path: {path}") + + +def normalize_uniform_spaces(text: str) -> str: + """Normalize uniformly-distributed single-character Korean tokens. + + Korean form software often inserts spaces between every character + for visual alignment (e.g. "학 번" for "학번"). This collapses those + back when 70%+ of space-separated tokens are single Korean characters + and total length <= 30. 
+ """ + if len(text) > 30 or " " not in text: + return text + tokens = text.split(" ") + if len(tokens) < 2: + return text + kr_single = sum(1 for t in tokens if len(t) == 1 and KR_CHAR_RE.match(t)) + # For 2-token case: both must be single Korean chars (e.g. "학 번") + # For 3+ tokens: 70% threshold applies (e.g. "소 속 대 학") + if len(tokens) == 2: + if kr_single == 2: + return "".join(tokens) + elif kr_single / len(tokens) >= 0.7: + return "".join(tokens) + return text + + +# F7: Phone spacing normalization +R_PHONE_SPACED = re.compile( + r"(? str: + """Collapse uniform-spaced phone numbers. + + Examples: + "45 0 -7 3 40" -> "450-7340" + "0 1 0 -1 2 3 4 -5 6 7 8" -> "010-1234-5678" + """ + def _collapse(m: re.Match) -> str: + return "-".join( + m.group(i).replace(" ", "") for i in (1, 2, 3) + ) + + return R_PHONE_SPACED.sub(_collapse, text) + + +# --------------------------------------------------------------------------- +# Core: extract_paragraphs +# --------------------------------------------------------------------------- + + +def extract_paragraphs(hwpx_path: str) -> list[str]: + """Extract all paragraph texts from all HWPX sections. + + Opens the HWPX file as a ZIP, iterates over all section XML files, + finds paragraph elements, and collects their text content. + + Args: + hwpx_path: Path to the .hwpx file. + + Returns: + List of paragraph text strings (may include empty strings for + blank paragraphs). 
+ """ + texts: list[str] = [] + with zipfile.ZipFile(hwpx_path, "r") as zf: + for section_path in _list_section_files(zf): + root = _parse_section(zf, section_path) + for p in find_all_paragraphs(root): + texts.append(normalize_phone_spacing(collect_text(p))) + return texts + + +# --------------------------------------------------------------------------- +# Core: classify_document +# --------------------------------------------------------------------------- + + +def classify_document(hwpx_path: str) -> tuple[str, dict[str, Any]]: + """Classify an HWPX document into one of 5 types based on content analysis. + + Types: + exam - equations > 3 AND rect shapes > 5 (KICE-style exam papers) + regulation - circle_bullets > 10 AND (appendix_refs > 0 OR + article_refs > 3) AND tables > 10 + form - tables > 0 AND (checkboxes > 0 OR label_keywords > 3) + report - paragraphs > 50 AND tables < 3 + mixed - default fallback + + Args: + hwpx_path: Path to the .hwpx file. + + Returns: + Tuple of (document_type, stats_dict). 
+ """ + stats: dict[str, int] = { + "equations": 0, + "tables": 0, + "checkboxes": 0, + "circle_bullets": 0, + "rects": 0, + "appendix_refs": 0, + "article_refs": 0, + "total_paragraphs": 0, + "empty_paragraphs": 0, + "label_keywords_found": 0, + } + + with zipfile.ZipFile(hwpx_path, "r") as zf: + for section_path in _list_section_files(zf): + root = _parse_section(zf, section_path) + _accumulate_stats(root, stats) + + # Classification logic + if stats["equations"] > 3 and stats["rects"] > 5: + return "exam", stats + + is_regulation = ( + stats["circle_bullets"] > 10 + and (stats["appendix_refs"] > 0 or stats["article_refs"] > 3) + and stats["tables"] > 10 + ) + if is_regulation: + return "regulation", stats + + if stats["tables"] > 0 and ( + stats["checkboxes"] > 0 or stats["label_keywords_found"] > 3 + ): + return "form", stats + + non_empty = stats["total_paragraphs"] - stats["empty_paragraphs"] + if non_empty > 50 and stats["tables"] < 3: + return "report", stats + + return "mixed", stats + + +def _accumulate_stats(root: ET.Element, stats: dict[str, int]) -> None: + """Walk an XML root and accumulate document statistics.""" + all_paragraphs = find_all_paragraphs(root) + stats["total_paragraphs"] += len(all_paragraphs) + + for p in all_paragraphs: + text = collect_text(p).strip() + + if not text: + stats["empty_paragraphs"] += 1 + continue + + # Checkbox markers + if re.search(r"[□■☑☐]", text): + stats["checkboxes"] += 1 + + # Circle bullets + if text.startswith("○"): + stats["circle_bullets"] += 1 + + # Appendix references + if R8_APPENDIX_REF.search(text): + stats["appendix_refs"] += 1 + + # Article references (제N조/항/호) + if re.search(r"제\d+[조호항]", text): + stats["article_refs"] += 1 + + # Label keyword detection + normalized = normalize_uniform_spaces(text) + for kw in LABEL_KEYWORDS: + if kw in normalized: + stats["label_keywords_found"] += 1 + break # count at most once per paragraph + + # Count structural elements across the entire tree + for el in 
root.iter(): + tag = local_tag(el) + if tag == "equation" or tag == "script": + # Count only substantive equations (script text > 3 chars) + if tag == "script": + if el.text and len(el.text.strip()) > 3: + stats["equations"] += 1 + else: + stats["equations"] += 1 + elif tag == "tbl": + stats["tables"] += 1 + elif tag == "rect": + stats["rects"] += 1 + + +# --------------------------------------------------------------------------- +# Core: extract_checkbox_hierarchy +# --------------------------------------------------------------------------- + +DEPTH_MAP: dict[str, int] = { + "□": 0, + "○": 1, + "●": 1, + "◎": 1, + "-": 2, + "*": 3, +} + + +def extract_checkbox_hierarchy( + paragraphs: list[str], +) -> list[dict[str, Any]]: + """Extract 4-level checkbox hierarchy from paragraph texts. + + Hierarchy levels: + □ = heading (depth 0) + ○/●/◎ = item (depth 1) + - = detail (depth 2) + * = note (depth 3) + + Args: + paragraphs: List of paragraph text strings. + + Returns: + List of dicts with keys: depth, marker, text, paragraph_index, + children (always an empty list; caller may build tree from depth). + """ + items: list[dict[str, Any]] = [] + + for i, text in enumerate(paragraphs): + stripped = text.strip() + m = R15_CHECKBOX_HIERARCHY.match(stripped) + if m: + marker = m.group(1) + content = m.group(2).strip() + items.append( + { + "depth": DEPTH_MAP.get(marker, 0), + "marker": marker, + "text": content, + "paragraph_index": i, + "children": [], + } + ) + + return items + + +# --------------------------------------------------------------------------- +# Core: extract_appendix_refs +# --------------------------------------------------------------------------- + + +def extract_appendix_refs( + paragraphs: list[str], +) -> list[dict[str, Any]]: + """Extract appendix references ([별첨 제N호], [별지], [별표]) from paragraphs. + + Args: + paragraphs: List of paragraph text strings. + + Returns: + List of dicts with keys: ref, number (int or None), title, paragraph_index. 
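+
+    Example (illustrative):
+        refs = extract_appendix_refs(extract_paragraphs("rule.hwpx"))
+        # each entry looks like:
+        #   {"ref": "[별첨 제1호]", "number": 1, "title": "...", "paragraph_index": 12}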
+ """ + refs: list[dict[str, Any]] = [] + + for i, text in enumerate(paragraphs): + stripped = text.strip() + m = R8_APPENDIX_REF.search(stripped) + if m: + number = int(m.group(1)) if m.group(1) else None + title = stripped[m.end() :].strip() if m.end() < len(stripped) else "" + refs.append( + { + "ref": m.group(0), + "number": number, + "title": title[:60], + "paragraph_index": i, + } + ) + + return refs + + +# --------------------------------------------------------------------------- +# Core: detect_digit_headings +# --------------------------------------------------------------------------- + + +def detect_digit_headings( + paragraphs: list[str], +) -> list[dict[str, Any]]: + """Detect digit-concatenated headings (e.g. '3지원금 집행기준'). + + These are non-standard section numbering patterns found in Korean + regulations where a digit is directly concatenated to the title + without any space or punctuation. + + Args: + paragraphs: List of paragraph text strings. + + Returns: + List of dicts with keys: number, title, paragraph_index. + """ + headings: list[dict[str, Any]] = [] + + for i, text in enumerate(paragraphs): + stripped = text.strip() + m = R9_DIGIT_HEADING.match(stripped) + if m: + num = int(m.group(1)) + title = stripped[len(m.group(1)) :].strip() + if len(title) >= 3: # require at least 3 chars in title + headings.append( + { + "number": num, + "title": title, + "paragraph_index": i, + } + ) + + return headings + + +# --------------------------------------------------------------------------- +# Phase C: New Functions (C1-C9) +# --------------------------------------------------------------------------- + + +def parse_field_string(fields_str: str) -> dict[str, str]: + """Parse a comma-separated key=value field string with comma protection (C9). + + Splits on commas only when followed by a Korean/English key and '='. + This allows values to contain commas safely. 
+ + Examples: + "성명=홍길동,주소=서울시 강남구" -> {"성명": "홍길동", "주소": "서울시 강남구"} + "목적=연구, 개발,기간=1년" -> {"목적": "연구, 개발", "기간": "1년"} + + Args: + fields_str: Comma-separated "key=value" string. + + Returns: + Dict mapping field labels to values. + + Raises: + ValueError: If a pair lacks '=' separator. + """ + # Split only on comma followed by a key pattern (Korean or Latin + '=') + pairs = re.split(r",(?=[가-힣A-Za-z][가-힣A-Za-z\s]*=)", fields_str) + + result: dict[str, str] = {} + for pair in pairs: + pair = pair.strip() + if not pair: + continue + if "=" not in pair: + raise ValueError(f"Invalid field pair (missing '='): {pair!r}") + key, value = pair.split("=", 1) + result[key.strip()] = value.strip() + + return result + + +def find_page_boundaries(hwpx_path: str) -> list[dict[str, Any]]: + """Detect page/column boundaries from paragraph attributes (C1). + + HWPX paragraphs can carry pageBreak="1" or columnBreak="1" attributes + on their

paragraph (p) element or nested paraPr/lineseg nodes. This scans
+    all section files and returns an ordered list of boundaries.
+
+    Args:
+        hwpx_path: Path to the .hwpx file.
+
+    Returns:
+        List of dicts with keys:
+        - paragraph_index (int): global paragraph index
+        - break_type (str): "page" | "column" | "section"
+        - section (str): section filename (e.g. "Contents/section0.xml")
+    """
+    boundaries: list[dict[str, Any]] = []
+    global_idx = 0
+
+    with zipfile.ZipFile(hwpx_path, "r") as zf:
+        section_files = _list_section_files(zf)
+
+        for sec_idx, section_path in enumerate(section_files):
+            # Each new section file is implicitly a section break
+            if sec_idx > 0:
+                boundaries.append({
+                    "paragraph_index": global_idx,
+                    "break_type": "section",
+                    "section": section_path,
+                })
+
+            root = _parse_section(zf, section_path)
+            paragraphs = find_all_paragraphs(root)
+
+            for p in paragraphs:
+                # Check direct attributes on the p element

+ page_break = p.get("pageBreak", "0") + col_break = p.get("columnBreak", "0") + + # Also check namespace-prefixed attributes + for attr_name in list(p.attrib.keys()): + local = attr_name.split("}")[-1] if "}" in attr_name else attr_name + if local == "pageBreak" and p.get(attr_name) == "1": + page_break = "1" + elif local == "columnBreak" and p.get(attr_name) == "1": + col_break = "1" + + # Also scan child elements for break attributes + for child in p.iter(): + ltag = local_tag(child) + if ltag in ("paraShape", "paraPr", "lineseg"): + if child.get("pageBreak") == "1": + page_break = "1" + if child.get("columnBreak") == "1": + col_break = "1" + + if page_break == "1": + boundaries.append({ + "paragraph_index": global_idx, + "break_type": "page", + "section": section_path, + }) + elif col_break == "1": + boundaries.append({ + "paragraph_index": global_idx, + "break_type": "column", + "section": section_path, + }) + + global_idx += 1 + + return boundaries + + +def find_problem_starts(hwpx_path: str) -> list[dict[str, Any]]: + """Map problem/question numbers to paragraph indices (exam documents, C2). + + Scans all paragraphs for patterns like "1.", "1번", "[1]", "(1)" at the + start of text. Returns ordered list of problem starts with their paragraph + positions. + + Args: + hwpx_path: Path to the .hwpx file. 
+ + Returns: + List of dicts with keys: + - problem_number (int): detected problem number + - paragraph_index (int): global paragraph index + - preview (str): first 60 chars of problem text + - pattern (str): which pattern matched ("dot", "번", "bracket", "paren") + """ + problems: list[dict[str, Any]] = [] + paragraphs = extract_paragraphs(hwpx_path) + + for i, text in enumerate(paragraphs): + stripped = text.strip() + if not stripped: + continue + + m = R26_PROBLEM_NUMBER.match(stripped) + if not m: + continue + + # Determine which group matched + if m.group(1) is not None: + num, pattern = int(m.group(1)), "dot" + elif m.group(2) is not None: + num, pattern = int(m.group(2)), "번" + elif m.group(3) is not None: + num, pattern = int(m.group(3)), "bracket" + elif m.group(4) is not None: + num, pattern = int(m.group(4)), "paren" + else: + continue + + # Filter: problem numbers should be reasonable (1-50) + if num < 1 or num > 50: + continue + + # Extract preview text + rest = m.group(5) or "" + preview = rest[:60].strip() if rest else "(no text)" + + problems.append({ + "problem_number": num, + "paragraph_index": i, + "preview": preview, + "pattern": pattern, + }) + + return problems + + +def detect_incell_patterns(hwpx_path: str) -> dict[str, Any]: + """Detect in-cell form patterns (labels, values, checkboxes) in tables (C3). + + Analyzes table cell contents to identify common patterns: + - Short Korean labels (e.g., "성명", "주소") + - Value-only cells (numbers, dates) + - Label: value cells (e.g., "성명: 홍길동") + - Checkbox cells + + Args: + hwpx_path: Path to the .hwpx file. 
+ + Returns: + Dict with keys: + - label_cells (int): count of label-only cells + - value_cells (int): count of value-only cells + - mixed_cells (int): count of label:value cells + - checkbox_cells (int): count of checkbox cells + - total_cells (int): total cells analyzed + """ + stats = { + "label_cells": 0, + "value_cells": 0, + "mixed_cells": 0, + "checkbox_cells": 0, + "total_cells": 0, + } + + with zipfile.ZipFile(hwpx_path, "r") as zf: + section_files = _list_section_files(zf) + + for section_path in section_files: + root = _parse_section(zf, section_path) + + # Find all table cells (tbl > tc or similar) + for el in root.iter(): + ltag = local_tag(el) + if ltag == "tc": # table cell + stats["total_cells"] += 1 + cell_text = collect_text(el).strip() + + if not cell_text: + continue + + # Check patterns in order of specificity + if R30_CELL_CHECKBOX.match(cell_text): + stats["checkbox_cells"] += 1 + elif R29_CELL_MIXED.match(cell_text): + stats["mixed_cells"] += 1 + elif R27_CELL_LABEL.match(cell_text): + stats["label_cells"] += 1 + elif R28_CELL_VALUE.match(cell_text): + stats["value_cells"] += 1 + + return stats + + +def strip_page_numbers( + hwpx_path: str, + output_path: str, + patterns: list[str] | None = None, +) -> dict[str, Any]: + """Remove page number paragraphs from HWPX (C4). + + Detects and removes paragraphs that contain only page numbers in common + formats: "- 5 -", "5", "5 / 10". + + Args: + hwpx_path: Path to input .hwpx file. + output_path: Path for output .hwpx file. + patterns: List of pattern names to use ("dash", "plain", "of"). + If None, uses all patterns. 
+
+    Returns:
+        Dict with keys:
+        - removed_count (int): number of paragraphs removed
+        - patterns_used (list[str]): patterns that matched
+    """
+    if patterns is None:
+        patterns = ["dash", "plain", "of"]
+
+    pattern_map = {
+        "dash": R31_PAGE_NUM_DASH,
+        "plain": R32_PAGE_NUM_PLAIN,
+        "of": R33_PAGE_NUM_OF,
+    }
+
+    # Keep (name, regex) pairs together so an unknown pattern name
+    # cannot misalign names against regexes.
+    active_patterns = [(p, pattern_map[p]) for p in patterns if p in pattern_map]
+    removed_count = 0
+    matched_patterns: set[str] = set()
+
+    tmp_fd, tmp_path = tempfile.mkstemp(suffix=".hwpx")
+    os.close(tmp_fd)
+
+    try:
+        with zipfile.ZipFile(hwpx_path, "r") as zf_in:
+            with zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as zf_out:
+                for item in zf_in.infolist():
+                    _validate_zip_path(item.filename)
+                    data = zf_in.read(item.filename)
+
+                    is_section = (
+                        item.filename.endswith(".xml")
+                        and re.search(r"[Ss]ection\d+\.xml$", item.filename)
+                    )
+
+                    if is_section:
+                        root = ET.fromstring(data.decode("utf-8"))
+                        paragraphs = find_all_paragraphs(root)
+
+                        # Mark paragraphs for removal
+                        to_remove: list[ET.Element] = []
+                        for p in paragraphs:
+                            p_text = collect_text(p).strip()
+                            for pat_name, regex in active_patterns:
+                                if regex.match(p_text):
+                                    to_remove.append(p)
+                                    matched_patterns.add(pat_name)
+                                    break
+
+                        # Remove marked paragraphs
+                        for p in to_remove:
+                            parent = None
+                            for candidate in root.iter():
+                                if p in list(candidate):
+                                    parent = candidate
+                                    break
+                            if parent is not None:
+                                parent.remove(p)
+                                removed_count += 1
+
+                        # Serialize back, restoring the XML declaration if present
+                        xml_text = ET.tostring(root, encoding="unicode", xml_declaration=False)
+                        if data.decode("utf-8").startswith("<?xml"):
+                            xml_text = (
+                                '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
+                                + xml_text
+                            )
+                        data = xml_text.encode("utf-8")
+
+                    # Write entry
+                    if item.filename == "mimetype":
+                        zf_out.writestr(item, data, compress_type=zipfile.ZIP_STORED)
+                    else:
+                        zf_out.writestr(item, data)
+
+        shutil.move(tmp_path, output_path)
+    except Exception:
+        if os.path.exists(tmp_path):
+            os.unlink(tmp_path)
+        raise
+
+    return {
+        "removed_count": removed_count,
+        "patterns_used":
sorted(matched_patterns), + } + + +def detect_headers_footers(hwpx_path: str) -> dict[str, Any]: + """Detect header/footer paragraphs based on spatial markers (C5). + + Identifies paragraphs that likely represent headers or footers based on: + - Short length (<= 15 chars) + - Header/footer keywords + - Position in section (first/last paragraphs) + + Args: + hwpx_path: Path to the .hwpx file. + + Returns: + Dict with keys: + - headers (list[dict]): detected headers with paragraph_index, text + - footers (list[dict]): detected footers with paragraph_index, text + - total_candidates (int): total header/footer candidates found + """ + headers: list[dict[str, Any]] = [] + footers: list[dict[str, Any]] = [] + global_idx = 0 + + with zipfile.ZipFile(hwpx_path, "r") as zf: + section_files = _list_section_files(zf) + + for section_path in section_files: + root = _parse_section(zf, section_path) + paragraphs = find_all_paragraphs(root) + + for local_idx, p in enumerate(paragraphs): + p_text = collect_text(p).strip() + + # Short line heuristic + if not R36_SHORT_LINE.match(p_text): + global_idx += 1 + continue + + # Check for header markers + if R34_HEADER_MARKER.search(p_text) or local_idx < 2: + headers.append({ + "paragraph_index": global_idx, + "text": p_text, + "section": section_path, + }) + # Check for footer markers + elif R35_FOOTER_MARKER.search(p_text) or local_idx >= len(paragraphs) - 2: + footers.append({ + "paragraph_index": global_idx, + "text": p_text, + "section": section_path, + }) + + global_idx += 1 + + return { + "headers": headers, + "footers": footers, + "total_candidates": len(headers) + len(footers), + } + + +def detect_korean_markers(paragraphs: list[str]) -> dict[str, Any]: + """Detect Korean punctuation markers and spacing errors (C6). + + Analyzes text for: + - Korean comma variants (,,、) + - Korean period variants (.。.) + - Space before comma errors + - Multiple consecutive spaces + + Args: + paragraphs: List of paragraph text strings. 
+ + Returns: + Dict with keys: + - kr_commas (int): paragraphs with Korean commas + - kr_periods (int): paragraphs with Korean periods + - space_comma_errors (int): paragraphs with space before comma + - double_space_errors (int): paragraphs with multiple spaces + """ + stats = { + "kr_commas": 0, + "kr_periods": 0, + "space_comma_errors": 0, + "double_space_errors": 0, + } + + for text in paragraphs: + if R37_KR_COMMA.search(text): + stats["kr_commas"] += 1 + if R38_KR_PERIOD.search(text): + stats["kr_periods"] += 1 + if R39_KR_SPACE_COMMA.search(text): + stats["space_comma_errors"] += 1 + if R40_KR_DOUBLE_SPACE.search(text): + stats["double_space_errors"] += 1 + + return stats + + +def _is_marker_line(text: str) -> bool: + """Helper: Check if a line is likely a structural marker (C7 helper). + + Args: + text: Paragraph text. + + Returns: + True if the line appears to be a heading, label, or marker. + """ + stripped = text.strip() + if not stripped: + return False + + # Check for structural patterns + if R1_CHAPTER_HEADING.match(stripped): + return True + if R2_ARTICLE.match(stripped): + return True + if R3_CIRCLED_NUMBER.match(stripped): + return True + if R4_NUMBERED_LIST.match(stripped): + return True + if R5_KOREAN_LETTER.match(stripped): + return True + if R6_CHECKBOX_FLAT.match(stripped): + return True + + # Short all-caps or all-Korean lines + if len(stripped) <= 20 and (stripped.isupper() or R20_SHORT_KOREAN_LABEL.match(stripped)): + return True + + return False + + +def should_merge_lines(line1: str, line2: str) -> bool: + """Determine if two lines should be merged based on cross-script boundaries (C7). + + Korean text often gets split mid-sentence when mixed with English/numbers. + This detects cases where: + - Line 1 ends with Korean but no sentence-ending punctuation + - Line 2 starts with Korean or continues the sentence + - Neither line is a structural marker (heading, list item, etc.) + + Args: + line1: First line text. + line2: Second line text. 
+
+    Returns:
+        True if lines should be merged.
+    """
+    if not line1 or not line2:
+        return False
+
+    # Don't merge if either is a marker line
+    if _is_marker_line(line1) or _is_marker_line(line2):
+        return False
+
+    # Check for sentence-ending punctuation (ASCII and full-width)
+    if line1.rstrip().endswith((".", "。", "!", "?", ":", "：")):
+        return False
+
+    # Cross-script merge heuristic:
+    # if line1 ends with a Korean char and no punctuation, likely continuation
+    if R41_CROSS_SCRIPT.search(line1):
+        return True
+
+    # If line2 starts with a lowercase letter, likely continuation
+    if line2 and line2[0].islower():
+        return True
+
+    return False
+
+
+def merge_lines(paragraphs: list[str]) -> list[str]:
+    """Merge paragraphs that were incorrectly split mid-sentence (C7).
+
+    Uses cross-script boundary heuristics to identify and merge split lines.
+
+    Args:
+        paragraphs: List of paragraph text strings.
+
+    Returns:
+        New list of paragraphs with splits merged.
+    """
+    if not paragraphs:
+        return []
+
+    merged: list[str] = [paragraphs[0]]
+
+    for current in paragraphs[1:]:
+        if merged and should_merge_lines(merged[-1], current):
+            merged[-1] = merged[-1].rstrip() + " " + current.lstrip()
+        else:
+            merged.append(current)
+
+    return merged
+
+
+def fill_hwpx_preserve(
+    hwpx_path: str,
+    output_path: str,
+    fields: dict[str, str],
+) -> dict[str, Any]:
+    """Fill form fields in HWPX by direct XML surgery, preserving styles (C8).
+
+    Strategy:
+    1. Open HWPX as ZIP
+    2. Find section XML files matching /[Ss]ection\\d+\\.xml$/
+    3. For each field, find the paragraph containing the label
+    4. Replace text in the first t element of the first run
+       (preserving charPrIDRef for style continuity)
+    5. Clear remaining runs in that paragraph
+    6. Strip all linesegarray elements (force recalc)
+    7. Rewrite ZIP preserving original structure
+
+    Args:
+        hwpx_path: Path to input .hwpx file.
+        output_path: Path for output .hwpx file.
+        fields: Dict of {label: value} to fill.
+
+    Returns:
+        Dict with keys:
+        - filled (list[str]): labels that were successfully filled
+        - not_found (list[str]): labels not found in document
+        - lineseg_stripped (int): count of lineseg elements removed
+    """
+    filled: list[str] = []
+    not_found: list[str] = list(fields.keys())
+    total_lineseg = 0
+
+    tmp_fd, tmp_path = tempfile.mkstemp(suffix=".hwpx")
+    os.close(tmp_fd)
+
+    try:
+        with zipfile.ZipFile(hwpx_path, "r") as zf_in:
+            with zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as zf_out:
+                for item in zf_in.infolist():
+                    _validate_zip_path(item.filename)
+                    data = zf_in.read(item.filename)
+
+                    is_section = (
+                        item.filename.endswith(".xml")
+                        and re.search(r"[Ss]ection\d+\.xml$", item.filename)
+                    )
+
+                    if is_section:
+                        xml_text = data.decode("utf-8")
+
+                        # Parse and process
+                        root = ET.fromstring(xml_text)
+                        paragraphs = find_all_paragraphs(root)
+
+                        for p in paragraphs:
+                            p_text = collect_text(p).strip()
+                            normalized_p = normalize_uniform_spaces(p_text)
+
+                            for label, value in list(fields.items()):
+                                if label in not_found and label in normalized_p:
+                                    t_elements = [
+                                        el for el in p.iter()
+                                        if local_tag(el) == "t"
+                                    ]
+
+                                    if t_elements:
+                                        # Strategy: find "label: ___" and replace
+                                        # the value part, NOT the label itself
+                                        colon_pat = re.compile(
+                                            re.escape(label) + r"\s*[:：]\s*(.*)",
+                                            re.DOTALL,
+                                        )
+
+                                        replaced = False
+                                        for t_el in t_elements:
+                                            if t_el.text is None:
+                                                continue
+                                            m = colon_pat.search(normalize_uniform_spaces(t_el.text))
+                                            if m:
+                                                # Replace only the value portion after the label
+                                                t_el.text = colon_pat.sub(
+                                                    label + ": " + value,
+                                                    t_el.text,
+                                                    count=1,
+                                                )
+                                                replaced = True
+                                                break
+
+                                        if not replaced:
+                                            # Fallback: look for adjacent t elements
+                                            # after the label
+                                            label_idx = -1
+                                            for i, t_el in enumerate(t_elements):
+                                                if t_el.text and label in normalize_uniform_spaces(t_el.text):
+                                                    label_idx = i
+                                                    break
+
+                                            if label_idx >= 0 and label_idx + 1 < len(t_elements):
+                                                # Set the NEXT t element (value cell)
+
t_elements[label_idx + 1].text = value
+                                                replaced = True
+
+                                    if replaced:
+                                        filled.append(label)
+                                        not_found.remove(label)
+
+                        # Serialize back
+                        xml_text = ET.tostring(
+                            root, encoding="unicode", xml_declaration=False
+                        )
+
+                        # Add XML declaration if it was present
+                        if data.decode("utf-8").startswith("<?xml"):
+                            xml_text = (
+                                '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
+                                + xml_text
+                            )
+
+                        # Strip lineseg
+                        count_open = len(_LINESEG_OPEN.findall(xml_text))
+                        count_self = len(_LINESEG_SELF.findall(xml_text))
+                        xml_text = _LINESEG_OPEN.sub("", xml_text)
+                        xml_text = _LINESEG_SELF.sub("", xml_text)
+                        total_lineseg += count_open + count_self
+
+                        data = xml_text.encode("utf-8")
+
+                    # Write entry
+                    if item.filename == "mimetype":
+                        zf_out.writestr(item, data, compress_type=zipfile.ZIP_STORED)
+                    else:
+                        zf_out.writestr(item, data)
+
+        shutil.move(tmp_path, output_path)
+    except Exception:
+        if os.path.exists(tmp_path):
+            os.unlink(tmp_path)
+        raise
+
+    return {
+        "filled": filled,
+        "not_found": not_found,
+        "lineseg_stripped": total_lineseg,
+    }
+
+
+# ---------------------------------------------------------------------------
+# Core: strip_lineseg
+# ---------------------------------------------------------------------------
+
+_LINESEG_OPEN = re.compile(
+    r"<(?:\w+:)?linesegarray[^>]*>.*?</(?:\w+:)?linesegarray>",
+    re.DOTALL,
+)
+_LINESEG_SELF = re.compile(r"<(?:\w+:)?linesegarray[^/]*/>")
+
+
+def strip_lineseg(hwpx_path: str, output_path: str) -> int:
+    """Strip all linesegarray elements from an HWPX file.
+
+    Linesegarray elements store line-break position caches. Removing
+    them forces Hancom Office to recalculate line breaks on open, which
+    is required after content edits to avoid layout corruption.
+
+    Args:
+        hwpx_path: Path to the input .hwpx file.
+        output_path: Path for the output .hwpx file.
+
+    Returns:
+        Total count of linesegarray elements stripped.
+ """ + total_stripped = 0 + + # Work in a temporary file for atomic write + tmp_fd, tmp_path = tempfile.mkstemp(suffix=".hwpx") + os.close(tmp_fd) + + try: + with zipfile.ZipFile(hwpx_path, "r") as zf_in: + with zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as zf_out: + for item in zf_in.infolist(): + data = zf_in.read(item.filename) + + if item.filename.endswith(".xml") and item.filename.startswith( + "Contents/section" + ): + xml_text = data.decode("utf-8") + + # Count before stripping + count_open = len(_LINESEG_OPEN.findall(xml_text)) + count_self = len(_LINESEG_SELF.findall(xml_text)) + + # Strip + xml_text = _LINESEG_OPEN.sub("", xml_text) + xml_text = _LINESEG_SELF.sub("", xml_text) + + total_stripped += count_open + count_self + data = xml_text.encode("utf-8") + + # Preserve mimetype as STORED (first entry convention) + if item.filename == "mimetype": + zf_out.writestr(item, data, compress_type=zipfile.ZIP_STORED) + else: + zf_out.writestr(item, data) + + # Atomic move to final destination + shutil.move(tmp_path, output_path) + except Exception: + # Clean up temp file on failure + if os.path.exists(tmp_path): + os.unlink(tmp_path) + raise + + return total_stripped + + +# --------------------------------------------------------------------------- +# CLI +# --------------------------------------------------------------------------- + + +def _cmd_classify(args: argparse.Namespace) -> None: + """Handle the 'classify' subcommand.""" + doc_type, stats = classify_document(args.hwpx_path) + print(f"Document type: {doc_type}") + print(f"Statistics:") + for key, value in sorted(stats.items()): + print(f" {key}: {value}") + + +def _cmd_hierarchy(args: argparse.Namespace) -> None: + """Handle the 'hierarchy' subcommand.""" + paragraphs = extract_paragraphs(args.hwpx_path) + items = extract_checkbox_hierarchy(paragraphs) + + if not items: + print("No checkbox hierarchy found.") + return + + print(f"Checkbox hierarchy ({len(items)} items):") + for item in items: 
+ indent = " " * item["depth"] + print( + f" {indent}{item['marker']} [{item['depth']}] " + f"(p{item['paragraph_index']}): {item['text'][:70]}" + ) + + +def _cmd_appendix(args: argparse.Namespace) -> None: + """Handle the 'appendix' subcommand.""" + paragraphs = extract_paragraphs(args.hwpx_path) + refs = extract_appendix_refs(paragraphs) + + if not refs: + print("No appendix references found.") + return + + print(f"Appendix references ({len(refs)} found):") + for ref in refs: + num_str = f"#{ref['number']}" if ref["number"] is not None else "(unnumbered)" + print( + f" {ref['ref']} {num_str} " + f"(p{ref['paragraph_index']}): {ref['title']}" + ) + + +def _cmd_strip_lineseg(args: argparse.Namespace) -> None: + """Handle the 'strip-lineseg' subcommand.""" + count = strip_lineseg(args.hwpx_path, args.output_path) + print(f"Stripped {count} linesegarray element(s).") + print(f"Output: {args.output_path}") + + +def _cmd_extract(args: argparse.Namespace) -> None: + """Handle the 'extract' subcommand.""" + paragraphs = extract_paragraphs(args.hwpx_path) + + non_empty = [p for p in paragraphs if p.strip()] + print(f"Total paragraphs: {len(paragraphs)} ({len(non_empty)} non-empty)") + print("---") + for i, text in enumerate(paragraphs): + stripped = text.strip() + if stripped: + print(f"[{i:04d}] {stripped}") + + +def _cmd_digit_headings(args: argparse.Namespace) -> None: + """Handle the 'digit-headings' subcommand.""" + paragraphs = extract_paragraphs(args.hwpx_path) + headings = detect_digit_headings(paragraphs) + + if not headings: + print("No digit-concatenated headings found.") + return + + print(f"Digit headings ({len(headings)} found):") + for h in headings: + print(f" {h['number']}. 
{h['title']} (p{h['paragraph_index']})") + + +def _cmd_pages(args: argparse.Namespace) -> None: + """Handle the 'pages' subcommand (C1).""" + boundaries = find_page_boundaries(args.hwpx_path) + + if not boundaries: + print("No page/column/section boundaries found.") + return + + print(f"Boundaries ({len(boundaries)} found):") + for b in boundaries: + print( + f" [{b['break_type']:7s}] paragraph {b['paragraph_index']:4d} " + f"({b['section']})" + ) + + +def _cmd_problems(args: argparse.Namespace) -> None: + """Handle the 'problems' subcommand (C2).""" + problems = find_problem_starts(args.hwpx_path) + + if not problems: + print("No problem numbers found.") + return + + print(f"Problems ({len(problems)} found):") + for prob in problems: + print( + f" Problem {prob['problem_number']:2d} [{prob['pattern']:7s}] " + f"p{prob['paragraph_index']:4d}: {prob['preview']}" + ) + + +def _cmd_incell(args: argparse.Namespace) -> None: + """Handle the 'incell' subcommand (C3).""" + stats = detect_incell_patterns(args.hwpx_path) + + print(f"In-cell pattern analysis:") + print(f" Total cells: {stats['total_cells']}") + print(f" Label cells: {stats['label_cells']}") + print(f" Value cells: {stats['value_cells']}") + print(f" Mixed cells: {stats['mixed_cells']}") + print(f" Checkbox cells: {stats['checkbox_cells']}") + + +def _cmd_markers(args: argparse.Namespace) -> None: + """Handle the 'markers' subcommand (C6).""" + paragraphs = extract_paragraphs(args.hwpx_path) + stats = detect_korean_markers(paragraphs) + + print(f"Korean marker analysis:") + print(f" Paragraphs with Korean commas: {stats['kr_commas']}") + print(f" Paragraphs with Korean periods: {stats['kr_periods']}") + print(f" Space before comma errors: {stats['space_comma_errors']}") + print(f" Multiple space errors: {stats['double_space_errors']}") + + +def _cmd_headers_footers(args: argparse.Namespace) -> None: + """Handle the 'headers-footers' subcommand (C5).""" + result = detect_headers_footers(args.hwpx_path) + + 
print(f"Header/footer detection ({result['total_candidates']} found):") + + if result["headers"]: + print(f"\nHeaders ({len(result['headers'])}):") + for h in result["headers"]: + print(f" p{h['paragraph_index']:4d}: {h['text']}") + + if result["footers"]: + print(f"\nFooters ({len(result['footers'])}):") + for f in result["footers"]: + print(f" p{f['paragraph_index']:4d}: {f['text']}") + + +def _cmd_fill(args: argparse.Namespace) -> None: + """Handle the 'fill' subcommand (C8).""" + fields = parse_field_string(args.fields) + result = fill_hwpx_preserve(args.hwpx_path, args.output_path, fields) + + print(f"Filled {len(result['filled'])} field(s):") + for label in result["filled"]: + print(f" + {label}") + + if result["not_found"]: + print(f"Not found ({len(result['not_found'])}):") + for label in result["not_found"]: + print(f" - {label}") + + print(f"Lineseg stripped: {result['lineseg_stripped']}") + print(f"Output: {args.output_path}") + + +def main() -> None: + """Entry point with argparse-based CLI.""" + parser = argparse.ArgumentParser( + description="HWPX Korean Document Pattern Matching and Editing", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=( + "Examples:\n" + " %(prog)s classify doc.hwpx\n" + " %(prog)s hierarchy doc.hwpx\n" + " %(prog)s appendix doc.hwpx\n" + " %(prog)s strip-lineseg doc.hwpx output.hwpx\n" + " %(prog)s extract doc.hwpx\n" + " %(prog)s digit-headings doc.hwpx\n" + " %(prog)s pages doc.hwpx\n" + " %(prog)s problems doc.hwpx\n" + " %(prog)s incell doc.hwpx\n" + " %(prog)s markers doc.hwpx\n" + " %(prog)s headers-footers doc.hwpx\n" + " %(prog)s fill doc.hwpx output.hwpx '성명=홍길동,주소=서울'\n" + ), + ) + + subparsers = parser.add_subparsers(dest="command", required=True) + + # classify + p_classify = subparsers.add_parser( + "classify", help="Classify document type (exam/regulation/form/report/mixed)" + ) + p_classify.add_argument("hwpx_path", help="Path to .hwpx file") + p_classify.set_defaults(func=_cmd_classify) + + # 
hierarchy + p_hierarchy = subparsers.add_parser( + "hierarchy", help="Extract checkbox hierarchy (4-level depth)" + ) + p_hierarchy.add_argument("hwpx_path", help="Path to .hwpx file") + p_hierarchy.set_defaults(func=_cmd_hierarchy) + + # appendix + p_appendix = subparsers.add_parser( + "appendix", help="Extract appendix references" + ) + p_appendix.add_argument("hwpx_path", help="Path to .hwpx file") + p_appendix.set_defaults(func=_cmd_appendix) + + # strip-lineseg + p_strip = subparsers.add_parser( + "strip-lineseg", help="Strip linesegarray elements from HWPX" + ) + p_strip.add_argument("hwpx_path", help="Path to input .hwpx file") + p_strip.add_argument("output_path", help="Path for output .hwpx file") + p_strip.set_defaults(func=_cmd_strip_lineseg) + + # extract + p_extract = subparsers.add_parser( + "extract", help="Extract all paragraph texts" + ) + p_extract.add_argument("hwpx_path", help="Path to .hwpx file") + p_extract.set_defaults(func=_cmd_extract) + + # digit-headings (bonus command for detect_digit_headings) + p_digit = subparsers.add_parser( + "digit-headings", help="Detect digit-concatenated headings" + ) + p_digit.add_argument("hwpx_path", help="Path to .hwpx file") + p_digit.set_defaults(func=_cmd_digit_headings) + + # pages (C1) + p_pages = subparsers.add_parser( + "pages", help="Detect page/column/section boundaries" + ) + p_pages.add_argument("hwpx_path", help="Path to .hwpx file") + p_pages.set_defaults(func=_cmd_pages) + + # problems (C2) + p_problems = subparsers.add_parser( + "problems", help="Map problem/question numbers to paragraphs" + ) + p_problems.add_argument("hwpx_path", help="Path to .hwpx file") + p_problems.set_defaults(func=_cmd_problems) + + # incell (C3) + p_incell = subparsers.add_parser( + "incell", help="Detect in-cell form patterns (tables)" + ) + p_incell.add_argument("hwpx_path", help="Path to .hwpx file") + p_incell.set_defaults(func=_cmd_incell) + + # markers (C6) + p_markers = subparsers.add_parser( + "markers", 
help="Detect Korean punctuation markers and spacing errors" + ) + p_markers.add_argument("hwpx_path", help="Path to .hwpx file") + p_markers.set_defaults(func=_cmd_markers) + + # headers-footers (C5) + p_hf = subparsers.add_parser( + "headers-footers", help="Detect header/footer paragraphs" + ) + p_hf.add_argument("hwpx_path", help="Path to .hwpx file") + p_hf.set_defaults(func=_cmd_headers_footers) + + # fill (C8) + p_fill = subparsers.add_parser( + "fill", help="Fill form fields via XML surgery (style-preserving)" + ) + p_fill.add_argument("hwpx_path", help="Path to input .hwpx file") + p_fill.add_argument("output_path", help="Path for output .hwpx file") + p_fill.add_argument( + "fields", + help="Comma-separated key=value pairs (e.g. '성명=홍길동,주소=서울')" + ) + p_fill.set_defaults(func=_cmd_fill) + + args = parser.parse_args() + args.func(args) + + +if __name__ == "__main__": + main() diff --git a/skills/morph-ppt-3d/SKILL.md b/skills/morph-ppt-3d/SKILL.md index dd1c8c29b..ba17064b9 100644 --- a/skills/morph-ppt-3d/SKILL.md +++ b/skills/morph-ppt-3d/SKILL.md @@ -70,28 +70,18 @@ Tell the user: "Your topic is [X]. I suggest using a 3D model of [description]. 2. **Sketchfab API** (no auth needed for search): ```bash - curl -s "https://api.sketchfab.com/v3/search?type=models&q=[keyword]&downloadable=true&archives_flavours=glb" \ - | python3 -c " - import json, sys - data = json.load(sys.stdin) - for m in data.get('results', [])[:5]: - print(f\"Name: {m['name']}\") - print(f\"URL: https://sketchfab.com/3d-models/{m['slug']}-{m['uid']}\") - print(f\"Likes: {m.get('likeCount', 0)}, License: {m.get('license', {}).get('label', 'unknown')}\") - print() - " + agbrowse fetch "https://api.sketchfab.com/v3/search?type=models&q=[keyword]&downloadable=true&archives_flavours=glb" --json --browser never ``` + Parse the JSON `content` for `results[].name`, `results[].slug`, `results[].uid`, `results[].likeCount`, and `results[].license.label`. 3. 
**Poly Pizza** (direct GLB download, all free): ```bash - # Search results page — parse for download links - curl -s "https://poly.pizza/api/search/[keyword]" 2>/dev/null + agbrowse fetch "https://poly.pizza/api/search/[keyword]" --json --browser never ``` 4. **Khronos glTF-Sample-Assets** (guaranteed to work, always available): ```bash - # Direct download — no auth, no API, always works curl -L -o model.glb "https://raw.githubusercontent.com/KhronosGroup/glTF-Sample-Assets/main/Models/[ModelName]/glTF-Binary/[ModelName].glb" ``` Available models: Duck, Fox, Avocado, BrainStem, CesiumMan, DamagedHelmet, FlightHelmet, Lantern, Suzanne, WaterBottle, etc. diff --git a/skills/officecli-data-dashboard/SKILL.md b/skills/officecli-data-dashboard/SKILL.md index e7d2b0a34..148d7e910 100644 --- a/skills/officecli-data-dashboard/SKILL.md +++ b/skills/officecli-data-dashboard/SKILL.md @@ -133,7 +133,11 @@ officecli close "$FILE" officecli validate "$FILE" ``` -Verified end-to-end on a 12-row revenue CSV: `validate` reports no errors, Dashboard opens first, `Dashboard/A2.cachedValue` resolves (2,075,000 for the test data), chart renders with values linked. +Verified end-to-end on a 12-row revenue CSV: `validate` reports no structural +errors, Dashboard opens first, formulas remain formulas, and chart ranges link +to the intended cells. Treat `cachedValue` as helpful evidence only, not proof +that the target spreadsheet app has recalculated; run the recalc/error gate or +record the app-open caveat before delivery. ## Design Ideas diff --git a/skills/officecli-docx/SKILL.md b/skills/officecli-docx/SKILL.md index 44b5effe8..933efd89d 100644 --- a/skills/officecli-docx/SKILL.md +++ b/skills/officecli-docx/SKILL.md @@ -421,7 +421,7 @@ officecli add "$FILE" /body --type paragraph --prop text="2. Market Diagnosis .. # ... one per heading ``` -Use this when the live-field option leaves the literal prompt visible to the reader. Page numbers are manually set. 
For approximate pagination preview: `officecli view "$FILE" html` and read the returned HTML file to eyeball layout. For exact page numbers: open in your target viewer (Word / WPS / etc.) — precise numbers only come from the final render in that viewer. This recipe assumes you can get approximate page positions from the document structure. `add --type toc` (live field) remains correct for recipients whose viewer recalculates on open (or who will press F9) — this recipe is for everyone else. +Use this when the live-field option leaves the literal prompt visible to the reader. Page numbers are manually set. For approximate pagination preview: `officecli view "$FILE" html` and read the returned HTML file to eyeball layout. For exact page numbers: open in your target viewer (Word / WPS / etc.) — precise numbers only come from the final render in that viewer. This recipe assumes you can get approximate page positions from the document structure. `add --type toc` (live field) remains correct for recipients whose viewer updates Word fields on open (or who will press F9) — this recipe is for everyone else. ### Forcing page breaks — belt-and-suspenders for cross-viewer reliability diff --git a/skills/officecli-hwpx/SKILL.md b/skills/officecli-hwpx/SKILL.md new file mode 100644 index 000000000..16af1bcfa --- /dev/null +++ b/skills/officecli-hwpx/SKILL.md @@ -0,0 +1,401 @@ +--- +name: officecli-hwpx +description: "Use this skill any time a .hwpx file is involved -- as input, output, or for analysis. This includes: creating new HWPX from scratch or from Markdown; reading, parsing, or extracting text; editing or modifying existing documents; querying document structure; validating integrity; comparing documents; working with Korean (한글) office documents. Trigger whenever the user mentions 'HWP', 'HWPX', '한글 문서', '한글 파일', 'Hancom', or references a .hwpx filename." +--- + +# OfficeCLI HWPX Skill + +## Quick Decision + +| Task | Supported? 
| Command |
+|------|-----------|---------|
+| Create new .hwpx | ✅ Yes | `officecli create file.hwpx` |
+| Create from Markdown | ✅ Yes | `officecli create file.hwpx --from-markdown input.md` |
+| Read / analyze .hwpx | ✅ Yes | `view text`, `annotated`, `outline`, `stats`, `html`, `markdown`, `tables`, `forms`, `objects` |
+| Edit existing .hwpx | ✅ Yes | `set`, `add`, `remove`, `move`, `swap` |
+| Label-based fill | ✅ Yes | `set /table/fill --prop '라벨=값'` or `--prop 'fill:라벨=값'` |
+| New form field creation (`text/checkbox/dropdown`) | 🟡 Blocked | source prototype exists, but Hancom golden/manual verification and published binary parity are not closed yet |
+| Form recognize | ✅ Yes | `view forms --auto` (label-value auto-detect) |
+| Table map | ✅ Yes | `view tables` (2D grid + labels) |
+| Markdown export | ✅ Yes | `view markdown` |
+| Equation (수식) | ✅ Yes | `add --type equation --prop 'script={1 over 2}'` |
+| Object finder | ✅ Yes | `view objects` (picture/field/bookmark/equation) |
+| Query (expanded) | ✅ Yes | `query 'tc[text~=홍길동]'`, `:has()`, `>` combinator |
+| Template merge | ✅ Yes | `merge template.hwpx out.hwpx --data '{"key":"val"}'` |
+| Swap elements | ✅ Yes | `swap file.hwpx '/p[1]' '/p[2]'` |
+| Column break | ✅ Yes | `add --type columnbreak --prop cols=2` |
+| Watermark (image) | 🟡 Plan 98 active | Confirmed working as of `build-local/officecli` 1.0.42. Opaque RGB recommended; for bright assets use `bright=0`, `contrast=0` |
+| Image anchor / floating picture | ✅ Yes | `add --type picture --prop anchor=page --prop halign=center --prop valign=middle` |
+| Field types | ✅ Yes | `add --type author\|title\|lastsaveby\|filename` |
+| Compare documents | ✅ Yes | `compare a.hwpx b.hwpx` (LCS-based diff + table comparison) |
+| Security validation | ✅ Yes | ZIP bomb, path traversal, symlink, XXE defense |
+| Form fill feedback | ✅ Yes | `set /table/fill` returns unmatched labels |
+| Broken ZIP recovery | ✅ Yes | corrupted HWPX auto-recovery via Local File Header scan |
+| HTML preview | ✅ Yes | `view html --browser` |
+| Watch live preview | ✅ Yes | `watch file.hwpx` |
+| Validate .hwpx | ✅ Yes | `validate` (9-level: ZIP, package, XML, IDRef, table, NS, BinData, field, section) |
+| Raw XML | ✅ Yes | `raw`, `raw-set` |
+| Open .hwp (binary) | 🟡 Capability-gated | Run `officecli hwp doctor --json` and `officecli capabilities --json`; use native rhwp read/render/mutate/create routes only when the specific operation is ready. Do not silently convert to `.hwpx`.
 |
+
+---
+
+## Binary Location
+
+```bash
+OFFICECLI="700_projects/cli-jaw/build-local/officecli"
+# Build: cd 700_projects/cli-jaw/officecli && dotnet publish -c Release -r osx-arm64 -o ../build-local
+```
+
+---
+
+## Core Commands
+
+### Create & Import & Merge
+
+```bash
+officecli create doc.hwpx                                  # blank document
+officecli create doc.hwpx --from-markdown input.md         # MD→HWPX (JUSTIFY by default)
+officecli create doc.hwpx --from-markdown input.md --align left  # left-aligned
+officecli merge template.hwpx output.hwpx --data '{"이름":"홍길동"}'  # template {{key}} substitution
+officecli merge template.hwpx output.hwpx --data data.json # data from a JSON file
+```
+
+### View Modes
+
+```bash
+officecli view doc.hwpx text            # line-numbered text
+officecli view doc.hwpx annotated       # paths + style detail
+officecli view doc.hwpx outline         # headings only
+officecli view doc.hwpx stats           # document statistics
+officecli view doc.hwpx html --browser  # A4 HTML preview
+officecli view doc.hwpx markdown        # GFM Markdown conversion
+officecli view doc.hwpx tables          # table 2D grid + label map
+officecli view doc.hwpx forms --auto    # CLICK_HERE + label-value auto-detect
+officecli view doc.hwpx forms --auto --json  # JSON for AI pipelines
+officecli view doc.hwpx objects         # picture/field/bookmark/equation list
+officecli view doc.hwpx objects --object-type field  # filter by type
+officecli view doc.hwpx styles          # charPr/paraPr styles
+officecli view doc.hwpx issues          # 9-level validation issues
+```
+
+### Edit
+
+```bash
+officecli add doc.hwpx /section[1] --type paragraph --prop text="내용" --prop fontsize=11
+officecli add doc.hwpx /section[1] --type table --prop rows=3 --prop cols=4
+officecli set doc.hwpx '/section[1]/p[1]' --prop bold=true --prop align=CENTER
+officecli set doc.hwpx / --prop find="old" --prop replace="new"
+officecli remove doc.hwpx /section[1]/p[3]
+```
+
+### Image watermark
+
+```bash
+700_projects/cli-jaw/build-local/officecli add doc.hwpx /section[1] \
+  --type watermark \
+  --prop src=/path/to/watermark.png \
+  --prop bright=0 \
+  --prop contrast=0
+```
+
+Validation notes:
+- All three variants (`v5`, `v5.1`, `v5.2`) confirmed to render in Hancom
+- Failures were caused by the **raster characteristics + watermark filter combination**, not by XML mismatches
+- Avoid transparent PNGs; prefer **opaque RGB PNGs**
+- Very bright/simple assets can wash out at the defaults `bright=70`, `contrast=-50`
+- If the installed `~/.local/bin/officecli` returns `Unsupported element type: watermark`, use the latest `build-local/officecli` or reinstall
+
+### Image anchor / floating picture
+
+```bash
+# default: inline (treated like a character)
+officecli add doc.hwpx /section[1] --type picture --prop path=/path/to/image.png
+
+# centered on the page
+officecli add doc.hwpx /section[1] --type picture \
+  --prop path=/path/to/image.png \
+  --prop anchor=page \
+  --prop halign=center \
+  --prop valign=middle \
+  --prop width=10000 \
+  --prop height=5000
+
+# slightly offset from page center
+officecli add doc.hwpx /section[1] --type picture \
+  --prop path=/path/to/image.png \
+  --prop anchor=page \
+  --prop halign=center \
+  --prop valign=middle \
+  --prop x=1200 \
+  --prop y=800
+
+# floating relative to a paragraph
+officecli add doc.hwpx /section[1] --type picture \
+  --prop path=/path/to/image.png \
+  --prop anchor=para \
+  --prop wrap=square \
+  --prop halign=center \
+  --prop y=1200
+
+# behind text
+officecli add doc.hwpx /section[1] --type picture \
+  --prop path=/path/to/image.png \
+  --prop wrap=behind
+
+# adjust position/lock after creation
+officecli set doc.hwpx '/section[1]/p[2]/run[1]/pic[1]' \
+  --prop x=1111 --prop y=2222 --prop lock=1 --prop wrap=topbottom
+```
+
+Rules:
+- `path` is the primary prop; `src` is also accepted
+- `anchor=page` computes offsets relative to the **full paper (PAPER)**
+- `halign`/`valign` are not separate alignment enums; they are resolved into `horzOffset`/`vertOffset` calculations
+- `anchor=para` in V1 places horizontally relative to the body width; `y` is explicit only
+- For `set`, only `x`, `y`, `lock`, and `wrap=topbottom` are documented for now
+- Use picture paths of the form `'/section[1]/p[N]/run[1]/pic[1]'`
+
+### Label Fill (automatic table fill)
+
+```bash
+officecli set doc.hwpx / --prop 'fill:대표자=홍길동' --prop 'fill:연락처=010-1234'
+officecli set doc.hwpx / --prop 'fill:주소>down=서울시'  # directions: right (default), down, left, up
+officecli set doc.hwpx /table/fill --prop '이름=김서준'  # fill: prefix optional
+```
+
+### Query (extended syntax)
+
+```bash
+officecli query 
doc.hwpx 'p'                                      # all paragraphs
+officecli query doc.hwpx 'tc[text~=홍길동]'        # search cell text
+officecli query doc.hwpx 'run[bold=true]'          # bold runs
+officecli query doc.hwpx 'p:has(tbl)'              # paragraphs containing a table
+officecli query doc.hwpx 'tbl > tr > tc[colSpan!=1]'  # merged cells
+officecli query doc.hwpx 'run[fontsize>=20]'       # 20pt or larger
+officecli query doc.hwpx 'p[heading=1]'            # heading 1
+```
+
+Operators: `=`, `!=`, `~=` (contains), `>=`, `<=`
+Pseudo: `:empty`, `:contains(text)`, `:has(child)`, `:first`, `:last`
+Virtual attrs: `text`, `bold`, `italic`, `fontsize`, `colSpan`, `rowSpan`, `heading`
+
+### Compare
+
+```bash
+officecli compare a.hwpx b.hwpx                  # text diff (default)
+officecli compare a.hwpx b.hwpx --mode outline   # heading diff
+officecli compare a.hwpx b.hwpx --mode table --json  # table diff JSON
+```
+
+### Watch
+
+```bash
+officecli watch doc.hwpx    # auto-refresh HTML on file change
+officecli unwatch doc.hwpx  # stop
+```
+
+### Validate
+
+```bash
+officecli validate doc.hwpx
+```
+
+9-level: ZIP integrity, package (mimetype/rootfile/version), XML, IDRef, table structure, namespace, BinData orphan, field pairs, section count.
+
+---
+
+## Key Workflows
+
+### 1. AI form auto-fill (recognize → fill)
+
+```bash
+officecli view form.hwpx forms --auto --json > fields.json  # Step 1: recognize
+# Step 2: the AI maps label→value
+officecli set form.hwpx /table/fill --prop '성 명=홍길동'     # Step 3: fill
+```
+
+> **Regulation docs** (operating guidelines, etc.): `forms --auto` can recognize label-value pairs,
+> but checkbox hierarchies (□→○→-→*), appendix form references, and non-standard headings need Python pattern matching.
+> See "Document Classification & Pattern-Match Editing" section below.
+
+### 2. Map table structure → edit
+
+```bash
+officecli view doc.hwpx tables                        # 2D grid map
+officecli query doc.hwpx 'tc[text~=대표자]'            # search cells
+officecli set doc.hwpx /table/fill --prop '대표자=홍길동'  # label fill
+```
+
+### 3. Markdown round-trip
+
+```bash
+officecli view doc.hwpx markdown > output.md         # HWPX→MD
+officecli create new.hwpx --from-markdown output.md  # MD→HWPX
+```
+
+### 4. 
Bulk document generation from a template
+
+```bash
+# put {{key}} placeholders in the template → substitute with data
+officecli merge template.hwpx 홍길동.hwpx --data '{"이름":"홍길동","날짜":"2026-04-12"}'
+officecli merge template.hwpx 이지은.hwpx --data '{"이름":"이지은","날짜":"2026-04-12"}'
+# {{key}} inside tables is substituted too. Unresolved keys are reported.
+```
+
+### 5. Compare documents
+
+```bash
+officecli compare before.hwpx after.hwpx --mode text
+officecli compare before.hwpx after.hwpx --json > diff.json
+```
+
+---
+
+## Document Classification & Pattern-Match Editing (Plan 90.999 + 99.7 + 99.9)
+
+> Updated 2026-04-14: Plan 99.9 Phase A-I fully implemented.
+
+officecli `view forms --auto` handles standard label-value detection. For **complex templates**
+(KICE exams, regulation docs, checkbox hierarchies), use the Python pattern-match fallback:
+
+### Document Types (auto-classified)
+
+| Type | Key Signals | Example |
+|------|------------|---------|
+| `exam` | equation 10+, rect objects | KICE CSAT / mock exam papers |
+| `form` | table 3+, checkboxes (□/■) | university applications, government forms |
+| `regulation` | ○ bullets 10+, `별첨`/clause refs, table 10+ | operating guidelines, internal rules, enforcement details |
+| `report` | long text, few tables | reports, papers |
+| `mixed` | none of above | business plans |
+
+### Form Recognition (4 strategies)
+
+1. **Adjacent cell label-value** — original table label→value detection
+2. **Header+data rows** — original column-header recognition
+3. **In-cell patterns** (Phase B1) — `□` checkbox, `keyword( )` paren-blank, `(label: )` annotation
+4. **KV table detection** (Phase B2) — 16 Korean keywords trigger auto-detection
+
+### Form Fill (3-phase pipeline)
+
+1. **In-cell patterns** (Phase B6) — checkbox `□`→`☑`, paren-blank, annotation fill
+2. **Table label-value** (Phase B3) — exact + prefix 60% matching, 4-directional (`right`/`down`/`left`/`up`)
+3. 
**Inline paragraph** (Phase B6) — regex lookbehind for `"label: value"` outside tables
+
+### Security Suite (Phase E)
+
+| Check | Limits |
+|-------|--------|
+| ZIP bomb | 1000 entries, 200 MB, 100:1 ratio |
+| Path traversal | null byte, `..`, absolute, drive letter, symlink |
+| XXE | `DtdProcessing.Prohibit` |
+| Table size | 200 cols x 10000 rows |
+
+### Diff/Compare (Phase H)
+
+- **LCS DP alignment** (fallback greedy for >10M cells)
+- **Table similarity**: dimension weight 0.3 + content weight 0.7
+- **Page range filtering**: `--pages "1-3,5"`
+
+### Text Quality (Phase F)
+
+- **Shape alt-text removal**: 50+ Korean shape names
+- **PUA stripping**: 3 Unicode planes
+- **Pseudo-table demotion**: rows <= 3 + empty >= 30%
+- **GFM tilde escape**
+- **Form confidence score**
+
+### Phase I Enhancements
+
+- **Unmatched label feedback** in fill results (labels without matching cells reported)
+- **Broken ZIP recovery** via Local File Header scan
+- **Font-size heading detection**: H1 >= 1.5x, H2 >= 1.3x, H3 >= 1.15x base size
+- **LCS-based diff** for text and table comparison
+- **Multi-`<hp:t>` in-cell replacement** (handles fragmented text nodes)
+
+### Regulation-Specific Patterns
+
+- **Checkbox hierarchy**: `□` (section) → `○` (item) → `-` (detail) → `*` (footnote)
+- **Appendix references**: `[별첨 제N호]`, `[별지 N]` — linked to form templates
+- **Digit-concatenated headings**: `"3지원금 집행기준"` (no space between number and title)
+- **Uniform footer**: repeated identical footers → org extraction (e.g., "크림슨창업지원단장 귀하")
+
+### Verified Edit Workflow (lineseg strip)
+
+For direct XML editing outside officecli, strip ALL linesegarray → edit → repack:
+```bash
+# strip every <hp:linesegarray> block from the section XML, then repack
+# (edit text nodes between the two steps, e.g. with lxml, if needed)
+python3 -c "
+import re, sys, zipfile
+src, dst = sys.argv[1], sys.argv[2]
+with zipfile.ZipFile(src) as zi, zipfile.ZipFile(dst, 'w', zipfile.ZIP_DEFLATED) as zo:
+    for info in zi.infolist():
+        data = zi.read(info.filename)
+        if re.match(r'Contents/section\d+\.xml$', info.filename):
+            data = re.sub(rb'<hp:linesegarray>.*?</hp:linesegarray>', b'', data, flags=re.DOTALL)
+        zo.writestr(info, data)
+" in.hwpx out.hwpx
+```
+**Verified on 4+ document types**: KICE exam (193 lineseg), application form (472p), regulation doc (599 lineseg, HWP→HWPX), official correspondence (공문).
+All opened correctly in Hancom after full lineseg strip + text edits.
+Python CLI now has **12 commands**.
+
+### 98+ Regex Patterns (Plan 99.8) / 58 Implementation Tasks (Plan 99.9)
+
+Key patterns: lineseg strip (R1), checkbox (R6), label detect (R7-R8), uniform space normalization (R10),
+checkbox hierarchy (R21), appendix ref (R22), digit-title concat (R23).
+Plan 99.8 expanded to 98+ patterns. Plan 99.9 defined 58 implementation tasks (Phase A-I).
+Full inventory → `devlog/_plan/office/hwp/plan/99.7-kice-regex-parsing-implementation.md`.
+
+### Exam XML Structure Patterns (exam-paper specific)
+
+KICE exam papers have a different XML structure from ordinary forms:
+
+| Pattern | Description | Detection |
+|---------|-------------|-----------|
+| Page/Column breaks | `pageBreak="1"` / `columnBreak="1"` on `<hp:p>` | page boundary = question-group boundary |
+| p[0] Monster | secPr + colPr + title tbl + question 1 text merged together | everything lands in the first paragraph |
+| Equation interleaving | `<hp:equation>` ↔ `<hp:t>` alternating pattern | skip equations when extracting question text |
+| Answer choices | `①` + 5 `<hp:t>` runs (5-option multiple choice) | answer paragraphs auto-detected |
+| Text fragmentation | `<hp:t>` split into 1-2 character runs (HWP conversion) | join the full text, then match |
+| 2-column layout | `<hp:colPr>` | layout unique to exam papers |
+
+**Equation editing via script**: Hancom equations are stored as `<hp:script>` text.
+To modify an equation, replace the script text (Python or officecli find-replace):
+```bash
+# View all equations
+officecli view exam.hwpx objects --object-type equation
+# Edit via Python: modify text nodes, strip lineseg, repack ZIP
+```
+KICE template at `/private/tmp/kice-full-edit-v2.hwpx` (836 equations, verified editable).
+
+**What officecli covers**: `view text`, `view stats`, `view forms --auto`, `validate`, `add --type equation`
+**Needs the Python fallback**: page-level deletion, question-text replacement, section reduction
+
+**Verified (2026-04-13)**: 2025 CSAT Math → reduced to 4 questions on 1 page + text replacement + lineseg strip. Hancom OK.
+
+Details → `hwp_recog/24-exam-xml-structure-patterns.md`, Plan 99.7. 
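The script-text replacement described above can be sketched as a small Python helper. This is a minimal sketch, not part of officecli: the helper name `replace_equation_script` is ours, and because the exact equation tag (`<hp:script>` is an assumption here) may vary, it replaces the script text on the raw section bytes rather than a parsed tree. It also strips `<hp:linesegarray>` blocks, since edited text invalidates the cached line layout.

```python
import re
import zipfile

def replace_equation_script(src: str, dst: str, old: str, new: str) -> int:
    """Copy a .hwpx (ZIP) package, replacing equation script text.

    Works on raw bytes of Contents/section*.xml; returns how many
    occurrences of `old` were replaced. Strips linesegarray blocks
    because text edits invalidate the cached line layout.
    """
    hits = 0
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if re.match(r"Contents/section\d+\.xml$", info.filename):
                hits += data.count(old.encode("utf-8"))
                data = data.replace(old.encode("utf-8"), new.encode("utf-8"))
                data = re.sub(rb"<hp:linesegarray>.*?</hp:linesegarray>",
                              b"", data, flags=re.DOTALL)
            # passing the original ZipInfo preserves per-entry compression
            zout.writestr(info, data)
    return hits
```

Run `officecli validate` on the output afterward, since byte-level edits bypass officecli's own checks.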
+
+---
+
+## Common Pitfalls
+
+| Pitfall | Correct Approach |
+|---------|-----------------|
+| `--props text=Hello` | `--prop text=Hello` — always the singular `--prop` |
+| `/body/p[1]` path | HWPX is section-based, not body-based: `/section[1]/p[1]` |
+| Opening `.hwp` (binary) | Check `officecli hwp doctor --json` / `officecli capabilities --json` first. If the native rhwp operation is ready, handle it directly; if not, report the typed dependency reason. Convert to `.hwpx` only as a user-approved fallback |
+| Unquoted `[N]` in shell | `"/section[1]/p[1]"` — always quote |
+| Omitting fontsize | Always specify `--prop fontsize=11` — prevents charPr pollution |
+| build-local does not recognize `--type formfield` | Treat as blocked until release acceptance, even though a source-tree prototype exists |
+| Mapping tables by hand | A single `view tables` call replaces it |
+| Text replacement in HWP→HWPX-converted files | Replace the whole `<hp:p>` paragraph → raw string replace or paragraph-level replacement. Beware the page-number fragment `20` inside the p[0] title |
+
+> Updated 2026-04-14: HWP→HWPX conversion editing limitations documented
+
+---
+
+## Essential Rules
+
+1. **View mode is required** — `officecli view file.hwpx` alone is an error; specify `text`/`markdown`/`tables`, etc.
+2. **Paths are 1-based** — `/section[1]/p[1]`
+3. **Quote paths** — prevents shell globbing
+4. **`--prop` is singular** — not `--props`
+5. **Always specify fontsize** — prevents charPr 0 pollution
+6. **Validate after editing** — `view issues` + `validate` (same 9-level check coverage)
+7. **Korean auto-normalization** — PUA removal and distributed-spacing collapse are applied automatically
+8. **Transport parity** — CLI/Resident/MCP all support the same view modes (tables, markdown, objects, forms)
diff --git a/skills/officecli-xlsx/SKILL.md b/skills/officecli-xlsx/SKILL.md
index 149d37b44..01b0af620 100644
--- a/skills/officecli-xlsx/SKILL.md
+++ b/skills/officecli-xlsx/SKILL.md
@@ -143,7 +143,10 @@ officecli close "$FILE"
 officecli validate "$FILE"
 ```
 
-Verified: `validate` returns `no errors found`, `B5` resolves to `135000`. This is the shape of every build: open → set cells/formulas → format → close → validate.
+Verified: `validate` returns `no errors found`, `B5` resolves to `135000` in
+this simple fixture. 
This is the shape of every build: open → set +cells/formulas → format → close → validate. For formula-heavy work, add the +formula-error/recalc gate; structural validation alone is not delivery proof. ## CSV / bulk import diff --git a/src/officecli/BlankDocCreator.cs b/src/officecli/BlankDocCreator.cs index e14695ea9..7863eee22 100644 --- a/src/officecli/BlankDocCreator.cs +++ b/src/officecli/BlankDocCreator.cs @@ -7,6 +7,7 @@ using DocumentFormat.OpenXml.Wordprocessing; using DocumentFormat.OpenXml.Presentation; using OfficeCli.Core; +using OfficeCli.Handlers.Hwp; namespace OfficeCli; @@ -26,9 +27,15 @@ public static void Create(string path, string? locale = null, bool minimal = fal case ".pptx": CreatePowerPoint(path); break; + case ".hwpx": + CreateHwpx(path); + break; + case ".hwp": + HwpBlankCreator.Create(path); + break; default: if (TryCreateViaPlugin(path, ext)) break; - throw new NotSupportedException($"Unsupported file type: {ext}. Supported: .docx, .xlsx, .pptx, or any extension served by an installed format-handler plugin that implements `create`."); + throw new NotSupportedException($"Unsupported file type: {ext}. Supported: .docx, .xlsx, .pptx, .hwpx, experimental .hwp, or any extension served by an installed format-handler plugin that implements `create`."); } } @@ -567,6 +574,19 @@ private static void CreatePowerPoint(string path) OfficeCliMetadata.StampOnCreate(doc); } + private static void CreateHwpx(string path) + { + var asm = typeof(BlankDocCreator).Assembly; + // Try multiple resource name conventions + var resourceName = asm.GetManifestResourceNames() + .FirstOrDefault(n => n.EndsWith("base.hwpx", StringComparison.OrdinalIgnoreCase)) + ?? 
throw new InvalidOperationException("Embedded base.hwpx template not found in assembly resources."); + + using var stream = asm.GetManifestResourceStream(resourceName)!; + using var fs = File.Create(path); + stream.CopyTo(fs); + } + private static Shape CreateLayoutPlaceholder(uint id, string name, PlaceholderValues phType, long x, long y, long cx, long cy) { diff --git a/src/officecli/CommandBuilder.Add.Hwp.cs b/src/officecli/CommandBuilder.Add.Hwp.cs new file mode 100644 index 000000000..b404ae3be --- /dev/null +++ b/src/officecli/CommandBuilder.Add.Hwp.cs @@ -0,0 +1,105 @@ +// Copyright 2025 OfficeCLI (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +using OfficeCli.Core; +using OfficeCli.Handlers.Hwp; + +namespace OfficeCli; + +static partial class CommandBuilder +{ + private static bool IsHwpTextAdd(string extension, string parentPath, string? type) + => string.Equals(extension, ".hwp", StringComparison.OrdinalIgnoreCase) + && string.Equals(parentPath, "/text", StringComparison.OrdinalIgnoreCase) + && (string.IsNullOrWhiteSpace(type) + || string.Equals(type, "text", StringComparison.OrdinalIgnoreCase) + || string.Equals(type, "paragraph", StringComparison.OrdinalIgnoreCase)); + + private static int HandleHwpTextAdd( + string inputPath, + HwpFormat format, + Dictionary properties, + bool json) + { + var value = FirstValue(properties, "value", "text", "content"); + var output = FirstValue(properties, "output", "out"); + if (string.IsNullOrEmpty(value) || string.IsNullOrWhiteSpace(output)) + { + var message = "HWP text add requires --prop value= --prop output=."; + if (json) Console.WriteLine(OutputFormatter.WrapEnvelopeError(message)); + else Console.Error.WriteLine(message); + return 1; + } + + if (!TryReadInt(properties, 0, out var section, "section", "sec") + || !TryReadInt(properties, 0, out var paragraph, "paragraph", "para", "p") + || !TryReadInt(properties, 0, out var offset, "offset", "off")) + { + var message = "HWP text add position props must 
be integers: section, paragraph, offset."; + if (json) Console.WriteLine(OutputFormatter.WrapEnvelopeError(message)); + else Console.Error.WriteLine(message); + return 1; + } + + var outputPath = Path.GetFullPath(output); + var outputDir = Path.GetDirectoryName(outputPath); + if (!string.IsNullOrEmpty(outputDir)) Directory.CreateDirectory(outputDir); + + var formatKey = format == HwpFormat.Hwp + ? HwpCapabilityConstants.FormatHwp + : HwpCapabilityConstants.FormatHwpx; + var engine = HwpEngineSelector.GetEngine(formatKey, HwpCapabilityConstants.OperationInsertText); + var request = new HwpInsertTextRequest( + format, + inputPath, + outputPath, + section, + paragraph, + offset, + value, + json); + var result = engine.InsertTextAsync(request, CancellationToken.None).GetAwaiter().GetResult(); + + if (json) + { + var envelope = new System.Text.Json.Nodes.JsonObject + { + ["success"] = true, + ["message"] = $"Inserted HWP text -> {result.OutputPath}", + ["data"] = new System.Text.Json.Nodes.JsonObject + { + ["outputPath"] = result.OutputPath, + ["engine"] = result.Engine, + ["engineVersion"] = result.EngineVersion, + ["evidence"] = HwpCapabilityJsonMapper.ToJsonArray(result.Evidence), + ["transaction"] = result.Transaction?.DeepClone() + }, + ["warnings"] = HwpCapabilityJsonMapper.ToJsonArray(result.Warnings) + }; + Console.WriteLine(envelope.ToJsonString(OutputFormatter.PublicJsonOptions)); + } + else + { + Console.WriteLine($"Inserted HWP text -> {result.OutputPath}"); + foreach (var warning in result.Warnings) + Console.Error.WriteLine($"WARNING: {warning}"); + } + return 0; + } + + private static bool TryReadInt( + Dictionary properties, + int defaultValue, + out int value, + params string[] keys) + { + var raw = FirstValue(properties, keys); + if (string.IsNullOrWhiteSpace(raw)) + { + value = defaultValue; + return true; + } + + return int.TryParse(raw, out value); + } +} diff --git a/src/officecli/CommandBuilder.Add.cs b/src/officecli/CommandBuilder.Add.cs index 
c64b70838..e822ef879 100644 --- a/src/officecli/CommandBuilder.Add.cs +++ b/src/officecli/CommandBuilder.Add.cs @@ -4,6 +4,7 @@ using System.CommandLine; using OfficeCli.Core; using OfficeCli.Handlers; +using OfficeCli.Handlers.Hwp; using OfficeCli.Help; namespace OfficeCli; @@ -14,7 +15,7 @@ private static Command BuildAddCommand(Option jsonOption) { var addFileArg = new Argument("file") { Description = "Office document path (required even with open/close mode)" }; var addParentPathArg = new Argument("parent") { Description = "Parent DOM path (e.g. /body, /Sheet1, /slide[1])" }; - var addTypeOpt = new Option("--type") { Description = "Element type to add (e.g. paragraph, run, table, sheet, row, cell, slide, shape, picture, ole, video)" }; + var addTypeOpt = new Option("--type") { Description = "Element type to add (e.g. paragraph, run, table, formfield, sheet, row, cell, slide, shape, picture, ole, video)" }; var addFromOpt = new Option("--from") { Description = "Copy from an existing element path (e.g. /slide[1]/shape[2])" }; var addIndexOpt = new Option("--index") { @@ -171,6 +172,9 @@ private static Command BuildAddCommand(Option jsonOption) // Reuse ParsePropsArray so the inline and resident-server paths // stay in sync. var properties = ParsePropsArray(props); + var extension = file.Extension; + if (IsHwpTextAdd(extension, parentPath, type)) + return HandleHwpTextAdd(file.FullName, HwpFormat.Hwp, properties, json); // ARCHITECTURE(handler-as-truth): the handler is the single // source of truth for "is this prop supported". 
We pass the @@ -416,6 +420,7 @@ private static Command BuildSwapCommand(Option jsonOption) OfficeCli.Handlers.PowerPointHandler ppt => ppt.Swap(path1, path2), OfficeCli.Handlers.WordHandler word => word.Swap(path1, path2), OfficeCli.Handlers.ExcelHandler excel => excel.Swap(path1, path2), + OfficeCli.Handlers.HwpxHandler hwpx => hwpx.Swap(path1, path2), _ => throw new InvalidOperationException("swap not supported for this document type") }; var message = $"Swapped {p1} <-> {p2}"; diff --git a/src/officecli/CommandBuilder.Capabilities.cs b/src/officecli/CommandBuilder.Capabilities.cs new file mode 100644 index 000000000..15929d16e --- /dev/null +++ b/src/officecli/CommandBuilder.Capabilities.cs @@ -0,0 +1,38 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +using System.CommandLine; +using OfficeCli.Handlers.Hwp; + +namespace OfficeCli; + +static partial class CommandBuilder +{ + private static Command BuildCapabilitiesCommand(Option jsonOption) + { + var command = new Command("capabilities", "Show machine-readable OfficeCLI capability information"); + command.Add(jsonOption); + + command.SetAction(result => + { + var json = result.GetValue(jsonOption); + return SafeRun(() => + { + var report = HwpCapabilityFactory.BuildReport(); + if (json) + { + Console.WriteLine(HwpCapabilityJsonMapper.BuildEnvelope(report).ToJsonString(OfficeCli.Core.OutputFormatter.PublicJsonOptions)); + return 0; + } + + Console.WriteLine($"OfficeCLI capabilities schema {report.SchemaVersion}"); + Console.WriteLine("HWP/HWPX support is gated by `officecli capabilities --json`."); + Console.WriteLine("Run `officecli help hwp` for rhwp bridge setup, examples, and support boundaries."); + Console.WriteLine("Binary .hwp mutations default to output=; safe in-place text replacement requires --in-place --backup --verify."); + return 0; + }, json); + }); + + return command; + } +} diff --git a/src/officecli/CommandBuilder.Compare.cs 
b/src/officecli/CommandBuilder.Compare.cs new file mode 100644 index 000000000..8e93af4bd --- /dev/null +++ b/src/officecli/CommandBuilder.Compare.cs @@ -0,0 +1,156 @@ +// Plan 84: Document Diff Workflow +using System.CommandLine; +using System.Text.Json.Nodes; +using OfficeCli.Core; +using OfficeCli.Handlers; + +namespace OfficeCli; + +static partial class CommandBuilder +{ + private static Command BuildCompareCommand(Option jsonOption) + { + var fileAArg = new Argument("fileA") { Description = "First document" }; + var fileBArg = new Argument("fileB") { Description = "Second document" }; + var modeOpt = new Option("--mode") { Description = "Diff mode: text, outline, table" }; + modeOpt.DefaultValueFactory = _ => "text"; + + var cmd = new Command("compare", "Compare two HWPX documents and show differences"); + cmd.Add(fileAArg); + cmd.Add(fileBArg); + cmd.Add(modeOpt); + cmd.Add(jsonOption); + + cmd.SetAction(result => { var json = result.GetValue(jsonOption); return SafeRun(() => + { + var fileA = result.GetValue(fileAArg)!; + var fileB = result.GetValue(fileBArg)!; + var mode = result.GetValue(modeOpt)!; + + using var handlerA = DocumentHandlerFactory.Open(fileA.FullName, editable: false); + using var handlerB = DocumentHandlerFactory.Open(fileB.FullName, editable: false); + + if (handlerA is not HwpxHandler hwpxA || handlerB is not HwpxHandler hwpxB) + throw new CliException("Compare is only supported for .hwpx files.") + { Code = "unsupported_type" }; + + var diff = CompareHwpx(hwpxA, hwpxB, mode); + + if (json) + Console.WriteLine(OutputFormatter.WrapEnvelope(diff.ToJsonString(OutputFormatter.PublicJsonOptions))); + else + Console.WriteLine(FormatDiffText(diff, mode)); + + return 0; + }, json); }); + + return cmd; + } + + private static JsonObject CompareHwpx(HwpxHandler a, HwpxHandler b, string mode) + { + var result = new JsonObject { ["mode"] = mode }; + + switch (mode.ToLowerInvariant()) + { + case "text": + { + var linesA = ExtractLines(a.ViewAsText()); 
+                var linesB = ExtractLines(b.ViewAsText());
+                result["diff"] = DiffLines(linesA, linesB);
+                break;
+            }
+            case "outline":
+            {
+                var linesA = ExtractLines(a.ViewAsOutline());
+                var linesB = ExtractLines(b.ViewAsOutline());
+                result["diff"] = DiffLines(linesA, linesB);
+                break;
+            }
+            case "table":
+            {
+                var tablesA = a.ViewAsTables();
+                var tablesB = b.ViewAsTables();
+                var linesA = ExtractLines(tablesA);
+                var linesB = ExtractLines(tablesB);
+                result["diff"] = DiffLines(linesA, linesB);
+                break;
+            }
+            default:
+                throw new CliException($"Unknown diff mode: {mode}. Available: text, outline, table")
+                { Code = "invalid_value" };
+        }
+
+        return result;
+    }
+
+    private static string[] ExtractLines(string text)
+        => text.Split('\n').Select(l =>
+        {
+            // Strip line numbers from ViewAsText output ("1. text" → "text")
+            var dot = l.IndexOf(". ");
+            if (dot > 0 && dot <= 5 && l[..dot].All(char.IsDigit))
+                return l[(dot + 2)..];
+            return l;
+        }).ToArray();
+
+    private static JsonArray DiffLines(string[] linesA, string[] linesB)
+    {
+        var diff = new JsonArray();
+
+        // Simple set-based diff: report lines unique to either side (order-insensitive)
+        var setA = new HashSet<string>(linesA);
+        var setB = new HashSet<string>(linesB);
+
+        foreach (var line in linesA)
+        {
+            if (!setB.Contains(line) && !string.IsNullOrWhiteSpace(line))
+                diff.Add(new JsonObject { ["status"] = "removed", ["text"] = line.Trim() });
+        }
+        foreach (var line in linesB)
+        {
+            if (!setA.Contains(line) && !string.IsNullOrWhiteSpace(line))
+                diff.Add(new JsonObject { ["status"] = "added", ["text"] = line.Trim() });
+        }
+
+        // Summary
+        var unchanged = linesA.Intersect(linesB).Count(l => !string.IsNullOrWhiteSpace(l));
+        diff.Insert(0, new JsonObject {
+            ["summary"] = $"added={diff.Count(d => d?["status"]?.GetValue<string>() == "added")}, removed={diff.Count(d => d?["status"]?.GetValue<string>() == "removed")}, unchanged={unchanged}"
+        });
+
+        return diff;
+    }
+
+    private static string FormatDiffText(JsonObject diff, string mode)
+    {
+        var sb = new System.Text.StringBuilder();
+        sb.AppendLine($"Diff mode: {mode}");
+        sb.AppendLine();
+
+        var diffArr = diff["diff"]?.AsArray();
+        if (diffArr == null || diffArr.Count == 0)
+        {
+            sb.AppendLine("(no differences)");
+            return sb.ToString().TrimEnd();
+        }
+
+        foreach (var item in diffArr)
+        {
+            if (item is not JsonObject obj) continue;
+            if (obj.ContainsKey("summary"))
+            {
+                sb.AppendLine(obj["summary"]!.GetValue<string>());
+                sb.AppendLine();
+                continue;
+            }
+            var status = obj["status"]?.GetValue<string>() ?? "";
+            var text = obj["text"]?.GetValue<string>() ?? "";
+            var prefix = status switch { "added" => "+ ", "removed" => "- ", _ => "  " };
+            sb.AppendLine($"{prefix}{text}");
+        }
+
+        return sb.ToString().TrimEnd();
+    }
+}
diff --git a/src/officecli/CommandBuilder.Help.Hwp.cs b/src/officecli/CommandBuilder.Help.Hwp.cs
new file mode 100644
index 000000000..12fa9c22d
--- /dev/null
+++ b/src/officecli/CommandBuilder.Help.Hwp.cs
@@ -0,0 +1,476 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.Text.Json.Nodes;
+using System.CommandLine;
+using OfficeCli.Core;
+using OfficeCli.Handlers.Hwp;
+
+namespace OfficeCli;
+
+static partial class CommandBuilder
+{
+    private static Command BuildHwpHelpCommand(Option<bool> jsonOption)
+    {
+        var command = new Command("hwp", "Show experimental HWP/rhwp setup, recipes, sidecar notes, and coverage policy");
+        command.Aliases.Add("rhwp");
+        command.Add(jsonOption);
+        command.Add(BuildHwpDoctorCommand(jsonOption));
+
+        command.SetAction(result =>
+        {
+            var json = result.GetValue(jsonOption);
+            return SafeRun(() =>
+            {
+                WriteHwpBridgeHelp("hwp", Console.Out, json);
+                return 0;
+            }, json);
+        });
+
+        return command;
+    }
+
+    private static Command BuildHwpDoctorCommand(Option<bool> jsonOption)
+    {
+        var command = new Command("doctor", "Check experimental HWP/rhwp environment readiness");
+        command.Add(jsonOption);
+
+        command.SetAction(result =>
+        {
+            var json = result.GetValue(jsonOption);
+            return SafeRun(() =>
+            {
+                var report = BuildHwpDoctorReport();
+                if (json)
+                {
+                    writerJson(Console.Out, report);
+                    return report["ok"]?.GetValue<bool>() == true ? 0 : 2;
+                }
+
+                WriteHwpDoctorText(Console.Out, report);
+                return report["ok"]?.GetValue<bool>() == true ? 0 : 2;
+            }, json);
+        });
+
+        return command;
+
+        static void writerJson(TextWriter writer, JsonObject report)
+        {
+            writer.WriteLine(OutputFormatter.WrapEnvelope(report.ToJsonString(OutputFormatter.PublicJsonOptions)));
+        }
+    }
+
+    private static bool WriteHwpBridgeHelp(string topic, TextWriter writer, bool json)
+    {
+        if (!string.Equals(topic, "hwp", StringComparison.OrdinalIgnoreCase)
+            && !string.Equals(topic, "rhwp", StringComparison.OrdinalIgnoreCase))
+            return false;
+
+        if (json)
+        {
+            var commands = HwpCapabilityJsonMapper.ToJsonArray([
+                "officecli capabilities --json",
+                "officecli create file.hwp --json",
+                "officecli view file.hwp text --json",
+                "officecli view file.hwp svg --page 1 --json",
+                "officecli view file.hwp fields --json",
+                "officecli view file.hwp field --field-name 회사명 --json",
+                "officecli set file.hwp /field --prop name=회사명 --prop value=리지 --prop output=out.hwp --json",
+                "officecli set file.hwp /text --prop find=마케팅 --prop value=브릿지 --prop output=out.hwp --json",
+                "officecli set file.hwp /text --prop find=마케팅 --prop value=브릿지 --in-place --backup --verify --json",
+                "officecli set file.hwp /table/cell --prop section=0 --prop parent-para=3 --prop control=0 --prop cell=0 --prop value=오피스셀 --prop output=out.hwp --json",
+                "officecli set file.hwpx /save-as-hwp --prop output=out.hwp --json"
+            ]);
+            var setup = HwpCapabilityJsonMapper.ToJsonArray([
+                "run ./dev-install.sh to install officecli with rhwp sidecars",
+                "optional: export OFFICECLI_HWP_ENGINE=rhwp-experimental",
+                "export OFFICECLI_RHWP_BIN=/path/to/rhwp",
+                "export OFFICECLI_RHWP_BRIDGE_PATH=/path/to/rhwp-officecli-bridge.dll",
+                "export OFFICECLI_RHWP_API_BIN=/path/to/rhwp-field-bridge"
+            ]);
+            var unsupported = HwpCapabilityJsonMapper.ToJsonArray([
+                "silently falling back to HWPX when the requested native .hwp operation is not ready",
+                "ungated writes when the required rhwp sidecar or safe-save readback is unavailable",
+                "claiming corpus-wide round-trip fidelity without fixture evidence"
+            ]);
+            var requiredEnv = new JsonArray();
+            requiredEnv.Add((JsonNode?)BuildEnvSpec("OFFICECLI_HWP_ENGINE", "rhwp-experimental", "optional override; packaged sidecars are auto-discovered when present"));
+            requiredEnv.Add((JsonNode?)BuildEnvSpec("OFFICECLI_RHWP_BIN", "/path/to/rhwp", "optional stock rhwp CLI fallback for text/SVG read-render"));
+            requiredEnv.Add((JsonNode?)BuildEnvSpec("OFFICECLI_RHWP_BRIDGE_PATH", "/path/to/rhwp-officecli-bridge.dll", "optional explicit C# bridge path invoked by OfficeCLI"));
+            requiredEnv.Add((JsonNode?)BuildEnvSpec("OFFICECLI_RHWP_API_BIN", "/path/to/rhwp-field-bridge", "optional explicit Rust rhwp API sidecar path for create/fields/text/table/export"));
+            var diagnostics = HwpCapabilityJsonMapper.ToJsonArray([
+                "officecli hwp doctor --json",
+                "officecli capabilities --json",
+                "officecli view file.hwp text --json"
+            ]);
+            var recipes = new JsonObject
+            {
+                ["createBlank"] = "officecli create file.hwp --json",
+                ["readText"] = "officecli view file.hwp text --json",
+                ["renderSvg"] = "officecli view file.hwp svg --page 1 --json",
+                ["listFields"] = "officecli view file.hwp fields --json",
+                ["readField"] = "officecli view file.hwp field --field-name 회사명 --json",
+                ["setField"] = "officecli set file.hwp /field --prop name=회사명 --prop value=리지 --prop output=out.hwp --json",
+                ["replaceText"] = "officecli set file.hwp /text --prop find=마케팅 --prop value=브릿지 --prop output=out.hwp --json",
+                ["replaceTextInPlace"] = "officecli set file.hwp /text --prop find=마케팅 --prop value=브릿지 --in-place --backup --verify --json",
+                ["setTableCell"] = "officecli set file.hwp /table/cell --prop section=0 --prop parent-para=3 --prop control=0 --prop cell=0 --prop value=오피스셀 --prop output=out.hwp --json",
+                ["saveAsHwp"] = "officecli set file.hwpx /save-as-hwp --prop output=out.hwp --json"
+            };
+            var policies = HwpCapabilityJsonMapper.ToJsonArray([
+                "default mutation mode writes to --prop output=",
+                "in-place text replacement is opt-in only and requires --in-place --backup --verify",
+                "never combine --in-place with --prop output=",
+                "run officecli hwp doctor --json before HWP work",
+                "verify outputs with text/SVG/Hancom evidence before relying on them"
+            ]);
+            var data = new JsonObject
+            {
+                ["schemaVersion"] = 1,
+                ["topic"] = "hwp-rhwp",
+                ["status"] = "experimental",
+                ["setup"] = setup,
+                ["requiredEnv"] = requiredEnv,
+                ["setupCommands"] = setup.DeepClone(),
+                ["commands"] = commands,
+                ["recipes"] = recipes,
+                ["diagnostics"] = diagnostics,
+                ["policies"] = policies,
+                ["unsupported"] = unsupported,
+                ["capabilityProbe"] = "officecli capabilities --json",
+                ["doctor"] = "officecli hwp doctor --json"
+            };
+            writer.WriteLine(OutputFormatter.WrapEnvelope(data.ToJsonString(OutputFormatter.PublicJsonOptions)));
+            return true;
+        }
+
+        writer.WriteLine("HWP / rhwp Bridge Help (experimental)");
+        writer.WriteLine();
+        writer.WriteLine("Setup:");
+        writer.WriteLine("  ./dev-install.sh");
+        writer.WriteLine("  # optional explicit overrides:");
+        writer.WriteLine("  export OFFICECLI_HWP_ENGINE=rhwp-experimental");
+        writer.WriteLine("  export OFFICECLI_RHWP_BIN=/path/to/rhwp");
+        writer.WriteLine("  export OFFICECLI_RHWP_BRIDGE_PATH=/path/to/rhwp-officecli-bridge.dll");
+        writer.WriteLine("  export OFFICECLI_RHWP_API_BIN=/path/to/rhwp-field-bridge");
+        writer.WriteLine();
+        writer.WriteLine("Probe:");
+        writer.WriteLine("  officecli hwp doctor --json");
+        writer.WriteLine("  officecli capabilities --json");
+        writer.WriteLine();
+        writer.WriteLine("Top-level help:");
+        writer.WriteLine("  officecli hwp");
+        writer.WriteLine("  officecli rhwp");
+        writer.WriteLine();
+        writer.WriteLine("Read/render:");
+        writer.WriteLine("  officecli create file.hwp --json");
+        writer.WriteLine("  officecli view file.hwp text --json");
+        writer.WriteLine("  officecli view file.hwp svg --page 1 --json");
+        writer.WriteLine("  officecli view file.hwp fields --json");
+        writer.WriteLine("  officecli view file.hwp field --field-name 회사명 --json");
+        writer.WriteLine();
+        writer.WriteLine("Mutation default writes a new output file:");
+        writer.WriteLine("  officecli set file.hwp /field --prop name=회사명 --prop value=리지 --prop output=out.hwp --json");
+        writer.WriteLine("  officecli set file.hwp /field --prop id=1584999796 --prop value=리지 --prop output=out.hwp --json");
+        writer.WriteLine("  officecli set file.hwp /text --prop find=마케팅 --prop value=브릿지 --prop output=out.hwp --json");
+        writer.WriteLine("  officecli set file.hwp /table/cell --prop section=0 --prop parent-para=3 --prop control=0 --prop cell=0 --prop value=오피스셀 --prop output=out.hwp --json");
+        writer.WriteLine("  officecli set file.hwpx /save-as-hwp --prop output=out.hwp --json");
+        writer.WriteLine();
+        writer.WriteLine("Safe in-place text replacement (experimental, creates backup + manifest first):");
+        writer.WriteLine("  officecli set file.hwp /text --prop find=마케팅 --prop value=브릿지 --in-place --backup --verify --json");
+        writer.WriteLine();
+        writer.WriteLine("Sidecar binaries used by the experimental bridge:");
+        writer.WriteLine("  rhwp-officecli-bridge   C# bridge for OfficeCLI HWP routing");
+        writer.WriteLine("  rhwp-field-bridge       Rust rhwp API sidecar for create/read/render/fields/text/table/export");
+        writer.WriteLine();
+        writer.WriteLine("Coverage policy:");
+        writer.WriteLine("  - operation-gated rhwp coverage; if rhwp can do it and sidecars expose it, OfficeCLI should wire it");
+        writer.WriteLine("  - experimental until fixture/corpus evidence promotes each operation");
+        writer.WriteLine("  - default mutations write to --prop output=");
+        writer.WriteLine("  - in-place text replacement requires --in-place --backup --verify");
+        writer.WriteLine("  - never combine --in-place with --prop output=");
+        writer.WriteLine("  - table cell mutation currently uses explicit rhwp coordinates; broader table discovery should be wired when rhwp exposes it");
+        writer.WriteLine("  - verify outputs with text/SVG/Hancom round-trip evidence before relying on them");
+        return true;
+    }
+
+    private static JsonObject BuildHwpDoctorReport()
+    {
+        var runtime = HwpRuntimeProbe.Probe();
+        var checks = new JsonArray();
+        checks.Add((JsonNode?)BuildRuntimeCheck(
+            "OFFICECLI_HWP_ENGINE",
+            runtime.EngineRequested || runtime.BridgeAvailable || runtime.ApiAvailable || runtime.RhwpAvailable,
+            Environment.GetEnvironmentVariable("OFFICECLI_HWP_ENGINE"),
+            "optional when rhwp sidecars are installed beside officecli",
+            "export OFFICECLI_HWP_ENGINE=rhwp-experimental"));
+        checks.Add((JsonNode?)BuildRuntimeCheck(
+            "rhwp-officecli-bridge",
+            runtime.BridgeAvailable,
+            runtime.BridgePath,
+            "C# bridge sidecar for existing-file read/render/mutation; not required for blank .hwp create",
+            "run ./dev-install.sh or export OFFICECLI_RHWP_BRIDGE_PATH=/path/to/rhwp-officecli-bridge"));
+        checks.Add((JsonNode?)BuildRuntimeCheck(
+            "rhwp-field-bridge",
+            runtime.ApiAvailable,
+            runtime.ApiPath,
+            "Rust rhwp API sidecar for .hwp create/mutate/export and direct read/render",
+            "run ./dev-install.sh or export OFFICECLI_RHWP_API_BIN=/path/to/rhwp-field-bridge"));
+        checks.Add((JsonNode?)BuildRuntimeCheck(
+            "read-render runtime",
+            runtime.ApiAvailable || runtime.RhwpAvailable,
+            runtime.ApiPath ?? runtime.RhwpPath,
+            "rhwp-field-bridge is preferred; stock rhwp is accepted only as read/render fallback",
+            "install rhwp-field-bridge beside officecli, or export OFFICECLI_RHWP_BIN=/path/to/rhwp"));
+
+        var operations = new JsonObject
+        {
+            ["read_text"] = BuildDoctorOperation(
+                runtime.ReadRenderAvailable,
+                "existing .hwp text extraction",
+                runtime.ReadRenderAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge or stock rhwp",
+                "officecli view file.hwp text --json"),
+            ["render_svg"] = BuildDoctorOperation(
+                runtime.ReadRenderAvailable,
+                "existing .hwp SVG/page render",
+                runtime.ReadRenderAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge or stock rhwp",
+                "officecli view file.hwp svg --page 1 --json"),
+            ["render_png"] = BuildDoctorOperation(
+                runtime.RenderPngAvailable,
+                "existing .hwp PNG/page render",
+                runtime.RenderPngAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with render-png support",
+                "officecli view file.hwp png --page 1 --out /tmp/hwp-png --json"),
+            ["export_pdf"] = BuildDoctorOperation(
+                runtime.ExportPdfAvailable,
+                "existing .hwp PDF export",
+                runtime.ExportPdfAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with export-pdf support",
+                "officecli view file.hwp pdf --page 1 --out out.pdf --json"),
+            ["export_markdown"] = BuildDoctorOperation(
+                runtime.ExportMarkdownAvailable,
+                "existing .hwp markdown export",
+                runtime.ExportMarkdownAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with export-markdown support",
+                "officecli view file.hwp markdown --json"),
+            ["document_info"] = BuildDoctorOperation(
+                runtime.DocumentInfoAvailable,
+                "existing .hwp document information",
+                runtime.DocumentInfoAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with document-info support",
+                "officecli view file.hwp info --json"),
+            ["diagnostics"] = BuildDoctorOperation(
+                runtime.DiagnosticsAvailable,
+                "existing .hwp provider diagnostics",
+                runtime.DiagnosticsAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with diagnostics support",
+                "officecli view file.hwp diagnostics --json"),
+            ["dump_controls"] = BuildDoctorOperation(
+                runtime.DumpControlsAvailable,
+                "existing .hwp full control/document dump diagnostics",
+                runtime.DumpControlsAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with dump-controls support",
+                "officecli view file.hwp dump --json"),
+            ["dump_pages"] = BuildDoctorOperation(
+                runtime.DumpPagesAvailable,
+                "existing .hwp page dump diagnostics",
+                runtime.DumpPagesAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with dump-pages support",
+                "officecli view file.hwp pages --page 1 --json"),
+            ["thumbnail"] = BuildDoctorOperation(
+                runtime.ThumbnailAvailable,
+                "existing .hwp thumbnail extraction",
+                runtime.ThumbnailAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with thumbnail support",
+                "officecli view file.hwp thumbnail --out thumb.png --json"),
+            ["mutate_output"] = BuildDoctorOperation(
+                runtime.MutationAvailable,
+                "aggregate .hwp output-first mutation runtime",
+                runtime.MutationAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge",
+                "officecli hwp doctor --json"),
+            ["list_fields"] = BuildDoctorOperation(
+                runtime.ListFieldsAvailable,
+                "existing .hwp field listing",
+                runtime.ListFieldsAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with list-fields support",
+                "officecli view file.hwp fields --json"),
+            ["read_field"] = BuildDoctorOperation(
+                runtime.ReadFieldAvailable,
+                "existing .hwp field read",
+                runtime.ReadFieldAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with get-field support",
+                "officecli view file.hwp field --field-name name --json"),
+            ["fill_field"] = BuildDoctorOperation(
+                runtime.FillFieldAvailable,
+                "existing .hwp output-first field fill",
+                runtime.FillFieldAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with set-field support",
+                "officecli set file.hwp /field --prop name=field --prop value=TEXT --prop output=out.hwp --json"),
+            ["replace_text"] = BuildDoctorOperation(
+                runtime.ReplaceTextAvailable,
+                "existing .hwp output-first text replacement",
+                runtime.ReplaceTextAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with replace-text support",
+                "officecli set file.hwp /text --prop find=OLD --prop value=NEW --prop output=out.hwp --json"),
+            ["insert_text"] = BuildDoctorOperation(
+                runtime.InsertTextAvailable,
+                "existing .hwp output-first body text insertion",
+                runtime.InsertTextAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with insert-text support",
+                "officecli add file.hwp /text --type paragraph --prop value=TEXT --prop output=out.hwp --json"),
+            ["read_table_cell"] = BuildDoctorOperation(
+                runtime.ReadTableCellAvailable,
+                "existing .hwp table cell read by rhwp coordinates",
+                runtime.ReadTableCellAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with get-cell-text support",
+                "officecli view file.hwp table-cell --section 0 --parent-para 3 --control 0 --cell 0 --cell-para 0 --json"),
+            ["scan_cells"] = BuildDoctorOperation(
+                runtime.ScanCellsAvailable,
+                "bounded .hwp table cell scan",
+                runtime.ScanCellsAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with scan-cells support",
+                "officecli view file.hwp tables --section 0 --json"),
+            ["set_table_cell"] = BuildDoctorOperation(
+                runtime.SetTableCellAvailable,
+                "existing .hwp output-first table cell mutation",
+                runtime.SetTableCellAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with set-cell-text support",
+                "officecli set file.hwp /table/cell --prop section=0 --prop parentPara=3 --prop control=0 --prop cell=0 --prop value=TEXT --prop output=out.hwp --json"),
+            ["convert_to_editable"] = BuildDoctorOperation(
+                runtime.ConvertToEditableAvailable,
+                "existing distribution/read-only .hwp -> editable .hwp conversion",
+                runtime.ConvertToEditableAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with convert-to-editable support",
+                "officecli set file.hwp /convert-to-editable --prop output=editable.hwp --json"),
+            ["native_read"] = BuildDoctorOperation(
+                runtime.NativeOpAvailable,
+                "rhwp native read/query API escape hatch",
+                runtime.NativeOpAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with native-op support",
+                "officecli view file.hwp native --op get-style-list --json"),
+            ["native_mutation"] = BuildDoctorOperation(
+                runtime.NativeOpAvailable,
+                "rhwp native output-first mutation API escape hatch",
+                runtime.NativeOpAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with native-op support",
+                "officecli set file.hwp /native-op --prop op=split-paragraph --prop paragraph=0 --prop offset=5 --prop output=out.hwp --json"),
+            ["create_blank"] = BuildDoctorOperation(
+                runtime.CreateBlankAvailable,
+                "new blank binary .hwp creation",
+                runtime.CreateBlankAvailable ? null : "requires rhwp-field-bridge",
+                "officecli create file.hwp --json"),
+            ["save_as_hwp"] = BuildDoctorOperation(
+                runtime.SaveAsHwpAvailable,
+                ".hwpx/.hwp save-as-hwp bridge export",
+                runtime.SaveAsHwpAvailable ? null : "requires rhwp-officecli-bridge plus rhwp-field-bridge with save-as-hwp support",
+                "officecli set file.hwpx /save-as-hwp --prop output=out.hwp --json")
+        };
+
+        var ok = runtime.ReadRenderAvailable || runtime.MutationAvailable || runtime.CreateBlankAvailable;
+        return new JsonObject
+        {
+            ["schemaVersion"] = 1,
+            ["topic"] = "hwp-rhwp-doctor",
+            ["ok"] = ok,
+            ["autoDiscovery"] = new JsonObject
+            {
+                ["bridgePath"] = runtime.BridgePath,
+                ["apiPath"] = runtime.ApiPath,
+                ["rhwpPath"] = runtime.RhwpPath,
+                ["readRenderAvailable"] = runtime.ReadRenderAvailable,
+                ["mutationAvailable"] = runtime.MutationAvailable,
+                ["createBlankAvailable"] = runtime.CreateBlankAvailable,
+                ["listFieldsAvailable"] = runtime.ListFieldsAvailable,
+                ["readFieldAvailable"] = runtime.ReadFieldAvailable,
+                ["fillFieldAvailable"] = runtime.FillFieldAvailable,
+                ["replaceTextAvailable"] = runtime.ReplaceTextAvailable,
+                ["insertTextAvailable"] = runtime.InsertTextAvailable,
+                ["renderPngAvailable"] = runtime.RenderPngAvailable,
+                ["exportPdfAvailable"] = runtime.ExportPdfAvailable,
+                ["exportMarkdownAvailable"] = runtime.ExportMarkdownAvailable,
+                ["documentInfoAvailable"] = runtime.DocumentInfoAvailable,
+                ["diagnosticsAvailable"] = runtime.DiagnosticsAvailable,
+                ["dumpControlsAvailable"] = runtime.DumpControlsAvailable,
+                ["dumpPagesAvailable"] = runtime.DumpPagesAvailable,
+                ["thumbnailAvailable"] = runtime.ThumbnailAvailable,
+                ["readTableCellAvailable"] = runtime.ReadTableCellAvailable,
+                ["scanCellsAvailable"] = runtime.ScanCellsAvailable,
+                ["setTableCellAvailable"] = runtime.SetTableCellAvailable,
+                ["convertToEditableAvailable"] = runtime.ConvertToEditableAvailable,
+                ["nativeOpAvailable"] = runtime.NativeOpAvailable,
+                ["saveAsHwpAvailable"] = runtime.SaveAsHwpAvailable,
+                ["apiCommands"] = HwpCapabilityJsonMapper.ToJsonArray(runtime.ApiCommands.Order(StringComparer.Ordinal).ToArray())
+            },
+            ["operations"] = operations,
+            ["checks"] = checks,
+            ["nextCommand"] = ok ? "officecli capabilities --json" : "officecli help hwp",
+            ["capabilityProbe"] = "officecli capabilities --json"
+        };
+    }
+
+    private static JsonObject BuildEnvSpec(string name, string example, string purpose)
+        => new()
+        {
+            ["name"] = name,
+            ["example"] = example,
+            ["purpose"] = purpose
+        };
+
+    private static JsonObject BuildEnvCheck(string name, string? expectedValue, bool requireFile, string hint)
+    {
+        var value = Environment.GetEnvironmentVariable(name);
+        var isSet = !string.IsNullOrWhiteSpace(value);
+        var expectedMatches = expectedValue is null
+            || string.Equals(value, expectedValue, StringComparison.OrdinalIgnoreCase);
+        var fileExists = !requireFile || (isSet && File.Exists(value));
+        var ok = isSet && expectedMatches && fileExists;
+
+        return new JsonObject
+        {
+            ["name"] = name,
+            ["ok"] = ok,
+            ["isSet"] = isSet,
+            ["value"] = value,
+            ["expected"] = expectedValue,
+            ["requiresExistingFile"] = requireFile,
+            ["fileExists"] = requireFile ? fileExists : null,
+            ["hint"] = ok ? null : hint
+        };
+    }
+
+    private static JsonObject BuildRuntimeCheck(
+        string name,
+        bool ok,
+        string? value,
+        string detail,
+        string hint)
+        => new()
+        {
+            ["name"] = name,
+            ["ok"] = ok,
+            ["isSet"] = !string.IsNullOrWhiteSpace(value),
+            ["value"] = value,
+            ["detail"] = detail,
+            ["hint"] = ok ? null : hint
+        };
+
+    private static JsonObject BuildDoctorOperation(
+        bool ready,
+        string detail,
+        string? blockedBy,
+        string example)
+        => new()
+        {
+            ["ready"] = ready,
+            ["detail"] = detail,
+            ["blockedBy"] = blockedBy,
+            ["example"] = example
+        };
+
+    private static void WriteHwpDoctorText(TextWriter writer, JsonObject report)
+    {
+        var ok = report["ok"]?.GetValue<bool>() == true;
+        writer.WriteLine(ok ? "HWP/rhwp doctor: OK" : "HWP/rhwp doctor: NOT READY");
+        writer.WriteLine();
+        foreach (var check in report["checks"]!.AsArray().OfType<JsonObject>())
+        {
+            var mark = check["ok"]?.GetValue<bool>() == true ? "ok" : "missing";
+            writer.WriteLine($"  {mark}: {check["name"]?.GetValue<string>()}");
+            var hint = check["hint"]?.GetValue<string>();
+            if (!string.IsNullOrWhiteSpace(hint))
+                writer.WriteLine($"      {hint}");
+        }
+        writer.WriteLine();
+        foreach (var op in report["operations"]!.AsObject())
+        {
+            var opData = op.Value!.AsObject();
+            var mark = opData["ready"]?.GetValue<bool>() == true ? "ready" : "blocked";
+            writer.WriteLine($"  {mark}: {op.Key}");
+            var blockedBy = opData["blockedBy"]?.GetValue<string>();
+            if (!string.IsNullOrWhiteSpace(blockedBy))
+                writer.WriteLine($"      {blockedBy}");
+        }
+        writer.WriteLine();
+        writer.WriteLine($"Next: {report["nextCommand"]?.GetValue<string>()}");
+    }
+}
diff --git a/src/officecli/CommandBuilder.Help.cs b/src/officecli/CommandBuilder.Help.cs
index de203d57e..acdc30435 100644
--- a/src/officecli/CommandBuilder.Help.cs
+++ b/src/officecli/CommandBuilder.Help.cs
@@ -313,6 +313,9 @@ private static int RunHelp(string? format, string? verb, string? element, bool j
             || EarlyDispatchHelp.ContainsKey(format)
             || string.Equals(format, "skill", StringComparison.OrdinalIgnoreCase)))
         {
+            if (WriteHwpBridgeHelp(format, Console.Out, json))
+                return 0;
+
             if (WriteEarlyDispatchUsage(format, Console.Out))
                 return 0;
diff --git a/src/officecli/CommandBuilder.Import.cs b/src/officecli/CommandBuilder.Import.cs
index 01d6175f9..43aff39da 100644
--- a/src/officecli/CommandBuilder.Import.cs
+++ b/src/officecli/CommandBuilder.Import.cs
@@ -120,11 +120,13 @@ private static Command BuildImportCommand(Option<bool> jsonOption)
 
 private static Command BuildCreateCommand(Option<bool> jsonOption)
 {
-    var createFileArg = new Argument<string>("file") { Description = "Output file path (.docx, .xlsx, .pptx)" };
-    var createTypeOpt = new Option<string>("--type") { Description = "Document type (docx, xlsx, pptx) — optional, inferred from file extension" };
+    var createFileArg = new Argument<string>("file") { Description = "Output file path (.docx, .xlsx, .pptx, .hwpx, experimental .hwp)" };
+    var createTypeOpt = new Option<string>("--type") {
Description = "Document type (docx, xlsx, pptx, hwpx, hwp) — optional, inferred from file extension" }; var createForceOpt = new Option("--force") { Description = "Overwrite an existing file." }; var createLocaleOpt = new Option("--locale") { Description = "Locale tag (e.g. zh-CN, ja, ko, ar, he) — sets per-script default fonts in docDefaults. Without it, host application's UI-locale fallback applies. Currently only honored for .docx." }; var createMinimalOpt = new Option("--minimal") { Description = "(.docx only) Skip Word's Normal.dotm-style baseline (Calibri 11pt + Normal style + theme1.xml) and emit a raw OOXML-spec docx instead. Use for testing edge cases or producing maximally compact output. Without this flag, the doc carries Word-aligned defaults so it renders identically in Word, LibreOffice, and the cli preview." }; + var fromMarkdownOpt = new Option("--from-markdown") { Description = "Import content from a Markdown file (.md) into the new document (hwpx only)" }; + var alignOpt = new Option("--align") { Description = "Text alignment for imported content: justify (default), left, center, right" }; var createCommand = new Command("create", "Create a blank Office document"); createCommand.Aliases.Add("new"); createCommand.Add(createFileArg); @@ -132,6 +134,8 @@ private static Command BuildCreateCommand(Option jsonOption) createCommand.Add(createForceOpt); createCommand.Add(createLocaleOpt); createCommand.Add(createMinimalOpt); + createCommand.Add(fromMarkdownOpt); + createCommand.Add(alignOpt); createCommand.Add(jsonOption); createCommand.SetAction(result => { var json = result.GetValue(jsonOption); return SafeRun(() => @@ -141,6 +145,7 @@ private static Command BuildCreateCommand(Option jsonOption) var force = result.GetValue(createForceOpt); var locale = result.GetValue(createLocaleOpt); var minimal = result.GetValue(createMinimalOpt); + var fromMarkdown = result.GetValue(fromMarkdownOpt); // If file has no extension but --type is provided, append it if 
(!string.IsNullOrEmpty(type) && string.IsNullOrEmpty(Path.GetExtension(file))) @@ -177,6 +182,21 @@ private static Command BuildCreateCommand(Option jsonOption) } OfficeCli.BlankDocCreator.Create(file, locale, minimal); + + // Plan 85: Import Markdown content into the new document (hwpx only) + if (fromMarkdown != null) + { + if (!Path.GetExtension(file).Equals(".hwpx", StringComparison.OrdinalIgnoreCase)) + throw new CliException("--from-markdown is only supported for .hwpx files.") + { Code = "unsupported_type" }; + + var mdContent = File.ReadAllText(fromMarkdown.FullName, Encoding.UTF8); + var align = result.GetValue(alignOpt); + using var handler = new OfficeCli.Handlers.HwpxHandler(Path.GetFullPath(file), editable: true); + var blockCount = handler.ImportMarkdown(mdContent, align); + if (!json) + Console.WriteLine($"Imported {blockCount} blocks from {fromMarkdown.Name}"); + } var fullCreatedPath = Path.GetFullPath(file); // Best-effort: auto-start a short-lived resident process so @@ -221,7 +241,7 @@ private static Command BuildCreateCommand(Option jsonOption) private static Command BuildMergeCommand(Option jsonOption) { - var mergeTemplateArg = new Argument("template") { Description = "Template file path (.docx, .xlsx, .pptx) with {{key}} placeholders" }; + var mergeTemplateArg = new Argument("template") { Description = "Template file path (.docx, .xlsx, .pptx, .hwpx) with {{key}} placeholders" }; var mergeOutputArg = new Argument("output") { Description = "Output file path" }; var mergeDataOpt = new Option("--data") { Description = "JSON data or path to .json file", Required = true }; var mergeCommand = new Command("merge", "Merge template with JSON data, replacing {{key}} placeholders"); diff --git a/src/officecli/CommandBuilder.Schema.cs b/src/officecli/CommandBuilder.Schema.cs new file mode 100644 index 000000000..621f976f4 --- /dev/null +++ b/src/officecli/CommandBuilder.Schema.cs @@ -0,0 +1,154 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// 
SPDX-License-Identifier: Apache-2.0 + +using System.CommandLine; +using System.Text.Json.Nodes; +using OfficeCli.Core; +using OfficeCli.Help; + +namespace OfficeCli; + +static partial class CommandBuilder +{ + private static Command BuildSchemaCommand(Option jsonOption) + { + var command = new Command("schema", "List and validate embedded OfficeCLI help schemas"); + command.Add(BuildSchemaListCommand(jsonOption)); + command.Add(BuildSchemaValidateCommand(jsonOption)); + return command; + } + + private static Command BuildSchemaListCommand(Option jsonOption) + { + var formatOption = new Option("--format") + { + Description = "Restrict to one format: docx, xlsx, pptx, hwpx, or hwp", + }; + var command = new Command("list", "List embedded schema-help entries"); + command.Add(formatOption); + command.Add(jsonOption); + + command.SetAction(result => + { + var json = result.GetValue(jsonOption); + return SafeRun(() => + { + var format = result.GetValue(formatOption); + var formats = string.IsNullOrWhiteSpace(format) + ? SchemaHelpLoader.ListFormats() + : new[] { SchemaHelpLoader.NormalizeFormat(format!) 
}; + var entries = BuildSchemaListEntries(formats); + + if (json) + { + var data = new JsonObject + { + ["schemaVersion"] = 1, + ["formats"] = BuildFormatArray(formats), + ["entries"] = entries, + }; + Console.WriteLine(OutputFormatter.WrapEnvelope(data.ToJsonString(OutputFormatter.PublicJsonOptions))); + } + else + { + foreach (var entry in entries) + { + Console.WriteLine($"{entry!["format"]!.GetValue()} {entry["element"]!.GetValue()}"); + } + } + return 0; + }, json); + }); + return command; + } + + private static Command BuildSchemaValidateCommand(Option jsonOption) + { + var formatOption = new Option("--format") + { + Description = "Restrict validation to one format: docx, xlsx, pptx, hwpx, or hwp", + }; + var command = new Command("validate", "Parse embedded schemas and report load errors"); + command.Add(formatOption); + command.Add(jsonOption); + + command.SetAction(result => + { + var json = result.GetValue(jsonOption); + return SafeRun(() => + { + var format = result.GetValue(formatOption); + var formats = string.IsNullOrWhiteSpace(format) + ? SchemaHelpLoader.ListFormats() + : new[] { SchemaHelpLoader.NormalizeFormat(format!) }; + var errors = new JsonArray(); + var checkedCount = 0; + + foreach (var fmt in formats) + { + foreach (var element in SchemaHelpLoader.ListElements(fmt)) + { + checkedCount++; + try + { + using var _ = SchemaHelpLoader.LoadSchema(fmt, element); + } + catch (Exception ex) + { + errors.Add((JsonNode?)new JsonObject + { + ["format"] = fmt, + ["element"] = element, + ["message"] = ex.Message, + }); + } + } + } + + if (json) + { + var data = new JsonObject + { + ["schemaVersion"] = 1, + ["checked"] = checkedCount, + ["ok"] = errors.Count == 0, + ["errors"] = errors, + }; + Console.WriteLine(OutputFormatter.WrapEnvelope(data.ToJsonString(OutputFormatter.PublicJsonOptions))); + } + else + { + Console.WriteLine(errors.Count == 0 + ? 
$"Schema validation OK ({checkedCount} entries)" + : $"Schema validation failed ({errors.Count} errors / {checkedCount} entries)"); + } + return errors.Count == 0 ? 0 : 1; + }, json); + }); + return command; + } + + private static JsonArray BuildSchemaListEntries(IEnumerable formats) + { + var entries = new JsonArray(); + foreach (var fmt in formats) + { + foreach (var element in SchemaHelpLoader.ListElements(fmt)) + { + entries.Add((JsonNode?)new JsonObject + { + ["format"] = fmt, + ["element"] = element, + }); + } + } + return entries; + } + + private static JsonArray BuildFormatArray(IEnumerable formats) + { + var array = new JsonArray(); + foreach (var format in formats) array.Add((JsonNode?)JsonValue.Create(format)); + return array; + } +} diff --git a/src/officecli/CommandBuilder.Set.Hwp.cs b/src/officecli/CommandBuilder.Set.Hwp.cs new file mode 100644 index 000000000..73542f190 --- /dev/null +++ b/src/officecli/CommandBuilder.Set.Hwp.cs @@ -0,0 +1,383 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +using OfficeCli.Core; +using OfficeCli.Handlers.Hwp; + +namespace OfficeCli; + +static partial class CommandBuilder +{ + private static int HandleHwpFieldSet( + string inputPath, + HwpFormat format, + Dictionary properties, + bool json) + { + var fieldName = FirstValue(properties, "name", "field", "field-name"); + var fieldIdRaw = FirstValue(properties, "id", "field-id", "fieldId"); + var value = FirstValue(properties, "value", "text"); + var output = FirstValue(properties, "output", "out"); + if ((string.IsNullOrWhiteSpace(fieldName) && string.IsNullOrWhiteSpace(fieldIdRaw)) + || value == null || string.IsNullOrWhiteSpace(output)) + { + var message = "HWP field set requires --prop name= or --prop id=, plus --prop value= --prop output=."; + if (json) Console.WriteLine(OutputFormatter.WrapEnvelopeError(message)); + else Console.Error.WriteLine(message); + return 1; + } + if (!string.IsNullOrWhiteSpace(fieldIdRaw) && 
!int.TryParse(fieldIdRaw, out _)) + { + var message = $"Invalid HWP field id '{fieldIdRaw}'."; + if (json) Console.WriteLine(OutputFormatter.WrapEnvelopeError(message)); + else Console.Error.WriteLine(message); + return 1; + } + + var outputPath = Path.GetFullPath(output); + var outputDir = Path.GetDirectoryName(outputPath); + if (!string.IsNullOrEmpty(outputDir)) Directory.CreateDirectory(outputDir); + + var formatKey = format == HwpFormat.Hwp + ? HwpCapabilityConstants.FormatHwp + : HwpCapabilityConstants.FormatHwpx; + var engine = HwpEngineSelector.GetEngine(formatKey, HwpCapabilityConstants.OperationFillField); + var nameFields = string.IsNullOrWhiteSpace(fieldName) + ? new Dictionary<string, string>() + : new Dictionary<string, string> { [fieldName] = value }; + var idFields = string.IsNullOrWhiteSpace(fieldIdRaw) + ? new Dictionary<int, string>() + : new Dictionary<int, string> { [int.Parse(fieldIdRaw)] = value }; + var request = new HwpFillFieldRequest( + format, + inputPath, + outputPath, + nameFields, + json) + { + FieldIds = idFields + }; + var result = engine.FillFieldAsync(request, CancellationToken.None).GetAwaiter().GetResult(); + var fieldLabel = !string.IsNullOrWhiteSpace(fieldName) ? 
fieldName : $"#{fieldIdRaw}"; + + if (json) + { + var envelope = new System.Text.Json.Nodes.JsonObject + { + ["success"] = true, + ["message"] = $"Updated HWP field '{fieldLabel}' -> {result.OutputPath}", + ["data"] = new System.Text.Json.Nodes.JsonObject + { + ["outputPath"] = result.OutputPath, + ["engine"] = result.Engine, + ["engineVersion"] = result.EngineVersion, + ["evidence"] = HwpCapabilityJsonMapper.ToJsonArray(result.Evidence) + }, + ["warnings"] = HwpCapabilityJsonMapper.ToJsonArray(result.Warnings) + }; + Console.WriteLine(envelope.ToJsonString(OutputFormatter.PublicJsonOptions)); + } + else + { + Console.WriteLine($"Updated HWP field '{fieldLabel}' -> {result.OutputPath}"); + foreach (var warning in result.Warnings) + Console.Error.WriteLine($"WARNING: {warning}"); + } + return 0; + } + + private static int HandleHwpTextReplace( + string inputPath, + HwpFormat format, + Dictionary<string, string> properties, + bool json, + bool inPlace, + bool backup, + bool verify) + { + var query = FirstValue(properties, "find", "query", "old"); + var value = FirstValue(properties, "value", "text", "new"); + var output = FirstValue(properties, "output", "out"); + var mode = FirstValue(properties, "mode") ?? "one"; + var caseSensitiveRaw = FirstValue(properties, "case-sensitive", "caseSensitive") ?? "false"; + if (string.IsNullOrEmpty(query) || value == null || (!inPlace && string.IsNullOrWhiteSpace(output))) + { + var message = "HWP text replace requires --prop find= --prop value= plus --prop output= or --in-place."; + if (json) Console.WriteLine(OutputFormatter.WrapEnvelopeError(message)); + else Console.Error.WriteLine(message); + return 1; + } + var formatKey = format == HwpFormat.Hwp + ? 
HwpCapabilityConstants.FormatHwp + : HwpCapabilityConstants.FormatHwpx; + if (inPlace && !string.IsNullOrWhiteSpace(output)) + return WriteHwpTextReplacePolicyError( + json, + formatKey, + "hwp_in_place_output_conflict", + "Use either --in-place or --prop output=, not both.", + "Remove --prop output or remove --in-place.", + "officecli help hwp"); + if (inPlace && !backup) + return WriteHwpTextReplacePolicyError( + json, + formatKey, + "hwp_in_place_requires_backup", + "HWP in-place text replacement requires --backup.", + "Add --backup or use --prop output= instead.", + "officecli help hwp"); + if (inPlace && !verify) + return WriteHwpTextReplacePolicyError( + json, + formatKey, + "hwp_in_place_requires_verify", + "HWP in-place text replacement requires --verify.", + "Add --verify or use --prop output= instead.", + "officecli help hwp"); + + var outputPath = inPlace ? Path.GetFullPath(inputPath) : Path.GetFullPath(output!); + var outputDir = Path.GetDirectoryName(outputPath); + if (!string.IsNullOrEmpty(outputDir)) Directory.CreateDirectory(outputDir); + + var engine = HwpEngineSelector.GetEngine(formatKey, HwpCapabilityConstants.OperationReplaceText); + var request = new HwpReplaceTextRequest( + format, + inputPath, + outputPath, + query, + value, + mode, + bool.TryParse(caseSensitiveRaw, out var caseSensitive) && caseSensitive, + inPlace, + backup, + verify, + json); + var result = engine.ReplaceTextAsync(request, CancellationToken.None).GetAwaiter().GetResult(); + + if (json) + { + var envelope = new System.Text.Json.Nodes.JsonObject + { + ["success"] = true, + ["message"] = $"Replaced HWP text '{query}' -> {result.OutputPath}", + ["data"] = new System.Text.Json.Nodes.JsonObject + { + ["outputPath"] = result.OutputPath, + ["engine"] = result.Engine, + ["engineVersion"] = result.EngineVersion, + ["evidence"] = HwpCapabilityJsonMapper.ToJsonArray(result.Evidence), + ["transaction"] = result.Transaction?.DeepClone() + }, + ["warnings"] = 
HwpCapabilityJsonMapper.ToJsonArray(result.Warnings) + }; + Console.WriteLine(envelope.ToJsonString(OutputFormatter.PublicJsonOptions)); + } + else + { + Console.WriteLine($"Replaced HWP text '{query}' -> {result.OutputPath}"); + foreach (var warning in result.Warnings) + Console.Error.WriteLine($"WARNING: {warning}"); + } + return 0; + } + + private static int WriteHwpTextReplacePolicyError( + bool json, + string format, + string code, + string message, + string suggestion, + string nextCommand) + { + if (json) + { + var envelope = new System.Text.Json.Nodes.JsonObject + { + ["success"] = false, + ["error"] = new System.Text.Json.Nodes.JsonObject + { + ["error"] = message, + ["code"] = code, + ["suggestion"] = suggestion, + ["help"] = "officecli help hwp", + ["format"] = format, + ["operation"] = HwpCapabilityConstants.OperationReplaceText, + ["engine"] = HwpCapabilityConstants.EngineRhwpBridge, + ["engineMode"] = HwpCapabilityConstants.ModeExperimental, + ["nextCommand"] = nextCommand + } + }; + Console.WriteLine(envelope.ToJsonString(OutputFormatter.PublicJsonOptions)); + } + else + { + Console.Error.WriteLine(message); + Console.Error.WriteLine($"Hint: {suggestion}"); + } + return 1; + } + + private static int HandleHwpSaveAsHwp( + string inputPath, + HwpFormat format, + Dictionary<string, string> properties, + bool json) + { + var output = FirstValue(properties, "output", "out"); + if (string.IsNullOrWhiteSpace(output)) + { + var message = "HWP save-as-hwp requires --prop output=."; + if (json) Console.WriteLine(OutputFormatter.WrapEnvelopeError(message)); + else Console.Error.WriteLine(message); + return 1; + } + + var outputPath = Path.GetFullPath(output); + var outputDir = Path.GetDirectoryName(outputPath); + if (!string.IsNullOrEmpty(outputDir)) Directory.CreateDirectory(outputDir); + + var formatKey = format == HwpFormat.Hwp + ? 
HwpCapabilityConstants.FormatHwp + : HwpCapabilityConstants.FormatHwpx; + var engine = HwpEngineSelector.GetEngine(formatKey, HwpCapabilityConstants.OperationSaveAsHwp); + var request = new HwpSaveAsHwpRequest(format, inputPath, outputPath, json); + var result = engine.SaveAsHwpAsync(request, CancellationToken.None).GetAwaiter().GetResult(); + + if (json) + { + var envelope = new System.Text.Json.Nodes.JsonObject + { + ["success"] = true, + ["message"] = $"Saved HWP output -> {result.OutputPath}", + ["data"] = new System.Text.Json.Nodes.JsonObject + { + ["outputPath"] = result.OutputPath, + ["engine"] = result.Engine, + ["engineVersion"] = result.EngineVersion, + ["evidence"] = HwpCapabilityJsonMapper.ToJsonArray(result.Evidence) + }, + ["warnings"] = HwpCapabilityJsonMapper.ToJsonArray(result.Warnings) + }; + Console.WriteLine(envelope.ToJsonString(OutputFormatter.PublicJsonOptions)); + } + else + { + Console.WriteLine($"Saved HWP output -> {result.OutputPath}"); + foreach (var warning in result.Warnings) + Console.Error.WriteLine($"WARNING: {warning}"); + } + return 0; + } + + private static int HandleHwpConvertToEditable( + string inputPath, + HwpFormat format, + Dictionary<string, string> properties, + bool json) + { + var output = FirstValue(properties, "output", "out"); + if (string.IsNullOrWhiteSpace(output)) + { + var message = "HWP convert-to-editable requires --prop output=."; + if (json) Console.WriteLine(OutputFormatter.WrapEnvelopeError(message)); + else Console.Error.WriteLine(message); + return 1; + } + + var outputPath = Path.GetFullPath(output); + var outputDir = Path.GetDirectoryName(outputPath); + if (!string.IsNullOrEmpty(outputDir)) Directory.CreateDirectory(outputDir); + + var formatKey = format == HwpFormat.Hwp + ? 
HwpCapabilityConstants.FormatHwp + : HwpCapabilityConstants.FormatHwpx; + var engine = HwpEngineSelector.GetEngine(formatKey, HwpCapabilityConstants.OperationConvertToEditable); + var request = new HwpConvertToEditableRequest(format, inputPath, outputPath, json); + var result = engine.ConvertToEditableAsync(request, CancellationToken.None).GetAwaiter().GetResult(); + + if (json) + { + var envelope = new System.Text.Json.Nodes.JsonObject + { + ["success"] = true, + ["message"] = $"Converted HWP to editable output -> {result.OutputPath}", + ["data"] = new System.Text.Json.Nodes.JsonObject + { + ["outputPath"] = result.OutputPath, + ["engine"] = result.Engine, + ["engineVersion"] = result.EngineVersion, + ["evidence"] = HwpCapabilityJsonMapper.ToJsonArray(result.Evidence) + }, + ["warnings"] = HwpCapabilityJsonMapper.ToJsonArray(result.Warnings) + }; + Console.WriteLine(envelope.ToJsonString(OutputFormatter.PublicJsonOptions)); + } + else + { + Console.WriteLine($"Converted HWP to editable output -> {result.OutputPath}"); + foreach (var warning in result.Warnings) + Console.Error.WriteLine($"WARNING: {warning}"); + } + return 0; + } + + private static int HandleHwpNativeMutation( + string inputPath, + HwpFormat format, + Dictionary<string, string> properties, + bool json) + { + var op = FirstValue(properties, "op", "operation"); + var output = FirstValue(properties, "output", "out"); + if (string.IsNullOrWhiteSpace(op) || string.IsNullOrWhiteSpace(output)) + { + var message = "HWP native-op requires --prop op= --prop output=."; + if (json) Console.WriteLine(OutputFormatter.WrapEnvelopeError(message)); + else Console.Error.WriteLine(message); + return 1; + } + + var outputPath = Path.GetFullPath(output); + var outputDir = Path.GetDirectoryName(outputPath); + if (!string.IsNullOrEmpty(outputDir)) Directory.CreateDirectory(outputDir); + + var args = new Dictionary<string, string>(properties, StringComparer.OrdinalIgnoreCase); + args.Remove("op"); + args.Remove("operation"); + args.Remove("output"); + 
args.Remove("out"); + + var formatKey = format == HwpFormat.Hwp + ? HwpCapabilityConstants.FormatHwp + : HwpCapabilityConstants.FormatHwpx; + var engine = HwpEngineSelector.GetEngine(formatKey, HwpCapabilityConstants.OperationNativeMutation); + var request = new HwpNativeMutationRequest(format, inputPath, outputPath, op, args, json); + var result = engine.NativeMutationAsync(request, CancellationToken.None).GetAwaiter().GetResult(); + + if (json) + { + var envelope = new System.Text.Json.Nodes.JsonObject + { + ["success"] = true, + ["message"] = $"Ran HWP native-op '{op}' -> {result.OutputPath}", + ["data"] = new System.Text.Json.Nodes.JsonObject + { + ["operation"] = op, + ["outputPath"] = result.OutputPath, + ["engine"] = result.Engine, + ["engineVersion"] = result.EngineVersion, + ["evidence"] = HwpCapabilityJsonMapper.ToJsonArray(result.Evidence) + }, + ["warnings"] = HwpCapabilityJsonMapper.ToJsonArray(result.Warnings) + }; + Console.WriteLine(envelope.ToJsonString(OutputFormatter.PublicJsonOptions)); + } + else + { + Console.WriteLine($"Ran HWP native-op '{op}' -> {result.OutputPath}"); + foreach (var warning in result.Warnings) + Console.Error.WriteLine($"WARNING: {warning}"); + } + return 0; + } +} diff --git a/src/officecli/CommandBuilder.Set.HwpTable.cs b/src/officecli/CommandBuilder.Set.HwpTable.cs new file mode 100644 index 000000000..1d6ea6545 --- /dev/null +++ b/src/officecli/CommandBuilder.Set.HwpTable.cs @@ -0,0 +1,103 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +using OfficeCli.Core; +using OfficeCli.Handlers.Hwp; + +namespace OfficeCli; + +static partial class CommandBuilder +{ + private static int HandleHwpTableCellSet( + string inputPath, + HwpFormat format, + Dictionary<string, string> properties, + bool json) + { + var output = FirstValue(properties, "output", "out"); + var value = FirstValue(properties, "value", "text"); + if (string.IsNullOrWhiteSpace(output) || value == null) + return HwpTableCellError("HWP 
table cell set requires --prop value= --prop output=.", json); + if (!TryReadInt(properties, out var section, "section", "sec") + || !TryReadInt(properties, out var parentPara, "parent-para", "parentParagraph", "paragraph", "para") + || !TryReadInt(properties, out var control, "control", "control-index", "controlIndex") + || !TryReadInt(properties, out var cell, "cell", "cell-index", "cellIndex")) + { + return HwpTableCellError( + "HWP table cell set requires numeric section, parent-para, control, and cell props.", + json); + } + + var cellPara = TryReadInt(properties, out var parsedCellPara, "cell-para", "cellParagraph", "cell-paragraph") + ? parsedCellPara + : 0; + var offset = TryReadInt(properties, out var parsedOffset, "offset") ? parsedOffset : 0; + int? count = TryReadInt(properties, out var parsedCount, "count") ? parsedCount : null; + var outputPath = Path.GetFullPath(output); + var outputDir = Path.GetDirectoryName(outputPath); + if (!string.IsNullOrEmpty(outputDir)) Directory.CreateDirectory(outputDir); + + var formatKey = format == HwpFormat.Hwp + ? 
HwpCapabilityConstants.FormatHwp + : HwpCapabilityConstants.FormatHwpx; + var engine = HwpEngineSelector.GetEngine( + formatKey, + HwpCapabilityConstants.OperationSetTableCell); + var request = new HwpTableCellSetRequest( + format, + inputPath, + outputPath, + section, + parentPara, + control, + cell, + cellPara, + offset, + count, + value, + json); + var result = engine.SetTableCellAsync(request, CancellationToken.None).GetAwaiter().GetResult(); + + if (json) + { + var envelope = new System.Text.Json.Nodes.JsonObject + { + ["success"] = true, + ["message"] = $"Updated {formatKey.ToUpperInvariant()} table cell ({parentPara},{control},{cell}) -> {result.OutputPath}", + ["data"] = new System.Text.Json.Nodes.JsonObject + { + ["outputPath"] = result.OutputPath, + ["engine"] = result.Engine, + ["engineVersion"] = result.EngineVersion, + ["evidence"] = HwpCapabilityJsonMapper.ToJsonArray(result.Evidence) + }, + ["warnings"] = HwpCapabilityJsonMapper.ToJsonArray(result.Warnings) + }; + Console.WriteLine(envelope.ToJsonString(OutputFormatter.PublicJsonOptions)); + } + else + { + Console.WriteLine($"Updated {formatKey.ToUpperInvariant()} table cell ({parentPara},{control},{cell}) -> {result.OutputPath}"); + foreach (var warning in result.Warnings) + Console.Error.WriteLine($"WARNING: {warning}"); + } + return 0; + } + + private static bool TryReadInt( + Dictionary<string, string> properties, + out int value, + params string[] keys) + { + value = 0; + var raw = FirstValue(properties, keys); + return raw != null && int.TryParse(raw, out value); + } + + private static int HwpTableCellError(string message, bool json) + { + if (json) Console.WriteLine(OutputFormatter.WrapEnvelopeError(message)); + else Console.Error.WriteLine(message); + return 1; + } +} diff --git a/src/officecli/CommandBuilder.Set.cs b/src/officecli/CommandBuilder.Set.cs index 8c82637db..00f7aac3e 100644 --- a/src/officecli/CommandBuilder.Set.cs +++ b/src/officecli/CommandBuilder.Set.cs @@ -4,6 +4,7 @@ using 
System.CommandLine; using OfficeCli.Core; using OfficeCli.Handlers; +using OfficeCli.Handlers.Hwp; namespace OfficeCli; @@ -12,16 +13,38 @@ static partial class CommandBuilder private static Command BuildSetCommand(Option<bool> jsonOption) { var forceOption = new Option<bool>("--force") { Description = "Force write even if document is protected" }; + var inPlaceOption = new Option<bool>("--in-place") { Description = "HWP safe-save in-place mutation (experimental; requires --backup --verify)" }; + var backupOption = new Option<bool>("--backup") { Description = "Create a backup before HWP in-place mutation" }; + var verifyOption = new Option<bool>("--verify") { Description = "Run HWP safe-save verification checks before publishing output" }; var setFileArg = new Argument<FileInfo>("file") { Description = "Office document path (required even with open/close mode)" }; var setPathArg = new Argument<string>("path") { Description = "DOM path to the element. The 'selected' pseudo-path is deprecated for mutations: use `get selected` to capture path(s) first, then `set <path>` (or a `batch` file for multi-select) so the target lives in the command line, not in transient watch-server state." }; var propsOpt = new Option<string[]>("--prop") { Description = "Property to set (key=value)", AllowMultipleArgumentsPerToken = true }; - var setCommand = new Command("set", "Modify a document node's properties") { TreatUnmatchedTokensAsErrors = false }; + var setCommand = new Command("set", """ + Modify a document node's properties. 
+ + Agent examples: + officecli set report.docx "/body/p[1]" --prop text="New text" --json + officecli set sheet.xlsx "/Sheet1/A1" --prop value="42" --json + officecli set form.hwp /field --prop name=회사명 --prop value=리지 --prop output=out.hwp --json + officecli set form.hwp /field --prop id=1584999796 --prop value=리지 --prop output=out.hwp --json + officecli set form.hwp /text --prop find=마케팅 --prop value=브릿지 --prop output=out.hwp --json + officecli set form.hwp /text --prop find=마케팅 --prop value=브릿지 --in-place --backup --verify --json + officecli set table.hwp /table/cell --prop section=0 --prop parent-para=3 --prop control=0 --prop cell=0 --prop value=오피스셀 --prop output=out.hwp --json + officecli set readonly.hwp /convert-to-editable --prop output=editable.hwp --json + officecli set form.hwp /native-op --prop op=split-paragraph --prop paragraph=0 --prop offset=5 --prop output=out.hwp --json + officecli set form.hwpx /save-as-hwp --prop output=out.hwp --json + + HWP uses packaged rhwp sidecars when present, or OFFICECLI_HWP_ENGINE=rhwp-experimental plus bridge paths; run `officecli help hwp`. + """) { TreatUnmatchedTokensAsErrors = false }; setCommand.Add(setFileArg); setCommand.Add(setPathArg); setCommand.Add(propsOpt); setCommand.Add(jsonOption); setCommand.Add(forceOption); + setCommand.Add(inPlaceOption); + setCommand.Add(backupOption); + setCommand.Add(verifyOption); setCommand.SetAction(result => { var json = result.GetValue(jsonOption); return SafeRun(() => { @@ -29,6 +52,9 @@ private static Command BuildSetCommand(Option jsonOption) var path = result.GetValue(setPathArg)!; var props = result.GetValue(propsOpt); var force = result.GetValue(forceOption); + var inPlace = result.GetValue(inPlaceOption); + var backup = result.GetValue(backupOption); + var verify = result.GetValue(verifyOption); // BUG-BT-R5-01: support the `selected` pseudo-path (mark and get // already do). 
Expand to the first selected path and recursively @@ -124,19 +150,67 @@ private static Command BuildSetCommand(Option jsonOption) return 1; } - if (TryResident(file.FullName, req => - { - req.Command = "set"; - req.Args["path"] = path; - req.Props = ParsePropsArray(props); - }, json) is {} rc) return rc; - // CONSISTENCY(prop-key-case): --prop keys are case-insensitive // so "SRC=x" and "src=x" both resolve to the same handler key. // Reuse ParsePropsArray so the inline and resident-server paths // stay in sync. var properties = ParsePropsArray(props); + var extension = Path.GetExtension(file.FullName); + if (string.Equals(extension, ".hwp", StringComparison.OrdinalIgnoreCase) + && string.Equals(path, "/field", StringComparison.OrdinalIgnoreCase)) + return HandleHwpFieldSet(file.FullName, HwpFormat.Hwp, properties, json); + if (string.Equals(extension, ".hwp", StringComparison.OrdinalIgnoreCase) + && string.Equals(path, "/text", StringComparison.OrdinalIgnoreCase)) + return HandleHwpTextReplace(file.FullName, HwpFormat.Hwp, properties, json, inPlace, backup, verify); + if (string.Equals(extension, ".hwp", StringComparison.OrdinalIgnoreCase) + && string.Equals(path, "/table/cell", StringComparison.OrdinalIgnoreCase)) + return HandleHwpTableCellSet(file.FullName, HwpFormat.Hwp, properties, json); + if (string.Equals(extension, ".hwp", StringComparison.OrdinalIgnoreCase) + && string.Equals(path, "/convert-to-editable", StringComparison.OrdinalIgnoreCase)) + return HandleHwpConvertToEditable(file.FullName, HwpFormat.Hwp, properties, json); + if (string.Equals(extension, ".hwp", StringComparison.OrdinalIgnoreCase) + && string.Equals(path, "/native-op", StringComparison.OrdinalIgnoreCase)) + return HandleHwpNativeMutation(file.FullName, HwpFormat.Hwp, properties, json); + if (string.Equals(extension, ".hwp", StringComparison.OrdinalIgnoreCase) + && string.Equals(path, "/save-as-hwp", StringComparison.OrdinalIgnoreCase)) + return HandleHwpSaveAsHwp(file.FullName, 
HwpFormat.Hwp, properties, json); + + if (string.Equals(extension, ".hwpx", StringComparison.OrdinalIgnoreCase) + && (HwpEngineSelector.IsExperimentalBridgeEnabled() + || HwpEngineSelector.CanUseInstalledRuntime( + HwpCapabilityConstants.FormatHwpx, + HwpCapabilityConstants.OperationFillField)) + && string.Equals(path, "/field", StringComparison.OrdinalIgnoreCase)) + return HandleHwpFieldSet(file.FullName, HwpFormat.Hwpx, properties, json); + if (string.Equals(extension, ".hwpx", StringComparison.OrdinalIgnoreCase) + && (HwpEngineSelector.IsExperimentalBridgeEnabled() + || HwpEngineSelector.CanUseInstalledRuntime( + HwpCapabilityConstants.FormatHwpx, + HwpCapabilityConstants.OperationReplaceText)) + && string.Equals(path, "/text", StringComparison.OrdinalIgnoreCase)) + return HandleHwpTextReplace(file.FullName, HwpFormat.Hwpx, properties, json, inPlace, backup, verify); + if (string.Equals(extension, ".hwpx", StringComparison.OrdinalIgnoreCase) + && (HwpEngineSelector.IsExperimentalBridgeEnabled() + || HwpEngineSelector.CanUseInstalledRuntime( + HwpCapabilityConstants.FormatHwpx, + HwpCapabilityConstants.OperationSetTableCell)) + && string.Equals(path, "/table/cell", StringComparison.OrdinalIgnoreCase)) + return HandleHwpTableCellSet(file.FullName, HwpFormat.Hwpx, properties, json); + if (string.Equals(extension, ".hwpx", StringComparison.OrdinalIgnoreCase) + && string.Equals(path, "/save-as-hwp", StringComparison.OrdinalIgnoreCase)) + return HandleHwpSaveAsHwp(file.FullName, HwpFormat.Hwpx, properties, json); + if (string.Equals(extension, ".hwpx", StringComparison.OrdinalIgnoreCase) + && string.Equals(path, "/native-op", StringComparison.OrdinalIgnoreCase)) + return HandleHwpNativeMutation(file.FullName, HwpFormat.Hwpx, properties, json); + + if (TryResident(file.FullName, req => + { + req.Command = "set"; + req.Args["path"] = path; + req.Props = properties; + }, json) is {} rc) return rc; + using var handler = DocumentHandlerFactory.Open(file.FullName, editable: 
true); var unsupported = handler.Set(path, properties); @@ -320,4 +394,12 @@ private static Command BuildSetCommand(Option<bool> jsonOption) return setCommand; } + + private static string? FirstValue(Dictionary<string, string> properties, params string[] keys) + { + foreach (var key in keys) + if (properties.TryGetValue(key, out var value)) + return value; + return null; + } } diff --git a/src/officecli/CommandBuilder.View.Help.cs b/src/officecli/CommandBuilder.View.Help.cs new file mode 100644 index 000000000..fa33a6783 --- /dev/null +++ b/src/officecli/CommandBuilder.View.Help.cs @@ -0,0 +1,22 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +namespace OfficeCli; + +static partial class CommandBuilder +{ + private static string BuildViewDescription() + => """ + View document in different modes. + + Agent examples: + officecli view report.docx text --json + officecli view sheet.xlsx text --json + officecli view file.hwp text --json + officecli view file.hwp svg --page 1 --json + officecli view file.hwp fields --json + officecli view file.hwp field --field-name 회사명 --json + + HWP requires OFFICECLI_HWP_ENGINE=rhwp-experimental plus bridge paths; run `officecli help hwp`. 
+ """; +} diff --git a/src/officecli/CommandBuilder.View.HwpNative.cs b/src/officecli/CommandBuilder.View.HwpNative.cs new file mode 100644 index 000000000..ff54dcd27 --- /dev/null +++ b/src/officecli/CommandBuilder.View.HwpNative.cs @@ -0,0 +1,97 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +using System.Collections.Generic; +using OfficeCli.Handlers.Hwp; + +namespace OfficeCli; + +static partial class CommandBuilder +{ + private static readonly HashSet<string> HwpNativeReadOperations = new(StringComparer.OrdinalIgnoreCase) + { + "get-paragraph-count", + "get-paragraph-length", + "get-text-range", + "get-textbox-control-index", + "find-next-editable-control", + "find-nearest-control-backward", + "find-nearest-control-forward", + "get-cell-paragraph-count", + "get-cell-paragraph-length", + "get-text-in-cell", + "get-char-properties-at", + "get-cell-char-properties-at", + "get-para-properties-at", + "get-cell-para-properties-at", + "get-style-list", + "get-style-detail", + "get-numbering-list", + "get-bullet-list", + "get-page-hide", + "get-header-footer", + "get-header-footer-list", + "get-header-footer-para-info", + "navigate-header-footer-by-page", + "get-para-properties-in-hf", + "get-picture-properties", + "get-shape-properties", + "get-equation-properties", + "render-equation-preview", + "get-footnote-info" + }; + + private static string? HwpViewOperationForMode(string modeKey) + => modeKey is "text" or "t" ? HwpCapabilityConstants.OperationReadText + : modeKey is "svg" or "g" ? HwpCapabilityConstants.OperationRenderSvg + : modeKey is "png" ? HwpCapabilityConstants.OperationRenderPng + : modeKey is "pdf" ? HwpCapabilityConstants.OperationExportPdf + : modeKey is "markdown" or "md" ? HwpCapabilityConstants.OperationExportMarkdown + : modeKey is "thumbnail" ? HwpCapabilityConstants.OperationThumbnail + : modeKey is "info" ? HwpCapabilityConstants.OperationDocumentInfo + : modeKey is "diagnostics" or "diag" ? 
HwpCapabilityConstants.OperationDiagnostics + : modeKey is "dump" or "controls" ? HwpCapabilityConstants.OperationDumpControls + : modeKey is "pages" or "dump-pages" ? HwpCapabilityConstants.OperationDumpPages + : modeKey is "fields" ? HwpCapabilityConstants.OperationListFields + : modeKey is "field" ? HwpCapabilityConstants.OperationReadField + : modeKey is "table-cell" or "cell" ? HwpCapabilityConstants.OperationReadTableCell + : modeKey is "tables" or "cells" ? HwpCapabilityConstants.OperationScanCells + : modeKey is "native" or "native-op" ? HwpCapabilityConstants.OperationNativeRead + : null; + + private static bool IsHwpNativeReadOperation(string op) + => HwpNativeReadOperations.Contains(op); + + private static void ValidateHwpNativeViewRequest( + string formatKey, + string nativeOp, + string[] nativeArgs) + { + if (!IsHwpNativeReadOperation(nativeOp)) + throw new HwpEngineException( + $"HWP native view operation '{nativeOp}' is not read-only.", + HwpCapabilityConstants.ReasonUnsupportedOperation, + "Use `officecli set file.hwp /native-op --prop op= --prop output= ...` for native mutations.", + [HwpCapabilityConstants.OperationNativeRead, HwpCapabilityConstants.OperationNativeMutation], + formatKey, + HwpCapabilityConstants.OperationNativeRead, + HwpCapabilityConstants.EngineRhwpBridge, + HwpCapabilityConstants.ModeExperimental); + + foreach (var (key, _) in ParsePropsArray(nativeArgs)) + { + var normalized = key.TrimStart('-'); + if (normalized.Equals("output", StringComparison.OrdinalIgnoreCase) + || normalized.Equals("out", StringComparison.OrdinalIgnoreCase)) + throw new HwpEngineException( + "HWP native view does not accept output paths.", + HwpCapabilityConstants.ReasonUnsupportedOperation, + "Use `officecli set file.hwp /native-op --prop op= --prop output= ...` for output-first native mutations.", + [HwpCapabilityConstants.OperationNativeRead, HwpCapabilityConstants.OperationNativeMutation], + formatKey, + HwpCapabilityConstants.OperationNativeRead, + 
HwpCapabilityConstants.EngineRhwpBridge, + HwpCapabilityConstants.ModeExperimental); + } + } +} diff --git a/src/officecli/CommandBuilder.View.cs b/src/officecli/CommandBuilder.View.cs index e6f76efde..7e4e9baee 100644 --- a/src/officecli/CommandBuilder.View.cs +++ b/src/officecli/CommandBuilder.View.cs @@ -4,6 +4,7 @@ using System.CommandLine; using OfficeCli.Core; using OfficeCli.Handlers; +using OfficeCli.Handlers.Hwp; namespace OfficeCli; @@ -11,8 +12,8 @@ static partial class CommandBuilder { private static Command BuildViewCommand(Option<bool> jsonOption) { - var viewFileArg = new Argument<FileInfo>("file") { Description = "Office document path (.docx, .xlsx, .pptx)" }; - var viewModeArg = new Argument<string>("mode") { Description = "View mode: text, annotated, outline, stats, issues, html, svg, screenshot, pdf, forms" }; + var viewFileArg = new Argument<FileInfo>("file") { Description = "Office document path (.docx, .xlsx, .pptx, .hwpx, experimental .hwp)" }; + var viewModeArg = new Argument<string>("mode") { Description = "View mode: text, annotated, outline, stats, issues, html, svg, screenshot, pdf, forms, styles, tables, markdown, objects, fields, field, native" }; var startLineOpt = new Option<int?>("--start") { Description = "Start line/paragraph number" }; var endLineOpt = new Option<int?>("--end") { Description = "End line/paragraph number" }; var maxLinesOpt = new Option<int?>("--max-lines") { Description = "Maximum number of lines/rows/slides to output (truncates with total count)" }; @@ -21,15 +22,33 @@ private static Command BuildViewCommand(Option<bool> jsonOption) var colsOpt = new Option<string>("--cols") { Description = "Column filter, comma-separated (Excel only, e.g. A,B,C)" }; var pageOpt = new Option<string>("--page") { Description = "Page filter (e.g. 1, 2-5, 1,3,5). html mode: default=all. screenshot mode: default=1 (use --page 1-N to capture more, or --grid N for pptx thumbnails)." 
}; - var browserOpt = new Option<bool>("--browser") { Description = "Open output in browser (html / svg modes)" }; + var browserOpt = new Option<bool>("--browser") { Description = "Open output in browser or image viewer (html / svg / screenshot modes)" }; var outOpt = new Option<string>("--out", "-o") { Description = "Output file path (screenshot mode; defaults to a temp file)" }; var screenshotWidthOpt = new Option<int>("--screenshot-width") { Description = "Screenshot viewport width (default 1600)", DefaultValueFactory = _ => 1600 }; var screenshotHeightOpt = new Option<int>("--screenshot-height") { Description = "Screenshot viewport height (default 1200)", DefaultValueFactory = _ => 1200 }; var gridOpt = new Option<int>("--grid") { Description = "Tile slides into an N-column thumbnail grid (screenshot mode, pptx only; 0 = off)", DefaultValueFactory = _ => 0 }; var renderOpt = new Option<string>("--render") { Description = "Screenshot rendering path (docx only): auto (default; native on Windows w/ Word, html elsewhere), native (force OS-native, error if unavailable), html", DefaultValueFactory = _ => "auto" }; var withPagesOpt = new Option<bool>("--page-count") { Description = "stats mode (docx only): also report total page count via Word repagination (Win + Word required; slow on long docs)" }; + var autoOpt = new Option<bool>("--auto") { Description = "Auto-recognize label-value fields in tables (hwpx forms only)" }; + var objectTypeOpt = new Option<string>("--object-type") { Description = "Object type filter: picture, field, bookmark, equation, formfield (hwpx objects mode)" }; + var nativeOpOpt = new Option<string>("--op") { Description = "HWP rhwp native read operation for native view mode" }; + var nativeArgOpt = new Option<string[]>("--native-arg") { Description = "HWP native view argument (key=value), repeatable", AllowMultipleArgumentsPerToken = true }; + var fieldNameOpt = new Option<string>("--field-name") { Description = "Field name for HWP/HWPX field read mode" }; + var fieldIdOpt = new Option<string>("--field-id") { Description = "Field id for 
HWP/HWPX field read mode" }; + var sectionOpt = new Option<int?>("--section") { Description = "HWP rhwp section index for table/page operations" }; + var parentParaOpt = new Option<int?>("--parent-para") { Description = "HWP rhwp parent paragraph index for table operations" }; + var controlOpt = new Option<int?>("--control") { Description = "HWP rhwp control index for table operations" }; + var cellOpt = new Option<int?>("--cell") { Description = "HWP rhwp cell index for table operations" }; + var cellParaOpt = new Option<int?>("--cell-para") { Description = "HWP rhwp cell paragraph index for table operations" }; + var offsetOpt = new Option<int?>("--offset") { Description = "HWP rhwp text offset for table cell read" }; + var countOpt = new Option<int?>("--count") { Description = "HWP rhwp count/limit for table cell read" }; + var maxParentParaOpt = new Option<int?>("--max-parent-para") { Description = "HWP rhwp scan upper bound for parent paragraphs" }; + var maxControlOpt = new Option<int?>("--max-control") { Description = "HWP rhwp scan upper bound for controls" }; + var maxCellOpt = new Option<int?>("--max-cell") { Description = "HWP rhwp scan upper bound for cells" }; + var maxCellParaOpt = new Option<int?>("--max-cell-para") { Description = "HWP rhwp scan upper bound for cell paragraphs" }; + var includeEmptyOpt = new Option<bool>("--include-empty") { Description = "Include empty HWP table cells in scan output" }; - var viewCommand = new Command("view", "View document in different modes"); + var viewCommand = new Command("view", BuildViewDescription()); viewCommand.Add(viewFileArg); viewCommand.Add(viewModeArg); viewCommand.Add(startLineOpt); @@ -46,6 +65,24 @@ private static Command BuildViewCommand(Option<bool> jsonOption) viewCommand.Add(gridOpt); viewCommand.Add(renderOpt); viewCommand.Add(withPagesOpt); + viewCommand.Add(autoOpt); + viewCommand.Add(objectTypeOpt); + viewCommand.Add(nativeOpOpt); + viewCommand.Add(nativeArgOpt); + viewCommand.Add(fieldNameOpt); + viewCommand.Add(fieldIdOpt); + viewCommand.Add(sectionOpt); + 
viewCommand.Add(parentParaOpt); + viewCommand.Add(controlOpt); + viewCommand.Add(cellOpt); + viewCommand.Add(cellParaOpt); + viewCommand.Add(offsetOpt); + viewCommand.Add(countOpt); + viewCommand.Add(maxParentParaOpt); + viewCommand.Add(maxControlOpt); + viewCommand.Add(maxCellOpt); + viewCommand.Add(maxCellParaOpt); + viewCommand.Add(includeEmptyOpt); viewCommand.Add(jsonOption); viewCommand.SetAction(result => { var json = result.GetValue(jsonOption); return SafeRun(() => @@ -68,13 +105,41 @@ private static Command BuildViewCommand(Option jsonOption) if (renderMode is not ("auto" or "native" or "html")) throw new OfficeCli.Core.CliException($"Invalid --render value: {renderMode}. Valid: auto, native, html") { Code = "invalid_render", ValidValues = ["auto", "native", "html"] }; var withPages = result.GetValue(withPagesOpt); + var autoRecognize = result.GetValue(autoOpt); + var objectTypeFilter = result.GetValue(objectTypeOpt); + var nativeOp = result.GetValue(nativeOpOpt); + var nativeArgs = result.GetValue(nativeArgOpt); + var fieldName = result.GetValue(fieldNameOpt); + var fieldId = result.GetValue(fieldIdOpt); + var hwpViewArgs = new Dictionary(StringComparer.Ordinal); + AddHwpViewOption(hwpViewArgs, "--section", result.GetValue(sectionOpt)); + AddHwpViewOption(hwpViewArgs, "--parent-para", result.GetValue(parentParaOpt)); + AddHwpViewOption(hwpViewArgs, "--control", result.GetValue(controlOpt)); + AddHwpViewOption(hwpViewArgs, "--cell", result.GetValue(cellOpt)); + AddHwpViewOption(hwpViewArgs, "--cell-para", result.GetValue(cellParaOpt)); + AddHwpViewOption(hwpViewArgs, "--offset", result.GetValue(offsetOpt)); + AddHwpViewOption(hwpViewArgs, "--count", result.GetValue(countOpt)); + AddHwpViewOption(hwpViewArgs, "--max-parent-para", result.GetValue(maxParentParaOpt)); + AddHwpViewOption(hwpViewArgs, "--max-control", result.GetValue(maxControlOpt)); + AddHwpViewOption(hwpViewArgs, "--max-cell", result.GetValue(maxCellOpt)); + AddHwpViewOption(hwpViewArgs, 
"--max-cell-para", result.GetValue(maxCellParaOpt)); + if (result.GetValue(includeEmptyOpt)) + hwpViewArgs["--include-empty"] = "true"; // pdf mode runs entirely through an exporter plugin (no handler // open, no resident hop — the plugin gets a snapshot of the // source and writes the PDF). Handled before TryResident // because exporter invocation needs the file lock released, and // ExporterInvoker closes the resident itself when present. - if (mode.ToLowerInvariant() is "pdf") + var lowerMode = mode.ToLowerInvariant(); + var earlyExtension = Path.GetExtension(file.FullName); + var bridgeOwnsPdf = string.Equals(earlyExtension, ".hwp", StringComparison.OrdinalIgnoreCase) + || (string.Equals(earlyExtension, ".hwpx", StringComparison.OrdinalIgnoreCase) + && (HwpEngineSelector.IsExperimentalBridgeEnabled() + || HwpEngineSelector.CanUseInstalledRuntime( + HwpCapabilityConstants.FormatHwpx, + HwpCapabilityConstants.OperationExportPdf))); + if (lowerMode is "pdf" && !bridgeOwnsPdf) { var pdfPath = outArg ?? Path.ChangeExtension(file.FullName, "pdf"); var exp = OfficeCli.Core.Plugins.ExporterInvoker.Run(file.FullName, ".pdf", pdfPath); @@ -120,11 +185,31 @@ private static Command BuildViewCommand(Option jsonOption) if (gridCols > 0) req.Args["grid"] = gridCols.ToString(); if (renderMode != "auto") req.Args["render"] = renderMode; if (withPages) req.Args["page-count"] = "true"; + if (autoRecognize) req.Args["auto"] = "true"; + if (objectTypeFilter != null) req.Args["object-type"] = objectTypeFilter; }, json) is {} rc) return rc; var format = json ? OutputFormat.Json : OutputFormat.Text; var cols = colsStr != null ? 
new HashSet(colsStr.Split(',').Select(c => c.Trim().ToUpperInvariant())) : null; + var extension = Path.GetExtension(file.FullName); + + // Binary .hwp: route through HWP engine (bridge when experimental, else unsupported) + if (string.Equals(extension, ".hwp", StringComparison.OrdinalIgnoreCase)) + return HandleHwpView(file.FullName, HwpFormat.Hwp, mode, pageFilter, json, fieldName, fieldId, outArg, hwpViewArgs, nativeOp, nativeArgs); + + // HWPX stays on the custom XML handler by default. The rhwp bridge can be + // opted into for read/render smoke coverage without changing stable HWPX behavior. + var hwpxModeKey = mode.Trim().ToLowerInvariant(); + var hwpxOperation = HwpViewOperationForMode(hwpxModeKey); + if (string.Equals(extension, ".hwpx", StringComparison.OrdinalIgnoreCase) + && hwpxOperation != null + && (HwpEngineSelector.IsExperimentalBridgeEnabled() + || HwpEngineSelector.CanUseInstalledRuntime( + HwpCapabilityConstants.FormatHwpx, + hwpxOperation))) + return HandleHwpView(file.FullName, HwpFormat.Hwpx, mode, pageFilter, json, fieldName, fieldId, outArg, hwpViewArgs, nativeOp, nativeArgs); + using var handler = DocumentHandlerFactory.Open(file.FullName); if (mode.ToLowerInvariant() is "html" or "h") @@ -143,6 +228,8 @@ private static Command BuildViewCommand(Option jsonOption) html = excelHandler.ViewAsHtml(); else if (handler is OfficeCli.Handlers.WordHandler wordHandler) html = wordHandler.ViewAsHtml(pageFilter); + else if (handler is OfficeCli.Handlers.HwpxHandler hwpxHandler) + html = hwpxHandler.ViewAsHtml(); else if (handler is OfficeCli.Core.Plugins.FormatHandlerProxy proxy) html = proxy.ViewAsHtml(int.TryParse(pageFilter, out var p) ? 
p : (int?)null); @@ -174,10 +261,10 @@ private static Command BuildViewCommand(Option jsonOption) } else { - throw new OfficeCli.Core.CliException("HTML preview is only supported for .pptx, .xlsx, and .docx files.") + throw new OfficeCli.Core.CliException("HTML preview is only supported for .pptx, .xlsx, .docx, and .hwpx files.") { Code = "unsupported_type", - Suggestion = "Use a .pptx, .xlsx, or .docx file, or use mode 'text' or 'annotated' for other formats.", + Suggestion = "Use a .pptx, .xlsx, .docx, or .hwpx file, or use mode 'text' or 'annotated' for other formats.", ValidValues = ["text", "annotated", "outline", "stats", "issues"] }; } @@ -412,6 +499,8 @@ private static Command BuildViewCommand(Option jsonOption) { if (handler is OfficeCli.Handlers.WordHandler wordFormsHandler) Console.WriteLine(OutputFormatter.WrapEnvelope(wordFormsHandler.ViewAsFormsJson().ToJsonString(OutputFormatter.PublicJsonOptions))); + else if (handler is OfficeCli.Handlers.HwpxHandler hwpxFormsHandler) + Console.WriteLine(OutputFormatter.WrapEnvelope(hwpxFormsHandler.ViewAsFormsJson(autoRecognize).ToJsonString(OutputFormatter.PublicJsonOptions))); else if (handler is OfficeCli.Core.Plugins.FormatHandlerProxy formsProxy) { var formsJson = formsProxy.ViewAsFormsJson(); @@ -421,17 +510,33 @@ private static Command BuildViewCommand(Option jsonOption) Console.WriteLine(OutputFormatter.WrapEnvelope(formsJson.ToJsonString(OutputFormatter.PublicJsonOptions))); } else - throw new OfficeCli.Core.CliException("Forms view is only supported for .docx files.") + throw new OfficeCli.Core.CliException("Forms view is only supported for .docx and .hwpx files.") { Code = "unsupported_type", - ValidValues = ["text", "annotated", "outline", "stats", "issues", "html", "svg", "screenshot", "pdf", "forms"] + ValidValues = ["text", "annotated", "outline", "stats", "issues", "html", "svg", "screenshot", "pdf", "forms", "tables", "objects"] }; } + else if (modeKey is "tables" or "tbl") + { + if (handler is 
OfficeCli.Handlers.HwpxHandler hwpxTblHandler) + Console.WriteLine(OutputFormatter.WrapEnvelope(hwpxTblHandler.ViewAsTablesJson().ToJsonString(OutputFormatter.PublicJsonOptions))); + else + throw new OfficeCli.Core.CliException("Tables view is only supported for .hwpx files.") + { Code = "unsupported_type" }; + } + else if (modeKey is "objects" or "obj") + { + if (handler is OfficeCli.Handlers.HwpxHandler hwpxObjHandler) + Console.WriteLine(OutputFormatter.WrapEnvelope(hwpxObjHandler.ViewAsObjectsJson(objectTypeFilter).ToJsonString(OutputFormatter.PublicJsonOptions))); + else + throw new OfficeCli.Core.CliException("Objects view is only supported for .hwpx files.") + { Code = "unsupported_type" }; + } else - throw new OfficeCli.Core.CliException($"Unknown mode: {mode}. Available: text, annotated, outline, stats, issues, html, svg, screenshot, forms") + throw new OfficeCli.Core.CliException($"Unknown mode: {mode}. Available: text, annotated, outline, stats, issues, html, svg, screenshot, pdf, forms, tables, objects") { Code = "invalid_value", - ValidValues = ["text", "annotated", "outline", "stats", "issues", "html", "svg", "screenshot", "pdf", "forms"] + ValidValues = ["text", "annotated", "outline", "stats", "issues", "html", "svg", "screenshot", "pdf", "forms", "tables", "objects"] }; } else @@ -448,20 +553,49 @@ private static Command BuildViewCommand(Option jsonOption) "forms" or "f" => handler switch { OfficeCli.Handlers.WordHandler wfh => wfh.ViewAsForms(), + OfficeCli.Handlers.HwpxHandler hfh => hfh.ViewAsForms(autoRecognize), OfficeCli.Core.Plugins.FormatHandlerProxy fp => fp.ViewAsFormsJson()?.ToJsonString(OutputFormatter.PublicJsonOptions) ?? 
throw new OfficeCli.Core.CliException($"Forms view is not supported by the format-handler plugin for {file.Extension}.") { Code = "unsupported_type" }, - _ => throw new OfficeCli.Core.CliException("Forms view is only supported for .docx files.") + _ => throw new OfficeCli.Core.CliException("Forms view is only supported for .docx, .hwpx, or a plugin that supports forms view.") { Code = "unsupported_type", - ValidValues = ["text", "annotated", "outline", "stats", "issues", "html", "svg", "screenshot", "pdf", "forms"] + ValidValues = ["text", "annotated", "outline", "stats", "issues", "html", "svg", "screenshot", "pdf", "forms", "tables", "markdown", "objects", "styles"] } }, - _ => throw new OfficeCli.Core.CliException($"Unknown mode: {mode}. Available: text, annotated, outline, stats, issues, html, svg, screenshot, forms") + "styles" => handler is OfficeCli.Handlers.HwpxHandler hsh + ? hsh.ViewAsStyles() + : throw new OfficeCli.Core.CliException("Styles view is only supported for .hwpx files.") + { + Code = "unsupported_type", + ValidValues = ["text", "annotated", "outline", "stats", "issues", "html", "styles", "tables"] + }, + "tables" or "tbl" => handler is OfficeCli.Handlers.HwpxHandler htbl + ? htbl.ViewAsTables() + : throw new OfficeCli.Core.CliException("Tables view is only supported for .hwpx files.") + { + Code = "unsupported_type", + ValidValues = ["text", "annotated", "outline", "stats", "issues", "html", "styles", "tables", "markdown"] + }, + "markdown" or "md" => handler is OfficeCli.Handlers.HwpxHandler hmd + ? hmd.ViewAsMarkdown() + : throw new OfficeCli.Core.CliException("Markdown view is only supported for .hwpx files.") + { + Code = "unsupported_type", + ValidValues = ["text", "annotated", "outline", "stats", "issues", "html", "styles", "tables", "markdown", "objects"] + }, + "objects" or "obj" => handler is OfficeCli.Handlers.HwpxHandler hobj + ? 
hobj.ViewAsObjects(objectTypeFilter) + : throw new OfficeCli.Core.CliException("Objects view is only supported for .hwpx files.") + { + Code = "unsupported_type", + ValidValues = ["text", "annotated", "outline", "stats", "issues", "html", "styles", "tables", "markdown", "objects"] + }, + _ => throw new OfficeCli.Core.CliException($"Unknown mode: {mode}. Available: text, annotated, outline, stats, issues, html, svg, screenshot, pdf, forms, tables, markdown, objects, styles") { Code = "invalid_value", - ValidValues = ["text", "annotated", "outline", "stats", "issues", "html", "svg", "screenshot", "pdf", "forms"] + ValidValues = ["text", "annotated", "outline", "stats", "issues", "html", "svg", "screenshot", "pdf", "forms", "tables", "markdown", "objects", "styles"] } }; Console.WriteLine(output); @@ -505,4 +639,320 @@ private static (int? start, int? end) ParsePptHtmlPage( throw new ArgumentException($"--page {p} out of range (total slides: {slideCount})."); return (p, p); } + + private static void AddHwpViewOption(Dictionary<string, string> args, string key, int? value) + { + if (value.HasValue) + args[key] = value.Value.ToString(); + } + + private static int HandleHwpView( + string filePath, + HwpFormat format, + string mode, + string? pageFilter, + bool json, + string? fieldName = null, + int? fieldId = null, + string? outArg = null, + IReadOnlyDictionary<string, string>? viewArgs = null, + string? nativeOp = null, + string[]? nativeArgs = null) + { + var modeKey = mode.Trim().ToLowerInvariant(); + var formatKey = format == HwpFormat.Hwp + ? HwpCapabilityConstants.FormatHwp + : HwpCapabilityConstants.FormatHwpx; + var operation = HwpViewOperationForMode(modeKey); + + if (!HwpEngineSelector.IsExperimentalBridgeEnabled() + && !HwpEngineSelector.CanUseInstalledRuntime(formatKey, operation)) + { + var label = format == HwpFormat.Hwp ?
"Binary .hwp" : "HWPX"; + throw new HwpEngineException( + $"{label} bridge view requires packaged rhwp sidecars or OFFICECLI_HWP_ENGINE=rhwp-experimental.", + HwpCapabilityConstants.ReasonBridgeNotEnabled, + "Run ./dev-install.sh, or set OFFICECLI_HWP_ENGINE=rhwp-experimental and install rhwp-officecli-bridge.", + [ + HwpCapabilityConstants.OperationReadText, + HwpCapabilityConstants.OperationRenderSvg, + HwpCapabilityConstants.OperationRenderPng, + HwpCapabilityConstants.OperationExportPdf, + HwpCapabilityConstants.OperationExportMarkdown, + HwpCapabilityConstants.OperationThumbnail, + HwpCapabilityConstants.OperationDocumentInfo, + HwpCapabilityConstants.OperationDiagnostics, + HwpCapabilityConstants.OperationDumpControls, + HwpCapabilityConstants.OperationDumpPages, + HwpCapabilityConstants.OperationListFields, + HwpCapabilityConstants.OperationReadField, + HwpCapabilityConstants.OperationReadTableCell, + HwpCapabilityConstants.OperationScanCells, + HwpCapabilityConstants.OperationNativeRead + ], + formatKey, + operation, + HwpCapabilityConstants.EngineNone, + HwpCapabilityConstants.ModeNone); + } + + var engine = HwpEngineSelector.GetEngine(formatKey, operation); + var fileInfo = new FileInfo(filePath); + var ct = CancellationToken.None; + + if (modeKey is "text" or "t") + { + var request = new HwpReadRequest(format, filePath, fileInfo.Length, json); + var result = engine.ReadTextAsync(request, ct).GetAwaiter().GetResult(); + if (json) + { + var envelope = new System.Text.Json.Nodes.JsonObject + { + ["success"] = true, + ["data"] = new System.Text.Json.Nodes.JsonObject + { + ["text"] = result.Text, + ["engine"] = result.Engine, + ["engineVersion"] = result.EngineVersion + }, + ["warnings"] = HwpCapabilityJsonMapper.ToJsonArray(result.Warnings) + }; + Console.WriteLine(envelope.ToJsonString(OfficeCli.Core.OutputFormatter.PublicJsonOptions)); + } + else + { + Console.WriteLine(result.Text); + } + return 0; + } + + if (modeKey is "svg" or "g") + { + var outDir = 
Path.Combine(Path.GetTempPath(), $"officecli_hwp_svg_{Guid.NewGuid():N}"); + Directory.CreateDirectory(outDir); + var request = new HwpRenderRequest( + format, filePath, outDir, + pageFilter ?? "all", fileInfo.Length, json); + var result = engine.RenderSvgAsync(request, ct).GetAwaiter().GetResult(); + if (json) + { + var pagesArr = new System.Text.Json.Nodes.JsonArray(); + foreach (var p in result.Pages) + pagesArr.Add((System.Text.Json.Nodes.JsonNode?)new System.Text.Json.Nodes.JsonObject + { + ["page"] = p.Page, ["path"] = p.SvgPath, ["sha256"] = p.Sha256 + }); + var envelope = new System.Text.Json.Nodes.JsonObject + { + ["success"] = true, + ["data"] = new System.Text.Json.Nodes.JsonObject + { + ["pages"] = pagesArr, + ["manifest"] = result.ManifestPath, + ["engine"] = result.Engine, + ["engineVersion"] = result.EngineVersion + }, + ["warnings"] = HwpCapabilityJsonMapper.ToJsonArray(result.Warnings) + }; + Console.WriteLine(envelope.ToJsonString(OfficeCli.Core.OutputFormatter.PublicJsonOptions)); + } + else + { + foreach (var p in result.Pages) + Console.WriteLine($"Page {p.Page}: {p.SvgPath}"); + } + return 0; + } + + if (modeKey is "png" or "pdf" or "markdown" or "md" or "thumbnail" or "info" or "diagnostics" or "diag" or "dump" or "controls" or "pages" or "dump-pages" or "table-cell" or "cell" or "tables" or "cells" or "native" or "native-op") + { + var args = new Dictionary(StringComparer.Ordinal); + if (viewArgs != null) + foreach (var entry in viewArgs) + args[entry.Key] = entry.Value; + string bridgeCommand; + var effectiveOperation = operation ?? HwpCapabilityConstants.OperationReadText; + if (modeKey is "png") + { + bridgeCommand = "render-png"; + args["--out-dir"] = outArg != null + ? Path.GetFullPath(outArg) + : Path.Combine(Path.GetTempPath(), $"officecli_hwp_png_{Guid.NewGuid():N}"); + args["--page"] = pageFilter ?? 
"all"; + Directory.CreateDirectory(args["--out-dir"]); + } + else if (modeKey is "pdf") + { + bridgeCommand = "export-pdf"; + args["--output"] = outArg != null + ? Path.GetFullPath(outArg) + : Path.GetFullPath(Path.ChangeExtension(filePath, ".pdf")); + args["--page"] = pageFilter ?? "all"; + } + else if (modeKey is "markdown" or "md") + { + bridgeCommand = "export-markdown"; + args["--page"] = pageFilter ?? "all"; + } + else if (modeKey is "thumbnail") + { + bridgeCommand = "thumbnail"; + args["--output"] = outArg != null + ? Path.GetFullPath(outArg) + : Path.Combine(Path.GetTempPath(), $"officecli_hwp_thumbnail_{Guid.NewGuid():N}.png"); + } + else if (modeKey is "info") + { + bridgeCommand = "document-info"; + } + else if (modeKey is "diagnostics" or "diag") + { + bridgeCommand = "diagnostics"; + } + else if (modeKey is "dump" or "controls") + { + bridgeCommand = "dump-controls"; + } + else if (modeKey is "pages" or "dump-pages") + { + bridgeCommand = "dump-pages"; + if (!string.IsNullOrWhiteSpace(pageFilter)) + args["--page"] = pageFilter; + } + else if (modeKey is "table-cell" or "cell") + { + bridgeCommand = "get-cell-text"; + } + else if (modeKey is "native" or "native-op") + { + if (string.IsNullOrWhiteSpace(nativeOp)) + throw new HwpEngineException( + "HWP native view requires --op <operation>.", + HwpCapabilityConstants.ReasonUnsupportedOperation, + "Example: officecli view input.hwp native --op get-style-list --json", + [HwpCapabilityConstants.OperationNativeRead], + formatKey, + HwpCapabilityConstants.OperationNativeRead, + HwpCapabilityConstants.EngineRhwpBridge, + HwpCapabilityConstants.ModeExperimental); + ValidateHwpNativeViewRequest(formatKey, nativeOp, nativeArgs ?? Array.Empty<string>()); + bridgeCommand = "native-op"; + args["--op"] = nativeOp; + foreach (var (key, value) in ParsePropsArray(nativeArgs ?? Array.Empty<string>())) + { + var normalized = key.StartsWith("--", StringComparison.Ordinal) ?
key : $"--{key}"; + args[normalized] = value; + } + } + else + { + bridgeCommand = "scan-cells"; + } + + var request = new HwpJsonViewRequest(format, filePath, fileInfo.Length, effectiveOperation, bridgeCommand, args, json); + var result = engine.ViewJsonAsync(request, ct).GetAwaiter().GetResult(); + if (json) + { + var data = (System.Text.Json.Nodes.JsonObject)result.Data.DeepClone(); + data["engine"] = result.Engine; + data["engineVersion"] = result.EngineVersion; + var envelope = new System.Text.Json.Nodes.JsonObject + { + ["success"] = true, + ["data"] = data, + ["warnings"] = HwpCapabilityJsonMapper.ToJsonArray(result.Warnings) + }; + Console.WriteLine(envelope.ToJsonString(OfficeCli.Core.OutputFormatter.PublicJsonOptions)); + } + else if (result.Data["markdown"]?.GetValue<string>() is { } markdown) + { + Console.WriteLine(markdown); + } + else if (result.Data["dump"]?.GetValue<string>() is { } dump) + { + Console.WriteLine(dump); + } + else if (result.Data["pdf"]?["path"]?.GetValue<string>() is { } pdfPath) + { + Console.WriteLine(pdfPath); + } + else + { + Console.WriteLine(result.Data.ToJsonString(OfficeCli.Core.OutputFormatter.PublicJsonOptions)); + } + return 0; + } + + if (modeKey is "fields") + { + var request = new HwpFieldListRequest(format, filePath, fileInfo.Length, json); + var result = engine.ListFieldsAsync(request, ct).GetAwaiter().GetResult(); + if (json) + { + var envelope = new System.Text.Json.Nodes.JsonObject + { + ["success"] = true, + ["data"] = result.Fields.DeepClone(), + ["engine"] = result.Engine, + ["engineVersion"] = result.EngineVersion, + ["warnings"] = HwpCapabilityJsonMapper.ToJsonArray(result.Warnings) + }; + Console.WriteLine(envelope.ToJsonString(OfficeCli.Core.OutputFormatter.PublicJsonOptions)); + } + else + { + Console.WriteLine(result.Fields.ToJsonString(OfficeCli.Core.OutputFormatter.PublicJsonOptions)); + } + return 0; + } + + if (modeKey is "field") + { + var request = new HwpFieldReadRequest(format, filePath, fieldName, fieldId,
fileInfo.Length, json); + var result = engine.ReadFieldAsync(request, ct).GetAwaiter().GetResult(); + if (json) + { + var envelope = new System.Text.Json.Nodes.JsonObject + { + ["success"] = true, + ["data"] = result.Field.DeepClone(), + ["engine"] = result.Engine, + ["engineVersion"] = result.EngineVersion, + ["warnings"] = HwpCapabilityJsonMapper.ToJsonArray(result.Warnings) + }; + Console.WriteLine(envelope.ToJsonString(OfficeCli.Core.OutputFormatter.PublicJsonOptions)); + } + else + { + Console.WriteLine(result.Field.ToJsonString(OfficeCli.Core.OutputFormatter.PublicJsonOptions)); + } + return 0; + } + + throw new HwpEngineException( + $"{formatKey} bridge view mode '{mode}' is not supported. Use text, svg, png, pdf, markdown, thumbnail, info, diagnostics, dump, pages, fields, field, table-cell, tables, or native.", + HwpCapabilityConstants.ReasonUnsupportedOperation, + null, + [ + HwpCapabilityConstants.OperationReadText, + HwpCapabilityConstants.OperationRenderSvg, + HwpCapabilityConstants.OperationRenderPng, + HwpCapabilityConstants.OperationExportPdf, + HwpCapabilityConstants.OperationExportMarkdown, + HwpCapabilityConstants.OperationThumbnail, + HwpCapabilityConstants.OperationDocumentInfo, + HwpCapabilityConstants.OperationDiagnostics, + HwpCapabilityConstants.OperationDumpControls, + HwpCapabilityConstants.OperationDumpPages, + HwpCapabilityConstants.OperationListFields, + HwpCapabilityConstants.OperationReadField, + HwpCapabilityConstants.OperationReadTableCell, + HwpCapabilityConstants.OperationScanCells, + HwpCapabilityConstants.OperationNativeRead + ], + formatKey, + null, + HwpCapabilityConstants.EngineRhwpBridge, + HwpCapabilityConstants.ModeExperimental); + } } diff --git a/src/officecli/CommandBuilder.Watch.cs b/src/officecli/CommandBuilder.Watch.cs index 20c984713..27ba1744a 100644 --- a/src/officecli/CommandBuilder.Watch.cs +++ b/src/officecli/CommandBuilder.Watch.cs @@ -11,7 +11,7 @@ static partial class CommandBuilder { private static Command 
BuildWatchCommand() { - var watchFileArg = new Argument("file") { Description = "Office document path (.pptx, .xlsx, .docx)" }; + var watchFileArg = new Argument("file") { Description = "Office document path (.pptx, .xlsx, .docx, .hwpx)" }; var watchPortOpt = new Option("--port") { Description = "HTTP port for preview server" }; watchPortOpt.DefaultValueFactory = _ => 26315; @@ -57,6 +57,8 @@ private static Command BuildWatchCommand() initialHtml = excel.ViewAsHtml(); else if (handler is OfficeCli.Handlers.WordHandler word) initialHtml = word.ViewAsHtml(); + else if (handler is OfficeCli.Handlers.HwpxHandler hwpx) + initialHtml = hwpx.ViewAsHtml(); } catch (Exception ex) { @@ -95,7 +97,7 @@ private static Command BuildWatchCommand() private static Command BuildUnwatchCommand() { - var unwatchFileArg = new Argument("file") { Description = "Office document path (.pptx, .xlsx, .docx)" }; + var unwatchFileArg = new Argument("file") { Description = "Office document path (.pptx, .xlsx, .docx, .hwpx)" }; var unwatchCommand = new Command("unwatch", "Stop the watch preview server for the document"); unwatchCommand.Add(unwatchFileArg); diff --git a/src/officecli/CommandBuilder.cs b/src/officecli/CommandBuilder.cs index d673b8d01..db9a9a165 100644 --- a/src/officecli/CommandBuilder.cs +++ b/src/officecli/CommandBuilder.cs @@ -17,9 +17,10 @@ public static RootCommand BuildRootCommand() var jsonOption = new Option("--json") { Description = "Output as JSON (AI-friendly)" }; var rootCommand = new RootCommand(""" - officecli: AI-friendly CLI for Office documents (.docx, .xlsx, .pptx) + officecli: AI-friendly CLI for Office documents (.docx, .xlsx, .pptx, .hwpx, experimental .hwp) Run 'officecli help' for the schema-driven capability reference (formats, elements, properties). + Run 'officecli help hwp' for experimental rhwp bridge setup, examples, and support boundaries. See the Commands section below for the full list of subcommands. 
"""); rootCommand.Add(jsonOption); @@ -124,6 +125,7 @@ public static RootCommand BuildRootCommand() rootCommand.Add(BuildGetCommand(jsonOption)); rootCommand.Add(BuildQueryCommand(jsonOption)); rootCommand.Add(BuildSetCommand(jsonOption)); + rootCommand.Add(BuildHwpHelpCommand(jsonOption)); rootCommand.Add(BuildAddCommand(jsonOption)); rootCommand.Add(BuildRemoveCommand(jsonOption)); rootCommand.Add(BuildMoveCommand(jsonOption)); @@ -139,6 +141,9 @@ public static RootCommand BuildRootCommand() rootCommand.Add(BuildCreateCommand(jsonOption)); rootCommand.Add(BuildMergeCommand(jsonOption)); rootCommand.Add(BuildPluginsCommand(jsonOption)); + rootCommand.Add(BuildCompareCommand(jsonOption)); + rootCommand.Add(BuildCapabilitiesCommand(jsonOption)); + rootCommand.Add(BuildSchemaCommand(jsonOption)); foreach (var stub in BuildIntegrationStubCommands()) rootCommand.Add(stub); @@ -586,6 +591,7 @@ internal static string ExecuteBatchItem(OfficeCli.Core.IDocumentHandler handler, OfficeCli.Handlers.PowerPointHandler ppt => ppt.Swap(item.Path, item.To), OfficeCli.Handlers.WordHandler word => word.Swap(item.Path, item.To), OfficeCli.Handlers.ExcelHandler excel => excel.Swap(item.Path, item.To), + OfficeCli.Handlers.HwpxHandler hwpx => hwpx.Swap(item.Path, item.To), _ => throw new InvalidOperationException("swap not supported for this document type") }; return $"Swapped {p1} <-> {p2}"; @@ -1204,6 +1210,11 @@ private static void NotifyWatch(IDocumentHandler handler, string filePath, strin WatchNotifier.NotifyIfWatching(filePath, new WatchMessage { Action = "full", FullHtml = word.ViewAsHtml(), ScrollTo = scrollTo }); return; } + if (handler is OfficeCli.Handlers.HwpxHandler hwpx) + { + WatchNotifier.NotifyIfWatching(filePath, new WatchMessage { Action = "full", FullHtml = hwpx.ViewAsHtml() }); + return; + } if (handler is not OfficeCli.Handlers.PowerPointHandler ppt) return; var slideNum = WatchMessage.ExtractSlideNum(changedPath); if (slideNum > 0) @@ -1225,6 +1236,11 @@ private 
static void NotifyWatchRoot(IDocumentHandler handler, string filePath, i { if (!WatchServer.IsWatching(filePath)) return; + if (handler is OfficeCli.Handlers.HwpxHandler hwpx) + { + WatchNotifier.NotifyIfWatching(filePath, new WatchMessage { Action = "full", FullHtml = hwpx.ViewAsHtml() }); + return; + } if (handler is OfficeCli.Handlers.ExcelHandler excel) { WatchNotifier.NotifyIfWatching(filePath, new WatchMessage { Action = "full", FullHtml = excel.ViewAsHtml() }); diff --git a/src/officecli/Core/CjkHelper.cs b/src/officecli/Core/CjkHelper.cs new file mode 100644 index 000000000..fcc15dfd2 --- /dev/null +++ b/src/officecli/Core/CjkHelper.cs @@ -0,0 +1,260 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 +// +// Modified by cli-jaw contributors +// Added: CJK font handling, language detection, kinsoku processing + +using DocumentFormat.OpenXml; +using DocumentFormat.OpenXml.Wordprocessing; +using System.Text; +using A = DocumentFormat.OpenXml.Drawing; + +namespace OfficeCli.Core; + +///
<summary>
+/// CJK (Chinese-Japanese-Korean) text handling utilities. +/// Provides font fallback chains, language detection, and line-break rules. +/// Uses <see cref="ParseHelpers.IsCjkOrFullWidth(char)"/> for basic character classification. +/// </summary> +public static class CjkHelper +{ + // ── Script detection ──────────────────────────────────────────── + + /// <summary>Detect if text contains any CJK characters.</summary> + public static bool ContainsCjk(string? text) + { + if (string.IsNullOrEmpty(text)) return false; + foreach (var c in text) + if (ParseHelpers.IsCjkOrFullWidth(c)) + return true; + return false; + } + + /// <summary>Detect the dominant CJK script in text.</summary> + public static CjkScript DetectScript(string? text) + { + if (string.IsNullOrEmpty(text)) return CjkScript.None; + + int ko = 0, ja = 0, zh = 0; + foreach (var c in text) + { + if (IsKorean(c)) ko++; + else if (IsJapanese(c)) ja++; + else if (IsChinese(c)) zh++; + } + + if (ko == 0 && ja == 0 && zh == 0) return CjkScript.None; + if (ko >= ja && ko >= zh) return CjkScript.Korean; + if (ja >= ko && ja >= zh) return CjkScript.Japanese; + return CjkScript.Chinese; + } + + // ── Font chains ───────────────────────────────────────────────── + + /// <summary>Get the font fallback chain for a CJK script.</summary> + public static (string primary, string fallback1, string fallback2) GetFontChain(CjkScript script) => + script switch + { + CjkScript.Korean => ("Malgun Gothic", "맑은 고딕", "AppleSDGothicNeo-Regular"), + CjkScript.Japanese => ("Yu Gothic", "Meiryo", "Hiragino Sans"), + CjkScript.Chinese => ("Microsoft YaHei", "SimSun", "PingFang SC"), + _ => ("", "", "") + }; + + /// <summary>Get the BCP 47 language tag for a CJK script.</summary> + public static string GetLanguageTag(CjkScript script) => + script switch + { + CjkScript.Korean => "ko-KR", + CjkScript.Japanese => "ja-JP", + CjkScript.Chinese => "zh-CN", + _ => "" + }; + + /// <summary> + /// Split text into contiguous CJK/non-CJK segments so mixed content can be + /// emitted as separate runs.
+ /// </summary> + public static IReadOnlyList<(string text, CjkScript script)> SegmentText(string? text) + { + var segments = new List<(string text, CjkScript script)>(); + if (string.IsNullOrEmpty(text)) return segments; + + var buffer = new StringBuilder(); + var currentIsCjk = ParseHelpers.IsCjkOrFullWidth(text[0]); + + foreach (var c in text) + { + var isCjk = ParseHelpers.IsCjkOrFullWidth(c); + if (buffer.Length > 0 && isCjk != currentIsCjk) + { + AddSegment(segments, buffer.ToString(), currentIsCjk); + buffer.Clear(); + } + + buffer.Append(c); + currentIsCjk = isCjk; + } + + if (buffer.Length > 0) + AddSegment(segments, buffer.ToString(), currentIsCjk); + + return segments; + } + + // ── WordML (DOCX) helpers ─────────────────────────────────────── + + /// <summary> + /// Apply CJK fonts and language to a WordML w:rPr element. + /// Sets w:rFonts/@w:eastAsia and w:lang/@w:eastAsia. + /// </summary> + public static void ApplyToWordRun(RunProperties rPr, CjkScript script) + { + if (script == CjkScript.None || rPr == null) return; + var (primary, _, _) = GetFontChain(script); + var lang = GetLanguageTag(script); + + // w:rFonts — set eastAsia only (preserve user's Ascii/HighAnsi) + var rFonts = rPr.GetFirstChild<RunFonts>(); + if (rFonts == null) + { + rFonts = new RunFonts(); + rPr.PrependChild(rFonts); + } + rFonts.EastAsia = primary; + + // w:lang — set eastAsia + var langElem = rPr.GetFirstChild<Languages>(); + if (langElem == null) + { + langElem = new Languages(); + rPr.AppendChild(langElem); + } + langElem.EastAsia = lang; + } + + /// <summary>Remove CJK-only WordML font and language metadata from a run.</summary> + public static void ClearWordRunCjk(RunProperties rPr) + { + if (rPr == null) return; + + var rFonts = rPr.GetFirstChild<RunFonts>(); + if (rFonts != null) + rFonts.EastAsia = null; + + var langElem = rPr.GetFirstChild<Languages>(); + if (langElem != null) + langElem.EastAsia = null; + } + + /// <summary> + /// Detect CJK in text and apply font/lang to the run's properties. + /// Creates RunProperties if missing.
+ /// </summary> + public static void ApplyToWordRunIfCjk(Run run, string? text) + { + var script = DetectScript(text); + if (script == CjkScript.None) + { + var existingRPr = run.GetFirstChild<RunProperties>(); + if (existingRPr != null) + ClearWordRunCjk(existingRPr); + return; + } + + var rPr = run.GetFirstChild<RunProperties>(); + if (rPr == null) + { + rPr = new RunProperties(); + run.PrependChild(rPr); + } + ApplyToWordRun(rPr, script); + } + + // ── DrawingML (PPTX/XLSX chart text) helpers ──────────────────── + + /// <summary> + /// Apply CJK fonts and language to a DrawingML a:rPr element. + /// Sets a:ea/@typeface and a:rPr/@lang. + /// </summary> + public static void ApplyToDrawingRun(A.RunProperties rPr, CjkScript script) + { + if (script == CjkScript.None || rPr == null) return; + var (primary, _, _) = GetFontChain(script); + var lang = GetLanguageTag(script); + + // a:ea (East Asian font) + var eaFont = rPr.GetFirstChild<A.EastAsianFont>(); + if (eaFont == null) + { + eaFont = new A.EastAsianFont(); + rPr.AppendChild(eaFont); + } + eaFont.Typeface = primary; + + // lang attribute + rPr.Language = lang; + } + + /// <summary>Remove DrawingML CJK font metadata and restore fallback language.</summary> + public static void ClearDrawingRunCjk(A.RunProperties rPr, string fallbackLang = "en-US") + { + if (rPr == null) return; + + rPr.RemoveAllChildren<A.EastAsianFont>(); + rPr.Language = fallbackLang; + } + + /// <summary> + /// Detect CJK in text and apply font/lang. Falls back to the provided + /// default language when no CJK is detected. + /// </summary> + public static void ApplyToDrawingRunIfCjk(A.RunProperties rPr, string? text, string fallbackLang = "en-US") + { + var script = DetectScript(text); + if (script != CjkScript.None) + ApplyToDrawingRun(rPr, script); + else + ClearDrawingRunCjk(rPr, fallbackLang); + } + + // ── Kinsoku (line-break rules) ────────────────────────────────── + + // Characters that must NOT appear at the start of a line + private const string KinsokuStartChars = + "!%),.:;?]}¢°·'\"†‡›℃∵、。〉》」』】〕〗〙〛！＂％＇），．：；？＞］｝～"; + + // Characters that must NOT appear at the end of a line + private const string KinsokuEndChars = + "$(£¥·'\"〈《「『【〔〖〘〚＄（［｛￡￥"; + + /// <summary>Cannot start a line (e.g. closing brackets, periods).</summary> + public static bool IsKinsokuStart(char c) => KinsokuStartChars.Contains(c); + + /// <summary>Cannot end a line (e.g. opening brackets).</summary> + public static bool IsKinsokuEnd(char c) => KinsokuEndChars.Contains(c); + + // ── Private character classifiers ─────────────────────────────── + + private static bool IsKorean(char c) => + (c >= 0xAC00 && c <= 0xD7AF) // Hangul Syllables + || (c >= 0x1100 && c <= 0x11FF) // Hangul Jamo + || (c >= 0x3130 && c <= 0x318F); // Hangul Compat Jamo + + private static bool IsJapanese(char c) => + (c >= 0x3040 && c <= 0x309F) // Hiragana + || (c >= 0x30A0 && c <= 0x30FF) // Katakana + || (c >= 0x31F0 && c <= 0x31FF); // Katakana Phonetic Ext + + private static bool IsChinese(char c) => + (c >= 0x4E00 && c <= 0x9FFF) // CJK Unified Ideographs + || (c >= 0x3400 && c <= 0x4DBF); // CJK Extension A + + private static void AddSegment(List<(string text, CjkScript script)> segments, string text, bool isCjk) + { + segments.Add((text, isCjk ? DetectScript(text) : CjkScript.None)); + } +} + +/// <summary>Identified CJK script family.</summary>
+public enum CjkScript { None, Korean, Japanese, Chinese } diff --git a/src/officecli/Core/IDocumentHandler.cs b/src/officecli/Core/IDocumentHandler.cs index 1a49c164f..7d0a354a0 100644 --- a/src/officecli/Core/IDocumentHandler.cs +++ b/src/officecli/Core/IDocumentHandler.cs @@ -100,4 +100,44 @@ public interface IDocumentHandler : IDisposable bool TryExtractBinary(string path, string destPath, out string? contentType, out long byteCount); } -public record ValidationError(string ErrorType, string Description, string? Path, string? Part); +/// <summary> +/// Standardized validation error/warning codes (aligned with kordoc v2.2.6). +/// </summary> +public static class ValidationCodes +{ + // Errors (critical — document may not open correctly) + public const string Encrypted = "ENCRYPTED"; + public const string DrmProtected = "DRM_PROTECTED"; + public const string ZipBomb = "ZIP_BOMB"; + public const string Corrupted = "CORRUPTED"; + public const string NoSections = "NO_SECTIONS"; + public const string ZipEmpty = "ZIP_EMPTY"; + public const string ZipCorrupt = "ZIP_CORRUPT"; + public const string OpfMissing = "OPF_MISSING"; + public const string XmlMalformed = "XML_MALFORMED"; + public const string IdRefOrphan = "IDREF_ORPHAN"; + public const string TableStructure = "TABLE_STRUCTURE"; + public const string BinDataMissing = "BINDATA_MISSING"; + public const string BinDataOrphan = "BINDATA_ORPHAN"; + public const string FieldPairMismatch = "FIELD_PAIR_MISMATCH"; + public const string SectionMismatch = "SECTION_MISMATCH"; + + // Warnings (non-critical — document opens but may have issues) + public const string TruncatedTable = "TRUNCATED_TABLE"; + public const string MalformedXml = "MALFORMED_XML_MINOR"; + public const string PartialParse = "PARTIAL_PARSE"; + public const string NamespaceMissing = "NAMESPACE_MISSING"; + public const string NamespaceMismatch = "NAMESPACE_MISMATCH"; + public const string StaleIdRef = "STALE_IDREF"; + public const string EmptySection = "EMPTY_SECTION"; + public 
const string LargeFile = "LARGE_FILE"; + public const string DeprecatedElement = "DEPRECATED_ELEMENT"; + public const string MergedCellOverlap = "MERGED_CELL_OVERLAP"; +} + +public record ValidationError( + string ErrorType, + string Description, + string? Path, + string? Part, + IssueSeverity Severity = IssueSeverity.Error); diff --git a/src/officecli/Core/OutputFormatter.cs b/src/officecli/Core/OutputFormatter.cs index d70969520..f04f276d0 100644 --- a/src/officecli/Core/OutputFormatter.cs +++ b/src/officecli/Core/OutputFormatter.cs @@ -50,6 +50,16 @@ internal class ErrorResult public string? Help { get; set; } [JsonPropertyName("validValues")] public string[]? ValidValues { get; set; } + [JsonPropertyName("format")] + public string? Format { get; set; } + [JsonPropertyName("operation")] + public string? Operation { get; set; } + [JsonPropertyName("engine")] + public string? Engine { get; set; } + [JsonPropertyName("engineMode")] + public string? EngineMode { get; set; } + [JsonPropertyName("nextCommand")] + public string? 
NextCommand { get; set; } } internal class CliWarning @@ -229,6 +239,13 @@ public static string WrapErrorEnvelope(Exception ex) ["success"] = false, ["error"] = JsonSerializer.SerializeToNode(errorResult, AppJsonContext.Default.ErrorResult) }; + if (ex is OfficeCli.Handlers.Hwp.HwpEngineException { Transaction: not null } hwp) + { + envelope["data"] = new JsonObject + { + ["transaction"] = hwp.Transaction.DeepClone() + }; + } return envelope.ToJsonString(JsonOptions); } @@ -248,6 +265,18 @@ private static ErrorResult BuildErrorResult(Exception ex) result.Help = cli.Help; result.ValidValues = cli.ValidValues; } + else if (ex is OfficeCli.Handlers.Hwp.HwpEngineException hwp) + { + result.Code = hwp.Error.Code; + result.Suggestion = hwp.Error.Suggestion; + result.Help = "officecli help hwp"; + result.ValidValues = hwp.Error.ValidValues; + result.Format = hwp.Error.Format; + result.Operation = hwp.Error.Operation; + result.Engine = hwp.Error.Engine; + result.EngineMode = hwp.Error.EngineMode; + result.NextCommand = BuildHwpNextCommand(hwp.Error.Code, hwp.Error.Operation); + } else { EnrichFromMessage(result, ex); @@ -256,6 +285,17 @@ private static ErrorResult BuildErrorResult(Exception ex) return result; } + private static string BuildHwpNextCommand(string? code, string? 
operation)
+    {
+        if (code is "bridge_not_enabled" or "bridge_missing" or "rhwp_runtime_missing" or "rhwp_api_missing")
+            return "officecli hwp doctor --json";
+
+        if (operation is not null)
+            return "officecli capabilities --json";
+
+        return "officecli help hwp";
+    }
+
     private static void EnrichFromMessage(ErrorResult result, Exception ex)
     {
         var msg = ex.Message;
diff --git a/src/officecli/Core/SkillInstaller.cs b/src/officecli/Core/SkillInstaller.cs
index 473157eac..a5981268b 100644
--- a/src/officecli/Core/SkillInstaller.cs
+++ b/src/officecli/Core/SkillInstaller.cs
@@ -32,6 +32,7 @@ private static readonly (string[] Aliases, string DisplayName, string DetectDir,
     // Guide name → skill folder name mapping
     private static readonly Dictionary<string, string> SkillMap = new(StringComparer.OrdinalIgnoreCase)
     {
+        ["hwpx"] = "officecli-hwpx",
        ["pptx"] = "officecli-pptx",
        ["word"] = "officecli-docx",
        ["excel"] = "officecli-xlsx",
diff --git a/src/officecli/Core/TemplateMerger.cs b/src/officecli/Core/TemplateMerger.cs
index c77784c3a..286ca4eb7 100644
--- a/src/officecli/Core/TemplateMerger.cs
+++ b/src/officecli/Core/TemplateMerger.cs
@@ -84,10 +84,11 @@ public static MergeResult Merge(string templatePath, string outputPath, Dictionary<string, string> data)
             ".docx" => MergeDocx(outputPath, data),
             ".xlsx" => MergeXlsx(outputPath, data),
             ".pptx" => MergePptx(outputPath, data),
+            ".hwpx" => MergeHwpx(outputPath, data),
             _ => throw new CliException($"Unsupported file type for merge: {ext}")
             {
                 Code = "unsupported_type",
-                ValidValues = [".docx", ".xlsx", ".pptx"]
+                ValidValues = [".docx", ".xlsx", ".pptx", ".hwpx"]
             }
         };
     }
@@ -212,6 +213,36 @@ private static List<string> ScanUnresolvedDocx(string filePath)
         return unresolved.OrderBy(x => x).ToList();
     }
+
+    private static MergeResult MergeHwpx(string filePath, Dictionary<string, string> data)
+    {
+        using (var handler = new Handlers.HwpxHandler(filePath, editable: true))
+        {
+            foreach (var kvp in data)
+            {
+                var placeholder = "{{" + kvp.Key + "}}";
+                handler.Set("/", new Dictionary<string, string>
+                {
+                    ["find"] = placeholder,
+                    ["replace"] = kvp.Value
+                });
+            }
+        }
+
+        var unresolved = ScanUnresolvedHwpx(filePath);
+        var usedKeys = data.Keys.Where(k => !unresolved.Contains(k)).ToList();
+        return new MergeResult(usedKeys.Count, unresolved, usedKeys);
+    }
+
+    private static List<string> ScanUnresolvedHwpx(string filePath)
+    {
+        var unresolved = new HashSet<string>();
+        using var handler = new Handlers.HwpxHandler(filePath, editable: false);
+        var text = handler.ViewAsText();
+        foreach (System.Text.RegularExpressions.Match match in PlaceholderPattern.Matches(text))
+            unresolved.Add(match.Groups[1].Value);
+        return unresolved.OrderBy(x => x).ToList();
+    }
+
     private static MergeResult MergeXlsx(string filePath, Dictionary<string, string> data)
     {
         var usedKeys = new HashSet<string>();
diff --git a/src/officecli/Handlers/DocumentHandlerFactory.cs b/src/officecli/Handlers/DocumentHandlerFactory.cs
index 32b5511e0..ea60bd508 100644
--- a/src/officecli/Handlers/DocumentHandlerFactory.cs
+++ b/src/officecli/Handlers/DocumentHandlerFactory.cs
@@ -73,8 +73,16 @@ private static IDocumentHandler OpenHandler(string filePath, string ext, bool ed
             ".docx" => new WordHandler(filePath, editable),
             ".xlsx" => new ExcelHandler(filePath, editable),
             ".pptx" => new PowerPointHandler(filePath, editable),
-            _ => TryOpenViaPlugin(filePath, ext, editable)
-                ?? throw UnsupportedTypeException(ext)
+            ".hwpx" => new HwpxHandler(filePath, editable),
+            ".hwp" => throw new CliException(
+                "Binary .hwp files are handled by operation-specific rhwp-backed OfficeCLI routes, not the generic OOXML document handler.")
+            {
+                Code = "hwp_generic_handler_unsupported",
+                Suggestion = "Run `officecli hwp doctor --json`, then use `officecli view file.hwp text --json`, `officecli create file.hwp --json`, or `officecli hwp --json` recipes.",
+                Help = "officecli hwp doctor --json"
+            },
+            _ => TryOpenViaPlugin(filePath, ext, editable)
+                ??
throw UnsupportedTypeException(ext) }; } @@ -181,12 +189,12 @@ private static IDocumentHandler OpenHandlerWithRetry(string path, string ext, bo private static CliException UnsupportedTypeException(string ext) => new CliException( - $"Unsupported file type: {ext}. Supported: .docx, .xlsx, .pptx. " + + $"Unsupported file type: {ext}. Supported: .docx, .xlsx, .pptx, .hwpx, experimental .hwp. " + $"Other formats may be opened via plugins — run `officecli plugins list` to see installed plugins, " + $"or see docs/plugin-protocol.md for installation paths.") { Code = "unsupported_type", - ValidValues = [".docx", ".xlsx", ".pptx"] + ValidValues = [".docx", ".xlsx", ".pptx", ".hwpx", ".hwp"] }; private static bool IsEncodingException(Exception ex) diff --git a/src/officecli/Handlers/Hwp/CustomHwpxEngine.cs b/src/officecli/Handlers/Hwp/CustomHwpxEngine.cs new file mode 100644 index 000000000..c02fc80d8 --- /dev/null +++ b/src/officecli/Handlers/Hwp/CustomHwpxEngine.cs @@ -0,0 +1,124 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +using System.Reflection; + +namespace OfficeCli.Handlers.Hwp; + +public sealed class CustomHwpxEngine : IHwpEngine +{ + public string Name => HwpCapabilityConstants.EngineCustom; + public string? 
Version => $"officecli:{Assembly.GetExecutingAssembly().GetName().Version}"; + public HwpEngineMode Mode => HwpEngineMode.Default; + + public Task GetCapabilitiesAsync(CancellationToken ct) + { + ct.ThrowIfCancellationRequested(); + return Task.FromResult(HwpCapabilityFactory.BuildReport(Version)); + } + + public Task ReadTextAsync(HwpReadRequest request, CancellationToken ct) + { + throw Unsupported(request.Format, HwpCapabilityConstants.OperationReadText); + } + + public Task RenderSvgAsync(HwpRenderRequest request, CancellationToken ct) + { + throw Unsupported(request.Format, HwpCapabilityConstants.OperationRenderSvg); + } + + public Task ViewJsonAsync(HwpJsonViewRequest request, CancellationToken ct) + { + throw Unsupported(request.Format, request.Operation); + } + + public Task ListFieldsAsync(HwpFieldListRequest request, CancellationToken ct) + { + throw Unsupported(request.Format, HwpCapabilityConstants.OperationListFields); + } + + public Task ReadFieldAsync(HwpFieldReadRequest request, CancellationToken ct) + { + throw Unsupported(request.Format, HwpCapabilityConstants.OperationReadField); + } + + public Task FillFieldAsync(HwpFillFieldRequest request, CancellationToken ct) + { + throw Unsupported(request.Format, HwpCapabilityConstants.OperationFillField); + } + + public Task ReplaceTextAsync(HwpReplaceTextRequest request, CancellationToken ct) + { + throw Unsupported(request.Format, HwpCapabilityConstants.OperationReplaceText); + } + + public Task InsertTextAsync(HwpInsertTextRequest request, CancellationToken ct) + { + throw Unsupported(request.Format, HwpCapabilityConstants.OperationInsertText); + } + + public Task SetTableCellAsync(HwpTableCellSetRequest request, CancellationToken ct) + { + throw Unsupported(request.Format, HwpCapabilityConstants.OperationSetTableCell); + } + + public Task SaveOriginalAsync(HwpSaveOriginalRequest request, CancellationToken ct) + { + throw Unsupported(request.Format, HwpCapabilityConstants.OperationSaveOriginal); + } + 
+ public Task ConvertToEditableAsync(HwpConvertToEditableRequest request, CancellationToken ct) + { + throw Unsupported(request.Format, HwpCapabilityConstants.OperationConvertToEditable); + } + + public Task NativeMutationAsync(HwpNativeMutationRequest request, CancellationToken ct) + { + throw Unsupported(request.Format, HwpCapabilityConstants.OperationNativeMutation); + } + + public Task SaveAsHwpAsync(HwpSaveAsHwpRequest request, CancellationToken ct) + { + throw Unsupported(request.Format, HwpCapabilityConstants.OperationSaveAsHwp); + } + + private static HwpEngineException Unsupported(HwpFormat format, string operation) + { + var formatKey = format == HwpFormat.Hwp + ? HwpCapabilityConstants.FormatHwp + : HwpCapabilityConstants.FormatHwpx; + var reason = operation switch + { + HwpCapabilityConstants.OperationFillField when format == HwpFormat.Hwp + => HwpCapabilityConstants.ReasonBinaryHwpMutationForbidden, + HwpCapabilityConstants.OperationSaveOriginal or HwpCapabilityConstants.OperationSaveAsHwp when format == HwpFormat.Hwp + => HwpCapabilityConstants.ReasonBinaryHwpWriteForbidden, + _ => HwpCapabilityConstants.ReasonRoundTripUnverified + }; + + return new HwpEngineException( + $"{formatKey} operation '{operation}' is not roundtrip-verified in this OfficeCLI build.", + reason, + "Check `officecli capabilities --json` and use only roundtrip-verified operations.", + [ + HwpCapabilityConstants.OperationReadText, + HwpCapabilityConstants.OperationRenderSvg, + HwpCapabilityConstants.OperationRenderPng, + HwpCapabilityConstants.OperationExportPdf, + HwpCapabilityConstants.OperationExportMarkdown, + HwpCapabilityConstants.OperationListFields, + HwpCapabilityConstants.OperationReadField, + HwpCapabilityConstants.OperationFillField, + HwpCapabilityConstants.OperationReplaceText, + HwpCapabilityConstants.OperationInsertText, + HwpCapabilityConstants.OperationNativeRead, + HwpCapabilityConstants.OperationNativeMutation, + HwpCapabilityConstants.OperationSaveOriginal, + 
HwpCapabilityConstants.OperationSaveAsHwp + ], + formatKey, + operation, + HwpCapabilityConstants.EngineCustom, + HwpCapabilityConstants.ModeDefault); + } +} diff --git a/src/officecli/Handlers/Hwp/HwpBlankCreator.cs b/src/officecli/Handlers/Hwp/HwpBlankCreator.cs new file mode 100644 index 000000000..9f3a89802 --- /dev/null +++ b/src/officecli/Handlers/Hwp/HwpBlankCreator.cs @@ -0,0 +1,85 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +using System.Diagnostics; +using OfficeCli.Core; + +namespace OfficeCli.Handlers.Hwp; + +public static class HwpBlankCreator +{ + private const int CreateTimeoutMs = 30_000; + + public static void Create(string path) + { + var apiPath = HwpRuntimeProbe.DiscoverApiPath(); + if (apiPath == null) + throw new CliException("Binary .hwp blank creation requires the rhwp-field-bridge sidecar, but it was not found.") + { + Code = "hwp_create_dependency_missing", + Suggestion = "Run ./dev-install.sh or set OFFICECLI_RHWP_API_BIN to rhwp-field-bridge.", + Help = "officecli hwp doctor --json" + }; + + var fullPath = Path.GetFullPath(path); + var dir = Path.GetDirectoryName(fullPath); + if (!string.IsNullOrEmpty(dir)) + Directory.CreateDirectory(dir); + + var result = RunCreate(apiPath, fullPath); + if (result.ExitCode != 0) + throw new CliException($"rhwp-field-bridge create-blank failed: {result.Stderr.Trim()}") + { + Code = "hwp_create_failed", + Suggestion = result.Stdout.Trim(), + Help = "officecli hwp doctor --json" + }; + + if (!File.Exists(fullPath) || new FileInfo(fullPath).Length == 0) + throw new CliException("rhwp-field-bridge create-blank completed but did not create a non-empty .hwp file.") + { + Code = "hwp_create_output_missing", + Help = "officecli hwp doctor --json" + }; + } + + private static ProcessResult RunCreate(string apiPath, string outputPath) + { + var psi = new ProcessStartInfo + { + FileName = apiPath, + UseShellExecute = false, + RedirectStandardOutput = true, + 
RedirectStandardError = true, + CreateNoWindow = true + }; + psi.ArgumentList.Add("create-blank"); + psi.ArgumentList.Add("--output"); + psi.ArgumentList.Add(outputPath); + psi.ArgumentList.Add("--json"); + + using var process = Process.Start(psi) + ?? throw new CliException("Failed to start rhwp-field-bridge.") + { + Code = "hwp_create_start_failed", + Help = "officecli hwp doctor --json" + }; + + if (!process.WaitForExit(CreateTimeoutMs)) + { + try { process.Kill(entireProcessTree: true); } catch { } + throw new CliException($"rhwp-field-bridge create-blank timed out after {CreateTimeoutMs}ms.") + { + Code = "hwp_create_timeout", + Help = "officecli hwp doctor --json" + }; + } + + return new ProcessResult( + process.ExitCode, + process.StandardOutput.ReadToEnd(), + process.StandardError.ReadToEnd()); + } + + private sealed record ProcessResult(int ExitCode, string Stdout, string Stderr); +} diff --git a/src/officecli/Handlers/Hwp/HwpCapabilityFactory.cs b/src/officecli/Handlers/Hwp/HwpCapabilityFactory.cs new file mode 100644 index 000000000..9f3d7b941 --- /dev/null +++ b/src/officecli/Handlers/Hwp/HwpCapabilityFactory.cs @@ -0,0 +1,293 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +using System.Reflection; + +namespace OfficeCli.Handlers.Hwp; + +public static class HwpCapabilityFactory +{ + public static HwpCapabilityReport BuildReport(string? customEngineVersion = null) + { + var version = customEngineVersion ?? $"officecli:{Assembly.GetExecutingAssembly().GetName().Version}"; + var runtime = HwpRuntimeProbe.Probe(); + var formats = new Dictionary + { + [HwpCapabilityConstants.FormatHwpx] = BuildHwpx(version, runtime), + [HwpCapabilityConstants.FormatHwp] = BuildHwp(runtime) + }; + + return new HwpCapabilityReport( + HwpCapabilityConstants.SchemaVersion, + Assembly.GetExecutingAssembly().GetName().Version?.ToString() ?? 
"unknown", + DateTimeOffset.UtcNow, + formats); + } + + private static HwpFormatCapability BuildHwpx(string engineVersion, HwpRuntimeProbeResult runtime) + { + var readRenderBlockedReason = ReadRenderBlockedReason(runtime); + var operations = new Dictionary + { + [HwpCapabilityConstants.OperationReadText] = runtime.ReadRenderAvailable + ? ExperimentalBridge(["tests/golden/hwp/rhwp-smoke/hwpx-officecli-view/text.pretty.json"]) + : ExperimentalCustom(engineVersion, ["tests/golden/hwp/rhwp-smoke/hwpx-officecli-view/text.pretty.json"]), + [HwpCapabilityConstants.OperationRenderSvg] = runtime.ReadRenderAvailable + ? ExperimentalBridge(["tests/golden/hwp/rhwp-smoke/hwpx-officecli-view/svg.pretty.json"]) + : ExperimentalBridgeBlocked(readRenderBlockedReason), + [HwpCapabilityConstants.OperationRenderPng] = runtime.RenderPngAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "render-png")), + [HwpCapabilityConstants.OperationExportPdf] = runtime.ExportPdfAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "export-pdf")), + [HwpCapabilityConstants.OperationExportMarkdown] = runtime.ExportMarkdownAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "export-markdown")), + [HwpCapabilityConstants.OperationThumbnail] = runtime.ThumbnailAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "thumbnail")), + [HwpCapabilityConstants.OperationDocumentInfo] = runtime.DocumentInfoAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "document-info")), + [HwpCapabilityConstants.OperationDiagnostics] = runtime.DiagnosticsAvailable + ? 
ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "diagnostics")), + [HwpCapabilityConstants.OperationDumpControls] = runtime.DumpControlsAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "dump-controls")), + [HwpCapabilityConstants.OperationDumpPages] = runtime.DumpPagesAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "dump-pages")), + [HwpCapabilityConstants.OperationListFields] = runtime.ListFieldsAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "list-fields")), + [HwpCapabilityConstants.OperationReadField] = runtime.ReadFieldAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "get-field")), + [HwpCapabilityConstants.OperationFillField] = ExperimentalCustom(engineVersion, []), + [HwpCapabilityConstants.OperationReplaceText] = runtime.ReplaceTextAvailable + ? ExperimentalBridge(["tests/golden/hwp/rhwp-fields/officecli-replace-hwpx-government.json"]) + : ExperimentalCustom(engineVersion, []), + [HwpCapabilityConstants.OperationInsertText] = runtime.InsertTextAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_text.rs"]) + : ExperimentalBridgeBlocked(InsertTextBlockedReason(runtime)), + [HwpCapabilityConstants.OperationSetTableCell] = runtime.SetTableCellAvailable + ? 
ExperimentalBridge(["tests/OfficeCli.Tests/Hwp/HwpBridgeTableScanTests.cs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "set-cell-text")), + [HwpCapabilityConstants.OperationCreateBlank] = ExperimentalCustom(engineVersion, + ["src/officecli/Resources/base.hwpx"]), + [HwpCapabilityConstants.OperationSaveOriginal] = ExperimentalCustom(engineVersion, []), + [HwpCapabilityConstants.OperationNativeRead] = runtime.NativeOpAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_native.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "native-op")), + [HwpCapabilityConstants.OperationNativeMutation] = runtime.NativeOpAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_native.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "native-op")), + [HwpCapabilityConstants.OperationSaveAsHwp] = runtime.SaveAsHwpAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/main.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "save-as-hwp")) + }; + + return new HwpFormatCapability( + HwpCapabilityConstants.StatusExperimental, + HwpCapabilityConstants.WriteStatusOperationGated, + HwpCapabilityConstants.EngineCustom, + operations, + runtime.ReadRenderAvailable || runtime.MutationAvailable + ? ["HWPX default engine remains custom, and rhwp sidecars are used for operations that the bridge exposes."] + : ["HWPX operations are advertised only after per-operation round-trip evidence exists."]); + } + + private static HwpFormatCapability BuildHwp(HwpRuntimeProbeResult runtime) + { + var readRenderBlockedReason = ReadRenderBlockedReason(runtime); + var createBlockedReason = runtime.ApiAvailable + ? HwpCapabilityConstants.ReasonRoundTripUnverified + : HwpCapabilityConstants.ReasonRhwpApiMissing; + var operations = new Dictionary + { + [HwpCapabilityConstants.OperationReadText] = runtime.ReadRenderAvailable + ? 
ExperimentalBridge(["tests/golden/hwp/rhwp-smoke/officecli-view/text.pretty.json"]) + : ExperimentalBridgeBlocked(readRenderBlockedReason), + [HwpCapabilityConstants.OperationRenderSvg] = runtime.ReadRenderAvailable + ? ExperimentalBridge(["tests/golden/hwp/rhwp-smoke/officecli-view/svg.pretty.json"]) + : ExperimentalBridgeBlocked(readRenderBlockedReason), + [HwpCapabilityConstants.OperationRenderPng] = runtime.RenderPngAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "render-png")), + [HwpCapabilityConstants.OperationExportPdf] = runtime.ExportPdfAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "export-pdf")), + [HwpCapabilityConstants.OperationExportMarkdown] = runtime.ExportMarkdownAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "export-markdown")), + [HwpCapabilityConstants.OperationThumbnail] = runtime.ThumbnailAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "thumbnail")), + [HwpCapabilityConstants.OperationDocumentInfo] = runtime.DocumentInfoAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "document-info")), + [HwpCapabilityConstants.OperationDiagnostics] = runtime.DiagnosticsAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "diagnostics")), + [HwpCapabilityConstants.OperationDumpControls] = runtime.DumpControlsAvailable + ? 
ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "dump-controls")), + [HwpCapabilityConstants.OperationDumpPages] = runtime.DumpPagesAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_view.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "dump-pages")), + [HwpCapabilityConstants.OperationListFields] = runtime.ListFieldsAvailable + ? ExperimentalBridge(["tests/golden/hwp/rhwp-fields/field-list.json"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "list-fields")), + [HwpCapabilityConstants.OperationReadField] = runtime.ReadFieldAvailable + ? ExperimentalBridge(["tests/golden/hwp/rhwp-fields/field-read-company-name.json"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "get-field")), + [HwpCapabilityConstants.OperationFillField] = runtime.FillFieldAvailable + ? ExperimentalBridge(["tests/golden/hwp/rhwp-fields/field-set-company-name-cli-readback.json"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "set-field")), + [HwpCapabilityConstants.OperationReplaceText] = runtime.ReplaceTextAvailable + ? ExperimentalBridge(["tests/golden/hwp/rhwp-fields/officecli-replace-marketing-title.json"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "replace-text")), + [HwpCapabilityConstants.OperationInsertText] = runtime.InsertTextAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_text.rs"]) + : ExperimentalBridgeBlocked(InsertTextBlockedReason(runtime)), + [HwpCapabilityConstants.OperationReadTableCell] = runtime.ReadTableCellAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "get-cell-text")), + [HwpCapabilityConstants.OperationScanCells] = runtime.ScanCellsAvailable + ? 
ExperimentalBridge(["src/rhwp-field-bridge/src/ops.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "scan-cells")), + [HwpCapabilityConstants.OperationSetTableCell] = runtime.SetTableCellAvailable + ? ExperimentalBridge(["tests/golden/hwp/rhwp-tables/officecli-set-cell-hwp-table-readback.json"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "set-cell-text")), + [HwpCapabilityConstants.OperationCreateBlank] = runtime.CreateBlankAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/main.rs"]) + : ExperimentalBridgeBlocked(createBlockedReason), + [HwpCapabilityConstants.OperationSaveOriginal] = Unsupported( + HwpCapabilityConstants.ReasonBinaryHwpWriteForbidden), + [HwpCapabilityConstants.OperationConvertToEditable] = runtime.ConvertToEditableAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "convert-to-editable")), + [HwpCapabilityConstants.OperationNativeRead] = runtime.NativeOpAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_native.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "native-op")), + [HwpCapabilityConstants.OperationNativeMutation] = runtime.NativeOpAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/ops_native.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "native-op")), + [HwpCapabilityConstants.OperationSaveAsHwp] = runtime.SaveAsHwpAvailable + ? ExperimentalBridge(["src/rhwp-field-bridge/src/main.rs"]) + : ExperimentalBridgeBlocked(ApiCommandBlockedReason(runtime, "save-as-hwp")) + }; + + return new HwpFormatCapability( + runtime.ReadRenderAvailable + ? HwpCapabilityConstants.StatusExperimental + : HwpCapabilityConstants.StatusUnsupported, + runtime.MutationAvailable || runtime.CreateBlankAvailable + ? HwpCapabilityConstants.WriteStatusOperationGated + : HwpCapabilityConstants.WriteStatusUnsupported, + runtime.BridgeAvailable || runtime.ApiAvailable + ? 
HwpCapabilityConstants.EngineRhwpBridge + : HwpCapabilityConstants.EngineNone, + operations, + HwpWarnings(runtime)); + } + + private static HwpOperationCapability ExperimentalCustom(string engineVersion, string[] evidence) + => new( + HwpOperationStatus.Experimental, + HwpCapabilityConstants.EngineCustom, + engineVersion, + evidence, + ["Not advertised until fixture round-trip and Hancom evidence are complete."], + HwpCapabilityConstants.ReasonRoundTripUnverified); + + private static HwpOperationCapability ExperimentalBridge(string[] evidence) + => new( + HwpOperationStatus.Experimental, + HwpCapabilityConstants.EngineRhwpBridge, + "rhwp-experimental", + evidence, + ["Experimental rhwp bridge path; verify generated output before production use."], + HwpCapabilityConstants.ReasonRoundTripUnverified); + + private static HwpOperationCapability ExperimentalBridgeBlocked(string reason) + => new( + HwpOperationStatus.Experimental, + HwpCapabilityConstants.EngineRhwpBridge, + null, + [], + [SetupHint()], + reason); + + private static HwpOperationCapability Unsupported(string unsupportedReason) + => new( + HwpOperationStatus.Unsupported, + HwpCapabilityConstants.EngineNone, + null, + [], + [], + unsupportedReason); + + private static string ReadRenderBlockedReason(HwpRuntimeProbeResult runtime) + { + if (!runtime.EngineRequested && !runtime.BridgeAvailable && !runtime.ApiAvailable && !runtime.RhwpAvailable) + return HwpCapabilityConstants.ReasonBridgeNotEnabled; + if (!runtime.BridgeAvailable) + return HwpCapabilityConstants.ReasonBridgeMissing; + if (!runtime.ApiAvailable && !runtime.RhwpAvailable) + return HwpCapabilityConstants.ReasonRhwpRuntimeMissing; + return HwpCapabilityConstants.ReasonBridgeMissing; + } + + private static string InsertTextBlockedReason(HwpRuntimeProbeResult runtime) + { + if (!runtime.EngineRequested && !runtime.BridgeAvailable && !runtime.ApiAvailable) + return HwpCapabilityConstants.ReasonBridgeNotEnabled; + if (!runtime.BridgeAvailable) + 
return HwpCapabilityConstants.ReasonBridgeMissing; + if (!runtime.ApiAvailable || !runtime.InsertTextAvailable) + return HwpCapabilityConstants.ReasonRhwpApiMissingOrTooOld; + return HwpCapabilityConstants.ReasonBridgeMissing; + } + + private static string ApiCommandBlockedReason(HwpRuntimeProbeResult runtime, string command) + { + if (!runtime.EngineRequested && !runtime.BridgeAvailable && !runtime.ApiAvailable) + return HwpCapabilityConstants.ReasonBridgeNotEnabled; + if (!runtime.BridgeAvailable) + return HwpCapabilityConstants.ReasonBridgeMissing; + if (!runtime.ApiAvailable || !runtime.ApiCommands.Contains(command)) + return HwpCapabilityConstants.ReasonRhwpApiMissingOrTooOld; + return HwpCapabilityConstants.ReasonBridgeMissing; + } + + private static string[] HwpWarnings(HwpRuntimeProbeResult runtime) + { + if (runtime.MutationAvailable) + { + return [ + "Experimental rhwp bridge is available from installed sidecars or explicit environment paths.", + "Default mutations require explicit --prop output=.", + "Do not claim production-grade HWP fidelity without fixture and Hancom round-trip evidence." + ]; + } + + if (runtime.CreateBlankAvailable) + { + return [ + "Blank .hwp creation is available through rhwp-field-bridge.", + "Read/render/mutation still require rhwp-officecli-bridge plus rhwp-field-bridge or rhwp CLI." 
+ ]; + } + + return [ + "Binary .hwp support requires packaged rhwp sidecars beside officecli or explicit environment paths.", + SetupHint() + ]; + } + + private static string SetupHint() + => "Run ./dev-install.sh to install rhwp sidecars beside officecli, or set OFFICECLI_RHWP_BRIDGE_PATH/OFFICECLI_RHWP_API_BIN/OFFICECLI_RHWP_BIN."; +} diff --git a/src/officecli/Handlers/Hwp/HwpCapabilityJsonMapper.cs b/src/officecli/Handlers/Hwp/HwpCapabilityJsonMapper.cs new file mode 100644 index 000000000..ba234e9be --- /dev/null +++ b/src/officecli/Handlers/Hwp/HwpCapabilityJsonMapper.cs @@ -0,0 +1,246 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +using System.Text.Json.Nodes; + +namespace OfficeCli.Handlers.Hwp; + +public static class HwpCapabilityJsonMapper +{ + public static JsonObject BuildEnvelope(HwpCapabilityReport report) + { + return new JsonObject + { + ["success"] = true, + ["data"] = BuildData(report), + ["warnings"] = new JsonArray() + }; + } + + public static JsonObject BuildData(HwpCapabilityReport report) + { + var formats = new JsonObject(); + foreach (var (format, capability) in report.Formats) + formats[format] = BuildFormatCapability(format, capability); + + return new JsonObject + { + ["schemaVersion"] = report.SchemaVersion, + ["officecliVersion"] = report.OfficeCliVersion, + ["generatedAt"] = report.GeneratedAt.ToString("O"), + ["formats"] = formats + }; + } + + private static JsonObject BuildFormatCapability(string format, HwpFormatCapability capability) + { + var operations = new JsonObject(); + foreach (var (operation, opCapability) in capability.Operations) + operations[operation] = BuildOperationCapability(format, operation, opCapability); + + return new JsonObject + { + ["readStatus"] = capability.ReadStatus, + ["writeStatus"] = capability.WriteStatus, + ["defaultEngine"] = capability.DefaultEngine, + ["setupHints"] = ToJsonArray(BuildSetupHints(capability)), + ["operations"] = operations, + 
["warnings"] = ToJsonArray(capability.Warnings) + }; + } + + private static JsonObject BuildOperationCapability(string format, string operation, HwpOperationCapability capability) + { + var result = new JsonObject + { + ["status"] = StatusToJson(capability.Status), + ["support"] = StatusToJson(capability.Status), + ["ready"] = IsReady(capability), + ["engine"] = capability.Engine, + ["engineVersion"] = capability.EngineVersion, + ["evidence"] = ToJsonArray(capability.Evidence), + ["warnings"] = ToJsonArray(capability.Warnings), + ["unsupportedReason"] = capability.UnsupportedReason, + ["blockedBy"] = ToJsonArray(BuildBlockedBy(capability)), + ["requiredArgs"] = ToJsonArray(BuildRequiredArgs(operation)), + ["example"] = BuildExample(operation) + }; + if (format == HwpCapabilityConstants.FormatHwp + && operation == HwpCapabilityConstants.OperationReplaceText) + { + result["safeInPlace"] = new JsonObject + { + ["support"] = HwpCapabilityConstants.StatusExperimental, + ["ready"] = IsSafeInPlaceReady(capability), + ["requires"] = ToJsonArray(["--in-place", "--backup", "--verify"]), + ["example"] = "officecli set input.hwp /text --prop find=마케팅 --prop value=브릿지 --in-place --backup --verify --json", + ["policy"] = "creates temp output, provider readback, semantic delta, backup, manifest, then atomic replace" + }; + } + return result; + } + + private static bool IsReady(HwpOperationCapability capability) + { + if (capability.Status == HwpOperationStatus.Unsupported) + return false; + return capability.UnsupportedReason is null + or HwpCapabilityConstants.ReasonRoundTripUnverified; + } + + private static bool IsSafeInPlaceReady(HwpOperationCapability capability) + { + if (!IsReady(capability)) + return false; + + return HwpRuntimeProbe.Probe().MutationAvailable; + } + + private static IEnumerable BuildBlockedBy(HwpOperationCapability capability) + { + if (capability.UnsupportedReason is null + or HwpCapabilityConstants.ReasonRoundTripUnverified) + yield break; + yield 
return capability.UnsupportedReason; + } + + private static IEnumerable<string> BuildSetupHints(HwpFormatCapability capability) + { + if (capability.Operations.Values.Any(op => + op.UnsupportedReason is HwpCapabilityConstants.ReasonBridgeNotEnabled + or HwpCapabilityConstants.ReasonBridgeMissing + or HwpCapabilityConstants.ReasonRhwpRuntimeMissing + or HwpCapabilityConstants.ReasonRhwpApiMissing + or HwpCapabilityConstants.ReasonRhwpApiMissingOrTooOld)) + { + yield return "run ./dev-install.sh to install rhwp sidecars beside officecli"; + yield return "export OFFICECLI_RHWP_BIN=/path/to/rhwp"; + yield return "export OFFICECLI_RHWP_BRIDGE_PATH=/path/to/rhwp-officecli-bridge.dll"; + yield return "export OFFICECLI_RHWP_API_BIN=/path/to/rhwp-field-bridge"; + yield return "officecli help hwp"; + } + } + + private static IEnumerable<string> BuildRequiredArgs(string operation) + { + switch (operation) + { + case HwpCapabilityConstants.OperationReadField: + yield return "field-name|field-id"; + break; + case HwpCapabilityConstants.OperationFillField: + yield return "name|id"; + yield return "value"; + yield return "output"; + break; + case HwpCapabilityConstants.OperationReplaceText: + yield return "find"; + yield return "value"; + yield return "output|--in-place"; + yield return "--backup when --in-place"; + yield return "--verify when --in-place"; + break; + case HwpCapabilityConstants.OperationInsertText: + yield return "value"; + yield return "output"; + yield return "section? default 0"; + yield return "paragraph? default 0"; + yield return "offset? default 0"; + break; + case HwpCapabilityConstants.OperationRenderPng: + case HwpCapabilityConstants.OperationExportMarkdown: + case HwpCapabilityConstants.OperationDumpPages: + yield return "page? default all"; + break; + case HwpCapabilityConstants.OperationExportPdf: + yield return "output"; + yield return "page? 
default all"; + break; + case HwpCapabilityConstants.OperationDumpControls: + yield break; + case HwpCapabilityConstants.OperationThumbnail: + yield return "output"; + break; + case HwpCapabilityConstants.OperationReadTableCell: + yield return "section"; + yield return "parent-para"; + yield return "control"; + yield return "cell"; + yield return "cell-para"; + break; + case HwpCapabilityConstants.OperationScanCells: + yield return "section? default 0"; + yield return "max-parent-para? default 50"; + break; + case HwpCapabilityConstants.OperationSetTableCell: + yield return "section"; + yield return "parent-para"; + yield return "control"; + yield return "cell"; + yield return "value"; + yield return "output"; + break; + case HwpCapabilityConstants.OperationConvertToEditable: + case HwpCapabilityConstants.OperationSaveAsHwp: + yield return "output"; + break; + case HwpCapabilityConstants.OperationNativeRead: + yield return "op"; + yield return "native-arg? key=value"; + break; + case HwpCapabilityConstants.OperationNativeMutation: + yield return "op"; + yield return "output"; + yield return "operation-specific props"; + break; + } + } + + private static string? 
BuildExample(string operation) + => operation switch + { + HwpCapabilityConstants.OperationReadText => "officecli view input.hwp text --json", + HwpCapabilityConstants.OperationRenderSvg => "officecli view input.hwp svg --page 1 --json", + HwpCapabilityConstants.OperationRenderPng => "officecli view input.hwp png --page 1 --out /tmp/hwp-png --json", + HwpCapabilityConstants.OperationExportPdf => "officecli view input.hwp pdf --page 1 --out output.pdf --json", + HwpCapabilityConstants.OperationExportMarkdown => "officecli view input.hwp markdown --json", + HwpCapabilityConstants.OperationThumbnail => "officecli view input.hwp thumbnail --out thumb.png --json", + HwpCapabilityConstants.OperationDocumentInfo => "officecli view input.hwp info --json", + HwpCapabilityConstants.OperationDiagnostics => "officecli view input.hwp diagnostics --json", + HwpCapabilityConstants.OperationDumpControls => "officecli view input.hwp dump --json", + HwpCapabilityConstants.OperationDumpPages => "officecli view input.hwp pages --page 1 --json", + HwpCapabilityConstants.OperationListFields => "officecli view input.hwp fields --json", + HwpCapabilityConstants.OperationReadField => "officecli view input.hwp field --field-name 회사명 --json", + HwpCapabilityConstants.OperationFillField => "officecli set input.hwp /field --prop name=회사명 --prop value=리지 --prop output=output.hwp --json", + HwpCapabilityConstants.OperationReplaceText => "officecli set input.hwp /text --prop find=마케팅 --prop value=브릿지 --prop output=output.hwp --json", + HwpCapabilityConstants.OperationInsertText => "officecli add input.hwp /text --type text --prop value=본문 --prop output=output.hwp --json", + HwpCapabilityConstants.OperationReadTableCell => "officecli view input.hwp table-cell --section 0 --parent-para 3 --control 0 --cell 0 --cell-para 0 --json", + HwpCapabilityConstants.OperationScanCells => "officecli view input.hwp tables --section 0 --json", + HwpCapabilityConstants.OperationSetTableCell => "officecli set 
input.hwp /table/cell --prop section=0 --prop parent-para=3 --prop control=0 --prop cell=0 --prop value=오피스셀 --prop output=output.hwp --json", + HwpCapabilityConstants.OperationCreateBlank => "officecli create output.hwp --json", + HwpCapabilityConstants.OperationConvertToEditable => "officecli set input.hwp /convert-to-editable --prop output=editable.hwp --json", + HwpCapabilityConstants.OperationNativeRead => "officecli view input.hwp native --op get-style-list --json", + HwpCapabilityConstants.OperationNativeMutation => "officecli set input.hwp /native-op --prop op=split-paragraph --prop paragraph=0 --prop offset=5 --prop output=output.hwp --json", + HwpCapabilityConstants.OperationSaveAsHwp => "officecli set input.hwpx /save-as-hwp --prop output=output.hwp --json", + _ => null + }; + + private static string StatusToJson(HwpOperationStatus status) + { + return status switch + { + HwpOperationStatus.Unsupported => HwpCapabilityConstants.StatusUnsupported, + HwpOperationStatus.Experimental => HwpCapabilityConstants.StatusExperimental, + HwpOperationStatus.RoundTripVerified => HwpCapabilityConstants.StatusRoundTripVerified, + _ => throw new ArgumentOutOfRangeException(nameof(status), status, "Unknown HWP operation status") + }; + } + + internal static JsonArray ToJsonArray(IEnumerable<string?> values) + { + var array = new JsonArray(); + foreach (var value in values) + array.Add((JsonNode?)JsonValue.Create(value)); + return array; + } +} diff --git a/src/officecli/Handlers/Hwp/HwpCapabilityReport.cs b/src/officecli/Handlers/Hwp/HwpCapabilityReport.cs new file mode 100644 index 000000000..2312d0db6 --- /dev/null +++ b/src/officecli/Handlers/Hwp/HwpCapabilityReport.cs @@ -0,0 +1,260 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +using System.Text.Json.Nodes; + +namespace OfficeCli.Handlers.Hwp; + +public enum HwpFormat +{ + Hwp, + Hwpx +} + +public enum HwpEngineMode +{ + None, + Default, + Experimental +} + +public enum 
HwpOperationStatus +{ + Unsupported, + Experimental, + RoundTripVerified +} + +public static class HwpCapabilityConstants +{ + public const int SchemaVersion = 2; + + public const string FormatHwp = "hwp"; + public const string FormatHwpx = "hwpx"; + + public const string EngineCustom = "custom"; + public const string EngineRhwpBridge = "rhwp-bridge"; + public const string EngineNone = "none"; + + public const string ModeNone = "none"; + public const string ModeDefault = "default"; + public const string ModeExperimental = "experimental"; + + public const string StatusUnsupported = "unsupported"; + public const string StatusExperimental = "experimental"; + public const string StatusRoundTripVerified = "roundtrip-verified"; + + public const string WriteStatusUnsupported = "unsupported"; + public const string WriteStatusOperationGated = "operation-gated"; + + public const string OperationReadText = "read_text"; + public const string OperationRenderSvg = "render_svg"; + public const string OperationRenderPng = "render_png"; + public const string OperationExportPdf = "export_pdf"; + public const string OperationExportMarkdown = "export_markdown"; + public const string OperationThumbnail = "thumbnail"; + public const string OperationDocumentInfo = "document_info"; + public const string OperationDiagnostics = "diagnostics"; + public const string OperationDumpControls = "dump_controls"; + public const string OperationDumpPages = "dump_pages"; + public const string OperationListFields = "list_fields"; + public const string OperationReadField = "read_field"; + public const string OperationFillField = "fill_field"; + public const string OperationReplaceText = "replace_text"; + public const string OperationInsertText = "insert_text"; + public const string OperationReadTableCell = "read_table_cell"; + public const string OperationScanCells = "scan_cells"; + public const string OperationSetTableCell = "set_table_cell"; + public const string OperationCreateBlank = "create_blank"; 
+ public const string OperationSaveOriginal = "save_original"; + public const string OperationConvertToEditable = "convert_to_editable"; + public const string OperationNativeRead = "native_read"; + public const string OperationNativeMutation = "native_mutation"; + public const string OperationSaveAsHwp = "save_as_hwp"; + + public const string ReasonUnsupportedFormat = "unsupported_format"; + public const string ReasonUnsupportedOperation = "unsupported_operation"; + public const string ReasonUnsupportedEngine = "unsupported_engine"; + public const string ReasonRoundTripUnverified = "roundtrip_unverified"; + public const string ReasonBridgeNotEnabled = "bridge_not_enabled"; + public const string ReasonBridgeMissing = "bridge_missing"; + public const string ReasonBridgeTimeout = "bridge_timeout"; + public const string ReasonBridgeInvalidJson = "bridge_invalid_json"; + public const string ReasonBridgeExitNonZero = "bridge_exit_nonzero"; + public const string ReasonRhwpRuntimeMissing = "rhwp_runtime_missing"; + public const string ReasonRhwpApiMissing = "rhwp_api_missing"; + public const string ReasonRhwpApiMissingOrTooOld = "rhwp_api_missing_or_too_old"; + public const string ReasonBinaryHwpMutationForbidden = "binary_hwp_mutation_forbidden"; + public const string ReasonBinaryHwpWriteForbidden = "binary_hwp_write_forbidden"; + public const string ReasonFixtureValidationFailed = "fixture_validation_failed"; + public const string ReasonCapabilitySchemaInvalid = "capability_schema_invalid"; +} + +public sealed record HwpOperationCapability( + HwpOperationStatus Status, + string Engine, + string? EngineVersion, + string[] Evidence, + string[] Warnings, + string? 
UnsupportedReason +); + +public sealed record HwpFormatCapability( + string ReadStatus, + string WriteStatus, + string DefaultEngine, + IReadOnlyDictionary<string, HwpOperationCapability> Operations, + string[] Warnings +); + +public sealed record HwpCapabilityReport( + int SchemaVersion, + string OfficeCliVersion, + DateTimeOffset GeneratedAt, + IReadOnlyDictionary<string, HwpFormatCapability> Formats +); + +public sealed record HwpReadRequest(HwpFormat Format, string InputPath, long InputSizeBytes, bool Json); +public sealed record HwpTextPage(int Page, string Text); +public sealed record HwpTextResult( + string Text, + IReadOnlyList<HwpTextPage> Pages, + string Engine, + string? EngineVersion, + string[] Evidence, + string[] Warnings +); + +public sealed record HwpRenderRequest( + HwpFormat Format, + string InputPath, + string OutputDirectory, + string PageSelector, + long InputSizeBytes, + bool Json +); + +public sealed record HwpRenderedPage(int Page, string SvgPath, string Sha256); +public sealed record HwpRenderResult( + IReadOnlyList<HwpRenderedPage> Pages, + string ManifestPath, + string Engine, + string? EngineVersion, + string[] Evidence, + string[] Warnings +); + +public sealed record HwpJsonViewRequest( + HwpFormat Format, + string InputPath, + long InputSizeBytes, + string Operation, + string BridgeCommand, + IReadOnlyDictionary<string, string> Args, + bool Json +); + +public sealed record HwpJsonViewResult( + JsonObject Data, + string Engine, + string? EngineVersion, + string[] Evidence, + string[] Warnings +); + +public sealed record HwpFieldListRequest(HwpFormat Format, string InputPath, long InputSizeBytes, bool Json); +public sealed record HwpFieldReadRequest( + HwpFormat Format, + string InputPath, + string? FieldName, + int? FieldId, + long InputSizeBytes, + bool Json +); +public sealed record HwpFieldListResult( + JsonObject Fields, + string Engine, + string? EngineVersion, + string[] Evidence, + string[] Warnings +); +public sealed record HwpFieldReadResult( + JsonObject Field, + string Engine, + string? 
EngineVersion, + string[] Evidence, + string[] Warnings +); + +public sealed record HwpFillFieldRequest( + HwpFormat Format, + string InputPath, + string OutputPath, + IReadOnlyDictionary<string, string> Fields, + bool Json +) +{ + public IReadOnlyDictionary<int, string> FieldIds { get; init; } = new Dictionary<int, string>(); +} + +public sealed record HwpReplaceTextRequest( + HwpFormat Format, + string InputPath, + string OutputPath, + string Query, + string Value, + string Mode, + bool CaseSensitive, + bool InPlace, + bool Backup, + bool Verify, + bool Json +); + +public sealed record HwpInsertTextRequest( + HwpFormat Format, + string InputPath, + string OutputPath, + int Section, + int Paragraph, + int Offset, + string Value, + bool Json +); + +public sealed record HwpTableCellSetRequest( + HwpFormat Format, + string InputPath, + string OutputPath, + int Section, + int ParentParagraph, + int Control, + int Cell, + int CellParagraph, + int Offset, + int? Count, + string Value, + bool Json +); + +public sealed record HwpSaveOriginalRequest(HwpFormat Format, string InputPath, string OutputPath, bool Json); +public sealed record HwpConvertToEditableRequest(HwpFormat Format, string InputPath, string OutputPath, bool Json); +public sealed record HwpNativeMutationRequest( + HwpFormat Format, + string InputPath, + string OutputPath, + string Operation, + IReadOnlyDictionary<string, string> Args, + bool Json +); +public sealed record HwpSaveAsHwpRequest(HwpFormat Format, string InputPath, string OutputPath, bool Json); +public sealed record HwpMutationResult( + string OutputPath, + string Engine, + string? EngineVersion, + string[] Evidence, + string[] Warnings +) +{ + public JsonObject? 
Transaction { get; init; } +} diff --git a/src/officecli/Handlers/Hwp/HwpEngineError.cs b/src/officecli/Handlers/Hwp/HwpEngineError.cs new file mode 100644 index 000000000..18430b93e --- /dev/null +++ b/src/officecli/Handlers/Hwp/HwpEngineError.cs @@ -0,0 +1,53 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +namespace OfficeCli.Handlers.Hwp; + +using System.Text.Json.Nodes; + +public sealed record OfficeCliJsonEnvelope<T>( + bool Success, + T? Data, + string[] Warnings, + OfficeCliError? Error +); + +public sealed record OfficeCliError( + string Error, + string Code, + string? Suggestion, + string[] ValidValues, + string? Format, + string? Operation, + string? Engine, + string? EngineMode +); + +public sealed class HwpEngineException : Exception +{ + public HwpEngineException( + string message, + string code, + string? suggestion = null, + string[]? validValues = null, + string? format = null, + string? operation = null, + string? engine = null, + string? engineMode = null, + JsonObject? transaction = null) : base(message) + { + Error = new OfficeCliError( + message, + code, + suggestion, + validValues ?? [], + format, + operation, + engine, + engineMode); + Transaction = transaction; + } + + public OfficeCliError Error { get; } + public JsonObject? Transaction { get; } +} diff --git a/src/officecli/Handlers/Hwp/HwpEngineSelector.cs b/src/officecli/Handlers/Hwp/HwpEngineSelector.cs new file mode 100644 index 000000000..10be23c47 --- /dev/null +++ b/src/officecli/Handlers/Hwp/HwpEngineSelector.cs @@ -0,0 +1,206 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +namespace OfficeCli.Handlers.Hwp; + +/// <summary> +/// Selects the active HWP engine based on the OFFICECLI_HWP_ENGINE environment variable. +/// Default: CustomHwpxEngine (existing XML-first behavior). +/// Experimental: RhwpBridgeEngine when OFFICECLI_HWP_ENGINE=rhwp-experimental. 
+/// </summary> +public static class HwpEngineSelector +{ + private const string EnvVarName = "OFFICECLI_HWP_ENGINE"; + private const string ExperimentalValue = "rhwp-experimental"; + + public static bool IsExperimentalBridgeEnabled() + => string.Equals( + Environment.GetEnvironmentVariable(EnvVarName), + ExperimentalValue, + StringComparison.OrdinalIgnoreCase); + + internal static bool CanUseInstalledRuntime(string? format = null, string? operation = null) + { + var runtime = HwpRuntimeProbe.Probe(); + return IsBridgeFormat(format) && OperationRuntimeAvailable(runtime, operation); + } + + /// <summary> + /// Returns the active engine. Throws HwpEngineException with bridge_missing + /// if rhwp-experimental is requested but the bridge executable is not found. + /// </summary> + public static IHwpEngine GetEngine(string? format = null, string? operation = null) + { + var runtime = HwpRuntimeProbe.Probe(); + var explicitBridge = IsExperimentalBridgeEnabled(); + var autoRuntime = IsBridgeFormat(format) && OperationRuntimeAvailable(runtime, operation); + if (!explicitBridge && !autoRuntime) + { + if (string.Equals(format, HwpCapabilityConstants.FormatHwp, StringComparison.OrdinalIgnoreCase) + && IsBridgeBackedOperation(operation)) + { + throw new HwpEngineException( + "Binary .hwp operation requires packaged rhwp sidecars or OFFICECLI_HWP_ENGINE=rhwp-experimental.", + HwpCapabilityConstants.ReasonBridgeNotEnabled, + "Run ./dev-install.sh or set OFFICECLI_RHWP_BRIDGE_PATH and OFFICECLI_RHWP_API_BIN.", + [], + format, + operation, + engine: HwpCapabilityConstants.EngineNone, + engineMode: HwpCapabilityConstants.ModeNone); + } + return new CustomHwpxEngine(); + } + + if (runtime.BridgePath == null) + throw new HwpEngineException( + "rhwp-officecli-bridge is not available.", + HwpCapabilityConstants.ReasonBridgeMissing, + "Run `officecli hwp doctor --json`; install sidecars with ./dev-install.sh or set OFFICECLI_RHWP_BRIDGE_PATH.", + [], + format, + operation, + engine: 
HwpCapabilityConstants.EngineRhwpBridge, + engineMode: HwpCapabilityConstants.ModeExperimental); + + var runtimeReason = explicitBridge && !OperationRequiresApiCommand(operation) + ? null + : RuntimeBlockedReason(runtime, operation); + if (runtimeReason != null) + throw new HwpEngineException( + $"rhwp runtime is not available for operation '{operation ?? "unknown"}'.", + runtimeReason, + "Run `officecli hwp doctor --json`; install sidecars with ./dev-install.sh or set explicit rhwp environment paths.", + [], + format, + operation, + engine: HwpCapabilityConstants.EngineRhwpBridge, + engineMode: HwpCapabilityConstants.ModeExperimental); + + var bridge = RhwpBridgeEngine.TryCreate(out var missingReason); + if (bridge != null) + return bridge; + + throw new HwpEngineException( + $"OFFICECLI_HWP_ENGINE=rhwp-experimental requested but bridge not available: {missingReason}", + HwpCapabilityConstants.ReasonBridgeMissing, + "Run `officecli help hwp`; set OFFICECLI_RHWP_BRIDGE_PATH and, for fields/text/table mutation, OFFICECLI_RHWP_API_BIN.", + [], + format, + operation, + engine: HwpCapabilityConstants.EngineRhwpBridge, + engineMode: HwpCapabilityConstants.ModeExperimental); + } + + private static bool IsBridgeFormat(string? format) + => string.Equals(format, HwpCapabilityConstants.FormatHwp, StringComparison.OrdinalIgnoreCase) + || string.Equals(format, HwpCapabilityConstants.FormatHwpx, StringComparison.OrdinalIgnoreCase); + + private static bool IsBridgeBackedOperation(string? 
operation) + => operation is HwpCapabilityConstants.OperationReadText + or HwpCapabilityConstants.OperationRenderSvg + or HwpCapabilityConstants.OperationRenderPng + or HwpCapabilityConstants.OperationExportPdf + or HwpCapabilityConstants.OperationExportMarkdown + or HwpCapabilityConstants.OperationThumbnail + or HwpCapabilityConstants.OperationDocumentInfo + or HwpCapabilityConstants.OperationDiagnostics + or HwpCapabilityConstants.OperationDumpControls + or HwpCapabilityConstants.OperationDumpPages + or HwpCapabilityConstants.OperationListFields + or HwpCapabilityConstants.OperationReadField + or HwpCapabilityConstants.OperationFillField + or HwpCapabilityConstants.OperationReplaceText + or HwpCapabilityConstants.OperationInsertText + or HwpCapabilityConstants.OperationReadTableCell + or HwpCapabilityConstants.OperationScanCells + or HwpCapabilityConstants.OperationSetTableCell + or HwpCapabilityConstants.OperationConvertToEditable + or HwpCapabilityConstants.OperationNativeRead + or HwpCapabilityConstants.OperationNativeMutation + or HwpCapabilityConstants.OperationSaveAsHwp; + + private static bool OperationRuntimeAvailable(HwpRuntimeProbeResult runtime, string? 
operation) + => operation switch + { + HwpCapabilityConstants.OperationReadText or HwpCapabilityConstants.OperationRenderSvg + => runtime.ReadRenderAvailable, + HwpCapabilityConstants.OperationRenderPng + => runtime.RenderPngAvailable, + HwpCapabilityConstants.OperationExportPdf + => runtime.ExportPdfAvailable, + HwpCapabilityConstants.OperationExportMarkdown + => runtime.ExportMarkdownAvailable, + HwpCapabilityConstants.OperationThumbnail + => runtime.ThumbnailAvailable, + HwpCapabilityConstants.OperationDocumentInfo + => runtime.DocumentInfoAvailable, + HwpCapabilityConstants.OperationDiagnostics + => runtime.DiagnosticsAvailable, + HwpCapabilityConstants.OperationDumpControls + => runtime.DumpControlsAvailable, + HwpCapabilityConstants.OperationDumpPages + => runtime.DumpPagesAvailable, + HwpCapabilityConstants.OperationListFields + => runtime.ListFieldsAvailable, + HwpCapabilityConstants.OperationReadField + => runtime.ReadFieldAvailable, + HwpCapabilityConstants.OperationFillField + => runtime.FillFieldAvailable, + HwpCapabilityConstants.OperationReplaceText + => runtime.ReplaceTextAvailable, + HwpCapabilityConstants.OperationSetTableCell + => runtime.SetTableCellAvailable, + HwpCapabilityConstants.OperationSaveAsHwp + => runtime.SaveAsHwpAvailable, + HwpCapabilityConstants.OperationConvertToEditable + => runtime.ConvertToEditableAvailable, + HwpCapabilityConstants.OperationNativeRead or HwpCapabilityConstants.OperationNativeMutation + => runtime.NativeOpAvailable, + HwpCapabilityConstants.OperationInsertText + => runtime.InsertTextAvailable, + HwpCapabilityConstants.OperationReadTableCell + => runtime.ReadTableCellAvailable, + HwpCapabilityConstants.OperationScanCells + => runtime.ScanCellsAvailable, + _ => runtime.BridgeAvailable + }; + + private static string? RuntimeBlockedReason(HwpRuntimeProbeResult runtime, string? 
operation) + { + if (OperationRuntimeAvailable(runtime, operation)) + return null; + if (!runtime.BridgeAvailable) + return HwpCapabilityConstants.ReasonBridgeMissing; + if (operation is HwpCapabilityConstants.OperationReadText or HwpCapabilityConstants.OperationRenderSvg + && !runtime.ApiAvailable && !runtime.RhwpAvailable) + return HwpCapabilityConstants.ReasonRhwpRuntimeMissing; + if (OperationRequiresApiCommand(operation) && !OperationRuntimeAvailable(runtime, operation)) + return HwpCapabilityConstants.ReasonRhwpApiMissingOrTooOld; + if (!runtime.ApiAvailable) + return HwpCapabilityConstants.ReasonRhwpApiMissing; + return HwpCapabilityConstants.ReasonBridgeMissing; + } + + private static bool OperationRequiresApiCommand(string? operation) + => operation is HwpCapabilityConstants.OperationListFields + or HwpCapabilityConstants.OperationReadField + or HwpCapabilityConstants.OperationFillField + or HwpCapabilityConstants.OperationReplaceText + or HwpCapabilityConstants.OperationInsertText + or HwpCapabilityConstants.OperationRenderPng + or HwpCapabilityConstants.OperationExportPdf + or HwpCapabilityConstants.OperationExportMarkdown + or HwpCapabilityConstants.OperationThumbnail + or HwpCapabilityConstants.OperationDocumentInfo + or HwpCapabilityConstants.OperationDiagnostics + or HwpCapabilityConstants.OperationDumpControls + or HwpCapabilityConstants.OperationDumpPages + or HwpCapabilityConstants.OperationReadTableCell + or HwpCapabilityConstants.OperationScanCells + or HwpCapabilityConstants.OperationSetTableCell + or HwpCapabilityConstants.OperationConvertToEditable + or HwpCapabilityConstants.OperationNativeRead + or HwpCapabilityConstants.OperationNativeMutation + or HwpCapabilityConstants.OperationSaveAsHwp; +} diff --git a/src/officecli/Handlers/Hwp/HwpRuntimeProbe.cs b/src/officecli/Handlers/Hwp/HwpRuntimeProbe.cs new file mode 100644 index 000000000..3ea3ba485 --- /dev/null +++ b/src/officecli/Handlers/Hwp/HwpRuntimeProbe.cs @@ -0,0 +1,195 @@ +// 
Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +using System.Diagnostics; + +namespace OfficeCli.Handlers.Hwp; + +internal sealed record HwpRuntimeProbeResult( + bool EngineRequested, + string? BridgePath, + string? ApiPath, + string? RhwpPath, + IReadOnlySet<string> ApiCommands) +{ + public bool BridgeAvailable => BridgePath != null; + public bool ApiAvailable => ApiPath != null; + public bool RhwpAvailable => RhwpPath != null; + public bool ReadRenderAvailable => BridgeAvailable && (ApiAvailable || RhwpAvailable); + public bool MutationAvailable => BridgeAvailable && ApiAvailable; + public bool CreateBlankAvailable => ApiAvailable; + public bool ListFieldsAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("list-fields"); + public bool ReadFieldAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("get-field"); + public bool FillFieldAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("set-field"); + public bool ReplaceTextAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("replace-text"); + public bool InsertTextAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("insert-text"); + public bool RenderPngAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("render-png"); + public bool ExportPdfAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("export-pdf"); + public bool ExportMarkdownAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("export-markdown"); + public bool ThumbnailAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("thumbnail"); + public bool DocumentInfoAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("document-info"); + public bool DiagnosticsAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("diagnostics"); + public bool DumpControlsAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("dump-controls"); + public bool 
DumpPagesAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("dump-pages"); + public bool ReadTableCellAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("get-cell-text"); + public bool ScanCellsAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("scan-cells"); + public bool SetTableCellAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("set-cell-text"); + public bool ConvertToEditableAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("convert-to-editable"); + public bool NativeOpAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("native-op"); + public bool SaveAsHwpAvailable => BridgeAvailable && ApiAvailable && ApiCommands.Contains("save-as-hwp"); +} + +internal static class HwpRuntimeProbe +{ + private const string BridgeExecutableName = "rhwp-officecli-bridge"; + private const string ApiExecutableName = "rhwp-field-bridge"; + private const string RhwpExecutableName = "rhwp"; + private static readonly string[] KnownApiCommands = + [ + "create-blank", + "read-text", + "render-svg", + "render-png", + "export-pdf", + "export-markdown", + "document-info", + "diagnostics", + "dump-controls", + "dump-pages", + "thumbnail", + "list-fields", + "get-field", + "set-field", + "replace-text", + "insert-text", + "get-cell-text", + "scan-cells", + "set-cell-text", + "convert-to-editable", + "native-op", + "save-as-hwp" + ]; + + public static HwpRuntimeProbeResult Probe() + { + var apiPath = DiscoverApiPath(); + return new( + HwpEngineSelector.IsExperimentalBridgeEnabled(), + DiscoverBridgePath(), + apiPath, + DiscoverRhwpPath(), + DiscoverApiCommands(apiPath)); + } + + public static string? DiscoverBridgePath() + => DiscoverExecutable( + Environment.GetEnvironmentVariable("OFFICECLI_RHWP_BRIDGE_PATH"), + CandidateNames(BridgeExecutableName, includeDll: true)); + + public static string? 
DiscoverApiPath() + => DiscoverExecutable( + Environment.GetEnvironmentVariable("OFFICECLI_RHWP_API_BIN"), + CandidateNames(ApiExecutableName, includeDll: false)); + + public static string? DiscoverRhwpPath() + => DiscoverExecutable( + Environment.GetEnvironmentVariable("OFFICECLI_RHWP_BIN"), + CandidateNames(RhwpExecutableName, includeDll: false)); + + private static string[] CandidateNames(string baseName, bool includeDll) + { + var names = new List { baseName }; + if (OperatingSystem.IsWindows()) + names.Add(baseName + ".exe"); + if (includeDll) + names.Add(baseName + ".dll"); + return names.ToArray(); + } + + private static string? DiscoverExecutable(string? explicitPath, string[] names) + { + if (!string.IsNullOrWhiteSpace(explicitPath)) + return File.Exists(explicitPath) ? explicitPath : null; + + foreach (var dir in CandidateDirectories()) + { + foreach (var name in names) + { + var candidate = Path.Combine(dir, name); + if (File.Exists(candidate)) return candidate; + } + } + + var pathEnv = Environment.GetEnvironmentVariable("PATH") ?? ""; + foreach (var dir in pathEnv.Split(Path.PathSeparator)) + { + if (string.IsNullOrWhiteSpace(dir)) continue; + foreach (var name in names) + { + var candidate = Path.Combine(dir, name); + if (File.Exists(candidate)) return candidate; + } + } + + return null; + } + + private static IReadOnlySet DiscoverApiCommands(string? 
apiPath) + { + var commands = new HashSet(StringComparer.Ordinal); + if (string.IsNullOrWhiteSpace(apiPath) || !File.Exists(apiPath)) + return commands; + + try + { + using var process = new Process + { + StartInfo = new ProcessStartInfo + { + FileName = apiPath, + UseShellExecute = false, + RedirectStandardOutput = true, + RedirectStandardError = true, + CreateNoWindow = true + } + }; + process.StartInfo.ArgumentList.Add("--help"); + process.Start(); + if (!process.WaitForExit(2_000)) + { + try { process.Kill(); } catch { } + return commands; + } + var stdout = process.StandardOutput.ReadToEnd(); + + foreach (var command in KnownApiCommands) + { + if (stdout.Contains(command, StringComparison.Ordinal)) + commands.Add(command); + } + } + catch + { + return commands; + } + + return commands; + } + + private static IEnumerable CandidateDirectories() + { + var seen = new HashSet(StringComparer.Ordinal); + foreach (var dir in new[] + { + AppContext.BaseDirectory, + Path.GetDirectoryName(Environment.ProcessPath ?? ""), + Path.GetDirectoryName(Process.GetCurrentProcess().MainModule?.FileName ?? ""), + Directory.GetCurrentDirectory() + }) + { + if (string.IsNullOrWhiteSpace(dir)) continue; + var full = Path.GetFullPath(dir); + if (seen.Add(full)) yield return full; + } + } +} diff --git a/src/officecli/Handlers/Hwp/IHwpEngine.cs b/src/officecli/Handlers/Hwp/IHwpEngine.cs new file mode 100644 index 000000000..e557cc40d --- /dev/null +++ b/src/officecli/Handlers/Hwp/IHwpEngine.cs @@ -0,0 +1,26 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +namespace OfficeCli.Handlers.Hwp; + +public interface IHwpEngine +{ + string Name { get; } + string? 
Version { get; } + HwpEngineMode Mode { get; } + + Task GetCapabilitiesAsync(CancellationToken ct); + Task ReadTextAsync(HwpReadRequest request, CancellationToken ct); + Task RenderSvgAsync(HwpRenderRequest request, CancellationToken ct); + Task ViewJsonAsync(HwpJsonViewRequest request, CancellationToken ct); + Task ListFieldsAsync(HwpFieldListRequest request, CancellationToken ct); + Task ReadFieldAsync(HwpFieldReadRequest request, CancellationToken ct); + Task FillFieldAsync(HwpFillFieldRequest request, CancellationToken ct); + Task ReplaceTextAsync(HwpReplaceTextRequest request, CancellationToken ct); + Task InsertTextAsync(HwpInsertTextRequest request, CancellationToken ct); + Task SetTableCellAsync(HwpTableCellSetRequest request, CancellationToken ct); + Task SaveOriginalAsync(HwpSaveOriginalRequest request, CancellationToken ct); + Task ConvertToEditableAsync(HwpConvertToEditableRequest request, CancellationToken ct); + Task NativeMutationAsync(HwpNativeMutationRequest request, CancellationToken ct); + Task SaveAsHwpAsync(HwpSaveAsHwpRequest request, CancellationToken ct); +} diff --git a/src/officecli/Handlers/Hwp/RhwpBridgeEngine.Mutation.cs b/src/officecli/Handlers/Hwp/RhwpBridgeEngine.Mutation.cs new file mode 100644 index 000000000..cdce2fe54 --- /dev/null +++ b/src/officecli/Handlers/Hwp/RhwpBridgeEngine.Mutation.cs @@ -0,0 +1,638 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +using System.Security.Cryptography; +using System.Text.Json.Nodes; +using OfficeCli.Handlers.Hwpx.Validation; +using OfficeCli.Handlers.Hwp.SafeSave; + +namespace OfficeCli.Handlers.Hwp; + +public sealed partial class RhwpBridgeEngine +{ + public async Task FillFieldAsync(HwpFillFieldRequest request, CancellationToken ct) + { + if (request.Fields.Count == 0 && request.FieldIds.Count == 0) + throw new HwpEngineException( + "fill_field requires at least one field.", + HwpCapabilityConstants.ReasonUnsupportedOperation, + "Pass one or more 
field name/value pairs.", + [HwpCapabilityConstants.OperationFillField], + FormatKey(request.Format), + HwpCapabilityConstants.OperationFillField, + HwpCapabilityConstants.EngineRhwpBridge, + HwpCapabilityConstants.ModeExperimental); + + var formatArg = FormatKey(request.Format); + var currentInput = request.InputPath; + var tempFiles = new List(); + string? lastOutputJson = null; + + try + { + var index = 0; + foreach (var field in request.Fields) + { + index++; + var output = index == request.Fields.Count + request.FieldIds.Count + ? request.OutputPath + : Path.Combine( + Path.GetTempPath(), + $"officecli-rhwp-field-{Guid.NewGuid():N}{Path.GetExtension(request.OutputPath)}"); + if (index != request.Fields.Count) tempFiles.Add(output); + + var args = new[] + { + "set-field", "--format", formatArg, "--input", currentInput, + "--output", output, "--name", field.Key, "--value", field.Value, "--json" + }; + lastOutputJson = await RunBridgeAsync(args, RenderSvgTimeoutMs, ct); + currentInput = output; + } + foreach (var field in request.FieldIds) + { + index++; + var output = index == request.Fields.Count + request.FieldIds.Count + ? request.OutputPath + : Path.Combine( + Path.GetTempPath(), + $"officecli-rhwp-field-{Guid.NewGuid():N}{Path.GetExtension(request.OutputPath)}"); + if (index != request.Fields.Count + request.FieldIds.Count) tempFiles.Add(output); + + var args = new[] + { + "set-field", "--format", formatArg, "--input", currentInput, + "--output", output, "--id", field.Key.ToString(), "--value", field.Value, "--json" + }; + lastOutputJson = await RunBridgeAsync(args, RenderSvgTimeoutMs, ct); + currentInput = output; + } + } + finally + { + foreach (var tempFile in tempFiles) + try { File.Delete(tempFile); } catch { /* best effort */ } + } + + EnsureOutputExists(request.OutputPath); + return ParseMutationResult( + lastOutputJson ?? 
"{}", + request.OutputPath, + "set-field", + "rhwp-api set-field output file created; caller must verify round-trip before production use."); + } + + public async Task ReplaceTextAsync(HwpReplaceTextRequest request, CancellationToken ct) + { + if (string.IsNullOrEmpty(request.Query)) + throw new HwpEngineException( + "replace_text requires a non-empty query.", + HwpCapabilityConstants.ReasonUnsupportedOperation, + "Pass --prop find=.", + [HwpCapabilityConstants.OperationReplaceText], + FormatKey(request.Format), + HwpCapabilityConstants.OperationReplaceText, + HwpCapabilityConstants.EngineRhwpBridge, + HwpCapabilityConstants.ModeExperimental); + + var formatArg = FormatKey(request.Format); + string? outputJson = null; + var sourceHash = ComputeSha256(request.InputPath); + var runner = new SafeSaveRunner(); + var transaction = await runner.RunAsync( + new SafeSaveOptions( + request.InputPath, + request.OutputPath, + request.InPlace, + request.Backup, + request.Verify, + HwpCapabilityConstants.OperationReplaceText, + formatArg, + BuildReplaceTextSafeSavePolicy(request.Format)), + async tempPath => + { + var args = new List + { + "replace-text", "--format", formatArg, "--input", request.InputPath, + "--output", tempPath, "--query", request.Query, + "--value", request.Value, "--mode", request.Mode, "--json" + }; + if (request.CaseSensitive) + { + args.Add("--case-sensitive"); + args.Add("true"); + } + + outputJson = await RunBridgeAsync(args.ToArray(), RenderSvgTimeoutMs, ct).ConfigureAwait(false); + EnsureOutputExists(tempPath); + }, + tempPath => ValidateReplaceTextAsync(request, tempPath, sourceHash, ct), + ct).ConfigureAwait(false); + + if (!transaction.Ok) + { + throw new HwpEngineException( + "replace_text safe-save transaction failed.", + HwpCapabilityConstants.ReasonFixtureValidationFailed, + "Inspect transaction checks and retry with a separate output path.", + [HwpCapabilityConstants.OperationReplaceText], + FormatKey(request.Format), + 
HwpCapabilityConstants.OperationReplaceText, + HwpCapabilityConstants.EngineRhwpBridge, + HwpCapabilityConstants.ModeExperimental, + SafeSaveJsonMapper.ToJson(transaction)); + } + + EnsureOutputExists(request.OutputPath); + var result = ParseMutationResult( + outputJson ?? "{}", + request.OutputPath, + "replace-text", + "rhwp-api replace-text output file created; caller must verify round-trip before production use."); + return result with { Transaction = SafeSaveJsonMapper.ToJson(transaction) }; + } + + private async Task ValidateReplaceTextAsync( + HwpReplaceTextRequest request, + string tempPath, + string sourceHash, + CancellationToken ct) + { + var checks = new List(); + var semanticDelta = new Dictionary(StringComparer.Ordinal) + { + ["query"] = request.Query, + ["replacement"] = request.Value, + ["mode"] = request.Mode + }; + + try + { + var sourceText = await ReadTextOnlyAsync(request.Format, request.InputPath, ct).ConfigureAwait(false); + var outputText = await ReadTextOnlyAsync(request.Format, tempPath, ct).ConfigureAwait(false); + checks.Add(new SafeSaveCheck( + "provider-readback", + true, + "info", + null, + new Dictionary + { + ["sourceLength"] = sourceText.Length, + ["outputLength"] = outputText.Length + })); + + var comparison = request.CaseSensitive + ? StringComparison.Ordinal + : StringComparison.OrdinalIgnoreCase; + var sourceOldCount = CountOccurrences(sourceText, request.Query, comparison); + var outputOldCount = CountOccurrences(outputText, request.Query, comparison); + var outputNewCount = string.IsNullOrEmpty(request.Value) + ? 0 + : CountOccurrences(outputText, request.Value, comparison); + var valueContainsQuery = !string.IsNullOrEmpty(request.Value) + && request.Value.Contains(request.Query, comparison); + var allMode = string.Equals(request.Mode, "all", StringComparison.OrdinalIgnoreCase); + var oldTextChanged = valueContainsQuery + ? !string.Equals(sourceText, outputText, StringComparison.Ordinal) + : allMode + ? 
outputOldCount == 0 + : outputOldCount == Math.Max(0, sourceOldCount - 1); + var newTextPresent = string.IsNullOrEmpty(request.Value) || outputNewCount > 0; + var semanticOk = sourceOldCount > 0 + && oldTextChanged + && newTextPresent + && !string.Equals(sourceText, outputText, StringComparison.Ordinal); + + semanticDelta["sourceOldCount"] = sourceOldCount; + semanticDelta["outputOldCount"] = outputOldCount; + semanticDelta["outputNewCount"] = outputNewCount; + semanticDelta["changed"] = semanticOk; + + checks.Add(new SafeSaveCheck( + "semantic-delta", + semanticOk, + semanticOk ? "info" : "error", + semanticOk ? null : "Replacement output did not contain the expected semantic delta.", + semanticDelta)); + } + catch (Exception ex) + { + checks.Add(new SafeSaveCheck( + "provider-readback", + false, + "error", + ex.Message)); + } + + var currentSourceHash = ComputeSha256(request.InputPath); + var sourcePreserved = string.Equals(sourceHash, currentSourceHash, StringComparison.OrdinalIgnoreCase); + checks.Add(new SafeSaveCheck( + "source-preserved", + sourcePreserved, + sourcePreserved ? "info" : "error", + sourcePreserved ? null : "Source file changed before safe-save commit.", + new Dictionary + { + ["beforeSha256"] = sourceHash, + ["afterSha256"] = currentSourceHash + })); + + Dictionary? packageIntegrity = null; + if (request.Format == HwpFormat.Hwpx) + { + var packageResult = HwpxPackageValidator.Validate(tempPath); + checks.AddRange(packageResult.Checks); + packageIntegrity = new Dictionary(packageResult.PackageIntegrity, StringComparer.Ordinal); + } + + IReadOnlyDictionary? 
visualDelta = null; + if (request.Verify) + { + var visualResult = await ValidateVisualAsync(request.Format, tempPath, ct).ConfigureAwait(false); + checks.AddRange(visualResult.Checks); + visualDelta = visualResult.VisualDelta; + } + + return new SafeSaveValidationResult(checks, semanticDelta, visualDelta, packageIntegrity); + } + + private static SafeSavePolicy BuildReplaceTextSafeSavePolicy(HwpFormat format) + { + var required = new List + { + "temp-write", + "provider-readback", + "semantic-delta", + "source-preserved" + }; + if (format == HwpFormat.Hwpx) + required.Add("package-integrity"); + return SafeSavePolicy.OutputMode(required.ToArray()); + } + + private async Task ValidateVisualAsync( + HwpFormat format, + string tempPath, + CancellationToken ct) + { + var outputDir = Path.Combine( + Path.GetDirectoryName(tempPath) ?? Path.GetTempPath(), + $".{Path.GetFileNameWithoutExtension(tempPath)}.svg"); + try + { + var renderResult = await RenderSvgAsync( + new HwpRenderRequest( + format, + tempPath, + outputDir, + "1", + new FileInfo(tempPath).Length, + Json: false), + ct).ConfigureAwait(false); + return SafeSaveVisualValidator.FromRenderResult(renderResult); + } + catch (Exception ex) + { + return SafeSaveVisualValidator.FromFailure(ex); + } + } + + private async Task ReadTextOnlyAsync(HwpFormat format, string path, CancellationToken ct) + { + var result = await ReadTextAsync( + new HwpReadRequest(format, path, new FileInfo(path).Length, Json: true), + ct).ConfigureAwait(false); + return result.Text; + } + + private static int CountOccurrences(string text, string value, StringComparison comparison) + { + if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(value)) return 0; + var count = 0; + var index = 0; + while (index < text.Length) + { + var found = text.IndexOf(value, index, comparison); + if (found < 0) break; + count++; + index = found + value.Length; + } + return count; + } + + private static string ComputeSha256(string path) + { + using var stream = 
File.OpenRead(path); + return Convert.ToHexString(SHA256.HashData(stream)).ToLowerInvariant(); + } + + public async Task InsertTextAsync(HwpInsertTextRequest request, CancellationToken ct) + { + var formatArg = FormatKey(request.Format); + string? outputJson = null; + var sourceHash = ComputeSha256(request.InputPath); + var runner = new SafeSaveRunner(); + var transaction = await runner.RunAsync( + new SafeSaveOptions( + request.InputPath, + request.OutputPath, + InPlace: false, + Backup: false, + Verify: false, + HwpCapabilityConstants.OperationInsertText, + formatArg, + BuildInsertTextSafeSavePolicy(request.Format)), + async tempPath => + { + var args = new[] + { + "insert-text", "--format", formatArg, "--input", request.InputPath, + "--output", tempPath, + "--section", request.Section.ToString(), + "--paragraph", request.Paragraph.ToString(), + "--offset", request.Offset.ToString(), + "--value", request.Value, + "--json" + }; + + outputJson = await RunBridgeAsync(args, RenderSvgTimeoutMs, ct).ConfigureAwait(false); + EnsureOutputExists(tempPath); + }, + tempPath => ValidateInsertTextAsync(request, tempPath, sourceHash, ct), + ct).ConfigureAwait(false); + + if (!transaction.Ok) + { + throw new HwpEngineException( + "insert_text safe-save transaction failed.", + HwpCapabilityConstants.ReasonFixtureValidationFailed, + "Inspect transaction checks and retry with a separate output path.", + [HwpCapabilityConstants.OperationInsertText], + FormatKey(request.Format), + HwpCapabilityConstants.OperationInsertText, + HwpCapabilityConstants.EngineRhwpBridge, + HwpCapabilityConstants.ModeExperimental, + SafeSaveJsonMapper.ToJson(transaction)); + } + + EnsureOutputExists(request.OutputPath); + var result = ParseMutationResult( + outputJson ?? 
"{}", + request.OutputPath, + "insert-text", + $"rhwp-api insert-text output file created for {formatArg} input."); + return result with { Transaction = SafeSaveJsonMapper.ToJson(transaction) }; + } + + private async Task ValidateInsertTextAsync( + HwpInsertTextRequest request, + string tempPath, + string sourceHash, + CancellationToken ct) + { + var checks = new List(); + var semanticDelta = new Dictionary(StringComparer.Ordinal) + { + ["section"] = request.Section, + ["paragraph"] = request.Paragraph, + ["offset"] = request.Offset, + ["insertedTextLength"] = request.Value.Length + }; + + try + { + var sourceText = await ReadTextOnlyAsync(request.Format, request.InputPath, ct).ConfigureAwait(false); + var outputText = await ReadTextOnlyAsync(request.Format, tempPath, ct).ConfigureAwait(false); + checks.Add(new SafeSaveCheck( + "provider-readback", + true, + "info", + null, + new Dictionary + { + ["sourceLength"] = sourceText.Length, + ["outputLength"] = outputText.Length + })); + + var insertedPresent = !string.IsNullOrEmpty(request.Value) + && outputText.Contains(request.Value, StringComparison.Ordinal); + var changed = !string.Equals(sourceText, outputText, StringComparison.Ordinal); + var semanticOk = insertedPresent && changed; + semanticDelta["insertedTextPresent"] = insertedPresent; + semanticDelta["changed"] = changed; + + checks.Add(new SafeSaveCheck( + "semantic-delta", + semanticOk, + semanticOk ? "info" : "error", + semanticOk ? null : "Inserted text was not visible in provider readback.", + semanticDelta)); + } + catch (Exception ex) + { + checks.Add(new SafeSaveCheck( + "provider-readback", + false, + "error", + ex.Message)); + } + + var currentSourceHash = ComputeSha256(request.InputPath); + var sourcePreserved = string.Equals(sourceHash, currentSourceHash, StringComparison.OrdinalIgnoreCase); + checks.Add(new SafeSaveCheck( + "source-preserved", + sourcePreserved, + sourcePreserved ? "info" : "error", + sourcePreserved ? 
null : "Source file changed before safe-save commit.", + new Dictionary + { + ["beforeSha256"] = sourceHash, + ["afterSha256"] = currentSourceHash + })); + + Dictionary? packageIntegrity = null; + if (request.Format == HwpFormat.Hwpx) + { + var packageResult = HwpxPackageValidator.Validate(tempPath); + checks.AddRange(packageResult.Checks); + packageIntegrity = new Dictionary(packageResult.PackageIntegrity, StringComparer.Ordinal); + } + + return new SafeSaveValidationResult(checks, semanticDelta, VisualDelta: null, packageIntegrity); + } + + private static SafeSavePolicy BuildInsertTextSafeSavePolicy(HwpFormat format) + { + var required = new List + { + "temp-write", + "provider-readback", + "semantic-delta", + "source-preserved" + }; + if (format == HwpFormat.Hwpx) + required.Add("package-integrity"); + return SafeSavePolicy.OutputMode(required.ToArray()); + } + + public async Task SetTableCellAsync(HwpTableCellSetRequest request, CancellationToken ct) + { + var formatArg = FormatKey(request.Format); + + var args = new List + { + "set-cell-text", "--format", formatArg, + "--input", request.InputPath, "--output", request.OutputPath, + "--section", request.Section.ToString(), + "--parent-para", request.ParentParagraph.ToString(), + "--control", request.Control.ToString(), + "--cell", request.Cell.ToString(), + "--cell-para", request.CellParagraph.ToString(), + "--offset", request.Offset.ToString(), + "--value", request.Value, + "--json" + }; + if (request.Count.HasValue) + { + args.Add("--count"); + args.Add(request.Count.Value.ToString()); + } + + var outputJson = await RunBridgeAsync(args.ToArray(), RenderSvgTimeoutMs, ct); + EnsureOutputExists(request.OutputPath); + return ParseMutationResult( + outputJson, + request.OutputPath, + "set-cell-text", + $"rhwp-api set-cell-text output file created for {formatArg} input."); + } + + public Task SaveOriginalAsync(HwpSaveOriginalRequest request, CancellationToken ct) + => Task.FromException( + 
MutationUnsupported(request.Format, HwpCapabilityConstants.OperationSaveOriginal)); + + public async Task ConvertToEditableAsync(HwpConvertToEditableRequest request, CancellationToken ct) + { + var formatArg = FormatKey(request.Format); + var args = new[] + { + "convert-to-editable", "--format", formatArg, "--input", request.InputPath, + "--output", request.OutputPath, "--json" + }; + var outputJson = await RunBridgeAsync(args, RenderSvgTimeoutMs, ct).ConfigureAwait(false); + EnsureOutputExists(request.OutputPath); + return ParseMutationResult( + outputJson, + request.OutputPath, + "convert-to-editable", + "rhwp-api convert-to-editable output file created; caller must verify readback and Hancom before production use."); + } + + public async Task NativeMutationAsync(HwpNativeMutationRequest request, CancellationToken ct) + { + var formatArg = FormatKey(request.Format); + var args = new List + { + "native-op", "--format", formatArg, "--input", request.InputPath, + "--output", request.OutputPath, "--op", request.Operation, "--json" + }; + foreach (var (key, value) in request.Args) + { + if (string.IsNullOrWhiteSpace(key) || value == null) continue; + var normalized = key.StartsWith("--", StringComparison.Ordinal) ? 
key : $"--{key}"; + if (normalized is "--op" or "--output" or "--input" or "--format" or "--json") + continue; + args.Add(normalized); + args.Add(value); + } + + var outputJson = await RunBridgeAsync(args.ToArray(), RenderSvgTimeoutMs, ct).ConfigureAwait(false); + EnsureOutputExists(request.OutputPath); + return ParseMutationResult( + outputJson, + request.OutputPath, + "native-op", + $"rhwp-api native-op '{request.Operation}' output file created; caller must verify readback and Hancom before production use."); + } + + public async Task SaveAsHwpAsync(HwpSaveAsHwpRequest request, CancellationToken ct) + { + var formatArg = FormatKey(request.Format); + var args = new[] + { + "save-as-hwp", "--format", formatArg, "--input", request.InputPath, + "--output", request.OutputPath, "--json" + }; + var outputJson = await RunBridgeAsync(args, RenderSvgTimeoutMs, ct).ConfigureAwait(false); + EnsureOutputExists(request.OutputPath); + return ParseMutationResult( + outputJson, + request.OutputPath, + "save-as-hwp", + "rhwp-api save-as-hwp output file created; caller must verify round-trip before production use."); + } + + private static void EnsureOutputExists(string outputPath) + { + if (File.Exists(outputPath)) return; + throw new HwpEngineException( + "rhwp-officecli-bridge did not create the requested output file.", + HwpCapabilityConstants.ReasonBridgeExitNonZero, + engine: HwpCapabilityConstants.EngineRhwpBridge, + engineMode: HwpCapabilityConstants.ModeExperimental); + } + + private static HwpMutationResult ParseMutationResult( + string json, + string outputPath, + string bridgeCommand, + string evidence) + { + JsonNode? 
node; + try { node = JsonNode.Parse(json); } + catch + { + throw new HwpEngineException( + $"rhwp-officecli-bridge {bridgeCommand} produced unparseable JSON.", + HwpCapabilityConstants.ReasonBridgeInvalidJson, + engine: HwpCapabilityConstants.EngineRhwpBridge, + engineMode: HwpCapabilityConstants.ModeExperimental); + } + + var engineVersion = node?["engineVersion"]?.GetValue(); + var warnings = ParseWarnings(node); + return new HwpMutationResult( + outputPath, + HwpCapabilityConstants.EngineRhwpBridge, + engineVersion, + [evidence], + warnings); + } + + private static string FormatKey(HwpFormat format) + => format == HwpFormat.Hwp + ? HwpCapabilityConstants.FormatHwp + : HwpCapabilityConstants.FormatHwpx; + + private static HwpEngineException MutationUnsupported(HwpFormat format, string operation) + { + var formatKey = FormatKey(format); + var reason = operation switch + { + HwpCapabilityConstants.OperationFillField when format == HwpFormat.Hwp + => HwpCapabilityConstants.ReasonBinaryHwpMutationForbidden, + HwpCapabilityConstants.OperationSaveOriginal or HwpCapabilityConstants.OperationSaveAsHwp when format == HwpFormat.Hwp + => HwpCapabilityConstants.ReasonBinaryHwpWriteForbidden, + _ => HwpCapabilityConstants.ReasonRoundTripUnverified + }; + return new HwpEngineException( + $"{formatKey} operation '{operation}' is not supported in Phase 0.5 (read/render only).", + reason, + "Use explicit output-file mutation commands only in experimental bridge mode.", + [HwpCapabilityConstants.OperationReadText, HwpCapabilityConstants.OperationRenderSvg], + formatKey, + operation, + HwpCapabilityConstants.EngineRhwpBridge, + HwpCapabilityConstants.ModeExperimental); + } +} diff --git a/src/officecli/Handlers/Hwp/RhwpBridgeEngine.cs b/src/officecli/Handlers/Hwp/RhwpBridgeEngine.cs new file mode 100644 index 000000000..e46735f9c --- /dev/null +++ b/src/officecli/Handlers/Hwp/RhwpBridgeEngine.cs @@ -0,0 +1,413 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// 
SPDX-License-Identifier: Apache-2.0 + +using System.Diagnostics; +using System.Text.Json.Nodes; + +namespace OfficeCli.Handlers.Hwp; + +/// +/// Phase 0.5 experimental engine that routes read/render and explicitly gated +/// field operations through the rhwp-officecli-bridge subprocess. Only active +/// when OFFICECLI_HWP_ENGINE=rhwp-experimental. +/// +public sealed partial class RhwpBridgeEngine : IHwpEngine +{ + private const int ReadTextTimeoutMs = 10_000; + private const int RenderSvgTimeoutMs = 60_000; + private const int FieldReadTimeoutMs = 10_000; + private const long LargeFileSizeBytes = 10L * 1024 * 1024; + + private readonly string _bridgePath; + + private RhwpBridgeEngine(string bridgePath) + { + _bridgePath = bridgePath; + } + + public string Name => HwpCapabilityConstants.EngineRhwpBridge; + public string? Version => null; + public HwpEngineMode Mode => HwpEngineMode.Experimental; + + /// + /// Returns a new bridge engine if the bridge executable can be located, + /// or null with a reason description. + /// + public static RhwpBridgeEngine? TryCreate(out string? missingReason) + { + missingReason = null; + var bridgePath = HwpRuntimeProbe.DiscoverBridgePath(); + if (bridgePath == null) + { + missingReason = "bridge not found in OFFICECLI_RHWP_BRIDGE_PATH, executable directory, or PATH"; + return null; + } + return new RhwpBridgeEngine(bridgePath); + } + + public Task GetCapabilitiesAsync(CancellationToken ct) + { + ct.ThrowIfCancellationRequested(); + return Task.FromResult(HwpCapabilityFactory.BuildReport(Name)); + } + + public async Task ReadTextAsync(HwpReadRequest request, CancellationToken ct) + { + var formatArg = request.Format == HwpFormat.Hwp ? "hwp" : "hwpx"; + var timeout = request.InputSizeBytes > LargeFileSizeBytes + ? 
ReadTextTimeoutMs * 3 + : ReadTextTimeoutMs; + var args = new[] { "read-text", "--format", formatArg, "--input", request.InputPath, "--json" }; + var output = await RunBridgeAsync(args, timeout, ct); + return ParseTextResult(output); + } + + public async Task RenderSvgAsync(HwpRenderRequest request, CancellationToken ct) + { + var formatArg = request.Format == HwpFormat.Hwp ? "hwp" : "hwpx"; + var timeout = request.InputSizeBytes > LargeFileSizeBytes + ? RenderSvgTimeoutMs * 3 + : RenderSvgTimeoutMs; + var args = new[] + { + "render-svg", "--format", formatArg, "--input", request.InputPath, + "--out-dir", request.OutputDirectory, "--page", request.PageSelector, "--json" + }; + var output = await RunBridgeAsync(args, timeout, ct); + return ParseRenderResult(output, request.OutputDirectory); + } + + public async Task ViewJsonAsync(HwpJsonViewRequest request, CancellationToken ct) + { + var formatArg = request.Format == HwpFormat.Hwp ? "hwp" : "hwpx"; + var timeout = request.InputSizeBytes > LargeFileSizeBytes + ? RenderSvgTimeoutMs * 3 + : RenderSvgTimeoutMs; + var args = new List + { + request.BridgeCommand, + "--format", formatArg, + "--input", request.InputPath, + "--json" + }; + foreach (var entry in request.Args) + { + args.Add(entry.Key); + args.Add(entry.Value); + } + + var output = await RunBridgeAsync(args.ToArray(), timeout, ct); + return ParseJsonViewResult(output, request.Operation, request.BridgeCommand); + } + + public async Task ListFieldsAsync(HwpFieldListRequest request, CancellationToken ct) + { + var formatArg = request.Format == HwpFormat.Hwp ? "hwp" : "hwpx"; + var timeout = request.InputSizeBytes > LargeFileSizeBytes + ? 
FieldReadTimeoutMs * 3
+            : FieldReadTimeoutMs;
+        var args = new[] { "list-fields", "--format", formatArg, "--input", request.InputPath, "--json" };
+        var output = await RunBridgeAsync(args, timeout, ct);
+        return ParseFieldListResult(output);
+    }
+
+    public async Task<HwpFieldReadResult> ReadFieldAsync(HwpFieldReadRequest request, CancellationToken ct)
+    {
+        var formatArg = request.Format == HwpFormat.Hwp ? "hwp" : "hwpx";
+        var args = new List<string>
+        {
+            "get-field", "--format", formatArg, "--input", request.InputPath, "--json"
+        };
+        if (!string.IsNullOrWhiteSpace(request.FieldName))
+        {
+            args.Add("--name");
+            args.Add(request.FieldName);
+        }
+        else if (request.FieldId.HasValue)
+        {
+            args.Add("--id");
+            args.Add(request.FieldId.Value.ToString());
+        }
+        else
+        {
+            throw new HwpEngineException(
+                "read_field requires a field name or field id.",
+                HwpCapabilityConstants.ReasonUnsupportedOperation,
+                "Use --field-name or --field-id.",
+                [HwpCapabilityConstants.OperationReadField],
+                formatArg,
+                HwpCapabilityConstants.OperationReadField,
+                HwpCapabilityConstants.EngineRhwpBridge,
+                HwpCapabilityConstants.ModeExperimental);
+        }
+        var output = await RunBridgeAsync(args.ToArray(), FieldReadTimeoutMs, ct);
+        return ParseFieldReadResult(output);
+    }
+
+    private async Task<string> RunBridgeAsync(string[] args, int timeoutMs, CancellationToken ct)
+    {
+        if (!File.Exists(_bridgePath))
+            throw new HwpEngineException(
+                $"rhwp-officecli-bridge not found at '{_bridgePath}'.",
+                HwpCapabilityConstants.ReasonBridgeMissing,
+                "Set OFFICECLI_RHWP_BRIDGE_PATH or install the bridge beside officecli.",
+                [],
+                engine: HwpCapabilityConstants.EngineRhwpBridge,
+                engineMode: HwpCapabilityConstants.ModeExperimental);
+
+        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
+        cts.CancelAfter(timeoutMs);
+
+        var psi = new ProcessStartInfo
+        {
+            FileName = BridgeFileName(_bridgePath),
+            UseShellExecute = false,
+            RedirectStandardOutput = true,
+            RedirectStandardError = true,
+            CreateNoWindow = true
+        };
+
+        if (string.Equals(Path.GetExtension(_bridgePath), ".dll", StringComparison.OrdinalIgnoreCase))
+            psi.ArgumentList.Add(_bridgePath);
+        foreach (var arg in args)
+            psi.ArgumentList.Add(arg);
+
+        Process? process;
+        try
+        {
+            process = Process.Start(psi);
+        }
+        catch (Exception ex)
+        {
+            throw new HwpEngineException(
+                $"Failed to start rhwp-officecli-bridge: {ex.Message}",
+                HwpCapabilityConstants.ReasonBridgeMissing,
+                engine: HwpCapabilityConstants.EngineRhwpBridge,
+                engineMode: HwpCapabilityConstants.ModeExperimental);
+        }
+
+        if (process == null)
+            throw new HwpEngineException(
+                "Failed to start rhwp-officecli-bridge process.",
+                HwpCapabilityConstants.ReasonBridgeMissing,
+                engine: HwpCapabilityConstants.EngineRhwpBridge,
+                engineMode: HwpCapabilityConstants.ModeExperimental);
+
+        string stdout;
+        string stderr;
+        int exitCode;
+        try
+        {
+            var stdoutTask = process.StandardOutput.ReadToEndAsync(cts.Token);
+            var stderrTask = process.StandardError.ReadToEndAsync(cts.Token);
+            await process.WaitForExitAsync(cts.Token);
+            stdout = await stdoutTask;
+            stderr = await stderrTask;
+            exitCode = process.ExitCode;
+        }
+        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
+        {
+            try { process.Kill(entireProcessTree: true); } catch { /* best effort */ }
+            process.Dispose();
+            throw new HwpEngineException(
+                $"rhwp-officecli-bridge timed out after {timeoutMs}ms.",
+                HwpCapabilityConstants.ReasonBridgeTimeout,
+                engine: HwpCapabilityConstants.EngineRhwpBridge,
+                engineMode: HwpCapabilityConstants.ModeExperimental);
+        }
+        finally
+        {
+            process.Dispose();
+        }
+
+        if (exitCode != 0)
+            throw new HwpEngineException(
+                BuildBridgeExitMessage(exitCode, stdout, stderr),
+                HwpCapabilityConstants.ReasonBridgeExitNonZero,
+                engine: HwpCapabilityConstants.EngineRhwpBridge,
+                engineMode: HwpCapabilityConstants.ModeExperimental);
+
+        var trimmed = stdout.Trim();
+        if (string.IsNullOrEmpty(trimmed) || trimmed[0] != '{')
+            throw new HwpEngineException(
+                "rhwp-officecli-bridge produced invalid JSON output.",
+                HwpCapabilityConstants.ReasonBridgeInvalidJson,
+                engine: HwpCapabilityConstants.EngineRhwpBridge,
+                engineMode: HwpCapabilityConstants.ModeExperimental);
+
+        return trimmed;
+    }
+
+    private static string BuildBridgeExitMessage(int exitCode, string stdout, string stderr)
+    {
+        var stderrTrimmed = TruncateDiagnostic(stderr.Trim());
+        var stdoutTrimmed = TruncateDiagnostic(stdout.Trim());
+        if (string.IsNullOrWhiteSpace(stderrTrimmed) && string.IsNullOrWhiteSpace(stdoutTrimmed))
+            return $"rhwp-officecli-bridge exited with code {exitCode}.";
+        if (string.IsNullOrWhiteSpace(stdoutTrimmed))
+            return $"rhwp-officecli-bridge exited with code {exitCode}: {stderrTrimmed}";
+        if (string.IsNullOrWhiteSpace(stderrTrimmed))
+            return $"rhwp-officecli-bridge exited with code {exitCode}: stdout={stdoutTrimmed}";
+        return $"rhwp-officecli-bridge exited with code {exitCode}: {stderrTrimmed}; stdout={stdoutTrimmed}";
+    }
+
+    private static string TruncateDiagnostic(string value)
+        => value.Length > 512 ? value[..512] + "..." : value;
+
+    private static HwpTextResult ParseTextResult(string json)
+    {
+        JsonNode? node;
+        try { node = JsonNode.Parse(json); }
+        catch
+        {
+            throw new HwpEngineException(
+                "rhwp-officecli-bridge read-text produced unparseable JSON.",
+                HwpCapabilityConstants.ReasonBridgeInvalidJson,
+                engine: HwpCapabilityConstants.EngineRhwpBridge,
+                engineMode: HwpCapabilityConstants.ModeExperimental);
+        }
+
+        var text = node?["text"]?.GetValue<string>() ?? "";
+        var engineVersion = node?["engineVersion"]?.GetValue<string>();
+        var pages = new List<HwpTextPage>();
+        if (node?["pages"] is JsonArray pagesArr)
+        {
+            foreach (var p in pagesArr)
+            {
+                var pageNum = p?["page"]?.GetValue<int>() ?? 0;
+                var pageText = p?["text"]?.GetValue<string>() ??
"";
+                pages.Add(new HwpTextPage(pageNum, pageText));
+            }
+        }
+        var warnings = new List<string>();
+        if (node?["warnings"] is JsonArray wArr)
+            foreach (var w in wArr)
+                if (w?.GetValue<string>() is { } ws) warnings.Add(ws);
+
+        return new HwpTextResult(
+            text, pages,
+            HwpCapabilityConstants.EngineRhwpBridge,
+            engineVersion, [], warnings.ToArray());
+    }
+
+    private static HwpRenderResult ParseRenderResult(string json, string outputDir)
+    {
+        JsonNode? node;
+        try { node = JsonNode.Parse(json); }
+        catch
+        {
+            throw new HwpEngineException(
+                "rhwp-officecli-bridge render-svg produced unparseable JSON.",
+                HwpCapabilityConstants.ReasonBridgeInvalidJson,
+                engine: HwpCapabilityConstants.EngineRhwpBridge,
+                engineMode: HwpCapabilityConstants.ModeExperimental);
+        }
+
+        var pages = new List<HwpRenderedPage>();
+        if (node?["pages"] is JsonArray pagesArr)
+        {
+            foreach (var p in pagesArr)
+            {
+                var pageNum = p?["page"]?.GetValue<int>() ?? 0;
+                var svgPath = p?["path"]?.GetValue<string>() ?? "";
+                var sha256 = p?["sha256"]?.GetValue<string>() ?? "";
+                pages.Add(new HwpRenderedPage(pageNum, svgPath, sha256));
+            }
+        }
+        var manifestPath = node?["manifest"]?.GetValue<string>()
+            ?? Path.Combine(outputDir, "manifest.json");
+        var engineVersion = node?["engineVersion"]?.GetValue<string>();
+        var warnings = new List<string>();
+        if (node?["warnings"] is JsonArray wArr)
+            foreach (var w in wArr)
+                if (w?.GetValue<string>() is { } ws) warnings.Add(ws);
+
+        return new HwpRenderResult(
+            pages, manifestPath,
+            HwpCapabilityConstants.EngineRhwpBridge,
+            engineVersion, [], warnings.ToArray());
+    }
+
+    private static HwpJsonViewResult ParseJsonViewResult(string json, string operation, string bridgeCommand)
+    {
+        JsonNode? node;
+        try { node = JsonNode.Parse(json); }
+        catch
+        {
+            throw new HwpEngineException(
+                $"rhwp-officecli-bridge {bridgeCommand} produced unparseable JSON.",
+                HwpCapabilityConstants.ReasonBridgeInvalidJson,
+                operation: operation,
+                engine: HwpCapabilityConstants.EngineRhwpBridge,
+                engineMode: HwpCapabilityConstants.ModeExperimental);
+        }
+
+        var payload = node?.AsObject() ?? new JsonObject();
+        var engineVersion = node?["engineVersion"]?.GetValue<string>();
+        var warnings = ParseWarnings(node);
+        return new HwpJsonViewResult(
+            payload,
+            HwpCapabilityConstants.EngineRhwpBridge,
+            engineVersion,
+            [$"rhwp-api {bridgeCommand} output parsed"],
+            warnings);
+    }
+
+    private static HwpFieldListResult ParseFieldListResult(string json)
+    {
+        JsonNode? node;
+        try { node = JsonNode.Parse(json); }
+        catch
+        {
+            throw new HwpEngineException(
+                "rhwp-officecli-bridge list-fields produced unparseable JSON.",
+                HwpCapabilityConstants.ReasonBridgeInvalidJson,
+                engine: HwpCapabilityConstants.EngineRhwpBridge,
+                engineMode: HwpCapabilityConstants.ModeExperimental);
+        }
+
+        var fieldsNode = node?["fields"]?.DeepClone() ?? new JsonArray();
+        var payload = new JsonObject { ["fields"] = fieldsNode };
+        var engineVersion = node?["engineVersion"]?.GetValue<string>();
+        var warnings = ParseWarnings(node);
+        return new HwpFieldListResult(
+            payload, HwpCapabilityConstants.EngineRhwpBridge,
+            engineVersion, [], warnings);
+    }
+
+    private static HwpFieldReadResult ParseFieldReadResult(string json)
+    {
+        JsonNode? node;
+        try { node = JsonNode.Parse(json); }
+        catch
+        {
+            throw new HwpEngineException(
+                "rhwp-officecli-bridge get-field produced unparseable JSON.",
+                HwpCapabilityConstants.ReasonBridgeInvalidJson,
+                engine: HwpCapabilityConstants.EngineRhwpBridge,
+                engineMode: HwpCapabilityConstants.ModeExperimental);
+        }
+
+        var fieldNode = node?["field"]?.DeepClone() ?? new JsonObject();
+        var payload = new JsonObject { ["field"] = fieldNode };
+        var engineVersion = node?["engineVersion"]?.GetValue<string>();
+        var warnings = ParseWarnings(node);
+        return new HwpFieldReadResult(
+            payload, HwpCapabilityConstants.EngineRhwpBridge,
+            engineVersion, [], warnings);
+    }
+
+    private static string[] ParseWarnings(JsonNode? node)
+    {
+        var warnings = new List<string>();
+        if (node?["warnings"] is JsonArray wArr)
+            foreach (var w in wArr)
+                if (w?.GetValue<string>() is { } ws) warnings.Add(ws);
+        return warnings.ToArray();
+    }
+
+    private static string BridgeFileName(string bridgePath)
+        => string.Equals(Path.GetExtension(bridgePath), ".dll", StringComparison.OrdinalIgnoreCase)
+            ? "dotnet"
+            : bridgePath;
+}
diff --git a/src/officecli/Handlers/Hwp/SafeSave/SafeSaveBackup.cs b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveBackup.cs
new file mode 100644
index 000000000..60133a6bd
--- /dev/null
+++ b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveBackup.cs
@@ -0,0 +1,40 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+namespace OfficeCli.Handlers.Hwp.SafeSave;
+
+internal static class SafeSaveBackup
+{
+    public static string BuildBackupPath(string inputPath, DateTimeOffset timestamp)
+        => $"{Path.GetFullPath(inputPath)}.bak-{SafeSaveManifestWriter.FormatTimestamp(timestamp)}";
+
+    public static SafeSaveCheck Create(string inputPath, string backupPath)
+    {
+        try
+        {
+            var directory = Path.GetDirectoryName(backupPath);
+            if (!string.IsNullOrWhiteSpace(directory))
+                Directory.CreateDirectory(directory);
+            File.Copy(inputPath, backupPath, overwrite: false);
+            return new SafeSaveCheck(
+                "backup-created",
+                true,
+                "info",
+                null,
+                new Dictionary<string, object?>
+                {
+                    ["backupPath"] = backupPath,
+                    ["sizeBytes"] = new FileInfo(backupPath).Length
+                });
+        }
+        catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
+        {
+            return new SafeSaveCheck(
+                "backup-created",
+                false,
+                "error",
+                $"Could not create backup before in-place
replace: {ex.Message}",
+                new Dictionary<string, object?> { ["backupPath"] = backupPath });
+        }
+    }
+}
diff --git a/src/officecli/Handlers/Hwp/SafeSave/SafeSaveCheck.cs b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveCheck.cs
new file mode 100644
index 000000000..772f7aef6
--- /dev/null
+++ b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveCheck.cs
@@ -0,0 +1,12 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+namespace OfficeCli.Handlers.Hwp.SafeSave;
+
+internal sealed record SafeSaveCheck(
+    string Name,
+    bool Ok,
+    string Severity,
+    string? Message = null,
+    IReadOnlyDictionary<string, object?>? Details = null
+);
diff --git a/src/officecli/Handlers/Hwp/SafeSave/SafeSaveJsonMapper.cs b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveJsonMapper.cs
new file mode 100644
index 000000000..b2cfcbdb4
--- /dev/null
+++ b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveJsonMapper.cs
@@ -0,0 +1,84 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.Text.Json.Nodes;
+
+namespace OfficeCli.Handlers.Hwp.SafeSave;
+
+internal static class SafeSaveJsonMapper
+{
+    public static JsonObject ToJson(SafeSaveTransaction transaction)
+    {
+        var checks = new JsonArray();
+        foreach (var check in transaction.Checks)
+            checks.Add((JsonNode?)ToJson(check));
+
+        return new JsonObject
+        {
+            ["schemaVersion"] = transaction.SchemaVersion,
+            ["ok"] = transaction.Ok,
+            ["format"] = transaction.Format,
+            ["operation"] = transaction.Operation,
+            ["mode"] = transaction.Mode,
+            ["inputPath"] = transaction.InputPath,
+            ["outputPath"] = transaction.OutputPath,
+            ["tempPath"] = transaction.TempPath,
+            ["backupPath"] = transaction.BackupPath,
+            ["manifestPath"] = transaction.ManifestPath,
+            ["verified"] = transaction.Verified,
+            ["checks"] = checks,
+            ["semanticDelta"] = ToJsonObject(transaction.SemanticDelta),
+            ["visualDelta"] = ToJsonObject(transaction.VisualDelta),
+            ["packageIntegrity"] = ToJsonObject(transaction.PackageIntegrity),
+            ["warnings"] = ToJsonArray(transaction.Warnings)
+        };
+    }
+
+    private static JsonObject ToJson(SafeSaveCheck check)
+    {
+        var obj = new JsonObject
+        {
+            ["name"] = check.Name,
+            ["ok"] = check.Ok,
+            ["severity"] = check.Severity
+        };
+        if (check.Message != null) obj["message"] = check.Message;
+        if (check.Details != null)
+        {
+            var details = new JsonObject();
+            foreach (var item in check.Details)
+                details[item.Key] = ToJsonValue(item.Value);
+            obj["details"] = details;
+        }
+        return obj;
+    }
+
+    private static JsonObject? ToJsonObject(IReadOnlyDictionary<string, object?>? values)
+    {
+        if (values == null) return null;
+        var obj = new JsonObject();
+        foreach (var item in values)
+            obj[item.Key] = ToJsonValue(item.Value);
+        return obj;
+    }
+
+    private static JsonNode? ToJsonValue(object? value) => value switch
+    {
+        null => null,
+        JsonNode node => node.DeepClone(),
+        string text => JsonValue.Create(text),
+        bool flag => JsonValue.Create(flag),
+        int number => JsonValue.Create(number),
+        long number => JsonValue.Create(number),
+        double number => JsonValue.Create(number),
+        _ => JsonValue.Create(value.ToString())
+    };
+
+    private static JsonArray ToJsonArray(IEnumerable<string> values)
+    {
+        var array = new JsonArray();
+        foreach (var value in values)
+            array.Add((JsonNode?)JsonValue.Create(value));
+        return array;
+    }
+}
diff --git a/src/officecli/Handlers/Hwp/SafeSave/SafeSaveManifestWriter.cs b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveManifestWriter.cs
new file mode 100644
index 000000000..366c3cd41
--- /dev/null
+++ b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveManifestWriter.cs
@@ -0,0 +1,54 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.Text.Json;
+
+namespace OfficeCli.Handlers.Hwp.SafeSave;
+
+internal interface ISafeSaveManifestWriter
+{
+    string BuildManifestPath(SafeSaveOptions options, DateTimeOffset timestamp);
+    void Probe(string manifestPath);
+    void Write(SafeSaveTransaction transaction);
+}
+
+internal sealed class SafeSaveManifestWriter : ISafeSaveManifestWriter
+{
+    private static readonly JsonSerializerOptions JsonOptions = new()
+    {
+        WriteIndented = true
+    };
+
+    public string BuildManifestPath(SafeSaveOptions options, DateTimeOffset timestamp)
+    {
+        var inputPath = Path.GetFullPath(options.InputPath);
+        var outputPath = Path.GetFullPath(options.OutputPath);
+        if (options.InPlace)
+            return $"{inputPath}.officecli-transaction-{FormatTimestamp(timestamp)}.json";
+        return $"{outputPath}.officecli-transaction.json";
+    }
+
+    public void Probe(string manifestPath)
+    {
+        var directory = Path.GetDirectoryName(manifestPath);
+        if (!string.IsNullOrWhiteSpace(directory))
+            Directory.CreateDirectory(directory);
+        var probePath = $"{manifestPath}.probe-{Guid.NewGuid():N}.tmp";
+        File.WriteAllText(probePath, "{}");
+        File.Delete(probePath);
+    }
+
+    public void Write(SafeSaveTransaction transaction)
+    {
+        var manifestPath = transaction.ManifestPath
+            ?? throw new InvalidOperationException("Safe-save manifest path is missing.");
+        var directory = Path.GetDirectoryName(manifestPath);
+        if (!string.IsNullOrWhiteSpace(directory))
+            Directory.CreateDirectory(directory);
+        var json = SafeSaveJsonMapper.ToJson(transaction).ToJsonString(JsonOptions);
+        File.WriteAllText(manifestPath, json);
+    }
+
+    internal static string FormatTimestamp(DateTimeOffset timestamp)
+        => timestamp.UtcDateTime.ToString("yyyyMMddHHmmss");
+}
diff --git a/src/officecli/Handlers/Hwp/SafeSave/SafeSaveOptions.cs b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveOptions.cs
new file mode 100644
index 000000000..19c722912
--- /dev/null
+++ b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveOptions.cs
@@ -0,0 +1,15 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+namespace OfficeCli.Handlers.Hwp.SafeSave;
+
+internal sealed record SafeSaveOptions(
+    string InputPath,
+    string OutputPath,
+    bool InPlace,
+    bool Backup,
+    bool Verify,
+    string Operation,
+    string Format,
+    SafeSavePolicy Policy
+);
diff --git a/src/officecli/Handlers/Hwp/SafeSave/SafeSavePolicy.cs b/src/officecli/Handlers/Hwp/SafeSave/SafeSavePolicy.cs
new file mode 100644
index 000000000..77eebb4aa
--- /dev/null
+++ b/src/officecli/Handlers/Hwp/SafeSave/SafeSavePolicy.cs
@@ -0,0 +1,18 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+namespace OfficeCli.Handlers.Hwp.SafeSave;
+
+internal sealed record SafeSavePolicy(
+    IReadOnlySet<string> RequiredChecks,
+    bool BackupRequired,
+    bool TransactionRequired,
+    bool ValidationRequired
+)
+{
+    public static SafeSavePolicy OutputMode(params string[] requiredChecks) => new(
+        new HashSet<string>(requiredChecks, StringComparer.OrdinalIgnoreCase),
+        BackupRequired: false,
+        TransactionRequired: true,
+        ValidationRequired: requiredChecks.Length > 0);
+}
diff --git a/src/officecli/Handlers/Hwp/SafeSave/SafeSaveRunner.cs b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveRunner.cs
new file mode 100644
index 000000000..86c766433
--- /dev/null
+++ b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveRunner.cs
@@ -0,0 +1,445 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+namespace OfficeCli.Handlers.Hwp.SafeSave;
+
+internal sealed class SafeSaveRunner
+{
+    private readonly ISafeSaveManifestWriter _manifestWriter;
+    private readonly Action<string, string, bool> _replaceFile;
+
+    public SafeSaveRunner()
+        : this(new SafeSaveManifestWriter(), File.Move)
+    {
+    }
+
+    internal SafeSaveRunner(ISafeSaveManifestWriter manifestWriter)
+        : this(manifestWriter, File.Move)
+    {
+    }
+
+    internal SafeSaveRunner(ISafeSaveManifestWriter manifestWriter, Action<string, string, bool> replaceFile)
+    {
+        _manifestWriter = manifestWriter;
+        _replaceFile = replaceFile;
+    }
+
+    public async Task<SafeSaveTransaction> RunAsync(
+        SafeSaveOptions options,
+        Func<string, Task> writeTempAsync,
+        Func<string, Task<SafeSaveValidationResult>> validateAsync,
+        CancellationToken cancellationToken)
+    {
+        var timestamp = DateTimeOffset.UtcNow;
+        var manifestPath = _manifestWriter.BuildManifestPath(options, timestamp);
+        if (options.InPlace)
+            return await RunInPlaceAsync(
+                options,
+                writeTempAsync,
+                validateAsync,
+                cancellationToken,
+                timestamp,
+                manifestPath).ConfigureAwait(false);
+
+        return await RunOutputAsync(
+            options,
+            writeTempAsync,
+            validateAsync,
+            cancellationToken,
+            manifestPath).ConfigureAwait(false);
+    }
+
+    private async Task<SafeSaveTransaction> RunOutputAsync(
+        SafeSaveOptions options,
+        Func<string, Task> writeTempAsync,
+        Func<string, Task<SafeSaveValidationResult>> validateAsync,
+        CancellationToken cancellationToken,
+        string manifestPath)
+    {
+        var outputPath = Path.GetFullPath(options.OutputPath);
+        var inputPath = Path.GetFullPath(options.InputPath);
+        if (PathsReferToSameLocation(inputPath, outputPath))
+            return FinalizeTransaction(
+                options,
+                null,
+                null,
+                manifestPath,
+                false,
+                false,
+                [new SafeSaveCheck(
+                    "same-path-output",
+                    false,
+                    "error",
+                    "Output path equals input path. Use --in-place with --backup --verify.")],
+                null,
+                ["Output path equals input path. Use --in-place with --backup --verify."]);
+
+        var outputDirectory = Path.GetDirectoryName(outputPath);
+        if (string.IsNullOrWhiteSpace(outputDirectory))
+            outputDirectory = Directory.GetCurrentDirectory();
+        Directory.CreateDirectory(outputDirectory);
+
+        var extension = Path.GetExtension(outputPath);
+        var tempPath = Path.Combine(
+            outputDirectory,
+            $".{Path.GetFileNameWithoutExtension(outputPath)}.officecli-{DateTimeOffset.UtcNow:yyyyMMddHHmmss}-{Guid.NewGuid():N}{extension}");
+
+        try
+        {
+            await writeTempAsync(tempPath).ConfigureAwait(false);
+            cancellationToken.ThrowIfCancellationRequested();
+
+            var checks = new List<SafeSaveCheck>
+            {
+                BuildTempWriteCheck(tempPath)
+            };
+            var validation = await validateAsync(tempPath).ConfigureAwait(false);
+            checks.AddRange(validation.Checks);
+
+            var missing = FindMissingRequiredChecks(options.Policy, checks);
+            if (missing.Count > 0)
+            {
+                checks.Add(new SafeSaveCheck(
+                    "required-checks",
+                    false,
+                    "error",
+                    $"Missing or failed required safe-save check(s): {string.Join(", ", missing)}"));
+                var failed = FinalizeTransaction(
+                    options,
+                    tempPath,
+                    null,
+                    manifestPath,
+                    false,
+                    false,
+                    checks,
+                    validation,
+                    ["safe-save required checks failed"]);
+                TryDelete(tempPath);
+                return failed;
+            }
+
+            _replaceFile(tempPath, outputPath, true);
+            return FinalizeTransaction(options, tempPath, null, manifestPath, true, true, checks, validation, []);
+        }
+        catch
+        {
+            TryDelete(tempPath);
+            throw;
+        }
+    }
+
+    private async Task<SafeSaveTransaction> RunInPlaceAsync(
+        SafeSaveOptions options,
+        Func<string, Task> writeTempAsync,
+        Func<string, Task<SafeSaveValidationResult>> validateAsync,
+        CancellationToken cancellationToken,
+        DateTimeOffset timestamp,
+        string manifestPath)
+    {
+        var readinessChecks = BuildInPlaceReadinessChecks(options);
+        if (readinessChecks.Count > 0)
+            return FinalizeTransaction(
+                options,
+                null,
+                null,
+                manifestPath,
+                false,
+                false,
+                readinessChecks,
+                null,
+                ["in-place safe save requires --backup and --verify"]);
+
+        var inputPath = Path.GetFullPath(options.InputPath);
+        var outputDirectory = Path.GetDirectoryName(inputPath) ?? Directory.GetCurrentDirectory();
+        var extension = Path.GetExtension(inputPath);
+        var tempPath = Path.Combine(
+            outputDirectory,
+            $".{Path.GetFileNameWithoutExtension(inputPath)}.officecli-{SafeSaveManifestWriter.FormatTimestamp(timestamp)}-{Guid.NewGuid():N}{extension}");
+        var backupPath = SafeSaveBackup.BuildBackupPath(inputPath, timestamp);
+
+        try
+        {
+            await writeTempAsync(tempPath).ConfigureAwait(false);
+            cancellationToken.ThrowIfCancellationRequested();
+
+            var checks = new List<SafeSaveCheck>
+            {
+                BuildTempWriteCheck(tempPath)
+            };
+            var validation = await validateAsync(tempPath).ConfigureAwait(false);
+            checks.AddRange(validation.Checks);
+
+            var missing = FindMissingRequiredChecks(options.Policy, checks);
+            if (missing.Count > 0)
+            {
+                checks.Add(new SafeSaveCheck(
+                    "required-checks",
+                    false,
+                    "error",
+                    $"Missing or failed required safe-save check(s): {string.Join(", ", missing)}"));
+                var failed = FinalizeTransaction(
+                    options,
+                    tempPath,
+                    null,
+                    manifestPath,
+                    false,
+                    false,
+                    checks,
+                    validation,
+                    ["safe-save required checks failed"]);
+                TryDelete(tempPath);
+                return failed;
+            }
+
+            var backupCheck = SafeSaveBackup.Create(inputPath, backupPath);
+            checks.Add(backupCheck);
+            if (!backupCheck.Ok)
+            {
+                var failed = FinalizeTransaction(
+                    options,
+                    tempPath,
+                    null,
+                    manifestPath,
+                    false,
+                    false,
+                    checks,
+                    validation,
+                    ["backup creation failed; source was not replaced"]);
+                TryDelete(tempPath);
+                return failed;
+            }
+
+            var manifestProbe = ProbeManifestWrite(manifestPath);
+            if (!manifestProbe.Ok)
+            {
+                checks.Add(manifestProbe);
+                TryDelete(tempPath);
+                return Transaction(
+                    options,
+                    tempPath,
+                    backupPath,
+                    manifestPath,
+                    false,
+                    false,
+                    checks,
+                    validation,
+                    ["manifest write failed; source was not replaced"]);
+            }
+            checks.Add(manifestProbe);
+
+            try
+            {
+                _replaceFile(tempPath, inputPath, true);
+                checks.Add(new SafeSaveCheck(
+                    "atomic-replace",
+                    true,
+                    "info",
+                    null,
+                    new Dictionary<string, object?> { ["targetPath"] = inputPath }));
+            }
+            catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
+            {
+                checks.Add(new SafeSaveCheck(
+                    "atomic-replace",
+                    false,
+                    "error",
+                    $"Could not replace source file: {ex.Message}",
+                    new Dictionary<string, object?> { ["targetPath"] = inputPath }));
+                var failed = FinalizeTransaction(
+                    options,
+                    tempPath,
+                    backupPath,
+                    manifestPath,
+                    false,
+                    false,
+                    checks,
+                    validation,
+                    ["atomic replace failed; source was not marked as replaced"]);
+                TryDelete(tempPath);
+                return failed;
+            }
+
+            return FinalizeTransaction(
+                options,
+                tempPath,
+                backupPath,
+                manifestPath,
+                true,
+                true,
+                checks,
+                validation,
+                []);
+        }
+        catch
+        {
+            TryDelete(tempPath);
+            throw;
+        }
+    }
+
+    private SafeSaveTransaction FinalizeTransaction(
+        SafeSaveOptions options,
+        string? tempPath,
+        string? backupPath,
+        string manifestPath,
+        bool ok,
+        bool verified,
+        IReadOnlyList<SafeSaveCheck> checks,
+        SafeSaveValidationResult? validation,
+        IReadOnlyList<string> warnings)
+    {
+        var finalChecks = checks.ToList();
+        finalChecks.Add(new SafeSaveCheck(
+            "manifest-write",
+            true,
+            "info",
+            null,
+            new Dictionary<string, object?> { ["manifestPath"] = manifestPath }));
+        var transaction = Transaction(
+            options,
+            tempPath,
+            backupPath,
+            manifestPath,
+            ok,
+            verified,
+            finalChecks,
+            validation,
+            warnings);
+        try
+        {
+            _manifestWriter.Write(transaction);
+            return transaction;
+        }
+        catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
+        {
+            finalChecks.RemoveAt(finalChecks.Count - 1);
+            finalChecks.Add(new SafeSaveCheck(
+                "manifest-write",
+                false,
+                "error",
+                $"Could not write safe-save manifest: {ex.Message}",
+                new Dictionary<string, object?> { ["manifestPath"] = manifestPath }));
+            var finalWarnings = warnings.Concat(["manifest write failed"]).ToArray();
+            return Transaction(
+                options,
+                tempPath,
+                backupPath,
+                manifestPath,
+                false,
+                false,
+                finalChecks,
+                validation,
+                finalWarnings);
+        }
+    }
+
+    private SafeSaveCheck ProbeManifestWrite(string manifestPath)
+    {
+        try
+        {
+            _manifestWriter.Probe(manifestPath);
+            return new SafeSaveCheck(
+                "manifest-probe",
+                true,
+                "info",
+                null,
+                new Dictionary<string, object?> { ["manifestPath"] = manifestPath });
+        }
+        catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
+        {
+            return new SafeSaveCheck(
+                "manifest-write",
+                false,
+                "error",
+                $"Could not prepare safe-save manifest: {ex.Message}",
+                new Dictionary<string, object?> { ["manifestPath"] = manifestPath });
+        }
+    }
+
+    private static SafeSaveTransaction Transaction(
+        SafeSaveOptions options,
+        string? tempPath,
+        string? backupPath,
+        string? manifestPath,
+        bool ok,
+        bool verified,
+        IReadOnlyList<SafeSaveCheck> checks,
+        SafeSaveValidationResult? validation,
+        IReadOnlyList<string> warnings) => new(
+            SchemaVersion: 1,
+            Ok: ok,
+            Format: options.Format,
+            Operation: options.Operation,
+            Mode: options.InPlace ? "in-place" : "output",
+            InputPath: Path.GetFullPath(options.InputPath),
+            OutputPath: Path.GetFullPath(options.OutputPath),
+            TempPath: tempPath,
+            BackupPath: backupPath,
+            ManifestPath: manifestPath,
+            Verified: verified,
+            Checks: checks,
+            SemanticDelta: validation?.SemanticDelta,
+            VisualDelta: validation?.VisualDelta,
+            PackageIntegrity: validation?.PackageIntegrity,
+            Warnings: warnings);
+
+    private static List<SafeSaveCheck> BuildInPlaceReadinessChecks(SafeSaveOptions options)
+    {
+        var checks = new List<SafeSaveCheck>();
+        if (!options.Backup)
+            checks.Add(new SafeSaveCheck(
+                "in-place-requires-backup",
+                false,
+                "error",
+                "In-place safe save requires --backup."));
+        if (!options.Verify)
+            checks.Add(new SafeSaveCheck(
+                "in-place-requires-verify",
+                false,
+                "error",
+                "In-place safe save requires --verify."));
+        return checks;
+    }
+
+    private static SafeSaveCheck BuildTempWriteCheck(string tempPath)
+    {
+        var file = new FileInfo(tempPath);
+        var ok = file.Exists && file.Length > 0;
+        return new SafeSaveCheck(
+            "temp-write",
+            ok,
+            ok ? "info" : "error",
+            ok ? null : "Temporary output was not created or is empty.",
+            new Dictionary<string, object?>
+            {
+                ["tempPath"] = tempPath,
+                ["sizeBytes"] = file.Exists ? file.Length : 0
+            });
+    }
+
+    private static List<string> FindMissingRequiredChecks(
+        SafeSavePolicy policy,
+        IReadOnlyList<SafeSaveCheck> checks)
+    {
+        var okChecks = checks
+            .Where(check => check.Ok)
+            .Select(check => check.Name)
+            .ToHashSet(StringComparer.OrdinalIgnoreCase);
+        return policy.RequiredChecks
+            .Where(required => !okChecks.Contains(required))
+            .ToList();
+    }
+
+    private static bool PathsReferToSameLocation(string firstPath, string secondPath)
+    {
+        var comparison = OperatingSystem.IsWindows() || OperatingSystem.IsMacOS()
+            ?
StringComparison.OrdinalIgnoreCase
+            : StringComparison.Ordinal;
+        return string.Equals(firstPath, secondPath, comparison);
+    }
+
+    private static void TryDelete(string path)
+    {
+        try { File.Delete(path); } catch { /* best effort cleanup */ }
+    }
+}
diff --git a/src/officecli/Handlers/Hwp/SafeSave/SafeSaveTransaction.cs b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveTransaction.cs
new file mode 100644
index 000000000..034ff6b1c
--- /dev/null
+++ b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveTransaction.cs
@@ -0,0 +1,23 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+namespace OfficeCli.Handlers.Hwp.SafeSave;
+
+internal sealed record SafeSaveTransaction(
+    int SchemaVersion,
+    bool Ok,
+    string Format,
+    string Operation,
+    string Mode,
+    string InputPath,
+    string OutputPath,
+    string? TempPath,
+    string? BackupPath,
+    string? ManifestPath,
+    bool Verified,
+    IReadOnlyList<SafeSaveCheck> Checks,
+    IReadOnlyDictionary<string, object?>? SemanticDelta,
+    IReadOnlyDictionary<string, object?>? VisualDelta,
+    IReadOnlyDictionary<string, object?>? PackageIntegrity,
+    IReadOnlyList<string> Warnings
+);
diff --git a/src/officecli/Handlers/Hwp/SafeSave/SafeSaveValidationResult.cs b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveValidationResult.cs
new file mode 100644
index 000000000..2700450bb
--- /dev/null
+++ b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveValidationResult.cs
@@ -0,0 +1,15 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+namespace OfficeCli.Handlers.Hwp.SafeSave;
+
+internal sealed record SafeSaveValidationResult(
+    IReadOnlyList<SafeSaveCheck> Checks,
+    IReadOnlyDictionary<string, object?>? SemanticDelta = null,
+    IReadOnlyDictionary<string, object?>? VisualDelta = null,
+    IReadOnlyDictionary<string, object?>? PackageIntegrity = null
+)
+{
+    public static SafeSaveValidationResult FromChecks(IReadOnlyList<SafeSaveCheck> checks)
+        => new(checks);
+}
diff --git a/src/officecli/Handlers/Hwp/SafeSave/SafeSaveVisualValidator.cs b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveVisualValidator.cs
new file mode 100644
index 000000000..b8470ee12
--- /dev/null
+++ b/src/officecli/Handlers/Hwp/SafeSave/SafeSaveVisualValidator.cs
@@ -0,0 +1,41 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using OfficeCli.Handlers.Hwp;
+
+namespace OfficeCli.Handlers.Hwp.SafeSave;
+
+internal static class SafeSaveVisualValidator
+{
+    public static SafeSaveValidationResult FromRenderResult(HwpRenderResult renderResult)
+    {
+        var pageCount = renderResult.Pages.Count;
+        var firstPage = renderResult.Pages.FirstOrDefault();
+        var visualDelta = new Dictionary<string, object?>
+        {
+            ["pageCount"] = pageCount,
+            ["manifestPath"] = renderResult.ManifestPath,
+            ["firstPageSha256"] = firstPage?.Sha256,
+            ["engine"] = renderResult.Engine,
+            ["engineVersion"] = renderResult.EngineVersion
+        };
+        var check = new SafeSaveCheck(
+            "visual-render",
+            pageCount > 0,
+            pageCount > 0 ? "info" : "warning",
+            pageCount > 0 ? null : "Provider SVG render returned no pages.",
+            visualDelta);
+        return new SafeSaveValidationResult([check], VisualDelta: visualDelta);
+    }
+
+    public static SafeSaveValidationResult FromFailure(Exception exception)
+    {
+        var visualDelta = new Dictionary<string, object?>
+        {
+            ["error"] = exception.Message
+        };
+        return new SafeSaveValidationResult(
+            [new SafeSaveCheck("visual-render", false, "warning", exception.Message, visualDelta)],
+            VisualDelta: visualDelta);
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxDocument.cs b/src/officecli/Handlers/Hwpx/HwpxDocument.cs
new file mode 100644
index 000000000..ef95a8bc7
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxDocument.cs
@@ -0,0 +1,125 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.IO.Compression;
+using System.Xml.Linq;
+
+namespace OfficeCli.Handlers;
+
+internal class HwpxDocument
+{
+    public ZipArchive Archive { get; init; } = null!;
+    public XDocument? Header { get; set; }
+    /// <summary>Actual ZIP entry path of header.xml (e.g. "Contents/header.xml").</summary>
+    public string? HeaderEntryPath { get; set; }
+    public List<HwpxSection> Sections { get; } = new();
+    /// <summary>Parsed content.hpf manifest document for section/spine management.</summary>
+    public XDocument? ManifestDoc { get; set; }
+    /// <summary>ZIP entry path for the manifest (e.g. "Contents/content.hpf").</summary>
+    public string? ManifestEntryPath { get; set; }
+    /// <summary>Selected rootfile path from container.xml (null if conventional fallback used).</summary>
+    public string? RootfilePath { get; set; }
+    public HwpxSection PrimarySection => Sections[0]; // convenience
+
+    /// <summary>Read binary data from BinData directory in the ZIP archive.</summary>
+    /// <remarks>
+    /// Hancom stores BinData at the archive root (BinData/imageN.ext), not under Contents/.
+    /// We try root-level first (canonical), then Contents/ prefix (legacy fallback).
+    /// </remarks>
+    public byte[]? GetBinData(string reference)
+    {
+        var name = reference.StartsWith("BinData/") ?
reference : $"BinData/{reference}";
+        // Try root-level first (Hancom canonical path)
+        var entry = Archive.GetEntry(name)
+            // Fallback: Contents/ prefix (legacy officecli path)
+            ?? Archive.GetEntry($"Contents/{name}");
+        if (entry == null) return null;
+        using var stream = entry.Open();
+        using var ms = new MemoryStream();
+        stream.CopyTo(ms);
+        return ms.ToArray();
+    }
+
+    /// <summary>All paragraphs across all sections. SectionIndex is 0-based LOCAL index within that section.</summary>
+    public IEnumerable<(HwpxSection Section, XElement Paragraph, int SectionIndex)> AllParagraphs()
+    {
+        foreach (var sec in Sections)
+        {
+            int localIdx = 0;
+            foreach (var p in sec.Paragraphs)
+                yield return (sec, p, localIdx++);
+        }
+    }
+
+    /// <summary>All tables across all sections. SectionIndex is 0-based LOCAL index within that section.</summary>
+    public IEnumerable<(HwpxSection Section, XElement Table, int SectionIndex)> AllTables()
+    {
+        foreach (var sec in Sections)
+        {
+            int localIdx = 0;
+            foreach (var tbl in sec.Tables)
+                yield return (sec, tbl, localIdx++);
+        }
+    }
+
+    /// <summary>
+    /// All content elements (paragraphs + table cells) in document order for text extraction.
+    /// Handles both officecli-created tables (direct section children) and
+    /// Hancom-created tables (nested inside p > run > tbl).
+    /// </summary>
+    public IEnumerable<(HwpxSection Section, XElement Paragraph, string Path)> AllContentInOrder()
+    {
+        foreach (var sec in Sections)
+        {
+            int paraIdx = 0;
+            int tblIdx = 0;
+            foreach (var child in sec.Root.Elements())
+            {
+                var localName = child.Name.LocalName;
+                if (localName == "p")
+                {
+                    paraIdx++;
+                    yield return (sec, child, $"/section[{sec.Index + 1}]/p[{paraIdx}]");
+
+                    // Hancom nests tables inside p > run > tbl
+                    var nestedTables = child.Descendants(HwpxNs.Hp + "tbl");
+                    foreach (var ntbl in nestedTables)
+                    {
+                        tblIdx++;
+                        foreach (var item in EnumerateTableCells(sec, ntbl, tblIdx))
+                            yield return item;
+                    }
+                }
+                else if (localName == "tbl")
+                {
+                    tblIdx++;
+                    foreach (var item in EnumerateTableCells(sec, child, tblIdx))
+                        yield return item;
+                }
+            }
+        }
+    }
+
+    private static IEnumerable<(HwpxSection Section, XElement Paragraph, string Path)> EnumerateTableCells(
+        HwpxSection sec, XElement tbl, int tblIdx)
+    {
+        int rowIdx = 0;
+        foreach (var tr in tbl.Elements(HwpxNs.Hp + "tr"))
+        {
+            rowIdx++;
+            int cellIdx = 0;
+            foreach (var tc in tr.Elements(HwpxNs.Hp + "tc"))
+            {
+                cellIdx++;
+                var subList = tc.Element(HwpxNs.Hp + "subList");
+                var paragraphs = subList?.Elements(HwpxNs.Hp + "p")
+                    ?? tc.Elements(HwpxNs.Hp + "p");
+                int cellParaIdx = 0;
+                foreach (var p in paragraphs)
+                {
+                    cellParaIdx++;
+                    var path = $"/section[{sec.Index + 1}]/tbl[{tblIdx}]/tr[{rowIdx}]/tc[{cellIdx}]/p[{cellParaIdx}]";
+                    yield return (sec, p, path);
+                }
+            }
+        }
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxEquationConverter.cs b/src/officecli/Handlers/Hwpx/HwpxEquationConverter.cs
new file mode 100644
index 000000000..f9d13bc28
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxEquationConverter.cs
@@ -0,0 +1,523 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+//
+// Equation conversion logic based on LibreOffice hwpeq.cxx (MPL 2.0).
+// Original: https://github.com/LibreOffice/core/blob/master/hwpfilter/source/hwpeq.cxx
+// NOT derived from H2Orestart ConvEquation.java (GPLv3) — GPL infection risk.
+
+using System.Text;
+using System.Text.RegularExpressions;
+
+namespace OfficeCli.Handlers;
+
+///
+/// Convert Hancom equation script to StarMath and LaTeX formats.
+/// Hancom uses a proprietary scripting language similar to StarMath but with
+/// case differences and some structural variations.
+///
+public static class HwpxEquationConverter
+{
+    // CJK-safe word boundary: matches keyword not surrounded by alphanumeric chars.
+    // Standard \b fails on CJK boundaries.
+    private static string WB(string keyword)
+        => $@"(?<![A-Za-z0-9]){Regex.Escape(keyword)}(?![A-Za-z0-9])";
+
+    ///
+    /// Keyword mapping: Hancom script → StarMath.
+    /// Most Hancom keywords are identical to StarMath; only differences are mapped.
+    /// Order matters: longer/more specific patterns first to avoid partial matches.
+    ///
+    private static readonly (string Pattern, string Replacement)[] HwpToStarMathMap =
+    {
+        // === Structural Commands (case normalization) ===
+        (WB("SQRT"), "sqrt"),
+        (WB("PILE"), "alignc"),
+        (WB("LPILE"), "alignl"),
+        (WB("RPILE"), "alignr"),
+        (WB("LSUB"), "lsub"),
+        (WB("LSUP"), "lsup"),
+        (WB("TIMES"), "times"),
+        (WB("PROD"), "prod"),
+
+        // === Integral variants → StarMath (normalize to base form) ===
+        // StarMath only has: int, iint, iiint, lint (contour)
+        (WB("OTINT"), "iiint"), // triple contour → triple (approx)
+        (WB("ODINT"), "iint"),  // double contour → double (approx)
+        (WB("TINT"), "iiint"),  // triple integral
+        (WB("DINT"), "iint"),   // double integral
+        (WB("OINT"), "lint"),   // contour integral
+        (WB("INT"), "int"),
+
+        // === Set operators ===
+        (WB("SMALLUNION"), "union"),
+        (WB("smallunion"), "union"),
+        (WB("UNION"), "union"),
+        (WB("CAP"), "union"),
+        (WB("SMALLINTER"), "intersection"),
+        (WB("smallinter"), "intersection"),
+        (WB("INTER"), "intersection"),
+
+        // === Bracket case normalization ===
+        (WB("LEFT"), "left"),
+        (WB("RIGHT"),
"right"), + (WB("MATRIX"), "matrix"), + (WB("BMATRIX"), "bmatrix"), + (WB("DMATRIX"), "dmatrix"), + (WB("PMATRIX"), "pmatrix"), + (WB("CASES"), "cases"), + + // === Special symbols === + (WB("ALEPH"), "aleph"), + (WB("HBAR"), "hbar"), + (WB("IMAG"), "im"), + (WB("WP"), "wp"), + (WB("ANGSTROM"), "{circle A}"), + (WB("IMATH"), "{italic i}"), + (WB("JMATH"), "{italic j}"), + (WB("ELL"), "{italic l}"), + (WB("LITER"), "{italic l}"), + (WB("OHM"), "%OMEGA"), + + // === Operators === + (WB("OPLUS"), "oplus"), + (WB("OMINUS"), "ominus"), + (WB("OTIMES"), "otimes"), + (WB("ODOT"), "odot"), + (WB("OSLASH"), "odivide"), + (WB("ODIV"), "odivide"), + (WB("VEE"), "or"), + (WB("LOR"), "or"), + (WB("WEDGE"), "and"), + + // === Set relations === + (WB("SUBSET"), "subset"), + (WB("SUPSET"), "supset"), + (WB("SUPERSET"), "supset"), + (WB("SUBSETEQ"), "subseteq"), + (WB("SUPSETEQ"), "supseteq"), + (WB("IN"), "in"), + (WB("OWNS"), "owns"), + (WB("LEQ"), "<="), + (WB("GEQ"), ">="), + (WB("PREC"), "prec"), + (WB("SUCC"), "succ"), + + // === Arithmetic / Logic === + (WB("PLUSMINUS"), "plusminus"), + (WB("MINUSPLUS"), "minusplus"), + (WB("DIVIDE"), "div"), + (WB("divide"), "div"), + (WB("CIRC"), "circ"), + (WB("EMPTYSET"), "emptyset"), + (WB("EXIST"), "exists"), + (WB("SIM"), "sim"), + (WB("APPROX"), "approx"), + (WB("SIMEQ"), "simeq"), + (WB("EQUIV"), "equiv"), + (WB("FORALL"), "forall"), + (WB("PARTIAL"), "partial"), + (WB("INF"), "infinity"), + (WB("inf"), "infinity"), + + // === Arrows === + (WB("LRARROW"), "dlrarrow"), // double left-right + (WB("LARROW"), "dlarrow"), // double left (uppercase = double) + (WB("RARROW"), "drarrow"), // double right + (WB("lrarrow"), "lrarrow"), // single left-right (lowercase = single) + (WB("larrow"), "leftarrow"), + (WB("rarrow"), "rightarrow"), + (WB("uarrow"), "uparrow"), + (WB("darrow"), "downarrow"), + (WB("VERT"), "parallel"), + (WB("vert"), "divides"), + + // === Dots === + (WB("cdots"), "dotsaxis"), + (WB("LDOTS"), "dotslow"), + (WB("ldots"), 
"dotslow"),
+        (WB("VDOTS"), "dotsvert"),
+        (WB("DDOTS"), "dotsdown"),
+
+        // === Decorations (case normalization) ===
+        (WB("ACUTE"), "acute"),
+        (WB("GRAVE"), "grave"),
+        (WB("TILDE"), "tilde"),
+        (WB("OVERLINE"), "overline"),
+        (WB("under"), "underline"),
+
+        // === Miscellaneous ===
+        (WB("TRIANGLED"), "nabla"),
+        (WB("SANGLE"), "%angle"),
+        (WB("BOT"), "ortho"),
+        (WB("hund"), "%perthousand"),
+    };
+
+    ///
+    /// Convert Hancom equation script to StarMath format.
+    /// Most keywords are identical; this handles case differences and structural variations.
+    ///
+    public static string ToStarMath(string hwpScript)
+    {
+        if (string.IsNullOrWhiteSpace(hwpScript)) return hwpScript;
+
+        var result = hwpScript;
+
+        // Step 1: Keyword replacements
+        foreach (var (pattern, replacement) in HwpToStarMathMap)
+        {
+            result = Regex.Replace(result, pattern, replacement);
+        }
+
+        // Step 2: BIGG (large divider)
+        if (result.Contains("bigg", StringComparison.OrdinalIgnoreCase))
+        {
+            result = Regex.Replace(result, @"(?i)(bigg)\s*/\s*(.*)", "wideslash {$2}");
+            result = Regex.Replace(result, @"(?i)(bigg)\s*\\\s*(.*)", "widebslash {$2}");
+        }
+
+        // Step 3: OVER — ensure bare operands are wrapped in braces
+        // "a over b" → "{a over b}" (StarMath expects braces around fraction)
+        if (result.Contains("over", StringComparison.OrdinalIgnoreCase))
+        {
+            result = Regex.Replace(result,
+                @"([^\s\}]+)\s+(?i:over)\s+([^\{\s]+)",
+                "{$1 over $2}");
+        }
+
+        // Step 4: MATRIX variants — convert # (row sep) and & (col sep) to StarMath format
+        if (result.Contains("matrix", StringComparison.OrdinalIgnoreCase))
+        {
+            result = ConvertMatrix(result);
+        }
+
+        // Step 5: Decorations — hat/check/tilde expand to wide* for multi-char
+        foreach (var deco in new[] { "hat", "check", "tilde" })
+        {
+            if (result.Contains(deco, StringComparison.OrdinalIgnoreCase))
+            {
+                var m = Regex.Match(result, $@"(?<![A-Za-z]){deco}\s*(\{{[^{{}}]*\}})");
+                if (m.Success && m.Groups[1].Value.Length > 3) // {ab} = 4 chars, single char = {a} = 3
+                    result = Regex.Replace(result,
+                        $@"(?<![A-Za-z]){deco}(?=\s*\{{)",
+                        $"wide{deco}");
+            }
+        }
+
+        return result;
+    }
+
+    /// Convert MATRIX variants: # → ## (row sep), & → # (col sep).
+    private static string ConvertMatrix(string input)
+    {
+        var matrixPattern = @"(?i)(bmatrix|dmatrix|pmatrix|matrix)\s*\{((?:[^{}]|\{[^{}]*\})+)\}";
+        return Regex.Replace(input, matrixPattern, m =>
+        {
+            var type = m.Groups[1].Value.ToLowerInvariant();
+            var body = m.Groups[2].Value;
+            var converted = body.Replace("#", "##").Replace("&", "#");
+
+            return type switch
+            {
+                "bmatrix" => $"left [ matrix{{ {converted}}} right ]",
+                "dmatrix" => $"left lline matrix{{ {converted}}} right rline",
+                "pmatrix" => $"left ( matrix{{ {converted}}} right )",
+                _ => $"matrix{{ {converted}}}",
+            };
+        });
+    }
+
+    // ==================== Hancom → LaTeX ====================
+
+    ///
+    /// Keyword mapping: Hancom script → LaTeX.
+    /// Longer patterns first to avoid partial matches.
+    ///
+    private static readonly (string Pattern, string Replacement)[] HwpToLatexMap =
+    {
+        // === Greek Uppercase ===
+        (WB("Alpha"), @"\Alpha"),
+        (WB("Beta"), @"\Beta"),
+        (WB("Gamma"), @"\Gamma"),
+        (WB("Delta"), @"\Delta"),
+        (WB("Epsilon"), @"\Epsilon"),
+        (WB("Zeta"), @"\Zeta"),
+        (WB("Eta"), @"\Eta"),
+        (WB("Theta"), @"\Theta"),
+        (WB("Iota"), @"\Iota"),
+        (WB("Kappa"), @"\Kappa"),
+        (WB("Lambda"), @"\Lambda"),
+        (WB("Mu"), @"\Mu"),
+        (WB("Nu"), @"\Nu"),
+        (WB("Xi"), @"\Xi"),
+        (WB("Omicron"), @"\Omicron"),
+        (WB("Pi"), @"\Pi"),
+        (WB("Rho"), @"\Rho"),
+        (WB("Sigma"), @"\Sigma"),
+        (WB("SIGMA"), @"\Sigma"),
+        (WB("Tau"), @"\Tau"),
+        (WB("Upsilon"), @"\Upsilon"),
+        (WB("Phi"), @"\Phi"),
+        (WB("Chi"), @"\Chi"),
+        (WB("Psi"), @"\Psi"),
+        (WB("Omega"), @"\Omega"),
+
+        // === Greek Lowercase ===
+        (WB("alpha"), @"\alpha"),
+        (WB("beta"), @"\beta"),
+        (WB("gamma"), @"\gamma"),
+        (WB("delta"), @"\delta"),
+        (WB("epsilon"), @"\epsilon"),
+        (WB("varepsilon"), @"\varepsilon"),
+        (WB("zeta"), @"\zeta"),
+        (WB("eta"), @"\eta"),
+        (WB("theta"), @"\theta"),
+
(WB("vartheta"), @"\vartheta"), + (WB("iota"), @"\iota"), + (WB("kappa"), @"\kappa"), + (WB("lambda"), @"\lambda"), + (WB("mu"), @"\mu"), + (WB("nu"), @"\nu"), + (WB("xi"), @"\xi"), + (WB("omicron"), @"\omicron"), + (WB("pi"), @"\pi"), + (WB("varpi"), @"\varpi"), + (WB("rho"), @"\rho"), + (WB("sigma"), @"\sigma"), + (WB("varsigma"), @"\varsigma"), + (WB("tau"), @"\tau"), + (WB("upsilon"), @"\upsilon"), + (WB("phi"), @"\phi"), + (WB("varphi"), @"\varphi"), + (WB("chi"), @"\chi"), + (WB("psi"), @"\psi"), + (WB("omega"), @"\omega"), + + // === Integral variants (longer first!) === + (WB("OTINT"), @"\oiiint"), + (WB("ODINT"), @"\oiint"), + (WB("TINT"), @"\iiint"), + (WB("iiint"), @"\iiint"), + (WB("DINT"), @"\iint"), + (WB("iint"), @"\iint"), + (WB("OINT"), @"\oint"), + (WB("oint"), @"\oint"), + (WB("INT"), @"\int"), + (WB("int"), @"\int"), + + // === Functions & Large Operators === + (WB("SQRT"), @"\sqrt"), + (WB("sqrt"), @"\sqrt"), + (WB("sum"), @"\sum"), + (WB("SUM"), @"\sum"), + (WB("prod"), @"\prod"), + (WB("PROD"), @"\prod"), + (WB("lim"), @"\lim"), + (WB("Lim"), @"\lim"), + (WB("INF"), @"\infty"), + (WB("inf"), @"\infty"), + (WB("PARTIAL"),@"\partial"), + (WB("partial"),@"\partial"), + + // === Subscript/superscript keywords === + (WB("from"), "_"), + (WB("to"), "^"), + (WB("sub"), "_"), + (WB("sup"), "^"), + + // === Operators === + (WB("TIMES"), @"\times"), + (WB("times"), @"\times"), + (WB("DIVIDE"), @"\div"), + (WB("divide"), @"\div"), + (WB("PLUSMINUS"), @"\pm"), + (WB("MINUSPLUS"), @"\mp"), + (WB("CIRC"), @"\circ"), + (WB("OPLUS"), @"\oplus"), + (WB("OMINUS"), @"\ominus"), + (WB("OTIMES"), @"\otimes"), + (WB("ODOT"), @"\odot"), + + // === Set operators === + (WB("SMALLUNION"), @"\bigcup"), + (WB("smallunion"), @"\bigcup"), + (WB("UNION"), @"\bigcup"), + (WB("CAP"), @"\bigcup"), + (WB("SMALLINTER"), @"\bigcap"), + (WB("smallinter"), @"\bigcap"), + (WB("INTER"), @"\bigcap"), + + // === Relations === + (WB("SUBSET"), @"\subset"), + (WB("SUPSET"), @"\supset"), 
+ (WB("SUPERSET"), @"\supset"), + (WB("SUBSETEQ"), @"\subseteq"), + (WB("SUPSETEQ"), @"\supseteq"), + (WB("IN"), @"\in"), + (WB("OWNS"), @"\ni"), + (WB("LEQ"), @"\leq"), + (WB("GEQ"), @"\geq"), + (WB("PREC"), @"\prec"), + (WB("SUCC"), @"\succ"), + (WB("SIM"), @"\sim"), + (WB("APPROX"), @"\approx"), + (WB("SIMEQ"), @"\simeq"), + (WB("EQUIV"), @"\equiv"), + (WB("FORALL"), @"\forall"), + (WB("forall"), @"\forall"), + (WB("EXIST"), @"\exists"), + (WB("EMPTYSET"), @"\emptyset"), + + // === Arrows (longer first!) === + (WB("LRARROW"), @"\Leftrightarrow"), + (WB("lrarrow"), @"\leftrightarrow"), + (WB("LARROW"), @"\Leftarrow"), + (WB("larrow"), @"\leftarrow"), + (WB("RARROW"), @"\Rightarrow"), + (WB("rarrow"), @"\rightarrow"), + (WB("uarrow"), @"\uparrow"), + (WB("darrow"), @"\downarrow"), + + // === Dots === + (WB("cdots"), @"\cdots"), + (WB("LDOTS"), @"\ldots"), + (WB("ldots"), @"\ldots"), + (WB("VDOTS"), @"\vdots"), + (WB("DDOTS"), @"\ddots"), + + // === Decorations === + (WB("hat"), @"\widehat"), + (WB("tilde"), @"\widetilde"), + (WB("bar"), @"\overline"), + (WB("overline"), @"\overline"), + (WB("OVERLINE"), @"\overline"), + (WB("vec"), @"\vec"), + (WB("dot"), @"\dot"), + (WB("ddot"), @"\ddot"), + (WB("acute"), @"\acute"), + (WB("ACUTE"), @"\acute"), + (WB("grave"), @"\grave"), + (WB("GRAVE"), @"\grave"), + (WB("check"), @"\check"), + (WB("breve"), @"\breve"), + (WB("under"), @"\underline"), + (WB("underline"), @"\underline"), + + // === Brackets (case normalization) === + (WB("LEFT"), @"\left"), + (WB("RIGHT"), @"\right"), + + // === Miscellaneous === + (WB("ALEPH"), @"\aleph"), + (WB("HBAR"), @"\hbar"), + (WB("TRIANGLED"), @"\nabla"), + (WB("nabla"), @"\nabla"), + (WB("VERT"), @"\parallel"), + (WB("vert"), @"\mid"), + (WB("BOT"), @"\perp"), + + // === Line separator === + // # in Hancom = line break, maps to \\ in LaTeX + // Handled separately in ConvertStructure + }; + + /// + /// Convert Hancom equation script to LaTeX format. 
+ /// V1: keyword substitution + simple over→\frac pattern matching. + /// V2 (future): recursive descent parser ported from hwpeq.cxx. + /// + public static string ToLatex(string hwpScript) + { + if (string.IsNullOrWhiteSpace(hwpScript)) return hwpScript; + + var result = hwpScript; + + // Phase 1: Convert "over" to \frac (simple single-level braces) + // {a over b} → \frac{a}{b} + result = ConvertOverToFrac(result); + + // Phase 2: Keyword substitutions + foreach (var (pattern, replacement) in HwpToLatexMap) + { + result = Regex.Replace(result, pattern, replacement); + } + + // Phase 3: MATRIX → LaTeX environments + result = ConvertMatrixToLatex(result); + + // Phase 4: CASES → LaTeX cases environment + if (result.Contains("cases", StringComparison.OrdinalIgnoreCase)) + { + result = Regex.Replace(result, + @"(?i)cases\s*\{((?:[^{}]|\{[^{}]*\})+)\}", + @"\begin{cases}$1\end{cases}"); + } + + // Phase 5: ATOP → \binom + if (result.Contains("atop", StringComparison.OrdinalIgnoreCase)) + { + result = Regex.Replace(result, + @"\{([^{}]+)\s+(?i:atop)\s+([^{}]+)\}", + @"\binom{$1}{$2}"); + } + + // Phase 6: Line separator # → \\ + result = result.Replace(" # ", @" \\ "); + // Standalone # at line boundaries + result = Regex.Replace(result, @"(?<=[})])\s*#\s*", @" \\ "); + + // Phase 7: COLOR + if (result.Contains("color", StringComparison.OrdinalIgnoreCase)) + { + result = Regex.Replace(result, + @"(?i)color\s*\{\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\}", + @"\textcolor[RGB]{$1,$2,$3}"); + } + + return result; + } + + /// + /// Convert "{a over b}" to "\frac{a}{b}". + /// V1: handles single-level braces only. Nested braces preserved as-is. + /// + private static string ConvertOverToFrac(string input) + { + // Pattern: {numerator over denominator} where numerator/denominator have no nested braces + return Regex.Replace(input, + @"\{([^{}]+?)\s+over\s+([^{}]+?)\}", + @"\frac{$1}{$2}"); + } + + /// Convert MATRIX/BMATRIX/DMATRIX/PMATRIX to LaTeX environments. 
+ private static string ConvertMatrixToLatex(string input) + { + var matrixPattern = @"(?i)(bmatrix|dmatrix|pmatrix|matrix)\s*\{((?:[^{}]|\{[^{}]*\})+)\}"; + return Regex.Replace(input, matrixPattern, m => + { + var type = m.Groups[1].Value.ToLowerInvariant(); + var body = m.Groups[2].Value; + // Hancom: & = column sep, # = row sep + // LaTeX: & = column sep, \\ = row sep + var converted = body.Replace("#", @" \\ ").Trim(); + + var env = type switch + { + "bmatrix" => "bmatrix", + "dmatrix" => "vmatrix", + "pmatrix" => "pmatrix", + _ => "matrix", + }; + return $@"\begin{{{env}}}{converted}\end{{{env}}}"; + }); + } +} diff --git a/src/officecli/Handlers/Hwpx/HwpxHandler.Diff.cs b/src/officecli/Handlers/Hwpx/HwpxHandler.Diff.cs new file mode 100644 index 000000000..d7a4f0154 --- /dev/null +++ b/src/officecli/Handlers/Hwpx/HwpxHandler.Diff.cs @@ -0,0 +1,451 @@ +// Plan 84/99.9.H: Document Diff/Compare +using System.Text.Json.Nodes; +using System.Text.RegularExpressions; + +namespace OfficeCli.Handlers; + +public partial class HwpxHandler +{ + // H1: Block similarity threshold + private const double BlockSimilarityThreshold = 0.4; + + // H2: Table similarity weights + private const double TableDimWeight = 0.3; + private const double TableContentWeight = 0.7; + + // H3: Max matrix cells for Levenshtein + private const long MaxDiffMatrixCells = 10_000_000; + + /// Jaccard similarity between two strings based on word tokens. 
+ internal static double ComputeBlockSimilarity(string a, string b) + { + if (string.IsNullOrEmpty(a) && string.IsNullOrEmpty(b)) return 1.0; + if (string.IsNullOrEmpty(a) || string.IsNullOrEmpty(b)) return 0.0; + + var tokensA = NormalizeForSimilarity(a).Split(' ', StringSplitOptions.RemoveEmptyEntries).ToHashSet(); + var tokensB = NormalizeForSimilarity(b).Split(' ', StringSplitOptions.RemoveEmptyEntries).ToHashSet(); + + if (tokensA.Count == 0 && tokensB.Count == 0) return 1.0; + + int intersection = tokensA.Intersect(tokensB).Count(); + int union = tokensA.Union(tokensB).Count(); + + return union == 0 ? 0.0 : (double)intersection / union; + } + + /// Compare two documents block-by-block. + public JsonNode CompareText(HwpxHandler other) + { + var linesA = ExtractTextLines(); + var linesB = other.ExtractTextLines(); + var result = ComputeLineDiff(linesA, linesB); + + return new JsonObject + { + ["mode"] = "text", + ["linesA"] = linesA.Length, + ["linesB"] = linesB.Length, + ["changes"] = result + }; + } + + /// + /// Plan 99.9.I4: LCS-based line diff with fallback to linear scan for large inputs. + /// Uses proper LCS DP alignment for accurate diff of insertions/deletions. + /// + internal JsonArray ComputeLineDiff(string[] linesA, string[] linesB) + { + long matrixCells = (long)(linesA.Length + 1) * (linesB.Length + 1); + if (matrixCells > MaxDiffMatrixCells) + return ComputeLineDiffLinear(linesA, linesB); // fallback for huge docs + + return ComputeLineDiffLcs(linesA, linesB); + } + + /// LCS DP-based diff — accurate alignment for moderate-size documents. + private JsonArray ComputeLineDiffLcs(string[] a, string[] b) + { + int n = a.Length, m = b.Length; + + // Build LCS table + var dp = new int[n + 1, m + 1]; + for (int i = 1; i <= n; i++) + for (int j = 1; j <= m; j++) + dp[i, j] = a[i - 1] == b[j - 1] + ? dp[i - 1, j - 1] + 1 + : Math.Max(dp[i - 1, j], dp[i, j - 1]); + + // Backtrack to produce diff + var result = new List<(string type, int? la, int? 
lb, string? ta, string? tb)>(); + int ia = n, ib = m; + while (ia > 0 && ib > 0) + { + if (a[ia - 1] == b[ib - 1]) + { + result.Add(("unchanged", ia, ib, a[ia - 1], null)); + ia--; ib--; + } + else if (dp[ia - 1, ib] >= dp[ia, ib - 1]) + { + result.Add(("removed", ia, null, a[ia - 1], null)); + ia--; + } + else + { + result.Add(("added", null, ib, null, b[ib - 1])); + ib--; + } + } + while (ia > 0) { result.Add(("removed", ia, null, a[ia - 1], null)); ia--; } + while (ib > 0) { result.Add(("added", null, ib, null, b[ib - 1])); ib--; } + + result.Reverse(); + + // Post-process: detect "modified" (adjacent removed+added with similar content) + var output = new JsonArray(); + int idx = 0; + while (idx < result.Count) + { + var (type, la, lb, ta, tb) = result[idx]; + if (type == "removed" && idx + 1 < result.Count && result[idx + 1].type == "added") + { + var next = result[idx + 1]; + var sim = ComputeBlockSimilarity(ta!, next.tb!); + if (sim >= BlockSimilarityThreshold) + { + output.Add(MakeDiffEntry("modified", la, next.lb, ta, next.tb)); + idx += 2; + continue; + } + } + output.Add(MakeDiffEntry(type, la, lb, ta, tb)); + idx++; + } + return output; + } + + /// Linear scan fallback for very large documents (exceeds LCS matrix limit). 
+ private JsonArray ComputeLineDiffLinear(string[] linesA, string[] linesB) + { + var result = new JsonArray(); + int ia = 0, ib = 0; + + while (ia < linesA.Length && ib < linesB.Length) + { + if (linesA[ia] == linesB[ib]) + { + result.Add(MakeDiffEntry("unchanged", ia + 1, ib + 1, linesA[ia], null)); + ia++; ib++; + continue; + } + + var sim = ComputeBlockSimilarity(linesA[ia], linesB[ib]); + if (sim >= BlockSimilarityThreshold) + { + result.Add(MakeDiffEntry("modified", ia + 1, ib + 1, linesA[ia], linesB[ib])); + ia++; ib++; + } + else + { + bool foundA = false, foundB = false; + for (int lookahead = 1; lookahead <= 5; lookahead++) + { + if (ib + lookahead < linesB.Length && linesA[ia] == linesB[ib + lookahead]) + { + for (int j = 0; j < lookahead; j++) + result.Add(MakeDiffEntry("added", null, ib + j + 1, null, linesB[ib + j])); + ib += lookahead; + foundB = true; + break; + } + if (ia + lookahead < linesA.Length && linesA[ia + lookahead] == linesB[ib]) + { + for (int j = 0; j < lookahead; j++) + result.Add(MakeDiffEntry("removed", ia + j + 1, null, linesA[ia + j], null)); + ia += lookahead; + foundA = true; + break; + } + } + if (!foundA && !foundB) + { + result.Add(MakeDiffEntry("modified", ia + 1, ib + 1, linesA[ia], linesB[ib])); + ia++; ib++; + } + } + } + + while (ia < linesA.Length) + { result.Add(MakeDiffEntry("removed", ia + 1, null, linesA[ia], null)); ia++; } + while (ib < linesB.Length) + { result.Add(MakeDiffEntry("added", null, ib + 1, null, linesB[ib])); ib++; } + + return result; + } + + /// Extract text lines for diff. + internal string[] ExtractTextLines() + => ViewAsText() + .Split('\n') + .Select(l => Regex.Replace(l, @"^\d+\.\s*", "").Trim()) + .Where(l => !string.IsNullOrEmpty(l)) + .ToArray(); + + private static JsonObject MakeDiffEntry(string type, int? lineA, int? lineB, string? textA, string? 
textB) + { + var obj = new JsonObject { ["type"] = type }; + if (lineA.HasValue) obj["lineA"] = lineA.Value; + if (lineB.HasValue) obj["lineB"] = lineB.Value; + if (textA != null) obj["textA"] = textA; + if (textB != null) obj["textB"] = textB; + return obj; + } + + // --- H2: Table comparison --- + + /// Compute similarity between two tables (dimensions + content). + internal static double ComputeTableSimilarity(string?[,] gridA, string?[,] gridB) + { + int rowsA = gridA.GetLength(0), colsA = gridA.GetLength(1); + int rowsB = gridB.GetLength(0), colsB = gridB.GetLength(1); + + double dimSim = 0; + if (rowsA + rowsB > 0) + dimSim += (double)Math.Min(rowsA, rowsB) / Math.Max(rowsA, rowsB) * 0.5; + if (colsA + colsB > 0) + dimSim += (double)Math.Min(colsA, colsB) / Math.Max(colsA, colsB) * 0.5; + + int minRows = Math.Min(rowsA, rowsB), minCols = Math.Min(colsA, colsB); + int matchCount = 0, totalCount = 0; + for (int r = 0; r < minRows; r++) + for (int c = 0; c < minCols; c++) + { + totalCount++; + var cellA = gridA[r, c]?.Trim() ?? ""; + var cellB = gridB[r, c]?.Trim() ?? ""; + if (cellA == cellB) matchCount++; + } + totalCount += Math.Max(0, rowsA * colsA - minRows * minCols); + totalCount += Math.Max(0, rowsB * colsB - minRows * minCols); + + double contentSim = totalCount == 0 ? 1.0 : (double)matchCount / totalCount; + return TableDimWeight * dimSim + TableContentWeight * contentSim; + } + + /// Compare tables between two documents by position index. 
+ public JsonNode CompareTables(HwpxHandler other) + { + var tablesA = ExtractAllTableGrids(); + var tablesB = other.ExtractAllTableGrids(); + + var result = new JsonArray(); + int maxTables = Math.Max(tablesA.Count, tablesB.Count); + + for (int t = 0; t < maxTables; t++) + { + if (t >= tablesA.Count) + { + result.Add(new JsonObject { ["table"] = t + 1, ["type"] = "added" }); + continue; + } + if (t >= tablesB.Count) + { + result.Add(new JsonObject { ["table"] = t + 1, ["type"] = "removed" }); + continue; + } + + var gridA = tablesA[t].Grid; + var gridB = tablesB[t].Grid; + var similarity = ComputeTableSimilarity(gridA, gridB); + + var cellDiffs = new JsonArray(); + int maxRows = Math.Max(gridA.GetLength(0), gridB.GetLength(0)); + int maxCols = Math.Max(gridA.GetLength(1), gridB.GetLength(1)); + + for (int r = 0; r < maxRows; r++) + for (int c = 0; c < maxCols; c++) + { + var cellA = (r < gridA.GetLength(0) && c < gridA.GetLength(1)) ? gridA[r, c] : null; + var cellB = (r < gridB.GetLength(0) && c < gridB.GetLength(1)) ? gridB[r, c] : null; + if (cellA != cellB) + { + cellDiffs.Add(new JsonObject + { + ["row"] = r + 1, ["col"] = c + 1, + ["old"] = cellA, ["new"] = cellB + }); + } + } + + result.Add(new JsonObject + { + ["table"] = t + 1, + ["type"] = cellDiffs.Count > 0 ? "modified" : "unchanged", + ["similarity"] = Math.Round(similarity, 3), + ["changes"] = cellDiffs + }); + } + + return new JsonObject { ["mode"] = "table", ["tables"] = result }; + } + + /// Extract cell text grids for all tables. + internal List<(string Path, string?[,] Grid)> ExtractAllTableGrids() + { + var result = new List<(string Path, string?[,] Grid)>(); + foreach (var (sec, tbl, localTblIdx) in _doc.AllTables()) + { + var (grid, _) = BuildTableGrid(tbl); + var textGrid = new string?[grid.GetLength(0), grid.GetLength(1)]; + for (int r = 0; r < grid.GetLength(0); r++) + for (int c = 0; c < grid.GetLength(1); c++) + textGrid[r, c] = grid[r, c] != null ? 
ExtractCellText(grid[r, c]!).Trim() : null; + var path = $"/section[{sec.Index + 1}]/tbl[{localTblIdx + 1}]"; + result.Add((path, textGrid)); + } + return result; + } + + // --- H3: Levenshtein distance with fallback --- + + /// Levenshtein edit distance with matrix size limit. + internal static int LevenshteinDistance(string[] a, string[] b) + { + long matrixSize = (long)(a.Length + 1) * (b.Length + 1); + if (matrixSize > MaxDiffMatrixCells) + return LevenshteinFallback(a, b); + + int n = a.Length, m = b.Length; + var prev = new int[m + 1]; + var curr = new int[m + 1]; + + for (int j = 0; j <= m; j++) prev[j] = j; + + for (int i = 1; i <= n; i++) + { + curr[0] = i; + for (int j = 1; j <= m; j++) + { + int cost = a[i - 1] == b[j - 1] ? 0 : 1; + curr[j] = Math.Min( + Math.Min(prev[j] + 1, curr[j - 1] + 1), + prev[j - 1] + cost); + } + (prev, curr) = (curr, prev); + } + + return prev[m]; + } + + private static int LevenshteinFallback(string[] a, string[] b) + { + const int sampleSize = 500; + + int headMatches = 0; + int headLen = Math.Min(Math.Min(a.Length, b.Length), sampleSize); + for (int i = 0; i < headLen; i++) + if (a[i] == b[i]) headMatches++; + + int tailMatches = 0; + int tailLen = Math.Min(Math.Min(a.Length, b.Length), sampleSize); + for (int i = 0; i < tailLen; i++) + if (a[a.Length - 1 - i] == b[b.Length - 1 - i]) tailMatches++; + + double matchRate = (headLen + tailLen) > 0 + ? (double)(headMatches + tailMatches) / (headLen + tailLen) + : 0; + int maxLen = Math.Max(a.Length, b.Length); + + return (int)((1 - matchRate) * maxLen) + Math.Abs(a.Length - b.Length); + } + + // --- H4: Text normalization for similarity --- + + /// Normalize text for similarity comparison. 
+    internal static string NormalizeForSimilarity(string text)
+    {
+        if (string.IsNullOrEmpty(text)) return "";
+
+        text = HwpxKorean.Normalize(text);
+        text = text.ToLowerInvariant();
+        text = Regex.Replace(text, @"[^\p{L}\p{N}\s]", " ");
+        text = Regex.Replace(text, @"\s+", " ").Trim();
+
+        return text;
+    }
+
+    // --- H5: Page range compare ---
+
+    /// Parse "1-3,5,7-9" into 1-based page numbers.
+    internal static HashSet<int>? ParsePageRange(string? pageRange)
+    {
+        if (string.IsNullOrWhiteSpace(pageRange)) return null;
+
+        var pages = new HashSet<int>();
+        var parts = pageRange.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
+
+        foreach (var part in parts)
+        {
+            var dashIdx = part.IndexOf('-');
+            if (dashIdx > 0 && dashIdx < part.Length - 1)
+            {
+                if (int.TryParse(part[..dashIdx].Trim(), out var start)
+                    && int.TryParse(part[(dashIdx + 1)..].Trim(), out var end))
+                {
+                    if (start > end) (start, end) = (end, start);
+                    end = Math.Min(end, start + 999);
+                    if (end > 100_000) end = start + 999; // absolute safety cap
+                    for (int p = start; p <= end; p++)
+                    {
+                        pages.Add(p);
+                        if (p == end) break; // prevent int overflow wrap
+                    }
+                }
+            }
+            else
+            {
+                if (int.TryParse(part.Trim(), out var page))
+                    pages.Add(page);
+            }
+        }
+
+        return pages.Count > 0 ? pages : null;
+    }
+
+    /// Compare text for specific page ranges.
+    public JsonNode CompareTextRange(HwpxHandler other, string? pagesA, string? pagesB)
+    {
+        var rangeA = ParsePageRange(pagesA);
+        var rangeB = ParsePageRange(pagesB);
+
+        var linesA = ExtractTextLinesFiltered(rangeA);
+        var linesB = other.ExtractTextLinesFiltered(rangeB);
+        var changes = ComputeLineDiff(linesA, linesB);
+
+        return new JsonObject
+        {
+            ["mode"] = "text",
+            ["pagesA"] = pagesA ?? "all",
+            ["pagesB"] = pagesB ?? "all",
+            ["linesA"] = linesA.Length,
+            ["linesB"] = linesB.Length,
+            ["changes"] = changes
+        };
+    }
+
+    /// Extract text lines filtered by section indices.
+    internal string[] ExtractTextLinesFiltered(HashSet<int>?
sectionFilter)
+    {
+        if (sectionFilter == null) return ExtractTextLines();
+
+        var lines = new List<string>();
+        foreach (var (section, para, path) in _doc.AllContentInOrder())
+        {
+            if (!sectionFilter.Contains(section.Index + 1)) continue;
+            var text = HwpxKorean.Normalize(ExtractParagraphText(para)).Trim();
+            if (!string.IsNullOrWhiteSpace(text))
+                lines.Add(text);
+        }
+        return lines.ToArray();
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxHandler.Helpers.cs b/src/officecli/Handlers/Hwpx/HwpxHandler.Helpers.cs
new file mode 100644
index 000000000..08e3f088b
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxHandler.Helpers.cs
@@ -0,0 +1,686 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.Xml.Linq;
+
+namespace OfficeCli.Handlers;
+
+public partial class HwpxHandler
+{
+    // G2: charPr multi-namespace cascade (hp: → hh: → bare)
+    private XElement? FindCharPr(string idRef)
+    {
+        var namespacePriority = new[] { HwpxNs.Hp, HwpxNs.Hh, XNamespace.None };
+        foreach (var ns in namespacePriority)
+        {
+            var result = _doc.Header?.Root?
+                .Descendants(ns + "charPr")
+                .FirstOrDefault(e => e.Attribute("id")?.Value == idRef);
+            if (result != null) return result;
+        }
+        return null;
+    }
+
+    private static double GetFontSizePt(XElement charPr)
+        => ((double?)charPr.Attribute("height") ?? 1000) / 100.0;
+
+    ///
+    /// Extract cell address and span from multiple possible formats:
+    /// 1. Modern: <hp:cellAddr colAddr rowAddr/> + <hp:cellSpan colSpan rowSpan/> (separate elements)
+    /// 2. Combined: <hp:cellAddr colAddr rowAddr colSpan rowSpan/> (span attrs on cellAddr)
+    /// 3. Legacy: attributes directly on <hp:tc>
+    ///
+    // ==================== Common Helpers (Plan 39) ====================
+
+    ///
+    /// Wrap content in an <hp:run> element with the given charPrIDRef.
+    /// Used by CreateHyperlink, CreateFootnote, AddHeaderFooter, etc.
+ /// + private static XElement WrapInRun(XElement content, string charPrIDRef = "0") + => new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", charPrIDRef), + content); + + /// + /// Create a standard <hp:subList> element containing a single paragraph with text. + /// Used by BuildCell, CreateFootnote, AddHeaderFooter. + /// + private XElement CreateSubList(string text, string vertAlign = "CENTER") + => new XElement(HwpxNs.Hp + "subList", + new XAttribute("id", NewId()), + new XAttribute("textDirection", "HORIZONTAL"), + new XAttribute("lineWrap", "BREAK"), + new XAttribute("vertAlign", vertAlign), + new XAttribute("linkListIDRef", "0"), + new XAttribute("linkListNextIDRef", "0"), + new XAttribute("textWidth", "0"), + new XAttribute("textHeight", "0"), + new XAttribute("hasTextRef", "0"), + new XAttribute("hasNumRef", "0"), + CreateParagraph(new() { ["text"] = text })); + + /// + /// If the paraPr referenced by the paragraph is shared with other paragraphs, + /// clone it with a new ID and update the paragraph's paraPrIDRef. + /// Returns the (possibly cloned) paraPr, or null if not found. + /// + private XElement? CloneParaPrIfShared(XElement para) + { + var paraPrIdRef = para.Attribute("paraPrIDRef")?.Value; + if (paraPrIdRef == null) return null; + + var paraPr = _doc.Header?.Root? + .Descendants(HwpxNs.Hh + "paraPr") + .FirstOrDefault(e => e.Attribute("id")?.Value == paraPrIdRef); + if (paraPr == null) return null; + + if (IsParaPrShared(paraPrIdRef, para)) + { + var newId = NextParaPrId(); + var cloned = new XElement(paraPr); + cloned.SetAttributeValue("id", newId.ToString()); + // CRITICAL: Hancom uses POSITIONAL indexing (array index), not id-based lookup. + // Append at END so position matches the new ID. 
+            var container = paraPr.Parent!;
+            container.Add(cloned);
+            para.SetAttributeValue("paraPrIDRef", newId.ToString());
+            paraPr = cloned;
+
+            // Update itemCnt on the parent container
+            var count = container.Elements(HwpxNs.Hh + "paraPr").Count();
+            container.SetAttributeValue("itemCnt", count.ToString());
+        }
+
+        return paraPr;
+    }
+
+    /// <summary>
+    /// Return the next available borderFill ID based on max existing ID (not count).
+    /// Fixes the count-based ID generation bug that could cause ID collisions.
+    /// </summary>
+    private string NextBorderFillId()
+    {
+        var borderFills = _doc.Header!.Root!.Descendants(HwpxNs.Hh + "borderFill");
+        var maxId = borderFills.Any()
+            ? borderFills.Max(bf => int.TryParse(bf.Attribute("id")?.Value, out var n) ? n : 0)
+            : 0;
+        return (maxId + 1).ToString();
+    }
+
+    /// <summary>
+    /// Create a border element (leftBorder, rightBorder, topBorder, bottomBorder, diagonal).
+    /// </summary>
+    private static XElement MakeBorder(string name, string type, string width, string color)
+        => new XElement(HwpxNs.Hh + name,
+            new XAttribute("type", type),
+            new XAttribute("width", width),
+            new XAttribute("color", color));
+
+    // ==================== Label-Based Table Fill (Plan 70) ====================
+
+    /// <summary>
+    /// Extract all text from a table cell: tc → subList → p* → run* → t*.
+    /// Reuses <see cref="ExtractParagraphText"/> for consistency.
+    /// </summary>
+    internal static string ExtractCellText(XElement tc)
+    {
+        var subList = tc.Element(HwpxNs.Hp + "subList");
+        var paragraphs = subList?.Elements(HwpxNs.Hp + "p")
+            ?? tc.Elements(HwpxNs.Hp + "p");
+
+        var sb = new System.Text.StringBuilder();
+        foreach (var p in paragraphs)
+        {
+            var text = ExtractParagraphText(p);
+            if (sb.Length > 0 && !string.IsNullOrEmpty(text))
+                sb.Append('\n');
+            sb.Append(text);
+        }
+        return sb.ToString();
+    }
+
+    /// <summary>
+    /// Normalize a label for matching: trim, collapse whitespace,
+    /// strip trailing colon/fullwidth colon/spaces, middle dot, and trailing Korean parenthetical qualifiers.
+    /// </summary>
+    internal static string NormalizeLabel(string label)
+    {
+        if (string.IsNullOrEmpty(label)) return "";
+        var normalized = System.Text.RegularExpressions.Regex.Replace(label.Trim(), @"\s+", " ");
+        normalized = normalized.TrimEnd(':', ' ', '\t', '\u00A0', '\uFF1A'); // ASCII colon + fullwidth colon
+        // B7: Strip trailing Korean parenthetical if short qualifier (≤4 chars)
+        normalized = System.Text.RegularExpressions.Regex.Replace(normalized, @"[((][\uAC00-\uD7A3]{1,4}[))]$", "");
+        return normalized.Trim();
+    }
+
+    /// <summary>
+    /// Parse "라벨>direction" syntax. Default direction is "right".
+    /// Examples: "대표자>down" → ("대표자", "down"), "대표자" → ("대표자", "right").
+    /// </summary>
+    internal static (string Label, string Direction) ParseLabelSpec(string key)
+    {
+        var idx = key.IndexOf('>');
+        if (idx > 0 && idx < key.Length - 1)
+            return (key[..idx].Trim(), key[(idx + 1)..].Trim().ToLowerInvariant());
+        return (key.Trim(), "right");
+    }
+
+    /// <summary>
+    /// Find a table cell whose text matches <paramref name="label"/>,
+    /// then return the adjacent cell in the specified <paramref name="direction"/>.
+    /// Searches all tables in all sections.
+    /// Handles merged cells via <see cref="BuildTableGrid"/>.
+    /// </summary>
+    internal XElement? FindCellByLabel(string label, string direction = "right")
+    {
+        var normalizedLabel = NormalizeLabel(label);
+        if (string.IsNullOrEmpty(normalizedLabel)) return null;
+
+        foreach (var sec in _doc.Sections)
+        {
+            foreach (var tbl in sec.Tables)
+            {
+                var result = FindCellInTable(tbl, normalizedLabel, direction);
+                if (result != null) return result;
+            }
+        }
+        return null;
+    }
+
+    /// <summary>
+    /// Build a 2D grid from table cells using cellAddr. Handles merged cells via rowSpan/colSpan.
+    /// Reused by: FindCellInTable (Plan 70), RecognizeFormFields (Plan 70.2), Table Map (Plan 71).
+    /// </summary>
+    internal static (XElement?[,] Grid, List<(XElement Tc, int Row, int Col, int RowSpan, int ColSpan)> Cells)
+        BuildTableGrid(XElement tbl)
+    {
+        var rows = tbl.Elements(HwpxNs.Hp + "tr").ToList();
+        if (rows.Count == 0) return (new XElement?[0, 0], new());
+
+        int maxRow = 0, maxCol = 0;
+        var cellList = new List<(XElement tc, int row, int col, int rowSpan, int colSpan)>();
+
+        foreach (var tr in rows)
+        {
+            foreach (var tc in tr.Elements(HwpxNs.Hp + "tc"))
+            {
+                var (row, col, rowSpan, colSpan) = GetCellAddr(tc);
+                cellList.Add((tc, row, col, rowSpan, colSpan));
+                if (row + rowSpan > maxRow) maxRow = row + rowSpan;
+                if (col + colSpan > maxCol) maxCol = col + colSpan;
+            }
+        }
+
+        // Plan 99.9.E5: Table size limits — prevent OOM on malformed cellAddr values
+        const int MaxTableRows = 10000;
+        const int MaxTableCols = 200;
+        if (maxRow > MaxTableRows || maxCol > MaxTableCols)
+            return (new XElement?[0, 0], new());
+
+        var grid = new XElement?[maxRow, maxCol];
+        foreach (var (tc, row, col, rowSpan, colSpan) in cellList)
+        {
+            for (int r = row; r < row + rowSpan && r < maxRow; r++)
+                for (int c = col; c < col + colSpan && c < maxCol; c++)
+                    grid[r, c] = tc;
+        }
+
+        return (grid, cellList);
+    }
+
+    /// <summary>
+    /// Search a single table for a label match and return the adjacent cell.
+    /// Uses <see cref="BuildTableGrid"/> for merged cell handling.
+    /// Matching order: exact → prefix overlap (60% threshold).
+    /// </summary>
+    private static XElement? FindCellInTable(XElement tbl, string normalizedLabel, string direction)
+    {
+        var (grid, cellList) = BuildTableGrid(tbl);
+        if (cellList.Count == 0) return null;
+        int maxRow = grid.GetLength(0), maxCol = grid.GetLength(1);
+
+        // Phase 1: Exact match
+        var exactResult = FindAdjacentByMatch(grid, cellList, normalizedLabel, direction, maxRow, maxCol, exact: true);
+        if (exactResult != null) return exactResult;
+
+        // Phase 2: Prefix + overlap match (60% threshold)
+        return FindAdjacentByMatch(grid, cellList, normalizedLabel, direction, maxRow, maxCol, exact: false);
+    }
+
+    /// <summary>
+    /// Inner match helper. When exact=false, accepts prefix overlap with 60% length ratio.
+    /// </summary>
+    private static XElement? FindAdjacentByMatch(
+        XElement?[,] grid,
+        List<(XElement Tc, int Row, int Col, int RowSpan, int ColSpan)> cellList,
+        string normalizedLabel, string direction,
+        int maxRow, int maxCol, bool exact)
+    {
+        XElement? bestCandidate = null;
+        double bestRatio = 0;
+
+        foreach (var (tc, row, col, rowSpan, colSpan) in cellList)
+        {
+            var cellText = ExtractCellText(tc);
+            var normalizedCell = NormalizeLabel(cellText);
+
+            bool isMatch;
+            double ratio = 0;
+
+            if (exact)
+            {
+                isMatch = normalizedCell.Equals(normalizedLabel, StringComparison.OrdinalIgnoreCase);
+            }
+            else
+            {
+                // Prefix overlap: one must start with the other, and shorter/longer >= 0.6
+                isMatch = false;
+                if (!string.IsNullOrEmpty(normalizedCell) &&
+                    (normalizedCell.StartsWith(normalizedLabel, StringComparison.OrdinalIgnoreCase) ||
+                     normalizedLabel.StartsWith(normalizedCell, StringComparison.OrdinalIgnoreCase)))
+                {
+                    var longer = Math.Max(normalizedCell.Length, normalizedLabel.Length);
+                    var shorter = Math.Min(normalizedCell.Length, normalizedLabel.Length);
+                    ratio = (double)shorter / longer;
+                    if (ratio >= 0.6)
+                    {
+                        isMatch = true;
+                    }
+                }
+            }
+
+            if (!isMatch) continue;
+
+            // Calculate target position based on direction
+            int targetRow = row, targetCol = col;
+            switch (direction)
+            {
+                case "right":
+                    targetCol = col + colSpan; break;
+                case "left": targetCol = col - 1; break;
+                case "down": targetRow = row + rowSpan; break;
+                case "up": targetRow = row - 1; break;
+            }
+
+            // Bounds check
+            if (targetRow >= 0 && targetRow < maxRow &&
+                targetCol >= 0 && targetCol < maxCol)
+            {
+                var target = grid[targetRow, targetCol];
+                if (target != null && target != tc)
+                {
+                    if (exact) return target; // Exact match always wins
+                    // Prefix match: track best ratio
+                    if (ratio > bestRatio)
+                    {
+                        bestRatio = ratio;
+                        bestCandidate = target;
+                    }
+                }
+            }
+        }
+
+        return bestCandidate;
+    }
+
+    // ==================== Cell Address Helpers ====================
+
+    internal static (int Row, int Col, int RowSpan, int ColSpan) GetCellAddr(XElement tc)
+    {
+        var cellAddr = tc.Element(HwpxNs.Hp + "cellAddr");
+        if (cellAddr != null)
+        {
+            int row = (int?)cellAddr.Attribute("rowAddr") ?? 0;
+            int col = (int?)cellAddr.Attribute("colAddr") ?? 0;
+
+            // Try separate element first (Hancom native format)
+            var cellSpan = tc.Element(HwpxNs.Hp + "cellSpan");
+            if (cellSpan != null)
+            {
+                return (row, col,
+                    (int?)cellSpan.Attribute("rowSpan") ?? 1,
+                    (int?)cellSpan.Attribute("colSpan") ?? 1);
+            }
+
+            // Fallback: span attrs on cellAddr itself
+            return (row, col,
+                (int?)cellAddr.Attribute("rowSpan") ?? 1,
+                (int?)cellAddr.Attribute("colSpan") ?? 1);
+        }
+
+        // Fallback: attributes directly on <hp:tc>
+        return (
+            (int?)tc.Attribute("rowAddr") ?? 0,
+            (int?)tc.Attribute("colAddr") ?? 0,
+            (int?)tc.Attribute("rowSpan") ?? 1,
+            (int?)tc.Attribute("colSpan") ?? 1
+        );
+    }
+
+    // ==================== Form Recognition (Plan 70.2) ====================
+
+    /// <summary>A single recognized form field from auto-detection.</summary>
+    internal record RecognizedField(
+        string Label, string Value, string Path, int Row, int Col, string Strategy);
+
+    /// <summary>Korean government form label keywords (~48 items).</summary>
+    private static readonly string[] LabelKeywords = [
+        "성명", "이름", "주소", "전화", "전화번호", "휴대폰", "연락처",
+        "생년월일", "주민등록번호", "소속", "직위", "직급", "부서",
+        "이메일", "팩스", "학교", "학년", "반", "학번",
+        "신청인", "대표자", "담당자", "작성자",
+        "일시", "날짜", "기간", "장소", "목적", "사유", "비고",
+        "금액", "수량", "단가", "합계", "계", "소계",
+        "동아리명", "사업분야", "참가구분", "인원수",
+        // Regulation/expenditure keywords (Plan 70.2 — regulation doc support)
+        "비목", "항목해설", "증빙", "집행", "비용항목", "지출",
+        "결제일", "결제금액", "카드번호", "승인번호", "사용처",
+        "구분", "내용", "지도교수", "검수자", "검수일",
+        // Public form keywords (Plan 99.9.A1 — kordoc catalog)
+        "핸드폰", "확인자", "승인자", "번호",
+        "등록기준지", "본적", "위임인", "청구사유"
+    ];
+
+    // Plan 99.9.A3: Labels where a time-like value (HH:MM) is expected
+    private static readonly HashSet<string> _timeRelatedLabels = new(StringComparer.Ordinal)
+    {
+        "일시", "시간", "시작", "종료", "기간", "출발", "도착"
+    };
+
+    // B1: In-cell pattern regexes (FF3 from kordoc catalog)
+    private static readonly System.Text.RegularExpressions.Regex InCellParenBlankRx = new(
+        @"([가-힣A-Za-z]+)\(\s{1,}\)([가-힣A-Za-z]*)",
+        System.Text.RegularExpressions.RegexOptions.Compiled);
+
+    private static readonly System.Text.RegularExpressions.Regex InCellCheckboxRx = new(
+        @"[□■☐☑✓✔]\s?([가-힣A-Za-z]+)",
+        System.Text.RegularExpressions.RegexOptions.Compiled);
+
+    private static readonly System.Text.RegularExpressions.Regex InCellAnnotationRx = new(
+        @"\(([가-힣A-Za-z]+)[::]\s{1,}\)",
+        System.Text.RegularExpressions.RegexOptions.Compiled);
+
+    // B2: Korean key-value table header keywords (P15 from kordoc catalog)
+    private static readonly System.Text.RegularExpressions.Regex KvTableHeaderRx = new(
+        @"^[\s\(]*(?:구분|항목|종류|분류|유형|대상|내용|기간|금액|비율|방법|절차|요건|조건|근거|목적|범위|기준)[\s\)]*[:\s]?$",
+        System.Text.RegularExpressions.RegexOptions.Compiled);
+
+    // B5: Values that map to "checked" state for checkbox fields (FF3 detail)
+    internal static readonly HashSet<string> CheckboxTruthyValues = new(StringComparer.OrdinalIgnoreCase)
+    {
+        "true", "1", "yes", "o", "v",
+        "☑", "✓", "✔", "■"
+    };
+
+    // B5: Checked/unchecked character pairs for checkbox conversion
+    internal const char CheckboxUnchecked = '□';
+    internal const char CheckboxChecked = '☑';
+
+    /// <summary>
+    /// Determine if a cell's text looks like a form label.
+    /// Keyword substring match + short Korean heuristic (2-8 chars, no digits).
+    /// Strips trailing superscript markers (Plan 99.9.A2 — kordoc FF4).
+    /// </summary>
+    internal static bool IsLabelCell(string text)
+    {
+        var trimmed = NormalizeLabel(text);
+        if (string.IsNullOrEmpty(trimmed) || trimmed.Length > 30) return false;
+
+        // Plan 99.9.A2: Strip trailing superscript/reference markers before keyword matching
+        trimmed = System.Text.RegularExpressions.Regex.Replace(trimmed, @"[¹²³⁴⁵⁶⁷⁸⁹⁰*※]+$", "");
+        if (string.IsNullOrEmpty(trimmed)) return false;
+
+        if (LabelKeywords.Any(kw => trimmed.Contains(kw))) return true;
+
+        if (System.Text.RegularExpressions.Regex.IsMatch(trimmed, @"^[\uAC00-\uD7A3\s()·]{2,8}$")
+            && !System.Text.RegularExpressions.Regex.IsMatch(trimmed, @"\d"))
+            return true;
+
+        return false;
+    }
+
+    /// <summary>
+    /// Recognize form fields from all tables in the document.
+    /// Strategy 1: Adjacent cell label-value (left→right).
+    /// Strategy 2: Header row + data rows (first row all short text → headers).
+    /// Strategy 3: In-cell patterns (checkbox, paren-blank, annotation).
+    /// Strategy 4: KV table detection (tables with KV keyword headers).
+    /// </summary>
+    internal List<RecognizedField> RecognizeFormFields()
+    {
+        var fields = new List<RecognizedField>();
+
+        foreach (var (sec, tbl, localTblIdx) in _doc.AllTables())
+        {
+            var (grid, cellList) = BuildTableGrid(tbl);
+            if (cellList.Count == 0) continue;
+
+            int maxRow = grid.GetLength(0), maxCol = grid.GetLength(1);
+
+            // B4: Layout table skip heuristic — skip narrow tables (likely layout, not forms)
+            if (maxCol == 1)
+            {
+                bool allEmpty = cellList.All(c => string.IsNullOrWhiteSpace(ExtractCellText(c.Tc)));
+                if (allEmpty) continue;
+            }
+
+            var tableFields = new List<RecognizedField>();
+
+            // Strategy 3: In-cell pattern detection (FF3 — paren-blank, checkbox, annotation)
+            {
+                var seen3 = new HashSet<XElement>();
+                foreach (var (tc, row, col, rowSpan, colSpan) in cellList)
+                {
+                    if (seen3.Contains(tc)) continue;
+                    seen3.Add(tc);
+
+                    var cellText = ExtractCellText(tc);
+                    if (string.IsNullOrWhiteSpace(cellText)) continue;
+
+                    var path = $"/section[{sec.Index + 1}]/tbl[{localTblIdx + 1}]/tr[{row + 1}]/tc[{col + 1}]";
+
+                    // B1a. Parenthesized blank: "일반( )통" → label="일반통", value=""
+                    foreach (System.Text.RegularExpressions.Match m in InCellParenBlankRx.Matches(cellText))
+                    {
+                        var label = m.Groups[1].Value + m.Groups[2].Value;
+                        tableFields.Add(new RecognizedField(
+                            label, "", path, row, col, "in-cell:paren-blank"));
+                    }
+
+                    // B1b. Checkbox: "□남자" → label="남자", value="□" (unchecked)
+                    foreach (System.Text.RegularExpressions.Match m in InCellCheckboxRx.Matches(cellText))
+                    {
+                        var checkChar = cellText[m.Index].ToString();
+                        var isChecked = checkChar != "□" && checkChar != "☐";
+                        tableFields.Add(new RecognizedField(
+                            m.Groups[1].Value, isChecked ? "true" : "false",
+                            path, row, col, "in-cell:checkbox"));
+                    }
+
+                    // B1c. Annotation blank: "(한자: )" → label="한자", value=""
+                    foreach (System.Text.RegularExpressions.Match m in InCellAnnotationRx.Matches(cellText))
+                    {
+                        tableFields.Add(new RecognizedField(
+                            m.Groups[1].Value, "", path, row, col, "in-cell:annotation"));
+                    }
+                }
+            }
+
+            // Strategy 1: Adjacent cell label-value (label left, value right)
+            if (maxCol >= 2)
+            {
+                var seen = new HashSet<XElement>();
+                foreach (var (tc, row, col, rowSpan, colSpan) in cellList)
+                {
+                    if (seen.Contains(tc)) continue;
+                    seen.Add(tc);
+
+                    var cellText = ExtractCellText(tc);
+                    if (!IsLabelCell(cellText)) continue;
+
+                    int targetCol = col + colSpan;
+                    if (targetCol < maxCol)
+                    {
+                        var valueCell = grid[row, targetCol];
+                        if (valueCell != null && valueCell != tc)
+                        {
+                            var value = ExtractCellText(valueCell).Trim();
+                            if (!string.IsNullOrEmpty(value))
+                            {
+                                var normalizedLabel = NormalizeLabel(cellText);
+
+                                // Plan 99.9.A3: Skip false positive values (URL, ratio)
+                                // Time filter only when label is NOT a time/date keyword
+                                if (value.Contains("://")
+                                    || System.Text.RegularExpressions.Regex.IsMatch(value, @"^\d+:\d+$"))
+                                    continue;
+                                if (!_timeRelatedLabels.Contains(normalizedLabel)
+                                    && System.Text.RegularExpressions.Regex.IsMatch(value, @"^\d{1,2}:\d{2}$"))
+                                    continue;
+
+                                var path = $"/section[{sec.Index + 1}]/tbl[{localTblIdx + 1}]/tr[{row + 1}]/tc[{col + 1}]";
+                                tableFields.Add(new RecognizedField(
+                                    normalizedLabel, value, path, row, col, "adjacent"));
+                            }
+                        }
+                    }
+                }
+            }
+
+            // Strategy 2: Header+data (first row all short text → treat as headers)
+            if (tableFields.Count == 0 && maxRow >= 2 && maxCol >= 2)
+            {
+                bool allLabels = true;
+                for (int c = 0; c < maxCol; c++)
+                {
+                    var headerCell = grid[0, c];
+                    if (headerCell == null) { allLabels = false; break; }
+                    var ht = ExtractCellText(headerCell).Trim();
+                    if (string.IsNullOrEmpty(ht) || ht.Length > 20) { allLabels = false; break; }
+                }
+
+                if (allLabels)
+                {
+                    for (int r = 1; r < maxRow; r++)
+                        for (int c = 0; c < maxCol; c++)
+                        {
+                            var headerCell = grid[0, c];
+                            var dataCell = grid[r, c];
+                            if (headerCell == null || dataCell == null) continue;
+                            var label = ExtractCellText(headerCell).Trim();
+                            var value = ExtractCellText(dataCell).Trim();
+                            if (!string.IsNullOrEmpty(label) && !string.IsNullOrEmpty(value))
+                            {
+                                var path = $"/section[{sec.Index + 1}]/tbl[{localTblIdx + 1}]/tr[{r + 1}]/tc[{c + 1}]";
+                                tableFields.Add(new RecognizedField(
+                                    NormalizeLabel(label), value, path, r, c, "header-data"));
+                            }
+                        }
+                }
+            }
+
+            // Strategy 4: KV table detection (P15 — tables with KV keyword headers)
+            if (tableFields.Count == 0 && maxRow >= 2 && maxCol >= 2)
+            {
+                // Check if first column contains KV header keywords
+                int kvHits = 0;
+                for (int r = 0; r < maxRow; r++)
+                {
+                    var firstCell = grid[r, 0];
+                    if (firstCell == null) continue;
+                    var text = NormalizeLabel(ExtractCellText(firstCell));
+                    if (KvTableHeaderRx.IsMatch(text)) kvHits++;
+                }
+
+                // If 2+ rows have KV keywords in col 0, treat col 0 as labels, col 1+ as values
+                if (kvHits >= 2)
+                {
+                    var seenKv = new HashSet<XElement>();
+                    for (int r = 0; r < maxRow; r++)
+                    {
+                        var labelCell = grid[r, 0];
+                        if (labelCell == null || seenKv.Contains(labelCell)) continue;
+                        seenKv.Add(labelCell);
+
+                        var label = NormalizeLabel(ExtractCellText(labelCell));
+                        if (string.IsNullOrEmpty(label)) continue;
+
+                        // Collect values from remaining columns
+                        for (int c = 1; c < maxCol; c++)
+                        {
+                            var valCell = grid[r, c];
+                            if (valCell == null || valCell == labelCell) continue;
+                            var value = ExtractCellText(valCell).Trim();
+                            if (string.IsNullOrEmpty(value)) continue;
+
+                            var path = $"/section[{sec.Index + 1}]/tbl[{localTblIdx + 1}]/tr[{r + 1}]/tc[{c + 1}]";
+                            tableFields.Add(new RecognizedField(
+                                label, value, path, r, c, "kv-table"));
+                            break; // Take first non-empty value column
+                        }
+                    }
+                }
+            }
+
+            fields.AddRange(tableFields);
+        }
+
+        return fields;
+    }
+
+    /// <summary>
+    /// Try to fill a checkbox pattern in any table cell.
+    /// Searches for □label or ■label and toggles based on truthy value.
+    /// Returns true if a checkbox was found and updated.
+    /// </summary>
+    internal bool TryFillCheckbox(string label, string value)
+    {
+        var isTruthy = CheckboxTruthyValues.Contains(value);
+        var targetChar = isTruthy ? CheckboxChecked : CheckboxUnchecked;
+
+        foreach (var sec in _doc.Sections)
+        {
+            foreach (var tbl in sec.Tables)
+            {
+                foreach (var tr in tbl.Elements(HwpxNs.Hp + "tr"))
+                {
+                    foreach (var tc in tr.Elements(HwpxNs.Hp + "tc"))
+                    {
+                        var cellText = ExtractCellText(tc);
+                        // Match □label or □ label (with optional whitespace)
+                        var pattern = $@"[□■☐☑✓✔]\s?{System.Text.RegularExpressions.Regex.Escape(label)}";
+                        if (!System.Text.RegularExpressions.Regex.IsMatch(cellText, pattern))
+                            continue;
+
+                        // Replace the checkbox character in the actual XML
+                        var subList = tc.Element(HwpxNs.Hp + "subList");
+                        var paragraphs = subList?.Elements(HwpxNs.Hp + "p")
+                            ?? tc.Elements(HwpxNs.Hp + "p");
+                        foreach (var p in paragraphs)
+                        {
+                            foreach (var run in p.Elements(HwpxNs.Hp + "run"))
+                            {
+                                foreach (var t in run.Elements(HwpxNs.Hp + "t"))
+                                {
+                                    if (System.Text.RegularExpressions.Regex.IsMatch(t.Value, pattern))
+                                    {
+                                        t.Value = System.Text.RegularExpressions.Regex.Replace(
+                                            t.Value, @"[□■☐☑✓✔](?=\s?"
+                                                + System.Text.RegularExpressions.Regex.Escape(label) + ")",
+                                            targetChar.ToString());
+                                        // Save section
+                                        var sectionRoot = tc.AncestorsAndSelf()
+                                            .FirstOrDefault(e => e.Name.LocalName == "sec");
+                                        if (sectionRoot != null) SaveSection(sectionRoot);
+                                        _dirty = true;
+                                        return true;
+                                    }
+                                }
+                            }
+                        }
+                    }
+                }
+            }
+        }
+        return false;
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxHandler.Import.cs b/src/officecli/Handlers/Hwpx/HwpxHandler.Import.cs
new file mode 100644
index 000000000..d5b5a9bf3
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxHandler.Import.cs
@@ -0,0 +1,225 @@
+// Plan 85: Markdown → HWPX Import
+// Minimal GFM parser: headings, paragraphs, tables.
+// Uses existing Add/Set infrastructure internally.
+
+using System.Text.RegularExpressions;
+
+namespace OfficeCli.Handlers;
+
+public partial class HwpxHandler
+{
+    /// <summary>
+    /// Import Markdown content into the current HWPX document.
+    /// Supports: headings (#-######), paragraphs, GFM tables, bold, italic.
+    /// </summary>
+    public int ImportMarkdown(string markdown, string? align = null)
+    {
+        var lines = markdown.Split('\n');
+        int blockCount = 0;
+
+        int i = 0;
+
+        // Skip YAML frontmatter (--- ... ---)
+        if (i < lines.Length && lines[i].TrimEnd('\r') == "---")
+        {
+            i++;
+            while (i < lines.Length && lines[i].TrimEnd('\r') != "---") i++;
+            if (i < lines.Length) i++; // skip closing ---
+        }
+        while (i < lines.Length)
+        {
+            var line = lines[i].TrimEnd('\r');
+
+            // Skip empty lines
+            if (string.IsNullOrWhiteSpace(line)) { i++; continue; }
+
+            // Skip code fence markers (``` or ~~~)
+            if (Regex.IsMatch(line, @"^(`{3}|~{3})")) { i++; continue; }
+
+            // Skip horizontal rules (--- or ***)
+            if (Regex.IsMatch(line.Trim(), @"^[-*_]{3,}$")) { i++; continue; }
+
+            // Skip image-only lines: ![alt](url)
+            if (Regex.IsMatch(line.Trim(), @"^!\[.*\]\(.*\)$")) { i++; continue; }
+
+            // G7: Blockquote: > text (preserve marker for round-trip)
+            if (line.TrimStart().StartsWith('>'))
+            {
+                var text = StripInlineMarkdown(line.TrimStart().TrimStart('>').Trim());
+                if (!string.IsNullOrEmpty(text))
+                {
+                    var props = new Dictionary<string, string> { ["text"] = $"> {text}" };
+                    if (align != null) props["align"] = align.ToUpperInvariant();
+                    Add("/section[1]", "paragraph", null, props);
+                    blockCount++;
+                }
+                i++;
+                continue;
+            }
+
+            // G7: Unordered list: - item, * item, + item
+            var ulMatch = Regex.Match(line, @"^\s*[-*+]\s+(.+)$");
+            if (ulMatch.Success)
+            {
+                var text = StripInlineMarkdown(ulMatch.Groups[1].Value);
+                var props = new Dictionary<string, string> { ["text"] = $" - {text}" };
+                if (align != null) props["align"] = align.ToUpperInvariant();
+                Add("/section[1]", "paragraph", null, props);
+                blockCount++;
+                i++;
+                continue;
+            }
+
+            // G7: Ordered list: 1. item
+            var olMatch = Regex.Match(line, @"^\s*(\d+)\.\s+(.+)$");
+            if (olMatch.Success)
+            {
+                var num = olMatch.Groups[1].Value;
+                var text = StripInlineMarkdown(olMatch.Groups[2].Value);
+                var props = new Dictionary<string, string> { ["text"] = $" {num}. {text}" };
+                if (align != null) props["align"] = align.ToUpperInvariant();
+                Add("/section[1]", "paragraph", null, props);
+                blockCount++;
+                i++;
+                continue;
+            }
+
+            // Heading: # ... ######
+            var headingMatch = Regex.Match(line, @"^(#{1,6})\s+(.+)$");
+            if (headingMatch.Success)
+            {
+                var level = headingMatch.Groups[1].Value.Length;
+                var text = StripInlineMarkdown(headingMatch.Groups[2].Value.Trim());
+                var props = new Dictionary<string, string>
+                {
+                    ["text"] = text,
+                    ["bold"] = "true",
+                    ["fontsize"] = level switch { 1 => "22", 2 => "18", 3 => "14", _ => "12" }
+                };
+                if (level <= 3) props["styleidref"] = (level + 1).ToString();
+                if (align != null) props["align"] = align.ToUpperInvariant();
+                Add("/section[1]", "paragraph", null, props);
+                blockCount++;
+                i++;
+                continue;
+            }
+
+            // GFM Table: starts with |
+            if (line.TrimStart().StartsWith('|'))
+            {
+                var tableLines = new List<string>();
+                while (i < lines.Length && lines[i].TrimEnd('\r').TrimStart().StartsWith('|'))
+                {
+                    tableLines.Add(lines[i].TrimEnd('\r'));
+                    i++;
+                }
+                blockCount += ImportMarkdownTable(tableLines);
+                continue;
+            }
+
+            // Bold/italic paragraph
+            {
+                var text = StripInlineMarkdown(line.Trim());
+                if (!string.IsNullOrEmpty(text))
+                {
+                    var props = new Dictionary<string, string> { ["text"] = text };
+                    if (line.Trim().StartsWith("**") && line.Trim().EndsWith("**"))
+                        props["bold"] = "true";
+                    if (align != null) props["align"] = align.ToUpperInvariant();
+                    Add("/section[1]", "paragraph", null, props);
+                    blockCount++;
+                }
+                i++;
+            }
+        }
+
+        return blockCount;
+    }
+
+    private int ImportMarkdownTable(List<string> tableLines)
+    {
+        // Parse table rows, skipping separator line (| --- | --- |)
+        var rows = new List<string[]>();
+        foreach (var line in tableLines)
+        {
+            var trimmed = line.Trim();
+            // Skip separator rows
+            if (Regex.IsMatch(trimmed, @"^\|[\s\-:|]+\|$")) continue;
+
+            var cells = trimmed.Split('|', StringSplitOptions.None)
+                .Skip(1) // leading empty from first |
+                .ToArray();
+            // Remove trailing empty from last |
+            if (cells.Length > 0 && string.IsNullOrWhiteSpace(cells[^1]))
+                cells = cells[..^1];
+            cells = cells.Select(c => StripInlineMarkdown(c.Trim())).ToArray();
+            if (cells.Length > 0) rows.Add(cells);
+        }
+
+        if (rows.Count == 0) return 0;
+
+        int rowCount = rows.Count;
+        int colCount = rows.Max(r => r.Length);
+
+        // Create table
+        Add("/section[1]", "table", null, new Dictionary<string, string>
+        {
+            ["rows"] = rowCount.ToString(),
+            ["cols"] = colCount.ToString()
+        });
+
+        // Find the table we just added — it's the last tbl in the document
+        var lastTbl = _doc.Sections.SelectMany(s => s.Tables).LastOrDefault();
+        if (lastTbl == null) return 0;
+
+        // Find path to this table
+        var tblPath = BuildPath(lastTbl);
+
+        // Fill cells
+        for (int r = 0; r < rowCount; r++)
+        {
+            for (int c = 0; c < rows[r].Length; c++)
+            {
+                var cellText = rows[r][c];
+                if (!string.IsNullOrEmpty(cellText))
+                {
+                    var cellPath = $"{tblPath}/tr[{r + 1}]/tc[{c + 1}]";
+                    try { Set(cellPath, new Dictionary<string, string> { ["text"] = cellText }); }
+                    catch { /* skip cells that don't resolve */ }
+                }
+            }
+        }
+
+        return 1;
+    }
+
+    private static string StripInlineMarkdown(string text)
+    {
+        // Bold+italic (*** and ___)
+        text = Regex.Replace(text, @"\*{3}(.+?)\*{3}", "$1");
+        text = Regex.Replace(text, @"_{3}(.+?)_{3}", "$1");
+        // Bold (** and __)
+        text = Regex.Replace(text, @"\*{2}(.+?)\*{2}", "$1");
+        text = Regex.Replace(text, @"_{2}(.+?)_{2}", "$1");
+        // Italic (* and _)
+        text = Regex.Replace(text, @"\*(.+?)\*", "$1");
+        text = Regex.Replace(text, @"(?<!\w)_(.+?)_(?!\w)", "$1");
+        // HTML entities
+        text = text.Replace("&amp;", "&").Replace("&lt;", "<").Replace("&gt;", ">")
+            .Replace("&quot;", "\"").Replace("&#39;", "'");
+        // Escaped markdown chars
+        text = Regex.Replace(text, @"\\([*_~`\[\]\\|#>!\-])", "$1");
+        // Escaped pipe
+        text = text.Replace("\\|", "|");
+        return text.Trim();
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxHandler.Korean.cs b/src/officecli/Handlers/Hwpx/HwpxHandler.Korean.cs
new file mode 100644
index 000000000..b786c5f17
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxHandler.Korean.cs
@@ -0,0 +1,48 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.Text.RegularExpressions;
+
+namespace OfficeCli.Handlers;
+
+internal static partial class HwpxKorean
+{
+    public static string Normalize(string text)
+    {
+        text = StripPuaChars(text);
+        text = StripShapeAltText(text);
+        text = NormalizeKoreanSpacing(text);
+        return text;
+    }
+
+    // F1: Rune-based PUA stripping — handles BMP + supplementary PUA planes
+    public static string StripPuaChars(string text)
+        => string.Concat(text.EnumerateRunes()
+            .Where(r => !(r.Value >= 0xE000 && r.Value <= 0xF8FF)      // BMP PUA
+                && !(r.Value >= 0xF0000 && r.Value <= 0xFFFFD)         // PUA-A
+                && !(r.Value >= 0x100000 && r.Value <= 0x10FFFD))      // PUA-B
+            .Select(r => r.ToString()));
+
+    public static string StripShapeAltText(string text)
+        => ShapeAltTextRegex().Replace(text, "");
+
+    public static string NormalizeKoreanSpacing(string text)
+    {
+        // Fix uniform-distribution spacing (균등 분할): "현 장 대 응" → "현장대응"
+        // Only collapse when 3+ consecutive single Korean syllables are space-separated.
+        // Preserves normal word spacing like "인사 발령 통보".
+        text = UniformDistRegex().Replace(text, m => m.Value.Replace(" ", ""));
+        // Remove zero-width joiners between jamo
+        text = text.Replace("\u200D", "");
+        return text;
+    }
+
+    // Plan 99.9.A4: Complete shape alt-text regex (kordoc TB2, 50+ shapes)
+    // Anchored with ^...$ to prevent partial-match false positives.
+    [GeneratedRegex(@"^(?:모서리가 둥근 |둥근 )?(?:표|그림|개체|사각형|직사각형|정사각형|원|타원|삼각형|이등변 삼각형|직각 삼각형|선|직선|곡선|화살표|굵은 화살표|이중 화살표|오각형|육각형|팔각형|별|[4-8]점별|십자|십자형|구름|구름형|마름모|도넛|평행사변형|사다리꼴|부채꼴|호|반원|물결|번개|하트|빗금|블록 화살표|수식|그리기\s*개체|묶음\s*개체|글상자|수식\s*개체|OLE\s*개체)\s*입니다\.?$")]
+    private static partial Regex ShapeAltTextRegex();
+
+    // F2: Hangul Syllables + Compatibility Jamo uniform spacing detection
+    [GeneratedRegex(@"(?<![\uAC00-\uD7A3\u3131-\u318E])(?:[\uAC00-\uD7A3\u3131-\u318E] ){2,}[\uAC00-\uD7A3\u3131-\u318E](?![\uAC00-\uD7A3\u3131-\u318E])")]
+    private static partial Regex UniformDistRegex();
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxHandler.Insert.cs b/src/officecli/Handlers/Hwpx/HwpxHandler.Insert.cs
new file mode 100644
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxHandler.Insert.cs
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.Xml.Linq;
+
+namespace OfficeCli.Handlers;
+
+public partial class HwpxHandler
+{
+    /// <summary>
+    /// Add a new element under the parent at the given path.
+    /// Returns the path of the newly created element.
+    /// </summary>
+    /// <param name="parentPath">Path to the parent element.</param>
+    /// <param name="type">Element type: "paragraph", "table", "run" (lowercase).</param>
+    /// <param name="position">Optional insertion position. null = append.</param>
+    /// <param name="properties">Optional properties for the new element.</param>
+    public string Add(string parentPath, string type, InsertPosition? position,
+        Dictionary<string, string>? properties)
+    {
+        var index = position?.Index;
+        // Section: special handling — creates new section file + manifest entry (no parent needed)
+        if (type.Equals("section", StringComparison.OrdinalIgnoreCase))
+        {
+            var newSection = AddNewSection(properties);
+            _dirty = true;
+            return $"/section[{newSection.Index + 1}]";
+        }
+
+        // Style: header-level, not section-level
+        if (type.Equals("style", StringComparison.OrdinalIgnoreCase))
+        {
+            var newStyle = CreateStyleElement(properties);
+            _dirty = true;
+            return $"/header/style[{newStyle.Attribute("id")?.Value}]";
+        }
+
+        var parent = ResolvePath(parentPath);
+
+        // Header/footer: special handling — adds to secPr, not to parent directly
+        if (type.Equals("header", StringComparison.OrdinalIgnoreCase) || type.Equals("footer", StringComparison.OrdinalIgnoreCase))
+        {
+            var isHeader = type.Equals("header", StringComparison.OrdinalIgnoreCase);
+            var hfElement = AddHeaderFooter(parent, properties, isHeader);
+            _dirty = true;
+            SaveSection(hfElement);
+            return $"/{(isHeader ? "header" : "footer")}[1]";
+        }
+
+        // Memo: special handling — adds to <hp:memogroup> at section level, not inline
+        if (type.Equals("comment", StringComparison.OrdinalIgnoreCase) || type.Equals("memo", StringComparison.OrdinalIgnoreCase))
+        {
+            var memoElement = AddMemoToGroup(parent, properties);
+            _dirty = true;
+            SaveSection(memoElement);
+            var memoCount = memoElement.Parent?.Elements(HwpxNs.Hp + "memo").Count() ?? 1;
+            return $"/memo[{memoCount}]";
+        }
+
+        // TOC: special handling — generates multiple paragraphs from headings
+        if (type.Equals("toc", StringComparison.OrdinalIgnoreCase) || type.Equals("tableofcontents", StringComparison.OrdinalIgnoreCase))
+        {
+            var mode = properties?.GetValueOrDefault("mode") ?? "static";
+            var tocParas = mode.Equals("field", StringComparison.OrdinalIgnoreCase)
+                ? CreateFieldToc(properties)
+                : CreateStaticToc(properties);
+            foreach (var p in tocParas)
+                parent.Add(p);
+            _dirty = true;
+            SaveSection(parent);
+            return $"/toc[{tocParas.Count} paragraphs]";
+        }
+
+        var newElement = type.ToLowerInvariant() switch
+        {
+            "paragraph" or "p" => CreateParagraph(properties),
+            "table" or "tbl" => CreateTable(properties),
+            "run" => CreateRun(properties),
+            "row" or "tr" => CreateRow(parent, properties),
+            "cell" or "tc" => CreateCell(parent, properties),
+            "picture" or "image" or "pic" => CreatePicture(parent, properties),
+            "hyperlink" or "link" => CreateHyperlink(properties),
+            "pagebreak" or "page-break" => CreatePageBreak(),
+            "columnbreak" or "column-break" => CreateColumnBreak(properties),
+            "footnote" => CreateFootnote(properties),
+            "endnote" => CreateFootnote(properties, isEndnote: true),
+            "pagenum" or "pagenumber" => CreatePageNum(properties),
+            "bookmark" => CreateBookmark(properties),
+            "equation" or "eq" => CreateEquation(properties),
+            "line" => CreateLine(properties),
+            "rect" or "rectangle" => CreateRect(properties),
+            "ellipse" or "circle" => CreateEllipse(properties),
+            "textbox" => CreateTextBox(properties),
+            "polygon" => CreatePolygon(properties),
+            "triangle" => CreatePolygon(MergeProps(properties, "sides", "3")),
+            "pentagon" => CreatePolygon(MergeProps(properties, "sides", "5")),
+            "arrow" => CreateArrow(properties),
+            "formfield" or "form" => CreateFormField(properties),
+            "field" => CreateField(properties),
+            "date" => CreateField(MergeProps(properties, "type", "DATE")),
+            "filepath" or "path" => CreateField(MergeProps(properties, "type", "PATH")),
+            "clickhere" => CreateField(MergeProps(properties, "type", "CLICK_HERE")),
+            "checkbox" or "check" => CreateField(MergeProps(properties, "type", "CHECKBOX")),
+            "dropdown" or "drop" => CreateField(MergeProps(properties, "type", "DROPDOWN")),
+            "summary" or "summery" => CreateField(MergeProps(properties, "type", "SUMMERY")),
+            "author" => CreateField(MergeProps(properties, "type", "SUMMERY", "command", "$author")),
+            "title" => CreateField(MergeProps(properties, "type", "SUMMERY", "command", "$title")),
+            "lastsaveby" => CreateField(MergeProps(properties, "type", "SUMMERY", "command", "$lastsaveby")),
+            "filename" => CreateField(MergeProps(properties, "type", "PATH", "command", "$F")),
+            "watermark" => CreateWatermark(properties),
+            _ => throw new CliException($"Unsupported element type: {type}")
+        };
+
+        // Hancom requires tables and pictures to be wrapped: <p><run> ... </run></p>
+        // If adding to a section (or section-like parent), wrap in p>run.
+        var needsWrap = (newElement.Name == HwpxNs.Hp + "tbl" || newElement.Name == HwpxNs.Hp + "pic")
+            && IsSectionLike(parent);
+        if (needsWrap)
+        {
+            newElement = WrapTableInParagraph(newElement);
+        }
+
+        // Insert at index or append
+
+        // Auto-replace first empty paragraph: if parent is section-like and first paragraph
+        // has no text (only secPr), replace it instead of appending after it.
+        // This prevents the "always empty first line" issue with base.hwpx template.
+        if (!index.HasValue && newElement.Name == HwpxNs.Hp + "p" && IsSectionLike(parent))
+        {
+            var firstP = parent.Elements(HwpxNs.Hp + "p").FirstOrDefault();
+            if (firstP != null)
+            {
+                var hasText = firstP.Descendants(HwpxNs.Hp + "t").Any(t => !string.IsNullOrEmpty(t.Value));
+                var isOnlyPara = parent.Elements(HwpxNs.Hp + "p").Count() == 1;
+                if (!hasText && isOnlyPara)
+                {
+                    // Preserve secPr from the first paragraph (it's structurally required)
+                    var secPrRun = firstP.Elements(HwpxNs.Hp + "run")
+                        .FirstOrDefault(r => r.Element(HwpxNs.Hp + "secPr") != null);
+                    if (secPrRun != null)
+                        newElement.AddFirst(secPrRun);
+                    firstP.ReplaceWith(newElement);
+                    goto afterInsert;
+                }
+            }
+        }
+
+        if (index.HasValue)
+        {
+            var targetName = newElement.Name;
+            var siblings = parent.Elements(targetName).ToList();
+            var insertIdx = index.Value - 1; // convert 1-based to 0-based
+
+            if (insertIdx <= 0 || siblings.Count == 0)
+            {
+                // Insert as first child of this type
+                var firstOfType = parent.Elements(targetName).FirstOrDefault();
+                if (firstOfType != null)
+                    firstOfType.AddBeforeSelf(newElement);
+                else
+                    parent.Add(newElement);
+            }
+            else if (insertIdx >= siblings.Count)
+            {
+                // Append after last sibling of this type
+                siblings.Last().AddAfterSelf(newElement);
+            }
+            else
+            {
+                // Insert before the element currently at this index
+                siblings[insertIdx].AddBeforeSelf(newElement);
+            }
+        }
+        else
+        {
+            parent.Add(newElement);
+        }
+        afterInsert:
+
+        // Apply formatting properties (fontsize, bold, italic, color, align, etc.)
+        // to the newly created paragraph — CreateParagraph only handles text/style/IDRef
+        var createdPara = newElement.Name == HwpxNs.Hp + "p" ? newElement : null;
+        if (createdPara != null && properties != null)
+        {
+            var structuralKeys = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
+                { "text", "styleidref", "styleIDRef", "charpridref", "charPrIDRef",
+                  "parapridref", "paraPrIDRef" };
+            foreach (var (key, value) in properties)
+            {
+                if (structuralKeys.Contains(key)) continue;
+                SetParagraphProp(createdPara, key, value);
+            }
+        }
+
+        _dirty = true;
+        SaveSection(parent);
+        return BuildPath(newElement);
+    }
+
+    // ==================== Element Factories ====================
+
+    /// <summary>
+    /// Create a new paragraph element with optional text content.
+    /// Props: "text" → paragraph text, "styleidref" → style ID, "charpridref" → char property ID.
+    /// </summary>
+    private XElement CreateParagraph(Dictionary<string, string>? props)
+    {
+        var id = NewId();
+        var text = props?.GetValueOrDefault("text") ?? "";
+        var styleIdRef = props?.GetValueOrDefault("styleidref") ?? props?.GetValueOrDefault("styleIDRef") ?? "0";
+        var charPrIdRef = props?.GetValueOrDefault("charpridref") ?? props?.GetValueOrDefault("charPrIDRef") ?? "0";
+        var paraPrIdRef = props?.GetValueOrDefault("parapridref") ?? props?.GetValueOrDefault("paraPrIDRef") ?? "0";
+
+        return new XElement(HwpxNs.Hp + "p",
+            new XAttribute("id", id),
+            new XAttribute("styleIDRef", styleIdRef),
+            new XAttribute("paraPrIDRef", paraPrIdRef),
+            new XElement(HwpxNs.Hp + "run",
+                new XAttribute("charPrIDRef", charPrIdRef),
+                new XElement(HwpxNs.Hp + "t", text)
+            )
+        );
+    }
+
+    /// <summary>
+    /// Create a new table element with full DOCX-parity features.
+    ///
+    /// CRITICAL: Every <hp:tc> MUST have ALL of the following children
+    /// in this exact order, or Hancom will crash on open:
+    ///   - <hp:subList vertAlign="CENTER" ...><hp:p .../></hp:subList>
+    ///   - <hp:cellAddr colAddr="C" rowAddr="R"/>
+    ///   - <hp:cellSpan colSpan="1" rowSpan="1"/>
+    ///   - <hp:cellSz width="W" height="H"/>
+    ///   - <hp:cellMargin left="510" right="510" top="141" bottom="141"/>
+    ///
+    /// Props:
+    ///   "rows"            → row count (default 2)
+    ///   "cols"            → col count (default 2)
+    ///   "width"           → total table width in HWPML units (default 42520 ≈ A4 body)
+    ///   "data"            → cell data: "H1,H2;R1C1,R1C2" or CSV file path
+    ///   "colWidths"       → per-column widths: "10000,15000,17520"
+    ///   "merge"           → merge spec: "startRow,startCol,endRow,endCol;..."
+    ///   "borderFillIDRef" → table-level border fill ID (default "1")
+    /// </summary>
+    private XElement CreateTable(Dictionary<string, string>? props)
+    {
+        var id = NewId();
+
+        // Parse data: "H1,H2;R1C1,R1C2;R2C1,R2C2" or CSV file path
+        string[][]? tableData = null;
+        if (props?.TryGetValue("data", out var dataStr) == true && !string.IsNullOrEmpty(dataStr))
+        {
+            if (File.Exists(dataStr))
+                tableData = File.ReadAllLines(dataStr)
+                    .Where(l => !string.IsNullOrWhiteSpace(l))
+                    .Select(l => l.Split(',').Select(c => c.Trim()).ToArray())
+                    .ToArray();
+            else
+                tableData = dataStr.Split(';')
+                    .Select(r => r.Split(',').Select(c => c.Trim()).ToArray())
+                    .ToArray();
+        }
+
+        // Determine dimensions
+        int rows, cols;
+        if (tableData != null)
+        {
+            rows = tableData.Length;
+            cols = tableData.Max(r => r.Length);
+            // Allow explicit overrides to be larger
+            if (int.TryParse(props?.GetValueOrDefault("rows"), out var r2) && r2 > rows) rows = r2;
+            if (int.TryParse(props?.GetValueOrDefault("cols"), out var c2) && c2 > cols) cols = c2;
+        }
+        else
+        {
+            rows = int.TryParse(props?.GetValueOrDefault("rows"), out var r) && r > 0 ? r : 2;
+            cols = int.TryParse(props?.GetValueOrDefault("cols"), out var c) && c > 0 ? c : 2;
+        }
+
+        var totalWidth = int.TryParse(props?.GetValueOrDefault("width"), out var w) && w > 0 ? w : 42520;
+        var defaultCellWidth = totalWidth / Math.Max(cols, 1);
+
+        // Parse per-column widths: "10000,15000,17520"
+        int[]? colWidthArr = null;
+        if ((props?.TryGetValue("colwidths", out var cwStr) == true
+             || props?.TryGetValue("colWidths", out cwStr) == true)
+            && !string.IsNullOrEmpty(cwStr))
+        {
+            colWidthArr = cwStr.Split(',')
+                .Select(s => int.TryParse(s.Trim(), out var v) ? v : defaultCellWidth)
+                .ToArray();
+        }
+
+        var borderFillRef = props?.GetValueOrDefault("borderfillid")
+            ?? props?.GetValueOrDefault("borderFillIDRef")
+            ?? EnsureTableBorderFill();
+
+        var cellHeight = 1000;
+        var totalHeight = rows * cellHeight;
+
+        var tbl = new XElement(HwpxNs.Hp + "tbl",
+            new XAttribute("id", id),
+            new XAttribute("zOrder", "0"),
+            new XAttribute("numberingType", "TABLE"),
+            new XAttribute("textWrap", "TOP_AND_BOTTOM"),
+            new XAttribute("textFlow", "BOTH_SIDES"),
+            new XAttribute("lock", "0"),
+            new XAttribute("dropcapstyle", "None"),
+            new XAttribute("pageBreak", "CELL"),
+            new XAttribute("repeatHeader", "1"),
+            new XAttribute("rowCnt", rows.ToString()),
+            new XAttribute("colCnt", cols.ToString()),
+            new XAttribute("cellSpacing", "0"),
+            new XAttribute("borderFillIDRef", borderFillRef),
+            new XAttribute("noAdjust", "0"),
+            // Table size — required by Hancom for rendering
+            new XElement(HwpxNs.Hp + "sz",
+                new XAttribute("width", totalWidth.ToString()),
+                new XAttribute("widthRelTo", "ABSOLUTE"),
+                new XAttribute("height", totalHeight.ToString()),
+                new XAttribute("heightRelTo", "ABSOLUTE"),
+                new XAttribute("protect", "0")),
+            // Position — treatAsChar=1 makes table inline with text
+            new XElement(HwpxNs.Hp + "pos",
+                new XAttribute("treatAsChar", "1"),
+                new XAttribute("affectLSpacing", "0"),
+                new XAttribute("flowWithText", "1"),
+                new XAttribute("allowOverlap", "0"),
+                new XAttribute("holdAnchorAndSO", "0"),
+                new XAttribute("vertRelTo", "PARA"),
+                new XAttribute("horzRelTo", "COLUMN"),
+                new XAttribute("vertAlign", "TOP"),
+                new XAttribute("horzAlign", "LEFT"),
+                new XAttribute("vertOffset", "0"),
+                new XAttribute("horzOffset", "0")),
+            // Outer margin
+            new XElement(HwpxNs.Hp + "outMargin",
+                new XAttribute("left", "283"),
+                new XAttribute("right", "283"),
+                new XAttribute("top", "283"),
+                new XAttribute("bottom", "283")),
+            // Inner margin
+            new XElement(HwpxNs.Hp + "inMargin",
+                new XAttribute("left", "510"),
+                new XAttribute("right", "510"),
+                new XAttribute("top", "141"),
+                new XAttribute("bottom", "141"))
+        );
+
+        // Column widths — Hancom uses these to distribute column sizes
+        for (int col = 0; col < cols; col++)
+        {
+            var cw = colWidthArr != null && col < colWidthArr.Length ? colWidthArr[col] : defaultCellWidth;
+            tbl.Add(new XElement(HwpxNs.Hp + "colSz",
+                new XAttribute("width", cw.ToString())));
+        }
+
+        // Parse merge instructions: "0,0,0,3;1,0,2,0" (startRow,startCol,endRow,endCol)
+        var mergedCells = new HashSet<(int row, int col)>();
+        var spanMap = new Dictionary<(int row, int col), (int rowSpan, int colSpan)>();
+        if (props?.TryGetValue("merge", out var mergeStr) == true && !string.IsNullOrEmpty(mergeStr))
+        {
+            foreach (var m in mergeStr.Split(';'))
+            {
+                var parts = m.Trim().Split(',');
+                if (parts.Length == 4
+                    && int.TryParse(parts[0].Trim(), out var sr)
+                    && int.TryParse(parts[1].Trim(), out var sc)
+                    && int.TryParse(parts[2].Trim(), out var er)
+                    && int.TryParse(parts[3].Trim(), out var ec))
+                {
+                    for (int mr = sr; mr <= er; mr++)
+                        for (int mc = sc; mc <= ec; mc++)
+                        {
+                            if (mr == sr && mc == sc)
+                                spanMap[(mr, mc)] = (er - sr + 1, ec - sc + 1);
+                            else
+                                mergedCells.Add((mr, mc));
+                        }
+                }
+            }
+        }
+
+        // Build rows and cells
+        for (int row = 0; row < rows; row++)
+        {
+            var tr = new XElement(HwpxNs.Hp + "tr");
+
+            for (int col = 0; col < cols; col++)
+            {
+                // Skip cells covered by a merge
+                if (mergedCells.Contains((row, col)))
+                    continue;
+
+                var cellId = NewId();
+                var rowSpan = 1;
+                var colSpan = 1;
+                if (spanMap.TryGetValue((row, col), out var span))
+                {
+                    rowSpan = span.rowSpan;
+                    colSpan = span.colSpan;
+                }
+
+                // Cell width = sum of spanned columns
+                var cellW = 0;
+                for (int ci = col; ci < col + colSpan && ci < cols; ci++)
+                    cellW += colWidthArr != null && ci < colWidthArr.Length ? colWidthArr[ci] : defaultCellWidth;
+
+                // Cell text from data prop or positional prop
+                var cellText = "";
+                if (tableData != null && row < tableData.Length && col < tableData[row].Length)
+                    cellText = tableData[row][col];
+                else if (props?.TryGetValue($"r{row + 1}c{col + 1}", out var rc) == true)
+                    cellText = rc;
+
+                // Per-cell borderFillIDRef: check "r{row+1}c{col+1}borderfillid" prop, fallback to table-level
+                var cellBorderFill = props?.GetValueOrDefault($"r{row + 1}c{col + 1}borderfillid")
+                    ?? props?.GetValueOrDefault($"r{row + 1}borderfillid")
+                    ?? borderFillRef;
+
+                // Per-cell vertAlign: check "r{row+1}c{col+1}valign" prop
+                var cellVertAlign = props?.GetValueOrDefault($"r{row + 1}c{col + 1}valign")
+                    ?? props?.GetValueOrDefault("valign") ?? "CENTER";
+
+                var tc = BuildCell(row, col, rowSpan, colSpan, cellW, cellHeight,
+                    cellText, cellBorderFill, isHeader: row == 0, vertAlign: cellVertAlign);
+
+                tr.Add(tc);
+            }
+
+            tbl.Add(tr);
+        }
+
+        return tbl;
+    }
+
+    /// <summary>
+    /// Create a new run element with optional text content.
+    /// Props: "text" → run text, "charpridref" → char property ID.
+    /// </summary>
+    private XElement CreateRun(Dictionary<string, string>? props)
+    {
+        var text = props?.GetValueOrDefault("text") ?? "";
+        var charPrIdRef = props?.GetValueOrDefault("charpridref") ?? props?.GetValueOrDefault("charPrIDRef") ?? "0";
+
+        return new XElement(HwpxNs.Hp + "run",
+            new XAttribute("charPrIDRef", charPrIdRef),
+            new XElement(HwpxNs.Hp + "t", text)
+        );
+    }
+
+    /// <summary>
+    /// Create a new table row with cells. Parent MUST be a <hp:tbl>.
+    /// Props: "cols" → cell count (default from table colCnt),
+    ///        "c1", "c2", ... → cell text for each column.
+    /// </summary>
+    private XElement CreateRow(XElement parent, Dictionary<string, string>? props)
+    {
+        if (parent.Name.LocalName != "tbl")
+            throw new CliException("Rows can only be added to a table element");
+
+        var colCnt = int.TryParse(parent.Attribute("colCnt")?.Value, out var cc) ? cc : 1;
+        var cols = int.TryParse(props?.GetValueOrDefault("cols"), out var c) && c > 0 ? c : colCnt;
+        var existingRows = parent.Elements(HwpxNs.Hp + "tr").Count();
+
+        // Get column widths from existing colSz elements
+        var colSizes = parent.Elements(HwpxNs.Hp + "colSz")
+            .Select(e => int.TryParse(e.Attribute("width")?.Value, out var w) ? w : 42520 / cols)
+            .ToList();
+
+        var tr = new XElement(HwpxNs.Hp + "tr");
+
+        for (int col = 0; col < cols; col++)
+        {
+            var cellText = props?.GetValueOrDefault($"c{col + 1}") ?? "";
+            var cellWidth = col < colSizes.Count ? colSizes[col] : 42520 / cols;
+
+            tr.Add(BuildCell(existingRows, col, 1, 1, cellWidth, 1000, cellText, "1"));
+        }
+
+        // Update table rowCnt
+        parent.SetAttributeValue("rowCnt", (existingRows + 1).ToString());
+
+        return tr;
+    }
+
+    /// <summary>
+    /// Create a new table cell. Parent MUST be a <hp:tr>.
+    /// Props: "text" → cell text, "width" → cell width.
+    /// </summary>
+    private XElement CreateCell(XElement parent, Dictionary<string, string>? props)
+    {
+        if (parent.Name.LocalName != "tr")
+            throw new CliException("Cells can only be added to a table row element");
+
+        var existingCells = parent.Elements(HwpxNs.Hp + "tc").Count();
+        var text = props?.GetValueOrDefault("text") ?? "";
+        var cellWidth = int.TryParse(props?.GetValueOrDefault("width"), out var w) && w > 0 ? w : 10000;
+
+        // Determine row address from parent's position in the table
+        var tbl = parent.Parent;
+        var rowAddr = tbl?.Elements(HwpxNs.Hp + "tr").ToList().IndexOf(parent) ?? 0;
+
+        return BuildCell(rowAddr, existingCells, 1, 1, cellWidth, 1000, text, "1");
+    }
+
+    // ==================== Remove ====================
+
+    /// <summary>
+    /// Swap two elements in the document (Plan 96).
+    /// </summary>
+    public (string NewPath1, string NewPath2) Swap(string pathA, string pathB)
+    {
+        var elemA = ResolvePath(pathA);
+        var elemB = ResolvePath(pathB);
+        var placeholder = new XElement(HwpxNs.Hp + "_swap");
+        elemA.ReplaceWith(placeholder);
+        elemB.ReplaceWith(elemA);
+        placeholder.ReplaceWith(elemB);
+        var secA = elemA.AncestorsAndSelf().FirstOrDefault(e => e.Name.LocalName == "sec");
+        if (secA != null) SaveSection(secA);
+        var secB = elemB.AncestorsAndSelf().FirstOrDefault(e => e.Name.LocalName == "sec");
+        if (secB != null && secB != secA) SaveSection(secB);
+        _dirty = true;
+        return (BuildPath(elemA), BuildPath(elemB));
+    }
+
+    /// <summary>
+    /// Remove the element at the given path with type-aware cascading cleanup.
+    /// Special paths: /watermark, /pagebackground, /toc.
+    /// Returns null on success. Throws CliException if not found.
+    /// </summary>
+    public string? Remove(string path)
+    {
+        // Special path handlers (not element-based)
+        if (path.Equals("/watermark", StringComparison.OrdinalIgnoreCase)
+            || path.Equals("/pagebackground", StringComparison.OrdinalIgnoreCase))
+            return RemovePageBackground();
+        if (path.Equals("/toc", StringComparison.OrdinalIgnoreCase))
+            return RemoveToc();
+        // Header/Footer removal: /header[N] or /footer[N]
+        var hfMatch = System.Text.RegularExpressions.Regex.Match(
+            path, @"^/(header|footer)\[(\d+)\]$", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
+        if (hfMatch.Success)
+            return RemoveHeaderFooter(hfMatch.Groups[1].Value.Equals("header", StringComparison.OrdinalIgnoreCase), int.Parse(hfMatch.Groups[2].Value));
+        // Section removal: /section[N] with no further path segments
+        if (System.Text.RegularExpressions.Regex.IsMatch(path, @"^/section\[\d+\]$", System.Text.RegularExpressions.RegexOptions.IgnoreCase))
+            return RemoveSection(path);
+
+        var element = ResolvePath(path);
+        var parent = element.Parent
+            ?? throw new CliException($"Cannot remove root element at: {path}");
+
+        // Type-aware cascade
+        var localName = element.Name.LocalName;
+        switch (localName)
+        {
+            case "tbl":
+                // Remove wrapper paragraph if tbl is the only content
+                if (IsWrapperParagraph(parent))
+                {
+                    var wrapperParent = parent.Parent!;
+                    parent.Remove();
+                    _dirty = true;
+                    SaveSection(wrapperParent);
+                    return null;
+                }
+                break;
+
+            case "pic" or "img":
+                CleanupBinData(element);
+                if (IsWrapperParagraph(parent))
+                {
+                    var wrapperParent = parent.Parent!;
+                    parent.Remove();
+                    _dirty = true;
+                    SaveSection(wrapperParent);
+                    return null;
+                }
+                break;
+
+            case "equation":
+                // equation is wrapped in p > run > equation
+                var eqRun = parent;      // run
+                var eqP = eqRun?.Parent; // p
+                if (eqP != null && IsWrapperParagraph(eqP))
+                {
+                    var eqParent = eqP.Parent!;
+                    eqP.Remove();
+                    _dirty = true;
+                    SaveSection(eqParent);
+                    return null;
+                }
+                break;
+
+            case "memo":
+                CleanupMemoMarkers(element);
+                break;
+        }
+
+        element.Remove();
+        _dirty = true;
+        SaveSection(parent);
+        return null;
+    }
+
+    // ==================== Remove Helpers ====================
+
+    /// <summary>
+    /// Check if a paragraph is just a wrapper for a single table/picture/equation.
+    /// These wrappers should be removed together with their content.
+    /// </summary>
+    private static bool IsWrapperParagraph(XElement p)
+    {
+        if (p.Name != HwpxNs.Hp + "p") return false;
+        var runs = p.Elements(HwpxNs.Hp + "run").ToList();
+        if (runs.Count != 1) return false;
+        var run = runs[0];
+        var nonTextChildren = run.Elements()
+            .Where(e => e.Name != HwpxNs.Hp + "t" || !string.IsNullOrWhiteSpace(e.Value))
+            .ToList();
+        return nonTextChildren.Count == 1 && (
+            nonTextChildren[0].Name == HwpxNs.Hp + "tbl" ||
+            nonTextChildren[0].Name == HwpxNs.Hp + "pic" ||
+            nonTextChildren[0].Name == HwpxNs.Hp + "equation");
+    }
+
+    /// <summary>
+    /// Remove header or footer ctrl element from section XML by 1-based index.
+    /// </summary>
+    private string? RemoveHeaderFooter(bool isHeader, int index)
+    {
+        var tagName = isHeader ? "header" : "footer";
+        int found = 0;
+        foreach (var section in _doc.Sections)
+        {
+            foreach (var hf in section.Root.Descendants(HwpxNs.Hp + tagName).ToList())
+            {
+                found++;
+                if (found == index)
+                {
+                    var ctrl = hf.Parent;
+                    if (ctrl?.Name == HwpxNs.Hp + "ctrl")
+                        ctrl.Remove();
+                    else
+                        hf.Remove();
+                    _dirty = true;
+                    SaveSection(section.Root);
+                    return null;
+                }
+            }
+        }
+        throw new CliException($"Cannot find {tagName}[{index}] in HWPX document");
+    }
+
+    /// <summary>
+    /// Clean up BinData ZIP entry when an image is removed.
+    /// </summary>
+    private void CleanupBinData(XElement picElement)
+    {
+        var imgEl = picElement.Descendants()
+            .FirstOrDefault(e => e.Name.LocalName == "img" || e.Name.LocalName == "binItem");
+        var binRef = imgEl?.Attribute("binaryItemIDRef")?.Value
+            ?? imgEl?.Attribute("src")?.Value;
+        if (binRef == null) return;
+
+        // Delete BinData entry — try Contents/ prefixed paths first, then root BinData/ (legacy)
+        var entryPath = binRef.StartsWith("Contents/")
+            ? binRef
+            : binRef.StartsWith("BinData/")
+                ? $"Contents/{binRef}"
+                : $"Contents/BinData/{binRef}";
+        var entry = _doc.Archive.GetEntry(entryPath)
+            ?? _doc.Archive.GetEntry($"BinData/{binRef}");
+        entry?.Delete();
+    }
+
+    /// <summary>
+    /// Remove inline memo markers (memoBegin/memoEnd) from body text.
+    /// </summary>
+    private void CleanupMemoMarkers(XElement memoElement)
+    {
+        var memoId = memoElement.Attribute("id")?.Value;
+        if (memoId == null) return;
+
+        foreach (var sec in _doc.Sections)
+        {
+            foreach (var ctrl in sec.Root.Descendants(HwpxNs.Hp + "ctrl").ToList())
+            {
+                var fieldBegin = ctrl.Element(HwpxNs.Hp + "fieldBegin");
+                if (fieldBegin?.Attribute("type")?.Value == "MEMO"
+                    && fieldBegin.Attribute("instId")?.Value == memoId)
+                {
+                    ctrl.Remove();
+                }
+                var fieldEnd = ctrl.Element(HwpxNs.Hp + "fieldEnd");
+                if (fieldEnd?.Attribute("type")?.Value == "MEMO")
+                {
+                    ctrl.Remove();
+                }
+            }
+        }
+    }
+
+    /// <summary>Remove page background (pageBorderFill) from all sections.</summary>
+    private string? RemovePageBackground()
+    {
+        foreach (var sec in _doc.Sections)
+        {
+            var pageBf = sec.Root.Descendants(HwpxNs.Hp + "pageBorderFill")
+                .Where(e => e.Attribute("type")?.Value == "BOTH")
+                .ToList();
+            foreach (var bf in pageBf) bf.Remove();
+            _dirty = true;
+            SaveSection(sec.Root);
+        }
+        return null;
+    }
+
+    /// <summary>Add image watermark via pageBorderFill + imgBrush (Plan 98).</summary>
+    private XElement CreateWatermark(Dictionary<string, string>? props)
+    {
+        var path = props?.GetValueOrDefault("path") ?? props?.GetValueOrDefault("src")
+            ?? throw new CliException("watermark requires 'path' or 'src' property");
+        if (!File.Exists(path))
+            throw new CliException($"Watermark image not found: {path}");
+
+        var bright = props?.GetValueOrDefault("bright") ?? "70";
+        var contrast = props?.GetValueOrDefault("contrast") ?? "-50";
+
+        // 1. Add image to BinData
+        var imageBytes = File.ReadAllBytes(path);
+        var ext = Path.GetExtension(path).TrimStart('.').ToLowerInvariant();
+        if (ext == "jpg") ext = "jpeg";
+        var mediaType = ext switch { "png" => "image/png", "jpeg" => "image/jpeg", _ => $"image/{ext}" };
+        var imageId = GetNextImageId();
+        var binFileName = $"image{imageId}.{ext}";
+
+        // BinData lives at ZIP root (BinData/), NOT under Contents/.
+        // Hancom resolves manifest href relative to ZIP root, not content.hpf location.
+        var binEntry = _doc.Archive.CreateEntry($"BinData/{binFileName}",
+            System.IO.Compression.CompressionLevel.Optimal);
+        using (var binStream = binEntry.Open())
+            binStream.Write(imageBytes, 0, imageBytes.Length);
+        RegisterImageInManifest($"image{imageId}", $"BinData/{binFileName}", mediaType);
+
+        // 2. Create borderFill with imgBrush (golden template pattern)
+        var bfId = NextBorderFillId();
+        var bf = new XElement(HwpxNs.Hh + "borderFill",
+            new XAttribute("id", bfId),
+            new XAttribute("threeD", "0"),
+            new XAttribute("shadow", "0"),
+            new XAttribute("centerLine", "NONE"),
+            new XAttribute("breakCellSeparateLine", "0"),
+            new XElement(HwpxNs.Hh + "slash",
+                new XAttribute("type", "NONE"), new XAttribute("Crooked", "0"), new XAttribute("isCounter", "0")),
+            new XElement(HwpxNs.Hh + "backSlash",
+                new XAttribute("type", "NONE"), new XAttribute("Crooked", "0"), new XAttribute("isCounter", "0")),
+            MakeBorder("leftBorder", "NONE", "0.1 mm", "#000000"),
+            MakeBorder("rightBorder", "NONE", "0.1 mm", "#000000"),
+            MakeBorder("topBorder", "NONE", "0.1 mm", "#000000"),
+            MakeBorder("bottomBorder", "NONE", "0.1 mm", "#000000"),
+            MakeBorder("diagonal", "SOLID", "0.1 mm", "#000000"),
+            new XElement(HwpxNs.Hc + "fillBrush",
+                new XElement(HwpxNs.Hc + "imgBrush",
+                    new XAttribute("mode", "TOTAL"),
+                    new XElement(HwpxNs.Hc + "img",
+                        new XAttribute("binaryItemIDRef", $"image{imageId}"),
+                        new XAttribute("bright", bright),
+                        new XAttribute("contrast", contrast),
+                        new XAttribute("effect", "REAL_PIC"),
+                        new XAttribute("alpha", "0")))));
+
+        var container = _doc.Header!.Root!.Descendants(HwpxNs.Hh + "borderFills").FirstOrDefault();
+        container?.Add(bf);
+        if (container != null)
+            container.SetAttributeValue("itemCnt", container.Elements(HwpxNs.Hh + "borderFill").Count().ToString());
+        SaveHeader();
+
+        // 3. Set pageBorderFill on all sections
+        foreach (var sec in _doc.Sections)
+        {
+            var secPr = sec.Root.Descendants(HwpxNs.Hp + "secPr").FirstOrDefault();
+            if (secPr == null) continue;
+
+            // Update BOTH type pageBorderFill
+            var pageBf = secPr.Elements(HwpxNs.Hp + "pageBorderFill")
+                .FirstOrDefault(e => e.Attribute("type")?.Value == "BOTH");
+            if (pageBf != null)
+                pageBf.SetAttributeValue("borderFillIDRef", bfId);
+            else
+                secPr.Add(new XElement(HwpxNs.Hp + "pageBorderFill",
+                    new XAttribute("type", "BOTH"),
+                    new XAttribute("borderFillIDRef", bfId),
+                    new XAttribute("textBorder", "PAPER"),
+                    new XAttribute("headerInside", "0"),
+                    new XAttribute("footerInside", "0"),
+                    new XAttribute("fillArea", "PAPER"),
+                    new XElement(HwpxNs.Hp + "offset",
+                        new XAttribute("left", "1417"),
+                        new XAttribute("right", "1417"),
+                        new XAttribute("top", "1417"),
+                        new XAttribute("bottom", "1417"))));
+
+            SaveSection(sec.Root);
+        }
+
+        _dirty = true;
+        // Return empty paragraph as placeholder (watermark is page-level, not inline)
+        return new XElement(HwpxNs.Hp + "p",
+            new XAttribute("id", "0"),
+            new XAttribute("paraPrIDRef", "0"),
+            new XAttribute("styleIDRef", "0"));
+    }
+
+    /// <summary>Remove TOC paragraphs (static V1 or field-based V2).</summary>
+    private string? RemoveToc()
+    {
+        foreach (var sec in _doc.Sections)
+        {
+            // V2: field-based TOC (fieldBegin type="TABLEOFCONTENTS" ... fieldEnd)
+            var fieldBegins = sec.Root.Descendants(HwpxNs.Hp + "fieldBegin")
+                .Where(fb => fb.Attribute("type")?.Value == "TABLEOFCONTENTS")
+                .ToList();
+            if (fieldBegins.Count > 0)
+            {
+                foreach (var fb in fieldBegins)
+                {
+                    var instId = fb.Attribute("instId")?.Value;
+                    // Remove all paragraphs between fieldBegin and matching fieldEnd
+                    var beginCtrl = fb.Parent;         // ctrl
+                    var beginPara = beginCtrl?.Parent; // p
+                    if (beginPara == null) continue;
+
+                    var parasToRemove = new List<XElement> { beginPara };
+                    var sibling = beginPara.ElementsAfterSelf(HwpxNs.Hp + "p").GetEnumerator();
+                    while (sibling.MoveNext())
+                    {
+                        parasToRemove.Add(sibling.Current);
+                        var endField = sibling.Current.Descendants(HwpxNs.Hp + "fieldEnd")
+                            .FirstOrDefault(fe => fe.Attribute("type")?.Value == "TABLEOFCONTENTS");
+                        if (endField != null) break;
+                    }
+                    foreach (var p in parasToRemove) p.Remove();
+                }
+                _dirty = true;
+                SaveSection(sec.Root);
+                continue;
+            }
+
+            // V1: static TOC (title "목차"/"차례" + entries until blank line)
+            var tocStart = -1;
+            var tocEnd = -1;
+            var paras = sec.Root.Elements(HwpxNs.Hp + "p").ToList();
+            for (int i = 0; i < paras.Count; i++)
+            {
+                var text = ExtractParagraphText(paras[i]).Trim();
+                if (text == "목차" || text == "목 차" || text == "차례"
+                    || text.StartsWith("[목차]") || text.StartsWith("[차례]"))
+                {
+                    tocStart = i;
+                    // Consume entries until blank line (CreateStaticToc adds trailing blank)
+                    for (int j = i + 1; j < paras.Count; j++)
+                    {
+                        var t = ExtractParagraphText(paras[j]).Trim();
+                        tocEnd = j;
+                        if (string.IsNullOrWhiteSpace(t)) break;
+                    }
+                    break;
+                }
+            }
+            if (tocStart >= 0 && tocEnd >= tocStart)
+            {
+                for (int i = tocEnd; i >= tocStart; i--)
+                    paras[i].Remove();
+                _dirty = true;
+                SaveSection(sec.Root);
+            }
+        }
+        return null;
+    }
+
+    // ==================== Move ====================
+
+    /// <summary>
+    /// Move an element from sourcePath to a new position under targetParentPath.
+    ///
+    /// CORRECT detach-then-insert pattern:
+    ///   1. Resolve targetParentPath FIRST (validate before modifying the tree).
+    ///   2. Resolve sourcePath.
+    ///   3. Detach: call source.Remove() BEFORE re-inserting.
+    ///      Bad pattern: target.Add(source) when source is still parented —
+    ///      XLinq silently moves it, but only within the same XDocument.
+    ///      Cross-section moves fail silently without detach.
+    ///   4. Insert at the specified index under the target parent.
+    /// </summary>
+    /// <returns>New path of the moved element.</returns>
+    public string Move(string sourcePath, string? targetParentPath, InsertPosition? position)
+    {
+        var index = position?.Index;
+        if (string.IsNullOrEmpty(targetParentPath))
+            throw new CliException("Target parent path is required for move");
+
+        // 1. Resolve target FIRST (before tree modification)
+        var target = ResolvePath(targetParentPath);
+
+        // 2. Resolve source
+        var source = ResolvePath(sourcePath);
+        var sourceParent = source.Parent;
+
+        // 3. Detach source — NEVER re-parent directly
+        source.Remove();
+
+        // 4. Insert at position
+        if (index.HasValue)
+        {
+            var siblings = target.Elements(source.Name).ToList();
+            var insertIdx = index.Value - 1;
+
+            if (insertIdx <= 0 || siblings.Count == 0)
+            {
+                var firstOfType = target.Elements(source.Name).FirstOrDefault();
+                if (firstOfType != null)
+                    firstOfType.AddBeforeSelf(source);
+                else
+                    target.Add(source);
+            }
+            else if (insertIdx >= siblings.Count)
+            {
+                siblings.Last().AddAfterSelf(source);
+            }
+            else
+            {
+                siblings[insertIdx].AddBeforeSelf(source);
+            }
+        }
+        else
+        {
+            target.Add(source);
+        }
+
+        _dirty = true;
+
+        // Save both affected sections
+        if (sourceParent != null)
+            SaveSection(sourceParent);
+        SaveSection(target);
+
+        return BuildPath(source);
+    }
+
+    // ==================== CopyFrom ====================
+
+    /// <summary>
+    /// Deep-clone the element at sourcePath and insert the copy under targetParentPath.
+    /// Assigns a new id attribute to the clone to avoid duplicate IDs.
+    /// </summary>
+    /// <returns>Path of the newly created copy.</returns>
+ public string CopyFrom(string sourcePath, string targetParentPath, InsertPosition? position) + { + var index = position?.Index; + var source = ResolvePath(sourcePath); + var target = ResolvePath(targetParentPath); + + // Deep clone + var clone = new XElement(source); + + // Assign new IDs to clone and all descendants with id attributes + AssignNewIds(clone); + + // Insert at position + if (index.HasValue) + { + var siblings = target.Elements(clone.Name).ToList(); + var insertIdx = index.Value - 1; + + if (insertIdx <= 0 || siblings.Count == 0) + { + var firstOfType = target.Elements(clone.Name).FirstOrDefault(); + if (firstOfType != null) + firstOfType.AddBeforeSelf(clone); + else + target.Add(clone); + } + else if (insertIdx >= siblings.Count) + { + siblings.Last().AddAfterSelf(clone); + } + else + { + siblings[insertIdx].AddBeforeSelf(clone); + } + } + else + { + target.Add(clone); + } + + _dirty = true; + SaveSection(target); + return BuildPath(clone); + } + + /// + /// Recursively assign new IDs to an element and all descendants + /// that have an "id" attribute. Prevents duplicate IDs in the document. + /// + private void AssignNewIds(XElement element) + { + if (element.Attribute("id") != null) + { + element.SetAttributeValue("id", NewId()); + } + + foreach (var child in element.Elements()) + { + AssignNewIds(child); + } + } + + // ==================== Helpers ==================== + + /// + /// Ensure a borderFill with SOLID black borders exists in header.xml. + /// Returns the borderFill ID. If one already exists, returns its ID. + /// If not, creates a new one and returns the new ID. 
+ /// + private string EnsureTableBorderFill() + { + var header = _doc.Header?.Root; + if (header == null) return "1"; // fallback + + var refList = header.Element(HwpxNs.Hh + "refList"); + if (refList == null) return "1"; + + var borderFills = refList.Element(HwpxNs.Hh + "borderFills"); + if (borderFills == null) return "1"; + + // Check if any existing borderFill has SOLID borders on all 4 sides + foreach (var bf in borderFills.Elements(HwpxNs.Hh + "borderFill")) + { + var left = bf.Element(HwpxNs.Hh + "leftBorder"); + var right = bf.Element(HwpxNs.Hh + "rightBorder"); + var top = bf.Element(HwpxNs.Hh + "topBorder"); + var bottom = bf.Element(HwpxNs.Hh + "bottomBorder"); + if (left?.Attribute("type")?.Value == "SOLID" + && right?.Attribute("type")?.Value == "SOLID" + && top?.Attribute("type")?.Value == "SOLID" + && bottom?.Attribute("type")?.Value == "SOLID") + { + return bf.Attribute("id")?.Value ?? "1"; + } + } + + // None found — create a new one with SOLID black borders + var newId = NextBorderFillId(); + + var newBorderFill = new XElement(HwpxNs.Hh + "borderFill", + new XAttribute("id", newId), + new XAttribute("threeD", "0"), + new XAttribute("shadow", "0"), + new XAttribute("centerLine", "NONE"), + new XAttribute("breakCellSeparateLine", "0"), + new XElement(HwpxNs.Hh + "slash", new XAttribute("type", "NONE"), new XAttribute("Crooked", "0"), new XAttribute("isCounter", "0")), + new XElement(HwpxNs.Hh + "backSlash", new XAttribute("type", "NONE"), new XAttribute("Crooked", "0"), new XAttribute("isCounter", "0")), + MakeBorder("leftBorder", "SOLID", "0.12 mm", "#000000"), + MakeBorder("rightBorder", "SOLID", "0.12 mm", "#000000"), + MakeBorder("topBorder", "SOLID", "0.12 mm", "#000000"), + MakeBorder("bottomBorder", "SOLID", "0.12 mm", "#000000"), + MakeBorder("diagonal", "SOLID", "0.12 mm", "#000000") + ); + + borderFills.Add(newBorderFill); + var existingCount = borderFills.Elements(HwpxNs.Hh + "borderFill").Count(); + 
borderFills.SetAttributeValue("itemCnt", existingCount.ToString()); + + // Save the modified header + SaveHeader(); + + return newId; + } + + /// + /// Check if parent element is a section or section-like container. + /// Tables added to these containers must be wrapped in p>run. + /// + private static bool IsSectionLike(XElement parent) + { + var localName = parent.Name.LocalName; + return localName is "sec" or "section" or "body"; + } + + /// + /// Wrap a <hp:tbl> in <hp:p><hp:run>...</hp:run></hp:p> + /// so Hancom renders it. Without this wrapper, tables are invisible. + /// + private XElement WrapTableInParagraph(XElement tbl) + { + return new XElement(HwpxNs.Hp + "p", + new XAttribute("id", NewId()), + new XAttribute("styleIDRef", "0"), + new XAttribute("paraPrIDRef", "0"), + new XAttribute("pageBreak", "0"), + new XAttribute("columnBreak", "0"), + new XAttribute("merged", "0"), + new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", "0"), + tbl) + ); + } + + /// + /// Build a Hancom-compatible element with correct child ordering: + /// subList → cellAddr → cellSpan → cellSz → cellMargin. + /// This matches the structure produced by Hancom Office (2011 namespace). + /// + private XElement BuildCell(int rowAddr, int colAddr, int rowSpan, int colSpan, + int width, int height, string text, + string borderFillIDRef, bool isHeader = false, + string vertAlign = "CENTER") + { + return new XElement(HwpxNs.Hp + "tc", + new XAttribute("name", ""), + new XAttribute("header", isHeader ? 
"1" : "0"), + new XAttribute("hasMargin", "0"), + new XAttribute("protect", "0"), + new XAttribute("editable", "0"), + new XAttribute("dirty", "0"), + new XAttribute("borderFillIDRef", borderFillIDRef), + CreateSubList(text, vertAlign), + new XElement(HwpxNs.Hp + "cellAddr", + new XAttribute("colAddr", colAddr.ToString()), + new XAttribute("rowAddr", rowAddr.ToString())), + new XElement(HwpxNs.Hp + "cellSpan", + new XAttribute("colSpan", colSpan.ToString()), + new XAttribute("rowSpan", rowSpan.ToString())), + new XElement(HwpxNs.Hp + "cellSz", + new XAttribute("width", width.ToString()), + new XAttribute("height", height.ToString())), + new XElement(HwpxNs.Hp + "cellMargin", + new XAttribute("left", "510"), + new XAttribute("right", "510"), + new XAttribute("top", "141"), + new XAttribute("bottom", "141")) + ); + } + + // ==================== Picture ==================== + + /// + /// Create a picture element. The image file is registered in the ZIP (BinData/) and content.hpf manifest. + /// Golden template based on real Hancom documents: uses hc:img (NOT hp:img), hc:pt0-pt3 for imgRect. + /// Props: path (required), width (e.g. "2in"), height (e.g. "1in"), alt. + /// + private readonly record struct PictureReferenceBox(int Width, int Height); + + private XElement CreatePicture(XElement parent, Dictionary<string, string>? props) + { + var path = props?.GetValueOrDefault("path") + ?? props?.GetValueOrDefault("src") + ?? throw new CliException("picture requires 'path' property"); + if (!File.Exists(path)) + throw new CliException($"Image file not found: {path}"); + + var widthHwp = ParseDimensionToHwpUnit(props?.GetValueOrDefault("width") ?? "2in"); + var heightHwp = ParseDimensionToHwpUnit(props?.GetValueOrDefault("height") ??
"1in"); + var wrapMode = ResolvePictureWrap(props); + var anchorMode = ResolvePictureAnchor(props, wrapMode); + var (horizontalAlign, verticalAlign) = ParsePictureAlignment(props); + var sectionRoot = ResolvePictureSectionRoot(parent); + var anchorParagraph = ResolvePictureAnchorParagraph(parent); + var referenceBox = ResolvePictureReferenceBox(anchorMode, sectionRoot, anchorParagraph); + var (offsetX, offsetY) = ResolvePictureOffsets(props, anchorMode, widthHwp, heightHwp, + referenceBox, horizontalAlign, verticalAlign); + var treatAsChar = wrapMode == "char"; + var textWrap = MapPictureWrap(wrapMode); + var lockValue = ParseBoolProp(props, "lock") ? "1" : "0"; + var zOrder = ResolvePictureZOrder(props, wrapMode).ToString(); + var relativeTarget = anchorMode == "page" ? "PAPER" : "PARA"; + + // 1. Read image bytes and determine format + var imageBytes = File.ReadAllBytes(path); + var ext = Path.GetExtension(path).TrimStart('.').ToLowerInvariant(); + if (ext == "jpg") ext = "jpeg"; + var mediaType = ext switch + { + "png" => "image/png", + "jpeg" => "image/jpeg", + "gif" => "image/gif", + "bmp" => "image/bmp", + "tiff" or "tif" => "image/tiff", + _ => $"image/{ext}" + }; + + // 2. Find next available image ID in content.hpf + var imageId = GetNextImageId(); + var binFileName = $"image{imageId}.{ext}"; + + // 3. Add image to ZIP — BinData/ at archive root (not Contents/BinData/) + var binEntry = _doc.Archive.CreateEntry($"BinData/{binFileName}", System.IO.Compression.CompressionLevel.Optimal); + using (var binStream = binEntry.Open()) + binStream.Write(imageBytes, 0, imageBytes.Length); + + // 4. Register in content.hpf manifest + RegisterImageInManifest($"image{imageId}", $"BinData/{binFileName}", mediaType); + + // 5. 
Create element (golden template structure from real Hancom docs) + var id = NewId(); + var instId = NewId(); + return new XElement(HwpxNs.Hp + "pic", + new XAttribute("id", id), + new XAttribute("zOrder", zOrder), + new XAttribute("numberingType", "PICTURE"), + new XAttribute("textWrap", textWrap), + new XAttribute("textFlow", "BOTH_SIDES"), + new XAttribute("lock", lockValue), + new XAttribute("dropcapstyle", "None"), + new XAttribute("href", ""), + new XAttribute("groupLevel", "0"), + new XAttribute("instid", instId), + new XAttribute("reverse", "0"), + new XElement(HwpxNs.Hp + "offset", new XAttribute("x", "0"), new XAttribute("y", "0")), + new XElement(HwpxNs.Hp + "orgSz", + new XAttribute("width", widthHwp), new XAttribute("height", heightHwp)), + new XElement(HwpxNs.Hp + "curSz", + new XAttribute("width", widthHwp), new XAttribute("height", heightHwp)), + new XElement(HwpxNs.Hp + "flip", new XAttribute("horizontal", "0"), new XAttribute("vertical", "0")), + new XElement(HwpxNs.Hp + "rotationInfo", + new XAttribute("angle", "0"), + new XAttribute("centerX", (widthHwp / 2).ToString()), + new XAttribute("centerY", (heightHwp / 2).ToString()), + new XAttribute("rotateimage", "1")), + new XElement(HwpxNs.Hp + "renderingInfo", + new XElement(HwpxNs.Hc + "transMatrix", + new XAttribute("e1", "1"), new XAttribute("e2", "0"), new XAttribute("e3", "0"), + new XAttribute("e4", "0"), new XAttribute("e5", "1"), new XAttribute("e6", "0")), + new XElement(HwpxNs.Hc + "scaMatrix", + new XAttribute("e1", "1"), new XAttribute("e2", "0"), new XAttribute("e3", "0"), + new XAttribute("e4", "0"), new XAttribute("e5", "1"), new XAttribute("e6", "0")), + new XElement(HwpxNs.Hc + "rotMatrix", + new XAttribute("e1", "1"), new XAttribute("e2", "0"), new XAttribute("e3", "0"), + new XAttribute("e4", "0"), new XAttribute("e5", "1"), new XAttribute("e6", "0"))), + // CRITICAL: hc:img, NOT hp:img (core namespace) + new XElement(HwpxNs.Hc + "img", + new XAttribute("binaryItemIDRef", 
$"image{imageId}"), + new XAttribute("bright", "0"), + new XAttribute("contrast", "0"), + new XAttribute("effect", "REAL_PIC"), + new XAttribute("alpha", "0")), + new XElement(HwpxNs.Hp + "imgRect", + new XElement(HwpxNs.Hc + "pt0", new XAttribute("x", "0"), new XAttribute("y", "0")), + new XElement(HwpxNs.Hc + "pt1", new XAttribute("x", widthHwp), new XAttribute("y", "0")), + new XElement(HwpxNs.Hc + "pt2", new XAttribute("x", widthHwp), new XAttribute("y", heightHwp)), + new XElement(HwpxNs.Hc + "pt3", new XAttribute("x", "0"), new XAttribute("y", heightHwp))), + new XElement(HwpxNs.Hp + "imgClip", + new XAttribute("left", "0"), new XAttribute("right", widthHwp), + new XAttribute("top", "0"), new XAttribute("bottom", heightHwp)), + new XElement(HwpxNs.Hp + "inMargin", + new XAttribute("left", "0"), new XAttribute("right", "0"), + new XAttribute("top", "0"), new XAttribute("bottom", "0")), + new XElement(HwpxNs.Hp + "imgDim", + new XAttribute("dimwidth", widthHwp), new XAttribute("dimheight", heightHwp)), + new XElement(HwpxNs.Hp + "effects"), + new XElement(HwpxNs.Hp + "sz", + new XAttribute("width", widthHwp), new XAttribute("widthRelTo", "ABSOLUTE"), + new XAttribute("height", heightHwp), new XAttribute("heightRelTo", "ABSOLUTE"), + new XAttribute("protect", "0")), + new XElement(HwpxNs.Hp + "pos", + new XAttribute("treatAsChar", treatAsChar ? 
"1" : "0"), new XAttribute("affectLSpacing", "0"), + new XAttribute("flowWithText", "1"), new XAttribute("allowOverlap", "0"), + new XAttribute("holdAnchorAndSO", "0"), + new XAttribute("vertRelTo", relativeTarget), new XAttribute("horzRelTo", relativeTarget), + new XAttribute("vertAlign", "TOP"), new XAttribute("horzAlign", "LEFT"), + new XAttribute("vertOffset", offsetY), new XAttribute("horzOffset", offsetX)), + new XElement(HwpxNs.Hp + "outMargin", + new XAttribute("left", "0"), new XAttribute("right", "0"), + new XAttribute("top", "0"), new XAttribute("bottom", "0")) + ); + } + + private static string ResolvePictureWrap(Dictionary<string, string>? props) + { + var rawWrap = props?.GetValueOrDefault("wrap") + ?? props?.GetValueOrDefault("textwrap") + ?? ""; + var normalizedWrap = rawWrap.Trim().ToLowerInvariant() switch + { + "char" or "inline" => "char", + "square" => "square", + "front" or "infront" or "in_front" => "front", + "behind" or "back" => "behind", + "topbottom" or "top_bottom" or "top-and-bottom" or "tight" or "wrap" => "topbottom", + _ => "char" + }; + if (normalizedWrap != "char") + return normalizedWrap; + + var anchor = props?.GetValueOrDefault("anchor")?.Trim().ToLowerInvariant(); + var hasPositioningProps = props != null && (props.ContainsKey("x") || props.ContainsKey("y") + || props.ContainsKey("halign") || props.ContainsKey("valign")); + return (anchor is "page" or "para") || hasPositioningProps ? "topbottom" : "char"; + } + + private static string ResolvePictureAnchor(Dictionary<string, string>? props, string wrapMode) + { + var anchor = props?.GetValueOrDefault("anchor")?.Trim().ToLowerInvariant(); + if (anchor is "page" or "paper") + return "page"; + if (anchor == "para") + return "para"; + // No explicit anchor: every wrap mode defaults to paragraph anchoring + return "para"; + } + + private static (string Horizontal, string Vertical) ParsePictureAlignment(Dictionary<string, string>?
props) + { + var horizontal = props?.GetValueOrDefault("halign")?.Trim().ToLowerInvariant() switch + { + "center" or "middle" => "center", + "right" => "right", + _ => "left" + }; + var vertical = props?.GetValueOrDefault("valign")?.Trim().ToLowerInvariant() switch + { + "middle" or "center" => "middle", + "bottom" => "bottom", + _ => "top" + }; + return (horizontal, vertical); + } + + private XElement ResolvePictureSectionRoot(XElement parent) + { + if (parent.Name == HwpxNs.Hs + "sec") + return parent; + var sectionRoot = parent.AncestorsAndSelf() + .FirstOrDefault(e => e.Name == HwpxNs.Hs + "sec"); + return sectionRoot ?? _doc.PrimarySection.Root; + } + + private static XElement? ResolvePictureAnchorParagraph(XElement parent) + { + if (parent.Name == HwpxNs.Hp + "p") + return parent; + if (parent.Name == HwpxNs.Hp + "run" && parent.Parent?.Name == HwpxNs.Hp + "p") + return parent.Parent; + return null; + } + + private static PictureReferenceBox ResolvePictureReferenceBox(string anchorMode, XElement sectionRoot, + XElement? anchorParagraph) + { + var secPr = sectionRoot.Descendants(HwpxNs.Hp + "secPr").FirstOrDefault(); + var pagePr = secPr?.Element(HwpxNs.Hp + "pagePr"); + var margin = pagePr?.Element(HwpxNs.Hp + "margin"); + var pageWidth = (int?)pagePr?.Attribute("width") ?? 59528; + var pageHeight = (int?)pagePr?.Attribute("height") ?? 84186; + var marginLeft = (int?)margin?.Attribute("left") ?? 8504; + var marginRight = (int?)margin?.Attribute("right") ?? 8504; + var marginTop = (int?)margin?.Attribute("top") ?? 5668; + var marginBottom = (int?)margin?.Attribute("bottom") ?? 
4252; + + if (anchorMode == "page") + return new PictureReferenceBox(pageWidth, pageHeight); + + var bodyWidth = Math.Max(pageWidth - marginLeft - marginRight, 0); + var bodyHeight = Math.Max(pageHeight - marginTop - marginBottom, 0); + // Paragraph-anchored pictures currently use the page body box as reference + return new PictureReferenceBox(bodyWidth, bodyHeight); + } + + private static (int X, int Y) ResolvePictureOffsets(Dictionary<string, string>? props, string anchorMode, + int widthHwp, int heightHwp, PictureReferenceBox referenceBox, string horizontalAlign, string verticalAlign) + { + var explicitX = ParsePictureOffset(props?.GetValueOrDefault("x")); + var explicitY = ParsePictureOffset(props?.GetValueOrDefault("y")); + + var baseX = horizontalAlign switch + { + "center" => (referenceBox.Width - widthHwp) / 2, + "right" => referenceBox.Width - widthHwp, + _ => 0 + }; + + var baseY = 0; + if (anchorMode == "page") + { + baseY = verticalAlign switch + { + "middle" => (referenceBox.Height - heightHwp) / 2, + "bottom" => referenceBox.Height - heightHwp, + _ => 0 + }; + } + + return (baseX + explicitX, baseY + explicitY); + } + + private static int ParsePictureOffset(string? value) + { + if (string.IsNullOrWhiteSpace(value)) + return 0; + return ParseDimensionToHwpUnit(value); + } + + private static string MapPictureWrap(string wrapMode) => wrapMode switch + { + "square" => "SQUARE", + "front" => "IN_FRONT_OF_TEXT", + "behind" => "BEHIND_TEXT", + _ => "TOP_AND_BOTTOM" + }; + + private static bool ParseBoolProp(Dictionary<string, string>? props, string key) + { + var value = props?.GetValueOrDefault(key); + return value != null && (value.Equals("true", StringComparison.OrdinalIgnoreCase) || value == "1"); + } + + private static int ResolvePictureZOrder(Dictionary<string, string>?
props, string wrapMode) + { + if (int.TryParse(props?.GetValueOrDefault("z"), out var zOrder)) + return zOrder; + return wrapMode switch + { + "front" => 1, + "behind" => 0, + _ => 0 + }; + } + + // ==================== Hyperlink ==================== + + private static readonly HashSet<string> SafeUrlSchemes = new(StringComparer.OrdinalIgnoreCase) + { + "http", "https", "mailto", "ftp", "ftps", "tel" + }; + + private static void ValidateUrlScheme(string url) + { + if (string.IsNullOrWhiteSpace(url)) + throw new ArgumentException("Hyperlink URL must not be empty."); + + // Reject obvious local/UNC paths + if (url.StartsWith('\\') || url.StartsWith("//") || + (url.Length >= 2 && url[1] == ':')) + throw new ArgumentException( + $"Local or UNC path not allowed as hyperlink target: '{url}'."); + + if (Uri.TryCreate(url, UriKind.Absolute, out var uri)) + { + if (!SafeUrlSchemes.Contains(uri.Scheme)) + throw new ArgumentException( + $"Unsafe URL scheme '{uri.Scheme}' in '{url}'. " + + $"Allowed schemes: {string.Join(", ", SafeUrlSchemes)}."); + } + else + { + // Non-parseable as absolute URI — reject unless it looks like + // a fragment (#anchor) or same-document reference + if (!url.StartsWith('#')) + throw new ArgumentException( + $"Hyperlink target must be an absolute URL with allowed scheme " + + $"({string.Join(", ", SafeUrlSchemes)}) or a fragment reference: '{url}'."); + } + } + + /// + /// Create a hyperlink using the OWPML fieldBegin/fieldEnd pattern (3-run structure). + /// Golden template based on OWPML schema + python-hwpx implementation. + /// Props: url/href (required), text (default=url). + /// + private XElement CreateHyperlink(Dictionary<string, string>? props) + { + var url = props?.GetValueOrDefault("url") ?? props?.GetValueOrDefault("href") + ?? throw new CliException("hyperlink requires 'url' property"); + + // Plan 99.9.E3: Safe URL scheme whitelist + ValidateUrlScheme(url); + + var text = props?.GetValueOrDefault("text") ??
url; + var fieldId = NewId(); + var fieldIdNum = NewId(); + + // Determine link category and command encoding (golden template 2026-04-11) + string category, command; + if (url.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase)) + { + category = "HWPHYPERLINK_TYPE_EMAIL"; + command = EscapeHyperlinkCommand(url) + ";2;0;0;"; + } + else if (url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) + || url.StartsWith("https://", StringComparison.OrdinalIgnoreCase)) + { + category = "HWPHYPERLINK_TYPE_URL"; + command = EscapeHyperlinkCommand(url) + ";1;0;0;"; + } + else + { + category = "HWPHYPERLINK_TYPE_EX"; + command = EscapeHyperlinkCommand(url) + ";3;0;0;"; + } + + // Ensure hyperlink charPr exists in header.xml (blue underline) + var linkCharPrId = EnsureHyperlinkCharPr(); + + // Build parameters element + var parameters = new XElement(HwpxNs.Hp + "parameters", + new XAttribute("cnt", "6"), + new XAttribute("name", ""), + new XElement(HwpxNs.Hp + "integerParam", new XAttribute("name", "Prop"), "0"), + new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Command"), command), + new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Path"), url), + new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Category"), category), + new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "TargetType"), "HWPHYPERLINK_TARGET_BOOKMARK"), + new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "DocOpenType"), "HWPHYPERLINK_JUMP_CURRENTTAB")); + + // Hyperlinks in HWPX use fieldBegin/fieldEnd (golden template confirmed). + // URL type uses double nesting; email/file use single nesting. 
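+ // Target structure of the paragraph built below (sketch for orientation only; attributes abridged): + // <hp:p> + //   <hp:run><hp:ctrl><hp:fieldBegin type="HYPERLINK"> ...parameters... </hp:fieldBegin></hp:ctrl></hp:run> + //   <hp:run charPrIDRef="(link charPr)"><hp:t>text</hp:t></hp:run> + //   <hp:run><hp:ctrl><hp:fieldEnd/></hp:ctrl><hp:t/></hp:run> + // </hp:p>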
+ var para = new XElement(HwpxNs.Hp + "p", + new XAttribute("id", NewId()), + new XAttribute("styleIDRef", "0"), + new XAttribute("paraPrIDRef", "0"), + new XAttribute("pageBreak", "0"), + new XAttribute("columnBreak", "0"), + new XAttribute("merged", "0"), + // Run 1: fieldBegin with parameters + new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", "0"), + new XElement(HwpxNs.Hp + "ctrl", + new XElement(HwpxNs.Hp + "fieldBegin", + new XAttribute("id", fieldId), + new XAttribute("type", "HYPERLINK"), + new XAttribute("name", ""), + new XAttribute("editable", "0"), + new XAttribute("dirty", "1"), + new XAttribute("zorder", "-1"), + new XAttribute("fieldid", fieldIdNum), + new XAttribute("metaTag", ""), + parameters))), + // Run 2: visible text with hyperlink charPr (blue underline) + new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", linkCharPrId), + new XElement(HwpxNs.Hp + "t", text)), + // Run 3: fieldEnd + new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", "0"), + new XElement(HwpxNs.Hp + "ctrl", + new XElement(HwpxNs.Hp + "fieldEnd", + new XAttribute("beginIDRef", fieldId), + new XAttribute("fieldid", fieldIdNum))), + new XElement(HwpxNs.Hp + "t")) + ); + return para; + } + + /// + /// Escape special characters in hyperlink Command parameter. + /// Colons and semicolons are escaped with backslash (golden template verified). + /// + private static string EscapeHyperlinkCommand(string url) + => url.Replace(":", "\\:").Replace(";", "\\;"); + + /// + /// Ensure a hyperlink charPr (blue text, bottom underline) exists in header.xml. + /// Returns the charPr id string. Creates one if not found. 
+ /// + private string EnsureHyperlinkCharPr() + { + // Look for existing hyperlink charPr (textColor=#0000FF with underline BOTTOM) + var charPrs = _doc.Header?.Root?.Descendants(HwpxNs.Hh + "charPr"); + if (charPrs != null) + { + foreach (var cp in charPrs) + { + if (cp.Attribute("textColor")?.Value == "#0000FF") + { + var underline = cp.Element(HwpxNs.Hh + "underline"); + if (underline?.Attribute("type")?.Value == "BOTTOM") + return cp.Attribute("id")?.Value ?? "0"; + } + } + } + + // Create new hyperlink charPr + var newId = NextCharPrId(); + + // Clone from charPr id=0 as base + var baseCharPr = FindCharPr("0"); + XElement newCharPr; + if (baseCharPr != null) + { + newCharPr = new XElement(baseCharPr); + newCharPr.SetAttributeValue("id", newId.ToString()); + } + else + { + newCharPr = new XElement(HwpxNs.Hh + "charPr", + new XAttribute("id", newId.ToString()), + new XAttribute("height", "1000"), + new XAttribute("shadeColor", "none"), + new XAttribute("useFontSpace", "0"), + new XAttribute("useKerning", "0"), + new XAttribute("symMark", "NONE"), + new XAttribute("borderFillIDRef", "2")); + } + + // Set blue text color + newCharPr.SetAttributeValue("textColor", "#0000FF"); + + // Set underline to BOTTOM SOLID blue + var underlineEl = newCharPr.Element(HwpxNs.Hh + "underline"); + if (underlineEl != null) + { + underlineEl.SetAttributeValue("type", "BOTTOM"); + underlineEl.SetAttributeValue("shape", "SOLID"); + underlineEl.SetAttributeValue("color", "#0000FF"); + } + else + { + newCharPr.Add(new XElement(HwpxNs.Hh + "underline", + new XAttribute("type", "BOTTOM"), + new XAttribute("shape", "SOLID"), + new XAttribute("color", "#0000FF"))); + } + + // Add to header.xml + // CRITICAL: Hancom uses POSITIONAL indexing (array index), not id-based lookup. + // Append at END of container so position matches the new ID. 
+ var lastCharPr = _doc.Header?.Root?.Descendants(HwpxNs.Hh + "charPr").LastOrDefault(); + if (lastCharPr != null) + { + var container = lastCharPr.Parent!; + container.Add(newCharPr); + var count = container.Elements(HwpxNs.Hh + "charPr").Count(); + container.SetAttributeValue("itemCnt", count.ToString()); + } + + SaveHeader(); + return newId.ToString(); + } + + // ==================== Page Break ==================== + + /// + /// Create a page break paragraph. In HWPX, page break is simply a paragraph + /// with pageBreak="1" attribute. + /// + private XElement CreatePageBreak() + { + return new XElement(HwpxNs.Hp + "p", + new XAttribute("id", NewId()), + new XAttribute("styleIDRef", "0"), + new XAttribute("paraPrIDRef", "0"), + new XAttribute("pageBreak", "1"), + new XAttribute("columnBreak", "0"), + new XAttribute("merged", "0")); + } + + /// Create column break — changes colCount via colPr (Plan 96). + private XElement CreateColumnBreak(Dictionary<string, string>? props) + { + var cols = int.Parse(props?.GetValueOrDefault("cols") ?? "2"); + var gap = props?.GetValueOrDefault("gap") ?? (cols > 1 ? "2268" : "0"); + return new XElement(HwpxNs.Hp + "p", + new XAttribute("id", NewId()), + new XAttribute("styleIDRef", "0"), + new XAttribute("paraPrIDRef", "0"), + new XAttribute("pageBreak", "0"), + new XAttribute("columnBreak", "0"), + new XAttribute("merged", "0"), + new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", "0"), + new XElement(HwpxNs.Hp + "ctrl", + new XElement(HwpxNs.Hp + "colPr", + new XAttribute("id", ""), + new XAttribute("type", "NEWSPAPER"), + new XAttribute("layout", "LEFT"), + new XAttribute("colCount", cols.ToString()), + new XAttribute("sameSz", "1"), + new XAttribute("sameGap", gap))))); + } + + // ==================== Footnote ==================== + + /// + /// Create a footnote or endnote. Uses hp:ctrl > hp:footNote/endNote > hp:subList structure.
+ /// The marker appears at the insertion point; footnote text at page bottom, endnote at document end. + /// Props: text (required), number (auto if omitted). + /// + private XElement CreateFootnote(Dictionary<string, string>? props, bool isEndnote = false) + { + var text = props?.GetValueOrDefault("text") + ?? throw new CliException($"{(isEndnote ? "endnote" : "footnote")} requires 'text' property"); + var number = props?.GetValueOrDefault("number") ?? "0"; // 0 = auto-number + var tagName = isEndnote ? "endNote" : "footNote"; + + return new XElement(HwpxNs.Hp + "p", + new XAttribute("id", NewId()), + new XAttribute("styleIDRef", "0"), + new XAttribute("paraPrIDRef", "0"), + new XAttribute("pageBreak", "0"), + new XAttribute("columnBreak", "0"), + new XAttribute("merged", "0"), + WrapInRun( + new XElement(HwpxNs.Hp + "ctrl", + new XElement(HwpxNs.Hp + tagName, + new XAttribute("number", number), + CreateSubList(text, "TOP"))))); + } + + // ==================== Comment / Memo ==================== + + /// + /// Add a memo to the section-level memogroup container and attach a + /// fieldBegin/fieldEnd anchor to the last paragraph so Hancom displays it. + /// HWPX memos live in: section > hp:memogroup > hp:memo > hp:paraList > hp:p + /// The anchor uses fieldBegin type="MEMO" with parameters linking to the memo. + /// Props: text (required). + /// + private XElement AddMemoToGroup(XElement sectionParent, Dictionary<string, string>? props) + { + var text = props?.GetValueOrDefault("text") + ?? throw new CliException("comment/memo requires 'text' property"); + + // Ensure memoPr exists in header + var memoShapeId = EnsureMemoPr(); + + // Find or create the section root (hs:sec) + var section = sectionParent; + if (section.Name != HwpxNs.Hs + "sec") + section = sectionParent.AncestorsAndSelf(HwpxNs.Hs + "sec").FirstOrDefault() ??
sectionParent; + + // Find or create + var memoGroup = section.Element(HwpxNs.Hp + "memogroup"); + if (memoGroup == null) + { + memoGroup = new XElement(HwpxNs.Hp + "memogroup"); + section.Add(memoGroup); + } + + // Create memo with paraList structure (NOT subList) + var memoId = $"memo{memoGroup.Elements(HwpxNs.Hp + "memo").Count()}"; + var memo = new XElement(HwpxNs.Hp + "memo", + new XAttribute("id", memoId), + new XAttribute("memoShapeIDRef", memoShapeId), + new XElement(HwpxNs.Hp + "paraList", + new XElement(HwpxNs.Hp + "p", + new XAttribute("id", NewId()), + new XAttribute("paraPrIDRef", "0"), + new XAttribute("styleIDRef", "0"), + new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", "0"), + new XElement(HwpxNs.Hp + "t", text))))); + + memoGroup.Add(memo); + + // Attach fieldBegin/fieldEnd anchor to last paragraph + var lastPara = section.Elements(HwpxNs.Hp + "p").LastOrDefault(); + if (lastPara != null) + { + var fieldId = Guid.NewGuid().ToString("N"); + var now = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss"); + + // fieldBegin run + var runBegin = new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", "0"), + new XElement(HwpxNs.Hp + "ctrl", + new XElement(HwpxNs.Hp + "fieldBegin", + new XAttribute("id", fieldId), + new XAttribute("type", "MEMO"), + new XAttribute("editable", "true"), + new XAttribute("dirty", "false"), + new XAttribute("fieldid", fieldId), + new XElement(HwpxNs.Hp + "parameters", + new XAttribute("count", "5"), + new XAttribute("name", ""), + new XElement(HwpxNs.Hp + "stringParam", + new XAttribute("name", "ID"), memoId), + new XElement(HwpxNs.Hp + "integerParam", + new XAttribute("name", "Number"), "1"), + new XElement(HwpxNs.Hp + "stringParam", + new XAttribute("name", "CreateDateTime"), now), + new XElement(HwpxNs.Hp + "stringParam", + new XAttribute("name", "Author"), ""), + new XElement(HwpxNs.Hp + "stringParam", + new XAttribute("name", "MemoShapeID"), memoShapeId)), + new XElement(HwpxNs.Hp + "subList", + new 
XAttribute("id", $"memo-field-{memoId}"), + new XAttribute("textDirection", "HORIZONTAL"), + new XAttribute("lineWrap", "BREAK"), + new XAttribute("vertAlign", "TOP"), + new XElement(HwpxNs.Hp + "p", + new XAttribute("id", NewId()), + new XAttribute("paraPrIDRef", "0"), + new XAttribute("styleIDRef", "0"), + new XAttribute("pageBreak", "0"), + new XAttribute("columnBreak", "0"), + new XAttribute("merged", "0"), + new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", "0"), + new XElement(HwpxNs.Hp + "t", memoId))))))); + + // fieldEnd run + var runEnd = new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", "0"), + new XElement(HwpxNs.Hp + "ctrl", + new XElement(HwpxNs.Hp + "fieldEnd", + new XAttribute("beginIDRef", fieldId), + new XAttribute("fieldid", fieldId)))); + + // Insert at beginning and end of paragraph + lastPara.AddFirst(runBegin); + lastPara.Add(runEnd); + } + + return memo; + } + + /// + /// Ensure a memoProperties/memoPr definition exists in header.xml. + /// Returns the memoPr ID to use as memoShapeIDRef. + /// + private string EnsureMemoPr() + { + var refList = _doc.Header!.Root!.Element(HwpxNs.Hh + "refList"); + if (refList == null) + { + refList = new XElement(HwpxNs.Hh + "refList"); + _doc.Header.Root.Add(refList); + } + + var memoProps = refList.Element(HwpxNs.Hh + "memoProperties"); + if (memoProps != null) + { + var existing = memoProps.Elements(HwpxNs.Hh + "memoPr").FirstOrDefault(); + if (existing != null) + return existing.Attribute("id")?.Value ?? 
"0"; + } + + // Create memoProperties with default memoPr + memoProps = new XElement(HwpxNs.Hh + "memoProperties", + new XAttribute("itemCnt", "1"), + new XElement(HwpxNs.Hh + "memoPr", + new XAttribute("id", "0"), + new XAttribute("width", "15591"), + new XAttribute("lineWidth", "0.6mm"), + new XAttribute("lineType", "SOLID"), + new XAttribute("lineColor", "#B6D7AE"), + new XAttribute("fillColor", "#F0FFE9"), + new XAttribute("activeColor", "#CFF1C7"), + new XAttribute("memoType", "NORMAL"))); + refList.Add(memoProps); + SaveHeader(); + + return "0"; + } + + // ==================== Page Numbering ==================== + + /// + /// Create a page number element. HWPX uses hp:ctrl > hp:pageNum structure. + /// Props: pos (default BOTTOM_CENTER), format (default DIGIT). + /// formatType: DIGIT, CIRCLED_DIGIT, ROMAN_CAPITAL, ROMAN_SMALL, HANGUL, HANJA. + /// pos: TOP_LEFT, TOP_CENTER, TOP_RIGHT, BOTTOM_LEFT, BOTTOM_CENTER, BOTTOM_RIGHT, + /// OUTSIDE_TOP, OUTSIDE_BOTTOM, INSIDE_TOP, INSIDE_BOTTOM. + /// + private XElement CreatePageNum(Dictionary<string, string>? props) + { + var pos = props?.GetValueOrDefault("pos") ?? "BOTTOM_CENTER"; + var format = props?.GetValueOrDefault("format") ?? "DIGIT"; + + return new XElement(HwpxNs.Hp + "p", + new XAttribute("id", NewId()), + new XAttribute("styleIDRef", "0"), + new XAttribute("paraPrIDRef", "0"), + new XAttribute("pageBreak", "0"), + new XAttribute("columnBreak", "0"), + new XAttribute("merged", "0"), + WrapInRun( + new XElement(HwpxNs.Hp + "ctrl", + new XElement(HwpxNs.Hp + "pageNum", + new XAttribute("pos", pos), + new XAttribute("formatType", format), + new XAttribute("sideChar", ""))))); + } + + // ==================== Bookmark ==================== + + /// + /// Create a point bookmark element. HWPX uses hp:ctrl > hp:bookmark structure. + /// Props: name (required). + /// Note: Range bookmarks (fieldBegin/fieldEnd) require start/end at different positions + /// and are not supported in this version.
+ /// + private XElement CreateBookmark(Dictionary<string, string>? props) + { + var name = props?.GetValueOrDefault("name") + ?? throw new CliException("bookmark requires 'name' property"); + + return new XElement(HwpxNs.Hp + "p", + new XAttribute("id", NewId()), + new XAttribute("styleIDRef", "0"), + new XAttribute("paraPrIDRef", "0"), + new XAttribute("pageBreak", "0"), + new XAttribute("columnBreak", "0"), + new XAttribute("merged", "0"), + WrapInRun( + new XElement(HwpxNs.Hp + "ctrl", + new XElement(HwpxNs.Hp + "bookmark", + new XAttribute("name", name))))); + } + + // ==================== Header / Footer ==================== + + /// + /// Add header or footer to the section using the ctrl pattern (golden template verified 2026-04-11). + /// Structure: hp:run > hp:ctrl > hp:header/footer > hp:subList > hp:p + /// The ctrl is inserted into the first paragraph's secPr run (second position). + /// Props: text (default=empty), type (BOTH/ODD/EVEN, default=BOTH). + /// + private XElement AddHeaderFooter(XElement sectionRoot, Dictionary<string, string>? props, bool isHeader) + { + var text = props?.GetValueOrDefault("text") ?? ""; + var applyPageType = props?.GetValueOrDefault("type") ?? "BOTH"; + var tagName = isHeader ? "header" : "footer"; + var vertAlign = isHeader ? "TOP" : "BOTTOM"; + + // Find secPr in the section document + var doc = sectionRoot.Document ?? sectionRoot.AncestorsAndSelf().Last().Document; + var searchRoot = doc?.Root ?? sectionRoot; + + var secPr = searchRoot.Descendants(HwpxNs.Hp + "secPr").FirstOrDefault() + ?? searchRoot.Descendants().FirstOrDefault(e => e.Name.LocalName == "secPr"); + + if (secPr == null) + throw new CliException("Cannot find secPr in section to add header/footer"); + + // Calculate textWidth/textHeight from pagePr margins + var pagePr = secPr.Element(HwpxNs.Hp + "pagePr"); + var marginEl = pagePr?.Element(HwpxNs.Hp + "margin"); + var pageWidth = (int?)pagePr?.Attribute("width") ?? 59528; + var marginLeft = (int?)marginEl?.Attribute("left") ??
8504; + var marginRight = (int?)marginEl?.Attribute("right") ?? 8504; + var marginHf = isHeader + ? ((int?)marginEl?.Attribute("header") ?? 4252) + : ((int?)marginEl?.Attribute("footer") ?? 4252); + var textWidth = pageWidth - marginLeft - marginRight; + + // Determine header/footer id — use incremental: headers start at 1, footers at 2 + var existingHfCount = searchRoot.Descendants(HwpxNs.Hp + "header").Count() + + searchRoot.Descendants(HwpxNs.Hp + "footer").Count(); + var hfId = (existingHfCount + 1).ToString(); + + // Create subList with correct dimensions (golden template: id="" empty, textWidth/Height from pagePr) + var subList = new XElement(HwpxNs.Hp + "subList", + new XAttribute("id", ""), + new XAttribute("textDirection", "HORIZONTAL"), + new XAttribute("lineWrap", "BREAK"), + new XAttribute("vertAlign", vertAlign), + new XAttribute("linkListIDRef", "0"), + new XAttribute("linkListNextIDRef", "0"), + new XAttribute("textWidth", textWidth.ToString()), + new XAttribute("textHeight", marginHf.ToString()), + new XAttribute("hasTextRef", "0"), + new XAttribute("hasNumRef", "0"), + new XElement(HwpxNs.Hp + "p", + new XAttribute("id", "0"), + new XAttribute("paraPrIDRef", "0"), + new XAttribute("styleIDRef", "0"), + new XAttribute("pageBreak", "0"), + new XAttribute("columnBreak", "0"), + new XAttribute("merged", "0"), + new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", "0"), + new XElement(HwpxNs.Hp + "t", text)), + new XElement(HwpxNs.Hp + "linesegarray", + new XElement(HwpxNs.Hp + "lineseg", + new XAttribute("textpos", "0"), + new XAttribute("vertpos", "0"), + new XAttribute("vertsize", "1000"), + new XAttribute("textheight", "1000"), + new XAttribute("baseline", "850"), + new XAttribute("spacing", "600"), + new XAttribute("horzpos", "0"), + new XAttribute("horzsize", textWidth.ToString()), + new XAttribute("flags", "393216"))))); + + // Create the ctrl element + var hfElement = new XElement(HwpxNs.Hp + tagName, + new XAttribute("id", hfId), + 
new XAttribute("applyPageType", applyPageType), + subList); + + var ctrlElement = new XElement(HwpxNs.Hp + "ctrl", hfElement); + + // Find the run that contains secPr and add the ctrl there + var secPrRun = secPr.Parent; + if (secPrRun?.Name == HwpxNs.Hp + "run") + { + // Insert ctrl after secPr run, or find existing body run to prepend + var bodyRun = secPrRun.ElementsAfterSelf(HwpxNs.Hp + "run").FirstOrDefault(); + if (bodyRun != null) + { + // Add ctrl at the beginning of the body run + bodyRun.AddFirst(ctrlElement); + } + else + { + // No body run yet — create one with the ctrl and body text placeholder + var newRun = new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", "0"), + ctrlElement); + secPrRun.AddAfterSelf(newRun); + } + } + else + { + // Fallback: add as new run in first paragraph + var firstP = searchRoot.Descendants(HwpxNs.Hp + "p").FirstOrDefault(); + if (firstP != null) + { + firstP.Add(new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", "0"), + ctrlElement)); + } + } + + return hfElement; + } + + // ==================== Image Helpers ==================== + + /// Find the next available image index by scanning content.hpf manifest. + private int GetNextImageId() + { + var hpfEntry = _doc.Archive.GetEntry("Contents/content.hpf"); + if (hpfEntry == null) return 1; + + using var stream = hpfEntry.Open(); + var hpf = LoadAndNormalize(stream); + var maxId = 0; + foreach (var item in hpf.Descendants().Where(e => e.Name.LocalName == "item")) + { + var id = item.Attribute("id")?.Value; + if (id != null && id.StartsWith("image", StringComparison.OrdinalIgnoreCase)) + { + if (int.TryParse(id.AsSpan("image".Length), out var num) && num > maxId) + maxId = num; + } + } + return maxId + 1; + } + + /// Register an image item in content.hpf manifest. 
+    private void RegisterImageInManifest(string itemId, string href, string mediaType)
+    {
+        var hpfEntry = _doc.Archive.GetEntry("Contents/content.hpf");
+        if (hpfEntry == null)
+            throw new CliException("Cannot find Contents/content.hpf in HWPX archive");
+
+        XDocument hpf;
+        using (var stream = hpfEntry.Open())
+            hpf = LoadAndNormalize(stream);
+
+        // Add item to the manifest (inside <opf:manifest>)
+        var manifest = hpf.Descendants().FirstOrDefault(e => e.Name.LocalName == "manifest");
+        if (manifest == null)
+            throw new CliException("Cannot find <opf:manifest> in content.hpf");
+
+        manifest.Add(new XElement(HwpxNs.Opf + "item",
+            new XAttribute("id", itemId),
+            new XAttribute("href", href),
+            new XAttribute("media-type", mediaType),
+            new XAttribute("isEmbeded", "1")));
+
+        // Save back to ZIP
+        var entryName = hpfEntry.FullName;
+        hpfEntry.Delete();
+        var newEntry = _doc.Archive.CreateEntry(entryName, System.IO.Compression.CompressionLevel.Optimal);
+        using var outStream = newEntry.Open();
+        var xmlStr = HwpxPacker.MinifyXml(hpf.ToString(SaveOptions.DisableFormatting));
+        xmlStr = HwpxPacker.RestoreOriginalNamespaces(xmlStr);
+        xmlStr = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" + xmlStr;
+        var bytes = System.Text.Encoding.UTF8.GetBytes(xmlStr);
+        outStream.Write(bytes, 0, bytes.Length);
+    }
+
+    /// <summary>
+    /// Parse a dimension string (e.g. "2in", "50mm", "100pt", "5cm") to HWPX units (HWPUNIT).
+    /// 1 inch = 7200 HWPUNIT, 1 mm ≈ 283.46 HWPUNIT, 1 pt = 100 HWPUNIT, 1 cm ≈ 2834.6 HWPUNIT.
+    /// A4 width = 59528 HWPUNIT ≈ 210 mm.
+    /// </summary>
+    private static int ParseDimensionToHwpUnit(string dim)
+    {
+        dim = dim.Trim();
+        if (int.TryParse(dim, out var rawVal)) return rawVal; // already in HWPUNIT
+
+        // Extract the numeric part, then the unit suffix
+        var i = 0;
+        while (i < dim.Length && (char.IsDigit(dim[i]) || dim[i] == '.'))
+            i++;
+        if (i == 0) return 14400; // default 2in
+
+        var number = double.Parse(dim[..i], System.Globalization.CultureInfo.InvariantCulture);
+        var unit = dim[i..].Trim().ToLowerInvariant();
+
+        return unit switch
+        {
+            "in" or "inch" => (int)(number * 7200),
+            "mm" => (int)(number * 283.46),
+            "cm" => (int)(number * 2834.6),
+            "pt" => (int)(number * 100),
+            "hwp" => (int)number,
+            _ => (int)(number * 7200) // default to inches
+        };
+    }
+
+    // ==================== Equation ====================
+
+    /// <summary>
+    /// Create an equation paragraph. Uses hp:equation (NOT hp:eqEdit; eqEdit is the HWP5 class name).
+    /// Structure: hp:p > hp:run > hp:equation (ShapeObject) > hp:script.
+    /// Props: script (required), font, mode (LINE|CHAR), color.
+    /// Based on hwp_recog/18 (hwpxlib confirmed structure).
+    /// </summary>
+    private XElement CreateEquation(Dictionary<string, string>? props)
+    {
+        var script = props?.GetValueOrDefault("script")
+            ?? props?.GetValueOrDefault("formula")
+            ?? throw new CliException("equation requires 'script' or 'formula' property")
+                { Code = "invalid_prop" };
+
+        // Golden template analysis (test_eq_golden.hwpx):
+        //   version="Equation Version 60", font="HancomEQN", lineMode="CHAR"
+        //   numberingType="EQUATION", textWrap="TOP_AND_BOTTOM", textFlow="BOTH_SIDES"
+        //   horzRelTo="PARA", vertAlign="TOP", outMargin top/bottom="0"
+        //   NO <hp:caption>, HAS <hp:outMargin>, <hp:shapeComment>
+        var font = props?.GetValueOrDefault("font") ?? "HancomEQN";
+        var lineMode = props?.GetValueOrDefault("mode")?.ToUpperInvariant() ?? "CHAR";
+        var textColor = props?.GetValueOrDefault("color") ?? "#000000";
+        var baseUnit = props?.GetValueOrDefault("baseunit") ?? "1000";
+
+        // Default sz: Hancom auto-calculates based on equation complexity.
+        //   Golden examples: simple eq ~3700x975, fraction ~5500x2250, sigma ~2000x2700
+        // Use moderate defaults; Hancom recalculates on open.
+        var width = props?.GetValueOrDefault("width") ?? "5000";
+        var height = props?.GetValueOrDefault("height") ?? "2000";
+
+        if (!int.TryParse(width, out _))
+            width = ParseDimensionToHwpUnit(width).ToString();
+        if (!int.TryParse(height, out _))
+            height = ParseDimensionToHwpUnit(height).ToString();
+
+        var equation = new XElement(HwpxNs.Hp + "equation",
+            new XAttribute("id", NewId()),
+            new XAttribute("zOrder", "0"),
+            new XAttribute("numberingType", "EQUATION"),
+            new XAttribute("textWrap", "TOP_AND_BOTTOM"),
+            new XAttribute("textFlow", "BOTH_SIDES"),
+            new XAttribute("lock", "0"),
+            new XAttribute("dropcapstyle", "None"),
+            new XAttribute("version", "Equation Version 60"),
+            new XAttribute("baseLine", "85"),
+            new XAttribute("textColor", textColor),
+            new XAttribute("baseUnit", baseUnit),
+            new XAttribute("lineMode", lineMode),
+            new XAttribute("font", font),
+            // ShapeObject children (order matches golden template)
+            new XElement(HwpxNs.Hp + "sz",
+                new XAttribute("width", width),
+                new XAttribute("widthRelTo", "ABSOLUTE"),
+                new XAttribute("height", height),
+                new XAttribute("heightRelTo", "ABSOLUTE"),
+                new XAttribute("protect", "0")),
+            new XElement(HwpxNs.Hp + "pos",
+                new XAttribute("treatAsChar", "1"),
+                new XAttribute("affectLSpacing", "0"),
+                new XAttribute("flowWithText", "1"),
+                new XAttribute("allowOverlap", "0"),
+                new XAttribute("holdAnchorAndSO", "0"),
+                new XAttribute("vertRelTo", "PARA"),
+                new XAttribute("horzRelTo", "PARA"),
+                new XAttribute("vertAlign", "TOP"),
+                new XAttribute("horzAlign", "LEFT"),
+                new XAttribute("vertOffset", "0"),
+                new XAttribute("horzOffset", "0")),
+            new XElement(HwpxNs.Hp + "outMargin",
+                new XAttribute("left", "56"),
+                new XAttribute("right", "56"),
+                new XAttribute("top", "0"),
+                new XAttribute("bottom", "0")),
+            // NO <hp:caption>: the golden template doesn't have one
+            new XElement(HwpxNs.Hp + "shapeComment", "수식입니다."),
+            // xml:space="preserve" + trailing newline matches golden
+            new XElement(HwpxNs.Hp + "script",
+                new XAttribute(XNamespace.Xml + "space", "preserve"),
+                script + "\n"));
+
+        // Wrap: hp:p > hp:run > hp:equation + hp:t (empty, matches golden)
+        return new XElement(HwpxNs.Hp + "p",
+            new XAttribute("id", NewId()),
+            new XAttribute("styleIDRef", "0"),
+            new XAttribute("paraPrIDRef", "0"),
+            new XAttribute("pageBreak", "0"),
+            new XAttribute("columnBreak", "0"),
+            new XAttribute("merged", "0"),
+            new XElement(HwpxNs.Hp + "run",
+                new XAttribute("charPrIDRef", "0"),
+                equation,
+                new XElement(HwpxNs.Hp + "t")));
+    }
+
+    /// <summary>
+    /// Generate a unique numeric ID string.
+    /// Hancom requires numeric IDs (not hex) for elements to render properly.
+    /// Uses a high base + random offset to avoid collisions with existing IDs.
+    /// </summary>
+    private static long _idCounter = 2000000000L + Random.Shared.Next(0, 100000000);
+    private string NewId()
+    {
+        return Interlocked.Increment(ref _idCounter).ToString();
+    }
+
+    // ==================== Fields (based on golden templates: CLICK_HERE, SUMMERY, PATH) ====================
+
+    private XElement CreateFormField(Dictionary<string, string>? props)
+    {
+        var formFieldType = props?.GetValueOrDefault("formfieldtype")
+            ?? props?.GetValueOrDefault("type")
+            ?? "text";
+
+        return formFieldType.ToLowerInvariant() switch
+        {
+            "text" or "clickhere" or "click_here" => CreateField(MergeProps(props, "type", "CLICK_HERE")),
+            "checkbox" or "check" => CreateField(MergeProps(props, "type", "CHECKBOX")),
+            "dropdown" or "drop" => CreateField(MergeProps(props, "type", "DROPDOWN")),
+            _ => throw new CliException($"Unsupported HWPX form field type: {formFieldType}")
+            {
+                Code = "invalid_prop",
+                Suggestion = "Use type=text, type=checkbox, or type=dropdown."
+            }
+        };
+    }
+
+    /// <summary>
+    /// Create a field (fieldBegin + display text + fieldEnd).
+    /// Props: "type" (required), "text" (display), "command", "direction".
+    /// Golden structure: hp:ctrl > hp:fieldBegin with parameters > hp:run > hp:t > hp:ctrl > hp:fieldEnd.
+    /// </summary>
+    private XElement CreateField(Dictionary<string, string>? props)
+    {
+        var fieldType = props?.GetValueOrDefault("type")?.ToUpperInvariant()
+            ?? throw new CliException("field requires 'type' property") { Code = "invalid_prop" };
+        var displayText = props?.GetValueOrDefault("text") ?? GetDefaultFieldText(fieldType, props);
+        var fieldId = NewId();
+        var instId = NewId();
+        var fieldName = props?.GetValueOrDefault("name") ?? "";
+        var metaTag = props?.GetValueOrDefault("metatag") ?? props?.GetValueOrDefault("metaTag") ?? "";
+
+        // Build fieldBegin with golden-accurate attributes
+        var editable = fieldType is "CLICK_HERE" or "CHECKBOX" or "DROPDOWN"
+            ? "1"
+            : (props?.GetValueOrDefault("editable") ?? "0");
+        var dirty = fieldType is "CLICK_HERE" or "CHECKBOX" or "DROPDOWN" ? "1" : "0";
+
+        var fieldBeginEl = new XElement(HwpxNs.Hp + "fieldBegin",
+            new XAttribute("id", instId),
+            new XAttribute("type", fieldType),
+            new XAttribute("name", fieldName),
+            new XAttribute("editable", editable),
+            new XAttribute("dirty", dirty),
+            new XAttribute("zorder", "-1"),
+            new XAttribute("fieldid", fieldId),
+            new XAttribute("metaTag", metaTag));
+
+        // Add parameters based on field type (golden structure)
+        var parameters = BuildFieldParameters(fieldType, props);
+        if (parameters != null)
+            fieldBeginEl.Add(parameters);
+
+        // Golden structure: separate runs for fieldBegin, text, fieldEnd.
+        // CLICK_HERE display text uses a red italic charPr (golden: textColor=#FF0000 + italic).
+        var textCharPr = "0";
+        if (fieldType == "CLICK_HERE")
+            textCharPr = EnsureClickHereCharPr().ToString();
+
+        return new XElement(HwpxNs.Hp + "p",
+            new XAttribute("id", NewId()),
+            new XAttribute("styleIDRef", "0"),
+            new XAttribute("paraPrIDRef", "0"),
+            new XElement(HwpxNs.Hp + "run", new XAttribute("charPrIDRef", "0"),
+                new XElement(HwpxNs.Hp + "ctrl", fieldBeginEl)),
+            new XElement(HwpxNs.Hp + "run",
+                new XAttribute("charPrIDRef", textCharPr),
+                new XElement(HwpxNs.Hp + "t", displayText)),
+            new XElement(HwpxNs.Hp + "run", new XAttribute("charPrIDRef", "0"),
+                new XElement(HwpxNs.Hp + "ctrl",
+                    new XElement(HwpxNs.Hp + "fieldEnd",
+                        new XAttribute("beginIDRef", instId),
+                        new XAttribute("fieldid", fieldId))),
+                new XElement(HwpxNs.Hp + "t")));
+    }
+
+    private XElement? BuildFieldParameters(string fieldType, Dictionary<string, string>? props)
+    {
+        return fieldType switch
+        {
+            "CLICK_HERE" => new XElement(HwpxNs.Hp + "parameters",
+                new XAttribute("cnt", props != null && props.ContainsKey("maxlength") ? "4" : "3"),
+                new XAttribute("name", ""),
+                new XElement(HwpxNs.Hp + "integerParam", new XAttribute("name", "Prop"), "9"),
+                new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Command"),
+                    new XAttribute(XNamespace.Xml + "space", "preserve"),
+                    $"Clickhere:set:66:Direction:wstring:{GetClickHereDirection(props).Length}:{GetClickHereDirection(props)} HelpState:wstring:0: "),
+                new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Direction"),
+                    GetClickHereDirection(props)),
+                CreateOptionalIntegerParam("MaxLength", props?.GetValueOrDefault("maxlength"))),
+
+            "CHECKBOX" => new XElement(HwpxNs.Hp + "parameters",
+                new XAttribute("cnt", "4"), new XAttribute("name", ""),
+                new XElement(HwpxNs.Hp + "integerParam", new XAttribute("name", "Prop"), "9"),
+                new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Command"),
+                    new XAttribute(XNamespace.Xml + "space", "preserve"),
+                    $"CheckBox:set:7:Checked:int:{(IsChecked(props) ? "1" : "0")} Label:wstring:{GetCheckboxLabel(props).Length}:{GetCheckboxLabel(props)}"),
+                new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Checked"),
+                    IsChecked(props) ? "1" : "0"),
+                new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Label"),
+                    GetCheckboxLabel(props))),
+
+            "DROPDOWN" => new XElement(HwpxNs.Hp + "parameters",
+                new XAttribute("cnt", "4"), new XAttribute("name", ""),
+                new XElement(HwpxNs.Hp + "integerParam", new XAttribute("name", "Prop"), "9"),
+                new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Command"),
+                    new XAttribute(XNamespace.Xml + "space", "preserve"),
+                    $"Dropdown:set:12:Items:wstring:{GetDropdownItemsValue(props).Length}:{GetDropdownItemsValue(props)} SelectedIndex:int:{ResolveDropdownSelectedIndex(props)}"),
+                new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Items"),
+                    GetDropdownItemsValue(props)),
+                new XElement(HwpxNs.Hp + "integerParam", new XAttribute("name", "SelectedIndex"),
+                    ResolveDropdownSelectedIndex(props).ToString())),
+
+            "SUMMERY" => new XElement(HwpxNs.Hp + "parameters",
+                new XAttribute("cnt", "3"), new XAttribute("name", ""),
+                new XElement(HwpxNs.Hp + "integerParam", new XAttribute("name", "Prop"), "8"),
+                new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Command"),
+                    props?.GetValueOrDefault("command") ?? "$createtime"),
+                new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Property"),
+                    props?.GetValueOrDefault("command") ?? "$createtime")),
+
+            "PATH" => new XElement(HwpxNs.Hp + "parameters",
+                new XAttribute("cnt", "3"), new XAttribute("name", ""),
+                new XElement(HwpxNs.Hp + "integerParam", new XAttribute("name", "Prop"), "8"),
+                new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Command"),
+                    props?.GetValueOrDefault("format") ?? "$P$F"),
+                new XElement(HwpxNs.Hp + "stringParam", new XAttribute("name", "Format"),
+                    props?.GetValueOrDefault("format") ?? "$P$F")),
+
+            _ => null
+        };
+    }
+
+    private static string GetDefaultFieldText(string type, Dictionary<string, string>? props) => type switch
+    {
+        "DATE" => DateTime.Now.ToString("yyyy-MM-dd"),
+        "PATH" => "(filepath)",
+        "CHECKBOX" => IsChecked(props)
+            ? (props?.GetValueOrDefault("checkedtext") ?? "☑")
+            : (props?.GetValueOrDefault("uncheckedtext") ?? "☐"),
+        "DROPDOWN" => ResolveDropdownValue(props),
+        "CLICK_HERE" => props?.GetValueOrDefault("text")
+            ?? props?.GetValueOrDefault("defaultvalue")
+            ?? props?.GetValueOrDefault("direction")
+            ?? "이곳을 마우스로 누르고 내용을 입력하세요.",
+        "SUMMERY" => props?.GetValueOrDefault("command") switch
+        {
+            "$lastsaveby" => "(author)",
+            "$createtime" => DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss"),
+            _ => "(summary)"
+        },
+        _ => $"({type})"
+    };
+
+    /// <summary>Ensure a red italic charPr exists for CLICK_HERE display text. Returns its ID.</summary>
+    private int EnsureClickHereCharPr()
+    {
+        if (_doc.Header?.Root == null) return 0;
+        // Check whether a red italic charPr already exists
+        var charPrs = _doc.Header.Root.Descendants(HwpxNs.Hh + "charPr").ToList();
+        foreach (var cp in charPrs)
+        {
+            if (cp.Attribute("textColor")?.Value == "#FF0000"
+                && cp.Element(HwpxNs.Hh + "italic") != null)
+                return int.TryParse(cp.Attribute("id")?.Value, out var existingId) ? existingId : 0;
+        }
+        // Create a new charPr: clone charPr 0, then set red + italic
+        var baseCharPr = charPrs.FirstOrDefault(c => c.Attribute("id")?.Value == "0");
+        if (baseCharPr == null) return 0;
+        var newCharPr = new XElement(baseCharPr);
+        var newId = charPrs.Select(c => int.TryParse(c.Attribute("id")?.Value, out var i) ? i : 0).Max() + 1;
+        newCharPr.SetAttributeValue("id", newId.ToString());
+        newCharPr.SetAttributeValue("textColor", "#FF0000");
+        // Add italic if missing
+        if (newCharPr.Element(HwpxNs.Hh + "italic") == null)
+            newCharPr.Add(new XElement(HwpxNs.Hh + "italic"));
+        var container = baseCharPr.Parent!;
+        container.Add(newCharPr);
+        container.SetAttributeValue("itemCnt", container.Elements(HwpxNs.Hh + "charPr").Count().ToString());
+        SaveHeader();
+        return newId;
+    }
+
+    private static string GetClickHereDirection(Dictionary<string, string>? props)
+    {
+        return props?.GetValueOrDefault("direction")
+            ?? props?.GetValueOrDefault("defaultvalue")
+            ?? props?.GetValueOrDefault("text")
+            ?? "이곳을 마우스로 누르고 내용을 입력하세요.";
+    }
+
+    private static bool IsChecked(Dictionary<string, string>? props)
+    {
+        var raw = props?.GetValueOrDefault("checked")
+            ?? props?.GetValueOrDefault("value")
+            ?? props?.GetValueOrDefault("text");
+        return raw != null && ParseHelpers.IsTruthy(raw);
+    }
+
+    private static string GetCheckboxLabel(Dictionary<string, string>? props)
+    {
+        return props?.GetValueOrDefault("label")
+            ?? props?.GetValueOrDefault("name")
+            ?? "";
+    }
+
+    private static string[] ParseDropdownOptions(Dictionary<string, string>? props)
+    {
+        var raw = props?.GetValueOrDefault("options")
+            ?? props?.GetValueOrDefault("items")
+            ?? "";
+        return raw.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
+    }
+
+    private static string GetDropdownItemsValue(Dictionary<string, string>? props)
+    {
+        return string.Join("|", ParseDropdownOptions(props));
+    }
+
+    private static int ResolveDropdownSelectedIndex(Dictionary<string, string>? props)
+    {
+        var options = ParseDropdownOptions(props);
+        if (options.Length == 0) return 0;
+
+        if (int.TryParse(props?.GetValueOrDefault("selectedindex"), out var parsedIndex))
+            return Math.Clamp(parsedIndex, 0, options.Length - 1);
+
+        var selectedValue = props?.GetValueOrDefault("value") ?? props?.GetValueOrDefault("text");
+        if (!string.IsNullOrEmpty(selectedValue))
+        {
+            var matchedIndex = Array.FindIndex(options, option =>
+                string.Equals(option, selectedValue, StringComparison.Ordinal));
+            if (matchedIndex >= 0) return matchedIndex;
+        }
+
+        return 0;
+    }
+
+    private static string ResolveDropdownValue(Dictionary<string, string>? props)
+    {
+        var explicitValue = props?.GetValueOrDefault("text") ?? props?.GetValueOrDefault("value");
+        if (!string.IsNullOrEmpty(explicitValue))
+            return explicitValue;
+
+        var options = ParseDropdownOptions(props);
+        if (options.Length == 0) return "(dropdown)";
+
+        return options[ResolveDropdownSelectedIndex(props)];
+    }
+
+    private static XElement? CreateOptionalIntegerParam(string name, string? value)
+    {
+        return int.TryParse(value, out var parsed) && parsed > 0
+            ? new XElement(HwpxNs.Hp + "integerParam", new XAttribute("name", name), parsed.ToString())
+            : null;
+    }
+
+    private static Dictionary<string, string> MergeProps(Dictionary<string, string>? props, string key, string value)
+    {
+        var result = props != null
+            ? new Dictionary<string, string>(props, StringComparer.OrdinalIgnoreCase)
+            : new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
+        result[key] = value;
+        return result;
+    }
+
+    private static Dictionary<string, string> MergeProps(Dictionary<string, string>? props, string key1, string value1, string key2, string value2)
+    {
+        var result = MergeProps(props, key1, value1);
+        result[key2] = value2;
+        return result;
+    }
+
+    // ==================== Style ====================
+
+    private XElement CreateStyleElement(Dictionary<string, string> props)
+    {
+        var name = props.GetValueOrDefault("name")
+            ?? throw new CliException("style requires 'name'") { Code = "invalid_prop" };
+        var engName = props.GetValueOrDefault("engname") ?? name;
+        var type = props.GetValueOrDefault("type")?.ToUpperInvariant() ?? "PARA";
+
+        var styles = _doc.Header!.Root!.Descendants(HwpxNs.Hh + "style").ToList();
+        var maxId = styles.Select(s => int.TryParse(s.Attribute("id")?.Value, out var i) ? i : 0)
+            .DefaultIfEmpty(0).Max();
+        var newId = (maxId + 1).ToString();
+
+        var style = new XElement(HwpxNs.Hh + "style",
+            new XAttribute("id", newId),
+            new XAttribute("type", type),
+            new XAttribute("name", name),
+            new XAttribute("engName", engName),
+            new XAttribute("paraPrIDRef", CloneParaPrForNewStyle().ToString()),
+            new XAttribute("charPrIDRef", CloneCharPrForNewStyle().ToString()),
+            new XAttribute("nextStyleIDRef", newId));
+
+        var container = _doc.Header.Root.Descendants(HwpxNs.Hh + "styles").FirstOrDefault();
+        if (container != null)
+        {
+            container.Add(style);
+            container.SetAttributeValue("itemCnt",
+                container.Elements(HwpxNs.Hh + "style").Count().ToString());
+        }
+        SaveHeader();
+        return style;
+    }
+
+    // ==================== Shapes (based on golden54) ====================
+
+    /// <summary>Create the common shape child elements matching the golden template structure.</summary>
+    private XElement[] CreateShapeCommonChildren(int w, int h, Dictionary<string, string>? props)
+    {
+        var cx = w / 2; var cy = h / 2;
+
+        XElement identity(string name) => new XElement(HwpxNs.Hc + name,
+            new XAttribute("e1", "1"), new XAttribute("e2", "0"), new XAttribute("e3", "0"),
+            new XAttribute("e4", "0"), new XAttribute("e5", "1"), new XAttribute("e6", "0"));
+
+        return new XElement[]
+        {
+            new XElement(HwpxNs.Hp + "offset", new XAttribute("x", "0"), new XAttribute("y", "0")),
+            new XElement(HwpxNs.Hp + "orgSz", new XAttribute("width", w), new XAttribute("height", h)),
+            new XElement(HwpxNs.Hp + "curSz", new XAttribute("width", "0"), new XAttribute("height", "0")),
+            new XElement(HwpxNs.Hp + "flip", new XAttribute("horizontal", "0"), new XAttribute("vertical", "0")),
+            new XElement(HwpxNs.Hp + "rotationInfo",
+                new XAttribute("angle", "0"), new XAttribute("centerX", cx), new XAttribute("centerY", cy),
+                new XAttribute("rotateimage", "1")),
+            new XElement(HwpxNs.Hp + "renderingInfo", identity("transMatrix"), identity("scaMatrix"), identity("rotMatrix")),
+        };
+    }
+
+    private XElement CreateShapeSzPosMargin(int w, int h, Dictionary<string, string>? props)
+    {
+        // Returns a temporary container; the caller extracts its children
+        var x = int.TryParse(props?.GetValueOrDefault("x"), out var xv) ? xv : 0;
+        var y = int.TryParse(props?.GetValueOrDefault("y"), out var yv) ? yv : 0;
+        var treatAsChar = props?.GetValueOrDefault("wrap")?.Equals("char", StringComparison.OrdinalIgnoreCase) == true;
+
+        return new XElement("_tmp",
+            new XElement(HwpxNs.Hp + "sz",
+                new XAttribute("width", w), new XAttribute("widthRelTo", "ABSOLUTE"),
+                new XAttribute("height", h), new XAttribute("heightRelTo", "ABSOLUTE"),
+                new XAttribute("protect", "0")),
+            new XElement(HwpxNs.Hp + "pos",
+                new XAttribute("treatAsChar", treatAsChar ? "1" : "0"),
+                new XAttribute("affectLSpacing", "0"), new XAttribute("flowWithText", "1"),
+                new XAttribute("allowOverlap", treatAsChar ? "1" : "0"),
+                new XAttribute("holdAnchorAndSO", "0"),
+                new XAttribute("vertRelTo", "PARA"), new XAttribute("horzRelTo", "COLUMN"),
+                new XAttribute("vertAlign", "TOP"), new XAttribute("horzAlign", "LEFT"),
+                new XAttribute("vertOffset", y), new XAttribute("horzOffset", x)),
+            new XElement(HwpxNs.Hp + "outMargin",
+                new XAttribute("left", "0"), new XAttribute("right", "0"),
+                new XAttribute("top", "0"), new XAttribute("bottom", "0")));
+    }
+
+    private static XElement CreateLineShapeElement(Dictionary<string, string>? props)
+    {
+        var color = props?.GetValueOrDefault("color") ?? props?.GetValueOrDefault("linecolor") ?? "#000000";
+        var width = props?.GetValueOrDefault("linewidth") ?? "33";
+        var style = props?.GetValueOrDefault("linestyle") ?? "SOLID";
+        return new XElement(HwpxNs.Hp + "lineShape",
+            new XAttribute("color", color), new XAttribute("width", width),
+            new XAttribute("style", style), new XAttribute("endCap", "FLAT"),
+            new XAttribute("headStyle", "NORMAL"), new XAttribute("tailStyle", "NORMAL"),
+            new XAttribute("headfill", "1"), new XAttribute("tailfill", "1"),
+            new XAttribute("headSz", "MEDIUM_MEDIUM"), new XAttribute("tailSz", "MEDIUM_MEDIUM"),
+            new XAttribute("outlineStyle", "NORMAL"), new XAttribute("alpha", "0"));
+    }
+
+    private static XElement? CreateFillBrush(Dictionary<string, string>? props)
+    {
+        var fill = props?.GetValueOrDefault("fillcolor") ?? props?.GetValueOrDefault("fill");
+        if (fill == null) return null;
+        return new XElement(HwpxNs.Hc + "fillBrush",
+            new XElement(HwpxNs.Hc + "winBrush",
+                new XAttribute("faceColor", fill), new XAttribute("hatchColor", "#000000"),
+                new XAttribute("alpha", "0")));
+    }
+
+    private XElement CreateLine(Dictionary<string, string>? props)
+    {
+        var w = int.TryParse(props?.GetValueOrDefault("width"), out var wv) ? wv : 20000;
+        var h = 43; // golden: line height is minimal
+        var common = CreateShapeCommonChildren(w, h, props);
+        var szPosMargin = CreateShapeSzPosMargin(w, h, props);
+
+        var line = new XElement(HwpxNs.Hp + "line",
+            new XAttribute("id", NewId()), new XAttribute("zOrder", "0"),
+            new XAttribute("numberingType", "PICTURE"),
+            new XAttribute("textWrap", "TOP_AND_BOTTOM"), new XAttribute("textFlow", "BOTH_SIDES"),
+            new XAttribute("lock", "0"), new XAttribute("dropcapstyle", "None"),
+            new XAttribute("href", ""), new XAttribute("groupLevel", "0"),
+            new XAttribute("instid", NewId()), new XAttribute("isReverseHV", "0"));
+        line.Add(common);
+        line.Add(CreateLineShapeElement(props));
+        line.Add(new XElement(HwpxNs.Hp + "shadow",
+            new XAttribute("type", "NONE"), new XAttribute("color", "#B2B2B2"),
+            new XAttribute("offsetX", "0"), new XAttribute("offsetY", "0"), new XAttribute("alpha", "0")));
+        line.Add(new XElement(HwpxNs.Hc + "startPt", new XAttribute("x", "0"), new XAttribute("y", h)));
+        line.Add(new XElement(HwpxNs.Hc + "endPt", new XAttribute("x", w), new XAttribute("y", "0")));
+        line.Add(szPosMargin.Elements());
+        line.Add(new XElement(HwpxNs.Hp + "shapeComment", "선입니다."));
+
+        return new XElement(HwpxNs.Hp + "p",
+            new XAttribute("id", NewId()), new XAttribute("styleIDRef", "0"), new XAttribute("paraPrIDRef", "0"),
+            new XElement(HwpxNs.Hp + "run", new XAttribute("charPrIDRef", "0"), line,
+                new XElement(HwpxNs.Hp + "t", "")));
+    }
+
+    private XElement CreateRect(Dictionary<string, string>? props)
+    {
+        var w = int.TryParse(props?.GetValueOrDefault("width"), out var wv) ? wv : 20000;
+        var h = int.TryParse(props?.GetValueOrDefault("height"), out var hv) ? hv : 10000;
+        var text = props?.GetValueOrDefault("text");
+        var common = CreateShapeCommonChildren(w, h, props);
+        var szPosMargin = CreateShapeSzPosMargin(w, h, props);
+        var fillBrush = CreateFillBrush(props);
+
+        var rect = new XElement(HwpxNs.Hp + "rect",
+            new XAttribute("id", NewId()), new XAttribute("zOrder", "0"),
+            new XAttribute("numberingType", "PICTURE"),
+            new XAttribute("textWrap", "TOP_AND_BOTTOM"), new XAttribute("textFlow", "BOTH_SIDES"),
+            new XAttribute("lock", "0"), new XAttribute("dropcapstyle", "None"),
+            new XAttribute("href", ""), new XAttribute("groupLevel", "0"),
+            new XAttribute("instid", NewId()), new XAttribute("ratio", "0"));
+        rect.Add(common);
+        rect.Add(CreateLineShapeElement(props));
+        if (fillBrush != null) rect.Add(fillBrush);
+        rect.Add(new XElement(HwpxNs.Hp + "shadow",
+            new XAttribute("type", "NONE"), new XAttribute("color", "#B2B2B2"),
+            new XAttribute("offsetX", "0"), new XAttribute("offsetY", "0"), new XAttribute("alpha", "178")));
+        // drawText (text inside shape)
+        if (text != null)
+        {
+            rect.Add(new XElement(HwpxNs.Hp + "drawText",
+                new XAttribute("lastWidth", w), new XAttribute("name", ""), new XAttribute("editable", "0"),
+                new XElement(HwpxNs.Hp + "subList",
+                    new XAttribute("id", ""), new XAttribute("textDirection", "HORIZONTAL"),
+                    new XAttribute("lineWrap", "BREAK"), new XAttribute("vertAlign", "CENTER"),
+                    new XAttribute("linkListIDRef", "0"), new XAttribute("linkListNextIDRef", "0"),
+                    new XAttribute("textWidth", "0"), new XAttribute("textHeight", "0"),
+                    new XAttribute("hasTextRef", "0"), new XAttribute("hasNumRef", "0"),
+                    new XElement(HwpxNs.Hp + "p",
+                        new XAttribute("id", NewId()), new XAttribute("styleIDRef", "0"), new XAttribute("paraPrIDRef", "0"),
+                        new XElement(HwpxNs.Hp + "run", new XAttribute("charPrIDRef", "0"),
+                            new XElement(HwpxNs.Hp + "t", text)))),
+                new XElement(HwpxNs.Hp + "textMargin",
+                    new XAttribute("left", "283"), new XAttribute("right", "283"),
+                    new XAttribute("top", "283"), new XAttribute("bottom", "283"))));
+        }
+        // Corner points
+        rect.Add(new XElement(HwpxNs.Hc + "pt0", new XAttribute("x", "0"), new XAttribute("y", "0")));
+        rect.Add(new XElement(HwpxNs.Hc + "pt1", new XAttribute("x", w), new XAttribute("y", "0")));
+        rect.Add(new XElement(HwpxNs.Hc + "pt2", new XAttribute("x", w), new XAttribute("y", h)));
+        rect.Add(new XElement(HwpxNs.Hc + "pt3", new XAttribute("x", "0"), new XAttribute("y", h)));
+        rect.Add(szPosMargin.Elements());
+
+        return new XElement(HwpxNs.Hp + "p",
+            new XAttribute("id", NewId()), new XAttribute("styleIDRef", "0"), new XAttribute("paraPrIDRef", "0"),
+            new XElement(HwpxNs.Hp + "run", new XAttribute("charPrIDRef", "0"), rect,
+                new XElement(HwpxNs.Hp + "t", "")));
+    }
+
+    private XElement CreateEllipse(Dictionary<string, string>? props)
+    {
+        var w = int.TryParse(props?.GetValueOrDefault("width"), out var wv) ? wv : 15000;
+        var h = int.TryParse(props?.GetValueOrDefault("height"), out var hv) ? hv : 10000;
+        var common = CreateShapeCommonChildren(w, h, props);
+        var szPosMargin = CreateShapeSzPosMargin(w, h, props);
+        var fillBrush = CreateFillBrush(props);
+        var cx = w / 2; var cy = h / 2;
+
+        var ellipse = new XElement(HwpxNs.Hp + "ellipse",
+            new XAttribute("id", NewId()), new XAttribute("zOrder", "0"),
+            new XAttribute("numberingType", "PICTURE"),
+            new XAttribute("textWrap", "TOP_AND_BOTTOM"), new XAttribute("textFlow", "BOTH_SIDES"),
+            new XAttribute("lock", "0"), new XAttribute("dropcapstyle", "None"),
+            new XAttribute("href", ""), new XAttribute("groupLevel", "0"),
+            new XAttribute("instid", NewId()),
+            new XAttribute("intervalDirty", "0"), new XAttribute("hasArcPr", "0"),
+            new XAttribute("arcType", "NORMAL"));
+        ellipse.Add(common);
+        ellipse.Add(CreateLineShapeElement(props));
+        if (fillBrush != null) ellipse.Add(fillBrush);
+        ellipse.Add(new XElement(HwpxNs.Hp + "shadow",
+            new XAttribute("type", "NONE"), new XAttribute("color", "#B2B2B2"),
+            new XAttribute("offsetX", "0"), new XAttribute("offsetY", "0"), new XAttribute("alpha", "178")));
+        // Ellipse geometry
+        ellipse.Add(new XElement(HwpxNs.Hc + "center", new XAttribute("x", cx), new XAttribute("y", cy)));
+        ellipse.Add(new XElement(HwpxNs.Hc + "ax1", new XAttribute("x", w), new XAttribute("y", cy)));
+        ellipse.Add(new XElement(HwpxNs.Hc + "ax2", new XAttribute("x", cx), new XAttribute("y", "0")));
+        ellipse.Add(new XElement(HwpxNs.Hc + "start1", new XAttribute("x", "0"), new XAttribute("y", "0")));
+        ellipse.Add(new XElement(HwpxNs.Hc + "end1", new XAttribute("x", "0"), new XAttribute("y", "0")));
+        ellipse.Add(new XElement(HwpxNs.Hc + "start2", new XAttribute("x", "0"), new XAttribute("y", "0")));
+        ellipse.Add(new XElement(HwpxNs.Hc + "end2", new XAttribute("x", "0"), new XAttribute("y", "0")));
+        ellipse.Add(szPosMargin.Elements());
+
+        return new XElement(HwpxNs.Hp + "p",
+            new XAttribute("id", NewId()), new XAttribute("styleIDRef", "0"), new XAttribute("paraPrIDRef", "0"),
+            new XElement(HwpxNs.Hp + "run", new XAttribute("charPrIDRef", "0"), ellipse,
+                new XElement(HwpxNs.Hp + "t", "")));
+    }
+
+    private XElement CreateTextBox(Dictionary<string, string>? props)
+    {
+        // TextBox = rect with drawText and a default fill
+        var merged = new Dictionary<string, string>(props ?? new(StringComparer.OrdinalIgnoreCase), StringComparer.OrdinalIgnoreCase);
+        if (!merged.ContainsKey("fill") && !merged.ContainsKey("fillcolor"))
+            merged["fillcolor"] = "#FFFFFF";
+        if (!merged.ContainsKey("text"))
+            merged["text"] = "";
+        return CreateRect(merged);
+    }
+
+    private XElement CreatePolygon(Dictionary<string, string>? props)
+    {
+        var w = int.TryParse(props?.GetValueOrDefault("width"), out var wv) ? wv : 15000;
+        var h = int.TryParse(props?.GetValueOrDefault("height"), out var hv) ? hv : 15000;
+        var sides = int.TryParse(props?.GetValueOrDefault("sides"), out var sv) ? sv : 5;
+        var text = props?.GetValueOrDefault("text");
+        var common = CreateShapeCommonChildren(w, h, props);
+        var szPosMargin = CreateShapeSzPosMargin(w, h, props);
+        var fillBrush = CreateFillBrush(props);
+
+        var polygon = new XElement(HwpxNs.Hp + "polygon",
+            new XAttribute("id", NewId()), new XAttribute("zOrder", "0"),
+            new XAttribute("numberingType", "PICTURE"),
+            new XAttribute("textWrap", "TOP_AND_BOTTOM"), new XAttribute("textFlow", "BOTH_SIDES"),
+            new XAttribute("lock", "0"), new XAttribute("dropcapstyle", "None"),
+            new XAttribute("href", ""), new XAttribute("groupLevel", "0"),
+            new XAttribute("instid", NewId()));
+        polygon.Add(common);
+        polygon.Add(CreateLineShapeElement(props));
+        if (fillBrush != null) polygon.Add(fillBrush);
+        polygon.Add(new XElement(HwpxNs.Hp + "shadow",
+            new XAttribute("type", "NONE"), new XAttribute("color", "#B2B2B2"),
+            new XAttribute("offsetX", "0"), new XAttribute("offsetY", "0"), new XAttribute("alpha", "0")));
+        // drawText
+        if (text != null)
+        {
+            polygon.Add(new XElement(HwpxNs.Hp + "drawText",
+                new XAttribute("lastWidth", w), new XAttribute("name", ""), new XAttribute("editable", "0"),
+                new XElement(HwpxNs.Hp + "subList",
+                    new XAttribute("id", ""), new XAttribute("textDirection", "HORIZONTAL"),
+                    new XAttribute("lineWrap", "BREAK"), new XAttribute("vertAlign", "CENTER"),
+                    new XAttribute("linkListIDRef", "0"), new XAttribute("linkListNextIDRef", "0"),
+                    new XAttribute("textWidth", "0"), new XAttribute("textHeight", "0"),
+                    new XAttribute("hasTextRef", "0"), new XAttribute("hasNumRef", "0"),
+                    new XElement(HwpxNs.Hp + "p",
+                        new XAttribute("id", NewId()), new XAttribute("styleIDRef", "0"), new XAttribute("paraPrIDRef", "0"),
+                        new XElement(HwpxNs.Hp + "run", new XAttribute("charPrIDRef", "0"),
+                            new XElement(HwpxNs.Hp + "t", text)))),
+                new XElement(HwpxNs.Hp + "textMargin",
+                    new XAttribute("left", "283"), new XAttribute("right", "283"),
+                    new XAttribute("top", "283"), new XAttribute("bottom", "283"))));
+        }
+        // Generate regular polygon vertices in orgSz coordinate space
+        var cx = w / 2.0; var cy = h / 2.0;
+        var r = Math.Min(cx, cy);
+        for (int i = 0; i <= sides; i++)
+        {
+            var angle = -Math.PI / 2 + 2 * Math.PI * i / sides;
+            var px = (int)(cx + r * Math.Cos(angle));
+            var py = (int)(cy + r * Math.Sin(angle));
+            polygon.Add(new XElement(HwpxNs.Hc + "pt", new XAttribute("x", px), new XAttribute("y", py)));
+        }
+        polygon.Add(szPosMargin.Elements());
+
+        return new XElement(HwpxNs.Hp + "p",
+            new XAttribute("id", NewId()), new XAttribute("styleIDRef", "0"), new XAttribute("paraPrIDRef", "0"),
+            new XElement(HwpxNs.Hp + "run", new XAttribute("charPrIDRef", "0"), polygon,
+                new XElement(HwpxNs.Hp + "t", "")));
+    }
+
+    private XElement CreateArrow(Dictionary<string, string>? props)
+    {
+        // Arrow = line with tailStyle arrow
+        var merged = new Dictionary<string, string>(props ?? new(), StringComparer.OrdinalIgnoreCase);
+        merged["linestyle"] = merged.GetValueOrDefault("linestyle") ?? "SOLID";
+        var w = int.TryParse(merged.GetValueOrDefault("width"), out var wv) ? wv : 20000;
+        var h = 43;
+        var common = CreateShapeCommonChildren(w, h, merged);
+        var szPosMargin = CreateShapeSzPosMargin(w, h, merged);
+        var color = merged.GetValueOrDefault("color") ?? "#000000";
+        var lineWidth = merged.GetValueOrDefault("linewidth") ?? "33";
+
+        var line = new XElement(HwpxNs.Hp + "line",
+            new XAttribute("id", NewId()), new XAttribute("zOrder", "0"),
+            new XAttribute("numberingType", "PICTURE"),
+            new XAttribute("textWrap", "TOP_AND_BOTTOM"), new XAttribute("textFlow", "BOTH_SIDES"),
+            new XAttribute("lock", "0"), new XAttribute("dropcapstyle", "None"),
+            new XAttribute("href", ""), new XAttribute("groupLevel", "0"),
+            new XAttribute("instid", NewId()), new XAttribute("isReverseHV", "0"));
+        line.Add(common);
+        // Arrow lineShape: tailStyle = arrow
+        line.Add(new XElement(HwpxNs.Hp + "lineShape",
+            new XAttribute("color", color), new XAttribute("width", lineWidth),
+            new XAttribute("style", "SOLID"), new XAttribute("endCap", "FLAT"),
+            new XAttribute("headStyle", "NORMAL"), new XAttribute("tailStyle", "ARROW"),
+            new XAttribute("headfill", "1"), new XAttribute("tailfill", "1"),
+            new XAttribute("headSz", "MEDIUM_MEDIUM"), new XAttribute("tailSz", "MEDIUM_MEDIUM"),
+            new XAttribute("outlineStyle", "NORMAL"), new XAttribute("alpha", "0")));
+        line.Add(new XElement(HwpxNs.Hp + "shadow",
+            new XAttribute("type", "NONE"), new XAttribute("color", "#B2B2B2"),
+            new XAttribute("offsetX", "0"), new XAttribute("offsetY", "0"), new XAttribute("alpha", "0")));
+        line.Add(new XElement(HwpxNs.Hc + "startPt", new XAttribute("x", "0"), new XAttribute("y", "0")));
+        line.Add(new XElement(HwpxNs.Hc + "endPt", new XAttribute("x", w), new XAttribute("y", "0")));
+        line.Add(szPosMargin.Elements());
+        line.Add(new XElement(HwpxNs.Hp + "shapeComment", "화살표입니다."));
+
+        return new XElement(HwpxNs.Hp + "p",
+            new XAttribute("id", NewId()), new XAttribute("styleIDRef", "0"), new XAttribute("paraPrIDRef", "0"),
+            new XElement(HwpxNs.Hp + "run", new XAttribute("charPrIDRef", "0"), line,
+                new XElement(HwpxNs.Hp + "t", "")));
+    }
+
+    // ==================== TOC ====================
+
+    /// <summary>
+    /// Create a static Table of Contents from document headings.
+    /// Returns multiple paragraph elements — one title + one per heading.
+    /// Props: "maxlevel" → max heading depth (default 3), "title" → TOC title (default "목차").
+    /// </summary>
+    /// <summary>
+    /// Create a field-based TOC. Hancom can regenerate it via "도구 > 필드 업데이트" (Tools > Update Fields).
+    /// Golden structure: fieldBegin(TABLEOFCONTENTS) with Command parameter → content → fieldEnd
+    /// </summary>
+    private List<XElement> CreateFieldToc(Dictionary<string, string>? props)
+    {
+        var maxLevel = int.TryParse(props?.GetValueOrDefault("maxlevel"), out var ml) ? ml : 3;
+        var title = props?.GetValueOrDefault("title") ?? "차례";
+        var headings = CollectHeadings(maxLevel);
+        if (headings.Count == 0)
+            throw new CliException("No headings found in document.") { Code = "no_headings" };
+
+        var instId = NewId();
+        var fieldId = NewId();
+        var result = new List<XElement>();
+
+        // Golden-accurate fieldBegin with Command parameter
+        var commandStr = $"TableOfContents:set:140:ContentsMake:uint:31 ContentsStyles:wstring:0: ContentsLevel:int:{maxLevel} ContentsAutoTabRight:int:0 ContentsLeader:int:3 ContentsHyperlink:bool:1 ";
+
+        // Field begin paragraph
+        result.Add(new XElement(HwpxNs.Hp + "p",
+            new XAttribute("id", NewId()),
+            new XAttribute("styleIDRef", "0"),
+            new XAttribute("paraPrIDRef", "0"),
+            new XElement(HwpxNs.Hp + "run", new XAttribute("charPrIDRef", "0"),
+                new XElement(HwpxNs.Hp + "ctrl",
+                    new XElement(HwpxNs.Hp + "fieldBegin",
+                        new XAttribute("id", instId),
+                        new XAttribute("type", "TABLEOFCONTENTS"),
+                        new XAttribute("name", ""),
+                        new XAttribute("editable", "1"),
+                        new XAttribute("dirty", "0"),
+                        new XAttribute("zorder", "-1"),
+                        new XAttribute("fieldid", fieldId),
+                        new XAttribute("metaTag", ""),
+                        new XElement(HwpxNs.Hp + "parameters",
+                            new XAttribute("cnt", "2"), new XAttribute("name", ""),
+                            new XElement(HwpxNs.Hp + "integerParam", new XAttribute("name", "Prop"), "8"),
+                            new XElement(HwpxNs.Hp + "stringParam",
+                                new XAttribute("name", "Command"),
+                                new XAttribute(XNamespace.Xml + "space", "preserve"),
+                                commandStr)))))));
+
+        // TOC title
+        var titleProps = new Dictionary<string, string>
+            { ["text"] = title, ["fontsize"] = "14", ["bold"] = "true" };
+        var titlePara = CreateParagraph(titleProps);
+        var structuralKeys = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
+            { "text", "styleidref", "styleIDRef", "charpridref", "charPrIDRef", "parapridref", "paraPrIDRef" };
+        foreach (var (k, v) in titleProps)
+            if (!structuralKeys.Contains(k)) SetParagraphProp(titlePara, k, v);
+        result.Add(titlePara);
+
+        // Heading entries
+        foreach (var (level, text) in headings)
+        {
+            var indent = new string('\u3000', level - 1);
+            var entryProps = new Dictionary<string, string> { ["text"] = $"{indent}{text}", ["fontsize"] = "9" };
+            var entryPara = CreateParagraph(entryProps);
+            foreach (var (k, v) in entryProps)
+                if (!structuralKeys.Contains(k)) SetParagraphProp(entryPara, k, v);
+            result.Add(entryPara);
+        }
+
+        // Field end paragraph
+        result.Add(new XElement(HwpxNs.Hp + "p",
+            new XAttribute("id", NewId()),
+            new XAttribute("styleIDRef", "0"),
+            new XAttribute("paraPrIDRef", "0"),
+            new XElement(HwpxNs.Hp + "run", new XAttribute("charPrIDRef", "0"),
+                new XElement(HwpxNs.Hp + "ctrl",
+                    new XElement(HwpxNs.Hp + "fieldEnd",
+                        new XAttribute("beginIDRef", instId),
+                        new XAttribute("fieldid", fieldId))),
+                new XElement(HwpxNs.Hp + "t"))));
+
+        return result;
+    }
+
+    private List<XElement> CreateStaticToc(Dictionary<string, string>? props)
+    {
+        var maxLevel = int.TryParse(props?.GetValueOrDefault("maxlevel"), out var ml) ? ml : 3;
+        var title = props?.GetValueOrDefault("title") ?? "목차";
+
+        var headings = CollectHeadings(maxLevel);
+        if (headings.Count == 0)
+            throw new CliException("No headings found in document. Set outlineLevel on paragraphs first.")
+                { Code = "no_headings" };
+
+        var result = new List<XElement>();
+        var structuralKeys = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
+            { "text", "styleidref", "styleIDRef", "charpridref", "charPrIDRef",
+              "parapridref", "paraPrIDRef" };
+
+        // TOC title paragraph — bold, 14pt
+        var titleProps = new Dictionary<string, string>
+        {
+            ["text"] = title,
+            ["fontsize"] = "14",
+            ["bold"] = "true"
+        };
+        var titlePara = CreateParagraph(titleProps);
+        foreach (var (k, v) in titleProps)
+            if (!structuralKeys.Contains(k)) SetParagraphProp(titlePara, k, v);
+        result.Add(titlePara);
+
+        // One paragraph per heading with indentation
+        foreach (var (level, text) in headings)
+        {
+            var indent = new string('\u3000', level - 1); // fullwidth space for visual indent
+            var entryProps = new Dictionary<string, string>
+            {
+                ["text"] = $"{indent}{text}"
+            };
+            // Sub-headings slightly smaller
+            if (level >= 2)
+                entryProps["fontsize"] = "9";
+            var entryPara = CreateParagraph(entryProps);
+            foreach (var (k, v) in entryProps)
+                if (!structuralKeys.Contains(k)) SetParagraphProp(entryPara, k, v);
+            result.Add(entryPara);
+        }
+
+        // Empty paragraph after TOC
+        result.Add(CreateParagraph(new Dictionary<string, string> { ["text"] = " " }));
+
+        return result;
+    }
+
+    /// <summary>
+    /// Collect headings from all sections.
+    /// Detection: style name match ("개요 N" / "Heading N") OR paraPr > heading element.
+    /// </summary>
+    private List<(int Level, string Text)> CollectHeadings(int maxLevel)
+    {
+        var headings = new List<(int Level, string Text)>();
+
+        foreach (var (section, para, localIdx) in _doc.AllParagraphs())
+        {
+            int? level = null;
+
+            // Method 1: Style-based detection (same as View.cs GetParagraphStyleInfo)
+            var styleIdRef = para.Attribute("styleIDRef")?.Value;
+            if (_doc.Header != null && styleIdRef != null)
+            {
+                var style = _doc.Header.Root!.Descendants(HwpxNs.Hh + "style")
+                    .FirstOrDefault(s => s.Attribute("id")?.Value == styleIdRef);
+                if (style != null)
+                {
+                    var name = style.Attribute("name")?.Value ?? "";
+                    var m = System.Text.RegularExpressions.Regex.Match(name, @"개요\s*(\d+)");
+                    if (!m.Success)
+                        m = System.Text.RegularExpressions.Regex.Match(name, @"(?i)heading\s*(\d+)");
+                    if (m.Success)
+                        level = int.Parse(m.Groups[1].Value);
+                }
+            }
+
+            // Method 2: paraPr > heading element with type="OUTLINE"
+            if (level == null)
+            {
+                var paraPrIdRef = para.Attribute("paraPrIDRef")?.Value;
+                if (_doc.Header != null && paraPrIdRef != null)
+                {
+                    var paraPr = _doc.Header.Root!.Descendants(HwpxNs.Hh + "paraPr")
+                        .FirstOrDefault(pp => pp.Attribute("id")?.Value == paraPrIdRef);
+                    var heading = paraPr?.Element(HwpxNs.Hh + "heading");
+                    if (heading?.Attribute("type")?.Value == "OUTLINE"
+                        && int.TryParse(heading.Attribute("level")?.Value, out var hl))
+                        level = hl;
+                }
+            }
+
+            if (level.HasValue && level.Value >= 1 && level.Value <= maxLevel)
+            {
+                var text = ExtractParagraphText(para);
+                if (!string.IsNullOrWhiteSpace(text))
+                    headings.Add((level.Value, HwpxKorean.Normalize(text)));
+            }
+        }
+
+        return headings;
+    }
+
+    // ==================== Multi-Section ====================
+
+    /// <summary>
+    /// Add a new section to the document. Creates section XML, updates manifest, increments secCnt.
+    /// Props: "orientation" (PORTRAIT/LANDSCAPE), "pageWidth", "pageHeight", "marginTop/Bottom/Left/Right"
+    /// </summary>
+    private HwpxSection AddNewSection(Dictionary<string, string>? props)
+    {
+        var newIndex = _doc.Sections.Count;
+        var entryPath = $"Contents/section{newIndex}.xml";
+
+        // Clone the first section's structure (secPr, namespaces, etc.) for compatibility
+        var sourceSection = _doc.PrimarySection;
+        var sourceRoot = sourceSection.Root;
+
+        // Deep-copy the section root element (preserves all namespaces + child structure)
+        var newRoot = new XElement(sourceRoot);
+
+        // Strip all content paragraphs except the first one (which holds secPr)
+        var paras = newRoot.Elements(HwpxNs.Hp + "p").ToList();
+        for (int i = 1; i < paras.Count; i++)
+            paras[i].Remove();
+
+        // Clear text in the first paragraph (keep secPr structure)
+        var firstPara = newRoot.Elements(HwpxNs.Hp + "p").First();
+        firstPara.SetAttributeValue("id", NewId());
+        // Remove linesegarray (stale layout cache)
+        firstPara.Elements(HwpxNs.Hp + "linesegarray").Remove();
+        // Clear text content in runs but keep secPr
+        foreach (var run in firstPara.Elements(HwpxNs.Hp + "run"))
+        {
+            foreach (var t in run.Elements(HwpxNs.Hp + "t"))
+                t.Value = "";
+        }
+
+        // Apply orientation if requested
+        // Hancom: landscape="NARROWLY" (horizontal), "WIDELY" (vertical) — dimensions stay the same
+        var landscape = props?.GetValueOrDefault("orientation")?.Equals("LANDSCAPE", StringComparison.OrdinalIgnoreCase) == true;
+        if (landscape)
+        {
+            var pagePr = newRoot.Descendants(HwpxNs.Hp + "pagePr").FirstOrDefault();
+            pagePr?.SetAttributeValue("landscape", "NARROWLY");
+        }
+
+        // Apply custom page dimensions/margins if specified
+        if (props != null)
+        {
+            var pagePr = newRoot.Descendants(HwpxNs.Hp + "pagePr").FirstOrDefault();
+            var margin = pagePr?.Element(HwpxNs.Hp + "margin");
+            if (props.ContainsKey("pagewidth")) pagePr?.SetAttributeValue("width", props["pagewidth"]);
+            if (props.ContainsKey("pageheight")) pagePr?.SetAttributeValue("height", props["pageheight"]);
+            if (props.ContainsKey("margintop")) margin?.SetAttributeValue("top", props["margintop"]);
+            if (props.ContainsKey("marginbottom")) margin?.SetAttributeValue("bottom", props["marginbottom"]);
+            if (props.ContainsKey("marginleft")) margin?.SetAttributeValue("left", props["marginleft"]);
+            if (props.ContainsKey("marginright")) margin?.SetAttributeValue("right", props["marginright"]);
+        }
+
+        var secDoc = new XDocument(new XDeclaration("1.0", "UTF-8", null), newRoot);
+
+        var section = new HwpxSection
+        {
+            Index = newIndex,
+            EntryPath = entryPath,
+            Document = secDoc
+        };
+        _doc.Sections.Add(section);
+
+        // Update header.xml secCnt
+        var headEl = _doc.Header?.Root;
+        if (headEl != null)
+        {
+            var currentCnt = (int?)headEl.Attribute("secCnt") ?? 1;
+            headEl.SetAttributeValue("secCnt", (currentCnt + 1).ToString());
+            SaveHeader();
+        }
+
+        // Update manifest (content.hpf)
+        AddManifestEntry(entryPath);
+
+        // Save new section to ZIP
+        SaveSection(section.Root);
+        return section;
+    }
+
+    /// <summary>Add an item+spine entry to content.hpf manifest.</summary>
+    private void AddManifestEntry(string entryPath)
+    {
+        var opfDoc = _doc.ManifestDoc;
+        if (opfDoc?.Root == null) return;
+
+        var opfNs = opfDoc.Root.Name.Namespace;
+        var itemId = Path.GetFileNameWithoutExtension(entryPath);
+
+        var manifest = opfDoc.Root.Element(opfNs + "manifest");
+        manifest?.Add(new XElement(opfNs + "item",
+            new XAttribute("id", itemId),
+            new XAttribute("href", entryPath),
+            new XAttribute("media-type", "application/xml")));
+
+        var spine = opfDoc.Root.Element(opfNs + "spine");
+        spine?.Add(new XElement(opfNs + "itemref",
+            new XAttribute("idref", itemId)));
+
+        SaveManifest();
+    }
+
+    /// <summary>Remove an item+spine entry from content.hpf manifest.</summary>
+    private void RemoveManifestEntry(string entryPath)
+    {
+        var opfDoc = _doc.ManifestDoc;
+        if (opfDoc?.Root == null) return;
+
+        var opfNs = opfDoc.Root.Name.Namespace;
+        var itemId = Path.GetFileNameWithoutExtension(entryPath);
+
+        var manifest = opfDoc.Root.Element(opfNs + "manifest");
+        manifest?.Elements(opfNs + "item")
+            .FirstOrDefault(e => e.Attribute("id")?.Value == itemId
+                || e.Attribute("href")?.Value == entryPath)
+            ?.Remove();
+
+        var spine = opfDoc.Root.Element(opfNs + "spine");
+        spine?.Elements(opfNs + "itemref")
+            .FirstOrDefault(e => e.Attribute("idref")?.Value == itemId)
+            ?.Remove();
+
+        SaveManifest();
+    }
+
+    // G6: Korean government standard margins ("행정문서 양식", administrative document format)
+    private static class GovStandardMargins
+    {
+        public const int PageWidth = 59528;  // A4: 210mm
+        public const int PageHeight = 84186; // A4: 297mm
+        public const int Top = 5668;         // 20mm
+        public const int Bottom = 4252;      // 15mm
+        public const int Left = 5668;        // 20mm
+        public const int Right = 5668;       // 20mm
+        public const int Header = 4252;      // 15mm
+        public const int Footer = 4252;      // 15mm
+        public const int Gutter = 0;
+    }
+
+    /// <summary>Apply Korean government standard margins to section 1.</summary>
+    private bool ApplyGovStandardMargins()
+    {
+        var sec = _doc.Sections.FirstOrDefault();
+        if (sec == null) return false;
+        var secPr = sec.Root.Descendants(HwpxNs.Hp + "secPr").FirstOrDefault();
+        var pagePr = secPr?.Element(HwpxNs.Hp + "pagePr");
+        var margin = pagePr?.Element(HwpxNs.Hp + "margin");
+        if (margin == null) return false;
+
+        margin.SetAttributeValue("top", GovStandardMargins.Top.ToString());
+        margin.SetAttributeValue("bottom", GovStandardMargins.Bottom.ToString());
+        margin.SetAttributeValue("left", GovStandardMargins.Left.ToString());
+        margin.SetAttributeValue("right", GovStandardMargins.Right.ToString());
+        margin.SetAttributeValue("header", GovStandardMargins.Header.ToString());
+        margin.SetAttributeValue("footer", GovStandardMargins.Footer.ToString());
+        margin.SetAttributeValue("gutter", GovStandardMargins.Gutter.ToString());
+
+        _dirty = true;
+        return true;
+    }
+
+    /// <summary>Save content.hpf manifest to ZIP archive.</summary>
+    private void SaveManifest()
+    {
+        if (_doc.ManifestDoc == null || _doc.ManifestEntryPath == null) return;
+
+        var entryName = _doc.ManifestEntryPath;
+        var entry = _doc.Archive.GetEntry(entryName);
+        if (entry == null) return;
+
+        entry.Delete();
+        var newEntry = _doc.Archive.CreateEntry(entryName, System.IO.Compression.CompressionLevel.Optimal);
+        using var stream = newEntry.Open();
+        var xmlStr = HwpxPacker.MinifyXml(_doc.ManifestDoc.ToString(SaveOptions.DisableFormatting));
+        xmlStr = HwpxPacker.RestoreOriginalNamespaces(xmlStr);
+        xmlStr = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + xmlStr;
+        var bytes = System.Text.Encoding.UTF8.GetBytes(xmlStr);
+        stream.Write(bytes, 0, bytes.Length);
+    }
+
+    // ==================== Metadata ====================
+
+    /// <summary>Set a metadata value in content.hpf. Matches Hancom's OPF meta structure.</summary>
+    private bool SetMetadata(string name, string value)
+    {
+        var opfDoc = _doc.ManifestDoc;
+        if (opfDoc?.Root == null) return false;
+        var ns = opfDoc.Root.Name.Namespace;
+        var metadata = opfDoc.Root.Element(ns + "metadata");
+        if (metadata == null)
+        {
+            metadata = new XElement(ns + "metadata");
+            opfDoc.Root.AddFirst(metadata);
+        }
+
+        // title and language are direct elements; others use <meta>
+        if (name is "title" or "language")
+        {
+            var el = metadata.Element(ns + name);
+            if (el == null) { el = new XElement(ns + name); metadata.Add(el); }
+            el.Value = value;
+        }
+        else
+        {
+            var el = metadata.Elements(ns + "meta")
+                .FirstOrDefault(e => e.Attribute("name")?.Value == name);
+            if (el == null)
+            {
+                el = new XElement(ns + "meta",
+                    new XAttribute("name", name), new XAttribute("content", "text"));
+                metadata.Add(el);
+            }
+            el.Value = value;
+        }
+
+        SaveManifest();
+        _dirty = true;
+        return true;
+    }
+
+    /// <summary>Read all metadata from content.hpf.</summary>
+    public Dictionary<string, string> GetMetadata()
+    {
+        var opfDoc = _doc.ManifestDoc;
+        if (opfDoc?.Root == null) return new();
+        var ns = opfDoc.Root.Name.Namespace;
+        var metadata = opfDoc.Root.Element(ns + "metadata");
+        if (metadata == null) return new();
+
+        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
+        // Direct elements
+        var title = metadata.Element(ns + "title")?.Value;
+        if (!string.IsNullOrEmpty(title)) result["title"] = title;
+        var lang = metadata.Element(ns + "language")?.Value;
+        if (!string.IsNullOrEmpty(lang)) result["language"] = lang;
+
+        // G4: Dublin Core elements (dc:title, dc:creator, dc:subject, etc.)
+        var dcFields = new[] { "title", "creator", "subject", "description",
+                               "publisher", "contributor", "date", "type",
+                               "format", "identifier", "source", "language",
+                               "relation", "coverage", "rights" };
+        foreach (var field in dcFields)
+        {
+            if (result.ContainsKey(field)) continue;
+            var dcEl = metadata.Element(HwpxNs.Dc + field);
+            if (dcEl != null && !string.IsNullOrEmpty(dcEl.Value))
+                result[field] = dcEl.Value;
+        }
+
+        // Meta elements
+        foreach (var meta in metadata.Elements(ns + "meta"))
+        {
+            var n = meta.Attribute("name")?.Value;
+            var v = meta.Value;
+            if (n != null && !string.IsNullOrEmpty(v)) result[n] = v;
+        }
+        return result;
+    }
+
+    /// <summary>Remove a section by path (e.g. /section[2]).</summary>
+    private string? RemoveSection(string path)
+    {
+        var segments = ParsePath(path);
+        var secIdx = (segments[0].Index ?? 1) - 1;
+        if (_doc.Sections.Count <= 1)
+            throw new CliException("Cannot remove last section") { Code = "invalid_op" };
+        if (secIdx < 0 || secIdx >= _doc.Sections.Count)
+            throw new CliException($"Section {secIdx + 1} not found");
+
+        var section = _doc.Sections[secIdx];
+        _doc.Sections.RemoveAt(secIdx);
+
+        // Reindex remaining sections
+        for (int i = secIdx; i < _doc.Sections.Count; i++)
+            _doc.Sections[i].Index = i;
+
+        // Update secCnt
+        var headEl = _doc.Header?.Root;
+        if (headEl != null)
+        {
+            headEl.SetAttributeValue("secCnt", _doc.Sections.Count.ToString());
+            SaveHeader();
+        }
+
+        // Remove manifest entry + ZIP entry
+        RemoveManifestEntry(section.EntryPath);
+        var zipEntry = _doc.Archive.GetEntry(section.EntryPath);
+        zipEntry?.Delete();
+        _dirty = true;
+        return null;
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxHandler.Path.cs b/src/officecli/Handlers/Hwpx/HwpxHandler.Path.cs
new file mode 100644
index 000000000..ba5e29e43
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxHandler.Path.cs
@@ -0,0 +1,436 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.Text.RegularExpressions;
+using System.Xml.Linq;
+using OfficeCli.Core;
+
+namespace OfficeCli.Handlers;
+
+public partial class HwpxHandler
+{
+    internal record PathSegment(string Name, int? Index);
+
+    /// <summary>
+    /// Parse a path string into segments.
+    /// "/section[1]/p[3]" → [("section", 1), ("p", 3)]
+    /// "/section[1]/p[last]" → [("section", 1), ("p", -1)]
+    /// -1 is a sentinel value meaning "last element".
+    /// </summary>
+    internal static List<PathSegment> ParsePath(string path)
+    {
+        var segments = new List<PathSegment>();
+        var parts = path.Split('/', StringSplitOptions.RemoveEmptyEntries);
+
+        foreach (var part in parts)
+        {
+            var match = Regex.Match(part, @"^(\w+)(?:\[(\d+|last)\])?$");
+            if (!match.Success)
+                throw new ArgumentException($"Invalid path segment: '{part}'");
+
+            var name = match.Groups[1].Value;
+            int? index;
+            if (match.Groups[2].Success)
+                index = match.Groups[2].Value == "last" ? -1 : int.Parse(match.Groups[2].Value);
+            else
+                index = null;
+            segments.Add(new PathSegment(name, index));
+        }
+
+        return segments;
+    }
+
+    /// <summary>
+    /// Resolve a path to the target XElement.
+    /// Supports cross-section navigation: /section[2]/p[3]
+    /// </summary>
+    internal XElement ResolvePath(string path)
+    {
+        if (string.IsNullOrEmpty(path) || path == "/")
+            throw new ArgumentException("Cannot resolve root path to element");
+
+        var segments = ParsePath(path);
+        return ResolveSegments(segments);
+    }
+
+    private XElement ResolveSegments(List<PathSegment> segments)
+    {
+        if (segments.Count == 0)
+            throw new ArgumentException("Empty path");
+
+        var first = segments[0];
+
+        // Determine which section to use
+        HwpxSection section;
+        int segmentStart;
+
+        if (first.Name.Equals("section", StringComparison.OrdinalIgnoreCase))
+        {
+            var secIdx = (first.Index ?? 1) - 1;
+            if (secIdx < 0 || secIdx >= _doc.Sections.Count)
+                throw new ArgumentException(
+                    $"Section {secIdx + 1} not found (document has {_doc.Sections.Count} sections)");
+            section = _doc.Sections[secIdx];
+            segmentStart = 1; // skip section segment
+        }
+        else if (first.Name.Equals("header", StringComparison.OrdinalIgnoreCase))
+        {
+            return ResolveHeaderPath(segments);
+        }
+        else
+        {
+            // No section prefix → use primary section
+            section = _doc.PrimarySection;
+            segmentStart = 0;
+        }
+
+        if (segmentStart >= segments.Count)
+        {
+            // Path is just "/section[N]" — return section root
+            return section.Root;
+        }
+
+        // Resolve within section
+        XElement current = section.Root;
+        for (int i = segmentStart; i < segments.Count; i++)
+        {
+            var seg = segments[i];
+            current = ResolveChildElement(current, seg);
+        }
+
+        return current;
+    }
+
+    private XElement ResolveChildElement(XElement parent, PathSegment segment)
+    {
+        var name = segment.Name.ToLowerInvariant();
+
+        XName elementName = name switch
+        {
+            "p" => HwpxNs.Hp + "p",
+            "tbl" => HwpxNs.Hp + "tbl",
+            "tr" => HwpxNs.Hp + "tr",
+            "tc" => HwpxNs.Hp + "tc",
+            "run" => HwpxNs.Hp + "run",
+            "pic" or "picture" or "image" => HwpxNs.Hp + "pic",
+            "img" => HwpxNs.Hp + "img",
+            "drawing" => HwpxNs.Hp + "drawing",
+            "sublist" => HwpxNs.Hp + "subList",
+            _ => throw new ArgumentException($"Unknown element type: '{name}'")
+        };
+
+        // When parent is <tc>, transparently navigate through <subList>
+        // so paths like /tbl[1]/tr[1]/tc[1]/p[1] work without explicit subList
+        var searchParent = parent;
+        if (parent.Name.LocalName == "tc" && (name == "p" || name == "run"))
+        {
+            searchParent = parent.Element(HwpxNs.Hp + "subList") ?? parent;
+        }
+
+        // When looking for tbl inside a p, search descendants (tbl is inside p > run > tbl)
+        var children = (parent.Name.LocalName == "p" && name == "tbl")
+            ? parent.Descendants(elementName).ToList()
+            : searchParent.Elements(elementName).ToList();
+
+        // Resolve index: -1 = last, null = first (1), positive = 1-based
+        int idx;
+        if (segment.Index == -1)
+            idx = children.Count - 1;
+        else
+            idx = (segment.Index ?? 1) - 1;
+
+        if (idx < 0 || idx >= children.Count)
+        {
+            var label = segment.Index == -1 ? "last" : (segment.Index ?? 1).ToString();
+            throw new ArgumentException(
+                $"{name}[{label}] not found (parent has {children.Count} {name} elements)");
+        }
+
+        return children[idx];
+    }
+
+    private XElement ResolveHeaderPath(List<PathSegment> segments)
+    {
+        if (_doc.Header?.Root == null)
+            throw new ArgumentException("Document has no header.xml");
+
+        if (segments.Count == 1)
+            return _doc.Header.Root;
+
+        var second = segments[1];
+        var name = second.Name.ToLowerInvariant();
+
+        // Navigate header.xml structure
+        XName elementName = name switch
+        {
+            "charpr" or "charproperty" => HwpxNs.Hh + "charPr",
+            "parapr" or "paraproperty" => HwpxNs.Hh + "paraPr",
+            "style" => HwpxNs.Hh + "style",
+            "borderfill" => HwpxNs.Hh + "borderFill",
+            _ => throw new ArgumentException($"Unknown header element: '{name}'")
+        };
+
+        if (second.Index.HasValue)
+        {
+            // Find by ID attribute (header elements use id= not positional index)
+            var element = _doc.Header.Root.Descendants(elementName)
+                .FirstOrDefault(e => e.Attribute("id")?.Value == second.Index.Value.ToString());
+            if (element == null)
+                throw new ArgumentException($"{name} with id={second.Index.Value} not found");
+            return element;
+        }
+
+        // Return container
+        var container = name switch
+        {
+            "charpr" or "charproperty" => HwpxNs.Hh + "charProperties",
+            "parapr" or "paraproperty" => HwpxNs.Hh + "paraProperties",
+            "style" => HwpxNs.Hh + "styles",
+            "borderfill" => HwpxNs.Hh + "borderFills",
+            _ => throw new ArgumentException($"Unknown header container: '{name}'")
+        };
+
+        return _doc.Header.Root.Descendants(container).FirstOrDefault()
+            ?? throw new ArgumentException($"No {name} container found");
+    }
+
+    /// <summary>
+    /// Build a path string for a given XElement by walking up the tree.
+    /// </summary>
+    internal string BuildPath(XElement element)
+    {
+        var parts = new Stack<string>();
+        var current = element;
+
+        while (current != null && current.Parent != null)
+        {
+            var localName = current.Name.LocalName;
+            var ns = current.Name.Namespace;
+
+            // Skip subList in path building — users navigate tc/p directly
+            if (localName == "subList")
+            {
+                current = current.Parent;
+                continue;
+            }
+
+            if (ns == HwpxNs.Hs && localName == "sec")
+            {
+                // Find section index
+                var secIdx = _doc.Sections.FindIndex(s => s.Root == current);
+                if (secIdx >= 0)
+                    parts.Push($"section[{secIdx + 1}]");
+                break;
+            }
+
+            // Count siblings of same type to determine index — always emit [N] for consistent string-equality
+            var siblings = current.Parent.Elements(current.Name).ToList();
+            var idx = siblings.IndexOf(current) + 1; // 1-based
+            parts.Push($"{MapElementToPathName(localName)}[{idx}]");
+
+            current = current.Parent;
+        }
+
+        return "/" + string.Join("/", parts);
+    }
+
+    private static string MapElementToPathName(string localName) => localName switch
+    {
+        "p" => "p",
+        "tbl" => "tbl",
+        "tr" => "tr",
+        "tc" => "tc",
+        "run" => "run",
+        "t" => "t",
+        "img" => "img",
+        "drawing" => "drawing",
+        "subList" => "subList",
+        _ => localName
+    };
+
+    /// <summary>
+    /// Parse and execute a CSS-like selector against the document.
+    /// Supported selectors:
+    ///   "p" — all paragraphs
+    ///   "tbl" — all tables
+    ///   "p:empty" — empty paragraphs
+    ///   "p:contains(text)" — paragraphs containing text
+    ///   "tbl > tr > tc" — table cells (child combinator)
+    ///   "p[styleIDRef=2]" — attribute selector
+    /// </summary>
+    internal List<XElement> ExecuteSelector(string selector)
+    {
+        var trimmed = selector.Trim();
+
+        // Child combinator: "tbl > tr > tc", "p > run"
+        if (trimmed.Contains(" > "))
+        {
+            var parts = trimmed.Split(" > ", StringSplitOptions.TrimEntries);
+            var current = GetAllElements(parts[0]);
+            for (int i = 1; i < parts.Length; i++)
+                current = current.SelectMany(parent => FilterChildren(parent, parts[i])).ToList();
+            return current;
+        }
+
+        // element:pseudo or element[attr op value]
+        var selectorMatch = Regex.Match(trimmed, @"^(\w+)(?::(\w+)(?:\((.+)\))?)?(?:\[(.+)\])?$");
+        if (!selectorMatch.Success)
+            throw new ArgumentException($"Unsupported selector: '{selector}'. " +
+                "Supported: p, tbl, tr, tc, run, picture, p:empty, element:contains(text), element:has(child), element[attr=value], element[attr!=value], element[attr~=text], parent > child");
+
+        var elemType = selectorMatch.Groups[1].Value;
+        var pseudo = selectorMatch.Groups[2].Value;
+        var pseudoArg = selectorMatch.Groups[3].Value;
+        var attrExpr = selectorMatch.Groups[4].Value;
+
+        // Get base elements
+        var results = GetAllElements(elemType);
+
+        // Apply pseudo-selector
+        if (!string.IsNullOrEmpty(pseudo))
+        {
+            results = pseudo switch
+            {
+                "empty" => results.Where(e => string.IsNullOrWhiteSpace(GetElementText(e))).ToList(),
+                "contains" => results.Where(e =>
+                    GetElementText(e).Contains(pseudoArg.Trim('"', '\''), StringComparison.OrdinalIgnoreCase)).ToList(),
+                "has" => results.Where(e =>
+                    e.Descendants(ResolveXName(pseudoArg)).Any()).ToList(),
+                "first" => results.Take(1).ToList(),
+                "last" => results.TakeLast(1).ToList(),
+                _ => throw new ArgumentException($"Unsupported pseudo-selector: ':{pseudo}'")
+            };
+        }
+
+        // Apply attribute filter
+        if (!string.IsNullOrEmpty(attrExpr))
+            results = ApplyAttributeFilter(results, attrExpr);
+
+        return results;
+    }
+
+    private List<XElement> GetAllElements(string elemType)
+    {
+        var results = new List<XElement>();
+        // Strip pseudo/attr for base element resolution
+        var baseType = Regex.Replace(elemType, @"[:[].*$", "");
+
+        foreach (var sec in _doc.Sections)
+        {
+            var xname = ResolveXName(baseType);
+            if (baseType == "p")
+                results.AddRange(sec.Paragraphs);
+            else if (baseType == "tbl")
+                results.AddRange(sec.Tables);
+            else
+                results.AddRange(sec.Root.Descendants(xname));
+        }
+        return results;
+    }
+
+    private static XName ResolveXName(string elemType) => elemType switch
+    {
+        "p" => HwpxNs.Hp + "p",
+        "tbl" => HwpxNs.Hp + "tbl",
+        "tr" => HwpxNs.Hp + "tr",
+        "tc" => HwpxNs.Hp + "tc",
+        "run" => HwpxNs.Hp + "run",
+        "pic" or "picture" or "image" => HwpxNs.Hp + "pic",
+        "img" => HwpxNs.Hp + "img",
+        "equation" => HwpxNs.Hp + "equation",
+        "shape" => HwpxNs.Hp + "shapeObject",
+        "field" or "fieldBegin" or "formfield" => HwpxNs.Hp + "fieldBegin",
+        "bookmark" => HwpxNs.Hp + "bookmark",
+        "ctrl" => HwpxNs.Hp + "ctrl",
+        _ => HwpxNs.Hp + elemType
+    };
+
+    private string GetElementText(XElement e) => e.Name.LocalName switch
+    {
+        "p" => HwpxKorean.Normalize(ExtractParagraphText(e)),
+        "tc" => ExtractCellText(e),
+        "run" => string.Join("", e.Elements(HwpxNs.Hp + "t").Select(t => t.Value)),
+        _ => e.Value
+    };
+
+    private List<XElement> FilterChildren(XElement parent, string childSelector)
+    {
+        var childMatch = Regex.Match(childSelector, @"^(\w+)(?:\[(.+)\])?$");
+        if (!childMatch.Success) return new();
+        var childType = childMatch.Groups[1].Value;
+        var xname = ResolveXName(childType);
+        var children = parent.Elements(xname).ToList();
+        if (!string.IsNullOrEmpty(childMatch.Groups[2].Value))
+            children = ApplyAttributeFilter(children, childMatch.Groups[2].Value);
+        return children;
+    }
+
+    private List<XElement> ApplyAttributeFilter(List<XElement> elements, string attrExpr)
+    {
+        // Operators: =, !=, ~= (contains), >=, <=
+        var opMatch = Regex.Match(attrExpr, @"^(\w+)(~=|!=|>=|<=|=)(.+)$");
+        if (!opMatch.Success) return elements;
+
+        var attrName = opMatch.Groups[1].Value;
+        var op = opMatch.Groups[2].Value;
+        var attrValue = opMatch.Groups[3].Value.Trim('"', '\'');
+
+        return elements.Where(e =>
+        {
+            var actual = ResolveVirtualAttribute(e, attrName);
+            if (actual == null) return op == "!="; // null != anything is true
+            return op switch
+            {
+                "=" => actual.Equals(attrValue, StringComparison.OrdinalIgnoreCase),
+                "!=" => !actual.Equals(attrValue, StringComparison.OrdinalIgnoreCase),
+                "~=" => actual.Contains(attrValue, StringComparison.OrdinalIgnoreCase),
+                ">=" => int.TryParse(actual, out var av) && int.TryParse(attrValue, out var tv) && av >= tv,
+                "<=" => int.TryParse(actual, out var av2) && int.TryParse(attrValue, out var tv2) && av2 <= tv2,
+                _ => false
+            };
+        }).ToList();
+    }
+
+    /// <summary>Resolve virtual attributes (computed on-the-fly) or real XML attributes.</summary>
+    private string? ResolveVirtualAttribute(XElement e, string attrName)
+    {
+        // Virtual attributes
+        switch (attrName)
+        {
+            case "text":
+                return GetElementText(e);
+            case "bold":
+            {
+                var charPrId = e.Attribute("charPrIDRef")?.Value ?? e.Elements(HwpxNs.Hp + "run").FirstOrDefault()?.Attribute("charPrIDRef")?.Value;
+                var charPr = charPrId != null ? FindCharPr(charPrId) : null;
+                return charPr?.Element(HwpxNs.Hh + "bold") != null ? "true" : "false";
+            }
+            case "italic":
+            {
+                var charPrId = e.Attribute("charPrIDRef")?.Value ?? e.Elements(HwpxNs.Hp + "run").FirstOrDefault()?.Attribute("charPrIDRef")?.Value;
+                var charPr = charPrId != null ? FindCharPr(charPrId) : null;
+                return charPr?.Element(HwpxNs.Hh + "italic") != null ? "true" : "false";
+            }
+            case "fontsize":
+            {
+                var charPrId = e.Attribute("charPrIDRef")?.Value ?? e.Elements(HwpxNs.Hp + "run").FirstOrDefault()?.Attribute("charPrIDRef")?.Value;
+                var charPr = charPrId != null ? FindCharPr(charPrId) : null;
+                var height = (int?)charPr?.Attribute("height");
+                return height.HasValue ? (height.Value / 100).ToString() : null;
+            }
+            case "colSpan":
+                return e.Name.LocalName == "tc" ? GetCellAddr(e).ColSpan.ToString() : e.Attribute("colSpan")?.Value;
+            case "rowSpan":
+                return e.Name.LocalName == "tc" ? GetCellAddr(e).RowSpan.ToString() : e.Attribute("rowSpan")?.Value;
+            case "heading":
+            {
+                if (e.Name.LocalName != "p") return null;
+                var info = GetParagraphStyleInfo(e);
+                return info.HeadingLevel;
+            }
+        }
+
+        // Real XML attribute
+        return e.Attribute(attrName)?.Value;
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxHandler.Query.cs b/src/officecli/Handlers/Hwpx/HwpxHandler.Query.cs
new file mode 100644
index 000000000..45819fa45
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxHandler.Query.cs
@@ -0,0 +1,401 @@
+using System.IO.Compression;
+using System.Text;
+using System.Text.Json.Nodes;
+using System.Xml.Linq;
+using OfficeCli.Core;
+
+namespace OfficeCli.Handlers;
+
+public partial class HwpxHandler
+{
+    // ==================== Query Layer ====================
+
+    public DocumentNode Get(string path, int depth = 1)
+    {
+        if (string.IsNullOrEmpty(path))
+            throw new ArgumentException("Path cannot be empty");
+
+        if (path == "/")
+            return GetRootNode(depth);
+
+        var element = ResolvePath(path);
+        return BuildDocumentNode(element, path, depth);
+    }
+
+    private DocumentNode GetRootNode(int depth)
+    {
+        var node = new DocumentNode
+        {
+            Path = "/",
+            Type = "hwpx-document",
+            ChildCount = _doc.Sections.Count,
+        };
+
+        // Document metadata
+        node.Format["sections"] = _doc.Sections.Count;
+        node.Format["hasHeader"] = _doc.Header != null;
+
+        if (depth > 0)
+        {
+            foreach (var sec in _doc.Sections)
+            {
+                var secNode = new DocumentNode
+                {
+                    Path = $"/section[{sec.Index + 1}]",
+                    Type = "section",
+                    ChildCount = sec.Root.Elements().Count(),
+                    Preview = $"Section {sec.Index + 1}: {sec.Paragraphs.Count} paragraphs, {sec.Tables.Count} tables"
}; + + if (depth > 1) + { + PopulateSectionChildren(secNode, sec, depth - 1); + } + + node.Children.Add(secNode); + } + } + + return node; + } + + private DocumentNode BuildDocumentNode(XElement element, string path, int depth) + { + var localName = element.Name.LocalName; + + return localName switch + { + "p" => BuildParagraphNode(element, path, depth), + "tbl" => BuildTableNode(element, path, depth), + "tr" => BuildTableRowNode(element, path, depth), + "tc" => BuildTableCellNode(element, path, depth), + "run" => BuildRunNode(element, path), + "sec" => BuildSectionNode(element, path, depth), + _ => BuildGenericNode(element, path, depth) + }; + } + + private DocumentNode BuildParagraphNode(XElement para, string path, int depth) + { + var text = HwpxKorean.Normalize(ExtractParagraphText(para)); + var styleInfo = GetParagraphStyleInfo(para); + + var node = new DocumentNode + { + Path = path, + Type = !string.IsNullOrEmpty(styleInfo.HeadingLevel) + ? $"heading{styleInfo.HeadingLevel}" + : "paragraph", + Text = text, + Preview = text.Length > 100 ? 
text[..100] + "…" : text, + ChildCount = para.Elements(HwpxNs.Hp + "run").Count(), + }; + + // Format properties + node.Format["alignment"] = styleInfo.Alignment; + if (styleInfo.HeadingLevel != null) + node.Format["headingLevel"] = int.Parse(styleInfo.HeadingLevel); + + var styleIdRef = para.Attribute("styleIDRef")?.Value; + if (styleIdRef != null) + node.Format["styleIDRef"] = styleIdRef; + + var paraPrIdRef = para.Attribute("paraPrIDRef")?.Value; + if (paraPrIdRef != null) + node.Format["paraPrIDRef"] = paraPrIdRef; + + // Populate children (runs) if depth allows + if (depth > 1) + { + int runIdx = 0; + foreach (var run in para.Elements(HwpxNs.Hp + "run")) + { + runIdx++; + var runPath = $"{path}/run[{runIdx}]"; + node.Children.Add(BuildRunNode(run, runPath)); + } + } + + return node; + } + + private DocumentNode BuildRunNode(XElement run, string path) + { + var text = string.Join("", run.Elements(HwpxNs.Hp + "t").Select(t => t.Value)); + text = HwpxKorean.Normalize(text); + + var node = new DocumentNode + { + Path = path, + Type = "run", + Text = text, + }; + + var charPrIdRef = run.Attribute("charPrIDRef")?.Value; + if (charPrIdRef != null) + { + node.Format["charPrIDRef"] = charPrIdRef; + + // Look up character properties from header.xml + if (_doc.Header != null) + { + var charPr = _doc.Header.Descendants(HwpxNs.Hh + "charPr") + .FirstOrDefault(cp => cp.Attribute("id")?.Value == charPrIdRef); + if (charPr != null) + { + var height = (int?)charPr.Attribute("height"); + if (height.HasValue) + node.Format["fontSize"] = $"{height.Value / 100.0}pt"; + + var textColor = charPr.Attribute("textColor")?.Value; + if (textColor != null) + node.Format["color"] = textColor; + + if (charPr.Element(HwpxNs.Hh + "bold") != null) + node.Format["bold"] = true; + if (charPr.Element(HwpxNs.Hh + "italic") != null) + node.Format["italic"] = true; + + var fontRef = charPr.Element(HwpxNs.Hh + "fontRef"); + if (fontRef != null) + { + node.Format["fontHangul"] = 
fontRef.Attribute("hangul")?.Value; + node.Format["fontLatin"] = fontRef.Attribute("latin")?.Value; + } + } + } + } + + return node; + } + + private DocumentNode BuildTableNode(XElement tbl, string path, int depth) + { + var rowCnt = (int?)tbl.Attribute("rowCnt") ?? 0; + var colCnt = (int?)tbl.Attribute("colCnt") ?? 0; + + var node = new DocumentNode + { + Path = path, + Type = "table", + Preview = $"Table {rowCnt}×{colCnt}", + ChildCount = rowCnt, + }; + + node.Format["rowCount"] = rowCnt; + node.Format["colCount"] = colCnt; + + var borderFill = tbl.Attribute("borderFillIDRef")?.Value; + if (borderFill != null) + node.Format["borderFillIDRef"] = borderFill; + + if (depth > 1) + { + int trIdx = 0; + foreach (var tr in tbl.Elements(HwpxNs.Hp + "tr")) + { + trIdx++; + var trPath = $"{path}/tr[{trIdx}]"; + node.Children.Add(BuildTableRowNode(tr, trPath, depth - 1)); + } + } + + return node; + } + + private DocumentNode BuildTableRowNode(XElement tr, string path, int depth) + { + var cells = tr.Elements(HwpxNs.Hp + "tc").ToList(); + var node = new DocumentNode + { + Path = path, + Type = "tableRow", + ChildCount = cells.Count, + }; + + if (depth > 1) + { + int tcIdx = 0; + foreach (var tc in cells) + { + tcIdx++; + var tcPath = $"{path}/tc[{tcIdx}]"; + node.Children.Add(BuildTableCellNode(tc, tcPath, depth - 1)); + } + } + + return node; + } + + private DocumentNode BuildTableCellNode(XElement tc, string path, int depth) + { + // Extract cell address (dual-format support) + var (row, col, rowSpan, colSpan) = GetCellAddr(tc); + + var node = new DocumentNode + { + Path = path, + Type = "tableCell", + }; + + node.Format["row"] = row; + node.Format["col"] = col; + node.Format["rowSpan"] = rowSpan; + node.Format["colSpan"] = colSpan; + + // Extract cell text from subList paragraphs + var subList = tc.Element(HwpxNs.Hp + "subList"); + if (subList != null) + { + var cellText = new StringBuilder(); + foreach (var p in subList.Elements(HwpxNs.Hp + "p")) + { + var pText = 
HwpxKorean.Normalize(ExtractParagraphText(p)); + if (cellText.Length > 0 && !string.IsNullOrEmpty(pText)) + cellText.Append('\n'); + cellText.Append(pText); + } + node.Text = cellText.ToString(); + node.Preview = node.Text.Length > 50 ? node.Text[..50] + "…" : node.Text; + node.ChildCount = subList.Elements(HwpxNs.Hp + "p").Count(); + } + + return node; + } + + private DocumentNode BuildSectionNode(XElement sec, string path, int depth) + { + var section = _doc.Sections.FirstOrDefault(s => s.Root == sec); + var node = new DocumentNode + { + Path = path, + Type = "section", + ChildCount = sec.Elements().Count(), + }; + + if (section != null) + { + node.Preview = $"Section {section.Index + 1}: {section.Paragraphs.Count}p, {section.Tables.Count}tbl"; + } + + // Section properties + var secPr = sec.Descendants(HwpxNs.Hp + "secPr").FirstOrDefault(); + if (secPr != null) + { + node.Format["textDirection"] = secPr.Attribute("textDirection")?.Value; + var pagePr = secPr.Element(HwpxNs.Hp + "pagePr"); + if (pagePr != null) + { + node.Format["pageWidth"] = (int?)pagePr.Attribute("width"); + node.Format["pageHeight"] = (int?)pagePr.Attribute("height"); + node.Format["landscape"] = pagePr.Attribute("landscape")?.Value; + } + } + + if (depth > 0 && section != null) + { + PopulateSectionChildren(node, section, depth); + } + + return node; + } + + private void PopulateSectionChildren(DocumentNode node, HwpxSection section, int depth) + { + int pIdx = 0, tblIdx = 0; + foreach (var child in section.Root.Elements()) + { + var localName = child.Name.LocalName; + if (localName == "p") + { + pIdx++; + var childPath = $"{node.Path}/p[{pIdx}]"; + node.Children.Add(BuildParagraphNode(child, childPath, depth - 1)); + } + else if (localName == "tbl") + { + tblIdx++; + var childPath = $"{node.Path}/tbl[{tblIdx}]"; + node.Children.Add(BuildTableNode(child, childPath, depth - 1)); + } + } + } + + private DocumentNode BuildGenericNode(XElement element, string path, int depth) + { + var node = 
new DocumentNode + { + Path = path, + Type = element.Name.LocalName, + ChildCount = element.Elements().Count(), + }; + + // Copy all attributes to format + foreach (var attr in element.Attributes()) + { + node.Format[attr.Name.LocalName] = attr.Value; + } + + if (element.HasElements && depth > 1) + { + // Per-type counters so paths are resolvable: a[1], b[1], a[2] — not a[1], b[2], a[3] + var childCounts = new Dictionary(); + foreach (var child in element.Elements()) + { + var localName = child.Name.LocalName; + childCounts.TryGetValue(localName, out int count); + childCounts[localName] = ++count; + var childPath = $"{path}/{MapElementToPathName(localName)}[{count}]"; + node.Children.Add(BuildDocumentNode(child, childPath, depth - 1)); + } + } + else if (!element.HasElements) + { + node.Text = element.Value; + } + + return node; + } + + public List Query(string selector) + { + if (string.IsNullOrEmpty(selector)) + throw new ArgumentException("Selector cannot be empty"); + + var elements = ExecuteSelector(selector); + return elements.Select(e => BuildDocumentNode(e, BuildPath(e), 1)).ToList(); + } + + private void SaveSection(XElement element) + { + // Walk up to find which section this element belongs to + var current = element; + while (current != null) + { + if (current.Name.Namespace == HwpxNs.Hs && current.Name.LocalName == "sec") + break; + current = current.Parent; + } + + if (current == null) + throw new InvalidOperationException("Cannot determine section for element"); + + var section = _doc.Sections.First(s => s.Root == current); + var entryName = section.EntryPath; + // Delete-and-recreate pattern (avoids trailing bytes from SetLength(0)) + // For new sections, entry may not exist yet — just create. + var entry = _doc.Archive.GetEntry(entryName); + entry?.Delete(); + var newEntry = _doc.Archive.CreateEntry(entryName, CompressionLevel.Optimal); + using var stream = newEntry.Open(); + // CRITICAL: Hancom requires single-line (minified) XML without BOM. 
+        // XDocument.ToString() pretty-prints by default — use DisableFormatting.
+        var xmlStr = HwpxPacker.MinifyXml(section.Document.ToString(SaveOptions.DisableFormatting));
+        // Restore 2016 namespace for hp10 prefix — Hancom expects this original URI
+        xmlStr = HwpxPacker.RestoreOriginalNamespaces(xmlStr);
+        // Prepend XML declaration (without BOM)
+        xmlStr = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" + xmlStr;
+        var bytes = System.Text.Encoding.UTF8.GetBytes(xmlStr);
+        stream.Write(bytes, 0, bytes.Length);
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxHandler.Raw.cs b/src/officecli/Handlers/Hwpx/HwpxHandler.Raw.cs
new file mode 100644
index 000000000..0ba1bf4cc
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxHandler.Raw.cs
@@ -0,0 +1,161 @@
+// File: src/officecli/Handlers/Hwpx/HwpxHandler.Raw.cs
+using System.IO.Compression;
+using System.Xml.Linq;
+using System.Xml.XPath;
+using OfficeCli.Core;
+
+namespace OfficeCli.Handlers;
+
+public partial class HwpxHandler
+{
+    // ==================== Raw Layer ====================
+
+    /// <summary>
+    /// Return formatted XML string for the ZIP entry at partPath.
+    /// partPath is a ZIP entry name e.g. "Contents/section0.xml", "Contents/header.xml".
+    /// startRow/endRow/cols are ignored for HWPX (Excel compatibility params only).
+    /// </summary>
+    public string Raw(string partPath, int? startRow = null, int? endRow = null, HashSet<string>? cols = null)
+    {
+        var entry = _doc.Archive.GetEntry(partPath)
+            ?? throw new CliException($"Part not found: {partPath}") { Code = "not_found" };
+
+        using var stream = entry.Open();
+        var part = XDocument.Load(stream);
+        return part.ToString();
+    }
+
+    ///
+    /// Apply a mutation to the element selected by xpath within the ZIP entry at partPath.
+    /// partPath = ZIP entry name (e.g. "Contents/section0.xml").
+    /// xpath = XPath 1.0 expression relative to document root (e.g. "//*[local-name()='p'][1]"). 
+ /// Actions: append | prepend | insertbefore | insertafter | replace | remove | setattr + /// + public void RawSet(string partPath, string xpath, string action, string? xml) + { + var entry = _doc.Archive.GetEntry(partPath) + ?? throw new CliException($"Part not found: {partPath}") { Code = "not_found" }; + + XDocument part; + using (var readStream = entry.Open()) + part = XDocument.Load(readStream); + + // NOTE: XPath must use local-name() syntax for namespace-prefixed elements, + // e.g. "//*[local-name()='p'][1]", because XPathSelectElement does not use + // a namespace resolver by default. + var element = part.XPathSelectElement(xpath) + ?? throw new CliException($"XPath not found: {xpath}") { Code = "not_found" }; + + switch (action.ToLowerInvariant()) + { + case "remove": + element.Remove(); + break; + + case "replace": + if (string.IsNullOrEmpty(xml)) + throw new CliException("replace action requires xml (XML fragment)") + { Code = "invalid_action" }; + element.ReplaceWith(XElement.Parse(xml)); + break; + + case "setattr": + if (string.IsNullOrEmpty(xml)) + throw new CliException("setattr action requires xml in format 'attrName=value'") + { Code = "invalid_action" }; + var eqIdx = xml.IndexOf('='); + if (eqIdx <= 0) + throw new CliException("setattr xml must be in format 'attrName=value'") + { Code = "invalid_action" }; + element.SetAttributeValue(xml[..eqIdx], xml[(eqIdx + 1)..]); + break; + + case "append": + if (string.IsNullOrEmpty(xml)) + throw new CliException("append action requires xml (XML fragment)") + { Code = "invalid_action" }; + element.Add(XElement.Parse(xml)); + break; + + case "prepend": + if (string.IsNullOrEmpty(xml)) + throw new CliException("prepend action requires xml (XML fragment)") + { Code = "invalid_action" }; + element.AddFirst(XElement.Parse(xml)); + break; + + case "insertbefore": + if (string.IsNullOrEmpty(xml)) + throw new CliException("insertbefore action requires xml (XML fragment)") + { Code = "invalid_action" }; + 
element.AddBeforeSelf(XElement.Parse(xml)); + break; + + case "insertafter": + if (string.IsNullOrEmpty(xml)) + throw new CliException("insertafter action requires xml (XML fragment)") + { Code = "invalid_action" }; + element.AddAfterSelf(XElement.Parse(xml)); + break; + + default: + throw new CliException( + $"Unknown action: '{action}'. Valid actions: append, prepend, insertbefore, insertafter, replace, remove, setattr") + { Code = "invalid_action" }; + } + + _dirty = true; + // Write modified part back into ZIP archive (must be ZipArchiveMode.Update) + // Delete-and-recreate pattern (avoids trailing bytes from SetLength(0)) + var writeEntry = _doc.Archive.GetEntry(partPath) + ?? throw new CliException($"Part not found for write: {partPath}"); + var entryName = writeEntry.FullName; + writeEntry.Delete(); + var newEntry = _doc.Archive.CreateEntry(entryName, CompressionLevel.Optimal); + using var writeStream = newEntry.Open(); + part.Save(writeStream); + + // Refresh in-memory DOM so subsequent get/query/set/validate see the raw edit + RefreshCachedDocument(partPath, part); + } + + /// + /// After a raw ZIP write, synchronize the in-memory XDocument cache for the affected part. + /// Prevents stale reads in resident mode where the same handler instance is reused. + /// + private void RefreshCachedDocument(string partPath, XDocument updatedDoc) + { + // Check if this is the header + if (_doc.HeaderEntryPath != null + && string.Equals(partPath, _doc.HeaderEntryPath, StringComparison.OrdinalIgnoreCase)) + { + _doc.Header = updatedDoc; + return; + } + + // Check if this is a section + var section = _doc.Sections.FirstOrDefault(s => + string.Equals(s.EntryPath, partPath, StringComparison.OrdinalIgnoreCase)); + if (section != null) + { + section.Document = updatedDoc; + } + } + + /// + /// HWPX uses OPF packaging, NOT OPC. AddPart is meaningless for HWPX. + /// Always throws CliException with unsupported_operation code. 
+    ///
+    public (string RelId, string PartPath) AddPart(string parentPartPath, string partType,
+        Dictionary<string, string>? properties = null)
+    {
+        throw new CliException(
+            "HWPX uses OPF packaging and does not support arbitrary part addition. " +
+            "Use Raw() to modify existing XML entries directly.")
+        {
+            Code = "unsupported_operation",
+            Suggestion = "Use 'raw' or 'raw-set' commands to modify existing HWPX XML content.",
+            Help = "officecli raw document.hwpx Contents/section0.xml"
+        };
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxHandler.Set.cs b/src/officecli/Handlers/Hwpx/HwpxHandler.Set.cs
new file mode 100644
index 000000000..314145e8d
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxHandler.Set.cs
@@ -0,0 +1,2145 @@
+using System.IO.Compression;
+using System.Xml.Linq;
+using OfficeCli.Core;
+
+namespace OfficeCli.Handlers;
+
+public partial class HwpxHandler
+{
+    // ==================== Set Layer ====================
+
+    /// <summary>
+    /// Apply a set of properties to the element at the given path.
+    /// Returns names of properties that could not be applied (unsupported).
+    /// </summary>
+    public List<string> Set(string path, Dictionary<string, string> properties)
+    {
+        var unsupported = new List<string>();
+
+        // Batch Set: @selector path → Query + Set each result
+        if (path.StartsWith("@"))
+        {
+            var matches = Query(path);
+            foreach (var match in matches)
+            {
+                if (!string.IsNullOrEmpty(match.Path))
+                    Set(match.Path, new Dictionary<string, string>(properties));
+            }
+            return unsupported;
+        }
+
+        // Find/replace: supports any scope path, regex, format filter
+        if (properties.ContainsKey("find"))
+        {
+            var findText = properties["find"];
+            var replaceText = properties.GetValueOrDefault("replace") ?? "";
+
+            XElement? 
scope = null; + if (path is not ("/" or "" or "/body")) + { + try { scope = ResolvePath(path); } catch { /* fall through to full doc */ } + } + + var formatFilter = new Dictionary(); + foreach (var fk in new[] { "bold", "italic", "color", "fontsize" }) + { + if (properties.TryGetValue(fk, out var fv)) + formatFilter[fk] = fv; + } + + FindAndReplace(findText, replaceText, scope, + formatFilter.Count > 0 ? formatFilter : null); + + var remaining = new Dictionary(properties, StringComparer.OrdinalIgnoreCase); + foreach (var k in new[] { "find", "replace", "bold", "italic", "color", "fontsize" }) + remaining.Remove(k); + if (remaining.Count > 0) + unsupported.AddRange(remaining.Keys); + return unsupported; + } + + // Label-based table fill: fill:라벨=값 (Plan 70) + var fillKeys = properties.Keys + .Where(k => k.StartsWith("fill:", StringComparison.OrdinalIgnoreCase)).ToList(); + if (fillKeys.Count > 0) + { + var fillProps = fillKeys.ToDictionary( + k => k["fill:".Length..], + k => properties[k], + StringComparer.OrdinalIgnoreCase); + var fillResult = FillByLabel(fillProps); + if (fillResult.Unmatched.Count > 0) + unsupported.AddRange(fillResult.Unmatched.Select(u => $"fill:{u}")); + // If only fill: props were provided, return (don't mark as unsupported) + if (fillKeys.Count == properties.Count) return unsupported; + } + + // /table/fill pseudo-path: all props are label=value + if (path.Equals("/table/fill", StringComparison.OrdinalIgnoreCase)) + { + var tableFillResult = FillByLabel(properties); + unsupported.AddRange(tableFillResult.Unmatched); + return unsupported; + } + + // Document-level properties + if (path is "/" or "" or "/body") + { + + // Document-level properties (default font, default font size) + var docHandled = false; + foreach (var (key, value) in properties) + { + switch (key.ToLowerInvariant()) + { + case "defaultfont" or "basefont": + var charPrFont = FindCharPr("0"); + if (charPrFont != null) { ApplyCharPrProperty(charPrFont, "fonthangul", value); 
docHandled = true; } + break; + case "defaultfontsize" or "basefontsize": + var charPrSize = FindCharPr("0"); + if (charPrSize != null) { ApplyCharPrProperty(charPrSize, "fontsize", value); docHandled = true; } + break; + case "title" or "doctitle": + SetMetadata("title", value); docHandled = true; break; + case "creator" or "author": + SetMetadata("creator", value); docHandled = true; break; + case "subject": + SetMetadata("subject", value); docHandled = true; break; + case "description": + SetMetadata("description", value); docHandled = true; break; + case "language": + SetMetadata("language", value); docHandled = true; break; + case "keyword" or "keywords": + SetMetadata("keyword", value); docHandled = true; break; + default: + unsupported.Add(key); + break; + } + } + if (docHandled) { _dirty = true; SaveHeader(); } + return unsupported; + } + + // Form field editing: /formfield[id] or /clickhere[id] + if (path.StartsWith("/formfield[", StringComparison.OrdinalIgnoreCase) + || path.StartsWith("/clickhere[", StringComparison.OrdinalIgnoreCase)) + { + var fieldId = path[(path.IndexOf('[') + 1)..].TrimEnd(']'); + return SetFormFieldValue(fieldId, properties); + } + + // Style editing: /header/style[N] path — handle before generic resolution + if (path.StartsWith("/header/style", StringComparison.OrdinalIgnoreCase)) + { + var style = ResolvePath(path); + foreach (var (key, value) in properties) + { + switch (key.ToLowerInvariant()) + { + case "name": + style.SetAttributeValue("name", value); + break; + case "engname": + style.SetAttributeValue("engName", value); + break; + case "font" or "fontfamily" or "fonthangul": + var sCharPrIdRef = style.Attribute("charPrIDRef")?.Value; + if (sCharPrIdRef != null) + { + var sCharPr = FindCharPr(sCharPrIdRef); + if (sCharPr != null) + ApplyCharPrProperty(sCharPr, "fonthangul", value); + } + break; + case "fontlatin": + var sCharPrIdRef2 = style.Attribute("charPrIDRef")?.Value; + if (sCharPrIdRef2 != null) + { + var sCharPr2 = 
FindCharPr(sCharPrIdRef2); + if (sCharPr2 != null) + ApplyCharPrProperty(sCharPr2, "fontlatin", value); + } + break; + case "size" or "fontsize": + var sCharPrIdRef3 = style.Attribute("charPrIDRef")?.Value; + if (sCharPrIdRef3 != null) + { + var sCharPr3 = FindCharPr(sCharPrIdRef3); + if (sCharPr3 != null) + ApplyCharPrProperty(sCharPr3, "fontsize", value); + } + break; + case "bold" or "italic": + var sCharPrId4 = style.Attribute("charPrIDRef")?.Value; + if (sCharPrId4 != null) + { + var sCharPr4 = FindCharPr(sCharPrId4); + if (sCharPr4 != null) + ApplyCharPrProperty(sCharPr4, key.ToLowerInvariant(), value); + } + break; + case "color": + var sCharPrId5 = style.Attribute("charPrIDRef")?.Value; + if (sCharPrId5 != null) + { + var sCharPr5 = FindCharPr(sCharPrId5); + if (sCharPr5 != null) + ApplyCharPrProperty(sCharPr5, "textcolor", value); + } + break; + case "alignment" or "align": + var sParaPrId = style.Attribute("paraPrIDRef")?.Value; + if (sParaPrId != null) + { + var sParaPr = _doc.Header!.Root!.Descendants(HwpxNs.Hh + "paraPr") + .FirstOrDefault(p => p.Attribute("id")?.Value == sParaPrId); + var alignEl = sParaPr?.Element(HwpxNs.Hh + "align"); + if (alignEl == null && sParaPr != null) + { + alignEl = new XElement(HwpxNs.Hh + "align"); + sParaPr.Add(alignEl); + } + alignEl?.SetAttributeValue("horizontal", value.ToUpperInvariant()); + } + break; + default: + unsupported.Add(key); + break; + } + } + _dirty = true; + SaveHeader(); + return unsupported; + } + + var element = ResolvePath(path); + + foreach (var (key, value) in properties) + { + switch (element.Name.LocalName) + { + case "p": + if (!SetParagraphProp(element, key, value)) + unsupported.Add(key); + break; + case "run": + if (!SetRunProp(element, key, value)) + unsupported.Add(key); + break; + case "t": + if (key.Equals("text", StringComparison.OrdinalIgnoreCase)) + SetTextProp(element, value); + else + unsupported.Add(key); // Don't silently coerce unsupported keys to text + break; + case "tc": + if 
(!SetCellProp(element, key, value)) + unsupported.Add(key); + break; + case "tr": + if (!SetRowProp(element, key, value)) + unsupported.Add(key); + break; + case "tbl": + if (!SetTableProp(element, key, value)) + unsupported.Add(key); + break; + case "sec": + if (!SetSectionProp(element, key, value)) + unsupported.Add(key); + break; + case "line" or "rect" or "ellipse" or "polygon" or "pic" or "connectLine": + if (!SetShapeProp(element, key, value)) + unsupported.Add(key); + break; + default: + SetGenericAttr(element, key, value); + break; + } + } + + _dirty = true; + // Save to correct part: header elements live in header.xml, not a section + if (element.Document?.Root == _doc.Header?.Root) + SaveHeader(); + else + SaveSection(element); + return unsupported; + } + + // ==================== Text ==================== + + /// + /// Remove stale linesegarray from the nearest ancestor <hp:p>. + /// HWPX linesegarray is a layout cache generated by Hancom's renderer. + /// When text content changes, the cache becomes stale and causes Hancom to + /// render text on a single compressed line (overlapping characters). + /// Removing it forces Hancom to recalculate layout on open. + /// + private static void InvalidateLinesegarray(XElement element) + { + var para = element.Name.LocalName == "p" + ? element + : element.AncestorsAndSelf().FirstOrDefault(e => e.Name.LocalName == "p"); + para?.Elements(HwpxNs.Hp + "linesegarray").Remove(); + } + + /// + /// Replace the text content of an <hp:t> element. + /// + private void SetTextProp(XElement tElement, string value) + { + InvalidateLinesegarray(tElement); + tElement.Value = value; + } + + // ==================== Table ==================== + + /// + /// Dispatch table property by name. 
+ /// + private bool SetTableProp(XElement tbl, string property, string value) + { + var lower = property.ToLowerInvariant(); + if (lower.StartsWith("colwidth")) + return SetIndividualColWidth(tbl, lower, value); + return lower switch + { + "borderfillid" or "borderfillidref" => SetAttribute(tbl, "borderFillIDRef", value), + "cellspacing" => SetAttribute(tbl, "cellSpacing", value), + "align" or "tablealign" => SetTableAlignment(tbl, value), + _ => false + }; + } + + // ==================== Table Row ==================== + + private bool SetRowProp(XElement tr, string property, string value) + { + return property.ToLowerInvariant() switch + { + "height" or "rowheight" => SetRowHeight(tr, value), + _ => false + }; + } + + private static bool SetRowHeight(XElement tr, string value) + { + if (!int.TryParse(value, out var h)) return false; + foreach (var tc in tr.Elements(HwpxNs.Hp + "tc")) + { + var cellSz = tc.Element(HwpxNs.Hp + "cellSz"); + cellSz?.SetAttributeValue("height", h.ToString()); + } + return true; + } + + private bool SetTableAlignment(XElement tbl, string value) + { + var parentP = tbl.Ancestors(HwpxNs.Hp + "p").FirstOrDefault(); + if (parentP != null) + return SetParagraphProp(parentP, "align", value) == true; + return false; + } + + private static bool SetIndividualColWidth(XElement tbl, string propName, string value) + { + var indexStr = propName.Replace("colwidth", ""); + if (!int.TryParse(indexStr, out var colIdx)) return false; + colIdx--; + var colSzElements = tbl.Elements(HwpxNs.Hp + "colSz").ToList(); + if (colIdx < 0 || colIdx >= colSzElements.Count) return false; + colSzElements[colIdx].SetAttributeValue("width", value); + return true; + } + + // ==================== Table Cell ==================== + + /// + /// Dispatch table cell property by name. + /// Supports: text, colspan, rowspan, borderfillid. 
+    ///
+    private bool SetCellProp(XElement tc, string property, string value)
+    {
+        return property.ToLowerInvariant() switch
+        {
+            "text" => SetCellText(tc, value),
+            "colspan" => SetCellSpan(tc, "colSpan", value),
+            "rowspan" => SetCellSpan(tc, "rowSpan", value),
+            "borderfillid" or "borderfillidref" => SetAttribute(tc, "borderFillIDRef", value),
+            "valign" or "verticalalign" or "vertical-align" => SetCellVertAlign(tc, value),
+            "align" or "halign" or "textalign" => SetCellHorzAlign(tc, value),
+            "shading" or "bgcolor" or "fillcolor" => SetCellShading(tc, value),
+            "bordercolor" => SetCellBorder(tc, color: value),
+            "borderwidth" => SetCellBorder(tc, width: value),
+            "bordertype" or "borderstyle" => SetCellBorder(tc, type: value),
+            _ => false
+        };
+    }
+
+    /// <summary>
+    /// Set vertical alignment of cell content via the subList's vertAlign attribute.
+    /// </summary>
+    private static bool SetCellVertAlign(XElement tc, string value)
+    {
+        var subList = tc.Element(HwpxNs.Hp + "subList");
+        if (subList == null) return false;
+        subList.SetAttributeValue("vertAlign", value.ToUpperInvariant());
+        return true;
+    }
+
+    /// <summary>Set horizontal text alignment inside a cell by modifying the cell's paragraph paraPr.</summary>
+    private bool SetCellHorzAlign(XElement tc, string value)
+    {
+        var subList = tc.Element(HwpxNs.Hp + "subList");
+        var para = subList?.Element(HwpxNs.Hp + "p") ?? tc.Element(HwpxNs.Hp + "p");
+        if (para == null) return false;
+        return SetParagraphProp(para, "align", value);
+    }
+
+    /// <summary>
+    /// Set text content of a table cell by navigating tc → subList → p → run → t.
+    /// </summary>
+    private bool SetCellText(XElement tc, string text)
+    {
+        var subList = tc.Element(HwpxNs.Hp + "subList");
+        if (subList == null) return false;
+
+        var paragraphs = subList.Elements(HwpxNs.Hp + "p").ToList();
+        if (paragraphs.Count == 0) return false;
+
+        // Set text on the first paragraph
+        var result = SetParagraphText(paragraphs[0], text);
+
+        // Remove ALL remaining paragraphs (guide text, placeholders, etc.)
+        // This ensures template guide text like "※ 내용을 입력하세요" ("please enter content") is cleared. 
+ foreach (var extra in paragraphs.Skip(1)) + extra.Remove(); + + return result; + } + + /// + /// Set rowSpan or colSpan on a cell. Prefers the separate <hp:cellSpan> element + /// (Hancom native format); falls back to cellAddr attributes for legacy documents. + /// + private static bool SetCellSpan(XElement tc, string spanAttr, string value) + { + if (!int.TryParse(value, out var spanVal) || spanVal < 1) + return false; + + // Prefer separate element (Hancom native format) + var cellSpan = tc.Element(HwpxNs.Hp + "cellSpan"); + if (cellSpan != null) + { + cellSpan.SetAttributeValue(spanAttr, spanVal.ToString()); + return true; + } + + // Fallback: create cellSpan element if cellAddr exists + var cellAddr = tc.Element(HwpxNs.Hp + "cellAddr"); + if (cellAddr == null) return false; + + // Check if span was on cellAddr (legacy) + if (cellAddr.Attribute(spanAttr) != null) + { + cellAddr.SetAttributeValue(spanAttr, spanVal.ToString()); + return true; + } + + // Create new cellSpan element after cellAddr + var newCellSpan = new XElement(HwpxNs.Hp + "cellSpan", + new XAttribute("colSpan", spanAttr == "colSpan" ? spanVal.ToString() : "1"), + new XAttribute("rowSpan", spanAttr == "rowSpan" ? spanVal.ToString() : "1")); + cellAddr.AddAfterSelf(newCellSpan); + return true; + } + + // ==================== Section ==================== + + /// + /// Dispatch section-level property by name. + /// Section properties live in secPr (child of section root). 
+ /// + private bool SetSectionProp(XElement sectionRoot, string property, string value) + { + return property.ToLowerInvariant() switch + { + "pagebackground" or "pagebg" or "backgroundcolor" => SetPageBackground(sectionRoot, value), + "orientation" => SetOrientation(sectionRoot, value), + "pagewidth" => SetPageDimension(sectionRoot, "width", value), + "pageheight" => SetPageDimension(sectionRoot, "height", value), + "margintop" or "margin-top" => SetPageMargin(sectionRoot, "top", value), + "marginbottom" or "margin-bottom" => SetPageMargin(sectionRoot, "bottom", value), + "marginleft" or "margin-left" => SetPageMargin(sectionRoot, "left", value), + "marginright" or "margin-right" => SetPageMargin(sectionRoot, "right", value), + _ => false + }; + } + + private bool SetOrientation(XElement sectionRoot, string value) + { + var pagePr = sectionRoot.Descendants(HwpxNs.Hp + "pagePr").FirstOrDefault(); + if (pagePr == null) return false; + // Hancom: NARROWLY = landscape, WIDELY = portrait. Dimensions DON'T change. + var isLandscape = value.Equals("LANDSCAPE", StringComparison.OrdinalIgnoreCase) + || value.Equals("NARROWLY", StringComparison.OrdinalIgnoreCase); + pagePr.SetAttributeValue("landscape", isLandscape ? "NARROWLY" : "WIDELY"); + return true; + } + + private static bool SetPageDimension(XElement sectionRoot, string attr, string value) + { + var pagePr = sectionRoot.Descendants(HwpxNs.Hp + "pagePr").FirstOrDefault(); + pagePr?.SetAttributeValue(attr, value); + return pagePr != null; + } + + private static bool SetPageMargin(XElement sectionRoot, string side, string value) + { + var margin = sectionRoot.Descendants(HwpxNs.Hp + "margin") + .FirstOrDefault(m => m.Parent?.Name == HwpxNs.Hp + "pagePr"); + margin?.SetAttributeValue(side, value); + return margin != null; + } + + /// + /// Set page background color via secPr > pageBorderFill. + /// Creates a borderFill with no borders and the specified fill color. 
+ /// + private bool SetPageBackground(XElement sectionRoot, string color) + { + var bfId = CreateCustomBorderFill( + borderColor: "#000000", borderWidth: "0.00mm", borderType: "NONE", + fillColor: color); + + var secPr = sectionRoot.Descendants(HwpxNs.Hp + "secPr").FirstOrDefault(); + if (secPr == null) return false; + + var pageBf = secPr.Element(HwpxNs.Hp + "pageBorderFill"); + if (pageBf == null) + { + pageBf = new XElement(HwpxNs.Hp + "pageBorderFill", + new XAttribute("type", "BOTH"), + new XAttribute("borderFillIDRef", bfId)); + secPr.Add(pageBf); + } + else + { + pageBf.SetAttributeValue("borderFillIDRef", bfId); + } + return true; + } + + // ==================== Paragraph ==================== + + /// + /// Dispatch paragraph property by name. + /// Returns true if the property was recognized and applied. + /// + private bool SetParagraphProp(XElement p, string property, string value) + { + var lower = property.ToLowerInvariant(); + var result = lower switch + { + "text" => SetParagraphText(p, value), + "style" or "styleidref" => SetAttribute(p, "styleIDRef", value), + "align" or "alignment" => SetParagraphAlignment(p, value), + "indent" or "leftindent" => SetParagraphIndent(p, value, "left"), + "rightindent" => SetParagraphIndent(p, value, "right"), + "parapridref" => SetAttribute(p, "paraPrIDRef", value), + "spacebefore" or "spacingbefore" => SetParaPrSpacing(p, "before", value), + "spaceafter" or "spacingafter" => SetParaPrSpacing(p, "after", value), + "linespacing" or "lineheight" => SetParaPrSpacing(p, "lineSpacing", value), + "linespacingtype" => SetParaPrSpacing(p, "lineSpacingType", value), + "outlinelevel" or "heading" => SetParaPrHeadingLevel(p, value), + "liststyle" or "list" or "bullet" => SetListStyle(p, value), + "keepnext" or "keepwithnext" => SetBreakSetting(p, "keepWithNext", value), + "keeplines" => SetBreakSetting(p, "keepLines", value), + "pagebreakbefore" => SetBreakSetting(p, "pageBreakBefore", value), + "widowcontrol" or "widoworphan" 
=> SetBreakSetting(p, "widowOrphan", value),
+            "hangingindent" or "hanging" => SetParagraphHangingIndent(p, value),
+            "stylename" => SetStyleByName(p, value),
+            _ => (bool?)null // not a paragraph-level prop
+        };
+        if (result.HasValue) return result.Value;
+
+        // Delegate run-level properties (bold, italic, superscript, highlight, etc.)
+        // to ALL runs inside the paragraph
+        var runs = p.Elements(HwpxNs.Hp + "run").ToList();
+        if (runs.Count == 0) return false;
+        bool any = false;
+        foreach (var run in runs)
+        {
+            if (SetRunProp(run, property, value))
+                any = true;
+        }
+        return any;
+    }
+
+    /// <summary>
+    /// Clear existing runs and set new text in a single run.
+    /// </summary>
+    private bool SetParagraphText(XElement para, string text)
+    {
+        // Preserve first run's charPrIDRef if available
+        var existingRun = para.Elements(HwpxNs.Hp + "run").FirstOrDefault();
+        var charPrIdRef = existingRun?.Attribute("charPrIDRef")?.Value ?? "0";
+
+        // Remove stale linesegarray (layout cache); see InvalidateLinesegarray().
+ InvalidateLinesegarray(para); + + // CRITICAL: preserve runs that contain secPr, ctrl, or other structural elements + // Only remove runs that are purely text-bearing + var runs = para.Elements(HwpxNs.Hp + "run").ToList(); + var structuralRuns = runs.Where(r => + r.Elements(HwpxNs.Hp + "secPr").Any() || + r.Elements(HwpxNs.Hp + "ctrl").Any()).ToList(); + var textRuns = runs.Except(structuralRuns).ToList(); + + // Remove only text runs, strip text from structural runs + foreach (var tr in textRuns) + tr.Remove(); + foreach (var sr in structuralRuns) + sr.Elements(HwpxNs.Hp + "t").Remove(); + + // Add new text run + var run = new XElement(HwpxNs.Hp + "run", + new XAttribute("charPrIDRef", charPrIdRef), + new XElement(HwpxNs.Hp + "t", text)); + // Insert text run before structural runs so text appears first + var firstStructural = para.Elements(HwpxNs.Hp + "run").FirstOrDefault(); + if (firstStructural != null) + firstStructural.AddBeforeSelf(run); + else + para.Add(run); + return true; + } + + /// + /// Set paragraph alignment via header.xml paraPr. + /// Alignment values: "left", "center", "right", "justify", "distribute". + /// Real HWPX stores alignment as a CHILD ELEMENT: <hh:align horizontal="LEFT" vertical="BASELINE"/> + /// Values are UPPERCASE: LEFT, CENTER, RIGHT, JUSTIFY, DISTRIBUTE. 
+    /// </summary>
+    private bool SetParagraphAlignment(XElement para, string alignment)
+    {
+        if (_doc.Header?.Root == null)
+            return false;
+
+        // HWPX uses uppercase alignment values
+        var normalizedAlign = alignment.ToLowerInvariant() switch
+        {
+            "left" or "l" => "LEFT",
+            "center" or "c" => "CENTER",
+            "right" or "r" => "RIGHT",
+            "justify" or "j" => "JUSTIFY",
+            "distribute" or "d" => "DISTRIBUTE",
+            _ => alignment.ToUpperInvariant()
+        };
+
+        var paraPr = CloneParaPrIfShared(para);
+        if (paraPr == null)
+            return false;
+
+        // Alignment is a child element
+        var alignEl = paraPr.Element(HwpxNs.Hh + "align");
+        if (alignEl == null)
+        {
+            alignEl = new XElement(HwpxNs.Hh + "align",
+                new XAttribute("horizontal", normalizedAlign),
+                new XAttribute("vertical", "BASELINE"));
+            paraPr.AddFirst(alignEl);
+        }
+        else
+        {
+            alignEl.SetAttributeValue("horizontal", normalizedAlign);
+        }
+
+        SaveHeader();
+        return true;
+    }
+
+    /// <summary>
+    /// Set paragraph indentation via header.xml paraPr.
+    /// Units are HWPUNIT (7200 per inch; 1000 ≈ 10pt).
+    /// </summary>
+    private bool SetParagraphIndent(XElement para, string value, string side)
+    {
+        if (_doc.Header?.Root == null)
+            return false;
+
+        if (!int.TryParse(value, out var indentValue))
+            return false;
+
+        // Map user-facing side names to HWPX element local names
+        var elementName = side.ToLowerInvariant() switch
+        {
+            "left" => "left",
+            "right" => "right",
+            "indent" or "intent" => "intent", // first-line indent
+            "before" or "prev" => "prev",     // space before paragraph
+            "after" or "next" => "next",      // space after paragraph
+            _ => side
+        };
+
+        var paraPr = CloneParaPrIfShared(para);
+        if (paraPr == null)
+            return false;
+
+        // Find <hh:margin>. If it lives inside <hp:switch>/<hp:default>, target the default.
+        var margin = paraPr.Element(HwpxNs.Hh + "margin")
+            ??
paraPr.Descendants(HwpxNs.Hh + "margin")
+                .FirstOrDefault(m => m.Parent?.Name.LocalName == "default");
+        if (margin == null)
+        {
+            margin = new XElement(HwpxNs.Hh + "margin");
+            paraPr.Add(margin);
+        }
+
+        // Margin values are child elements, e.g. <hc:left value="800" unit="HWPUNIT"/>
+        var child = margin.Element(HwpxNs.Hc + elementName);
+        if (child == null)
+        {
+            child = new XElement(HwpxNs.Hc + elementName,
+                new XAttribute("value", indentValue.ToString()),
+                new XAttribute("unit", "HWPUNIT"));
+            margin.Add(child);
+        }
+        else
+        {
+            child.SetAttributeValue("value", indentValue.ToString());
+        }
+
+        SaveHeader();
+        return true;
+    }
+
+    // ==================== Run ====================
+
+    /// <summary>
+    /// Dispatch run property by name.
+    /// Run properties are stored on the charPr in header.xml.
+    /// </summary>
+    private bool SetRunProp(XElement run, string property, string value)
+    {
+        return property.ToLowerInvariant() switch
+        {
+            "text" => SetRunText(run, value),
+            "charpridref" => SetAttribute(run, "charPrIDRef", value),
+            "bold" or "italic" or "underline" or "strikeout"
+                or "fontsize" or "textcolor" or "color"
+                or "fonthangul" or "fontlatin"
+                or "superscript" or "subscript"
+                or "charspacing" or "letterspacing" or "spacing"
+                or "shadecolor" or "shading"
+                => EnsureCharPrProp(run, property.ToLowerInvariant(), value),
+            "highlight" or "markpen" => SetHighlight(run, value),
+            _ => false
+        };
+    }
+
+    /// <summary>
+    /// Replace text content of all <hp:t> children in a run.
+    /// </summary>
+    private bool SetRunText(XElement run, string text)
+    {
+        InvalidateLinesegarray(run);
+        var tElements = run.Elements(HwpxNs.Hp + "t").ToList();
+        if (tElements.Count == 0)
+        {
+            run.Add(new XElement(HwpxNs.Hp + "t", text));
+        }
+        else
+        {
+            // Set text on the first <hp:t>, remove the rest
+            tElements[0].Value = text;
+            foreach (var extra in tElements.Skip(1))
+                extra.Remove();
+        }
+        return true;
+    }
+
+    // ==================== Paragraph Spacing ====================
+
+    /// <summary>
+    /// Set a spacing attribute on the paragraph's paraPr in header.xml.
+    /// Margins live as <hh:margin> children inside <hp:switch> blocks; line spacing lives
+    /// as attributes on <hh:lineSpacing>.
+    /// attrName: "before", "after", "lineSpacing", "lineSpacingType".
+    /// lineSpacingType: PERCENT, FIXED, BETWEEN_LINES.
+    /// </summary>
+    private bool SetParaPrSpacing(XElement para, string attrName, string value)
+    {
+        if (_doc.Header?.Root == null)
+            return false;
+
+        var paraPr = CloneParaPrIfShared(para);
+        if (paraPr == null)
+            return false;
+
+        // Remove old-style <hh:spacing> element (incorrect structure from a prior implementation)
+        paraPr.Element(HwpxNs.Hh + "spacing")?.Remove();
+
+        // HWPX spacing uses <hp:switch> with <hp:case>/<hp:default> blocks,
+        // each containing <hh:margin> (with <hc:prev>/<hc:next>) and <hh:lineSpacing>
+        var hpSwitch = paraPr.Element(HwpxNs.Hp + "switch");
+        if (hpSwitch == null)
+        {
+            hpSwitch = new XElement(HwpxNs.Hp + "switch");
+            var border = paraPr.Element(HwpxNs.Hh + "border");
+            if (border != null)
+                border.AddBeforeSelf(hpSwitch);
+            else
+                paraPr.Add(hpSwitch);
+        }
+
+        var hpCase = hpSwitch.Element(HwpxNs.Hp + "case");
+        if (hpCase == null)
+        {
+            hpCase = new XElement(HwpxNs.Hp + "case",
+                new XAttribute(HwpxNs.Hp + "required-namespace",
+                    "http://www.hancom.co.kr/hwpml/2016/HwpUnitChar"));
+            hpSwitch.AddFirst(hpCase);
+        }
+
+        var hpDefault = hpSwitch.Element(HwpxNs.Hp + "default");
+        if (hpDefault == null)
+        {
+            hpDefault = new XElement(HwpxNs.Hp + "default");
+            hpSwitch.Add(hpDefault);
+        }
+
+        if (attrName == "lineSpacing")
+        {
+            // lineSpacing value is the same in both case and default
+            SetLineSpacingInBlock(hpCase, "value", value);
+            SetLineSpacingInBlock(hpDefault, "value", value);
+        }
+        else if (attrName == "lineSpacingType")
+        {
+            SetLineSpacingInBlock(hpCase, "type", value);
+            SetLineSpacingInBlock(hpDefault, "type", value);
+        }
+        else
+        {
+            // before → prev, after → next
+            var marginChild = attrName == "before" ?
"prev" : "next"; + if (!int.TryParse(value, out var caseVal)) + return false; + + var defaultVal = caseVal * 2; // default block = 2× case value + SetMarginChild(hpCase, marginChild, caseVal.ToString()); + SetMarginChild(hpDefault, marginChild, defaultVal.ToString()); + } + + SaveHeader(); + return true; + } + + /// Set a child element value inside <hh:margin> within a switch block. + private static void SetMarginChild(XElement switchBlock, string childName, string value) + { + var margin = switchBlock.Element(HwpxNs.Hh + "margin"); + if (margin == null) + { + margin = new XElement(HwpxNs.Hh + "margin"); + switchBlock.AddFirst(margin); + } + + var child = margin.Element(HwpxNs.Hc + childName); + if (child == null) + { + child = new XElement(HwpxNs.Hc + childName, + new XAttribute("value", value), + new XAttribute("unit", "HWPUNIT")); + margin.Add(child); + } + else + { + child.SetAttributeValue("value", value); + } + } + + /// Set lineSpacing attribute inside a switch block. + private static void SetLineSpacingInBlock(XElement switchBlock, string attrName, string value) + { + var ls = switchBlock.Element(HwpxNs.Hh + "lineSpacing"); + if (ls == null) + { + ls = new XElement(HwpxNs.Hh + "lineSpacing", + new XAttribute("type", "PERCENT"), + new XAttribute("value", "160"), + new XAttribute("unit", "HWPUNIT")); + switchBlock.Add(ls); + } + ls.SetAttributeValue(attrName, value); + } + + // ==================== Paragraph Heading / Outline Level ==================== + + /// + /// Set the outline/heading level on a paragraph's paraPr. + /// Value "0" or "none" removes the heading. Values 1-9 set the heading level. 
+    /// </summary>
+    private bool SetParaPrHeadingLevel(XElement para, string value)
+    {
+        if (_doc.Header?.Root == null)
+            return false;
+
+        var paraPr = CloneParaPrIfShared(para);
+        if (paraPr == null)
+            return false;
+
+        var heading = paraPr.Element(HwpxNs.Hh + "heading");
+
+        if (value == "0" || value.Equals("none", StringComparison.OrdinalIgnoreCase))
+        {
+            heading?.Remove();
+        }
+        else
+        {
+            if (heading == null)
+            {
+                heading = new XElement(HwpxNs.Hh + "heading",
+                    new XAttribute("type", "OUTLINE"),
+                    new XAttribute("idRef", "0"),
+                    new XAttribute("level", value));
+                paraPr.Add(heading);
+            }
+            else
+            {
+                heading.SetAttributeValue("level", value);
+                heading.SetAttributeValue("type", "OUTLINE");
+            }
+        }
+
+        SaveHeader();
+        return true;
+    }
+
+    // ==================== Form Field ====================
+
+    private List<string> SetFormFieldValue(string idOrIndex, Dictionary<string, string> props)
+    {
+        var unsupported = new List<string>();
+
+        foreach (var sec in _doc.Sections)
+        {
+            foreach (var run in sec.Root.Descendants(HwpxNs.Hp + "run").ToList())
+            {
+                var ctrl = run.Element(HwpxNs.Hp + "ctrl");
+                var fieldBegin = ctrl?.Element(HwpxNs.Hp + "fieldBegin");
+                if (fieldBegin == null) continue;
+                var instId = fieldBegin.Attribute("id")?.Value;
+                var fieldId = fieldBegin.Attribute("fieldid")?.Value;
+                var fieldName = fieldBegin.Attribute("name")?.Value;
+                if (instId != idOrIndex && fieldId != idOrIndex && fieldName != idOrIndex) continue;
+
+                // Update display text in the next run's <hp:t>
+                var nextRun = run.ElementsAfterSelf(HwpxNs.Hp + "run").FirstOrDefault();
+                var t = nextRun?.Element(HwpxNs.Hp + "t");
+                if (t != null)
+                {
+                    ApplyFormFieldValue(fieldBegin, t, props);
+                    fieldBegin.SetAttributeValue("dirty", "1");
+                    _dirty = true;
+                    SaveSection(sec.Root);
+                }
+                return unsupported;
+            }
+        }
+        unsupported.Add($"formfield [{idOrIndex}] not found");
+        return unsupported;
+    }
+
+    private static void ApplyFormFieldValue(XElement fieldBegin, XElement textElement, Dictionary<string, string> props)
+    {
+        var fieldType =
fieldBegin.Attribute("type")?.Value ?? "CLICK_HERE";
+        var value = props.GetValueOrDefault("value") ?? props.GetValueOrDefault("text") ?? "";
+
+        InvalidateLinesegarray(textElement);
+
+        switch (fieldType)
+        {
+            case "CHECKBOX":
+            {
+                var isChecked = props.TryGetValue("checked", out var checkedValue)
+                    ? ParseHelpers.IsTruthy(checkedValue)
+                    : ParseHelpers.IsTruthy(value);
+                SetFieldParamValue(fieldBegin, "Checked", isChecked ? "1" : "0");
+                textElement.Value = isChecked
+                    ? (props.GetValueOrDefault("checkedtext") ?? "☑")
+                    : (props.GetValueOrDefault("uncheckedtext") ?? "☐");
+                break;
+            }
+            case "DROPDOWN":
+            {
+                var options = GetFieldParamValue(fieldBegin, "Items")
+                    ?.Split('|', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
+                    ?? [];
+
+                var selectedIndex = ResolveDropdownSetIndex(props, options);
+                if (options.Length > 0)
+                {
+                    SetFieldParamValue(fieldBegin, "SelectedIndex", selectedIndex.ToString());
+                    textElement.Value = options[selectedIndex];
+                }
+                else
+                {
+                    textElement.Value = value;
+                }
+                break;
+            }
+            default:
+                textElement.Value = value;
+                break;
+        }
+    }
+
+    private static int ResolveDropdownSetIndex(Dictionary<string, string> props, string[] options)
+    {
+        if (options.Length == 0) return 0;
+
+        if (int.TryParse(props.GetValueOrDefault("selectedindex"), out var parsedIndex))
+            return Math.Clamp(parsedIndex, 0, options.Length - 1);
+
+        var desired = props.GetValueOrDefault("value") ?? props.GetValueOrDefault("text");
+        if (!string.IsNullOrEmpty(desired))
+        {
+            var matchedIndex = Array.FindIndex(options, option =>
+                string.Equals(option, desired, StringComparison.Ordinal));
+            if (matchedIndex >= 0) return matchedIndex;
+        }
+
+        return 0;
+    }
+
+    private static string?
GetFieldParamValue(XElement fieldBegin, string name)
+    {
+        return fieldBegin.Descendants()
+            .FirstOrDefault(e => e.Attribute("name")?.Value == name)
+            ?.Value;
+    }
+
+    private static void SetFieldParamValue(XElement fieldBegin, string name, string value)
+    {
+        var param = fieldBegin.Descendants()
+            .FirstOrDefault(e => e.Attribute("name")?.Value == name);
+        if (param == null) return;
+
+        param.Value = value;
+    }
+
+    // ==================== Shape Properties ====================
+
+    private static bool SetShapeProp(XElement shape, string property, string value)
+    {
+        return property.ToLowerInvariant() switch
+        {
+            "wrap" or "textwrap" => SetShapeWrap(shape, value),
+            "width" => SetShapeDimension(shape, "width", value),
+            "height" => SetShapeDimension(shape, "height", value),
+            "x" => SetShapeOffset(shape, isVertical: false, value),
+            "y" => SetShapeOffset(shape, isVertical: true, value),
+            "lock" => SetShapeLock(shape, value),
+            _ => false
+        };
+    }
+
+    /// Set text wrap mode. "char"/"inline" = treat as character (treatAsChar=1).
+    private static bool SetShapeWrap(XElement shape, string value)
+    {
+        var isInline = value.Equals("char", StringComparison.OrdinalIgnoreCase)
+            || value.Equals("inline", StringComparison.OrdinalIgnoreCase);
+        var wrapValue = value.ToUpperInvariant() switch
+        {
+            "CHAR" or "INLINE" => "TOP_AND_BOTTOM",
+            "SQUARE" => "SQUARE",
+            "BEHIND" => "BEHIND_TEXT",
+            "FRONT" => "IN_FRONT_OF_TEXT",
+            "TIGHT" or "WRAP" or "TOPBOTTOM" or "TOP_AND_BOTTOM" => "TOP_AND_BOTTOM",
+            _ => value.ToUpperInvariant()
+        };
+        shape.SetAttributeValue("textWrap", wrapValue);
+        var pos = shape.Element(HwpxNs.Hp + "pos");
+        pos?.SetAttributeValue("treatAsChar", isInline ? "1" : "0");
+        return true;
+    }
+
+    private static bool SetShapeOffset(XElement shape, bool isVertical, string value)
+    {
+        var pos = shape.Element(HwpxNs.Hp + "pos");
+        if (pos == null) return false;
+        var attr = isVertical ?
"vertOffset" : "horzOffset"; + pos.SetAttributeValue(attr, ParseDimensionToHwpUnit(value).ToString()); + return true; + } + + private static bool SetShapeLock(XElement shape, string value) + { + var boolValue = value.Equals("true", StringComparison.OrdinalIgnoreCase) || value == "1"; + shape.SetAttributeValue("lock", boolValue ? "1" : "0"); + return true; + } + + private static bool SetShapeDimension(XElement shape, string attr, string value) + { + var sz = shape.Element(HwpxNs.Hp + "sz"); + sz?.SetAttributeValue(attr, value); + return sz != null; + } + + // ==================== Style by Name ==================== + + private bool SetStyleByName(XElement p, string styleName) + { + var style = _doc.Header?.Root?.Descendants(HwpxNs.Hh + "style") + .FirstOrDefault(s => s.Attribute("name")?.Value == styleName + || s.Attribute("engName")?.Value?.Equals(styleName, StringComparison.OrdinalIgnoreCase) == true); + if (style == null) return false; + p.SetAttributeValue("styleIDRef", style.Attribute("id")?.Value); + return true; + } + + // ==================== Break Settings / Indent ==================== + + /// + /// Set a breakSetting attribute on a paragraph's paraPr. + /// XML: hh:paraPr > hh:breakSetting keepWithNext="0|1" ... + /// + private bool SetBreakSetting(XElement para, string attr, string value) + { + if (_doc.Header?.Root == null) return false; + var paraPr = CloneParaPrIfShared(para); + if (paraPr == null) return false; + + var bs = paraPr.Element(HwpxNs.Hh + "breakSetting"); + if (bs == null) + { + bs = new XElement(HwpxNs.Hh + "breakSetting", + new XAttribute("breakLatinWord", "KEEP_WORD"), + new XAttribute("breakNonLatinWord", "BREAK_WORD"), + new XAttribute("widowOrphan", "0"), + new XAttribute("keepWithNext", "0"), + new XAttribute("keepLines", "0"), + new XAttribute("pageBreakBefore", "0"), + new XAttribute("lineWrap", "BREAK")); + paraPr.Add(bs); + } + var boolVal = value.Equals("true", StringComparison.OrdinalIgnoreCase) || value == "1" ? 
"1" : "0"; + bs.SetAttributeValue(attr, boolVal); + SaveHeader(); + return true; + } + + /// + /// Set hanging indent on a paragraph's paraPr margin. + /// Hanging indent = negative indent + positive left. Value in HWPML units (283 ≈ 1mm). + /// + private bool SetParagraphHangingIndent(XElement para, string value) + { + if (!int.TryParse(value, out var hangVal) || hangVal <= 0) return false; + if (_doc.Header?.Root == null) return false; + var paraPr = CloneParaPrIfShared(para); + if (paraPr == null) return false; + + // Update direct margin element + var margin = paraPr.Element(HwpxNs.Hh + "margin"); + if (margin == null) + { + margin = new XElement(HwpxNs.Hh + "margin", + new XAttribute("indent", "0"), + new XAttribute("left", "0"), new XAttribute("right", "0"), + new XAttribute("prev", "0"), new XAttribute("next", "0")); + paraPr.Add(margin); + } + margin.SetAttributeValue("indent", (-hangVal).ToString()); + var currentLeft = (int?)margin.Attribute("left") ?? 0; + if (currentLeft < hangVal) + margin.SetAttributeValue("left", hangVal.ToString()); + + // Also update margins inside blocks (Hancom reads these) + foreach (var switchMargin in paraPr.Descendants(HwpxNs.Hh + "margin")) + { + if (switchMargin == margin) continue; + // Update child elements: and + var intentEl = switchMargin.Element(HwpxNs.Hc + "intent"); + intentEl?.SetAttributeValue("value", (-hangVal).ToString()); + var leftEl = switchMargin.Element(HwpxNs.Hc + "left"); + leftEl?.SetAttributeValue("value", hangVal.ToString()); + } + + SaveHeader(); + return true; + } + + // ==================== Numbering / List ==================== + + /// + /// Set list style on a paragraph. Creates a numbering definition in header.xml if needed, + /// then links the paragraph's paraPr to it via heading element. + /// Values: "bullet" (●), "number" or "decimal" (1. 2. 3.), "circle" (○), + /// "dash" (–), "none" (remove list). 
+ /// + private bool SetListStyle(XElement para, string style) + { + if (_doc.Header?.Root == null) return false; + + var lower = style.ToLowerInvariant(); + if (lower == "none" || lower == "false" || lower == "0") + { + // Remove list: clear heading from paraPr + return SetParaPrHeadingLevel(para, "0"); + } + + // Determine numbering format and text pattern + var (format, textPattern) = lower switch + { + "bullet" or "disc" => ("BULLET", "●"), + "circle" => ("BULLET", "○"), + "dash" => ("BULLET", "–"), + "number" or "decimal" or "numbered" => ("DIGIT", "%d."), + "roman" => ("ROMAN_CAPITAL", "%d."), + "romanlower" or "roman_small" => ("ROMAN_SMALL", "%d."), + "hangul" => ("HANGUL", "%d."), + "hanja" => ("HANJA", "%d."), + _ => ("BULLET", "●") + }; + + // Find or create numbering definition in header.xml + var numId = EnsureNumberingDef(format, textPattern); + + // Set paraPr heading to reference the numbering + var paraPr = CloneParaPrIfShared(para); + if (paraPr == null) return false; + + var heading = paraPr.Element(HwpxNs.Hh + "heading"); + if (heading == null) + { + heading = new XElement(HwpxNs.Hh + "heading", + new XAttribute("type", "NUMBER"), + new XAttribute("idRef", numId), + new XAttribute("level", "1")); + paraPr.Add(heading); + } + else + { + heading.SetAttributeValue("type", "NUMBER"); + heading.SetAttributeValue("idRef", numId); + heading.SetAttributeValue("level", "1"); + } + + // Set left indent for list items (standard 800 HWPUNIT indent) + var margin = paraPr.Element(HwpxNs.Hh + "margin"); + if (margin == null) + { + margin = new XElement(HwpxNs.Hh + "margin"); + paraPr.Add(margin); + } + var leftChild = margin.Element(HwpxNs.Hc + "left"); + if (leftChild == null) + { + margin.Add(new XElement(HwpxNs.Hc + "left", + new XAttribute("value", "800"), + new XAttribute("unit", "HWPUNIT"))); + } + + SaveHeader(); + return true; + } + + /// + /// Find or create a numbering definition in header.xml. + /// Returns the numbering id string. 
+ /// + private string EnsureNumberingDef(string format, string textPattern) + { + var header = _doc.Header!.Root!; + var refList = header.Element(HwpxNs.Hh + "refList"); + + // Find numberings container + var numberings = refList?.Element(HwpxNs.Hh + "numberings"); + if (numberings == null) + { + // Create numberings container + numberings = new XElement(HwpxNs.Hh + "numberings", new XAttribute("itemCnt", "0")); + if (refList == null) + { + refList = new XElement(HwpxNs.Hh + "refList"); + header.Add(refList); + } + refList.Add(numberings); + } + + // Check for existing matching numbering + foreach (var num in numberings.Elements(HwpxNs.Hh + "numbering")) + { + var paraHead = num.Element(HwpxNs.Hh + "paraHead"); + if (paraHead != null) + { + var existingFormat = paraHead.Attribute("format")?.Value; + var existingText = paraHead.Element(HwpxNs.Hh + "text")?.Value; + if (existingFormat == format && existingText == textPattern) + return num.Attribute("id")?.Value ?? "1"; + } + } + + // Create new numbering definition + var maxId = numberings.Elements(HwpxNs.Hh + "numbering") + .Select(n => int.TryParse(n.Attribute("id")?.Value, out var id) ? 
id : 0) + .DefaultIfEmpty(0).Max(); + var newId = (maxId + 1).ToString(); + + var newNumbering = new XElement(HwpxNs.Hh + "numbering", + new XAttribute("id", newId), + new XAttribute("start", "1"), + new XElement(HwpxNs.Hh + "paraHead", + new XAttribute("start", "1"), + new XAttribute("level", "1"), + new XAttribute("format", format), + new XAttribute("alignment", "LEFT"), + new XAttribute("useInstWidth", "1"), + new XAttribute("autoIndent", "1"), + new XAttribute("textOffset", "0"), + new XAttribute("numFormat", "1"), + new XElement(HwpxNs.Hh + "text", textPattern))); + + numberings.Add(newNumbering); + var count = numberings.Elements(HwpxNs.Hh + "numbering").Count(); + numberings.SetAttributeValue("itemCnt", count.ToString()); + + SaveHeader(); + return newId; + } + + // ==================== Highlight (Markpen) ==================== + + /// + /// Set highlight (markpen) on a run by inserting markpenBegin/markpenEnd markers + /// around the text content. This is NOT a charPr property — it's inline markers. + /// Value: color hex (e.g. "#FFFF00" for yellow), "none"/"false" to remove. 
+    /// </summary>
+    private bool SetHighlight(XElement run, string color)
+    {
+        var textElem = run.Element(HwpxNs.Hp + "t");
+        if (textElem == null) return false;
+
+        // Remove existing markpen markers from INSIDE <hp:t> (the correct location)
+        textElem.Elements(HwpxNs.Hp + "markpenBegin").ToList().ForEach(e => e.Remove());
+        textElem.Elements(HwpxNs.Hp + "markpenEnd").ToList().ForEach(e => e.Remove());
+        // Also clean up old-style sibling markers (wrong location from a prior bug)
+        run.Elements(HwpxNs.Hp + "markpenBegin").ToList().ForEach(e => e.Remove());
+        run.Elements(HwpxNs.Hp + "markpenEnd").ToList().ForEach(e => e.Remove());
+
+        var lower = color.ToLowerInvariant();
+        if (lower != "none" && lower != "false" && lower != "0")
+        {
+            // Map common color names to hex
+            var hexColor = lower switch
+            {
+                "yellow" => "#FFFF00",
+                "green" => "#00FF00",
+                "cyan" => "#00FFFF",
+                "magenta" or "pink" => "#FF00FF",
+                "red" => "#FF0000",
+                "blue" => "#0000FF",
+                _ => color // assume hex
+            };
+
+            // Golden structure: markers INSIDE <hp:t>, wrapping the text content:
+            // <hp:t><hp:markpenBegin color="..."/>text<hp:markpenEnd/></hp:t>
+            textElem.AddFirst(
+                new XElement(HwpxNs.Hp + "markpenBegin",
+                    new XAttribute("color", hexColor)));
+            textElem.Add(
+                new XElement(HwpxNs.Hp + "markpenEnd"));
+        }
+
+        _dirty = true;
+        SaveSection(run);
+        return true;
+    }
+
+    // ==================== Cell Shading & Border ====================
+
+    /// <summary>
+    /// Set cell background color by creating a new borderFill with the fill color
+    /// and assigning it to the cell's borderFillIDRef.
+    /// </summary>
+    private bool SetCellShading(XElement tc, string fillColor)
+    {
+        if (_doc.Header?.Root == null) return false;
+
+        // Get current borderFill to preserve border settings
+        var currentBfId = tc.Attribute("borderFillIDRef")?.Value ?? "1";
+        var currentBf = _doc.Header.Root.Descendants(HwpxNs.Hh + "borderFill")
+            .FirstOrDefault(e => e.Attribute("id")?.Value == currentBfId);
+
+        // Clone existing border settings
+        var borderType = currentBf?.Element(HwpxNs.Hh + "leftBorder")?.Attribute("type")?.Value ??
"SOLID"; + var borderWidth = currentBf?.Element(HwpxNs.Hh + "leftBorder")?.Attribute("width")?.Value ?? "0.12mm"; + var borderColor = currentBf?.Element(HwpxNs.Hh + "leftBorder")?.Attribute("color")?.Value ?? "#000000"; + + var newBfId = CreateCustomBorderFill(borderColor, borderWidth, borderType, fillColor); + tc.SetAttributeValue("borderFillIDRef", newBfId); + return true; + } + + /// + /// Set cell border properties by creating a new borderFill and assigning it. + /// Only the specified parameters are changed; others are preserved from the current borderFill. + /// + private bool SetCellBorder(XElement tc, string? color = null, string? width = null, string? type = null) + { + if (_doc.Header?.Root == null) return false; + + var currentBfId = tc.Attribute("borderFillIDRef")?.Value ?? "1"; + var currentBf = _doc.Header.Root.Descendants(HwpxNs.Hh + "borderFill") + .FirstOrDefault(e => e.Attribute("id")?.Value == currentBfId); + + var borderType = type ?? currentBf?.Element(HwpxNs.Hh + "leftBorder")?.Attribute("type")?.Value ?? "SOLID"; + var borderWidth = width ?? currentBf?.Element(HwpxNs.Hh + "leftBorder")?.Attribute("width")?.Value ?? "0.12mm"; + var borderColor = color ?? currentBf?.Element(HwpxNs.Hh + "leftBorder")?.Attribute("color")?.Value ?? "#000000"; + + // Check for existing fill color to preserve + string? fillColor = null; + var existingFill = currentBf?.Element(HwpxNs.Hc + "fillBrush")?.Element(HwpxNs.Hc + "winBrush"); + if (existingFill != null) + fillColor = existingFill.Attribute("faceColor")?.Value; + + var newBfId = CreateCustomBorderFill(borderColor, borderWidth, borderType, fillColor); + tc.SetAttributeValue("borderFillIDRef", newBfId); + return true; + } + + /// + /// Create a custom borderFill in header.xml with specified border and optional fill settings. + /// Returns the new borderFill ID. 
+ /// + private string CreateCustomBorderFill( + string borderColor = "#000000", + string borderWidth = "0.12mm", + string borderType = "SOLID", + string? fillColor = null) + { + var borderFills = _doc.Header!.Root!.Descendants(HwpxNs.Hh + "borderFill"); + var newId = NextBorderFillId(); + + var bf = new XElement(HwpxNs.Hh + "borderFill", + new XAttribute("id", newId), + new XAttribute("threeD", "0"), + new XAttribute("shadow", "0"), + new XAttribute("centerLine", "NONE"), + new XAttribute("breakCellSeparateLine", "0"), + new XElement(HwpxNs.Hh + "slash", + new XAttribute("type", "NONE"), new XAttribute("crooked", "0"), new XAttribute("isCounter", "0")), + new XElement(HwpxNs.Hh + "backSlash", + new XAttribute("type", "NONE"), new XAttribute("crooked", "0"), new XAttribute("isCounter", "0")), + MakeBorder("leftBorder", borderType, borderWidth, borderColor), + MakeBorder("rightBorder", borderType, borderWidth, borderColor), + MakeBorder("topBorder", borderType, borderWidth, borderColor), + MakeBorder("bottomBorder", borderType, borderWidth, borderColor), + MakeBorder("diagonal", "NONE", "0.00mm", "#000000")); + + if (fillColor != null) + { + bf.Add(new XElement(HwpxNs.Hc + "fillBrush", + new XElement(HwpxNs.Hc + "winBrush", + new XAttribute("faceColor", fillColor), + new XAttribute("hatchColor", "#FFFFFF"), + new XAttribute("alpha", "0")))); + } + + // Add to borderFills container + var container = _doc.Header!.Root!.Descendants(HwpxNs.Hh + "borderFills").FirstOrDefault(); + if (container != null) + { + container.Add(bf); + var count = container.Elements(HwpxNs.Hh + "borderFill").Count(); + container.SetAttributeValue("itemCnt", count.ToString()); + } + else if (borderFills.Any()) + { + borderFills.Last().AddAfterSelf(bf); + } + + SaveHeader(); + return newId; + } + + // ==================== CharPr Clone-or-Modify ==================== + + /// + /// CRITICAL: Set a character property on a run's charPr in header.xml. + /// + /// Algorithm: + /// 1. 
Get current charPrIDRef from the run. + /// 2. Find <hh:charPr id="N"> in header.xml. + /// 3. Scan ALL sections to check if this charPr is referenced by ANY other run. + /// → If yes: CLONE the charPr (assign NextCharPrId()), update run's charPrIDRef. + /// → If no: modify the charPr in place. + /// 4. Set the requested property on the (possibly cloned) charPr. + /// + private bool EnsureCharPrProp(XElement run, string prop, string value) + { + if (_doc.Header?.Root == null) + return false; + + var charPrIdRef = run.Attribute("charPrIDRef")?.Value; + if (charPrIdRef == null) + return false; + + // Find the charPr in header.xml + var charPr = _doc.Header.Root.Descendants(HwpxNs.Hh + "charPr") + .FirstOrDefault(cp => cp.Attribute("id")?.Value == charPrIdRef); + if (charPr == null) + return false; + + // Count how many runs across ALL sections reference this charPr + int refCount = 0; + foreach (var section in _doc.Sections) + { + foreach (var r in section.Root.Descendants(HwpxNs.Hp + "run")) + { + if (r.Attribute("charPrIDRef")?.Value == charPrIdRef) + refCount++; + } + } + + // Clone if shared: either multiple runs reference this charPr, + // or it's charPr 0 (the global default used by all new elements). + // Without this, modifying charPr 0 via fontsize=22 on one paragraph + // contaminates ALL paragraphs and table cells that use the default. + if (refCount > 1 || charPrIdRef == "0") + { + var newId = NextCharPrId(); + var cloned = new XElement(charPr); + cloned.SetAttributeValue("id", newId.ToString()); + // CRITICAL: Hancom uses POSITIONAL indexing (array index), not id-based lookup. + // Append at END of container so position matches the new ID. 
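+            // Illustrative sketch of the positional invariant (hypothetical state):
+            //   before: charPrs = [id=0, id=1, id=2]   (array index == id)
+            //   clone:  NextCharPrId() == 3, clone appended at the end
+            //   after:  charPrs = [id=0, id=1, id=2, id=3], so position 3 holds id "3"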
+ var container = charPr.Parent!; + container.Add(cloned); + + // Update this run to point to the clone + run.SetAttributeValue("charPrIDRef", newId.ToString()); + charPr = cloned; + + // Update itemCnt on the parent container + var count = container.Elements(HwpxNs.Hh + "charPr").Count(); + container.SetAttributeValue("itemCnt", count.ToString()); + } + + // Apply the property to the charPr + ApplyCharPrProperty(charPr, prop, value); + SaveHeader(); + return true; + } + + /// + /// Apply a named property to a charPr element. + /// + private static void ApplyCharPrProperty(XElement charPr, string prop, string value) + { + switch (prop) + { + case "bold": + ToggleCharPrFlag(charPr, HwpxNs.Hh + "bold", value); + break; + + case "italic": + ToggleCharPrFlag(charPr, HwpxNs.Hh + "italic", value); + break; + + case "underline": + ToggleCharPrFlag(charPr, HwpxNs.Hh + "underline", value); + break; + + case "strikeout": + ToggleCharPrFlag(charPr, HwpxNs.Hh + "strikeout", value); + break; + + case "superscript": + ToggleCharPrFlag(charPr, HwpxNs.Hh + "supscript", value); + // Remove subscript if enabling superscript + if (value.Equals("true", StringComparison.OrdinalIgnoreCase) || value == "1") + charPr.Element(HwpxNs.Hh + "subscript")?.Remove(); + break; + + case "subscript": + ToggleCharPrFlag(charPr, HwpxNs.Hh + "subscript", value); + // Remove superscript if enabling subscript + if (value.Equals("true", StringComparison.OrdinalIgnoreCase) || value == "1") + charPr.Element(HwpxNs.Hh + "supscript")?.Remove(); + break; + + case "fontsize": + // HWPX font size in centi-points (1/100 pt): 1000 = 10pt, 2000 = 20pt + // User input is in pt — convert to centi-points + if (double.TryParse(value, out var ptSize)) + charPr.SetAttributeValue("height", ((int)(ptSize * 100)).ToString()); + break; + + case "textcolor" or "color": + charPr.SetAttributeValue("textColor", value); + break; + + case "charspacing" or "letterspacing" or "spacing": + // Character spacing in percent per script. 
0 = normal, -5 = 5% tighter. + if (int.TryParse(value, out var spacingVal)) + { + var spacingEl = charPr.Element(HwpxNs.Hh + "spacing"); + if (spacingEl == null) + { + spacingEl = new XElement(HwpxNs.Hh + "spacing", + new XAttribute("hangul", "0"), new XAttribute("latin", "0"), + new XAttribute("hanja", "0"), new XAttribute("japanese", "0"), + new XAttribute("other", "0"), new XAttribute("symbol", "0"), + new XAttribute("user", "0")); + charPr.Add(spacingEl); + } + spacingEl.SetAttributeValue("hangul", spacingVal.ToString()); + spacingEl.SetAttributeValue("latin", spacingVal.ToString()); + } + break; + + case "shadecolor" or "shading": + charPr.SetAttributeValue("shadeColor", value); + break; + + case "fonthangul": + var fontRef = charPr.Element(HwpxNs.Hh + "fontRef"); + if (fontRef == null) + { + fontRef = new XElement(HwpxNs.Hh + "fontRef"); + charPr.Add(fontRef); + } + fontRef.SetAttributeValue("hangul", value); + break; + + case "fontlatin": + var fontRefLatin = charPr.Element(HwpxNs.Hh + "fontRef"); + if (fontRefLatin == null) + { + fontRefLatin = new XElement(HwpxNs.Hh + "fontRef"); + charPr.Add(fontRefLatin); + } + fontRefLatin.SetAttributeValue("latin", value); + break; + } + } + + /// + /// Normalize fontRef attributes to "0" (first declared font). + /// Golden XML shows sup/subscript charPrs always use fontRef="0". + /// + private static void NormalizeFontRef(XElement charPr) + { + var fontRef = charPr.Element(HwpxNs.Hh + "fontRef"); + if (fontRef == null) return; + foreach (var attr in fontRef.Attributes()) + attr.Value = "0"; + } + + /// + /// Toggle a boolean charPr flag element. + /// "true"/"1" → add element if missing; "false"/"0" → remove if present. 
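+    /// e.g. ToggleCharPrFlag(charPr, HwpxNs.Hh + "bold", "1") adds an empty bold child element,
+    /// and calling it again with "false" removes that element (illustrative usage).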
+ /// + private static void ToggleCharPrFlag(XElement charPr, XName flagName, string value) + { + var isTruthy = value.Equals("true", StringComparison.OrdinalIgnoreCase) + || value == "1"; + var existing = charPr.Element(flagName); + + if (isTruthy && existing == null) + { + charPr.Add(new XElement(flagName)); + } + else if (!isTruthy && existing != null) + { + existing.Remove(); + } + } + + // ==================== Style Clone Helpers ==================== + + /// + /// Clone charPr 0 to a new independent ID for use in new style definitions. + /// Prevents global default contamination when editing style properties. + /// + private int CloneCharPrForNewStyle() + { + var charPr0 = _doc.Header?.Root? + .Descendants(HwpxNs.Hh + "charPr") + .FirstOrDefault(e => e.Attribute("id")?.Value == "0"); + if (charPr0 == null) return 0; + + var newId = NextCharPrId(); + var cloned = new XElement(charPr0); + cloned.SetAttributeValue("id", newId.ToString()); + var container = charPr0.Parent!; + container.Add(cloned); + var count = container.Elements(HwpxNs.Hh + "charPr").Count(); + container.SetAttributeValue("itemCnt", count.ToString()); + return newId; + } + + /// + /// Clone paraPr 0 to a new independent ID for use in new style definitions. + /// Prevents global default contamination when editing style properties. + /// + private int CloneParaPrForNewStyle() + { + var paraPr0 = _doc.Header?.Root? + .Descendants(HwpxNs.Hh + "paraPr") + .FirstOrDefault(e => e.Attribute("id")?.Value == "0"); + if (paraPr0 == null) return 0; + + var newId = NextParaPrId(); + var cloned = new XElement(paraPr0); + cloned.SetAttributeValue("id", newId.ToString()); + var container = paraPr0.Parent!; + container.Add(cloned); + var count = container.Elements(HwpxNs.Hh + "paraPr").Count(); + container.SetAttributeValue("itemCnt", count.ToString()); + return newId; + } + + // ==================== ID Generators ==================== + + /// + /// Return max charPrIDRef across ALL sections + header, then add 1. 
+ /// + private int NextCharPrId() + { + int maxId = 0; + + // Scan all run elements across all sections + foreach (var section in _doc.Sections) + { + foreach (var run in section.Root.Descendants(HwpxNs.Hp + "run")) + { + if (int.TryParse(run.Attribute("charPrIDRef")?.Value, out var id)) + maxId = Math.Max(maxId, id); + } + } + + // Scan header.xml charPr definitions + if (_doc.Header?.Root != null) + { + foreach (var charPr in _doc.Header.Root.Descendants(HwpxNs.Hh + "charPr")) + { + if (int.TryParse(charPr.Attribute("id")?.Value, out var id)) + maxId = Math.Max(maxId, id); + } + } + + return maxId + 1; + } + + /// + /// Return max paraPrIDRef across ALL sections + header, then add 1. + /// + private int NextParaPrId() + { + int maxId = 0; + + foreach (var section in _doc.Sections) + { + foreach (var p in section.Root.Descendants(HwpxNs.Hp + "p")) + { + if (int.TryParse(p.Attribute("paraPrIDRef")?.Value, out var id)) + maxId = Math.Max(maxId, id); + } + } + + if (_doc.Header?.Root != null) + { + foreach (var paraPr in _doc.Header.Root.Descendants(HwpxNs.Hh + "paraPr")) + { + if (int.TryParse(paraPr.Attribute("id")?.Value, out var id)) + maxId = Math.Max(maxId, id); + } + } + + return maxId + 1; + } + + /// + /// Check if a paraPr is referenced by any paragraph OTHER than the given one. 
+    ///
+    private bool IsParaPrShared(string paraPrIdRef, XElement excludeParagraph)
+    {
+        foreach (var section in _doc.Sections)
+        {
+            foreach (var p in section.Root.Descendants(HwpxNs.Hp + "p"))
+            {
+                if (p == excludeParagraph) continue;
+                if (p.Attribute("paraPrIDRef")?.Value == paraPrIdRef)
+                    return true;
+            }
+        }
+        // Also check if any style references this paraPr
+        if (_doc.Header?.Root != null)
+        {
+            foreach (var style in _doc.Header.Root.Descendants(HwpxNs.Hh + "style"))
+            {
+                if (style.Attribute("paraPrIDRef")?.Value == paraPrIdRef)
+                    return true;
+            }
+        }
+        return false;
+    }
+
+    // ==================== Generic ====================
+
+    ///
+    /// Set an XML attribute directly on the element.
+    /// Fallback for element types without specialized property handling.
+    ///
+    private static bool SetGenericAttr(XElement element, string property, string value)
+    {
+        element.SetAttributeValue(property, value);
+        return true;
+    }
+
+    /// Set a named attribute to a value. Always returns true.
+    private static bool SetAttribute(XElement element, string name, string value)
+    {
+        element.SetAttributeValue(name, value);
+        return true;
+    }
+
+    // ==================== Label-Based Table Fill (Plan 70) ====================
+
+    // Plan 99.9.I1: FillResult for unmatched label feedback
+    internal record FillResult(List<string> Filled, List<string> Unmatched);
+
+    ///
+    /// Fill table cells by label matching using a 3-phase pipeline:
+    /// Phase 1: In-cell patterns (checkbox, paren-blank, annotation)
+    /// Phase 2: Table label-value cell replacement (existing behavior)
+    /// Phase 3: Inline paragraph pattern replacement (outside tables)
+    /// Returns FillResult with filled and unmatched label lists.
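+    /// Example (hypothetical input): keys "성명" and "주소" are each tried against
+    /// Phase 1, then 2, then 3; any key matched by no phase is reported in FillResult.Unmatched.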
+    ///
+    private FillResult FillByLabel(Dictionary<string, string> mappings)
+    {
+        // Plan 99.9.E4: Input size guard
+        const int MaxFillEntries = 200;
+        if (mappings.Count > MaxFillEntries)
+            throw new ArgumentException(
+                $"FillByLabel input has {mappings.Count} entries, exceeding limit of {MaxFillEntries}.");
+
+        var anyFilled = false;
+        var filledSections = new HashSet<XElement>();
+        var filledLabels = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
+
+        // === Phase 1: In-cell patterns (checkbox, paren-blank, annotation) ===
+        foreach (var (key, value) in mappings)
+        {
+            var (label, _) = ParseLabelSpec(key);
+
+            // Checkbox: □label → ☑label
+            if (TryFillCheckbox(label, value))
+            {
+                filledLabels.Add(key);
+                anyFilled = true;
+                continue;
+            }
+
+            // Paren-blank: "일반( )통" → fill in parens
+            if (TryFillParenBlank(label, value))
+            {
+                filledLabels.Add(key);
+                anyFilled = true;
+                continue;
+            }
+
+            // Annotation blank: "(한자: )" → fill in annotation
+            if (TryFillAnnotation(label, value))
+            {
+                filledLabels.Add(key);
+                anyFilled = true;
+                continue;
+            }
+        }
+
+        // === Phase 2: Table label-value cell replacement (existing logic) ===
+        foreach (var (key, value) in mappings)
+        {
+            if (filledLabels.Contains(key)) continue; // Already filled in Phase 1
+
+            var (label, direction) = ParseLabelSpec(key);
+            var tc = FindCellByLabel(label, direction);
+            if (tc == null) continue;
+
+            SetCellText(tc, value);
+            anyFilled = true;
+            filledLabels.Add(key);
+
+            var sectionRoot = tc.AncestorsAndSelf()
+                .FirstOrDefault(e => e.Name.LocalName == "sec");
+            if (sectionRoot != null) filledSections.Add(sectionRoot);
+        }
+
+        // === Phase 3: Inline paragraph pattern (outside tables) ===
+        foreach (var (key, value) in mappings)
+        {
+            if (filledLabels.Contains(key)) continue; // Already filled
+
+            var (label, _) = ParseLabelSpec(key);
+            // Try find-and-replace for "label: ..." or "label : ..."
patterns + var pattern = $"regex:(?<={System.Text.RegularExpressions.Regex.Escape(label)}\\s*[::]\\s*)\\S.*"; + var count = FindAndReplace(pattern, value); + if (count > 0) + { + filledLabels.Add(key); + anyFilled = true; + } + } + + if (anyFilled) + { + foreach (var sec in filledSections) + SaveSection(sec); + _dirty = true; + } + + // Plan 99.9.I1: Report unmatched labels + var unmatched = mappings.Keys + .Where(k => !filledLabels.Contains(k)) + .ToList(); + return new FillResult(filledLabels.ToList(), unmatched); + } + + /// + /// Try to fill a parenthesized blank pattern in table cells. + /// Pattern: "일반( )통" — fills the blank inside parens. + /// + private bool TryFillParenBlank(string label, string value) + { + // Search for cells containing label text with paren blanks + foreach (var sec in _doc.Sections) + { + foreach (var tbl in sec.Tables) + { + foreach (var tr in tbl.Elements(HwpxNs.Hp + "tr")) + { + foreach (var tc in tr.Elements(HwpxNs.Hp + "tc")) + { + var cellText = ExtractCellText(tc); + // Match pattern where label chars surround parens + var rx = new System.Text.RegularExpressions.Regex( + @"(" + System.Text.RegularExpressions.Regex.Escape(label[..Math.Min(2, label.Length)]) + + @"[가-힣A-Za-z]*)\(\s*\)([가-힣A-Za-z]*)"); + var match = rx.Match(cellText); + if (!match.Success) continue; + + // Plan 99.9.I5: Use multi- aware replacement + if (ReplaceTextInCell(tc, rx, + m => $"{m.Groups[1].Value}({value}){m.Groups[2].Value}")) + return true; + } + } + } + } + return false; + } + + /// + /// Try to fill an annotation blank pattern in table cells. + /// Pattern: "(한자: )" — fills the blank after colon. 
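+    /// e.g. with label "한자" and value "王", a cell containing "(한자: )" becomes "(한자:王)" (illustrative).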
+    ///
+    private bool TryFillAnnotation(string label, string value)
+    {
+        var rx = new System.Text.RegularExpressions.Regex(
+            @"\(" + System.Text.RegularExpressions.Regex.Escape(label) + @"[::]\s*\)");
+
+        foreach (var sec in _doc.Sections)
+        {
+            foreach (var tbl in sec.Tables)
+            {
+                foreach (var tr in tbl.Elements(HwpxNs.Hp + "tr"))
+                {
+                    foreach (var tc in tr.Elements(HwpxNs.Hp + "tc"))
+                    {
+                        var cellText = ExtractCellText(tc);
+                        if (!rx.IsMatch(cellText)) continue;
+
+                        // Plan 99.9.I5: Use multi-<hp:t> aware replacement
+                        if (ReplaceTextInCell(tc, rx, _ => $"({label}:{value})"))
+                            return true;
+                    }
+                }
+            }
+        }
+        return false;
+    }
+
+    // Plan 99.9.I5: Multi-<hp:t> text replacement for in-cell patterns
+    ///
+    /// Replace text that may span across multiple <hp:t> nodes within a cell.
+    /// Concatenates all <hp:t> text, applies regex, then distributes the replacement
+    /// back across the original nodes (preserving charPrIDRef styling on the first node).
+    ///
+    private bool ReplaceTextInCell(XElement tc, System.Text.RegularExpressions.Regex rx,
+        Func<System.Text.RegularExpressions.Match, string> replacer)
+    {
+        var subList = tc.Element(HwpxNs.Hp + "subList");
+        var paragraphs = (subList?.Elements(HwpxNs.Hp + "p") ?? tc.Elements(HwpxNs.Hp + "p")).ToList();
+
+        foreach (var p in paragraphs)
+        {
+            // Collect all <hp:t> nodes with cumulative offsets
+            var tNodes = new List<(XElement T, int Start, int End)>();
+            int offset = 0;
+            foreach (var run in p.Elements(HwpxNs.Hp + "run"))
+            {
+                foreach (var t in run.Elements(HwpxNs.Hp + "t"))
+                {
+                    var text = t.Value ?? "";
+                    tNodes.Add((t, offset, offset + text.Length));
+                    offset += text.Length;
+                }
+            }
+
+            if (tNodes.Count == 0) continue;
+
+            // Concatenate full text and try match
+            var fullText = string.Concat(tNodes.Select(n => n.T.Value ??
""));
+            var match = rx.Match(fullText);
+            if (!match.Success) continue;
+
+            // Apply replacement
+            var newText = replacer(match);
+            var result = fullText[..match.Index] + newText + fullText[(match.Index + match.Length)..];
+
+            // Distribute result back: first <hp:t> gets all text, rest get emptied
+            tNodes[0].T.Value = result;
+            for (int i = 1; i < tNodes.Count; i++)
+                tNodes[i].T.Value = "";
+
+            var sectionRoot = tc.AncestorsAndSelf()
+                .FirstOrDefault(e => e.Name.LocalName == "sec");
+            if (sectionRoot != null) SaveSection(sectionRoot);
+            _dirty = true;
+            return true;
+        }
+        return false;
+    }
+
+    // ==================== Find & Replace ====================
+
+    ///
+    /// Replace all occurrences of <paramref name="find"/> with <paramref name="replace"/>
+    /// across all sections' <hp:t> elements. Returns the number of replacements made.
+    /// Known limitation: text split across multiple runs will not be matched.
+    ///
+    private int FindAndReplace(string find, string replace,
+        XElement? scope = null, Dictionary<string, string>? formatFilter = null)
+    {
+        if (string.IsNullOrEmpty(find)) return 0;
+
+        IEnumerable<XElement> searchRoots = scope != null
+            ? new[] { scope }
+            : _doc.Sections.Select(s => s.Root);
+
+        int count = 0;
+        var isRegex = find.StartsWith("regex:", StringComparison.OrdinalIgnoreCase);
+        var regex = isRegex
+            ?
new System.Text.RegularExpressions.Regex(find[6..]) + : null; + + foreach (var root in searchRoots) + { + foreach (var run in root.Descendants(HwpxNs.Hp + "run").ToList()) + { + if (formatFilter != null && !MatchesCharPrFormat(run, formatFilter)) + continue; + + foreach (var t in run.Elements(HwpxNs.Hp + "t").ToList()) + { + var text = t.Value; + if (isRegex) + { + if (regex!.IsMatch(text)) + { + InvalidateLinesegarray(t); + t.Value = regex.Replace(text, replace); + count++; + } + } + else if (text.Contains(find, StringComparison.Ordinal)) + { + InvalidateLinesegarray(t); + t.Value = text.Replace(find, replace, StringComparison.Ordinal); + count++; + } + } + } + } + + if (count > 0) + { + foreach (var sec in _doc.Sections) + SaveSection(sec.Root); + _dirty = true; + } + return count; + } + + /// Check if a run's charPr matches the format filter. + private bool MatchesCharPrFormat(XElement run, Dictionary filter) + { + var charPrId = run.Attribute("charPrIDRef")?.Value ?? "0"; + var charPr = FindCharPr(charPrId); + if (charPr == null) return false; + + foreach (var (key, expected) in filter) + { + switch (key.ToLowerInvariant()) + { + case "bold": + var hasBold = charPr.Element(HwpxNs.Hh + "bold") != null; + if (hasBold != (expected == "true" || expected == "1")) return false; + break; + case "italic": + var hasItalic = charPr.Element(HwpxNs.Hh + "italic") != null; + if (hasItalic != (expected == "true" || expected == "1")) return false; + break; + case "color": + if (charPr.Attribute("textColor")?.Value != expected) return false; + break; + case "fontsize": + if (charPr.Attribute("height")?.Value != expected) return false; + break; + } + } + return true; + } + + // ==================== Save Helpers ==================== + + /// + /// Save header.xml back to the ZIP archive. + /// Uses delete-and-recreate pattern (avoids trailing bytes from SetLength(0)). 
+    ///
+    private void SaveHeader()
+    {
+        if (_doc.Header == null || _doc.HeaderEntryPath == null) return;
+
+        var entry = _doc.Archive.GetEntry(_doc.HeaderEntryPath);
+        if (entry == null) return;
+
+        var entryName = entry.FullName;
+        entry.Delete();
+        var newEntry = _doc.Archive.CreateEntry(entryName, CompressionLevel.Optimal);
+        using var stream = newEntry.Open();
+        // CRITICAL: Hancom requires single-line (minified) XML without BOM.
+        var xmlStr = HwpxPacker.MinifyXml(_doc.Header.ToString(SaveOptions.DisableFormatting));
+        xmlStr = HwpxPacker.RestoreOriginalNamespaces(xmlStr);
+        xmlStr = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" + xmlStr;
+        var bytes = System.Text.Encoding.UTF8.GetBytes(xmlStr);
+        stream.Write(bytes, 0, bytes.Length);
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxHandler.Validate.cs b/src/officecli/Handlers/Hwpx/HwpxHandler.Validate.cs
new file mode 100644
index 000000000..0873a46f9
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxHandler.Validate.cs
@@ -0,0 +1,653 @@
+// File: src/officecli/Handlers/Hwpx/HwpxHandler.Validate.cs
+using System.IO.Compression;
+using System.Xml.Linq;
+using OfficeCli.Core;
+
+namespace OfficeCli.Handlers;
+
+public partial class HwpxHandler
+{
+    // ==================== Validation ====================
+
+    public List<ValidationError> Validate()
+    {
+        var errors = new List<ValidationError>();
+
+        // Level 1: ZIP integrity
+        if (!ValidateZipIntegrity(errors))
+            return errors; // Critical — stop here
+
+        // Level 2: OPF manifest
+        if (!ValidateOpfManifest(errors))
+            return errors; // Critical — stop here
+
+        // Level 3: XML well-formedness
+        ValidateXmlWellFormedness(errors);
+
+        // Level 4: IDRef consistency (charPrIDRef on runs → header.xml charPr)
+        ValidateIdRefConsistency(errors);
+
+        // Level 5: Table structure
+        ValidateTableStructure(errors);
+
+        // Level 6: Namespace declarations
+        ValidateNamespaceDeclarations(errors);
+
+        // Level 7: BinData integrity (Plan 94 — merged from ViewAsIssues)
+        ValidateBinDataIntegrity(errors);
+
+        // Level 8: Field pair consistency (Plan 94)
+ ValidateFieldPairs(errors); + + // Level 9: Section count consistency (Plan 94) + ValidateSectionCount(errors); + + return errors; + } + + /// + /// Level 1: Verify the file is a valid ZIP archive. + /// Returns false if ZIP is corrupted (critical failure). + /// + private bool ValidateZipIntegrity(List errors) + { + try + { + // The archive is already open (loaded in constructor). + // Verify we can enumerate entries without error. + var entryCount = _doc.Archive.Entries.Count; + if (entryCount == 0) + { + errors.Add(new ValidationError( + "zip_empty", + "ZIP archive contains no entries", + "/", + null)); + return false; + } + return true; + } + catch (InvalidDataException ex) + { + errors.Add(new ValidationError( + "zip_corrupt", + $"File is not a valid ZIP archive: {ex.Message}", + "/", + null)); + return false; + } + } + + /// + /// Level 2: Verify OPF manifest structure. + /// - mimetype entry must exist and be the first ZIP entry with no compression + /// - META-INF/container.xml must be parseable + /// + private bool ValidateOpfManifest(List errors) + { + bool critical = false; + + // Check mimetype entry + var mimetypeEntry = _doc.Archive.GetEntry("mimetype"); + if (mimetypeEntry == null) + { + errors.Add(new ValidationError( + "opf_missing_mimetype", + "HWPX package missing 'mimetype' entry (required by OPF spec)", + "/mimetype", + null)); + } + else + { + // mimetype must be first entry + var firstEntry = _doc.Archive.Entries.FirstOrDefault(); + if (firstEntry?.FullName != "mimetype") + { + errors.Add(new ValidationError( + "opf_mimetype_not_first", + "mimetype must be the first ZIP entry (found: " + (firstEntry?.FullName ?? 
"none") + ")", + "/mimetype", + null)); + } + + // Plan 81: mimetype must use ZIP_STORED (no compression) + // .NET ZipArchiveEntry has no CompressionMethod — use heuristic: CompressedLength == Length + if (mimetypeEntry.CompressedLength != mimetypeEntry.Length) + { + errors.Add(new ValidationError( + "package_mimetype_compressed", + $"mimetype entry should use ZIP_STORED (no compression), but appears compressed (size={mimetypeEntry.Length}, compressed={mimetypeEntry.CompressedLength})", + "/mimetype", + null)); + } + + // G5: Validate mimetype content value + using var mimeStream = mimetypeEntry.Open(); + using var mimeReader = new StreamReader(mimeStream, System.Text.Encoding.ASCII); + var mimeContent = mimeReader.ReadToEnd().Trim(); + var validMimeTypes = new HashSet(StringComparer.OrdinalIgnoreCase) + { + "application/hwp+zip", + "application/vnd.hancom.hwp", + "application/vnd.hancom.hwpx", + "application/haansofthwp" + }; + if (!validMimeTypes.Contains(mimeContent)) + { + errors.Add(new ValidationError( + "package_mimetype_invalid", + $"Unexpected MIME type '{mimeContent}' (expected one of: {string.Join(", ", validMimeTypes)})", + "/mimetype", + mimeContent)); + } + } + + // Check META-INF/container.xml + var containerEntry = _doc.Archive.GetEntry("META-INF/container.xml"); + if (containerEntry != null) + { + try + { + using var stream = containerEntry.Open(); + XDocument.Load(stream); // parse test + } + catch (Exception ex) + { + errors.Add(new ValidationError( + "opf_container_invalid", + $"META-INF/container.xml is not valid XML: {ex.Message}", + "/META-INF/container.xml", + "container.xml")); + critical = true; + } + } + + // Plan 81: Rootfile resolution check + if (containerEntry != null && _doc.RootfilePath != null) + { + var rootEntry = _doc.Archive.GetEntry(_doc.RootfilePath); + if (rootEntry == null) + { + errors.Add(new ValidationError( + "package_rootfile_missing", + $"container.xml rootfile points to '{_doc.RootfilePath}' but entry not found in 
archive", + "/META-INF/container.xml", + null)); + critical = true; + } + } + + // Plan 81: version.xml check (warning only). Real Hancom exports have + // appeared with version.xml at the package root, so accept both forms. + var versionEntry = _doc.Archive.GetEntry("Contents/version.xml") + ?? _doc.Archive.GetEntry("version.xml"); + if (versionEntry == null) + { + errors.Add(new ValidationError( + "package_version_missing", + "Contents/version.xml not found (optional but recommended by OWPML spec)", + "/Contents/version.xml", + null)); + // Warning severity — not critical + } + + // Plan 81: Section count consistency (manifest vs loaded) + if (_doc.ManifestDoc != null) + { + var manifestSectionCount = _doc.ManifestDoc.Descendants() + .Count(e => e.Attribute("media-type")?.Value?.Contains("section") ?? false); + if (manifestSectionCount != _doc.Sections.Count && manifestSectionCount > 0) + { + errors.Add(new ValidationError( + "package_section_mismatch", + $"Manifest declares {manifestSectionCount} sections but {_doc.Sections.Count} loaded", + "/Contents/content.hpf", + null)); + } + } + + // Check content.hpf (the OPF package file) + var hpfEntry = _doc.Archive.GetEntry("Contents/content.hpf"); + if (hpfEntry == null) + { + errors.Add(new ValidationError( + "opf_missing_hpf", + "HWPX package missing 'Contents/content.hpf' manifest", + "/Contents/content.hpf", + null)); + critical = true; + } + else + { + try + { + using var stream = hpfEntry.Open(); + XDocument.Load(stream); // parse test + } + catch (Exception ex) + { + errors.Add(new ValidationError( + "opf_hpf_invalid", + $"Contents/content.hpf is not valid XML: {ex.Message}", + "/Contents/content.hpf", + "content.hpf")); + critical = true; + } + } + + return !critical; + } + + /// + /// Level 3: Verify all .xml entries in the archive parse without exception. 
+ /// + private void ValidateXmlWellFormedness(List errors) + { + foreach (var entry in _doc.Archive.Entries) + { + if (!entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase)) + continue; + + try + { + using var stream = entry.Open(); + XDocument.Load(stream); + } + catch (Exception ex) + { + errors.Add(new ValidationError( + "xml_malformed", + $"XML parse error in '{entry.FullName}': {ex.Message}", + $"/{entry.FullName}", + entry.FullName)); + } + } + } + + /// + /// Level 4: Verify all charPrIDRef values on hp:run elements reference + /// existing charPr entries in header.xml. + /// Scans ALL sections (not just PrimarySection). + /// + private void ValidateIdRefConsistency(List errors) + { + if (_doc.Header?.Root == null) + { + // No header.xml — can't validate refs, but this is already flagged by ViewAsIssues + return; + } + + // Collect all valid charPr IDs from header.xml + var validCharPrIds = new HashSet( + _doc.Header.Root + .Descendants(HwpxNs.Hh + "charPr") + .Select(cp => cp.Attribute("id")?.Value) + .Where(id => id != null)!); + + // Scan ALL sections for charPrIDRef references + foreach (var section in _doc.Sections) + { + int localParaIdx = 0; + foreach (var para in section.Paragraphs) + { + localParaIdx++; + int runIdx = 0; + foreach (var run in para.Elements(HwpxNs.Hp + "run")) + { + runIdx++; + var charPrIdRef = run.Attribute("charPrIDRef")?.Value; + if (charPrIdRef == null) continue; + + if (!validCharPrIds.Contains(charPrIdRef)) + { + var path = $"/section[{section.Index + 1}]/p[{localParaIdx}]/run[{runIdx}]"; + errors.Add(new ValidationError( + "idref_dangling", + $"charPrIDRef=\"{charPrIdRef}\" references non-existent charPr in header.xml", + path, + section.EntryPath)); + } + } + } + } + + // Also validate paraPrIDRef references + var validParaPrIds = new HashSet( + _doc.Header.Root + .Descendants(HwpxNs.Hh + "paraPr") + .Select(pp => pp.Attribute("id")?.Value) + .Where(id => id != null)!); + + foreach (var section in 
_doc.Sections) + { + int localParaIdx = 0; + foreach (var para in section.Paragraphs) + { + localParaIdx++; + var paraPrIdRef = para.Attribute("paraPrIDRef")?.Value; + if (paraPrIdRef == null) continue; + + if (!validParaPrIds.Contains(paraPrIdRef)) + { + var path = $"/section[{section.Index + 1}]/p[{localParaIdx}]"; + errors.Add(new ValidationError( + "idref_dangling", + $"paraPrIDRef=\"{paraPrIdRef}\" references non-existent paraPr in header.xml", + path, + section.EntryPath)); + } + } + } + + // Validate styleIDRef references + var validStyleIds = new HashSet( + _doc.Header.Root + .Descendants(HwpxNs.Hh + "style") + .Select(s => s.Attribute("id")?.Value) + .Where(id => id != null)!); + + foreach (var section in _doc.Sections) + { + int localParaIdx = 0; + foreach (var para in section.Paragraphs) + { + localParaIdx++; + var styleIdRef = para.Attribute("styleIDRef")?.Value; + if (styleIdRef == null) continue; + + if (!validStyleIds.Contains(styleIdRef)) + { + var path = $"/section[{section.Index + 1}]/p[{localParaIdx}]"; + errors.Add(new ValidationError( + "idref_dangling", + $"styleIDRef=\"{styleIdRef}\" references non-existent style in header.xml", + path, + section.EntryPath)); + } + } + } + } + + /// + /// Level 5: Validate table cell structure. 
+ /// Every hp:tc must have: + /// - cellAddr (child element OR tc attributes with colAddr/rowAddr) + /// - cellSz (child element with width and height) + /// - cellMargin (child element with left, right, top, bottom) + /// - subList with ≥1 hp:p + /// + private void ValidateTableStructure(List errors) + { + foreach (var section in _doc.Sections) + { + int tblIdx = 0; + foreach (var tbl in section.Tables) + { + tblIdx++; + int trIdx = 0; + foreach (var tr in tbl.Elements(HwpxNs.Hp + "tr")) + { + trIdx++; + int tcIdx = 0; + foreach (var tc in tr.Elements(HwpxNs.Hp + "tc")) + { + tcIdx++; + var basePath = $"/section[{section.Index + 1}]/tbl[{tblIdx}]/tr[{trIdx}]/tc[{tcIdx}]"; + var part = section.EntryPath; + + // Check cellAddr (dual-format: child element OR tc attributes — require BOTH colAddr AND rowAddr) + var cellAddrChild = tc.Element(HwpxNs.Hp + "cellAddr"); + var hasAttrAddr = tc.Attribute("colAddr") != null && tc.Attribute("rowAddr") != null; // require BOTH + if (cellAddrChild != null) + { + // child element form: must have both attributes + if (cellAddrChild.Attribute("colAddr") == null || cellAddrChild.Attribute("rowAddr") == null) + errors.Add(new ValidationError( + "table_celladdr_incomplete", + "cellAddr element missing 'colAddr' or 'rowAddr' attribute", + basePath, part)); + } + else if (!hasAttrAddr) + { + errors.Add(new ValidationError( + "table_missing_celladdr", + "Table cell missing cellAddr (no child element and no colAddr/rowAddr attributes)", + basePath, + part)); + } + + // Check cellSz + var cellSz = tc.Element(HwpxNs.Hp + "cellSz"); + if (cellSz == null) + { + errors.Add(new ValidationError( + "table_missing_cellsz", + "Table cell missing cellSz element (width and height)", + basePath, + part)); + } + else + { + if (cellSz.Attribute("width") == null) + errors.Add(new ValidationError( + "table_cellsz_no_width", + "cellSz element missing 'width' attribute", + basePath, + part)); + if (cellSz.Attribute("height") == null) + errors.Add(new 
ValidationError(
+                                    "table_cellsz_no_height",
+                                    "cellSz element missing 'height' attribute",
+                                    basePath,
+                                    part));
+                        }
+
+                        // Check cellMargin
+                        var cellMargin = tc.Element(HwpxNs.Hp + "cellMargin");
+                        if (cellMargin == null)
+                        {
+                            errors.Add(new ValidationError(
+                                "table_missing_cellmargin",
+                                "Table cell missing cellMargin element (left, right, top, bottom)",
+                                basePath,
+                                part));
+                        }
+                        else
+                        {
+                            foreach (var side in new[] { "left", "right", "top", "bottom" })
+                            {
+                                if (cellMargin.Attribute(side) == null)
+                                    errors.Add(new ValidationError(
+                                        "table_cellmargin_incomplete",
+                                        $"cellMargin element missing '{side}' attribute",
+                                        basePath,
+                                        part));
+                            }
+                        }
+
+                        // Check subList with ≥1 hp:p
+                        var subList = tc.Element(HwpxNs.Hp + "subList");
+                        if (subList == null)
+                        {
+                            errors.Add(new ValidationError(
+                                "table_missing_sublist",
+                                "Table cell missing subList element (must contain ≥1 paragraph)",
+                                basePath,
+                                part));
+                        }
+                        else
+                        {
+                            var paraCount = subList.Elements(HwpxNs.Hp + "p").Count();
+                            if (paraCount == 0)
+                            {
+                                errors.Add(new ValidationError(
+                                    "table_empty_sublist",
+                                    "Table cell subList contains no paragraphs (must have ≥1 hp:p)",
+                                    basePath,
+                                    part));
+                            }
+                        }
+                    }
+                }
+            }
+        }
+    }
+
+    /// <summary>
+    /// Level 6: Verify that required namespace declarations exist in respective root elements.
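The Level-5 rules above (cellAddr in either form, cellSz, cellMargin, subList with at least one paragraph) reduce to a small per-cell check. A Python sketch, with namespaces dropped and element names assumed from the C# code:

```python
# Per-cell structural check mirroring ValidateTableStructure: a tc needs an
# address (child element or colAddr/rowAddr attributes), a size, margins,
# and a subList containing at least one paragraph.
import xml.etree.ElementTree as ET

def check_cell(tc: ET.Element) -> list[str]:
    problems = []
    children = {child.tag for child in tc}
    # cellAddr is dual-format: child element OR colAddr/rowAddr attributes on tc
    if "cellAddr" not in children and not ("colAddr" in tc.attrib and "rowAddr" in tc.attrib):
        problems.append("missing cellAddr")
    for name in ("cellSz", "cellMargin"):
        if name not in children:
            problems.append(f"missing {name}")
    sub = tc.find("subList")
    if sub is None or sub.find("p") is None:
        problems.append("subList needs >=1 p")
    return problems
```

The real validator also distinguishes finer error codes (e.g. `table_cellsz_no_width` vs `table_missing_cellsz`); this sketch only shows the presence checks.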
+    /// - Hs namespace in section roots
+    /// - Hp namespace in section roots
+    /// - Hh namespace in header.xml root
+    /// </summary>
+    private void ValidateNamespaceDeclarations(List<ValidationError> errors)
+    {
+        // Check section roots for Hs and Hp namespaces
+        foreach (var section in _doc.Sections)
+        {
+            var root = section.Root;
+            var declaredNamespaces = root.Attributes()
+                .Where(a => a.IsNamespaceDeclaration)
+                .Select(a => a.Value)
+                .ToHashSet();
+
+            // Also check the namespace of the root element itself
+            var rootNs = root.Name.Namespace.NamespaceName;
+
+            if (!declaredNamespaces.Contains(HwpxNs.Hs.NamespaceName)
+                && rootNs != HwpxNs.Hs.NamespaceName)
+            {
+                errors.Add(new ValidationError(
+                    "ns_missing",
+                    $"Section {section.Index + 1} root missing Hs namespace declaration " +
+                    $"({HwpxNs.Hs.NamespaceName})",
+                    $"/section[{section.Index + 1}]",
+                    section.EntryPath));
+            }
+
+            // Check for Hp namespace (used by child elements)
+            var hasHpInTree = root.Descendants()
+                .Any(e => e.Name.Namespace == HwpxNs.Hp);
+            if (hasHpInTree
+                && !declaredNamespaces.Contains(HwpxNs.Hp.NamespaceName)
+                && rootNs != HwpxNs.Hp.NamespaceName)
+            {
+                errors.Add(new ValidationError(
+                    "ns_missing",
+                    $"Section {section.Index + 1} root missing Hp namespace declaration " +
+                    $"({HwpxNs.Hp.NamespaceName}) — elements use this namespace",
+                    $"/section[{section.Index + 1}]",
+                    section.EntryPath));
+            }
+        }
+
+        // Check header.xml for Hh namespace
+        if (_doc.Header?.Root != null)
+        {
+            var headerRoot = _doc.Header.Root;
+            var declaredNamespaces = headerRoot.Attributes()
+                .Where(a => a.IsNamespaceDeclaration)
+                .Select(a => a.Value)
+                .ToHashSet();
+
+            var rootNs = headerRoot.Name.Namespace.NamespaceName;
+
+            if (!declaredNamespaces.Contains(HwpxNs.Hh.NamespaceName)
+                && rootNs != HwpxNs.Hh.NamespaceName)
+            {
+                var hasHhInTree = headerRoot.Descendants()
+                    .Any(e => e.Name.Namespace == HwpxNs.Hh);
+                if (hasHhInTree)
+                {
+                    errors.Add(new ValidationError(
+                        "ns_missing",
+                        $"header.xml root missing Hh namespace 
declaration " +
+                        $"({HwpxNs.Hh.NamespaceName}) — elements use this namespace",
+                        "/header",
+                        _doc.HeaderEntryPath ?? "Contents/header.xml"));
+                }
+            }
+        }
+    }
+
+    // Plan 94: BinData integrity (merged from ViewAsIssues Level 7)
+    private void ValidateBinDataIntegrity(List<ValidationError> errors)
+    {
+        var referenced = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
+        foreach (var sec in _doc.Sections)
+            foreach (var el in sec.Root.Descendants())
+            {
+                var binRef = el.Attribute("binaryItemIDRef")?.Value;
+                if (binRef != null) referenced.Add(binRef);
+            }
+
+        foreach (var manifestRef in GetManifestBinDataIds())
+            referenced.Add(manifestRef);
+
+        var actual = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
+        foreach (var entry in _doc.Archive.Entries)
+            if (entry.FullName.Contains("BinData/", StringComparison.OrdinalIgnoreCase)
+                && !entry.FullName.EndsWith("/", StringComparison.Ordinal))
+                actual.Add(System.IO.Path.GetFileNameWithoutExtension(entry.FullName));
+
+        foreach (var missing in referenced.Except(actual))
+            errors.Add(new ValidationError("bindata_missing",
+                $"Referenced binary '{missing}' not found in archive",
+                "/BinData", null));
+
+        foreach (var orphan in actual.Except(referenced))
+            errors.Add(new ValidationError("bindata_orphan",
+                $"Orphan binary '{orphan}' not referenced by any element",
+                "/BinData", null));
+    }
+
+    private IEnumerable<string> GetManifestBinDataIds()
+    {
+        if (_doc.ManifestDoc?.Root == null) yield break;
+
+        foreach (var item in _doc.ManifestDoc.Descendants())
+        {
+            var href = item.Attribute("href")?.Value;
+            if (href == null || !href.StartsWith("BinData/", StringComparison.OrdinalIgnoreCase))
+                continue;
+
+            var id = item.Attribute("id")?.Value;
+            if (!string.IsNullOrWhiteSpace(id))
+            {
+                yield return id;
+                continue;
+            }
+
+            var fileId = System.IO.Path.GetFileNameWithoutExtension(href);
+            if (!string.IsNullOrWhiteSpace(fileId))
+                yield return fileId;
+        }
+    }
+
+    // Plan 94: Field pair consistency (merged from ViewAsIssues Level 8)
+    private void 
ValidateFieldPairs(List<ValidationError> errors)
+    {
+        foreach (var sec in _doc.Sections)
+        {
+            var begins = sec.Root.Descendants(HwpxNs.Hp + "fieldBegin").Count();
+            var ends = sec.Root.Descendants(HwpxNs.Hp + "fieldEnd").Count();
+            if (begins != ends)
+                errors.Add(new ValidationError("field_pair_mismatch",
+                    $"Section {sec.Index + 1}: {begins} fieldBegin vs {ends} fieldEnd",
+                    $"/section[{sec.Index + 1}]", null));
+        }
+    }
+
+    // Plan 94: Section count consistency (merged from ViewAsIssues Level 9)
+    private void ValidateSectionCount(List<ValidationError> errors)
+    {
+        if (_doc.ManifestDoc == null) return;
+        var manifestSectionCount = _doc.ManifestDoc.Descendants()
+            .Count(e => e.Attribute("media-type")?.Value?.Contains("section") ?? false);
+        if (manifestSectionCount > 0 && manifestSectionCount != _doc.Sections.Count)
+        {
+            errors.Add(new ValidationError("package_section_mismatch",
+                $"Manifest declares {manifestSectionCount} sections but {_doc.Sections.Count} loaded",
+                "/Contents/content.hpf", null));
+        }
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxHandler.View.cs b/src/officecli/Handlers/Hwpx/HwpxHandler.View.cs
new file mode 100644
index 000000000..719b62b5a
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxHandler.View.cs
@@ -0,0 +1,1439 @@
+using System.Text;
+using System.Text.Json.Nodes;
+using System.Xml.Linq;
+using OfficeCli.Core;
+
+namespace OfficeCli.Handlers;
+
+public partial class HwpxHandler
+{
+    private sealed record FormFieldInfo(string Type, string Id, string Name, string Text, string? HelpText, bool IsDefault);
+
+    // ==================== View Layer ====================
+
+    public string ViewAsText(int? startLine = null, int? endLine = null,
+        int? maxLines = null, HashSet<int>? 
cols = null)
+    {
+        var sb = new StringBuilder();
+        int lineNum = 0;
+        int emitted = 0;
+
+        foreach (var (section, para, path) in _doc.AllContentInOrder())
+        {
+            lineNum++;
+            if (startLine.HasValue && lineNum < startLine.Value) continue;
+            if (endLine.HasValue && lineNum > endLine.Value) break;
+
+            var rawText = ExtractParagraphText(para);
+            var text = HwpxKorean.Normalize(rawText);
+
+            if (maxLines.HasValue && emitted >= maxLines.Value)
+            {
+                sb.AppendLine($"... (more lines)");
+                break;
+            }
+
+            sb.AppendLine($"{lineNum}. {text}");
+            emitted++;
+        }
+
+        return sb.ToString().TrimEnd();
+    }
+
+    public string ViewAsAnnotated(int? startLine = null, int? endLine = null,
+        int? maxLines = null, HashSet<int>? cols = null)
+    {
+        var sb = new StringBuilder();
+        int lineNum = 0;
+        int emitted = 0;
+
+        foreach (var (section, para, localIdx) in _doc.AllParagraphs())
+        {
+            lineNum++;
+            if (startLine.HasValue && lineNum < startLine.Value) continue;
+            if (endLine.HasValue && lineNum > endLine.Value) break;
+            if (maxLines.HasValue && emitted >= maxLines.Value)
+            {
+                int remaining = CountRemainingParagraphs(lineNum);
+                if (remaining > 0)
+                    sb.AppendLine($"... ({remaining} more lines)");
+                break;
+            }
+
+            var path = $"/section[{section.Index + 1}]/p[{localIdx + 1}]";
+            var styleInfo = GetParagraphStyleInfo(para);
+            var runs = ExtractAnnotatedRuns(para);
+            var text = string.Join("", runs.Select(r => r.Text));
+            text = HwpxKorean.Normalize(text);
+
+            // Build annotation prefix
+            var annotations = new List<string>();
+            if (!string.IsNullOrEmpty(styleInfo.HeadingLevel))
+                annotations.Add($"h{styleInfo.HeadingLevel}");
+            if (styleInfo.Alignment != "LEFT")
+                annotations.Add(styleInfo.Alignment.ToLowerInvariant());
+
+            var prefix = annotations.Count > 0 ? $"[{string.Join(",", annotations)}] " : "";
+            sb.AppendLine($"{lineNum}. 
{path} {prefix}{text}"); + emitted++; + } + + return sb.ToString().TrimEnd(); + } + + public string ViewAsOutline() + { + var sb = new StringBuilder(); + + foreach (var (section, para, localIdx) in _doc.AllParagraphs()) + { + var styleInfo = GetParagraphStyleInfo(para); + if (string.IsNullOrEmpty(styleInfo.HeadingLevel)) continue; + + var level = int.Parse(styleInfo.HeadingLevel); + var indent = new string(' ', (level - 1) * 2); + var text = HwpxKorean.Normalize(ExtractParagraphText(para)); + var preview = text.Length > 80 ? text[..80] + "…" : text; + var path = $"/section[{section.Index + 1}]/p[{localIdx + 1}]"; + + sb.AppendLine($"{indent}h{level}: {preview} ({path})"); + } + + return sb.Length > 0 ? sb.ToString().TrimEnd() : "(no headings found)"; + } + + public string ViewAsStats() + { + int totalParas = 0, totalTables = 0, totalChars = 0, totalWords = 0; + int totalImages = 0; + + foreach (var sec in _doc.Sections) + { + totalParas += sec.Paragraphs.Count; + totalTables += sec.Tables.Count; + totalImages += sec.Root.Descendants(HwpxNs.Hp + "img").Count(); + + foreach (var p in sec.Paragraphs) + { + var text = HwpxKorean.Normalize(ExtractParagraphText(p)); + totalChars += text.Length; + totalWords += CountWords(text); + } + } + + var sb = new StringBuilder(); + sb.AppendLine($"Sections: {_doc.Sections.Count}"); + sb.AppendLine($"Paragraphs: {totalParas}"); + sb.AppendLine($"Tables: {totalTables}"); + sb.AppendLine($"Images: {totalImages}"); + sb.AppendLine($"Characters: {totalChars}"); + sb.AppendLine($"Words: {totalWords}"); + + // Page info — iterate ALL sections for aggregate stats; use first secPr for page size reference + foreach (var sec in _doc.Sections) + { + var secPr = sec.Root.Descendants(HwpxNs.Hp + "secPr").FirstOrDefault(); + var pagePr = secPr?.Element(HwpxNs.Hp + "pagePr"); + if (pagePr != null) + { + var width = (int?)pagePr.Attribute("width") ?? 0; + var height = (int?)pagePr.Attribute("height") ?? 
0;
+                sb.AppendLine($"Page size: {FormatHwpUnit(width)} × {FormatHwpUnit(height)}");
+                break; // Report first section's page size; add per-section loop if needed
+            }
+        }
+
+        // Metadata
+        var meta = GetMetadata();
+        if (meta.TryGetValue("title", out var mTitle) && !string.IsNullOrEmpty(mTitle))
+            sb.AppendLine($"Title: {mTitle}");
+        if (meta.TryGetValue("creator", out var mCreator) && !string.IsNullOrEmpty(mCreator))
+            sb.AppendLine($"Creator: {mCreator}");
+
+        return sb.ToString().TrimEnd();
+    }
+
+    public JsonNode ViewAsStatsJson()
+    {
+        int totalParas = 0, totalTables = 0, totalChars = 0, totalWords = 0;
+        int totalImages = 0;
+
+        foreach (var sec in _doc.Sections)
+        {
+            totalParas += sec.Paragraphs.Count;
+            totalTables += sec.Tables.Count;
+            totalImages += sec.Root.Descendants(HwpxNs.Hp + "img").Count();
+
+            foreach (var p in sec.Paragraphs)
+            {
+                var text = HwpxKorean.Normalize(ExtractParagraphText(p));
+                totalChars += text.Length;
+                totalWords += CountWords(text);
+            }
+        }
+
+        return new JsonObject
+        {
+            ["sections"] = _doc.Sections.Count,
+            ["paragraphs"] = totalParas,
+            ["tables"] = totalTables,
+            ["images"] = totalImages,
+            ["characters"] = totalChars,
+            ["words"] = totalWords,
+        };
+    }
+
+    public JsonNode ViewAsOutlineJson()
+    {
+        var items = new JsonArray();
+
+        foreach (var (section, para, localIdx) in _doc.AllParagraphs())
+        {
+            var styleInfo = GetParagraphStyleInfo(para);
+            if (string.IsNullOrEmpty(styleInfo.HeadingLevel)) continue;
+
+            var level = int.Parse(styleInfo.HeadingLevel);
+            var text = HwpxKorean.Normalize(ExtractParagraphText(para));
+            var path = $"/section[{section.Index + 1}]/p[{localIdx + 1}]";
+
+            items.Add(new JsonObject
+            {
+                ["level"] = level,
+                ["text"] = text,
+                ["path"] = path,
+            });
+        }
+
+        return items;
+    }
+
+    public JsonNode ViewAsTextJson(int? startLine = null, int? endLine = null,
+        int? maxLines = null, HashSet<int>? 
cols = null)
+    {
+        var lines = new JsonArray();
+        int lineNum = 0;
+        int emitted = 0;
+
+        foreach (var (section, para, path) in _doc.AllContentInOrder())
+        {
+            lineNum++;
+            if (startLine.HasValue && lineNum < startLine.Value) continue;
+            if (endLine.HasValue && lineNum > endLine.Value) break;
+            if (maxLines.HasValue && emitted >= maxLines.Value) break;
+
+            var text = HwpxKorean.Normalize(ExtractParagraphText(para));
+
+            lines.Add(new JsonObject
+            {
+                ["line"] = lineNum,
+                ["path"] = path,
+                ["text"] = text,
+            });
+            emitted++;
+        }
+
+        return new JsonObject
+        {
+            ["lines"] = lines,
+            ["totalLines"] = lineNum,
+        };
+    }
+
+    public List<DocumentIssue> ViewAsIssues(string? issueType = null, int? limit = null)
+    {
+        var issues = new List<DocumentIssue>();
+        int issueId = 0;
+
+        // Check paragraph content
+        foreach (var (section, para, localIdx) in _doc.AllParagraphs())
+        {
+            var text = ExtractParagraphText(para);
+            if (string.IsNullOrWhiteSpace(text))
+            {
+                // Skip — empty paragraphs are normal spacing
+                continue;
+            }
+
+            // Check for PUA characters (corruption indicator)
+            if (text.Any(c => c >= '\uE000' && c <= '\uF8FF'))
+            {
+                issues.Add(new DocumentIssue
+                {
+                    Id = $"HWPX-{++issueId:D3}",
+                    Type = IssueType.Content,
+                    Severity = IssueSeverity.Warning,
+                    Path = $"/section[{section.Index + 1}]/p[{localIdx + 1}]",
+                    Message = "Paragraph contains Private Use Area characters",
+                    Context = text[..Math.Min(text.Length, 50)]
+                });
+            }
+        }
+
+        // Check for tables with inconsistent column counts
+        foreach (var (section, tbl, tblIdx) in _doc.AllTables())
+        {
+            var rows = tbl.Elements(HwpxNs.Hp + "tr").ToList();
+            if (rows.Count == 0) continue;
+
+            var expectedCols = (int?)tbl.Attribute("colCnt") ?? -1;
+            foreach (var (row, rowIdx) in rows.Select((r, i) => (r, i)))
+            {
+                // Sum colSpan values (handles merged cells); GetCellAddr is defined in this partial class
+                var colSpanSum = row.Elements(HwpxNs.Hp + "tc")
+                    .Sum(tc => (int?)GetCellAddr(tc).ColSpan ?? 
1);
+                if (expectedCols >= 0 && colSpanSum != expectedCols)
+                {
+                    issues.Add(new DocumentIssue
+                    {
+                        Id = $"HWPX-{++issueId:D3}",
+                        Type = IssueType.Structure,
+                        Severity = IssueSeverity.Error,
+                        Path = $"/section[{section.Index + 1}]/tbl[{tblIdx + 1}]/tr[{rowIdx + 1}]",
+                        Message = $"Row colSpan sum {colSpanSum} != expected {expectedCols}",
+                        Context = null
+                    });
+                }
+            }
+        }
+
+        // Check for missing header.xml
+        if (_doc.Header == null)
+        {
+            issues.Add(new DocumentIssue
+            {
+                Id = $"HWPX-{++issueId:D3}",
+                Type = IssueType.Structure,
+                Severity = IssueSeverity.Warning,
+                Path = "/",
+                Message = "Document missing header.xml (style definitions unavailable)",
+                Context = null
+            });
+        }
+
+        // Level 7: BinData integrity — orphan/missing binary references
+        var referencedBinData = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
+        foreach (var sec in _doc.Sections)
+        {
+            foreach (var el in sec.Root.Descendants())
+            {
+                var binRef = el.Attribute("binaryItemIDRef")?.Value;
+                if (binRef != null) referencedBinData.Add(binRef);
+            }
+        }
+        var actualBinData = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
+        foreach (var entry in _doc.Archive.Entries)
+        {
+            if (entry.FullName.Contains("BinData/", StringComparison.OrdinalIgnoreCase))
+                actualBinData.Add(System.IO.Path.GetFileNameWithoutExtension(entry.FullName));
+        }
+        foreach (var missing in referencedBinData.Except(actualBinData))
+        {
+            issues.Add(new DocumentIssue
+            {
+                Id = $"HWPX-{++issueId:D3}", Type = IssueType.Structure,
+                Severity = IssueSeverity.Error, Path = "/BinData",
+                Message = $"Referenced binary '{missing}' not found in archive",
+                Context = null
+            });
+        }
+        foreach (var orphan in actualBinData.Except(referencedBinData))
+        {
+            issues.Add(new DocumentIssue
+            {
+                Id = $"HWPX-{++issueId:D3}", Type = IssueType.Structure,
+                Severity = IssueSeverity.Info, Path = "/BinData",
+                Message = $"Orphan binary '{orphan}' not referenced by any element",
+                Context = null
+            });
+        }
+
+        // Level 8: Field pair validation — unclosed 
fieldBegin/fieldEnd
+        foreach (var sec in _doc.Sections)
+        {
+            var fieldBegins = sec.Root.Descendants(HwpxNs.Hp + "fieldBegin").ToList();
+            var fieldEnds = sec.Root.Descendants(HwpxNs.Hp + "fieldEnd").ToList();
+            if (fieldBegins.Count != fieldEnds.Count)
+            {
+                issues.Add(new DocumentIssue
+                {
+                    Id = $"HWPX-{++issueId:D3}", Type = IssueType.Structure,
+                    Severity = IssueSeverity.Warning,
+                    Path = $"/section[{sec.Index + 1}]",
+                    Message = $"Field count mismatch: {fieldBegins.Count} opens vs {fieldEnds.Count} closes",
+                    Context = null
+                });
+            }
+        }
+
+        // Level 9: Section count consistency — manifest vs actual
+        if (_doc.ManifestDoc != null)
+        {
+            var manifestSections = _doc.ManifestDoc.Descendants()
+                .Count(e => e.Attribute("media-type")?.Value == "application/xml"
+                    && (e.Attribute("href")?.Value?.StartsWith("section") ?? false));
+            if (manifestSections != _doc.Sections.Count)
+            {
+                issues.Add(new DocumentIssue
+                {
+                    Id = $"HWPX-{++issueId:D3}", Type = IssueType.Structure,
+                    Severity = IssueSeverity.Error, Path = "/content.hpf",
+                    Message = $"Section count mismatch: manifest={manifestSections}, loaded={_doc.Sections.Count}",
+                    Context = null
+                });
+            }
+        }
+
+        // Filter by type
+        if (issueType != null)
+        {
+            var filterType = Enum.Parse<IssueType>(issueType, ignoreCase: true);
+            issues = issues.Where(i => i.Type == filterType).ToList();
+        }
+
+        // Apply limit
+        if (limit.HasValue)
+            issues = issues.Take(limit.Value).ToList();
+
+        return issues;
+    }
+
+    // ==================== Forms ====================
+
+    public string ViewAsForms(bool auto = true)
+    {
+        var sb = new StringBuilder();
+        var fields = EnumerateInteractiveFormFields().ToList();
+        foreach (var field in fields)
+        {
+            var nameSuffix = string.IsNullOrEmpty(field.Name) ? "" : $" {field.Name}";
+            sb.AppendLine($"  [{field.Id}] {field.Type}{nameSuffix}: \"{field.Text}\"{(field.IsDefault ? 
" (default)" : "")}");
+        }
+        sb.Insert(0, $"Form fields: {fields.Count}\n");
+
+        if (auto)
+        {
+            var recognized = RecognizeFormFields();
+            if (recognized.Count > 0)
+            {
+                var adjacentCount = recognized.Count(f => f.Strategy == "adjacent");
+                var headerDataCount = recognized.Count(f => f.Strategy == "header-data");
+                var strategySummary = new List<string>();
+                if (adjacentCount > 0) strategySummary.Add($"{adjacentCount} adjacent");
+                if (headerDataCount > 0) strategySummary.Add($"{headerDataCount} header-data");
+                var otherCount = recognized.Count - adjacentCount - headerDataCount;
+                if (otherCount > 0) strategySummary.Add($"{otherCount} other");
+
+                sb.AppendLine();
+                sb.AppendLine($"Forms: {recognized.Count} fields recognized ({string.Join(", ", strategySummary)})");
+                sb.AppendLine();
+
+                // Compute column widths
+                int labelW = Math.Max(5, recognized.Max(f => f.Label.Length));
+                int valueW = Math.Max(5, recognized.Max(f => f.Value.Length));
+                int pathW = Math.Max(4, recognized.Max(f => f.Path.Length));
+                int stratW = Math.Max(8, recognized.Max(f => f.Strategy.Length));
+
+                // Cap widths to keep output readable
+                labelW = Math.Min(labelW, 20);
+                valueW = Math.Min(valueW, 24);
+                pathW = Math.Min(pathW, 44);
+
+                sb.AppendLine($"  {"Label".PadRight(labelW)}  {"Value".PadRight(valueW)}  {"Path".PadRight(pathW)}  Strategy");
+                sb.AppendLine($"  {new string('\u2500', labelW + 2 + valueW + 2 + pathW + 2 + stratW)}");
+
+                foreach (var f in recognized)
+                {
+                    var label = f.Label.Length > labelW ? f.Label[..(labelW - 1)] + "\u2026" : f.Label.PadRight(labelW);
+                    var value = f.Value.Length > valueW ? f.Value[..(valueW - 1)] + "\u2026" : f.Value.PadRight(valueW);
+                    var path = f.Path.Length > pathW ? 
f.Path[..(pathW - 1)] + "\u2026" : f.Path.PadRight(pathW);
+                    sb.AppendLine($"  {label}  {value}  {path}  [auto:{f.Strategy}]");
+                }
+            }
+
+            // F8: Form confidence score
+            int totalTables = _doc.Sections.Sum(s => s.Tables.Count);
+            if (totalTables > 0)
+            {
+                var formTablePaths = recognized
+                    .Select(f => System.Text.RegularExpressions.Regex.Match(f.Path, @"^/section\[\d+\]/tbl\[\d+\]").Value)
+                    .Where(p => !string.IsNullOrEmpty(p))
+                    .Distinct()
+                    .Count();
+                double confidence = (double)formTablePaths / totalTables;
+                sb.AppendLine();
+                sb.AppendLine($"Form confidence: {confidence:P0} ({formTablePaths}/{totalTables} tables are form-like)");
+            }
+        }
+
+        return sb.ToString().TrimEnd();
+    }
+
+    /// <summary>JSON output for forms view. Supports CLICK_HERE + auto-recognized fields.</summary>
+    public JsonNode ViewAsFormsJson(bool auto = true)
+    {
+        var result = new JsonObject();
+
+        var clickFields = new JsonArray();
+        var formFields = new JsonArray();
+        foreach (var field in EnumerateInteractiveFormFields())
+        {
+            if (field.Type == "CLICK_HERE")
+            {
+                clickFields.Add(new JsonObject {
+                    ["id"] = field.Id, ["text"] = field.Text,
+                    ["helpText"] = field.HelpText, ["isDefault"] = field.IsDefault
+                });
+            }
+
+            formFields.Add(new JsonObject {
+                ["id"] = field.Id,
+                ["type"] = field.Type,
+                ["name"] = field.Name,
+                ["text"] = field.Text,
+                ["helpText"] = field.HelpText,
+                ["isDefault"] = field.IsDefault
+            });
+        }
+        result["clickHere"] = clickFields;
+        result["formFields"] = formFields;
+
+        if (auto)
+        {
+            var autoFields = new JsonArray();
+            foreach (var f in RecognizeFormFields())
+            {
+                autoFields.Add(new JsonObject {
+                    ["label"] = f.Label, ["value"] = f.Value,
+                    ["path"] = f.Path, ["row"] = f.Row, ["col"] = f.Col,
+                    ["strategy"] = f.Strategy
+                });
+            }
+            result["autoRecognized"] = autoFields;
+        }
+
+        return result;
+    }
+
+    private IEnumerable<FormFieldInfo> EnumerateInteractiveFormFields()
+    {
+        foreach (var sec in _doc.Sections)
+        {
+            foreach (var run in sec.Root.Descendants(HwpxNs.Hp + "run"))
+            {
+                var ctrl = 
run.Element(HwpxNs.Hp + "ctrl");
+                var fieldBegin = ctrl?.Element(HwpxNs.Hp + "fieldBegin");
+                var fieldType = fieldBegin?.Attribute("type")?.Value;
+                if (fieldType is not ("CLICK_HERE" or "CHECKBOX" or "DROPDOWN")) continue;
+
+                var field = fieldBegin!;
+                var id = field.Attribute("id")?.Value ?? "?";
+                var name = field.Attribute("name")?.Value ?? "";
+                var helpText = field.Descendants()
+                    .FirstOrDefault(p => p.Attribute("name")?.Value is "Direction" or "Label")
+                    ?.Value;
+                var nextRun = run.ElementsAfterSelf(HwpxNs.Hp + "run").FirstOrDefault();
+                var text = nextRun?.Elements(HwpxNs.Hp + "t").FirstOrDefault()?.Value ?? "";
+                var isDefault = !string.IsNullOrEmpty(helpText) && text == helpText;
+
+                yield return new FormFieldInfo(fieldType, id, name, text, helpText, isDefault);
+            }
+        }
+    }
+
+    // ==================== Object Finder (Plan 82) ====================
+
+    private static readonly string[] DefaultObjectTypes = ["picture", "field", "bookmark", "equation"];
+
+    /// <summary>List objects of specified type(s) with paths and previews.</summary>
+    public string ViewAsObjects(string? objectType = null)
+    {
+        string[] types = objectType != null ? 
[objectType] : DefaultObjectTypes;
+        var sb = new StringBuilder();
+        int total = 0;
+
+        foreach (var type in types)
+        {
+            List<XElement> elements;
+            try { elements = ExecuteSelector(type); }
+            catch { continue; }
+
+            // formfield: list interactive form fields only
+            if (type == "formfield")
+                elements = elements.Where(e => e.Attribute("type")?.Value is "CLICK_HERE" or "CHECKBOX" or "DROPDOWN").ToList();
+
+            if (elements.Count == 0) continue;
+            total += elements.Count;
+            sb.AppendLine($"{type}: {elements.Count}");
+            foreach (var el in elements)
+            {
+                var path = BuildPath(el);
+                var preview = GetElementText(el);
+                if (preview.Length > 60) preview = preview[..60] + "…";
+                if (string.IsNullOrWhiteSpace(preview)) preview = $"({el.Name.LocalName})";
+
+                // Extra info per type
+                var extra = type switch
+                {
+                    "picture" or "img" => el.Attribute("binaryItemIDRef")?.Value is { } r ? $" [{r}]" : "",
+                    "field" or "formfield" => el.Attribute("type")?.Value is { } t ? $" [{t}]" : "",
+                    "equation" => "",
+                    "bookmark" => el.Attribute("name")?.Value is { } n ? $" [{n}]" : "",
+                    _ => ""
+                };
+                sb.AppendLine($"  {path}{extra}: {preview}");
+            }
+            sb.AppendLine();
+        }
+
+        if (total == 0)
+            return "(no objects found)";
+        sb.Insert(0, $"Objects: {total}\n\n");
+        return sb.ToString().TrimEnd();
+    }
+
+    /// <summary>JSON output for object finder.</summary>
+    public JsonNode ViewAsObjectsJson(string? objectType = null)
+    {
+        string[] types = objectType != null ? 
[objectType] : DefaultObjectTypes;
+        var result = new JsonObject();
+
+        foreach (var type in types)
+        {
+            List<XElement> elements;
+            try { elements = ExecuteSelector(type); }
+            catch { continue; }
+
+            if (type == "formfield")
+                elements = elements.Where(e => e.Attribute("type")?.Value is "CLICK_HERE" or "CHECKBOX" or "DROPDOWN").ToList();
+
+            if (elements.Count == 0) continue;
+
+            var arr = new JsonArray();
+            foreach (var el in elements)
+            {
+                var obj = new JsonObject
+                {
+                    ["path"] = BuildPath(el),
+                    ["text"] = GetElementText(el)
+                };
+                // Type-specific attributes
+                if (el.Attribute("binaryItemIDRef")?.Value is { } binRef) obj["binaryRef"] = binRef;
+                if (el.Attribute("type")?.Value is { } ft) obj["fieldType"] = ft;
+                if (el.Attribute("name")?.Value is { } bname) obj["name"] = bname;
+                arr.Add(obj);
+            }
+            result[type] = arr;
+        }
+
+        return result;
+    }
+
+    // ==================== Styles ====================
+
+    public string ViewAsStyles()
+    {
+        if (_doc.Header?.Root == null) return "(no header.xml)";
+        var sb = new StringBuilder();
+        var styles = _doc.Header.Root.Descendants(HwpxNs.Hh + "style").ToList();
+        sb.AppendLine($"Styles: {styles.Count}");
+        foreach (var style in styles)
+        {
+            var id = style.Attribute("id")?.Value ?? "?";
+            var name = style.Attribute("name")?.Value ?? "(unnamed)";
+            var engName = style.Attribute("engName")?.Value ?? "";
+            var type = style.Attribute("type")?.Value ?? "PARA";
+            var charPrId = style.Attribute("charPrIDRef")?.Value ?? "0";
+            var paraPrId = style.Attribute("paraPrIDRef")?.Value ?? "0";
+            var eng = !string.IsNullOrEmpty(engName) ? $" ({engName})" : "";
+            sb.AppendLine($"  [{id}] {name}{eng} [{type}] charPr={charPrId} paraPr={paraPrId}");
+        }
+        return sb.ToString().TrimEnd();
+    }
+
+    // ==================== Table Map (Plan 71) ====================
+
+    /// <summary>
+    /// Display all tables with grid structure, recognized labels, and cell paths.
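The table map below rests on `BuildTableGrid`, which places each cell into a row×col occupancy grid so merged cells can be detected (a cell whose stored address differs from the grid position is a span continuation). A minimal sketch of that occupancy idea, assuming cells arrive as `(row, col, rowSpan, colSpan)` tuples:

```python
# Occupancy-grid sketch for merged table cells: every grid slot covered by a
# span points back at the anchor cell's index, mirroring how the C# renderer
# prints "↕" for continuations whose address differs from the grid position.
def build_grid(cells, rows, cols):
    grid = [[None] * cols for _ in range(rows)]
    for idx, (r, c, rs, cs) in enumerate(cells):
        for rr in range(r, r + rs):
            for cc in range(c, c + cs):
                grid[rr][cc] = idx
    return grid
```

A 2-row column merge then shows the same anchor index in both rows, which is exactly the condition the viewer uses to skip duplicate output.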
+ /// + public string ViewAsTables() + { + var sb = new StringBuilder(); + int tblCount = 0; + + foreach (var (sec, tbl, localTblIdx) in _doc.AllTables()) + { + tblCount++; + var (grid, cellList) = BuildTableGrid(tbl); + if (cellList.Count == 0) continue; + + int maxRow = grid.GetLength(0), maxCol = grid.GetLength(1); + var basePath = $"/section[{sec.Index + 1}]/tbl[{localTblIdx + 1}]"; + sb.AppendLine($"Table {tblCount} ({basePath}, {maxRow}×{maxCol}):"); + + // Grid visualization + for (int r = 0; r < maxRow; r++) + { + sb.Append($" [{r}] "); + for (int c = 0; c < maxCol; c++) + { + var cell = grid[r, c]; + if (cell == null) { sb.Append("· "); continue; } + + // Skip duplicate merged cell refs (only show on first occurrence) + var (cr, cc, rs, cs) = GetCellAddr(cell); + if (cr != r || cc != c) { sb.Append("↕ "); continue; } + + var text = ExtractCellText(cell).Trim(); + var preview = text.Length > 12 ? text[..12] + "…" : text; + if (string.IsNullOrEmpty(preview)) preview = "(empty)"; + + var span = (rs > 1 || cs > 1) ? 
$"[{rs}×{cs}]" : "";
+                    sb.Append($"{preview}{span} ");
+                }
+                sb.AppendLine();
+            }
+
+            // Recognized fields for this table
+            var fields = new List<RecognizedField>();
+            var tableGrid = grid; // reuse
+            var seen = new HashSet<XElement>();
+            foreach (var (tc, row, col, rowSpan, colSpan) in cellList)
+            {
+                if (seen.Contains(tc)) continue;
+                seen.Add(tc);
+                var cellText = ExtractCellText(tc);
+                if (!IsLabelCell(cellText)) continue;
+                int targetCol = col + colSpan;
+                if (targetCol < maxCol)
+                {
+                    var valueCell = grid[row, targetCol];
+                    if (valueCell != null && valueCell != tc)
+                    {
+                        var value = ExtractCellText(valueCell).Trim();
+                        if (!string.IsNullOrEmpty(value))
+                            fields.Add(new RecognizedField(
+                                NormalizeLabel(cellText), value, basePath, row, col, "adjacent"));
+                    }
+                }
+            }
+            if (fields.Count > 0)
+            {
+                sb.AppendLine($"  Labels: {fields.Count}");
+                foreach (var f in fields)
+                    sb.AppendLine($"    {f.Label}: {f.Value} (r{f.Row},c{f.Col})");
+            }
+            sb.AppendLine();
+        }
+
+        if (tblCount == 0)
+            sb.AppendLine("(no tables)");
+        else
+            sb.Insert(0, $"Tables: {tblCount}\n\n");
+
+        return sb.ToString().TrimEnd();
+    }
+
+    // ==================== Markdown Export (Plan 72) ====================
+
+    /// <summary>Export document as GitHub Flavored Markdown.</summary>
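The "adjacent" strategy used above (a label cell paired with the non-empty cell immediately to its right) can be sketched over a plain grid of strings. The keyword set here is a small sample for illustration; the real recognizer uses the extended keyword list tested in Phase A and a proper `IsLabelCell` heuristic:

```python
# Sketch of the adjacent-label form recognition strategy on a 2-D string grid.
LABELS = {"성명", "주소", "생년월일", "전화번호"}  # sample subset, not the full keyword list

def recognize_adjacent(grid):
    fields = {}
    for row in grid:
        for c, text in enumerate(row[:-1]):
            label = text.strip().rstrip(":")
            # label cell + non-empty right neighbor → recognized field
            if label in LABELS and row[c + 1].strip():
                fields[label] = row[c + 1].strip()
    return fields
```

Merged cells change the target column (`col + colSpan` in the C# version), which this flat sketch ignores.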
+ public string ViewAsMarkdown() + { + var sb = new StringBuilder(); + + foreach (var (section, element, path) in _doc.AllContentInOrder()) + { + var localName = element.Name.LocalName; + if (localName == "p") + { + // Check if this paragraph is inside a table cell (skip — handled by table renderer) + if (element.Ancestors().Any(a => a.Name.LocalName == "tc")) continue; + + var styleInfo = GetParagraphStyleInfo(element); + var mdLine = ParagraphToMarkdown(element); + if (string.IsNullOrWhiteSpace(mdLine)) { sb.AppendLine(); continue; } + + if (!string.IsNullOrEmpty(styleInfo.HeadingLevel)) + { + var level = Math.Clamp(int.Parse(styleInfo.HeadingLevel), 1, 6); + sb.AppendLine($"{new string('#', level)} {mdLine}"); + } + else + { + sb.AppendLine(mdLine); + } + sb.AppendLine(); + } + } + + // Render tables + foreach (var (sec, tbl, localTblIdx) in _doc.AllTables()) + { + var (grid, cellList) = BuildTableGrid(tbl); + if (cellList.Count == 0) continue; + int maxRow = grid.GetLength(0), maxCol = grid.GetLength(1); + + // F5: Single-cell tables → emit as structured text instead of table + if (maxRow == 1 && maxCol == 1 && cellList.Count == 1) + { + var cellText = ExtractCellText(cellList[0].Tc).Trim(); + if (!string.IsNullOrEmpty(cellText)) + { + var lines = cellText.Split('\n'); + foreach (var line in lines) + { + var trimmed = line.Trim(); + if (string.IsNullOrEmpty(trimmed)) { sb.AppendLine(); continue; } + var m = System.Text.RegularExpressions.Regex.Match(trimmed, @"^(\d+[.)]|[가-하][.]|[a-z][.)]) (.+)$"); + if (m.Success) + sb.AppendLine($"**{m.Groups[1].Value}** {m.Groups[2].Value}"); + else + sb.AppendLine(trimmed); + } + sb.AppendLine(); + } + continue; + } + + // F6: Pseudo-table demotion — skip tables with <=3 rows and >=30% empty cells + if (maxRow <= 3) + { + int totalCells = maxRow * maxCol; + int emptyCells = 0; + for (int r = 0; r < maxRow; r++) + for (int c = 0; c < maxCol; c++) + { + var cell = grid[r, c]; + if (cell == null || 
string.IsNullOrWhiteSpace(ExtractCellText(cell))) + emptyCells++; + } + if (totalCells > 0 && (double)emptyCells / totalCells >= 0.3) + { + for (int r = 0; r < maxRow; r++) + for (int c = 0; c < maxCol; c++) + { + var cell = grid[r, c]; + if (cell == null) continue; + var (cr, cc, _, _) = GetCellAddr(cell); + if (cr != r || cc != c) continue; + var text = ExtractCellText(cell).Trim(); + if (!string.IsNullOrEmpty(text)) + sb.AppendLine(text); + } + sb.AppendLine(); + continue; + } + } + + for (int r = 0; r < maxRow; r++) + { + sb.Append("| "); + for (int c = 0; c < maxCol; c++) + { + var cell = grid[r, c]; + if (cell == null) { sb.Append("| "); continue; } + var (cr, cc, _, _) = GetCellAddr(cell); + if (cr != r || cc != c) { sb.Append("| "); continue; } // merged continuation + var text = ExtractCellText(cell).Trim().Replace("\n", " ").Replace("|", "\\|"); + sb.Append($"{text} | "); + } + sb.AppendLine(); + + // Separator after header row + if (r == 0) + { + sb.Append("| "); + for (int c = 0; c < maxCol; c++) + sb.Append("--- | "); + sb.AppendLine(); + } + } + sb.AppendLine(); + } + + return sb.ToString().Trim(); + } + + private string ParagraphToMarkdown(XElement p) + { + var sb = new StringBuilder(); + foreach (var run in p.Elements(HwpxNs.Hp + "run")) + sb.Append(RunToMarkdown(run)); + return sb.ToString().Trim(); + } + + private string RunToMarkdown(XElement run) + { + var sb = new StringBuilder(); + var charPrId = run.Attribute("charPrIDRef")?.Value ?? 
"0"; + var charPr = FindCharPr(charPrId); + var hasBold = charPr?.Element(HwpxNs.Hh + "bold") != null; + var hasItalic = charPr?.Element(HwpxNs.Hh + "italic") != null; + var soEl = charPr?.Element(HwpxNs.Hh + "strikeout"); + var hasStrikeout = soEl != null && soEl.Attribute("shape")?.Value != "NONE"; + + var textParts = new StringBuilder(); + foreach (var child in run.Elements()) + { + switch (child.Name.LocalName) + { + case "t": + textParts.Append(child.Value); + break; + case "lineBreak": + textParts.Append(" \n"); // MD hard line break + break; + case "tab": + textParts.Append('\t'); + break; + case "equation": + var script = child.Element(HwpxNs.Hp + "script")?.Value + ?? child.Attribute("script")?.Value ?? child.Value; + textParts.Append($"`{script.Trim()}`"); + break; + case "img": case "picture": + var src = child.Attribute("binaryItemIDRef")?.Value ?? "image"; + textParts.Append($"![{src}]({src})"); + break; + } + } + + var text = textParts.ToString(); + if (string.IsNullOrEmpty(text)) return ""; + + // F4: GFM tilde escape — prevent false strikethrough from literal tildes + // Must happen BEFORE strikethrough wrapping + if (!hasStrikeout) + text = text.Replace("~", @"\~"); + + if (hasStrikeout) text = $"~~{text}~~"; + if (hasBold && hasItalic) text = $"***{text}***"; + else if (hasBold) text = $"**{text}**"; + else if (hasItalic) text = $"*{text}*"; + + sb.Append(text); + return sb.ToString(); + } + + /// JSON output for table map view. 
+    public JsonNode ViewAsTablesJson()
+    {
+        var result = new JsonObject();
+        var tablesArr = new JsonArray();
+
+        foreach (var (sec, tbl, localTblIdx) in _doc.AllTables())
+        {
+            var (grid, cellList) = BuildTableGrid(tbl);
+            if (cellList.Count == 0) continue;
+
+            int maxRow = grid.GetLength(0), maxCol = grid.GetLength(1);
+            var basePath = $"/section[{sec.Index + 1}]/tbl[{localTblIdx + 1}]";
+
+            var tblObj = new JsonObject
+            {
+                ["path"] = basePath,
+                ["rows"] = maxRow,
+                ["cols"] = maxCol
+            };
+
+            // Cells grid
+            var cellsArr = new JsonArray();
+            for (int r = 0; r < maxRow; r++)
+            {
+                var rowArr = new JsonArray();
+                for (int c = 0; c < maxCol; c++)
+                {
+                    var cell = grid[r, c];
+                    if (cell == null) { rowArr.Add((JsonNode?)null); continue; }
+                    var (cr, cc, rs, cs) = GetCellAddr(cell);
+                    if (cr != r || cc != c) { rowArr.Add("↕"); continue; }
+                    var text = ExtractCellText(cell).Trim();
+                    rowArr.Add(new JsonObject
+                    {
+                        ["text"] = text,
+                        ["path"] = $"{basePath}/tr[{r + 1}]/tc[{c + 1}]",
+                        ["rowSpan"] = rs,
+                        ["colSpan"] = cs
+                    });
+                }
+                cellsArr.Add(rowArr);
+            }
+            tblObj["cells"] = cellsArr;
+
+            tablesArr.Add(tblObj);
+        }
+
+        result["tables"] = tablesArr;
+        return result;
+    }
+
+    // ==================== HTML Preview ====================
+
+    public string ViewAsHtml(int? page = null)
+    {
+        var sb = new StringBuilder();
+        sb.AppendLine("<!DOCTYPE html>");
+        sb.AppendLine("<html><head><meta charset=\"utf-8\"><title>HWPX Preview</title>");
+        sb.AppendLine($"<style>{HwpxHtmlCss()}</style></head><body><div class=\"page\">");
+
+        foreach (var (section, element, path) in _doc.AllContentInOrder())
+        {
+            switch (element.Name.LocalName)
+            {
+                case "p":
+                    var wrappedTbl = element.Descendants(HwpxNs.Hp + "tbl").FirstOrDefault();
+                    if (wrappedTbl != null)
+                        sb.Append(TableToHtml(wrappedTbl));
+                    else
+                        sb.Append(ParagraphToHtml(element));
+                    break;
+            }
+        }
+
+        sb.AppendLine("</div></body></html>");
+        return sb.ToString();
+    }
+
+    // ==================== HTML Helpers ====================
+
+    private string ParagraphToHtml(XElement p)
+    {
+        var styleInfo = GetParagraphStyleInfo(p);
+        var tag = "p";
+
+        if (!string.IsNullOrEmpty(styleInfo.HeadingLevel))
+        {
+            var level = Math.Clamp(int.Parse(styleInfo.HeadingLevel), 1, 6);
+            tag = $"h{level}";
+        }
+
+        var paraCss = GetParaPrCss(p.Attribute("paraPrIDRef")?.Value ?? "0");
+
+        var sb = new StringBuilder();
+        sb.Append($"<{tag}");
+        if (!string.IsNullOrEmpty(paraCss)) sb.Append($" style=\"{paraCss}\"");
+        sb.Append('>');
+
+        foreach (var run in p.Elements(HwpxNs.Hp + "run"))
+            sb.Append(RunToHtml(run));
+
+        sb.Append($"</{tag}>");
+        return sb.ToString();
+    }
+
+    private string RunToHtml(XElement run)
+    {
+        var sb = new StringBuilder();
+        var charPrId = run.Attribute("charPrIDRef")?.Value ?? "0";
+        var css = GetCharPrCss(charPrId);
+        var charPr = FindCharPr(charPrId);
+        var hasBold = charPr?.Element(HwpxNs.Hh + "bold") != null;
+        var hasItalic = charPr?.Element(HwpxNs.Hh + "italic") != null;
+        var ulEl = charPr?.Element(HwpxNs.Hh + "underline");
+        var hasUnderline = ulEl != null && ulEl.Attribute("type")?.Value != "NONE";
+        var soEl = charPr?.Element(HwpxNs.Hh + "strikeout");
+        var hasStrikeout = soEl != null && soEl.Attribute("shape")?.Value != "NONE";
+        var hasSup = charPr?.Element(HwpxNs.Hh + "supscript") != null;
+        var hasSub = charPr?.Element(HwpxNs.Hh + "subscript") != null;
+
+        if (!string.IsNullOrEmpty(css)) sb.Append($"<span style=\"{css}\">");
+        if (hasBold) sb.Append("<b>");
+        if (hasItalic) sb.Append("<i>");
+        if (hasUnderline) sb.Append("<u>");
+        if (hasStrikeout) sb.Append("<s>");
+        if (hasSup) sb.Append("<sup>");
+        if (hasSub) sb.Append("<sub>");
+
+        foreach (var child in run.Elements())
+        {
+            switch (child.Name.LocalName)
+            {
+                case "t":
+                    sb.Append(TextWithMarkpenToHtml(child));
+                    break;
+                case "lineBreak":
+                    sb.Append("<br/>");
+                    break;
+                case "tab":
+                    sb.Append("&nbsp;");
+                    break;
+                case "equation":
+                    var script = child.Element(HwpxNs.Hp + "script")?.Value
+                        ?? child.Attribute("script")?.Value ?? child.Value;
+                    sb.Append($"<span class=\"hwpx-eq\">[{EscapeHtml(script.Trim())}]</span>");
+                    break;
+                case "pic":
+                    sb.Append(PicToHtml(child));
+                    break;
+            }
+        }
+
+        if (hasSub) sb.Append("</sub>");
+        if (hasSup) sb.Append("</sup>");
+        if (hasStrikeout) sb.Append("</s>");
+        if (hasUnderline) sb.Append("</u>");
+        if (hasItalic) sb.Append("</i>");
+        if (hasBold) sb.Append("</b>");
+        if (!string.IsNullOrEmpty(css)) sb.Append("</span>");
+
+        return sb.ToString();
+    }
+
+    private static string TextWithMarkpenToHtml(XElement t)
+    {
+        var sb = new StringBuilder();
+        foreach (var node in t.Nodes())
+        {
+            if (node is System.Xml.Linq.XText text)
+                sb.Append(EscapeHtml(text.Value));
+            else if (node is XElement el)
+            {
+                if (el.Name.LocalName == "markpenBegin")
+                {
+                    var color = el.Attribute("color")?.Value ?? "#FFFF00";
+                    sb.Append($"<mark style=\"background:{color}\">");
+                }
+                else if (el.Name.LocalName == "markpenEnd")
+                    sb.Append("</mark>");
+            }
+        }
+        return sb.ToString();
+    }
+
+    private string TableToHtml(XElement tbl)
+    {
+        var sb = new StringBuilder();
+        sb.Append("<table>");
+        foreach (var tr in tbl.Elements(HwpxNs.Hp + "tr"))
+        {
+            sb.Append("<tr>");
+            foreach (var tc in tr.Elements(HwpxNs.Hp + "tc"))
+            {
+                var cellSpan = tc.Element(HwpxNs.Hp + "cellSpan");
+                var colspan = (int?)cellSpan?.Attribute("colSpan") ?? 1;
+                var rowspan = (int?)cellSpan?.Attribute("rowSpan") ?? 1;
+                var subList = tc.Element(HwpxNs.Hp + "subList");
+                var vAlign = subList?.Attribute("vertAlign")?.Value?.ToLowerInvariant() ?? "top";
+
+                var bfId = tc.Attribute("borderFillIDRef")?.Value;
+                var cellCss = $"vertical-align:{vAlign}";
+                if (bfId != null)
+                {
+                    var bgColor = GetBorderFillBgColor(bfId);
+                    if (bgColor != null) cellCss += $";background:{bgColor}";
+                }
+
+                sb.Append("<td");
+                if (colspan > 1) sb.Append($" colspan=\"{colspan}\"");
+                if (rowspan > 1) sb.Append($" rowspan=\"{rowspan}\"");
+                sb.Append($" style=\"{cellCss}\">");
+
+                if (subList != null)
+                {
+                    foreach (var cp in subList.Elements(HwpxNs.Hp + "p"))
+                        sb.Append(ParagraphToHtml(cp));
+                }
+                sb.Append("</td>");
+            }
+            sb.Append("</tr>");
+        }
+        sb.Append("</table>");
+        return sb.ToString();
+    }
+
+    private string PicToHtml(XElement pic)
+    {
+        var imgEl = pic.Descendants().FirstOrDefault(e => e.Name.LocalName == "img");
+        var src = imgEl?.Attribute("src")?.Value ?? imgEl?.Attribute("binaryItemIDRef")?.Value;
+        if (src != null)
+        {
+            var binData = _doc.GetBinData(src);
+            if (binData != null)
+            {
+                var ext = Path.GetExtension(src).ToLowerInvariant();
+                var mime = ext switch { ".png" => "image/png", ".gif" => "image/gif", ".bmp" => "image/bmp", _ => "image/jpeg" };
+                return $"<img src=\"data:{mime};base64,{Convert.ToBase64String(binData)}\"/>";
+            }
+        }
+        return "<span class=\"hwpx-img\">[image]</span>";
+    }
+
+    private string GetCharPrCss(string charPrId)
+    {
+        var charPr = FindCharPr(charPrId);
+        if (charPr == null) return "";
+        var parts = new List<string>();
+        var height = (int?)charPr.Attribute("height") ?? 1000;
+        parts.Add($"font-size:{height / 100.0:0.#}pt");
+        var color = charPr.Attribute("textColor")?.Value;
+        if (color != null && color != "#000000") parts.Add($"color:{color}");
+        var fontRef = charPr.Element(HwpxNs.Hh + "fontRef");
+        if (fontRef != null)
+        {
+            var hangulRef = fontRef.Attribute("hangul")?.Value ?? "0";
+            var fontName = GetFontName("HANGUL", hangulRef);
+            if (fontName != null) parts.Add($"font-family:'{fontName}',sans-serif");
+        }
+        return string.Join(";", parts);
+    }
+
+    private string GetParaPrCss(string paraPrId)
+    {
+        if (_doc.Header?.Root == null) return "";
+        var paraPr = _doc.Header.Root.Descendants(HwpxNs.Hh + "paraPr")
+            .FirstOrDefault(p => p.Attribute("id")?.Value == paraPrId);
+        if (paraPr == null) return "";
+        var parts = new List<string>();
+        var align = paraPr.Element(HwpxNs.Hh + "align")?.Attribute("horizontal")?.Value;
+        if (align != null && align != "JUSTIFY")
+            parts.Add($"text-align:{align.ToLowerInvariant()}");
+        else if (align == "JUSTIFY")
+            parts.Add("text-align:justify");
+        var margin = paraPr.Element(HwpxNs.Hh + "margin");
+        if (margin != null)
+        {
+            var indent = (int?)margin.Attribute("indent") ??
0;
+            if (indent != 0) parts.Add($"text-indent:{indent / 283.46:0.#}mm");
+            var left = (int?)margin.Attribute("left") ?? 0;
+            if (left != 0) parts.Add($"margin-left:{left / 283.46:0.#}mm");
+        }
+        var ls = paraPr.Element(HwpxNs.Hh + "lineSpacing");
+        if (ls != null)
+        {
+            var lsType = ls.Attribute("type")?.Value;
+            var lsVal = (int?)ls.Attribute("value") ?? 160;
+            if (lsType == "PERCENT") parts.Add($"line-height:{lsVal / 100.0:0.##}");
+        }
+        return string.Join(";", parts);
+    }
+
+    private string? GetBorderFillBgColor(string bfId)
+    {
+        var bf = _doc.Header?.Root?.Descendants(HwpxNs.Hh + "borderFill")
+            .FirstOrDefault(b => b.Attribute("id")?.Value == bfId);
+        var winBrush = bf?.Descendants(HwpxNs.Hc + "winBrush").FirstOrDefault();
+        return winBrush?.Attribute("faceColor")?.Value;
+    }
+
+    private string? GetFontName(string lang, string fontRef)
+    {
+        var fontface = _doc.Header?.Root?.Descendants(HwpxNs.Hh + "fontface")
+            .FirstOrDefault(f => f.Attribute("lang")?.Value == lang);
+        var font = fontface?.Elements(HwpxNs.Hh + "font")
+            .FirstOrDefault(f => f.Attribute("id")?.Value == fontRef);
+        return font?.Attribute("face")?.Value;
+    }
+
+    private static string EscapeHtml(string text)
+        => text.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;").Replace("\"", "&quot;");
+
+    private static string HwpxHtmlCss() => """
+        * { margin: 0; padding: 0; box-sizing: border-box; }
+        body { background: #e8e8e8; font-family: '함초롬돋움', 'Malgun Gothic', sans-serif; }
+        .page { max-width: 210mm; margin: 20px auto; padding: 20mm 25mm; background: #fff;
+                box-shadow: 0 2px 8px rgba(0,0,0,0.15); min-height: 297mm; }
+        p { margin: 2px 0; font-size: 10pt; line-height: 1.6; }
+        h1 { font-size: 16pt; margin: 12px 0 4px; }
+        h2 { font-size: 14pt; margin: 10px 0 4px; }
+        h3 { font-size: 12pt; margin: 8px 0 4px; }
+        h4, h5, h6 { font-size: 11pt; margin: 6px 0 4px; }
+        table { border-collapse: collapse; width: 100%; margin: 8px 0; }
+        td, th { border: 1px solid #000; padding: 4px 8px; font-size: 10pt;
} + .hwpx-eq { font-family: 'HancomEQN', serif; color: #333; background: #f5f5f5; + padding: 2px 6px; border-radius: 3px; font-size: 0.9em; } + .hwpx-img { color: #999; font-style: italic; } + mark { padding: 1px 2px; } + @media print { body { background: #fff; } .page { box-shadow: none; margin: 0; padding: 20mm; } } + """; + + /// Extract all text from a paragraph's hp:run/hp:t elements. + private static string ExtractParagraphText(XElement para) + { + var runs = para.Elements(HwpxNs.Hp + "run"); + var sb = new StringBuilder(); + foreach (var run in runs) + { + foreach (var t in run.Elements(HwpxNs.Hp + "t")) + { + sb.Append(t.Value); + } + // Handle equations — extract Hancom equation script text + // Element name is hp:equation (confirmed by hwpxlib). hp:eqEdit is legacy HWP5 class name. + var eqEl = run.Element(HwpxNs.Hp + "equation") + ?? run.Element(HwpxNs.Hp + "eqEdit") + ?? run.Descendants().FirstOrDefault(e => + e.Name.LocalName == "equation" || e.Name.LocalName == "eqEdit"); + if (eqEl != null) + { + var script = eqEl.Element(HwpxNs.Hp + "script")?.Value + ?? eqEl.Attribute("script")?.Value + ?? eqEl.Value; + if (!string.IsNullOrEmpty(script)) + sb.Append($"[eq: {script}]"); + } + // Handle line breaks + if (run.Element(HwpxNs.Hp + "lineBreak") != null) + sb.Append('\n'); + if (run.Element(HwpxNs.Hp + "tab") != null) + sb.Append('\t'); + } + return sb.ToString(); + } + + /// Extract runs with formatting annotations. 
+    private static List<(string Text, Dictionary<string, string> Format)> ExtractAnnotatedRuns(XElement para)
+    {
+        var result = new List<(string, Dictionary<string, string>)>();
+        foreach (var run in para.Elements(HwpxNs.Hp + "run"))
+        {
+            var text = string.Join("", run.Elements(HwpxNs.Hp + "t").Select(t => t.Value));
+            if (string.IsNullOrEmpty(text)) continue;
+
+            var format = new Dictionary<string, string>();
+            var charPrIdRef = run.Attribute("charPrIDRef")?.Value;
+            if (charPrIdRef != null)
+                format["charPrIDRef"] = charPrIdRef;
+
+            result.Add((text, format));
+        }
+        return result;
+    }
+
+    /// Get paragraph style info from attributes and header.xml lookup.
+    private (string? HeadingLevel, string Alignment) GetParagraphStyleInfo(XElement para)
+    {
+        var styleIdRef = para.Attribute("styleIDRef")?.Value;
+        var paraPrIdRef = para.Attribute("paraPrIDRef")?.Value;
+
+        string? headingLevel = null;
+        string alignment = "LEFT";
+
+        // Look up style in header.xml
+        if (_doc.Header != null && styleIdRef != null)
+        {
+            var style = _doc.Header.Root!.Descendants(HwpxNs.Hh + "style")
+                .FirstOrDefault(s => s.Attribute("id")?.Value == styleIdRef);
+            if (style != null)
+            {
+                var name = style.Attribute("name")?.Value ?? "";
+                // Korean heading styles: "개요 1", "개요 2", etc.
+ var headingMatch = System.Text.RegularExpressions.Regex.Match(name, @"개요\s*(\d+)"); + if (headingMatch.Success) + headingLevel = headingMatch.Groups[1].Value; + // English heading styles + var engMatch = System.Text.RegularExpressions.Regex.Match(name, @"(?i)heading\s*(\d+)"); + if (engMatch.Success) + headingLevel = engMatch.Groups[1].Value; + } + } + + // Look up paragraph properties for alignment and heading + if (_doc.Header != null && paraPrIdRef != null) + { + var paraPr = _doc.Header.Root!.Descendants(HwpxNs.Hh + "paraPr") + .FirstOrDefault(p => p.Attribute("id")?.Value == paraPrIdRef); + if (paraPr != null) + { + // Real HWPX: alignment is a child element + var alignEl = paraPr.Element(HwpxNs.Hh + "align"); + alignment = alignEl?.Attribute("horizontal")?.Value ?? "LEFT"; + + // Heading detection via paraPr > heading element (type="OUTLINE") + if (headingLevel == null) + { + var heading = paraPr.Element(HwpxNs.Hh + "heading"); + if (heading?.Attribute("type")?.Value == "OUTLINE" + && int.TryParse(heading.Attribute("level")?.Value, out var hl) && hl >= 1) + headingLevel = hl.ToString(); + } + } + } + + // F3: Legal appendix heading detection (별표/별지/별첨, 제N조 관련) + if (headingLevel == null) + { + var text = ExtractParagraphText(para); + if (System.Text.RegularExpressions.Regex.IsMatch(text, @"^\s*\[?별[표지첨]\s*(?:\d+\s*)?(?:의\s*\d+\s*)?(?:\]|$)")) + headingLevel = "2"; + else if (System.Text.RegularExpressions.Regex.IsMatch(text, @"^\s*\(제\s*\d+\s*조\s*관련\)")) + headingLevel = "3"; + // G3: Space-tolerant legal heading detection + else + { + var compacted = System.Text.RegularExpressions.Regex.Replace(text.TrimStart(), @"\s+", ""); + if (System.Text.RegularExpressions.Regex.IsMatch(compacted, @"^제\d+[장편](?![에의은을로서와가는도])")) + headingLevel = "1"; + else if (System.Text.RegularExpressions.Regex.IsMatch(compacted, @"^제\d+[절관](?![에의은을로서와가는도])")) + headingLevel = "2"; + } + } + + // Plan 99.9.I3: Font-size ratio heading detection (fallback when outline level not set) + 
if (headingLevel == null && _doc.Header != null)
+        {
+            var charPrIdRef = para.Elements(HwpxNs.Hp + "run")
+                .FirstOrDefault()?.Attribute("charPrIDRef")?.Value;
+            if (charPrIdRef != null)
+            {
+                var charPr = FindCharPr(charPrIdRef);
+                if (charPr != null)
+                {
+                    double fontSize = GetFontSizePt(charPr);
+                    double baseFontSize = _baseFontSizePt ??= ComputeBaseFontSize();
+                    if (baseFontSize > 0)
+                    {
+                        double ratio = fontSize / baseFontSize;
+                        if (ratio >= 1.5) headingLevel = "1";       // H1: 150%+
+                        else if (ratio >= 1.3) headingLevel = "2";  // H2: 130%+
+                        else if (ratio >= 1.15) headingLevel = "3"; // H3: 115%+
+                    }
+                }
+            }
+        }
+
+        return (headingLevel, alignment);
+    }
+
+    /// <summary>
+    /// Plan 99.9.I3: Compute base (body) font size by finding the most frequent font size across all paragraphs.
+    /// Used as denominator for heading ratio detection.
+    /// </summary>
+    private double ComputeBaseFontSize()
+    {
+        var sizeCounts = new Dictionary<double, int>();
+        foreach (var (_, para, _) in _doc.AllParagraphs())
+        {
+            var charPrIdRef = para.Elements(HwpxNs.Hp + "run")
+                .FirstOrDefault()?.Attribute("charPrIDRef")?.Value;
+            if (charPrIdRef == null) continue;
+            var charPr = FindCharPr(charPrIdRef);
+            if (charPr == null) continue;
+            double size = GetFontSizePt(charPr);
+            sizeCounts[size] = sizeCounts.GetValueOrDefault(size) + 1;
+        }
+        return sizeCounts.Count > 0
+            ?
sizeCounts.MaxBy(kv => kv.Value).Key
+            : 10.0; // default 10pt
+    }
+
+    private int CountRemainingParagraphs(int currentLine)
+    {
+        int total = _doc.AllParagraphs().Count();
+        return Math.Max(0, total - currentLine);
+    }
+
+    private static int CountWords(string text)
+    {
+        if (string.IsNullOrWhiteSpace(text)) return 0;
+        // Simple heuristic: split on whitespace and count non-empty tokens.
+        // Note: Korean text written without spaces is undercounted (a run counts as one word).
+        return text.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries).Length;
+    }
+
+    private static string FormatHwpUnit(int hwpUnit)
+    {
+        var mm = hwpUnit / 283.46;
+        return $"{mm:0.#}mm";
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxHandler.cs b/src/officecli/Handlers/Hwpx/HwpxHandler.cs
new file mode 100644
index 000000000..34da22cf7
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxHandler.cs
@@ -0,0 +1,311 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.IO.Compression;
+using System.Xml.Linq;
+using System.Text.Json.Nodes;
+using OfficeCli.Core;
+
+namespace OfficeCli.Handlers;
+
+public partial class HwpxHandler : IDocumentHandler
+{
+    private readonly HwpxDocument _doc;
+    private double? _baseFontSizePt; // Plan 99.9.I3: cached base font size for heading ratio
+    private readonly string _filePath;
+    private readonly bool _editable;
+    private readonly Stream _stream;
+    private bool _dirty;
+    private readonly HashSet<string> _deletedBinData = new();
+
+    public HwpxHandler(string filePath, bool editable)
+    {
+        _filePath = filePath;
+        _editable = editable;
+        Stream? stream = null;
+        ZipArchive? archive = null;
+        try
+        {
+            stream = new FileStream(filePath, FileMode.Open,
+                editable ? FileAccess.ReadWrite : FileAccess.Read,
+                FileShare.ReadWrite);
+            archive = new ZipArchive(stream,
+                editable ?
ZipArchiveMode.Update : ZipArchiveMode.Read); + _doc = LoadDocument(archive); + _stream = stream; + } + catch (InvalidDataException) + { + archive?.Dispose(); + stream?.Dispose(); + + // Plan 99.9.I2: Broken ZIP recovery — scan for Local File Headers + stream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite); + try + { + _doc = TryRecoverBrokenZip(stream); + _stream = stream; + } + catch + { + stream.Dispose(); + throw; + } + } + catch + { + archive?.Dispose(); + stream?.Dispose(); + throw; + } + } + + private static HwpxDocument LoadDocument(ZipArchive archive) + { + // Plan 99.9.E1: Path traversal defense + foreach (var entry in archive.Entries) + { + var name = entry.FullName; + if (string.IsNullOrEmpty(name)) continue; + if (name.Contains('\0') || + name.StartsWith('/') || name.StartsWith('\\') || + (name.Length >= 2 && name[1] == ':') || + name.Split('/', '\\').Any(seg => seg == "..")) + { + throw new InvalidDataException( + $"Suspicious ZIP entry path detected: '{name}'. " + + "Path traversal or absolute path entries are not allowed."); + } + if ((entry.ExternalAttributes & 0xF0000000) == 0xA0000000) + { + throw new InvalidDataException( + $"Symlink ZIP entry detected: '{name}'. 
Symlinks are not allowed."); + } + } + + // Plan 99.9.E2: ZIP bomb precheck + const int MaxEntries = 1000; + const long MaxUncompressedBytes = 200L * 1024 * 1024; // 200MB + const double MaxCompressionRatio = 100.0; + + if (archive.Entries.Count > MaxEntries) + throw new InvalidDataException( + $"ZIP entry count ({archive.Entries.Count}) exceeds safety limit ({MaxEntries})."); + + long totalUncompressed = 0; + foreach (var entry in archive.Entries) + { + if (entry.Length < 0 || totalUncompressed > MaxUncompressedBytes - entry.Length) + throw new InvalidDataException( + $"Total uncompressed size exceeds safety limit ({MaxUncompressedBytes / (1024*1024)}MB)."); + totalUncompressed += entry.Length; + if (entry.CompressedLength > 0) + { + double ratio = (double)entry.Length / entry.CompressedLength; + if (ratio > MaxCompressionRatio) + throw new InvalidDataException( + $"ZIP entry '{entry.FullName}' has suspicious compression ratio ({ratio:F1}:1)."); + } + else if (entry.Length > 0) + { + throw new InvalidDataException( + $"ZIP entry '{entry.FullName}' has zero compressed size but non-zero length — suspicious."); + } + } + if (totalUncompressed > MaxUncompressedBytes) + throw new InvalidDataException( + $"Total uncompressed size ({totalUncompressed / (1024*1024)}MB) exceeds safety limit ({MaxUncompressedBytes / (1024*1024)}MB)."); + + var doc = new HwpxDocument { Archive = archive }; + + // Plan 80: Rootfile-aware loading via HwpxManifest + // Tries: container.xml → rootfile → OPF manifest → conventional fallback + var manifest = HwpxManifest.Parse(archive); + doc.RootfilePath = manifest.RootfilePath; + + // Load manifest doc (for SaveManifest and validation) + var manifestPath = manifest.RootfilePath ?? 
"Contents/content.hpf"; + var hpfEntry = archive.GetEntry(manifestPath); + if (hpfEntry != null) + { + using var hpfStream = hpfEntry.Open(); + doc.ManifestDoc = LoadAndNormalize(hpfStream); + doc.ManifestEntryPath = hpfEntry.FullName; + } + + // Load header + if (!string.IsNullOrEmpty(manifest.HeaderPath)) + { + var headerEntry = archive.GetEntry(manifest.HeaderPath); + if (headerEntry != null) + { + doc.HeaderEntryPath = headerEntry.FullName; + using var stream = headerEntry.Open(); + doc.Header = LoadAndNormalize(stream); + } + } + + // Fallback: conventional header path + if (doc.Header == null) + { + var headerEntry = archive.GetEntry("Contents/header.xml"); + if (headerEntry != null) + { + doc.HeaderEntryPath = headerEntry.FullName; + using var stream = headerEntry.Open(); + doc.Header = LoadAndNormalize(stream); + } + } + + // Load sections from manifest-discovered paths + int idx = 0; + foreach (var sectionPath in manifest.SectionPaths) + { + var entry = archive.GetEntry(sectionPath); + if (entry == null) continue; + using var s = entry.Open(); + doc.Sections.Add(new HwpxSection + { + Index = idx++, + EntryPath = entry.FullName, + Document = LoadAndNormalize(s) + }); + } + + // Fallback: try section0.xml, section1.xml, ... 
+        if (doc.Sections.Count == 0)
+        {
+            for (int i = 0; i < 100; i++)
+            {
+                var entry = archive.GetEntry($"Contents/section{i}.xml");
+                if (entry == null) break;
+                using var s = entry.Open();
+                doc.Sections.Add(new HwpxSection
+                {
+                    Index = i,
+                    EntryPath = entry.FullName,
+                    Document = LoadAndNormalize(s)
+                });
+            }
+        }
+
+        if (doc.Sections.Count == 0)
+            throw new InvalidOperationException("No sections found in HWPX document");
+
+        return doc;
+    }
+
+    // --- Helper: read ZIP entry, normalize HWPML 2016→2011 namespaces, then parse ---
+    private static XDocument LoadAndNormalize(Stream stream)
+    {
+        using var reader = new StreamReader(stream, System.Text.Encoding.UTF8);
+        var raw = reader.ReadToEnd();
+        foreach (var (old, canonical) in HwpxNs.LegacyToCanonical)
+            raw = raw.Replace(old, canonical, StringComparison.Ordinal);
+
+        // Plan 99.9.E5: XXE defense via secure parser settings.
+        // DtdProcessing.Prohibit blocks external entities; cap entity expansion as
+        // defense-in-depth (a value of 0 would mean "no limit", not "none allowed").
+        var settings = new System.Xml.XmlReaderSettings
+        {
+            DtdProcessing = System.Xml.DtdProcessing.Prohibit,
+            XmlResolver = null,
+            MaxCharactersFromEntities = 1024
+        };
+        using var stringReader = new StringReader(raw);
+        using var xmlReader = System.Xml.XmlReader.Create(stringReader, settings);
+        return XDocument.Load(xmlReader);
+    }
+
+    public bool TryExtractBinary(string path, string destPath, out string?
contentType, out long byteCount)
+    {
+        contentType = null;
+        byteCount = 0;
+        // HWPX binary extraction not yet implemented
+        return false;
+    }
+
+    // Plan 99.9.I2: Broken ZIP recovery — scan Local File Headers
+    private static HwpxDocument TryRecoverBrokenZip(Stream stream)
+    {
+        stream.Position = 0;
+        var data = new byte[stream.Length];
+        stream.ReadExactly(data);
+
+        const uint LocalFileHeader = 0x04034b50;
+        var recovered = new Dictionary<string, byte[]>(StringComparer.OrdinalIgnoreCase);
+
+        int pos = 0;
+        while (pos + 30 < data.Length)
+        {
+            uint sig = BitConverter.ToUInt32(data, pos);
+            if (sig != LocalFileHeader) { pos++; continue; }
+
+            ushort compMethod = BitConverter.ToUInt16(data, pos + 8);
+            uint compSize = BitConverter.ToUInt32(data, pos + 18);
+            uint uncompSize = BitConverter.ToUInt32(data, pos + 22);
+            ushort nameLen = BitConverter.ToUInt16(data, pos + 26);
+            ushort extraLen = BitConverter.ToUInt16(data, pos + 28);
+
+            int headerEnd = pos + 30 + nameLen + extraLen;
+            if (headerEnd + compSize > data.Length) break;
+
+            var entryName = System.Text.Encoding.UTF8.GetString(data, pos + 30, nameLen);
+            var compData = data.AsSpan(headerEnd, (int)compSize);
+
+            try
+            {
+                byte[] entryData;
+                if (compMethod == 0) // STORED
+                {
+                    entryData = compData.ToArray();
+                }
+                else if (compMethod == 8) // DEFLATE
+                {
+                    using var compStream = new System.IO.Compression.DeflateStream(
+                        new MemoryStream(compData.ToArray()),
+                        System.IO.Compression.CompressionMode.Decompress);
+                    using var outStream = new MemoryStream();
+                    compStream.CopyTo(outStream);
+                    entryData = outStream.ToArray();
+                }
+                else
+                {
+                    pos = headerEnd + (int)compSize;
+                    continue;
+                }
+
+                if (!recovered.ContainsKey(entryName))
+                    recovered[entryName] = entryData;
+            }
+            catch { /* skip unreadable entry */ }
+
+            pos = headerEnd + (int)compSize;
+        }
+
+        if (!recovered.Keys.Any(k => k.Contains("section", StringComparison.OrdinalIgnoreCase)))
+            throw new InvalidDataException(
+                "Broken ZIP recovery failed: no section XML found in
recovered entries.");
+
+        // Rebuild as in-memory ZIP for the standard loader
+        var memStream = new MemoryStream();
+        using (var newZip = new ZipArchive(memStream, ZipArchiveMode.Create, true))
+        {
+            foreach (var (name, bytes) in recovered)
+            {
+                var entry = newZip.CreateEntry(name);
+                using var s = entry.Open();
+                s.Write(bytes);
+            }
+        }
+
+        memStream.Position = 0;
+        var archive = new ZipArchive(memStream, ZipArchiveMode.Read);
+        return LoadDocument(archive);
+    }
+
+    public void Dispose()
+    {
+        // Plan 99.9.E6: Ensure no lingering temp files
+        _doc.Archive.Dispose();
+        _stream.Dispose();
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxManifest.cs b/src/officecli/Handlers/Hwpx/HwpxManifest.cs
new file mode 100644
index 000000000..e304652ab
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxManifest.cs
@@ -0,0 +1,210 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.IO.Compression;
+using System.Text.RegularExpressions;
+using System.Xml.Linq;
+
+namespace OfficeCli.Handlers;
+
+/// Parse HWPX OPF manifest to discover section order.
+public class HwpxManifest
+{
+    /// Section entry paths in spine order.
+    public List<string> SectionPaths { get; }
+
+    /// Path to header.xml (typically "Contents/header.xml").
+    public string HeaderPath { get; }
+
+    private HwpxManifest(List<string> sectionPaths, string headerPath, string? rootfilePath = null)
+    {
+        SectionPaths = sectionPaths;
+        HeaderPath = headerPath;
+        RootfilePath = rootfilePath;
+    }
+
+    /// Rootfile path selected from container.xml (null if fallback used).
+    public string? RootfilePath { get; private set; }
+
+    /// Parse directly from a ZipArchive (Plan 80).
+    public static HwpxManifest Parse(ZipArchive archive)
+    {
+        var entries = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
+        foreach (var entry in archive.Entries)
+        {
+            // Only read XML/text entries relevant to manifest discovery
+            if (entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase)
+                || entry.FullName.EndsWith(".hpf", StringComparison.OrdinalIgnoreCase)
+                || entry.FullName.EndsWith(".opf", StringComparison.OrdinalIgnoreCase))
+            {
+                using var s = entry.Open();
+                using var reader = new StreamReader(s, System.Text.Encoding.UTF8);
+                entries[entry.FullName] = reader.ReadToEnd();
+            }
+        }
+        return Parse(entries);
+    }
+
+    /// <summary>
+    /// Parse OPF manifest from pre-read ZIP entries to discover section order.
+    /// Tries container.xml → content.opf → content.hpf → fallback to entry scan.
+    /// </summary>
+    public static HwpxManifest Parse(Dictionary<string, string> entries)
+    {
+        string? opfXml = null;
+        string? opfPath = null;
+        string? rootfilePath = null;
+
+        if (entries.TryGetValue("META-INF/container.xml", out var containerXml))
+        {
+            var containerDoc = XDocument.Parse(containerXml);
+            var rootFile = containerDoc.Descendants()
+                .FirstOrDefault(e => e.Name.LocalName == "rootfile");
+            var fullPath = rootFile?.Attribute("full-path")?.Value;
+
+            if (fullPath != null && entries.TryGetValue(fullPath, out var found))
+            {
+                opfXml = found;
+                opfPath = fullPath;
+                rootfilePath = fullPath;
+            }
+        }
+
+        if (opfXml == null)
+        {
+            var candidates = new[] { "content.opf", "Contents/content.opf",
+                "content.hpf", "Contents/content.hpf" };
+            foreach (var candidate in candidates)
+            {
+                if (entries.TryGetValue(candidate, out var found))
+                {
+                    opfXml = found;
+                    opfPath = candidate;
+                    break;
+                }
+            }
+        }
+
+        if (opfXml != null)
+        {
+            var opfDoc = XDocument.Parse(opfXml);
+            var manifest = ParseOpfManifest(opfDoc);
+            if (manifest != null)
+            {
+                manifest.RootfilePath = rootfilePath;
+                return manifest;
+            }
+        }
+
+        var fallbackSections = FindSectionsFromEntries(entries);
+        var headerPath =
entries.ContainsKey("Contents/header.xml")
+            ? "Contents/header.xml"
+            : "";
+
+        return new HwpxManifest(fallbackSections, headerPath, rootfilePath);
+    }
+
+    private static HwpxManifest? ParseOpfManifest(XDocument opfDoc)
+    {
+        var root = opfDoc.Root;
+        if (root == null) return null;
+
+        var manifestEl = root.Element(HwpxNs.Opf + "manifest")
+            ?? root.Elements().FirstOrDefault(e => e.Name.LocalName == "manifest");
+        if (manifestEl == null) return null;
+
+        var items = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
+        foreach (var item in manifestEl.Elements()
+            .Where(e => e.Name.LocalName == "item"))
+        {
+            var id = item.Attribute("id")?.Value;
+            var href = item.Attribute("href")?.Value;
+            if (id != null && href != null)
+                items[id] = href;
+        }
+
+        var spineEl = root.Element(HwpxNs.Opf + "spine")
+            ?? root.Elements().FirstOrDefault(e => e.Name.LocalName == "spine");
+
+        var sectionPaths = new List<string>();
+        var headerPath = "";
+
+        if (spineEl != null)
+        {
+            foreach (var itemref in spineEl.Elements()
+                .Where(e => e.Name.LocalName == "itemref"))
+            {
+                var idref = itemref.Attribute("idref")?.Value;
+                if (idref == null || !items.TryGetValue(idref, out var href))
+                    continue;
+
+                var fullPath = href.StartsWith("Contents/")
+                    ? href
+                    : $"Contents/{href}";
+
+                if (fullPath.Contains("header", StringComparison.OrdinalIgnoreCase))
+                {
+                    headerPath = fullPath;
+                }
+                else if (fullPath.Contains("section", StringComparison.OrdinalIgnoreCase))
+                {
+                    sectionPaths.Add(fullPath);
+                }
+            }
+        }
+
+        if (sectionPaths.Count == 0)
+        {
+            foreach (var (_, href) in items)
+            {
+                var fullPath = href.StartsWith("Contents/") ?
href : $"Contents/{href}";
+                if (fullPath.Contains("section", StringComparison.OrdinalIgnoreCase))
+                    sectionPaths.Add(fullPath);
+                else if (fullPath.Contains("header", StringComparison.OrdinalIgnoreCase))
+                    headerPath = fullPath;
+            }
+        }
+
+        sectionPaths.Sort((a, b) =>
+        {
+            var numA = ExtractSectionNumber(a);
+            var numB = ExtractSectionNumber(b);
+            return numA.CompareTo(numB);
+        });
+
+        if (sectionPaths.Count == 0)
+            return null;
+
+        return new HwpxManifest(sectionPaths, headerPath);
+    }
+
+    // G1: Regex-based section file matching for non-standard paths
+    private static readonly Regex SectionFileRegex = new(
+        @"(?:^|[/\\])section[_\-]?(\d+)\.xml$",
+        RegexOptions.IgnoreCase | RegexOptions.Compiled);
+
+    private static List<string> FindSectionsFromEntries(Dictionary<string, string> entries)
+    {
+        var sectionPaths = new List<(string Path, int Number)>();
+        foreach (var key in entries.Keys)
+        {
+            var m = SectionFileRegex.Match(key);
+            if (m.Success)
+            {
+                var num = int.TryParse(m.Groups[1].Value, out var n) ? n : 0;
+                sectionPaths.Add((key, num));
+            }
+        }
+
+        return sectionPaths
+            .OrderBy(x => x.Number)
+            .Select(x => x.Path)
+            .ToList();
+    }
+
+    private static int ExtractSectionNumber(string path)
+    {
+        var m = SectionFileRegex.Match(path);
+        return m.Success && int.TryParse(m.Groups[1].Value, out var num) ? num : 0;
+    }
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxNamespaces.cs b/src/officecli/Handlers/Hwpx/HwpxNamespaces.cs
new file mode 100644
index 000000000..fbcddb940
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxNamespaces.cs
@@ -0,0 +1,42 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.Xml.Linq;
+
+namespace OfficeCli.Handlers;
+
+/// HWPX (OWPML) XML namespace constants.
+///
+/// CRITICAL: Use XNamespace (static readonly), NOT string const.
+/// "Hp" not "HP" — matches OWPML spec prefix.
+/// OPF namespace uses trailing slash (matches Hancom output); no-slash variant normalized via LegacyToCanonical.
+///
+public static class HwpxNs
+{
+    // Body sections (.xml files under Contents/)
+    public static readonly XNamespace Hs = "http://www.hancom.co.kr/hwpml/2011/section";
+    public static readonly XNamespace Hp = "http://www.hancom.co.kr/hwpml/2011/paragraph";
+
+    // Header (.xml file at Contents/header.xml)
+    public static readonly XNamespace Hh = "http://www.hancom.co.kr/hwpml/2011/head";
+
+    // Core types (child elements inside header structures: margin children, etc.)
+    public static readonly XNamespace Hc = "http://www.hancom.co.kr/hwpml/2011/core";
+
+    // OPF manifest (META-INF/container.xml or mimetype OPF)
+    // Real Hancom files use trailing slash; some tooling omits it — support both via LegacyToCanonical
+    public static readonly XNamespace Opf = "http://www.idpf.org/2007/opf/";
+    public static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";
+
+    // Namespace URIs that appear in HWPML 2016 docs — must be normalized to 2011 before parsing
+    public static readonly Dictionary<string, string> LegacyToCanonical = new()
+    {
+        ["http://www.hancom.co.kr/hwpml/2016/section"] = "http://www.hancom.co.kr/hwpml/2011/section",
+        ["http://www.hancom.co.kr/hwpml/2016/paragraph"] = "http://www.hancom.co.kr/hwpml/2011/paragraph",
+        ["http://www.hancom.co.kr/hwpml/2016/head"] = "http://www.hancom.co.kr/hwpml/2011/head",
+        ["http://www.hancom.co.kr/hwpml/2016/core"] = "http://www.hancom.co.kr/hwpml/2011/core",
+        ["http://www.hancom.co.kr/hwpml/2016/app"] = "http://www.hancom.co.kr/hwpml/2011/app",
+        // OPF: Hancom uses trailing slash; normalize no-slash variant
+        ["http://www.idpf.org/2007/opf\""] = "http://www.idpf.org/2007/opf/\"",
+    };
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxPacker.cs b/src/officecli/Handlers/Hwpx/HwpxPacker.cs
new file mode 100644
index 000000000..a08ded713
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxPacker.cs
@@ -0,0 +1,215 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.IO.Compression;
+using System.Text;
+using System.Text.RegularExpressions;
+using System.Xml.Linq;
+using OfficeCli.Core;
+
+namespace OfficeCli.Handlers;
+
+/// Read/write HWPX ZIP container.
+public static class HwpxPacker
+{
+    // ==================== Read ====================
+
+    ///
+    /// Read all XML entries from an HWPX ZIP file.
+    /// Non-XML entries (images, binaries) are skipped.
+    /// Namespace normalization is applied to every entry.
+    ///
+    public static Dictionary<string, string> ReadAllEntries(string path)
+    {
+        if (!File.Exists(path))
+            throw new CliException($"File not found: {path}");
+
+        var entries = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
+
+        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
+        using var zip = new ZipArchive(fs, ZipArchiveMode.Read);
+        foreach (var entry in zip.Entries)
+        {
+            if (!entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase)
+                && !entry.FullName.EndsWith(".hpf", StringComparison.OrdinalIgnoreCase)
+                && !entry.FullName.EndsWith(".opf", StringComparison.OrdinalIgnoreCase))
+                continue;
+
+            entries[entry.FullName] = ReadEntry(zip, entry.FullName);
+        }
+
+        return entries;
+    }
+
+    ///
+    /// Read a single ZIP entry as text, applying namespace normalization.
+    ///
+    public static string ReadEntry(ZipArchive zip, string entryName)
+    {
+        var entry = zip.GetEntry(entryName)
+            ?? throw new CliException($"Entry not found in ZIP: {entryName}");
+
+        using var stream = entry.Open();
+        using var reader = new StreamReader(stream, Encoding.UTF8);
+        var xml = reader.ReadToEnd();
+
+        return NormalizeNamespaces(xml);
+    }
+
+    private static string NormalizeNamespaces(string xml)
+    {
+        foreach (var (legacy, canonical) in HwpxNs.LegacyToCanonical)
+        {
+            xml = xml.Replace(legacy, canonical);
+        }
+
+        return xml;
+    }
+
+    ///
+    /// Minify XML by removing whitespace between tags.
+    /// Hancom Office requires single-line XML — pretty-printed XML renders blank.
+    ///
+    public static string MinifyXml(string xml)
+    {
+        // Remove whitespace between > and < (inter-element whitespace)
+        // But preserve whitespace inside text content
+        return System.Text.RegularExpressions.Regex.Replace(xml, @">\s+<", "><");
+    }
+
+    ///
+    /// Reverse the namespace normalization before saving back to ZIP.
+    /// Hancom Office expects the original 2016 namespace URIs to remain intact.
+    /// Only restores namespaces that appear as xmlns declarations (not element content).
+    ///
+    public static string RestoreOriginalNamespaces(string xml)
+    {
+        // Fix 1: Restore hp10 namespace declaration (2011→2016)
+        xml = xml.Replace(
+            "xmlns:hp10=\"http://www.hancom.co.kr/hwpml/2011/paragraph\"",
+            "xmlns:hp10=\"http://www.hancom.co.kr/hwpml/2016/paragraph\"");
+
+        // Fix 2: XDocument may have swapped hp: ↔ hp10: prefixes since both resolve
+        // to the same URI after normalization. Restore original prefix usage:
+        // any hp10:-prefixed element tag (opening or closing) goes back to hp:.
+        xml = System.Text.RegularExpressions.Regex.Replace(xml, @"(</?)hp10:", "$1hp:");
+
+        // Fix attributes: LINQ to XML may swap hp10: for hp: in namespaced attributes
+        // (e.g. hp10:required-namespace) since both resolved to the same URI during
+        // normalization. Original files always use hp: for attributes.
+        xml = System.Text.RegularExpressions.Regex.Replace(xml, @" hp10:([\w-]+)=""", " hp:$1=\"");
+
+        return xml;
+    }
+
+    // ==================== Write ====================
+
+    ///
+    /// Pack entries into a new HWPX ZIP file.
+    /// Atomic write: creates temp file → validates → renames to target.
+    ///
+    public static void Pack(string targetPath, Dictionary<string, string> entries,
+        string? mimeType = null)
+    {
+        var tempPath = targetPath + ".tmp";
+
+        try
+        {
+            using (var fs = new FileStream(tempPath, FileMode.Create, FileAccess.Write))
+            using (var zip = new ZipArchive(fs, ZipArchiveMode.Create))
+            {
+                // 1. mimetype MUST be first entry, stored (no compression)
+                var mime = mimeType ?? "application/hwp+zip";
+                var mimeEntry = zip.CreateEntry("mimetype", CompressionLevel.NoCompression);
+                using (var mimeStream = mimeEntry.Open())
+                {
+                    var mimeBytes = Encoding.ASCII.GetBytes(mime);
+                    mimeStream.Write(mimeBytes, 0, mimeBytes.Length);
+                }
+
+                // 2. All other entries: Deflate compression
+                foreach (var (name, content) in entries)
+                {
+                    var entry = zip.CreateEntry(name, CompressionLevel.Optimal);
+                    using var entryStream = entry.Open();
+                    var bytes = Encoding.UTF8.GetBytes(content);
+                    entryStream.Write(bytes, 0, bytes.Length);
+                }
+            }
+
+            // 3. Validate the temp file before committing
+            ValidateZip(tempPath);
+
+            // 4. Atomic rename
+            File.Move(tempPath, targetPath, overwrite: true);
+        }
+        catch
+        {
+            if (File.Exists(tempPath))
+                File.Delete(tempPath);
+            throw;
+        }
+    }
+
+    private static void ValidateZip(string path)
+    {
+        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
+        using var zip = new ZipArchive(fs, ZipArchiveMode.Read);
+        foreach (var entry in zip.Entries)
+        {
+            using var stream = entry.Open();
+            var buffer = new byte[4096];
+            while (stream.Read(buffer, 0, buffer.Length) > 0) { }
+        }
+    }
+
+    // ==================== XML Utilities ====================
+
+    ///
+    /// Remove all <hp:linesegarray> blocks from XML.
+    /// These are stale layout cache elements generated by Hancom's renderer.
+    ///
+    public static string StripLinesegarray(string xml)
+    {
+        return Regex.Replace(
+            xml,
+            @"<hp:linesegarray[^>]*>.*?</hp:linesegarray>",
+            "",
+            RegexOptions.Singleline);
+    }
+
+    // Removed duplicate MinifyXml — the one above (regex-based) handles inter-element whitespace correctly
+}
diff --git a/src/officecli/Handlers/Hwpx/HwpxSection.cs b/src/officecli/Handlers/Hwpx/HwpxSection.cs
new file mode 100644
index 000000000..5a36e88c4
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/HwpxSection.cs
@@ -0,0 +1,18 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.Xml.Linq;
+
+namespace OfficeCli.Handlers;
+
+internal class HwpxSection
+{
+    public int Index { get; set; }
+    /// Actual ZIP entry path discovered from manifest (e.g. "Contents/section0.xml", "Contents/body_section.xml").
+    public string EntryPath { get; init; } = null!;
+    public XDocument Document { get; set; } = null!;
+    public XElement Root => Document.Root!;
+    public List<XElement> Paragraphs => Root.Elements(HwpxNs.Hp + "p").ToList();
+    /// All tables: both direct children and nested inside paragraphs (Hancom format).
+    public List<XElement> Tables => Root.Descendants(HwpxNs.Hp + "tbl").ToList();
+}
diff --git a/src/officecli/Handlers/Hwpx/Validation/HwpxPackageValidationResult.cs b/src/officecli/Handlers/Hwpx/Validation/HwpxPackageValidationResult.cs
new file mode 100644
index 000000000..222825bb5
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/Validation/HwpxPackageValidationResult.cs
@@ -0,0 +1,11 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using OfficeCli.Handlers.Hwp.SafeSave;
+
+namespace OfficeCli.Handlers.Hwpx.Validation;
+
+internal sealed record HwpxPackageValidationResult(
+    IReadOnlyList<SafeSaveCheck> Checks,
+    IReadOnlyDictionary<string, object?> PackageIntegrity
+);
diff --git a/src/officecli/Handlers/Hwpx/Validation/HwpxPackageValidator.cs b/src/officecli/Handlers/Hwpx/Validation/HwpxPackageValidator.cs
new file mode 100644
index 000000000..b4e4215c4
--- /dev/null
+++ b/src/officecli/Handlers/Hwpx/Validation/HwpxPackageValidator.cs
@@ -0,0 +1,137 @@
+// Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using System.IO.Compression;
+using OfficeCli.Core;
+using OfficeCli.Handlers.Hwp.SafeSave;
+
+namespace OfficeCli.Handlers.Hwpx.Validation;
+
+internal static class HwpxPackageValidator
+{
+    public static HwpxPackageValidationResult Validate(string path, bool strictOrphans = false)
+    {
+        var checks = new List<SafeSaveCheck>();
+        var package = new Dictionary<string, object?>(StringComparer.Ordinal);
+        List<ValidationError> validationErrors;
+
+        try
+        {
+            using var zip = ZipFile.OpenRead(path);
+            var entryNames = zip.Entries.Select(entry => entry.FullName).ToArray();
+            package["entryCount"] = entryNames.Length;
+            package["xmlPartCount"] = entryNames.Count(IsXmlLikePart);
+            checks.Add(new SafeSaveCheck("zip-open", true, "info", null, new Dictionary<string, object?>
+            {
+                ["entryCount"] = entryNames.Length
+            }));
+            checks.Add(PresenceCheck("manifest-present", HasEntry(entryNames, "Contents/content.hpf"), "Contents/content.hpf"));
+            checks.Add(PresenceCheck("content-parts-present",
+                entryNames.Any(IsSectionPart), "Contents/section*.xml"));
+            checks.Add(PresenceCheck("header-parts-present", HasEntry(entryNames, "Contents/header.xml"), "Contents/header.xml"));
+        }
+        catch (Exception ex) when (ex is InvalidDataException or IOException or UnauthorizedAccessException)
+        {
+            package["entryCount"] = 0;
+            package["xmlPartCount"] = 0;
+            package["error"] = ex.Message;
+            checks.Add(new SafeSaveCheck("zip-open", false, "error", ex.Message));
+            checks.Add(new SafeSaveCheck("package-integrity", false, "error", "HWPX ZIP package could not be opened."));
+            return new HwpxPackageValidationResult(checks, package);
+        }
+
+        try
+        {
+            using var handler = new HwpxHandler(path, editable: false);
+            validationErrors = handler.Validate();
+        }
+        catch (Exception ex)
+        {
+            var errorType = ex is System.Xml.XmlException
+                ? "xml_malformed"
+                : "hwpx_load_failed";
+            validationErrors = [new ValidationError(errorType, ex.Message, "/", null)];
+        }
+
+        var xmlOk = !validationErrors.Any(error => ContainsAny(error.ErrorType, "xml", "malformed"));
+        checks.Add(new SafeSaveCheck(
+            "xml-well-formed",
+            xmlOk,
+            xmlOk ? "info" : "error",
+            xmlOk ? null : "One or more HWPX XML parts are malformed."));
+
+        var missingBinData = validationErrors.Count(error => error.ErrorType.Contains("bindata_missing", StringComparison.OrdinalIgnoreCase));
+        var orphanBinData = validationErrors.Count(error => error.ErrorType.Contains("bindata_orphan", StringComparison.OrdinalIgnoreCase));
+        checks.Add(new SafeSaveCheck(
+            "bindata-references-present",
+            missingBinData == 0,
+            missingBinData == 0 ? "info" : "error",
+            missingBinData == 0 ? null : "One or more BinData references point to missing package entries.",
+            new Dictionary<string, object?> { ["missingCount"] = missingBinData }));
+        checks.Add(new SafeSaveCheck(
+            "orphan-reference-report",
+            true,
+            orphanBinData == 0 ? "info" : "warning",
+            orphanBinData == 0 ? null : "Package contains unreferenced BinData entries.",
+            new Dictionary<string, object?> { ["orphanCount"] = orphanBinData }));
+
+        var blockingErrors = validationErrors
+            .Where(error => IsPackageBlocking(error, strictOrphans))
+            .ToArray();
+        var packageOk = checks.Where(check => check.Name != "orphan-reference-report").All(check => check.Ok)
+            && blockingErrors.Length == 0;
+
+        package["validationErrorCount"] = validationErrors.Count;
+        package["blockingErrorCount"] = blockingErrors.Length;
+        package["missingBinDataCount"] = missingBinData;
+        package["orphanBinDataCount"] = orphanBinData;
+        package["strictOrphans"] = strictOrphans;
+        checks.Add(new SafeSaveCheck(
+            "package-integrity",
+            packageOk,
+            packageOk ? "info" : "error",
+            packageOk ? null : "HWPX package integrity validation failed.",
+            package));
+
+        return new HwpxPackageValidationResult(checks, package);
+    }
+
+    private static SafeSaveCheck PresenceCheck(string name, bool ok, string expectedPath) => new(
+        name,
+        ok,
+        ok ? "info" : "error",
+        ok ?
null : $"Missing required HWPX package part: {expectedPath}",
+        new Dictionary<string, object?> { ["expectedPath"] = expectedPath });
+
+    private static bool HasEntry(IEnumerable<string> entryNames, string expected)
+        => entryNames.Any(name => string.Equals(name, expected, StringComparison.OrdinalIgnoreCase));
+
+    private static bool IsSectionPart(string entryName)
+        => entryName.StartsWith("Contents/section", StringComparison.OrdinalIgnoreCase)
+        && entryName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase);
+
+    private static bool IsXmlLikePart(string entryName)
+        => entryName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase)
+        || entryName.EndsWith(".hpf", StringComparison.OrdinalIgnoreCase)
+        || entryName.EndsWith(".rdf", StringComparison.OrdinalIgnoreCase);
+
+    private static bool ContainsAny(string value, params string[] needles)
+        => needles.Any(needle => value.Contains(needle, StringComparison.OrdinalIgnoreCase));
+
+    private static bool IsPackageBlocking(ValidationError error, bool strictOrphans)
+    {
+        if (error.ErrorType.Equals("package_version_missing", StringComparison.OrdinalIgnoreCase))
+            return false;
+        if (!strictOrphans && error.ErrorType.Contains("bindata_orphan", StringComparison.OrdinalIgnoreCase))
+            return false;
+        if (error.Severity != IssueSeverity.Error) return false;
+        return ContainsAny(
+            error.ErrorType,
+            "zip",
+            "opf",
+            "package",
+            "mimetype",
+            "container",
+            "xml",
+            "bindata");
+    }
+}
diff --git a/src/officecli/Handlers/Pptx/PowerPointHandler.Add.Text.cs b/src/officecli/Handlers/Pptx/PowerPointHandler.Add.Text.cs
index 3d193aabf..1c43ac3d8 100644
--- a/src/officecli/Handlers/Pptx/PowerPointHandler.Add.Text.cs
+++ b/src/officecli/Handlers/Pptx/PowerPointHandler.Add.Text.cs
@@ -266,25 +266,8 @@ private string AddParagraph(string parentPath, int? index, Dictionary
- var paraTextResolved = paraText.Replace("\\n", "\n").Replace("\\t", "\t"); - if (paraTextResolved.Contains('\t')) - { - AppendLineWithTabs(newPara, paraTextResolved, seg => new Drawing.Run - { - RunProperties = (Drawing.RunProperties)rProps.CloneNode(true), - Text = new Drawing.Text { Text = seg } - }); - } - else - { - newRun.RunProperties = rProps; - newRun.Text = new Drawing.Text { Text = paraTextResolved }; - newPara.Append(newRun); - } + foreach (var segmentedRun in BuildSegmentedRuns(paraText.Replace("\\n", "\n"), rProps)) + newPara.Append(segmentedRun); if (index.HasValue && index.Value >= 0) { @@ -396,41 +379,45 @@ private string AddRun(string parentPath, int? index, Dictionary else if (properties.TryGetValue("subscript", out var rSub)) rProps.Baseline = IsTruthy(rSub) ? -25000 : 0; - newRun.RunProperties = rProps; - // CONSISTENCY(escape-sequences): match shape-text path (\n and \t - // two-char escapes resolved). Run-add stays single-element, so - // tabs land as raw chars inside rather than ; - // higher-level shape-text Add/Set splits on \t into separate - // runs with siblings. 
- newRun.Text = new Drawing.Text { Text = runText.Replace("\\n", "\n").Replace("\\t", "\t") }; + var insertedRuns = BuildSegmentedRuns(runText.Replace("\\n", "\n"), rProps); - // Insert run at specified index, or append + // Insert runs at specified index, or append if (index.HasValue) { var existingRuns = targetPara.Elements().ToList(); if (index.Value >= 0 && index.Value < existingRuns.Count) - existingRuns[index.Value].InsertBeforeSelf(newRun); + { + var insertRef = existingRuns[index.Value]; + foreach (var segmentedRun in insertedRuns) + insertRef.InsertBeforeSelf(segmentedRun); + } else { var endParaRun2 = targetPara.GetFirstChild(); - if (endParaRun2 != null) - targetPara.InsertBefore(newRun, endParaRun2); - else - targetPara.Append(newRun); + foreach (var segmentedRun in insertedRuns) + { + if (endParaRun2 != null) + targetPara.InsertBefore(segmentedRun, endParaRun2); + else + targetPara.Append(segmentedRun); + } } } else { var endParaRun = targetPara.GetFirstChild(); - if (endParaRun != null) - targetPara.InsertBefore(newRun, endParaRun); - else - targetPara.Append(newRun); + foreach (var segmentedRun in insertedRuns) + { + if (endParaRun != null) + targetPara.InsertBefore(segmentedRun, endParaRun); + else + targetPara.Append(segmentedRun); + } } var runCount = targetPara.Elements().Count(); GetSlide(runSlidePart).Save(); - return $"/slide[{runSlideIdx}]/{BuildElementPathSegment("shape", runShape, runShapeIdx)}/paragraph[{targetParaIdx}]/run[{runCount}]"; + return $"/slide[{runSlideIdx}]/{BuildElementPathSegment("shape", runShape, runShapeIdx)}/paragraph[{targetParaIdx}]/run[{runCount - insertedRuns.Count + 1}]"; } // CONSISTENCY(escape-sequences): cross-handler convention — \t in paragraph diff --git a/src/officecli/Handlers/Pptx/PowerPointHandler.Cjk.cs b/src/officecli/Handlers/Pptx/PowerPointHandler.Cjk.cs new file mode 100644 index 000000000..4f7ac2102 --- /dev/null +++ b/src/officecli/Handlers/Pptx/PowerPointHandler.Cjk.cs @@ -0,0 +1,65 @@ +// 
Copyright 2025 OfficeCli (officecli.ai)
+// SPDX-License-Identifier: Apache-2.0
+
+using OfficeCli.Core;
+using Drawing = DocumentFormat.OpenXml.Drawing;
+
+namespace OfficeCli.Handlers;
+
+public partial class PowerPointHandler
+{
+    private static List<Drawing.Run> BuildSegmentedRuns(string text, Drawing.RunProperties? template = null, string fallbackLang = "en-US")
+    {
+        var segments = CjkHelper.SegmentText(text);
+        if (segments.Count == 0)
+            segments = new List<(string text, CjkScript script)> { ("", CjkScript.None) };
+
+        var runs = new List<Drawing.Run>();
+        foreach (var (segmentText, script) in segments)
+        {
+            var rPr = template?.CloneNode(true) as Drawing.RunProperties ?? new Drawing.RunProperties();
+            if (script != CjkScript.None)
+                CjkHelper.ApplyToDrawingRun(rPr, script);
+            else
+                CjkHelper.ClearDrawingRunCjk(rPr, fallbackLang);
+
+            runs.Add(new Drawing.Run(
+                rPr,
+                new Drawing.Text { Text = segmentText }));
+        }
+
+        return runs;
+    }
+
+    private static Drawing.Paragraph BuildParagraphWithSegmentedRuns(
+        string text,
+        Drawing.RunProperties? template = null,
+        Drawing.ParagraphProperties? paragraphProperties = null)
+    {
+        var paragraph = new Drawing.Paragraph();
+        if (paragraphProperties != null)
+            paragraph.ParagraphProperties = paragraphProperties.CloneNode(true) as Drawing.ParagraphProperties;
+
+        foreach (var run in BuildSegmentedRuns(text, template))
+            paragraph.Append(run);
+
+        return paragraph;
+    }
+
+    private static void ReplaceRunWithSegmentedRuns(Drawing.Run run, string text)
+    {
+        if (run.Parent is not Drawing.Paragraph paragraph)
+        {
+            run.Text = new Drawing.Text { Text = text };
+            var rPr = run.RunProperties ??
(run.RunProperties = new Drawing.RunProperties()); + CjkHelper.ApplyToDrawingRunIfCjk(rPr, text); + return; + } + + var template = run.RunProperties?.CloneNode(true) as Drawing.RunProperties; + foreach (var newRun in BuildSegmentedRuns(text, template)) + paragraph.InsertBefore(newRun, run); + + run.Remove(); + } +} diff --git a/src/officecli/Handlers/Pptx/PowerPointHandler.NodeBuilder.cs b/src/officecli/Handlers/Pptx/PowerPointHandler.NodeBuilder.cs index 7a1778d3f..bd5b927fd 100644 --- a/src/officecli/Handlers/Pptx/PowerPointHandler.NodeBuilder.cs +++ b/src/officecli/Handlers/Pptx/PowerPointHandler.NodeBuilder.cs @@ -1329,12 +1329,7 @@ private static Shape CreateTextShape(uint id, string name, string text, bool isT var lines = text.Replace("\\n", "\n").Replace("\\t", "\t").Split('\n'); foreach (var line in lines) { - var para = new Drawing.Paragraph(); - AppendLineWithTabs(para, line, seg => new Drawing.Run( - new Drawing.RunProperties { Language = "en-US" }, - new Drawing.Text { Text = seg } - )); - body.AppendChild(para); + body.AppendChild(BuildParagraphWithSegmentedRuns(line)); } shape.TextBody = body; return shape; diff --git a/src/officecli/Handlers/Pptx/PowerPointHandler.ShapeProperties.cs b/src/officecli/Handlers/Pptx/PowerPointHandler.ShapeProperties.cs index 4209f5bea..42fc96716 100644 --- a/src/officecli/Handlers/Pptx/PowerPointHandler.ShapeProperties.cs +++ b/src/officecli/Handlers/Pptx/PowerPointHandler.ShapeProperties.cs @@ -183,8 +183,7 @@ private static List SetRunOrShapeProperties( var textLines = resolved.Split('\n'); if (runs.Count == 1 && textLines.Length == 1 && !textLines[0].Contains('\t')) { - // Single run, single line, no tabs: just replace text - runs[0].Text = new Drawing.Text { Text = textLines[0] }; + ReplaceRunWithSegmentedRuns(runs[0], textLines[0]); } else { @@ -201,18 +200,7 @@ private static List SetRunOrShapeProperties( foreach (var textLine in textLines) { - var newPara = new Drawing.Paragraph(); - if (paraProps != null) - 
newPara.ParagraphProperties = paraProps.CloneNode(true) as Drawing.ParagraphProperties; - AppendLineWithTabs(newPara, textLine, seg => - { - var r = new Drawing.Run(); - if (runProps != null) - r.RunProperties = runProps.CloneNode(true) as Drawing.RunProperties; - r.Text = new Drawing.Text { Text = seg }; - return r; - }); - textBody.Append(newPara); + textBody.Append(BuildParagraphWithSegmentedRuns(textLine, runProps, paraProps)); } } } @@ -1262,11 +1250,7 @@ private static List SetTableCellProperties(Drawing.TableCell cell, Dicti new Drawing.BodyProperties(), new Drawing.ListStyle()); foreach (var line in lines) { - var para = new Drawing.Paragraph(); - AppendLineWithTabs(para, line, seg => new Drawing.Run( - new Drawing.RunProperties { Language = "en-US" }, - new Drawing.Text { Text = seg })); - textBody.AppendChild(para); + textBody.AppendChild(BuildParagraphWithSegmentedRuns(line)); } cell.PrependChild(textBody); } @@ -1276,19 +1260,7 @@ private static List SetTableCellProperties(Drawing.TableCell cell, Dicti var runProps = firstRun?.RunProperties?.CloneNode(true) as Drawing.RunProperties; textBody.RemoveAllChildren(); foreach (var line in lines) - { - var para = new Drawing.Paragraph(); - AppendLineWithTabs(para, line, seg => - { - var r = new Drawing.Run(); - r.RunProperties = runProps != null - ? 
runProps.CloneNode(true) as Drawing.RunProperties - : new Drawing.RunProperties { Language = "en-US" }; - r.Text = new Drawing.Text { Text = seg }; - return r; - }); - textBody.Append(para); - } + textBody.Append(BuildParagraphWithSegmentedRuns(line, runProps)); } break; } diff --git a/src/officecli/Handlers/Word/WordHandler.Add.Text.cs b/src/officecli/Handlers/Word/WordHandler.Add.Text.cs index a0ebb31eb..fd70ba0e2 100644 --- a/src/officecli/Handlers/Word/WordHandler.Add.Text.cs +++ b/src/officecli/Handlers/Word/WordHandler.Add.Text.cs @@ -558,7 +558,6 @@ ParagraphMarkRunProperties EnsureNoTextMarkRPr() => if (properties.TryGetValue("text", out var text)) { - var run = new Run(); var rProps = new RunProperties(); // Per-script font slots (font.latin / font.ea / font.cs) write // to ascii+hAnsi / eastAsia / cs respectively. Bare 'font' @@ -800,9 +799,8 @@ ParagraphMarkRunProperties EnsureNoTextMarkRPr() => // duplicate that round-trips out as a separate run-level shading // command on dump replay. - run.AppendChild(rProps); - AppendTextWithBreaks(run, text); - para.AppendChild(run); + foreach (var segmentedRun in BuildSegmentedRuns(text, rProps)) + para.AppendChild(segmentedRun); } // Dotted-key fallback: any "element.attr=value" prop the hand-rolled diff --git a/src/officecli/Handlers/Word/WordHandler.Cjk.cs b/src/officecli/Handlers/Word/WordHandler.Cjk.cs new file mode 100644 index 000000000..679c917f3 --- /dev/null +++ b/src/officecli/Handlers/Word/WordHandler.Cjk.cs @@ -0,0 +1,37 @@ +// Copyright 2025 OfficeCli (officecli.ai) +// SPDX-License-Identifier: Apache-2.0 + +using DocumentFormat.OpenXml; +using DocumentFormat.OpenXml.Wordprocessing; +using OfficeCli.Core; + +namespace OfficeCli.Handlers; + +public partial class WordHandler +{ + private static List BuildSegmentedRuns(string text, RunProperties? 
template = null)
+    {
+        var segments = CjkHelper.SegmentText(text);
+        if (segments.Count == 0)
+            segments = new List<(string text, CjkScript script)> { ("", CjkScript.None) };
+
+        var runs = new List<Run>();
+        foreach (var (segmentText, script) in segments)
+        {
+            var rPr = template?.CloneNode(true) as RunProperties ?? new RunProperties();
+            if (script != CjkScript.None)
+                CjkHelper.ApplyToWordRun(rPr, script);
+            else
+                CjkHelper.ClearWordRunCjk(rPr);
+
+            // Route segment text through AppendTextWithBreaks so that '\n' (w:br)
+            // and '\t' (w:tab) inside a CJK segment round-trip through Word
+            // instead of collapsing to a space — matches upstream plain-run path.
+            var run = new Run(rPr);
+            AppendTextWithBreaks(run, segmentText);
+            runs.Add(run);
+        }
+
+        return runs;
+    }
+}
diff --git a/src/officecli/Help/SchemaHelpLoader.cs b/src/officecli/Help/SchemaHelpLoader.cs
index 2470eb8b4..e12409152 100644
--- a/src/officecli/Help/SchemaHelpLoader.cs
+++ b/src/officecli/Help/SchemaHelpLoader.cs
@@ -12,7 +12,7 @@ namespace OfficeCli.Help;
 ///
internal static class SchemaHelpLoader { - private static readonly string[] CanonicalFormats = { "docx", "xlsx", "pptx" }; + private static readonly string[] CanonicalFormats = { "docx", "xlsx", "pptx", "hwpx", "hwp" }; private static readonly Dictionary FormatAliases = new(StringComparer.OrdinalIgnoreCase) @@ -24,6 +24,10 @@ internal static class SchemaHelpLoader ["pptx"] = "pptx", ["ppt"] = "pptx", ["powerpoint"] = "pptx", + ["hwpx"] = "hwpx", + ["hwp"] = "hwp", + ["hangeul"] = "hwp", + ["hanword"] = "hwp", }; // Manifest index: canonical key "schemas/help/{format}/{element}.json" @@ -710,7 +714,7 @@ internal static IReadOnlyList FindUnknownKeys( } /// - /// Map a file extension (".docx"/".xlsx"/".pptx") to the canonical + /// Map a file extension (".docx"/".xlsx"/".pptx"/".hwpx"/".hwp") to the canonical /// schema format name, or null if the extension isn't an Office one. /// Small helper so CLI add/set sites don't duplicate the mapping. /// @@ -722,6 +726,8 @@ internal static IReadOnlyList FindUnknownKeys( ".docx" => "docx", ".xlsx" => "xlsx", ".pptx" => "pptx", + ".hwpx" => "hwpx", + ".hwp" => "hwp", _ => null, }; } diff --git a/src/officecli/McpServer.cs b/src/officecli/McpServer.cs index 6e472d7ea..8a6f86cb5 100644 --- a/src/officecli/McpServer.cs +++ b/src/officecli/McpServer.cs @@ -409,9 +409,20 @@ string[] ArgStringArray(string key) "outline" or "o" => handler.ViewAsOutline(), "stats" or "s" => StatsWithOptionalPageCount(handler, args, file), "issues" or "i" => OutputFormatter.FormatIssues(handler.ViewAsIssues(null, null), OutputFormat.Json), - "forms" or "f" => handler is Handlers.WordHandler wfh - ? 
wfh.ViewAsFormsJson().ToJsonString(OutputFormatter.PublicJsonOptions) - : throw new ArgumentException("Forms view is only supported for .docx files."), + "forms" or "f" => handler switch { + Handlers.WordHandler wfh => wfh.ViewAsFormsJson().ToJsonString(OutputFormatter.PublicJsonOptions), + Handlers.HwpxHandler hfh => hfh.ViewAsFormsJson().ToJsonString(OutputFormatter.PublicJsonOptions), + _ => throw new ArgumentException("Forms view is only supported for .docx and .hwpx files."), + }, + "tables" or "tbl" => handler is Handlers.HwpxHandler htm + ? htm.ViewAsTablesJson().ToJsonString(OutputFormatter.PublicJsonOptions) + : throw new ArgumentException("Tables view is only supported for .hwpx files."), + "markdown" or "md" => handler is Handlers.HwpxHandler hmm + ? hmm.ViewAsMarkdown() + : throw new ArgumentException("Markdown view is only supported for .hwpx files."), + "objects" or "obj" => handler is Handlers.HwpxHandler hom + ? hom.ViewAsObjectsJson().ToJsonString(OutputFormatter.PublicJsonOptions) + : throw new ArgumentException("Objects view is only supported for .hwpx files."), _ => throw new ArgumentException($"Unknown mode: {mode}") }; } @@ -711,11 +722,7 @@ private static void WriteToolDefinitions(Utf8JsonWriter w) w.WriteStartObject("items"); w.WriteString("type", "string"); w.WriteEndObject(); w.WriteString("description", "key=value pairs (e.g. 
 bold=true, color=FF0000, text=Hello)"); w.WriteEndObject();
 // mode
-w.WriteStartObject("mode"); w.WriteString("type", "string"); w.WriteString("description", "View mode: text, annotated, outline, stats, issues, html, svg (pptx), screenshot (PNG via headless browser; needs playwright/chrome/firefox; takes seconds), forms (docx)"); w.WriteEndObject();
-// screenshot_width / screenshot_height / grid (screenshot mode)
-w.WriteStartObject("screenshot_width"); w.WriteString("type", "number"); w.WriteString("description", "Viewport width for screenshot mode (default 1600)"); w.WriteEndObject();
-w.WriteStartObject("screenshot_height"); w.WriteString("type", "number"); w.WriteString("description", "Viewport height for screenshot mode (default 1200)"); w.WriteEndObject();
-w.WriteStartObject("grid"); w.WriteString("type", "number"); w.WriteString("description", "Tile slides into N-column thumbnail grid (screenshot mode, pptx only; 0 = off)"); w.WriteEndObject();
+w.WriteStartObject("mode"); w.WriteString("type", "string"); w.WriteString("description", "View mode: text, annotated, outline, stats, issues, html, svg (pptx), forms (docx/hwpx)"); w.WriteEndObject();
 // depth
 w.WriteStartObject("depth"); w.WriteString("type", "number"); w.WriteString("description", "Child depth for get (default 1)"); w.WriteEndObject();
 // index
diff --git a/src/officecli/ResidentServer.cs b/src/officecli/ResidentServer.cs
index bd7c1846c..831c617e4 100644
--- a/src/officecli/ResidentServer.cs
+++ b/src/officecli/ResidentServer.cs
@@ -848,6 +848,11 @@ private void NotifyWatchSlideChanged(string? changedPath)
 {
     if (!WatchServer.IsWatching(_filePath)) return;
+    if (_handler is OfficeCli.Handlers.HwpxHandler hwpx)
+    {
+        WatchNotifier.NotifyIfWatching(_filePath, new WatchMessage { Action = "full", FullHtml = hwpx.ViewAsHtml() });
+        return;
+    }
     if (_handler is OfficeCli.Handlers.ExcelHandler excel)
     {
         string? scrollTo = null;
@@ -884,6 +889,11 @@ private void NotifyWatchRootChanged(int oldSlideCount)
 {
     if (!WatchServer.IsWatching(_filePath)) return;
+    if (_handler is OfficeCli.Handlers.HwpxHandler hwpx)
+    {
+        WatchNotifier.NotifyIfWatching(_filePath, new WatchMessage { Action = "full", FullHtml = hwpx.ViewAsHtml() });
+        return;
+    }
     if (_handler is OfficeCli.Handlers.WordHandler word)
     {
         var html = word.ViewAsHtml();
@@ -927,6 +937,8 @@ private void NotifyWatchFullRefresh()
         fullHtml = excel.ViewAsHtml();
     else if (_handler is OfficeCli.Handlers.WordHandler word)
         fullHtml = word.ViewAsHtml();
+    else if (_handler is OfficeCli.Handlers.HwpxHandler hwpx)
+        fullHtml = hwpx.ViewAsHtml();
     if (fullHtml != null)
         WatchNotifier.NotifyIfWatching(_filePath, new WatchMessage { Action = "full", FullHtml = fullHtml });
 }
@@ -1162,8 +1174,11 @@ private void ExecuteView(ResidentRequest req, OutputFormat format)
     Console.WriteLine(OutputFormatter.FormatIssues(_handler.ViewAsIssues(issueType, limit), format));
 else if (modeKey is "forms" or "f")
 {
+    var auto = req.Args.TryGetValue("auto", out var a) && a == "true";
     if (_handler is OfficeCli.Handlers.WordHandler wordFormsHandler)
         Console.WriteLine(wordFormsHandler.ViewAsFormsJson().ToJsonString(OutputFormatter.PublicJsonOptions));
+    else if (_handler is OfficeCli.Handlers.HwpxHandler hwpxFormsHandler)
+        Console.WriteLine(hwpxFormsHandler.ViewAsFormsJson(auto).ToJsonString(OutputFormatter.PublicJsonOptions));
     else if (_handler is OfficeCli.Core.Plugins.FormatHandlerProxy formsProxy)
     {
         var formsJson = formsProxy.ViewAsFormsJson();
@@ -1173,10 +1188,23 @@ private void ExecuteView(ResidentRequest req, OutputFormat format)
         Console.WriteLine(formsJson.ToJsonString(OutputFormatter.PublicJsonOptions));
     }
     else
-        Console.Error.WriteLine("Forms view is only supported for .docx files.");
+        Console.Error.WriteLine("Forms view is only supported for .docx and .hwpx files.");
+}
+else if (modeKey is "tables" or "tbl")
+{
+    if (_handler is OfficeCli.Handlers.HwpxHandler hwpxTblRes)
+        Console.WriteLine(hwpxTblRes.ViewAsTablesJson().ToJsonString(OutputFormatter.PublicJsonOptions));
+    else Console.Error.WriteLine("Tables view is only supported for .hwpx files.");
+}
+else if (modeKey is "objects" or "obj")
+{
+    if (_handler is OfficeCli.Handlers.HwpxHandler hwpxObjRes)
+        Console.WriteLine(hwpxObjRes.ViewAsObjectsJson(
+            req.Args.TryGetValue("object-type", out var ot2) ? ot2 : null).ToJsonString(OutputFormatter.PublicJsonOptions));
+    else Console.Error.WriteLine("Objects view is only supported for .hwpx files.");
 }
 else
-    Console.WriteLine($"Unknown mode: {mode}. Available: text, annotated, outline, stats, issues, html, svg, screenshot, forms");
+    Console.WriteLine($"Unknown mode: {mode}. Available: text, annotated, outline, stats, issues, html, forms, tables, markdown, objects");
 }
 else
 {
@@ -1192,12 +1220,23 @@ private void ExecuteView(ResidentRequest req, OutputFormat format)
     "forms" or "f" => _handler switch
     {
         OfficeCli.Handlers.WordHandler wfh => wfh.ViewAsForms(),
+        OfficeCli.Handlers.HwpxHandler hfh => hfh.ViewAsForms(
+            req.Args.TryGetValue("auto", out var a2) && a2 == "true"),
         OfficeCli.Core.Plugins.FormatHandlerProxy fp => fp.ViewAsFormsJson()?.ToJsonString(OutputFormatter.PublicJsonOptions) ?? "Forms view is not supported by the format-handler plugin.",
-        _ => "Forms view is only supported for .docx files."
+        _ => "Forms view is only supported for .docx and .hwpx files."
     },
-    _ => $"Unknown mode: {mode}. Available: text, annotated, outline, stats, issues, html, svg, screenshot, forms"
+    "tables" or "tbl" => _handler is OfficeCli.Handlers.HwpxHandler htbl
+        ? htbl.ViewAsTables() : "Tables view is only supported for .hwpx files.",
+    "markdown" or "md" => _handler is OfficeCli.Handlers.HwpxHandler hmd
+        ? hmd.ViewAsMarkdown() : "Markdown view is only supported for .hwpx files.",
+    "objects" or "obj" => _handler is OfficeCli.Handlers.HwpxHandler hobj
+        ? hobj.ViewAsObjects(req.Args.TryGetValue("object-type", out var ot) ? ot : null)
+        : "Objects view is only supported for .hwpx files.",
+    "styles" => _handler is OfficeCli.Handlers.HwpxHandler hstyle
+        ? hstyle.ViewAsStyles() : "Styles view is only supported for .hwpx files.",
+    _ => $"Unknown mode: {mode}. Available: text, annotated, outline, stats, issues, html, forms, tables, markdown, objects, styles"
 };
 Console.WriteLine(output);
 }
diff --git a/src/officecli/Resources/base.hwpx b/src/officecli/Resources/base.hwpx
new file mode 100644
index 000000000..7b24a0586
Binary files /dev/null and b/src/officecli/Resources/base.hwpx differ
diff --git a/src/officecli/officecli.csproj b/src/officecli/officecli.csproj
index 343a9159b..29c1b45a2 100644
--- a/src/officecli/officecli.csproj
+++ b/src/officecli/officecli.csproj
@@ -24,6 +24,7 @@
+
diff --git a/src/rhwp-field-bridge/Cargo.lock b/src/rhwp-field-bridge/Cargo.lock
new file mode 100644
index 000000000..990400fde
--- /dev/null
+++ b/src/rhwp-field-bridge/Cargo.lock
@@ -0,0 +1,1569 @@
+# This file is automatically @generated by Cargo.
+# It is not intended for manual editing.
+version = 4
+
+[[package]]
+name = "adler2"
+version = "2.0.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "320119579fcad9c21884f5c4861d16174d0e06250625266f50fe6898340abefa"
+
+[[package]]
+name = "aho-corasick"
+version = "1.1.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ddd31a130427c27518df266943a5308ed92d4b226cc639f5a8f1002816174301"
+dependencies = [
+ "memchr",
+]
+
+[[package]]
+name = "arrayref"
+version = "0.3.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "76a2e8124351fda1ef8aaaa3bbd7ebbcb486bbcd4225aca0aa0d84bb2db8fecb"
+
+[[package]]
+name = "arrayvec"
+version = "0.7.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50"
+
+[[package]]
+name = "autocfg"
+version = "1.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8"
+
+[[package]]
+name = "base64"
+version = "0.22.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "72b3254f16251a8381aa12e40e3c4d2f0199f8c6508fbecb9d91f575e0fbb8c6"
+
+[[package]]
+name = "bindgen"
+version = "0.72.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "993776b509cfb49c750f11b8f07a46fa23e0a1386ffc01fb1e7d343efc387895"
+dependencies = [
+ "bitflags 2.11.1",
+ "cexpr",
+ "clang-sys",
+ "itertools",
+ "log",
+ "prettyplease",
+ "proc-macro2",
+ "quote",
+ "regex",
+ "rustc-hash",
+ "shlex",
+ "syn",
+]
+
+[[package]]
+name = "bitflags"
+version = "1.3.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "bef38d45163c2f1dde094a7dfd33ccf595c92905c8f8f4fdc18d06fb1037718a"
+
+[[package]]
+name = "bitflags"
+version = "2.11.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c4512299f36f043ab09a583e57bceb5a5aab7a73db1805848e8fef3c9e8c78b3"
+
+[[package]]
+name = "block-buffer"
+version = "0.10.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3078c7629b62d3f0439517fa394996acacc5cbc91c5a20d8c658e77abd503a71"
+dependencies = [
+ "generic-array",
+]
+
+[[package]]
+name = "bumpalo"
+version = "3.20.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5d20789868f4b01b2f2caec9f5c4e0213b41e3e5702a50157d699ae31ced2fcb"
+
+[[package]]
+name = "bytemuck"
+version = "1.25.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c8efb64bd706a16a1bdde310ae86b351e4d21550d98d056f22f8a7f7a2183fec"
+dependencies = [
+ "bytemuck_derive",
+]
+
+[[package]]
+name = "bytemuck_derive"
+version = "1.10.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f9abbd1bc6865053c427f7198e6af43bfdedc55ab791faed4fbd361d789575ff"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "byteorder"
+version = "1.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1fd0f2584146f6f2ef48085050886acf353beff7305ebd1ae69500e27c67f64b"
+
+[[package]]
+name = "byteorder-lite"
+version = "0.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8f1fe948ff07f4bd06c30984e69f5b4899c516a3ef74f34df92a2df2ab535495"
+
+[[package]]
+name = "cc"
+version = "1.2.62"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a1dce859f0832a7d088c4f1119888ab94ef4b5d6795d1ce05afb7fe159d79f98"
+dependencies = [
+ "find-msvc-tools",
+ "shlex",
+]
+
+[[package]]
+name = "cexpr"
+version = "0.6.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6fac387a98bb7c37292057cffc56d62ecb629900026402633ae9160df93a8766"
+dependencies = [
+ "nom",
+]
+
+[[package]]
+name = "cfb"
+version = "0.14.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a347dcabdae9c31b0825fd6a8bed285ec9c2acb89c47827126d52fa4f59cece3"
+dependencies = [
+ "fnv",
+ "uuid",
+ "web-time",
+]
+
+[[package]]
+name = "cfg-if"
+version = "1.0.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"
+
+[[package]]
+name = "clang-sys"
+version = "1.8.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0b023947811758c97c59bf9d1c188fd619ad4718dcaa767947df1cadb14f39f4"
+dependencies = [
+ "glob",
+ "libc",
+ "libloading",
+]
+
+[[package]]
+name = "codepage"
+version = "0.1.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "48f68d061bc2828ae826206326e61251aca94c1e4a5305cf52d9138639c918b4"
+dependencies = [
+ "encoding_rs",
+]
+
+[[package]]
+name = "color_quant"
+version = "1.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3d7b894f5411737b7867f4827955924d7c254fc9f4d91a6aad6b097804b1018b"
+
+[[package]]
+name = "console_error_panic_hook"
+version = "0.1.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a06aeb73f470f66dcdbf7223caeebb85984942f22f1adb2a088cf9668146bbbc"
+dependencies = [
+ "cfg-if",
+ "wasm-bindgen",
+]
+
+[[package]]
+name = "core_maths"
+version = "0.1.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "77745e017f5edba1a9c1d854f6f3a52dac8a12dd5af5d2f54aecf61e43d80d30"
+dependencies = [
+ "libm",
+]
+
+[[package]]
+name = "cpufeatures"
+version = "0.2.17"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "59ed5838eebb26a2bb2e58f6d5b5316989ae9d08bab10e0e6d103e656d1b0280"
+dependencies = [
+ "libc",
+]
+
+[[package]]
+name = "crc32fast"
+version = "1.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9481c1c90cbf2ac953f07c8d4a58aa3945c425b7185c9154d67a65e4230da511"
+dependencies = [
+ "cfg-if",
+]
+
+[[package]]
+name = "crypto-common"
+version = "0.1.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "78c8292055d1c1df0cce5d180393dc8cce0abec0a7102adb6c7b1eef6016d60a"
+dependencies = [
+ "generic-array",
+ "typenum",
+]
+
+[[package]]
+name = "data-url"
+version = "0.3.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "be1e0bca6c3637f992fc1cc7cbc52a78c1ef6db076dbf1059c4323d6a2048376"
+
+[[package]]
+name = "digest"
+version = "0.10.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9ed9a281f7bc9b7576e61468ba615a66a5c8cfdff42420a70aa82701a3b1e292"
+dependencies = [
+ "block-buffer",
+ "crypto-common",
+]
+
+[[package]]
+name = "either"
+version = "1.15.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "48c757948c5ede0e46177b7add2e67155f70e33c07fea8284df6576da70b3719"
+
+[[package]]
+name = "embedded-io"
+version = "0.7.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9eb1aa714776b75c7e67e1da744b81a129b3ff919c8712b5e1b32252c1f07cc7"
+
+[[package]]
+name = "encoding_rs"
+version = "0.8.35"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "75030f3c4f45dafd7586dd6780965a8c7e8e285a5ecb86713e63a79c5b2766f3"
+dependencies = [
+ "cfg-if",
+]
+
+[[package]]
+name = "equivalent"
+version = "1.0.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "877a4ace8713b0bcf2a4e7eec82529c029f1d0619886d18145fea96c3ffe5c0f"
+
+[[package]]
+name = "errno"
+version = "0.3.14"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "39cab71617ae0d63f51a36d69f866391735b51691dbda63cf6f96d042b63efeb"
+dependencies = [
+ "libc",
+ "windows-sys",
+]
+
+[[package]]
+name = "euclid"
+version = "0.22.14"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f1a05365e3b1c6d1650318537c7460c6923f1abdd272ad6842baa2b509957a06"
+dependencies = [
+ "num-traits",
+]
+
+[[package]]
+name = "fdeflate"
+version = "0.3.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1e6853b52649d4ac5c0bd02320cddc5ba956bdb407c4b75a2c6b75bf51500f8c"
+dependencies = [
+ "simd-adler32",
+]
+
+[[package]]
+name = "filetime"
+version = "0.2.29"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5c287a33c7f0a620c38e641e7f60827713987b3c0f26e8ddc9462cc69cf75759"
+dependencies = [
+ "cfg-if",
+ "libc",
+]
+
+[[package]]
+name = "find-msvc-tools"
+version = "0.1.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5baebc0774151f905a1a2cc41989300b1e6fbb29aff0ceffa1064fdd3088d582"
+
+[[package]]
+name = "flate2"
+version = "1.1.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "843fba2746e448b37e26a819579957415c8cef339bf08564fe8b7ddbd959573c"
+dependencies = [
+ "crc32fast",
+ "miniz_oxide",
+ "zlib-rs",
+]
+
+[[package]]
+name = "float-cmp"
+version = "0.9.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "98de4bbd547a563b716d8dfa9aad1cb19bfab00f4fa09a6a4ed21dbcf44ce9c4"
+
+[[package]]
+name = "fnv"
+version = "1.0.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3f9eec918d3f24069decb9af1554cad7c880e2da24a9afd88aca000531ab82c1"
+
+[[package]]
+name = "font-types"
+version = "0.10.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "39a654f404bbcbd48ea58c617c2993ee91d1cb63727a37bf2323a4edeed1b8c5"
+dependencies = [
+ "bytemuck",
+]
+
+[[package]]
+name = "fontconfig-parser"
+version = "0.5.8"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "bbc773e24e02d4ddd8395fd30dc147524273a83e54e0f312d986ea30de5f5646"
+dependencies = [
+ "roxmltree",
+]
+
+[[package]]
+name = "fontdb"
+version = "0.23.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "457e789b3d1202543297a350643cf459f836cade38934e7a4cf6a39e7cde2905"
+dependencies = [
+ "fontconfig-parser",
+ "log",
+ "memmap2",
+ "slotmap",
+ "tinyvec",
+ "ttf-parser",
+]
+
+[[package]]
+name = "futures-core"
+version = "0.3.32"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7e3450815272ef58cec6d564423f6e755e25379b217b0bc688e295ba24df6b1d"
+
+[[package]]
+name = "futures-task"
+version = "0.3.32"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "037711b3d59c33004d3856fbdc83b99d4ff37a24768fa1be9ce3538a1cde4393"
+
+[[package]]
+name = "futures-util"
+version = "0.3.32"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "389ca41296e6190b48053de0321d02a77f32f8a5d2461dd38762c0593805c6d6"
+dependencies = [
+ "futures-core",
+ "futures-task",
+ "pin-project-lite",
+ "slab",
+]
+
+[[package]]
+name = "generic-array"
+version = "0.14.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "85649ca51fd72272d7821adaf274ad91c288277713d9c18820d8499a7ff69e9a"
+dependencies = [
+ "typenum",
+ "version_check",
+]
+
+[[package]]
+name = "gif"
+version = "0.13.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "4ae047235e33e2829703574b54fdec96bfbad892062d97fed2f76022287de61b"
+dependencies = [
+ "color_quant",
+ "weezl",
+]
+
+[[package]]
+name = "gif"
+version = "0.14.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ee8cfcc411d9adbbaba82fb72661cc1bcca13e8bba98b364e62b2dba8f960159"
+dependencies = [
+ "color_quant",
+ "weezl",
+]
+
+[[package]]
+name = "glob"
+version = "0.3.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0cc23270f6e1808e30a928bdc84dea0b9b4136a8bc82338574f23baf47bbd280"
+
+[[package]]
+name = "hashbrown"
+version = "0.17.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "4f467dd6dccf739c208452f8014c75c18bb8301b050ad1cfb27153803edb0f51"
+
+[[package]]
+name = "heck"
+version = "0.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "2304e00983f87ffb38b55b444b5e3b60a884b5d30c0fca7d82fe33449bbe55ea"
+
+[[package]]
+name = "image"
+version = "0.25.10"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "85ab80394333c02fe689eaf900ab500fbd0c2213da414687ebf995a65d5a6104"
+dependencies = [
+ "bytemuck",
+ "byteorder-lite",
+ "color_quant",
+ "gif 0.14.2",
+ "moxcms",
+ "num-traits",
+ "png 0.18.1",
+ "zune-core 0.5.1",
+ "zune-jpeg 0.5.15",
+]
+
+[[package]]
+name = "image-webp"
+version = "0.2.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "525e9ff3e1a4be2fbea1fdf0e98686a6d98b4d8f937e1bf7402245af1909e8c3"
+dependencies = [
+ "byteorder-lite",
+ "quick-error",
+]
+
+[[package]]
+name = "imagesize"
+version = "0.13.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "edcd27d72f2f071c64249075f42e205ff93c9a4c5f6c6da53e79ed9f9832c285"
+
+[[package]]
+name = "indexmap"
+version = "2.14.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d466e9454f08e4a911e14806c24e16fba1b4c121d1ea474396f396069cf949d9"
+dependencies = [
+ "equivalent",
+ "hashbrown",
+]
+
+[[package]]
+name = "itertools"
+version = "0.13.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "413ee7dfc52ee1a4949ceeb7dbc8a33f2d6c088194d9f922fb8318faf1f01186"
+dependencies = [
+ "either",
+]
+
+[[package]]
+name = "itoa"
+version = "1.0.18"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8f42a60cbdf9a97f5d2305f08a87dc4e09308d1276d28c869c684d7777685682"
+
+[[package]]
+name = "js-sys"
+version = "0.3.97"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a1840c94c045fbcf8ba2812c95db44499f7c64910a912551aaaa541decebcacf"
+dependencies = [
+ "cfg-if",
+ "futures-util",
+ "once_cell",
+ "wasm-bindgen",
+]
+
+[[package]]
+name = "kurbo"
+version = "0.11.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c62026ae44756f8a599ba21140f350303d4f08dcdcc71b5ad9c9bb8128c13c62"
+dependencies = [
+ "arrayvec",
+ "euclid",
+ "smallvec",
+]
+
+[[package]]
+name = "kurbo"
+version = "0.12.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ce9729cc38c18d86123ab736fd2e7151763ba226ac2490ec092d1dd148825e32"
+dependencies = [
+ "arrayvec",
+ "euclid",
+ "smallvec",
+]
+
+[[package]]
+name = "libc"
+version = "0.2.186"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "68ab91017fe16c622486840e4c83c9a37afeff978bd239b5293d61ece587de66"
+
+[[package]]
+name = "libloading"
+version = "0.8.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d7c4b02199fee7c5d21a5ae7d8cfa79a6ef5bb2fc834d6e9058e89c825efdc55"
+dependencies = [
+ "cfg-if",
+ "windows-link",
+]
+
+[[package]]
+name = "libm"
+version = "0.2.16"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b6d2cec3eae94f9f509c767b45932f1ada8350c4bdb85af2fcab4a3c14807981"
+
+[[package]]
+name = "linux-raw-sys"
+version = "0.12.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "32a66949e030da00e8c7d4434b251670a91556f4144941d37452769c25d58a53"
+
+[[package]]
+name = "log"
+version = "0.4.29"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5e5032e24019045c762d3c0f28f5b6b8bbf38563a65908389bf7978758920897"
+
+[[package]]
+name = "memchr"
+version = "2.8.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f8ca58f447f06ed17d5fc4043ce1b10dd205e060fb3ce5b979b8ed8e59ff3f79"
+
+[[package]]
+name = "memmap2"
+version = "0.9.10"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "714098028fe011992e1c3962653c96b2d578c4b4bce9036e15ff220319b1e0e3"
+dependencies = [
+ "libc",
+]
+
+[[package]]
+name = "minimal-lexical"
+version = "0.2.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "68354c5c6bd36d73ff3feceb05efa59b6acb7626617f4962be322a825e61f79a"
+
+[[package]]
+name = "miniz_oxide"
+version = "0.8.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1fa76a2c86f704bdb222d66965fb3d63269ce38518b83cb0575fca855ebb6316"
+dependencies = [
+ "adler2",
+ "simd-adler32",
+]
+
+[[package]]
+name = "moxcms"
+version = "0.8.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "bb85c154ba489f01b25c0d36ae69a87e4a1c73a72631fc6c0eb6dde34a73e44b"
+dependencies = [
+ "num-traits",
+ "pxfm",
+]
+
+[[package]]
+name = "nom"
+version = "7.1.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d273983c5a657a70a3e8f2a01329822f3b8c8172b73826411a55751e404a0a4a"
+dependencies = [
+ "memchr",
+ "minimal-lexical",
+]
+
+[[package]]
+name = "num-traits"
+version = "0.2.19"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "071dfc062690e90b734c0b2273ce72ad0ffa95f0c74596bc250dcfd960262841"
+dependencies = [
+ "autocfg",
+]
+
+[[package]]
+name = "once_cell"
+version = "1.21.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9f7c3e4beb33f85d45ae3e3a1792185706c8e16d043238c593331cc7cd313b50"
+
+[[package]]
+name = "paste"
+version = "1.0.15"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "57c0d7b74b563b49d38dae00a0c37d4d6de9b432382b2892f0574ddcae73fd0a"
+
+[[package]]
+name = "pcx"
+version = "0.2.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "dffa08d8cdb97709f15a9cf1bc0edd439a96e8c1d05687272b430b7a3cd8a64b"
+dependencies = [
+ "byteorder",
+]
+
+[[package]]
+name = "pdf-writer"
+version = "0.12.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5df03c7d216de06f93f398ef06f1385a60f2c597bb96f8195c8d98e08a26b1d5"
+dependencies = [
+ "bitflags 2.11.1",
+ "itoa",
+ "memchr",
+ "ryu",
+]
+
+[[package]]
+name = "pico-args"
+version = "0.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5be167a7af36ee22fe3115051bc51f6e6c7054c9348e28deb4f49bd6f705a315"
+
+[[package]]
+name = "pin-project-lite"
+version = "0.2.17"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a89322df9ebe1c1578d689c92318e070967d1042b512afbe49518723f4e6d5cd"
+
+[[package]]
+name = "pkg-config"
+version = "0.3.33"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "19f132c84eca552bf34cab8ec81f1c1dcc229b811638f9d283dceabe58c5569e"
+
+[[package]]
+name = "png"
+version = "0.17.16"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "82151a2fc869e011c153adc57cf2789ccb8d9906ce52c0b39a6b5697749d7526"
+dependencies = [
+ "bitflags 1.3.2",
+ "crc32fast",
+ "fdeflate",
+ "flate2",
+ "miniz_oxide",
+]
+
+[[package]]
+name = "png"
+version = "0.18.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "60769b8b31b2a9f263dae2776c37b1b28ae246943cf719eb6946a1db05128a61"
+dependencies = [
+ "bitflags 2.11.1",
+ "crc32fast",
+ "fdeflate",
+ "flate2",
+ "miniz_oxide",
+]
+
+[[package]]
+name = "prettyplease"
+version = "0.2.37"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "479ca8adacdd7ce8f1fb39ce9ecccbfe93a3f1344b3d0d97f20bc0196208f62b"
+dependencies = [
+ "proc-macro2",
+ "syn",
+]
+
+[[package]]
+name = "proc-macro2"
+version = "1.0.106"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8fd00f0bb2e90d81d1044c2b32617f68fcb9fa3bb7640c23e9c748e53fb30934"
+dependencies = [
+ "unicode-ident",
+]
+
+[[package]]
+name = "pxfm"
+version = "0.1.29"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e0c5ccf5294c6ccd63a74f1565028353830a9c2f5eb0c682c355c471726a6e3f"
+
+[[package]]
+name = "quick-error"
+version = "2.0.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a993555f31e5a609f617c12db6250dedcac1b0a85076912c436e6fc9b2c8e6a3"
+
+[[package]]
+name = "quick-xml"
+version = "0.39.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "958f21e8e7ceb5a1aa7fa87fab28e7c75976e0bfe7e23ff069e0a260f894067d"
+dependencies = [
+ "memchr",
+]
+
+[[package]]
+name = "quote"
+version = "1.0.45"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "41f2619966050689382d2b44f664f4bc593e129785a36d6ee376ddf37259b924"
+dependencies = [
+ "proc-macro2",
+]
+
+[[package]]
+name = "read-fonts"
+version = "0.35.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6717cf23b488adf64b9d711329542ba34de147df262370221940dfabc2c91358"
+dependencies = [
+ "bytemuck",
+ "font-types",
+]
+
+[[package]]
+name = "regex"
+version = "1.12.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e10754a14b9137dd7b1e3e5b0493cc9171fdd105e0ab477f51b72e7f3ac0e276"
+dependencies = [
+ "aho-corasick",
+ "memchr",
+ "regex-automata",
+ "regex-syntax",
+]
+
+[[package]]
+name = "regex-automata"
+version = "0.4.14"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6e1dd4122fc1595e8162618945476892eefca7b88c52820e74af6262213cae8f"
+dependencies = [
+ "aho-corasick",
+ "memchr",
+ "regex-syntax",
+]
+
+[[package]]
+name = "regex-syntax"
+version = "0.8.10"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "dc897dd8d9e8bd1ed8cdad82b5966c3e0ecae09fb1907d58efaa013543185d0a"
+
+[[package]]
+name = "resvg"
+version = "0.45.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a8928798c0a55e03c9ca6c4c6846f76377427d2c1e1f7e6de3c06ae57942df43"
+dependencies = [
+ "gif 0.13.3",
+ "image-webp",
+ "log",
+ "pico-args",
+ "rgb",
+ "svgtypes",
+ "tiny-skia",
+ "usvg",
+ "zune-jpeg 0.4.21",
+]
+
+[[package]]
+name = "rgb"
+version = "0.8.53"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "47b34b781b31e5d73e9fbc8689c70551fd1ade9a19e3e28cfec8580a79290cc4"
+dependencies = [
+ "bytemuck",
+]
+
+[[package]]
+name = "rhwp"
+version = "0.7.10"
+source = "git+https://github.com/edwardkim/rhwp.git?rev=62a458aa317e962cd3d0eec6096728c172d57110#62a458aa317e962cd3d0eec6096728c172d57110"
+dependencies = [
+ "base64",
+ "byteorder",
+ "cfb",
+ "codepage",
+ "console_error_panic_hook",
+ "embedded-io",
+ "encoding_rs",
+ "flate2",
+ "image",
+ "js-sys",
+ "paste",
+ "pcx",
+ "pdf-writer",
+ "quick-xml",
+ "serde",
+ "skia-safe",
+ "snafu",
+ "strum",
+ "subsetter",
+ "svg2pdf",
+ "ttf-parser",
+ "unicode-segmentation",
+ "unicode-width",
+ "usvg",
+ "wasm-bindgen",
+ "web-sys",
+ "zip",
+]
+
+[[package]]
+name = "rhwp-field-bridge"
+version = "0.1.0"
+dependencies = [
+ "rhwp",
+ "serde_json",
+ "sha2",
+]
+
+[[package]]
+name = "roxmltree"
+version = "0.20.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6c20b6793b5c2fa6553b250154b78d6d0db37e72700ae35fad9387a46f487c97"
+
+[[package]]
+name = "rustc-hash"
+version = "2.1.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "94300abf3f1ae2e2b8ffb7b58043de3d399c73fa6f4b73826402a5c457614dbe"
+
+[[package]]
+name = "rustix"
+version = "1.1.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b6fe4565b9518b83ef4f91bb47ce29620ca828bd32cb7e408f0062e9930ba190"
+dependencies = [
+ "bitflags 2.11.1",
+ "errno",
+ "libc",
+ "linux-raw-sys",
+ "windows-sys",
+]
+
+[[package]]
+name = "rustversion"
+version = "1.0.22"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b39cdef0fa800fc44525c84ccb54a029961a8215f9619753635a9c0d2538d46d"
+
+[[package]]
+name = "rustybuzz"
+version = "0.20.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "fd3c7c96f8a08ee34eff8857b11b49b07d71d1c3f4e88f8a88d4c9e9f90b1702"
+dependencies = [
+ "bitflags 2.11.1",
+ "bytemuck",
+ "core_maths",
+ "log",
+ "smallvec",
+ "ttf-parser",
+ "unicode-bidi-mirroring",
+ "unicode-ccc",
+ "unicode-properties",
+ "unicode-script",
+]
+
+[[package]]
+name = "ryu"
+version = "1.0.23"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9774ba4a74de5f7b1c1451ed6cd5285a32eddb5cccb8cc655a4e50009e06477f"
+
+[[package]]
+name = "serde"
+version = "1.0.228"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9a8e94ea7f378bd32cbbd37198a4a91436180c5bb472411e48b5ec2e2124ae9e"
+dependencies = [
+ "serde_core",
+ "serde_derive",
+]
+
+[[package]]
+name = "serde_core"
+version = "1.0.228"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "41d385c7d4ca58e59fc732af25c3983b67ac852c1a25000afe1175de458b67ad"
+dependencies = [
+ "serde_derive",
+]
+
+[[package]]
+name = "serde_derive"
+version = "1.0.228"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d540f220d3187173da220f885ab66608367b6574e925011a9353e4badda91d79"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "serde_json"
+version = "1.0.149"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "83fc039473c5595ace860d8c4fafa220ff474b3fc6bfdb4293327f1a37e94d86"
+dependencies = [
+ "itoa",
+ "memchr",
+ "serde",
+ "serde_core",
+ "zmij",
+]
+
+[[package]]
+name = "serde_spanned"
+version = "1.1.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6662b5879511e06e8999a8a235d848113e942c9124f211511b16466ee2995f26"
+dependencies = [
+ "serde_core",
+]
+
+[[package]]
+name = "sha2"
+version = "0.10.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a7507d819769d01a365ab707794a4084392c824f54a7a6a7862f8c3d0892b283"
+dependencies = [
+ "cfg-if",
+ "cpufeatures",
+ "digest",
+]
+
+[[package]]
+name = "shlex"
+version = "1.3.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64"
+
+[[package]]
+name = "simd-adler32"
+version = "0.3.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "703d5c7ef118737c72f1af64ad2f6f8c5e1921f818cdcb97b8fe6fc69bf66214"
+
+[[package]]
+name = "simplecss"
+version = "0.2.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7a9c6883ca9c3c7c90e888de77b7a5c849c779d25d74a1269b0218b14e8b136c"
+dependencies = [
+ "log",
+]
+
+[[package]]
+name = "siphasher"
+version = "1.0.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b2aa850e253778c88a04c3d7323b043aeda9d3e30d5971937c1855769763678e"
+
+[[package]]
+name = "skia-bindings"
+version = "0.93.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "2359f7e30c9da3f322f8ca3d4ec0abbc12a40035ce758309db0cdab07b5d4476"
+dependencies = [
+ "bindgen",
+ "cc",
+ "flate2",
+ "heck",
+ "pkg-config",
+ "regex",
+ "serde_json",
+ "tar",
+ "toml",
+]
+
+[[package]]
+name = "skia-safe"
+version = "0.93.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7f9e837ea9d531c9efee8f980bfcdb7226b21db0285b0c3171d8be745829f940"
+dependencies = [
+ "bitflags 2.11.1",
+ "skia-bindings",
+]
+
+[[package]]
+name = "skrifa"
+version = "0.37.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8c31071dedf532758ecf3fed987cdb4bd9509f900e026ab684b4ecb81ea49841"
+dependencies = [
+ "bytemuck",
+ "read-fonts",
+]
+
+[[package]]
+name = "slab"
+version = "0.4.12"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0c790de23124f9ab44544d7ac05d60440adc586479ce501c1d6d7da3cd8c9cf5"
+
+[[package]]
+name = "slotmap"
+version = "1.1.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "bdd58c3c93c3d278ca835519292445cb4b0d4dc59ccfdf7ceadaab3f8aeb4038"
+dependencies = [
+ "version_check",
+]
+
+[[package]]
+name = "smallvec"
+version = "1.15.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "67b1b7a3b5fe4f1376887184045fcf45c69e92af734b7aaddc05fb777b6fbd03"
+
+[[package]]
+name = "snafu"
+version = "0.9.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d1d4bced6a69f90b2056c03dcff2c4737f98d6fb9e0853493996e1d253ca29c6"
+dependencies = [
+ "snafu-derive",
+]
+
+[[package]]
+name = "snafu-derive"
+version = "0.9.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "54254b8531cafa275c5e096f62d48c81435d1015405a91198ddb11e967301d40"
+dependencies = [
+ "heck",
+ "proc-macro2",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "strict-num"
+version = "0.1.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6637bab7722d379c8b41ba849228d680cc12d0a45ba1fa2b48f2a30577a06731"
+dependencies = [
+ "float-cmp",
+]
+
+[[package]]
+name = "strum"
+version = "0.28.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9628de9b8791db39ceda2b119bbe13134770b56c138ec1d3af810d045c04f9bd"
+dependencies = [
+ "strum_macros",
+]
+
+[[package]]
+name = "strum_macros"
+version = "0.28.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ab85eea0270ee17587ed4156089e10b9e6880ee688791d45a905f5b1ca36f664"
+dependencies = [
+ "heck",
+ "proc-macro2",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "subsetter"
+version = "0.2.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "cb6895a12ac5599bb6057362f00e8a3cf1daab4df33f553a55690a44e4fed8d0"
+dependencies = [
+ "kurbo 0.12.0",
+ "rustc-hash",
+ "skrifa",
+ "write-fonts",
+]
+
+[[package]]
+name = "svg2pdf"
+version = "0.13.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e50dc062439cc1a396181059c80932a6e6bd731b130e674c597c0c8874b6df22"
+dependencies = [
+ "fontdb",
+ "image",
+ "log",
+ "miniz_oxide",
+ "once_cell",
+ "pdf-writer",
+ "resvg",
+ "siphasher",
+ "subsetter",
+ "tiny-skia",
+ "ttf-parser",
+ "usvg",
+]
+
+[[package]]
+name = "svgtypes"
+version = "0.15.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "68c7541fff44b35860c1a7a47a7cadf3e4a304c457b58f9870d9706ece028afc"
+dependencies = [
+ "kurbo 0.11.3",
+ "siphasher",
+]
+
+[[package]]
+name = "syn"
+version = "2.0.117"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e665b8803e7b1d2a727f4023456bbbbe74da67099c585258af0ad9c5013b9b99"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "unicode-ident",
+]
+
+[[package]]
+name = "tar"
+version = "0.4.45"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "22692a6476a21fa75fdfc11d452fda482af402c008cdbaf3476414e122040973"
+dependencies = [
+ "filetime",
+ "libc",
+ "xattr",
+]
+
+[[package]]
+name = "tiny-skia"
+version = "0.11.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "83d13394d44dae3207b52a326c0c85a8bf87f1541f23b0d143811088497b09ab"
+dependencies = [
+ "arrayref",
+ "arrayvec",
+ "bytemuck",
+ "cfg-if",
+ "log",
+ "png 0.17.16",
+ "tiny-skia-path",
+]
+
+[[package]]
+name = "tiny-skia-path"
+version = "0.11.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9c9e7fc0c2e86a30b117d0462aa261b72b7a99b7ebd7deb3a14ceda95c5bdc93"
+dependencies = [
+ "arrayref",
+ "bytemuck",
+ "strict-num",
+]
+
+[[package]]
+name = "tinyvec"
+version = "1.11.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3e61e67053d25a4e82c844e8424039d9745781b3fc4f32b8d55ed50f5f667ef3"
+dependencies = [
+ "tinyvec_macros",
+]
+
+[[package]]
+name = "tinyvec_macros"
+version = "0.1.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1f3ccbac311fea05f86f61904b462b55fb3df8837a366dfc601a0161d0532f20"
+
+[[package]]
+name = "toml"
+version = "1.1.2+spec-1.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "81f3d15e84cbcd896376e6730314d59fb5a87f31e4b038454184435cd57defee"
+dependencies = [
+ "indexmap",
+ "serde_core",
+ "serde_spanned",
+ "toml_datetime",
+ "toml_parser",
+ "toml_writer",
+ "winnow",
+]
+
+[[package]]
+name = "toml_datetime"
+version = "1.1.1+spec-1.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3165f65f62e28e0115a00b2ebdd37eb6f3b641855f9d636d3cd4103767159ad7"
+dependencies = [
+ "serde_core",
+]
+
+[[package]]
+name = "toml_parser"
+version = "1.1.2+spec-1.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a2abe9b86193656635d2411dc43050282ca48aa31c2451210f4202550afb7526"
+dependencies = [
+ "winnow",
+]
+
+[[package]]
+name = "toml_writer"
+version = "1.1.1+spec-1.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "756daf9b1013ebe47a8776667b466417e2d4c5679d441c26230efd9ef78692db"
+
+[[package]]
+name = "ttf-parser"
+version = "0.25.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d2df906b07856748fa3f6e0ad0cbaa047052d4a7dd609e231c4f72cee8c36f31"
+dependencies = [
+ "core_maths",
+]
+
+[[package]]
+name = "typed-path"
+version = "0.12.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8e28f89b80c87b8fb0cf04ab448d5dd0dd0ade2f8891bae878de66a75a28600e"
+
+[[package]]
+name = "typenum"
+version = "1.20.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "40ce102ab67701b8526c123c1bab5cbe42d7040ccfd0f64af1a385808d2f43de"
+
+[[package]]
+name = "unicode-bidi"
+version = "0.3.18"
+source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "5c1cb5db39152898a79168971543b1cb5020dff7fe43c8dc468b0885f5e29df5" + +[[package]] +name = "unicode-bidi-mirroring" +version = "0.4.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5dfa6e8c60bb66d49db113e0125ee8711b7647b5579dc7f5f19c42357ed039fe" + +[[package]] +name = "unicode-ccc" +version = "0.4.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ce61d488bcdc9bc8b5d1772c404828b17fc481c0a582b5581e95fb233aef503e" + +[[package]] +name = "unicode-ident" +version = "1.0.24" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e6e4313cd5fcd3dad5cafa179702e2b244f760991f45397d14d4ebf38247da75" + +[[package]] +name = "unicode-properties" +version = "0.1.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7df058c713841ad818f1dc5d3fd88063241cc61f49f5fbea4b951e8cf5a8d71d" + +[[package]] +name = "unicode-script" +version = "0.5.8" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "383ad40bb927465ec0ce7720e033cb4ca06912855fc35db31b5755d0de75b1ee" + +[[package]] +name = "unicode-segmentation" +version = "1.13.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9629274872b2bfaf8d66f5f15725007f635594914870f65218920345aa11aa8c" + +[[package]] +name = "unicode-vo" +version = "0.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b1d386ff53b415b7fe27b50bb44679e2cc4660272694b7b6f3326d8480823a94" + +[[package]] +name = "unicode-width" +version = "0.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b4ac048d71ede7ee76d585517add45da530660ef4390e49b098733c6e897f254" + +[[package]] +name = "usvg" +version = "0.45.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "80be9b06fbae3b8b303400ab20778c80bbaf338f563afe567cf3c9eea17b47ef" +dependencies = [ + 
"base64", + "data-url", + "flate2", + "fontdb", + "imagesize", + "kurbo 0.11.3", + "log", + "pico-args", + "roxmltree", + "rustybuzz", + "simplecss", + "siphasher", + "strict-num", + "svgtypes", + "tiny-skia-path", + "unicode-bidi", + "unicode-script", + "unicode-vo", + "xmlwriter", +] + +[[package]] +name = "uuid" +version = "1.23.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ddd74a9687298c6858e9b88ec8935ec45d22e8fd5e6394fa1bd4e99a87789c76" +dependencies = [ + "js-sys", + "wasm-bindgen", +] + +[[package]] +name = "version_check" +version = "0.9.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a" + +[[package]] +name = "wasm-bindgen" +version = "0.2.120" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "df52b6d9b87e0c74c9edfa1eb2d9bf85e5d63515474513aa50fa181b3c4f5db1" +dependencies = [ + "cfg-if", + "once_cell", + "rustversion", + "wasm-bindgen-macro", + "wasm-bindgen-shared", +] + +[[package]] +name = "wasm-bindgen-macro" +version = "0.2.120" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "78b1041f495fb322e64aca85f5756b2172e35cd459376e67f2a6c9dffcedb103" +dependencies = [ + "quote", + "wasm-bindgen-macro-support", +] + +[[package]] +name = "wasm-bindgen-macro-support" +version = "0.2.120" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9dcd0ff20416988a18ac686d4d4d0f6aae9ebf08a389ff5d29012b05af2a1b41" +dependencies = [ + "bumpalo", + "proc-macro2", + "quote", + "syn", + "wasm-bindgen-shared", +] + +[[package]] +name = "wasm-bindgen-shared" +version = "0.2.120" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "49757b3c82ebf16c57d69365a142940b384176c24df52a087fb748e2085359ea" +dependencies = [ + "unicode-ident", +] + +[[package]] +name = "web-sys" +version = "0.3.97" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "2eadbac71025cd7b0834f20d1fe8472e8495821b4e9801eb0a60bd1f19827602" +dependencies = [ + "js-sys", + "wasm-bindgen", +] + +[[package]] +name = "web-time" +version = "1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5a6580f308b1fad9207618087a65c04e7a10bc77e02c8e84e9b00dd4b12fa0bb" +dependencies = [ + "js-sys", + "wasm-bindgen", +] + +[[package]] +name = "weezl" +version = "0.1.12" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a28ac98ddc8b9274cb41bb4d9d4d5c425b6020c50c46f25559911905610b4a88" + +[[package]] +name = "windows-link" +version = "0.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f0805222e57f7521d6a62e36fa9163bc891acd422f971defe97d64e70d0a4fe5" + +[[package]] +name = "windows-sys" +version = "0.61.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ae137229bcbd6cdf0f7b80a31df61766145077ddf49416a728b02cb3921ff3fc" +dependencies = [ + "windows-link", +] + +[[package]] +name = "winnow" +version = "1.0.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0592e1c9d151f854e6fd382574c3a0855250e1d9b2f99d9281c6e6391af352f1" + +[[package]] +name = "write-fonts" +version = "0.43.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "886614b5ce857341226aa091f3c285e450683894acaaa7887f366c361efef79d" +dependencies = [ + "font-types", + "indexmap", + "kurbo 0.12.0", + "log", + "read-fonts", +] + +[[package]] +name = "xattr" +version = "1.6.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "32e45ad4206f6d2479085147f02bc2ef834ac85886624a23575ae137c8aa8156" +dependencies = [ + "libc", + "rustix", +] + +[[package]] +name = "xmlwriter" +version = "0.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"ec7a2a501ed189703dba8b08142f057e887dfc4b2cc4db2d343ac6376ba3e0b9" + +[[package]] +name = "zip" +version = "8.6.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2d04a6b5381502aa6087c94c669499eb1602eb9c5e8198e534de571f7154809b" +dependencies = [ + "crc32fast", + "flate2", + "indexmap", + "memchr", + "typed-path", + "zopfli", +] + +[[package]] +name = "zlib-rs" +version = "0.6.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3be3d40e40a133f9c916ee3f9f4fa2d9d63435b5fbe1bfc6d9dae0aa0ada1513" + +[[package]] +name = "zmij" +version = "1.0.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b8848ee67ecc8aedbaf3e4122217aff892639231befc6a1b58d29fff4c2cabaa" + +[[package]] +name = "zopfli" +version = "0.8.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f05cd8797d63865425ff89b5c4a48804f35ba0ce8d125800027ad6017d2b5249" +dependencies = [ + "bumpalo", + "crc32fast", + "log", + "simd-adler32", +] + +[[package]] +name = "zune-core" +version = "0.4.12" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3f423a2c17029964870cfaabb1f13dfab7d092a62a29a89264f4d36990ca414a" + +[[package]] +name = "zune-core" +version = "0.5.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cb8a0807f7c01457d0379ba880ba6322660448ddebc890ce29bb64da71fb40f9" + +[[package]] +name = "zune-jpeg" +version = "0.4.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "29ce2c8a9384ad323cf564b67da86e21d3cfdff87908bc1223ed5c99bc792713" +dependencies = [ + "zune-core 0.4.12", +] + +[[package]] +name = "zune-jpeg" +version = "0.5.15" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "27bc9d5b815bc103f142aa054f561d9187d191692ec7c2d1e2b4737f8dbd7296" +dependencies = [ + "zune-core 0.5.1", +] diff --git a/src/rhwp-field-bridge/Cargo.toml b/src/rhwp-field-bridge/Cargo.toml new file 
mode 100644 index 000000000..251b94ad4 --- /dev/null +++ b/src/rhwp-field-bridge/Cargo.toml @@ -0,0 +1,13 @@ +[package] +name = "rhwp-field-bridge" +version = "0.1.0" +edition = "2021" +publish = false + +[features] +native-skia = ["rhwp/native-skia"] + +[dependencies] +rhwp = { git = "https://github.com/edwardkim/rhwp.git", rev = "62a458aa317e962cd3d0eec6096728c172d57110" } +serde_json = "1" +sha2 = "0.10" diff --git a/src/rhwp-field-bridge/src/main.rs b/src/rhwp-field-bridge/src/main.rs new file mode 100644 index 000000000..a8235bf66 --- /dev/null +++ b/src/rhwp-field-bridge/src/main.rs @@ -0,0 +1,101 @@ +mod ops; +mod ops_native; +mod ops_native_header_footer; +mod ops_native_objects; +mod ops_native_style; +mod ops_native_support; +mod ops_native_table; +mod ops_native_text; +mod ops_text; +mod ops_view; +mod options; + +use std::{env, fs, process}; + +use rhwp::wasm_api::HwpDocument; +use serde_json::json; + +use ops::{ + convert_to_editable, create_blank, get_cell_text, get_field, list_fields, replace_text, + save_as_hwp, scan_cells, set_cell_text, set_field, +}; +use ops_native::native_op; +use ops_text::insert_text; +use ops_view::{ + diagnostics, document_info, dump_controls, dump_pages, export_markdown, export_pdf, read_text, + render_png, render_svg, thumbnail, +}; +use options::parse_options; + +fn main() { + if let Err(err) = run() { + eprintln!("{err}"); + println!( + "{}", + json!({ + "success": false, + "error": { + "message": err, + "code": "rhwp_field_bridge_error" + } + }) + ); + process::exit(1); + } +} + +fn run() -> Result<(), String> { + let args: Vec<String> = env::args().skip(1).collect(); + if args.is_empty() || args[0] == "--help" || args[0] == "-h" { + print_help(); + return Ok(()); + } + + let command = args[0].as_str(); + let options = parse_options(&args[1..])?; + + if command == "create-blank" { + return create_blank(&options); + } + + let input = options::required(&options, "--input")?; + let format = options::required(&options,
"--format")?; + let bytes = fs::read(input).map_err(|e| format!("input read failed: {e}"))?; + let mut doc = HwpDocument::from_bytes(&bytes).map_err(|e| format!("rhwp parse failed: {e}"))?; + + match command { + "read-text" => read_text(&doc, format, &options), + "render-svg" => render_svg(&doc, format, &options), + "render-png" => render_png(&doc, format, &options), + "export-pdf" => export_pdf(&doc, format, &options), + "export-markdown" => export_markdown(&doc, format, &options), + "document-info" => document_info(&doc, format), + "diagnostics" => diagnostics(&doc, format), + "dump-controls" => dump_controls(&doc, format), + "dump-pages" => dump_pages(&doc, format, &options), + "thumbnail" => thumbnail(&bytes, format, &options), + "list-fields" => list_fields(&doc, format), + "get-field" => get_field(&doc, format, &options), + "set-field" => set_field(&mut doc, format, &options), + "replace-text" => replace_text(&mut doc, format, &options), + "insert-text" => insert_text(&mut doc, format, &options), + "get-cell-text" => get_cell_text(&doc, format, &options), + "scan-cells" => scan_cells(&doc, format, &options), + "set-cell-text" => set_cell_text(&mut doc, format, &options), + "convert-to-editable" => convert_to_editable(&mut doc, format, &options), + "native-op" => native_op(&mut doc, format, &options), + "save-as-hwp" => save_as_hwp(&mut doc, format, &options), + _ => Err(format!("unsupported command: {command}")), + } +} + +fn print_help() { + let render_png = if cfg!(all(not(target_arch = "wasm32"), feature = "native-skia")) { + "|render-png" + } else { + "" + }; + println!( + "rhwp-field-bridge create-blank|read-text|render-svg{render_png}|export-pdf|export-markdown|document-info|diagnostics|dump-controls|dump-pages|thumbnail|list-fields|get-field|set-field|replace-text|insert-text|get-cell-text|scan-cells|set-cell-text|convert-to-editable|native-op|save-as-hwp --format hwp|hwpx --input <path> [--op <op>] [--output <path>] [--out-dir <dir>] [--output-format hwp|hwpx] [--name <name>]
[--id <id>] [--query <text>] [--value <text>] [--mode one|all] [--section N --paragraph N --para N --parent-para N --control N --cell N --cell-para N --offset N --count N] --json" + ); +} diff --git a/src/rhwp-field-bridge/src/ops.rs b/src/rhwp-field-bridge/src/ops.rs new file mode 100644 index 000000000..f47d9d977 --- /dev/null +++ b/src/rhwp-field-bridge/src/ops.rs @@ -0,0 +1,421 @@ +use std::{collections::BTreeMap, fs}; + +use rhwp::wasm_api::HwpDocument; +use serde_json::{json, Value}; + +use crate::options::{optional_usize, required, required_usize}; + +pub(crate) fn create_blank(options: &BTreeMap<String, String>) -> Result<(), String> { + let output = required(options, "--output")?; + let mut doc = HwpDocument::create_empty(); + let info_json = doc + .create_blank_document_native() + .map_err(|e| format!("blank HWP creation failed: {e}"))?; + write_document(&mut doc, "hwp", output)?; + let document_info: Value = + serde_json::from_str(&info_json).unwrap_or_else(|_| json!({ "raw": info_json })); + println!( + "{}", + json!({ + "created": true, + "operation": "create-blank", + "output": output, + "format": "hwp", + "documentInfo": document_info, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "warnings": ["experimental blank HWP creation; verify with provider readback before production use"] + }) + ); + Ok(()) +} + +pub(crate) fn list_fields(doc: &HwpDocument, format: &str) -> Result<(), String> { + let fields: Value = serde_json::from_str(&doc.get_field_list_json()) + .map_err(|e| format!("field list JSON parse failed: {e}"))?; + println!( + "{}", + json!({ + "fields": fields, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": [] + }) + ); + Ok(()) +} + +pub(crate) fn get_field( + doc: &HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let value_json = if let Some(name) = options.get("--name") { + doc.get_field_value_by_name(name) + .map_err(|e| format!("field name lookup failed: {e}"))?
+ } else if let Some(id) = options.get("--id") { + let id = id + .parse() + .map_err(|e| format!("invalid --id value: {e}"))?; + doc.get_field_value_by_id(id) + .map_err(|e| format!("field id lookup failed: {e}"))? + } else { + return Err("missing --name or --id".to_string()); + }; + + let value: Value = serde_json::from_str(&value_json) + .map_err(|e| format!("field value JSON parse failed: {e}"))?; + println!( + "{}", + json!({ + "field": value, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": [] + }) + ); + Ok(()) +} + +pub(crate) fn set_field( + doc: &mut HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let output = required(options, "--output")?; + let value = required(options, "--value")?; + let mutation_json = if let Some(name) = options.get("--name") { + doc.set_field_value_by_name(name, value) + .map_err(|e| format!("field name mutation failed: {e}"))? + } else if let Some(id) = options.get("--id") { + let id = id + .parse() + .map_err(|e| format!("invalid --id value: {e}"))?; + doc.set_field_value_by_id(id, value) + .map_err(|e| format!("field id mutation failed: {e}"))?
+ } else { + return Err("missing --name or --id".to_string()); + }; + + write_document(doc, format, output)?; + + let field: Value = serde_json::from_str(&mutation_json) + .map_err(|e| format!("field mutation JSON parse failed: {e}"))?; + println!( + "{}", + json!({ + "field": field, + "output": output, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": ["experimental field mutation; verify round-trip before production use"] + }) + ); + Ok(()) +} + +pub(crate) fn replace_text( + doc: &mut HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let output = required(options, "--output")?; + let query = required(options, "--query")?; + let value = required(options, "--value")?; + let mode = options.get("--mode").map(String::as_str).unwrap_or("one"); + let case_sensitive = options + .get("--case-sensitive") + .map(|v| v.eq_ignore_ascii_case("true")) + .unwrap_or(false); + + let replace_json = match mode { + "one" => doc + .replace_one_native(query, value, case_sensitive) + .map_err(|e| format!("replace one failed: {e}"))?, + "all" => doc + .replace_all_native(query, value, case_sensitive) + .map_err(|e| format!("replace all failed: {e}"))?, + other => return Err(format!("unsupported --mode: {other}")), + }; + write_document(doc, format, output)?; + + let replacement: Value = serde_json::from_str(&replace_json) + .map_err(|e| format!("replace text JSON parse failed: {e}"))?; + println!( + "{}", + json!({ + "replacement": replacement, + "output": output, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": ["experimental text replacement; verify round-trip before production use"] + }) + ); + Ok(()) +} + +pub(crate) fn get_cell_text( + doc: &HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let section = required_usize(options, "--section")?; + let parent_para = required_usize(options, "--parent-para")?; + let control =
required_usize(options, "--control")?; + let cell = required_usize(options, "--cell")?; + let cell_para = required_usize(options, "--cell-para")?; + let offset = optional_usize(options, "--offset")?.unwrap_or(0); + let count = optional_usize(options, "--count")?.unwrap_or(usize::MAX / 2); + + let text = doc + .get_text_in_cell_native( + section, + parent_para, + control, + cell, + cell_para, + offset, + count, + ) + .map_err(|e| format!("cell text lookup failed: {e}"))?; + println!( + "{}", + json!({ + "cell": { + "section": section, + "parentPara": parent_para, + "control": control, + "cell": cell, + "cellPara": cell_para, + "offset": offset, + "count": count, + "text": text + }, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": [] + }) + ); + Ok(()) +} + +pub(crate) fn scan_cells( + doc: &HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let section = optional_usize(options, "--section")?.unwrap_or(0); + let max_parent_para = optional_usize(options, "--max-parent-para")?.unwrap_or(50); + let max_control = optional_usize(options, "--max-control")?.unwrap_or(4); + let max_cell = optional_usize(options, "--max-cell")?.unwrap_or(64); + let max_cell_para = optional_usize(options, "--max-cell-para")?.unwrap_or(4); + let count = optional_usize(options, "--count")?.unwrap_or(120); + let include_empty = options + .get("--include-empty") + .map(|v| v.eq_ignore_ascii_case("true")) + .unwrap_or(false); + let mut cells = Vec::new(); + + for parent_para in 0..=max_parent_para { + for control in 0..=max_control { + for cell in 0..=max_cell { + for cell_para in 0..=max_cell_para { + let result = doc.get_text_in_cell_native( + section, + parent_para, + control, + cell, + cell_para, + 0, + count, + ); + if let Ok(text) = result { + if include_empty || !text.is_empty() { + cells.push(json!({ + "section": section, + "parentPara": parent_para, + "control": control, + "cell": cell, + "cellPara":
cell_para, + "text": text + })); + } + } + } + } + } + } + + println!( + "{}", + json!({ + "cells": cells, + "count": cells.len(), + "limits": { + "section": section, + "maxParentPara": max_parent_para, + "maxControl": max_control, + "maxCell": max_cell, + "maxCellPara": max_cell_para + }, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": ["bounded scan; absence from results is not proof that no table cell exists"] + }) + ); + Ok(()) +} + +pub(crate) fn set_cell_text( + doc: &mut HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let output = required(options, "--output")?; + let output_format = options + .get("--output-format") + .map(String::as_str) + .unwrap_or(format); + if format == "hwpx" && output_format != "hwp" { + return Err("HWPX table cell mutation must use --output-format hwp".to_string()); + } + + let value = required(options, "--value")?; + let section = required_usize(options, "--section")?; + let parent_para = required_usize(options, "--parent-para")?; + let control = required_usize(options, "--control")?; + let cell = required_usize(options, "--cell")?; + let cell_para = required_usize(options, "--cell-para")?; + let offset = optional_usize(options, "--offset")?.unwrap_or(0); + let existing = doc + .get_text_in_cell_native( + section, + parent_para, + control, + cell, + cell_para, + offset, + usize::MAX / 2, + ) + .map_err(|e| format!("cell text lookup failed before mutation: {e}"))?; + let delete_count = + optional_usize(options, "--count")?.unwrap_or_else(|| existing.chars().count()); + + let delete_json = doc + .delete_text_in_cell_native( + section, + parent_para, + control, + cell, + cell_para, + offset, + delete_count, + ) + .map_err(|e| format!("cell text delete failed: {e}"))?; + let insert_json = doc + .insert_text_in_cell_native( + section, + parent_para, + control, + cell, + cell_para, + offset, + value, + ) + .map_err(|e| format!("cell text insert failed:
{e}"))?; + write_document(doc, output_format, output)?; + + let delete_result: Value = serde_json::from_str(&delete_json) + .map_err(|e| format!("delete cell text JSON parse failed: {e}"))?; + let insert_result: Value = serde_json::from_str(&insert_json) + .map_err(|e| format!("insert cell text JSON parse failed: {e}"))?; + println!( + "{}", + json!({ + "cell": { + "section": section, + "parentPara": parent_para, + "control": control, + "cell": cell, + "cellPara": cell_para, + "offset": offset, + "deleteCount": delete_count, + "oldText": existing, + "newText": value + }, + "delete": delete_result, + "insert": insert_result, + "output": output, + "outputFormat": output_format, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": ["experimental table cell mutation; HWPX inputs are saved as HWP for verification"] + }) + ); + Ok(()) +} + +pub(crate) fn convert_to_editable( + doc: &mut HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let output = required(options, "--output")?; + let conversion_json = doc + .convert_to_editable_native() + .map_err(|e| format!("convert-to-editable failed: {e}"))?; + write_document(doc, "hwp", output)?; + + let conversion: Value = serde_json::from_str(&conversion_json) + .map_err(|e| format!("convert-to-editable JSON parse failed: {e}"))?; + println!( + "{}", + json!({ + "converted": conversion, + "output": output, + "outputFormat": "hwp", + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": ["experimental editable-HWP conversion; verify with provider readback and Hancom before production use"] + }) + ); + Ok(()) +} + +pub(crate) fn save_as_hwp( + doc: &mut HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let output = required(options, "--output")?; + write_document(doc, "hwp", output)?; + println!( + "{}", + json!({ + "saved": true, + "operation": "save-as-hwp", + "output":
output, + "outputFormat": "hwp", + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": ["experimental HWP export; verify round-trip before production use"] + }) + ); + Ok(()) +} + +pub(crate) fn write_document( + doc: &mut HwpDocument, + format: &str, + output: &str, +) -> Result<(), String> { + let bytes = match format { + "hwp" => doc + .export_hwp_with_adapter() + .map_err(|e| format!("HWP export failed: {e}"))?, + "hwpx" => doc + .export_hwpx_native() + .map_err(|e| format!("HWPX export failed: {e}"))?, + other => return Err(format!("unsupported --format: {other}")), + }; + fs::write(output, bytes).map_err(|e| format!("output write failed: {e}")) +} diff --git a/src/rhwp-field-bridge/src/ops_native.rs b/src/rhwp-field-bridge/src/ops_native.rs new file mode 100644 index 000000000..d39b0c2db --- /dev/null +++ b/src/rhwp-field-bridge/src/ops_native.rs @@ -0,0 +1,68 @@ +use std::collections::BTreeMap; + +use rhwp::wasm_api::HwpDocument; +use serde_json::{json, Value}; + +use crate::ops::write_document; +use crate::ops_native_header_footer::try_run_native_header_footer_op; +use crate::ops_native_objects::try_run_native_object_op; +use crate::ops_native_style::try_run_native_style_op; +use crate::ops_native_table::try_run_native_table_op; +use crate::ops_native_text::try_run_native_text_op; +use crate::options::required; + +type NativeOpRunner = + fn(&mut HwpDocument, &str, &BTreeMap<String, String>) -> Result<Option<Value>, String>; + +const NATIVE_OP_RUNNERS: [NativeOpRunner; 5] = [ + try_run_native_text_op, + try_run_native_table_op, + try_run_native_style_op, + try_run_native_header_footer_op, + try_run_native_object_op, +]; + +pub(crate) fn native_op( + doc: &mut HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let op = required(options, "--op")?; + let output = options.get("--output").map(String::as_str); + let result = run_native_op(doc, op, options)?; + + if let Some(output_path) = output { + let output_format
= options + .get("--output-format") + .map(String::as_str) + .unwrap_or(format); + write_document(doc, output_format, output_path)?; + } + + println!( + "{}", + json!({ + "operation": op, + "result": result, + "output": output, + "outputFormat": output.map(|_| options.get("--output-format").map(String::as_str).unwrap_or(format)), + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": ["experimental native rhwp operation; use output mode and verify with readback/Hancom before production use"] + }) + ); + Ok(()) +} + +fn run_native_op( + doc: &mut HwpDocument, + op: &str, + options: &BTreeMap<String, String>, +) -> Result<Value, String> { + for runner in NATIVE_OP_RUNNERS { + if let Some(value) = runner(doc, op, options)? { + return Ok(value); + } + } + Err(format!("unsupported native op: {op}")) +} diff --git a/src/rhwp-field-bridge/src/ops_native_header_footer.rs b/src/rhwp-field-bridge/src/ops_native_header_footer.rs new file mode 100644 index 000000000..19d07fabd --- /dev/null +++ b/src/rhwp-field-bridge/src/ops_native_header_footer.rs @@ -0,0 +1,112 @@ +use std::collections::BTreeMap; + +use rhwp::wasm_api::HwpDocument; +use serde_json::Value; + +use crate::ops_native_support::*; +use crate::options::required; + +pub(crate) fn try_run_native_header_footer_op( + doc: &mut HwpDocument, + op: &str, + options: &BTreeMap<String, String>, +) -> Result<Option<Value>, String> { + let value = match op { + "get-header-footer" => json_call(doc.get_header_footer_native( + section(options)?, + is_header(options), + apply_to(options)?, + )), + "create-header-footer" => json_call(doc.create_header_footer_native( + section(options)?, + is_header(options), + apply_to(options)?, + )), + "delete-header-footer" => json_call(doc.delete_header_footer_native( + section(options)?, + is_header(options), + apply_to(options)?, + )), + "get-header-footer-list" => json_call(doc.get_header_footer_list_native( + section(options)?, + is_header(options), + apply_to(options)?, + )),
"get-header-footer-para-info" => json_call(doc.get_header_footer_para_info_native( + section(options)?, + is_header(options), + apply_to(options)?, + req_usize(options, "--hf-para")?, + )), + "navigate-header-footer-by-page" => json_call(doc.navigate_header_footer_by_page_native( + req_u32(options, "--page")?, + is_header(options), + req_i32(options, "--direction")?, + )), + "toggle-hide-header-footer" => json_call( + doc.toggle_hide_header_footer_native(req_u32(options, "--page")?, is_header(options)), + ), + "get-para-properties-in-hf" => json_call(doc.get_para_properties_in_hf_native( + section(options)?, + is_header(options), + apply_to(options)?, + req_usize(options, "--hf-para")?, + )), + "apply-para-format-in-hf" => json_call(doc.apply_para_format_in_hf_native( + section(options)?, + is_header(options), + apply_to(options)?, + req_usize(options, "--hf-para")?, + props_json(options), + )), + "insert-field-in-hf" => json_call(doc.insert_field_in_hf_native( + section(options)?, + is_header(options), + apply_to(options)?, + req_usize(options, "--hf-para")?, + offset(options)?, + req_u8(options, "--field-type")?, + )), + "apply-hf-template" => json_call(doc.apply_hf_template_native( + section(options)?, + is_header(options), + apply_to(options)?, + req_u8(options, "--template-id")?, + )), + "insert-text-in-header-footer" => json_call(doc.insert_text_in_header_footer_native( + section(options)?, + is_header(options), + apply_to(options)?, + req_usize(options, "--hf-para")?, + offset(options)?, + required(options, "--value")?, + )), + "delete-text-in-header-footer" => json_call(doc.delete_text_in_header_footer_native( + section(options)?, + is_header(options), + apply_to(options)?, + req_usize(options, "--hf-para")?, + offset(options)?, + count(options)?, + )), + "split-paragraph-in-header-footer" => { + json_call(doc.split_paragraph_in_header_footer_native( + section(options)?, + is_header(options), + apply_to(options)?, + req_usize(options, "--hf-para")?, + 
offset(options)?, + )) + } + "merge-paragraph-in-header-footer" => { + json_call(doc.merge_paragraph_in_header_footer_native( + section(options)?, + is_header(options), + apply_to(options)?, + req_usize(options, "--hf-para")?, + )) + } + _ => return Ok(None), + }?; + Ok(Some(value)) +} diff --git a/src/rhwp-field-bridge/src/ops_native_objects.rs b/src/rhwp-field-bridge/src/ops_native_objects.rs new file mode 100644 index 000000000..95576a1b1 --- /dev/null +++ b/src/rhwp-field-bridge/src/ops_native_objects.rs @@ -0,0 +1,157 @@ +use std::collections::BTreeMap; + +use rhwp::wasm_api::HwpDocument; +use serde_json::Value; + +use crate::ops_native_support::*; +use crate::options::required; + +pub(crate) fn try_run_native_object_op( + doc: &mut HwpDocument, + op: &str, + options: &BTreeMap<String, String>, +) -> Result<Option<Value>, String> { + let value = match op { + "insert-picture" => insert_picture(doc, options), + "get-picture-properties" => json_call(doc.get_picture_properties_native( + section(options)?, + parent_para(options)?, + control(options)?, + )), + "set-picture-properties" => json_call(doc.set_picture_properties_native( + section(options)?, + parent_para(options)?, + control(options)?, + props_json(options), + )), + "delete-picture-control" => json_call(doc.delete_picture_control_native( + section(options)?, + parent_para(options)?, + control(options)?, + )), + "create-shape-control" => json_call( + doc.create_shape_control_native( + section(options)?, + paragraph(options)?, + offset(options)?, + req_u32(options, "--width")?, + req_u32(options, "--height")?, + req_u32(options, "--horz-offset")?, + req_u32(options, "--vert-offset")?, + bool_opt(options, "--treat-as-char", false), + options + .get("--text-wrap") + .map(String::as_str) + .unwrap_or("InFrontOfText"), + options + .get("--shape-type") + .map(String::as_str) + .unwrap_or("rectangle"), + bool_opt(options, "--line-flip-x", false), + bool_opt(options, "--line-flip-y", false), + &[], + ), + ), + "get-shape-properties" =>
json_call(doc.get_shape_properties_native( + section(options)?, + parent_para(options)?, + control(options)?, + )), + "set-shape-properties" => json_call(doc.set_shape_properties_native( + section(options)?, + parent_para(options)?, + control(options)?, + props_json(options), + )), + "delete-shape-control" => json_call(doc.delete_shape_control_native( + section(options)?, + parent_para(options)?, + control(options)?, + )), + "change-shape-z-order" => json_call(doc.change_shape_z_order_native( + section(options)?, + parent_para(options)?, + control(options)?, + required(options, "--operation")?, + )), + "move-line-endpoint" => json_call(doc.move_line_endpoint_native( + section(options)?, + parent_para(options)?, + control(options)?, + req_i32(options, "--start-x")?, + req_i32(options, "--start-y")?, + req_i32(options, "--end-x")?, + req_i32(options, "--end-y")?, + )), + "group-shapes" => { + let targets = parse_targets(required(options, "--targets")?)?; + json_call(doc.group_shapes_native(section(options)?, &targets)) + } + "ungroup-shape" => json_call(doc.ungroup_shape_native( + section(options)?, + parent_para(options)?, + control(options)?, + )), + "get-equation-properties" => json_call(doc.get_equation_properties_native( + section(options)?, + parent_para(options)?, + control(options)?, + opt_usize(options, "--cell")?, + opt_usize(options, "--cell-para")?, + )), + "set-equation-properties" => json_call(doc.set_equation_properties_native( + section(options)?, + parent_para(options)?, + control(options)?, + opt_usize(options, "--cell")?, + opt_usize(options, "--cell-para")?, + props_json(options), + )), + "render-equation-preview" => json_call(doc.render_equation_preview_native( + required(options, "--script")?, + req_u32(options, "--font-size")?, + req_u32(options, "--color")?, + )), + "insert-footnote" => json_call(doc.insert_footnote_native( + section(options)?, + paragraph(options)?, + offset(options)?, + )), + "get-footnote-info" => 
json_call(doc.get_footnote_info_native( + section(options)?, + parent_para(options)?, + control(options)?, + )), + "insert-text-in-footnote" => json_call(doc.insert_text_in_footnote_native( + section(options)?, + parent_para(options)?, + control(options)?, + req_usize(options, "--footnote-para")?, + offset(options)?, + required(options, "--value")?, + )), + "delete-text-in-footnote" => json_call(doc.delete_text_in_footnote_native( + section(options)?, + parent_para(options)?, + control(options)?, + req_usize(options, "--footnote-para")?, + offset(options)?, + count(options)?, + )), + "split-paragraph-in-footnote" => json_call(doc.split_paragraph_in_footnote_native( + section(options)?, + parent_para(options)?, + control(options)?, + req_usize(options, "--footnote-para")?, + offset(options)?, + )), + "merge-paragraph-in-footnote" => json_call(doc.merge_paragraph_in_footnote_native( + section(options)?, + parent_para(options)?, + control(options)?, + req_usize(options, "--footnote-para")?, + )), + _ => return Ok(None), + }?; + Ok(Some(value)) +} diff --git a/src/rhwp-field-bridge/src/ops_native_style.rs b/src/rhwp-field-bridge/src/ops_native_style.rs new file mode 100644 index 000000000..d4ac309f1 --- /dev/null +++ b/src/rhwp-field-bridge/src/ops_native_style.rs @@ -0,0 +1,114 @@ +use std::collections::BTreeMap; + +use rhwp::wasm_api::HwpDocument; +use serde_json::{json, Value}; + +use crate::ops_native_support::*; +use crate::options::required; + +pub(crate) fn try_run_native_style_op( + doc: &mut HwpDocument, + op: &str, + options: &BTreeMap<String, String>, +) -> Result<Option<Value>, String> { + let value = match op { + "get-char-properties-at" => json_call(doc.get_char_properties_at_native( + section(options)?, + paragraph(options)?, + offset(options)?, + )), + "get-para-properties-at" => { + json_call(doc.get_para_properties_at_native(section(options)?, paragraph(options)?)) + } + "get-style-list" => json_parse(doc.get_style_list()), + "get-style-detail" =>
json_parse(doc.get_style_detail(req_u32(options, "--style-id")?)), + "update-style" => { + bool_call(doc.update_style(req_u32(options, "--style-id")?, props_json(options))) + } + "update-style-shapes" => bool_call( + doc.update_style_shapes( + req_u32(options, "--style-id")?, + options + .get("--char-json") + .map(String::as_str) + .unwrap_or("{}"), + options + .get("--para-json") + .map(String::as_str) + .unwrap_or("{}"), + ), + ), + "create-style" => Ok(json!({"styleId": doc.create_style(props_json(options))})), + "delete-style" => bool_call(doc.delete_style(req_u32(options, "--style-id")?)), + "get-numbering-list" => json_parse(doc.get_numbering_list()), + "get-bullet-list" => json_parse(doc.get_bullet_list()), + "ensure-default-numbering" => Ok(json!({"numberingId": doc.ensure_default_numbering()})), + "find-or-create-font-id" => Ok(json!({ + "fontId": doc.find_or_create_font_id_native(required(options, "--name")?) + })), + "apply-char-format" => json_call(doc.apply_char_format_native( + section(options)?, + paragraph(options)?, + req_usize(options, "--start")?, + req_usize(options, "--end")?, + props_json(options), + )), + "apply-char-format-in-cell" => json_call(doc.apply_char_format_in_cell_native( + section(options)?, + parent_para(options)?, + control(options)?, + cell(options)?, + cell_para(options)?, + req_usize(options, "--start")?, + req_usize(options, "--end")?, + props_json(options), + )), + "apply-para-format" => json_call(doc.apply_para_format_native( + section(options)?, + paragraph(options)?, + props_json(options), + )), + "apply-para-format-in-cell" => json_call(doc.apply_para_format_in_cell_native( + section(options)?, + parent_para(options)?, + control(options)?, + cell(options)?, + cell_para(options)?, + props_json(options), + )), + "apply-style" => json_call(doc.apply_style_native( + section(options)?, + paragraph(options)?, + req_usize(options, "--style-id")?, + )), + "apply-cell-style" => json_call(doc.apply_cell_style_native( + 
section(options)?, + parent_para(options)?, + control(options)?, + cell(options)?, + cell_para(options)?, + req_usize(options, "--style-id")?, + )), + "set-numbering-restart" => json_call(doc.set_numbering_restart_native( + section(options)?, + paragraph(options)?, + req_u8(options, "--mode")?, + req_u32(options, "--start-number")?, + )), + "set-page-hide" => json_call(doc.set_page_hide_native( + section(options)?, + paragraph(options)?, + bool_opt(options, "--hide-header", false), + bool_opt(options, "--hide-footer", false), + bool_opt(options, "--hide-master-page", false), + bool_opt(options, "--hide-border", false), + bool_opt(options, "--hide-fill", false), + bool_opt(options, "--hide-page-num", false), + )), + "get-page-hide" => { + json_call(doc.get_page_hide_native(section(options)?, paragraph(options)?)) + } + _ => return Ok(None), + }?; + Ok(Some(value)) +} diff --git a/src/rhwp-field-bridge/src/ops_native_support.rs b/src/rhwp-field-bridge/src/ops_native_support.rs new file mode 100644 index 000000000..ba23a4a0b --- /dev/null +++ b/src/rhwp-field-bridge/src/ops_native_support.rs @@ -0,0 +1,204 @@ +use std::{collections::BTreeMap, fs}; + +use rhwp::wasm_api::HwpDocument; +use serde_json::{json, Value}; + +use crate::options::{optional_usize, required, required_usize}; + +pub(crate) fn insert_picture( + doc: &mut HwpDocument, + options: &BTreeMap<String, String>, +) -> Result<Value, String> { + let image = required(options, "--image")?; + let image_data = fs::read(image).map_err(|e| format!("image read failed: {e}"))?; + let extension = options + .get("--extension") + .map(String::as_str) + .or_else(|| image.rsplit('.').next()) + .unwrap_or("png"); + json_call( + doc.insert_picture_native( + section(options)?, + paragraph(options)?, + offset(options)?, + &image_data, + req_u32(options, "--width")?, + req_u32(options, "--height")?, + req_u32(options, "--natural-width")?, + req_u32(options, "--natural-height")?, + extension, + options + .get("--description") + .map(String::as_str) +
.unwrap_or(""), + ), + ) +} + +pub(crate) fn json_call<E: std::fmt::Display>(result: Result<String, E>) -> Result<Value, String> { + let raw = result.map_err(native_err)?; + json_parse(raw) +} + +pub(crate) fn json_parse(raw: String) -> Result<Value, String> { + serde_json::from_str(&raw).or_else(|_| Ok(json!({ "raw": raw }))) +} + +pub(crate) fn bool_call(ok: bool) -> Result<Value, String> { + Ok(json!({ "ok": ok })) +} + +pub(crate) fn native_err(err: impl std::fmt::Display) -> String { + format!("native operation failed: {err}") +} + +pub(crate) fn section(options: &BTreeMap<String, String>) -> Result<usize, String> { + optional_usize(options, "--section").map(|value| value.unwrap_or(0)) +} + +pub(crate) fn paragraph(options: &BTreeMap<String, String>) -> Result<usize, String> { + optional_usize(options, "--paragraph")? + .or(optional_usize(options, "--para")?) + .ok_or_else(|| "missing required option: --paragraph".to_string()) +} + +pub(crate) fn parent_para(options: &BTreeMap<String, String>) -> Result<usize, String> { + optional_usize(options, "--parent-para")? + .or(optional_usize(options, "--paragraph")?) + .ok_or_else(|| "missing required option: --parent-para".to_string()) +} + +pub(crate) fn control(options: &BTreeMap<String, String>) -> Result<usize, String> { + required_usize(options, "--control") +} + +pub(crate) fn cell(options: &BTreeMap<String, String>) -> Result<usize, String> { + required_usize(options, "--cell") +} + +pub(crate) fn cell_para(options: &BTreeMap<String, String>) -> Result<usize, String> { + required_usize(options, "--cell-para") +} + +pub(crate) fn offset(options: &BTreeMap<String, String>) -> Result<usize, String> { + optional_usize(options, "--offset").map(|value| value.unwrap_or(0)) +} + +pub(crate) fn count(options: &BTreeMap<String, String>) -> Result<usize, String> { + required_usize(options, "--count") +} + +pub(crate) fn opt_usize( + options: &BTreeMap<String, String>, + key: &str, +) -> Result<Option<usize>, String> { + optional_usize(options, key) +} + +pub(crate) fn req_usize(options: &BTreeMap<String, String>, key: &str) -> Result<usize, String> { + required_usize(options, key) +} + +pub(crate) fn req_u8(options: &BTreeMap<String, String>, key: &str) -> Result<u8, String> { + required(options, key)?
+ .parse::<u8>() + .map_err(|e| format!("invalid {key} value: {e}")) +} + +pub(crate) fn req_u16(options: &BTreeMap<String, String>, key: &str) -> Result<u16, String> { + required(options, key)? + .parse::<u16>() + .map_err(|e| format!("invalid {key} value: {e}")) +} + +pub(crate) fn req_u32(options: &BTreeMap<String, String>, key: &str) -> Result<u32, String> { + required(options, key)? + .parse::<u32>() + .map_err(|e| format!("invalid {key} value: {e}")) +} + +pub(crate) fn req_i16(options: &BTreeMap<String, String>, key: &str) -> Result<i16, String> { + required(options, key)? + .parse::<i16>() + .map_err(|e| format!("invalid {key} value: {e}")) +} + +pub(crate) fn req_i32(options: &BTreeMap<String, String>, key: &str) -> Result<i32, String> { + required(options, key)? + .parse::<i32>() + .map_err(|e| format!("invalid {key} value: {e}")) +} + +pub(crate) fn bool_opt(options: &BTreeMap<String, String>, key: &str, default: bool) -> bool { + options + .get(key) + .map(|value| value.eq_ignore_ascii_case("true") || value == "1") + .unwrap_or(default) +} + +pub(crate) fn parse_u32_list(value: Option<&str>) -> Result<Option<Vec<u32>>, String> { + let Some(value) = value else { + return Ok(None); + }; + let mut output = Vec::new(); + for item in value.split(',') { + let item = item.trim(); + if item.is_empty() { + continue; + } + output.push( + item.parse::<u32>() + .map_err(|e| format!("invalid --col-widths item '{item}': {e}"))?, + ); + } + Ok((!output.is_empty()).then_some(output)) +} + +pub(crate) fn parse_targets(value: &str) -> Result<Vec<(usize, usize)>, String> { + let mut targets = Vec::new(); + for item in value.split(',') { + let item = item.trim(); + if item.is_empty() { + continue; + } + let (para, control) = item + .split_once(':') + .ok_or_else(|| format!("invalid --targets item '{item}'; expected para:control"))?; + targets.push(( + para.parse::<usize>() + .map_err(|e| format!("invalid --targets paragraph '{para}': {e}"))?, + control + .parse::<usize>() + .map_err(|e| format!("invalid --targets control '{control}': {e}"))?, + )); + } + if targets.is_empty() { + return Err("missing non-empty --targets".to_string()); + } + Ok(targets) +} + +pub(crate) fn
is_header(options: &BTreeMap<String, String>) -> bool { + options + .get("--kind") + .map(|value| value.eq_ignore_ascii_case("header")) + .unwrap_or_else(|| bool_opt(options, "--is-header", true)) +} + +pub(crate) fn apply_to(options: &BTreeMap<String, String>) -> Result<u8, String> { + options + .get("--apply-to") + .map(|value| { + value + .parse::<u8>() + .map_err(|e| format!("invalid --apply-to value: {e}")) + }) + .unwrap_or(Ok(0)) +} + +pub(crate) fn props_json(options: &BTreeMap<String, String>) -> &str { + options + .get("--props-json") + .map(String::as_str) + .unwrap_or("{}") +} diff --git a/src/rhwp-field-bridge/src/ops_native_table.rs b/src/rhwp-field-bridge/src/ops_native_table.rs new file mode 100644 index 000000000..ab2f5f4df --- /dev/null +++ b/src/rhwp-field-bridge/src/ops_native_table.rs @@ -0,0 +1,121 @@ +use std::collections::BTreeMap; + +use rhwp::wasm_api::HwpDocument; +use serde_json::Value; + +use crate::ops_native_support::*; + +pub(crate) fn try_run_native_table_op( + doc: &mut HwpDocument, + op: &str, + options: &BTreeMap<String, String>, +) -> Result<Option<Value>, String> { + let value = match op { + "create-table" => json_call(doc.create_table_native( + section(options)?, + paragraph(options)?, + offset(options)?, + req_u16(options, "--rows")?, + req_u16(options, "--cols")?, + )), + "create-table-ex" => { + let widths = parse_u32_list(options.get("--col-widths").map(String::as_str))?; + json_call(doc.create_table_ex_native( + section(options)?, + paragraph(options)?, + offset(options)?, + req_u16(options, "--rows")?, + req_u16(options, "--cols")?, + bool_opt(options, "--treat-as-char", false), + widths.as_deref(), + )) + } + "insert-table-row" => json_call(doc.insert_table_row_native( + section(options)?, + parent_para(options)?, + control(options)?, + req_u16(options, "--row")?, + bool_opt(options, "--below", true), + )), + "insert-table-column" => json_call(doc.insert_table_column_native( + section(options)?, + parent_para(options)?, + control(options)?, + req_u16(options, "--col")?, + bool_opt(options, "--right", true), +
)), + "delete-table-row" => json_call(doc.delete_table_row_native( + section(options)?, + parent_para(options)?, + control(options)?, + req_u16(options, "--row")?, + )), + "delete-table-column" => json_call(doc.delete_table_column_native( + section(options)?, + parent_para(options)?, + control(options)?, + req_u16(options, "--col")?, + )), + "merge-table-cells" => json_call(doc.merge_table_cells_native( + section(options)?, + parent_para(options)?, + control(options)?, + req_u16(options, "--start-row")?, + req_u16(options, "--start-col")?, + req_u16(options, "--end-row")?, + req_u16(options, "--end-col")?, + )), + "split-table-cell" => json_call(doc.split_table_cell_native( + section(options)?, + parent_para(options)?, + control(options)?, + req_u16(options, "--row")?, + req_u16(options, "--col")?, + )), + "split-table-cell-into" => json_call(doc.split_table_cell_into_native( + section(options)?, + parent_para(options)?, + control(options)?, + req_u16(options, "--row")?, + req_u16(options, "--col")?, + req_u16(options, "--rows")?, + req_u16(options, "--cols")?, + bool_opt(options, "--equal-row-height", true), + bool_opt(options, "--merge-first", false), + )), + "split-table-cells-in-range" => json_call(doc.split_table_cells_in_range_native( + section(options)?, + parent_para(options)?, + control(options)?, + req_u16(options, "--start-row")?, + req_u16(options, "--start-col")?, + req_u16(options, "--end-row")?, + req_u16(options, "--end-col")?, + req_u16(options, "--rows")?, + req_u16(options, "--cols")?, + bool_opt(options, "--equal-row-height", true), + )), + "delete-table-control" => json_call(doc.delete_table_control_native( + section(options)?, + parent_para(options)?, + control(options)?, + )), + "get-cell-char-properties-at" => json_call(doc.get_cell_char_properties_at_native( + section(options)?, + parent_para(options)?, + control(options)?, + cell(options)?, + cell_para(options)?, + offset(options)?, + )), + "get-cell-para-properties-at" => 
json_call(doc.get_cell_para_properties_at_native( + section(options)?, + parent_para(options)?, + control(options)?, + cell(options)?, + cell_para(options)?, + )), + _ => return Ok(None), + }?; + Ok(Some(value)) +} diff --git a/src/rhwp-field-bridge/src/ops_native_text.rs b/src/rhwp-field-bridge/src/ops_native_text.rs new file mode 100644 index 000000000..ed1df19b4 --- /dev/null +++ b/src/rhwp-field-bridge/src/ops_native_text.rs @@ -0,0 +1,157 @@ +use std::collections::BTreeMap; + +use rhwp::wasm_api::HwpDocument; +use serde_json::{json, Value}; + +use crate::ops_native_support::*; +use crate::options::required; + +pub(crate) fn try_run_native_text_op( + doc: &mut HwpDocument, + op: &str, + options: &BTreeMap<String, String>, +) -> Result<Option<Value>, String> { + let value = match op { + "delete-text" => json_call(doc.delete_text_native( + section(options)?, + paragraph(options)?, + offset(options)?, + count(options)?, + )), + "split-paragraph" => json_call(doc.split_paragraph_native( + section(options)?, + paragraph(options)?, + offset(options)?, + )), + "merge-paragraph" => { + json_call(doc.merge_paragraph_native(section(options)?, paragraph(options)?)) + } + "insert-paragraph" => { + json_call(doc.insert_paragraph_native(section(options)?, paragraph(options)?)) + } + "delete-paragraph" => { + json_call(doc.delete_paragraph_native(section(options)?, paragraph(options)?)) + } + "insert-page-break" => json_call(doc.insert_page_break_native( + section(options)?, + paragraph(options)?, + offset(options)?, + )), + "insert-column-break" => json_call(doc.insert_column_break_native( + section(options)?, + paragraph(options)?, + offset(options)?, + )), + "set-column-def" => json_call(doc.set_column_def_native( + section(options)?, + req_u16(options, "--columns")?, + req_u8(options, "--column-type")?, + bool_opt(options, "--same-width", true), + req_i16(options, "--spacing")?, + )), + "get-paragraph-count" => Ok(json!({ + "count":
doc.get_paragraph_count_native(section(options)?).map_err(native_err)? + })), + "get-paragraph-length" => Ok(json!({ + "length": doc.get_paragraph_length_native( + section(options)?, + paragraph(options)?, + ).map_err(native_err)? + })), + "get-text-range" => Ok(json!({ + "text": doc.get_text_range_native( + section(options)?, + paragraph(options)?, + offset(options)?, + count(options)?, + ).map_err(native_err)? + })), + "get-textbox-control-index" => Ok(json!({ + "control": doc.get_textbox_control_index_native(section(options)?, paragraph(options)?) + })), + "find-next-editable-control" => json_parse( + doc.find_next_editable_control_native( + section(options)?, + paragraph(options)?, + options + .get("--control") + .map(|_| req_i32(options, "--control")) + .unwrap_or(Ok(-1))?, + req_i32(options, "--delta")?, + ), + ), + "find-nearest-control-backward" => json_parse(doc.find_nearest_control_backward_native( + section(options)?, + paragraph(options)?, + offset(options)?, + )), + "find-nearest-control-forward" => json_parse(doc.find_nearest_control_forward_native( + section(options)?, + paragraph(options)?, + offset(options)?, + )), + "insert-text-in-cell" => json_call(doc.insert_text_in_cell_native( + section(options)?, + parent_para(options)?, + control(options)?, + cell(options)?, + cell_para(options)?, + offset(options)?, + required(options, "--value")?, + )), + "delete-text-in-cell" => json_call(doc.delete_text_in_cell_native( + section(options)?, + parent_para(options)?, + control(options)?, + cell(options)?, + cell_para(options)?, + offset(options)?, + count(options)?, + )), + "split-paragraph-in-cell" => json_call(doc.split_paragraph_in_cell_native( + section(options)?, + parent_para(options)?, + control(options)?, + cell(options)?, + cell_para(options)?, + offset(options)?, + )), + "merge-paragraph-in-cell" => json_call(doc.merge_paragraph_in_cell_native( + section(options)?, + parent_para(options)?, + control(options)?, + cell(options)?, + 
cell_para(options)?, + )), + "get-cell-paragraph-count" => Ok(json!({ + "count": doc.get_cell_paragraph_count_native( + section(options)?, + parent_para(options)?, + control(options)?, + cell(options)?, + ).map_err(native_err)? + })), + "get-cell-paragraph-length" => Ok(json!({ + "length": doc.get_cell_paragraph_length_native( + section(options)?, + parent_para(options)?, + control(options)?, + cell(options)?, + cell_para(options)?, + ).map_err(native_err)? + })), + "get-text-in-cell" => Ok(json!({ + "text": doc.get_text_in_cell_native( + section(options)?, + parent_para(options)?, + control(options)?, + cell(options)?, + cell_para(options)?, + offset(options)?, + count(options)?, + ).map_err(native_err)? + })), + _ => return Ok(None), + }?; + Ok(Some(value)) +} diff --git a/src/rhwp-field-bridge/src/ops_text.rs b/src/rhwp-field-bridge/src/ops_text.rs new file mode 100644 index 000000000..e2782f80e --- /dev/null +++ b/src/rhwp-field-bridge/src/ops_text.rs @@ -0,0 +1,45 @@ +use std::collections::BTreeMap; + +use rhwp::wasm_api::HwpDocument; +use serde_json::{json, Value}; + +use crate::ops::write_document; +use crate::options::{optional_usize, required}; + +pub(crate) fn insert_text( + doc: &mut HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let output = required(options, "--output")?; + let value = required(options, "--value")?; + let section = optional_usize(options, "--section")?.unwrap_or(0); + let paragraph = optional_usize(options, "--paragraph")? + .or(optional_usize(options, "--para")?)
+ .unwrap_or(0); + let offset = optional_usize(options, "--offset")?.unwrap_or(0); + + let insert_json = doc + .insert_text_native(section, paragraph, offset, value) + .map_err(|e| format!("text insert failed: {e}"))?; + write_document(doc, format, output)?; + + let insert_result: Value = + serde_json::from_str(&insert_json).unwrap_or_else(|_| json!({ "raw": insert_json })); + println!( + "{}", + json!({ + "inserted": true, + "operation": "insert-text", + "output": output, + "format": format, + "section": section, + "paragraph": paragraph, + "offset": offset, + "insert": insert_result, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "warnings": ["experimental text insertion; verify with Hancom before production use"] + }) + ); + Ok(()) +} diff --git a/src/rhwp-field-bridge/src/ops_view.rs b/src/rhwp-field-bridge/src/ops_view.rs new file mode 100644 index 000000000..afd53324e --- /dev/null +++ b/src/rhwp-field-bridge/src/ops_view.rs @@ -0,0 +1,312 @@ +use std::{collections::BTreeMap, fs, path::Path}; + +use rhwp::wasm_api::HwpDocument; +use serde_json::{json, Value}; +use sha2::{Digest, Sha256}; + +use crate::options::{optional_usize, required, selected_pages}; + +pub(crate) fn read_text( + doc: &HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let page_count = doc.page_count(); + let pages = selected_pages(options, page_count)?; + let mut page_payload = Vec::new(); + let mut text = String::new(); + for page_idx in pages { + let mut page_text = doc + .extract_page_text_native(page_idx) + .map_err(|e| format!("page {} text extraction failed: {e}", page_idx + 1))?; + if !page_text.ends_with('\n') { + page_text.push('\n'); + } + text.push_str(&page_text); + page_payload.push(json!({ + "page": page_idx + 1, + "text": page_text + })); + } + println!( + "{}", + json!({ + "text": text, + "pages": page_payload, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": [] + }) +
); + Ok(()) +} + +pub(crate) fn render_svg( + doc: &HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let out_dir = required(options, "--out-dir")?; + fs::create_dir_all(out_dir).map_err(|e| format!("output directory create failed: {e}"))?; + let page_count = doc.page_count(); + let pages = selected_pages(options, page_count)?; + let mut page_payload = Vec::new(); + for page_idx in pages { + let svg = doc + .render_page_svg_native(page_idx) + .map_err(|e| format!("page {} SVG render failed: {e}", page_idx + 1))?; + let path = Path::new(out_dir).join(format!("page_{:03}.svg", page_idx + 1)); + fs::write(&path, svg.as_bytes()).map_err(|e| format!("SVG write failed: {e}"))?; + page_payload.push(json!({ + "page": page_idx + 1, + "path": path.to_string_lossy(), + "sha256": sha256_bytes(svg.as_bytes()) + })); + } + write_manifest_and_print(out_dir, page_payload, format, &[]) +} + +pub(crate) fn export_pdf( + doc: &HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let output = required(options, "--output")?; + if let Some(parent) = Path::new(output).parent() { + if !parent.as_os_str().is_empty() { + fs::create_dir_all(parent) + .map_err(|e| format!("output directory create failed: {e}"))?; + } + } + + let page_count = doc.page_count(); + let pages = selected_pages(options, page_count)?; + let mut svg_pages = Vec::new(); + let mut page_payload = Vec::new(); + for page_idx in pages { + let svg = doc + .render_page_svg_native(page_idx) + .map_err(|e| format!("page {} PDF render failed: {e}", page_idx + 1))?; + svg_pages.push(svg); + page_payload.push(json!({ "page": page_idx + 1 })); + } + + let pdf = rhwp::renderer::pdf::svgs_to_pdf(&svg_pages) + .map_err(|e| format!("PDF export failed: {e}"))?; + fs::write(output, &pdf).map_err(|e| format!("PDF write failed: {e}"))?; + + println!( + "{}", + json!({ + "pdf": { + "path": output, + "bytes": pdf.len(), + "sha256": sha256_bytes(&pdf) + }, + "pages": page_payload, +
"engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": ["experimental PDF export; verify visual output before production use"] + }) + ); + Ok(()) +} + +#[cfg(all(not(target_arch = "wasm32"), feature = "native-skia"))] +pub(crate) fn render_png( + doc: &HwpDocument, + format: &str, + options: &BTreeMap, +) -> Result<(), String> { + let out_dir = required(options, "--out-dir")?; + fs::create_dir_all(out_dir).map_err(|e| format!("output directory create failed: {e}"))?; + let page_count = doc.page_count(); + let pages = selected_pages(options, page_count)?; + let mut page_payload = Vec::new(); + for page_idx in pages { + let png = doc + .render_page_png_native(page_idx) + .map_err(|e| format!("page {} PNG render failed: {e}", page_idx + 1))?; + let path = Path::new(out_dir).join(format!("page_{:03}.png", page_idx + 1)); + fs::write(&path, &png).map_err(|e| format!("PNG write failed: {e}"))?; + page_payload.push(json!({ + "page": page_idx + 1, + "path": path.to_string_lossy(), + "sha256": sha256_bytes(&png), + "bytes": png.len() + })); + } + write_manifest_and_print( + out_dir, + page_payload, + format, + &["experimental PNG render; verify visual output before production use"], + ) +} + +#[cfg(any(target_arch = "wasm32", not(feature = "native-skia")))] +pub(crate) fn render_png( + _doc: &HwpDocument, + _format: &str, + _options: &BTreeMap, +) -> Result<(), String> { + Err("render-png requires rhwp-field-bridge built with --features native-skia".to_string()) +} + +pub(crate) fn export_markdown( + doc: &HwpDocument, + format: &str, + options: &BTreeMap, +) -> Result<(), String> { + let page_count = doc.page_count(); + let pages = selected_pages(options, page_count)?; + let mut page_payload = Vec::new(); + let mut markdown = String::new(); + for page_idx in pages { + let page_markdown = doc + .extract_page_markdown_native(page_idx) + .map_err(|e| format!("page {} markdown export failed: {e}", page_idx + 1))?; + 
markdown.push_str(&page_markdown); + if !markdown.ends_with('\n') { + markdown.push('\n'); + } + page_payload.push(json!({ + "page": page_idx + 1, + "markdown": page_markdown + })); + } + println!( + "{}", + json!({ + "markdown": markdown, + "pages": page_payload, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": ["experimental markdown export; image assets are not yet materialized by OfficeCLI"] + }) + ); + Ok(()) +} + +pub(crate) fn document_info(doc: &HwpDocument, format: &str) -> Result<(), String> { + let raw = doc.get_document_info(); + let info: Value = serde_json::from_str(&raw).unwrap_or_else(|_| json!({ "raw": raw })); + println!( + "{}", + json!({ + "info": info, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": [] + }) + ); + Ok(()) +} + +pub(crate) fn diagnostics(doc: &HwpDocument, format: &str) -> Result<(), String> { + let raw = doc.get_validation_warnings(); + let diagnostics: Value = serde_json::from_str(&raw).unwrap_or_else(|_| json!({ "raw": raw })); + println!( + "{}", + json!({ + "diagnostics": diagnostics, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": ["rhwp diagnostics are provider warnings, not full OfficeCLI package validation"] + }) + ); + Ok(()) +} + +pub(crate) fn dump_pages( + doc: &HwpDocument, + format: &str, + options: &BTreeMap<String, String>, +) -> Result<(), String> { + let page = optional_usize(options, "--page")?.map(|value| value.saturating_sub(1) as u32); + let dump = doc.dump_page_items(page); + println!( + "{}", + json!({ + "dump": dump, + "page": page.map(|value| value + 1), + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": ["diagnostic page dump; output format is not a stable editing contract"] + }) + ); + Ok(()) +} + +pub(crate) fn dump_controls(doc: &HwpDocument, format: &str) -> Result<(), String> { + let dump =
format!("{:#?}", doc.document()); + println!( + "{}", + json!({ + "dump": dump, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": ["diagnostic full document/control dump; output format is not a stable editing contract"] + }) + ); + Ok(()) +} + +pub(crate) fn thumbnail( + bytes: &[u8], + format: &str, + options: &BTreeMap, +) -> Result<(), String> { + let output = required(options, "--output")?; + let result = rhwp::parser::extract_thumbnail_only(bytes) + .ok_or_else(|| "document does not contain a thumbnail preview image".to_string())?; + fs::write(output, &result.data).map_err(|e| format!("thumbnail write failed: {e}"))?; + println!( + "{}", + json!({ + "thumbnail": { + "path": output, + "format": result.format, + "width": result.width, + "height": result.height, + "bytes": result.data.len(), + "sha256": sha256_bytes(&result.data) + }, + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": [] + }) + ); + Ok(()) +} + +fn write_manifest_and_print( + out_dir: &str, + page_payload: Vec, + format: &str, + warnings: &[&str], +) -> Result<(), String> { + let manifest = Path::new(out_dir).join("manifest.json"); + fs::write( + &manifest, + serde_json::to_vec_pretty(&json!({ "pages": page_payload })) + .map_err(|e| format!("manifest JSON encode failed: {e}"))?, + ) + .map_err(|e| format!("manifest write failed: {e}"))?; + println!( + "{}", + json!({ + "pages": page_payload, + "manifest": manifest.to_string_lossy(), + "engineVersion": concat!("rhwp-api ", env!("CARGO_PKG_VERSION")), + "format": format, + "warnings": warnings + }) + ); + Ok(()) +} + +fn sha256_bytes(bytes: &[u8]) -> String { + let digest = Sha256::digest(bytes); + format!("{digest:x}") +} diff --git a/src/rhwp-field-bridge/src/options.rs b/src/rhwp-field-bridge/src/options.rs new file mode 100644 index 000000000..bfccd9157 --- /dev/null +++ b/src/rhwp-field-bridge/src/options.rs @@ -0,0 +1,79 @@ +use 
std::collections::BTreeMap; + +pub(crate) fn parse_options(args: &[String]) -> Result, String> { + let mut options = BTreeMap::new(); + let mut index = 0; + while index < args.len() { + let arg = &args[index]; + if !arg.starts_with("--") { + index += 1; + continue; + } + if arg == "--json" { + options.insert(arg.clone(), "true".to_string()); + index += 1; + continue; + } + if index + 1 >= args.len() || args[index + 1].starts_with("--") { + return Err(format!("missing value for {arg}")); + } + options.insert(arg.clone(), args[index + 1].clone()); + index += 2; + } + Ok(options) +} + +pub(crate) fn required<'a>( + options: &'a BTreeMap, + key: &str, +) -> Result<&'a str, String> { + options + .get(key) + .map(String::as_str) + .filter(|value| !value.trim().is_empty()) + .ok_or_else(|| format!("missing required option: {key}")) +} + +pub(crate) fn required_usize( + options: &BTreeMap, + key: &str, +) -> Result { + required(options, key)? + .parse::() + .map_err(|e| format!("invalid {key} value: {e}")) +} + +pub(crate) fn optional_usize( + options: &BTreeMap, + key: &str, +) -> Result, String> { + match options.get(key) { + Some(value) => value + .parse::() + .map(Some) + .map_err(|e| format!("invalid {key} value: {e}")), + None => Ok(None), + } +} + +pub(crate) fn selected_pages( + options: &BTreeMap, + page_count: u32, +) -> Result, String> { + if page_count == 0 { + return Err("document has no pages".to_string()); + } + let selector = options.get("--page").map(String::as_str).unwrap_or("all"); + if selector.eq_ignore_ascii_case("all") { + return Ok((0..page_count).collect()); + } + let one_based = selector + .parse::() + .map_err(|e| format!("invalid --page value: {e}"))?; + if one_based == 0 || one_based > page_count { + return Err(format!( + "--page out of range: {one_based}; valid range is 1..={page_count}" + )); + } + Ok(vec![one_based - 1]) +} diff --git a/src/rhwp-officecli-bridge/Program.cs b/src/rhwp-officecli-bridge/Program.cs new file mode 100644 index 
000000000..8a696ca41
--- /dev/null
+++ b/src/rhwp-officecli-bridge/Program.cs
@@ -0,0 +1,322 @@
+using System.Diagnostics;
+using System.Security.Cryptography;
+using System.Text.Json;
+
+var exitCode = BridgeProgram.Run(args);
+return exitCode;
+
+internal static class BridgeProgram
+{
+    public static int Run(string[] args)
+    {
+        if (args.Length == 0 || args[0] is "--help" or "-h")
+            return Help();
+
+        try
+        {
+            return args[0] switch
+            {
+                "create-blank" => ApiBridge(args),
+                "read-text" => ApiBridgeAvailable() ? ApiBridge(args) : ReadText(args[1..]),
+                "render-svg" => ApiBridgeAvailable() ? ApiBridge(args) : RenderSvg(args[1..]),
+                "render-png" => ApiBridge(args),
+                "export-pdf" => ApiBridge(args),
+                "export-markdown" => ApiBridge(args),
+                "document-info" => ApiBridge(args),
+                "diagnostics" => ApiBridge(args),
+                "dump-controls" => ApiBridge(args),
+                "dump-pages" => ApiBridge(args),
+                "thumbnail" => ApiBridge(args),
+                "list-fields" => ApiBridge(args),
+                "get-field" => ApiBridge(args),
+                "set-field" => ApiBridge(args),
+                "replace-text" => ApiBridge(args),
+                "insert-text" => ApiBridge(args),
+                "get-cell-text" => ApiBridge(args),
+                "scan-cells" => ApiBridge(args),
+                "set-cell-text" => ApiBridge(args),
+                "convert-to-editable" => ApiBridge(args),
+                "native-op" => ApiBridge(args),
+                "save-as-hwp" => ApiBridge(args),
+                _ => Error($"unsupported command: {args[0]}", "unsupported_command")
+            };
+        }
+        catch (Exception ex)
+        {
+            return Error(ex.Message, "bridge_exception");
+        }
+    }
+
+    private static int ReadText(string[] args)
+    {
+        var options = ParseOptions(args);
+        var input = Required(options, "--input");
+        var format = Required(options, "--format");
+        if (!File.Exists(input)) return Error($"input not found: {input}", "input_not_found");
+
+        using var temp = TempDirectory.Create();
+        var rhwp = RhwpBinary();
+        var rhwpArgs = new List<string> { "export-text", input, "--output", temp.Path };
+        var result = RunProcess(rhwp, rhwpArgs);
+        if (result.ExitCode != 0)
+            return Error($"rhwp export-text failed: {result.Stderr.Trim()}", "rhwp_failed");
+
+        var textFiles = Directory.GetFiles(temp.Path, "*.txt").OrderBy(p => p, StringComparer.Ordinal).ToArray();
+        var pages = textFiles.Select((path, index) => new TextPage(index + 1, File.ReadAllText(path))).ToArray();
+        var text = string.Concat(pages.Select(p => p.Text));
+        WriteJson(new TextResponse(text, pages, EngineVersion(rhwp), [], format));
+        return 0;
+    }
+
+    private static int RenderSvg(string[] args)
+    {
+        var options = ParseOptions(args);
+        var input = Required(options, "--input");
+        var format = Required(options, "--format");
+        var outDir = Required(options, "--out-dir");
+        var pageSelector = options.GetValueOrDefault("--page", "all");
+        if (!File.Exists(input)) return Error($"input not found: {input}", "input_not_found");
+        Directory.CreateDirectory(outDir);
+
+        var rhwp = RhwpBinary();
+        var rhwpArgs = new List<string> { "export-svg", input, "--output", outDir };
+        if (!string.Equals(pageSelector, "all", StringComparison.OrdinalIgnoreCase))
+        {
+            if (!int.TryParse(pageSelector, out var oneBasedPage) || oneBasedPage <= 0)
+                return Error($"unsupported page selector: {pageSelector}", "unsupported_page_selector");
+            rhwpArgs.Add("--page");
+            rhwpArgs.Add((oneBasedPage - 1).ToString());
+        }
+
+        var result = RunProcess(rhwp, rhwpArgs);
+        if (result.ExitCode != 0)
+            return Error($"rhwp export-svg failed: {result.Stderr.Trim()}", "rhwp_failed");
+
+        var svgFiles = Directory.GetFiles(outDir, "*.svg").OrderBy(p => p, StringComparer.Ordinal).ToArray();
+        var pages = svgFiles.Select((path, index) => new SvgPage(index + 1, path, Sha256(path))).ToArray();
+        var manifest = Path.Combine(outDir, "manifest.json");
+        File.WriteAllText(manifest, JsonSerializer.Serialize(new { pages }, JsonOptions()));
+        WriteJson(new SvgResponse(pages, manifest, EngineVersion(rhwp), [], format));
+        return 0;
+    }
+
+    private static Dictionary<string, string> ParseOptions(string[] args)
+    {
+        var options = new Dictionary<string, string>(StringComparer.Ordinal);
+        for (var i = 0; i < args.Length; i++)
+        {
+            if (!args[i].StartsWith("--", StringComparison.Ordinal)) continue;
+            if (args[i] == "--json")
+            {
+                options[args[i]] = "true";
+                continue;
+            }
+            if (i + 1 >= args.Length || args[i + 1].StartsWith("--", StringComparison.Ordinal))
+                throw new ArgumentException($"missing value for {args[i]}");
+            options[args[i]] = args[i + 1];
+            i++;
+        }
+        return options;
+    }
+
+    private static string Required(Dictionary<string, string> options, string key)
+        => options.TryGetValue(key, out var value) && !string.IsNullOrWhiteSpace(value)
+            ? value
+            : throw new ArgumentException($"missing required option: {key}");
+
+    private static string RhwpBinary()
+        => DiscoverExecutable(
+            Environment.GetEnvironmentVariable("OFFICECLI_RHWP_BIN"),
+            OperatingSystem.IsWindows() ? ["rhwp.exe", "rhwp"] : ["rhwp"])
+        ?? "rhwp";
+
+    private static string RhwpApiBinary()
+        => DiscoverExecutable(
+            Environment.GetEnvironmentVariable("OFFICECLI_RHWP_API_BIN"),
+            OperatingSystem.IsWindows()
+                ? ["rhwp-field-bridge.exe", "rhwp-field-bridge"]
+                : ["rhwp-field-bridge"])
+        ?? "";
+
+    private static bool ApiBridgeAvailable()
+    {
+        var api = RhwpApiBinary();
+        return !string.IsNullOrWhiteSpace(api) && File.Exists(api);
+    }
+
+    private static int ApiBridge(string[] args)
+    {
+        var api = RhwpApiBinary();
+        if (string.IsNullOrWhiteSpace(api) || !File.Exists(api))
+            return Error(
+                "rhwp API bridge is not configured. Set OFFICECLI_RHWP_API_BIN to rhwp-field-bridge.",
+                "api_bridge_missing");
+
+        var result = RunProcess(api, args);
+        if (result.ExitCode != 0)
+        {
+            if (TryForwardJsonError(result.Stdout, result.Stderr))
+                return result.ExitCode;
+            return Error(BuildApiBridgeFailureMessage(result), "api_bridge_failed");
+        }
+
+        Console.Write(result.Stdout);
+        if (!result.Stdout.EndsWith('\n')) Console.WriteLine();
+        return 0;
+    }
+
+    private static ProcessResult RunProcess(string fileName, IReadOnlyList<string> args)
+    {
+        var psi = new ProcessStartInfo
+        {
+            FileName = fileName,
+            UseShellExecute = false,
+            RedirectStandardOutput = true,
+            RedirectStandardError = true,
+            CreateNoWindow = true
+        };
+        foreach (var arg in args) psi.ArgumentList.Add(arg);
+        using var process = Process.Start(psi) ?? throw new InvalidOperationException($"failed to start {fileName}");
+        var stdout = process.StandardOutput.ReadToEnd();
+        var stderr = process.StandardError.ReadToEnd();
+        process.WaitForExit();
+        return new ProcessResult(process.ExitCode, stdout, stderr);
+    }
+
+    private static bool TryForwardJsonError(string stdout, string stderr)
+    {
+        if (string.IsNullOrWhiteSpace(stdout))
+            return false;
+        try
+        {
+            using var _ = JsonDocument.Parse(stdout);
+            Console.Write(stdout);
+            if (!stdout.EndsWith('\n')) Console.WriteLine();
+            if (!string.IsNullOrWhiteSpace(stderr))
+                Console.Error.WriteLine(stderr.Trim());
+            return true;
+        }
+        catch (JsonException)
+        {
+            return false;
+        }
+    }
+
+    private static string BuildApiBridgeFailureMessage(ProcessResult result)
+    {
+        var stderr = result.Stderr.Trim();
+        var stdout = result.Stdout.Trim();
+        if (stdout.Length > 512)
+            stdout = stdout[..512] + "...";
+        if (string.IsNullOrWhiteSpace(stderr))
+            stderr = "(no stderr)";
+        return string.IsNullOrWhiteSpace(stdout)
+            ? $"rhwp API bridge failed with exit code {result.ExitCode}: {stderr}"
+            : $"rhwp API bridge failed with exit code {result.ExitCode}: {stderr}; stdout: {stdout}";
+    }
+
+    private static string? EngineVersion(string rhwp)
+    {
+        try
+        {
+            var result = RunProcess(rhwp, ["--version"]);
+            return result.ExitCode == 0 ? result.Stdout.Trim() : null;
+        }
+        catch
+        {
+            return null;
+        }
+    }
+
+    private static string Sha256(string path)
+    {
+        using var stream = File.OpenRead(path);
+        return Convert.ToHexString(SHA256.HashData(stream)).ToLowerInvariant();
+    }
+
+    private static int Error(string message, string code)
+    {
+        WriteJson(new ErrorResponse(false, new BridgeError(message, code)));
+        Console.Error.WriteLine(message);
+        return 1;
+    }
+
+    private static int Help()
+    {
+        Console.WriteLine("rhwp-officecli-bridge create-blank|read-text|render-svg|render-png|export-pdf|export-markdown|document-info|diagnostics|dump-controls|dump-pages|thumbnail|list-fields|get-field|set-field|replace-text|insert-text|get-cell-text|scan-cells|set-cell-text|convert-to-editable|native-op|save-as-hwp --format hwp|hwpx [--input <path>] [--op <json>] [--output <path>] --json");
+        return 0;
+    }
+
+    private static string? DiscoverExecutable(string? explicitPath, string[] names)
+    {
+        if (!string.IsNullOrWhiteSpace(explicitPath))
+            return File.Exists(explicitPath) ? explicitPath : null;
+
+        foreach (var dir in CandidateDirectories())
+        {
+            foreach (var name in names)
+            {
+                var candidate = Path.Combine(dir, name);
+                if (File.Exists(candidate)) return candidate;
+            }
+        }
+
+        var pathEnv = Environment.GetEnvironmentVariable("PATH") ?? "";
+        foreach (var dir in pathEnv.Split(Path.PathSeparator))
+        {
+            if (string.IsNullOrWhiteSpace(dir)) continue;
+            foreach (var name in names)
+            {
+                var candidate = Path.Combine(dir, name);
+                if (File.Exists(candidate)) return candidate;
+            }
+        }
+        return null;
+    }
+
+    private static IEnumerable<string> CandidateDirectories()
+    {
+        var seen = new HashSet<string>(StringComparer.Ordinal);
+        foreach (var dir in new[]
+        {
+            AppContext.BaseDirectory,
+            Path.GetDirectoryName(Environment.ProcessPath ?? ""),
+            Directory.GetCurrentDirectory()
+        })
+        {
+            if (string.IsNullOrWhiteSpace(dir)) continue;
+            var full = Path.GetFullPath(dir);
+            if (seen.Add(full)) yield return full;
+        }
+    }
+
+    private static void WriteJson<T>(T value)
+        => Console.WriteLine(JsonSerializer.Serialize(value, JsonOptions()));
+
+    private static JsonSerializerOptions JsonOptions()
+        => new() { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };
+}
+
+internal sealed class TempDirectory : IDisposable
+{
+    private TempDirectory(string path) => Path = path;
+    public string Path { get; }
+    public static TempDirectory Create()
+    {
+        var path = System.IO.Path.Combine(System.IO.Path.GetTempPath(), $"officecli_rhwp_bridge_{Guid.NewGuid():N}");
+        Directory.CreateDirectory(path);
+        return new TempDirectory(path);
+    }
+    public void Dispose()
+    {
+        try { Directory.Delete(Path, recursive: true); } catch { }
+    }
+}
+
+internal sealed record ProcessResult(int ExitCode, string Stdout, string Stderr);
+internal sealed record TextPage(int Page, string Text);
+internal sealed record SvgPage(int Page, string Path, string Sha256);
+internal sealed record TextResponse(string Text, IReadOnlyList<TextPage> Pages, string? EngineVersion, string[] Warnings, string Format);
+internal sealed record SvgResponse(IReadOnlyList<SvgPage> Pages, string Manifest, string? EngineVersion, string[] Warnings, string Format);
+internal sealed record ErrorResponse(bool Success, BridgeError Error);
+internal sealed record BridgeError(string Message, string Code);
diff --git a/src/rhwp-officecli-bridge/rhwp-officecli-bridge.csproj b/src/rhwp-officecli-bridge/rhwp-officecli-bridge.csproj
new file mode 100644
index 000000000..686cd23d9
--- /dev/null
+++ b/src/rhwp-officecli-bridge/rhwp-officecli-bridge.csproj
@@ -0,0 +1,11 @@
+<Project Sdk="Microsoft.NET.Sdk">
+
+  <PropertyGroup>
+    <OutputType>Exe</OutputType>
+    <TargetFramework>net10.0</TargetFramework>
+    <AssemblyName>rhwp-officecli-bridge</AssemblyName>
+    <ImplicitUsings>enable</ImplicitUsings>
+    <Nullable>enable</Nullable>
+  </PropertyGroup>
+
+</Project>
diff --git a/structure/01-file-function-map.md b/structure/01-file-function-map.md
new file mode 100644
index 000000000..42df58888
--- /dev/null
+++ b/structure/01-file-function-map.md
@@ -0,0 +1,52 @@
+# File/function map
+
+## Top-level layout
+
+| Path | Purpose |
+| --- | --- |
+| `src/officecli` | Main CLI executable, command registration, document handlers, MCP/skill integration, resources. |
+| `src/rhwp-officecli-bridge` | C# sidecar bridge executable/project for experimental rhwp integration. |
+| `src/rhwp-field-bridge` | Rust rhwp API sidecar source for field/text/table operations. Build outputs are intentionally ignored. |
+| `tests/OfficeCli.Tests` | xUnit test suite for handlers, HWP/HWPX gates, safe-save, schemas, and regression coverage. |
+| `tests/fixtures` | HWP/HWPX corpus manifests, common expected-capability/provider/round-trip/visual fixtures, sample documents. |
+| `schemas/help` | Embedded schema-driven help by format and element. Formats include `docx`, `xlsx`, `pptx`, `hwp`, `hwpx`. |
+| `schemas/interfaces` | JSON schemas for capability, provider, sidecar, validation, diff, edit, and safe-save contracts. |
+| `docs/qa` | Phase 36 compatibility corpus, visual thresholds, provider matrix, release-gate documentation. |
+| `docs/providers` | Provider boundary docs, currently the rhwp sidecar contract. |
+| `docs/safety` | Safe-save policy for HWP/HWPX mutations.
|
+| `skills` | Embedded agent skills including officecli base/specialized skills and `officecli-hwpx`. |
+| `examples` / `assets` / `styles` | Sample documents, visual assets, and design style packs. |
+
+## Main command files
+
+| File | Responsibility |
+| --- | --- |
+| `Program.cs` | Startup, help rewrite, early commands (`mcp`, `install`, `skills`, `load_skill`, `config`), logging/update hooks. |
+| `CommandBuilder.cs` | Root command and resident open/close setup; registers partial command builders. |
+| `CommandBuilder.View.cs` / `.View.Help.cs` | `view` modes: text, annotated, outline, stats, issues, html, svg, screenshot, HWP/HWPX field modes. |
+| `CommandBuilder.GetQuery.cs` | `get` and `query` path/selector reads. |
+| `CommandBuilder.Set.cs` | Generic set, selected pseudo-path, document protection, HWP/HWPX mutation dispatch. |
+| `CommandBuilder.Set.Hwp*.cs` | HWP/HWPX field/text/table mutation helpers and safe in-place policy errors. |
+| `CommandBuilder.Add.cs` | `add`, `remove`, `move`, `swap`, and merge/template helpers. |
+| `CommandBuilder.Raw.cs` | `raw`, `raw-set`, `add-part` OpenXML fallback operations. |
+| `CommandBuilder.Batch.cs` | JSON batch execution in one open/save cycle. |
+| `CommandBuilder.Import.cs` | CSV/TSV import plus `create`/`new`. |
+| `CommandBuilder.Capabilities.cs` | Machine-readable HWP/HWPX capability report. |
+| `CommandBuilder.Schema.cs` | Schema export/validation helpers for help/interface files. |
+| `CommandBuilder.Help*.cs` | Schema-driven help and HWP/rhwp help/doctor. |
+| `CommandBuilder.IntegrationStubs.cs` | Help-visible stubs for early-dispatch commands. |
+
+## Handler map
+
+| Handler area | Formats | Notes |
+| --- | --- | --- |
+| `Handlers/Word*`, `Handlers/Word/**` | `.docx` | OpenXML Word create/read/edit/render/query. |
+| `Handlers/Excel*`, `Handlers/Excel/**` | `.xlsx` | Workbook/sheet/cell/formula/chart/pivot/table/data operations. |
+| `Handlers/PowerPoint*`, `Handlers/Pptx/**` | `.pptx` | Slide/shape/media/chart/theme/morph/HTML/SVG operations. |
+| `Handlers/Hwpx/**` | `.hwpx` | Custom ZIP/XML HWPX handler, validation, view, path, raw, set/diff/import helpers. |
+| `Handlers/Hwp/**` | `.hwp`, `.hwpx` via provider | Capability report, engine selection, custom HWPX engine, rhwp bridge engine, typed errors. |
+| `Handlers/DocumentHandlerFactory.cs` | all supported extensions | Opens the correct handler by extension. |
+
+## Embedded resources
+
+`src/officecli/officecli.csproj` embeds preview CSS/JS, `Resources/base.hwpx`, chart resources, all `skills/**`, root `SKILL.md`, and `schemas/help/**/*.json` for single-file distribution.
diff --git a/structure/02-command-reference.md b/structure/02-command-reference.md
new file mode 100644
index 000000000..c35a963a3
--- /dev/null
+++ b/structure/02-command-reference.md
@@ -0,0 +1,90 @@
+# Command reference
+
+This reference follows `Program.cs`, `CommandBuilder.cs`, and the generated help schemas. Use `officecli help` and format-specific help as the runtime source of truth.
+
+## Early-dispatch commands
+
+| Command | Purpose | Evidence |
+| --- | --- | --- |
+| `officecli mcp` | Start MCP stdio server. | `Program.cs`, `CommandBuilder.Help.cs` |
+| `officecli mcp <client>` | Register MCP with clients such as Claude, Cursor, VS Code/Copilot. | `Program.cs` |
+| `officecli mcp uninstall <client>` / `mcp list` | Unregister or inspect MCP registrations. | `Program.cs` |
+| `officecli install [target]` | Install binary, skills, and MCP integration. | `Program.cs`, `Core/Installer.cs` |
+| `officecli skills ...` / `officecli skill ...` | Install embedded skills to agents; the singular alias is accepted. | `Program.cs`, `Core/SkillInstaller.cs` |
+| `officecli load_skill <name>` | Print embedded skill content without installing. | `Program.cs`, `CommandBuilder.Help.cs` |
+| `officecli config <key> [value]` | Read/write update/config settings. | `Program.cs`, `Core/UpdateChecker.cs` |
+
+## Registered root commands
+
+| Command | Primary use |
+| --- | --- |
+| `open <file>` / `close <file>` | Start/stop a resident process for faster repeated edits. |
+| `watch <file>` / `unwatch <file>` | Live HTML preview server with selection/mark support. |
+| `mark`, `unmark`, `get-marks`, `goto` | Advisory marks and browser selection/navigation over a running watch session. |
+| `view <file> <mode>` | Read/render document views. Modes include `text`, `annotated`, `outline`, `stats`, `issues`, `html`, `svg`, `screenshot`, `fields`, `field`. |
+| `get <file> <path>` | Read one node by OfficeCLI path. |
+| `query <file> <selector>` | CSS-like element queries. |
+| `set <file> <path> --prop k=v` | Modify node properties; HWP/HWPX special paths include `/field`, `/text`, and HWP `/table/cell`. |
+| `add <file> <path> --type <type>` | Add typed elements. |
+| `remove`, `move`, `swap` | Reorganize elements. |
+| `raw`, `raw-set`, `add-part` | Raw OpenXML/XML escape hatch. |
+| `validate <file>` | Schema/package validation. Use `view issues` and render/app-open proof for layout/content checks. |
+| `batch <file> <json>` | Execute JSON command arrays in one open/save cycle. |
+| `import <file> <csv>` | Import CSV/TSV data into Excel/HWPX paths where supported. |
+| `create` / `new <file>` | Create blank `.docx`, `.xlsx`, `.pptx`, `.hwpx`, and capability-gated `.hwp` when rhwp sidecars are ready. |
+| `merge