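diff --git a/.gitignore b/.gitignore
index 534f2b8..d710180 100644
--- a/.gitignore
+++ b/.gitignore
@@ -58,6 +58,18 @@ Thumbs.db
 /releases/
 /artifacts/
 
+# Generated sample corpus (programmatically regenerable; `npm run samples:generate`)
+/samples/generated/
+
+# Heavy OCR runtime + model vendors (reproducible via vendor scripts + local download;
+# bundled into the app build from disk, not committed):
+# onnxruntime-web: `npm i onnxruntime-web && npm run vendor:onnx`
+# tesseract.js: `npm i tesseract.js && npm run vendor:tesseract`
+# PP-OCRv5 ONNX models: `npm run vendor:paddle` (pinned source + SHA-256, scripts/paddleocr-models.manifest.json)
+/public/vendor/onnxruntime/
+/public/vendor/paddleocr/
+/public/vendor/tesseract/
+
 # Test screenshots / debug captures
 /test-results/
 /playwright-report/
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 195e93e..b9cc0c5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,12 +4,30 @@
 
 ## [Unreleased]
 
+## [2.3.0] - 2026-05-30
+
+### Added
+
+- **Local OCR (PP-OCRv5)**: Images / scanned PDFs are recognized on-device via ONNX Runtime + WebGPU (WASM fallback). The full pipeline includes image preprocessing (ImageNet normalization + limit_side_len + multiples of 32), DB detection post-processing (connected components + unclip expansion), CTC greedy decoding + dictionary alignment, cls orientation correction (180°), 90° trial rotation for vertical/sideways text, automatic deskew for arbitrary angles (angle estimated via shear projection), adaptive median denoising (only noisy images are denoised; clean images are left untouched), layout structure recognition (merging into headings + paragraphs by font size/spacing), and recognition quality scoring (grade / confidence / low-confidence lines / deskew / denoise). The app ships with a set of PP-OCRv5 mobile models that are loaded into the local cache automatically at startup and work out of the box; they can be imported/replaced in the Security Center. Verified against real models (rec decodes the word image as "PAIN", product label 0.978, upside-down 0.976, +10° skew recovered from 8 to 16 lines).
+- **Lightweight OCR (Tesseract.js)**: An optional lightweight OCR engine; tessdata is imported on demand in the Security Center; routed alongside PP-OCRv5 through the priority-aware `pickForTask` (paddle preferred).
+- **Three-layer post-conversion verification**: Rule diff (field-level structural comparison) + SSIM visual loopback + OCR readback, all written to `qualityReport.{ruleDiff,ssim,ocrReadback}` + the `verification` envelope; visible in the workbench "Conversion Verification Report", including an OCR recognition quality row. The SSIM core, text similarity, and rule diff are all zero-dependency pure functions with full Node coverage.
+- **LaTeX math rendering**: Protected tokenization for `$...$` / `$$...$$` (backslashes / underscores preserved verbatim, currency not misdetected); the preview typesets with local KaTeX, with zero network access.
+- **Repair Engine and on-demand model cache**: RepairAction contract + rule-driven validator/handler + re-verification loop; model-cache manifest / SHA-256 / state machine / Security Center import UI.
+- **High-quality test sample generator**: `npm run samples:generate` 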
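programmatically produces a sample corpus (gitignored) covering all formats, complex layouts, and varying sizes (large ≥ 3MB).
+
 ### Fixed
 
-- **P7-A Windows desktop release baseline**: Unified the versions in `package.json`, the Tauri config, and the Rust crate to `2.2.0`, and declared the checked-in Windows ICO icon; added a config gate and produced MSI and NSIS installers via a real `npm run desktop:build`.
-- **P8 route loss visibility**: `RoutePlanner` now returns the actual model path; cross-model conversions write `forcedWarnings` and `routeTemperature` to the QualityReport; after a conversion completes, the workbench shows the conversion model with a path-degradation notice.
-- **P8-B executable mappers and path truth**: The first batch of `SemanticDoc <-> WorkbookModel` mappers is now in the actual conversion chain and records `executedMappers`; `PPTX` generative output and the restricted `OFD -> PDF` path make their quality boundaries explicit via `routeClass` and `PATH_NOT_RECOMMENDED`.
-- **Markdown export raw HTML regression**: Preserves the literal `[]` semantics of task lists while restoring `<` / `>` escaping in text nodes, so that exporting plain-text input to `.md` does not activate HTML tags.
+- **Blank home page**: `browser-transformer.js` failed to re-export `getKnownInputFormats`, so `landing-view.js` failed to load in the browser and the home page was blank; added the export plus a module-graph load guard.
+- **OCR never actually triggered**: Conversion went through the synchronous Web Worker path, which bypassed OCR; images/PDFs now go through the async main-thread pipeline, so OCR really runs.
+- **Frozen engine import failure**: The readiness state of `paddleOcrEngine`/`tesseractOCREngine` lived on a frozen object, so the assignment in `ensureProbe()` threw in strict mode and made Security Center imports fail silently; moved to module-level state.
+- **OCR quality data overwritten**: The Repair Engine's `modelReview` overwrote the one from the OCR stage and dropped `ocr/ocrQuality`; the two are now merged and preserved, so recognition quality can be shown in the UI.
+- **P7-A Windows desktop release baseline**: Unified the `package.json`, Tauri config, and Rust crate versions, and declared the Windows ICO icon; config gate + a real `npm run desktop:build` producing MSI/NSIS.
+- **P8 route loss visibility / P8-B executable mappers**: `RoutePlanner` returns the actual model path; `executedMappers` / `routeClass` / `PATH_NOT_RECOMMENDED` are written to the QualityReport.
+- **Markdown export raw HTML regression**: Preserves the literal `[]` semantics of task lists and restores `<` / `>` escaping in text nodes.
+
+### Direction
+
+- **Advanced OCR target adjusted**: Research confirmed that PaddleOCR-VL / MinerU (VLM) cannot be embedded under the constraints of browser/Tauri local execution + zero cloud + a lightweight default package, so the built-in advanced OCR target is set to **PP-OCRv5 (ONNX/WebGPU)**, with VLMs marked as long-term/external resources.
 
 ## [2.2.0] - 2026-05-26
 
diff --git a/DEVELOPMENT_TASKS.md b/DEVELOPMENT_TASKS.md
index ebd9c3a..73b2b64 100644
--- a/DEVELOPMENT_TASKS.md
+++ b/DEVELOPMENT_TASKS.md
@@ -47,17 +47,22 @@
 | P9-A.3 PNG async OCR integration + Repair entry | Done (2026-05-28) | convertContentAsync + runOCRStage write OCR into SemanticDoc; detectOCRLowConfidence joins the Repair Engine default validators |
 | P9-A.4 Scanned PDF OCR detection + Rasterizer skeleton | Done (2026-05-28) | isScannedPdf heuristic + PdfPageRasterizer abstraction + multi-page OCR 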
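stage + convertAsync PDF branch |
 | P9-B OCR → FixedLayoutModel + browser rasterize | Done (2026-05-28) | Multi-page OCR results → FixedLayoutModel (with bbox/confidence/readingOrder) → fixedLayoutToSemantic derives blocks; in the browser, defaultPdfPageRasterizer automatically dynamic-imports the vendored pdfjs |
-| P9-C Three-layer post-conversion verification | Not started | Rule diff, SSIM visual comparison, and OCR readback verification written uniformly to QualityReport |
-| P9-D Advanced OCR | Not started | Large models such as PaddleOCR-VL / MinerU as separate local resources downloaded on demand, with explicit size, memory, and fallback paths |
+| P9-C.1 Three-layer post-conversion verification · rule diff layer | Done (2026-05-29) | Three modules under `public/core/verification/` + `runVerificationStage` orchestration + `qualityReport.ruleDiff` / `qualityReport.verification` envelope; same-format + md↔html loopback readback diff; `blockFingerprint`/`modelFingerprint` extracted for sharing |
+| P9-C.2 Three-layer post-conversion verification · SSIM visual comparison | Done (2026-05-29) | In-house SSIM core (zero dependencies) + visual loopback (input page vs output page) + pixel source abstraction (Node throws / browser canvas / test-injected stub) → written to `qualityReport.ssim`; the async layer runs only in `convertAsync` |
+| P9-C.3 Three-layer post-conversion verification · OCR readback | Done (2026-05-29) | Output PDF rasterized → read back via OCR → compared with the original text by character-level recall/precision/f1 → written to `qualityReport.ocrReadback`; engine/rasterizer reuse ocr-text, covered by Node stubs |
+| P9-D.1 Advanced OCR · PP-OCRv5 engine skeleton | Done (2026-05-29) | Direction confirmed: PP-OCRv5 (ONNX/WebGPU) is the built-in target (VLMs long-term/external); `paddleOcrEngine` contract + ONNX manifest registration + three-stage rejection when unavailable in Node; no runtime introduced, no real inference run |
+| P9-D.2 Advanced OCR · onnxruntime-web vendor + runtime skeleton | Done (2026-05-29) | onnxruntime-web optionalDependency + sync-onnxruntime-vendor + `loadOnnxRuntime`/`pickExecutionProviders` (WebGPU/WASM)/session skeleton; recognize loads through the runtime, Node throws vendor-load-failed |
+| P9-D.3 Advanced OCR · PP-OCRv5 model import and Security Center management | Done (2026-05-29) | Import det/cls/rec onnx in the Security Center (file picker + SHA-256 + IDB, local import with networking forbidden) → ensureProbe marks the engine available once all three files are present; also fixes the frozen-engine ensureProbe assignment bug |
+| P9-D.2.b Advanced OCR · PP-OCRv5 inference pipeline | Done (2026-05-29) | Pure-function det/rec preprocessing + DB connected-component post-processing + CTC greedy decoding + `runPaddlePipeline` orchestration (end-to-end testable with mock sessions); recognize in the browser does decode+session+pipeline, Node still rejects up front at vendor-load |
+| P9-D.4 Advanced OCR · conversion chain integration (routing preference) | Done (2026-05-29) | Priority-aware pickForTask (paddle 20 > tesseract 10 > placeholder 0); the PNG/scanned PDF stages benefit automatically via pickForTask. The P9-D PP-OCRv5 chain is complete; what remains is browser end-to-end verification after importing real models |
 
 For detailed subtasks and acceptance criteria, see 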
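[docs/archive/DEVELOPMENT_HISTORY.md](docs/archive/DEVELOPMENT_HISTORY.md).
 
 ## Next Execution Order
 
-1. **P9-C Three-layer post-conversion verification**: The combination of rule diff, SSIM visual comparison, and OCR readback verification is written uniformly to QualityReport, landing as the project's core differentiating capability.
-2. **P9-D Advanced OCR**: Integrate local parsing models such as PaddleOCR-VL / MinerU; model resources are fully independent and downloaded on demand, with explicit size, runtime memory, fallback paths, and failure messages.
-3. **P7-B Cross-platform release and signing**: Once the conversion capability claims are accurate, complete macOS/Linux installers, signing/notarization, auto-update, platform smoke tests, file associations, and the desktop permission experience in the corresponding build environments.
-4. **Pre-release regression**: `npm test`, `git diff --check`, `npm run release:prepare`, release manifest ignore verification.
+1. **P9-D / P9-C browser end-to-end verification**: In the browser/Tauri, after `npm run vendor:onnx` + importing the real PP-OCRv5 det/cls/rec ONNX models + dictionary in the Security Center, measure real recognition for PNG/scanned PDF → advanced OCR plus the three-layer verification (SSIM/OCR readback actually running). The full P9-D chain .1/.2/.3/.2.b/.4 has landed, with full Node-side coverage via mocks/pure functions; checking in real rendering/OCR fixtures is pending this item.
+2. **P7-B Cross-platform release and signing**: macOS/Linux installers, signing/notarization, auto-update, platform smoke tests, file associations, and the desktop permission experience.
+3. **Pre-release regression**: `npm test`, `git diff --check`, `npm run release:prepare`, release manifest ignore verification.
 
 ## P8-B Results
 
@@ -84,6 +89,24 @@
 
 > Only records from the last 4 weeks are kept; older ones are archived to [docs/archive/DEVELOPMENT_HISTORY.md](docs/archive/DEVELOPMENT_HISTORY.md), and per-release details go to [CHANGELOG.md](CHANGELOG.md).
 
+- **2026-05-30 (OCR recognition quality surfaced in the verification report UI + modelReview preservation fix)**: Surfaces to the user the OCR recognition quality that was already computed but never visible. **Latent bug fixed**: in `format-registry.js`, `_runRepairCycle` overwrote the `modelReview` written by the upstream OCR stage with the Repair Engine's `modelReview` (`engine:"rule-based"`), so the `ocr`/`ocrQuality` sub-objects were dropped and the UI could not read them; it now merges and preserves `priorReview.ocr` / `priorReview.ocrQuality`. The verification report panel in `public/index.html` gains an "OCR recognition quality" row (`#verificationOcrRecognitionRow`, hidden by default, shown only when OCR ran in this conversion). `renderVerificationReport` in `public/app.js` reads `quality.modelReview.ocr` + `.ocrQuality` and renders: engine / line count / confidence / quality grade / low-confidence lines / deskew angle / orientation corrections / denoised; the grade drives the ok·skip·drift coloring. `scripts/browser-smoke-test.js` asserts that `#verificationOcrRecognition` exists. `scripts/ocr-baseline-test.js` adds a regression assertion (on the default repair path of `convertContentAsync`, `result.quality.modelReview.ocr` must survive). All 28 `npm test` scripts pass.
+- **2026-05-30 (OCR layout structure recognition enhancement)**: Delivers the "enhanced recognition of text formatting inside files": multiple OCR lines (with bboxes) are merged by layout into headings + paragraphs instead of being flattened into one big paragraph. Adds `public/core/ocr/ocr-structure.js`: `deriveOcrStructure(lines, opts)` (sorts in reading order y→x; median line height → relative font size decides headings: a line height 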
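≥ median ×1.35 is a heading, with level 1-3 assigned proportionally; a vertical gap between lines > median ×0.7 starts a new paragraph; adjacent lines in the same paragraph are joined directly for CJK / with a space for Latin; without bbox geometry it falls back to a single paragraph, keeping the old behavior) + `blocksFromOcrResult(result)` (concatenates page by page, falling back to fullText when empty). In `png-ocr.js`, `enhanceWithOCR` replaces `paragraphsFromOCR` (old: a whole page's 16 lines joined into 1 paragraph) with `blocksFromOcrResult` (structured headings + paragraphs); the old function and an unused import were removed. **Real-run verification**: product label, 16 lines → 4 structural blocks ("纯臻营养护发素" recognized as a heading, body text merged into paragraphs by spacing). `browser-transformer` exports `deriveOcrStructure`/`blocksFromOcrResult`. Adds `scripts/ocr-structure-test.js` (7 groups: heading by font size, paragraph split by spacing, CJK/Latin joining, reading-order sorting, no-geometry fallback, blocksFromOcrResult across pages, blank lines ignored), wired into `npm test` (the 28th). All 28 `npm test` scripts pass.
+- **2026-05-30 (OCR automatic deskew for skewed documents)**: Fixes recognition collapsing on skewed/rotated documents (photos/scans taken at an angle). Adds pure functions `rotateImageDataByAngle(im, deg)` (arbitrary-angle nearest-neighbor rotation + canvas expansion + white background) + `estimateSkewAngle(probData, mapW, mapH)` (after binarizing and downsampling the det probability map, uses the variance of the shear-projection histogram to find the angle at which text lines are best aligned horizontally, and returns the deskew rotation angle). `runPaddlePipeline` extracts a `detect(image)` helper and accepts `options.deskew=true(default)|false`: the skew angle is estimated from the first det pass, and if `|est| >= minSkew` (default 3°) the image is rotated upright and **detected once more** before recognition; an upright image estimates ≈0, so there is no re-detection and zero overhead; recognition (rec) always runs only once. **Real-run verification**: a document skewed +10° originally gave **8 lines/avgConf 0.535** (detection lost half the lines) and recovered to **16 lines/0.970** after automatic deskew (skewApplied=10); -8° recovers likewise; an upright image has skewApplied=0 and stays at 0.974, unaffected. The `quality` summary gains `skewApplied`. `browser-transformer` exports `rotateImageDataByAngle`/`estimateSkewAngle`. `scripts/paddle-ocr-pipeline-test.js` adds assertions for arbitrary rotation + skew estimation (a synthetic slanted-line prob map yields ~the angle, parallel lines ≈0) (16 groups in total). All 27 `npm test` scripts pass.
+- **2026-05-30 (OCR noise removal · adaptive median denoising)**: Follows the user's direction: convert text content only (do not preserve artistic font styles), focusing on cleaning up noise to improve recognition on noisy/artistic-font backgrounds. Adds pure functions `denoiseImageData` (3×3 median filter per channel, edge-preserving salt-and-pepper removal, alpha passed through) + `estimateNoiseLevel` (samples the proportion [0,1] of isolated pixels that jump sharply from their 3×3 neighborhood median, distinguishing salt-and-pepper noise from text edges). `runPaddlePipeline` accepts `options.denoise = "auto"(default)|true|false`: **auto denoises only when `estimateNoiseLevel > denoiseThreshold` (default 0.05)**, because measurements showed the median filter softens clean images and drags sharp text from 0.974 down to 0.903, so clean images are never denoised. **Real-run verification**: clean document, noise≈0.016 → not denoised, stays at **0.974**; 15% salt-and-pepper noise, noise≈0.10 → denoised, **detection recovers from 4 lines to 16 lines, avgConf 0.692→0.832** (detection collapses under heavy noise; denoising rescues it). The `quality` summary gains `denoised`/`noiseLevel`. `browser-transformer` exports `denoiseImageData`/`estimateNoiseLevel`. `scripts/paddle-ocr-pipeline-test.js` adds three assertion groups for noise estimation/median denoising/auto-gating (14 groups in total). Explicitly **out of scope**: preserving artistic font styles (text content only) / 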
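minAreaRect perspective correction. All 27 `npm test` scripts pass.
+- **2026-05-30 (OCR orientation correction + vertical/sideways text + quality control)**: Strengthens hard-case recognition and quality control. **Orientation correction**: the cls model was measured to output a `[1,2]` softmax (class0=0°/class1=180°, upright→[1,0], flipped→[0,1]); `runPaddlePipeline` uses `interpretClsOutput` to flip 180° crops before rec. **In a real run, all 16 lines of a fully upside-down document were detected as 180° and corrected, then recognized as correct Chinese with avgConf 0.976** (upright 0.974). **Vertical/sideways**: boxes with a height-to-width ratio > `verticalAspect` (default 1.5) additionally try 90° cw/ccw rotations and keep the best by recognition confidence (robust for sideways labels/rotated text; the cost applies only to the few tall boxes). Adds pure functions `rotateImageData180` / `rotateImageData90(dir)` / `interpretClsOutput` (unit tests: 180 is self-inverse, 90 swaps dimensions + corner mapping, cls threshold branches). **Quality control**: `runPaddlePipeline` returns `quality` (`lineCount`/`averageConfidence`/`minConfidence`/`lowConfidenceLines`/`rotatedLines`/`grade` high·medium·low), and `enhanceWithOCR` writes `result.quality` to `metadata.modelReview.ocrQuality`; this works together with the three existing quality chains: per-line confidence + the `detectOCRLowConfidence` validator + P9-C OCR readback verification. The `dbPostProcess` description is updated (including unclip). `browser-transformer` exports the three new functions. `scripts/paddle-ocr-pipeline-test.js` adds three assertion groups for rotation/cls/quality (12 groups in total). This round **resolves** whole-image 180° flips + sideways tall boxes; **still limited**: heavy italics/complex artistic fonts (these need precise polygon boxes from minAreaRect + perspective correction and a stronger rec model; listed as follow-up). All 27 `npm test` scripts pass.
+- **2026-05-30 (PP-OCRv5 inference pipeline validated against real models + unclip fix)**: Ran the real PP-OCRv5 det/cls/rec ONNX models on the Node side with `onnxruntime-node` (+`pngjs`/`jpeg-js`, dev-only `--no-save`) to measure and tune parameters. **Validation results**: on the PaddleOCR word image `word_10.png`, rec decodes "PAIN", and the output class count C=18385 aligns exactly with the dictionary length (blank+18383+space); on a real document image, det outputs a probability map and 16 text boxes are found after thresholding; preprocessing with RGB channels + `(x/255-0.5)/0.5` normalization is correct, no BGR needed. **Key fix**: `dbPostProcess` previously output the axis-aligned bboxes of connected components but **lacked PP-OCR's unclip expansion**, so the crop boxes were too tight and cut off character strokes → garbled recognition across the whole image (avgConf 0.41, garbled CJK). After adding unclip (expanding outward by `distance = area*ratio/perimeter`, `unclipRatio` default 1.6), the whole product-label document is recognized as **coherent, correct Chinese** (avgConf 0.978: 「纯臻营养护发素 / 产品信息/参数 / 【品名】:纯臻营养护发素 / 【净含量】:220ml / 【主要成分】:鲸蜡硬脂醇、燕麦β-葡聚糖…」). `dbPostProcess` in `paddle-ocr-pipeline.js` gains an `unclipRatio` parameter + expansion logic. The DB assertions in `scripts/paddle-ocr-pipeline-test.js` use `unclipRatio:0` to check exact bboxes, plus a new expansion assertion. Adds `scripts/paddle-ocr-integration-test.js`: onnxruntime-node runs the real rec on the fixture `samples/ocr/word-PAIN.png`, asserting "PAIN" + C aligned with the dict; when dependencies/models/fixtures are missing it **skips gracefully with exit 0** (default CI does not fail for a missing 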
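dev-dep), wired into `npm test` (the 27th). Measured rec confidence on this machine: 0.991. This round **confirms that the entire PP-OCRv5 pipeline is correct against real models**; what remains is cls angle rotation correction and precise polygon boxes via minAreaRect+unclip (the current axis-aligned boxes + proportional expansion are already good enough). All 27 `npm test` scripts pass.
+- **2026-05-30 (LaTeX math rendering)**: Adds LaTeX math rendering for `$...$` (inline) / `$$...$$` (block). The inline tokenizer (`public/formats/inline-tokens.js`) recognizes math delimiters **before** escape handling and keeps the content verbatim (no recursion, no escaping, so the backslashes in `\frac`/`\sum` are not swallowed and `_` is not misread as em); an inline heuristic (non-whitespace on the inner side of the delimiters) rules out currency false positives such as `$5 and $10`. `public/core/models/semantic-inlines.js` adds the `createInlineMath` factory + renders the `math` node in three places, plainText/markdown/html: HTML emits `$tex$` (data-tex carries the raw tex for KaTeX; the span text is the delimited tex as a no-JS fallback); markdown round-trips faithfully. Vendors KaTeX v0.17.0 (`katex.min.css` 23KB + `katex.min.js` 265KB + 20 woff2 fonts 300KB, ~592KB in total) into `public/vendor/katex/`. Adds `renderMathIn(root)` in `public/katex-render.js`: typesets `.t2f-math` spans with the global `katex.render` (same-origin vendor, zero network; if katex is not loaded it silently keeps the fallback). The heads of `index.html` / `preview.html` add the katex css + deferred js; `app.js` (after the three preview renders) + `preview.js` call `renderMathIn`. In `scripts/local-security-test.js`, `isLocalVendorAsset` trusts the katex vendor (its http strings are only W3C MathML/SVG namespaces, not network access). Adds `scripts/latex-math-test.js` (7 groups: inline/block tokenization preserving backslashes + underscores, currency exclusion, katex-locatable span, md round trip, plainText + factory, md→html does not emit ``), wired into `npm test` (the 26th). In this round, real math typesetting happens in the browser (KaTeX); the Node side covers tokenization + rendered output. All 26 `npm test` scripts pass.
+- **2026-05-30 (P9-D real PP-OCRv5 model integration + out-of-the-box auto-loading)**: Installs the real runtime and models so that advanced OCR is actually usable (previously P9-D.1~.4 were contract/runtime/pipeline skeletons with no real models). `npm install onnxruntime-web` (optionalDependency, 1.26.0); `scripts/sync-onnxruntime-vendor.js` is tightened to a minimal set (syncs only `ort.min.mjs` + `ort-wasm-simd-threaded.jsep.{mjs,wasm}`; the JSEP build supports both WebGPU and WASM, ~25MB, dropping ~68MB of redundant variants). Downloads the real **PP-OCRv5 mobile** ONNX models (det 4.8MB / cls 0.58MB / rec 16.6MB) + dictionary (74KB) to `public/vendor/paddleocr/` (source: the PP-OCRv5 ONNX files in the OnnxOCR repository). Adds `public/core/ocr/paddle-default-models.js`: in the browser, `ensurePaddleDefaultModels()` idempotently fetches the same-origin `/vendor/paddleocr/` models → writes them to `defaultOCRStorage` (IndexedDB) → `markPaddleOcrVendorReady(true)` + `ensureProbe()`, so advanced OCR works **out of the box without manual import**; a missing vendor is skipped silently (manual import via the Security Center remains possible). At the end of init, `app.js` makes a fire-and-forget 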
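call. `browser-transformer.js` exports `ensurePaddleDefaultModels`. `.gitignore` excludes `/public/vendor/onnxruntime/` and `/public/vendor/paddleocr/` (not committed to git; rebuilt by the vendor scripts + local download, and packaged with the app). `scripts/local-security-test.js`: `paddle-default-models.js` is added to ALLOWED + STRICT; `isLocalVendorAsset` treats the whole onnxruntime vendor directory as trusted (its minified bundle contains CDN strings that cannot be stripped statically; the zero-network guarantee comes from `loadOnnxRuntime` pinning `wasmPaths` to the same origin + the Tauri CSP `connect-src 'self'`). `scripts/resource-budget-test.js`: public/vendor budget 6MB → 64MB (covering onnxruntime + the PP-OCRv5 models, with rationale). **Real ONNX inference can only run in the browser/WebGPU**; the Node side remains fully covered by mocks/pure functions; for the browser end-to-end measurements see `docs/PP_OCRV5_BROWSER_VERIFICATION.md`. All 25 `npm test` scripts pass.
+- **2026-05-29 (P9-D.4 advanced OCR wired into the conversion chain · routing preference)**: Closes out P9-D so that PP-OCRv5 is preferred over tesseract when available. New spec `docs/superpowers/specs/2026-05-29-p9d4-ocr-route-preference-design.md`. `pickForTask` in `public/core/ocr/ocr-engine.js` becomes **priority-aware**: candidates are sorted by `priority` descending and the first available one is picked (`priority` defaults to 0); when none is available it falls back to the last one (behavior unchanged). `paddle-ocr-engine.js` adds `priority: 20`, `tesseract-engine.js` adds `priority: 10` (placeholder defaults to 0). The PNG / scanned PDF stages (`enhanceWithOCR` / `runScannedPdfOCRStage`) automatically select the highest-priority available engine via `defaultOCRRegistry.pickForTask("ocr-text")`, **with no changes to stage code**. `scripts/ocr-baseline-test.js` group 38 (custom registry: high-priority available wins + available low-priority beats unavailable high-priority + default registry with both paddle/tesseract available → picks paddleocr-v5; deleting the paddle models → back to tesseract-zh-en). This round **does not change** the PNG/scanned PDF stages and **does not run** real ONNX in Node (real-model end-to-end is manual in the browser). With this, the P9-D PP-OCRv5 local advanced OCR chain (contract → runtime → model import → inference pipeline → routing preference) is complete. All 25 `npm test` scripts pass.
+- **2026-05-29 (P9-D.2.b PP-OCRv5 inference pipeline)**: Implements the real PP-OCRv5 inference pipeline, chaining the three det/cls/rec ONNX forward passes + classic pre/post-processing into an OCRResult. New spec `docs/superpowers/specs/2026-05-29-p9d2b-paddle-inference-pipeline-design.md`. Adds `public/core/ocr/paddle-ocr-pipeline.js`: pure functions `parseCharDictionary` (blank@0 + lines + trailing space) / `preprocessForDetection` (ImageNet normalization + multiples of 32 + limit_side_len=960 + scaleW/H) / `preprocessForRecognition` (height 48 + (x/255-0.5)/0.5) / `dbPostProcess` (threshold binarization + 4-connected-component BFS + axis-aligned bbox + filtering by mean box probability + scaling back to the original image + sorting top→bottom, left→right) / 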
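`ctcGreedyDecode` (per-timestep argmax → collapse consecutive repeats → drop blank(0) → map to the dictionary + mean conf) / `cropImageData` / `resizeRgba` (nearest neighbor); the orchestrator `runPaddlePipeline({ ort, detSession, clsSession, recSession, imageData, dictionary, options })` → OCRResult (accepts injectable sessions, so mocks can test it end to end). `paddle-ocr-engine.js` recognize stage three: `loadOnnxRuntime` → `decodeImageToImageData` (Image+canvas, no fetch, honoring the no-network rule; throws in Node) → takes the det/cls/rec buffers + the optional dictionary `paddleocr/v5/dict.txt` from `_storage` → `createOcrSession` ×3 → `runPaddlePipeline` → releases the sessions in finally; Node still rejects up front at `loadOnnxRuntime` (never reaching decode). `browser-transformer.js` exports `runPaddlePipeline`/`parseCharDictionary`/`preprocessForDetection`/`preprocessForRecognition`/`dbPostProcess`/`ctcGreedyDecode`/`cropImageData`/`resizeRgba`/`DET_LIMIT_SIDE_LEN`/`REC_IMAGE_HEIGHT`. Adds `scripts/paddle-ocr-pipeline-test.js` (9 groups: dictionary, det/rec preprocessing shape and normalization, resize/crop, DB single/double connected components producing boxes + sorting, CTC collapsing repeats and dropping blanks, `runPaddlePipeline` decoding "HI" end to end with a mock ort+session+synthetic image, argument validation), wired into `npm test` (the 25th). `scripts/local-security-test.js`: `paddle-ocr-pipeline.js` is added to ALLOWED + STRICT. The `scripts/local-model-direction-test.js` guard adds `runPaddlePipeline`/`ctcGreedyDecode`. This round **does not do** cls angle rotation correction (placeholder call only), minAreaRect+unclip high-precision boxes, or multi-column reading order; **does not run** real ONNX in Node (mocks cover the orchestration; real-model end-to-end is manual in the browser); **does not wire** the conversion chain/preference ordering (P9-D.4). All 25 `npm test` scripts pass.
+- **2026-05-29 (P9-D.3 PP-OCRv5 model import and Security Center management + frozen-engine bug fix)**: Lets users import the PP-OCRv5 det/cls/rec ONNX models into the local cache from the Security Center, moving `paddleOcrEngine` from model-missing to ready. Reuses the **local import** pattern of tesseract tessdata (the project forbids networking and the STRICT guard forbids remote URLs; "on-demand download" in this project means local import by the user, with no automatic fetch). New spec `docs/superpowers/specs/2026-05-29-p9d3-paddle-model-management-design.md`. `public/security-center.js`: imports `paddleOcrEngine`/`markPaddleOcrVendorReady`/`PADDLE_OCR_MODEL_FILES`; `renderModelCache` calls `renderPaddleActions` for rows with `engine === "paddleocr"` (import det.onnx/cls.onnx/rec.onnx + clear button); adds `importPaddleModel` (file picker → `arrayBuffer` → `sha256Hex` → `defaultOCRStorage.put("paddleocr/v5/")` → `paddleOcrEngine.ensureProbe()`; only when all three files are present does it set `markPaddleOcrVendorReady(true)` + 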
`STATUS_AVAILABLE`; otherwise `STATUS_VERIFYING` indicates which files still need importing) + `missingPaddleFiles` + `clearPaddleModels`; click delegation adds `[data-import-paddle]`/`[data-clear-paddle]`. **Latent bug fixed**: the readiness state of `paddleOcrEngine` and `tesseractOCREngine` used to live in instance properties (`_modelsReady`/`_tessdataReady`) of an object after `Object.freeze`, so the assignment in `ensureProbe()` threw `Cannot assign to read only property` under ES module strict mode; this made the Security Center tesseract/paddle import flow fail silently in the browser (swallowed by try/catch as "import failed"). Readiness is now held in module-level mutable variables, and the engine objects remain frozen with `Object.freeze`. `scripts/ocr-baseline-test.js` group 37 (all three present → isAvailable true, vendor off → false, one removed → false) + group 11 adds an assertion that `ensureProbe` does not throw. `scripts/browser-smoke-test.js` asserts that `#modelCacheFileInput` exists. `docs/MULTI_MODEL_ARCHITECTURE.md` gains a P9-D.3 section. This round **does not implement** det/cls/rec inference + CTC decoding (P9-D.2.b), **does not do** automatic remote download (it violates the no-network rule), and **does not wire** the conversion chain/preference ordering (P9-D.4). All 24 `npm test` scripts pass.
+- **2026-05-29 (P9-D.2 onnxruntime-web vendor + runtime loading skeleton)**: On top of the P9-D.1 engine skeleton, wires in the ONNX Runtime for PP-OCRv5 (same vendor+skeleton cadence as P9-A.2 tesseract). New spec `docs/superpowers/specs/2026-05-29-p9d2-onnxruntime-vendor-design.md`. Adds `scripts/sync-onnxruntime-vendor.js` (modeled on sync-tesseract-vendor: syncs `ort*.mjs` + `*.wasm` from `node_modules/onnxruntime-web/dist/` to `public/vendor/onnxruntime/`; a missing package exits 0 without blocking) and `public/core/ocr/paddle-ocr-runtime.js` (`loadOnnxRuntime(vendorUrl)` dynamic-imports the same-origin vendored ORT, sets `ort.env.wasm.wasmPaths` to the same origin, and throws `OCR_VENDOR_LOAD_FAILED` in Node; `pickExecutionProviders()` detects `navigator.gpu` → `["webgpu","wasm"]`, otherwise `["wasm"]`; `createOcrSession({ ort, modelBuffer, providers })` / `disposeOcrSession` / `resetOnnxRuntimeCache` skeletons; `PADDLE_VENDOR_PATHS`). Stage three of recognize in `paddle-ocr-engine.js` now goes through `pickExecutionProviders()` + `await loadOnnxRuntime()`; once the browser has the vendor + models installed it rejects with `pipeline-not-wired` (`OCR_ENGINE_FAILED`) (det/cls/rec inference + CTC decoding are left to P9-D.2.b); Node rejects up front at model-missing. `package.json` adds the `onnxruntime-web@^1.20.1` optionalDependency + the `vendor:onnx` script, and `release:prepare` includes the onnx vendor sync. `browser-transformer.js` exports 
`loadOnnxRuntime`/`pickExecutionProviders`/`createOcrSession`/`disposeOcrSession`/`resetOnnxRuntimeCache`/`PADDLE_VENDOR_PATHS`. `scripts/local-security-test.js`: `isLocalVendorAsset` recognizes `public/vendor/onnxruntime/`, and `paddle-ocr-runtime.js` is added to ALLOWED + STRICT. The `scripts/local-model-direction-test.js` multiModel guard adds `onnxruntime-web`. `scripts/ocr-baseline-test.js` group 36 (`pickExecutionProviders` returns `["wasm"]` in Node + `loadOnnxRuntime` throws `OCR_VENDOR_LOAD_FAILED` in Node + with vendor+models simulated as in place, recognize loads through the runtime and throws `OCR_VENDOR_LOAD_FAILED`). The Tauri CSP already includes `wasm-unsafe-eval` + `worker-src blob:` + `connect-src 'self'`, so no change is needed. This round **does not implement** the det/cls/rec inference pipeline + CTC decoding (P9-D.2.b, requires real models + dictionary), **does not do** on-demand model download/UI (P9-D.3), **does not wire** the conversion chain/preference ordering (P9-D.4), and **does not force-install** onnxruntime-web (optionalDependency only). All 24 `npm test` scripts pass.
+- **2026-05-29 (P9-D advanced OCR direction research + P9-D.1 PP-OCRv5 engine skeleton)**: Research verified that the original DEVELOPMENT_TASKS naming "PaddleOCR-VL / MinerU" cannot be embedded under this project's hard constraints (browser/Tauri local + zero cloud + 30–80MB lightweight default package + no Python runtime): PaddleOCR-VL (0.9B VLM) has no mature ONNX/WebGPU path and needs ~500MB + 1–2GB VRAM or a vLLM service; MinerU is a Python/PyTorch/vLLM tool. **With user confirmation**, the built-in P9-D advanced OCR target is changed to **PP-OCRv5 (ONNX Runtime + WebGPU, WASM fallback)**, with VLMs marked as long-term/external resources. Adds the research spec `docs/superpowers/specs/2026-05-29-p9d-advanced-ocr-research.md` (with source links) + the sub-phase spec `2026-05-29-p9d1-paddle-ocr-skeleton-design.md`. P9-D.1 lands skeleton-first: adds `public/core/ocr/paddle-ocr-engine.js` (`paddleOcrEngine` id `paddleocr-v5`, taskCapabilities `["ocr-text","ocr-layout"]`, manifestId `ocr-text.paddleocr.v5`; `isAvailable()` = vendorReady (`__t2fPaddleOcrVendorReady`) && det/cls/rec models in the local cache, always false in Node; `recognize()` rejects in three stages: vendor-not-ready / model-missing / runtime-not-wired; `markPaddleOcrVendorReady` + `ensureProbe`; **no onnxruntime introduced, no real inference run**) and `public/core/ocr/paddle-ocr-bootstrap.js` (registers the engine after tesseract + registers the PP-OCRv5 ONNX ModelManifest, `engine:"paddleocr"`/int8/det/cls/rec perFile placeholders, in `defaultModelCache`, status not-downloaded). `browser-transformer.js` imports paddle-bootstrap at top level + exports 
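`paddleOcrEngine`/`PADDLE_OCR_MANIFEST_ID`/`PADDLE_OCR_MODEL_FILES`/`markPaddleOcrVendorReady`/`ensurePaddleOcrBootstrap`. `scripts/ocr-baseline-test.js`: the two pickForTask fallback-set assertions add `paddleocr-v5` + a new group 35 of paddle skeleton assertions (registration/isAvailable false/manifest registration/recognize two-stage rejection vendor-not-ready+model-missing). `scripts/local-security-test.js`: both modules are added to ALLOWED + STRICT. The `scripts/local-model-direction-test.js` multiModel guard adds `PP-OCRv5`/`ONNX`/`WebGPU`/`paddleOcrEngine` (keeping the existing negative assertion "PaddleOCR-VL/MinerU not bundled by default"). The direction docs `MULTI_MODEL_ARCHITECTURE`/`CONVERSION_ROUTING`/`DESKTOP_APP_ARCHITECTURE`/`DESKTOP_RELEASE_PLAN`/`RESOURCE_BUDGET`/`PRODUCT_STRATEGY` are uniformly changed to "PP-OCRv5 built in + VLMs long-term/external" (keeping the statement "the default package contains no GB-scale models"). This round **does not wire** the onnxruntime runtime (P9-D.2) / on-demand model download UI (P9-D.3) / conversion chain integration and preference ordering (P9-D.4) / VLM embedding / Python sidecar. All 24 `npm test` scripts pass.
+- **2026-05-29 (P9-C.4 conversion verification results UI)**: Surfaces to the user the three-layer verification results that P9-C.1/2/3 computed but the frontend kept discarding. Previously, after a conversion `transformContent` only rebuilt the model with `toConversionDocumentModel` to render basic info, and `result.quality` (including `qualityReport.{ruleDiff,ssim,ocrReadback}` + the `verification` envelope + `autoRepair`) was dropped; the bottom drawer panel had also been removed before UI-A, so the core differentiating capability, "post-conversion verification", was invisible to users. New spec `docs/superpowers/specs/2026-05-29-p9c4-verification-ui-design.md`. `public/index.html` adds `#verificationReportPanel` inside `#outputPreviewPanel` (`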
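` collapsible panel, hidden by default): auto-repair conclusion + one row per layer for rule diff / SSIM / OCR readback (when a layer ran, shows metrics such as fidelity/score/f1; when it did not, shows `verification.skipped[].reason`) + warnings counted by severity + a badge showing the number of verified layers. `public/app.js` adds `renderVerificationReport(quality)` (writes via `textContent` to prevent XSS; `describeRuleDiff`/`describeSsim`/`describeOcrReadback` describe each layer; `data-state=ok/drift/skip` drives the coloring) + the `currentConversionQuality` state; `transformContent` captures `result.quality` and renders it uniformly before the text/binary paths; `resetGeneratedOutput` clears the panel. `public/styles.css` adds `.verification-report` / `.verification-row` / `.verification-badge` rules (slate+teal + `:has()` left-border coloring). `scripts/browser-smoke-test.js` asserts that `#verificationReportPanel` exists, and the negative assertions for the removed bottom drawer (`bottomReportPanel`/`warningsPanel`/`qualityReportPanel`/`diffPanel`/`versionsPanel`) still hold. This round **does not revive** the old drawer, **does not change** the conversion core / verification-stage / format-registry, **does not check in** real rendering fixtures, and **does not introduce** new dependencies. All 24 `npm test` scripts pass.
+- **2026-05-29 (high-quality test sample corpus generator)**: Adds a programmatic sample corpus covering all supported formats, complex layouts, and varying sizes (large ≥ 3MB), used to stress-test conversion, layout, and the three-layer verification. Adds `scripts/lib/sample-content.js` (pure-function deterministic content builders: `buildComplexMarkdown` / `buildComplexHtml` / `buildComplexJson` / `buildComplexXml` / `buildComplexCsv` / `buildComplexText`, covering multi-level headings/nested lists/task items/aligned tables/multi-language code blocks/nested quotes/footnotes/images/CJK/RTL/emoji/entities; `SIZE_TIERS` small/medium/large + `buildToTargetBytes` approaching a target byte size), `scripts/lib/png-encode.js` (minimal PNG encoder on node:zlib + `buildPatternPng`), and `scripts/generate-samples.js` (CLI: md/html/txt/json/xml/csv produced directly, docx/pptx/epub/pdf/xlsx produced via the project writers, png encoded; `--tiers` / `--out` parameters; writes `MANIFEST.json` with coverageGaps recording that doc/ofd have no writer). Output goes to `samples/generated/` (added to `.gitignore`, not committed, consistent with the programmatic fixture policy in `samples/fixtures/README.md`). Measured large tier: md/html/txt/json/xml/csv all ≥3MB, docx 13.6MB, pdf 19.4MB, xlsx 16MB, epub 4.9MB. Adds `scripts/sample-corpus-test.js` (8 assertion groups: builder determinism + size scaling + complex structure coverage + CSV edge-case fields + md/csv readability across the full matrix + PNG signature/dimensions + buildToTargetBytes approximation), included in `npm test`. `package.json` adds the `samples:generate` script. `samples/fixtures/README.md` gains a programmatic corpus section. This round **does not check in** any large samples/binaries (consistent with gitignore + policy), **does not change** the conversion core/writers, and **does not introduce** new npm dependencies (PNG 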
uses the built-in node:zlib); the pptx writer does not scale with content (a known writer limitation; the file is still valid), and doc/ofd having no writer is recorded as a gap in MANIFEST. All 24 `npm test` scripts pass.
+- **2026-05-29 (P9-C.3 three-layer post-conversion verification · OCR readback layer + three-layer wrap-up)**: Lands the third layer, OCR readback, completing the three post-conversion verification layers (rule-diff + SSIM + OCR readback). The spec was written first: `docs/superpowers/specs/2026-05-29-p9c3-ocr-readback-design.md`. Adds `public/core/verification/ocr-readback.js`: `compareText(original, recognized)`, a zero-dependency pure function (character-level multiset recall/precision/f1 + `normalizeText` NFKC + lowercase + whitespace removal; robust across Chinese/English and OCR noise, no tokenization needed) + `extractModelText(model)` (concatenates heading/paragraph/quote/list/table/code/content text) + `runOcrReadbackLayer({ model, output, ctx, engine?, rasterizer? })` (eligibility: `ctx.to === "pdf"`, non-empty original text, and an available OCR engine; rasterizes the output PDF via the OCR `defaultPdfPageRasterizer` → `engine.recognize({ image: dataUrl })` → `compareText` → `qualityReport.ocrReadback = { recall, precision, f1, threshold, passed, engineId, originalLength, recognizedLength, averageConfidence, pageIndex }`; below the threshold (default f1≥0.7) it emits an info `OCR_READBACK_DRIFT`; engine unavailable → eligible:false with reason `ocr-engine-unavailable`, rasterizer unavailable → `rasterizer-unavailable`, recognize throwing → info `OCR_READBACK_FAILED`; none of these throw or block). Constant `DEFAULT_OCR_READBACK_THRESHOLD`. In `verification-stage.js`, `runVerificationStageAsync` dynamic-imports `ocr-readback.js` at the end to run the third layer (so the sync convert path does not load the OCR module), merging `layers`/`skipped`/`warnings` + the `ocrReadback` field. In `format-registry.js`, `_assembleQuality` adds `ocrReadback: verification.ocrReadback ?? 
null` (always null on the sync path). `browser-transformer.js` exports at top level `compareText`/`normalizeText`/`extractModelText`/`runOcrReadbackLayer`/`OCR_READBACK_DRIFT`/`OCR_READBACK_FAILED`/`DEFAULT_OCR_READBACK_THRESHOLD`. Adds `scripts/ocr-readback-test.js` (13 assertion groups: compareText identical/subset/CJK/empty, normalizeText, extractModelText, runOcrReadbackLayer stub end-to-end/drift warning/non-pdf not eligible/engine unavailable/rasterizer unavailable, runVerificationStageAsync three-layer merge with ocr-readback skipped under Node, convertContentAsync md→md ocrReadback=null). The `package.json` test chain inserts `ocr-readback-test.js` after `ssim-verification-test.js`. `scripts/local-security-test.js`: `ocr-readback.js` is added to ALLOWED + STRICT. The `scripts/local-model-direction-test.js` guard adds `runOcrReadbackLayer`/`compareText`/`OCR_READBACK_DRIFT`. `docs/MULTI_MODEL_ARCHITECTURE.md` gains an "OCR readback layer" section + a three-layer summary. This round is **rendering+OCR stub-only** (Node has no canvas/tessdata; real OCR readback end-to-end is left to manual browser verification), **does not let** the Repair Engine consume ocrReadback, **does not do** multi-page readback/per-block localization, **does not introduce** new dependencies, **does not wire** advanced OCR (P9-D), and **does not change** sync `convert()` semantics / the `options.repair === false` short-circuit / the UI. All 23 `npm test` scripts pass.
+- **2026-05-29 (P9-C.2 three-layer post-conversion verification · SSIM visual loopback layer)**: On top of the P9-C.1 envelope, lands the second layer, SSIM visual comparison, with visual loopback semantics (the input page and the rasterized output page are compared by structural similarity). The spec was written first: `docs/superpowers/specs/2026-05-29-p9c2-ssim-visual-loopback-design.md`. Adds `public/core/verification/ssim.js` (zero-dependency pure functions: `rgbaToGrayscale` + `resampleGrayscale` box resampling onto a common grid + `computeSSIM` mean SSIM over non-overlapping 8×8 windows, `SSIM_C1=6.5025`/`SSIM_C2=58.5225` + `compareImages` scoring two images after normalization, `DEFAULT_TARGET_WIDTH=256`), `public/core/verification/page-image-source.js` (pixel source abstraction: `defaultPageImageSource` throws `VERIFICATION_IMAGE_SOURCE_UNAVAILABLE` in Node and, in the browser, dynamic-imports the browser implementation on first call; `setPageImageSource`/`resetPageImageSource` for injection; `RASTERIZABLE_FORMATS = {pdf,png}`), and `public/core/verification/page-image-source-browser.js` (PDF: dataUrl from `createBrowserPdfPageRasterizer` → `Image` → canvas → `getImageData`; PNG: directly `Image` → canvas → `getImageData`). `verification-stage.js` adds `runSsimLayer({ ctx, output })` (eligibility: from/to ∈ 
{pdf,png}; takes the source and output image pixels → `compareImages` → `qualityReport.ssim={score,threshold,passed,width,height,pageIndex,sourceFormat,outputFormat}`; below the threshold (default 0.85) it emits an info `SSIM_VISUAL_DRIFT`; image source unavailable → eligible:false with reason `image-source-unavailable`, without throwing) + `runVerificationStageAsync` (sync rule-diff base + async SSIM merged into the envelope) + constants `SSIM_VISUAL_DRIFT`/`SSIM_SOURCE_UNAVAILABLE`/`DEFAULT_SSIM_THRESHOLD`. `format-registry.js` extracts the shared helpers `_runRepairCycle` + `_assembleQuality`; `convert()` (sync) goes through `_wrapWithRepairCycle` (rule-diff, `qualityReport.ssim` always null), and `convertAsync()` now goes through `_wrapWithRepairCycleAsync` (rule-diff + SSIM). `browser-transformer.js` exports at top level `computeSSIM`/`compareImages`/`rgbaToGrayscale`/`resampleGrayscale`/`runVerificationStageAsync`/`runSsimLayer`/`defaultPageImageSource`/`setPageImageSource`/`resetPageImageSource`/`RASTERIZABLE_FORMATS`/`SSIM_VISUAL_DRIFT`/`VERIFICATION_IMAGE_SOURCE_UNAVAILABLE` and others. Adds `scripts/ssim-verification-test.js` (12 assertion groups: grayscale/resampling/computeSSIM identical images=1, black vs white≈0, monotonicity, normalization across different sizes, runSsimLayer with injected stub end-to-end/drift warning/non-visual path not eligible, defaultPageImageSource throws in Node, runVerificationStageAsync merges both layers, convertContentAsync text path ssim=null + visual path stub fills ssim). The `package.json` test chain inserts `ssim-verification-test.js` after `rule-diff-test.js`. `scripts/local-security-test.js`: the three modules are added to ALLOWED + STRICT. The `scripts/local-model-direction-test.js` guard adds `computeSSIM`/`runVerificationStageAsync`/`SSIM_VISUAL_DRIFT`. `docs/MULTI_MODEL_ARCHITECTURE.md` gains an "SSIM visual loopback layer" section. This round is **rendering stub-only** (Node has no canvas; real PDF/PNG rendering fixtures + end-to-end verification of the browser canvas pixel wiring are left for later), **does not do** OCR readback (P9-C.3), **does not introduce** new npm dependencies, **does not support** OFD source images/multi-page, and **does not change** sync `convert()` semantics / the `options.repair === false` short-circuit / the UI. All 22 `npm test` scripts pass.
+- **2026-05-29 (P9-C.1 three-layer post-conversion verification · rule diff layer)**: Lands the first layer of the core differentiating capability, "post-conversion verification", and pins down the unified three-layer data contract. Two specs were written first: `docs/superpowers/specs/2026-05-29-p9c-three-layer-verification-design.md` (the overall P9-C spec: envelope `qualityReport.verification` = `{ eligible, layers, skipped, runtimeMs }` + the three layer fields `ruleDiff`/`ssim`/`ocrReadback` + the relationship with the Repair Engine 
/ RoutePlanner + sub-phase order) and `docs/superpowers/specs/2026-05-29-p9c1-rule-diff-design.md` (the P9-C.1 sub-spec). Adds three modules under `public/core/verification/`: `block-fingerprint.js` (extracts `blockFingerprint` / `modelFingerprint` from `repair-engine.js`, behavior byte-for-byte unchanged; adds the `getBlockKey` stable alignment key / `extractBlockFields` field subset / `BLOCK_FIELDS_BY_TYPE` constant; single source for `ROUND_TRIP_FORMATS`), `rule-diff.js` (`diffSemanticDocs(original, readBack)` → `{ identical, blockCounts, changedBlocks, addedBlocks, removedBlocks, fidelity, overallScore }`; alignment by `getBlockKey` + a secondary LCS-lite heuristic alignment; field-level deep-equal classified into minor/major severity; `fidelity` ∈ exact/minor-drift/major-drift/broken; `overallScore` is a weighted penalty from `MAJOR_WEIGHT=0.4`/`MINOR_WEIGHT=0.05`/`STRUCTURAL_PENALTY=0.5`, clamped to [0,1]), and `verification-stage.js` (`runVerificationStage({ model, output, ctx })` orchestration: eligibility is from/to ∈ text-canonical + `output.data` being a string; same-format reads back directly via `ctx.read` and diffs; cross-format only enables the `md ↔ html` loopback (`prepareConversionModel` reversed + `write` + `read`, repair:false to avoid recursion); a readback that throws emits `RULE_DIFF_READBACK_FAILED`, `fidelity !== "exact"` emits `RULE_DIFF_DRIFT`, both info-level and non-blocking). `repair-engine.js` deletes its local copies of `blockFingerprint`/`modelFingerprint`/`ROUND_TRIP_FORMATS` and imports the shared module; the `reverifyRoundTrip` / `roundTripDelta` contract is untouched, field for field. In `format-registry.js`, `_wrapWithRepairCycle` runs `runVerificationStage` after the Repair cycle and layers the result onto `quality.qualityReport.ruleDiff` and `.verification`; verification warnings are merged back into metadata via `withWarnings` + `ensureDocumentAudit`. `browser-transformer.js` exports at top level `runVerificationStage` / `diffSemanticDocs` / `blockFingerprint` / `modelFingerprint` / `getBlockKey` / `extractBlockFields` / `BLOCK_FIELDS_BY_TYPE` / `ROUND_TRIP_FORMATS` / `MAJOR_WEIGHT` / `MINOR_WEIGHT` / `STRUCTURAL_PENALTY` / `RULE_DIFF_DRIFT` / `RULE_DIFF_READBACK_FAILED`. Adds `scripts/rule-diff-test.js` (10 assertion groups: diffSemanticDocs identical/whitespace minor/semantic major/heading-level major/several missing blocks broken; runVerificationStage readback identical/readback throws/non text-canonical not eligible; end-to-end md→md identical, md→html cross-format loopback works, md→pdf not eligible; shared fingerprints byte-for-byte equivalent to the legacy implementation). In the `package.json` test chain, after 
`repair-engine-test.js` 之后插入 `rule-diff-test.js`。`scripts/local-security-test.js` 把三模块加入 ALLOWED + STRICT 白名单。`scripts/local-model-direction-test.js` 守门关键词加 `runVerificationStage` / `diffSemanticDocs` / `RULE_DIFF_DRIFT`(multiModel)+ `qualityReport.ruleDiff`(tasks)。`docs/MULTI_MODEL_ARCHITECTURE.md` 新增「转换后检验三层 · 规则 diff 层」章节。本轮**不做** SSIM 视觉对比(P9-C.2)/ OCR 回读(P9-C.3);**不让** Repair Engine 消费 ruleDiff(避免循环依赖);**不动** writer/reader 输出语义 / UI / `options.repair === false` 短路;**不引入** 新 npm 依赖;**不开** 除 md↔html 外的跨格式回环。`npm test` 21 个脚本全量通过。 - **2026-05-28 (P9-B OCR → FixedLayoutModel + 浏览器 rasterize 真实化)**:把 OCR 结果接到第三个规范模型 FixedLayoutModel 上、把浏览器/Tauri 端 PDF rasterize 开箱即用。新增 `public/core/ocr/ocr-to-fixed-layout.js`(`ocrResultToFixedLayoutPage` 按 bbox.y → bbox.x 排序 + 携带 confidence;`mergeOCRResultsToFixedLayout` 多页合并 + `metadata.readingOrder = "heuristic-yx"` + `metadata.ocr` 总览;复用 `createFixedLayoutModel` / `createPage` / `createTextRun`)、`public/core/ocr/pdf-rasterizer-browser.js`(`createBrowserPdfPageRasterizer`:dynamic import `/vendor/pdfjs/pdf.min.mjs` + `getDocument({ data })` → `page.getViewport` → `.toDataURL("image/png")`;失败抛 `OCR_RASTERIZER_FAILED` 含 cause)。改造 `public/core/ocr/pdf-rasterizer.js`:`defaultPdfPageRasterizer` 优先级 inject → 浏览器自动 → throw `OCR_RASTERIZER_UNAVAILABLE`;首次调用自动 `import("./pdf-rasterizer-browser.js")`,Node 检测 `globalThis.document?.createElement` 缺失即放弃;`resetPdfPageRasterizer` 同时清两个缓存。改造 `public/core/ocr/scan-pdf-stage.js`:收集每页 pageResult 数组 → `mergeOCRResultsToFixedLayout` → `model.fixedLayout = fixedLayout` → `fixedLayoutToSemantic` 派生 blocks → 发 `MODEL_VISUAL_FIDELITY_LOST` + `MODEL_TEXT_ORDER_HEURISTIC` info warning → `metadata.modelReview.ocr.fixedLayout = getFixedLayoutSummary(...)`;`metadata.ocr.lines` 保留供 Repair Engine validator 使用。`public/core/models/fixed-layout.js` `createTextRun` 加 `confidence`(clamp 到 [0,1])+ `createPage` 加 `readingOrderHint`,不破坏现有 OFD / PDF reader 调用。`browser-transformer.js` 顶层 export 
`ocrResultToFixedLayoutPage` / `mergeOCRResultsToFixedLayout` / `READING_ORDER_HEURISTIC` / `createBrowserPdfPageRasterizer` / `MODEL_VISUAL_FIDELITY_LOST` / `MODEL_TEXT_ORDER_HEURISTIC` / `createFixedLayoutModel` / `createFixedLayoutPage` / `createFixedLayoutTextRun` / `createFixedLayoutBbox` / `getFixedLayoutSummary` / `fixedLayoutToSemantic`。`scripts/ocr-baseline-test.js` 扩展为 34 组断言(+`ocrResultToFixedLayoutPage` y/x 排序 + confidence 携带;`mergeOCRResultsToFixedLayout` 多页合并 + `getFixedLayoutSummary` 计数;`runScannedPdfOCRStage` stub 端到端后 `model.fixedLayout.pages.length === 2` + `MODEL_VISUAL_FIDELITY_LOST` + `MODEL_TEXT_ORDER_HEURISTIC` warning;`defaultPdfPageRasterizer` inject → auto-browser → throw 优先级)。`scripts/local-security-test.js` 把 `ocr-to-fixed-layout.js` + `pdf-rasterizer-browser.js` 加入 ALLOWED + STRICT 白名单。`scripts/local-model-direction-test.js` 守门关键词加 `ocrResultToFixedLayoutPage` / `mergeOCRResultsToFixedLayout` / `createBrowserPdfPageRasterizer` / `MODEL_TEXT_ORDER_HEURISTIC`。新 spec `docs/superpowers/specs/2026-05-28-p9b-ocr-fixedlayout-design.md`。本轮**不实现高级阅读顺序算法**(multi-column / heading detection 留给 P9-C/D;用 y → x 启发式 + warning)、**不入库真实扫描 PDF fixture**(继续用 stub)、**不动同步 convert() / PNG enhance / Repair Engine handlers / Tauri CSP / npm 依赖**。`npm test` 20 个脚本全量通过。 - **2026-05-28 (P9-A.4 扫描 PDF OCR 检测 + Rasterizer 骨架 + 多页 stage)**:把 OCR 扩展到扫描型 PDF 路径。新增 `public/core/ocr/pdf-rasterizer.js`(`isScannedPdf(content, options)` 启发式:基于 `expandPdfContentForTextExtraction` + 检测 `PDFJS_PAYLOAD_MARKER` + 字符阈值 300;无 payload → 扫描,有 payload 但 < 阈值 → 扫描;`PdfPageRasterizer` 抽象 + `defaultPdfPageRasterizer` Node 默认抛 `OCR_RASTERIZER_UNAVAILABLE`;`setPdfPageRasterizer(impl)` / `resetPdfPageRasterizer()` 让测试注入 stub)、`public/core/ocr/scan-pdf-stage.js`(`runScannedPdfOCRStage(model, ctx)`:拿 rasterizer + engine → countPages → 循环 rasterize 每页 → engine.recognize → 把多页 paragraph blocks 顺序追加到 model + `metadata.ocr.lines` 含 `pageIndex` / `blockId` + `metadata.modelReview.ocr` 总览 
pageCount/lineCount/averageConfidence/runtimeMs;错误统一注入 `OCR_ENGINE_FAILED` warning 返回原 model;可选 `options.ocr.maxScanPages`(默认 5)/`dpi`/`scanPdfThreshold`)。`public/core/format-registry.js` `convertAsync` 新增 PDF 分支:检测扫描 PDF → dynamic import `runScannedPdfOCRStage` → 注入 OCR enhancement;文本 PDF 沿用 P8-B 既有路径。`browser-transformer.js` 顶层 export `isScannedPdf` / `runScannedPdfOCRStage` / `defaultPdfPageRasterizer` / `setPdfPageRasterizer` / `resetPdfPageRasterizer` / `OCR_RASTERIZER_UNAVAILABLE` / `OCR_RASTERIZER_FAILED`。`scripts/ocr-baseline-test.js` 扩展为 30 组断言(+isScannedPdf 对无 payload 最小 PDF 返回 scanned=true、defaultPdfPageRasterizer Node 抛 OCR_RASTERIZER_UNAVAILABLE、runScannedPdfOCRStage stub 端到端 2 页追加 + metadata.modelReview.ocr.pageCount=2、convertContentAsync PDF → txt 走 OCR 分支输出含 stub OCR 文本)。`scripts/local-security-test.js` 把 `pdf-rasterizer.js` / `scan-pdf-stage.js` 加入 ALLOWED + STRICT 白名单。`scripts/local-model-direction-test.js` 守门关键词加 `isScannedPdf` / `runScannedPdfOCRStage` / `defaultPdfPageRasterizer`。新 spec `docs/superpowers/specs/2026-05-28-p9a4-scan-pdf-ocr-design.md`。本轮**不实现真实浏览器端 pdfjs canvas 渲染**(留给 P9-B;defaultPdfPageRasterizer Node 默认抛错,浏览器/Tauri 通过 `setPdfPageRasterizer` 注入实现)、**不在仓库加扫描 PDF fixture**(用最小 PDF 头部 + stub rasterizer 覆盖代码路径)、**不动同步 convert() / PNG 异步 stage / Tauri CSP / npm 依赖**。`npm test` 20 个脚本全量通过。 - **2026-05-28 (P9-A.3 PNG 异步 OCR 接入 + Repair Engine OCR 入口)**:把 P9-A.2.b 提供的 `enhanceWithOCR` 接到 PNG 转换链路上、把 OCR 元数据接到 Repair Engine 上。新增 `public/core/ocr/ocr-stage.js`(`runOCRStage(model, ctx)` 包一层 `enhanceWithOCR` + 注入 `OCR_ENGINE_FAILED` 兜底;提供 `getDefaultOCRLanguage`)、`public/core/ocr/ocr-validator.js`(`detectOCRLowConfidence` 从 `metadata.ocr.lines` 取 confidence < 0.55 的行生成 `replaceTextRun` 候选;每页最多 8 条;evidence 含 engineId / language / bbox / pageIndex / lineIndex)。`public/core/ocr/png-ocr.js` 的 `enhanceWithOCR` 现在把 `pages[].lines` 一一写入 `model.metadata.ocr.lines`,含 `{ pageIndex, lineIndex, text, confidence, bbox, blockId }`,让 validator 能用 
blockId 反查 paragraph。`public/core/format-registry.js` 抽出 `_buildRepairCtx` 与 `_wrapWithRepairCycle` 共享 helper,新增 `async convertAsync(...)`:与 `convert()` 同样的入口校验和 prepareConversionModel;当 `options?.ocr?.enabled !== false && fromFormat === "png"` 时 await dynamic-import `runOCRStage` 注入 OCR enhancement;之后走同一 `_wrapWithRepairCycle`。`public/core/repair-engine.js` 的 `createDefaultRepairEngine()` 注册 `detectOCRLowConfidence`(在 DEFAULT_VALIDATORS 之后)。`public/browser-transformer.js` 顶层 export `convertContentAsync` / `runOCRStage` / `getDefaultOCRLanguage` / `detectOCRLowConfidence`。`public/app.js` 的 `convertWithWorker(payload)` 在 worker 不可用时检测 `payload.from === "png"` 改走 `convertContentAsync`,其他格式仍走 `convertContent`。`samples/png/t2f-sample.data-url.txt`(80×24 灰度 PNG,白底黑字"T2F",118 字节 base64 后约 182 字符)+ `samples/png/README.md` 说明用途与浏览器端真实 OCR 验证步骤。`scripts/ocr-baseline-test.js` 扩展为 26 组断言(+convertContentAsync 在 ocr.enabled=false 时返回 writer payload、stub engine 注册后输出文本包含 stub OCR 内容、runOCRStage 持久化 metadata.ocr.lines、detectOCRLowConfidence 对 confidence < 0.55 / >= 0.55 两种场景的行为、t2f-sample fixture 不抛错)。`scripts/local-model-direction-test.js` 守门关键词加 `convertContentAsync` / `runOCRStage` / `detectOCRLowConfidence`。新 spec `docs/superpowers/specs/2026-05-28-p9a3-async-ocr-pipeline-design.md`。本轮**不破坏现有同步 convert() / convertContent()**(所有 20 个测试脚本与 smoke-test 调用方完全不变),不做扫描 PDF,不修改 Tauri CSP,不引入新 npm 依赖,不真实跑 OCR 在 npm test(用 stub engine 覆盖代码路径)。`npm test` 20 个脚本全量通过。 @@ -131,5 +154,6 @@ - `npm test` - `git diff --check` - `npm run release:prepare` -- `git check-ignore -v release\trans2former-2.2.0\RELEASE_MANIFEST.json` +- `git check-ignore -v release\trans2former-2.3.0\RELEASE_MANIFEST.json` +- `npm run samples:generate`(按需生成覆盖全格式的复杂/大样例语料到 samples/generated/,能力压力测试用) - `npm run desktop:build`(Windows P7-A / 发布前安装包验证) diff --git a/README.md b/README.md index e2e0cee..e66ef2a 100644 --- a/README.md +++ b/README.md @@ -5,8 +5,8 @@ Trans2Former 是一个专业级的桌面文档转换工具,支持 12 种输入格式和 11 
种输出格式的相互转换。所有转换在本地完成,零上传,保护您的数据隐私。 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE) -[![Tests](https://img.shields.io/badge/smoke-46%20groups%20passing-brightgreen.svg)](#) -[![Version](https://img.shields.io/badge/version-2.2.0-blue.svg)](#) +[![Tests](https://img.shields.io/badge/tests-28%20scripts%20passing-brightgreen.svg)](#) +[![Version](https://img.shields.io/badge/version-2.3.0-blue.svg)](#) --- @@ -16,8 +16,9 @@ Trans2Former 是一个专业级的桌面文档转换工具,支持 12 种输入 - 🚀 **高性能** - 基于 Web Worker 的并行处理 - 📦 **零依赖** - 不需要安装 Office、LibreOffice 或 Pandoc - 🎨 **实时预览** - 转换前后实时预览文档 -- 📝 **结构化编辑** - 支持编辑转换后的文档结构 -- 🧩 **核心内置增强** - OFD、OCR、版面分析等能力代码核心内置;OCR 模型资源按需本地下载到 model-cache,不进入默认安装包 +- 🔤 **本地 OCR** - 图片 / 扫描 PDF 用内置 PP-OCRv5(ONNX Runtime + WebGPU)本地识别,含方向校正、倾斜纠偏、自适应去噪、版面结构(标题/段落)与质量评分 +- 🧮 **LaTeX 渲染** - 预览中用本地 KaTeX 排版 `$...$` / `$$...$$` 数学公式 +- ✅ **转换后检验** - 规则 diff + SSIM 视觉对比 + OCR 回读三层组合统一写入 QualityReport,工作台可视 - 🌍 **多语言** - 支持中英文、RTL 文本等 - ⚡ **无大小限制** - 不设置人为文件大小上限 @@ -120,23 +121,24 @@ Trans2Former/ - **批量转换** - 同时转换多个文件 - **编辑输出** - 直接编辑转换后的文本 - **版本历史** - 查看和恢复历史版本 -- **质量报告** - 查看转换质量和警告 -- **质量报告** - 查看转换质量和警告 +- **转换检验报告** - 规则 diff / SSIM / OCR 回读 + OCR 识别质量评分 --- ## 🧩 核心本地增强 -Trans2Former 不再提供插件安装模式,增强能力代码直接并入核心本地模块;默认安装包目标 30–80 MB,不内置 GB 级模型,相关模型资源按需本地下载到 model-cache: +Trans2Former 不再提供插件安装模式,增强能力代码直接并入核心本地模块;模型资源不进入 git 仓库,由 vendor 脚本 + 本地下载重建,随应用打包: -- **OFD 支持** - 政务格式支持 -- **本地 OCR** - 扫描文档识别(首次启用时下载本地 OCR 模型到 model-cache,识别全程本机执行) -- **版面分析** - 复杂布局识别 -- **表格恢复** - PDF 表格提取 -- **转换后检验** - 规则 diff + SSIM 视觉对比 + OCR 回读三层组合写入 QualityReport -- **高级 OCR**(规划中)- PaddleOCR-VL / MinerU 等大模型作为独立本地资源按需获取 +- **本地 OCR(PP-OCRv5)** - 图片 / 扫描 PDF 经 ONNX Runtime + WebGPU(WASM 回退)在本机识别;含 cls 方向校正(180°,可选模型)、任意角倾斜自动纠偏、自适应中值去噪、版面结构识别(按字号/间距归并标题+段落)、识别质量评分(grade / 置信度 / 低置信行 / 纠偏 / 去噪),全程零联网、零上传 +- **轻量 OCR(Tesseract.js)** - 可选的轻量 OCR 引擎,按需在安全中心导入 tessdata +- **转换后检验三层** - 规则 diff + SSIM 视觉对比 + OCR 回读统一写入 QualityReport,工作台「转换检验报告」可视 +- **LaTeX 
数学渲染** - 本地 KaTeX,零联网 +- **OFD 支持 / 版面分析 / 表格恢复** - 核心内置,持续攻坚 +- **高级 OCR(远期)** - PaddleOCR-VL / MinerU 等 VLM 受浏览器/Tauri 本地运行时限制,作为远期/外部资源评估(详见 docs) -这些能力不通过插件包分发;后续实现必须继续保持本地执行、无上传、可解释降级和资源预算约束。 +> 运行高级 OCR:`npm install onnxruntime-web && npm run vendor:onnx`。PP-OCRv5 mobile 检测/识别模型与字典由 `npm run vendor:paddle` 从钉定来源下载、SHA-256 校验(见 [scripts/paddleocr-models.manifest.json](scripts/paddleocr-models.manifest.json))并随 `release:prepare` 打包,启动自动载入、开箱即用;方向分类(cls)为可选,可在安全中心导入/替换。详见 [docs/PP_OCRV5_BROWSER_VERIFICATION.md](docs/PP_OCRV5_BROWSER_VERIFICATION.md)。 + +这些能力不通过插件包分发;实现继续保持本地执行、无上传、可解释降级和资源预算约束。 --- @@ -160,14 +162,14 @@ Trans2Former 严格遵守本地优先原则: npm test ``` -测试覆盖: -- ✅ 核心转换测试(44/44 通过) -- ✅ 快照测试 -- ✅ 格式能力审计 -- ✅ 安全测试 -- ✅ 资源预算测试 -- ✅ 本地安全测试 -- ✅ 发布就绪测试 +测试覆盖(28 个脚本全量通过): +- ✅ 核心转换 / 快照 / 格式能力审计 +- ✅ 转换检验三层(规则 diff / SSIM / OCR 回读) +- ✅ OCR 管线(预处理 / DB 后处理 / CTC 解码 / 方向 / 倾斜 / 去噪 / 结构)+ 真实模型集成(onnxruntime-node,缺依赖优雅跳过) +- ✅ LaTeX 数学 tokenization +- ✅ 安全 / 资源预算 / 方向门禁 / 发布就绪 + +> 复杂/大体积样例语料:`npm run samples:generate` 程序化产出覆盖全格式、大小不一(large ≥ 3MB)的测试样例到 `samples/generated/`(gitignore)。 --- @@ -221,7 +223,7 @@ npm test 1. **复杂样式** - 部分复杂样式可能无法完全保留 2. **图表动画** - PPTX 动画和图表需要后续核心增强 -3. **扫描 PDF** - 扫描文档需要后续核心 OCR 能力 +3. **OCR 难例** - 强斜体 / 复杂艺术字识别仍受限(常规、倾斜、倒置、带噪文档已支持);真实 ONNX 推理在浏览器 / Tauri(WebGPU/WASM)执行 4. 
**ZIP64** - 暂不支持超大 ZIP 文件

 这些限制将在后续版本中通过核心本地模块逐步解决。
@@ -231,20 +233,19 @@ npm test
 ## 🗺️ 路线图

 ### 已完成 ✅
-- [x] P0-P8 核心功能
-- [x] 12 种输入格式
-- [x] 11 种输出格式
-- [x] 核心本地能力路线
-- [x] 桌面发布准备
+- [x] P0-P8 核心功能 + 12 种输入 / 11 种输出格式
+- [x] 转换后检验三层(规则 diff + SSIM + OCR 回读)+ 工作台可视
+- [x] 本地 OCR(PP-OCRv5:识别 + 方向校正 + 倾斜纠偏 + 去噪 + 版面结构 + 质量评分)
+- [x] LaTeX 数学渲染(KaTeX)
+- [x] Windows 桌面发布(MSI / NSIS)

 ### 进行中 🚧
-- [ ] 平台安装包构建
-- [ ] SSIM 视觉对比
-- [ ] 性能优化
+- [ ] 跨平台安装包(macOS / Linux)+ 签名公证
+- [ ] OCR 表格结构识别 → Markdown 表格

 ### 计划中 📋
-- [ ] 本地 OCR 核心增强
-- [ ] 版面分析核心增强
+- [ ] 强斜体 / 艺术字识别增强
+- [ ] 高级 OCR(PaddleOCR-VL / MinerU)本地运行时评估
 - [ ] 更多格式支持

 ---
diff --git a/RELEASE_NOTES_v2.3.0.md b/RELEASE_NOTES_v2.3.0.md
new file mode 100644
index 0000000..016c73d
--- /dev/null
+++ b/RELEASE_NOTES_v2.3.0.md
@@ -0,0 +1,47 @@
+# Trans2Former v2.3.0
+
+本地优先的多格式文档转换器。本版把**本地 OCR**从骨架做到真机可用,新增 **LaTeX 数学渲染**与**转换后检验三层可视化**,全程零上传、零联网。
+
+## 新增
+
+### 本地 OCR(PP-OCRv5)
+图片 / 扫描 PDF 经 **ONNX Runtime + WebGPU**(WASM 回退)在本机识别,应用内置一套 PP-OCRv5 mobile 模型,启动自动载入、开箱即用,可在安全中心导入/替换。完整管线:
+
+- 图像预处理 + **DB 检测后处理(连通域 + unclip 外扩)** + **CTC 解码 + 字典对齐**
+- **方向校正**(cls 180°)、**竖排/侧向 90° 试转**
+- **任意角倾斜自动纠偏**(错切投影估角 → 旋正重检)
+- **自适应去噪**(仅噪图去噪,干净图不受影响)
+- **版面结构识别**(按字号/间距归并为标题 + 段落 + 阅读顺序)
+- **识别质量评分**(grade / 置信度 / 低置信行 / 纠偏 / 去噪),工作台「转换检验报告」可视
+
+> 真机实测:rec 解词图为 "PAIN";产品标签 0.978;倒置文档 0.976;+10° 倾斜 8→16 行恢复至 0.970;15% 椒盐噪点 4→16 行恢复。
+
+### LaTeX 数学渲染
+`$...$` / `$$...$$` 受保护识别(反斜杠 / 下划线逐字保留,货币不误判),预览用本地 **KaTeX** 排版,零联网。
+
+### 转换后检验三层
+规则 diff + SSIM 视觉对比 + OCR 回读,统一写入 `qualityReport` 并在工作台可视;核心算法零依赖、纯函数。
+
+### 其他
+- 轻量 OCR(Tesseract.js)可选引擎 + 优先级路由
+- Repair Engine + 按需模型缓存(manifest / SHA-256 / 安全中心导入)
+- `npm run samples:generate`:全格式、大小不一(≥3MB)测试样例语料生成器
+
+## 修复
+
+- 首页空白(缺 `getKnownInputFormats` re-export)
+- OCR 实际不触发(转换绕过 OCR 异步管线)
+- 冻结引擎导致安全中心模型导入静默失败
+- OCR 识别质量被 Repair Engine 覆盖丢失
+
+## 方向调整
+
+高级 OCR 内置目标从 **PaddleOCR-VL / MinerU(VLM)** 调整为 **PP-OCRv5(ONNX/WebGPU)**——VLM 在浏览器/Tauri 本地 + 零云端 +
轻量默认包约束下不可内嵌,标注为远期/外部资源。
+
+## 升级指南
+
+从 v2.2.x 升级:直接覆盖部署即可,无破坏性 API 变更。运行高级 OCR 需 `npm install onnxruntime-web && npm run vendor:onnx`(应用已内置 PP-OCRv5 模型,开箱即用)。
+
+## 全部测试通过
+
+`npm test` 全套 **28 个脚本**通过;`git diff --check`、`npm run release:prepare` 通过。Windows MSI / NSIS 安装包经真实 `npm run desktop:build` 产出。
diff --git a/docs/CONVERSION_ROUTING.md b/docs/CONVERSION_ROUTING.md
index 3dd0fcb..3757af0 100644
--- a/docs/CONVERSION_ROUTING.md
+++ b/docs/CONVERSION_ROUTING.md
@@ -150,6 +150,6 @@ P8-B 的首批执行链已闭环。后续按 2026-05-28 [lightweight-default-bun
 - **P9-A OCR 基线**:PNG 与扫描 PDF 接入轻量 OCR(Tesseract.js / 轻量 PaddleOCR),OCR 模型资源按需下载到 model-cache,不进入默认安装包。
 - **P9-B OCR → FixedLayoutModel**:OCRResult → FixedLayoutModel → SemanticDoc,保留 bbox / confidence / page index / reading order,让 Repair Engine 的 `fixedLayoutToSemantic` 路径获得真实证据。
 - **P9-C 转换后检验**:规则 diff + SSIM 视觉对比 + OCR 回读三层组合统一写入 QualityReport,作为核心差异化能力提升。
-- **P9-D 高级 OCR**:PaddleOCR-VL / MinerU 等大模型作为独立本地资源按需下载,明确体积、运行内存、降级路径。
+- **P9-D 高级 OCR**:内置目标为 PP-OCRv5(ONNX Runtime + WebGPU,WASM 回退),作为比 Tesseract 更高精度的本地 OCR engine 按需下载 ONNX 模型;PaddleOCR-VL / MinerU 等 VLM 为远期/外部资源(浏览器/Tauri 本地暂不可内嵌)。

 在 P9-A 启动前必须先完成 **S3 按需下载与本地缓存治理**:定义 model-cache 目录结构、manifest、checksum、可清理入口、断网降级提示和首次启用下载提示流程。`slide` / `fixedLayout` 路径的 writer 能力和视觉质量证据在 P9-B/C 推进过程中补齐;在证据成立前不将规划边提升为实际 mapper 执行。
diff --git a/docs/DESKTOP_APP_ARCHITECTURE.md b/docs/DESKTOP_APP_ARCHITECTURE.md
index 03e5042..331201d 100644
--- a/docs/DESKTOP_APP_ARCHITECTURE.md
+++ b/docs/DESKTOP_APP_ARCHITECTURE.md
@@ -133,7 +133,8 @@ Tauri + Web-GUI 是当前最合理的桌面路线。
 | 重格式 | 核心本地按需加载 |
 | 默认安装包 | 30–80 MB,不打包 GB 级模型 |
 | OCR 模型 | 不进入默认安装包;首次启用时本地下载到 model-cache,可清理、可禁用 |
-| 高级 OCR(PaddleOCR-VL / MinerU 等) | 独立本地资源,按需获取,明确体积与硬件要求 |
+| 高级 OCR(PP-OCRv5 ONNX/WebGPU) | 内置目标:比 Tesseract 更高精度的本地 OCR engine,按需下载 ONNX 模型到 model-cache |
+| VLM(PaddleOCR-VL / MinerU 等) | 远期/外部资源;浏览器/Tauri 本地暂不可内嵌,明确体积与硬件要求 |

 ## 当前实现

diff --git a/docs/DESKTOP_RELEASE_PLAN.md b/docs/DESKTOP_RELEASE_PLAN.md
index
55a0513..994efd1 100644 --- a/docs/DESKTOP_RELEASE_PLAN.md +++ b/docs/DESKTOP_RELEASE_PLAN.md @@ -76,7 +76,7 @@ Trans2Former__checksums.sha256 - OCR 模型资源不进入默认安装包;首次启用时本地下载到 model-cache,必须提供 manifest、checksum、缓存路径、可清理入口、体积报告、断网降级提示和失败 fallback,处理过程不上传任何文档内容。 - `release:prepare` 必须依次执行 `sync-pdfjs-vendor` 与 `sync-tesseract-vendor`;后者在 `tesseract.js` optionalDependency 缺失时退出 0,不阻塞 CI/发布流程。 - Tauri CSP 必须保留 `'wasm-unsafe-eval'`(让本地 tesseract.js wasm 在 WebView 中可实例化),且 `connect-src 'self'` 不可放开 —— 模型资源仅同源 vendor 与本地 IndexedDB,禁止任何远程 URL。 -- 高级 OCR 资源(PaddleOCR-VL / MinerU 等大模型)作为独立本地资源按需获取,启用前展示体积、运行内存、降级路径和失败提示。 +- 高级 OCR 内置目标为 PP-OCRv5(ONNX Runtime + WebGPU,WASM 回退),ONNX 模型按需下载到 model-cache,启用前展示体积、运行内存、降级路径和失败提示;PaddleOCR-VL / MinerU 等 VLM 为远期/外部资源(浏览器/Tauri 本地暂不可内嵌)。 - 转换后检验三层(规则 diff、SSIM 视觉对比、OCR 回读)必须可在断网状态运行,验证 Repair Engine 修复后的输出质量并写入 QualityReport。 - 文档处理模式始终禁止网络访问。 diff --git a/docs/MULTI_MODEL_ARCHITECTURE.md b/docs/MULTI_MODEL_ARCHITECTURE.md index 6d691fa..514b220 100644 --- a/docs/MULTI_MODEL_ARCHITECTURE.md +++ b/docs/MULTI_MODEL_ARCHITECTURE.md @@ -220,6 +220,59 @@ S2 已落地为 `public/core/repair-engine.js`、`public/core/repair-actions.js` - `createTextRun` 新增 `confidence` 字段(clamp 到 [0,1]);`createPage` 新增 `readingOrderHint` 字段。 - 不实现高级阅读顺序(multi-column / heading detection)——留给 P9-C / P9-D。 +### 转换后检验三层 · 规则 diff 层(P9-C.1 落地) + +转换后检验是项目核心差异化能力,三层组合(规则 diff + SSIM 视觉对比 + OCR 回读)统一写入 `qualityReport`。P9-C.1 落地第一层规则 diff 与统一编排骨架: + +- `runVerificationStage({ model, output, ctx })` (`public/core/verification/verification-stage.js`):在 Repair Engine `runCycle` 之后跑的独立验证阶段,只读不改 output,结果写入 `qualityReport.ruleDiff` 与 `qualityReport.verification` envelope(`eligible / reason / layers / skipped / runtimeMs`)。层名列表 `layers` 当前只含 `"rule-diff"`,P9-C.2 加 `"ssim"`、P9-C.3 加 `"ocr-readback"`。 +- `diffSemanticDocs(original, readBack)` (`public/core/verification/rule-diff.js`):在原始 SemanticDoc 与 writer→reader 回读 model 之间做字段级 diff,输出 `{ identical, blockCounts, changedBlocks, 
addedBlocks, removedBlocks, fidelity, overallScore }`。`fidelity` ∈ `exact / minor-drift / major-drift / broken`;`overallScore` 由 `MAJOR_WEIGHT / MINOR_WEIGHT / STRUCTURAL_PENALTY` 加权惩罚算出。 +- 共享指纹模块 `public/core/verification/block-fingerprint.js`:`blockFingerprint` / `modelFingerprint`(从 Repair Engine 抽出,行为不变)+ `getBlockKey` / `extractBlockFields` / `BLOCK_FIELDS_BY_TYPE` 字段子集 + `ROUND_TRIP_FORMATS` 单一来源。Repair Engine 的 `reverifyRoundTrip` / `roundTripDelta` 改 import 共享 `modelFingerprint`,作为粗粒度兼容层与细粒度 `ruleDiff` 并存。 +- 资格判断:from/to 都在 text-canonical 集合(md/html/json/csv/txt/xml)且 `output.data` 为字符串才跑;同格式直接回读 diff,跨格式仅首批开放 `md ↔ html` 回环;其余 writer(PDF/DOCX/XLSX/PPTX/PNG/OFD/EPUB)记 `eligible: false, reason: "writer-not-text-canonical"`,不阻塞转换。 +- 失败兜底:回读抛错发 `RULE_DIFF_READBACK_FAILED`(info);`fidelity !== "exact"` 发 `RULE_DIFF_DRIFT`(info),details 含 from/to/fidelity/score/added/removed/changed 摘要。 +- 本阶段不让 Repair Engine 消费 `ruleDiff`(避免循环依赖),UI 验证卡片留给 P9-C.2 落地后统一做。 + +### 转换后检验三层 · SSIM 视觉回环层(P9-C.2 落地) + +第二层 SSIM 视觉对比采用**视觉回环**语义:对视觉保真型输入(PDF / PNG),把输入页与输出页各自栅格化为像素做结构相似度对比,写入 `qualityReport.ssim`,衡量「这次转换是否保住视觉外观」。 + +- `computeSSIM(grayA, grayB, width, height, opts)` (`public/core/verification/ssim.js`):纯函数、零依赖、非重叠窗口均值 SSIM(C1=6.5025 / C2=58.5225);配套 `rgbaToGrayscale` / `resampleGrayscale`(box 重采样到公共网格)/ `compareImages`(两图归一后算分)。Node / 浏览器均可运行、完整可测。 +- `runVerificationStageAsync({ model, output, ctx })` (`public/core/verification/verification-stage.js`):异步编排,先调同步 `runVerificationStage` 拿 rule-diff 基底,再跑 `runSsimLayer` 视觉回环,合并 `layers` / `skipped` / `warnings` / `runtimeMs` + `ssim` 字段。同步 `runVerificationStage` 不变,供 sync `convert()` 用(其 `qualityReport.ssim` 恒为 `null`)。 +- `runSsimLayer({ ctx, output })`:资格判断 `ctx.from ∈ {pdf,png}` 且 `ctx.to ∈ {pdf,png}`(当前实际命中 `pdf→pdf` / `png→pdf`);经 `defaultPageImageSource` 取源图 + 输出图像素 → `compareImages` → `qualityReport.ssim = { score, threshold, passed, width, height, pageIndex, sourceFormat, outputFormat }`;低于阈值(默认 0.85)发 info 级 
`SSIM_VISUAL_DRIFT`。 +- 像素源抽象 `public/core/verification/page-image-source.js`:`defaultPageImageSource`(Node 抛 `VERIFICATION_IMAGE_SOURCE_UNAVAILABLE`;浏览器首次调用 dynamic import `page-image-source-browser.js` 用 vendor pdfjs + canvas `getImageData` 取 RGBA)+ `setPageImageSource` / `resetPageImageSource` 让测试注入 stub。 +- `format-registry.js` 抽出 `_runRepairCycle` / `_assembleQuality` 共享,`convert()`(sync)走 `_wrapWithRepairCycle`(rule-diff),`convertAsync()` 走 `_wrapWithRepairCycleAsync`(rule-diff + SSIM)。`options.repair === false` 仍短路整个验证阶段。 +- 注意:Trans2Former 的 `pdf → pdf` 走「reader 抽文本 → writer 重排版」,视觉本就不保真,SSIM 偏低是**诚实信号**,故仅发 info warning,不判失败、不阻塞。本轮渲染 stub-only(Node 无 canvas;真实 PDF/PNG 渲染 fixture + 浏览器端端到端验证留给后续)。 + +### 转换后检验三层 · OCR 回读层(P9-C.3 落地) + +第三层也是收口层 OCR 回读:把转换输出(当前仅 PDF)栅格化后用 OCR 引擎读回文本,与原始 SemanticDoc 文本对照,写入 `qualityReport.ocrReadback`,回答「转成视觉格式后文字还认得回来吗」。 + +- `compareText(original, recognized)` (`public/core/verification/ocr-readback.js`):纯函数、零依赖、**字符级多重集** recall / precision / f1(配 `normalizeText` NFKC + 小写 + 去空白)。字符级对中英文混排与 OCR 噪声稳健,无需分词。 +- `extractModelText(model)`:拼接 block 文本(heading/paragraph/quote/list/table/code/content)。 +- `runOcrReadbackLayer({ model, output, ctx, engine?, rasterizer? 
})`:资格 `ctx.to === "pdf"` 且原文非空 且 OCR engine 可用;经 OCR `defaultPdfPageRasterizer` 栅格化输出 PDF → `engine.recognize` → `compareText` → `qualityReport.ocrReadback = { recall, precision, f1, threshold, passed, engineId, originalLength, recognizedLength, averageConfidence }`;低于阈值(默认 f1 ≥ 0.7)发 info `OCR_READBACK_DRIFT`;engine/rasterizer 不可用或 recognize 抛错 → eligible:false(`OCR_READBACK_FAILED` info),不抛、不阻塞。 +- `runVerificationStageAsync` 末尾 dynamic import `ocr-readback.js` 跑第三层,合并进 envelope;`qualityReport.ocrReadback` 同步路径恒 `null`。 +- 命中路径:`md/html/txt/json/xml/docx/doc/epub/csv/xlsx → pdf`(凡产出 PDF 的文本路径),engine 复用已注册的 `ocr-text`(tesseract,需用户导入 tessdata)。 +- 本轮 stub-only(Node 无 canvas/tessdata;真实 OCR 回读端到端留给浏览器手动验证);不让 Repair Engine 消费 `ocrReadback`;高级 OCR 属 P9-D。 + +至此 P9-C 三层检验(rule-diff + ssim + ocr-readback)齐备,统一写入 `qualityReport.{ ruleDiff, ssim, ocrReadback }` + `qualityReport.verification` envelope。 + +### 高级 OCR · PP-OCRv5 (ONNX/WebGPU)(P9-D 方向 + P9-D.1 骨架) + +调研结论([2026-05-29-p9d-advanced-ocr-research.md](superpowers/specs/2026-05-29-p9d-advanced-ocr-research.md)):**PaddleOCR-VL(0.9B VLM)/ MinerU** 在「浏览器/Tauri 本地 + 零云端 + 30–80MB 轻量默认包」约束下当前不可内嵌(VLM 无成熟 ONNX/WebGPU 路径、需 ~500MB + 1–2GB VRAM 或 vLLM 服务;MinerU 是 Python/vLLM 工具)。因此 **P9-D 高级 OCR 的内置目标改为 PP-OCRv5(ONNX Runtime + WebGPU,WASM 回退)**;PaddleOCR-VL / MinerU 标注为**远期/外部资源**,不作为内置路径。 + +P9-D.1 骨架(同 tesseract 骨架先行): + +- `paddleOcrEngine`(`public/core/ocr/paddle-ocr-engine.js`,id `paddleocr-v5`,taskCapabilities `["ocr-text","ocr-layout"]`)实现现有 `OCREngine` 契约,注册到 `defaultOCRRegistry`;`isAvailable()` 检查 vendor 就位 + det/cls/rec 模型在本地缓存,Node/未就位恒 false;`recognize()` 三阶段拒绝(vendor-not-ready / model-missing / runtime-not-wired)。 +- `paddle-ocr-bootstrap.js` 注册 PP-OCRv5 ONNX ModelManifest(`engine: "paddleocr"`,int8,det/cls/rec perFile 占位)到 `defaultModelCache`,状态 `not-downloaded`,按需下载到 `model-cache`。 +- P9-D.1 本轮不引入 onnxruntime-web、不实跑推理。 + +P9-D.2 接入 onnxruntime-web 运行时骨架:`onnxruntime-web` 作为 optionalDependency + 
`scripts/sync-onnxruntime-vendor.js`(缺包 exit 0)同步到 `public/vendor/onnxruntime/`;`public/core/ocr/paddle-ocr-runtime.js` 提供 `loadOnnxRuntime`(dynamic import 同源 vendor ORT,设 `ort.env.wasm.wasmPaths` 同源、Node 抛 `OCR_VENDOR_LOAD_FAILED`)、`pickExecutionProviders`(`navigator.gpu` → `["webgpu","wasm"]`,否则 `["wasm"]`)、`createOcrSession`/`disposeOcrSession` 骨架。`paddleOcrEngine.recognize` 第三阶段经 `loadOnnxRuntime()`,浏览器装好 vendor + 模型后以 `pipeline-not-wired` 拒绝(det/cls/rec 推理管线 + CTC 解码留给 P9-D.2.b)。Tauri CSP 已含 `wasm-unsafe-eval` + `worker-src blob:` + `connect-src 'self'`,无需改动。后续:P9-D.4 接入转换链并让 paddle 在可用时优先于 tesseract。 + +P9-D.3 模型导入与安全中心管理:复用 tesseract tessdata 的**本地导入**模式(禁联网,不做远程 fetch)。安全中心「模型缓存」对 PP-OCRv5 行渲染导入 det/cls/rec onnx + 清除按钮;导入走 file picker → `sha256Hex` → `defaultOCRStorage.put("paddleocr/v5/")` → `paddleOcrEngine.ensureProbe()`,三件齐全才 `markPaddleOcrVendorReady(true)` + 状态 `available`。**同时修复一个潜伏 bug**:`paddleOcrEngine` / `tesseractOCREngine` 的就绪状态原先存在 `Object.freeze` 后的实例属性上,`ensureProbe()` 赋值在严格模式(ES module)下抛 `Cannot assign to read only property`,会让安全中心导入流程静默失败;改为模块级可变变量持有就绪状态,引擎对象仍冻结。 + +P9-D.2.b 推理管线:`public/core/ocr/paddle-ocr-pipeline.js` 提供纯函数 `parseCharDictionary` / `preprocessForDetection`(ImageNet 归一化 + 32 倍数 + limit_side_len)/ `preprocessForRecognition`(高 48 + [-1,1] 归一化)/ `dbPostProcess`(阈值二值化 + 4-连通域 + 轴对齐 bbox + box 分数过滤 + 缩放回原图)/ `ctcGreedyDecode`(逐时刻 argmax → 折叠连续重复 → 去 blank → 映射字典)/ `cropImageData` / `resizeRgba`,以及编排器 `runPaddlePipeline({ ort, detSession, clsSession, recSession, imageData, dictionary, options })` → OCRResult。`paddleOcrEngine.recognize` 在浏览器把 image 解码为 RGBA(Image+canvas,不用 fetch)→ 从本地缓存取 det/cls/rec 模型 + 可选字典 `paddleocr/v5/dict.txt` → `createOcrSession` ×3 → `runPaddlePipeline`;Node/未就位仍在 `loadOnnxRuntime` 前置拒绝。纯函数 + 编排器(mock session)在 Node 完整单测,真实模型端到端为浏览器手动。本轮不做 cls 角度旋转校正、不做 minAreaRect+unclip 高精度框。 + +P9-D.4 接入转换链(路由偏好):`OCREngineRegistry.pickForTask` 改为**优先级感知**——候选按 `priority` 降序挑第一个 available(缺省 0),无可用时回退末位。引擎优先级 
`placeholderOCREngine=0` / `tesseractOCREngine=10` / `paddleOcrEngine=20`,因此 PP-OCRv5 可用时优先于 tesseract。PNG / 扫描 PDF stage(`enhanceWithOCR` / `runScannedPdfOCRStage`)经 `defaultOCRRegistry.pickForTask("ocr-text")` 取引擎,自动选到可用的最高优先级引擎,无需改动。至此 P9-D PP-OCRv5 本地高级 OCR 链路(契约 → 运行时 → 模型导入 → 推理管线 → 路由偏好)齐备,剩真实模型 + 字典导入后的浏览器端端到端验证。
+
 ## 不做什么(明确边界)

 - **不引入 DOCX / HTML / PDF 文件级 pivot**:pivot 是内存对象,不是落盘文件。
diff --git a/docs/PP_OCRV5_BROWSER_VERIFICATION.md b/docs/PP_OCRV5_BROWSER_VERIFICATION.md
new file mode 100644
index 0000000..f97ee72
--- /dev/null
+++ b/docs/PP_OCRV5_BROWSER_VERIFICATION.md
@@ -0,0 +1,55 @@
+# PP-OCRv5 高级 OCR 浏览器端到端验证清单
+
+适用:P9-D 全链路(.1 契约 / .2 onnxruntime-web 运行时 / .3 模型导入 / .2.b 推理管线 / .4 路由偏好)已在 Node 侧以 mock + 纯函数全覆盖。**真实模型推理只能在浏览器/Tauri(WebGPU/WASM)跑**,本清单给出手动验证步骤。
+
+## 前置:准备 vendor 与模型
+
+1. 安装可选依赖并同步 vendor(一次性):
+   ```
+   npm install onnxruntime-web
+   npm run vendor:onnx # 同步 ort*.mjs + *.wasm 到 public/vendor/onnxruntime/
+   ```
+   (`npm install tesseract.js && npm run vendor:tesseract` 同理,可选,用于对比轻量 OCR。)
+
+2. 准备 PP-OCRv5 ONNX 模型与字典:
+   ```
+   npm run vendor:paddle # 从钉定来源下载 det/rec + 字典,SHA-256 校验后写入 public/vendor/paddleocr/
+   ```
+   - `det.onnx`(DB 文本检测)—— **必选**,随包同步
+   - `rec.onnx`(CTC 文本识别)—— **必选**,随包同步
+   - `dict.txt`(PP-OCRv5 keys)—— **必选**,随包同步(默认字典已随包,无需手动导入)
+   - `cls.onnx`(方向分类,180°)—— **可选**,不随包;管线缺它也能跑(跳过方向校正)。如需启用,在安全中心导入键名 `paddleocr/v5/cls.onnx`
+
+   说明:`vendor:paddle` 来源与校验和钉死在 [scripts/paddleocr-models.manifest.json](../scripts/paddleocr-models.manifest.json)(`ppu-paddle-ocr-models`,Apache-2.0,onnx 源自 paddleocr.ai)。模型文件被 `.gitignore` 忽略、不入库,随 `release:prepare` 从磁盘打进发布包。下载仅发生在构建期;转换/识别阶段零联网、零上传。
+
+## 步骤:浏览器/Tauri 内验证
+
+1. 启动:`npm start`(浏览器)或 `npm run desktop:dev`(Tauri)。
+2.
打开**安全中心** → 「模型缓存」card,找到 **PP-OCRv5 高级 OCR (ONNX/WebGPU)** 行:
+   - 若已跑过 `npm run vendor:paddle`,det/rec/dict 随包,启动即自动载入、状态直接 **可用**,无需手动导入。
+   - 手动覆盖/替换:点「导入 det.onnx / rec.onnx」选择对应文件;**必选 det+rec 齐全**后状态即变 **可用**(SHA-256 校验通过)。cls.onnx 为可选导入(方向校正)。
+   - (字典:默认 `dict.txt` 已随包,无需手动导入。自定义字典暂无专用按钮——可临时在 console 调 `defaultOCRStorage.put("paddleocr/v5/dict.txt", buf, {sha256})`;专用按钮列为已知后续。)
+3. 验证**引擎优先级**:在 console 执行
+   ```js
+   const m = await import("/browser-transformer.js");
+   m.defaultOCRRegistry.pickForTask("ocr-text").id; // 期望 "paddleocr-v5"(paddle 可用时优先于 tesseract)
+   ```
+4. 验证**真实识别**:上传一张含文字的 PNG,转换到 `txt` / `md`:
+   - 期望输出包含识别出的文字。
+   - 期望「转换检验报告」面板出现,`ocrReadback` 行显示 f1/recall(若 OCR 引擎可用)。
+5. 验证**扫描 PDF**:上传扫描型 PDF(无文本层),转换:
+   - 期望经 `isScannedPdf` 检测 → rasterize → PP-OCRv5 识别 → 文本输出 + FixedLayoutModel。
+6. 验证**禁联网**:打开 DevTools Network 面板,整个导入 + 识别过程**不应有任何远程请求**(仅同源 vendor / blob / dataURL)。安全中心「对外部请求监控」应为空。
+
+## 通过标准
+
+- 必选 det+rec(+随包 dict)就位后 PP-OCRv5 行为「可用」,`pickForTask` 选中 `paddleocr-v5`;cls 为可选(仅影响 180° 方向校正)。
+- 含文字 PNG / 扫描 PDF 经高级 OCR 得到合理识别文本(精度取决于所用模型)。
+- 三层检验报告出现,rule-diff 恒在文本路径触发,ocrReadback 在 OCR 可用时触发。
+- 全程无远程网络请求。
+
+## 已知后续(不阻塞本验证)
+
+- 字典导入专用按钮(当前可用 console 兜底)。
+- cls 角度旋转校正、minAreaRect+unclip 高精度框、多栏阅读顺序(精度增强)。
+- 真实 PDF/PNG 渲染 baseline 与 SSIM/OCR 回读 fixture 入库。
diff --git a/docs/PRODUCT_STRATEGY.md b/docs/PRODUCT_STRATEGY.md
index 4cf5090..b09f12f 100644
--- a/docs/PRODUCT_STRATEGY.md
+++ b/docs/PRODUCT_STRATEGY.md
@@ -31,7 +31,7 @@ Tauri 桌面壳
 - 中间模型优先:新增格式必须走 `input -> DocumentModel -> output`,避免 `N * N` 转换路线。
 - 软件自动修复:质量问题由本地模型识别并由 Repair Engine 执行结构化修复,用户不承担逐项修复判断。
 - 核心模块优先:默认安装包保持轻量(目标 30–80 MB);热门基础格式免下载;重格式与 OCR 等增强能力代码核心内置,模型资源按需下载到本地 model-cache,默认包不含 GB 级模型。
-- 高保真攻坚:难格式不是回避项,OFD、PDF、Office 复杂文档和本地 OCR 是长期差异化攻坚方向;高级 OCR(PaddleOCR-VL / MinerU 等)作为独立本地资源按需获取,不与默认安装包绑定。
+- 高保真攻坚:难格式不是回避项,OFD、PDF、Office 复杂文档和本地 OCR 是长期差异化攻坚方向;高级 OCR 内置目标为 PP-OCRv5(ONNX/WebGPU,按需下载 ONNX 模型),PaddleOCR-VL / MinerU 等 VLM 作为远期/外部资源,不与默认安装包绑定。
 -
可验证交付:每个阶段都必须有样例、自动化测试、质量报告和可解释 warnings。

 ## 产品壁垒
diff --git a/docs/RESOURCE_BUDGET.md b/docs/RESOURCE_BUDGET.md
index 0373dcb..a99e538 100644
--- a/docs/RESOURCE_BUDGET.md
+++ b/docs/RESOURCE_BUDGET.md
@@ -56,7 +56,7 @@ OCR / 版面 / 表格能力的代码核心内置,但**模型资源不进入默
 - model-cache 目录必须支持:manifest 记录每个模型资产的版本、checksum、量化方式、任务范围、最低内存和 fallback;用户可见的缓存路径、清理入口、禁用入口;断网降级提示与失败 fallback。
 - 缓存包只保存推理资源,不保存训练检查点、优化器状态、标注数据、调试样本或任何用户文档内容。
 - OCR、layout、table、quality-reviewer 共享资源必须去重,避免重复下载 tokenizer、字典、字体、运行库或视觉 backbone。
-- 轻量 OCR(Tesseract.js / 轻量 PaddleOCR)与高级 OCR(PaddleOCR-VL / MinerU)使用独立缓存条目;高级 OCR 启用前展示体积、运行内存、降级路径和失败提示。
+- 轻量 OCR(Tesseract.js)与高级 OCR(PP-OCRv5 ONNX/WebGPU)使用独立缓存条目;高级 OCR 启用前展示体积、运行内存、降级路径和失败提示。PaddleOCR-VL / MinerU 等 VLM 为远期/外部资源,不进入默认 dependencies 或安装包本体。
 - 具体 MB/GB 上限以首个可运行 OCR 模型构建后的质量、速度、内存测试结果确定,不沿用默认安装包预算。

 ### model-cache 目录结构
diff --git a/docs/superpowers/specs/2026-05-29-p9c-three-layer-verification-design.md b/docs/superpowers/specs/2026-05-29-p9c-three-layer-verification-design.md
new file mode 100644
index 0000000..a0bd78f
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-29-p9c-three-layer-verification-design.md
@@ -0,0 +1,106 @@
+# P9-C 转换后检验三层架构总设计
+
+状态:生效
+日期:2026-05-29
+前置基础:P9-A.1 → P9-A.4 OCR 链路 / P9-B OCR → FixedLayoutModel / S2 Repair Engine / [2026-05-28-lightweight-default-bundle-direction.md](2026-05-28-lightweight-default-bundle-direction.md)
+后续阶段:P9-D 高级 OCR / P7-B 跨平台发布
+
+## 目标
+
+DEVELOPMENT_TASKS.md 把「转换后检验」列为项目核心差异化能力之一:用户最关心的是「这次转换有没有丢东西」,而不是单一 writer 的程序化质量。P8-B 已经把 RoutePlanner 和路径分级说准,S2 Repair Engine 已经在做模型自审 + 同格式 round-trip,但当前 `metadata.qualityReport` 只有粗粒度的 `structureFidelity / tableFidelity / assetFidelity / warningCount / downgradeCount`,没有结构化的差异数据;视觉对比和 OCR 回读完全是占位。
+
+P9-C 落地之后:
+
+- 任何一次 `convertContent` / `convertContentAsync` 在 Repair Engine 自审之后再过一层「验证阶段」(verification stage),把可执行的三层检验结果合并写入 `metadata.qualityReport`。
+- 三层检验之间是独立、可降级、共享同一份数据契约的子系统:
+
+```
+verification-stage
+  ├ rule-diff (P9-C.1) →
qualityReport.ruleDiff
+  ├ ssim-visual (P9-C.2) → qualityReport.ssim
+  └ ocr-readback (P9-C.3) → qualityReport.ocrReadback
+```
+
+- 每层都按"可证据 → 触发;不可证据 → eligible:false + reason"原则运行;不夸大覆盖、不夸大保真度。
+- UI / Repair Engine / 任务表都从 `qualityReport.verification.layers` 读到当次实际跑过哪些层,避免 P8 之前那种「质量等级写得满,但路径根本没执行」的回潮。
+
+## 三层结构与数据契约
+
+### 公共 envelope `qualityReport.verification`
+
+```js
+qualityReport.verification = {
+  eligible: boolean, // 至少一层 eligible 则 true
+  layers: ["rule-diff"], // 当次跑过且 eligible 的层名列表
+  skipped: [ // 不 eligible 的层 + 原因
+    { layer: "ssim", reason: "writer-not-rasterizable" },
+    { layer: "ocr-readback", reason: "ocr-engine-unavailable" },
+  ],
+  runtimeMs: number, // 三层合计 wall-clock
+}
+```
+
+### 层级具体字段
+
+每层的详细结果挂在 `qualityReport.` 平面字段下,方便 UI 直接渲染、Repair Engine validator 直接消费、测试断言直接定位。
+
+| 层 | qualityReport 字段 | 触发条件 | 主要字段(详见各子阶段 spec) |
+| --- | --- | --- | --- |
+| 规则 diff(P9-C.1) | `qualityReport.ruleDiff` | from/to ∈ text-canonical 集合(md/html/json/csv/txt/xml)且 `output.data` 为字符串 | `identical / blockCounts / changedBlocks / addedBlocks / removedBlocks / fidelity / overallScore` |
+| SSIM 视觉对比(P9-C.2) | `qualityReport.ssim` | writer 是 PDF / PNG / OFD 等可栅格化输出,且 baseline 注入 ok | `score / threshold / changedPixels / baseline / passed` |
+| OCR 回读(P9-C.3) | `qualityReport.ocrReadback` | OCR engine 已 ready(tesseract 或更高级)+ writer 产物可以渲染为图像 | `textRecall / textPrecision / driftedBlocks / engineId / runtimeMs` |
+
+任意一层不 eligible 时对应字段为 `null`,且在 `verification.skipped` 中显式带 reason,前端不展示该层卡片。
+
+### 与 Repair Engine 的关系
+
+| 维度 | Repair Engine(S2) | Verification Stage(P9-C) |
+| --- | --- | --- |
+| 时机 | writer 之后立即跑 | Repair Engine cycle 完成之后跑 |
+| 目的 | 自审 + 自动修复 + 可选 fallback | 多层独立证据采集 |
+| 入口 | `defaultRepairEngine.runCycle({ model, output, ctx })` | `runVerificationStage({ model, output, ctx })` |
+| 数据归宿 | `metadata.autoRepair` + `metadata.modelReview` + `qualityReport.repairStatus/finalDecision` | `qualityReport.verification` +
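`qualityReport.{ruleDiff,ssim,ocrReadback}` | +| Modifies output? | Yes (fallback writer switch) | No (read-only; writes only metadata + warnings) | +| Throws? | Errors are caught internally and land in rejected actions | Errors are caught internally and land in layer-level warnings; eligible can still be true | + +The Repair Engine's `reverifyRoundTrip` field (`roundTripDelta`) is retained as a coarse-grained compatibility layer; P9-C.1 ruleDiff is its field-level refinement, and the two coexist. The Repair Engine may later register a `detectRuleDiffDrift` validator that consumes ruleDiff results to generate RepairActions such as `replaceTextRun`, but this is **not done in the P9-C stage** (to avoid a circular dependency). + +### Relationship to the P8 RoutePlanner + +RoutePlanner decides which model mappers, temperature, and forced warnings the reader → writer path goes through; the Verification Stage looks only at the final `output` and the final `model`, and is orthogonal to route grading. `routeClass: "generated" | "restricted"` routes still run verification: as long as the writer output can be round-tripped / rasterized / OCR'd, the result is meaningful; when fidelity is low, ruleDiff/ssim/ocrReadback simply reflect it. + +## Sub-stage order and milestones + +| Sub-stage | Deliverables | External resources required | +| --- | --- | --- | +| **P9-C.1 Rule diff** | `public/core/verification/` three-module skeleton + `runVerificationStage` orchestration + wiring into `_wrapWithRepairCycle` + `qualityReport.ruleDiff` + `qualityReport.verification` envelope | None | +| **P9-C.2 SSIM visual comparison** | `ssim.js` (self-implemented, no third-party npm) + reuse of the P9-B `defaultPdfPageRasterizer` to render PDF page 1 + initial baselines in `tests/visual-baselines/pdf/` (md→pdf / html→pdf) + `qualityReport.ssim` | Browser/Tauri runtime (PDF.js vendor already in place); stub on the Node side | +| **P9-C.3 OCR readback** | Reuse the tesseract engine to OCR the writer output and compare the text with the original SemanticDoc text; scanned PDF / PNG / rendered PDF can all participate + `qualityReport.ocrReadback` | tessdata already downloaded to IndexedDB (enabled manually by the user) | + +P9-C.1 is pure-function verification with no external dependencies and should land first, providing the envelope data contract + unit test script and giving P9-C.2 / P9-C.3 their integration points. + +## Relationship to historical specs + +| Historical spec | Status | Handling | +| --- | --- | --- | +| [2026-05-27 PDF Single-Page Visual Regression Design](2026-05-27-pdf-single-page-visual-regression-design.md) | Single-layer visual comparison design draft; covers PDF only. | Serves as a prerequisite reference in the P9-C.2 spec; no longer executed on its own. | +| [docs/VISUAL_COMPARISON_PLAN.md](../../VISUAL_COMPARISON_PLAN.md) | Framework document from 2026-05-12; proposes adding npm dependencies such as sharp/ssim.js/pdf-to-png. | After the P9-C direction change, **no new npm dependencies are introduced**; this document is consolidated or archived when P9-C.2 lands. | +| `scripts/visual-comparison-test.js` | Placeholder stub from 2026-05-12. | When P9-C.2 lands, split out as `scripts/ssim-baseline-test.js` (or merged into 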
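`scripts/visual-comparison-test.js` as its implementation). P9-C.1 leaves it untouched. | + +## Risks and mitigations + +| Risk | Mitigation | +| --- | --- | +| Once the three-layer envelope fields are published, repeated schema changes would fracture the UI and test assertions | P9-C.1 pins the envelope fields once and for all (`eligible / layers / skipped / runtimeMs`); future sub-layers only add flat fields under `qualityReport.` and never touch the envelope. | +| Running all three layers significantly lengthens conversion time | P9-C.1 is purely synchronous and degradable; P9-C.2 by default renders PDF page 1 once (< 200ms); P9-C.3 single-page OCR takes ~1s. Every layer short-circuits on `eligible`, and `options.repair === false` still skips the whole verification stage. | +| A circular dependency between the Repair Engine and the Verification Stage (the Repair Engine wants to see ruleDiff, while the Verification Stage runs after the Repair Engine) | During P9-C the Repair Engine consumes only `metadata.warnings`, not `qualityReport.ruleDiff`; if Repair should later consume ruleDiff, a separate P10 stage will rework the Repair Engine to be reentrant. | +| The legacy `roundTripDelta` and the new `ruleDiff` overlap in meaning and cause ambiguity | The docs state explicitly: `roundTripDelta` is a coarse-grained boolean (kept for compatibility), `ruleDiff` is fine-grained and field-level (the standard P9-C field); the UI and Repair Engine are advised to read `ruleDiff`. | + +## Acceptance criteria + +Criteria for P9-C to count as passed (each sub-stage is verified independently): + +1. With `options.repair !== false`, every `convertContent` / `convertContentAsync` call returns `result.quality.qualityReport.verification.layers` as a non-empty array (containing at least one landed layer), or returns `verification.eligible: false` + a complete `skipped` list. +2. The existing `repair-engine` contract (`autoRepair / modelReview / roundTripDelta / repairStatus / finalDecision`) has not a single field changed, and all old assertions pass. +3. `qualityReport.verification.layers` is aligned with the `qualityReport.` fields: if the list contains `"rule-diff"`, then `qualityReport.ruleDiff` is necessarily non-null, and vice versa. +4. 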
Guard scripts (`local-security-test.js` / `local-model-direction-test.js`) cover the three-layer skeleton keywords; DEVELOPMENT_TASKS.md / docs/MULTI_MODEL_ARCHITECTURE.md / docs/PRODUCT_STRATEGY.md are consistent with this spec. diff --git a/docs/superpowers/specs/2026-05-29-p9c1-rule-diff-design.md b/docs/superpowers/specs/2026-05-29-p9c1-rule-diff-design.md new file mode 100644 index 0000000..3462281 --- /dev/null +++ b/docs/superpowers/specs/2026-05-29-p9c1-rule-diff-design.md @@ -0,0 +1,158 @@ +# P9-C.1 Rule-Diff Verification Stage + +Status: In effect +Date: 2026-05-29 +Prerequisites: [2026-05-29-p9c-three-layer-verification-design.md](2026-05-29-p9c-three-layer-verification-design.md) / S2 Repair Engine (`reverifyRoundTrip` / `roundTripDelta`) / P8-B executable routing +Next stages: P9-C.2 SSIM visual comparison / P9-C.3 OCR readback + +## Goals + +The P9-C master spec pins down the envelope; this sub-stage delivers the **first executable verification layer**, rule diff, and wires `runVerificationStage` into `_wrapWithRepairCycle` so that the envelope fields actually appear in every conversion. + +Once landed: + +- `convertContent({ from: "md", to: "md", content: "# Hi" })` returns `result.quality.qualityReport.ruleDiff = { identical: true, fidelity: "exact", overallScore: 1, ... 
}` + `verification.layers = ["rule-diff"]`. +- `convertContent({ from: "md", to: "html", content })` runs the cross-format loopback (md→html→md→read-back) and produces a ruleDiff; structural differences are exposed as `changedBlocks / addedBlocks / removedBlocks`. +- `convertContent({ from: "md", to: "pdf", content })` reports `verification.eligible: false, skipped: [{ layer: "rule-diff", reason: "writer-not-text-canonical" }]` without blocking. +- `blockFingerprint` / `modelFingerprint` from `repair-engine.js` are extracted into a shared module; behavior is byte-for-byte unchanged. + +## Data flow + +``` +ConverterRegistry.convert / convertAsync + → prepareConversionModel (P8-B routing + mapper) + → write (synchronous writer) + → _wrapWithRepairCycle + ├ defaultRepairEngine.runCycle (S2 self-review / fallback / roundTripDelta) + └ runVerificationStage({ model: finalModel, output: cycle.output, ctx }) + ├ eligibility check: ROUND_TRIP_FORMATS + output.data is a string + ├ same-format (from === to) path: ctx.read(output.data, from = ctx.to) → diffSemanticDocs(originalModel, readBack) + ├ cross-format loopback (md ↔ html): ctx.read(output.data, from = ctx.to) → write back to ctx.from → ctx.read(...) → diffSemanticDocs(originalModel, readBack2) + ├ failure fallback: read throws → ruleDiff = null + warning RULE_DIFF_READBACK_FAILED + └ output { eligible, layers, ruleDiff, warnings, runtimeMs } + → merged into quality.qualityReport.{ruleDiff, verification} + → warnings merged into metadata +``` + +## New / modified modules + +| File | Responsibility | +| --- | --- | +| [`public/core/verification/block-fingerprint.js`](../../../public/core/verification/block-fingerprint.js) | Shared fingerprint module: `blockFingerprint(block)` / `modelFingerprint(model)` (extracted from `repair-engine.js`, behavior unchanged) + `getBlockKey(block, index)` (stable alignment key) + `extractBlockFields(block)` (field subset) + the `BLOCK_FIELDS_BY_TYPE` constant + the single source of `ROUND_TRIP_FORMATS`. | +| [`public/core/verification/rule-diff.js`](../../../public/core/verification/rule-diff.js) | Structured diff: `diffSemanticDocs(original, readBack)` → `{ identical, blockCounts, changedBlocks, addedBlocks, removedBlocks, fidelity, overallScore }`; exposes the constants `MINOR_WEIGHT`, `MAJOR_WEIGHT`, and `STRUCTURAL_PENALTY`. | +| 
[`public/core/verification/verification-stage.js`](../../../public/core/verification/verification-stage.js) | Orchestration: `runVerificationStage({ model, output, ctx })` → envelope; handles same-format / cross-format loopback / ineligible / failure fallback; warning code constants `RULE_DIFF_DRIFT` / `RULE_DIFF_READBACK_FAILED`. | +| [`public/core/repair-engine.js`](../../../public/core/repair-engine.js) | Imports the shared fingerprints; removes the local copies of `blockFingerprint` / `modelFingerprint` / `ROUND_TRIP_FORMATS`; behavior unchanged. | +| [`public/core/format-registry.js`](../../../public/core/format-registry.js) | Calls `runVerificationStage` at the end of `_wrapWithRepairCycle`; `quality.qualityReport` gains the `ruleDiff` / `verification` fields; warnings are merged into `metadata`. | +| [`public/browser-transformer.js`](../../../public/browser-transformer.js) | Top-level exports of `runVerificationStage` / `diffSemanticDocs` / `blockFingerprint` / `modelFingerprint` / `ROUND_TRIP_FORMATS` / `RULE_DIFF_DRIFT` / `RULE_DIFF_READBACK_FAILED`. | +| [`scripts/rule-diff-test.js`](../../../scripts/rule-diff-test.js) | 10 assertion groups: unit (each scenario of diffSemanticDocs / runVerificationStage) + end-to-end (md→md, md→html, md→pdf) + fingerprint-extraction equivalence. | +| `package.json` | Inserts `&& node scripts/rule-diff-test.js` into the `scripts.test` chain, after `repair-engine-test.js` and before `model-cache-test.js`. | +| [`scripts/local-security-test.js`](../../../scripts/local-security-test.js) | Adds the three new module paths to `ALLOWED_PUBLIC_FILES` + `STRICT_LOCAL_ONLY_FILES`. | +| [`scripts/local-model-direction-test.js`](../../../scripts/local-model-direction-test.js) | `assertIncludes("multiModel", ...)` adds `runVerificationStage` / `diffSemanticDocs` / `RULE_DIFF_DRIFT`; `assertIncludes("tasks", ...)` adds `qualityReport.ruleDiff`. | +| [`docs/MULTI_MODEL_ARCHITECTURE.md`](../../MULTI_MODEL_ARCHITECTURE.md) | Adds a section introducing the verification-stage architecture; keywords stay in sync with the guard scripts. | +| [`DEVELOPMENT_TASKS.md`](../../../DEVELOPMENT_TASKS.md) | Adds a P9-C.1 row (completed) to the stage status table; appends a detailed entry at the top of `最近验收修复` (recent acceptance fixes); script count "20" → "21". | + +## Key design points + +### Single source for ROUND_TRIP_FORMATS + +The `const ROUND_TRIP_FORMATS = new Set(["md", "html", "json", "csv", "txt", "xml"])` defined at `repair-engine.js:10` moves to 
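`verification/block-fingerprint.js` as the single source; `repair-engine.js` imports it instead of redefining it. + +### Cross-format loopback: this batch enables md ↔ html only + +`md → html → md` / `html → md → html` are the paths in the repository where reader and writer are both implemented in each direction and where `inline-tokens.js` + `semantic-inlines.js` provide a unified inline pipeline. Cross-format regression on 2026-05-26 confirmed that this chain produces no irreversible drift (strong/emphasis nesting, task lists, code blocks, and lists are all locked by fixtures). Other text-canonical cross-format loopbacks (such as md ↔ json, html ↔ xml) are left for separate evaluation in later stages because their readers' loss models differ considerably; this batch explicitly leaves them disabled. + +### Diff alignment strategy + +`diffSemanticDocs(original, readBack)`: + +1. Use `getBlockKey(block, index)` to compute a stable key for each block: `block.id` first, otherwise `${block.type}-${index}-${hash(JSON.stringify(extractBlockFields(block)))}`. +2. Match directly by key; hits enter the `changedBlocks` candidate pool. +3. Unmatched blocks go through LCS-lite: the remaining blocks are heuristically aligned by `(type, firstWords(8))`; hits join `changedBlocks`, the rest go to `addedBlocks` / `removedBlocks`. +4. Field-level comparison does a deep equal over the `extractBlockFields(block)` subset; differing fields go into `fieldsDiffered`, and each difference is classified by severity: + - **minor**: pure whitespace / punctuation / case differences, or non-semantic changes to the `text` field only. + - **major**: structural fields such as `level / ordered / headers / rows.length / code / language / src / assetId / format`. +5. `fidelity` derivation: + - `identical: true` → `exact` + - `addedBlocks.length + removedBlocks.length > 0` and `(added + removed) / max(1, original.length) > 0.3` → `broken` + - any major → `major-drift` + - minor only → `minor-drift` + - all empty → `exact` +6. 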
`overallScore`: + ``` + const penalty = ( + MAJOR_WEIGHT * majorFieldCount + + MINOR_WEIGHT * minorFieldCount + + STRUCTURAL_PENALTY * (addedBlocks.length + removedBlocks.length) + ); + const score = clamp(1 - penalty / Math.max(1, original.blocks.length), 0, 1); + ``` + Constant values: `MAJOR_WEIGHT = 0.4`, `MINOR_WEIGHT = 0.05`, `STRUCTURAL_PENALTY = 0.5`, exposed as module top-level exports. + +### Warning policy + +- `RULE_DIFF_DRIFT` (info level): emitted when `fidelity !== "exact"`; `details` contains `{ from, to, fidelity, score, addedCount, removedCount, changedCount }`. +- `RULE_DIFF_READBACK_FAILED` (info level): emitted when ctx.read throws; `details` contains `{ from, to, cause: errorCode }`. +- Both go through `createWarning("info", code, message, details)` without modifying `warnings.js` itself. + +### Writing the envelope + +At the end of `_wrapWithRepairCycle` in `format-registry.js`: + +```js +const verification = runVerificationStage({ model: finalModel, output: cycle.output, ctx }); +const finalModelWithVerification = ensureDocumentAudit({ + ...finalModel, + metadata: withWarnings(finalModel.metadata, verification.warnings), +}, { content, reader: fromFormat, writer: ..., targetFormat: ..., fileName, options }); + +const baseQualityReport = finalModelWithVerification.metadata?.qualityReport || {}; +const qualityReport = { + ...baseQualityReport, + repairStatus: cycle.autoRepair?.attempted ? "verified" : "not-attempted", + finalDecision: cycle.autoRepair?.finalDecision || "pending", + ruleDiff: verification.ruleDiff, + verification: { + eligible: verification.eligible, + layers: verification.layers, + skipped: verification.skipped, + runtimeMs: verification.runtimeMs, + }, +}; +``` + +The `roundTripDelta` field stays inside `cycle.autoRepair`, **unchanged**. + +## Test coverage + +The 10 assertion groups in `scripts/rule-diff-test.js`: + +1. `diffSemanticDocs`: identical models → `identical: true / fidelity: "exact" / overallScore: 1`. +2. `diffSemanticDocs`: case change in a single paragraph (`Hello` → `hello`) → `fidelity: "minor-drift" / changedBlocks.length === 1 / fieldsDiffered === ["text"] / severity === "minor"`. +3. 
`diffSemanticDocs`: heading level change (h1 → h2) → `fidelity: "major-drift"`. +4. `diffSemanticDocs`: one block missing + one block extra → `addedBlocks.length === 1 / removedBlocks.length === 1`; a block-count difference above 30% → `fidelity: "broken"`. +5. `runVerificationStage`: mock ctx.read returns the same model → `ruleDiff.identical: true / warnings.length === 0 / eligible: true / layers === ["rule-diff"]`. +6. `runVerificationStage`: mock ctx.read throws → no exception escapes + `RULE_DIFF_READBACK_FAILED` warning + `ruleDiff: null / eligible: true`. +7. `runVerificationStage`: `from = "md"`, `to = "pptx"` → `eligible: false / reason contains "writer-not-text-canonical" / layers === [] / ruleDiff: null / skipped[0].layer === "rule-diff"`. +8. End-to-end `convertContent({ from: "md", to: "md", content: "# Title\n\nBody" })` → `result.quality.qualityReport.ruleDiff.identical === true / verification.layers === ["rule-diff"] / verification.eligible === true`. +9. End-to-end `convertContent({ from: "md", to: "html", content: "# A\n\nB" })` → the cross-format loopback runs through; `ruleDiff !== null / verification.eligible === true` (identical is not asserted, to avoid false positives from loopback noise). +10. End-to-end `convertContent({ from: "md", to: "pdf", content: "# Hi" })` → `ruleDiff === null / verification.eligible === false / verification.skipped[0].reason === "writer-not-text-canonical"`. + +Plus a double-run fingerprint assertion (next to group 1): on the same model, call `blockFingerprint` (shared version) + an inline reuse of the old repair-engine algorithm (an equivalent implementation copied directly from lines 16-34) → byte-for-byte equal. + +## Acceptance criteria + +1. `npm test` passes all 21 scripts; `repair-engine-test.js` (10 assertions) does not drift. +2. `git diff --check` reports no trailing whitespace. +3. `npm run release:prepare` does not throw. +4. All guard keywords pass: `runVerificationStage` / `diffSemanticDocs` / `RULE_DIFF_DRIFT` appear in MULTI_MODEL_ARCHITECTURE.md; `qualityReport.ruleDiff` appears in DEVELOPMENT_TASKS.md. +5. 
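The three modules (block-fingerprint / rule-diff / verification-stage) are added to the ALLOWED + STRICT whitelists. + +## Out of scope this round + +- **No** SSIM visual comparison (P9-C.2). +- **No** OCR readback (P9-C.3). +- The Repair Engine does **not** consume ruleDiff to generate RepairActions (avoids a circular dependency; left to a later stage). +- Writer/reader output semantics are **not touched**. +- The UI (landing/workbench/preview/security-center) is **not touched**; rendering verification cards in the UI is done in one pass after P9-C.2 lands. +- **No** new npm dependencies. +- **No** cross-format loopback other than md ↔ html. diff --git a/docs/superpowers/specs/2026-05-29-p9c2-ssim-visual-loopback-design.md b/docs/superpowers/specs/2026-05-29-p9c2-ssim-visual-loopback-design.md new file mode 100644 index 0000000..215ddc3 --- /dev/null +++ b/docs/superpowers/specs/2026-05-29-p9c2-ssim-visual-loopback-design.md @@ -0,0 +1,93 @@ +# P9-C.2 SSIM Visual Comparison · Visual Loopback Layer + +Status: In effect +Date: 2026-05-29 +Prerequisites: [2026-05-29-p9c-three-layer-verification-design.md](2026-05-29-p9c-three-layer-verification-design.md) / P9-C.1 rule diff / P9-B browser rasterize / [2026-05-27-pdf-single-page-visual-regression-design.md](2026-05-27-pdf-single-page-visual-regression-design.md) +Next stages: P9-C.3 OCR readback / P9-D advanced OCR + +## Goals + +The second of P9-C's three verification layers: SSIM visual comparison with **visual loopback** semantics. For visually faithful inputs (PDF / PNG), the **input page** and the **output page** are each rasterized to pixels and compared by structural similarity (SSIM); the result is written to `qualityReport.ssim` and measures "did this conversion preserve the visual appearance". + +Once landed: + +- `convertContentAsync({ from: "pdf", to: "pdf" })` / `convertContentAsync({ from: "png", to: "pdf" })` yield, when an image source is available, `result.quality.qualityReport.ssim = { score, threshold, passed, width, height, pageIndex, sourceFormat, outputFormat }`. +- Non-visually-faithful paths (`md → pdf` has no source image, `pdf → md` output is not rasterizable) record `qualityReport.ssim = null` + a reason in `verification.skipped`, without blocking. +- The SSIM algorithm is **self-implemented with zero new dependencies**: the pure function `computeSSIM` operates on grayscale pixel buffers and is fully testable in Node. +- Rendering is **stub-only** this round: Node has no canvas/pdfjs runtime, so `defaultPageImageSource` throws `VERIFICATION_IMAGE_SOURCE_UNAVAILABLE` by default; browser/Tauri automatically dynamic-imports the canvas implementation; tests cover the code paths with an injected stub image source + synthetic pixels, and real PDF rendering fixtures are left for later. + +## Why an asynchronous layer + +Rasterization (dynamic import of vendor pdfjs + canvas.render + Image decoding) is inherently asynchronous, while the synchronous `convert()` is relied on by many synchronous callers (`convertContent` sync). Therefore: + +- The synchronous 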
`runVerificationStage` (P9-C.1) stays unchanged and runs only the rule-diff layer, for use by sync `convert()`. +- A new asynchronous `runVerificationStageAsync` is added: it first calls the synchronous `runVerificationStage` to get the rule-diff base envelope, then runs the SSIM layer and merges layers / skipped / warnings / runtimeMs + the `ssim` field. It is used by `convertAsync()`. +- `format-registry.js` factors out `_assembleQuality({ cycle, verification, ... })` for shared assembly; `convert()` uses synchronous verification and `convertAsync()` uses asynchronous verification. +- The envelope field `qualityReport.ssim` is always `null` on the synchronous path (sync does not run SSIM) and is populated on the asynchronous path when eligible. + +## Data flow + +``` +convertContentAsync({ from, to }) + → prepareConversionModel + (OCR stage if applicable) + write + → _wrapWithRepairCycleAsync + ├ defaultRepairEngine.runCycle (S2) + ├ verification = await runVerificationStageAsync({ model, output, ctx }) + │ ├ base = runVerificationStage(...) // rule-diff (sync) + │ └ ssim = await runSsimLayer({ ctx, output }) // visual loopback (async) + │ ├ gate: ctx.from ∈ {pdf,png} && ctx.to ∈ {pdf,png} + │ ├ srcImage = getPageImage({ format: ctx.from, content: ctx.content }) + │ ├ outImage = getPageImage({ format: ctx.to, content: output.data }) + │ ├ compareImages(srcImage, outImage) → score + │ └ passed = score >= threshold; warning SSIM_VISUAL_DRIFT if !passed + └ _assembleQuality → quality.qualityReport.{ ruleDiff, ssim } + verification envelope +``` + +## New / modified modules + +| File | Responsibility | +| --- | --- | +| `public/core/verification/ssim.js` | Pure SSIM core: `rgbaToGrayscale` / `resampleGrayscale` (box resampling onto a common grid) / `computeSSIM(grayA, grayB, w, h, opts)` (mean SSIM over non-overlapping windows, standard C1/C2 constants) / `compareImages(imgA, imgB, opts)` (normalizes both images onto a common grid, then scores). Exposes the constants `SSIM_C1` / `SSIM_C2` / `DEFAULT_WINDOW_SIZE` / `DEFAULT_TARGET_WIDTH`. | +| `public/core/verification/page-image-source.js` | Pixel source abstraction: `defaultPageImageSource` (throws `VERIFICATION_IMAGE_SOURCE_UNAVAILABLE` on Node; in the browser the first call dynamic-imports the browser implementation) + `setPageImageSource(impl)` / `resetPageImageSource()`. Contract: `getPageImage({ format, content, pageIndex, dpi })` → `{ pixels: RGBA, width, height }`. Error codes `VERIFICATION_IMAGE_SOURCE_UNAVAILABLE` / `VERIFICATION_IMAGE_SOURCE_FAILED`. | +| 
`public/core/verification/page-image-source-browser.js` | Browser implementation: PDF → `createBrowserPdfPageRasterizer` yields a dataUrl → `Image` → canvas → `getImageData` reads the pixels; PNG → dataUrl/bytes → `Image` → canvas → `getImageData`. STRICT local-only. | +| `public/core/verification/verification-stage.js` | Adds `runVerificationStageAsync` + `runSsimLayer` + the warning constants `SSIM_VISUAL_DRIFT` / `SSIM_SOURCE_UNAVAILABLE`. The synchronous `runVerificationStage` is unchanged. | +| `public/core/format-registry.js` | Factors out `_assembleQuality`; `_wrapWithRepairCycle` (sync) + the new `_wrapWithRepairCycleAsync`; `convertAsync` switches to the async wrap. | +| `public/browser-transformer.js` | Top-level exports of `computeSSIM` / `compareImages` / `rgbaToGrayscale` / `resampleGrayscale` / `runVerificationStageAsync` / `defaultPageImageSource` / `setPageImageSource` / `resetPageImageSource` / `SSIM_VISUAL_DRIFT` / `VERIFICATION_IMAGE_SOURCE_UNAVAILABLE` and others. | +| `scripts/ssim-verification-test.js` | Assertions: SSIM core (identical images = 1, all-black vs all-white ≈ 0, monotonic decrease as noise is added, resample dimensions) + compareImages normalizing different sizes + runSsimLayer end-to-end with an injected stub image source + eligible:false on non-visual paths + defaultPageImageSource throwing on Node + runVerificationStageAsync merging the rule-diff + ssim layers. | +| `package.json` | Inserts `ssim-verification-test.js` into the test chain (after `rule-diff-test.js`). | +| `scripts/local-security-test.js` | Adds the four new modules to ALLOWED + STRICT. | +| `scripts/local-model-direction-test.js` | Guard keywords add `computeSSIM` / `runVerificationStageAsync` / `SSIM_VISUAL_DRIFT`. | +| `docs/MULTI_MODEL_ARCHITECTURE.md` / `DEVELOPMENT_TASKS.md` | Syncs sections, status rows, and acceptance entries; script count 21 → 22. | + +## SSIM algorithm + +- Input: two images `{ pixels: Uint8ClampedArray (RGBA), width, height }`. +- `rgbaToGrayscale`: `0.299R + 0.587G + 0.114B`; the output is a `Uint8ClampedArray` of length w*h. +- `resampleGrayscale`: box-average resampling to the target grid (default `DEFAULT_TARGET_WIDTH = 256`, height derived from imgA's aspect ratio); both images are forced onto the same grid, and mismatched size/aspect ratio naturally lowers the score. +- `computeSSIM`: non-overlapping `windowSize=8` windows; per window compute `(2μxμy+C1)(2σxy+C2)/((μx²+μy²+C1)(σx²+σy²+C2))`, then take the mean over all windows. `C1=(0.01·255)²=6.5025`, `C2=(0.03·255)²=58.5225`. +- Non-overlapping windows + a fixed grid guarantee determinism, cross-platform stability, and zero dependencies. + +## Eligibility and thresholds + +- SSIM layer eligibility condition: `ctx.from ∈ 
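{pdf, png}` and `ctx.to ∈ {pdf, png}` (i.e. a source image exists and the output is rasterizable). +- Routes actually hit in the current product matrix: `pdf → pdf`, `png → pdf`. `ofd` source images are not supported this round (no OFD rendering) → recorded with reason `format-not-rasterizable`. +- Default threshold `0.85` (adjustable via `options.verification.ssimThreshold`). Note: Trans2Former's `pdf → pdf` goes through "reader extracts text → writer re-typesets", which is not visually faithful to begin with; a low SSIM is an **honest signal**, so by default only an info-level `SSIM_VISUAL_DRIFT` is emitted, with no failure verdict and no blocking. +- Image source unavailable (Node default / browser vendor missing) → `eligible: false, reason: "image-source-unavailable"`; it does not throw and does not block the conversion. + +## Acceptance criteria + +1. `npm test` passes all 22 scripts; P9-C.1 `rule-diff-test` and `repair-engine-test` do not drift. +2. The SSIM core pure functions are fully covered on Node (identical images = 1, degradation, resample). +3. `runSsimLayer` runs end-to-end with an injected stub image source and writes `qualityReport.ssim`. +4. `convertContent` (sync) behavior is unchanged: `qualityReport.ssim === null`, rule-diff fields untouched. +5. Guard keywords + whitelists cover the four new modules; docs are in sync. +6. `git diff --check` / `npm run release:prepare` pass. + +## Out of scope this round + +- **No** OCR readback (P9-C.3). +- Real PDF/PNG rendering baselines are **not committed** (stub-only; end-to-end verification with real fixtures + browser-side canvas pixel wiring is left for later/manual work). +- **No** new npm dependencies (no canvas / sharp / ssim.js / pixelmatch). +- **No support** for OFD source image rendering; no multi-page (page 1 only); no cross-platform pixel-level baseline commitment. +- **No change** to synchronous `convert()` semantics / the `options.repair === false` short-circuit. +- **No change** to the UI (verification card rendering is done in one pass once all three layers are in place). diff --git a/docs/superpowers/specs/2026-05-29-p9c3-ocr-readback-design.md b/docs/superpowers/specs/2026-05-29-p9c3-ocr-readback-design.md new file mode 100644 index 0000000..c340450 --- /dev/null +++ b/docs/superpowers/specs/2026-05-29-p9c3-ocr-readback-design.md @@ -0,0 +1,90 @@ +# P9-C.3 OCR Readback · Third Layer of Post-Conversion Verification + +Status: In effect +Date: 2026-05-29 +Prerequisites: [2026-05-29-p9c-three-layer-verification-design.md](2026-05-29-p9c-three-layer-verification-design.md) / P9-C.1 rule diff / P9-C.2 SSIM visual loopback / P9-A.2.b tesseract runtime +Next stage: P9-D advanced OCR + +## Goals + +The third of P9-C's three verification layers, and the closing one: OCR readback. The conversion **output** (currently PDF only) is rasterized and read back as text by an OCR engine, then compared against the **original SemanticDoc text** and written to `qualityReport.ocrReadback`, answering "after conversion to a visual format, can the text still be read back?". + +Once landed: + +- `convertContentAsync({ from: "md", to: "pdf" })` (as well as html/txt/docx/... 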
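→ pdf), when an OCR engine is available, yields `result.quality.qualityReport.ocrReadback = { recall, precision, f1, threshold, passed, engineId, originalLength, recognizedLength, averageConfidence }`. +- OCR engine unavailable (Node default placeholder / user has not imported tessdata) or output not rasterizable → `qualityReport.ocrReadback = null` + a reason in `verification.skipped`, without blocking. +- The text similarity function `compareText` is a **pure function with zero dependencies, using character-level multisets**; it is robust across Chinese and English and fully testable in Node. +- Rendering + OCR are **stub-only** this round: Node has no canvas/tessdata, so the code paths are covered with an injected stub engine + stub rasterizer; real end-to-end OCR readback is left to manual verification in the browser. + +## Why character-level multiset similarity + +OCR output is noisy, and Chinese has no whitespace word segmentation. Token recall (splitting on whitespace) breaks down for CJK. A **character-level multiset** is used instead: + +- `originalChars` = the character multiset of the original text after normalization (NFKC + lowercase + whitespace removal) +- `recognizedChars` = the multiset of the OCR text after the same normalization +- `intersection = Σ min(count_original(c), count_recognized(c))` +- `recall = intersection / max(1, |originalChars|)` (how much of the original was read back) +- `precision = intersection / max(1, |recognizedChars|)` (how much of what OCR read is in the original) +- `f1 = 2PR/(P+R)` + +Character-level comparison is robust to mixed Chinese/English text and to OCR noise, with zero dependencies. + +## Data flow + +``` +convertContentAsync({ from: , to: "pdf" }) + → ... write → _wrapWithRepairCycleAsync + └ verification = await runVerificationStageAsync(...) + ├ base = runVerificationStage(...) // rule-diff (sync) + ├ ssim = await runSsimLayer(...) // visual loopback + └ ocrReadback = await runOcrReadbackLayer(...) 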
// dynamic import of ocr-readback.js + ├ gate: ctx.to === "pdf" and the original has text + ├ engine = injected || defaultOCRRegistry.pickForTask("ocr-text"); unavailable → eligible:false + ├ rasterizer = injected || OCR defaultPdfPageRasterizer + ├ raster = await rasterizer.rasterize({ content: output.data, pageIndex: 0 }) // dataURL + ├ ocr = await engine.recognize({ image: raster.dataUrl, options:{language} }) + ├ recognizedText = ocr.fullText || join(pages.lines.text) + ├ compareText(originalText, recognizedText) → { recall, precision, f1 } + └ passed = f1 >= threshold; below it, emit info OCR_READBACK_DRIFT + → _assembleQuality → quality.qualityReport.{ ruleDiff, ssim, ocrReadback } +``` + +## New / modified modules + +| File | Responsibility | +| --- | --- | +| `public/core/verification/ocr-readback.js` | `compareText(original, recognized)` pure function (character-multiset recall/precision/f1 + `normalizeText`) + `extractModelText(model)` (concatenates block text) + the asynchronous layer `runOcrReadbackLayer({ model, output, ctx, engine?, rasterizer? })`. Constant `DEFAULT_OCR_READBACK_THRESHOLD` / warnings `OCR_READBACK_DRIFT` / `OCR_READBACK_FAILED`. | +| `public/core/verification/verification-stage.js` | `runVerificationStageAsync` dynamic-imports `ocr-readback.js` at the end to run the third layer; merges `layers`/`skipped`/`warnings` + the `ocrReadback` field. | +| `public/core/format-registry.js` | `_assembleQuality` adds `ocrReadback: verification.ocrReadback ?? 
null` (always null on the synchronous path). | +| `public/browser-transformer.js` | Top-level exports of `compareText` / `extractModelText` / `runOcrReadbackLayer` / `OCR_READBACK_DRIFT` / `DEFAULT_OCR_READBACK_THRESHOLD`. | +| `scripts/ocr-readback-test.js` | Assertions: compareText (identical = 1, subset recall<1/precision=1, CJK, empty text) + extractModelText + runOcrReadbackLayer (end-to-end with stub engine+rasterizer / engine unavailable eligible:false / non-pdf output eligible:false / fallback when rasterize throws) + runVerificationStageAsync merging three layers + convertContentAsync md→pdf with an injected stub populating ocrReadback. | +| `package.json` | Inserts `ocr-readback-test.js` into the test chain (after `ssim-verification-test.js`). | +| `scripts/local-security-test.js` | Adds `ocr-readback.js` to ALLOWED + STRICT. | +| `scripts/local-model-direction-test.js` | Guards add `runOcrReadbackLayer` / `compareText` / `OCR_READBACK_DRIFT`. | +| `docs/MULTI_MODEL_ARCHITECTURE.md` / `DEVELOPMENT_TASKS.md` | Syncs sections, status rows, and acceptance entries; script count 22 → 23. | + +## Eligibility and thresholds + +- OCR readback eligibility condition: `ctx.to === "pdf"` (currently the only rasterizable text writer), the original text is non-empty, an OCR engine is available, and the rasterizer succeeds. +- Routes hit: `md/html/txt/json/xml/docx/doc/epub/csv/xlsx → pdf` (every text path that produces a PDF). +- Default threshold `f1 >= 0.7` (tolerance for OCR noise); below it an info-level `OCR_READBACK_DRIFT` is emitted, with no failure verdict and no blocking. +- Engine unavailable (Node placeholder / tessdata not imported) → `eligible:false, reason:"ocr-engine-unavailable"`; does not throw. +- Rasterizer unavailable → `eligible:false, reason:"rasterizer-unavailable"`; rasterize/recognize throws → `OCR_READBACK_FAILED` info warning + eligible:false. + +## Acceptance criteria + +1. `npm test` passes all 23 scripts; P9-C.1/C.2, repair-engine, and ocr-baseline do not drift. +2. The `compareText` pure function is fully covered on Node (identical/subset/CJK/empty). +3. `runOcrReadbackLayer` runs end-to-end with an injected stub engine + stub rasterizer and writes `qualityReport.ocrReadback`. +4. `convertContent` (sync) behavior is unchanged: `qualityReport.ocrReadback === null`. +5. The three-layer envelope is aligned: `layers` contains `ocr-readback` ⟺ `qualityReport.ocrReadback` is non-null. +6. 
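Guards + whitelists + docs are in sync; `git diff --check` / `npm run release:prepare` pass. + +## Out of scope this round + +- Real PDF rendering + tessdata OCR readback fixtures are **not committed** (stub-only; real readback in the browser is verified manually). +- **No** new npm dependencies. +- The Repair Engine does **not** consume `ocrReadback` (avoids a circular dependency). +- **No** multi-page readback (page 1 only); no per-block localization from OCR back to the original (only aggregate recall/precision/f1). +- **No change** to synchronous `convert()` semantics / the `options.repair === false` short-circuit / the UI. +- Advanced OCR is **not wired in** (PaddleOCR-VL / MinerU belong to P9-D); readback reuses the registered `ocr-text` engine. diff --git a/docs/superpowers/specs/2026-05-29-p9c4-verification-ui-design.md b/docs/superpowers/specs/2026-05-29-p9c4-verification-ui-design.md new file mode 100644 index 0000000..5366429 --- /dev/null +++ b/docs/superpowers/specs/2026-05-29-p9c4-verification-ui-design.md @@ -0,0 +1,59 @@ +# P9-C.4 Conversion Verification Results UI + +Status: In effect +Date: 2026-05-29 +Prerequisites: P9-C.1 rule diff / P9-C.2 SSIM / P9-C.3 OCR readback (all three layers already write to `qualityReport`) / UI-A three-view refactor +Next stage: P9-D advanced OCR + +## Goals + +P9-C.1/2/3 compute the three-layer verification results into `result.quality` of the `convert()` / `convertAsync()` return value (`qualityReport.{ ruleDiff, ssim, ocrReadback }` + the `verification` envelope + `autoRepair`), but `public/app.js` **discards `result.quality`** after conversion (it only rebuilds the model with `toConversionDocumentModel` to render basic information), and the bottom drawer panel was already removed before UI-A. As a result, the core differentiator "post-conversion verification" is completely invisible to users. + +This sub-stage surfaces it: after a conversion completes, a compact, collapsible **conversion verification report** is shown inside the "conversion result" panel, displaying the auto-repair verdict + the hit status and key metrics of the three verification layers (rule diff / SSIM / OCR readback); layers that did not run explicitly give the reason. Display only; the conversion core is not changed. + +## Scope + +### Included +- `public/index.html`: adds `#verificationReportPanel` inside `#outputPreviewPanel` (`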
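` collapsible panel, hidden by default, shown after conversion). +- `public/app.js`: + - `transformContent` captures `result.quality` and stores it in `currentConversionQuality`. + - Adds `renderVerificationReport(quality)` to render the panel; both the text and binary result paths call it. + - `resetGeneratedOutput` / the start of a new conversion clears the panel. +- `public/styles.css`: `.verification-report` visual rules (following the slate+teal + mini style). +- `scripts/browser-smoke-test.js`: asserts that `#verificationReportPanel` exists. +- docs / DEVELOPMENT_TASKS in sync. + +### Excluded +- The removed bottom drawer is **not revived** (`bottomReportPanel`/`warningsPanel`/`qualityReportPanel`/`diffPanel`/`versionsPanel` stay removed; the smoke-test negative assertions are not broken). +- **No change** to the conversion core / verification-stage / format-registry. +- Real rendering fixtures are **not committed** (SSIM/OCR really run in the browser, but sample verification is still manual). +- **No** new dependencies. + +## Displayed content (from `result.quality`) + +``` +Conversion verification report +├ Auto repair: · verdict (from autoRepair) +├ Rule diff: <hit? fidelity + score : skip reason> +├ SSIM visual comparison: <hit? score(threshold) + passed : skip reason> +├ OCR readback: <hit? f1/recall(threshold) + passed : skip reason> +└ warnings: <count by severity> +``` + +- One row per layer; each row is marked with `data-layer`; a hit shows the metrics, a miss shows `verification.skipped[].reason` (such as `writer-not-text-canonical` / `image-source-unavailable` / `ocr-engine-unavailable`). +- Honest wording: state clearly that "verification only runs on paths where evidence is available"; not running does not mean failure. +- `result.quality` missing (`options.repair === false` or a legacy path) → the panel is hidden. + +## Render function contract + +`renderVerificationReport(quality)`: +- `quality == null` → panel `hidden = true`, return. +- Otherwise populate each row (reading `quality.qualityReport.ruleDiff/ssim/ocrReadback/verification` + `quality.autoRepair`), then `hidden = false`. +- Pure DOM text writes; user content is never injected via innerHTML (XSS prevention: use textContent). + +## Acceptance + +1. `npm test` passes all 24 scripts; `browser-smoke-test` asserts that `#verificationReportPanel` exists and the negative assertions for the old drawer still hold. +2. `git diff --check` / `npm run release:prepare` pass. +3. Manual browser check: after an `md → md` conversion the panel shows the rule diff as hit (fidelity exact), with SSIM/OCR skipped and reasons given. +4. 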
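Does not break the existing result panel, PDF preview, output editor, or error panel. diff --git a/docs/superpowers/specs/2026-05-29-p9d-advanced-ocr-research.md b/docs/superpowers/specs/2026-05-29-p9d-advanced-ocr-research.md new file mode 100644 index 0000000..7611b4a --- /dev/null +++ b/docs/superpowers/specs/2026-05-29-p9d-advanced-ocr-research.md @@ -0,0 +1,68 @@ +# P9-D Advanced OCR Integration Roadmap Research + +Status: Research / awaiting direction confirmation +Date: 2026-05-29 +Prerequisites: P9-A OCR pipeline (tesseract) / P9-B FixedLayoutModel / P9-C three-layer verification / [2026-05-28-lightweight-default-bundle-direction.md](2026-05-28-lightweight-default-bundle-direction.md) +Trigger: DEVELOPMENT_TASKS names P9-D "PaddleOCR-VL / MinerU"; this research checks their feasibility under this project's constraints. + +## Hard project constraints (from the established direction) + +- Local-first, zero upload, **no calls to cloud OCR/AI**, no network access during processing. +- Default installer 30–80 MB; GB-scale models **do not go into the default package** and are downloaded on demand into model-cache. +- Runtime form: browser + Tauri (web frontend + Rust shell), **no Python runtime**. +- OCR is a core built-in capability: enabled on demand, downloaded on demand, can be disabled, can be cleaned up. + +## Findings + +### PaddleOCR-VL (0.9B VLM): not feasible in browser/local (currently) + +- Even quantized, the model is about a **500MB** download, and inference needs **1–2GB VRAM**. +- **No mature ONNX / WebGPU conversion path** (it depends on PaddlePaddle's own framework); the official recommendation is to use it through a vLLM service or an API. +- Conclusion: it conflicts with "browser/Tauri local, zero cloud"; **no VLM is integrated at this stage**. + +Sources: +- [PaddleOCR-VL Inference Backends (DeepWiki)](https://deepwiki.com/PaddlePaddle/PaddleOCR/2.2.2-paddleocr-vl-inference-backends-and-acceleration) +- [PaddleOCR-VL-1.5 analysis (lilting)](https://lilting.ch/en/articles/paddleocr-vl-1-5-document-parsing) +- [PaddleOCR-VL official usage](https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/PaddleOCR-VL.html) + +### MinerU (2.5 / Pro 1.2B VLM + pipeline atomic models): a Python tool, not embeddable in JS + +- The modern stack is based on PyTorch (pipeline atomic models ~1–2GB) + vLLM (VLM backend; the official Docker image requires CUDA). +- It supports fully offline use (`MINERU_MODEL_SOURCE=local` + `mineru-models-download`), but it is a **Python CLI/service** and cannot be embedded in a Web/Tauri frontend. +- The only integration route is an "external Python sidecar": heavy, requires Python/CUDA, and conflicts with the 30–80MB lightweight default package principle. +- Conclusion: **MinerU is not embedded at this stage**; it can be evaluated in the long term as an "external tool for advanced users". + +Sources: +- [MinerU GitHub](https://github.com/opendatalab/mineru) +- [MinerU model configuration/offline (DeepWiki)](https://deepwiki.com/opendatalab/MinerU/3.2-model-configuration) +- [MinerU2.5 paper](https://arxiv.org/html/2509.22186v1) + +### 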
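PP-OCRv5 (ONNX Runtime + WebGPU): the genuinely feasible local advanced OCR + +- **PaddleOCR.js** / the community **ppu-paddle-ocr**: browser/multi-JS-runtime SDKs based on PP-OCRv5, with ONNX Runtime + WebGPU acceleration + WASM fallback, INT8 quantization, 40–100+ languages, and data staying local. +- The whole PP-OCRv5 family has **ONNX exports**; when WebGPU is unavailable it falls back to WASM automatically. +- A strong fit for this project: registered in the existing `defaultOCRRegistry` as a higher-accuracy OCR engine than tesseract, with ONNX models downloaded on demand into model-cache, reusing P9-A's manifest/checksum/Storage + P9-B's FixedLayoutModel + P9-C's OCR readback verification. + +Sources: +- [PaddleOCR.js browser deployment](http://www.paddleocr.ai/main/en/version3.x/deployment/browser.html) +- [ppu-paddle-ocr (JSR)](https://jsr.io/@snowfluke/ppu-paddle-ocr) +- [Deterministic OCR in JavaScript: PaddleOCR for Node/Bun/Deno/Browser (DEV)](https://dev.to/awalariansyah/deterministic-ocr-in-javascript-paddleocr-for-node-bun-deno-and-the-browser-2bgn) + +## Recommendation: change the P9-D "advanced OCR" target from VLM to PP-OCRv5 (ONNX/WebGPU) + +Rationale: VLMs (PaddleOCR-VL/MinerU) cannot currently land under the "browser/Tauri local + zero cloud + lightweight default package" constraints; PP-OCRv5 (ONNX/WebGPU) is the realistic path that delivers the accuracy gain while meeting every hard constraint, and it seamlessly reuses all the P9-A~C infrastructure (registry / manifest / model-cache / FixedLayout / OCR readback). + +If confirmed, the documents that name "PaddleOCR-VL / MinerU" as the built-in target need a synchronized revision: `DEVELOPMENT_TASKS` / `DESKTOP_APP_ARCHITECTURE` / `DESKTOP_RELEASE_PLAN` / `RESOURCE_BUDGET` / `PRODUCT_STRATEGY` / `MULTI_MODEL_ARCHITECTURE` / `CONVERSION_ROUTING` (and the `local-model-direction-test` guard keywords), marking VLM as a "long-term/external resource" and establishing PP-OCRv5 (ONNX/WebGPU) as the built-in advanced OCR path. + +## Proposed sub-stages (if the PP-OCRv5 route is adopted) + +- **P9-D.1 Engine skeleton + contract**: `paddle-ocr-engine.js` (implements the existing `OCREngine` interface, registered in `defaultOCRRegistry`, taskCapabilities include `ocr-text`/`ocr-layout`) + the ONNX model ModelManifest registered in `defaultModelCache` (not-downloaded) + unavailable fallback on Node. **No runtime dependencies introduced, no real inference run** (following the skeleton-first cadence used for tesseract). +- **P9-D.2 ONNX runtime vendor + WebGPU integration**: vendor onnxruntime-web (optionalDependency) + WebGPU/WASM detection + COOP/COEP notes; real inference in the browser, stub on Node. +- **P9-D.3 On-demand model download + security center UI**: PP-OCRv5 det/rec/cls ONNX models downloaded on demand into model-cache (SHA-256), import/cleanup buttons in the security center, offline degradation notice. +- **P9-D.4 Wiring into the conversion chain + three-layer verification**: the high-accuracy engine is preferred over tesseract via `pickForTask`; the OCR readback layer reuses it for higher-accuracy readback; FixedLayout layout enhancement. + +## Not done in this research + +- No P9-D 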
code (pending direction confirmation). +- No changes to existing documents (they are revised together once the direction is confirmed). +- No embedded VLM; no Python sidecar. diff --git a/docs/superpowers/specs/2026-05-29-p9d1-paddle-ocr-skeleton-design.md b/docs/superpowers/specs/2026-05-29-p9d1-paddle-ocr-skeleton-design.md new file mode 100644 index 0000000..c015964 --- /dev/null +++ b/docs/superpowers/specs/2026-05-29-p9d1-paddle-ocr-skeleton-design.md @@ -0,0 +1,47 @@ +# P9-D.1 PP-OCRv5 Advanced OCR Engine Skeleton + +Status: in effect +Date: 2026-05-29 +Prerequisites: [2026-05-29-p9d-advanced-ocr-research.md](2026-05-29-p9d-advanced-ocr-research.md) (direction confirmed: advanced OCR = PP-OCRv5 ONNX/WebGPU) / P9-A.1 OCR contract / P9-A.2 tesseract skeleton / S3 model-cache +Follow-up stages: P9-D.2 ONNX runtime vendor + WebGPU / P9-D.3 on-demand model download + UI / P9-D.4 conversion-chain integration + +## Goal + +Following the "skeleton first" cadence (same as P9-A.2 tesseract), land the **engine contract + manifest registration + unavailable fallback on Node** for PP-OCRv5 advanced OCR, without introducing a runtime dependency or running real inference. At the same time, revise the direction documents from "PaddleOCR-VL / MinerU built in" to "PP-OCRv5 (ONNX/WebGPU) built in; VLM long-term / external". + +Once landed: +- `paddleOcrEngine` implements the existing `OCREngine` interface and registers in `defaultOCRRegistry`; on Node or when not ready, `isAvailable()===false` and `recognize()` rejects in three stages (vendor-not-ready / model-missing / runtime-not-wired). +- The PP-OCRv5 ONNX model set (det+rec+cls) is registered in `defaultModelCache` as a ModelManifest with status `not-downloaded`. +- It reuses P9-A's registry/manifest/Storage, P9-B's FixedLayoutModel, and P9-C's OCR readback verification (which benefit automatically once P9-D.2+ wires in real inference). + +## New / modified modules + +| File | Responsibility | +| --- | --- | +| `public/core/ocr/paddle-ocr-engine.js` | `paddleOcrEngine` (id `paddleocr-v5`, taskCapabilities `["ocr-text","ocr-layout"]`, manifestId `ocr-text.paddleocr.v5`): `isAvailable()` = vendorReady (`__t2fPaddleOcrVendorReady`) && modelReady; `recognize()` rejects in three stages (`OCR_UNAVAILABLE` / `OCR_ENGINE_FAILED`); `markPaddleOcrVendorReady(ready)` + `ensureProbe()`. **Does not introduce onnxruntime and does not run real inference**. | +| `public/core/ocr/paddle-ocr-bootstrap.js` | Side-effect import: registers `paddleOcrEngine` in `defaultOCRRegistry` (after tesseract) + registers the PP-OCRv5 ModelManifest (`engine: "paddleocr"`, int8, det/rec/cls perFile placeholders) in `defaultModelCache` (`STATUS_NOT_DOWNLOADED`). | +| `public/browser-transformer.js` | Top-level import
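`paddle-ocr-bootstrap` (after tesseract-bootstrap) + export `paddleOcrEngine` / `PADDLE_OCR_MANIFEST_ID` / `markPaddleOcrVendorReady` / `ensurePaddleOcrBootstrap`. | +| `scripts/ocr-baseline-test.js` | Updates the two `pickForTask` fallback-set assertions to include `paddleocr-v5`; adds paddle skeleton assertions (registration / isAvailable false / manifest registration / recognize three-stage rejection / markPaddleOcrVendorReady). | +| `scripts/local-security-test.js` | Adds the two new modules to ALLOWED + STRICT. | +| `scripts/local-model-direction-test.js` | The multiModel guard adds `PP-OCRv5` / `ONNX` / `WebGPU` / `paddleOcrEngine`; the existing negative assertion "PaddleOCR-VL/MinerU are not built in by default" is kept. | +| Direction documents (`DEVELOPMENT_TASKS` / `DESKTOP_APP_ARCHITECTURE` / `DESKTOP_RELEASE_PLAN` / `RESOURCE_BUDGET` / `PRODUCT_STRATEGY` / `MULTI_MODEL_ARCHITECTURE` / `CONVERSION_ROUTING`) | Change the built-in advanced OCR target to PP-OCRv5 (ONNX/WebGPU) and mark VLM (PaddleOCR-VL/MinerU) as a long-term / external resource (consistent with the research spec). | + +## Note on pickForTask priority + +`OCREngineRegistry.pickForTask` returns the first available engine in registration order; when none is available it returns the last candidate. In this round paddle is registered after tesseract, so when none is available the fallback candidate becomes paddle (test assertions updated). + +The preference ordering "paddle takes precedence over tesseract when available" is left to **P9-D.4** (to be implemented then through a priority field or by adjusting registration order); in P9-D.1 paddle is always unavailable, so the preference ordering does not affect current behavior. + +## Acceptance + +1. `npm test` passes all 24 scripts; the existing placeholder/tesseract assertions are not broken (only the fallback set is extended). +2. paddle engine registration, `isAvailable()===false`, manifest registration, and the recognize three-stage rejection are all covered. +3. The direction documents are consistent with the research spec; all positive and negative assertions in `local-model-direction-test` pass. +4.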
`git diff --check` / `npm run release:prepare` pass; no new runtime npm dependencies. + +## Out of scope this round + +- No onnxruntime-web and no real PP-OCRv5 inference (P9-D.2). +- No on-demand model download / Security Center UI (P9-D.3). +- No change to the pickForTask preference ordering / no conversion-chain integration (P9-D.4). +- No embedded VLM / no Python sidecar. diff --git a/docs/superpowers/specs/2026-05-29-p9d2-onnxruntime-vendor-design.md b/docs/superpowers/specs/2026-05-29-p9d2-onnxruntime-vendor-design.md new file mode 100644 index 0000000..9987d32 --- /dev/null +++ b/docs/superpowers/specs/2026-05-29-p9d2-onnxruntime-vendor-design.md @@ -0,0 +1,47 @@ +# P9-D.2 PP-OCRv5 onnxruntime-web Vendor + Runtime Loading Skeleton + +Status: in effect +Date: 2026-05-29 +Prerequisites: [2026-05-29-p9d1-paddle-ocr-skeleton-design.md](2026-05-29-p9d1-paddle-ocr-skeleton-design.md) / P9-A.2 tesseract vendor skeleton (same cadence) +Follow-up stages: P9-D.2.b real det/cls/rec inference pipeline + CTC decoding / P9-D.3 on-demand model download + UI / P9-D.4 conversion-chain integration + +## Goal + +Following the "vendor + runtime skeleton" cadence (same as P9-A.2 tesseract), wire in the ONNX Runtime (onnxruntime-web) for PP-OCRv5: optionalDependency + vendor sync script + runtime loader + execution backend (WebGPU/WASM) selection. **This round does not implement the det/cls/rec inference pipeline or CTC decoding** (left to P9-D.2.b, which needs real models + a dictionary). + +Once landed: +- `onnxruntime-web` is declared as an optionalDependency; its absence does not block anything (the vendor script exits 0). +- `paddle-ocr-runtime.js` provides `loadOnnxRuntime` (dynamic import of the same-origin vendored ORT; throws `OCR_VENDOR_LOAD_FAILED` on Node), `pickExecutionProviders` (`navigator.gpu` → `["webgpu","wasm"]`, otherwise `["wasm"]`), the `createOcrSession` / `disposeSession` skeleton, and `PADDLE_VENDOR_PATHS`. +- The third stage of `paddleOcrEngine.recognize` now actually attempts `loadOnnxRuntime()`: in a browser with the vendor files + models installed it loads ORT and then rejects with `pipeline-not-wired` (P9-D.2.b wires in the pipeline); on Node the model-missing stage has already rejected earlier. + +## Current CSP state (no change needed) + +The Tauri CSP is already `script-src 'self' 'wasm-unsafe-eval'` + `worker-src 'self' blob:` + `connect-src 'self'`, which is enough for onnxruntime-web to load wasm/workers from the same origin and instantiate them; WebGPU needs no extra CSP. This round does not touch the CSP. + +## New / modified modules + +| File | Responsibility | +| --- | --- | +| `scripts/sync-onnxruntime-vendor.js` | Modeled on sync-tesseract-vendor: syncs `ort*.mjs` + `*.wasm` from `node_modules/onnxruntime-web/dist/` to `public/vendor/onnxruntime/`; exits 0 without blocking when the package is missing. | +| `public/core/ocr/paddle-ocr-runtime.js` |
`loadOnnxRuntime(vendorUrl)` (dynamic import, throws `OCR_VENDOR_LOAD_FAILED` on Node, sets `ort.env.wasm.wasmPaths`) + `pickExecutionProviders()` + `createOcrSession({ ort, modelBuffer, providers })` + `disposeSession` + `PADDLE_VENDOR_PATHS`. | +| `public/core/ocr/paddle-ocr-engine.js` | The third stage of recognize goes through `loadOnnxRuntime()` → for now rejects with `pipeline-not-wired` (`OCR_ENGINE_FAILED`). | +| `package.json` | Adds `onnxruntime-web` to optionalDependencies; adds the `vendor:onnx` script; `release:prepare` includes the onnx vendor sync. | +| `public/browser-transformer.js` | export `loadOnnxRuntime` / `pickExecutionProviders` / `createOcrSession` / `disposeOcrSession` / `PADDLE_VENDOR_PATHS`. | +| `scripts/local-security-test.js` | `isLocalVendorAsset` recognizes `public/vendor/onnxruntime/`; `paddle-ocr-runtime.js` is added to ALLOWED + STRICT. | +| `scripts/local-model-direction-test.js` | The multiModel guard adds `onnxruntime-web`. | +| `scripts/ocr-baseline-test.js` | `pickExecutionProviders()` returns `["wasm"]` on Node; `loadOnnxRuntime()` throws `OCR_VENDOR_LOAD_FAILED` on Node. | +| docs / DEVELOPMENT_TASKS | P9-D.2 entry and status row. | + +## Acceptance + +1. `npm test` passes all 24 scripts; the paddle skeleton (P9-D.1) assertions are not broken. +2. `pickExecutionProviders` returns `["wasm"]` on Node; `loadOnnxRuntime` throws `OCR_VENDOR_LOAD_FAILED` on Node. +3. `npm run release:prepare` includes the onnx vendor sync (exits 0 when the package is missing). +4.
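Guard allowlists / keywords are covered; `git diff --check` passes; onnxruntime-web is only an optionalDependency, and its absence does not block anything. + +## Out of scope this round + +- No det/cls/rec inference pipeline / DB post-processing / CTC decoding (P9-D.2.b, which needs real models + a dictionary). +- No on-demand model download / Security Center UI (P9-D.3). +- No conversion-chain integration / no change to the pickForTask preference (P9-D.4). +- onnxruntime-web is not force-installed (it is only declared as an optionalDependency; npm test covers the rejection path on Node). diff --git a/docs/superpowers/specs/2026-05-29-p9d2b-paddle-inference-pipeline-design.md b/docs/superpowers/specs/2026-05-29-p9d2b-paddle-inference-pipeline-design.md new file mode 100644 index 0000000..d9f9af3 --- /dev/null +++ b/docs/superpowers/specs/2026-05-29-p9d2b-paddle-inference-pipeline-design.md @@ -0,0 +1,53 @@ +# P9-D.2.b PP-OCRv5 Inference Pipeline (det + cls + rec + CTC) + +Status: in effect +Date: 2026-05-29 +Prerequisites: P9-D.2 onnxruntime-web runtime skeleton / P9-D.3 model import / P9-A.1 OCRResult contract +Follow-up stages: P9-D.4 integrates the conversion chain and makes paddle take precedence over tesseract when available + +## Goal + +Implement the real PP-OCRv5 inference pipeline, chaining the three ONNX forward passes (detection / orientation / recognition) with the classic pre- and post-processing into an `OCRResult`. **The core pre- and post-processing is written as pure functions** (preprocessing, DB detection post-processing, CTC greedy decoding, dictionary parsing, cropping) and is fully unit-tested on Node with synthetic tensors; the `runPaddlePipeline` orchestrator accepts **injectable session objects**, so it can be tested end to end on Node with a mock session + mock ort, with no real model required. + +In the browser, `paddleOcrEngine.recognize` decodes `image` to RGBA → loads the det/cls/rec models + dictionary from the local cache → creates sessions → calls `runPaddlePipeline`. On Node or when not ready, it still rejects up front at vendor-load / model-missing. + +## New / modified + +| File | Responsibility | +| --- | --- | +| `public/core/ocr/paddle-ocr-pipeline.js` | Pure functions: `parseCharDictionary` / `preprocessForDetection` / `preprocessForRecognition` / `dbPostProcess` / `ctcGreedyDecode` / `cropImageData` / `resizeRgba`; the orchestrator `runPaddlePipeline({ ort, detSession, clsSession, recSession, imageData, dictionary, options })` → `OCRResult`. Constants `DET_LIMIT_SIDE_LEN`/`REC_IMAGE_HEIGHT`/`DET_MEAN`/`DET_STD`. | +| `public/core/ocr/paddle-ocr-engine.js` | Third stage of recognize: `loadOnnxRuntime` → `decodeImageToImageData(image)` (browser canvas; throws on Node) → fetch the det/cls/rec model buffers + the optional dictionary `paddleocr/v5/dict.txt` from `_storage` → `createOcrSession` ×3 → `runPaddlePipeline`; a failure at any step throws `OCR_ENGINE_FAILED`. | +| `public/browser-transformer.js` | export `runPaddlePipeline` / `parseCharDictionary`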
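/ `preprocessForDetection` / `preprocessForRecognition` / `dbPostProcess` / `ctcGreedyDecode` / `cropImageData`. | +| `scripts/paddle-ocr-pipeline-test.js` | Pure-function unit tests (dictionary parsing, det/rec preprocessing shapes and normalization, DB connected-component box extraction, CTC collapsing of repeats and blank removal) + `runPaddlePipeline` producing an OCRResult end to end from a mock ort/session + a synthetic image. Wired into `npm test` (the 25th script). | +| `scripts/local-security-test.js` | `paddle-ocr-pipeline.js` is added to ALLOWED + STRICT. | +| `scripts/local-model-direction-test.js` | The multiModel guard adds `runPaddlePipeline` / `ctcGreedyDecode`. | +| docs / DEVELOPMENT_TASKS | P9-D.2.b section and status row; script count 24 → 25. | + +## Pipeline and contract + +``` +runPaddlePipeline: + imageData(RGBA) → preprocessForDetection → ort.Tensor → detSession.run + → probMap(1,1,H,W) → dbPostProcess(thresh/boxThresh/minSize) → boxes[] + for box in boxes: + cropImageData → [clsSession.run (orientation; this round only calls it, no rotation)] + → preprocessForRecognition(H=48) → recSession.run → logits(1,T,C) + → ctcGreedyDecode(dictionary) → { text, confidence } + → createOCRResult(pages[0].lines = boxes×decode, fullText, averageConfidence) +``` + +- **DB post-processing**: threshold binarization + 4-connected-component BFS + axis-aligned bbox + scoring by the box's mean probability + size/score filtering + scaling coordinates back by the det→original image ratio. This round uses axis-aligned bboxes (not minAreaRect+unclip), which is simplified but correct and is noted in the docs; accuracy improvements for multi-column / rotated text are left for later. +- **CTC greedy decoding**: per-timestep argmax yields idx+conf → collapse consecutive repeats → drop blank (0) → map through the dictionary → text + mean conf. +- **Preprocessing constants**: det `mean=[0.485,0.456,0.406]/std=[0.229,0.224,0.225]`, `limit_side_len=960`, dimensions rounded to multiples of 32; rec height 48, `(x/255-0.5)/0.5`. End-to-end validation of real-model accuracy is done manually in the browser (after importing the real PP-OCRv5 ONNX models + dictionary). + +## Acceptance + +1. `npm test` passes all 25 scripts; the P9-D.1/D.2/D.3 assertions are not broken. +2. Pure-function unit tests cover dictionary / preprocessing / DB box extraction / CTC collapsing; the `runPaddlePipeline` mock end-to-end run produces an OCRResult containing the known text. +3.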
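`git diff --check` / `npm run release:prepare` pass; no new runtime dependency (onnxruntime-web remains only an optionalDependency). + +## Out of scope this round + +- No cls angle rotation correction (clsSession is only called as a placeholder; rotation is left for later), no high-accuracy minAreaRect+unclip boxes, no multi-column reading order. +- No real ONNX run on Node (mock sessions cover the orchestration; the real-model end-to-end check is manual in the browser). +- No conversion-chain integration / preference ordering (P9-D.4). diff --git a/docs/superpowers/specs/2026-05-29-p9d3-paddle-model-management-design.md b/docs/superpowers/specs/2026-05-29-p9d3-paddle-model-management-design.md new file mode 100644 index 0000000..b324d16 --- /dev/null +++ b/docs/superpowers/specs/2026-05-29-p9d3-paddle-model-management-design.md @@ -0,0 +1,44 @@ +# P9-D.3 PP-OCRv5 Model Import and Security Center Management + +Status: in effect +Date: 2026-05-29 +Prerequisites: [2026-05-29-p9d2-onnxruntime-vendor-design.md](2026-05-29-p9d2-onnxruntime-vendor-design.md) / P9-A.2.b tesseract tessdata import flow (same pattern) / S3 model-cache +Follow-up stages: P9-D.2.b real det/cls/rec inference pipeline + CTC decoding / P9-D.4 conversion-chain integration + +## Goal + +Let users import the PP-OCRv5 det/cls/rec ONNX models into the local cache from the Security Center, so that `paddleOcrEngine` moves from "model-missing" to ready. **It reuses the local import pattern of tesseract tessdata (file picker + SHA-256 + IndexedDB)**: the project forbids network access and the STRICT guard forbids any remote URL, so "on-demand download" in this project means "the user imports model files locally", and no automatic remote fetch is performed. + +Once landed: +- In the Security Center "model cache" card, rows with `engine === "paddleocr"` render three import buttons (det/cls/rec onnx) + a clear button. +- Import: file picker → `arrayBuffer` → `sha256Hex` → `defaultOCRStorage.put("paddleocr/v5/")` → `paddleOcrEngine.ensureProbe()`; once all three files are present → `markPaddleOcrVendorReady(true)` + status `available`. +- Clear: delete the three keys + `ensureProbe` + status back to `not-downloaded`. + +## New / modified + +| File | Change | +| --- | --- | +| `public/security-center.js` | import `paddleOcrEngine` / `markPaddleOcrVendorReady` / `PADDLE_OCR_MODEL_FILES`; `renderModelCache` calls `renderPaddleActions` for paddle rows; adds `importPaddleModel(dialog, button)` (imports into `paddleocr/v5/` according to `data-file`; only when everything is ready does it call `markPaddleOcrVendorReady(true)` + set `STATUS_AVAILABLE`, otherwise it keeps `STATUS_VERIFYING` / shows a partial-import notice) + `clearPaddleModels`; the click delegation adds `[data-import-paddle]` / `[data-clear-paddle]`. | +| `public/browser-transformer.js` | Already exports the required APIs (P9-D.1/D.2); nothing new is needed. | +| `scripts/ocr-baseline-test.js` | Group 37: manually put the three model files into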
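`paddleOcrEngine._storage` + `markPaddleOcrVendorReady(true)` + `ensureProbe()` → `isAvailable()===true`; deleting any one of them → `false` (verifies the readiness logic without depending on UI/ORT). | +| `scripts/browser-smoke-test.js` | Asserts that `#modelCacheFileInput` exists in index.html (import reuses the same hidden file input, which already exists); no new DOM is needed. | +| `docs/MULTI_MODEL_ARCHITECTURE.md` / `DEVELOPMENT_TASKS.md` | P9-D.3 section + status row + acceptance entries. | + +## Key points + +- Storage keys are `paddleocr/v5/det.onnx` / `cls.onnx` / `rec.onnx`, aligned with `MODEL_KEY_PREFIX` + `PADDLE_OCR_MODEL_FILES` in `paddle-ocr-engine.js`. +- Readiness check: `paddleOcrEngine.ensureProbe()` checks that all three files are present; the Security Center sets `STATUS_AVAILABLE` only after all three are present, and otherwise shows "still need to import X". +- `markPaddleOcrVendorReady(true)` means the ORT vendor files are in place (`npm run vendor:onnx` has been run for the browser build); it is decoupled from model import, but the import flow sets it as well once all three files are present, to keep the experience consistent with tesseract. +- No network access: it reuses the existing hidden `#modelCacheFileInput` file input; import works on local files and issues no requests. security-center.js is already on the ALLOWED allowlist. + +## Acceptance + +1. `npm test` passes all 24 scripts; the paddle ready/clear logic is covered (without depending on ORT/UI). +2. `git diff --check` / `npm run release:prepare` pass. +3. Manual browser check: the Security Center shows import det/cls/rec + clear for the PP-OCRv5 row; after importing all three files the status becomes "available". + +## Out of scope this round + +- No det/cls/rec inference + CTC decoding (P9-D.2.b). +- No automatic remote download (it would violate the no-network rule; local import only). +- No conversion-chain integration / preference ordering (P9-D.4). diff --git a/docs/superpowers/specs/2026-05-29-p9d4-ocr-route-preference-design.md b/docs/superpowers/specs/2026-05-29-p9d4-ocr-route-preference-design.md new file mode 100644 index 0000000..11f1375 --- /dev/null +++ b/docs/superpowers/specs/2026-05-29-p9d4-ocr-route-preference-design.md @@ -0,0 +1,41 @@ +# P9-D.4 Advanced OCR Conversion-Chain Integration (Route Preference) + +Status: in effect +Date: 2026-05-29 +Prerequisites: P9-D.1 engine contract / P9-D.2 ORT runtime / P9-D.3 model import / P9-D.2.b inference pipeline +Follow-up stages: end-to-end verification in the browser after importing real models + dictionary (manual) + +## Goal + +Close out P9-D: make `paddleOcrEngine` (PP-OCRv5) be selected **in preference to** tesseract when it is available, with the PNG / scanned-PDF OCR stages benefiting automatically (they pick their engine through `pickForTask("ocr-text")`). Currently `pickForTask` returns the first available engine in registration order; paddle is registered after tesseract, so when both are available tesseract is wrongly chosen. A **priority-aware** selection fixes this. + +## Design + +`OCREngineRegistry.pickForTask(task)`: +- Candidates = registered engines whose `taskCapabilities.includes(task)`. +- Among the candidates, **pick the first available one ordered by `priority` (descending)** (`priority` defaults to 0); when none is available, fall back to the last registered candidate (behavior unchanged). +- Engine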
`priority`: `placeholderOCREngine` = 0 (default), `tesseractOCREngine` = 10, `paddleOcrEngine` = 20 (highest, preferred). + +The PNG / scanned-PDF stages need no change: `enhanceWithOCR` / `runScannedPdfOCRStage` obtain their engine through `defaultOCRRegistry.pickForTask("ocr-text")` and automatically get the highest-priority available engine (paddle if present, otherwise tesseract, otherwise placeholder / a degradation warning). + +## New / modified + +| File | Change | +| --- | --- | +| `public/core/ocr/ocr-engine.js` | `pickForTask` becomes priority-aware: candidates are ordered by `priority` descending and the first available one is picked; when none is available it falls back to the last one. | +| `public/core/ocr/paddle-ocr-engine.js` | The engine adds `priority: 20`. | +| `public/core/ocr/tesseract-engine.js` | The engine adds `priority: 10`. | +| `scripts/ocr-baseline-test.js` | Group 38: a self-built registry with two available stubs (high / low priority) → pickForTask returns the high-priority one; with both paddle + tesseract made available in the default registry → pickForTask("ocr-text") returns `paddleocr-v5`, and after cleanup availability is back to false. | +| docs / DEVELOPMENT_TASKS | P9-D.4 section and status row. | + +## Acceptance + +1. `npm test` passes all 25 scripts; the existing pickForTask fallback assertions (when none is available) are not broken. +2. When both engines are available, `pickForTask("ocr-text")` picks `paddleocr-v5`; when only tesseract is available, it picks tesseract. +3. `git diff --check` / `npm run release:prepare` pass. + +## Out of scope this round + +- No changes to the PNG / scanned-PDF stage code (they benefit automatically through pickForTask). +- No real ONNX run on Node; the real-model end-to-end check is manual verification in the browser. +- No cls rotation correction / high-accuracy boxes (accuracy improvements belong to later work). diff --git a/package-lock.json b/package-lock.json index 3c4479b..40247bf 100644 --- a/package-lock.json +++ b/package-lock.json @@ -10,10 +10,12 @@ "license": "MIT", "dependencies": { "express": "^4.21.2", - "pdfjs-dist": "*" + "onnxruntime-web": "^1.26.0" }, "optionalDependencies": { - "pdfjs-dist": "^5.7.284" + "onnxruntime-web": "^1.26.0", + "pdfjs-dist": "^5.7.284", + "tesseract.js": "^5.1.1" } }, "node_modules/@napi-rs/canvas": { @@ -281,6 +283,89 @@ "url": "https://github.com/sponsors/Brooooooklyn" } }, + "node_modules/@protobufjs/aspromise": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/@protobufjs/aspromise/-/aspromise-1.1.2.tgz", + "integrity": "sha512-j+gKExEuLmKwvz3OgROXtrJ2UG2x8Ch2YZUxahh+s1F2HZ+wAceUNLkvy6zKCPVRkU++ZWQrdxsUeQXmcg4uoQ==", + "license": "BSD-3-Clause", + "optional":
true + }, + "node_modules/@protobufjs/base64": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/@protobufjs/base64/-/base64-1.1.2.tgz", + "integrity": "sha512-AZkcAA5vnN/v4PDqKyMR5lx7hZttPDgClv83E//FMNhR2TMcLUhfRUBHCmSl0oi9zMgDDqRUJkSxO3wm85+XLg==", + "license": "BSD-3-Clause", + "optional": true + }, + "node_modules/@protobufjs/codegen": { + "version": "2.0.5", + "resolved": "https://registry.npmjs.org/@protobufjs/codegen/-/codegen-2.0.5.tgz", + "integrity": "sha512-zgXFLzW3Ap33e6d0Wlj4MGIm6Ce8O89n/apUaGNB/jx+hw+ruWEp7EwGUshdLKVRCxZW12fp9r40E1mQrf/34g==", + "license": "BSD-3-Clause", + "optional": true + }, + "node_modules/@protobufjs/eventemitter": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@protobufjs/eventemitter/-/eventemitter-1.1.1.tgz", + "integrity": "sha512-vW1GmwMZNnL+gMRaovlh9yZX74kc+TTU3FObkkurpMaRtBfLP3ldjS9KQWlwZgraRE0+dheEEoAxdzcJQ8eXZg==", + "license": "BSD-3-Clause", + "optional": true + }, + "node_modules/@protobufjs/fetch": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@protobufjs/fetch/-/fetch-1.1.1.tgz", + "integrity": "sha512-GpptLrs57adMSuHi3VNj0mAF8dwh36LMaYF6XyJ6JMWlVsc+t42tm1HSEDmOs3A8fC9yyeisgLhsTVQokOZ0zw==", + "license": "BSD-3-Clause", + "optional": true, + "dependencies": { + "@protobufjs/aspromise": "^1.1.1" + } + }, + "node_modules/@protobufjs/float": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/@protobufjs/float/-/float-1.0.2.tgz", + "integrity": "sha512-Ddb+kVXlXst9d+R9PfTIxh1EdNkgoRe5tOX6t01f1lYWOvJnSPDBlG241QLzcyPdoNTsblLUdujGSE4RzrTZGQ==", + "license": "BSD-3-Clause", + "optional": true + }, + "node_modules/@protobufjs/inquire": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/@protobufjs/inquire/-/inquire-1.1.2.tgz", + "integrity": "sha512-pa0vFRuws4wkvaXKK1uXZMAwAX4/t8ANaJo45iw/oQHNQ9q5xUzwgFmVJGXiga2BeN+zpX7Vf9vmsiIa2J+MUw==", + "license": "BSD-3-Clause", + "optional": true + }, + "node_modules/@protobufjs/path": { + "version": 
"1.1.2", + "resolved": "https://registry.npmjs.org/@protobufjs/path/-/path-1.1.2.tgz", + "integrity": "sha512-6JOcJ5Tm08dOHAbdR3GrvP+yUUfkjG5ePsHYczMFLq3ZmMkAD98cDgcT2iA1lJ9NVwFd4tH/iSSoe44YWkltEA==", + "license": "BSD-3-Clause", + "optional": true + }, + "node_modules/@protobufjs/pool": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/@protobufjs/pool/-/pool-1.1.0.tgz", + "integrity": "sha512-0kELaGSIDBKvcgS4zkjz1PeddatrjYcmMWOlAuAPwAeccUrPHdUqo/J6LiymHHEiJT5NrF1UVwxY14f+fy4WQw==", + "license": "BSD-3-Clause", + "optional": true + }, + "node_modules/@protobufjs/utf8": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@protobufjs/utf8/-/utf8-1.1.1.tgz", + "integrity": "sha512-oOAWABowe8EAbMyWKM0tYDKi8Yaox52D+HWZhAIJqQXbqe0xI/GV7FhLWqlEKreMkfDjshR5FKgi3mnle0h6Eg==", + "license": "BSD-3-Clause", + "optional": true + }, + "node_modules/@types/node": { + "version": "25.9.1", + "resolved": "https://registry.npmjs.org/@types/node/-/node-25.9.1.tgz", + "integrity": "sha512-xfrlY7UD5rMJk3ZVJP8BNzS28J36YJg+xp+LPXV1TdWxr8uMH5A860QNxYDGQe/ylDSgjxE52Q9VnO7p75tJxg==", + "license": "MIT", + "optional": true, + "dependencies": { + "undici-types": ">=7.24.0 <7.24.7" + } + }, "node_modules/accepts": { "version": "1.3.8", "resolved": "https://registry.npmjs.org/accepts/-/accepts-1.3.8.tgz", @@ -300,6 +385,13 @@ "integrity": "sha512-PCVAQswWemu6UdxsDFFX/+gVeYqKAod3D3UVm91jHwynguOwAvYPhx8nNlM++NqRcK6CxxpUafjmhIdKiHibqg==", "license": "MIT" }, + "node_modules/bmp-js": { + "version": "0.1.0", + "resolved": "https://registry.npmjs.org/bmp-js/-/bmp-js-0.1.0.tgz", + "integrity": "sha512-vHdS19CnY3hwiNdkaqk93DvjVLfbEcI8mys4UjuWrlX1haDmroo8o4xCzh4wD6DGV6HxRCyauwhHRqMTfERtjw==", + "license": "MIT", + "optional": true + }, "node_modules/body-parser": { "version": "1.20.5", "resolved": "https://registry.npmjs.org/body-parser/-/body-parser-1.20.5.tgz", @@ -579,6 +671,13 @@ "node": ">= 0.8" } }, + "node_modules/flatbuffers": { + "version": "25.9.23", + "resolved": 
"https://registry.npmjs.org/flatbuffers/-/flatbuffers-25.9.23.tgz", + "integrity": "sha512-MI1qs7Lo4Syw0EOzUl0xjs2lsoeqFku44KpngfIduHBYvzm8h2+7K8YMQh1JtVVVrUvhLpNwqVi4DERegUJhPQ==", + "license": "Apache-2.0", + "optional": true + }, "node_modules/forwarded": { "version": "0.2.0", "resolved": "https://registry.npmjs.org/forwarded/-/forwarded-0.2.0.tgz", @@ -655,6 +754,13 @@ "url": "https://github.com/sponsors/ljharb" } }, + "node_modules/guid-typescript": { + "version": "1.0.9", + "resolved": "https://registry.npmjs.org/guid-typescript/-/guid-typescript-1.0.9.tgz", + "integrity": "sha512-Y8T4vYhEfwJOTbouREvG+3XDsjr8E3kIr7uf+JZ0BYloFsttiHU0WfvANVsR7TxNUJa/WpCnw/Ino/p+DeBhBQ==", + "license": "ISC", + "optional": true + }, "node_modules/has-symbols": { "version": "1.1.0", "resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz", @@ -711,6 +817,13 @@ "node": ">=0.10.0" } }, + "node_modules/idb-keyval": { + "version": "6.2.4", + "resolved": "https://registry.npmjs.org/idb-keyval/-/idb-keyval-6.2.4.tgz", + "integrity": "sha512-D/NzHWUmYJGXi++z67aMSrnisb9A3621CyRK5G89JyTlN13C8xf0g04DLxUKMufPem3e3L2JAXR6Z00OWy183Q==", + "license": "Apache-2.0", + "optional": true + }, "node_modules/inherits": { "version": "2.0.4", "resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz", @@ -726,6 +839,27 @@ "node": ">= 0.10" } }, + "node_modules/is-electron": { + "version": "2.2.2", + "resolved": "https://registry.npmjs.org/is-electron/-/is-electron-2.2.2.tgz", + "integrity": "sha512-FO/Rhvz5tuw4MCWkpMzHFKWD2LsfHzIb7i6MdPYZ/KW7AlxawyLkqdy+jPZP1WubqEADE3O4FUENlJHDfQASRg==", + "license": "MIT", + "optional": true + }, + "node_modules/is-url": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/is-url/-/is-url-1.2.4.tgz", + "integrity": "sha512-ITvGim8FhRiYe4IQ5uHSkj7pVaPDrCTkNd3yq3cV7iZAcJdHTUMPMEHcqSOy9xZ9qFenQCvi+2wjH9a1nXqHww==", + "license": "MIT", + "optional": true + }, + "node_modules/long": { + "version": "5.3.2", + "resolved": 
"https://registry.npmjs.org/long/-/long-5.3.2.tgz", + "integrity": "sha512-mNAgZ1GmyNhD7AuqnTG3/VQ26o760+ZYBPKjPvugO8+nLbYfX6TVpJPseBvopbdY+qpZ/lKUnmEc1LeZYS3QAA==", + "license": "Apache-2.0", + "optional": true + }, "node_modules/math-intrinsics": { "version": "1.1.0", "resolved": "https://registry.npmjs.org/math-intrinsics/-/math-intrinsics-1.1.0.tgz", @@ -810,6 +944,27 @@ "node": ">= 0.6" } }, + "node_modules/node-fetch": { + "version": "2.7.0", + "resolved": "https://registry.npmjs.org/node-fetch/-/node-fetch-2.7.0.tgz", + "integrity": "sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A==", + "license": "MIT", + "optional": true, + "dependencies": { + "whatwg-url": "^5.0.0" + }, + "engines": { + "node": "4.x || >=6.0.0" + }, + "peerDependencies": { + "encoding": "^0.1.0" + }, + "peerDependenciesMeta": { + "encoding": { + "optional": true + } + } + }, "node_modules/object-inspect": { "version": "1.13.4", "resolved": "https://registry.npmjs.org/object-inspect/-/object-inspect-1.13.4.tgz", @@ -834,6 +989,38 @@ "node": ">= 0.8" } }, + "node_modules/onnxruntime-common": { + "version": "1.26.0", + "resolved": "https://registry.npmjs.org/onnxruntime-common/-/onnxruntime-common-1.26.0.tgz", + "integrity": "sha512-qVyMR4lcWgbkc4getFV+GQijsTnbg/siteoqcDwa3sI/LxbrMSNw4ePyvCq/ymdQaRomCA7YuWmhzsswxvymdw==", + "license": "MIT", + "optional": true + }, + "node_modules/onnxruntime-web": { + "version": "1.26.0", + "resolved": "https://registry.npmjs.org/onnxruntime-web/-/onnxruntime-web-1.26.0.tgz", + "integrity": "sha512-LbRr/8zZt2xilI2smrVQGGKINo0U46i8qJp+UXyMBGfqN7KjnH1BiwCwLwyNIVV4i9CKFv7Sf4PwLKWnT8/bEA==", + "license": "MIT", + "optional": true, + "dependencies": { + "flatbuffers": "^25.1.24", + "guid-typescript": "^1.0.9", + "long": "^5.2.3", + "onnxruntime-common": "1.26.0", + "platform": "^1.3.6", + "protobufjs": "^7.2.4" + } + }, + "node_modules/opencollective-postinstall": { + "version": "2.0.3", + "resolved": 
"https://registry.npmjs.org/opencollective-postinstall/-/opencollective-postinstall-2.0.3.tgz", + "integrity": "sha512-8AV/sCtuzUeTo8gQK5qDZzARrulB3egtLzFgteqB2tcT4Mw7B8Kt7JcDHmltjz6FOAHsvTevk70gZEbhM4ZS9Q==", + "license": "MIT", + "optional": true, + "bin": { + "opencollective-postinstall": "index.js" + } + }, "node_modules/parseurl": { "version": "1.3.3", "resolved": "https://registry.npmjs.org/parseurl/-/parseurl-1.3.3.tgz", @@ -862,6 +1049,38 @@ "@napi-rs/canvas": "^0.1.100" } }, + "node_modules/platform": { + "version": "1.3.6", + "resolved": "https://registry.npmjs.org/platform/-/platform-1.3.6.tgz", + "integrity": "sha512-fnWVljUchTro6RiCFvCXBbNhJc2NijN7oIQxbwsyL0buWJPG85v81ehlHI9fXrJsMNgTofEoWIQeClKpgxFLrg==", + "license": "MIT", + "optional": true + }, + "node_modules/protobufjs": { + "version": "7.6.1", + "resolved": "https://registry.npmjs.org/protobufjs/-/protobufjs-7.6.1.tgz", + "integrity": "sha512-4K0myLaWL5EteuSAro91EGFgcfVgxb64Jx+7oDAY6GOkXD4M69yuSEljNcInGVCA5sOPxmZ/EqDLj2x0Q0+Ygg==", + "hasInstallScript": true, + "license": "BSD-3-Clause", + "optional": true, + "dependencies": { + "@protobufjs/aspromise": "^1.1.2", + "@protobufjs/base64": "^1.1.2", + "@protobufjs/codegen": "^2.0.5", + "@protobufjs/eventemitter": "^1.1.1", + "@protobufjs/fetch": "^1.1.1", + "@protobufjs/float": "^1.0.2", + "@protobufjs/inquire": "^1.1.2", + "@protobufjs/path": "^1.1.2", + "@protobufjs/pool": "^1.1.0", + "@protobufjs/utf8": "^1.1.1", + "@types/node": ">=13.7.0", + "long": "^5.3.2" + }, + "engines": { + "node": ">=12.0.0" + } + }, "node_modules/proxy-addr": { "version": "2.0.7", "resolved": "https://registry.npmjs.org/proxy-addr/-/proxy-addr-2.0.7.tgz", @@ -914,6 +1133,13 @@ "node": ">= 0.8" } }, + "node_modules/regenerator-runtime": { + "version": "0.13.11", + "resolved": "https://registry.npmjs.org/regenerator-runtime/-/regenerator-runtime-0.13.11.tgz", + "integrity": "sha512-kY1AZVr2Ra+t+piVaJ4gxaFaReZVH40AKNo7UCX6W+dEwBo/2oZJzqfuN1qLq1oL45o56cPaTXELwrTh8Fpggg==", 
+ "license": "MIT", + "optional": true + }, "node_modules/safe-buffer": { "version": "5.2.1", "resolved": "https://registry.npmjs.org/safe-buffer/-/safe-buffer-5.2.1.tgz", @@ -1072,6 +1298,33 @@ "node": ">= 0.8" } }, + "node_modules/tesseract.js": { + "version": "5.1.1", + "resolved": "https://registry.npmjs.org/tesseract.js/-/tesseract.js-5.1.1.tgz", + "integrity": "sha512-lzVl/Ar3P3zhpUT31NjqeCo1f+D5+YfpZ5J62eo2S14QNVOmHBTtbchHm/YAbOOOzCegFnKf4B3Qih9LuldcYQ==", + "hasInstallScript": true, + "license": "Apache-2.0", + "optional": true, + "dependencies": { + "bmp-js": "^0.1.0", + "idb-keyval": "^6.2.0", + "is-electron": "^2.2.2", + "is-url": "^1.2.4", + "node-fetch": "^2.6.9", + "opencollective-postinstall": "^2.0.3", + "regenerator-runtime": "^0.13.3", + "tesseract.js-core": "^5.1.1", + "wasm-feature-detect": "^1.2.11", + "zlibjs": "^0.3.1" + } + }, + "node_modules/tesseract.js-core": { + "version": "5.1.1", + "resolved": "https://registry.npmjs.org/tesseract.js-core/-/tesseract.js-core-5.1.1.tgz", + "integrity": "sha512-KX3bYSU5iGcO1XJa+QGPbi+Zjo2qq6eBhNjSGR5E5q0JtzkoipJKOUQD7ph8kFyteCEfEQ0maWLu8MCXtvX5uQ==", + "license": "Apache-2.0", + "optional": true + }, "node_modules/toidentifier": { "version": "1.0.1", "resolved": "https://registry.npmjs.org/toidentifier/-/toidentifier-1.0.1.tgz", @@ -1081,6 +1334,13 @@ "node": ">=0.6" } }, + "node_modules/tr46": { + "version": "0.0.3", + "resolved": "https://registry.npmjs.org/tr46/-/tr46-0.0.3.tgz", + "integrity": "sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw==", + "license": "MIT", + "optional": true + }, "node_modules/type-is": { "version": "1.6.18", "resolved": "https://registry.npmjs.org/type-is/-/type-is-1.6.18.tgz", @@ -1094,6 +1354,13 @@ "node": ">= 0.6" } }, + "node_modules/undici-types": { + "version": "7.24.6", + "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-7.24.6.tgz", + "integrity": 
"sha512-WRNW+sJgj5OBN4/0JpHFqtqzhpbnV0GuB+OozA9gCL7a993SmU+1JBZCzLNxYsbMfIeDL+lTsphD5jN5N+n0zg==", + "license": "MIT", + "optional": true + }, "node_modules/unpipe": { "version": "1.0.0", "resolved": "https://registry.npmjs.org/unpipe/-/unpipe-1.0.0.tgz", @@ -1120,6 +1387,41 @@ "engines": { "node": ">= 0.8" } + }, + "node_modules/wasm-feature-detect": { + "version": "1.8.0", + "resolved": "https://registry.npmjs.org/wasm-feature-detect/-/wasm-feature-detect-1.8.0.tgz", + "integrity": "sha512-zksaLKM2fVlnB5jQQDqKXXwYHLQUVH9es+5TOOHwGOVJOCeRBCiPjwSg+3tN2AdTCzjgli4jijCH290kXb/zWQ==", + "license": "Apache-2.0", + "optional": true + }, + "node_modules/webidl-conversions": { + "version": "3.0.1", + "resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-3.0.1.tgz", + "integrity": "sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ==", + "license": "BSD-2-Clause", + "optional": true + }, + "node_modules/whatwg-url": { + "version": "5.0.0", + "resolved": "https://registry.npmjs.org/whatwg-url/-/whatwg-url-5.0.0.tgz", + "integrity": "sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw==", + "license": "MIT", + "optional": true, + "dependencies": { + "tr46": "~0.0.3", + "webidl-conversions": "^3.0.0" + } + }, + "node_modules/zlibjs": { + "version": "0.3.1", + "resolved": "https://registry.npmjs.org/zlibjs/-/zlibjs-0.3.1.tgz", + "integrity": "sha512-+J9RrgTKOmlxFSDHo0pI1xM6BLVUv+o0ZT9ANtCxGkjIVCCUdx9alUF8Gm+dGLKbkkkidWIHFDZHDMpfITt4+w==", + "license": "MIT", + "optional": true, + "engines": { + "node": "*" + } } } } diff --git a/package.json b/package.json index b2272af..a33cae6 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "trans2former", - "version": "2.2.0", + "version": "2.3.0", "description": "Trans2Former: Browser-first multi-format document transformer.", "type": "module", "main": "src/web-server.js", @@ -9,11 +9,14 @@ "web": "node 
src/web-server.js", "vendor:pdfjs": "node scripts/sync-pdfjs-vendor.js", "vendor:tesseract": "node scripts/sync-tesseract-vendor.js", - "release:prepare": "node scripts/sync-pdfjs-vendor.js && node scripts/sync-tesseract-vendor.js && node scripts/prepare-release.js", + "vendor:onnx": "node scripts/sync-onnxruntime-vendor.js", + "vendor:paddle": "node scripts/sync-paddleocr-vendor.js", + "samples:generate": "node scripts/generate-samples.js", + "release:prepare": "node scripts/sync-pdfjs-vendor.js && node scripts/sync-tesseract-vendor.js && node scripts/sync-onnxruntime-vendor.js && node scripts/sync-paddleocr-vendor.js && node scripts/prepare-release.js", "desktop:check": "node scripts/desktop-shell-test.js", "desktop:dev": "npm exec @tauri-apps/cli -- dev", "desktop:build": "npm exec @tauri-apps/cli -- build", - "test": "node scripts/smoke-test.js && node scripts/conversion-snapshot-test.js && node scripts/conversion-capability-audit-test.js && node scripts/product-matrix-docs-test.js && node scripts/conversion-quality-test.js && node scripts/format-integrity-test.js && node scripts/worker-payload-test.js && node scripts/browser-smoke-test.js && node scripts/workbench-queue-test.js && node scripts/desktop-shell-test.js && node scripts/local-security-test.js && node scripts/local-model-direction-test.js && node scripts/repair-engine-test.js && node scripts/model-cache-test.js && node scripts/ocr-baseline-test.js && node scripts/resource-budget-test.js && node scripts/p2-responsiveness-test.js && node scripts/p4-p5-p6-test.js && node scripts/p7-release-productization-test.js && node scripts/release-readiness-test.js" + "test": "node scripts/smoke-test.js && node scripts/conversion-snapshot-test.js && node scripts/conversion-capability-audit-test.js && node scripts/product-matrix-docs-test.js && node scripts/conversion-quality-test.js && node scripts/format-integrity-test.js && node scripts/worker-payload-test.js && node scripts/browser-smoke-test.js && node 
scripts/workbench-queue-test.js && node scripts/desktop-shell-test.js && node scripts/local-security-test.js && node scripts/local-model-direction-test.js && node scripts/repair-engine-test.js && node scripts/rule-diff-test.js && node scripts/ssim-verification-test.js && node scripts/ocr-readback-test.js && node scripts/sample-corpus-test.js && node scripts/paddle-ocr-pipeline-test.js && node scripts/paddle-ocr-integration-test.js && node scripts/ocr-structure-test.js && node scripts/latex-math-test.js && node scripts/model-cache-test.js && node scripts/ocr-baseline-test.js && node scripts/resource-budget-test.js && node scripts/p2-responsiveness-test.js && node scripts/p4-p5-p6-test.js && node scripts/p7-release-productization-test.js && node scripts/release-readiness-test.js" }, "keywords": [ "converter", @@ -38,6 +41,7 @@ }, "homepage": "https://github.com/Vantalens/Trans2Former#readme", "optionalDependencies": { + "onnxruntime-web": "^1.26.0", "pdfjs-dist": "^5.7.284", "tesseract.js": "^5.1.1" } diff --git a/public/app.js b/public/app.js index e88a539..7f6664a 100644 --- a/public/app.js +++ b/public/app.js @@ -8,6 +8,7 @@ import { renderPreviewHtml, toConversionDocumentModel, toDocumentModel, + ensurePaddleDefaultModels, } from "./browser-transformer.js"; import { normalizeConversionError } from "./core/conversion-error.js"; import { getPlainText } from "./core/document-model.js"; @@ -30,6 +31,7 @@ import { import { readBlobAsDecodedText } from "./core/text-decoding.js"; import { expandPdfContentForTextExtraction } from "./formats/pdf.js"; import { openPreview } from "./router.js"; +import { renderMathIn } from "./katex-render.js"; const inputContent = document.getElementById("inputContent"); const sourcePane = document.querySelector(".source-pane"); @@ -83,6 +85,15 @@ const clearResolvedWarningsButton = document.getElementById("clearResolvedWarnin const qualityReportList = document.getElementById("qualityReportList"); const diffSummary = 
document.getElementById("diffSummary"); const versionsList = document.getElementById("versionsList"); +const verificationReportPanel = document.getElementById("verificationReportPanel"); +const verificationReportBadge = document.getElementById("verificationReportBadge"); +const verificationRepair = document.getElementById("verificationRepair"); +const verificationRuleDiff = document.getElementById("verificationRuleDiff"); +const verificationSsim = document.getElementById("verificationSsim"); +const verificationOcrReadback = document.getElementById("verificationOcrReadback"); +const verificationOcrRecognition = document.getElementById("verificationOcrRecognition"); +const verificationOcrRecognitionRow = document.getElementById("verificationOcrRecognitionRow"); +const verificationWarnings = document.getElementById("verificationWarnings"); const securityCenterButton = document.getElementById("securityCenterButton"); const workbenchTabs = document.getElementById("workbenchTabs"); const wordCountEl = document.getElementById("wordCount"); @@ -130,6 +141,7 @@ let outputDirectoryLabel = "浏览器下载目录"; let sessionVersions = []; let outputVersionIndex = -1; let currentDocumentModel = null; +let currentConversionQuality = null; let currentOutputFormat = ""; let currentOutputMime = ""; let currentOutputType = "none"; @@ -551,6 +563,97 @@ function renderBottomReports(model = null, output = "") { updateWarningsResolvedControls(model); } +function describeRuleDiff(ruleDiff, verification) { + if (ruleDiff) { + const score = typeof ruleDiff.overallScore === "number" ? ruleDiff.overallScore.toFixed(3) : "-"; + const struct = `+${ruleDiff.addedBlocks?.length || 0}/-${ruleDiff.removedBlocks?.length || 0}/~${ruleDiff.changedBlocks?.length || 0}`; + return { state: ruleDiff.identical ? 
"ok" : "drift", text: `${ruleDiff.fidelity} · score ${score} · 块 ${struct}` }; + } + const skip = (verification?.skipped || []).find((entry) => entry.layer === "rule-diff"); + return { state: "skip", text: `跳过:${skip?.reason || "未触发"}` }; +} + +function describeSsim(ssim, verification) { + if (ssim) { + const score = typeof ssim.score === "number" ? ssim.score.toFixed(3) : "-"; + return { state: ssim.passed ? "ok" : "drift", text: `score ${score} (阈值 ${ssim.threshold}) · ${ssim.sourceFormat}→${ssim.outputFormat}` }; + } + const skip = (verification?.skipped || []).find((entry) => entry.layer === "ssim"); + return { state: "skip", text: `跳过:${skip?.reason || "未触发"}` }; +} + +function describeOcrReadback(ocrReadback, verification) { + if (ocrReadback) { + const f1 = typeof ocrReadback.f1 === "number" ? ocrReadback.f1.toFixed(3) : "-"; + const recall = typeof ocrReadback.recall === "number" ? ocrReadback.recall.toFixed(3) : "-"; + return { state: ocrReadback.passed ? "ok" : "drift", text: `f1 ${f1} · recall ${recall} (阈值 ${ocrReadback.threshold}) · ${ocrReadback.engineId}` }; + } + const skip = (verification?.skipped || []).find((entry) => entry.layer === "ocr-readback"); + return { state: "skip", text: `跳过:${skip?.reason || "未触发"}` }; +} + +function applyVerificationRow(node, descriptor) { + if (!node) return; + node.textContent = descriptor.text; + node.dataset.state = descriptor.state; +} + +function renderVerificationReport(quality = currentConversionQuality) { + if (!verificationReportPanel) return; + if (!quality || !quality.qualityReport) { + verificationReportPanel.hidden = true; + return; + } + const report = quality.qualityReport; + const verification = report.verification || { layers: [], skipped: [] }; + const autoRepair = quality.autoRepair || {}; + + const repairStatus = report.repairStatus || (autoRepair.attempted ? 
"verified" : "not-attempted"); + const finalDecision = report.finalDecision || autoRepair.finalDecision || "pending"; + applyVerificationRow(verificationRepair, { + state: finalDecision === "verified" ? "ok" : (finalDecision === "failed-quality-gate" ? "drift" : "skip"), + text: `${repairStatus} · 结论 ${finalDecision}`, + }); + + applyVerificationRow(verificationRuleDiff, describeRuleDiff(report.ruleDiff, verification)); + applyVerificationRow(verificationSsim, describeSsim(report.ssim, verification)); + applyVerificationRow(verificationOcrReadback, describeOcrReadback(report.ocrReadback, verification)); + + // OCR 识别质量(仅当本次转换跑了 OCR 识别才显示) + const modelReview = quality.modelReview || {}; + if (verificationOcrRecognitionRow) { + if (modelReview.ocr) { + const ocr = modelReview.ocr; + const q = modelReview.ocrQuality || {}; + const conf = typeof ocr.averageConfidence === "number" ? ocr.averageConfidence.toFixed(3) : "-"; + const parts = [`引擎 ${ocr.engine || "-"}`, `${ocr.lineCount ?? 0} 行`, `置信度 ${conf}`]; + if (q.grade) parts.push(`质量 ${q.grade}`); + if (q.lowConfidenceLines) parts.push(`低置信 ${q.lowConfidenceLines}`); + if (q.skewApplied) parts.push(`纠偏 ${q.skewApplied}°`); + if (q.rotatedLines) parts.push(`方向校正 ${q.rotatedLines}`); + if (q.denoised) parts.push("已去噪"); + const state = q.grade === "low" ? "drift" : (q.grade === "medium" ? "skip" : "ok"); + applyVerificationRow(verificationOcrRecognition, { state, text: parts.join(" · ") }); + verificationOcrRecognitionRow.hidden = false; + } else { + verificationOcrRecognitionRow.hidden = true; + } + } + + const severity = report.warningsBySeverity || {}; + const severityText = Object.keys(severity).length > 0 + ? Object.entries(severity).map(([level, count]) => `${level}:${count}`).join(" · ") + : "无"; + applyVerificationRow(verificationWarnings, { state: (report.downgradeCount || 0) > 0 ? 
"drift" : "ok", text: `${report.warningCount || 0} 条(${severityText})` }); + + const activeLayers = (verification.layers || []).length; + if (verificationReportBadge) { + verificationReportBadge.textContent = activeLayers > 0 ? `${activeLayers} 层已检验` : "未触发检验层"; + verificationReportBadge.dataset.state = activeLayers > 0 ? "ok" : "skip"; + } + verificationReportPanel.hidden = false; +} + function getOutputLineCount(output) { return String(output || "").split("\n").length; } @@ -610,6 +713,7 @@ function renderOutputPreview(content = "") { try { textOutputPreview.innerHTML = renderPreviewHtml(content, currentOutputFormat, currentFileName); + renderMathIn(textOutputPreview); outputPreviewNotice.hidden = true; outputPreviewNotice.textContent = ""; } catch (error) { @@ -860,6 +964,10 @@ function resetGeneratedOutput(metaMessage = "尚未生成") { currentOutputFormat = ""; currentOutputType = "none"; currentResolvedWarnings = new Set(); + currentConversionQuality = null; + if (verificationReportPanel) { + verificationReportPanel.hidden = true; + } textOutputPreview.textContent = ""; pdfPreview.removeAttribute("src"); downloadOutputButton.textContent = "下载输出"; @@ -1032,6 +1140,7 @@ function renderLargeDocumentPreview(rawContent, fileName = currentFileName) { `

当前仅渲染前 ${Math.min(model.blocks.length, LARGE_PREVIEW_BLOCK_LIMIT)} 个结构块,转换仍在 Worker 中完整执行。

`, ].join(""); htmlPreview.innerHTML = summary + renderPreviewHtml(previewContent, fromFormatSelect.value, fileName); + renderMathIn(htmlPreview); renderDocumentModelPanel({ ...model, blocks: model.blocks.slice(0, LARGE_PREVIEW_BLOCK_LIMIT), @@ -1071,6 +1180,7 @@ function renderPreview() { const model = toDocumentModel(content, fromFormatSelect.value, currentFileName); const bodyHtml = renderPreviewHtml(content, fromFormatSelect.value, currentFileName); htmlPreview.innerHTML = bodyHtml; + renderMathIn(htmlPreview); renderDocumentModelPanel(model); renderBottomReports(model); lastRenderedPayload = payloadKey; @@ -1230,11 +1340,16 @@ function releaseConversionResources() { } function convertWithWorker(payload) { + const fromFmt = String(payload?.from || "").toLowerCase(); + // OCR 适用输入(图片 / PDF)必须走主线程的异步管线:OCR 需要 canvas 解码图像 + + // onnxruntime 推理,这些在转换 Web Worker 里不可用。直接走 convertContentAsync—— + // 图片与扫描 PDF 触发 OCR;文本 PDF 仍走常规文本路径。 + if (fromFmt === "png" || fromFmt === "pdf") { + // 先确保随包 PP-OCRv5 模型已载入本地缓存(幂等),再跑异步转换/OCR。 + return ensurePaddleDefaultModels().catch(() => {}).then(() => convertInBrowserAsync(payload)); + } const worker = createConvertWorker(); if (!worker) { - if (String(payload?.from || "").toLowerCase() === "png") { - return Promise.resolve(convertInBrowserAsync(payload)); - } return Promise.resolve(convertInBrowser(payload)); } @@ -1304,6 +1419,7 @@ async function transformContent() { const result = await convertWithWorker({ content, from, to, title, fileName: currentFileName, options: { profile: markdownOutputProfile } }); const model = toConversionDocumentModel(content, from, to, currentFileName, currentFileName); currentDocumentModel = model; + currentConversionQuality = result.quality || null; renderDocumentModelPanel(model); currentOutputType = result.type; currentOutputFormat = result.format; @@ -1311,6 +1427,7 @@ async function transformContent() { clearOutputHistory(); updateOutputVersionControls(); renderBottomReports(model, result.type 
=== "text" ? result.data : ""); + renderVerificationReport(currentConversionQuality); if (result.type === "binary") { currentOutputType = "binary"; @@ -1672,3 +1789,7 @@ syncPdfPaperControl(); openPdfPreviewButton.disabled = true; if (openStandalonePreviewButton) openStandalonePreviewButton.disabled = true; updateConversionProgress({ stage: "idle", progress: 0 }); + +// 后台开箱即用加载随包 PP-OCRv5 模型(同源 vendor → 本地缓存),让高级 OCR 无需手动导入。 +// 失败/缺失静默(仍可经安全中心手动导入),不阻塞 UI。 +ensurePaddleDefaultModels().catch(() => {}); diff --git a/public/browser-transformer.js b/public/browser-transformer.js index e2ce542..b3f325d 100644 --- a/public/browser-transformer.js +++ b/public/browser-transformer.js @@ -1,4 +1,4 @@ -import { ConverterRegistry, getAllowedOutputFormats, normalizeFormat } from "./core/format-registry.js"; +import { ConverterRegistry, getAllowedOutputFormats, getKnownInputFormats, normalizeFormat } from "./core/format-registry.js"; import { readCsv, writeCsv } from "./formats/csv.js"; import { readDoc } from "./formats/doc.js"; import { readDocx } from "./formats/docx.js"; @@ -249,9 +249,61 @@ export function listFormats() { export { normalizeFormat }; export { getAllowedOutputFormats }; +export { getKnownInputFormats }; export { expandPdfContentForTextExtraction }; export { defaultRepairEngine, RepairEngine, MIN_CONFIDENCE } from "./core/repair-engine.js"; export { REPAIR_ACTION_TYPES, createRepairAction, validateRepairAction } from "./core/repair-actions.js"; +export { + ROUND_TRIP_FORMATS, + blockFingerprint, + modelFingerprint, + getBlockKey, + extractBlockFields, + BLOCK_FIELDS_BY_TYPE, +} from "./core/verification/block-fingerprint.js"; +export { + diffSemanticDocs, + MAJOR_WEIGHT, + MINOR_WEIGHT, + STRUCTURAL_PENALTY, +} from "./core/verification/rule-diff.js"; +export { + runVerificationStage, + runVerificationStageAsync, + runSsimLayer, + RULE_DIFF_DRIFT, + RULE_DIFF_READBACK_FAILED, + SSIM_VISUAL_DRIFT, + SSIM_SOURCE_UNAVAILABLE, + DEFAULT_SSIM_THRESHOLD, +} from 
"./core/verification/verification-stage.js"; +export { + computeSSIM, + compareImages, + rgbaToGrayscale, + resampleGrayscale, + SSIM_C1, + SSIM_C2, + DEFAULT_WINDOW_SIZE, + DEFAULT_TARGET_WIDTH, +} from "./core/verification/ssim.js"; +export { + defaultPageImageSource, + setPageImageSource, + resetPageImageSource, + RASTERIZABLE_FORMATS, + VERIFICATION_IMAGE_SOURCE_UNAVAILABLE, + VERIFICATION_IMAGE_SOURCE_FAILED, +} from "./core/verification/page-image-source.js"; +export { + compareText, + normalizeText, + extractModelText, + runOcrReadbackLayer, + OCR_READBACK_DRIFT, + OCR_READBACK_FAILED, + DEFAULT_OCR_READBACK_THRESHOLD, +} from "./core/verification/ocr-readback.js"; export { MODEL_MANIFEST_SCHEMA_VERSION, MODEL_TASKS, @@ -291,6 +343,7 @@ export { } from "./core/model-cache/ui-text.js"; import "./core/ocr/ocr-bootstrap.js"; import "./core/ocr/tesseract-bootstrap.js"; +import "./core/ocr/paddle-ocr-bootstrap.js"; export { OCR_RESULT_SCHEMA_VERSION, OCR_LANGUAGES, @@ -324,6 +377,42 @@ export { markTesseractVendorReady, } from "./core/ocr/tesseract-engine.js"; export { ensureTesseractBootstrap } from "./core/ocr/tesseract-bootstrap.js"; +export { + paddleOcrEngine, + PADDLE_OCR_MANIFEST_ID, + PADDLE_OCR_MODEL_FILES, + PADDLE_OCR_REQUIRED_FILES, + markPaddleOcrVendorReady, +} from "./core/ocr/paddle-ocr-engine.js"; +export { ensurePaddleOcrBootstrap } from "./core/ocr/paddle-ocr-bootstrap.js"; +export { ensurePaddleDefaultModels } from "./core/ocr/paddle-default-models.js"; +export { + loadOnnxRuntime, + pickExecutionProviders, + createOcrSession, + disposeOcrSession, + resetOnnxRuntimeCache, + PADDLE_VENDOR_PATHS, +} from "./core/ocr/paddle-ocr-runtime.js"; +export { + runPaddlePipeline, + parseCharDictionary, + preprocessForDetection, + preprocessForRecognition, + dbPostProcess, + ctcGreedyDecode, + cropImageData, + resizeRgba, + rotateImageData90, + rotateImageData180, + rotateImageDataByAngle, + estimateSkewAngle, + interpretClsOutput, + denoiseImageData, + 
estimateNoiseLevel, + DET_LIMIT_SIDE_LEN, + REC_IMAGE_HEIGHT, +} from "./core/ocr/paddle-ocr-pipeline.js"; export { InMemoryStorage, createIndexedDBStorage, @@ -339,6 +428,7 @@ export { disposeWorker, } from "./core/ocr/tesseract-runtime.js"; export { enhanceWithOCR } from "./core/ocr/png-ocr.js"; +export { deriveOcrStructure, blocksFromOcrResult } from "./core/ocr/ocr-structure.js"; export { runOCRStage, getDefaultOCRLanguage } from "./core/ocr/ocr-stage.js"; export { detectOCRLowConfidence } from "./core/ocr/ocr-validator.js"; export { diff --git a/public/core/format-registry.js b/public/core/format-registry.js index 36707ac..e3a9f1e 100644 --- a/public/core/format-registry.js +++ b/public/core/format-registry.js @@ -1,6 +1,7 @@ import { ConversionError } from "./conversion-error.js"; import { ensureDocumentAudit } from "./document-audit.js"; import { defaultRepairEngine } from "./repair-engine.js"; +import { runVerificationStage, runVerificationStageAsync } from "./verification/verification-stage.js"; import { createWarning, withWarnings } from "./warnings.js"; const FORMAT_ALIASES = { @@ -438,7 +439,8 @@ export class ConverterRegistry { }; } - _wrapWithRepairCycle({ model, output, ctx, content, fromFormat, toFormat, fileName, options }) { + // 跑 Repair Engine cycle,返回 { earlyReturn } 或 { cycle, effectiveTo, auditedModel }。 + _runRepairCycle({ model, output, ctx, content, fromFormat, toFormat, fileName, options }) { let cycle; try { cycle = defaultRepairEngine.runCycle({ model, output, ctx }); @@ -461,35 +463,75 @@ export class ConverterRegistry { options, }); return { - ...output, - quality: { - qualityReport: audited.metadata?.qualityReport || null, - modelReview: null, - autoRepair: { attempted: false, error: error?.code || "unknown", finalDecision: "failed-quality-gate" }, - conversion: audited.metadata?.conversion || null, + earlyReturn: { + ...output, + quality: { + qualityReport: audited.metadata?.qualityReport || null, + modelReview: null, + autoRepair: 
{ attempted: false, error: error?.code || "unknown", finalDecision: "failed-quality-gate" }, + conversion: audited.metadata?.conversion || null, + }, }, }; } - const finalModel = ensureDocumentAudit({ + const effectiveTo = cycle.autoRepair?.fallbackUsed ? (cycle.autoRepair.fallbackTo || toFormat) : toFormat; + // Repair Engine 写自己的 modelReview,但要保留上游 OCR stage 记下的 ocr / ocrQuality + // 子对象(否则识别质量数据会被覆盖丢失,UI 无法展示)。 + const priorReview = cycle.model.metadata?.modelReview || {}; + const mergedModelReview = { + ...cycle.modelReview, + ...(priorReview.ocr ? { ocr: priorReview.ocr } : {}), + ...(priorReview.ocrQuality ? { ocrQuality: priorReview.ocrQuality } : {}), + }; + const auditedModel = ensureDocumentAudit({ ...cycle.model, metadata: { ...(cycle.model.metadata || {}), autoRepair: cycle.autoRepair, - modelReview: cycle.modelReview, + modelReview: mergedModelReview, }, }, { content, reader: fromFormat, - writer: cycle.autoRepair?.fallbackUsed ? (cycle.autoRepair.fallbackTo || toFormat) : toFormat, - targetFormat: cycle.autoRepair?.fallbackUsed ? (cycle.autoRepair.fallbackTo || toFormat) : toFormat, + writer: effectiveTo, + targetFormat: effectiveTo, fileName, options, }); + return { cycle, effectiveTo, auditedModel }; + } + + // 给定 Repair cycle 结果 + verification envelope,组装最终 quality 返回值。 + _assembleQuality({ cycle, effectiveTo, auditedModel, verification, content, fromFormat, fileName, options }) { + const finalModel = verification.warnings.length > 0 + ? ensureDocumentAudit({ + ...auditedModel, + metadata: withWarnings(auditedModel.metadata || {}, verification.warnings), + }, { + content, + reader: fromFormat, + writer: effectiveTo, + targetFormat: effectiveTo, + fileName, + options, + }) + : auditedModel; + const baseQualityReport = finalModel.metadata?.qualityReport || {}; const qualityReport = { ...baseQualityReport, repairStatus: cycle.autoRepair?.attempted ? 
"verified" : "not-attempted", finalDecision: cycle.autoRepair?.finalDecision || "pending", + ruleDiff: verification.ruleDiff, + ssim: verification.ssim ?? null, + ocrReadback: verification.ocrReadback ?? null, + verification: { + eligible: verification.eligible, + reason: verification.reason, + layers: verification.layers, + skipped: verification.skipped, + runtimeMs: verification.runtimeMs, + }, }; return { ...cycle.output, @@ -502,6 +544,24 @@ export class ConverterRegistry { }; } + // 同步路径:Repair cycle + 同步验证阶段(仅 rule-diff 层)。 + _wrapWithRepairCycle({ model, output, ctx, content, fromFormat, toFormat, fileName, options }) { + const cycleResult = this._runRepairCycle({ model, output, ctx, content, fromFormat, toFormat, fileName, options }); + if (cycleResult.earlyReturn) return cycleResult.earlyReturn; + const { cycle, effectiveTo, auditedModel } = cycleResult; + const verification = runVerificationStage({ model: auditedModel, output: cycle.output, ctx }); + return this._assembleQuality({ cycle, effectiveTo, auditedModel, verification, content, fromFormat, fileName, options }); + } + + // 异步路径:Repair cycle + 异步验证阶段(rule-diff + SSIM 视觉回环)。 + async _wrapWithRepairCycleAsync({ model, output, ctx, content, fromFormat, toFormat, fileName, options }) { + const cycleResult = this._runRepairCycle({ model, output, ctx, content, fromFormat, toFormat, fileName, options }); + if (cycleResult.earlyReturn) return cycleResult.earlyReturn; + const { cycle, effectiveTo, auditedModel } = cycleResult; + const verification = await runVerificationStageAsync({ model: auditedModel, output: cycle.output, ctx }); + return this._assembleQuality({ cycle, effectiveTo, auditedModel, verification, content, fromFormat, fileName, options }); + } + convert({ content, from, to, title = "document", fileName = "", options = {} }) { const fromFormat = normalizeFormat(from); const toFormat = normalizeFormat(to); @@ -545,6 +605,6 @@ export class ConverterRegistry { return output; } const ctx = 
this._buildRepairCtx({ content, fromFormat, toFormat, title, fileName, options }); - return this._wrapWithRepairCycle({ model, output, ctx, content, fromFormat, toFormat, fileName, options }); + return this._wrapWithRepairCycleAsync({ model, output, ctx, content, fromFormat, toFormat, fileName, options }); } } diff --git a/public/core/models/semantic-inlines.js b/public/core/models/semantic-inlines.js index c23cc5e..e1fe126 100644 --- a/public/core/models/semantic-inlines.js +++ b/public/core/models/semantic-inlines.js @@ -64,6 +64,11 @@ export function createInlineFootnoteRef(id) { return { type: "footnoteRef", id: String(id ?? "") }; } +// LaTeX 数学:{ type:"math", value: 原始 tex, display: 块级? }。内容逐字保留,不转义、不递归。 +export function createInlineMath(value, display = false) { + return { type: "math", value: String(value ?? ""), display: Boolean(display) }; +} + export function normalizeInlines(input) { if (input === null || input === undefined) return []; if (typeof input === "string") { @@ -85,6 +90,10 @@ export function inlinesToPlainText(inlines) { .map((node) => { if (!node || typeof node !== "object") return ""; if (node.type === "text" || node.type === "code") return String(node.value ?? ""); + if (node.type === "math") { + const d = node.display ? "$$" : "$"; + return `${d}${String(node.value ?? "")}${d}`; + } if (node.type === "linebreak") return "\n"; if (Array.isArray(node.inlines)) return inlinesToPlainText(node.inlines); return ""; @@ -101,6 +110,10 @@ export function inlinesToMarkdown(inlines) { if (node.type === "em") return `*${inlinesToMarkdown(node.inlines)}*`; if (node.type === "del") return `~~${inlinesToMarkdown(node.inlines)}~~`; if (node.type === "code") return `\`${String(node.value ?? "").replace(/`/g, "\\`")}\``; + if (node.type === "math") { + const d = node.display ? "$$" : "$"; + return `${d}${String(node.value ?? 
"")}${d}`; + } if (node.type === "link") { const inner = inlinesToMarkdown(node.inlines) || node.href || ""; const title = node.title ? ` "${node.title.replace(/"/g, '\\"')}"` : ""; @@ -123,6 +136,12 @@ export function inlinesToHtml(inlines) { if (node.type === "em") return `${inlinesToHtml(node.inlines)}`; if (node.type === "del") return `${inlinesToHtml(node.inlines)}`; if (node.type === "code") return `${escapeHtmlInline(node.value)}`; + if (node.type === "math") { + // 原始 tex 存 data-tex(客户端 KaTeX 渲染);span 文本是带定界符的 tex 作为无 JS 兜底。 + const tex = String(node.value ?? ""); + const d = node.display ? "$$" : "$"; + return `${escapeHtmlInline(`${d}${tex}${d}`)}`; + } if (node.type === "link") { const inner = inlinesToHtml(node.inlines); const titleAttr = node.title ? ` title="${escapeHtmlInline(node.title)}"` : ""; diff --git a/public/core/ocr/ocr-engine.js b/public/core/ocr/ocr-engine.js index b203f0a..084ebdf 100644 --- a/public/core/ocr/ocr-engine.js +++ b/public/core/ocr/ocr-engine.js @@ -98,14 +98,20 @@ export class OCREngineRegistry { pickForTask(task) { const candidates = this.list().filter((engine) => engine.taskCapabilities.includes(task)); if (candidates.length === 0) return null; - const available = candidates.find((engine) => { + // 优先级感知:在候选中按 priority 降序挑第一个 available(priority 缺省 0)。 + // 这样高级引擎(如 PP-OCRv5 priority=20)可用时优先于 tesseract(10)/ placeholder(0)。 + const isAvail = (engine) => { try { return engine.isAvailable() === true; } catch (error) { return false; } - }); + }; + const priorityOf = (engine) => Number(engine.priority) || 0; + const byPriority = [...candidates].sort((a, b) => priorityOf(b) - priorityOf(a)); + const available = byPriority.find(isAvail); if (available) return available; + // 无可用引擎:回退到最后注册的候选(行为不变,仅作为"不可用"代表)。 return candidates[candidates.length - 1]; } diff --git a/public/core/ocr/ocr-structure.js b/public/core/ocr/ocr-structure.js new file mode 100644 index 0000000..2656d02 --- /dev/null +++ b/public/core/ocr/ocr-structure.js @@ 
-0,0 +1,119 @@ +// OCR 版面结构推断(格式识别增强):把识别到的文本行(带 bbox)按阅读顺序归并成 +// 标题 + 段落,而不是平铺成一个大段。用相对字号(行高)判定标题,用行间垂直间距分段。 +// 纯函数,可测;无 bbox 几何信息时优雅回退(每行一段 / 单段)。 + +import { createParagraph, createHeading } from "../document-model.js"; + +function median(values) { + if (values.length === 0) return 0; + const sorted = [...values].sort((a, b) => a - b); + const mid = sorted.length >> 1; + return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2; +} + +const CJK = /[㐀-鿿豈-﫿぀-ヿ가-힯]/; + +function isCjk(ch) { + return typeof ch === "string" && CJK.test(ch); +} + +// 同段相邻行拼接:两侧都是 CJK 则直连(无空格),否则空格分隔。 +function joinLines(prevText, lineText) { + if (!prevText) return lineText; + const a = prevText[prevText.length - 1]; + const b = lineText[0]; + return isCjk(a) && isCjk(b) ? prevText + lineText : `${prevText} ${lineText}`; +} + +function headingLevel(ratio) { + if (ratio >= 2) return 1; + if (ratio >= 1.6) return 2; + return 3; +} + +// lines: [{ text, bbox:{x,y,w,h} }](confidence 可选)。返回 block 数组(heading/paragraph)。 +export function deriveOcrStructure(lines, { + headingRatio = 1.35, + paraGapRatio = 0.7, +} = {}) { + const usable = (lines || []).filter((l) => l && typeof l.text === "string" && l.text.trim().length > 0); + if (usable.length === 0) return []; + + const withBox = usable.filter((l) => l.bbox && Number.isFinite(l.bbox.h) && l.bbox.h > 0); + // 几何信息不足(如桩引擎无 bbox):无法结构化,回退为单一段落(保持旧行为)。 + if (withBox.length < 2) { + return [createParagraph(usable.map((l) => l.text.trim()).join("\n"))]; + } + + const sorted = [...withBox].sort((a, b) => (a.bbox.y - b.bbox.y) || (a.bbox.x - b.bbox.x)); + const medianHeight = median(sorted.map((l) => l.bbox.h)) || 1; + const gapThreshold = medianHeight * paraGapRatio; + + const blocks = []; + let para = ""; + let prevBottom = null; + + const flushPara = () => { + if (para.trim().length > 0) blocks.push(createParagraph(para.trim())); + para = ""; + }; + + for (const line of sorted) { + const text = line.text.trim(); + const 
ratio = line.bbox.h / medianHeight; + const top = line.bbox.y; + + if (ratio >= headingRatio) { + // 标题:独立成块 + flushPara(); + blocks.push(createHeading(headingLevel(ratio), text)); + prevBottom = line.bbox.y + line.bbox.h; + continue; + } + + const bigGap = prevBottom !== null && (top - prevBottom) > gapThreshold; + if (bigGap) flushPara(); + para = joinLines(para, text); + prevBottom = line.bbox.y + line.bbox.h; + } + flushPara(); + + return blocks; +} + +// 为一批 OCR 行回填它们所属块的 id。OCR 追加块在进入 repair cycle 前不会被 document-audit +// 赋 id,故调用方需先给这些块预赋稳定 id(如 "ocr-block-<绝对索引>")再调用本函数。 +// 匹配用「修剪文本包含」:块文本由若干行 trim 后拼接而成,每行 trim 文本必是某块文本的子串。 +// 单调游标处理「多行→一块」(无 bbox 回退、CJK 拼接段落);回卷一遍兼容扫描 PDF 的阅读顺序 +// 重排(lines 顺序 ≠ 块顺序)。空/空白行不产块 → 返回 ""。 +export function mapLinesToBlockIds(lines, blocks) { + const list = Array.isArray(lines) ? lines : []; + const blk = Array.isArray(blocks) ? blocks : []; + let cursor = 0; + return list.map((line) => { + const t = (line?.text || "").trim(); + if (!t) return ""; + for (let j = cursor; j < blk.length; j += 1) { + const bt = (blk[j]?.text || "").trim(); + if (bt && bt.includes(t)) { cursor = j; return blk[j]?.id || ""; } + } + for (let j = 0; j < cursor; j += 1) { + const bt = (blk[j]?.text || "").trim(); + if (bt && bt.includes(t)) return blk[j]?.id || ""; + } + return ""; + }); +} + +// 从 OCRResult 各页推断结构块(按页顺序拼接)。 +export function blocksFromOcrResult(result, options = {}) { + const pages = Array.isArray(result?.pages) ? 
result.pages : []; + const blocks = []; + for (const page of pages) { + blocks.push(...deriveOcrStructure(page?.lines || [], options)); + } + if (blocks.length === 0 && typeof result?.fullText === "string" && result.fullText.trim().length > 0) { + blocks.push(createParagraph(result.fullText.trim())); + } + return blocks; +} diff --git a/public/core/ocr/paddle-default-models.js b/public/core/ocr/paddle-default-models.js new file mode 100644 index 0000000..2519743 --- /dev/null +++ b/public/core/ocr/paddle-default-models.js @@ -0,0 +1,82 @@ +// 浏览器端开箱即用:把随应用打包的 PP-OCRv5 模型(public/vendor/paddleocr/ 同源) +// 自动载入 OCR 本地缓存(IndexedDB),让高级 OCR 无需手动导入即可用。仅 fetch 同源 +// vendor 资源,不联网、不上传。vendor 缺失时静默跳过(仍可经安全中心手动导入)。 + +import { defaultOCRStorage } from "./ocr-storage.js"; +import { paddleOcrEngine, markPaddleOcrVendorReady, PADDLE_OCR_REQUIRED_FILES } from "./paddle-ocr-engine.js"; + +const VENDOR_BASE = "/vendor/paddleocr/"; +const STORAGE_PREFIX = "paddleocr/v5/"; +const DICT_FILE = "dict.txt"; +// cls(方向分类)可选:vendor 缺失时静默跳过,不拖垮 det/rec 的随包载入。 +const OPTIONAL_FILE = "cls.onnx"; + +let inflight = null; + +function isBrowser() { + return typeof globalThis !== "undefined" + && typeof globalThis.fetch === "function" + && typeof globalThis.document === "object"; +} + +async function fetchToBuffer(url) { + const response = await globalThis.fetch(url); + if (!response.ok) throw new Error(`${url} -> HTTP ${response.status}`); + return response.arrayBuffer(); +} + +async function alreadyLoaded() { + for (const file of PADDLE_OCR_REQUIRED_FILES) { + if (!(await defaultOCRStorage.has(`${STORAGE_PREFIX}${file}`))) return false; + } + return true; +} + +// 幂等:若 vendor 模型已在缓存则只置位;否则 fetch 同源 vendor → 写入缓存 → markReady → probe。 +export async function ensurePaddleDefaultModels() { + if (!isBrowser()) return { loaded: false, reason: "not-browser" }; + if (inflight) return inflight; + inflight = (async () => { + try { + if (await alreadyLoaded()) { + markPaddleOcrVendorReady(true); + await 
paddleOcrEngine.ensureProbe(); + return { loaded: true, reason: "cached" }; + } + // 先确认 vendor 真的随包提供(HEAD det),缺失则跳过(不报错,留给手动导入)。 + const probe = await globalThis.fetch(`${VENDOR_BASE}det.onnx`, { method: "HEAD" }); + if (!probe.ok) return { loaded: false, reason: "vendor-absent" }; + + for (const file of PADDLE_OCR_REQUIRED_FILES) { + const buffer = await fetchToBuffer(`${VENDOR_BASE}${file}`); + await defaultOCRStorage.put(`${STORAGE_PREFIX}${file}`, buffer, { source: "vendor-bundle" }); + } + // 方向分类 cls(可选):vendor 缺失不致命;缺它管线跳过 180° 校正。 + try { + const cls = await fetchToBuffer(`${VENDOR_BASE}${OPTIONAL_FILE}`); + await defaultOCRStorage.put(`${STORAGE_PREFIX}${OPTIONAL_FILE}`, cls, { source: "vendor-bundle" }); + } catch (clsError) { + // cls 缺失不致命。 + } + // 字典(可选) + try { + const dict = await fetchToBuffer(`${VENDOR_BASE}${DICT_FILE}`); + await defaultOCRStorage.put(`${STORAGE_PREFIX}${DICT_FILE}`, dict, { source: "vendor-bundle" }); + } catch (dictError) { + // 字典缺失不致命;识别可降级。 + } + + markPaddleOcrVendorReady(true); + await paddleOcrEngine.ensureProbe(); + return { loaded: true, reason: "fetched" }; + } catch (error) { + return { loaded: false, reason: `error:${error?.message || error}` }; + } finally { + // Nothing to reset here: a failed attempt is cleared after the await below + // (inflight = null), so a later call can retry once the vendor files are in place. + } + })(); + const result = await inflight; + if (!result.loaded) inflight = null; // 失败可重试;成功保留以幂等短路 + return result; +} diff --git a/public/core/ocr/paddle-ocr-bootstrap.js b/public/core/ocr/paddle-ocr-bootstrap.js new file mode 100644 index 0000000..76441c0 --- /dev/null +++ b/public/core/ocr/paddle-ocr-bootstrap.js @@ -0,0 +1,62 @@ +import { + defaultModelCache, + STATUS_NOT_DOWNLOADED, +} from "../model-cache/availability.js"; +import { createModelManifest } from "../model-cache/manifest.js"; +import { defaultOCRRegistry } from "./ocr-engine.js"; +import { ensureTesseractBootstrap } from "./tesseract-bootstrap.js"; +import { paddleOcrEngine, PADDLE_OCR_MANIFEST_ID, 
PADDLE_OCR_MODEL_FILES } from "./paddle-ocr-engine.js"; + +let bootstrapped = false; + +export function ensurePaddleOcrBootstrap() { + if (bootstrapped) return; + bootstrapped = true; + + // Register after tesseract so the registration order is placeholder → tesseract → paddle; + // when no engine is available, pickForTask falls back to paddle (the most advanced engine). + // When engines are available, pickForTask prefers the highest priority (paddle 20 over tesseract 10). + ensureTesseractBootstrap(); + + if (!defaultOCRRegistry.has(paddleOcrEngine.id)) { + defaultOCRRegistry.register(paddleOcrEngine); + } + + if (!defaultModelCache.has(PADDLE_OCR_MANIFEST_ID)) { + const perFile = {}; + for (const file of PADDLE_OCR_MODEL_FILES) perFile[file] = "0".repeat(64); + const manifest = createModelManifest({ + manifestId: PADDLE_OCR_MANIFEST_ID, + task: "ocr-text", + engine: "paddleocr", + modelVersion: "v5", + bundleSize: 16 * 1024 * 1024, + quantization: "int8", + minMemoryMB: 512, + sources: [ + { kind: "on-demand-download", path: "model-cache/ocr-text/paddleocr/v5/ (det/cls/rec onnx)" }, + { kind: "user-provided", path: "PP-OCRv5 ONNX via 安全中心 → 下载/导入" }, + ], + checksums: { + algorithm: "SHA-256", + digest: "0".repeat(64), + perFile, + }, + fallback: { + onFailure: "use-degraded-route", + message: "PP-OCRv5 ONNX 占位 manifest;onnxruntime-web 运行时接入留给 P9-D.2,模型按需下载留给 P9-D.3。", + }, + ui: { + label: "PP-OCRv5 高级 OCR (ONNX/WebGPU)", + description: "比 Tesseract 更高精度的本地 OCR;ONNX Runtime + WebGPU(WASM 回退),数据留在本地、零云端。", + enableHint: "首次启用时按需下载 PP-OCRv5 det/cls/rec ONNX 模型到本地缓存,SHA-256 校验通过后激活。", + }, + }); + defaultModelCache.register(manifest); + defaultModelCache.setStatus(PADDLE_OCR_MANIFEST_ID, STATUS_NOT_DOWNLOADED, { + message: "等待 P9-D.2/P9-D.3 接入 onnxruntime-web 与按需下载;高级 OCR engine 已登记但运行时尚未就位。", + }); + } +} + +ensurePaddleOcrBootstrap(); diff --git a/public/core/ocr/paddle-ocr-engine.js b/public/core/ocr/paddle-ocr-engine.js new file mode 100644 index 0000000..ca05614 --- /dev/null +++ b/public/core/ocr/paddle-ocr-engine.js @@ -0,0 +1,171 @@ +// PP-OCRv5 高级 OCR 引擎骨架(P9-D.1)。比 tesseract 更高精度的本地 
OCR engine, +// running on ONNX Runtime with WebGPU (WASM fallback); data stays local, zero cloud. P9-D.1 delivered the skeleton, the contract, and +// the three-stage rejection paths; the onnxruntime vendor loader and real inference followed in P9-D.2 / P9-D.2.b (see recognize), and on-demand model download is left to P9-D.3. + +import { ConversionError } from "../conversion-error.js"; +import { defaultOCRStorage } from "./ocr-storage.js"; +import { + OCR_ENGINE_FAILED, + OCR_UNAVAILABLE, +} from "./ocr-warnings.js"; +import { loadOnnxRuntime, pickExecutionProviders, createOcrSession, disposeOcrSession } from "./paddle-ocr-runtime.js"; +import { runPaddlePipeline, parseCharDictionary } from "./paddle-ocr-pipeline.js"; + +const DICT_KEY = "paddleocr/v5/dict.txt"; + +// In the browser, decode image (a dataURL or same-origin blob URL) into RGBA ImageData. fetch is not used (this honors +// the local no-network gate); Image + canvas are used instead. Node has no document, so this throws (recognize already rejects at the +// loadOnnxRuntime stage, so execution never reaches here). +async function decodeImageToImageData(image) { + if (typeof globalThis.document?.createElement !== "function" || typeof globalThis.Image !== "function") { + throw new ConversionError("The current runtime cannot decode images (document/Image is missing).", { + category: "convert", + code: OCR_ENGINE_FAILED, + details: { engineId: "paddleocr-v5", reason: "image-decode-unavailable" }, + }); + } + const img = new globalThis.Image(); + img.src = image; + if (typeof img.decode === "function") { + await img.decode(); + } else { + await new Promise((resolve, reject) => { + img.onload = resolve; + img.onerror = () => reject(new Error("image load failed")); + }); + } + const width = img.naturalWidth || img.width; + const height = img.naturalHeight || img.height; + const canvas = globalThis.document.createElement("canvas"); + canvas.width = width; + canvas.height = height; + const ctx = canvas.getContext("2d"); + ctx.drawImage(img, 0, 0, width, height); + const { data } = ctx.getImageData(0, 0, width, height); + return { data, width, height }; +} + +export const PADDLE_OCR_MANIFEST_ID = "ocr-text.paddleocr.v5"; + +// PP-OCRv5 ONNX models (detection / orientation classification / recognition). The full set is for UI listing and cleanup; availability depends only on the required set. +const MODEL_KEY_PREFIX = "paddleocr/v5/"; +export const PADDLE_OCR_MODEL_FILES = 
Object.freeze(["det.onnx", "cls.onnx", "rec.onnx"]); +// Required set: det (DB detection) and rec (CTC recognition). cls (orientation classification) is optional: the pipeline already tolerates its +// absence at runtime (the 180° correction is skipped when clsSession is null), so it is not part of the availability gate. +export const PADDLE_OCR_REQUIRED_FILES = Object.freeze(["det.onnx", "rec.onnx"]); + +// Readiness lives in a module-level mutable variable rather than an instance property of the frozen object (in strict mode, ensureProbe +// cannot assign to a frozen object). The engine object itself can still be passed through Object.freeze to guard against outside tampering. +let modelsReady = false; + +function vendorReady() { + return Boolean(globalThis.__t2fPaddleOcrVendorReady); +} + +async function hasAllModels(storage) { + for (const file of PADDLE_OCR_REQUIRED_FILES) { + if (!(await storage.has(`${MODEL_KEY_PREFIX}${file}`))) return false; + } + return true; +} + +export const paddleOcrEngine = Object.freeze({ + id: "paddleocr-v5", + taskCapabilities: ["ocr-text", "ocr-layout"], + manifestId: PADDLE_OCR_MANIFEST_ID, + // Advanced engine: when available, pickForTask prefers it over tesseract (10) / placeholder (0). + priority: 20, + + // Consistent with OCREngineRegistry: isAvailable is synchronous, and ensureProbe() refreshes the cached state before recognize. + // Until the vendor runtime and the models are in place (the P9-D.1 state), this always returns false. + isAvailable() { + if (!vendorReady()) return false; + return Boolean(modelsReady); + }, + + _storage: defaultOCRStorage, + + async ensureProbe() { + if (!vendorReady()) { + modelsReady = false; + return false; + } + modelsReady = await hasAllModels(this._storage); + return modelsReady; + }, + + async recognize({ image, options } = {}) { + if (!vendorReady()) { + throw new ConversionError( + "PP-OCRv5 ONNX runtime is not in place, so advanced OCR cannot run. onnxruntime-web vendor integration is left to P9-D.2.", + { + category: "convert", + code: OCR_UNAVAILABLE, + details: { engineId: "paddleocr-v5", manifestId: PADDLE_OCR_MANIFEST_ID, reason: "vendor-not-ready" }, + }, + ); + } + if (!(await hasAllModels(this._storage))) { + throw new ConversionError( + "Required PP-OCRv5 models (det/rec) were not found in the local cache; import or download them in Security Center first (the cls orientation classifier is optional).", + { + category: "convert", + code: OCR_UNAVAILABLE, + details: { engineId: "paddleocr-v5", manifestId: PADDLE_OCR_MANIFEST_ID, reason: "model-missing" }, + }, + ); + } + if (!image) { + throw new ConversionError("OCR 
input image is missing.", { + category: "validate", + code: OCR_ENGINE_FAILED, + details: { engineId: "paddleocr-v5", reason: "missing-image" }, + }); + } + // P9-D.2.b: the real inference pipeline. On Node or without the vendor, loadOnnxRuntime throws OCR_VENDOR_LOAD_FAILED. + const ort = await loadOnnxRuntime(); + const providers = pickExecutionProviders(); + let detSession = null; + let clsSession = null; + let recSession = null; + try { + const imageData = await decodeImageToImageData(image); + const [detBuf, clsBuf, recBuf] = await Promise.all([ + this._storage.get(`${MODEL_KEY_PREFIX}det.onnx`), + this._storage.get(`${MODEL_KEY_PREFIX}cls.onnx`), + this._storage.get(`${MODEL_KEY_PREFIX}rec.onnx`), + ]); + const dictBuf = await this._storage.get(DICT_KEY); + const dictionary = dictBuf + ? parseCharDictionary(new TextDecoder().decode(dictBuf instanceof ArrayBuffer ? new Uint8Array(dictBuf) : dictBuf)) + : []; + detSession = await createOcrSession({ ort, modelBuffer: detBuf, providers }); + clsSession = clsBuf ? await createOcrSession({ ort, modelBuffer: clsBuf, providers }) : null; + recSession = await createOcrSession({ ort, modelBuffer: recBuf, providers }); + return await runPaddlePipeline({ + ort, + detSession, + clsSession, + recSession, + imageData, + dictionary, + options: options || {}, + }); + } catch (error) { + if (error instanceof ConversionError) throw error; + throw new ConversionError(`PP-OCRv5 inference failed: ${error?.message || error}`, { + category: "convert", + code: OCR_ENGINE_FAILED, + details: { engineId: "paddleocr-v5", reason: "inference-failed", providers, cause: String(error?.name || error?.message || "unknown") }, + }); + } finally { + await disposeOcrSession(detSession); + await disposeOcrSession(clsSession); + await disposeOcrSession(recSession); + } + }, +}); + +export function markPaddleOcrVendorReady(ready = true) { + globalThis.__t2fPaddleOcrVendorReady = Boolean(ready); +} diff --git a/public/core/ocr/paddle-ocr-pipeline.js b/public/core/ocr/paddle-ocr-pipeline.js new file mode 100644 index 
0000000..e31296b --- /dev/null +++ b/public/core/ocr/paddle-ocr-pipeline.js @@ -0,0 +1,601 @@ +// PP-OCRv5 inference pipeline (P9-D.2.b): three ONNX forward passes, detection (det), orientation (cls), and recognition (rec), plus +// classic pre/post-processing, chained into an OCRResult. The pre/post-processing steps are pure functions that can be fully unit-tested on Node; runPaddlePipeline accepts +// injectable sessions, so it can be tested end to end with mocks (no real model required). Data stays local throughout. + +import { ConversionError } from "../conversion-error.js"; +import { createOCRResult } from "./ocr-result.js"; + +export const DET_LIMIT_SIDE_LEN = 960; +export const REC_IMAGE_HEIGHT = 48; +export const DET_MEAN = [0.485, 0.456, 0.406]; +export const DET_STD = [0.229, 0.224, 0.225]; + +function clamp(value, min, max) { + return Math.min(Math.max(value, min), max); +} + +// PP-OCR dictionary: one token per line; the CTC blank takes index 0, and the list is guaranteed to end with exactly one ASCII space (use_space_char). +// Note: only completely empty lines (length 0) are skipped, and trim() must not be used: the first valid token in a dictionary is often the full-width space U+3000, +// which trim() would wrongly delete, shifting the whole table. Some dictionary files (e.g. ppu ppocrv5_dict) already list the space explicitly as the last line; +// in that case it is not appended again, otherwise the class count would exceed the model's by 1 and CTC decoding would be misaligned throughout. +export function parseCharDictionary(text) { + const lines = String(text ?? "").replace(/\r\n/g, "\n").replace(/\r/g, "\n").split("\n"); + const chars = []; + for (const line of lines) { + if (line.length === 0) continue; + chars.push(line); + } + if (chars[chars.length - 1] !== " ") chars.push(" "); + return ["", ...chars]; +} + +// Nearest-neighbor resampling of RGBA to the target size. imageData = { data: Uint8ClampedArray(RGBA), width, height }. +export function resizeRgba(imageData, dstW, dstH) { + const { data, width, height } = imageData; + if (width <= 0 || height <= 0 || dstW <= 0 || dstH <= 0) { + throw new RangeError("resizeRgba: dimensions must be positive."); + } + const out = new Uint8ClampedArray(dstW * dstH * 4); + for (let y = 0; y < dstH; y += 1) { + const sy = Math.min(height - 1, Math.floor((y * height) / dstH)); + for (let x = 0; x < dstW; x += 1) { + const sx = Math.min(width - 1, Math.floor((x * width) / dstW)); + const so = (sy * width + sx) * 4; + const dstO = (y * dstW + x) * 4; + out[dstO] = data[so]; + out[dstO + 1] = data[so + 1]; + out[dstO + 2] = data[so + 2]; + out[dstO + 3] = data[so + 
3]; + } + } + return { data: out, width: dstW, height: dstH }; +} + +function roundToMultiple(value, multiple, min) { + const rounded = Math.max(min, Math.round(value / multiple) * multiple); + return rounded; +} + +// Detection preprocessing: cap the longest side, round to a multiple of 32, and apply ImageNet normalization → NCHW Float32. +export function preprocessForDetection(imageData, { limitSideLen = DET_LIMIT_SIDE_LEN } = {}) { + const { width, height } = imageData; + const maxSide = Math.max(width, height); + const ratio = maxSide > limitSideLen ? limitSideLen / maxSide : 1; + const resizedW = roundToMultiple(width * ratio, 32, 32); + const resizedH = roundToMultiple(height * ratio, 32, 32); + const resized = resizeRgba(imageData, resizedW, resizedH); + const data = new Float32Array(3 * resizedH * resizedW); + const plane = resizedH * resizedW; + for (let i = 0; i < plane; i += 1) { + const o = i * 4; + data[i] = (resized.data[o] / 255 - DET_MEAN[0]) / DET_STD[0]; + data[plane + i] = (resized.data[o + 1] / 255 - DET_MEAN[1]) / DET_STD[1]; + data[2 * plane + i] = (resized.data[o + 2] / 255 - DET_MEAN[2]) / DET_STD[2]; + } + return { + data, + dims: [1, 3, resizedH, resizedW], + resizedWidth: resizedW, + resizedHeight: resizedH, + scaleW: width / resizedW, + scaleH: height / resizedH, + }; +} + +// Recognition preprocessing: fixed height of 48, width scaled proportionally (capped), normalized to [-1,1] → NCHW Float32. +export function preprocessForRecognition(imageData, { height = REC_IMAGE_HEIGHT, maxWidth = 1280 } = {}) { + const ratio = imageData.height > 0 ? 
imageData.width / imageData.height : 1; + const targetW = clamp(Math.max(1, Math.round(height * ratio)), 1, maxWidth); + const resized = resizeRgba(imageData, targetW, height); + const data = new Float32Array(3 * height * targetW); + const plane = height * targetW; + for (let i = 0; i < plane; i += 1) { + const o = i * 4; + data[i] = (resized.data[o] / 255 - 0.5) / 0.5; + data[plane + i] = (resized.data[o + 1] / 255 - 0.5) / 0.5; + data[2 * plane + i] = (resized.data[o + 2] / 255 - 0.5) / 0.5; + } + return { data, dims: [1, 3, height, targetW], width: targetW, height }; +} + +// DB detection post-processing: threshold binarization, 4-connected components, axis-aligned bboxes, box-score filtering, and scaling back to original-image coordinates. +export function dbPostProcess(probData, mapW, mapH, { + thresh = 0.3, + boxThresh = 0.5, + minSize = 3, + unclipRatio = 1.6, + scaleW = 1, + scaleH = 1, +} = {}) { + if (!probData || probData.length < mapW * mapH) { + throw new ConversionError("dbPostProcess: probability map smaller than mapW*mapH.", { + category: "validate", + code: "OCR_ENGINE_INVALID", + details: { reason: "prob-map-too-small" }, + }); + } + const visited = new Uint8Array(mapW * mapH); + const boxes = []; + const stack = []; + for (let start = 0; start < mapW * mapH; start += 1) { + if (visited[start]) continue; + if (probData[start] <= thresh) { + visited[start] = 1; + continue; + } + // Flood-fill one 4-connected component with an explicit stack (DFS) + let minX = mapW; + let minY = mapH; + let maxX = 0; + let maxY = 0; + let sum = 0; + let count = 0; + stack.length = 0; + stack.push(start); + visited[start] = 1; + while (stack.length > 0) { + const idx = stack.pop(); + const px = idx % mapW; + const py = (idx - px) / mapW; + sum += probData[idx]; + count += 1; + if (px < minX) minX = px; + if (px > maxX) maxX = px; + if (py < minY) minY = py; + if (py > maxY) maxY = py; + const neighbors = [ + px > 0 ? idx - 1 : -1, + px < mapW - 1 ? idx + 1 : -1, + py > 0 ? idx - mapW : -1, + py < mapH - 1 ? 
idx + mapW : -1, + ]; + for (const n of neighbors) { + if (n < 0 || visited[n]) continue; + if (probData[n] > thresh) { + visited[n] = 1; + stack.push(n); + } else { + visited[n] = 1; + } + } + } + const score = count > 0 ? sum / count : 0; + const boxW = maxX - minX + 1; + const boxH = maxY - minY + 1; + if (score < boxThresh) continue; + if (boxW < minSize || boxH < minSize) continue; + // unclip: the DB probability map is shrunk relative to the real text, so expand outward by area*ratio/perimeter as PP-OCR does; + // otherwise the crop box is too tight and clips character strokes, garbling recognition. + const distance = (boxW * boxH * unclipRatio) / Math.max(1, 2 * (boxW + boxH)); + let ex0 = minX - distance; + let ey0 = minY - distance; + let ex1 = maxX + distance; + let ey1 = maxY + distance; + ex0 = Math.max(0, ex0); + ey0 = Math.max(0, ey0); + ex1 = Math.min(mapW - 1, ex1); + ey1 = Math.min(mapH - 1, ey1); + boxes.push({ + x: Math.round(ex0 * scaleW), + y: Math.round(ey0 * scaleH), + w: Math.round((ex1 - ex0 + 1) * scaleW), + h: Math.round((ey1 - ey0 + 1) * scaleH), + score, + }); + } + // Reading order: top to bottom, then left to right (a rough heuristic; multi-column layouts are left for later). + boxes.sort((a, b) => (a.y - b.y) || (a.x - b.x)); + return boxes; +} + +// Noise estimate: sample pixels and measure the fraction of grayscale pixels that differ greatly from their 3×3 neighborhood median (isolated jumps, the salt-and-pepper signature). +// Text edges also produce differences, but only for a small fraction of pixels; salt-and-pepper noise raises the fraction markedly. Returns a noise ratio in [0,1]. +export function estimateNoiseLevel(imageData, { jump = 80, step = 2 } = {}) { + const { data, width, height } = imageData; + if (width < 3 || height < 3) return 0; + const gray = (o) => (data[o] * 299 + data[o + 1] * 587 + data[o + 2] * 114) / 1000; + let speckle = 0; + let sampled = 0; + const win = []; + for (let y = 1; y < height - 1; y += step) { + for (let x = 1; x < width - 1; x += step) { + win.length = 0; + for (let dy = -1; dy <= 1; dy += 1) { + for (let dx = -1; dx <= 1; dx += 1) { + win.push(gray(((y + dy) * width + (x + dx)) * 4)); + } + } + win.sort((a, b) => a - b); + const med = win[4]; + if (Math.abs(gray((y * width + x) * 4) - med) > jump) speckle += 1; + sampled += 1; + } + } + return sampled > 0 ? 
speckle / sampled : 0; +} + +// Image denoising: median filter (3×3 by default; per-channel neighborhood median). Effective against salt-and-pepper noise and background specks while preserving edges; +// used to clean up noise before OCR and improve recognition on noisy images / decorative-text backgrounds. Alpha is passed through. +export function denoiseImageData(imageData, { window = 3 } = {}) { + const { data, width, height } = imageData; + if (width < 3 || height < 3) return imageData; + const radius = Math.max(1, Math.floor(window / 2)); + const out = new Uint8ClampedArray(data.length); + const win = []; + for (let y = 0; y < height; y += 1) { + for (let x = 0; x < width; x += 1) { + const o = (y * width + x) * 4; + for (let c = 0; c < 3; c += 1) { + win.length = 0; + for (let dy = -radius; dy <= radius; dy += 1) { + const yy = Math.min(height - 1, Math.max(0, y + dy)); + for (let dx = -radius; dx <= radius; dx += 1) { + const xx = Math.min(width - 1, Math.max(0, x + dx)); + win.push(data[(yy * width + xx) * 4 + c]); + } + } + win.sort((a, b) => a - b); + out[o + c] = win[win.length >> 1]; + } + out[o + 3] = data[o + 3]; + } + } + return { data: out, width, height }; +} + +// Rotate an RGBA image by an arbitrary angle (nearest neighbor; the canvas is enlarged to fit; background defaults to white). Positive angle = counterclockwise. +export function rotateImageDataByAngle(imageData, degrees, { background = 255 } = {}) { + const { data, width: W, height: H } = imageData; + const a = (degrees * Math.PI) / 180; + const c = Math.cos(a); + const s = Math.sin(a); + const nw = Math.max(1, Math.ceil(Math.abs(W * c) + Math.abs(H * s))); + const nh = Math.max(1, Math.ceil(Math.abs(W * s) + Math.abs(H * c))); + const out = new Uint8ClampedArray(nw * nh * 4).fill(background); + // Set the background alpha to fully opaque + for (let i = 3; i < out.length; i += 4) out[i] = 255; + const cx = W / 2; + const cy = H / 2; + const ncx = nw / 2; + const ncy = nh / 2; + for (let y = 0; y < nh; y += 1) { + for (let x = 0; x < nw; x += 1) { + const dx = x - ncx; + const dy = y - ncy; + const sx = Math.round(dx * c + dy * s + cx); + const sy = Math.round(-dx * s + dy * c + cy); + if (sx >= 0 && sx < W && sy >= 0 && sy < H) { + const so = (sy * W + sx) * 4; + const dstO = (y * nw + x) * 4; + out[dstO] = data[so]; out[dstO + 1] = data[so + 
1]; out[dstO + 2] = data[so + 2]; out[dstO + 3] = data[so + 3]; + } + } + } + return { data: out, width: nw, height: nh }; +} + +// Estimate document skew: after binarizing the det probability map, use the variance of a sheared projection histogram to find the angle at which text lines align best with the horizontal. +// Returns the angle (in degrees) by which to rotate so that the image is deskewed (positive angle = counterclockwise). Range: [-maxAngle, maxAngle]. +export function estimateSkewAngle(probData, mapW, mapH, { maxAngle = 12, step = 1, thresh = 0.3 } = {}) { + // Collect text pixel coordinates (downsampled for speed) + const pts = []; + const sub = Math.max(1, Math.floor(Math.max(mapW, mapH) / 160)); + for (let y = 0; y < mapH; y += sub) { + for (let x = 0; x < mapW; x += sub) { + if (probData[y * mapW + x] > thresh) pts.push([x, y]); + } + } + if (pts.length < 20) return 0; + let bestAngle = 0; + let bestVar = -1; + for (let deg = -maxAngle; deg <= maxAngle; deg += step) { + const a = (deg * Math.PI) / 180; + const c = Math.cos(a); + const s = Math.sin(a); + // Histogram of row coordinates after rotation (text pixels counted per row bin); a larger variance means text lines align better with the horizontal + const hist = new Map(); + for (const [x, y] of pts) { + const ry = Math.round(-x * s + y * c); + hist.set(ry, (hist.get(ry) || 0) + 1); + } + let sum = 0; + let sumSq = 0; + let n = 0; + for (const v of hist.values()) { sum += v; sumSq += v * v; n += 1; } + if (n === 0) continue; + const mean = sum / n; + const variance = sumSq / n - mean * mean; + if (variance > bestVar) { bestVar = variance; bestAngle = deg; } + } + return bestAngle; +} + +// Rotate an RGBA image by 180° (flip both vertically and horizontally); used when cls detects a 180° orientation. +export function rotateImageData180(imageData) { + const { data, width, height } = imageData; + const out = new Uint8ClampedArray(data.length); + for (let y = 0; y < height; y += 1) { + for (let x = 0; x < width; x += 1) { + const s = (y * width + x) * 4; + const d = ((height - 1 - y) * width + (width - 1 - x)) * 4; + out[d] = data[s]; out[d + 1] = data[s + 1]; out[d + 2] = data[s + 2]; out[d + 3] = data[s + 3]; + } + } + return { data: out, width, height }; +} + +// Rotate by 90° (dir: "cw" clockwise / "ccw" counterclockwise). Output width and height are swapped. Used for vertical / sideways text. +export function rotateImageData90(imageData, dir = "cw") { + const { data, width, height } = 
imageData; + const ow = height; + const oh = width; + const out = new Uint8ClampedArray(ow * oh * 4); + for (let y = 0; y < height; y += 1) { + for (let x = 0; x < width; x += 1) { + const s = (y * width + x) * 4; + let ox; + let oy; + if (dir === "cw") { ox = height - 1 - y; oy = x; } else { ox = y; oy = width - 1 - x; } + const d = (oy * ow + ox) * 4; + out[d] = data[s]; out[d + 1] = data[s + 1]; out[d + 2] = data[s + 2]; out[d + 3] = data[s + 3]; + } + } + return { data: out, width: ow, height: oh }; +} + +// cls outputs a [c0, c1] softmax; a high c1 means a 180° rotation is needed. Returns { flip, confidence }. +export function interpretClsOutput(clsData, threshold = 0.6) { + const c0 = clsData?.[0] ?? 1; + const c1 = clsData?.[1] ?? 0; + return { flip: c1 > c0 && c1 >= threshold, confidence: Math.max(c0, c1) }; +} + +// Crop an RGBA region (coordinates are clamped to the image). +export function cropImageData(imageData, box) { + const { data, width, height } = imageData; + const x0 = clamp(Math.floor(box.x), 0, Math.max(0, width - 1)); + const y0 = clamp(Math.floor(box.y), 0, Math.max(0, height - 1)); + const cw = clamp(Math.round(box.w), 1, width - x0); + const ch = clamp(Math.round(box.h), 1, height - y0); + const out = new Uint8ClampedArray(cw * ch * 4); + for (let y = 0; y < ch; y += 1) { + for (let x = 0; x < cw; x += 1) { + const so = ((y0 + y) * width + (x0 + x)) * 4; + const dstO = (y * cw + x) * 4; + out[dstO] = data[so]; + out[dstO + 1] = data[so + 1]; + out[dstO + 2] = data[so + 2]; + out[dstO + 3] = data[so + 3]; + } + } + return { data: out, width: cw, height: ch }; +} + +// CTC greedy decoding: argmax per time step → collapse consecutive repeats → drop blank (0) → map through the dictionary. +// logitsData is row-major [T, C]; dictionary[idx] gives the character (idx 0 is blank). Confidence is the mean winning value over emitted characters, clamped to [0, 1]. +export function ctcGreedyDecode(logitsData, timeSteps, numClasses, dictionary = []) { + let text = ""; + let confSum = 0; + let confCount = 0; + let prevIdx = -1; + for (let t = 0; t < timeSteps; t += 1) { + const base = t * numClasses; + let bestIdx = 0; + let bestVal = -Infinity; + for (let c = 0; c < numClasses; c += 1) { + const v 
= logitsData[base + c]; + if (v > bestVal) { + bestVal = v; + bestIdx = c; + } + } + if (bestIdx !== prevIdx && bestIdx !== 0) { + const ch = dictionary[bestIdx]; + if (typeof ch === "string" && ch !== "") { + text += ch; + confSum += bestVal; + confCount += 1; + } + } + prevIdx = bestIdx; + } + const confidence = confCount > 0 ? clamp(confSum / confCount, 0, 1) : 0; + return { text, confidence }; +} + +function firstOutput(session, result) { + const name = Array.isArray(session.outputNames) && session.outputNames.length > 0 + ? session.outputNames[0] + : Object.keys(result)[0]; + return result[name]; +} + +function firstInputName(session) { + return Array.isArray(session.inputNames) && session.inputNames.length > 0 + ? session.inputNames[0] + : "x"; +} + +async function runSession(ort, session, { data, dims }) { + const tensor = new ort.Tensor("float32", data, dims); + const feeds = { [firstInputName(session)]: tensor }; + const result = await session.run(feeds); + return firstOutput(session, result); +} + +// Orchestrator: imageData + three sessions + dictionary → OCRResult. ort is used only to construct Tensors. +export async function runPaddlePipeline({ + ort, + detSession, + clsSession = null, + recSession, + imageData, + dictionary = [], + options = {}, +} = {}) { + if (!ort || typeof ort.Tensor !== "function") { + throw new ConversionError("runPaddlePipeline requires an onnxruntime namespace with Tensor.", { + category: "validate", + code: "OCR_ENGINE_INVALID", + details: { reason: "missing-ort" }, + }); + } + if (!detSession || !recSession) { + throw new ConversionError("runPaddlePipeline requires det and rec sessions.", { + category: "validate", + code: "OCR_ENGINE_INVALID", + details: { reason: "missing-sessions" }, + }); + } + const startedAt = Date.now(); + + // Denoising: defaults to auto, which applies the median filter only when the estimated noise ratio exceeds the threshold (median filtering softens clean images and + // lowers the confidence of sharp text, so in auto mode clean images are never denoised). options.denoise: "auto"|true|false. + const denoiseMode = options.denoise ?? "auto"; + const denoiseThreshold = typeof options.denoiseThreshold === "number" ? 
options.denoiseThreshold : 0.05; + let noiseLevel = 0; + let denoised = false; + let workImage = imageData; + if (denoiseMode === true) { + workImage = denoiseImageData(imageData, options.denoiseWindow ? { window: options.denoiseWindow } : {}); + denoised = true; + } else if (denoiseMode !== false) { + noiseLevel = estimateNoiseLevel(imageData); + if (noiseLevel > denoiseThreshold) { + workImage = denoiseImageData(imageData, options.denoiseWindow ? { window: options.denoiseWindow } : {}); + denoised = true; + } + } + + // Single detection pass: returns the prob map and the scale for restoring coordinates. + async function detect(image) { + const detInput = preprocessForDetection(image, options.det || {}); + const detOut = await runSession(ort, detSession, { data: detInput.data, dims: detInput.dims }); + const pd = detOut?.data || detOut; + const dims = detOut?.dims || [1, 1, detInput.resizedHeight, detInput.resizedWidth]; + const mh = dims[dims.length - 2]; + const mw = dims[dims.length - 1]; + return { pd, mw, mh, scaleW: detInput.scaleW * (detInput.resizedWidth / mw), scaleH: detInput.scaleH * (detInput.resizedHeight / mh) }; + } + + let det = await detect(workImage); + // Deskew: estimate the skew angle from the det probability map; if it exceeds the threshold, straighten the image and detect again (only skewed images pay for the second detection; + // upright images estimate ≈0 and are not re-detected). Recognition still runs only once (on the final image). options.deskew: true (default)|false. + let skewApplied = 0; + const minSkew = typeof options.minSkew === "number" ? options.minSkew : 3; + if (options.deskew !== false) { + const est = estimateSkewAngle(det.pd, det.mw, det.mh, options.skew || {}); + if (Math.abs(est) >= minSkew) { + workImage = rotateImageDataByAngle(workImage, -est); + skewApplied = est; + det = await detect(workImage); + } + } + const probData = det.pd; + const mapW = det.mw; + const mapH = det.mh; + const boxes = dbPostProcess(probData, mapW, mapH, { + ...(options.db || {}), + scaleW: det.scaleW, + scaleH: det.scaleH, + }); + + const verticalAspect = typeof options.verticalAspect === "number" ? options.verticalAspect : 1.5; + const clsThreshold = typeof options.clsThreshold === "number" ? 
options.clsThreshold : 0.6; + const lowConfidence = typeof options.lowConfidence === "number" ? options.lowConfidence : 0.6; + + // For a single crop: optional cls 180° correction → rec → CTC; returns { text, confidence, orientation }. + async function recognizeCrop(cropImg) { + let img = cropImg; + let orientation = "0"; + if (clsSession) { + try { + const clsPre = preprocessForRecognition(img, options.cls || options.rec || {}); + const clsOut = await runSession(ort, clsSession, { data: clsPre.data, dims: clsPre.dims }); + const { flip } = interpretClsOutput(clsOut?.data || clsOut, clsThreshold); + if (flip) { + img = rotateImageData180(img); + orientation = "180"; + } + } catch (error) { + // An orientation-classification failure is not fatal; recognize the image as is. + } + } + const recPre = preprocessForRecognition(img, options.rec || {}); + const recOut = await runSession(ort, recSession, { data: recPre.data, dims: recPre.dims }); + const logits = recOut?.data || recOut; + const recDims = recOut?.dims || [1, 0, dictionary.length]; + const decoded = ctcGreedyDecode(logits, recDims[recDims.length - 2], recDims[recDims.length - 1], dictionary); + return { ...decoded, orientation }; + } + + const lines = []; + for (const box of boxes) { + const crop = cropImageData(workImage, box); + // Vertical / sideways text: for boxes that are markedly taller than wide, additionally try 90° rotations (cw and ccw) and keep the result with the highest recognition confidence. + const candidates = [{ img: crop, rot: "0" }]; + if (box.h > box.w * verticalAspect) { + candidates.push({ img: rotateImageData90(crop, "cw"), rot: "90cw" }); + candidates.push({ img: rotateImageData90(crop, "ccw"), rot: "90ccw" }); + } + let best = null; + for (const cand of candidates) { + let decoded; + try { + decoded = await recognizeCrop(cand.img); + } catch (error) { + continue; + } + if (decoded.text.trim().length === 0) continue; + if (!best || decoded.confidence > best.confidence) { + best = { ...decoded, rotation: cand.rot }; + } + } + if (best) { + lines.push({ + text: best.text, + confidence: best.confidence, + bbox: { x: box.x, y: box.y, w: box.w, h: box.h }, + orientation: best.rotation === "0" ? 
best.orientation : best.rotation, + lowConfidence: best.confidence < lowConfidence, + }); + } + } + + const confs = lines.map((l) => l.confidence); + const averageConfidence = confs.length > 0 ? clamp(confs.reduce((s, c) => s + c, 0) / confs.length, 0, 1) : 0; + const lowConfidenceLines = lines.filter((l) => l.lowConfidence).length; + // Quality-control summary: overall/minimum confidence, number of low-confidence lines, and rotation corrections, plus the denoise and deskew outcome. + const quality = { + lineCount: lines.length, + averageConfidence, + minConfidence: confs.length > 0 ? clamp(Math.min(...confs), 0, 1) : 0, + lowConfidenceLines, + rotatedLines: lines.filter((l) => l.orientation && l.orientation !== "0").length, + denoised, + noiseLevel, + skewApplied, + grade: averageConfidence >= 0.9 && lowConfidenceLines === 0 + ? "high" + : (averageConfidence >= 0.7 ? "medium" : "low"), + }; + + const result = createOCRResult({ + language: options.language || "auto", + pages: [ + { + pageIndex: 0, + width: imageData.width, + height: imageData.height, + lines, + }, + ], + fullText: lines.map((l) => l.text).join("\n"), + averageConfidence, + runtimeMs: Date.now() - startedAt, + engine: "paddleocr-v5", + modelVersion: "v5", + warnings: [], + }); + return { ...result, quality }; +} diff --git a/public/core/ocr/paddle-ocr-runtime.js b/public/core/ocr/paddle-ocr-runtime.js new file mode 100644 index 0000000..ce04277 --- /dev/null +++ b/public/core/ocr/paddle-ocr-runtime.js @@ -0,0 +1,95 @@ +// PP-OCRv5 ONNX Runtime loader (P9-D.2). Dynamically imports the same-origin vendored onnxruntime-web, +// selects the execution backend (WebGPU first, WASM fallback), and provides a skeleton for creating/releasing an InferenceSession. Data stays local, +// zero cloud. The real det/cls/rec inference pipeline and CTC decoding are left to P9-D.2.b. + +import { ConversionError } from "../conversion-error.js"; + +export const OCR_VENDOR_LOAD_FAILED = "OCR_VENDOR_LOAD_FAILED"; + +export const PADDLE_VENDOR_PATHS = Object.freeze({ + // ORT's ESM entry point; the browser/Tauri side loads it from the same-origin vendor directory, which also holds the wasm binaries. + mainBundle: "/vendor/onnxruntime/ort.min.mjs", + wasmDir: "/vendor/onnxruntime/", +}); + +let cachedNamespace = null; + +function hasWebGPU() { + return typeof 
globalThis !== "undefined" + && typeof globalThis.navigator === "object" + && globalThis.navigator !== null + && typeof globalThis.navigator.gpu === "object" + && globalThis.navigator.gpu !== null; +} + +// WebGPU available → ["webgpu","wasm"] (webgpu first, wasm fallback); otherwise ["wasm"]. +// Node (no navigator.gpu) returns ["wasm"]; a pure function, so it is testable. +export function pickExecutionProviders() { + return hasWebGPU() ? ["webgpu", "wasm"] : ["wasm"]; +} + +export async function loadOnnxRuntime(vendorUrl = PADDLE_VENDOR_PATHS.mainBundle) { + if (cachedNamespace) return cachedNamespace; + try { + const ort = await import(/* @vite-ignore */ vendorUrl); + const namespace = ort?.default && ort.default.InferenceSession ? ort.default : ort; + if (!namespace || !namespace.InferenceSession) { + throw new Error("vendor onnxruntime-web missing InferenceSession"); + } + // Make ORT load the wasm binaries from the same-origin vendor directory; never touch the network. + if (namespace.env?.wasm) { + namespace.env.wasm.wasmPaths = PADDLE_VENDOR_PATHS.wasmDir; + } + cachedNamespace = namespace; + return cachedNamespace; + } catch (error) { + throw new ConversionError( + `onnxruntime-web vendor failed to load: ${error?.message || error}`, + { + category: "convert", + code: OCR_VENDOR_LOAD_FAILED, + details: { + path: vendorUrl, + cause: String(error?.name || error?.message || "unknown"), + }, + }, + ); + } +} + +export async function createOcrSession({ ort, modelBuffer, providers = pickExecutionProviders() } = {}) { + if (!ort || !ort.InferenceSession) { + throw new ConversionError("onnxruntime namespace is not in place; cannot create an InferenceSession.", { + category: "convert", + code: OCR_VENDOR_LOAD_FAILED, + details: { reason: "namespace-missing" }, + }); + } + if (!modelBuffer) { + throw new ConversionError("createOcrSession requires an ONNX model buffer.", { + category: "validate", + code: "OCR_ENGINE_INVALID", + details: { reason: "missing-model-buffer" }, + }); + } + try { + const data = modelBuffer instanceof ArrayBuffer ? 
new Uint8Array(modelBuffer) : modelBuffer; + return await ort.InferenceSession.create(data, { executionProviders: providers }); + } catch (error) { + throw new ConversionError(`ONNX InferenceSession creation failed: ${error?.message || error}`, { + category: "convert", + code: "OCR_ENGINE_FAILED", + details: { reason: "session-create-failed", providers, cause: String(error?.name || error?.message || "unknown") }, + }); + } +} + +export async function disposeOcrSession(session) { + if (session && typeof session.release === "function") { + try { await session.release(); } catch (error) { /* ignore */ } + } +} + +export function resetOnnxRuntimeCache() { + cachedNamespace = null; +} diff --git a/public/core/ocr/png-ocr.js b/public/core/ocr/png-ocr.js index 2c64e1c..00a3403 100644 --- a/public/core/ocr/png-ocr.js +++ b/public/core/ocr/png-ocr.js @@ -1,7 +1,7 @@ -import { createParagraph } from "../document-model.js"; import { withWarnings } from "../warnings.js"; import { defaultOCRRegistry } from "./ocr-engine.js"; import { summarizeOCRResult } from "./ocr-result.js"; +import { blocksFromOcrResult, mapLinesToBlockIds } from "./ocr-structure.js"; import { createOCRUnavailableWarning, createOCREngineFailedWarning, @@ -36,19 +36,6 @@ function cloneModel(model) { }; } -function paragraphsFromOCR(result) { - const pages = Array.isArray(result?.pages) ? result.pages : []; - const paragraphs = []; - for (const page of pages) { - const lines = Array.isArray(page.lines) ? 
page.lines : []; - const text = lines.map((line) => line.text).filter(Boolean).join("\n"); - if (text.trim().length > 0) paragraphs.push(createParagraph(text)); - } - if (paragraphs.length === 0 && typeof result?.fullText === "string" && result.fullText.trim().length > 0) { - paragraphs.push(createParagraph(result.fullText)); - } - return paragraphs; -} export async function enhanceWithOCR(model, { engine = null, registry = defaultOCRRegistry } = {}) { const resolvedEngine = engine || registry.pickForTask("ocr-text"); @@ -106,7 +93,8 @@ export async function enhanceWithOCR(model, { engine = null, registry = defaultO return next; } - const paragraphs = paragraphsFromOCR(result); + // Format-recognition enhancement: merge recognized lines into headings and paragraphs by layout (bbox / line height / spacing); fall back when geometry is insufficient. + const paragraphs = blocksFromOcrResult(result); const ocrWarnings = []; if (typeof result?.averageConfidence === "number" && result.averageConfidence < LOW_CONFIDENCE_THRESHOLD) { ocrWarnings.push(createOCRLowConfidenceWarning({ @@ -119,6 +107,11 @@ export async function enhanceWithOCR(model, { engine = null, registry = defaultO const enhanced = cloneModel(model); const appendedStart = enhanced.blocks.length; enhanced.blocks = [...enhanced.blocks, ...paragraphs]; + // Pre-assign stable ids (absolute index) to the OCR-appended blocks so that low-confidence repair can target them by block.id. document-audit's + // `id: block.id || ...` preserves them, and the "ocr-block-" prefix does not collide with the audit's "block-N-hash" ids. + for (let i = appendedStart; i < enhanced.blocks.length; i += 1) { + if (!enhanced.blocks[i].id) enhanced.blocks[i].id = `ocr-block-${i}`; + } enhanced.metadata = withWarnings(enhanced.metadata, ocrWarnings); enhanced.metadata.modelReview = { ...(enhanced.metadata.modelReview || {}), @@ -127,6 +120,8 @@ export async function enhanceWithOCR(model, { engine = null, registry = defaultO tasks: Array.from(new Set([...(enhanced.metadata.modelReview?.tasks || []), "ocr-text-recognition"])), inferenceMode: "local", ocr: summarizeOCRResult(result), + // Quality control: if the engine provides a quality assessment (confidence grade, low-confidence lines, rotation-correction count), record it as well. + ...(result?.quality ? 
{ ocrQuality: result.quality } : {}), }; enhanced.metadata.ocr = collectLineMetadata(result, enhanced.blocks, appendedStart); return enhanced; @@ -134,24 +129,25 @@ export async function enhanceWithOCR(model, { engine = null, registry = defaultO function collectLineMetadata(result, blocks, appendedStart) { const pages = Array.isArray(result?.pages) ? result.pages : []; - const lines = []; + const flat = []; pages.forEach((page, pageIndex) => { const pageLines = Array.isArray(page.lines) ? page.lines : []; pageLines.forEach((line, lineIndex) => { - // Each appended paragraph corresponds to a page; pick the block that - // received the paragraph for this page so repair candidates can refer - // back to it by id. - const block = blocks[appendedStart + pageIndex]; - lines.push({ - pageIndex, - lineIndex, - text: line.text || "", - confidence: typeof line.confidence === "number" ? line.confidence : 0, - bbox: line.bbox || null, - blockId: block?.id || "", - }); + flat.push({ pageIndex, lineIndex, line }); }); }); + // Map each line to the id of the appended block that contains it via text containment (handles "many lines → one block" and structural merging) instead of guessing by page + // index (the old blocks[appendedStart + pageIndex] mismatched when a page produced multiple blocks or headings were split out). + const appendedBlocks = blocks.slice(appendedStart); + const blockIds = mapLinesToBlockIds(flat.map((f) => f.line), appendedBlocks); + const lines = flat.map((f, i) => ({ + pageIndex: f.pageIndex, + lineIndex: f.lineIndex, + text: f.line.text || "", + confidence: typeof f.line.confidence === "number" ? 
f.line.confidence : 0, + bbox: f.line.bbox || null, + blockId: blockIds[i] || "", + })); return { language: result?.language || "auto", pageCount: pages.length, diff --git a/public/core/ocr/scan-pdf-stage.js b/public/core/ocr/scan-pdf-stage.js index 9004695..5c40109 100644 --- a/public/core/ocr/scan-pdf-stage.js +++ b/public/core/ocr/scan-pdf-stage.js @@ -4,6 +4,7 @@ import { defaultOCRRegistry } from "./ocr-engine.js"; import { createOCREngineFailedWarning, createOCRUnavailableWarning, createOCRLowConfidenceWarning } from "./ocr-warnings.js"; import { defaultPdfPageRasterizer } from "./pdf-rasterizer.js"; import { mergeOCRResultsToFixedLayout } from "./ocr-to-fixed-layout.js"; +import { mapLinesToBlockIds } from "./ocr-structure.js"; import { fixedLayoutToSemantic } from "../models/mappers.js"; import { getFixedLayoutSummary } from "../models/fixed-layout.js"; @@ -135,6 +136,10 @@ export async function runScannedPdfOCRStage(model, ctx = {}) { sourceFormat: enhanced.sourceFormat || "pdf", }); enhanced.blocks.push(...(semanticFromLayout.blocks || [])); + // Pre-assign stable ids (absolute index) to the appended blocks so low-confidence repair can target them by block.id; document-audit preserves them. + for (let i = appendedStart; i < enhanced.blocks.length; i += 1) { + if (!enhanced.blocks[i].id) enhanced.blocks[i].id = `ocr-block-${i}`; + } enhanced.metadata = withWarnings(enhanced.metadata, [ createWarning( "info", @@ -151,12 +156,11 @@ export async function runScannedPdfOCRStage(model, ctx = {}) { ]); } - // Re-resolve blockId for each ocr line after blocks were appended via mapper - let assignedFromBlockIndex = appendedStart; - for (const ocrLine of lines) { - const blockSlot = enhanced.blocks[assignedFromBlockIndex]; - if (blockSlot) ocrLine.blockId = blockSlot.id || ""; - } + // Map each line to the id of the appended block that contains it via text containment. Indices cannot be paired by lines order: + // mergeOCRResultsToFixedLayout reorders by reading order (bbox y→x), so lines order ≠ block order. + const appendedBlocks = enhanced.blocks.slice(appendedStart); + const blockIds = mapLinesToBlockIds(lines, appendedBlocks); + lines.forEach((ocrLine, i) => { 
ocrLine.blockId = blockIds[i] || ""; }); enhanced.metadata.ocr = { language: language || "auto", diff --git a/public/core/ocr/tesseract-engine.js b/public/core/ocr/tesseract-engine.js index 55a06ee..43894ec 100644 --- a/public/core/ocr/tesseract-engine.js +++ b/public/core/ocr/tesseract-engine.js @@ -16,6 +16,10 @@ export const TESSERACT_MANIFEST_ID = "ocr-text.tesseract.5.0.0"; const TESSDATA_KEY_PREFIX = "tesseract/"; const DEFAULT_LANGUAGES = ["chi_sim", "eng"]; +// Readiness state lives in a module-level mutable variable rather than an instance property of the frozen object (a frozen object cannot be assigned to by +// ensureProbe in strict mode). The engine object itself can still be Object.freeze'd to guard against external tampering. +let tessdataReady = false; + function vendorReady() { return Boolean(globalThis.__t2fTesseractVendorReady); } @@ -33,6 +37,8 @@ export const tesseractOCREngine = Object.freeze({ id: "tesseract-zh-en", taskCapabilities: ["ocr-text"], manifestId: TESSERACT_MANIFEST_ID, + // Lightweight built-in engine: priority above the placeholder (0), below the advanced PP-OCRv5 engine (20). + priority: 10, // OCREngineRegistry expects a synchronous isAvailable. We expose a synchronous // signature backed by a cached probe; the probe is updated by ensureProbe() @@ -40,20 +46,19 @@ export const tesseractOCREngine = Object.freeze({ // false. 
isAvailable() { if (!vendorReady()) return false; - return Boolean(tesseractOCREngine._tessdataReady); + return Boolean(tessdataReady); }, - _tessdataReady: false, _storage: defaultOCRStorage, async ensureProbe() { if (!vendorReady()) { - this._tessdataReady = false; + tessdataReady = false; return false; } const language = await hasAnyTessdata(this._storage, DEFAULT_LANGUAGES); - this._tessdataReady = Boolean(language); - return this._tessdataReady; + tessdataReady = Boolean(language); + return tessdataReady; }, async recognize({ image, options } = {}) { diff --git a/public/core/repair-engine.js b/public/core/repair-engine.js index 4af2f54..612665b 100644 --- a/public/core/repair-engine.js +++ b/public/core/repair-engine.js @@ -3,36 +3,11 @@ import { validateRepairAction, summarizeAction } from "./repair-actions.js"; import { DEFAULT_HANDLERS } from "./repair-handlers.js"; import { DEFAULT_VALIDATORS } from "./repair-validators.js"; import { detectOCRLowConfidence } from "./ocr/ocr-validator.js"; +import { ROUND_TRIP_FORMATS, modelFingerprint } from "./verification/block-fingerprint.js"; import { createWarning, withWarnings } from "./warnings.js"; export const MIN_CONFIDENCE = 0.6; -const ROUND_TRIP_FORMATS = new Set(["md", "html", "json", "csv", "txt", "xml"]); - -function isPlainObject(value) { - return Boolean(value) && typeof value === "object" && !Array.isArray(value); -} - -function blockFingerprint(block) { - if (!isPlainObject(block)) return ""; - if (block.type === "heading") return `h${block.level}|${block.text || ""}`; - if (block.type === "paragraph" || block.type === "quote") return `${block.type}|${block.text || ""}`; - if (block.type === "code") return `code|${block.language || ""}|${block.code || ""}`; - if (block.type === "list") return `list|${block.ordered ? 
"ol" : "ul"}|${(block.items || []).join("")}`; - if (block.type === "table") { - return `table|${(block.headers || []).join("")}|${(block.rows || []).map((row) => (row || []).join("")).join("")}`; - } - if (block.type === "image" || block.type === "asset") { - return `${block.type}|${block.src || ""}|${block.alt || ""}|${block.assetId || ""}`; - } - if (block.type === "raw") return `raw|${block.format || ""}|${block.content || ""}`; - return block.type || ""; -} - -function modelFingerprint(model) { - return (model.blocks || []).map(blockFingerprint).join(""); -} - function summarizeQuality(model) { const report = model.metadata?.qualityReport || {}; return { diff --git a/public/core/verification/block-fingerprint.js b/public/core/verification/block-fingerprint.js new file mode 100644 index 0000000..bb9d514 --- /dev/null +++ b/public/core/verification/block-fingerprint.js @@ -0,0 +1,79 @@ +// 共享指纹模块:原 repair-engine.js 的 blockFingerprint / modelFingerprint 抽出, +// 行为字节级不变;额外暴露 getBlockKey / extractBlockFields / BLOCK_FIELDS_BY_TYPE +// 给 verification-stage 的字段级 diff 用。ROUND_TRIP_FORMATS 也搬到这里作为单一来源。 + +export const ROUND_TRIP_FORMATS = new Set(["md", "html", "json", "csv", "txt", "xml"]); + +export const BLOCK_FIELDS_BY_TYPE = { + heading: ["type", "level", "text"], + paragraph: ["type", "text"], + quote: ["type", "text"], + code: ["type", "language", "code"], + list: ["type", "ordered", "items"], + table: ["type", "headers", "rows"], + image: ["type", "src", "alt"], + asset: ["type", "assetId", "alt"], + raw: ["type", "format", "content"], +}; + +function isPlainObject(value) { + return Boolean(value) && typeof value === "object" && !Array.isArray(value); +} + +function stableHash(value) { + const text = String(value ?? 
""); + let hash = 2166136261; + for (let index = 0; index < text.length; index += 1) { + hash ^= text.charCodeAt(index); + hash = Math.imul(hash, 16777619); + } + return (hash >>> 0).toString(36).padStart(8, "0").slice(0, 8); +} + +export function blockFingerprint(block) { + if (!isPlainObject(block)) return ""; + if (block.type === "heading") return `h${block.level}|${block.text || ""}`; + if (block.type === "paragraph" || block.type === "quote") return `${block.type}|${block.text || ""}`; + if (block.type === "code") return `code|${block.language || ""}|${block.code || ""}`; + if (block.type === "list") return `list|${block.ordered ? "ol" : "ul"}|${(block.items || []).join("")}`; + if (block.type === "table") { + return `table|${(block.headers || []).join("")}|${(block.rows || []).map((row) => (row || []).join("")).join("")}`; + } + if (block.type === "image" || block.type === "asset") { + return `${block.type}|${block.src || ""}|${block.alt || ""}|${block.assetId || ""}`; + } + if (block.type === "raw") return `raw|${block.format || ""}|${block.content || ""}`; + return block.type || ""; +} + +export function modelFingerprint(model) { + return (model?.blocks || []).map(blockFingerprint).join(""); +} + +export function extractBlockFields(block) { + if (!isPlainObject(block)) return { type: "" }; + const fields = BLOCK_FIELDS_BY_TYPE[block.type] || ["type"]; + const subset = {}; + for (const field of fields) { + const value = block[field]; + if (Array.isArray(value)) { + subset[field] = value.map((entry) => Array.isArray(entry) ? entry.map(String) : (entry === undefined || entry === null ? 
"" : String(entry))); + } else if (value === undefined || value === null) { + subset[field] = ""; + } else if (typeof value === "boolean" || typeof value === "number") { + subset[field] = value; + } else { + subset[field] = String(value); + } + } + return subset; +} + +export function getBlockKey(block, index) { + if (isPlainObject(block) && typeof block.id === "string" && block.id.length > 0) { + return block.id; + } + const fields = extractBlockFields(block); + const fingerprint = stableHash(JSON.stringify(fields)); + return `${fields.type || "unknown"}-${index}-${fingerprint}`; +} diff --git a/public/core/verification/ocr-readback.js b/public/core/verification/ocr-readback.js new file mode 100644 index 0000000..c2fac49 --- /dev/null +++ b/public/core/verification/ocr-readback.js @@ -0,0 +1,176 @@ +// OCR 回读:P9-C 三层检验第三层。把转换输出(PDF)栅格化后用 OCR 引擎读回文本, +// 与原始 SemanticDoc 文本做字符级多重集相似度,写入 qualityReport.ocrReadback。 +// engine / rasterizer 复用已注册的 ocr-text 资源;Node 默认不可用 → eligible:false。 + +import { createWarning } from "../warnings.js"; +import { defaultOCRRegistry } from "../ocr/ocr-engine.js"; +import { defaultPdfPageRasterizer } from "../ocr/pdf-rasterizer.js"; + +export const OCR_READBACK_DRIFT = "OCR_READBACK_DRIFT"; +export const OCR_READBACK_FAILED = "OCR_READBACK_FAILED"; +export const DEFAULT_OCR_READBACK_THRESHOLD = 0.7; + +// 当前唯一可栅格化的文本 writer。 +const OCR_READBACK_OUTPUT_FORMATS = new Set(["pdf"]); + +export function normalizeText(value) { + let text = String(value ?? 
""); + if (typeof text.normalize === "function") text = text.normalize("NFKC"); + return text.toLowerCase().replace(/\s+/g, ""); +} + +function charMultiset(normalized) { + const counts = new Map(); + for (const ch of normalized) { + counts.set(ch, (counts.get(ch) || 0) + 1); + } + return counts; +} + +// 字符级多重集 recall / precision / f1,跨中英文与 OCR 噪声稳健。 +export function compareText(original, recognized) { + const normOriginal = normalizeText(original); + const normRecognized = normalizeText(recognized); + const originalCounts = charMultiset(normOriginal); + const recognizedCounts = charMultiset(normRecognized); + + let intersection = 0; + for (const [ch, count] of originalCounts) { + const other = recognizedCounts.get(ch) || 0; + intersection += Math.min(count, other); + } + + const originalLength = normOriginal.length; + const recognizedLength = normRecognized.length; + const recall = originalLength > 0 ? intersection / originalLength : (recognizedLength === 0 ? 1 : 0); + const precision = recognizedLength > 0 ? intersection / recognizedLength : (originalLength === 0 ? 1 : 0); + const f1 = (precision + recall) > 0 ? (2 * precision * recall) / (precision + recall) : 0; + + return { + recall, + precision, + f1, + originalLength, + recognizedLength, + intersection, + }; +} + +function textOfBlock(block) { + if (!block || typeof block !== "object") return ""; + if (typeof block.text === "string") return block.text; + if (Array.isArray(block.items)) return block.items.join("\n"); + if (Array.isArray(block.rows)) { + const head = Array.isArray(block.headers) ? block.headers.join(" ") : ""; + return [head, ...block.rows.map((row) => (Array.isArray(row) ? row.join(" ") : ""))].join("\n"); + } + if (typeof block.code === "string") return block.code; + if (typeof block.content === "string") return block.content; + if (typeof block.alt === "string") return block.alt; + return ""; +} + +export function extractModelText(model) { + const blocks = Array.isArray(model?.blocks) ? 
model.blocks : []; + return blocks.map(textOfBlock).filter((text) => text && text.trim().length > 0).join("\n"); +} + +function recognizedTextOf(ocrResult) { + if (typeof ocrResult?.fullText === "string" && ocrResult.fullText.trim().length > 0) { + return ocrResult.fullText; + } + const pages = Array.isArray(ocrResult?.pages) ? ocrResult.pages : []; + return pages + .flatMap((page) => (Array.isArray(page.lines) ? page.lines.map((line) => line.text) : [])) + .filter(Boolean) + .join("\n"); +} + +function nowMs() { + if (typeof performance !== "undefined" && typeof performance.now === "function") { + return performance.now(); + } + return Date.now(); +} + +export async function runOcrReadbackLayer({ + model, + output, + ctx, + engine = null, + rasterizer = defaultPdfPageRasterizer, + registry = defaultOCRRegistry, +} = {}) { + const start = nowMs(); + const skip = (reason, warnings = []) => ({ eligible: false, reason, ocrReadback: null, warnings, runtimeMs: nowMs() - start }); + + if (!OCR_READBACK_OUTPUT_FORMATS.has(ctx?.to)) { + return skip("output-not-rasterizable-for-ocr"); + } + const originalText = extractModelText(model); + if (originalText.trim().length === 0) { + return skip("no-source-text"); + } + + const resolvedEngine = engine || registry.pickForTask("ocr-text"); + let available = false; + try { + available = Boolean(resolvedEngine) && resolvedEngine.isAvailable() === true; + } catch (error) { + available = false; + } + if (!available) { + return skip("ocr-engine-unavailable"); + } + + const threshold = typeof ctx?.options?.verification?.ocrReadbackThreshold === "number" + ? 
ctx.options.verification.ocrReadbackThreshold + : DEFAULT_OCR_READBACK_THRESHOLD; + const language = ctx?.options?.ocr?.language || "auto"; + + let recognized; + try { + const raster = await rasterizer.rasterize({ content: output?.data, pageIndex: 0 }); + recognized = await resolvedEngine.recognize({ image: raster.dataUrl, options: { language } }); + } catch (error) { + const cause = error?.code || error?.message || "unknown"; + if (cause === "OCR_RASTERIZER_UNAVAILABLE") { + return skip("rasterizer-unavailable"); + } + return skip(`readback-failed:${cause}`, [ + createWarning("info", OCR_READBACK_FAILED, `OCR 回读失败:${cause}.`, { from: ctx?.from, to: ctx?.to, cause }), + ]); + } + + const recognizedText = recognizedTextOf(recognized); + const similarity = compareText(originalText, recognizedText); + const passed = similarity.f1 >= threshold; + const ocrReadback = { + recall: similarity.recall, + precision: similarity.precision, + f1: similarity.f1, + threshold, + passed, + engineId: resolvedEngine.id, + originalLength: similarity.originalLength, + recognizedLength: similarity.recognizedLength, + averageConfidence: typeof recognized?.averageConfidence === "number" ? recognized.averageConfidence : null, + pageIndex: 0, + }; + const warnings = passed + ? 
[] : [createWarning( + "info", + OCR_READBACK_DRIFT, + `OCR 回读 ${ctx?.from} → ${ctx?.to} 低于阈值(f1 ${similarity.f1.toFixed(3)} < ${threshold})。`, + { from: ctx?.from, to: ctx?.to, f1: similarity.f1, recall: similarity.recall, precision: similarity.precision, threshold }, + )]; + + return { + eligible: true, + reason: "completed", + ocrReadback, + warnings, + runtimeMs: nowMs() - start, + }; +} diff --git a/public/core/verification/page-image-source-browser.js b/public/core/verification/page-image-source-browser.js new file mode 100644 index 0000000..de98dd0 --- /dev/null +++ b/public/core/verification/page-image-source-browser.js @@ -0,0 +1,96 @@ +// Browser/Tauri pixel source implementation: PDFs are rasterized to a PNG dataUrl via the vendored pdfjs, PNGs use +// their dataUrl/bytes directly, and RGBA pixels are always obtained through Image → canvas → getImageData. Only available in a DOM runtime; +// Node tests inject a stub. This file makes no network requests; all resources come from same-origin vendor / blob / dataUrl. + +import { ConversionError } from "../conversion-error.js"; +import { createBrowserPdfPageRasterizer } from "../ocr/pdf-rasterizer-browser.js"; + +function ensureBrowserRuntime() { + if (typeof globalThis === "undefined" || typeof globalThis.document?.createElement !== "function") { + throw new ConversionError("Browser page image source needs a DOM runtime.", { + category: "convert", + code: "VERIFICATION_IMAGE_SOURCE_UNAVAILABLE", + details: { reason: "missing-document" }, + }); + } +} + +function toPngDataUrl(content) { + if (typeof content === "string") { + if (content.startsWith("data:")) return content; + // Bare base64 / binary string: wrap it as a PNG dataUrl for the browser to decode. + return `data:image/png;base64,${globalThis.btoa(content)}`; + } + if (content instanceof Uint8Array || content instanceof ArrayBuffer) { + const bytes = content instanceof ArrayBuffer ? 
new Uint8Array(content) : content; + let binary = ""; + for (let i = 0; i < bytes.length; i += 1) binary += String.fromCharCode(bytes[i]); + return `data:image/png;base64,${globalThis.btoa(binary)}`; + } + throw new ConversionError("Unsupported PNG content type for browser image source.", { + category: "validate", + code: "VERIFICATION_IMAGE_SOURCE_FAILED", + details: { reason: "unsupported-content-type" }, + }); +} + +function loadImage(dataUrl) { + return new Promise((resolve, reject) => { + const image = new globalThis.Image(); + image.onload = () => resolve(image); + image.onerror = () => reject(new ConversionError("Image 解码失败。", { + category: "convert", + code: "VERIFICATION_IMAGE_SOURCE_FAILED", + details: { reason: "image-decode-failed" }, + })); + image.src = dataUrl; + }); +} + +function drawToPixels(image, width, height) { + const canvas = globalThis.document.createElement("canvas"); + canvas.width = width; + canvas.height = height; + const ctx = canvas.getContext("2d"); + if (!ctx) { + throw new ConversionError("Canvas 2d context 不可用。", { + category: "convert", + code: "VERIFICATION_IMAGE_SOURCE_FAILED", + details: { reason: "canvas-context-missing" }, + }); + } + ctx.drawImage(image, 0, 0, width, height); + const imageData = ctx.getImageData(0, 0, width, height); + return { pixels: imageData.data, width, height }; +} + +export function createBrowserPageImageSource({ dpi = 144 } = {}) { + let pdfRasterizer = null; + function getPdfRasterizer() { + if (!pdfRasterizer) pdfRasterizer = createBrowserPdfPageRasterizer(); + return pdfRasterizer; + } + + return Object.freeze({ + async getPageImage({ format, content, pageIndex = 0, dpi: dpiOverride } = {}) { + ensureBrowserRuntime(); + const normalized = String(format || "").toLowerCase(); + if (normalized === "pdf") { + const raster = await getPdfRasterizer().rasterize({ content, pageIndex, dpi: dpiOverride || dpi }); + const image = await loadImage(raster.dataUrl); + return drawToPixels(image, raster.width, 
raster.height); + } + if (normalized === "png") { + const image = await loadImage(toPngDataUrl(content)); + const width = image.naturalWidth || image.width; + const height = image.naturalHeight || image.height; + return drawToPixels(image, width, height); + } + throw new ConversionError(`像素源不支持格式:${normalized}`, { + category: "convert", + code: "VERIFICATION_IMAGE_SOURCE_FAILED", + details: { reason: "format-not-rasterizable", format: normalized }, + }); + }, + }); +} diff --git a/public/core/verification/page-image-source.js b/public/core/verification/page-image-source.js new file mode 100644 index 0000000..3356c3f --- /dev/null +++ b/public/core/verification/page-image-source.js @@ -0,0 +1,75 @@ +// Pixel source abstraction: the SSIM visual loopback layer needs to rasterize a given page of a given format into an RGBA pixel buffer. +// Separate concern from OCR's pdf-rasterizer (OCR wants a PNG dataUrl, SSIM wants raw pixels). +// Unavailable by default in Node (throws); browser/Tauri dynamically imports the canvas implementation on first call; +// tests inject a stub via setPageImageSource. + +import { ConversionError } from "../conversion-error.js"; + +export const VERIFICATION_IMAGE_SOURCE_UNAVAILABLE = "VERIFICATION_IMAGE_SOURCE_UNAVAILABLE"; +export const VERIFICATION_IMAGE_SOURCE_FAILED = "VERIFICATION_IMAGE_SOURCE_FAILED"; + +// Rasterizable formats currently supported by the visual loopback (source image available / output renderable). +export const RASTERIZABLE_FORMATS = new Set(["pdf", "png"]); + +function isBrowserRuntime() { + return typeof globalThis !== "undefined" + && typeof globalThis.document?.createElement === "function"; +} + +let _injectedSource = null; +let _autoBrowserImpl = null; +let _autoBrowserLoadFailed = false; + +async function tryLoadBrowserSource() { + if (_autoBrowserImpl) return _autoBrowserImpl; + if (_autoBrowserLoadFailed) return null; + if (!isBrowserRuntime()) { + _autoBrowserLoadFailed = true; + return null; + } + try { + const mod = await import("./page-image-source-browser.js"); + _autoBrowserImpl = mod.createBrowserPageImageSource(); + return _autoBrowserImpl; + } catch (error) { + _autoBrowserLoadFailed = true; + return null; + } +} + +function throwUnavailable(operation) { + throw new 
ConversionError( + `验证阶段像素源在当前运行时不可用(${operation})。请用 setPageImageSource 注入实现,或在浏览器/Tauri 端启用 vendor pdfjs + canvas。`, + { + category: "convert", + code: VERIFICATION_IMAGE_SOURCE_UNAVAILABLE, + details: { reason: "no-runtime-image-source", operation }, + }, + ); +} + +export const defaultPageImageSource = Object.freeze({ + async getPageImage(args) { + if (_injectedSource) return _injectedSource.getPageImage(args); + const browserImpl = await tryLoadBrowserSource(); + if (browserImpl) return browserImpl.getPageImage(args); + throwUnavailable("getPageImage"); + }, +}); + +export function setPageImageSource(impl) { + if (!impl || typeof impl.getPageImage !== "function") { + throw new ConversionError("setPageImageSource requires a { getPageImage } function.", { + category: "validate", + code: "VERIFICATION_IMAGE_SOURCE_INVALID", + details: { reason: "missing-methods" }, + }); + } + _injectedSource = impl; +} + +export function resetPageImageSource() { + _injectedSource = null; + _autoBrowserImpl = null; + _autoBrowserLoadFailed = false; +} diff --git a/public/core/verification/rule-diff.js b/public/core/verification/rule-diff.js new file mode 100644 index 0000000..56dae2e --- /dev/null +++ b/public/core/verification/rule-diff.js @@ -0,0 +1,224 @@ +// Rule diff: produces field-level differences between two SemanticDoc models (the original + the writer→reader readback) +// and emits the standard qualityReport.ruleDiff structure. First layer of the P9-C three-layer verification. + +import { getBlockKey, extractBlockFields, BLOCK_FIELDS_BY_TYPE } from "./block-fingerprint.js"; + +export const MAJOR_WEIGHT = 0.4; +export const MINOR_WEIGHT = 0.05; +export const STRUCTURAL_PENALTY = 0.5; + +const MAJOR_FIELDS = new Set([ + "level", + "ordered", + "headers", + "rows", + "code", + "language", + "src", + "assetId", + "format", +]); + +function clamp(value, min, max) { + if (Number.isNaN(value)) return min; + return Math.min(Math.max(value, min), max); +} + +function firstWords(text, count = 8) { + if (typeof text !== "string") return ""; + return text.trim().split(/\s+/).slice(0, 
count).join(" "); +} + +function blockSnippet(block) { + if (!block) return ""; + if (typeof block.text === "string") return firstWords(block.text, 12); + if (typeof block.code === "string") return firstWords(block.code, 12); + if (Array.isArray(block.items) && block.items.length > 0) return firstWords(block.items[0], 12); + if (typeof block.alt === "string") return firstWords(block.alt, 12); + if (typeof block.assetId === "string") return block.assetId; + if (typeof block.content === "string") return firstWords(block.content, 12); + return block.type || ""; +} + +function isWhitespaceOrPunctOnlyDelta(beforeStr, afterStr) { + if (typeof beforeStr !== "string" || typeof afterStr !== "string") return false; + const normalize = (value) => value.replace(/[\s\p{P}]+/gu, "").toLowerCase(); + return normalize(beforeStr) === normalize(afterStr); +} + +function arraysEqual(a, b) { + if (a === b) return true; + if (!Array.isArray(a) || !Array.isArray(b)) return false; + if (a.length !== b.length) return false; + for (let index = 0; index < a.length; index += 1) { + const left = a[index]; + const right = b[index]; + if (Array.isArray(left) || Array.isArray(right)) { + if (!arraysEqual(left, right)) return false; + } else if (left !== right) { + return false; + } + } + return true; +} + +function classifyFieldSeverity(field, beforeValue, afterValue) { + if (MAJOR_FIELDS.has(field)) { + if (field === "headers" || field === "rows") { + if (Array.isArray(beforeValue) && Array.isArray(afterValue) && beforeValue.length !== afterValue.length) { + return "major"; + } + } + return "major"; + } + if (field === "text") { + if (isWhitespaceOrPunctOnlyDelta(beforeValue, afterValue)) return "minor"; + return "major"; + } + if (typeof beforeValue === "string" && typeof afterValue === "string") { + if (isWhitespaceOrPunctOnlyDelta(beforeValue, afterValue)) return "minor"; + } + return "minor"; +} + +function diffBlockPair(before, after) { + const fields = BLOCK_FIELDS_BY_TYPE[before?.type] 
|| BLOCK_FIELDS_BY_TYPE[after?.type] || ["type"]; + const fieldsDiffered = []; + for (const field of fields) { + const beforeValue = extractBlockFields(before)[field]; + const afterValue = extractBlockFields(after)[field]; + let equal; + if (Array.isArray(beforeValue) || Array.isArray(afterValue)) { + equal = arraysEqual(beforeValue, afterValue); + } else { + equal = beforeValue === afterValue; + } + if (!equal) { + fieldsDiffered.push({ + field, + severity: classifyFieldSeverity(field, beforeValue, afterValue), + before: beforeValue, + after: afterValue, + }); + } + } + return fieldsDiffered; +} + +function buildAlignmentKey(block, index) { + const fields = extractBlockFields(block); + const head = firstWords(typeof block?.text === "string" ? block.text : (Array.isArray(block?.items) ? block.items[0] || "" : "")); + return `${fields.type || "?"}|${head}|${index}`; +} + +export function diffSemanticDocs(original, readBack) { + const originalBlocks = Array.isArray(original?.blocks) ? original.blocks : []; + const readBackBlocks = Array.isArray(readBack?.blocks) ? readBack.blocks : []; + + const originalKeys = originalBlocks.map((block, index) => getBlockKey(block, index)); + const readBackKeys = readBackBlocks.map((block, index) => getBlockKey(block, index)); + + const matchedOriginalIdx = new Set(); + const matchedReadBackIdx = new Set(); + const changedBlocks = []; + + for (let i = 0; i < originalBlocks.length; i += 1) { + const j = readBackKeys.indexOf(originalKeys[i]); + if (j >= 0 && !matchedReadBackIdx.has(j)) { + matchedOriginalIdx.add(i); + matchedReadBackIdx.add(j); + const fieldsDiffered = diffBlockPair(originalBlocks[i], readBackBlocks[j]); + if (fieldsDiffered.length > 0) { + changedBlocks.push({ + id: originalKeys[i], + type: originalBlocks[i]?.type || "", + fieldsDiffered, + severity: fieldsDiffered.some((entry) => entry.severity === "major") ? 
"major" : "minor", + }); + } + } + } + + // LCS-lite 二次对齐:按 (type, firstWords) 启发匹配剩余块 + for (let i = 0; i < originalBlocks.length; i += 1) { + if (matchedOriginalIdx.has(i)) continue; + const heuristic = buildAlignmentKey(originalBlocks[i], i).split("|").slice(0, 2).join("|"); + for (let j = 0; j < readBackBlocks.length; j += 1) { + if (matchedReadBackIdx.has(j)) continue; + const candidate = buildAlignmentKey(readBackBlocks[j], j).split("|").slice(0, 2).join("|"); + if (candidate === heuristic) { + matchedOriginalIdx.add(i); + matchedReadBackIdx.add(j); + const fieldsDiffered = diffBlockPair(originalBlocks[i], readBackBlocks[j]); + if (fieldsDiffered.length > 0) { + changedBlocks.push({ + id: originalKeys[i], + type: originalBlocks[i]?.type || "", + fieldsDiffered, + severity: fieldsDiffered.some((entry) => entry.severity === "major") ? "major" : "minor", + }); + } + break; + } + } + } + + const removedBlocks = []; + for (let i = 0; i < originalBlocks.length; i += 1) { + if (matchedOriginalIdx.has(i)) continue; + removedBlocks.push({ + id: originalKeys[i], + type: originalBlocks[i]?.type || "", + snippet: blockSnippet(originalBlocks[i]), + }); + } + + const addedBlocks = []; + for (let j = 0; j < readBackBlocks.length; j += 1) { + if (matchedReadBackIdx.has(j)) continue; + addedBlocks.push({ + id: readBackKeys[j], + type: readBackBlocks[j]?.type || "", + snippet: blockSnippet(readBackBlocks[j]), + }); + } + + const majorFieldCount = changedBlocks.reduce( + (sum, entry) => sum + entry.fieldsDiffered.filter((field) => field.severity === "major").length, + 0, + ); + const minorFieldCount = changedBlocks.reduce( + (sum, entry) => sum + entry.fieldsDiffered.filter((field) => field.severity === "minor").length, + 0, + ); + const structuralDelta = addedBlocks.length + removedBlocks.length; + const denominator = Math.max(1, originalBlocks.length); + const penalty = MAJOR_WEIGHT * majorFieldCount + MINOR_WEIGHT * minorFieldCount + STRUCTURAL_PENALTY * structuralDelta; + 
const overallScore = clamp(1 - penalty / denominator, 0, 1); + + const identical = changedBlocks.length === 0 && addedBlocks.length === 0 && removedBlocks.length === 0; + let fidelity; + if (identical) { + fidelity = "exact"; + } else if (structuralDelta / denominator > 0.3 || readBackBlocks.length === 0) { + fidelity = "broken"; + } else if (majorFieldCount > 0) { + fidelity = "major-drift"; + } else { + fidelity = "minor-drift"; + } + + return { + identical, + blockCounts: { + original: originalBlocks.length, + readBack: readBackBlocks.length, + delta: readBackBlocks.length - originalBlocks.length, + }, + changedBlocks, + addedBlocks, + removedBlocks, + fidelity, + overallScore, + }; +} diff --git a/public/core/verification/ssim.js b/public/core/verification/ssim.js new file mode 100644 index 0000000..29d45a3 --- /dev/null +++ b/public/core/verification/ssim.js @@ -0,0 +1,127 @@ +// SSIM visual comparison core: pure functions, zero dependencies, operating on grayscale pixel buffers. The second layer of the P9-C three-layer verification +// (visual loopback) uses it to compare the structural similarity of input and output pages. Runs in both Node and the browser. + +export const SSIM_C1 = (0.01 * 255) ** 2; // 6.5025 +export const SSIM_C2 = (0.03 * 255) ** 2; // 58.5225 +export const DEFAULT_WINDOW_SIZE = 8; +export const DEFAULT_TARGET_WIDTH = 256; + +function toClampedArray(value) { + if (value instanceof Uint8ClampedArray) return value; + if (value instanceof Uint8Array || Array.isArray(value)) return Uint8ClampedArray.from(value); + throw new TypeError("ssim: pixel buffer must be Uint8ClampedArray / Uint8Array / number[]."); +} + +// RGBA Uint8 buffer (length w*h*4) → grayscale Uint8ClampedArray (length w*h). +export function rgbaToGrayscale(rgba) { + const buffer = toClampedArray(rgba); + const pixelCount = Math.floor(buffer.length / 4); + const gray = new Uint8ClampedArray(pixelCount); + for (let i = 0; i < pixelCount; i += 1) { + const o = i * 4; + gray[i] = Math.round(0.299 * buffer[o] + 0.587 * buffer[o + 1] + 0.114 * buffer[o + 2]); + } + return gray; +} + +// Box-average resampling: scale a srcW×srcH grayscale image to dstW×dstH. +export function resampleGrayscale(gray, srcW, srcH, 
dstW, dstH) { + const source = toClampedArray(gray); + if (srcW <= 0 || srcH <= 0 || dstW <= 0 || dstH <= 0) { + throw new RangeError("ssim: resample dimensions must be positive."); + } + if (srcW === dstW && srcH === dstH) return source; + const out = new Uint8ClampedArray(dstW * dstH); + for (let dy = 0; dy < dstH; dy += 1) { + const sy0 = Math.floor((dy * srcH) / dstH); + const sy1 = Math.max(sy0 + 1, Math.floor(((dy + 1) * srcH) / dstH)); + for (let dx = 0; dx < dstW; dx += 1) { + const sx0 = Math.floor((dx * srcW) / dstW); + const sx1 = Math.max(sx0 + 1, Math.floor(((dx + 1) * srcW) / dstW)); + let sum = 0; + let count = 0; + for (let sy = sy0; sy < sy1 && sy < srcH; sy += 1) { + for (let sx = sx0; sx < sx1 && sx < srcW; sx += 1) { + sum += source[sy * srcW + sx]; + count += 1; + } + } + out[dy * dstW + dx] = count > 0 ? Math.round(sum / count) : 0; + } + } + return out; +} + +// Mean SSIM over non-overlapping windows. grayA / grayB must have the same dimensions width×height. +export function computeSSIM(grayA, grayB, width, height, options = {}) { + const a = toClampedArray(grayA); + const b = toClampedArray(grayB); + if (a.length !== width * height || b.length !== width * height) { + throw new RangeError("ssim: grayscale buffers must match width*height."); + } + const windowSize = Math.max(2, Math.floor(options.windowSize || DEFAULT_WINDOW_SIZE)); + const c1 = typeof options.c1 === "number" ? options.c1 : SSIM_C1; + const c2 = typeof options.c2 === "number" ? 
options.c2 : SSIM_C2; + + let ssimSum = 0; + let windowCount = 0; + + for (let wy = 0; wy + windowSize <= height; wy += windowSize) { + for (let wx = 0; wx + windowSize <= width; wx += windowSize) { + let sumA = 0; + let sumB = 0; + let sumAA = 0; + let sumBB = 0; + let sumAB = 0; + const n = windowSize * windowSize; + for (let y = 0; y < windowSize; y += 1) { + const row = (wy + y) * width + wx; + for (let x = 0; x < windowSize; x += 1) { + const va = a[row + x]; + const vb = b[row + x]; + sumA += va; + sumB += vb; + sumAA += va * va; + sumBB += vb * vb; + sumAB += va * vb; + } + } + const meanA = sumA / n; + const meanB = sumB / n; + const varA = sumAA / n - meanA * meanA; + const varB = sumBB / n - meanB * meanB; + const covAB = sumAB / n - meanA * meanB; + const numerator = (2 * meanA * meanB + c1) * (2 * covAB + c2); + const denominator = (meanA * meanA + meanB * meanB + c1) * (varA + varB + c2); + ssimSum += denominator === 0 ? 1 : numerator / denominator; + windowCount += 1; + } + } + + const score = windowCount > 0 ? ssimSum / windowCount : 1; + return { score, windowCount, windowSize, width, height }; +} + +// 端到端:两张 { pixels: RGBA, width, height } → 归一到公共网格后算 SSIM。 +export function compareImages(imageA, imageB, options = {}) { + if (!imageA || !imageB || !imageA.pixels || !imageB.pixels) { + throw new TypeError("ssim: compareImages requires { pixels, width, height } for both images."); + } + const targetWidth = Math.max(8, Math.floor(options.targetWidth || DEFAULT_TARGET_WIDTH)); + const aspect = imageA.height > 0 && imageA.width > 0 ? 
imageA.height / imageA.width : 1; + const dstW = Math.min(targetWidth, Math.max(imageA.width, imageB.width)); + const dstH = Math.max(8, Math.round(dstW * aspect)); + + const grayA = resampleGrayscale(rgbaToGrayscale(imageA.pixels), imageA.width, imageA.height, dstW, dstH); + const grayB = resampleGrayscale(rgbaToGrayscale(imageB.pixels), imageB.width, imageB.height, dstW, dstH); + + const result = computeSSIM(grayA, grayB, dstW, dstH, options); + return { + score: result.score, + width: dstW, + height: dstH, + windowCount: result.windowCount, + windowSize: result.windowSize, + dimensionsMatched: imageA.width === imageB.width && imageA.height === imageB.height, + }; +} diff --git a/public/core/verification/verification-stage.js b/public/core/verification/verification-stage.js new file mode 100644 index 0000000..db0f487 --- /dev/null +++ b/public/core/verification/verification-stage.js @@ -0,0 +1,294 @@ +// Verification stage orchestration: runs after the Repair Engine cycle and merges the three P9-C verification layers into one envelope. +// runVerificationStage covers the synchronous rule-diff layer; runVerificationStageAsync adds the SSIM visual loopback (P9-C.2) and OCR readback (P9-C.3) layers on the same envelope. + +import { createWarning } from "../warnings.js"; +import { ROUND_TRIP_FORMATS } from "./block-fingerprint.js"; +import { diffSemanticDocs } from "./rule-diff.js"; +import { compareImages } from "./ssim.js"; +import { + defaultPageImageSource, + RASTERIZABLE_FORMATS, +} from "./page-image-source.js"; + +export const RULE_DIFF_DRIFT = "RULE_DIFF_DRIFT"; +export const RULE_DIFF_READBACK_FAILED = "RULE_DIFF_READBACK_FAILED"; +export const SSIM_VISUAL_DRIFT = "SSIM_VISUAL_DRIFT"; +export const SSIM_SOURCE_UNAVAILABLE = "SSIM_SOURCE_UNAVAILABLE"; + +export const DEFAULT_SSIM_THRESHOLD = 0.85; + +const CROSS_FORMAT_LOOPBACK_PAIRS = new Set([ + "md->html", + "html->md", +]); + +function nowMs() { + if (typeof performance !== "undefined" && typeof performance.now === "function") { + return performance.now(); + } + return Date.now(); +} + +function skippedEntry(layer, reason) { + return { layer, reason }; +} + +function
buildEmptyEnvelope(reason, runtimeMs) { + return { + eligible: false, + reason, + layers: [], + skipped: [skippedEntry("rule-diff", reason)], + ruleDiff: null, + warnings: [], + runtimeMs, + }; +} + +function shouldRunRuleDiff(ctx, output) { + if (!ROUND_TRIP_FORMATS.has(ctx?.from) || !ROUND_TRIP_FORMATS.has(ctx?.to)) { + return { ok: false, reason: "writer-not-text-canonical" }; + } + if (typeof output?.data !== "string") { + return { ok: false, reason: "output-not-string" }; + } + return { ok: true }; +} + +function safeRead(ctx, payload, fromFormat) { + try { + const readBack = ctx.read({ + content: payload, + from: fromFormat, + title: ctx?.title || "verification-readback", + }); + return { ok: true, model: readBack }; + } catch (error) { + return { + ok: false, + error: error?.code || error?.message || "unknown", + }; + } +} + +function safeCrossLoopback(ctx, payload) { + // payload 在 ctx.to 格式中:先反向 prepareConversionModel(payload, from = ctx.to, to = ctx.from), + // 用 ctx.write 把它写回原格式,再 read 一次以拿到与 original 同源格式的 SemanticDoc。 + if (typeof ctx?.prepareConversionModel !== "function" || typeof ctx?.write !== "function") { + return { ok: false, error: "ctx-missing-pipeline" }; + } + try { + const reverseModel = ctx.prepareConversionModel({ + content: payload, + from: ctx.to, + to: ctx.from, + title: ctx?.title || "verification-readback", + fileName: "", + options: { repair: false }, + }); + const reverseOutput = ctx.write({ + model: reverseModel, + to: ctx.from, + title: ctx?.title || "verification-readback", + options: {}, + }); + if (typeof reverseOutput?.data !== "string") { + return { ok: false, error: "reverse-output-not-string" }; + } + const readBack = ctx.read({ + content: reverseOutput.data, + from: ctx.from, + title: ctx?.title || "verification-readback", + }); + return { ok: true, model: readBack }; + } catch (error) { + return { ok: false, error: error?.code || error?.message || "unknown" }; + } +} + +export function runVerificationStage({ model, 
output, ctx } = {}) { + const start = nowMs(); + const gating = shouldRunRuleDiff(ctx, output); + if (!gating.ok) { + return buildEmptyEnvelope(gating.reason, nowMs() - start); + } + if (typeof ctx?.read !== "function") { + return buildEmptyEnvelope("ctx-missing-read", nowMs() - start); + } + + let readBackResult; + const pairKey = `${ctx.from}->${ctx.to}`; + + if (ctx.from === ctx.to) { + readBackResult = safeRead(ctx, output.data, ctx.to); + } else if (CROSS_FORMAT_LOOPBACK_PAIRS.has(pairKey)) { + readBackResult = safeCrossLoopback(ctx, output.data); + } else { + return buildEmptyEnvelope("cross-format-loopback-not-enabled", nowMs() - start); + } + + if (!readBackResult.ok) { + const warning = createWarning( + "info", + RULE_DIFF_READBACK_FAILED, + `Rule-diff readback failed for ${pairKey}: ${readBackResult.error}.`, + { from: ctx.from, to: ctx.to, cause: readBackResult.error }, + ); + return { + eligible: true, + reason: "readback-failed", + layers: [], + skipped: [skippedEntry("rule-diff", `readback-failed:${readBackResult.error}`)], + ruleDiff: null, + warnings: [warning], + runtimeMs: nowMs() - start, + }; + } + + const ruleDiff = diffSemanticDocs(model, readBackResult.model); + const warnings = []; + if (ruleDiff.fidelity !== "exact") { + warnings.push(createWarning( + "info", + RULE_DIFF_DRIFT, + `Rule-diff detected ${ruleDiff.fidelity} for ${pairKey} (score ${ruleDiff.overallScore.toFixed(3)}).`, + { + from: ctx.from, + to: ctx.to, + fidelity: ruleDiff.fidelity, + score: ruleDiff.overallScore, + addedCount: ruleDiff.addedBlocks.length, + removedCount: ruleDiff.removedBlocks.length, + changedCount: ruleDiff.changedBlocks.length, + }, + )); + } + + return { + eligible: true, + reason: "completed", + layers: ["rule-diff"], + skipped: [], + ruleDiff, + warnings, + runtimeMs: nowMs() - start, + }; +} + +// ---- SSIM 视觉回环层(P9-C.2,异步) ---- + +function ssimGate(ctx) { + if (!RASTERIZABLE_FORMATS.has(ctx?.from)) { + return { ok: false, reason: ctx?.from ? 
"source-not-rasterizable" : "no-source-format" }; + } + if (!RASTERIZABLE_FORMATS.has(ctx?.to)) { + return { ok: false, reason: "output-not-rasterizable" }; + } + return { ok: true }; +} + +export async function runSsimLayer({ ctx, output, imageSource = defaultPageImageSource } = {}) { + const start = nowMs(); + const gate = ssimGate(ctx); + if (!gate.ok) { + return { + eligible: false, + reason: gate.reason, + ssim: null, + warnings: [], + runtimeMs: nowMs() - start, + }; + } + + const threshold = typeof ctx?.options?.verification?.ssimThreshold === "number" + ? ctx.options.verification.ssimThreshold + : DEFAULT_SSIM_THRESHOLD; + + let sourceImage; + let outputImage; + try { + sourceImage = await imageSource.getPageImage({ format: ctx.from, content: ctx.content, pageIndex: 0 }); + outputImage = await imageSource.getPageImage({ format: ctx.to, content: output?.data, pageIndex: 0 }); + } catch (error) { + const cause = error?.code || error?.message || "unknown"; + const reason = cause === "VERIFICATION_IMAGE_SOURCE_UNAVAILABLE" ? "image-source-unavailable" : `image-source-failed:${cause}`; + return { + eligible: false, + reason, + ssim: null, + warnings: cause === "VERIFICATION_IMAGE_SOURCE_UNAVAILABLE" + ? [] + : [createWarning("info", SSIM_SOURCE_UNAVAILABLE, `SSIM 视觉对比跳过:${reason}.`, { from: ctx.from, to: ctx.to, cause })], + runtimeMs: nowMs() - start, + }; + } + + const comparison = compareImages(sourceImage, outputImage, ctx?.options?.verification || {}); + const passed = comparison.score >= threshold; + const ssim = { + score: comparison.score, + threshold, + passed, + width: comparison.width, + height: comparison.height, + pageIndex: 0, + sourceFormat: ctx.from, + outputFormat: ctx.to, + dimensionsMatched: comparison.dimensionsMatched, + }; + const warnings = passed + ? 
[] + : [createWarning( + "info", + SSIM_VISUAL_DRIFT, + `SSIM 视觉对比 ${ctx.from} → ${ctx.to} 低于阈值(score ${comparison.score.toFixed(3)} < ${threshold})。`, + { from: ctx.from, to: ctx.to, score: comparison.score, threshold }, + )]; + + return { + eligible: true, + reason: "completed", + ssim, + warnings, + runtimeMs: nowMs() - start, + }; +} + +// 异步编排:同步 rule-diff 基底 + 异步 SSIM 视觉回环 + 异步 OCR 回读,合并为统一 envelope。 +export async function runVerificationStageAsync({ model, output, ctx, imageSource, ocrEngine, ocrRasterizer } = {}) { + const base = runVerificationStage({ model, output, ctx }); + const ssimLayer = await runSsimLayer({ ctx, output, imageSource }); + + let ocrLayer = { eligible: false, reason: "ocr-readback-not-loaded", ocrReadback: null, warnings: [], runtimeMs: 0 }; + try { + const mod = await import("./ocr-readback.js"); + ocrLayer = await mod.runOcrReadbackLayer({ model, output, ctx, engine: ocrEngine, rasterizer: ocrRasterizer }); + } catch (error) { + ocrLayer = { eligible: false, reason: `ocr-readback-load-failed:${error?.code || error?.message || "unknown"}`, ocrReadback: null, warnings: [], runtimeMs: 0 }; + } + + const layers = [...base.layers]; + const skipped = [...base.skipped]; + if (ssimLayer.eligible) { + layers.push("ssim"); + } else { + skipped.push({ layer: "ssim", reason: ssimLayer.reason }); + } + if (ocrLayer.eligible) { + layers.push("ocr-readback"); + } else { + skipped.push({ layer: "ocr-readback", reason: ocrLayer.reason }); + } + + return { + eligible: base.eligible || ssimLayer.eligible || ocrLayer.eligible, + reason: base.reason, + layers, + skipped, + ruleDiff: base.ruleDiff, + ssim: ssimLayer.ssim, + ocrReadback: ocrLayer.ocrReadback, + warnings: [...base.warnings, ...ssimLayer.warnings, ...ocrLayer.warnings], + runtimeMs: base.runtimeMs + ssimLayer.runtimeMs + ocrLayer.runtimeMs, + }; +} diff --git a/public/formats/inline-tokens.js b/public/formats/inline-tokens.js index 45cb643..87af827 100644 --- 
a/public/formats/inline-tokens.js +++ b/public/formats/inline-tokens.js @@ -54,6 +54,35 @@ export function parseInlineMarkdown(text) { while (index < source.length) { const ch = source[index]; + // LaTeX 数学:$$...$$(块级)/ $...$(行内)。内容逐字保留,不递归、不转义(须在转义 + // 处理之前,避免 \frac 的反斜杠被吃掉)。行内启发式:定界符内侧不得为空白,避免把 + // "$5 ... $10" 这类货币误判为数学。 + if (ch === "$") { + if (source.startsWith("$$", index)) { + const end = source.indexOf("$$", index + 2); + if (end > index + 2) { + const inner = source.slice(index + 2, end); + if (inner.trim()) { + pushText(); + tokens.push({ type: "math", display: true, value: inner }); + index = end + 2; + continue; + } + } + } else { + const end = source.indexOf("$", index + 1); + if (end > index + 1) { + const inner = source.slice(index + 1, end); + if (inner && !inner.includes("\n") && !/^\s/.test(inner) && !/\s$/.test(inner)) { + pushText(); + tokens.push({ type: "math", display: false, value: inner }); + index = end + 1; + continue; + } + } + } + } + // 转义 if (ch === "\\" && index + 1 < source.length) { buffer += source[index + 1]; diff --git a/public/index.html b/public/index.html index 5d1f07a..7f9343b 100644 --- a/public/index.html +++ b/public/index.html @@ -7,6 +7,8 @@ + +
@@ -225,6 +227,18 @@

Trans2Former

+
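Review note on the `public/formats/inline-tokens.js` hunk: the `$...$` / `$$...$$` delimiter rule can be exercised in isolation with the sketch below. `splitMath` is a hypothetical helper written for this note, not part of the patch; it assumes only the `$` branch matters and leaves out the backslash-escape handling and every other inline rule of `parseInlineMarkdown`.

```javascript
// Hypothetical standalone sketch of the `$` branch only; not code from the patch.
// Returns a flat list of { type: "text" } and { type: "math" } tokens.
function splitMath(source) {
  const tokens = [];
  let buffer = "";
  let index = 0;

  // Emit any pending plain text before a math token is pushed.
  const flush = () => {
    if (buffer) {
      tokens.push({ type: "text", value: buffer });
      buffer = "";
    }
  };

  while (index < source.length) {
    if (source[index] === "$") {
      if (source.startsWith("$$", index)) {
        // Display math: needs a closing "$$" and non-blank content.
        const end = source.indexOf("$$", index + 2);
        if (end > index + 2) {
          const inner = source.slice(index + 2, end);
          if (inner.trim()) {
            flush();
            tokens.push({ type: "math", display: true, value: inner });
            index = end + 2;
            continue;
          }
        }
      } else {
        // Inline math: single line, no whitespace touching either delimiter.
        const end = source.indexOf("$", index + 1);
        if (end > index + 1) {
          const inner = source.slice(index + 1, end);
          if (!inner.includes("\n") && !/^\s/.test(inner) && !/\s$/.test(inner)) {
            flush();
            tokens.push({ type: "math", display: false, value: inner });
            index = end + 1;
            continue;
          }
        }
      }
    }
    // Anything that did not form a math span stays literal text.
    buffer += source[index];
    index += 1;
  }
  flush();
  return tokens;
}

console.log(JSON.stringify(splitMath("Area is $\\pi r^2$, costs $5 and $10")));
```

Under this rule, `costs $5 and $10` stays plain text because the candidate span `5 and ` ends in whitespace. A string such as `$5-$10` would still be read as inline math `5-`, so the currency guard is a heuristic rather than a guarantee.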
diff --git a/public/katex-render.js b/public/katex-render.js new file mode 100644 index 0000000..6e5fb1f --- /dev/null +++ b/public/katex-render.js @@ -0,0 +1,23 @@ +// LaTeX 渲染:把 inline 渲染器产出的 `` +// 用 KaTeX 排版。KaTeX 经
diff --git a/public/preview.js b/public/preview.js index d30b89f..614561a 100644 --- a/public/preview.js +++ b/public/preview.js @@ -3,6 +3,7 @@ import { renderPreviewHtml, toDocumentModel, } from "./browser-transformer.js"; +import { renderMathIn } from "./katex-render.js"; const TEXT_LIKE_FORMATS = new Set(["md", "html", "txt", "json", "csv", "xml"]); const IMAGE_FORMATS = new Set(["png"]); @@ -99,6 +100,7 @@ function renderTextLike(content, format) { wrap.className = "preview-canvas preview-markdown"; try { wrap.innerHTML = renderPreviewHtml(content, format, payload?.source?.fileName || "preview"); + renderMathIn(wrap); } catch (error) { wrap.textContent = String(content || ""); } diff --git a/public/security-center.js b/public/security-center.js index 170da06..dd403e4 100644 --- a/public/security-center.js +++ b/public/security-center.js @@ -10,6 +10,10 @@ import { markTesseractVendorReady, sha256Hex, tesseractOCREngine, + paddleOcrEngine, + markPaddleOcrVendorReady, + PADDLE_OCR_MODEL_FILES, + PADDLE_OCR_REQUIRED_FILES, } from "/browser-transformer.js"; const ORIGIN = location.origin; @@ -225,7 +229,9 @@ function renderModelCache(dialog) { `; const actionsHtml = entry.manifest.task === "ocr-text" && entry.manifest.engine === "tesseract" ? renderTesseractActions(entry) - : ""; + : (entry.manifest.task === "ocr-text" && entry.manifest.engine === "paddleocr" + ? renderPaddleActions(entry) + : ""); li.innerHTML = headerHtml + actionsHtml; list.appendChild(li); } @@ -249,6 +255,19 @@ function renderTesseractActions(entry) { `; } +function renderPaddleActions(entry) { + const ready = entry.status === STATUS_AVAILABLE; + const importButtons = PADDLE_OCR_MODEL_FILES.map((file) => + ``, + ).join(""); + return ` +
+ ${importButtons} + +
+ `; +} + function setStatusMessage(dialog, text, level = "info") { const target = dialog.querySelector("[data-model-cache-status]"); if (!target) return; @@ -328,6 +347,91 @@ async function clearTessdata(dialog, button) { } } +async function importPaddleModel(dialog, button) { + const file = button.dataset.file || "det.onnx"; + const manifestId = button.dataset.manifestId; + const fileInput = dialog.querySelector("#modelCacheFileInput"); + if (!fileInput) { + setStatusMessage(dialog, "无法找到文件选择器", "error"); + return; + } + await new Promise((resolve) => { + const handler = async () => { + fileInput.removeEventListener("change", handler); + const picked = fileInput.files?.[0]; + fileInput.value = ""; + if (!picked) { + resolve(); + return; + } + try { + defaultModelCache.setStatus(manifestId, STATUS_VERIFYING, { message: `校验 ${picked.name}…` }); + setStatusMessage(dialog, `正在校验 ${picked.name} (${(picked.size / 1024).toFixed(0)} KB)…`, "info"); + const buffer = await picked.arrayBuffer(); + const sha256 = await sha256Hex(buffer); + await defaultOCRStorage.put(`paddleocr/v5/${file}`, buffer, { sha256 }); + // 先置位 vendor-ready(用户已选用 PP-OCRv5),再 probe;否则 ensureProbe 在 vendor + // 未置位时恒返回 false,状态永远翻不过去。真正的 onnxruntime 运行时加载仍在 + // recognize() 时把关。对齐 paddle-default-models.js / tesseract 导入流程的顺序。 + markPaddleOcrVendorReady(true); + const ready = typeof paddleOcrEngine.ensureProbe === "function" + ? 
await paddleOcrEngine.ensureProbe() + : false; + if (ready) { + defaultModelCache.setStatus(manifestId, STATUS_AVAILABLE, { + message: `PP-OCRv5 必选模型 (det/rec) 就位 (最近导入 ${file}, sha256=${sha256.slice(0, 12)}…)`, + file, + sha256, + size: buffer.byteLength, + }); + setStatusMessage(dialog, `${file} 已导入,PP-OCRv5 必选模型齐全 ✅(cls 可选)`, "success"); + } else { + const missing = await missingPaddleFiles(); + defaultModelCache.setStatus(manifestId, STATUS_VERIFYING, { + message: `已导入 ${file};还需导入:${missing.join(", ") || "(无)"}`, + }); + setStatusMessage(dialog, `${file} 已导入;还需导入:${missing.join(", ")}`, "info"); + } + } catch (error) { + defaultModelCache.setStatus(manifestId, STATUS_NOT_DOWNLOADED, { + message: `导入失败:${error?.message || error}`, + }); + setStatusMessage(dialog, `导入 ${file} 失败:${error?.message || error}`, "error"); + } + resolve(); + }; + fileInput.addEventListener("change", handler); + fileInput.click(); + }); +} + +async function missingPaddleFiles() { + // 只报必选缺失(det/rec);cls 为可选,不算缺。 + const missing = []; + for (const file of PADDLE_OCR_REQUIRED_FILES) { + if (!(await defaultOCRStorage.has(`paddleocr/v5/${file}`))) missing.push(file); + } + return missing; +} + +async function clearPaddleModels(dialog, button) { + const manifestId = button.dataset.manifestId; + try { + for (const file of PADDLE_OCR_MODEL_FILES) { + await defaultOCRStorage.delete(`paddleocr/v5/${file}`); + } + if (typeof paddleOcrEngine.ensureProbe === "function") { + await paddleOcrEngine.ensureProbe(); + } + defaultModelCache.setStatus(manifestId, STATUS_NOT_DOWNLOADED, { + message: "已清除本地 PP-OCRv5 模型;下次启用需重新导入 det/rec(cls 可选)。", + }); + setStatusMessage(dialog, "PP-OCRv5 模型已清除", "info"); + } catch (error) { + setStatusMessage(dialog, `清除失败:${error?.message || error}`, "error"); + } +} + function formatBundleSize(bytes) { if (!Number.isFinite(bytes) || bytes <= 0) return "未声明体积"; const units = ["B", "KB", "MB", "GB"]; @@ -383,6 +487,12 @@ function init() { } else if 
(target.matches("[data-clear-tessdata]")) { event.preventDefault(); clearTessdata(dialog, target); + } else if (target.matches("[data-import-paddle]")) { + event.preventDefault(); + importPaddleModel(dialog, target); + } else if (target.matches("[data-clear-paddle]")) { + event.preventDefault(); + clearPaddleModels(dialog, target); } }); } diff --git a/public/styles.css b/public/styles.css index dc9fd81..d4aa35b 100644 --- a/public/styles.css +++ b/public/styles.css @@ -1420,6 +1420,100 @@ button:disabled { color: #ffe5eb; } +/* P9-C.4 转换检验报告:呈现规则 diff + SSIM + OCR 回读三层检验结果 */ +.verification-report { + margin: 10px; + padding: 8px 10px; + border: 1px solid var(--border); + border-radius: var(--radius); + background: var(--surface); + font-size: 0.78rem; +} + +.verification-report[hidden] { + display: none; +} + +.verification-report > summary { + cursor: pointer; + font-weight: 600; + color: var(--text); + list-style: revert; +} + +.verification-badge { + margin-left: 6px; + padding: 1px 8px; + border-radius: 999px; + font-size: 0.68rem; + font-weight: 500; + background: rgba(45, 156, 142, 0.14); + color: #1f6f64; +} + +.verification-badge[data-state="skip"] { + background: rgba(120, 132, 148, 0.16); + color: #586172; +} + +.verification-note { + margin: 8px 0 6px; + color: var(--muted, #6b7280); + font-size: 0.7rem; +} + +.verification-grid { + display: grid; + gap: 6px; + margin: 0; +} + +.verification-row { + display: grid; + grid-template-columns: 7.5rem minmax(0, 1fr); + align-items: baseline; + gap: 8px; + padding: 5px 8px; + border: 1px solid var(--border); + border-left: 3px solid var(--border); + border-radius: 6px; + background: rgba(255, 255, 255, 0.6); +} + +.verification-row dt { + margin: 0; + color: #586172; + font-size: 0.72rem; +} + +.verification-row dd { + margin: 0; + overflow: hidden; + font-family: var(--mono); + font-size: 0.72rem; + word-break: break-word; +} + +.verification-row dd[data-state="ok"] { + color: #1f6f64; +} + 
+.verification-row dd[data-state="drift"] { + color: #9a5b18; +} + +.verification-row dd[data-state="skip"] { + color: #8a93a3; +} + +.verification-row:has(dd[data-state="ok"]) { + border-left-color: #2d9c8e; +} + +.verification-row:has(dd[data-state="drift"]) { + border-left-color: #d08a3a; +} + .bottom-drawer { border-top: 1px solid var(--border); background: var(--surface); diff --git a/public/vendor/katex/fonts/KaTeX_AMS-Regular.woff2 b/public/vendor/katex/fonts/KaTeX_AMS-Regular.woff2 new file mode 100644 index 0000000..0acaaff Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_AMS-Regular.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Caligraphic-Bold.woff2 b/public/vendor/katex/fonts/KaTeX_Caligraphic-Bold.woff2 new file mode 100644 index 0000000..f390922 Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Caligraphic-Bold.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Caligraphic-Regular.woff2 b/public/vendor/katex/fonts/KaTeX_Caligraphic-Regular.woff2 new file mode 100644 index 0000000..75344a1 Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Caligraphic-Regular.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Fraktur-Bold.woff2 b/public/vendor/katex/fonts/KaTeX_Fraktur-Bold.woff2 new file mode 100644 index 0000000..395f28b Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Fraktur-Bold.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Fraktur-Regular.woff2 b/public/vendor/katex/fonts/KaTeX_Fraktur-Regular.woff2 new file mode 100644 index 0000000..735f694 Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Fraktur-Regular.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Main-Bold.woff2 b/public/vendor/katex/fonts/KaTeX_Main-Bold.woff2 new file mode 100644 index 0000000..ab2ad21 Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Main-Bold.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Main-BoldItalic.woff2 
b/public/vendor/katex/fonts/KaTeX_Main-BoldItalic.woff2 new file mode 100644 index 0000000..5931794 Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Main-BoldItalic.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Main-Italic.woff2 b/public/vendor/katex/fonts/KaTeX_Main-Italic.woff2 new file mode 100644 index 0000000..b50920e Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Main-Italic.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Main-Regular.woff2 b/public/vendor/katex/fonts/KaTeX_Main-Regular.woff2 new file mode 100644 index 0000000..eb24a7b Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Main-Regular.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Math-BoldItalic.woff2 b/public/vendor/katex/fonts/KaTeX_Math-BoldItalic.woff2 new file mode 100644 index 0000000..2965702 Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Math-BoldItalic.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Math-Italic.woff2 b/public/vendor/katex/fonts/KaTeX_Math-Italic.woff2 new file mode 100644 index 0000000..215c143 Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Math-Italic.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_SansSerif-Bold.woff2 b/public/vendor/katex/fonts/KaTeX_SansSerif-Bold.woff2 new file mode 100644 index 0000000..cfaa3bd Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_SansSerif-Bold.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_SansSerif-Italic.woff2 b/public/vendor/katex/fonts/KaTeX_SansSerif-Italic.woff2 new file mode 100644 index 0000000..349c06d Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_SansSerif-Italic.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_SansSerif-Regular.woff2 b/public/vendor/katex/fonts/KaTeX_SansSerif-Regular.woff2 new file mode 100644 index 0000000..a90eea8 Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_SansSerif-Regular.woff2 differ diff --git 
a/public/vendor/katex/fonts/KaTeX_Script-Regular.woff2 b/public/vendor/katex/fonts/KaTeX_Script-Regular.woff2 new file mode 100644 index 0000000..b3048fc Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Script-Regular.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Size1-Regular.woff2 b/public/vendor/katex/fonts/KaTeX_Size1-Regular.woff2 new file mode 100644 index 0000000..c5a8462 Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Size1-Regular.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Size2-Regular.woff2 b/public/vendor/katex/fonts/KaTeX_Size2-Regular.woff2 new file mode 100644 index 0000000..e1bccfe Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Size2-Regular.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Size3-Regular.woff2 b/public/vendor/katex/fonts/KaTeX_Size3-Regular.woff2 new file mode 100644 index 0000000..249a286 Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Size3-Regular.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Size4-Regular.woff2 b/public/vendor/katex/fonts/KaTeX_Size4-Regular.woff2 new file mode 100644 index 0000000..680c130 Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Size4-Regular.woff2 differ diff --git a/public/vendor/katex/fonts/KaTeX_Typewriter-Regular.woff2 b/public/vendor/katex/fonts/KaTeX_Typewriter-Regular.woff2 new file mode 100644 index 0000000..771f1af Binary files /dev/null and b/public/vendor/katex/fonts/KaTeX_Typewriter-Regular.woff2 differ diff --git a/public/vendor/katex/katex.min.css b/public/vendor/katex/katex.min.css new file mode 100644 index 0000000..71298b5 --- /dev/null +++ b/public/vendor/katex/katex.min.css @@ -0,0 +1 @@ +@font-face{font-display:block;font-family:KaTeX_AMS;font-style:normal;font-weight:400;src:url(fonts/KaTeX_AMS-Regular.woff2) format("woff2"),url(fonts/KaTeX_AMS-Regular.woff) format("woff"),url(fonts/KaTeX_AMS-Regular.ttf) 
format("truetype")}@font-face{font-display:block;font-family:KaTeX_Caligraphic;font-style:normal;font-weight:700;src:url(fonts/KaTeX_Caligraphic-Bold.woff2) format("woff2"),url(fonts/KaTeX_Caligraphic-Bold.woff) format("woff"),url(fonts/KaTeX_Caligraphic-Bold.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Caligraphic;font-style:normal;font-weight:400;src:url(fonts/KaTeX_Caligraphic-Regular.woff2) format("woff2"),url(fonts/KaTeX_Caligraphic-Regular.woff) format("woff"),url(fonts/KaTeX_Caligraphic-Regular.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Fraktur;font-style:normal;font-weight:700;src:url(fonts/KaTeX_Fraktur-Bold.woff2) format("woff2"),url(fonts/KaTeX_Fraktur-Bold.woff) format("woff"),url(fonts/KaTeX_Fraktur-Bold.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Fraktur;font-style:normal;font-weight:400;src:url(fonts/KaTeX_Fraktur-Regular.woff2) format("woff2"),url(fonts/KaTeX_Fraktur-Regular.woff) format("woff"),url(fonts/KaTeX_Fraktur-Regular.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Main;font-style:normal;font-weight:700;src:url(fonts/KaTeX_Main-Bold.woff2) format("woff2"),url(fonts/KaTeX_Main-Bold.woff) format("woff"),url(fonts/KaTeX_Main-Bold.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Main;font-style:italic;font-weight:700;src:url(fonts/KaTeX_Main-BoldItalic.woff2) format("woff2"),url(fonts/KaTeX_Main-BoldItalic.woff) format("woff"),url(fonts/KaTeX_Main-BoldItalic.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Main;font-style:italic;font-weight:400;src:url(fonts/KaTeX_Main-Italic.woff2) format("woff2"),url(fonts/KaTeX_Main-Italic.woff) format("woff"),url(fonts/KaTeX_Main-Italic.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Main;font-style:normal;font-weight:400;src:url(fonts/KaTeX_Main-Regular.woff2) format("woff2"),url(fonts/KaTeX_Main-Regular.woff) 
format("woff"),url(fonts/KaTeX_Main-Regular.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Math;font-style:italic;font-weight:700;src:url(fonts/KaTeX_Math-BoldItalic.woff2) format("woff2"),url(fonts/KaTeX_Math-BoldItalic.woff) format("woff"),url(fonts/KaTeX_Math-BoldItalic.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Math;font-style:italic;font-weight:400;src:url(fonts/KaTeX_Math-Italic.woff2) format("woff2"),url(fonts/KaTeX_Math-Italic.woff) format("woff"),url(fonts/KaTeX_Math-Italic.ttf) format("truetype")}@font-face{font-display:block;font-family:"KaTeX_SansSerif";font-style:normal;font-weight:700;src:url(fonts/KaTeX_SansSerif-Bold.woff2) format("woff2"),url(fonts/KaTeX_SansSerif-Bold.woff) format("woff"),url(fonts/KaTeX_SansSerif-Bold.ttf) format("truetype")}@font-face{font-display:block;font-family:"KaTeX_SansSerif";font-style:italic;font-weight:400;src:url(fonts/KaTeX_SansSerif-Italic.woff2) format("woff2"),url(fonts/KaTeX_SansSerif-Italic.woff) format("woff"),url(fonts/KaTeX_SansSerif-Italic.ttf) format("truetype")}@font-face{font-display:block;font-family:"KaTeX_SansSerif";font-style:normal;font-weight:400;src:url(fonts/KaTeX_SansSerif-Regular.woff2) format("woff2"),url(fonts/KaTeX_SansSerif-Regular.woff) format("woff"),url(fonts/KaTeX_SansSerif-Regular.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Script;font-style:normal;font-weight:400;src:url(fonts/KaTeX_Script-Regular.woff2) format("woff2"),url(fonts/KaTeX_Script-Regular.woff) format("woff"),url(fonts/KaTeX_Script-Regular.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Size1;font-style:normal;font-weight:400;src:url(fonts/KaTeX_Size1-Regular.woff2) format("woff2"),url(fonts/KaTeX_Size1-Regular.woff) format("woff"),url(fonts/KaTeX_Size1-Regular.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Size2;font-style:normal;font-weight:400;src:url(fonts/KaTeX_Size2-Regular.woff2) 
format("woff2"),url(fonts/KaTeX_Size2-Regular.woff) format("woff"),url(fonts/KaTeX_Size2-Regular.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Size3;font-style:normal;font-weight:400;src:url(fonts/KaTeX_Size3-Regular.woff2) format("woff2"),url(fonts/KaTeX_Size3-Regular.woff) format("woff"),url(fonts/KaTeX_Size3-Regular.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Size4;font-style:normal;font-weight:400;src:url(fonts/KaTeX_Size4-Regular.woff2) format("woff2"),url(fonts/KaTeX_Size4-Regular.woff) format("woff"),url(fonts/KaTeX_Size4-Regular.ttf) format("truetype")}@font-face{font-display:block;font-family:KaTeX_Typewriter;font-style:normal;font-weight:400;src:url(fonts/KaTeX_Typewriter-Regular.woff2) format("woff2"),url(fonts/KaTeX_Typewriter-Regular.woff) format("woff"),url(fonts/KaTeX_Typewriter-Regular.ttf) format("truetype")}.katex{font:normal 1.21em KaTeX_Main,Times New Roman,serif;line-height:1.2;position:relative;text-indent:0;text-rendering:auto}.katex *{-ms-high-contrast-adjust:none!important;border-color:currentColor}.katex .katex-version:after{content:"0.17.0"}.katex .katex-mathml{border:0;-webkit-clip-path:inset(50%);clip-path:inset(50%);height:1px;overflow:hidden;padding:0;position:absolute;width:1px}.katex .katex-html>.newline{display:block}.katex .base{position:relative;white-space:nowrap;width:-webkit-min-content;width:-moz-min-content;width:min-content}.katex .base,.katex .strut{display:inline-block}.katex .textbf{font-weight:700}.katex .textit{font-style:italic}.katex .textrm{font-family:KaTeX_Main}.katex .textsf{font-family:KaTeX_SansSerif}.katex .texttt{font-family:KaTeX_Typewriter}.katex .mathnormal{font-family:KaTeX_Math;font-style:italic}.katex .mathit{font-family:KaTeX_Main;font-style:italic}.katex .mathrm{font-style:normal}.katex .mathbf{font-family:KaTeX_Main;font-weight:700}.katex .boldsymbol{font-family:KaTeX_Math;font-style:italic;font-weight:700}.katex .amsrm,.katex .mathbb,.katex 
.textbb{font-family:KaTeX_AMS}.katex .mathcal{font-family:KaTeX_Caligraphic}.katex .mathfrak,.katex .textfrak{font-family:KaTeX_Fraktur}.katex .mathboldfrak,.katex .textboldfrak{font-family:KaTeX_Fraktur;font-weight:700}.katex .mathtt{font-family:KaTeX_Typewriter}.katex .mathscr,.katex .textscr{font-family:KaTeX_Script}.katex .mathsf,.katex .textsf{font-family:KaTeX_SansSerif}.katex .mathboldsf,.katex .textboldsf{font-family:KaTeX_SansSerif;font-weight:700}.katex .mathitsf,.katex .mathsfit,.katex .textitsf{font-family:KaTeX_SansSerif;font-style:italic}.katex .mainrm{font-family:KaTeX_Main;font-style:normal}.katex .vlist-t{border-collapse:collapse;display:inline-table;table-layout:fixed}.katex .vlist-r{display:table-row}.katex .vlist{display:table-cell;position:relative;vertical-align:bottom}.katex .vlist>span{display:block;height:0;position:relative}.katex .vlist>span>span{display:inline-block}.katex .vlist>span>.pstrut{overflow:hidden;width:0}.katex .vlist-t2{margin-right:-2px}.katex .vlist-s{display:table-cell;font-size:1px;min-width:2px;vertical-align:bottom;width:2px}.katex .vbox{align-items:baseline;display:inline-flex;flex-direction:column}.katex .hbox{width:100%}.katex .hbox,.katex .thinbox{display:inline-flex;flex-direction:row}.katex .thinbox{max-width:0;width:0}.katex .msupsub{text-align:left}.katex .mfrac>span>span{text-align:center}.katex .mfrac .frac-line{border-bottom-style:solid;display:inline-block;width:100%}.katex .hdashline,.katex .hline,.katex .mfrac .frac-line,.katex .overline .overline-line,.katex .rule,.katex .underline .underline-line{min-height:1px}.katex .mspace{display:inline-block}.katex .smash{display:inline;line-height:0}.katex .clap,.katex .llap,.katex .rlap{position:relative;width:0}.katex .clap>.inner,.katex .llap>.inner,.katex .rlap>.inner{position:absolute}.katex .clap>.fix,.katex .llap>.fix,.katex .rlap>.fix{display:inline-block}.katex .llap>.inner{right:0}.katex .clap>.inner,.katex .rlap>.inner{left:0}.katex 
.clap>.inner>span{margin-left:-50%;margin-right:50%}.katex .rule{border:0 solid;display:inline-block;position:relative}.katex .hline,.katex .overline .overline-line,.katex .underline .underline-line{border-bottom-style:solid;display:inline-block;width:100%}.katex .hdashline{border-bottom-style:dashed;display:inline-block;width:100%}.katex .sqrt>.root{margin-left:.2777777778em;margin-right:-.5555555556em}.katex .fontsize-ensurer.reset-size1.size1,.katex .sizing.reset-size1.size1{font-size:1em}.katex .fontsize-ensurer.reset-size1.size2,.katex .sizing.reset-size1.size2{font-size:1.2em}.katex .fontsize-ensurer.reset-size1.size3,.katex .sizing.reset-size1.size3{font-size:1.4em}.katex .fontsize-ensurer.reset-size1.size4,.katex .sizing.reset-size1.size4{font-size:1.6em}.katex .fontsize-ensurer.reset-size1.size5,.katex .sizing.reset-size1.size5{font-size:1.8em}.katex .fontsize-ensurer.reset-size1.size6,.katex .sizing.reset-size1.size6{font-size:2em}.katex .fontsize-ensurer.reset-size1.size7,.katex .sizing.reset-size1.size7{font-size:2.4em}.katex .fontsize-ensurer.reset-size1.size8,.katex .sizing.reset-size1.size8{font-size:2.88em}.katex .fontsize-ensurer.reset-size1.size9,.katex .sizing.reset-size1.size9{font-size:3.456em}.katex .fontsize-ensurer.reset-size1.size10,.katex .sizing.reset-size1.size10{font-size:4.148em}.katex .fontsize-ensurer.reset-size1.size11,.katex .sizing.reset-size1.size11{font-size:4.976em}.katex .fontsize-ensurer.reset-size2.size1,.katex .sizing.reset-size2.size1{font-size:.8333333333em}.katex .fontsize-ensurer.reset-size2.size2,.katex .sizing.reset-size2.size2{font-size:1em}.katex .fontsize-ensurer.reset-size2.size3,.katex .sizing.reset-size2.size3{font-size:1.1666666667em}.katex .fontsize-ensurer.reset-size2.size4,.katex .sizing.reset-size2.size4{font-size:1.3333333333em}.katex .fontsize-ensurer.reset-size2.size5,.katex .sizing.reset-size2.size5{font-size:1.5em}.katex .fontsize-ensurer.reset-size2.size6,.katex 
.sizing.reset-size2.size6{font-size:1.6666666667em}.katex .fontsize-ensurer.reset-size2.size7,.katex .sizing.reset-size2.size7{font-size:2em}.katex .fontsize-ensurer.reset-size2.size8,.katex .sizing.reset-size2.size8{font-size:2.4em}.katex .fontsize-ensurer.reset-size2.size9,.katex .sizing.reset-size2.size9{font-size:2.88em}.katex .fontsize-ensurer.reset-size2.size10,.katex .sizing.reset-size2.size10{font-size:3.4566666667em}.katex .fontsize-ensurer.reset-size2.size11,.katex .sizing.reset-size2.size11{font-size:4.1466666667em}.katex .fontsize-ensurer.reset-size3.size1,.katex .sizing.reset-size3.size1{font-size:.7142857143em}.katex .fontsize-ensurer.reset-size3.size2,.katex .sizing.reset-size3.size2{font-size:.8571428571em}.katex .fontsize-ensurer.reset-size3.size3,.katex .sizing.reset-size3.size3{font-size:1em}.katex .fontsize-ensurer.reset-size3.size4,.katex .sizing.reset-size3.size4{font-size:1.1428571429em}.katex .fontsize-ensurer.reset-size3.size5,.katex .sizing.reset-size3.size5{font-size:1.2857142857em}.katex .fontsize-ensurer.reset-size3.size6,.katex .sizing.reset-size3.size6{font-size:1.4285714286em}.katex .fontsize-ensurer.reset-size3.size7,.katex .sizing.reset-size3.size7{font-size:1.7142857143em}.katex .fontsize-ensurer.reset-size3.size8,.katex .sizing.reset-size3.size8{font-size:2.0571428571em}.katex .fontsize-ensurer.reset-size3.size9,.katex .sizing.reset-size3.size9{font-size:2.4685714286em}.katex .fontsize-ensurer.reset-size3.size10,.katex .sizing.reset-size3.size10{font-size:2.9628571429em}.katex .fontsize-ensurer.reset-size3.size11,.katex .sizing.reset-size3.size11{font-size:3.5542857143em}.katex .fontsize-ensurer.reset-size4.size1,.katex .sizing.reset-size4.size1{font-size:.625em}.katex .fontsize-ensurer.reset-size4.size2,.katex .sizing.reset-size4.size2{font-size:.75em}.katex .fontsize-ensurer.reset-size4.size3,.katex .sizing.reset-size4.size3{font-size:.875em}.katex .fontsize-ensurer.reset-size4.size4,.katex 
.sizing.reset-size4.size4{font-size:1em}.katex .fontsize-ensurer.reset-size4.size5,.katex .sizing.reset-size4.size5{font-size:1.125em}.katex .fontsize-ensurer.reset-size4.size6,.katex .sizing.reset-size4.size6{font-size:1.25em}.katex .fontsize-ensurer.reset-size4.size7,.katex .sizing.reset-size4.size7{font-size:1.5em}.katex .fontsize-ensurer.reset-size4.size8,.katex .sizing.reset-size4.size8{font-size:1.8em}.katex .fontsize-ensurer.reset-size4.size9,.katex .sizing.reset-size4.size9{font-size:2.16em}.katex .fontsize-ensurer.reset-size4.size10,.katex .sizing.reset-size4.size10{font-size:2.5925em}.katex .fontsize-ensurer.reset-size4.size11,.katex .sizing.reset-size4.size11{font-size:3.11em}.katex .fontsize-ensurer.reset-size5.size1,.katex .sizing.reset-size5.size1{font-size:.5555555556em}.katex .fontsize-ensurer.reset-size5.size2,.katex .sizing.reset-size5.size2{font-size:.6666666667em}.katex .fontsize-ensurer.reset-size5.size3,.katex .sizing.reset-size5.size3{font-size:.7777777778em}.katex .fontsize-ensurer.reset-size5.size4,.katex .sizing.reset-size5.size4{font-size:.8888888889em}.katex .fontsize-ensurer.reset-size5.size5,.katex .sizing.reset-size5.size5{font-size:1em}.katex .fontsize-ensurer.reset-size5.size6,.katex .sizing.reset-size5.size6{font-size:1.1111111111em}.katex .fontsize-ensurer.reset-size5.size7,.katex .sizing.reset-size5.size7{font-size:1.3333333333em}.katex .fontsize-ensurer.reset-size5.size8,.katex .sizing.reset-size5.size8{font-size:1.6em}.katex .fontsize-ensurer.reset-size5.size9,.katex .sizing.reset-size5.size9{font-size:1.92em}.katex .fontsize-ensurer.reset-size5.size10,.katex .sizing.reset-size5.size10{font-size:2.3044444444em}.katex .fontsize-ensurer.reset-size5.size11,.katex .sizing.reset-size5.size11{font-size:2.7644444444em}.katex .fontsize-ensurer.reset-size6.size1,.katex .sizing.reset-size6.size1{font-size:.5em}.katex .fontsize-ensurer.reset-size6.size2,.katex .sizing.reset-size6.size2{font-size:.6em}.katex 
.fontsize-ensurer.reset-size6.size3,.katex .sizing.reset-size6.size3{font-size:.7em}.katex .fontsize-ensurer.reset-size6.size4,.katex .sizing.reset-size6.size4{font-size:.8em}.katex .fontsize-ensurer.reset-size6.size5,.katex .sizing.reset-size6.size5{font-size:.9em}.katex .fontsize-ensurer.reset-size6.size6,.katex .sizing.reset-size6.size6{font-size:1em}.katex .fontsize-ensurer.reset-size6.size7,.katex .sizing.reset-size6.size7{font-size:1.2em}.katex .fontsize-ensurer.reset-size6.size8,.katex .sizing.reset-size6.size8{font-size:1.44em}.katex .fontsize-ensurer.reset-size6.size9,.katex .sizing.reset-size6.size9{font-size:1.728em}.katex .fontsize-ensurer.reset-size6.size10,.katex .sizing.reset-size6.size10{font-size:2.074em}.katex .fontsize-ensurer.reset-size6.size11,.katex .sizing.reset-size6.size11{font-size:2.488em}.katex .fontsize-ensurer.reset-size7.size1,.katex .sizing.reset-size7.size1{font-size:.4166666667em}.katex .fontsize-ensurer.reset-size7.size2,.katex .sizing.reset-size7.size2{font-size:.5em}.katex .fontsize-ensurer.reset-size7.size3,.katex .sizing.reset-size7.size3{font-size:.5833333333em}.katex .fontsize-ensurer.reset-size7.size4,.katex .sizing.reset-size7.size4{font-size:.6666666667em}.katex .fontsize-ensurer.reset-size7.size5,.katex .sizing.reset-size7.size5{font-size:.75em}.katex .fontsize-ensurer.reset-size7.size6,.katex .sizing.reset-size7.size6{font-size:.8333333333em}.katex .fontsize-ensurer.reset-size7.size7,.katex .sizing.reset-size7.size7{font-size:1em}.katex .fontsize-ensurer.reset-size7.size8,.katex .sizing.reset-size7.size8{font-size:1.2em}.katex .fontsize-ensurer.reset-size7.size9,.katex .sizing.reset-size7.size9{font-size:1.44em}.katex .fontsize-ensurer.reset-size7.size10,.katex .sizing.reset-size7.size10{font-size:1.7283333333em}.katex .fontsize-ensurer.reset-size7.size11,.katex .sizing.reset-size7.size11{font-size:2.0733333333em}.katex .fontsize-ensurer.reset-size8.size1,.katex .sizing.reset-size8.size1{font-size:.3472222222em}.katex 
.fontsize-ensurer.reset-size8.size2,.katex .sizing.reset-size8.size2{font-size:.4166666667em}.katex .fontsize-ensurer.reset-size8.size3,.katex .sizing.reset-size8.size3{font-size:.4861111111em}.katex .fontsize-ensurer.reset-size8.size4,.katex .sizing.reset-size8.size4{font-size:.5555555556em}.katex .fontsize-ensurer.reset-size8.size5,.katex .sizing.reset-size8.size5{font-size:.625em}.katex .fontsize-ensurer.reset-size8.size6,.katex .sizing.reset-size8.size6{font-size:.6944444444em}.katex .fontsize-ensurer.reset-size8.size7,.katex .sizing.reset-size8.size7{font-size:.8333333333em}.katex .fontsize-ensurer.reset-size8.size8,.katex .sizing.reset-size8.size8{font-size:1em}.katex .fontsize-ensurer.reset-size8.size9,.katex .sizing.reset-size8.size9{font-size:1.2em}.katex .fontsize-ensurer.reset-size8.size10,.katex .sizing.reset-size8.size10{font-size:1.4402777778em}.katex .fontsize-ensurer.reset-size8.size11,.katex .sizing.reset-size8.size11{font-size:1.7277777778em}.katex .fontsize-ensurer.reset-size9.size1,.katex .sizing.reset-size9.size1{font-size:.2893518519em}.katex .fontsize-ensurer.reset-size9.size2,.katex .sizing.reset-size9.size2{font-size:.3472222222em}.katex .fontsize-ensurer.reset-size9.size3,.katex .sizing.reset-size9.size3{font-size:.4050925926em}.katex .fontsize-ensurer.reset-size9.size4,.katex .sizing.reset-size9.size4{font-size:.462962963em}.katex .fontsize-ensurer.reset-size9.size5,.katex .sizing.reset-size9.size5{font-size:.5208333333em}.katex .fontsize-ensurer.reset-size9.size6,.katex .sizing.reset-size9.size6{font-size:.5787037037em}.katex .fontsize-ensurer.reset-size9.size7,.katex .sizing.reset-size9.size7{font-size:.6944444444em}.katex .fontsize-ensurer.reset-size9.size8,.katex .sizing.reset-size9.size8{font-size:.8333333333em}.katex .fontsize-ensurer.reset-size9.size9,.katex .sizing.reset-size9.size9{font-size:1em}.katex .fontsize-ensurer.reset-size9.size10,.katex .sizing.reset-size9.size10{font-size:1.2002314815em}.katex 
.fontsize-ensurer.reset-size9.size11,.katex .sizing.reset-size9.size11{font-size:1.4398148148em}.katex .fontsize-ensurer.reset-size10.size1,.katex .sizing.reset-size10.size1{font-size:.2410800386em}.katex .fontsize-ensurer.reset-size10.size2,.katex .sizing.reset-size10.size2{font-size:.2892960463em}.katex .fontsize-ensurer.reset-size10.size3,.katex .sizing.reset-size10.size3{font-size:.337512054em}.katex .fontsize-ensurer.reset-size10.size4,.katex .sizing.reset-size10.size4{font-size:.3857280617em}.katex .fontsize-ensurer.reset-size10.size5,.katex .sizing.reset-size10.size5{font-size:.4339440694em}.katex .fontsize-ensurer.reset-size10.size6,.katex .sizing.reset-size10.size6{font-size:.4821600771em}.katex .fontsize-ensurer.reset-size10.size7,.katex .sizing.reset-size10.size7{font-size:.5785920926em}.katex .fontsize-ensurer.reset-size10.size8,.katex .sizing.reset-size10.size8{font-size:.6943105111em}.katex .fontsize-ensurer.reset-size10.size9,.katex .sizing.reset-size10.size9{font-size:.8331726133em}.katex .fontsize-ensurer.reset-size10.size10,.katex .sizing.reset-size10.size10{font-size:1em}.katex .fontsize-ensurer.reset-size10.size11,.katex .sizing.reset-size10.size11{font-size:1.1996142719em}.katex .fontsize-ensurer.reset-size11.size1,.katex .sizing.reset-size11.size1{font-size:.2009646302em}.katex .fontsize-ensurer.reset-size11.size2,.katex .sizing.reset-size11.size2{font-size:.2411575563em}.katex .fontsize-ensurer.reset-size11.size3,.katex .sizing.reset-size11.size3{font-size:.2813504823em}.katex .fontsize-ensurer.reset-size11.size4,.katex .sizing.reset-size11.size4{font-size:.3215434084em}.katex .fontsize-ensurer.reset-size11.size5,.katex .sizing.reset-size11.size5{font-size:.3617363344em}.katex .fontsize-ensurer.reset-size11.size6,.katex .sizing.reset-size11.size6{font-size:.4019292605em}.katex .fontsize-ensurer.reset-size11.size7,.katex .sizing.reset-size11.size7{font-size:.4823151125em}.katex .fontsize-ensurer.reset-size11.size8,.katex 
.sizing.reset-size11.size8{font-size:.578778135em}.katex .fontsize-ensurer.reset-size11.size9,.katex .sizing.reset-size11.size9{font-size:.6945337621em}.katex .fontsize-ensurer.reset-size11.size10,.katex .sizing.reset-size11.size10{font-size:.8336012862em}.katex .fontsize-ensurer.reset-size11.size11,.katex .sizing.reset-size11.size11{font-size:1em}.katex .delimsizing.size1{font-family:KaTeX_Size1}.katex .delimsizing.size2{font-family:KaTeX_Size2}.katex .delimsizing.size3{font-family:KaTeX_Size3}.katex .delimsizing.size4{font-family:KaTeX_Size4}.katex .delimsizing.mult .delim-size1>span{font-family:KaTeX_Size1}.katex .delimsizing.mult .delim-size4>span{font-family:KaTeX_Size4}.katex .nulldelimiter{display:inline-block;width:.12em}.katex .delimcenter,.katex .op-symbol{position:relative}.katex .op-symbol.small-op{font-family:KaTeX_Size1}.katex .op-symbol.large-op{font-family:KaTeX_Size2}.katex .accent>.vlist-t,.katex .op-limits>.vlist-t{text-align:center}.katex .accent .accent-body{position:relative}.katex .accent .accent-body:not(.accent-full){width:0}.katex .overlay{display:block}.katex .mtable .vertical-separator{display:inline-block;min-width:1px}.katex .mtable .arraycolsep{display:inline-block}.katex .mtable .col-align-c>.vlist-t{text-align:center}.katex .mtable .col-align-l>.vlist-t{text-align:left}.katex .mtable .col-align-r>.vlist-t{text-align:right}.katex .svg-align{text-align:left}.katex svg{fill:currentColor;stroke:currentColor;display:block;height:inherit;position:absolute;width:100%}.katex svg path{stroke:none}.katex svg{fill-rule:nonzero;fill-opacity:1;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1}.katex img{border-style:none;max-height:none;max-width:none;min-height:0;min-width:0}.katex .stretchy{display:block;overflow:hidden;position:relative;width:100%}.katex .stretchy:after,.katex .stretchy:before{content:""}.katex 
.hide-tail{overflow:hidden;position:relative;width:100%}.katex .halfarrow-left{left:0;overflow:hidden;position:absolute;width:50.2%}.katex .halfarrow-right{overflow:hidden;position:absolute;right:0;width:50.2%}.katex .brace-left{left:0;overflow:hidden;position:absolute;width:25.1%}.katex .brace-center{left:25%;overflow:hidden;position:absolute;width:50%}.katex .brace-right{overflow:hidden;position:absolute;right:0;width:25.1%}.katex .x-arrow-pad{padding:0 .5em}.katex .cd-arrow-pad{padding:0 .55556em 0 .27778em}.katex .mover,.katex .munder,.katex .x-arrow{text-align:center}.katex .boxpad{padding:0 .3em}.katex .fbox,.katex .fcolorbox{border:.04em solid;box-sizing:border-box}.katex .cancel-pad{padding:0 .2em}.katex .cancel-lap{margin-left:-.2em;margin-right:-.2em}.katex .sout{border-bottom-style:solid;border-bottom-width:.08em}.katex .angl{border-right:.049em solid;border-top:.049em solid;box-sizing:border-box;margin-right:.03889em}.katex .anglpad{padding:0 .03889em}.katex .eqn-num:before{content:"(" counter(katexEqnNo) ")";counter-increment:katexEqnNo}.katex .mml-eqn-num:before{content:"(" counter(mmlEqnNo) ")";counter-increment:mmlEqnNo}.katex .mtr-glue{width:50%}.katex .cd-vert-arrow{display:inline-block;position:relative}.katex .cd-label-left{display:inline-block;position:absolute;right:calc(50% + .3em);text-align:left}.katex .cd-label-right{display:inline-block;left:calc(50% + .3em);position:absolute;text-align:right}.katex-display{display:block;margin:1em 0;text-align:center}.katex-display>.katex{display:block;text-align:center;white-space:nowrap}.katex-display>.katex>.katex-html{display:block;position:relative}.katex-display>.katex>.katex-html>.tag{position:absolute;right:0}.katex-display.leqno>.katex>.katex-html>.tag{left:0;right:auto}.katex-display.fleqn>.katex{padding-left:2em;text-align:left}body{counter-reset:katexEqnNo mmlEqnNo} diff --git a/public/vendor/katex/katex.min.js b/public/vendor/katex/katex.min.js new file mode 100644 index 0000000..91882b1 --- 
/dev/null +++ b/public/vendor/katex/katex.min.js @@ -0,0 +1 @@ +!function(e,t){"object"==typeof exports&&"object"==typeof module?module.exports=t():"function"==typeof define&&define.amd?define([],t):"object"==typeof exports?exports.katex=t():e.katex=t()}("undefined"!=typeof self?self:this,function(){return function(){"use strict";var e={d:function(t,r){for(var n in r)e.o(r,n)&&!e.o(t,n)&&Object.defineProperty(t,n,{enumerable:!0,get:r[n]})},o:function(e,t){return Object.prototype.hasOwnProperty.call(e,t)}},t={};e.d(t,{default:function(){return mo}});class r extends Error{constructor(e,t){let n,o,s="KaTeX parse error: "+e;const i=t&&t.loc;if(i&&i.start<=i.end){const e=i.lexer.input;n=i.start,o=i.end,n===e.length?s+=" at end of input: ":s+=" at position "+(n+1)+": ";const t=e.slice(n,o).replace(/[^]/g,"$&\u0332");let r,l;r=n>15?"\u2026"+e.slice(n-15,n):e.slice(0,n),l=o+15e.replace(o,"-$1").toLowerCase(),i={"&":"&",">":">","<":"<",'"':""","'":"'"},l=/[&><"']/g,a=e=>String(e).replace(l,e=>i[e]),c=e=>"ordgroup"===e.type||"color"===e.type?1===e.body.length?c(e.body[0]):e:"font"===e.type?c(e.body):e,h=new Set(["mathord","textord","atom"]),m=e=>h.has(c(e).type),u={displayMode:{type:"boolean",description:"Render math in display mode, which puts the math in display style (so \\int and \\sum are large, for example), and centers the math on the page on its own line.",cli:"-d, --display-mode"},output:{type:{enum:["htmlAndMathml","html","mathml"]},description:"Determines the markup language of the output.",cli:"-F, --format "},leqno:{type:"boolean",description:"Render display math in leqno style (left-justified tags)."},fleqn:{type:"boolean",description:"Render display math flush left."},throwOnError:{type:"boolean",default:!0,cli:"-t, --no-throw-on-error",cliDescription:"Render errors (in the color given by --error-color) instead of throwing a ParseError exception when encountering an error."},errorColor:{type:"string",default:"#cc0000",cli:"-c, --error-color ",cliDescription:"A 
color string given in the format 'rgb' or 'rrggbb' (no #). This option determines the color of errors rendered by the -t option.",cliProcessor:e=>"#"+e},macros:{type:"object",cli:"-m, --macro ",cliDescription:"Define custom macro of the form '\\foo:expansion' (use multiple -m arguments for multiple macros).",cliDefault:[],cliProcessor:(e,t)=>(t.push(e),t)},minRuleThickness:{type:"number",description:"Specifies a minimum thickness, in ems, for fraction lines, `\\sqrt` top lines, `{array}` vertical lines, `\\hline`, `\\hdashline`, `\\underline`, `\\overline`, and the borders of `\\fbox`, `\\boxed`, and `\\fcolorbox`.",processor:e=>Math.max(0,e),cli:"--min-rule-thickness ",cliProcessor:parseFloat},colorIsTextColor:{type:"boolean",description:"Makes \\color behave like LaTeX's 2-argument \\textcolor, instead of LaTeX's one-argument \\color mode change.",cli:"-b, --color-is-text-color"},strict:{type:[{enum:["warn","ignore","error"]},"boolean","function"],description:"Turn on strict / LaTeX faithfulness mode, which throws an error if the input uses features that are not supported by LaTeX.",cli:"-S, --strict",cliDefault:!1},trust:{type:["boolean","function"],description:"Trust the input, enabling all HTML features such as \\url.",cli:"-T, --trust"},maxSize:{type:"number",default:1/0,description:"If non-zero, all user-specified sizes, e.g. in \\rule{500em}{500em}, will be capped to maxSize ems. Otherwise, elements and spaces can be arbitrarily large",processor:e=>Math.max(0,e),cli:"-s, --max-size ",cliProcessor:parseInt},maxExpand:{type:"number",default:1e3,description:"Limit the number of macro expansions to the specified number, to prevent e.g. infinite macro loops. 
If set to Infinity, the macro expander will try to fully expand as in LaTeX.",processor:e=>Math.max(0,e),cli:"-e, --max-expand ",cliProcessor:e=>"Infinity"===e?1/0:parseInt(e)},globalGroup:{type:"boolean",cli:!1}};function p(e){if(void 0!==e.default)return e.default;return function(e){if("string"!=typeof e)return e.enum[0];switch(e){case"boolean":return!1;case"string":return"";case"number":return 0;case"object":return{};default:throw new Error("Unexpected schema type; settings must declare an explicit default.")}}(Array.isArray(e.type)?e.type[0]:e.type)}function d(e,t,r,n){const o=r[t];e[t]=void 0!==o?n.processor?n.processor(o):o:p(n)}class g{constructor(e){void 0===e&&(e={}),this.displayMode=void 0,this.output=void 0,this.leqno=void 0,this.fleqn=void 0,this.throwOnError=void 0,this.errorColor=void 0,this.macros=void 0,this.minRuleThickness=void 0,this.colorIsTextColor=void 0,this.strict=void 0,this.trust=void 0,this.maxSize=void 0,this.maxExpand=void 0,this.globalGroup=void 0,e=e||{};for(const t of Object.keys(u)){const r=u[t];r&&d(this,t,e,r)}}reportNonstrict(e,t,r){let o=this.strict;if("function"==typeof o&&(o=o(e,t,r)),o&&"ignore"!==o){if(!0===o||"error"===o)throw new n("LaTeX-incompatible input and strict mode is set to 'error': "+t+" ["+e+"]",r);"warn"===o?"undefined"!=typeof console&&console.warn("LaTeX-incompatible input and strict mode is set to 'warn': "+t+" ["+e+"]"):"undefined"!=typeof console&&console.warn("LaTeX-incompatible input and strict mode is set to unrecognized '"+o+"': "+t+" ["+e+"]")}}useStrictBehavior(e,t,r){let n=this.strict;if("function"==typeof n)try{n=n(e,t,r)}catch(e){n="error"}return!(!n||"ignore"===n)&&(!0===n||"error"===n||("warn"===n?("undefined"!=typeof console&&console.warn("LaTeX-incompatible input and strict mode is set to 'warn': "+t+" ["+e+"]"),!1):("undefined"!=typeof console&&console.warn("LaTeX-incompatible input and strict mode is set to unrecognized '"+n+"': "+t+" ["+e+"]"),!1)))}isTrusted(e){if("url"in 
e&&e.url&&!e.protocol){const t=(e=>{const t=/^[\x00-\x20]*([^\\/#?]*?)(:|�*58|�*3a|&colon)/i.exec(e);return t?":"!==t[2]?null:/^[a-zA-Z][a-zA-Z0-9+\-.]*$/.test(t[1])?t[1].toLowerCase():null:"_relative"})(e.url);if(null==t)return!1;e.protocol=t}const t="function"==typeof this.trust?this.trust(e):this.trust;return Boolean(t)}}class f{constructor(e,t,r){this.id=void 0,this.size=void 0,this.cramped=void 0,this.id=e,this.size=t,this.cramped=r}sup(){return b[y[this.id]]}sub(){return b[x[this.id]]}fracNum(){return b[w[this.id]]}fracDen(){return b[v[this.id]]}cramp(){return b[k[this.id]]}text(){return b[z[this.id]]}isTight(){return this.size>=2}}const b=[new f(0,0,!1),new f(1,0,!0),new f(2,1,!1),new f(3,1,!0),new f(4,2,!1),new f(5,2,!0),new f(6,3,!1),new f(7,3,!0)],y=[4,5,4,5,6,7,6,7],x=[5,5,5,5,7,7,7,7],w=[2,3,4,5,6,7,6,7],v=[3,3,5,5,7,7,7,7],k=[1,1,3,3,5,5,7,7],z=[0,1,2,3,2,3,2,3];var S={DISPLAY:b[0],TEXT:b[2],SCRIPT:b[4],SCRIPTSCRIPT:b[6]};const M=[{name:"latin",blocks:[[256,591],[768,879]]},{name:"cyrillic",blocks:[[1024,1279]]},{name:"armenian",blocks:[[1328,1423]]},{name:"brahmic",blocks:[[2304,4255]]},{name:"georgian",blocks:[[4256,4351]]},{name:"cjk",blocks:[[12288,12543],[19968,40879],[65280,65376]]},{name:"hangul",blocks:[[44032,55215]]}];const A=[];function T(e){for(let t=0;t=A[t]&&e<=A[t+1])return!0;return!1}M.forEach(e=>e.blocks.forEach(e=>A.push(...e)));const C=e=>e+" "+e,B=80,q={doubleleftarrow:"M262 157\nl10-10c34-36 62.7-77 86-123 3.3-8 5-13.3 5-16 0-5.3-6.7-8-20-8-7.3\n 0-12.2.5-14.5 1.5-2.3 1-4.8 4.5-7.5 10.5-49.3 97.3-121.7 169.3-217 216-28\n 14-57.3 25-88 33-6.7 2-11 3.8-13 5.5-2 1.7-3 4.2-3 7.5s1 5.8 3 7.5\nc2 1.7 6.3 3.5 13 5.5 68 17.3 128.2 47.8 180.5 91.5 52.3 43.7 93.8 96.2 124.5\n 157.5 9.3 8 15.3 12.3 18 13h6c12-.7 18-4 18-10 0-2-1.7-7-5-15-23.3-46-52-87\n-86-123l-10-10h399738v-40H218c328 0 0 0 0 0l-10-8c-26.7-20-65.7-43-117-69 2.7\n-2 6-3.7 10-5 36.7-16 72.3-37.3 107-64l10-8h399782v-40z\nm8 0v40h399730v-40zm0 
194v40h399730v-40z",doublerightarrow:"M399738 392l\n-10 10c-34 36-62.7 77-86 123-3.3 8-5 13.3-5 16 0 5.3 6.7 8 20 8 7.3 0 12.2-.5\n 14.5-1.5 2.3-1 4.8-4.5 7.5-10.5 49.3-97.3 121.7-169.3 217-216 28-14 57.3-25 88\n-33 6.7-2 11-3.8 13-5.5 2-1.7 3-4.2 3-7.5s-1-5.8-3-7.5c-2-1.7-6.3-3.5-13-5.5-68\n-17.3-128.2-47.8-180.5-91.5-52.3-43.7-93.8-96.2-124.5-157.5-9.3-8-15.3-12.3-18\n-13h-6c-12 .7-18 4-18 10 0 2 1.7 7 5 15 23.3 46 52 87 86 123l10 10H0v40h399782\nc-328 0 0 0 0 0l10 8c26.7 20 65.7 43 117 69-2.7 2-6 3.7-10 5-36.7 16-72.3 37.3\n-107 64l-10 8H0v40zM0 157v40h399730v-40zm0 194v40h399730v-40z",leftarrow:"M400000 241H110l3-3c68.7-52.7 113.7-120\n 135-202 4-14.7 6-23 6-25 0-7.3-7-11-21-11-8 0-13.2.8-15.5 2.5-2.3 1.7-4.2 5.8\n-5.5 12.5-1.3 4.7-2.7 10.3-4 17-12 48.7-34.8 92-68.5 130S65.3 228.3 18 247\nc-10 4-16 7.7-18 11 0 8.7 6 14.3 18 17 47.3 18.7 87.8 47 121.5 85S196 441.3 208\n 490c.7 2 1.3 5 2 9s1.2 6.7 1.5 8c.3 1.3 1 3.3 2 6s2.2 4.5 3.5 5.5c1.3 1 3.3\n 1.8 6 2.5s6 1 10 1c14 0 21-3.7 21-11 0-2-2-10.3-6-25-20-79.3-65-146.7-135-202\n l-3-3h399890zM100 241v40h399900v-40z",leftbrace:"M6 548l-6-6v-35l6-11c56-104 135.3-181.3 238-232 57.3-28.7 117\n-45 179-50h399577v120H403c-43.3 7-81 15-113 26-100.7 33-179.7 91-237 174-2.7\n 5-6 9-10 13-.7 1-7.3 1-20 1H6z",leftbraceunder:"M0 6l6-6h17c12.688 0 19.313.3 20 1 4 4 7.313 8.3 10 13\n 35.313 51.3 80.813 93.8 136.5 127.5 55.688 33.7 117.188 55.8 184.5 66.5.688\n 0 2 .3 4 1 18.688 2.7 76 4.3 172 5h399450v120H429l-6-1c-124.688-8-235-61.7\n-331-161C60.687 138.7 32.312 99.3 7 54L0 41V6z",leftgroup:"M400000 80\nH435C64 80 168.3 229.4 21 260c-5.9 1.2-18 0-18 0-2 0-3-1-3-3v-38C76 61 257 0\n 435 0h399565z",leftgroupunder:"M400000 262\nH435C64 262 168.3 112.6 21 82c-5.9-1.2-18 0-18 0-2 0-3 1-3 3v38c76 158 257 219\n 435 219h399565z",leftharpoon:"M0 267c.7 5.3 3 10 7 14h399993v-40H93c3.3\n-3.3 10.2-9.5 20.5-18.5s17.8-15.8 22.5-20.5c50.7-52 88-110.3 112-175 4-11.3 5\n-18.3 3-21-1.3-4-7.3-6-18-6-8 0-13 .7-15 2s-4.7 6.7-8 16c-42 98.7-107.3 
174.7\n-196 228-6.7 4.7-10.7 8-12 10-1.3 2-2 5.7-2 11zm100-26v40h399900v-40z",leftharpoonplus:"M0 267c.7 5.3 3 10 7 14h399993v-40H93c3.3-3.3 10.2-9.5\n 20.5-18.5s17.8-15.8 22.5-20.5c50.7-52 88-110.3 112-175 4-11.3 5-18.3 3-21-1.3\n-4-7.3-6-18-6-8 0-13 .7-15 2s-4.7 6.7-8 16c-42 98.7-107.3 174.7-196 228-6.7 4.7\n-10.7 8-12 10-1.3 2-2 5.7-2 11zm100-26v40h399900v-40zM0 435v40h400000v-40z\nm0 0v40h400000v-40z",leftharpoondown:"M7 241c-4 4-6.333 8.667-7 14 0 5.333.667 9 2 11s5.333\n 5.333 12 10c90.667 54 156 130 196 228 3.333 10.667 6.333 16.333 9 17 2 .667 5\n 1 9 1h5c10.667 0 16.667-2 18-6 2-2.667 1-9.667-3-21-32-87.333-82.667-157.667\n-152-211l-3-3h399907v-40zM93 281 H400000 v-40L7 241z",leftharpoondownplus:"M7 435c-4 4-6.3 8.7-7 14 0 5.3.7 9 2 11s5.3 5.3 12\n 10c90.7 54 156 130 196 228 3.3 10.7 6.3 16.3 9 17 2 .7 5 1 9 1h5c10.7 0 16.7\n-2 18-6 2-2.7 1-9.7-3-21-32-87.3-82.7-157.7-152-211l-3-3h399907v-40H7zm93 0\nv40h399900v-40zM0 241v40h399900v-40zm0 0v40h399900v-40z",lefthook:"M400000 281 H103s-33-11.2-61-33.5S0 197.3 0 164s14.2-61.2 42.5\n-83.5C70.8 58.2 104 47 142 47 c16.7 0 25 6.7 25 20 0 12-8.7 18.7-26 20-40 3.3\n-68.7 15.7-86 37-10 12-15 25.3-15 40 0 22.7 9.8 40.7 29.5 54 19.7 13.3 43.5 21\n 71.5 23h399859zM103 281v-40h399897v40z",leftlinesegment:C("M40 281 V428 H0 V94 H40 V241 H400000 v40z"),leftbracketunder:C("M0 0 h120 V290 H399995 v120 H0z"),leftbracketover:C("M0 440 h120 V150 H399995 v-120 H0z"),leftmapsto:C("M40 281 V448H0V74H40V241H400000v40z"),leftToFrom:"M0 147h400000v40H0zm0 214c68 40 115.7 95.7 143 167h22c15.3 0 23\n-.3 23-1 0-1.3-5.3-13.7-16-37-18-35.3-41.3-69-70-101l-7-8h399905v-40H95l7-8\nc28.7-32 52-65.7 70-101 10.7-23.3 16-35.7 16-37 0-.7-7.7-1-23-1h-22C115.7 265.3\n 68 321 0 361zm0-174v-40h399900v40zm100 154v40h399900v-40z",longequal:C("M0 50 h400000 v40H0z m0 194h40000v40H0z"),midbrace:"M200428 334\nc-100.7-8.3-195.3-44-280-108-55.3-42-101.7-93-139-153l-9-14c-2.7 4-5.7 8.7-9 14\n-53.3 86.7-123.7 153-211 199-66.7 36-137.3 56.3-212 
62H0V214h199568c178.3-11.7\n 311.7-78.3 403-201 6-8 9.7-12 11-12 .7-.7 6.7-1 18-1s17.3.3 18 1c1.3 0 5 4 11\n 12 44.7 59.3 101.3 106.3 170 141s145.3 54.3 229 60h199572v120z",midbraceunder:"M199572 214\nc100.7 8.3 195.3 44 280 108 55.3 42 101.7 93 139 153l9 14c2.7-4 5.7-8.7 9-14\n 53.3-86.7 123.7-153 211-199 66.7-36 137.3-56.3 212-62h199568v120H200432c-178.3\n 11.7-311.7 78.3-403 201-6 8-9.7 12-11 12-.7.7-6.7 1-18 1s-17.3-.3-18-1c-1.3 0\n-5-4-11-12-44.7-59.3-101.3-106.3-170-141s-145.3-54.3-229-60H0V214z",oiintSize1:"M512.6 71.6c272.6 0 320.3 106.8 320.3 178.2 0 70.8-47.7 177.6\n-320.3 177.6S193.1 320.6 193.1 249.8c0-71.4 46.9-178.2 319.5-178.2z\nm368.1 178.2c0-86.4-60.9-215.4-368.1-215.4-306.4 0-367.3 129-367.3 215.4 0 85.8\n60.9 214.8 367.3 214.8 307.2 0 368.1-129 368.1-214.8z",oiintSize2:"M757.8 100.1c384.7 0 451.1 137.6 451.1 230 0 91.3-66.4 228.8\n-451.1 228.8-386.3 0-452.7-137.5-452.7-228.8 0-92.4 66.4-230 452.7-230z\nm502.4 230c0-111.2-82.4-277.2-502.4-277.2s-504 166-504 277.2\nc0 110 84 276 504 276s502.4-166 502.4-276z",oiiintSize1:"M681.4 71.6c408.9 0 480.5 106.8 480.5 178.2 0 70.8-71.6 177.6\n-480.5 177.6S202.1 320.6 202.1 249.8c0-71.4 70.5-178.2 479.3-178.2z\nm525.8 178.2c0-86.4-86.8-215.4-525.7-215.4-437.9 0-524.7 129-524.7 215.4 0\n85.8 86.8 214.8 524.7 214.8 438.9 0 525.7-129 525.7-214.8z",oiiintSize2:"M1021.2 53c603.6 0 707.8 165.8 707.8 277.2 0 110-104.2 275.8\n-707.8 275.8-606 0-710.2-165.8-710.2-275.8C311 218.8 415.2 53 1021.2 53z\nm770.4 277.1c0-131.2-126.4-327.6-770.5-327.6S248.4 198.9 248.4 330.1\nc0 130 128.8 326.4 772.7 326.4s770.5-196.4 770.5-326.4z",rightarrow:"M0 241v40h399891c-47.3 35.3-84 78-110 128\n-16.7 32-27.7 63.7-33 95 0 1.3-.2 2.7-.5 4-.3 1.3-.5 2.3-.5 3 0 7.3 6.7 11 20\n 11 8 0 13.2-.8 15.5-2.5 2.3-1.7 4.2-5.5 5.5-11.5 2-13.3 5.7-27 11-41 14.7-44.7\n 39-84.5 73-119.5s73.7-60.2 119-75.5c6-2 9-5.7 
9-11s-3-9-9-11c-45.3-15.3-85\n-40.5-119-75.5s-58.3-74.8-73-119.5c-4.7-14-8.3-27.3-11-40-1.3-6.7-3.2-10.8-5.5\n-12.5-2.3-1.7-7.5-2.5-15.5-2.5-14 0-21 3.7-21 11 0 2 2 10.3 6 25 20.7 83.3 67\n 151.7 139 205zm0 0v40h399900v-40z",rightbrace:"M400000 542l\n-6 6h-17c-12.7 0-19.3-.3-20-1-4-4-7.3-8.3-10-13-35.3-51.3-80.8-93.8-136.5-127.5\ns-117.2-55.8-184.5-66.5c-.7 0-2-.3-4-1-18.7-2.7-76-4.3-172-5H0V214h399571l6 1\nc124.7 8 235 61.7 331 161 31.3 33.3 59.7 72.7 85 118l7 13v35z",rightbraceunder:"M399994 0l6 6v35l-6 11c-56 104-135.3 181.3-238 232-57.3\n 28.7-117 45-179 50H-300V214h399897c43.3-7 81-15 113-26 100.7-33 179.7-91 237\n-174 2.7-5 6-9 10-13 .7-1 7.3-1 20-1h17z",rightgroup:"M0 80h399565c371 0 266.7 149.4 414 180 5.9 1.2 18 0 18 0 2 0\n 3-1 3-3v-38c-76-158-257-219-435-219H0z",rightgroupunder:"M0 262h399565c371 0 266.7-149.4 414-180 5.9-1.2 18 0 18\n 0 2 0 3 1 3 3v38c-76 158-257 219-435 219H0z",rightharpoon:"M0 241v40h399993c4.7-4.7 7-9.3 7-14 0-9.3\n-3.7-15.3-11-18-92.7-56.7-159-133.7-199-231-3.3-9.3-6-14.7-8-16-2-1.3-7-2-15-2\n-10.7 0-16.7 2-18 6-2 2.7-1 9.7 3 21 15.3 42 36.7 81.8 64 119.5 27.3 37.7 58\n 69.2 92 94.5zm0 0v40h399900v-40z",rightharpoonplus:"M0 241v40h399993c4.7-4.7 7-9.3 7-14 0-9.3-3.7-15.3-11\n-18-92.7-56.7-159-133.7-199-231-3.3-9.3-6-14.7-8-16-2-1.3-7-2-15-2-10.7 0-16.7\n 2-18 6-2 2.7-1 9.7 3 21 15.3 42 36.7 81.8 64 119.5 27.3 37.7 58 69.2 92 94.5z\nm0 0v40h399900v-40z m100 194v40h399900v-40zm0 0v40h399900v-40z",rightharpoondown:"M399747 511c0 7.3 6.7 11 20 11 8 0 13-.8 15-2.5s4.7-6.8\n 8-15.5c40-94 99.3-166.3 178-217 13.3-8 20.3-12.3 21-13 5.3-3.3 8.5-5.8 9.5\n-7.5 1-1.7 1.5-5.2 1.5-10.5s-2.3-10.3-7-15H0v40h399908c-34 25.3-64.7 57-92 95\n-27.3 38-48.7 77.7-64 119-3.3 8.7-5 14-5 16zM0 241v40h399900v-40z",rightharpoondownplus:"M399747 705c0 7.3 6.7 11 20 11 8 0 13-.8\n 15-2.5s4.7-6.8 8-15.5c40-94 99.3-166.3 178-217 13.3-8 20.3-12.3 21-13 5.3-3.3\n 8.5-5.8 9.5-7.5 1-1.7 1.5-5.2 1.5-10.5s-2.3-10.3-7-15H0v40h399908c-34 25.3\n-64.7 57-92 95-27.3 38-48.7 
77.7-64 119-3.3 8.7-5 14-5 16zM0 435v40h399900v-40z\nm0-194v40h400000v-40zm0 0v40h400000v-40z",righthook:"M399859 241c-764 0 0 0 0 0 40-3.3 68.7-15.7 86-37 10-12 15-25.3\n 15-40 0-22.7-9.8-40.7-29.5-54-19.7-13.3-43.5-21-71.5-23-17.3-1.3-26-8-26-20 0\n-13.3 8.7-20 26-20 38 0 71 11.2 99 33.5 0 0 7 5.6 21 16.7 14 11.2 21 33.5 21\n 66.8s-14 61.2-42 83.5c-28 22.3-61 33.5-99 33.5L0 241z M0 281v-40h399859v40z",rightlinesegment:C("M399960 241 V94 h40 V428 h-40 V281 H0 v-40z"),rightbracketunder:C("M399995 0 h-120 V290 H0 v120 H400000z"),rightbracketover:C("M399995 440 h-120 V150 H0 v-120 H399995z"),rightToFrom:"M400000 167c-70.7-42-118-97.7-142-167h-23c-15.3 0-23 .3-23\n 1 0 1.3 5.3 13.7 16 37 18 35.3 41.3 69 70 101l7 8H0v40h399905l-7 8c-28.7 32\n-52 65.7-70 101-10.7 23.3-16 35.7-16 37 0 .7 7.7 1 23 1h23c24-69.3 71.3-125 142\n-167z M100 147v40h399900v-40zM0 341v40h399900v-40z",twoheadleftarrow:"M0 167c68 40\n 115.7 95.7 143 167h22c15.3 0 23-.3 23-1 0-1.3-5.3-13.7-16-37-18-35.3-41.3-69\n-70-101l-7-8h125l9 7c50.7 39.3 85 86 103 140h46c0-4.7-6.3-18.7-19-42-18-35.3\n-40-67.3-66-96l-9-9h399716v-40H284l9-9c26-28.7 48-60.7 66-96 12.7-23.333 19\n-37.333 19-42h-46c-18 54-52.3 100.7-103 140l-9 7H95l7-8c28.7-32 52-65.7 70-101\n 10.7-23.333 16-35.7 16-37 0-.7-7.7-1-23-1h-22C115.7 71.3 68 127 0 167z",twoheadrightarrow:"M400000 167\nc-68-40-115.7-95.7-143-167h-22c-15.3 0-23 .3-23 1 0 1.3 5.3 13.7 16 37 18 35.3\n 41.3 69 70 101l7 8h-125l-9-7c-50.7-39.3-85-86-103-140h-46c0 4.7 6.3 18.7 19 42\n 18 35.3 40 67.3 66 96l9 9H0v40h399716l-9 9c-26 28.7-48 60.7-66 96-12.7 23.333\n-19 37.333-19 42h46c18-54 52.3-100.7 103-140l9-7h125l-7 8c-28.7 32-52 65.7-70\n 101-10.7 23.333-16 35.7-16 37 0 .7 7.7 1 23 1h22c27.3-71.3 75-127 143-167z",tilde1:"M200 55.538c-77 0-168 73.953-177 73.953-3 0-7\n-2.175-9-5.437L2 97c-1-2-2-4-2-6 0-4 2-7 5-9l20-12C116 12 171 0 207 0c86 0\n 114 68 191 68 78 0 168-68 177-68 4 0 7 2 9 5l12 19c1 2.175 2 4.35 2 6.525 0\n 4.35-2 7.613-5 9.788l-19 13.05c-92 63.077-116.937 75.308-183 
76.128\n-68.267.847-113-73.952-191-73.952z",tilde2:"M344 55.266c-142 0-300.638 81.316-311.5 86.418\n-8.01 3.762-22.5 10.91-23.5 5.562L1 120c-1-2-1-3-1-4 0-5 3-9 8-10l18.4-9C160.9\n 31.9 283 0 358 0c148 0 188 122 331 122s314-97 326-97c4 0 8 2 10 7l7 21.114\nc1 2.14 1 3.21 1 4.28 0 5.347-3 9.626-7 10.696l-22.3 12.622C852.6 158.372 751\n 181.476 676 181.476c-149 0-189-126.21-332-126.21z",tilde3:"M786 59C457 59 32 175.242 13 175.242c-6 0-10-3.457\n-11-10.37L.15 138c-1-7 3-12 10-13l19.2-6.4C378.4 40.7 634.3 0 804.3 0c337 0\n 411.8 157 746.8 157 328 0 754-112 773-112 5 0 10 3 11 9l1 14.075c1 8.066-.697\n 16.595-6.697 17.492l-21.052 7.31c-367.9 98.146-609.15 122.696-778.15 122.696\n -338 0-409-156.573-744-156.573z",tilde4:"M786 58C457 58 32 177.487 13 177.487c-6 0-10-3.345\n-11-10.035L.15 143c-1-7 3-12 10-13l22-6.7C381.2 35 637.15 0 807.15 0c337 0 409\n 177 744 177 328 0 754-127 773-127 5 0 10 3 11 9l1 14.794c1 7.805-3 13.38-9\n 14.495l-20.7 5.574c-366.85 99.79-607.3 139.372-776.3 139.372-338 0-409\n -175.236-744-175.236z",vec:"M377 20c0-5.333 1.833-10 5.5-14S391 0 397 0c4.667 0 8.667 1.667 12 5\n3.333 2.667 6.667 9 10 19 6.667 24.667 20.333 43.667 41 57 7.333 4.667 11\n10.667 11 18 0 6-1 10-3 12s-6.667 5-14 9c-28.667 14.667-53.667 35.667-75 63\n-1.333 1.333-3.167 3.5-5.5 6.5s-4 4.833-5 5.5c-1 .667-2.5 1.333-4.5 2s-4.333 1\n-7 1c-4.667 0-9.167-1.833-13.5-5.5S337 184 337 178c0-12.667 15.667-32.333 47-59\nH213l-171-1c-8.667-6-13-12.333-13-19 0-4.667 4.333-11.333 13-20h359\nc-16-25.333-24-45-24-59z",widehat1:"M529 0h5l519 115c5 1 9 5 9 10 0 1-1 2-1 3l-4 22\nc-1 5-5 9-11 9h-2L532 67 19 159h-2c-5 0-9-4-11-9l-5-22c-1-6 2-12 8-13z",widehat2:"M1181 0h2l1171 176c6 0 10 5 10 11l-2 23c-1 6-5 10\n-11 10h-1L1182 67 15 220h-1c-6 0-10-4-11-10l-2-23c-1-6 4-11 10-11z",widehat3:"M1181 0h2l1171 236c6 0 10 5 10 11l-2 23c-1 6-5 10\n-11 10h-1L1182 67 15 280h-1c-6 0-10-4-11-10l-2-23c-1-6 4-11 10-11z",widehat4:"M1181 0h2l1171 296c6 0 10 5 10 11l-2 23c-1 6-5 10\n-11 10h-1L1182 67 15 340h-1c-6 
0-10-4-11-10l-2-23c-1-6 4-11 10-11z",widecheck1:"M529,159h5l519,-115c5,-1,9,-5,9,-10c0,-1,-1,-2,-1,-3l-4,-22c-1,\n-5,-5,-9,-11,-9h-2l-512,92l-513,-92h-2c-5,0,-9,4,-11,9l-5,22c-1,6,2,12,8,13z",widecheck2:"M1181,220h2l1171,-176c6,0,10,-5,10,-11l-2,-23c-1,-6,-5,-10,\n-11,-10h-1l-1168,153l-1167,-153h-1c-6,0,-10,4,-11,10l-2,23c-1,6,4,11,10,11z",widecheck3:"M1181,280h2l1171,-236c6,0,10,-5,10,-11l-2,-23c-1,-6,-5,-10,\n-11,-10h-1l-1168,213l-1167,-213h-1c-6,0,-10,4,-11,10l-2,23c-1,6,4,11,10,11z",widecheck4:"M1181,340h2l1171,-296c6,0,10,-5,10,-11l-2,-23c-1,-6,-5,-10,\n-11,-10h-1l-1168,273l-1167,-273h-1c-6,0,-10,4,-11,10l-2,23c-1,6,4,11,10,11z",baraboveleftarrow:"M400000 620h-399890l3 -3c68.7 -52.7 113.7 -120 135 -202\nc4 -14.7 6 -23 6 -25c0 -7.3 -7 -11 -21 -11c-8 0 -13.2 0.8 -15.5 2.5\nc-2.3 1.7 -4.2 5.8 -5.5 12.5c-1.3 4.7 -2.7 10.3 -4 17c-12 48.7 -34.8 92 -68.5 130\ns-74.2 66.3 -121.5 85c-10 4 -16 7.7 -18 11c0 8.7 6 14.3 18 17c47.3 18.7 87.8 47\n121.5 85s56.5 81.3 68.5 130c0.7 2 1.3 5 2 9s1.2 6.7 1.5 8c0.3 1.3 1 3.3 2 6\ns2.2 4.5 3.5 5.5c1.3 1 3.3 1.8 6 2.5s6 1 10 1c14 0 21 -3.7 21 -11\nc0 -2 -2 -10.3 -6 -25c-20 -79.3 -65 -146.7 -135 -202l-3 -3h399890z\nM100 620v40h399900v-40z M0 241v40h399900v-40zM0 241v40h399900v-40z",rightarrowabovebar:"M0 241v40h399891c-47.3 35.3-84 78-110 128-16.7 32\n-27.7 63.7-33 95 0 1.3-.2 2.7-.5 4-.3 1.3-.5 2.3-.5 3 0 7.3 6.7 11 20 11 8 0\n13.2-.8 15.5-2.5 2.3-1.7 4.2-5.5 5.5-11.5 2-13.3 5.7-27 11-41 14.7-44.7 39\n-84.5 73-119.5s73.7-60.2 119-75.5c6-2 9-5.7 9-11s-3-9-9-11c-45.3-15.3-85-40.5\n-119-75.5s-58.3-74.8-73-119.5c-4.7-14-8.3-27.3-11-40-1.3-6.7-3.2-10.8-5.5\n-12.5-2.3-1.7-7.5-2.5-15.5-2.5-14 0-21 3.7-21 11 0 2 2 10.3 6 25 20.7 83.3 67\n151.7 139 205zm96 379h399894v40H0zm0 
0h399904v40H0z",baraboveshortleftharpoon:"M507,435c-4,4,-6.3,8.7,-7,14c0,5.3,0.7,9,2,11\nc1.3,2,5.3,5.3,12,10c90.7,54,156,130,196,228c3.3,10.7,6.3,16.3,9,17\nc2,0.7,5,1,9,1c0,0,5,0,5,0c10.7,0,16.7,-2,18,-6c2,-2.7,1,-9.7,-3,-21\nc-32,-87.3,-82.7,-157.7,-152,-211c0,0,-3,-3,-3,-3l399351,0l0,-40\nc-398570,0,-399437,0,-399437,0z M593 435 v40 H399500 v-40z\nM0 281 v-40 H399908 v40z M0 281 v-40 H399908 v40z",rightharpoonaboveshortbar:"M0,241 l0,40c399126,0,399993,0,399993,0\nc4.7,-4.7,7,-9.3,7,-14c0,-9.3,-3.7,-15.3,-11,-18c-92.7,-56.7,-159,-133.7,-199,\n-231c-3.3,-9.3,-6,-14.7,-8,-16c-2,-1.3,-7,-2,-15,-2c-10.7,0,-16.7,2,-18,6\nc-2,2.7,-1,9.7,3,21c15.3,42,36.7,81.8,64,119.5c27.3,37.7,58,69.2,92,94.5z\nM0 241 v40 H399908 v-40z M0 475 v-40 H399500 v40z M0 475 v-40 H399500 v40z",shortbaraboveleftharpoon:"M7,435c-4,4,-6.3,8.7,-7,14c0,5.3,0.7,9,2,11\nc1.3,2,5.3,5.3,12,10c90.7,54,156,130,196,228c3.3,10.7,6.3,16.3,9,17c2,0.7,5,1,9,\n1c0,0,5,0,5,0c10.7,0,16.7,-2,18,-6c2,-2.7,1,-9.7,-3,-21c-32,-87.3,-82.7,-157.7,\n-152,-211c0,0,-3,-3,-3,-3l399907,0l0,-40c-399126,0,-399993,0,-399993,0z\nM93 435 v40 H400000 v-40z M500 241 v40 H400000 v-40z M500 241 v40 H400000 v-40z",shortrightharpoonabovebar:"M53,241l0,40c398570,0,399437,0,399437,0\nc4.7,-4.7,7,-9.3,7,-14c0,-9.3,-3.7,-15.3,-11,-18c-92.7,-56.7,-159,-133.7,-199,\n-231c-3.3,-9.3,-6,-14.7,-8,-16c-2,-1.3,-7,-2,-15,-2c-10.7,0,-16.7,2,-18,6\nc-2,2.7,-1,9.7,3,21c15.3,42,36.7,81.8,64,119.5c27.3,37.7,58,69.2,92,94.5z\nM500 241 v40 H399408 v-40z M500 435 v40 H400000 v-40z"};class I{constructor(e){this.children=void 0,this.classes=void 0,this.height=void 0,this.depth=void 0,this.maxFontSize=void 0,this.style=void 0,this.children=e,this.classes=[],this.height=0,this.depth=0,this.maxFontSize=0,this.style={}}hasClass(e){return this.classes.includes(e)}toNode(){const e=document.createDocumentFragment();for(let t=0;t{if("toText"in e)return e.toText();throw new Error("Expected MathDomNode with toText, got "+e.constructor.name)}).join("")}}const 
R={pt:1,mm:7227/2540,cm:7227/254,in:72.27,bp:1.00375,pc:12,dd:1238/1157,cc:14856/1157,nd:685/642,nc:1370/107,sp:1/65536,px:1.00375},H={ex:!0,em:!0,mu:!0},E=function(e){return"string"!=typeof e&&(e=e.unit),e in R||e in H||"ex"===e},N=function(e,t){let r;if(e.unit in R)r=R[e.unit]/t.fontMetrics().ptPerEm/t.sizeMultiplier;else if("mu"===e.unit)r=t.fontMetrics().cssEmPerMu;else{let o;if(o=t.style.isTight()?t.havingStyle(t.style.text()):t,"ex"===e.unit)r=o.fontMetrics().xHeight;else{if("em"!==e.unit)throw new n("Invalid unit: '"+e.unit+"'");r=o.fontMetrics().quad}o!==t&&(r*=o.sizeMultiplier/t.sizeMultiplier)}return Math.min(e.number*r,t.maxSize)},O=function(e){return+e.toFixed(4)+"em"},D=function(e){return e.filter(e=>e).join(" ")},L=function(e){let t="";for(const r of Object.keys(e)){const n=e[r];void 0!==n&&(t+=s(r)+":"+n+";")}return t},P=function(e,t,r){if(this.classes=e||[],this.attributes={},this.height=0,this.depth=0,this.maxFontSize=0,this.style=r||{},t){t.style.isTight()&&this.classes.push("mtight");const e=t.getColor();e&&(this.style.color=e)}},F=function(e){const t=document.createElement(e);t.className=D(this.classes),Object.assign(t.style,this.style);for(const e of Object.keys(this.attributes))t.setAttribute(e,this.attributes[e]);for(let e=0;e/=\x00-\x1f]/,G=function(e){let t="<"+e;this.classes.length&&(t+=' class="'+a(D(this.classes))+'"');const r=L(this.style);r&&(t+=' style="'+a(r)+'"');for(const e of Object.keys(this.attributes)){if(V.test(e))throw new n("Invalid attribute name '"+e+"'");t+=" "+e+'="'+a(this.attributes[e])+'"'}t+=">";for(let e=0;e",t};class U{constructor(e,t,r,n){this.children=void 0,this.attributes=void 0,this.classes=void 0,this.height=void 0,this.depth=void 0,this.width=void 0,this.maxFontSize=void 0,this.style=void 0,this.italic=void 0,P.call(this,e,r,n),this.children=t||[]}setAttribute(e,t){this.attributes[e]=t}hasClass(e){return this.classes.includes(e)}toNode(){return F.call(this,"span")}toMarkup(){return G.call(this,"span")}}class 
X{constructor(e,t,r,n){this.children=void 0,this.attributes=void 0,this.classes=void 0,this.height=void 0,this.depth=void 0,this.maxFontSize=void 0,this.style=void 0,P.call(this,t,n),this.children=r||[],this.setAttribute("href",e)}setAttribute(e,t){this.attributes[e]=t}hasClass(e){return this.classes.includes(e)}toNode(){return F.call(this,"a")}toMarkup(){return G.call(this,"a")}}class Y{constructor(e,t,r){this.src=void 0,this.alt=void 0,this.classes=void 0,this.height=void 0,this.depth=void 0,this.maxFontSize=void 0,this.style=void 0,this.alt=t,this.src=e,this.classes=["mord"],this.height=0,this.depth=0,this.maxFontSize=0,this.style=r}hasClass(e){return this.classes.includes(e)}toNode(){const e=document.createElement("img");return e.src=this.src,e.alt=this.alt,e.className="mord",Object.assign(e.style,this.style),e}toMarkup(){let e=''+a(this.alt)+'=n[0]&&e<=n[1])return r.name}}return null}(this.text.charCodeAt(0));a&&this.classes.push(a+"_fallback"),/[\xee\xef\xed\xec]/.test(this.text)&&(this.text=j[this.text])}hasClass(e){return this.classes.includes(e)}toNode(){const e=document.createTextNode(this.text);let t=null;return this.italic>0&&(t=document.createElement("span"),t.style.marginRight=O(this.italic)),this.classes.length>0&&(t=t||document.createElement("span"),t.className=D(this.classes)),Object.keys(this.style).length>0&&(t=t||document.createElement("span"),Object.assign(t.style,this.style)),t?(t.appendChild(e),t):e}toMarkup(){let e=!1,t="0&&(r+="margin-right:"+O(this.italic)+";"),r+=L(this.style),r&&(e=!0,t+=' style="'+a(r)+'"');const n=a(this.text);return e?(t+=">",t+=n,t+="",t):n}}class _{constructor(e,t){this.children=void 0,this.attributes=void 0,this.children=e||[],this.attributes=t||{}}toNode(){const e=document.createElementNS("http://www.w3.org/2000/svg","svg");for(const t of Object.keys(this.attributes))e.setAttribute(t,this.attributes[t]);for(let t=0;t':''}}class Z{constructor(e){this.attributes=void 0,this.attributes=e||{}}toNode(){const 
e=document.createElementNS("http://www.w3.org/2000/svg","line");for(const t of Object.keys(this.attributes))e.setAttribute(t,this.attributes[t]);return e}toMarkup(){let e="","\\gt",!0),oe(se,le,be,"\u2208","\\in",!0),oe(se,le,be,"\ue020","\\@not"),oe(se,le,be,"\u2282","\\subset",!0),oe(se,le,be,"\u2283","\\supset",!0),oe(se,le,be,"\u2286","\\subseteq",!0),oe(se,le,be,"\u2287","\\supseteq",!0),oe(se,ae,be,"\u2288","\\nsubseteq",!0),oe(se,ae,be,"\u2289","\\nsupseteq",!0),oe(se,le,be,"\u22a8","\\models"),oe(se,le,be,"\u2190","\\leftarrow",!0),oe(se,le,be,"\u2264","\\le"),oe(se,le,be,"\u2264","\\leq",!0),oe(se,le,be,"<","\\lt",!0),oe(se,le,be,"\u2192","\\rightarrow",!0),oe(se,le,be,"\u2192","\\to"),oe(se,ae,be,"\u2271","\\ngeq",!0),oe(se,ae,be,"\u2270","\\nleq",!0),oe(se,le,ye,"\xa0","\\ "),oe(se,le,ye,"\xa0","\\space"),oe(se,le,ye,"\xa0","\\nobreakspace"),oe(ie,le,ye,"\xa0","\\ "),oe(ie,le,ye,"\xa0"," "),oe(ie,le,ye,"\xa0","\\space"),oe(ie,le,ye,"\xa0","\\nobreakspace"),oe(se,le,ye,"","\\nobreak"),oe(se,le,ye,"","\\allowbreak"),oe(se,le,fe,",",","),oe(se,le,fe,";",";"),oe(se,ae,he,"\u22bc","\\barwedge",!0),oe(se,ae,he,"\u22bb","\\veebar",!0),oe(se,le,he,"\u2299","\\odot",!0),oe(se,le,he,"\u2295","\\oplus",!0),oe(se,le,he,"\u2297","\\otimes",!0),oe(se,le,xe,"\u2202","\\partial",!0),oe(se,le,he,"\u2298","\\oslash",!0),oe(se,ae,he,"\u229a","\\circledcirc",!0),oe(se,ae,he,"\u22a1","\\boxdot",!0),oe(se,le,he,"\u25b3","\\bigtriangleup"),oe(se,le,he,"\u25bd","\\bigtriangledown"),oe(se,le,he,"\u2020","\\dagger"),oe(se,le,he,"\u22c4","\\diamond"),oe(se,le,he,"\u22c6","\\star"),oe(se,le,he,"\u25c3","\\triangleleft"),oe(se,le,he,"\u25b9","\\triangleright"),oe(se,le,ge,"{","\\{"),oe(ie,le,xe,"{","\\{"),oe(ie,le,xe,"{","\\textbraceleft"),oe(se,le,me,"}","\\}"),oe(ie,le,xe,"}","\\}"),oe(ie,le,xe,"}","\\textbraceright"),oe(se,le,ge,"{","\\lbrace"),oe(se,le,me,"}","\\rbrace"),oe(se,le,ge,"[","\\lbrack",!0),oe(ie,le,xe,"[","\\lbrack",!0),oe(se,le,me,"]","\\rbrack",!0),oe(ie,le,xe,"]","
\\rbrack",!0),oe(se,le,ge,"(","\\lparen",!0),oe(se,le,me,")","\\rparen",!0),oe(ie,le,xe,"<","\\textless",!0),oe(ie,le,xe,">","\\textgreater",!0),oe(se,le,ge,"\u230a","\\lfloor",!0),oe(se,le,me,"\u230b","\\rfloor",!0),oe(se,le,ge,"\u2308","\\lceil",!0),oe(se,le,me,"\u2309","\\rceil",!0),oe(se,le,xe,"\\","\\backslash"),oe(se,le,xe,"\u2223","|"),oe(se,le,xe,"\u2223","\\vert"),oe(ie,le,xe,"|","\\textbar",!0),oe(se,le,xe,"\u2225","\\|"),oe(se,le,xe,"\u2225","\\Vert"),oe(ie,le,xe,"\u2225","\\textbardbl"),oe(ie,le,xe,"~","\\textasciitilde"),oe(ie,le,xe,"\\","\\textbackslash"),oe(ie,le,xe,"^","\\textasciicircum"),oe(se,le,be,"\u2191","\\uparrow",!0),oe(se,le,be,"\u21d1","\\Uparrow",!0),oe(se,le,be,"\u2193","\\downarrow",!0),oe(se,le,be,"\u21d3","\\Downarrow",!0),oe(se,le,be,"\u2195","\\updownarrow",!0),oe(se,le,be,"\u21d5","\\Updownarrow",!0),oe(se,le,de,"\u2210","\\coprod"),oe(se,le,de,"\u22c1","\\bigvee"),oe(se,le,de,"\u22c0","\\bigwedge"),oe(se,le,de,"\u2a04","\\biguplus"),oe(se,le,de,"\u22c2","\\bigcap"),oe(se,le,de,"\u22c3","\\bigcup"),oe(se,le,de,"\u222b","\\int"),oe(se,le,de,"\u222b","\\intop"),oe(se,le,de,"\u222c","\\iint"),oe(se,le,de,"\u222d","\\iiint"),oe(se,le,de,"\u220f","\\prod"),oe(se,le,de,"\u2211","\\sum"),oe(se,le,de,"\u2a02","\\bigotimes"),oe(se,le,de,"\u2a01","\\bigoplus"),oe(se,le,de,"\u2a00","\\bigodot"),oe(se,le,de,"\u222e","\\oint"),oe(se,le,de,"\u222f","\\oiint"),oe(se,le,de,"\u2230","\\oiiint"),oe(se,le,de,"\u2a06","\\bigsqcup"),oe(se,le,de,"\u222b","\\smallint"),oe(ie,le,ue,"\u2026","\\textellipsis"),oe(se,le,ue,"\u2026","\\mathellipsis"),oe(ie,le,ue,"\u2026","\\ldots",!0),oe(se,le,ue,"\u2026","\\ldots",!0),oe(se,le,ue,"\u22ef","\\@cdots",!0),oe(se,le,ue,"\u22f1","\\ddots",!0),oe(se,le,xe,"\u22ee","\\varvdots"),oe(ie,le,xe,"\u22ee","\\varvdots"),oe(se,le,ce,"\u02ca","\\acute"),oe(se,le,ce,"\u02cb","\\grave"),oe(se,le,ce,"\xa8","\\ddot"),oe(se,le,ce,"~","\\tilde"),oe(se,le,ce,"\u02c9","\\bar"),oe(se,le,ce,"\u02d8","\\breve"),oe(se,le,ce,"\u02c7","\
\check"),oe(se,le,ce,"^","\\hat"),oe(se,le,ce,"\u20d7","\\vec"),oe(se,le,ce,"\u02d9","\\dot"),oe(se,le,ce,"\u02da","\\mathring"),oe(se,le,pe,"\ue131","\\@imath"),oe(se,le,pe,"\ue237","\\@jmath"),oe(se,le,xe,"\u0131","\u0131"),oe(se,le,xe,"\u0237","\u0237"),oe(ie,le,xe,"\u0131","\\i",!0),oe(ie,le,xe,"\u0237","\\j",!0),oe(ie,le,xe,"\xdf","\\ss",!0),oe(ie,le,xe,"\xe6","\\ae",!0),oe(ie,le,xe,"\u0153","\\oe",!0),oe(ie,le,xe,"\xf8","\\o",!0),oe(ie,le,xe,"\xc6","\\AE",!0),oe(ie,le,xe,"\u0152","\\OE",!0),oe(ie,le,xe,"\xd8","\\O",!0),oe(ie,le,ce,"\u02ca","\\'"),oe(ie,le,ce,"\u02cb","\\`"),oe(ie,le,ce,"\u02c6","\\^"),oe(ie,le,ce,"\u02dc","\\~"),oe(ie,le,ce,"\u02c9","\\="),oe(ie,le,ce,"\u02d8","\\u"),oe(ie,le,ce,"\u02d9","\\."),oe(ie,le,ce,"\xb8","\\c"),oe(ie,le,ce,"\u02da","\\r"),oe(ie,le,ce,"\u02c7","\\v"),oe(ie,le,ce,"\xa8",'\\"'),oe(ie,le,ce,"\u02dd","\\H"),oe(ie,le,ce,"\u25ef","\\textcircled");const we={"--":!0,"---":!0,"``":!0,"''":!0};oe(ie,le,xe,"\u2013","--",!0),oe(ie,le,xe,"\u2013","\\textendash"),oe(ie,le,xe,"\u2014","---",!0),oe(ie,le,xe,"\u2014","\\textemdash"),oe(ie,le,xe,"\u2018","`",!0),oe(ie,le,xe,"\u2018","\\textquoteleft"),oe(ie,le,xe,"\u2019","'",!0),oe(ie,le,xe,"\u2019","\\textquoteright"),oe(ie,le,xe,"\u201c","``",!0),oe(ie,le,xe,"\u201c","\\textquotedblleft"),oe(ie,le,xe,"\u201d","''",!0),oe(ie,le,xe,"\u201d","\\textquotedblright"),oe(se,le,xe,"\xb0","\\degree",!0),oe(ie,le,xe,"\xb0","\\degree"),oe(ie,le,xe,"\xb0","\\textdegree",!0),oe(se,le,xe,"\xa3","\\pounds"),oe(se,le,xe,"\xa3","\\mathsterling",!0),oe(ie,le,xe,"\xa3","\\pounds"),oe(ie,le,xe,"\xa3","\\textsterling",!0),oe(se,ae,xe,"\u2720","\\maltese"),oe(ie,ae,xe,"\u2720","\\maltese");const ve='0123456789/@."';for(let e=0;e<14;e++){const t=ve.charAt(e);oe(se,le,xe,t,t)}const ke='0123456789!@*()-=+";:?/.,';for(let e=0;e<25;e++){const t=ke.charAt(e);oe(ie,le,xe,t,t)}const ze="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";for(let e=0;e<52;e++){const 
t=ze.charAt(e);oe(se,le,pe,t,t),oe(ie,le,xe,t,t)}let Se;oe(se,ae,xe,"C","\u2102"),oe(ie,ae,xe,"C","\u2102"),oe(se,ae,xe,"H","\u210d"),oe(ie,ae,xe,"H","\u210d"),oe(se,ae,xe,"N","\u2115"),oe(ie,ae,xe,"N","\u2115"),oe(se,ae,xe,"P","\u2119"),oe(ie,ae,xe,"P","\u2119"),oe(se,ae,xe,"Q","\u211a"),oe(ie,ae,xe,"Q","\u211a"),oe(se,ae,xe,"R","\u211d"),oe(ie,ae,xe,"R","\u211d"),oe(se,ae,xe,"Z","\u2124"),oe(ie,ae,xe,"Z","\u2124"),oe(se,le,pe,"h","\u210e"),oe(ie,le,pe,"h","\u210e");for(let e=0;e<52;e++){const t=ze.charAt(e);Se=String.fromCharCode(55349,56320+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se),Se=String.fromCharCode(55349,56372+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se),Se=String.fromCharCode(55349,56424+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se),Se=String.fromCharCode(55349,56580+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se),Se=String.fromCharCode(55349,56684+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se),Se=String.fromCharCode(55349,56736+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se),Se=String.fromCharCode(55349,56788+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se),Se=String.fromCharCode(55349,56840+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se),Se=String.fromCharCode(55349,56944+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se),e<26&&(Se=String.fromCharCode(55349,56632+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se),Se=String.fromCharCode(55349,56476+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se))}Se=String.fromCharCode(55349,56668),oe(se,le,pe,"k",Se),oe(ie,le,xe,"k",Se);for(let e=0;e<10;e++){const t=e.toString();Se=String.fromCharCode(55349,57294+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se),Se=String.fromCharCode(55349,57314+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se),Se=String.fromCharCode(55349,57324+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se),Se=String.fromCharCode(55349,57334+e),oe(se,le,pe,t,Se),oe(ie,le,xe,t,Se)}const Me="\xd0\xde\xfe";for(let e=0;e<3;e++){const t=Me.charAt(e);oe(se,le,pe,t,t),oe(ie,le,xe,t,t)}const 
Ae={mathClass:"mathbf",textClass:"textbf",font:"Main-Bold"},Te={mathClass:"mathnormal",textClass:"textit",font:"Math-Italic"},Ce={mathClass:"boldsymbol",textClass:"boldsymbol",font:"Main-BoldItalic"},Be={mathClass:"",textClass:"",font:""},qe={mathClass:"mathfrak",textClass:"textfrak",font:"Fraktur-Regular"},Ie={mathClass:"mathbb",textClass:"textbb",font:"AMS-Regular"},Re={mathClass:"mathboldfrak",textClass:"textboldfrak",font:"Fraktur-Regular"},He={mathClass:"mathsf",textClass:"textsf",font:"SansSerif-Regular"},Ee={mathClass:"mathboldsf",textClass:"textboldsf",font:"SansSerif-Bold"},Ne={mathClass:"mathitsf",textClass:"textitsf",font:"SansSerif-Italic"},Oe={mathClass:"mathtt",textClass:"texttt",font:"Typewriter-Regular"},De=[Ae,Ae,Te,Te,Ce,Ce,{mathClass:"mathscr",textClass:"textscr",font:"Script-Regular"},Be,Be,Be,qe,qe,Ie,Ie,Re,Re,He,He,Ee,Ee,Ne,Ne,Be,Be,Oe,Oe],Le=[Ae,Be,He,Ee,Oe],Pe=function(e,t,r){if(ne[r][e]){const t=ne[r][e].replace;t&&(e=t)}return{value:e,metrics:ee(e,t,r)}},Fe=function(e,t,r,n,o){const s=Pe(e,t,r),i=s.metrics;let l;if(e=s.value,i){let t=i.italic;("text"===r||n&&"mathit"===n.font)&&(t=0),l=new W(e,i.height,i.depth,t,i.skew,i.width,o)}else"undefined"!=typeof console&&console.warn("No character metrics for '"+e+"' in style '"+t+"' and mode '"+r+"'"),l=new W(e,0,0,0,0,0,o);if(n){l.maxFontSize=n.sizeMultiplier,n.style.isTight()&&l.classes.push("mtight");const e=n.getColor();e&&(l.style.color=e)}return l},Ve=function(e,t,r,n){return void 0===n&&(n=[]),"boldsymbol"===r.font&&Pe(e,"Main-Bold",t).metrics?Fe(e,"Main-Bold",t,r,n.concat(["mathbf"])):"\\"===e||"main"===ne[t][e].font?Fe(e,"Main-Regular",t,r,n):Fe(e,"AMS-Regular",t,r,n.concat(["amsrm"]))},Ge=function(e,t){const r="mathord"===e.type?"mathord":"textord",o=e.mode,s=e.text,i=["mord"],{font:l,fontFamily:a,fontWeight:c,fontShape:h}=t,m="math"===o||"text"===o&&!!l,u=m?l:a;let p="",d="";if(55349===s.charCodeAt(0)){const e=(e=>{const 
t=1024*(e.charCodeAt(0)-55296)+(e.charCodeAt(1)-56320)+65536;if(119808<=t&&t<120484){const e=Math.floor((t-119808)/26);return De[e]}if(120782<=t&&t<=120831){const e=Math.floor((t-120782)/10);return Le[e]}if(120485===t||120486===t)return De[0];if(120486{if(D(e.classes)!==D(t.classes)||e.skew!==t.skew||e.maxFontSize!==t.maxFontSize||0!==e.italic&&e.hasClass("mathnormal"))return!1;if(1===e.classes.length){const t=e.classes[0];if("mbin"===t||"mord"===t)return!1}for(const r of Object.keys(e.style))if(e.style[r]!==t.style[r])return!1;for(const r of Object.keys(t.style))if(e.style[r]!==t.style[r])return!1;return!0},Xe=e=>{for(let t=0;tt&&(t=s.height),s.depth>r&&(r=s.depth),s.maxFontSize>n&&(n=s.maxFontSize)}e.height=t,e.depth=r,e.maxFontSize=n},je=function(e,t,r,n){const o=new U(e,t,r,n);return Ye(o),o},We=(e,t,r,n)=>new U(e,t,r,n),_e=function(e,t,r){const n=je([e],[],t);return n.height=Math.max(r||t.fontMetrics().defaultRuleThickness,t.minRuleThickness),n.style.borderBottomWidth=O(n.height),n.maxFontSize=1,n},$e=function(e){const t=new I(e);return Ye(t),t},Ze=function(e,t){return e instanceof I?je([],[e],t):e},Ke=function(e,t){const{children:r,depth:n}=function(e){if("individualShift"===e.positionType){const t=e.children,r=[t[0]],n=-t[0].shift-t[0].elem.depth;let o=n;for(let e=1;e{const r=je(["mspace"],[],t),n=N(e,t);return r.style.marginRight=O(n),r},Qe=(e,t,r)=>{let n,o;switch(e){case"amsrm":n="AMS";break;case"textrm":n="Main";break;case"textsf":n="SansSerif";break;case"texttt":n="Typewriter";break;default:n=e}return 
o="textbf"===t&&"textit"===r?"BoldItalic":"textbf"===t?"Bold":"textit"===r?"Italic":"Regular",n+"-"+o},et={mathbf:{variant:"bold",fontName:"Main-Bold"},mathrm:{variant:"normal",fontName:"Main-Regular"},textit:{variant:"italic",fontName:"Main-Italic"},mathit:{variant:"italic",fontName:"Main-Italic"},mathnormal:{variant:"italic",fontName:"Math-Italic"},mathsfit:{variant:"sans-serif-italic",fontName:"SansSerif-Italic"},mathbb:{variant:"double-struck",fontName:"AMS-Regular"},mathcal:{variant:"script",fontName:"Caligraphic-Regular"},mathfrak:{variant:"fraktur",fontName:"Fraktur-Regular"},mathscr:{variant:"script",fontName:"Script-Regular"},mathsf:{variant:"sans-serif",fontName:"SansSerif-Regular"},mathtt:{variant:"monospace",fontName:"Typewriter-Regular"}},tt={vec:["vec",.471,.714],oiintSize1:["oiintSize1",.957,.499],oiintSize2:["oiintSize2",1.472,.659],oiiintSize1:["oiiintSize1",1.304,.499],oiiintSize2:["oiiintSize2",1.98,.659]},rt=function(e,t){const[r,n,o]=tt[e],s=new $(r),i=new _([s],{width:O(n),height:O(o),style:"width:"+O(n),viewBox:"0 0 "+1e3*n+" "+1e3*o,preserveAspectRatio:"xMinYMin"}),l=We(["overlay"],[i],t);return l.height=o,l.style.height=O(o),l.style.width=O(n),l},nt={number:3,unit:"mu"},ot={number:4,unit:"mu"},st={number:5,unit:"mu"},it={mord:{mop:nt,mbin:ot,mrel:st,minner:nt},mop:{mord:nt,mop:nt,mrel:st,minner:nt},mbin:{mord:ot,mop:ot,mopen:ot,minner:ot},mrel:{mord:st,mop:st,mopen:st,minner:st},mopen:{},mclose:{mop:nt,mbin:ot,mrel:st,minner:nt},mpunct:{mord:nt,mop:nt,mrel:st,mopen:nt,mclose:nt,mpunct:nt,minner:nt},minner:{mord:nt,mop:nt,mbin:ot,mrel:st,mopen:nt,mpunct:nt,minner:nt}},lt={mord:{mop:nt},mop:{mord:nt,mop:nt},mbin:{},mrel:{},mopen:{},mclose:{mop:nt},mpunct:{},minner:{mop:nt}},at={},ct={},ht={};function mt(e){const{type:t,names:r,htmlBuilder:n,mathmlBuilder:o}=e;for(let t=0;t{const r=t.classes[0],n=e.classes[0];"mbin"===r&&ft.has(n)?t.classes[0]="mord":"mbin"===n&>.has(r)&&(e.classes[0]="mord")},{node:i},l,a),wt(o,(e,t)=>{var r,n;const 
o=zt(t),i=zt(e),l=o&&i?e.hasClass("mtight")?null==(r=lt[o])?void 0:r[i]:null==(n=it[o])?void 0:n[i]:null;if(l)return Je(l,s)},{node:i},l,a),o},wt=function(e,t,r,n,o){n&&e.push(n);let s=0;for(;sr=>{e.splice(t+1,0,r),s++})(s)}n&&e.pop()},vt=function(e){return e instanceof I||e instanceof X||e instanceof U&&e.hasClass("enclosing")?e:null},kt=function(e,t){const r=vt(e);if(r){const e=r.children;if(e.length){if("right"===t)return kt(e[e.length-1],"right");if("left"===t)return kt(e[0],"left")}}return e},zt=function(e,t){if(!e)return null;t&&(e=kt(e,t));const r=e.classes[0];return yt[r]||null},St=function(e,t){const r=["nulldelimiter"].concat(e.baseSizingClasses());return je(t.concat(r))},Mt=function(e,t,r){if(!e)return je();if(ct[e.type]){let n=ct[e.type](e,t);if(r&&t.size!==r.size){n=je(t.sizingClasses(r),[n],t);const e=t.sizeMultiplier/r.sizeMultiplier;n.height*=e,n.depth*=e}return n}throw new n("Got group of unknown type: '"+e.type+"'")};function At(e,t){const r=je(["base"],e,t),n=je(["strut"]);return n.style.height=O(r.height+r.depth),r.depth&&(n.style.verticalAlign=O(-r.depth)),r.children.unshift(n),r}function Tt(e,t){let r=null;1===e.length&&"tag"===e[0].type&&(r=e[0].tag,e=e[0].body);const n=xt(e,t,"root");let o;2===n.length&&n[1].hasClass("tag")&&(o=n.pop());const s=[];let i,l=[];for(let e=0;e0&&(s.push(At(l,t)),l=[]),s.push(n[e]));l.length>0&&s.push(At(l,t)),r?(i=At(xt(r,t,!0),t),i.classes=["tag"],s.push(i)):o&&s.push(o);const a=je(["katex-html"],s);if(a.setAttribute("aria-hidden","true"),i){const e=i.children[0];e.style.height=O(a.height+a.depth),a.depth&&(e.style.verticalAlign=O(-a.depth))}return a}function Ct(e){return new I(e)}class Bt{constructor(e,t,r){this.type=void 0,this.attributes=void 0,this.children=void 0,this.classes=void 0,this.type=e,this.attributes={},this.children=t||[],this.classes=r||[]}setAttribute(e,t){this.attributes[e]=t}getAttribute(e){return this.attributes[e]}toNode(){const 
e=document.createElementNS("http://www.w3.org/1998/Math/MathML",this.type);for(const t in this.attributes)Object.prototype.hasOwnProperty.call(this.attributes,t)&&e.setAttribute(t,this.attributes[t]);this.classes.length>0&&(e.className=D(this.classes));for(let t=0;t0&&(e+=' class ="'+a(D(this.classes))+'"'),e+=">";for(let t=0;t",e}toText(){return this.children.map(e=>e.toText()).join("")}}class qt{constructor(e){this.text=void 0,this.text=e}toNode(){return document.createTextNode(this.text)}toMarkup(){return a(this.toText())}toText(){return this.text}}class It{constructor(e){this.width=void 0,this.character=void 0,this.width=e,this.character=e>=.05555&&e<=.05556?"\u200a":e>=.1666&&e<=.1667?"\u2009":e>=.2222&&e<=.2223?"\u2005":e>=.2777&&e<=.2778?"\u2005\u200a":e>=-.05556&&e<=-.05555?"\u200a\u2063":e>=-.1667&&e<=-.1666?"\u2009\u2063":e>=-.2223&&e<=-.2222?"\u205f\u2063":e>=-.2778&&e<=-.2777?"\u2005\u2063":null}toNode(){if(this.character)return document.createTextNode(this.character);{const e=document.createElementNS("http://www.w3.org/1998/Math/MathML","mspace");return e.setAttribute("width",O(this.width)),e}}toMarkup(){return this.character?""+this.character+"":''}toText(){return this.character?this.character:" "}}const Rt=new Set(["\\imath","\\jmath"]),Ht=new Set(["mrow","mtable"]),Et=function(e,t,r){return!ne[t][e]||!ne[t][e].replace||55349===e.charCodeAt(0)||we.hasOwnProperty(e)&&r&&(r.fontFamily&&"tt"===r.fontFamily.slice(4,6)||r.font&&"tt"===r.font.slice(4,6))||(e=ne[t][e].replace),new qt(e)},Nt=function(e){return 1===e.length?e[0]:new 
Bt("mrow",e)},Ot={mathit:"italic",boldsymbol:e=>"textord"===e.type?"bold":"bold-italic",mathbf:"bold",mathbb:"double-struck",mathsfit:"sans-serif-italic",mathfrak:"fraktur",mathscr:"script",mathcal:"script",mathsf:"sans-serif",mathtt:"monospace"},Dt=(e,t)=>{if("text"===e.mode){if("texttt"===t.fontFamily)return"monospace";if("textsf"===t.fontFamily)return"textit"===t.fontShape&&"textbf"===t.fontWeight?"sans-serif-bold-italic":"textit"===t.fontShape?"sans-serif-italic":"textbf"===t.fontWeight?"bold-sans-serif":"sans-serif";if("textit"===t.fontShape&&"textbf"===t.fontWeight)return"bold-italic";if("textit"===t.fontShape)return"italic";if("textbf"===t.fontWeight)return"bold"}const r=t.font;if(!r||"mathnormal"===r)return null;const n=e.mode,o=Ot[r];if(o)return"function"==typeof o?o(e):o;let s=e.text;if(Rt.has(s))return null;if(ne[n][s]){const e=ne[n][s].replace;e&&(s=e)}return ee(s,et[r].fontName,n)?et[r].variant:null};function Lt(e){if(!e)return!1;if("mi"===e.type&&1===e.children.length){const t=e.children[0];return t instanceof qt&&"."===t.text}if("mo"===e.type&&1===e.children.length&&"true"===e.getAttribute("separator")&&"0em"===e.getAttribute("lspace")&&"0em"===e.getAttribute("rspace")){const t=e.children[0];return t instanceof qt&&","===t.text}return!1}const Pt=function(e,t,r){if(1===e.length){const n=Vt(e[0],t);return r&&n instanceof Bt&&"mo"===n.type&&(n.setAttribute("lspace","0em"),n.setAttribute("rspace","0em")),[n]}const n=[];let o;for(let r=0;r=1&&("mn"===o.type||Lt(o))){const e=s.children[0];e instanceof Bt&&"mn"===e.type&&(e.children=[...o.children,...e.children],n.pop())}else if("mi"===o.type&&1===o.children.length){const e=o.children[0];if(e instanceof qt&&"\u0338"===e.text&&("mo"===s.type||"mi"===s.type||"mn"===s.type)){const e=s.children[0];e instanceof qt&&e.text.length>0&&(e.text=e.text.slice(0,1)+"\u0338"+e.text.slice(1),n.pop())}}}n.push(s),o=s}return n},Ft=function(e,t,r){return Nt(Pt(e,t,r))},Vt=function(e,t){if(!e)return new 
Bt("mrow");if(ht[e.type])return ht[e.type](e,t);throw new n("Got group of unknown type: '"+e.type+"'")};function Gt(e,t,r,n,o){const s=Pt(e,r);let i;i=1===s.length&&s[0]instanceof Bt&&Ht.has(s[0].type)?s[0]:new Bt("mrow",s);const l=new Bt("annotation",[new qt(t)]);l.setAttribute("encoding","application/x-tex");const a=new Bt("semantics",[i,l]),c=new Bt("math",[a]);c.setAttribute("xmlns","http://www.w3.org/1998/Math/MathML"),n&&c.setAttribute("display","block");return je([o?"katex":"katex-mathml"],[c])}const Ut=[[1,1,1],[2,1,1],[3,1,1],[4,2,1],[5,2,1],[6,3,1],[7,4,2],[8,6,3],[9,7,6],[10,8,7],[11,10,9]],Xt=[.5,.6,.7,.8,.9,1,1.2,1.44,1.728,2.074,2.488],Yt=function(e,t){return t.size<2?e:Ut[e-1][t.size-1]};class jt{constructor(e){this.style=void 0,this.color=void 0,this.size=void 0,this.textSize=void 0,this.phantom=void 0,this.font=void 0,this.fontFamily=void 0,this.fontWeight=void 0,this.fontShape=void 0,this.sizeMultiplier=void 0,this.maxSize=void 0,this.minRuleThickness=void 0,this._fontMetrics=void 0,this.style=e.style,this.color=e.color,this.size=e.size||jt.BASESIZE,this.textSize=e.textSize||this.size,this.phantom=!!e.phantom,this.font=e.font||"",this.fontFamily=e.fontFamily||"",this.fontWeight=e.fontWeight||"",this.fontShape=e.fontShape||"",this.sizeMultiplier=Xt[this.size-1],this.maxSize=e.maxSize,this.minRuleThickness=e.minRuleThickness,this._fontMetrics=void 0}extend(e){const t={style:this.style,size:this.size,textSize:this.textSize,color:this.color,phantom:this.phantom,font:this.font,fontFamily:this.fontFamily,fontWeight:this.fontWeight,fontShape:this.fontShape,maxSize:this.maxSize,minRuleThickness:this.minRuleThickness};return Object.assign(t,e),new jt(t)}havingStyle(e){return this.style===e?this:this.extend({style:e,size:Yt(this.textSize,e)})}havingCrampedStyle(){return this.havingStyle(this.style.cramp())}havingSize(e){return 
this.size===e&&this.textSize===e?this:this.extend({style:this.style.text(),size:e,textSize:e,sizeMultiplier:Xt[e-1]})}havingBaseStyle(e){e=e||this.style.text();const t=Yt(jt.BASESIZE,e);return this.size===t&&this.textSize===jt.BASESIZE&&this.style===e?this:this.extend({style:e,size:t})}havingBaseSizing(){let e;switch(this.style.id){case 4:case 5:e=3;break;case 6:case 7:e=1;break;default:e=6}return this.extend({style:this.style.text(),size:e})}withColor(e){return this.extend({color:e})}withPhantom(){return this.extend({phantom:!0})}withFont(e){return this.extend({font:e})}withTextFontFamily(e){return this.extend({fontFamily:e,font:""})}withTextFontWeight(e){return this.extend({fontWeight:e,font:""})}withTextFontShape(e){return this.extend({fontShape:e,font:""})}sizingClasses(e){return e.size!==this.size?["sizing","reset-size"+e.size,"size"+this.size]:[]}baseSizingClasses(){return this.size!==jt.BASESIZE?["sizing","reset-size"+this.size,"size"+jt.BASESIZE]:[]}fontMetrics(){return this._fontMetrics||(this._fontMetrics=function(e){let t;if(t=e>=5?0:e>=3?1:2,!te[t]){const e=te[t]={cssEmPerMu:J.quad[t]/18};for(const r in J)J.hasOwnProperty(r)&&(e[r]=J[r][t])}return te[t]}(this.size)),this._fontMetrics}getColor(){return this.phantom?"transparent":this.color}}jt.BASESIZE=6;var Wt=jt;const _t=function(e){return new Wt({style:e.displayMode?S.DISPLAY:S.TEXT,maxSize:e.maxSize,minRuleThickness:e.minRuleThickness})},$t=function(e,t){if(t.displayMode){const r=["katex-display"];t.leqno&&r.push("leqno"),t.fleqn&&r.push("fleqn"),e=je(r,[e])}return e},Zt=function(e,t,r){const n=_t(r);let o;if("mathml"===r.output)return Gt(e,t,n,r.displayMode,!0);if("html"===r.output){const t=Tt(e,n);o=je(["katex"],[t])}else{const s=Gt(e,t,n,r.displayMode,!1),i=Tt(e,n);o=je(["katex"],[s,i])}return $t(o,r)};const 
Kt={widehat:"^",widecheck:"\u02c7",widetilde:"~",utilde:"~",overleftarrow:"\u2190",underleftarrow:"\u2190",xleftarrow:"\u2190",overrightarrow:"\u2192",underrightarrow:"\u2192",xrightarrow:"\u2192",underbrace:"\u23df",overbrace:"\u23de",underbracket:"\u23b5",overbracket:"\u23b4",overgroup:"\u23e0",undergroup:"\u23e1",overleftrightarrow:"\u2194",underleftrightarrow:"\u2194",xleftrightarrow:"\u2194",Overrightarrow:"\u21d2",xRightarrow:"\u21d2",overleftharpoon:"\u21bc",xleftharpoonup:"\u21bc",overrightharpoon:"\u21c0",xrightharpoonup:"\u21c0",xLeftarrow:"\u21d0",xLeftrightarrow:"\u21d4",xhookleftarrow:"\u21a9",xhookrightarrow:"\u21aa",xmapsto:"\u21a6",xrightharpoondown:"\u21c1",xleftharpoondown:"\u21bd",xrightleftharpoons:"\u21cc",xleftrightharpoons:"\u21cb",xtwoheadleftarrow:"\u219e",xtwoheadrightarrow:"\u21a0",xlongequal:"=",xtofrom:"\u21c4",xrightleftarrows:"\u21c4",xrightequilibrium:"\u21cc",xleftequilibrium:"\u21cb","\\cdrightarrow":"\u2192","\\cdleftarrow":"\u2190","\\cdlongequal":"="},Jt=function(e){const t=new Bt("mo",[new qt(Kt[e.replace(/^\\/,"")])]);return 
t.setAttribute("stretchy","true"),t},Qt={overrightarrow:[["rightarrow"],.888,522,"xMaxYMin"],overleftarrow:[["leftarrow"],.888,522,"xMinYMin"],underrightarrow:[["rightarrow"],.888,522,"xMaxYMin"],underleftarrow:[["leftarrow"],.888,522,"xMinYMin"],xrightarrow:[["rightarrow"],1.469,522,"xMaxYMin"],"\\cdrightarrow":[["rightarrow"],3,522,"xMaxYMin"],xleftarrow:[["leftarrow"],1.469,522,"xMinYMin"],"\\cdleftarrow":[["leftarrow"],3,522,"xMinYMin"],Overrightarrow:[["doublerightarrow"],.888,560,"xMaxYMin"],xRightarrow:[["doublerightarrow"],1.526,560,"xMaxYMin"],xLeftarrow:[["doubleleftarrow"],1.526,560,"xMinYMin"],overleftharpoon:[["leftharpoon"],.888,522,"xMinYMin"],xleftharpoonup:[["leftharpoon"],.888,522,"xMinYMin"],xleftharpoondown:[["leftharpoondown"],.888,522,"xMinYMin"],overrightharpoon:[["rightharpoon"],.888,522,"xMaxYMin"],xrightharpoonup:[["rightharpoon"],.888,522,"xMaxYMin"],xrightharpoondown:[["rightharpoondown"],.888,522,"xMaxYMin"],xlongequal:[["longequal"],.888,334,"xMinYMin"],"\\cdlongequal":[["longequal"],3,334,"xMinYMin"],xtwoheadleftarrow:[["twoheadleftarrow"],.888,334,"xMinYMin"],xtwoheadrightarrow:[["twoheadrightarrow"],.888,334,"xMaxYMin"],overleftrightarrow:[["leftarrow","rightarrow"],.888,522],overbrace:[["leftbrace","midbrace","rightbrace"],1.6,548],underbrace:[["leftbraceunder","midbraceunder","rightbraceunder"],1.6,548],underleftrightarrow:[["leftarrow","rightarrow"],.888,522],xleftrightarrow:[["leftarrow","rightarrow"],1.75,522],xLeftrightarrow:[["doubleleftarrow","doublerightarrow"],1.75,560],xrightleftharpoons:[["leftharpoondownplus","rightharpoonplus"],1.75,716],xleftrightharpoons:[["leftharpoonplus","rightharpoondownplus"],1.75,716],xhookleftarrow:[["leftarrow","righthook"],1.08,522],xhookrightarrow:[["lefthook","rightarrow"],1.08,522],overlinesegment:[["leftlinesegment","rightlinesegment"],.888,522],underlinesegment:[["leftlinesegment","rightlinesegment"],.888,522],overbracket:[["leftbracketover","rightbracketover"],1.6,440],underbracket:[["l
eftbracketunder","rightbracketunder"],1.6,410],overgroup:[["leftgroup","rightgroup"],.888,342],undergroup:[["leftgroupunder","rightgroupunder"],.888,342],xmapsto:[["leftmapsto","rightarrow"],1.5,522],xtofrom:[["leftToFrom","rightToFrom"],1.75,528],xrightleftarrows:[["baraboveleftarrow","rightarrowabovebar"],1.75,901],xrightequilibrium:[["baraboveshortleftharpoon","rightharpoonaboveshortbar"],1.75,716],xleftequilibrium:[["shortbaraboveleftharpoon","shortrightharpoonabovebar"],1.75,716]},er=new Set(["widehat","widecheck","widetilde","utilde"]),tr=function(e,t){const{span:r,minWidth:n,height:o}=function(){let r=4e5;const n=e.label.slice(1);if(er.has(n)&&"base"in e){const o="ordgroup"===e.base.type?e.base.body.length:1;let s,i,l;if(o>5)"widehat"===n||"widecheck"===n?(s=420,r=2364,l=.42,i=n+"4"):(s=312,r=2340,l=.34,i="tilde4");else{const e=[1,1,2,2,3,3][o];"widehat"===n||"widecheck"===n?(r=[0,1062,2364,2364,2364][e],s=[0,239,300,360,420][e],l=[0,.24,.3,.3,.36,.42][e],i=n+e):(r=[0,600,1033,2339,2340][e],s=[0,260,286,306,312][e],l=[0,.26,.286,.3,.306,.34][e],i="tilde"+e)}const a=new $(i),c=new _([a],{width:"100%",height:O(l),viewBox:"0 0 "+r+" "+s,preserveAspectRatio:"none"});return{span:We([],[c],t),minWidth:0,height:l}}{const e=[],o=Qt[n];if(!o)throw new Error('No SVG data for "'+n+'".');const[s,i,l]=o,a=l/1e3,c=s.length;let h,m;if(1===c){if(4!==o.length)throw new Error('Expected 4-tuple for single-path SVG data "'+n+'".');h=["hide-tail"],m=[o[3]]}else if(2===c)h=["halfarrow-left","halfarrow-right"],m=["xMinYMin","xMaxYMin"];else{if(3!==c)throw new Error("Correct katexImagesData or update code here to support\n "+c+" children.");h=["brace-left","brace-center","brace-right"],m=["xMinYMin","xMidYMin","xMaxYMin"]}for(let n=0;n0&&(r.style.minWidth=O(n)),r},rr={bin:1,close:1,inner:1,open:1,punct:1,rel:1},nr={"accent-token":1,mathord:1,"op-token":1,spacing:1,textord:1};function or(e,t){if(!e||e.type!==t)throw new Error("Expected node of type "+t+", but got "+(e?"node of type 
"+e.type:String(e)));return e}function sr(e){const t=ir(e);if(!t)throw new Error("Expected node of symbol group type, but got "+(e?"node of type "+e.type:String(e)));return t}function ir(e){return e&&("atom"===e.type||nr.hasOwnProperty(e.type))?e:null}const lr=e=>{return e instanceof W?e:((t=e)instanceof U||t instanceof X||t instanceof I)&&1===e.children.length?lr(e.children[0]):void 0;var t},ar=(e,t)=>{let r,n,o;e&&"supsub"===e.type?(n=or(e.base,"accent"),r=n.base,e.base=r,o=function(e){if(e instanceof U)return e;throw new Error("Expected span but got "+String(e)+".")}(Mt(e,t)),e.base=n):(n=or(e,"accent"),r=n.base);const s=Mt(r,t.havingCrampedStyle());let i=0;var l,a;n.isShifty&&m(r)&&(i=null!=(l=null==(a=lr(s))?void 0:a.skew)?l:0);const c="\\c"===n.label;let h,u=c?s.height+s.depth:Math.min(s.height,t.fontMetrics().xHeight);if(n.isStretchy)h=tr(n,t),h=Ke({positionType:"firstBaseline",children:[{type:"elem",elem:s},{type:"elem",elem:h,wrapperClasses:["svg-align"],wrapperStyle:i>0?{width:"calc(100% - "+O(2*i)+")",marginLeft:O(2*i)}:void 0}]});else{let e,r;"\\vec"===n.label?(e=rt("vec",t),r=tt.vec[1]):(e=Ge({type:"textord",mode:n.mode,text:n.label},t),e=function(e){if(e instanceof W)return e;throw new Error("Expected symbolNode but got "+String(e)+".")}(e),e.italic=0,r=e.width,c&&(u+=e.depth)),h=je(["accent-body"],[e]);const o="\\textcircled"===n.label;o&&(h.classes.push("accent-full"),u=s.height);let l=i;o||(l-=r/2),h.style.left=O(l),"\\textcircled"===n.label&&(h.style.top=".2em"),h=Ke({positionType:"firstBaseline",children:[{type:"elem",elem:s},{type:"kern",size:-u},{type:"elem",elem:h}]})}const p=je(["mord","accent"],[h],t);return o?(o.children[0]=p,o.height=Math.max(p.height,o.height),o.classes[0]="mord",o):p},cr=new 
RegExp(["\\acute","\\grave","\\ddot","\\tilde","\\bar","\\breve","\\check","\\hat","\\vec","\\dot","\\mathring"].map(e=>"\\"+e).join("|"));mt({type:"accent",names:["\\acute","\\grave","\\ddot","\\tilde","\\bar","\\breve","\\check","\\hat","\\vec","\\dot","\\mathring","\\widecheck","\\widehat","\\widetilde","\\overrightarrow","\\overleftarrow","\\Overrightarrow","\\overleftrightarrow","\\overgroup","\\overlinesegment","\\overleftharpoon","\\overrightharpoon"],numArgs:1,handler:(e,t)=>{const r=pt(t[0]),n=!cr.test(e.funcName),o=!n||"\\widehat"===e.funcName||"\\widetilde"===e.funcName||"\\widecheck"===e.funcName;return{type:"accent",mode:e.parser.mode,label:e.funcName,isStretchy:n,isShifty:o,base:r}},htmlBuilder:ar,mathmlBuilder:(e,t)=>{const r=e.isStretchy?Jt(e.label):new Bt("mo",[Et(e.label,e.mode)]),n=new Bt("mover",[Vt(e.base,t),r]);return n.setAttribute("accent","true"),n}}),mt({type:"accent",names:["\\'","\\`","\\^","\\~","\\=","\\u","\\.",'\\"',"\\c","\\r","\\H","\\v","\\textcircled"],numArgs:1,allowedInText:!0,allowedInMath:!0,argTypes:["primitive"],handler:(e,t)=>{const r=t[0];let n=e.parser.mode;return"math"===n&&(e.parser.settings.reportNonstrict("mathVsTextAccents","LaTeX's accent "+e.funcName+" works only in text mode"),n="text"),{type:"accent",mode:n,label:e.funcName,isStretchy:!1,isShifty:!0,base:r}}}),mt({type:"accentUnder",names:["\\underleftarrow","\\underrightarrow","\\underleftrightarrow","\\undergroup","\\underlinesegment","\\utilde"],numArgs:1,handler:(e,t)=>{let{parser:r,funcName:n}=e;const o=t[0];return{type:"accentUnder",mode:r.mode,label:n,base:o}},htmlBuilder:(e,t)=>{const r=Mt(e.base,t),n=tr(e,t),o="\\utilde"===e.label?.12:0,s=Ke({positionType:"top",positionData:r.height,children:[{type:"elem",elem:n,wrapperClasses:["svg-align"]},{type:"kern",size:o},{type:"elem",elem:r}]});return je(["mord","accentunder"],[s],t)},mathmlBuilder:(e,t)=>{const r=Jt(e.label),n=new Bt("munder",[Vt(e.base,t),r]);return 
n.setAttribute("accentunder","true"),n}});const hr=e=>{const t=new Bt("mpadded",e?[e]:[]);return t.setAttribute("width","+0.6em"),t.setAttribute("lspace","0.3em"),t};mt({type:"xArrow",names:["\\xleftarrow","\\xrightarrow","\\xLeftarrow","\\xRightarrow","\\xleftrightarrow","\\xLeftrightarrow","\\xhookleftarrow","\\xhookrightarrow","\\xmapsto","\\xrightharpoondown","\\xrightharpoonup","\\xleftharpoondown","\\xleftharpoonup","\\xrightleftharpoons","\\xleftrightharpoons","\\xlongequal","\\xtwoheadrightarrow","\\xtwoheadleftarrow","\\xtofrom","\\xrightleftarrows","\\xrightequilibrium","\\xleftequilibrium","\\\\cdrightarrow","\\\\cdleftarrow","\\\\cdlongequal"],numArgs:1,numOptionalArgs:1,handler(e,t,r){let{parser:n,funcName:o}=e;return{type:"xArrow",mode:n.mode,label:o,body:t[0],below:r[0]}},htmlBuilder(e,t){const r=t.style;let n=t.havingStyle(r.sup());const o=Ze(Mt(e.body,n,t),t),s="\\x"===e.label.slice(0,2)?"x":"cd";let i;o.classes.push(s+"-arrow-pad"),e.below&&(n=t.havingStyle(r.sub()),i=Ze(Mt(e.below,n,t),t),i.classes.push(s+"-arrow-pad"));const l=tr(e,t),a=-t.fontMetrics().axisHeight+.5*l.height;let c,h=-t.fontMetrics().axisHeight-.5*l.height-.111;if((o.depth>.25||"\\xleftequilibrium"===e.label)&&(h-=o.depth),i){const e=-t.fontMetrics().axisHeight+i.height+.5*l.height+.111;c=Ke({positionType:"individualShift",children:[{type:"elem",elem:o,shift:h},{type:"elem",elem:l,shift:a,wrapperClasses:["svg-align"]},{type:"elem",elem:i,shift:e}]})}else c=Ke({positionType:"individualShift",children:[{type:"elem",elem:o,shift:h},{type:"elem",elem:l,shift:a,wrapperClasses:["svg-align"]}]});return je(["mrel","x-arrow"],[c],t)},mathmlBuilder(e,t){const r=Jt(e.label);let n;if(r.setAttribute("minsize","x"===e.label.charAt(0)?"1.75em":"3.0em"),e.body){const o=hr(Vt(e.body,t));if(e.below){const s=hr(Vt(e.below,t));n=new Bt("munderover",[r,s,o])}else n=new Bt("mover",[r,o])}else if(e.below){const o=hr(Vt(e.below,t));n=new Bt("munder",[r,o])}else n=hr(),n=new Bt("mover",[r,n]);return 
n}}),mt({type:"mclass",names:["\\mathord","\\mathbin","\\mathrel","\\mathopen","\\mathclose","\\mathpunct","\\mathinner"],numArgs:1,primitive:!0,handler(e,t){let{parser:r,funcName:n}=e;const o=t[0];return{type:"mclass",mode:r.mode,mclass:"m"+n.slice(5),body:dt(o),isCharacterBox:m(o)}},htmlBuilder:function(e,t){const r=xt(e.body,t,!0);return je([e.mclass],r,t)},mathmlBuilder:function(e,t){let r;const n=Pt(e.body,t);return"minner"===e.mclass?r=new Bt("mpadded",n):"mord"===e.mclass?e.isCharacterBox?(r=n[0],r.type="mi"):r=new Bt("mi",n):(e.isCharacterBox?(r=n[0],r.type="mo"):r=new Bt("mo",n),"mbin"===e.mclass?(r.attributes.lspace="0.22em",r.attributes.rspace="0.22em"):"mpunct"===e.mclass?(r.attributes.lspace="0em",r.attributes.rspace="0.17em"):"mopen"!==e.mclass&&"mclose"!==e.mclass||(r.attributes.lspace="0em",r.attributes.rspace="0em")),r}});const mr=e=>{const t="ordgroup"===e.type&&e.body.length?e.body[0]:e;return"atom"!==t.type||"bin"!==t.family&&"rel"!==t.family?"mord":"m"+t.family};mt({type:"mclass",names:["\\@binrel"],numArgs:2,handler(e,t){let{parser:r}=e;return{type:"mclass",mode:r.mode,mclass:mr(t[0]),body:dt(t[1]),isCharacterBox:m(t[1])}}}),mt({type:"mclass",names:["\\stackrel","\\overset","\\underset"],numArgs:2,handler(e,t){let{parser:r,funcName:n}=e;const o=t[1],s=t[0];let i;i="\\stackrel"!==n?mr(o):"mrel";const l={type:"op",mode:o.mode,limits:!0,alwaysHandleSupSub:!0,parentIsSupSub:!1,symbol:!1,suppressBaseShift:"\\stackrel"!==n,body:dt(o)},a="\\underset"===n?{type:"supsub",mode:s.mode,base:l,sub:s}:{type:"supsub",mode:s.mode,base:l,sup:s};return{type:"mclass",mode:r.mode,mclass:i,body:[a],isCharacterBox:m(a)}}}),mt({type:"pmb",names:["\\pmb"],numArgs:1,allowedInText:!0,handler(e,t){let{parser:r}=e;return{type:"pmb",mode:r.mode,mclass:mr(t[0]),body:dt(t[0])}},htmlBuilder(e,t){const r=xt(e.body,t,!0),n=je([e.mclass],r,t);return n.style.textShadow="0.02em 0.01em 0.04px",n},mathmlBuilder(e,t){const r=Pt(e.body,t),n=new Bt("mstyle",r);return 
n.setAttribute("style","text-shadow: 0.02em 0.01em 0.04px"),n}});const ur={">":"\\\\cdrightarrow","<":"\\\\cdleftarrow","=":"\\\\cdlongequal",A:"\\uparrow",V:"\\downarrow","|":"\\Vert",".":"no arrow"},pr=()=>({type:"styling",body:[],mode:"math",style:"display",resetFont:!0}),dr=e=>"textord"===e.type&&"@"===e.text,gr=(e,t)=>("mathord"===e.type||"atom"===e.type)&&e.text===t;function fr(e,t,r){const n=ur[e];switch(n){case"\\\\cdrightarrow":case"\\\\cdleftarrow":return r.callFunction(n,[t[0]],[t[1]]);case"\\uparrow":case"\\downarrow":{const e={type:"atom",text:n,mode:"math",family:"rel"},o={type:"ordgroup",mode:"math",body:[r.callFunction("\\\\cdleft",[t[0]],[]),r.callFunction("\\Big",[e],[]),r.callFunction("\\\\cdright",[t[1]],[])]};return r.callFunction("\\\\cdparent",[o],[])}case"\\\\cdlongequal":return r.callFunction("\\\\cdlongequal",[],[]);case"\\Vert":{const e={type:"textord",text:"\\Vert",mode:"math"};return r.callFunction("\\Big",[e],[])}default:return{type:"textord",text:" ",mode:"math"}}}mt({type:"cdlabel",names:["\\\\cdleft","\\\\cdright"],numArgs:1,handler(e,t){let{parser:r,funcName:n}=e;return{type:"cdlabel",mode:r.mode,side:n.slice(4),label:t[0]}},htmlBuilder(e,t){const r=t.havingStyle(t.style.sup()),n=Ze(Mt(e.label,r,t),t);return n.classes.push("cd-label-"+e.side),n.style.bottom=O(.8-n.depth),n.height=0,n.depth=0,n},mathmlBuilder(e,t){let r=new Bt("mrow",[Vt(e.label,t)]);return r=new Bt("mpadded",[r]),r.setAttribute("width","0"),"left"===e.side&&r.setAttribute("lspace","-1width"),r.setAttribute("voffset","0.7em"),r=new Bt("mstyle",[r]),r.setAttribute("displaystyle","false"),r.setAttribute("scriptlevel","1"),r}}),mt({type:"cdlabelparent",names:["\\\\cdparent"],numArgs:1,handler(e,t){let{parser:r}=e;return{type:"cdlabelparent",mode:r.mode,fragment:t[0]}},htmlBuilder(e,t){const r=Ze(Mt(e.fragment,t),t);return r.classes.push("cd-vert-arrow"),r},mathmlBuilder(e,t){return new 
Bt("mrow",[Vt(e.fragment,t)])}}),mt({type:"textord",names:["\\@char"],numArgs:1,allowedInText:!0,handler(e,t){let{parser:r}=e;const o=or(t[0],"ordgroup").body;let s="";for(let e=0;e=1114111)throw new n("\\@char with invalid code point "+s);return l<=65535?i=String.fromCharCode(l):(l-=65536,i=String.fromCharCode(55296+(l>>10),56320+(1023&l))),{type:"textord",mode:r.mode,text:i}}});mt({type:"color",names:["\\textcolor"],numArgs:2,allowedInText:!0,argTypes:["color","original"],handler(e,t){let{parser:r}=e;const n=or(t[0],"color-token").color,o=t[1];return{type:"color",mode:r.mode,color:n,body:dt(o)}},htmlBuilder:(e,t)=>{const r=xt(e.body,t.withColor(e.color),!1);return $e(r)},mathmlBuilder:(e,t)=>{const r=Pt(e.body,t.withColor(e.color)),n=new Bt("mstyle",r);return n.setAttribute("mathcolor",e.color),n}}),mt({type:"color",names:["\\color"],numArgs:1,allowedInText:!0,argTypes:["color"],handler(e,t){let{parser:r,breakOnTokenText:n}=e;const o=or(t[0],"color-token").color;r.gullet.macros.set("\\current@color",o);const s=r.parseExpression(!0,n);return{type:"color",mode:r.mode,color:o,body:s}}}),mt({type:"cr",names:["\\\\"],numArgs:0,numOptionalArgs:0,allowedInText:!0,handler(e,t,r){let{parser:n}=e;const o="["===n.gullet.future().text?n.parseSizeGroup(!0):null,s=!n.settings.displayMode||!n.settings.useStrictBehavior("newLineInDisplayMode","In LaTeX, \\\\ or \\newline does nothing in display mode");return{type:"cr",mode:n.mode,newLine:s,size:o&&or(o,"size").value}},htmlBuilder(e,t){const r=je(["mspace"],[],t);return e.newLine&&(r.classes.push("newline"),e.size&&(r.style.marginTop=O(N(e.size,t)))),r},mathmlBuilder(e,t){const r=new Bt("mspace");return e.newLine&&(r.setAttribute("linebreak","newline"),e.size&&r.setAttribute("height",O(N(e.size,t)))),r}});const 
br={"\\global":"\\global","\\long":"\\\\globallong","\\\\globallong":"\\\\globallong","\\def":"\\gdef","\\gdef":"\\gdef","\\edef":"\\xdef","\\xdef":"\\xdef","\\let":"\\\\globallet","\\futurelet":"\\\\globalfuture"},yr=e=>{const t=e.text;if(/^(?:[\\{}$&#^_]|EOF)$/.test(t))throw new n("Expected a control sequence",e);return t},xr=(e,t,r,n)=>{let o=e.gullet.macros.get(r.text);null==o&&(r.noexpand=!0,o={tokens:[r],numArgs:0,unexpandable:!e.gullet.isExpandable(r.text)}),e.gullet.macros.set(t,o,n)};mt({type:"internal",names:["\\global","\\long","\\\\globallong"],numArgs:0,allowedInText:!0,handler(e){let{parser:t,funcName:r}=e;t.consumeSpaces();const o=t.fetch();if(br[o.text])return"\\global"!==r&&"\\\\globallong"!==r||(o.text=br[o.text]),or(t.parseFunction(),"internal");throw new n("Invalid token after macro prefix",o)}}),mt({type:"internal",names:["\\def","\\gdef","\\edef","\\xdef"],numArgs:0,allowedInText:!0,primitive:!0,handler(e){let{parser:t,funcName:r}=e,o=t.gullet.popToken();const s=o.text;if(/^(?:[\\{}$&#^_]|EOF)$/.test(s))throw new n("Expected a control sequence",o);let i,l=0;const a=[[]];for(;"{"!==t.gullet.future().text;)if(o=t.gullet.popToken(),"#"===o.text){if("{"===t.gullet.future().text){i=t.gullet.future(),a[l].push("{");break}if(o=t.gullet.popToken(),!/^[1-9]$/.test(o.text))throw new n('Invalid argument number "'+o.text+'"');if(parseInt(o.text)!==l+1)throw new n('Argument number "'+o.text+'" out of order');l++,a.push([])}else{if("EOF"===o.text)throw new n("Expected a macro definition");a[l].push(o.text)}let{tokens:c}=t.gullet.consumeArg();return i&&c.unshift(i),"\\edef"!==r&&"\\xdef"!==r||(c=t.gullet.expandTokens(c),c.reverse()),t.gullet.macros.set(s,{tokens:c,numArgs:l,delimiters:a},r===br[r]),{type:"internal",mode:t.mode}}}),mt({type:"internal",names:["\\let","\\\\globallet"],numArgs:0,allowedInText:!0,primitive:!0,handler(e){let{parser:t,funcName:r}=e;const n=yr(t.gullet.popToken());t.gullet.consumeSpaces();const o=(e=>{let 
t=e.gullet.popToken();return"="===t.text&&(t=e.gullet.popToken()," "===t.text&&(t=e.gullet.popToken())),t})(t);return xr(t,n,o,"\\\\globallet"===r),{type:"internal",mode:t.mode}}}),mt({type:"internal",names:["\\futurelet","\\\\globalfuture"],numArgs:0,allowedInText:!0,primitive:!0,handler(e){let{parser:t,funcName:r}=e;const n=yr(t.gullet.popToken()),o=t.gullet.popToken(),s=t.gullet.popToken();return xr(t,n,s,"\\\\globalfuture"===r),t.gullet.pushToken(s),t.gullet.pushToken(o),{type:"internal",mode:t.mode}}});const wr=function(e,t,r){const n=ee(ne.math[e]&&ne.math[e].replace||e,t,r);if(!n)throw new Error("Unsupported symbol "+e+" and font size "+t+".");return n},vr=function(e,t,r,n){const o=r.havingBaseStyle(t),s=je(n.concat(o.sizingClasses(r)),[e],r),i=o.sizeMultiplier/r.sizeMultiplier;return s.height*=i,s.depth*=i,s.maxFontSize=o.sizeMultiplier,s},kr=function(e,t,r){const n=t.havingBaseStyle(r),o=(1-t.sizeMultiplier/n.sizeMultiplier)*t.fontMetrics().axisHeight;e.classes.push("delimcenter"),e.style.top=O(o),e.height-=o,e.depth+=o},zr=function(e,t,r,n,o,s){const i=function(e,t,r,n){return Fe(e,"Size"+t+"-Regular",r,n)}(e,t,o,n),l=vr(je(["delimsizing","size"+t],[i],n),S.TEXT,n,s);return r&&kr(l,n,S.TEXT),l},Sr=function(e,t,r){let n;n="Size1-Regular"===t?"delim-size1":"delim-size4";return{type:"elem",elem:je(["delimsizinginner",n],[je([],[Fe(e,t,r)])])}},Mr=function(e,t,r){const n=K["Size4-Regular"][e.charCodeAt(0)]?K["Size4-Regular"][e.charCodeAt(0)][4]:K["Size1-Regular"][e.charCodeAt(0)][4],o=new $("inner",function(e,t){switch(e){case"\u239c":return C("M291 0 H417 V"+t+" H291z");case"\u2223":return C("M145 0 H188 V"+t+" H145z");case"\u2225":return C("M145 0 H188 V"+t+" H145z")+C("M367 0 H410 V"+t+" H367z");case"\u239f":return C("M457 0 H583 V"+t+" H457z");case"\u23a2":return C("M319 0 H403 V"+t+" H319z");case"\u23a5":return C("M263 0 H347 V"+t+" H263z");case"\u23aa":return C("M384 0 H504 V"+t+" H384z");case"\u23d0":return C("M312 0 H355 V"+t+" 
H312z");case"\u2016":return C("M257 0 H300 V"+t+" H257z")+C("M478 0 H521 V"+t+" H478z");default:return""}}(e,Math.round(1e3*t))),s=new _([o],{width:O(n),height:O(t),style:"width:"+O(n),viewBox:"0 0 "+1e3*n+" "+Math.round(1e3*t),preserveAspectRatio:"xMinYMin"}),i=We([],[s],r);return i.height=t,i.style.height=O(t),i.style.width=O(n),{type:"elem",elem:i}},Ar={type:"kern",size:-.008},Tr=new Set(["|","\\lvert","\\rvert","\\vert"]),Cr=new Set(["\\|","\\lVert","\\rVert","\\Vert"]),Br=function(e,t,r,n,o,s){let i,l,a,c,h="",m=0;i=a=c=e,l=null;let u="Size1-Regular";"\\uparrow"===e?a=c="\u23d0":"\\Uparrow"===e?a=c="\u2016":"\\downarrow"===e?i=a="\u23d0":"\\Downarrow"===e?i=a="\u2016":"\\updownarrow"===e?(i="\\uparrow",a="\u23d0",c="\\downarrow"):"\\Updownarrow"===e?(i="\\Uparrow",a="\u2016",c="\\Downarrow"):Tr.has(e)?(a="\u2223",h="vert",m=333):Cr.has(e)?(a="\u2225",h="doublevert",m=556):"["===e||"\\lbrack"===e?(i="\u23a1",a="\u23a2",c="\u23a3",u="Size4-Regular",h="lbrack",m=667):"]"===e||"\\rbrack"===e?(i="\u23a4",a="\u23a5",c="\u23a6",u="Size4-Regular",h="rbrack",m=667):"\\lfloor"===e||"\u230a"===e?(a=i="\u23a2",c="\u23a3",u="Size4-Regular",h="lfloor",m=667):"\\lceil"===e||"\u2308"===e?(i="\u23a1",a=c="\u23a2",u="Size4-Regular",h="lceil",m=667):"\\rfloor"===e||"\u230b"===e?(a=i="\u23a5",c="\u23a6",u="Size4-Regular",h="rfloor",m=667):"\\rceil"===e||"\u2309"===e?(i="\u23a4",a=c="\u23a5",u="Size4-Regular",h="rceil",m=667):"("===e||"\\lparen"===e?(i="\u239b",a="\u239c",c="\u239d",u="Size4-Regular",h="lparen",m=875):")"===e||"\\rparen"===e?(i="\u239e",a="\u239f",c="\u23a0",u="Size4-Regular",h="rparen",m=875):"\\{"===e||"\\lbrace"===e?(i="\u23a7",l="\u23a8",c="\u23a9",a="\u23aa",u="Size4-Regular"):"\\}"===e||"\\rbrace"===e?(i="\u23ab",l="\u23ac",c="\u23ad",a="\u23aa",u="Size4-Regular"):"\\lgroup"===e||"\u27ee"===e?(i="\u23a7",c="\u23a9",a="\u23aa",u="Size4-Regular"):"\\rgroup"===e||"\u27ef"===e?(i="\u23ab",c="\u23ad",a="\u23aa",u="Size4-Regular"):"\\lmoustache"===e||"\u23b0"===e?(
i="\u23a7",c="\u23ad",a="\u23aa",u="Size4-Regular"):"\\rmoustache"!==e&&"\u23b1"!==e||(i="\u23ab",c="\u23a9",a="\u23aa",u="Size4-Regular");const p=wr(i,u,o),d=p.height+p.depth,g=wr(a,u,o),f=g.height+g.depth,b=wr(c,u,o),y=b.height+b.depth;let x=0,w=1;if(null!==l){const e=wr(l,u,o);x=e.height+e.depth,w=2}const v=d+y+x,k=v+Math.max(0,Math.ceil((t-v)/(w*f)))*w*f;let z=n.fontMetrics().axisHeight;r&&(z*=n.sizeMultiplier);const M=k/2-z,A=[];if(h.length>0){const e=k-d-y,t=Math.round(1e3*k),r=function(e,t){switch(e){case"lbrack":return"M403 1759 V84 H666 V0 H319 V1759 v"+t+" v1759 v84 h347 v-84\nH403z M403 1759 V0 H319 V1759 v"+t+" v1759 v84 h84z";case"rbrack":return"M347 1759 V0 H0 V84 H263 V1759 v"+t+" v1759 H0 v84 H347z\nM347 1759 V0 H263 V1759 v"+t+" v1759 h84z";case"vert":return"M145 15 v585 v"+t+" v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v"+-t+" v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v"+t+" v585 h43z";case"doublevert":return"M145 15 v585 v"+t+" v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v"+-t+" v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M188 15 H145 v585 v"+t+" v585 h43z\nM367 15 v585 v"+t+" v585 c2.667,10,9.667,15,21,15\nc10,0,16.667,-5,20,-15 v-585 v"+-t+" v-585 c-2.667,-10,-9.667,-15,-21,-15\nc-10,0,-16.667,5,-20,15z M410 15 H367 v585 v"+t+" v585 h43z";case"lfloor":return"M319 602 V0 H403 V602 v"+t+" v1715 h263 v84 H319z\nMM319 602 V0 H403 V602 v"+t+" v1715 H319z";case"rfloor":return"M319 602 V0 H403 V602 v"+t+" v1799 H0 v-84 H319z\nMM319 602 V0 H403 V602 v"+t+" v1715 H319z";case"lceil":return"M403 1759 V84 H666 V0 H319 V1759 v"+t+" v602 h84z\nM403 1759 V0 H319 V1759 v"+t+" v602 h84z";case"rceil":return"M347 1759 V0 H0 V84 H263 V1759 v"+t+" v602 h84z\nM347 1759 V0 h-84 V1759 v"+t+" v602 h84z";case"lparen":return"M863,9c0,-2,-2,-5,-6,-9c0,0,-17,0,-17,0c-12.7,0,-19.3,0.3,-20,1\nc-5.3,5.3,-10.3,11,-15,17c-242.7,294.7,-395.3,682,-458,1162c-21.3,163.3,-33.3,349,\n-36,557 
l0,"+(t+84)+"c0.2,6,0,26,0,60c2,159.3,10,310.7,24,454c53.3,528,210,\n949.7,470,1265c4.7,6,9.7,11.7,15,17c0.7,0.7,7,1,19,1c0,0,18,0,18,0c4,-4,6,-7,6,-9\nc0,-2.7,-3.3,-8.7,-10,-18c-135.3,-192.7,-235.5,-414.3,-300.5,-665c-65,-250.7,-102.5,\n-544.7,-112.5,-882c-2,-104,-3,-167,-3,-189\nl0,-"+(t+92)+"c0,-162.7,5.7,-314,17,-454c20.7,-272,63.7,-513,129,-723c65.3,\n-210,155.3,-396.3,270,-559c6.7,-9.3,10,-15.3,10,-18z";case"rparen":return"M76,0c-16.7,0,-25,3,-25,9c0,2,2,6.3,6,13c21.3,28.7,42.3,60.3,\n63,95c96.7,156.7,172.8,332.5,228.5,527.5c55.7,195,92.8,416.5,111.5,664.5\nc11.3,139.3,17,290.7,17,454c0,28,1.7,43,3.3,45l0,"+(t+9)+"\nc-3,4,-3.3,16.7,-3.3,38c0,162,-5.7,313.7,-17,455c-18.7,248,-55.8,469.3,-111.5,664\nc-55.7,194.7,-131.8,370.3,-228.5,527c-20.7,34.7,-41.7,66.3,-63,95c-2,3.3,-4,7,-6,11\nc0,7.3,5.7,11,17,11c0,0,11,0,11,0c9.3,0,14.3,-0.3,15,-1c5.3,-5.3,10.3,-11,15,-17\nc242.7,-294.7,395.3,-681.7,458,-1161c21.3,-164.7,33.3,-350.7,36,-558\nl0,-"+(t+144)+"c-2,-159.3,-10,-310.7,-24,-454c-53.3,-528,-210,-949.7,\n-470,-1265c-4.7,-6,-9.7,-11.7,-15,-17c-0.7,-0.7,-6.7,-1,-18,-1z";default:throw new Error("Unknown stretchy delimiter.")}}(h,Math.round(1e3*e)),o=new $(h,r),s=O(m/1e3),i=O(t/1e3),l=new _([o],{width:s,height:i,viewBox:"0 0 "+m+" "+t}),a=We([],[l],n);a.height=t/1e3,a.style.width=s,a.style.height=i,A.push({type:"elem",elem:a})}else{if(A.push(Sr(c,u,o)),A.push(Ar),null===l){const e=k-d-y+.016;A.push(Mr(a,e,n))}else{const e=(k-d-y-x)/2+.016;A.push(Mr(a,e,n)),A.push(Ar),A.push(Sr(l,u,o)),A.push(Ar),A.push(Mr(a,e,n))}A.push(Ar),A.push(Sr(i,u,o))}const T=n.havingBaseStyle(S.TEXT),C=Ke({positionType:"bottom",positionData:M,children:A});return vr(je(["delimsizing","mult"],[C],T),S.TEXT,n,s)},qr=.08,Ir=function(e,t,r,n,o){const s=function(e,t,r){t*=1e3;let 
n="";switch(e){case"sqrtMain":n=function(e,t){return"M95,"+(622+e+t)+"\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl"+e/2.075+" -"+e+"\nc5.3,-9.3,12,-14,20,-14\nH400000v"+(40+e)+"H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM"+(834+e)+" "+t+"h400000v"+(40+e)+"h-400000z"}(t,B);break;case"sqrtSize1":n=function(e,t){return"M263,"+(601+e+t)+"c0.7,0,18,39.7,52,119\nc34,79.3,68.167,158.7,102.5,238c34.3,79.3,51.8,119.3,52.5,120\nc340,-704.7,510.7,-1060.3,512,-1067\nl"+e/2.084+" -"+e+"\nc4.7,-7.3,11,-11,19,-11\nH40000v"+(40+e)+"H1012.3\ns-271.3,567,-271.3,567c-38.7,80.7,-84,175,-136,283c-52,108,-89.167,185.3,-111.5,232\nc-22.3,46.7,-33.8,70.3,-34.5,71c-4.7,4.7,-12.3,7,-23,7s-12,-1,-12,-1\ns-109,-253,-109,-253c-72.7,-168,-109.3,-252,-110,-252c-10.7,8,-22,16.7,-34,26\nc-22,17.3,-33.3,26,-34,26s-26,-26,-26,-26s76,-59,76,-59s76,-60,76,-60z\nM"+(1001+e)+" "+t+"h400000v"+(40+e)+"h-400000z"}(t,B);break;case"sqrtSize2":n=function(e,t){return"M983 "+(10+e+t)+"\nl"+e/3.13+" -"+e+"\nc4,-6.7,10,-10,18,-10 H400000v"+(40+e)+"\nH1013.1s-83.4,268,-264.1,840c-180.7,572,-277,876.3,-289,913c-4.7,4.7,-12.7,7,-24,7\ns-12,0,-12,0c-1.3,-3.3,-3.7,-11.7,-7,-25c-35.3,-125.3,-106.7,-373.3,-214,-744\nc-10,12,-21,25,-33,39s-32,39,-32,39c-6,-5.3,-15,-14,-27,-26s25,-30,25,-30\nc26.7,-32.7,52,-63,76,-91s52,-60,52,-60s208,722,208,722\nc56,-175.3,126.3,-397.3,211,-666c84.7,-268.7,153.8,-488.2,207.5,-658.5\nc53.7,-170.3,84.5,-266.8,92.5,-289.5z\nM"+(1001+e)+" 
"+t+"h400000v"+(40+e)+"h-400000z"}(t,B);break;case"sqrtSize3":n=function(e,t){return"M424,"+(2398+e+t)+"\nc-1.3,-0.7,-38.5,-172,-111.5,-514c-73,-342,-109.8,-513.3,-110.5,-514\nc0,-2,-10.7,14.3,-32,49c-4.7,7.3,-9.8,15.7,-15.5,25c-5.7,9.3,-9.8,16,-12.5,20\ns-5,7,-5,7c-4,-3.3,-8.3,-7.7,-13,-13s-13,-13,-13,-13s76,-122,76,-122s77,-121,77,-121\ns209,968,209,968c0,-2,84.7,-361.7,254,-1079c169.3,-717.3,254.7,-1077.7,256,-1081\nl"+e/4.223+" -"+e+"c4,-6.7,10,-10,18,-10 H400000\nv"+(40+e)+"H1014.6\ns-87.3,378.7,-272.6,1166c-185.3,787.3,-279.3,1182.3,-282,1185\nc-2,6,-10,9,-24,9\nc-8,0,-12,-0.7,-12,-2z M"+(1001+e)+" "+t+"\nh400000v"+(40+e)+"h-400000z"}(t,B);break;case"sqrtSize4":n=function(e,t){return"M473,"+(2713+e+t)+"\nc339.3,-1799.3,509.3,-2700,510,-2702 l"+e/5.298+" -"+e+"\nc3.3,-7.3,9.3,-11,18,-11 H400000v"+(40+e)+"H1017.7\ns-90.5,478,-276.2,1466c-185.7,988,-279.5,1483,-281.5,1485c-2,6,-10,9,-24,9\nc-8,0,-12,-0.7,-12,-2c0,-1.3,-5.3,-32,-16,-92c-50.7,-293.3,-119.7,-693.3,-207,-1200\nc0,-1.3,-5.3,8.7,-16,30c-10.7,21.3,-21.3,42.7,-32,64s-16,33,-16,33s-26,-26,-26,-26\ns76,-153,76,-153s77,-151,77,-151c0.7,0.7,35.7,202,105,604c67.3,400.7,102,602.7,104,\n606zM"+(1001+e)+" "+t+"h400000v"+(40+e)+"H1017.7z"}(t,B);break;case"sqrtTall":n=function(e,t,r){return"M702 "+(e+t)+"H400000"+(40+e)+"\nH742v"+(r-54-t-e)+"l-4 4-4 4c-.667.7 -2 1.5-4 2.5s-4.167 1.833-6.5 2.5-5.5 1-9.5 1\nh-12l-28-84c-16.667-52-96.667 -294.333-240-727l-212 -643 -85 170\nc-4-3.333-8.333-7.667-13 -13l-13-13l77-155 77-156c66 199.333 139 419.667\n219 661 l218 661zM702 "+t+"H400000v"+(40+e)+"H742z"}(t,B,r)}return n}(e,n,r),i=new $(e,s),l=new _([i],{width:"400em",height:O(t),viewBox:"0 0 400000 "+r,preserveAspectRatio:"xMinYMin slice"});return We(["hide-tail"],[l],o)},Rr=new Set(["(","\\lparen",")","\\rparen","[","\\lbrack","]","\\rbrack","\\{","\\lbrace","\\}","\\rbrace","\\lfloor","\\rfloor","\u230a","\u230b","\\lceil","\\rceil","\u2308","\u2309","\\surd"]),Hr=new 
Set(["\\uparrow","\\downarrow","\\updownarrow","\\Uparrow","\\Downarrow","\\Updownarrow","|","\\|","\\vert","\\Vert","\\lvert","\\rvert","\\lVert","\\rVert","\\lgroup","\\rgroup","\u27ee","\u27ef","\\lmoustache","\\rmoustache","\u23b0","\u23b1"]),Er=new Set(["<",">","\\langle","\\rangle","/","\\backslash","\\lt","\\gt"]),Nr=[0,1.2,1.8,2.4,3],Or=function(e,t,r,o,s){if("<"===e||"\\lt"===e||"\u27e8"===e?e="\\langle":">"!==e&&"\\gt"!==e&&"\u27e9"!==e||(e="\\rangle"),Rr.has(e)||Er.has(e))return zr(e,t,!1,r,o,s);if(Hr.has(e))return Br(e,Nr[t],!1,r,o,s);throw new n("Illegal delimiter: '"+e+"'")},Dr=[{type:"small",style:S.SCRIPTSCRIPT},{type:"small",style:S.SCRIPT},{type:"small",style:S.TEXT},{type:"large",size:1},{type:"large",size:2},{type:"large",size:3},{type:"large",size:4}],Lr=[{type:"small",style:S.SCRIPTSCRIPT},{type:"small",style:S.SCRIPT},{type:"small",style:S.TEXT},{type:"stack"}],Pr=[{type:"small",style:S.SCRIPTSCRIPT},{type:"small",style:S.SCRIPT},{type:"small",style:S.TEXT},{type:"large",size:1},{type:"large",size:2},{type:"large",size:3},{type:"large",size:4},{type:"stack"}],Fr=function(e){if("small"===e.type)return"Main-Regular";if("large"===e.type)return"Size"+e.size+"-Regular";if("stack"===e.type)return"Size4-Regular";{const t=e.type;throw new Error("Add support for delim type '"+t+"' here.")}},Vr=function(e,t,r,n){for(let o=Math.min(2,3-n.style.size);ot)return s}return r[r.length-1]},Gr=function(e,t,r,n,o,s){let i;"<"===e||"\\lt"===e||"\u27e8"===e?e="\\langle":">"!==e&&"\\gt"!==e&&"\u27e9"!==e||(e="\\rangle"),i=Er.has(e)?Dr:Rr.has(e)?Pr:Lr;const l=Vr(e,t,i,n);return"small"===l.type?function(e,t,r,n,o,s){const i=Fe(e,"Main-Regular",o,n),l=vr(i,t,n,s);return r&&kr(l,n,t),l}(e,l.style,r,n,o,s):"large"===l.type?zr(e,l.size,r,n,o,s):Br(e,t,r,n,o,s)},Ur=function(e,t,r,n,o,s){const i=n.fontMetrics().axisHeight*n.sizeMultiplier,l=5/n.fontMetrics().ptPerEm,a=Math.max(t-i,r+i),c=Math.max(a/500*901,2*a-l);return 
Gr(e,c,!0,n,o,s)},Xr={"\\bigl":{mclass:"mopen",size:1},"\\Bigl":{mclass:"mopen",size:2},"\\biggl":{mclass:"mopen",size:3},"\\Biggl":{mclass:"mopen",size:4},"\\bigr":{mclass:"mclose",size:1},"\\Bigr":{mclass:"mclose",size:2},"\\biggr":{mclass:"mclose",size:3},"\\Biggr":{mclass:"mclose",size:4},"\\bigm":{mclass:"mrel",size:1},"\\Bigm":{mclass:"mrel",size:2},"\\biggm":{mclass:"mrel",size:3},"\\Biggm":{mclass:"mrel",size:4},"\\big":{mclass:"mord",size:1},"\\Big":{mclass:"mord",size:2},"\\bigg":{mclass:"mord",size:3},"\\Bigg":{mclass:"mord",size:4}},Yr=new Set(["(","\\lparen",")","\\rparen","[","\\lbrack","]","\\rbrack","\\{","\\lbrace","\\}","\\rbrace","\\lfloor","\\rfloor","\u230a","\u230b","\\lceil","\\rceil","\u2308","\u2309","<",">","\\langle","\u27e8","\\rangle","\u27e9","\\lt","\\gt","\\lvert","\\rvert","\\lVert","\\rVert","\\lgroup","\\rgroup","\u27ee","\u27ef","\\lmoustache","\\rmoustache","\u23b0","\u23b1","/","\\backslash","|","\\vert","\\|","\\Vert","\\uparrow","\\Uparrow","\\downarrow","\\Downarrow","\\updownarrow","\\Updownarrow","."]);function jr(e){return"isMiddle"in e}function Wr(e,t){const r=ir(e);if(r&&Yr.has(r.text))return r;throw new n(r?"Invalid delimiter '"+r.text+"' after '"+t.funcName+"'":"Invalid delimiter type '"+e.type+"'",e)}function _r(e){if(!e.body)throw new Error("Bug: The leftright ParseNode wasn't fully parsed.")}mt({type:"delimsizing",names:["\\bigl","\\Bigl","\\biggl","\\Biggl","\\bigr","\\Bigr","\\biggr","\\Biggr","\\bigm","\\Bigm","\\biggm","\\Biggm","\\big","\\Big","\\bigg","\\Bigg"],numArgs:1,argTypes:["primitive"],handler:(e,t)=>{const r=Wr(t[0],e);return{type:"delimsizing",mode:e.parser.mode,size:Xr[e.funcName].size,mclass:Xr[e.funcName].mclass,delim:r.text}},htmlBuilder:(e,t)=>"."===e.delim?je([e.mclass]):Or(e.delim,e.size,t,e.mode,[e.mclass]),mathmlBuilder:e=>{const t=[];"."!==e.delim&&t.push(Et(e.delim,e.mode));const r=new 
Bt("mo",t);"mopen"===e.mclass||"mclose"===e.mclass?r.setAttribute("fence","true"):r.setAttribute("fence","false"),r.setAttribute("stretchy","true");const n=O(Nr[e.size]);return r.setAttribute("minsize",n),r.setAttribute("maxsize",n),r}}),mt({type:"leftright-right",names:["\\right"],numArgs:1,primitive:!0,handler:(e,t)=>{const r=e.parser.gullet.macros.get("\\current@color");if(r&&"string"!=typeof r)throw new n("\\current@color set to non-string in \\right");return{type:"leftright-right",mode:e.parser.mode,delim:Wr(t[0],e).text,color:r}}}),mt({type:"leftright",names:["\\left"],numArgs:1,primitive:!0,handler:(e,t)=>{const r=Wr(t[0],e),n=e.parser;++n.leftrightDepth;const o=n.parseExpression(!1);--n.leftrightDepth,n.expect("\\right",!1);const s=or(n.parseFunction(),"leftright-right");return{type:"leftright",mode:n.mode,body:o,left:r.text,right:s.delim,rightColor:s.color}},htmlBuilder:(e,t)=>{_r(e);const r=xt(e.body,t,!0,["mopen","mclose"]);let n,o,s=0,i=0,l=!1;for(let e=0;e{_r(e);const r=Pt(e.body,t);if("."!==e.left){const t=new Bt("mo",[Et(e.left,e.mode)]);t.setAttribute("fence","true"),r.unshift(t)}if("."!==e.right){const t=new Bt("mo",[Et(e.right,e.mode)]);t.setAttribute("fence","true"),e.rightColor&&t.setAttribute("mathcolor",e.rightColor),r.push(t)}return Nt(r)}}),mt({type:"middle",names:["\\middle"],numArgs:1,primitive:!0,handler:(e,t)=>{const r=Wr(t[0],e);if(!e.parser.leftrightDepth)throw new n("\\middle without preceding \\left",r);return{type:"middle",mode:e.parser.mode,delim:r.text}},htmlBuilder:(e,t)=>{let r;return"."===e.delim?r=St(t,[]):(r=Or(e.delim,1,t,e.mode,[]),r.isMiddle={delim:e.delim,options:t}),r},mathmlBuilder:(e,t)=>{const r="\\vert"===e.delim||"|"===e.delim?Et("|","text"):Et(e.delim,e.mode),n=new Bt("mo",[r]);return 
n.setAttribute("fence","true"),n.setAttribute("lspace","0.05em"),n.setAttribute("rspace","0.05em"),n}});mt({type:"enclose",names:["\\colorbox"],numArgs:2,allowedInText:!0,argTypes:["color","hbox"],handler(e,t,r){let{parser:n,funcName:o}=e;const s=or(t[0],"color-token").color,i=t[1];return{type:"enclose",mode:n.mode,label:o,backgroundColor:s,body:i}},htmlBuilder:(e,t)=>{const r=Ze(Mt(e.body,t),t),n=e.label.slice(1);let o,s,i=t.sizeMultiplier;const l=m(e.body);if("sout"===n)o=je(["stretchy","sout"]),o.height=t.fontMetrics().defaultRuleThickness/i,s=-.5*t.fontMetrics().xHeight;else if("phase"===n){const e=N({number:.6,unit:"pt"},t),n=N({number:.35,unit:"ex"},t);i/=t.havingBaseSizing().sizeMultiplier;const l=r.height+r.depth+e+n;r.style.paddingLeft=O(l/2+e);const c=Math.floor(1e3*l*i),h="M400000 "+(a=c)+" H0 L"+a/2+" 0 l65 45 L145 "+(a-80)+" H400000z",m=new _([new $("phase",h)],{width:"400em",height:O(c/1e3),viewBox:"0 0 400000 "+c,preserveAspectRatio:"xMinYMin slice"});o=We(["hide-tail"],[m],t),o.style.height=O(l),s=r.depth+e+n}else{let i,a;/cancel/.test(n)?l||r.classes.push("cancel-pad"):"angl"===n?r.classes.push("anglpad"):r.classes.push("boxpad");let c=0;/box/.test(n)?(c=Math.max(t.fontMetrics().fboxrule,t.minRuleThickness),i=t.fontMetrics().fboxsep+("colorbox"===n?0:c),a=i):"angl"===n?(c=Math.max(t.fontMetrics().defaultRuleThickness,t.minRuleThickness),i=4*c,a=Math.max(0,.25-r.depth)):(i=l?.2:0,a=i),o=function(e,t,r,n,o){let s;const i=e.height+e.depth+r+n;if(/fbox|color|angl/.test(t)){if(s=je(["stretchy",t],[],o),"fbox"===t){const e=o.color&&o.getColor();e&&(s.style.borderColor=e)}}else{const e=[];/^[bx]cancel$/.test(t)&&e.push(new Z({x1:"0",y1:"0",x2:"100%",y2:"100%","stroke-width":"0.046em"})),/^x?cancel$/.test(t)&&e.push(new Z({x1:"0",y1:"100%",x2:"100%",y2:"0","stroke-width":"0.046em"}));const r=new _(e,{width:"100%",height:O(i)});s=We([],[r],o)}return 
s.height=i,s.style.height=O(i),s}(r,n,i,a,t),/fbox|boxed|fcolorbox/.test(n)?(o.style.borderStyle="solid",o.style.borderWidth=O(c)):"angl"===n&&.049!==c&&(o.style.borderTopWidth=O(c),o.style.borderRightWidth=O(c)),s=r.depth+a,e.backgroundColor&&(o.style.backgroundColor=e.backgroundColor,e.borderColor&&(o.style.borderColor=e.borderColor))}var a;let c;if(e.backgroundColor)c=Ke({positionType:"individualShift",children:[{type:"elem",elem:o,shift:s},{type:"elem",elem:r,shift:0}]});else{const e=/cancel|phase/.test(n)?["svg-align"]:[];c=Ke({positionType:"individualShift",children:[{type:"elem",elem:r,shift:0},{type:"elem",elem:o,shift:s,wrapperClasses:e}]})}return/cancel/.test(n)&&(c.height=r.height,c.depth=r.depth),/cancel/.test(n)&&!l?je(["mord","cancel-lap"],[c],t):je(["mord"],[c],t)},mathmlBuilder:(e,t)=>{let r;const n=new Bt(e.label.includes("colorbox")?"mpadded":"menclose",[Vt(e.body,t)]);switch(e.label){case"\\cancel":n.setAttribute("notation","updiagonalstrike");break;case"\\bcancel":n.setAttribute("notation","downdiagonalstrike");break;case"\\phase":n.setAttribute("notation","phasorangle");break;case"\\sout":n.setAttribute("notation","horizontalstrike");break;case"\\fbox":n.setAttribute("notation","box");break;case"\\angl":n.setAttribute("notation","actuarial");break;case"\\fcolorbox":case"\\colorbox":if(r=t.fontMetrics().fboxsep*t.fontMetrics().ptPerEm,n.setAttribute("width","+"+2*r+"pt"),n.setAttribute("height","+"+2*r+"pt"),n.setAttribute("lspace",r+"pt"),n.setAttribute("voffset",r+"pt"),"\\fcolorbox"===e.label){const r=Math.max(t.fontMetrics().fboxrule,t.minRuleThickness);n.setAttribute("style","border: "+O(r)+" solid "+e.borderColor)}break;case"\\xcancel":n.setAttribute("notation","updiagonalstrike downdiagonalstrike")}return e.backgroundColor&&n.setAttribute("mathbackground",e.backgroundColor),n}}),mt({type:"enclose",names:["\\fcolorbox"],numArgs:3,allowedInText:!0,argTypes:["color","color","hbox"],handler(e,t,r){let{parser:n,funcName:o}=e;const 
s=or(t[0],"color-token").color,i=or(t[1],"color-token").color,l=t[2];return{type:"enclose",mode:n.mode,label:o,backgroundColor:i,borderColor:s,body:l}}}),mt({type:"enclose",names:["\\fbox"],numArgs:1,argTypes:["hbox"],allowedInText:!0,handler(e,t){let{parser:r}=e;return{type:"enclose",mode:r.mode,label:"\\fbox",body:t[0]}}}),mt({type:"enclose",names:["\\cancel","\\bcancel","\\xcancel","\\phase"],numArgs:1,handler(e,t){let{parser:r,funcName:n}=e;const o=t[0];return{type:"enclose",mode:r.mode,label:n,body:o}}}),mt({type:"enclose",names:["\\sout"],numArgs:1,allowedInText:!0,handler(e,t){let{parser:r,funcName:n}=e;"math"===r.mode&&r.settings.reportNonstrict("mathVsSout","LaTeX's \\sout works only in text mode");const o=t[0];return{type:"enclose",mode:r.mode,label:n,body:o}}}),mt({type:"enclose",names:["\\angl"],numArgs:1,argTypes:["hbox"],allowedInText:!1,handler(e,t){let{parser:r}=e;return{type:"enclose",mode:r.mode,label:"\\angl",body:t[0]}}});const $r={};function Zr(e){let{type:t,names:r,props:n,handler:o,htmlBuilder:s,mathmlBuilder:i}=e;const l={type:t,numArgs:n.numArgs||0,allowedInText:!1,numOptionalArgs:0,handler:o};for(let e=0;e{if(!e.parser.settings.displayMode)throw new n("{"+e.envName+"} can be used only in display mode.")},nn=new Set(["gather","gather*"]);function on(e){if(!e.includes("ed"))return!e.includes("*")}function sn(e,t,r){let{hskipBeforeAndAfter:o,addJot:s,cols:i,arraystretch:l,colSeparationType:a,autoTag:c,singleRow:h,emptySingleRow:m,maxNumCols:u,leqno:p}=t;if(e.gullet.beginGroup(),h||e.gullet.macros.set("\\cr","\\\\\\relax"),!l){const t=e.gullet.expandMacroAsText("\\arraystretch");if(null==t)l=1;else if(l=parseFloat(t),!l||l<0)throw new n("Invalid \\arraystretch: "+t)}e.gullet.beginGroup();let d=[];const g=[d],f=[],b=[],y=null!=c?[]:void 0;function x(){c&&e.gullet.macros.set("\\@eqnsw","1",!0)}function w(){y&&(e.gullet.macros.get("\\df@tag")?(y.push(e.subparse([new en("\\df@tag")])),e.gullet.macros.set("\\df@tag",void 
0,!0)):y.push(Boolean(c)&&"1"===e.gullet.macros.get("\\@eqnsw")))}for(x(),b.push(tn(e));;){const t=e.parseExpression(!1,h?"\\end":"\\\\");e.gullet.endGroup(),e.gullet.beginGroup();let o={type:"ordgroup",mode:e.mode,body:t};r&&(o={type:"styling",mode:e.mode,style:r,resetFont:!0,body:[o]}),d.push(o);const s=e.fetch().text;if("&"===s){if(u&&d.length===u){if(h||a)throw new n("Too many tab characters: &",e.nextToken);e.settings.reportNonstrict("textEnv","Too few columns specified in the {array} column argument.")}e.consume()}else{if("\\end"===s){w(),1===d.length&&"styling"===o.type&&1===o.body.length&&"ordgroup"===o.body[0].type&&0===o.body[0].body.length&&(g.length>1||!m)&&g.pop(),b.length0&&(y+=.25),c.push({pos:y,isDashed:e[t]})}for(x(i[0]),r=0;r0&&(u+=b,ce))for(r=0;r=l)continue;var B,q;if(o>0||e.hskipBeforeAndAfter)i=null!=(B=null==(q=c)?void 0:q.pregap)?B:u,0!==i&&(z=je(["arraycolsep"],[]),z.style.width=O(i),k.push(z));const p=[];for(r=0;r0){const e=_e("hline",t,h),r=_e("hdashline",t,h),n=[{type:"elem",elem:H,shift:0}];for(;c.length>0;){const t=c.pop(),o=t.pos-w;t.isDashed?n.push({type:"elem",elem:r,shift:o}):n.push({type:"elem",elem:e,shift:o})}H=Ke({positionType:"individualShift",children:n})}if(0===A.length)return je(["mord"],[H],t);{const e=Ke({positionType:"individualShift",children:A}),r=je(["tag"],[e],t);return $e([H,r])}},cn={c:"center ",l:"left ",r:"right "},hn=function(e,t){const r=[],n=new Bt("mtd",[],["mtr-glue"]),o=new Bt("mtd",[],["mml-eqn-num"]);for(let s=0;s0){const t=e.cols;let r="",n=!1,o=0,i=t.length;"separator"===t[0].type&&(l+="top ",o=1),"separator"===t[t.length-1].type&&(l+="bottom ",i-=1);for(let e=o;e0?"left ":"",l+=h[h.length-1].length>0?"right ":"";for(let e=1;e0&&c&&(n=1),r[e]={type:"align",align:t,pregap:n,postgap:0}}return s.colSeparationType=c?"align":"alignat",s};Zr({type:"array",names:["array","darray"],props:{numArgs:1},handler(e,t){const r=(ir(t[0])?[t[0]]:or(t[0],"ordgroup").body).map(function(e){const 
t=sr(e).text;if("lcr".includes(t))return{type:"align",align:t};if("|"===t)return{type:"separator",separator:"|"};if(":"===t)return{type:"separator",separator:":"};throw new n("Unknown column alignment: "+t,e)}),o={cols:r,hskipBeforeAndAfter:!0,maxNumCols:r.length};return sn(e.parser,o,ln(e.envName))},htmlBuilder:an,mathmlBuilder:hn}),Zr({type:"array",names:["matrix","pmatrix","bmatrix","Bmatrix","vmatrix","Vmatrix","matrix*","pmatrix*","bmatrix*","Bmatrix*","vmatrix*","Vmatrix*"],props:{numArgs:0},handler(e){const t={matrix:null,pmatrix:["(",")"],bmatrix:["[","]"],Bmatrix:["\\{","\\}"],vmatrix:["|","|"],Vmatrix:["\\Vert","\\Vert"]}[e.envName.replace("*","")];let r="c";const o={hskipBeforeAndAfter:!1,cols:[{type:"align",align:r}]};if("*"===e.envName.charAt(e.envName.length-1)){const t=e.parser;if(t.consumeSpaces(),"["===t.fetch().text){if(t.consume(),t.consumeSpaces(),r=t.fetch().text,!"lcr".includes(r))throw new n("Expected l or c or r",t.nextToken);t.consume(),t.consumeSpaces(),t.expect("]"),t.consume(),o.cols=[{type:"align",align:r}]}}const s=sn(e.parser,o,ln(e.envName)),i=Math.max(0,...s.body.map(e=>e.length));return s.cols=new Array(i).fill({type:"align",align:r}),t?{type:"leftright",mode:e.mode,body:[s],left:t[0],right:t[1],rightColor:void 0}:s},htmlBuilder:an,mathmlBuilder:hn}),Zr({type:"array",names:["smallmatrix"],props:{numArgs:0},handler(e){const t=sn(e.parser,{arraystretch:.5},"script");return t.colSeparationType="small",t},htmlBuilder:an,mathmlBuilder:hn}),Zr({type:"array",names:["subarray"],props:{numArgs:1},handler(e,t){const r=(ir(t[0])?[t[0]]:or(t[0],"ordgroup").body).map(function(e){const t=sr(e).text;if("lc".includes(t))return{type:"align",align:t};throw new n("Unknown column alignment: "+t,e)});if(r.length>1)throw new n("{subarray} can contain only one column");const o={cols:r,hskipBeforeAndAfter:!1,arraystretch:.5},s=sn(e.parser,o,"script");if(s.body.length>0&&s.body[0].length>1)throw new n("{subarray} can contain only one column");return 
s},htmlBuilder:an,mathmlBuilder:hn}),Zr({type:"array",names:["cases","dcases","rcases","drcases"],props:{numArgs:0},handler(e){const t=sn(e.parser,{arraystretch:1.2,cols:[{type:"align",align:"l",pregap:0,postgap:1},{type:"align",align:"l",pregap:0,postgap:0}]},ln(e.envName));return{type:"leftright",mode:e.mode,body:[t],left:e.envName.includes("r")?".":"\\{",right:e.envName.includes("r")?"\\}":".",rightColor:void 0}},htmlBuilder:an,mathmlBuilder:hn}),Zr({type:"array",names:["align","align*","aligned","split"],props:{numArgs:0},handler:mn,htmlBuilder:an,mathmlBuilder:hn}),Zr({type:"array",names:["gathered","gather","gather*"],props:{numArgs:0},handler(e){nn.has(e.envName)&&rn(e);const t={cols:[{type:"align",align:"c"}],addJot:!0,colSeparationType:"gather",autoTag:on(e.envName),emptySingleRow:!0,leqno:e.parser.settings.leqno};return sn(e.parser,t,"display")},htmlBuilder:an,mathmlBuilder:hn}),Zr({type:"array",names:["alignat","alignat*","alignedat"],props:{numArgs:1},handler:mn,htmlBuilder:an,mathmlBuilder:hn}),Zr({type:"array",names:["equation","equation*"],props:{numArgs:0},handler(e){rn(e);const t={autoTag:on(e.envName),emptySingleRow:!0,singleRow:!0,maxNumCols:1,leqno:e.parser.settings.leqno};return sn(e.parser,t,"display")},htmlBuilder:an,mathmlBuilder:hn}),Zr({type:"array",names:["CD"],props:{numArgs:0},handler(e){return rn(e),function(e){const t=[];for(e.gullet.beginGroup(),e.gullet.macros.set("\\cr","\\\\\\relax"),e.gullet.beginGroup();;){t.push(e.parseExpression(!1,"\\\\")),e.gullet.endGroup(),e.gullet.beginGroup();const r=e.fetch().text;if("&"!==r&&"\\\\"!==r){if("\\end"===r){0===t[t.length-1].length&&t.pop();break}throw new n("Expected \\\\ or \\cr or \\end",e.nextToken)}e.consume()}let r=[];const o=[r];for(let s=0;sAV".includes(o))throw new n('Expected one of "<>AV=|." 
after @',i[t]);for(let e=0;e<2;e++){let r=!0;for(let l=t+1;l{let{parser:r,funcName:n}=e;const o=pt(t[0]),s=n in pn?pn[n]:n;return{type:"font",mode:r.mode,font:s.slice(1),body:o}},htmlBuilder:(e,t)=>{const r=e.font,n=t.withFont(r);return Mt(e.body,n)},mathmlBuilder:(e,t)=>{const r=e.font,n=t.withFont(r);return Vt(e.body,n)}}),mt({type:"mclass",names:["\\boldsymbol","\\bm"],numArgs:1,handler:(e,t)=>{let{parser:r}=e;const n=t[0];return{type:"mclass",mode:r.mode,mclass:mr(n),body:[{type:"font",mode:r.mode,font:"boldsymbol",body:n}],isCharacterBox:m(n)}}}),mt({type:"font",names:["\\rm","\\sf","\\tt","\\bf","\\it","\\cal"],numArgs:0,allowedInText:!0,handler:(e,t)=>{let{parser:r,funcName:n,breakOnTokenText:o}=e;const{mode:s}=r,i=r.parseExpression(!0,o);return{type:"font",mode:s,font:"math"+n.slice(1),body:{type:"ordgroup",mode:r.mode,body:i}}}});const dn=(e,t)=>{if(!t)return e;return{type:"styling",mode:e.mode,style:t,body:[e]}};mt({type:"genfrac",names:["\\cfrac","\\dfrac","\\frac","\\tfrac","\\dbinom","\\binom","\\tbinom","\\\\atopfrac","\\\\bracefrac","\\\\brackfrac"],numArgs:2,allowedInArgument:!0,handler:(e,t)=>{let{parser:r,funcName:n}=e;const o=t[0],s=t[1];let i,l=null,a=null;switch(n){case"\\cfrac":case"\\dfrac":case"\\frac":case"\\tfrac":i=!0;break;case"\\\\atopfrac":i=!1;break;case"\\dbinom":case"\\binom":case"\\tbinom":i=!1,l="(",a=")";break;case"\\\\bracefrac":i=!1,l="\\{",a="\\}";break;case"\\\\brackfrac":i=!1,l="[",a="]";break;default:throw new Error("Unrecognized genfrac command")}const c="\\cfrac"===n;let h=null;return c||n.startsWith("\\d")?h="display":n.startsWith("\\t")&&(h="text"),dn({type:"genfrac",mode:r.mode,numer:o,denom:s,continued:c,hasBarLine:i,leftDelim:l,rightDelim:a,barSize:null},h)},htmlBuilder:(e,t)=>{const r=t.style,n=r.fracNum(),o=r.fracDen();let s;s=t.havingStyle(n);const i=Mt(e.numer,s,t);if(e.continued){const 
e=8.5/t.fontMetrics().ptPerEm,r=3.5/t.fontMetrics().ptPerEm;i.height=i.height0?3*h:7*h,p=t.fontMetrics().denom1):(c>0?(m=t.fontMetrics().num2,u=h):(m=t.fontMetrics().num3,u=3*h),p=t.fontMetrics().denom2),a){const e=t.fontMetrics().axisHeight;m-i.depth-(e+.5*c){const r=new Bt("mfrac",[Vt(e.numer,t),Vt(e.denom,t)]);if(e.hasBarLine){if(e.barSize){const n=N(e.barSize,t);r.setAttribute("linethickness",O(n))}}else r.setAttribute("linethickness","0px");if(null!=e.leftDelim||null!=e.rightDelim){const t=[];if(null!=e.leftDelim){const r=new Bt("mo",[new qt(e.leftDelim.replace("\\",""))]);r.setAttribute("fence","true"),t.push(r)}if(t.push(r),null!=e.rightDelim){const r=new Bt("mo",[new qt(e.rightDelim.replace("\\",""))]);r.setAttribute("fence","true"),t.push(r)}return Nt(t)}return r}}),mt({type:"infix",names:["\\over","\\choose","\\atop","\\brace","\\brack"],numArgs:0,infix:!0,handler(e){let t,{parser:r,funcName:n,token:o}=e;switch(n){case"\\over":t="\\frac";break;case"\\choose":t="\\binom";break;case"\\atop":t="\\\\atopfrac";break;case"\\brace":t="\\\\bracefrac";break;case"\\brack":t="\\\\brackfrac";break;default:throw new Error("Unrecognized infix genfrac command")}return{type:"infix",mode:r.mode,replaceWith:t,token:o}}});const gn=["display","text","script","scriptscript"],fn=function(e){let t=null;return e.length>0&&(t=e,t="."===t?null:t),t};mt({type:"genfrac",names:["\\genfrac"],numArgs:6,allowedInArgument:!0,argTypes:["math","math","size","text","math","math"],handler(e,t){let{parser:r}=e;const n=t[4],o=t[5],s=pt(t[0]),i="atom"===s.type&&"open"===s.family?fn(s.text):null,l=pt(t[1]),a="atom"===l.type&&"close"===l.family?fn(l.text):null,c=or(t[2],"size");let h,m=null;c.isBlank?h=!0:(m=c.value,h=m.number>0);let u=null,p=t[3];if("ordgroup"===p.type){if(p.body.length>0){const e=or(p.body[0],"textord");u=gn[Number(e.text)]}}else p=or(p,"textord"),u=gn[Number(p.text)];return 
dn({type:"genfrac",mode:r.mode,numer:n,denom:o,continued:!1,hasBarLine:h,barSize:m,leftDelim:i,rightDelim:a},u)}}),mt({type:"infix",names:["\\above"],numArgs:1,argTypes:["size"],infix:!0,handler(e,t){let{parser:r,funcName:n,token:o}=e;return{type:"infix",mode:r.mode,replaceWith:"\\\\abovefrac",size:or(t[0],"size").value,token:o}}}),mt({type:"genfrac",names:["\\\\abovefrac"],numArgs:3,argTypes:["math","size","math"],handler:(e,t)=>{let{parser:r,funcName:n}=e;const o=t[0],s=or(t[1],"infix").size;if(!s)throw new Error("\\\\abovefrac expected size, but got "+String(s));const i=t[2],l=s.number>0;return{type:"genfrac",mode:r.mode,numer:o,denom:i,continued:!1,hasBarLine:l,barSize:s,leftDelim:null,rightDelim:null}}});const bn=(e,t)=>{const r=t.style;let n,o;"supsub"===e.type?(n=e.sup?Mt(e.sup,t.havingStyle(r.sup()),t):Mt(e.sub,t.havingStyle(r.sub()),t),o=or(e.base,"horizBrace")):o=or(e,"horizBrace");const s=Mt(o.base,t.havingBaseStyle(S.DISPLAY)),i=tr(o,t);let l;if(l=o.isOver?Ke({positionType:"firstBaseline",children:[{type:"elem",elem:s},{type:"kern",size:.1},{type:"elem",elem:i,wrapperClasses:["svg-align"]}]}):Ke({positionType:"bottom",positionData:s.depth+.1+i.height,children:[{type:"elem",elem:i,wrapperClasses:["svg-align"]},{type:"kern",size:.1},{type:"elem",elem:s}]}),n){const e=je(["minner",o.isOver?"mover":"munder"],[l],t);l=o.isOver?Ke({positionType:"firstBaseline",children:[{type:"elem",elem:e},{type:"kern",size:.2},{type:"elem",elem:n}]}):Ke({positionType:"bottom",positionData:e.depth+.2+n.height+n.depth,children:[{type:"elem",elem:n},{type:"kern",size:.2},{type:"elem",elem:e}]})}return je(["minner",o.isOver?"mover":"munder"],[l],t)};mt({type:"horizBrace",names:["\\overbrace","\\underbrace","\\overbracket","\\underbracket"],numArgs:1,handler(e,t){let{parser:r,funcName:n}=e;return{type:"horizBrace",mode:r.mode,label:n,isOver:n.includes("\\over"),base:t[0]}},htmlBuilder:bn,mathmlBuilder:(e,t)=>{const r=Jt(e.label);return new 
Bt(e.isOver?"mover":"munder",[Vt(e.base,t),r])}}),mt({type:"href",names:["\\href"],numArgs:2,argTypes:["url","original"],allowedInText:!0,handler:(e,t)=>{let{parser:r}=e;const n=t[1],o=or(t[0],"url").url;return r.settings.isTrusted({command:"\\href",url:o})?{type:"href",mode:r.mode,href:o,body:dt(n)}:r.formatUnsupportedCmd("\\href")},htmlBuilder:(e,t)=>{const r=xt(e.body,t,!1);return function(e,t,r,n){const o=new X(e,t,r,n);return Ye(o),o}(e.href,[],r,t)},mathmlBuilder:(e,t)=>{let r=Ft(e.body,t);return r instanceof Bt||(r=new Bt("mrow",[r])),r.setAttribute("href",e.href),r}}),mt({type:"href",names:["\\url"],numArgs:1,argTypes:["url"],allowedInText:!0,handler:(e,t)=>{let{parser:r}=e;const n=or(t[0],"url").url;if(!r.settings.isTrusted({command:"\\url",url:n}))return r.formatUnsupportedCmd("\\url");const o=[];for(let e=0;e{let{parser:r,funcName:o,token:s}=e;const i=or(t[0],"raw").string,l=t[1];let a;r.settings.strict&&r.settings.reportNonstrict("htmlExtension","HTML extension is disabled on strict mode");const c={};switch(o){case"\\htmlClass":c.class=i,a={command:"\\htmlClass",class:i};break;case"\\htmlId":c.id=i,a={command:"\\htmlId",id:i};break;case"\\htmlStyle":c.style=i,a={command:"\\htmlStyle",style:i};break;case"\\htmlData":{const e=i.split(",");for(let t=0;t{const r=xt(e.body,t,!1),n=["enclosing"];e.attributes.class&&n.push(...e.attributes.class.trim().split(/\s+/));const o=je(n,r,t);for(const t in e.attributes)"class"!==t&&e.attributes.hasOwnProperty(t)&&o.setAttribute(t,e.attributes[t]);return o},mathmlBuilder:(e,t)=>Ft(e.body,t)}),mt({type:"htmlmathml",names:["\\html@mathml"],numArgs:2,allowedInArgument:!0,allowedInText:!0,handler:(e,t)=>{let{parser:r}=e;return{type:"htmlmathml",mode:r.mode,html:dt(t[0]),mathml:dt(t[1])}},htmlBuilder:(e,t)=>{const r=xt(e.html,t,!1);return $e(r)},mathmlBuilder:(e,t)=>Ft(e.mathml,t)});const yn=function(e){if(/^[-+]? *(\d+(\.\d*)?|\.\d+)$/.test(e))return{number:+e,unit:"bp"};{const t=/([-+]?) 
*(\d+(?:\.\d*)?|\.\d+) *([a-z]{2})/.exec(e);if(!t)throw new n("Invalid size: '"+e+"' in \\includegraphics");const r={number:+(t[1]+t[2]),unit:t[3]};if(!E(r))throw new n("Invalid unit: '"+r.unit+"' in \\includegraphics.");return r}};mt({type:"includegraphics",names:["\\includegraphics"],numArgs:1,numOptionalArgs:1,argTypes:["raw","url"],allowedInText:!1,handler:(e,t,r)=>{let{parser:o}=e,s={number:0,unit:"em"},i={number:.9,unit:"em"},l={number:0,unit:"em"},a="";if(r[0]){const e=or(r[0],"raw").string.split(",");for(let t=0;t{const r=N(e.height,t);let n=0;e.totalheight.number>0&&(n=N(e.totalheight,t)-r);let o=0;e.width.number>0&&(o=N(e.width,t));const s={height:O(r+n)};o>0&&(s.width=O(o)),n>0&&(s.verticalAlign=O(-n));const i=new Y(e.src,e.alt,s);return i.height=r,i.depth=n,i},mathmlBuilder:(e,t)=>{const r=new Bt("mglyph",[]);r.setAttribute("alt",e.alt);const n=N(e.height,t);let o=0;if(e.totalheight.number>0&&(o=N(e.totalheight,t)-n,r.setAttribute("valign",O(-o))),r.setAttribute("height",O(n+o)),e.width.number>0){const n=N(e.width,t);r.setAttribute("width",O(n))}return r.setAttribute("src",e.src),r}}),mt({type:"kern",names:["\\kern","\\mkern","\\hskip","\\mskip"],numArgs:1,argTypes:["size"],primitive:!0,allowedInText:!0,handler(e,t){let{parser:r,funcName:n}=e;const o=or(t[0],"size");if(r.settings.strict){const e="m"===n[1],t="mu"===o.value.unit;e?(t||r.settings.reportNonstrict("mathVsTextUnits","LaTeX's "+n+" supports only mu units, not "+o.value.unit+" units"),"math"!==r.mode&&r.settings.reportNonstrict("mathVsTextUnits","LaTeX's "+n+" works only in math mode")):t&&r.settings.reportNonstrict("mathVsTextUnits","LaTeX's "+n+" doesn't support mu units")}return{type:"kern",mode:r.mode,dimension:o.value}},htmlBuilder(e,t){return Je(e.dimension,t)},mathmlBuilder(e,t){const r=N(e.dimension,t);return new It(r)}}),mt({type:"lap",names:["\\mathllap","\\mathrlap","\\mathclap"],numArgs:1,allowedInText:!0,handler:(e,t)=>{let{parser:r,funcName:n}=e;const 
o=t[0];return{type:"lap",mode:r.mode,alignment:n.slice(5),body:o}},htmlBuilder:(e,t)=>{let r;"clap"===e.alignment?(r=je([],[Mt(e.body,t)]),r=je(["inner"],[r],t)):r=je(["inner"],[Mt(e.body,t)]);const n=je(["fix"],[]);let o=je([e.alignment],[r,n],t);const s=je(["strut"]);return s.style.height=O(o.height+o.depth),o.depth&&(s.style.verticalAlign=O(-o.depth)),o.children.unshift(s),o=je(["thinbox"],[o],t),je(["mord","vbox"],[o],t)},mathmlBuilder:(e,t)=>{const r=new Bt("mpadded",[Vt(e.body,t)]);if("rlap"!==e.alignment){const t="llap"===e.alignment?"-1":"-0.5";r.setAttribute("lspace",t+"width")}return r.setAttribute("width","0px"),r}}),mt({type:"styling",names:["\\(","$"],numArgs:0,allowedInText:!0,allowedInMath:!1,handler(e,t){let{funcName:r,parser:n}=e;const o=n.mode;n.switchMode("math");const s="\\("===r?"\\)":"$",i=n.parseExpression(!1,s);return n.expect(s),n.switchMode(o),{type:"styling",mode:n.mode,style:"text",resetFont:!0,body:i}}}),mt({type:"text",names:["\\)","\\]"],numArgs:0,allowedInText:!0,allowedInMath:!1,handler(e,t){throw new n("Mismatched "+e.funcName)}});const xn=(e,t)=>{switch(t.style.size){case S.DISPLAY.size:return e.display;case S.TEXT.size:return e.text;case S.SCRIPT.size:return e.script;case S.SCRIPTSCRIPT.size:return e.scriptscript;default:return e.text}};mt({type:"mathchoice",names:["\\mathchoice"],numArgs:4,primitive:!0,handler:(e,t)=>{let{parser:r}=e;return{type:"mathchoice",mode:r.mode,display:dt(t[0]),text:dt(t[1]),script:dt(t[2]),scriptscript:dt(t[3])}},htmlBuilder:(e,t)=>{const r=xn(e,t),n=xt(r,t,!1);return $e(n)},mathmlBuilder:(e,t)=>{const r=xn(e,t);return Ft(r,t)}});const wn=(e,t,r,n,o,s,i)=>{e=je([],[e]);const l=r&&m(r);let a,c,h;if(t){const e=Mt(t,n.havingStyle(o.sup()),n);c={elem:e,kern:Math.max(n.fontMetrics().bigOpSpacing1,n.fontMetrics().bigOpSpacing3-e.depth)}}if(r){const e=Mt(r,n.havingStyle(o.sub()),n);a={elem:e,kern:Math.max(n.fontMetrics().bigOpSpacing2,n.fontMetrics().bigOpSpacing4-e.height)}}if(c&&a){const 
t=n.fontMetrics().bigOpSpacing5+a.elem.height+a.elem.depth+a.kern+e.depth+i;h=Ke({positionType:"bottom",positionData:t,children:[{type:"kern",size:n.fontMetrics().bigOpSpacing5},{type:"elem",elem:a.elem,marginLeft:O(-s)},{type:"kern",size:a.kern},{type:"elem",elem:e},{type:"kern",size:c.kern},{type:"elem",elem:c.elem,marginLeft:O(s)},{type:"kern",size:n.fontMetrics().bigOpSpacing5}]})}else if(a){const t=e.height-i;h=Ke({positionType:"top",positionData:t,children:[{type:"kern",size:n.fontMetrics().bigOpSpacing5},{type:"elem",elem:a.elem,marginLeft:O(-s)},{type:"kern",size:a.kern},{type:"elem",elem:e}]})}else{if(!c)return e;{const t=e.depth+i;h=Ke({positionType:"bottom",positionData:t,children:[{type:"elem",elem:e},{type:"kern",size:c.kern},{type:"elem",elem:c.elem,marginLeft:O(s)},{type:"kern",size:n.fontMetrics().bigOpSpacing5}]})}}const u=[h];if(a&&0!==s&&!l){const e=je(["mspace"],[],n);e.style.marginRight=O(s),u.unshift(e)}return je(["mop","op-limits"],u,n)},vn=new Set(["\\smallint"]),kn=(e,t)=>{let r,n,o,s=!1;"supsub"===e.type?(r=e.sup,n=e.sub,o=or(e.base,"op"),s=!0):o=or(e,"op");const i=t.style;let l,a,c=!1;if(i.size===S.DISPLAY.size&&o.symbol&&!vn.has(o.name)&&(c=!0),o.symbol){const e=c?"Size2-Regular":"Size1-Regular";let r="";if("\\oiint"!==o.name&&"\\oiiint"!==o.name||(r=o.name.slice(1),o.name="oiint"===r?"\\iint":"\\iiint"),l=Fe(o.name,e,"math",t,["mop","op-symbol",c?"large-op":"small-op"]),a=l.italic,r.length>0){const e=rt(r+"Size"+(c?"2":"1"),t);l=Ke({positionType:"individualShift",children:[{type:"elem",elem:l,shift:0},{type:"elem",elem:e,shift:c?.08:0}]}),o.name="\\"+r,l.classes.unshift("mop"),l.italic=a}}else if(o.body){const e=xt(o.body,t,!0);1===e.length&&e[0]instanceof W?(l=e[0],l.classes[0]="mop"):l=je(["mop"],e,t)}else{const e=[];for(let r=1;r{let{parser:r,funcName:n}=e,o=n;return 1===o.length&&(o=zn[o]),{type:"op",mode:r.mode,limits:!0,parentIsSupSub:!1,symbol:!0,name:o}},htmlBuilder:kn,mathmlBuilder:(e,t)=>{let r;if(e.symbol)r=new 
Bt("mo",[Et(e.name,e.mode)]),vn.has(e.name)&&r.setAttribute("largeop","false");else if(e.body)r=new Bt("mo",Pt(e.body,t));else{r=new Bt("mi",[new qt(e.name.slice(1))]);const t=new Bt("mo",[Et("\u2061","text")]);r=e.parentIsSupSub?new Bt("mrow",[r,t]):Ct([r,t])}return r}}),mt({type:"op",names:["\\mathop"],numArgs:1,primitive:!0,handler:(e,t)=>{let{parser:r}=e;const n=t[0];return{type:"op",mode:r.mode,limits:!1,parentIsSupSub:!1,symbol:!1,body:dt(n)}}});const Sn={"\u222b":"\\int","\u222c":"\\iint","\u222d":"\\iiint","\u222e":"\\oint","\u222f":"\\oiint","\u2230":"\\oiiint"};mt({type:"op",names:["\\arcsin","\\arccos","\\arctan","\\arctg","\\arcctg","\\arg","\\ch","\\cos","\\cosec","\\cosh","\\cot","\\cotg","\\coth","\\csc","\\ctg","\\cth","\\deg","\\dim","\\exp","\\hom","\\ker","\\lg","\\ln","\\log","\\sec","\\sin","\\sinh","\\sh","\\tan","\\tanh","\\tg","\\th"],numArgs:0,handler(e){let{parser:t,funcName:r}=e;return{type:"op",mode:t.mode,limits:!1,parentIsSupSub:!1,symbol:!1,name:r}}}),mt({type:"op",names:["\\det","\\gcd","\\inf","\\lim","\\max","\\min","\\Pr","\\sup"],numArgs:0,handler(e){let{parser:t,funcName:r}=e;return{type:"op",mode:t.mode,limits:!0,parentIsSupSub:!1,symbol:!1,name:r}}}),mt({type:"op",names:["\\int","\\iint","\\iiint","\\oint","\\oiint","\\oiiint","\u222b","\u222c","\u222d","\u222e","\u222f","\u2230"],numArgs:0,allowedInArgument:!0,handler(e){let{parser:t,funcName:r}=e,n=r;return 1===n.length&&(n=Sn[n]),{type:"op",mode:t.mode,limits:!1,parentIsSupSub:!1,symbol:!0,name:n}}});const Mn=(e,t)=>{let r,n,o,s,i=!1;if("supsub"===e.type?(r=e.sup,n=e.sub,o=or(e.base,"operatorname"),i=!0):o=or(e,"operatorname"),o.body.length>0){const e=o.body.map(e=>{const t="text"in e?e.text:void 0;return"string"==typeof t?{type:"textord",mode:e.mode,text:t}:e}),r=xt(e,t.withFont("mathrm"),!0);for(let e=0;e{let{parser:r,funcName:n}=e;const 
o=t[0];return{type:"operatorname",mode:r.mode,body:dt(o),alwaysHandleSupSub:"\\operatornamewithlimits"===n,limits:!1,parentIsSupSub:!1}},htmlBuilder:Mn,mathmlBuilder:(e,t)=>{let r=Pt(e.body,t.withFont("mathrm")),n=!0;for(let e=0;ee.toText()).join("");r=[new qt(e)]}const o=new Bt("mi",r);o.setAttribute("mathvariant","normal");const s=new Bt("mo",[Et("\u2061","text")]);return e.parentIsSupSub?new Bt("mrow",[o,s]):Ct([o,s])}}),Jr("\\operatorname","\\@ifstar\\operatornamewithlimits\\operatorname@"),ut({type:"ordgroup",htmlBuilder(e,t){return e.semisimple?$e(xt(e.body,t,!1)):je(["mord"],xt(e.body,t,!0),t)},mathmlBuilder(e,t){return Ft(e.body,t,!0)}}),mt({type:"overline",names:["\\overline"],numArgs:1,handler(e,t){let{parser:r}=e;const n=t[0];return{type:"overline",mode:r.mode,body:n}},htmlBuilder(e,t){const r=Mt(e.body,t.havingCrampedStyle()),n=_e("overline-line",t),o=t.fontMetrics().defaultRuleThickness,s=Ke({positionType:"firstBaseline",children:[{type:"elem",elem:r},{type:"kern",size:3*o},{type:"elem",elem:n},{type:"kern",size:o}]});return je(["mord","overline"],[s],t)},mathmlBuilder(e,t){const r=new Bt("mo",[new qt("\u203e")]);r.setAttribute("stretchy","true");const n=new Bt("mover",[Vt(e.body,t),r]);return n.setAttribute("accent","true"),n}}),mt({type:"phantom",names:["\\phantom"],numArgs:1,allowedInText:!0,handler:(e,t)=>{let{parser:r}=e;const n=t[0];return{type:"phantom",mode:r.mode,body:dt(n)}},htmlBuilder:(e,t)=>{const r=xt(e.body,t.withPhantom(),!1);return $e(r)},mathmlBuilder:(e,t)=>{const r=Pt(e.body,t);return new Bt("mphantom",r)}}),Jr("\\hphantom","\\smash{\\phantom{#1}}"),mt({type:"vphantom",names:["\\vphantom"],numArgs:1,allowedInText:!0,handler:(e,t)=>{let{parser:r}=e;const n=t[0];return{type:"vphantom",mode:r.mode,body:n}},htmlBuilder:(e,t)=>{const r=je(["inner"],[Mt(e.body,t.withPhantom())]),n=je(["fix"],[]);return je(["mord","rlap"],[r,n],t)},mathmlBuilder:(e,t)=>{const r=Pt(dt(e.body),t),n=new Bt("mphantom",r),o=new Bt("mpadded",[n]);return 
o.setAttribute("width","0px"),o}}),mt({type:"raisebox",names:["\\raisebox"],numArgs:2,argTypes:["size","hbox"],allowedInText:!0,handler(e,t){let{parser:r}=e;const n=or(t[0],"size").value,o=t[1];return{type:"raisebox",mode:r.mode,dy:n,body:o}},htmlBuilder(e,t){const r=Mt(e.body,t),n=N(e.dy,t);return Ke({positionType:"shift",positionData:-n,children:[{type:"elem",elem:r}]})},mathmlBuilder(e,t){const r=new Bt("mpadded",[Vt(e.body,t)]),n=e.dy.number+e.dy.unit;return r.setAttribute("voffset",n),r}}),mt({type:"internal",names:["\\relax"],numArgs:0,allowedInText:!0,allowedInArgument:!0,handler(e){let{parser:t}=e;return{type:"internal",mode:t.mode}}}),mt({type:"rule",names:["\\rule"],numArgs:2,numOptionalArgs:1,allowedInText:!0,allowedInMath:!0,argTypes:["size","size","size"],handler(e,t,r){let{parser:n}=e;const o=r[0],s=or(t[0],"size"),i=or(t[1],"size");return{type:"rule",mode:n.mode,shift:o&&or(o,"size").value,width:s.value,height:i.value}},htmlBuilder(e,t){const r=je(["mord","rule"],[],t),n=N(e.width,t),o=N(e.height,t),s=e.shift?N(e.shift,t):0;return r.style.borderRightWidth=O(n),r.style.borderTopWidth=O(o),r.style.bottom=O(s),r.width=n,r.height=o+s,r.depth=-s,r.maxFontSize=1.125*o*t.sizeMultiplier,r},mathmlBuilder(e,t){const r=N(e.width,t),n=N(e.height,t),o=e.shift?N(e.shift,t):0,s=t.color&&t.getColor()||"black",i=new Bt("mspace");i.setAttribute("mathbackground",s),i.setAttribute("width",O(r)),i.setAttribute("height",O(n));const l=new Bt("mpadded",[i]);return o>=0?l.setAttribute("height",O(o)):(l.setAttribute("height",O(o)),l.setAttribute("depth",O(-o))),l.setAttribute("voffset",O(o)),l}});const Tn=["\\tiny","\\sixptsize","\\scriptsize","\\footnotesize","\\small","\\normalsize","\\large","\\Large","\\LARGE","\\huge","\\Huge"];mt({type:"sizing",names:Tn,numArgs:0,allowedInText:!0,handler:(e,t)=>{let{breakOnTokenText:r,funcName:n,parser:o}=e;const s=o.parseExpression(!1,r);return{type:"sizing",mode:o.mode,size:Tn.indexOf(n)+1,body:s}},htmlBuilder:(e,t)=>{const 
r=t.havingSize(e.size);return An(e.body,r,t)},mathmlBuilder:(e,t)=>{const r=t.havingSize(e.size),n=Pt(e.body,r),o=new Bt("mstyle",n);return o.setAttribute("mathsize",O(r.sizeMultiplier)),o}}),mt({type:"smash",names:["\\smash"],numArgs:1,numOptionalArgs:1,allowedInText:!0,handler:(e,t,r)=>{let{parser:n}=e,o=!1,s=!1;const i=r[0]&&or(r[0],"ordgroup");if(i){let e;for(let t=0;t{const r=je([],[Mt(e.body,t)]);if(!e.smashHeight&&!e.smashDepth)return r;if(e.smashHeight&&(r.height=0),e.smashDepth&&(r.depth=0),e.smashHeight&&e.smashDepth)return je(["mord","smash"],[r],t);if(r.children)for(let t=0;t{const r=new Bt("mpadded",[Vt(e.body,t)]);return e.smashHeight&&r.setAttribute("height","0px"),e.smashDepth&&r.setAttribute("depth","0px"),r}}),mt({type:"sqrt",names:["\\sqrt"],numArgs:1,numOptionalArgs:1,handler(e,t,r){let{parser:n}=e;const o=r[0],s=t[0];return{type:"sqrt",mode:n.mode,body:s,index:o}},htmlBuilder(e,t){let r=Mt(e.body,t.havingCrampedStyle());0===r.height&&(r.height=t.fontMetrics().xHeight),r=Ze(r,t);const n=t.fontMetrics().defaultRuleThickness;let o=n;t.style.idr.height+r.depth+s&&(s=(s+h-r.height-r.depth)/2);const m=l.height-r.height-s-a;r.style.paddingLeft=O(c);const u=Ke({positionType:"firstBaseline",children:[{type:"elem",elem:r,wrapperClasses:["svg-align"]},{type:"kern",size:-(r.height+m)},{type:"elem",elem:l},{type:"kern",size:a}]});if(e.index){const r=t.havingStyle(S.SCRIPTSCRIPT),n=Mt(e.index,r,t),o=.6*(u.height-u.depth),s=Ke({positionType:"shift",positionData:-o,children:[{type:"elem",elem:n}]}),i=je(["root"],[s]);return je(["mord","sqrt"],[i,u],t)}return je(["mord","sqrt"],[u],t)},mathmlBuilder(e,t){const{body:r,index:n}=e;return n?new Bt("mroot",[Vt(r,t),Vt(n,t)]):new Bt("msqrt",[Vt(r,t)])}});const 
Cn={display:S.DISPLAY,text:S.TEXT,script:S.SCRIPT,scriptscript:S.SCRIPTSCRIPT};mt({type:"styling",names:["\\displaystyle","\\textstyle","\\scriptstyle","\\scriptscriptstyle"],numArgs:0,allowedInText:!0,primitive:!0,handler(e,t){let{breakOnTokenText:r,funcName:n,parser:o}=e;const s=o.parseExpression(!0,r),i=n.slice(1,n.length-5);if(!(i in Cn))throw new Error("Unknown style: "+i);return{type:"styling",mode:o.mode,style:i,body:s}},htmlBuilder(e,t){const r=Cn[e.style];let n=t.havingStyle(r);return e.resetFont&&(n=n.withFont("")),An(e.body,n,t)},mathmlBuilder(e,t){const r=Cn[e.style];let n=t.havingStyle(r);e.resetFont&&(n=n.withFont(""));const o=Pt(e.body,n),s=new Bt("mstyle",o),i={display:["0","true"],text:["0","false"],script:["1","false"],scriptscript:["2","false"]}[e.style];return s.setAttribute("scriptlevel",i[0]),s.setAttribute("displaystyle",i[1]),s}});ut({type:"supsub",htmlBuilder(e,t){const r=function(e,t){const r=e.base;if(r)return"op"===r.type?r.limits&&(t.style.size===S.DISPLAY.size||r.alwaysHandleSupSub)?kn:null:"operatorname"===r.type?r.alwaysHandleSupSub&&(t.style.size===S.DISPLAY.size||r.limits)?Mn:null:"accent"===r.type?m(r.base)?ar:null:"horizBrace"===r.type&&!e.sub===r.isOver?bn:null;return null}(e,t);if(r)return r(e,t);const{base:n,sup:o,sub:s}=e,i=Mt(n,t);let l,a;const c=t.fontMetrics();let h=0,u=0;const p=n&&m(n);if(o){const e=t.havingStyle(t.style.sup());l=Mt(o,e,t),p||(h=i.height-e.fontMetrics().supDrop*e.sizeMultiplier/t.sizeMultiplier)}if(s){const e=t.havingStyle(t.style.sub());a=Mt(s,e,t),p||(u=i.depth+e.fontMetrics().subDrop*e.sizeMultiplier/t.sizeMultiplier)}let d;d=t.style===S.DISPLAY?c.sup1:t.style.cramped?c.sup3:c.sup2;const g=t.sizeMultiplier,f=O(.5/c.ptPerEm/g);let b,y=null;if(a){const t=e.base&&"op"===e.base.type&&e.base.name&&("\\oiint"===e.base.name||"\\oiiint"===e.base.name);var x;if(i instanceof W||t)y=O(-(null!=(x=i.italic)?x:0))}if(l&&a){h=Math.max(h,d,l.depth+.25*c.xHeight),u=Math.max(u,c.sub2);const 
e=4*c.defaultRuleThickness;if(h-l.depth-(a.height-u)0&&(h+=t,u-=t)}b=Ke({positionType:"individualShift",children:[{type:"elem",elem:a,shift:u,marginRight:f,marginLeft:y},{type:"elem",elem:l,shift:-h,marginRight:f}]})}else if(a){u=Math.max(u,c.sub1,a.height-.8*c.xHeight);b=Ke({positionType:"shift",positionData:u,children:[{type:"elem",elem:a,marginLeft:y,marginRight:f}]})}else{if(!l)throw new Error("supsub must have either sup or sub.");h=Math.max(h,d,l.depth+.25*c.xHeight),b=Ke({positionType:"shift",positionData:-h,children:[{type:"elem",elem:l,marginRight:f}]})}const w=zt(i,"right")||"mord";return je([w],[i,je(["msupsub"],[b])],t)},mathmlBuilder(e,t){let r,n,o=!1;e.base&&"horizBrace"===e.base.type&&(n=!!e.sup,n===e.base.isOver&&(o=!0,r=e.base.isOver)),!e.base||"op"!==e.base.type&&"operatorname"!==e.base.type||(e.base.parentIsSupSub=!0);const s=[Vt(e.base,t)];let i;if(e.sub&&s.push(Vt(e.sub,t)),e.sup&&s.push(Vt(e.sup,t)),o)i=r?"mover":"munder";else if(e.sub)if(e.sup){const r=e.base;i=r&&"op"===r.type&&r.limits&&t.style===S.DISPLAY||r&&"operatorname"===r.type&&r.alwaysHandleSupSub&&(t.style===S.DISPLAY||r.limits)?"munderover":"msubsup"}else{const r=e.base;i=r&&"op"===r.type&&r.limits&&(t.style===S.DISPLAY||r.alwaysHandleSupSub)||r&&"operatorname"===r.type&&r.alwaysHandleSupSub&&(r.limits||t.style===S.DISPLAY)?"munder":"msub"}else{const r=e.base;i=r&&"op"===r.type&&r.limits&&(t.style===S.DISPLAY||r.alwaysHandleSupSub)||r&&"operatorname"===r.type&&r.alwaysHandleSupSub&&(r.limits||t.style===S.DISPLAY)?"mover":"msup"}return new Bt(i,s)}}),ut({type:"atom",htmlBuilder(e,t){return Ve(e.text,e.mode,t,["m"+e.family])},mathmlBuilder(e,t){const r=new Bt("mo",[Et(e.text,e.mode)]);if("bin"===e.family){const n=Dt(e,t);"bold-italic"===n&&r.setAttribute("mathvariant",n)}else"punct"===e.family?r.setAttribute("separator","true"):"open"!==e.family&&"close"!==e.family||r.setAttribute("stretchy","false");return r}});const 
Bn={mi:"italic",mn:"normal",mtext:"normal"};ut({type:"mathord",htmlBuilder(e,t){return Ge(e,t)},mathmlBuilder(e,t){const r=new Bt("mi",[Et(e.text,e.mode,t)]),n=Dt(e,t)||"italic";return n!==Bn[r.type]&&r.setAttribute("mathvariant",n),r}}),ut({type:"textord",htmlBuilder(e,t){return Ge(e,t)},mathmlBuilder(e,t){const r=Et(e.text,e.mode,t),n=Dt(e,t)||"normal";let o;return o="text"===e.mode?new Bt("mtext",[r]):/[0-9]/.test(e.text)?new Bt("mn",[r]):"\\prime"===e.text?new Bt("mo",[r]):new Bt("mi",[r]),n!==Bn[o.type]&&o.setAttribute("mathvariant",n),o}});const qn={"\\nobreak":"nobreak","\\allowbreak":"allowbreak"},In={" ":{},"\\ ":{},"~":{className:"nobreak"},"\\space":{},"\\nobreakspace":{className:"nobreak"}};ut({type:"spacing",htmlBuilder(e,t){if(In.hasOwnProperty(e.text)){const r=In[e.text].className||"";if("text"===e.mode){const n=Ge(e,t);return n.classes.push(r),n}return je(["mspace",r],[Ve(e.text,e.mode,t)],t)}if(qn.hasOwnProperty(e.text))return je(["mspace",qn[e.text]],[],t);throw new n('Unknown type of space "'+e.text+'"')},mathmlBuilder(e,t){let r;if(!In.hasOwnProperty(e.text)){if(qn.hasOwnProperty(e.text))return new Bt("mspace");throw new n('Unknown type of space "'+e.text+'"')}return r=new Bt("mtext",[new qt("\xa0")]),r}});const Rn=()=>{const e=new Bt("mtd",[]);return e.setAttribute("width","50%"),e};ut({type:"tag",mathmlBuilder(e,t){const r=new Bt("mtable",[new Bt("mtr",[Rn(),new Bt("mtd",[Ft(e.body,t)]),Rn(),new Bt("mtd",[Ft(e.tag,t)])])]);return r.setAttribute("width","100%"),r}});const Hn={"\\text":void 0,"\\textrm":"textrm","\\textsf":"textsf","\\texttt":"texttt","\\textnormal":"textrm"},En={"\\textbf":"textbf","\\textmd":"textmd"},Nn={"\\textit":"textit","\\textup":"textup"},On=(e,t)=>{const r=e.font;return 
r?Hn[r]?t.withTextFontFamily(Hn[r]):En[r]?t.withTextFontWeight(En[r]):"\\emph"===r?"textit"===t.fontShape?t.withTextFontShape("textup"):t.withTextFontShape("textit"):t.withTextFontShape(Nn[r]):t};mt({type:"text",names:["\\text","\\textrm","\\textsf","\\texttt","\\textnormal","\\textbf","\\textmd","\\textit","\\textup","\\emph"],numArgs:1,argTypes:["text"],allowedInArgument:!0,allowedInText:!0,handler(e,t){let{parser:r,funcName:n}=e;const o=t[0];return{type:"text",mode:r.mode,body:dt(o),font:n}},htmlBuilder(e,t){const r=On(e,t),n=xt(e.body,r,!0);return je(["mord","text"],n,r)},mathmlBuilder(e,t){const r=On(e,t);return Ft(e.body,r)}}),mt({type:"underline",names:["\\underline"],numArgs:1,allowedInText:!0,handler(e,t){let{parser:r}=e;return{type:"underline",mode:r.mode,body:t[0]}},htmlBuilder(e,t){const r=Mt(e.body,t),n=_e("underline-line",t),o=t.fontMetrics().defaultRuleThickness,s=Ke({positionType:"top",positionData:r.height,children:[{type:"kern",size:o},{type:"elem",elem:n},{type:"kern",size:3*o},{type:"elem",elem:r}]});return je(["mord","underline"],[s],t)},mathmlBuilder(e,t){const r=new Bt("mo",[new qt("\u203e")]);r.setAttribute("stretchy","true");const n=new Bt("munder",[Vt(e.body,t),r]);return n.setAttribute("accentunder","true"),n}}),mt({type:"vcenter",names:["\\vcenter"],numArgs:1,argTypes:["original"],allowedInText:!1,handler(e,t){let{parser:r}=e;return{type:"vcenter",mode:r.mode,body:t[0]}},htmlBuilder(e,t){const r=Mt(e.body,t),n=t.fontMetrics().axisHeight,o=.5*(r.height-n-(r.depth+n));return Ke({positionType:"shift",positionData:o,children:[{type:"elem",elem:r}]})},mathmlBuilder(e,t){const r=new Bt("mpadded",[Vt(e.body,t)],["vcenter"]);return new Bt("mrow",[r])}}),mt({type:"verb",names:["\\verb"],numArgs:0,allowedInText:!0,handler(e,t,r){throw new n("\\verb ended by end of line instead of matching delimiter")},htmlBuilder(e,t){const r=Dn(e),n=[],o=t.havingStyle(t.style.text());for(let t=0;te.body.replace(/ /g,e.star?"\u2423":"\xa0");var Ln=at;const Pn="[ 
\r\n\t]",Fn="(\\\\[a-zA-Z@]+)"+Pn+"*",Vn="[\u0300-\u036f]",Gn=new RegExp(Vn+"+$"),Un="("+Pn+"+)|\\\\(\n|[ \r\t]+\n?)[ \r\t]*|([!-\\[\\]-\u2027\u202a-\ud7ff\uf900-\uffff]"+Vn+"*|[\ud800-\udbff][\udc00-\udfff]"+Vn+"*|\\\\verb\\*([^]).*?\\4|\\\\verb([^*a-zA-Z]).*?\\5|"+Fn+"|\\\\[^\ud800-\udfff])";class Xn{constructor(e,t){this.input=void 0,this.settings=void 0,this.tokenRegex=void 0,this.catcodes=void 0,this.input=e,this.settings=t,this.tokenRegex=new RegExp(Un,"g"),this.catcodes={"%":14,"~":13}}setCatcode(e,t){this.catcodes[e]=t}lex(){const e=this.input,t=this.tokenRegex.lastIndex;if(t===e.length)return new en("EOF",new Qr(this,t,t));const r=this.tokenRegex.exec(e);if(null===r||r.index!==t)throw new n("Unexpected character: '"+e[t]+"'",new en(e[t],new Qr(this,t,t+1)));const o=r[6]||r[3]||(r[2]?"\\ ":" ");if(14===this.catcodes[o]){const t=e.indexOf("\n",this.tokenRegex.lastIndex);return-1===t?(this.tokenRegex.lastIndex=e.length,this.settings.reportNonstrict("commentAtEnd","% comment has no terminating newline; LaTeX would fail because of commenting the end of math mode (e.g. 
$)")):this.tokenRegex.lastIndex=t+1,this.lex()}return new en(o,new Qr(this,t,this.tokenRegex.lastIndex))}}class Yn{constructor(e,t){void 0===e&&(e={}),void 0===t&&(t={}),this.current=void 0,this.builtins=void 0,this.undefStack=void 0,this.current=t,this.builtins=e,this.undefStack=[]}beginGroup(){this.undefStack.push({})}endGroup(){if(0===this.undefStack.length)throw new n("Unbalanced namespace destruction: attempt to pop global namespace; please report this as a bug");const e=this.undefStack.pop();for(const t in e)e.hasOwnProperty(t)&&(null==e[t]?delete this.current[t]:this.current[t]=e[t])}endGroups(){for(;this.undefStack.length>0;)this.endGroup()}has(e){return this.current.hasOwnProperty(e)||this.builtins.hasOwnProperty(e)}get(e){return this.current.hasOwnProperty(e)?this.current[e]:this.builtins[e]}set(e,t,r){if(void 0===r&&(r=!1),r){for(let t=0;t0&&(this.undefStack[this.undefStack.length-1][e]=t)}else{const t=this.undefStack[this.undefStack.length-1];t&&!t.hasOwnProperty(e)&&(t[e]=this.current[e])}null==t?delete this.current[e]:this.current[e]=t}}var jn=Kr;Jr("\\noexpand",function(e){const t=e.popToken();return e.isExpandable(t.text)&&(t.noexpand=!0,t.treatAsRelax=!0),{tokens:[t],numArgs:0}}),Jr("\\expandafter",function(e){const t=e.popToken();return e.expandOnce(!0),{tokens:[t],numArgs:0}}),Jr("\\@firstoftwo",function(e){return{tokens:e.consumeArgs(2)[0],numArgs:0}}),Jr("\\@secondoftwo",function(e){return{tokens:e.consumeArgs(2)[1],numArgs:0}}),Jr("\\@ifnextchar",function(e){const t=e.consumeArgs(3);e.consumeSpaces();const r=e.future();return 1===t[0].length&&t[0][0].text===r.text?{tokens:t[1],numArgs:0}:{tokens:t[2],numArgs:0}}),Jr("\\@ifstar","\\@ifnextchar *{\\@firstoftwo{#1}}"),Jr("\\TextOrMath",function(e){const t=e.consumeArgs(2);return"text"===e.mode?{tokens:t[0],numArgs:0}:{tokens:t[1],numArgs:0}});const Wn={0:0,1:1,2:2,3:3,4:4,5:5,6:6,7:7,8:8,9:9,a:10,A:10,b:11,B:11,c:12,C:12,d:13,D:13,e:14,E:14,f:15,F:15};Jr("\\char",function(e){let 
t,r=e.popToken(),o=0;if("'"===r.text)t=8,r=e.popToken();else if('"'===r.text)t=16,r=e.popToken();else if("`"===r.text)if(r=e.popToken(),"\\"===r.text[0])o=r.text.charCodeAt(1);else{if("EOF"===r.text)throw new n("\\char` missing argument");o=r.text.charCodeAt(0)}else t=10;if(t){if(o=Wn[r.text],null==o||o>=t)throw new n("Invalid base-"+t+" digit "+r.text);let s;for(;null!=(s=Wn[e.future().text])&&s{let s=e.consumeArg().tokens;if(1!==s.length)throw new n("\\newcommand's first argument must be a macro name");const i=s[0].text,l=e.isDefined(i);if(l&&!t)throw new n("\\newcommand{"+i+"} attempting to redefine "+i+"; use \\renewcommand");if(!l&&!r)throw new n("\\renewcommand{"+i+"} when command "+i+" does not yet exist; use \\newcommand");let a=0;if(s=e.consumeArg().tokens,1===s.length&&"["===s[0].text){let t="",r=e.expandNextToken();for(;"]"!==r.text&&"EOF"!==r.text;)t+=r.text,r=e.expandNextToken();if(!t.match(/^\s*[0-9]+\s*$/))throw new n("Invalid number of arguments: "+t);a=parseInt(t),s=e.consumeArg().tokens}return l&&o||e.macros.set(i,{tokens:s,numArgs:a}),""};Jr("\\newcommand",e=>_n(e,!1,!0,!1)),Jr("\\renewcommand",e=>_n(e,!0,!1,!1)),Jr("\\providecommand",e=>_n(e,!0,!0,!0)),Jr("\\message",e=>{const t=e.consumeArgs(1)[0];return console.log(t.reverse().map(e=>e.text).join("")),""}),Jr("\\errmessage",e=>{const t=e.consumeArgs(1)[0];return console.error(t.reverse().map(e=>e.text).join("")),""}),Jr("\\show",e=>{const t=e.popToken(),r=t.text;return console.log(t,e.macros.get(r),Ln[r],ne.math[r],ne.text[r]),""}),Jr("\\bgroup","{"),Jr("\\egroup","}"),Jr("~","\\nobreakspace"),Jr("\\lq","`"),Jr("\\rq","'"),Jr("\\aa","\\r a"),Jr("\\AA","\\r A"),Jr("\\textcopyright","\\html@mathml{\\textcircled{c}}{\\char`\xa9}"),Jr("\\copyright","\\TextOrMath{\\textcopyright}{\\text{\\textcopyright}}"),Jr("\\textregistered","\\html@mathml{\\textcircled{\\scriptsize 
R}}{\\char`\xae}"),Jr("\u212c","\\mathscr{B}"),Jr("\u2130","\\mathscr{E}"),Jr("\u2131","\\mathscr{F}"),Jr("\u210b","\\mathscr{H}"),Jr("\u2110","\\mathscr{I}"),Jr("\u2112","\\mathscr{L}"),Jr("\u2133","\\mathscr{M}"),Jr("\u211b","\\mathscr{R}"),Jr("\u212d","\\mathfrak{C}"),Jr("\u210c","\\mathfrak{H}"),Jr("\u2128","\\mathfrak{Z}"),Jr("\\Bbbk","\\Bbb{k}"),Jr("\\llap","\\mathllap{\\textrm{#1}}"),Jr("\\rlap","\\mathrlap{\\textrm{#1}}"),Jr("\\clap","\\mathclap{\\textrm{#1}}"),Jr("\\mathstrut","\\vphantom{(}"),Jr("\\underbar","\\underline{\\text{#1}}"),Jr("\\not",'\\html@mathml{\\mathrel{\\mathrlap\\@not}\\nobreak}{\\char"338}'),Jr("\\neq","\\html@mathml{\\mathrel{\\not=}}{\\mathrel{\\char`\u2260}}"),Jr("\\ne","\\neq"),Jr("\u2260","\\neq"),Jr("\\notin","\\html@mathml{\\mathrel{{\\in}\\mathllap{/\\mskip1mu}}}{\\mathrel{\\char`\u2209}}"),Jr("\u2209","\\notin"),Jr("\u2258","\\html@mathml{\\mathrel{=\\kern{-1em}\\raisebox{0.4em}{$\\scriptsize\\frown$}}}{\\mathrel{\\char`\u2258}}"),Jr("\u2259","\\html@mathml{\\stackrel{\\tiny\\wedge}{=}}{\\mathrel{\\char`\u2258}}"),Jr("\u225a","\\html@mathml{\\stackrel{\\tiny\\vee}{=}}{\\mathrel{\\char`\u225a}}"),Jr("\u225b","\\html@mathml{\\stackrel{\\scriptsize\\star}{=}}{\\mathrel{\\char`\u225b}}"),Jr("\u225d","\\html@mathml{\\stackrel{\\tiny\\mathrm{def}}{=}}{\\mathrel{\\char`\u225d}}"),Jr("\u225e","\\html@mathml{\\stackrel{\\tiny\\mathrm{m}}{=}}{\\mathrel{\\char`\u225e}}"),Jr("\u225f","\\html@mathml{\\stackrel{\\tiny?}{=}}{\\mathrel{\\char`\u225f}}"),Jr("\u27c2","\\perp"),Jr("\u203c","\\mathclose{!\\mkern-0.8mu!}"),Jr("\u220c","\\notni"),Jr("\u231c","\\ulcorner"),Jr("\u231d","\\urcorner"),Jr("\u231e","\\llcorner"),Jr("\u231f","\\lrcorner"),Jr("\xa9","\\copyright"),Jr("\xae","\\textregistered"),Jr("\\ulcorner",'\\html@mathml{\\@ulcorner}{\\mathop{\\char"231c}}'),Jr("\\urcorner",'\\html@mathml{\\@urcorner}{\\mathop{\\char"231d}}'),Jr("\\llcorner",'\\html@mathml{\\@llcorner}{\\mathop{\\char"231e}}'),Jr("\\lrcorner",'\\html@mathml{\\@lrcorner}{
\\mathop{\\char"231f}}'),Jr("\\vdots","{\\varvdots\\rule{0pt}{15pt}}"),Jr("\u22ee","\\vdots"),Jr("\\varGamma","\\mathit{\\Gamma}"),Jr("\\varDelta","\\mathit{\\Delta}"),Jr("\\varTheta","\\mathit{\\Theta}"),Jr("\\varLambda","\\mathit{\\Lambda}"),Jr("\\varXi","\\mathit{\\Xi}"),Jr("\\varPi","\\mathit{\\Pi}"),Jr("\\varSigma","\\mathit{\\Sigma}"),Jr("\\varUpsilon","\\mathit{\\Upsilon}"),Jr("\\varPhi","\\mathit{\\Phi}"),Jr("\\varPsi","\\mathit{\\Psi}"),Jr("\\varOmega","\\mathit{\\Omega}"),Jr("\\substack","\\begin{subarray}{c}#1\\end{subarray}"),Jr("\\colon","\\nobreak\\mskip2mu\\mathpunct{}\\mathchoice{\\mkern-3mu}{\\mkern-3mu}{}{}{:}\\mskip6mu\\relax"),Jr("\\boxed","\\fbox{$\\displaystyle{#1}$}"),Jr("\\iff","\\DOTSB\\;\\Longleftrightarrow\\;"),Jr("\\implies","\\DOTSB\\;\\Longrightarrow\\;"),Jr("\\impliedby","\\DOTSB\\;\\Longleftarrow\\;"),Jr("\\dddot","{\\overset{\\raisebox{-0.1ex}{\\normalsize ...}}{#1}}"),Jr("\\ddddot","{\\overset{\\raisebox{-0.1ex}{\\normalsize ....}}{#1}}");const $n={",":"\\dotsc","\\not":"\\dotsb","+":"\\dotsb","=":"\\dotsb","<":"\\dotsb",">":"\\dotsb","-":"\\dotsb","*":"\\dotsb",":":"\\dotsb","\\DOTSB":"\\dotsb","\\coprod":"\\dotsb","\\bigvee":"\\dotsb","\\bigwedge":"\\dotsb","\\biguplus":"\\dotsb","\\bigcap":"\\dotsb","\\bigcup":"\\dotsb","\\prod":"\\dotsb","\\sum":"\\dotsb","\\bigotimes":"\\dotsb","\\bigoplus":"\\dotsb","\\bigodot":"\\dotsb","\\bigsqcup":"\\dotsb","\\And":"\\dotsb","\\longrightarrow":"\\dotsb","\\Longrightarrow":"\\dotsb","\\longleftarrow":"\\dotsb","\\Longleftarrow":"\\dotsb","\\longleftrightarrow":"\\dotsb","\\Longleftrightarrow":"\\dotsb","\\mapsto":"\\dotsb","\\longmapsto":"\\dotsb","\\hookrightarrow":"\\dotsb","\\doteq":"\\dotsb","\\mathbin":"\\dotsb","\\mathrel":"\\dotsb","\\relbar":"\\dotsb","\\Relbar":"\\dotsb","\\xrightarrow":"\\dotsb","\\xleftarrow":"\\dotsb","\\DOTSI":"\\dotsi","\\int":"\\dotsi","\\oint":"\\dotsi","\\iint":"\\dotsi","\\iiint":"\\dotsi","\\iiiint":"\\dotsi","\\idotsint":"\\dotsi","\\DOTSX":"\\dotsx"},Zn=
new Set(["bin","rel"]);Jr("\\dots",function(e){let t="\\dotso";const r=e.expandAfterFuture().text;return r in $n?t=$n[r]:("\\not"===r.slice(0,4)||r in ne.math&&Zn.has(ne.math[r].group))&&(t="\\dotsb"),t});const Kn={")":!0,"]":!0,"\\rbrack":!0,"\\}":!0,"\\rbrace":!0,"\\rangle":!0,"\\rceil":!0,"\\rfloor":!0,"\\rgroup":!0,"\\rmoustache":!0,"\\right":!0,"\\bigr":!0,"\\biggr":!0,"\\Bigr":!0,"\\Biggr":!0,$:!0,";":!0,".":!0,",":!0};Jr("\\dotso",function(e){return e.future().text in Kn?"\\ldots\\,":"\\ldots"}),Jr("\\dotsc",function(e){const t=e.future().text;return t in Kn&&","!==t?"\\ldots\\,":"\\ldots"}),Jr("\\cdots",function(e){return e.future().text in Kn?"\\@cdots\\,":"\\@cdots"}),Jr("\\dotsb","\\cdots"),Jr("\\dotsm","\\cdots"),Jr("\\dotsi","\\!\\cdots"),Jr("\\dotsx","\\ldots\\,"),Jr("\\DOTSI","\\relax"),Jr("\\DOTSB","\\relax"),Jr("\\DOTSX","\\relax"),Jr("\\tmspace","\\TextOrMath{\\kern#1#3}{\\mskip#1#2}\\relax"),Jr("\\,","\\tmspace+{3mu}{.1667em}"),Jr("\\thinspace","\\,"),Jr("\\>","\\mskip{4mu}"),Jr("\\:","\\tmspace+{4mu}{.2222em}"),Jr("\\medspace","\\:"),Jr("\\;","\\tmspace+{5mu}{.2777em}"),Jr("\\thickspace","\\;"),Jr("\\!","\\tmspace-{3mu}{.1667em}"),Jr("\\negthinspace","\\!"),Jr("\\negmedspace","\\tmspace-{4mu}{.2222em}"),Jr("\\negthickspace","\\tmspace-{5mu}{.277em}"),Jr("\\enspace","\\kern.5em "),Jr("\\enskip","\\hskip.5em\\relax"),Jr("\\quad","\\hskip1em\\relax"),Jr("\\qquad","\\hskip2em\\relax"),Jr("\\tag","\\@ifstar\\tag@literal\\tag@paren"),Jr("\\tag@paren","\\tag@literal{({#1})}"),Jr("\\tag@literal",e=>{if(e.macros.get("\\df@tag"))throw new n("Multiple \\tag");return"\\gdef\\df@tag{\\text{#1}}"}),Jr("\\bmod","\\mathchoice{\\mskip1mu}{\\mskip1mu}{\\mskip5mu}{\\mskip5mu}\\mathbin{\\rm mod}\\mathchoice{\\mskip1mu}{\\mskip1mu}{\\mskip5mu}{\\mskip5mu}"),Jr("\\pod","\\allowbreak\\mathchoice{\\mkern18mu}{\\mkern8mu}{\\mkern8mu}{\\mkern8mu}(#1)"),Jr("\\pmod","\\pod{{\\rm 
mod}\\mkern6mu#1}"),Jr("\\mod","\\allowbreak\\mathchoice{\\mkern18mu}{\\mkern12mu}{\\mkern12mu}{\\mkern12mu}{\\rm mod}\\,\\,#1"),Jr("\\newline","\\\\\\relax"),Jr("\\TeX","\\textrm{\\html@mathml{T\\kern-.1667em\\raisebox{-.5ex}{E}\\kern-.125emX}{TeX}}");const Jn=O(K["Main-Regular"]["T".charCodeAt(0)][1]-.7*K["Main-Regular"]["A".charCodeAt(0)][1]);Jr("\\LaTeX","\\textrm{\\html@mathml{L\\kern-.36em\\raisebox{"+Jn+"}{\\scriptstyle A}\\kern-.15em\\TeX}{LaTeX}}"),Jr("\\KaTeX","\\textrm{\\html@mathml{K\\kern-.17em\\raisebox{"+Jn+"}{\\scriptstyle A}\\kern-.15em\\TeX}{KaTeX}}"),Jr("\\hspace","\\@ifstar\\@hspacer\\@hspace"),Jr("\\@hspace","\\hskip #1\\relax"),Jr("\\@hspacer","\\rule{0pt}{0pt}\\hskip #1\\relax"),Jr("\\ordinarycolon",":"),Jr("\\vcentcolon","\\mathrel{\\mathop\\ordinarycolon}"),Jr("\\dblcolon",'\\html@mathml{\\mathrel{\\vcentcolon\\mathrel{\\mkern-.9mu}\\vcentcolon}}{\\mathop{\\char"2237}}'),Jr("\\coloneqq",'\\html@mathml{\\mathrel{\\vcentcolon\\mathrel{\\mkern-1.2mu}=}}{\\mathop{\\char"2254}}'),Jr("\\Coloneqq",'\\html@mathml{\\mathrel{\\dblcolon\\mathrel{\\mkern-1.2mu}=}}{\\mathop{\\char"2237\\char"3d}}'),Jr("\\coloneq",'\\html@mathml{\\mathrel{\\vcentcolon\\mathrel{\\mkern-1.2mu}\\mathrel{-}}}{\\mathop{\\char"3a\\char"2212}}'),Jr("\\Coloneq",'\\html@mathml{\\mathrel{\\dblcolon\\mathrel{\\mkern-1.2mu}\\mathrel{-}}}{\\mathop{\\char"2237\\char"2212}}'),Jr("\\eqqcolon",'\\html@mathml{\\mathrel{=\\mathrel{\\mkern-1.2mu}\\vcentcolon}}{\\mathop{\\char"2255}}'),Jr("\\Eqqcolon",'\\html@mathml{\\mathrel{=\\mathrel{\\mkern-1.2mu}\\dblcolon}}{\\mathop{\\char"3d\\char"2237}}'),Jr("\\eqcolon",'\\html@mathml{\\mathrel{\\mathrel{-}\\mathrel{\\mkern-1.2mu}\\vcentcolon}}{\\mathop{\\char"2239}}'),Jr("\\Eqcolon",'\\html@mathml{\\mathrel{\\mathrel{-}\\mathrel{\\mkern-1.2mu}\\dblcolon}}{\\mathop{\\char"2212\\char"2237}}'),Jr("\\colonapprox",'\\html@mathml{\\mathrel{\\vcentcolon\\mathrel{\\mkern-1.2mu}\\approx}}{\\mathop{\\char"3a\\char"2248}}'),Jr("\\Colonapprox",'\\html@mathml{\\m
athrel{\\dblcolon\\mathrel{\\mkern-1.2mu}\\approx}}{\\mathop{\\char"2237\\char"2248}}'),Jr("\\colonsim",'\\html@mathml{\\mathrel{\\vcentcolon\\mathrel{\\mkern-1.2mu}\\sim}}{\\mathop{\\char"3a\\char"223c}}'),Jr("\\Colonsim",'\\html@mathml{\\mathrel{\\dblcolon\\mathrel{\\mkern-1.2mu}\\sim}}{\\mathop{\\char"2237\\char"223c}}'),Jr("\u2237","\\dblcolon"),Jr("\u2239","\\eqcolon"),Jr("\u2254","\\coloneqq"),Jr("\u2255","\\eqqcolon"),Jr("\u2a74","\\Coloneqq"),Jr("\\ratio","\\vcentcolon"),Jr("\\coloncolon","\\dblcolon"),Jr("\\colonequals","\\coloneqq"),Jr("\\coloncolonequals","\\Coloneqq"),Jr("\\equalscolon","\\eqqcolon"),Jr("\\equalscoloncolon","\\Eqqcolon"),Jr("\\colonminus","\\coloneq"),Jr("\\coloncolonminus","\\Coloneq"),Jr("\\minuscolon","\\eqcolon"),Jr("\\minuscoloncolon","\\Eqcolon"),Jr("\\coloncolonapprox","\\Colonapprox"),Jr("\\coloncolonsim","\\Colonsim"),Jr("\\simcolon","\\mathrel{\\sim\\mathrel{\\mkern-1.2mu}\\vcentcolon}"),Jr("\\simcoloncolon","\\mathrel{\\sim\\mathrel{\\mkern-1.2mu}\\dblcolon}"),Jr("\\approxcolon","\\mathrel{\\approx\\mathrel{\\mkern-1.2mu}\\vcentcolon}"),Jr("\\approxcoloncolon","\\mathrel{\\approx\\mathrel{\\mkern-1.2mu}\\dblcolon}"),Jr("\\notni","\\html@mathml{\\not\\ni}{\\mathrel{\\char`\u220c}}"),Jr("\\limsup","\\DOTSB\\operatorname*{lim\\,sup}"),Jr("\\liminf","\\DOTSB\\operatorname*{lim\\,inf}"),Jr("\\injlim","\\DOTSB\\operatorname*{inj\\,lim}"),Jr("\\projlim","\\DOTSB\\operatorname*{proj\\,lim}"),Jr("\\varlimsup","\\DOTSB\\operatorname*{\\overline{lim}}"),Jr("\\varliminf","\\DOTSB\\operatorname*{\\underline{lim}}"),Jr("\\varinjlim","\\DOTSB\\operatorname*{\\underrightarrow{lim}}"),Jr("\\varprojlim","\\DOTSB\\operatorname*{\\underleftarrow{lim}}"),Jr("\\gvertneqq","\\html@mathml{\\@gvertneqq}{\u2269}"),Jr("\\lvertneqq","\\html@mathml{\\@lvertneqq}{\u2268}"),Jr("\\ngeqq","\\html@mathml{\\@ngeqq}{\u2271}"),Jr("\\ngeqslant","\\html@mathml{\\@ngeqslant}{\u2271}"),Jr("\\nleqq","\\html@mathml{\\@nleqq}{\u2270}"),Jr("\\nleqslant","\\html@mathml{\\
@nleqslant}{\u2270}"),Jr("\\nshortmid","\\html@mathml{\\@nshortmid}{\u2224}"),Jr("\\nshortparallel","\\html@mathml{\\@nshortparallel}{\u2226}"),Jr("\\nsubseteqq","\\html@mathml{\\@nsubseteqq}{\u2288}"),Jr("\\nsupseteqq","\\html@mathml{\\@nsupseteqq}{\u2289}"),Jr("\\varsubsetneq","\\html@mathml{\\@varsubsetneq}{\u228a}"),Jr("\\varsubsetneqq","\\html@mathml{\\@varsubsetneqq}{\u2acb}"),Jr("\\varsupsetneq","\\html@mathml{\\@varsupsetneq}{\u228b}"),Jr("\\varsupsetneqq","\\html@mathml{\\@varsupsetneqq}{\u2acc}"),Jr("\\imath","\\html@mathml{\\@imath}{\u0131}"),Jr("\\jmath","\\html@mathml{\\@jmath}{\u0237}"),Jr("\\llbracket","\\html@mathml{\\mathopen{[\\mkern-3.2mu[}}{\\mathopen{\\char`\u27e6}}"),Jr("\\rrbracket","\\html@mathml{\\mathclose{]\\mkern-3.2mu]}}{\\mathclose{\\char`\u27e7}}"),Jr("\u27e6","\\llbracket"),Jr("\u27e7","\\rrbracket"),Jr("\\lBrace","\\html@mathml{\\mathopen{\\{\\mkern-3.2mu[}}{\\mathopen{\\char`\u2983}}"),Jr("\\rBrace","\\html@mathml{\\mathclose{]\\mkern-3.2mu\\}}}{\\mathclose{\\char`\u2984}}"),Jr("\u2983","\\lBrace"),Jr("\u2984","\\rBrace"),Jr("\\minuso","\\mathbin{\\html@mathml{{\\mathrlap{\\mathchoice{\\kern{0.145em}}{\\kern{0.145em}}{\\kern{0.1015em}}{\\kern{0.0725em}}\\circ}{-}}}{\\char`\u29b5}}"),Jr("\u29b5","\\minuso"),Jr("\\darr","\\downarrow"),Jr("\\dArr","\\Downarrow"),Jr("\\Darr","\\Downarrow"),Jr("\\lang","\\langle"),Jr("\\rang","\\rangle"),Jr("\\uarr","\\uparrow"),Jr("\\uArr","\\Uparrow"),Jr("\\Uarr","\\Uparrow"),Jr("\\N","\\mathbb{N}"),Jr("\\R","\\mathbb{R}"),Jr("\\Z","\\mathbb{Z}"),Jr("\\alef","\\aleph"),Jr("\\alefsym","\\aleph"),Jr("\\Alpha","\\mathrm{A}"),Jr("\\Beta","\\mathrm{B}"),Jr("\\bull","\\bullet"),Jr("\\Chi","\\mathrm{X}"),Jr("\\clubs","\\clubsuit"),Jr("\\cnums","\\mathbb{C}"),Jr("\\Complex","\\mathbb{C}"),Jr("\\Dagger","\\ddagger"),Jr("\\diamonds","\\diamondsuit"),Jr("\\empty","\\emptyset"),Jr("\\Epsilon","\\mathrm{E}"),Jr("\\Eta","\\mathrm{H}"),Jr("\\exist","\\exists"),Jr("\\harr","\\leftrightarrow"),Jr("\\hArr","\\Leftrighta
rrow"),Jr("\\Harr","\\Leftrightarrow"),Jr("\\hearts","\\heartsuit"),Jr("\\image","\\Im"),Jr("\\infin","\\infty"),Jr("\\Iota","\\mathrm{I}"),Jr("\\isin","\\in"),Jr("\\Kappa","\\mathrm{K}"),Jr("\\larr","\\leftarrow"),Jr("\\lArr","\\Leftarrow"),Jr("\\Larr","\\Leftarrow"),Jr("\\lrarr","\\leftrightarrow"),Jr("\\lrArr","\\Leftrightarrow"),Jr("\\Lrarr","\\Leftrightarrow"),Jr("\\Mu","\\mathrm{M}"),Jr("\\natnums","\\mathbb{N}"),Jr("\\Nu","\\mathrm{N}"),Jr("\\Omicron","\\mathrm{O}"),Jr("\\plusmn","\\pm"),Jr("\\rarr","\\rightarrow"),Jr("\\rArr","\\Rightarrow"),Jr("\\Rarr","\\Rightarrow"),Jr("\\real","\\Re"),Jr("\\reals","\\mathbb{R}"),Jr("\\Reals","\\mathbb{R}"),Jr("\\Rho","\\mathrm{P}"),Jr("\\sdot","\\cdot"),Jr("\\sect","\\S"),Jr("\\spades","\\spadesuit"),Jr("\\sub","\\subset"),Jr("\\sube","\\subseteq"),Jr("\\supe","\\supseteq"),Jr("\\Tau","\\mathrm{T}"),Jr("\\thetasym","\\vartheta"),Jr("\\weierp","\\wp"),Jr("\\Zeta","\\mathrm{Z}"),Jr("\\argmin","\\DOTSB\\operatorname*{arg\\,min}"),Jr("\\argmax","\\DOTSB\\operatorname*{arg\\,max}"),Jr("\\plim","\\DOTSB\\mathop{\\operatorname{plim}}\\limits"),Jr("\\bra","\\mathinner{\\langle{#1}|}"),Jr("\\ket","\\mathinner{|{#1}\\rangle}"),Jr("\\braket","\\mathinner{\\langle{#1}\\rangle}"),Jr("\\Bra","\\left\\langle#1\\right|"),Jr("\\Ket","\\left|#1\\right\\rangle");const Qn=e=>t=>{const r=t.consumeArg().tokens,n=t.consumeArg().tokens,o=t.consumeArg().tokens,s=t.consumeArg().tokens,i=t.macros.get("|"),l=t.macros.get("\\|");t.macros.beginGroup();const a=t=>r=>{e&&(r.macros.set("|",i),o.length&&r.macros.set("\\|",l));let s=t;if(!t&&o.length){"|"===r.future().text&&(r.popToken(),s=!0)}return{tokens:s?o:n,numArgs:0}};t.macros.set("|",a(!1)),o.length&&t.macros.set("\\|",a(!0));const c=t.consumeArg().tokens,h=t.expandTokens([...s,...c,...r]);return 
t.macros.endGroup(),{tokens:h.reverse(),numArgs:0}};Jr("\\bra@ket",Qn(!1)),Jr("\\bra@set",Qn(!0)),Jr("\\Braket","\\bra@ket{\\left\\langle}{\\,\\middle\\vert\\,}{\\,\\middle\\vert\\,}{\\right\\rangle}"),Jr("\\Set","\\bra@set{\\left\\{\\:}{\\;\\middle\\vert\\;}{\\;\\middle\\Vert\\;}{\\:\\right\\}}"),Jr("\\set","\\bra@set{\\{\\,}{\\mid}{}{\\,\\}}"),Jr("\\angln","{\\angl n}"),Jr("\\blue","\\textcolor{##6495ed}{#1}"),Jr("\\orange","\\textcolor{##ffa500}{#1}"),Jr("\\pink","\\textcolor{##ff00af}{#1}"),Jr("\\red","\\textcolor{##df0030}{#1}"),Jr("\\green","\\textcolor{##28ae7b}{#1}"),Jr("\\gray","\\textcolor{gray}{#1}"),Jr("\\purple","\\textcolor{##9d38bd}{#1}"),Jr("\\blueA","\\textcolor{##ccfaff}{#1}"),Jr("\\blueB","\\textcolor{##80f6ff}{#1}"),Jr("\\blueC","\\textcolor{##63d9ea}{#1}"),Jr("\\blueD","\\textcolor{##11accd}{#1}"),Jr("\\blueE","\\textcolor{##0c7f99}{#1}"),Jr("\\tealA","\\textcolor{##94fff5}{#1}"),Jr("\\tealB","\\textcolor{##26edd5}{#1}"),Jr("\\tealC","\\textcolor{##01d1c1}{#1}"),Jr("\\tealD","\\textcolor{##01a995}{#1}"),Jr("\\tealE","\\textcolor{##208170}{#1}"),Jr("\\greenA","\\textcolor{##b6ffb0}{#1}"),Jr("\\greenB","\\textcolor{##8af281}{#1}"),Jr("\\greenC","\\textcolor{##74cf70}{#1}"),Jr("\\greenD","\\textcolor{##1fab54}{#1}"),Jr("\\greenE","\\textcolor{##0d923f}{#1}"),Jr("\\goldA","\\textcolor{##ffd0a9}{#1}"),Jr("\\goldB","\\textcolor{##ffbb71}{#1}"),Jr("\\goldC","\\textcolor{##ff9c39}{#1}"),Jr("\\goldD","\\textcolor{##e07d10}{#1}"),Jr("\\goldE","\\textcolor{##a75a05}{#1}"),Jr("\\redA","\\textcolor{##fca9a9}{#1}"),Jr("\\redB","\\textcolor{##ff8482}{#1}"),Jr("\\redC","\\textcolor{##f9685d}{#1}"),Jr("\\redD","\\textcolor{##e84d39}{#1}"),Jr("\\redE","\\textcolor{##bc2612}{#1}"),Jr("\\maroonA","\\textcolor{##ffbde0}{#1}"),Jr("\\maroonB","\\textcolor{##ff92c6}{#1}"),Jr("\\maroonC","\\textcolor{##ed5fa6}{#1}"),Jr("\\maroonD","\\textcolor{##ca337c}{#1}"),Jr("\\maroonE","\\textcolor{##9e034e}{#1}"),Jr("\\purpleA","\\textcolor{##ddd7ff}{#1}"),Jr("\\purpleB","\\textco
lor{##c6b9fc}{#1}"),Jr("\\purpleC","\\textcolor{##aa87ff}{#1}"),Jr("\\purpleD","\\textcolor{##7854ab}{#1}"),Jr("\\purpleE","\\textcolor{##543b78}{#1}"),Jr("\\mintA","\\textcolor{##f5f9e8}{#1}"),Jr("\\mintB","\\textcolor{##edf2df}{#1}"),Jr("\\mintC","\\textcolor{##e0e5cc}{#1}"),Jr("\\grayA","\\textcolor{##f6f7f7}{#1}"),Jr("\\grayB","\\textcolor{##f0f1f2}{#1}"),Jr("\\grayC","\\textcolor{##e3e5e6}{#1}"),Jr("\\grayD","\\textcolor{##d6d8da}{#1}"),Jr("\\grayE","\\textcolor{##babec2}{#1}"),Jr("\\grayF","\\textcolor{##888d93}{#1}"),Jr("\\grayG","\\textcolor{##626569}{#1}"),Jr("\\grayH","\\textcolor{##3b3e40}{#1}"),Jr("\\grayI","\\textcolor{##21242c}{#1}"),Jr("\\kaBlue","\\textcolor{##314453}{#1}"),Jr("\\kaGreen","\\textcolor{##71B307}{#1}");const eo={"^":!0,_:!0,"\\limits":!0,"\\nolimits":!0};class to{constructor(e,t,r){this.settings=void 0,this.expansionCount=void 0,this.lexer=void 0,this.macros=void 0,this.stack=void 0,this.mode=void 0,this.settings=t,this.expansionCount=0,this.feed(e),this.macros=new Yn(jn,t.macros),this.mode=r,this.stack=[]}feed(e){this.lexer=new Xn(e,this.settings)}switchMode(e){this.mode=e}beginGroup(){this.macros.beginGroup()}endGroup(){this.macros.endGroup()}endGroups(){this.macros.endGroups()}future(){return 0===this.stack.length&&this.pushToken(this.lexer.lex()),this.stack[this.stack.length-1]}popToken(){return this.future(),this.stack.pop()}pushToken(e){this.stack.push(e)}pushTokens(e){this.stack.push(...e)}scanArgument(e){let t,r,n;if(e){if(this.consumeSpaces(),"["!==this.future().text)return null;t=this.popToken(),({tokens:n,end:r}=this.consumeArg(["]"]))}else({tokens:n,start:t,end:r}=this.consumeArg());return this.pushToken(new en("EOF",r.loc)),this.pushTokens(n),new en("",Qr.range(t,r))}consumeSpaces(){for(;;){if(" "!==this.future().text)break;this.stack.pop()}}consumeArg(e){const t=[],r=e&&e.length>0;r||this.consumeSpaces();const o=this.future();let s,i=0,l=0;do{if(s=this.popToken(),t.push(s),"{"===s.text)++i;else 
if("}"===s.text){if(--i,-1===i)throw new n("Extra }",s)}else if("EOF"===s.text)throw new n("Unexpected end of input in a macro argument, expected '"+(e&&r?e[l]:"}")+"'",s);if(e&&r)if((0===i||1===i&&"{"===e[l])&&s.text===e[l]){if(++l,l===e.length){t.splice(-l,l);break}}else l=0}while(0!==i||r);return"{"===o.text&&"}"===t[t.length-1].text&&(t.pop(),t.shift()),t.reverse(),{tokens:t,start:o,end:s}}consumeArgs(e,t){if(t){if(t.length!==e+1)throw new n("The length of delimiters doesn't match the number of args!");const r=t[0];for(let e=0;ethis.settings.maxExpand)throw new n("Too many expansions: infinite loop or need to increase maxExpand setting")}expandOnce(e){const t=this.popToken(),r=t.text,o=t.noexpand?null:this._getExpansion(r);if(null==o||e&&o.unexpandable){if(e&&null==o&&"\\"===r[0]&&!this.isDefined(r))throw new n("Undefined control sequence: "+r);return this.pushToken(t),!1}this.countExpansion(1);let s=o.tokens;const i=this.consumeArgs(o.numArgs,o.delimiters);if(o.numArgs){s=s.slice();for(let e=s.length-1;e>=0;--e){let t=s[e];if("#"===t.text){if(0===e)throw new n("Incomplete placeholder at end of macro body",t);if(t=s[--e],"#"===t.text)s.splice(e+1,1);else{if(!/^[1-9]$/.test(t.text))throw new n("Not a valid argument number",t);s.splice(e,2,...i[+t.text-1])}}}}return this.pushTokens(s),s.length}expandAfterFuture(){return this.expandOnce(),this.future()}expandNextToken(){for(;;)if(!1===this.expandOnce()){const e=this.stack.pop();return e.treatAsRelax&&(e.text="\\relax"),e}}expandMacro(e){return this.macros.has(e)?this.expandTokens([new en(e)]):void 0}expandTokens(e){const t=[],r=this.stack.length;for(this.pushTokens(e);this.stack.length>r;)if(!1===this.expandOnce(!0)){const e=this.stack.pop();e.treatAsRelax&&(e.noexpand=!1,e.treatAsRelax=!1),t.push(e)}return this.countExpansion(t.length),t}expandMacroAsText(e){const t=this.expandMacro(e);return t?t.map(e=>e.text).join(""):t}_getExpansion(e){const t=this.macros.get(e);if(null==t)return t;if(1===e.length){const 
t=this.lexer.catcodes[e];if(null!=t&&13!==t)return}const r="function"==typeof t?t(this):t;if("string"==typeof r){let e=0;if(r.includes("#")){const t=r.replace(/##/g,"");for(;t.includes("#"+(e+1));)++e}const t=new Xn(r,this.settings),n=[];let o=t.lex();for(;"EOF"!==o.text;)n.push(o),o=t.lex();n.reverse();return{tokens:n,numArgs:e}}return r}isDefined(e){return this.macros.has(e)||Ln.hasOwnProperty(e)||ne.math.hasOwnProperty(e)||ne.text.hasOwnProperty(e)||eo.hasOwnProperty(e)}isExpandable(e){const t=this.macros.get(e);return null!=t?"string"==typeof t||"function"==typeof t||!t.unexpandable:Ln.hasOwnProperty(e)&&!Ln[e].primitive}}const ro=/^[\u208a\u208b\u208c\u208d\u208e\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089\u2090\u2091\u2095\u1d62\u2c7c\u2096\u2097\u2098\u2099\u2092\u209a\u1d63\u209b\u209c\u1d64\u1d65\u2093\u1d66\u1d67\u1d68\u1d69\u1d6a]/,no=Object.freeze({"\u208a":"+","\u208b":"-","\u208c":"=","\u208d":"(","\u208e":")","\u2080":"0","\u2081":"1","\u2082":"2","\u2083":"3","\u2084":"4","\u2085":"5","\u2086":"6","\u2087":"7","\u2088":"8","\u2089":"9","\u2090":"a","\u2091":"e","\u2095":"h","\u1d62":"i","\u2c7c":"j","\u2096":"k","\u2097":"l","\u2098":"m","\u2099":"n","\u2092":"o","\u209a":"p","\u1d63":"r","\u209b":"s","\u209c":"t","\u1d64":"u","\u1d65":"v","\u2093":"x","\u1d66":"\u03b2","\u1d67":"\u03b3","\u1d68":"\u03c1","\u1d69":"\u03d5","\u1d6a":"\u03c7","\u207a":"+","\u207b":"-","\u207c":"=","\u207d":"(","\u207e":")","\u2070":"0","\xb9":"1","\xb2":"2","\xb3":"3","\u2074":"4","\u2075":"5","\u2076":"6","\u2077":"7","\u2078":"8","\u2079":"9","\u1d2c":"A","\u1d2e":"B","\u1d30":"D","\u1d31":"E","\u1d33":"G","\u1d34":"H","\u1d35":"I","\u1d36":"J","\u1d37":"K","\u1d38":"L","\u1d39":"M","\u1d3a":"N","\u1d3c":"O","\u1d3e":"P","\u1d3f":"R","\u1d40":"T","\u1d41":"U","\u2c7d":"V","\u1d42":"W","\u1d43":"a","\u1d47":"b","\u1d9c":"c","\u1d48":"d","\u1d49":"e","\u1da0":"f","\u1d4d":"g","\u02b0":"h","\u2071":"i","\u02b2":"j","\u1d4f":"k","\u02e1":"l","\u1d50":"m",
"\u207f":"n","\u1d52":"o","\u1d56":"p","\u02b3":"r","\u02e2":"s","\u1d57":"t","\u1d58":"u","\u1d5b":"v","\u02b7":"w","\u02e3":"x","\u02b8":"y","\u1dbb":"z","\u1d5d":"\u03b2","\u1d5e":"\u03b3","\u1d5f":"\u03b4","\u1d60":"\u03d5","\u1d61":"\u03c7","\u1dbf":"\u03b8"}),oo={"\u0301":{text:"\\'",math:"\\acute"},"\u0300":{text:"\\`",math:"\\grave"},"\u0308":{text:'\\"',math:"\\ddot"},"\u0303":{text:"\\~",math:"\\tilde"},"\u0304":{text:"\\=",math:"\\bar"},"\u0306":{text:"\\u",math:"\\breve"},"\u030c":{text:"\\v",math:"\\check"},"\u0302":{text:"\\^",math:"\\hat"},"\u0307":{text:"\\.",math:"\\dot"},"\u030a":{text:"\\r",math:"\\mathring"},"\u030b":{text:"\\H"},"\u0327":{text:"\\c"}},so={"\xe1":"a\u0301","\xe0":"a\u0300","\xe4":"a\u0308","\u01df":"a\u0308\u0304","\xe3":"a\u0303","\u0101":"a\u0304","\u0103":"a\u0306","\u1eaf":"a\u0306\u0301","\u1eb1":"a\u0306\u0300","\u1eb5":"a\u0306\u0303","\u01ce":"a\u030c","\xe2":"a\u0302","\u1ea5":"a\u0302\u0301","\u1ea7":"a\u0302\u0300","\u1eab":"a\u0302\u0303","\u0227":"a\u0307","\u01e1":"a\u0307\u0304","\xe5":"a\u030a","\u01fb":"a\u030a\u0301","\u1e03":"b\u0307","\u0107":"c\u0301","\u1e09":"c\u0327\u0301","\u010d":"c\u030c","\u0109":"c\u0302","\u010b":"c\u0307","\xe7":"c\u0327","\u010f":"d\u030c","\u1e0b":"d\u0307","\u1e11":"d\u0327","\xe9":"e\u0301","\xe8":"e\u0300","\xeb":"e\u0308","\u1ebd":"e\u0303","\u0113":"e\u0304","\u1e17":"e\u0304\u0301","\u1e15":"e\u0304\u0300","\u0115":"e\u0306","\u1e1d":"e\u0327\u0306","\u011b":"e\u030c","\xea":"e\u0302","\u1ebf":"e\u0302\u0301","\u1ec1":"e\u0302\u0300","\u1ec5":"e\u0302\u0303","\u0117":"e\u0307","\u0229":"e\u0327","\u1e1f":"f\u0307","\u01f5":"g\u0301","\u1e21":"g\u0304","\u011f":"g\u0306","\u01e7":"g\u030c","\u011d":"g\u0302","\u0121":"g\u0307","\u0123":"g\u0327","\u1e27":"h\u0308","\u021f":"h\u030c","\u0125":"h\u0302","\u1e23":"h\u0307","\u1e29":"h\u0327","\xed":"i\u0301","\xec":"i\u0300","\xef":"i\u0308","\u1e2f":"i\u0308\u0301","\u0129":"i\u0303","\u012b":"i\u0304","\u012d":"i\u0306","\u01d
0":"i\u030c","\xee":"i\u0302","\u01f0":"j\u030c","\u0135":"j\u0302","\u1e31":"k\u0301","\u01e9":"k\u030c","\u0137":"k\u0327","\u013a":"l\u0301","\u013e":"l\u030c","\u013c":"l\u0327","\u1e3f":"m\u0301","\u1e41":"m\u0307","\u0144":"n\u0301","\u01f9":"n\u0300","\xf1":"n\u0303","\u0148":"n\u030c","\u1e45":"n\u0307","\u0146":"n\u0327","\xf3":"o\u0301","\xf2":"o\u0300","\xf6":"o\u0308","\u022b":"o\u0308\u0304","\xf5":"o\u0303","\u1e4d":"o\u0303\u0301","\u1e4f":"o\u0303\u0308","\u022d":"o\u0303\u0304","\u014d":"o\u0304","\u1e53":"o\u0304\u0301","\u1e51":"o\u0304\u0300","\u014f":"o\u0306","\u01d2":"o\u030c","\xf4":"o\u0302","\u1ed1":"o\u0302\u0301","\u1ed3":"o\u0302\u0300","\u1ed7":"o\u0302\u0303","\u022f":"o\u0307","\u0231":"o\u0307\u0304","\u0151":"o\u030b","\u1e55":"p\u0301","\u1e57":"p\u0307","\u0155":"r\u0301","\u0159":"r\u030c","\u1e59":"r\u0307","\u0157":"r\u0327","\u015b":"s\u0301","\u1e65":"s\u0301\u0307","\u0161":"s\u030c","\u1e67":"s\u030c\u0307","\u015d":"s\u0302","\u1e61":"s\u0307","\u015f":"s\u0327","\u1e97":"t\u0308","\u0165":"t\u030c","\u1e6b":"t\u0307","\u0163":"t\u0327","\xfa":"u\u0301","\xf9":"u\u0300","\xfc":"u\u0308","\u01d8":"u\u0308\u0301","\u01dc":"u\u0308\u0300","\u01d6":"u\u0308\u0304","\u01da":"u\u0308\u030c","\u0169":"u\u0303","\u1e79":"u\u0303\u0301","\u016b":"u\u0304","\u1e7b":"u\u0304\u0308","\u016d":"u\u0306","\u01d4":"u\u030c","\xfb":"u\u0302","\u016f":"u\u030a","\u0171":"u\u030b","\u1e7d":"v\u0303","\u1e83":"w\u0301","\u1e81":"w\u0300","\u1e85":"w\u0308","\u0175":"w\u0302","\u1e87":"w\u0307","\u1e98":"w\u030a","\u1e8d":"x\u0308","\u1e8b":"x\u0307","\xfd":"y\u0301","\u1ef3":"y\u0300","\xff":"y\u0308","\u1ef9":"y\u0303","\u0233":"y\u0304","\u0177":"y\u0302","\u1e8f":"y\u0307","\u1e99":"y\u030a","\u017a":"z\u0301","\u017e":"z\u030c","\u1e91":"z\u0302","\u017c":"z\u0307","\xc1":"A\u0301","\xc0":"A\u0300","\xc4":"A\u0308","\u01de":"A\u0308\u0304","\xc3":"A\u0303","\u0100":"A\u0304","\u0102":"A\u0306","\u1eae":"A\u0306\u0301","\u1eb0":"A\u0306\u0
300","\u1eb4":"A\u0306\u0303","\u01cd":"A\u030c","\xc2":"A\u0302","\u1ea4":"A\u0302\u0301","\u1ea6":"A\u0302\u0300","\u1eaa":"A\u0302\u0303","\u0226":"A\u0307","\u01e0":"A\u0307\u0304","\xc5":"A\u030a","\u01fa":"A\u030a\u0301","\u1e02":"B\u0307","\u0106":"C\u0301","\u1e08":"C\u0327\u0301","\u010c":"C\u030c","\u0108":"C\u0302","\u010a":"C\u0307","\xc7":"C\u0327","\u010e":"D\u030c","\u1e0a":"D\u0307","\u1e10":"D\u0327","\xc9":"E\u0301","\xc8":"E\u0300","\xcb":"E\u0308","\u1ebc":"E\u0303","\u0112":"E\u0304","\u1e16":"E\u0304\u0301","\u1e14":"E\u0304\u0300","\u0114":"E\u0306","\u1e1c":"E\u0327\u0306","\u011a":"E\u030c","\xca":"E\u0302","\u1ebe":"E\u0302\u0301","\u1ec0":"E\u0302\u0300","\u1ec4":"E\u0302\u0303","\u0116":"E\u0307","\u0228":"E\u0327","\u1e1e":"F\u0307","\u01f4":"G\u0301","\u1e20":"G\u0304","\u011e":"G\u0306","\u01e6":"G\u030c","\u011c":"G\u0302","\u0120":"G\u0307","\u0122":"G\u0327","\u1e26":"H\u0308","\u021e":"H\u030c","\u0124":"H\u0302","\u1e22":"H\u0307","\u1e28":"H\u0327","\xcd":"I\u0301","\xcc":"I\u0300","\xcf":"I\u0308","\u1e2e":"I\u0308\u0301","\u0128":"I\u0303","\u012a":"I\u0304","\u012c":"I\u0306","\u01cf":"I\u030c","\xce":"I\u0302","\u0130":"I\u0307","\u0134":"J\u0302","\u1e30":"K\u0301","\u01e8":"K\u030c","\u0136":"K\u0327","\u0139":"L\u0301","\u013d":"L\u030c","\u013b":"L\u0327","\u1e3e":"M\u0301","\u1e40":"M\u0307","\u0143":"N\u0301","\u01f8":"N\u0300","\xd1":"N\u0303","\u0147":"N\u030c","\u1e44":"N\u0307","\u0145":"N\u0327","\xd3":"O\u0301","\xd2":"O\u0300","\xd6":"O\u0308","\u022a":"O\u0308\u0304","\xd5":"O\u0303","\u1e4c":"O\u0303\u0301","\u1e4e":"O\u0303\u0308","\u022c":"O\u0303\u0304","\u014c":"O\u0304","\u1e52":"O\u0304\u0301","\u1e50":"O\u0304\u0300","\u014e":"O\u0306","\u01d1":"O\u030c","\xd4":"O\u0302","\u1ed0":"O\u0302\u0301","\u1ed2":"O\u0302\u0300","\u1ed6":"O\u0302\u0303","\u022e":"O\u0307","\u0230":"O\u0307\u0304","\u0150":"O\u030b","\u1e54":"P\u0301","\u1e56":"P\u0307","\u0154":"R\u0301","\u0158":"R\u030c","\u1e58":"R\u0307","\u0
156":"R\u0327","\u015a":"S\u0301","\u1e64":"S\u0301\u0307","\u0160":"S\u030c","\u1e66":"S\u030c\u0307","\u015c":"S\u0302","\u1e60":"S\u0307","\u015e":"S\u0327","\u0164":"T\u030c","\u1e6a":"T\u0307","\u0162":"T\u0327","\xda":"U\u0301","\xd9":"U\u0300","\xdc":"U\u0308","\u01d7":"U\u0308\u0301","\u01db":"U\u0308\u0300","\u01d5":"U\u0308\u0304","\u01d9":"U\u0308\u030c","\u0168":"U\u0303","\u1e78":"U\u0303\u0301","\u016a":"U\u0304","\u1e7a":"U\u0304\u0308","\u016c":"U\u0306","\u01d3":"U\u030c","\xdb":"U\u0302","\u016e":"U\u030a","\u0170":"U\u030b","\u1e7c":"V\u0303","\u1e82":"W\u0301","\u1e80":"W\u0300","\u1e84":"W\u0308","\u0174":"W\u0302","\u1e86":"W\u0307","\u1e8c":"X\u0308","\u1e8a":"X\u0307","\xdd":"Y\u0301","\u1ef2":"Y\u0300","\u0178":"Y\u0308","\u1ef8":"Y\u0303","\u0232":"Y\u0304","\u0176":"Y\u0302","\u1e8e":"Y\u0307","\u0179":"Z\u0301","\u017d":"Z\u030c","\u1e90":"Z\u0302","\u017b":"Z\u0307","\u03ac":"\u03b1\u0301","\u1f70":"\u03b1\u0300","\u1fb1":"\u03b1\u0304","\u1fb0":"\u03b1\u0306","\u03ad":"\u03b5\u0301","\u1f72":"\u03b5\u0300","\u03ae":"\u03b7\u0301","\u1f74":"\u03b7\u0300","\u03af":"\u03b9\u0301","\u1f76":"\u03b9\u0300","\u03ca":"\u03b9\u0308","\u0390":"\u03b9\u0308\u0301","\u1fd2":"\u03b9\u0308\u0300","\u1fd1":"\u03b9\u0304","\u1fd0":"\u03b9\u0306","\u03cc":"\u03bf\u0301","\u1f78":"\u03bf\u0300","\u03cd":"\u03c5\u0301","\u1f7a":"\u03c5\u0300","\u03cb":"\u03c5\u0308","\u03b0":"\u03c5\u0308\u0301","\u1fe2":"\u03c5\u0308\u0300","\u1fe1":"\u03c5\u0304","\u1fe0":"\u03c5\u0306","\u03ce":"\u03c9\u0301","\u1f7c":"\u03c9\u0300","\u038e":"\u03a5\u0301","\u1fea":"\u03a5\u0300","\u03ab":"\u03a5\u0308","\u1fe9":"\u03a5\u0304","\u1fe8":"\u03a5\u0306","\u038f":"\u03a9\u0301","\u1ffa":"\u03a9\u0300"};class io{constructor(e,t){this.mode=void 0,this.gullet=void 0,this.settings=void 0,this.leftrightDepth=void 0,this.nextToken=void 0,this.mode="math",this.gullet=new to(e,t,this.mode),this.settings=t,this.leftrightDepth=0,this.nextToken=null}expect(e,t){if(void 
0===t&&(t=!0),this.fetch().text!==e)throw new n("Expected '"+e+"', got '"+this.fetch().text+"'",this.fetch());t&&this.consume()}consume(){this.nextToken=null}fetch(){return null==this.nextToken&&(this.nextToken=this.gullet.expandNextToken()),this.nextToken}switchMode(e){this.mode=e,this.gullet.switchMode(e)}parse(){this.settings.globalGroup||this.gullet.beginGroup(),this.settings.colorIsTextColor&&this.gullet.macros.set("\\color","\\textcolor");try{const e=this.parseExpression(!1);return this.expect("EOF"),this.settings.globalGroup||this.gullet.endGroup(),e}finally{this.gullet.endGroups()}}subparse(e){const t=this.nextToken;this.consume(),this.gullet.pushToken(new en("}")),this.gullet.pushTokens(e);const r=this.parseExpression(!1);return this.expect("}"),this.nextToken=t,r}parseExpression(e,t){const r=[];for(;;){"math"===this.mode&&this.consumeSpaces();const n=this.fetch();if(io.endOfExpression.has(n.text))break;if(t&&n.text===t)break;if(e&&Ln[n.text]&&Ln[n.text].infix)break;const o=this.parseAtom(t);if(!o)break;"internal"!==o.type&&r.push(o)}return"text"===this.mode&&this.formLigatures(r),this.handleInfixNodes(r)}handleInfixNodes(e){let t,r=-1;for(let o=0;o=128))return null;this.settings.strict&&(T(t.charCodeAt(0))?"math"===this.mode&&this.settings.reportNonstrict("unicodeTextInMathMode",'Unicode text character "'+t[0]+'" used in math mode',e):this.settings.reportNonstrict("unknownSymbol",'Unrecognized Unicode character "'+t[0]+'" ('+t.charCodeAt(0)+")",e)),o={type:"textord",mode:"text",loc:Qr.range(e),text:t}}if(this.consume(),r)for(let t=0;t t.trim()).filter(Boolean) || args.tiers; + else if (argv[i] === "--out") args.out = argv[++i] || args.out; + } + return args; +} + +function decodeDataUrlToBuffer(dataUrl) { + const comma = dataUrl.indexOf(","); + const base64 = dataUrl.slice(comma + 1); + return Buffer.from(base64, "base64"); +} + +function toBinaryBuffer(result) { + const data = result?.data ?? 
result; + if (typeof data === "string" && data.startsWith("data:")) return decodeDataUrlToBuffer(data); + if (data instanceof Uint8Array) return Buffer.from(data); + throw new Error("unexpected binary writer output"); +} + +// 每个 tier 的文本源 builder scale。 +function scaleForTier(tier, builderKey) { + if (tier === "large") { + return buildToTargetBytes(TEXT_BUILDERS[builderKey], LARGE_TARGET_BYTES).scale; + } + return SIZE_TIERS[tier]; +} + +// 大尺寸 PNG(>3MB 原始像素 → 压缩后视图案而定)。 +const PNG_DIMS = { small: [96, 64], medium: [640, 480], large: [1600, 1600] }; + +async function emit(manifest, outDir, format, tier, buffer, source) { + const dir = path.join(outDir, tier); + await mkdir(dir, { recursive: true }); + const fileName = `${format}-${tier}.${format}`; + const filePath = path.join(dir, fileName); + await writeFile(filePath, buffer); + manifest.files.push({ + format, + tier, + path: path.relative(".", filePath).replaceAll("\\", "/"), + bytes: buffer.length, + source, + }); + const mb = (buffer.length / (1024 * 1024)).toFixed(2); + console.log(` ${format.padEnd(5)} ${tier.padEnd(6)} ${String(buffer.length).padStart(9)} B (${mb} MB) ${fileName}`); +} + +async function generateTier(manifest, outDir, tier) { + console.log(`\n[tier: ${tier}]`); + + // 1) 文本原生格式:md / html / json / xml / csv / txt + const textSources = {}; + for (const key of Object.keys(TEXT_BUILDERS)) { + const scale = scaleForTier(tier, key); + const content = TEXT_BUILDERS[key](scale); + textSources[key] = content; + await emit(manifest, outDir, key, tier, Buffer.from(content, "utf8"), `builder:${key}@scale=${scale}`); + } + + // 2) 重格式经项目自带 writer 程序化产出(输入用上面的文本源)。 + const binaryJobs = [ + { format: "docx", from: "md", source: () => textSources.md }, + { format: "pptx", from: "md", source: () => textSources.md }, + { format: "epub", from: "md", source: () => textSources.md }, + { format: "pdf", from: "md", source: () => textSources.md }, + { format: "xlsx", from: "csv", source: () => textSources.csv }, + 
]; + for (const job of binaryJobs) { + try { + const result = convertContent({ + content: job.source(), + from: job.from, + to: job.format, + title: `sample-${job.format}-${tier}`, + options: { repair: false }, + }); + await emit(manifest, outDir, job.format, tier, toBinaryBuffer(result), `convert:${job.from}->${job.format}`); + } catch (error) { + console.log(` ${job.format.padEnd(5)} ${tier.padEnd(6)} SKIPPED (${error.message})`); + manifest.skipped.push({ format: job.format, tier, reason: error.message }); + } + } + + // 3) PNG:程序化棋盘+渐变图(无 writer,直接编码)。 + const [pw, ph] = PNG_DIMS[tier] || PNG_DIMS.small; + await emit(manifest, outDir, "png", tier, buildPatternPng(pw, ph), `png-encode:${pw}x${ph}`); +} + +async function main() { + const { tiers, out } = parseArgs(process.argv.slice(2)); + const outDir = path.resolve(out); + console.log(`Trans2Former sample generator → ${path.relative(".", outDir) || outDir}`); + console.log(`tiers: ${tiers.join(", ")}`); + + await rm(outDir, { recursive: true, force: true }); + await mkdir(outDir, { recursive: true }); + + const manifest = { + schema: "trans2former.sample-corpus.v1", + generatedAt: new Date().toISOString(), + tiers, + note: "Programmatically regenerated fixtures. Binaries are git-ignored; rerun `npm run samples:generate`.", + coverageGaps: [ + { format: "doc", reason: "legacy Word binary has no writer; reader is best-effort only" }, + { format: "ofd", reason: "OFD writer not implemented (reader is L0); see docs/OFD_RESEARCH.md" }, + ], + files: [], + skipped: [], + }; + + for (const tier of tiers) { + if (!(tier in SIZE_TIERS)) { + console.log(`(skip unknown tier: ${tier})`); + continue; + } + await generateTier(manifest, outDir, tier); + } + + await writeFile(path.join(outDir, "MANIFEST.json"), JSON.stringify(manifest, null, 2)); + + const largest = manifest.files.reduce((max, f) => Math.max(max, f.bytes), 0); + console.log(`\nDone. ${manifest.files.length} files, ${manifest.skipped.length} skipped. 
Largest: ${(largest / (1024 * 1024)).toFixed(2)} MB.`); + console.log(`Manifest: ${path.relative(".", path.join(outDir, "MANIFEST.json")).replaceAll("\\", "/")}`); +} + +main().catch((error) => { + console.error("Sample generation failed:", error); + process.exit(1); +}); diff --git a/scripts/latex-math-test.js b/scripts/latex-math-test.js new file mode 100644 index 0000000..a70500b --- /dev/null +++ b/scripts/latex-math-test.js @@ -0,0 +1,80 @@ +import assert from "node:assert/strict"; + +import { + convertContent, +} from "../public/browser-transformer.js"; +import { parseInlineMarkdown } from "../public/formats/inline-tokens.js"; +import { + inlinesToHtml, + inlinesToMarkdown, + inlinesToPlainText, + createInlineMath, +} from "../public/core/models/semantic-inlines.js"; + +const BACKSLASH = String.fromCharCode(92); +const frac = `${BACKSLASH}frac{a}{b}`; // \frac{a}{b} +const sum = `${BACKSLASH}sum_{i=1}^{n}`; // \sum_{i=1}^{n} + +// 1. Inline $...$ recognized; backslash + underscore preserved verbatim +{ + const toks = parseInlineMarkdown(`pre $x^2 + y_0$ post`); + const math = toks.find((t) => t.type === "math"); + assert.ok(math, "inline $...$ should produce a math token"); + assert.equal(math.display, false); + assert.equal(math.value, "x^2 + y_0"); +} + +// 2. Display $$...$$ preserves backslashes (no markdown escaping eats \frac / \sum) +{ + const toks = parseInlineMarkdown(`$$${frac} = ${sum} c_i$$`); + const math = toks.find((t) => t.type === "math"); + assert.ok(math, "display $$...$$ should produce a math token"); + assert.equal(math.display, true); + assert.equal(math.value.charCodeAt(0), 92, "value must start with a literal backslash"); + assert.ok(math.value.includes(frac), "\\frac must be preserved verbatim"); + assert.ok(math.value.includes(sum), "\\sum must be preserved verbatim"); +} + +// 3. 
Currency is NOT treated as math (heuristic: no inner-edge whitespace) +{ + const toks = parseInlineMarkdown("pay $5 and $10 today"); + assert.equal(toks.some((t) => t.type === "math"), false, "currency $5 / $10 must not become math"); +} + +// 4. HTML output emits a katex-targetable span carrying raw tex in data-tex +{ + const html = inlinesToHtml(parseInlineMarkdown(`$${frac}$`)); + assert.ok(html.includes('class="t2f-math"'), "math should render a .t2f-math span"); + assert.ok(html.includes('data-display="false"')); + assert.ok(html.includes(`data-tex="${frac}"`), "data-tex must carry the raw tex (backslash preserved)"); +} + +// 5. Markdown round-trip preserves $...$ and $$...$$ verbatim +{ + const md = `inline $x^2$ and block:\n\n$$${frac}$$\n`; + const toks = parseInlineMarkdown(`inline $x^2$ and block: $$${frac}$$`); + const rendered = inlinesToMarkdown(toks); + assert.ok(rendered.includes("$x^2$"), "inline math round-trips"); + assert.ok(rendered.includes(`$$${frac}$$`), "display math round-trips with backslash"); + + const result = convertContent({ content: md, from: "md", to: "md", options: { repair: false } }); + assert.ok(result.data.includes("$x^2$")); + assert.ok(result.data.includes(`$$${frac}$$`)); +} + +// 6. Plain text keeps delimited tex; createInlineMath factory shape +{ + const node = createInlineMath("E=mc^2", false); + assert.deepEqual(node, { type: "math", value: "E=mc^2", display: false }); + assert.equal(inlinesToPlainText([node]), "$E=mc^2$"); + assert.equal(inlinesToPlainText([createInlineMath("x", true)]), "$$x$$"); +} + +// 7. 
End-to-end md -> html conversion emits math spans (not from underscores) +{ + const result = convertContent({ content: `$$a_b + ${frac}$$`, from: "md", to: "html", options: { repair: false } }); + assert.ok(result.data.includes("t2f-math"), "html conversion should contain a math span"); + assert.ok(!result.data.includes("<em>b"), "underscore inside math must not become <em>"); +} + +console.log("LaTeX math test passed: inline/display tokenization (backslash + underscore preserved), currency exclusion, katex-targetable html span, markdown round-trip, plain-text + factory verified."); diff --git a/scripts/lib/png-encode.js b/scripts/lib/png-encode.js new file mode 100644 index 0000000..506795e --- /dev/null +++ b/scripts/lib/png-encode.js @@ -0,0 +1,80 @@ +// 最小 PNG 编码器(Node 端,仅用于生成测试样例,不进入 public/ 运行时)。 +// 用 node:zlib deflate 压缩,输出真实可读的 RGBA PNG。支持生成大尺寸图(>3MB)。 + +import zlib from "node:zlib"; + +const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]); + +const CRC_TABLE = (() => { + const table = new Uint32Array(256); + for (let n = 0; n < 256; n += 1) { + let c = n; + for (let k = 0; k < 8; k += 1) { + c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1; + } + table[n] = c >>> 0; + } + return table; +})(); + +function crc32(buffer) { + let crc = 0xffffffff; + for (let i = 0; i < buffer.length; i += 1) { + crc = CRC_TABLE[(crc ^ buffer[i]) & 0xff] ^ (crc >>> 8); + } + return (crc ^ 0xffffffff) >>> 0; +} + +function chunk(type, data) { + const typeBuf = Buffer.from(type, "ascii"); + const lengthBuf = Buffer.alloc(4); + lengthBuf.writeUInt32BE(data.length, 0); + const crcBuf = Buffer.alloc(4); + crcBuf.writeUInt32BE(crc32(Buffer.concat([typeBuf, data])), 0); + return Buffer.concat([lengthBuf, typeBuf, data, crcBuf]); +} + +// pixelFn(x, y) -> [r, g, b, a] (0-255). 
生成 width×height RGBA PNG Buffer。 +export function encodePng(width, height, pixelFn) { + const ihdr = Buffer.alloc(13); + ihdr.writeUInt32BE(width, 0); + ihdr.writeUInt32BE(height, 4); + ihdr.writeUInt8(8, 8); // bit depth + ihdr.writeUInt8(6, 9); // color type RGBA + ihdr.writeUInt8(0, 10); // compression + ihdr.writeUInt8(0, 11); // filter + ihdr.writeUInt8(0, 12); // interlace + + const stride = width * 4; + const raw = Buffer.alloc((stride + 1) * height); + for (let y = 0; y < height; y += 1) { + raw[y * (stride + 1)] = 0; // filter type none + for (let x = 0; x < width; x += 1) { + const [r, g, b, a] = pixelFn(x, y); + const offset = y * (stride + 1) + 1 + x * 4; + raw[offset] = r & 0xff; + raw[offset + 1] = g & 0xff; + raw[offset + 2] = b & 0xff; + raw[offset + 3] = a & 0xff; + } + } + const idatData = zlib.deflateSync(raw, { level: 6 }); + + return Buffer.concat([ + PNG_SIGNATURE, + chunk("IHDR", ihdr), + chunk("IDAT", idatData), + chunk("IEND", Buffer.alloc(0)), + ]); +} + +// 生成一张确定性的彩色棋盘 + 渐变图,便于视觉/OCR 占位测试。 +export function buildPatternPng(width, height) { + return encodePng(width, height, (x, y) => { + const checker = ((x >> 4) + (y >> 4)) % 2 === 0; + const r = Math.floor((x / width) * 255); + const g = Math.floor((y / height) * 255); + const b = checker ? 
200 : 60; + return [r, g, b, 255]; + }); +} diff --git a/scripts/lib/sample-content.js b/scripts/lib/sample-content.js new file mode 100644 index 0000000..d784745 --- /dev/null +++ b/scripts/lib/sample-content.js @@ -0,0 +1,181 @@ +// 优质测试样例内容生成器(纯函数,确定性)。供 generate-samples.js 写盘和 +// sample-corpus-test.js 小规模回归共用。每个 builder 接受 scale(章节重复次数), +// 用来产出大小不一的复杂样例:scale=1 小,scale≈120 中(~300KB),scale≈1300 大(≥3MB)。 + +export const SIZE_TIERS = Object.freeze({ + small: 1, + medium: 120, + large: 1300, +}); + +// 复杂文本素材:中英文、RTL、emoji、标点、实体、长词。确定性、无随机。 +const PARAGRAPH_SNIPPETS = [ + "Trans2Former 是一个**本地优先**、_零上传_的多格式文档转换工作台,强调确定性算法转换、本地 OCR 与转换后质量检验。", + "The quick brown fox jumps over the lazy dog while 中文、日本語、한국어 and emoji 🚀📄✅ coexist in one paragraph.", + "数学近似:E = mc^2,π ≈ 3.14159,∑(1/n^2) = π²/6;货币 $1,234.56 / €987,65 / ¥10000。", + "RTL 混排:العربية والعبرية טקסט مع English inline,验证双向文本不破坏块结构。", + "特殊字符与实体:< > & \" ' © ® ™ — – … « » 「」『』,以及 HTML 实体 & < © 的往返保真。", + "Edge tokens: supercalifragilisticexpialidocious, a_very_long_snake_case_identifier_that_should_not_wrap_oddly, and URLs like https://example.com/path?query=1&lang=zh#frag.", +]; + +const CODE_SAMPLES = [ + { lang: "javascript", code: "export function convert(input) {\n const model = read(input);\n return write(model); // round-trip\n}" }, + { lang: "python", code: "def fib(n):\n a, b = 0, 1\n for _ in range(n):\n a, b = b, a + b\n return a # 斐波那契" }, + { lang: "json", code: "{\n \"name\": \"trans2former\",\n \"local\": true,\n \"formats\": [\"md\", \"pdf\", \"docx\"]\n}" }, + { lang: "sql", code: "SELECT id, name FROM docs WHERE lang = 'zh-CN' ORDER BY created_at DESC;" }, +]; + +function repeatedField(base, index) { + return `${base}-${String(index).padStart(5, "0")}`; +} + +export function buildComplexMarkdown(scale = 1) { + const parts = []; + parts.push("# Trans2Former 综合能力测试样例\n"); + parts.push("> 本文件由 `scripts/generate-samples.js` 程序化生成,用于压力测试转换、版面与检验能力。\n"); + parts.push("[TOC]\n"); + for (let s = 
0; s < scale; s += 1) { + const n = s + 1; + parts.push(`## 第 ${n} 章 · 复杂排版段落\n`); + for (const snippet of PARAGRAPH_SNIPPETS) { + parts.push(`${snippet}\n`); + } + parts.push(`### ${n}.1 列表与任务\n`); + parts.push("- 无序项 A\n - 嵌套项 A.1\n - 深层项 A.1.a\n- 无序项 B\n"); + parts.push("1. 有序项一\n2. 有序项二\n 1. 子项 2.1\n 2. 子项 2.2\n"); + parts.push("- [x] 已完成任务\n- [ ] 待办任务\n"); + parts.push(`### ${n}.2 表格(含对齐 + 中文 + 数字)\n`); + parts.push("| 左对齐 | 居中 | 右对齐 |\n| :--- | :---: | ---: |\n"); + for (let r = 0; r < 6; r += 1) { + parts.push(`| ${repeatedField("项目", n * 10 + r)} | 状态-${r} | ${(r * 1234.5).toFixed(2)} |\n`); + } + parts.push("\n"); + const code = CODE_SAMPLES[s % CODE_SAMPLES.length]; + parts.push(`### ${n}.3 代码块(${code.lang})\n`); + parts.push("```" + code.lang + "\n" + code.code + "\n```\n"); + parts.push("> 引用块:转换核心围绕 `input -> canonical model -> mapper route -> QualityReport -> output`。\n>> 嵌套引用:规则 diff + SSIM + OCR 回读 三层检验。\n"); + parts.push(`![示意图 ${n}](https://example.com/img/${n}.png "图 ${n}")\n`); + parts.push("脚注引用[^note" + n + "],行内 `code`、**粗体**、*斜体*、~~删除线~~ 与 [链接](https://example.com)。\n"); + parts.push(`[^note${n}]: 第 ${n} 章脚注内容,验证脚注往返。\n`); + parts.push("\n---\n\n"); + } + return parts.join("\n"); +} + +export function buildComplexHtml(scale = 1) { + const parts = []; + parts.push("\n综合 HTML 样例"); + parts.push("

<h1>Trans2Former HTML 能力测试</h1>"); + for (let s = 0; s < scale; s += 1) { + const n = s + 1; + parts.push(`<h2>第 ${n} 节</h2>`); + for (const snippet of PARAGRAPH_SNIPPETS) { + parts.push(`<p>${snippet.replace(/&/g, "&amp;").replace(/</g, "&lt;")}</p>`); + } + parts.push("<ul><li>无序 A<ul><li>嵌套 A.1</li></ul></li><li>无序 B</li></ul>"); + parts.push("<ol><li>有序一</li><li>有序二<ol><li>子 2.1</li></ol></li></ol>"); + parts.push("<table><tr><th>列1</th><th>列2</th><th>列3</th></tr>"); + for (let r = 0; r < 6; r += 1) { + parts.push(`<tr><td>${repeatedField("单元", n * 10 + r)}</td><td>${r}</td><td>${(r * 99.9).toFixed(2)}</td></tr>`); + } + parts.push("</table>"); + const code = CODE_SAMPLES[s % CODE_SAMPLES.length]; + parts.push(`<pre><code>${code.code.replace(/</g, "&lt;")}</code></pre>`); + parts.push("<blockquote><p>引用:本地优先、零上传。</p></blockquote>"); + parts.push(`<figure><img alt="图 ${n}"><figcaption>图 ${n}</figcaption></figure>
`); + } + parts.push(""); + return parts.join("\n"); +} + +export function buildComplexJson(scale = 1) { + const records = []; + for (let s = 0; s < scale; s += 1) { + records.push({ + id: repeatedField("doc", s), + title: `文档 ${s + 1} · Document ${s + 1}`, + tags: ["中文", "english", "العربية", "emoji-🚀"], + meta: { + author: `作者-${s}`, + nested: { level: 3, values: [s, s * 2, s * 3], note: "深层嵌套 nested object" }, + unicode: "特殊字符 < > & \" ' © — …", + }, + sections: PARAGRAPH_SNIPPETS.map((text, i) => ({ index: i, text })), + }); + } + return JSON.stringify({ schema: "trans2former.sample.v1", count: records.length, records }, null, 2); +} + +export function buildComplexXml(scale = 1) { + const parts = []; + parts.push(""); + parts.push(""); + for (let s = 0; s < scale; s += 1) { + const n = s + 1; + parts.push(` `); + parts.push(` 复杂书目 ${n} & Title ${n}`); + parts.push(` 作者 ${n}`); + parts.push(` 与 & 符号的 CDATA 段,第 ${n} 条。 ]]>`); + parts.push(" 中文englishemoji-🚀"); + parts.push(` ${(n * 12.5).toFixed(2)}`); + parts.push(" "); + } + parts.push(""); + return parts.join("\n"); +} + +export function buildComplexCsv(scale = 1) { + const rows = []; + rows.push("id,名称,description,price,tags,note"); + for (let s = 0; s < scale; s += 1) { + for (let r = 0; r < 6; r += 1) { + const idx = s * 6 + r; + const desc = `含逗号, 引号"和换行\n的字段 ${idx}`; + const tags = "中文;english;🚀"; + rows.push(`${idx},"产品 ${idx}","${desc.replace(/"/g, "\"\"")}",${(idx * 3.14).toFixed(2)},"${tags}","note ${idx} — 特殊字符 < > &"`); + } + } + return rows.join("\n"); +} + +export function buildComplexText(scale = 1) { + const parts = []; + parts.push("Trans2Former 纯文本能力测试样例"); + parts.push("=".repeat(60)); + for (let s = 0; s < scale; s += 1) { + const n = s + 1; + parts.push(`\n[第 ${n} 段]`); + for (const snippet of PARAGRAPH_SNIPPETS) { + // 去掉 markdown 标记,纯文本 + parts.push(snippet.replace(/[*_~`#>]/g, "")); + } + parts.push("超长单行:" + "长词".repeat(200) + " end-of-line-" + n); + parts.push("制表符\t分隔\t列1\t列2\t列3"); + 
} + return parts.join("\n"); +} + +// 把某个 builder 反复扩到至少 targetBytes 字节(用于精确逼近 3MB)。 +export function buildToTargetBytes(builder, targetBytes) { + let scale = 1; + let content = builder(scale); + const perScale = Buffer.byteLength(builder(2), "utf8") - Buffer.byteLength(builder(1), "utf8"); + if (perScale > 0) { + scale = Math.max(1, Math.ceil(targetBytes / perScale)); + content = builder(scale); + while (Buffer.byteLength(content, "utf8") < targetBytes) { + scale = Math.ceil(scale * 1.15) + 1; + content = builder(scale); + } + } + return { content, scale }; +} + +export const TEXT_BUILDERS = Object.freeze({ + md: buildComplexMarkdown, + html: buildComplexHtml, + json: buildComplexJson, + xml: buildComplexXml, + csv: buildComplexCsv, + txt: buildComplexText, +}); diff --git a/scripts/local-model-direction-test.js b/scripts/local-model-direction-test.js index b34aadb..2c8733f 100644 --- a/scripts/local-model-direction-test.js +++ b/scripts/local-model-direction-test.js @@ -41,6 +41,7 @@ assertIncludes("tasks", "model-cache"); assertIncludes("tasks", "30–80 MB"); assertIncludes("tasks", "默认包不含 GB 级模型"); assertIncludes("tasks", "OCR 模型按需下载"); +assertIncludes("tasks", "qualityReport.ruleDiff"); assertExcludes("tasks", "不依赖 Office、LibreOffice、Pandoc、云端转换或 OCR/AI"); assertExcludesPattern( "tasks", @@ -146,6 +147,22 @@ assertIncludes("multiModel", "ocrResultToFixedLayoutPage"); assertIncludes("multiModel", "mergeOCRResultsToFixedLayout"); assertIncludes("multiModel", "createBrowserPdfPageRasterizer"); assertIncludes("multiModel", "MODEL_TEXT_ORDER_HEURISTIC"); +assertIncludes("multiModel", "runVerificationStage"); +assertIncludes("multiModel", "diffSemanticDocs"); +assertIncludes("multiModel", "RULE_DIFF_DRIFT"); +assertIncludes("multiModel", "computeSSIM"); +assertIncludes("multiModel", "runVerificationStageAsync"); +assertIncludes("multiModel", "SSIM_VISUAL_DRIFT"); +assertIncludes("multiModel", "runOcrReadbackLayer"); +assertIncludes("multiModel", "compareText"); 
+assertIncludes("multiModel", "OCR_READBACK_DRIFT"); +assertIncludes("multiModel", "PP-OCRv5"); +assertIncludes("multiModel", "ONNX"); +assertIncludes("multiModel", "WebGPU"); +assertIncludes("multiModel", "paddleOcrEngine"); +assertIncludes("multiModel", "onnxruntime-web"); +assertIncludes("multiModel", "runPaddlePipeline"); +assertIncludes("multiModel", "ctcGreedyDecode"); assertIncludes("budget", "model-cache///"); assertIncludes("budget", "SHA-256"); assertExcludes("multiModel", "external engine 一律插件化"); diff --git a/scripts/local-security-test.js b/scripts/local-security-test.js index cb880bd..c8fa17f 100644 --- a/scripts/local-security-test.js +++ b/scripts/local-security-test.js @@ -42,18 +42,54 @@ const ALLOWED_PUBLIC_FILES = new Set([ // scan-pdf-stage chains the enhanceWithOCR async multi-page path. Neither touches the network. path.normalize("public/core/ocr/pdf-rasterizer.js"), path.normalize("public/core/ocr/scan-pdf-stage.js"), + // P9-D.1/D.2/D.2.b PP-OCRv5 advanced OCR: engine implements the OCREngine contract; bootstrap registers the engine + + // ONNX manifest; runtime loads onnxruntime-web + the WebGPU/WASM backends from the same-origin vendor; pipeline + // is pure pre/post-processing (preprocessing / DB post-processing / CTC decoding) + orchestrator; default-models fetches the bundled same-origin vendor + // models into the local cache (works out of the box). None of them go online or upload; they only access same-origin /vendor/ resources. 
+ path.normalize("public/core/ocr/paddle-ocr-engine.js"), + path.normalize("public/core/ocr/paddle-ocr-bootstrap.js"), + path.normalize("public/core/ocr/paddle-ocr-runtime.js"), + path.normalize("public/core/ocr/paddle-ocr-pipeline.js"), + path.normalize("public/core/ocr/paddle-default-models.js"), // P9-B FixedLayoutModel + browser rasterize: ocr-to-fixed-layout only maps data; // pdf-rasterizer-browser dynamic-imports the same-origin vendor pdfjs; the runtime canvas lives in the browser/Tauri. path.normalize("public/core/ocr/ocr-to-fixed-layout.js"), path.normalize("public/core/ocr/pdf-rasterizer-browser.js"), + // P9-C.1 post-conversion verification, three layers (rule-diff layer): block-fingerprint provides shared fingerprints; rule-diff does field-level + // structural comparison; verification-stage orchestrates the writer→reader read-back diff. All three are pure functions with no network access and + // no persistence; STRICT_LOCAL_ONLY_FILES below guards that no remote protocol appears in them. + 
path.normalize("public/core/verification/block-fingerprint.js"), + path.normalize("public/core/verification/rule-diff.js"), + path.normalize("public/core/verification/verification-stage.js"), + // P9-C.2 post-conversion verification, three layers (SSIM visual round-trip layer): ssim is the pure algorithm; page-image-source is the pixel-source abstraction; + // page-image-source-browser reads pixels via the same-origin vendor pdfjs + canvas. None of the three go online. + path.normalize("public/core/verification/ssim.js"), + path.normalize("public/core/verification/page-image-source.js"), + path.normalize("public/core/verification/page-image-source-browser.js"), + // P9-C.3 post-conversion verification, three layers (OCR read-back layer): ocr-readback reuses the registered ocr-text engine + + // OCR pdf-rasterizer to read the output PDF back into text for character-level similarity. Pure logic, no network access. + path.normalize("public/core/verification/ocr-readback.js"), ]); function isLocalVendorAsset(normalizedPath, content) { + // onnxruntime-web is a third-party runtime bundle whose minified code contains remote URL strings such as CDN and source-map links + // (they cannot be stripped at the static-scan level). This project's "zero network" guarantee for ORT comes from two layers of runtime control: + // (1) paddle-ocr-runtime.loadOnnxRuntime pins ort.env.wasm.wasmPaths to the same-origin vendor directory, + // so wasm is never fetched from a CDN; + // (2) the Tauri CSP `connect-src 'self'` blocks any remote connection. + // The whole onnxruntime vendor directory is therefore treated as trusted and skips the hard "no remote URL strings" scan. + if (normalizedPath.startsWith(path.normalize("public/vendor/onnxruntime/"))) { + return true; + } + // KaTeX is a purely synchronous math typesetting library with zero network I/O at runtime. 
The http(s) strings in its min.js are only W3C + // MathML/SVG namespace identifiers (used by createElementNS), not network requests, so the whole directory is treated as trusted. + if (normalizedPath.startsWith(path.normalize("public/vendor/katex/"))) { + return true; + } const isVendor = normalizedPath.startsWith(path.normalize("public/vendor/pdfjs/")) || normalizedPath.startsWith(path.normalize("public/vendor/tesseract/")); if (!isVendor) return false; - // Vendor assets (pdfjs / tesseract) may use internal fetch / XHR and similar calls to reach same-origin wasm/worker files, - // but any remote URL (http(s):// / ws(s)://) is forbidden. + // pdfjs / tesseract vendor: internal fetch / XHR to same-origin wasm/worker files is allowed, but any remote URL is forbidden. return !content.includes("http://") && !content.includes("https://") && !content.includes("ws://") @@ -112,6 +148,18 @@ const STRICT_LOCAL_ONLY_FILES = new Set([ path.normalize("public/core/ocr/scan-pdf-stage.js"), 
path.normalize("public/core/ocr/ocr-to-fixed-layout.js"), path.normalize("public/core/ocr/pdf-rasterizer-browser.js"), + path.normalize("public/core/ocr/paddle-ocr-engine.js"), + path.normalize("public/core/ocr/paddle-ocr-bootstrap.js"), + path.normalize("public/core/ocr/paddle-ocr-runtime.js"), + path.normalize("public/core/ocr/paddle-ocr-pipeline.js"), + path.normalize("public/core/ocr/paddle-default-models.js"), + path.normalize("public/core/verification/block-fingerprint.js"), + path.normalize("public/core/verification/rule-diff.js"), + path.normalize("public/core/verification/verification-stage.js"), + path.normalize("public/core/verification/ssim.js"), + path.normalize("public/core/verification/page-image-source.js"), + path.normalize("public/core/verification/page-image-source-browser.js"), + path.normalize("public/core/verification/ocr-readback.js"), ]); function assertNoRemoteUrlsInStrictFiles(filePath, content) { diff --git a/scripts/ocr-baseline-test.js b/scripts/ocr-baseline-test.js index 7e85747..6602772 100644 --- a/scripts/ocr-baseline-test.js +++ b/scripts/ocr-baseline-test.js @@ -50,6 +50,13 @@ import { MODEL_TEXT_ORDER_HEURISTIC, getFixedLayoutSummary, fixedLayoutToSemantic, + paddleOcrEngine, + PADDLE_OCR_MANIFEST_ID, + markPaddleOcrVendorReady, + ensurePaddleOcrBootstrap, + loadOnnxRuntime, + pickExecutionProviders, + PADDLE_VENDOR_PATHS, } from "../public/browser-transformer.js"; import { ConversionError } from "../public/core/conversion-error.js"; @@ -218,9 +225,9 @@ function makeStubEngine(overrides = {}) { // registered. With both isAvailable()=false, pickForTask falls back to the last // registered engine. Both ids are acceptable here. 
const picked = defaultOCRRegistry.pickForTask("ocr-text"); - assert.ok(picked, "pickForTask should return either placeholder or tesseract"); + assert.ok(picked, "pickForTask should return a fallback engine"); assert.equal( - ["placeholder", "tesseract-zh-en"].includes(picked.id), + ["placeholder", "tesseract-zh-en", "paddleocr-v5"].includes(picked.id), true, `pickForTask returned unexpected engine: ${picked.id}`, ); @@ -236,9 +243,9 @@ function makeStubEngine(overrides = {}) { assert.ok(ocrWarning, "PNG reader should attach OCR_UNAVAILABLE warning"); assert.equal(ocrWarning.severity, "info"); assert.equal( - ["placeholder", "tesseract-zh-en"].includes(ocrWarning.details?.engineId), + ["placeholder", "tesseract-zh-en", "paddleocr-v5"].includes(ocrWarning.details?.engineId), true, - `expected engineId to be placeholder or tesseract-zh-en, got ${ocrWarning.details?.engineId}`, + `expected engineId to be placeholder/tesseract/paddle, got ${ocrWarning.details?.engineId}`, ); } @@ -270,6 +277,13 @@ function makeStubEngine(overrides = {}) { const status = defaultModelCache.getStatus(TESSERACT_MANIFEST_ID); assert.ok(status, "tesseract manifest should be registered in defaultModelCache"); assert.equal(status.status, STATUS_NOT_DOWNLOADED); + + // ensureProbe must not throw on the frozen engine (readiness lives in module state, + // not a frozen instance prop) — otherwise the security-center import flow fails silently. + markTesseractVendorReady(true); + await assert.doesNotReject(() => tesseractOCREngine.ensureProbe(), "tesseract.ensureProbe must not throw on frozen engine"); + markTesseractVendorReady(false); + assert.equal(await tesseractOCREngine.ensureProbe(), false); } // 12. 
tesseractOCREngine.recognize rejects with OCR_UNAVAILABLE/OCR_ENGINE_FAILED depending on stage @@ -509,6 +523,18 @@ function makeStubEngine(overrides = {}) { options: { repair: false }, }); assert.equal(result.data.includes("ASYNC-STUB-TEXT"), true, "convertContentAsync should append OCR text into markdown output"); + + // Regression: the repair cycle must not clobber the OCR modelReview (ocr/ocrQuality) + // — convertContentAsync without repair:false should still surface OCR recognition quality. + const withRepair = await convertContentAsync({ + content: tinyPng, + from: "png", + to: "txt", + title: "async-stub-repair.png", + options: {}, + }); + assert.ok(withRepair.quality?.modelReview?.ocr, "result.quality.modelReview.ocr must survive the repair cycle for the UI"); + assert.equal(withRepair.quality.modelReview.ocr.engine, "async-stub"); } finally { defaultOCRRegistry.unregister(stubEngine.id); } @@ -544,6 +570,13 @@ function makeStubEngine(overrides = {}) { assert.equal(enhanced.metadata.ocr.lineCount, 1); assert.equal(enhanced.metadata.ocr.lines[0].text, "stage-line"); assert.equal(typeof enhanced.metadata.ocr.lines[0].confidence, "number"); + // blockId must resolve to a real appended OCR block that CONTAINS the line text, + // which is the precondition for low-confidence replaceTextRun repair to target it. 
+ const lineBlockId = enhanced.metadata.ocr.lines[0].blockId; + assert.ok(lineBlockId && lineBlockId.startsWith("ocr-block-"), "ocr line should carry a stable ocr-block id"); + const target = enhanced.blocks.find((b) => b.id === lineBlockId); + assert.ok(target, "blockId should resolve to a real block in enhanced.blocks"); + assert.ok((target.text || "").includes("stage-line"), "target block text should contain the line text"); } finally { defaultOCRRegistry.unregister(stubEngine.id); } @@ -850,6 +883,16 @@ function makeStubEngine(overrides = {}) { warnings.find((w) => w.code === MODEL_TEXT_ORDER_HEURISTIC), "stage should emit MODEL_TEXT_ORDER_HEURISTIC info warning", ); + // Each ocr line must resolve to a real appended block CONTAINING its text — even though + // mergeOCRResultsToFixedLayout re-sorts by reading order (so lines order != block order). + const ocrLines = enhanced.metadata?.ocr?.lines || []; + assert.equal(ocrLines.length, 2, "two scanned pages => two ocr lines"); + for (const ln of ocrLines) { + assert.ok(ln.blockId && ln.blockId.startsWith("ocr-block-"), "scan-pdf ocr line should carry a stable ocr-block id"); + const target = enhanced.blocks.find((b) => b.id === ln.blockId); + assert.ok(target, "scan-pdf blockId should resolve to a real block"); + assert.ok((target.text || "").includes((ln.text || "").trim()), "target block should contain the line text"); + } } finally { defaultOCRRegistry.unregister(stubEngine.id); resetPdfPageRasterizer(); @@ -883,4 +926,160 @@ function makeStubEngine(overrides = {}) { ); } -console.log("OCR baseline test passed: contracts, registry, bootstraps, storage, png reader/async stage, repair validator, scan PDF detection + rasterizer skeleton + multi-page OCR stage + FixedLayoutModel mapping + browser rasterizer fallback all verified."); +// 35. PP-OCRv5 advanced OCR engine skeleton (P9-D.1): registered, unavailable in Node, +// manifest registered, recognize three-stage rejection. 
+{ + ensurePaddleOcrBootstrap(); + ensurePaddleOcrBootstrap(); + assert.equal(defaultOCRRegistry.has(paddleOcrEngine.id), true, "paddle engine should be registered after bootstrap"); + assert.equal(paddleOcrEngine.id, "paddleocr-v5"); + assert.equal(paddleOcrEngine.taskCapabilities.includes("ocr-text"), true); + assert.equal(paddleOcrEngine.taskCapabilities.includes("ocr-layout"), true); + assert.equal(paddleOcrEngine.isAvailable(), false, "paddle engine should report unavailable until P9-D.2 wires onnxruntime"); + + const status = defaultModelCache.getStatus(PADDLE_OCR_MANIFEST_ID); + assert.ok(status, "paddle manifest should be registered in defaultModelCache"); + assert.equal(status.status, STATUS_NOT_DOWNLOADED); + + // vendor not ready => OCR_UNAVAILABLE / vendor-not-ready + markPaddleOcrVendorReady(false); + await assert.rejects( + () => paddleOcrEngine.recognize({ image: { width: 10, height: 10 } }), + (err) => err instanceof ConversionError && err.code === OCR_UNAVAILABLE && err.details?.reason === "vendor-not-ready", + ); + + // vendor ready but models missing => OCR_UNAVAILABLE / model-missing + markPaddleOcrVendorReady(true); + try { + await assert.rejects( + () => paddleOcrEngine.recognize({ image: { width: 10, height: 10 } }), + (err) => err instanceof ConversionError && err.code === OCR_UNAVAILABLE && err.details?.reason === "model-missing", + ); + } finally { + markPaddleOcrVendorReady(false); + } +} + +// 36. PP-OCRv5 onnxruntime-web runtime loader (P9-D.2): EP selection + Node vendor-load reject. +{ + // Node has no navigator.gpu => wasm-only execution providers. + assert.deepEqual(pickExecutionProviders(), ["wasm"], "Node should pick wasm-only execution provider"); + assert.equal(PADDLE_VENDOR_PATHS.mainBundle, "/vendor/onnxruntime/ort.min.mjs"); + + // loadOnnxRuntime dynamic-imports a same-origin vendor path that does not resolve in Node => throws. 
+ await assert.rejects( + () => loadOnnxRuntime(), + (err) => err instanceof ConversionError && err.code === OCR_VENDOR_LOAD_FAILED, + "loadOnnxRuntime should reject with OCR_VENDOR_LOAD_FAILED when vendor onnxruntime-web is absent (Node)", + ); + + // With vendor + models simulated ready, recognize reaches the runtime loader and surfaces + // the vendor-load failure (rather than the earlier vendor-not-ready / model-missing stages). + markPaddleOcrVendorReady(true); + for (const file of ["det.onnx", "cls.onnx", "rec.onnx"]) { + await paddleOcrEngine._storage.put(`paddleocr/v5/${file}`, new Uint8Array([1]).buffer, { sha256: "x" }); + } + try { + await assert.rejects( + () => paddleOcrEngine.recognize({ image: { width: 4, height: 4 } }), + (err) => err instanceof ConversionError && err.code === OCR_VENDOR_LOAD_FAILED, + "paddle recognize should reach loadOnnxRuntime and reject with OCR_VENDOR_LOAD_FAILED in Node", + ); + } finally { + for (const file of ["det.onnx", "cls.onnx", "rec.onnx"]) { + await paddleOcrEngine._storage.delete(`paddleocr/v5/${file}`); + } + markPaddleOcrVendorReady(false); + } +} + +// 37. PP-OCRv5 model import availability flip (P9-D.3): required det+rec present + vendor ready +// => isAvailable() true; cls is OPTIONAL (removing it stays ready); removing a required +// model (rec) => false. Mirrors security-center import/clear with cls optional. 
+{ + const det = "paddleocr/v5/det.onnx"; + const cls = "paddleocr/v5/cls.onnx"; + const rec = "paddleocr/v5/rec.onnx"; + markPaddleOcrVendorReady(true); + try { + // partial: det present but required rec missing => not ready (cls present but optional) + await paddleOcrEngine._storage.put(det, new Uint8Array([1]).buffer, { sha256: "a" }); + await paddleOcrEngine._storage.put(cls, new Uint8Array([2]).buffer, { sha256: "b" }); + assert.equal(await paddleOcrEngine.ensureProbe(), false, "det without required rec should not be ready"); + assert.equal(paddleOcrEngine.isAvailable(), false); + + // required det+rec present => ready + await paddleOcrEngine._storage.put(rec, new Uint8Array([3]).buffer, { sha256: "c" }); + assert.equal(await paddleOcrEngine.ensureProbe(), true, "required det+rec present => ready"); + assert.equal(paddleOcrEngine.isAvailable(), true); + + // vendor flag off => unavailable even with models + markPaddleOcrVendorReady(false); + assert.equal(paddleOcrEngine.isAvailable(), false, "vendor not ready => unavailable regardless of models"); + + // remove OPTIONAL cls => still ready (det+rec remain) + markPaddleOcrVendorReady(true); + await paddleOcrEngine._storage.delete(cls); + assert.equal(await paddleOcrEngine.ensureProbe(), true, "removing optional cls keeps readiness"); + + // remove a REQUIRED model (rec) => not ready + await paddleOcrEngine._storage.delete(rec); + assert.equal(await paddleOcrEngine.ensureProbe(), false, "removing required rec should drop readiness"); + } finally { + for (const key of [det, cls, rec]) await paddleOcrEngine._storage.delete(key); + markPaddleOcrVendorReady(false); + await paddleOcrEngine.ensureProbe(); + } +} + +// 38. Priority-aware pickForTask (P9-D.4): higher-priority available engine wins; PP-OCRv5 +// preferred over tesseract when both available. 
+{ + const reg = new OCREngineRegistry(); + const stub = (id, priority, available) => ({ + id, taskCapabilities: ["ocr-text"], priority, isAvailable: () => available, recognize: async () => ({}), + }); + reg.register(stub("low-pri", 5, true)); + reg.register(stub("high-pri", 20, true)); + reg.register(stub("mid-pri", 10, true)); + assert.equal(reg.pickForTask("ocr-text").id, "high-pri", "highest-priority available engine should win"); + + // Only a low-priority engine available => it is picked even if a higher-priority one is unavailable. + const reg2 = new OCREngineRegistry(); + reg2.register(stub("hi-unavail", 20, false)); + reg2.register(stub("lo-avail", 5, true)); + assert.equal(reg2.pickForTask("ocr-text").id, "lo-avail", "available lower-priority engine should win over unavailable higher-priority"); + + // Default registry: both tesseract + paddle available => paddle (priority 20) preferred. + markTesseractVendorReady(true); + await tesseractOCREngine._storage.put("tesseract/eng.traineddata", new Uint8Array([1]).buffer, { sha256: "x" }); + await tesseractOCREngine.ensureProbe(); + markPaddleOcrVendorReady(true); + for (const file of ["det.onnx", "cls.onnx", "rec.onnx"]) { + await paddleOcrEngine._storage.put(`paddleocr/v5/${file}`, new Uint8Array([1]).buffer, { sha256: "x" }); + } + await paddleOcrEngine.ensureProbe(); + try { + assert.equal(tesseractOCREngine.isAvailable(), true); + assert.equal(paddleOcrEngine.isAvailable(), true); + assert.equal(defaultOCRRegistry.pickForTask("ocr-text").id, "paddleocr-v5", "PP-OCRv5 should be preferred over tesseract when both available"); + + // Remove paddle models => tesseract wins. 
+ for (const file of ["det.onnx", "cls.onnx", "rec.onnx"]) { + await paddleOcrEngine._storage.delete(`paddleocr/v5/${file}`); + } + await paddleOcrEngine.ensureProbe(); + assert.equal(defaultOCRRegistry.pickForTask("ocr-text").id, "tesseract-zh-en", "tesseract should win when paddle unavailable"); + } finally { + await tesseractOCREngine._storage.delete("tesseract/eng.traineddata"); + for (const file of ["det.onnx", "cls.onnx", "rec.onnx"]) { + await paddleOcrEngine._storage.delete(`paddleocr/v5/${file}`); + } + markTesseractVendorReady(false); + markPaddleOcrVendorReady(false); + await tesseractOCREngine.ensureProbe(); + await paddleOcrEngine.ensureProbe(); + } +} + +console.log("OCR baseline test passed: contracts, registry, bootstraps, storage, png reader/async stage, repair validator, scan PDF detection + rasterizer skeleton + multi-page OCR stage + FixedLayoutModel mapping + browser rasterizer fallback + PP-OCRv5 advanced engine skeleton + onnxruntime-web runtime loader + model import availability flip + priority-aware route preference all verified."); diff --git a/scripts/ocr-readback-test.js b/scripts/ocr-readback-test.js new file mode 100644 index 0000000..0b125d8 --- /dev/null +++ b/scripts/ocr-readback-test.js @@ -0,0 +1,183 @@ +import assert from "node:assert/strict"; + +import { + compareText, + normalizeText, + extractModelText, + runOcrReadbackLayer, + runVerificationStageAsync, + convertContentAsync, + OCR_READBACK_DRIFT, + DEFAULT_OCR_READBACK_THRESHOLD, +} from "../public/browser-transformer.js"; + +function stubEngine(fullText, { available = true, averageConfidence = 0.9 } = {}) { + return { + id: "stub-ocr", + taskCapabilities: ["ocr-text"], + isAvailable: () => available, + recognize: async () => ({ fullText, averageConfidence, pages: [] }), + }; +} + +function stubRasterizer({ throwCode = null } = {}) { + return { + async rasterize() { + if (throwCode) { + const error = new Error("stub rasterizer failure"); + error.code = throwCode; + throw 
error; + } + return { dataUrl: "data:image/png;base64,AAAA", width: 16, height: 16 }; + }, + async countPages() { return 1; }, + }; +} + +function model(blocks) { + return { + schemaVersion: "trans2former.document.v1", + title: "ocr-readback", + sourceFormat: "md", + blocks, + assets: [], + metadata: { warnings: [], qualityReport: {} }, + }; +} + +// 1. compareText identical -> all 1 +{ + const r = compareText("Hello World", "Hello World"); + assert.equal(r.recall, 1); + assert.equal(r.precision, 1); + assert.equal(r.f1, 1); +} + +// 2. compareText subset -> recall < 1, precision = 1 +{ + const r = compareText("Hello World", "Hello"); + assert.ok(r.recall < 1, "recall should drop when text missing"); + assert.equal(r.precision, 1); + assert.ok(r.f1 < 1 && r.f1 > 0); +} + +// 3. compareText CJK (char-level multiset works without spaces) +{ + const r = compareText("你好世界", "你好世"); + assert.equal(r.recall, 0.75); + assert.equal(r.precision, 1); +} + +// 4. compareText empty original + empty recognized -> 1; empty original + text -> precision 0 +{ + assert.equal(compareText("", "").f1, 1); + const r = compareText("", "noise"); + assert.equal(r.precision, 0); +} + +// 5. normalizeText strips whitespace + lowercases +{ + assert.equal(normalizeText(" He llo\nWORLD "), "helloworld"); +} + +// 6. extractModelText joins block text incl. list/table +{ + const text = extractModelText(model([ + { type: "heading", text: "Title" }, + { type: "paragraph", text: "Body" }, + { type: "list", ordered: false, items: ["a", "b"] }, + { type: "table", headers: ["h1"], rows: [["c1"]] }, + ])); + assert.ok(text.includes("Title") && text.includes("Body") && text.includes("a") && text.includes("c1")); +} + +// 7. 
runOcrReadbackLayer happy path with stub engine + rasterizer +{ + const layer = await runOcrReadbackLayer({ + model: model([{ type: "heading", text: "Title" }, { type: "paragraph", text: "Body" }]), + output: { data: "" }, + ctx: { from: "md", to: "pdf", options: {} }, + engine: stubEngine("Title Body"), + rasterizer: stubRasterizer(), + }); + assert.equal(layer.eligible, true); + assert.equal(layer.ocrReadback.passed, true); + assert.equal(layer.ocrReadback.engineId, "stub-ocr"); + assert.ok(layer.ocrReadback.f1 >= DEFAULT_OCR_READBACK_THRESHOLD); +} + +// 8. runOcrReadbackLayer drift -> OCR_READBACK_DRIFT warning +{ + const layer = await runOcrReadbackLayer({ + model: model([{ type: "paragraph", text: "The quick brown fox jumps over the lazy dog" }]), + output: { data: "" }, + ctx: { from: "md", to: "pdf", options: {} }, + engine: stubEngine("zzzzz"), + rasterizer: stubRasterizer(), + }); + assert.equal(layer.eligible, true); + assert.equal(layer.ocrReadback.passed, false); + assert.equal(layer.warnings[0].code, OCR_READBACK_DRIFT); +} + +// 9. not eligible: non-pdf output +{ + const layer = await runOcrReadbackLayer({ + model: model([{ type: "paragraph", text: "x" }]), + output: { data: "text" }, + ctx: { from: "md", to: "md", options: {} }, + engine: stubEngine("x"), + rasterizer: stubRasterizer(), + }); + assert.equal(layer.eligible, false); + assert.equal(layer.reason, "output-not-rasterizable-for-ocr"); +} + +// 10. not eligible: engine unavailable (via registry returning null) +{ + const layer = await runOcrReadbackLayer({ + model: model([{ type: "paragraph", text: "x" }]), + output: { data: "" }, + ctx: { from: "md", to: "pdf", options: {} }, + engine: null, + registry: { pickForTask: () => null }, + rasterizer: stubRasterizer(), + }); + assert.equal(layer.eligible, false); + assert.equal(layer.reason, "ocr-engine-unavailable"); +} + +// 11. 
rasterizer unavailable -> reason rasterizer-unavailable (no throw) +{ + const layer = await runOcrReadbackLayer({ + model: model([{ type: "paragraph", text: "x" }]), + output: { data: "" }, + ctx: { from: "md", to: "pdf", options: {} }, + engine: stubEngine("x"), + rasterizer: stubRasterizer({ throwCode: "OCR_RASTERIZER_UNAVAILABLE" }), + }); + assert.equal(layer.eligible, false); + assert.equal(layer.reason, "rasterizer-unavailable"); +} + +// 12. runVerificationStageAsync merges three layers; default (no stub) ocr-readback skipped +{ + const env = await runVerificationStageAsync({ + model: model([{ type: "paragraph", text: "Body" }]), + output: { data: "" }, + ctx: { from: "md", to: "pdf", content: "Body", read: () => model([{ type: "paragraph", text: "Body" }]), options: {} }, + }); + // md->pdf: rule-diff skipped (pdf not text-canonical), ssim skipped (md not rasterizable), + // ocr-readback skipped in Node (no engine available) + assert.ok(env.skipped.some((s) => s.layer === "ocr-readback")); + assert.equal(env.ocrReadback, null); +} + +// 13. 
End-to-end convertContentAsync md->md: ocrReadback null (sync-like text path) +{ + const result = await convertContentAsync({ content: "# Title\n\nBody", from: "md", to: "md", title: "e2e" }); + assert.equal(result.quality.qualityReport.ocrReadback, null); + assert.equal(result.quality.qualityReport.ruleDiff.identical, true); +} + +console.log("OCR readback test passed: compareText (identical/subset/CJK/empty), normalizeText, extractModelText, readback layer gating/drift/unavailable/rasterizer-fail, async three-layer merge, end-to-end null-on-text covered."); diff --git a/scripts/ocr-structure-test.js b/scripts/ocr-structure-test.js new file mode 100644 index 0000000..509a411 --- /dev/null +++ b/scripts/ocr-structure-test.js @@ -0,0 +1,74 @@ +import assert from "node:assert/strict"; + +import { deriveOcrStructure, blocksFromOcrResult } from "../public/browser-transformer.js"; + +const line = (text, x, y, w, h) => ({ text, bbox: { x, y, w, h } }); + +// 1. Larger-font line becomes a heading; body lines group into a paragraph +{ + const lines = [ + line("产品标题", 10, 10, 200, 40), // tall => heading + line("第一行正文", 10, 60, 200, 18), + line("第二行正文", 10, 82, 200, 18), + ]; + const blocks = deriveOcrStructure(lines); + assert.equal(blocks[0].type, "heading"); + assert.equal(blocks[0].text, "产品标题"); + assert.ok(blocks[0].level >= 1 && blocks[0].level <= 6); + // the two close body lines merge into one paragraph + const paras = blocks.filter((b) => b.type === "paragraph"); + assert.equal(paras.length, 1); + assert.ok(paras[0].text.includes("第一行正文") && paras[0].text.includes("第二行正文")); +} + +// 2. A large vertical gap splits paragraphs +{ + const lines = [ + line("段落一行一", 10, 10, 200, 18), + line("段落一行二", 10, 30, 200, 18), + line("段落二行一", 10, 200, 200, 18), // big gap => new paragraph + ]; + const blocks = deriveOcrStructure(lines); + assert.equal(blocks.filter((b) => b.type === "paragraph").length, 2, "large gap should split into two paragraphs"); +} + +// 3. 
CJK lines join without spaces; latin lines join with a space +{ + const cjk = deriveOcrStructure([line("你好", 0, 0, 50, 18), line("世界", 0, 20, 50, 18)]); + assert.equal(cjk[0].text, "你好世界", "adjacent CJK lines should join without a space"); + const latin = deriveOcrStructure([line("hello", 0, 0, 50, 18), line("world", 0, 20, 50, 18)]); + assert.equal(latin[0].text, "hello world", "latin lines should join with a space"); +} + +// 4. Reading order: out-of-order (but close) lines are sorted top->bottom before joining +{ + const blocks = deriveOcrStructure([ + line("下面", 0, 28, 50, 18), + line("上面", 0, 6, 50, 18), + ]); + assert.equal(blocks[0].text, "上面下面"); +} + +// 5. No-geometry fallback: lines without bbox collapse to a single paragraph (legacy behavior) +{ + const blocks = deriveOcrStructure([{ text: "a" }, { text: "b" }]); + assert.equal(blocks.length, 1); + assert.equal(blocks[0].type, "paragraph"); + assert.equal(blocks[0].text, "a\nb"); +} + +// 6. blocksFromOcrResult walks pages; empty falls back to fullText +{ + const result = { pages: [{ lines: [line("标题大字", 0, 0, 100, 40), line("正文", 0, 50, 100, 16)] }] }; + const blocks = blocksFromOcrResult(result); + assert.ok(blocks.some((b) => b.type === "heading")); + assert.equal(blocksFromOcrResult({ pages: [], fullText: "only text" })[0].text, "only text"); + assert.equal(blocksFromOcrResult({ pages: [] }).length, 0); +} + +// 7. 
Empty / whitespace lines are ignored +{ + assert.equal(deriveOcrStructure([{ text: " ", bbox: { x: 0, y: 0, w: 1, h: 1 } }]).length, 0); +} + +console.log("OCR structure test passed: heading detection by font size, paragraph grouping by vertical gap, CJK/latin line joining, reading-order sort, no-geometry fallback, and blocksFromOcrResult page walk verified."); diff --git a/scripts/paddle-ocr-integration-test.js b/scripts/paddle-ocr-integration-test.js new file mode 100644 index 0000000..ff8b1e4 --- /dev/null +++ b/scripts/paddle-ocr-integration-test.js @@ -0,0 +1,61 @@ +// Real PP-OCRv5 ONNX integration test: runs the real rec model in Node with onnxruntime-node to verify +// that the whole recognition pipeline (preprocessing + CTC + dictionary) is correct against a real model. +// +// Depends on onnxruntime-node + pngjs (heavy/native, not project runtime dependencies) + the local vendor models + +// dictionary + word-image fixture. If any is missing the test **skips gracefully** (exit 0), so the default `npm test` does not fail +// in environments without these dev dependencies or without downloaded models. +// +// To enable locally: npm i -D onnxruntime-node pngjs && npm run vendor:onnx + download the PP-OCRv5 models. 
+ +import assert from "node:assert/strict"; +import { readFileSync, existsSync } from "node:fs"; +import { createRequire } from "node:module"; + +const require = createRequire(import.meta.url); + +function tryRequire(name) { + try { return require(name); } catch { return null; } +} + +const ort = tryRequire("onnxruntime-node"); +const pngjs = tryRequire("pngjs"); +const REC = "public/vendor/paddleocr/rec.onnx"; +const DICT = "public/vendor/paddleocr/dict.txt"; +const FIXTURE = "samples/ocr/word-PAIN.png"; + +if (!ort || !pngjs) { + console.log("PP-OCRv5 integration test skipped: onnxruntime-node / pngjs not installed (dev-only). Install with `npm i -D onnxruntime-node pngjs`."); + process.exit(0); +} +if (!existsSync(REC) || !existsSync(DICT)) { + console.log("PP-OCRv5 integration test skipped: vendor models absent. 
Run `npm run vendor:onnx` + download PP-OCRv5 ONNX into public/vendor/paddleocr/."); + process.exit(0); +} +if (!existsSync(FIXTURE)) { + console.log(`PP-OCRv5 integration test skipped: fixture ${FIXTURE} missing.`); + process.exit(0); +} + +const P = await import("../public/core/ocr/paddle-ocr-pipeline.js"); + +const png = pngjs.PNG.sync.read(readFileSync(FIXTURE)); +const img = { data: new Uint8ClampedArray(png.data), width: png.width, height: png.height }; +const dict = P.parseCharDictionary(readFileSync(DICT, "utf8")); + +const rec = await ort.InferenceSession.create(REC); +const pre = P.preprocessForRecognition(img, {}); +const out = await rec.run({ [rec.inputNames[0]]: new ort.Tensor("float32", pre.data, pre.dims) }); +const o = out[rec.outputNames[0]]; +const C = o.dims[o.dims.length - 1]; +const T = o.dims[o.dims.length - 2]; + +// The rec output class count must equal the dictionary length (blank + chars + space); otherwise the alignment is off. +assert.equal(C, dict.length, `rec output classes (${C}) must equal dictionary length (${dict.length})`); + +const decoded = P.ctcGreedyDecode(o.data, T, C, dict); +const text = decoded.text.toUpperCase().replace(/\s+/g, ""); +// The ground truth of fixture word-PAIN.png is "PAIN". 
+assert.equal(text, "PAIN", `expected rec to read "PAIN", got "${decoded.text}"`); +assert.ok(decoded.confidence > 0.8, `expected high confidence, got ${decoded.confidence}`); + +console.log(`PP-OCRv5 integration test passed: real rec model decodes fixture -> "${decoded.text}" (conf ${decoded.confidence.toFixed(3)}, C=${C} matches dict).`); diff --git a/scripts/paddle-ocr-pipeline-test.js b/scripts/paddle-ocr-pipeline-test.js new file mode 100644 index 0000000..c41d0de --- /dev/null +++ b/scripts/paddle-ocr-pipeline-test.js @@ -0,0 +1,327 @@ +import assert from "node:assert/strict"; + +import { + parseCharDictionary, + preprocessForDetection, + preprocessForRecognition, + dbPostProcess, + ctcGreedyDecode, + cropImageData, + resizeRgba, + rotateImageData90, + rotateImageData180, + rotateImageDataByAngle, + 
estimateSkewAngle, + interpretClsOutput, + denoiseImageData, + estimateNoiseLevel, + runPaddlePipeline, + DET_LIMIT_SIDE_LEN, + REC_IMAGE_HEIGHT, +} from "../public/browser-transformer.js"; + +function solidRgba(value, width, height) { + const data = new Uint8ClampedArray(width * height * 4); + for (let i = 0; i < width * height; i += 1) { + data[i * 4] = value; + data[i * 4 + 1] = value; + data[i * 4 + 2] = value; + data[i * 4 + 3] = 255; + } + return { data, width, height }; +} + +// Mock onnxruntime namespace + sessions for end-to-end orchestration without real models. +const mockOrt = { Tensor: class { constructor(type, data, dims) { this.type = type; this.data = data; this.dims = dims; } } }; + +function mockSession(outputName, produce) { + return { + inputNames: ["x"], + outputNames: [outputName], + run: async (feeds) => ({ [outputName]: produce(feeds.x) }), + }; +} + +// 1. parseCharDictionary: blank at 0, lines preserved, trailing space appended +{ + const dict = parseCharDictionary("你\n好\nA"); + assert.deepEqual(dict, ["", "你", "好", "A", " "]); + assert.deepEqual(parseCharDictionary(""), ["", " "]); + // The dictionary file already ends with an explicit space line (e.g. ppu ppocrv5_dict) → do not append another one, otherwise there is one more class than the model has. + assert.deepEqual(parseCharDictionary("你\n好\n "), ["", "你", "好", " "]); + // The full-width space U+3000 is a legitimate token and must be preserved (do not drop it by mistake with trim()). + assert.deepEqual(parseCharDictionary("\u3000\nA"), ["", "\u3000", "A", " "]); +} + +// 2. 
preprocessForDetection: NCHW float, dims multiple of 32, scale ratios +{ + const det = preprocessForDetection(solidRgba(255, 20, 10)); + assert.equal(det.dims[0], 1); + assert.equal(det.dims[1], 3); + assert.equal(det.dims[2] % 32, 0); + assert.equal(det.dims[3] % 32, 0); + assert.equal(det.data.length, 3 * det.dims[2] * det.dims[3]); + assert.ok(det.scaleW > 0 && det.scaleH > 0); + // limit side len respected + const big = preprocessForDetection(solidRgba(0, 4000, 100)); + assert.ok(big.resizedWidth <= DET_LIMIT_SIDE_LEN + 32); +} + +// 3. 
preprocessForRecognition: fixed height 48, normalized to [-1,1] +{ + const rec = preprocessForRecognition(solidRgba(255, 100, 32)); + assert.equal(rec.dims[2], REC_IMAGE_HEIGHT); + assert.equal(rec.height, REC_IMAGE_HEIGHT); + // white pixel -> (255/255 - 0.5)/0.5 = 1 + assert.ok(Math.abs(rec.data[0] - 1) < 1e-6); +} + +// 4. resizeRgba + cropImageData +{ + const img = solidRgba(120, 10, 10); + const small = resizeRgba(img, 5, 5); + assert.equal(small.width, 5); + assert.equal(small.data.length, 5 * 5 * 4); + const crop = cropImageData(img, { x: 2, y: 2, w: 4, h: 4 }); + assert.equal(crop.width, 4); + assert.equal(crop.height, 4); + // crop clamps when out of range + const clamped = cropImageData(img, { x: 8, y: 8, w: 10, h: 10 }); + assert.ok(clamped.width <= 2 && clamped.height <= 2); +} + +// 5. dbPostProcess: connected component => one box; sub-threshold => none +{ + const prob = new Float32Array(36).fill(0); // 6x6 + // hot 3x3 block at (1,1)-(3,3) + for (let y = 1; y <= 3; y += 1) for (let x = 1; x <= 3; x += 1) prob[y * 6 + x] = 0.9; + // unclipRatio: 0 disables expansion so the connected-component bbox coordinates can be checked exactly + const boxes = dbPostProcess(prob, 6, 6, { thresh: 0.3, boxThresh: 0.5, minSize: 2, unclipRatio: 0, scaleW: 1, scaleH: 1 }); + assert.equal(boxes.length, 1); + assert.equal(boxes[0].x, 1); + assert.equal(boxes[0].y, 1); + assert.equal(boxes[0].w, 3); + assert.equal(boxes[0].h, 3); + + // with the default unclip > 0 the box should expand outward (covering the original bbox) so characters are not clipped + const unclipped = dbPostProcess(prob, 6, 6, { thresh: 0.3, boxThresh: 0.5, minSize: 2, scaleW: 1, scaleH: 1 }); + assert.equal(unclipped.length, 1); + assert.ok(unclipped[0].w >= 3 && unclipped[0].h >= 3, "unclip should not shrink the box"); + assert.ok(unclipped[0].x <= 1 && unclipped[0].y <= 1, "unclip should expand the box outward"); + + const none = dbPostProcess(new Float32Array(36).fill(0.1), 6, 6, { thresh: 0.3 }); + assert.equal(none.length, 0); +} + +// 6. 
dbPostProcess two separate blocks => two boxes sorted top->bottom +{ + const prob = new Float32Array(64).fill(0); // 8x8 + for (let y = 0; y <= 2; y += 1) for (let x = 0; x <= 2; x += 1) prob[y * 8 + x] = 0.8; // top-left + for (let y = 5; y <= 7; y += 1) for (let x = 5; x <= 7; x += 1) prob[y * 8 + x] = 0.8; // bottom-right + const boxes = dbPostProcess(prob, 8, 8, { thresh: 0.3, boxThresh: 0.5, minSize: 2, scaleW: 1, scaleH: 1 }); + assert.equal(boxes.length, 2); + assert.ok(boxes[0].y <= boxes[1].y, "boxes should be sorted top-to-bottom"); +} + +// 7. ctcGreedyDecode: collapse repeats + drop blank(0) +{ + const dict = ["", "a", "b", "c"]; + // T=5, C=4; argmax sequence: a,a,blank,b,c -> "abc" + const logits = new Float32Array([ + 0, 9, 0, 0, + 0, 9, 0, 0, + 9, 0, 0, 0, + 0, 0, 9, 0, + 0, 0, 0, 9, + ]); + const { text, confidence } = ctcGreedyDecode(logits, 5, 4, dict); + assert.equal(text, "abc"); + assert.ok(confidence > 0); + // all blank => empty + const blank = new Float32Array([9, 0, 0, 0, 9, 0, 0, 0]); + assert.equal(ctcGreedyDecode(blank, 2, 4, dict).text, ""); +} + +// 8. runPaddlePipeline end-to-end with mock ort + mock sessions => OCRResult with decoded text +{ + const dict = ["", "H", "I"]; + // det produces a prob map (same size as resized det input) with a hot block. 
+ const detSession = mockSession("det_out", (tensor) => { + const [, , H, W] = tensor.dims; + const data = new Float32Array(H * W).fill(0); + // hot region in the middle covering >= minSize + for (let y = Math.floor(H / 2) - 4; y < Math.floor(H / 2) + 4; y += 1) { + for (let x = Math.floor(W / 2) - 4; x < Math.floor(W / 2) + 4; x += 1) { + if (y >= 0 && y < H && x >= 0 && x < W) data[y * W + x] = 0.9; + } + } + return { data, dims: [1, 1, H, W] }; + }); + const clsSession = mockSession("cls_out", () => ({ data: new Float32Array([0.9, 0.1]), dims: [1, 2] })); + // rec produces logits decoding to "HI": T=3, C=3 -> H, I, blank + const recSession = mockSession("rec_out", () => ({ + data: new Float32Array([0, 9, 0, 0, 0, 9, 9, 0, 0]), + dims: [1, 3, 3], + })); + + const imageData = solidRgba(200, 64, 64); + const result = await runPaddlePipeline({ + ort: mockOrt, + detSession, + clsSession, + recSession, + imageData, + dictionary: dict, + options: { db: { thresh: 0.3, boxThresh: 0.5, minSize: 2 } }, + }); + assert.equal(result.schemaVersion, "trans2former.ocr-result.v1"); + assert.equal(result.engine, "paddleocr-v5"); + assert.ok(result.pages[0].lines.length >= 1, "pipeline should produce at least one recognized line"); + assert.equal(result.pages[0].lines[0].text, "HI"); + assert.ok(result.fullText.includes("HI")); + assert.ok(result.averageConfidence > 0); +} + +// 9. runPaddlePipeline validates ort + sessions +{ + await assert.rejects( + () => runPaddlePipeline({ ort: null, detSession: {}, recSession: {}, imageData: solidRgba(0, 4, 4) }), + (err) => err.code === "OCR_ENGINE_INVALID", + ); +} + +// 10. 
Rotation helpers: 180 is an involution; 90 swaps dims; corners map correctly +{ + // 2x3 image with a distinct top-left red pixel + const w = 2, h = 3; + const data = new Uint8ClampedArray(w * h * 4); + const setPx = (img, x, y, r) => { const o = (y * img.width + x) * 4; img.data[o] = r; img.data[o + 3] = 255; }; + const img = { data, width: w, height: h }; + setPx(img, 0, 0, 200); // mark top-left + + const r180 = rotateImageData180(img); + assert.equal(r180.width, w); + assert.equal(r180.height, h); + // top-left should now be at bottom-right + assert.equal(r180.data[((h - 1) * w + (w - 1)) * 4], 200); + // 180 is its own inverse + const back = rotateImageData180(r180); + assert.deepEqual(Array.from(back.data), Array.from(img.data)); + + const cw = rotateImageData90(img, "cw"); + assert.equal(cw.width, h); // dims swapped + assert.equal(cw.height, w); + // top-left (0,0) under CW goes to (height-1, 0) = (2,0) + assert.equal(cw.data[(0 * cw.width + (h - 1)) * 4], 200); + const ccw = rotateImageData90(img, "ccw"); + assert.equal(ccw.width, h); + assert.equal(ccw.height, w); +} + +// 11. interpretClsOutput: c1 high => flip; c0 high => no flip; below threshold => no flip +{ + assert.equal(interpretClsOutput([0.05, 0.95], 0.6).flip, true); + assert.equal(interpretClsOutput([0.95, 0.05], 0.6).flip, false); + assert.equal(interpretClsOutput([0.45, 0.55], 0.6).flip, false, "below threshold should not flip"); + assert.equal(interpretClsOutput([0.1, 0.9]).confidence, 0.9); +} + +// 12. 
runPaddlePipeline returns a quality assessment (grade + confidence stats) +{ + const dict = ["", "H", "I"]; + const detSession = mockSession("det_out", (tensor) => { + const [, , H, W] = tensor.dims; + const data = new Float32Array(H * W).fill(0); + for (let y = Math.floor(H / 2) - 4; y < Math.floor(H / 2) + 4; y += 1) + for (let x = Math.floor(W / 2) - 4; x < Math.floor(W / 2) + 4; x += 1) + if (y >= 0 && y < H && x >= 0 && x < W) data[y * W + x] = 0.9; + return { data, dims: [1, 1, H, W] }; + }); + const recSession = mockSession("rec_out", () => ({ data: new Float32Array([0, 9, 0, 0, 0, 9, 9, 0, 0]), dims: [1, 3, 3] })); + const result = await runPaddlePipeline({ + ort: mockOrt, detSession, recSession, imageData: solidRgba(200, 64, 64), dictionary: dict, + options: { db: { thresh: 0.3, boxThresh: 0.5, minSize: 2 } }, + }); + assert.ok(result.quality, "pipeline should attach a quality assessment"); + assert.equal(typeof result.quality.averageConfidence, "number"); + assert.equal(typeof result.quality.minConfidence, "number"); + assert.ok(["high", "medium", "low"].includes(result.quality.grade)); + assert.equal(result.quality.lineCount, result.pages[0].lines.length); +} + +// 13. estimateNoiseLevel: clean ~0; salt-and-pepper => high. denoiseImageData removes speckle. +{ + const w = 32, h = 32; + const clean = solidRgba(128, w, h); + assert.ok(estimateNoiseLevel(clean) < 0.02, "uniform image should have near-zero noise estimate"); + + // sprinkle salt-and-pepper + const noisy = { data: new Uint8ClampedArray(clean.data), width: w, height: h }; + let seed = 7; + const rnd = () => { seed = (seed * 1103515245 + 12345) & 0x7fffffff; return seed / 0x7fffffff; }; + for (let i = 0; i < w * h; i += 1) { + if (rnd() < 0.15) { const v = rnd() < 0.5 ? 
0 : 255; noisy.data[i * 4] = v; noisy.data[i * 4 + 1] = v; noisy.data[i * 4 + 2] = v; } + } + assert.ok(estimateNoiseLevel(noisy) > 0.05, "salt-and-pepper image should exceed denoise threshold"); + + // median denoise reduces the speckle measure + const cleaned = denoiseImageData(noisy); + assert.equal(cleaned.width, w); + assert.ok(estimateNoiseLevel(cleaned) < estimateNoiseLevel(noisy), "denoise should reduce the noise estimate"); +} + +// 14. auto-denoise gating: clean stays untouched; quality reports denoised flag +{ + const dict = ["", "H", "I"]; + const detSession = mockSession("det_out", (tensor) => { + const [, , H, W] = tensor.dims; + const data = new Float32Array(H * W).fill(0); + for (let y = Math.floor(H / 2) - 4; y < Math.floor(H / 2) + 4; y += 1) + for (let x = Math.floor(W / 2) - 4; x < Math.floor(W / 2) + 4; x += 1) + if (y >= 0 && y < H && x >= 0 && x < W) data[y * W + x] = 0.9; + return { data, dims: [1, 1, H, W] }; + }); + const recSession = mockSession("rec_out", () => ({ data: new Float32Array([0, 9, 0, 0, 0, 9, 9, 0, 0]), dims: [1, 3, 3] })); + const r = await runPaddlePipeline({ + ort: mockOrt, detSession, recSession, imageData: solidRgba(200, 64, 64), dictionary: dict, + options: { db: { thresh: 0.3, boxThresh: 0.5, minSize: 2 } }, + }); + assert.equal(r.quality.denoised, false, "clean uniform image should not be denoised"); + assert.equal(typeof r.quality.noiseLevel, "number"); +} + +// 15. rotateImageDataByAngle: 0deg is identity-ish; expands canvas for non-zero angles +{ + const img = solidRgba(120, 10, 8); + const same = rotateImageDataByAngle(img, 0); + assert.equal(same.width, 10); + assert.equal(same.height, 8); + const rot = rotateImageDataByAngle(img, 30); + assert.ok(rot.width >= 10 && rot.height >= 8, "rotation should expand the canvas"); + assert.equal(rot.data.length, rot.width * rot.height * 4); +} + +// 16. 
estimateSkewAngle: a synthetic horizontal-lines prob map skewed by +A is detected ~A +{ + const W = 120, H = 120; + // build a prob map with horizontal text rows, then shear it by angle A + const A = 8; + const t = Math.tan((A * Math.PI) / 180); + const prob = new Float32Array(W * H).fill(0); + for (let row = 20; row < H; row += 20) { + for (let x = 10; x < W - 10; x += 1) { + const y = Math.round(row + x * t); // shear -> slanted rows + if (y >= 0 && y < H) prob[y * W + x] = 1; + } + } + const est = estimateSkewAngle(prob, W, H, { maxAngle: 15, step: 1, thresh: 0.3 }); + assert.ok(Math.abs(est) >= 3, `should detect a non-trivial skew, got ${est}`); + assert.ok(Math.sign(est) === Math.sign(A) || est === A || Math.abs(est - A) <= 3, `skew estimate ${est} should be near +${A}`); + + // flat horizontal rows => ~0 skew + const flat = new Float32Array(W * H).fill(0); + for (let row = 20; row < H; row += 20) for (let x = 10; x < W - 10; x += 1) flat[row * W + x] = 1; + assert.ok(Math.abs(estimateSkewAngle(flat, W, H, { maxAngle: 15 })) <= 2, "flat rows should estimate ~0 skew"); +} + +console.log("PP-OCRv5 pipeline test passed: dictionary, det/rec preprocessing, resize/crop, DB postprocess + unclip, CTC greedy decode, rotation helpers + cls interpretation, denoise (noise estimate + median + auto-gating), skew estimation + arbitrary rotation, quality assessment, and mock-session end-to-end runPaddlePipeline verified."); diff --git a/scripts/paddleocr-models.manifest.json b/scripts/paddleocr-models.manifest.json new file mode 100644 index 0000000..0b098aa --- /dev/null +++ b/scripts/paddleocr-models.manifest.json @@ -0,0 +1,34 @@ +{ + "source": { + "repo": "PT-Perkasa-Pilar-Utama/ppu-paddle-ocr-models", + "commit": "3a180da5b1a3bab3371d970f4da42cb9b354a9a7", + "license": "Apache-2.0", + "note": "PP-OCRv5 mobile ONNX (onnx sourced from paddleocr.ai). 
Direction classifier (cls.onnx) is optional and intentionally NOT bundled; users may import their own via the security center." + }, + "files": [ + { + "target": "det.onnx", + "kind": "lfs", + "remotePath": "detection/PP-OCRv5_mobile_det_infer.onnx", + "required": true, + "size": 4748769, + "sha256": "d7fe3ea74652890722c0f4d02458b7261d9f5ae6c92904d05707c9eb155c7924" + }, + { + "target": "rec.onnx", + "kind": "lfs", + "remotePath": "recognition/PP-OCRv5_mobile_rec_infer.onnx", + "required": true, + "size": 16559278, + "sha256": "d253c3cbee6e507828a5271a30ab0ec8ae7c2a99d0cc8e6f844fe380809d22b3" + }, + { + "target": "dict.txt", + "kind": "raw", + "remotePath": "recognition/ppocrv5_dict.txt", + "required": true, + "size": 74014, + "sha256": "9dfc80c50b6cb07399a47a7cf25d11db475fb4ad0e1fc96b2eff6467c8166ff3" + } + ] +} diff --git a/scripts/release-readiness-test.js b/scripts/release-readiness-test.js index f493878..a7a001d 100644 --- a/scripts/release-readiness-test.js +++ b/scripts/release-readiness-test.js @@ -41,8 +41,10 @@ for (const file of REQUIRED_FILES) { const packageJson = JSON.parse(await readFile("package.json", "utf8")); assert.equal(packageJson.scripts["vendor:pdfjs"], "node scripts/sync-pdfjs-vendor.js"); -assert.equal(packageJson.scripts["release:prepare"], "node scripts/sync-pdfjs-vendor.js && node scripts/sync-tesseract-vendor.js && node scripts/prepare-release.js"); +assert.equal(packageJson.scripts["release:prepare"], "node scripts/sync-pdfjs-vendor.js && node scripts/sync-tesseract-vendor.js && node scripts/sync-onnxruntime-vendor.js && node scripts/sync-paddleocr-vendor.js && node scripts/prepare-release.js"); assert.equal(packageJson.scripts["vendor:tesseract"], "node scripts/sync-tesseract-vendor.js"); +assert.equal(packageJson.scripts["vendor:onnx"], "node scripts/sync-onnxruntime-vendor.js"); +assert.equal(packageJson.scripts["vendor:paddle"], "node scripts/sync-paddleocr-vendor.js"); const releasePrep = await readFile("docs/RELEASE_PREP.md", 
"utf8"); for (const requiredText of [ diff --git a/scripts/resource-budget-test.js b/scripts/resource-budget-test.js index a997699..8927686 100644 --- a/scripts/resource-budget-test.js +++ b/scripts/resource-budget-test.js @@ -3,14 +3,19 @@ import { readFile, readdir, stat } from "node:fs/promises"; import path from "node:path"; const BUDGETS = [ - { path: "public/core", maxBytes: 256 * 1024 }, + // public/core 是纯 JS 算法核心(转换/路由/Repair/三层检验/OCR 前后处理管线), + // 不含任何模型权重——模型只进 model-cache、按需导入。P9-C 三层检验 + P9-D PP-OCRv5 + // 推理管线(DB 后处理 + CTC 解码等纯函数)合理扩容到 320KB,仍远小于任何带权重方案。 + { path: "public/core", maxBytes: 320 * 1024 }, { path: "public/formats", maxBytes: 512 * 1024 }, { path: "public/workers", maxBytes: 128 * 1024 }, { path: "scripts", maxBytes: 512 * 1024 }, { path: "public", maxBytes: 2 * 1024 * 1024, exclude: ["public/vendor"] }, - // vendored PDF.js(main + worker + cmaps + standard_fonts)属于按需的可选引擎, - // 不应挤占核心主预算,但本身仍要有上限避免漂移。 - { path: "public/vendor", maxBytes: 6 * 1024 * 1024 }, + // vendored 引擎/模型属于按需的可选资源,不挤占核心主预算,但仍设上限防漂移。 + // 含:pdfjs(~4MB) + onnxruntime-web 最小 JSEP 构建(~25MB) + tesseract.js core 全 SIMD/LSTM + // 变体(~30MB) + PP-OCRv5 mobile det/rec + 字典(~21MB) + KaTeX(~1MB) ≈ 80MB。这些不入 git + // (见 .gitignore),由 vendor 脚本 + 本地下载重建,随应用打包。上限留约 20% 余量防漂移。 + { path: "public/vendor", maxBytes: 96 * 1024 * 1024 }, ]; const FORBIDDEN_DEPENDENCIES = [ diff --git a/scripts/rule-diff-test.js b/scripts/rule-diff-test.js new file mode 100644 index 0000000..9e2c3e4 --- /dev/null +++ b/scripts/rule-diff-test.js @@ -0,0 +1,201 @@ +import assert from "node:assert/strict"; + +import { + convertContent, + diffSemanticDocs, + runVerificationStage, + blockFingerprint, + modelFingerprint, + ROUND_TRIP_FORMATS, + RULE_DIFF_DRIFT, + RULE_DIFF_READBACK_FAILED, +} from "../public/browser-transformer.js"; + +function block(overrides = {}) { + return { + id: overrides.id, + type: "paragraph", + text: "Hello world", + warnings: [], + sourceSpan: { startLine: null, endLine: null, 
startOffset: null, endOffset: null }, + ...overrides, + }; +} + +function model(blocks) { + return { + schemaVersion: "trans2former.document.v1", + title: "diff-test", + sourceFormat: "md", + blocks, + assets: [], + metadata: { warnings: [], qualityReport: {} }, + }; +} + +// Reference implementation of the legacy fingerprint, kept for byte-level equivalence check. +function legacyBlockFingerprint(b) { + if (!b || typeof b !== "object") return ""; + if (b.type === "heading") return `h${b.level}|${b.text || ""}`; + if (b.type === "paragraph" || b.type === "quote") return `${b.type}|${b.text || ""}`; + if (b.type === "code") return `code|${b.language || ""}|${b.code || ""}`; + if (b.type === "list") return `list|${b.ordered ? "ol" : "ul"}|${(b.items || []).join("")}`; + if (b.type === "table") { + return `table|${(b.headers || []).join("")}|${(b.rows || []).map((row) => (row || []).join("")).join("")}`; + } + if (b.type === "image" || b.type === "asset") { + return `${b.type}|${b.src || ""}|${b.alt || ""}|${b.assetId || ""}`; + } + if (b.type === "raw") return `raw|${b.format || ""}|${b.content || ""}`; + return b.type || ""; +} + +// 1. Identical models → exact / score 1 +{ + const original = model([block({ id: "b1" }), block({ id: "b2", type: "heading", level: 1, text: "Title" })]); + const readBack = model([block({ id: "b1" }), block({ id: "b2", type: "heading", level: 1, text: "Title" })]); + const diff = diffSemanticDocs(original, readBack); + assert.equal(diff.identical, true); + assert.equal(diff.fidelity, "exact"); + assert.equal(diff.overallScore, 1); + assert.equal(diff.changedBlocks.length, 0); + assert.equal(diff.addedBlocks.length, 0); + assert.equal(diff.removedBlocks.length, 0); +} + +// 2. Whitespace/punct-only text delta → minor-drift, severity minor +{ + const original = model([block({ id: "b1", text: "Hello, world!" 
})]); + const readBack = model([block({ id: "b1", text: "Hello world" })]); + const diff = diffSemanticDocs(original, readBack); + assert.equal(diff.fidelity, "minor-drift"); + assert.equal(diff.changedBlocks.length, 1); + assert.deepEqual(diff.changedBlocks[0].fieldsDiffered.map((f) => f.field), ["text"]); + assert.equal(diff.changedBlocks[0].severity, "minor"); +} + +// 2b. Substantive text change → major (text field semantic change) +{ + const original = model([block({ id: "b1", text: "The quick brown fox" })]); + const readBack = model([block({ id: "b1", text: "Totally different sentence here" })]); + const diff = diffSemanticDocs(original, readBack); + assert.equal(diff.changedBlocks[0].severity, "major"); + assert.equal(diff.fidelity, "major-drift"); +} + +// 3. Heading level change → major-drift +{ + const original = model([block({ id: "h", type: "heading", level: 1, text: "Same" })]); + const readBack = model([block({ id: "h", type: "heading", level: 2, text: "Same" })]); + const diff = diffSemanticDocs(original, readBack); + assert.equal(diff.fidelity, "major-drift"); + assert.ok(diff.changedBlocks[0].fieldsDiffered.some((f) => f.field === "level" && f.severity === "major")); +} + +// 4. Missing + extra block, >30% structural delta → broken +{ + const original = model([ + block({ id: "a", text: "alpha" }), + block({ id: "b", text: "beta" }), + ]); + const readBack = model([ + block({ id: "a", text: "alpha" }), + block({ id: "c", text: "gamma" }), + ]); + const diff = diffSemanticDocs(original, readBack); + assert.equal(diff.removedBlocks.length, 1); + assert.equal(diff.addedBlocks.length, 1); + assert.equal(diff.removedBlocks[0].id, "b"); + assert.equal(diff.addedBlocks[0].id, "c"); + assert.equal(diff.fidelity, "broken"); +} + +// 5. 
runVerificationStage with mock ctx.read returning identical model → ruleDiff.identical, no warnings +{ + const original = model([block({ id: "b1", text: "stable" })]); + const ctx = { + from: "md", + to: "md", + read: () => model([block({ id: "b1", text: "stable" })]), + }; + const result = runVerificationStage({ model: original, output: { type: "text", format: "md", data: "stable" }, ctx }); + assert.equal(result.eligible, true); + assert.deepEqual(result.layers, ["rule-diff"]); + assert.equal(result.ruleDiff.identical, true); + assert.equal(result.warnings.length, 0); +} + +// 6. runVerificationStage with ctx.read throwing → RULE_DIFF_READBACK_FAILED warning, no throw +{ + const original = model([block({ id: "b1" })]); + const ctx = { + from: "md", + to: "md", + read: () => { throw new Error("boom"); }, + }; + const result = runVerificationStage({ model: original, output: { type: "text", format: "md", data: "x" }, ctx }); + assert.equal(result.eligible, true); + assert.equal(result.ruleDiff, null); + assert.equal(result.warnings.length, 1); + assert.equal(result.warnings[0].code, RULE_DIFF_READBACK_FAILED); +} + +// 7. runVerificationStage not eligible for non-text-canonical writer +{ + const original = model([block({ id: "b1" })]); + const ctx = { + from: "md", + to: "pptx", + read: () => original, + }; + const result = runVerificationStage({ model: original, output: { type: "binary", format: "pptx", data: "" }, ctx }); + assert.equal(result.eligible, false); + assert.equal(result.reason, "writer-not-text-canonical"); + assert.deepEqual(result.layers, []); + assert.equal(result.ruleDiff, null); + assert.equal(result.skipped[0].layer, "rule-diff"); + assert.equal(ROUND_TRIP_FORMATS.has("pptx"), false); +} + +// 8. 
End-to-end md -> md → ruleDiff.identical, verification.layers +{ + const result = convertContent({ content: "# Title\n\nBody text.", from: "md", to: "md", title: "e2e.md" }); + assert.equal(result.quality.qualityReport.ruleDiff.identical, true); + assert.deepEqual(result.quality.qualityReport.verification.layers, ["rule-diff"]); + assert.equal(result.quality.qualityReport.verification.eligible, true); +} + +// 9. End-to-end md -> html cross-format loopback runs, ruleDiff non-null +{ + const result = convertContent({ content: "# A\n\nB", from: "md", to: "html", title: "e2e.html" }); + assert.equal(result.quality.qualityReport.verification.eligible, true); + assert.notEqual(result.quality.qualityReport.ruleDiff, null); + assert.deepEqual(result.quality.qualityReport.verification.layers, ["rule-diff"]); +} + +// 10. End-to-end md -> pdf not eligible + shared fingerprint byte-equivalence +{ + const result = convertContent({ content: "# Hi", from: "md", to: "pdf", title: "e2e.pdf" }); + assert.equal(result.quality.qualityReport.ruleDiff, null); + assert.equal(result.quality.qualityReport.verification.eligible, false); + assert.equal(result.quality.qualityReport.verification.skipped[0].reason, "writer-not-text-canonical"); + + // shared fingerprint must match legacy implementation byte-for-byte + const samples = [ + block({ id: "p", text: "para" }), + block({ id: "h", type: "heading", level: 3, text: "Head" }), + { type: "code", language: "js", code: "x=1" }, + { type: "list", ordered: true, items: ["one", "two"] }, + { type: "table", headers: ["a", "b"], rows: [["1", "2"], ["3", "4"]] }, + { type: "image", src: "i.png", alt: "img" }, + { type: "raw", format: "html", content: "x" }, + ]; + for (const sample of samples) { + assert.equal(blockFingerprint(sample), legacyBlockFingerprint(sample), `fingerprint drift for ${sample.type}`); + } + const m = model(samples); + assert.equal(modelFingerprint(m), samples.map(legacyBlockFingerprint).join("")); + assert.equal(typeof 
RULE_DIFF_DRIFT, "string"); +} + +console.log("Rule diff test passed: diffSemanticDocs units, verification-stage gating/readback, end-to-end md->md/html/pdf, fingerprint equivalence covered."); diff --git a/scripts/sample-corpus-test.js b/scripts/sample-corpus-test.js new file mode 100644 index 0000000..a98d5f6 --- /dev/null +++ b/scripts/sample-corpus-test.js @@ -0,0 +1,93 @@ +import assert from "node:assert/strict"; + +import { + TEXT_BUILDERS, + SIZE_TIERS, + buildComplexMarkdown, + buildComplexCsv, + buildToTargetBytes, +} from "./lib/sample-content.js"; +import { buildPatternPng } from "./lib/png-encode.js"; +import { convertContent, getAllowedOutputFormats } from "../public/browser-transformer.js"; + +// Fast gate: validates generator logic and cross-format readability at small scale only; writes no 3MB files and nothing to disk. +// The real large sample corpus is produced by `npm run samples:generate` into samples/generated/ (gitignored). + +// 1. Text builders produce non-empty, deterministic output at small scale (same input, same output). +{ + for (const [key, builder] of Object.entries(TEXT_BUILDERS)) { + const a = builder(1); + const b = builder(1); + assert.equal(typeof a, "string"); + assert.ok(a.length > 0, `${key} builder should produce non-empty content`); + assert.equal(a, b, `${key} builder must be deterministic`); + } +} + +// 2. Larger scale → larger content (used for size tiers). +{ + const small = buildComplexMarkdown(SIZE_TIERS.small); + const medium = buildComplexMarkdown(SIZE_TIERS.medium); + assert.ok(medium.length > small.length * 10, "medium markdown should be much larger than small"); +} + +// 3. Complex markdown covers key structures (table/code/task list/footnote/CJK/RTL). +{ + const md = buildComplexMarkdown(1); + assert.ok(md.includes("|"), "should contain table"); + assert.ok(md.includes("```"), "should contain code fence"); + assert.ok(md.includes("- [x]"), "should contain task list"); + assert.ok(md.includes("[^note1]"), "should contain footnote"); + assert.ok(/[一-鿿]/.test(md), "should contain CJK"); + assert.ok(/[؀-ۿ]/.test(md), "should contain Arabic/RTL"); +} + +// 4. 
CSV builder includes fields with quotes/commas/newlines (edge-case CSV). +{ + const csv = buildComplexCsv(1); + assert.ok(csv.includes("\"\""), "CSV should contain escaped quotes"); + assert.ok(csv.split("\n").length > 1, "CSV should have multiple rows"); +} + +// 5. Small-scale text source converts to every target format in the product matrix with non-empty output (readability regression). +{ + const md = buildComplexMarkdown(1); + const targets = getAllowedOutputFormats("md"); + assert.ok(targets.length > 0); + for (const to of targets) { + const result = convertContent({ content: md, from: "md", to, title: "corpus", options: { repair: false } }); + const data = result?.data ?? result; + assert.ok(data && (typeof data === "string" ? data.length > 0 : data.length > 0), `md -> ${to} should be non-empty`); + } +} + +// 6. csv -> xlsx/json/md/html works (structured-data chain). +{ + const csv = buildComplexCsv(2); + for (const to of ["xlsx", "json", "md", "html"]) { + const result = convertContent({ content: csv, from: "csv", to, title: "corpus-csv", options: { repair: false } }); + const data = result?.data ?? result; + assert.ok(data, `csv -> ${to} should produce output`); + } +} + +// 7. PNG encoder emits a valid PNG signature header, and output grows with image size. +{ + const small = buildPatternPng(16, 16); + const big = buildPatternPng(64, 64); + const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]; + for (let i = 0; i < signature.length; i += 1) { + assert.equal(small[i], signature[i], "PNG signature byte mismatch"); + } + assert.ok(big.length > small.length, "larger PNG should have more bytes"); +} + +// 8. 
buildToTargetBytes reaches at least the target byte size (used for the large ≥ 3MB tier). +{ + const target = 200 * 1024; + const { content, scale } = buildToTargetBytes(buildComplexMarkdown, target); + assert.ok(Buffer.byteLength(content, "utf8") >= target, "should reach target bytes"); + assert.ok(scale > 1, "scale should grow to reach target"); +} + +console.log("Sample corpus test passed: deterministic complex builders, size scaling, CJK/RTL/table/code coverage, cross-format readability, PNG encoder, target-byte scaling verified."); diff --git a/scripts/ssim-verification-test.js b/scripts/ssim-verification-test.js new file mode 100644 index 0000000..45fd340 --- /dev/null +++ b/scripts/ssim-verification-test.js @@ -0,0 +1,181 @@ +import assert from "node:assert/strict"; + +import { + computeSSIM, + compareImages, + rgbaToGrayscale, + resampleGrayscale, + runSsimLayer, + runVerificationStageAsync, + defaultPageImageSource, + setPageImageSource, + resetPageImageSource, + RASTERIZABLE_FORMATS, + SSIM_VISUAL_DRIFT, + VERIFICATION_IMAGE_SOURCE_UNAVAILABLE, + convertContentAsync, +} from "../public/browser-transformer.js"; + +function solidImage(value, width = 32, height = 32) { + const pixels = new Uint8ClampedArray(width * height * 4); + for (let i = 0; i < width * height; i += 1) { + pixels[i * 4] = value; + pixels[i * 4 + 1] = value; + pixels[i * 4 + 2] = value; + pixels[i * 4 + 3] = 255; + } + return { pixels, width, height }; +} + +function gradientImage(width = 32, height = 32, offset = 0) { + // Smooth horizontal gradient (low frequency); stays stable when resampled to a different grid. + const pixels = new Uint8ClampedArray(width * height * 4); + for (let y = 0; y < height; y += 1) { + for (let x = 0; x < width; x += 1) { + const v = Math.min(255, Math.round((x / Math.max(1, width - 1)) * 255) + offset); + const o = (y * width + x) * 4; + pixels[o] = v; + pixels[o + 1] = v; + pixels[o + 2] = v; + pixels[o + 3] = 255; + } + } + return { pixels, width, height }; +} + +// 1. 
rgbaToGrayscale length + luminance +{ + const gray = rgbaToGrayscale(solidImage(120, 4, 4).pixels); + assert.equal(gray.length, 16); + assert.equal(gray[0], 120); +} + +// 2. computeSSIM identical buffers -> 1 +{ + const gray = rgbaToGrayscale(gradientImage(16, 16).pixels); + const result = computeSSIM(gray, gray, 16, 16, { windowSize: 8 }); + assert.ok(result.score > 0.999, `identical SSIM should be ~1, got ${result.score}`); +} + +// 3. compareImages identical -> ~1, black vs white -> near 0 +{ + const black = solidImage(0); + const white = solidImage(255); + assert.ok(compareImages(black, black).score > 0.999); + assert.ok(compareImages(black, white).score < 0.05, "black vs white should be near zero"); +} + +// 4. compareImages monotonic: closer image scores higher than further +{ + const base = gradientImage(32, 32, 0); + const near = gradientImage(32, 32, 8); + const far = solidImage(0, 32, 32); + const nearScore = compareImages(base, near, { targetWidth: 32 }).score; + const farScore = compareImages(base, far, { targetWidth: 32 }).score; + assert.ok(nearScore > farScore, `near (${nearScore}) should beat far (${farScore})`); + assert.ok(nearScore < 1, "a perturbed image should be below 1"); +} + +// 5. resampleGrayscale resizes to requested grid + same-size passthrough +{ + const gray = rgbaToGrayscale(gradientImage(16, 16).pixels); + const down = resampleGrayscale(gray, 16, 16, 8, 8); + assert.equal(down.length, 64); + const same = resampleGrayscale(gray, 16, 16, 16, 16); + assert.equal(same.length, 256); +} + +// 6. compareImages handles mismatched dimensions by normalizing to a common grid +{ + const big = gradientImage(64, 64); + const small = gradientImage(16, 16); + const cmp = compareImages(big, small, { targetWidth: 32 }); + assert.equal(cmp.dimensionsMatched, false); + assert.ok(cmp.score > 0.5, "same gradient at different sizes should still be fairly similar"); +} + +// 7. 
runSsimLayer with stub image source -> eligible, score, passed +{ + setPageImageSource({ getPageImage: async ({ format }) => (format === "pdf" ? solidImage(100) : solidImage(104)) }); + const layer = await runSsimLayer({ ctx: { from: "png", to: "pdf", content: "", options: {} }, output: { data: "" } }); + assert.equal(layer.eligible, true); + assert.equal(typeof layer.ssim.score, "number"); + assert.equal(layer.ssim.sourceFormat, "png"); + assert.equal(layer.ssim.outputFormat, "pdf"); + resetPageImageSource(); +} + +// 8. runSsimLayer drift below threshold -> SSIM_VISUAL_DRIFT warning +{ + setPageImageSource({ getPageImage: async ({ format }) => (format === "pdf" ? solidImage(0) : solidImage(255)) }); + const layer = await runSsimLayer({ ctx: { from: "png", to: "pdf", content: "", options: { verification: { ssimThreshold: 0.9 } } }, output: { data: "" } }); + assert.equal(layer.eligible, true); + assert.equal(layer.ssim.passed, false); + assert.equal(layer.warnings.length, 1); + assert.equal(layer.warnings[0].code, SSIM_VISUAL_DRIFT); + resetPageImageSource(); +} + +// 9. runSsimLayer not eligible for non-rasterizable path +{ + const layer = await runSsimLayer({ ctx: { from: "md", to: "pdf", content: "# Hi", options: {} }, output: { data: "" } }); + assert.equal(layer.eligible, false); + assert.equal(layer.reason, "source-not-rasterizable"); + assert.equal(layer.ssim, null); + assert.equal(RASTERIZABLE_FORMATS.has("md"), false); + assert.equal(RASTERIZABLE_FORMATS.has("pdf"), true); +} + +// 10. 
defaultPageImageSource throws image-source-unavailable in Node (no DOM) +{ + resetPageImageSource(); + await assert.rejects( + () => defaultPageImageSource.getPageImage({ format: "pdf", content: "" }), + (err) => err.code === VERIFICATION_IMAGE_SOURCE_UNAVAILABLE, + "Node default image source must report unavailable", + ); + // and runSsimLayer surfaces that as eligible:false without throwing + const layer = await runSsimLayer({ ctx: { from: "png", to: "pdf", content: "", options: {} }, output: { data: "" } }); + assert.equal(layer.eligible, false); + assert.equal(layer.reason, "image-source-unavailable"); +} + +// 11. runVerificationStageAsync merges rule-diff base + ssim layer +{ + setPageImageSource({ getPageImage: async ({ format }) => (format === "pdf" ? solidImage(100) : solidImage(100)) }); + const env = await runVerificationStageAsync({ + model: { blocks: [] }, + output: { data: "" }, + ctx: { from: "png", to: "pdf", content: "", read: () => ({ blocks: [] }), options: {} }, + }); + assert.ok(env.layers.includes("ssim")); + // the pdf writer is not text-canonical so rule-diff is skipped + assert.ok(env.skipped.some((s) => s.layer === "rule-diff")); + assert.notEqual(env.ssim, null); + resetPageImageSource(); +} + +// 12. End-to-end convertContentAsync: ssim stays null on the text path; png->pdf populates ssim via stub +{ + // text path (md->md): ssim must be null, rule-diff present + resetPageImageSource(); + const textResult = await convertContentAsync({ content: "# Title\n\nBody", from: "md", to: "md", title: "e2e" }); + assert.equal(textResult.quality.qualityReport.ssim, null); + assert.equal(textResult.quality.qualityReport.ruleDiff.identical, true); + + // visual path (png->pdf) with stub image source: ssim populated + setPageImageSource({ getPageImage: async ({ format }) => (format === "pdf" ? 
solidImage(120) : solidImage(125)) }); + const visualResult = await convertContentAsync({ + content: "data:image/png;base64,AAAA", + from: "png", + to: "pdf", + title: "e2e-visual", + options: { ocr: { enabled: false } }, + }); + assert.notEqual(visualResult.quality.qualityReport.ssim, null); + assert.equal(visualResult.quality.qualityReport.ssim.sourceFormat, "png"); + assert.ok(visualResult.quality.qualityReport.verification.layers.includes("ssim")); + resetPageImageSource(); +} + +console.log("SSIM verification test passed: ssim core (grayscale/resample/computeSSIM/compareImages), ssim layer gating/drift/unavailable, async envelope merge, end-to-end null-on-text + populated-on-visual covered."); diff --git a/scripts/sync-onnxruntime-vendor.js b/scripts/sync-onnxruntime-vendor.js new file mode 100644 index 0000000..844f93c --- /dev/null +++ b/scripts/sync-onnxruntime-vendor.js @@ -0,0 +1,60 @@ +import { access, copyFile, mkdir, readdir, stat } from "node:fs/promises"; +import path from "node:path"; + +// Modeled on sync-tesseract-vendor: syncs the onnxruntime-web runtime assets (ort*.mjs + *.wasm) into +// public/vendor/onnxruntime/ so the PP-OCRv5 advanced OCR can load them same-origin in the browser/Tauri. +// onnxruntime-web is an optionalDependency; when it is missing, exit 0 without blocking release:prepare or install. + +const ROOT = process.cwd(); +const ORT_DIST = path.join(ROOT, "node_modules", "onnxruntime-web", "dist"); +const TARGET_DIR = path.join(ROOT, "public", "vendor", "onnxruntime"); + +async function pathExists(p) { + try { + await access(p); + return true; + } catch { + return false; + } +} + +async function main() { + if (!(await pathExists(ORT_DIST))) { + console.warn("[sync-onnxruntime-vendor] onnxruntime-web is not installed (optionalDependency missing). 
Skipping vendor sync."); + console.warn("[sync-onnxruntime-vendor] Run `npm install onnxruntime-web` to enable the PP-OCRv5 advanced OCR runtime."); + return; + } + + await mkdir(TARGET_DIR, { recursive: true }); + + // Sync only the minimal set the runtime actually needs: the `ort.min.mjs` entry point plus the JSEP build it loads + // (`ort-wasm-simd-threaded.jsep.{mjs,wasm}`, which supports both the WebGPU and WASM execution backends). + // The remaining all/bundle/jspi/asyncify/plain variants (~68MB of redundancy) stay out of vendor to avoid bloating the app. + const KEEP = new Set([ + "ort.min.mjs", + "ort-wasm-simd-threaded.jsep.mjs", + "ort-wasm-simd-threaded.jsep.wasm", + ]); + const entries = await readdir(ORT_DIST); + let copied = 0; + for (const entry of entries) { + if (!KEEP.has(entry)) continue; + const source = path.join(ORT_DIST, entry); + const info = await stat(source); + if (!info.isFile()) continue; + await copyFile(source, path.join(TARGET_DIR, entry)); + copied += 1; + } + + if (copied === 0) { + console.warn("[sync-onnxruntime-vendor] No onnxruntime-web runtime assets found in dist/. PP-OCRv5 will stay unavailable until installed."); + return; + } + + console.log(`onnxruntime-web vendor synced to public/vendor/onnxruntime/ (${copied} files).`); +} + +main().catch((error) => { + console.warn(`[sync-onnxruntime-vendor] sync failed: ${error?.message || error}`); + process.exitCode = 0; +}); diff --git a/scripts/sync-paddleocr-vendor.js b/scripts/sync-paddleocr-vendor.js new file mode 100644 index 0000000..2c51444 --- /dev/null +++ b/scripts/sync-paddleocr-vendor.js @@ -0,0 +1,106 @@ +import { createHash } from "node:crypto"; +import { mkdir, readFile, rm, stat, writeFile } from "node:fs/promises"; +import path from "node:path"; +import { fileURLToPath } from "node:url"; + +// Downloads the PP-OCRv5 mobile models (det/rec + dictionary) from the ppu-paddle-ocr-models repo at a pinned commit into +// public/vendor/paddleocr/ so the advanced OCR loads them automatically at startup and works out of the box. Each file's SHA-256 is verified against +// scripts/paddleocr-models.manifest.json (committed, reproducible). +// +// Follows the same "non-blocking" principle as sync-onnxruntime-vendor: +// - Already present and verified -> skip (idempotent). +// - Network/HTTP failure (offline, source unreachable) -> warn + exit 0, without blocking npm install / 
release:prepare. +// - Bytes downloaded but size/SHA-256 mismatch -> delete the partial file + exit non-zero (integrity problems must be reported, never silently accepted). +// +// Note: this script uses the network at build time; the app itself stays fully offline at runtime (the models already sit in the same-origin vendor directory). +// The orientation classifier cls.onnx is optional and not bundled; import it manually in the Security Center if 180° correction is needed. + +const __dirname = path.dirname(fileURLToPath(import.meta.url)); +const ROOT = process.cwd(); +const MANIFEST_PATH = path.join(__dirname, "paddleocr-models.manifest.json"); +const TARGET_DIR = path.join(ROOT, "public", "vendor", "paddleocr"); + +function remoteUrl(source, file) { + const { repo, commit } = source; + if (file.kind === "lfs") { + return `https://media.githubusercontent.com/media/${repo}/${commit}/${file.remotePath}`; + } + return `https://raw.githubusercontent.com/${repo}/${commit}/${file.remotePath}`; +} + +function sha256(buffer) { + return createHash("sha256").update(buffer).digest("hex"); +} + +async function fileMatches(destPath, file) { + try { + const info = await stat(destPath); + if (!info.isFile() || info.size !== file.size) return false; + const buffer = await readFile(destPath); + return sha256(buffer) === file.sha256; + } catch { + return false; + } +} + +async function main() { + let manifest; + try { + manifest = JSON.parse(await readFile(MANIFEST_PATH, "utf8")); + } catch (error) { + console.warn(`[sync-paddleocr-vendor] failed to read manifest: ${error?.message || error}; skipping.`); + return; + } + + await mkdir(TARGET_DIR, { recursive: true }); + + let synced = 0; + let skipped = 0; + for (const file of manifest.files || []) { + const destPath = path.join(TARGET_DIR, file.target); + + if (await fileMatches(destPath, file)) { + skipped += 1; + continue; + } + + const url = remoteUrl(manifest.source, file); + let buffer; + try { + const response = await fetch(url); + if (!response.ok) throw new Error(`HTTP ${response.status}`); + buffer = Buffer.from(await response.arrayBuffer()); + } catch (error) { + // Network/HTTP failure: exit without blocking (consistent with the onnx/tesseract vendor scripts). + console.warn(`[sync-paddleocr-vendor] failed to download ${file.target} (${error?.message || error}).`); + 
console.warn("[sync-paddleocr-vendor] Skipping because offline or the source is unreachable; models for the advanced OCR can still be imported manually in the Security Center."); + console.warn(`[sync-paddleocr-vendor] source: ${url}`); + return; + } + + // Integrity check once the bytes are downloaded: on mismatch, delete the partial file + exit non-zero (never silently accepted). + const actualSize = buffer.length; + const actualSha = sha256(buffer); + if (actualSize !== file.size || actualSha !== file.sha256) { + await rm(destPath, { force: true }); + console.error(`[sync-paddleocr-vendor] ${file.target} failed the integrity check:`); + console.error(` expected size=${file.size} sha256=${file.sha256}`); + console.error(` actual size=${actualSize} sha256=${actualSha}`); + console.error(" The source file may have changed or been corrupted; check the manifest against the pinned commit."); + process.exit(1); + } + + await writeFile(destPath, buffer); + synced += 1; + console.log(`[sync-paddleocr-vendor] ${file.target} synced and verified (${(actualSize / (1024 * 1024)).toFixed(2)} MB).`); + } + + console.log( + `PP-OCRv5 vendor synced to public/vendor/paddleocr/ (downloaded=${synced}, cached=${skipped}; cls optional, not bundled).`, + ); +} + +main().catch((error) => { + console.warn(`[sync-paddleocr-vendor] sync failed: ${error?.message || error}`); + process.exitCode = 0; +}); diff --git a/scripts/sync-tesseract-vendor.js b/scripts/sync-tesseract-vendor.js index 9a1788c..c984211 100644 --- a/scripts/sync-tesseract-vendor.js +++ b/scripts/sync-tesseract-vendor.js @@ -1,6 +1,13 @@ -import { access, copyFile, mkdir, readdir, stat } from "node:fs/promises"; +import { access, copyFile, mkdir, readdir, readFile, stat, writeFile } from "node:fs/promises"; import path from "node:path"; +// This project's local-only security gate (scripts/local-security-test.js) forbids remote-protocol strings in any +// vendor .js under public/vendor/. The tesseract.js bundle hard-codes only CDN default paths +// (https://cdn.jsdelivr.net/...), while the runtime (tesseract-runtime.js) always overrides +// corePath/workerPath/langPath with same-origin /vendor/ paths, so those CDN defaults are dead code. After copying, rewrite them to same-origin relative +// paths and strip the sourceMappingURL comments so the served assets are truly local-only. +const REMOTE_PROTOCOL_RE = /(https?:|wss?:)\/\//; + const ROOT = process.cwd(); const TESSERACT_DIST = path.join(ROOT, "node_modules", 
"tesseract.js", "dist"); const TESSERACT_CORE = path.join(ROOT, "node_modules", "tesseract.js-core"); @@ -23,6 +30,32 @@ async function copyIfPresent(source, destination) { return true; } +// Post-copy sanitization: rewrite the CDN default paths in the vendor bundles to same-origin /vendor paths and strip +// the sourceMappingURL comments (the matching .map files are no longer shipped). After sanitizing, assert file by file that no remote protocol remains, +// so that if a future tesseract release introduces a new remote host, this script raises the alarm first instead of silently slipping past the gate. +async function sanitizeVendorBundles(dir) { + if (!(await pathExists(dir))) return; + const entries = await readdir(dir); + for (const entry of entries) { + if (!/\.(js|mjs)$/.test(entry)) continue; + const filePath = path.join(dir, entry); + const info = await stat(filePath); + if (!info.isFile()) continue; + const original = await readFile(filePath, "utf8"); + const cleaned = original + .replaceAll("https://cdn.jsdelivr.net", "/vendor") + .replace(/\n?\/\/[#@]\s*sourceMappingURL=.*$/gm, ""); + if (cleaned !== original) await writeFile(filePath, cleaned, "utf8"); + const leftover = cleaned.match(REMOTE_PROTOCOL_RE); + if (leftover) { + throw new Error( + `[sync-tesseract-vendor] ${path.relative(ROOT, filePath).replaceAll("\\", "/")} still contains a remote-protocol string after sanitization (${leftover[0]}...); ` + + `extend the rewrite rules in sanitizeVendorBundles before releasing.`, + ); + } + } +} + async function copyDistEntries(sourceDir, destDir) { if (!(await pathExists(sourceDir))) return []; const entries = await readdir(sourceDir); @@ -31,7 +64,7 @@ async function copyDistEntries(sourceDir, destDir) { const fullSource = path.join(sourceDir, entry); const info = await stat(fullSource); if (!info.isFile()) continue; - if (!/\.(js|mjs|wasm|map)$/.test(entry)) continue; + if (!/\.(js|mjs|wasm)$/.test(entry)) continue; if (entry.endsWith(".d.ts")) continue; await copyFile(fullSource, path.join(destDir, entry)); copied.push(entry); @@ -53,7 +86,7 @@ async function main() { let mainCopied = false; let workerCopied = false; for (const entry of distEntries) { - if (!/\.(js|mjs|map)$/.test(entry)) continue; + if (!/\.(js|mjs)$/.test(entry)) continue; if (entry.endsWith(".d.ts")) continue; const 
source = path.join(TESSERACT_DIST, entry); if (entry.startsWith("worker")) { @@ -84,7 +117,10 @@ async function main() { console.warn("[sync-tesseract-vendor] tesseract.js-core not found; wasm runtime missing. OCR will stay unavailable until installed."); } - console.log(`Tesseract.js vendor synced to public/vendor/tesseract/ (worker=${workerCopied}, core=${coreBundleCopied}).`); + await sanitizeVendorBundles(TARGET_CORE_DIR); + await sanitizeVendorBundles(TARGET_WORKER_DIR); + + console.log(`Tesseract.js vendor synced to public/vendor/tesseract/ (worker=${workerCopied}, core=${coreBundleCopied}); remote URLs sanitized.`); } main().catch((error) => { diff --git a/src-tauri/Cargo.lock b/src-tauri/Cargo.lock index dae71b3..860753d 100644 --- a/src-tauri/Cargo.lock +++ b/src-tauri/Cargo.lock @@ -3625,7 +3625,7 @@ dependencies = [ [[package]] name = "trans2former" -version = "2.2.0" +version = "2.3.0" dependencies = [ "tauri", "tauri-build", diff --git a/src-tauri/Cargo.toml b/src-tauri/Cargo.toml index 2099c3b..76618a7 100644 --- a/src-tauri/Cargo.toml +++ b/src-tauri/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "trans2former" -version = "2.2.0" +version = "2.3.0" description = "Trans2Former Desktop" edition = "2021" diff --git a/src-tauri/tauri.conf.json b/src-tauri/tauri.conf.json index 5a568ee..da6c3f7 100644 --- a/src-tauri/tauri.conf.json +++ b/src-tauri/tauri.conf.json @@ -1,7 +1,7 @@ { "$schema": "https://schema.tauri.app/config/2", "productName": "Trans2Former", - "version": "2.2.0", + "version": "2.3.0", "identifier": "com.vantalens.trans2former", "build": { "beforeDevCommand": "npm start",