
P8 routing baseline → P9-B FixedLayoutModel + lightweight-default-bundle direction #2

Merged

Vantalens merged 6 commits into `main` from `codex/p8-routing-release-baseline` on May 28, 2026.

@Vantalens

Owner

Summary

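  • P8 routing and executable mapper baseline; S1 / S2 Repair Engine / S3 Model Cache; UI-A three-view restructure (Landing + Workbench + standalone preview page + hash routing).
  • Direction change: retire the "models ship with the installer" narrative and adopt a "lightweight default bundle (30–80 MB) + OCR models downloaded on demand into model-cache" approach; the 8 direction documents and their guard tests are updated to match.
  • OCR pipeline P9-A.1 → P9-A.4 → P9-B: contract + placeholder → vendored tesseract.js + IDB-stored tessdata enabled → async PNG OCR + Repair Engine OCR entry point → scanned-PDF rasterizer + multi-page stage → OCRResult written into FixedLayoutModel + automatic loading of browser-side pdfjs rasterization.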
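The on-demand direction boils down to "download a model on first use, then serve it from the cache". A minimal sketch of that pattern, assuming a key/value store and an injected download function: `createModelCache`, `store`, and `download` are hypothetical names, and a `Map` stands in for the IndexedDB-backed model-cache so the sketch runs anywhere.

```javascript
// Hypothetical sketch: fetch a model on first use, then serve it from cache.
// `store` stands in for the IndexedDB-backed model-cache (any object with
// has/get/set works, e.g. a Map); `download` stands in for the real fetch.
function createModelCache(store, download) {
  // Tracks downloads in progress so concurrent callers share one request.
  const inflight = new Map();

  return {
    async get(modelId) {
      if (store.has(modelId)) return store.get(modelId);

      if (!inflight.has(modelId)) {
        const pending = Promise.resolve()
          .then(() => download(modelId))
          .then((bytes) => {
            store.set(modelId, bytes); // cache for subsequent calls
            return bytes;
          })
          .finally(() => inflight.delete(modelId));
        inflight.set(modelId, pending);
      }
      return inflight.get(modelId);
    },
  };
}
```

A failed download is never written to `store`, and the in-flight entry is cleared either way, so the next `get` simply retries.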

Test plan

  • `npm test`: all 20 scripts pass (including the new product-matrix-docs / repair-engine / model-cache / ocr-baseline / local-model-direction guard tests)
  • `npm run release:prepare` (pdfjs + tesseract vendor sync; if the tesseract package is missing it exits 0 and does not block)
  • `git diff --check` (no trailing whitespace)
  • `node scripts/local-security-test.js` (new OCR files added to the allowlist + STRICT guard)
  • Manual browser check: `npm install tesseract.js && npm run vendor:tesseract && npm start` → import tessdata in the Security Center → upload a PNG / scanned PDF → the markdown/txt output contains the OCR text
  • Manual desktop check: `npm run desktop:build` (Windows MSI/NSIS)
  • Cross-platform macOS/Linux builds (P7-B not yet started; out of scope for this PR)
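A direction guard test can be as simple as scanning each document for wording that must or must not appear. A sketch under stated assumptions: `checkDirectionDoc` and both phrase lists are hypothetical, and the real `local-model-direction` test (not shown in this PR) presumably reads the 8 direction documents from disk instead of taking text as an argument.

```javascript
// Hypothetical guard check for the "lightweight default bundle" direction.
// Phrase lists are illustrative; a real test would load each doc from disk.
const FORBIDDEN = [/bundled (?:with|in) the installer/i, /ships? with the installer/i];
const REQUIRED = [/model-cache/];

function checkDirectionDoc(name, text) {
  const problems = [];
  for (const re of FORBIDDEN) {
    if (re.test(text)) problems.push(`${name}: forbidden wording ${re}`);
  }
  for (const re of REQUIRED) {
    if (!re.test(text)) problems.push(`${name}: missing ${re}`);
  }
  return problems;
}
```

Collecting every problem instead of stopping at the first one lets a single CI run list all documents that still need updating.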

🤖 Generated with Claude Code

Vantalens and others added 6 commits May 27, 2026 23:59
- Updated `product-matrix-docs-test.js` to include validation for known input formats and ensure all documented formats are accounted for.
- Modified `release-readiness-test.js` to include the new `sync-tesseract-vendor.js` script in the release preparation process.
- Added `repair-engine-test.js` to implement comprehensive tests for the repair engine, covering action contracts, validation, and end-to-end conversion scenarios.
- Introduced `sync-tesseract-vendor.js` to manage the synchronization of Tesseract.js vendor files.
- Updated `tauri.conf.json` to adjust the Content Security Policy for improved security.
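The two-way check described for `product-matrix-docs-test.js` can be sketched as a set difference in both directions. `diffFormats` is a hypothetical helper; how the real test obtains the known formats and parses the documented ones is not shown here, so both are passed in as plain arrays.

```javascript
// Hypothetical helper: compare the formats the converter knows about with
// the formats the product-matrix doc lists, in both directions.
function diffFormats(known, documented) {
  const normalize = (list) =>
    new Set(list.map((f) => String(f).trim().toLowerCase()));
  const knownSet = normalize(known);
  const documentedSet = normalize(documented);
  return {
    // Supported by the code but missing from the docs.
    undocumented: [...knownSet].filter((f) => !documentedSet.has(f)).sort(),
    // Listed in the docs but unknown to the code.
    unknown: [...documentedSet].filter((f) => !knownSet.has(f)).sort(),
  };
}
```

A test would then assert that both arrays are empty.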
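The phase status table had two leftover rows where the original "not started" descriptions sat alongside the completed milestones:
- P9-A.3 end-to-end PNG + scanned PDF (superseded by P9-A.3 async PNG OCR integration + Repair entry point)
- P9-B OCR → FixedLayoutModel (superseded by P9-B OCR → FixedLayoutModel + browser rasterize)

Removed both old rows so the table reflects how P9-A.3 / P9-A.4 / P9-B were actually split and completed.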

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 28, 2026 15:48
Vantalens merged commit b9a8909 into `main` on May 28, 2026
1 check failed
Vantalens deleted the `codex/p8-routing-release-baseline` branch May 28, 2026 15:50

chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c015f67967


```javascript
  return null;
}

export const tesseractOCREngine = Object.freeze({
```


P1: Keep Tesseract readiness state mutable

When the Security Center imports or clears tessdata, it calls `tesseractOCREngine.ensureProbe()`, which assigns `this._tessdataReady`. Because the object is frozen and ES modules always run in strict mode, that assignment throws `TypeError: Cannot assign to read only property '_tessdataReady'`. The import flow catches this as an import failure before the model becomes available, so users cannot enable the Tesseract OCR engine even after selecting a valid `.traineddata` file.
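The failure is easy to reproduce in isolation: `Object.freeze` makes existing properties non-writable, and a strict-mode write to one throws `TypeError`. A minimal sketch of one possible fix, assuming the engine only needs a boolean readiness flag: keep the mutable state in a closure and freeze only the public methods. `createEngine`, `probe`, and `isReady` are hypothetical names, not the project's actual API.

```javascript
// Reproduction: a frozen object whose method writes to one of its own fields.
const frozenEngine = Object.freeze({
  _tessdataReady: false,
  ensureProbe() {
    "use strict"; // ES module code is always strict; repeated so a plain script matches
    this._tessdataReady = true; // TypeError: property is read-only after freeze
  },
});

// Hypothetical fix: keep mutable state in a closure and freeze only the API.
// `probe` is an assumed async check (e.g. "is tessdata present?").
function createEngine(probe) {
  let tessdataReady = false; // not reachable through the frozen object

  return Object.freeze({
    async ensureProbe() {
      tessdataReady = Boolean(await probe());
      return tessdataReady;
    },
    isReady() {
      return tessdataReady;
    },
  });
}
```

The returned object can stay frozen, which still protects the method table from accidental reassignment, while the readiness flag remains writable from inside the closure.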


Comment thread `public/app.js`, lines +1235 to +1236:

```javascript
if (String(payload?.from || "").toLowerCase() === "png") {
  return Promise.resolve(convertInBrowserAsync(payload));
```


P2: Route worker conversions through the async OCR path

This only uses `convertInBrowserAsync` when `Worker` is unavailable, but the normal browser/Tauri path creates `/workers/convert-worker.js`, which still calls the synchronous `convertContent`. As a result, PNG conversions from the workbench skip `registry.convertAsync()` and never run `runOCRStage`, so imported OCR models have no effect in the primary UI path.
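One way to address this is to give the worker the same async dispatch that `public/app.js` uses in its fallback, so the PNG branch awaits the OCR-capable path instead of the synchronous one. A minimal sketch, assuming the worker can import both converters: `createConvertHandler`, `ASYNC_FORMATS`, and the `{ ok, result, error }` message shape are hypothetical.

```javascript
// Hypothetical sketch of the worker-side dispatch. `convertSync` and
// `convertAsync` stand in for convertContent and registry.convertAsync();
// their exact signatures in the project are assumptions here.
const ASYNC_FORMATS = new Set(["png"]); // formats that need the OCR stage

function createConvertHandler({ convertSync, convertAsync }) {
  return async function handleConvert(payload) {
    const from = String(payload?.from || "").toLowerCase();
    if (ASYNC_FORMATS.has(from)) {
      return convertAsync(payload); // async path, which is where OCR runs
    }
    return convertSync(payload);
  };
}

// Possible wiring inside /workers/convert-worker.js (message shape assumed):
// const handle = createConvertHandler({ convertSync: convertContent, convertAsync });
// self.onmessage = async (event) => {
//   try {
//     self.postMessage({ ok: true, result: await handle(event.data) });
//   } catch (err) {
//     self.postMessage({ ok: false, error: String(err) });
//   }
// };
```

Injecting the converters keeps the dispatch logic testable outside a Worker. Keeping the format set in one shared place also prevents the main-thread fallback and the worker path from drifting apart.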


Vantalens review requested due to automatic review settings May 28, 2026 16:13