
feat: V1.0 neural reranker (bert-tiny-chinese ONNX) #4

Merged
winterdrive merged 5 commits into main from feat/user-supplement on Jun 22, 2026
Conversation

@winterdrive

Owner

Summary

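  • Neural reranker: ckiplab/bert-tiny-chinese ONNX (≈45 MB). It re-ranks the Viterbi top-5 using masked PLH, is loaded through ONNX Runtime load-dynamic, and degrades gracefully if the model file is missing
  • Minimum confidence threshold of 0.40 nats: keeps BERT-tiny from overriding Viterbi when its confidence is low (the 遍/變 margin of ~0.20 is blocked; the 實作/十座 margin of ~0.57 is accepted)
  • Dictionary modernisation: at load time, dict.rs automatically maps all 799 喫 entries to high-priority 吃 versions
  • supplement.tsv: adds 裡/哪裡/這裡/那裡 modern variant entries
  • Version: 0.5.0 → 1.0.0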
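
The threshold rule in the summary can be sketched as a margin gate over the reranker scores. This is only an illustration, not the code in src/reranker.rs: the `Candidate` type, the `pick` function, and the convention that a higher PLH score is better are all assumptions, and the absolute scores in the usage note are invented so that only the margins (0.57 and 0.20) match the PR.

```rust
/// One Viterbi candidate with its reranker score in nats.
/// Hypothetical type; assumes a higher `plh` means a more plausible sentence.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub text: String,
    pub plh: f64,
}

/// Threshold value quoted in the PR summary.
pub const MIN_MARGIN_NATS: f64 = 0.40;

/// Returns the index of the candidate to emit, or `None` for an empty list.
/// `candidates[0]` is taken to be Viterbi's own top choice.
pub fn pick(candidates: &[Candidate], min_margin: f64) -> Option<usize> {
    let viterbi_best = candidates.first()?;
    let mut best_idx = 0;
    let mut best_score = viterbi_best.plh;
    // Find the highest-scoring candidate according to the reranker.
    for (i, c) in candidates.iter().enumerate().skip(1) {
        if c.plh > best_score {
            best_idx = i;
            best_score = c.plh;
        }
    }
    // Override Viterbi only when the reranker wins by at least `min_margin`.
    if best_score - viterbi_best.plh >= min_margin {
        Some(best_idx)
    } else {
        Some(0)
    }
}
```

With illustrative scores of -3.00 for 十座 (Viterbi's pick) and -2.43 for 實作, the 0.57 margin clears the threshold and index 1 is returned. With -2.20 for 遍 and -2.00 for 變, the 0.20 margin does not, so Viterbi's index 0 is kept.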

Passing tests

  • test_eval_chifan: 你吃飯了嗎 ✅ (fixed by dictionary modernisation)
  • test_eval_cesuo: 請問廁所在哪裡 ✅ (fixed by supplement.tsv)
  • test_eval_shizuo: 這個功能還沒實作 ✅ (fixed by the reranker, PLH diff 0.57)
  • test_mass_zaishuo: 你可以再說一遍嗎 ✅ (protected by the threshold; no regression)
  • All 57 tests pass
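
The dictionary modernisation behind test_eval_chifan could look roughly like the sketch below. It is a guess at the shape of the load-time mapping, not the code in dict.rs: the `Entry` struct, the `modernise` function, the `boost` parameter, and the choice to keep the original 喫 entries alongside the new 吃 ones are all assumptions.

```rust
/// A dictionary entry. Hypothetical layout; the real dict.rs format may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub word: String,
    pub reading: String,
    pub priority: u32,
}

/// For every entry containing 喫, append a copy spelled with 吃 whose
/// priority is raised by `boost`, so the modern form outranks the old one.
/// Original entries are kept unchanged (an assumption of this sketch).
pub fn modernise(entries: &[Entry], boost: u32) -> Vec<Entry> {
    let mut out = entries.to_vec();
    for e in entries {
        if e.word.contains('喫') {
            out.push(Entry {
                word: e.word.replace('喫', "吃"),
                reading: e.reading.clone(),
                priority: e.priority.saturating_add(boost),
            });
        }
    }
    out
}
```

Running this over a list that contains 喫飯 yields an extra 吃飯 entry with the same reading and a higher priority, which is the kind of change that would let 你吃飯了嗎 win without touching the reranker.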

Model installation (user side)

%APPDATA%\Migao\models\
├── bert-tiny-chinese.onnx      (0.6 MB)
├── bert-tiny-chinese.onnx.data (44.3 MB)
├── vocab.txt                   (109 KB)
└── onnxruntime.dll             (16.5 MB)
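
A minimal sketch of how a caller might check this layout before enabling the reranker, using only the standard library. The function names are hypothetical, and treating all four files as required is an assumption; the PR only states that the reranker degrades gracefully when the model file is missing.

```rust
use std::path::{Path, PathBuf};

/// File names taken from the directory listing in the PR description.
pub const REQUIRED_FILES: [&str; 4] = [
    "bert-tiny-chinese.onnx",
    "bert-tiny-chinese.onnx.data",
    "vocab.txt",
    "onnxruntime.dll",
];

/// Builds `<appdata>\Migao\models` from the value of %APPDATA%.
pub fn models_dir(appdata: &Path) -> PathBuf {
    appdata.join("Migao").join("models")
}

/// Lists the required files that are not present in `dir`.
pub fn missing_files(dir: &Path) -> Vec<&'static str> {
    REQUIRED_FILES
        .iter()
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .collect()
}

/// True when every file is in place; otherwise the caller would skip
/// reranking and fall back to plain Viterbi output.
pub fn reranker_available(dir: &Path) -> bool {
    missing_files(dir).is_empty()
}
```

A caller could read the base directory with `std::env::var_os("APPDATA")`, pass it to `models_dir`, and log the result of `missing_files` so users can see which file to install.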

🤖 Generated with Claude Code

- viterbi.rs: BIGRAM_WEIGHT 0.15 → 0.20 (context-sensitive disambiguation)
- user_data.rs: runtime %APPDATA%\Migao\user_supplement.tsv loading
- dict.rs: parse_tsv_into refactor; loads user supplement at startup
- main.rs: new `migao report <garbled> <correct>` subcommand
- lib.rs: expose user_data module
- bopomofo.rs: chifan / cesuo / shizuo require neural reranker to pass
- Add src/reranker.rs: masked PLH scoring via ckiplab/bert-tiny-chinese
  (ONNX Runtime load-dynamic); global OnceLock with Mutex<Session>
- Add src/tokenizer.rs: character-level BERT tokenizer with MASK_ID
- Reranker uses minimum-margin threshold (0.40 nats) to avoid overriding
  Viterbi when BERT confidence is low (e.g. 遍/變 blocked, 實作/十座 accepted)
- dict.rs: auto-modernise 喫→吃 at load time (799 compound entries)
- supplement.tsv: add 裡/哪裡/這裡/那裡 modern variant entries
- Cargo.toml: add ort=2.0.0-rc.12 (load-dynamic+ndarray), ndarray=0.17
- All 57 tests pass; eval tests test_eval_chifan/cesuo/shizuo all pass

Bump version 0.5.0 → 1.0.0
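
For readers unfamiliar with the character-level tokenization and masking that the commit message mentions, here is a rough sketch. It is not the contents of src/tokenizer.rs: the `CharTokenizer` type and its methods are hypothetical, and the sketch assumes vocab.txt holds one token per line with the line number as the token id, including the special tokens [UNK], [CLS], [SEP], and [MASK].

```rust
use std::collections::HashMap;

/// Hypothetical character-level tokenizer built from vocab.txt contents.
pub struct CharTokenizer {
    vocab: HashMap<String, i64>,
    unk_id: i64,
    cls_id: i64,
    sep_id: i64,
    mask_id: i64,
}

fn lookup(vocab: &HashMap<String, i64>, token: &str) -> Option<i64> {
    vocab.get(token).copied()
}

impl CharTokenizer {
    /// Assumes one token per line, with the 0-based line number as its id.
    pub fn from_vocab_text(text: &str) -> Option<Self> {
        let vocab: HashMap<String, i64> = text
            .lines()
            .enumerate()
            .map(|(i, line)| (line.trim().to_string(), i as i64))
            .collect();
        let unk_id = lookup(&vocab, "[UNK]")?;
        let cls_id = lookup(&vocab, "[CLS]")?;
        let sep_id = lookup(&vocab, "[SEP]")?;
        let mask_id = lookup(&vocab, "[MASK]")?;
        Some(Self { vocab, unk_id, cls_id, sep_id, mask_id })
    }

    /// Encodes as [CLS] c1 c2 ... cn [SEP], one id per character.
    pub fn encode(&self, text: &str) -> Vec<i64> {
        let mut ids = vec![self.cls_id];
        for ch in text.chars() {
            let id = lookup(&self.vocab, &ch.to_string()).unwrap_or(self.unk_id);
            ids.push(id);
        }
        ids.push(self.sep_id);
        ids
    }

    /// Same as `encode`, but with the character at `pos` (0-based) replaced
    /// by [MASK]. Returns `None` when `pos` is past the end of the text.
    pub fn encode_masked(&self, text: &str, pos: usize) -> Option<Vec<i64>> {
        let mut ids = self.encode(text);
        if pos >= ids.len() - 2 {
            return None;
        }
        ids[pos + 1] = self.mask_id; // +1 skips the leading [CLS]
        Some(ids)
    }
}
```

A masked scorer would call `encode_masked` once per character position and read the model's log-probability for the original character at the masked slot; how the reranker aggregates those values is not shown in the PR, so that step is left out here.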
@winterdrive winterdrive force-pushed the feat/user-supplement branch from 386be37 to 0bf8d68 Compare June 22, 2026 13:06
@winterdrive winterdrive merged commit e4de8d9 into main Jun 22, 2026
1 of 5 checks passed