Feat/uncertainty head#3

Open
Deepcity wants to merge 44 commits into Shy-98:main from Deepcity:feat/uncertainty-head

Deepcity commented Jun 9, 2026

No description provided.

Deepcity and others added 30 commits May 8, 2026 17:36
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tests verify optimum stays at var=sq_error (beta>0) and value-minimization (beta=0).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
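The property those tests check can be illustrated with a scalar sketch. It assumes the standard beta-NLL form, in which the Gaussian NLL is scaled by `var**beta` and that weight is held constant (stop-gradient) during backpropagation. The function names below are hypothetical and are not taken from `eval/losses.py`.

```python
import math


def gaussian_nll(mu, logvar, y):
    """Per-element Gaussian NLL without the constant 0.5*log(2*pi)."""
    var = math.exp(logvar)
    return 0.5 * (logvar + (y - mu) ** 2 / var)


def beta_nll_value(mu, logvar, y, beta):
    """Loss value: the Gaussian NLL scaled by var**beta.

    In training the weight is detached, so only the NLL factor is differentiated.
    """
    return math.exp(logvar) ** beta * gaussian_nll(mu, logvar, y)


def beta_nll_grad_logvar(mu, logvar, y, beta):
    """Gradient w.r.t. logvar with the var**beta weight treated as a constant."""
    var = math.exp(logvar)
    weight = var ** beta  # stop-gradient: contributes a scale, not a derivative
    return weight * 0.5 * (1.0 - (y - mu) ** 2 / var)


if __name__ == "__main__":
    mu, y = 1.0, 1.5
    sq_error = (y - mu) ** 2
    # With beta > 0 the detached-weight gradient still vanishes at var == sq_error.
    print(beta_nll_grad_logvar(mu, math.log(sq_error), y, beta=0.5))
    # With beta == 0 the loss is the plain NLL, whose value is minimal there.
    print(beta_nll_value(mu, math.log(sq_error), y, beta=0.0))
```

Because the weight is detached, it rescales each element's gradient without moving the stationary point, which is why the optimum stays at `var = sq_error` for any `beta > 0`.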
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…o gate

Gaussian baseline stays bit-identical (gate: max|delta|=0 on 10 utts).
logvar clamp applied only to the new beta_nll head to preserve baseline.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ion)

Smoke (n=10): ENCE 4.75, pred_var 0.011 vs sq_err 0.363 (~32x overconfident),
frame corr ~-0.03, utt Spearman 0.44 -> per-frame variance has collapsed; the signal survives only at the utterance level.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
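For readers unfamiliar with the metrics quoted above, here is a minimal sketch of how ENCE and an overconfidence ratio are commonly computed. It assumes the usual ENCE definition (sort by predicted variance, bin, compare root-mean-variance to RMSE per bin) and assumes the overconfidence figure is mean squared error divided by mean predicted variance. The function names, the equal-count binning, and the default bin count are illustrative and are not taken from the repository's eval scripts.

```python
import math


def ence(pred_var, sq_err, n_bins=10):
    """Expected Normalized Calibration Error over equal-count bins.

    Per bin: RMV = sqrt(mean predicted variance), RMSE = sqrt(mean squared error).
    ENCE is the mean of |RMV - RMSE| / RMV over the non-empty bins.
    """
    pairs = sorted(zip(pred_var, sq_err))  # order frames by predicted variance
    n = len(pairs)
    total, used = 0.0, 0
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        rmv = math.sqrt(sum(v for v, _ in chunk) / len(chunk))
        rmse = math.sqrt(sum(e for _, e in chunk) / len(chunk))
        total += abs(rmv - rmse) / rmv
        used += 1
    return total / used


def overconfidence_ratio(pred_var, sq_err):
    """Mean squared error divided by mean predicted variance (assumed definition)."""
    return (sum(sq_err) / len(sq_err)) / (sum(pred_var) / len(pred_var))


if __name__ == "__main__":
    var = [0.1, 0.2, 0.3, 0.4]
    err = [4.0 * v for v in var]  # errors four times larger than predicted
    print(ence(var, err, n_bins=2), overconfidence_ratio(var, err))
```

A perfectly calibrated head gives ENCE 0 and a ratio of 1; a ratio well above 1 means the predicted variance is too small for the observed error.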
…tch + detached-std reconstruction

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Baseline: ENCE 5.17, ~39x overconfident, frame-corr -0.025, utt Spearman 0.468.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…stable)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…libration plots) + diagnosis figures

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…one; A4/A2/A1 pending)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…->1.78)

beta-NLL more than halves ENCE, ~4x less overconfident, wider variance;
tradeoff: higher recon error + lower utt-level corr at 6k steps.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A2 (n=50): self-conf vs error Pearson beta-NLL 0.89 > gauss 0.78; best-of-N ~random (honest negative).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… rebuild HTML, 0 placeholders

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…+ coherent numbering

Added: baseline reliability, train curves, ENCE/overconfidence bars, logvar distribution,
per-frame var-vs-error density, temp-sweep overlay, A2 comparison. Fixed §4 to actual
training settings + efficiency before/after note.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…_hifigan mel convention mismatches MELLE log10-fbank (released model ~99% WER = vocoder mismatch); document honestly in 5.7

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…full training)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ial/Okabe-Ito plot style

- quick_exp cleaned 15G->4.4G (deleted health/smoke/stopped-100h dirs; trimmed unused step_3000/6001),
  renamed kept dirs: released_clean100_gaussian_50k / matched_clean100_{gaussian,betanll}_6k (960h dirs untouched, training active).
- parse_log: split on step-reset, keep longest run -> fixes the spurious closed loop from appended re-runs.
- eval/plotstyle.py: Arial(->Liberation Sans), larger fonts, Okabe-Ito colorblind-safe palette; applied to all plot scripts; figures regenerated; report rebuilt.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
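The `parse_log` fix described above (split on step-reset, keep the longest run) can be sketched as follows. This assumes the log has already been parsed into `(step, value)` pairs and that a reset is any step that does not increase. The function name and record format are hypothetical, not the actual `parse_log` interface.

```python
def longest_run(records):
    """Split (step, value) records wherever the step counter resets, keep the longest run.

    A reset is detected when a step is not strictly greater than the previous one,
    which is what happens when a re-run is appended to the same log file.
    """
    runs, current = [], []
    for step, value in records:
        if current and step <= current[-1][0]:
            runs.append(current)  # close the run that just ended
            current = []
        current.append((step, value))
    if current:
        runs.append(current)
    # On ties, max() keeps the earliest of the longest runs.
    return max(runs, key=len) if runs else []


if __name__ == "__main__":
    log = [(0, 2.0), (100, 1.5), (0, 2.1), (100, 1.6), (200, 1.2)]
    print(longest_run(log))
```

Plotting only one monotone run avoids the line that would otherwise be drawn from the last step of one run back to step 0 of the next.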
…ed-work section

- eval/losses.py: nig_loss (Amini 2020 evidential regression) + tests (21/21 pass).
- modules.py/MELLE.py/DDP_main.py: head_type='nig' (additive; gaussian/beta-NLL untouched);
  evidential head exposes same mu/logvar interface (aleatoric logvar) so inference/calibration/self-NLL
  are head-agnostic; evidential params stashed+masked for the loss; detached-std reconstruction.
- docs/related_work_cn_models.md: verified implementation map of Chinese models; report 2.1 added (14-model table + positioning).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
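As a rough illustration of the evidential head, the sketch below evaluates the Normal-Inverse-Gamma negative log-likelihood from Amini et al. (2020) for a single scalar target, plus the aleatoric log-variance `log(beta / (alpha - 1))`. The parameter names follow the paper. Whether `nig_loss` adds the paper's evidence regularizer, and exactly how the head derives its `logvar`, are not stated in the commit, so treat both as assumptions. The code uses plain Python floats rather than the tensor code in `eval/losses.py`.

```python
import math


def nig_nll(y, gamma, nu, alpha, beta):
    """NIG negative log-likelihood (Amini et al. 2020) for one scalar target.

    Requires nu > 0, alpha > 0 and beta > 0.
    """
    omega = 2.0 * beta * (1.0 + nu)
    return (
        0.5 * math.log(math.pi / nu)
        - alpha * math.log(omega)
        + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
        + math.lgamma(alpha)
        - math.lgamma(alpha + 0.5)
    )


def aleatoric_logvar(alpha, beta):
    """Log of the expected data variance E[sigma^2] = beta / (alpha - 1), for alpha > 1."""
    return math.log(beta / (alpha - 1.0))


if __name__ == "__main__":
    print(nig_nll(y=0.0, gamma=0.0, nu=1.0, alpha=2.0, beta=1.0))
    print(aleatoric_logvar(alpha=2.0, beta=1.0))
```

Exposing `mu = gamma` together with this log-variance is one way an evidential head can present the same interface as a Gaussian head, so that downstream calibration code does not need to know which head produced the values.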
… root-cause analysis (why MELLE is overconfident)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…table (Chinese-model survey -> Appendix B), add a detailed figure-reading block under all 10 figures, fix released overconfidence 54x->39x

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ctions for standalone print review

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…eline, not the released model) in text+figures; drop future-work ("left for later", 留作后续) phrasing; report is the final result
…e labels (fix garbled CJK text); NIG bake-off label -> ASCII
…igures; renumber figures 4-13; split §1 into 1.1 background / 1.2 motivation
claude and others added 14 commits June 8, 2026 15:32
…ELLE) + 4 figures; update masthead/abstract/§4/§7 for 960h scope
- fig3: 'mel-L2 to GT (free-running, noisy proxy)' -> 'mel L2 to GT (rough
  quality proxy)'; spell out OOD->unseen; caveat lives in the caption
- fig15: 'utt'->'utterance' and wrap the 3 titles to 2 lines to stop the
  overconfidence/spearman title collision
- report: replace all 9 occurrences of 'we' ('我们') with the impersonal 'this paper' / 'this work' ('本文' / '本工作') voice
- drop inline arXiv ids in the §2 table and §B prose (MELLE/BELLE/GOAT/β-NLL),
  keep them only in References and the appendix survey table

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…NIG)

Self-NLL best-of-N selection plus calibration dumps backing §5.8:
a2_960*.json, calib_960_{gauss,betanll}_step10000.json,
dump_960_{gauss,betanll}.npz, and the selfconf_scatter_960* /
bestofn_bar_960* figures.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
….png

- eval/losses.py: cast NIG inputs to fp32 (lgamma/log overflow under AMP)
- scripts/predownload_models.sh: cache HuBERT/SpeechT5 ahead of eval
- remove stray img.png

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The training-curve caption referenced a fixed 'closed-curve' plotting bug
('the earlier closed-curve bug has been fixed ... no loop-back'), which is
development-process narrative that does not belong in the final report.
Replaced with a plain reading of the curves; also dropped the reminiscent
tone of 'back then' ('当初') in the §5.7 rationale.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e is legible

The baseline (gaussian) reliability bins collapse into a tight low-RMV
cluster; the large solid markers blobbed together and hid the connecting
line. Add a shared RELIABILITY_KW style (small hollow markers, thin line)
in plotstyle.py and apply it to all three reliability figures (fig4/6/14)
so they stay synchronized.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Regenerate report.md from report_revised.html (the reviewed version):
- bilingual gloss of every acronym on first use (TTS, ENCE, MLP, β-NLL,
  NIG, WER, GV-ratio, OOD, KL, BCE, self-NLL, best-of-N)
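- measured rewrite of every section; soften the 'Chinese models / honesty' ('国模/诚实') framing
- rename headings (Introduction: starting from the repetition phenomenon;
  Method: diagnosis, modification, and validation; 6 Scope and limitations;
  5.8 Comparison of three variance heads; B Positioning of related Chinese models; etc.)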
- new title

Figures remapped to figs/*.png paths (so the build uses the current
PNGs, including the reliability-marker fix); YAML front-matter restored;
pandoc round-trip escapes cleaned. Rebuilt HTML matches the revised
baseline except auto-generated TOC anchors (now self-consistent) and the
code-highlight wrapper div.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- §1.2 / §5.2: rewrite the two 'not ... but rather' ('不是…而是') sentences into plain assertions
  (count 2 -> 0), per the standalone-print voice rule
- strip stray U+FE0E variation selectors after '↔' (3 spots: §1.2, §2, §5.9 table)
- §1.2: unbold 'diversity is uncontrollable' ('多样性不可控') so item 4 matches the plain items 1-3

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The xelatex step hard-coded 'Noto Sans CJK SC', which isn't installed, so
PDF generation always failed. Now build_report.sh auto-picks the first
installed CJK font (Noto/Source Han > AR PL UMing > Droid Sans Fallback,
ordered so the chosen font covers U+00B7 '·') and a symbol-capable Latin
font (DejaVu Sans), and surfaces the real xelatex error instead of
swallowing it. Produces report/report.pdf (19pp, A4) with zero missing
glyphs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- delete §5.6 'Reproducibility findings (real code defects)' (eval-time dropout, unreachable
  temperature) per request
- renumber downstream: §5.7 WER -> §5.6, §5.8 960h bake-off -> §5.7
- repoint cross-refs (masthead/§4/§5.7 note: §5.8->§5.7; §6 limitation: §5.7->§5.6)
- drop the now-dangling abstract sentence summarizing the deleted subsection

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- remove the A0–A4 experiment-code labels from headings, prose, and the
  appendix command comments (A100 GPU references untouched)
- §3.1 'Diagnosis: calibration audit' -> 'Calibration check: methods and metrics'; §5.1 drop the 'Diagnosis:' prefix
- replace the jargon 'audit' (审计) with 'check' (检验) throughout (11 spots),
  smoothing the few resulting repetitions in the abstract / §5.3 / §3.3

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2 participants