Feat/uncertainty head by Deepcity · Pull Request #3 · Shy-98/MELLE

Deepcity · 2026-06-09T02:20:58Z

No description provided.

…ion)

Smoke (n=10): ENCE 4.75, pred_var 0.011 vs sq_err 0.363 (~32x overconfident), frame corr ~-0.03, utt Spearman 0.44 -> collapsed per-frame variance, utt-level signal.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…+ coherent numbering

Added: baseline reliability, train curves, ENCE/overconfidence bars, logvar distribution, per-frame var-vs-error density, temp-sweep overlay, A2 comparison. Fixed §4 to actual training settings + efficiency before/after note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ial/Okabe-Ito plot style

- quick_exp cleaned 15G->4.4G (deleted health/smoke/stopped-100h dirs; trimmed unused step_3000/6001), renamed kept dirs: released_clean100_gaussian_50k / matched_clean100_{gaussian,betanll}_6k (960h dirs untouched, training active).
- parse_log: split on step-reset, keep longest run -> fixes the spurious closed loop from appended re-runs.
- eval/plotstyle.py: Arial(->Liberation Sans), larger fonts, Okabe-Ito colorblind-safe palette; applied to all plot scripts; figures regenerated; report rebuilt.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
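The parse_log change described in this commit (split on step-reset, keep the longest run) can be illustrated with a minimal sketch. The actual parse_log code is not shown on this page, so the function name `longest_run`, the `(step, value)` record shape, and the reset rule `step <= previous step` are all assumptions made for illustration.

```python
def longest_run(records):
    """Split (step, value) records at each step reset and keep the longest run.

    Hypothetical helper: a "reset" is assumed to be any record whose step
    does not increase relative to the previous record.
    """
    runs, current = [], []
    prev_step = None
    for step, value in records:
        # A non-increasing step means a re-run was appended to the same log.
        if prev_step is not None and step <= prev_step:
            runs.append(current)
            current = []
        current.append((step, value))
        prev_step = step
    if current:
        runs.append(current)
    # On ties, max() keeps the earliest run.
    return max(runs, key=len) if runs else []


# Two appended runs: plotting them as one line would draw a closed loop.
log = [(0, 1.0), (100, 0.9), (0, 1.1), (100, 0.8), (200, 0.7)]
curve = longest_run(log)
```

Plotting only `curve` avoids the line segment that jumps from the end of one run back to step 0 of the next.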

…ed-work section

- eval/losses.py: nig_loss (Amini 2020 evidential regression) + tests (21/21 pass).
- modules.py/MELLE.py/DDP_main.py: head_type='nig' (additive; gaussian/beta-NLL untouched); evidential head exposes same mu/logvar interface (aleatoric logvar) so inference/calibration/self-NLL are head-agnostic; evidential params stashed+masked for the loss; detached-std reconstruction.
- docs/related_work_cn_models.md: verified 国模 (Chinese domestic model) implementation map; report 2.1 added (14-model table + positioning).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
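As background for the NIG head mentioned in this commit, the per-element loss terms from Amini et al. (2020) can be sketched in plain Python. This is not the repository's `nig_loss` (which is not shown on this page); the function names and scalar interface below are illustrative assumptions, and how the real head maps to logvar is only inferred from the "aleatoric logvar" wording above.

```python
import math


def nig_nll(y, gamma, nu, alpha, beta):
    """Negative log-likelihood of y under the Normal-Inverse-Gamma evidential model.

    Assumes nu > 0, alpha > 0, beta > 0 (in a network these are typically
    enforced with a softplus; that detail is an assumption here).
    """
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * math.log(math.pi / nu)
            - alpha * math.log(omega)
            + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
            + math.lgamma(alpha) - math.lgamma(alpha + 0.5))


def nig_reg(y, gamma, nu, alpha):
    """Evidence regularizer: penalizes high evidence on wrong predictions."""
    return abs(y - gamma) * (2.0 * nu + alpha)


def aleatoric_logvar(alpha, beta):
    """log E[sigma^2] = log(beta / (alpha - 1)); only defined for alpha > 1."""
    return math.log(beta / (alpha - 1.0))


near = nig_nll(y=0.0, gamma=0.0, nu=1.0, alpha=2.0, beta=1.0)
far = nig_nll(y=3.0, gamma=0.0, nu=1.0, alpha=2.0, beta=1.0)
```

The `lgamma` and `log` terms are the ones that can overflow or lose precision in half precision, which is consistent with a later commit on this page casting the NIG inputs to fp32 under AMP.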

- fig3: 'mel-L2 to GT (free-running, noisy proxy)' -> 'mel L2 to GT (rough quality proxy)'; spell out OOD->unseen; caveat lives in the caption
- fig15: 'utt'->'utterance' and wrap the 3 titles to 2 lines to stop the overconfidence/spearman title collision
- report: replace all 9 '我们' ("we") with impersonal 本文/本工作 ("this paper" / "this work") voice
- drop inline arXiv ids in the §2 table and §B prose (MELLE/BELLE/GOAT/β-NLL), keep them only in References and the appendix survey table

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…NIG)

Self-NLL best-of-N selection plus calibration dumps backing §5.8: a2_960*.json, calib_960_{gauss,betanll}_step10000.json, dump_960_{gauss,betanll}.npz, and the selfconf_scatter_960* / bestofn_bar_960* figures.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

….png

- eval/losses.py: cast NIG inputs to fp32 (lgamma/log overflow under AMP)
- scripts/predownload_models.sh: cache HuBERT/SpeechT5 ahead of eval
- remove stray img.png

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The training-curve caption referenced a fixed 'closed-curve' plotting bug ('先前闭合曲线 bug 已修复...无回环', roughly "the earlier closed-curve bug has been fixed ... no loop-back"), which is development-process narrative that does not belong in the final report. Replaced with a plain reading of the curves; also dropped the recollection flavor of '当初' ("back then") in the §5.7 rationale.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…e is legible

The baseline (gaussian) reliability bins collapse into a tight low-RMV cluster; the large solid markers blobbed together and hid the connecting line. Add a shared RELIABILITY_KW style (small hollow markers, thin line) in plotstyle.py and apply it to all three reliability figures (fig4/6/14) so they stay synchronized.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Regenerate report.md from report_revised.html (the reviewed version):
- bilingual gloss of every acronym on first use (TTS, ENCE, MLP, β-NLL, NIG, WER, GV-ratio, OOD, KL, BCE, self-NLL, best-of-N)
- measured rewrite of every section; soften the '国模/诚实' ("domestic model / honest") framing
- rename headings (引言: 从复读现象说起 [Introduction: starting from the repetition phenomenon]; 方法: 诊断、改造与验证 [Method: diagnosis, modification, and validation]; 6 范围与局限 [6 Scope and limitations]; 5.8 三种方差头对比 [5.8 Comparison of three variance heads]; B 相关国产模型的位置 [B Where related Chinese domestic models stand]; etc.)
- new title

Figures remapped to figs/*.png paths (so the build uses the current PNGs, including the reliability-marker fix); YAML front-matter restored; pandoc round-trip escapes cleaned. Rebuilt HTML matches the revised baseline except auto-generated TOC anchors (now self-consistent) and the code-highlight wrapper div.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- §1.2 / §5.2: rewrite the two '不是…而是' ("not … but rather") sentences into plain assertions (count 2 -> 0), per the standalone-print voice rule
- strip stray U+FE0E variation selectors after '↔' (3 spots: §1.2, §2, §5.9 table)
- §1.2: unbold '多样性不可控' ("diversity is uncontrollable") so item 4 matches the plain items 1-3

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The xelatex step hard-coded 'Noto Sans CJK SC', which isn't installed, so PDF generation always failed. Now build_report.sh auto-picks the first installed CJK font (Noto/Source Han > AR PL UMing > Droid Sans Fallback, ordered so the chosen font covers U+00B7 '·') and a symbol-capable Latin font (DejaVu Sans), and surfaces the real xelatex error instead of swallowing it. Produces report/report.pdf (19pp, A4) with zero missing glyphs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- delete §5.6 '可复现性发现（真实代码缺陷）' ("Reproducibility findings (real code defects)"; covered eval-time dropout and unreachable temperature) per request
- renumber downstream: §5.7 WER -> §5.6, §5.8 960h bake-off -> §5.7
- repoint cross-refs (masthead/§4/§5.7 note: §5.8->§5.7; §6 limitation: §5.7->§5.6)
- drop the now-dangling abstract sentence summarizing the deleted subsection

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- remove the A0–A4 experiment-code labels from headings, prose, and the appendix command comments (A100 GPU references untouched)
- §3.1 '诊断：校准审计' ("Diagnosis: calibration audit") -> '校准检验：方法与指标' ("Calibration check: methods and metrics"); §5.1 drop the '诊断：' ("Diagnosis:") prefix
- replace the jargon 审计(audit) with 检验(check) throughout (11 spots), smoothing the few resulting repetitions in the abstract / §5.3 / §3.3

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Deepcity and others added 30 commits May 8, 2026 17:36

feat: add MELLE reproduction workflow

a8fb4a2

chore: scaffold uncertainty-head work + design spec & plan

74cc82e

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat: zero-dependency metrics lib (GV-ratio, ENCE, reliability, corr)

9fdc70a

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
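For readers unfamiliar with ENCE (Expected Normalized Calibration Error), which this metrics commit introduces and later commits report, here is a minimal dependency-free sketch of the standard definition: sort frames by predicted variance, split them into equal-count bins, and average |RMV - RMSE| / RMV over bins. The repository's own implementation is not shown on this page, so the function name, signature, and binning scheme are assumptions.

```python
import math


def ence(pred_var, sq_err, n_bins=10):
    """ENCE over equal-count bins of predicted variance (illustrative sketch)."""
    if len(pred_var) != len(sq_err) or not pred_var:
        raise ValueError("inputs must be non-empty and equally long")
    # Sort by predicted variance so each bin groups similarly confident frames.
    pairs = sorted(zip(pred_var, sq_err))
    n_bins = min(n_bins, len(pairs))
    total = 0.0
    for b in range(n_bins):
        lo = b * len(pairs) // n_bins
        hi = (b + 1) * len(pairs) // n_bins
        chunk = pairs[lo:hi]
        rmv = math.sqrt(sum(v for v, _ in chunk) / len(chunk))   # root mean variance
        rmse = math.sqrt(sum(e for _, e in chunk) / len(chunk))  # root mean squared error
        total += abs(rmv - rmse) / rmv
    return total / n_bins


# Perfect calibration: predicted variance equals squared error in every bin.
calibrated = ence([1.0, 2.0, 4.0], [1.0, 2.0, 4.0], n_bins=3)

# Single-bin illustration using the averages quoted in the smoke test
# (pred_var 0.011, sq_err 0.363); not a reproduction of the reported ENCE.
single_bin = ence([0.011], [0.363], n_bins=1)
```

A value of 0 means predicted and observed spread agree in every bin; large values with RMSE above RMV, as in the second call, indicate overconfidence.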

feat: beta-NLL heteroscedastic loss (Seitzer 2022)

0ff39e0

Tests verify optimum stays at var=sq_error (beta>0) and value-minimization (beta=0). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
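The property these tests check can be illustrated with a scalar sketch of the beta-NLL loss from Seitzer et al. (2022): the Gaussian NLL is weighted by var**beta, with the weight treated as a constant (stop-gradient). The functions below are illustrative stand-ins, not the repository's loss; the gradient is written analytically because plain Python has no autograd.

```python
import math


def beta_nll(mu, var, y, beta=0.5):
    """Per-element beta-NLL value: var**beta times the Gaussian NLL (constant dropped)."""
    nll = 0.5 * (math.log(var) + (y - mu) ** 2 / var)
    return (var ** beta) * nll


def beta_nll_grad_var(mu, var, y, beta=0.5):
    """Gradient w.r.t. var with the var**beta weight held constant (stop-gradient)."""
    sq_err = (y - mu) ** 2
    return (var ** beta) * 0.5 * (1.0 / var - sq_err / var ** 2)


# Squared error is 0.25 here, so var = 0.25 should be the stationary point.
mu, y = 0.0, 0.5
grad_at_optimum = beta_nll_grad_var(mu, 0.25, y, beta=0.5)
grad_too_small = beta_nll_grad_var(mu, 0.10, y, beta=0.5)
```

With beta > 0 the weighting changes only the gradient scale, so the stationary point stays at var = squared error; with beta = 0 the loss reduces to the ordinary Gaussian NLL, whose value is minimized at the same point.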

test: capture pre-change baseline inference reference (md5)

b23e4da

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(A0): thread temp + capture mu/logvar + head_type scaffold + repr…

b6144c2

…o gate Gaussian baseline stays bit-identical (gate: max|delta|=0 on 10 utts). logvar clamp applied only to the new beta_nll head to preserve baseline. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

fix: PreNet dropout respects eval mode (+quantify eval-time noise)

6424b55

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(A4): head_type/beta/nll_weight training args + beta-NLL loss swi…

0353d37

…tch + detached-std reconstruction Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

exp(A3): baseline calibration (1234 utts) + 5k->50k trajectory

043a66f

Baseline: ENCE 5.17, ~39x overconfident, frame-corr -0.025, utt Spearman 0.468. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

exp(A4): beta-NLL 50-step health check (5/50 overflow, loss falling, …

3838aad

…stable) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(A1/A2/A3): eval+plot scripts (selfconf best-of-N, temp sweep, ca…

551306f

…libration plots) + diagnosis figures Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs: printable report skeleton with diagnosis results baked in (A3 d…

f8daeda

…one; A4/A2/A1 pending) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs: build printable report HTML (pandoc 2.x self-contained, A4 CJK)

b4934dc

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

exp(A3-after): matched gauss-rerun vs beta-NLL calibration (ENCE 4.13…

aa16878

…->1.78) beta-NLL more than halves ENCE, ~4x less overconfident, wider variance; tradeoff: higher recon error + lower utt-level corr at 6k steps. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs+exp(A2): fill A2 best-of-N + A3-after into report, rebuild HTML

beadc20

A2 (n=50): self-conf vs error Pearson beta-NLL 0.89 > gauss 0.78; best-of-N ~random (honest negative). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

exp(A1)+docs: fill temp-sweep results (beta-NLL wider dynamic range),…

40f676f

… rebuild HTML, 0 placeholders Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

exp(WER)+docs: WER pipeline (eval/wer_gen.py) + finding that speecht5…

c8eb249

…_hifigan mel convention mismatches MELLE log10-fbank (released model ~99% WER = vocoder mismatch); document honestly in 5.7 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat: calibration-trajectory-compare plot (gaussian vs beta-NLL over …

04746e3

…full training) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs: add 1.1 motivation (what TTS problems calibration solves) + 3.2…

d629afa

… root-cause analysis (why MELLE over-confidences) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs: experiment-report restructure — condense related-work to 4-row …

955033e

…table (国模 [Chinese domestic model] survey -> Appendix B), add rich 读图 ("reading the figure") block under all 10 figures, fix released overconfidence 54x->39x Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs: report voice: remove meta phrasing (需要强调, "it should be emphasized") and '不是…而是…' ("not … but rather …") constru…

bf2e7a8

…ctions for standalone print review Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs: smooth residual 而非 ("rather than") in Appendix B

b1bdf60

docs: correct step_50000 label (train-clean-100 50k-step gaussian bas…

26e4e92

…eline, not the released model) in text+figures; drop future-work (留作后续) phrasing — report is the final result

docs: remove future-work (留作后续) phrasing — report stands as final result

06e25bd

exp: repetition/looping probe (3s-reference protocol) + English figur…

b531362

…e labels (fix CJK 乱码, i.e. garbled characters); NIG bake-off label -> ASCII

docs: open report with the repetition/looping failure (§1 hook) + 3 f…

81ad187

…igures; renumber figures 4-13; split §1 into 1.1 background / 1.2 motivation

claude and others added 14 commits June 8, 2026 15:32

docs: add §5.8 960h 3-head calibration bake-off (gaussian/β-NLL/NIG-B…

1edeab0

…ELLE) + 4 figures; update masthead/abstract/§4/§7 for 960h scope

docs: drop the literal '读图' ("reading the figure") label from figure-explanation blocks (kee…

9245d47

…p the explanations)

docs: remove last '读图' mention (mid-sentence)

4c4cf7b

feat:final report

080659e

Feat/uncertainty head #3

Deepcity wants to merge 44 commits into Shy-98:main from Deepcity:feat/uncertainty-head

Deepcity commented Jun 9, 2026

2 participants
