Feat/uncertainty head #3
Open
Deepcity wants to merge 44 commits into
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tests verify optimum stays at var=sq_error (beta>0) and value-minimization (beta=0). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
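The property these tests check can be sketched in a few lines. This is a minimal illustration assuming the commonly used β-NLL form (Gaussian NLL scaled by a stop-gradient weight `var**beta`); the PR's actual loss and test code are not shown here, so the function names below are hypothetical.

```python
import math


def gaussian_nll(mu, var, y):
    """Per-element Gaussian NLL, dropping the constant 0.5*log(2*pi)."""
    return 0.5 * (math.log(var) + (y - mu) ** 2 / var)


def beta_nll_grad_var(mu, var, y, beta):
    """Gradient of the beta-NLL loss with respect to var (hypothetical helper).

    The weight var**beta is treated as a constant (stop-gradient), so it
    only rescales the plain Gaussian-NLL gradient and cannot move its zero.
    """
    sq_err = (y - mu) ** 2
    return (var ** beta) * 0.5 * (1.0 / var - sq_err / var ** 2)


# beta > 0: the gradient vanishes exactly where var equals the squared error.
mu, y = 0.0, 0.5                      # squared error = 0.25
grads = [beta_nll_grad_var(mu, 0.25, y, b) for b in (0.5, 1.0)]

# beta = 0: the loss is the plain NLL, so the loss *value* is minimized there.
grid = [0.05 * k for k in range(1, 21)]
best_var = min(grid, key=lambda v: gaussian_nll(mu, v, y))
```

With β > 0 the check has to be made on the gradient rather than the loss value, because the stop-gradient weight changes the value of the loss without changing where its gradient is zero.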
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…o gate Gaussian baseline stays bit-identical (gate: max|delta|=0 on 10 utts). logvar clamp applied only to the new beta_nll head to preserve baseline. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ion) Smoke (n=10): ENCE 4.75, pred_var 0.011 vs sq_err 0.363 (~32x overconfident), frame corr ~-0.03, utt Spearman 0.44 -> collapsed per-frame variance, utt-level signal. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
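For readers unfamiliar with the two headline numbers, the sketch below shows one plausible way to compute them. It assumes the usual binned ENCE definition (bins sorted by predicted variance, comparing root-mean-variance RMV with RMSE per bin) and reads "overconfident" as mean squared error divided by mean predicted variance; the PR's evaluation code may differ in binning and reduction, and the function names are hypothetical.

```python
import math


def ence(pred_var, sq_err, n_bins=10):
    """Expected Normalized Calibration Error over equal-count bins (sketch)."""
    if not pred_var or len(pred_var) != len(sq_err):
        raise ValueError("need two non-empty sequences of equal length")
    pairs = sorted(zip(pred_var, sq_err))      # sort by predicted variance
    n = len(pairs)
    n_bins = min(n_bins, n)                    # guarantees non-empty bins
    total = 0.0
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        rmv = math.sqrt(sum(v for v, _ in chunk) / len(chunk))
        rmse = math.sqrt(sum(e for _, e in chunk) / len(chunk))
        total += abs(rmv - rmse) / rmv
    return total / n_bins


def overconfidence_ratio(pred_var, sq_err):
    """Mean squared error divided by mean predicted variance (sketch)."""
    return (sum(sq_err) / len(sq_err)) / (sum(pred_var) / len(pred_var))


# A perfectly calibrated head has ENCE 0; a head whose variance is 4x too
# small has RMSE = 2 * RMV in every bin, hence ENCE 1.
variances = [0.1, 0.2, 0.3, 0.4]
calibrated = ence(variances, variances, n_bins=2)
too_confident = ence(variances, [4 * v for v in variances], n_bins=2)
```

Applied to the rounded smoke-test figures above, `overconfidence_ratio([0.011], [0.363])` gives about 33, in line with the reported ~32x.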
…tch + detached-std reconstruction Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Baseline: ENCE 5.17, ~39x overconfident, frame-corr -0.025, utt Spearman 0.468. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…stable) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…libration plots) + diagnosis figures Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…one; A4/A2/A1 pending) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…->1.78) beta-NLL more than halves ENCE, ~4x less overconfident, wider variance; tradeoff: higher recon error + lower utt-level corr at 6k steps. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A2 (n=50): self-conf vs error Pearson beta-NLL 0.89 > gauss 0.78; best-of-N ~random (honest negative). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… rebuild HTML, 0 placeholders Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…+ coherent numbering Added: baseline reliability, train curves, ENCE/overconfidence bars, logvar distribution, per-frame var-vs-error density, temp-sweep overlay, A2 comparison. Fixed §4 to actual training settings + efficiency before/after note. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…_hifigan mel convention mismatches MELLE log10-fbank (released model ~99% WER = vocoder mismatch); document honestly in 5.7 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…full training) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ial/Okabe-Ito plot style
- quick_exp cleaned 15G->4.4G (deleted health/smoke/stopped-100h dirs; trimmed unused step_3000/6001),
renamed kept dirs: released_clean100_gaussian_50k / matched_clean100_{gaussian,betanll}_6k (960h dirs untouched, training active).
- parse_log: split on step-reset, keep longest run -> fixes the spurious closed loop from appended re-runs.
- eval/plotstyle.py: Arial(->Liberation Sans), larger fonts, Okabe-Ito colorblind-safe palette; applied to all plot scripts; figures regenerated; report rebuilt.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
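The parse_log fix described above can be illustrated as follows. This is a sketch under the assumption that the log has already been parsed into `(step, value)` pairs and that a re-run shows up as a step counter that fails to increase; the real parse_log is not shown in this PR view, and the helper names are hypothetical.

```python
def split_runs(records):
    """Split (step, value) records wherever the step counter resets."""
    runs, current, prev_step = [], [], None
    for step, value in records:
        if prev_step is not None and step <= prev_step:
            runs.append(current)       # step went backwards: a new run begins
            current = []
        current.append((step, value))
        prev_step = step
    if current:
        runs.append(current)
    return runs


def longest_run(records):
    """Keep only the longest contiguous run (first one wins on ties)."""
    runs = split_runs(records)
    return max(runs, key=len) if runs else []


# Two runs appended to the same log file: plotting them as one series would
# draw a line from step 100 back to step 0, producing the closed loop.
log = [(0, 1.00), (100, 0.90), (0, 1.10), (100, 0.80), (200, 0.70)]
kept = longest_run(log)
```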
…ed-work section
- eval/losses.py: nig_loss (Amini 2020 evidential regression) + tests (21/21 pass).
- modules.py/MELLE.py/DDP_main.py: head_type='nig' (additive; gaussian/beta-NLL untouched); the evidential head exposes the same mu/logvar interface (aleatoric logvar), so inference/calibration/self-NLL are head-agnostic; evidential params stashed and masked for the loss; detached-std reconstruction.
- docs/related_work_cn_models.md: verified 国模 (Chinese domestic model) implementation map; report §2.1 added (14-model table + positioning).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
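As a reference point for the evidential head, here is a scalar sketch of the Normal-Inverse-Gamma (NIG) loss from Amini et al. (2020) and of the aleatoric log-variance that a mu/logvar interface could expose. It is not the PR's `nig_loss`: the function names, the `coeff` default, and the scalar pure-Python form are assumptions made for illustration.

```python
import math


def nig_nll(y, gamma, nu, alpha, beta):
    """Negative log-likelihood of y under the NIG evidential model."""
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * math.log(math.pi / nu)
            - alpha * math.log(omega)
            + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
            + math.lgamma(alpha) - math.lgamma(alpha + 0.5))


def nig_reg(y, gamma, nu, alpha):
    """Evidence regularizer: penalizes high evidence on large errors."""
    return abs(y - gamma) * (2.0 * nu + alpha)


def nig_loss_sketch(y, gamma, nu, alpha, beta, coeff=0.01):
    """NLL plus weighted regularizer; coeff is a hypothetical default."""
    return nig_nll(y, gamma, nu, alpha, beta) + coeff * nig_reg(y, gamma, nu, alpha)


def aleatoric_logvar(alpha, beta):
    """log E[sigma^2] = log(beta / (alpha - 1)); requires alpha > 1."""
    return math.log(beta / (alpha - 1.0))


# The loss grows with the prediction error, and the regularizer vanishes
# when the predicted mean hits the target exactly.
near = nig_loss_sketch(0.0, 0.0, nu=1.0, alpha=2.0, beta=1.0)
far = nig_loss_sketch(2.0, 0.0, nu=1.0, alpha=2.0, beta=1.0)
```

Exposing `aleatoric_logvar` alongside the mean `gamma` is one way an evidential head can share downstream calibration code with Gaussian heads, which matches the head-agnostic design described in the commit.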
… root-cause analysis (why MELLE is overconfident) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…table (国模 [Chinese domestic model] survey -> Appendix B), add rich 读图 (figure-reading guide) block under all 10 figures, fix released overconfidence 54x->39x Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ctions for standalone print review Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…eline, not the released model) in text+figures; drop future-work (留作后续, "left for follow-up") phrasing, since the report is the final result
…e labels (fix CJK 乱码, i.e. mojibake); NIG bake-off label -> ASCII
…igures; renumber figures 4-13; split §1 into 1.1 background / 1.2 motivation
…ELLE) + 4 figures; update masthead/abstract/§4/§7 for 960h scope
…p the explanations)
- fig3: 'mel-L2 to GT (free-running, noisy proxy)' -> 'mel L2 to GT (rough quality proxy)'; spell out OOD->unseen; caveat lives in the caption
- fig15: 'utt'->'utterance' and wrap the 3 titles to 2 lines to stop the overconfidence/spearman title collision
- report: replace all 9 '我们' ("we") with the impersonal 本文/本工作 ("this paper" / "this work") voice
- drop inline arXiv ids in the §2 table and §B prose (MELLE/BELLE/GOAT/β-NLL), keep them only in References and the appendix survey table
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…NIG)
Self-NLL best-of-N selection plus calibration dumps backing §5.8:
a2_960*.json, calib_960_{gauss,betanll}_step10000.json,
dump_960_{gauss,betanll}.npz, and the selfconf_scatter_960* /
bestofn_bar_960* figures.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
….png
- eval/losses.py: cast NIG inputs to fp32 (lgamma/log overflow under AMP)
- scripts/predownload_models.sh: cache HuBERT/SpeechT5 ahead of eval
- remove stray img.png
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The training-curve caption referenced a fixed 'closed-curve' plotting bug
('先前闭合曲线 bug 已修复...无回环', i.e. "the earlier closed-curve bug has been fixed ... no loop-back"), which is development-process narrative that
does not belong in the final report. Replaced with a plain reading of the
curves; also dropped the recollection flavor of '当初' ("back then") in the §5.7 rationale.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e is legible
The baseline (gaussian) reliability bins collapse into a tight low-RMV cluster; the large solid markers blobbed together and hid the connecting line. Add a shared RELIABILITY_KW style (small hollow markers, thin line) in plotstyle.py and apply it to all three reliability figures (fig4/6/14) so they stay synchronized.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Regenerate report.md from report_revised.html (the reviewed version):
- bilingual gloss of every acronym on first use (TTS, ENCE, MLP, β-NLL, NIG, WER, GV-ratio, OOD, KL, BCE, self-NLL, best-of-N)
- measured rewrite of every section; soften the '国模/诚实' ("domestic model / honesty") framing
- rename headings (引言: 从复读现象说起 [Introduction: starting from the repetition phenomenon]; 方法: 诊断、改造与验证 [Method: diagnosis, modification, and validation]; 6 范围与局限 [6 Scope and limitations]; 5.8 三种方差头对比 [5.8 Comparison of three variance heads]; B 相关国产模型的位置 [B Positioning of related Chinese domestic models]; etc.)
- new title

Figures remapped to figs/*.png paths (so the build uses the current PNGs, including the reliability-marker fix); YAML front-matter restored; pandoc round-trip escapes cleaned. Rebuilt HTML matches the revised baseline except auto-generated TOC anchors (now self-consistent) and the code-highlight wrapper div.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- §1.2 / §5.2: rewrite the two '不是…而是' ("not ... but rather") sentences into plain assertions (count 2 -> 0), per the standalone-print voice rule
- strip stray U+FE0E variation selectors after '↔' (3 spots: §1.2, §2, §5.9 table)
- §1.2: unbold '多样性不可控' ("uncontrollable diversity") so item 4 matches the plain items 1-3
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The xelatex step hard-coded 'Noto Sans CJK SC', which isn't installed, so PDF generation always failed. Now build_report.sh auto-picks the first installed CJK font (Noto/Source Han > AR PL UMing > Droid Sans Fallback, ordered so the chosen font covers U+00B7 '·') and a symbol-capable Latin font (DejaVu Sans), and surfaces the real xelatex error instead of swallowing it. Produces report/report.pdf (19pp, A4) with zero missing glyphs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- delete §5.6 '可复现性发现(真实代码缺陷)' ("Reproducibility findings (real code defects)": eval-time dropout, unreachable temperature) per request
- renumber downstream: §5.7 WER -> §5.6, §5.8 960h bake-off -> §5.7
- repoint cross-refs (masthead/§4/§5.7 note: §5.8->§5.7; §6 limitation: §5.7->§5.6)
- drop the now-dangling abstract sentence summarizing the deleted subsection
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- remove the A0–A4 experiment-code labels from headings, prose, and the appendix command comments (A100 GPU references untouched)
- §3.1 '诊断:校准审计' ("Diagnosis: calibration audit") -> '校准检验:方法与指标' ("Calibration check: methods and metrics"); §5.1 drop the '诊断:' ("Diagnosis:") prefix
- replace the jargon 审计 (audit) with 检验 (check) throughout (11 spots), smoothing the few resulting repetitions in the abstract / §5.3 / §3.3
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
No description provided.