Fix LR scheduler accumulation mismatch and tune medium config #13

Merged
asfilion merged 1 commit into main from fix/training-memory-leak
Feb 16, 2026

Conversation

@asfilion
Owner

Summary

  • Fix LR scheduler step accumulation mismatch when resuming training
  • Tune medium config hyperparameters
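
The mismatch in the first item is one of units: the scheduler is stepped once per optimizer update, so its horizon has to be counted in optimizer updates rather than batches. The following is a minimal, framework-free sketch of that effect, not the code from this PR. The warmup-plus-cosine shape, the function names, and the example numbers (1000 batches, 8 accumulation steps, 10 epochs) are all assumptions chosen for illustration.

```python
import math


def optimizer_steps_per_epoch(num_batches: int, accum_steps: int) -> int:
    """Optimizer updates per epoch when gradients accumulate over accum_steps batches."""
    return num_batches // accum_steps


def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup followed by cosine decay (an assumed schedule shape)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


def final_lr(num_batches: int, accum_steps: int, epochs: int, base_lr: float,
             warmup_steps: int, count_batches: bool) -> float:
    """Simulate a run and return the LR after the last optimizer update.

    count_batches=True reproduces the mismatch: the schedule horizon is
    measured in raw batches although it only advances once per update.
    """
    if count_batches:
        per_epoch = num_batches
    else:
        per_epoch = optimizer_steps_per_epoch(num_batches, accum_steps)
    total_steps = per_epoch * epochs

    step = 0
    lr = lr_at(step, base_lr, warmup_steps, total_steps)
    for _ in range(epochs):
        for batch_idx in range(num_batches):
            if (batch_idx + 1) % accum_steps == 0:
                # A real loop would call optimizer.step() here, then
                # scheduler.step(), in that order.
                step += 1
                lr = lr_at(step, base_lr, warmup_steps, total_steps)
    return lr


if __name__ == "__main__":
    kwargs = dict(num_batches=1000, accum_steps=8, epochs=10,
                  base_lr=1e-4, warmup_steps=500)
    print("horizon in batches:", final_lr(count_batches=True, **kwargs))
    print("horizon in updates:", final_lr(count_batches=False, **kwargs))
```

Under these assumed numbers, the batch-counted horizon leaves the final LR close to its peak, while the update-counted horizon lets the schedule run to completion.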

Test plan

  • Full test suite passes (178 tests)
  • Verify training resumes correctly from checkpoint

🤖 Generated with Claude Code

- Fix scheduler to use optimizer steps per epoch (len(loader) // accum_steps)
  instead of raw batch count; with accum_steps > 1 the LR never decayed
- Restore correct scheduler/optimizer ordering (scheduler after optimizer)
- Scale SpecAugment time masking with sequence length (~2% of frames)
- Tune medium config: LR 3e-4→1e-4, seq_len 4096→2048, dropout 0.15→0.2,
  warmup 1000→500
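
To make the SpecAugment item concrete, here is a rough sketch of a time mask whose maximum width scales with the number of frames. It is not the implementation from this commit: the names `time_mask_width` and `apply_time_mask`, the single-mask policy, the zero fill value, and the one-frame minimum are assumptions. Only the ~2% fraction comes from the message above.

```python
import random
from typing import List, Optional


def time_mask_width(num_frames: int, fraction: float = 0.02) -> int:
    """Maximum mask width in frames: a fixed fraction of the sequence, at least 1."""
    return max(1, int(num_frames * fraction))


def apply_time_mask(frames: List[float], fraction: float = 0.02,
                    rng: Optional[random.Random] = None,
                    fill: float = 0.0) -> List[float]:
    """Return a copy of frames with one random contiguous span set to fill.

    Each element stands in for one time frame; a real spectrogram would
    mask the whole feature vector at each selected frame.
    """
    rng = rng or random.Random()
    num_frames = len(frames)
    if num_frames == 0:
        return []
    # Sample the width up to the length-scaled maximum, then a valid start.
    width = min(rng.randint(0, time_mask_width(num_frames, fraction)), num_frames)
    start = rng.randint(0, num_frames - width)
    masked = list(frames)
    for i in range(start, start + width):
        masked[i] = fill
    return masked


if __name__ == "__main__":
    for seq_len in (2048, 4096):
        print(seq_len, "frames -> max mask width", time_mask_width(seq_len))
```

With a fixed absolute width, halving seq_len from 4096 to 2048 would double the masked share of each example; scaling by a fraction keeps that share roughly constant across the two settings.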

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@asfilion merged commit e79832b into main on Feb 16, 2026
6 checks passed
@asfilion deleted the fix/training-memory-leak branch on February 17, 2026 00:06