feat: Add training loop skeleton with logging hooks and checkpoint save/load #24

Open

KrishanYadav333 wants to merge 1 commit into ML4SCI:main from
Part of pre-GSoC groundwork for the EXXA DDPM denoising pipeline.
Adds a minimal, model-agnostic `Trainer` class that drives the training loop.

## Changes
- `src/training/trainer.py` — `Trainer` class with:
  - `train_one_epoch()` — dataloader iteration, forward pass, loss, optimizer step
  - `log_fn` hook — called with `(epoch, step, loss)` after each step
  - `save_checkpoint()` / `load_checkpoint()` — full state-dict round-trip
- `src/training/__init__.py` — exports `Trainer`
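For orientation, a minimal sketch of what such a class could look like is below. The constructor arguments, return value of `train_one_epoch()`, and checkpoint dictionary keys are assumptions for illustration, not the signatures actually merged in `src/training/trainer.py`.

```python
# Hedged sketch only -- argument names, return types, and checkpoint keys are assumptions.
from typing import Callable, Optional

import torch
from torch import nn


class Trainer:
    def __init__(
        self,
        model: nn.Module,
        optimizer: torch.optim.Optimizer,
        log_fn: Optional[Callable[[int, int, torch.Tensor], None]] = None,
    ) -> None:
        self.model = model
        self.optimizer = optimizer
        self.log_fn = log_fn
        self.epoch = 0  # epoch counter, advanced by train_one_epoch()

    def train_one_epoch(self, dataloader) -> float:
        """One pass over the dataloader; returns the mean loss (an assumption)."""
        self.model.train()
        total_loss, steps = 0.0, 0
        for step, batch in enumerate(dataloader):
            loss = self.model.training_loss(batch)  # model-agnostic loss hook
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
            if self.log_fn is not None:
                self.log_fn(self.epoch, step, loss.detach())  # logging hook: (epoch, step, loss)
            total_loss += loss.item()
            steps += 1
        self.epoch += 1
        return total_loss / max(steps, 1)

    def save_checkpoint(self, path: str) -> None:
        # Full state-dict round-trip: model, optimizer, and epoch counter.
        torch.save(
            {
                "model": self.model.state_dict(),
                "optimizer": self.optimizer.state_dict(),
                "epoch": self.epoch,
            },
            path,
        )

    def load_checkpoint(self, path: str) -> None:
        state = torch.load(path, map_location="cpu")
        self.model.load_state_dict(state["model"])
        self.optimizer.load_state_dict(state["optimizer"])
        self.epoch = state["epoch"]
```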
## Design

`Trainer` is model-agnostic — it expects any `nn.Module` with a `training_loss(batch) -> Tensor` method. This means it works today with the toy model in the tests and will plug directly into `DDPM` once implemented.
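As an illustration of that contract, a toy module satisfying it (the `ToyModel` name and MSE loss below are invented, not the PR's test fixture) could be driven by the `Trainer` sketch above like this:

```python
# Illustrative only -- ToyModel and its loss are invented, not the PR's test fixture.
import torch
import torch.nn.functional as F
from torch import nn


class ToyModel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def training_loss(self, batch: torch.Tensor) -> torch.Tensor:
        # Any nn.Module exposing training_loss(batch) -> Tensor fits the Trainer contract;
        # a simple reconstruction-style MSE keeps the example self-contained.
        return F.mse_loss(self.linear(batch), batch)


model = ToyModel()
trainer = Trainer(
    model,
    torch.optim.Adam(model.parameters(), lr=1e-3),
    log_fn=lambda epoch, step, loss: print(epoch, step, float(loss)),
)
loader = [torch.randn(8, 4) for _ in range(10)]  # stand-in for a DataLoader
mean_loss = trainer.train_one_epoch(loader)
```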
## Tests

18 tests in `tests/test_trainer.py` covering:

- `train_one_epoch` return type, loss positivity, epoch counter

All 18 pass.
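For a sense of what such checks look like (a sketch only, not the actual contents of `tests/test_trainer.py`), the assertions described above might read roughly like:

```python
# Sketch only -- not the actual tests/test_trainer.py; ToyModel is the toy module sketched above.
import torch

from src.training import Trainer


def test_train_one_epoch_returns_float_and_positive_loss():
    model = ToyModel()
    trainer = Trainer(model, torch.optim.SGD(model.parameters(), lr=0.1))
    loader = [torch.randn(8, 4) for _ in range(4)]
    loss = trainer.train_one_epoch(loader)
    assert isinstance(loss, float)   # return-type check
    assert loss > 0.0                # loss positivity
    assert trainer.epoch == 1        # epoch counter advanced
```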