
updates for the IIA course #3

Merged
amithjkamath merged 3 commits into ubern-mia:main from amithjkamath:main on May 13, 2026


@amithjkamath

Collaborator

Updates specifically for the `training-models` folder, with infrastructure to make the IIA course easier to follow.

Copilot AI review requested due to automatic review settings on May 13, 2026, 11:20
amithjkamath merged commit 0e81762 into ubern-mia:main on May 13, 2026
1 of 2 checks passed

Copilot AI left a comment


Pull request overview

This PR reorganizes the `training-models/` DermaMNIST scripts to share common training/evaluation/visualization logic, and adds repo-level tooling (uv + Makefile) intended to make the IIA course workflow easier to follow.

Changes:

  • Introduce training-models/shared/ utilities for data loading, training/checkpointing, evaluation, and visualization.
  • Refactor DermaMNIST v1–v7 scripts to use the shared utilities and a consistent argparse CLI (`train`, `test`, `visualize`).
  • Add uv/Makefile-based setup instructions and update documentation paths/layout.
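A consistent CLI of this kind can be sketched with `argparse` as a single positional mode. This is a minimal sketch assuming each version script takes one mode argument; the `build_parser` name and the `--epochs` option are hypothetical and not taken from the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser shared by the dermamnist_vN scripts."""
    parser = argparse.ArgumentParser(
        description="Train, test, or visualize a DermaMNIST model."
    )
    # One positional mode; argparse rejects any value outside `choices`.
    parser.add_argument("mode", choices=["train", "test", "visualize"])
    # Hypothetical option, shown only to illustrate a flag shared by all versions.
    parser.add_argument("--epochs", type=int, default=10)
    return parser


args = build_parser().parse_args(["train", "--epochs", "3"])
print(args.mode, args.epochs)
```

Under this assumption a script would be invoked as, for example, `python dermamnist_v1_initial.py test`. Keeping the parser in one shared place means every version exposes the same modes.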

Reviewed changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 9 comments.

| File | Description |
|---|---|
| `training-models/utils.py` | Leaves a pointer note indicating utilities moved into `shared/`. |
| `training-models/shared/__init__.py` | Defines the shared package for common training-model utilities. |
| `training-models/shared/data.py` | Adds shared DermaMNIST dataset loading helper. |
| `training-models/shared/model.py` | Adds shared device/output-path helpers. |
| `training-models/shared/utils.py` | Adds shared training loop, checkpointing, evaluation, visualization, and test helpers. |
| `training-models/README.md` | Expands setup/run instructions and updates results/artifact layout references. |
| `training-models/dermamnist_v1_initial.py` | Refactors script to shared utilities + CLI. |
| `training-models/dermamnist_v2_momentum0p9.py` | Refactors script to shared utilities + CLI. |
| `training-models/dermamnist_v3_lr0p005_val_patience.py` | Refactors script to shared utilities + CLI. |
| `training-models/dermamnist_v4_adam_TB.py` | Refactors script to shared utilities + CLI. |
| `training-models/dermamnist_v5_deeper_network.py` | Refactors script to shared utilities + CLI. |
| `training-models/dermamnist_v6_even_deeper_network.py` | Refactors script to shared utilities + CLI. |
| `training-models/dermamnist_v7_with_augm.py` | Refactors script to shared utilities + CLI (including augmentation). |
| `training-models/_config.yml` | Updates book path to the notebook under `training-models/`. |
| `README.md` | Adjusts a couple of links to use repo-relative paths. |
| `pyproject.toml` | Adds project metadata + dependency list for uv installation. |
| `Makefile` | Adds uv-based setup/install, clean targets, and versioned run targets. |
| `.gitignore` | Ignores generated artifacts (results, checkpoints, ONNX, venv, etc.). |
Comments suppressed due to low confidence (1)

training-models/README.md:60

  • Same table formatting issue here: the rows start with `||`, which adds an empty column and misaligns the table in standard markdown renderers. Please switch to a single leading `|` and verify the table renders correctly on GitHub.
## Version performance summary

| Version | Key change | Test accuracy | Delta |
|---|---|---|---|
| v1 | Baseline 4-layer CNN, SGD lr=5e-6, momentum=0.5 | ~0.65 | — |
| v2 | Momentum 0.5 → 0.9 | ~0.65 | ≈ 0.00 |
| v3 | Learning rate 5e-6 → 0.005, validation patience | ~0.75 | +0.10 |
| v4 | SGD → Adam, TensorBoard logging | **0.762** | +0.01 |
| v5 | Deeper 6-layer CNN (adds 128-ch block) | 0.755 | −0.007 |
| v6 | Even deeper 8-layer CNN (adds 256-ch block) | < v5 | −0.01 |
| v7 | Data augmentation (flip + crop) on 8-layer CNN | **0.770** | +0.015 |
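The Delta column appears to compare each version with the one before it, which matches the v2 to v5 rows. A minimal sketch of that arithmetic, using only the table's numeric accuracies (v1 to v3 are approximate, and v6 and v7 are left out because v6 is only given as `< v5`):

```python
# Test accuracies copied from the table above (v1-v3 are approximate).
accuracies = {"v1": 0.65, "v2": 0.65, "v3": 0.75, "v4": 0.762, "v5": 0.755}


def deltas_vs_previous(accs: dict[str, float]) -> dict[str, float]:
    """Difference between each version and the preceding one, rounded to 3 places."""
    names = list(accs)
    return {
        cur: round(accs[cur] - accs[prev], 3)
        for prev, cur in zip(names, names[1:])
    }


print(deltas_vs_previous(accuracies))
# {'v2': 0.0, 'v3': 0.1, 'v4': 0.012, 'v5': -0.007}
```

The table reports v4's +0.012 rounded to +0.01.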



labels = labels.reshape(-1).long()  # flatten (N, 1) labels to (N,); unlike squeeze(), a batch of one stays 1-D

outputs = model(images)
_, predicted = torch.max(outputs, 1)
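if epoch_label != "?":
    epoch_label += 1  # stored as 0-indexed
best_acc = saved.get("best_accuracy")
best_acc_str = f"{best_acc:.4f}" if best_acc is not None else "?"  # ":.4f" on the string "?" would raise ValueError
print(f"Loaded checkpoint: epoch {epoch_label}, best accuracy so far {best_acc_str}")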
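When a checkpoint lacks `best_accuracy`, applying the `:.4f` format spec to a string placeholder such as `"?"` raises `ValueError`. A minimal sketch of a metadata formatter that only applies numeric formatting to real numbers, assuming the checkpoint is a plain dict with optional `epoch` (0-indexed) and `best_accuracy` keys; the `epoch` key name and the `describe_checkpoint` helper are hypothetical.

```python
def describe_checkpoint(saved: dict) -> str:
    """Summarize checkpoint metadata, tolerating missing keys."""
    epoch = saved.get("epoch")  # hypothetical key; stored 0-indexed
    best_acc = saved.get("best_accuracy")
    # Only format plain Python numbers; anything else falls back to "?".
    epoch_label = str(epoch + 1) if isinstance(epoch, int) else "?"
    acc_label = f"{best_acc:.4f}" if isinstance(best_acc, (int, float)) else "?"
    return f"Loaded checkpoint: epoch {epoch_label}, best accuracy so far {acc_label}"


print(describe_checkpoint({"epoch": 4, "best_accuracy": 0.7621}))
# Loaded checkpoint: epoch 5, best accuracy so far 0.7621
print(describe_checkpoint({}))
# Loaded checkpoint: epoch ?, best accuracy so far ?
```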
if flag == "train":
    data_next = DataClass(split="val", transform=base_transform, download=True)
elif flag == "test":
    data_next = DataClass(split="test", transform=base_transform, download=True)
else:
    raise ValueError(f"Unknown flag: {flag!r}")  # otherwise data_next would be undefined
Comment thread training-models/README.md
Comment on lines +36 to +46
Each `results/vN/` directory contains:

| File | Description |
|---|---|
| `best_model.pt` | Saved model weights at peak validation accuracy |
| `train_loss.png` / `val_acc.png` | Training curves (v1–v3) |
| TensorBoard event files | Training + validation loss and accuracy (v4–v7) |
| `model_summary.txt` | Layer-by-layer parameter count via **torchinfo** |
| `model_graph.png` | Computation graph image via **torchview** |
| `model.onnx` | ONNX export for interactive inspection with **netron** |
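A minimal sketch of a helper that builds the per-version paths listed above with `pathlib`. The PR places output-path helpers in `shared/model.py`, but their names and signatures are not shown here, so `artifact_paths`, the `ARTIFACTS` mapping, and the `root` default are hypothetical.

```python
from pathlib import Path

# File names taken from the table above; TensorBoard event files are
# named by the writer itself, so they are not listed here.
ARTIFACTS = {
    "weights": "best_model.pt",
    "summary": "model_summary.txt",
    "graph": "model_graph.png",
    "onnx": "model.onnx",
}


def artifact_paths(version: int, root: str = "results", create: bool = False) -> dict[str, Path]:
    """Return the expected artifact paths under <root>/v<version>/."""
    out_dir = Path(root) / f"v{version}"
    if create:
        out_dir.mkdir(parents=True, exist_ok=True)  # only touch the filesystem on request
    return {key: out_dir / name for key, name in ARTIFACTS.items()}


paths = artifact_paths(4)
print(paths["onnx"].as_posix())  # results/v4/model.onnx
```

Centralizing the layout in one function keeps all seven version scripts writing to the same `results/vN/` structure.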

Comment thread training-models/README.md
Comment on lines 97 to 108
We invite you to make a copy of the notebook, and then make changes to it (you could simply copy changes from the scripts we point to) as we make progress in the versions below.

![Model is not really learning all categories](dermamnist_v1_initial/per_class_metrics.png)
![Model is not really learning all categories](results/v1/per_class_metrics.png)

Note from the image above that only the `melanocytic nevi` category is being learnt by the model. Because it has the largest representation in the training, validation, and test sets, the weighted average accuracy is still quite high even though all the other categories have 0 accuracy.
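A small worked example of why predicting only the majority class can still give a high overall accuracy while every other class scores 0. The class names and counts below are made up for illustration and are not DermaMNIST's actual split sizes; per-class accuracy is computed as recall.

```python
from collections import Counter

# Hypothetical label counts with one dominant class, as with melanocytic nevi.
labels = ["nevi"] * 67 + ["class_b"] * 11 + ["class_c"] * 11 + ["class_d"] * 11
predictions = ["nevi"] * len(labels)  # a model that always predicts the majority class

overall = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

support = Counter(labels)
correct = Counter(y for p, y in zip(predictions, labels) if p == y)
per_class = {cls: correct[cls] / n for cls, n in support.items()}

# Support-weighted average of per-class recall equals the overall accuracy.
weighted = sum(per_class[c] * n for c, n in support.items()) / len(labels)

print(f"overall accuracy: {overall:.2f}")  # overall accuracy: 0.67
print(per_class)  # {'nevi': 1.0, 'class_b': 0.0, 'class_c': 0.0, 'class_d': 0.0}
```

This is why per-class metrics, not the weighted average alone, reveal that the model has collapsed onto one category.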

![training loss for initial version](dermamnist_v1_initial/train_loss.png)
![training loss for initial version](results/v1/train_loss.png)

Training loss for the initial version: it decreases with increasing iterations, but then flattens out. Once it flattens out, training for more iterations is not useful, because the accuracy also flattens out. In the third version, we will see how this wasteful training can be avoided using 'validation patience'.
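The idea behind validation patience can be sketched without any framework: stop once validation accuracy has not improved for `patience` consecutive epochs. This is a minimal sketch; the `patience` value and the accuracy curve are made up, and the v3 script may track improvement differently.

```python
def epochs_until_stop(val_accs: list[float], patience: int) -> int:
    """Return how many epochs run before patience-based early stopping triggers."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best:
            best = acc  # new best: this is where a checkpoint would be saved
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch  # stop early
    return len(val_accs)  # patience never ran out


# Hypothetical curve that plateaus after epoch 4.
curve = [0.60, 0.68, 0.72, 0.74, 0.74, 0.73, 0.74, 0.74, 0.74, 0.74]
print(epochs_until_stop(curve, patience=3))  # 7
```

Training stops at epoch 7 instead of 10, and the best weights are those from epoch 4.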

![validation accuracy for initial version](dermamnist_v1_initial/val_acc.png)
![validation accuracy for initial version](results/v1/val_acc.png)

Comment thread training-models/README.md
![Test accuracy and classification report](dermamnist_v7_with_augm/test_accuracy_v7.png)
![Test accuracy and classification report](results/v7/test_accuracy_v7.png)

For this version, we note that the test accuracy is now 0.770, higher than all the benchmarks listed on the [MedMNIST webpage](https://medmnist.com)! Note also that the `dermatofibroma` category no longer has 0 in its metrics, and nearly all categories have higher precision than in all previous versions.
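The flip + crop augmentation in v7 can be illustrated in pure Python on a small 2-D grid. This is only a sketch of what a random horizontal flip and a padded random crop do; libraries such as torchvision provide these as ready-made transforms, and which ones the script uses is not shown here. The padding size, flip probability, and zero fill value are assumptions.

```python
import random


def hflip(img: list[list[int]]) -> list[list[int]]:
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def random_crop(img: list[list[int]], pad: int, rng: random.Random) -> list[list[int]]:
    """Zero-pad by `pad` pixels on each side, then crop back to the original size."""
    h, w = len(img), len(img[0])
    padded = [[0] * (w + 2 * pad) for _ in range(pad)]
    padded += [[0] * pad + row + [0] * pad for row in img]
    padded += [[0] * (w + 2 * pad) for _ in range(pad)]
    top = rng.randint(0, 2 * pad)  # random vertical offset into the padded grid
    left = rng.randint(0, 2 * pad)  # random horizontal offset
    return [row[left:left + w] for row in padded[top:top + h]]


def augment(img: list[list[int]], rng: random.Random, pad: int = 1, p_flip: float = 0.5) -> list[list[int]]:
    """Apply a random horizontal flip (probability p_flip), then a padded random crop."""
    if rng.random() < p_flip:
        img = hflip(img)
    return random_crop(img, pad, rng)


rng = random.Random(0)
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
out = augment(image, rng)
print(len(out), len(out[0]))  # 3 3
```

Each call yields a slightly shifted or mirrored view of the same lesion with the original size, so the network sees more varied training inputs without new labels.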
Comment thread pyproject.toml
[project]
name = "bender"
version = "0.1.0"
requires-python = ">=3.13"
Comment on lines +180 to +185
with torch.no_grad():  # no gradients are needed for validation
    val_batch = next(iter(loader_val))  # note: uses a single batch only
    val_inputs = val_batch[0]
    val_labels = val_batch[1].reshape(-1).long()
    val_outputs = model(val_inputs)
    val_loss = loss_function(val_outputs, val_labels)
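The snippet above estimates the validation loss from a single batch drawn with `next(iter(loader_val))`. If a full-set estimate is wanted, one common approach is a batch-size-weighted mean over every batch, since the last batch is often smaller. A framework-free sketch, where each `(loss, size)` pair is a hypothetical placeholder for a batch's mean loss and its size as a loop over `loader_val` would produce them:

```python
def mean_validation_loss(batches: list[tuple[float, int]]) -> float:
    """Weighted mean of per-batch mean losses, weighted by batch size."""
    total_loss = sum(loss * size for loss, size in batches)
    total_count = sum(size for _, size in batches)
    if total_count == 0:
        raise ValueError("no validation samples")
    return total_loss / total_count


# Hypothetical: two full batches of 64 and a smaller final batch of 32.
print(mean_validation_loss([(0.9, 64), (0.7, 64), (1.0, 32)]))
```

Weighting by size makes the result equal to the mean per-sample loss; a plain average of batch means would over-weight the short final batch.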
try:
    torch.onnx.export(model, dummy_input, onnx_path, opset_version=11)
    print(f"ONNX model saved. Run: netron {onnx_path}")
except ModuleNotFoundError as e:
    print(f"ONNX export skipped ({e}). Install the missing package with: pip install onnxscript")
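The fallback above catches `ModuleNotFoundError` at export time. An alternative is to check up front whether the optional package is importable with `importlib.util.find_spec`. A minimal sketch, where `onnxscript` is the package named in the message above and `has_module` and `can_export_onnx` are hypothetical helpers:

```python
import importlib.util


def has_module(name: str) -> bool:
    """True if the top-level module `name` can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


def can_export_onnx() -> bool:
    # Hypothetical guard: the export path above reports onnxscript as the missing package.
    return has_module("onnxscript")


if not can_export_onnx():
    print("ONNX export skipped. Install the missing package with: pip install onnxscript")
```

Checking first avoids starting an export that is bound to fail. The `try`/`except` form remains useful for other missing dependencies that only surface during export.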

2 participants