This repository contains the code for a STATS 402 project on CNN regularization. The experiments compare standard training, input-space regularizers, vicinal methods, label smoothing, and SAM across clean accuracy, distribution-shift performance, calibration, adversarial evaluation, and training cost.
Large datasets, checkpoints, logs, course documents, and report source files are
not included in the repository. The small CSV/JSON summaries in results/ are
kept so the completed experiment outputs can be inspected without downloading
full checkpoints.
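A minimal sketch for getting an overview of those summaries, assuming only that results/ holds CSV and JSON files somewhere beneath it. The helper name summarize_results is hypothetical and is not part of the repository; the repository's own aggregation code is in src/final_project/results.py.

```python
import csv
import json
from pathlib import Path


def summarize_results(results_dir="results"):
    """Return (relative path, kind, size) for each CSV/JSON file under results_dir.

    For CSV files, size is the number of data rows (the first row is treated
    as a header). For JSON files, size is the number of top-level entries.
    """
    root = Path(results_dir)
    if not root.is_dir():
        return []
    rows = []
    for path in sorted(root.rglob("*")):
        name = path.relative_to(root).as_posix()
        if path.suffix == ".csv":
            with path.open(newline="") as handle:
                reader = csv.reader(handle)
                next(reader, None)  # skip the assumed header row
                rows.append((name, "csv", sum(1 for _ in reader)))
        elif path.suffix == ".json":
            with path.open() as handle:
                payload = json.load(handle)
            size = len(payload) if isinstance(payload, (list, dict)) else 1
            rows.append((name, "json", size))
    return rows


if __name__ == "__main__":
    for name, kind, size in summarize_results():
        print(f"{name}\t{kind}\t{size}")
```

Run from the repository root, this prints one line per summary file. It makes no assumption about column names or JSON keys.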
- src/final_project/: training, data loading, model, metric, attack, and regularizer code.
- configs/: YAML configurations for the completed experiment stages.
- run_stage_a.py to run_stage_d.py: stage-level runners.
- scripts/: setup and batch execution helpers.
- tests/: unit and smoke tests.
- results/: curated summaries from completed runs.
git clone git@github.com:AndyLu666/Stats402.git
cd Stats402
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export PYTHONPATH=./src

Run the test suite:
PYTHONPATH=./src pytest -q

The project uses public image-classification datasets.
- CIFAR-10 and CIFAR-100 are downloaded by torchvision.
- Tiny-ImageNet-200 should be placed under data/tiny-imagenet-200/, or passed with --data-root.
- CIFAR-10-C can be downloaded with scripts/setup_benchmarks.sh.
- CIFAR-10.1 and CIFAR-10.2 are optional. Evaluation skips them if the files are absent.
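Because Tiny-ImageNet-200 has to be placed by hand, a quick pre-flight check can save a failed run. The sketch below assumes the layout of the publicly distributed archive (train/ and val/ directories plus wnids.txt); the repository's loader in src/final_project/data.py may expect something different, and check_tiny_imagenet is a hypothetical helper, not part of the project.

```python
from pathlib import Path

# Entries assumed to exist inside an extracted tiny-imagenet-200/ directory.
# This list reflects the public archive layout, not the project's loader.
TINY_IMAGENET_ENTRIES = ("train", "val", "wnids.txt")


def check_tiny_imagenet(data_root="./data"):
    """Return the assumed entries that are missing under data_root/tiny-imagenet-200."""
    base = Path(data_root) / "tiny-imagenet-200"
    return [name for name in TINY_IMAGENET_ENTRIES if not (base / name).exists()]


if __name__ == "__main__":
    missing = check_tiny_imagenet()
    if missing:
        print("Missing under data/tiny-imagenet-200/:", ", ".join(missing))
    else:
        print("Tiny-ImageNet-200 layout looks complete.")
```

An empty return value means all assumed entries were found; it does not verify the image files themselves.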
Prepare the benchmark directory:
bash scripts/setup_benchmarks.sh ./data

For a smaller setup without CIFAR-10-C:
DOWNLOAD_CIFAR10C=0 bash scripts/setup_benchmarks.sh ./data

Full runs require GPU time. The same runners can be used with --pilot,
--epochs, and --max-steps-per-epoch for quick checks.
Stage A, single-family screening:
python run_stage_a.py --phase generate_configs --data-root ./data
python run_stage_a.py --phase coarse_train --run-root runs/stage_a_formal --data-root ./data --gpu 0
python run_stage_a.py --phase select_winners --run-root runs/stage_a_formal
python run_stage_a.py --phase summarize --run-root runs/stage_a_formal

Stage B, cross-dataset validation:
python run_stage_b.py --phase generate_configs --run-root runs/stage_b_formal --stage-a-run-root runs/stage_a_formal --include-challenger --data-root ./data
python run_stage_b.py --phase train --run-root runs/stage_b_formal --stage-a-run-root runs/stage_a_formal --data-root ./data --gpu 0
python run_stage_b.py --phase eval --run-root runs/stage_b_formal --stage-a-run-root runs/stage_a_formal --data-root ./data --gpu 0
python run_stage_b.py --phase summarize --run-root runs/stage_b_formal --stage-a-run-root runs/stage_a_formal

Stage C, regularizer composition:
python run_stage_c.py --phase generate_configs --run-root runs/stage_c_formal --data-root ./data
python run_stage_c.py --phase train --run-root runs/stage_c_formal --data-root ./data --gpu 0
python run_stage_c.py --phase eval --run-root runs/stage_c_formal --data-root ./data --gpu 0
python run_stage_c.py --phase summarize --run-root runs/stage_c_formal

Stage C+, MixCut diagnostic runs:
python run_stage_c_plus.py --phase generate_configs --run-root runs/stage_c_plus_formal --data-root ./data
python run_stage_c_plus.py --phase train --run-root runs/stage_c_plus_formal --data-root ./data --gpu 0
python run_stage_c_plus.py --phase eval --run-root runs/stage_c_plus_formal --data-root ./data --gpu 0
python run_stage_c_plus.py --phase summarize --run-root runs/stage_c_plus_formal

Stage D, matched transfer checks:
python run_stage_d.py --phase generate_configs --run-root runs/stage_d_formal --data-root ./data
python run_stage_d.py --phase train --run-root runs/stage_d_formal --data-root ./data --gpu 0
python run_stage_d.py --phase eval --run-root runs/stage_d_formal --data-root ./data --gpu 0
python run_stage_d.py --phase summarize --run-root runs/stage_d_formal

The fastest check is the test suite:
PYTHONPATH=./src pytest -q

A small CPU pilot can also be launched after configurations are generated:
python run_stage_b.py \
--phase train \
--run-root runs/stage_b_smoke \
--stage-a-run-root runs/stage_a_formal \
--datasets cifar100 \
--families baseline \
--data-root ./data \
--device cpu \
--pilot

- run_stage_a.py: Stage A configuration generation, training, winner selection, and summary.
- run_stage_b.py: Stage B cross-dataset validation on CIFAR-10, CIFAR-100, and Tiny-ImageNet-200.
- run_stage_c.py: Stage C composition runs for CutMix, Mixup, SAM, and Label Smoothing variants.
- run_stage_c_plus.py: Stage C+ MixCut and MixCut+SAM runs.
- run_stage_d.py: Stage D transfer checks across datasets, backbones, seeds, and finalist methods.
- src/final_project/config.py: config parsing and normalization.
- src/final_project/data.py: dataset loaders and benchmark helpers.
- src/final_project/models.py: ResNet-20, WRN-28-10, DenseNet-BC, and ConvNeXt-Tiny builders.
- src/final_project/train.py: training loop and regularizer integration.
- src/final_project/evaluate.py: clean, corruption, shift, calibration, FGSM, and PGD evaluation.
- src/final_project/metrics.py: accuracy, ECE, NLL, Brier score, and related metrics.
- src/final_project/attacks.py: FGSM and PGD helpers.
- src/final_project/results.py: result loading and aggregation.
- src/final_project/regularizers/: Mixup, CutMix, AugMix, MixCut, rotation, and SAM implementations.
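For readers unfamiliar with the calibration metrics listed above, here is a minimal sketch of expected calibration error (ECE) with equal-width confidence bins. It is an illustration only and not the code in src/final_project/metrics.py; the function name, the default of 15 bins, and the half-open binning convention are all assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=15):
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence| per bin.

    confidences: predicted max-class probabilities in [0, 1].
    correct: 1 (or True) where the prediction was right, else 0 (or False).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    conf_sum = [0.0] * n_bins  # summed confidence per bin
    hit_sum = [0.0] * n_bins   # number of correct predictions per bin
    for conf, hit in zip(confidences, correct):
        # Half-open bins [k/n, (k+1)/n); a confidence of 1.0 goes in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        conf_sum[index] += conf
        hit_sum[index] += float(hit)

    # |acc_b - conf_b| * (count_b / total) simplifies to |hits_b - conf_sum_b| / total.
    return sum(abs(h - c) for h, c in zip(hit_sum, conf_sum)) / total
```

A model whose confidence matches its accuracy in every bin scores 0; larger values indicate over- or under-confidence.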
- results/stage_a/: screening results and selected family winners.
- results/stage_b/: cross-dataset single-method summaries.
- results/stage_c/: regularizer-composition summaries.
- results/stage_c_plus/: MixCut diagnostic summaries.
- results/stage_d/: transfer-check summaries.
The full training outputs are excluded because they are large. Re-running a
stage writes checkpoints and logs under runs/, which is ignored by Git.
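Re-running a stage means invoking its phases in order. The sketch below assembles the four commands shown for Stages C, C+, and D (generate_configs, train, eval, summarize) using only flags that appear in this README. It does not cover Stage A, which uses different phase names, or Stage B, which also needs --stage-a-run-root. The helper names build_stage_commands and run_stage are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Phase order taken from the Stage C, C+, and D commands in this README.
PHASES = ("generate_configs", "train", "eval", "summarize")


def build_stage_commands(script, run_root, data_root="./data", gpu=None):
    """Build one argv list per phase, mirroring the documented commands."""
    commands = []
    for phase in PHASES:
        argv = [sys.executable, script, "--phase", phase, "--run-root", run_root]
        if phase != "summarize":
            # The documented summarize commands omit --data-root.
            argv += ["--data-root", data_root]
        if gpu is not None and phase in ("train", "eval"):
            # --gpu appears only on the train and eval commands.
            argv += ["--gpu", str(gpu)]
        commands.append(argv)
    return commands


def run_stage(script, run_root, **kwargs):
    """Run the phases in order, stopping at the first failure."""
    for argv in build_stage_commands(script, run_root, **kwargs):
        subprocess.run(argv, check=True)
```

For example, run_stage("run_stage_c.py", "runs/stage_c_formal", gpu=0) would issue the four Stage C commands in sequence. Using check=True stops the sequence when a phase exits with a non-zero status, so later phases do not run on incomplete outputs.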