diff --git a/finetuning/SWIN/CONCRETE_RUN_README.md b/finetuning/SWIN/CONCRETE_RUN_README.md new file mode 100644 index 0000000..fdcf5ac --- /dev/null +++ b/finetuning/SWIN/CONCRETE_RUN_README.md @@ -0,0 +1,260 @@ +# Concrete next run — SWIN-L 384 (Tier 1 & 2) + +Implements the "Putting it together — a concrete next run" recipe from +`SWIN_training_setup_summary.md` (§10 recommendations, §11 step-by-step), as a single +strong single-model candidate plus the code features it needs. + +## What this run is + +| Lever | Choice | Section | +|-------|--------|---------| +| Backbone | `microsoft/swin-large-patch4-window12-384-in22k` (stay in SWIN) | 1.1 / 1.2 | +| Resolution | 384 (processor-driven, no resize code) | 1.2 | +| Loss | balanced-softmax CE + multi-task family/genus/species heads | 1.3-A | +| Schedule | 100 epochs, 5% warmup, cosine, **EMA on** | 2.6 | +| Augmentation | MEDIUM (RandAug mag 7, mild mixup/erasing) — cold-start safe | 2.5 | +| Inference | multi-crop + flip TTA wired, enabled only for final prediction | 2.7 | + +Config: [`configs_advanced/swin_large_384_concrete.yml`](configs_advanced/swin_large_384_concrete.yml) + +## Code features added to `SWIN_finetuning_advanced.py` + +All are config-gated and default **off**, so existing configs behave exactly as before. + +1. **Balanced softmax / logit adjustment (Tier 1.3-A)** — new `long_tail` section. + A per-class `log_prior` (log training frequency, in the species/CE head's index + space) is added to the species logits **during training only** (`logits + tau*log_prior`), + then plain argmax at inference. Down-weights head classes to lift macro-F1 on the long + tail. Applied in `MixupTrainer.compute_loss` for single-task and multi-task, in both the + mixup and non-mixup paths. Not applied to ArcFace. + ```yaml + long_tail: + logit_adjustment: true + tau: 1.0 # strength; 1.0 = standard balanced softmax + ``` + +2. **Weight EMA (Tier 2.6)** — new `ema` section + `EMACallback`. 
+ Maintains a shadow average of the parameters (`shadow = decay*shadow + (1-decay)*param` + every step) and copies it into the model at train end, so the final `evaluate()` and + `save_model()` reflect EMA weights. **Keep `load_best_model_at_end: false`** — the + best-checkpoint reload would otherwise be overwritten by the EMA copy. + ```yaml + ema: + enabled: true + decay: 0.9998 + ``` + +3. **Horizontal-flip TTA (Tier 2.7)** — `multi_crop.flip`. + `build_multi_crop_transforms(..., flip=True)` also emits a flipped variant of each crop, + so logits average over crops × {orig, flip}. Leave `multi_crop.enabled: false` during + training; enable it for the final/leaderboard prediction only. + +4. **Gradient-checkpointing passthrough** — `MultiTaskSwinModel` / `SwinWithArcFace` now + forward `gradient_checkpointing_enable/disable` to the backbone, so + `training.gradient_checkpointing: true` works for the wrapped models (needed to fit + SWIN-L @384 on one GPU). + +## Environment setup (one-time) + +Jobs run via `train_advanced.sh`, which loads: + +```bash +module load miniconda +module load academic-ml/spring-2026 +conda activate spring-2026-pyt +``` + +`spring-2026-pyt` already provides torch 2.9.1, transformers 4.57.3 (≥4.52, required), +datasets, accelerate, safetensors, torchvision, scikit-learn, pillow, pyyaml, numpy. Two +packages it does **not** include are needed by the trainer — install them once into your +user-site: + +```bash +module load miniconda && conda activate spring-2026-pyt +pip install --user evaluate wandb +``` + +Notes: +- `evaluate` is required (accuracy / macro-F1); `wandb` is needed because the configs use + `report_to: wandb`. Pass `--set training.report_to=none` (or set `wandb.enabled: false`) to skip W&B. +- If `import wandb` fails with `cannot import name 'validate_core_schema' from 'pydantic_core'`, + the `--user` install shadowed the env's `pydantic_core`. 
Remove the duplicate so the env's + copy is used again: + `rm -rf ~/.local/lib/python3.12/site-packages/pydantic_core ~/.local/lib/python3.12/site-packages/pydantic_core-*.dist-info` +- `evaluate.load(...)` downloads its metric script from the HF hub on first use and caches it + under `~/.cache/huggingface`. Run the smoke test (below) once from a login node to warm the + cache if your compute nodes can't reach the hub. +- The PyTorch env requires `gpu_c >= 7.0`; the submit scripts request `gpu_c=8.0` (A100), so OK. + +Sanity check the env: +```bash +python -c "import torch, transformers, datasets, evaluate, wandb; print('env OK')" +``` + +### Weights & Biases (logs to gardoslab / herbdl) + +The trainer calls `wandb.init(entity="gardoslab", project="herbdl", name=run_name, +group=run_group, id=run_id, ...)` straight from the config (see +`SWIN_finetuning_advanced.py`), so no code change is needed — you only need team membership ++ a valid API key. + +1. **Be a member of the `gardoslab` team.** Open while signed in. + If you can't see it, ask the team owner to invite your W&B username. `entity="gardoslab"` + fails with a permission error until you're a member — being logged in is not enough. + +2. **Authenticate on SCC** (login node; `~/.netrc` is shared, so compute-node jobs reuse it — + no per-job login). Grab your key from : + ```bash + module load miniconda && conda activate spring-2026-pyt + wandb login --relogin # paste key; --relogin replaces a stale key + ``` + +3. **Verify** (the stored key can be stale even though `~/.netrc` exists): + ```bash + wandb login --verify + python -c "import wandb; v=wandb.Api().viewer; print(v.username, '| teams:', v.teams)" + ``` + `gardoslab` should appear in `teams`. + +Notes: +- Alternative to `~/.netrc`: `export WANDB_API_KEY=` in your shell profile (keeps the + key out of any committed script). 
+- The seed loop in `submit_concrete.sh` sets a distinct `run_id`/`run_name` per seed, so seeds + appear as separate runs grouped under `SWIN_L_384_Concrete`. +- To skip W&B for a run: `--set training.report_to=none` (or `wandb.enabled: false`). +- If a compute node can't reach W&B: `export WANDB_MODE=offline`, then `wandb sync ` later. + +## How to launch (you run this — nothing is auto-submitted) + +Single run (seed 0): +```bash +cd finetuning/SWIN +SEEDS="0" bash submit_concrete.sh +``` + +3- or 5-seed ensemble: +```bash +bash submit_concrete.sh # seeds 0 1 2 +SEEDS="0 1 2 3 4" bash submit_concrete.sh +``` + +Each job requests 1 A100-80G GPU on `herbdl` for 48h and writes to +`finetuning/output/SWIN/SWIN_L_384_CONCRETE_SEED/`. Adjust the `-M` email in +`submit_concrete.sh` if needed. + +### Smoke test first (recommended) +Verify the pipeline end-to-end cheaply before committing 48h jobs: +```bash +qsub -l h_rt=2:00:00 -pe omp 8 -P herbdl -l gpus=1 -l gpu_c=8.0 -l gpu_memory=80G \ + -N SWINL384_SMOKE \ + -v CONFIG_FILE=configs_advanced/swin_large_384_concrete.yml \ +-v SET_ARGS="--set data.max_train_samples=2000 --set data.max_eval_samples=2000 --set training.num_train_epochs=1 --set training.output_dir=/projectnb/herbdl/workspaces/tgardos/herbdl/finetuning/output/SWIN/SMOKE --set training.overwrite_output_dir=true --set wandb.enabled=false" \ + train_advanced.sh +``` + +## Output paths auto-relocate to your workspace + +Most configs in this repo (inherited from faridkar's) hardcode `output_dir`/`logging_dir` +under `/projectnb/herbdl/workspaces/faridkar/herbdl/...`. The trainer rewrites any +`.../workspaces//herbdl` prefix to the repo you actually run from, preserving the +trailing run name — so a `tgardos` checkout writes to +`/projectnb/herbdl/workspaces/tgardos/herbdl/finetuning/output/SWIN/` automatically, +with no YAML edits. It logs the rewrite (`__CUSTOM__: Relocated output path ...`). Set +`HERBDL_NO_RELOCATE=1` to disable (e.g. 
to write somewhere else via an explicit path). + +## Warm-start (Tier 2.5 — recommended once a 384 checkpoint exists) + +Cold-from-in22k is the dependency-free default. The curriculum finding is that chaining a +hard change from a converged checkpoint beats cold-starting it. Once you have a converged +SWIN-L 384 run, chain from it (keep `config_name`/`image_processor_name` on the 384 arch) +and raise `augmentation.randaugment.magnitude` to 9: +```bash +CKPT=/projectnb/herbdl/workspaces/tgardos/herbdl/finetuning/output/SWIN/SWIN_L_384_CONCRETE_SEED0 \ + SEEDS="1" bash submit_concrete.sh +``` + +## OOM / memory tuning + +SWIN-L @384 is heavy. If a job OOMs, lower the per-device batch and raise grad-accum to +keep the effective batch (~128) constant, e.g. via `--set`: +``` +--set training.per_device_train_batch_size=8 --set training.gradient_accumulation_steps=16 +``` +`gradient_checkpointing: true` is already on. + +## Final prediction with TTA + +For the leaderboard/final eval, enable TTA on the trained checkpoint: +```yaml +multi_crop: + enabled: true + crop_sizes: [400, 416, 448, 480, 512] + target_size: 384 + flip: true +``` +The trainer runs `multi_crop_evaluate` after the standard eval and prints averaged +accuracy + macro-F1 (`__CUSTOM__: Multi-crop eval ...`). Mirror the same crops/flip in +`prediction.py` / `kaggle_submission.py` so the submission matches the eval. + +## Metrics + +Both top-1 accuracy and macro-F1 are reported every epoch (`eval_accuracy` / +`eval_species_f1` for multi-task). Macro-F1 over the long tail is the number to watch +(Tier 0). + +## Remote monitoring from phone / MacBook (Claude Code Remote Control) + +To babysit a run (check `qstat`, read logs, tweak configs) from an iPhone or MacBook, use +Claude Code **Remote Control** — the `claude` process keeps running on the SCC login node +(full `/projectnb` + `qsub` access), and your phone/browser are just remote windows into it. 
+This is different from *Claude Code on the web*, whose cloud sandbox has **no** SCC access. + +### Updating Claude Code on SCC (needed: ≥ 2.1.51 for Remote Control) + +Claude Code here is installed as an npm **prefix** install and run via a shell alias: +```bash +alias claude='npx --prefix ~/claude-code claude' +``` +Because of that, `claude update` does **not** work — it targets npm's global prefix, which +is the read-only shared module dir (`/share/pkg.8/.../spring-2026-pyt`). Update the copy the +alias actually uses instead: +```bash +module load miniconda && conda activate spring-2026-pyt # for a consistent node/npm +npm install --prefix ~/claude-code @anthropic-ai/claude-code@latest +npx --prefix ~/claude-code claude --version # confirm >= 2.1.51 +``` +Re-run that `npm install --prefix` line whenever you want to upgrade (don't use `claude update`). + +### Starting a Remote Control session + +Remote Control requires a **claude.ai subscription login (Pro/Max/Team/Enterprise) — API keys +are not supported**. On the SCC login node: +```bash +unset ANTHROPIC_API_KEY # if set, it blocks Remote Control +claude /login # choose the claude.ai option (not a Console API key) + +tmux new -s claude-hpc # persistent: survives SSH disconnects +# inside tmux: +cd /projectnb/herbdl/workspaces/tgardos/herbdl +claude remote-control --name "HerbDL SWIN-L 384" +``` +It prints a session URL and offers a QR code (press space). Detach with `Ctrl-b d`; Claude +keeps running. + +- **iPhone:** Claude app → **Code** tab → pick "HerbDL SWIN-L 384" (or scan the QR). +- **MacBook:** open the session URL, or go to **claude.ai/code** and pick the session. For a + local terminal instead: `ssh -t scc1.bu.edu "tmux attach -t claude-hpc"`. + +Notes: +- Keep Claude on the **login node** (lightweight coordinator); GPU training stays in `qsub` + jobs on compute nodes. Don't run training directly under Claude. 
+- Remote Control can **push a phone notification** when a long task finishes (enable via `/config`). +- Text commands (`/context`, `/usage`) work from mobile; interactive pickers (`/resume`, `/mcp`) + only from the local terminal. + +## Deferred (next ensemble members) + +Per the chosen scope, these are intentionally **not** in this run and remain available to +add later as additional ensemble members: domain-pretrained backbone swap (Tier 1.1 +timm/open_clip loader), warmed-up ArcFace rescue (Tier 2.4), class-balanced sampler / +two-stage cRT (Tier 1.3-B), and +2021 data (Tier 2.8). diff --git a/finetuning/SWIN/CURRICULUM_REPORT.md b/finetuning/SWIN/CURRICULUM_REPORT.md new file mode 100644 index 0000000..d12332a --- /dev/null +++ b/finetuning/SWIN/CURRICULUM_REPORT.md @@ -0,0 +1,125 @@ +# Curriculum Learning — Stage-by-Stage Impact Report + +## Starting Point: SWIN_BASE_BASELINE + +**What it is:** SWIN-Base (224px, ImageNet-22k pretrained), fine-tuned with standard CE loss, no augmentation beyond basic resizing/normalization, unfrozen backbone from the start. + +**Result:** Peak F1 = **0.7454** @ epoch 47.8 + +**Interpretation:** Solid starting point. Slow convergence curve — model starts at 0.58 F1 and takes ~48 epochs to plateau. This is the reference to beat. + +--- + +## Interlude: Standalone Augmentation Test (SWIN_BASE_224_AUGMENTED) + +**What it added:** Heavy augmentation (RandAugment mag=9, Mixup α=0.8, CutMix α=1.0, RandomErasing 25%, label smoothing 0.1) applied directly from scratch — no warm-up, no curriculum. + +**Result:** Peak F1 = **0.6118** @ epoch 44.4 — **worse than baseline by 13.4 points** + +**Why it failed:** Throwing all regularization at a model cold is destructive. Strong Mixup/CutMix targets corrupt the learning signal before the backbone has stabilized. The model oscillates and never recovers — note the flat 0.57–0.61 plateau from epoch 20–99. This is the key motivation for curriculum learning. 
+ +--- + +## Curriculum Stage 1 — Mild Augmentation Warm-up + +**What changed:** Initialized from baseline checkpoint. RandAugment mag=4 (mild), Mixup α=0.8, CutMix α=1.0, RandomErasing p=0.1, label smoothing 0.05. LR = 5e-5. + +**Result:** Peak F1 = **0.7214** @ epoch 23.9 + +**Interpretation:** Starts immediately at 0.69 F1 (baseline already baked in), reaches 0.72 in 24 epochs. The mild augmentation + lower LR successfully builds on the baseline without disrupting it. Notably, this run converges faster than the baseline — 0.69 at epoch 3 vs. 0.58 for baseline. + +**Gain vs baseline at epoch 24:** +0.013 F1 + +--- + +## Curriculum Stage 2 — Medium Augmentation + +**What changed:** From S1 checkpoint. RandAugment mag=7 (stepped up), RandomErasing p=0.15, label smoothing 0.1. LR = 3e-5. + +**Result:** Peak F1 = **0.7421** @ epoch 27.3 + +**Gain vs S1:** +0.021 F1 + +**Interpretation:** The stepped-up augmentation is now helping rather than hurting, because the backbone is already warm. Model jumps to 0.72 at epoch 3 and climbs to 0.74 by epoch 27. + +--- + +## Curriculum Stage 3 — Heavy Augmentation + +**What changed:** From S2 checkpoint. RandAugment mag=9 (full strength), RandomErasing p=0.25. LR = 2e-5. 50 epochs. + +**Result:** Peak F1 = **0.7510** @ epoch 41.0 + +**Gain vs S2:** +0.009 F1. Diminishing returns beginning. + +**Interpretation:** Full augmentation now converges to a higher ceiling than baseline. However, the improvement margin is shrinking. The model starts at 0.74 immediately and creeps upward slowly — most gain is in early epochs, then it plateaus. + +--- + +## Curriculum Stage 3-Cont — Extended Cosine Schedule + +**What changed:** From S3 final model (not best checkpoint). Fresh cosine LR schedule restart from 2e-5. Same augmentation. Intended to push past the S3 plateau. + +**Result:** Peak F1 = **0.7510** @ epoch 50.0 + +**Gain vs S3:** **+0.000 F1** + +**Interpretation:** The LR restart did not help — S3 had already converged. 
The model stays in the same 0.74–0.75 band the entire 50 epochs. This suggests the 224px + CE + augmentation combination has hit its ceiling. + +--- + +## Curriculum MultiTask — Auxiliary Family/Genus Heads + +**What changed:** From S3-Cont final model. Added CE auxiliary heads for family and genus (weights 0.2×family + 0.3×genus + 1.0×species). Mixup/CutMix retained. LR = 3e-4 (higher — new heads need to train). 100 epochs. + +**Result:** Peak F1 = **0.7523** @ epoch 68.3 + +**Gain vs S3-Cont:** +0.001 F1 net, but with a very different trajectory. + +**Key observation:** The new family/genus heads start randomly initialized → eval_on_start near-zero → slow recovery through ~40 epochs before exceeding S3-Cont. MultiTask eventually pulls ahead but the improvement is modest. The multi-task signal is providing regularization but not a dramatic accuracy boost on its own. + +--- + +## Curriculum ArcFace — SubCenter ArcFace Metric Learning + +**What changed:** From MultiTask checkpoint. Replaced CE species head with SubCenter ArcFace (embedding=512, scale=30, margin=0.5, k=3 sub-centers). Mixup/CutMix disabled (ArcFace's margin needs hard labels, which mixed targets do not provide). Hybrid CE weight = 0.0. LR = 1e-4. 60 epochs. + +**Result:** Peak F1 = **0.7376** @ epoch 58.1 + +**Gain vs MultiTask:** **–0.015 F1** — a regression. + +**Interpretation:** ArcFace starts from near-zero (random embedding + weight matrix initialization), takes ~40 epochs just to recover to MultiTask's level, and peaks 1.5 points *below* the MultiTask checkpoint it started from. The loss function change required too many epochs to re-learn what CE had already learned. The 60-epoch budget was insufficient for ArcFace to amortize its warm-up cost and then improve further. 
+ +--- + +## Summary Table + +| Stage | Technique Added | Peak F1 | Δ vs Previous | Epochs to Peak | +|-------|----------------|---------|--------------|----------------| +| Baseline | CE, no augmentation | 0.7454 | — | 47.8 | +| Aug (standalone) | Heavy aug, no curriculum | 0.6118 | –0.134 | 44.4 | +| S1 | Mild aug (warm-up) | 0.7214 | –0.024* | 23.9 | +| S2 | Medium aug | 0.7421 | +0.021 | 27.3 | +| S3 | Heavy aug | 0.7510 | +0.009 | 41.0 | +| S3-Cont | LR restart | 0.7510 | +0.000 | 50.0 | +| MultiTask | Family/genus aux heads | 0.7523 | +0.001 | 68.3 | +| ArcFace | Metric learning loss | 0.7376 | **–0.015** | 58.1 | +\* S1 starts below baseline because it used fewer epochs (25 vs. 48 for baseline). Chaining S1→S2→S3 ultimately exceeds the baseline ceiling (0.751 vs. 0.745). + +--- + +## Key Takeaways + +1. **Curriculum ordering matters critically.** Applying heavy augmentation cold destroyed performance (0.61). Applied progressively, it exceeds baseline (0.751 vs. 0.745). + +2. **The aug curriculum plateau is around 0.750–0.752.** S3, S3-Cont, and MultiTask all peak in this band. The 224px CE model appears structurally capped here. + +3. **MultiTask gave only marginal gain (+0.001).** The auxiliary signal helps slightly but the species task already dominates. More useful as regularization than as a direct accuracy booster. + +4. **ArcFace regressed.** The 60-epoch budget was too short — ArcFace requires a long cold-start recovery period before it can outperform CE. The hybrid/384 stages queued after it will inherit this disadvantage. + +5. 
**The gap to 0.80 is still ~5 points.** The most promising levers remaining are: + - **384px resolution** — higher input resolution preserves finer detail, which is known to help fine-grained recognition + - **SWIN V2 architecture** — updated relative position bias and scaled cosine attention + - **Revisiting ArcFace** with a longer budget or frozen-backbone warm-up phase diff --git a/finetuning/SWIN/SWIN_finetuning.py b/finetuning/SWIN/SWIN_finetuning.py index ae4d6d0..df2f20d 100644 --- a/finetuning/SWIN/SWIN_finetuning.py +++ b/finetuning/SWIN/SWIN_finetuning.py @@ -51,9 +51,21 @@ from transformers.utils.versions import require_version import wandb +from transformers.integrations import WandbCallback os.environ['WANDB_DISABLED'] = 'false' +_WANDB_CONFIG_BLOCKLIST = {"label2id", "id2label"} + +class FilteredWandbCallback(WandbCallback): + """WandbCallback that skips large, uninformative model config keys.""" + def on_train_begin(self, args, state, control, model=None, **kwargs): + super().on_train_begin(args, state, control, model=model, **kwargs) + wandb.config.update( + {k: None for k in _WANDB_CONFIG_BLOCKLIST if k in wandb.config}, + allow_val_change=True, + ) + """ Fine-tuning a 🤗 Transformers model for image classification""" @@ -622,6 +634,8 @@ def val_transforms(example_batch): tokenizer=image_processor, data_collator=collate_fn, ) + trainer.remove_callback(WandbCallback) + trainer.add_callback(FilteredWandbCallback) # Training if training_args.do_train: diff --git a/finetuning/SWIN/SWIN_finetuning_advanced.py b/finetuning/SWIN/SWIN_finetuning_advanced.py index db32b3b..9354479 100644 --- a/finetuning/SWIN/SWIN_finetuning_advanced.py +++ b/finetuning/SWIN/SWIN_finetuning_advanced.py @@ -16,16 +16,20 @@ import argparse import logging import os +import re import sys import yaml from dataclasses import dataclass, field from typing import Optional import random +import math + import evaluate import numpy as np import torch import torch.nn as nn +import torch.nn.functional as F +from 
datasets import load_dataset from PIL import Image from torchvision.transforms import ( @@ -50,6 +54,7 @@ AutoModelForImageClassification, HfArgumentParser, Trainer, + TrainerCallback, TrainingArguments, set_seed, ) @@ -57,8 +62,20 @@ from transformers.utils.versions import require_version import wandb +from transformers.integrations import WandbCallback + +os.environ['WANDB_DISABLED'] = 'false' # may be overridden by config -os.environ['WANDB_DISABLED'] = 'false' +_WANDB_CONFIG_BLOCKLIST = {"label2id", "id2label"} + +class FilteredWandbCallback(WandbCallback): + """WandbCallback that skips large, uninformative model config keys.""" + def on_train_begin(self, args, state, control, model=None, **kwargs): + super().on_train_begin(args, state, control, model=model, **kwargs) + wandb.config.update( + {k: None for k in _WANDB_CONFIG_BLOCKLIST if k in wandb.config}, + allow_val_change=True, + ) """ Fine-tuning a 🤗 Transformers model for image classification with advanced augmentations""" @@ -233,6 +250,13 @@ def __init__(self, base_model, num_families, num_genera, num_species): self.genus_classifier = nn.Linear(hidden_size, num_genera) self.species_classifier = nn.Linear(hidden_size, num_species) + def gradient_checkpointing_enable(self, **kwargs): + # Passthrough so HF Trainer's gradient_checkpointing flag reaches the backbone. 
+ self.swin.gradient_checkpointing_enable(**kwargs) + + def gradient_checkpointing_disable(self): + self.swin.gradient_checkpointing_disable() + def forward(self, pixel_values, family_labels=None, genus_labels=None, species_labels=None, **kwargs): outputs = self.swin(pixel_values) pooled_output = outputs.pooler_output # [batch_size, hidden_size] @@ -260,17 +284,139 @@ def forward(self, pixel_values, family_labels=None, genus_labels=None, species_l } +class SubCenterArcMarginProduct(nn.Module): + """SubCenter ArcFace margin head (k sub-centers per class for robustness to label noise).""" + def __init__(self, in_features, out_features, k=3, s=30.0, m=0.50, easy_margin=False): + super().__init__() + self.in_features = in_features + self.out_features = out_features + self.s = s + self.m = m + self.k = k + self.weight = nn.Parameter(torch.FloatTensor(out_features * k, in_features)) + nn.init.xavier_uniform_(self.weight) + self.easy_margin = easy_margin + self.cos_m = math.cos(m) + self.sin_m = math.sin(m) + self.th = math.cos(math.pi - m) + self.mm = math.sin(math.pi - m) * m + + def forward(self, embeddings, labels=None): + embeddings = F.normalize(embeddings, p=2, dim=1) + weight = F.normalize(self.weight, p=2, dim=1) + cosine = F.linear(embeddings, weight).view(-1, self.out_features, self.k) + cosine, _ = torch.max(cosine, dim=2) # [B, num_classes] + + if labels is None: + return cosine * self.s + + sine = torch.sqrt((1.0 - cosine.pow(2)).clamp(0, 1)) + phi = cosine * self.cos_m - sine * self.sin_m + phi = torch.where(cosine > self.th, phi, cosine - self.mm) if not self.easy_margin else torch.where(cosine > 0, phi, cosine) + one_hot = torch.zeros_like(cosine).scatter_(1, labels.view(-1, 1).long(), 1) + return (one_hot * phi + (1.0 - one_hot) * cosine) * self.s + + +class SwinWithArcFace(nn.Module): + """ + SWIN backbone + SubCenter ArcFace species head. + Optionally adds CE auxiliary heads for family/genus (multi-task). 
+ Optionally blends a CE species head with ArcFace (hybrid loss). + """ + def __init__(self, base_model, num_species, embedding_size=512, scale=30.0, margin=0.50, + num_subcenters=3, num_families=None, num_genera=None, + family_weight=0.2, genus_weight=0.3, hybrid_ce_weight=0.0): + super().__init__() + self.config = base_model.config + if hasattr(base_model, 'swinv2'): + self.swin = base_model.swinv2 + elif hasattr(base_model, 'swin'): + self.swin = base_model.swin + else: + raise ValueError("Base model must have 'swin' or 'swinv2' attribute") + + hidden_size = base_model.config.hidden_size + self.num_species = num_species + self.family_weight = family_weight + self.genus_weight = genus_weight + self.hybrid_ce_weight = hybrid_ce_weight + + self.embedding = nn.Linear(hidden_size, embedding_size) + self.bn = nn.BatchNorm1d(embedding_size) + self.arcface = SubCenterArcMarginProduct(embedding_size, num_species, k=num_subcenters, s=scale, m=margin) + + if hybrid_ce_weight > 0: + self.ce_classifier = nn.Linear(hidden_size, num_species) + + self.use_multi_task = num_families is not None and num_genera is not None + if self.use_multi_task: + self.family_classifier = nn.Linear(hidden_size, num_families) + self.genus_classifier = nn.Linear(hidden_size, num_genera) + + def gradient_checkpointing_enable(self, **kwargs): + # Passthrough so HF Trainer's gradient_checkpointing flag reaches the backbone. 
+ self.swin.gradient_checkpointing_enable(**kwargs) + + def gradient_checkpointing_disable(self): + self.swin.gradient_checkpointing_disable() + + def forward(self, pixel_values, labels=None, family_labels=None, genus_labels=None, species_labels=None, **kwargs): + pooled = self.swin(pixel_values).pooler_output + embeddings = self.bn(self.embedding(pooled)) + arc_labels = species_labels if species_labels is not None else labels + + if arc_labels is not None: + arc_logits = self.arcface(embeddings, arc_labels) + arc_loss = F.cross_entropy(arc_logits, arc_labels) + + if self.hybrid_ce_weight > 0: + ce_logits = self.ce_classifier(pooled) + ce_loss = F.cross_entropy(ce_logits, arc_labels) + w = self.hybrid_ce_weight + loss = (1 - w) * arc_loss + w * ce_loss + logits = torch.log((1 - w) * F.softmax(arc_logits, dim=1) + w * F.softmax(ce_logits, dim=1) + 1e-8) + else: + loss = arc_loss + logits = arc_logits + + if self.use_multi_task and family_labels is not None and genus_labels is not None: + loss = loss + self.family_weight * F.cross_entropy(self.family_classifier(pooled), family_labels) + loss = loss + self.genus_weight * F.cross_entropy(self.genus_classifier(pooled), genus_labels) + else: + # Inference: cosine similarity, no margin + weight = F.normalize(self.arcface.weight, p=2, dim=1) + emb = F.normalize(embeddings, p=2, dim=1) + cosine = F.linear(emb, weight).view(-1, self.num_species, self.arcface.k) + cosine, _ = torch.max(cosine, dim=2) + arc_logits = cosine * self.arcface.s + + if self.hybrid_ce_weight > 0: + ce_logits = self.ce_classifier(pooled) + w = self.hybrid_ce_weight + logits = torch.log((1 - w) * F.softmax(arc_logits, dim=1) + w * F.softmax(ce_logits, dim=1) + 1e-8) + else: + logits = arc_logits + loss = None + + result = {'loss': loss, 'logits': logits} + if self.use_multi_task: + result['family_logits'] = self.family_classifier(pooled) + result['genus_logits'] = self.genus_classifier(pooled) + return result + + class MixupCutmixCollator: """ Collator 
that applies Mixup and/or Cutmix augmentation. """ - def __init__(self, mixup_alpha=0.8, cutmix_alpha=1.0, prob=0.5, label_smoothing=0.1, num_classes=1000, multi_task=False): + def __init__(self, mixup_alpha=0.8, cutmix_alpha=1.0, prob=0.5, label_smoothing=0.1, num_classes=1000, multi_task=False, label_column_name="label"): self.mixup_alpha = mixup_alpha self.cutmix_alpha = cutmix_alpha self.prob = prob self.label_smoothing = label_smoothing self.num_classes = num_classes self.multi_task = multi_task + self.label_column_name = label_column_name def __call__(self, examples): pixel_values = torch.stack([example["pixel_values"] for example in examples]) @@ -280,9 +426,11 @@ def __call__(self, examples): if "label" in examples[0]: labels = torch.tensor([example["label"] for example in examples]) else: - # This is for validation/evaluation - no mixup/cutmix should be applied - # Just return the basic batch - result = {"pixel_values": pixel_values} + # Validation/evaluation — no mixup/cutmix, just collate cleanly + result = { + "pixel_values": pixel_values, + "labels": torch.tensor([example[self.label_column_name] for example in examples]), + } if self.multi_task and "family_label" in examples[0]: result.update({ @@ -375,15 +523,49 @@ class MixupTrainer(Trainer): """ Custom Trainer that handles Mixup/Cutmix loss computation and batch-wise evaluation. """ - def __init__(self, *args, multi_task=False, **kwargs): + def __init__(self, *args, multi_task=False, arcface=False, + logit_adjustment=False, log_prior=None, logit_adjustment_tau=1.0, **kwargs): super().__init__(*args, **kwargs) self.multi_task = multi_task + self.arcface = arcface + # Balanced-softmax / logit adjustment (Tier 1.3-A). log_prior is a + # [num_species] tensor of log class frequencies; added to the species + # logits during TRAINING only (never at inference) to down-weight head + # classes and lift macro-F1 on the long tail. 
+ self.logit_adjustment = logit_adjustment + self.log_prior = log_prior + self.logit_adjustment_tau = logit_adjustment_tau + + def _adjust_logits(self, logits): + """Add tau * log_prior to species logits (balanced softmax). No-op if disabled.""" + if not self.logit_adjustment or self.log_prior is None: + return logits + return logits + self.logit_adjustment_tau * self.log_prior.to(logits.device, logits.dtype) def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None): labels_a = inputs.pop("labels") labels_b = inputs.pop("labels_b", None) lam = inputs.pop("lam", 1.0) + smoothing = getattr(self.data_collator, 'label_smoothing', 0.0) + + # ArcFace: model computes its own loss internally (no mixup/cutmix with ArcFace) + if self.arcface: + if self.multi_task: + family_labels = inputs.pop("family_labels", None) + genus_labels = inputs.pop("genus_labels", None) + species_labels = inputs.pop("species_labels", None) + outputs = model( + pixel_values=inputs["pixel_values"], + species_labels=species_labels, + family_labels=family_labels, + genus_labels=genus_labels, + ) + else: + outputs = model(pixel_values=inputs["pixel_values"], labels=labels_a) + loss = outputs.get("loss") + return (loss, outputs) if return_outputs else loss + # Handle multi-task learning if self.multi_task: family_labels = inputs.pop("family_labels") @@ -402,12 +584,13 @@ def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=N # If we have mixup/cutmix, we need to manually compute the mixed loss if family_labels_b is not None: - loss_fct = nn.CrossEntropyLoss() + loss_fct = nn.CrossEntropyLoss(label_smoothing=smoothing) # Get logits for each taxonomy level family_logits = outputs.get("family_logits") genus_logits = outputs.get("genus_logits") - species_logits = outputs.get("species_logits") + # Balanced softmax: adjust species logits only (the long-tailed target) + species_logits = self._adjust_logits(outputs.get("species_logits")) # Compute mixed losses family_loss = lam * 
loss_fct(family_logits, family_labels) + (1 - lam) * loss_fct(family_logits, family_labels_b) @@ -416,6 +599,16 @@ def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=N # Combined loss with same weighting as the model loss = species_loss + 0.3 * genus_loss + 0.2 * family_loss + elif self.logit_adjustment: + # No mixup this batch, but balanced softmax is on: recompute the + # combined loss in-trainer so the species logits get the log-prior + # (the model's internal loss does not apply it). + loss_fct = nn.CrossEntropyLoss(label_smoothing=smoothing) + species_logits = self._adjust_logits(outputs.get("species_logits")) + species_loss = loss_fct(species_logits, species_labels) + genus_loss = loss_fct(outputs.get("genus_logits"), genus_labels) + family_loss = loss_fct(outputs.get("family_logits"), family_labels) + loss = species_loss + 0.3 * genus_loss + 0.2 * family_loss else: # Model already computed the loss loss = outputs.get("loss") @@ -423,17 +616,14 @@ def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=N return (loss, outputs) if return_outputs else loss else: # Standard single-task training - # For standard models, only pass pixel_values (labels handled separately) outputs = model(pixel_values=inputs["pixel_values"]) - logits = outputs.get("logits") + logits = self._adjust_logits(outputs.get("logits")) # balanced softmax (no-op if disabled) + loss_fct = nn.CrossEntropyLoss(label_smoothing=smoothing) if labels_b is not None: # Mixup/Cutmix loss - loss_fct = nn.CrossEntropyLoss() loss = lam * loss_fct(logits, labels_a) + (1 - lam) * loss_fct(logits, labels_b) else: - # Standard loss - loss_fct = nn.CrossEntropyLoss() loss = loss_fct(logits, labels_a) return (loss, outputs) if return_outputs else loss @@ -508,7 +698,10 @@ def evaluation_loop( labels = labels_a with torch.no_grad(): - if self.multi_task: + if self.arcface: + # ArcFace inference: no labels → cosine similarity logits (no margin) + outputs = 
model(pixel_values=inputs["pixel_values"]) + elif self.multi_task: # For multi-task, pass all labels to model outputs = model( pixel_values=inputs["pixel_values"], @@ -582,6 +775,53 @@ def evaluation_loop( ) +class EMACallback(TrainerCallback): + """ + Exponential Moving Average of model weights (Tier 2.6). + + Keeps a shadow copy of the trainable parameters, updated every optimizer + step as shadow = decay*shadow + (1-decay)*param. At the end of training the + EMA weights are copied into the model, so the final `trainer.evaluate()` and + `trainer.save_model()` both reflect the averaged weights (typically a steady + +0.2-0.5% for ~free). Only parameters are averaged; buffers (e.g. BN running + stats) are left as-is. + + Note: do not combine with `load_best_model_at_end: true` — the best-checkpoint + reload happens before this callback and would be overwritten by the EMA copy. + """ + def __init__(self, decay=0.9998): + self.decay = decay + self.shadow = None + + def on_train_begin(self, args, state, control, model=None, **kwargs): + self.shadow = { + n: p.detach().clone().float() + for n, p in model.named_parameters() if p.requires_grad + } + print(f"__CUSTOM__: EMA enabled (decay={self.decay}); tracking {len(self.shadow)} parameter tensors") + + def on_step_end(self, args, state, control, model=None, **kwargs): + if self.shadow is None: + return + d = self.decay + with torch.no_grad(): + for n, p in model.named_parameters(): + if n in self.shadow: + self.shadow[n].mul_(d).add_(p.detach().float(), alpha=1.0 - d) + + def copy_to_model(self, model): + if self.shadow is None: + return + with torch.no_grad(): + for n, p in model.named_parameters(): + if n in self.shadow: + p.data.copy_(self.shadow[n].to(p.dtype)) + + def on_train_end(self, args, state, control, model=None, **kwargs): + print("__CUSTOM__: Copying EMA weights into model for final eval/save") + self.copy_to_model(model) + + def load_config_from_yaml(config_path): """Load configuration from YAML file.""" 
with open(config_path, 'r') as f: @@ -589,17 +829,21 @@ def load_config_from_yaml(config_path): return config -def build_multi_crop_transforms(crop_sizes, target_size, image_mean, image_std): - """Returns one Compose transform per crop size for multi-crop TTA.""" - return [ - Compose([ - Resize(crop_size), - CenterCrop(target_size), - ToTensor(), - Normalize(mean=image_mean, std=image_std), - ]) - for crop_size in crop_sizes - ] +def build_multi_crop_transforms(crop_sizes, target_size, image_mean, image_std, flip=False): + """ + Returns Compose transforms for multi-crop TTA (Tier 2.7). + + One transform per crop size; if `flip` is True, also emit a horizontally + flipped variant of each crop, so logits are averaged over crops x {orig, flip}. + """ + norm = Normalize(mean=image_mean, std=image_std) + transforms = [] + for crop_size in crop_sizes: + base = [Resize(crop_size), CenterCrop(target_size)] + transforms.append(Compose(base + [ToTensor(), norm])) + if flip: + transforms.append(Compose(base + [RandomHorizontalFlip(p=1.0), ToTensor(), norm])) + return transforms def multi_crop_evaluate(model, filepaths, labels, crop_transforms, device, compute_metrics_fn): @@ -640,15 +884,73 @@ def multi_crop_evaluate(model, filepaths, labels, crop_transforms, device, compu return metrics +def _resolve_num_workers(n): + """Return n, or all scheduler-allocated CPUs when n == -1.""" + if n == -1: + try: + return len(os.sched_getaffinity(0)) + except AttributeError: + return os.cpu_count() or 8 + return n + + +def _relocate_output_dir(path): + """ + Re-root an output/logging path to the workspace this script is actually + running from, instead of whoever authored the config. + + Configs in this repo hardcode paths like + /projectnb/herbdl/workspaces//herbdl/finetuning/output/SWIN/. 
+ When a different user runs the same config, rewrite the + `.../workspaces//herbdl` prefix to this checkout's repo root so the + run is written under the runner's own workspace rather than the author's. + The trailing run name (.../output/SWIN/) is preserved. No-op if the + path doesn't match that layout or is already under this repo. Set + HERBDL_NO_RELOCATE=1 to disable (e.g. to write elsewhere on purpose). + """ + if not isinstance(path, str) or not path or os.environ.get('HERBDL_NO_RELOCATE'): + return path + # repo root = three levels up from this file: .../herbdl/finetuning/SWIN/ + repo_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) + m = re.match(r'^(.*/workspaces/[^/]+/herbdl)(/.*)?$', path) + if not m: + return path + relocated = repo_root + (m.group(2) or '') + if relocated != path: + print(f"__CUSTOM__: Relocated output path to current workspace:\n" + f" from {path}\n to {relocated}") + return relocated + + def main(): # Parse command line arguments for config file arg_parser = argparse.ArgumentParser(description="SWIN Fine-tuning with advanced augmentations") arg_parser.add_argument('--config', type=str, required=True, help='Path to YAML config file') + arg_parser.add_argument('--set', metavar='KEY=VALUE', action='append', default=[], + help='Override a config value using dotted key notation, e.g. 
training.seed=42') args = arg_parser.parse_args() # Load YAML config config = load_config_from_yaml(args.config) + # Apply --set overrides + def _coerce(v): + for cast in (int, float): + try: return cast(v) + except ValueError: pass + if v.lower() in ('true', 'false'): + return v.lower() == 'true' + return v + + for override in args.set: + key, _, value = override.partition('=') + parts = key.split('.') + d = config + for part in parts[:-1]: + d = d[part] + d[parts[-1]] = _coerce(value) + print(f"Config override: {key} = {d[parts[-1]]!r}") + # Extract custom parameters learning_rate_type = config['custom']['lr_type'] frozen = config['custom']['frozen'] @@ -666,6 +968,17 @@ def main(): multi_crop_enabled = multi_crop_config.get('enabled', False) multi_crop_sizes = multi_crop_config.get('crop_sizes', [256, 288, 320, 384, 448]) multi_crop_target_size = multi_crop_config.get('target_size', 224) + multi_crop_flip = multi_crop_config.get('flip', False) + + # Extract long-tail (balanced softmax / logit adjustment) parameters — Tier 1.3-A + long_tail_config = config.get('long_tail', {}) + use_logit_adjustment = long_tail_config.get('logit_adjustment', False) + logit_adjustment_tau = long_tail_config.get('tau', 1.0) + + # Extract EMA parameters — Tier 2.6 + ema_config = config.get('ema', {}) + use_ema = ema_config.get('enabled', False) + ema_decay = ema_config.get('decay', 0.9998) # Extract multi-task learning parameters multi_task_config = config.get('multi_task', {}) @@ -675,14 +988,28 @@ def main(): genus_weight = multi_task_config.get('genus_weight', 0.3) species_weight = multi_task_config.get('species_weight', 1.0) + # Extract ArcFace parameters + arcface_config = config.get('arcface', {}) + use_arcface = arcface_config.get('enabled', False) + arcface_embedding_size = arcface_config.get('embedding_size', 512) + arcface_scale = arcface_config.get('scale', 30.0) + arcface_margin = arcface_config.get('margin', 0.50) + arcface_num_subcenters = 
arcface_config.get('num_subcenters', 3) + arcface_hybrid_ce_weight = arcface_config.get('hybrid_ce_weight', 0.0) + print(f"__CUSTOM__: Learning rate type: {learning_rate_type}") print(f"__CUSTOM__: Frozen: {frozen}") print(f"__CUSTOM__: Frozen type: {frozen_type}") print(f"__CUSTOM__: Advanced augmentation: {use_advanced_aug}") print(f"__CUSTOM__: Multi-task learning: {use_multi_task}") + print(f"__CUSTOM__: ArcFace: {use_arcface}") + print(f"__CUSTOM__: Logit adjustment (balanced softmax): {use_logit_adjustment} (tau={logit_adjustment_tau})") + print(f"__CUSTOM__: Weight EMA: {use_ema} (decay={ema_decay})") if use_multi_task: print(f"__CUSTOM__: Min species samples: {min_species_samples}") print(f"__CUSTOM__: Loss weights - Family: {family_weight}, Genus: {genus_weight}, Species: {species_weight}") + if use_arcface: + print(f"__CUSTOM__: ArcFace embedding_size={arcface_embedding_size}, scale={arcface_scale}, margin={arcface_margin}, k={arcface_num_subcenters}, hybrid_ce_weight={arcface_hybrid_ce_weight}") # Create ModelArguments from config model_args = ModelArguments( @@ -714,17 +1041,27 @@ def main(): train_val_split=config['data']['train_val_split'], ) + # Warmup: transformers requires `warmup_steps` to be an int. Configs in this repo + # follow the convention that a float in (0, 1) means "fraction of total steps" — route + # those to `warmup_ratio` instead (e.g. 0.05 -> 5% warmup); ints pass through as steps. 
+ _warmup = config['training'].get('warmup_steps', 0) + if isinstance(_warmup, float) and 0.0 < _warmup < 1.0: + _warmup_steps, _warmup_ratio = 0, _warmup + else: + _warmup_steps, _warmup_ratio = int(_warmup), 0.0 + # Create TrainingArguments from config training_args = TrainingArguments( - output_dir=config['training']['output_dir'], - logging_dir=config['training']['logging_dir'], + output_dir=_relocate_output_dir(config['training']['output_dir']), + logging_dir=_relocate_output_dir(config['training']['logging_dir']), do_train=config['training']['do_train'], do_eval=config['training']['do_eval'], per_device_train_batch_size=config['training']['per_device_train_batch_size'], per_device_eval_batch_size=config['training']['per_device_eval_batch_size'], - learning_rate=config['training']['learning_rate'], + learning_rate=float(config['training']['learning_rate']), num_train_epochs=config['training']['num_train_epochs'], - warmup_steps=config['training']['warmup_steps'], + warmup_steps=_warmup_steps, + warmup_ratio=_warmup_ratio, weight_decay=config['training']['weight_decay'], gradient_accumulation_steps=config['training']['gradient_accumulation_steps'], lr_scheduler_type=config['training']['lr_scheduler_type'], @@ -732,14 +1069,19 @@ def main(): save_strategy=config['training']['save_strategy'], save_total_limit=config['training']['save_total_limit'], eval_strategy=config['training']['eval_strategy'], - eval_steps=config['training']['eval_steps'], - report_to=config['training']['report_to'], + eval_steps=config['training'].get('eval_steps', None), # only used when eval_strategy == "steps" + report_to=config['training']['report_to'] if config['wandb'].get('enabled', True) else 'none', bf16=config['training']['bf16'], - dataloader_num_workers=config['training']['dataloader_num_workers'], + dataloader_num_workers=_resolve_num_workers(config['training']['dataloader_num_workers']), + dataloader_pin_memory=config['training'].get('dataloader_pin_memory', True), 
remove_unused_columns=config['training']['remove_unused_columns'], - overwrite_output_dir=config['training']['overwrite_output_dir'], seed=config['training']['seed'], label_smoothing_factor=aug_config.get('label_smoothing', 0.0) if use_advanced_aug else 0.0, + eval_on_start=config['training'].get('eval_on_start', False), + torch_compile=config['training'].get('torch_compile', False), + gradient_checkpointing=config['training'].get('gradient_checkpointing', False), + gradient_checkpointing_kwargs=config['training'].get('gradient_checkpointing_kwargs', None), + load_best_model_at_end=config['training'].get('load_best_model_at_end', False), ) # Setup logging @@ -750,52 +1092,56 @@ def main(): ) # Initialize wandb with complete config - wandb_config = { - # Model config - "model_name": model_args.model_name_or_path, - "model_revision": model_args.model_revision, - "ignore_mismatched_sizes": model_args.ignore_mismatched_sizes, - # Data config - "train_file": data_args.train_file, - "validation_file": data_args.validation_file, - "image_column_name": data_args.image_column_name, - "label_column_name": data_args.label_column_name, - "max_train_samples": data_args.max_train_samples, - "max_eval_samples": data_args.max_eval_samples, - "train_val_split": data_args.train_val_split, - # Training config - "learning_rate": training_args.learning_rate, - "per_device_train_batch_size": training_args.per_device_train_batch_size, - "per_device_eval_batch_size": training_args.per_device_eval_batch_size, - "num_train_epochs": training_args.num_train_epochs, - "warmup_steps": training_args.warmup_steps, - "weight_decay": training_args.weight_decay, - "gradient_accumulation_steps": training_args.gradient_accumulation_steps, - "lr_scheduler_type": training_args.lr_scheduler_type, - "bf16": training_args.bf16, - "seed": training_args.seed, - # Custom config - "frozen": frozen, - "frozen_type": frozen_type, - "learning_rate_type": learning_rate_type, - # Augmentation config - 
"use_advanced_augmentation": use_advanced_aug, - "augmentation_config": aug_config if use_advanced_aug else None, - } + if config['wandb'].get('enabled', True): + wandb_config = { + # Model config + "model_name": model_args.model_name_or_path, + "model_revision": model_args.model_revision, + "ignore_mismatched_sizes": model_args.ignore_mismatched_sizes, + # Data config + "train_file": data_args.train_file, + "validation_file": data_args.validation_file, + "image_column_name": data_args.image_column_name, + "label_column_name": data_args.label_column_name, + "max_train_samples": data_args.max_train_samples, + "max_eval_samples": data_args.max_eval_samples, + "train_val_split": data_args.train_val_split, + # Training config + "learning_rate": training_args.learning_rate, + "per_device_train_batch_size": training_args.per_device_train_batch_size, + "per_device_eval_batch_size": training_args.per_device_eval_batch_size, + "num_train_epochs": training_args.num_train_epochs, + "warmup_steps": training_args.warmup_steps, + "weight_decay": training_args.weight_decay, + "gradient_accumulation_steps": training_args.gradient_accumulation_steps, + "lr_scheduler_type": training_args.lr_scheduler_type, + "bf16": training_args.bf16, + "seed": training_args.seed, + # Custom config + "frozen": frozen, + "frozen_type": frozen_type, + "learning_rate_type": learning_rate_type, + # Augmentation config + "use_advanced_augmentation": use_advanced_aug, + "augmentation_config": aug_config if use_advanced_aug else None, + } - wandb.init( - entity=config['wandb']['entity'], - project=config['wandb']['project'], - resume=config['wandb']['resume'], - name=run_name, - group=run_group, - id=run_id, - config=wandb_config - ) + wandb.init( + entity=config['wandb']['entity'], + project=config['wandb']['project'], + resume=config['wandb']['resume'], + name=run_name, + group=run_group, + id=run_id, + config=wandb_config, + notes=config['custom']['run_notes'] + ) + else: + os.environ['WANDB_DISABLED'] 
= 'true' # Set the learning rate scheduler parameters from config if 'lr_scheduler_kwargs' in config['training'] and config['training']['lr_scheduler_kwargs']: - training_args.learning_rate_kwargs = config['training']['lr_scheduler_kwargs'] + training_args.lr_scheduler_kwargs = config['training']['lr_scheduler_kwargs'] if training_args.should_log: transformers.utils.logging.set_verbosity_info() @@ -815,18 +1161,19 @@ def main(): logger.info(f"Training/evaluation parameters {training_args}") # Detecting last checkpoint + overwrite_output_dir = config['training'].get('overwrite_output_dir', False) last_checkpoint = None - if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir: + if os.path.isdir(training_args.output_dir) and training_args.do_train and not overwrite_output_dir: last_checkpoint = get_last_checkpoint(training_args.output_dir) if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0: raise ValueError( f"Output directory ({training_args.output_dir}) already exists and is not empty. " - "Use --overwrite_output_dir to overcome." + "Set overwrite_output_dir: true in your config to overcome." ) elif last_checkpoint is not None and training_args.resume_from_checkpoint is None: logger.info( f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change " - "the `--output_dir` or add `--overwrite_output_dir` to train from scratch." + "the `output_dir` or set `overwrite_output_dir: true` in your config to train from scratch." 
) # Set seed before initializing model @@ -996,15 +1343,27 @@ def compute_metrics(p): species_logits = p.predictions species_predictions = np.argmax(species_logits, axis=1) if len(species_logits.shape) > 1 else species_logits + family_predictions = np.argmax(family_logits, axis=1) if 'family_logits' in locals() and len(family_logits.shape) > 1 else None + genus_predictions = np.argmax(genus_logits, axis=1) if 'genus_logits' in locals() and len(genus_logits.shape) > 1 else None # Compute species accuracy (primary metric) species_accuracy = accuracy_metric.compute(predictions=species_predictions, references=p.label_ids)["accuracy"] - species_f1 = f1_metric.compute(predictions=species_predictions, references=p.label_ids, average="weighted")["f1"] + species_f1 = f1_metric.compute(predictions=species_predictions, references=p.label_ids, average="macro")["f1"] + + family_accuracy = accuracy_metric.compute(predictions=family_predictions, references=p.label_ids)["accuracy"] if family_predictions is not None else None + genus_accuracy = accuracy_metric.compute(predictions=genus_predictions, references=p.label_ids)["accuracy"] if genus_predictions is not None else None + + family_f1 = f1_metric.compute(predictions=family_predictions, references=p.label_ids, average="macro")["f1"] if family_predictions is not None else None + genus_f1 = f1_metric.compute(predictions=genus_predictions, references=p.label_ids, average="macro")["f1"] if genus_predictions is not None else None metrics = { "accuracy": species_accuracy, # Primary accuracy is species "species_accuracy": species_accuracy, "species_f1": species_f1, + "family_accuracy": family_accuracy, + "genus_accuracy": genus_accuracy, + "family_f1": family_f1, + "genus_f1": genus_f1 } # If we have genus and family logits, compute their accuracies too @@ -1034,15 +1393,95 @@ def compute_metrics(p): else: # Predictions contain label indices predictions = p.predictions accuracy = accuracy_metric.compute(predictions=predictions, 
references=p.label_ids)["accuracy"] - f1_score = f1_metric.compute(predictions=predictions, references=p.label_ids, average="weighted")["f1"] + f1_score = f1_metric.compute(predictions=predictions, references=p.label_ids, average="macro")["f1"] return { "accuracy": accuracy, "f1": f1_score } - # Create model based on whether multi-task learning is enabled - if use_multi_task: + # Long-tail: per-class log-prior for balanced softmax (Tier 1.3-A). + # Computed once over the (already filtered/split) training set, in the SAME + # class-index space the species/CE head outputs, so it lines up with the logits. + log_prior = None + if use_logit_adjustment: + from collections import Counter + if use_multi_task: + sp2id = hierarchical_mappings['species2id'] + cnt = Counter(sp2id[s] for s in dataset["train"]["species"]) + n_cls = hierarchical_mappings['num_species'] + else: + cnt = Counter(dataset["train"][data_args.label_column_name]) + n_cls = num_labels + freq = np.array([cnt.get(i, 0) for i in range(n_cls)], dtype=np.float64) + freq = freq / max(freq.sum(), 1.0) + freq = np.clip(freq, 1e-12, None) # floor empty classes so log() is finite + log_prior = torch.tensor(np.log(freq), dtype=torch.float32) + print(f"__CUSTOM__: Balanced softmax log-prior built over {n_cls} classes " + f"(min/max log-prior = {log_prior.min().item():.3f}/{log_prior.max().item():.3f})") + + # Create model based on which objectives are enabled + if use_arcface: + print("__CUSTOM__: Creating SwinWithArcFace model") + arc_num_species = hierarchical_mappings['num_species'] if use_multi_task else num_labels + config_obj = AutoConfig.from_pretrained( + model_args.config_name or model_args.model_name_or_path, + num_labels=arc_num_species, + finetuning_task="image-classification", + cache_dir=model_args.cache_dir, + revision=model_args.model_revision, + token=model_args.token, + trust_remote_code=model_args.trust_remote_code, + ) + base_model = AutoModelForImageClassification.from_pretrained( + 
model_args.model_name_or_path, + from_tf=bool(".ckpt" in model_args.model_name_or_path), + config=config_obj, + cache_dir=model_args.cache_dir, + revision=model_args.model_revision, + token=model_args.token, + trust_remote_code=model_args.trust_remote_code, + ignore_mismatched_sizes=model_args.ignore_mismatched_sizes, + ) + model = SwinWithArcFace( + base_model, + num_species=arc_num_species, + embedding_size=arcface_embedding_size, + scale=arcface_scale, + margin=arcface_margin, + num_subcenters=arcface_num_subcenters, + num_families=hierarchical_mappings['num_families'] if use_multi_task else None, + num_genera=hierarchical_mappings['num_genera'] if use_multi_task else None, + family_weight=family_weight, + genus_weight=genus_weight, + hybrid_ce_weight=arcface_hybrid_ce_weight, + ) + print(f"__CUSTOM__: SwinWithArcFace created — num_species={arc_num_species}, " + f"multi_task={use_multi_task}, hybrid_ce_weight={arcface_hybrid_ce_weight}") + # Overlay non-backbone weights from checkpoint (preserves embedding/arcface/CE heads + # when chaining ArcFace→Hybrid or ArcFace→384, since AutoModelForImageClassification + # only maps swin.* keys and discards the custom heads). + _ckpt_dir = model_args.model_name_or_path + if os.path.isdir(_ckpt_dir): + _st_path = os.path.join(_ckpt_dir, 'model.safetensors') + _bin_path = os.path.join(_ckpt_dir, 'pytorch_model.bin') + if os.path.exists(_st_path) or os.path.exists(_bin_path): + try: + if os.path.exists(_st_path): + from safetensors.torch import load_file as _load_st + _ckpt_sd = _load_st(_st_path) + else: + _ckpt_sd = torch.load(_bin_path, map_location='cpu') + # Only load keys that are NOT backbone (swin.*) — backbone already loaded + # and handles window-size mismatches (e.g. 224→384) via ignore_mismatched_sizes. 
+ _non_backbone = {k: v for k, v in _ckpt_sd.items() if not k.startswith('swin.')} + _res = model.load_state_dict(_non_backbone, strict=False) + _loaded = len(_non_backbone) - len(_res.missing_keys) + print(f"__CUSTOM__: Overlaid {_loaded}/{len(_non_backbone)} non-backbone weights from checkpoint") + except Exception as _e: + print(f"__CUSTOM__: Could not overlay non-backbone weights: {_e}") + + elif use_multi_task: print("__CUSTOM__: Creating multi-task SWIN model") # First load a base model @@ -1221,9 +1660,6 @@ def val_transforms(example_batch): _val_transforms(Image.open(pil_img).convert("RGB")) for pil_img in example_batch[data_args.image_column_name] ] - # Keep the label for the collator/trainer - example_batch["label"] = example_batch[data_args.label_column_name] - # Add hierarchical labels for multi-task learning if use_multi_task: example_batch["family_label"] = [ @@ -1274,12 +1710,18 @@ def val_transforms(example_batch): else: collator_num_classes = num_labels - if use_mixup_cutmix or use_multi_task: - # Use mixup collator (can handle both mixup/cutmix and multi-task) + if use_mixup_cutmix or use_multi_task or use_arcface or use_logit_adjustment: + # Use mixup collator (can handle mixup/cutmix, multi-task, and arcface). + # Also used for plain single-task + balanced softmax (mixup disabled), since + # the logit adjustment lives in MixupTrainer.compute_loss. 
if use_mixup_cutmix: print("__CUSTOM__: Using Mixup/CutMix data collator" + (" with multi-task support" if use_multi_task else "")) - else: + elif use_arcface: + print("__CUSTOM__: Using ArcFace data collator" + (" with multi-task support" if use_multi_task else "")) + elif use_multi_task: print("__CUSTOM__: Using multi-task data collator") + else: + print("__CUSTOM__: Using data collator (balanced softmax, no mixup)") data_collator = MixupCutmixCollator( mixup_alpha=aug_config.get('mixup', {}).get('alpha', 0.8) if use_mixup_cutmix else 0, @@ -1287,19 +1729,24 @@ def val_transforms(example_batch): prob=aug_config.get('mixup_cutmix_prob', 0.5) if use_mixup_cutmix else 0, label_smoothing=aug_config.get('label_smoothing', 0.1), num_classes=collator_num_classes, - multi_task=use_multi_task + multi_task=use_multi_task, + label_column_name=data_args.label_column_name, ) - # Use custom trainer for mixup/cutmix loss or multi-task learning + # Use custom trainer for mixup/cutmix loss, multi-task learning, or arcface trainer = MixupTrainer( model=model, args=training_args, train_dataset=dataset["train"] if training_args.do_train else None, eval_dataset=dataset["validation"] if training_args.do_eval else None, compute_metrics=compute_metrics, - tokenizer=image_processor, + processing_class=image_processor, data_collator=data_collator, multi_task=use_multi_task, + arcface=use_arcface, + logit_adjustment=use_logit_adjustment, + log_prior=log_prior, + logit_adjustment_tau=logit_adjustment_tau, preprocess_logits_for_metrics=preprocess_logits_for_metrics ) else: @@ -1310,11 +1757,20 @@ def val_transforms(example_batch): train_dataset=dataset["train"] if training_args.do_train else None, eval_dataset=dataset["validation"] if training_args.do_eval else None, compute_metrics=compute_metrics, - tokenizer=image_processor, + processing_class=image_processor, data_collator=collate_fn, preprocess_logits_for_metrics=preprocess_logits_for_metrics ) + # Swap in filtered W&B callback to 
suppress label2id/id2label from config uploads. + trainer.remove_callback(WandbCallback) + trainer.add_callback(FilteredWandbCallback) + + # Weight EMA (Tier 2.6): copies averaged weights into the model at train end, + # so the final evaluate()/save_model() below reflect the EMA weights. + if use_ema: + trainer.add_callback(EMACallback(decay=ema_decay)) + # Training if training_args.do_train: checkpoint = None @@ -1325,6 +1781,10 @@ def val_transforms(example_batch): train_result = trainer.train(resume_from_checkpoint=checkpoint) trainer.save_model() + # Ensure config.json is always present for custom (non-PreTrainedModel) wrappers + # so that downstream stages can call AutoConfig.from_pretrained on this directory. + if hasattr(model, 'config') and not isinstance(model, transformers.PreTrainedModel): + model.config.save_pretrained(training_args.output_dir) trainer.log_metrics("train", train_result.metrics) trainer.save_metrics("train", train_result.metrics) trainer.save_state() @@ -1343,6 +1803,7 @@ def val_transforms(example_batch): target_size=multi_crop_target_size, image_mean=image_processor.image_mean, image_std=image_processor.image_std, + flip=multi_crop_flip, ) multi_crop_evaluate( model=model, diff --git a/finetuning/SWIN/configs/swin_base_baseline.yml b/finetuning/SWIN/configs/swin_base_baseline.yml new file mode 100644 index 0000000..92e2a23 --- /dev/null +++ b/finetuning/SWIN/configs/swin_base_baseline.yml @@ -0,0 +1,71 @@ +# SWIN Base Unfrozen (Full Fine-tuning) Configuration +# Model configuration +model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +# Data configuration +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: 
"/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +# Training configuration +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_B_BASELINE_HIGHER_LR" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_B_BASELINE_HIGHER_LR" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0002 # Higher-LR variant of the full fine-tuning baseline + num_train_epochs: 50 + warmup_steps: 0.05 + weight_decay: 0.01 # Add weight decay for regularization + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + lr_scheduler_kwargs: + eta_min: 0.000001 + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +# Custom configuration +custom: + lr_type: "cosine" + frozen: false # No freezing - full fine-tuning + frozen_type: "none" + run_group: "SWIN_Base" + run_name: "SWIN_Base_Baseline_HigherLR" + run_id: "swin_base_baseline_highlr_052026" + +# WandB configuration +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs/swin_base_pretrained_linear.yml b/finetuning/SWIN/configs/swin_base_pretrained_linear.yml new file mode 100644 index 0000000..389527f --- /dev/null +++ b/finetuning/SWIN/configs/swin_base_pretrained_linear.yml @@ -0,0 +1,65 @@ +# SWIN Base: clean fine-tune from ImageNet-22k pretrained weights +# Linear LR schedule, 5% warmup (2.5 of 50 epochs; 1.25e-7 → 1.25e-4), no heavy augs, no multi-task +model: + model_name_or_path: 
"microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_PRETRAINED_LINEAR" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_PRETRAINED_LINEAR" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 1.25e-4 + num_train_epochs: 50 + warmup_steps: 0.05 # fraction of total steps: 5% warmup = 2.5 of 50 epochs (routed to warmup_ratio) + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "linear" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + report_to: "wandb" + bf16: true + dataloader_num_workers: 8 + remove_unused_columns: false + overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "linear" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base" + run_name: "SWIN_Base_Pretrained_Linear" + run_id: "swin_base_pretrained_linear_052026" + +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs/swin_base_unfrozen_15k.yml b/finetuning/SWIN/configs/swin_base_unfrozen_15k.yml index 2287c59..333fd6a 100644 --- a/finetuning/SWIN/configs/swin_base_unfrozen_15k.yml +++ 
b/finetuning/SWIN/configs/swin_base_unfrozen_15k.yml @@ -29,8 +29,8 @@ data: # Training 
configuration training: - output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_UNFROZEN_15K" - logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_UNFROZEN_15K" + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_UNFROZEN_MACRO" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_UNFROZEN_MACRO" do_train: true do_eval: true per_device_train_batch_size: 128 @@ -61,8 +61,8 @@ custom: frozen: false # No freezing - full fine-tuning frozen_type: "none" run_group: "SWIN_Base" - run_name: "SWIN_Base_Unfrozen_15K" - run_id: "swin_base_unfrozen_15k_012626" + run_name: "SWIN_Base_Baseline" + run_id: "swin_base_unfrozen_15k_040326" # WandB configuration wandb: diff --git a/finetuning/SWIN/configs_advanced/swin_base_224_arcface.yml b/finetuning/SWIN/configs_advanced/swin_base_224_arcface.yml index 25fb69b..dfc65a9 100644 --- a/finetuning/SWIN/configs_advanced/swin_base_224_arcface.yml +++ b/finetuning/SWIN/configs_advanced/swin_base_224_arcface.yml @@ -118,6 +118,7 @@ multi_crop: # WandB configuration wandb: + enabled: true entity: "gardoslab" project: "herbdl" resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_base_224_augmented.yml b/finetuning/SWIN/configs_advanced/swin_base_224_augmented.yml new file mode 100644 index 0000000..f00b20d --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_base_224_augmented.yml @@ -0,0 +1,103 @@ +# SWIN Base 224 - Baseline with Heavy Augmentation +# Single-task species classification, full fine-tuning, 15K classes + +# Model configuration +model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +# Data configuration +data: + dataset_name: null + dataset_config_name: null + 
data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +# Training configuration +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_224_AUGMENTED_3" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_224_AUGMENTED_3" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 64 + learning_rate: 0.00005 + num_train_epochs: 50 + warmup_steps: 500 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + lr_scheduler_kwargs: + eta_min: 0.000001 + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +# Custom configuration +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base" + run_name: "SWIN_Base_224_Augmented3" + run_id: "swin_base_224_aug_040626" + run_notes: "Less aggressive LR and RandAugment compared to Augmented2" + +# Augmentation settings +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 6 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.5 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + +# Multi-task disabled (single-task baseline) +multi_task: + enabled: false + +# Multi-crop testing (inference only) +multi_crop: + enabled: false + +# WandB configuration +wandb: + enabled: 
true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_base_224_enhanced.yml b/finetuning/SWIN/configs_advanced/swin_base_224_enhanced.yml index c02620e..1b470e7 100644 --- a/finetuning/SWIN/configs_advanced/swin_base_224_enhanced.yml +++ b/finetuning/SWIN/configs_advanced/swin_base_224_enhanced.yml @@ -102,6 +102,7 @@ multi_crop: # WandB configuration wandb: + enabled: true entity: "gardoslab" project: "herbdl" resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_base_224_multitask.yml b/finetuning/SWIN/configs_advanced/swin_base_224_multitask.yml index 9bd27e1..68b1205 100644 --- a/finetuning/SWIN/configs_advanced/swin_base_224_multitask.yml +++ b/finetuning/SWIN/configs_advanced/swin_base_224_multitask.yml @@ -31,15 +31,15 @@ data: # Training configuration training: - output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_224_MULTITASK" - logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_224_MULTITASK" + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_224_MULTITASK_2" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_224_MULTITASK_2" do_train: true do_eval: true per_device_train_batch_size: 128 per_device_eval_batch_size: 32 learning_rate: 0.0005 # 5e-4 (higher LR for multi-task learning) num_train_epochs: 100 - warmup_steps: 500 + warmup_steps: 1000 weight_decay: 0.01 gradient_accumulation_steps: 1 lr_scheduler_type: "cosine" @@ -64,11 +64,11 @@ custom: frozen_type: "none" run_group: "SWIN_Base_MultiTask" run_name: "SWIN_Base_224_MultiTask" - run_id: "swin_base_224_mt_051826" + run_id: "swin_base_224_mt_032426" # Advanced augmentation settings augmentation: - use_advanced: true + use_advanced: false randaugment: num_ops: 2 magnitude: 9 @@ -108,6 +108,7 @@ multi_crop: # WandB configuration wandb: + enabled: true entity: 
"gardoslab" project: "herbdl" resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_base_384_enhanced.yml b/finetuning/SWIN/configs_advanced/swin_base_384_enhanced.yml index dd5fda4..f965138 100644 --- a/finetuning/SWIN/configs_advanced/swin_base_384_enhanced.yml +++ b/finetuning/SWIN/configs_advanced/swin_base_384_enhanced.yml @@ -101,6 +101,7 @@ multi_crop: # WandB configuration wandb: + enabled: true entity: "gardoslab" project: "herbdl" resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_baseline_augmented.yml b/finetuning/SWIN/configs_advanced/swin_baseline_augmented.yml new file mode 100644 index 0000000..523ebf4 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_baseline_augmented.yml @@ -0,0 +1,103 @@ +# SWIN Base 224 - Baseline with Heavy Augmentation +# Single-task species classification, full fine-tuning, 15K classes + +# Model configuration +model: + model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_BASELINE/checkpoint-131250" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +# Data configuration +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +# Training configuration +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASELINE+AUG" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASELINE+AUG" + do_train: true + do_eval: true + 
per_device_train_batch_size: 128 + per_device_eval_batch_size: 64 + learning_rate: 0.00005 + num_train_epochs: 75 + warmup_steps: 1500 + weight_decay: 0.05 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +# Custom configuration +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base" + run_name: "SWIN_Baseline+Aug" + run_id: "swin_baseline+aug_041026" + run_notes: "More aggressive augmentation from baseline checkpoint. Higher mixup/cutmix prob, stronger RandAugment, more random erasing. Targeting 0.78 F1." + +# Augmentation settings +# Augmentation settings (more aggressive) +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.5 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + +# Multi-task disabled (single-task baseline) +multi_task: + enabled: false + +# Multi-crop testing (inference only) +multi_crop: + enabled: false + +# WandB configuration +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_baseline_augmented_v2.yml b/finetuning/SWIN/configs_advanced/swin_baseline_augmented_v2.yml new file mode 100644 index 0000000..97d25e4 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_baseline_augmented_v2.yml @@ -0,0 +1,104 @@ +# SWIN Base 224 - Augmented v2 (more aggressive) +# Continuing from augmented checkpoint, heavier augmentation toward 0.78 F1 target + +# Model configuration +model: + model_name_or_path: 
"/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_BASELINE/checkpoint-131250" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +# Data configuration +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +# Training configuration +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASELINE+AUG_v2" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASELINE+AUG_v2" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 64 + learning_rate: 0.000005 + num_train_epochs: 75 + warmup_steps: 1500 + weight_decay: 0.05 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + lr_scheduler_kwargs: + eta_min: 0.0000003 + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +# Custom configuration +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base" + run_name: "SWIN_Baseline+Aug_v2" + run_id: "swin_baseline+aug_v2_040926" + run_notes: "More aggressive augmentation from baseline checkpoint. Higher mixup/cutmix prob, stronger RandAugment, more random erasing. Targeting 0.78 F1." 
+ +# Augmentation settings (more aggressive) +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.5 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + +# Multi-task disabled +multi_task: + enabled: false + +# Multi-crop testing (inference only) +multi_crop: + enabled: false + +# WandB configuration +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_384.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_384.yml new file mode 100644 index 0000000..281450e --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_curriculum_384.yml @@ -0,0 +1,114 @@ +# Curriculum 384 — Scale to 384 resolution from Hybrid checkpoint +# Uses 384 architecture config + image processor with weights transferred from 224 Hybrid model. +# Window size mismatch (7→12) means position biases reinit, but backbone knowledge transfers. +# Effective batch: 64 * 2 (grad accum) = 128. ArcFace hybrid preserved. 
+ +model: + model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_HYBRID" + config_name: "microsoft/swin-base-patch4-window12-384-in22k" + cache_dir: null + model_revision: "main" + image_processor_name: "microsoft/swin-base-patch4-window12-384-in22k" + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_384" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_384" + do_train: true + do_eval: true + per_device_train_batch_size: 64 + per_device_eval_batch_size: 16 + learning_rate: 0.00005 + num_train_epochs: 50 + warmup_steps: 1000 + weight_decay: 0.05 + gradient_accumulation_steps: 2 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Curriculum" + run_name: "Curriculum_384" + run_id: "swin_curriculum_384" + run_notes: "384 resolution. Backbone from Hybrid (224) checkpoint with 384 arch config. Position biases reinit. ArcFace hybrid preserved." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: false + cutmix: + enabled: false + mixup_cutmix_prob: 0.0 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + color_jitter: + enabled: true + brightness: 0.4 + contrast: 0.4 + saturation: 0.4 + hue: 0.1 + +multi_task: + enabled: true + min_species_samples: 2 + family_weight: 0.2 + genus_weight: 0.3 + species_weight: 1.0 + +arcface: + enabled: true + embedding_size: 512 + scale: 30.0 + margin: 0.50 + num_subcenters: 3 + hybrid_ce_weight: 0.3 + +multi_crop: + enabled: true + crop_sizes: [400, 416, 448, 480, 512] + target_size: 384 + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_arcface.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_arcface.yml new file mode 100644 index 0000000..e5b1a96 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_curriculum_arcface.yml @@ -0,0 +1,113 @@ +# Curriculum ArcFace — SubCenter ArcFace + MultiTask from MultiTask checkpoint +# New ArcFace/embedding heads on top of curriculum-trained backbone. +# Mixup/CutMix disabled (incompatible with ArcFace hard-label margin). 
+ +model: + model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_MULTITASK" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_ARCFACE" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_ARCFACE" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0001 + num_train_epochs: 60 + warmup_steps: 1000 + weight_decay: 0.05 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Curriculum" + run_name: "Curriculum_ArcFace" + run_id: "swin_curriculum_arcface" + run_notes: "ArcFace stage. SubCenter ArcFace for species + CE auxiliary heads for family/genus. Backbone from MultiTask checkpoint." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: false + cutmix: + enabled: false + mixup_cutmix_prob: 0.0 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + color_jitter: + enabled: true + brightness: 0.4 + contrast: 0.4 + saturation: 0.4 + hue: 0.1 + +multi_task: + enabled: true + min_species_samples: 2 + family_weight: 0.2 + genus_weight: 0.3 + species_weight: 1.0 + +arcface: + enabled: true + embedding_size: 512 + scale: 30.0 + margin: 0.50 + num_subcenters: 3 + hybrid_ce_weight: 0.0 + +multi_crop: + enabled: false + crop_sizes: [256, 288, 320, 384, 448] + target_size: 224 + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_hybrid.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_hybrid.yml new file mode 100644 index 0000000..dc73558 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_curriculum_hybrid.yml @@ -0,0 +1,113 @@ +# Curriculum Hybrid — CE + ArcFace blended loss from ArcFace checkpoint +# Adds a parallel CE head (weight=0.3) alongside ArcFace (weight=0.7). +# Stable CE gradients help further fine-tune the backbone. 
+ +model: + model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_ARCFACE" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_HYBRID" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_HYBRID" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.00003 + num_train_epochs: 40 + warmup_steps: 500 + weight_decay: 0.05 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Curriculum" + run_name: "Curriculum_Hybrid" + run_id: "swin_curriculum_hybrid" + run_notes: "Hybrid loss: 0.7*ArcFace + 0.3*CE for species. CE auxiliary for family/genus. From ArcFace checkpoint." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: false + cutmix: + enabled: false + mixup_cutmix_prob: 0.0 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + color_jitter: + enabled: true + brightness: 0.4 + contrast: 0.4 + saturation: 0.4 + hue: 0.1 + +multi_task: + enabled: true + min_species_samples: 2 + family_weight: 0.2 + genus_weight: 0.3 + species_weight: 1.0 + +arcface: + enabled: true + embedding_size: 512 + scale: 30.0 + margin: 0.50 + num_subcenters: 3 + hybrid_ce_weight: 0.3 + +multi_crop: + enabled: false + crop_sizes: [256, 288, 320, 384, 448] + target_size: 224 + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_multitask.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_multitask.yml new file mode 100644 index 0000000..8aa0982 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_curriculum_multitask.yml @@ -0,0 +1,98 @@ +# Curriculum Multi-Task — from S3 checkpoint + multi-task heads +# New family/genus heads on top of S3 backbone. Heavy augmentation preserved. 
+ +model: + model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S3_CONT" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_MULTITASK" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_MULTITASK" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.00003 + num_train_epochs: 75 + warmup_steps: 1000 + weight_decay: 0.05 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Curriculum" + run_name: "Curriculum_MultiTask" + run_id: "swin_curriculum_multitask" + run_notes: "Multi-task from S3_cont checkpoint. Family/genus heads randomly initialized on top of curriculum-trained backbone." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.5 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + +multi_task: + enabled: true + min_species_samples: 2 + family_weight: 0.2 + genus_weight: 0.3 + species_weight: 1.0 + +multi_crop: + enabled: false + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_384.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_384.yml new file mode 100644 index 0000000..d5c25fa --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_384.yml @@ -0,0 +1,116 @@ +# Pretrained Full Pipeline — All techniques from SwinV2-384 pretrained weights. +# No curriculum staging. Heavy aug + MultiTask CE applied from epoch 1. +# SwinV2 window12to24-192to384 fine-tuned checkpoint (384px native resolution). +# Effective batch size 128 (64 per device * grad_accum 2). 
+ +model: + model_name_or_path: "microsoft/swinv2-base-patch4-window12to24-192to384-22kto1k-ft" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_384" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_384" + do_train: true + do_eval: true + per_device_train_batch_size: 64 + per_device_eval_batch_size: 16 + learning_rate: 0.0001 + num_train_epochs: 100 + warmup_steps: 2000 + weight_decay: 0.05 + gradient_accumulation_steps: 2 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Curriculum_Pretrained" + run_name: "Curriculum_Pretrained_384" + run_id: "swin_curriculum_pretrained_384" + run_notes: "All techniques from SwinV2-384 pretrained (window12to24-192to384-22kto1k-ft). Heavy aug + MultiTask CE. No staged curriculum." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.5 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + color_jitter: + enabled: true + brightness: 0.4 + contrast: 0.4 + saturation: 0.4 + hue: 0.1 + +multi_task: + enabled: true + min_species_samples: 2 + family_weight: 0.2 + genus_weight: 0.3 + species_weight: 1.0 + +arcface: + enabled: false + embedding_size: 512 + scale: 30.0 + margin: 0.50 + num_subcenters: 3 + hybrid_ce_weight: 0.0 + +multi_crop: + enabled: false + crop_sizes: [400, 416, 448, 480, 512] + target_size: 384 + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s1.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s1.yml new file mode 100644 index 0000000..aea8c06 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s1.yml @@ -0,0 +1,96 @@ +# Curriculum Pretrained S1 — Mild augmentation from ImageNet-22k pretrained weights +# Skips the no-aug baseline stage: starts directly from HF pretrained checkpoint with mild aug. +# Higher LR than original S1 (1e-4 vs 5e-5) since the classifier head is randomly initialized. +# More epochs (50) to allow the fresh head and backbone to co-adapt before aug ramps up. 
+ +model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S1" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S1" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 64 + learning_rate: 0.0001 + num_train_epochs: 50 + warmup_steps: 1000 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 3 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Curriculum_Pretrained" + run_name: "Curriculum_Pretrained_S1" + run_id: "swin_curriculum_pretrained_s1" + run_notes: "Stage 1. Mild aug directly from ImageNet-22k pretrained weights. No no-aug baseline stage." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 4 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.3 + label_smoothing: 0.05 + random_erasing: + enabled: true + probability: 0.1 + min_area: 0.02 + max_area: 0.33 + +multi_task: + enabled: false + +multi_crop: + enabled: false + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s2.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s2.yml new file mode 100644 index 0000000..83e00a2 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s2.yml @@ -0,0 +1,94 @@ +# Curriculum Pretrained S2 — Medium augmentation from Pretrained S1 checkpoint +# Identical technique progression to original S2. Stepping up aug difficulty. + +model: + model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S1" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S2" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S2" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + 
per_device_eval_batch_size: 64 + learning_rate: 0.00003 + num_train_epochs: 30 + warmup_steps: 500 + weight_decay: 0.03 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 3 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Curriculum_Pretrained" + run_name: "Curriculum_Pretrained_S2" + run_id: "swin_curriculum_pretrained_s2" + run_notes: "Stage 2. Medium aug from Pretrained S1 checkpoint. Stepping up difficulty." + +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 7 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.4 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.15 + min_area: 0.02 + max_area: 0.33 + +multi_task: + enabled: false + +multi_crop: + enabled: false + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s3.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s3.yml new file mode 100644 index 0000000..c8db5be --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s3.yml @@ -0,0 +1,94 @@ +# Curriculum Pretrained S3 — Heavy augmentation from Pretrained S2 checkpoint +# Identical technique progression to original S3. Maximum regularization. 
+ +model: + model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S2" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S3" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S3" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 64 + learning_rate: 0.00002 + num_train_epochs: 50 + warmup_steps: 500 + weight_decay: 0.05 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Curriculum_Pretrained" + run_name: "Curriculum_Pretrained_S3" + run_id: "swin_curriculum_pretrained_s3" + run_notes: "Stage 3. Heavy aug from Pretrained S2 checkpoint. Maximum regularization." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.5 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + +multi_task: + enabled: false + +multi_crop: + enabled: false + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_s1.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_s1.yml new file mode 100644 index 0000000..8a27723 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_curriculum_s1.yml @@ -0,0 +1,94 @@ +# Curriculum Stage 1 — Mild augmentation from baseline +# mag=4, prob=0.3, erasing=0.1 | 25 epochs | lr=5e-5 + +model: + model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_BASELINE/checkpoint-131250" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S1" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S1" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 64 + learning_rate: 0.00005 + num_train_epochs: 25 + warmup_steps: 1000 + weight_decay: 
0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 3 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Curriculum" + run_name: "Curriculum_S1" + run_id: "swin_curriculum_s1" + run_notes: "Stage 1 of 3. Mild augmentation from baseline. Soft-label warm-up before increasing difficulty." + +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 4 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.3 + label_smoothing: 0.05 + random_erasing: + enabled: true + probability: 0.1 + min_area: 0.02 + max_area: 0.33 + +multi_task: + enabled: false + +multi_crop: + enabled: false + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_s2.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_s2.yml new file mode 100644 index 0000000..71b8fb0 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_curriculum_s2.yml @@ -0,0 +1,94 @@ +# Curriculum Stage 2 — Medium augmentation from S1 +# mag=7, prob=0.4, erasing=0.15 | 30 epochs | lr=3e-5 + +model: + model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S1" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + 
label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S2" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S2" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 64 + learning_rate: 0.00003 + num_train_epochs: 30 + warmup_steps: 500 + weight_decay: 0.03 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 3 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Curriculum" + run_name: "Curriculum_S2" + run_id: "swin_curriculum_s2" + run_notes: "Stage 2 of 3. Medium augmentation from S1 checkpoint. Stepping up difficulty." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 7 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.4 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.15 + min_area: 0.02 + max_area: 0.33 + +multi_task: + enabled: false + +multi_crop: + enabled: false + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_s3.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_s3.yml new file mode 100644 index 0000000..f5e6a11 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_curriculum_s3.yml @@ -0,0 +1,94 @@ +# Curriculum Stage 3 — Heavy augmentation from S2 +# mag=9, prob=0.5, erasing=0.25 | 50 epochs | lr=2e-5 + +model: + model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S2" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S3" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S3" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 64 + learning_rate: 0.00002 + num_train_epochs: 50 + warmup_steps: 500 + weight_decay: 0.05 + 
gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Curriculum" + run_name: "Curriculum_S3" + run_id: "swin_curriculum_s3" + run_notes: "Stage 3 of 3. Heavy augmentation from S2 checkpoint. Maximum regularization push toward 0.78 F1." + +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.5 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + +multi_task: + enabled: false + +multi_crop: + enabled: false + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_s3_cont.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_s3_cont.yml new file mode 100644 index 0000000..ff75221 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_curriculum_s3_cont.yml @@ -0,0 +1,94 @@ +# Curriculum Stage 3 continuation — Heavy augmentation, epochs 51-100 +# Fresh cosine schedule from S3 final model. Same augmentation settings as S3. 
+ +model: + model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S3" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S3_CONT" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S3_CONT" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 64 + learning_rate: 0.00002 + num_train_epochs: 50 + warmup_steps: 500 + weight_decay: 0.05 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Curriculum" + run_name: "Curriculum_S3_cont" + run_id: "swin_curriculum_s3_cont" + run_notes: "Stage 3 continuation (epochs 51-100). Fresh cosine schedule from S3 final model. Same heavy augmentation." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.5 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + +multi_task: + enabled: false + +multi_crop: + enabled: false + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_v2.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_v2.yml new file mode 100644 index 0000000..d9766ed --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_curriculum_v2.yml @@ -0,0 +1,114 @@ +# Curriculum V2 — SWIN V2 architecture upgrade from 384 checkpoint +# SWIN V2 arch is too different to chain directly, so we load V2 pretrained weights +# but re-apply all techniques (heavy aug + arcface hybrid + multitask) from epoch 1. +# Uses 192 native resolution for V2 (pre-trained window12-192). 
+ +model: + model_name_or_path: "microsoft/swinv2-base-patch4-window12-192-22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_V2" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_V2" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0003 + num_train_epochs: 100 + warmup_steps: 2000 + weight_decay: 0.05 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 42 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Curriculum" + run_name: "Curriculum_V2" + run_id: "swin_curriculum_v2" + run_notes: "SWIN V2 architecture with all techniques from scratch. ImageNet22k pretrained V2 weights. All curriculum-derived techniques applied from epoch 1." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: false + cutmix: + enabled: false + mixup_cutmix_prob: 0.0 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + color_jitter: + enabled: true + brightness: 0.4 + contrast: 0.4 + saturation: 0.4 + hue: 0.1 + +multi_task: + enabled: true + min_species_samples: 2 + family_weight: 0.2 + genus_weight: 0.3 + species_weight: 1.0 + +arcface: + enabled: true + embedding_size: 512 + scale: 30.0 + margin: 0.50 + num_subcenters: 3 + hybrid_ce_weight: 0.3 + +multi_crop: + enabled: false + crop_sizes: [224, 256, 288, 320, 352] + target_size: 192 + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_large_384_concrete.yml b/finetuning/SWIN/configs_advanced/swin_large_384_concrete.yml new file mode 100644 index 0000000..797a4ed --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_large_384_concrete.yml @@ -0,0 +1,150 @@ +# ============================================================================= +# Concrete next run (SWIN_training_setup_summary.md §10/§11 — "Putting it together") +# ============================================================================= +# A single strong single-model candidate combining the cheap-but-high-value levers: +# - Backbone : SWIN-L @384, ImageNet-22k pretrained (Tier 1.1 "stay in SWIN", Tier 1.2) +# - Loss : balanced-softmax CE (Tier 1.3-A) + multi-task family/genus/species heads +# - Schedule : 100 epochs, 5% warmup, cosine; EMA on (Tier 2.6) +# - Aug : MEDIUM (RandAugment mag 7, mild erasing/mixup) — respects the curriculum +# finding that heavy aug applied COLD destroys performance (0.61 vs 0.745). +# - Inference: multi-crop + flip TTA wired (Tier 2.7), enabled only for final prediction. 
+# +# WARM-START (recommended once a 384 checkpoint exists — Tier 2.5 "always chain"): +# point model.model_name_or_path at a converged SWIN-L 384 output_dir (keep +# config_name / image_processor_name on the 384 arch) and raise RandAugment to 9. +# Cold-from-in22k here is the dependency-free default; chaining is strictly better. +# +# Seeds: launch the ensemble with submit_concrete.sh, which overrides +# training.seed / training.output_dir / custom.run_id via --set per job. +# ============================================================================= + +model: + model_name_or_path: "microsoft/swin-large-patch4-window12-384-in22k" + config_name: "microsoft/swin-large-patch4-window12-384-in22k" + cache_dir: null + model_revision: "main" + image_processor_name: "microsoft/swin-large-patch4-window12-384-in22k" + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: 50000 + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/tgardos/herbdl/finetuning/output/SWIN/SWIN_L_384_CONCRETE_SEED0" + logging_dir: "/projectnb/herbdl/workspaces/tgardos/herbdl/finetuning/output/SWIN/SWIN_L_384_CONCRETE_SEED0" + do_train: true + do_eval: true + per_device_train_batch_size: 16 + per_device_eval_batch_size: 8 + learning_rate: 0.0001 + num_train_epochs: 100 + warmup_steps: 0.05 + weight_decay: 0.05 + gradient_accumulation_steps: 8 # effective batch = 16 * 8 = 128 + gradient_checkpointing: true # needed to fit SWIN-L @384 (wrappers pass it through) + lr_scheduler_type: "cosine" + lr_scheduler_kwargs: + 
eta_min: 0.000001 + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 3 + eval_strategy: "epoch" + eval_on_start: true + load_best_model_at_end: false # EMA copies averaged weights in at train end; do not reload "best" + report_to: "wandb" + bf16: true + dataloader_num_workers: 8 + remove_unused_columns: false + overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_L_384_Concrete" + run_name: "SWIN_L_384_Concrete_Seed0" + run_id: "swin_l_384_concrete_seed0" + run_notes: "Concrete next run: SWIN-L 384 in22k, balanced-softmax + multi-task CE, medium aug, EMA, 100ep. Seed 0." + +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 7 # MEDIUM (not 9) — cold start, see curriculum finding + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.3 # mild mixing for a cold backbone + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.15 + min_area: 0.02 + max_area: 0.33 + color_jitter: + enabled: true + brightness: 0.4 + contrast: 0.4 + saturation: 0.4 + hue: 0.1 + +multi_task: + enabled: true + min_species_samples: 2 + family_weight: 0.2 + genus_weight: 0.3 + species_weight: 1.0 + +# Balanced softmax / logit adjustment (Tier 1.3-A). Adds tau * log(class_prior) to the +# species logits during TRAINING only (off at inference) to lift macro-F1 on the long tail. +long_tail: + logit_adjustment: true + tau: 1.0 + +# Weight EMA (Tier 2.6). EMA weights are copied into the model at train end, so the +# final eval/save reflect them. Keep load_best_model_at_end false (see above). +ema: + enabled: true + decay: 0.9998 + +# Multi-crop + horizontal-flip TTA (Tier 2.7). Leave disabled during training; enable +# for the final/leaderboard prediction (crops are sized around the 384 target). 
+multi_crop: + enabled: false + crop_sizes: [400, 416, 448, 480, 512] + target_size: 384 + flip: true + +# ArcFace deferred (Tier 2.4) — kept off for this run. +arcface: + enabled: false + embedding_size: 512 + scale: 30.0 + margin: 0.50 + num_subcenters: 3 + hybrid_ce_weight: 0.0 + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed0.yml b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed0.yml new file mode 100644 index 0000000..57c0fe2 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed0.yml @@ -0,0 +1,115 @@ +# Pretrained Full Pipeline (SwinL-224) — seed 0, LINEAR LR +# SwinL-224 (in22k pretrained), all techniques (heavy aug + MultiTask CE), linear LR + 5-epoch warmup. +# Linear LR seed ensemble run 1/5. + +model: + model_name_or_path: "microsoft/swin-large-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED0" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED0" + do_train: true + do_eval: true + per_device_train_batch_size: 64 + per_device_eval_batch_size: 16 + learning_rate: 0.0001 + num_train_epochs: 100 + warmup_steps: 0.05 + 
weight_decay: 0.05 + gradient_accumulation_steps: 2 + lr_scheduler_type: "linear" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "linear" + frozen: false + frozen_type: "none" + run_group: "SWIN_Pretrained_SwinL224_Linear_Seeds" + run_name: "Pretrained_SwinL224_Linear_Seed0" + run_id: "swin_pretrained_swinl224_linear_seed0" + run_notes: "Linear LR seed ensemble. SwinL-224 in22k pretrained, heavy aug + MultiTask CE, 5-epoch warmup. Seed 0." + +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.5 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + color_jitter: + enabled: true + brightness: 0.4 + contrast: 0.4 + saturation: 0.4 + hue: 0.1 + +multi_task: + enabled: true + min_species_samples: 2 + family_weight: 0.2 + genus_weight: 0.3 + species_weight: 1.0 + +arcface: + enabled: false + embedding_size: 512 + scale: 30.0 + margin: 0.50 + num_subcenters: 3 + hybrid_ce_weight: 0.0 + +multi_crop: + enabled: false + crop_sizes: [400, 416, 448, 480, 512] + target_size: 224 + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed1.yml b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed1.yml new file mode 100644 index 0000000..36b41c9 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed1.yml @@ -0,0 +1,115 @@ +# Pretrained Full Pipeline (SwinL-224) — seed 1, LINEAR LR +# SwinL-224 (in22k pretrained), all techniques (heavy aug + MultiTask CE), linear LR + 5-epoch warmup. +# Linear LR seed ensemble run 2/5. 
+ +model: + model_name_or_path: "microsoft/swin-large-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED1" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED1" + do_train: true + do_eval: true + per_device_train_batch_size: 64 + per_device_eval_batch_size: 16 + learning_rate: 0.0001 + num_train_epochs: 100 + warmup_steps: 0.05 + weight_decay: 0.05 + gradient_accumulation_steps: 2 + lr_scheduler_type: "linear" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 1 + +custom: + lr_type: "linear" + frozen: false + frozen_type: "none" + run_group: "SWIN_Pretrained_SwinL224_Linear_Seeds" + run_name: "Pretrained_SwinL224_Linear_Seed1" + run_id: "swin_pretrained_swinl224_linear_seed1" + run_notes: "Linear LR seed ensemble. SwinL-224 in22k pretrained, heavy aug + MultiTask CE, 5-epoch warmup. Seed 1." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.5 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + color_jitter: + enabled: true + brightness: 0.4 + contrast: 0.4 + saturation: 0.4 + hue: 0.1 + +multi_task: + enabled: true + min_species_samples: 2 + family_weight: 0.2 + genus_weight: 0.3 + species_weight: 1.0 + +arcface: + enabled: false + embedding_size: 512 + scale: 30.0 + margin: 0.50 + num_subcenters: 3 + hybrid_ce_weight: 0.0 + +multi_crop: + enabled: false + crop_sizes: [400, 416, 448, 480, 512] + target_size: 224 + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed2.yml b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed2.yml new file mode 100644 index 0000000..893f052 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed2.yml @@ -0,0 +1,115 @@ +# Pretrained Full Pipeline (SwinL-224) — seed 2, LINEAR LR +# SwinL-224 (in22k pretrained), all techniques (heavy aug + MultiTask CE), linear LR + 5-epoch warmup. +# Linear LR seed ensemble run 3/5. 
+ +model: + model_name_or_path: "microsoft/swin-large-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED2" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED2" + do_train: true + do_eval: true + per_device_train_batch_size: 64 + per_device_eval_batch_size: 16 + learning_rate: 0.0001 + num_train_epochs: 100 + warmup_steps: 0.05 + weight_decay: 0.05 + gradient_accumulation_steps: 2 + lr_scheduler_type: "linear" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 2 + +custom: + lr_type: "linear" + frozen: false + frozen_type: "none" + run_group: "SWIN_Pretrained_SwinL224_Linear_Seeds" + run_name: "Pretrained_SwinL224_Linear_Seed2" + run_id: "swin_pretrained_swinl224_linear_seed2" + run_notes: "Linear LR seed ensemble. SwinL-224 in22k pretrained, heavy aug + MultiTask CE, 5-epoch warmup. Seed 2." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.5 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + color_jitter: + enabled: true + brightness: 0.4 + contrast: 0.4 + saturation: 0.4 + hue: 0.1 + +multi_task: + enabled: true + min_species_samples: 2 + family_weight: 0.2 + genus_weight: 0.3 + species_weight: 1.0 + +arcface: + enabled: false + embedding_size: 512 + scale: 30.0 + margin: 0.50 + num_subcenters: 3 + hybrid_ce_weight: 0.0 + +multi_crop: + enabled: false + crop_sizes: [400, 416, 448, 480, 512] + target_size: 224 + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed3.yml b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed3.yml new file mode 100644 index 0000000..19b9586 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed3.yml @@ -0,0 +1,115 @@ +# Pretrained Full Pipeline (SwinL-224) — seed 3, LINEAR LR +# SwinL-224 (in22k pretrained), all techniques (heavy aug + MultiTask CE), linear LR + 5-epoch warmup. +# Linear LR seed ensemble run 4/5. 
+ +model: + model_name_or_path: "microsoft/swin-large-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED3" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED3" + do_train: true + do_eval: true + per_device_train_batch_size: 64 + per_device_eval_batch_size: 16 + learning_rate: 0.0001 + num_train_epochs: 100 + warmup_steps: 0.05 + weight_decay: 0.05 + gradient_accumulation_steps: 2 + lr_scheduler_type: "linear" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 3 + +custom: + lr_type: "linear" + frozen: false + frozen_type: "none" + run_group: "SWIN_Pretrained_SwinL224_Linear_Seeds" + run_name: "Pretrained_SwinL224_Linear_Seed3" + run_id: "swin_pretrained_swinl224_linear_seed3" + run_notes: "Linear LR seed ensemble. SwinL-224 in22k pretrained, heavy aug + MultiTask CE, 5-epoch warmup. Seed 3." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.5 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + color_jitter: + enabled: true + brightness: 0.4 + contrast: 0.4 + saturation: 0.4 + hue: 0.1 + +multi_task: + enabled: true + min_species_samples: 2 + family_weight: 0.2 + genus_weight: 0.3 + species_weight: 1.0 + +arcface: + enabled: false + embedding_size: 512 + scale: 30.0 + margin: 0.50 + num_subcenters: 3 + hybrid_ce_weight: 0.0 + +multi_crop: + enabled: false + crop_sizes: [400, 416, 448, 480, 512] + target_size: 224 + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed4.yml b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed4.yml new file mode 100644 index 0000000..5eb2667 --- /dev/null +++ b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed4.yml @@ -0,0 +1,115 @@ +# Pretrained Full Pipeline (SwinL-224) — seed 4, LINEAR LR +# SwinL-224 (in22k pretrained), all techniques (heavy aug + MultiTask CE), linear LR + 5-epoch warmup. +# Linear LR seed ensemble run 5/5. 
+ +model: + model_name_or_path: "microsoft/swin-large-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED4" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED4" + do_train: true + do_eval: true + per_device_train_batch_size: 64 + per_device_eval_batch_size: 16 + learning_rate: 0.0001 + num_train_epochs: 100 + warmup_steps: 0.05 + weight_decay: 0.05 + gradient_accumulation_steps: 2 + lr_scheduler_type: "linear" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + eval_on_start: true + report_to: "wandb" + bf16: true + dataloader_num_workers: 16 + remove_unused_columns: false + overwrite_output_dir: false + seed: 4 + +custom: + lr_type: "linear" + frozen: false + frozen_type: "none" + run_group: "SWIN_Pretrained_SwinL224_Linear_Seeds" + run_name: "Pretrained_SwinL224_Linear_Seed4" + run_id: "swin_pretrained_swinl224_linear_seed4" + run_notes: "Linear LR seed ensemble. SwinL-224 in22k pretrained, heavy aug + MultiTask CE, 5-epoch warmup. Seed 4." 
+ +augmentation: + use_advanced: true + randaugment: + num_ops: 2 + magnitude: 9 + mixup: + enabled: true + alpha: 0.8 + cutmix: + enabled: true + alpha: 1.0 + mixup_cutmix_prob: 0.5 + label_smoothing: 0.1 + random_erasing: + enabled: true + probability: 0.25 + min_area: 0.02 + max_area: 0.33 + color_jitter: + enabled: true + brightness: 0.4 + contrast: 0.4 + saturation: 0.4 + hue: 0.1 + +multi_task: + enabled: true + min_species_samples: 2 + family_weight: 0.2 + genus_weight: 0.3 + species_weight: 1.0 + +arcface: + enabled: false + embedding_size: 512 + scale: 30.0 + margin: 0.50 + num_subcenters: 3 + hybrid_ce_weight: 0.0 + +multi_crop: + enabled: false + crop_sizes: [400, 416, 448, 480, 512] + target_size: 224 + +wandb: + enabled: true + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/configs_advanced/swinv2_base_192_enhanced.yml b/finetuning/SWIN/configs_advanced/swinv2_base_192_enhanced.yml index 310d69d..b4183e8 100644 --- a/finetuning/SWIN/configs_advanced/swinv2_base_192_enhanced.yml +++ b/finetuning/SWIN/configs_advanced/swinv2_base_192_enhanced.yml @@ -101,6 +101,7 @@ multi_crop: # WandB configuration wandb: + enabled: true entity: "gardoslab" project: "herbdl" resume: "allow" diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr1e4.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr1e4.yml new file mode 100644 index 0000000..a8785e3 --- /dev/null +++ b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr1e4.yml @@ -0,0 +1,67 @@ +# SWIN Base — LR tuning: cosine schedule, lr=1e-4, no warmup +# Baseline of the cosine LR sweep (lowest LR). Compare with lr2e4 and lr5e4. 
+model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR1E4" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR1E4" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0001 + num_train_epochs: 100 + warmup_steps: 0 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + torch_compile: true + report_to: "wandb" + bf16: true + dataloader_num_workers: -1 + remove_unused_columns: false + overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base_LR_Sweep" + run_name: "SWIN_Base_Cosine_LR1e4" + run_id: "swin_base_cosine_lr1e4_052026" + run_notes: "LR sweep: cosine, lr=1e-4, no warmup. ImageNet-22k pretrained." 
+ +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr1e4_warmup.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr1e4_warmup.yml new file mode 100644 index 0000000..3db0559 --- /dev/null +++ b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr1e4_warmup.yml @@ -0,0 +1,67 @@ +# SWIN Base — LR tuning: cosine schedule, lr=1e-4, 5-epoch warmup (0 → 1e-4) +# Warmup variant of swin_base_cosine_lr1e4. warmup_steps=0.05 (ratio) → ~8964 warmup steps over 5 epochs. +model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR1E4_WARMUP" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR1E4_WARMUP" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0001 + num_train_epochs: 100 + warmup_steps: 0.05 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + torch_compile: true + report_to: "wandb" + bf16: true + dataloader_num_workers: -1 + remove_unused_columns: false + 
overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base_LR_Sweep" + run_name: "SWIN_Base_Cosine_LR1e4_Warmup" + run_id: "swin_base_cosine_lr1e4_warmup_052026" + run_notes: "LR sweep: cosine, lr=1e-4, 5-epoch warmup (0→1e-4). ImageNet-22k pretrained." + +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr2e4.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr2e4.yml new file mode 100644 index 0000000..c9bc330 --- /dev/null +++ b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr2e4.yml @@ -0,0 +1,67 @@ +# SWIN Base — LR tuning: cosine schedule, lr=2e-4, no warmup +# Mid-point of the cosine LR sweep. Compare with lr1e4 and lr5e4. +model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR2E4" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR2E4" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0002 + num_train_epochs: 100 + warmup_steps: 0 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + 
logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + torch_compile: true + report_to: "wandb" + bf16: true + dataloader_num_workers: -1 + remove_unused_columns: false + overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base_LR_Sweep" + run_name: "SWIN_Base_Cosine_LR2e4" + run_id: "swin_base_cosine_lr2e4_052026" + run_notes: "LR sweep: cosine, lr=2e-4, no warmup. ImageNet-22k pretrained." + +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr2e4_warmup.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr2e4_warmup.yml new file mode 100644 index 0000000..8c037df --- /dev/null +++ b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr2e4_warmup.yml @@ -0,0 +1,67 @@ +# SWIN Base — LR tuning: cosine schedule, lr=2e-4, 5-epoch warmup (0 → 2e-4) +# Warmup variant of swin_base_cosine_lr2e4. warmup_steps=0.05 (ratio) → ~8964 warmup steps over 5 epochs. 
+model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR2E4_WARMUP" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR2E4_WARMUP" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0002 + num_train_epochs: 100 + warmup_steps: 0.05 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + torch_compile: true + report_to: "wandb" + bf16: true + dataloader_num_workers: -1 + remove_unused_columns: false + overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base_LR_Sweep" + run_name: "SWIN_Base_Cosine_LR2e4_Warmup" + run_id: "swin_base_cosine_lr2e4_warmup_052026" + run_notes: "LR sweep: cosine, lr=2e-4, 5-epoch warmup (0→2e-4). ImageNet-22k pretrained." 
+ +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr5e4.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr5e4.yml new file mode 100644 index 0000000..cf2860c --- /dev/null +++ b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr5e4.yml @@ -0,0 +1,67 @@ +# SWIN Base — LR tuning: cosine schedule, lr=5e-4, no warmup +# Upper end of the cosine LR sweep. Compare with lr1e4 and lr2e4. +model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR5E4" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR5E4" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0005 + num_train_epochs: 100 + warmup_steps: 0 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + torch_compile: true + report_to: "wandb" + bf16: true + dataloader_num_workers: -1 + remove_unused_columns: false + overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: 
"none" + run_group: "SWIN_Base_LR_Sweep" + run_name: "SWIN_Base_Cosine_LR5e4" + run_id: "swin_base_cosine_lr5e4_052026" + run_notes: "LR sweep: cosine, lr=5e-4, no warmup. ImageNet-22k pretrained." + +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr5e4_warmup.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr5e4_warmup.yml new file mode 100644 index 0000000..324e2f8 --- /dev/null +++ b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr5e4_warmup.yml @@ -0,0 +1,67 @@ +# SWIN Base — LR tuning: cosine schedule, lr=5e-4, 5-epoch warmup (0 → 5e-4) +# Warmup variant of swin_base_cosine_lr5e4. warmup_steps=0.1 (ratio) → ~8964 warmup steps over 5 epochs. +model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR5E4_WARMUP" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR5E4_WARMUP" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0005 + num_train_epochs: 100 + warmup_steps: 0.05 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "cosine" + logging_strategy: "epoch" + save_strategy: 
"epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + torch_compile: true + report_to: "wandb" + bf16: true + dataloader_num_workers: -1 + remove_unused_columns: false + overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "cosine" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base_LR_Sweep" + run_name: "SWIN_Base_Cosine_LR5e4_Warmup" + run_id: "swin_base_cosine_lr5e4_warmup_052026" + run_notes: "LR sweep: cosine, lr=5e-4, 5-epoch warmup (0→5e-4). ImageNet-22k pretrained." + +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr1e4.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr1e4.yml new file mode 100644 index 0000000..318ab7a --- /dev/null +++ b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr1e4.yml @@ -0,0 +1,67 @@ +# SWIN Base — LR tuning: linear schedule, lr=1e-4, no warmup +# Baseline of the linear LR sweep (lowest LR). Compare with lr2e4 and lr5e4. 
+model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR1E4" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR1E4" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0001 + num_train_epochs: 100 + warmup_steps: 0 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "linear" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + torch_compile: true + report_to: "wandb" + bf16: true + dataloader_num_workers: -1 + remove_unused_columns: false + overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "linear" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base_LR_Sweep" + run_name: "SWIN_Base_Linear_LR1e4" + run_id: "swin_base_linear_lr1e4_052026" + run_notes: "LR sweep: linear, lr=1e-4, no warmup. ImageNet-22k pretrained." 
+ +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr1e4_warmup.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr1e4_warmup.yml new file mode 100644 index 0000000..2f8505c --- /dev/null +++ b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr1e4_warmup.yml @@ -0,0 +1,67 @@ +# SWIN Base — LR tuning: linear schedule, lr=1e-4, 5-epoch warmup (0 → 1e-4) +# Warmup variant of swin_base_linear_lr1e4. warmup_steps=0.05 (ratio) → ~8964 warmup steps over 5 epochs. +model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR1E4_WARMUP" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR1E4_WARMUP" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0001 + num_train_epochs: 100 + warmup_steps: 0.05 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "linear" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + torch_compile: true + report_to: "wandb" + bf16: true + dataloader_num_workers: -1 + remove_unused_columns: false + 
overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "linear" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base_LR_Sweep" + run_name: "SWIN_Base_Linear_LR1e4_Warmup" + run_id: "swin_base_linear_lr1e4_warmup_052026" + run_notes: "LR sweep: linear, lr=1e-4, 5-epoch warmup (0→1e-4). ImageNet-22k pretrained." + +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr2e4.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr2e4.yml new file mode 100644 index 0000000..ccf82aa --- /dev/null +++ b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr2e4.yml @@ -0,0 +1,67 @@ +# SWIN Base — LR tuning: linear schedule, lr=2e-4, no warmup +# Mid-point of the linear LR sweep. Compare with lr1e4 and lr5e4. +model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR2E4" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR2E4" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0002 + num_train_epochs: 100 + warmup_steps: 0 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "linear" + 
logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + torch_compile: true + report_to: "wandb" + bf16: true + dataloader_num_workers: -1 + remove_unused_columns: false + overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "linear" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base_LR_Sweep" + run_name: "SWIN_Base_Linear_LR2e4" + run_id: "swin_base_linear_lr2e4_052026" + run_notes: "LR sweep: linear, lr=2e-4, no warmup. ImageNet-22k pretrained." + +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr2e4_warmup.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr2e4_warmup.yml new file mode 100644 index 0000000..ae117d0 --- /dev/null +++ b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr2e4_warmup.yml @@ -0,0 +1,67 @@ +# SWIN Base — LR tuning: linear schedule, lr=2e-4, 5-epoch warmup (0 → 2e-4) +# Warmup variant of swin_base_linear_lr2e4. warmup_steps=0.05 (ratio) → ~8964 warmup steps over 5 epochs. 
+model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR2E4_WARMUP" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR2E4_WARMUP" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0002 + num_train_epochs: 100 + warmup_steps: 0.05 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "linear" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + torch_compile: true + report_to: "wandb" + bf16: true + dataloader_num_workers: -1 + remove_unused_columns: false + overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "linear" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base_LR_Sweep" + run_name: "SWIN_Base_Linear_LR2e4_Warmup" + run_id: "swin_base_linear_lr2e4_warmup_052026" + run_notes: "LR sweep: linear, lr=2e-4, 5-epoch warmup (0→2e-4). ImageNet-22k pretrained." 
+ +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr5e4.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr5e4.yml new file mode 100644 index 0000000..dbe188a --- /dev/null +++ b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr5e4.yml @@ -0,0 +1,68 @@ +# SWIN Base — LR tuning: linear schedule, lr=5e-4, no warmup +# Upper end of the linear LR sweep. Compare with lr1e4 and lr2e4. +model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR5E4" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR5E4" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0005 + num_train_epochs: 100 + warmup_steps: 0 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "linear" + logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + torch_compile: true + report_to: "wandb" + bf16: true + dataloader_num_workers: -1 + dataloader_pin_memory: false + remove_unused_columns: false + overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "linear" 
+ frozen: false + frozen_type: "none" + run_group: "SWIN_Base_LR_Sweep" + run_name: "SWIN_Base_Linear_LR5e4" + run_id: "swin_base_linear_lr5e4_052026" + run_notes: "LR sweep: linear, lr=5e-4, no warmup. ImageNet-22k pretrained." + +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr5e4_warmup.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr5e4_warmup.yml new file mode 100644 index 0000000..8420d46 --- /dev/null +++ b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr5e4_warmup.yml @@ -0,0 +1,67 @@ +# SWIN Base — LR tuning: linear schedule, lr=5e-4, 5-epoch warmup (0 → 5e-4) +# Warmup variant of swin_base_linear_lr5e4. warmup_steps=0.05 (ratio) → ~8964 warmup steps over 5 epochs. +model: + model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k" + config_name: null + cache_dir: null + model_revision: "main" + image_processor_name: null + token: null + trust_remote_code: false + ignore_mismatched_sizes: true + +data: + dataset_name: null + dataset_config_name: null + data_file: null + data_dir: null + train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json" + validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json" + image_column_name: "filename" + label_column_name: "scientificNameEncoded" + max_seq_length: 15 + max_train_samples: null + max_eval_samples: null + overwrite_cache: false + preprocessing_num_workers: null + train_val_split: 0.2 + +training: + output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR5E4_WARMUP" + logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR5E4_WARMUP" + do_train: true + do_eval: true + per_device_train_batch_size: 128 + per_device_eval_batch_size: 32 + learning_rate: 0.0005 + num_train_epochs: 100 + warmup_steps: 0.05 + weight_decay: 0.01 + gradient_accumulation_steps: 1 + lr_scheduler_type: "linear" + 
logging_strategy: "epoch" + save_strategy: "epoch" + save_total_limit: 5 + eval_strategy: "steps" + eval_steps: 8964 + torch_compile: true + report_to: "wandb" + bf16: true + dataloader_num_workers: -1 + remove_unused_columns: false + overwrite_output_dir: false + seed: 0 + +custom: + lr_type: "linear" + frozen: false + frozen_type: "none" + run_group: "SWIN_Base_LR_Sweep" + run_name: "SWIN_Base_Linear_LR5e4_Warmup" + run_id: "swin_base_linear_lr5e4_warmup_052026" + run_notes: "LR sweep: linear, lr=5e-4, 5-epoch warmup (0→5e-4). ImageNet-22k pretrained." + +wandb: + entity: "gardoslab" + project: "herbdl" + resume: "allow" diff --git a/finetuning/SWIN/launch_sweep.py b/finetuning/SWIN/launch_sweep.py new file mode 100644 index 0000000..47b1ec5 --- /dev/null +++ b/finetuning/SWIN/launch_sweep.py @@ -0,0 +1,124 @@ +#!/usr/bin/env python3 +""" +Launch a sweep of training jobs from a single base config + a list of per-job overrides. + +Usage: + python launch_sweep.py --base configs/swin_base_pretrained_linear.yml --sweep my_sweep.yml + python launch_sweep.py --base configs/swin_base_pretrained_linear.yml --sweep my_sweep.yml --dry-run + +Sweep YAML format: + qsub: # optional — overrides defaults below + h_rt: "48:00:00" + gpus: 1 + gpu_c: 7.0 + pe: "omp 8" + + experiments: + - training.seed: 0 + custom.run_id: my_run_seed0_052026 + training.output_dir: /path/to/output/SEED0 + training.logging_dir: /path/to/output/SEED0 + + - training.seed: 1 + custom.run_id: my_run_seed1_052026 + training.output_dir: /path/to/output/SEED1 + training.logging_dir: /path/to/output/SEED1 +""" + +import argparse +import os +import subprocess +import sys + +import yaml + +DEFAULTS = { + "h_rt": "48:00:00", + "pe": "omp 8", + "P": "herbdl", + "gpus": 1, + "gpu_c": 8.0, + "M": "faridkar@bu.edu", +} + +SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__)) +TRAIN_SCRIPT = os.path.join(SCRIPT_DIR, "train_advanced.sh") + + +def build_set_args(overrides: dict) -> str: + """Convert {key: value} 
overrides to '--set key=value ...' string.""" + parts = [] + for key, value in overrides.items(): + parts.append(f"--set {key}={value}") + return " ".join(parts) + + +def submit(base_config: str, overrides: dict, qsub_opts: dict, dry_run: bool) -> None: + run_id = overrides.get("custom.run_id", "unknown") + job_name = run_id.upper().replace("-", "_")[:15] # qsub name limit + set_args = build_set_args(overrides) + + cmd = [ + "qsub", + "-l", f"h_rt={qsub_opts['h_rt']}", + "-pe", qsub_opts["pe"], + "-P", qsub_opts["P"], + "-l", f"gpus={qsub_opts['gpus']}", + "-l", f"gpu_c={qsub_opts['gpu_c']}", + "-m", "beas", + "-M", qsub_opts["M"], + "-N", job_name, + "-v", f"CONFIG_FILE={base_config},SET_ARGS={set_args}", + TRAIN_SCRIPT, + ] + + print(f"[{'DRY RUN' if dry_run else 'SUBMIT'}] {run_id}") + print(f" overrides : {set_args or '(none)'}") + print(f" job name : {job_name}") + print(f" command : {' '.join(cmd)}") + print() + + if not dry_run: + result = subprocess.run(cmd, capture_output=True, text=True) + if result.returncode != 0: + print(f" ERROR: {result.stderr.strip()}", file=sys.stderr) + else: + print(f" {result.stdout.strip()}") + + +def main(): + parser = argparse.ArgumentParser(description="Launch a sweep of qsub training jobs") + parser.add_argument("--base", required=True, help="Base config YAML path") + parser.add_argument("--sweep", required=True, help="Sweep spec YAML path") + parser.add_argument("--dry-run", action="store_true", help="Print commands without submitting") + args = parser.parse_args() + + if not os.path.isfile(args.base): + sys.exit(f"Base config not found: {args.base}") + if not os.path.isfile(args.sweep): + sys.exit(f"Sweep file not found: {args.sweep}") + + with open(args.sweep) as f: + sweep = yaml.safe_load(f) + + experiments = sweep.get("experiments", []) + if not experiments: + sys.exit("No experiments found in sweep file.") + + qsub_opts = {**DEFAULTS, **sweep.get("qsub", {})} + + print(f"Base config : {args.base}") + 
print(f"Experiments : {len(experiments)}") + print(f"qsub opts : {qsub_opts}") + print() + + for exp in experiments: + # Convert all values to strings for --set compatibility + overrides = {str(k): str(v) for k, v in exp.items()} + submit(args.base, overrides, qsub_opts, args.dry_run) + + print("Done." if not args.dry_run else "Dry run complete — nothing submitted.") + + +if __name__ == "__main__": + main() diff --git a/finetuning/SWIN/submit_concrete.sh b/finetuning/SWIN/submit_concrete.sh new file mode 100755 index 0000000..b22dff9 --- /dev/null +++ b/finetuning/SWIN/submit_concrete.sh @@ -0,0 +1,64 @@ +#!/bin/bash +# General-purpose seed-ensemble launcher for any advanced config. +# Defaults to the concrete SWIN-L 384 run for backward compatibility. +# +# Key env vars (all optional): +# CONFIG — config file path (default: swin_large_384_concrete.yml) +# RUN_PREFIX — base name used for output dirs and W&B run id/name +# (default: SWIN_L_384_CONCRETE) +# OUT_BASE — output root (default: .../workspaces/faridkar/.../SWIN) +# SEEDS — space-separated (default: "0 1 2") +# NGPUS — GPUs per job (default: 1; set to 2 for multi-GPU / DDP) +# GPU_MEM — GPU memory request (default: 80G) +# CKPT — warm-start checkpoint dir (overrides model.model_name_or_path) +# EMAIL — notification email (default: faridkar@bu.edu) +# +# Usage examples: +# bash submit_concrete.sh # concrete run, seeds 0 1 2 (defaults) +# SEEDS="0 1 2 3 4" bash submit_concrete.sh # concrete run, 5 seeds +# +# CONFIG=configs_advanced/swinv2_large_192_heavy_multitask.yml \ +# RUN_PREFIX=SWINV2_L_192_HEAVY_MT \ +# SEEDS="0 1 2 3 4" \ +# NGPUS=2 \ +# bash submit_concrete.sh +# +# Nothing is auto-submitted by Claude — run this yourself when ready. 
+ +SEEDS=${SEEDS:-"0 1 2"} +CONFIG=${CONFIG:-"configs_advanced/swin_large_384_concrete.yml"} +RUN_PREFIX=${RUN_PREFIX:-"SWIN_L_384_CONCRETE"} +OUT_BASE=${OUT_BASE:-"/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN"} +NGPUS=${NGPUS:-1} +GPU_MEM=${GPU_MEM:-"80G"} +EMAIL=${EMAIL:-"faridkar@bu.edu"} + +OMP_THREADS=$(( NGPUS * 8 )) +QSUB_ARGS="-l h_rt=48:00:00 -pe omp ${OMP_THREADS} -P herbdl -l gpus=${NGPUS} -l gpu_c=8.0 -l gpu_memory=${GPU_MEM} -m beas -M ${EMAIL}" + +# Pass NPROC_PER_NODE only when using multiple GPUs (triggers torchrun in train_advanced.sh) +NPROC_VAR="" +[ "$NGPUS" -gt 1 ] && NPROC_VAR=",NPROC_PER_NODE=${NGPUS}" + +# Derive a short job-name prefix (first 8 chars, uppercase, no special chars) +JOB_PREFIX=$(echo "$RUN_PREFIX" | tr '[:lower:]' '[:upper:]' | tr -cd 'A-Z0-9' | cut -c1-8) + +for seed in $SEEDS; do + RUN_ID=$(echo "${RUN_PREFIX}_seed${seed}" | tr '[:upper:]' '[:lower:]') + RUN_NAME="${RUN_PREFIX}_Seed${seed}" + OUT="${OUT_BASE}/${RUN_PREFIX}_SEED${seed}" + + SET_ARGS="--set training.seed=${seed} --set training.output_dir=${OUT} --set training.logging_dir=${OUT} --set custom.run_id=${RUN_ID} --set custom.run_name=${RUN_NAME}" + if [ -n "$CKPT" ]; then + SET_ARGS="${SET_ARGS} --set model.model_name_or_path=${CKPT}" + fi + + JOB=$(qsub $QSUB_ARGS \ + -N "${JOB_PREFIX}_S${seed}" \ + -v CONFIG_FILE="${CONFIG}",SET_ARGS="${SET_ARGS}"${NPROC_VAR} \ + train_advanced.sh | grep -oP '(?<=job )\d+') + echo "Submitted seed ${seed}: job ${JOB} -> ${OUT}" +done + +echo +echo "Monitor with: qstat -u \$USER" diff --git a/finetuning/SWIN/submit_pretrained_seeds.sh b/finetuning/SWIN/submit_pretrained_seeds.sh new file mode 100755 index 0000000..37017a2 --- /dev/null +++ b/finetuning/SWIN/submit_pretrained_seeds.sh @@ -0,0 +1,26 @@ +#!/bin/bash +# Submit 5 independent seed runs of the pretrained-384 full pipeline. +# Each job runs on 1 GPU (A100 80G) for up to 48h. 
+# +# Usage: +# bash submit_pretrained_seeds.sh # submit all 5 seeds +# SEEDS="0 2 4" bash submit_pretrained_seeds.sh # submit a subset + +SEEDS=${SEEDS:-"0 1 2 3 4"} + +QSUB_ARGS="-l h_rt=48:00:00 -pe omp 8 -P herbdl -l gpus=1 -l gpu_c=8.0 -l gpu_memory=80G -m beas -M faridkar@bu.edu" + +for seed in $SEEDS; do + CONFIG="configs_advanced/swin_pretrained_384_seed${seed}.yml" + JOB_NAME="PRETRAINED_384_S${seed}" + + JOB=$(qsub $QSUB_ARGS \ + -N "$JOB_NAME" \ + -v CONFIG_FILE="$CONFIG" \ + train_advanced.sh | grep -oP '(?<=job )\d+') + + echo "Submitted seed ${seed}: job ${JOB} (${CONFIG})" +done + +echo "" +echo "Monitor with: qstat -u faridkar" diff --git a/finetuning/SWIN/train_advanced.sh b/finetuning/SWIN/train_advanced.sh index 23c571e..3407ee5 100755 --- a/finetuning/SWIN/train_advanced.sh +++ b/finetuning/SWIN/train_advanced.sh @@ -1,23 +1,38 @@ #!/bin/bash -l module load miniconda -module load academic-ml/fall-2025 +module load academic-ml/spring-2026 -conda activate herb_env +conda activate spring-2026-pyt -# Path to config file - can be set via environment variable or use default -# Options: -# - configs_advanced/swin_base_224_enhanced.yml -# - configs_advanced/swin_base_384_enhanced.yml -# - configs_advanced/swinv2_base_192_enhanced.yml -# If CONFIG_FILE is not set (e.g., via qsub -v), use default +# CONFIG_FILE must be provided (e.g. via `qsub -v CONFIG_FILE=...`, as submit_concrete.sh +# does). Fail fast rather than silently running an arbitrary default config. if [ -z "$CONFIG_FILE" ]; then - CONFIG_FILE="configs_advanced/swin_base_224_multitask.yml" + echo "ERROR: CONFIG_FILE is not set. Pass it explicitly, e.g.:" >&2 + echo " qsub -v CONFIG_FILE=configs_advanced/swin_large_384_concrete.yml ... train_advanced.sh" >&2 + echo " (or use submit_concrete.sh, which sets it for you)" >&2 + exit 1 +fi + +if [ ! -f "$CONFIG_FILE" ]; then + echo "ERROR: CONFIG_FILE '$CONFIG_FILE' not found (cwd: $(pwd))." 
>&2 + exit 1 fi echo "Using config file: $CONFIG_FILE" +[ -n "$SET_ARGS" ] && echo "Overrides: $SET_ARGS" -python SWIN_finetuning_advanced.py --config $CONFIG_FILE +# Multi-GPU: set NPROC_PER_NODE (e.g. NPROC_PER_NODE=2) in the qsub -v args to launch with torchrun (DDP). +# Single GPU (default): plain python. +# SET_ARGS is intentionally left unquoted so it word-splits into separate --set arguments. +NPROC=${NPROC_PER_NODE:-1} +if [ "$NPROC" -gt 1 ]; then + echo "Launching with torchrun --nproc_per_node=$NPROC" + torchrun --nproc_per_node="$NPROC" --standalone \ + SWIN_finetuning_advanced.py --config "$CONFIG_FILE" ${SET_ARGS} +else + python SWIN_finetuning_advanced.py --config "$CONFIG_FILE" ${SET_ARGS} +fi # Example qsub command for multi-GPU training: -# qsub -l h_rt=48:00:00 -pe omp 16 -P herbdl -l gpus=2 -l gpu_c=8.0 -l gpu_memory=80G -m beas -M faridkar@bu.edu -N SWINB_MT train_advanced.sh +# qsub -l h_rt=48:00:00 -pe omp 16 -P herbdl -l gpus=2 -l gpu_c=8.0 -l gpu_memory=80G \ +# -v NPROC_PER_NODE=2 -m beas -M faridkar@bu.edu -N SWIN_MULTIGPU train_advanced.sh
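
Review note: `train_advanced.sh` expands `${SET_ARGS}` unquoted, and `launch_sweep.py` joins overrides with plain spaces in `build_set_args`, so an override value that contains whitespace would be split into several arguments before it reaches `SWIN_finetuning_advanced.py`. The sketch below is a rough model of that round trip, not code from this change. `build_set_args` mirrors the joining rule in `launch_sweep.py`; `shell_words` and `risky_overrides` are hypothetical helpers introduced only for illustration. The comma check assumes that `qsub -v` separates its variable list on commas, which is worth confirming against the local scheduler documentation.

```python
def build_set_args(overrides: dict) -> str:
    """Same joining rule as launch_sweep.py: space-separated '--set key=value' pairs."""
    return " ".join(f"--set {key}={value}" for key, value in overrides.items())


def shell_words(set_args: str) -> list:
    """Approximate the unquoted ${SET_ARGS} expansion in train_advanced.sh.

    Unquoted expansion splits on whitespace and does not honor quotes stored
    inside the variable, so str.split() is a closer model than shlex.split().
    Glob expansion is ignored in this sketch.
    """
    return set_args.split()


def risky_overrides(overrides: dict) -> list:
    """Hypothetical pre-flight check: keys whose values hold whitespace or a comma."""
    return [
        key
        for key, value in overrides.items()
        if any(ch.isspace() or ch == "," for ch in str(value))
    ]


# Values without spaces or commas survive the round trip intact.
safe = {"training.seed": "0", "custom.run_id": "my_run_seed0_052026"}
print(shell_words(build_set_args(safe)))

# A free-text value is broken into several words by the shell.
unsafe = {"custom.run_notes": "LR sweep: linear, lr=5e-4"}
print(shell_words(build_set_args(unsafe)))
print(risky_overrides(unsafe))
```

If this matches the real behavior, a small guard in `launch_sweep.py` that refuses (or warns about) the keys returned by a check like `risky_overrides` before calling `qsub` would keep free-text fields such as `custom.run_notes` in the base config rather than in per-job overrides.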