diff --git a/finetuning/SWIN/CONCRETE_RUN_README.md b/finetuning/SWIN/CONCRETE_RUN_README.md
new file mode 100644
index 0000000..fdcf5ac
--- /dev/null
+++ b/finetuning/SWIN/CONCRETE_RUN_README.md
@@ -0,0 +1,260 @@
+# Concrete next run — SWIN-L 384 (Tier 1 & 2)
+
+Implements the "Putting it together — a concrete next run" recipe from
+`SWIN_training_setup_summary.md` (§10 recommendations, §11 step-by-step) as one strong
+single-model candidate, plus the code features it needs.
+
+## What this run is
+
+| Lever | Choice | Section |
+|-------|--------|---------|
+| Backbone | `microsoft/swin-large-patch4-window12-384-in22k` (stay in SWIN) | 1.1 / 1.2 |
+| Resolution | 384 (processor-driven, no resize code) | 1.2 |
+| Loss | balanced-softmax CE + multi-task family/genus/species heads | 1.3-A |
+| Schedule | 100 epochs, 5% warmup, cosine, **EMA on** | 2.6 |
+| Augmentation | MEDIUM (RandAug mag 7, mild mixup/erasing) — cold-start safe | 2.5 |
+| Inference | multi-crop + flip TTA wired, enabled only for final prediction | 2.7 |
+
+Config: [`configs_advanced/swin_large_384_concrete.yml`](configs_advanced/swin_large_384_concrete.yml)
+
+## Code features added to `SWIN_finetuning_advanced.py`
+
+All are config-gated and default **off**, so existing configs behave exactly as before.
+
+1. **Balanced softmax / logit adjustment (Tier 1.3-A)** — new `long_tail` section.
+   A per-class `log_prior` (log training frequency, in the species/CE head's index
+   space) is added to the species logits **during training only** (`logits + tau*log_prior`),
+   then plain argmax at inference. Down-weights head classes to lift macro-F1 on the long
+   tail. Applied in `MixupTrainer.compute_loss` for single-task and multi-task, in both the
+   mixup and non-mixup paths. Not applied to ArcFace.
+   ```yaml
+   long_tail:
+     logit_adjustment: true
+     tau: 1.0          # strength; 1.0 = standard balanced softmax
+   ```
+
+2. **Weight EMA (Tier 2.6)** — new `ema` section + `EMACallback`.
+   Maintains a shadow average of the parameters (`shadow = decay*shadow + (1-decay)*param`
+   every step) and copies it into the model at train end, so the final `evaluate()` and
+   `save_model()` reflect EMA weights. **Keep `load_best_model_at_end: false`** — the
+   best-checkpoint reload would otherwise be overwritten by the EMA copy.
+   ```yaml
+   ema:
+     enabled: true
+     decay: 0.9998
+   ```
+
+3. **Horizontal-flip TTA (Tier 2.7)** — `multi_crop.flip`.
+   `build_multi_crop_transforms(..., flip=True)` also emits a flipped variant of each crop,
+   so logits average over crops × {orig, flip}. Leave `multi_crop.enabled: false` during
+   training; enable it for the final/leaderboard prediction only.
+
+4. **Gradient-checkpointing passthrough** — `MultiTaskSwinModel` / `SwinWithArcFace` now
+   forward `gradient_checkpointing_enable/disable` to the backbone, so
+   `training.gradient_checkpointing: true` works for the wrapped models (needed to fit
+   SWIN-L @384 on one GPU).
+
+## Environment setup (one-time)
+
+Jobs run via `train_advanced.sh`, which loads:
+
+```bash
+module load miniconda
+module load academic-ml/spring-2026
+conda activate spring-2026-pyt
+```
+
+`spring-2026-pyt` already provides torch 2.9.1, transformers 4.57.3 (≥4.52, required),
+datasets, accelerate, safetensors, torchvision, scikit-learn, pillow, pyyaml, numpy. Two
+packages it does **not** include are needed by the trainer — install them once into your
+user-site:
+
+```bash
+module load miniconda && conda activate spring-2026-pyt
+pip install --user evaluate wandb
+```
+
+Notes:
+- `evaluate` is required (accuracy / macro-F1); `wandb` is needed because the configs use
+  `report_to: wandb`. Pass `--set training.report_to=none` (or set `wandb.enabled: false`) to skip W&B.
+- If `import wandb` fails with `cannot import name 'validate_core_schema' from 'pydantic_core'`,
+  the `--user` install shadowed the env's `pydantic_core`. Remove the duplicate so the env's
+  copy is used again:
+  `rm -rf ~/.local/lib/python3.12/site-packages/pydantic_core ~/.local/lib/python3.12/site-packages/pydantic_core-*.dist-info`
+- `evaluate.load(...)` downloads its metric script from the HF hub on first use and caches it
+  under `~/.cache/huggingface`. Run the smoke test (below) once from a login node to warm the
+  cache if your compute nodes can't reach the hub.
+- The PyTorch env requires `gpu_c >= 7.0`; the submit scripts request `gpu_c=8.0` (A100), so OK.
+
+Sanity check the env:
+```bash
+python -c "import torch, transformers, datasets, evaluate, wandb; print('env OK')"
+```
+
+### Weights & Biases (logs to gardoslab / herbdl)
+
+The trainer calls `wandb.init(entity="gardoslab", project="herbdl", name=run_name,
+group=run_group, id=run_id, ...)` straight from the config (see
+`SWIN_finetuning_advanced.py`), so no code change is needed. You only need team membership
+and a valid API key.
+
+1. **Be a member of the `gardoslab` team.** Open <https://wandb.ai/gardoslab> while signed in.
+   If you can't see it, ask the team owner to invite your W&B username. `entity="gardoslab"`
+   fails with a permission error until you're a member — being logged in is not enough.
+
+2. **Authenticate on SCC** (login node; `~/.netrc` is shared, so compute-node jobs reuse it —
+   no per-job login). Grab your key from <https://wandb.ai/authorize>:
+   ```bash
+   module load miniconda && conda activate spring-2026-pyt
+   wandb login --relogin        # paste key; --relogin replaces a stale key
+   ```
+
+3. **Verify** (the stored key can be stale even though `~/.netrc` exists):
+   ```bash
+   wandb login --verify
+   python -c "import wandb; v=wandb.Api().viewer; print(v.username, '| teams:', v.teams)"
+   ```
+   `gardoslab` should appear in `teams`.
+
+Notes:
+- Alternative to `~/.netrc`: `export WANDB_API_KEY=<key>` in your shell profile (keeps the
+  key out of any committed script).
+- The seed loop in `submit_concrete.sh` sets a distinct `run_id`/`run_name` per seed, so seeds
+  appear as separate runs grouped under `SWIN_L_384_Concrete`.
+- To skip W&B for a run: `--set training.report_to=none` (or `wandb.enabled: false`).
+- If a compute node can't reach W&B: `export WANDB_MODE=offline`, then `wandb sync <run_dir>` later.
+
+## How to launch (you run this — nothing is auto-submitted)
+
+Single run (seed 0):
+```bash
+cd finetuning/SWIN
+SEEDS="0" bash submit_concrete.sh
+```
+
+3- or 5-seed ensemble:
+```bash
+bash submit_concrete.sh                 # seeds 0 1 2
+SEEDS="0 1 2 3 4" bash submit_concrete.sh
+```
+
+Each job requests 1 A100-80G GPU on `herbdl` for 48h and writes to
+`finetuning/output/SWIN/SWIN_L_384_CONCRETE_SEED<seed>/`. Adjust the `-M` email in
+`submit_concrete.sh` if needed.
+
+### Smoke test first (recommended)
+Verify the pipeline end-to-end cheaply before committing 48h jobs:
+```bash
+qsub -l h_rt=2:00:00 -pe omp 8 -P herbdl -l gpus=1 -l gpu_c=8.0 -l gpu_memory=80G \
+     -N SWINL384_SMOKE \
+     -v CONFIG_FILE=configs_advanced/swin_large_384_concrete.yml \
+     -v SET_ARGS="--set data.max_train_samples=2000 --set data.max_eval_samples=2000 --set training.num_train_epochs=1 --set training.output_dir=/projectnb/herbdl/workspaces/tgardos/herbdl/finetuning/output/SWIN/SMOKE --set training.overwrite_output_dir=true --set wandb.enabled=false" \
+     train_advanced.sh
+```
+
+## Output paths auto-relocate to your workspace
+
+Most configs in this repo (inherited from faridkar's workspace) hardcode `output_dir`/`logging_dir`
+under `/projectnb/herbdl/workspaces/faridkar/herbdl/...`. The trainer rewrites any
+`.../workspaces/<author>/herbdl` prefix to the repo you actually run from, preserving the
+trailing run name — so a `tgardos` checkout writes to
+`/projectnb/herbdl/workspaces/tgardos/herbdl/finetuning/output/SWIN/<NAME>` automatically,
+with no YAML edits. It logs the rewrite (`__CUSTOM__: Relocated output path ...`). Set
+`HERBDL_NO_RELOCATE=1` to disable (e.g. to write somewhere else via an explicit path).
+
+## Warm-start (Tier 2.5 — recommended once a 384 checkpoint exists)
+
+Cold-from-in22k is the dependency-free default. The curriculum finding is that chaining a
+hard change from a converged checkpoint beats cold-starting it. Once you have a converged
+SWIN-L 384 run, chain from it (keep `config_name`/`image_processor_name` on the 384 arch)
+and raise `augmentation.randaugment.magnitude` to 9:
+```bash
+CKPT=/projectnb/herbdl/workspaces/tgardos/herbdl/finetuning/output/SWIN/SWIN_L_384_CONCRETE_SEED0 \
+    SEEDS="1" bash submit_concrete.sh
+```
+
+## OOM / memory tuning
+
+SWIN-L @384 is heavy. If a job OOMs, lower the per-device batch and raise grad-accum to
+keep the effective batch (~128) constant, e.g. via `--set`:
+```
+--set training.per_device_train_batch_size=8 --set training.gradient_accumulation_steps=16
+```
+`gradient_checkpointing: true` is already on.
+
+## Final prediction with TTA
+
+For the leaderboard/final eval, enable TTA on the trained checkpoint:
+```yaml
+multi_crop:
+  enabled: true
+  crop_sizes: [400, 416, 448, 480, 512]
+  target_size: 384
+  flip: true
+```
+The trainer runs `multi_crop_evaluate` after the standard eval and prints averaged
+accuracy + macro-F1 (`__CUSTOM__: Multi-crop eval ...`). Mirror the same crops/flip in
+`prediction.py` / `kaggle_submission.py` so the submission matches the eval.
+
+## Metrics
+
+Both top-1 accuracy and macro-F1 are reported every epoch (`eval_accuracy` /
+`eval_species_f1` for multi-task). Macro-F1 over the long tail is the number to watch
+(Tier 0).
+
+## Remote monitoring from phone / MacBook (Claude Code Remote Control)
+
+To babysit a run (check `qstat`, read logs, tweak configs) from an iPhone or MacBook, use
+Claude Code **Remote Control** — the `claude` process keeps running on the SCC login node
+(full `/projectnb` + `qsub` access), and your phone/browser are just remote windows into it.
+This is different from *Claude Code on the web*, whose cloud sandbox has **no** SCC access.
+
+### Updating Claude Code on SCC (needed: ≥ 2.1.51 for Remote Control)
+
+Claude Code here is installed as an npm **prefix** install and run via a shell alias:
+```bash
+alias claude='npx --prefix ~/claude-code claude'
+```
+Because of that, `claude update` does **not** work — it targets npm's global prefix, which
+is the read-only shared module dir (`/share/pkg.8/.../spring-2026-pyt`). Update the copy the
+alias actually uses instead:
+```bash
+module load miniconda && conda activate spring-2026-pyt   # for a consistent node/npm
+npm install --prefix ~/claude-code @anthropic-ai/claude-code@latest
+npx --prefix ~/claude-code claude --version               # confirm >= 2.1.51
+```
+Re-run that `npm install --prefix` line whenever you want to upgrade (don't use `claude update`).
+
+### Starting a Remote Control session
+
+Remote Control requires a **claude.ai subscription login (Pro/Max/Team/Enterprise) — API keys
+are not supported**. On the SCC login node:
+```bash
+unset ANTHROPIC_API_KEY          # if set, it blocks Remote Control
+claude /login                    # choose the claude.ai option (not a Console API key)
+
+tmux new -s claude-hpc           # persistent: survives SSH disconnects
+# inside tmux:
+cd /projectnb/herbdl/workspaces/tgardos/herbdl
+claude remote-control --name "HerbDL SWIN-L 384"
+```
+It prints a session URL and offers a QR code (press space). Detach with `Ctrl-b d`; Claude
+keeps running.
+
+- **iPhone:** Claude app → **Code** tab → pick "HerbDL SWIN-L 384" (or scan the QR).
+- **MacBook:** open the session URL, or go to **claude.ai/code** and pick the session. For a
+  local terminal instead: `ssh -t scc1.bu.edu "tmux attach -t claude-hpc"`.
+
+Notes:
+- Keep Claude on the **login node** (lightweight coordinator); GPU training stays in `qsub`
+  jobs on compute nodes. Don't run training directly under Claude.
+- Remote Control can **push a phone notification** when a long task finishes (enable via `/config`).
+- Text commands (`/context`, `/usage`) work from mobile; interactive pickers (`/resume`, `/mcp`)
+  only from the local terminal.
+
+## Deferred (next ensemble members)
+
+Per the chosen scope, these are intentionally **not** in this run and remain available to
+add later as additional ensemble members: domain-pretrained backbone swap (Tier 1.1
+timm/open_clip loader), warmed-up ArcFace rescue (Tier 2.4), class-balanced sampler /
+two-stage cRT (Tier 1.3-B), and +2021 data (Tier 2.8).
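
Note on `CONCRETE_RUN_README.md` above: items 1 and 2 under "Code features added" describe the balanced-softmax adjustment (`logits + tau*log_prior`) and the EMA update (`shadow = decay*shadow + (1-decay)*param`) in words. The following is a minimal, stdlib-only sketch of both formulas, assuming toy class counts and plain Python lists in place of tensors. The helper names (`log_prior_from_counts`, `balanced_softmax_ce`, `ema_update`) are hypothetical and are not the functions in `SWIN_finetuning_advanced.py`.

```python
import math


def log_prior_from_counts(counts):
    """Log training frequency per class (hypothetical helper, not from the diff)."""
    total = sum(counts)
    return [math.log(c / total) for c in counts]


def balanced_softmax_ce(logits, target, log_prior, tau=1.0):
    """Cross-entropy computed on logits + tau * log_prior (training-time loss only)."""
    adjusted = [z + tau * lp for z, lp in zip(logits, log_prior)]
    m = max(adjusted)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_z - adjusted[target]


def ema_update(shadow, params, decay=0.9998):
    """One EMA step: shadow = decay * shadow + (1 - decay) * param."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


if __name__ == "__main__":
    counts = [900, 90, 10]            # toy long-tailed class counts (assumed)
    prior = log_prior_from_counts(counts)
    logits = [0.0, 0.0, 0.0]          # an uninformative prediction
    print("head-class loss:", balanced_softmax_ce(logits, 0, prior))
    print("tail-class loss:", balanced_softmax_ce(logits, 2, prior))
    print("shadow after one step:", ema_update([0.0], [1.0]))
```

With uninformative (all-zero) logits, the tail-class loss is larger than the head-class loss, which is the pressure that raises tail logits during training; at inference the prior is dropped and plain argmax is used. For the EMA, a common rule of thumb puts the averaging horizon near 1/(1 - decay), roughly 5000 steps at `decay: 0.9998`.
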
diff --git a/finetuning/SWIN/CURRICULUM_REPORT.md b/finetuning/SWIN/CURRICULUM_REPORT.md
new file mode 100644
index 0000000..d12332a
--- /dev/null
+++ b/finetuning/SWIN/CURRICULUM_REPORT.md
@@ -0,0 +1,125 @@
+# Curriculum Learning — Stage-by-Stage Impact Report
+
+## Starting Point: SWIN_BASE_BASELINE
+
+**What it is:** SWIN-Base (224px, ImageNet-22k pretrained), fine-tuned with standard CE loss, no augmentation beyond basic resizing/normalization, unfrozen backbone from the start.
+
+**Result:** Peak F1 = **0.7454** @ epoch 47.8
+
+**Interpretation:** Solid starting point. Slow convergence curve — model starts at 0.58 F1 and takes ~48 epochs to plateau. This is the reference to beat.
+
+---
+
+## Interlude: Standalone Augmentation Test (SWIN_BASE_224_AUGMENTED)
+
+**What it added:** Heavy augmentation (RandAugment mag=9, Mixup α=0.8, CutMix α=1.0, RandomErasing 25%, label smoothing 0.1) applied directly from scratch — no warm-up, no curriculum.
+
+**Result:** Peak F1 = **0.6118** @ epoch 44.4, **worse than baseline by 13.4 points** (0.7454 vs. 0.6118)
+
+**Why it failed:** Throwing all regularization at a model cold is destructive. Strong Mixup/CutMix targets corrupt learning signal before the backbone has stabilized. The model oscillates and never recovers — note the flat 0.57–0.61 plateau from epoch 20–99. This is the key motivation for curriculum learning.
+
+---
+
+## Curriculum Stage 1 — Mild Augmentation Warm-up
+
+**What changed:** Initialized from baseline checkpoint. RandAugment mag=4 (mild), Mixup α=0.8, CutMix α=1.0, RandomErasing p=0.1, label smoothing 0.05. LR = 5e-5.
+
+**Result:** Peak F1 = **0.7214** @ epoch 23.9
+
+**Interpretation:** Starts immediately at 0.69 F1 (baseline already baked in), reaches 0.72 in 24 epochs. The mild augmentation + lower LR successfully builds on the baseline without disrupting it. Notably, this run converges faster than the baseline — 0.69 at epoch 3 vs. 0.58 for baseline.
+
+**Gain vs baseline at epoch 24:** +0.013 F1
+
+---
+
+## Curriculum Stage 2 — Medium Augmentation
+
+**What changed:** From S1 checkpoint. RandAugment mag=7 (stepped up), RandomErasing p=0.15, label smoothing 0.1. LR = 3e-5.
+
+**Result:** Peak F1 = **0.7421** @ epoch 27.3
+
+**Gain vs S1:** +0.021 F1
+
+**Interpretation:** The stepped-up augmentation is now helping rather than hurting, because the backbone is already warm. Model jumps to 0.72 at epoch 3 and climbs to 0.74 by epoch 27.
+
+---
+
+## Curriculum Stage 3 — Heavy Augmentation
+
+**What changed:** From S2 checkpoint. RandAugment mag=9 (full strength), RandomErasing p=0.25. LR = 2e-5. 50 epochs.
+
+**Result:** Peak F1 = **0.7510** @ epoch 41.0
+
+**Gain vs S2:** +0.009 F1. Diminishing returns beginning.
+
+**Interpretation:** Full augmentation now converges to a higher ceiling than baseline. However, the improvement margin is shrinking. The model starts at 0.74 immediately and creeps upward slowly — most gain is in early epochs, then it plateaus.
+
+---
+
+## Curriculum Stage 3-Cont — Extended Cosine Schedule
+
+**What changed:** From S3 final model (not best checkpoint). Fresh cosine LR schedule restart from 2e-5. Same augmentation. Intended to push past the S3 plateau.
+
+**Result:** Peak F1 = **0.7510** @ epoch 50.0
+
+**Gain vs S3:** **+0.000 F1**
+
+**Interpretation:** The LR restart did not help — S3 had already converged. The model stays in the same 0.74–0.75 band the entire 50 epochs. This suggests the 224px + CE + augmentation combination has hit its ceiling.
+
+---
+
+## Curriculum MultiTask — Auxiliary Family/Genus Heads
+
+**What changed:** From S3-Cont final model. Added CE auxiliary heads for family and genus (weights 0.2×family + 0.3×genus + 1.0×species). Mixup/CutMix retained. LR = 3e-4 (higher — new heads need to train). 100 epochs.
+
+**Result:** Peak F1 = **0.7523** @ epoch 68.3
+
+**Gain vs S3-Cont:** +0.001 F1 net, but with a very different trajectory.
+
+**Key observation:** The new family/genus heads start randomly initialized → eval_on_start near-zero → slow recovery through ~40 epochs before exceeding S3-Cont. MultiTask eventually pulls ahead but the improvement is modest. The multi-task signal is providing regularization but not a dramatic accuracy boost on its own.
+
+---
+
+## Curriculum ArcFace — SubCenter ArcFace Metric Learning
+
+**What changed:** From MultiTask checkpoint. Replaced CE species head with SubCenter ArcFace (embedding=512, scale=30, margin=0.5, k=3 sub-centers). Mixup/CutMix disabled (incompatible with hard labels). Hybrid CE weight = 0.0. LR = 1e-4. 60 epochs.
+
+**Result:** Peak F1 = **0.7376** @ epoch 58.1
+
+**Gain vs MultiTask:** **–0.015 F1** — a regression.
+
+**Interpretation:** ArcFace starts from near-zero (random embedding + weight matrix initialization), takes ~40 epochs to climb back toward MultiTask's level, and peaks 1.5 points (0.015 F1) *below* the MultiTask checkpoint it started from. The loss function change required too many epochs to re-learn what CE had already learned. The 60-epoch budget was insufficient for ArcFace to amortize its warm-up cost and then improve further.
+
+---
+
+## Summary Table
+
+| Stage | Technique Added | Peak F1 | Δ vs Previous | Epochs to Peak |
+|-------|----------------|---------|--------------|----------------|
+| Baseline | CE, no augmentation | 0.7454 | — | 47.8 |
+| Aug (standalone) | Heavy aug, no curriculum | 0.6118 | –0.134 | 44.4 |
+| S1 | Mild aug (warm-up) | 0.7214 | –0.024* | 23.9 |
+| S2 | Medium aug | 0.7421 | +0.021 | 27.3 |
+| S3 | Heavy aug | 0.7510 | +0.009 | 41.0 |
+| S3-Cont | LR restart | 0.7510 | +0.000 | 50.0 |
+| MultiTask | Family/genus aux heads | 0.7523 | +0.001 | 68.3 |
+| ArcFace | Metric learning loss | 0.7376 | **–0.015** | 58.1 |
+
+\* Δ for S1 is measured against the baseline. S1 peaks below baseline because it ran for fewer epochs (25 vs. 48 for baseline). Chaining S1→S2→S3 ultimately exceeds the baseline ceiling (0.751 vs. 0.745).
+
+---
+
+## Key Takeaways
+
+1. **Curriculum ordering matters critically.** Applying heavy augmentation cold destroyed performance (0.61). Applied progressively, it exceeds baseline (0.751 vs. 0.745).
+
+2. **The aug curriculum plateau is around 0.750–0.752.** S3, S3-Cont, and MultiTask all peak in this band. The 224px CE model appears structurally capped here.
+
+3. **MultiTask gave only marginal gain (+0.001).** The auxiliary signal helps slightly but the species task already dominates. More useful as regularization than as a direct accuracy booster.
+
+4. **ArcFace regressed.** The 60-epoch budget was too short — ArcFace requires a long cold-start recovery period before it can outperform CE. The hybrid/384 stages queued after it will inherit this disadvantage.
+
+5. **The gap to 0.80 is still ~5 points.** The most promising levers remaining are:
+   - **384px resolution**: higher input resolution preserves finer detail, which tends to help fine-grained recognition
+   - **SWIN V2 architecture** — updated relative position bias and scaled cosine attention
+   - **Revisiting ArcFace** with a longer budget or frozen-backbone warm-up phase
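
Note on `CURRICULUM_REPORT.md` above: the "Δ vs Previous" column can be recomputed from the peak F1 values in the summary table. The sketch below does that, assuming each stage is compared against the reference the report names (the baseline for the standalone-augmentation run and for S1, the preceding stage otherwise). `STAGES` and `delta_table` are illustrative names, not part of the repository.

```python
# (stage, peak F1, reference stage) as listed in the report's summary table.
STAGES = [
    ("Baseline", 0.7454, None),
    ("Aug (standalone)", 0.6118, "Baseline"),
    ("S1", 0.7214, "Baseline"),
    ("S2", 0.7421, "S1"),
    ("S3", 0.7510, "S2"),
    ("S3-Cont", 0.7510, "S3"),
    ("MultiTask", 0.7523, "S3-Cont"),
    ("ArcFace", 0.7376, "MultiTask"),
]


def delta_table(stages):
    """Return {stage: peak F1 minus its reference stage's peak F1}, rounded to 3 places."""
    peak = {name: f1 for name, f1, _ in stages}
    return {
        name: round(f1 - peak[ref], 3)
        for name, f1, ref in stages
        if ref is not None  # the baseline has no reference
    }


if __name__ == "__main__":
    for name, delta in delta_table(STAGES).items():
        print(f"{name:18s} {delta:+.3f}")
```

Running the script prints one signed delta per stage, which makes it easy to cross-check the table whenever a peak value is updated.
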
diff --git a/finetuning/SWIN/SWIN_finetuning.py b/finetuning/SWIN/SWIN_finetuning.py
index ae4d6d0..df2f20d 100644
--- a/finetuning/SWIN/SWIN_finetuning.py
+++ b/finetuning/SWIN/SWIN_finetuning.py
@@ -51,9 +51,21 @@
 from transformers.utils.versions import require_version
 
 import wandb
+from transformers.integrations import WandbCallback
 
 os.environ['WANDB_DISABLED'] = 'false'
 
+_WANDB_CONFIG_BLOCKLIST = {"label2id", "id2label"}
+
+class FilteredWandbCallback(WandbCallback):
+    """WandbCallback that skips large, uninformative model config keys."""
+    def on_train_begin(self, args, state, control, model=None, **kwargs):
+        super().on_train_begin(args, state, control, model=model, **kwargs)
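+        if wandb.run is None:  # no active run on this process (e.g. non-main ranks)
+            return
+        blocked = {k: None for k in _WANDB_CONFIG_BLOCKLIST if k in wandb.run.config}
+        wandb.run.config.update(blocked, allow_val_change=True)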
+
 
 """ Fine-tuning a 🤗 Transformers model for image classification"""
 
@@ -622,6 +634,8 @@ def val_transforms(example_batch):
         tokenizer=image_processor,
         data_collator=collate_fn,
     )
+    if trainer.pop_callback(WandbCallback) is not None:  # swap only when W&B reporting is active
+        trainer.add_callback(FilteredWandbCallback)
 
     # Training
     if training_args.do_train:
diff --git a/finetuning/SWIN/SWIN_finetuning_advanced.py b/finetuning/SWIN/SWIN_finetuning_advanced.py
index db32b3b..9354479 100644
--- a/finetuning/SWIN/SWIN_finetuning_advanced.py
+++ b/finetuning/SWIN/SWIN_finetuning_advanced.py
@@ -16,16 +16,20 @@
 import argparse
 import logging
 import os
+import re
 import sys
 import yaml
 from dataclasses import dataclass, field
 from typing import Optional
 import random
 
+import math
+
 import evaluate
 import numpy as np
 import torch
 import torch.nn as nn
+import torch.nn.functional as F
 from datasets import load_dataset
 from PIL import Image
 from torchvision.transforms import (
@@ -50,6 +54,7 @@
     AutoModelForImageClassification,
     HfArgumentParser,
     Trainer,
+    TrainerCallback,
     TrainingArguments,
     set_seed,
 )
@@ -57,8 +62,20 @@
 from transformers.utils.versions import require_version
 
 import wandb
+from transformers.integrations import WandbCallback
+
+os.environ['WANDB_DISABLED'] = 'false'  # may be overridden by config
 
-os.environ['WANDB_DISABLED'] = 'false'
+_WANDB_CONFIG_BLOCKLIST = {"label2id", "id2label"}
+
+class FilteredWandbCallback(WandbCallback):
+    """WandbCallback that skips large, uninformative model config keys."""
+    def on_train_begin(self, args, state, control, model=None, **kwargs):
+        super().on_train_begin(args, state, control, model=model, **kwargs)
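+        if wandb.run is None:  # no active run on this process (e.g. non-main ranks)
+            return
+        blocked = {k: None for k in _WANDB_CONFIG_BLOCKLIST if k in wandb.run.config}
+        wandb.run.config.update(blocked, allow_val_change=True)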
 
 """ Fine-tuning a 🤗 Transformers model for image classification with advanced augmentations"""
 
@@ -233,6 +250,13 @@ def __init__(self, base_model, num_families, num_genera, num_species):
         self.genus_classifier = nn.Linear(hidden_size, num_genera)
         self.species_classifier = nn.Linear(hidden_size, num_species)
 
+    def gradient_checkpointing_enable(self, **kwargs):
+        # Passthrough so HF Trainer's gradient_checkpointing flag reaches the backbone.
+        self.swin.gradient_checkpointing_enable(**kwargs)
+
+    def gradient_checkpointing_disable(self):
+        self.swin.gradient_checkpointing_disable()
+
     def forward(self, pixel_values, family_labels=None, genus_labels=None, species_labels=None, **kwargs):
         outputs = self.swin(pixel_values)
         pooled_output = outputs.pooler_output  # [batch_size, hidden_size]
@@ -260,17 +284,139 @@ def forward(self, pixel_values, family_labels=None, genus_labels=None, species_l
         }
 
 
+class SubCenterArcMarginProduct(nn.Module):
+    """SubCenter ArcFace margin head (k sub-centers per class for robustness to label noise)."""
+    def __init__(self, in_features, out_features, k=3, s=30.0, m=0.50, easy_margin=False):
+        super().__init__()
+        self.in_features = in_features
+        self.out_features = out_features
+        self.s = s
+        self.m = m
+        self.k = k
+        self.weight = nn.Parameter(torch.empty(out_features * k, in_features))
+        nn.init.xavier_uniform_(self.weight)
+        self.easy_margin = easy_margin
+        self.cos_m = math.cos(m)
+        self.sin_m = math.sin(m)
+        self.th = math.cos(math.pi - m)
+        self.mm = math.sin(math.pi - m) * m
+
+    def forward(self, embeddings, labels=None):
+        embeddings = F.normalize(embeddings, p=2, dim=1)
+        weight = F.normalize(self.weight, p=2, dim=1)
+        cosine = F.linear(embeddings, weight).view(-1, self.out_features, self.k)
+        cosine, _ = torch.max(cosine, dim=2)  # [B, num_classes]
+
+        if labels is None:
+            return cosine * self.s
+
+        sine = torch.sqrt((1.0 - cosine.pow(2)).clamp(0, 1))
+        phi = cosine * self.cos_m - sine * self.sin_m
+        phi = torch.where(cosine > self.th, phi, cosine - self.mm) if not self.easy_margin else torch.where(cosine > 0, phi, cosine)
+        one_hot = torch.zeros_like(cosine).scatter_(1, labels.view(-1, 1).long(), 1)
+        return (one_hot * phi + (1.0 - one_hot) * cosine) * self.s
+
+
+class SwinWithArcFace(nn.Module):
+    """
+    SWIN backbone + SubCenter ArcFace species head.
+    Optionally adds CE auxiliary heads for family/genus (multi-task).
+    Optionally blends a CE species head with ArcFace (hybrid loss).
+    """
+    def __init__(self, base_model, num_species, embedding_size=512, scale=30.0, margin=0.50,
+                 num_subcenters=3, num_families=None, num_genera=None,
+                 family_weight=0.2, genus_weight=0.3, hybrid_ce_weight=0.0):
+        super().__init__()
+        self.config = base_model.config
+        if hasattr(base_model, 'swinv2'):
+            self.swin = base_model.swinv2
+        elif hasattr(base_model, 'swin'):
+            self.swin = base_model.swin
+        else:
+            raise ValueError("Base model must have 'swin' or 'swinv2' attribute")
+
+        hidden_size = base_model.config.hidden_size
+        self.num_species = num_species
+        self.family_weight = family_weight
+        self.genus_weight = genus_weight
+        self.hybrid_ce_weight = hybrid_ce_weight
+
+        self.embedding = nn.Linear(hidden_size, embedding_size)
+        self.bn = nn.BatchNorm1d(embedding_size)
+        self.arcface = SubCenterArcMarginProduct(embedding_size, num_species, k=num_subcenters, s=scale, m=margin)
+
+        if hybrid_ce_weight > 0:
+            self.ce_classifier = nn.Linear(hidden_size, num_species)
+
+        self.use_multi_task = num_families is not None and num_genera is not None
+        if self.use_multi_task:
+            self.family_classifier = nn.Linear(hidden_size, num_families)
+            self.genus_classifier = nn.Linear(hidden_size, num_genera)
+
+    def gradient_checkpointing_enable(self, **kwargs):
+        # Passthrough so HF Trainer's gradient_checkpointing flag reaches the backbone.
+        self.swin.gradient_checkpointing_enable(**kwargs)
+
+    def gradient_checkpointing_disable(self):
+        self.swin.gradient_checkpointing_disable()
+
+    def forward(self, pixel_values, labels=None, family_labels=None, genus_labels=None, species_labels=None, **kwargs):
+        pooled = self.swin(pixel_values).pooler_output
+        embeddings = self.bn(self.embedding(pooled))
+        arc_labels = species_labels if species_labels is not None else labels
+
+        if arc_labels is not None:
+            arc_logits = self.arcface(embeddings, arc_labels)
+            arc_loss = F.cross_entropy(arc_logits, arc_labels)
+
+            if self.hybrid_ce_weight > 0:
+                ce_logits = self.ce_classifier(pooled)
+                ce_loss = F.cross_entropy(ce_logits, arc_labels)
+                w = self.hybrid_ce_weight
+                loss = (1 - w) * arc_loss + w * ce_loss
+                logits = torch.log((1 - w) * F.softmax(arc_logits, dim=1) + w * F.softmax(ce_logits, dim=1) + 1e-8)
+            else:
+                loss = arc_loss
+                logits = arc_logits
+
+            if self.use_multi_task and family_labels is not None and genus_labels is not None:
+                loss = loss + self.family_weight * F.cross_entropy(self.family_classifier(pooled), family_labels)
+                loss = loss + self.genus_weight * F.cross_entropy(self.genus_classifier(pooled), genus_labels)
+        else:
+            # Inference: cosine similarity, no margin
+            weight = F.normalize(self.arcface.weight, p=2, dim=1)
+            emb = F.normalize(embeddings, p=2, dim=1)
+            cosine = F.linear(emb, weight).view(-1, self.num_species, self.arcface.k)
+            cosine, _ = torch.max(cosine, dim=2)
+            arc_logits = cosine * self.arcface.s
+
+            if self.hybrid_ce_weight > 0:
+                ce_logits = self.ce_classifier(pooled)
+                w = self.hybrid_ce_weight
+                logits = torch.log((1 - w) * F.softmax(arc_logits, dim=1) + w * F.softmax(ce_logits, dim=1) + 1e-8)
+            else:
+                logits = arc_logits
+            loss = None
+
+        result = {'loss': loss, 'logits': logits}
+        if self.use_multi_task:
+            result['family_logits'] = self.family_classifier(pooled)
+            result['genus_logits'] = self.genus_classifier(pooled)
+        return result
+
+
 class MixupCutmixCollator:
     """
     Collator that applies Mixup and/or Cutmix augmentation.
     """
-    def __init__(self, mixup_alpha=0.8, cutmix_alpha=1.0, prob=0.5, label_smoothing=0.1, num_classes=1000, multi_task=False):
+    def __init__(self, mixup_alpha=0.8, cutmix_alpha=1.0, prob=0.5, label_smoothing=0.1, num_classes=1000, multi_task=False, label_column_name="label"):
         self.mixup_alpha = mixup_alpha
         self.cutmix_alpha = cutmix_alpha
         self.prob = prob
         self.label_smoothing = label_smoothing
         self.num_classes = num_classes
         self.multi_task = multi_task
+        self.label_column_name = label_column_name
 
     def __call__(self, examples):
         pixel_values = torch.stack([example["pixel_values"] for example in examples])
@@ -280,9 +426,11 @@ def __call__(self, examples):
         if "label" in examples[0]:
             labels = torch.tensor([example["label"] for example in examples])
         else:
-            # This is for validation/evaluation - no mixup/cutmix should be applied
-            # Just return the basic batch
-            result = {"pixel_values": pixel_values}
+            # Validation/evaluation — no mixup/cutmix, just collate cleanly
+            result = {
+                "pixel_values": pixel_values,
+                "labels": torch.tensor([example[self.label_column_name] for example in examples]),
+            }
 
             if self.multi_task and "family_label" in examples[0]:
                 result.update({
@@ -375,15 +523,49 @@ class MixupTrainer(Trainer):
     """
     Custom Trainer that handles Mixup/Cutmix loss computation and batch-wise evaluation.
     """
-    def __init__(self, *args, multi_task=False, **kwargs):
+    def __init__(self, *args, multi_task=False, arcface=False,
+                 logit_adjustment=False, log_prior=None, logit_adjustment_tau=1.0, **kwargs):
         super().__init__(*args, **kwargs)
         self.multi_task = multi_task
+        self.arcface = arcface
+        # Balanced-softmax / logit adjustment (Tier 1.3-A). log_prior is a
+        # [num_species] tensor of log class frequencies; added to the species
+        # logits during TRAINING only (never at inference) to down-weight head
+        # classes and lift macro-F1 on the long tail.
+        self.logit_adjustment = logit_adjustment
+        self.log_prior = log_prior
+        self.logit_adjustment_tau = logit_adjustment_tau
+
+    def _adjust_logits(self, logits):
+        """Add tau * log_prior to species logits (balanced softmax). No-op if disabled."""
+        if not self.logit_adjustment or self.log_prior is None:
+            return logits
+        return logits + self.logit_adjustment_tau * self.log_prior.to(logits.device, logits.dtype)
 
     def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
         labels_a = inputs.pop("labels")
         labels_b = inputs.pop("labels_b", None)
         lam = inputs.pop("lam", 1.0)
 
+        smoothing = getattr(self.data_collator, 'label_smoothing', 0.0)
+
+        # ArcFace: model computes its own loss internally (no mixup/cutmix with ArcFace)
+        if self.arcface:
+            if self.multi_task:
+                family_labels = inputs.pop("family_labels", None)
+                genus_labels = inputs.pop("genus_labels", None)
+                species_labels = inputs.pop("species_labels", None)
+                outputs = model(
+                    pixel_values=inputs["pixel_values"],
+                    species_labels=species_labels,
+                    family_labels=family_labels,
+                    genus_labels=genus_labels,
+                )
+            else:
+                outputs = model(pixel_values=inputs["pixel_values"], labels=labels_a)
+            loss = outputs.get("loss")
+            return (loss, outputs) if return_outputs else loss
+
         # Handle multi-task learning
         if self.multi_task:
             family_labels = inputs.pop("family_labels")
@@ -402,12 +584,13 @@ def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=N
 
             # If we have mixup/cutmix, we need to manually compute the mixed loss
             if family_labels_b is not None:
-                loss_fct = nn.CrossEntropyLoss()
+                loss_fct = nn.CrossEntropyLoss(label_smoothing=smoothing)
 
                 # Get logits for each taxonomy level
                 family_logits = outputs.get("family_logits")
                 genus_logits = outputs.get("genus_logits")
-                species_logits = outputs.get("species_logits")
+                # Balanced softmax: adjust species logits only (the long-tailed target)
+                species_logits = self._adjust_logits(outputs.get("species_logits"))
 
                 # Compute mixed losses
                 family_loss = lam * loss_fct(family_logits, family_labels) + (1 - lam) * loss_fct(family_logits, family_labels_b)
@@ -416,6 +599,16 @@ def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=N
 
                 # Combined loss with same weighting as the model
                 loss = species_loss + 0.3 * genus_loss + 0.2 * family_loss
+            elif self.logit_adjustment:
+                # No mixup this batch, but balanced softmax is on: recompute the
+                # combined loss in-trainer so the species logits get the log-prior
+                # (the model's internal loss does not apply it).
+                loss_fct = nn.CrossEntropyLoss(label_smoothing=smoothing)
+                species_logits = self._adjust_logits(outputs.get("species_logits"))
+                species_loss = loss_fct(species_logits, species_labels)
+                genus_loss = loss_fct(outputs.get("genus_logits"), genus_labels)
+                family_loss = loss_fct(outputs.get("family_logits"), family_labels)
+                loss = species_loss + 0.3 * genus_loss + 0.2 * family_loss
             else:
                 # Model already computed the loss
                 loss = outputs.get("loss")
@@ -423,17 +616,14 @@ def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=N
             return (loss, outputs) if return_outputs else loss
         else:
             # Standard single-task training
-            # For standard models, only pass pixel_values (labels handled separately)
             outputs = model(pixel_values=inputs["pixel_values"])
-            logits = outputs.get("logits")
+            logits = self._adjust_logits(outputs.get("logits"))  # balanced softmax (no-op if disabled)
+            loss_fct = nn.CrossEntropyLoss(label_smoothing=smoothing)
 
             if labels_b is not None:
                 # Mixup/Cutmix loss
-                loss_fct = nn.CrossEntropyLoss()
                 loss = lam * loss_fct(logits, labels_a) + (1 - lam) * loss_fct(logits, labels_b)
             else:
-                # Standard loss
-                loss_fct = nn.CrossEntropyLoss()
                 loss = loss_fct(logits, labels_a)
 
             return (loss, outputs) if return_outputs else loss
@@ -508,7 +698,10 @@ def evaluation_loop(
                 labels = labels_a
 
             with torch.no_grad():
-                if self.multi_task:
+                if self.arcface:
+                    # ArcFace inference: no labels → cosine similarity logits (no margin)
+                    outputs = model(pixel_values=inputs["pixel_values"])
+                elif self.multi_task:
                     # For multi-task, pass all labels to model
                     outputs = model(
                         pixel_values=inputs["pixel_values"],
@@ -582,6 +775,53 @@ def evaluation_loop(
         )
 
 
+class EMACallback(TrainerCallback):
+    """
+    Exponential Moving Average of model weights (Tier 2.6).
+
+    Keeps a shadow copy of the trainable parameters, updated every optimizer
+    step as shadow = decay*shadow + (1-decay)*param. At the end of training the
+    EMA weights are copied into the model, so the final `trainer.evaluate()` and
+    `trainer.save_model()` both reflect the averaged weights (typically a steady
+    +0.2-0.5% for ~free). Only parameters are averaged; buffers (e.g. BN running
+    stats) are left as-is.
+
+    Note: do not combine with `load_best_model_at_end: true` — the best-checkpoint
+    reload happens before this callback and would be overwritten by the EMA copy.
+    """
+    def __init__(self, decay=0.9998):
+        self.decay = decay
+        self.shadow = None
+
+    def on_train_begin(self, args, state, control, model=None, **kwargs):
+        self.shadow = {
+            n: p.detach().clone().float()
+            for n, p in model.named_parameters() if p.requires_grad
+        }
+        print(f"__CUSTOM__: EMA enabled (decay={self.decay}); tracking {len(self.shadow)} parameter tensors")
+
+    def on_step_end(self, args, state, control, model=None, **kwargs):
+        if self.shadow is None:
+            return
+        d = self.decay
+        with torch.no_grad():
+            for n, p in model.named_parameters():
+                if n in self.shadow:
+                    self.shadow[n].mul_(d).add_(p.detach().float(), alpha=1.0 - d)
+
+    def copy_to_model(self, model):
+        if self.shadow is None:
+            return
+        with torch.no_grad():
+            for n, p in model.named_parameters():
+                if n in self.shadow:
+                    p.data.copy_(self.shadow[n].to(p.dtype))
+
+    def on_train_end(self, args, state, control, model=None, **kwargs):
+        print("__CUSTOM__: Copying EMA weights into model for final eval/save")
+        self.copy_to_model(model)
+
+
 def load_config_from_yaml(config_path):
     """Load configuration from YAML file."""
     with open(config_path, 'r') as f:
@@ -589,17 +829,21 @@ def load_config_from_yaml(config_path):
     return config
 
 
-def build_multi_crop_transforms(crop_sizes, target_size, image_mean, image_std):
-    """Returns one Compose transform per crop size for multi-crop TTA."""
-    return [
-        Compose([
-            Resize(crop_size),
-            CenterCrop(target_size),
-            ToTensor(),
-            Normalize(mean=image_mean, std=image_std),
-        ])
-        for crop_size in crop_sizes
-    ]
+def build_multi_crop_transforms(crop_sizes, target_size, image_mean, image_std, flip=False):
+    """
+    Returns Compose transforms for multi-crop TTA (Tier 2.7).
+
+    One transform per crop size; if `flip` is True, also emit a horizontally
+    flipped variant of each crop, so logits are averaged over crops x {orig, flip}.
+    """
+    norm = Normalize(mean=image_mean, std=image_std)
+    transforms = []
+    for crop_size in crop_sizes:
+        base = [Resize(crop_size), CenterCrop(target_size)]
+        transforms.append(Compose(base + [ToTensor(), norm]))
+        if flip:
+            transforms.append(Compose(base + [RandomHorizontalFlip(p=1.0), ToTensor(), norm]))
+    return transforms
 
 
 def multi_crop_evaluate(model, filepaths, labels, crop_transforms, device, compute_metrics_fn):
@@ -640,15 +884,73 @@ def multi_crop_evaluate(model, filepaths, labels, crop_transforms, device, compu
     return metrics
 
 
+def _resolve_num_workers(n):
+    """Return n, or all scheduler-allocated CPUs when n == -1."""
+    if n == -1:
+        try:
+            return len(os.sched_getaffinity(0))
+        except AttributeError:
+            return os.cpu_count() or 8
+    return n
+
+
+def _relocate_output_dir(path):
+    """
+    Re-root an output/logging path to the workspace this script is actually
+    running from, instead of whoever authored the config.
+
+    Configs in this repo hardcode paths like
+    /projectnb/herbdl/workspaces/<author>/herbdl/finetuning/output/SWIN/<NAME>.
+    When a different user runs the same config, rewrite the
+    `.../workspaces/<author>/herbdl` prefix to this checkout's repo root so the
+    run is written under the runner's own workspace rather than the author's.
+    The trailing run name (.../output/SWIN/<NAME>) is preserved. No-op if the
+    path doesn't match that layout or is already under this repo. Set
+    HERBDL_NO_RELOCATE=1 to disable (e.g. to write elsewhere on purpose).
+    """
+    if not isinstance(path, str) or not path or os.environ.get('HERBDL_NO_RELOCATE'):
+        return path
+    # repo root = three levels up from this file: .../herbdl/finetuning/SWIN/<file>
+    repo_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+    m = re.match(r'^(.*/workspaces/[^/]+/herbdl)(/.*)?$', path)
+    if not m:
+        return path
+    relocated = repo_root + (m.group(2) or '')
+    if relocated != path:
+        print(f"__CUSTOM__: Relocated output path to current workspace:\n"
+              f"             from {path}\n               to {relocated}")
+    return relocated
+
+
 def main():
     # Parse command line arguments for config file
     arg_parser = argparse.ArgumentParser(description="SWIN Fine-tuning with advanced augmentations")
     arg_parser.add_argument('--config', type=str, required=True, help='Path to YAML config file')
+    arg_parser.add_argument('--set', metavar='KEY=VALUE', action='append', default=[],
+                            help='Override a config value using dotted key notation, e.g. training.seed=42')
     args = arg_parser.parse_args()
 
     # Load YAML config
     config = load_config_from_yaml(args.config)
 
+    # Apply --set overrides
+    def _coerce(v):
+        for cast in (int, float):
+            try: return cast(v)
+            except ValueError: pass
+        if v.lower() in ('true', 'false'):
+            return v.lower() == 'true'
+        return v
+
+    for override in args.set:
+        key, _, value = override.partition('=')
+        parts = key.split('.')
+        d = config
+        for part in parts[:-1]:
+            d = d.setdefault(part, {})  # create missing sections so e.g. --set ema.enabled=true works
+        d[parts[-1]] = _coerce(value)
+        print(f"Config override: {key} = {d[parts[-1]]!r}")
+
     # Extract custom parameters
     learning_rate_type = config['custom']['lr_type']
     frozen = config['custom']['frozen']
@@ -666,6 +968,17 @@ def main():
     multi_crop_enabled = multi_crop_config.get('enabled', False)
     multi_crop_sizes = multi_crop_config.get('crop_sizes', [256, 288, 320, 384, 448])
     multi_crop_target_size = multi_crop_config.get('target_size', 224)
+    multi_crop_flip = multi_crop_config.get('flip', False)
+
+    # Extract long-tail (balanced softmax / logit adjustment) parameters — Tier 1.3-A
+    long_tail_config = config.get('long_tail', {})
+    use_logit_adjustment = long_tail_config.get('logit_adjustment', False)
+    logit_adjustment_tau = long_tail_config.get('tau', 1.0)
+
+    # Extract EMA parameters — Tier 2.6
+    ema_config = config.get('ema', {})
+    use_ema = ema_config.get('enabled', False)
+    ema_decay = ema_config.get('decay', 0.9998)
 
     # Extract multi-task learning parameters
     multi_task_config = config.get('multi_task', {})
@@ -675,14 +988,28 @@ def main():
     genus_weight = multi_task_config.get('genus_weight', 0.3)
     species_weight = multi_task_config.get('species_weight', 1.0)
 
+    # Extract ArcFace parameters
+    arcface_config = config.get('arcface', {})
+    use_arcface = arcface_config.get('enabled', False)
+    arcface_embedding_size = arcface_config.get('embedding_size', 512)
+    arcface_scale = arcface_config.get('scale', 30.0)
+    arcface_margin = arcface_config.get('margin', 0.50)
+    arcface_num_subcenters = arcface_config.get('num_subcenters', 3)
+    arcface_hybrid_ce_weight = arcface_config.get('hybrid_ce_weight', 0.0)
+
     print(f"__CUSTOM__: Learning rate type: {learning_rate_type}")
     print(f"__CUSTOM__: Frozen: {frozen}")
     print(f"__CUSTOM__: Frozen type: {frozen_type}")
     print(f"__CUSTOM__: Advanced augmentation: {use_advanced_aug}")
     print(f"__CUSTOM__: Multi-task learning: {use_multi_task}")
+    print(f"__CUSTOM__: ArcFace: {use_arcface}")
+    print(f"__CUSTOM__: Logit adjustment (balanced softmax): {use_logit_adjustment} (tau={logit_adjustment_tau})")
+    print(f"__CUSTOM__: Weight EMA: {use_ema} (decay={ema_decay})")
     if use_multi_task:
         print(f"__CUSTOM__: Min species samples: {min_species_samples}")
         print(f"__CUSTOM__: Loss weights - Family: {family_weight}, Genus: {genus_weight}, Species: {species_weight}")
+    if use_arcface:
+        print(f"__CUSTOM__: ArcFace embedding_size={arcface_embedding_size}, scale={arcface_scale}, margin={arcface_margin}, k={arcface_num_subcenters}, hybrid_ce_weight={arcface_hybrid_ce_weight}")
 
     # Create ModelArguments from config
     model_args = ModelArguments(
@@ -714,17 +1041,27 @@ def main():
         train_val_split=config['data']['train_val_split'],
     )
 
+    # Warmup: transformers requires `warmup_steps` to be an int. Configs in this repo
+    # follow the convention that a float in (0, 1) means "fraction of total steps" — route
+    # those to `warmup_ratio` instead (e.g. 0.05 -> 5% warmup); ints pass through as steps.
+    _warmup = config['training'].get('warmup_steps', 0)
+    if isinstance(_warmup, float) and 0.0 < _warmup < 1.0:
+        _warmup_steps, _warmup_ratio = 0, _warmup
+    else:
+        _warmup_steps, _warmup_ratio = int(_warmup), 0.0
+
     # Create TrainingArguments from config
     training_args = TrainingArguments(
-        output_dir=config['training']['output_dir'],
-        logging_dir=config['training']['logging_dir'],
+        output_dir=_relocate_output_dir(config['training']['output_dir']),
+        logging_dir=_relocate_output_dir(config['training']['logging_dir']),
         do_train=config['training']['do_train'],
         do_eval=config['training']['do_eval'],
         per_device_train_batch_size=config['training']['per_device_train_batch_size'],
         per_device_eval_batch_size=config['training']['per_device_eval_batch_size'],
-        learning_rate=config['training']['learning_rate'],
+        learning_rate=float(config['training']['learning_rate']),
         num_train_epochs=config['training']['num_train_epochs'],
-        warmup_steps=config['training']['warmup_steps'],
+        warmup_steps=_warmup_steps,
+        warmup_ratio=_warmup_ratio,
         weight_decay=config['training']['weight_decay'],
         gradient_accumulation_steps=config['training']['gradient_accumulation_steps'],
         lr_scheduler_type=config['training']['lr_scheduler_type'],
@@ -732,14 +1069,19 @@ def main():
         save_strategy=config['training']['save_strategy'],
         save_total_limit=config['training']['save_total_limit'],
         eval_strategy=config['training']['eval_strategy'],
-        eval_steps=config['training']['eval_steps'],
-        report_to=config['training']['report_to'],
+        eval_steps=config['training'].get('eval_steps', None),  # only used when eval_strategy == "steps"
+        report_to=config['training']['report_to'] if config['wandb'].get('enabled', True) else 'none',
         bf16=config['training']['bf16'],
-        dataloader_num_workers=config['training']['dataloader_num_workers'],
+        dataloader_num_workers=_resolve_num_workers(config['training']['dataloader_num_workers']),
+        dataloader_pin_memory=config['training'].get('dataloader_pin_memory', True),
         remove_unused_columns=config['training']['remove_unused_columns'],
-        overwrite_output_dir=config['training']['overwrite_output_dir'],
         seed=config['training']['seed'],
         label_smoothing_factor=aug_config.get('label_smoothing', 0.0) if use_advanced_aug else 0.0,
+        eval_on_start=config['training'].get('eval_on_start', False),
+        torch_compile=config['training'].get('torch_compile', False),
+        gradient_checkpointing=config['training'].get('gradient_checkpointing', False),
+        gradient_checkpointing_kwargs=config['training'].get('gradient_checkpointing_kwargs', None),
+        load_best_model_at_end=config['training'].get('load_best_model_at_end', False),
     )
 
     # Setup logging
@@ -750,52 +1092,56 @@ def main():
     )
 
     # Initialize wandb with complete config
-    wandb_config = {
-        # Model config
-        "model_name": model_args.model_name_or_path,
-        "model_revision": model_args.model_revision,
-        "ignore_mismatched_sizes": model_args.ignore_mismatched_sizes,
-        # Data config
-        "train_file": data_args.train_file,
-        "validation_file": data_args.validation_file,
-        "image_column_name": data_args.image_column_name,
-        "label_column_name": data_args.label_column_name,
-        "max_train_samples": data_args.max_train_samples,
-        "max_eval_samples": data_args.max_eval_samples,
-        "train_val_split": data_args.train_val_split,
-        # Training config
-        "learning_rate": training_args.learning_rate,
-        "per_device_train_batch_size": training_args.per_device_train_batch_size,
-        "per_device_eval_batch_size": training_args.per_device_eval_batch_size,
-        "num_train_epochs": training_args.num_train_epochs,
-        "warmup_steps": training_args.warmup_steps,
-        "weight_decay": training_args.weight_decay,
-        "gradient_accumulation_steps": training_args.gradient_accumulation_steps,
-        "lr_scheduler_type": training_args.lr_scheduler_type,
-        "bf16": training_args.bf16,
-        "seed": training_args.seed,
-        # Custom config
-        "frozen": frozen,
-        "frozen_type": frozen_type,
-        "learning_rate_type": learning_rate_type,
-        # Augmentation config
-        "use_advanced_augmentation": use_advanced_aug,
-        "augmentation_config": aug_config if use_advanced_aug else None,
-    }
+    if config['wandb'].get('enabled', True):
+        wandb_config = {
+            # Model config
+            "model_name": model_args.model_name_or_path,
+            "model_revision": model_args.model_revision,
+            "ignore_mismatched_sizes": model_args.ignore_mismatched_sizes,
+            # Data config
+            "train_file": data_args.train_file,
+            "validation_file": data_args.validation_file,
+            "image_column_name": data_args.image_column_name,
+            "label_column_name": data_args.label_column_name,
+            "max_train_samples": data_args.max_train_samples,
+            "max_eval_samples": data_args.max_eval_samples,
+            "train_val_split": data_args.train_val_split,
+            # Training config
+            "learning_rate": training_args.learning_rate,
+            "per_device_train_batch_size": training_args.per_device_train_batch_size,
+            "per_device_eval_batch_size": training_args.per_device_eval_batch_size,
+            "num_train_epochs": training_args.num_train_epochs,
+            "warmup_steps": training_args.warmup_steps,
+            "weight_decay": training_args.weight_decay,
+            "gradient_accumulation_steps": training_args.gradient_accumulation_steps,
+            "lr_scheduler_type": training_args.lr_scheduler_type,
+            "bf16": training_args.bf16,
+            "seed": training_args.seed,
+            # Custom config
+            "frozen": frozen,
+            "frozen_type": frozen_type,
+            "learning_rate_type": learning_rate_type,
+            # Augmentation config
+            "use_advanced_augmentation": use_advanced_aug,
+            "augmentation_config": aug_config if use_advanced_aug else None,
+        }
 
-    wandb.init(
-        entity=config['wandb']['entity'],
-        project=config['wandb']['project'],
-        resume=config['wandb']['resume'],
-        name=run_name,
-        group=run_group,
-        id=run_id,
-        config=wandb_config
-    )
+        wandb.init(
+            entity=config['wandb']['entity'],
+            project=config['wandb']['project'],
+            resume=config['wandb']['resume'],
+            name=run_name,
+            group=run_group,
+            id=run_id,
+            config=wandb_config,
+            notes=config['custom'].get('run_notes'),  # optional; older configs may not define it
+        )
+    else:
+        os.environ['WANDB_DISABLED'] = 'true'
 
     # Set the learning rate scheduler parameters from config
     if 'lr_scheduler_kwargs' in config['training'] and config['training']['lr_scheduler_kwargs']:
-        training_args.learning_rate_kwargs = config['training']['lr_scheduler_kwargs']
+        training_args.lr_scheduler_kwargs = config['training']['lr_scheduler_kwargs']
 
     if training_args.should_log:
         transformers.utils.logging.set_verbosity_info()
@@ -815,18 +1161,19 @@ def main():
     logger.info(f"Training/evaluation parameters {training_args}")
 
     # Detecting last checkpoint
+    overwrite_output_dir = config['training'].get('overwrite_output_dir', False)
     last_checkpoint = None
-    if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir:
+    if os.path.isdir(training_args.output_dir) and training_args.do_train and not overwrite_output_dir:
         last_checkpoint = get_last_checkpoint(training_args.output_dir)
         if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0:
             raise ValueError(
                 f"Output directory ({training_args.output_dir}) already exists and is not empty. "
-                "Use --overwrite_output_dir to overcome."
+                "Set overwrite_output_dir: true in your config to overcome."
             )
         elif last_checkpoint is not None and training_args.resume_from_checkpoint is None:
             logger.info(
                 f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change "
-                "the `--output_dir` or add `--overwrite_output_dir` to train from scratch."
+                "the `output_dir` or set `overwrite_output_dir: true` in your config to train from scratch."
             )
 
     # Set seed before initializing model
@@ -996,15 +1343,27 @@ def compute_metrics(p):
                 species_logits = p.predictions
 
             species_predictions = np.argmax(species_logits, axis=1) if len(species_logits.shape) > 1 else species_logits
+            family_predictions = np.argmax(family_logits, axis=1) if 'family_logits' in locals() and len(family_logits.shape) > 1 else None
+            genus_predictions = np.argmax(genus_logits, axis=1) if 'genus_logits' in locals() and len(genus_logits.shape) > 1 else None
 
             # Compute species accuracy (primary metric)
             species_accuracy = accuracy_metric.compute(predictions=species_predictions, references=p.label_ids)["accuracy"]
-            species_f1 = f1_metric.compute(predictions=species_predictions, references=p.label_ids, average="weighted")["f1"]
+            species_f1 = f1_metric.compute(predictions=species_predictions, references=p.label_ids, average="macro")["f1"]
+
+            family_accuracy = accuracy_metric.compute(predictions=family_predictions, references=p.label_ids)["accuracy"] if family_predictions is not None else None
+            genus_accuracy = accuracy_metric.compute(predictions=genus_predictions, references=p.label_ids)["accuracy"] if genus_predictions is not None else None
+
+            family_f1 = f1_metric.compute(predictions=family_predictions, references=p.label_ids, average="macro")["f1"] if family_predictions is not None else None
+            genus_f1 = f1_metric.compute(predictions=genus_predictions, references=p.label_ids, average="macro")["f1"] if genus_predictions is not None else None
 
             metrics = {
                 "accuracy": species_accuracy,  # Primary accuracy is species
                 "species_accuracy": species_accuracy,
                 "species_f1": species_f1,
+                "family_accuracy": family_accuracy,
+                "genus_accuracy": genus_accuracy,
+                "family_f1": family_f1,
+                "genus_f1": genus_f1
             }
 
             # If we have genus and family logits, compute their accuracies too
@@ -1034,15 +1393,95 @@ def compute_metrics(p):
             else: # Predictions contain label indices
                 predictions = p.predictions
             accuracy = accuracy_metric.compute(predictions=predictions, references=p.label_ids)["accuracy"]
-            f1_score = f1_metric.compute(predictions=predictions, references=p.label_ids, average="weighted")["f1"]
+            f1_score = f1_metric.compute(predictions=predictions, references=p.label_ids, average="macro")["f1"]
 
             return {
                 "accuracy": accuracy,
                 "f1": f1_score
             }
 
-    # Create model based on whether multi-task learning is enabled
-    if use_multi_task:
+    # Long-tail: per-class log-prior for balanced softmax (Tier 1.3-A).
+    # Computed once over the (already filtered/split) training set, in the SAME
+    # class-index space the species/CE head outputs, so it lines up with the logits.
+    log_prior = None
+    if use_logit_adjustment:
+        from collections import Counter
+        if use_multi_task:
+            sp2id = hierarchical_mappings['species2id']
+            cnt = Counter(sp2id[s] for s in dataset["train"]["species"])
+            n_cls = hierarchical_mappings['num_species']
+        else:
+            cnt = Counter(dataset["train"][data_args.label_column_name])
+            n_cls = num_labels
+        freq = np.array([cnt.get(i, 0) for i in range(n_cls)], dtype=np.float64)
+        freq = freq / max(freq.sum(), 1.0)
+        freq = np.clip(freq, 1e-12, None)  # floor empty classes so log() is finite
+        log_prior = torch.tensor(np.log(freq), dtype=torch.float32)
+        print(f"__CUSTOM__: Balanced softmax log-prior built over {n_cls} classes "
+              f"(min/max log-prior = {log_prior.min().item():.3f}/{log_prior.max().item():.3f})")
+
+    # Create model based on which objectives are enabled
+    if use_arcface:
+        print("__CUSTOM__: Creating SwinWithArcFace model")
+        arc_num_species = hierarchical_mappings['num_species'] if use_multi_task else num_labels
+        config_obj = AutoConfig.from_pretrained(
+            model_args.config_name or model_args.model_name_or_path,
+            num_labels=arc_num_species,
+            finetuning_task="image-classification",
+            cache_dir=model_args.cache_dir,
+            revision=model_args.model_revision,
+            token=model_args.token,
+            trust_remote_code=model_args.trust_remote_code,
+        )
+        base_model = AutoModelForImageClassification.from_pretrained(
+            model_args.model_name_or_path,
+            from_tf=bool(".ckpt" in model_args.model_name_or_path),
+            config=config_obj,
+            cache_dir=model_args.cache_dir,
+            revision=model_args.model_revision,
+            token=model_args.token,
+            trust_remote_code=model_args.trust_remote_code,
+            ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,
+        )
+        model = SwinWithArcFace(
+            base_model,
+            num_species=arc_num_species,
+            embedding_size=arcface_embedding_size,
+            scale=arcface_scale,
+            margin=arcface_margin,
+            num_subcenters=arcface_num_subcenters,
+            num_families=hierarchical_mappings['num_families'] if use_multi_task else None,
+            num_genera=hierarchical_mappings['num_genera'] if use_multi_task else None,
+            family_weight=family_weight,
+            genus_weight=genus_weight,
+            hybrid_ce_weight=arcface_hybrid_ce_weight,
+        )
+        print(f"__CUSTOM__: SwinWithArcFace created — num_species={arc_num_species}, "
+              f"multi_task={use_multi_task}, hybrid_ce_weight={arcface_hybrid_ce_weight}")
+        # Overlay non-backbone weights from checkpoint (preserves embedding/arcface/CE heads
+        # when chaining ArcFace→Hybrid or ArcFace→384, since AutoModelForImageClassification
+        # only maps swin.* keys and discards the custom heads).
+        _ckpt_dir = model_args.model_name_or_path
+        if os.path.isdir(_ckpt_dir):
+            _st_path = os.path.join(_ckpt_dir, 'model.safetensors')
+            _bin_path = os.path.join(_ckpt_dir, 'pytorch_model.bin')
+            if os.path.exists(_st_path) or os.path.exists(_bin_path):
+                try:
+                    if os.path.exists(_st_path):
+                        from safetensors.torch import load_file as _load_st
+                        _ckpt_sd = _load_st(_st_path)
+                    else:
+                        _ckpt_sd = torch.load(_bin_path, map_location='cpu', weights_only=True)
+                    # Only load keys that are NOT backbone (swin.*) — backbone already loaded
+                    # and handles window-size mismatches (e.g. 224→384) via ignore_mismatched_sizes.
+                    _non_backbone = {k: v for k, v in _ckpt_sd.items() if not k.startswith('swin.')}
+                    _res = model.load_state_dict(_non_backbone, strict=False)
+                    _loaded = len(_non_backbone) - len(_res.unexpected_keys)  # missing_keys would count the skipped backbone keys
+                    print(f"__CUSTOM__: Overlaid {_loaded}/{len(_non_backbone)} non-backbone weights from checkpoint")
+                except Exception as _e:
+                    print(f"__CUSTOM__: Could not overlay non-backbone weights: {_e}")
+
+    elif use_multi_task:
         print("__CUSTOM__: Creating multi-task SWIN model")
 
         # First load a base model
@@ -1221,9 +1660,6 @@ def val_transforms(example_batch):
             _val_transforms(Image.open(pil_img).convert("RGB")) for pil_img in example_batch[data_args.image_column_name]
         ]
 
-        # Keep the label for the collator/trainer
-        example_batch["label"] = example_batch[data_args.label_column_name]
-
         # Add hierarchical labels for multi-task learning
         if use_multi_task:
             example_batch["family_label"] = [
@@ -1274,12 +1710,18 @@ def val_transforms(example_batch):
     else:
         collator_num_classes = num_labels
 
-    if use_mixup_cutmix or use_multi_task:
-        # Use mixup collator (can handle both mixup/cutmix and multi-task)
+    if use_mixup_cutmix or use_multi_task or use_arcface or use_logit_adjustment:
+        # Use mixup collator (can handle mixup/cutmix, multi-task, and arcface).
+        # Also used for plain single-task + balanced softmax (mixup disabled), since
+        # the logit adjustment lives in MixupTrainer.compute_loss.
         if use_mixup_cutmix:
             print("__CUSTOM__: Using Mixup/CutMix data collator" + (" with multi-task support" if use_multi_task else ""))
-        else:
+        elif use_arcface:
+            print("__CUSTOM__: Using ArcFace data collator" + (" with multi-task support" if use_multi_task else ""))
+        elif use_multi_task:
             print("__CUSTOM__: Using multi-task data collator")
+        else:
+            print("__CUSTOM__: Using data collator (balanced softmax, no mixup)")
 
         data_collator = MixupCutmixCollator(
             mixup_alpha=aug_config.get('mixup', {}).get('alpha', 0.8) if use_mixup_cutmix else 0,
@@ -1287,19 +1729,24 @@ def val_transforms(example_batch):
             prob=aug_config.get('mixup_cutmix_prob', 0.5) if use_mixup_cutmix else 0,
             label_smoothing=aug_config.get('label_smoothing', 0.1),
             num_classes=collator_num_classes,
-            multi_task=use_multi_task
+            multi_task=use_multi_task,
+            label_column_name=data_args.label_column_name,
         )
 
-        # Use custom trainer for mixup/cutmix loss or multi-task learning
+        # Use custom trainer for mixup/cutmix loss, multi-task learning, or arcface
         trainer = MixupTrainer(
             model=model,
             args=training_args,
             train_dataset=dataset["train"] if training_args.do_train else None,
             eval_dataset=dataset["validation"] if training_args.do_eval else None,
             compute_metrics=compute_metrics,
-            tokenizer=image_processor,
+            processing_class=image_processor,
             data_collator=data_collator,
             multi_task=use_multi_task,
+            arcface=use_arcface,
+            logit_adjustment=use_logit_adjustment,
+            log_prior=log_prior,
+            logit_adjustment_tau=logit_adjustment_tau,
             preprocess_logits_for_metrics=preprocess_logits_for_metrics
         )
     else:
@@ -1310,11 +1757,20 @@ def val_transforms(example_batch):
             train_dataset=dataset["train"] if training_args.do_train else None,
             eval_dataset=dataset["validation"] if training_args.do_eval else None,
             compute_metrics=compute_metrics,
-            tokenizer=image_processor,
+            processing_class=image_processor,
             data_collator=collate_fn,
             preprocess_logits_for_metrics=preprocess_logits_for_metrics
         )
 
+    # Swap in filtered W&B callback to suppress label2id/id2label from config uploads.
+    trainer.remove_callback(WandbCallback)
+    trainer.add_callback(FilteredWandbCallback)
+
+    # Weight EMA (Tier 2.6): copies averaged weights into the model at train end,
+    # so the final evaluate()/save_model() below reflect the EMA weights.
+    if use_ema:
+        trainer.add_callback(EMACallback(decay=ema_decay))
+
     # Training
     if training_args.do_train:
         checkpoint = None
@@ -1325,6 +1781,10 @@ def val_transforms(example_batch):
 
         train_result = trainer.train(resume_from_checkpoint=checkpoint)
         trainer.save_model()
+        # Ensure config.json is always present for custom (non-PreTrainedModel) wrappers
+        # so that downstream stages can call AutoConfig.from_pretrained on this directory.
+        if hasattr(model, 'config') and not isinstance(model, transformers.PreTrainedModel):
+            model.config.save_pretrained(training_args.output_dir)
         trainer.log_metrics("train", train_result.metrics)
         trainer.save_metrics("train", train_result.metrics)
         trainer.save_state()
@@ -1343,6 +1803,7 @@ def val_transforms(example_batch):
             target_size=multi_crop_target_size,
             image_mean=image_processor.image_mean,
             image_std=image_processor.image_std,
+            flip=multi_crop_flip,
         )
         multi_crop_evaluate(
             model=model,
diff --git a/finetuning/SWIN/configs/swin_base_baseline.yml b/finetuning/SWIN/configs/swin_base_baseline.yml
new file mode 100644
index 0000000..92e2a23
--- /dev/null
+++ b/finetuning/SWIN/configs/swin_base_baseline.yml
@@ -0,0 +1,71 @@
+# SWIN Base Unfrozen (Full Fine-tuning) Configuration
+# Model configuration
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+# Data configuration
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+# Training configuration
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_B_BASELINE_HIGHER_LR"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_B_BASELINE_HIGHER_LR"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0002  # Higher-LR variant of the full fine-tuning baseline (see run_name)
+  num_train_epochs: 50
+  warmup_steps: 0.05
+  weight_decay: 0.01  # Add weight decay for regularization
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  lr_scheduler_kwargs:
+    eta_min: 0.000001
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+# Custom configuration
+custom:
+  lr_type: "cosine"
+  frozen: false  # No freezing - full fine-tuning
+  frozen_type: "none"
+  run_group: "SWIN_Base"
+  run_name: "SWIN_Base_Baseline_HigherLR"
+  run_id: "swin_base_baseline_highlr_052026"
+
+# WandB configuration
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs/swin_base_pretrained_linear.yml b/finetuning/SWIN/configs/swin_base_pretrained_linear.yml
new file mode 100644
index 0000000..389527f
--- /dev/null
+++ b/finetuning/SWIN/configs/swin_base_pretrained_linear.yml
@@ -0,0 +1,65 @@
+# SWIN Base — clean fine-tune from ImageNet-22k pretrained weights
+# Linear LR schedule, 5% warmup (2.5 of 50 epochs; 1.25e-7 → 1.25e-4), no heavy augs, no multi-task
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_PRETRAINED_LINEAR"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_PRETRAINED_LINEAR"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 1.25e-4
+  num_train_epochs: 50
+  warmup_steps: 0.05        # fraction of total steps (5%), i.e. 2.5 of the 50 epochs set above
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "linear"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 8
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "linear"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base"
+  run_name: "SWIN_Base_Pretrained_Linear"
+  run_id: "swin_base_pretrained_linear_052026"
+
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs/swin_base_unfrozen_15k.yml b/finetuning/SWIN/configs/swin_base_unfrozen_15k.yml
index 2287c59..333fd6a 100644
--- a/finetuning/SWIN/configs/swin_base_unfrozen_15k.yml
+++ b/finetuning/SWIN/configs/swin_base_unfrozen_15k.yml
@@ -29,8 +29,8 @@ data:
 
 # Training configuration
 training:
-  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_UNFROZEN_15K"
-  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_UNFROZEN_15K"
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_UNFROZEN_MACRO"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_UNFROZEN_MACRO"
   do_train: true
   do_eval: true
   per_device_train_batch_size: 128
@@ -61,8 +61,8 @@ custom:
   frozen: false  # No freezing - full fine-tuning
   frozen_type: "none"
   run_group: "SWIN_Base"
-  run_name: "SWIN_Base_Unfrozen_15K"
-  run_id: "swin_base_unfrozen_15k_012626"
+  run_name: "SWIN_Base_Baseline"
+  run_id: "swin_base_unfrozen_15k_040326"
 
 # WandB configuration
 wandb:
diff --git a/finetuning/SWIN/configs_advanced/swin_base_224_arcface.yml b/finetuning/SWIN/configs_advanced/swin_base_224_arcface.yml
index 25fb69b..dfc65a9 100644
--- a/finetuning/SWIN/configs_advanced/swin_base_224_arcface.yml
+++ b/finetuning/SWIN/configs_advanced/swin_base_224_arcface.yml
@@ -118,6 +118,7 @@ multi_crop:
 
 # WandB configuration
 wandb:
+  enabled: true
   entity: "gardoslab"
   project: "herbdl"
   resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_base_224_augmented.yml b/finetuning/SWIN/configs_advanced/swin_base_224_augmented.yml
new file mode 100644
index 0000000..f00b20d
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_base_224_augmented.yml
@@ -0,0 +1,103 @@
+# SWIN Base 224 - Baseline with Augmentation (run 3: milder LR and RandAugment than Augmented2)
+# Single-task species classification, full fine-tuning, 15K classes
+
+# Model configuration
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+# Data configuration
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+# Training configuration
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_224_AUGMENTED_3"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_224_AUGMENTED_3"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 64
+  learning_rate: 0.00005
+  num_train_epochs: 50
+  warmup_steps: 500
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  lr_scheduler_kwargs:
+    eta_min: 0.000001
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+# Custom configuration
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base"
+  run_name: "SWIN_Base_224_Augmented3"
+  run_id: "swin_base_224_aug_040626"
+  run_notes: "Less aggressive LR and RandAugment compared to Augmented2"
+
+# Augmentation settings
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 6
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.5
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+
+# Multi-task disabled (single-task baseline)
+multi_task:
+  enabled: false
+
+# Multi-crop testing (inference only)
+multi_crop:
+  enabled: false
+
+# WandB configuration
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_base_224_enhanced.yml b/finetuning/SWIN/configs_advanced/swin_base_224_enhanced.yml
index c02620e..1b470e7 100644
--- a/finetuning/SWIN/configs_advanced/swin_base_224_enhanced.yml
+++ b/finetuning/SWIN/configs_advanced/swin_base_224_enhanced.yml
@@ -102,6 +102,7 @@ multi_crop:
 
 # WandB configuration
 wandb:
+  enabled: true
   entity: "gardoslab"
   project: "herbdl"
   resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_base_224_multitask.yml b/finetuning/SWIN/configs_advanced/swin_base_224_multitask.yml
index 9bd27e1..68b1205 100644
--- a/finetuning/SWIN/configs_advanced/swin_base_224_multitask.yml
+++ b/finetuning/SWIN/configs_advanced/swin_base_224_multitask.yml
@@ -31,15 +31,15 @@ data:
 
 # Training configuration
 training:
-  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_224_MULTITASK"
-  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_224_MULTITASK"
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_224_MULTITASK_2"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_224_MULTITASK_2"
   do_train: true
   do_eval: true
   per_device_train_batch_size: 128
   per_device_eval_batch_size: 32
   learning_rate: 0.0005  # 5e-4 (higher LR for multi-task learning)
   num_train_epochs: 100
-  warmup_steps: 500
+  warmup_steps: 1000
   weight_decay: 0.01
   gradient_accumulation_steps: 1
   lr_scheduler_type: "cosine"
@@ -64,11 +64,11 @@ custom:
   frozen_type: "none"
   run_group: "SWIN_Base_MultiTask"
   run_name: "SWIN_Base_224_MultiTask"
-  run_id: "swin_base_224_mt_051826"
+  run_id: "swin_base_224_mt_032426"
 
 # Advanced augmentation settings
 augmentation:
-  use_advanced: true
+  use_advanced: false
   randaugment:
     num_ops: 2
     magnitude: 9
@@ -108,6 +108,7 @@ multi_crop:
 
 # WandB configuration
 wandb:
+  enabled: true
   entity: "gardoslab"
   project: "herbdl"
   resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_base_384_enhanced.yml b/finetuning/SWIN/configs_advanced/swin_base_384_enhanced.yml
index dd5fda4..f965138 100644
--- a/finetuning/SWIN/configs_advanced/swin_base_384_enhanced.yml
+++ b/finetuning/SWIN/configs_advanced/swin_base_384_enhanced.yml
@@ -101,6 +101,7 @@ multi_crop:
 
 # WandB configuration
 wandb:
+  enabled: true
   entity: "gardoslab"
   project: "herbdl"
   resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_baseline_augmented.yml b/finetuning/SWIN/configs_advanced/swin_baseline_augmented.yml
new file mode 100644
index 0000000..523ebf4
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_baseline_augmented.yml
@@ -0,0 +1,103 @@
+# SWIN Base 224 - Baseline with Heavy Augmentation
+# Single-task species classification, full fine-tuning, 15K classes
+
+# Model configuration
+model:
+  model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_BASELINE/checkpoint-131250"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+# Data configuration
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+# Training configuration
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASELINE+AUG"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASELINE+AUG"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 64
+  learning_rate: 0.00005
+  num_train_epochs: 75
+  warmup_steps: 1500
+  weight_decay: 0.05
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+# Custom configuration
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base"
+  run_name: "SWIN_Baseline+Aug"
+  run_id: "swin_baseline+aug_041026"
+  run_notes: "More aggressive augmentation from baseline checkpoint. Higher mixup/cutmix prob, stronger RandAugment, more random erasing. Targeting 0.78 F1."
+
+# Augmentation settings (more aggressive than the baseline run):
+# RandAugment magnitude 9, mixup/cutmix enabled, random erasing enabled
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.5
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+
+# Multi-task disabled (single-task baseline)
+multi_task:
+  enabled: false
+
+# Multi-crop testing (inference only)
+multi_crop:
+  enabled: false
+
+# WandB configuration
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_baseline_augmented_v2.yml b/finetuning/SWIN/configs_advanced/swin_baseline_augmented_v2.yml
new file mode 100644
index 0000000..97d25e4
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_baseline_augmented_v2.yml
@@ -0,0 +1,104 @@
+# SWIN Base 224 - Augmented v2 (more aggressive)
+# Continuing from the SWIN_BASE_BASELINE checkpoint, heavier augmentation toward the 0.78 F1 target
+
+# Model configuration
+model:
+  model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_BASELINE/checkpoint-131250"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+# Data configuration
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+# Training configuration
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASELINE+AUG_v2"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASELINE+AUG_v2"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 64
+  learning_rate: 0.000005
+  num_train_epochs: 75
+  warmup_steps: 1500
+  weight_decay: 0.05
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  lr_scheduler_kwargs:
+    eta_min: 0.0000003
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+# Custom configuration
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base"
+  run_name: "SWIN_Baseline+Aug_v2"
+  run_id: "swin_baseline+aug_v2_040926"
+  run_notes: "More aggressive augmentation from baseline checkpoint. Higher mixup/cutmix prob, stronger RandAugment, more random erasing. Targeting 0.78 F1."
+
+# Augmentation settings (more aggressive)
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.5
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+
+# Multi-task disabled
+multi_task:
+  enabled: false
+
+# Multi-crop testing (inference only)
+multi_crop:
+  enabled: false
+
+# WandB configuration
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_384.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_384.yml
new file mode 100644
index 0000000..281450e
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_curriculum_384.yml
@@ -0,0 +1,114 @@
+# Curriculum 384 — Scale to 384 resolution from Hybrid checkpoint
+# Uses 384 architecture config + image processor with weights transferred from 224 Hybrid model.
+# Window size mismatch (7→12) means the position biases are reinitialized; the rest of the backbone transfers.
+# Effective batch: 64 * 2 (grad accum) = 128. ArcFace hybrid preserved.
+
+model:
+  model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_HYBRID"
+  config_name: "microsoft/swin-base-patch4-window12-384-in22k"
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: "microsoft/swin-base-patch4-window12-384-in22k"
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_384"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_384"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 64
+  per_device_eval_batch_size: 16
+  learning_rate: 0.00005
+  num_train_epochs: 50
+  warmup_steps: 1000
+  weight_decay: 0.05
+  gradient_accumulation_steps: 2
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Curriculum"
+  run_name: "Curriculum_384"
+  run_id: "swin_curriculum_384"
+  run_notes: "384 resolution. Backbone from Hybrid (224) checkpoint with 384 arch config. Position biases reinit. ArcFace hybrid preserved."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: false
+  cutmix:
+    enabled: false
+  mixup_cutmix_prob: 0.0
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+  color_jitter:
+    enabled: true
+    brightness: 0.4
+    contrast: 0.4
+    saturation: 0.4
+    hue: 0.1
+
+multi_task:
+  enabled: true
+  min_species_samples: 2
+  family_weight: 0.2
+  genus_weight: 0.3
+  species_weight: 1.0
+
+arcface:
+  enabled: true
+  embedding_size: 512
+  scale: 30.0
+  margin: 0.50
+  num_subcenters: 3
+  hybrid_ce_weight: 0.3
+
+multi_crop:
+  enabled: true
+  crop_sizes: [400, 416, 448, 480, 512]
+  target_size: 384
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_arcface.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_arcface.yml
new file mode 100644
index 0000000..e5b1a96
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_curriculum_arcface.yml
@@ -0,0 +1,113 @@
+# Curriculum ArcFace — SubCenter ArcFace + MultiTask from MultiTask checkpoint
+# New ArcFace/embedding heads on top of curriculum-trained backbone.
+# Mixup/CutMix disabled (incompatible with ArcFace hard-label margin).
+
+model:
+  model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_MULTITASK"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_ARCFACE"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_ARCFACE"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0001
+  num_train_epochs: 60
+  warmup_steps: 1000
+  weight_decay: 0.05
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Curriculum"
+  run_name: "Curriculum_ArcFace"
+  run_id: "swin_curriculum_arcface"
+  run_notes: "ArcFace stage. SubCenter ArcFace for species + CE auxiliary heads for family/genus. Backbone from MultiTask checkpoint."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: false
+  cutmix:
+    enabled: false
+  mixup_cutmix_prob: 0.0
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+  color_jitter:
+    enabled: true
+    brightness: 0.4
+    contrast: 0.4
+    saturation: 0.4
+    hue: 0.1
+
+multi_task:
+  enabled: true
+  min_species_samples: 2
+  family_weight: 0.2
+  genus_weight: 0.3
+  species_weight: 1.0
+
+arcface:
+  enabled: true
+  embedding_size: 512
+  scale: 30.0
+  margin: 0.50
+  num_subcenters: 3
+  hybrid_ce_weight: 0.0
+
+multi_crop:
+  enabled: false
+  crop_sizes: [256, 288, 320, 384, 448]
+  target_size: 224
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_hybrid.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_hybrid.yml
new file mode 100644
index 0000000..dc73558
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_curriculum_hybrid.yml
@@ -0,0 +1,113 @@
+# Curriculum Hybrid — CE + ArcFace blended loss from ArcFace checkpoint
+# Adds a parallel CE head (weight=0.3) alongside ArcFace (weight=0.7).
+# Stable CE gradients help further fine-tune the backbone.
+
+model:
+  model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_ARCFACE"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_HYBRID"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_HYBRID"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.00003
+  num_train_epochs: 40
+  warmup_steps: 500
+  weight_decay: 0.05
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Curriculum"
+  run_name: "Curriculum_Hybrid"
+  run_id: "swin_curriculum_hybrid"
+  run_notes: "Hybrid loss: 0.7*ArcFace + 0.3*CE for species. CE auxiliary for family/genus. From ArcFace checkpoint."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: false
+  cutmix:
+    enabled: false
+  mixup_cutmix_prob: 0.0
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+  color_jitter:
+    enabled: true
+    brightness: 0.4
+    contrast: 0.4
+    saturation: 0.4
+    hue: 0.1
+
+multi_task:
+  enabled: true
+  min_species_samples: 2
+  family_weight: 0.2
+  genus_weight: 0.3
+  species_weight: 1.0
+
+arcface:
+  enabled: true
+  embedding_size: 512
+  scale: 30.0
+  margin: 0.50
+  num_subcenters: 3
+  hybrid_ce_weight: 0.3
+
+multi_crop:
+  enabled: false
+  crop_sizes: [256, 288, 320, 384, 448]
+  target_size: 224
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_multitask.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_multitask.yml
new file mode 100644
index 0000000..8aa0982
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_curriculum_multitask.yml
@@ -0,0 +1,98 @@
+# Curriculum Multi-Task — from S3 checkpoint + multi-task heads
+# New family/genus heads on top of S3 backbone. Heavy augmentation preserved.
+
+model:
+  model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S3_CONT"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_MULTITASK"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_MULTITASK"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.00003
+  num_train_epochs: 75
+  warmup_steps: 1000
+  weight_decay: 0.05
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Curriculum"
+  run_name: "Curriculum_MultiTask"
+  run_id: "swin_curriculum_multitask"
+  run_notes: "Multi-task from S3_cont checkpoint. Family/genus heads randomly initialized on top of curriculum-trained backbone."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.5
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+
+multi_task:
+  enabled: true
+  min_species_samples: 2
+  family_weight: 0.2
+  genus_weight: 0.3
+  species_weight: 1.0
+
+multi_crop:
+  enabled: false
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_384.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_384.yml
new file mode 100644
index 0000000..d5c25fa
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_384.yml
@@ -0,0 +1,116 @@
+# Pretrained Full Pipeline — All techniques from SwinV2-384 pretrained weights.
+# No curriculum staging. Heavy aug + MultiTask CE applied from epoch 1.
+# SwinV2 window12to24-192to384 fine-tuned checkpoint (384px native resolution).
+# Effective batch size 128 (64 per device * grad_accum 2).
+
+model:
+  model_name_or_path: "microsoft/swinv2-base-patch4-window12to24-192to384-22kto1k-ft"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_384"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_384"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 64
+  per_device_eval_batch_size: 16
+  learning_rate: 0.0001
+  num_train_epochs: 100
+  warmup_steps: 2000
+  weight_decay: 0.05
+  gradient_accumulation_steps: 2
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Curriculum_Pretrained"
+  run_name: "Curriculum_Pretrained_384"
+  run_id: "swin_curriculum_pretrained_384"
+  run_notes: "All techniques from SwinV2-384 pretrained (window12to24-192to384-22kto1k-ft). Heavy aug + MultiTask CE. No staged curriculum."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.5
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+  color_jitter:
+    enabled: true
+    brightness: 0.4
+    contrast: 0.4
+    saturation: 0.4
+    hue: 0.1
+
+multi_task:
+  enabled: true
+  min_species_samples: 2
+  family_weight: 0.2
+  genus_weight: 0.3
+  species_weight: 1.0
+
+arcface:
+  enabled: false
+  embedding_size: 512
+  scale: 30.0
+  margin: 0.50
+  num_subcenters: 3
+  hybrid_ce_weight: 0.0
+
+multi_crop:
+  enabled: false
+  crop_sizes: [400, 416, 448, 480, 512]
+  target_size: 384
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s1.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s1.yml
new file mode 100644
index 0000000..aea8c06
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s1.yml
@@ -0,0 +1,96 @@
+# Curriculum Pretrained S1 — Mild augmentation from ImageNet-22k pretrained weights
+# Skips the no-aug baseline stage: starts directly from HF pretrained checkpoint with mild aug.
+# Higher LR than original S1 (1e-4 vs 5e-5) since the classifier head is randomly initialized.
+# More epochs (50 vs 25 in original S1) to allow the fresh head and backbone to co-adapt before aug ramps up.
+
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S1"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S1"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 64
+  learning_rate: 0.0001
+  num_train_epochs: 50
+  warmup_steps: 1000
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 3
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Curriculum_Pretrained"
+  run_name: "Curriculum_Pretrained_S1"
+  run_id: "swin_curriculum_pretrained_s1"
+  run_notes: "Stage 1. Mild aug directly from ImageNet-22k pretrained weights. No no-aug baseline stage."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 4
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.3
+  label_smoothing: 0.05
+  random_erasing:
+    enabled: true
+    probability: 0.1
+    min_area: 0.02
+    max_area: 0.33
+
+multi_task:
+  enabled: false
+
+multi_crop:
+  enabled: false
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s2.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s2.yml
new file mode 100644
index 0000000..83e00a2
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s2.yml
@@ -0,0 +1,94 @@
+# Curriculum Pretrained S2 — Medium augmentation from Pretrained S1 checkpoint
+# Identical technique progression to original S2. Stepping up aug difficulty.
+
+model:
+  model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S1"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S2"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S2"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 64
+  learning_rate: 0.00003
+  num_train_epochs: 30
+  warmup_steps: 500
+  weight_decay: 0.03
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 3
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Curriculum_Pretrained"
+  run_name: "Curriculum_Pretrained_S2"
+  run_id: "swin_curriculum_pretrained_s2"
+  run_notes: "Stage 2. Medium aug from Pretrained S1 checkpoint. Stepping up difficulty."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 7
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.4
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.15
+    min_area: 0.02
+    max_area: 0.33
+
+multi_task:
+  enabled: false
+
+multi_crop:
+  enabled: false
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s3.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s3.yml
new file mode 100644
index 0000000..c8db5be
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_curriculum_pretrained_s3.yml
@@ -0,0 +1,94 @@
+# Curriculum Pretrained S3 — Heavy augmentation from Pretrained S2 checkpoint
+# Identical technique progression to original S3. Maximum regularization.
+
+model:
+  model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S2"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S3"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_PRETRAINED_S3"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 64
+  learning_rate: 0.00002
+  num_train_epochs: 50
+  warmup_steps: 500
+  weight_decay: 0.05
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Curriculum_Pretrained"
+  run_name: "Curriculum_Pretrained_S3"
+  run_id: "swin_curriculum_pretrained_s3"
+  run_notes: "Stage 3. Heavy aug from Pretrained S2 checkpoint. Maximum regularization."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.5
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+
+multi_task:
+  enabled: false
+
+multi_crop:
+  enabled: false
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_s1.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_s1.yml
new file mode 100644
index 0000000..8a27723
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_curriculum_s1.yml
@@ -0,0 +1,94 @@
+# Curriculum Stage 1 — Mild augmentation from baseline
+# mag=4, prob=0.3, erasing=0.1 | 25 epochs | lr=5e-5
+
+model:
+  model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_BASE_BASELINE/checkpoint-131250"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S1"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S1"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 64
+  learning_rate: 0.00005
+  num_train_epochs: 25
+  warmup_steps: 1000
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 3
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Curriculum"
+  run_name: "Curriculum_S1"
+  run_id: "swin_curriculum_s1"
+  run_notes: "Stage 1 of 3. Mild augmentation from baseline. Soft-label warm-up before increasing difficulty."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 4
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.3
+  label_smoothing: 0.05
+  random_erasing:
+    enabled: true
+    probability: 0.1
+    min_area: 0.02
+    max_area: 0.33
+
+multi_task:
+  enabled: false
+
+multi_crop:
+  enabled: false
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_s2.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_s2.yml
new file mode 100644
index 0000000..71b8fb0
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_curriculum_s2.yml
@@ -0,0 +1,94 @@
+# Curriculum Stage 2 — Medium augmentation from S1
+# mag=7, prob=0.4, erasing=0.15 | 30 epochs | lr=3e-5
+
+model:
+  model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S1"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S2"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S2"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 64
+  learning_rate: 0.00003
+  num_train_epochs: 30
+  warmup_steps: 500
+  weight_decay: 0.03
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 3
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Curriculum"
+  run_name: "Curriculum_S2"
+  run_id: "swin_curriculum_s2"
+  run_notes: "Stage 2 of 3. Medium augmentation from S1 checkpoint. Stepping up difficulty."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 7
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.4
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.15
+    min_area: 0.02
+    max_area: 0.33
+
+multi_task:
+  enabled: false
+
+multi_crop:
+  enabled: false
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_s3.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_s3.yml
new file mode 100644
index 0000000..f5e6a11
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_curriculum_s3.yml
@@ -0,0 +1,94 @@
+# Curriculum Stage 3 — Heavy augmentation from S2
+# mag=9, prob=0.5, erasing=0.25 | 50 epochs | lr=2e-5
+
+model:
+  model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S2"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S3"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S3"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 64
+  learning_rate: 0.00002
+  num_train_epochs: 50
+  warmup_steps: 500
+  weight_decay: 0.05
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Curriculum"
+  run_name: "Curriculum_S3"
+  run_id: "swin_curriculum_s3"
+  run_notes: "Stage 3 of 3. Heavy augmentation from S2 checkpoint. Maximum regularization push toward 0.78 F1."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.5
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+
+multi_task:
+  enabled: false
+
+multi_crop:
+  enabled: false
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_s3_cont.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_s3_cont.yml
new file mode 100644
index 0000000..ff75221
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_curriculum_s3_cont.yml
@@ -0,0 +1,94 @@
+# Curriculum Stage 3 continuation — Heavy augmentation, epochs 51-100
+# Fresh cosine schedule from S3 final model. Same augmentation settings as S3.
+
+model:
+  model_name_or_path: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S3"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S3_CONT"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_S3_CONT"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 64
+  learning_rate: 0.00002
+  num_train_epochs: 50
+  warmup_steps: 500
+  weight_decay: 0.05
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Curriculum"
+  run_name: "Curriculum_S3_cont"
+  run_id: "swin_curriculum_s3_cont"
+  run_notes: "Stage 3 continuation (epochs 51-100). Fresh cosine schedule from S3 final model. Same heavy augmentation."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.5
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+
+multi_task:
+  enabled: false
+
+multi_crop:
+  enabled: false
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_curriculum_v2.yml b/finetuning/SWIN/configs_advanced/swin_curriculum_v2.yml
new file mode 100644
index 0000000..d9766ed
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_curriculum_v2.yml
@@ -0,0 +1,114 @@
+# Curriculum V2 — SWIN V2 architecture upgrade from 384 checkpoint
+# The SWIN V2 architecture differs too much to chain from the earlier checkpoints, so this run
+# loads V2 pretrained weights and applies the techniques (heavy aug + ArcFace hybrid + multi-task)
+# from epoch 1. Mixup/CutMix are disabled in this config. Uses V2's native 192 resolution (pretrained window12-192).
+
+model:
+  model_name_or_path: "microsoft/swinv2-base-patch4-window12-192-22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_V2"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_CURRICULUM_V2"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0003
+  num_train_epochs: 100
+  warmup_steps: 2000
+  weight_decay: 0.05
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 42
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Curriculum"
+  run_name: "Curriculum_V2"
+  run_id: "swin_curriculum_v2"
+  run_notes: "SWIN V2 architecture with all techniques from scratch. ImageNet22k pretrained V2 weights. All curriculum-derived techniques applied from epoch 1."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: false
+  cutmix:
+    enabled: false
+  mixup_cutmix_prob: 0.0
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+  color_jitter:
+    enabled: true
+    brightness: 0.4
+    contrast: 0.4
+    saturation: 0.4
+    hue: 0.1
+
+multi_task:
+  enabled: true
+  min_species_samples: 2
+  family_weight: 0.2
+  genus_weight: 0.3
+  species_weight: 1.0
+
+arcface:
+  enabled: true
+  embedding_size: 512
+  scale: 30.0
+  margin: 0.50
+  num_subcenters: 3
+  hybrid_ce_weight: 0.3
+
+multi_crop:
+  enabled: false
+  crop_sizes: [224, 256, 288, 320, 352]
+  target_size: 192
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_large_384_concrete.yml b/finetuning/SWIN/configs_advanced/swin_large_384_concrete.yml
new file mode 100644
index 0000000..797a4ed
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_large_384_concrete.yml
@@ -0,0 +1,150 @@
+# =============================================================================
+# Concrete next run (SWIN_training_setup_summary.md §10/§11 — "Putting it together")
+# =============================================================================
+# A single strong single-model candidate combining the cheap-but-high-value levers:
+#   - Backbone : SWIN-L @384, ImageNet-22k pretrained (Tier 1.1 "stay in SWIN", Tier 1.2)
+#   - Loss     : balanced-softmax CE (Tier 1.3-A) + multi-task family/genus/species heads
+#   - Schedule : 100 epochs, 5% warmup, cosine; EMA on (Tier 2.6)
+#   - Aug      : MEDIUM (RandAugment mag 7, mild erasing/mixup) — respects the curriculum
+#                finding that heavy aug applied COLD destroys performance (0.61 vs 0.745).
+#   - Inference: multi-crop + flip TTA wired (Tier 2.7), enabled only for final prediction.
+#
+# WARM-START (recommended once a 384 checkpoint exists — Tier 2.5 "always chain"):
+#   point model.model_name_or_path at a converged SWIN-L 384 output_dir (keep
+#   config_name / image_processor_name on the 384 arch) and raise RandAugment to 9.
+#   Cold-from-in22k here is the dependency-free default; chaining is strictly better.
+#
+# Seeds: launch the ensemble with submit_concrete.sh, which overrides
+#   training.seed / training.output_dir / custom.run_id via --set per job.
+# =============================================================================
+
+model:
+  model_name_or_path: "microsoft/swin-large-patch4-window12-384-in22k"
+  config_name: "microsoft/swin-large-patch4-window12-384-in22k"
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: "microsoft/swin-large-patch4-window12-384-in22k"
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: 50000
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/tgardos/herbdl/finetuning/output/SWIN/SWIN_L_384_CONCRETE_SEED0"
+  logging_dir: "/projectnb/herbdl/workspaces/tgardos/herbdl/finetuning/output/SWIN/SWIN_L_384_CONCRETE_SEED0"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 16
+  per_device_eval_batch_size: 8
+  learning_rate: 0.0001
+  num_train_epochs: 100
+  warmup_steps: 0.05
+  weight_decay: 0.05
+  gradient_accumulation_steps: 8          # effective batch = 16 * 8 = 128 per device
+  gradient_checkpointing: true            # needed to fit SWIN-L @384 (wrappers pass it through)
+  lr_scheduler_type: "cosine"
+  lr_scheduler_kwargs:
+    eta_min: 0.000001
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 3
+  eval_strategy: "epoch"
+  eval_on_start: true
+  load_best_model_at_end: false           # EMA copies averaged weights in at train end; do not reload "best"
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 8
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_L_384_Concrete"
+  run_name: "SWIN_L_384_Concrete_Seed0"
+  run_id: "swin_l_384_concrete_seed0"
+  run_notes: "Concrete next run: SWIN-L 384 in22k, balanced-softmax + multi-task CE, medium aug, EMA, 100ep. Seed 0."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 7                          # MEDIUM (not 9) — cold start, see curriculum finding
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.3                   # mild mixing for a cold backbone
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.15
+    min_area: 0.02
+    max_area: 0.33
+  color_jitter:
+    enabled: true
+    brightness: 0.4
+    contrast: 0.4
+    saturation: 0.4
+    hue: 0.1
+
+multi_task:
+  enabled: true
+  min_species_samples: 2
+  family_weight: 0.2
+  genus_weight: 0.3
+  species_weight: 1.0
+
+# Balanced softmax / logit adjustment (Tier 1.3-A). Adds tau * log(class_prior) to the
+# species logits during TRAINING only (off at inference) to lift macro-F1 on the long tail.
+long_tail:
+  logit_adjustment: true
+  tau: 1.0
+
+# Weight EMA (Tier 2.6). EMA weights are copied into the model at train end, so the
+# final eval/save reflect them. Keep load_best_model_at_end false (see above).
+ema:
+  enabled: true
+  decay: 0.9998
+
+# Multi-crop + horizontal-flip TTA (Tier 2.7). Leave disabled during training; enable
+# for the final/leaderboard prediction (crops are sized around the 384 target).
+multi_crop:
+  enabled: false
+  crop_sizes: [400, 416, 448, 480, 512]
+  target_size: 384
+  flip: true
+
+# ArcFace deferred (Tier 2.4) — kept off for this run.
+arcface:
+  enabled: false
+  embedding_size: 512
+  scale: 30.0
+  margin: 0.50
+  num_subcenters: 3
+  hybrid_ce_weight: 0.0
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed0.yml b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed0.yml
new file mode 100644
index 0000000..57c0fe2
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed0.yml
@@ -0,0 +1,115 @@
+# Pretrained Full Pipeline (SwinL-224) — seed 0, LINEAR LR
+# SwinL-224 (in22k pretrained), all techniques (heavy aug + MultiTask CE), linear LR + 5-epoch warmup. NOTE: despite "384" in the filename, the checkpoint and multi_crop target_size are 224.
+# Linear LR seed ensemble run 1/5.
+
+model:
+  model_name_or_path: "microsoft/swin-large-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED0"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED0"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 64
+  per_device_eval_batch_size: 16
+  learning_rate: 0.0001
+  num_train_epochs: 100
+  warmup_steps: 0.05
+  weight_decay: 0.05
+  gradient_accumulation_steps: 2
+  lr_scheduler_type: "linear"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "linear"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Pretrained_SwinL224_Linear_Seeds"
+  run_name: "Pretrained_SwinL224_Linear_Seed0"
+  run_id: "swin_pretrained_swinl224_linear_seed0"
+  run_notes: "Linear LR seed ensemble. SwinL-224 in22k pretrained, heavy aug + MultiTask CE, 5-epoch warmup. Seed 0."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.5
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+  color_jitter:
+    enabled: true
+    brightness: 0.4
+    contrast: 0.4
+    saturation: 0.4
+    hue: 0.1
+
+multi_task:
+  enabled: true
+  min_species_samples: 2
+  family_weight: 0.2
+  genus_weight: 0.3
+  species_weight: 1.0
+
+arcface:
+  enabled: false
+  embedding_size: 512
+  scale: 30.0
+  margin: 0.50
+  num_subcenters: 3
+  hybrid_ce_weight: 0.0
+
+multi_crop:
+  enabled: false
+  crop_sizes: [400, 416, 448, 480, 512]
+  target_size: 224
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed1.yml b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed1.yml
new file mode 100644
index 0000000..36b41c9
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed1.yml
@@ -0,0 +1,115 @@
+# Pretrained Full Pipeline (SwinL-224) — seed 1, LINEAR LR
+# SwinL-224 (in22k pretrained), all techniques (heavy aug + MultiTask CE), linear LR + 5-epoch warmup. NOTE: despite "384" in the filename, the checkpoint and multi_crop target_size are 224.
+# Linear LR seed ensemble run 2/5.
+
+model:
+  model_name_or_path: "microsoft/swin-large-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED1"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED1"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 64
+  per_device_eval_batch_size: 16
+  learning_rate: 0.0001
+  num_train_epochs: 100
+  warmup_steps: 0.05
+  weight_decay: 0.05
+  gradient_accumulation_steps: 2
+  lr_scheduler_type: "linear"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 1
+
+custom:
+  lr_type: "linear"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Pretrained_SwinL224_Linear_Seeds"
+  run_name: "Pretrained_SwinL224_Linear_Seed1"
+  run_id: "swin_pretrained_swinl224_linear_seed1"
+  run_notes: "Linear LR seed ensemble. SwinL-224 in22k pretrained, heavy aug + MultiTask CE, 5-epoch warmup. Seed 1."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.5
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+  color_jitter:
+    enabled: true
+    brightness: 0.4
+    contrast: 0.4
+    saturation: 0.4
+    hue: 0.1
+
+multi_task:
+  enabled: true
+  min_species_samples: 2
+  family_weight: 0.2
+  genus_weight: 0.3
+  species_weight: 1.0
+
+arcface:
+  enabled: false
+  embedding_size: 512
+  scale: 30.0
+  margin: 0.50
+  num_subcenters: 3
+  hybrid_ce_weight: 0.0
+
+multi_crop:
+  enabled: false
+  crop_sizes: [400, 416, 448, 480, 512]
+  target_size: 224
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed2.yml b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed2.yml
new file mode 100644
index 0000000..893f052
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed2.yml
@@ -0,0 +1,115 @@
+# Pretrained Full Pipeline (SwinL-224) — seed 2, LINEAR LR
+# SwinL-224 (in22k pretrained), all techniques (heavy aug + MultiTask CE), linear LR + 5-epoch warmup. NOTE: despite "384" in the filename, the checkpoint and multi_crop target_size are 224.
+# Linear LR seed ensemble run 3/5.
+
+model:
+  model_name_or_path: "microsoft/swin-large-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED2"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED2"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 64
+  per_device_eval_batch_size: 16
+  learning_rate: 0.0001
+  num_train_epochs: 100
+  warmup_steps: 0.05
+  weight_decay: 0.05
+  gradient_accumulation_steps: 2
+  lr_scheduler_type: "linear"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 2
+
+custom:
+  lr_type: "linear"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Pretrained_SwinL224_Linear_Seeds"
+  run_name: "Pretrained_SwinL224_Linear_Seed2"
+  run_id: "swin_pretrained_swinl224_linear_seed2"
+  run_notes: "Linear LR seed ensemble. SwinL-224 in22k pretrained, heavy aug + MultiTask CE, 5-epoch warmup. Seed 2."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.5
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+  color_jitter:
+    enabled: true
+    brightness: 0.4
+    contrast: 0.4
+    saturation: 0.4
+    hue: 0.1
+
+multi_task:
+  enabled: true
+  min_species_samples: 2
+  family_weight: 0.2
+  genus_weight: 0.3
+  species_weight: 1.0
+
+arcface:
+  enabled: false
+  embedding_size: 512
+  scale: 30.0
+  margin: 0.50
+  num_subcenters: 3
+  hybrid_ce_weight: 0.0
+
+multi_crop:
+  enabled: false
+  crop_sizes: [400, 416, 448, 480, 512]
+  target_size: 224
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed3.yml b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed3.yml
new file mode 100644
index 0000000..19b9586
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed3.yml
@@ -0,0 +1,115 @@
+# Pretrained Full Pipeline (SwinL-224) — seed 3, LINEAR LR
+# SwinL-224 (in22k pretrained), all techniques (heavy aug + MultiTask CE), linear LR + 5-epoch warmup. NOTE: despite "384" in the filename, the checkpoint and multi_crop target_size are 224.
+# Linear LR seed ensemble run 4/5.
+
+model:
+  model_name_or_path: "microsoft/swin-large-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED3"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED3"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 64
+  per_device_eval_batch_size: 16
+  learning_rate: 0.0001
+  num_train_epochs: 100
+  warmup_steps: 0.05
+  weight_decay: 0.05
+  gradient_accumulation_steps: 2
+  lr_scheduler_type: "linear"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 3
+
+custom:
+  lr_type: "linear"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Pretrained_SwinL224_Linear_Seeds"
+  run_name: "Pretrained_SwinL224_Linear_Seed3"
+  run_id: "swin_pretrained_swinl224_linear_seed3"
+  run_notes: "Linear LR seed ensemble. SwinL-224 in22k pretrained, heavy aug + MultiTask CE, 5-epoch warmup. Seed 3."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.5
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+  color_jitter:
+    enabled: true
+    brightness: 0.4
+    contrast: 0.4
+    saturation: 0.4
+    hue: 0.1
+
+multi_task:
+  enabled: true
+  min_species_samples: 2
+  family_weight: 0.2
+  genus_weight: 0.3
+  species_weight: 1.0
+
+arcface:
+  enabled: false
+  embedding_size: 512
+  scale: 30.0
+  margin: 0.50
+  num_subcenters: 3
+  hybrid_ce_weight: 0.0
+
+multi_crop:
+  enabled: false
+  crop_sizes: [400, 416, 448, 480, 512]
+  target_size: 224
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed4.yml b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed4.yml
new file mode 100644
index 0000000..5eb2667
--- /dev/null
+++ b/finetuning/SWIN/configs_advanced/swin_pretrained_384_seed4.yml
@@ -0,0 +1,115 @@
+# Pretrained Full Pipeline (SwinL-224) — seed 4, LINEAR LR
+# SwinL-224 (in22k pretrained), all techniques (heavy aug + MultiTask CE), linear LR + 5-epoch warmup. NOTE: despite "384" in the filename, the checkpoint and multi_crop target_size are 224.
+# Linear LR seed ensemble run 5/5.
+
+model:
+  model_name_or_path: "microsoft/swin-large-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED4"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/SWIN_PRETRAINED_SWINL224_LINEAR_SEED4"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 64
+  per_device_eval_batch_size: 16
+  learning_rate: 0.0001
+  num_train_epochs: 100
+  warmup_steps: 0.05
+  weight_decay: 0.05
+  gradient_accumulation_steps: 2
+  lr_scheduler_type: "linear"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  eval_on_start: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: 16
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 4
+
+custom:
+  lr_type: "linear"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Pretrained_SwinL224_Linear_Seeds"
+  run_name: "Pretrained_SwinL224_Linear_Seed4"
+  run_id: "swin_pretrained_swinl224_linear_seed4"
+  run_notes: "Linear LR seed ensemble. SwinL-224 in22k pretrained, heavy aug + MultiTask CE, 5-epoch warmup. Seed 4."
+
+augmentation:
+  use_advanced: true
+  randaugment:
+    num_ops: 2
+    magnitude: 9
+  mixup:
+    enabled: true
+    alpha: 0.8
+  cutmix:
+    enabled: true
+    alpha: 1.0
+  mixup_cutmix_prob: 0.5
+  label_smoothing: 0.1
+  random_erasing:
+    enabled: true
+    probability: 0.25
+    min_area: 0.02
+    max_area: 0.33
+  color_jitter:
+    enabled: true
+    brightness: 0.4
+    contrast: 0.4
+    saturation: 0.4
+    hue: 0.1
+
+multi_task:
+  enabled: true
+  min_species_samples: 2
+  family_weight: 0.2
+  genus_weight: 0.3
+  species_weight: 1.0
+
+arcface:
+  enabled: false
+  embedding_size: 512
+  scale: 30.0
+  margin: 0.50
+  num_subcenters: 3
+  hybrid_ce_weight: 0.0
+
+multi_crop:
+  enabled: false
+  crop_sizes: [400, 416, 448, 480, 512]
+  target_size: 224
+
+wandb:
+  enabled: true
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/configs_advanced/swinv2_base_192_enhanced.yml b/finetuning/SWIN/configs_advanced/swinv2_base_192_enhanced.yml
index 310d69d..b4183e8 100644
--- a/finetuning/SWIN/configs_advanced/swinv2_base_192_enhanced.yml
+++ b/finetuning/SWIN/configs_advanced/swinv2_base_192_enhanced.yml
@@ -101,6 +101,7 @@ multi_crop:
 
 # WandB configuration
 wandb:
+  enabled: true
   entity: "gardoslab"
   project: "herbdl"
   resume: "allow"
diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr1e4.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr1e4.yml
new file mode 100644
index 0000000..a8785e3
--- /dev/null
+++ b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr1e4.yml
@@ -0,0 +1,67 @@
+# SWIN Base — LR tuning: cosine schedule, lr=1e-4, no warmup
+# Baseline of the cosine LR sweep (lowest LR). Compare with lr2e4 and lr5e4.
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR1E4"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR1E4"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0001
+  num_train_epochs: 100
+  warmup_steps: 0
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  torch_compile: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: -1
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base_LR_Sweep"
+  run_name: "SWIN_Base_Cosine_LR1e4"
+  run_id: "swin_base_cosine_lr1e4_052026"
+  run_notes: "LR sweep: cosine, lr=1e-4, no warmup. ImageNet-22k pretrained."
+
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr1e4_warmup.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr1e4_warmup.yml
new file mode 100644
index 0000000..3db0559
--- /dev/null
+++ b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr1e4_warmup.yml
@@ -0,0 +1,67 @@
+# SWIN Base — LR tuning: cosine schedule, lr=1e-4, 5-epoch warmup (0 → 1e-4)
+# Warmup variant of swin_base_cosine_lr1e4. warmup_steps=0.05 (ratio of the 100-epoch run) → ~8964 warmup steps over 5 epochs.
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR1E4_WARMUP"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR1E4_WARMUP"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0001
+  num_train_epochs: 100
+  warmup_steps: 0.05
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  torch_compile: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: -1
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base_LR_Sweep"
+  run_name: "SWIN_Base_Cosine_LR1e4_Warmup"
+  run_id: "swin_base_cosine_lr1e4_warmup_052026"
+  run_notes: "LR sweep: cosine, lr=1e-4, 5-epoch warmup (0→1e-4). ImageNet-22k pretrained."
+
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr2e4.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr2e4.yml
new file mode 100644
index 0000000..c9bc330
--- /dev/null
+++ b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr2e4.yml
@@ -0,0 +1,67 @@
+# SWIN Base — LR tuning: cosine schedule, lr=2e-4, no warmup
+# Mid-point of the cosine LR sweep. Compare with lr1e4 and lr5e4.
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR2E4"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR2E4"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0002
+  num_train_epochs: 100
+  warmup_steps: 0
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  torch_compile: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: -1
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base_LR_Sweep"
+  run_name: "SWIN_Base_Cosine_LR2e4"
+  run_id: "swin_base_cosine_lr2e4_052026"
+  run_notes: "LR sweep: cosine, lr=2e-4, no warmup. ImageNet-22k pretrained."
+
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr2e4_warmup.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr2e4_warmup.yml
new file mode 100644
index 0000000..8c037df
--- /dev/null
+++ b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr2e4_warmup.yml
@@ -0,0 +1,67 @@
+# SWIN Base — LR tuning: cosine schedule, lr=2e-4, 5-epoch warmup (0 → 2e-4)
+# Warmup variant of swin_base_cosine_lr2e4. warmup_steps=0.05 (ratio of the 100-epoch run) → ~8964 warmup steps over 5 epochs.
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR2E4_WARMUP"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR2E4_WARMUP"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0002
+  num_train_epochs: 100
+  warmup_steps: 0.05
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  torch_compile: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: -1
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base_LR_Sweep"
+  run_name: "SWIN_Base_Cosine_LR2e4_Warmup"
+  run_id: "swin_base_cosine_lr2e4_warmup_052026"
+  run_notes: "LR sweep: cosine, lr=2e-4, 5-epoch warmup (0→2e-4). ImageNet-22k pretrained."
+
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr5e4.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr5e4.yml
new file mode 100644
index 0000000..cf2860c
--- /dev/null
+++ b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr5e4.yml
@@ -0,0 +1,67 @@
+# SWIN Base — LR tuning: cosine schedule, lr=5e-4, no warmup
+# Upper end of the cosine LR sweep. Compare with lr1e4 and lr2e4.
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR5E4"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR5E4"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0005
+  num_train_epochs: 100
+  warmup_steps: 0
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  torch_compile: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: -1
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base_LR_Sweep"
+  run_name: "SWIN_Base_Cosine_LR5e4"
+  run_id: "swin_base_cosine_lr5e4_052026"
+  run_notes: "LR sweep: cosine, lr=5e-4, no warmup. ImageNet-22k pretrained."
+
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr5e4_warmup.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr5e4_warmup.yml
new file mode 100644
index 0000000..324e2f8
--- /dev/null
+++ b/finetuning/SWIN/hyperparameter_configs/swin_base_cosine_lr5e4_warmup.yml
@@ -0,0 +1,67 @@
+# SWIN Base — LR tuning: cosine schedule, lr=5e-4, 5-epoch warmup (0 → 5e-4)
+# Warmup variant of swin_base_cosine_lr5e4. warmup_steps=0.05 (ratio of the 100-epoch run) → ~8964 warmup steps over 5 epochs.
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR5E4_WARMUP"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/COSINE_LR5E4_WARMUP"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0005
+  num_train_epochs: 100
+  warmup_steps: 0.05
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "cosine"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  torch_compile: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: -1
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "cosine"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base_LR_Sweep"
+  run_name: "SWIN_Base_Cosine_LR5e4_Warmup"
+  run_id: "swin_base_cosine_lr5e4_warmup_052026"
+  run_notes: "LR sweep: cosine, lr=5e-4, 5-epoch warmup (0→5e-4). ImageNet-22k pretrained."
+
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr1e4.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr1e4.yml
new file mode 100644
index 0000000..318ab7a
--- /dev/null
+++ b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr1e4.yml
@@ -0,0 +1,67 @@
+# SWIN Base — LR tuning: linear schedule, lr=1e-4, no warmup
+# Baseline of the linear LR sweep (lowest LR). Compare with lr2e4 and lr5e4.
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR1E4"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR1E4"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0001
+  num_train_epochs: 100
+  warmup_steps: 0
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "linear"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  torch_compile: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: -1
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "linear"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base_LR_Sweep"
+  run_name: "SWIN_Base_Linear_LR1e4"
+  run_id: "swin_base_linear_lr1e4_052026"
+  run_notes: "LR sweep: linear, lr=1e-4, no warmup. ImageNet-22k pretrained."
+
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr1e4_warmup.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr1e4_warmup.yml
new file mode 100644
index 0000000..2f8505c
--- /dev/null
+++ b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr1e4_warmup.yml
@@ -0,0 +1,67 @@
+# SWIN Base — LR tuning: linear schedule, lr=1e-4, 5-epoch warmup (0 → 1e-4)
+# Warmup variant of swin_base_linear_lr1e4. warmup_steps=0.05 (ratio) → ~8964 warmup steps over 5 epochs.
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR1E4_WARMUP"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR1E4_WARMUP"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0001
+  num_train_epochs: 100
+  warmup_steps: 0.05
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "linear"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  torch_compile: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: -1
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "linear"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base_LR_Sweep"
+  run_name: "SWIN_Base_Linear_LR1e4_Warmup"
+  run_id: "swin_base_linear_lr1e4_warmup_052026"
+  run_notes: "LR sweep: linear, lr=1e-4, 5-epoch warmup (0→1e-4). ImageNet-22k pretrained."
+
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
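Review note on the warmup arithmetic used across the warmup variants: `warmup_steps: 0.05`, read as a fraction of total training steps, covers 5 of the 100 epochs. The sketch below checks that against the roughly 8964-step figure quoted in the config header (the same value as `eval_steps`). It assumes 8964 steps is about 5 epochs; `warmup_steps_from_ratio` is a hypothetical helper, not something the training script is known to contain, and the exact rounding the trainer applies may differ.

```python
def warmup_steps_from_ratio(total_steps: int, ratio: float) -> int:
    """Number of optimizer steps spent in warmup for a given ratio of the run."""
    return round(total_steps * ratio)


# Assumption: 8964 steps is about 5 epochs, so the 100-epoch run is 20x that.
STEPS_PER_5_EPOCHS = 8964
TOTAL_STEPS = STEPS_PER_5_EPOCHS * (100 // 5)

if __name__ == "__main__":
    for ratio in (0.05, 0.1):
        steps = warmup_steps_from_ratio(TOTAL_STEPS, ratio)
        epochs = steps / STEPS_PER_5_EPOCHS * 5
        print(f"ratio={ratio}: {steps} warmup steps, about {epochs:.0f} epochs")
```

Under that assumption a ratio of 0.05 corresponds to the 5-epoch warmup named in `run_notes`, while 0.1 would correspond to 10 epochs.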
diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr2e4.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr2e4.yml
new file mode 100644
index 0000000..ccf82aa
--- /dev/null
+++ b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr2e4.yml
@@ -0,0 +1,67 @@
+# SWIN Base — LR tuning: linear schedule, lr=2e-4, no warmup
+# Mid-point of the linear LR sweep. Compare with lr1e4 and lr5e4.
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR2E4"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR2E4"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0002
+  num_train_epochs: 100
+  warmup_steps: 0
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "linear"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  torch_compile: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: -1
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "linear"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base_LR_Sweep"
+  run_name: "SWIN_Base_Linear_LR2e4"
+  run_id: "swin_base_linear_lr2e4_052026"
+  run_notes: "LR sweep: linear, lr=2e-4, no warmup. ImageNet-22k pretrained."
+
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr2e4_warmup.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr2e4_warmup.yml
new file mode 100644
index 0000000..ae117d0
--- /dev/null
+++ b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr2e4_warmup.yml
@@ -0,0 +1,67 @@
+# SWIN Base — LR tuning: linear schedule, lr=2e-4, 5-epoch warmup (0 → 2e-4)
+# Warmup variant of swin_base_linear_lr2e4. warmup_steps=0.05 (ratio) → ~8964 warmup steps over 5 epochs.
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR2E4_WARMUP"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR2E4_WARMUP"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0002
+  num_train_epochs: 100
+  warmup_steps: 0.05
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "linear"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  torch_compile: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: -1
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "linear"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base_LR_Sweep"
+  run_name: "SWIN_Base_Linear_LR2e4_Warmup"
+  run_id: "swin_base_linear_lr2e4_warmup_052026"
+  run_notes: "LR sweep: linear, lr=2e-4, 5-epoch warmup (0→2e-4). ImageNet-22k pretrained."
+
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr5e4.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr5e4.yml
new file mode 100644
index 0000000..dbe188a
--- /dev/null
+++ b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr5e4.yml
@@ -0,0 +1,68 @@
+# SWIN Base — LR tuning: linear schedule, lr=5e-4, no warmup
+# Upper end of the linear LR sweep. Compare with lr1e4 and lr2e4.
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR5E4"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR5E4"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0005
+  num_train_epochs: 100
+  warmup_steps: 0
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "linear"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  torch_compile: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: -1
+  dataloader_pin_memory: false
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "linear"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base_LR_Sweep"
+  run_name: "SWIN_Base_Linear_LR5e4"
+  run_id: "swin_base_linear_lr5e4_052026"
+  run_notes: "LR sweep: linear, lr=5e-4, no warmup. ImageNet-22k pretrained."
+
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
diff --git a/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr5e4_warmup.yml b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr5e4_warmup.yml
new file mode 100644
index 0000000..8420d46
--- /dev/null
+++ b/finetuning/SWIN/hyperparameter_configs/swin_base_linear_lr5e4_warmup.yml
@@ -0,0 +1,67 @@
+# SWIN Base — LR tuning: linear schedule, lr=5e-4, 5-epoch warmup (0 → 5e-4)
+# Warmup variant of swin_base_linear_lr5e4. warmup_steps=0.05 (ratio) → ~8964 warmup steps over 5 epochs.
+model:
+  model_name_or_path: "microsoft/swin-base-patch4-window7-224-in22k"
+  config_name: null
+  cache_dir: null
+  model_revision: "main"
+  image_processor_name: null
+  token: null
+  trust_remote_code: false
+  ignore_mismatched_sizes: true
+
+data:
+  dataset_name: null
+  dataset_config_name: null
+  data_file: null
+  data_dir: null
+  train_file: "/projectnb/herbdl/data/kaggle-herbaria/train_2022.json"
+  validation_file: "/projectnb/herbdl/data/kaggle-herbaria/val_2022.json"
+  image_column_name: "filename"
+  label_column_name: "scientificNameEncoded"
+  max_seq_length: 15
+  max_train_samples: null
+  max_eval_samples: null
+  overwrite_cache: false
+  preprocessing_num_workers: null
+  train_val_split: 0.2
+
+training:
+  output_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR5E4_WARMUP"
+  logging_dir: "/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN/LINEAR_LR5E4_WARMUP"
+  do_train: true
+  do_eval: true
+  per_device_train_batch_size: 128
+  per_device_eval_batch_size: 32
+  learning_rate: 0.0005
+  num_train_epochs: 100
+  warmup_steps: 0.05
+  weight_decay: 0.01
+  gradient_accumulation_steps: 1
+  lr_scheduler_type: "linear"
+  logging_strategy: "epoch"
+  save_strategy: "epoch"
+  save_total_limit: 5
+  eval_strategy: "steps"
+  eval_steps: 8964
+  torch_compile: true
+  report_to: "wandb"
+  bf16: true
+  dataloader_num_workers: -1
+  remove_unused_columns: false
+  overwrite_output_dir: false
+  seed: 0
+
+custom:
+  lr_type: "linear"
+  frozen: false
+  frozen_type: "none"
+  run_group: "SWIN_Base_LR_Sweep"
+  run_name: "SWIN_Base_Linear_LR5e4_Warmup"
+  run_id: "swin_base_linear_lr5e4_warmup_052026"
+  run_notes: "LR sweep: linear, lr=5e-4, 5-epoch warmup (0→5e-4). ImageNet-22k pretrained."
+
+wandb:
+  entity: "gardoslab"
+  project: "herbdl"
+  resume: "allow"
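Review note: the six linear configs above differ only in `learning_rate`, `warmup_steps`, the output and logging directories, and the `custom.run_*` fields, so they could plausibly be expressed as one base config plus a list of dotted-key overrides in the format documented in `launch_sweep.py`. A minimal sketch of generating that list follows. The helper name `linear_sweep_experiments` and its `out_root` and `date_tag` parameters are hypothetical, and it assumes the training script's `--set` handling coerces numeric strings back to numbers, which this diff does not show.

```python
from typing import Dict, List


def linear_sweep_experiments(out_root: str, date_tag: str = "052026") -> List[Dict[str, object]]:
    """Per-job overrides for the linear sweep: 3 learning rates x warmup off/on."""
    experiments: List[Dict[str, object]] = []
    for tag, lr in (("1e4", 1e-4), ("2e4", 2e-4), ("5e4", 5e-4)):
        for warmup in (False, True):
            # Directory and run names follow the pattern used in the configs above.
            out_dir = f"{out_root}/LINEAR_LR{tag.upper()}" + ("_WARMUP" if warmup else "")
            experiments.append({
                "training.learning_rate": lr,
                "training.warmup_steps": 0.05 if warmup else 0,
                "training.output_dir": out_dir,
                "training.logging_dir": out_dir,
                "custom.run_name": f"SWIN_Base_Linear_LR{tag}" + ("_Warmup" if warmup else ""),
                "custom.run_id": f"swin_base_linear_lr{tag}" + ("_warmup" if warmup else "") + f"_{date_tag}",
            })
    return experiments
```

The sketch leaves out `custom.run_notes`, whose values contain spaces, and does not reproduce the extra `dataloader_pin_memory: false` that only `swin_base_linear_lr5e4.yml` sets.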
diff --git a/finetuning/SWIN/launch_sweep.py b/finetuning/SWIN/launch_sweep.py
new file mode 100644
index 0000000..47b1ec5
--- /dev/null
+++ b/finetuning/SWIN/launch_sweep.py
@@ -0,0 +1,124 @@
+#!/usr/bin/env python3
+"""
+Launch a sweep of training jobs from a single base config + a list of per-job overrides.
+
+Usage:
+  python launch_sweep.py --base configs/swin_base_pretrained_linear.yml --sweep my_sweep.yml
+  python launch_sweep.py --base configs/swin_base_pretrained_linear.yml --sweep my_sweep.yml --dry-run
+
+Sweep YAML format:
+  qsub:                          # optional — overrides defaults below
+    h_rt: "48:00:00"
+    gpus: 1
+    gpu_c: 7.0
+    pe: "omp 8"
+
+  experiments:
+    - training.seed: 0
+      custom.run_id: my_run_seed0_052026
+      training.output_dir: /path/to/output/SEED0
+      training.logging_dir: /path/to/output/SEED0
+
+    - training.seed: 1
+      custom.run_id: my_run_seed1_052026
+      training.output_dir: /path/to/output/SEED1
+      training.logging_dir: /path/to/output/SEED1
+"""
+
+import argparse
+import os
+import subprocess
+import sys
+
+import yaml
+
+DEFAULTS = {
+    "h_rt":  "48:00:00",
+    "pe":    "omp 8",
+    "P":     "herbdl",
+    "gpus":  1,
+    "gpu_c": 8.0,
+    "M":     "faridkar@bu.edu",
+}
+
+SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
+TRAIN_SCRIPT = os.path.join(SCRIPT_DIR, "train_advanced.sh")
+
+
+def build_set_args(overrides: dict) -> str:
+    """Convert {key: value} overrides to '--set key=value ...' string."""
+    parts = []
+    for key, value in overrides.items():
+        parts.append(f"--set {key}={value}")
+    return " ".join(parts)
+
+
+def submit(base_config: str, overrides: dict, qsub_opts: dict, dry_run: bool) -> None:
+    run_id = overrides.get("custom.run_id", "unknown")
+    job_name = run_id.upper().replace("-", "_")[:15]   # qsub name limit
+    set_args = build_set_args(overrides)
+
+    cmd = [
+        "qsub",
+        "-l", f"h_rt={qsub_opts['h_rt']}",
+        "-pe", qsub_opts["pe"],
+        "-P",  qsub_opts["P"],
+        "-l",  f"gpus={qsub_opts['gpus']}",
+        "-l",  f"gpu_c={qsub_opts['gpu_c']}",
+        "-m",  "beas",
+        "-M",  qsub_opts["M"],
+        "-N",  job_name,
+        "-v",  f"CONFIG_FILE={base_config},SET_ARGS={set_args}",
+        TRAIN_SCRIPT,
+    ]
+
+    print(f"[{'DRY RUN' if dry_run else 'SUBMIT'}] {run_id}")
+    print(f"  overrides : {set_args or '(none)'}")
+    print(f"  job name  : {job_name}")
+    print(f"  command   : {' '.join(cmd)}")
+    print()
+
+    if not dry_run:
+        result = subprocess.run(cmd, capture_output=True, text=True)
+        if result.returncode != 0:
+            print(f"  ERROR: {result.stderr.strip()}", file=sys.stderr)
+        else:
+            print(f"  {result.stdout.strip()}")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Launch a sweep of qsub training jobs")
+    parser.add_argument("--base",    required=True, help="Base config YAML path")
+    parser.add_argument("--sweep",   required=True, help="Sweep spec YAML path")
+    parser.add_argument("--dry-run", action="store_true", help="Print commands without submitting")
+    args = parser.parse_args()
+
+    if not os.path.isfile(args.base):
+        sys.exit(f"Base config not found: {args.base}")
+    if not os.path.isfile(args.sweep):
+        sys.exit(f"Sweep file not found: {args.sweep}")
+
+    with open(args.sweep) as f:
+        sweep = yaml.safe_load(f) or {}
+
+    experiments = sweep.get("experiments", [])
+    if not experiments:
+        sys.exit("No experiments found in sweep file.")
+
+    qsub_opts = {**DEFAULTS, **(sweep.get("qsub") or {})}
+
+    print(f"Base config : {args.base}")
+    print(f"Experiments : {len(experiments)}")
+    print(f"qsub opts   : {qsub_opts}")
+    print()
+
+    for exp in experiments:
+        # Convert all values to strings for --set compatibility
+        overrides = {str(k): str(v) for k, v in exp.items()}
+        submit(args.base, overrides, qsub_opts, args.dry_run)
+
+    print("Done." if not args.dry_run else "Dry run complete — nothing submitted.")
+
+
+if __name__ == "__main__":
+    main()
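Review note: `build_set_args` joins overrides with spaces, and the result travels through `qsub -v`, which takes a comma-separated variable list, before `train_advanced.sh` expands `${SET_ARGS}` unquoted. An override whose key or value contains whitespace or a comma (for example a `custom.run_notes` string) would therefore likely be split or truncated on the way. One possible guard is to check overrides before submitting; the function name `find_unsafe_overrides` below is hypothetical and not part of this PR.

```python
import re
from typing import Dict, List

# Whitespace is word-split by the unquoted ${SET_ARGS}; commas separate `qsub -v` entries.
_UNSAFE = re.compile(r"[\s,]")


def find_unsafe_overrides(overrides: Dict[str, str]) -> List[str]:
    """Return the keys whose key or value contains whitespace or a comma."""
    return [
        key
        for key, value in overrides.items()
        if _UNSAFE.search(key) or _UNSAFE.search(str(value))
    ]
```

A caller could skip or abort a submission when this returns a non-empty list, rather than letting a mangled override reach the training script.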
diff --git a/finetuning/SWIN/submit_concrete.sh b/finetuning/SWIN/submit_concrete.sh
new file mode 100755
index 0000000..b22dff9
--- /dev/null
+++ b/finetuning/SWIN/submit_concrete.sh
@@ -0,0 +1,64 @@
+#!/bin/bash
+# General-purpose seed-ensemble launcher for any advanced config.
+# Defaults to the concrete SWIN-L 384 run for backward compatibility.
+#
+# Key env vars (all optional):
+#   CONFIG      — config file path   (default: swin_large_384_concrete.yml)
+#   RUN_PREFIX  — base name used for output dirs and W&B run id/name
+#                 (default: SWIN_L_384_CONCRETE)
+#   OUT_BASE    — output root        (default: .../workspaces/faridkar/.../SWIN)
+#   SEEDS       — space-separated    (default: "0 1 2")
+#   NGPUS       — GPUs per job       (default: 1; set to 2 for multi-GPU / DDP)
+#   GPU_MEM     — GPU memory request (default: 80G)
+#   CKPT        — warm-start checkpoint dir (overrides model.model_name_or_path)
+#   EMAIL       — notification email (default: faridkar@bu.edu)
+#
+# Usage examples:
+#   bash submit_concrete.sh                            # concrete run, seeds 0 1 2 (defaults)
+#   SEEDS="0 1 2 3 4" bash submit_concrete.sh          # concrete run, 5 seeds
+#
+#   CONFIG=configs_advanced/swinv2_large_192_heavy_multitask.yml \
+#   RUN_PREFIX=SWINV2_L_192_HEAVY_MT \
+#   SEEDS="0 1 2 3 4" \
+#   NGPUS=2 \
+#       bash submit_concrete.sh
+#
+# Running this script submits the jobs immediately; there is no dry-run mode.
+
+SEEDS=${SEEDS:-"0 1 2"}
+CONFIG=${CONFIG:-"configs_advanced/swin_large_384_concrete.yml"}
+RUN_PREFIX=${RUN_PREFIX:-"SWIN_L_384_CONCRETE"}
+OUT_BASE=${OUT_BASE:-"/projectnb/herbdl/workspaces/faridkar/herbdl/finetuning/output/SWIN"}
+NGPUS=${NGPUS:-1}
+GPU_MEM=${GPU_MEM:-"80G"}
+EMAIL=${EMAIL:-"faridkar@bu.edu"}
+
+OMP_THREADS=$(( NGPUS * 8 ))
+QSUB_ARGS="-l h_rt=48:00:00 -pe omp ${OMP_THREADS} -P herbdl -l gpus=${NGPUS} -l gpu_c=8.0 -l gpu_memory=${GPU_MEM} -m beas -M ${EMAIL}"
+
+# Pass NPROC_PER_NODE only when using multiple GPUs (triggers torchrun in train_advanced.sh)
+NPROC_VAR=""
+[ "$NGPUS" -gt 1 ] && NPROC_VAR=",NPROC_PER_NODE=${NGPUS}"
+
+# Derive a short job-name prefix (first 8 chars, uppercase, no special chars)
+JOB_PREFIX=$(echo "$RUN_PREFIX" | tr '[:lower:]' '[:upper:]' | tr -cd 'A-Z0-9' | cut -c1-8)
+
+for seed in $SEEDS; do
+    RUN_ID=$(echo "${RUN_PREFIX}_seed${seed}" | tr '[:upper:]' '[:lower:]')
+    RUN_NAME="${RUN_PREFIX}_Seed${seed}"
+    OUT="${OUT_BASE}/${RUN_PREFIX}_SEED${seed}"
+
+    SET_ARGS="--set training.seed=${seed} --set training.output_dir=${OUT} --set training.logging_dir=${OUT} --set custom.run_id=${RUN_ID} --set custom.run_name=${RUN_NAME}"
+    if [ -n "$CKPT" ]; then
+        SET_ARGS="${SET_ARGS} --set model.model_name_or_path=${CKPT}"
+    fi
+
+    JOB=$(qsub $QSUB_ARGS \
+        -N "${JOB_PREFIX}_S${seed}" \
+        -v CONFIG_FILE="${CONFIG}",SET_ARGS="${SET_ARGS}"${NPROC_VAR} \
+        train_advanced.sh | grep -oP '(?<=job )\d+')
+    echo "Submitted seed ${seed}: job ${JOB:-UNKNOWN}  ->  ${OUT}"
+done
+
+echo
+echo "Monitor with: qstat -u \$USER"
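Review note: the loop extracts the job id with `grep -oP '(?<=job )\d+'`. If `qsub` fails or prints something unexpected, no id is parsed, yet the echo line still reports the seed as submitted. The sketch below mirrors that extraction in Python and makes the "no id found" case explicit. The sample output string in the comment is an assumption about the scheduler's message format, and `parse_job_id` is a hypothetical helper.

```python
import re
from typing import Optional

# Same pattern as the grep call: the digits that follow the word "job ".
_JOB_ID = re.compile(r"(?<=job )\d+")


def parse_job_id(qsub_stdout: str) -> Optional[str]:
    """Return the first job id found in qsub's stdout, or None if there is none."""
    match = _JOB_ID.search(qsub_stdout)
    return match.group(0) if match else None


# Assumed message shape: 'Your job 1234567 ("SWINL384_S0") has been submitted'
```

Checking for `None` (or, in the shell script, for an empty `JOB`) before printing the confirmation would distinguish a real submission from a failed one.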
diff --git a/finetuning/SWIN/submit_pretrained_seeds.sh b/finetuning/SWIN/submit_pretrained_seeds.sh
new file mode 100755
index 0000000..37017a2
--- /dev/null
+++ b/finetuning/SWIN/submit_pretrained_seeds.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+# Submit 5 independent seed runs of the pretrained-384 full pipeline.
+# Each job runs on 1 GPU (A100 80G) for up to 48h.
+#
+# Usage:
+#   bash submit_pretrained_seeds.sh           # submit all 5 seeds
+#   SEEDS="0 2 4" bash submit_pretrained_seeds.sh  # submit a subset
+
+SEEDS=${SEEDS:-"0 1 2 3 4"}
+
+QSUB_ARGS="-l h_rt=48:00:00 -pe omp 8 -P herbdl -l gpus=1 -l gpu_c=8.0 -l gpu_memory=80G -m beas -M faridkar@bu.edu"
+
+for seed in $SEEDS; do
+    CONFIG="configs_advanced/swin_pretrained_384_seed${seed}.yml"
+    JOB_NAME="PRETRAINED_384_S${seed}"
+
+    JOB=$(qsub $QSUB_ARGS \
+        -N "$JOB_NAME" \
+        -v CONFIG_FILE="$CONFIG" \
+        train_advanced.sh | grep -oP '(?<=job )\d+')
+
+    echo "Submitted seed ${seed}: job ${JOB:-UNKNOWN}  (${CONFIG})"
+done
+
+echo ""
+echo "Monitor with: qstat -u \$USER"
diff --git a/finetuning/SWIN/train_advanced.sh b/finetuning/SWIN/train_advanced.sh
index 23c571e..3407ee5 100755
--- a/finetuning/SWIN/train_advanced.sh
+++ b/finetuning/SWIN/train_advanced.sh
@@ -1,23 +1,38 @@
 #!/bin/bash -l
 
 module load miniconda
-module load academic-ml/fall-2025
+module load academic-ml/spring-2026
 
-conda activate herb_env
+conda activate spring-2026-pyt
 
-# Path to config file - can be set via environment variable or use default
-# Options:
-#   - configs_advanced/swin_base_224_enhanced.yml
-#   - configs_advanced/swin_base_384_enhanced.yml
-#   - configs_advanced/swinv2_base_192_enhanced.yml
-# If CONFIG_FILE is not set (e.g., via qsub -v), use default
+# CONFIG_FILE must be provided (e.g. via `qsub -v CONFIG_FILE=...`, as submit_concrete.sh
+# does). Fail fast rather than silently running an arbitrary default config.
 if [ -z "$CONFIG_FILE" ]; then
-    CONFIG_FILE="configs_advanced/swin_base_224_multitask.yml"
+    echo "ERROR: CONFIG_FILE is not set. Pass it explicitly, e.g.:" >&2
+    echo "  qsub -v CONFIG_FILE=configs_advanced/swin_large_384_concrete.yml ... train_advanced.sh" >&2
+    echo "  (or use submit_concrete.sh, which sets it for you)" >&2
+    exit 1
+fi
+
+if [ ! -f "$CONFIG_FILE" ]; then
+    echo "ERROR: CONFIG_FILE '$CONFIG_FILE' not found (cwd: $(pwd))." >&2
+    exit 1
 fi
 
 echo "Using config file: $CONFIG_FILE"
+[ -n "$SET_ARGS" ] && echo "Overrides: $SET_ARGS"
 
-python SWIN_finetuning_advanced.py --config $CONFIG_FILE
+# Multi-GPU: set NPROC_PER_NODE=<n> in the qsub -v args to launch with torchrun (DDP).
+# Single GPU (default): plain python.
+NPROC=${NPROC_PER_NODE:-1}
+if [ "$NPROC" -gt 1 ]; then
+    echo "Launching with torchrun --nproc_per_node=$NPROC"
+    torchrun --nproc_per_node=$NPROC --standalone \
+        SWIN_finetuning_advanced.py --config "$CONFIG_FILE" ${SET_ARGS}
+else
+    python SWIN_finetuning_advanced.py --config "$CONFIG_FILE" ${SET_ARGS}
+fi
 
 # Example qsub command for multi-GPU training:
-# qsub -l h_rt=48:00:00 -pe omp 16 -P herbdl -l gpus=2 -l gpu_c=8.0 -l gpu_memory=80G -m beas -M faridkar@bu.edu -N SWINB_MT train_advanced.sh
+# qsub -l h_rt=48:00:00 -pe omp 16 -P herbdl -l gpus=2 -l gpu_c=8.0 -l gpu_memory=80G \
+#      -v NPROC_PER_NODE=2 -m beas -M faridkar@bu.edu -N SWIN_MULTIGPU train_advanced.sh