feat: add fish species classification model with MobileNetV3-Small #117
arcgod-design wants to merge 1 commit into
Someone is attempting to deploy a commit to the karan3431's projects Team on Vercel. A member of the Team first needs to authorize it.
🎉 Thank you for your Pull Request! We're thrilled to have your contribution to FreshScan AI. Before we review, please make sure you have:
A maintainer will review your code as soon as possible!
| Layer / File(s) | Summary |
|---|---|
| Species module: labels, metadata, model, and inference (`backend/species.py`) | Declares SPECIES_LABELS, SPECIES_METADATA, NUM_SPECIES, and device selection. Defines get_species_model() (MobileNetV3-Small with custom head), species_transform preprocessing pipeline, load_species_model() with checkpoint handling, and predict_species() returning structured top-1 result with fallback when the model is unavailable. |
| Backend config, startup, and payload helpers (`backend/main.py`) | Imports species helpers, adds SPECIES_MODEL_PATH env var, and wires species model loading into the lifespan startup (with PyTorch presence gating). Extends _build_scan_payload with an optional species_info parameter and derived field defaulting; updates _row_to_payload to read species_detected from the DB row and enrich it with SPECIES_METADATA. |
| Scan endpoints: prediction and persistence (`backend/main.py`) | Both /api/v1/scan and /api/v1/scan-auto real-inference paths now call predict_species on the input image and persist species_detected from the result. Demo-mode paths pass hardcoded species_info into _build_scan_payload. |
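The "structured top-1 result with fallback" described in the table can be illustrated with a small, dependency-free sketch. This is not the PR's `predict_species()`: the three-entry `LABELS` list, the `FALLBACK` dict, and the `top1` helper are hypothetical names chosen for illustration, and the real module works on PyTorch tensors rather than plain lists.

```python
import math

# Hypothetical, shortened label list; the PR's SPECIES_LABELS lives in backend/species.py.
LABELS = ["Rohu Carp", "Catla Carp", "Mrigal Carp"]

# Assumed fallback shape: a default name with confidence 0.0 when no model is loaded.
FALLBACK = {"common_name": "Rohu Carp", "confidence": 0.0}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top1(logits, labels=LABELS):
    """Return the most probable label, or the fallback when logits is None."""
    if logits is None:
        return dict(FALLBACK)
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"common_name": labels[best], "confidence": probs[best]}
```

Passing `None` stands in for "model unavailable" here; the confidence of 0.0 is what lets callers tell a fallback apart from a real prediction.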
Species Model Training Script
| Layer / File(s) | Summary |
|---|---|
| Training script: dataset loading, loop, and CLI (`scripts/train_species.py`) | Implements train_model with ImageFolder dataset loading, train/val augmentation transforms, epoch loop with cross-entropy loss, validation accuracy tracking, cosine LR scheduling, and best-model checkpointing to Models/species_mobilenetv3.pth. Adds an argparse CLI entrypoint. |
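For readers unfamiliar with cosine LR scheduling, the sketch below computes the standard cosine annealing curve in plain Python. It assumes the closed form used by schedulers such as `torch.optim.lr_scheduler.CosineAnnealingLR`; the summary does not say which scheduler or hyperparameters the script actually uses, so `cosine_lr`, `base_lr`, and `min_lr` are illustrative names only.

```python
import math


def cosine_lr(epoch, total_epochs, base_lr, min_lr=0.0):
    """Learning rate at `epoch` on a cosine curve from base_lr down to min_lr."""
    progress = epoch / total_epochs  # 0.0 at the start, 1.0 at the end
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Example schedule for a hypothetical 10-epoch run starting at 1e-3.
schedule = [cosine_lr(e, 10, 1e-3) for e in range(11)]
```

The rate starts at `base_lr`, passes through the midpoint halfway through training, and reaches `min_lr` at the final epoch.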
Sequence Diagram(s)
sequenceDiagram
rect rgba(173, 216, 230, 0.5)
Note over Client,Database: /api/v1/scan or /api/v1/scan-auto (real inference)
end
participant Client
participant ScanEndpoint
participant predict_species
participant _build_scan_payload
participant Database
Client->>ScanEndpoint: POST image
ScanEndpoint->>predict_species: image (PIL)
predict_species-->>ScanEndpoint: species_info {common_name, scientific_name, habitat, confidence}
ScanEndpoint->>_build_scan_payload: health_data + species_info
_build_scan_payload-->>ScanEndpoint: scan payload with species fields and tags
ScanEndpoint->>Database: INSERT scan (species_detected = species_info.common_name)
Database-->>ScanEndpoint: scan_id
ScanEndpoint-->>Client: scan payload
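The payload-building step in the diagram can be sketched roughly as follows. The `species_info` keys (`common_name`, `scientific_name`, `habitat`, `confidence`) come from the diagram; the function name `build_scan_payload`, the output field names, and the `"Unknown"` default are assumptions for illustration and may not match `_build_scan_payload` in the PR.

```python
def build_scan_payload(health_data, species_info=None):
    """Merge health results with optional species fields.

    Output field names and defaults here are illustrative, not the PR's.
    """
    info = species_info or {}
    payload = dict(health_data)  # copy so the caller's dict is not mutated
    payload["species_detected"] = info.get("common_name", "Unknown")
    payload["scientific_name"] = info.get("scientific_name")
    payload["habitat"] = info.get("habitat")
    payload["species_confidence"] = info.get("confidence", 0.0)
    return payload
```

Keeping `species_info` optional means demo-mode and real-inference paths can share one builder, which is the behaviour the walkthrough describes.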
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
🐟 A fish swims by, its name unknown no more,
The model speaks: "Rohu Carp!" from pixel shore.
MobileNetV3 peers with tiny eyes,
Through softmax clouds and normalization skies.
The scanner now knows every fin and scale —
A bunny trained the net, and none shall fail! 🐰✨
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and concisely describes the main feature: adding a fish species classification model using MobileNetV3-Small architecture. |
| Linked Issues check | ✅ Passed | The pull request successfully addresses all three coding objectives from issue #2: implements a lightweight MobileNetV3-Small classifier, provides a training script for dataset support, and integrates species prediction into the FastAPI pipeline for dynamic species_detected population. |
| Out of Scope Changes check | ✅ Passed | All changes align with the scope of issue #2: three files added/modified focus on species classification implementation, model training infrastructure, and FastAPI integration without introducing unrelated functionality. |
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/main.py (1)
515-529: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Don't persist the zero-confidence fallback as a detected species.
When species weights are unavailable, `predict_species()` returns `Rohu Carp` with `confidence: 0.0`; both inserts persist only the name, so history later shows a hardcoded Rohu Carp as if it were a real detection. Store an explicit unknown/unclassified species for zero-confidence results, or persist species confidence alongside the label and expose it in the payload.
Also applies to: 636-649
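One way to act on this comment is to decide the stored label from the confidence before the insert. The following is a minimal sketch under assumptions: `UNCLASSIFIED` and `species_for_storage` are hypothetical names, and the actual column handling in `backend/main.py` is not shown in this review.

```python
UNCLASSIFIED = "Unclassified"  # hypothetical sentinel value for the DB column


def species_for_storage(species_info, min_confidence=0.0):
    """Return the label to persist, hiding zero-confidence fallbacks."""
    confidence = float(species_info.get("confidence", 0.0))
    if confidence <= min_confidence:
        # A fallback result should not be recorded as a real detection.
        return UNCLASSIFIED
    return species_info["common_name"]
```

The alternative the reviewer mentions, storing the confidence next to the name, would avoid a sentinel but needs a schema change.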
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/main.py` around lines 515 - 529, The issue is that when predict_species() returns a zero-confidence fallback species (Rohu Carp), the insert statement only stores the species name without the confidence information, making it indistinguishable from a real detection in the database history. Fix this by modifying the table insert for the "species_detected" field to check if the confidence score from species_info is zero and either store an explicit "unclassified" or "unknown" value instead of the fallback name, or persist both the species name and its confidence score together. Apply the same fix in both locations where the scans table is inserted: in the section around the initial insert (line 515) and in the second similar insert location mentioned (lines 636-649).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/main.py`:
- Line 35: The import of species module inside the _row_to_payload() function
bypasses the top-level PyTorch gating mechanism, which means history
serialization can crash if PyTorch/TorchVision is unavailable. Move the species
module import and SPECIES_METADATA extraction to the top-level guarded import
block (where the try-except handles PyTorch availability), define a lightweight
fallback SPECIES_METADATA in the except branch for when PyTorch is unavailable,
then replace the dynamic import inside _row_to_payload() with a reference to the
module-level SPECIES_METADATA variable to ensure it uses the pre-initialized
value.
In `@backend/species.py`:
- Around line 95-100: The checkpoint loading code (around line 95-100 in the
conditional branches checking for "model_state_dict") does not validate that the
class-to-index mapping used during training matches the current SPECIES_LABELS
ordering. Modify the checkpoint saving code (referenced in lines 137-146) to
include the species_labels or class_to_idx mapping alongside the
model_state_dict. Then in the checkpoint loading code, extract and validate this
saved mapping against the current SPECIES_LABELS, and either reject the
checkpoint if there is a critical mismatch or apply a remapping transformation
to correct the model output indices before mapping them to species names during
inference.
- Around line 8-10: The torchvision version constraint in requirements.txt is
incompatible with the torch version specified. Update the torchvision constraint
in requirements.txt from the current incompatible version specification to match
the compatible version pair pinned in the Dockerfile (torchvision should be
pinned to version 0.17.2 or use a range like >=0.17.0,<0.18.0 to align with
torch>=2.2.0). This ensures that both the Dockerfile and requirements.txt will
install compatible versions of torch and torchvision, preventing import failures
when installing from requirements.txt.
- Around line 89-104: The torch.load() and load_state_dict() calls in the
species model loading section can raise exceptions from invalid or incompatible
checkpoint files, which will abort the entire startup process. Wrap the
checkpoint loading logic (from torch.load through the load_state_dict call and
model preparation) in a try-except block. When any exception occurs during
loading, catch it, reset the global variables _species_model and _species_loaded
to their initial state, log a warning message about the load failure, and return
early. This ensures species classification remains optional and the application
can continue with the fallback behavior when weights are invalid or
incompatible.
In `@scripts/train_species.py`:
- Around line 21-23: The docstring in lines 21-23 claims the script generates
synthetic training data when no dataset is available, but the actual code at
lines 64-69 calls sys.exit(1) when dataset folders are missing,
contradicting this promise. Either update the docstring to accurately reflect
that the script exits when dataset folders are not found, or implement the
synthetic data generation logic that the docstring claims exists. Choose
whichever approach aligns with the intended behavior and make sure the
documentation and actual behavior match.
- Around line 71-84: The ImageFolder class sorts class names alphabetically,
creating indices that don't match the fixed SPECIES_LABELS order used at
inference, causing silent misclassifications. Create a target_transform function
that maps the alphabetically-sorted indices from ImageFolder to the
SPECIES_LABELS order, then apply it to both train_dataset and val_dataset when
creating the ImageFolder instances. Additionally, replace num_classes =
len(class_to_idx) with num_classes = NUM_SPECIES to ensure the model
architecture matches the expected number of species classes and prevents
checkpoint shape mismatches.
---
Outside diff comments:
In `@backend/main.py`:
- Around line 515-529: The issue is that when predict_species() returns a
zero-confidence fallback species (Rohu Carp), the insert statement only stores
the species name without the confidence information, making it indistinguishable
from a real detection in the database history. Fix this by modifying the table
insert for the "species_detected" field to check if the confidence score from
species_info is zero and either store an explicit "unclassified" or "unknown"
value instead of the fallback name, or persist both the species name and its
confidence score together. Apply the same fix in both locations where the scans
table is inserted: in the section around the initial insert (line 515) and in
the second similar insert location mentioned (lines 636-649).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: bbdc1a30-f334-46a7-a4db-e6030c164bf0
📒 Files selected for processing (3)
- backend/main.py
- backend/species.py
- scripts/train_species.py
try:
    from inference import load_models, predict_stream_a, predict_stream_b
    from fusion import process_and_fuse
    from species import load_species_model, predict_species
Don’t import species metadata inside history conversion.
Line 304 bypasses the existing PyTorch gating: if species.py failed to import because Torch/TorchVision is unavailable, history serialization can still crash here. Import SPECIES_METADATA in the guarded top-level import path and provide a lightweight fallback in the except branch, then use that module-level value in _row_to_payload().
Proposed direction
- from species import load_species_model, predict_species
+ from species import SPECIES_METADATA, load_species_model, predict_species

- from species import SPECIES_METADATA
  metadata = SPECIES_METADATA.get(species_name, {"scientific_name": "Labeo rohita", "habitat": "Freshwater"})

Also define a fallback SPECIES_METADATA in the existing import failure branch so _row_to_payload() remains usable without PyTorch.
Also applies to: 302-305
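The guarded-import idea can be sketched without PyTorch installed. The helper below is illustrative only: `load_species_metadata` and `FALLBACK_SPECIES_METADATA` are hypothetical names, and the single fallback entry reuses the Rohu Carp values from the proposed direction above.

```python
import importlib

# Lightweight fallback used when the species module (and thus PyTorch) cannot be imported.
FALLBACK_SPECIES_METADATA = {
    "Rohu Carp": {"scientific_name": "Labeo rohita", "habitat": "Freshwater"},
}


def load_species_metadata(module_name="species"):
    """Return SPECIES_METADATA from the module, or the fallback if import fails."""
    try:
        module = importlib.import_module(module_name)
        return module.SPECIES_METADATA
    except (ImportError, AttributeError):
        return FALLBACK_SPECIES_METADATA
```

Calling this once at module level (for example `SPECIES_METADATA = load_species_metadata()`) would give `_row_to_payload()` a value that is always defined, which is the point of the review comment.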
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/main.py` at line 35, The import of species module inside the
_row_to_payload() function bypasses the top-level PyTorch gating mechanism,
which means history serialization can crash if PyTorch/TorchVision is
unavailable. Move the species module import and SPECIES_METADATA extraction to
the top-level guarded import block (where the try-except handles PyTorch
availability), define a lightweight fallback SPECIES_METADATA in the except
branch for when PyTorch is unavailable, then replace the dynamic import inside
_row_to_payload() with a reference to the module-level SPECIES_METADATA variable
to ensure it uses the pre-initialized value.
import torch
import torch.nn as nn
from torchvision import models, transforms
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Locate dependency declarations and show torch/torchvision pins with context.
fd -i '^(requirements.*|pyproject\.toml|setup\.py|poetry\.lock|uv\.lock|Pipfile|Pipfile\.lock|Dockerfile.*)$' \
  --exec sh -c 'echo "### $1"; rg -n -C2 "torch|torchvision" "$1" || true' sh {}
Repository: jpdevhub/FreshScanAi
Length of output: 1064
Fix incompatible torch/torchvision version constraint in requirements.txt.
The Dockerfile correctly pins torch==2.2.2 and torchvision==0.17.2 (a compatible pair), but requirements.txt pins torchvision>=0.27.0, which requires torch>=2.4.0. This mismatch will cause import failures when installing from requirements.txt without the Dockerfile context.
Update requirements.txt to pin torchvision>=0.17.0,<0.18.0 (or use torchvision==0.17.2 to match Dockerfile) to align with torch>=2.2.0.
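A quick sanity check for a torch/torchvision pair could look like the sketch below. The pairing table is deliberately partial and lists only release pairs believed to be correct (for example torch 2.2 with torchvision 0.17, matching the Dockerfile pins); it is an assumption-laden aid, not a substitute for the official compatibility matrix, and `is_compatible` is a hypothetical helper.

```python
# Partial map of torch minor version -> matching torchvision minor version.
COMPATIBLE = {
    "2.0": "0.15",
    "2.1": "0.16",
    "2.2": "0.17",
    "2.3": "0.18",
    "2.4": "0.19",
}


def minor(version):
    """Reduce '2.2.2+cu121' to '2.2'."""
    return ".".join(version.split("+")[0].split(".")[:2])


def is_compatible(torch_version, torchvision_version):
    """True only when the pair appears in the known-good table."""
    expected = COMPATIBLE.get(minor(torch_version))
    return expected is not None and minor(torchvision_version) == expected
```

Unknown torch versions return `False` rather than guessing, so the check fails closed.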
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/species.py` around lines 8 - 10, The torchvision version constraint
in requirements.txt is incompatible with the torch version specified. Update the
torchvision constraint in requirements.txt from the current incompatible version
specification to match the compatible version pair pinned in the Dockerfile
(torchvision should be pinned to version 0.17.2 or use a range like
>=0.17.0,<0.18.0 to align with torch>=2.2.0). This ensures that both the
Dockerfile and requirements.txt will install compatible versions of torch and
torchvision, preventing import failures when installing from requirements.txt.
path = Path(weights_path)
if not path.exists():
    print(f"WARNING: Species model not found at {path}. Using default species.")
    return

_species_model = get_species_model()
checkpoint = torch.load(path, map_location=device, weights_only=True)

if isinstance(checkpoint, dict) and "model_state_dict" in checkpoint:
    _species_model.load_state_dict(checkpoint["model_state_dict"])
else:
    _species_model.load_state_dict(checkpoint)

_species_model.to(device)
_species_model.eval()
_species_loaded = True
Keep species loading optional when the checkpoint is present but invalid.
A bad or incompatible weights file currently raises from torch.load() / load_state_dict() and aborts startup, even though species classification is intended to fall back when unavailable. Reset the globals and return the fallback path on load failure.
Proposed defensive load handling
  path = Path(weights_path)
  if not path.exists():
      print(f"WARNING: Species model not found at {path}. Using default species.")
+     _species_model = None
+     _species_loaded = False
      return

- _species_model = get_species_model()
- checkpoint = torch.load(path, map_location=device, weights_only=True)
-
- if isinstance(checkpoint, dict) and "model_state_dict" in checkpoint:
-     _species_model.load_state_dict(checkpoint["model_state_dict"])
- else:
-     _species_model.load_state_dict(checkpoint)
-
- _species_model.to(device)
- _species_model.eval()
- _species_loaded = True
+ try:
+     _species_model = get_species_model()
+     checkpoint = torch.load(path, map_location=device, weights_only=True)
+
+     if isinstance(checkpoint, dict) and "model_state_dict" in checkpoint:
+         _species_model.load_state_dict(checkpoint["model_state_dict"])
+     else:
+         _species_model.load_state_dict(checkpoint)
+
+     _species_model.to(device)
+     _species_model.eval()
+     _species_loaded = True
+ except Exception as exc:
+     print(f"WARNING: Failed to load species model from {path}: {exc}. Using default species.")
+     _species_model = None
+     _species_loaded = False
+     return

  print(f"Species model loaded from {path}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/species.py` around lines 89 - 104, The torch.load() and
load_state_dict() calls in the species model loading section can raise
exceptions from invalid or incompatible checkpoint files, which will abort the
entire startup process. Wrap the checkpoint loading logic (from torch.load
through the load_state_dict call and model preparation) in a try-except block.
When any exception occurs during loading, catch it, reset the global variables
_species_model and _species_loaded to their initial state, log a warning message
about the load failure, and return early. This ensures species classification
remains optional and the application can continue with the fallback behavior
when weights are invalid or incompatible.
checkpoint = torch.load(path, map_location=device, weights_only=True)

if isinstance(checkpoint, dict) and "model_state_dict" in checkpoint:
    _species_model.load_state_dict(checkpoint["model_state_dict"])
else:
    _species_model.load_state_dict(checkpoint)
Persist and validate the class-index mapping with the checkpoint.
Inference maps top_idx directly into SPECIES_LABELS, but the training snippet saves only a bare state_dict, so no label order is available to validate. If the dataset class order differs from SPECIES_LABELS, every prediction can be assigned the wrong species. Save species_labels or class_to_idx with the checkpoint and reject/remap mismatches during load.
Also applies to: 137-146
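A sketch of what "save the labels with the checkpoint and validate on load" might look like, using plain dicts so it runs without PyTorch. The `species_labels` key and the helper names are assumptions; in the real code the dict would be passed to `torch.save` and read back with `torch.load`.

```python
def make_checkpoint(state_dict, labels):
    """Bundle weights with the label order they were trained against."""
    return {"model_state_dict": state_dict, "species_labels": list(labels)}


def validate_labels(checkpoint, expected_labels):
    """Raise ValueError if the saved label order is missing or differs."""
    saved = checkpoint.get("species_labels") if isinstance(checkpoint, dict) else None
    if saved is None:
        raise ValueError("checkpoint has no species_labels; label order cannot be verified")
    if list(saved) != list(expected_labels):
        raise ValueError(f"label order mismatch: saved={saved}, expected={list(expected_labels)}")
    return True
```

This version rejects mismatches outright; remapping output indices, the other option the reviewer names, would be more forgiving but is harder to get right.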
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/species.py` around lines 95 - 100, The checkpoint loading code
(around line 95-100 in the conditional branches checking for "model_state_dict")
does not validate that the class-to-index mapping used during training matches
the current SPECIES_LABELS ordering. Modify the checkpoint saving code
(referenced in lines 137-146) to include the species_labels or class_to_idx
mapping alongside the model_state_dict. Then in the checkpoint loading code,
extract and validate this saved mapping against the current SPECIES_LABELS, and
either reject the checkpoint if there is a critical mismatch or apply a
remapping transformation to correct the model output indices before mapping them
to species names during inference.
If no dataset is available, the script generates synthetic training data
from publicly available fish images for demonstration purposes.
"""
Fix the docstring fallback claim (or implement it).
Lines [21]-[23] say synthetic data is auto-generated, but Lines [64]-[69] hard-exit when dataset folders are missing. This is misleading for CLI users.
Suggested patch
-If no dataset is available, the script generates synthetic training data
-from publicly available fish images for demonstration purposes.
+This script requires a prepared dataset on disk and does not
+auto-generate or download training data.

Also applies to: 64-69
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/train_species.py` around lines 21 - 23, The docstring in lines 21-23
claims the script generates synthetic training data when no dataset is
available, but the actual code at lines 64-69 calls sys.exit(1) when dataset
folders are missing, contradicting this promise. Either update the docstring to
accurately reflect that the script exits when dataset folders are not found, or
implement the synthetic data generation logic that the docstring claims exists.
Choose whichever approach aligns with the intended behavior and make sure the
documentation and actual behavior match.
train_dataset = datasets.ImageFolder(str(train_dir), transform=train_transform)
val_dataset = datasets.ImageFolder(str(val_dir), transform=val_transform) if val_dir.exists() else None

# Verify class mapping matches our labels
class_to_idx = train_dataset.class_to_idx
print(f"Found {len(class_to_idx)} classes: {list(class_to_idx.keys())}")

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0) if val_dataset else None

# ── Model ───────────────────────────────────────────────────────────────
num_classes = len(class_to_idx)
model = get_species_model(num_classes)
model.to(device)
🧩 Analysis chain
🏁 Script executed:
# First, locate the key files mentioned
git ls-files | grep -E "(train_species|species\.py|backend)" | head -20
Repository: jpdevhub/FreshScanAi
Length of output: 570
🏁 Script executed:
# Find SPECIES_LABELS and NUM_SPECIES definitions
rg -n "SPECIES_LABELS|NUM_SPECIES" --type py
Repository: jpdevhub/FreshScanAi
Length of output: 619
🏁 Script executed:
# Examine train_species.py to understand the full context
fd "train_species.py" --type f -exec wc -l {} \;
Repository: jpdevhub/FreshScanAi
Length of output: 251
🏁 Script executed:
# Get the actual SPECIES_LABELS definition
sed -n '17,42p' backend/species.py
Repository: jpdevhub/FreshScanAi
Length of output: 1093
🏁 Script executed:
# Check the train_species.py code around lines 71-84 and the docstring
head -100 scripts/train_species.py
Repository: jpdevhub/FreshScanAi
Length of output: 3468
🏁 Script executed:
# Check how ImageFolder class_to_idx is ordered
rg -A 10 "ImageFolder.*target_transform" --type py
Repository: jpdevhub/FreshScanAi
Length of output: 46
🏁 Script executed:
# Check how ImageFolder sorts classes - test with Python
python3 << 'PY'
from torchvision import datasets
import tempfile
import os
from pathlib import Path
# Create a temp directory with the species names
with tempfile.TemporaryDirectory() as tmpdir:
    species = ["Rohu Carp", "Catla Carp", "Mrigal Carp", "Pangas", "Basa",
               "Tilapia", "Pomfret", "Kingfish", "Mackerel", "Sardine"]
    for s in species:
        os.makedirs(os.path.join(tmpdir, s), exist_ok=True)
        # Create a dummy file
        Path(os.path.join(tmpdir, s, "dummy.txt")).touch()
    # Create ImageFolder and check class_to_idx
    folder = datasets.ImageFolder(tmpdir)
    print("ImageFolder class_to_idx:")
    for label, idx in sorted(folder.class_to_idx.items(), key=lambda x: x[1]):
        print(f" {idx}: {label}")
PY
Repository: jpdevhub/FreshScanAi
Length of output: 188
🏁 Script executed:
# Check the inference code to confirm how it decodes logits
sed -n '125,150p' backend/species.py
Repository: jpdevhub/FreshScanAi
Length of output: 881
🏁 Script executed:
# Check if there's any remapping or label handling in the training script
rg -A 20 "class_to_idx" scripts/train_species.py
Repository: jpdevhub/FreshScanAi
Length of output: 1201
🏁 Script executed:
# Verify ImageFolder's alphabetical sorting behavior by checking documentation/behavior
python3 << 'PY'
# ImageFolder sorts class names alphabetically by default
species_list = ["Rohu Carp", "Catla Carp", "Mrigal Carp", "Pangas", "Basa",
                "Tilapia", "Pomfret", "Kingfish", "Mackerel", "Sardine"]
sorted_species = sorted(species_list)
print("ImageFolder alphabetical sort (expected class_to_idx order):")
for i, s in enumerate(sorted_species):
    print(f" {i}: {s}")
print("\nSPECIES_LABELS order (expected by inference):")
for i, s in enumerate(species_list):
    print(f" {i}: {s}")
print("\nMismatch example:")
print(f" ImageFolder index 0 = '{sorted_species[0]}' but SPECIES_LABELS[0] = '{species_list[0]}'")
PY
Repository: jpdevhub/FreshScanAi
Length of output: 532
🏁 Script executed:
# Check how get_species_model handles num_classes parameter
sed -n '50,80p' backend/species.py
Repository: jpdevhub/FreshScanAi
Length of output: 1051
🏁 Script executed:
# Verify the docstring claim about synthetic data generation
rg -B 5 -A 10 "synthetic" scripts/train_species.py
Repository: jpdevhub/FreshScanAi
Length of output: 436
Enforce a stable label-index contract before training.
Training uses ImageFolder, which sorts class names alphabetically (Basa, Catla Carp, ..., Tilapia), but inference decodes logits directly using the fixed SPECIES_LABELS order (Rohu Carp, Catla Carp, ..., Sardine). This causes silent misclassification: e.g., "Rohu Carp" images receive index 7 during training, but index 7 is decoded as "Kingfish" at inference. Line 82 also allows num_classes to vary, creating checkpoint shape mismatches.
Remap ImageFolder indices to match SPECIES_LABELS order via target_transform before creating dataloaders, and use num_classes = NUM_SPECIES instead of len(class_to_idx).
Additionally, the module docstring (lines 8–15) claims the script generates synthetic training data as a fallback, but the code exits with sys.exit(1) when train_dir is missing. Either implement the fallback or correct the docstring.
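The index mismatch described above can be reproduced without torchvision. This sketch mimics ImageFolder's alphabetical class indexing with `sorted()` and builds the remap table that a `target_transform` would apply; the label list is the one used in the analysis scripts above, and `imagefolder_class_to_idx` and `build_remap` are hypothetical helper names.

```python
SPECIES_LABELS = [
    "Rohu Carp", "Catla Carp", "Mrigal Carp", "Pangas", "Basa",
    "Tilapia", "Pomfret", "Kingfish", "Mackerel", "Sardine",
]


def imagefolder_class_to_idx(class_names):
    """Mimic ImageFolder: classes are indexed in sorted (alphabetical) order."""
    return {name: i for i, name in enumerate(sorted(class_names))}


def build_remap(class_to_idx, labels):
    """Map each dataset index to the index of the same name in `labels`."""
    target = {name: i for i, name in enumerate(labels)}
    return {idx: target[name] for name, idx in class_to_idx.items()}


class_to_idx = imagefolder_class_to_idx(SPECIES_LABELS)
remap = build_remap(class_to_idx, SPECIES_LABELS)
```

Here `class_to_idx["Rohu Carp"]` is 7 while `SPECIES_LABELS[7]` is "Kingfish", and `remap[7]` is 0, which restores the Rohu Carp position expected at inference.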
Suggested patch for label remapping
+ from species import SPECIES_LABELS, NUM_SPECIES
+
  train_dataset = datasets.ImageFolder(str(train_dir), transform=train_transform)
  val_dataset = datasets.ImageFolder(str(val_dir), transform=val_transform) if val_dir.exists() else None

  # Verify class mapping matches our labels
  class_to_idx = train_dataset.class_to_idx
+ expected_labels = set(SPECIES_LABELS)
+ train_labels = set(class_to_idx.keys())
+ if train_labels != expected_labels:
+     missing = sorted(expected_labels - train_labels)
+     extra = sorted(train_labels - expected_labels)
+     raise ValueError(
+         f"Dataset labels must match SPECIES_LABELS exactly. missing={missing}, extra={extra}"
+     )
+
+ species_idx = {label: i for i, label in enumerate(SPECIES_LABELS)}
+ train_remap = {idx: species_idx[label] for label, idx in class_to_idx.items()}
+ train_dataset.target_transform = lambda idx: train_remap[idx]
+
+ if val_dataset is not None:
+     val_labels = set(val_dataset.class_to_idx.keys())
+     if val_labels != expected_labels:
+         raise ValueError("Validation labels must match SPECIES_LABELS exactly.")
+     val_remap = {idx: species_idx[label] for label, idx in val_dataset.class_to_idx.items()}
+     val_dataset.target_transform = lambda idx: val_remap[idx]
+
  print(f"Found {len(class_to_idx)} classes: {list(class_to_idx.keys())}")

  train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
  val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0) if val_dataset else None

  # ── Model ───────────────────────────────────────────────────────────────
- num_classes = len(class_to_idx)
+ num_classes = NUM_SPECIES
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/train_species.py` around lines 71 - 84, The ImageFolder class sorts
class names alphabetically, creating indices that don't match the fixed
SPECIES_LABELS order used at inference, causing silent misclassifications.
Create a target_transform function that maps the alphabetically-sorted indices
from ImageFolder to the SPECIES_LABELS order, then apply it to both
train_dataset and val_dataset when creating the ImageFolder instances.
Additionally, replace num_classes = len(class_to_idx) with num_classes =
NUM_SPECIES to ensure the model architecture matches the expected number of
species classes and prevents checkpoint shape mismatches.
Where is the model? Please share a Drive link, or if the file is small, you can push it here.
Summary
Changes
- backend/species.py (NEW): Species classifier module with MobileNetV3-Small architecture, labels, metadata, and inference function
- scripts/train_species.py (NEW): Training script for the species model (supports custom datasets or online datasets like FishNet-121)
- backend/main.py: Integrated species classifier into scan pipeline
Species Supported
Architecture
Integration Points
- process_scan(): Classifies species from body image, populates species_detected field
- scan_auto(): Classifies species from uploaded image
- _build_scan_payload(): Accepts optional species_info dict
- _row_to_payload(): Uses species_detected from DB with metadata lookup
Training
Closes
closes #2
Summary by CodeRabbit
New Features
Chores