feat: add fish species classification model with MobileNetV3-Small by arcgod-design · Pull Request #117 · jpdevhub/FreshScanAi

arcgod-design · 2026-06-18T21:28:03Z

Summary

Added fish species classification using MobileNetV3-Small (~2.5M params)
Replaces hardcoded "Rohu Carp" with dynamic species detection
Supports 10 common fish species found in local markets

Changes

backend/species.py: NEW — Species classifier module with MobileNetV3-Small architecture, labels, metadata, and inference function
scripts/train_species.py: NEW — Training script for the species model (supports custom datasets or online datasets like FishNet-121)
backend/main.py: Integrated species classifier into scan pipeline

Species Supported

Rohu Carp (Labeo rohita)
Catla Carp (Catla catla)
Mrigal Carp (Cirrhinus cirrhosus)
Pangas (Pangasius hypophthalmus)
Basa (Pangasius bocourti)
Tilapia (Oreochromis niloticus)
Pomfret (Pampus argenteus)
Kingfish (Scomberomorus commerson)
Mackerel (Rastrelliger kanagurta)
Sardine (Sardinella longiceps)

Architecture

Model: MobileNetV3-Small backbone + custom classifier head (256 → num_classes)
Input: 224×224 RGB image
Output: Species label + confidence score
Preprocessing: ImageNet normalization (same as existing models)
Fallback: Returns "Rohu Carp" with 0.0 confidence if model weights not found
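
The output and fallback rows above can be illustrated with a minimal sketch. It assumes the model emits one raw logit per species and that confidence is the top-1 softmax probability. `decode_top1` and `fallback_species` are hypothetical names, and the label order is copied from the "Species Supported" list rather than from backend/species.py.

```python
import math

# Label order copied from the PR description; the real SPECIES_LABELS constant
# lives in backend/species.py and is assumed (not verified) to match.
SPECIES_LABELS = [
    "Rohu Carp", "Catla Carp", "Mrigal Carp", "Pangas", "Basa",
    "Tilapia", "Pomfret", "Kingfish", "Mackerel", "Sardine",
]


def decode_top1(logits, labels=SPECIES_LABELS):
    """Hypothetical helper: softmax over raw logits, then pick the top-1 label."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top_idx = max(range(len(probs)), key=probs.__getitem__)
    return {"common_name": labels[top_idx], "confidence": probs[top_idx]}


def fallback_species():
    """Mirrors the documented fallback: default label with zero confidence."""
    return {"common_name": "Rohu Carp", "confidence": 0.0}
```

Because the label is looked up purely by index, the order of the label list has to match the order used during training.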

Integration Points

process_scan(): Classifies species from body image, populates species_detected field
scan_auto(): Classifies species from uploaded image
_build_scan_payload(): Accepts optional species_info dict
_row_to_payload(): Uses species_detected from DB with metadata lookup
Server startup: Loads species model weights alongside existing models

Training

# With custom dataset
python scripts/train_species.py --data_dir ./dataset --epochs 20

# Dataset structure expected:
#   dataset/train/<species_name>/images...
#   dataset/val/<species_name>/images...
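
Before starting a long training run, it may help to check that the directory layout above is actually in place. The following is a sketch only: `check_dataset_layout` is a hypothetical helper that is not part of this PR, and it assumes class folders are named exactly after the species labels and that images use common file extensions.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the dataset really contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_dataset_layout(data_dir, expected_species, splits=("train", "val")):
    """Return a list of problems; an empty list means the layout looks usable."""
    problems = []
    root = Path(data_dir)
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing split directory: {split}")
            continue
        for species in expected_species:
            species_dir = split_dir / species
            if not species_dir.is_dir():
                problems.append(f"missing class directory: {split}/{species}")
                continue
            # At least one file with an image extension must be present.
            has_image = any(
                p.suffix.lower() in IMAGE_SUFFIXES for p in species_dir.iterdir()
            )
            if not has_image:
                problems.append(f"no images in: {split}/{species}")
    return problems
```

Calling `check_dataset_layout("./dataset", labels)` and printing the returned list gives a quick report without importing PyTorch.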

Closes

closes #2

Summary by CodeRabbit

New Features
- Fish species classification now integrated into scan functionality, automatically detecting species from images.
- Scan results now include species information (common name, scientific name, habitat) with confidence levels.
- Species detection available for both standard and automated scan modes.
Chores
- Added training infrastructure for the species classification model.


vercel · 2026-06-18T21:28:07Z

Someone is attempting to deploy a commit to the karan3431's projects Team on Vercel.

A member of the Team first needs to authorize it.

github-actions · 2026-06-18T21:28:13Z

🎉 Thank you for your Pull Request! We're thrilled to have your contribution to FreshScan AI.

Before we review, please make sure you have:

Followed the CONTRIBUTING.md guidelines.
Ensured all automated CI checks (linting, tests) are passing.
Checked that your commit messages follow the Conventional Commits format.

A maintainer will review your code as soon as possible!

coderabbitai · 2026-06-18T21:28:21Z

Warning

`.coderabbit.yaml` has a parsing error

The CodeRabbit configuration file in this repository has a parsing error and default settings were used instead. Please fix the error(s) in the configuration file. You can initialize chat with CodeRabbit to get help with the configuration file.

💥 Parsing errors (1)

Validation error: Invalid input: expected object, received boolean at "reviews.auto_review"

⚙️ Configuration instructions

Please see the configuration documentation for more information.
You can also validate your configuration using the online YAML validator.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

📝 Walkthrough

Walkthrough

Adds backend/species.py implementing a MobileNetV3-Small fish species classifier with label constants, metadata, model loading, and inference. backend/main.py is updated to load the species model at startup, extend payload helpers with species fields, and call predict_species in both /api/v1/scan and /api/v1/scan-auto. A standalone scripts/train_species.py training script is also introduced.

Changes

Species Classifier Module and Backend Integration

Layer / File(s)	Summary
Species module: labels, metadata, model, and inference `backend/species.py`	Declares `SPECIES_LABELS`, `SPECIES_METADATA`, `NUM_SPECIES`, and device selection. Defines `get_species_model()` (MobileNetV3-Small with custom head), `species_transform` preprocessing pipeline, `load_species_model()` with checkpoint handling, and `predict_species()` returning structured top-1 result with fallback when the model is unavailable.
Backend config, startup, and payload helpers `backend/main.py`	Imports species helpers, adds `SPECIES_MODEL_PATH` env var, and wires species model loading into the lifespan startup (with PyTorch presence gating). Extends `_build_scan_payload` with an optional `species_info` parameter and derived field defaulting; updates `_row_to_payload` to read `species_detected` from the DB row and enrich it with `SPECIES_METADATA`.
Scan endpoints: prediction and persistence `backend/main.py`	Both `/api/v1/scan` and `/api/v1/scan-auto` real-inference paths now call `predict_species` on the input image and persist `species_detected` from the result. Demo-mode paths pass hardcoded `species_info` into `_build_scan_payload`.

Species Model Training Script

Layer / File(s)	Summary
Training script: dataset loading, loop, and CLI `scripts/train_species.py`	Implements `train_model` with `ImageFolder` dataset loading, train/val augmentation transforms, epoch loop with cross-entropy loss, validation accuracy tracking, cosine LR scheduling, and best-model checkpointing to `Models/species_mobilenetv3.pth`. Adds an argparse CLI entrypoint.

Sequence Diagram(s)

sequenceDiagram
  rect rgba(173, 216, 230, 0.5)
    Note over Client,Database: /api/v1/scan or /api/v1/scan-auto (real inference)
  end
  participant Client
  participant ScanEndpoint
  participant predict_species
  participant _build_scan_payload
  participant Database

  Client->>ScanEndpoint: POST image
  ScanEndpoint->>predict_species: image (PIL)
  predict_species-->>ScanEndpoint: species_info {common_name, scientific_name, habitat, confidence}
  ScanEndpoint->>_build_scan_payload: health_data + species_info
  _build_scan_payload-->>ScanEndpoint: scan payload with species fields and tags
  ScanEndpoint->>Database: INSERT scan (species_detected = species_info.common_name)
  Database-->>ScanEndpoint: scan_id
  ScanEndpoint-->>Client: scan payload

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐟 A fish swims by, its name unknown no more,
The model speaks: "Rohu Carp!" from pixel shore.
MobileNetV3 peers with tiny eyes,
Through softmax clouds and normalization skies.
The scanner now knows every fin and scale —
A bunny trained the net, and none shall fail! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately and concisely describes the main feature: adding a fish species classification model using MobileNetV3-Small architecture.
Linked Issues check	✅ Passed	The pull request successfully addresses all three coding objectives from issue `#2`: implements a lightweight MobileNetV3-Small classifier, provides a training script for dataset support, and integrates species prediction into the FastAPI pipeline for dynamic species_detected population.
Out of Scope Changes check	✅ Passed	All changes align with the scope of issue `#2`: three files added/modified focus on species classification implementation, model training infrastructure, and FastAPI integration without introducing unrelated functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

backend/main.py (1)
515-529: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t persist the zero-confidence fallback as a detected species.

When species weights are unavailable, predict_species() returns Rohu Carp with confidence: 0.0; both inserts persist only the name, so history later shows a hardcoded Rohu Carp as if it were a real detection. Store an explicit unknown/unclassified species for zero-confidence results, or persist species confidence alongside the label and expose it in the payload.

Also applies to: 636-649
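
One way to apply the first option is sketched below, under the assumption that `species_info` is a dict with `common_name` and `confidence` keys as shown in the sequence diagram. `species_for_storage` and the `"Unclassified"` sentinel are hypothetical names, not code from this PR.

```python
UNCLASSIFIED = "Unclassified"  # hypothetical sentinel for "no real detection"


def species_for_storage(species_info, min_confidence=0.0):
    """Return the (label, confidence) pair to persist for a scan row.

    Results at or below min_confidence are stored under the sentinel so that
    history does not show the hardcoded fallback as a real detection.
    """
    confidence = float(species_info.get("confidence", 0.0))
    name = species_info.get("common_name")
    if not name or confidence <= min_confidence:
        return UNCLASSIFIED, confidence
    return name, confidence
```

Storing the confidence next to the label keeps both options from the comment open: the history view can hide, flag, or display low-confidence rows later.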
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/main.py` around lines 515 - 529, The issue is that when
predict_species() returns a zero-confidence fallback species (Rohu Carp), the
insert statement only stores the species name without the confidence
information, making it indistinguishable from a real detection in the database
history. Fix this by modifying the table insert for the "species_detected" field
to check if the confidence score from species_info is zero and either store an
explicit "unclassified" or "unknown" value instead of the fallback name, or
persist both the species name and its confidence score together. Apply the same
fix in both locations where the scans table is inserted: in the section around
the initial insert (line 515) and in the second similar insert location
mentioned (lines 636-649).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/main.py`:
- Line 35: The import of species module inside the _row_to_payload() function
bypasses the top-level PyTorch gating mechanism, which means history
serialization can crash if PyTorch/TorchVision is unavailable. Move the species
module import and SPECIES_METADATA extraction to the top-level guarded import
block (where the try-except handles PyTorch availability), define a lightweight
fallback SPECIES_METADATA in the except branch for when PyTorch is unavailable,
then replace the dynamic import inside _row_to_payload() with a reference to the
module-level SPECIES_METADATA variable to ensure it uses the pre-initialized
value.

In `@backend/species.py`:
- Around line 95-100: The checkpoint loading code (around line 95-100 in the
conditional branches checking for "model_state_dict") does not validate that the
class-to-index mapping used during training matches the current SPECIES_LABELS
ordering. Modify the checkpoint saving code (referenced in lines 137-146) to
include the species_labels or class_to_idx mapping alongside the
model_state_dict. Then in the checkpoint loading code, extract and validate this
saved mapping against the current SPECIES_LABELS, and either reject the
checkpoint if there is a critical mismatch or apply a remapping transformation
to correct the model output indices before mapping them to species names during
inference.
- Around line 8-10: The torchvision version constraint in requirements.txt is
incompatible with the torch version specified. Update the torchvision constraint
in requirements.txt from the current incompatible version specification to match
the compatible version pair pinned in the Dockerfile (torchvision should be
pinned to version 0.17.2 or use a range like >=0.17.0,<0.18.0 to align with
torch>=2.2.0). This ensures that both the Dockerfile and requirements.txt will
install compatible versions of torch and torchvision, preventing import failures
when installing from requirements.txt.
- Around line 89-104: The torch.load() and load_state_dict() calls in the
species model loading section can raise exceptions from invalid or incompatible
checkpoint files, which will abort the entire startup process. Wrap the
checkpoint loading logic (from torch.load through the load_state_dict call and
model preparation) in a try-except block. When any exception occurs during
loading, catch it, reset the global variables _species_model and _species_loaded
to their initial state, log a warning message about the load failure, and return
early. This ensures species classification remains optional and the application
can continue with the fallback behavior when weights are invalid or
incompatible.

In `@scripts/train_species.py`:
- Around line 21-23: The docstring in lines 21-23 claims the script generates
synthetic training data when no dataset is available, but the actual code at
lines 64-69 calls sys.exit(1) when dataset folders are missing,
contradicting this promise. Either update the docstring to accurately reflect
that the script exits when dataset folders are not found, or implement the
synthetic data generation logic that the docstring claims exists. Choose
whichever approach aligns with the intended behavior and make sure the
documentation and actual behavior match.
- Around line 71-84: The ImageFolder class sorts class names alphabetically,
creating indices that don't match the fixed SPECIES_LABELS order used at
inference, causing silent misclassifications. Create a target_transform function
that maps the alphabetically-sorted indices from ImageFolder to the
SPECIES_LABELS order, then apply it to both train_dataset and val_dataset when
creating the ImageFolder instances. Additionally, replace num_classes =
len(class_to_idx) with num_classes = NUM_SPECIES to ensure the model
architecture matches the expected number of species classes and prevents
checkpoint shape mismatches.

---

Outside diff comments:
In `@backend/main.py`:
- Around line 515-529: The issue is that when predict_species() returns a
zero-confidence fallback species (Rohu Carp), the insert statement only stores
the species name without the confidence information, making it indistinguishable
from a real detection in the database history. Fix this by modifying the table
insert for the "species_detected" field to check if the confidence score from
species_info is zero and either store an explicit "unclassified" or "unknown"
value instead of the fallback name, or persist both the species name and its
confidence score together. Apply the same fix in both locations where the scans
table is inserted: in the section around the initial insert (line 515) and in
the second similar insert location mentioned (lines 636-649).


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: bbdc1a30-f334-46a7-a4db-e6030c164bf0

📥 Commits

Reviewing files that changed from the base of the PR and between d5dab73 and 763f1a5.

📒 Files selected for processing (3)

backend/main.py
backend/species.py
scripts/train_species.py

coderabbitai · 2026-06-18T21:35:26Z

 try:
    from inference import load_models, predict_stream_a, predict_stream_b
    from fusion import process_and_fuse
+    from species import load_species_model, predict_species


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t import species metadata inside history conversion.

Line 304 bypasses the existing PyTorch gating: if species.py failed to import because Torch/TorchVision is unavailable, history serialization can still crash here. Import SPECIES_METADATA in the guarded top-level import path and provide a lightweight fallback in the except branch, then use that module-level value in _row_to_payload().

Proposed direction

-    from species import load_species_model, predict_species
+    from species import SPECIES_METADATA, load_species_model, predict_species

-    from species import SPECIES_METADATA
     metadata = SPECIES_METADATA.get(species_name, {"scientific_name": "Labeo rohita", "habitat": "Freshwater"})

Also define a fallback SPECIES_METADATA in the existing import failure branch so _row_to_payload() remains usable without PyTorch.

Also applies to: 302-305
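
The guarded-import pattern described above might look like the sketch below. It assumes `species` is importable only when PyTorch and TorchVision are installed. The fallback values come from the default used in the proposed direction, and `lookup_metadata` is a hypothetical stand-in for the lookup inside `_row_to_payload()`.

```python
# Default taken from the proposed direction in this comment.
_DEFAULT_METADATA = {"scientific_name": "Labeo rohita", "habitat": "Freshwater"}

try:
    # Succeeds only when species.py and its PyTorch dependencies import cleanly.
    from species import SPECIES_METADATA
except ImportError:
    # Lightweight fallback so history serialization works without PyTorch.
    SPECIES_METADATA = {"Rohu Carp": dict(_DEFAULT_METADATA)}


def lookup_metadata(species_name):
    """Read from the module-level mapping instead of importing at call time."""
    return SPECIES_METADATA.get(species_name, _DEFAULT_METADATA)
```

Keeping the lookup free of imports means a missing PyTorch install degrades the metadata rather than crashing the history endpoint.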

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/main.py` at line 35, The import of species module inside the _row_to_payload() function bypasses the top-level PyTorch gating mechanism, which means history serialization can crash if PyTorch/TorchVision is unavailable. Move the species module import and SPECIES_METADATA extraction to the top-level guarded import block (where the try-except handles PyTorch availability), define a lightweight fallback SPECIES_METADATA in the except branch for when PyTorch is unavailable, then replace the dynamic import inside _row_to_payload() with a reference to the module-level SPECIES_METADATA variable to ensure it uses the pre-initialized value.

coderabbitai · 2026-06-18T21:35:26Z

+import torch
+import torch.nn as nn
+from torchvision import models, transforms


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Locate dependency declarations and show torch/torchvision pins with context.
fd -i '^(requirements.*|pyproject\.toml|setup\.py|poetry\.lock|uv\.lock|Pipfile|Pipfile\.lock|Dockerfile.*)$' \
  --exec sh -c 'echo "### $1"; rg -n -C2 "torch|torchvision" "$1" || true' sh {}

Repository: jpdevhub/FreshScanAi

Length of output: 1064

Fix incompatible torch/torchvision version constraint in requirements.txt.

The Dockerfile correctly pins torch==2.2.2 and torchvision==0.17.2 (a compatible pair), but requirements.txt pins torchvision>=0.27.0, which requires torch>=2.4.0. This mismatch will cause import failures when installing from requirements.txt without the Dockerfile context.

Update requirements.txt to pin torchvision>=0.17.0,<0.18.0 (or use torchvision==0.17.2 to match Dockerfile) to align with torch>=2.2.0.
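
A small check like the following could catch this kind of drift in CI. It is a sketch under assumptions: the pairing table lists only a few release lines as understood from upstream release notes and should be verified against the official compatibility matrix, and `is_compatible_pair` is a hypothetical helper.

```python
# Assumed torch -> torchvision minor-version pairings (verify upstream before use).
COMPATIBLE_MINORS = {
    (2, 2): (0, 17),
    (2, 3): (0, 18),
    (2, 4): (0, 19),
}


def _major_minor(version):
    """Parse '2.2.2' or '2.2.2+cpu' into (2, 2)."""
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def is_compatible_pair(torch_version, torchvision_version):
    """True when the pair appears in the assumed table above."""
    expected = COMPATIBLE_MINORS.get(_major_minor(torch_version))
    return expected is not None and _major_minor(torchvision_version) == expected
```

Running it against the versions pinned in the Dockerfile and in requirements.txt would flag the mismatch described above.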

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/species.py` around lines 8 - 10, The torchvision version constraint in requirements.txt is incompatible with the torch version specified. Update the torchvision constraint in requirements.txt from the current incompatible version specification to match the compatible version pair pinned in the Dockerfile (torchvision should be pinned to version 0.17.2 or use a range like >=0.17.0,<0.18.0 to align with torch>=2.2.0). This ensures that both the Dockerfile and requirements.txt will install compatible versions of torch and torchvision, preventing import failures when installing from requirements.txt.

coderabbitai · 2026-06-18T21:35:26Z

+    path = Path(weights_path)
+    if not path.exists():
+        print(f"WARNING: Species model not found at {path}. Using default species.")
+        return
+
+    _species_model = get_species_model()
+    checkpoint = torch.load(path, map_location=device, weights_only=True)
+
+    if isinstance(checkpoint, dict) and "model_state_dict" in checkpoint:
+        _species_model.load_state_dict(checkpoint["model_state_dict"])
+    else:
+        _species_model.load_state_dict(checkpoint)
+
+    _species_model.to(device)
+    _species_model.eval()
+    _species_loaded = True


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep species loading optional when the checkpoint is present but invalid.

A bad or incompatible weights file currently raises from torch.load() / load_state_dict() and aborts startup, even though species classification is intended to fall back when unavailable. Reset the globals and return the fallback path on load failure.

Proposed defensive load handling

     path = Path(weights_path)
     if not path.exists():
         print(f"WARNING: Species model not found at {path}. Using default species.")
+        _species_model = None
+        _species_loaded = False
         return

-    _species_model = get_species_model()
-    checkpoint = torch.load(path, map_location=device, weights_only=True)
-
-    if isinstance(checkpoint, dict) and "model_state_dict" in checkpoint:
-        _species_model.load_state_dict(checkpoint["model_state_dict"])
-    else:
-        _species_model.load_state_dict(checkpoint)
-
-    _species_model.to(device)
-    _species_model.eval()
-    _species_loaded = True
+    try:
+        _species_model = get_species_model()
+        checkpoint = torch.load(path, map_location=device, weights_only=True)
+
+        if isinstance(checkpoint, dict) and "model_state_dict" in checkpoint:
+            _species_model.load_state_dict(checkpoint["model_state_dict"])
+        else:
+            _species_model.load_state_dict(checkpoint)
+
+        _species_model.to(device)
+        _species_model.eval()
+        _species_loaded = True
+    except Exception as exc:
+        print(f"WARNING: Failed to load species model from {path}: {exc}. Using default species.")
+        _species_model = None
+        _species_loaded = False
+        return

     print(f"Species model loaded from {path}")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/species.py` around lines 89 - 104, The torch.load() and load_state_dict() calls in the species model loading section can raise exceptions from invalid or incompatible checkpoint files, which will abort the entire startup process. Wrap the checkpoint loading logic (from torch.load through the load_state_dict call and model preparation) in a try-except block. When any exception occurs during loading, catch it, reset the global variables _species_model and _species_loaded to their initial state, log a warning message about the load failure, and return early. This ensures species classification remains optional and the application can continue with the fallback behavior when weights are invalid or incompatible.

coderabbitai · 2026-06-18T21:35:26Z

+    checkpoint = torch.load(path, map_location=device, weights_only=True)
+
+    if isinstance(checkpoint, dict) and "model_state_dict" in checkpoint:
+        _species_model.load_state_dict(checkpoint["model_state_dict"])
+    else:
+        _species_model.load_state_dict(checkpoint)


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Persist and validate the class-index mapping with the checkpoint.

Inference maps top_idx directly into SPECIES_LABELS, but the training snippet saves only a bare state_dict, so no label order is available to validate. If the dataset class order differs from SPECIES_LABELS, every prediction can be assigned the wrong species. Save species_labels or class_to_idx with the checkpoint and reject/remap mismatches during load.

Also applies to: 137-146
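
The validation half of this suggestion can be sketched without PyTorch, treating the checkpoint as a plain dict. The `species_labels` key and the `validate_checkpoint_labels` helper are hypothetical; the training script would have to save that key alongside `model_state_dict` for this to work.

```python
def validate_checkpoint_labels(checkpoint, expected_labels):
    """Return the state dict only if the saved label order matches exactly."""
    if not isinstance(checkpoint, dict) or "model_state_dict" not in checkpoint:
        # A bare state_dict carries no label order, so it cannot be validated.
        raise ValueError("checkpoint has no label metadata to validate")
    saved = checkpoint.get("species_labels")  # hypothetical key
    if saved is None:
        raise ValueError("checkpoint is missing 'species_labels'")
    if list(saved) != list(expected_labels):
        raise ValueError(
            f"label order mismatch: checkpoint={list(saved)}, "
            f"expected={list(expected_labels)}"
        )
    return checkpoint["model_state_dict"]
```

Rejecting on mismatch is the simpler choice. Remapping output indices is possible too, but it adds a second place where the order can go wrong.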

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/species.py` around lines 95 - 100, The checkpoint loading code (around line 95-100 in the conditional branches checking for "model_state_dict") does not validate that the class-to-index mapping used during training matches the current SPECIES_LABELS ordering. Modify the checkpoint saving code (referenced in lines 137-146) to include the species_labels or class_to_idx mapping alongside the model_state_dict. Then in the checkpoint loading code, extract and validate this saved mapping against the current SPECIES_LABELS, and either reject the checkpoint if there is a critical mismatch or apply a remapping transformation to correct the model output indices before mapping them to species names during inference.

coderabbitai · 2026-06-18T21:35:26Z

+If no dataset is available, the script generates synthetic training data
+from publicly available fish images for demonstration purposes.
+"""


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the docstring fallback claim (or implement it).

Lines [21]-[23] say synthetic data is auto-generated, but Lines [64]-[69] hard-exit when dataset folders are missing. This is misleading for CLI users.

Suggested patch

-If no dataset is available, the script generates synthetic training data
-from publicly available fish images for demonstration purposes.
+This script requires a prepared dataset on disk and does not
+auto-generate or download training data.

Also applies to: 64-69

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/train_species.py` around lines 21 - 23, The docstring in lines 21-23 claims the script generates synthetic training data when no dataset is available, but the actual code at lines 64-69 calls sys.exit(1) when dataset folders are missing, contradicting this promise. Either update the docstring to accurately reflect that the script exits when dataset folders are not found, or implement the synthetic data generation logic that the docstring claims exists. Choose whichever approach aligns with the intended behavior and make sure the documentation and actual behavior match.

coderabbitai · 2026-06-18T21:35:26Z

+    train_dataset = datasets.ImageFolder(str(train_dir), transform=train_transform)
+    val_dataset = datasets.ImageFolder(str(val_dir), transform=val_transform) if val_dir.exists() else None
+
+    # Verify class mapping matches our labels
+    class_to_idx = train_dataset.class_to_idx
+    print(f"Found {len(class_to_idx)} classes: {list(class_to_idx.keys())}")
+
+    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
+    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0) if val_dataset else None
+
+    # ── Model ───────────────────────────────────────────────────────────────
+    num_classes = len(class_to_idx)
+    model = get_species_model(num_classes)
+    model.to(device)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate the key files mentioned
git ls-files | grep -E "(train_species|species\.py|backend)" | head -20

Repository: jpdevhub/FreshScanAi

Length of output: 570

🏁 Script executed:

# Find SPECIES_LABELS and NUM_SPECIES definitions
rg -n "SPECIES_LABELS|NUM_SPECIES" --type py

Repository: jpdevhub/FreshScanAi

Length of output: 619

🏁 Script executed:

# Examine train_species.py to understand the full context
fd "train_species.py" --type f -exec wc -l {} \;

Repository: jpdevhub/FreshScanAi

Length of output: 251

🏁 Script executed:

# Get the actual SPECIES_LABELS definition
sed -n '17,42p' backend/species.py

Repository: jpdevhub/FreshScanAi

Length of output: 1093

🏁 Script executed:

# Check the train_species.py code around lines 71-84 and the docstring
head -100 scripts/train_species.py

Repository: jpdevhub/FreshScanAi

Length of output: 3468

🏁 Script executed:

# Check how ImageFolder class_to_idx is ordered
rg -A 10 "ImageFolder.*target_transform" --type py

Repository: jpdevhub/FreshScanAi

Length of output: 46

🏁 Script executed:

# Check how ImageFolder sorts classes - test with Python
python3 << 'PY'
from torchvision import datasets
import tempfile
import os
from pathlib import Path

# Create a temp directory with the species names
with tempfile.TemporaryDirectory() as tmpdir:
    species = ["Rohu Carp", "Catla Carp", "Mrigal Carp", "Pangas", "Basa", "Tilapia", "Pomfret", "Kingfish", "Mackerel", "Sardine"]
    for s in species:
        os.makedirs(os.path.join(tmpdir, s), exist_ok=True)
        # Create a dummy file
        Path(os.path.join(tmpdir, s, "dummy.txt")).touch()

    # Create ImageFolder and check class_to_idx
    folder = datasets.ImageFolder(tmpdir)
    print("ImageFolder class_to_idx:")
    for label, idx in sorted(folder.class_to_idx.items(), key=lambda x: x[1]):
        print(f" {idx}: {label}")
PY

Repository: jpdevhub/FreshScanAi

Length of output: 188

🏁 Script executed:

# Check the inference code to confirm how it decodes logits
sed -n '125,150p' backend/species.py

Repository: jpdevhub/FreshScanAi

Length of output: 881

🏁 Script executed:

# Check if there's any remapping or label handling in the training script
rg -A 20 "class_to_idx" scripts/train_species.py

Repository: jpdevhub/FreshScanAi

Length of output: 1201

🏁 Script executed:

# Verify ImageFolder's alphabetical sorting behavior by checking documentation/behavior
python3 << 'PY'
# ImageFolder sorts class names alphabetically by default
species_list = ["Rohu Carp", "Catla Carp", "Mrigal Carp", "Pangas", "Basa", "Tilapia", "Pomfret", "Kingfish", "Mackerel", "Sardine"]
sorted_species = sorted(species_list)

print("ImageFolder alphabetical sort (expected class_to_idx order):")
for i, s in enumerate(sorted_species):
    print(f" {i}: {s}")

print("\nSPECIES_LABELS order (expected by inference):")
for i, s in enumerate(species_list):
    print(f" {i}: {s}")

print("\nMismatch example:")
print(f" ImageFolder index 0 = '{sorted_species[0]}' but SPECIES_LABELS[0] = '{species_list[0]}'")
PY

Repository: jpdevhub/FreshScanAi

Length of output: 532

🏁 Script executed:

# Check how get_species_model handles num_classes parameter
sed -n '50,80p' backend/species.py

Repository: jpdevhub/FreshScanAi

Length of output: 1051

🏁 Script executed:

# Verify the docstring claim about synthetic data generation
rg -B 5 -A 10 "synthetic" scripts/train_species.py

Repository: jpdevhub/FreshScanAi

Length of output: 436

Enforce a stable label-index contract before training.

Training uses ImageFolder, which sorts class names alphabetically (Basa, Catla Carp, ..., Tilapia), but inference decodes logits directly using the fixed SPECIES_LABELS order (Rohu Carp, Catla Carp, ..., Sardine). This causes silent misclassification: e.g., "Rohu Carp" images receive index 7 during training, but index 7 is decoded as "Kingfish" at inference. Line 82 also allows num_classes to vary, creating checkpoint shape mismatches.

Remap ImageFolder indices to match SPECIES_LABELS order via target_transform before creating dataloaders, and use num_classes = NUM_SPECIES instead of len(class_to_idx).

Additionally, the module docstring (lines 8–15) claims the script generates synthetic training data as a fallback, but the code exits with sys.exit(1) when train_dir is missing. Either implement the fallback or correct the docstring.

Suggested patch for label remapping

+    from species import SPECIES_LABELS, NUM_SPECIES
+
     train_dataset = datasets.ImageFolder(str(train_dir), transform=train_transform)
     val_dataset = datasets.ImageFolder(str(val_dir), transform=val_transform) if val_dir.exists() else None

     # Verify class mapping matches our labels
     class_to_idx = train_dataset.class_to_idx
+    expected_labels = set(SPECIES_LABELS)
+    train_labels = set(class_to_idx.keys())
+    if train_labels != expected_labels:
+        missing = sorted(expected_labels - train_labels)
+        extra = sorted(train_labels - expected_labels)
+        raise ValueError(
+            f"Dataset labels must match SPECIES_LABELS exactly. missing={missing}, extra={extra}"
+        )
+
+    species_idx = {label: i for i, label in enumerate(SPECIES_LABELS)}
+    train_remap = {idx: species_idx[label] for label, idx in class_to_idx.items()}
+    train_dataset.target_transform = lambda idx: train_remap[idx]
+
+    if val_dataset is not None:
+        val_labels = set(val_dataset.class_to_idx.keys())
+        if val_labels != expected_labels:
+            raise ValueError("Validation labels must match SPECIES_LABELS exactly.")
+        val_remap = {idx: species_idx[label] for label, idx in val_dataset.class_to_idx.items()}
+        val_dataset.target_transform = lambda idx: val_remap[idx]
+
     print(f"Found {len(class_to_idx)} classes: {list(class_to_idx.keys())}")

     train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
     val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0) if val_dataset else None

     # ── Model ───────────────────────────────────────────────────────────────
-    num_classes = len(class_to_idx)
+    num_classes = NUM_SPECIES

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/train_species.py` around lines 71 - 84, The ImageFolder class sorts class names alphabetically, creating indices that don't match the fixed SPECIES_LABELS order used at inference, causing silent misclassifications. Create a target_transform function that maps the alphabetically-sorted indices from ImageFolder to the SPECIES_LABELS order, then apply it to both train_dataset and val_dataset when creating the ImageFolder instances. Additionally, replace num_classes = len(class_to_idx) with num_classes = NUM_SPECIES to ensure the model architecture matches the expected number of species classes and prevents checkpoint shape mismatches.

jpdevhub · 2026-06-22T08:42:37Z

Where is the model? Please share a Drive link, or if the file is small, you can push it here.

feat: add fish species classification model with MobileNetV3-Small (c…

763f1a5

…loses jpdevhub#2)

github-actions Bot added the size/level: medium label Jun 18, 2026

coderabbitai Bot reviewed Jun 18, 2026
