Plant Disease Detection from Leaf Images

A machine learning project that detects onion plant diseases from leaf photos using MobileNetV2 transfer learning with Stratified K-Fold Cross-Validation and Oversampling for imbalanced datasets.

Course Project: Machine Learning
Dataset: Onion Diseases on Kaggle (Tejas Barguje Patil)

About the Project

This project uses a pre-trained MobileNetV2 convolutional neural network (CNN) with transfer learning to classify onion leaf images into 15 classes (listed under Disease Classes), one of which is Healthy Leaves.

Key features:

Stratified K-Fold Cross-Validation (5 folds) for fair, robust evaluation
Oversampling to handle severe class imbalance (e.g., 3,440 Healthy vs. 7 Bulb Rot images)
Multi-architecture comparison — MobileNetV2, ResNet50V2, and DenseNet121
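
How the first two features can fit together is sketched below. This is a minimal illustration, not the code in train_cv.py. It assumes that oversampling means random duplication of minority-class samples and that it is applied only to each fold's training split, so duplicated images never end up in the validation split. The toy labels, the `oversample_indices` helper, and the seed are hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def oversample_indices(indices, labels, rng):
    """Resample every class (with replacement) up to the largest class count."""
    fold_labels = labels[indices]
    classes, counts = np.unique(fold_labels, return_counts=True)
    target = counts.max()
    balanced = []
    for cls, count in zip(classes, counts):
        cls_idx = indices[fold_labels == cls]
        # Draw the missing samples from the same class; size is 0 for the majority class.
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        balanced.append(np.concatenate([cls_idx, extra]))
    out = np.concatenate(balanced)
    rng.shuffle(out)
    return out


# Toy labels standing in for the real dataset: 40 majority vs. 7 minority samples.
labels = np.array([0] * 40 + [1] * 7)
rng = np.random.default_rng(42)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

folds = []
for train_idx, val_idx in skf.split(np.zeros(len(labels)), labels):
    # Balance the training split only; the validation split keeps its natural distribution.
    train_balanced = oversample_indices(train_idx, labels, rng)
    folds.append((train_balanced, val_idx))
```

With only 7 images in the smallest class, 5 folds is close to the upper limit: StratifiedKFold needs at least as many samples per class as there are folds to place every class in every validation split.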

Training is done in two phases per fold:

Phase 1 — Train only the custom classification head (base model frozen)
Phase 2 — Fine-tune the top layers of the chosen backbone

Tech Stack:

Python 3.11 / 3.12
TensorFlow 2.20.0 / Keras 3
MobileNetV2, ResNet50V2, DenseNet121 (pre-trained on ImageNet)

Disease Classes

#	Class	Image Count
1	Alternaria_D	830
2	Botrytis Leaf Blight	289
3	Bulb Rot	7
4	Bulb_blight-D	394
5	Caterpillar-P	1,558
6	Downy Mildew	37
7	Fusarium-D	1,276
8	Healthy Leaves	3,440
9	Iris Yellow Virus Augment	1,899
10	onion1	132
11	Purple Blotch	847
12	Rust	213
13	stemphylium Leaf Blight	1,606
14	Virosis-D	512
15	Xanthomonas Leaf Blight	189
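
Counts like these can be reproduced by walking the dataset folder. The sketch below is one way to do it and is not necessarily how get_stats.py works: it assumes one subfolder per class inside raw_dataset/, and the extension list and the `count_images_per_class` name are hypothetical.

```python
import json
from pathlib import Path

# Assumed set of image extensions; adjust to match the files in your download.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(dataset_dir):
    """Return {class_folder_name: image_count} for each subfolder of dataset_dir."""
    root = Path(dataset_dir)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


dataset_dir = Path("raw_dataset")
if dataset_dir.is_dir():
    # Print the per-class counts as JSON when the dataset is present.
    print(json.dumps(count_images_per_class(dataset_dir), indent=2))
```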

Requirements

Python 3.11 or 3.12
Windows 10 / 11
At least 4 GB RAM
Internet connection (for downloading pre-trained weights)

Python Packages

tensorflow==2.20.0
numpy
matplotlib
scikit-learn
seaborn
Pillow
kaggle

Installation

Step 1 — Clone this repository

git clone https://github.com/dnjstr/plant-disease-detection.git
cd plant-disease-detection

Step 2 — Create and activate virtual environment

python -m venv venv
venv\Scripts\activate

Step 3 — Install dependencies

pip install -r requirements.txt

Step 4 — Download the dataset

Go to: https://www.kaggle.com/datasets/tejasbargujepatil/onion-diseases/data
Click Download (free Kaggle account required)
Extract the ZIP and rename the folder to raw_dataset
Place raw_dataset/ inside the project folder

How to Run

Run the scripts in this order:

1. Visualize the dataset (optional but recommended) — shows class imbalance before and after oversampling

python visualize_dataset.py

2. Train the model with Cross-Validation — runs 5-fold stratified CV with oversampling, saves best model per fold

python train_cv.py

3. Compare architectures (optional) — trains MobileNetV2, ResNet50V2, and DenseNet121 for comparison

python compare_architectures.py

4. Evaluate the model — prints accuracy and saves confusion matrix

python evaluate.py

5. Predict a leaf image — test with your own photo

python predict.py --image path/to/leaf.jpg

Or predict a whole folder of images:

python predict.py --folder path/to/images/
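
For readers who want to call a saved model from their own code, a single-image prediction might look like the sketch below. It is not the implementation of predict.py. It assumes that class_names.json is a JSON list ordered by model output index, that the saved model handles its own input scaling, and that images are resized to 224x224. The function names and default paths are illustrative.

```python
import json

import numpy as np


def top_k_predictions(probabilities, class_names, k=3):
    """Return the k most likely (class_name, probability) pairs, best first."""
    probs = np.asarray(probabilities, dtype=float)
    order = np.argsort(probs)[::-1][:k]
    return [(class_names[i], float(probs[i])) for i in order]


def predict_image(
    image_path,
    model_path="cv_models/model_fold_1.keras",
    class_names_path="class_names.json",
    image_size=(224, 224),
):
    """Load a saved fold model and classify one image."""
    import keras  # imported here so the helper above works without TensorFlow

    model = keras.models.load_model(model_path)
    with open(class_names_path) as f:
        class_names = json.load(f)

    img = keras.utils.load_img(image_path, target_size=image_size)
    batch = np.expand_dims(keras.utils.img_to_array(img), axis=0)
    probs = model.predict(batch, verbose=0)[0]
    return top_k_predictions(probs, class_names)


# Example call (requires a trained model and an image on disk):
# for name, prob in predict_image("path/to/leaf.jpg"):
#     print(f"{name}: {prob:.1%}")
```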

Project Structure

plant-disease-detection/
│
├── raw_dataset/                    ← download from Kaggle (not in repo)
│   ├── Alternaria_D/
│   ├── Healthy leaves/
│   └── ... (15 classes)
│
├── cv_models/                      ← auto-created by train_cv.py
│   ├── model_fold_1.keras
│   ├── model_fold_2.keras
│   └── ... (one model per fold)
│
├── train_cv.py                     ← main training script (Stratified K-Fold + Oversampling)
├── compare_architectures.py        ← compares MobileNetV2, ResNet50V2, DenseNet121
├── visualize_dataset.py            ← dataset distribution plots
├── predict.py                      ← predict on new images
├── evaluate.py                     ← model evaluation + confusion matrix
├── get_stats.py                    ← utility to count images per class
├── requirements.txt                ← all dependencies
├── class_names.json                ← saved class labels
├── dataset_stats.json              ← per-class image counts
│
├── dataset_distribution.png        ← class imbalance plot (after visualize_dataset.py)
├── model_comparison.png            ← architecture comparison chart (after compare_architectures.py)
└── confusion_matrix.png            ← confusion matrix (after evaluate.py)

Results

Running the scripts generates the following output files:

File	Description
`cv_models/model_fold_N.keras`	Best model saved for each CV fold
`cv_results.json`	Fold accuracies, mean, and std deviation
`dataset_distribution.png`	Before/after class balance visualization
`model_comparison.png`	Accuracy and speed comparison across architectures
`confusion_matrix.png`	Per-class prediction performance
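
The summary in `cv_results.json` boils down to a mean and a standard deviation over the per-fold accuracies. A minimal sketch of that calculation follows. The key names and the use of the sample standard deviation are assumptions, and the accuracies in the commented example are placeholders, not results from this project.

```python
import json
import statistics


def summarize_folds(fold_accuracies):
    """Summarize per-fold validation accuracies as mean and sample std deviation."""
    return {
        "fold_accuracies": list(fold_accuracies),
        "mean": statistics.mean(fold_accuracies),
        "std": statistics.stdev(fold_accuracies),  # sample std (n - 1 denominator)
    }


# Placeholder numbers for illustration only:
# print(json.dumps(summarize_folds([0.80, 0.85, 0.90]), indent=2))
```

A small standard deviation relative to the mean suggests that performance does not depend much on which images land in the validation split.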

Troubleshooting

Problem	Fix
`No module named tensorflow`	Run `venv\Scripts\activate` first
`FileNotFoundError: raw_dataset`	Download and place the dataset folder as described in Installation
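`Could not resolve host` during pip	Check your internet connection, DNS, and proxy settings. If pip instead fails with an SSL certificate error, add `--trusted-host pypi.org --trusted-host files.pythonhosted.org`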
Out of memory during training	Change `BATCH_SIZE = 32` to `BATCH_SIZE = 16` in `train_cv.py`
Slow training	Normal on CPU (~30–60 min per fold). Set `EPOCHS = 5` in `train_cv.py` for a quick test
Wrong model loaded in evaluate.py	Edit `MODEL_PATH` in `evaluate.py` to point to the fold you want

License

This project is for educational purposes only. Dataset credit: Tejas Barguje Patil on Kaggle.
