This project implements a multi-class semantic segmentation pipeline using a U-Net architecture in TensorFlow. It is designed for drone imagery and assigns each pixel to one of four classes:
- Background
- Water
- Vegetation
- Structure (buildings)
The pipeline includes:
- Data preprocessing (RGB masks → label masks)
- Augmentation using Albumentations
- Custom Dice + CrossEntropy loss
- Mean IoU evaluation metric
- Training visualization (loss, accuracy, IoU)
- Prediction on new images using a separate script
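The README does not specify how the two loss terms are combined. A common formulation, assumed here, adds mean pixel-wise categorical cross-entropy to a soft Dice loss (1 minus the class-averaged Dice coefficient). The NumPy sketch below shows only the arithmetic. The function name `dice_ce_loss`, the equal 1:1 weighting and the smoothing constant are illustrative assumptions. main.py would express the same idea with TensorFlow ops so that Keras can differentiate it.

```python
import numpy as np

def dice_ce_loss(y_true, y_pred, smooth=1e-6):
    """Hypothetical combined soft Dice + categorical cross-entropy loss.

    y_true: one-hot masks, shape (N, H, W, C)
    y_pred: softmax probabilities, same shape
    """
    # Cross-entropy: clip only inside the log to avoid log(0)
    clipped = np.clip(y_pred, 1e-7, 1.0 - 1e-7)
    ce = -np.mean(np.sum(y_true * np.log(clipped), axis=-1))

    # Soft Dice per class, summed over batch and spatial axes
    axes = (0, 1, 2)
    intersection = np.sum(y_true * y_pred, axis=axes)
    denom = np.sum(y_true, axis=axes) + np.sum(y_pred, axis=axes)
    dice_per_class = (2.0 * intersection + smooth) / (denom + smooth)
    dice_loss = 1.0 - np.mean(dice_per_class)

    # Equal weighting of the two terms (an assumption)
    return ce + dice_loss
```

For a perfect prediction, both terms are close to zero. With uniform probabilities over four classes, the cross-entropy term alone is ln 4 ≈ 1.386.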
https://drive.google.com/uc?id=1b88NGOW-7EgNQ1LLI0UHXE-KLOzaqnGm
```
├── main.py          # Training pipeline
├── predict.py       # Inference script
├── data.zip         # Dataset (images + masks)
├── processed/       # Preprocessed dataset (auto-generated)
├── predictions/     # Output predictions
├── final_model.h5   # Saved trained model
└── README.md
```
To use GPU acceleration, you need to install:
- NVIDIA GPU drivers
- CUDA Toolkit
- cuDNN (CUDA Deep Neural Network library)
Run:
nvidia-smi
If your GPU is listed, proceed.
Download CUDA Toolkit from: https://developer.nvidia.com/cuda-downloads
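Install the CUDA version that matches your TensorFlow version; TensorFlow's tested build configurations list the required CUDA and cuDNN versions. The paths below assume Windows. Note that TensorFlow 2.10 was the last release to support the GPU on native Windows. For newer versions, use WSL2.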
Download cuDNN from: https://developer.nvidia.com/cudnn
Steps:
- Extract the cuDNN folder.
- Copy its contents into the CUDA directory:

```
bin     → C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vXX.X\bin
lib     → ...\lib
include → ...\include
```
Add these CUDA paths to the system `PATH` environment variable:

```
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vXX.X\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vXX.X\libnvvp
```
Verify that TensorFlow detects the GPU:

```python
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
```

If a GPU device appears in the printed list, the setup is correct.
Create and activate a virtual environment.

Windows:

```
python -m venv .venv
.venv\Scripts\activate
```

Linux/macOS:

```
python3 -m venv .venv
source .venv/bin/activate
```
Install dependencies:
pip install -r requirements.txt
If you don’t have a requirements file, install manually:
pip install tensorflow opencv-python numpy matplotlib albumentations scikit-learn seaborn
Inside data.zip, the dataset should have the following structure (each mask has the same filename as its image):

```
input/
├── original_images/
│   ├── img1.jpg
│   └── img2.jpg
└── masked_images/
    ├── img1.jpg
    └── img2.jpg
```
- Each mask is a color-coded segmentation image
- Colors are mapped to class labels during preprocessing
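The color-to-label mapping can be sketched in NumPy using the colors from the class table. The masks are stored as JPEG, and compression can shift pixel colors slightly. For that reason, this sketch assigns each pixel to the nearest palette color instead of requiring an exact match. The name `rgb_to_label` is hypothetical, and main.py may match colors differently. The sketch assumes masks are loaded as RGB; OpenCV's `cv2.imread` returns BGR, so convert first.

```python
import numpy as np

# Class palette from the class table (index = label)
PALETTE = np.array([
    [169, 169, 169],  # 0 Background
    [14, 135, 204],   # 1 Water
    [124, 252, 0],    # 2 Vegetation
    [155, 38, 182],   # 3 Structure
], dtype=np.int32)

def rgb_to_label(mask_rgb):
    """Map an (H, W, 3) RGB mask to an (H, W) array of class labels.

    Each pixel gets the label of the nearest palette color (squared
    Euclidean distance), which tolerates small JPEG color shifts.
    """
    pixels = mask_rgb.astype(np.int32)[..., None, :]   # (H, W, 1, 3)
    dist = np.sum((pixels - PALETTE) ** 2, axis=-1)    # (H, W, K)
    return np.argmin(dist, axis=-1).astype(np.uint8)
```

For example, `rgb_to_label(cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB))` returns the label mask for one file.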
Run:
python main.py
This will:
- Extract dataset
- Preprocess images and masks
- Train the U-Net model
- Save the model as final_model.h5
- Display training graphs (loss, accuracy, IoU)
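The extraction step, and the pairing of each image with its mask, can be sketched with the standard `zipfile` and `pathlib` modules. The function name, the output directory `data` and the filename-based pairing are assumptions based on the dataset layout described above, not necessarily what main.py does.

```python
import zipfile
from pathlib import Path

def extract_and_pair(zip_path="data.zip", out_dir="data"):
    """Extract the archive (skipped if out_dir already exists) and
    return (image, mask) path pairs matched by filename."""
    out = Path(out_dir)
    if not out.exists():
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(out)

    img_dir = out / "input" / "original_images"
    mask_dir = out / "input" / "masked_images"

    pairs = []
    for img in sorted(img_dir.iterdir()):
        mask = mask_dir / img.name  # mask shares the image's filename
        if mask.exists():
            pairs.append((img, mask))
    return pairs
```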
The model reports:
- Accuracy (pixel-wise)
- Mean IoU (Intersection over Union)
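The NumPy sketch below is a reference for what Mean IoU measures. It builds a confusion matrix, computes per-class IoU = TP / (TP + FP + FN), and averages the result over classes. It assumes predictions have already been converted to class indices, for example with argmax over the softmax output. Classes with an empty union are skipped, similar to how `tf.keras.metrics.MeanIoU` ignores classes with a zero denominator. The function name is hypothetical.

```python
import numpy as np

def mean_iou(y_true, y_pred, num_classes=4):
    """Mean IoU from two integer label maps of equal shape."""
    t = np.asarray(y_true).ravel()
    p = np.asarray(y_pred).ravel()
    # Confusion matrix: rows = true class, columns = predicted class
    cm = np.bincount(num_classes * t + p, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)
    intersection = np.diag(cm)                                  # TP
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection      # TP + FP + FN
    valid = union > 0  # skip classes absent from both maps
    return float(np.mean(intersection[valid] / union[valid]))
```

Pixel-wise accuracy is simply `np.mean(y_true == y_pred)` on the same label maps.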
predict.py performs inference on new images.
You can use:
- A single image
- A folder of images
Set `INPUT_PATH` in predict.py to an image file or a folder:
INPUT_PATH = "test_images"
Then run:
python predict.py
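The single-image vs. folder handling can be sketched with `pathlib`. A file path yields one input, a folder yields every image inside it, and each output follows the `<name>_mask.png` naming used for saved predictions. The helper names and the set of accepted extensions are assumptions, not necessarily what predict.py does.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed set of accepted extensions

def collect_inputs(input_path):
    """Return a sorted list of image paths from a single file or a folder."""
    p = Path(input_path)
    if p.is_file():
        return [p]
    return sorted(f for f in p.iterdir() if f.suffix.lower() in IMAGE_EXTS)

def output_path(image_path, out_dir="predictions"):
    """Map e.g. image1.jpg to predictions/image1_mask.png."""
    return Path(out_dir) / f"{Path(image_path).stem}_mask.png"
```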
Predictions are saved in:

```
predictions/
├── image1_mask.png
└── image2_mask.png
```
Each output includes:
- Segmented mask (colored)
- Optional visualization (original + prediction)
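Colorizing a prediction is the inverse of the preprocessing step: you index a palette array with the label map. This sketch uses the colors from the class table. The `overlay` helper is a hypothetical alpha blend and is only one possible way to show the original image and the prediction together.

```python
import numpy as np

# Class palette from the class table (index = label)
PALETTE = np.array([
    [169, 169, 169],  # 0 Background
    [14, 135, 204],   # 1 Water
    [124, 252, 0],    # 2 Vegetation
    [155, 38, 182],   # 3 Structure
], dtype=np.uint8)

def label_to_rgb(labels):
    """Convert an (H, W) label map into an (H, W, 3) RGB image."""
    return PALETTE[labels]

def overlay(image_rgb, labels, alpha=0.5):
    """Blend the colored mask over the original image (alpha in [0, 1])."""
    color = label_to_rgb(labels).astype(np.float32)
    blended = (1.0 - alpha) * image_rgb.astype(np.float32) + alpha * color
    return blended.astype(np.uint8)
```

With a Keras model, `np.argmax(model.predict(batch), axis=-1)[0]` gives the (H, W) label map for the first image in the batch. When saving with OpenCV, convert back to BGR first with `cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR)`.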
| Label | Class | Color (RGB) |
|---|---|---|
| 0 | Background | (169,169,169) |
| 1 | Water | (14,135,204) |
| 2 | Vegetation | (124,252,0) |
| 3 | Structure | (155,38,182) |
Possible future improvements:
- Replace U-Net with DeepLabV3+
- Add class-wise weighting
- Real-time segmentation (video/webcam)
- Better class separation (e.g., road vs background)