MikhailArtemyev/AI-ImageClassificationProject

ResNet50 Image Classification Project

A deep learning implementation of the ResNet50 architecture for 6-class image classification using Keras/TensorFlow. The provided model is trained to recognize hand gestures.

Overview

This project implements a custom ResNet50 (Residual Network) architecture for classifying images into 6 different classes. ResNet50 is a deep convolutional neural network that uses residual connections to enable training of very deep networks while avoiding the vanishing gradient problem.

Architecture

ResNet50 Structure

The ResNet50 model consists of:

  • Input Layer: Accepts 64×64×3 RGB images
  • Initial Convolution: 7×7 conv layer with 64 filters, stride 2
  • 5 Main Stages: Each containing residual blocks with skip connections
  • Global Average Pooling: Reduces spatial dimensions
  • Dense Output Layer: 6-class softmax classifier

Key Components

1. Identity Block

Input → Conv1×1 → BN → ReLU → Conv3×3 → BN → ReLU → Conv1×1 → BN → Add(shortcut) → ReLU → Output
  • Used when input and output dimensions match
  • Implements the core residual learning: H(x) = F(x) + x
  • Contains 3 convolutional layers with batch normalization
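The identity block above can be sketched in Keras as follows. This is a minimal illustration of the diagrammed path, not the repo's exact code; the name identity_block and its signature are assumptions.

```python
from tensorflow.keras import layers

def identity_block(x, filters):
    """Bottleneck identity block: Conv1x1 -> Conv3x3 -> Conv1x1, each with BN,
    then Add(shortcut). `filters` is (F1, F2, F3); the input's channel count
    must already equal F3 so the shapes match at the Add."""
    f1, f2, f3 = filters
    shortcut = x                                  # unmodified skip path
    x = layers.Conv2D(f1, 1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f2, 3, padding="same")(x)   # "same" keeps spatial size
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f3, 1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([x, shortcut])               # H(x) = F(x) + x
    return layers.ReLU()(x)
```

Because the shortcut is added unchanged, the output shape equals the input shape.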

2. Convolutional Block

Input → Conv1×1 → BN → ReLU → Conv3×3 → BN → ReLU → Conv1×1 → BN
   ↓                                                              ↓
   → Conv1×1 → BN ────────────────────────────────────→ Add → ReLU → Output
  • Used when input and output dimensions differ
  • Includes a shortcut path with 1×1 convolution to match dimensions
  • Enables downsampling between stages
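A matching sketch of the convolutional block, again illustrative rather than the repo's exact code (the name conv_block and the stride default are assumptions). The only difference from the identity block is the projection shortcut, which lets both the channel count and the spatial size change.

```python
from tensorflow.keras import layers

def conv_block(x, filters, stride=2):
    """Bottleneck block with a projected shortcut. The 1x1 shortcut conv
    matches both channels (F3) and spatial size (via `stride`), enabling
    downsampling between stages."""
    f1, f2, f3 = filters
    shortcut = layers.Conv2D(f3, 1, strides=stride)(x)  # dimension-matching path
    shortcut = layers.BatchNormalization()(shortcut)
    x = layers.Conv2D(f1, 1, strides=stride)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f2, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f3, 1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([x, shortcut])
    return layers.ReLU()(x)
```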

3. Network Stages

Stage | Output Size | Blocks              | Filters         | Operations
------|-------------|---------------------|-----------------|------------------------------------
1     | 32×32       | 1 conv block        | [64,64,256]     | Initial feature extraction
2     | 32×32       | 1 conv + 2 identity | [64,64,256]     | Low-level features
3     | 16×16       | 1 conv + 3 identity | [128,128,512]   | Mid-level features, 2× downsample
4     | 8×8         | 1 conv + 5 identity | [256,256,1024]  | High-level features, 2× downsample
5     | 4×4         | 1 conv + 2 identity | [512,512,2048]  | Abstract features, 2× downsample
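The whole network can be assembled from the stage table by looping over the per-stage configuration. The sketch below is self-contained and folds the two block types into one helper (project=True gives the conv block, project=False the identity block); build_resnet50 and residual_block are illustrative names, not the repo's actual functions.

```python
from tensorflow.keras import layers, Model

def residual_block(x, filters, stride=1, project=False):
    """One bottleneck block. `project=True` adds the 1x1 shortcut conv
    (the "convolutional block"); otherwise the input itself is the shortcut
    (the "identity block")."""
    f1, f2, f3 = filters
    shortcut = x
    if project:
        shortcut = layers.Conv2D(f3, 1, strides=stride)(x)
        shortcut = layers.BatchNormalization()(shortcut)
    x = layers.Conv2D(f1, 1, strides=stride)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f2, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(f3, 1)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(layers.Add()([x, shortcut]))

def build_resnet50(input_shape=(64, 64, 3), num_classes=6):
    inputs = layers.Input(input_shape)
    x = layers.Conv2D(64, 7, strides=2, padding="same")(inputs)  # 64x64 -> 32x32
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    # (filters, identity blocks, stride) per residual stage, as in the table
    stage_cfg = [([64, 64, 256], 2, 1),      # stays 32x32
                 ([128, 128, 512], 3, 2),    # -> 16x16
                 ([256, 256, 1024], 5, 2),   # -> 8x8
                 ([512, 512, 2048], 2, 2)]   # -> 4x4
    for filters, n_identity, stride in stage_cfg:
        x = residual_block(x, filters, stride, project=True)
        for _ in range(n_identity):
            x = residual_block(x, filters)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return Model(inputs, outputs)
```

The resulting model has roughly 23M trainable parameters, in line with the Technical Details below.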

How It Works

1. Residual Learning

ResNet addresses the vanishing gradient problem in deep networks through skip connections:

  • Instead of learning H(x) directly, the network learns the residual F(x) = H(x) - x
  • The final output becomes H(x) = F(x) + x
  • This allows gradients to flow directly through skip connections during backpropagation

2. Feature Extraction Pipeline

  1. Early Layers: Extract low-level features (edges, textures)
  2. Middle Layers: Combine features into patterns and shapes
  3. Deep Layers: Learn high-level semantic representations
  4. Global Pooling: Aggregates spatial information
  5. Classifier: Maps features to class probabilities

3. Training Process

  • Data Preprocessing: Images normalized to [0,1] range
  • Data Splitting: 70% training, 30% testing
  • Loss Function: Categorical crossentropy for multi-class classification
  • Optimizer: Adam with learning rate 0.00015
  • Target: Automatically trains until >85% accuracy
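The preprocessing, split, and compile settings above can be sketched as one helper. This is a minimal illustration under the stated settings (Adam at 0.00015, categorical crossentropy, 70/30 split); the function name and argument names are assumptions, not the repo's API.

```python
import numpy as np
import tensorflow as tf

def prepare_and_compile(model, images, labels, num_classes=6, test_frac=0.30):
    """Normalize pixel values to [0, 1], one-hot encode the labels,
    split 70/30, and compile with the training settings described above."""
    x = np.asarray(images).astype("float32") / 255.0        # [0, 255] -> [0, 1]
    y = tf.keras.utils.to_categorical(labels, num_classes)  # one-hot for crossentropy
    n_train = int(len(x) * (1.0 - test_frac))               # 70% train / 30% test
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.00015),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return (x[:n_train], y[:n_train]), (x[n_train:], y[n_train:])
```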

Project Structure

project/
├── resnet_model.py           # Main implementation
├── load_images.py           # Image loading utility
├── images_with_labels/      # Training dataset
├── images/                  # Test images
└── resnet50_trained.keras   # Saved model

Key Functions

Model Creation

new_model(path_to_model, print_summary=False)
  • Creates fresh ResNet50 model
  • Compiles with Adam optimizer
  • Saves to specified path

Training

train_model(path_to_model, path_to_images)
train_()  # auto-trains until >85% accuracy on the test set
  • Loads dataset with 70/30 split
  • Trains for 10 epochs per iteration
  • Evaluates on test set
  • Saves improved model
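The train_() loop described above can be sketched like this: repeat 10-epoch rounds, evaluate on the held-out set, save, and stop once the target is passed. The function name train_until_target, the max_rounds safety cap, and the argument layout are assumptions for illustration.

```python
def train_until_target(model, train_data, test_data, target_acc=0.85,
                       epochs_per_round=10, max_rounds=20,
                       save_path="resnet50_trained.keras"):
    """Run repeated 10-epoch training rounds until test accuracy
    exceeds `target_acc`, saving the model after each round."""
    x_train, y_train = train_data
    x_test, y_test = test_data
    acc = 0.0
    for _ in range(max_rounds):                      # cap to avoid looping forever
        model.fit(x_train, y_train, epochs=epochs_per_round, verbose=0)
        _, acc = model.evaluate(x_test, y_test, verbose=0)
        model.save(save_path)                        # keep the latest weights
        if acc > target_acc:
            break
    return acc
```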

Prediction

predict(path_to_model, path_to_image)
  • Loads trained model
  • Preprocesses input image
  • Returns class probabilities and prediction
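The prediction path (load model, preprocess, predict) can be sketched as below. The name predict_image is hypothetical; the resize-to-64×64 and divide-by-255 steps mirror the preprocessing described above.

```python
import numpy as np
from tensorflow import keras

def predict_image(model_path, image_path):
    """Load a saved model, preprocess one image, and return the predicted
    class index along with the full softmax probability vector."""
    model = keras.models.load_model(model_path)
    img = keras.utils.load_img(image_path, target_size=(64, 64))  # resize to input size
    arr = keras.utils.img_to_array(img) / 255.0                   # [0, 255] -> [0, 1]
    probs = model.predict(arr[np.newaxis, ...], verbose=0)[0]     # batch of one
    return int(np.argmax(probs)), probs
```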

Usage

Creating a New Model

new_model('my_model.keras', print_summary=True)

Training

# Single training session
train_model('my_model.keras', 'path/to/images')

# Auto-train until target accuracy
train_()

Making Predictions

predict('resnet50_trained.keras', 'path/to/image.jpg')

Technical Details

  • Input Size: 64×64×3 RGB images
  • Classes: 6 (labeled 0-5)
  • Parameters: ~23M trainable parameters
  • Memory: Moderate GPU memory requirements
  • Training Time: Varies with dataset size and hardware

Dependencies

keras
tensorflow
numpy
matplotlib

Model Performance

The model automatically trains until achieving >85% test accuracy. Performance tracking includes:

  • Training/validation loss curves
  • Accuracy metrics per epoch
  • Stage-wise feature map dimensions
  • Final classification results

Acknowledgments

This work is heavily inspired by the DeepLearning.AI Convolutional Neural Networks course: https://www.coursera.org/learn/convolutional-neural-networks

About

Image recognition using Keras (September 2025).
