Archana-P-Nair/CaptionGenerator

🖼️ Image Caption Generator using Deep Learning

🚀 A full-stack Image Caption Generator that automatically generates human-like captions for images using a CNN + LSTM model, with a frontend UI and a backend API, deployed as a live web app.

🌟 Live Deployment

🔗 https://caption-generator-green.vercel.app/


📌 Project Overview

The goal of this project is to generate descriptive captions for images by learning both:

  • visual features from images, and
  • linguistic patterns from text data.

🧠 The system:

  • Accepts an image upload from the user
  • Extracts image features using a pre-trained CNN
  • Generates captions word-by-word using an LSTM
  • Serves predictions through a backend API
  • Displays results on a deployed frontend interface
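The word-by-word generation step can be sketched as a greedy decoding loop. This is a minimal illustration, not the repository's code: the `start` and `end` marker tokens, the `predict_next` callable (standing in for a call to the trained model), and the function name are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence


def generate_caption(
    predict_next: Callable[[List[int]], Sequence[float]],
    word_index: Dict[str, int],
    max_length: int,
    start_token: str = "start",  # assumed marker, not confirmed by the project
    end_token: str = "end",      # assumed marker, not confirmed by the project
) -> str:
    """Greedy decoding: repeatedly append the most probable next word."""
    index_word = {idx: word for word, idx in word_index.items()}
    tokens = [word_index[start_token]]
    words: List[str] = []
    for _ in range(max_length):
        probs = predict_next(tokens)
        # Index of the highest probability in the predicted distribution.
        next_id = max(range(len(probs)), key=probs.__getitem__)
        word = index_word.get(next_id)
        if word is None or word == end_token:
            break
        words.append(word)
        tokens.append(next_id)
    return " ".join(words)


# Toy stand-in for the trained model: start -> "a" -> "dog" -> end.
vocab = {"start": 1, "a": 2, "dog": 3, "end": 4}


def toy_predict(tokens: List[int]) -> List[float]:
    next_id = {1: 2, 2: 3, 3: 4}[tokens[-1]]
    return [1.0 if i == next_id else 0.0 for i in range(5)]


print(generate_caption(toy_predict, vocab, max_length=10))  # prints "a dog"
```

In a real pipeline, `predict_next` would presumably pad the token list and pass it to the model together with the image features.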

🧠 Model Architecture

This project uses a CNN + RNN (LSTM) hybrid deep learning architecture.

🔹 CNN – Image Feature Extraction

  • Model: Xception (pre-trained on ImageNet)
  • Input Size: 299 × 299
  • Output: 2048-dimensional feature vector
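A feature extractor matching the settings above could look roughly like this in Keras. It is a sketch under assumptions: the function names are hypothetical, and using global average pooling to obtain the 2048-dimensional vector is one common choice that may differ from what this project does. TensorFlow is imported inside the functions so the module can be loaded without it.

```python
INPUT_SIZE = (299, 299)  # Xception input size listed above
FEATURE_DIM = 2048       # length of the pooled feature vector


def build_feature_extractor():
    """Xception without its classifier head; average pooling yields 2048 values."""
    from tensorflow.keras.applications.xception import Xception

    return Xception(include_top=False, pooling="avg", weights="imagenet")


def extract_features(model, image_path):
    """Return an array of shape (1, 2048) for the image at `image_path`."""
    import numpy as np
    from tensorflow.keras.applications.xception import preprocess_input
    from tensorflow.keras.utils import img_to_array, load_img

    image = load_img(image_path, target_size=INPUT_SIZE)
    batch = np.expand_dims(img_to_array(image), axis=0)
    # preprocess_input scales pixel values to the range Xception expects.
    return model.predict(preprocess_input(batch), verbose=0)
```

Calling `extract_features(build_feature_extractor(), "photo.jpg")` (with a hypothetical file name) downloads the ImageNet weights on first use.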

🔹 RNN – Caption Generation

  • Embedding Layer: Converts tokens into dense vectors
  • LSTM Layer: Captures sequence and context
  • Dense + Softmax: Predicts next word from vocabulary

📌 Image features and text embeddings are merged to predict captions sequentially.
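One common way to wire up such a merge model in Keras is sketched below. The layer widths (256), dropout rates, the element-wise `add` merge, and the function name are assumptions for illustration; the README does not state the actual hyperparameters or merge operation.

```python
def build_caption_model(vocab_size: int, max_length: int, feature_dim: int = 2048):
    """Merge-style captioning model: image branch + text branch -> next word."""
    # Imported lazily so this module loads even without TensorFlow installed.
    from tensorflow.keras.layers import LSTM, Dense, Dropout, Embedding, Input, add
    from tensorflow.keras.models import Model

    # Image branch: compress the CNN feature vector (sizes are assumed).
    image_input = Input(shape=(feature_dim,))
    image_branch = Dropout(0.5)(image_input)
    image_branch = Dense(256, activation="relu")(image_branch)

    # Text branch: embed the partial caption and summarise it with an LSTM.
    text_input = Input(shape=(max_length,))
    text_branch = Embedding(vocab_size, 256, mask_zero=True)(text_input)
    text_branch = Dropout(0.5)(text_branch)
    text_branch = LSTM(256)(text_branch)

    # Merge both branches and predict a distribution over the vocabulary.
    merged = add([image_branch, text_branch])
    merged = Dense(256, activation="relu")(merged)
    output = Dense(vocab_size, activation="softmax")(merged)

    model = Model(inputs=[image_input, text_input], outputs=output)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model
```

At inference time the model is called once per generated word, each time with the same image features and the caption produced so far.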


🛠️ Tech Stack

🧠 Machine Learning

  • TensorFlow
  • Keras
  • CNN (Xception)
  • RNN / LSTM
  • NumPy, Pandas
  • Pickle

🌐 Backend

  • Python
  • FastAPI
  • Uvicorn
  • TensorFlow inference (model-serving) logic
  • REST API
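A FastAPI upload endpoint for this kind of backend might be shaped like the sketch below. The `/caption` route, the JSON response shape, and the `caption_image` callable are hypothetical; the README does not document the real API. File uploads in FastAPI also require the `python-multipart` package.

```python
def create_app(caption_image):
    """Build a FastAPI app around `caption_image(image_bytes) -> str`."""
    # Lazy import so this module loads even where FastAPI is not installed.
    from fastapi import FastAPI, File, HTTPException, UploadFile

    app = FastAPI()

    @app.post("/caption")  # hypothetical route name
    async def caption(file: UploadFile = File(...)):
        # Reject uploads that do not declare an image content type.
        if not (file.content_type or "").startswith("image/"):
            raise HTTPException(status_code=400, detail="Expected an image upload")
        image_bytes = await file.read()
        return {"caption": caption_image(image_bytes)}

    return app
```

Passing the captioning function in as an argument keeps the web layer separate from the model code, which makes the endpoint easy to exercise with a stub.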

🎨 Frontend

  • Next.js
  • React
  • Image Upload Interface
  • Caption Display UI

☁️ Deployment

  • Frontend: Vercel
  • Backend: Render (FastAPI)
  • Model: Loaded at runtime for inference
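Loading the model at runtime could be done once and cached, as in the sketch below. The file paths follow the project layout in this README; the function names are hypothetical, and the code assumes `tokenizer.p` is a pickled tokenizer object whose class is importable when it is loaded.

```python
import pickle
from functools import lru_cache
from pathlib import Path

MODEL_PATH = Path("models/model_9.h5")  # from the project structure
TOKENIZER_PATH = Path("tokenizer.p")    # from the project structure


def load_tokenizer(path=TOKENIZER_PATH):
    """Unpickle the tokenizer; only load pickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


@lru_cache(maxsize=1)
def get_artifacts():
    """Load the model and tokenizer on first call, then reuse them."""
    # Lazy import so this module loads even without TensorFlow installed.
    from tensorflow.keras.models import load_model

    return load_model(str(MODEL_PATH)), load_tokenizer()
```

Caching avoids re-reading the weights on every request, which matters on small hosting instances where loading is slow.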

🖥️ Application Features

✨ Upload any image
✨ Generate captions instantly
✨ Clean & responsive UI
✨ Backend-powered inference
✨ Fully deployed and accessible online


📁 Project Structure

CaptionGenerator/
│
├── frontend/
│   ├── app/
│   ├── components/
│   ├── public/
│   ├── styles/
│   └── package.json
│
├── backend/
│   ├── main.py
│   ├── caption_service.py
│   ├── utils.py
│   └── requirements.txt
│
├── models/
│   └── model_9.h5
│
├── tokenizer.p
└── README.md

🚀 Running Locally

🔧 Backend

pip install -r backend/requirements.txt
uvicorn backend.main:app --reload

🔧 Frontend

cd frontend
npm install
npm run dev
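Before starting the dev server, set the environment variable NEXT_PUBLIC_API_URL=http://localhost:8000 so the frontend calls the local backend (uvicorn listens on port 8000 by default).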

🔮 Future Enhancements

  • 🤖 Transformer-based captioning models
  • 🔍 Beam search decoding
  • 🌍 Multilingual captions
  • 🎥 Video captioning
  • 📊 Performance metrics (BLEU score)
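To illustrate what beam search decoding would change, here is a small self-contained sketch. It is not part of the project: `predict_next`, the token ids, and the toy probability table are hypothetical stand-ins for the trained model. Instead of keeping only the single best next word, it keeps the `beam_width` best partial captions by total log-probability.

```python
import math
from typing import Callable, List, Sequence, Tuple


def beam_search(
    predict_next: Callable[[List[int]], Sequence[float]],
    start_id: int,
    end_id: int,
    max_length: int,
    beam_width: int = 3,
) -> List[int]:
    """Return the highest-scoring token sequence found with beam search."""
    beams: List[Tuple[float, List[int]]] = [(0.0, [start_id])]
    for _ in range(max_length):
        candidates: List[Tuple[float, List[int]]] = []
        for score, tokens in beams:
            if tokens[-1] == end_id:
                candidates.append((score, tokens))  # finished, carry forward
                continue
            for token_id, prob in enumerate(predict_next(tokens)):
                if prob > 0.0:
                    candidates.append((score + math.log(prob), tokens + [token_id]))
        if not candidates:
            break
        # Keep only the beam_width best partial captions.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(tokens[-1] == end_id for _, tokens in beams):
            break
    return beams[0][1]


# Toy model: the locally best first word (id 1) leads to a worse full caption.
TABLE = {0: {1: 0.6, 2: 0.4}, 1: {3: 0.5, 4: 0.5}, 2: {5: 1.0}, 3: {5: 1.0}, 4: {5: 1.0}}


def toy_predict(tokens: List[int]) -> List[float]:
    return [TABLE[tokens[-1]].get(i, 0.0) for i in range(6)]


print(beam_search(toy_predict, 0, 5, 10, beam_width=1))  # greedy: [0, 1, 3, 5]
print(beam_search(toy_predict, 0, 5, 10, beam_width=2))  # beam:   [0, 2, 5]
```

With a beam width of 1 the search reduces to greedy decoding and ends on a caption with probability 0.3, while a width of 2 finds the caption with probability 0.4.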

👩‍💻 Author

Archana P Nair 🔗 GitHub: https://github.com/Archana-P-Nair

⭐ If you like this project, don’t forget to star the repo!

About

🌐🖼️ This project is a full-stack Image Caption Generator that uses a CNN (Xception) + LSTM deep learning architecture to generate meaningful, human-like captions for images. It features a Next.js frontend and a FastAPI backend, enabling real-time image captioning through a deployed web interface. The system leverages transfer learning and NLP
