🖼️ Image Caption Generator using Deep Learning

🚀 A full-stack Image Caption Generator that automatically generates meaningful, human-like captions for images using a CNN + LSTM model, served through a FastAPI backend and a Next.js frontend, and deployed for real-world use.

🌟 Live Deployment

🔗 https://caption-generator-green.vercel.app/

📌 Project Overview

The goal of this project is to generate descriptive captions for images by learning both:

visual features from images, and
linguistic patterns from text data.
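To learn those linguistic patterns, caption models of this kind are commonly trained on (image, caption prefix, next word) examples. The sketch below shows how one caption could be expanded into such pairs. The `startseq`/`endseq` markers, the `word_index` mapping, and left-padding with index 0 are assumptions for illustration, not details taken from this repository.

```python
from typing import Dict, List, Tuple

# Hypothetical start/end markers; the project's tokenizer may use different ones.
START, END = "startseq", "endseq"


def caption_to_pairs(
    caption: str, word_index: Dict[str, int], max_length: int
) -> List[Tuple[List[int], int]]:
    """Expand one caption into (left-padded prefix, next word id) training pairs."""
    words = [START] + caption.lower().split() + [END]
    ids = [word_index[w] for w in words if w in word_index]  # drop out-of-vocabulary words
    pairs = []
    for i in range(1, len(ids)):
        prefix = ids[:i][-max_length:]                      # keep at most max_length tokens
        padded = [0] * (max_length - len(prefix)) + prefix  # 0 is reserved for padding
        pairs.append((padded, ids[i]))
    return pairs


# Toy vocabulary (hypothetical) and one caption.
vocab = {START: 1, "a": 2, "dog": 3, "runs": 4, END: 5}
pairs = caption_to_pairs("A dog runs", vocab, max_length=4)
for prefix, target in pairs:
    print(prefix, "->", target)
```

In a setup like this, each pair would be combined with the image's feature vector, so a caption of n words yields n + 1 training examples.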

🧠 The system:

Accepts an image upload from the user
Extracts image features using a pre-trained CNN
Generates captions word-by-word using an LSTM
Serves predictions through a backend API
Displays results on a deployed frontend interface
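The word-by-word generation step can be sketched as a greedy decoding loop: feed the image features and the tokens produced so far to the model, take the most probable next word, and stop at an end marker or a length limit. In this sketch `predict_next` is a hypothetical stand-in for the trained CNN + LSTM model, and the `startseq`/`endseq` markers are assumed; the toy predictor only exists to make the loop runnable.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical start/end markers; the project's tokenizer may use different ones.
START, END = "startseq", "endseq"


def greedy_caption(
    image_features: Sequence[float],
    predict_next: Callable[[Sequence[float], List[int]], Sequence[float]],
    word_index: Dict[str, int],
    max_length: int = 32,
) -> str:
    """Generate a caption word by word, always taking the most probable next word."""
    index_word = {i: w for w, i in word_index.items()}
    tokens = [word_index[START]]
    words: List[str] = []
    for _ in range(max_length):
        probs = predict_next(image_features, tokens)           # scores for every vocabulary id
        best = max(range(len(probs)), key=lambda i: probs[i])  # greedy choice (argmax)
        word = index_word.get(best)
        if word is None or word == END:                        # stop at end marker or padding id
            break
        words.append(word)
        tokens.append(best)
    return " ".join(words)


# Toy stand-in for the CNN + LSTM model: it always emits "a dog runs endseq".
vocab = {START: 1, "a": 2, "dog": 3, "runs": 4, END: 5}
script = [2, 3, 4, 5]


def toy_predict(features: Sequence[float], tokens: List[int]) -> List[float]:
    nxt = script[min(len(tokens) - 1, len(script) - 1)]
    return [1.0 if i == nxt else 0.0 for i in range(len(vocab) + 1)]


caption = greedy_caption([0.0] * 2048, toy_predict, vocab)
print(caption)  # a dog runs
```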

🧠 Model Architecture

This project uses a CNN + RNN (LSTM) hybrid deep learning architecture.

🔹 CNN – Image Feature Extraction

Model: Xception (pre-trained on ImageNet)
Input Size: 299 × 299
Output: 2048-dimensional feature vector
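Before an image reaches Xception it has to be resized to 299 × 299 and scaled the way Keras' `tf.keras.applications.xception.preprocess_input` does it, which maps pixel values from [0, 255] to [-1, 1]. The NumPy sketch below reproduces that step; nearest-neighbour resizing is a simplification chosen to keep it dependency-free, and the project itself may resize differently (for example with PIL). The resulting batch would then go to an Xception model built with `include_top=False, pooling="avg"`, which returns one 2048-dimensional vector per image; that this project uses exactly this configuration is an assumption.

```python
import numpy as np


def xception_input(image: np.ndarray, size: int = 299) -> np.ndarray:
    """Resize an HxWx3 uint8 RGB image to size x size and scale pixels to [-1, 1]."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row (nearest neighbour)
    cols = np.arange(size) * w // size  # source column for each output column
    resized = image[rows[:, None], cols[None, :]].astype(np.float32)
    scaled = resized / 127.5 - 1.0      # same scaling as Keras' Xception preprocess_input
    return scaled[np.newaxis, ...]      # add a batch dimension: (1, size, size, 3)


batch = xception_input(np.full((480, 640, 3), 128, dtype=np.uint8))
print(batch.shape)  # (1, 299, 299, 3)
```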

🔹 RNN – Caption Generation

Embedding Layer: Converts tokens into dense vectors
LSTM Layer: Captures sequence and context
Dense + Softmax: Predicts next word from vocabulary
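The Embedding and LSTM layers can be illustrated with a plain NumPy forward pass: each token id selects a row of the embedding table, and the LSTM updates its hidden and cell state one token at a time. The weight shapes and the gate order (input, forget, candidate, output) follow the Keras `LSTM` weight layout; the random weights and sizes are hypothetical, and padding/masking is omitted for brevity.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def encode_caption_prefix(token_ids, embedding, W, U, b):
    """Embed a token sequence, run one LSTM layer over it, and return the final hidden state.

    embedding: (vocab_size, embed_dim); W: (embed_dim, 4*units); U: (units, 4*units); b: (4*units,)
    """
    units = U.shape[0]
    h = np.zeros(units)
    c = np.zeros(units)
    for t in token_ids:
        x = embedding[t]                      # Embedding: token id -> dense vector
        z = x @ W + h @ U + b                 # all four gate pre-activations at once
        i = sigmoid(z[:units])                # input gate
        f = sigmoid(z[units:2 * units])       # forget gate
        g = np.tanh(z[2 * units:3 * units])   # candidate cell state
        o = sigmoid(z[3 * units:])            # output gate
        c = f * c + i * g                     # update the cell state
        h = o * np.tanh(c)                    # new hidden state
    return h


rng = np.random.default_rng(0)
vocab_size, embed_dim, units = 10, 8, 16
embedding = rng.normal(size=(vocab_size, embed_dim))
W = rng.normal(scale=0.1, size=(embed_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
h = encode_caption_prefix([1, 4, 2], embedding, W, U, b)
print(h.shape)  # (16,)
```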

📌 Image features and text embeddings are merged to predict captions sequentially.
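One common way to merge the two branches, used in many Xception + LSTM captioning tutorials, is to project the 2048-dimensional image vector down to the LSTM size, add it element-wise to the LSTM output, and pass the sum through Dense layers ending in a softmax over the vocabulary. The sketch below assumes that "add" merge; the weight names, layer sizes, and random values are hypothetical and may differ from the project's actual model.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def next_word_probs(image_feat, text_state, params):
    """Merge the image vector with the LSTM's text state and score every vocabulary word."""
    img = relu(image_feat @ params["W_img"] + params["b_img"])  # project 2048-d image to d dims
    merged = img + text_state                                    # "add" merge of the two branches
    hidden = relu(merged @ params["W_dec"] + params["b_dec"])    # Dense layer
    return softmax(hidden @ params["W_out"] + params["b_out"])   # Dense + Softmax over the vocabulary


rng = np.random.default_rng(0)
feat_dim, d, vocab_size = 2048, 256, 50
params = {
    "W_img": rng.normal(scale=0.01, size=(feat_dim, d)), "b_img": np.zeros(d),
    "W_dec": rng.normal(scale=0.05, size=(d, d)), "b_dec": np.zeros(d),
    "W_out": rng.normal(scale=0.05, size=(d, vocab_size)), "b_out": np.zeros(vocab_size),
}
probs = next_word_probs(rng.random(feat_dim), np.tanh(rng.normal(size=d)), params)
print(probs.shape, round(float(probs.sum()), 6))  # (50,) 1.0
```

During generation, the index of the largest probability (or the top few, for beam search) selects the next word, which is appended to the text input for the following step.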

🛠️ Tech Stack

🧠 Machine Learning

TensorFlow
Keras
CNN (Xception)
RNN / LSTM
NumPy, Pandas
Pickle

🌐 Backend

Python
FastAPI
Uvicorn
TensorFlow model inference logic
REST API

🎨 Frontend

Next.js
React
Image Upload Interface
Caption Display UI

☁️ Deployment

Frontend: Vercel
Backend: Render (FastAPI)
Model: Loaded at runtime for inference

🖥️ Application Features

✨ Upload any image
✨ Generate captions instantly
✨ Clean & responsive UI
✨ Backend-powered inference
✨ Fully deployed and accessible online

📁 Project Structure

CaptionGenerator/
│
├── frontend/
│   ├── app/                
│   ├── components/         
│   ├── public/             
│   ├── styles/             
│   └── package.json        
│
├── backend/
│   ├── main.py            
│   ├── caption_service.py  
│   ├── utils.py            
│   └── requirements.txt    
│
├── models/
│   └── model_9.h5          
│
├── tokenizer.p            
└── README.md

🚀 Running Locally

🔧 Backend

pip install -r backend/requirements.txt
uvicorn backend.main:app --reload

🔧 Frontend

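cd frontend
npm install
Set the environment variable NEXT_PUBLIC_API_URL=http://localhost:8000 (for example, in frontend/.env.local) so the frontend can reach the local backend, then start the dev server:
npm run dev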

🔮 Future Enhancements

🤖 Transformer-based captioning models
🔍 Beam search decoding
🌍 Multilingual captions
🎥 Video captioning
📊 Performance metrics (BLEU score)
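Beam search, listed above as a planned enhancement, keeps the `beam_width` highest-scoring partial captions at each step instead of only the single best word, so it can recover a caption whose first word was not the top choice. The sketch below scores sequences by summed log-probability without length normalisation; `predict_next` and the token ids are hypothetical, and the toy predictor is built so that greedy decoding (`beam_width=1`) and a width-2 beam return different results.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple


def beam_search(
    predict_next: Callable[[List[int]], Sequence[float]],
    start_id: int,
    end_id: int,
    beam_width: int = 3,
    max_length: int = 20,
) -> List[int]:
    """Return the highest-scoring token sequence found by beam search."""
    beams: List[Tuple[float, List[int]]] = [(0.0, [start_id])]
    for _ in range(max_length):
        candidates: List[Tuple[float, List[int]]] = []
        for score, seq in beams:
            if seq[-1] == end_id:                # finished captions are carried over unchanged
                candidates.append((score, seq))
                continue
            for token, p in enumerate(predict_next(seq)):
                if p > 0:                        # extend with every possible next token
                    candidates.append((score + math.log(p), seq + [token]))
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if all(seq[-1] == end_id for _, seq in beams):
            break
    return max(beams, key=lambda c: c[0])[1]


# Toy model: ids 0=start, 1="A", 2="B", 3=end.
def toy(seq: List[int]) -> List[float]:
    if seq == [0]:
        return [0.0, 0.6, 0.4, 0.0]  # "A" looks best at first...
    if seq == [0, 1]:
        return [0.0, 0.3, 0.3, 0.4]  # ...but "A" then end only scores 0.6 * 0.4 = 0.24
    return [0.0, 0.0, 0.0, 1.0]      # "B" then end scores 0.4 * 1.0 = 0.4


print(beam_search(toy, 0, 3, beam_width=1))  # [0, 1, 3]
print(beam_search(toy, 0, 3, beam_width=2))  # [0, 2, 3]
```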

👩‍💻 Author

Archana P Nair 🔗 GitHub: https://github.com/Archana-P-Nair

⭐ If you like this project, don’t forget to star the repo!
