A full-stack Image Caption Generator that automatically produces descriptive, natural-language captions for images using a CNN + LSTM model, with a frontend UI and a backend API, deployed online for real-world use.
Live demo: https://caption-generator-green.vercel.app/
The goal of this project is to generate descriptive captions for images by learning both:
- visual features from images, and
- linguistic patterns from text data.
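Learning linguistic patterns from text starts with mapping caption words to integer ids. The sketch below is a minimal, pure-Python stand-in for the kind of tokenizer saved in `tokenizer.p`; the helpers `build_vocab` and `encode` are hypothetical, and it assumes lowercase whitespace tokenization with index 0 reserved for padding (the same convention as the Keras `Tokenizer`).

```python
from collections import Counter
from typing import Dict, List


def build_vocab(captions: List[str]) -> Dict[str, int]:
    """Map each word to an integer id, most frequent words first.

    Ids start at 1 because 0 is reserved for padding. Ties are broken
    alphabetically here (an assumption; Keras breaks ties differently).
    """
    counts = Counter(word for caption in captions for word in caption.lower().split())
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: idx for idx, word in enumerate(ordered, start=1)}


def encode(caption: str, word_index: Dict[str, int]) -> List[int]:
    """Convert a caption to ids, skipping out-of-vocabulary words."""
    return [word_index[w] for w in caption.lower().split() if w in word_index]


captions = ["A dog runs on grass", "A cat sits on a mat"]
vocab = build_vocab(captions)
print(encode("a dog sits", vocab))  # [1, 4, 8]
```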
The system:
- Accepts an image upload from the user
- Extracts image features using a pre-trained CNN
- Generates captions word-by-word using an LSTM
- Serves predictions through a backend API
- Displays results on a deployed frontend interface
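Word-by-word generation can be sketched as a greedy decoding loop: feed the image features plus the caption so far, pick the most probable next word, and stop at an end token or a length limit. This is a minimal sketch under stated assumptions, not the repository's code: `predict_next(features, padded_ids)` is a hypothetical wrapper around the trained model that returns next-word probabilities, and the `startseq`/`endseq` markers are an assumed convention common in Keras captioning examples.

```python
from typing import Callable, Dict, List, Sequence


def pad(seq: List[int], max_length: int) -> List[int]:
    """Keep the last max_length ids and left-pad with zeros (like Keras' default 'pre' padding)."""
    seq = seq[-max_length:]
    return [0] * (max_length - len(seq)) + seq


def greedy_caption(
    features: Sequence[float],
    predict_next: Callable[[Sequence[float], List[int]], Sequence[float]],
    word_index: Dict[str, int],
    max_length: int,
    start: str = "startseq",  # assumed start marker
    end: str = "endseq",      # assumed end marker
) -> str:
    index_word = {i: w for w, i in word_index.items()}
    tokens = [word_index[start]]
    words: List[str] = []
    for _ in range(max_length):
        probs = predict_next(features, pad(tokens, max_length))
        # Greedy choice: take the single most probable next word id.
        next_id = max(range(len(probs)), key=probs.__getitem__)
        word = index_word.get(next_id)
        if word is None or word == end:
            break
        tokens.append(next_id)
        words.append(word)
    return " ".join(words)


# Demo with a fake model that always continues startseq -> a -> dog -> runs -> endseq.
word_index = {"startseq": 1, "a": 2, "dog": 3, "runs": 4, "endseq": 5}
NEXT = {1: 2, 2: 3, 3: 4, 4: 5}


def fake_predict(features, padded):
    probs = [0.0] * 6
    probs[NEXT[padded[-1]]] = 1.0  # the last padded id is the latest token
    return probs


print(greedy_caption([0.0] * 2048, fake_predict, word_index, max_length=10))  # a dog runs
```

Greedy decoding is the simplest strategy; it commits to one word per step and never revisits earlier choices.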
This project uses a CNN + RNN (LSTM) hybrid deep learning architecture.
Image encoder (CNN):
- Model: Xception (pre-trained on ImageNet)
- Input size: 299 × 299
- Output: 2048-dimensional feature vector
Caption decoder (LSTM):
- Embedding layer: converts tokens into dense vectors
- LSTM layer: captures sequence and context
- Dense + softmax: predicts the next word from the vocabulary
The image features and the encoded partial caption are merged, and the model predicts the caption one word at a time.
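To train a model that predicts one word at a time, each caption is typically expanded into several (image features, partial caption) → next word examples. A minimal sketch, assuming captions are already integer-encoded and wrapped in start/end ids; `make_training_pairs` is a hypothetical helper, and in a real Keras pipeline the target id would usually be one-hot encoded and the features would be the 2048-dimensional Xception vector.

```python
from typing import List, Sequence, Tuple


def make_training_pairs(
    features: Sequence[float],
    caption_ids: List[int],
    max_length: int,
) -> List[Tuple[Sequence[float], List[int], int]]:
    """Expand one encoded caption into (features, left-padded prefix, next id) triples."""
    pairs = []
    for i in range(1, len(caption_ids)):
        prefix = caption_ids[:i][-max_length:]
        padded = [0] * (max_length - len(prefix)) + prefix  # 0 is the padding id
        pairs.append((features, padded, caption_ids[i]))
    return pairs


# Assumed ids: startseq=1, a=2, dog=3, runs=4, endseq=5
for _, seq, target in make_training_pairs([0.5, 0.1], [1, 2, 3, 4, 5], max_length=4):
    print(seq, "->", target)
# [0, 0, 0, 1] -> 2
# [0, 0, 1, 2] -> 3
# [0, 1, 2, 3] -> 4
# [1, 2, 3, 4] -> 5
```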
Tech stack:
Machine learning:
- TensorFlow
- Keras
- CNN (Xception)
- RNN / LSTM
- NumPy, Pandas
- Pickle
Backend:
- Python
- FastAPI
- Uvicorn
- TensorFlow model-serving logic
- REST API
Frontend:
- Next.js
- React
- Image upload interface
- Caption display UI
Deployment:
- Frontend: Vercel
- Backend: Render (FastAPI)
- Model: loaded by the backend at runtime for inference
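Since the model is loaded at runtime, a common pattern is to load it once per process and reuse it for every request instead of reloading it on each call. A minimal sketch using `functools.lru_cache`; `load_model_from_disk` is a hypothetical placeholder for the real, slow loading step (for example `tf.keras.models.load_model` on `models/model_9.h5`, plus `pickle.load` for `tokenizer.p`).

```python
import functools

LOAD_CALLS = 0  # counts real loads, only to demonstrate the caching


def load_model_from_disk(path: str) -> dict:
    """Hypothetical stand-in for the expensive model load."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return {"path": path}


@functools.lru_cache(maxsize=None)
def get_model(path: str = "models/model_9.h5") -> dict:
    """Load the model on first use, then return the cached instance."""
    return load_model_from_disk(path)


first = get_model()
second = get_model()
print(first is second, LOAD_CALLS)  # True 1
```

`lru_cache` keeps the cache itself thread-safe, but two threads hitting the very first call at the same moment could both trigger a load; loading eagerly at application startup avoids that.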
Features:
✨ Upload any image
✨ Generate captions instantly
✨ Clean & responsive UI
✨ Backend-powered inference
✨ Fully deployed and accessible online
Project structure:
CaptionGenerator/
│
├── frontend/
│   ├── app/
│   ├── components/
│   ├── public/
│   ├── styles/
│   └── package.json
│
├── backend/
│   ├── main.py
│   ├── caption_service.py
│   ├── utils.py
│   └── requirements.txt
│
├── models/
│   └── model_9.h5
│
├── tokenizer.p
└── README.md
Run locally:
Backend:
pip install -r backend/requirements.txt
uvicorn backend.main:app --reload
Frontend:
cd frontend
npm install
npm run dev
Set the environment variable NEXT_PUBLIC_API_URL=http://localhost:8000 so the frontend calls the local backend.
Future improvements:
- Transformer-based captioning models
- Beam search decoding
- Multilingual captions
- Video captioning
- Performance metrics (BLEU score)
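Beam search, listed above as a future improvement, keeps the `beam_width` most probable partial captions at each step instead of only the single best word. A minimal sketch, not the project's implementation: `predict_next(features, padded_ids)` is a hypothetical model wrapper returning next-word probabilities, `startseq`/`endseq` are assumed markers, and sequences are ranked by the sum of log-probabilities.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def pad(seq: List[int], max_length: int) -> List[int]:
    """Keep the last max_length ids and left-pad with zeros."""
    seq = seq[-max_length:]
    return [0] * (max_length - len(seq)) + seq


def beam_search_caption(
    features: Sequence[float],
    predict_next: Callable[[Sequence[float], List[int]], Sequence[float]],
    word_index: Dict[str, int],
    max_length: int,
    beam_width: int = 3,
    start: str = "startseq",  # assumed start marker
    end: str = "endseq",      # assumed end marker
) -> str:
    index_word = {i: w for w, i in word_index.items()}
    end_id = word_index[end]
    # Each beam is (sum of log-probabilities, token ids, finished?).
    beams: List[Tuple[float, List[int], bool]] = [(0.0, [word_index[start]], False)]
    for _ in range(max_length):
        candidates = []
        for score, tokens, done in beams:
            if done:
                candidates.append((score, tokens, True))  # finished captions carry over unchanged
                continue
            probs = predict_next(features, pad(tokens, max_length))
            for next_id, p in enumerate(probs):
                if p <= 0.0 or next_id not in index_word:
                    continue  # skip impossible words and the padding id
                candidates.append((score + math.log(p), tokens + [next_id], next_id == end_id))
        if not candidates:
            break
        # Keep only the beam_width highest-scoring sequences.
        beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:beam_width]
        if all(done for _, _, done in beams):
            break
    best_tokens = beams[0][1]
    return " ".join(index_word[t] for t in best_tokens[1:] if t != end_id)


# Fake model: next-word probabilities depend only on the latest token.
word_index = {"startseq": 1, "a": 2, "dog": 3, "cat": 4, "endseq": 5}
TABLE = {1: {2: 0.55, 3: 0.45}, 2: {3: 0.34, 4: 0.33, 5: 0.33}, 3: {5: 1.0}, 4: {5: 1.0}}


def fake_predict(features, padded):
    probs = [0.0] * 6
    for idx, p in TABLE[padded[-1]].items():
        probs[idx] = p
    return probs


print(beam_search_caption([], fake_predict, word_index, max_length=5, beam_width=2))  # dog
```

Here greedy decoding would commit to "a" (0.55) and end with "a dog" (probability 0.55 × 0.34 ≈ 0.19), while beam search finds "dog" (0.45). Raw log-probability sums favor shorter captions, so length normalization is a common refinement.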
👩‍💻 Author
Archana P Nair
GitHub: https://github.com/Archana-P-Nair
⭐ If you like this project, don't forget to star the repo!