📝 SummarizeIt — AI-Powered Summarization App

A full-stack AI app that summarizes long conversations and articles using a fine-tuned Pegasus transformer. It includes chat vs. paragraph detection, summary length control, a REST API, and a clean, responsive UI.

Tech Stack: Python • FastAPI • Transformers • PyTorch • Bootstrap • Docker

🎯 Try it Live • 🤖 Model • 🐛 Report Bug

📖 Inspired by recent research on fine-tuning Pegasus for dialogue summarization
(see the paper "Fine-tuning the Large Language Pegasus Model for Dialogue Summarization")

✨ Features

🧠 Fine-tuned Pegasus Model - Trained on SAMSum for summarizing dialogue and paragraphs
🤗 Hugging Face Integration - Model deployed on Hugging Face Hub, app hosted on Spaces
🎯 Smart Input Detection - Automatically detects chat vs. paragraph format
📏 Multiple Summary Lengths - Short, medium, and long summary options
🌐 Modern Web Interface - Clean, responsive design with real-time processing
📁 File Upload Support - Handle .txt and .md files up to 10MB
🔌 RESTful API - Complete API with OpenAPI documentation
📊 ROUGE Evaluation - Comprehensive model performance metrics
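
The file upload limits listed above (.txt and .md, up to 10MB) could be enforced with a small check like the one below. This is a minimal sketch, not the app's actual code: `validate_upload`, `ALLOWED_EXTENSIONS`, and `MAX_UPLOAD_BYTES` are hypothetical names, and reading "10MB" as 10 MiB is an assumption.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".txt", ".md"}   # extensions named in the feature list
MAX_UPLOAD_BYTES = 10 * 1024 * 1024    # assumption: "10MB" is read as 10 MiB


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the upload is not an accepted text file."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {suffix or 'none'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("File exceeds the 10MB limit")
```

Checking the extension case-insensitively means `NOTES.MD` is accepted alongside `notes.md`.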

🔄 How It Works

📊 Complete Workflow

graph TD
    A[📚 SAMSum Dataset] --> B[🔧 Data Preprocessing]
    B --> C[🧠 Fine-tune Pegasus Model]
    C --> D[📈 Model Evaluation]
    D --> E[💾 Save Fine-tuned Model]
    E --> F[🤗 Upload to Hugging Face Hub]
    F --> G[🚀 Deploy FastAPI App]
    G --> H[🌐 Hugging Face Spaces]
    
    I[👤 User Input] --> J{📝 Input Type Detection}
    J -->|Chat| K[💬 Chat Processing]
    J -->|Article| L[📄 Article Processing]
    K --> M[🤖 Pegasus Model]
    L --> M
    M --> N[📋 Generate Summary]
    N --> O[📊 Calculate Metrics]
    O --> P[✅ Return Results]
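
The input type detection step in the workflow above could be approximated with a simple heuristic: treat the text as a chat when most non-empty lines look like `Speaker: message`. The sketch below is an illustration only; the function name `detect_input_type`, the regular expression, and the 0.6 threshold are assumptions, and the app's real detection logic may differ.

```python
import re

# A line that looks like "Speaker: message" (speaker name of at most 30 characters).
SPEAKER_LINE = re.compile(r"^\s*[^\s:][^:\n]{0,29}:\s+\S")


def detect_input_type(text: str, threshold: float = 0.6) -> str:
    """Return "chat" when most non-empty lines look like speaker turns."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        # A single block of text is treated as a paragraph.
        return "paragraph"
    speaker_lines = sum(1 for line in lines if SPEAKER_LINE.match(line))
    return "chat" if speaker_lines / len(lines) >= threshold else "paragraph"
```

Using a ratio instead of requiring every line to match keeps an article with an occasional "Note: ..." line from being classified as a chat.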

🎯 Processing Pipeline

📥 Input Processing
- Detect input type (chat conversation vs. article)
- Clean and preprocess text
- Handle file uploads (.txt, .md)
🧠 AI Summarization
- Load fine-tuned Pegasus model from Hugging Face Hub
- Tokenize input text (max 1024 tokens)
- Generate summary based on selected length
- Apply post-processing filters
📊 Output Generation
- Calculate compression ratio
- Compute summary statistics
- Format response with metadata
- Return JSON response or web interface
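
One way to implement "generate summary based on selected length" is to map each option to length bounds for generation. The sketch below assumes this approach; `LENGTH_PRESETS` and `generation_kwargs` are hypothetical names, and the token bounds and beam count are illustrative values, not the ones the app uses. `min_length`, `max_length`, and `num_beams` are standard arguments of the Transformers `generate()` method.

```python
MAX_INPUT_TOKENS = 1024  # input limit stated in the pipeline above

# Hypothetical presets: the token bounds are illustrative only.
LENGTH_PRESETS = {
    "short": {"min_length": 10, "max_length": 40},
    "medium": {"min_length": 30, "max_length": 80},
    "long": {"min_length": 60, "max_length": 150},
}


def generation_kwargs(summary_length: str = "medium") -> dict:
    """Return keyword arguments that could be passed to generate()."""
    try:
        preset = LENGTH_PRESETS[summary_length]
    except KeyError:
        raise ValueError(f"Unknown summary length: {summary_length!r}") from None
    # Beam search with 4 beams is an illustrative choice.
    return {**preset, "num_beams": 4}
```

With a loaded tokenizer and model, the input would typically be tokenized with `truncation=True, max_length=MAX_INPUT_TOKENS` and the result of `generation_kwargs("short")` unpacked into the `generate()` call.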

🔧 Model Training Pipeline

graph LR
    A[📚 SAMSum Dataset<br/>16k+ conversations] --> B[🔧 Tokenization<br/>Max 1024 tokens]
    B --> C[🎯 Fine-tuning<br/>4 epochs]
    C --> D[📈 ROUGE Evaluation<br/>R-1, R-2, R-L]
    D --> E[💾 Model Export<br/>HuggingFace format]

Training Stats:

Dataset: 16,000+ chat conversations
Training Time: 2-4 hours (GPU) / 8-12 hours (CPU)
Model Size: ~2.3GB
Performance: 11.9% improvement in ROUGE-1 score
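
For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated summary and a reference summary. The function below is a simplified sketch of the F1 variant using whitespace tokenization; real ROUGE implementations add tokenization and stemming rules, so their scores will not match this exactly. The name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

A candidate that copies half of the reference words and adds nothing else has a precision of 1.0 and a recall of 0.5, which gives an F1 of about 0.67.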

🚀 Quick Start

🎯 Try Online (No Installation Required)

💻 Local Setup

Option 1: Manual Setup

git clone https://github.com/ananthakr1shnan/SummarizeIt.git
cd SummarizeIt
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt
python main.py setup
python main.py serve

Option 2: Docker

git clone https://github.com/ananthakr1shnan/SummarizeIt.git
cd SummarizeIt
docker-compose up --build

🌐 Access the app at: http://localhost:8000

📋 Requirements

Tech Stack:

Backend: Python 3.11+, FastAPI, Uvicorn
AI/ML: Transformers, PyTorch, Datasets
Frontend: Bootstrap, HTML/CSS/JavaScript
Deployment: Docker, Hugging Face Spaces
Model: Fine-tuned Pegasus (google/pegasus-cnn_dailymail)

System Requirements:

8GB+ RAM (for model training)
CUDA GPU (optional, for faster training)
Docker (optional)

🎯 Usage

Web Interface

1. Open http://localhost:8000
2. Paste text or upload a file
3. Select summary length and content type
4. Click "Generate Summary"
5. Copy and share your summary

API Usage

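import requests

# Send the text to a locally running SummarizeIt server
response = requests.post(
    "http://localhost:8000/summarize",
    json={
        "text": "Your long text here...",
        "summary_length": "medium",  # "short", "medium", or "long"
        "input_type": "auto",        # "auto", "chat", or "paragraph"
    },
    timeout=60,  # summarization can take a few seconds
)
response.raise_for_status()  # raise an exception on HTTP errors

result = response.json()
print(result["summary"])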

CLI Commands

python main.py setup      # Setup environment
python main.py train      # Train model (2-4 hours)
python main.py serve      # Start web server
python main.py evaluate   # Evaluate model
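
A command layout like the one above is commonly built with `argparse` sub-commands. The sketch below shows one way such a dispatcher could be wired; it is not the project's actual `main.py`, and `COMMANDS` and `build_parser` are hypothetical names.

```python
import argparse

# Command names and descriptions taken from the CLI list above.
COMMANDS = {
    "setup": "Setup environment",
    "train": "Train model",
    "serve": "Start web server",
    "evaluate": "Evaluate model",
}


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with one sub-command per CLI action."""
    parser = argparse.ArgumentParser(prog="main.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, help_text in COMMANDS.items():
        subparsers.add_parser(name, help=help_text)
    return parser
```

Calling `build_parser().parse_args(["serve"])` returns a namespace whose `command` attribute is `"serve"`, which the entry point could then dispatch on.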

🧠 Model Performance

ROUGE Scores (Fine-tuned vs Base)

| Metric | Base Model | Fine-tuned | Improvement |
|---|---|---|---|
| ROUGE-1 | 0.42 | 0.47 | +11.9% |
| ROUGE-2 | 0.19 | 0.23 | +21.1% |
| ROUGE-L | 0.34 | 0.39 | +14.7% |

⚠️ Note: Due to hardware constraints, the published model was fine-tuned for only 1 epoch, so the scores above come from that single epoch.
Training for the full 4 epochs on a more capable GPU would likely improve these results.
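
The improvement figures reported for the ROUGE scores are relative changes against the base model, not absolute differences. The short check below recomputes them from the base and fine-tuned scores; `relative_improvement` is a hypothetical helper name.

```python
def relative_improvement(base: float, fine_tuned: float) -> float:
    """Percentage change of the fine-tuned score relative to the base score."""
    return round((fine_tuned - base) / base * 100, 1)


# Base and fine-tuned scores as reported for each metric.
scores = {
    "ROUGE-1": (0.42, 0.47),
    "ROUGE-2": (0.19, 0.23),
    "ROUGE-L": (0.34, 0.39),
}

for metric, (base, tuned) in scores.items():
    print(f"{metric}: +{relative_improvement(base, tuned)}%")
```

For example, ROUGE-1 rises by 0.05 on a base of 0.42, which is a relative gain of about 11.9%.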

Performance Stats

Processing Speed: ~2-3 seconds per summary
Memory Usage: ~2GB GPU / ~4GB CPU
Batch Processing: ~1 second per text in batch

🔌 API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/summarize` | Summarize single text |
| `POST` | `/summarize/batch` | Summarize multiple texts |
| `POST` | `/summarize/file` | Upload and summarize file |
| `GET` | `/health` | API health check |
| `GET` | `/docs` | Interactive API documentation |

Request Format

{
  "text": "Your text here...",
  "summary_length": "medium",  // "short", "medium", "long"
  "input_type": "auto"         // "auto", "chat", "paragraph"
}
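
A client can validate these fields before sending a request. The helper below is a minimal sketch that builds the request body and rejects values outside the options listed above; `build_summarize_payload` is a hypothetical name, and the server may apply its own, different validation.

```python
SUMMARY_LENGTHS = {"short", "medium", "long"}
INPUT_TYPES = {"auto", "chat", "paragraph"}


def build_summarize_payload(
    text: str, summary_length: str = "medium", input_type: str = "auto"
) -> dict:
    """Build the JSON body for POST /summarize, validating the option fields."""
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    if summary_length not in SUMMARY_LENGTHS:
        raise ValueError(f"summary_length must be one of {sorted(SUMMARY_LENGTHS)}")
    if input_type not in INPUT_TYPES:
        raise ValueError(f"input_type must be one of {sorted(INPUT_TYPES)}")
    return {"text": text, "summary_length": summary_length, "input_type": input_type}
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.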

Response Format

{
  "summary": "Generated summary text...",
  "original_length": 450,
  "summary_length": 89,
  "compression_ratio": 0.2,
  "summary_type": "medium",
  "input_type": "paragraph"
}
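
The `compression_ratio` in this example is consistent with dividing the summary length by the original length and rounding to two decimals: 89 / 450 ≈ 0.198, which rounds to 0.2. The sketch below assumes that definition; the unit of the length fields (words or characters) is not stated, and `compression_ratio` here is a hypothetical helper rather than the app's code.

```python
def compression_ratio(original_length: int, summary_length: int) -> float:
    """Ratio of summary length to original length, rounded to two decimals."""
    if original_length <= 0:
        # Avoid division by zero for empty input.
        return 0.0
    return round(summary_length / original_length, 2)


print(compression_ratio(450, 89))
```

A lower ratio means a more aggressive summary; a value of 0.2 means the summary is about one fifth the length of the input.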

🎓 Training

Train your own model with custom data:

# Quick training with default settings
python main.py train

# Custom configuration in config/settings.py
NUM_EPOCHS = 4 # Use 4 for full training. Set to 1 if you're working with limited hardware
BATCH_SIZE = 2
LEARNING_RATE = 5e-5

The training process includes:

1. Download SAMSum dataset
2. Preprocessing and tokenization
3. Model training
4. ROUGE evaluation
5. Model saving
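
To get a feel for how `NUM_EPOCHS` and `BATCH_SIZE` affect training cost, the number of optimizer steps can be estimated as below. This is a rough sketch: `total_training_steps` is a hypothetical helper, it ignores gradient accumulation and multi-GPU setups, and it uses the 16,000 conversation figure as an upper bound because the exact size of the training split is not stated here.

```python
import math

NUM_EPOCHS = 4   # value from the configuration above
BATCH_SIZE = 2   # value from the configuration above


def total_training_steps(
    num_examples: int, batch_size: int = BATCH_SIZE, num_epochs: int = NUM_EPOCHS
) -> int:
    """Estimate optimizer steps: batches per epoch times number of epochs."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


print(total_training_steps(16_000))                # full 4-epoch run
print(total_training_steps(16_000, num_epochs=1))  # reduced 1-epoch run
```

Dropping from 4 epochs to 1 cuts the step count to a quarter, which is why a single epoch is the suggested fallback on limited hardware.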

🚀 Deployment

🤗 Hugging Face Deployment

Live App: Deployed on Hugging Face Spaces
Model Hub: Fine-tuned model available at Hugging Face Hub
Zero Setup: No installation required, just click and use!

📚 Reference

This project draws inspiration from:

Fine-tuning the Large Language Pegasus Model for Dialogue Summarization

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙋‍♂️ About Me

Hi, I'm Ananthakrishnan K — a B.Tech CS student passionate about Machine Learning and AI.
This project is part of my journey to master NLP and build real-world AI tools.

📫 Email • LinkedIn
