A Streamlit-based NLP application that intelligently summarizes PDF documents and allows users to ask questions about the content using free Hugging Face transformer models.
- PDF Upload: Upload a PDF and extract its text using PyMuPDF (`fitz`).
- Text Preview: View the extracted text in a scrollable box before processing.
- Chunking: Automatically splits long documents into 1000-word chunks for efficient summarization.
- Summarization: Generates concise document summaries using `facebook/bart-large-cnn`.
- Q&A System: Ask questions about the document. The system retrieves the most relevant text using semantic similarity (`sentence-transformers/all-MiniLM-L6-v2`) and answers with `deepset/roberta-base-squad2`.
- Real-time Feedback: Displays progress spinners, success messages, and organized output sections for a smooth user experience.
- Clean UI: Built with Streamlit; minimal, modern, and responsive.
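
The chunking and retrieval steps in the feature list can be sketched in plain Python. This is a minimal illustration, not the application's actual code: the function names `chunk_words`, `cosine_similarity`, and `most_relevant_chunk` are hypothetical, and the embedding vectors are assumed to come from the sentence-embedding model named above.

```python
import math
from typing import List, Sequence


def chunk_words(text: str, chunk_size: int = 1000) -> List[str]:
    """Split text into consecutive chunks of at most `chunk_size` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_relevant_chunk(
    question_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
) -> int:
    """Return the index of the chunk embedding closest to the question.

    Assumes `chunk_vecs` is non-empty; the vectors themselves would be
    produced by an embedding model (an assumption of this sketch).
    """
    scores = [cosine_similarity(question_vec, vec) for vec in chunk_vecs]
    return max(range(len(scores)), key=scores.__getitem__)
```

In a pipeline like the one described, each chunk would be summarized on its own, and the chunk selected by `most_relevant_chunk` would be passed as context to the question-answering model.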
- Frontend: Streamlit
- Backend / NLP: Hugging Face Transformers
- Models Used:
  - Summarization: `facebook/bart-large-cnn`
  - Sentence Embeddings: `sentence-transformers/all-MiniLM-L6-v2`
  - Question Answering: `deepset/roberta-base-squad2`
- Text Extraction: PyMuPDF (fitz)
- Language: Python
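
As a rough sketch of how the three models in this stack might be loaded: the `MODELS` mapping and the `load_models` helper are hypothetical names, and using the `sentence_transformers` package for the embedding model is an assumption (the project may load it differently). Imports are deferred so the module can be imported without the heavy dependencies installed.

```python
# Model identifiers taken from the stack list; the dict keys are illustrative.
MODELS = {
    "summarization": "facebook/bart-large-cnn",
    "embeddings": "sentence-transformers/all-MiniLM-L6-v2",
    "question_answering": "deepset/roberta-base-squad2",
}


def load_models():
    """Load the summarizer, embedder, and QA pipeline.

    Calling this downloads the model weights on first use, so it needs
    network access and the `transformers` and `sentence_transformers`
    packages (the latter is an assumption of this sketch).
    """
    from transformers import pipeline
    from sentence_transformers import SentenceTransformer

    summarizer = pipeline("summarization", model=MODELS["summarization"])
    embedder = SentenceTransformer(MODELS["embeddings"])
    qa = pipeline("question-answering", model=MODELS["question_answering"])
    return summarizer, embedder, qa
```

Loading once and reusing the returned objects avoids reloading the weights on every user interaction.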