An intelligent application that takes video input, transcribes the audio, and generates structured summaries and visual mindmaps using advanced AI models.
- Transcription: OpenAI Whisper (running locally).
- Summarization: Google Pegasus (Abstractive summarization).
- Intelligence: Groq Llama 3 (Entity extraction & Logic).
- Visualization: Automated Mermaid.js Mindmaps.
- Containerization: Fully Dockerized setup.
- Backend: FastAPI (Python)
- AI Models: Whisper, Pegasus-XSUM, SentenceTransformers
- Infrastructure: Docker & Docker Compose
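
The README does not show how the mindmaps are produced, so the following is only a minimal sketch of turning extracted topics into Mermaid mindmap text. The function name `render_mindmap` and the nested-dict input shape are assumptions made for illustration, not the project's actual code.

```python
def render_mindmap(title, tree, indent=2):
    """Render a nested dict as Mermaid mindmap text.

    `tree` maps each label to a dict of its children (an empty dict for a leaf).
    This input shape is hypothetical; the real project may use something else.
    """
    # Mermaid mindmaps express hierarchy through indentation under a root node.
    lines = ["mindmap", f"{' ' * indent}root(({title}))"]

    def walk(node, depth):
        for label, children in node.items():
            lines.append(f"{' ' * (indent * depth)}{label}")
            if children:
                walk(children, depth + 1)

    # Top-level topics sit one level below the root, hence depth 2.
    walk(tree, 2)
    return "\n".join(lines)


topics = {
    "Transcription": {"Whisper": {}},
    "Summarization": {"Pegasus": {}},
}
print(render_mindmap("Video Summary", topics))
```

Labels that contain brackets or parentheses may be read by Mermaid as node-shape syntax, so a real implementation would likely need to sanitize or quote them first.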
- Clone the repository.
- Create a `.env` file in `backend/` with your API key: `GROQ_API_KEY=your_key_here`
- Build and run with Docker: `docker-compose up --build`
- Access the API at `http://localhost:8000/docs`.
The first build downloads about 3 GB of AI models (Pegasus and Whisper), so it can take a while. Subsequent starts skip the download and are much faster.
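
Because the first build is slow, it can be worth confirming that `backend/.env` is in place before starting it. The script below is a rough sketch of such a check and is not part of the repository. The helper names `parse_env` and `has_groq_key` are hypothetical, and the parser handles only simple `KEY=value` lines.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_groq_key(env_path="backend/.env"):
    """Return True if the file exists and GROQ_API_KEY is not the placeholder."""
    path = Path(env_path)
    if not path.is_file():
        return False
    key = parse_env(path.read_text(encoding="utf-8")).get("GROQ_API_KEY", "")
    return bool(key) and key != "your_key_here"


if __name__ == "__main__":
    print("GROQ_API_KEY set:", has_groq_key())
```

Run it from the repository root so that the default relative path resolves to `backend/.env`.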