OmniMind: one AI, many capabilities
Analyze conversations, interpret images, and summarize documents, all in one place.
The AI Playground is a multi-modal application that allows authenticated users to explore and interact with various AI-powered skills in a single platform.
The core capabilities include:
- Conversation Analysis
  - Upload audio files.
  - Convert speech to text (STT).
  - Perform speaker diarization (up to 2 speakers) without relying on the STT vendor's diarization.
- Image Analysis
  - Upload images.
  - Generate detailed textual descriptions of the image content.
- Document/URL Summarization (present in the backend code but not integrated, due to time constraints)
  - Upload documents (PDF, DOC).
  - Provide URLs.
  - Obtain concise summaries of the content.
- Frontend: https://pilvotask.netlify.app/
- Backend API: https://pilvo.onrender.com
- Frontend: React + Vite (or chosen framework)
- Backend: Node.js + Express
- AI Models: Google Gemini API (via GEMINI_API_KEY)
- Hosting:
  - Frontend → Netlify
  - Backend → Render
✅ Speech-to-text transcription
✅ Speaker diarization (max 2 speakers) without vendor diarization APIs
✅ AI-powered image description
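This README does not describe how the two-speaker diarization is implemented. As a purely illustrative sketch, one vendor-independent approach is to cluster speech segments into two groups by a voice feature. The example below assumes each segment already carries a numeric `pitch` value (mean pitch in Hz, computed elsewhere); the function name `assignTwoSpeakers` and the segment shape are hypothetical and not taken from the project code.

```javascript
// Hypothetical input: one entry per speech segment, with a numeric voice
// feature (here: mean pitch in Hz) computed elsewhere. Not the project's API.
function assignTwoSpeakers(segments, iterations = 10) {
  if (segments.length === 0) return [];
  const pitches = segments.map((s) => s.pitch);
  // Start the two cluster centers at the lowest and highest pitch.
  let centers = [Math.min(...pitches), Math.max(...pitches)];

  let labels = [];
  for (let i = 0; i < iterations; i++) {
    // Assignment step: each segment goes to the nearest center.
    labels = pitches.map((p) =>
      Math.abs(p - centers[0]) <= Math.abs(p - centers[1]) ? 0 : 1
    );
    // Update step: move each center to the mean of its members.
    centers = centers.map((c, k) => {
      const members = pitches.filter((_, j) => labels[j] === k);
      return members.length
        ? members.reduce((a, b) => a + b, 0) / members.length
        : c;
    });
  }
  // Attach a speaker label to each segment; the lower-pitch cluster is "Speaker 1".
  return segments.map((s, j) => ({ ...s, speaker: `Speaker ${labels[j] + 1}` }));
}

const labeled = assignTwoSpeakers([
  { text: "Hi, how can I help?", pitch: 210 },
  { text: "My order is late.", pitch: 120 },
  { text: "Let me check that.", pitch: 205 },
  { text: "Thanks.", pitch: 125 },
]);
console.log(labeled.map((s) => `${s.speaker}: ${s.text}`).join("\n"));
```

A single pitch feature is a deliberate simplification; a real system would likely use richer voice features, but the two-cluster idea stays the same.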
git clone https://github.com/vijaykrishna483-cms/PILVO.git
cd PILVO
cd ai-playground-backend
npm install
- Create a .env file in ai-playground-backend and add:
GEMINI_API_KEY=your_gemini_api_key_here
- Start the backend server:
nodemon index.js
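To illustrate what the backend can do with `GEMINI_API_KEY` once it is set, here is a minimal sketch that builds (but does not send) an image-description request for the Gemini REST `generateContent` endpoint. It assumes the `.env` values have already been loaded into `process.env` (for example by the `dotenv` package). The helper name `buildImageDescriptionRequest`, the prompt text, and the model name are assumptions, not the project's actual code.

```javascript
// Builds (but does not send) a Gemini generateContent request that asks for
// an image description. MODEL is an assumption; use the model the project uses.
const MODEL = "gemini-2.0-flash";

function buildImageDescriptionRequest(apiKey, imageBuffer, mimeType) {
  if (!apiKey) {
    throw new Error("GEMINI_API_KEY is not set; add it to the backend .env file");
  }
  const url =
    `https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent`;
  const body = {
    contents: [
      {
        parts: [
          // The image travels inline as base64.
          { inline_data: { mime_type: mimeType, data: imageBuffer.toString("base64") } },
          { text: "Describe the content of this image in detail." },
        ],
      },
    ],
  };
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify(body),
    },
  };
}

// Usage: the key comes from the environment, as configured in .env.
const request = buildImageDescriptionRequest(
  process.env.GEMINI_API_KEY || "your_gemini_api_key_here",
  Buffer.from([0x89, 0x50, 0x4e, 0x47]), // placeholder bytes, not a real image
  "image/png"
);
console.log(request.url);
// To send it (Node 18+): fetch(request.url, request.init).then((r) => r.json())
```

Keeping the request builder separate from the network call makes it easy to inspect the payload before any API quota is spent.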
cd ../ai-playground-frontend
npm install
npm run dev
- Select a skill:
- Conversation Analysis: Upload an audio file → get transcription + diarization.
- Image Analysis: Upload an image → get AI-generated description.
- View results directly in the app interface.
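The upload step described above can also be exercised from a script. The sketch below only builds a multipart upload request; it does not send it. The route paths (`/api/conversation`, `/api/image`) and the `file` form field name are hypothetical placeholders, since this README does not list the backend routes; check the backend code for the real ones. It relies on the global `FormData`, `Blob`, and `URL` available in browsers and in Node 18+.

```javascript
// Builds a multipart upload for one of the skills. The route names are
// hypothetical; check the backend code for the real paths.
const SKILL_ROUTES = {
  conversation: "/api/conversation", // hypothetical
  image: "/api/image", // hypothetical
};

function buildUploadRequest(baseUrl, skill, file, filename) {
  const route = SKILL_ROUTES[skill];
  if (!route) throw new Error(`Unknown skill: ${skill}`);
  const form = new FormData();
  form.append("file", file, filename); // "file" field name is an assumption
  // No Content-Type header here: fetch sets the multipart boundary itself.
  return {
    url: new URL(route, baseUrl).toString(),
    init: { method: "POST", body: form },
  };
}

const upload = buildUploadRequest(
  "https://pilvo.onrender.com",
  "image",
  new Blob(["placeholder"], { type: "image/png" }), // stand-in for a real image
  "photo.png"
);
console.log(upload.url);
// To send it: fetch(upload.url, upload.init).then((r) => r.json())
```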
- Backend is deployed at: https://pilvo.onrender.com
- Make sure your frontend .env (if applicable) points API calls to this URL. Alternatively, run the backend locally on port 5000 and point the frontend there.
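One way to keep the choice between the deployed backend and a local one in a single place is a small helper like the sketch below. `VITE_API_URL` is a hypothetical variable name (Vite only exposes variables prefixed with `VITE_` to client code, through `import.meta.env`), and `http://localhost:5000` assumes the local backend listens on port 5000 as mentioned above.

```javascript
// VITE_API_URL is a hypothetical variable name; Vite only exposes variables
// that start with VITE_ to client code, via import.meta.env.
const DEPLOYED_API = "https://pilvo.onrender.com";
const LOCAL_API = "http://localhost:5000"; // assumes the local backend uses port 5000

function resolveApiBase(env, useLocal = false) {
  const configured = env && env.VITE_API_URL;
  const base = configured || (useLocal ? LOCAL_API : DEPLOYED_API);
  return base.replace(/\/+$/, ""); // drop trailing slashes so paths join cleanly
}

console.log(resolveApiBase({}));
console.log(resolveApiBase({}, true));
console.log(resolveApiBase({ VITE_API_URL: "http://localhost:5000/" }));
```

In a Vite app the call would be `resolveApiBase(import.meta.env)`; passing the environment in as an argument keeps the helper usable outside the bundler.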
This repository contains both the frontend and backend code for the AI Playground project. Follow the above steps to run locally or visit the live demo links to explore.