Real-time AI-powered communication coach that analyses spoken responses and delivers structured, actionable feedback.
Voice Analysis • AI Feedback • Interview Coaching • Production-Ready
⭐ If you like this project, consider giving it a star!
Real-time voice recording • AI-powered feedback • Structured scoring • Sample answers • Communication coaching
Most people struggle with spoken communication — interviews, presentations, and everyday conversations — yet receive little to no structured feedback on how they actually sound. Traditional coaching is expensive, inaccessible, and not available in real time.
Orato AI is a production-grade, full-stack AI coaching platform. Users record their spoken responses, and the system instantly transcribes, analyzes, and returns structured coaching feedback — scoring clarity, confidence, fluency, and structure — powered by Google Gemini or OpenAI.
Key value: Real-time, structured, actionable coaching available to anyone with a browser.
| Feature | Description |
|---|---|
| 🎙 Real-time voice recording | Capture spoken responses using the MediaRecorder API with live waveform visualization |
| 🧠 AI-powered feedback | Coaching analysis via Google Gemini API or OpenAI (configurable) |
| 📊 Structured scoring | Scores across Clarity, Confidence, Fluency, and Structure (0–100) |
| 🗣 Speech-to-text | Browser-native Web Speech API transcription with no third-party cost |
| 🔊 Text-to-speech playback | AI feedback narrated back via SpeechSynthesis API |
| 📈 Session analytics | Track improvement over time with per-session history and trend charts |
| 🌐 Multilingual support | Full English and Hindi interface via built-in translation layer |
| ⚡ Adaptive question generation | AI-generated questions tailored to role, difficulty, and session history |
| 🛡 Secure backend proxy | API keys never exposed to the browser — all AI calls routed through the Express server |
| 💾 In-memory caching | Response caching and transcript deduplication to reduce API cost and latency |
| Graceful fallback | Intelligent fallback feedback when the AI provider is unavailable, so the UI never lands in a broken state |
| 🌙 Dark mode | Full dark/light theme support |
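The caching and deduplication rows above could be sketched roughly as follows. This is a minimal illustration, not the project's actual cache: the `ResponseCache` class, the five-minute default TTL, and the whitespace/case normalization are all assumptions made for the example.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical in-memory TTL cache keyed by a SHA-256 hash of prompt + transcript.
class ResponseCache {
  constructor(ttlMs = 5 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs; // assumed default; the real TTL is configurable
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map();
  }

  // Collapse whitespace and lowercase so near-identical transcripts share a key.
  static keyFor(prompt, transcript = "") {
    const normalized = `${prompt}\n${transcript}`
      .trim()
      .replace(/\s+/g, " ")
      .toLowerCase();
    return createHash("sha256").update(normalized).digest("hex");
  }

  // Return the cached value, or undefined on a miss or an expired entry.
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size() {
    return this.entries.size;
  }
}
```

A lookup before each AI call, followed by a `set` after a validated response, is enough to get both prompt caching and transcript deduplication from a single map.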
+------------------------------------------------------------+
|                  FRONTEND (React + Vite)                   |
|                                                            |
|   [Home] -> [Setup] -> [VoicePractice] -> [AIFeedback]     |
|   MediaRecorder API           Web Speech API (STT)         |
|   SpeechSynthesis API (TTS)   Recharts (analytics)         |
+-----------------------------+------------------------------+
                              |
             POST /api/ai (transcript + prompt)
                              |
                              v
+------------------------------------------------------------+
|                BACKEND (Node.js + Express)                 |
|                                                            |
|   Rate Limiter -> Request Validator -> Cache Lookup        |
|                             |                              |
|                 +-----------+-----------+                  |
|                 v                       v                  |
|         Google Gemini API          OpenAI API              |
|                 +-----------+-----------+                  |
|                             v                              |
|   Response Validator -> Cache Store -> JSON Response       |
+-----------------------------+------------------------------+
                              |
                  Structured feedback JSON
                              |
                              v
                      +----------------+
                      |    React UI    |
                      |  Score cards   |
                      |    Coaching    |
                      |    insights    |
                      +----------------+
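The "Request Validator" stage in the diagram might look something like the sketch below. The function name `validateAiRequest` and the character limits are hypothetical; the real limits would presumably live in `server/config.js`.

```javascript
// Hypothetical limits, chosen only for illustration.
const LIMITS = { maxPromptChars: 4000, maxTranscriptChars: 8000 };

// Check an incoming request body before any AI call is made.
// Returns { ok: true, value } on success or { ok: false, errors } on failure.
function validateAiRequest(body, limits = LIMITS) {
  const errors = [];
  const prompt = typeof body?.prompt === "string" ? body.prompt.trim() : "";
  const transcript =
    typeof body?.transcript === "string" ? body.transcript.trim() : "";

  if (!prompt) errors.push("prompt is required");
  if (prompt.length > limits.maxPromptChars) {
    errors.push(`prompt exceeds ${limits.maxPromptChars} characters`);
  }
  if (transcript.length > limits.maxTranscriptChars) {
    errors.push(`transcript exceeds ${limits.maxTranscriptChars} characters`);
  }

  return errors.length > 0
    ? { ok: false, errors }
    : { ok: true, value: { prompt, transcript } };
}
```

Rejecting oversized input here, ahead of the cache lookup and provider call, keeps malformed requests from consuming API quota.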
User speaks -> MediaRecorder captures audio
-> Web Speech API transcribes to text
-> Frontend sends transcript + prompt to backend
-> Backend checks cache (dedup + prompt cache)
-> Backend calls Gemini / OpenAI with schema prompt
-> Response validated, clamped, cached
-> Structured JSON returned to frontend
-> UI renders scores, strengths, improvements, sample answer
-> SpeechSynthesis reads feedback aloud (optional)
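The "validated, clamped" step in the flow above can be illustrated with a small normalizer. This is a sketch under assumptions: the helper names are hypothetical, and the fallback score of 50 for missing or non-numeric values is an arbitrary choice made for the example. The field names follow the response schema documented in this README.

```javascript
const SCORE_KEYS = ["clarity", "confidence", "structure", "fluency"];

// Coerce a score to an integer in [0, 100]; use the fallback when it is not numeric.
function clampScore(value, fallback = 50) {
  const n = typeof value === "number" ? value : Number.parseFloat(value);
  if (!Number.isFinite(n)) return fallback;
  return Math.min(100, Math.max(0, Math.round(n)));
}

// Keep only non-empty strings from a list-like field.
function toStringList(value) {
  return Array.isArray(value)
    ? value.filter((s) => typeof s === "string" && s.trim() !== "")
    : [];
}

// Turn whatever the model returned into a predictable feedback object.
function normalizeFeedback(raw) {
  const src = raw && typeof raw === "object" ? raw : {};
  const out = {
    feedback: typeof src.feedback === "string" ? src.feedback : "",
    strengths: toStringList(src.strengths),
    improvements: toStringList(src.improvements),
    tip: typeof src.tip === "string" ? src.tip : "",
    sampleAnswer: typeof src.sampleAnswer === "string" ? src.sampleAnswer : "",
  };
  for (const key of SCORE_KEYS) out[key] = clampScore(src[key]);
  return out;
}
```

Because model output is not guaranteed to match the requested schema, normalizing before caching means the UI only ever sees well-formed scores and lists.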
Frontend
| Technology | Version | Purpose |
|---|---|---|
| React | 18 | UI framework |
| Vite | 6 | Build tool & dev server |
| Tailwind CSS | 3 | Utility-first styling |
| Framer Motion | 11 | Animations |
| Recharts | 2 | Session analytics charts |
| React Router | 6 | Client-side routing |
| React Query | 5 | Server state management |
Backend
| Technology | Version | Purpose |
|---|---|---|
| Node.js | 18+ | Runtime |
| Express | 4 | HTTP server |
| express-rate-limit | 7 | API rate limiting |
| dotenv | 16 | Environment config |
| cors | 2 | Cross-origin policy |
AI & Browser APIs
| API | Purpose |
|---|---|
| Google Gemini API | Primary AI provider for feedback & question generation |
| OpenAI API | Optional secondary AI provider |
| Web Speech API (SpeechRecognition) | Browser-native speech-to-text |
| MediaRecorder API | Audio capture with waveform |
| SpeechSynthesis API | Text-to-speech feedback playback |
1. User selects a practice mode (Interview / Presentation / Casual)
|
2. AI generates an adaptive question based on role and difficulty
|
3. User records their spoken response via the microphone
|
4. Web Speech API transcribes the audio to text in real time
|
5. Transcript and coaching prompt are sent to the backend proxy
|
6. Backend checks the in-memory cache; on a miss, calls Gemini or OpenAI
|
7. AI returns structured JSON: scores, strengths, improvements, tip, sample answer
|
8. Frontend renders animated score cards, coaching insights, and a model answer
|
9. SpeechSynthesis reads the feedback aloud if enabled
|
10. Session is saved; analytics charts update with the new data point
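Step 5 of the workflow above (sending the transcript and prompt to the backend proxy) might be written along these lines. The function name `requestFeedback` and its options are hypothetical; the `/api` base and the `POST /api/ai` route come from this README, and the `fetchImpl` parameter exists only so the call can be exercised without a running server.

```javascript
// Send a transcript and coaching prompt to the backend and return the parsed JSON.
async function requestFeedback(
  { prompt, transcript },
  { baseUrl = "/api", fetchImpl = fetch } = {}
) {
  const res = await fetchImpl(`${baseUrl}/ai`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, transcript }),
  });
  if (!res.ok) {
    // Surface HTTP failures so the caller can switch to fallback feedback.
    throw new Error(`AI request failed with status ${res.status}`);
  }
  return res.json();
}
```

In the browser, the default `fetch` and the relative `/api` base would route the call through the Vite dev proxy, so no API key ever reaches client code.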
Orato-AI/
├── index.html
├── vite.config.js
├── tailwind.config.js
├── package.json
│
├── src/
│   ├── main.jsx                   # App entry point
│   ├── App.jsx                    # Route definitions
│   ├── Layout.jsx                 # Shell layout
│   │
│   ├── pages/
│   │   ├── Home.jsx               # Mode selection dashboard
│   │   ├── Intro.jsx              # Onboarding / landing
│   │   ├── InterviewSetup.jsx     # Configure interview parameters
│   │   ├── QuestionSetup.jsx      # Custom question setup
│   │   ├── VoicePractice.jsx      # Recording + STT interface
│   │   └── AIFeedback.jsx         # Feedback results & analytics
│   │
│   ├── components/
│   │   ├── VoiceWaveform.jsx      # Live audio visualizer
│   │   ├── ScoreIndicator.jsx     # Animated score ring
│   │   ├── SessionSummary.jsx     # Per-session summary card
│   │   ├── ProgressSnapshot.jsx
│   │   ├── SettingsModal.jsx
│   │   ├── SettingsProvider.jsx
│   │   ├── ErrorBoundary.jsx
│   │   ├── translations.jsx       # EN / HI strings
│   │   └── ui/                    # Radix UI + shadcn components
│   │
│   ├── api/
│   │   ├── apiClient.js           # Fetch wrapper (frontend -> backend)
│   │   ├── aiService.js           # AI endpoint helpers
│   │   └── sessionAnalytics.js
│   │
│   ├── lib/
│   │   ├── AuthContext.jsx
│   │   ├── logger.js
│   │   ├── healthCheck.js
│   │   └── utils.js
│   │
│   └── utils/
│       └── index.js
│
└── server/
    ├── index.js                   # Express server -- AI proxy, cache, rate limit
    ├── config.js                  # Centralised server config
    ├── package.json
    └── .env.example               # Environment variable template
- Node.js 18+
- A Google Gemini API key or an OpenAI API key
git clone https://github.com/amansethhh/Orato-AI.git
cd Orato-AI
npm install
cd server
npm install
cd ..
cp server/.env.example server/.env
Open server/.env and add your API key:
# Choose your AI provider: auto | gemini | openai
AI_PROVIDER=auto
GEMINI_API_KEY=your_gemini_key_here
OPENAI_API_KEY=your_openai_key_here # optional
PORT=3001
FRONTEND_URL=http://localhost:5173
NODE_ENV=development
Note: If neither key is provided, the server runs in mock mode and returns locally generated coaching feedback, which is useful for UI development without an API key.
cd server
node index.js
Expected output:
======================================================
Orato AI - Backend Server
======================================================
[INFO] Server running on http://localhost:3001
[INFO] AI Provider: GEMINI
Open a new terminal from the project root:
npm run dev
The app opens automatically at http://localhost:5173.
| Variable | Required | Default | Description |
|---|---|---|---|
| GEMINI_API_KEY | Yes* | — | Google Gemini API key |
| OPENAI_API_KEY | Yes* | — | OpenAI API key |
| AI_PROVIDER | No | auto | Force a provider: auto, gemini, openai |
| PORT | No | 3001 | Backend server port |
| FRONTEND_URL | No | http://localhost:5173 | Allowed CORS origin |
| NODE_ENV | No | development | development or production |
*At least one AI key is required for real feedback. Without either, the server runs in mock mode.
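The provider selection described by these variables could be resolved as in the sketch below. The function name `resolveProvider` is hypothetical, and two behaviours are assumptions rather than documented facts: that `auto` prefers Gemini when both keys are set (Gemini is listed as the primary provider), and that a forced provider with a missing key falls back to whichever key is available.

```javascript
// Decide which provider to use from an environment-like object.
// Returns "gemini", "openai", or "mock" when no key is configured.
function resolveProvider(env) {
  const requested = (env.AI_PROVIDER || "auto").toLowerCase();
  const hasGemini = Boolean(env.GEMINI_API_KEY);
  const hasOpenAI = Boolean(env.OPENAI_API_KEY);

  // Honour an explicit choice when its key is present.
  if (requested === "gemini" && hasGemini) return "gemini";
  if (requested === "openai" && hasOpenAI) return "openai";

  // "auto", or a forced provider without a key (assumed fallback order).
  if (hasGemini) return "gemini";
  if (hasOpenAI) return "openai";
  return "mock";
}
```

Called as `resolveProvider(process.env)` at startup, this would give a single value to log and to report from the health endpoint.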
| Variable | Description |
|---|---|
| VITE_API_URL | Backend base URL (defaults to /api via Vite proxy) |
| Optimization | Implementation |
|---|---|
| Code splitting | Vite manualChunks separates React, UI libs, and charts into parallel-loaded bundles |
| Lazy loading | Route-level code splitting via React Router lazy imports |
| In-memory caching | Backend caches AI responses by prompt hash (TTL: configurable) |
| Transcript deduplication | Identical transcripts return cached results within a dedup window |
| Rate limiting | express-rate-limit prevents abuse and protects API quota |
| Request validation | Prompt and transcript length limits enforced server-side before any AI call |
| Exponential backoff | Automatic retry with backoff on AI rate-limit (429) responses |
| Graceful fallback | Context-aware locally-generated feedback ensures zero broken states |
| Timeout handling | Per-request AbortController timeout on all AI calls |
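The backoff and timeout rows above can be sketched with two small helpers. Both names (`fetchWithTimeout`, `withBackoff`) and the default values (15 s timeout, 3 retries, 500 ms base delay) are assumptions for illustration and are not taken from the server code.

```javascript
// Abort a request if it takes longer than timeoutMs.
async function fetchWithTimeout(url, init = {}, timeoutMs = 15000, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer); // always clear so the timer cannot keep the process alive
  }
}

// Retry an attempt while it returns HTTP 429, doubling the delay each time.
async function withBackoff(
  attempt,
  {
    retries = 3,
    baseDelayMs = 500,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = {}
) {
  for (let i = 0; ; i++) {
    const res = await attempt();
    // Return on any non-429 status, or once the retry budget is spent.
    if (res.status !== 429 || i >= retries) return res;
    await sleep(baseDelayMs * 2 ** i); // 500 ms, 1000 ms, 2000 ms, ...
  }
}
```

The two compose naturally: `withBackoff(() => fetchWithTimeout(url, init))` gives each retry its own timeout, and a final 429 can then be handed to the fallback feedback path.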
Returns server status, active AI provider, uptime, and cache size.
{
"status": "ok",
"provider": "gemini",
"uptime": 142,
"cacheSize": 3,
"timestamp": "2025-01-01T00:00:00.000Z"
}
Analyze a spoken response and return structured coaching feedback.
Request body:
{
"prompt": "The user answered the interview question: 'Tell me about yourself'",
"transcript": "Hi, I'm a software engineer with 3 years of experience..."
}
Response:
{
"feedback": "Your opening established context well. Consider adding a specific achievement to strengthen impact.",
"clarity": 78,
"confidence": 65,
"structure": 72,
"fluency": 80,
"strengths": ["Clear introduction", "Good pacing"],
"improvements": ["Add a quantifiable result", "Reduce filler words"],
"tip": "Use the STAR framework: Situation, Task, Action, Result.",
"sampleAnswer": "I'm a software engineer with 3 years of experience...",
"filler_word_count": 4,
"_provider": "gemini"
}
Generate an adaptive practice question.
Request body:
{
"prompt": "Generate a behavioral interview question for a senior frontend engineer role, medium difficulty."
}
Response:
{
"question": "Describe a situation where you had to refactor a large codebase under time pressure. How did you prioritize?"
}
- Real-time streaming AI — Stream AI tokens to the UI for lower perceived latency
- User authentication — Persistent accounts with cross-device session history
- Cloud deployment — Docker + Railway / Render backend, Vercel frontend
- Whisper STT — Server-side transcription via OpenAI Whisper for higher accuracy
- Video analysis — Webcam-based posture and eye-contact scoring
- Interview simulation — Multi-turn conversational interview mode
- PDF export — Download session feedback reports (html2canvas + jsPDF already included)
- Team / recruiter dashboard — Share sessions and track candidate progress
Contributions are welcome. Please follow these steps:
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature-name
- Make your changes and commit: git commit -m "feat: add your feature"
- Push to your fork: git push origin feature/your-feature-name
- Open a pull request against main
Please keep pull requests focused and include a clear description of the change and its motivation.
This project is licensed under the MIT License.



