Real-time body language & speech analysis powered by Vision Agents, Deepgram, ElevenLabs, and Gemini AI.
Practice interviews with an AI coach that sees you, hears you, and gives you instant, personalized feedback.
Start Practising · Features · Architecture · Demo
AceView is an open-source AI interview coaching platform that joins your video call as a co-participant, conducts a real interview, and provides live feedback on six performance axes, all in real time.
"It's like having a senior interviewer and a performance coach in the same room, watching every move."
| Metric | What AceView tracks |
|---|---|
| Eye Contact | Are you looking at the camera? |
| Posture | Are your shoulders straight and squared? |
| Filler Words | How many "um", "like", "you know" did you say? |
| Speech Pace | Are you speaking at 130 WPM (ideal interview pace)? |
| AI Nudges | Real-time coaching tips when performance dips |
| Report Card | Gemini-generated grade + strengths + action plan |
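The filler-word and speech-pace metrics in the table can be illustrated with a minimal sketch. The function names and the exact pattern are assumptions for illustration, not AceView's actual code; only the three fillers named in the table are matched.

```python
import re

# Assumed pattern: only the fillers named above ("um", "like", "you know").
FILLER_PATTERN = re.compile(r"\b(?:um+|like|you know)\b", re.IGNORECASE)


def count_fillers(transcript: str) -> int:
    """Count filler-word occurrences in a transcript chunk."""
    return len(FILLER_PATTERN.findall(transcript))


def words_per_minute(transcript: str, elapsed_seconds: float) -> float:
    """Average speaking pace over the elapsed time, in words per minute."""
    if elapsed_seconds <= 0:
        return 0.0
    return len(transcript.split()) * 60.0 / elapsed_seconds


if __name__ == "__main__":
    text = "Um, I think, like, you know, it works"
    print(count_fillers(text), words_per_minute(text, 4.0))
```

A plain pattern like this also counts legitimate uses of "like" (as in "I like Python"), so treat the count as an upper bound.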
- AI Video Coach: ElevenLabs-voiced AI joins the call, asks tailored interview questions, and listens to your answers
- Confidence Ring: Video border shifts from green through yellow to red as your confidence drops, and pulses faster the lower it gets
- Real-time Metrics: Posture, eye contact, speech pace, and filler words update every second on the right panel
- Live Transcript: See your words appear as you speak, with filler words highlighted in real time
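The Confidence Ring behaviour can be sketched as a mapping from a score to a colour and a pulse period. The real logic lives in the frontend and is not shown in this README; the cut-offs (75 and 50) and the 0.5 to 2.0 second period range below are assumptions chosen only to illustrate the idea.

```python
def ring_style(confidence: float) -> tuple[str, float]:
    """Map a 0-100 confidence score to (colour, pulse period in seconds)."""
    score = max(0.0, min(100.0, confidence))
    if score >= 75:        # assumed cut-off for green
        colour = "green"
    elif score >= 50:      # assumed cut-off for yellow
        colour = "yellow"
    else:
        colour = "red"
    # Lower confidence gives a shorter period, i.e. a faster pulse.
    # Assumed range: 0.5 s at score 0 up to 2.0 s at score 100.
    period = 0.5 + 1.5 * (score / 100.0)
    return colour, period


if __name__ == "__main__":
    for value in (90, 60, 20):
        print(value, ring_style(value))
```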
Minimalist pop-up tips appear mid-session without interrupting your flow:
- "Sit up straight and square your shoulders" (when posture drops below 70)
- "Look directly at your camera" (when eye contact drops below 65)
- "You've used 5 filler words - try pausing instead" (at the 5/10/15 filler thresholds)
- "Make sure your face is visible on camera" (when your face disappears from the frame)
- Multiple nudges can fire simultaneously, each with its own 10-second cooldown
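The nudge rules above can be sketched as a small tracker that keeps one cooldown per nudge type. The class and method names are hypothetical and this is not the body of `_send_nudge_if_needed()`; only the thresholds, messages, and the 10-second cooldown come from the list.

```python
import time
from typing import Callable

COOLDOWN_SECONDS = 10.0          # per-nudge cooldown, from the list above
FILLER_THRESHOLDS = (5, 10, 15)  # filler counts that trigger a nudge


class NudgeTracker:
    """Hypothetical helper: picks the nudges to show for one metrics update."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_fired: dict[str, float] = {}
        self._filler_seen: set[int] = set()

    def _fire(self, key: str) -> bool:
        """Return True and start the cooldown unless `key` is cooling down."""
        now = self._clock()
        last = self._last_fired.get(key)
        if last is not None and now - last < COOLDOWN_SECONDS:
            return False
        self._last_fired[key] = now
        return True

    def check(self, posture: float, eye_contact: float,
              filler_count: int, face_visible: bool) -> list[str]:
        nudges: list[str] = []
        if posture < 70 and self._fire("posture"):
            nudges.append("Sit up straight and square your shoulders")
        if eye_contact < 65 and self._fire("eye_contact"):
            nudges.append("Look directly at your camera")
        if not face_visible and self._fire("face"):
            nudges.append("Make sure your face is visible on camera")
        # Each filler threshold fires once per session (an assumption).
        crossed = [t for t in FILLER_THRESHOLDS
                   if filler_count >= t and t not in self._filler_seen]
        if crossed:
            self._filler_seen.update(crossed)
            nudges.append(
                f"You've used {filler_count} filler words - try pausing instead")
        return nudges
```

Injecting the clock keeps the cooldown testable without sleeping; in a running session the default `time.monotonic` would be used.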
After every session, Gemini generates a personalised A–D graded report:
- Overall Score averaged across the full session (not just the last moment)
- Strengths: only metrics scoring ≥ 75 are listed as strengths (honest feedback)
- Areas to Improve: specific, actionable coaching points
- Tip of the Day: one concrete thing to work on next time
- Download as PDF: one-click clean PDF export
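As a rough illustration of the inputs to such a report, the sketch below averages per-second metric samples over the whole session and applies the 75-point strength rule. In AceView the grade itself comes from Gemini; the local grade bands here (85/70/55) and the function name are assumptions.

```python
from statistics import mean

STRENGTH_THRESHOLD = 75  # from the list above


def summarize_session(samples: list[dict[str, float]]) -> dict:
    """Average each metric over the full session and derive a summary."""
    if not samples:
        raise ValueError("no samples recorded for this session")
    averages = {name: mean(s[name] for s in samples) for name in samples[0]}
    overall = mean(averages.values())
    strengths = sorted(n for n, v in averages.items() if v >= STRENGTH_THRESHOLD)
    # Assumed grade bands; the README does not state the real cut-offs.
    if overall >= 85:
        grade = "A"
    elif overall >= 70:
        grade = "B"
    elif overall >= 55:
        grade = "C"
    else:
        grade = "D"
    return {"averages": averages, "overall": overall,
            "strengths": strengths, "grade": grade}


if __name__ == "__main__":
    session = [{"posture": 80, "eye_contact": 60},
               {"posture": 90, "eye_contact": 70}]
    print(summarize_session(session))
```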
- Session history with per-session posture / eye / pace / filler breakdowns
- Improvement tracking across sessions
- Aggregated strengths and improvement areas from your latest session
┌─────────────────────────────────────────────────────────────────┐
│                       FRONTEND (Next.js)                        │
│  VideoPreview ─── StreamProvider ─── MetricsDisplay             │
│       │                  │                 │                    │
│ Stream WebRTC      Custom Events     Zustand Store              │
└──────────────────────────┬──────────────────────────────────────┘
                           │ HTTP + WebRTC
┌──────────────────────────▼──────────────────────────────────────┐
│                        BACKEND (FastAPI)                        │
│                                                                 │
│  /api/start-session ──► AgentLauncher                           │
│                               │                                 │
│                    ┌──────────▼──────────┐                      │
│                    │    Vision Agent     │                      │
│                    │ (Vision Agents SDK) │                      │
│                    └─┬────────┬────────┬─┘                      │
│                      │        │        │                        │
│                  Deepgram   YOLO  ElevenLabs                    │
│                     STT     Pose      TTS                       │
│                  (speech)  (video)  (voice)                     │
│                               │                                 │
│                    AceViewVisionProcessor                       │
│              ├── _calculate_posture()     ← YOLOv11 keypoints   │
│              ├── _calculate_eye_contact() ← ear asymmetry       │
│              └── _send_nudge_if_needed()  ← threshold checks    │
│                                                                 │
│  /api/session/summary ──► Gemini 2.0 Flash (via OpenRouter)     │
└─────────────────────────────────────────────────────────────────┘
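The ear-asymmetry idea behind `_calculate_eye_contact()` can be sketched as follows. YOLO pose models emit keypoints in COCO order, where indices 3 and 4 are the left and right ear. The scoring formula and the visibility floor are assumptions for illustration; the actual method may weigh things differently.

```python
LEFT_EAR, RIGHT_EAR = 3, 4   # COCO keypoint order used by YOLO pose models
MIN_VISIBLE = 0.2            # assumed: below this, neither ear counts as detected


def eye_contact_score(keypoint_conf: list[float]) -> float:
    """Score 0-100 from ear-confidence asymmetry (hypothetical formula).

    `keypoint_conf` holds one confidence per COCO keypoint for one person.
    Similar confidence for both ears suggests the head faces the camera;
    a large gap suggests the head is turned to one side.
    """
    left = min(1.0, max(0.0, keypoint_conf[LEFT_EAR]))
    right = min(1.0, max(0.0, keypoint_conf[RIGHT_EAR]))
    if max(left, right) < MIN_VISIBLE:
        return 0.0  # no ear detected, so head rotation cannot be judged
    return 100.0 * (1.0 - abs(left - right))


if __name__ == "__main__":
    facing = [0.9, 0.9, 0.9, 0.8, 0.8]
    turned = [0.9, 0.9, 0.9, 0.9, 0.1]
    print(eye_contact_score(facing), eye_contact_score(turned))
```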
| Decision | Rationale |
|---|---|
| YOLO at 1 FPS | Prevents audio pipeline starvation; 3 FPS caused AudioQueue overflow |
| Ear asymmetry for eye contact | YOLO can't track eyeballs, but reliably detects which ear is visible (head rotation proxy) |
| Session averages on report card | A snapshot at session end is unfair: 2 seconds of slouching shouldn't tank a 10-minute session |
| Subprocess isolation removed | AgentLauncher is stable; subprocess approach caused WebSocket timing issues |
| Filler words via regex | Deepgram SDK v2 doesn't support the filler_words param; a regex on the transcript is equally accurate |
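The 1 FPS decision amounts to skipping most incoming video frames before they reach the pose model. A minimal sketch of such a throttle is below; the class name is hypothetical and the Vision Agents SDK may expose its own frame-rate setting, which this does not attempt to reproduce.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Hypothetical helper: lets at most `fps` frames per second through."""

    def __init__(self, fps: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = 1.0 / fps
        self._clock = clock
        self._last: Optional[float] = None

    def should_process(self) -> bool:
        """Return True if enough time has passed since the last accepted frame."""
        now = self._clock()
        if self._last is not None and now - self._last < self._interval:
            return False  # drop this frame to leave CPU time for audio
        self._last = now
        return True
```

A video callback would call `should_process()` first and return early on `False`, so only about one frame per second is passed to pose inference.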
| Layer | Technology |
|---|---|
| Frontend | Next.js 15, React, Zustand, Tailwind CSS |
| Video SDK | Stream Video React SDK (@stream-io/video-react-sdk) |
| Backend | FastAPI, Uvicorn, Python 3.12 |
| Agent Framework | Vision Agents SDK (GetStream) |
| Pose Detection | YOLOv11 Pose (yolov11n-pose.pt) |
| Speech-to-Text | Deepgram Nova-2 (real-time streaming) |
| Text-to-Speech | ElevenLabs (Adam voice) |
| LLM | Gemini 2.0 Flash via OpenRouter |
| Package Manager | uv (Python), npm (Node) |
- Python 3.12+
- Node.js 18+
- uv package manager
- Git
The backend depends on Vision Agents as a local editable install. Both repos must sit in the same parent directory.
# From your chosen parent directory (e.g. Desktop)
git clone https://github.com/SKfaizan-786/aceview.git
git clone https://github.com/GetStream/Vision-Agents.git

Your folder structure must be:
<parent>/
  AceView/          ← this repo
  Vision-Agents/    ← GetStream SDK (sibling)
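Before running `uv sync`, the sibling layout can be checked with a few lines of Python. This is an optional sketch, not part of the repo; the function name is hypothetical and the folder names are the two shown above.

```python
from pathlib import Path

REQUIRED_REPOS = ("AceView", "Vision-Agents")  # folder names from the layout above


def missing_repos(parent: Path) -> list[str]:
    """Return the required sibling folders that are absent under `parent`."""
    return [name for name in REQUIRED_REPOS if not (parent / name).is_dir()]


if __name__ == "__main__":
    # Run from the parent directory that should contain both clones.
    missing = missing_repos(Path.cwd())
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Layout looks correct.")
```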
Two files in the Vision Agents SDK must be patched. Both are in the patches/ folder.
Patch 1: SFU routing fix (fixes the participant not found crash):
cd AceView
Copy-Item "patches\stream_edge_transport.py" "..\Vision-Agents\plugins\getstream\vision_agents\plugins\getstream\stream_edge_transport.py"

Patch 2: Deepgram STT fix (fixes TypeError: unexpected keyword argument 'filler_words'):
Copy-Item "patches\deepgram_stt.py" "..\Vision-Agents\plugins\deepgram\vision_agents\plugins\deepgram\deepgram_stt.py"
⚠️ Both patches are required. Skipping Patch 2 causes a silent crash: the AI joins but immediately disconnects.
Create backend/.env:
STREAM_API_KEY=your_stream_api_key
STREAM_SECRET=your_stream_api_secret
OPENROUTER_API_KEY=your_openrouter_key
ELEVENLABS_API_KEY=your_elevenlabs_key
DEEPGRAM_API_KEY=your_deepgram_key

Get API keys from the project owner. Never commit .env.
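A quick way to catch a missing key before starting the backend is to check the environment for the five variables listed above. This sketch is not part of AceView and the function name is hypothetical; it inspects a mapping (by default `os.environ`) and does not itself parse the `.env` file.

```python
import os
from typing import Mapping, Optional

REQUIRED_KEYS = (
    "STREAM_API_KEY",
    "STREAM_SECRET",
    "OPENROUTER_API_KEY",
    "ELEVENLABS_API_KEY",
    "DEEPGRAM_API_KEY",
)


def missing_env_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required keys that are unset or blank in `env`."""
    source = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not source.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_env_keys()
    print("Missing:", ", ".join(missing) if missing else "none")
```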
# Backend
cd AceView/backend
uv sync
# Frontend
cd AceView/frontend
npm install

Open two terminals:
# Terminal 1: Backend (port 8000)
cd AceView/backend
uv run python main.py
# Wait for: INFO: Application startup complete.
# Terminal 2: Frontend (port 3000)
cd AceView/frontend
npm run dev

Open http://localhost:3000
AceView/
├── backend/
│   ├── agents/
│   │   ├── interview_agent.py    # Agent setup, STT handlers, WPM tracking
│   │   └── vision_processor.py   # YOLO pose analysis, nudge logic
│   ├── main.py                   # FastAPI app, session endpoints
│   └── yolo26n-pose.pt           # YOLOv11 pose model
│
├── frontend/
│   ├── app/
│   │   ├── page.tsx              # Landing page
│   │   ├── interview/page.tsx    # Main interview page
│   │   └── dashboard/page.tsx    # Session history dashboard
│   ├── components/
│   │   ├── Interview/
│   │   │   ├── VideoPreview.tsx     # Confidence ring + YOLO video
│   │   │   ├── MetricsDisplay.tsx   # Live score bars
│   │   │   ├── LiveTranscript.tsx   # Real-time transcript
│   │   │   └── AIPromptOverlay.tsx  # Nudge pop-ups
│   │   └── StreamProvider.tsx       # WebRTC + event bridge
│   └── store/
│       └── interviewStore.ts        # Zustand store + session averages
│
└── patches/
    ├── stream_edge_transport.py  # SFU routing fix
    ├── deepgram_stt.py           # Deepgram SDK compatibility fix
    └── README.md                 # Patch documentation
| Issue | Fix |
|---|---|
| Failed to fetch on Start Session | Stale backend process on port 8000. Run: Get-Process python* \| Stop-Process -Force then restart backend |
| uv sync fails with path errors | Make sure Vision-Agents/ is a sibling of AceView/, not inside it |
| AI agent joins but immediately leaves | Deepgram patch not applied; run Patch 2 from Step 2 |
| Camera not showing | Allow camera/microphone permissions in browser when prompted |
| participant not found SFU error | Stream edge transport patch not applied; run Patch 1 from Step 2 |
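To confirm whether something is already listening on port 8000 before killing processes, a small socket probe is enough. This is a generic sketch using the standard library, not a script shipped with AceView; the function name is hypothetical.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("Port 8000 busy:", port_in_use(8000))
```

If this reports the port as busy while no backend terminal is open, a stale process is the likely cause.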
- Fork the repo
- Create a feature branch (git checkout -b feat/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feat/amazing-feature)
- Open a Pull Request
MIT License. See LICENSE for details.
Built with ❤️ using Vision Agents · Deepgram · ElevenLabs · Gemini AI · Stream Video
⭐ Star this repo if AceView helped you ace your interviews!






