AceView

🎯 AceView: AI Interview Coach

Real-time body language & speech analysis powered by Vision Agents, Deepgram, ElevenLabs, and Gemini AI.
Practice interviews with an AI coach that sees you, hears you, and gives you instant, personalized feedback.

Next.js FastAPI Stream YOLO Deepgram Gemini License: MIT

AceView Landing Page

🚀 Start Practising · 📖 Features · 🏗️ Architecture · 🎬 Demo


What is AceView?

AceView is an open-source AI interview coaching platform that joins your video call as a co-participant, conducts a real interview, and gives feedback on six performance axes, all in real time.

"It's like having a senior interviewer and a performance coach in the same room, watching every move."

What AceView tracks
πŸ‘οΈ Eye Contact β€” Are you looking at the camera?
🧍 Posture β€” Are your shoulders straight and squared?
πŸ’¬ Filler Words β€” How many "um", "like", "you know" did you say?
πŸŽ™οΈ Speech Pace β€” Are you speaking at 130 WPM (ideal interview pace)?
🧠 AI Nudges β€” Real-time coaching tips when performance dips
πŸ“Š Report Card β€” Gemini-generated grade + strengths + action plan
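
Speech pace is the number of words spoken divided by the minutes elapsed. A minimal sketch of that arithmetic, assuming the transcript arrives as plain text and the caller tracks elapsed speaking time. `words_per_minute` is a hypothetical helper, not a function taken from this repo:

```python
def words_per_minute(transcript: str, elapsed_seconds: float) -> float:
    """Return the speaking pace in words per minute (hypothetical helper)."""
    if elapsed_seconds <= 0:
        return 0.0  # nothing measured yet; avoid dividing by zero
    word_count = len(transcript.split())  # whitespace-separated tokens
    return word_count * 60.0 / elapsed_seconds
```

For example, 13 words spoken over 6 seconds works out to the 130 WPM target mentioned above.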

✨ Features

🔴 Live Session

  • AI Video Coach: ElevenLabs-voiced AI joins the call, asks tailored interview questions, and listens to your answers
  • Confidence Ring: Video border pulses green → yellow → red with your confidence level. Faster pulse = lower confidence
  • Real-time Metrics: Posture, eye contact, speech pace, and filler words update every second on the right panel
  • Live Transcript: See your words appear as you speak, with filler words highlighted in real time

🤖 AI Nudges (Invisible Coaching)

Minimalist pop-up tips appear mid-session without interrupting your flow:

  • "Sit up straight and square your shoulders": when posture drops below 70
  • "Look directly at your camera": when eye contact drops below 65
  • "You've used 5 filler words - try pausing instead": at 5/10/15 filler thresholds
  • "Make sure your face is visible on camera": when face disappears from frame
  • Multiple nudges can fire simultaneously with individual 10-second cooldowns
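
The threshold and cooldown rules above can be sketched as a small state machine. This is an illustration under stated assumptions, not the repo's `_send_nudge_if_needed()`: the class name, method names, and nudge keys are hypothetical, and only the thresholds (70, 65, 5/10/15) and the 10-second cooldown come from the list.

```python
from dataclasses import dataclass, field

POSTURE_MIN = 70          # from the list above
EYE_CONTACT_MIN = 65      # from the list above
FILLER_STEP = 5           # nudges at 5, 10 and 15 filler words
FILLER_MAX = 15
COOLDOWN_SECONDS = 10.0   # per-nudge cooldown


@dataclass
class NudgeEngine:
    """Hypothetical helper: decides which nudges to show at time `now`."""
    last_fired: dict = field(default_factory=dict)  # nudge key -> timestamp
    last_filler_milestone: int = 0

    def _ready(self, key: str, now: float) -> bool:
        # Each nudge has its own cooldown, so several can fire together.
        last = self.last_fired.get(key)
        if last is not None and now - last < COOLDOWN_SECONDS:
            return False
        self.last_fired[key] = now
        return True

    def check(self, now, posture, eye_contact, filler_count, face_visible=True):
        nudges = []
        if not face_visible and self._ready("face", now):
            nudges.append("Make sure your face is visible on camera")
        if posture < POSTURE_MIN and self._ready("posture", now):
            nudges.append("Sit up straight and square your shoulders")
        if eye_contact < EYE_CONTACT_MIN and self._ready("eye", now):
            nudges.append("Look directly at your camera")
        # Round the count down to the nearest milestone (5, 10, 15).
        milestone = min(filler_count // FILLER_STEP * FILLER_STEP, FILLER_MAX)
        if milestone > self.last_filler_milestone and self._ready("filler", now):
            self.last_filler_milestone = milestone
            nudges.append(f"You've used {milestone} filler words - try pausing instead")
        return nudges
```

Keeping a timestamp per nudge key, rather than one global timer, is what lets a posture nudge and an eye-contact nudge appear in the same second while still preventing either from repeating too often.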

📊 Session Report Card

After every session, Gemini generates a personalised report graded A to D:

  • Overall Score: averaged across the full session (not just the last moment)
  • Strengths: only metrics scoring ≥ 75 are listed as strengths (honest feedback)
  • Areas to Improve: specific, actionable coaching points
  • Tip of the Day: one concrete thing to work on next time
  • Download as PDF: one-click clean PDF export
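
The averaging and the strengths cutoff can be illustrated with a few lines of Python. This is a sketch only: the project structure suggests the real averages live in the frontend store, the function name and the sample format are assumptions, and the A to D grade boundaries are not documented here, so they are left out.

```python
from statistics import mean

STRENGTH_MIN = 75  # cutoff from the list above


def summarize_session(samples):
    """Average per-metric scores over a session (hypothetical helper).

    `samples` is assumed to be a list of dicts such as
    {"posture": 80, "eye_contact": 70}, one per measurement tick.
    """
    if not samples:
        raise ValueError("no samples recorded")
    metrics = list(samples[0].keys())
    averages = {m: mean(s[m] for s in samples) for m in metrics}
    overall = mean(averages.values())
    # Only metrics whose session average reaches the cutoff count as strengths.
    strengths = sorted(m for m, v in averages.items() if v >= STRENGTH_MIN)
    return {"averages": averages, "overall": overall, "strengths": strengths}
```

Because every tick contributes equally, a brief slouch near the end of a long session barely moves the average.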

📈 Dashboard

  • Session history with per-session posture / eye / pace / filler breakdowns
  • Improvement tracking across sessions
  • Aggregated strengths and improvement areas from your latest session

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        FRONTEND (Next.js)                        β”‚
β”‚  VideoPreview ─── StreamProvider ─── MetricsDisplay             β”‚
β”‚      β”‚                  β”‚                   β”‚                    β”‚
β”‚  Stream WebRTC      Custom Events      Zustand Store            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                           β”‚ HTTP + WebRTC
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        BACKEND (FastAPI)                         β”‚
β”‚                                                                  β”‚
β”‚  /api/start-session ──► AgentLauncher                           β”‚
β”‚                              β”‚                                   β”‚
β”‚                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                       β”‚
β”‚                    β”‚   Vision Agent      β”‚                       β”‚
β”‚                    β”‚  (Vision Agents SDK)β”‚                       β”‚
β”‚                    β””β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”˜                       β”‚
β”‚                       β”‚      β”‚      β”‚                            β”‚
β”‚              Deepgram  β”‚  YOLOβ”‚  ElevenLabs                     β”‚
β”‚              STT       β”‚  Poseβ”‚  TTS                            β”‚
β”‚              (speech)  β”‚  (video)  (voice)                      β”‚
β”‚                        β”‚                                         β”‚
β”‚              AceViewVisionProcessor                              β”‚
β”‚              β”œβ”€β”€ _calculate_posture()    ← YOLOv11 keypoints    β”‚
β”‚              β”œβ”€β”€ _calculate_eye_contact() ← ear asymmetry       β”‚
β”‚              └── _send_nudge_if_needed() ← threshold checks     β”‚
β”‚                                                                  β”‚
β”‚  /api/session/summary ──► Gemini 2.0 Flash (via OpenRouter)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
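
To make the two scoring steps in the diagram concrete, here is one way pose keypoints could be turned into 0 to 100 scores. It is a sketch, not the contents of `vision_processor.py`: the formulas, function names, and the 30-degree tilt limit are assumptions. The keypoint indices follow the standard COCO ordering that YOLO pose models output.

```python
import math

# COCO keypoint indices, for pulling these values out of a pose result.
LEFT_EAR, RIGHT_EAR = 3, 4
LEFT_SHOULDER, RIGHT_SHOULDER = 5, 6


def eye_contact_score(ear_conf_left: float, ear_conf_right: float) -> float:
    """0-100: equally visible ears suggest the head faces the camera."""
    strongest = max(ear_conf_left, ear_conf_right)
    if strongest == 0:
        return 0.0  # neither ear detected, so there is no evidence of eye contact
    asymmetry = abs(ear_conf_left - ear_conf_right) / strongest
    return round(100.0 * (1.0 - asymmetry), 1)


def posture_score(left_shoulder, right_shoulder, max_tilt_deg: float = 30.0) -> float:
    """0-100: level shoulders score 100; a tilt of max_tilt_deg or more scores 0."""
    dx = right_shoulder[0] - left_shoulder[0]
    dy = right_shoulder[1] - left_shoulder[1]
    tilt = abs(math.degrees(math.atan2(dy, abs(dx))))  # angle of the shoulder line
    return round(100.0 * max(0.0, 1.0 - tilt / max_tilt_deg), 1)
```

The ear comparison works as a head-rotation proxy because turning the head hides one ear from the camera, which lowers that keypoint's detection confidence relative to the other.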

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| YOLO at 1 FPS | Prevents audio pipeline starvation; 3 FPS caused AudioQueue overflow |
| Ear asymmetry for eye contact | YOLO can't track eyeballs, but reliably detects which ear is visible (head rotation proxy) |
| Session averages on report card | A snapshot at session end is unfair: 2 seconds of slouching shouldn't tank a 10-minute session |
| Subprocess isolation removed | AgentLauncher is stable; the subprocess approach caused WebSocket timing issues |
| Filler words via regex | Deepgram SDK v2 doesn't support the filler_words param; a regex on the transcript is equally accurate |
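
A regex-based filler counter of the kind described in the last row might look like the sketch below. The pattern covers only the three fillers this README names ("um", "like", "you know"); the repo's actual pattern and function name are not shown here, so both are assumptions.

```python
import re

# Word-bounded, case-insensitive; "um+" also catches drawn-out "umm".
FILLER_PATTERN = re.compile(r"\b(?:um+|like|you know)\b", re.IGNORECASE)


def count_fillers(transcript: str) -> int:
    """Count filler words in a transcript chunk (hypothetical helper)."""
    return len(FILLER_PATTERN.findall(transcript))
```

One limitation of this simple form: it cannot tell a filler "like" from a legitimate one ("I like Python"), so counts for that word may run high.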

πŸ› οΈ Tech Stack

Layer Technology
Frontend Next.js 15, React, Zustand, Tailwind CSS
Video SDK Stream Video React SDK (@stream-io/video-react-sdk)
Backend FastAPI, Uvicorn, Python 3.12
Agent Framework Vision Agents SDK (GetStream)
Pose Detection YOLOv11 Pose (yolov11n-pose.pt)
Speech-to-Text Deepgram Nova-2 (real-time streaming)
Text-to-Speech ElevenLabs (Adam voice)
LLM Gemini 2.0 Flash via OpenRouter
Package Manager uv (Python), npm (Node)

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • Node.js 18+
  • uv package manager
  • Git

1. Clone Both Repos (as sibling folders)

The backend depends on Vision Agents as a local editable install. Both repos must sit in the same parent directory.

# From your chosen parent directory (e.g. Desktop)
# The explicit target name makes the folder match the AceView/ paths used below
git clone https://github.com/SKfaizan-786/aceview.git AceView
git clone https://github.com/GetStream/Vision-Agents.git

Your folder structure must be:

<parent>/
  AceView/          ← this repo
  Vision-Agents/    ← GetStream SDK (sibling)

2. Apply SDK Patches

Two files in the Vision Agents SDK must be patched. Both are in the patches/ folder.

Patch 1: SFU routing fix (fixes the participant not found crash):

cd AceView
Copy-Item "patches\stream_edge_transport.py" "..\Vision-Agents\plugins\getstream\vision_agents\plugins\getstream\stream_edge_transport.py"

Patch 2: Deepgram STT fix (fixes TypeError: unexpected keyword argument 'filler_words'):

Copy-Item "patches\deepgram_stt.py" "..\Vision-Agents\plugins\deepgram\vision_agents\plugins\deepgram\deepgram_stt.py"

⚠️ Both patches are required. Skipping Patch 2 causes a silent crash: the AI joins but immediately disconnects.
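
Copy-Item is a PowerShell cmdlet, so on macOS or Linux the same two copies need another tool. A minimal cross-platform sketch using only the Python standard library follows. The source and destination paths are the ones from the commands above; the script itself and the `apply_patches` name are not part of the repo.

```python
import shutil
from pathlib import Path

# Patch file name -> destination inside the Vision-Agents checkout.
PATCHES = {
    "stream_edge_transport.py": Path(
        "plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py"
    ),
    "deepgram_stt.py": Path(
        "plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py"
    ),
}


def apply_patches(aceview_root, sdk_root):
    """Copy both patch files into the SDK and return the written paths."""
    aceview_root, sdk_root = Path(aceview_root), Path(sdk_root)
    copied = []
    for name, rel_target in PATCHES.items():
        source = aceview_root / "patches" / name
        target = sdk_root / rel_target
        if not target.parent.is_dir():
            # Usually means Vision-Agents is not a sibling of AceView.
            raise FileNotFoundError(f"SDK folder missing: {target.parent}")
        shutil.copy2(source, target)
        copied.append(target)
    return copied

# Example, run from inside AceView/:
#   apply_patches(".", "../Vision-Agents")
```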

3. Configure Environment Variables

Create backend/.env:

STREAM_API_KEY=your_stream_api_key
STREAM_SECRET=your_stream_api_secret
OPENROUTER_API_KEY=your_openrouter_key
ELEVENLABS_API_KEY=your_elevenlabs_key
DEEPGRAM_API_KEY=your_deepgram_key

Get API keys from the project owner. Never commit .env.
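
Since a missing key tends to surface later as a confusing runtime failure, a startup check can help. A minimal sketch, assuming the variables from backend/.env have already been loaded into the process environment. How the backend loads them is not shown here, and `missing_env_keys` is a hypothetical helper:

```python
import os

REQUIRED_KEYS = (
    "STREAM_API_KEY",
    "STREAM_SECRET",
    "OPENROUTER_API_KEY",
    "ELEVENLABS_API_KEY",
    "DEEPGRAM_API_KEY",
)


def missing_env_keys(env=None):
    """Return the required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]

# Example:
#   missing = missing_env_keys()
#   if missing:
#       raise SystemExit(f"Missing keys in backend/.env: {', '.join(missing)}")
```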

4. Install Dependencies

# Backend (from the parent directory)
cd AceView/backend
uv sync

# Frontend
cd ../frontend
npm install

5. Run

Open two terminals, each starting in the parent directory:

# Terminal 1: Backend (port 8000)
cd AceView/backend
uv run python main.py
# Wait for: INFO: Application startup complete.

# Terminal 2: Frontend (port 3000)
cd AceView/frontend
npm run dev

Open http://localhost:3000 🎉


🎬 Demo

AceView Demo Video

▶️ Watch Full Demo on YouTube


📸 Screenshots

🟢 Live Session: HIGH Confidence (Great Posture & Eye Contact)

Great Posture

👁️ Bad Eye Contact: AI Nudge Firing

AI Nudges

💬 Filler Words Highlighted Live in Transcript

Filler Words

📊 AI Report Card: Grade & Personalised Feedback

Grade Card

📈 Dashboard: Session History & Progress

Dashboard


πŸ“ Project Structure

AceView/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ agents/
β”‚   β”‚   β”œβ”€β”€ interview_agent.py      # Agent setup, STT handlers, WPM tracking
β”‚   β”‚   └── vision_processor.py     # YOLO pose analysis, nudge logic
β”‚   β”œβ”€β”€ main.py                     # FastAPI app, session endpoints
β”‚   └── yolo26n-pose.pt             # YOLOv11 pose model
β”‚
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ page.tsx                # Landing page
β”‚   β”‚   β”œβ”€β”€ interview/page.tsx      # Main interview page
β”‚   β”‚   └── dashboard/page.tsx      # Session history dashboard
β”‚   β”œβ”€β”€ components/
β”‚   β”‚   β”œβ”€β”€ Interview/
β”‚   β”‚   β”‚   β”œβ”€β”€ VideoPreview.tsx    # Confidence ring + YOLO video
β”‚   β”‚   β”‚   β”œβ”€β”€ MetricsDisplay.tsx  # Live score bars
β”‚   β”‚   β”‚   β”œβ”€β”€ LiveTranscript.tsx  # Real-time transcript
β”‚   β”‚   β”‚   └── AIPromptOverlay.tsx # Nudge pop-ups
β”‚   β”‚   └── StreamProvider.tsx      # WebRTC + event bridge
β”‚   └── store/
β”‚       └── interviewStore.ts       # Zustand store + session averages
β”‚
└── patches/
    β”œβ”€β”€ stream_edge_transport.py    # SFU routing fix
    β”œβ”€β”€ deepgram_stt.py             # Deepgram SDK compatibility fix
    └── README.md                   # Patch documentation

πŸ› Troubleshooting

Issue Fix
Failed to fetch on Start Session Stale backend process on port 8000. Run: Get-Process python* | Stop-Process -Force then restart backend
uv sync fails with path errors Make sure Vision-Agents/ is a sibling of AceView/, not inside it
AI agent joins but immediately leaves Deepgram patch not applied β€” run Patch 2 from Step 2
Camera not showing Allow camera/microphone permissions in browser when prompted
participant not found SFU error Stream edge transport patch not applied β€” run Patch 1 from Step 2
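
Before killing processes for the "Failed to fetch" case, it can be worth confirming that something is actually listening on port 8000. A small standard-library sketch; `port_in_use` is a hypothetical helper and the 0.5-second timeout is an arbitrary choice:

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0

# Example: port_in_use(8000) is True while a backend (stale or not) holds the port.
```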

🤝 Contributing

  1. Fork the repo
  2. Create a feature branch (git checkout -b feat/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feat/amazing-feature)
  5. Open a Pull Request

📄 License

MIT License. See LICENSE for details.


Built with ❤️ using Vision Agents · Deepgram · ElevenLabs · Gemini AI · Stream Video

⭐ Star this repo if AceView helped you ace your interviews!
