The AI interviewer that sees you, hears you, and asks the question that exposes how you really think.
Hiring is a conversation. Static tests are a monologue. Alexis is the AI that can hold the conversation: voice, vision, and live code execution in one continuous loop. No clicks. No silence. No honor system.

▶ Click to watch the demo on YouTube
| Landing | Live interview |
|---|---|
| (screenshot) | (screenshot) |
- Traditional online tests are silent because machines couldn't hold a conversation. That constraint is gone, so the interview that needed a face is now possible.
- Code without context is meaningless: a senior engineer needs to know why a candidate made a specific design choice, not just whether the tests pass.
- Remote interviews lack physical presence and struggle with cheating, requiring a new approach that blends visual integrity checking with a natural, conversational flow.
Alexis solves the technical interviewing problem natively by leveraging full-duplex voice, visual presence, and spatial awareness to replace the static text box.
- Only voice surfaces "why" follow-ups.
- Only vision (webcam + Gemini Vision observations) surfaces real integrity signals.
- Only a spatial face makes the interview feel like an interview, not an interrogation.
(See `src/components/agent/SelfView.tsx`, `src/lib/visual-observations.ts`, `src/components/agent/SpatialRealAvatar.tsx`)
The system is built for ultra-low latency, natural pacing, and spontaneous interruptions.
- Native-audio Gemini Live (no STT/TTS round trip) and VAD-driven barge-in with mid-sentence interruption.
- Server-driven SpatialReal lip-sync keyframes that stay synced at 600+ ms RTT.
- One canonical audio path (AvatarKit's internal player muted) to prevent double-voice, and one-chunk hold-back so end=true always rides on real audio.
(See `src/lib/interview-live-client.ts`, `src/components/agent/SpatialRealAvatar.tsx`)
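The one-chunk hold-back mentioned above can be illustrated with a minimal sketch. The `AudioChunk` shape and the `HoldBackBuffer` class are hypothetical names invented for this example and are not taken from `interview-live-client.ts`. The idea is only that each chunk is delayed by one step, so the end-of-turn flag can be attached to the last chunk that carries real audio.

```typescript
// Hypothetical chunk shape for illustration; the real client's types may differ.
interface AudioChunk {
  pcm: Int16Array;
  end: boolean;
}

// Delays every chunk by one step so that `end: true` is always attached
// to the final chunk containing real audio, never to an empty frame.
class HoldBackBuffer {
  private held: Int16Array | null = null;
  private readonly emit: (chunk: AudioChunk) => void;

  constructor(emit: (chunk: AudioChunk) => void) {
    this.emit = emit;
  }

  // Called for each PCM chunk received: flush the previous one, hold the new one.
  push(pcm: Int16Array): void {
    if (this.held !== null) {
      this.emit({ pcm: this.held, end: false });
    }
    this.held = pcm;
  }

  // Called when the turn completes: the held chunk goes out flagged as the end.
  endTurn(): void {
    if (this.held !== null) {
      this.emit({ pcm: this.held, end: true });
      this.held = null;
    }
  }

  // Called on barge-in: drop the held chunk without emitting it.
  reset(): void {
    this.held = null;
  }
}
```

The cost of this design is one chunk of added latency; the benefit is that a downstream consumer never has to special-case an end marker that arrives without audio.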
The architecture tightly couples the real-time audio/visual layer with external sandbox actions.
- Spoken intent → typed tool registry → Daytona / analysis / state, in the same event tick; voice-controlled break / end / repeat-question tools.
- Sandbox lifecycle hardened with a server-side reaper (Vercel cron, label-paginated) so tab-close doesn't orphan workspaces.
- Path sanitizer is a validate-only allow-list (`/workspace`, `/tmp`, NUL-byte blocked, traversal blocked).
- Credential-mint endpoint sits behind an aggressive rate limit, with a `GEMINI_AUTH_MODE` flag staged for ephemeral-token migration.

(See `src/lib/agent-tools.ts`, `src/app/api/cron/sandbox-cleanup/route.ts`, `src/lib/daytona.ts`, `src/app/api/gemini/session/route.ts`)
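As a rough illustration of what a validate-only allow-list means, the sketch below rejects bad paths outright instead of trying to rewrite them. The function name `validateSandboxPath` and the `PathCheck` result type are assumptions made for this example; the actual logic in `src/lib/daytona.ts` may differ.

```typescript
// Allowed roots as described above; everything else is rejected.
const ALLOWED_ROOTS = ["/workspace", "/tmp"];

// Hypothetical result type: either the unchanged path or a rejection reason.
type PathCheck = { ok: true; path: string } | { ok: false; reason: string };

// Validate-only: the input is never normalized or repaired, only accepted or refused.
function validateSandboxPath(input: string): PathCheck {
  if (input.includes("\0")) {
    return { ok: false, reason: "NUL byte" };
  }
  if (!input.startsWith("/")) {
    return { ok: false, reason: "not absolute" };
  }
  // Any ".." segment is refused rather than resolved.
  if (input.split("/").includes("..")) {
    return { ok: false, reason: "traversal" };
  }
  // Match whole path segments so "/tmpfoo" does not pass as "/tmp".
  const underRoot = ALLOWED_ROOTS.some(
    (root) => input === root || input.startsWith(root + "/"),
  );
  if (!underRoot) {
    return { ok: false, reason: "outside allow-list" };
  }
  return { ok: true, path: input };
}
```

Refusing instead of normalizing keeps the check easy to audit: a path that passes is used exactly as the caller wrote it.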
```mermaid
flowchart LR
    subgraph Candidate ["Candidate"]
        Mic
        Webcam
        MonacoEditor["Monaco Editor"]
    end
    subgraph Browser ["Browser (Next.js + React 19)"]
        LiveWS["Gemini Live WebSocket client"]
        AudioBus["Audio Bus"]
        Avatar["SpatialReal WASM Avatar"]
        VisionObserver["Vision Observer"]
        Store["Code-State Store (Zustand)"]
    end
    subgraph Server ["Server (Vercel Fluid Compute)"]
        APIRoutes["API Routes"]
        DaytonaSandbox["Daytona Sandbox"]
        GeminiPro["Gemini 3 Pro (analysis)"]
        CodeRabbit
        Sentry
        CronReaper["Cron Reaper"]
    end
    Mic --> AudioBus
    Webcam --> VisionObserver
    AudioBus --> LiveWS
    VisionObserver --> LiveWS
    LiveWS --> GeminiLive["Gemini Live"]
    GeminiLive -->|"tool calls back"| LiveWS
    LiveWS --> APIRoutes
    APIRoutes --> DaytonaSandbox
    APIRoutes -->|"analysis"| GeminiPro
    CronReaper -->|"reaps Daytona by label app=alexis"| DaytonaSandbox
```
Voice, vision, and code state are one event stream. The agent sees, hears, and runs code in the same tick.
- Native-audio Gemini Live integration.
- VAD barge-in for natural interruptions.
- One-chunk hold-back for clean end-of-turn audio delivery.
- Webcam `SelfView` for continuous candidate presence.
- Gemini Vision observations periodically sampled.
- Multi-signal integrity score (eye contact, off-screen attention, second-device detection), all streamed into the interview store.
- SpatialReal AvatarKit (~970KB WASM, prefetched) rendering high-fidelity 3D presence.
- Server-driven keyframes ensuring visual sync.
- Custom audio bus subscribed to the same PCM stream that drives the speakers.
- Typed tool registry bound directly to the voice agent.
- "Run it" → Daytona `executeCode`.
- "Fix the syntax" → autofix with diff preview.
- "I need a minute" → graceful break.
- "End the interview" → report generation.
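One way to bind voice intents to tools without falling back to `any` is a mapped type keyed by tool name. The sketch below is illustrative only: the tool names, argument shapes, and return strings are hypothetical and do not come from `src/lib/agent-tools.ts`, and the handlers are synchronous here for brevity where real ones would likely return promises.

```typescript
// Hypothetical argument shapes, one entry per tool.
interface ToolArgs {
  run_code: { code: string };
  take_break: { minutes: number };
  end_interview: { reason: string };
}

type ToolName = keyof ToolArgs;

// Each handler receives exactly the argument type declared for its own name.
type ToolRegistry = { [K in ToolName]: (args: ToolArgs[K]) => string };

const registry: ToolRegistry = {
  run_code: ({ code }) => `ran ${code.length} chars`,
  take_break: ({ minutes }) => `paused for ${minutes} min`,
  end_interview: ({ reason }) => `report queued (${reason})`,
};

// Narrows an arbitrary string (e.g. from a model tool call) to a known tool name.
// An own-property check avoids matching inherited keys such as "toString".
function isToolName(name: string): name is ToolName {
  return Object.prototype.hasOwnProperty.call(registry, name);
}

// The compiler checks that `args` matches the tool selected by `name`.
function dispatch<K extends ToolName>(name: K, args: ToolArgs[K]): string {
  return registry[name](args);
}
```

With this shape, `dispatch("take_break", { code: "x" })` fails at compile time. Arguments arriving from the model at runtime would still need validation before reaching `dispatch`; that step is out of scope for this sketch.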
- Node 18+ (LTS)
- Docker (only if running Daytona locally)
- API keys for Daytona, Gemini, SpatialReal
```bash
git clone https://github.com/yhinai/alexis.git
cd alexis
npm install
```

Note: the repo sets `legacy-peer-deps=true` in `.npmrc` to work around AvatarKit's peer-dependency conflict.
Create a .env.local file with the following keys:
| Variable | Required | Description |
|---|---|---|
| `DAYTONA_API_KEY` | Yes | Your Daytona API key |
| `DAYTONA_API_URL` | Yes | URL of the Daytona server |
| `GEMINI_API_KEY` | Yes | Google Gemini API key |
| `SPATIALREAL_API_KEY` | Yes | SpatialReal API key |
| `SPATIALREAL_APP_ID` | Yes | SpatialReal app ID |
| `NEXT_PUBLIC_SPATIALREAL_AVATAR_ID` | Yes | SpatialReal avatar ID to render |
| `SENTRY_AUTH_TOKEN` | No | Sentry token for error tracking |
| `CRON_SECRET` | No | Required if deployed to Vercel for the sandbox reaper |
| `GEMINI_AUTH_MODE` | No | `direct` (default); `ephemeral` once SDK ephemeral tokens land |
| `NEXT_PUBLIC_USE_MOCK_DAYTONA` | No | Set to `true` to run the UI without a real Daytona container |
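A small startup check can catch a missing key before the first interview fails at runtime. This is an optional sketch and not part of the repo as far as this README states; `missingEnv` is a hypothetical helper, and the list of required names is copied from the table above.

```typescript
// Variables marked "Yes" in the table above.
const REQUIRED_ENV = [
  "DAYTONA_API_KEY",
  "DAYTONA_API_URL",
  "GEMINI_API_KEY",
  "SPATIALREAL_API_KEY",
  "SPATIALREAL_APP_ID",
  "NEXT_PUBLIC_SPATIALREAL_AVATAR_ID",
] as const;

// Returns the names of required variables that are unset or blank.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((key) => (env[key] ?? "").trim() === "");
}
```

In a Node process this could be called as `missingEnv(process.env)` and the result logged or thrown on when non-empty.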
```bash
npm run dev
```

Navigate to http://localhost:3000 to start an interview.
```
/
├── src/
│   ├── app/
│   │   ├── api/          # API routes (Gemini auth, sandbox cleanup, execution)
│   │   └── interview/    # Main interview session pages
│   ├── components/
│   │   ├── agent/        # Agent UI (InterviewAgent.tsx, SystemDesignAgent.tsx, SpatialRealAvatar.tsx, SelfView.tsx, TranscriptPanel.tsx)
│   │   └── editor/       # Code editor components
│   └── lib/              # Core logic (interview-live-client.ts, daytona.ts, gemini.ts, visual-observations.ts, agent-tools.ts, code-history.ts, rate-limiter.ts, auth.ts, constants.ts)
├── vercel.json           # Vercel deployment configuration and cron schedules
└── plan.md               # Project phase tracking and roadmap
```
- 93 tests passing, 0 failing (vitest)
- 17 API routes, ~40 lib modules, ~50 components
- Sandbox lifecycle: client cleanup is best-effort, server-side cron is the contract; see `src/app/api/cron/sandbox-cleanup/route.ts`
- Path sanitizer: validate-only allow-list, NUL-blocked, traversal-blocked; see `src/lib/daytona.ts`
- Credential-mint endpoint: rate-limited, with the `GEMINI_AUTH_MODE` flag staged for ephemeral tokens; see `src/app/api/gemini/session/route.ts`
- Type-safe tool registry: no `(toolFunctions as any)[name]` anywhere
- Phase plan tracked in `plan.md` (50+ items across 5 phases)
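The "server-side cron is the contract" idea can be sketched as two pieces: an authorization check and a paginated sweep. Everything named below is hypothetical, including the `SandboxClient` interface, which stands in for whatever Daytona client the route actually uses. The staleness cutoff is also an assumption added so the sweep would not remove sandboxes that are still in use. The bearer-token check follows Vercel's documented behavior of sending `Authorization: Bearer <CRON_SECRET>` on cron invocations when that variable is set.

```typescript
// Hypothetical stand-in for a sandbox API client; not the Daytona SDK.
interface SandboxInfo {
  id: string;
  createdAtMs: number;
}
interface SandboxPage {
  items: SandboxInfo[];
  nextCursor: string | null;
}
interface SandboxClient {
  listByLabel(label: string, cursor: string | null): Promise<SandboxPage>;
  remove(id: string): Promise<void>;
}

// Rejects the request unless a secret is configured and the header matches it.
function isAuthorizedCron(authHeader: string | null, secret: string | undefined): boolean {
  return Boolean(secret) && authHeader === `Bearer ${secret}`;
}

// Walks every page of labeled sandboxes and removes those older than maxAgeMs.
// Returns the number of sandboxes removed.
async function reapStaleSandboxes(
  client: SandboxClient,
  label: string,
  maxAgeMs: number,
  nowMs: number,
): Promise<number> {
  let cursor: string | null = null;
  let removed = 0;
  do {
    const page: SandboxPage = await client.listByLabel(label, cursor);
    for (const item of page.items) {
      if (nowMs - item.createdAtMs > maxAgeMs) {
        await client.remove(item.id);
        removed += 1;
      }
    }
    cursor = page.nextCursor;
  } while (cursor !== null);
  return removed;
}
```

Because the sweep runs on the server on a schedule, a closed tab or crashed browser cannot leave a workspace running indefinitely; the client-side cleanup only shortens the wait.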
- Ephemeral Gemini auth tokens.
- Candidate consent dialog.
- Demo-day fallbacks (text-only voice, static avatar).
- `BaseLiveClient` extraction to deduplicate the three live-client modules.
- Upstash Redis-backed rate limiter.
- Server-authoritative session state.
- Multi-language analyzers (TS, Go).
- Interview replay with consent.
- Keystroke + similarity-based integrity signals.
This project is licensed under the MIT License.
- SpatialReal for sponsoring the Voice & Vision Track.
- Gemini Live Team for the native-audio models.
- Daytona for the sandboxed code execution infrastructure.

