🤖 A voice-first AI assistant with offline speech recognition, OpenAI-powered responses and text-to-speech, wrapped in a sleek React + Vite interface.
🎤 Mic ──▶ 🎙 Vosk STT ──▶ 🧠 OpenAI LLM ──▶ 🔊 OpenAI TTS ──▶ 🔈 Speaker
| Layer | Technology | Role |
|---|---|---|
| 🖥️ Frontend | React + Vite | UI & hot module replacement |
| 🐍 Backend | Python | Server & orchestration |
| 🎙️ STT | Vosk | Offline speech-to-text |
| 🧠 AI | OpenAI GPT | Response generation |
| 🔊 TTS | OpenAI TTS | Voice synthesis |
| ☁️ Storage | Cloudinary | Audio & asset hosting |
| 🎭 Automation | Playwright | Browser control |
jarvis-main/
├── src/ # ⚛️ React + Vite frontend
├── server/
│ ├── main.py # 🐍 Backend entry point
│ ├── speech_io.py # 🎙️ Vosk STT (loads model at L13)
│ └── requirements.txt
├── vosk-model-small-en-us-0.15/ # 📦 Must stay in project root
├── .env # 🔑 Credentials (git-ignored)
└── vite.config.js
Install npm packages and Playwright browser binaries:
npm install
sudo npx playwright install-depsCreate an isolated environment and install backend dependencies:
python3 -m venv .venv
source .venv/bin/activate
pip install -r server/requirements.txtCreate .env or server/.env in the project root:
OPENAI_API_KEY=your_key_here
YOUR_CLOUD_NAME=your_cloud_name
YOUR_API_KEY=your_cloudinary_key
YOUR_API_SECRET=your_cloudinary_secretWarning
Never commit .env files. Both .env and server/.env are already listed in .gitignore.
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zipImportant
Keep the extracted vosk-model-small-en-us-0.15/ folder in the project root. speech_io.py loads it from there at line 13.
| Variable | Description | Required |
|---|---|---|
OPENAI_API_KEY |
OpenAI secret key for LLM completions and TTS | ✅ |
YOUR_CLOUD_NAME |
Cloudinary cloud name from your dashboard | ✅ |
YOUR_API_KEY |
Cloudinary API key | ✅ |
YOUR_API_SECRET |
Cloudinary API secret — treat as a password | ✅ |
Start the frontend (in one terminal):
npm run devStart the backend (in another terminal):
source .venv/bin/activate
python -m server.main-
The following are excluded from version control via
.gitignore:vosk-model-small-en-us-0.15/ .venv/ .env server/.env audio output files screenshots/ -
The Vosk model enables fully offline speech recognition — no API call needed for STT.
-
OpenAI is used for both response generation (GPT) and voice synthesis (TTS).