A voice-enabled Windows desktop assistant powered by Gemini Live API or local LM Studio
- About the Project
- Features
- Architecture
- Requirements
- Installation
- Configuration
- Usage
- Supported Commands
- Local Mode (LM Studio)
- Project Structure
- Contributing
JARVIS is a real-time voice AI assistant developed for the Windows desktop environment. It is built on the Google Gemini 2.5 Flash Live API and can operate in both cloud (Gemini) and fully offline local (LM Studio) modes.
It communicates with the user via voice, processes voice commands in real time, and executes real actions on Windows through 16+ integrated tools.
- ๐๏ธ Real-time audio streaming โ 16 kHz input, 24 kHz output via PyAudio
- ๐ Natural voice responses โ Gemini Native Audio or Windows SAPI TTS
- โ๏ธ Text mode โ Written command support alongside voice
- ๐ Pause / Resume โ Instant pause without dropping the session
- ๐ฅ๏ธ Application management โ Open any Windows application by voice
- ๐ System info โ CPU, RAM, disk, battery, time, date, network status
- ๐ป PowerShell โ Run terminal commands via voice
- ๐๏ธ Screen analysis โ Capture and analyze the active window with AI (Gemini Vision or LM Studio Vision)
- ๐ Calendar โ Outlook or local JSON calendar; read, add, and delete events
- โฐ Reminders โ Outlook Tasks or local JSON; create and list reminders
- ๐ง Persistent memory โ Save and delete user-specific information in JSON
- ๐ฌ WhatsApp โ Compose and auto-send messages via Desktop or Web; import VCF contacts
- ๐ฆ๏ธ Weather โ Real-time weather summary (OpenWeatherMap)
- ๐ต Media playback โ YouTube, Spotify Desktop, and Apple Music Web integration
- ๐ Browser control โ Open URLs, Google search, play YouTube videos
- ๐ YouTube Analytics โ Channel statistics and video performance reports
- โ๏ธ Gemini mode โ Google Gemini 2.5 Flash Live API
- ๐ Local mode โ Fully offline operation with LM Studio (OpenAI-compatible); Whisper or Google STT
jarvis/
โโโ main.py โ Gemini Live session manager (JarvisLive)
โโโ ui.py โ Tkinter-based desktop UI (JarvisUI)
โโโ app_config.py โ Configuration read/write
โโโ core/
โ โโโ lmstudio_runtime.py โ Local mode engine (JarvisLocal)
โ โโโ prompt.txt โ AI system prompt
โโโ actions/ โ Tool modules
โ โโโ ...
โโโ memory/ โ Persistent memory
โโโ config/ โ API keys
Microphone โ PyAudio โ Gemini Live API
โ
Tool Call Detection
โ
actions/* modules
โ
Result โ Gemini โ Audio Output โ Speaker
Microphone โ SpeechRecognition (Whisper/Google STT) โ Text
โ
LM Studio API
โ
Tool Call Detection
โ
actions/* modules
โ
Text โ Windows SAPI TTS โ Speaker
| Requirement | Details |
|---|---|
| Python | 3.10 or higher |
| Operating System | Windows 10 / 11 |
| Gemini API Key | Free from Google AI Studio |
| Microphone | Any USB or built-in microphone |
| YouTube API | Optional โ for channel statistics |
| LM Studio | Optional โ for local offline mode |
google-genai
SpeechRecognition
pyaudio
psutil
Pillow
requests
pyautogui
pyttsx3
pywin32
openai-whisper (optional, for local STT)
git clone https://github.com/bnsware/jarvis.git
cd jarvissetup.batThis script performs the following steps in order:
- Checks for Python installation
- Creates a
venvvirtual environment - Installs required fonts on Windows
- Installs packages from
requirements.txt - Creates
config/api_keys.jsonfrom the template
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
copy config\api_keys.example.json config\api_keys.jsonNote: If you get an error installing PyAudio:
pip install pipwin pipwin install pyaudio
Edit the config/api_keys.json file:
{
"gemini_api_key": "AIza...",
"voice": "Charon",
"youtube_api_key": "",
"youtube_channel_handle": "@yourchannel",
"backend": "gemini",
"lmstudio_base_url": "http://127.0.0.1:1234/v1",
"lmstudio_model": "local-model",
"lmstudio_api_key": "lm-studio",
"stt_engine": "whisper",
"stt_language": "en-US"
}| Field | Description | Required |
|---|---|---|
gemini_api_key |
Google Gemini API key | Yes in Gemini mode |
voice |
Voice tone: Charon, Aoede, Fenrir, Kore, Puck |
No |
youtube_api_key |
YouTube Data API v3 key | For channel reports |
youtube_channel_handle |
Handle in @yourchannel format |
For channel reports |
backend |
gemini or lmstudio |
No (default: gemini) |
lmstudio_base_url |
LM Studio server address | Yes in local mode |
lmstudio_model |
Local model name to use | Yes in local mode |
stt_engine |
whisper or google |
In local mode |
stt_language |
STT language code | In local mode |
start.bator manually:
venv\Scripts\activate
python main.pyWhen JARVIS opens, a modern desktop window appears:
- Status indicator โ
LISTENING,THINKING,SPEAKING,ERROR - Log panel โ Real-time conversation stream
- Text box โ Enter written commands alongside voice
- Pause button โ Silences the assistant without dropping the session
- System panel โ CPU, RAM, battery, and weather widgets
"Open Spotify"
"Launch VS Code"
"What's the battery status?"
"How much RAM is being used?"
"What time is it?"
"List files on the desktop"
"What's on my calendar today?"
"What's my schedule tomorrow?"
"What's my next meeting?"
"What's on my calendar for the next 30 days?"
"Add a dentist appointment tomorrow at 2 PM"
"Add a meeting Monday from 10:00 to 11:00"
"Delete the meeting from my calendar"
"What are my reminders for today?"
"Show my upcoming reminders"
"Remind me about the dentist tomorrow morning at 9"
"Remind me to buy milk this evening"
"What's the weather in New York?"
"Search Google for learn Python"
"Open The Weeknd Blinding Lights on YouTube"
"Open github.com"
"Play The Weeknd Blinding Lights on Spotify"
"Open Adele Hello on Apple Music"
"Play lofi beats on YouTube"
"Send a goodnight message to Mom on WhatsApp"
"Prepare a WhatsApp message to John: let's meet tomorrow"
"What's on the screen?"
"Read this error"
"Analyze this window"
"What does it say here?"
"How are my YouTube stats?"
"Analyze my recent videos"
"Summarize my channel growth"
"My name is Alex"
"My project is a Python application"
"Remove the Claude limit note from your memory"
"Forget this"
To run entirely locally without an internet connection:
1. Download LM Studio and load a model: Download from lmstudio.ai, select a model, and start the server.
2. Update config/api_keys.json:
{
"backend": "lmstudio",
"lmstudio_base_url": "http://127.0.0.1:1234/v1",
"lmstudio_model": "lmstudio-community/mistral-7b-instruct",
"stt_engine": "whisper"
}3. For screen analysis, add a vision model:
{
"lmstudio_vision_model": "xtuner/llava-llama-3-8b-v1_1-gguf"
}What changes in local mode:
- Audio input โ Whisper (local) or Google STT
- Audio output โ Windows SAPI TTS (system voice)
- AI engine โ LM Studio OpenAI-compatible API
- All tools work the same way
jarvis/
โโโ main.py โ Entry point and Gemini Live engine
โโโ ui.py โ Desktop UI (Tkinter)
โโโ app_config.py โ Configuration manager
โโโ requirements.txt โ Python dependencies
โโโ setup.bat โ Automatic setup script
โโโ start.bat โ Quick launch script
โโโ pyrightconfig.json โ Type checking configuration
โโโ .gitignore
โโโ Fonts/ โ UI fonts
โโโ Icon/ โ Application icon
โโโ SFX/ โ Sound effects
โโโ core/
โ โโโ lmstudio_runtime.py โ Local AI engine
โ โโโ prompt.txt โ System prompt
โโโ actions/
โ โโโ __init__.py
โ โโโ browser.py โ Web browser control
โ โโโ calendar.py โ Calendar operations (Outlook + JSON)
โ โโโ health.py โ Health tracking
โ โโโ media.py โ Music/video playback
โ โโโ open_app.py โ Application launcher
โ โโโ reminders.py โ Reminder operations (Outlook + JSON)
โ โโโ screen_vision.py โ Screenshot + Vision AI
โ โโโ shell.py โ PowerShell integration
โ โโโ sys_info.py โ System info (psutil)
โ โโโ tts.py โ Windows SAPI text-to-speech
โ โโโ weather.py โ Weather (OpenWeatherMap)
โ โโโ whatsapp.py โ WhatsApp messaging
โ โโโ youtube_stats.py โ YouTube Data API
โโโ memory/
โ โโโ __init__.py
โ โโโ memory_manager.py โ JSON memory manager
โ โโโ memory.example.json โ Memory template
โ โโโ phone_book.example.json โ Phone book template
โโโ config/
โโโ api_keys.json โ API keys (gitignored)
โโโ api_keys.example.json โ Template
config/api_keys.jsonandmemory/memory.jsonare added to.gitignoreโ your API keys and personal data will not be included in the repository.phone_book.jsonis also excluded from publishing.- Only
*.example.jsontemplates are included in the repository.
- Fork this repository
- Create a new branch:
git checkout -b feature/new-feature - Commit your changes:
git commit -m "New feature: description" - Push your branch:
git push origin feature/new-feature - Open a Pull Request