Skip to content

bnsware/jarvis-windows

ย 
ย 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

8 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐Ÿค– J.A.R.V.I.S โ€” Windows AI Assistant

A voice-enabled Windows desktop assistant powered by Gemini Live API or local LM Studio

Python Platform Gemini


๐Ÿ“‹ Table of Contents


๐ŸŽฏ About the Project

JARVIS is a real-time voice AI assistant developed for the Windows desktop environment. It is built on the Google Gemini 2.5 Flash Live API and can operate in both cloud (Gemini) and fully offline local (LM Studio) modes.

It communicates with the user via voice, processes voice commands in real time, and executes real actions on Windows through 16+ integrated tools.


โœจ Features

Voice and Speech

  • ๐ŸŽ™๏ธ Real-time audio streaming โ€” 16 kHz input, 24 kHz output via PyAudio
  • ๐Ÿ”Š Natural voice responses โ€” Gemini Native Audio or Windows SAPI TTS
  • โœ๏ธ Text mode โ€” Written command support alongside voice
  • ๐Ÿ”‡ Pause / Resume โ€” Instant pause without dropping the session

System Integration

  • ๐Ÿ–ฅ๏ธ Application management โ€” Open any Windows application by voice
  • ๐Ÿ“Š System info โ€” CPU, RAM, disk, battery, time, date, network status
  • ๐Ÿ’ป PowerShell โ€” Run terminal commands via voice
  • ๐Ÿ‘๏ธ Screen analysis โ€” Capture and analyze the active window with AI (Gemini Vision or LM Studio Vision)

Productivity

  • ๐Ÿ“… Calendar โ€” Outlook or local JSON calendar; read, add, and delete events
  • โฐ Reminders โ€” Outlook Tasks or local JSON; create and list reminders
  • ๐Ÿง  Persistent memory โ€” Save and delete user-specific information in JSON

Communication and Media

  • ๐Ÿ’ฌ WhatsApp โ€” Compose and auto-send messages via Desktop or Web; import VCF contacts
  • ๐ŸŒฆ๏ธ Weather โ€” Real-time weather summary (OpenWeatherMap)
  • ๐ŸŽต Media playback โ€” YouTube, Spotify Desktop, and Apple Music Web integration
  • ๐ŸŒ Browser control โ€” Open URLs, Google search, play YouTube videos
  • ๐Ÿ“ˆ YouTube Analytics โ€” Channel statistics and video performance reports

Dual Backend Support

  • โ˜๏ธ Gemini mode โ€” Google Gemini 2.5 Flash Live API
  • ๐Ÿ  Local mode โ€” Fully offline operation with LM Studio (OpenAI-compatible); Whisper or Google STT

๐Ÿ—๏ธ Architecture

jarvis/
โ”œโ”€โ”€ main.py                  โ† Gemini Live session manager (JarvisLive)
โ”œโ”€โ”€ ui.py                    โ† Tkinter-based desktop UI (JarvisUI)
โ”œโ”€โ”€ app_config.py            โ† Configuration read/write
โ”œโ”€โ”€ core/
โ”‚   โ”œโ”€โ”€ lmstudio_runtime.py  โ† Local mode engine (JarvisLocal)
โ”‚   โ””โ”€โ”€ prompt.txt           โ† AI system prompt
โ”œโ”€โ”€ actions/                 โ† Tool modules
โ”‚   โ””โ”€โ”€ ...
โ”œโ”€โ”€ memory/                  โ† Persistent memory
โ””โ”€โ”€ config/                  โ† API keys

Data Flow โ€” Gemini Mode

Microphone โ†’ PyAudio โ†’ Gemini Live API
                             โ†“
                      Tool Call Detection
                             โ†“
                      actions/* modules
                             โ†“
                      Result โ†’ Gemini โ†’ Audio Output โ†’ Speaker

Data Flow โ€” Local Mode

Microphone โ†’ SpeechRecognition (Whisper/Google STT) โ†’ Text
                                                          โ†“
                                                   LM Studio API
                                                          โ†“
                                                  Tool Call Detection
                                                          โ†“
                                                  actions/* modules
                                                          โ†“
                                                  Text โ†’ Windows SAPI TTS โ†’ Speaker

๐Ÿ“ฆ Requirements

Requirement Details
Python 3.10 or higher
Operating System Windows 10 / 11
Gemini API Key Free from Google AI Studio
Microphone Any USB or built-in microphone
YouTube API Optional โ€” for channel statistics
LM Studio Optional โ€” for local offline mode

Python Dependencies

google-genai
SpeechRecognition
pyaudio
psutil
Pillow
requests
pyautogui
pyttsx3
pywin32
openai-whisper  (optional, for local STT)

๐Ÿš€ Installation

1. Clone the Repository

git clone https://github.com/bnsware/jarvis.git
cd jarvis

2. Automatic Setup (Recommended)

setup.bat

This script performs the following steps in order:

  • Checks for Python installation
  • Creates a venv virtual environment
  • Installs required fonts on Windows
  • Installs packages from requirements.txt
  • Creates config/api_keys.json from the template

3. Manual Setup

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
copy config\api_keys.example.json config\api_keys.json

Note: If you get an error installing PyAudio:

pip install pipwin
pipwin install pyaudio

โš™๏ธ Configuration

Edit the config/api_keys.json file:

{
  "gemini_api_key": "AIza...",
  "voice": "Charon",
  "youtube_api_key": "",
  "youtube_channel_handle": "@yourchannel",
  "backend": "gemini",
  "lmstudio_base_url": "http://127.0.0.1:1234/v1",
  "lmstudio_model": "local-model",
  "lmstudio_api_key": "lm-studio",
  "stt_engine": "whisper",
  "stt_language": "en-US"
}
Field Description Required
gemini_api_key Google Gemini API key Yes in Gemini mode
voice Voice tone: Charon, Aoede, Fenrir, Kore, Puck No
youtube_api_key YouTube Data API v3 key For channel reports
youtube_channel_handle Handle in @yourchannel format For channel reports
backend gemini or lmstudio No (default: gemini)
lmstudio_base_url LM Studio server address Yes in local mode
lmstudio_model Local model name to use Yes in local mode
stt_engine whisper or google In local mode
stt_language STT language code In local mode

๐ŸŽฎ Usage

Launching

start.bat

or manually:

venv\Scripts\activate
python main.py

Interface

When JARVIS opens, a modern desktop window appears:

  • Status indicator โ€” LISTENING, THINKING, SPEAKING, ERROR
  • Log panel โ€” Real-time conversation stream
  • Text box โ€” Enter written commands alongside voice
  • Pause button โ€” Silences the assistant without dropping the session
  • System panel โ€” CPU, RAM, battery, and weather widgets

๐Ÿ—ฃ๏ธ Supported Commands

System and Applications

"Open Spotify"
"Launch VS Code"
"What's the battery status?"
"How much RAM is being used?"
"What time is it?"
"List files on the desktop"

Calendar

"What's on my calendar today?"
"What's my schedule tomorrow?"
"What's my next meeting?"
"What's on my calendar for the next 30 days?"
"Add a dentist appointment tomorrow at 2 PM"
"Add a meeting Monday from 10:00 to 11:00"
"Delete the meeting from my calendar"

Reminders

"What are my reminders for today?"
"Show my upcoming reminders"
"Remind me about the dentist tomorrow morning at 9"
"Remind me to buy milk this evening"

Weather and Browser

"What's the weather in New York?"
"Search Google for learn Python"
"Open The Weeknd Blinding Lights on YouTube"
"Open github.com"

Media

"Play The Weeknd Blinding Lights on Spotify"
"Open Adele Hello on Apple Music"
"Play lofi beats on YouTube"

WhatsApp

"Send a goodnight message to Mom on WhatsApp"
"Prepare a WhatsApp message to John: let's meet tomorrow"

Screen Analysis

"What's on the screen?"
"Read this error"
"Analyze this window"
"What does it say here?"

YouTube Analytics

"How are my YouTube stats?"
"Analyze my recent videos"
"Summarize my channel growth"

Memory

"My name is Alex"
"My project is a Python application"
"Remove the Claude limit note from your memory"
"Forget this"

๐Ÿ  Local Mode (LM Studio)

To run entirely locally without an internet connection:

1. Download LM Studio and load a model: Download from lmstudio.ai, select a model, and start the server.

2. Update config/api_keys.json:

{
  "backend": "lmstudio",
  "lmstudio_base_url": "http://127.0.0.1:1234/v1",
  "lmstudio_model": "lmstudio-community/mistral-7b-instruct",
  "stt_engine": "whisper"
}

3. For screen analysis, add a vision model:

{
  "lmstudio_vision_model": "xtuner/llava-llama-3-8b-v1_1-gguf"
}

What changes in local mode:

  • Audio input โ†’ Whisper (local) or Google STT
  • Audio output โ†’ Windows SAPI TTS (system voice)
  • AI engine โ†’ LM Studio OpenAI-compatible API
  • All tools work the same way

๐Ÿ“ Project Structure

jarvis/
โ”œโ”€โ”€ main.py                  โ† Entry point and Gemini Live engine
โ”œโ”€โ”€ ui.py                    โ† Desktop UI (Tkinter)
โ”œโ”€โ”€ app_config.py            โ† Configuration manager
โ”œโ”€โ”€ requirements.txt         โ† Python dependencies
โ”œโ”€โ”€ setup.bat                โ† Automatic setup script
โ”œโ”€โ”€ start.bat                โ† Quick launch script
โ”œโ”€โ”€ pyrightconfig.json       โ† Type checking configuration
โ”œโ”€โ”€ .gitignore
โ”œโ”€โ”€ Fonts/                   โ† UI fonts
โ”œโ”€โ”€ Icon/                    โ† Application icon
โ”œโ”€โ”€ SFX/                     โ† Sound effects
โ”œโ”€โ”€ core/
โ”‚   โ”œโ”€โ”€ lmstudio_runtime.py  โ† Local AI engine
โ”‚   โ””โ”€โ”€ prompt.txt           โ† System prompt
โ”œโ”€โ”€ actions/
โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”œโ”€โ”€ browser.py           โ† Web browser control
โ”‚   โ”œโ”€โ”€ calendar.py          โ† Calendar operations (Outlook + JSON)
โ”‚   โ”œโ”€โ”€ health.py            โ† Health tracking
โ”‚   โ”œโ”€โ”€ media.py             โ† Music/video playback
โ”‚   โ”œโ”€โ”€ open_app.py          โ† Application launcher
โ”‚   โ”œโ”€โ”€ reminders.py         โ† Reminder operations (Outlook + JSON)
โ”‚   โ”œโ”€โ”€ screen_vision.py     โ† Screenshot + Vision AI
โ”‚   โ”œโ”€โ”€ shell.py             โ† PowerShell integration
โ”‚   โ”œโ”€โ”€ sys_info.py          โ† System info (psutil)
โ”‚   โ”œโ”€โ”€ tts.py               โ† Windows SAPI text-to-speech
โ”‚   โ”œโ”€โ”€ weather.py           โ† Weather (OpenWeatherMap)
โ”‚   โ”œโ”€โ”€ whatsapp.py          โ† WhatsApp messaging
โ”‚   โ””โ”€โ”€ youtube_stats.py     โ† YouTube Data API
โ”œโ”€โ”€ memory/
โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”œโ”€โ”€ memory_manager.py    โ† JSON memory manager
โ”‚   โ”œโ”€โ”€ memory.example.json  โ† Memory template
โ”‚   โ””โ”€โ”€ phone_book.example.json โ† Phone book template
โ””โ”€โ”€ config/
    โ”œโ”€โ”€ api_keys.json        โ† API keys (gitignored)
    โ””โ”€โ”€ api_keys.example.json โ† Template

๐Ÿ”’ Security

  • config/api_keys.json and memory/memory.json are added to .gitignore โ€” your API keys and personal data will not be included in the repository.
  • phone_book.json is also excluded from publishing.
  • Only *.example.json templates are included in the repository.

๐Ÿค Contributing

  1. Fork this repository
  2. Create a new branch: git checkout -b feature/new-feature
  3. Commit your changes: git commit -m "New feature: description"
  4. Push your branch: git push origin feature/new-feature
  5. Open a Pull Request

Developed by bnsware and alppunlu

About

JARVIS is a real-time voice AI assistant developed for the Windows desktop environment. It is built on the Google Gemini 2.5 Flash Live API and can operate in both cloud (Gemini) and fully offline local (LM Studio) modes.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 99.2%
  • Batchfile 0.8%