A local Python assistant that can:
- Watch a selected monitor (screen capture).
- Let you ask by voice or text about what is on that screen.
- Answer using a vision-capable AI model.
- Optionally speak responses aloud.
- Python 3.10+
- An OpenAI API key (
OPENAI_API_KEY) - Microphone access
- Desktop audio support for text-to-speech
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txtSet your API key:
export OPENAI_API_KEY="your_api_key_here"List monitors first:
python app.py --list-monitorsRun with monitor 1:
python app.py --monitor 1Enable spoken AI replies:
python app.py --monitor 1 --voice-reply- Type a question and press Enter, or
- Press Enter on an empty line to speak through your microphone.
- Type
/quitto exit.
- If the AI cannot read details, ask it to describe larger UI areas or zoom your screen.
PyAudioinstallation can require OS-level packages (for exampleportaudio).- Change model with
--modelif you prefer a different vision-capable OpenAI model.