"I am completely operational, and all my circuits are functioning perfectly."
Local, multimodal AI agent — sees, hears, thinks, speaks, and acts on your machine.
Cross-platform (macOS, Windows, Linux) · Free mode (zero API keys) · Claude Code co-work hub.
Product Page · Free Mode · Quick Start · Co-Work · Changelog
A local, multimodal AI agent that sees you via webcam, hears your voice, thinks via LLM, speaks with a cloned voice, acts on your machine, and integrates with Claude Code via MCP. Runs entirely on your machine with a browser-based control panel.
Works on macOS, Windows, and Linux. One codebase, auto-detects OS at runtime.
| Capability | How |
|---|---|
| See | Webcam feed with browser HUD — scanlines, corner brackets, REC indicator |
| Hear | Browser mic recording (Web Audio API, live waveform, silence detection) + Whisper STT (API or local faster-whisper) |
| Think | Multi-provider LLM (GPT-4o, Claude, Gemini, Ollama) with function calling |
| Speak | 3 voice providers — Edge TTS (free/fast), ElevenLabs (paid/best), XTTS (local/cloned) |
| Act | 43 cross-platform tools — shell, apps, files, web search, memory, clipboard, app automation, Claude Code delegation, background tasks, artifacts, multi-agent orchestration |
| Chat | Terminal-style chat with streaming responses, 35 slash commands (categorized menu, keyboard nav), mic button — type or speak to HAL |
| Disambiguate | Smart choice sheet UI — HAL presents numbered options, user clicks to select |
| Integrate | MCP server exposes 21 tools to Claude Code/Desktop for bidirectional AI collaboration |
| Remember | Typed persistent memory — facts, decisions, preferences, session summaries |
| Know | Knowledge upload (drag-drop or button) — PDFs, docs, code, images. BM25 keyword search, deep-read or skim modes. Also loads local files + remote llms.txt URLs at boot |
| Co-Work | Background task runner, artifact workspace, multi-agent orchestration, cross-agent context handoff |
┌──────────────────────────────────────────────────────────┐
│ HAL9000 ENGINE │
│ │
│ Vision ──┐ │
│ ├──→ Brain (LLM + function calling) │
│ Browser ──┘ │ │ │
│ Mic + Chat ▼ ▼ │
│ Voice Tools (43) │
│ (Edge/11Labs/XTTS) (OS agent layer) │
│ │ │
│ ▼ │
│ Browser Audio + Waveform │
│ │
│ Knowledge ─── Memory (typed) ─── Session Tracking │
│ TaskRunner ── Orchestrator ── Artifact Store │
└──────────────────────────────────────────────────────────┘
│ │
Flask Server MCP Server
localhost:9000 (Claude Code integration)
Run HAL with zero API keys and zero cost — fully local operation.
# 1. Install Ollama (local LLM)
brew install ollama # macOS
# or: curl -fsSL https://ollama.com/install.sh | sh # Linux
# or: download from ollama.com # Windows
# 2. Pull a model
ollama pull llama3.1
# 3. Set up HAL
git clone https://github.com/shandar/HAL9000.git
cd HAL9000 && python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# 4. One line in .env
echo "FREE_MODE=true" > .env
# 5. Run
python server.py

| Layer | Free Provider | Paid Alternative |
|---|---|---|
| Brain | Ollama (Llama 3.1, Mistral, Phi-3) | GPT-4o, Claude, Gemini |
| STT | faster-whisper (local Whisper) | OpenAI Whisper API |
| TTS | Edge TTS (default, always free) | ElevenLabs, XTTS |
FREE_MODE=true overrides AI_PROVIDER, STT_PROVIDER, and TTS_PROVIDER in one toggle. You can also mix — e.g., FREE_MODE=true with STT_PROVIDER=whisper_api for local brain + cloud STT.
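For instance, the mixed setup mentioned above — local brain with cloud STT — is a three-line .env (the API key value is a placeholder):

```
FREE_MODE=true
STT_PROVIDER=whisper_api
OPENAI_API_KEY=sk-...
```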
HAL auto-detects your OS and uses the right system commands:
| Feature | macOS | Windows | Linux |
|---|---|---|---|
| Volume | AppleScript | nircmd / PowerShell | pactl / amixer |
| Brightness | ioreg | WMI | brightnessctl |
| Notifications | osascript | Toast API | notify-send |
| Clipboard | pbcopy/pbpaste | Get/Set-Clipboard | xclip / wl-clipboard |
| Screenshot | screencapture | PIL.ImageGrab | scrot / grim |
| Battery | pmset | psutil / WMI | psutil / sysfs |
| WiFi | networksetup | netsh wlan | nmcli |
| Apps | open -a + .app scan | start + Start Menu scan | gtk-launch + .desktop scan |
| Terminal | AppleScript Terminal | Windows Terminal / cmd | gnome-terminal / konsole |
| Embedded Terminal | ✅ xterm.js + PTY | ❌ External only | ✅ xterm.js + PTY |
No #ifdef, no separate builds — one pip install, one python server.py.
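The runtime OS detection behind this (Darwin → mac, Windows → windows, Linux → linux, per `core/platform/__init__.py`) can be sketched as follows — the function name here is illustrative, not HAL's actual API:

```python
import platform

def detect_platform() -> str:
    """Map the runtime OS name to a platform module name."""
    mapping = {"Darwin": "mac", "Windows": "windows", "Linux": "linux"}
    system = platform.system()  # "Darwin", "Windows", or "Linux"
    if system not in mapping:
        raise RuntimeError(f"Unsupported OS: {system}")
    return mapping[system]
```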
| Component | Requirement |
|---|---|
| OS | macOS 12+, Windows 10+, or Ubuntu 20.04+ (any modern Linux) |
| CPU | 4 cores (Intel i5 / Apple M1 / AMD Ryzen 5 or better) |
| RAM | 8 GB (Ollama loads models into memory — llama3.1 8B needs ~5 GB) |
| Disk | 6 GB free (3 GB for Ollama model + 1 GB for faster-whisper model + HAL) |
| Python | 3.10 or higher |
| Browser | Any modern browser (Chrome, Firefox, Safari, Edge) |
| Microphone | Required for voice input (built-in or USB) |
| Webcam | Optional — required only for vision features |
| Network | Not required (fully offline operation) |
| Component | Requirement |
|---|---|
| RAM | 4 GB (no local models loaded) |
| Disk | 500 MB free |
| Network | Required (API calls to OpenAI/Anthropic/Google) |
| API Keys | At least OPENAI_API_KEY for GPT-4o + Whisper STT |
| Mode | Brain Latency | STT Latency | TTS Latency | RAM Usage |
|---|---|---|---|---|
| Free (Ollama llama3.1) | ~2-5s (CPU), ~1-2s (Apple Silicon) | ~1-3s (faster-whisper base) | ~0.7s (Edge TTS) | ~5-6 GB |
| Free (Ollama phi3) | ~1-2s (CPU) | ~1-3s | ~0.7s | ~3 GB |
| Paid (GPT-4o) | ~1-2s (API) | ~0.5s (Whisper API) | ~0.7s (Edge) | ~200 MB |
| Paid (Claude) | ~1-3s (API) | ~0.5s | ~1.2s (ElevenLabs) | ~200 MB |
Apple Silicon users: Ollama runs significantly faster on M1/M2/M3/M4 chips with Metal acceleration. An M1 MacBook Air can run llama3.1 8B comfortably.
GPU users (Linux/Windows): Ollama supports NVIDIA CUDA. With a 6 GB+ VRAM GPU, expect 2-3x faster inference than CPU.
git clone https://github.com/shandar/HAL9000.git
cd HAL9000
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env # fill in your API keys (or set FREE_MODE=true)
python server.py       # start the web control panel

Open http://localhost:9000 → click Activate.
Free mode? Just `echo "FREE_MODE=true" > .env` — no API keys needed. See Free Mode.
# Register HAL as an MCP server for Claude Code
claude mcp add hal-9000 -- python /path/to/HAL9000/hal_mcp_server.py

Now Claude Code can see through your webcam, speak aloud, control your machine, access HAL's memory, and hand off session context.
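The repo also ships a project-scoped `.mcp.json`. A minimal config of the shape Claude Code expects would look roughly like this (a sketch — check the shipped `.mcp.json` for the actual contents):

```json
{
  "mcpServers": {
    "hal-9000": {
      "command": "python",
      "args": ["/path/to/HAL9000/hal_mcp_server.py"]
    }
  }
}
```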
HAL operates as a co-work hub — coordinating work across HAL, Claude Code CLI, and Claude Desktop.
- Memories are typed: `fact`, `decision`, `preference`, `task`, `session_summary`
- Sessions auto-summarize on shutdown — HAL remembers what happened
- Claude Code can call `hal_get_context` to load relevant context at session start
- Manual "wrap up" via `hal_save_session` or the `save_session` tool
- Submit long-running coding tasks via the `background_task` tool
- Tasks run asynchronously via `claude --print` with real-time progress
- Configurable concurrency (default 2), 600s timeout, cancellation
- Task queue panel in the UI shows status with live updates
- HAL creates visual artifacts (code, markdown, HTML, Mermaid diagrams) via `create_artifact`
- Artifacts appear in a tabbed workspace panel alongside the chat
- 3-column layout when artifacts are active
- Copy button, close button, sandboxed HTML rendering
- Spawn multiple named Claude Code agents on parallel tasks via `orchestrate`
- Conflict detection when agents modify the same files
- Agent dashboard with status indicators, file lists, cancel controls
- Results summarized and stored in session memory
With `FREE_MODE=true`, no API keys are needed at all. See Free Mode.
| Key | Where | Required |
|---|---|---|
| `OPENAI_API_KEY` | platform.openai.com | Only if using GPT-4o brain or Whisper API STT |
| `ANTHROPIC_API_KEY` | console.anthropic.com | Only if `AI_PROVIDER=anthropic` |
| `GEMINI_API_KEY` | aistudio.google.com | Only if `AI_PROVIDER=gemini` |
| `ELEVENLABS_API_KEY` | elevenlabs.io | Only if `TTS_PROVIDER=elevenlabs` |
| `ELEVENLABS_VOICE_ID` | ElevenLabs voice library | Only if `TTS_PROVIDER=elevenlabs` |
No API key needed for: Edge TTS (default voice), Ollama (local LLM), faster-whisper (local STT).
HAL supports 3 TTS providers, switchable at runtime from the dashboard:
| Provider | Cost | Speed | Quality | Config |
|---|---|---|---|---|
| Edge TTS (default) | Free | ~0.7s | Good — deep male voice | TTS_PROVIDER=edge |
| ElevenLabs | Paid | ~1.2s | Best — natural prosody | TTS_PROVIDER=elevenlabs |
| XTTS (local) | Free | ~4.5s | Good — voice cloning | TTS_PROVIDER=local (requires Python 3.11) |
Switch from the dashboard UI or set TTS_PROVIDER in .env.
Access at http://localhost:9000 after starting the server.
| Panel | Description |
|---|---|
| HAL panel | HAL 9000 eye with real-time waveform visualization overlaid on red block during speech |
| HAL image controls | 3D-style Vision/Voice/Claude buttons positioned on the HAL image strip |
| Webcam HUD | Live MJPEG feed with scanlines, corner brackets, REC indicator — collapses when vision is off |
| Voice selector | Segmented switch to swap between Edge/ElevenLabs/XTTS at runtime |
| Chat window | Terminal-style chat with prompt prefixes, streaming responses, formatted lists, mic button — Enter to send, Shift+Enter for newline |
| Slash commands | 35 categorized commands with keyboard navigation — type / to open menu |
| Choice sheet | Slide-up modal for disambiguation — auto-detects when HAL presents numbered options |
| Task queue | Collapsible panel showing background tasks and agents with live status |
| Workspace | Tabbed artifact panel — code, diagrams, HTML — stacks above chat when artifacts are created |
| Embedded terminal | Full interactive xterm.js terminal (PTY-backed) — run shell, Claude Code, review artifacts in-app (macOS/Linux) |
| Resizable layout | Drag handles between columns to resize HAL, workspace, and chat panels |
| Power button | Circular SVG power icon in top toolbar — activates/deactivates HAL |
| Status bar | Connection status, timestamp, version |
| Boot greeting | Time-aware creative HAL-style greeting with 20 randomized boot lines |
The UI uses a sci-fi industrial aesthetic — brushed metal bezels, LED indicator lights, recessed panels.
PWA support: Add to Home Screen on mobile for a native app experience.
HAL9000/
├── server.py # Flask web server + API endpoints (localhost only)
├── hal9000.py # HAL engine — lifecycle, main loop, browser audio
├── hal_mcp_server.py # MCP server for Claude Code/Desktop integration (21 tools)
├── config.py # Settings + env loading with safe parsing
├── requirements.txt
├── .env.example
├── .mcp.json # Project MCP config for Claude Code
│
├── core/
│ ├── brain.py # Multi-provider LLM + function calling (thread-safe)
│ ├── vision.py # Webcam capture + MJPEG stream
│ ├── hearing.py # Mic recording + VAD + Whisper STT
│ ├── voice.py # Multi-provider TTS (Edge/ElevenLabs/XTTS)
│ ├── memory_store.py # Typed memory store with auto-migration
│ ├── task_runner.py # Async background task queue for Claude Code
│ ├── orchestrator.py # Multi-agent coordinator with conflict detection
│ ├── terminal_server.py # Embedded WebSocket terminal (xterm.js PTY bridge, port 9001)
│ ├── platform/ # Cross-platform OS abstraction (auto-detected)
│ │ ├── __init__.py # Auto-detect: Darwin → mac, Windows → windows, Linux → linux
│ │ ├── base.py # Abstract PlatformAPI interface (15 methods)
│ │ ├── mac.py # macOS: AppleScript, osascript, pbcopy, screencapture
│ │ ├── windows.py # Windows: PowerShell, WMI, Toast, PIL.ImageGrab
│ │ └── linux.py # Linux: pactl, xclip, notify-send, brightnessctl
│ │
│ ├── tools/ # Tool registry + 43 tools across 8 domain modules
│ │ ├── __init__.py # Registry, execute(), format converters, security
│ │ ├── shell.py # run_shell (whitelisted commands)
│ │ ├── apps.py # open/quit/list apps, open URLs, app_action (cross-platform)
│ │ ├── files.py # list/read/write/search/info
│ │ ├── system.py # volume, brightness, notifications, clipboard, screenshot (cross-platform)
│ │ ├── web.py # web_search, fetch_url
│ │ ├── memory.py # remember, recall, forget, list_memories, save_session
│ │ ├── delegation.py # claude_code, background_task, orchestrate, agents (cross-platform)
│ │ └── artifacts.py # create_artifact, update_artifact
│ └── knowledge.py # Knowledge loader (files + URLs) + upload ingestion + BM25 search
│
├── knowledge/ # Drop files here or upload via UI — HAL indexes at boot + runtime
│ ├── sources.txt # Remote URLs to fetch (llms.txt, etc.)
│ └── *.txt # Local knowledge files
│
├── memory/ # Persistent typed memory (created at runtime)
│ └── facts.json # Typed entries: {id, type, content, timestamp, source, session_id, metadata}
│
├── assets/
│ ├── HAL.png # Dashboard hero image
│ ├── HAL-eye.png # App icon / PWA icon source
│ ├── manifest.json # PWA manifest
│ ├── sw.js # Service worker for PWA
│ └── voice/ # XTTS reference clips (optional)
│
└── templates/
└── index.html # Web dashboard + chat UI + waveform + task panel + workspace
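The BM25 keyword search that `core/knowledge.py` performs over uploaded knowledge can be sketched generically as follows (standard BM25 scoring over whitespace tokens; this is not HAL's exact implementation):

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with classic BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    q_terms = query.lower().split()
    # document frequency of each query term
    df = {t: sum(1 for doc in tokenized if t in doc) for t in q_terms}
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores
```

Ranking is then just `max(range(len(docs)), key=scores.__getitem__)` over the returned list.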
HAL has been security-hardened:
| Measure | Detail |
|---|---|
| Command whitelist | run_shell only allows 77 approved commands (ls, git, npm, etc.) |
| Blocked commands | sudo, shutdown, diskutil, etc. are explicitly blocked |
| AppleScript escaping | All user strings escaped before osascript interpolation |
| App action blocklist | app_action blocks do shell script, system events, etc. |
| Localhost binding | Flask binds to 127.0.0.1 by default (override with HAL_HOST) |
| Code exec guard | /api/run only accepts requests from localhost (127.0.0.1 / ::1) |
| WebSocket origin check | Terminal WebSocket validates origin header — rejects cross-site connections |
| Input length limits | Chat input capped at 2000 chars |
| Secret file blocking | read_file refuses to read .env, credentials.json, etc. |
| Safe config parsing | Malformed env vars fall back to defaults instead of crashing |
| No shell=True | All subprocess calls use argument lists, never shell interpretation |
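The whitelist and no-`shell=True` measures combine into a pattern like this (a sketch — the allow/block lists below are tiny illustrative subsets, not HAL's actual 77-command whitelist):

```python
import shlex
import subprocess

ALLOWED = {"ls", "git", "npm", "cat", "echo"}      # illustrative subset
BLOCKED = {"sudo", "shutdown", "diskutil", "rm"}   # illustrative subset

def run_shell(command: str, timeout: int = 30) -> str:
    """Run a whitelisted command as an argument list (never shell=True)."""
    args = shlex.split(command)
    if not args or args[0] in BLOCKED or args[0] not in ALLOWED:
        raise PermissionError(f"command not whitelisted: {args[0] if args else ''}")
    result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    return result.stdout
```

Because the command is tokenized with `shlex.split` and passed as a list, shell metacharacters like `;` or `$(...)` arrive as literal arguments rather than being interpreted.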
All settings in .env. See .env.example for the full list.
| Setting | Default | Description |
|---|---|---|
| `FREE_MODE` | `false` | One toggle for zero-cost: Ollama + faster-whisper + Edge TTS |
| `AI_PROVIDER` | `openai` | Brain provider: openai, anthropic, gemini, ollama |
| `STT_PROVIDER` | `whisper_api` | Speech-to-text: whisper_api (cloud) or faster_whisper (local) |
| `TTS_PROVIDER` | `edge` | Voice provider: edge, elevenlabs, local |
| `OLLAMA_MODEL` | `llama3.1` | Ollama model name (when using Ollama brain) |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `WHISPER_MODEL_SIZE` | `base` | faster-whisper model: tiny, base, small, medium |
| `EDGE_VOICE` | `en-US-GuyNeural` | Edge TTS voice ID |
| `FRAME_INTERVAL` | `2.0` | Seconds between webcam samples |
| `MIC_RECORD_SECONDS` | `5` | Max recording duration per utterance |
| `SILENCE_THRESHOLD` | `500` | Audio amplitude below this counts as silence |
| `SERVER_PORT` | `9000` | Web server port |
| `TOOL_MAX_ITERATIONS` | `5` | Max tool calls per conversation turn |
| `TASK_TIMEOUT` | `600` | Background task timeout (seconds) |
| `MAX_CONCURRENT_TASKS` | `2` | Max parallel background tasks |
| `MAX_AGENTS` | `4` | Max orchestrated agents |
| `HAL_TERMINAL_PORT` | `9001` | WebSocket port for embedded terminal |
| `HAL_HOST` | `127.0.0.1` | Server bind address (use 0.0.0.0 for LAN access) |
feat(core): add wake word detection
fix(vision): handle no-camera fallback
chore(deps): update anthropic sdk
- Changelog — Version history and changes