LLMHome is a fully local home-automation system that connects a quantized LLM running via llama.cpp to a Home Assistant installation.
It supports natural-language device control, task/memory management, and contextual reasoning using home data.
All inference runs locally using a 3B-parameter model.
- Fully local LLM using `llama.cpp`
- Integration with Home Assistant via REST API
- Natural-language device control
- Persistent memory (`memory.json`)
- JSON-based action schema
- Terminal interface for user commands
- Expandable tool set for future automation
```
LLMHome/
├── assistant.py       # Backend LLM + Home Assistant logic (FastAPI server)
├── interface.py       # Terminal interface for sending natural-language queries
├── home.py            # Home Assistant API utilities
├── home.env           # Environment variables (HA URL + token)
├── requirements.txt   # Python dependencies
├── memory.json        # Persistent event/task memory
├── models/            # GGUF models for llama.cpp
├── venv/              # Python virtual environment (created by user after cloning)
└── README.md
```
- Ubuntu / Linux
- Python 3.10+
- Minimum 8 GB RAM
- Working Home Assistant instance with API access
```
python3 -m venv venv
source venv/bin/activate
```

Inside the activated venv:

```
pip install -r requirements.txt
```

Required packages include:

```
fastapi
uvicorn
llama-cpp-python
requests
python-dotenv
```
This project uses llama.cpp through its Python bindings, provided by the `llama-cpp-python` package. The package is already in `requirements.txt`, but you can install it manually:

```
pip install llama-cpp-python
```

To build with CUDA support:

```
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```

Verify the installation:

```
python3 -c "import llama_cpp; print('llama_cpp working')"
```

If no errors appear, the binding is installed correctly.
`assistant.py` loads the GGUF model like this:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/Qwen2.5-3B-Instruct-Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=6,
)
```

You can adjust:

- `n_ctx` → context length
- `n_threads` → number of CPU threads
- `n_gpu_layers` → number of layers offloaded to the GPU (if configured)
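For example, a CPU+GPU split might look like the following sketch (requires a CUDA build of `llama-cpp-python`; the layer count is illustrative):

```python
from llama_cpp import Llama

# Sketch: offload part of the model to the GPU, keeping the rest on the CPU.
# n_gpu_layers=-1 would offload every layer if VRAM allows.
llm = Llama(
    model_path="models/Qwen2.5-3B-Instruct-Q4_K_M.gguf",
    n_ctx=4096,       # context window in tokens
    n_threads=6,      # CPU threads for non-offloaded layers
    n_gpu_layers=20,  # illustrative value; tune to your card's VRAM
)
```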
To connect the assistant to Home Assistant, create an access token:

- Open Home Assistant
- Navigate to your Profile
- Create a Long-Lived Access Token
Then add the URL and token to `home.env`:

```
HA_URL=http://homeassistant.local:8123
HA_TOKEN=YOUR_LONG_LIVED_ACCESS_TOKEN
```
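As a sketch of how `home.py` can use these values (the `/api/states` endpoint is part of Home Assistant's standard REST API; the helper name is illustrative):

```python
import os

import requests
from dotenv import load_dotenv

# Load HA_URL and HA_TOKEN from home.env.
load_dotenv("home.env")

HA_URL = os.environ["HA_URL"]
HEADERS = {
    "Authorization": f"Bearer {os.environ['HA_TOKEN']}",
    "Content-Type": "application/json",
}

def get_states() -> list:
    """Return the state of every entity Home Assistant knows about."""
    resp = requests.get(f"{HA_URL}/api/states", headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.json()
```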
Recommended model:
Qwen2.5-3B-Instruct-GGUF
Download:
```
hf download Qwen/Qwen2.5-3B-Instruct-GGUF --include "*.gguf" --local-dir models/
```

Ensure the model file is located inside:

```
models/
```
Use any quantization such as Q4_K_M.
Start the backend:
```
python3 assistant.py
```

Expected output:

```
Starting LLM Home Assistant...
Uvicorn running on http://0.0.0.0:8000
```
Then run the terminal interface:

```
python3 interface.py
```

Sample interaction:

```
Room: kitchen
Command: turn on the lights
```
The terminal sends the user's natural-language query to the backend.
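For illustration, the round trip could look like this sketch (the `/query` endpoint name and payload fields are assumptions, not the project's actual API):

```python
import requests

def ask(room: str, command: str) -> str:
    """Send a natural-language command to the FastAPI backend."""
    # Hypothetical endpoint and payload shape, for illustration only.
    resp = requests.post(
        "http://localhost:8000/query",
        json={"room": room, "command": command},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask("kitchen", "turn on the lights"))
```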
Tasks, reminders, and events are stored in `memory.json`.
The backend generates a structured prompt (see the sketch after this list) containing:
- Device states
- Time and date
- Recent events
- User tasks
- Available Home Assistant actions
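A rough sketch of how such a prompt might be assembled (the field names and template are illustrative, not the project's exact format):

```python
import json
from datetime import datetime

def build_prompt(states: list, events: list, tasks: list) -> str:
    """Combine home context into a single prompt for the model."""
    device_lines = "\n".join(f"- {s['entity_id']}: {s['state']}" for s in states)
    return (
        f"Current time: {datetime.now().isoformat()}\n"
        f"Device states:\n{device_lines}\n"
        f"Recent events: {json.dumps(events)}\n"
        f"Open tasks: {json.dumps(tasks)}\n"
        "Reply with JSON containing a 'response' string and an optional "
        "'action' with 'domain', 'service', and 'entity_id'."
    )
```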
The model returns JSON:
```json
{
  "response": "Turning on the kitchen light.",
  "action": {
    "domain": "light",
    "service": "turn_on",
    "entity_id": "light.kitchen"
  }
}
```

The Home Assistant REST API is then called with the action.
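Dispatching the action maps directly onto Home Assistant's `/api/services/<domain>/<service>` endpoint; a sketch, reusing the `HA_URL` and `HEADERS` setup (and `requests` import) shown earlier:

```python
def run_action(action: dict) -> None:
    """Call the Home Assistant service described by the model's JSON."""
    url = f"{HA_URL}/api/services/{action['domain']}/{action['service']}"
    resp = requests.post(
        url,
        headers=HEADERS,
        json={"entity_id": action["entity_id"]},
        timeout=10,
    )
    resp.raise_for_status()
```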
All executed actions are saved in `memory.json`.
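Persistence can be as simple as appending a timestamped record to the file (a sketch; the record layout is illustrative):

```python
import json
from datetime import datetime
from pathlib import Path

MEMORY_FILE = Path("memory.json")

def remember(entry: dict) -> None:
    """Append a timestamped record to memory.json."""
    records = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    records.append({"time": datetime.now().isoformat(), **entry})
    MEMORY_FILE.write_text(json.dumps(records, indent=2))
```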
Example commands:

Device Control
- Turn on the living room lights
- Switch off the bedroom fan
Tasks
- Remember to pay the rent
- Show my tasks
Status
- What devices are currently on?
- List all kitchen sensors
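For instance, a task command such as "Remember to pay the rent" would yield a reply with no device action; an illustrative response (the exact schema for action-less replies is an assumption):

```json
{
  "response": "Noted. I'll remember to pay the rent.",
  "action": null
}
```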
Planned future features:

- Voice-to-text interface
- Camera/sensor summarization
- Energy efficiency analytics
- Automatic room context detection
- Custom fine-tuned home LLM