# Tama

A local AI server with automatic backend management, text-to-speech, and a web-based control plane
Overview • Quick Start • Web UI • CLI Reference • Configuration • Architecture
Tama is a local AI server written in Rust that provides an OpenAI-compatible API on a single port. It automatically manages backend lifecycles — starting models on demand, routing requests, and unloading idle models to save resources.
Key features:
- OpenAI-compatible API — Works with any client that supports the OpenAI API format
- Text-to-Speech (TTS) — Built-in Kokoro-FastAPI backend for speech synthesis via `/v1/audio/*` endpoints
- Automatic backend management — Starts, routes, and unloads llama.cpp/ik_llama backends on demand
- Web-based control plane — Browser UI for managing models and TTS backends, viewing logs, running benchmarks, tracking downloads, and editing configuration
- GPU acceleration — Supports CUDA, Vulkan, Metal, and ROCm
- Linux support — Native systemd integration
- Model optimization — Automatically detects VRAM and suggests optimal quantizations and context sizes
- Benchmarks — Run llama-bench and speculative decoding benchmarks from the CLI or web UI
- Downloads Center — Persistent download queue with real-time progress tracking
- Updates Center — Per-quant update management with automatic version checking
- Backup & Restore — Create and restore full configuration backups (config, model cards, database)
- Max loaded models — LRU eviction to cap concurrent model loads
- Multi-version backends — Install and switch between multiple backend versions
## Quick Start

Linux (Debian/Ubuntu):

```sh
sudo dpkg -i tama_*.deb
```

Linux (Fedora/RHEL):

```sh
sudo rpm -i tama-*.rpm
```

Then install and start the service:

```sh
tama service install
tama service start
```

> [!TIP]
> On Linux, Tama creates a systemd user unit.
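Once the service is running, a first session might look like this. This is a sketch: the HuggingFace repo is illustrative (borrowed from the directory-layout example later in this README), and the model name passed to `enable` will depend on how you configure it.

```sh
tama model pull bartowski/OmniCoder-8B  # pick a quantization when prompted (repo name is illustrative)
tama model enable OmniCoder-8B          # allow on-demand loading (model name is illustrative)
tama status                             # confirm servers and running models
```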
## Web UI

Tama includes a web-based control plane for managing models, viewing logs, and editing configuration from your browser.

The web server starts automatically alongside the proxy when using `tama service start`.
For development or manual startup:
```sh
cargo run --package tama-web -- web --port 11435
```

Open http://localhost:11435 to access the dashboard.
> [!NOTE]
> The web UI proxies all `/tama/v1/` requests to the running Tama proxy (default `http://127.0.0.1:11434`).
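One consequence of this proxying is that `/tama/v1/` API calls behave the same against either port. A quick check, assuming both are running on their default ports:

```sh
curl http://localhost:11435/tama/v1/models  # via the web UI (proxied through to the Tama proxy)
curl http://localhost:11434/tama/v1/models  # directly against the Tama proxy
```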
- Dashboard — Resource monitoring tiles (CPU, memory, GPU, VRAM) with sparkline charts, active models list with status and quick-load buttons
- Models — View installed models, pull new ones from HuggingFace, edit model configurations, manage sampling profiles
- Backends — Manage llama.cpp and ik_llama installations, switch between versions, update to latest
- Logs — Real-time log streaming with filtering
- Updates — Check for model/backend updates, track per-quant update status, apply updates in queue
- Downloads — Persistent download queue with progress tracking, history, and toast notifications
- Benchmarks — Run llama-bench or speculative decoding benchmarks, select backends and presets, view results table (tokens, PP/TG speed)
- Config Editor — Edit the full configuration directly from the browser with validation
- Model status tiles — See which models are running, their active backends, quantization, context size, and lifecycle state (idle/loading/loaded/unloading/failed)
- Sparkline charts — Real-time CPU, memory, GPU, and VRAM usage graphs
- Job log panel — Shared component for streaming backend logs with terminal styling
- Install modal — Guided installation flow for models and backends
- Model editor — Full model configuration editing with quantization selector, context length, sampling templates, and pull wizard
## CLI Reference

| Command | Description |
|---|---|
| `tama serve` | Start the OpenAI-compatible API server (port 11434) |
| `tama status` | Show status of all servers and running models |
| `tama service install` | Install as a system service |
| `tama service start` | Start the service |
| `tama service stop` | Stop the service |
| `tama service restart` | Restart the service |
| `tama service remove` | Remove the service |
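Because `tama service install` registers a systemd user unit on Linux, standard systemd tooling works too. The exact unit name isn't documented here, so this sketch filters for it:

```sh
systemctl --user list-units --type=service | grep -i tama  # find the Tama unit
journalctl --user -u <unit-name> -f                        # then follow its logs (substitute the unit name)
```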
### Model Management

| Command | Description |
|---|---|
| `tama model pull <repo>` | Pull a model from HuggingFace with quantization selection |
| `tama model ls` | List installed models |
| `tama model create` | Create a model config from an installed model |
| `tama model enable <name>` | Enable a model for on-demand loading |
| `tama model disable <name>` | Disable a model |
| `tama model rm <model>` | Remove an installed model |
| `tama model scan` | Scan for untracked GGUF files |
| `tama model search <query>` | Search HuggingFace for GGUF models |
| `tama model update [model]` | Check for and download model updates |
| `tama model verify [model]` | Verify GGUF files against HuggingFace hashes |
| `tama model prune` | Remove orphaned GGUF files |
| `tama model migrate` | Migrate model configs from TOML to database |
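A typical discovery-to-ready flow chains these commands together. The query and the `<repo>`/`<name>` placeholders are illustrative:

```sh
tama model search qwen    # search HuggingFace for GGUF builds
tama model pull <repo>    # download with interactive quantization selection
tama model enable <name>  # make the model loadable on demand
tama model verify <name>  # optionally check files against HuggingFace hashes
```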
Tama manages LLM backend installations with automatic version tracking:
```sh
tama backend install llama_cpp                  # Download pre-built llama.cpp binaries
tama backend install ik_llama                   # Build from source
tama backend install llama_cpp --version b8407  # Specific version
tama backend install llama_cpp --build          # Force build from source
tama backend update <name>                      # Update to latest version
tama backend list                               # List installed backends
tama backend remove <name>                      # Remove a backend
tama backend check-updates                      # Check for updates
```

### Text-to-Speech (TTS)

Tama supports Kokoro-FastAPI as a TTS backend, exposing OpenAI-compatible `/v1/audio/*` endpoints:
```sh
tama tts install kokoro_fastapi # Install the Kokoro-FastAPI backend
tama tts list                   # List available TTS backends
tama tts voices                 # List available voice options
```
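Once installed, speech synthesis goes through the proxy's OpenAI-compatible audio endpoint. A minimal sketch: the `model` and `voice` values here are assumptions (list real voices with `tama tts voices`), and the output format depends on the backend's defaults:

```sh
curl http://localhost:11434/tama/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "input": "Hello from Tama", "voice": "af_bella"}' \
  --output speech.mp3
```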
### Server Management

```sh
tama server ls                # List all servers with status
tama server add <name> <cmd>  # Add a new server
tama server edit <name> <cmd> # Edit an existing server
tama server rm <name>         # Remove a server
```

### Sampling Profiles

```sh
tama profile list                # List all available profiles
tama profile set <server> <name> # Set a server's sampling profile
tama profile clear <server>      # Clear a server's sampling profile
```

| Command | Description |
|---|---|
| `tama config show` | Print the current configuration |
| `tama config edit` | Open config file in editor |
| `tama config path` | Show the config file path |
### Backup & Restore

```sh
tama backup                              # Create a backup archive (config + DB + model cards)
tama backup --output tama-backup.tar.gz  # Custom output path
tama backup --dry-run                    # Preview what would be backed up
tama restore tama-backup.tar.gz          # Restore from backup (merges config, models, database)
tama restore --skip-backends             # Skip backend re-installation
tama restore --skip-models               # Skip model re-downloading
```

> [!NOTE]
> Backups include `config.toml`, model card files, and the SQLite database. Model GGUF files and backend binaries are not included — they must be re-downloaded after restore.
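Backup archives are standard tar.gz files (per the default name above), so you can preview one with ordinary tooling before restoring:

```sh
tar -tzf tama-backup.tar.gz  # list archive contents without extracting
```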
### Benchmarks

```sh
tama bench                  # Run a benchmark (llama-bench)
tama bench --backend <name> # Specify a backend
```

Benchmarks can also be run from the web UI's Benchmarks page, which supports:
- llama-bench runner with preset configurations
- Speculative decoding benchmarks (`llama-cli` spec bench mode)
- Backend selector and results table showing tokens, prompt processing (PP), and token generation (TG) speed
```sh
tama self-update # Update Tama to the latest version
```

## Configuration

Tama auto-generates a config on first run:

- Linux: `~/.config/tama/config.toml`
```toml
[backends.llama_cpp]
path = "/path/to/llama-server"
health_check_url = "http://localhost:8080/health"

[supervisor]
restart_policy = "always"
max_restarts = 10
restart_delay_ms = 3000
health_check_interval_ms = 5000

[proxy]
host = "0.0.0.0"
port = 11434
idle_timeout_secs = 300
startup_timeout_secs = 120

[max_loaded_models]
enabled = false
max = 5 # Maximum number of models loaded simultaneously (LRU eviction)
```

> [!NOTE]
> On first run after upgrading from kronk, Tama automatically migrates `~/.config/kronk` to `~/.config/tama`. Model configs are now stored in the SQLite database (`tama.db`) rather than `config.toml` — a migration runs automatically on upgrade.
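A typical edit-and-apply cycle might look like this (assuming, as a hedge, that config changes are picked up on service restart):

```sh
tama config edit      # open config.toml in your editor
tama service restart  # restart so the proxy picks up the changes
```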
```
~/.config/tama/
├── config.toml     Main configuration (backends, proxy, supervisor)
├── tama.db         SQLite database (models, backends, pulls, benchmarks)
├── configs/        Model cards with quant info and sampling presets
│   └── bartowski--OmniCoder-8B.toml
├── models/         GGUF model files
│   └── bartowski/OmniCoder-8B/*.gguf
├── backends/       llama.cpp and ik_llama binaries (versioned)
├── tts/            TTS backend installations (Kokoro-FastAPI)
└── logs/           Service logs
```
The installer detects your GPU and offers these acceleration options:
- CUDA (NVIDIA) — Fast inference on NVIDIA GPUs
- Vulkan (AMD/Intel/NVIDIA) — Cross-platform GPU acceleration
- Metal (Apple Silicon) — Native macOS GPU acceleration
- ROCm (AMD) — AMD GPU support on Linux
- CPU — Fallback when no GPU is available
## Architecture

```
tama/
├── crates/
│   ├── tama-core/  # Config, process supervisor, proxy, platform abstraction
│   ├── tama-cli/   # CLI binary with clap
│   ├── tama-mock/  # Mock LLM backend for testing
│   └── tama-web/   # Leptos web control plane (WASM + SSR)
├── config/         # Configuration templates
├── docs/           # Documentation
└── modelcards/     # Community model cards
```
- tama-core — Config management, process supervision, backend registry, proxy server with streaming, database (SQLite), backup/restore, benchmark runner, download queue
- tama-cli — Command-line interface with clap, interactive prompts with inquire
- tama-web — Leptos WASM frontend with real-time SSE updates, SSR server for hosting
- tama-mock — Mock backend for testing and development
### How It Works

- `tama serve` (or `tama service start`) starts an OpenAI-compatible API server on port 11434
- When a request arrives with `"model": "my-model"`, Tama looks up the config from the database
- If the backend isn't running, Tama auto-assigns a free port and starts it
- The request is forwarded to the backend and the response is streamed back
- After `idle_timeout_secs` of inactivity, the backend is shut down
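In practice this lifecycle is invisible to clients: a first request to an idle model simply takes longer while the backend starts (up to `startup_timeout_secs` on a cold start). A sketch, assuming a model named `my-model` is installed and enabled:

```sh
# Cold start: Tama launches the backend, waits for it to become healthy,
# then forwards the request and streams the response back.
curl http://localhost:11434/tama/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello!"}]}'
```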
The proxy exposes OpenAI-compatible API endpoints:
- `/tama/v1/chat/completions` — Chat completions (streaming & non-streaming)
- `/tama/v1/completions` — Legacy completions
- `/tama/v1/models` — Model listing
- `/tama/v1/audio/*` — TTS endpoints (`/v1/audio/speech`, `/v1/audio/models`)
- `/tama/v1/embeddings` — Embeddings
All other non-tama paths are forwarded to the active backend via wildcard forwarding.
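As a sketch of that wildcard behavior: llama.cpp's native `/health` endpoint (the same one used in the health-check config above) should be reachable through the proxy while a backend is loaded:

```sh
curl http://localhost:11434/health  # forwarded verbatim to the active backend
```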
### Building from Source

```sh
git clone https://github.com/danielcherubini/tama.git
cd tama
cargo build --release
```

The binary is at `target/release/tama`.
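To run the freshly built binary without installing it as a service:

```sh
./target/release/tama serve  # start the API server directly (port 11434 by default)
```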
For development with the web UI:
```sh
# Install trunk for frontend builds
cargo install trunk

# Build and run with web features
cargo run --package tama-web -- web
```

## Roadmap

- TUI Dashboard — Terminal UI with ratatui for resource monitoring
- System tray — Quick service toggle from the system tray
- Tauri GUI — Lightweight desktop frontend for non-CLI users
