
Tama

A local AI server with automatic backend management, text-to-speech, and a web-based control plane


Overview · Quick Start · Web UI · CLI Reference · Configuration · Architecture

[Screenshot: Tama Web UI]

Overview

Tama is a local AI server written in Rust that provides an OpenAI-compatible API on a single port. It automatically manages backend lifecycles — starting models on demand, routing requests, and unloading idle models to save resources.

Key features:

  • OpenAI-compatible API — Works with any client that supports the OpenAI API format
  • Text-to-Speech (TTS) — Built-in Kokoro-FastAPI backend for speech synthesis via /v1/audio/* endpoints
  • Automatic backend management — Starts, routes, and unloads llama.cpp/ik_llama backends on demand
  • Web-based control plane — Browser UI for managing models, TTS backends, viewing logs, benchmarks, downloads, and editing configuration
  • GPU acceleration — Supports CUDA, Vulkan, Metal, and ROCm
  • Linux support — Native systemd integration
  • Model optimization — Automatically detects VRAM and suggests optimal quantizations and context sizes
  • Benchmarks — Run llama-bench and speculative decoding benchmarks from the CLI or web UI
  • Downloads Center — Persistent download queue with real-time progress tracking
  • Updates Center — Per-quant update management with automatic version checking
  • Backup & Restore — Create and restore full configuration backups (config, model cards, database)
  • Max loaded models — LRU eviction to cap concurrent model loads
  • Multi-version backends — Install and switch between multiple backend versions

Quick Start

Installation

Linux (Debian/Ubuntu):

sudo dpkg -i tama_*.deb

Linux (Fedora/RHEL):

sudo rpm -i tama-*.rpm

Run Tama

tama service install
tama service start

Tip

On Linux, Tama creates a systemd user unit.
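To verify the service under systemd you can query the user unit directly. A minimal sketch — the unit name below is an assumption; check systemctl --user list-units if yours differs:

systemctl --user status tama       # unit name assumed
journalctl --user -u tama -f       # follow the service logs (same assumption)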


Web UI

Tama includes a web-based control plane for managing models, viewing logs, and editing configuration from your browser.

Running the web UI

The web server starts automatically alongside the proxy when using tama service start.

For development or manual startup:

cargo run --package tama-web -- web --port 11435

Open http://localhost:11435 to access the dashboard.

Note

The web UI proxies all /tama/v1/ requests to the running Tama proxy (default http://127.0.0.1:11434).
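For example, a request to the web server's /tama/v1/ path is forwarded to the proxy (ports assume the defaults above):

curl http://localhost:11435/tama/v1/models    # served by tama-web, proxied to 127.0.0.1:11434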

Pages

  • Dashboard — Resource monitoring tiles (CPU, memory, GPU, VRAM) with sparkline charts, active models list with status and quick-load buttons
  • Models — View installed models, pull new ones from HuggingFace, edit model configurations, manage sampling profiles
  • Backends — Manage llama.cpp and ik_llama installations, switch between versions, update to latest
  • Logs — Real-time log streaming with filtering
  • Updates — Check for model/backend updates, track per-quant update status, apply queued updates
  • Downloads — Persistent download queue with progress tracking, history, and toast notifications
  • Benchmarks — Run llama-bench or speculative decoding benchmarks, select backends and presets, view results table (tokens, PP/TG speed)
  • Config Editor — Edit the full configuration directly from the browser with validation

Components

  • Model status tiles — See which models are running, their active backends, quantization, context size, and lifecycle state (idle/loading/loaded/unloading/failed)
  • Sparkline charts — Real-time CPU, memory, GPU, and VRAM usage graphs
  • Job log panel — Shared component for streaming backend logs with terminal styling
  • Install modal — Guided installation flow for models and backends
  • Model editor — Full model configuration editing with quantization selector, context length, sampling templates, and pull wizard

CLI Reference

Server management

Command                   Description
tama serve                Start the OpenAI-compatible API server (port 11434)
tama status               Show status of all servers and running models
tama service install      Install as a system service
tama service start        Start the service
tama service stop         Stop the service
tama service restart      Restart the service
tama service remove       Remove the service

Model management

Command                     Description
tama model pull <repo>      Pull a model from HuggingFace with quantization selection
tama model ls               List installed models
tama model create           Create a model config from an installed model
tama model enable <name>    Enable a model for on-demand loading
tama model disable <name>   Disable a model
tama model rm <model>       Remove an installed model
tama model scan             Scan for untracked GGUF files
tama model search <query>   Search HuggingFace for GGUF models
tama model update [model]   Check for and download model updates
tama model verify [model]   Verify GGUF files against HuggingFace hashes
tama model prune            Remove orphaned GGUF files
tama model migrate          Migrate model configs from TOML to database
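A typical workflow, following the commands in the table above — the repository and model names are placeholders, not real suggestions:

tama model pull bartowski/OmniCoder-8B    # pick a quantization when prompted
tama model enable OmniCoder-8B            # name assumed to match the pulled model
tama model ls                             # confirm it is installed and enabled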

Backend management

Tama manages LLM backend installations with automatic version tracking:

tama backend install llama_cpp     # Download pre-built llama.cpp binaries
tama backend install ik_llama      # Build from source
tama backend install llama_cpp --version b8407  # Specific version
tama backend install llama_cpp --build    # Force build from source
tama backend update <name>         # Update to latest version
tama backend list                  # List installed backends
tama backend remove <name>        # Remove a backend
tama backend check-updates         # Check for updates

TTS (Text-to-Speech) management

Tama supports Kokoro-FastAPI as a TTS backend, exposing OpenAI-compatible /v1/audio/* endpoints:

tama tts install kokoro_fastapi    # Install the Kokoro-FastAPI backend
tama tts list                      # List available TTS backends
tama tts voices                    # List available voice options
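Once a TTS backend is installed, speech synthesis goes through the OpenAI-style audio API. A minimal sketch — the route prefix, model, and voice names are assumptions (see the proxy endpoints section):

curl http://localhost:11434/tama/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "input": "Hello from Tama", "voice": "af_bella"}' \
  --output speech.mp3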

Server management (multi-server)

tama server ls                    # List all servers with status
tama server add <name> <cmd>      # Add a new server
tama server edit <name> <cmd>     # Edit an existing server
tama server rm <name>             # Remove a server

Sampling profiles

tama profile list                  # List all available profiles
tama profile set <server> <name>  # Set a server's sampling profile
tama profile clear <server>        # Clear a server's sampling profile

Configuration

Command            Description
tama config show   Print the current configuration
tama config edit   Open config file in editor
tama config path   Show the config file path

Backup & Restore

tama backup                        # Create a backup archive (config + DB + model cards)
tama backup --output tama-backup.tar.gz  # Custom output path
tama backup --dry-run              # Preview what would be backed up
tama restore tama-backup.tar.gz    # Restore from backup (merges config, models, database)
tama restore --skip-backends       # Skip backend re-installation
tama restore --skip-models         # Skip model re-downloading

Note

Backups include config.toml, model card files, and the SQLite database. Model GGUF files and backend binaries are not included — they must be re-downloaded after restore.

Benchmarking

tama bench                         # Run a benchmark (llama-bench)
tama bench --backend <name>        # Specify a backend

Benchmarks can also be run from the web UI's Benchmarks page, which supports:

  • llama-bench runner with preset configurations
  • Speculative decoding benchmarks (llama-cli spec bench mode)
  • Backend selector and results table showing tokens, prompt processing (PP), and token generation (TG) speed

Self-update

tama self-update                   # Update Tama to the latest version

Configuration

Tama auto-generates a config on first run:

  • Linux: ~/.config/tama/config.toml

Example config.toml:
[backends.llama_cpp]
path = "/path/to/llama-server"
health_check_url = "http://localhost:8080/health"

[supervisor]
restart_policy = "always"
max_restarts = 10
restart_delay_ms = 3000
health_check_interval_ms = 5000

[proxy]
host = "0.0.0.0"
port = 11434
idle_timeout_secs = 300
startup_timeout_secs = 120

[max_loaded_models]
enabled = false
max = 5          # Maximum number of models loaded simultaneously (LRU eviction)

Note

On first run after upgrading from kronk, Tama automatically migrates ~/.config/kronk to ~/.config/tama. Model configs are now stored in the SQLite database (tama.db) rather than config.toml — a migration runs automatically on upgrade.

Directory layout

~/.config/tama/
├── config.toml              Main configuration (backends, proxy, supervisor)
├── tama.db                   SQLite database (models, backends, pulls, benchmarks)
├── configs/                 Model cards with quant info and sampling presets
│   └── bartowski--OmniCoder-8B.toml
├── models/                  GGUF model files
│   └── bartowski/OmniCoder-8B/*.gguf
├── backends/                llama.cpp and ik_llama binaries (versioned)
├── tts/                     TTS backend installations (Kokoro-FastAPI)
└── logs/                    Service logs

GPU acceleration

The installer detects your GPU and offers these acceleration options:

  • CUDA (NVIDIA) — Fast inference on NVIDIA GPUs
  • Vulkan (AMD/Intel/NVIDIA) — Cross-platform GPU acceleration
  • Metal (Apple Silicon) — Native macOS GPU acceleration
  • ROCm (AMD) — AMD GPU support on Linux
  • CPU — Fallback when no GPU is available

Architecture

tama/
├── crates/
│   ├── tama-core/       # Config, process supervisor, proxy, platform abstraction
│   ├── tama-cli/        # CLI binary with clap
│   ├── tama-mock/       # Mock LLM backend for testing
│   └── tama-web/        # Leptos web control plane (WASM + SSR)
├── config/              # Configuration templates
├── docs/                # Documentation
└── modelcards/         # Community model cards

Core components

  • tama-core — Config management, process supervision, backend registry, proxy server with streaming, database (SQLite), backup/restore, benchmark runner, download queue
  • tama-cli — Command-line interface with clap, interactive prompts with inquire
  • tama-web — Leptos WASM frontend with real-time SSE updates, SSR server for hosting
  • tama-mock — Mock backend for testing and development

How it works

  1. tama serve (or tama service start) starts an OpenAI-compatible API server on port 11434
  2. When a request arrives with "model": "my-model", Tama looks up the config from the database
  3. If the backend isn't running, Tama auto-assigns a free port and starts it
  4. The request is forwarded to the backend and the response is streamed back
  5. After idle_timeout_secs of inactivity, the backend is shut down
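For example, a minimal chat request against the default port ("my-model" is a placeholder for a configured model name):

curl http://localhost:11434/tama/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]}'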

Proxy endpoints

The proxy exposes OpenAI-compatible API endpoints:

  • /tama/v1/chat/completions — Chat completions (streaming & non-streaming)
  • /tama/v1/completions — Legacy completions
  • /tama/v1/models — Model listing
  • /tama/v1/audio/* — TTS endpoints (/v1/audio/speech, /v1/audio/models)
  • /tama/v1/embeddings — Embeddings

All other non-tama paths are forwarded to the active backend via wildcard forwarding.


Building from source

git clone https://github.com/danielcherubini/tama.git
cd tama
cargo build --release

The binary is at target/release/tama.

For development with the web UI:

# Install trunk for frontend builds
cargo install trunk

# Build and run with web features
cargo run --package tama-web -- web

Roadmap

  • TUI Dashboard — Terminal UI with ratatui for resource monitoring
  • System tray — Quick service toggle from the system tray
  • Tauri GUI — Lightweight desktop frontend for non-CLI users
