Build, deploy, and orchestrate AI agents, multi-agent teams, and automated workflows — all from one unified control plane. Supports OpenAI, Anthropic, Google Gemini, Ollama, OpenRouter, and any OpenAI-compatible endpoint.
If you find this project useful, please consider giving it a star! It helps others discover the project and motivates continued development.
- Why Obsidian AI?
- Features
- Multi-Provider LLM Support
- Agent Builder
- Multi-Agent Teams
- Workflow Automation
- Real-Time Chat Playground
- Artifacts
- Tool Integration
- Dynamic Tool Creation
- MCP Protocol Support
- Human-in-the-Loop (HITL)
- Knowledge Bases & RAG
- Long-Term Agent Memory
- Agent Versioning & Rollback
- Eval Harness & Regression Testing
- Prompt Auto-Optimizer
- Prompt Vault
- WhatsApp Channel Integration
- Automatic Context Management
- Session History & Execution Traces
- Scheduled Workflows
- Secrets Vault
- Security & Authentication
- Admin Panel & RBAC
- Agent Import / Export
- Docker Sandbox
- Dual Database Support
- Architecture
- Quick Start
- Usage Guide
- API Reference
- Tech Stack
- Project Structure
- Updates
- Contributing
- License
Most AI agent frameworks are code-only libraries that require deep programming knowledge. Obsidian AI provides a complete visual interface for building, managing, and running AI agents — no SDK glue code required.
- No vendor lock-in — Swap between OpenAI, Anthropic, Google, Ollama, or any OpenAI-compatible provider without changing a single line of agent configuration.
- Visual orchestration — Create multi-agent teams, sequential pipelines, and parallel DAG workflows from a drag-and-drop canvas. No YAML, no code, no boilerplate.
- Production-ready security — JWT auth, TOTP 2FA, AES end-to-end encryption, Fernet secrets vault, role-based access control, and rate limiting out of the box.
- Self-hosted & open-source — Run entirely on your own infrastructure. Your data never leaves your servers.
- MCP-native — First-class Model Context Protocol support for connecting external tools and services to your agents.
Connect to any major LLM provider from a single interface. Add providers with encrypted API key storage, test connections, and switch models per-agent.
| Provider | Models | Type |
|---|---|---|
| OpenAI | GPT-4.1, GPT-4.1 mini, GPT-4.1 nano, o3, o4-mini, GPT-5, GPT-5 mini | Cloud |
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5 | Cloud |
| Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 3 Flash (Preview) | Cloud | |
| Ollama | Llama, Mistral, Qwen, Phi, DeepSeek — any local model | Local |
| OpenRouter | Access 100+ models through one API key | Cloud |
| Custom | Any OpenAI-compatible endpoint (LM Studio, vLLM, etc.) | Self-hosted |
Create AI agents with custom system prompts, model selection, tool attachments, and MCP server connections. Each agent is fully configurable and can be used standalone or as part of a team or workflow.
- Custom system prompts — Define agent behavior, personality, and instructions
- Per-agent model selection — Pick the right model for each agent's task
- Tool attachment — Equip agents with HTTP, Python, or built-in tools
- MCP server binding — Connect agents to external services via MCP
- Knowledge base attachment — Assign one or more knowledge bases for persistent RAG context
- Long-term memory — Each agent builds a persistent memory of each user across sessions
- Dynamic tool creation — Enable agents to propose and create new tools mid-conversation
- Human-in-the-loop overrides — Define per-agent tool approval requirements independently of tool-level flags
- Import / Export — Share agent configurations as portable JSON files across instances
Combine multiple agents into coordinated teams for complex tasks. Choose from three collaboration modes that determine how agents interact.
| Mode | Description | Best For |
|---|---|---|
| Coordinate | A lead agent delegates tasks to team members | Task decomposition, project management |
| Route | Messages are routed to the most appropriate agent | Customer support, multi-domain Q&A |
| Collaborate | Agents build on each other's responses sequentially | Research, content creation, review chains |
Define multi-step workflows where each step is handled by a specific agent. Build sequential pipelines or design complex parallel graphs with the visual DAG editor. Monitor execution in real-time with per-node status tracking.
- Visual DAG editor — Drag-and-drop canvas (
@xyflow/react) for building non-linear agent pipelines; connect nodes with edges, configure each node from a side-panel - Parallel execution — Nodes without dependency relationships run concurrently via an async topological executor; fan-out and fan-join patterns supported natively
- Sequential pipeline — Chain agents in order; each step receives the previous step's output
- Per-step agent assignment — Use different agents (and models) for each step
- Live run visualization — Node colors pulse in real-time as the graph executes (gray → blue pulsing → green/red) driven by SSE events keyed by node ID
- Cycle detection — DFS topological sort validates the graph at save time; cyclic graphs are rejected before they can be executed
- Layout persistence — Node positions are stored with the workflow definition and restored when the editor re-opens
- Run history — Track past executions with status (pending, running, completed, failed)
- Reusable definitions — Save workflow templates and run them on demand or on a schedule
- Cron scheduling — Schedule workflows to run automatically using standard cron expressions
A full-featured chat interface for interacting with agents, teams, and workflows. Powered by Server-Sent Events (SSE) for real-time streaming.
- Live streaming responses — Token-by-token output via SSE
- Tool execution visualization — See tool calls, parameters, and results inline
- Chain-of-thought reasoning — View agent thinking steps for supported models
- File attachments — Send images, PDFs, Word docs, markdown, and text files directly in chat
- Prompt suggestions — Quick-start prompts to get conversations going
- Markdown rendering — GitHub-flavored markdown with syntax highlighting (Shiki), math (KaTeX), and Mermaid diagrams
- Agentic tool loops — Agents can execute up to 10 rounds of tool calls per response
When an agent produces substantial standalone content — an HTML page, a code file, an SVG diagram, a JSON payload — it wraps it in a named artifact instead of a code block. Artifacts open in a persistent side panel alongside the chat.
- Rich preview — HTML, JSX, TSX, SVG, and CSS artifacts render live in a sandboxed iframe
- Syntax highlighting — Code view uses Shiki (github-dark theme) for all supported languages
- In-panel editing — Switch to Edit mode to modify artifact content directly; changes reflect instantly in the preview
- Multiple artifacts — Each session can accumulate multiple artifacts shown as tabs in the panel
- Copy & download — One-click copy to clipboard or download with the correct file extension
- Fullscreen mode — Expand the artifact panel to fill the entire screen
- Inline reference chips — Each artifact appears as a clickable chip in the chat message; the raw XML tag is never shown to the user
- Live streaming — The panel opens as soon as the agent starts writing an artifact; an animated tab shows the title while it's being written
- In-place editing — When the agent modifies an existing artifact, the tab updates in place rather than opening a new one
Artifact tag format:
<artifact id="unique_snake_case_id" title="Human-readable title" type="html|jsx|tsx|css|javascript|typescript|python|markdown|json|svg|text">
...content...
</artifact>
Equip agents with tools using pre-built templates or custom definitions. Tools are defined with JSON Schema parameters and can call external APIs, run Python code, or use built-in functions.
| Template | Description |
|---|---|
| Weather Lookup | Get current weather for any location |
| Calculator | Evaluate mathematical expressions |
| Web Search | Search the web for information |
| Date & Time | Get the current date and time |
| API Request | Call any external REST API endpoint |
| Custom Python | Write your own Python handler function |
| Blank Tool | Define a tool from scratch with full JSON Schema |
Agents can propose and create new tools mid-conversation — no pre-configuration needed. Enable the Allow Tool Creation toggle on any agent, then ask it to build a capability it doesn't have. The agent designs the tool, and a review card appears inline in the chat.
- Opt-in per agent — Only agents with the toggle enabled can propose tools
- Two handler types — Pure-Python (stdlib only) or HTTP REST endpoint
- Inline review card — Shows tool name, description, handler type, collapsible parameter schema and code
- Immediate availability — Approved tools are usable by the agent in the same session without a page reload
- Sidebar auto-refresh — The tools list updates silently on approval with no disruption to the active chat
- Safety timeout — Proposals are auto-rejected after 10 minutes if left unreviewed
Connect external services via the Model Context Protocol. MCP servers expose tools that agents can discover and use during conversations.
- Stdio transport — Run local MCP servers as child processes (Docker, npx, Python, etc.)
- SSE transport — Connect to remote MCP servers over HTTP
- Connection testing — Verify server connectivity and discover available tools before saving
- Environment variables — Pass API keys and configuration to MCP servers securely
- Per-agent binding — Attach specific MCP servers to specific agents
Pause agent execution before sensitive tool calls and require explicit human approval before proceeding. The agent's streaming generator suspends at the flagged tool, surfaces an approval card in the chat, and only resumes based on the user's decision.
- Tool-level flag — Enable "Requires human approval" on any tool via the tool creation dialog
- Agent-level overrides — Independently mark tool names in the agent's "Require Approval For" list — triggers HITL even when the tool-level flag is off
- MCP tool support — MCP tools can also be flagged via the agent override list
- Inline approval card — An amber shield card appears in the streaming message with the tool name, formatted arguments, and Approve / Deny buttons
- Reconnect safe — Pending approvals are persisted to the database; reloading the page re-fetches and re-renders the card
- Auto-deny on timeout — Approvals left unanswered for 10 minutes are automatically denied
- Server restart recovery — All stale pending approvals are auto-denied on startup
- Zero-polling — Uses
asyncio.Eventto suspend the generator; the approve/reject HTTP request sets the event, resuming execution with zero busy-waiting
Persistent, reusable knowledge bases that can be attached to agents for retrieval-augmented generation. Unlike session-level file attachments, knowledge bases are indexed once and searched on every message.
Knowledge Bases:
- Two document types — Paste text directly, or upload files (PDF, DOCX, TXT, Markdown)
- Automatic indexing — Content is chunked and embedded into a FAISS vector store on upload
- Per-agent attachment — Assign one or more knowledge bases to an agent from the agent dialog
- Shared KBs — Admins can create shared knowledge bases visible to all users
- Chat indicators — A pill badge in the chat bubble shows which KB was used for each response
Session-Level File Attachments:
- Supported formats — PNG, JPG, GIF, WebP, PDF, Word (.docx), Markdown, plain text
- Automatic chunking — Documents are split and indexed for vector search
- Per-session indexes — Each conversation gets its own isolated vector store
- Cross-platform — FAISS on Windows, Leann HNSW on Linux/macOS
Agents remember what matters across conversations. At the start of each new session, a background LLM reflection call distills the previous session into a compact set of durable facts — preferences, project context, decisions, and corrections — which are automatically injected into every future system prompt.
- Model-driven extraction — The agent's own LLM decides what to remember; no keyword rules or manual tagging required
- Four memory categories —
preference(how the user likes things),context(project/background info),decision(agreed-upon choices),correction(feedback on agent behaviour) - Zero-latency trigger — Reflection runs as a background task on the first message of a new session; the user's response is never delayed
- Bounded storage — Maximum 50 memories per agent/user pair; oldest low-confidence facts are evicted automatically when the cap is reached
- Key-based deduplication — If a new fact contradicts an existing memory, it replaces the old one rather than accumulating stale data
- Transparent & editable — Open any agent in edit mode to see all stored memories with category colour-coding; delete individual facts or clear everything with one click
- Persistent across restarts — Memories are stored in the database (
agent_memoriestable / MongoDB collection)
Every time an agent's configuration is saved, the previous version is automatically snapshotted. Browse the full history, inspect diffs, and restore any past configuration — all without ever losing your work.
- Auto-snapshot on save — A full config snapshot is taken before every update with zero extra clicks
- Version history panel — Expand the "Version History" section in any agent's edit dialog to see a list of all past versions with timestamps and change summaries
- Side-by-side diff view — Click any version to open a two-column diff comparing it to the previous snapshot
- One-click rollback — Restore a previous version; the rollback itself is also versioned so you can undo it too
- Safe iteration — Underpins the Prompt Auto-Optimizer and eval-driven changes; every prompt change is reversible
- Automatic pruning — Versions older than 72 hours are pruned daily (the most recent version is always kept)
Define test suites with expected inputs and outputs, run them against any agent configuration, and get scored pass/fail results. Catch quality regressions before they reach users.
- Test suite builder — Create named suites with as many test cases as needed; each case has an input, expected output, and a grading method
- Three grading methods —
exact_match(string equality),contains(substring check),llm_judge(a secondary LLM call scores the response and explains its reasoning) - Run against any config — Trigger a run from the Evals page; results are stored and browsable per suite
- Live results view — Expand each test case to see the actual output, expected output, pass/fail badge, and LLM judge reasoning where applicable
- Agent-level run history — See all eval runs across suites for a given agent
- Optimizer integration — The Prompt Auto-Optimizer uses eval suites to validate proposed prompt changes before surfacing them for review
New API endpoints:
| Endpoint | Method | Description |
|---|---|---|
/evals/suites |
GET / POST | List or create eval suites |
/evals/suites/{id} |
GET / PUT / DELETE | Get, update, or delete a suite |
/evals/suites/{id}/run |
POST | Trigger an eval run (background) |
/evals/runs/{id} |
GET / DELETE | Get or delete an eval run |
/evals/suites/{id}/runs |
GET | List all runs for a suite |
/evals/agents/{id}/runs |
GET | List all eval runs for an agent |
Automatically analyzes an agent's recent conversation history, identifies recurring failure patterns, and proposes an improved system prompt — with optional validation against an eval suite before you see it.
- Trace-driven analysis — The optimizer collects recent sessions and passes them to an LLM that identifies where the agent underperformed, misunderstood users, or violated its instructions
- Failure pattern extraction — Patterns are categorized by severity (low / medium / high) and displayed as labeled chips so you can see exactly what was wrong
- Prompt proposal — A second LLM call rewrites the system prompt to address the identified patterns while preserving the agent's existing capabilities
- Eval validation — Optionally select an eval suite; the optimizer runs the current and proposed prompts through the suite and shows a before/after score (e.g. baseline 60% → proposed 85%)
- Side-by-side diff view — The current and proposed prompts are shown side-by-side in a fixed-width monospace panel for easy comparison
- Accept or reject — Accepting applies the proposed prompt to the agent (and creates a version snapshot first for safety); rejecting discards it with an optional reason
- Past runs list — All previous optimization runs are listed below the active run panel for reference
- Weekly auto-sweep — APScheduler triggers the optimizer automatically every Monday at 02:00 for all agents that have enough recent traces (configurable minimum)
- Requires ≥ 5 sessions — The optimizer needs enough conversation data to identify meaningful patterns; running it on a fresh agent will report "not enough traces"
New API endpoints:
| Endpoint | Method | Description |
|---|---|---|
/optimizer/trigger |
POST | Start a new optimization run |
/optimizer/agents/{id} |
GET | List optimization runs for an agent |
/optimizer/runs/{id} |
GET | Poll run status |
/optimizer/runs/{id}/accept |
POST | Accept the proposed prompt |
/optimizer/runs/{id}/reject |
POST | Reject the proposed prompt |
/optimizer/runs/{id} |
DELETE | Delete a run record |
A library for storing and reusing system prompts across agents. Save prompts manually, or push proposed prompts directly from the Prompt Auto-Optimizer without leaving the review flow.
- Vault page — Browse, create, edit, and delete saved prompts from a dedicated settings page
- Optimizer integration — Accept an optimizer-proposed prompt directly into the vault (as a new entry or as an update to an existing one) alongside or instead of applying it to the agent
- Full CRUD — Each entry has a name, description, and prompt body
- Both databases — Works identically with SQLite and MongoDB
Connect any agent to a WhatsApp account via QR code scan. No Meta Business account, no API fees, no webhooks required.
- QR pairing — Click Connect, scan the QR code with WhatsApp → Linked Devices, and the channel goes live
- Per-channel agent — Route all messages on a channel to a specific agent; change the agent at any time without re-scanning
- Contact whitelist — Restrict which contacts the agent responds to; others receive a custom rejection message or are silently ignored
- Session continuity — Each
(channel, contact)pair maps to a persistent session; the agent remembers conversation history and applies long-term memory - Voice note transcription — Incoming voice notes are transcribed locally using faster-whisper (CPU, no cloud STT)
- Voice note replies — Reply with synthesised audio using the multi-backend TTS pipeline (see Voice Reply Setup):
- Qwen3-TTS (GPU): 9 preset voices across English, Chinese, Japanese, Korean; or reply in the user's own cloned voice
- Pocket TTS (CPU fallback): 8 voices, CPU-native
- Kokoro (last resort): lightweight CPU fallback
- Voice cloning — Record or upload a voice sample in the channel settings page; the agent replies in the user's voice. Includes a guided in-browser recorder with a phonetically rich script
- Per-channel TTS engine — Choose
auto,qwen, orclassicper channel - HITL in channels — Pending approvals surface in the global notification badge in the top bar
- Auto-reconnect — The sidecar reconnects all authenticated channels automatically on startup
When a conversation approaches a model's context limit, the platform automatically compresses older history so sessions can continue indefinitely without hitting token limits or requiring manual intervention.
- Threshold-based trigger — Compaction activates at 80% of the model's context window
- Recent messages preserved — The last 10 messages are always kept verbatim for continuity
- LLM-driven summarization — Older messages are summarized by the same LLM into a single compact summary message that replaces them
- Transparent notification — A
context_compactedevent is streamed to the frontend when compaction occurs - Audit trail — Each compaction is recorded in the database with metadata about what was compressed
- Model-aware limits — Context limits are tracked per model family (Claude 200k, GPT-4 128k, etc.) with a safe fallback for unknown models
Browse, search, and manage past conversations. Sessions are organized by agent or team and can be filtered by type.
Session History:
- Search — Find conversations by content
- Filter by type — View agent, team, or workflow sessions
- Resume conversations — Click a session to continue where you left off
- Delete sessions — Remove conversations you no longer need
Execution Traces:
Every agent session and scheduled workflow run records a full execution trace — a sequence of spans capturing each LLM call and tool/MCP call with timing, token usage, and input/output previews.
- Span types — LLM calls (Brain icon), tool calls (Wrench), MCP calls (Server), workflow steps (GitBranch)
- Round grouping — Spans are grouped by tool-loop iteration ("Initial", "Round 1", "Round 2" ...)
- Collapsible details — Each span reveals input/output JSON previews (truncated at 500 chars)
- Performance indicators — Duration badge turns amber for spans taking > 5s; error indicator on failed spans
- Token summary — Total duration, total tokens, and span count shown in the trace header
- Access — Hover any session row in
/sessionsand click the Activity icon
Run workflows automatically on a server-side cron schedule — no browser required. Scheduled runs are identical to manual runs: a WorkflowRun record is created, each step executes sequentially, and results appear in the workflow's run history.
- Cron expressions — Standard 5-field cron syntax (
minute hour day month weekday) with a helper link to crontab.guru - Fixed input text — Optionally supply the input text that feeds into the first workflow step at schedule creation time
- Active / paused toggle — Pause and resume schedules without deleting them
- Missed-run skipping — If the server is down during a scheduled run, the missed run is skipped rather than queued
- Restart persistence — Active schedules are automatically re-registered when the server starts up
- Per-user ownership — Each user manages their own schedules; they are not visible to other users
- Run history — Each scheduled execution appears alongside manual runs in the workflow history dialog
Store API keys, tokens, and sensitive credentials in an encrypted vault. Secrets are encrypted at rest and can be referenced when configuring providers — no need to paste raw API keys.
- Fernet encryption at rest — All secret values are encrypted before storage
- AES encryption in transit — Secrets are transmitted via encrypted payloads
- Reference in providers — Use stored secrets when adding LLM providers instead of entering keys directly
- CRUD management — Create, view, update, and delete secrets from the settings page
Enterprise-grade security features built in from day one.
| Feature | Implementation |
|---|---|
| JWT Authentication | HS256 signed tokens with configurable expiry |
| Two-Factor Auth (2FA) | TOTP-based with QR code enrollment (Google Authenticator, Authy) |
| End-to-End Encryption | AES-encrypted request payloads (CryptoJS client, PyCryptodome server) |
| Secrets at Rest | Fernet-encrypted storage for API keys and credentials |
| Password Hashing | Bcrypt with salt for all user passwords |
| Rate Limiting | Per-endpoint limits via SlowAPI (configurable per user and API client) |
| API Client Credentials | Generate client ID/secret pairs for programmatic access |
Manage users, assign roles, and control permissions with a granular role-based access control system.
- Admin & Guest roles — Two built-in roles with configurable permissions
- Six permission flags — Create Agents, Create Teams, Create Workflows, Create Tools, Manage Providers, Manage MCP Servers
- User management — Create, edit, and delete user accounts
- Statistics dashboard — View total users, admin count, and guest count
Share agent configurations as portable JSON files across any Obsidian AI instance. The export resolves all internal IDs to human-readable names so the file is fully self-contained and shareable. On import, names are matched back to the importing user's resources; any that don't exist are skipped and reported as warnings.
Export file format:
{
"aios_export_version": "1",
"exported_at": "2026-01-01T00:00:00Z",
"agent": {
"name": "Research Assistant",
"description": "Searches the web for information",
"system_prompt": "You are a research assistant...",
"provider_model_id": "gpt-4o",
"tools": ["web_search", "calculator"],
"mcp_servers": ["brave-search"],
"knowledge_bases": ["Company Docs"],
"hitl_confirmation_tools": ["delete_file"],
"config": { "temperature": 0.7, "max_tokens": 2000 }
}
}Give any agent or team an isolated Docker container for safe code execution and file operations. When sandbox is enabled, the agent gains a set of built-in tools to interact with the container and automatically surfaces any files it creates as artifacts in the chat panel.
- Per-agent and per-team — Enable sandbox independently on individual agents or entire teams; all agents in a sandboxed team share the same container
- Isolated container — Each sandbox runs the
obsidian-webdev-baseimage with a 512 MB memory cap, 1 CPU limit, and/workspaceas the working directory - Start / stop controls — Launch or terminate the container directly from the agent or team edit dialog; a pulsing status dot shows whether the container is live
- Sandbox status badge — A live indicator appears in the chat header whenever a sandbox-enabled agent or team has an active container
- Built-in sandbox tools — When an active sandbox container is detected, nine tools are automatically injected into the agent's tool set:
| Tool | Description |
|---|---|
sandbox_bash |
Run any shell command inside the container |
sandbox_write |
Write content to a file (creates parent directories automatically) |
sandbox_read |
Read a file's contents back from the container |
sandbox_ls |
List files and directories |
sandbox_glob |
Find files matching a shell glob pattern |
sandbox_grep |
Search file contents with regex |
sandbox_delete |
Delete a file or directory |
sandbox_python |
Execute Python 3.12 code and return output (numpy, pandas, matplotlib, scikit-learn, requests, httpx, pytest pre-installed) |
sandbox_node |
Execute JavaScript (Node.js 20) or TypeScript (ts-node) code and return output |
- Automatic artifact surfacing — The agent is instructed via system prompt to call
sandbox_readafter creating or modifying any file and wrap the result in an<artifact>tag, so files appear in the artifact panel with live preview, copy, and download
Run with zero-config SQLite out of the box, or switch to MongoDB for production deployments. Every endpoint supports both databases with automatic branching.
| Database | Config | Use Case |
|---|---|---|
| SQLite | Default — no setup required | Development, single-user, small teams |
| MongoDB | Set DATABASE_TYPE=mongo |
Production, multi-user, horizontal scaling |
┌──────────────────────────────┐ ┌──────────────────────────────┐
│ Frontend │ │ Backend │
│ Next.js 16 + React 19 │──────>│ FastAPI + SQLAlchemy │
│ Port 3000 │ /api │ Port 8000 │
│ │proxy │ │
│ ┌────────┐ ┌─────────────┐ │ │ ┌──────────┐ ┌───────────┐ │
│ │NextAuth│ │ Zustand │ │ │ │ JWT Auth │ │ LLM │ │
│ │ v5 │ │ State Mgmt │ │ │ │ + 2FA │ │ Providers │ │
│ └────────┘ └─────────────┘ │ │ └──────────┘ └───────────┘ │
│ ┌────────┐ ┌─────────────┐ │ │ ┌──────────┐ ┌───────────┐ │
│ │Radix UI│ │ CryptoJS │ │ │ │ MCP │ │ RAG │ │
│ │/shadcn │ │ Encryption │ │ │ │ Client │ │ Service │ │
│ └────────┘ └─────────────┘ │ │ └──────────┘ └───────────┘ │
└──────────────────────────────┘ └──────────┬───────────────────┘
│
┌──────────▼───────────────────┐
│ SQLite (default) / MongoDB │
└──────────────────────────────┘
How it works:
- The Next.js frontend proxies all
/api/*requests to the FastAPI backend vianext.config.tsrewrites - Authentication flows through NextAuth v5 on the frontend and JWT verification on the backend
- All auth-sensitive payloads are AES-encrypted end-to-end (CryptoJS client-side, PyCryptodome server-side)
- Chat responses stream via Server-Sent Events (SSE) for real-time token delivery
- LLM providers are abstracted behind a factory pattern — agents reference a provider, not a specific SDK
| Tool | Version | Purpose |
|---|---|---|
| Python | 3.12+ | Backend runtime |
| Node.js | 18+ | Frontend runtime |
| uv | Latest | Python package manager |
| npm | 9+ | Node package manager |
# Navigate to the backend directory
cd backend
# Install Python dependencies
uv sync
# Configure environment variables (see Environment Variables section)
# Edit .env with your keys
# Start the FastAPI development server
uv run uvicorn main:app --reloadThe backend API will be available at http://localhost:8000. Interactive docs at http://localhost:8000/docs.
# Navigate to the frontend directory
cd frontend
# Install Node.js dependencies
npm install
# Configure environment variables
# Edit .env.local with your keys
# Start the Next.js development server
npm run devThe frontend will be available at http://localhost:3000.
# Navigate to the wa-bridge directory
cd wa-bridge
# Install Node.js dependencies
npm install
# Start the WhatsApp bridge
node index.jsThe WhatsApp bridge will be available at http://localhost:3200.
WhatsApp channels support AI-generated voice note replies. This is optional — text replies work without it.
System requirement: ffmpeg
ffmpeg is required to convert synthesised audio to OGG Opus (the native WhatsApp voice note format). Install it and ensure it is on your system PATH:
# Windows
winget install ffmpeg
# macOS
brew install ffmpeg
# Ubuntu / Debian
sudo apt update && sudo apt install -y ffmpeg
# Fedora / RHEL
sudo dnf install ffmpeg
# Arch Linux
sudo pacman -S ffmpegRestart the backend after installing ffmpeg so the new PATH is picked up.
TTS engine selection
The platform supports three TTS backends selected per channel or globally via the tts_backend setting:
| Setting | Engine | Requirement | Quality |
|---|---|---|---|
auto (default) |
Qwen3-TTS on GPU, Pocket TTS on CPU | CUDA GPU for Qwen3 | High (GPU) / Good (CPU) |
qwen |
Qwen3-TTS forced | CUDA GPU + pip install qwen-tts |
High |
classic |
Pocket TTS → Kokoro | CPU only | Good |
Qwen3-TTS (GPU, high quality)
Qwen3-TTS is a high-quality neural TTS model from Alibaba. It requires a CUDA GPU and is auto-selected when one is available.
pip install qwen-ttsImportant: PyPI only ships the CPU build of PyTorch. Qwen3-TTS requires a CUDA-enabled build. After
uv sync, install the CUDA PyTorch wheel manually:# Check your CUDA version first: nvidia-smi # Then install the matching wheel (replace cu121 with cu118 or cu124 as needed): uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121Verify CUDA is available:
python -c "import torch; print(torch.cuda.is_available())"
Preset voices available: Ryan, Aiden (English), Vivian, Serena, Uncle_Fu, Dylan, Eric (Chinese), Ono_Anna (Japanese), Sohee (Korean).
Configure model size and device via environment variables:
QWEN_TTS_SIZE=0.6B # or 1.7B for higher quality (uses more VRAM)
QWEN_TTS_DEVICE=cuda:0Voice cloning
When Qwen3-TTS is active, users can record or upload a short voice sample (3 seconds minimum) directly in the channel settings page. The agent will reply in the user's own voice. A guided recording script is provided in the UI for best results. Voice clone prompts are cached in memory — the reference audio is only processed once per session.
VOICE_SAMPLES_DIR=voice_samples # Directory where uploaded voice samples are storedPocket TTS / Kokoro (CPU fallback)
If Qwen3-TTS is unavailable (no GPU or qwen-tts not installed), the system falls back to Pocket TTS (~471MB, CPU-native, ~3× faster than real-time), then to Kokoro if Pocket TTS also fails. Models are downloaded automatically from HuggingFace on first use. If all TTS backends fail, the reply is sent as text.
SoX (Optional)
SoX is used internally by Pocket TTS for audio processing. It is only needed when running in classic mode — if you have a CUDA GPU and Qwen3-TTS is active, SoX is not required.
# Windows
winget install sox
# macOS
brew install sox
# Ubuntu / Debian
sudo apt install sox
# Fedora / RHEL
sudo dnf install sox
# Arch Linux
sudo pacman -S soxFlash Attention (Optional)
Flash Attention 2 speeds up transformer attention computation for Qwen3-TTS on CUDA GPUs. Without it, the system falls back to standard PyTorch attention — output is identical, just slower.
uv pip install flash-attnRequires a compatible CUDA GPU and takes a while to compile.
If you plan to use the Docker Sandbox feature, build the base image once before starting agents with sandbox enabled:
docker build -f backend/Dockerfile.base -t obsidian-webdev-base:latest backend/This image includes Python 3.12, Node.js 20, and common dev tools. It only needs to be rebuilt if you modify backend/Dockerfile.base.
# From the project root directory
npm run devThis starts both the frontend and backend concurrently.
Important: You must generate real secret keys before starting the backend. Placeholder values will cause startup errors.
# Generate ENCRYPTION_KEY and JWT_SECRET_KEY (any random string works):
openssl rand -hex 32
# Generate PROVIDER_KEY_SECRET (must be a valid Fernet key — 32 url-safe base64-encoded bytes):
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"# Encryption key for AES payload encryption (must match frontend NEXT_PUBLIC_ENCRYPTION_KEY)
ENCRYPTION_KEY=<output of: openssl rand -hex 32>
# JWT authentication
JWT_SECRET_KEY=<output of: openssl rand -hex 32>
JWT_ALGORITHM=HS256
JWT_ACCESS_TOKEN_EXPIRE_MINUTES=30
# Fernet key for encrypting provider API keys at rest (MUST be a valid Fernet key)
PROVIDER_KEY_SECRET=<output of: python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())">
# Rate limiting (requests per minute)
RATE_LIMIT_USER=60
RATE_LIMIT_API_CLIENT=100
# Database type: "sqlite" (default) or "mongo"
DATABASE_TYPE=sqlite
# Tavily Search API key — required for the web_search agent tool
# Get a free key (1000 searches/month) at https://app.tavily.com
TAVILY_API_KEY=tvly-...
# WhatsApp bridge URL (default: http://localhost:3200)
WA_BRIDGE_URL=http://localhost:3200
# TTS — Qwen3-TTS settings (GPU only; ignored if qwen-tts not installed)
QWEN_TTS_SIZE=0.6B # or 1.7B for higher quality
QWEN_TTS_DEVICE=cuda:0
VOICE_SAMPLES_DIR=voice_samples# NextAuth.js signing secret
AUTH_SECRET=your-auth-secret
# Application URL
AUTH_URL=http://localhost:3000
# Public encryption key (must match backend ENCRYPTION_KEY)
NEXT_PUBLIC_ENCRYPTION_KEY=your-encryption-keyNavigate to the Chat page sidebar and click + next to Endpoints. Select a provider type (OpenAI, Anthropic, Google, Ollama, etc.), enter your API key or reference one from the secrets vault, and specify the model ID.
Click + next to Agents in the sidebar. Configure the agent with a name, system prompt, and model/provider selection. Optionally attach tools, MCP servers, and knowledge bases.
Select your agent from the sidebar and start a conversation. Responses stream in real-time. Attach files, view tool executions, and explore chain-of-thought reasoning.
Switch to the Teams tab, click +, and combine multiple agents. Choose a coordination mode (Coordinate, Route, or Collaborate) and start a team chat.
From the Dashboard, click Create Workflow. Add sequential steps, each assigned to a specific agent with custom instructions. Run the workflow and watch each step execute in real-time.
In the sidebar, click + next to MCP Servers. Set up servers using stdio (local Docker/npx/python) or SSE (remote HTTP) transport. Test the connection to discover available tools, then bind servers to agents.
Go to Settings > Secrets to store API keys and tokens in the encrypted vault. Reference stored secrets when adding LLM providers.
In Settings, enable 2FA by scanning the QR code with an authenticator app (Google Authenticator, Authy, etc.). Enter the 6-digit code to verify setup.
Navigate to the Admin panel to create users, assign roles (Admin/Guest), and configure granular resource permissions.
The backend exposes a RESTful API with interactive documentation:
- Swagger UI —
http://localhost:8000/docs - ReDoc —
http://localhost:8000/redoc
| Category | Endpoint | Method | Description |
|---|---|---|---|
| Auth | /auth/register |
POST | Create a new user account |
| Auth | /auth/login |
POST | Authenticate and receive JWT token |
| Agents | /agents |
GET | List all agents |
| Agents | /agents |
POST | Create a new agent |
| Agents | /agents/{id} |
PUT | Update an agent |
| Agents | /agents/{id} |
DELETE | Delete an agent |
| Agents | /agents/{id}/export |
GET | Export agent as JSON |
| Agents | /agents/import |
POST | Import agent from JSON |
| Teams | /teams |
GET | List all teams |
| Teams | /teams |
POST | Create a new team |
| Workflows | /workflows |
GET | List all workflows |
| Workflows | /workflows |
POST | Create a new workflow |
| Workflows | /workflows/{id}/run |
POST | Execute a workflow (SSE) |
| Workflows | /workflows/{id}/runs |
GET | List workflow run history |
| Schedules | /workflows/{id}/schedules |
GET | List schedules for a workflow |
| Schedules | /workflows/{id}/schedules |
POST | Create a cron schedule |
| Schedules | /schedules/{id} |
PUT | Update a schedule |
| Schedules | /schedules/{id} |
DELETE | Delete a schedule |
| Chat | /chat |
POST | Stream a chat response (SSE) |
| Sessions | /sessions |
GET | List conversation sessions |
| Sessions | /sessions/{id}/messages |
GET | Get session messages |
| Traces | /traces/sessions/{id} |
GET | Get full execution trace for a session |
| Traces | /traces/workflow-runs/{id} |
GET | Get execution trace for a workflow run |
| Providers | /providers |
GET | List configured LLM providers |
| Providers | /providers/{id}/test |
POST | Test provider connection |
| Providers | /providers/{id}/models |
GET | List available models |
| Tools | /tools |
GET | List available tool definitions |
| Tools | /tools |
POST | Create a tool definition |
| MCP | /mcp-servers |
GET | List MCP server configurations |
| MCP | /mcp-servers/{id}/test |
POST | Test MCP server connection |
| Knowledge | /knowledge |
GET | List knowledge bases |
| Knowledge | /knowledge/{id}/documents |
POST | Add a document to a knowledge base |
| Memory | /memory/agents/{id} |
GET | List agent memories |
| Memory | /memory/agents/{id} |
DELETE | Clear all agent memories |
| Versions | /versions/agents/{id} |
GET | List agent version snapshots |
| Versions | /versions/agents/{id}/{vid}/rollback |
POST | Rollback to a version |
| Evals | /evals/suites |
GET / POST | List or create eval suites |
| Evals | /evals/suites/{id}/run |
POST | Trigger an eval run |
| Evals | /evals/runs/{id} |
GET | Poll eval run status |
| Evals | /evals/agents/{id}/runs |
GET | List eval runs for an agent |
| Optimizer | /optimizer/trigger |
POST | Start a new optimization run |
| Optimizer | /optimizer/agents/{id} |
GET | List optimization runs for an agent |
| Optimizer | /optimizer/runs/{id} |
GET | Poll optimization run status |
| Optimizer | /optimizer/runs/{id}/accept |
POST | Accept the proposed prompt |
| Optimizer | /optimizer/runs/{id}/reject |
POST | Reject the proposed prompt |
| Secrets | /secrets |
GET | List user secrets (encrypted) |
| Secrets | /secrets |
POST | Create a new secret |
| Files | /files/upload |
POST | Upload a file attachment |
| Dashboard | /dashboard/stats |
GET | Get dashboard statistics |
| Admin | /admin/users |
GET | List all users (admin only) |
| Admin | /admin/users |
POST | Create a user (admin only) |
| Admin | /admin/users/{id} |
PUT | Update user role/permissions |
| User | /user/2fa/setup |
POST | Set up TOTP two-factor auth |
| User | /user/2fa/verify |
POST | Verify 2FA setup |
| User | /user/api-clients |
POST | Generate API client credentials |
/wa/channels |
GET / POST | List or create WhatsApp channels | |
/wa/channels/{id} |
GET / PATCH / DELETE | Get, update, or delete a channel | |
/wa/channels/{id}/connect |
POST | Initiate QR pairing | |
/wa/channels/{id}/disconnect |
POST | Disconnect session | |
/wa/channels/{id}/qr |
GET (SSE) | Stream QR code and connected event | |
/wa/channels/{id}/voice-sample |
POST / DELETE | Upload or remove voice clone sample | |
| Prompt Vault | /prompt-vault |
GET / POST | List or create saved prompts |
| Prompt Vault | /prompt-vault/{id} |
GET / PUT / DELETE | Get, update, or delete a prompt |
| Health | /health |
GET | Health check |
Generate API client credentials from Settings to access the API programmatically:
curl -H "X-API-Key: your-client-id" \
-H "X-API-Secret: your-client-secret" \
http://localhost:8000/agents| Technology | Purpose |
|---|---|
| Next.js 16 | React framework with App Router |
| React 19 | UI component library |
| NextAuth v5 | Authentication & session management |
| Tailwind CSS 4 | Utility-first styling |
| Radix UI / shadcn/ui | Accessible component primitives |
| Zustand | Lightweight state management |
| CryptoJS | Client-side AES encryption |
| Lucide React | Icon library |
| Motion | Animation library |
| Shiki | Syntax highlighting |
| Sonner | Toast notifications |
| Technology | Purpose |
|---|---|
| FastAPI | Async Python web framework |
| SQLAlchemy | SQL ORM (SQLite) |
| Motor | Async MongoDB driver |
| Pydantic | Data validation & serialization |
| python-jose | JWT token handling |
| bcrypt | Password hashing |
| PyCryptodome | AES encryption |
| PyOTP | TOTP two-factor authentication |
| SSE-Starlette | Server-Sent Events streaming |
| MCP SDK | Model Context Protocol client |
| FAISS | Vector similarity search |
| APScheduler | Cron-based background job scheduling |
| croniter | Cron expression validation |
| SlowAPI | Rate limiting |
| uv | Fast Python package manager |
obsidian-ai/
├── package.json # Root — runs frontend + backend concurrently
│
├── backend/ # FastAPI Python backend
│ ├── main.py # App entrypoint, middleware, lifespan
│ ├── pyproject.toml # Python dependencies (uv)
│ ├── .env # Backend environment config
│ │
│ ├── auth.py # JWT token generation & verification
│ ├── config.py # Database type configuration
│ ├── crypto_utils.py # AES payload encryption/decryption
│ ├── encryption.py # Fernet encryption for secrets at rest
│ ├── database.py # SQLAlchemy engine & session (SQLite)
│ ├── database_mongo.py # Motor async MongoDB driver setup
│ ├── models.py # SQLAlchemy ORM models
│ ├── schemas.py # Pydantic request/response schemas
│ ├── rate_limiter.py # Rate limiting middleware
│ ├── mcp_client.py # MCP server connection & tool bridging
│ ├── rag_service.py # Vector search & RAG pipeline
│ ├── file_storage.py # File upload/download handling
│ ├── scheduler.py # Global APScheduler instance + cron helpers
│ ├── scheduler_executor.py # Background workflow execution functions
│ ├── eval_engine.py # Eval grading logic + background run tasks
│ ├── optimizer.py # Prompt auto-optimizer pipeline (SQLite + Mongo)
│ │
│ ├── llm/ # LLM provider integrations
│ │ ├── base.py # Base provider interface
│ │ ├── provider_factory.py # Factory pattern for provider selection
│ │ ├── anthropic_provider.py # Anthropic (Claude) provider
│ │ ├── openai_provider.py # OpenAI / OpenRouter provider
│ │ ├── google_provider.py # Google Gemini provider
│ │ └── ollama_provider.py # Local Ollama provider
│ │
│ ├── services/ # Background services
│ │ ├── whatsapp_service.py # Incoming message handling + TTS dispatch
│ │ └── tts_service.py # Qwen3-TTS / Pocket TTS / Kokoro pipeline
│ │
│ └── routers/ # API route handlers
│ ├── auth_router.py # Login, register, 2FA/TOTP
│ ├── agents_router.py # Agent CRUD + import/export
│ ├── teams_router.py # Team coordination
│ ├── workflows_router.py # Workflow definitions
│ ├── workflow_runs_router.py # Workflow execution & tracking
│ ├── chat_router.py # Real-time streaming chat
│ ├── sessions_router.py # Conversation sessions
│ ├── providers_router.py # LLM provider configuration
│ ├── tools_router.py # Tool definitions
│ ├── mcp_servers_router.py # MCP server configuration
│ ├── secrets_router.py # Encrypted user secrets
│ ├── files_router.py # File attachments
│ ├── knowledge_router.py # Knowledge base CRUD + document indexing
│ ├── memory_router.py # Agent long-term memory
│ ├── versions_router.py # Agent version snapshots + rollback
│ ├── eval_router.py # Eval suite CRUD + run management
│ ├── optimizer_router.py # Prompt optimizer trigger + accept/reject + vault actions
│ ├── prompt_vault_router.py # Prompt Vault CRUD
│ ├── whatsapp_router.py # WhatsApp channel management + voice sample endpoints
│ ├── schedule_router.py # Workflow schedule CRUD + APScheduler sync
│ ├── traces_router.py # Execution trace read endpoints
│ ├── dashboard_router.py # Dashboard statistics
│ └── admin_router.py # System administration
│
└── frontend/ # Next.js 16 React frontend
├── package.json # Frontend dependencies
├── next.config.ts # Next.js config (API rewrites)
├── auth.ts # NextAuth.js v5 configuration
├── .env.local # Frontend environment config
│
├── app/ # Next.js app directory
│ ├── layout.tsx # Root layout (toasts, providers)
│ ├── page.tsx # Landing page
│ ├── login/page.tsx # Login page with 2FA support
│ ├── register/page.tsx # User registration
│ │
│ └── (authenticated)/ # Protected routes
│ ├── layout.tsx # Sidebar & navigation
│ ├── home/page.tsx # Dashboard
│ ├── playground/page.tsx # Chat interface
│ ├── sessions/page.tsx # Conversation history
│ ├── settings/page.tsx # User settings, 2FA, secrets
│ ├── knowledge/page.tsx # Knowledge base list
│ ├── knowledge/[id]/page.tsx # KB detail — add text/file documents
│ ├── evals/page.tsx # Eval suite builder + run results
│ ├── prompt-vault/page.tsx # Prompt Vault — save & reuse prompts
│ ├── channels/page.tsx # WhatsApp channel list
│ ├── channels/[id]/page.tsx # Channel settings — agent, voice, clone
│ └── admin/page.tsx # Admin panel
│
├── components/ # React components
│ ├── playground/ # Chat, sidebar, agent & team dialogs
│ ├── dialogs/ # Modal dialogs (tools, workflows, MCP, admin)
│ ├── ai-elements/ # Chat & streaming display components
│ └── ui/ # shadcn / Radix UI primitives
│
├── lib/ # Utility libraries
│ ├── api-client.ts # HTTP client with auth headers
│ ├── crypto.ts # Client-side AES encryption
│ ├── stream.ts # SSE streaming response handler
│ └── utils.ts # General utilities
│
├── stores/ # Zustand state management
│ ├── playground-store.ts # Agents, teams, workflows, artifacts state
│ ├── dashboard-store.ts # Dashboard state
│ ├── permissions-store.ts # User permissions state
│ └── admin-store.ts # Admin panel state
│
└── types/ # TypeScript type definitions
├── playground.ts # Agent, Team, Workflow, Artifact types
└── api.ts # API response types
Recent additions shipped to the platform.
Agents can now receive and reply to WhatsApp messages in real time — no Meta Business account, no API fees, no webhooks.
Connect any Obsidian AI agent to a WhatsApp account in seconds using a QR code scan. Once linked, every incoming message is routed through the full agent pipeline — tools, RAG, memory, HITL — and the reply lands back in WhatsApp automatically.
How it works:
A lightweight Node.js sidecar (wa-bridge/) runs alongside the backend. It manages a Baileys WhatsApp Web socket per channel, renders QR codes as PNG data URLs, and forwards incoming messages to FastAPI over localhost. The backend runs the agent headlessly (no SSE stream required) and posts the reply back through the sidecar.
What's included:
- Channels page — Create, manage, and delete WhatsApp channels from the sidebar; each channel shows live connection status (Connected / Awaiting QR / Disconnected)
- QR scan flow — Click Connect, a QR code appears in the UI; scan with WhatsApp → Linked Devices and the channel goes live in seconds
- Per-channel agent assignment — Each channel routes messages to a specific agent; change the agent at any time without re-scanning
- Contact whitelist — Optionally restrict which WhatsApp contacts the agent responds to; unrecognised senders can receive a custom rejection message or be silently ignored
- Persistent sessions — Each
(channel, WhatsApp contact)pair maps to a persistent Obsidian AI session; the agent remembers the full conversation history and applies long-term memory across interactions - HITL in channels — When a channel-triggered agent hits a HITL-flagged tool, execution pauses; the approval card surfaces in the global notification badge in the top bar and unblocks automatically once actioned
- Voice note transcription — Incoming WhatsApp voice notes are automatically transcribed using a local faster-whisper
smallmodel (244 MB, runs entirely on CPU, no external API). The transcript is passed to the agent as text — no audio files written to disk, no cloud services involved - Voice note replies — Optionally reply with a generated voice note instead of text. The TTS pipeline auto-selects the best available engine: Qwen3-TTS (high quality, CUDA GPU required, 9 preset voices across English/Chinese/Japanese/Korean) → Pocket TTS (CPU-native, ~471MB) → Kokoro (CPU last resort). The engine can be forced per channel (
auto/qwen/classic). Enable per channel, choose a voice, and optionally restrict voice replies to specific contacts. Requiresffmpeg(see Voice Reply Setup) - Voice cloning — When using Qwen3-TTS, users can record or upload a voice sample directly from the channel settings page (guided recording script included). The agent replies in the user's own voice. Clone prompts are cached — the reference audio is processed once and reused for all subsequent replies
- Global HITL badge — A new bell icon in the app header polls all sessions for pending approvals and lets you approve or reject tool calls from anywhere in the UI
- Auto-reconnect — The sidecar reconnects all previously authenticated channels automatically on startup; no manual re-scanning after a restart
New API endpoints:
| Endpoint | Method | Description |
|---|---|---|
/wa/channels |
GET | List all WhatsApp channels |
/wa/channels |
POST | Create a new channel |
/wa/channels/{id} |
GET / PATCH / DELETE | Get, update, or delete a channel |
/wa/channels/{id}/connect |
POST | Start the Baileys socket (triggers QR) |
/wa/channels/{id}/disconnect |
POST | Gracefully disconnect the socket |
/wa/channels/{id}/qr |
GET (SSE) | Stream QR code and connection events |
/wa/channels/{id}/status |
PATCH | Internal — sidecar updates channel status |
/wa/incoming |
POST | Internal — sidecar delivers incoming messages |
/chat/hitl/pending |
GET | List all pending HITL approvals across all sessions |
Running the sidecar:
The dev script starts the sidecar automatically alongside the frontend and backend. To run it manually:
cd wa-bridge && npm install && node index.jsEnvironment variables: WA_BRIDGE_PORT (default 3200), FASTAPI_URL (default http://localhost:8001), WA_AUTH_DIR (default wa-bridge/auth/).
Agents and teams can now be assigned an isolated Docker container for safe, persistent code execution. When a sandbox is active, nine built-in tools (sandbox_bash, sandbox_write, sandbox_read, sandbox_ls, sandbox_glob, sandbox_grep, sandbox_delete, sandbox_python, sandbox_node) are automatically injected into the agent's tool set — no manual tool configuration needed.
Start / stop from the dialog — A new Docker Sandbox section in the agent and team edit dialogs lets you enable the sandbox and, when editing an existing agent/team, start or stop the container with a single button click. A pulsing status dot shows whether the container is live.
Sandbox status badge — A live indicator appears in the chat header whenever the selected agent or team has sandbox enabled and an active container.
Automatic file surfacing — When a sandbox is active, a system prompt injection instructs the agent to call sandbox_read after creating or modifying any file and wrap the result in an <artifact> tag. Files produced inside the container appear automatically in the artifact panel with live preview, copy, and download.
New API endpoints:
| Endpoint | Method | Description |
|---|---|---|
/agents/{id}/sandbox/start |
POST | Start a sandbox container for an agent |
/agents/{id}/sandbox/stop |
POST | Stop and remove the agent's container |
/agents/{id}/sandbox/status |
GET | Get current container status |
/teams/{id}/sandbox/start |
POST | Start a sandbox container for a team |
/teams/{id}/sandbox/stop |
POST | Stop and remove the team's container |
/teams/{id}/sandbox/status |
GET | Get current container status |
A dedicated prompt library for storing, organizing, and reusing system prompts across agents.
- Vault page — Browse and manage all saved prompts from a dedicated settings page
- Save from optimizer — When the Prompt Auto-Optimizer proposes an improved prompt, accept it directly into the vault instead of (or in addition to) applying it to the agent
- Apply to agents — Select any saved prompt from the vault when configuring an agent's system prompt
- Full CRUD — Create, edit, and delete vault entries; each entry has a name, description, and prompt body
- MongoDB + SQLite support — Works with both database backends
New API endpoints:
| Endpoint | Method | Description |
|---|---|---|
/prompt-vault |
GET | List all prompts in the vault |
/prompt-vault |
POST | Create a new vault entry |
/prompt-vault/{id} |
GET / PUT / DELETE | Get, update, or delete a vault entry |
/optimizer/runs/{id}/save-to-vault |
POST | Save an optimizer-proposed prompt to the vault |
/optimizer/runs/{id}/update-vault |
POST | Update an existing vault entry with a proposed prompt |
The WhatsApp voice reply engine has been upgraded from Pocket TTS to a full multi-backend TTS pipeline with support for high-quality neural synthesis and per-user voice cloning.
Qwen3-TTS (GPU)
When a CUDA GPU is available, voice replies are synthesised using Qwen3-TTS — Alibaba's state-of-the-art text-to-speech model. It produces natural, expressive audio at 12.5 FPS codec frame rate and supports 9 preset voices across four languages:
| Voice | Language | Character |
|---|---|---|
| Ryan | English | Dynamic male |
| Aiden | English | Sunny American male |
| Vivian | Chinese | Bright young female |
| Serena | Chinese | Warm gentle female |
| Uncle_Fu | Chinese | Low mellow male |
| Dylan | Chinese | Natural male |
| Eric | Chinese | Husky male |
| Ono_Anna | Japanese | Playful female |
| Sohee | Korean | Warm female |
Voice cloning
Users can record a short voice sample (or upload an audio file) directly from the channel settings page. Once saved, the agent replies in the user's own cloned voice using the Qwen3-TTS Base model. The UI includes:
- Guided recording — An in-browser recorder with a phonetically rich script designed to capture enough phoneme coverage for a good clone
- Upload fallback — Alternatively upload any audio file (WAV, MP3, WebM, etc.); the backend normalises it to 16kHz mono WAV via ffmpeg automatically
- Clone cache — Voice clone prompts are computed once from the reference audio and cached in memory; subsequent replies reuse the cached prompt with zero extra latency
- Replace / delete — Replace the sample at any time or remove it to revert to a preset voice
Backend selection per channel
Each channel has an independent tts_backend setting:
| Value | Behaviour |
|---|---|
auto |
Qwen3-TTS if CUDA GPU available, otherwise Pocket TTS → Kokoro |
qwen |
Force Qwen3-TTS (error if GPU not available) |
classic |
Force Pocket TTS → Kokoro (CPU only) |
The CPU fallback chain (Pocket TTS → Kokoro → text) is fully preserved for servers without a GPU.
New API endpoints:
| Endpoint | Method | Description |
|---|---|---|
/wa/channels/{id}/voice-sample |
POST | Upload or record a voice clone sample |
/wa/channels/{id}/voice-sample |
DELETE | Remove the voice clone sample |
Contributions are welcome! Whether it's bug reports, feature requests, or code contributions — all input is valued.
-
Fork the repository
Click the Fork button at the top right of this page.
-
Clone your fork
git clone https://github.com/your-username/obsidian-ai.git cd obsidian-ai -
Create a feature branch
git checkout -b feature/your-feature-name
-
Make your changes and commit
git commit -m "Add your feature description" -
Push and open a Pull Request
git push origin feature/your-feature-name
Open a pull request against the
mainbranch with a clear description of your changes.
Found a bug or have a feature request? Open an issue with as much detail as possible.
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
- Free to use — Use, study, and modify for any purpose.
- Copyleft — If you distribute this software or run it as a network service, you must make the complete source code available under the same AGPL-3.0 terms.
- No additional restrictions — You cannot impose further restrictions on recipients' exercise of the rights granted by this license.
See the LICENSE file for the full terms.
