vault-runner

A self-hosted LLM job runner that turns an Obsidian vault into a distributed AI workbench.

Drop a Markdown file into a folder. A worker picks it up, routes it to the right model (local Ollama, Groq, Gemini, Anthropic, or HuggingFace), streams the output back into your vault, and ships a trace to both Grafana and Langfuse. No message brokers, no queue servers, no cloud lock-in: just files, notes, and models you choose.

File-based job queue driven from your note-taking app — every job is a .md file that moves through _queue → _active → _completed.
Multi-provider model routing — local Ollama (Qwen, Gemma), Groq, Gemini, Anthropic, and HuggingFace in one config table; jobs route automatically by the model: field.
Agentic chain jobs — pre-defined multi-step pipelines where each step can call a different model or trigger an action (web search, URL fetch, GitLab push, CI poll).
LangGraph research pipeline — goal → parallel Tavily searches → entity/relation extraction into KuzuDB → synthesised report, all as a single job.
GitLab CI/CD code-gen loop — LLM generates code → pushes to GitLab → CI runs → on failure, LLM sees the log tail and retries, up to N times.
Dual observability — OpenTelemetry traces to Tempo/Grafana for infrastructure metrics, Langfuse for LLM-specific traces (prompts, completions, token counts per call).
MCP-integrated semantic memory — every past output is indexed in MemPalace; new jobs pull relevant context with one YAML flag (use_memory: smart).
Streaming UI — FastAPI + HTMX dashboard with live SSE output tail, job cancellation, template picker, and vault search.
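
The file-based queue in the first bullet can be sketched with nothing but directory renames. This is a minimal illustration, not the code in runbook.py: the helper names `claim_next_job` and `complete_job` are hypothetical, jobs are taken in file-name order for simplicity, and the sketch assumes the three folders sit on one filesystem so that a rename is atomic.

```python
from pathlib import Path
from typing import Optional


def claim_next_job(vault: Path) -> Optional[Path]:
    """Move the first .md file (by name) from _queue/ to _active/ and return its new path."""
    queue, active = vault / "_queue", vault / "_active"
    active.mkdir(parents=True, exist_ok=True)
    for job in sorted(queue.glob("*.md")):
        try:
            # A rename within one filesystem is atomic, so only one worker can win.
            return job.rename(active / job.name)
        except FileNotFoundError:
            continue  # another worker claimed this file first
    return None


def complete_job(job: Path, vault: Path) -> Path:
    """Move a finished job from _active/ to _completed/."""
    done = vault / "_completed"
    done.mkdir(parents=True, exist_ok=True)
    return job.rename(done / job.name)
```

Because the rename is the claim, no lock file or broker is needed: a job's state is simply the folder it currently lives in.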

Architecture

flowchart LR
    subgraph Laptop["Laptop · Obsidian"]
        Obsidian[Vault]
    end

    subgraph Sync["Syncthing"]
        SyncFolder[(Shared vault)]
    end

    subgraph VPS["VPS · Tailscale"]
        Queue["_queue → _active → _completed"]
        Runner["runbook.py\npoller + worker pool"]
        WebUI["FastAPI + HTMX UI"]
        Ollama1["Ollama · Qwen 2.5 14B\nQwen Coder 7B"]
        MemPalace["MemPalace MCP"]
        Research["research_runner.py\nLangGraph pipeline"]
    end

    subgraph Home["Home server · GPU"]
        Ollama2["Ollama · Gemma 3 26B"]
    end

    subgraph Cloud["Cloud APIs"]
        Groq["Groq\nLlama 3.3 70B"]
        Gemini["Google Gemini\n2.5 Flash"]
        Anthropic["Anthropic\nClaude Sonnet"]
        HF["HuggingFace\nInference Router"]
    end

    subgraph Obs["Observability"]
        OTel[OTel Collector]
        Tempo["Tempo + Grafana\ninfrastructure traces"]
        Langfuse["Langfuse\nLLM traces"]
        Discord[Discord]
    end

    subgraph Search["Search"]
        Tavily[Tavily API]
        SearXNG["SearXNG\nself-hosted fallback"]
    end

    Obsidian -->|write .md job| SyncFolder
    SyncFolder <-->|bi-dir sync| Queue
    Queue --> Runner
    Runner --> Ollama1
    Runner -.->|routed by model name| Ollama2
    Runner -.->|model_providers config| Groq
    Runner -.->|model_providers config| Gemini
    Runner -.->|model_providers config| Anthropic
    Runner -.->|model_providers config| HF
    Runner <--> MemPalace
    Runner --> Research
    Research --> Tavily
    Research --> SearXNG
    Runner --> Tavily
    Runner --> OTel --> Tempo
    Runner --> Langfuse
    Runner --> Discord
    WebUI <--> Queue
    WebUI <--> MemPalace
    Runner -->|SSE stream| WebUI

See docs/architecture.md for the deep dive.

Screenshots

Web UI (`control.davidcockson.com`) · Grafana / Tempo trace · Monitoring dashboard · Discord failure alerts

How a job flows

Author — write a Markdown file with YAML frontmatter in _queue/ (by hand in Obsidian or via the web UI):

---
type: text
model: llama-3.3-70b-versatile
use_memory: smart
---
Summarise what my vault says about distributed consensus.

Sync — Syncthing replicates the file from laptop to VPS within seconds.
Pick up — the poller moves the file to _active/ and spawns a worker thread.
Route — the model: field is looked up in model_providers; if it maps to a cloud provider (Groq, Gemini, Anthropic, HuggingFace), that API is called. Otherwise, model_runners maps it to an Ollama instance (local VPS or home server GPU).
Enrich — if use_memory: smart, MemPalace generates search queries from the task and injects the top-N relevant past outputs into the prompt.
Execute — the LLM call runs; for local Ollama jobs the UI tails tokens live over SSE.
Trace — the job span lands in Tempo (latency, token counts, model); the LLM call lands in Langfuse (prompt, completion, cost estimate).
Land — output is written to runner-outputs/<job-id>-output.md, the job moves to _completed/, the output is indexed back into MemPalace, and Discord gets a completion ping.
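
The Author and Route steps above amount to reading the frontmatter and doing two table lookups. The sketch below is illustrative only: it assumes flat `key: value` frontmatter and treats `model_providers` and `model_runners` as plain mappings from model name to provider name and to Ollama base URL. The real config schema is not shown in this README, so the dictionary shapes, the example entries, and the helpers `parse_job` and `resolve_route` are assumptions.

```python
from typing import Dict, Tuple

# Hypothetical table shapes; the real ones are defined in config.yaml.
MODEL_PROVIDERS: Dict[str, str] = {"llama-3.3-70b-versatile": "groq"}
MODEL_RUNNERS: Dict[str, str] = {"qwen2.5-coder:7b": "http://localhost:11434"}


def parse_job(text: str) -> Tuple[Dict[str, str], str]:
    """Split a job file into flat frontmatter fields and the prompt body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text.strip()
    meta: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---'")


def resolve_route(model: str,
                  providers: Dict[str, str] = MODEL_PROVIDERS,
                  runners: Dict[str, str] = MODEL_RUNNERS) -> Tuple[str, str]:
    """Return ("cloud", provider) or ("ollama", base_url) for a model name."""
    if model in providers:
        return ("cloud", providers[model])
    if model in runners:
        return ("ollama", runners[model])
    raise KeyError(f"no route configured for model {model!r}")
```

Checking the cloud table first mirrors the order described in the Route step; a model found in neither table fails loudly rather than falling through to a default.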

Job types

Type	What it does
`text`	Single prompt → single completion. Streams to the UI live.
`vision`	Prompt + image → completion (multimodal Ollama models).
`staged`	Multi-step checklist; each step accumulates context from the last. One job file.
`chain`	Pre-defined pipeline of steps, each step a separate job file (full trace per step). Steps can mix models and actions.
`chain_planner`	You give a goal; the LLM generates the step list, then executes it.
`research`	LangGraph pipeline: parallel Tavily searches → entity/relation extraction into KuzuDB → synthesised report.

Full reference: docs/job-types.md.
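
As a rough illustration of how a `staged` job accumulates context, the sketch below feeds every earlier output into the next step's prompt. The function `run_staged`, the prompt layout, and the injected `call_llm` callable are all assumptions made for the example; the actual prompt format is documented in docs/job-types.md, not here.

```python
from typing import Callable, List


def run_staged(steps: List[str], call_llm: Callable[[str], str]) -> List[str]:
    """Run checklist steps in order, prepending earlier outputs to later prompts."""
    outputs: List[str] = []
    for number, step in enumerate(steps, start=1):
        instruction = f"Step {number}: {step}"
        context = "\n\n".join(outputs)
        # The first step has no context; later steps see everything produced so far.
        prompt = f"{context}\n\n{instruction}" if context else instruction
        outputs.append(call_llm(prompt))
    return outputs
```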

Chain actions

Chain steps can call actions instead of (or in addition to) an LLM. This is what makes the GitLab code-gen loop and research pipeline possible:

Action	What it does
`search`	Tavily web search; results are injected as context for the next step. Falls back to SearXNG if the Tavily key is absent.
`fetch`	Fetches a URL and extracts the main content (via `trafilatura`); result injected as context.
`gitlab_push`	Creates or updates a file in a GitLab repo using the previous step's output, then opens an MR.
`gitlab_ci_poll`	Waits for the CI pipeline to go green. On failure: sends the log tail to the LLM for a fix, re-pushes, and retries up to `ci_max_retries` times.
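
The retry behaviour described for `gitlab_ci_poll` can be sketched as a bounded loop. This is not the project's implementation: the GitLab and LLM calls are passed in as plain callables (`push`, `run_ci`, `ask_llm_for_fix`, all hypothetical names) so the control flow is visible without guessing at any real API, and the `tail_lines` parameter is an assumption.

```python
from typing import Callable, Tuple


def ci_fix_loop(code: str,
                push: Callable[[str], None],
                run_ci: Callable[[], Tuple[bool, str]],
                ask_llm_for_fix: Callable[[str, str], str],
                ci_max_retries: int = 3,
                tail_lines: int = 50) -> bool:
    """Push code, wait for CI, and on failure ask the LLM for a fix and retry."""
    push(code)
    for attempt in range(ci_max_retries + 1):  # one initial run plus the retries
        passed, log = run_ci()
        if passed:
            return True
        if attempt == ci_max_retries:
            break  # retries exhausted; report failure
        log_tail = "\n".join(log.splitlines()[-tail_lines:])
        code = ask_llm_for_fix(code, log_tail)
        push(code)
    return False
```

Sending only the log tail keeps the fix prompt small, since the failing assertion or traceback is usually at the end of the CI output.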

Example — LLM code-gen loop:

---
type: chain
model: qwen2.5-coder:7b
chain:
  - prompt: "Write a Python script that parses nginx access logs and reports the top 10 IPs by request count."
    model: qwen2.5-coder:7b
  - action: gitlab_push
  - action: gitlab_ci_poll
---
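
A chain like the one above can be pictured as a loop that hands each step's output to the next. The sketch below is a simplification under stated assumptions: `run_chain`, the `actions` registry, and the `call_llm(model, prompt)` signature are hypothetical, each step is treated as either a prompt or an action, and the per-step job files and traces the real runner produces are left out.

```python
from typing import Callable, Dict, List

Step = Dict[str, str]


def run_chain(steps: List[Step],
              default_model: str,
              call_llm: Callable[[str, str], str],
              actions: Dict[str, Callable[[str], str]]) -> str:
    """Execute steps in order, passing each output on to the next step."""
    previous = ""
    for step in steps:
        if "action" in step:
            # Actions receive the previous step's output (for example, code to push).
            previous = actions[step["action"]](previous)
        else:
            model = step.get("model", default_model)  # a step may override the job model
            prompt = f"{previous}\n\n{step['prompt']}" if previous else step["prompt"]
            previous = call_llm(model, prompt)
    return previous
```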

Tech stack

Component	Choice	Why
Queue	Filesystem (`_queue → _active → _completed`)	Obsidian is already the UI; Syncthing handles replication; no broker to operate.
Worker	`runbook.py` — threaded poller	Simple, debuggable, restartable. Cancellation registry lets the UI kill in-flight jobs.
API / UI	FastAPI + HTMX + SSE	Server-rendered HTML with live token streaming, no SPA build step.
Local LLMs	Ollama — Qwen 2.5 14B / Coder 7B, Gemma 3 26B	No per-token costs; data stays on hardware I control.
Cloud LLMs	Groq (Llama 3.3 70B), Gemini 2.5 Flash, Anthropic Claude, HuggingFace	Routed by model name via a single config table; no code changes to switch.
Search	Tavily API (primary), SearXNG self-hosted (fallback)	Tavily reaches the open web from any server; SearXNG for airgapped/private search.
Research pipeline	LangGraph + KuzuDB	Multi-hop agentic search with entity/relation graph persistence between runs.
Memory	MemPalace (bundled in ./mempalace) over MCP	Vector search over every job output + book corpus; queryable from Claude Code via MCP too.
LLM observability	Langfuse (self-hosted)	Per-call traces: prompt, completion, token counts, model, cost estimate.
Infra observability	OpenTelemetry → Tempo + Grafana	Job lifecycle spans, LLM call latency, success rate dashboards.
Transport	Tailscale + Cloudflare Tunnel	Zero-trust mesh between machines; public UI without opening firewall ports.
CI/CD	GitLab CI (live deployment) + GitHub Actions (this repo)	87 pytest tests gate every merge; Bandit + pip-audit on every pipeline.
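
The cancellation registry mentioned in the Worker row could look roughly like the sketch below: one `threading.Event` per in-flight job, set by the UI thread and checked by the worker between tokens or steps. The class and method names are hypothetical and the real registry in runbook.py may differ.

```python
import threading
from typing import Dict


class CancellationRegistry:
    """Tracks one cancel flag per in-flight job so another thread can stop it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._flags: Dict[str, threading.Event] = {}

    def register(self, job_id: str) -> threading.Event:
        """Called by the worker when a job starts; returns the flag to poll."""
        with self._lock:
            return self._flags.setdefault(job_id, threading.Event())

    def cancel(self, job_id: str) -> bool:
        """Called by the UI; returns False if the job is not in flight."""
        with self._lock:
            flag = self._flags.get(job_id)
        if flag is None:
            return False
        flag.set()
        return True

    def unregister(self, job_id: str) -> None:
        """Called by the worker when the job finishes or is cancelled."""
        with self._lock:
            self._flags.pop(job_id, None)
```

A worker would call `register` once, check `flag.is_set()` inside its streaming loop, and call `unregister` in a `finally` block.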

Run it yourself

git clone https://github.com/davidcockson-compliance/vault-runner.git
cd vault-runner/runner
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp config.example.yaml config.yaml   # edit paths, models, providers
cp .env.example .env                 # add API keys for any cloud providers you want
python runbook.py &
uvicorn web:app --host 0.0.0.0 --port 8000

Minimum viable setup needs only Ollama and a vault folder — cloud providers, Langfuse, GitLab integration, and Tavily are all optional and gated by config flags.
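
With the runner up, a first job can be submitted by writing a file into `_queue/`. The helper below is a small sketch rather than part of the repo: `submit_job` is a hypothetical name, and writing to a temporary file before renaming is an assumption about how to keep the poller from seeing a half-written job.

```python
from pathlib import Path
from typing import Dict


def submit_job(vault: Path, name: str, prompt: str, fields: Dict[str, str]) -> Path:
    """Write a job file with flat YAML frontmatter into the vault's _queue/ folder."""
    queue = vault / "_queue"
    queue.mkdir(parents=True, exist_ok=True)
    frontmatter = "\n".join(f"{key}: {value}" for key, value in fields.items())
    final = queue / f"{name}.md"
    tmp = queue / f".{name}.md.tmp"  # does not match *.md, so a poller would skip it
    tmp.write_text(f"---\n{frontmatter}\n---\n{prompt}\n", encoding="utf-8")
    tmp.replace(final)  # publish the finished file in one step
    return final
```

For example, `submit_job(Path("vault-example"), "hello", "Say hi.", {"type": "text", "model": "qwen2.5-coder:7b"})` would queue a single-prompt job for a local Ollama model, assuming that model is configured.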

Full setup (Syncthing, systemd units, MemPalace, OTel, Langfuse) is in docs/deployment.md.

Project status

Live in production, handling jobs daily across a VPS (Qwen 2.5 14B) and a home-lab GPU server (Gemma 3 26B), with cloud API overflow to Groq and Gemini.

✅ Phase 1 — file-based queue, Ollama, OTel traces, Discord alerts
✅ Phase 2 — multi-step staged and chain jobs
✅ Phase 3 — multi-machine Ollama routing with health-check failover
✅ Phase 4 — MemPalace semantic memory (pre-job injection + post-job indexing)
✅ Phase 5 — streaming web UI (FastAPI + HTMX + SSE), job cancellation, template picker
✅ Phase 6 — multi-cloud provider routing (Groq, Gemini, Anthropic, HuggingFace)
✅ Phase 7 — GitLab CI/CD code-gen loop (generate → push → CI → LLM fix on failure → retry)
✅ Phase 8 — Langfuse LLM tracing; Tavily search replacing SearXNG as primary
✅ Phase 9 — LangGraph research pipeline with KuzuDB knowledge graph
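
The health-check failover from Phase 3 can be illustrated as "try runners in preference order and take the first that answers". The sketch below assumes the check itself is supplied by the caller (in practice it might be an HTTP request to each Ollama server); `pick_healthy_runner` and `is_healthy` are hypothetical names, not the project's API.

```python
from typing import Callable, Iterable, Optional


def pick_healthy_runner(runners: Iterable[str],
                        is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first runner URL whose health check passes, else None."""
    for url in runners:
        try:
            if is_healthy(url):
                return url
        except OSError:
            continue  # connection errors and timeouts count as unhealthy
    return None
```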

Roadmap: tool-calling / skills framework, job scheduling (run_at: frontmatter), parallel chain steps, resume-from-checkpoint on crash.

Repo layout

vault-runner/
├── runner/             core poller (runbook.py), web UI, research pipeline, tests (87 pytest)
├── mempalace/          MemPalace MCP server consumed by vault-runner and Claude Code
├── vault-example/      minimal vault so you can try it immediately
└── docs/               architecture, deployment, job-types, integrations

Built by Dave

Production instance: control.davidcockson.com (private — see screenshots)
MIT licensed — see LICENSE
