This document covers the why and architecture behind the repo. For setup, deployment commands, and troubleshooting, see runbook.md.
- Why this exists
- Architecture
- How it's built — three pillars
- Deploy ordering: foundation → consumers
- What you can learn from this repo
Databricks shipped a lot of new generative-AI surface area in 2025–2026: Document Intelligence (ai_parse_document, ai_classify, ai_extract), Agent Bricks, AI Gateway, Lakebase, and Databricks Apps. The two source articles for this reference are Databricks' Document Intelligence launch article ("Why Your Agents Can't Read Enterprise Documents") and the Agent Bricks platform article. The reference exists to demonstrate those patterns end to end: parse messy enterprise PDFs into a governed document data layer, then build a governed agent on that enriched layer through Agent Bricks.
This repo is that worked example. Drop a PDF into a governed UC volume; ten minutes later, an analyst can ask cited questions in plain English with end-to-end audit. Document Intelligence prepares the governed source of truth; Knowledge Assistant handles cited document Q&A; Supervisor Agent coordinates document Q&A with structured KPI tools; AI Gateway, Unity Catalog, OBO, Lakebase, and CLEARS provide the governance and operating layer.
It also demonstrates a development workflow: Spec-Kit for spec-driven design, Claude Code with Databricks skill bundles for AI-assisted implementation, and six non-negotiable constitution principles that gate every plan. See How it's built.
╔═══════════════════════════════════════════════════════════════════╗
║ pipelines/sql/ (one SQL file per tier) ║
╚═══════════════════════════════════════════════════════════════════╝
raw_filings/ ┌─────────────────┐ ┌─────────────────┐ ┌──────────────────┐
ACME_10K.pdf ──▶ │ bronze_filings │──▶│ silver_parsed_ │──▶│ gold_filing_ │
BETA_10K.pdf │ (raw bytes, │ │ filings (parsed │ │ sections (one │
GAMMA_10K.pdf │ filename, │ │ VARIANT — │ │ row per parsed │
│ ingested_at) │ │ ai_parse_ │ │ $.sections[*]; │
│ │ │ document) │ │ uses full_ │
│ >50MB rejects: │ │ │ │ document when │
│ bronze_filings │ │ Status: ok / │ │ sections absent)│
│ _rejected │ │ partial / error │ │ │
└─────────────────┘ └─────────────────┘ │ gold_filing_kpis │
01_bronze.sql 02_silver_parse │ (typed columns: │
.sql │ segment_revenue │
│ ARRAY<STRUCT…>, │
│ top_risks │
│ ARRAY<STRING>) │
└──────────────────┘
03_gold_classify
_extract.sql
│
▼
┌──────────────────┐
│ gold_filing_ │
│ quality │
│ (5-dim rubric: │
│ parse, layout, │
│ ocr, sections, │
│ kpi → 0-30) │
└──────────────────┘
04_gold_quality.sql
Key idea, "parse once, extract many": PDFs are expensive to parse, so Silver runs ai_parse_document exactly once per file and stores the structured result as a VARIANT. Everything downstream (classification, KPI extraction, summarization, quality scoring) reads the parsed output, never the raw bytes. This is constitution principle II, and it is non-negotiable.
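A toy, pure-Python model of that contract follows. It assumes a parsed document is a dict with an optional `sections` list and a `full_document` string, loosely mirroring the Gold fallback in the diagram above; it does not reproduce the real `ai_parse_document` VARIANT schema, and `Silver`, `fake_parse`, `gold_sections`, and `gold_section_counts` are hypothetical names. What it checks is the property stated above: two downstream consumers run, yet the parser is called only once per file.

```python
from typing import Any, Callable

Parsed = dict[str, Any]


class Silver:
    """Toy Silver tier: every arriving file is parsed once and the result is stored."""

    def __init__(self, parse: Callable[[bytes], Parsed]) -> None:
        self._parse = parse
        self.parsed: dict[str, Parsed] = {}

    def ingest(self, filename: str, raw: bytes) -> None:
        # Keyed on filename: a re-delivered file replaces its earlier parse.
        self.parsed[filename] = self._parse(raw)


def gold_sections(parsed: dict[str, Parsed]) -> list[dict[str, Any]]:
    """One row per parsed section; a single full_document row when sections are absent."""
    rows: list[dict[str, Any]] = []
    for filename, doc in parsed.items():
        sections = doc.get("sections") or [{"text": doc["full_document"]}]
        for idx, section in enumerate(sections):
            rows.append({"filename": filename, "section_idx": idx, "text": section["text"]})
    return rows


def gold_section_counts(parsed: dict[str, Parsed]) -> dict[str, int]:
    """A second downstream consumer; it also reads parsed output, never raw bytes."""
    return {filename: len(doc.get("sections") or [doc]) for filename, doc in parsed.items()}


# Hypothetical stand-in parser (NOT ai_parse_document): blank lines split sections.
calls: list[bytes] = []


def fake_parse(raw: bytes) -> Parsed:
    calls.append(raw)
    text = raw.decode("utf-8")
    chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
    doc: Parsed = {"full_document": text}
    if len(chunks) > 1:
        doc["sections"] = [{"text": c} for c in chunks]
    return doc


silver = Silver(fake_parse)
silver.ingest("ACME_10K.pdf", b"Item 1. Business\n\nItem 1A. Risk Factors")
silver.ingest("BETA_10K.pdf", b"One unsectioned block")
rows = gold_sections(silver.parsed)          # two ACME sections plus one BETA fallback row
counts = gold_section_counts(silver.parsed)  # the parser has still run only twice
```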
Triggering: prod runs the pipeline in continuous: true mode so Auto Loader (read_files) reacts to new PDFs in the volume automatically. Demo overrides to continuous: false to avoid a 24/7 cluster during smoke iterations. See resources/foundation/doc_intel.pipeline.yml and the demo override block in databricks.yml.
gold_filing_sections ┌─────────────────────────┐
(governed Delta table) ─────▶ │ Mosaic AI Vector │
│ Search Index │
Filter: embed_eligible=true │ (Delta-Sync — auto- │
Embed column: "summary" │ refreshes when Gold │
│ updates) │
└─────────────────────────┘
Why "summary" not the raw text?
─────────────────────────────
Embedding a 50-page 10-K verbatim is noisy. We embed an LLM-written
summary instead — tighter, more searchable. Constitution principle IV:
"Quality before retrieval."
Ownership note: DAB manages the Vector Search endpoint (resources/foundation/filings_index.yml) and the index-refresh job (resources/consumers/index_refresh.job.yml). The index itself isn't yet a DAB-managed resource type as of CLI 0.298 — jobs/index_refresh/sync_index.py creates the Delta-Sync index on first run and triggers a sync on subsequent runs. The endpoint lives in foundation so first-deploy bootstrap can materialize the index before agent/document_intelligence_agent.py attaches it to Knowledge Assistant.
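The create-on-first-run, sync-afterwards behavior of `sync_index.py` amounts to a create-or-sync step. This sketch is written against a hypothetical `IndexClient` protocol rather than the databricks-vectorsearch SDK, so every method name here is illustrative; in a real job each method would wrap the corresponding SDK call. The endpoint, catalog, schema, index name, and primary key in the usage are placeholders.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class IndexSpec:
    endpoint: str
    index_name: str
    source_table: str
    primary_key: str
    text_column: str = "summary"  # embed the summary, not raw text (principle IV)


class IndexClient(Protocol):
    # Hypothetical thin wrapper; these method names are illustrative only.
    def index_exists(self, endpoint: str, index_name: str) -> bool: ...
    def create_index(self, spec: IndexSpec) -> None: ...
    def sync(self, endpoint: str, index_name: str) -> None: ...


def create_or_sync(client: IndexClient, spec: IndexSpec) -> str:
    """First run creates the Delta-Sync index; later runs only trigger a sync."""
    if not client.index_exists(spec.endpoint, spec.index_name):
        client.create_index(spec)
        return "created"
    client.sync(spec.endpoint, spec.index_name)
    return "synced"


class FakeClient:
    """In-memory stand-in used only to exercise create_or_sync."""

    def __init__(self) -> None:
        self.indexes: set[tuple[str, str]] = set()
        self.sync_calls = 0

    def index_exists(self, endpoint: str, index_name: str) -> bool:
        return (endpoint, index_name) in self.indexes

    def create_index(self, spec: IndexSpec) -> None:
        self.indexes.add((spec.endpoint, spec.index_name))

    def sync(self, endpoint: str, index_name: str) -> None:
        self.sync_calls += 1


# Placeholder names throughout.
spec = IndexSpec(
    endpoint="doc-intel-vs",
    index_name="main.doc_intel.filings_idx",
    source_table="main.doc_intel.gold_filing_sections_indexable",
    primary_key="section_id",
)
fake = FakeClient()
first = create_or_sync(fake, spec)
second = create_or_sync(fake, spec)
```

Keeping the existence check inside one function makes the job safe to rerun: the second invocation only triggers a sync.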
User question
│
▼
┌─────────────────────────────────────────────┐
│ Agent Bricks Supervisor Agent │
│ - owns routing and orchestration │
│ - runs under UC / AI Gateway governance │
└────────────┬─────────────────────┬──────────┘
│ │
▼ ▼
┌────────────────────────┐ ┌────────────────────────┐
│ Knowledge Assistant │ │ Structured KPI tool │
│ - cited document Q&A │ │ - reads Gold KPI table │
│ - grounded in parsed │ │ - deterministic tables │
│ document layer / VS │ │ for comparisons │
└────────────────────────┘ └────────────────────────┘
│ │
└──────────┬──────────┘
▼
┌──────────────────────┐
│ Agent output → App │
│ final answer, │
│ citations, feedback, │
│ latency, audit │
└──────────────────────┘
Databricks creation path: Create an AI agent → Knowledge Assistant for document Q&A. Supervisor Agent coordinates the Knowledge Assistant and UC function tools.
Repository code is limited to deterministic tool glue, app UI, evals, and deployment scripts.
Concrete Agent Bricks wiring:
- `agent/document_intelligence_agent.py` creates or updates `doc-intel-knowledge-${target}` as the Knowledge Assistant. Its source is the Vector Search index over `gold_filing_sections_indexable`, with `summary` as the text column and `filename` as the document URI column.
- `agent/document_intelligence_agent.py` creates or updates the UC SQL function `<catalog>.<schema>.lookup_10k_kpis`.
- `doc-intel-supervisor-${target}` is the Supervisor Agent. Its tools are the Knowledge Assistant and the UC SQL KPI function. Supervisor Agent owns tool routing.
- Agent Bricks generates concrete serving endpoint names for Knowledge Assistant and Supervisor Agent. The repo resolves the live Supervisor endpoint with `scripts/resolve-agent-endpoint.sh` and passes it into DAB as `agent_endpoint_name`.
- Serving endpoint permissions are granted by endpoint ID after the generated endpoint is ready. The Databricks App does not bind to the endpoint as a resource; it invokes the resolved endpoint directly. Prod uses each user's OBO token. Demo uses the App service principal when `DOCINTEL_OBO_REQUIRED=false`.
- Agent Bricks responses use an OpenAI Responses-style `output` message sequence in current validation. The app displays the last output text group as the answer. Knowledge Assistant citations have been observed as markdown footnotes in intermediate messages, so `app/agent_bricks_response.py` normalizes those footnotes into citation chips.
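A minimal sketch of that normalization, assuming the Responses-style shape `{"output": [{"type": "message", "content": [{"type": "output_text", "text": ...}]}]}` and footnotes written as `[^label]` references with `[^label]: source` definitions. The function names and return shape are hypothetical, and this is not the code in `app/agent_bricks_response.py`; check the payload against a live endpoint before relying on it.

```python
import re
from typing import Any

# Markdown footnote definition line, e.g. "[^1]: ACME_10K.pdf"
FOOTNOTE_DEF = re.compile(r"^\[\^([^\]]+)\]:\s*(.+)$", re.MULTILINE)
# Inline footnote reference, e.g. "...three segments.[^1]" (not followed by ":")
FOOTNOTE_REF = re.compile(r"\[\^([^\]]+)\](?!:)")


def message_texts(response: dict[str, Any]) -> list[str]:
    """Join the output_text parts of each message item, preserving order."""
    texts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        parts = [c.get("text", "") for c in item.get("content", []) if c.get("type") == "output_text"]
        if parts:
            texts.append("".join(parts))
    return texts


def normalize(response: dict[str, Any]) -> dict[str, Any]:
    """Answer = last message text; citations = footnote definitions from any message."""
    texts = message_texts(response)
    if not texts:
        return {"answer": "", "citations": []}
    citations: dict[str, str] = {}
    for text in texts:
        for label, source in FOOTNOTE_DEF.findall(text):
            citations.setdefault(label, source.strip())  # first definition wins
    answer = FOOTNOTE_DEF.sub("", texts[-1])
    answer = FOOTNOTE_REF.sub("", answer).strip()
    return {
        "answer": answer,
        "citations": [{"label": k, "source": v} for k, v in citations.items()],
    }


# Illustrative payload: the citation is defined in an intermediate message.
response = {
    "output": [
        {"type": "message", "content": [{"type": "output_text",
            "text": "Segment revenue is in Item 7.[^1]\n\n[^1]: ACME_10K.pdf"}]},
        {"type": "message", "content": [{"type": "output_text",
            "text": "ACME reports three segments.[^1]"}]},
    ]
}
result = normalize(response)
```

Stripping the markers from the answer and rendering citations as separate chips is one design choice; keeping the `[^n]` markers inline and linking them to chips would also work.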
┌──────────────────────────────────────────────────────────────────┐
│ │
│ Databricks App (Streamlit) ← user interacts here │
│ app/app.py │
│ │
│ ┌────────────────┐ ┌──────────────────┐ │
│ │ Chat input box │ │ Citation chips │ │
│ │ Thumbs up/down │ │ Markdown tables │ │
│ └────────┬───────┘ └─────┬────────────┘ │
│ │ │ │
└──────────────│─────────────────│─────────────────────────────────┘
│ │
│ query │ feedback writes
▼ ▼
┌────────────────────────┐ ┌────────────────────────┐
│ Agent Bricks endpoint │ │ Lakebase Postgres │
│ Knowledge Assistant + │ │ ───────────────── │
│ Supervisor Agent │ │ conversation_history │
│ │ │ query_logs │
│ + AI Gateway: │ │ feedback │
│ OBO, permissions, │ │ │
│ audit, rate limits, │ │ (Postgres for tiny │
│ guardrails │ │ per-turn writes — │
│ │ │ Delta isn't great │
│ │ │ at row-by-row) │
└────────────────────────┘ └────────────────────────┘
Target auth modes:
─────────────────
Prod reads `x-forwarded-access-token` from the request and invokes the
Agent Bricks endpoint with the user's identity. AI Gateway and Unity
Catalog enforce identity, permissions, audit, and routing across the
agent, model, tools, and data. User token passthrough is a hard
prerequisite for production.
Demo can set `DOCINTEL_OBO_REQUIRED=false`; the App service principal then
invokes the generated Supervisor endpoint and receives `CAN_QUERY` after
deploy. This is for development workspaces without Apps user-token
passthrough, not for production.
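The two modes reduce to one decision per request. A sketch, assuming request headers arrive as a plain mapping (recent Streamlit versions expose them as `st.context.headers`) and that `DOCINTEL_OBO_REQUIRED` counts as required unless explicitly set to `false`; `select_auth`, `obo_required`, and `OboTokenMissing` are hypothetical names, and the endpoint call itself is omitted.

```python
from typing import Mapping, Optional, Tuple

OBO_HEADER = "x-forwarded-access-token"


class OboTokenMissing(RuntimeError):
    """Production mode requires a forwarded user token, but none arrived."""


def obo_required(env: Mapping[str, str]) -> bool:
    # Assumption: OBO stays mandatory unless the flag is explicitly "false".
    return env.get("DOCINTEL_OBO_REQUIRED", "true").strip().lower() != "false"


def select_auth(headers: Mapping[str, str], env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return ("obo", user_token) in prod, or ("service_principal", None) in demo."""
    if not obo_required(env):
        # Demo only: the App service principal invokes the Supervisor endpoint
        # (it is granted CAN_QUERY after deploy).
        return ("service_principal", None)
    # HTTP header names are case-insensitive, so compare lowercased names.
    token = next((v for k, v in headers.items() if k.lower() == OBO_HEADER and v), None)
    if token is None:
        # Never fall back to the app identity when OBO is required.
        raise OboTokenMissing(f"missing {OBO_HEADER}; refusing to use the app identity")
    return ("obo", token)
```

The property worth preserving is that prod fails loudly instead of silently switching to the service principal.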
Why Postgres for state? Delta tables are great for analytics but bad at "insert one tiny row per chat turn at high frequency." Lakebase is Databricks's managed Postgres — same governance, right tool for the job.
This repo combines three things: Spec-Kit for spec-driven design, Databricks Asset Bundles + Claude Code skill bundles for declarative platform work, and Claude Code as the implementation surface.
Spec-Kit is a workflow that forces you to write — and clarify — a specification before writing code. Each phase is a slash-command in Claude Code that produces a checked-in artifact:
/speckit-specify → specs/<NNN>/spec.md What & why (no how)
│
▼
/speckit-clarify → appended Q&A in spec.md Resolve ambiguity
│
▼
/speckit-plan → specs/<NNN>/plan.md Tech stack + structure
│ + research.md, data-model.md,
│ contracts/, quickstart.md
▼
/speckit-tasks → specs/<NNN>/tasks.md Dependency-ordered tasks
│
▼
/speckit-analyze → cross-artifact consistency check
│
▼
/speckit-implement → the actual code
.specify/extensions.yml auto-commits at each phase boundary so the trail is clean. .specify/memory/constitution.md defines six non-negotiable principles every plan must respect:
| # | Principle | What it means |
|---|---|---|
| I | Unity Catalog source of truth | Every table, volume, model, index, endpoint lives under <catalog>.<schema> — no DBFS, no workspace-local resources |
| II | Parse once, extract many | ai_parse_document runs once at Silver → VARIANT; everything downstream reads the parsed output |
| III | Declarative over imperative | SDP SQL pipelines, Lakeflow Jobs, DAB resources — no production notebooks |
| IV | Quality before retrieval | 5-dim rubric scores every section; only ≥22/30 reach the index. Embed summary, not raw text |
| V | Eval-gated Agent Bricks | CLEARS scores must clear thresholds before any deploy is considered complete |
| VI | Reproducible deploys | databricks bundle deploy -t <env> recreates the entire stack; demo and prod parity enforced |
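To make principle IV's gate concrete, here is the arithmetic as a small Python sketch: five dimensions (parse, layout, ocr, sections, kpi, per the Gold quality tier), each scored 0–6 and summed to 0–30, with only sections at 22 or above reaching the index and `summary` as the embedded text. The repo computes this in SQL (`04_gold_quality.sql`); the dataclass and function names here are hypothetical, and the scores in the usage rows are made up.

```python
from dataclasses import dataclass

DIMENSIONS = ("parse", "layout", "ocr", "sections", "kpi")
MAX_PER_DIMENSION = 6
EMBED_THRESHOLD = 22  # out of len(DIMENSIONS) * MAX_PER_DIMENSION = 30


@dataclass
class SectionQuality:
    filename: str
    summary: str            # LLM-written summary: the text that gets embedded
    scores: dict[str, int]  # one 0-6 score per rubric dimension

    def total(self) -> int:
        missing = set(DIMENSIONS) - set(self.scores)
        if missing:
            raise ValueError(f"missing dimensions: {sorted(missing)}")
        for dim in DIMENSIONS:
            if not 0 <= self.scores[dim] <= MAX_PER_DIMENSION:
                raise ValueError(f"{dim} score out of range: {self.scores[dim]}")
        return sum(self.scores[dim] for dim in DIMENSIONS)

    @property
    def embed_eligible(self) -> bool:
        return self.total() >= EMBED_THRESHOLD


def index_source(rows: list[SectionQuality]) -> list[tuple[str, str]]:
    """What the index sees: only eligible rows, with summary as the embedded text."""
    return [(row.filename, row.summary) for row in rows if row.embed_eligible]


# Made-up scores for illustration.
rows = [
    SectionQuality("ACME_10K.pdf", "Revenue grew in all three segments.",
                   {"parse": 6, "layout": 5, "ocr": 5, "sections": 4, "kpi": 4}),  # 24: kept
    SectionQuality("BETA_10K.pdf", "Scanned exhibit with noisy OCR.",
                   {"parse": 4, "layout": 3, "ocr": 2, "sections": 4, "kpi": 3}),  # 16: dropped
]
```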
When you read specs/001-doc-intel-10k/plan.md you'll see a "Constitution Check" gate that maps each design decision back to the principle it satisfies. When you read specs/001-doc-intel-10k/tasks.md you'll see how each task derives from the plan, and how user stories (P1, P2, P3) are independently demoable.
Databricks Asset Bundles (DABs) describe most of the workspace state as YAML. One root databricks.yml declares variables and targets (demo, prod); resources/**/*.yml declares each DAB-managed resource (pipeline, jobs, Vector Search endpoint, app, monitor, dashboard, Lakebase instance + catalog). databricks bundle deploy -t demo reconciles workspace state to YAML. The Vector Search index is still created and synced by jobs/index_refresh/sync_index.py until DAB supports index resources directly. Agent Bricks Knowledge Assistant and Supervisor Agent are SDK-managed by agent/document_intelligence_agent.py; DAB only passes the resolved generated Supervisor endpoint into the app through agent_endpoint_name.
This repo was built with Databricks-specific Claude Code skill bundles. Those bundles are distributed by Databricks via the CLI / Claude Code plugin channel and are not vendored in this open-source tree — install them locally if you have access, or reference the canonical Databricks docs (mapping in ../CONTRIBUTING.md).
| Skill bundle | What it provides | Canonical docs |
|---|---|---|
| databricks-core | Auth, profiles, data exploration, bundle basics | docs |
| databricks-dabs | DAB structure, validation, deploy workflow, target separation | docs |
| databricks-pipelines | Lakeflow Spark Declarative Pipelines (ai_parse_document, ai_classify, ai_extract, APPLY CHANGES INTO) | docs |
| databricks-jobs | Lakeflow Jobs with retries, schedules, table-update / file-arrival triggers | docs |
| databricks-apps | Databricks Apps (Streamlit), App resource bindings | docs |
| databricks-lakebase | Lakebase Postgres instances, branches, computes, endpoint provisioning | docs |
| databricks-agent-bricks | Knowledge Assistant, Supervisor Agent, UC tools, endpoint lifecycle | docs |
Skills are loaded by Claude Code on demand. When you ask Claude to "wire up Vector Search," it should read the Databricks pipeline/model-serving guidance before writing YAML, so the output reflects current Databricks API shapes — not stale training data.
Spec-Kit produces the specs. The Databricks skills provide platform expertise. Claude Code orchestrates both: every phase artifact and every code file in this repo was authored by prompting Claude Code with the spec/plan/tasks as context.
The workflow looks like:
- `/speckit-specify` → Claude writes spec.md from a natural-language description; you iterate via `/speckit-clarify` until ambiguity is resolved.
- `/speckit-plan` → Claude consults the constitution + Databricks skills and drafts plan.md with research decisions and architecture.
- `/speckit-tasks` → Claude generates a dependency-ordered task list grouped by user story (P1, P2, P3).
- `/speckit-implement` → Claude writes the actual SQL/Python/YAML, one task at a time, committing per task.
- Operational loops: when the deploy hits unexpected issues (it always does), Claude reads the runbook, fixes the issue, updates the runbook, and commits.
AI-driven here means Claude carries the boring parts (boilerplate YAML, retry-loop scripts, dependency analysis) so you spend time on what the spec should say and what the constitution should require.
DABs reconcile workspace resources from YAML, but a fresh workspace has real data dependencies that cannot all be satisfied in one pass:
┌────────────────────────────────────────────────┐
│ What "bundle deploy" tries to create: │
│ │
│ ▸ Pipeline ────┐ │
│ ▸ Tables ────┼──── all need each other │
│ ▸ Vector idx ───┤ │
│ ▸ Agent Bricks ──┤ Monitor wants the │
│ ▸ App config ───┤ KPI table to exist │
│ ▸ App ───┤ BEFORE it can attach │
│ ▸ Monitor ────┘ │
│ ▸ Lakebase ──── │
└────────────────────────────────────────────────┘
App needs the generated Agent Bricks Supervisor endpoint name.
Supervisor needs Knowledge Assistant + UC function tools.
Knowledge Assistant needs the Vector Search index.
Monitor needs the table populated.
Table needs the pipeline to run.
▶ Single `bundle deploy` cannot create the whole stack from scratch.
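The dependency chain above is a graph, and any valid bring-up order is a topological order of it. A sketch with Python's standard `graphlib`: the node labels are informal, the Lakebase and Vector Search endpoint nodes come from the resources tree that follows, and the edge from the UC KPI function to the Gold tables is inferred from the KPI tool reading the Gold KPI table. It groups nodes into stages whose dependencies are all met, which is the ordering a staged bootstrap has to reproduce.

```python
from graphlib import TopologicalSorter

# node -> the nodes it needs first (informal labels, not DAB resource names)
DEPENDENCIES: dict[str, set[str]] = {
    "vector search endpoint": set(),
    "lakebase instance": set(),
    "pipeline run": set(),
    "gold tables populated": {"pipeline run"},
    "monitor": {"gold tables populated"},
    "vector search index": {"vector search endpoint", "gold tables populated"},
    "knowledge assistant": {"vector search index"},
    "uc kpi function": {"gold tables populated"},
    "supervisor endpoint": {"knowledge assistant", "uc kpi function"},
    "app": {"supervisor endpoint", "lakebase instance"},
}


def bring_up_stages(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into stages; every node in a stage has its dependencies met.

    TopologicalSorter.prepare() raises graphlib.CycleError if the graph has a cycle.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for number, stage in enumerate(bring_up_stages(DEPENDENCIES), start=1):
    print(number, stage)
```

Because the ordering is data, a cycle is caught at `prepare()` instead of surfacing as a half-finished deploy.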
The repo keeps this ordering explicit by splitting resources by dependency:
resources/
├── foundation/ ← no data deps — deploy first
│ ├── catalog.yml (schema + volume + grants)
│ ├── doc_intel.pipeline.yml
│ ├── retention.job.yml
│ ├── filings_index.yml (VS endpoint)
│ └── lakebase_instance.yml
│
└── consumers/ ← need foundation to be RUNNING and producing data
├── kpi_drift.yml (needs gold_filing_kpis table)
├── index_refresh.job.yml (needs source table)
├── analyst.app.yml (needs Lakebase + generated agent endpoint name)
├── usage.dashboard.yml
└── lakebase_catalog.yml (needs instance AVAILABLE)
scripts/bootstrap-demo.sh is the operational entry point for first bring-up and steady-state demo deploys. It stages foundation resources, materializes the data/index/Agent Bricks dependencies, restores consumer resources, restarts the app, grants access, and runs a smoke query.
The design point is not the script itself; it is that resource dependencies are explicit and repeatable. The exact command flow and failure modes are owned by runbook.md § Deploy paths and runbook.md § Known deploy ordering gaps.
- Wiring `ai_parse_document` into Lakeflow SDP: the pattern for streaming tables + `STREAM(...)` views + `APPLY CHANGES INTO` keyed on filename.
- Scoring document quality before retrieval: five 0–6 dimensions in SQL, with a threshold filter on the index source.
- Agent Bricks orchestration: Knowledge Assistant for cited document Q&A, Supervisor Agent for orchestration, deterministic KPI tool glue for structured comparisons.
- Grounding an agent with citations: Document Intelligence output and the governed Vector Search / Knowledge Assistant source provide the citation-bearing context.
- Handling DAB deploy ordering: chicken-and-egg dependencies between heterogeneous resources, solved with a 5-step bootstrap rather than `depends_on` (which DAB doesn't reliably honor across resource types).
- Gating deploys on MLflow eval: `mlflow.evaluate(model_type="databricks-agent")` with documented metric keys, per-axis thresholds, and an exit-code gate in CI.
- End-to-end OBO: Databricks Apps user-token passthrough, Agent Bricks / AI Gateway identity enforcement, UC permissions, and audit verification are production prerequisites.
- Spec-Kit + Claude Code + Databricks skills composing: every artifact in `specs/`, `pipelines/`, and `agent/` was generated through that loop.
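The eval-gate item above comes down to a per-axis comparison plus an exit code. A minimal sketch, assuming the metrics are already a flat dict (for example, the `metrics` attribute of the result `mlflow.evaluate` returns); the metric keys and threshold values below are hypothetical placeholders, not the repo's documented keys or values, and a missing metric is deliberately treated as a failure.

```python
import sys
from typing import Mapping


def failed_axes(metrics: Mapping[str, float], thresholds: Mapping[str, float]) -> list[str]:
    """Return one message per axis that is missing or below its minimum."""
    failures = []
    for key, minimum in sorted(thresholds.items()):
        value = metrics.get(key)
        if value is None:
            failures.append(f"{key}: missing (need >= {minimum})")
        elif value < minimum:
            failures.append(f"{key}: {value:.3f} < {minimum}")
    return failures


def gate(metrics: Mapping[str, float], thresholds: Mapping[str, float]) -> int:
    """Exit code for CI: 0 when every axis clears its threshold, else 1."""
    failures = failed_axes(metrics, thresholds)
    for line in failures:
        print(f"EVAL GATE FAIL {line}", file=sys.stderr)
    return 1 if failures else 0


# Hypothetical keys and thresholds, for illustration only.
THRESHOLDS = {"correctness/mean": 0.8, "groundedness/mean": 0.9, "safety/mean": 1.0}
```

In CI, calling `sys.exit(gate(result.metrics, THRESHOLDS))` after evaluation would fail the job on any miss.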