Design — Databricks Document Intelligence Agent

This document covers the why and architecture behind the repo. For setup, deployment commands, and troubleshooting, see runbook.md.

Why this exists
Architecture
How it's built — three pillars
Deploy ordering: foundation → consumers
What you can learn from this repo

Why this exists

Databricks shipped a lot of new generative-AI surface area in 2025–2026: Document Intelligence (ai_parse_document, ai_classify, ai_extract), Agent Bricks, AI Gateway, Lakebase, and Databricks Apps. The two source articles for this reference are Databricks' Document Intelligence launch article ("Why Your Agents Can't Read Enterprise Documents") and the Agent Bricks platform article. The reference exists to demonstrate those patterns end to end: parse messy enterprise PDFs into a governed document data layer, then build a governed agent on that enriched layer through Agent Bricks.

This repo is that worked example. Drop a PDF into a governed UC volume; ten minutes later, an analyst can ask cited questions in plain English with end-to-end audit. Document Intelligence prepares the governed source of truth; Knowledge Assistant handles cited document Q&A; Supervisor Agent coordinates document Q&A with structured KPI tools; AI Gateway, Unity Catalog, OBO, Lakebase, and CLEARS provide the governance and operating layer.

It also demonstrates a development workflow: Spec-Kit for spec-driven design, Claude Code with Databricks skill bundles for AI-assisted implementation, and six non-negotiable constitution principles that gate every plan. See How it's built.

Architecture

Two halves: an offline pipeline, and an online agent

   ╔═══════════════════════════════════════════════════════════════════╗
   ║                  pipelines/sql/  (one SQL file per tier)          ║
   ╚═══════════════════════════════════════════════════════════════════╝

  raw_filings/       ┌─────────────────┐   ┌─────────────────┐   ┌──────────────────┐
  ACME_10K.pdf  ──▶  │  bronze_filings │──▶│ silver_parsed_  │──▶│ gold_filing_     │
  BETA_10K.pdf       │  (raw bytes,    │   │ filings (parsed │   │ sections (one    │
  GAMMA_10K.pdf      │   filename,     │   │ VARIANT —       │   │  row per parsed  │
                     │   ingested_at)  │   │ ai_parse_       │   │  $.sections[*];  │
                     │                 │   │ document)       │   │  uses full_      │
                     │  >50MB rejects: │   │                 │   │  document when   │
                     │  bronze_filings │   │ Status: ok /    │   │  sections absent)│
                     │  _rejected      │   │ partial / error │   │                  │
                     └─────────────────┘   └─────────────────┘   │ gold_filing_kpis │
                          01_bronze.sql       02_silver_parse    │ (typed columns:  │
                                                .sql             │  segment_revenue │
                                                                 │  ARRAY<STRUCT…>, │
                                                                 │  top_risks       │
                                                                 │  ARRAY<STRING>)  │
                                                                 └──────────────────┘
                                                                  03_gold_classify
                                                                  _extract.sql
                                                                          │
                                                                          ▼
                                                                 ┌──────────────────┐
                                                                 │ gold_filing_     │
                                                                 │ quality          │
                                                                 │ (5-dim rubric:   │
                                                                 │  parse, layout,  │
                                                                 │  ocr, sections,  │
                                                                 │  kpi → 0-30)     │
                                                                 └──────────────────┘
                                                                  04_gold_quality.sql

Key idea — "parse once, extract many": PDFs are expensive to parse. Silver runs ai_parse_document exactly once per file and stores the structured result as a VARIANT. Everything downstream — classification, KPI extraction, summarization, quality scoring — reads the parsed output, never the raw bytes. This is a non-negotiable constitution principle.

Triggering: prod runs the pipeline in continuous: true mode so Auto Loader (read_files) reacts to new PDFs in the volume automatically. Demo overrides to continuous: false to avoid a 24/7 cluster during smoke iterations. See resources/foundation/doc_intel.pipeline.yml and the demo override block in databricks.yml.

Vector Search bridges data and agent

   gold_filing_sections           ┌─────────────────────────┐
   (governed Delta table)  ─────▶ │  Mosaic AI Vector       │
                                  │  Search Index           │
   Filter: embed_eligible=true    │  (Delta-Sync — auto-    │
   Embed column: "summary"        │   refreshes when Gold    │
                                  │   updates)              │
                                  └─────────────────────────┘

   Why "summary" not the raw text?
   ─────────────────────────────
   Embedding a 50-page 10-K verbatim is noisy. We embed an LLM-written
   summary instead — tighter, more searchable. Constitution principle IV:
   "Quality before retrieval."

Ownership note: DAB manages the Vector Search endpoint (resources/foundation/filings_index.yml) and the index-refresh job (resources/consumers/index_refresh.job.yml). The index itself isn't yet a DAB-managed resource type as of CLI 0.298 — jobs/index_refresh/sync_index.py creates the Delta-Sync index on first run and triggers a sync on subsequent runs. The endpoint lives in foundation so first-deploy bootstrap can materialize the index before agent/document_intelligence_agent.py attaches it to Knowledge Assistant.
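
The create-on-first-run, sync-afterwards behavior described above can be sketched as follows. This is a minimal illustration, not the contents of jobs/index_refresh/sync_index.py. The client is passed in so the logic can be read and tested without a workspace. The method names (list_indexes, create_delta_sync_index, get_index, sync) follow the databricks-vectorsearch Python client as we understand it and should be checked against the installed version. The primary key `section_id`, the embedding endpoint name, and `pipeline_type="TRIGGERED"` are assumptions.

```python
def ensure_index_synced(client, endpoint_name, index_name, source_table,
                        primary_key="section_id",                       # assumption
                        embedding_column="summary",
                        embedding_endpoint="databricks-gte-large-en"):  # assumption
    """Create the Delta-Sync index on first run, otherwise trigger a sync.

    `client` is expected to behave like
    databricks.vector_search.client.VectorSearchClient (assumed interface).
    Returns "created" or "synced" so the calling job can log what happened.
    """
    listing = client.list_indexes(endpoint_name) or {}
    existing = {ix.get("name") for ix in listing.get("vector_indexes", [])}

    if index_name not in existing:
        # First run: the index does not exist yet, so create it.
        client.create_delta_sync_index(
            endpoint_name=endpoint_name,
            index_name=index_name,
            source_table_name=source_table,
            primary_key=primary_key,
            pipeline_type="TRIGGERED",
            embedding_source_column=embedding_column,
            embedding_model_endpoint_name=embedding_endpoint,
        )
        return "created"

    # Subsequent runs: ask the existing index to pick up new Gold rows.
    client.get_index(endpoint_name=endpoint_name, index_name=index_name).sync()
    return "synced"
```

Keeping the branch in one small function makes the job idempotent: running it twice in a row creates the index once and syncs it once.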

Agent Bricks target runtime

   User question
        │
        ▼
   ┌─────────────────────────────────────────────┐
   │ Agent Bricks Supervisor Agent               │
   │ - owns routing and orchestration            │
   │ - runs under UC / AI Gateway governance     │
   └────────────┬─────────────────────┬──────────┘
                │                     │
                ▼                     ▼
   ┌────────────────────────┐  ┌────────────────────────┐
   │ Knowledge Assistant    │  │ Structured KPI tool    │
   │ - cited document Q&A   │  │ - reads Gold KPI table │
   │ - grounded in parsed   │  │ - deterministic tables │
   │   document layer / VS  │  │   for comparisons      │
   └────────────────────────┘  └────────────────────────┘
                │                     │
                └──────────┬──────────┘
                           ▼
               ┌──────────────────────┐
               │ Agent output → App   │
               │ final answer,        │
               │ citations, feedback, │
               │ latency, audit       │
               └──────────────────────┘

Databricks creation path: Create an AI agent → Knowledge Assistant for document Q&A. Supervisor Agent coordinates the Knowledge Assistant and UC function tools.

Repository code is limited to deterministic tool glue, app UI, evals, and deployment scripts.

Concrete Agent Bricks wiring:

agent/document_intelligence_agent.py creates or updates doc-intel-knowledge-${target} as the Knowledge Assistant. Its source is the Vector Search index over gold_filing_sections_indexable, with summary as the text column and filename as the document URI column.
agent/document_intelligence_agent.py creates or updates the UC SQL function <catalog>.<schema>.lookup_10k_kpis.
doc-intel-supervisor-${target} is the Supervisor Agent. Its tools are the Knowledge Assistant and the UC SQL KPI function. Supervisor Agent owns tool routing.
Agent Bricks generates concrete serving endpoint names for Knowledge Assistant and Supervisor Agent. The repo resolves the live Supervisor endpoint with scripts/resolve-agent-endpoint.sh and passes it into DAB as agent_endpoint_name.
Serving endpoint permissions are granted by endpoint ID after the generated endpoint is ready. The Databricks App does not bind to the endpoint as a resource; it invokes the resolved endpoint directly. Prod uses each user's OBO token. Demo uses the App service principal when DOCINTEL_OBO_REQUIRED=false.
Agent Bricks responses use an OpenAI Responses-style output message sequence in current validation. The app displays the last output text group as the answer. Knowledge Assistant citations have been observed as markdown footnotes in intermediate messages, so app/agent_bricks_response.py normalizes those footnotes into citation chips.
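
The response handling in the last item can be sketched roughly as below. This is not the code in app/agent_bricks_response.py. It assumes each output item is a dict with `type` and a `content` list of `output_text` parts, and that citations appear as markdown footnote definitions such as `[^1]: ACME_10K.pdf`. Both shapes are assumptions based on the behavior described above, and the function name is hypothetical.

```python
import re

# Matches markdown footnote definitions such as "[^1]: ACME_10K.pdf".
_FOOTNOTE_DEF = re.compile(r"^\[\^([^\]]+)\]:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def _message_text(item):
    """Join the output_text parts of one Responses-style message item."""
    parts = item.get("content", []) if item.get("type") == "message" else []
    return "".join(p.get("text", "") for p in parts if p.get("type") == "output_text")


def normalize_response(output_items):
    """Return (answer, citations) from a list of output items.

    The answer is the last non-empty message text. Citations are footnote
    definitions collected from every message, de-duplicated by label.
    """
    texts = [t for t in (_message_text(i) for i in output_items) if t.strip()]
    answer = texts[-1] if texts else ""

    citations = {}
    for text in texts:
        for label, source in _FOOTNOTE_DEF.findall(text):
            citations.setdefault(label, source)

    # Footnote definitions are rendered as chips, so drop them from the answer.
    answer = _FOOTNOTE_DEF.sub("", answer).strip()
    return answer, [{"label": k, "source": v} for k, v in citations.items()]
```

Collecting footnotes from every message, not only the last one, matters because the citations were observed in intermediate messages.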

Runtime stack

   ┌──────────────────────────────────────────────────────────────────┐
   │                                                                  │
   │     Databricks App (Streamlit)  ←  user interacts here          │
   │     app/app.py                                                   │
   │                                                                  │
   │     ┌────────────────┐   ┌──────────────────┐                   │
   │     │ Chat input box │   │ Citation chips   │                    │
   │     │ Thumbs up/down │   │ Markdown tables  │                    │
   │     └────────┬───────┘   └─────┬────────────┘                    │
   │              │                 │                                 │
   └──────────────│─────────────────│─────────────────────────────────┘
                  │                 │
                  │ query           │ feedback writes
                  ▼                 ▼
   ┌────────────────────────┐  ┌────────────────────────┐
   │ Agent Bricks endpoint  │  │  Lakebase Postgres     │
   │ Knowledge Assistant +  │  │  ─────────────────      │
   │ Supervisor Agent       │  │  conversation_history   │
   │                        │  │  query_logs             │
   │  + AI Gateway:         │  │  feedback               │
   │    OBO, permissions,   │  │                        │
   │    audit, rate limits, │  │  (Postgres for tiny    │
   │    guardrails          │  │   per-turn writes —    │
   │                        │  │   Delta isn't great    │
   │                        │  │   at row-by-row)       │
   └────────────────────────┘  └────────────────────────┘

   Target auth modes:
   ─────────────────
   Prod reads `x-forwarded-access-token` from the request and invokes the
   Agent Bricks endpoint with the user's identity. AI Gateway and Unity
   Catalog enforce identity, permissions, audit, and routing across the
   agent, model, tools, and data. User token passthrough is a hard
   prerequisite for production.

   Demo can set `DOCINTEL_OBO_REQUIRED=false`; the App service principal then
   invokes the generated Supervisor endpoint and receives `CAN_QUERY` after
   deploy. This is for development workspaces without Apps user-token
   passthrough, not for production.
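
The two auth modes can be sketched as a single token-selection function. This is an illustrative sketch, not the app's actual code. The function name, the `service_principal_token` parameter, and the rule that OBO is required unless the variable is exactly `false` are assumptions. Only the header name and the environment variable come from the description above.

```python
def resolve_invocation_token(headers, env, service_principal_token=None):
    """Pick the bearer token used to call the Supervisor endpoint.

    Returns (token, mode) where mode is "obo" or "service_principal".
    Raises PermissionError when the required credential is missing.
    """
    # Assumption: OBO is required unless the variable is exactly "false".
    obo_required = env.get("DOCINTEL_OBO_REQUIRED", "true").strip().lower() != "false"

    if not obo_required:
        # Demo mode: the App service principal invokes the endpoint.
        if not service_principal_token:
            raise PermissionError("no service principal token available for demo mode")
        return service_principal_token, "service_principal"

    # Prod mode: header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    user_token = lowered.get("x-forwarded-access-token")
    if not user_token:
        raise PermissionError("user token missing and OBO is required")
    return user_token, "obo"
```

Failing closed when the user token is missing keeps a misconfigured production app from silently falling back to the service principal.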

Why Postgres for state? Delta tables are great for analytics but bad at "insert one tiny row per chat turn at high frequency." Lakebase is Databricks' managed Postgres: same governance, right tool for the job.

How it's built — three pillars

This repo combines three things: Spec-Kit for spec-driven design, Databricks Asset Bundles + Claude Code skill bundles for declarative platform work, and Claude Code as the implementation surface.

Pillar 1 — Spec-Kit (spec-driven development)

Spec-Kit is a workflow that forces you to write — and clarify — a specification before writing code. Each phase is a slash-command in Claude Code that produces a checked-in artifact:

   /speckit-specify   →  specs/<NNN>/spec.md         What & why (no how)
        │
        ▼
   /speckit-clarify   →  appended Q&A in spec.md     Resolve ambiguity
        │
        ▼
   /speckit-plan      →  specs/<NNN>/plan.md         Tech stack + structure
        │              + research.md, data-model.md,
        │                contracts/, quickstart.md
        ▼
   /speckit-tasks     →  specs/<NNN>/tasks.md        Dependency-ordered tasks
        │
        ▼
   /speckit-analyze   →  cross-artifact consistency check
        │
        ▼
   /speckit-implement →  the actual code

.specify/extensions.yml auto-commits at each phase boundary so the trail is clean. .specify/memory/constitution.md defines six non-negotiable principles every plan must respect:

| # | Principle | What it means |
|---|---|---|
| I | Unity Catalog source of truth | Every table, volume, model, index, endpoint lives under `<catalog>.<schema>`; no DBFS, no workspace-local resources |
| II | Parse once, extract many | `ai_parse_document` runs once at Silver → VARIANT; everything downstream reads the parsed output |
| III | Declarative over imperative | SDP SQL pipelines, Lakeflow Jobs, DAB resources; no production notebooks |
| IV | Quality before retrieval | 5-dim rubric scores every section; only ≥22/30 reach the index. Embed `summary`, not raw text |
| V | Eval-gated Agent Bricks | CLEARS scores must clear thresholds before any deploy is considered complete |
| VI | Reproducible deploys | `databricks bundle deploy -t <env>` recreates the entire stack; `demo` and `prod` parity enforced |
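
Principle IV is simple arithmetic: five dimensions, each scored 0 to 6, summed to a 0 to 30 total, with 22 as the cut-off for the index. The repo computes this in SQL (04_gold_quality.sql). The Python below is only a sketch of the same rule, and the class name and integer scoring are assumptions made for illustration.

```python
from dataclasses import dataclass

DIMENSIONS = ("parse", "layout", "ocr", "sections", "kpi")
MAX_PER_DIMENSION = 6
INDEX_THRESHOLD = 22  # out of 30, per principle IV


@dataclass(frozen=True)
class QualityScore:
    parse: int
    layout: int
    ocr: int
    sections: int
    kpi: int

    def __post_init__(self):
        # Reject scores outside the 0..6 range of each rubric dimension.
        for name in DIMENSIONS:
            value = getattr(self, name)
            if not 0 <= value <= MAX_PER_DIMENSION:
                raise ValueError(f"{name} must be in 0..{MAX_PER_DIMENSION}, got {value}")

    @property
    def total(self) -> int:
        """Sum of the five dimensions, in the range 0..30."""
        return sum(getattr(self, name) for name in DIMENSIONS)

    @property
    def embed_eligible(self) -> bool:
        """True when the section may be synced to the Vector Search index."""
        return self.total >= INDEX_THRESHOLD
```

For example, `QualityScore(6, 6, 4, 3, 3)` totals 22 and is eligible, while lowering any one dimension by a point drops it below the threshold.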

When you read specs/001-doc-intel-10k/plan.md you'll see a "Constitution Check" gate that maps each design decision back to the principle it satisfies. When you read specs/001-doc-intel-10k/tasks.md you'll see how each task derives from the plan, and how user stories (P1, P2, P3) are independently demoable.

Pillar 2 — Databricks Asset Bundles + the Claude Code skill suite

Databricks Asset Bundles (DABs) describe most of the workspace state as YAML. One root databricks.yml declares variables and targets (demo, prod); resources/**/*.yml declares each DAB-managed resource (pipeline, jobs, Vector Search endpoint, app, monitor, dashboard, Lakebase instance + catalog). databricks bundle deploy -t demo reconciles workspace state to YAML. The Vector Search index is still created and synced by jobs/index_refresh/sync_index.py until DAB supports index resources directly. Agent Bricks Knowledge Assistant and Supervisor Agent are SDK-managed by agent/document_intelligence_agent.py; DAB only passes the resolved generated Supervisor endpoint into the app through agent_endpoint_name.

This repo was built with Databricks-specific Claude Code skill bundles. Those bundles are distributed by Databricks via the CLI / Claude Code plugin channel and are not vendored in this open-source tree — install them locally if you have access, or reference the canonical Databricks docs (mapping in ../CONTRIBUTING.md).

| Skill bundle | What it provides | Canonical docs |
|---|---|---|
| databricks-core | Auth, profiles, data exploration, bundle basics | docs |
| databricks-dabs | DAB structure, validation, deploy workflow, target separation | docs |
| databricks-pipelines | Lakeflow Spark Declarative Pipelines (`ai_parse_document`, `ai_classify`, `ai_extract`, `APPLY CHANGES INTO`) | docs |
| databricks-jobs | Lakeflow Jobs with retries, schedules, table-update / file-arrival triggers | docs |
| databricks-apps | Databricks Apps (Streamlit), App resource bindings | docs |
| databricks-lakebase | Lakebase Postgres instances, branches, computes, endpoint provisioning | docs |
| databricks-agent-bricks | Knowledge Assistant, Supervisor Agent, UC tools, endpoint lifecycle | docs |
Skills are loaded by Claude Code on demand. When you ask Claude to "wire up Vector Search," it should read the Databricks pipeline/model-serving guidance before writing YAML, so the output reflects current Databricks API shapes — not stale training data.

Pillar 3 — Claude Code as the implementation surface

Spec-Kit produces the specs. The Databricks skills provide platform expertise. Claude Code orchestrates both: every phase artifact and every code file in this repo was authored by prompting Claude Code with the spec/plan/tasks as context.

The workflow looks like:

/speckit-specify → Claude writes spec.md from a natural-language description; you iterate via /speckit-clarify until ambiguity is resolved.
/speckit-plan → Claude consults the constitution + Databricks skills, drafts plan.md with research decisions and architecture.
/speckit-tasks → Claude generates a dependency-ordered task list grouped by user story (P1, P2, P3).
/speckit-implement → Claude writes the actual SQL/Python/YAML, one task at a time, committing per task.
Operational loops: when a deploy hits an unexpected issue (one always does), Claude reads the runbook, fixes the issue, updates the runbook, and commits.

AI-driven here means Claude carries the boring parts (boilerplate YAML, retry-loop scripts, dependency analysis) so you spend time on what the spec should say and what the constitution should require.

Deploy ordering: foundation → consumers

DABs reconcile workspace resources from YAML, but a fresh workspace has real data dependencies that cannot all be satisfied in one pass:

        ┌────────────────────────────────────────────────┐
        │   What "bundle deploy" tries to create:        │
        │                                                │
        │   ▸ Pipeline   ────┐                           │
        │   ▸ Tables     ────┼──── all need each other  │
        │   ▸ Vector idx  ───┤                           │
        │   ▸ Agent Bricks ──┤    Monitor wants the      │
        │   ▸ App config  ───┤    KPI table to exist     │
        │   ▸ App         ───┤    BEFORE it can attach   │
        │   ▸ Monitor    ────┘                           │
        │   ▸ Lakebase   ────                            │
        └────────────────────────────────────────────────┘

   App needs the generated Agent Bricks Supervisor endpoint name.
        Supervisor needs Knowledge Assistant + UC function tools.
              Knowledge Assistant needs the Vector Search index.
                    Monitor needs the table populated.
                          Table needs the pipeline to run.
   ▶ A single `bundle deploy` cannot create the whole stack from scratch.

The repo keeps this ordering explicit by splitting resources by dependency:

   resources/
   ├── foundation/        ← no data deps — deploy first
   │   ├── catalog.yml             (schema + volume + grants)
   │   ├── doc_intel.pipeline.yml
   │   ├── retention.job.yml
   │   ├── filings_index.yml       (VS endpoint)
   │   └── lakebase_instance.yml
   │
   └── consumers/         ← need foundation to be RUNNING and producing data
       ├── kpi_drift.yml         (needs gold_filing_kpis table)
       ├── index_refresh.job.yml (needs source table)
       ├── analyst.app.yml       (needs Lakebase + generated agent endpoint name)
       ├── usage.dashboard.yml
       └── lakebase_catalog.yml  (needs instance AVAILABLE)

scripts/bootstrap-demo.sh is the operational entry point for first bring-up and steady-state demo deploys. It stages foundation resources, materializes the data/index/Agent Bricks dependencies, restores consumer resources, restarts the app, grants access, and runs a smoke query.

The design point is not the script itself; it is that resource dependencies are explicit and repeatable. The exact command flow and failure modes are owned by runbook.md § Deploy paths and runbook.md § Known deploy ordering gaps.
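
One way to see why the ordering matters is to write the dependency chain from this section as a graph and sort it. The sketch below uses Python's standard-library graphlib. The node names are informal labels chosen for illustration, not DAB resource keys, and the edges are only those stated in this section.

```python
from graphlib import TopologicalSorter

# Each key depends on the resources in its value set.
DEPENDS_ON = {
    "gold_tables": {"pipeline"},
    "monitor": {"gold_tables"},
    "vector_index": {"gold_tables"},
    "knowledge_assistant": {"vector_index"},
    "supervisor_agent": {"knowledge_assistant", "uc_kpi_function"},
    "app": {"supervisor_agent", "lakebase"},
}


def deploy_order(depends_on):
    """Return resource names in an order that satisfies every dependency.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, name in enumerate(deploy_order(DEPENDS_ON), start=1):
        print(step, name)
```

Any valid order places the pipeline before the tables and the app after the Supervisor Agent, which is the shape the foundation and consumers split encodes.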

What you can learn from this repo

Wiring ai_parse_document into Lakeflow SDP — pattern for streaming-tables + STREAM(...) views + APPLY CHANGES INTO keyed on filename.
Scoring document quality before retrieval — five 0–6 dimensions in SQL, threshold filter on the index source.
Agent Bricks orchestration — Knowledge Assistant for cited document Q&A, Supervisor Agent for orchestration, deterministic KPI tool glue for structured comparisons.
Grounding an agent with citations — Document Intelligence output and the governed Vector Search / Knowledge Assistant source provide the citation-bearing context.
Handling DAB deploy ordering — chicken-egg dependencies between heterogeneous resources, solved with a 5-step bootstrap rather than depends_on (which DAB doesn't reliably honor across resource types).
Gating deploys on MLflow eval — mlflow.evaluate(model_type="databricks-agent") with documented metric keys, per-axis thresholds, exit-code gate in CI.
End-to-end OBO — Databricks Apps user-token passthrough, Agent Bricks / AI Gateway identity enforcement, UC permissions, and audit verification are production prerequisites.
Spec-Kit + Claude Code + Databricks skills composing — every artifact in specs/ and pipelines/ and agent/ was generated through that loop.
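
The eval gate mentioned in the list above reduces to comparing a metrics dictionary against per-axis thresholds and turning the result into an exit code. The sketch below is a hypothetical illustration, not the repo's gate script. It assumes higher is better on every axis, and it takes the metrics as a plain dict so it makes no claim about the actual metric keys.

```python
import sys


def failing_axes(metrics, thresholds):
    """Return {axis: (observed, minimum)} for every axis below its threshold.

    A metric missing from `metrics` counts as a failure.
    Assumption: higher is better on every axis.
    """
    failures = {}
    for axis, minimum in thresholds.items():
        observed = metrics.get(axis)
        if observed is None or observed < minimum:
            failures[axis] = (observed, minimum)
    return failures


def gate(metrics, thresholds):
    """Return a process exit code: 0 when every axis passes, 1 otherwise."""
    failures = failing_axes(metrics, thresholds)
    for axis, (observed, minimum) in sorted(failures.items()):
        print(f"FAIL {axis}: {observed} < {minimum}", file=sys.stderr)
    return 1 if failures else 0
```

In CI, a script would end with something like `sys.exit(gate(result.metrics, THRESHOLDS))`, where `result` is the object returned by `mlflow.evaluate(...)` and `THRESHOLDS` is a hypothetical dict keyed by the documented metric names.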
