Use this guide to prove the reference implementation works in a Databricks workspace.
python3 -m py_compile \
agent/tools.py \
app/app.py app/agent_bricks_client.py app/agent_bricks_response.py app/lakebase_client.py \
evals/clears_eval.py agent/document_intelligence_agent.py \
scripts/wait_for_kpis.py samples/synthesize.py
bash -n scripts/bootstrap-demo.sh
pytest agent/tests
databricks bundle validate --strict -t demoExpected prod safety check:
databricks bundle validate --strict -t prodThis should fail unless service_principal_id is provided.
export DOCINTEL_CATALOG=workspace
export DOCINTEL_SCHEMA=docintel_10k_demo
export DOCINTEL_WAREHOUSE_ID=<warehouse-id>
./scripts/bootstrap-demo.shExpected outcomes:
- Foundation resources deploy first.
- Synthetic PDFs upload to the
raw_filingsvolume. - Pipeline creates Gold rows.
- Agent Bricks Knowledge Assistant and Supervisor Agent are created or updated.
- Consumer resources deploy cleanly.
- App config is applied with
bundle run analyst_app, includingDOCINTEL_AGENT_ENDPOINTset from the generated Supervisor endpoint name. - Bootstrap verifies the target auth mode. Demo leaves
user_api_scopesunset and grants the App service principalCAN_QUERY; prod requires OBO scopes. - Smoke query reaches the Agent Bricks supervisor endpoint.
If a prod app deploy fails with Databricks Apps - user token passthrough feature is not enabled, enable the workspace/org feature and rerun. Demo uses app_obo_required=false by default for workspaces where user-token passthrough is not enabled.
SELECT filename, company_name, fiscal_year, revenue, ebitda
FROM <catalog>.<schema>.gold_filing_kpis
ORDER BY filename;
SELECT filename, section_seq, section_label, quality_score
FROM <catalog>.<schema>.gold_filing_sections_indexable
ORDER BY filename, section_seq;Expected:
- ACME, BETA, and GAMMA have KPI rows.
garbage_10K_2024.pdfdoes not appear in the indexable table.
python evals/clears_eval.py \
--endpoint "$(./scripts/resolve-agent-endpoint.sh demo)" \
--dataset evals/dataset.jsonlExpected:
- Correctness, adherence, relevance, execution, safety, and latency thresholds pass.
- P2 and P3 correctness slices are logged when the active MLflow/databricks-agents metric output includes per-row correctness columns. Current 1.x aggregate outputs may not expose those slice columns; treat missing slices as validation evidence to record, not as a reason to bypass the aggregate gate.
- No citations reference
garbage_10K_2024.pdf.
- Open
doc-intel-analyst-demo. - Ask:
What were the top 3 risk factors disclosed by ACME in their FY24 10-K? - Confirm the response has citation chips and the turn is written to Lakebase.
- Ask:
What was ACME's revenue in fiscal year 2024?to verify the structured KPI tool path. - Submit thumbs feedback and confirm a feedback row is written.
- Demo: confirm
user_api_scopesis unset andDOCINTEL_OBO_REQUIRED=false. - Prod: confirm
user_api_scopesis present andDOCINTEL_OBO_REQUIRED=true. - Run for the target being verified:
TARGET=demo AGENT_ENDPOINT_NAME="$(./scripts/resolve-agent-endpoint.sh "$TARGET")" databricks bundle deploy -t "$TARGET" --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}" databricks bundle run -t "$TARGET" --var "agent_endpoint_name=${AGENT_ENDPOINT_NAME}" analyst_app
- Confirm bootstrap or CI verifies the target auth mode. Demo grants the App SP endpoint access; prod verifies
serving.serving-endpointsandsqlscopes. - For prod, check audit logs for user-scoped downstream access through Agent Bricks, Knowledge Assistant, and the structured KPI SQL function.
- If prod cannot grant user-token passthrough, deployment is invalid and must fail.
As of 2026-04-26, the demo workspace evidence is:
- Bundle validation passed with the resolved Agent Bricks Supervisor endpoint.
- Document Intelligence Agent deployment succeeded:
- Knowledge Assistant display name:
doc-intel-knowledge-demo - Supervisor display name:
doc-intel-supervisor-demo - UC function:
workspace.docintel_10k_demo.lookup_10k_kpis
- Knowledge Assistant display name:
- Direct Supervisor endpoint smoke passed. The ACME FY2024 revenue question returned
$94.2 billionand referencedACME_10K_2024.pdfthrough the structured KPI path. - Databricks App deploy succeeded in demo App-SP mode:
- App:
doc-intel-analyst-demo - Endpoint:
mas-dc6aba10-endpoint DOCINTEL_OBO_REQUIRED=falseuser_api_scopesunset- App service principal granted
CAN_QUERYon the generated Supervisor endpoint.
- App:
- Lakebase OAuth credential handling was validated without
PGPASSWORD; the deployed app code mints the database password at connection time. - Vector Search
index_refreshjob rerun terminatedSUCCESS. - CLEARS live eval completed but failed the configured gate:
- MLflow run ID:
772e902cab92459f9bf569296fc5f801 - correctness:
0.323 - adherence:
0.000 - relevance/groundedness:
0.516 - safety:
1.000 - execution:
1.000 - latency p95:
31711ms
- MLflow run ID:
Status: Agent Bricks deployment mechanics, demo App deploy, Lakebase OAuth credential handling, and direct serving smoke passed. Reference-ready quality remains open until CLEARS passes. Prod OBO readiness remains open until validated in a workspace with user-token passthrough enabled.