Releases: debu-sinha/coco-reference

CoCo v2.0.0 — initial public release

20 Apr 02:07

The first public release of CoCo, a reference implementation of a
natural-language cohort copilot on Databricks. Clone the repository and
deploy it to your own Databricks workspace in about 30 minutes.

What's in the box

  • DSPy ReAct agent with native tool calling on Foundation Model API
  • Mosaic AI Agent Framework serving endpoint with typed resource bindings
  • Databricks Apps (FastAPI + HTMX + SSE) front-end
  • Lakebase (managed Postgres) for per-user session state, with short-lived OAuth credentials and a pool-rotation pattern
  • MLflow Managed Prompt Registry with @production alias and evolutionary prompt optimization via mlflow.genai.optimize_prompts + GEPA
  • Unity Catalog for data, registered models, prompts, and UC volumes — everything namespaced per workshop attendee
  • Databricks Vector Search for clinical knowledge RAG, with an auto-synced index over the knowledge chunks table
  • End-to-end preflight check that exercises CREATE SCHEMA, CAN_QUERY on the LLM endpoint, and the Prompt Registry preview flag so deploys do not fail 20 minutes into setup
  • Teardown notebook that removes every per-user resource idempotently. Two attendees running teardown at the same time touch disjoint resources and cannot interfere.
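
The pool-rotation pattern mentioned for Lakebase can be sketched roughly as follows. This is a minimal illustration, not the code shipped in the repo: `Credential`, `RotatingPool`, and the injected `fetch_credential`, `make_pool`, and `close_pool` callables are hypothetical names, and the five-minute refresh margin is an assumed default. The idea is that a connection pool built with a short-lived OAuth token gets replaced shortly before that token expires.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

PoolT = TypeVar("PoolT")


@dataclass(frozen=True)
class Credential:
    """Hypothetical short-lived credential: a token plus its expiry time."""
    token: str
    expires_at: float  # epoch seconds


class RotatingPool(Generic[PoolT]):
    """Rebuilds the pool with a fresh credential shortly before expiry."""

    def __init__(
        self,
        fetch_credential: Callable[[], Credential],
        make_pool: Callable[[Credential], PoolT],
        close_pool: Callable[[PoolT], None],
        refresh_margin_s: float = 300.0,  # assumed margin, not from the repo
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch_credential
        self._make_pool = make_pool
        self._close_pool = close_pool
        self._margin = refresh_margin_s
        self._clock = clock
        self._lock = threading.Lock()
        self._credential: Optional[Credential] = None
        self._pool: Optional[PoolT] = None

    def get(self) -> PoolT:
        """Return a pool whose credential is still valid for at least the margin."""
        with self._lock:
            cred = self._credential
            expiring = cred is None or self._clock() >= cred.expires_at - self._margin
            if self._pool is None or expiring:
                self._rotate()
            assert self._pool is not None
            return self._pool

    def _rotate(self) -> None:
        # Build the replacement first so a failed fetch leaves the old pool usable.
        new_credential = self._fetch()
        new_pool = self._make_pool(new_credential)
        old_pool = self._pool
        self._credential, self._pool = new_credential, new_pool
        if old_pool is not None:
            self._close_pool(old_pool)
```

In a real deployment `make_pool` would construct a Postgres connection pool that uses the token as its password, and `close_pool` would drain it gracefully instead of dropping in-flight queries.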

Multi-user isolation

Every per-attendee resource (schema, Lakebase instance, VS endpoint and index, agent endpoint, app, prompts, registered model, MLflow experiment) is auto-namespaced from the workspace username. The only shared resource an attendee touches is the UC catalog itself, which is inherently admin-managed.
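
As a rough illustration of username-derived namespacing, the sketch below maps a workspace username to a set of resource names. The naming scheme, the `coco` prefix, and the `user_slug` and `resources_for` helpers are all assumptions made for this example; the repo's actual names may differ.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AttendeeResources:
    """Hypothetical per-attendee resource names."""
    schema: str
    lakebase_instance: str
    vs_endpoint: str
    vs_index: str
    agent_endpoint: str
    app: str
    experiment: str


def user_slug(username: str) -> str:
    """Reduce a workspace username (usually an email) to a lowercase identifier."""
    local_part = username.split("@", 1)[0].lower()
    slug = re.sub(r"[^a-z0-9]+", "_", local_part).strip("_")
    if not slug:
        raise ValueError(f"cannot derive a namespace from {username!r}")
    return slug


def resources_for(username: str, catalog: str) -> AttendeeResources:
    """Derive every per-attendee name from the username; only the catalog is shared."""
    slug = user_slug(username)
    dashed = slug.replace("_", "-")  # endpoint-style names use hyphens here
    schema = f"{catalog}.coco_{slug}"
    return AttendeeResources(
        schema=schema,
        lakebase_instance=f"coco-{dashed}",
        vs_endpoint=f"coco-vs-{dashed}",
        vs_index=f"{schema}.knowledge_chunks_index",
        agent_endpoint=f"coco-agent-{dashed}",
        app=f"coco-app-{dashed}",
        experiment=f"/Users/{username}/coco",
    )
```

A slug of this kind is lossy: `jane.doe@example.com` and `jane_doe@example.com` would both map to `jane_doe`, so a real scheme needs a collision check or a hash suffix.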

Security posture

See docs/SECURITY.md for the full story. Short version: the guardrails in src/coco/agent/guardrails.py are defense-in-depth on top of the app's service principal having read-only, schema-scoped UC grants. The repo ships an adversarial test suite covering multi-statement injection, nested block comments, escaped quotes, and CTE-based bypass attempts.
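
To make the attack classes concrete, here is a minimal sketch of a read-only SQL check. It is not the contents of src/coco/agent/guardrails.py: the function names, the keyword list, and the choice to treat block comments as nesting and backslashes as escapes are assumptions for illustration only. As the paragraph above notes, a check like this only supplements the read-only, schema-scoped grants.

```python
import re

# Assumed deny-list for this sketch; a real guardrail may use a parser instead.
FORBIDDEN = {
    "insert", "update", "delete", "merge", "drop", "alter", "create",
    "truncate", "grant", "revoke", "copy", "use", "set", "call",
}


def strip_comments_and_strings(sql: str) -> str:
    """Blank out comments and quoted text so later checks see only SQL structure."""
    out = []
    i, n, depth = 0, len(sql), 0
    while i < n:
        two = sql[i:i + 2]
        if depth > 0:  # inside a (possibly nested) block comment
            if two == "/*":
                depth, i = depth + 1, i + 2
            elif two == "*/":
                depth, i = depth - 1, i + 2
            else:
                i += 1
        elif two == "/*":
            depth, i = 1, i + 2
            out.append(" ")
        elif two == "--":  # line comment: skip to the end of the line
            end = sql.find("\n", i)
            i = n if end == -1 else end
        elif sql[i] in "'\"`":
            quote, i = sql[i], i + 1
            while i < n:
                if sql[i] == "\\" and i + 1 < n:  # backslash-escaped character
                    i += 2
                elif sql[i] == quote and sql[i + 1:i + 2] == quote:  # doubled quote
                    i += 2
                elif sql[i] == quote:
                    i += 1
                    break
                else:
                    i += 1
            else:
                raise ValueError("unterminated quoted text")
            out.append(" ? ")
        else:
            out.append(sql[i])
            i += 1
    if depth:
        raise ValueError("unterminated block comment")
    return "".join(out)


def is_read_only(sql: str) -> bool:
    """Accept a single SELECT or WITH statement that contains no write keywords."""
    try:
        cleaned = strip_comments_and_strings(sql)
    except ValueError:
        return False
    statements = [s for s in cleaned.split(";") if s.strip()]
    if len(statements) != 1:
        return False
    tokens = re.findall(r"\b[a-z_][a-z0-9_]*\b", statements[0].lower())
    if not tokens or tokens[0] not in {"select", "with"}:
        return False
    return not FORBIDDEN.intersection(tokens)
```

With these rules, `SELECT 1; DROP TABLE patients` fails the single-statement check, `SELECT 1 /* a /* b */ */ ; DELETE FROM t` fails once the nested comment is stripped, and `WITH x AS (SELECT 1) DELETE FROM patients` fails the keyword check, while `SELECT 'it''s; fine' AS note` passes because the semicolon sits inside a string literal.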

Deploy

git clone https://github.com/debu-sinha/coco-reference.git
cd coco-reference
python scripts/preflight_check.py -p PROFILE --warehouse-id WH_ID --catalog CATALOG
databricks bundle deploy -t demo -p PROFILE --var unique_id=YOUR_ID --var warehouse_id=WH_ID --var catalog=CATALOG
databricks bundle run setup_workspace -t demo -p PROFILE --var unique_id=YOUR_ID --var warehouse_id=WH_ID --var catalog=CATALOG

Docs

  • README — quick start
  • docs/PERMISSIONS.md — every permission the setup job needs
  • docs/SECURITY.md — threat model and limits
  • docs/ARCHITECTURE.md — full request flow and design decisions
  • docs/design/apps-mosaic-ai-agent-reference.md — canonical field reference capturing every Databricks Apps + Mosaic AI gotcha surfaced during development
  • docs/cost-attribution/ — per-workload cost tracking queries and a draft tagging policy