A management and observability platform for AI agents deployed on Databricks — purpose-built for teams operating production-grade agent infrastructure at scale.
As enterprises deploy more AI agents, a new operational challenge emerges: who is using what, how is it performing, and who has access?
Agents are created across multiple workspaces by different teams — some via Agent Bricks, some as custom serving endpoints, some as Databricks Apps. There's no single view of what's running, what it costs, or who can access it.
Agent Control Plane gives platform and ML teams a single pane of glass over all AI agents running across Databricks workspaces. It auto-discovers agents, tracks costs, surfaces MLflow observability data, and manages access — all from one app.
Built natively on Databricks: Lakebase, system tables, MLflow, Unity Catalog, and Databricks Apps.
Cost attribution and billing analytics powered by system.billing.usage. Track DBU spend per endpoint, token usage trends, and cost breakdown by SKU across all workspaces in the account. Drill into daily cost trends, product-level breakdown, and top consumers.
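The core of that attribution is a join of usage against list prices. A minimal sketch of the logic, using simplified field names that are illustrative assumptions (the real app queries `system.billing.usage` and `system.billing.list_prices` via SQL, whose schemas differ):

```python
from collections import defaultdict

# Illustrative rows standing in for system.billing.usage and
# system.billing.list_prices; field names are simplified for the sketch.
usage = [
    {"endpoint": "agent-support-bot", "sku": "SERVERLESS_REAL_TIME_INFERENCE", "dbus": 12.5},
    {"endpoint": "agent-support-bot", "sku": "SERVERLESS_REAL_TIME_INFERENCE", "dbus": 7.5},
    {"endpoint": "genie-sales", "sku": "SERVERLESS_SQL", "dbus": 4.0},
]
list_prices = {"SERVERLESS_REAL_TIME_INFERENCE": 0.07, "SERVERLESS_SQL": 0.70}

def cost_by_endpoint(usage, list_prices):
    """Attribute dollar cost per endpoint: DBUs consumed x SKU list price."""
    totals = defaultdict(float)
    for row in usage:
        totals[row["endpoint"]] += row["dbus"] * list_prices[row["sku"]]
    return {endpoint: round(cost, 2) for endpoint, cost in totals.items()}

print(cost_by_endpoint(usage, list_prices))
```

In production this aggregation happens in SQL against the system tables; the sketch only shows the shape of the per-endpoint rollup.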
Auto-discovered agent registry across all workspaces. Finds serving endpoints, Databricks Apps, Genie Spaces, and Agent Bricks (Knowledge Assistants, Multi-Agent Supervisors, Knowledge Inference Engines). View endpoint status, operations metrics, interactive dependency topology, and test agents with an embedded playground.
Unified view of all model serving endpoints with usage analytics, token volume charts, per-endpoint and per-user breakdowns. Manage Unity Catalog permissions directly from the UI. Monitor operational metrics (requests, errors, latency), view individual request logs, and inspect rate limits and safety guardrails configured via AI Gateway.
Combined monitoring for Vector Search and Lakebase. Overview tab shows total cost, daily cost trends by product, and top workspaces by spend. Vector Search tab provides endpoint/index inventory, sync status, health history, and cost attribution by workload type (ingest, serving, storage). Lakebase tab shows instance inventory, compute vs storage cost, and per-workspace breakdown.
Cross-workspace MLflow experiments, evaluation runs, and traces. Queries system.mlflow.experiments_latest and system.mlflow.runs_latest for account-wide visibility. Filter by workspace, view trace details, and identify which data source (system table or REST API) each record came from.
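The account-wide view comes from querying those system tables with an optional workspace filter. A hedged sketch of the query shape (the table name is from the source; column names here are assumptions, and the app's actual SQL may differ):

```python
def experiments_query(workspace_ids=None):
    """Build an account-wide query against system.mlflow.experiments_latest.

    Column names are illustrative; only the table name is taken as given.
    """
    sql = (
        "SELECT workspace_id, experiment_id, name, last_update_time\n"
        "FROM system.mlflow.experiments_latest"
    )
    if workspace_ids:
        ids = ", ".join(str(w) for w in workspace_ids)
        sql += f"\nWHERE workspace_id IN ({ids})"
    return sql

print(experiments_query([1111111111, 2222222222]))
```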
Registry of Unity Catalog functions and MCP servers available to agents. Browse function signatures, descriptions, and catalog locations.
Multi-workspace federation dashboard. See agent inventory, cost breakdown, cloud provider, and deployment region per workspace. Drill into individual workspaces for detailed agent and cost views.
Identity and access management with all principals, builders/users breakdown, RBAC matrix, and per-agent permission management. User activity tab shows top users, request distribution, daily active user trends, activity heatmap (24h UTC, Monday-first), and per-user agent usage.
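The heatmap described above is a 7x24 bucketing of request timestamps in UTC with Monday as the first row. A self-contained sketch of that bucketing (illustrative, not the app's actual implementation):

```python
from collections import Counter
from datetime import datetime, timezone

def activity_heatmap(timestamps):
    """Bucket UTC epoch timestamps into a Monday-first 7x24 heatmap.

    Returns a Counter keyed by (weekday, hour): weekday 0 = Monday, hour in UTC.
    """
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[(dt.weekday(), dt.hour)] += 1  # weekday() is already Monday-first
    return counts

# Example: two requests land in the same Monday-09:00 UTC cell.
events = [
    datetime(2024, 1, 1, 9, 15, tzinfo=timezone.utc).timestamp(),  # Mon 09:15
    datetime(2024, 1, 1, 9, 45, tzinfo=timezone.utc).timestamp(),  # Mon 09:45
    datetime(2024, 1, 2, 18, 0, tzinfo=timezone.utc).timestamp(),  # Tue 18:00
]
print(activity_heatmap(events))
```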
┌─────────────────────────────────────────────────────┐
│ Databricks APIs + System Tables │
│ (serving, billing, mlflow, access, apps, genie) │
└──────────────────────┬──────────────────────────────┘
│ Scheduled workflow (every 30 min)
▼
┌─────────────────────────────────────────────────────┐
│ Delta Tables → Lakebase (PostgreSQL) │
│ (agents, experiments, runs, traces, billing cache) │
└──────────────────────┬──────────────────────────────┘
│ Sub-100ms reads
▼
┌─────────────────────────────────────────────────────┐
│ FastAPI Backend (17 routers, 16 services) │
│ → React Frontend (TanStack Query + Tailwind) │
│ → Databricks App (OBO authentication) │
└─────────────────────────────────────────────────────┘
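The 30-minute cadence in the diagram would be expressed as a job schedule in the Asset Bundle. A sketch of what that looks like in a `databricks.yml` (job name, task key, and paths here are illustrative, not copied from the repo's actual bundle configuration):

```yaml
resources:
  jobs:
    agent_discovery:
      name: agent-discovery
      schedule:
        quartz_cron_expression: "0 0/30 * * * ?"   # every 30 minutes
        timezone_id: UTC
      tasks:
        - task_key: discover_agents
          spark_python_task:
            python_file: ../workflows/01_discover_agents.py
```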
Key data sources:
- `system.serving.served_entities` — cross-workspace agent discovery
- `system.serving.endpoint_usage` — per-endpoint request and token metrics
- `system.billing.usage` + `system.billing.list_prices` — cost attribution
- `system.mlflow.experiments_latest` / `system.mlflow.runs_latest` — observability
- `system.access.audit` — user activity and access patterns
- Databricks REST APIs — endpoints, apps, genie spaces, traces
# 1. Clone and configure
git clone https://github.com/databricks-solutions/agent-control-plane.git
cd agent-control-plane
make setup
# 2. Edit .env with your Lakebase and workspace details
vi control-plane-app/.env
# 3. Deploy the app
make deploy
# 4. Deploy discovery workflows
make deploy-workflows TARGET=dev
# 5. Trigger first discovery run
make run-workflows TARGET=dev

See the full Installation Guide for detailed setup, including Lakebase creation, OBO configuration, and system table grants.
- Databricks workspace with Unity Catalog enabled
- Lakebase instance (PostgreSQL) — for fast dashboard reads
- SQL warehouse (serverless preferred) — for system table queries
- Databricks App with User Authorization (OBO) enabled
- Node.js 18+ and Python 3.10+ — for building and deploying
agent-control-plane/
├── control-plane-app/ # The Databricks App
│ ├── backend/ # FastAPI (Python)
│ │ ├── api/ # 17 route modules
│ │ ├── services/ # Business logic
│ │ ├── models/ # Pydantic schemas
│ │ └── utils/auth.py # OBO authentication
│ ├── frontend/ # React 18 + TypeScript
│ ├── tests/ # pytest + Playwright
│ ├── deploy.sh # Parameterized deploy script
│ ├── grant_sp_permissions.py # SP workspace permission setup
│ └── propagate-sp.sh # Cross-workspace SP propagation
├── workflows/ # Databricks Asset Bundles
│ ├── 01_discover_agents.py # Agent discovery → Delta
│ ├── 02_sync_to_lakebase.py # All Delta → Lakebase + billing cache
│ ├── 03_discover_knowledge_bases.py # Vector Search + Lakebase + billing → Delta
│ ├── 04_discover_observability.py # Cross-workspace traces → Delta
│ ├── 05_discover_user_analytics.py # User activity → Delta
│ ├── 06_discover_gateway_usage.py # Gateway usage → Delta
│ └── databricks.yml # Bundle configuration
├── docs/ # Documentation
│ ├── installation.md # Setup guide
│ └── configuration.md # Config reference
├── setup_lakebase_tables.py # One-time Lakebase schema setup
└── Makefile # Common operations
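On-behalf-of authentication (the `utils/auth.py` module above) relies on the user token that Databricks Apps forward in the `X-Forwarded-Access-Token` request header when User Authorization is enabled. A minimal sketch of extracting it, as an assumption about the approach rather than the repo's actual implementation:

```python
class MissingUserToken(Exception):
    """Raised when a request arrives without a forwarded user token."""

def user_token_from_headers(headers: dict) -> str:
    """Extract the end user's OAuth token forwarded by Databricks Apps (OBO).

    With User Authorization enabled, Databricks Apps inject the caller's
    token in X-Forwarded-Access-Token; API calls made with it execute
    under the user's own Unity Catalog permissions.
    """
    token = headers.get("x-forwarded-access-token") or headers.get("X-Forwarded-Access-Token")
    if not token:
        raise MissingUserToken("request did not include a forwarded user token")
    return token

print(user_token_from_headers({"x-forwarded-access-token": "dapi-example"}))
```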
| Document | Description |
|---|---|
| Installation Guide | Step-by-step setup (Lakebase, App, workflows) |
| Configuration Reference | All env vars, workflow targets, finding your values |
| Contributing | Development setup, code style, PR process |
| Security | Vulnerability reporting, security model |
| Changelog | Version history |
| Releasing | How to create a release |
# Start backend (hot reload)
make backend
# Start frontend dev server
make frontend
# Run tests
make test
# Run all checks (Python + TypeScript)
make check
# See all available commands
make help

All endpoints are versioned under /api/v1/. Interactive documentation is available at /api/v1/docs when the app is running.
Key endpoints:
- `GET /api/v1/agents` — discovered agent registry
- `GET /api/v1/billing/page-data` — cost attribution dashboard
- `GET /api/v1/mlflow/experiments` — MLflow experiments (cross-workspace)
- `GET /api/v1/mlflow/traces` — MLflow traces
- `GET /api/v1/gateway/overview` — AI Gateway analytics
- `POST /api/v1/agents/sync` — trigger agent discovery refresh
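Calling the API from Python is straightforward once you have a token; a hedged sketch using only the standard library (the base URL is a made-up example, and bearer-token auth is an assumption about the deployment):

```python
import json
from urllib.request import Request, urlopen

API_VERSION = "v1"

def endpoint_url(base_url: str, path: str) -> str:
    """Compose a versioned API URL, e.g. <base>/api/v1/agents."""
    return f"{base_url.rstrip('/')}/api/{API_VERSION}/{path.lstrip('/')}"

def list_agents(base_url: str, token: str):
    """GET the discovered agent registry (requires a running app and valid token)."""
    req = Request(endpoint_url(base_url, "agents"),
                  headers={"Authorization": f"Bearer {token}"})
    with urlopen(req) as resp:
        return json.load(resp)

# URL composition only; list_agents needs a live deployment to run.
print(endpoint_url("https://my-app.example.databricksapps.com/", "agents"))
```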
Apache License 2.0. See LICENSE.