databricks-solutions/agent-control-plane

Agent Control Plane

A management and observability platform for AI agents deployed on Databricks — purpose-built for teams operating production-grade agent infrastructure at scale.

The Problem

As enterprises deploy more AI agents, a new operational challenge emerges: who is using what, how is it performing, and who has access?

Agents are created across multiple workspaces by different teams — some via Agent Bricks, some as custom serving endpoints, some as Databricks Apps. There's no single view of what's running, what it costs, or who can access it.

The Solution

Agent Control Plane gives platform and ML teams a single pane of glass over all AI agents running across Databricks workspaces. It auto-discovers agents, tracks costs, surfaces MLflow observability data, and manages access — all from one app.

Built natively on Databricks: Lakebase, system tables, MLflow, Unity Catalog, and Databricks Apps.

Features

Governance

Cost attribution and billing analytics powered by system.billing.usage. Track DBU spend per endpoint, token usage trends, and cost breakdown by SKU across all workspaces in the account. Drill into daily cost trends, product-level breakdown, and top consumers.
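As a rough illustration of the cost attribution idea, the sketch below multiplies usage quantities by a list price per SKU and sums the result per endpoint. It is not the app's actual query: the row shape, the SKU names, and the prices are all made up for the example, and the real system.billing.usage and system.billing.list_prices tables carry more fields than shown here.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical, simplified usage rows (illustrative names and numbers only).
usage_rows = [
    {"endpoint": "support-agent", "sku": "SKU_INFERENCE", "dbus": Decimal("12.5")},
    {"endpoint": "support-agent", "sku": "SKU_VECTOR_SEARCH", "dbus": Decimal("3.0")},
    {"endpoint": "sales-agent", "sku": "SKU_INFERENCE", "dbus": Decimal("4.0")},
]

# Hypothetical list price per DBU for each SKU (not real Databricks prices).
list_prices = {
    "SKU_INFERENCE": Decimal("0.07"),
    "SKU_VECTOR_SEARCH": Decimal("0.28"),
}


def cost_by_endpoint(usage, prices):
    """Sum (usage quantity x list price) for every endpoint."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in usage:
        totals[row["endpoint"]] += row["dbus"] * prices[row["sku"]]
    return dict(totals)


costs = cost_by_endpoint(usage_rows, list_prices)
for endpoint, cost in sorted(costs.items()):
    print(f"{endpoint}: {cost}")
```

Decimal is used instead of float so that small per-SKU prices add up without rounding drift.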

Agents

Auto-discovered agent registry across all workspaces. Finds serving endpoints, Databricks Apps, Genie Spaces, and Agent Bricks (Knowledge Assistants, Multi-Agent Supervisors, Knowledge Inference Engines). View endpoint status, operations metrics, interactive dependency topology, and test agents with an embedded playground.

AI Gateway

Unified view of all model serving endpoints with usage analytics, token volume charts, and per-endpoint and per-user breakdowns. Manage Unity Catalog permissions directly from the UI. Monitor operational metrics (requests, errors, latency), view individual request logs, and inspect rate limits and safety guardrails configured via AI Gateway.
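To make the operational metrics concrete, here is a minimal sketch of how request count, error rate, and a latency percentile could be derived from individual request logs. The log record shape (endpoint, status_code, latency_ms) is an assumption for the example and not the app's schema, and the nearest-rank p95 is just one common way to compute a percentile.

```python
import math
from collections import defaultdict

# Hypothetical request log entries; field names are assumptions.
request_logs = [
    {"endpoint": "support-agent", "status_code": 200, "latency_ms": 120},
    {"endpoint": "support-agent", "status_code": 200, "latency_ms": 80},
    {"endpoint": "support-agent", "status_code": 500, "latency_ms": 300},
    {"endpoint": "support-agent", "status_code": 200, "latency_ms": 100},
]


def summarize(logs):
    """Return requests, error rate, and nearest-rank p95 latency per endpoint."""
    by_endpoint = defaultdict(list)
    for entry in logs:
        by_endpoint[entry["endpoint"]].append(entry)

    summary = {}
    for name, entries in by_endpoint.items():
        latencies = sorted(e["latency_ms"] for e in entries)
        # Treat any 4xx or 5xx response as an error in this sketch.
        errors = sum(1 for e in entries if e["status_code"] >= 400)
        # Nearest-rank percentile: the value at position ceil(0.95 * n), 1-based.
        rank = math.ceil(0.95 * len(latencies))
        summary[name] = {
            "requests": len(entries),
            "error_rate": errors / len(entries),
            "p95_latency_ms": latencies[rank - 1],
        }
    return summary


metrics = summarize(request_logs)
```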

Knowledge Bases

Combined monitoring for Vector Search and Lakebase. Overview tab shows total cost, daily cost trends by product, and top workspaces by spend. Vector Search tab provides endpoint/index inventory, sync status, health history, and cost attribution by workload type (ingest, serving, storage). Lakebase tab shows instance inventory, compute vs storage cost, and per-workspace breakdown.

Observability

Cross-workspace MLflow experiments, evaluation runs, and traces. Queries system.mlflow.experiments_latest and system.mlflow.runs_latest for account-wide visibility. Filter by workspace, view trace details, and identify which data source (system table or REST API) each record came from.
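One way to keep track of where each record came from is to tag rows with their origin while merging the two feeds. The sketch below assumes records are keyed by workspace and experiment ID and lets the system table row win when both sources report the same experiment. The field names, the source labels, and the precedence rule are assumptions made for this example; the app may resolve duplicates differently.

```python
def merge_experiments(system_table_rows, rest_api_rows):
    """Merge two feeds of experiment records and tag each with its source."""
    merged = {}
    # Load REST API rows first so system table rows can overwrite duplicates.
    for row in rest_api_rows:
        key = (row["workspace_id"], row["experiment_id"])
        merged[key] = {**row, "source": "rest_api"}
    for row in system_table_rows:
        key = (row["workspace_id"], row["experiment_id"])
        merged[key] = {**row, "source": "system_table"}
    return sorted(
        merged.values(),
        key=lambda r: (r["workspace_id"], r["experiment_id"]),
    )


# Hypothetical sample records.
from_system_table = [
    {"workspace_id": "ws-1", "experiment_id": "e1", "name": "support-eval"},
]
from_rest_api = [
    {"workspace_id": "ws-1", "experiment_id": "e1", "name": "support-eval"},
    {"workspace_id": "ws-2", "experiment_id": "e9", "name": "sales-eval"},
]

records = merge_experiments(from_system_table, from_rest_api)
```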

Tools

Registry of Unity Catalog functions and MCP servers available to agents. Browse function signatures, descriptions, and catalog locations.

Workspaces

Multi-workspace federation dashboard. See agent inventory, cost breakdown, cloud provider, and deployment region per workspace. Drill into individual workspaces for detailed agent and cost views.

Admin

Identity and access management with all principals, builders/users breakdown, RBAC matrix, and per-agent permission management. User activity tab shows top users, request distribution, daily active user trends, activity heatmap (24h UTC, Monday-first), and per-user agent usage.
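The activity heatmap layout (24 hourly columns in UTC, rows starting on Monday) can be sketched as a 7 x 24 grid of request counts. This is an illustration of the bucketing only, with a hypothetical function name and sample timestamps. It assumes timezone-aware datetimes, since naive ones would be interpreted as local time.

```python
from datetime import datetime, timedelta, timezone


def activity_heatmap(timestamps):
    """Count events into grid[weekday][hour], Monday = row 0, hours in UTC."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        utc = ts.astimezone(timezone.utc)
        # datetime.weekday() already returns 0 for Monday and 6 for Sunday.
        grid[utc.weekday()][utc.hour] += 1
    return grid


# Hypothetical request timestamps.
events = [
    datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc),  # Monday 09:00 bucket
    datetime(2024, 1, 7, 23, 5, tzinfo=timezone.utc),  # Sunday 23:00 bucket
    # 01:00 at UTC+2 on Tuesday is 23:00 UTC on Monday.
    datetime(2024, 1, 2, 1, 0, tzinfo=timezone(timedelta(hours=2))),
]

grid = activity_heatmap(events)
```

Converting to UTC before bucketing keeps the grid consistent when requests arrive from workspaces in different regions.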

Architecture

┌─────────────────────────────────────────────────────┐
│  Databricks APIs + System Tables                     │
│  (serving, billing, mlflow, access, apps, genie)     │
└──────────────────────┬──────────────────────────────┘
                       │  Scheduled workflow (every 30 min)
                       ▼
┌─────────────────────────────────────────────────────┐
│  Delta Tables → Lakebase (PostgreSQL)                │
│  (agents, experiments, runs, traces, billing cache)  │
└──────────────────────┬──────────────────────────────┘
                       │  Sub-100ms reads
                       ▼
┌─────────────────────────────────────────────────────┐
│  FastAPI Backend (17 routers, 16 services)           │
│  → React Frontend (TanStack Query + Tailwind)        │
│  → Databricks App (OBO authentication)               │
└─────────────────────────────────────────────────────┘

Key data sources:

  • system.serving.served_entities — cross-workspace agent discovery
  • system.serving.endpoint_usage — per-endpoint request and token metrics
  • system.billing.usage + system.billing.list_prices — cost attribution
  • system.mlflow.experiments_latest / runs_latest — observability
  • system.access.audit — user activity and access patterns
  • Databricks REST APIs — endpoints, apps, genie spaces, traces
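The scheduled sync step in the diagram above can be pictured as an idempotent upsert, so that repeated runs refresh existing rows instead of duplicating them. The sketch below uses Python's built-in sqlite3 module as a local stand-in for Lakebase (PostgreSQL), because both accept the INSERT ... ON CONFLICT ... DO UPDATE form used here. The agents table, its columns, and the sync_agents helper are hypothetical, and the real workflow in 02_sync_to_lakebase.py may work differently.

```python
import sqlite3


def sync_agents(conn, rows):
    """Upsert (workspace_id, name, status) rows into a hypothetical agents table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS agents (
            workspace_id TEXT NOT NULL,
            name         TEXT NOT NULL,
            status       TEXT,
            PRIMARY KEY (workspace_id, name)
        )
        """
    )
    conn.executemany(
        """
        INSERT INTO agents (workspace_id, name, status)
        VALUES (?, ?, ?)
        ON CONFLICT (workspace_id, name)
        DO UPDATE SET status = excluded.status
        """,
        rows,
    )
    conn.commit()


# In-memory database standing in for the Lakebase instance.
conn = sqlite3.connect(":memory:")

# First discovery run finds two agents.
sync_agents(conn, [("ws-1", "support-agent", "READY"), ("ws-2", "sales-agent", "READY")])
# A later run sees one agent again with a new status; the row is updated in place.
sync_agents(conn, [("ws-1", "support-agent", "FAILED")])

agents = conn.execute(
    "SELECT workspace_id, name, status FROM agents ORDER BY workspace_id"
).fetchall()
```

Against PostgreSQL the placeholders would typically be %s rather than ?, depending on the driver.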

Quick Start

# 1. Clone and configure
git clone https://github.com/databricks-solutions/agent-control-plane.git
cd agent-control-plane
make setup

# 2. Edit .env with your Lakebase and workspace details
vi control-plane-app/.env

# 3. Deploy the app
make deploy

# 4. Deploy discovery workflows
make deploy-workflows TARGET=dev

# 5. Trigger first discovery run
make run-workflows TARGET=dev

See the full Installation Guide for detailed setup including Lakebase creation, OBO configuration, and system table grants.

Prerequisites

  • Databricks workspace with Unity Catalog enabled
  • Lakebase instance (PostgreSQL) — for fast dashboard reads
  • SQL warehouse (serverless preferred) — for system table queries
  • Databricks App with User Authorization (OBO) enabled
  • Node.js 18+ and Python 3.10+ — for building and deploying

Project Structure

agent-control-plane/
├── control-plane-app/          # The Databricks App
│   ├── backend/                # FastAPI (Python)
│   │   ├── api/                # 17 route modules
│   │   ├── services/           # Business logic
│   │   ├── models/             # Pydantic schemas
│   │   └── utils/auth.py       # OBO authentication
│   ├── frontend/               # React 18 + TypeScript
│   ├── tests/                  # pytest + Playwright
│   ├── deploy.sh               # Parameterized deploy script
│   ├── grant_sp_permissions.py # SP workspace permission setup
│   └── propagate-sp.sh         # Cross-workspace SP propagation
├── workflows/                  # Databricks Asset Bundles
│   ├── 01_discover_agents.py          # Agent discovery → Delta
│   ├── 02_sync_to_lakebase.py         # All Delta → Lakebase + billing cache
│   ├── 03_discover_knowledge_bases.py # Vector Search + Lakebase + billing → Delta
│   ├── 04_discover_observability.py   # Cross-workspace traces → Delta
│   ├── 05_discover_user_analytics.py  # User activity → Delta
│   ├── 06_discover_gateway_usage.py   # Gateway usage → Delta
│   └── databricks.yml                 # Bundle configuration
├── docs/                       # Documentation
│   ├── installation.md         # Setup guide
│   └── configuration.md        # Config reference
├── setup_lakebase_tables.py    # One-time Lakebase schema setup
└── Makefile                    # Common operations

Documentation

Document                 Description
Installation Guide       Step-by-step setup (Lakebase, App, workflows)
Configuration Reference  All env vars, workflow targets, finding your values
Contributing             Development setup, code style, PR process
Security                 Vulnerability reporting, security model
Changelog                Version history
Releasing                How to create a release

Development

# Start backend (hot reload)
make backend

# Start frontend dev server
make frontend

# Run tests
make test

# Run all checks (Python + TypeScript)
make check

# See all available commands
make help

API

All endpoints are versioned under /api/v1/. Interactive documentation is available at /api/v1/docs when the app is running.

Key endpoints:

  • GET /api/v1/agents — discovered agent registry
  • GET /api/v1/billing/page-data — cost attribution dashboard
  • GET /api/v1/mlflow/experiments — MLflow experiments (cross-workspace)
  • GET /api/v1/mlflow/traces — MLflow traces
  • GET /api/v1/gateway/overview — AI Gateway analytics
  • POST /api/v1/agents/sync — trigger agent discovery refresh
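A small client for the endpoints listed above might look like the following sketch, which uses only the Python standard library. The base URL is a placeholder, the helper names are hypothetical, and sending the token as a Bearer header is an assumption about how the deployed app is reached. Response shapes are not documented here, so the sketch returns the parsed JSON as-is.

```python
import json
import urllib.request

API_PREFIX = "/api/v1"


def build_url(base_url, path):
    """Join the app's base URL, the versioned prefix, and an endpoint path."""
    return base_url.rstrip("/") + API_PREFIX + "/" + path.lstrip("/")


def get_json(base_url, path, token):
    """Send a GET request and return the decoded JSON body."""
    request = urllib.request.Request(
        build_url(base_url, path),
        headers={
            # Assumption: the app accepts a Bearer token on this header.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Placeholder host; replace with the URL of your deployed app.
agents_url = build_url("https://example.invalid/", "agents")
print(agents_url)
```

With a real base URL and token, get_json(base_url, "agents", token) would fetch the discovered agent registry.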

License

Apache License 2.0. See LICENSE.

About

Databricks App for cross-workspace AI stack governance. Unified view of agents, serving endpoints, knowledge bases, and AI Gateway usage, backed by system tables + Lakebase with auto-refreshing discovery workflows.
