
[FEATURE] Distributed deployment: load balancing, managed deployments, and distributed stats tagging #179

@JerrettDavis

Description


Problem Statement

Today's headroom story assumes one user running one instance on one machine. The moment an organization wants to run headroom as shared infrastructure — behind a load balancer, as a managed service, with centralized observability — there is no cohesive "run headroom as org infra" story. The pieces exist in fragments (TOIN tagging, CCR backends, Redis URL, liveness/readiness from #131) but nothing stitches them into a deployable recipe.

This issue is the umbrella for that work. Three sub-pieces compose into the full story.

Proposed Solution

1. Load balancing

Standardize the hooks a load balancer needs:

  • Health endpoints. Extend the existing liveness/readiness checks from #131 (feat: add proxy liveness and readiness healthchecks) into a documented contract with separate /live, /ready, and /startup paths (following Kubernetes conventions) and clear semantics for each.
  • Performance counters. Expose per-instance in-flight requests, queue depth, p95 latency on the Prometheus endpoint (already at headroom/proxy/prometheus_metrics.py).
  • Safe collaboration semantics. Define what "safe" means for concurrent requests across instances — idempotency of TOIN writes, cache-aligner collision behavior, CCR read-through.
  • Pluggable shared state. Reuse HEADROOM_REDIS_URL (already referenced by CCR) for session state so any instance can serve any follow-up request. Document the fallback when Redis is not available.

Result: headroom can sit behind nginx, AWS ALB, Azure Front Door, or a Kubernetes Service with no special-case logic.
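To make the health-endpoint contract concrete, here is a minimal sketch of the intended semantics. The three endpoint names come from the Kubernetes conventions cited above; the function signatures and the specific readiness inputs (Redis reachability, backend health, drain state) are illustrative assumptions, not headroom's actual API.

```python
# Hypothetical sketch of the /live vs /ready vs /startup contract.
# Helper names and inputs are illustrative, not headroom's real code.

def live(process_started: bool) -> int:
    """/live: 200 as long as the process can serve HTTP at all.

    A load balancer failing this check should restart the instance."""
    return 200 if process_started else 503

def ready(redis_ok: bool, backends_ok: bool, draining: bool) -> int:
    """/ready: 200 only while this instance should receive traffic.

    Failing readiness removes the instance from rotation without
    restarting it (e.g. during a graceful drain)."""
    return 200 if (redis_ok and backends_ok and not draining) else 503

def startup(config_loaded: bool) -> int:
    """/startup: 200 once one-time initialization has finished;
    until then, liveness failures should not trigger restarts."""
    return 200 if config_loaded else 503
```

The key semantic split is that /live failures mean "restart me" while /ready failures mean "stop routing to me" — a draining instance is alive but not ready.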

2. Managed / shared deployments

Document and support the pattern where one org-operated instance (or load-balanced fleet) serves many developers, instead of every developer running a local proxy:

  • A managed-shared deployment profile, building on the existing headroom install apply --preset <x> mechanism.
  • Central configuration surface — settings applied once by the operator, visible to all users (models.json, TOIN policy, caveman mode, detached mode).
  • Central observability — savings tracker and dashboard aggregate across the whole user base.
  • Per-user authentication and accounting (composed with TOIN tagging).
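The bullets above can be sketched as a single operator-owned configuration surface. Only HEADROOM_REDIS_URL, HEADROOM_CLUSTER_ENABLED, and HEADROOM_CLUSTER_ID appear in this issue's example; the dataclass fields and from_env helper below are hypothetical, modeling the central-configuration bullet points.

```python
# Illustrative sketch of a "managed-shared" deployment profile:
# settings the operator applies once, visible to every user of the
# shared instance. Field and method names are assumptions.
import os
from dataclasses import dataclass

@dataclass
class ManagedSharedProfile:
    cluster_id: str                     # HEADROOM_CLUSTER_ID
    redis_url: str                      # HEADROOM_REDIS_URL (shared state)
    models_config: str = "models.json"  # operator-owned model list
    caveman_mode: bool = False          # applied once, seen by all users
    detached_mode: bool = False

    @classmethod
    def from_env(cls) -> "ManagedSharedProfile":
        """Build the profile from the operator's environment."""
        return cls(
            cluster_id=os.environ["HEADROOM_CLUSTER_ID"],
            redis_url=os.environ["HEADROOM_REDIS_URL"],
        )
```

The point of the sketch is ownership, not the exact fields: in the managed pattern these values live with the operator, and individual developers never set them.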

3. Distributed statistics tagging

Extend the existing TOIN infrastructure so every request and session is persistently tagged. Tags flow uniformly into the savings tracker, Prometheus, OTEL, and Langfuse export surfaces.
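One way to keep the export surfaces uniform is to build the tag set once per request and derive each surface's representation from it. The tag keys and helper names below are illustrative assumptions, not headroom's actual API.

```python
# Sketch: one canonical tag dict per request, mapped into each
# export surface so Prometheus, OTEL, and the savings tracker
# always agree. Names are illustrative, not headroom's real code.

def request_tags(user: str, team: str, cluster_id: str) -> dict[str, str]:
    """Build the canonical tag set once, at request ingress."""
    return {"user": user, "team": team, "cluster": cluster_id}

def prometheus_labels(tags: dict[str, str]) -> dict[str, str]:
    # Prometheus label names must match [a-zA-Z_][a-zA-Z0-9_]*,
    # so sanitize hyphens rather than inventing a second tag scheme.
    return {k.replace("-", "_"): v for k, v in tags.items()}

def otel_attributes(tags: dict[str, str]) -> dict[str, str]:
    # OTEL convention favors dotted, namespaced attribute keys.
    return {f"headroom.tag.{k}": v for k, v in tags.items()}
```

Deriving every surface from the same dict is what makes per-user accounting composable: a chargeback report and a Prometheus query group by the same identity.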

Use Case

  • Who: organizations deploying headroom as shared infrastructure; platform teams owning agent tooling.
  • How it helps: centralized control, centralized observability, per-user accountability, no per-developer setup.
  • Cost savings: unlocks chargeback, SLO tracking, and lets a platform team enforce compression policies once instead of N times.

Alternatives Considered

  • Per-developer setup only. Works for hobbyists; doesn't scale past small teams.
  • A separate enterprise fork. Fragmenting the project is worse than letting org infra be a first-class deployment profile.

Example API (Optional)

# Operator side
headroom install apply --preset managed-shared
export HEADROOM_REDIS_URL=redis://cache.internal:6379/0
export HEADROOM_CLUSTER_ENABLED=true
export HEADROOM_CLUSTER_ID=acme-prod
headroom proxy

# Developer side — zero local install
export ANTHROPIC_BASE_URL=https://headroom.acme.internal

