
[FEATURE] Distributed deployment: load balancing, managed deployments, and distributed stats tagging #179

@JerrettDavis

Description


Problem Statement

Today's headroom story assumes one user running one instance on one machine. The moment an organization wants to run headroom as shared infrastructure — behind a load balancer, as a managed service, with centralized observability — there is no cohesive "run headroom as org infra" story. The pieces exist in fragments (TOIN tagging, CCR backends, Redis URL, liveness/readiness from #131) but nothing stitches them into a deployable recipe.

This issue is the umbrella for that work. Three sub-pieces compose into the full story.

Proposed Solution

1. Load balancing

Standardize the hooks a load balancer needs:

  • Health endpoints. Extend the existing liveness/readiness checks from #131 (feat: add proxy liveness and readiness healthchecks) into a documented contract with separate /live, /ready, and /startup paths (following Kubernetes conventions) and clear semantics for each.
  • Performance counters. Expose per-instance in-flight requests, queue depth, p95 latency on the Prometheus endpoint (already at headroom/proxy/prometheus_metrics.py).
  • Safe collaboration semantics. Define what "safe" means for concurrent requests across instances — idempotency of TOIN writes, cache-aligner collision behavior, CCR read-through.
  • Pluggable shared state. Reuse HEADROOM_REDIS_URL (already referenced by CCR) for session state so any instance can serve any follow-up request. Document the fallback when Redis is not available.

Result: headroom can sit behind nginx, AWS ALB, Azure Front Door, or a Kubernetes Service with no special-case logic.
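To make the health-endpoint contract concrete, here is a minimal sketch of the intended semantics. The three endpoint names come from the Kubernetes conventions cited above; the function signatures and the specific readiness inputs (Redis reachability, backend health, drain state) are illustrative assumptions, not headroom's actual API.

```python
# Hypothetical sketch of the /live vs /ready vs /startup contract.
# Helper names and inputs are illustrative, not headroom's real code.

def live(process_started: bool) -> int:
    """/live: 200 as long as the process can serve HTTP at all.

    A load balancer failing this check should restart the instance."""
    return 200 if process_started else 503

def ready(redis_ok: bool, backends_ok: bool, draining: bool) -> int:
    """/ready: 200 only while this instance should receive traffic.

    Failing readiness removes the instance from rotation without
    restarting it (e.g. during a graceful drain)."""
    return 200 if (redis_ok and backends_ok and not draining) else 503

def startup(config_loaded: bool) -> int:
    """/startup: 200 once one-time initialization has finished;
    until then, liveness failures should not trigger restarts."""
    return 200 if config_loaded else 503
```

The key semantic split is that /live failures mean "restart me" while /ready failures mean "stop routing to me" — a draining instance is alive but not ready.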

2. Managed / shared deployments

Document and support the pattern where one org-operated instance (or load-balanced fleet) serves many developers, instead of every developer running a local proxy:

  • A managed-shared deployment profile, building on the existing headroom install apply --preset <x> mechanism.
  • Central configuration surface — settings applied once by the operator, visible to all users (models.json, TOIN policy, caveman mode, detached mode).
  • Central observability — savings tracker and dashboard aggregate across the whole user base.
  • Per-user authentication and accounting (composed with TOIN tagging).
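The bullets above can be sketched as a single operator-owned configuration surface. Only HEADROOM_REDIS_URL, HEADROOM_CLUSTER_ENABLED, and HEADROOM_CLUSTER_ID appear in this issue's example; the dataclass fields and from_env helper below are hypothetical, modeling the central-configuration bullet points.

```python
# Illustrative sketch of a "managed-shared" deployment profile:
# settings the operator applies once, visible to every user of the
# shared instance. Field and method names are assumptions.
import os
from dataclasses import dataclass

@dataclass
class ManagedSharedProfile:
    cluster_id: str                     # HEADROOM_CLUSTER_ID
    redis_url: str                      # HEADROOM_REDIS_URL (shared state)
    models_config: str = "models.json"  # operator-owned model list
    caveman_mode: bool = False          # applied once, seen by all users
    detached_mode: bool = False

    @classmethod
    def from_env(cls) -> "ManagedSharedProfile":
        """Build the profile from the operator's environment."""
        return cls(
            cluster_id=os.environ["HEADROOM_CLUSTER_ID"],
            redis_url=os.environ["HEADROOM_REDIS_URL"],
        )
```

The point of the sketch is ownership, not the exact fields: in the managed pattern these values live with the operator, and individual developers never set them.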

3. Distributed statistics tagging

Extend the existing TOIN infrastructure so every request and session is persistently tagged. Tags flow uniformly into the savings tracker, Prometheus, OTEL, and Langfuse export surfaces.
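One way to keep the export surfaces uniform is to build the tag set once per request and derive each surface's representation from it. The tag keys and helper names below are illustrative assumptions, not headroom's actual API.

```python
# Sketch: one canonical tag dict per request, mapped into each
# export surface so Prometheus, OTEL, and the savings tracker
# always agree. Names are illustrative, not headroom's real code.

def request_tags(user: str, team: str, cluster_id: str) -> dict[str, str]:
    """Build the canonical tag set once, at request ingress."""
    return {"user": user, "team": team, "cluster": cluster_id}

def prometheus_labels(tags: dict[str, str]) -> dict[str, str]:
    # Prometheus label names must match [a-zA-Z_][a-zA-Z0-9_]*,
    # so sanitize hyphens rather than inventing a second tag scheme.
    return {k.replace("-", "_"): v for k, v in tags.items()}

def otel_attributes(tags: dict[str, str]) -> dict[str, str]:
    # OTEL convention favors dotted, namespaced attribute keys.
    return {f"headroom.tag.{k}": v for k, v in tags.items()}
```

Deriving every surface from the same dict is what makes per-user accounting composable: a chargeback report and a Prometheus query group by the same identity.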

Use Case

  • Who: organizations deploying headroom as shared infrastructure; platform teams owning agent tooling.
  • How it helps: centralized control, centralized observability, per-user accountability, no per-developer setup.
  • Cost savings: unlocks chargeback, SLO tracking, and lets a platform team enforce compression policies once instead of N times.

Alternatives Considered

  • Per-developer setup only. Works for hobbyists; doesn't scale past small teams.
  • A separate enterprise fork. Fragmenting the project is worse than letting org infra be a first-class deployment profile.

Example API (Optional)

# Operator side
headroom install apply --preset managed-shared
export HEADROOM_REDIS_URL=redis://cache.internal:6379/0
export HEADROOM_CLUSTER_ENABLED=true
export HEADROOM_CLUSTER_ID=acme-prod
headroom proxy

# Developer side — zero local install
export ANTHROPIC_BASE_URL=https://headroom.acme.internal

