
API Endpoints

Rafael Gumieri edited this page May 15, 2026 · 4 revisions

All /v1/* and /proxy/* endpoints require an Authorization: Bearer <client_token> or Bearer <api_key_token> header. API keys support RBAC: roles (admin/user/read-only), agent scoping, and endpoint restrictions. See Secrets for full RBAC configuration.
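As a sketch, an authenticated call could be built like this (the base URL, token, and endpoint choice below are placeholders, not values defined by the gateway):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # placeholder gateway address
TOKEN = "sk-example-token"          # placeholder client or API-key token

def authed_request(path, payload=None):
    """Build a gateway request carrying the required Bearer token."""
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST" if data else "GET",
    )

# GET /v1/models with a Bearer header; send with urllib.request.urlopen(req).
req = authed_request("/v1/models")
```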

| Endpoint | Auth | Description |
| --- | --- | --- |
| POST /v1/chat/completions | Bearer | OpenAI-compatible chat with SSE streaming, agent fallback, MCP multi-turn tool loop |
| GET /v1/models | Bearer | Live model catalog from discovered providers + static registry (includes context window, max tokens, capabilities, pricing) |
| POST /v1/embeddings | Bearer | Passthrough proxy with token counting and rate limiting |
| POST /v1/responses | Bearer | Passthrough proxy with URL resolution and retry logic |
| POST /v1/images/generations | Bearer | Image generation (OpenAI-compatible) |
| POST /v1/audio/transcriptions | Bearer | Audio transcription (multipart form-data support) |
| POST /v1/audio/speech | Bearer | Text-to-speech synthesis |
| POST /v1/moderations | Bearer | Content moderation |
| POST /v1/rerank | Bearer | Re-ranking (Cohere/Jina-compatible) |
| POST /v1/a2a | Bearer | Agent-to-Agent protocol (Google A2A) |
| GET /v1/files | Bearer | File listing, upload, retrieval, deletion |
| POST /v1/batches | Bearer | Batch API operations |
| POST /proxy/{provider}/* | Bearer | Arbitrary provider endpoint passthrough (all HTTP methods, SSE streaming auto-detect) |
| GET /healthz | None | Engine health probe (returns OK if the gateway is running) |
| GET /statsz | None | Token usage per model, circuit breaker states, MCP server status |
| GET /metrics | None | Prometheus-compatible metrics |
| GET /debug/pprof | Bearer | Go pprof profiling (CPU, memory, goroutines); requires debug.pprof_enabled: true in config |

Non-Streaming Chat Completions

POST /v1/chat/completions supports both stream: true (default) and stream: false.

When stream: false, the upstream SSE is buffered into a complete JSON response before returning. All pipeline features (redaction, routing, circuit breaker, MCP loop) work the same way.
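Concretely, a non-streaming call differs only in the stream flag; the model name, message, and token below are placeholders:

```python
import json
import urllib.request

# Placeholder values; substitute your gateway URL, token, and model.
payload = {
    "model": "gpt-4o-mini",
    "stream": False,  # upstream SSE is buffered into one JSON response
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer sk-example-token",
        "Content-Type": "application/json",
    },
)
# body = json.load(urllib.request.urlopen(req))  # single complete JSON object
```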

Passthrough Proxy

The /proxy/{provider}/* endpoint routes to any provider endpoint:

POST /proxy/anthropic/v1/messages
GET /proxy/gemini/v1/models
POST /proxy/openai/v1/files
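A passthrough call simply prefixes the provider's native path with the gateway's proxy route; as a sketch (base URL is a placeholder):

```python
BASE_URL = "http://localhost:8080"  # placeholder gateway address

def proxy_url(provider, native_path):
    """Map a provider's native endpoint onto the gateway's /proxy route."""
    return f"{BASE_URL}/proxy/{provider}{native_path}"

# e.g. Anthropic's native /v1/messages endpoint, routed through the gateway:
url = proxy_url("anthropic", "/v1/messages")
```

The request is then sent to this URL with the usual Bearer header; the gateway forwards it to the provider as-is.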

See Passthrough Proxy for details.

/healthz

Simple health check for load balancers and orchestration:

curl http://localhost:8080/healthz
# OK

/statsz

Usage statistics and system state:

curl http://localhost:8080/statsz

Returns: per-model request/error/token counters, circuit breaker states, MCP server status, latency data.

/metrics

Prometheus-compatible metrics for monitoring:

curl http://localhost:8080/metrics

Includes: request counts, token usage, latency histograms, circuit breaker states, rate limiter status, overflow guard triggers, MCP active goroutines.
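The Prometheus text exposition format is plain `name{labels} value` lines, so it can be inspected without a full Prometheus stack. A minimal parsing sketch (the series names in the sample are illustrative, not the gateway's actual metric names):

```python
def parse_metrics(text):
    """Parse simple 'name{labels} value' lines from Prometheus text format.

    A sketch only: ignores HELP/TYPE comments and assumes no spaces
    inside label values.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        metrics[series] = float(value)
    return metrics

# Illustrative sample; real series names depend on the gateway.
sample = """\
# HELP requests_total Total requests.
# TYPE requests_total counter
requests_total{model="gpt-4o"} 42
requests_total{model="claude"} 7
"""
parsed = parse_metrics(sample)
```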

/debug/pprof

Go pprof profiling endpoints for performance analysis:

curl http://localhost:8080/debug/pprof/

Available profiles:

  • heap - Memory heap sampling
  • goroutine - Goroutine stack traces
  • profile - CPU profiling (30s by default)
  • block - Blocking operations
  • mutex - Contention analysis

Enable in config:

{
  "debug": {
    "pprof_enabled": true
  }
}

Then analyze with go tool pprof. Note that go tool pprof cannot attach an Authorization header itself, so fetch the profile with curl first and point the tool at the saved file:

curl -H "Authorization: Bearer <token>" -o cpu.prof http://localhost:8080/debug/pprof/profile
go tool pprof cpu.prof

Security: requires an Authorization: Bearer <token> header, like all /v1/* endpoints. In production, enable pprof only behind proper access controls.

See Also

Operations

  • Demo — Test all pipeline tiers
  • Troubleshooting — Common issues and solutions
  • FAQ — Frequently asked questions
  • Security — Security policy and vulnerability reporting
