
API Endpoints

Rafael Gumieri edited this page May 15, 2026 · 4 revisions

All /v1/* and /proxy/* endpoints require an Authorization: Bearer <client_token> or Bearer <api_key_token> header. API keys support RBAC: roles (admin/user/read-only), agent scoping, and endpoint restrictions. See Secrets for full RBAC configuration.
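As a sketch, an authenticated call could be built like this (the base URL, token, and endpoint choice below are placeholders, not values defined by the gateway):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # placeholder gateway address
TOKEN = "sk-example-token"          # placeholder client or API-key token

def authed_request(path, payload=None):
    """Build a gateway request carrying the required Bearer token."""
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST" if data else "GET",
    )

# GET /v1/models with a Bearer header; send with urllib.request.urlopen(req).
req = authed_request("/v1/models")
```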

| Endpoint | Auth | Description |
| --- | --- | --- |
| POST /v1/chat/completions | Bearer | OpenAI-compatible chat with SSE streaming, agent fallback, MCP multi-turn tool loop |
| GET /v1/models | Bearer | Live model catalog from discovered providers + static registry (includes context window, max tokens, capabilities, pricing) |
| POST /v1/embeddings | Bearer | Passthrough proxy with token counting and rate limiting |
| POST /v1/responses | Bearer | Passthrough proxy with URL resolution and retry logic |
| POST /v1/images/generations | Bearer | Image generation (OpenAI-compatible) |
| POST /v1/audio/transcriptions | Bearer | Audio transcription (multipart form-data support) |
| POST /v1/audio/speech | Bearer | Text-to-speech synthesis |
| POST /v1/moderations | Bearer | Content moderation |
| POST /v1/rerank | Bearer | Re-ranking (Cohere/Jina-compatible) |
| POST /v1/a2a | Bearer | Agent-to-Agent protocol (Google A2A) |
| GET /v1/files | Bearer | File listing, upload, retrieval, deletion |
| POST /v1/batches | Bearer | Batch API operations |
| POST /proxy/{provider}/* | Bearer | Arbitrary provider endpoint passthrough (all HTTP methods, SSE streaming auto-detect) |
| GET /healthz | None | Engine health probe (returns OK if the gateway is running) |
| GET /statsz | None | Token usage per model, circuit breaker states, MCP server status |
| GET /metrics | None | Prometheus-compatible metrics |
| GET /debug/pprof | Bearer | Go pprof profiling (CPU, memory, goroutines); requires debug.pprof_enabled: true in config |

Non-Streaming Chat Completions

POST /v1/chat/completions supports both stream: true (default) and stream: false.

When stream: false, the upstream SSE is buffered into a complete JSON response before returning. All pipeline features (redaction, routing, circuit breaker, MCP loop) work the same way.
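Concretely, a non-streaming call differs only in the stream flag; the model name, message, and token below are placeholders:

```python
import json
import urllib.request

# Placeholder values; substitute your gateway URL, token, and model.
payload = {
    "model": "gpt-4o-mini",
    "stream": False,  # upstream SSE is buffered into one JSON response
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer sk-example-token",
        "Content-Type": "application/json",
    },
)
# body = json.load(urllib.request.urlopen(req))  # single complete JSON object
```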

Passthrough Proxy

The /proxy/{provider}/* endpoint routes to any provider endpoint:

POST /proxy/anthropic/v1/messages
GET /proxy/gemini/v1/models
POST /proxy/openai/v1/files
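A passthrough call simply prefixes the provider's native path with the gateway's proxy route; as a sketch (base URL is a placeholder):

```python
BASE_URL = "http://localhost:8080"  # placeholder gateway address

def proxy_url(provider, native_path):
    """Map a provider's native endpoint onto the gateway's /proxy route."""
    return f"{BASE_URL}/proxy/{provider}{native_path}"

# e.g. Anthropic's native /v1/messages endpoint, routed through the gateway:
url = proxy_url("anthropic", "/v1/messages")
```

The request is then sent to this URL with the usual Bearer header; the gateway forwards it to the provider as-is.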

See Passthrough Proxy for details.

/healthz

Simple health check for load balancers and orchestration:

curl http://localhost:8080/healthz
# OK

/statsz

Usage statistics and system state:

curl http://localhost:8080/statsz

Returns: per-model request/error/token counters, circuit breaker states, MCP server status, latency data.

/metrics

Prometheus-compatible metrics for monitoring:

curl http://localhost:8080/metrics

Includes: request counts, token usage, latency histograms, circuit breaker states, rate limiter status, overflow guard triggers, MCP active goroutines.
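The Prometheus text exposition format is plain `name{labels} value` lines, so it can be inspected without a full Prometheus stack. A minimal parsing sketch (the series names in the sample are illustrative, not the gateway's actual metric names):

```python
def parse_metrics(text):
    """Parse simple 'name{labels} value' lines from Prometheus text format.

    A sketch only: ignores HELP/TYPE comments and assumes no spaces
    inside label values.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        metrics[series] = float(value)
    return metrics

# Illustrative sample; real series names depend on the gateway.
sample = """\
# HELP requests_total Total requests.
# TYPE requests_total counter
requests_total{model="gpt-4o"} 42
requests_total{model="claude"} 7
"""
parsed = parse_metrics(sample)
```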

/debug/pprof

Go pprof profiling endpoints for performance analysis:

curl http://localhost:8080/debug/pprof/

Available profiles:

  • heap - Memory heap sampling
  • goroutine - Goroutine stack traces
  • profile - CPU profiling (30s by default)
  • block - Blocking operations
  • mutex - Contention analysis

Enable in config:

{
  "debug": {
    "pprof_enabled": true
  }
}

Then analyze with go tool pprof. Note that go tool pprof cannot attach an Authorization header itself, so fetch the profile with curl first and point the tool at the saved file:

curl -H "Authorization: Bearer <token>" -o cpu.prof http://localhost:8080/debug/pprof/profile
go tool pprof cpu.prof

Security: requires an Authorization: Bearer <token> header, like all /v1/* endpoints. In production, enable pprof only behind proper access controls.

See Also

Operations

  • Demo — Test all pipeline tiers
  • Troubleshooting — Common issues and solutions
  • FAQ — Frequently asked questions
  • Security — Security policy and vulnerability reporting
