Merged
38 changes: 38 additions & 0 deletions .github/workflows/doc-drift.yml
@@ -0,0 +1,38 @@
# Guards against regression of the provider-count and routing-strategy
# claims in user-facing docs. Lives separately from docs-ci.yml so the
# lychee link-check (which is currently failing on pre-existing
# fragments outside this PR's scope) does not block this lighter check.

name: doc-drift

on:
  pull_request:
    paths:
      - 'docs/**'
      - 'llms.txt'
      - 'README.md'
      - 'scripts/check-doc-drift.sh'
      - '.github/workflows/doc-drift.yml'

permissions:
  contents: read

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  drift:
    name: doc-drift
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v4

      - name: check doc drift
        run: |
          set -euo pipefail
          if [ ! -x scripts/check-doc-drift.sh ]; then
            chmod +x scripts/check-doc-drift.sh
          fi
          bash scripts/check-doc-drift.sh
6 changes: 3 additions & 3 deletions crates/sbproxy-ai/src/providers/mod.rs
@@ -267,7 +267,7 @@ mod tests {
let yaml = decompress_embedded().expect("embedded gzip valid");
let catalog: YamlCatalog = serde_yaml::from_str(&yaml).expect("yaml parses");
assert!(
catalog.providers.len() >= 38,
catalog.providers.len() >= 43,
"expected the full default catalog to be embedded; got {}",
catalog.providers.len()
);
@@ -279,7 +279,7 @@ mod tests {
assert!(names.contains(&"openai".to_string()));
assert!(names.contains(&"anthropic".to_string()));
assert!(names.contains(&"watsonx".to_string()));
assert!(names.len() >= 38);
assert!(names.len() >= 43);
}

#[test]
@@ -372,6 +372,6 @@ mod tests {
// log and use the embedded set.
let registry = build_registry(Some(Path::new("/dev/null/nope/missing.yml")))
.expect("falls back when override unreadable");
assert!(registry.providers.len() >= 38);
assert!(registry.providers.len() >= 43);
}
}
11 changes: 10 additions & 1 deletion crates/sbproxy-httpkit/src/lib.rs
@@ -1,4 +1,13 @@
//! sbproxy-httpkit: HTTP utilities, buffer pool, and compression.
//! sbproxy-httpkit: shared `BytesMut` buffer pool for response body
//! buffering on the proxy hot path.
//!
//! The crate is the public-API entrypoint for plugin authors who need
//! to recycle response-body buffers without paying repeated heap
//! allocation, and is intentionally narrow: today it exposes only
//! [`bufferpool::BufferPool`]. Broader HTTP request/response helpers
//! (header parsing, body limits, compression) live in `sbproxy-core`,
//! `sbproxy-middleware`, and `sbproxy-transport`; they are not
//! re-exported here to keep the public surface stable.

#![forbid(unsafe_code)]
#![warn(missing_docs)]
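
Reviewer sketch (not part of this diff): the new crate docs describe recycling response-body buffers to avoid repeated heap allocation. The idea can be illustrated with a minimal pool over `Vec<u8>`. Everything here is an assumption made for illustration: the real `bufferpool::BufferPool` is built on `BytesMut`, and its method names, limits, and locking strategy are not shown in this PR.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the crate's pool; uses `Vec<u8>` instead of `BytesMut`.
pub struct BufferPool {
    free: Mutex<Vec<Vec<u8>>>,
    buf_capacity: usize,
    max_pooled: usize,
}

impl BufferPool {
    /// `buf_capacity` is the size of freshly allocated buffers;
    /// `max_pooled` caps how many idle buffers are retained.
    pub fn new(buf_capacity: usize, max_pooled: usize) -> Self {
        Self { free: Mutex::new(Vec::new()), buf_capacity, max_pooled }
    }

    /// Take an empty buffer, allocating only when no recycled one is available.
    pub fn get(&self) -> Vec<u8> {
        let recycled = self.free.lock().unwrap().pop();
        recycled.unwrap_or_else(|| Vec::with_capacity(self.buf_capacity))
    }

    /// Return a buffer. Its contents are cleared but its capacity is kept;
    /// the buffer is dropped if the pool already holds `max_pooled` entries.
    pub fn put(&self, mut buf: Vec<u8>) {
        buf.clear();
        let mut free = self.free.lock().unwrap();
        if free.len() < self.max_pooled {
            free.push(buf);
        }
    }

    /// Number of idle buffers currently held.
    pub fn pooled(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}
```

A caller would `get()` a buffer, fill it with the response body, and `put()` it back once the body has been flushed, so the next request reuses the same allocation.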
6 changes: 5 additions & 1 deletion crates/sbproxy-modules/src/auth/mod.rs
@@ -73,7 +73,11 @@ pub enum Auth {
Bearer(BearerAuth),
/// JWT validation (structure + expiry, signature check deferred).
Jwt(JwtAuth),
/// HTTP Digest Authentication (placeholder).
/// HTTP Digest Authentication. Implements the subset of RFC 7616
/// the proxy actually exposes (MD5 digest with `qop=auth`) and
/// tracks the highest accepted nonce-count per nonce so a captured
/// `Authorization` header cannot be replayed. See [`DigestAuth`]
/// for the implementation details.
Digest(DigestAuth),
/// Forward auth to an external service.
ForwardAuth(ForwardAuthProvider),
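
Reviewer sketch (not part of this diff): the new `Digest` doc comment says the proxy tracks the highest accepted nonce-count per nonce so a captured `Authorization` header cannot be replayed. A minimal illustration of that bookkeeping follows. The type and method names are hypothetical, and the real `DigestAuth` presumably also handles nonce expiry and concurrent access, which this sketch omits.

```rust
use std::collections::HashMap;

/// Hypothetical tracker: remembers the highest nonce-count (`nc`)
/// accepted so far for each server-issued nonce.
#[derive(Default)]
pub struct NonceCountTracker {
    highest: HashMap<String, u32>,
}

impl NonceCountTracker {
    /// Accept a request only if its `nc` is strictly greater than every
    /// count already accepted for this nonce. `nc_hex` is the hexadecimal
    /// `nc` value from the Authorization header, e.g. "00000001".
    pub fn accept(&mut self, nonce: &str, nc_hex: &str) -> bool {
        let Ok(nc) = u32::from_str_radix(nc_hex, 16) else {
            return false; // malformed nc
        };
        match self.highest.get(nonce) {
            // Same or lower count than one already seen: treat as a replay.
            Some(&seen) if nc <= seen => false,
            _ => {
                self.highest.insert(nonce.to_string(), nc);
                true
            }
        }
    }
}
```

Because only the maximum is stored per nonce, the memory cost is one integer per live nonce rather than one entry per request.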
3 changes: 1 addition & 2 deletions docs-manifest.md
@@ -2,7 +2,7 @@
*Last modified: 2026-05-03*

## Summary
- Total files: 78
- Total files: 48
- Keep: 52
- Scrub: 20
- Delete: 6
@@ -135,7 +135,6 @@

Before release, audit these against current Rust code:

- providers.md: Verify "20 native providers" and "200+ models" counts against `crates/sbproxy-ai/src/providers/`
- configuration.md: Sample config blocks against current schema in `crates/sbproxy-config/`
- build.md: Verify Cargo commands and feature flags against Cargo.toml
- examples: Spot-check 3-5 example configs in `examples/` directory match documented YAML syntax
2 changes: 1 addition & 1 deletion docs/README.md
@@ -79,7 +79,7 @@ origins:
## What's in the box

- Reverse proxy: HTTP/1.1, HTTP/2, HTTP/3 (QUIC), WebSocket, gRPC, connection pooling, hot reload.
- AI gateway: 200+ LLM models, 9 routing strategies, OpenAI-compatible API, guardrails, budgets, virtual keys, MCP server.
- AI gateway: 200+ LLM models, 10 routing strategies, OpenAI-compatible API, guardrails, budgets, virtual keys, MCP server.
- Authentication: API key, basic, bearer, JWT, digest, forward auth, noop.
- Policies: rate limiting, IP filter, CEL expressions, WAF, DDoS, CSRF, security headers.
- Transforms: 18 request and response transforms (JSON, HTML, Markdown, CSS, Lua, JavaScript, encoding, and more).
4 changes: 2 additions & 2 deletions docs/ai-gateway.md
@@ -2,7 +2,7 @@

*Last modified: 2026-05-03*

SBproxy includes an AI gateway that sits between your application and LLM providers. You get one API endpoint with automatic failover, cost tracking, rate limits, and programmable routing across OpenAI, Anthropic, and other providers. The proxy ships with 36 OpenAI-compatible providers plus a native Anthropic translator, and the OpenRouter aggregator routes 200+ more.
SBproxy includes an AI gateway that sits between your application and LLM providers. You get one API endpoint with automatic failover, cost tracking, rate limits, and programmable routing across OpenAI, Anthropic, and other providers. The proxy ships with 43 native providers, including a native Anthropic translator, and the OpenRouter aggregator routes 200+ more.

## Provider setup

@@ -29,7 +29,7 @@ API keys support environment variable interpolation with `${VAR_NAME}` syntax. N

### Native providers

36 OpenAI-compatible providers ship in-tree alongside a native Anthropic translator and the OpenRouter aggregator (which routes 200+ more models). Direct adapters include `openai`, `anthropic`, `gemini`, `azure`, `bedrock`, `cohere`, `mistral`, `groq`, `deepseek`, `ollama`, `vllm`, `together`, `fireworks`, `perplexity`, `xai`, `sagemaker`, `databricks`, `oracle`, `watsonx`, and `openrouter`.
43 native providers ship in-tree, including a native Anthropic translator and the OpenRouter aggregator (which routes 200+ more models). Direct adapters include `openai`, `anthropic`, `gemini`, `azure`, `bedrock`, `cohere`, `mistral`, `groq`, `deepseek`, `ollama`, `vllm`, `together`, `fireworks`, `perplexity`, `xai`, `sagemaker`, `databricks`, `oracle`, `watsonx`, and `openrouter`.

For models that are not natively supported, route through `openrouter` (200+ models behind one key) or point a `vllm` or generic OpenAI-compatible provider at a self-hosted endpoint via `base_url`. See `providers.md` for the full per-provider model table.

13 changes: 9 additions & 4 deletions docs/architecture.md
@@ -80,9 +80,13 @@ sbproxy/
header modifiers, error pages, forward rules.
sbproxy-cache/ - Response cache trait, memory backend,
pluggable store interface, cache key partitioning.
sbproxy-security/ - WAF engine (OWASP CRS), DDoS protection, CSRF,
RFC 9421 message signatures, PII masking,
host filter (bloom + HashMap lookup).
sbproxy-security/ - Cross-cutting security primitives: crypto helpers,
host filter (bloom + HashMap lookup), client-IP
extraction with trusted-proxy CIDRs, PII redactor,
SSRF guard, plus optional headless-browser
detection and bot/agent verification helpers.
The WAF, DDoS, CSRF, and security_headers
policies live in sbproxy-modules/src/policy/.
sbproxy-tls/ - TLS termination via rustls 0.23 with the `ring`
crypto provider, ACME auto-cert (Let's Encrypt),
HTTP/3 listener wiring, OCSP stapling.
@@ -381,7 +385,7 @@ implements `sbproxy_ai::providers::Provider`. The provider list is also driven b
`providers.yaml`, which maps provider names to their base URLs and supported models. Rust
implementations handle request serialization and response normalization.

36 OpenAI-compatible providers ship in-tree alongside a native Anthropic
43 native providers ship in-tree, including a native Anthropic
translator and the OpenRouter aggregator (which routes 200+ more models).
Direct adapters include OpenAI, Anthropic, Google Gemini, Azure
OpenAI, AWS Bedrock, Cohere, Mistral, DeepSeek, xAI / Grok, Perplexity,
@@ -402,6 +406,7 @@ adapters (Hugging Face TGI, LM Studio, llama.cpp).
| `cost_optimized` | Lowest score of `connections * 1000 + weight`. Utilization dominates; weight breaks ties in favor of cheaper providers. |
| `token_rate` | Provider with the most remaining tokens-per-minute headroom. |
| `sticky` | Pin a session key to one provider. Falls back to round robin without a session key. |
| `race` | Fan out to every healthy provider in parallel; first non-error response wins, the rest are cancelled. |
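
Reviewer sketch (not part of this diff): the `cost_optimized` row in this table scores each provider as `connections * 1000 + weight` and picks the lowest. A small illustration is below. The struct and function names are hypothetical, and the sketch assumes that a lower `weight` means a cheaper provider. Note that utilization only strictly dominates while weights stay below 1000.

```rust
/// Hypothetical snapshot of a provider's live state.
pub struct ProviderState {
    pub name: &'static str,
    /// In-flight requests against this provider.
    pub connections: u64,
    /// Configured weight; assumed here to be lower for cheaper providers.
    pub weight: u64,
}

/// Score from the table: utilization dominates, weight breaks ties.
pub fn cost_score(p: &ProviderState) -> u64 {
    p.connections * 1000 + p.weight
}

/// Pick the provider with the lowest score, or None for an empty list.
/// Exact score ties fall to the earliest provider in the slice.
pub fn pick_cost_optimized(providers: &[ProviderState]) -> Option<&ProviderState> {
    providers.iter().min_by_key(|p| cost_score(p))
}
```

With two providers at one connection each and weights 500 and 20, the scores are 1500 and 1020, so the weight-20 provider wins; a provider with two connections scores at least 2000 and loses to both.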

### Streaming

4 changes: 2 additions & 2 deletions docs/comparison.md
@@ -40,11 +40,11 @@ SBproxy fits when you need a production reverse proxy *and* an AI gateway in the
### vs LiteLLM

LiteLLM is the most popular open-source AI gateway. It supports 100+ LLM providers.
SBproxy reaches 200+ models: 36 OpenAI-compatible providers plus a native Anthropic translator and the OpenRouter aggregator routing 200+ more, alongside a generic OpenAI-compatible adapter for self-hosted or proprietary endpoints.
SBproxy reaches 200+ models: 43 native providers, including a native Anthropic translator and the OpenRouter aggregator routing 200+ more, alongside a generic OpenAI-compatible adapter for self-hosted or proprietary endpoints.

| | SBproxy | LiteLLM |
|---|---------|---------|
| LLM providers | 200+ models (36 OpenAI-compatible + native Anthropic + OpenRouter + generic adapter) | 100+ native |
| LLM providers | 200+ models (43 native providers + OpenRouter aggregator + generic adapter) | 100+ native |
| General HTTP proxy | Yes | No |
| Implementation | Compiled native binary | Python |
| Min resources | 1 CPU, 256 MB | 4 CPU, 8 GB |
19 changes: 13 additions & 6 deletions docs/configuration.md
@@ -55,20 +55,25 @@ For AI-specific features in depth, see [ai-gateway.md](ai-gateway.md). For CEL,

SBproxy reads its configuration from a YAML file, typically named `sb.yml`. This file defines how the proxy listens for traffic, which hostnames it handles, and what it does with each request.

Load a config file:
Load a config file. The path must be supplied explicitly; the binary does not auto-discover `sb.yml` in the current directory.

```bash
# Default (looks for sb.yml in current directory)
sbproxy serve
# Explicit path
sbproxy --config /etc/sbproxy/production.yml

# Custom path
# Same thing via the `serve` subcommand and the short flag
sbproxy serve -f /etc/sbproxy/production.yml

# Or via env var for containerised deployments
SB_CONFIG_FILE=/etc/sbproxy/production.yml sbproxy
```

Validate without starting:

```bash
sbproxy validate -c sb.yml
sbproxy validate /etc/sbproxy/production.yml
# or
sbproxy --config /etc/sbproxy/production.yml --check
```

The config has two main sections: `proxy` (server-level settings) and `origins` (per-hostname routing and behavior). Optional shared-state blocks (`l2_cache_settings`, `messenger_settings`) live nested under `proxy`.
@@ -3130,7 +3135,9 @@ origins:
Check the configuration for errors without starting the proxy:

```bash
sbproxy validate -c sb.yml
sbproxy validate /etc/sbproxy/sb.yml
# or, equivalently, by adding --check to a --config invocation
sbproxy --config /etc/sbproxy/sb.yml --check
```

This catches:
3 changes: 2 additions & 1 deletion docs/features.md
@@ -106,7 +106,7 @@ The `ai_proxy` action turns SBproxy into an OpenAI-compatible API gateway. It ac

### Providers

SBproxy ships with 36 OpenAI-compatible providers plus a native Anthropic translator and the OpenRouter aggregator routing 200+ more. Adapters include openai, anthropic, gemini, azure, bedrock, cohere, mistral, groq, deepseek, ollama, vllm, together, fireworks, perplexity, xai, sagemaker, databricks, oracle, watsonx, openrouter, plus three local-runtime adapters (`tgi`, `lmstudio`, `llamacpp`). The `provider_type` field on a provider picks the adapter (when unset, SBproxy infers it from `name`). For models not covered by a native adapter, route through the openrouter provider (200+ models), or use a self-hosted OpenAI-compatible server via the `vllm`/`generic` adapter with a custom `base_url`.
SBproxy ships with 43 native providers, including a native Anthropic translator and the OpenRouter aggregator routing 200+ more. Adapters include openai, anthropic, gemini, azure, bedrock, cohere, mistral, groq, deepseek, ollama, vllm, together, fireworks, perplexity, xai, sagemaker, databricks, oracle, watsonx, openrouter, plus three local-runtime adapters (`tgi`, `lmstudio`, `llamacpp`). The `provider_type` field on a provider picks the adapter (when unset, SBproxy infers it from `name`). For models not covered by a native adapter, route through the openrouter provider (200+ models), or use a self-hosted OpenAI-compatible server via the `vllm`/`generic` adapter with a custom `base_url`.

```yaml
origins:
@@ -161,6 +161,7 @@ The `routing.strategy` field controls how requests are distributed across provid
| `least_connections` | Route to provider with fewest active requests |
| `token_rate` | Balance by token consumption rate |
| `sticky` | Pin requests to a provider using session/key |
| `race` | Fan out to every healthy provider in parallel; first non-error response wins, the rest are cancelled |
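
Reviewer sketch (not part of this diff): the new `race` row describes fanning a request out to every healthy provider and taking the first non-error response. The shape of that logic can be shown with plain threads and a channel. All names here are hypothetical. Unlike the documented behaviour, OS threads cannot be cancelled, so the losing calls in this sketch run to completion and their results are simply discarded; the real implementation is presumably async and drops the in-flight requests.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical provider call: Ok(response body) or Err(reason).
pub type ProviderCall = Box<dyn FnOnce() -> Result<String, String> + Send + 'static>;

/// Convenience constructor pairing a provider name with its call.
pub fn provider<F>(name: &str, call: F) -> (String, ProviderCall)
where
    F: FnOnce() -> Result<String, String> + Send + 'static,
{
    let boxed: ProviderCall = Box::new(call);
    (name.to_string(), boxed)
}

/// Run every call in parallel and return the first successful
/// (provider name, body) pair, or None if every provider failed.
pub fn race(calls: Vec<(String, ProviderCall)>) -> Option<(String, String)> {
    let (tx, rx) = mpsc::channel();
    for (name, call) in calls {
        let tx = tx.clone();
        thread::spawn(move || {
            // A send error only means a winner was already chosen
            // and the receiver has been dropped.
            let _ = tx.send((name, call()));
        });
    }
    // Drop the original sender so the loop below ends once every
    // worker has reported.
    drop(tx);
    for (name, result) in rx {
        if let Ok(body) = result {
            return Some((name, body));
        }
    }
    None
}
```

Errors are skipped rather than returned, so a fast failure from one provider never beats a slower success from another.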

```yaml
action:
2 changes: 1 addition & 1 deletion docs/routing-strategies.md
@@ -70,7 +70,7 @@ inventory::submit! {

Once the crate is linked into the proxy binary, the strategy is discoverable by name. Configuration consumes it the same way an enterprise auth plugin would: by referencing the registered name in the load-balancer config and letting `build_routing_strategy` resolve it to an `Arc<dyn RoutingStrategy>`.

The OSS tree ships one trivial built-in strategy, `first-healthy` (`AlwaysFirstHealthyStrategy`), purely as a reference implementation for tests and documentation. Production deployments should continue to use the existing `lb_method` algorithms until the Fail-6 follow-up lands the real LoRA-aware, GPU-aware, and contextual-bandit strategies.
The OSS tree ships two built-in strategies: `first-healthy` (`AlwaysFirstHealthyStrategy`), a reference implementation that always picks the first healthy target, and `lora-aware` (`LoraAwareStrategy`), a production strategy described in detail below. The remaining production strategies (GPU-aware, contextual-bandit) are tracked under Fail-6; until they land, deployments that do not need LoRA affinity should continue to use the existing `lb_method` algorithms.
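
Reviewer sketch (not part of this diff): the paragraph above says `first-healthy` always picks the first healthy target and that strategies are resolved by registered name into an `Arc<dyn RoutingStrategy>`. A simplified, self-contained picture of that flow is below. The trait shape, the `Target` struct, and `resolve_strategy` are assumptions for illustration; the real trait signature and the `inventory`-based lookup behind `build_routing_strategy` are not shown in this PR.

```rust
use std::sync::Arc;

/// Hypothetical upstream target.
pub struct Target {
    pub name: String,
    pub healthy: bool,
}

/// Hypothetical, simplified stand-in for the `RoutingStrategy` trait.
pub trait RoutingStrategy: Send + Sync {
    /// Return the index of the chosen target, or None if none is usable.
    fn pick(&self, targets: &[Target]) -> Option<usize>;
}

/// Reference strategy: always the first target that reports healthy.
pub struct AlwaysFirstHealthyStrategy;

impl RoutingStrategy for AlwaysFirstHealthyStrategy {
    fn pick(&self, targets: &[Target]) -> Option<usize> {
        targets.iter().position(|t| t.healthy)
    }
}

/// Hypothetical resolver using a hard-coded match in place of a
/// link-time registry: maps a registered name to a shared strategy.
pub fn resolve_strategy(name: &str) -> Option<Arc<dyn RoutingStrategy>> {
    match name {
        "first-healthy" => Some(Arc::new(AlwaysFirstHealthyStrategy)),
        _ => None,
    }
}
```

Configuration code would call the resolver once at load time and share the returned `Arc` across request handlers.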

## LoRA-aware routing

6 changes: 3 additions & 3 deletions llms.txt
@@ -6,7 +6,7 @@ Most teams run nginx or Traefik for HTTP traffic, then route AI requests through

## Key facts

- 200+ LLM models reachable. 20 native provider adapters (OpenAI, Anthropic, Google, Mistral, Cohere, Groq, DeepSeek, and more) plus 200+ models through the OpenRouter aggregator.
- 200+ LLM models reachable. 43 native providers (OpenAI, Anthropic, Google, Mistral, Cohere, Groq, DeepSeek, and more) plus 200+ models through the OpenRouter aggregator.
- Sub-millisecond p99 proxy overhead at 50k+ rps on commodity hardware
- Hot reload with zero dropped connections
- 14-phase compiled pipeline per origin (compiled once, zero per-request allocation)
@@ -82,7 +82,7 @@ origins:
routing: fallback_chain
```

Routing strategies: round_robin, weighted, fallback_chain, random, lowest_latency, least_connections, cost_optimized, token_rate, sticky.
Routing strategies (10): round_robin, weighted, fallback_chain, random, lowest_latency, least_connections, cost_optimized, token_rate, sticky, race.

Guardrails: pii, injection, jailbreak, toxicity, content_safety, schema, regex.

@@ -114,7 +114,7 @@ Origin fields:
| Type | Description |
|------|-------------|
| proxy | Reverse proxy to upstream URL |
| ai_proxy | AI gateway with 200+ models (20 native adapters + OpenRouter aggregator), 9 routing strategies, guardrails, budgets |
| ai_proxy | AI gateway with 200+ models (43 native providers + OpenRouter aggregator), 10 routing strategies (round_robin, weighted, fallback_chain, random, lowest_latency, least_connections, cost_optimized, token_rate, sticky, race), guardrails, budgets |
| static | Static response (JSON or text) |
| redirect | HTTP redirect (301/302/307/308) |
| echo | Echo request as JSON (debugging) |
77 changes: 77 additions & 0 deletions scripts/check-doc-drift.sh
@@ -0,0 +1,77 @@
#!/usr/bin/env bash
#
# scripts/check-doc-drift.sh
#
# Guard against regression of provider-count and routing-strategy claims
# in user-facing docs. Code reality:
#
# - crates/sbproxy-ai/data/ai_providers.yml has 43 entries.
# - crates/sbproxy-ai/src/routing.rs defines 10 routing strategies
# (RoundRobin, Weighted, FallbackChain, Random, LowestLatency,
# LeastConnections, CostOptimized, TokenRate, Sticky, Race).
# - crates/sbproxy-modules/src/action/routing/ ships two built-in
# RoutingStrategy implementations: first-healthy and lora-aware.
#
# The strings below previously appeared in docs and went stale. If any
# reappears, this check fails so the offending PR can fix the count
# before merge.
#
# Usage:
# scripts/check-doc-drift.sh # scan default targets, exit 1 on hit
# scripts/check-doc-drift.sh --root . # explicit repo root
#
# Exit codes:
# 0 no stale strings found
# 1 one or more stale strings found
# 2 invalid CLI usage

set -euo pipefail

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
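while [ $# -gt 0 ]; do
  case "$1" in
    --root)
      # Under `set -u` a bare `--root` would abort on the unset "$2";
      # report it as invalid CLI usage (exit 2) instead.
      [ $# -ge 2 ] || { echo "--root requires a path" >&2; exit 2; }
      ROOT_DIR="$2"; shift 2 ;;
    -h|--help)
      # Print only the header comment, not the code that follows it.
      sed -n '2,26p' "$0"
      exit 0
      ;;
    *) echo "unknown arg: $1" >&2; exit 2 ;;
  esac
done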

# Targets we actively police. Adding a new doc surface that should be
# guarded is a one-line addition here.
TARGETS=(
  "$ROOT_DIR/docs"
  "$ROOT_DIR/llms.txt"
  "$ROOT_DIR/README.md"
)

# Substrings that must never reappear. Each entry is a fixed (-F) string
# so YAML / table escapes do not matter.
STALE_STRINGS=(
  "20 native"
  "9 routing strategies"
  "one trivial built-in strategy"
  "36 OpenAI-compatible"
)

rc=0
for needle in "${STALE_STRINGS[@]}"; do
  for target in "${TARGETS[@]}"; do
    [ -e "$target" ] || continue
    if hits=$(grep -RFn --binary-files=without-match \
        --include='*.md' --include='*.txt' \
        -e "$needle" "$target" 2>/dev/null); then
      echo "stale string found: '$needle'" >&2
      echo "$hits" | sed 's/^/  /' >&2
      rc=1
    fi
  done
done

if [ "$rc" -eq 0 ]; then
  echo "doc-drift: ok"
fi

exit "$rc"