Production-ready OpenAI-compatible proxy for NVIDIA AI with streaming, tool calls, CORS, connection pooling, and resilient background architecture.
- Full OpenAI API Compatibility: Drop-in replacement for OpenAI endpoints.
- Streaming (SSE) Support: Reliable real-time chunk streaming.
- Tool Calling: Full parallel tool call extraction and bridging.
- Resilience: Connection pooling, upstream retries with jittered exponential backoff (sketched below), and background log retention.
- Observability: Structured JSON logging, SQLite request/response auditing, token tracking, and `/healthz` / `/readyz` probes.
- Security: In-flight masking of sensitive API keys within logs and explicit model allow-listing.
- Cross-platform & Standalone: Available as zero-dependency Linux and Windows executables, Docker containers, and a pip-installable package.
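The retry behavior mentioned above is the standard jittered exponential backoff pattern. A minimal sketch of that pattern, using a hypothetical `post_with_retries` helper rather than the gateway's actual code:

```python
import random
import time

import requests

def post_with_retries(session: requests.Session, url: str, payload: dict,
                      retries: int = 3) -> requests.Response:
    """Retry 429/5xx responses with exponential backoff plus full jitter."""
    for attempt in range(retries + 1):
        resp = session.post(url, json=payload, timeout=30)
        # Anything other than 429 or a 5xx is final; hand it back.
        if resp.status_code != 429 and resp.status_code < 500:
            return resp
        if attempt < retries:
            # Sleep a random fraction of the exponential cap to avoid
            # synchronized retry storms against the upstream API.
            time.sleep(random.uniform(0, 2 ** attempt))
    return resp
```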
Run the latest image directly from the GitHub Container Registry:

```bash
docker run -d \
  -p 8080:8089 \
  -e CUSTOM_API_KEY="nvapi-..." \
  -v $(pwd)/data:/app/data \
  ghcr.io/unn-known1/nvidia-ai-gateway:latest
```

Download from Releases:
```bash
chmod +x nvidia-ai-gateway-linux-amd64
export CUSTOM_API_KEY="nvapi-..."
./nvidia-ai-gateway-linux-amd64 --port 8080
```

Or run from source:

```bash
git clone https://github.com/unn-Known1/NVIDIA-AI-Gateway.git
cd NVIDIA-AI-Gateway
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m gateway --port 8080
```

You can configure the gateway using environment variables, an `.env` file in the working directory, or by passing a config file (`--config config.ini`).
- `CUSTOM_API_KEY` (required): Your NVIDIA API key (`nvapi-...`).
- `CUSTOM_BASE_URL`: NVIDIA API base URL (default: `https://integrate.api.nvidia.com/v1`).
- `CUSTOM_MODEL_ID`: Default model if none is specified (default: `stepfun-ai/step-3.5-flash`).
- `ALLOWED_MODELS`: Comma-separated list of models the client is allowed to request, or `*` to allow any (default: `*`).
- `GATEWAY_PORT`: Port to listen on (default: `8089`).
- `UPSTREAM_RETRIES`: Number of times to retry idempotent upstream requests on 5xx/429 errors (default: `3`).
- `LOG_RETENTION_DAYS`: A background worker cleans up SQLite logs older than this many days (default: `30`).
- `LOG_REQUEST_BODIES`: Log full request/response bodies to the DB (`true`/`false`). Secrets are automatically redacted.
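For example, a typical shell setup might look like this (the second model ID under `ALLOWED_MODELS` is just an illustrative placeholder):

```bash
export CUSTOM_API_KEY="nvapi-..."                   # required
export CUSTOM_MODEL_ID="stepfun-ai/step-3.5-flash"  # default model
export ALLOWED_MODELS="stepfun-ai/step-3.5-flash,meta/llama-3.1-8b-instruct"
export GATEWAY_PORT=8089
export LOG_RETENTION_DAYS=30
export LOG_REQUEST_BODIES=false
```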
All core API endpoints are fully OpenAI-compatible:

- `POST /v1/chat/completions` (streaming & non-streaming)
- `POST /v1/completions`
- `POST /v1/embeddings`
- `GET /v1/models`

Gateway management and health endpoints:

- `GET /healthz` (liveness probe)
- `GET /readyz` (readiness probe; validates DB & upstream API)
- `GET /gateway/status`
- `GET /gateway/stats`
- `GET /gateway/logs`
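To sanity-check the API surface, you can list models through the gateway; this assumes the gateway key from the startup banner is passed as a standard bearer token:

```bash
curl -s http://localhost:8080/v1/models \
  -H "Authorization: Bearer sk-gateway-..."
```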
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-gateway-..."  # Grab this from the gateway startup banner!
)

response = client.chat.completions.create(
    model="stepfun-ai/step-3.5-flash",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
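Because tool calling is bridged, the standard OpenAI `tools` parameter should work unchanged. A minimal sketch, continuing with the `client` from above and a hypothetical `get_weather` function definition:

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="stepfun-ai/step-3.5-flash",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

# Parallel tool calls are extracted, so several entries may come back at once.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```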
The NVIDIA AI Gateway is designed to be lightweight yet highly concurrent:

- Web Server: Uses Flask served via a background `werkzeug` WSGI server, capable of handling SSE streams cleanly.
- Database: Uses SQLite with `WAL` (Write-Ahead Logging) mode and thread-local connections to allow concurrent non-blocking reads/writes.
- Connection Pooling: A global `requests.Session` pool is maintained for upstream requests, eliminating TLS handshake overhead on every API call.
- Background Workers: A dedicated daemon thread (`log_retention_worker`) routinely vacuums the database to prevent unbounded disk usage.
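For reference, the thread-local WAL pattern described above generally looks like the following sketch (`get_db` is a hypothetical helper, not the gateway's actual code):

```python
import sqlite3
import threading

_local = threading.local()

def get_db(path: str = "data/gateway.db") -> sqlite3.Connection:
    """Return this thread's own SQLite connection, creating it on first use."""
    if not hasattr(_local, "conn"):
        conn = sqlite3.connect(path)
        # WAL mode lets concurrent readers proceed while one writer
        # appends to the write-ahead log instead of locking the main file.
        conn.execute("PRAGMA journal_mode=WAL")
        _local.conn = conn
    return _local.conn
```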
```text
nvidia-ai-gateway/
├── src/gateway/          # Main package
│   ├── __init__.py
│   └── __main__.py       # Application entry point
├── scripts/              # Platform-specific installers and launchers
├── .github/workflows/    # CI/CD pipelines (automated release builds)
├── pyproject.toml        # Package configuration
└── requirements.txt      # Pinned dependencies
```
```bash
# Basic connectivity test
export CUSTOM_API_KEY="nvapi-..."
python -m gateway --port 8080 &
curl http://localhost:8080/healthz
curl http://localhost:8080/readyz
```

Problem: `python3 -m venv` fails with an ensurepip error.
Solution: Re-run `./scripts/install.sh`. It automatically detects broken environments, uses the `--without-pip` flag, and bootstraps pip manually.
If port 8089 (or your configured port) is occupied:
```bash
export GATEWAY_PORT=8081
python -m gateway
```

Apache License 2.0 - see LICENSE file.